Reference

This document contains reference information for Google Sitemap Generator administrators.

In this document:

  1. Installation Command
  2. Administration Command for Windows
  3. Administration Command for Linux
  4. Status Values
  5. Site Configuration Settings
  6. Sitemap Type Configuration Settings
    1. Web Sitemap configuration
    2. Mobile Sitemap configuration
    3. Code Search Sitemap configuration
    4. Blog Search configuration

Installation Command

The installation command is for Linux only.

This is the command format:

sitemap-install/install.sh [option]...

The option variable can be one of the following:

All of these parameters are optional. Use a space to separate multiple options.

Example: sitemap-install/install.sh -t /usr/sbin/apache2ctl -g www-data


Administration Command for Windows

This command controls the Google Sitemap Generator service. To start or stop the service, use the Windows Services interface.

This is the command format:

SitemapService.exe option

The option variable can be one of the following:

You can specify only one option at a time.


Administration Command for Linux

This command controls the Google Sitemap Generator daemon.

This is the command format:

sitemap-daemon option

The option variable can be one of the following:

You can specify only one option at a time.

Status Values

The Site Status page displays status information under the headers URL collectors and Sitemap creators. This section describes the meaning of the status values.

These are the status values for the URL collectors (webserver filter, file scanner, and log parser):

These are the status values for the Sitemap creators (Web, Mobile, Code Search, and Blog Search):


Site Configuration Settings

The following table lists and explains the site configuration settings.

Option or Section Description Default Value
Host name The name used in generated Sitemaps. Google Sitemap Generator deduces the host name by monitoring site traffic.
Pathname for log files The Apache or IIS web server log that the Google Sitemap Generator log parser monitors. The value can resolve to a file or a directory. If the value resolves to a directory, the log parser monitors all files in the directory, but it does not monitor files in any subdirectories.

Linux example: /var/log/apache2/access.log

Windows example: C:\WINDOWS\system32\LogFiles\W3SVC1

System dependent
Resource limits Specifies the resources that Google Sitemap Generator uses on this web server. You can override these site-level default values on a per-site basis.

Maximum age of URLs included in Sitemap file lets you exclude old URLs from Sitemap files, which prevents the files from becoming bloated with URLs that are already known.

Maximum number of URLs in memory and Maximum number of URLs on disk help you limit the web server resources used by Google Sitemap Generator. As Google Sitemap Generator finds URLs, it enforces these limits and removes older URLs as needed. These values apply to unique URLs; each URL can appear only once in the cache and once on disk.

The number of URLs in memory should be smaller than the number of URLs on disk, because the memory cache is periodically written to disk.

Maximum age of URLs included in Sitemap file: 365 days

Maximum number of URLs in memory: 100000

Maximum number of URLs on disk: 500000
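
As a rough illustration of how a cap on unique URLs with oldest-first removal can behave, the following Python sketch keeps each URL only once and drops the oldest entry when the limit is exceeded. The class name BoundedUrlStore and its methods are hypothetical, and the sketch is not the actual Google Sitemap Generator implementation.

```python
from collections import OrderedDict


class BoundedUrlStore:
    """Hypothetical store that holds at most max_urls unique URLs."""

    def __init__(self, max_urls):
        self.max_urls = max_urls
        self._urls = OrderedDict()  # URL -> last-seen timestamp, oldest first

    def add(self, url, seen_at):
        # A URL appears only once; seeing it again just refreshes its position.
        if url in self._urls:
            self._urls.move_to_end(url)
        self._urls[url] = seen_at
        # Enforce the limit by removing the oldest entries.
        while len(self._urls) > self.max_urls:
            self._urls.popitem(last=False)

    def urls(self):
        return list(self._urls)


store = BoundedUrlStore(max_urls=2)
store.add("/a.html", 1)
store.add("/b.html", 2)
store.add("/a.html", 3)   # refreshes /a.html, so /b.html is now the oldest
store.add("/c.html", 4)   # exceeds the limit and removes /b.html
print(store.urls())       # ['/a.html', '/c.html']
```

The same idea applies to both the in-memory limit and the on-disk limit; only the size of the cap differs.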

URL collectors Specifies the way that URLs are collected.

You can activate or deactivate the web server filter, file scanner, and log parser.

The web server filter runs continuously when activated, but the file scanner and log parser run at intervals that you can specify.

Web server filter: Default value is set at installation.

File scanner and log parser: Disabled by default.

Default execution interval for the file scanner and log parser: 1440 minutes

URL query fields Includes specified URL query fields in generated Sitemaps, overriding the default exclusion of all query fields.

Read the privacy notice and test the generated Sitemaps to ensure that you do not inadvertently compromise user privacy by including inappropriate query fields in Sitemaps.

All query fields are excluded by default.
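
The following Python sketch shows one way to think about an allowlist of query fields: every field is dropped unless it is named explicitly. The function name filter_query_fields is hypothetical, and the sketch assumes that non-listed fields are stripped from the URL; Google Sitemap Generator may handle such URLs differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def filter_query_fields(url, included_fields=()):
    """Keep only the query fields named in included_fields (none by default)."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key in included_fields]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "http://www.example.com/item.html?id=42&sessionid=abc123"
print(filter_query_fields(url))           # http://www.example.com/item.html
print(filter_query_fields(url, {"id"}))   # http://www.example.com/item.html?id=42
```

In this example, including only id keeps the session identifier out of the result, which is the kind of field the privacy notice warns about.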
Sitemap types Enables and disables the generation of URLs for Web, Mobile, Code Search, and Blog Search.

Enabled: Web

Disabled: All others


Sitemap Type Configuration Settings

This section lists and explains the configuration settings that are specific to each type of Sitemap. It contains sections for the following types of Sitemaps:

  • Web Sitemaps
  • Mobile Sitemaps
  • Code Search Sitemaps
  • Blog Search

Web Sitemap Configuration

The following table describes the configuration settings for Web Sitemaps.

Option or Section Description Default Values
Sitemap generation schedule Specifies the time of the first Sitemap generation and the frequency of subsequent Sitemap generation.

The default value of 1 day can cause some lag between the time that you make configuration changes and the time that you see the results in a Web Sitemap file. To test the results of configuration changes, or to test Sitemap generation in general, consider shortening this schedule, at least temporarily.

You can use the start date and time to defer the effective start time for Google Sitemap Generator, even after it is running and Sitemap types are enabled.

Start time and date: Installation time

Interval: 1 day
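
To make the relationship between the start time and the interval concrete, here is a minimal Python sketch that computes the next run time. It assumes that runs happen at fixed multiples of the interval after the start time; the function name next_run is hypothetical, and the actual scheduler may behave differently.

```python
from datetime import datetime, timedelta


def next_run(start, interval, now):
    """Return the first scheduled time strictly after now."""
    if now < start:
        return start  # a start time in the future defers the first run
    elapsed_intervals = (now - start) // interval  # whole intervals already passed
    return start + (elapsed_intervals + 1) * interval


start = datetime(2024, 1, 1, 2, 0)
print(next_run(start, timedelta(days=1), datetime(2024, 1, 3, 15, 30)))
# 2024-01-04 02:00:00
```

Shortening the interval, for example to timedelta(minutes=10), reduces the lag between a configuration change and the next generated file.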

Sitemap file settings Configures settings that affect each Sitemap file.

Sitemap file name specifies the first Sitemap file for this Sitemap type.

Maximum number of URLs specifies the maximum number of unique URLs per file. For example, if this value is set to 20000 and the number of URLs is 100000, Google Sitemap Generator creates five Sitemap files.

Maximum file size specifies the maximum size of each Sitemap file.

Sitemap file compression: Enabled

File name: web_sitemap_auto-generated-number.xml

Maximum number of URLs: 20000

Maximum file size: 5120 KB
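
The file-count arithmetic in the Maximum number of URLs description is a ceiling division. The Python sketch below reproduces it; the function name sitemap_file_count is hypothetical, and the calculation ignores the Maximum file size limit, which could force additional files.

```python
import math


def sitemap_file_count(total_urls, max_urls_per_file):
    """Number of Sitemap files needed when each holds at most max_urls_per_file URLs."""
    return math.ceil(total_urls / max_urls_per_file)


print(sitemap_file_count(100000, 20000))  # 5
print(sitemap_file_count(100001, 20000))  # 6
```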

Sitemap file submission Specifies the mechanism for informing search engines about your Sitemaps.

The Include Sitemap URL in robots.txt option adds a line to the robots.txt file that points to the Sitemap file. Search engine crawlers can then follow the link to find the URLs in the Sitemap file. The line looks something like this: Sitemap: http://example.com/sitemap.xml.gz

Search engine notification URLs provide the ping URLs that notify various search engines about your Sitemaps. Google Sitemap Generator automatically adds the Sitemap filename to the ping request.

This section enables or disables automatic submission of Web Sitemaps.

Default values are determined by an installation question about whether to start up with automatic submission of Web Sitemaps enabled or disabled. The default value for the installation question is to start with automatic submission disabled.
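
For readers who maintain robots.txt by hand, the following Python sketch shows what adding a Sitemap line amounts to. It uses the standard robots.txt Sitemap directive; the function name add_sitemap_line and the example URL are assumptions, and this is not how Google Sitemap Generator itself edits the file.

```python
def add_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt content with a Sitemap line for sitemap_url, added once."""
    line = f"Sitemap: {sitemap_url}"
    lines = robots_txt.splitlines()
    # Skip the append if an identical line is already present.
    if line not in (existing.strip() for existing in lines):
        lines.append(line)
    return "\n".join(lines) + "\n"


robots = "User-agent: *\nDisallow: /private/\n"
updated = add_sitemap_line(robots, "http://example.com/sitemap.xml.gz")
print(updated, end="")
```

Calling the function a second time with the same URL leaves the content unchanged, so it is safe to run repeatedly.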

Sitemap URL filter Defines URL patterns that determine how Google Sitemap Generator selects the URLs that are included in each Sitemap.

The Time-to-live for URLs value specifies how long to keep content in your Sitemap file.

The values under Included URL patterns (inclusion patterns) and Excluded URL patterns (exclusion patterns) determine how Google Sitemap Generator selects the URLs that it sends to the search engines.

How the patterns work

A URL is included in Sitemap files if both of the following are true:

  • It matches an inclusion pattern.
  • It does not match an exclusion pattern.

A URL is excluded from Sitemap files if either of the following is true:

  • It matches an exclusion pattern.
  • It is included by the inclusion and exclusion patterns in this section, but is prohibited by the settings under URL query fields.

All URLs are excluded, and Sitemap files are empty, if the following is true:

  • Both Included URL patterns and Excluded URL patterns fields are empty.

Pattern syntax rules

Construct patterns as follows:

  • Start with a slash (/).
  • Use the asterisk (*) to match zero or more characters.
  • Do not use http: or https: at the start of the pattern.

Example

For the pattern /*staff.html, these are some matched URLs:

  • http://www.example.com/sports-staff.html
  • http://www.example.org/20080506staff.html

Included URL patterns: /* (all files)

Excluded URL patterns: Files whose names contain "password"; all .swf, .js, .css, .png, .gif, and .jpg files; and robots.txt.

Excluded URL patterns: none
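
The inclusion and exclusion rules described for the Sitemap URL filter can be sketched in Python as follows. The sketch assumes that patterns are matched against the URL path only and that a pattern must match the whole path; the function names pattern_to_regex and is_included are hypothetical, and the real matcher may differ in detail.

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern):
    """Translate a pattern in which * matches zero or more characters."""
    return re.compile(
        "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern))


def is_included(url, included_patterns, excluded_patterns):
    """True if the URL path matches an inclusion pattern and no exclusion pattern."""
    path = urlsplit(url).path  # assumption: patterns apply to the path only

    def matches(patterns):
        return any(pattern_to_regex(p).fullmatch(path) for p in patterns)

    return matches(included_patterns) and not matches(excluded_patterns)


print(is_included("http://www.example.com/sports-staff.html", ["/*staff.html"], []))   # True
print(is_included("http://www.example.org/20080506staff.html", ["/*staff.html"], []))  # True
print(is_included("http://www.example.com/robots.txt", ["/*"], ["/robots.txt"]))       # False
print(is_included("http://www.example.com/index.html", [], []))                        # False
```

The last call shows why empty pattern fields produce empty Sitemap files: with no inclusion pattern, no URL can match.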



Mobile Sitemap Configuration

The following table describes the configuration settings for Mobile Sitemaps.

Option or Section Description Default Values
Sitemap generation schedule Specifies the start time for generating Sitemaps and the frequency of Sitemap generation.

You can use the start date and time to defer the start time for Google Sitemap Generator, even after Google Sitemap Generator is running and Mobile Sitemap types are enabled.

Start time and date: Installation time

Interval: 1 day

Sitemap file settings Configures settings that affect the Sitemap files.

Sitemap file name specifies the first Sitemap file for this Sitemap type.

Maximum number of URLs specifies the maximum number of unique URLs per file. For example, if this value is set to 20000 and the number of URLs is 100000, Google Sitemap Generator creates five Sitemap files.

Maximum file size specifies the maximum size of each Sitemap file.

Sitemap file compression: Enabled

File name: mobile_sitemap_auto-generated-number.xml

Maximum number of URLs: 20000

Maximum file size: 5120 KB

Sitemap URL filter Defines URL patterns that determine how Google Sitemap Generator selects the URLs that are included in each Sitemap.

Refer to the information for this setting under Web Sitemaps.

Refer to the information for this setting under Web Sitemaps.



Code Search Sitemap Configuration

The following table describes the configuration settings for Code Search Sitemaps.

Option or Section Description Default Values
Sitemap generation schedule Specifies the start time for generating Sitemaps and the frequency of Sitemap generation.

You can use the start date and time to defer the start time for Google Sitemap Generator, even after Google Sitemap Generator is running and Code Search Sitemap types are enabled.

Start time and date: Installation time

Interval: 1 day

Sitemap file settings Configures settings that affect the Sitemap files.

Sitemap file name specifies the first Sitemap file for this Sitemap type.

Maximum number of URLs specifies the maximum number of unique URLs per file. For example, if this value is set to 20000 and the number of URLs is 100000, Google Sitemap Generator creates five Sitemap files.

Maximum file size specifies the maximum size of each Sitemap file.

Sitemap file compression: Enabled

File name: codesearch_sitemap_auto-generated-number.xml

Maximum number of URLs: 20000

Maximum file size: 5120 KB

Sitemap URL filter Defines URL patterns that determine how Google Sitemap Generator selects the URLs that are included in each Sitemap.

Refer to the information for this setting under Web Sitemaps.

Be careful to ensure that the file extensions specified here actually match the file extensions in use. If they do not match, the Sitemap could be empty.

Included URL patterns: All files with the extensions .vb, .c, .cxx, .cpp, .h, .cc, and .java.

Excluded URL patterns: none


Blog Search Configuration

The following table describes the configuration settings for Blog Search Sitemaps. For Blog Search, Google Sitemap Generator does not generate and then submit Sitemap files, as it does for other content types. Instead, it monitors the web server traffic for new content and periodically pings search engines when it finds new content.

Option or Section Description Default Values
Ping schedule Specifies the frequency with which Google Sitemap Generator pings search engines to notify them about new blog content.

Start time and date: Installation time

Interval: 1 day

Sitemap URL filter Defines URL patterns that determine how Google Sitemap Generator selects the URLs that are sent to the search engines. You can use this filter to specify the blogs that you want to include and exclude.

Included URL patterns: * (all files)

Excluded URL patterns: none
