Google Sitemap Generator

Configuration

The Google Sitemap Generator Admin Console lets you configure the way that Google Sitemap Generator works and the types of URLs that it generates. This document explains how to use the Admin Console to set up Google Sitemap Generator. Use this document together with the Reference document, which provides details on command-line usage, configuration settings, and status values.

In this document:

  1. Before you start
    1. Enabling HTTPS
    2. Changing the administration port
  2. Setting up
    1. Displaying the Admin Console and logging in
    2. Enabling remote access
    3. Managing site activation
    4. Setting configuration values
    5. Applying changes
    6. Testing Sitemaps before going live
  3. Changing the Administrator password
  4. Troubleshooting

Before You Start

Before you configure Google Sitemap Generator, you might need to do some system configuration.


Enabling HTTPS

Note: This requirement is removed after build 20091231.

You can use the Admin Console locally without enabling HTTPS. If the web server does not have HTTPS enabled:

  • You can access the Admin Console locally by logging in to the machine where you installed Google Sitemap Generator, opening a browser there, and connecting to the Admin Console over HTTP.
  • You can use a command-line tool locally on the web server.

However, to use the Admin Console remotely, you must have HTTPS enabled on the web server and enable the remote access feature.

To set up HTTPS on Windows:

  1. Open Internet Information Services (IIS) Manager.
  2. Edit the Google Sitemap Generator Admin Console entity.

    By default, the TCP port is set to 8181 and the SSL port is unset.

  3. Change the configuration to specify the SSL port, using the default port number 8181 or any unused number.
  4. Set the TCP port to a different, unused port number. Google Sitemap Generator does not use the TCP port.

To set up HTTPS on Linux:

  1. Edit the file /usr/local/google-sitemap-generator/conf/httpd.conf.
  2. In the VirtualHost section, edit the SSL settings.
  3. Save the file.
  4. Restart Apache to effect the changes.


Changing the administration port

If the default Admin Console port, 8181, is already in use, you can change the port as follows:

  • On Windows: Open Internet Information Services (IIS) Manager and edit the Google Sitemap Generator Admin Console entity.
  • On Linux: Edit /usr/local/google-sitemap-generator/conf/httpd.conf, and restart Apache after saving the file.
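Before changing the port, you can check whether anything is already listening on 8181. The following Python sketch is not part of Google Sitemap Generator; it simply attempts a TCP connection to the port on the local machine and treats a successful connection as "in use." Run it on the web server itself. The function name port_in_use is hypothetical.

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP listener already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)  # don't hang on filtered ports
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    # 8181 is the default Admin Console port described above.
    print("8181 in use:", port_in_use(8181))
```

A service bound only to a non-loopback address would not be detected by connecting to 127.0.0.1; in that case, pass the server's own address as host.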


Setting Up

This section describes the steps you'll need to take during your first-time setup of Google Sitemap Generator.


Displaying the Admin Console and logging in

The first time you log in to the Admin Console, open a browser on the web server and go to https://localhost:8181. When the log-in page appears, enter the password that you created during installation.

After you've enabled remote access, you can use the hostname or IP address of the server in the Admin Console URL, as in these examples:

  • https://apache31.example.com:8181
  • https://192.0.2.89:8181
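The Admin Console URL is just the HTTPS scheme, a host, and a port. As an illustration only, this Python helper assembles it from a hostname or IP address. The function name is hypothetical, and it assumes the default port 8181 unless you changed it. IPv6 literals are wrapped in brackets, as URLs require.

```python
import ipaddress

DEFAULT_ADMIN_PORT = 8181  # default Admin Console port from this document

def admin_console_url(host: str = "localhost", port: int = DEFAULT_ADMIN_PORT) -> str:
    """Build the Admin Console URL for a hostname or IP address."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    try:
        # IPv6 addresses must appear in brackets inside a URL.
        if ipaddress.ip_address(host).version == 6:
            host = f"[{host}]"
    except ValueError:
        pass  # not an IP literal, so treat it as a hostname
    return f"https://{host}:{port}"
```

For example, admin_console_url("apache31.example.com") yields the first URL in the list above.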

Once you've logged in, you'll see that Google Sitemap Generator has detected the websites on your web server. For each site, there's a site status page, a site configuration page, and a set of Sitemap types that you can configure.

The state of Google Sitemap Generator at first log-in is as follows:

  • Collection of URLs and creation of Sitemap files are enabled for Web Sitemaps but disabled for all other types of Sitemaps. Web Sitemaps are generated once per day.
  • Automatic submission of Sitemaps is enabled or disabled, depending on your setting during installation.



Enabling remote access

To use the Admin Console from your own computer, rather than from the web server, you'll need to enable remote access. You can use the Admin Console or a command.

To use the Admin Console:

  1. On the web server, log in to the Admin Console.
  2. Click Preferences.
  3. On the Preferences page, select "Allow remote access to the Administration Console."

To enable remote access from the command line, run one of the following commands:

  • Windows: SitemapService.exe remote_admin enable
  • Linux: /usr/local/google-sitemap-generator/bin/sitemap-daemon remote_admin enable
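If you script server administration, a small wrapper can pick the right executable for the platform. This is a hypothetical Python sketch, not a tool shipped with Google Sitemap Generator. It assumes SitemapService.exe is on the PATH (or in the working directory) on Windows and that the Linux tool is at the default installation path shown above. The subcommands used here (remote_admin enable, reset_password) are the ones this document names; see the Reference document for others.

```python
import subprocess
import sys

# Tool locations as given in this document; adjust if you installed elsewhere.
WINDOWS_TOOL = "SitemapService.exe"
LINUX_TOOL = "/usr/local/google-sitemap-generator/bin/sitemap-daemon"

def sitemap_command(*args: str, platform: str = sys.platform) -> list[str]:
    """Return the argument list for the platform-specific command-line tool."""
    # Any platform that is not Windows is treated as Linux here.
    tool = WINDOWS_TOOL if platform.startswith("win") else LINUX_TOOL
    return [tool, *args]

def run_sitemap_command(*args: str) -> int:
    """Run the tool with the given subcommand and return its exit code."""
    return subprocess.run(sitemap_command(*args), check=False).returncode
```

For example, run_sitemap_command("remote_admin", "enable") runs the same command as the list above for the current platform.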

Note: The command-line tool can perform some other basic functions for Google Sitemap Generator, but its functionality is limited. Refer to the Reference document for information about the command format and options.


Managing site activation

When you first log in, you'll see that Google Sitemap Generator has already detected the sites and listed them on the Admin Console Dashboard. Sites are activated by default. You can enable or disable use of Google Sitemap Generator for any site.

To disable or enable a site:

  1. From the Admin Console Dashboard, click Manage sites.
  2. Deselect any site for which you do not want to generate and submit Sitemaps, or select the site to enable it.
  3. Click Save.



Setting configuration values

When you configure Google Sitemap Generator, you set default values for all sites on a web server. You can then accept or override those values for each site. There are two types of configuration: site configuration and Sitemap-type configuration.

The following table is an overview of the configuration process.

Order | Type of configuration | How to view
1 | Default site values for all sites on the web server | Dashboard > Default site settings
2 | Default Sitemap-type values for all sites on the web server | Dashboard > Default site settings > Sitemaps
3 | Site-specific site values that override default values | Dashboard > a site name > Site configuration
4 | Site-specific values for specific types of Sitemaps | Dashboard > a site name > Sitemaps > a specific Sitemap type, such as Web, Mobile, and so on
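The layering in this table can be modeled as a lookup chain: a site-specific value, if set, wins; otherwise the default applies. The Python sketch below illustrates that idea with collections.ChainMap. The setting names and values are hypothetical and do not correspond to Google Sitemap Generator's actual configuration keys.

```python
from collections import ChainMap

def effective_settings(defaults: dict, overrides: dict) -> dict:
    """Site-specific values win; anything not overridden falls back to the default."""
    return dict(ChainMap(overrides, defaults))

# Rows 1 and 3: default site values and one site's overrides (hypothetical keys).
default_site = {"max_url_count": 50000, "sitemap_types": ["web"]}
site_a = {"max_url_count": 10000}

# Rows 2 and 4: default Web Sitemap values and the same site's overrides.
default_web = {"generation_interval_min": 1440, "file_name": "sitemap.xml"}
site_a_web = {"file_name": "sitemap_a.xml"}

print(effective_settings(default_site, site_a))
print(effective_settings(default_web, site_a_web))
```

Keeping the site chain and the Sitemap-type chain separate mirrors the two kinds of configuration described above.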

The next sections describe the steps for setting default values and site-specific values.

Setting default values

Your first step is to examine and configure the default values.

To specify default values for all sites:

  1. Set the default site configuration.
    1. From the Dashboard, click Default site settings. Under Default site settings, you set values that apply to all sites on this web server. You can override these values for specific sites.
    2. Under Resource limits, specify the default values for all sites. These default values can be changed on a per-site basis.
    3. Under URL collectors, specify the way that you want Google Sitemap Generator to find URLs for all sites.
    4. Under URL query fields, read the special privacy notice and then specify URL query fields that you want to include in Sitemaps. Because Google Sitemap Generator removes all query fields from URLs in order to protect user privacy, you must explicitly specify any query field that you want URLs to include.
    5. Under Sitemap types, select the types of Sitemaps that you want to allow for all websites on this web server.
    6. Click Save.
  2. Set the default configuration for specific Sitemap types.
    1. In the left navigation bar, click Sitemap Types.
    2. Click Web, Mobile, Code Search, or Blog Search. When you click a Sitemap type, a message at the top of the configuration page notifies you if the Sitemap type is disabled. You can enable the Sitemap type by responding to the message.
    3. Configure the default values for that type of Sitemap. For information about the options, refer to the Reference document.

      Note: For Web Sitemaps, check the Sitemap file submission section. If you chose the installation option to start with automatic submission disabled, the robots.txt setting is deselected and the ping URLs are disabled. Do not select the robots.txt setting or enter ping URLs until you are ready to allow automatic submission of Web Sitemaps.

    4. Click Save at the bottom of each page.
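The URL query fields setting described above works as an allowlist: every query field is removed unless you name it. This Python sketch, using only urllib.parse, imitates that rule so you can preview how a URL would look. It is an approximation, not the generator's own code, and it may re-encode field values (for example, spaces become +). The field names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_query_fields(url: str, allowed: set[str]) -> str:
    """Keep only explicitly allowed query fields; drop all others."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed
    ]
    # urlencode re-encodes values, so the output may differ cosmetically from the input.
    return urlunsplit(parts._replace(query=urlencode(kept)))

# "id" is a field you chose to allow; "sessionid" is dropped.
print(strip_query_fields("http://www.example.com/item?id=42&sessionid=abc", {"id"}))
# prints http://www.example.com/item?id=42
```

With an empty allowlist, the "?" disappears entirely, which matches the default of removing all query fields.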


Setting site-specific values

After you've examined and configured the default values, you can configure site-specific values. First you'll set the site configuration, and then you'll set the values for the Sitemap types you'll be using.

To specify customized values on a per-site basis:

  1. Set the site-specific configuration.
    1. From the Dashboard, click the name of a site. The Site status view appears.
    2. In the left navigation bar, click Site configuration.
    3. Override the default settings as needed to meet the specific needs of this site.
    4. Click Save.
  2. Set the site-specific configuration for specific Sitemap types.
    1. In the left navigation bar, click Sitemaps.
    2. Click the name of a specific type of Sitemap, such as Web, Mobile, Code Search, or Blog Search.
      • If the message "This Sitemap is disabled" appears at the top of the page, the Sitemap type is disabled in the default site configuration or in the site-specific site configuration. Click Enable to let Google Sitemap Generator collect this Sitemap type for this site.
      • If the message "Your Sitemap URL filter excludes all URLs from this Sitemap" appears at the top of the page, make sure that the filter has at least one inclusion rule and that the exclusion rules do not exclude every URL.
    3. Override the default settings as needed to meet the specific needs of this site. For information about the options, refer to the Reference document.
    4. Click Save.

You're now done with the configuration.



Applying changes

Google Sitemap Generator saves and applies your settings each time you click the Save button.

Changes apply only to Sitemaps that have not yet been submitted to search engines. For example, suppose you specify new filters that exclude additional types of URLs from a Sitemap file. The results are as follows:

  • When Google Sitemap Generator generates the next Sitemap file, it excludes those URLs from the local copy of the Sitemap file.
  • If the old Sitemap file has been submitted to search engines and if the search engines have crawled the Sitemap, the excluded URLs cannot be removed from search engines.
  • For Blog Search, if Google Sitemap Generator has pinged the URLs to Google, the excluded URLs cannot be removed from search engines.



Testing Web Sitemaps before going live

Test your configuration by viewing sample Sitemap files, and then modifying the configuration until each type of Sitemap file is exactly what you want it to be.

Warning: It is important to execute this testing procedure. You are responsible for verifying that you are not exposing sensitive content or user data to the search engine.

The following steps use Web Sitemaps to illustrate the testing procedure. You can adapt this method to other types of Sitemaps except for Blog Search.

Perform the following steps for one site at a time.

  1. From the Dashboard, click the first site, click Sitemap types, and then click Web.
  2. In the Sitemap file settings area, make sure that automatic submission of Sitemaps is disabled. If Include Sitemap URL in robots.txt is selected, deselect it. If any ping commands are enabled, disable them.
  3. Under Sitemap generation schedule, modify the schedule to generate Sitemaps frequently, such as every ten minutes.
  4. Under Sitemap file settings, specify a test file name for the Sitemap. Never use the production Sitemap file name while testing; a separate name helps ensure that search engine crawlers do not find the test file.
  5. Click Save.
  6. In the left navigation bar, click Site configuration.
  7. On the Site configuration page, under Sitemap types enable Web Sitemaps, and then click Save to save these settings.
  8. Let Sitemap generation run long enough to capture a sufficient number of URLs from your website's live traffic. Start with approximately 30 minutes and adjust the timing as needed.
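While the test runs, you can confirm that robots.txt does not advertise the test file. This Python sketch uses the standard library's urllib.robotparser (site_maps() requires Python 3.8 or later) to list the Sitemap: entries in robots.txt content that you have read into a string. The robots.txt content and the file name test_sitemap.xml are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

def sitemaps_in_robots(robots_txt: str) -> list[str]:
    """Return the Sitemap: URLs declared in robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none

# Hypothetical robots.txt content and test Sitemap URL, for illustration.
robots = """User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""
test_sitemap = "https://www.example.com/test_sitemap.xml"

if test_sitemap in sitemaps_in_robots(robots):
    print("Warning: robots.txt advertises the test Sitemap")
```

To check the real file, read it from the web server's document root (or download it) and pass its text to sitemaps_in_robots.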

When the test period is over, do the following:

  1. In the Admin Console, from the Dashboard, click the first site, click Sitemap types, and then click Web.
  2. Under Sitemap generation schedule, modify the schedule again to reduce the frequency of Sitemap generation and reduce load on your web server.
  3. Examine the generated Sitemap file, looking for URLs that contain the following types of information:
    • Administration URLs
    • Information about users who are logged in
    • Pages that you don't want to make public
    • Overly complex URLs
    • Query fields that you did not intend to allow
  4. In the Admin Console, in the default site configuration or the site-specific configuration, modify the Sitemap URL filter. Under Excluded URL patterns, specify patterns that exclude the unwanted URLs. Repeat the process until the Sitemap file contains no unwanted URLs.
  5. Run Google Sitemap Generator in test mode for a period of time to catch issues that occur infrequently.
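Step 3 is easier on a large file with a quick script. The Python sketch below assumes the generated file is uncompressed XML that uses the standard Sitemap protocol 0.9 namespace, and it flags URLs that match a few example patterns or exceed a length limit. The patterns, the 200-character limit, and the function name are illustrative assumptions; replace them with whatever counts as sensitive or overly complex on your site.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace of a standard Sitemap protocol 0.9 file.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical red-flag patterns: admin paths and session/user query fields.
SUSPICIOUS = [
    re.compile(r"/admin", re.IGNORECASE),
    re.compile(r"[?&](session|sid|token|user)=", re.IGNORECASE),
]

def flag_urls(root: ET.Element, max_length: int = 200) -> list[str]:
    """Return <loc> URLs that look sensitive or overly complex."""
    flagged = []
    for loc in root.iterfind("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if len(url) > max_length or any(p.search(url) for p in SUSPICIOUS):
            flagged.append(url)
    return flagged

# Usage with a test file on disk (hypothetical name):
# for url in flag_urls(ET.parse("test_sitemap.xml").getroot()):
#     print(url)
```

A flagged URL is only a candidate for a new exclusion pattern; review each one by hand, because no fixed list of patterns can catch everything you may consider private.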

When you are satisfied with the Sitemap file, do the following:

  1. Under Sitemap file settings, change the Sitemap file name to its production name.
  2. In the Sitemap file submission section, enable submission of Web Sitemaps by entering the search engine ping URLs. (Submit other types of Sitemaps by using the methods described on Google Webmaster Central.)
  3. Save all your changes.
  4. Monitor the results of your Sitemaps on the live site. For information on how to do this, see the Webmaster Help Center.


Changing the Administrator Password

To change your password:

  1. Click Preferences in the upper right corner of the page.
  2. On the Preferences page, click Change password.
  3. Enter your current password, and then enter and confirm the new password.
  4. Click Save.

If you forget your password and need to reset it, use one of the following commands:

  • Windows: SitemapService.exe reset_password
  • Linux: /usr/local/google-sitemap-generator/bin/sitemap-daemon reset_password


Troubleshooting

This section describes some issues that could arise and suggests what to investigate. Please share your findings with the community to help expand this section.

Issue: Sitemap files are generated but empty

Suggested investigations:
  • In the Sitemap URL filter section of the default and site-specific Site configuration pages, is there a rule that includes URLs? If the default value, the asterisk wildcard (*), is missing from Included URL patterns and no other patterns are present, no URLs are included in generated Sitemap files.
  • In the same section, are there exclusion rules that exclude all URLs?
  • Is there live traffic on the site? Could the web server be down, or could a network error have occurred?

Issue: Code Search Sitemap file is empty

Suggested investigations:
  • Do the file extensions specified in the URL filters match the extensions of the actual files?
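Both issues come down to how include and exclude patterns combine. The Python sketch below models the rule described above: a URL is kept only if it matches at least one included pattern and no excluded pattern. It uses shell-style wildcards from the standard fnmatch module as a stand-in; Google Sitemap Generator's actual pattern syntax may differ, so consult the Reference document. The function name and sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

def passes_filter(url_path: str, included: list[str], excluded: list[str]) -> bool:
    """Keep a URL only if some include pattern matches and no exclude pattern does."""
    is_included = any(fnmatchcase(url_path, p) for p in included)
    is_excluded = any(fnmatchcase(url_path, p) for p in excluded)
    return is_included and not is_excluded

# Without "*" (or another include pattern), nothing is kept: an empty Sitemap.
print(passes_filter("/index.html", [], []))
# An extension mismatch has the same effect for Code Search files.
print(passes_filter("/src/main.cc", ["*.cpp"], []))
```

Note that in fnmatch, "*" also matches "/", so "/admin/*" excludes everything under /admin/ at any depth.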