Google Sitemap Generator (Beta) is software that improves the ability of search engines to find the content of your websites. Once you install and configure Google Sitemap Generator on your web server, it analyzes the way that users access content, then builds Sitemap files that contain the URLs that you want search engines to find.
Google Sitemap Generator creates industry-standard Web Sitemaps and automatically submits them to the search engines of your choice. It can also create Sitemaps for Google's Mobile and Code Search services. In the case of Blog Search, Google Sitemap Generator provides analogous services, but rather than creating Sitemap files, it collects the URLs and periodically pings Google with them.
This document is an overview of Google Sitemap Generator. For more information about the Sitemap protocol, see http://www.sitemaps.org/. For information about Google-specific functions and Webmaster Tools, see http://google.com/support/webmasters/.
Google Sitemap Generator takes a fresh approach to Sitemap generation. The previous generation of Sitemap generators created Sitemaps by crawling websites, so they did not necessarily improve on the coverage provided by search engine crawlers. In contrast, Google Sitemap Generator monitors your web server traffic and detects updates to your site whenever a user accesses a new page. The main features of Google Sitemap Generator are:
Google previously released sitemapgen, a Python-based tool, on SourceForge. In comparison to sitemapgen, Google Sitemap Generator is a next-generation tool that relies on web server filtering rather than crawling, provides enhanced features, and supports more formats.
This document was written for web server administrators. The document assumes that you know about the contents of your site and about your web server environment, but it does not require advanced technical knowledge.
This document uses the following typographic conventions:
Monospace font indicates a command, or another type of literal value that you enter as-is.
At the time of this Beta release, we know that this document lacks some real-world information, which we hope to gain while the product is in Beta. We also hope that the community will provide comments, feedback, and content to increase documentation coverage over time.
Check our system requirements and determine how best to deploy Google Sitemap Generator in your environment. For full installation information, refer to the Installation document.
Google Sitemap Generator runs on a variety of operating systems and web servers. See the Installation document for the Windows and Linux prerequisites.
Based on our performance tests, Google Sitemap Generator has minimal performance impact on a web server.
For a site that is served by multiple, load-balanced web servers, you can choose from the following deployment options:
Once you install Google Sitemap Generator on the web server, it can serve all sites on the server.
In the administration console, you can enable or disable the use of Google Sitemap Generator for each site, and you can configure each site separately.
Google Sitemap Generator includes the following configuration options:
For more information about configuration, see the configuration document.
How are search engines informed about the Sitemaps that Google Sitemap Generator creates?
To submit Web Sitemaps, you have the following options:
To submit the Google-specific Mobile and Code Search Sitemaps, you use Google Webmaster Central.
To submit Blog Search content, you specify the frequency with which Google Sitemap Generator pings the Google Blog Service. Google Sitemap Generator does not create Sitemap files for Blog Search. Instead, it collects the URLs in its internal database and then sends them to Google.
Note: This document generally does not distinguish between Blog Search and other types of content, even though Blog Search, unlike the other types, has no physical Sitemap files associated with it.
Google Sitemap Generator is being released through the open source community, so that you can download and examine it. We already know that you don't want Google Sitemap Generator to send any private data to Google, and the only information that you'll be sharing with Google is the Sitemap file!
A Sitemap file optionally contains a change frequency and priority for each URL. Google Sitemap Generator sets the priority based on page views, but the resulting Sitemap file does not contain any information about page view statistics or other information you wouldn't want to expose. There's no difference between a generated Sitemap file and a manually created Sitemap file.
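To make the point about page views concrete, here is a minimal Python sketch of a Sitemap that carries a change frequency and priority for each URL but no view counts. The element names and namespace come from the Sitemap protocol at sitemaps.org. The function names and the linear scaling from page views to priority are assumptions made for illustration; they are not Google Sitemap Generator's actual formula or code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def priority_from_views(views, max_views):
    """Hypothetical scaling: map a page-view count onto the range 0.1 to 1.0."""
    if max_views <= 0:
        return 0.5  # the protocol's default priority
    return round(0.1 + 0.9 * (views / max_views), 1)


def build_sitemap(page_views, changefreq="weekly"):
    """Build Sitemap XML from a {url: view_count} mapping.

    Only the derived priority is written to the output; the raw
    view counts never appear in the resulting file.
    """
    max_views = max(page_views.values(), default=0)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, views in sorted(page_views.items()):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "changefreq").text = changefreq
        ET.SubElement(entry, "priority").text = str(
            priority_from_views(views, max_views)
        )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Example data only; the URLs and counts are made up.
    print(build_sitemap({
        "http://www.example.com/": 12345,
        "http://www.example.com/about": 0,
    }))
```

A reader of the output sees only relative priorities, which is also what a manually written Sitemap would contain.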
To prevent user data from being exposed, Google Sitemap Generator removes all URL query fields before adding the URL to a Sitemap. URL query fields are the name/value parameters that follow the question mark in a URL, and they often contain user information, such as name, password, or other private details. If you want to include specified query fields, you can list them in the administration console.
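The filtering described above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming an allow-list of query field names; the function name `strip_query_fields` and the `allowed_fields` parameter are hypothetical and do not reflect Google Sitemap Generator's internals.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_fields(url, allowed_fields=()):
    """Return the URL with every query field removed except those allowed.

    The fragment (the part after '#') is also dropped in this sketch,
    because it is never sent to the web server.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed_fields
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


if __name__ == "__main__":
    url = "http://www.example.com/view?user=alice&password=secret&id=42"
    print(strip_query_fields(url))                          # all fields removed
    print(strip_query_fields(url, allowed_fields=("id",)))  # keeps only "id"
```

Removing everything by default and keeping only named fields means a newly introduced parameter stays out of the Sitemap until someone explicitly allows it.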
You are responsible for ensuring that Web Sitemap files do not contain user data when you submit them to Google. To accomplish this, you can perform test generations of Web Sitemap files and examine them for any URLs that should not be included. For more information, refer to the testing section of the configuration document.
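One way to examine a test-generated Web Sitemap is to list every URL that still carries query fields and review those by hand. The Python sketch below assumes an uncompressed Sitemap that follows the sitemaps.org schema; the function name `urls_with_query_fields` is hypothetical, and the check is a starting point rather than a complete privacy audit.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the Sitemap protocol (sitemaps.org).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_with_query_fields(sitemap_xml):
    """Return the <loc> URLs in a Sitemap that contain a query string."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        if urlsplit(url).query:
            flagged.append(url)
    return flagged


if __name__ == "__main__":
    # Made-up sample; in practice, read the generated Sitemap file instead.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://www.example.com/</loc></url>"
        "<url><loc>http://www.example.com/cart?session=abc</loc></url>"
        "</urlset>"
    )
    for url in urls_with_query_fields(sample):
        print("review:", url)
```

URLs without query fields can still expose private paths, so a manual read-through of the file remains worthwhile.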
If you have questions or want to know what others are asking, check out our discussion group.