Wednesday, February 27, 2008 at 6:00 PM
Last spring, the Sitemaps protocol was expanded to include the autodiscovery of Sitemaps using robots.txt to let us and other search engines supporting the protocol know about your Sitemaps. We subsequently also announced support for Sitemap cross-submissions using Google Webmaster Tools, making it possible to submit Sitemaps for multiple hosts on a single dedicated host. So it was only a matter of time before we took the next logical step of marrying the two and allowing Sitemap cross-submissions using robots.txt. And today we're doing just that.
We're making it easier for webmasters to place Sitemaps for multiple hosts on a single host and then let us know by including the location of these Sitemaps in the appropriate robots.txt.
How would this work? Say, for example, you want to submit a Sitemap for each of the two hosts you own, www.example.com and host2.google.com. For simplicity's sake, you may want to host the Sitemaps on one of the hosts, www.example.com. If you have a Content Management System (CMS), for instance, it might be easier for you to change your robots.txt files than to change content in a directory.
You can now exercise the cross-submission support via robots.txt (by letting us know the location of the Sitemaps):
a) The robots.txt for www.example.com would include:
Sitemap: http://www.example.com/sitemap-www-example.xml
b) And similarly, the robots.txt for host2.google.com would include:
Sitemap: http://www.example.com/sitemap-host2-google.xml
By indicating in each individual host's robots.txt file where that host's Sitemap lives, you are in essence proving that you own the host for which you are specifying the Sitemap. And by choosing to host all of the Sitemaps on a single host, it becomes simpler to manage your Sitemaps.
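To make the idea concrete, here is a minimal Python sketch of how a crawler might read the Sitemap lines from each host's robots.txt and spot the ones that point to another host. This is only an illustration under our own assumptions, not a description of how any search engine actually implements it: the function names `sitemap_urls` and `cross_submissions` are hypothetical, and the robots.txt contents are the example files from this post, supplied as strings rather than fetched over the network.

```python
from urllib.parse import urlsplit


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        # Field names in robots.txt are matched case-insensitively.
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def cross_submissions(robots_by_host: dict[str, str]) -> dict[str, list[str]]:
    """Map each host to the Sitemaps its robots.txt places on a different host."""
    return {
        host: [u for u in sitemap_urls(body) if urlsplit(u).hostname != host]
        for host, body in robots_by_host.items()
    }


# Hypothetical robots.txt bodies, mirroring examples (a) and (b) above.
robots_by_host = {
    "www.example.com": (
        "User-agent: *\n"
        "Disallow:\n"
        "Sitemap: http://www.example.com/sitemap-www-example.xml\n"
    ),
    "host2.google.com": (
        "Sitemap: http://www.example.com/sitemap-host2-google.xml\n"
    ),
}

for host, urls in cross_submissions(robots_by_host).items():
    print(host, "->", urls)
```

In this sketch, the Sitemap for host2.google.com is reported as a cross-submission because it lives on www.example.com, and it is the robots.txt served by host2.google.com itself that vouches for it.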
We are making this announcement today on Sitemaps.org as a joint effort. To see what our colleagues have to say, you can also check out the blog posts published by Yahoo! and Microsoft.


11 comments:
While it sounds completely silly, are there any adverse effects if you accidentally point your sitemap to the wrong file where your CMS lives?
I'm thinking of the case where you've got possibly hundreds of domains going through one CMS interface; it makes for a lot of sitemap files.
I think this is a great step. There are many companies whose very large sites are typically dynamically generated, and it's very difficult to add content (even a robots.txt file) to the site.
With this added functionality, SEOs, as well as others who manage client websites, will be able to help their clients even more with sitemaps.
While this is going to be very helpful, I don't see it giving any actual search engine ranking benefits. There may be some residual ranking benefits only because more of a site's pages can get indexed, but not specifically as a help to getting better search engine rankings.
STUPID - as lame as requiring absolute URLs in the first place.
Why not just allow RELATIVE URIs and take the hostname from the URL that was used to fetch the "/robots.txt" resource?
Sitemap: /sitemap.xml
... is so much NICER looking. Each virtual host on the same physical machine will then have its own sitemap in its document root directory (if not mapped elsewhere by the web server configuration files).
Just a suggestion. On the crawl stats page, it would be nice to show the actual page rank for the top 30 or so pages on the site. Instead of just showing low, medium, or high, quantify it with an actual number. If the Google Toolbar can do it, I don't see why you can't put it on the website. :)
Brilliant. Funnily enough, I was just looking at the sitemaps to be verified the other day and thinking that I would love to be able to conglomerate them in such a manner.
I am the webmaster of
http://exposureroom.com
Essentially, I redirect www.exposureroom.com to exposureroom.com
My question is: what should my sitemap entry in the robots.txt file be?
http://www.exposureroom.com/siteindex.aspx
or
http://exposureroom.com/siteindex.aspx
Thanks.
Shiv.
Shiv, this question would be great to post in our Webmaster Help Group.
Why does Google not support the web.sitemap file? I tried to add a web.sitemap file today to a website in Google, but I was met by errors!
As far as I know, it is a valid XML file....
Susan,
I don't remember seeing an email notification when you replied and so I didn't get back to this earlier.
So how does one go about getting the answer to this in the help?
I cannot get Google Webmaster Tools to accept my sitemap. It keeps saying "URL restricted by robots.txt" whether or not I have a robots.txt file. Why?
I'm very used to submitting a sitemap and it's usually as easy as one, two, three:
add a domain, verify, and submit the sitemap.
I am using WordPress and the plugin for Google XML Sitemap Generator.
Hi everyone,
Since some time has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Help Forum.
Thanks and take care,
The Webmaster Central Team