Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

What's new with Sitemaps.org?

Wednesday, April 11, 2007 at 5:00 AM

What has the Sitemaps team been up to since we announced sitemaps.org? We've been busy trying to get Sitemaps adopted by everyone and to make the submission process as easy and automated as possible. To that end, we have three new announcements to share with you.

First, we're making the sitemaps.org site available in 18 languages! We know that our users are located all around the world and we want to make it easy for you to learn about Sitemaps, no matter what language you speak. Here is a link to the Sitemap protocol in Japanese and the FAQ in German.

Second, it's now easier for you to tell us where your Sitemaps live. We wondered if we could make it so easy that you wouldn't even have to tell us and every other search engine that supports Sitemaps. But how? Well, every website can have a robots.txt file in a standard location, so we decided to let you tell us about your Sitemap in the robots.txt file. All you have to do is add a line like

Sitemap: http://www.mysite.com/sitemap.xml

to your robots.txt file. Just make sure you include the full URL, including the http://. That's it. Of course, we still think it's useful to submit your Sitemap through Webmaster tools so you can make sure that the Sitemap was processed without any issues and you can get additional statistics about your site.
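A Sitemap: line like the one above can also be read back by a program. Below is a minimal sketch, assuming Python 3.9 or later. It uses the standard library's urllib.robotparser, whose site_maps() method (added in Python 3.8) collects Sitemap: lines. is_full_url is a hypothetical helper that checks for the absolute http:// URL the post asks for, and the robots.txt contents are placeholders based on the mysite.com example.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt contents (mysite.com is only an example domain).
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://www.mysite.com/sitemap.xml
"""


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return every URL listed on a Sitemap: line, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap: lines.
    return parser.site_maps() or []


def is_full_url(url: str) -> bool:
    """Hypothetical check: True only for absolute http(s) URLs with a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for url in sitemap_urls(ROBOTS_TXT):
        status = "full URL" if is_full_url(url) else "NOT a full URL"
        print(f"{url}: {status}")
```

A relative entry such as "Sitemap: /sitemap.xml" would still be collected by the parser, but is_full_url would flag it.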

Last but not least, Ask.com is now also supporting the Sitemap protocol. And with the ability to discover your Sitemaps from your robots.txt file, Ask.com and any other search engine that supports this change to robots.txt will be able to find your Sitemap file.

27 comments:

rmonkeygirl said...

Great idea! I don't speak Japanese, but the robots.txt and Ask news are very useful. My only question was whether Google would still report stats in webmaster central for autodiscovery, but the post seems rather clear that we should still submit manually to get those. Thanks again!

Sebastian said...

Vanessa, do the united engines support multiple sitemap locations in robots.txt? Lots of folks maintaining a handful of sitemaps don't mess with sitemap indexes.
Thanks!

Gareth said...

That's all very good.

I still wonder why Blogger can't auto-generate a REAL Sitemap.xml file, rather than have us resort to hacking around with incomplete Feed URLs.

It seems so very obvious and trivial for Blogger to produce such a file, when it knows about all our posts on the blog.

sartin said...

Awesome! Can we still use a .txt sitemap file with the robots.txt method for sitemaps? I ask because the example is .xml and I see no mention of text files.

Hong Xiaowan's Studio said...

Gareth:
All the content is already saved at Blogger, so why save it again?

Vincent said...

Why can't we just add a <link> tag?

E.g.:

<link rel="index" href="sitemap.xml" title="Sitemap" charset="UTF-8" type="application/xml" />

That's what the <link> tag is meant to do, and the type="application/xml" can be used to tell it's a sitemap in the Sitemaps.org format.

VietnamGear said...

Specifying the location of the sitemap within the robots.txt file gives the following parsing result: "Syntax not understood."

Is this just temporary?

Thanks.

hermanwest said...

The protocol page in Traditional Chinese is not fully translated!

Everything from "Informing search engine crawlers" to the end of the page is still untranslated.

Ted R. said...

We run a couple of different domains off a single webroot, which share the same robots.txt file. Is it okay to have multiple Sitemap: entries in a robots.txt file? Or what would you recommend?

MJE Sales, LLC said...

This is a great thing! This will make life a lot easier for folks like me who run huge sites with thousands of pages.

Web Conference Room Guy said...

Since reading an article about the importance of sitemaps, I have created sitemaps for each of my web sites. I think it's great that Google provides us with the information we need to create these important sitemaps. Thanks Google! Thank you Vanessa for keeping us informed.

GooglleFan1 said...

"Specifying the location of the sitemap within the robots.txt file gives the following parsing result -Syntax not understood. Is this just temporary?"

I have the same question.

paolo.groppo said...

Vanessa, it's a simple and great idea, especially because robots.txt is already a standard, and every webmaster needs a single way to submit a sitemap to multiple engines.
I tried to include this directive:
Sitemap: http://www.mysite.com
in my websites' robots.txt files, but the "robots.txt analysis" section of Webmaster Tools now says that my robots.txt has bad syntax.
Does Webmaster Tools still not accept the Sitemap directive?

Stephen Newton said...

Good to see more use being made of Robots.txt. How about incorporating other useful hints for search engines such as a geographic location to ensure sites are listed in the correct country search. (This could not be abused as there's only one robots.txt file per site and only one location would be accepted.)

I'm in the UK, but my sites are hosted in the US because it's cheaper.

MenScience said...

This is awesome! I've been using Google Webmaster Tools for quite a while now and I love it. However, I still can't figure out why Google has only 6 pages indexed from my site (http://www.menscience.com) when I have over 600 pages of really good content. Can someone please explain this to me and give some suggestions? Thanks!

Dimpie said...

I like that sitemaps.org is now available in Dutch, however, I noticed a wrong translation at http://www.sitemaps.org/nl/faq.html#faq_sitemap_location

Hope someone can correct that.

easy HTools said...

Just wondering: what is the reason for forcing webmasters to provide FULL paths (starting with http://) to sitemap files?! Why shouldn't the "Sitemap:" entry format match the "Disallow:" entry format, which uses RELATIVE paths?! You can't serve a sitemap file from one server for another server anyway, so what is the point?
Also, why is there no way to specify multiple sitemaps on one line, like multiple "Disallow" paths separated by spaces?!

Jérémie said...

Someone at www.sitemaps.org renamed the /0.9/ directory to /09/, so feeds can no longer be validated against http://www.sitemaps.org/schemas/sitemap/0.9/. Can someone fix it?
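For reference, a Sitemap document declares the protocol's 0.9 namespace on its root urlset element. Below is a minimal sketch, assuming Python 3.8 or later and only the standard xml.etree.ElementTree module. build_urlset and the example page URLs are hypothetical, and the only content it writes per page is the required loc element.

```python
import xml.etree.ElementTree as ET

# Sitemap protocol 0.9 namespace, serialized as the default xmlns.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)


def build_urlset(locs):
    """Hypothetical helper: build a <urlset> Sitemap (UTF-8 bytes) for page URLs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in locs:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    # xml_declaration=True prepends <?xml version='1.0' encoding='UTF-8'?>.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    doc = build_urlset(["http://www.mysite.com/", "http://www.mysite.com/about.html"])
    print(doc.decode("utf-8"))
```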

Ravikant said...

Good idea, but other features that are available in Google Sitemaps, like forbidden pages and crawl dates, are completely missing, whereas Google Sitemaps gives us a clear idea of them.

Thanks

turifungia said...

Can we also use sitemap.gz?
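The sitemaps.org protocol allows Sitemap files to be compressed with gzip. Below is a minimal sketch of producing and reading back such a file, assuming Python 3 and only the standard gzip module. The helper names and the file name sitemap.xml.gz are hypothetical. Whether a given search engine accepts a .gz URL on the robots.txt Sitemap: line is best checked against that engine's own documentation.

```python
import gzip
import os
import tempfile


def write_gzipped_sitemap(xml_text: str, path: str) -> None:
    """Hypothetical helper: write xml_text to path as a gzip-compressed file."""
    # "wt" opens the gzip stream in text mode so a str can be written directly.
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write(xml_text)


def read_gzipped_sitemap(path: str) -> str:
    """Hypothetical helper: decompress and return the Sitemap text."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return fh.read()


if __name__ == "__main__":
    xml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>\n')
    path = os.path.join(tempfile.mkdtemp(), "sitemap.xml.gz")
    write_gzipped_sitemap(xml, path)
    print("wrote", path, "round-trip ok:", read_gzipped_sitemap(path) == xml)
```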

Picture of beauty girl 9x said...

thanks for sharing !

Joseph said...

Everyone is an expert. Here's what my developer says... please comment, I am totally confused!
I don’t see how the Robots.txt file would be important.

The Robots.txt file is only important for blocking the engines from certain areas. It is not used to help engines find information. The current PCI compliance specification prohibits use of robots.txt files because they can be used to try and find areas of a site to exploit. So even if you wanted to use one you’re prohibited from doing so by ScanAlert/SecureMetrix…

An XML sitemap can be important (sitemaps.org) only if your content is having trouble being indexed. If your .asp sitemap has all of your links, then an XML sitemap is a bit repetitive.

Your site is NOT having issues being indexed. Yours is a content SEO issue (top 200 words, H1, H2, duplicative content…).

rofaldez said...

With the new robots.txt method it's very simple. Just put this in your robots.txt:
User-agent: *
Disallow:
Sitemap: http://www.yoursite.com/sitemap.xml
Check my blog:
http://www.rofaldez.com/update-robottxt.htm

Fritz said...

Our site (trenchmice.com) grows new pages when our users create them. Similar, in a sense, to a wiki.

We dutifully update the sitemap when new pages get created. What we don't understand is: should we also re-ping Google to tell it to re-read our sitemap? Or does Google have an algorithm for deciding when to re-read a site's sitemap?

If it matters, our sitemap is a sitemap index. (Sitemap.xml references a large number of secondary sitemap files.)
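A Sitemap index like the one Fritz describes is a sitemapindex document whose sitemap entries each point at a child Sitemap file. The sitemaps.org protocol uses an optional lastmod element in each entry to record when that child file was modified. Below is a minimal sketch, assuming Python 3.8 or later and only xml.etree.ElementTree. build_sitemap_index and the child file URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sitemap index files use the same 0.9 namespace as ordinary Sitemaps.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)


def build_sitemap_index(entries):
    """Hypothetical helper. entries: iterable of (loc, lastmod) pairs,
    where lastmod is a W3C Datetime string such as "2007-04-11", or None."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in entries:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            # lastmod records when this child Sitemap file was last modified.
            ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    doc = build_sitemap_index([
        ("http://www.mysite.com/sitemap1.xml", "2007-04-11"),
        ("http://www.mysite.com/sitemap2.xml", None),
    ])
    print(doc.decode("utf-8"))
```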

Thanks!

I'm a student of General Linguistics, Psychology and Computer Science, considering myself to be a cognitive scientist. said...

Gareth is so right:
why can't Blogger auto-generate a REAL Sitemap.xml file, rather than have us resort to hacking around with incomplete Feed URLs?

I don't get it.

yasirwazir said...

Thank you so much google, for making things easier for us.
Now I don't have to worry about sitemaps any more.

Google Webmaster Central said...

Hi everyone,

Since more than a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Help Group.

Thanks and take care,
The Webmaster Central Team