Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Using stats from site: and Sitemap details

Wednesday, March 04, 2009 at 4:40 PM

Webmaster Level: Beginner to Intermediate

Every now and then in the webmaster blogosphere and forums, this issue comes up: when a webmaster performs a [site:example.com] query on their website, the number of indexed results differs from what is displayed in their Sitemaps report in Webmaster Tools. Such a discrepancy may smell like a bug, but it's actually by design. Your Sitemap report only reflects the URLs you've submitted in your Sitemap file. The site operator, on the other hand, takes into account whatever Google has crawled, which may include URLs not included in your Sitemap, such as newly added URLs or other URLs discovered via links.
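To make the distinction concrete, here is a minimal Python sketch. It assumes a Sitemap in the standard sitemaps.org XML format. The set `indexed_urls` is purely hypothetical (invented for illustration; it is not something Webmaster Tools exports), as are the example URLs. The sketch only shows that the two sets can overlap partially, which is why the two counts need not match.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a Sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


# Hypothetical Sitemap with three submitted URLs.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about.html</loc></url>
  <url><loc>http://example.com/contact.html</loc></url>
</urlset>"""

# Hypothetical set of URLs in the index; one was discovered via links
# and never appeared in the Sitemap.
indexed_urls = {
    "http://example.com/",
    "http://example.com/about.html",
    "http://example.com/blog/new-post.html",
}

submitted = sitemap_urls(EXAMPLE_SITEMAP)

submitted_and_indexed = submitted & indexed_urls   # what a Sitemap report counts as indexed
submitted_not_indexed = submitted - indexed_urls   # submitted but not (yet) indexed
indexed_not_submitted = indexed_urls - submitted   # found by crawling, not in the Sitemap

print(len(submitted), "submitted,", len(submitted_and_indexed), "of them indexed")
print(len(indexed_urls), "indexed in total")
```

In this made-up case the Sitemap view reports 2 of 3 submitted URLs as indexed, while a count of everything indexed gives 3, because one indexed URL was never in the Sitemap.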

Think of the site operator as a quick diagnosis of the general health of your site in Google's index. Site operator results can show you:
  • a rough estimate of how many pages have been indexed
  • one indication of whether your site has been hacked
  • whether you have duplicate titles or snippets
Here is an example query using the site operator:



Your Sitemap report provides more granular statistics about the URLs you submitted, such as the number of indexed URLs vs. the number submitted for crawling, and Sitemap-specific warnings or errors that may have occurred when Google tried to access your URLs.

Sitemap report

Feel free to check out our Help Center for more on the site: operator and Sitemaps. If you have further questions or issues, please post to our Webmaster Help Forum, where experienced webmasters and Googlers are happy to help.

Posted by Charlene Perez
The comments you read here belong only to the person who posted them. We do, however, reserve the right to remove off-topic comments.

28 comments:

Mikes said...

Thanks for this info. I have been wondering as well and you have surely enlightened me. It pays to subscribe to your RSS. Thanks!

Roger said...

Why is the # of URLs in the sitemap so different from the # of URLs indexed in some cases? As long as the URLs in the sitemap are valid and have unique content, shouldn't they all be indexed?

Thanks!

Matt said...

I have also noticed that the number of pages indexed does not reflect the number of pages shown when using the site: modifier in a search.

I often tell people that pages show up when they are relevant to what google believes is the topic of the website. Even so, I have seen some pretty blank pages and pages that have no relevance at all showing up on the odd occasion.

Rather than summarising why the results differ, I think it would be much more useful to educate those who perform SEO on a daily basis so that they have something to report back to their customers who I can understand are probably getting very angry with the lack of knowledge and information.

*Edited typo*

Ryan said...

I have a similar issue with the link operator (link:). The operator returns far fewer results than webmaster tools - why would they be any different?

Manjari said...

Hi

Is there any way to bring a delisted site back on Google?

Admin said...

Good news. How about robots.txt?

birdy said...

Great information, thanks for sharing.

Paez said...

Hi. How do I remove all of my websites that are in Google?

S Norris said...

Thanks for the information. I am new to all of this and am trying to learn as much as possible.

Hank said...

How about showing pages that are indexed but not in the sitemap? Or even pages that are found but are not in the sitemap? This will help us identify spam.

We'd probably also need a way in Webmaster Tools to indicate that those pages are legitimate. Sometimes you may only put part of a site in a sitemap, or you may have user-contributed info that you may not put into a sitemap.

Murilo de Souza Lopes said...

My site has had a problem for a few months that I cannot solve. The site is OK and follows the Google webmaster guidelines.

I'm even dreaming about it, haha. It's complicated. I hope that one day Google will review my site.

link my site: santaisabelonline.com.br

nizam said...

Where can I see my summary? I already searched around and found nothing. My website is simplefuelwater.com

Samuel Jackson said...

How do i remove a link set by a spam site from my Google webmaster tools "Not Found" diagnosis. I have a site at 1mainstreet(dot)com. And some spam site linked to my site in a kind of weird way. Just notice the two posts under 25th December date on pages http://tinyurl.com/bjlsrx and http://tinyurl.com/bylc4p.
Now the diagnosis via webmaster tools says that this page is not found. I cannot do anything about other site that links that way. I like it when the tools stats show 0 in diagnosis tab. So i would like this to be removed. But simply cannot find the way to go about it. Any help would be appreciated

SEObetty said...

All righty. This is useful from a couple different angles:

1. It sets our staff's mind(s?) a little more at ease about Google's actions and intentions (yeah, we *know* you guys would never do anything evil, but it's always nice to get more evidence of that).

2. It gives us an actual explanation to provide to clients who notice the discrepancy (stuttering and saying, "uhhh ... I'll get back to you on that" never seems to impress them).

Duarte said...

Hi.
Sometimes, using the site: operator, I get a high number of URLs indexed, like 1,200,000, which makes sense. But most of the time I get a low number, 625,000. Does this have to do with the multiple data centers not being updated?

Thanks!

TriNi said...

I'm still trying to figure out how to submit a sitemap for a blogger blog.. :/ but I'll figure it out soon I'm sure :) Thanks for this post tho.. I learned why it's important.

Jeremy D. Impson said...

I can't get my livejournal sitemap to work. It reports "URL not allowed", because all my posts are immediately under the root of the site (e.g. foo.com/a.html) while the RSS/Atom feeds are under a subdir (e.g. foo.com/data/rss).

Any way to remedy this, knowing that I have no control over how livejournal works?

LaMirabelle said...

Sitemaps? I don't know what this webmaster's tool is about. It doesn't work at all. I submitted a sitemap. Result?
I have 0 visits from Google search. That's a zero!
For this blog : http://www.leblogdelamirabelle.net
I have a Google rank of 3 and lots of quality backlinks.
When I type "blog de la mirabelle" my blog doesn't show up anymore!!
But according to this tool everything's fine.

Of course Google ignores all my mails. It's time Google gets competition since it's not working and doesn't respect users enough to deal with the problem.

Michael said...

I am staying busy now upgrading business broker web sites with old, corrupt and/or non-standard code, and optimizing as I go. Some are riddled with page errors and MS styles.

I have heard that too much improvement in a site's search friendliness, TOO QUICKLY, can raise a false flag.

Some of the sites I have worked on are total wrecks (one was not displaying in Firefox and IE8), with no text tags, keywords, descriptions, robots tags and on and on - so the legitimate changes I make are drastic, compared to the state they were in.

What can I do to keep from triggering some kind of sanction?

Mikes@Your Daily Word said...

I'm the first commenter of this post and i just want to tell you that i have submitted sitemaps and i see a lot of results when i do search now through my site. yipee!

Your Daily Word

Steve Crooks said...

That's great, as it shows all the PDFs as well as the HTML pages.

Crooks Design – East Anglia Graphic Design

freemusiclisteningondemand said...

I have submitted my sitemap for my website on March 31. On April 13 my sitemap was crawled with 47 URLs and for a SECOND time my sitemap was crawled today on April 19 with 56 URLs and NONE of my pages are indexed. Am I doing something wrong? Anyone that knows the answer to this, please reply to this feed.

Digital Warehouse said...

This post makes 0 sense, implying that the sitemap is a "guide" and the site operator tells you what they crawled and have in their index.

If that is the case, what is the reason for the INDEXED field in sitemaps? Why give us a number, specifically saying the * number of urls * indexed out of this many crawled, based on the sitemap I've submitted.

If I have 50,000 links submitted, and I see 15,000 indexed.. then I do a site: and get 5,000 what the hec??

It just doesn't make any damn sense Google! I'm simply trying to find a reliable way to calculate my site's inclusion ratio. Why do you have to beat around the bush. The Webmaster Tools was a big improvement.. (over.. nothing which existed before?) but you still are not addressing our needs.

Karen said...

I need help. I'm not sure if this is the right place to post this, but I have submitted a sitemap in Google Webmaster Tools and the number of indexed pages is 344 lower than before it was submitted. Is this due to the newly submitted sitemap being crawled from 0 again, so it's only partway through it up to now?

who said...

When I use the site: operator to see how many pages are indexed from my website, I get a huge difference between google.co.uk and google.com.mx. For example, in the UK the number is 300,000, and in MX the number is 38,000. Why?

Bettina said...

Hi,

I have uploaded an XML sitemap for a site, and Webmaster Tools says it has been indexed.

However, when I search for it with site:newSite.com, it does not match any documents.

Can you help please
Thanks
B

Guru Sukwan said...

it's a good info, thanks for this info.

Google Webmaster Central said...

Hi everyone,

Since over a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Central Help Forum.

Thanks and take care,
The Webmaster Central Team