Tuesday, July 24, 2012

Behold Google index secrets, revealed!

Webmaster level: All

Since Googlebot was born, webmasters around the world have been asking one question: Google, oh, Google, are my pages in the index? Now is the time to answer that question using the new Index Status feature in Webmaster Tools. Whether one or one million, Index Status will show you how many pages from your site have been included in Google’s index.

Index Status is under the Health menu. After clicking on it, you’ll see a graph like the following:

[Figure: Index Status graph of pages indexed over time]

It shows how many pages are currently indexed. The legend shows the latest count and the graph shows up to one year of data.

If you see a steadily increasing number of indexed pages, congratulations! This should be enough to confirm that new content on your site is being discovered, crawled and indexed by Google.

However, some of you may find issues that require looking a little bit deeper. That’s why we added an Advanced tab to the feature. You can access it by clicking on the button at the top, and it will look like this:

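[Figure: Index Status Advanced tab, with graphs for Total indexed, Ever crawled, Not selected and Blocked by robots]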

The advanced section shows not only the total number of indexed pages, but also the cumulative number of pages crawled, the number of pages we know about that are not crawled because robots.txt blocks them, and the number of pages that were not selected for inclusion in our results.
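
To check in advance which of your own URLs would count as blocked, you can test them against your robots.txt. The sketch below uses Python's standard urllib.robotparser module; the robots.txt contents, the example.com URLs and the blocked_urls helper are all hypothetical. Python's parser implements the basic robots.txt rules and may not interpret every extension Googlebot supports (wildcard patterns, for instance), so treat its answer as an approximation of Googlebot's behavior rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; substitute your own site's file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /private/
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that robots_txt disallows for the given user agent."""
    parser = RobotFileParser()
    # Feed the rules in directly instead of fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    candidates = [
        "https://www.example.com/",
        "https://www.example.com/search?q=index",
        "https://www.example.com/private/draft.html",
        "https://www.example.com/2012/07/post.html",
    ]
    for url in blocked_urls(ROBOTS_TXT, candidates):
        print("blocked:", url)
```

With these sample rules, only the /search and /private/ URLs are reported as blocked.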

Notice that the counts are always totals. So, for example, if on June 17th the count for indexed pages is 92, that means that there are a total of 92 pages indexed at this point in time, not that 92 pages were added to the index on that day only. Particularly for sites with a long history, the count of pages crawled may be very large compared with the number of pages indexed.
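
Because every point on the graph is a running total, the change from one reading to the next has to be computed as the difference between consecutive totals. A minimal sketch, assuming you have copied the totals from the graph into a list by hand; apart from the 92 on June 17th from the example above, the readings and the 30% drop threshold are made up for illustration, and net_changes and sudden_drops are hypothetical helpers.

```python
from datetime import date

# Hypothetical readings copied from the Index Status graph.
# Each value is the TOTAL number of indexed pages on that date, not a daily increment.
totals = [
    (date(2012, 6, 10), 80),
    (date(2012, 6, 17), 92),
    (date(2012, 6, 24), 95),
    (date(2012, 7, 1), 41),
]


def net_changes(series):
    """Turn cumulative totals into the change between consecutive readings."""
    return [
        (day, later - earlier)
        for (_, earlier), (day, later) in zip(series, series[1:])
    ]


def sudden_drops(series, threshold=0.3):
    """Return the dates where the total fell by more than `threshold` (a fraction)."""
    return [
        day
        for (_, earlier), (day, later) in zip(series, series[1:])
        if earlier > 0 and (earlier - later) / earlier > threshold
    ]


if __name__ == "__main__":
    for day, delta in net_changes(totals):
        print(day, f"{delta:+d}")
    print("possible sudden drops:", sudden_drops(totals))
```

Here the total grows by 12 and then 3 pages, and the fall from 95 to 41 on July 1st is flagged as a sudden drop worth investigating.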

All this data can be used to identify and debug a variety of indexing-related problems. For example, if some of your content no longer appears on Google and you notice a sudden drop in the graph of pages indexed, that may indicate that you introduced a site-wide error when using the robots meta tag with noindex (<meta name="robots" content="noindex">), and now Google isn’t including your content in search results.
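
A site-wide noindex often comes from a shared page template, so checking a sample of pages for the robots meta tag can confirm or rule it out. The sketch below uses Python's standard html.parser module; the RobotsMetaFinder class, the has_noindex helper and the sample pages are hypothetical, and it only inspects robots and googlebot meta tags, not a noindex sent in an X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page's robots meta tags include a noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


if __name__ == "__main__":
    # Hypothetical pages; in practice you would fetch a sample of your own URLs.
    pages = {
        "/": '<html><head><meta name="robots" content="noindex, follow"></head></html>',
        "/about/": "<html><head><title>About</title></head></html>",
    }
    for path, html in pages.items():
        if has_noindex(html):
            print("noindex found on", path)
```

If every sampled page reports noindex, the template is the likely culprit.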

Another example: if you change the URL structure of your site and don’t follow our recommendations for moving your site, you may see a jump in the “Not selected” count. Fixing the redirects or rel="canonical" tags should help improve indexing coverage.
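
When the URL structure changes, each old URL should answer with a permanent (301) redirect to its new address. A minimal sketch of that idea as a Python WSGI application; the REDIRECTS table and every URL in it are hypothetical, and on a real site the same mapping would more often live in the web server's own redirect configuration.

```python
from wsgiref.simple_server import make_server

# Hypothetical mapping from the old URL structure to the new one.
REDIRECTS = {
    "/index.php?page=about": "/about/",
    "/index.php?page=contact": "/contact/",
}


def application(environ, start_response):
    """WSGI app that permanently (301) redirects old URLs to their new location."""
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    old_url = f"{path}?{query}" if query else path
    new_url = REDIRECTS.get(old_url)
    if new_url is not None:
        start_response("301 Moved Permanently", [("Location", new_url)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found\n"]


if __name__ == "__main__":
    # Serve locally on port 8000 for manual testing.
    with make_server("localhost", 8000, application) as server:
        server.serve_forever()
```

Where a redirect isn't practical, a page can instead name its preferred URL with a <link rel="canonical" href="https://www.example.com/about/"> element in its head (the URL here is again only an example).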

We hope that Index Status will bring more transparency into Google’s index selection process and help you identify and fix indexing problems with your sites. And if you have questions, don’t hesitate to ask in our Help Forum.

Posted by , and, Webmaster Tools Team

73 comments:

  1. Ok, this is amazing, but where is the export function??

  2. Please give us a way to export the data. Thanks :)

  3. Now provide us the same tool for the unindexed pages, to help webmasters get ready for potential problems.

  4. Nice for dynamically generated pages from webmasters...

  5. This was a much-awaited feature. Thanks for getting it done! But I suggest that this feature should have a data export option.

  6. Amazing feature! I hope the API data will be refreshed correctly every night, unlike the crawl errors data via the API. It's important!

  7. The colors are not very well chosen... green should be "indexed" = good, and red should be for BAD (not indexed, blocked...).

  8. This is what I was looking for... thanks, Blogger team!

  9. This option is very helpful for webmasters. If Google Webmaster Tools added one more feature to export data, webmasters could manage their websites very well.

    I hope Google adds this feature in the near future!!

  10. This comment has been removed by the author.

  11. This is all great but is there anywhere where you can see which ones are indexed and which are not?

  12. A massive leap forward in transparent information from Google, Thanks a lot

  13. Nice one big G.

    Would be great to compare against another managed url for Migrations. As others are saying here, would be great to highlight some unindexed pages, and to be able to export this.

  14. It seems that "crawled" and "not indexed" include URLs with parameters (sort=asc, affId=, for example). Only that would explain those high numbers. Any clue?

  15. I think it will be very helpful for webmasters to determine the number of indexed pages as well as the pages that are out of the index.

  16. How can a rel="canonical" tag help with redirects?

  17. How can a rel="canonical" tag help in setting up the redirects?

  18. Great!
    When will it be available via the GData API?

  19. I am not sure how you want Google to tell you which pages are not indexed - since knowing they exist - would mean that they're indexed. Anyway, maybe I am missing something, but this tool simply tells us which pages Google is aware of.
    -DK

  20. Thank you very much for your help! Note of webmaster

  21. Here we can see only the number of URLs. Can we also see those pages, so that we can request Google to index them? If that's already in Webmaster Tools, please let me know.

  22. That's really a great help for webmasters and SEO guys to identify the number of pages that are indexed, blocked, or affected by other crawling errors.

    Thanks for bringing more transparency to GWT.

    Amardeep Singh

  23. This is a really nice feature, but it would be really good if you could drill down into the 'not selected' data to get a ratio of redirects vs. duplicate content, as WordPress (for example) creates a ton of redirects for permalinks etc., but in the grand scheme of things this is less of a concern than knowing how much of your content is not showing because Google thinks it is duplicate content...

  24. It's really helpful for webmasters...

  25. Please give webmasters URL stats/details for the total indexed, ever crawled, not selected, and blocked by robots pages. It will help them properly fix the ignored/corner-case pages. It might also help people who are suffering from a Panda penalty unnecessarily. Some websites that are spammed without the webmaster's knowledge will also gain from the data details. Yes, it might help spammers as well in some cases, but in the end we need to put the interest of innocent people first. So, considering that, please release this option as soon as possible.

  26. really helpful information thanks for sharing. :D

  27. Really useful. Next step: give some example URLs that are indexed, blocked, etc.

  28. This is AWESOME! Thank you, so much. :)

  29. This is an amazing development! Very nice. =)

  30. Thanks for adding this detail; it will certainly help webmasters around the world who previously relied on site: and URL searches. By the way, author status is still not showing correct details; looking forward to seeing it fixed.
    Javin

  31. The new feature will help webmasters who kept checking for indexed pages again and again. Now they can see their indexed pages in one place. A great tool for webmasters. Thanks a lot for adding new features under the Health tab.

  32. Thanks for the transparency. For everyone else who needed clarification about this new feature, read: https://support.google.com/webmasters/bin/answer.py?hl=en&answer=2642366

  33. this is a great update
    the single most important diagnostic in WMT is now easier to monitor.

  34. I just hope we can see which pages were indexed or not.

  35. Hello Google guys, can we also have the blocked URLs in:
    Index Status> Blocked by robots

  36. This comment has been removed by the author.

  37. This comment has been removed by the author.

  38. Sir, I am new.

    Please tell me about this index image.

    Here is the Link - Image of Index Status

  39. Awesome action. Thanks for the article.

  40. Great tool. Is there any way to check which pages are not indexed, i.e. the actual URLs?

  41. For sites that have sub domains, this new tool is not useful.

    I did a search for site:www.mysite.com and found that I have 164,000 pages indexed. I compared that to GWT and found that I have 1.37 million pages indexed. Why such a difference? Well, it's probably because I have several subdomains and this report is aggregating the results together.

    Please split this out or allow us to select a filter that says "exclude subdomains", since technically this is a completely different site.

  42. I still see that Googlebot can be fooled about the number of indexed pages; for example, that WordPress plugin called search terms tagging 2 used to get a whole lot more pages indexed than actually exist on the website.

  43. That's indeed very handy; after all, pages only count as long as they're indexed.

  45. Can you show us the URLs that haven't been indexed, and if that's a large data set, can you show us the URLs for pages that are indexed? We need more transparency in order to make the site better.

  47. Great share. I am just hoping that next time canonical URLs will be covered in this index report; then it would be easier to see on which pages you set a canonical and on which pages you didn't!!

  48. The green line (Not Selected): what can I use it for, when I don't get all the URLs behind it?

    Yes, then I know I have some DC problems, but not where.

    :-)

  49. I use an SEO tool called Botify daily for my website. Web crawl and log analysis are merged in a web interface that lists indexed and unindexed pages.

  50. Such a great feature has been added to Webmaster Tools! It will help us learn about our indexed pages in a short period of time!!

  51. Thank you, G. This new tool helps to evolve the relationship between Google and webmasters. I hope that in the future you will also let us see or download a list of exactly which "Not selected" pages aren't being indexed, as well as those being indexed and blocked by robots. :-J

  52. Your side column link is not working...

    New to Webmaster Central?
    >>Learn more<< about Google Webmaster Tools.

    FYI

  53. Not good, because what can we do after finding out how many URLs are still pending? We don't have any way to find out which URLs aren't indexed yet.

    4 Out of 10
    Poor

  54. Index check shows 0 pages indexed, yet when I check the sitemap stats, it shows that all my pages are indexed. I'm confused.

    Why am I getting this conflicting data?

  55. Thanks for the information; it will help to track changes.

  56. What JohnMc said. Would be great to find out what pages are not selected.

  57. For one of my websites, "Total Indexed" is nearly double "Ever Crawled". This just shouldn't be possible; can you please explain how it could happen?

  58. This post will help us a lot, but how can I get the data for not selected URLs?

  59. Thanks when I know how to see the site.

  60. Oh, it is a good and useful post...

  61. How is it possible for "Not selected" to be much higher than "Total indexed"?

  62. Hello sir! I hope you are fine.
    I have a blog which is 1 year old now. I looked at it after a long time and was shocked to see that Google's robot has blocked 200+ pages of my website. I saw this through Webmaster Tools. Is there any way I can unblock them? I'm very confused, please help!

  63. Google takes more than a week to recognize a new post.

  64. I just looked at the graph for my Blogger site, and it says 74 pages indexed and 108 pages blocked by robots. Help! The yellow line is off the charts! I don't have a custom robots.txt on my Blogger page. How can I fix this?

  65. drsoneet.com needs to be re-indexed. What should be done for that?

    What is Google's policy or mechanism for submitting a site for re-indexing?

  66. Awesome! It is really helpful for me.


The comments posted on this blog belong only to the person who posted them. We do, however, reserve the right to remove off-topic comments.