Tuesday, December 18, 2007 at 8:10 PM
In 2003, Google introduced a "supplemental index" as a way of showing more documents to users. Most webmasters will probably snicker at that statement, since supplemental docs were famous for refreshing less often and showing up in search results less often. But the supplemental index served an important purpose: it stored unusual documents that we would search in more depth for harder or more esoteric queries. For a long time, the alternative was simply not to show those documents at all, but this was always unsatisfying; ideally, we would search all of the documents all of the time, to give users the experience they expect.
This led to a major effort to rethink the entire supplemental index. We improved the crawl frequency and decoupled it from the index a document was stored in. Once these "supplementalization effects" were gone, the "supplemental result" tag itself, which only served to suggest that otherwise good documents were somehow suspect, was eliminated a few months ago. Now we're coming to the next major milestone in eliminating the artificial difference between indices: rather than searching some part of our index in more depth for obscure queries, we're now searching the whole index for every query.
From a user perspective, this means that you'll be seeing more relevant documents and a much deeper slice of the web, especially for non-English queries. For webmasters, this means that good-quality pages that were less visible in our index are more likely to come up for queries.
Hidden behind this are some truly amazing technical feats; serving a much larger index doesn't happen easily, and it took several fundamental innovations to make it possible. At this point it's safe to say that the Google search engine works like nothing else in the world. If you want to know how it actually works, you'll have to come join Google Engineering; as usual, it's all triple-hush-hush secrets.*
* Originally, I was going to give the stock Google answer, "If I told you, I'd have to kill you." However, I've been informed by management that killing people violates our "Don't be evil" policy, so I'm forced to replace that with sounding mysterious and suggesting that good engineers come and join us. Which I'm dead serious about; if you've got the technical chops and want to work on some of the most complex and advanced large-scale software infrastructure in the world, we want you here.


40 comments:
Very impressive. This always confused the hell out of me: why should your users have cared in the first place which index the documents came from? You seem to have corrected this now; I always thought it should have been transparent.
Dave
Hi,
One of our sites now says we have duplicate meta tag descriptions on 6 pages, including our main page.
We have completely dropped out of Google's search results for the entire site. We corrected all of the pages, and Google has since crawled and updated the main page, but Webmaster Tools still shows 6 pages. My conclusion is that this new feature tags your URL as duplicate content (for the duplicate meta tag description error), and if it is your main page, then all your subpages drop out of Google's index.
My question is, when will Webmaster Tools know that we fixed the duplicate meta descriptions?
Why does this new diagnostic feature ban pages that have a duplicate meta tag description from showing in Google's index?
Thanks,
Larry
Interesting stuff, thanks for sharing.
Does this also apply to dynamic pages? Assuming that the dynamic pages have different titles, meta-descriptions, text etc...
I'd be very interested to see what improvements in the technology have allowed you guys to drop the Supplemental Index. Dan Thies gave a very good explanation for the SI, which was bastardized in plain English here.
How are you guys searching the whole thing and not slowing down the system? Is Googlebot faster? Are you caching more pictures and stuff that's less likely to change? Ignoring the whole part of the web that is gambling/pills/porn for any non-gambling/pills/porn query?
I don't think those make sense, because you're saying that now you'll be showing the SI results more, so PR would seem to be less important. I can't imagine you guys putting humans to the task of reading everything in the supplemental index and reviewing the pages... My best guess is that you're upgrading the importance given to B-list and C-list sites, which are more likely to link to these pages; i.e., perhaps not reducing PR's importance, but just redistributing it or calculating it in a new way to help these pages out.
I'm thinking what I just wrote probably sounds like gibberish, but whatever. Anyway, if you guys are hiring, why not just bring all the blackhats in-house?
Can we get any more info? At least something like a "you're getting hot, hotter, hoooooot" answer?
So we should consider the SI as gone or non-existent, correct?
Should we even look at those inaccurate SI or pages indexed operators ever again?
Are all pages being judged equally right now?
Is this why many websites have been seeing the total number of pages indexed going down when checking the site: operator?
amazing. great work as usual google webeng :)
Ah, but telling people you'd have to kill them if you told them, and therefore you're NOT telling them because it's for THEIR OWN protection, is not evil. It's protecting them from knowledge that would otherwise be lethal. Therefore, good.
I don't know jack about engineering but I can justify the heck out of stuff. And my rates are very reasonable. Just so you know.
So are they now a single index then, for all types of search queries? (including special things like site:, link: etc?)
Great to get the info passed back Googs. Of course, if you could give some similar detail on how you fall foul of &filter=0 it would be appreciated! :-)
Why do I still see different results when I use the site:domain.com/* vs. site:domain.com/ ?
Is this a matter of updates rolling out, or am I missing something?
Otherwise it seems like a great move. I've already noticed pages that couldn't be found in the SERPs now showing up...
Thanks,
Paul
When can we expect to see the most relevant results show up? I'm still seeing tons of less relevant pages show ahead of what appear to me to be Supplemental Results Pages.
Talk is cheap.
All I ask for is a search results page that can be trusted. When will I get that?
The only pages of my client's site I ever found in the supplemental index were loads and loads of duplicate content - pages that we never wanted indexed in the first place, in other words. So now you've dumped all these irrelevant pages (inadvertently created by legacy technical issues) back into the main index? I honestly don't see how this is an improvement.
I have always optimized websites for two campaigns:
1) The main campaign focusing on the aggressive 1-2 word terms:
The main campaign supports the main domain in ranking for the shorter terms (such as: Divorce Lawyer, Divorce Attorney, etc.). There are many factors in ranking for these terms, but the basic idea is to get the few main terms being targeted into the title and description, as well as into the content of every page on the site.
2) More specified campaigns for the 2-5+ word terms:
The second campaign is supported by optimizing individual pages for your more specific terms (such as: Divorce Attorney Seattle, Lawyer Seattle WA, ask divorce lawyer question, etc.). Each term you are focusing on is added to one page, which offers your user the exact information they are looking for. In the end, this brings them more relevant results and gets your site seen.
This is why the supplemental results are very important. Google's main goal is to bring the user the most relevant results, and it's up to us to help Google accomplish this.
"...and it's up to us to help Google accomplish this."
Wait, explain to me why it's my job to do Google's job for them? I realize this isn't Matt Cutts' blog and it might not be appropriate to have a Graywolf vs. Matt-style debate here... but let's be perfectly honest. My job is not to "help Google" do its job better; my job is to help my clients.
-cont-
I guess I should add that my clients don't have problems with "real" pages falling into Supplemental.
Here's my perspective: http://www.all-about-content.com/2007/12/what-happened-to-supplemental-index-aka.html
Any reason for deleting my comments?
We just want to hear the truth as to what is happening if anything at all with this recent change. I don't see one difference in the index since this announcement.
One concern I have seen a lot about the different indexes is that pagerank does not flow from one index to the other in the same way as it flows between pages in the same index - a page in the supplementals becomes basically a black hole for PR. I never thought that was accurate, but was it the case, and if so has that changed now?
Paul, in regards to "Why do I still see different results when I use the site:domain.com/* vs. site:domain.com/ ?"
I think this has to do with the syntax - * is a wildcard operator. I don't think this affects the index that the query is pulled from, it simply changes the part of your site: that is included in the SERP. Not sure, not a Google guy though.
What is going on with the SERPs? My rank for my business name went from number one to number 18. My business name competes with just 6,000 other listings, and two of the listings above mine are parked pages. How can a parked page, with no incoming links and little relevancy, achieve the number 7 position?
This happened just in time for the last few days of online shopping before the xmas holiday. :( I feel Google has given me a lump of coal for xmas.
Was working with a site trying to calculate their supplemental vs. main index ratio and "solve" the problem of pages in supplemental hell by getting rid of duplicates and improving link flow to get more page rank to the pages.
At around the time of this announcement, the ratio suddenly improved dramatically. Will be interesting to see if traffic actually goes back to previous levels due to this change.
This thoroughly confuses the hell out of me as macewan.org has disappeared from Google traffic. As an average Joe blogger I feel at times we are being targeted just for adding advertising to our blogs.
Does this mean that I can put up a Sitemap without having my whole website sunk to such depths that only Nemo can read it?
MrBill, if you have specific questions about your site's performance you'll have better luck getting them answered in our Webmaster Help Group.
macewan, it's fine to have advertising on your site; but if you're concerned about your Google rankings, you should make sure your advertising links aren't passing PageRank. Check out this post for more details.
Frank Martino, having a Sitemap can only help you, it won't hurt you. Feel free to put up a Sitemap.
Thanks Susan, I will head over to the Webmaster help group. Have a great holiday!
macewan, a big drop in traffic like that is most likely unrelated to this integration of indices. We also don't penalize people just for having advertising on their site. You might want to double-check that your advertising links aren't passing PageRank, however; more details here. You could also review our Webmaster Guidelines and make sure your site isn't violating any of them, which could be affecting your performance.
Still waiting for Google to show the most relevant results first....
Well, it looks like now we have less interesting pages, with lower PageRank, ranking higher up... Frankly, for many of my queries I must say I am more disappointed than anything else. Maybe PageRank wasn't the best solution, but it still worked better than the mess we have now (at least for the queries I took the time to compare with the PageRank).
Hello
I'm a search engine optimization expert with 7 years of SEO/SEM/SMO experience. Nowadays I'm in trouble with a website I own that is related to the medical industry. The problem is that my exact meta description isn't shown whenever someone searches for my site; I'm talking about the site snippet. I have used "NOODP,NOYDIR", the meta tags that are supposed to help the meta description get used, but they haven't worked for me. Could someone tell me which meta tags I should use to resolve the issue?
Thanks in advance,
Bilal
Hi Bilal,
We use the meta description for the snippet sometimes, but not always (for example, we may choose a snippet from the text on the page). It depends on what's most relevant for a user's query. Check out this blog post or this help article for more information.
This is a bit off-topic but I don't know where to post this comment.
The LEGO 50th Anniversary Header has NO ACTIVE LINKS! Every other special heading for GOOGLE has had a myriad of informative links...
Is this an oversight or some new policy? I would love to see some connections for LEGO!!
Pass this forward to the right person to have it happen, if possible.
~Richard Danahy
hello people,
My question is: does the Google Toolbar really track what users are browsing on the internet? I have also been told that in the future, sites will get better rankings in the search results based on the number of visitors they have. Is this possible?
Thank you,
Andrew from DailyServer
Why does it say "site map not allowed" for my site on the dashboard?
Ok,
so there is no supplemental index anymore.
But if I get two pages from one domain in the SERPs, the second result is always nested underneath the first.
Why? A secret supplemental index? Or something else?
Thanks,
Whoopster
That's an "indented result;" see section O in this diagram.
Sharing this false and misleading post from your latest blog post won't change the fact that Google shows pages in the Main Web Index FIRST, BEFORE Supplemental Pages.
Webmasters have damned good reason to be upset about the Supplemental Index.
Yonatan, probably the wrong group, and I apologize, but I can't find an answer anywhere on the Google site.
A Google search often shows the #1 ranked hit with multiple (supplemental?) pages, indexed and hot-linked. For instance, search the word "Aspen" and the #1 hit (Aspen Skiing Co) has eight indexed pages, all hot-linked, such as "Buy Lift Tickets" / "Daily Snow Report" and so forth.
Our company normally comes up 1st for the search word "Millennia" and has been there for years.
How can we get a Google result that lists and hot-links our top 8 pages in the Google search listing?
Thanks for helping.
I am a big Google fan and love your site, but something I never understood is this: when I "keyword" something, the search engine brings up the most popular results for the words entered,
but as you scroll to the bottom, there is access only to ___the next 9 pages, next____, and to get to where I copied below, pages 45 to 64,......
___________________________________
...96k - Cached - Similar pages - Note this
Previous 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 Next
___________________________________
........i had to keep clicking the last page available for the next 10 pages, etc, etc, etc
Now, I'm the inquisitive surfer who finds sites on pages 210, 596, etc., as I did today.
Why isn't there a search box for "page (enter number)" to alleviate this constant clicking for obscure pages?
i'm sure you may delete this message / comment but you may also research, survey, and enhance your site by offering this feature, if even to a sample of your trial users. in any event, for a more detailed elaboration on this feature, document it, and get back to me
compro7@google.com
a loyal googler
frankie