Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

More details about our webmaster guidelines

Thursday, June 07, 2007 at 5:59 PM

At SMX Advanced on Monday, Matt Cutts talked about our webmaster guidelines. Later, during Q&A, someone asked about adding more detail to the guidelines: more explanation about violations and more actionable help on how to improve sites. You ask -- we deliver! On Tuesday, Matt told the SMX crowd that we'd updated the guidelines overnight to include exactly those things! We work fast around here. (OK, maybe we had been working on some of it already.)

So, what's new? Well, the guidelines themselves haven't changed. But the specific quality guidelines now link to expanded information to help you better understand how to spot and fix any issues. That section is below so you can click through to explore these new details.

Quality guidelines - specific guidelines

As Riona MacNamara recently posted in our discussion forum, we are working to expand our webmaster help content even further and want your input. If you have suggestions, please post them either in the thread or as a comment to this post. We would love to hear from you!
The comments you read here belong only to the person who posted them. We do, however, reserve the right to remove off-topic comments.

29 comments:

Krishna Kumar said...

Avoiding duplicate content seems to be a problem in blogs hosted on Blogger, because of the way archived pages are created. Right now, I have some pages in the supplemental results while the archived pages are found in the regular results. What is the best way to handle this?

Eric said...

I also have a question about the "avoid duplicate content" guideline: We have a large site that has several mirror sites in different countries. The content is identical (except for different mirror site links). The addresses look like au.expasy.org, br.expasy.org, ca.expasy.org etc. Is there a way to tell Google to treat these all as identical to www.expasy.org, the main mirror site? Obviously we could forbid Google from indexing anything but the main mirror site, but since many people link to specific mirror sites I suspect we'd then show up less often in Google's search results...

Esto va mal said...

So, marquees that show 20K of links, scrolling for maybe 4 minutes in a 300px anchor, are neither hidden links nor link farms.

Come on people, add a super marquee to your website. :-)

Webmasters need Google to be stricter.

Frankie Roberto said...

Re: hidden text or hidden links

What about CSS-hidden 'skip to content' links (e.g., using the positioned-off-page, 1px-high technique)?

This is often included to improve accessibility for screen readers.

Bill said...

Are you planning on a new section that explains the supplemental index?
Is there any other way besides inbound links to get out of the supplemental index?

Sebastian said...

To spare you the duplicate content, I won't quote my longish appraisal ;) In short: I like it very much.

I think some statements need clarification though, see these forum threads:
Using CSS to hide text
Hiding Text with CSS and Webmaster Guidelines

Sebastian

Matthew C. Keegan said...

Sound advice! I have enjoyed reading Google's guidelines to help me produce sites which are interesting, relevant, and standards compliant. After 5 years of web designing, I think I am doing a decent job of staying current.

Regards,
MattK
The Article Writer

Michael Martinez said...

From your definition of Doorway Pages: "Doorway pages are pages specifically made for search engines. Doorway pages contain many links - often several hundred - that are of little to no use to the visitor, and do not contain valuable content."

NOPE. That's a HALLWAY page (also called a CRAWL page). The terminology has been around since before Google, so it will be helpful if you update your answers page to use the terminology correctly.

A DOORWAY page links or redirects to a specific page.

A HALLWAY page links to many other pages (usually just DOORWAY pages).

Doorway pages are traditionally used to capture rankings for specific expressions. Hence, 1 page with content receives traffic from hundreds or thousands of expressions through as many doorways.

In the old days, people had doorway pages for each expression for each major search engine (and you had to keep up with about 10 search engines in those days).

Cindy Pinsonnault said...

Thank you for the additional details. I believe I've been doing pretty well following Google's guidelines, but more information is always good.
In particular, you helped clear up some confusion I had about duplicate content. I've gotten all kinds of misleading and incorrect information from other sources about what exactly constitutes duplicate content.
I am happy to have the real data "from the horse's mouth," so to speak. Thanks.

Werner said...

A website which provides a catalogue service (say, for the best cooking recipes on the Internet) will basically contain what one could call 'scraped content' (like the summary of a recipe existing elsewhere) and a link to the site where one can find the full information.

Can Google tell such a site from one which reproduces 'scraped content' just to increase its PageRank?

Google itself, or Yahoo, have a pretty good PageRank, so I suppose it can.

So how do you tell Google that you are a site aggregating useful information and not one providing scraped content only?

Spain Homes Network said...

I just love Google, like everybody else. However, I don't agree with the Google guidelines that say to just focus on the visitors of the website and avoid this and that. We have had a site for the last 5 years and it still has PageRank 3 despite following decent SEO methods. The problem is that many sites use tricks to get to the top, so normal sites that follow Google's SEO standards can't get there. I know this isn't easy for Google, but that's what worries me.

Pankaj
Web Administrator
A well laid property site

Sin said...
This comment has been removed by the author.
Sin said...

I like the guidelines, but duplicate content in reality does not hurt your site. Yes, it should, but it really doesn't. There are way too many sites out there with content that is duplicate to the human eye but that Googlebot doesn't see as duplicate. Change the order of the content and add a few extra words, and it is no longer duplicate content as far as Google is concerned.

As for the question of content being scraped from another site: yes, it can tell, depending on how you display it. Remember that images, Flash, and JavaScript don't count, so you are merely looking at it from a text-based view. Display enough text before the scraped content and it no longer appears to be scraped.

The guidelines are good, but in actuality it's not a person going to pages/sites, it's a bot. Too many duplicate content sites have high PageRank to say it hurts.

shinzaiaku

Per-Erik Skramstad said...

I like the Webmaster Guidelines update, but I really would appreciate it if Google could be more specific about what is "allowed" and what is not. For instance, about visibility:hidden divs: when can they be used?

Richard Jennings said...

Who here has added their profile to congoo.com? This was on the cover of PC World magazine for June.

king said...

I have a site, allysource.com, but it is really low in the search results.

Shin said...

allysource.com

Google PR : 0
Google Search : 122
Google BL : 0
Google Index : 44
Yahoo Search : 0
Yahoo BL : 13
Yahoo Indexed : 0
MSN Search : 0
MSN Index : 0

I ran your domain through my script. You need to get some more links out there. Yahoo backlinks are a lot easier to get than Google ones. If you feel you already have some out there, then Google isn't crawling the pages that link to your site.

Jim said...

My page rank recently went to 0
Lillicotch.com
I am wondering if it's because I added hidden links to a Project Honey Pot page?

Jason said...

I've been having some interesting debates with other web developers and some self-proclaimed SEO gurus about a new Flash accessibility programming method that we just blogged about: http://labs.blitzagency.com/?p=171

The title of the blog post is Search Engine Optimization for Flash Websites because, well, let's face it, that title attracts more eyeballs than "Flash website accessibility".

I've read the Google webmaster guidelines cover to cover, and I believe that what we are doing does not go against Google's recommendations, but I'd love to hear other people's opinions -- especially Google's own!

What we are doing is in the true spirit of good web development -- separation of content, presentation, and behavior. Doing so allows us to make our Flash websites much more accessible to those without the Flash plugin, and to the 10-million-plus vision-impaired people (in the US alone). We believe that making a site accessible also makes it search engine friendly, and that's not a bad thing. None of the techniques or text we are using are designed to trick or deceive the spiders. The text on the page == the text in the Flash movie. We are simply exposing the same separated content to everyone.

So please, check out the blog post http://labs.blitzagency.com/?p=171 and check out the case study http://blitzagency.com/ and let me know what you think.

Noel Grech said...

Hi, I really would like more information regarding the supplemental index. I've been having problems with this for a couple of sites.

Any information about how a webmaster can get pages out of the supplemental index would be great. It would also be great to get more information about which pages go into the supplemental index and why.

N. Grech

kd6lvw said...

I must disagree with usage of two frowned-upon items: Hidden links and some limited cloaking.

Hidden links/text: I use these as traps for malicious robots (presumed to be spambots). Some are mailto URLs that point at spamtrap mailboxes, and others are links banned by my "robots.txt" file. These are things I don't want users following - else they will ban themselves too.
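
To illustrate the robots.txt half of this, here is a minimal Python sketch that checks whether a well-behaved crawler would be allowed to follow a trap link. It only uses the standard library's urllib.robotparser. The robots.txt contents, the /trap/ path, the example.com URLs, and the ExampleBot name are all made up for the example and are not the actual setup described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is told to stay out of /trap/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the robots.txt rules above permit fetching the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/trap/page.html"):
        print(url, is_allowed(url))
```

A crawler that honors these rules never requests anything under /trap/, so only robots that ignore robots.txt end up following the hidden link.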

Cloaking: Where someone has the same content deliberately under different URLs (usually different domains), I believe that it is fine to present a redirect to a search engine - but ONLY IN THE CASE of pointing an alias URL to the chosen canonical URL that should be indexed (i.e. the URL used in the site map). (This also addresses the "mirroring" issue raised by another commenter.)
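
As a rough sketch of the alias-to-canonical mapping, the Python helper below rewrites a URL on an alias host to the same path on the canonical host. The host names and the canonical_redirect function are hypothetical, and how (or to whom) the redirect is actually served is left out.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hosts: one canonical name and the aliases that mirror it.
CANONICAL_HOST = "www.example.org"
ALIAS_HOSTS = {"example.org", "au.example.org", "br.example.org"}


def canonical_redirect(url: str) -> Optional[str]:
    """Return the canonical URL for an alias URL, or None if no redirect applies."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIAS_HOSTS:
        # Already canonical, or a host this site does not own.
        return None
    # Keep scheme, path, query, and fragment; swap only the host.
    return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path,
                       parts.query, parts.fragment))


if __name__ == "__main__":
    print(canonical_redirect("http://au.example.org/tools/?id=1"))
    print(canonical_redirect("http://www.example.org/tools/?id=1"))
```

The target returned here would be the same URL that is listed in the site map, so every alias points at exactly one address.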

The way that google has written its "guidelines" makes it sound as if these practices should be banned. However, there are valid uses for them.

As for javascript-based links to other pages, I consider that BROKEN behavior. There are some of us who recognize the security risks of enabling javascript and will not operate a browser with it on. As such, web sites that use it for linking don't work. Often, the use of the javascript construct does NOTHING that can't be done with an equivalent HTML construct. Web designers who use them should be executed (after we get rid of all the spammers, of course.)

takanori said...

I would like to get more information about affiliate programs that create URLs like "http://www.dsfy.com/yourcom/stetje.cgi?1" for affiliates.

I am wondering whether using such a program for marketing purposes violates Google's webmaster guidelines or not.

Currently, I am using the following program: http://www.yoursoft-tm.com/yourcom.htm

Thank you very much for your advice

webmaster said...

Thanks,
this is useful info

Web Site Design

Teddie said...

Re: http://www.google.com/support/webmasters/bin/answer.py?answer=34450

It references a form, but the form has no specific option for this problem. Could you please add one to expedite these fixes?

Rohit said...

I have no idea why my blog has been blocked from the Google index suddenly! I have made no changes to my blog in the past couple of months and suddenly out of the blue my blog is blocked. I don't have any idea why this has happened. My blog url is www.visualreactor.org - Please if you can tell me anything that helps ...

Anthony Toop said...

Hi Vanessa
Thanks for this article. We are working hard to conform to the guidelines.
Recently our website was banned from Google due to "hidden text". This was implemented by previous 'naive' webmasters, and we were surprised we didn't receive a warning first.
We have since rectified the problem, as well as made our site much more 'Googlebot' friendly. We have also submitted a request for consideration within the webmaster area.
Do you know how long we must wait to get our site back in Google's index?

Many thanks,
Nathaniel Wolff

Jamie said...

I run a small tourism information website and have a competitor who uses a variety of techniques that openly abuse the Google webmaster guidelines, with multiple identical sites whose links all point to the primary site they market. I have used the Google spam report to report this, but nothing gets done.

Any advice appreciated

aasmi said...
This comment has been removed by a blog administrator.
Google Webmaster Central said...

Hi everyone,

Since months have passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Help Group.

Thanks and take care,
The Webmaster Central Team