Monday, March 05, 2007 at 4:05 PM
Recently, Danny Sullivan brought up good questions about how search engines handle meta tags. Here are some answers about how we handle these tags at Google.
Multiple content values
We recommend that you place all content values in one meta tag. This keeps the meta tags easy to read and reduces the chance for conflicts. For instance:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
If the page contains multiple meta tags of the same type, we will aggregate the content values. For instance, we will interpret
<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
the same way as:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
If content values conflict, we will use the most restrictive. So, if the page has these meta tags:
<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="INDEX">
we will obey the NOINDEX value.
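As an illustration only, the aggregation and conflict rules above could be sketched in Python as follows. This is not Googlebot's code: the function name `effective_directives` and the mapping `RESTRICTIVE_OVER_PERMISSIVE` are hypothetical, and the sketch only covers the INDEX/FOLLOW pairs mentioned in this post.

```python
# Hypothetical sketch of the rules described above; not Googlebot's implementation.

# Each restrictive value (key) overrides its permissive counterpart (value).
RESTRICTIVE_OVER_PERMISSIVE = {"NOINDEX": "INDEX", "NOFOLLOW": "FOLLOW"}


def effective_directives(content_values):
    """Merge the CONTENT strings of several robots meta tags into one set."""
    merged = set()
    for content in content_values:
        for value in content.split(","):
            value = value.strip().upper()
            if value:
                merged.add(value)
    # When values conflict, keep only the most restrictive one.
    for restrictive, permissive in RESTRICTIVE_OVER_PERMISSIVE.items():
        if restrictive in merged:
            merged.discard(permissive)
    return merged


# Two separate tags are treated like one combined tag.
print(effective_directives(["NOINDEX", "NOFOLLOW"]))
# A conflicting INDEX value is dropped in favor of NOINDEX.
print(effective_directives(["NOINDEX", "INDEX"]))
```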
Unnecessary content values
By default, Googlebot will index a page and follow links to it. So there's no need to tag pages with content values of INDEX or FOLLOW.
Directing a robots meta tag specifically at Googlebot
To provide instruction for all search engines, set the meta name to "ROBOTS". To provide instruction for only Googlebot, set the meta name to "GOOGLEBOT". If you want to provide different instructions for different search engines (for instance, if you want one search engine to index a page, but not another), it's best to use a specific meta tag for each search engine rather than use a generic robots meta tag combined with a specific one. You can find a list of bots at robotstxt.org.
Casing and spacing
Googlebot understands any combination of lowercase and uppercase. So each of these meta tags is interpreted in exactly the same way:
<meta name="ROBOTS" content="NOODP">
<meta name="robots" content="noodp">
<meta name="Robots" content="NoOdp">
If you have multiple content values, you must place a comma between them, but it doesn't matter if you also include spaces. So the following meta tags are interpreted the same way:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
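If you want to check your own pages for equivalent tags, a rough sketch like the one below normalizes casing and spacing before comparing values. It uses Python's standard `html.parser` module; the class name `RobotsMetaParser` and the helper `robots_values` are hypothetical, and the normalization is an assumption based on the behavior described above rather than a copy of how Googlebot parses pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects normalized content values from robots meta tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.values = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values keep their case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # Values are separated by commas; surrounding spaces and casing are ignored.
        for value in (attributes.get("content") or "").split(","):
            value = value.strip().lower()
            if value:
                self.values.add(value)


def robots_values(html):
    """Return the set of normalized robots meta values found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.values
```

With this sketch, `robots_values('<meta name="Robots" content="NoOdp">')` and `robots_values('<meta name="ROBOTS" content="NOODP">')` produce the same set, as do the two comma-separated variants shown above.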
If you use both a robots.txt file and robots meta tags
If the robots.txt and meta tag instructions for a page conflict, Googlebot follows the most restrictive. More specifically:
- If you block a page with robots.txt, Googlebot will never crawl the page and will never read any meta tags on the page.
- If you allow a page with robots.txt but block it from being indexed using a meta tag, Googlebot will access the page, read the meta tag, and subsequently not index it.
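The two cases above can be sketched with Python's standard `urllib.robotparser` module. This is a simplified model, not Googlebot's logic: the function `page_outcome`, its return strings, and the `example.com` URLs are hypothetical, and the robots.txt matching follows the Python library's rules, which may differ in detail from Google's.

```python
from urllib.robotparser import RobotFileParser


def page_outcome(robots_txt, url, meta_content, user_agent="Googlebot"):
    """Model what happens to a page given robots.txt rules and a robots meta tag."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # A page blocked by robots.txt is never fetched, so its meta tags are never seen.
    if not rules.can_fetch(user_agent, url):
        return "not crawled; meta tags never read"
    values = {v.strip().upper() for v in meta_content.split(",") if v.strip()}
    # NONE is equivalent to NOINDEX, NOFOLLOW.
    if "NOINDEX" in values or "NONE" in values:
        return "crawled; not indexed"
    return "crawled; eligible for indexing"


robots = "User-agent: *\nDisallow: /private/"
print(page_outcome(robots, "http://example.com/private/a.html", "NOINDEX"))
print(page_outcome(robots, "http://example.com/public/a.html", "NOINDEX"))
```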
Googlebot interprets the following robots meta tag values:
- NOINDEX - prevents the page from being included in the index.
- NOFOLLOW - prevents Googlebot from following any links on the page. (Note that this is different from the link-level NOFOLLOW attribute, which prevents Googlebot from following an individual link.)
- NOARCHIVE - prevents a cached copy of this page from being available in the search results.
- NOSNIPPET - prevents a description from appearing below the page in the search results, and also prevents caching of the page.
- NOODP - blocks the Open Directory Project description of the page from being used in the description that appears below the page in the search results.
- NONE - equivalent to "NOINDEX, NOFOLLOW".
As defined by robotstxt.org, the following directive means NOINDEX, NOFOLLOW.
<META NAME="ROBOTS" CONTENT="NONE">
However, some webmasters use this tag to indicate no robots restrictions and inadvertently block all search engines from their content.


24 comments:
How do you handle the noindex, follow command. Does that pass any value like anchor text, pr, and themeing.
Are you sure about rel="nofollow" on a link stopping the linked page being crawled (as opposed to it not receiving any PR)?
If you look at p1dRobert's case in http://groups.google.com/group/Google_Webmaster_Help-Requests/browse_thread/thread/4ad2ef1f770e314b/359ca87d73d7a7cc#359ca87d73d7a7cc it appears that the spider has followed a link from Google Groups even though they're always nofollowed.
Oops. Thought the link in my previous post would be made clickable. Try this link instead.
David: "noindex,follow" works fine at Google (unfortunately, nowhere else, it seems). It is great for pages like blog summaries, which have a lot of unrelated articles.
See my blog (in French) about this issue:
http://www.bortzmeyer.org/indexation-blog.html
Great! post! I personally use the noodp link and the rest I take care of with robots.txt
I don't think the META tags are being respected by Googlebot.
Search this in Google:
cache:www.geocities.com/rmtiwari/main.html
You can see the NOFOLLOW NOINDEX meta tag in the source of the "cached page", which should not have happened. It means that the crawl/caching was done for a page with the exclusion meta tags.
Please check.
Thanks,
Ram
I like to use Googlebot in the tag. I believe the bot likes to see its name.
meta name="googlebot" content="index, follow"
SEO Expert
Sorry ...my mistake..
Someone pointed an error in HTML. There should have been a gap between META and NAME.
Cheers,
Ram
Just wondering what googlebot would make of the following.
>META NAME="ROBOTS" CONTENT="NOFOLLOW, NOINDEX"<
Is this valid? What would be the outcome?
We use NOODP on our site but for some reason it ONLY shows whatever NOODP has indexed now. Any idea as to why? The site is www.techwyse.com
What would you say about Revisit Robots after 14 days ? or revisit robots tag? ?
Does they really work ?
krunal, see http://webmastershelp.iblogget.com/2007/05/07/meta-tags/
how long roughly does it take for the google bot to crawl into my blog?
I do have a site with a problem like this. The webmaster blocked the home page with meta robots - he wrote < META NAME="Robots" CONTENT="None" >. I corrected and submitted the sitemap to Google again.
Does anyone know how long it takes until the page is crawled again? All the other pages of the site are indexed.
Thanks.
what are the benefits of not indexing your page? that does not make any sense to me
In some cases it will be wise to not index a page - for example the home page of a blog which changes very often.
But in most of the cases it is a mistake.
I have a dynamic site in which you could get to the same page content via different query string parameters. I recently changed it from using numeric ID's to real name text strings to be more friendly.
I no longer use the numeric ID's in the URL's for my links, and if a page is accessed via a numeric ID, I throw the robots noindex, nocache tag. (See the 2 links above to get an idea of what I mean).
What happens to those pages with numeric ID's that were index and cached by Google the next time Google spiders it? It should not be able to get to those pages with numeric ID's from my site anymore (I do not use the numeric ID links anymore on my site).
Hi I was surprised to see that there was no cached link for one of the search result. The website was also not having good content . Still on Top ranking in Google
Check this keyword in google.com
Budget hotels in Milton Keynes
You will find www.miltonkeyneshotel.com/-Similar pages - Note this
THis the link you get on the first page. there is no cached page.
But still this site is achieving top ranking
To Stephane Bortzmeyer:
Stephane, if you are still reading this post, could you write an English translation of your French post at http://www.bortzmeyer.org/indexation-blog.html please? I went over and read the post but I did not understand it 100%. (I am sorry my French isn't as good as it used to be.)
From what I understand, you appear to imply that "noindex,follow"
is useful for pages containing diverse, unrelated articles - such as blogs and mail archives.
In the example, you said you searched for "FreeBSD LDAP" in Google and got an article on "FreeBSD" and an unrelated article on "LDAP" in the search results.
So, the "noindex" tag will allow the owner of a general blog or archive prevent his pages from appearing in Google's index (and search results pages), but surely it cannot stop Google from returning unrelated pages on "FreeBSD" and "LDAP" if the index does not have many pages containing "FreeBSD LDAP"?
i did what you say, but to no avail. should i submit my link first, and get google to map my blog??
here's the url:
http://animealtar.blogspot.com/
When you use the "noindex" ... google will return to check the "noindex" eventually? i'd like to remove temporarily from the index...
Google interesting with meta tags of course and with your contents, if your contents don't look like your meta keywords, it will never work Revisit Robots after 14 days etc... if u update your site great google love your site... I have dynamic flash based web site I use php code for updates for contents. They look automatic on home page. Try this for flash sites. Meta tags is very important with your contents don't forget...
Hi everyone,
Since over a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Help Group.
Thanks and take care,
The Webmaster Central Team