Wednesday, September 20, 2006 at 11:45 AM
Lately I've heard a couple of smart people ask that search engines provide a way to know that a bot is authentic. After all, any spammer could name their bot "Googlebot" and claim to be Google, so which bots do you trust and which do you block?
The common request we hear is to post a list of Googlebot IP addresses in some public place. The problem with that is that if/when the IP ranges of our crawlers change, not everyone will know to check. In fact, the crawl team migrated Googlebot IPs a couple of years ago and it was a real hassle alerting webmasters who had hard-coded an IP range. So the crawl folks have provided another way to authenticate Googlebot. Here's an answer from one of the crawl people (quoted with their permission):
Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I think the recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com name; e.g.:
> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.
> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1
I don't think just doing a reverse DNS lookup is sufficient, because a spoofer could set up reverse DNS to point to crawl-a-b-c-d.googlebot.com.
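For anyone who would rather script this than run host by hand, the two lookups above could be sketched roughly as follows. This is only an illustration, not an official Google tool: the function names (is_googlebot, reverse_lookup, forward_lookup) and the TRUSTED_SUFFIXES constant are made up for this example, it only handles IPv4 addresses, and it does no caching, so you would probably not want to call it on every single request.

```python
import socket

# Assumption: the post names only googlebot.com as the domain to check.
# The leading dot keeps names like "fakegooglebot.com" from matching.
TRUSTED_SUFFIXES = (".googlebot.com",)


def reverse_lookup(ip):
    """Return the PTR hostname for an IP address, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def forward_lookup(hostname):
    """Return the list of IPv4 addresses a hostname resolves to."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse lookup, domain check, then confirming forward lookup.

    The reverse and forward parameters exist only so the DNS calls can be
    swapped out (for example in tests); they are not part of any real API.
    """
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 1: the reverse DNS name must be in the googlebot.com domain.
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False
    # Step 2: the forward lookup of that name must lead back to the same IP,
    # because a spoofer controls the reverse DNS for their own addresses.
    return ip in forward(hostname)
```

Calling is_googlebot("66.249.66.1") would perform the same two queries as the host commands above. The forward check in step 2 is the part that catches the spoofed reverse DNS case described in the quote.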
This answer has also been provided to our help-desk, so I'd consider it an official way to authenticate Googlebot. In order to fetch from the "official" Googlebot IP range, the bot has to respect robots.txt and our internal hostload conventions so that Google doesn't crawl you too hard.
(Thanks to N. and J. for help on this answer from the crawl side of things.)


8 comments:
One other thing to realize is that you need to look at the hostname from the RDNS lookup (i.e. it must have "google.com" at the end of the string). Also, many of the search engines' IP addresses do not resolve to any hostname, so this is kinda buggy (see iplists.com). Finally, IP spoofing can get around this altogether.
The solution described here is more complicated than necessary. I've explained a much simpler solution here:
http://botsosphere.blogspot.com/2007/05/automatic-verification-of-machine.html
It might also help if we know what Googlebot actually "looks" like. Since you're the closest to knowing, would you care to give us any hints for the contest? ;)
http://blog.auinteractive.com/googlebot-competition
Matt, good to see your post here, and of course great information.
Pratheep
You can check the IP address on websites like IP address lookup; you will know whether it is Googlebot or not.
When is GoogleBot going to support challenge SHA256 keys which domains can register? This way, nothing can be spoofed?
The post says to use DNS to verify Googlebot, but I am a beginner at web hosting and have created a site on Rudraksha. I can't afford to hire anyone to maintain it, and I don't know how to use DNS to verify IPs. In this case, is there any alternate solution to verify Googlebot?
Hi everyone,
Since over a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Help Group.
Thanks and take care,
The Webmaster Central Team