Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Advanced Website Diagnostics with Google Webmaster Tools

Tuesday, September 30, 2008 at 11:07 AM

Running a website can be complicated, so we've provided Google Webmaster Tools to help webmasters recognize potential issues before they become real problems. Some of the issues you can spot there are relatively small (such as duplicate titles and descriptions); others can be bigger (such as your website not being reachable). While Google Webmaster Tools can't tell you exactly what you need to change, it can help you recognize that there could be a problem that needs to be addressed.

Let's take a look at a few examples that we ran across in the Google Webmaster Help Groups:

Is your server treating Googlebot like a normal visitor?

While Googlebot tries to act like a normal user, some servers may get confused and react in strange ways. For example, although your server may work flawlessly most of the time, some servers running IIS may respond with a server error (or take some other action tied to an internal server error) when visited by a user with Googlebot's user-agent. In the Webmaster Help Group, we've seen IIS servers return result code 500 (Server error) and result code 404 (File not found) in the "Web crawl" diagnostics section, as well as result code 302 when Sitemap files are submitted. If your server redirects to an error page, make sure that we can crawl the error page and that it returns the proper result code. Once you've done that, we'll be able to show you these errors in Webmaster Tools as well. For more information about this issue and possible resolutions, please see http://todotnet.com/archive/0001/01/01/7472.aspx and http://www.kowitz.net/archive/2006/12/11/asp.net-2.0-mozilla-browser-detection-hole.aspx.
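A rough way to check this yourself is to request the same URL once with a browser-like user-agent and once with Googlebot's, then compare the result codes. The Python sketch below uses only the standard library. It makes a few assumptions: the browser user-agent string is an arbitrary example, redirects are reported rather than followed (so a 302 shows up as 302), and it can only reveal differences triggered by the User-Agent header, not by anything else about how Googlebot connects.

```python
import urllib.error
import urllib.request

# Googlebot's user-agent string; BROWSER_UA is only an illustrative example.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report 3xx responses instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


# ProxyHandler({}) connects directly, ignoring any *_proxy environment variables,
# so the codes you see are the ones your own server sends.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), _NoRedirect)


def status_for(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP result code the server sends for this user-agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with _opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 3xx, 4xx and 5xx all end up here
        return err.code


def compare_user_agents(url: str) -> dict:
    """Fetch url as a 'browser' and as 'googlebot'; differing codes are a red flag."""
    return {
        "browser": status_for(url, BROWSER_UA),
        "googlebot": status_for(url, GOOGLEBOT_UA),
    }
```

For example, if `compare_user_agents("http://www.example.com/")` returned `{"browser": 200, "googlebot": 500}`, that would point to the kind of user-agent-dependent server error described above.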

If your website is hosted on a Microsoft IIS server, also keep in mind that URLs are case-sensitive by definition (and that's how we treat them), even if your server itself responds to them case-insensitively. This includes URLs in the robots.txt file, so be careful there if your server treats URLs without regard to case. For example, "disallow: /paris" will block /paris but not /Paris.
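Python's standard `urllib.robotparser` module also matches robots.txt paths as case-sensitive prefixes, so it can serve as a quick sanity check for rules like the one above. It is not Google's own robots.txt parser, and `/paris/hotels.html` is just a made-up URL for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example rule from above. Directive names are case-insensitive
# ("disallow" works), but the path after the colon is matched case-sensitively.
ROBOTS_TXT = """\
User-agent: *
disallow: /paris
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/paris", "/paris/hotels.html", "/Paris"):
    allowed = parser.can_fetch("Googlebot", "http://www.example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Because rules are prefix matches, the first two paths are blocked, while /Paris is still allowed. If your server serves both spellings, you would need a rule for each.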

Does your website have systematically broken links somewhere?

Modern content management systems (CMS) can make it easy to create issues that affect a large number of pages. Sometimes these issues are straightforward and visible when you view the pages; sometimes they're a bit harder to spot on your own. If an issue like this creates a large number of broken links, they will generally show up in the "Web crawl" diagnostics section in your Webmaster Tools account (provided those broken URLs return a proper 404 result code). In one recent case, a site had a small encoding issue in its RSS feed, resulting in over 60,000 bad URLs being found and listed in their Webmaster Tools account. As you can imagine, we would have preferred to spend time crawling content instead of these 404 errors :).
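When a list of broken URLs gets long, grouping the URLs by shape can make a systematic cause stand out. The sketch below is one hypothetical approach, not a Webmaster Tools feature. It reduces each URL to its directory, replaces runs of digits with `{n}`, and counts how many URLs share each pattern. The sample URLs are invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def url_pattern(url: str) -> str:
    """Reduce a URL to its directory, with every run of digits replaced by {n}."""
    path = urlsplit(url).path
    directory = path.rsplit("/", 1)[0] + "/"
    return re.sub(r"\d+", "{n}", directory)


def top_patterns(broken_urls, limit=5):
    """Count broken URLs per pattern; one dominant pattern hints at a systematic bug."""
    return Counter(url_pattern(u) for u in broken_urls).most_common(limit)


# Hypothetical 404s: two of them share the same generated pattern.
broken = [
    "http://www.example.com/news/2008/09/post-one.html",
    "http://www.example.com/news/2008/08/post-two.html",
    "http://www.example.com/old-page.html",
]
print(top_patterns(broken))
```

On a real site with thousands of errors, a single pattern that accounts for most of them is a good place to start looking for the template, feed, or link generator that produces those URLs.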

Is your website redirecting some users elsewhere?

For some websites, it can make sense to concentrate on a group of users in a certain geographic location. One way of doing that is to redirect users located elsewhere to a different page. However, keep in mind that Googlebot might not be crawling from within your target area, so it might be redirected as well. This could mean that Googlebot won't be able to access your home page. If that happens, Webmaster Tools will likely run into problems when it tries to confirm the verification code on your site, and your site may become unverified. This is not the only reason a site can become unverified, but if you notice it happening regularly, it would be a good idea to investigate. On this subject, always make sure that Googlebot is treated the same way as other users from the same location; otherwise, that might be seen as cloaking.
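One way to keep geo-redirects consistent is to make the redirect decision depend on location alone. In this Python sketch, the target countries, the landing page path, and the `country_code` input (which would come from whatever IP-geolocation lookup you use) are all assumptions. Because the user-agent is not an input, Googlebot crawling from outside the target area is redirected exactly like a human visitor from that location. That is also why it may not reach your home page.

```python
from typing import Optional

# Hypothetical configuration: the region you target and where everyone else goes.
TARGET_COUNTRIES = {"AT", "CH", "DE"}
OTHER_VISITORS_PAGE = "/international/"


def redirect_for(country_code: str, path: str) -> Optional[str]:
    """Return a redirect target, or None to serve the requested page normally.

    The decision depends only on the visitor's location (country_code is
    assumed to come from your own IP-geolocation lookup). There is
    deliberately no user-agent parameter, so Googlebot gets exactly what a
    human visitor from the same location would get.
    """
    if country_code in TARGET_COUNTRIES:
        return None
    if path == OTHER_VISITORS_PAGE:
        return None  # already on the landing page; avoid a redirect loop
    return OTHER_VISITORS_PAGE
```

Leaving the user-agent out of the signature is the point of the design: there is no code path that could serve Googlebot something different from other visitors in the same place.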

Is your server unreachable when we try to crawl?

It can happen to the best of sites: servers can go down and firewalls can be overly protective. If that happens when Googlebot tries to access your site, we won't be able to crawl the website, and you might not even know that we tried. Luckily, we keep track of these issues, and you can spot "Network unreachable" and "robots.txt unreachable" errors in your Webmaster Tools account when we can't reach your site.
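If you suspect reachability problems, you can at least confirm that your server answers requests for robots.txt. This Python sketch only tests connectivity from wherever you run it. It won't reveal a firewall rule that blocks Google's crawlers specifically, so treat it as a first check, not a substitute for the errors reported in Webmaster Tools.

```python
import urllib.error
import urllib.request
from typing import Tuple


def check_robots_txt(site_root: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Try to fetch <site_root>/robots.txt.

    Returns (True, "HTTP <code>") if the server answered at all, or
    (False, <reason>) if the connection itself failed (DNS error,
    refused connection, timeout, and so on).
    """
    url = site_root.rstrip("/") + "/robots.txt"
    # Connect directly, ignoring *_proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server responded, just not with a 2xx code (e.g. 404 or 500).
        return True, f"HTTP {err.code}"
    except OSError as err:
        # URLError, socket timeouts and connection errors are all OSError subclasses.
        return False, str(err)
```

For example, `check_robots_txt("http://www.example.com")` returns a `(False, ...)` result when the connection can't be made at all, which corresponds most closely to the "unreachable" situations described above.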

Has your website been hacked?

Hackers sometimes add strange, off-topic hidden content and links to questionable pages. If it's hidden, you might not notice it right away, but it can still be a big problem. While the Message Center may be able to warn you about some kinds of hidden text, it's best to also keep an eye out yourself. Google Webmaster Tools shows you keywords from your pages in the "What Googlebot sees" section, so you can often spot a hack there. If you see totally irrelevant keywords, it would be a good idea to investigate what's going on. You might also try setting up Google Alerts or running queries such as [site:example.com spammy words], where "spammy words" might be words like porn, viagra, tramadol, sex or other words that your site wouldn't normally show. If you find that your site actually was hacked, I'd recommend going through our blog post about things to do after being hacked.
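To complement the keyword list in Webmaster Tools, you could scan a saved copy of a page for words your site wouldn't normally use. This Python sketch uses the standard-library HTML parser, which ignores CSS, so text hidden with `display:none` is still checked. The word list is simply the examples from the paragraph above, and the sample page is invented.

```python
import re
from html.parser import HTMLParser

# The example words from the paragraph above; extend with your own list.
SUSPICIOUS_WORDS = ("porn", "viagra", "tramadol", "sex")


class _TextCollector(HTMLParser):
    """Collects all text nodes, including ones hidden with CSS."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def find_suspicious_words(html: str, words=SUSPICIOUS_WORDS) -> list:
    """Return the suspicious words that appear as whole words in the page text."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.chunks).lower()
    # \b keeps "sex" from matching inside words such as "Sussex".
    return sorted(w for w in words if re.search(r"\b" + re.escape(w) + r"\b", text))


page = '<p>Campground rates</p><div style="display:none">cheap viagra, tramadol</div>'
print(find_suspicious_words(page))
```

A hit doesn't prove a hack, and a clean result doesn't rule one out (injected links, for example, live in attributes this sketch doesn't inspect), but unexpected matches are a good reason to look more closely.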

There are a lot of issues that can be recognized with Webmaster Tools; these are just some of the more common ones that we've seen lately. Because it can be really difficult to recognize some of these problems, it's a great idea to check your Webmaster Tools account to make sure that you catch any issues before they become real problems. If you spot something that you absolutely can't pin down, why not post in the discussion group and ask the experts there for help?

Have you checked your site lately?

The comments you read here belong only to the person who posted them. We do, however, reserve the right to remove off-topic comments.

34 comments:

Jenn said...

Google webmaster tools has come a long way from simple sitemap submission to all the analysis and help us optimizers get from the tool. I personally love the content analysis which helps me measure how effective our SEO efforts targeting key terms in content are.

SEO isn't just about meta tags anymore...
Thanks Google for all the great help!

vBulletinSetup Admin said...

Great article, I have web tools setup on one of my homepage tabs :D

saliloli said...

Webmaster Tools has been very useful in improving the performance of a website. It gives the ins and outs of what people are doing on your site. With its new features, it has definitely helped me do a health checkup of my website.

I regularly monitor the performance of my website using Webmaster Tools; it has been very beneficial in improving my ranking, and it has improved my site performance as well.

Veda said...

Thanks for the kind information, John.
But I have an issue in the GWT Duplicate Title & Meta Description section:

Almost two months ago I 301-redirected a page (e.g. /xyz.html) to a page with almost the same content (e.g. /abc.html) and the same title and meta tags. (I've checked the redirect several times using various checker tools, and they return a proper 301 status.)

But these two pages are showing errors for duplicate title and meta description in GWT.

To fix the problem, I made slight changes to the meta description, the title, and the content of the live page (/abc.html) and thought it would be all right now.

But it is still showing the same error even after the above changes.

I should mention that both pages are still cached in Google.

So what should I do now? One option I'm considering is removing the old page (/xyz.html) from the cache using the webpage removal tool in GWT. Will that work? (The page is not returning a 404 error; it is correctly returning a 301 status code.)

Waiting for your kind solution to my problem.
Thanks in Advance!

R. Patel said...

My URL is redirecting to Google, what is going on? I am losing a lot of traffic...Currently holding a contest on my blog...Paying for a custom domain and not getting service. I am upset.

dharchana said...

so we’ve provided Google Webmaster Tools to help webmasters to recognize potential issues before they become real problems. Some of the issues that you can spot there are relatively small (such as having duplicate titles and descriptions ), other issues can …>> (more…)
-------------------
jones
Internet Marketing

Ian M said...

As someone who works with a lot of blue chip client sites, it would be exceptionally useful to be able to set up 'guest accounts', which allow viewing of everything but not changing of anything.

This would be a massive boon to persuading these clients to provide access to their SEO agency.

seoer said...

Great article, but one clarification: IIS never treats URLs differently based on their case.
Both http://www.mywebsite.com/MYPAGE and http://www.mywebsite.com/mypage are exactly the same, and a 200 status code is returned in a standard environment.

Maybe you were talking about Linux-based systems, or an IIS server with some handler installed to "simulate" Linux/Unix web server behaviour.

FlemmingLeer said...

Google webmaster tools content problems not respecting robots.txt

I discovered that the Content problems section is displaying issues which are blocked in robots.txt and therefore should not be displayed as an issue.

When will the Content problems section respect robots.txt blocks?

Foot In Mouth said...

I see much improvement in the webmaster tools; however, I am frustrated by the inconsistent data that displays in the "Top Search Queries". This seems to happen when information shifts from month to month. That is, the impressions and traffic I saw for August while it was August are different from what displays for August now that it is September... If Google is going to block services like Webmaster Gold from checking their SERPs for results, they should provide accurate search metrics for site owners.

Tomas said...

Very nice tools. But could Analytics be moved to faster servers? It has been slowing everything down lately, so I'd rather use statscounter for stats.

John Mueller said...

@Veda Having a URL show up in the Duplicate Title & Meta Description section in Webmaster Tools should be considered more of a warning than an error -- if you're fine with those URLs, then there's no need to change things. I certainly wouldn't use the removal tool just to clean up the warning area. Without knowing the URLs and being able to double-check that it's all set up correctly, I would recommend just leaving it like that. (If you have any doubts about these URLs, feel free to post in the Webmaster Help groups, including your URLs.)

John Mueller said...

@R.Patel I would recommend posting in the Webmaster Help groups; make sure you include the URL & where you are seeing the redirect.

John Mueller said...

@Ian M. That's a good idea, I'll pass it on to the team!

John Mueller said...

@FlemmingLeer It might be that your robots.txt has changed recently and that we're still using the content information which we have from earlier. We would not re-crawl URLs when they are disallowed in the robots.txt, but we generally hold on to the old content if we were able to crawl earlier.

John Mueller said...

@Foot In Mouth: That's a fair point and something I know the team is working on. Thanks for bringing it up!

CS_Swan said...

Google last accessed my home page, www.walleyeworldcampground.com, on August 14th, before the page was changed from "under construction".
I submitted my sitemap 2 weeks ago, and Webmaster Tools says my sitemap is OK and was last downloaded 20 hours ago,
but Google's cache of my site still says "under construction".

Please send help.

Nawri said...

Great article, I'll try it; it's very interesting.

lazar said...

People,
Google webmaster tools unfortunately do not let us tell Google when it has grossly wrong info about a website. I got a new domain, www.generatorguide.net, more than a month ago. My index page is indexed, but no other pages in this domain are in the Google index, and yet the search for "similar pages" [ related:www.generatorguide.net/ ] returns adult sites, which have nothing to do with my site, which is a techy one. I feel a month should be sufficient for Google's technology to properly assess a site. After all, it's not just about my site; it's about the credibility of Google's info.
Ideally, Google's tools would let webmasters flag such cases.

James said...

Is there a diagnostic tool to tell whether your webpage has been stolen or cloned? When I search for our company name, Tru Design Media, in Google, another website with a domain name exactly like our company name is listed, yet we are nowhere to be found. How did they do this? They have our exact title and description, but the cache is different, and the page is different from the cache, the title, all of it. How can I fix this? What did they do? Help!

admin said...

I think these posts from Google are simply great for us.

The really bad thing is that, even when we follow the advice perfectly, we sometimes can't understand why there is NO feedback at all.

My site, for example, was penalized about 4 months ago. Digging deeper into the site, I found that a MySQL problem had caused hundreds of links to break.

I fixed it and waited, hoping for some change in the SERPs... unfortunately, no messages in WMT and no luck.

Very disappointing for me; there's no reason at all to kill my site, even if it is a small site... isn't the web made of small sites?

Regards.

Tru Design Media said...

yay! the problem with our kelowna web design site and the company name tru design media has been resolved...

Herr Lucifer - The Fallen Angel said...

This was almost the best article I have read in this blog. I hope it makes changes to how I administrate my websites in the future...

w kong pung said...

I'm very interested in webmaster tools. I'm not an expert on the Internet, but I like to learn about it.

Paul Ho Kang Sang said...

I signed up for AdSense some months back. Then, some months ago, when I tried to access AdSense, it wouldn't let me log in. Whenever I try (even today),

it gives me this error: https://www.google.com/adsense/noaccount

It says I have NO ACCOUNT, so I go and set up a new account.

Then it says that the person or email already has an account.

I have tried this no fewer than 10 to 20 times. I am really frustrated; can someone please help?

patrick said...

I'm working on auditing my client's architecture. In the Not Found report, I found a lot of URLs that no longer exist on the site. Two things:

1. If they no longer exist, I want them to return a 404, correct? What command should I assign to them?

2. Other pages are linking to them, some on my site and some on third-party sites. What should I tell them to do?

Objectivity said...

Hi - Webmaster Tool "Diagnostics" are incorrect in at least two cases, and I'm concerned whether your bugs will impact Quality Score or Page Rank.

I've encountered a case-sensitivity issue in my Webmaster Tools reporting. I'm reviewing the "Diagnostics -> Content analysis -> Duplicate meta tags" and "Duplicate title tags" reports. In both reports, I'm seeing duplicate references to the SAME pages, where the only difference in the noted URLs is the case.

Please update your Diagnostics to account for the server type. Our site runs on IIS, which is most definitely absolutely NOT case-sensitive as was mentioned in one of your posts here: http://googlewebmastercentral.blogspot.com/2008/09/advanced-website-diagnostics-with.html (the second paragraph of the section titled "Is your server treating Googlebot like a normal visitor?").

I use Google's Webmaster Tools to ensure my site is in good standing with Google, and I worry that any false readings on your part might also impact how we are indexed by Google search.

Please update your Diagnostics to add a server-type check, and ignore any case-issues for sites that are running any Windows/IIS configurations.

Thanks,
TK

Objectivity said...

Also, a follow-up to my last post, Webmaster Tool Diagnostics also doesn't seem to be accounting for the default directory index page. I'm seeing multiple references under the duplicate meta tag and title reports, where the "duplicate" is coming from the exact same page/file name (eg., "/dir/" AND "/dir/default.asp" - are the same file - which you could confirm through a simple content comparison). Again, the concern here is that your bugs might cause the site to be penalized, when we're actually doing everything we're supposed to be doing (to make our site friendly, accessible and compliant with search engine requirements).

That's all...

interclass said...

I asked people at a webmaster forum and unfortunately didn't get any results.

I've been using webmaster tools to make any corrections to our site (http://www.interclass.kiev.ua) and it's been very useful and given many ideas of how to improve it. (Thank you for such a nice tool :)

Unfortunately, this month crawl errors have been appearing for more and more pages: already 65!!! pages now :(((

Crawling errors
Unreachable (65)
www.site/page/ Error 200

Could you advise me on anything before it's too late...

Thank you in advance!

John said...

I am trying and trying to get Google Webmaster Tools to stop showing my site as 'Network Unreachable' but am failing miserably. Using wget, my site returns 200, no problems. Using Googlebot as my user agent, still no problems. I don't know what I am missing, but Webmaster Tools doesn't seem to like my site at all; it's lowering my ranking and not tracking any of the external links I have pointing to my site.

INDIAN said...

Why is AdSense saying that my website is still under construction??

I have already completed my website.

Vinyl Record Site Reviewer said...

you have a broken link in your article:

http://www.google.com/support/webmasters/bin/answer.py?answer=40362

Google Webmaster Central said...

Hi everyone,

As over a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Central Help Forum.

Thanks and take care,
The Webmaster Central Team