Monday, October 13, 2008 at 2:03 PM
Ever since we released the crawl errors feature in Webmaster Tools, webmasters have asked for the sources of the URLs causing the errors. Well, we're listening! We know it was difficult for those of you who wanted to identify the cause of a particular "Not found" error, in order to prevent it in the future or even to request a correction, without knowing the source URL. Now, Crawl error sources makes the process of tracking down the causes of "Not found" errors a piece of cake. This helps you improve the user experience on your site and gives you a jump start for links week (check out our updated post on "Good times with inbound links" to get the scoop).
In our "Not Found" and "Errors for URLs in Sitemaps" reports, we've added the "Linked From" column. For every error in these reports, the "Linked From" column now lists the number of pages that link to a specific "Not found" URL.
Clicking on an item in the "Linked From" column opens a separate dialog box which lists each page that linked to this URL along with the date it was discovered. The source URL for the 404 can be within or external to your site.

For those of you who just want the data, we've also added the ability to download all your crawl error sources at once. Just click the "Download all sources of errors on this site" link.

Again, if we report crawl errors for your website, you can use crawl error sources to quickly determine whether the cause is on your site or someone else's. You'll have the information you need to contact the other site's owner to get it fixed, and if needed, you can still set up redirects on your own site to the appropriate URL. Just sign in to Webmaster Tools and check it out for your verified site. You can help people visiting your site, from anywhere on the web, find what they're looking for.
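If you work from a list of source URLs, the "your site or someone else's" check can be scripted. The following is only a minimal sketch: the function names `normalize_host` and `classify_sources` are hypothetical, the example URLs are made up, and it assumes you already have the source URLs as a plain list (it does not assume any particular layout for the downloaded file). It also assumes that `www.example.com` and `example.com` count as the same site, and that any other subdomain counts as external.

```python
from urllib.parse import urlsplit


def normalize_host(host):
    """Lowercase a hostname and drop a leading 'www.' (an assumption of this sketch)."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def classify_sources(site_url, source_urls):
    """Group linking pages into those on your own site and those on other sites."""
    site_host = normalize_host(urlsplit(site_url).hostname or "")
    groups = {"internal": [], "external": []}
    for url in source_urls:
        host = normalize_host(urlsplit(url).hostname or "")
        key = "internal" if host == site_host else "external"
        groups[key].append(url)
    return groups


# Hypothetical example data: pages that link to a "Not found" URL.
sources = [
    "http://example.com/old-page.html",
    "http://blog.example.org/some-post",
    "http://WWW.EXAMPLE.COM/archive/",
]
result = classify_sources("http://www.example.com/", sources)
print(len(result["internal"]), "internal,", len(result["external"]), "external")
```

Internal sources are links you can fix yourself; external ones are candidates for contacting the other site or adding a redirect on your side.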
Written by Jonathan Simon, Webmaster Trends Analyst and Michael Williamson, Webmaster Tools Intern


49 comments:
hi im not sure if this is the right place to ask, but currently in my sitemaps google webmaster tools, i encounter this error below and it wont crawl the RSS of my site for some reason. the address blog/?feed=rss2 isn't the feedburner rss i use, but i can't find the option to change the address in the tools site. would you know how i can do this? would appreciate your help. thank you.
blog/?feed=rss2
RSS Feed 53 minutes ago Errors 1
This is a very welcome addition to Webmaster Tools, thank you. I had been scratching my head with URL not found messages on Webmaster Tools before this.
Now, I still get a URL not found error, and the pop-up shows the "Discovery Date" is Jun 13, 2008, and the link is from two of my own pages. However, the "Problem detected on" date is Sep 21, 2008. I checked (and double checked) - all links pointing to the URL were removed in August!
Just curious - does that mean Google is crawling the pages but not verifying rectified broken links?
I didn't know where my crawl errors were coming from, but now I do... I added page-tagging for Google Analytics with the command trackpageview(../../..); Somehow the tag I give in this command is seen as a link, but I only use it to group pages within Google Analytics.
Thank you!
This removes a big source of frustration. Keep up the good work!
Woohoo! The best news I've had in ages. Thank you so much!!!
Glad to read that finally this functionality requested by tons of users (me included) has been finally implemented.
I don't see it in my WMT yet, but I think it is simply a delay in the upgrade process.
Bye
Andrea
Nice, but there's a problem -- whenever you close the in-page popup dialog (the one after clicking "pages") the page will scroll way up to the top again. So if you want to click on pages one by one in the list, you'll always lose your last point of focus.
Thank you for this feature - I was waiting for this. Knowing which broken URLs were linked to, without knowing which pages linked to them, was next to useless for me.
This is good news! Thanks!
Very useful! However, I've just been through the whole list for one of my sites and it says that I have 51 links to index.htm (when my homepage is index.html). I've just been to all of them and checked the source and they are all just linking to the domain name (no index. anything in the code). Surely this is not good for my site as Google now thinks it has 51 links to a dead page?? If this was fixed it would be a great tool!
THANK YOU! This is a great addition.
This is a great feature that I use extensively. I had a bunch of problems with http://www.dannedelko.com which I had created myself through testing rewrite rules with mod_rewrite and Apache.
Webmaster Tools helped me track down many of the problems. So this leads to the answer to the question: Does implementing a Google XML sitemap help increase your rankings?
Well no.
But yes. Since you can see mistakes you are making on your domain, repair them and properly lead Googlebot through the site, you get better, more accurate indexing.
Love the answer, well no but yes!
- Dan Nedelko
Finally. nice addition
Hi, I thought I could track down all of the reported 404s at last - but it turns out the links are from other pages which also don't exist in my site any more!
Also, I've started getting 'Google Alerts' about some of these non-existent pages ... anyone suggest what might be wrong?
Excellent feature. It will help with SEO and with general web usability.
How's about providing the source URLs for the 'In external links to your site' section of 'What Googlebot sees'? For instance at the moment a webmaster can expand on a phrase to see its upper case & lower case variations but cannot see which URLs contain these phrases in links to their site. The 'Pages with external links' section would work well if integrated with some anchor text information.
Good job on this feature though.
This is an awesome addition to Webmaster Tools. It was always a little tricky to do this with the various analytics tools I've worked with, which led us to create our own solutions for tracking and fixing 404s. Eventually we "productized" it into www.errorlytics.com - which is in beta right now with free access, if anyone who is interested in this subject matter would like to try it and shoot us some feedback. It actually lets you fix the 404s once it, or Google Webmaster Tools, finds the source of the 404s. It lets you auto"magically" create search engine friendly 301 redirects without having to know regex or dig into your .htaccess file every time you need to correct a 404.
Thank you
David
Hi,
This is brilliant. Something I have been requesting and waiting for since quite some time. A big thank you to the Google Webmaster team.
Keep up the good work.
Google rocks!!!
A nice feature, really useful!
Many thanks Google :)
I have a similar problem to the one Steve E posted. Any way to correct this?
Steve E said...
Very useful! However, I've just been through the whole list for one of my sites and it says that I have 51 links to index.htm (when my homepage is index.html). I've just been to all of them and checked the source and they are all just linking to the domain name (no index. anything in the code). Surely this is not good for my site as Google now thinks it has 51 links to a dead page?? If this was fixed it would be a great tool!
This sounds like a great tool to use. Thanks for the heads up.
Hi,
Nice added service and very useful one, it is very helpful.
But since "Linked From" was added, I don't see the crawled date after I log in to GWT.
More in http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/1edee281f2d85dfc#
thankyou,
I am trying to get my website http://www.hireweddingcar.co.uk/ up onto the 1st page of Google. Google says I have errors. I do not totally understand everything about Google but I am trying to learn. I want my website to feature for wedding car and other wedding-related search phrases. Can you help?
I like this newest addition to the webmaster tools. It was definitely necessary.
But there is a problem with your tool. It informs me that I have 86 broken links, which truly are not broken, and then it lists the sources for these broken links as pages that I removed long ago and that no longer exist. So the sources of my 404 errors return 404 errors themselves.
My bosses check the webmaster tools just like me. They ask why it says so many broken links and they become panicky, and I have to tell them, "I don't know what the deal is, our link structure is almost perfect, Google's software is faulty."
Awesome new feature!
I am finding that when I click on the X Pages Linked From links, I get an error message, rather than the URLs:
Our system is currently busy. Please try again in a few minutes.
Seems this might be a bug, no amount of time or refreshing fixes this error.
Hope you can look into this.
Thanks!
I have used this its great matt :-)
Would be cool if google could make a tool that sets a page that was not found to the url root :-) you could set a custom 404 in webmasters tools something like for example when you can tell google to include www. or just http://site
"Hi, I thought I could track down all of the reported 404s at last - but it turns out the links are from other pages which also don't exist in my site any more!
Also, I've started getting 'Google Alerts' about some of these non-existent pages ... anyone suggest what might be wrong?"
I agree this is a great new facility, but the inclusion of old links which have long gone does make it more difficult to trace where the old link comes from.
In many cases I have seen 6 links shown, five of which have long gone but only one still exists.
Can something be done about that?
But overall great to see this new facility.
What an excellent addition!
However I get the same issues as Click by Lavalife gets.
I also have the message
Our system is currently busy. Please try again in a few minutes.
I've been trying since yesterday.
I too have recent (problem detected on Oct 2008) errors showing from long-gone (2006!) pages; does Google do the analysis on pages cached long ago? When I check broken internal links on my site I don't get these errors.
I also get external errors for people linking to www.peteranne.it/index.html (we don't have index.html, but index.htm), but if I go and look at these pages, they link to www.peteranne.it...any idea, therefore, why we get this error??
Does it matter that external links are pointing to your site, as far as SEO is concerned? For example, lets say you purchased a previously owned domain and create completely new pages. Will links to the now non-existent pages hurt your site rankings if they end up as 404 errors?
Nice news, thank you buddy. This is very good news; when I first saw it in Google Webmaster Tools I couldn't understand it.
It would be nice to be able to remove errors from our own Google Webmaster Tools | Overview page.
I'm seeing errors that will never be fixed. They are request parameters that have been changed and other similar changes.
Without the ability to "clean the slate" so to speak, that list of errors is going to get big and eventually useless as we would have to figure out the "real" crawl errors from the old/no longer relevant crawl errors.
Hi, I am trying to get google to index my website, but after 3 weeks i've given up! My website can be easily crawled (tested it), but if i look on igoogle at what googlebot sees, it says: "Our system is currently busy. Please try again in a few minutes" What does it mean? And how can i correct it? Any help would be appreciated! Thanks Sydney...
Sometimes it takes a while to get indexed. You just have to be patient.
Hi, I have a question… our server went down due to a server migration and now one of our sites shows over 22000 not found URLs. The site has been back online for about a week now and I was wondering how I can remove those 22000 not found URLs, since all the pages are back online now?
Thanks
As your pages get revisited by Google's crawler these errors should go away over time. If you have further questions you should post them to the Webmaster Help Forum:
http://www.google.com/support/forum/p/Webmasters?hl=en
This looks good and it's showing the links for 404 pages.
How can I see the links for "URLs restricted by robots.txt" errors? Where can I find out which files are restricted by robots.txt?
But can we remove them from the Google index so that SEO will improve...
Hi, I changed all the URLs to different addresses lately. And now I have got many links Not Found in web crawl. Can anyone tell me what I should do?
hi
thanx a million
...been turning the web upside down for this info after i found out how many broken links i have on my sites...
now it will be a lot easier to fix
I don't know if this is the right place to ask, but how can we remove all the not found URLs from the Google index in a single click or an easier way...
Very good information. Thanks a lot
I am getting an error saying that I have two internal links pointing to a non-existent page called /a%3E
I cannot find any such reference looking through the source code of the supposed "culprit pages".
Where does Google get this from?
Thanks
Hi,
is there an option to generate reports using Web Master Tools and forward these onto third parties?
Your webmaster tools continually reports that my robots.txt file is restricting http://www.harmonieii.co.uk/nancymay/index.html yet the file does not contain any such disallow syntax???
I have a blog at wisconsinwhitetail.blogspot.com can someone please tell me if they find anything wrong that is crawling specific. I am not getting any issues, but crawling stats are not popping up.
Is there any way to have errors that were resolved removed? It seems that even though we've fixed some errors in links in our own pages, the errors are still showing up in Webmaster tools. The errors are fixed, why is the crawler still finding them?
Hi,
I've been using Webmaster Tools for a while and am very satisfied, but for the last few days I've been seeing 404 pages whose sources are unavailable. Is anyone else facing this type of issue...
Thank you! Been looking for a tool to help me out and some tips. This one will do nicely especially in identifying some errors I have found.
Thanks!