Monday, May 02, 2011 at 9:00 AM
Webmaster level: Beginner/Intermediate
So there you are, minding your own business, using Webmaster Tools to check out how awesome your site is... but, wait! The Crawl errors page is full of 404 (Not found) errors! Is disaster imminent??
Fear not, my young padawan. Let’s take a look at 404s and how they do (or do not) affect your site:
Q: Do the 404 errors reported in Webmaster Tools affect my site’s ranking?
A: 404s are a perfectly normal part of the web; the Internet is always changing, new content is born, old content dies, and when it dies it (ideally) returns a 404 HTTP response code. Search engines are aware of this; we have 404 errors on our own sites, as you can see above, and we find them all over the web. In fact, we actually prefer that, when you get rid of a page on your site, you make sure that it returns a proper 404 or 410 response code (rather than a “soft 404”). Keep in mind that in order for our crawler to see the HTTP response code of a URL, it has to be able to crawl that URL—if the URL is blocked by your robots.txt file we won’t be able to crawl it and see its response code. The fact that some URLs on your site no longer exist / return 404s does not affect how your site’s other URLs (the ones that return 200 (Successful)) perform in our search results.
Q: So 404s don’t hurt my website at all?
A: If some URLs on your site 404, this fact alone does not hurt you or count against you in Google’s search results. However, there may be other reasons that you’d want to address certain types of 404s. For example, if some of the pages that 404 are pages you actually care about, you should look into why we’re seeing 404s when we crawl them! If you see a misspelling of a legitimate URL (www.example.com/awsome instead of www.example.com/awesome), it’s likely that someone intended to link to you and simply made a typo. Instead of returning a 404, you could 301 redirect the misspelled URL to the correct URL and capture the intended traffic from that link. You can also make sure that, when users do land on a 404 page on your site, you help them find what they were looking for rather than just saying “404 Not found.”
Q: Tell me more about “soft 404s.”
A: A soft 404 is when a web server returns a response code other than 404 (or 410) for a URL that doesn’t exist. A common example is when a site owner wants to return a pretty 404 page with helpful information for his users, and thinks that in order to serve content to users he has to return a 200 response code. Not so! You can return a 404 response code while serving whatever content you want. Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s. Both of these cases can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content. Keep in mind that just because a page says “404 Not Found,” doesn’t mean it’s actually returning a 404 HTTP response code—use the Fetch as Googlebot feature in Webmaster Tools to double-check. If you don’t know how to configure your server to return the right response codes, check out your web host’s help documentation.
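To make the "pretty 404 page with a real 404 status" idea concrete, here is a minimal sketch using Python's standard-library http.server. The page contents, the PAGES table, and the function names are assumptions made up for illustration; your own server or CMS will have its own way of setting the status code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content for illustration: path -> HTML body.
PAGES = {"/": "<h1>Home</h1>", "/awesome": "<h1>Awesome</h1>"}

# A friendly error page that helps users find their way.
NOT_FOUND_HTML = (
    "<h1>Sorry, we couldn't find that page</h1>"
    '<p>Try the <a href="/">home page</a>.</p>'
)

def build_response(path):
    """Return (status_code, html_body) for a request path."""
    if path in PAGES:
        return 200, PAGES[path]
    # Helpful content for users, but still a real 404 status for crawlers.
    return 404, NOT_FOUND_HTML

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = build_response(self.path)
        data = body.encode("utf-8")
        self.send_response(status)  # the status line carries 200 or 404
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve(port=8000):
    """Run the demo server on localhost (assumed port) until interrupted."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

Calling serve() and requesting an unknown path should show the friendly page in the browser while the response status stays 404. Returning 200 here, or redirecting to the homepage, would be the soft 404 case described above.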
Q: How do I know whether a URL should 404, or 301, or 410?
A: When you remove a page from your site, think about whether that content is moving somewhere else, or whether you no longer plan to have that type of content on your site. If you’re moving that content to a new URL, you should 301 redirect the old URL to the new URL—that way when users come to the old URL looking for that content, they’ll be automatically redirected to something relevant to what they were looking for. If you’re getting rid of that content entirely and don’t have anything on your site that would fill the same user need, then the old URL should return a 404 or 410. Currently Google treats 410s (Gone) the same as 404s (Not found), so it’s immaterial to us whether you return one or the other.
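The decision above can be sketched as a small lookup. This is only an illustration: the LIVE, MOVED, and REMOVED tables and the example paths are hypothetical, and a real site would derive them from its own content.

```python
# Hypothetical URL inventory for illustration only.
LIVE = {"/", "/kittens"}                 # pages that exist today
MOVED = {"/old-kittens": "/kittens"}     # old URL -> new URL for moved content
REMOVED = {"/discontinued-product"}      # content deliberately taken down

def response_for(path):
    """Return (status_code, redirect_target_or_None) for a request path."""
    if path in LIVE:
        return 200, None
    if path in MOVED:
        # Content lives elsewhere: send users and crawlers to it.
        return 301, MOVED[path]
    if path in REMOVED:
        # Content is gone for good; per the post, a 404 would be treated the same.
        return 410, None
    # Nothing was ever here.
    return 404, None
```

For example, response_for("/old-kittens") yields a 301 pointing at "/kittens", while an unknown path falls through to a plain 404.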
Q: Most of my 404s are for bizarro URLs that never existed on my site. What’s up with that? Where did they come from?
A: If Google finds a link somewhere on the web that points to a URL on your domain, it may try to crawl that link, whether any content actually exists there or not; and when it does, your server should return a 404 if there’s nothing there to find. These links could be caused by someone making a typo when linking to you, some type of misconfiguration (if the links are automatically generated, e.g. by a CMS), or by Google’s increased efforts to recognize and crawl links embedded in JavaScript or other embedded content; or they may be part of a quick check from our side to see how your server handles unknown URLs, to name just a few. If you see 404s reported in Webmaster Tools for URLs that don’t exist on your site, you can safely ignore them. We don’t know which URLs are important to you vs. which are supposed to 404, so we show you all the 404s we found on your site and let you decide which, if any, require your attention.
Q: Someone has scraped my site and caused a bunch of 404s in the process. They’re all “real” URLs with other code tacked on, like http://www.example.com/images/kittens.jpg" width="100" height="300" alt="kittens"/></a... Will this hurt my site?
A: Generally you don’t need to worry about “broken links” like this hurting your site. We understand that site owners have little to no control over people who scrape their site, or who link to them in strange ways. If you’re a whiz with the regex, you could consider redirecting these URLs as described here, but generally it’s not worth worrying about. Remember that you can also file a takedown request when you believe someone is stealing original content from your website.
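For the regex-inclined, here is one possible sketch of the idea. It is not the technique from the linked article, just an illustration: it assumes the junk arrives percent-encoded in the request path, and the function name and the known_paths set are made up for this example. It only suggests a redirect when the cleaned-up path is a URL you know exists.

```python
import re
from urllib.parse import unquote

# Characters that should not appear in a clean path and usually mark
# where scraped HTML was glued onto the URL.
JUNK = re.compile(r"""["'<>\s]""")

def redirect_target(raw_path, known_paths):
    """Return the real path to 301 to, or None if no redirect applies."""
    decoded = unquote(raw_path)  # turns %22 into a quote, %20 into a space
    match = JUNK.search(decoded)
    if match is None:
        return None  # nothing tacked on; let normal handling decide
    candidate = decoded[:match.start()]
    # Only redirect to URLs that really exist; otherwise keep the 404.
    return candidate if candidate in known_paths else None
```

With known_paths containing "/images/kittens.jpg", a request for that path followed by %22%20width=%22100%22 would map back to the image, and anything else would still return a 404.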
Q: Last week I fixed all the 404s that Webmaster Tools reported, but they’re still listed in my account. Does this mean I didn’t fix them correctly? How long will it take for them to disappear?
A: Take a look at the ‘Detected’ column on the Crawl errors page—this is the most recent date on which we detected each error. If the date(s) in that column are from before the time you fixed the errors, that means we haven’t encountered these errors since that date. If the dates are more recent, it means we’re continuing to see these 404s when we crawl.
After implementing a fix, you can check whether our crawler is seeing the new response code by using Fetch as Googlebot. Test a few URLs and, if they look good, these errors should soon start to disappear from your list of Crawl errors.
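If you want a quick local check alongside Fetch as Googlebot, a short script can show the raw status code a URL returns. This is a rough sketch under assumptions: it uses Python's http.client (which does not follow redirects), the function names are invented here, and it does not reproduce how Google's crawler actually fetches pages.

```python
import http.client
from urllib.parse import urlsplit

def fetch_status(url, timeout=10):
    """Return (status_code, Location header or None) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def describe(status, location=None):
    """Summarize what a status code means for a page you removed or moved."""
    if status in (404, 410):
        return "not found or gone (404/410)"
    if status == 301:
        return f"permanent redirect to {location}"
    if status == 200:
        return "serves content (200); a soft 404 if the page no longer exists"
    return f"other status {status}"
```

A typical use would be describe(*fetch_status(url)) for a handful of the URLs you fixed, replacing url with your own addresses.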
Q: Can I use Google’s URL removal tool to make 404 errors disappear from my account faster?
A: No; the URL removal tool removes URLs from Google’s search results, not from your Webmaster Tools account. It’s designed for urgent removal requests only, and using it isn’t necessary when a URL already returns a 404, as such a URL will drop out of our search results naturally over time. See the bottom half of this blog post for more details on what the URL removal tool can and can’t do for you.
Still want to know more about 404s? Check out 404 week from our blog, or drop by our Webmaster Help Forum.



36 comments:
Another great post. As a full time software developer, I have to be a part time webmaster. Articles like this help me understand technical issues and prioritize the issues I need to address.
I've driven home from the office more than once in the past couple of months wondering how important a couple of soft 404s that I saw in analytics were to my site's searchability and reputation. I'm glad to keep them on the to-do list without letting them keep me awake at night. Thank you!
Thank you for answering some of the more "cloudy" areas of the Webmaster Tools results. These are definitely great resources so keep them coming!
On another note, I've always thought it would be beneficial to have 404 errors reported on the Sitemap exports when you do "All Sites". The export already shows warnings, submission dates, downloaded dates, indexed URLs, etc., but I think one missing key metric could be 404 errors. Any chance that could be added to the export report? It would make a lot of our lives easier to glance at the more than 100 sites in the reports for a quick view of 404s versus going to each site's dashboard.
Thanks!
A.M.
What I don't understand is why Google doesn't follow the above advice and make its 404 more useful (i.e. provide links to help people find the content)
http://www.google.com/seerch
Okay, but aren't broken links and dead links a disaster waiting to happen?
Great help to finetune our rewritten site's 404/301 pages.
Google's own 'takedown' request page mentioned in this article is broken. Quite ironic really.
WOW! this is simply awesome and very helpful post. I've been looking for everything on 404. Tx for posting.
Takedown requests are not honored by Google. I've raised several takedown requests but none of them have been honored. If you don't entertain them, then why publicize the process at all?
On a regular basis I see spammy websites using scraping scripts to copy posts, content, etc. In most cases they create broken links, which show up in the crawl error report. I can understand that this doesn't harm search ranking; however, it makes life a pain when looking at the crawl error report.
Google has money and resources, so why not make the takedown process simpler if you really wish to implement your advocacy of keeping the Web clean and spam free?
Unfortunately this doesn't work on Blogger sites. Sometimes I can't block or remove unexpected or expired content.
What about internal 404 links? Does it hurt me if I (accidentally) produce a lot of 404 links from www.mydomain.com/a.html to www.mydomain.com/b.html, .../c.html and so forth? Does Google see that as a sign for "Bad Quality" of my page?
We REALLY need a way to clear out "bad" 404s from Webmaster Tools. I work on a site that shows 400+ 404s but all internal links have been fixed (ie. the broken links have been corrected) and there is nothing I can do about the external links (most are from auto-generated sites that are NOT going to respond to a request to fix the link). Unfortunately with 400+ noise reports sitting there, the tool becomes all but worthless. We've simply given up using your tool and fix internal links using our own link checker spider.
Thank you for this post,is very interesting.
You can find here the italian translation of this post.
Potete trovare qui la traduzione italiana di questo post.
Emanuele
This is a VERY timely blog post.
I have been very concerned about a site from Asia that has created 3,000+ links to our eCOMM site and of course EVERY single one returns a 404.
Can this hurt our crawl budget or rankings?
I've read this post in its entirety but I believe this may be a spam tactic, which this post did not cover.
Can this scenario I have described be hurting our site and / or rankings due to the excessive nature of what they are doing?
I have emailed their support@ address and no luck.
What options do we have?
Thank You For Any Information You Could Provide.
Michael
I recently saw a website where, for "security reasons", the admin forbade access to non-existent/dead URLs, and now instead of 404 they return 403 Forbidden.
Does this somehow affect ranking, and how does it differ from a 404 for search engines?
Hi, I have one question. My site uses the robots meta tag with the values index, follow. When a page on my site doesn't exist, these values become noindex, follow, and an error message appears (within the site design) saying that the URL is broken or the content was removed. Is this still a bad way to deal with 404s? Does that page still need to return a 404 response code?
I believe I have to build a 301 redirect because, after reading this article, getting the readers or users of my websites to the right page is much more important for search engines to search. My online reputation will be spoiled if they keep getting 401 or 404.
So can you confirm that there's no signal that looks at the percentage of 404s versus total crawl?
Or, in essence, is there no signal that measures the increased probability of a surfer encountering a 404, which would point to poor user experience? In particular, if those 404s are generated internally and not from external links.
This is a great post that answers many questions I had regarding 404s. In general, if the URLs that 404 are not the important ones (traffic/revenue-driving ones, top landing pages, etc.), we can ignore them. We should be careful when there is a huge jump or decrease in the 404s, because this may be related to some server/back-end malfunction.
Thanks for the post. Good call highlighting soft 404's which we find particularly useful when auditing websites.
Very interesting article. I added the following PHP code to http://www.cartelinternetservices.nl and now it is returning the correct response code.
PHP code: header("HTTP/1.0 404 Not found");
Now our HTTP response codes are in sync with our content (which already printed 404).
I am wondering how often Google crawls a site to find crawl errors. I fixed the old crawl errors 6 weeks ago, but the report has NOT been UPDATED by Google yet.
What can I do to have Google re-evaluate my site on this matter?
Please help me. Thanks.
I've set all my older removed pages to return a 410 error code but it still appears under "Not found".
It's nearly 5 months since I removed the content and made those links return 410 codes, but the detected date is Jul 11, 2011 !!!!!
Does a sudden large number of 404s trigger a filter? We were 301 redirecting un-found products pages to the category page and switched to 404 not found. This resulted in a few thousand 404 pages in the crawl errors. Right after that, all search results for every page on the domain have been pushed back to page 30 or 40 in the rankings down from mostly page one positions.
Very informative and interesting article...thanks for sharing...it clarified many ifs and buts of mine..
RE: "Currently Google treats 410s (Gone) the same as 404s (Not found), so it’s immaterial to us whether you return one or the other."
When is Google going to fix this, which I believe to be improper behavior, and use the 410 response to explicitly remove the page, as the 410 response is supposed to be handled?
It is very frustrating and confusing to see urls that we know from "Fetch as Googlebot" are returning 410s, instead showing up as "404 - Not Found" in the Detail column of Webmaster Tools.
At the very least, when a 410 is returned, what is shown in the Webmaster Tools error reports should be the actual 410 response, and not a bogus 404. Ideally, if someone there feels that they still need to be reported, then 410s should have their own error report section, just as "Not Followed", "Not Found", and "In Sitemaps" are now reported separately.
I have 2 questions:
1] I have seen Google show crawl errors related to other search engines where the long URL is not properly displayed. It means Google indexed the search result URLs of other search engines and shows the errors. There are too many of them. How can we fix them? I guess by robots.txt, but I am afraid that I may lose some actual inbound links as well. So please suggest a solution.
2] If too many such broken URLs are shown, which were not created by me but by other search engines or their results, will it harm the SERPs for the site?
Hi,
I am having a problem with my Blogger blog. I am using a custom domain, and for the last two days, instead of my blog, the browser has been returning an "Error 101 (net::ERR_CONNECTION_RESET): The connection was reset." message.
I don't know much about the problem. Or what to do?
Please can anyone tell me now what to do?
My site address is www.picwall.info
thanks
Good one. I had a lot of 404s for non-existent files, but I'm now happy not to worry about them. Thanks a lot.
Unfortunately I have 11,233 404 errors, the reason being that I have a large site and changed my site navigation around 3 times. Ever since, I have literally vanished from Google. I know that Matt Cutts says that 404 errors don't affect your site in a bad way, but when you have this many errors it obviously does, as previously I was sitting at no. 1 on page 1 for my main key phrase.
Lessons have been learnt and I won't make the same mistake again; however, it doesn't help the situation I'm currently in. Can anyone give me advice on how I go about rectifying this situation?
Thanks,
Freddie.
What is the best practice when upgrading an 'old' site to something more modern (including permalinks for example) - And handling large numbers of external linking 404 errors.
Is using a rel canonical to the home page in the 404 error page itself a bad idea?
My website Ypetshop.com has 500+ 404 pages as I changed my site content recently. But I think these hundreds of dead links are causing issues with my site's ranking. So, I'm removing all of the links one by one manually in Webmaster Tools.
Does anyone know if there is a way to keep Google from re-listing 404s in the Crawl Errors section? We want to continue to use this tool to find 404s on our site, but there are thousands of bad links from scrapers and bad typers. These 404s clog up our Not Found section, making it virtually useless. I guess it would be something like disavow links, but telling Google that we know these links are bad, so please don't tell us about them anymore.
Thanks
It is really annoying that Google Webmaster Tools is automatically crawling pages which we never created, and because of this the number of 404 (page not found) errors is increasing day by day.
But what is strange is that the missing pages are all like this:
autoworldcar.com/lamborghini-aventador-lp700-4/lamborghini-aventador-front-view/test.com/page/2/test.com
autoworldcar.com/tag/2013-ford/test.com/page/11/test.com/page/2/
Autoworld doesn't have these pages created anywhere.
The "regular" pages are all okay.
I checked with my hosting company as well, and we tried to find out why Google is crawling these pages, but we were not able to find the problem.
Please guide me on how to remove these errors.
This includes the possibility of errors such as 404? example image below