Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Crawl Errors now reports soft 404s

Monday, June 07, 2010 at 10:17 AM

Webmaster Level: All

Today we’re releasing a feature to help you discover if your site serves undesirable "soft" or "crypto" 404s. A "soft 404" occurs when a webserver responds with a 200 OK HTTP response code for a page that doesn't exist rather than the appropriate 404 Not Found. Soft 404s can limit a site's crawl coverage by search engines because these duplicate URLs may be crawled instead of pages with unique content.

The web is infinite, but the time search engines spend crawling your site is limited. Properly reporting non-existent pages with a 404 or 410 response code can improve the crawl coverage of your site’s best content. Additionally, soft 404s can be confusing for your site's visitors, as described in our past blog post, Farewell to Soft 404s.
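As a rough illustration of the behavior described above, the Python sketch below requests a URL that almost certainly does not exist and reports whether the server answers with a success code instead of an error such as 404 or 410. The function names, the random-path approach, and the example.com address are assumptions made for this example; it is not a description of how Webmaster Tools detects soft 404s.

```python
import urllib.error
import urllib.request
import uuid


def fetch_status(url):
    """Return the final HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is on the exception.
        return err.code


def probe_url(base_url):
    """Build a URL under base_url that almost certainly does not exist."""
    return base_url.rstrip("/") + "/" + uuid.uuid4().hex + ".html"


def serves_soft_404(base_url, fetch=fetch_status):
    """True if a made-up URL is answered with a 2xx success code.

    urlopen follows redirects, so a site that redirects unknown URLs
    to its homepage (and answers 200 there) is also flagged.
    """
    status = fetch(probe_url(base_url))
    return 200 <= status < 300
```

Calling `serves_soft_404("http://www.example.com")` performs a live request; passing a different `fetch` callable lets you try the logic without touching the network.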

You can find the new soft 404s reporting feature under the Crawl errors section in Webmaster Tools.



Here’s a list of steps to correct soft 404s to help both Google and your users:
  1. Check whether you have soft 404s listed in Webmaster Tools
  2. For the soft 404s, determine whether the URL:
    1. Contains the correct content and properly returns a 200 response (not actually a soft 404)
    2. Should 301 redirect to a more accurate URL
    3. Doesn’t exist and should return a 404 or 410 response
  3. Confirm that you’ve configured the proper HTTP response code by using Fetch as Googlebot in Webmaster Tools
  4. If you now return 404s, you may want to customize your 404 page to aid your users. Our custom 404 widget can help.
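The three outcomes in step 2 can be sketched as a small WSGI application in Python. This is only a minimal illustration: the `PAGES`, `MOVED`, and `GONE` tables and every path in them are hypothetical, and a real site would make the same decisions in its own web server or framework configuration.

```python
# Hypothetical site data; the paths are illustrative only.
PAGES = {"/": b"<h1>Home</h1>"}
MOVED = {"/old-page": "/"}      # moved permanently -> new location
GONE = {"/discontinued"}        # removed on purpose


def app(environ, start_response):
    """Minimal WSGI app that picks a status code for each case in step 2."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in PAGES:
        # Correct content: a real 200, not a soft 404.
        status, body = "200 OK", PAGES[path]
    elif path in MOVED:
        # A more accurate URL exists: redirect permanently.
        status, body = "301 Moved Permanently", b""
        headers.append(("Location", MOVED[path]))
    elif path in GONE:
        status, body = "410 Gone", b"<h1>This page has been removed</h1>"
    else:
        # A helpful custom error page is fine as long as the status is 404.
        status, body = "404 Not Found", b"<h1>Page not found</h1>"
    headers.append(("Content-Length", str(len(body))))
    start_response(status, headers)
    return [body]


def status_for(path):
    """Call the app directly (no network) and return its status line."""
    captured = []
    app({"PATH_INFO": path}, lambda status, headers: captured.append(status))
    return captured[0]
```

For local experiments, the app can be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`; the port number is arbitrary.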

We hope that you’re now better equipped to find and correct soft 404s on your site. If you have feedback or questions about the new "soft 404s" reporting feature or any other Webmaster Tools feature, please share your thoughts with us in the Webmaster Help Forum.

The comments you read here belong only to the person who posted them. We do, however, reserve the right to remove off-topic comments.

46 comments:

Indylogic said...

I am getting a soft 404 error in Google Webmaster Tools. But when I enter the same URL in Fetch as Googlebot, I get a 200 response. How do I resolve the soft 404 error?

cmssupport said...

I have a dynamic page that in principle shows a list of events but may not return any events. How can I keep it from being reported as a soft 404?

UK Accountants said...

@cmssupport - why not return a page that says "no events found" with a 200 status?

This should be a useful tool for those of us that look after many websites, the chance to finally track down those 404's that return a different status code. Thanks Google :)

seoer said...

But wasn't this a registered patent of Yahoo!

http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=1&f=G&l=50&d=PG01&p=1&S1=20090157607.PGNR.&OS=dn/20090157607&RS=DN/20090157607

Chris F. said...

I am getting a notice on my blog that it has been blocked due to some spam attack, apparently from a link that was hacked. What can be done about this?

ht990332 said...

Your system is buggy.
One of my pages, which returns 200, is thought by Webmaster Tools to be a soft 404.
Nowhere on the page is there a 'not found' message, missing content, or anything that indicates not found.
Something is wrong.

Chris F. said...

The message I get is: Of the 4 pages we tested on the site over the past 90 days, 2 page(s) resulted in malicious software being downloaded and installed without user consent. The last time Google visited this site was on 2010-06-07, and the last time suspicious content was found on this site was on 2010-06-07.

Malicious software is hosted on 1 domain(s), including awmmagazine.ru/.

1 domain(s) appear to be functioning as intermediaries for distributing malware to visitors of this site, including marshaloftis.com/.

This site was hosted on 1 network(s) including AS15169 (Google Internet Backbone).

InfoTec said...

Oh, thanks for this great information.

How To : http://toohow.blogspot.com

Amit Doda said...

Really good addition.. It will save time for investigating soft 404s

Alex K said...

I am getting soft 404s for my *.swf files, and 404 (Not found) for pages which definitely return 200.

Rich said...

This system is horribly buggy. Half of my product pages are being reported as being soft 404's - they aren't!

craig5320 said...

I too am receiving soft 404s for .swf files. Is this a problem with the way they're embedded in the pages? They are all embedded flash files and not anchors to the files.

Thanks

Craig

Article Guru said...

This feature is allowing me to find errors on my article directory, that I otherwise would not know of.

Dominion Web said...

Anyone know - do I need to care about soft 404s if my robots.txt file excludes pages I don't want crawled anyway?

samjam04 said...

Is it OK to have a soft 404 error for a sitemap page that is actually just my Google verification page? I have it set to Not Show in Menu on my sitemap. Here is an example of the error:
URL: http://www.mywebsite.com/google4de83ce7cc02e51a.html (404-like content)

John Mueller said...

@everyone :-) -- if you are seeing URLs that are not "soft 404s" but listed as such, it would be very useful to know about those URLs. It would be great if you could post some sample URLs, either here or in the help forum; feel free to use a URL shortener if you prefer. Thanks!

Alek said...

For John Mueller:
The two URLs Webmaster Tools told me are soft 404s are:

http://www.smrbathrooms.co.uk/360/W-120-C-180.swf
and
http://www.smrbathrooms.co.uk/360/W-120-G-180.swf

As you can see, they are not 404s but images of our products we have just added to our site.

They are embedded on this page
http://www.smrbathrooms.co.uk/acatalog/Contemporary-Corner-Rad-Valve-Chrome-pair-.html
and this page
http://www.smrbathrooms.co.uk/acatalog/Contemporary-Corner-Rad-Valve-Gold-pair-.html
respectively.

Is there anything in the way we have those pages set up which we could or should change to help the googlebot? Or do we just need to wait for this new feature to get better at distinguishing things?

Any input welcome.

John Mueller said...

Thanks, Alek, I'll pass that on. Given that we don't index SWF files for Image Search, I wouldn't worry about those files (we'll still be able to crawl and index the rest of the pages normally). It's good for our team to have that information though, so that we can work on fine-tuning it to make it more useful to webmasters like you :).

Niagara Falls said...

Hi John,
I've just noticed that a page on one of our sites is incorrectly getting picked up as a soft 404. The link that is getting picked up is below. It is actually listed 6 different ways, with the only difference being the value of the "refid" parameter. The refid parameter is specifically set to "ignore" in the settings.

http://niagarafallsrainforestcafe.com/falls-avenue-entertainment-complex.php?refid=1

What's strange is that there is a page with the same format as this on a couple of our other sites, and as of the time of writing these haven't been picked up at all.

Thanks.

twinkydoodle said...

@John Mueller:

The following are being reported as soft404 but are returning 200 and have no mention of anything being "not found" or "missing." Just regular pages with normal content.

http://beeets.com/events/view/930/7-come-11
http://beeets.com/events/view/931/the-string-slingers-harmony-grits
http://beeets.com/feed/pulse/events/1516

Thanks!

Brian Cryer said...

I applaud google's efforts to identify soft 404s. I can see why they are trying to do it, and I also recognise that any system to identify soft 404s is prone to occasionally mis-classify a page.

I have a page which is wrongly classified as a soft 404: http://www.cryer.co.uk/brian/delphi/error_fnfSHDocVw_TLB.htm

I suppose I can see why: I have "File not found" in the title, and the text refers to "error" a few times. Yet the title is relevant and applicable, and the page isn't a 404 in any sense (it's dealing with a compiler error).

Perhaps google could use their existing system to identify candidates but provide some means for webmasters to appeal where they consider a page has been misclassified?

wcminor said...

I have a type of soft 404 that Google can't detect, and I'm not sure how to fix it. I was experimenting with modifying URLs for SEO purposes and my site got indexed several times before I finalized the URL structure that I wanted. This has resulted in invalid URLs being indexed. In addition to that, the snippets show error messages.


For example, google lists this page: http://blogstalk.com/blog/368/Selgas-Cano-Architecture-Office-by-Iwan-Baan with this snippet:
Fatal error: Call to undefined function minilogin() in /home/udosero/webapps/blogstalk/showpost.php on line 304.

Anyone who visits that link will get a 404 error, because the proper URL is:
http://blogstalk.com/blog/368/Selgas-Cano-Architecture-Office-by-Iwan-Baan/
Note the slash at the end of the URL. My sitemap has been updated with the slash terminated version, and it's clear that my site has been reindexed since this change, but the wrong version is still in the index.

John Mueller said...

@wcminor It sounds like you changed your URLs. In a case like that, I'd recommend setting up 301 redirects to change from the old form to the new one. Over time, as we recrawl your pages, we'll update our index based on that (and until then, users will still reach your new pages thanks to the redirect).

wcminor said...

@John Mueller
I can't easily set up a 301 in this case. The indexed pages were briefly valid URLs that became soft 404 and are now true 404. Due to my .htaccess setup, the URLs without the terminating slash can't be used at all, because I need a delimiter for the regexp to work.

wofgtg said...

Sir!
Thanks for giving your opinion on soft 404 error.
http://wofgtg.blogspot.com/.

admin carhireshop said...

How does Google find out that it's a soft 404?

Kristain77 said...

I am seeing some of my site URL's mentioned in the Crawl Error report under the 'Soft 404s' head. Is this new? It says 404-like content in the details section.
Animals In Australia

sts said...

Yes, this feature is buggy. With my site, Webmaster Tools is reporting links to perfectly valid and functioning PDFs as soft 404s.

Despre said...

For John Mueller:
The entire website is based on a javascript interactive map. Example link below. It returns 200 but it's considered soft 404 by Google.
http://www.desprecluj.ro/hartacluj/index.html?map=cluj&cps=393170.621021198,586801.4192383448,50000&layers=__base__,ceva

Chipkarten said...

When will the Google Webmaster Tools API be updated to reflect most of the changes it had in the last years?

salvesis said...

Recently, we have experienced some pretty scary crawl errors on our Webmaster Tools reports. The crawl errors have been fluctuating from 10,000 to 129,000. We feel helpless because most of the errors are attributable to pages and even directories that Google's bot recognizes even though they do not exist on our server. We contacted our hosting company thinking that we had been hacked, but they claim otherwise. It's like some alien spacecraft has been generating pages and directories for our website that only Google's crawler can see, and because these pages do not exist, a crawl error is generated.

Another crawl error we have been having is with a page that we removed from our website. It was a dynamic page that accepted keyword queries. The removed page still receives thousands of hits from "scrapers" with keywords in Japanese and Swedish. When a hit is made to the removed page, we get a crawl error.

These "scrapers" cloak their IP addresses and our webmaster tools report the sources of these hits as "unavailable," so we cannot identify and control those hits by any way available to us.

In both of these cases, we are helpless to correct these crawl errors, yet Google may be penalizing us for them. We wonder if anyone else has been experiencing similar situations.

Thanks much.

Mike

archivist said...

I have a genuine soft 404, but it's caused by the source link having incorrect GET parameters. Unfortunately, there is no "linked from" field in the report, so I can't see where my error is yet.

http://www.collection.archivist.info/searchv12.php?Type=&Accn_no=&dir=2001/2001_05_11_bishton_clock&file=p1010002.jpg&src=title&Type=&Accn_no=&dir=2001/2001_05_11_bishton_clock&file=p1010002.jpg&srcprog=&scale=96&maincatpage=29&type=pd&accn_no=236&subject=9524&subject=9524&scale=.8

Expert said...

Webmaster tools shows 32 Soft 404s for my site, and all of them are false positives. Although they don't have much content, they are all valid pages. I checked with "Fetch as Googlebot" tool and they return 200 OK.

I can imagine how this hurts my ranking when google engine thinks that I have so many invalid links on my site...
One is a contact page, and all others are categories of a forum, which don't have posts yet.

My question is, how can I correctly serve pages with no content so that Googlebot doesn't think they are 404s?

Here are examples of some of those false positives:
http://it.expertmonster.com/Internet/Web-Development/CSS/
http://it.expertmonster.com/Internet/Web-Development/JavaScript/
http://it.expertmonster.com/contact/

+29 other similar pages

Expert said...

The examples in my previous post already have content in them.
Here is an empty category:
http://it.expertmonster.com/Hardware/Networking-Hardware/
It says "No questions found." on the page. It may have triggered the false positive. Should I replace that message with something else?
And what caused it on the contact page?

exdmd said...

Webmaster tools is reporting a soft 404 error for my contact page at http://herbal-incense.net/contact-us.html

When I test the page using Fetch as Googlebot the page returns fine, so I am at a loss as to how to fix.

dirtza said...

Hello. A few days ago I started getting a soft 404 error for the main page, www.rosuites.com, saying that it is 404-like content. Also, the site doesn't appear in the first results when I search for site:rosuites.com. Does anybody know how to solve this problem?

simplementeunomas said...

Hi all. I have problems with my GWT. Every day the number of crawl errors grows, and all of them are for non-existent pages. Yesterday I had 125; today I have 164. Could someone tell me how I can remove them? I'm losing a lot of traffic for this reason. I requested URL removal, and Google Webmaster Tools only allowed me to remove 115 URLs. Now I am not able to remove the remaining 60 pages. Please help!

Dale said...

One of my sites is now showing the "soft 404" error in Webmaster Tools. It's a file with content and style located at: http://daleallyn.com/contact.php (obviously an important page). Status shows as 200 OK in Fetch as Googlebot.

The file is at the root level; on the same level as my 404.shtml file (which is currently returning "status 200 OK" with Fetch as Googlebot); same level as my sitemap.xml file; same level as my robots.txt file.

This is the first time I have encountered this and does not occur on my other sites (mine or clients'). All pages are W3C valid XHTML 1.0 transitional.

Any input would be appreciated.

Dale said...

After reading several more articles on the topic, and re-evaluating the output at Webmaster Tools, I'm wondering if my contact page is causing the "soft 404" because upon submission of the form the user is redirected (via PHP: header("Location: /thankyou.html");). This line of code is part of the sendmail script and an important function for the user, providing a "thank you" message, navigation options, etc.

I'm not excited about removing this. Maybe there's a better way to present the new location.

WindshieldGuy said...

John Mueller,

I have 14 pages that are returning soft 404s. All pages have legitimate content but have been 301 redirected from a previous site. But most of my pages were redirected. When I fetch as Googlebot I get no error message. All pages have been removed from Google's index. How do I fix this?

http://www.termite-control.net/buckeye.html
http://www.termite-control.net/fountain-hills.html
http://www.termite-control.net/gilbert.html
http://www.termite-control.net/glendale.html
http://www.termite-control.net/gold-canyon.html
http://www.termite-control.net/identifying-the-termite-problem.html
http://www.termite-control.net/mesa.html
http://www.termite-control.net/peoria.html
http://www.termite-control.net/scottsdale.html
http://www.termite-control.net/superstition-mountain.html
http://www.termite-control.net/termite-removal.html
http://www.termite-control.net/termite-treatment.html

Energy said...

It's confusing that I get soft 404s for an old website that was running on ASP.NET. I changed to PHP 1 year ago and there was no problem until I changed web host.

USWT said...

Thanks a lot, I got everything and removed my soft 404 error.

jobisez said...

I would like to see a "linked from" column on the soft-404 page. Maybe I can correct this by contacting the webmaster and asking them to make a change.

Google Webmaster Central said...

Hi everyone,

Since over a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Central Help Forum.

Thanks and take care,
The Webmaster Central Team