Tuesday, April 20, 2010 at 1:26 PMWebmaster Level: All
Welcome to the third episode of our URL removals series! In episodes one and two, we talked about expediting the removal of content that's under your control and requesting expedited cache removals. Today, we're covering how to use Google's public URL removal tool to request removal of content from Google’s search results when the content originates on a website not under your control.
Google offers two tools that provide a way to request expedited removal of content:
1. Verified URL removal tool: for requesting to remove content from Google’s search results when it’s published on a site of which you’re a verified owner in Webmaster Tools (like your blog or your company’s site)
2. Public URL removal tool: for requesting to remove content from Google’s search results when it’s published on a site which you can’t verify ownership (like your friend’s blog)
Sometimes a situation arises where the information you want to remove originates from a site that you don't own or can't control. Since each individual webmaster controls their site and their site’s content, the best way to update or remove results from Google is for the site owner (where the content is published) to either block crawling of the URL, modify the content source, or remove the page altogether. If the content isn't changed, it would just reappear in our search results the next time we crawled it. So the first step to remove content that's hosted on a site you don't own is to contact the owner of the website and request that they remove or block the content in question.
- Removed or blocked content
If the website owner removes a page, requests for the removed page should return a "404 Not Found" response or a "410 Gone" response. If they choose to block the page from search engines, then the page should either be disallowed in the site's robots.txt file or contain a noindex meta tag. Once one of these requirements is met, you can submit a removal request using the "Webmaster has already blocked the page" option.
Sometimes a website owner will claim that they’ve blocked or removed a page but they haven’t technically done so. If they claim a page has been blocked you can double check by looking at the site’s robots.txt file to see if the page is listed there as disallowed.
User-agent: *Another place to check if a page has been blocked is within the page’s HTML source code itself. You can visit the page and choose “View Page Source” from your browser. Is there a meta noindex tag in the HTML “head” section?
<html>If they inform you that the page has been removed, you can confirm this by using an HTTP response testing tool like the Live HTTP Headers add-on for the Firefox browser. With this add-on enabled, you can request any URL in Firefox to test that the HTTP response is actually 404 Not Found or 410 Gone.
<meta name="robots" content="noindex">
- Content removed from the page
Once you've confirmed that the content you're seeking to remove is no longer present on the page, you can request a cache removal using the 'Content has been removed from the page' option. This type of removal--usually called a "cache" removal--ensures that Google's search results will not include the cached copy or version of the old page, or any snippets of text from the old version of the page. Only the current updated page (without the content that's been removed) will be accessible from Google's search results. However, the current updated page can potentially still rank for terms related to the old content as a result of inbound links that still exist from external sites. For cache removal requests you’ll be asked to enter a "term that has been removed from the page." Be sure to enter a word that is not found on the current live page, so that our automated process can confirm the page has changed -- otherwise the request will be denied. Cache removals are covered in more detail in part two of the "URL removal explained" series.
- Removing inappropriate webpages or images that appear in our SafeSearch filtered results
Google introduced the SafeSearch filter with the goal of providing search results that exclude potentially offensive content. For situations where you find content that you feel should have been filtered out by SafeSearch, you can request that this content be excluded from SafeSearch filtered results in the future. Submit a removal request using the 'Inappropriate content appears in our SafeSearch filtered results' option.
Edit: Read the rest of this series:
Part I: Removing URLs & directories
Part II: Removing & updating cached content
Part IV: Tracking requests, what not to remove
Companion post: Managing what information is available about you online