Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

URL removal explained, Part III: Removing content that you don't own

Tuesday, April 20, 2010 at 1:26 PM

Webmaster Level: All

Welcome to the third episode of our URL removals series! In episodes one and two, we talked about expediting the removal of content that's under your control and requesting expedited cache removals. Today, we're covering how to use Google's public URL removal tool to request removal of content from Google’s search results when the content originates on a website not under your control.

Google offers two tools that provide a way to request expedited removal of content:

1. Verified URL removal tool: for requesting removal of content from Google’s search results when it’s published on a site of which you’re a verified owner in Webmaster Tools (like your blog or your company’s site)

2. Public URL removal tool: for requesting removal of content from Google’s search results when it’s published on a site whose ownership you can’t verify (like your friend’s blog)

Sometimes the information you want to remove originates from a site that you don't own or can't control. Since each individual webmaster controls their site and their site’s content, the best way to update or remove results from Google is for the site owner (where the content is published) to block crawling of the URL, modify the content at its source, or remove the page altogether. If the content isn't changed at the source, it will simply reappear in our search results the next time we crawl it. So the first step in removing content that's hosted on a site you don't own is to contact the owner of the website and ask them to remove or block the content in question.
  • Removed or blocked content

    If the website owner removes a page, requests for the removed page should return a "404 Not Found" response or a "410 Gone" response. If they choose to block the page from search engines, then the page should either be disallowed in the site's robots.txt file or contain a noindex meta tag. Once one of these requirements is met, you can submit a removal request using the "Webmaster has already blocked the page" option.



    Sometimes a website owner will claim that they’ve blocked or removed a page when they haven’t technically done so. If they claim a page has been blocked, you can double-check by looking at the site’s robots.txt file (located at the root of the domain) to see whether the page is listed there as disallowed.
    User-agent: *
    Disallow: /blocked-page/
    Another place to check if a page has been blocked is within the page’s HTML source code itself. You can visit the page and choose “View Page Source” from your browser. Is there a meta noindex tag in the HTML “head” section?
    <html>
    <head>
    <title>blocked page</title>
    <meta name="robots" content="noindex">
    </head>
    ...
    If they inform you that the page has been removed, you can confirm this by using an HTTP response testing tool like the Live HTTP Headers add-on for the Firefox browser. With this add-on enabled, you can request any URL in Firefox to test that the HTTP response is actually 404 Not Found or 410 Gone.

  • Content removed from the page

    Once you've confirmed that the content you're seeking to remove is no longer present on the page, you can request a cache removal using the "Content has been removed from the page" option. This type of removal (usually called a "cache" removal) ensures that Google's search results will not include the cached copy of the old page or any snippets of text from the old version of the page. Only the current, updated page (without the removed content) will be accessible from Google's search results. However, the updated page can still rank for terms related to the old content because of inbound links that still exist on external sites. For cache removal requests, you’ll be asked to enter a "term that has been removed from the page." Be sure to enter a word that is not found on the current live page so that our automated process can confirm the page has changed; otherwise, the request will be denied. Cache removals are covered in more detail in part two of the "URL removal explained" series.


  • Removing inappropriate webpages or images that appear in our SafeSearch filtered results

    Google introduced the SafeSearch filter to provide search results that exclude potentially offensive content. If you find content that you feel should have been filtered out by SafeSearch, you can request that it be excluded from SafeSearch filtered results in the future by submitting a removal request using the "Inappropriate content appears in our SafeSearch filtered results" option.
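
If you need to check several URLs, the manual checks described above (the robots.txt disallow rule, the noindex meta tag, the 404/410 status code, and confirming that a removed term no longer appears) can be scripted. Below is a minimal Python sketch using only the standard library. It is not a Google tool, and passing these checks does not guarantee that a request will be approved. The function names are hypothetical, the default `Googlebot` user agent is an assumption, and `term_absent` conservatively assumes the term must be missing from the entire page source, since this post doesn't say exactly how the automated check matches terms.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_disallowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """True if the given robots.txt text blocks user_agent from crawling url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Records whether any <meta name="robots"> tag carries a noindex directive.

    Note: this looks at every meta tag in the source, not only those in <head>.
    """

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """True if the page source contains a robots meta tag with noindex."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.noindex


def is_removed_status(status: int) -> bool:
    """Only 404 Not Found and 410 Gone count as 'removed'; a 301 does not."""
    return status in (404, 410)


def term_absent(html: str, term: str) -> bool:
    """Conservative pre-check for a cache removal: the term must not appear
    anywhere in the live page source (case-insensitive substring match)."""
    return term.lower() not in html.lower()


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop at the first response so a 3xx code is reported, not followed."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # declining the redirect makes urllib raise HTTPError(code)


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for url (makes a network call)."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, `fetch_status("https://www.example.com/removed-page")` (a placeholder URL) returns the code the server actually sends. Because redirects are not followed, a page that 301-redirects elsewhere comes back as 301 and fails `is_removed_status`, since only 404 and 410 signal that the page is gone.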

If you encounter any issues with the public URL removal tool or have questions not addressed here, please post them to the Webmaster Help Forum or consult the more detailed removal instructions in our Help Center. If you do post to the forum, remember to use a URL shortening service to share any links to content you want removed.

Edit: Read the rest of this series:
Part I: Removing URLs & directories
Part II: Removing & updating cached content
Part IV: Tracking requests, what not to remove
Companion post: Managing what information is available about you online


26 comments:

Meavo said...

What I don't get is why a 301 redirect is not sufficient. Since a 301 redirect tells a crawler that a web page has disappeared (for whatever reason), this should also be an option.

Some sites only show 404s when pages never existed.

Can you tell us why you only accept 404 and 410 status codes?

leprof said...

This is all very well; but it assumes that webmasters are all very nice chaps who'll happily cooperate with each other and remove content from their site with an "Oh, I'm so sorry", as soon as you ask.
But it does not seem to address the fundamental question of what to do when the content you don't own concerns you, is libellous or defamatory, or is your copyrighted work, and is hosted on a website in a distant country whose webmaster couldn't care a toss about your requests to remove material, and takes absolutely no steps to do so, however wrong, libellous or in breach of copyright that it may be.

John Mueller said...

@leprof In a situation like that, you would have to take the same steps you would take if that content were published in a different medium. Our systems work to make the information that is online accessible; we are not in a position to judge the validity of that information. Once the problematic information is no longer published online, our systems can work to update our view of it in our search results.

In your opinion, what would a potential solution be to the problem that you described? Our teams are always interested in improving our processes, so we welcome feedback from everyone.

John Mueller said...

@Meavo A 301 redirect does not signal that the website or page has been removed; it only signals that it is now available under a different address. Technically, the redirect on its own only affects the URL, not the content. The content is affected by what is shown on the other URL.

If you need to signal that a page has ceased to exist, using a 404 or a 410 HTTP result code would be the correct thing. If the sites which you use are not doing this, feel free to point them to our blog posts (or to the definition of the HTTP result codes). I realize not all sites handle this properly (and we have also had services at Google with this problem), but I really think it's a good idea to stick to these standards wherever possible.

VnPress@net said...

Thanks for this post; through it I learned about removing content from my blog.

Sunshine said...

I totally agree with "leprof".
I have had numerous problems with a specific porn website that doesn't want to cooperate at all. I think these removal tools are a good start, but what about when it's actually impossible to discuss a problem with the webmaster: you can't find a contact person, e-mail address, or anything else on the website, and the contact information doesn't exist when you look up the owner of that domain (e.g. on who.is)? Please, Google, give us advice on how to handle these problems. I think they will grow a lot in the future!!!

Simon said...

If content is on the web you don't like, it really isn't Google's problem. So why should they advise you?

There is legal redress in most countries for copyright infringement, defamation and libel.

Sure legal redress can be expensive and difficult, but that is how it should be when you want to stop someone else expressing themselves freely.

Indeed I'm watching an ongoing issue where a con man has used a DMCA take-down notice to get sites that reveal his scam removed from Google's index. Shows what happens when you make the legal process of silencing others too easy.

leprof said...

@John Mueller. It is alas too easy to hide behind the argument of "the same steps you'd have to take in another medium". This suggests that the Internet is just like any other medium. It is not. Copying, slandering, misinforming and cheating (sometimes quite anonymously) across international borders are very uncommon and often impossible in other media. You say "We are not in a position to judge the validity of information"; but if you were, that would surely be a big improvement? After all, people using Google are looking for valid information (one supposes), not invalid information; so any way you could find to exclude invalid results would be welcomed by 99.99% of users. Any suggestions of how to do this? Why not pay a few hundred people to weed out the junk sites, MfA sites, copyright-infringing sites, slanderous sites, hate sites and other inappropriate results that crop up in Google? People want good results from a search engine, which means finding ways to eliminate the bad ones. If Google doesn't do it, Bing will, or some other search engine.

Meavo said...

@John Mueller

A 301 does indeed not remove the content on the page it refers to... however, most of the time webmasters use this redirect to still pass some link juice to the other page.
If we have a page that is for some reason disturbing to someone and they have a good argument, then the page will disappear and a 301 will point to the homepage.
This occurs a lot on the web and it is totally legitimate. I understand your point of view, but I also see the point of view of websites with thousands of pages... if they get too many 404 pages, they may even get penalized or filtered out of the index.

John Mueller said...

@Sunshine if your full name is showing up on a spammy adult content site, you can request that this is removed through the form at http://www.google.com/support/websearch/bin/request.py?contact_type=name_on_adult_spam_page&hl=en -- There are a few other similar exceptions, which you can find in the removal tool if you select "The website owner won't remove personal content."

John Mueller said...

@Meavo There is no penalty for having URLs that return 404; this is absolutely normal on the web and not something that someone would need to work around with a 301 redirect. A 404/410 HTTP result code is normal (and expected) for content that has been removed. It's possible to make great "page not found" pages that help the user much more than just redirecting them to the homepage :).

holidaysupermarket said...

What a laugh! "Removing content you don't own"! It assumes that you can contact the third-party site owner and that they will respond! I have been trying to do this for months without joy. How about a way for us to tell Google to ignore content on a website we don't own?

Sunshine said...

@John Mueller I have tried this probably 5 times. I never get any answer. I'm starting to think it's not working at all...

Cory said...

I used the public URL removal tool, it "approved" it, except the URL is still listed. The only difference is the description under it isn't there any longer. How do I get the link removed as well?

Cory said...

Just to add, if you search for the content that was removed, it still shows up, despite the fact that the words appear absolutely nowhere on the page (making it irrelevant for that search anyway). The person controlling the site agreed to take it off.

Is it just that it takes time for it to disappear completely? For all intents and purposes, the public URL removal tool "worked" as there is no cached version or listed content under the link. But the link itself is still there (which doesn't make sense, being that the content searched appears nowhere on the page).

Any way I can get it immediately removed?

moskwa said...

@Cory: It sounds like you only requested removal of the cache, not the entire URL. As stated in our second post:

"Google indexes and ranks items based not only on the content of a page, but also on other external factors, such as the inbound links to the URL. Because of this, it's possible for a URL to continue to appear in search results for content that no longer exists on the page, even after the page has been re-crawled and re-indexed. While the URL removal tool can remove the snippet and the cached page from a search result, it will not change or remove the title of the search result, change the URL that is shown, or prevent the page from being shown for searches based on any current or previous content. If this is important to you, you should make sure that the URL fulfills the requirements for a complete removal from our search results."

xx said...

A few months ago, a newspaper published my private information on its page. They eventually amended the article in their e-paper version.
However, yesterday when I googled myself, I still saw that information in the Quick View PDF cache (Google Docs), even though the original newspaper page has been modified. I then sent a request through the public URL removal tool to remove the old cache, but it was denied several times.

I am not the webmaster; the article that concerned me occupied only a small part of the page, and there was other news on it. They definitely won't agree to return an HTTP 404.

What can I do? I need some expert advice. I am so shattered.

I heard from someone that Google won't re-crawl PDF/Adobe files; is this true? Does this mean the outdated Quick View will stay in Google search forever even though the original has been amended?

I really need some help, please

Susan Moskwa said...

@xx: As stated in our previous post, it can take a bit longer for us to recache PDF Quick View links. If you can send me the URL I may be able to speed it up for you.

Brett said...

I need a question answered. I asked a website to remove content from a webpage. The content is no longer visible on the page, but it is still visible in the HTML source code. Is this why it is still showing up in Google search? Will it continue to show up as long as the HTML text is not changed, or will Google's crawl eventually update it?

A related question: how can this content not be visible on the page but still be in the HTML code?

I tried to remove using the public removal tool and it was denied saying that the content appears on a live page.

Please help.

Thanks

F said...

Dear Mr. Mueller,

I hope you can help me. A disgruntled former employee has posted a defamatory blog page about me that pops up in searches of my name.

The blog is hateful, threatening, and encourages violence against me.

The content violates Blogger Content Policy: “Violence: Don't threaten other people on your blog. For example, don't post death threats against another person or group of people and don't post content encouraging your readers to take violent action against another person or group of people.” - http://www.blogger.com/content.g Copyright © 1999 – 2010 Google

I am quite certain that I will not get an "Oh, I'm so sorry" if I request a take-down of the blog.

Please give me further advice about how I can get the blog taken down and all traces removed?

Thank you.

F said...

...sorry, I meant "Dear Mr. Simon" the author. But I would welcome suggestions from anyone! Thank you.

connie said...

I have requested a URL removal, and it's pending. Is there a faster way to get a URL removed? Google said it would take up to 90 days. Some personal information is showing that needs to be removed urgently.

Deborah said...

The same thing has happened to me. Someone that I know has posted an ugly comment about me on Google and on Facebook. It defames me as a person and defames my character in a very negative way. If you have run out of options on your own, then you may need to consult with someone that can legally help you. It will cost you some money, but in the long run will bring you peace of mind, hopefully. The website is: cyberinvestigationservices.com

Google Webmaster Central said...

Hi everyone,

As more than a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Central Help Forum.

Thanks and take care,
The Webmaster Central Team