Monday, June 09, 2008 at 3:40 AM
Written by Sven Naumann, Search Quality Team
Since duplicate content is a hot topic among webmasters, we thought it might be a good time to address common questions we get asked regularly at conferences and on the Google Webmaster Help Group.
Before diving in, I'd like to briefly touch on a concern webmasters often voice: in most cases a webmaster has no influence on third parties that scrape and redistribute content without the webmaster's consent. We realize that this is not the fault of the affected webmaster, which in turn means that identical content showing up on several sites is not in itself regarded as a violation of our webmaster guidelines. It simply triggers further processing intended to determine the original source of the content. Google is quite good at this: in most cases the original content is correctly identified, so there are no negative effects for the site that originated it.
Generally, we can differentiate between two major scenarios for issues related to duplicate content:
- Within-your-domain-duplicate-content, i.e. identical content which (often unintentionally) appears in more than one place on your site
- Cross-domain-duplicate-content, i.e. identical content of your site which appears (again, often unintentionally) on different external sites
In the second scenario, you might have the case of someone scraping your content to put it on a different site, often to try to monetize it. It's also common for many web proxies to index parts of sites which have been accessed through the proxy. When encountering such duplicate content on different sites, we look at various signals to determine which site is the original one, which usually works very well. This also means that you shouldn't be very concerned about seeing negative effects on your site's presence on Google if you notice someone scraping your content.
In cases when you are syndicating your content but also want to make sure your site is identified as the original source, it's useful to ask your syndication partners to include a link back to your original content. You can find some additional tips on dealing with syndicated content in a recent post by Vanessa Fox, "Ranking as the original source for content you syndicate".
Some webmasters have asked what could cause scraped content to rank higher than the original source. That should be a rare case, but if you do find yourself in this situation:
- Check if your content is still accessible to our crawlers. You might unintentionally have blocked access to parts of your content in your robots.txt file.
- You can look in your Sitemap file to see if you made changes to the particular content which has been scraped.
- Check if your site is in line with our webmaster guidelines.
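The first check in the list above can be approximated locally. The following is a minimal sketch, assuming Python's standard `urllib.robotparser` module; the function name `blocked_urls`, the sample robots.txt rules, and the example.com URLs are hypothetical and only for illustration. It is not a Google tool, and its matching rules may differ in detail from how Google's crawlers actually interpret robots.txt, so treat the result as a first hint rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent.

    robots_txt is the text of a robots.txt file, urls is a list of absolute
    URLs on the same site. The helper name and defaults are illustrative.
    """
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and pages, made up for this sketch.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

EXAMPLE_URLS = [
    "https://www.example.com/articles/original-post.html",
    "https://www.example.com/private/draft.html",
]

if __name__ == "__main__":
    for url in blocked_urls(EXAMPLE_ROBOTS_TXT, EXAMPLE_URLS):
        print("Blocked for crawler:", url)
```

To run this against a real site, paste the contents of your own robots.txt file and the URLs of the pages that were scraped in place of the sample values.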
If you would like to further discuss this topic, feel free to visit our Webmaster Help Group.
For the German version of this post, go to "Duplicate Content aufgrund von Scraper-Sites".