Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Duplicate content due to scrapers

Monday, June 09, 2008 at 3:40 AM



Since duplicate content is a hot topic among webmasters, we thought it might be a good time to address common questions we get asked regularly at conferences and on the Google Webmaster Help Group.

Before diving in, I'd like to briefly touch on a concern webmasters often voice: in most cases a webmaster has no influence on third parties that scrape and redistribute content without the webmaster's consent. We realize that this is not the fault of the affected webmaster, which in turn means that identical content showing up on several sites is not in itself regarded as a violation of our webmaster guidelines. It simply triggers further processing to determine the original source of the content. Google is quite good at this: in most cases the original content is correctly identified, resulting in no negative effects for the site that originated the content.

Generally, we can differentiate between two major scenarios for issues related to duplicate content:
  • Within-your-domain-duplicate-content, i.e. identical content which (often unintentionally) appears in more than one place on your site

  • Cross-domain-duplicate-content, i.e. identical content of your site which appears (again, often unintentionally) on different external sites
With the first scenario, you can take matters into your own hands to avoid Google indexing duplicate content on your site. Check out Adam Lasnik's post Deftly dealing with duplicate content and Vanessa Fox's Duplicate content summit at SMX Advanced, both of which give you some great tips on how to resolve duplicate content issues within your site. Here's one additional tip to help avoid content on your site being crawled as duplicate: include the preferred version of your URLs in your Sitemap file. When encountering different pages with the same content, this may help raise the likelihood of us serving the version you prefer. Some additional information on duplicate content can also be found in our comprehensive Help Center article discussing this topic.
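As a rough illustration of the Sitemap tip above, here is a minimal sketch in Python that writes only the preferred version of each URL into a Sitemap file. The URLs, the `DUPLICATE_TO_PREFERRED` mapping and the `build_sitemap` helper are hypothetical names invented for this example; only the `urlset`/`url`/`loc` structure and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical mapping from every URL that serves a page to the version
# the webmaster prefers. All URLs here are placeholders.
DUPLICATE_TO_PREFERRED = {
    "http://www.example.com/widgets.html": "http://www.example.com/widgets.html",
    "http://www.example.com/widgets.html?sessionid=123": "http://www.example.com/widgets.html",
    "http://example.com/about.html": "http://www.example.com/about.html",
}


def build_sitemap(preferred_urls):
    """Return Sitemap XML that lists each preferred URL exactly once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in preferred_urls:
        if url in seen:
            continue  # skip preferred URLs that were already written
        seen.add(url)
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap(DUPLICATE_TO_PREFERRED.values())
print(sitemap_xml)
```

The duplicate variants (the session-ID URL and the non-www host) never reach the file, so the Sitemap only ever names the version you would like to see served.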

In the second scenario, you might have the case of someone scraping your content to put it on a different site, often to try to monetize it. It's also common for many web proxies to index parts of sites which have been accessed through the proxy. When encountering such duplicate content on different sites, we look at various signals to determine which site is the original one, which usually works very well. This also means that you shouldn't be very concerned about seeing negative effects on your site's presence on Google if you notice someone scraping your content.

In cases when you are syndicating your content but also want to make sure your site is identified as the original source, it's useful to ask your syndication partners to include a link back to your original content. You can find some additional tips on dealing with syndicated content in a recent post by Vanessa Fox, Ranking as the original source for content you syndicate.

Some webmasters have asked what could cause scraped content to rank higher than the original source. That should be a rare case, but if you do find yourself in this situation:
  • Check if your content is still accessible to our crawlers. You might unintentionally have blocked access to parts of your content in your robots.txt file.

  • You can look in your Sitemap file to see if you made changes for the particular content which has been scraped.

  • Check if your site is in line with our webmaster guidelines.
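The first check in the list above can be approximated locally. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the URLs and the `blocked_urls` helper are made up for illustration, and Python's parser may not interpret every rule exactly the way Google's crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that unintentionally blocks the whole
# /articles/ section for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /articles/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Placeholder URLs standing in for pages whose content was scraped.
URLS = [
    "http://www.example.com/index.html",
    "http://www.example.com/articles/duplicate-content.html",
]

blocked = blocked_urls(ROBOTS_TXT, URLS)
print(blocked)
```

Any URL that shows up in `blocked` is one that crawlers honoring this robots.txt cannot fetch, which is worth ruling out before looking for other causes.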
To conclude, I'd like to point out that in the majority of cases, having duplicate content does not have negative effects on your site's presence in the Google index. It simply gets filtered out. If you check out some of the tips mentioned in the resources above, you'll learn how to gain greater control over what exactly we're crawling and indexing and which versions are more likely to appear in the index. Only when there are signals pointing to deliberate and malicious intent might occurrences of duplicate content be considered a violation of the webmaster guidelines.

If you would like to further discuss this topic, feel free to visit our Webmaster Help Group.

For the German version of this post, go to "Duplicate Content aufgrund von Scraper-Sites".

60 comments:

zoran said...

OK, but why does boxxet.com have tons of traffic and usually rank above many sites when it obviously has no original content and only aggregates tons of things from Google Blog Search, Technorati and similar?

Brent D. Payne said...

You guys aren't doing nearly as good a job as this post states you are. I deal with multiple domains and syndicated content from those domains across one another. We always link back to the original source, and Google seldom recognizes the original source as such.

Furthermore, adding the preferred URL to the sitemap does little (if any) good in telling Google which version of the URL you wish to have indexed.

Sorry, I am usually a huge promoter of what Google states in official addresses such as this one, but personal experience proves that this post is largely exaggerated.

Sincerely,

Brent D. Payne

Rich Pearson said...

@Brent - we actually see link backs work pretty well for syndicated content.

I do agree this is exaggerated, particularly this statement: "you shouldn't be very concerned about seeing negative effects on your site's presence on Google if you notice someone scraping your content." Our customers have found that without link backs, scrapers do indeed impact your site's presence.

jamiec said...

Question here: if a site does have an unintentional duplicate content issue, is Google still reaching out to webmasters either by email or within Webmaster Tools? Was that program ever officially rolled out?

a la: http://www.mattcutts.com/blog/notifying-webmasters-of-penalties/

Jennifer Mathews Somogyi said...

In August of 2007 I posted my results of a duplicate content case study that showed how the bots recognize content "shingles" - http://seomarketinggoddess.blogspot.com/2007/08/seo-similar-content-case-study.html

Andrew said...

What about duplicate content on a blog? For example, I write an article about Google Universal and the unique URL for the individual article is www.myblog.com/google-universal.html however the same content is available when clicking on the category "google" that displays all articles under that category, which has a url of www.myblog.com/category/google.html?

Should a blogger be wary of such duplicate content, or would that not warrant any worry?

Susan Moskwa said...

Andrew & Jamie:

As mentioned in the blog post, many cases of duplicate content are unintentional and Google will simply filter out the duplicates and choose 1 version to show in search results. We don't contact webmasters in these cases since it's quite common. Many blog platforms and CMSs generate a certain amount of duplicate content and it's generally not something you need to worry about. Andrew's example of a blog post being available both at its permalink, and on a category page, is a classic example of this.

Brent D. Payne said...

Regarding blog situation . . . if you are still concerned, go into your blog template and edit the pages that just have the snippets of the story and put in a meta noindex tag. A meta noindex will keep it out of the search engines but the search engines will still follow the links on the page and pass the 'link juice' to the pages that the noindex'd page links to (if you followed that run-on sentence).

As for examples of Google doing poorly regarding their ability to recognize the true source of information, email me and I'll provide several examples (just not going to do it publicly).

Thanks,

Brent D. Payne

The Wandering Author said...

First of all, what about bloggers who enter their work in a contest or challenge? We post the work ourselves, then the host reposts it? Is that anything to worry about?

Second, if Google is good at recognising scrapers, why do you index them at all? Users can get the same content without visiting and benefiting a site that is built upon theft. By indexing them, you continue to make it profitable for them to steal.

eMkt.vn ™ [ eMarketing Strategy ] said...

@ Andrew: You can also make a few changes to your Archive or Category pages by adding some new lines above the same content. In most WordPress templates, the single-post and category/archive pages are identical.

Best regards.

Autocrat said...

::: Internal-duplication :::
So basically it doesn't really matter if it occurs?

Should we be saying that it doesn't really matter, no real harm is likely to occur?

Should we be telling people not to worry about it?

If so, does that mean the majority of us at the Group have been wasting our time/effort trying to get such things communicated and fixed?

+ does Google automatically sort out the IBL and their value in regards to multiple URLs for the same content?

-

::: External-duplication :::
Sorry folks, but I'm with the more negative folk on this one.

There are constant problems with scrapers and fed sites that rank highly.

Add to that the various 'reinforcement' lines, and it reads more like ...
'Do not panic - we have it under control (honestly... we do!)'
... and that tends to make others nervous.

- - - - - - - - - - - - - - - - -

So here's an idea.
And yes, it will involve a little work.

Introduce some subset options for Search.
Instead of crippling Directories and Fed sites...
why not introduce 2 new search sets...
Google Directory Search
Google Fed Site Search
(okay, the names may need work ;))

Tada... a whole 2 seconds worth of thought there!
And you still get to give people what they want... and you stop killing some people's businesses (and yes, you have been hurting innocents... whilst still leaving the nasties in the rankings!).

So how about that?
a Sub-Search for Fed sites... would mean that you could get the 'originator' in the standard search, and if you want tons of potentially related stuff, you can do the fed site search and get places like tecni etc.

Robert said...

I really appreciate what Google has been doing with this blog. The information that it contains generally has been quite helpful to the webmaster community. However, this post about duplicate content is misleading and inaccurate.

1) I will grant you that it’s partially true that if duplicated content includes a link back to the original article that Google generally gets it right. However, many Googlers, including Matt Cutts, have admitted that this is not *always* the case, and even with the link back, Google doesn’t always recognize the original source. This is a problem because people seem to think that if they take content and include a link back to it that this is fair use and that they are not causing any harm to a site. It is not fair use, and they very well may be causing harm to a site, particularly small-to-mid-size ones.

2) If, by scraper sites, you mean, places where a short phrase from an article is randomly intermixed with other short phrases from other articles to create something that reads like utter nonsense, then, yes, Google OVER TIME will figure out which is the original source. However, the victim site can be harmed by a scraper site for the short term because its ranking can be lowered or it can be tossed from the index entirely until Google has a chance to sort it out. This can take months.

3) When large portions of articles are scraped and duplicated without a link back to the original source, Google does a HORRIBLE job of determining the original source, except for very very large sites. Unfortunately, most people are ignorant about including the link back to the original article, so they think they are sharing something that can help a site. In reality, they harm the site by failing to include a link back to the original article. Even if they put a link back to the main page of a site, Google doesn’t get it right. When you factor in made-for-AdSense sites that merely duplicate/aggregate content, you open the door to harming a lot of innocent sites that are creating the quality content.

I’m sure you saw the article on WebProNews regarding Michael VanDeMar’s situation. I think what he was doing was wrong, and he had no business archiving materials like that. I think it’s right that he suffered for it. However, I have seen and heard of effects similar to what happened to him where the duplicate content filter was applied incorrectly to entire sub-folders/sub-directories when a lot of the material within them was pilfered illegally by another site or, worse, a lot of other sites.

Even before such an extreme situation happens, I have seen cases where postings that were published within a few hours of each other were both removed from Google’s index when the Scraper Site stole just one of them. I have also seen cases where on the day that something is stolen from an Original Site, everything else it published on that day is suppressed by Google. I have even seen cases where a single paragraph that is stolen from the middle of a 6-paragraph article causes the original source to be tossed aside. Sometimes these effects can last for days, sometimes for weeks, and sometimes for months. It’s very troubling, and there is no real way to defend against it. Every remedy is reactive, and every time copyrighted material is stolen the original source suffers.

Is there anything a site owner CAN do to immediately nullify the harmful effects caused by stolen content? Filing spam reports doesn’t seem to do a thing. Filing reconsideration reports doesn’t seem to do a thing as they are site-wide, not page specific. Filing DMCA complaints takes so long that by the time the item is removed it has lost a lot of its “timely relevance.” Often, it never fully recovers.

It seems absurd that the site that was the victim of the theft should have to block access to the pages that were stolen just to keep Googlebot happy and minimize the harm. It also seems absurd that the site that was the victim of the theft should have to rewrite their content. These, however, are the only two options that seem to work relatively quickly, and neither of them seems very just because they turn the victim of theft into a victim of Google.

Rich Pearson said...

@robert - you can definitely do something about this.

full disclosure that I work at Attributor which automates link back requests and removal notices

we are not ready to take on everybody, but if you send me a mail at rich(at)attributor(dot)com, I can get you into the beta we are running for smaller publishers. We're specifically looking for folks interested in building link backs, as conversion for these has been a bit of a challenge, whereas removals are 95% successful within a few days.

Inland Empire said...

What about using syndication tools? We have built an XML feed portal for all of our clients to post to their respective sites' newsrooms, and from their sites they can then create the FeedBurner feeds. We want the clients' sites to be the source, not our portal, but it seems that the portal is getting the credit. Any suggestions on how to change that?

Hikari said...

And what happens when we own more than 1 domain and want to use all of them on our site?

If we just use 301 redirect we will need to choose 1 domain to be the principal one and the others will in fact never be used, because even if somebody types them on address bar it will be replaced by the principal domain.

Let's take a blog as an example, which itself already has duplicate content across permalink, archive and category pages. You said we don't need to worry about it.

So, we have domainA.com domainB.com domainC.com domainC.net. If all of these domains return the exact same content of that blog, what will happen?

Tigwyk said...

@The Wandering Author: Google's not in the business of censoring what's on the internet, they simply index all content. To discriminate against those who scrape would be a) against the entire point of the internet, and b) a waste of time. How many articles are available on the web for free as long as you give them credit? Lots. Will Google know the difference between these and plagiarized articles? No.

Besides, once Google starts censoring something due to certain "requirements", what's next? Who's to stop 'em from charging to index your site?

Autocrat said...

...hikari...
If you are referring to using the same content on multiple domains, then yes, that is obviously duplication.

Why would Google want to show 4 sites with exactly the same content?

That sort of thing is generally the result of 2 possibilities;
1) Complete ignorance
2) An attempt to 'cheat' the SE's

.

Think about it logically!
Would you write and produce the same book 3 or 4 times, and just change the cover?
If you did, would you expect to get in the top 10 bestseller list for each book at the same time???

Hikari said...

Well, I've seen a few books with different covers, like blue for men and pink for women :P

But I don't want to have all of them on search queries, it's just that they are cool names and I want to use them all, and not have all-1 pointing to 1, that would be parking and I don't want that.

What's happening currently is that Google is choosing randomly 1 of them to show and filtering the others out. That's perfect to me.

My concern is if Google will rank me out due to that or think I'm cheating and ban me. Anyway, sitemap.xml is the same for all of them and points to only 1 domain.

Spanish speaker said...

Why not use the ping to know which is the original content?

The problem is very serious and the solution very easy...

Nancy said...

In the thread of duplicate content you did not address what Google thinks about template websites. Say a custom WYSIWYG website for real estate agents, where all agents have the same default content on the about page, home page, and services page.

How does Google handle things like this? In the same vein as affiliate websites?

Bill_Leibenguth said...

For the past several months www.freedomflighinc.com/Blake_Miller.html has enjoyed a first page result when searching for "Blake Miller landscaping". Now it has suddenly disappeared from the search results completely. I did recently purchase www.Blakemillerlandscaping.com and asked NetNation Host to point it to the above location. Could this have caused the problem?

Autocrat said...

...Bill_Leibenguth...
Please ask for help in the Crawling, Indexing and Ranking discussion group.

.

A part of the post that causes me a little concern is the definition of 'Negative'.
You state that duplicate content existing shouldn't cause concern for negative results.
No offence folks, but not having your page(content) appear in the SERPs because it has been 'Filtered Out' is a Negative.

Fair enough... no penalty.
Fair enough... the site is not 'damaged' in any way.

But a possibly relevant (and well ranking) page may not appear, which means business can be lost.
So, I repeat (just to make sure it's clear), that's a Negative result.

.

Another part that has caused me to think (always an unpleasant experience ;)), is the part where 'certain sites' (those known for providing non-original content) get shown above/instead of the authoring site.
Well... call me dumb, but if the site is known for being a fed site, or a scraping site... why is it being shown... as it's obviously not going to be the author site, is it???

Word Smith said...

Hello, I was reading your article and noticed you had something about a revisit tag for robots. I am still not clear if this is a Myth or if I should disregard what I read here.

The Revisit-After META Tag Myth Continues

http://www.seoconsultants.com/meta-tags/revisit-after.asp
I am confused to say the least.
Any response would be greatly appreciated.
Peace
Gabriella

Susan Moskwa said...

Gabriella,

Here's a post explaining which robots directives are supported by Google/Yahoo!/Microsoft.

I don't see revisit-after anywhere on that list, although Yahoo! and Microsoft support a Crawl-Delay directive that you can use to delay the frequency with which a crawler checks for new content.

Cole said...

A local group met in Raleigh last night to discuss this and some other topics. The questions that we were most interested in were those related to duplicate content within the same site. What I'm getting from this is, if the duplicated content is only seen in one or two places, then it is not detrimental. The origination page will receive the ranking credit and the duplicates will not, and will not be indexed. Is that correct? I don't think that this is so complicated, I like to think that best practices are not detrimental. If the line is pushed, then some consequence is likely. It is fair to say that there is some room for human error in the Google algorithm.

San Diego Foreclosure said...

"Some webmasters have asked what could cause scraped content to rank higher than the original source."

The answer to this question is very very very simple. There's one thing that matters and that's links. If the scraped content has more links to it than the original source, 9 times out of 10 it's going to beat the original source in the rankings. Don't believe me? Try it yourself.

"That should be a rare case"
It's not rare at all or else all the Blackhat SEO people would have given up a long time ago.

The key is links my friends nothing more and nothing less. The more the better :)

ShopDownLite.com said...

Wow - hot topic. We tried to address the topic of duplicate content after running our site through IBP10 (http://www.ShopDownLite.com). Now we try to have unique titles and descriptions, which are made into the meta and title tags on the site and the URLs. Since we sell a pretty common item - pillows - it is hard to come up with unique permutations of pillow names over and over :-)

We still have lots of work to do, especially on our basic category pages - but on a SKU level we're getting better.

www.DesignerSofas4u.co.uk said...

There are many things that are taken into account when ranking is involved. It's not just backlinks that work.

Autocrat said...

No, it's not just links... but they are a fairly strong influence.

Just think how many links some of the known fed sites have... literally thousands.
Not surprising that they will outrank the originator.

.

In a way, it's another 'double standard'.
You should not copy other people's content.
You should have unique content.
If you do not, there is a good chance that we will see your site as a duplicate, and filter it out of the results...
...unless...
you are one of the major sites, in which case we will leave you well enough alone.

.

Sound about right?
As that's how it looks.

Kevin Georg Paquet said...

I wonder why my site has been banned/removed from google search ( pinoyteens dot net) My traffic decreased a lot since I am not appearing in the searches anymore :(

rruedac said...

I have systematically added my URL to Google. I saw my website in the Google index maybe 3 months ago. Since then, my website is not in the Google index anymore. I don't know why and I don't know how I can solve this situation. I have added the URL to Google again and again, but the website is not added. What can I do? It's urgent for me to stay in Google search.

Robert said...

So, Autocrat, what you're saying is that the only way to overcome the scrapers and fed sites is to get more links than they have. Why do I have a feeling that if I got enough links to overcome the scrapers and fed sites that Google would deem them unnatural and punish me for it? Meanwhile, the scrapers somehow continue to get away with having vast networks of manufactured links? The scrapers also are not likely to put a link back to the original source, and they are so big that it is doubtful that even if they did that it would matter. No amount of asking them for a link back is likely to make them change their minds either. This is absurd. Google can run its business however it wants, of course, but it's such a large and dominant company that it should feel morally obligated to fix this problem.

charly said...

I have already submitted my website to Google (http://www.cidtur.nh.co.cu) but it isn't indexed yet, so what should I do?

Susan Moskwa said...

Hi folks,
If you have specific examples of scrapers that are outranking original content, please post the URLs here and we can look into it. Thanks!

Anna Glendenning said...

These pages are ranking on a site:

This is a post about finding Stumble and nothing really of any content value and Ranks 2/10:

http://adoptedjane.blogspot.com/2008/01/stumbled-uponstumble-upon.html

This is a page ranking 2/10 with nothing more than an Icon that has the number one Cuss Word in it...

http://adoptedjane.blogspot.com/2008/01/todays-icon-of-reflection.html

I believe that there is one other page on the blog that has another Icon or Cuss word Ranking 2/10

and One Page with actual REAL content but in the first sentence in Caps has the Worst Cuss word

http://adoptedjane.blogspot.com/2008/01/truth-and-consequences-part-2.html

These pages together seem to make the blog Rank 3/10 overall based on one page with content...

And I am wondering if these pages are ranking just because they have cuss words and are searched for porn or something? It seems that a site should not be valued this high just because a blogger writes the F-word and puts little buttons on a page and calls it content that is ranked as if it is content...

If writing a blog saying I found Stumble means I get 2 and putting icons made with widgets on a blog makes it content to rank 2 with one page of the whole blog ranking that actually has any original content makes an overall ranking of 3/10 then I just feel that all the efforts I have made to not duplicate anything--and provide content would actually RANK if I wrote bad words talked about Stumble and put pictures with nasty words on them would have been much easier to do...

Then to be mocked that my blog doesn't EVEN rank after all my efforts, and have that seen as some reason that someone who wants adoption to be abolished can rub it in my face because I work my butt off trying to help Foster Children get adopted, makes me feel a little hurt...

Especially when nearly every page of the other blog is just full of bad words and hate about adoption it makes me wonder why I even try to get noticed on searches since my efforts are giving this person around the world some Idea I am losing some war with her that I didn't even know I was in...until she tells me my efforts are crap because her blog ranks higher?

my blog:

http://newmemories.blogspot.com/

(which usually doesn't have bad words but again this morning had my buttons pushed and I caved and used a bad word (used by those who want adoption to end because they call themselves that bad word with pride) that seems to make a blog more important than 129,000 children waiting in Foster Care matters...)

I have been in this blog bash for months trying to find a way that Google might offer me at least an occasional rank for something I write which is original and not copied and pasted all over, as the other blog is everywhere... Most posts on the other blog are just fully copied writing of other people's, and while the duplicated pages themselves are not ranking, the blog writer thinks I am wrong to offer her a suggestion that it isn't a good idea to reprint other writers' work. She seems to think it is okay because her front page is ranking 3... and that what she does is fine because it is ranking 3, after I let her know it isn't good to reprint other posts and call it a post of her own... Meaning she will continue to do this and just look at her front page rank, thinking I am crazy to offer any tip of advice that might help her not lose her 3 ranks, publishing nothing but reprints all over and based on some combination of a few 2 ranks making the whole blog rank 3.

kcmc said...

Hi, I am not sure if I am leaving this post in the right place, but I hope some one can help answer a spam related question. We recently put up a WordPress blog on one of our sites and found out that some junk blog site is linking to our posts with irrelevant anchor text. Whatever the purpose the end result of this is diluting our rankings and SEO efforts. My question is: is there anything that we can do to defend ourselves from this type of attack?

thanks,

Mortgage Man said...

I am very concerned about scrapers, whether intentional or not.

One example I recently found was a Yahoo Answer in which a poster copied the entire content of one of my pages and posted it as his answer.

Now, when Google looks at both my page and the Yahoo Answers page, which will have more trust and be considered the original?

And what about hijacked .edu and .gov sites that scrape content? I used Copyscape and found virtually every page of mine copied by various other sites.

Meanwhile my site is nowhere to be found in the search results.

OnlineJobs4us.co.in said...

What's the procedure to increase PageRank and optimize for SEO?

Online Data Entry Jobs said...

Hi,

My Duplicate Title Tags report is showing three pages. However, when I click on the links, they all point to the same page even though the anchor text shows a different page. Example:

/companyprofile.htm
/contact.htm
/links.htm

When I click them, they are all CompanyProfile.htm? Any ideas why? I checked all my links and can't find a problem.

http://www.dataplus-svc.com.

Thanks in advance.

Bonnie

lior said...

Hi,

What about the same content appearing a few times within the same website (tourism industry)?

SearchMasters said...

Are you interested in a current case study about how scraper websites copying a home page's meta description caused that homepage to plummet from 12th to 93rd for a search phrase?

12th to 93rd.

Homepage had gone from 12th to 93rd for "Auckland Apartments". Then on 27 July I changed the meta description and content of the page. Once Google had recached the page, the homepage regained its ranking and was back to 12th.

Google, please get this issue sorted. I have been able to regain many websites' rankings by making content unique again after scraper sites have copied content. So the case study is not an isolated event.

SearchMasters said...

Any response to my case study?

Lyndi said...

Personally, we would like to use RSS feeds to post blog content on our members' sites. Yet the concern came up whether or not those RSS blog feeds would be counted as duplicate content. We really would like to offer multiple ways for our members' clients to find their blog content, rather than just being a blog in a sea of blogs.

LINKADOR.com said...

Between the early hours of the 26th and today something very strange happened: the number of visits to our site plummeted by more than 90%. There is no explanation for it, and we fail to understand why this happened within minutes.

Our server is working perfectly. We even changed it, imagining that it was the problem, but it was not. We have already examined all the logs and scripts, and everything is fine.

Do you have any response on this issue? Could someone be manipulating our visits and redirecting them?

The website is rich in content and we are worried.

I am sure that this is not normal. The problem is that we still cannot get an answer from anyone at Google, including AdSense. I need to be informed of what may have occurred so that I can take all measures as soon as possible.

I hope you can help me so that I can continue working.

Please evaluate our site

Sds
Bruno Soares R

LINKADOR.com said...

I find Google's content filtering system inefficient; our site is being penalized although it is innocent.

If the filter system were really efficient, Google would not penalize innocent sites.

We are taking losses from this, and so is Google AdSense.
That is $10,000 less per month on Google AdSense, which is missing out on earnings from our site. Unfortunately, no one at Google pays attention to solving these problems. One day Google will have a serious competitor, and when that happens it will be better for us. We are tied to Google, and unfortunately it does not give attention to the publishers that bring it profits. Our visits fell by more than 90% due to Google's filters; if you assess our content, you will see that there are flaws in the way Google's filters judge it.

We recently changed the site's URL system from the old ?id= to the recent ?inkid=, but pages can still be accessed through the same URL. We use mod_rewrite on the site; could it be generating 2 URLs with the same content, one through IDs and one through mod_rewrite (html)? We are not doing anything wrong, and we are asking that the website be reinstated as it was. We have turned off the mod_rewrite (html) mode; I do not know if that will help resolve it.

They should improve their filters so that content is filtered without punishing people who work seriously with web content.

BG Mahesh said...

Is framing considered as duplication of content? In my opinion it is not duplication.

how to make money online said...

Great article, the whole duplicate content saga will worry me a little less now. : )

Sebastian said...

After a GSite crawl, I was informed that I had the following duplicate content:

http://www.collectiveinterest.us

and

http://www.collectiveinterest.us/index.htm?partner=permalink&exprod=permalink%20%20%20(Collective%20Interest%20Green-News%20and%20Information%20on%20Electric%20Cars,%20Hybrid%20Vehicles,%20and%20Solar%20Energy)

Ironically, the variable URL is serving the ads I'd like to serve. I don't know if my site's been hijacked or what.

Can anyone tell me what that duplicate url is?
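
A note on the two URLs above: they differ only by the `index.htm` file name and the `partner` and `exprod` query parameters. One way to check whether a set of URLs collapses to a single preferred version (the kind you might list in a Sitemap) is to normalize them and compare. Here is a minimal Python sketch; the function name `normalize` is made up for illustration, and it assumes, without confirmation from the site, that those two parameters and the index file name do not change the page content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters only carry tracking information
# and do not change what the page displays.
IGNORED_PARAMS = {"partner", "exprod"}

# Assumption: these file names are served for the directory itself.
INDEX_FILES = {"index.htm", "index.html"}


def normalize(url: str) -> str:
    """Return a simplified form of `url` for duplicate comparison."""
    parts = urlsplit(url)

    # Treat an empty path as the site root.
    path = parts.path or "/"

    # Drop a trailing index file name, e.g. /index.htm -> /
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        path = head + "/"

    # Keep only the query parameters that are not ignored.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]

    # Host names are case-insensitive; the fragment is dropped.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, urlencode(kept), "")
    )


if __name__ == "__main__":
    a = "http://www.collectiveinterest.us"
    b = "http://www.collectiveinterest.us/index.htm?partner=permalink&exprod=permalink"
    print(normalize(a) == normalize(b))
```

This only shows that the two addresses can be treated as one page under the stated assumptions; it does not explain where the parameterized link came from.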

Jennifer said...

I want to know: if I post some content on my blog from another blog and place a link to the source, would that be fine with Google?

Franck said...

@Susan

Here is a specific example of scrapers that are outranking original content:

Search query:
http://www.google.fr/search?q=Cuir+cha%C3%AEne+et+SM&sourceid=navclient-ff&ie=UTF-8&rlz=1B3GGGL_frFR210FR210

Scraper (appears in #1 position):
http://www.truveo.com/Cuir-chaine-et-SM/id/3209926148

Original (with a "clean rewritten" URL that does not even appear in the first 100 search results):
http://fr.video-x.com/cuir,chaine,et,sm-N-1377.html

It'll be interesting to have some feedback...

Franck

Evan said...

Hello guys,
Several days ago I got a message from Google. They found violations of AdSense policies on pages such as www.changemystreet.com/6794.html
So, ad serving was disabled.

I cannot understand what kind of violations they found. When I asked, they directed me to this blog for help. Could you advise?

xalki said...

@ Susan Moskwa
Then please check out these (first the scraper, second the original). Wherever he cut a post down to a third or added a link, he did so only after pressure from me. But the damage to my Google search results has been done:

http://xolloth.blogspot.com/2009/01/tips-pc-windows-internet-ubuntu-linux.html
http://sotostips.blogspot.com/2009/01/google-chrome-windows-2000.html

http://xolloth.blogspot.com/2009/01/microsoft_3251.html
http://sotostips.blogspot.com/2009/01/microsoft.html

http://xolloth.blogspot.com/2009/01/ubuntu-tweak-044-fedora.html
http://sotostips.blogspot.com/2009/01/ubuntu-tweak-044-fedora.html

http://xolloth.blogspot.com/2009/01/flock-internet-browser-firefox-windows.html
http://sotostips.blogspot.com/2009/01/flock-internet-browser-firefox-windows.html

http://xolloth.blogspot.com/2009/01/lynx-browser-accessibility-site.html
http://sotostips.blogspot.com/2009/01/lynx-browser-accessibility-site.html

http://xolloth.blogspot.com/2009/01/firefox-password-yahoo-mail.html
http://sotostips.blogspot.com/2009/01/firefox-password-yahoo.html

http://xolloth.blogspot.com/2009/01/91-windows.html
http://sotostips.blogspot.com/2009/01/91-windows.html

Back To Basics Fuel said...

Whoever wrote this is simply wrong, especially if the thieving scrapers have a higher PageRank than the original content owner has. My site, Back To Basics Fuel, was scraped and I lost my PageRank. I found the scum site and finally had them remove my content. Now the rank is moving back up; it is only 1 right now, but it is climbing.

Adele Wiseman said...

Hi

Hope someone with more knowledge than me can give me some advice regarding duplicate content.

I am running 1 site at the moment but intend to add 2 more. All 3 sites will be using the same MySQL database, so in theory each product that I sell will be duplicated 3 times. Will this be viewed as duplicate content, and will my sites therefore be penalised? Each site will have a different look and feel about it, and will have different content on 'landing pages'; it is just the products that may cause a problem.

I would appreciate your views on this.

Pete Kosednar said...

Hello Susan Moskwa:

A large SEO company that has hundreds of real estate websites scraped my entire content, and it now ranks for the keywords and shows up at the top of Blogger search. Here are the posts, all of which are my content.

http://arizona.realestatecenterblog.com/about/

http://arizona.realestatecenterblog.com/advertising/

http://arizona.realestatecenterblog.com/homes-for-sale/

http://arizona.realestatecenterblog.com/white-mountains-roundup-2009/

http://arizona.realestatecenterblog.com/manufactured-home-installation/

http://arizona.realestatecenterblog.com/rv-parks/

http://arizona.realestatecenterblog.com/city-4-tv/

Thanks

Pete

Divya Sai said...

Sorry for the self-promotion, but I have written a tutorial on how to avoid the (unintentional) duplication problem in Blogger blogs:
http://bloggerstop.net/2009/04/duplicate-meta-description-and-titles.html

This tutorial solves both the meta tag duplication and title duplication problems!

Regards
Sai
BloggerStop.Net

Bob said...

Another problem is other sites that frame your complete site. I used to find this all the time until I began putting a frame buster script in my code.
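
The frame buster script mentioned above is not shown, and it was presumably client-side code. A different, server-side way to reach a similar goal is to send the `X-Frame-Options` response header, which asks browsers not to render the page inside a frame on another site. The following is only a sketch of that alternative, assuming a Python WSGI application; the name `deny_framing` is hypothetical.

```python
def deny_framing(app):
    """Wrap a WSGI app so that every response carries X-Frame-Options."""

    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Remove any existing value, then add ours exactly once.
            headers = [
                (name, value)
                for name, value in headers
                if name.lower() != "x-frame-options"
            ]
            # SAMEORIGIN: only pages from the same site may frame this one.
            headers.append(("X-Frame-Options", "SAMEORIGIN"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped


def hello_app(environ, start_response):
    """Tiny example application used to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


# The wrapped app can be served by any WSGI server.
application = deny_framing(hello_app)
```

Whether a header like this or a script is the better fit depends on the browsers your visitors use, since older browsers may ignore the header.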

Maile Ohye said...

Hi everyone,

Since some time has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Help Forum.

Thanks and take care,
The Webmaster Central Team