Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Duplicate content and multiple site issues

Tuesday, September 15, 2009 at 3:17 PM

Webmaster Level: All

Last month, I gave a talk at the Search Engine Strategies San Jose conference on Duplicate Content and Multiple Site Issues. For those who couldn't make it to the conference or would like a recap, we've reproduced the talk on the Google Webmaster Central YouTube Channel. The short video below covers the content presented at SES:



You can view the slides here:



The comments you read here belong only to the person who posted them. We do, however, reserve the right to remove off-topic comments.

28 comments:

Ahmad Alfy said...

Simple and elegant!
I am writing my canonical URL extension for Joomla now!

Internet Freelancer said...

Thank you for sharing valuable information

Computer Repair Services said...

Very informative. Thanks.

Mr Stuart said...

Great vid, interesting info. You look and sound like Matt Cutts' love child.

UK Air Charter Service said...

I found the part on canonical redirects very useful. Thanks!

Tom Hallett said...

Very informative, thanks. Some good information there even for experienced SEOs.

Marco Di Fresco said...

A related question: suppose that site A, which produces news on a specific topic, gives site B, a generalist news site, authorization to republish its pages. For other reasons, site B is more popular than site A.

What can site A do to claim that its pages are the original ones?

rob said...

Do you know, I'm amazed at how many sites don't have www or non-www sorted out - some sites won't even load unless you get it right. Very good slideshow, which thankfully just highlights the fact we're pretty much in line with you guys!
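
As an illustration of the www versus non-www point above, here is a minimal sketch of how a site might decide when to send a permanent (301) redirect to its preferred hostname. Everything in it is an assumption made for illustration: the function name `canonical_redirect`, the host `www.example.com`, and the choice to treat only the bare and `www.` variants as duplicates. It is not taken from the talk or from any Google tool.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, preferred_host="www.example.com"):
    """Return (status, location) for a requested URL.

    Hypothetical helper: if the request uses the non-preferred variant
    of the host (bare vs. www), answer with a 301 to the same path and
    query on the preferred host; otherwise serve the URL as-is (200).
    Any port in the request is dropped, since this sketch assumes
    default ports.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Derive the bare domain so both variants can be recognised.
    if preferred_host.startswith("www."):
        bare = preferred_host[4:]
    else:
        bare = preferred_host
    variants = {bare, "www." + bare}

    if host in variants and host != preferred_host:
        target = urlunsplit(
            (parts.scheme, preferred_host, parts.path or "/",
             parts.query, parts.fragment)
        )
        return 301, target
    return 200, url


if __name__ == "__main__":
    print(canonical_redirect("http://example.com/page?x=1"))
    print(canonical_redirect("http://www.example.com/page"))
```

In practice this rule usually lives in the web server or framework configuration rather than in application code; the sketch only shows the decision being made.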

Sandy said...

Very informative information. Thanks for sharing!

msaxe said...

Good piece. Thanks.

Play4Me said...

We have two subdomains with language specific content "fr.domain.com" and "dk.domain.com", as well as the main property www.domain.com.

Apart from the dilution of ranking, would/could Googlebot consider the content to be duplicates of the main property?

Ninny said...

Thanks for providing this information

WSI Web Marketing said...

Great video! Can't wait to start using the canonical URL as an alternative to 301 redirects.

Anthony said...

My site is in good shape for always using the same canonical name, but the in-site Google search engine keeps screwing me up, coming up with "ftp.aplawrence.com" even though nowhere do I have any such link.

I wish I could tell the Search to make all results be the canonical name!

jorislagong said...

In the almost two years that I've been working at one of the good companies here in the Philippines, so far so good: I have always removed all the duplicate sites. Thanks for this information as well; it adds to my knowledge.

skiold said...

Like @Play4Me, I'm not sure whether or how multilingual sites are treated from a content duplication perspective.

Can anyone please share some insights?

Shekhar Sahu said...

Let us see

Travis said...

Nice examples of unknown duplicate content users may have.

Robert said...

This is information we already know. What I would like to learn is how you plan to address the never-ending problem of how other sites can steal (they call it aggregate) your content and displace you in the search rankings.

What is particularly egregious about how Google handles this issue is that when someone steals your content, your article (which previously ranked in the top 10) falls off the face of the Earth. It's nowhere to be found in Google.

Even for searches on specific quotes, the original article subsequently gets buried beneath every damn scraper site that stops by.

Here's the kicker: If enough of your content gets stolen in a short amount of time, everything else that resides in the same folder also gets tossed out of Google. It doesn't matter how many links you have to your content. It doesn't matter how long your content was online before it was stolen. It doesn't matter if some pieces in that folder remain original and unique.

If someone wants to screw your site, they can steal a handful of articles from you and post them on a few other sites (with or without links back to yours). A week later, you're done.

Every few months, one of your engineering team makes an absurd post like this. People respond and say you don't get it right. Yet, you do nothing to fix the problem.

Last time it was Susan Moskwa who made the post. Later, in one of her comments, she admitted/stated that if you syndicate your articles, you run the risk of those other sites outranking you.

That's fine and dandy if you syndicate on purpose. But what happens when other sites STEAL YOUR CONTENT?

There are only so many DMCA complaints you can file. And it only really works for sites hosted in the U.S. that are run by ethical companies.

Yahoo/Microsoft do not have problems with this issue. They get it right. Why can't Google close this loophole so unethical SEO services can't kill off small business content owners?

Robert said...

@Marco Di Fresco: In Google, Site B will outrank Site A, and Site A may be irreparably harmed by it. In Yahoo/Microsoft, Site A will be recognized as the original, and it will outrank Site B.

They take a lot of flak, but Yahoo/Microsoft do a better job of showing diverse, non-duplicate search results. And they almost always get it right by showing the original source while suppressing the duplicates, authorized or otherwise.

moca_interactive said...

A reflection about your post.

Scenario: in some circumstances I have to keep the non-canonical URL "live" within my website.

Now the questions:

1) About 301

Via a 301, I'm telling Google that the duplicate URL "does not exist" and that it should exclude that URL from its index. Furthermore, the canonical one will pick up the "juice".

2) Internal link structure (the link is still "live")

At the same time, via the internal link structure, I'm telling Google that the duplicate URL is still "live". Of course, it is less important than the canonical one, but it certainly exists.
In the end, Google comes to know the whole website by crawling the internal link structure. Right?

I'm asking myself another question: does Google like encountering a 301 "every time"?

Summing up the previous points, a "final" question: don't you think this is contradictory?

Thank you!

Have a nice day.

Art by Art said...

What if you copy and paste content from your own website into your own blog? Will that help or hurt you? And which site will be omitted? Will Google rank one page lower than the other, or what's the explanation? Thanks!!

TJM said...

Great info and explained very well. Even an SEO newbie would understand this. Great job!

Tony

pokerino said...

I have two questions:
1. What about RSS feeds that are placed on a site and show content from another source? For example, if I'm using Google News for a certain topic and placing it as a feed on my blog, is that considered duplication or is it an acceptable referral method?
2. In blogs, the same content sometimes appears on different pages: the home page, different categories, auto-generated tag pages, and the posts themselves. This is not done intentionally but is created by default by the commonly used blog platforms. How does this affect a blog in terms of page weight and showing up in search results?

Rahul said...

very nice video

thanks
http://bharatclick.com

omr said...

IMPORTANT NOTE:

CROSS-DOMAIN rel=canonical support was announced in December 2009.

See:
http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html

feng said...

As I understand from the video, you CAN have duplicate content on multiple domains if the domains use the extension of the location or country you are targeting. Example: Domain.com and Domain.com.au can have duplicate content without being penalized. I wonder if someone can confirm this for me.

Google Webmaster Central said...

Hi everyone,

Since over a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Central Help Forum.

Thanks and take care,
The Webmaster Central Team