Tuesday, December 15, 2009 at 2:47 PM
Webmaster level: Intermediate

We've recently discussed several ways of handling duplicate content on a single website; today we'll look at ways of handling similar duplication across different domains. For some sites, there are legitimate reasons to duplicate content across different websites: for instance, to migrate to a new domain name using a web server that cannot create server-side redirects. To help with issues that arise on such sites, we're announcing our support for the cross-domain rel="canonical" link element.

Ways of handling cross-domain content duplication:
- Choose your preferred domain
When confronted with duplicate content, search engines will generally take one version and filter the others out. This can also happen when multiple domain names are involved, so while search engines are generally pretty good at choosing something reasonable, many webmasters prefer to make that decision themselves.
- Reduce in-site duplication
Before starting on cross-site duplicate content questions, make sure to handle duplication within your site first.
- Enable crawling and use 301 (permanent) redirects where possible
Where possible, the most important step is often to use appropriate 301 redirects. These redirects send visitors and search engine crawlers to your preferred domain and make it very clear which URL should be indexed. This is generally the preferred method as it gives clear guidance to everyone who accesses the content. Keep in mind that in order for search engine crawlers to discover these redirects, none of the URLs in the redirect chain can be disallowed via a robots.txt file. Don't forget to handle your www / non-www preference with appropriate redirects and in Webmaster Tools.
- Use the cross-domain rel="canonical" link element
In some situations, setting up redirects isn't easily possible, for example when you need to move your website from a server that does not support server-side redirects. In such a situation, you can use the rel="canonical" link element across domains to specify the exact URL on whichever domain is preferred for indexing. While the rel="canonical" link element is treated as a hint rather than an absolute directive, we do try to follow it where possible.
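For the 301 step in the list above, here is a minimal sketch of a permanent redirect to a preferred host, written as a plain Python WSGI application. It assumes a hypothetical preferred host `www.example.com`, an app mounted at the site root, and a server that lets you run your own code. It keeps each request's path and query, so every old URL maps to its own new URL rather than to the homepage. It is an illustration, not a drop-in configuration for any particular web server.

```python
from typing import Optional
from urllib.parse import quote, urlunsplit

# Hypothetical preferred host; replace with your own domain.
PREFERRED_HOST = "www.example.com"


def redirect_location(host: str, path: str, query: str, scheme: str = "http") -> Optional[str]:
    """Return the 301 target for a request, or None if it is already on the preferred host."""
    # Drop any port and compare case-insensitively (simple host parsing, no IPv6 literals).
    hostname = host.split(":", 1)[0].lower()
    if hostname == PREFERRED_HOST:
        return None
    # Keep the same path and query so each old URL maps 1:1 to a new URL.
    return urlunsplit((scheme, PREFERRED_HOST, path or "/", query, ""))


def app(environ, start_response):
    """WSGI app: 301 every non-preferred host, serve content on the preferred one."""
    host = environ.get("HTTP_HOST", environ.get("SERVER_NAME", ""))
    # PATH_INFO arrives decoded as latin-1; re-quote it for the Location header.
    path = quote(environ.get("PATH_INFO", "/"), encoding="latin1")
    target = redirect_location(
        host,
        path,
        environ.get("QUERY_STRING", ""),
        environ.get("wsgi.url_scheme", "http"),
    )
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Served from the preferred host\n"]
```

For a quick local try, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would serve it. In production, most web servers can issue the same 301 through their own configuration, which also covers the www / non-www preference.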
Still have questions?
Q: Do the pages have to be identical?
A: No, but they should be similar. Slight differences are fine.
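There is no published threshold for "similar", so the following is only a rough self-check under stated assumptions: it compares the words of two pages' visible text with Python's `difflib.SequenceMatcher`. The 0.8 cut-off is an arbitrary illustration, not a value Google uses, and extracting the visible text from the HTML is left out.

```python
import difflib
import re

# Arbitrary illustrative threshold; not a value used by any search engine.
SIMILARITY_THRESHOLD = 0.8


def text_similarity(a: str, b: str) -> float:
    """Rough 0..1 similarity of two texts, compared word by word."""
    words_a = re.findall(r"\w+", a.lower())
    words_b = re.findall(r"\w+", b.lower())
    return difflib.SequenceMatcher(None, words_a, words_b).ratio()


def looks_similar_enough(a: str, b: str, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True if the two texts share enough words in the same order."""
    return text_similarity(a, b) >= threshold
```

Changing one word in a ten-word text gives a ratio of 0.9, so "slight differences" pass while unrelated pages score near zero.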
Q: For technical reasons I can't include a 1:1 mapping for the URLs on my sites. Can I just point the rel="canonical" at the homepage of my preferred site?
A: No; this could result in problems. A mapping from old URL to new URL for each URL on the old site is the best way to use rel="canonical".
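A minimal sketch of the 1:1 mapping, assuming the old and new sites share the same paths and that the preferred site is the hypothetical `http://www.example.com`. `canonical_link` builds the element for each old page, using an absolute URL so it can point across domains, and `homepage_collapses` flags mappings that send a deeper page to the homepage, which is the pattern the answer above warns against. The function names are illustrative only.

```python
from html import escape
from typing import Dict, List
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred scheme and host for illustration.
PREFERRED_ORIGIN = ("http", "www.example.com")


def canonical_url(old_url: str) -> str:
    """Map an old URL to the same path and query on the preferred domain."""
    parts = urlsplit(old_url)
    scheme, host = PREFERRED_ORIGIN
    return urlunsplit((scheme, host, parts.path or "/", parts.query, ""))


def canonical_link(old_url: str) -> str:
    """Return the <link> element to place in the <head> of the old page."""
    href = escape(canonical_url(old_url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


def homepage_collapses(mapping: Dict[str, str]) -> List[str]:
    """List old non-homepage URLs whose canonical target is the homepage."""
    flagged = []
    for old, new in mapping.items():
        old_path = urlsplit(old).path or "/"
        new_path = urlsplit(new).path or "/"
        if old_path != "/" and new_path == "/":
            flagged.append(old)
    return flagged
```

Escaping the `href` matters once query strings contain `&`, which must appear as `&amp;` inside an HTML attribute.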
Q: I'm offering my content / product descriptions for syndication. Do my publishers need to use rel="canonical"?
A: We leave this up to you and your publishers. If the content is similar enough, it might make sense to use rel="canonical", if both parties agree.
Q: My server can't do a 301 (permanent) redirect. Can I use rel="canonical" to move my site?
A: If it's at all possible, you should work with your webhost or web server to do a 301 redirect. Keep in mind that we treat rel="canonical" as a hint, and other search engines may handle it differently. But if a 301 redirect is impossible for some reason, then a rel="canonical" may work for you. For more information, see our guidelines on moving your site.
Q: Should I use a noindex robots meta tag on pages with a rel="canonical" link element?
A: No, since those pages would not be equivalent with regard to indexing: one would be allowed while the other would be blocked. Additionally, it's important that these pages are not disallowed from crawling through a robots.txt file; otherwise search engine crawlers will not be able to discover the rel="canonical" link element.
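The two conditions in this answer can be checked mechanically. The sketch below, using only Python's standard `html.parser` and `urllib.robotparser`, flags a page that combines rel="canonical" with a `noindex` robots (or `googlebot`) meta tag, and a page that robots.txt disallows for a given user agent. It assumes you already have the page's HTML and the robots.txt text in hand; fetching them is left out, and the `audit` helper is hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.robotparser import RobotFileParser


class HeadSignals(HTMLParser):
    """Collect the rel="canonical" href and robots/googlebot meta directives."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None
        self.robots: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; valueless attributes have value None.
        a = {k: (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")
        elif tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.robots.extend(d.strip().lower() for d in a.get("content", "").split(","))


def audit(page_url: str, html: str, robots_txt: str, user_agent: str = "Googlebot") -> List[str]:
    """Return a list of problems that would stop a rel=canonical from working."""
    problems = []
    signals = HeadSignals()
    signals.feed(html)
    if signals.canonical and "noindex" in signals.robots:
        problems.append("rel=canonical combined with noindex")
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch(user_agent, page_url):
        problems.append("disallowed by robots.txt, so crawlers cannot see the rel=canonical")
    return problems
```

An empty list means neither problem was found; it does not guarantee that the hint will be followed.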
We hope this makes it easier for you to handle duplicate content in a user-friendly way. Are there still places where you feel that duplicate content is causing your sites problems? Let us know in the Webmaster Help Forum!


65 comments:
What is the right approach when sharing the same content across different domains for geolocalization reasons?
50% of the content across domains is duplicate; the other 50% of the content on each domain is targeted at users from that country.
Should I use one preferred domain for the shared (duplicate) content, and use rel="canonical" on links to that domain?
It's not clear how to actually use rel.
Is this possible for a Blogger blog?
Q: I'm offering my content / product descriptions for syndication. Do my publishers need to use rel="canonical"?
A: Nobody will do it. Most ecommerce websites will suffer otherwise. John Mu, why do you lie to the people?!
Can I use a 301 redirect on my website?
I have domain.com/index.html, and I want to 301 redirect domain.com/index.html to www.domain.com/.
I have set up a 301 (permanent) redirect from example.com to www.example.com, which clearly shows that both domains belong to me.
But when I try to set the 'Preferred domain' in Webmaster Tools, it gives the message "Part of the process of setting a preferred domain is to verify that you own http://example.com/. Please verify http://example.com/.". It never allows me to set my preferred domain.
Even though there is a 301 redirect, why doesn't Google accept the 'Preferred domain' setting?
@John Mueller - not allowing the meta robots noindex tag on pages with the canonical link element has some problematic side effects.
As search engines treat the canonical link element as a hint only - and particularly because each search engine will handle it slightly differently - you can't guarantee that it's going to be followed.
Therefore, if you want to avoid duplicate content in the search engines, you'd use the robots meta tag on variant pages to guarantee that it's not indexed.
If you could use both together, you'd allow webmasters to use the canonical link element, safe in the knowledge that if one particular search engine doesn't accept it for a particular page, the page won't end up being indexed, as it would fall back on the meta robots noindex tag.
Have you got any comments on this scenario?
(note: this is mainly regarding near-duplicates, e.g. different navigation paths, product sort order, etc, rather than exact duplicates which we'd generally 301).
@Abi - yes, you can definitely 301 domain.com/index.html to www.domain.com/ - 301 is always the best option if you can do it.
@Babloo - try adding http://example.com/ to your list of sites and then verify it the same way as you did for "www" - it will automatically follow the 301 and then verify.
@Ian M - Are you saying I should have two profiles, one for www.example.com and the other for example.com?
I would prefer not to set a 'Preferred domain', as Google has by default indexed all my URLs under www.example.com. Then why have another profile?
Sir, my blog is on Blogspot; how will this affect my blog?
@Babloo yes, you have two different profiles. That doesn't mean that Google will suddenly index the non-www one, it just means you've proven that you own it.
Preferred domain isn't required if you've done the 301 redirect properly, but there are minor reasons you might want to do it (e.g. if the site is big and you've only just done the 301 and Google has indexed tons of pages on the non-www, it might take a while for Google to re-spider all the 301s).
How can we apply any of this to a process that, for better or worse, drives duplicate content? I'm thinking of job ads, job aggregators and job boards. The job content might originate on a corporate site but also be sent to 2 or 3 aggregators and 4 or 5 job boards. Duplicate content in this situation abounds...
@larshberge: using canonical on duplicate content from one site to another, what kind of implication (if any) should that bear for the xml-feed (for any of the involved sites)?
@felipus I would not recommend using rel=canonical or 301 redirects if you are creating content for different geographic areas as this would most likely make geotargeting more difficult.
@Shekhar Sahu: If you can add "link" elements to the "head" section of your pages, you should be able to use this.
@Backup Brother: This is a decision that you have to make if you want to syndicate your content.
@Ian M: You can also use the "googlebot" meta tag, which overrules the more generic "robots" meta tag if you want to provide Google-specific directives per page.
I previously had a wordpress.com blog and now have it self-hosted. Unfortunately, wordpress does not support 301 redirects and the only choice I had was to use a 302 redirect. What are the implications?
So, what does the implementation look like? Is it simply the usual rel="canonical" tag but with a full path ("http://www...")?
Also, how are you planning to validate and avoid abuse? It seems like this opens up opportunity for people to try to scrape content and then canonicalize their sources out of existence.
You haven't actually specified how to implement the tag. We know how to do it with subdomains,
i.e. <link rel="canonical" href="http://www.mysite.com/master_copy.html" />
but what about cross domains?
<link rel="cross-domain-canonical" href="http://www.mastersite.com/master_copy.html" />?
Please let us know.
We've replaced our site with a new one, and the IT guy set up a 301 redirect, BUT:
1. we've lost all our rankings
2. 20% of our backlinks are gone
Would the impact of using "canonical" be different?
Can we do something?
Nice Tips.
Thank you very much. I often do redirects, so this will help me a lot.
And what about distributing your blog content to other social media sites like facebook with RSS. Is this considered duplicate content as well?
As far as I know, the canonical tag is accepted by other search engines as well; how about cross-domain canonicalisation?
Good day all, if you do not post this in your forum then I'll know Firefox is 100% a scam.
Today I went into Google to do a typical ranking check for my clients. I did the ranking check in Firefox with the Web Developer add-on installed, and my website was listed on the front page in the second position (I have a screen capture with the time and date in it). So good to go! But no... I went to Internet Explorer, and the same listing for the same domain was nowhere to be found (I also have a screen capture). So now I'm thinking, what the heck? Over 14 years of search engine optimization (I was studying ink search, the first search engine, back in the early 90's), and this is the first time something like this has happened to me. I also have a screen capture of my listing search a few minutes later with the Web Developer add-on uninstalled, and guess what: nowhere to be found. I'm not going to jump to any conclusions before I get a few opinions. If you know about SEO, then you know why I'm amazed this has happened. It doesn't add up at all. Please feel free to respond, Todd Herman.
That's a nice Xmas present, it's very useful in our context.
Cheeesus, Criss-Cross. A nice and good beginning to the cross-browser freefloatin-formation.
Now we're just waiting for a smooth tool to migrate from Blogspot without losing all the juice. The migration solutions available today don't seem to work too well.
I made a suggestion about using Tynt to provide this meta tag every time someone copy-pastes from your website, but would that work if this meta tag isn't in the header?
Hi, I am a Blogger newbie, so please forgive me if my question sounds silly.
I had my site on Blogspot, with the .blogspot.com suffix. I submitted my sitemaps and they were indexed by Google. Now I have acquired my own domain name, but I'm still hosting it on Blogger, with a redirect to my new domain name. I then decided to submit sitemaps for the new domain name too. So my question is: will Google regard this as duplicate content, since it's two different sitemaps with the same content, though one redirects to the other? And how do I correct it?
Do I have to use link href="new domain" rel="cross-domain-canonical"
for every post of my Blogger blog individually?
Todd: Google has provided personalized search results for some time now. Your site is important to you, so obviously it "should" come first - you probably click on it all the time! But IE won't have your cookies, and therefore you're seeing what an average person would see.
This is really good news, so we can be less worried about duplicate content. We have already used it on our web design site. We just added this property in a link attribute on all our secondary domains:
Thanks!
The Google webmaster help page at
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=139394
states in the video that you can't use a different domain (but a different sub-domain is okay). The text at the foot of the page "Can rel="canonical" be used to suggest a canonical url on a completely different domain?" isn't conclusive about this either.
Can you clarify whether this link element can be used with a completely different host name?
Hi,
John Mu, can you please answer this for me? Just wondering something. I'm very new to this, but I'm currently in the process of purchasing about 80 domain names for my industry. All of these are regular keyword searches. I have about 4 main pages on my website, and I want to divide the 80 keyword domains so that each lands on 1 of the 4 pages.
How should I do this? I want each keyword to be able to be found in the search engines when someone types in that particular keyword.
Should I have a separate landing page for each of the sites, even though they are the same as other sites?
Or should I redirect back to one site? If I do this, will each keyword show up in search engines?
Thanks for your help. Much Appreciated.
Justin.
@justin me too. I have more than 50 national domains right now. I've adapted some parts of the URL (and the canonical too) to the language of origin, but I still have some duplicate issues. And Merry Christmas to Google and everyone :)
This is possibly really good news for me, and I've implemented it in the only way I'm able - but I have no idea whether it'll work.
At my original site I have no control over the head of my documents. The only way I've been able to add the canonical meta tag is through Javascript - and because I've seen Googlebot process quite a lot of Javascript (links written through document.write, for example) I think that this might work.
But on the other hand, it might not. My canonical link is present in memory (it's readable by the Firefox Web Developer plug-in), but not on the source displayed by a browser. Any idea if Googlebot will be able to read it?
Example: http://www.printfection.com/retro-future
Very helpful!! Thanks so much!! God bless!!
Looks like Google is making most of you jump through hoops with new tags every few months... this is sad.
Posting spam to the google webmaster blog?
Does that improve rankings for the domains used in the comment?
Seems like shooting oneself in the foot.
It's not really clear. But thanks for the tips.
Your post has cleared up my understanding of canonical.
Thanks
Very helpful.
I like that it clearly mentions that canonical is not as powerful as redirects... and there's no harm in adding it.
I ping my website's new content to my blogs and a social network site, with links back to my site added in the new articles. Does this count as duplicate content, given that I was the first to post and all the content links back to my site? I must have a hundred ads using the first 5 lines of my home page?
Great Info. Learned a lot.
Googlebot seems not to be recognizing one of the main 3 words on my 200+ page website. Why would that be the case? My new website version is about a month old. Could it be that Google has not crawled all the pages? But in any case, this is a word that is used quite often on every page. http://www.HomeArchitects.com Thanks for your advice.
How can I use rel=canonical if I have duplicated content inside one file? (Within one file I have Flash content and alternate content for users with Flash disabled.) I would like to point Googlebot with a canonical URL to the Flash file, and not to the alternate content.
This cross-domain canonical tag is ideal in situations where you have multiple domains on the same website. We even used it last month in a solution where some pages of the main website (http://www.slaapgilde.nl/slaapadvies.aspx) were shown on sub-websites (http://www.devriesslapen.nl/slaapadvies.aspx). In this case we used the canonical tag on the sub-website.
With a 301 redirect I can transfer the Link Popularity values from a page to another, but what happens to the Link Popularity values if I use rel="canonical"?
I am considering publishing my entire site in a different language.
Will that be seen as duplicate content?
This is a really common issue for all multi-site owners.
Thanks for providing this nice information.
I'm wondering whether I should use this on the mobile versions of my pages. The article is the same, but the rest of the page content differs significantly: The standard version has ads, menus, navigation, and other stuff. Both are HTML 4.01 Transitional so I can't rely on Google automatically detecting that one is the mobile version. The mobile version always ranks higher, presumably because of the tighter content.
To add a canonical link for a home page at the root of the domain
http://www.domain.com/
should the tag be
link rel="canonical" href="http://www.example.com/"
or
link rel="canonical" href="http://www.example.com"/
I found little documentation about exact canonicals in a root document.
Is this the correct way to handle the problem of duplicate content for a recycled ip address that has 2 former domains still pointing to it?
Example: www.hbdirect.com (correct & current domain for the IP address)
These are the offending domains below, which are still pointing to MY new IP address:
http://poolcenter.idealbb.net
http://www.namasteyindia.com
Help please?
Our website is a global brand: we have an English version for the US and also an English version for each country. The content is nearly the same, with slight changes, but it remains consistent across the local domains due to brand policy. We want to target each country (google.com.sg, google.com.au, etc.), and we still need a presence on local websites (e.g., www.widget.com.sg instead of www.widget.com).
What should be the approach for this?
Can the URL end with a trailing slash? In the example the URL was: example.com/product.php?item=swedish-fish
Could the URL also be:
example.com/products/
Oh No... This is going to cause hackers to go ballistic.
Imagine if someone hacked your site, added a cross domain rel="canonical", created a copy of your content, and launched a new domain?
Link juice = stolen.
I wonder if this set of rules will have any use in the future?
They look like temporary patches to me.
This is a good solution for many purposes, however it has some disadvantages too:
- It requires a presence on each individual page
- Works on HTML only (what about images and other non-HTML content appearing in search results?)
- Customers with content management systems may not have access to the HTML headers
- Web services may not be easily editable in their HTML headers, yet they too are indexed
That's why I also like the idea of being able to specify the canonical host in robots.txt, as proposed here:
Proposal for Setting Canonical Host via Robots.txt
http://www.cloanto.com/users/mcb/20041011chrobots.html
Maybe both approaches could be supported by Google, etc.? After all, the dual robots.txt/HTML header approach has already shown its benefits in other scenarios where spidering and caching have to be managed.
But on a blog, to avoid DC, I use a canonical link to an article. I also have noindex on categories and tags. Would this be enough to avoid DC?
I wonder whether other search engines (like Bing or Yahoo) support the cross-domain canonical tag or not.
If the answer is no, what can we do without a 301 redirect (since Google AdWords policy doesn't allow 301 redirects for landing pages)?
I think that Google should even consider extending it across competitors. It would only require that, instead of a single canonical, there is one canonical per company. Two problems are created when Google discards pages (with similar content) from the index or omits them from the search results: first, authors do not always get the credit and, second, which company hosts the content might actually be important for the users. This single modification solves these two problems. There are many issues to consider, but there is no room here. Here is more about whether it makes sense to use the canonical tag across competing companies. Please remove the link if you think it is not useful to your readers. I understand that links are money for you. Of course, in my case, I don't make money with them.
My website is spread on 3 different servers in a cluster.
A couple of years ago I created the following A name entries in my DNS, but never published them online.
www3.mywebsite.ext
www4.mywebsite.ext
www5.mywebsite.ext
I had problems with some servers and needed to check through the web which server was down.
Besides that, since all uploads arrive at www3 and are then copied to all the other servers in the cluster, and since my customers upload a lot of images, after an image upload I show them [img src="http://www3.mywebsite.ext/image.jpg"] instead of [img src="http://www.mywebsite.ext/image.jpg"] to be sure they see the image (the sync could be a bit slow, and they might not see the uploaded image immediately).
Google is now indexing www, www3, www4 and www5. They are of course the same website and the google index has duplicate content.
I did read how to handle duplicate content across different websites, but I am not sure that the canonical links fit with my condition.
My problem seems most closely related to the "preferred domain" setting in Webmaster Central, where Google asks whether you'd like to set www.domain.ext or domain.ext as your preferred domain. Of course I looked for this in Webmaster Central, and I did not find a way to solve my problem.
So, I'm facing three possible solutions:
1) Eliminate the entries in my DNS (this will cause a lot of problems to resolve in the code)
2) To use the link rel="canonical" in each page of the website pointing it to "www.mywebsite.ext/page.html"
3) Continue to hope that Google has already provided a solution for this in Webmaster Central that I have not yet found.
Any suggestion?
If you have no inbound links toward www3, www4 and www5 and these domains are not useful to search engine users, then you lose nothing in using a noindex metatag or, even better, put restrictions in your robots.txt file and they will not get indexed. The only advantage of the canonical tag is that it allows a consolidation of the link properties.
@Dominic 108 - but if you did have a number of inbound links, would it be worth opening up these duplicate domains with the cross-domain canonical?
Will it take away crawl quota from the main sub-domain or does google treat each sub-domain differently in terms of crawl quota?
I have a very serious problem. Yesterday I moved the hosting of TechiSpot from a server in Australia to a server in the USA.
Today I got a warning from Google that I have duplicate content on my website, even though it is all original content; if I copy any chunk from another site, I always credit that website.
So how can I tell Google that it is the same site that was hosted in Australia and has now moved to the USA, and that it is not duplicated?
Please help me if anyone can
Important question:
What about duplicated e-commerce B2B/B2C?
I have a site for retailers and another for non-retailers.
...like, say, www.shop-B2B.com and www.shop-B2C.com
What should I do?
Thanks
Dave
Hi everyone,
As over a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Central Help Forum.
Thanks and take care,
The Webmaster Central Team