Helping webmasters from user to user

Thursday, December 24, 2009 at 2:30 PM

You have to have some kind of super-powers to keep up with all of the issues posted in our Webmaster Help Forum—that's why we call our Top Contributors the "Bionic Posters." They're able to leap through tall questions in a single bound, providing helpful and solid information all around. We're thankful to the Bionics for tackling problems both hard and easy (well, easy if you know how). Our current Bionic Posters are: Webado (Christina), Phil Payne, Red Cardinal (Richard), Shades1 (Louis), Autocrat, Tim Abracadabra, Aaron, Cristina, Robbo, John, Becky Sharpe, Sasch, BbDeath, Beussery (Brian), Chibcha (Terry), Luzie (Herbert), 奥宁 (Andy), Ashley, Kaleh and Redleg!

With thousands of webmasters visiting the English Help Forum every day, some questions naturally pop up more often than others. To help catch these common issues, the Bionic Posters have also helped to create and maintain a comprehensive list of frequently asked questions and their answers. These FAQs cover everything from "Why isn't my site indexed?" to diagnosing difficult issues with the help of Google Webmaster Tools, often referring to our Webmaster Help Center for specific topics. Before you post in the forum, make sure you've read through these resources and done a quick search in the forum; chances are high that your question has already been answered there.

Besides the Bionic Posters, we're lucky to have a number of very active and helpful users in the forum, such as: squibble, Lysis, yasir, Steven Lockey, seo101, RickyD, MartinJ and many more. Thank you all for making this community so captivating and—most of the time—friendly.

Here are just a few (well, a little more than a few) of the many comments that we've seen posted in the forum:

  • "Thank you for this forum... Thank you to those that take the time to answer and care!"
  • "I've only posted one question here, but have received a wealth of knowledge by reading tons of posts and answers. The time you experts put into helping people with their problems is very inspiring and my hat's off to each of you. Anyway, I just wanted to let you know that your services aren't going unnoticed and I truly appreciate the lessons."
  • "Thank you very much cristina, what you told me has done the trick. I really appriciate the help as this has been bugging me for a while now and I didn't know what was wrong."
  • "thank you ssssssssssoooo much kaleh. "
  • "OK, Phil Payne big thanks to You! I have made changes and maybe people are starting to find me in G! Thanks to Ashley, I've started to make exclusive and relevant content for people."
  • "If anything, it has helped me reflect on the sites and projects of days gone by so as to see what I could have done better - so that I can deliver that much more and better results going forward. I've learned that some things I had done right, were spot on, and other issues could have been handled differently, as well as a host of technical information that I've stored away for future use. Bottom Line: this forum rocks and is incredibly helpful."
  • "I asked a handful of questions, got GREAT help while doing a whole lot of lurking, and now I've got a site that rocks!! (...) Huge thanks to all the Top Contributors, and a very special mention to WEBADO, who helped me a TON with my .htaccess file."
  • "Over the years of reading (and sometimes contributing) to this forum I think it has helped to remove many false assumptions and doubts over Google's ranking systems. Contrary to what many have said I verily believe Google can benefit small businesses. Keep up the good work. "
  • "The forum members are awesome and are a most impressive bunch. Their contribution is immeasurable as it is huge. Not only have they helped Google in their success as a profitable business entity, but also helped webmasters both aspiring and experienced. There is also an engender(ment) of "family" or "belonging" in the group that has transcended the best and worst of times (Current forum change still TBD :-) ). We can agree, disagree and agree to disagree but remain respectful and civil (Usually :-) )."
  • "Hi Redleg, Thank you very much for all of the information. Without your help, I don't think I would ever have known how to find the problem. "
  • "What an amazing board. Over the last few days I have asked 1 question and recieved a ton of advice mainly from Autocrat. "
  • "A big thank you to the forum and the contributors that helped me get my site on Google . After some hassle with my web hosters and their naff submission service, issues over adding pages Google can see, issues over Sitemaps, I can now say that when I put my site name into the search and when i put in [custom made watch box], for instance, my site now comes up."
  • "Thank you Autocrat! You are MAGNIFICENT! (...) I am your biggest fan today. : ) Imagine Joe Cocker singing With a Little Help from My Friends...that's my theme song today."
  • "I've done a lot of reading since then and I've learned more in the last year than I learned in the previous 10. When I stumbled into this forum I had no idea what I was getting into but finding this forum was a gift from God! Words cannot express the amount of gratitude I feel for the help you have given me and I wish I could repay you some how.... I don't mean to sound so mushy, but I write this with tears in my eyes and I am truly, truly grateful..."

Are you new to the Webmaster Help Forum? Tell us a little bit about yourself and then join us to learn more and help others!

Handling legitimate cross-domain content duplication

Tuesday, December 15, 2009 at 2:47 PM

Webmaster level: Intermediate

We've recently discussed several ways of handling duplicate content on a single website; today we'll look at ways of handling similar duplication across different websites on different domains. For some sites, there are legitimate reasons to duplicate content across different websites: for instance, to migrate to a new domain name using a web server that cannot create server-side redirects. To help with issues that arise on such sites, we're announcing our support of the cross-domain rel="canonical" link element.



Ways of handling cross-domain content duplication:
  • Choose your preferred domain
    When confronted with duplicate content, search engines will generally take one version and filter the others out. This can also happen when multiple domain names are involved, so while search engines are generally pretty good at choosing something reasonable, many webmasters prefer to make that decision themselves.
  • Enable crawling and use 301 (permanent) redirects where possible
    Where possible, the most important step is often to use appropriate 301 redirects. These redirects send visitors and search engine crawlers to your preferred domain and make it very clear which URL should be indexed. This is generally the preferred method as it gives clear guidance to everyone who accesses the content. Keep in mind that in order for search engine crawlers to discover these redirects, none of the URLs in the redirect chain can be disallowed via a robots.txt file. Don't forget to handle your www / non-www preference with appropriate redirects and in Webmaster Tools.
  • Use the cross-domain rel="canonical" link element
    There are situations where it's not easily possible to set up redirects. This could be the case when you need to move your website from a server that does not feature server-side redirects. In a situation like this, you can use the rel="canonical" link element across domains to specify the exact URL of whichever domain is preferred for indexing. While the rel="canonical" link element is seen as a hint and not an absolute directive, we do try to follow it where possible.


Still have questions?

Q: Do the pages have to be identical?
A: No, but they should be similar. Slight differences are fine.

Q: For technical reasons I can't include a 1:1 mapping for the URLs on my sites. Can I just point the rel="canonical" at the homepage of my preferred site?
A: No; this could result in problems. A mapping from old URL to new URL for each URL on the old site is the best way to use rel="canonical".
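
As a rough illustration of the one-to-one mapping described above, the following Python sketch builds a rel="canonical" link element for each old URL by keeping the path and query string and swapping in the preferred host. The function name canonical_link and the example.com host names are hypothetical and only serve the example; how the element gets into each page's head section depends on your own templates.

```python
import html
from urllib.parse import urlsplit, urlunsplit


def canonical_link(old_url, preferred_host, scheme="http"):
    """Return a rel="canonical" link element pointing at the same path
    (and query string) on the preferred host.

    The helper name and parameters are illustrative assumptions.
    """
    parts = urlsplit(old_url)
    # Keep path and query so every old URL maps to its own new URL,
    # rather than sending everything to the homepage.
    new_url = urlunsplit((scheme, preferred_host, parts.path or "/", parts.query, ""))
    return f'<link rel="canonical" href="{html.escape(new_url, quote=True)}">'


print(canonical_link("http://old.example.com/products/widget.html?color=blue",
                     "www.example.com"))
```

The element belongs in the head section of the page on the old domain, and the page it points to should have similar content.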

Q: I'm offering my content / product descriptions for syndication. Do my publishers need to use rel="canonical"?
A: We leave this up to you and your publishers. If the content is similar enough, it might make sense to use rel="canonical", if both parties agree.

Q: My server can't do a 301 (permanent) redirect. Can I use rel="canonical" to move my site?
A: If it's at all possible, you should work with your webhost or web server to do a 301 redirect. Keep in mind that we treat rel="canonical" as a hint, and other search engines may handle it differently. But if a 301 redirect is impossible for some reason, then a rel="canonical" may work for you. For more information, see our guidelines on moving your site.

Q: Should I use a noindex robots meta tag on pages with a rel="canonical" link element?
A: No, since those pages would not be equivalent with regard to indexing - one would be allowed while the other would be blocked. Additionally, it's important that these pages are not disallowed from crawling through a robots.txt file, otherwise search engine crawlers will not be able to discover the rel="canonical" link element.

We hope this makes it easier for you to handle duplicate content in a user-friendly way. Are there still places where you feel that duplicate content is causing your sites problems? Let us know in the Webmaster Help Forum!


Your site's performance in Webmaster Tools

Friday, December 04, 2009 at 2:26 PM

Webmaster level: Intermediate

Let's take a quick look at the individual sections in the Google Webmaster Tools' Site Performance feature:

Performance overview



The performance overview shows a graph of the aggregated speed numbers for the website, based on the pages that were most frequently accessed by visitors who use the Google Toolbar with the PageRank feature activated. By using data from Google Toolbar users, you don't have to worry about us testing your site from a location that your users do not use. For example, if your site is in Germany and all your users are in Germany, the chart will reflect the load time as seen in Germany. Similarly, if your users mostly use dial-up connections (or high-speed broadband), that would be reflected in these numbers as well. If only a few visitors of your site use the Google Toolbar, we may not be able to show this data in Webmaster Tools.

The line between the red and the green sections on the chart is the 20th percentile: only 20% of the sites we check are faster than this. This website is pretty close to the 20% mark. Which pages would we have to work on first?

Example pages with load times



In this section you can find some example pages along with the average, aggregated load times that users observed while they were on your website. These numbers may differ from what you see as they can come from a variety of different browsers, internet connections and locations. This list can help you to recognize pages which take longer than average to load — pages that slow your users down.

As the page load times are based on actual accesses made by your users, the list may include pages which are disallowed from crawling. While Googlebot will not be able to crawl disallowed pages, they may be a significant part of your site's user experience.

Keep in mind that you may see occasional spikes here, so it's recommended that you watch the load times over a short period to see what's stable. If you consistently see very large load times, that probably means that most of your users are seeing very slow page loads (whether due to slow connections or otherwise), so it's something you should take seriously.

Page Speed suggestions



These suggestions are based on the Page Speed Firefox / Firebug plugin. In order to find the details for these sample URLs, we fetch the page and all its embedded resources with Googlebot. If we are not able to fetch all of the embedded content with Googlebot, we may not be able to provide a complete analysis. Similarly, if the servers return slightly different content to Googlebot than they would to normal users, this may affect what is shown here. For example, some servers return uncompressed content for Googlebot, similar to what would be served to older browsers that do not support gzip-compressed embedded content (this is currently the case for Google Analytics' "ga.js").
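
To get a feel for why compression matters in these suggestions, here is a small Python sketch that compares the size of a text resource before and after gzip compression. The payload is a made-up, repetitive script body chosen only for illustration; real savings depend on the actual content, and the function name gzip_savings is an assumption of this example.

```python
import gzip


def gzip_savings(payload: bytes):
    """Return (original_size, compressed_size) in bytes for a payload."""
    compressed = gzip.compress(payload)
    return len(payload), len(compressed)


# Hypothetical script content; repetitive text compresses very well.
script = b"function track(page) { return page; }\n" * 200
original, compressed = gzip_savings(script)
print(f"{original} bytes uncompressed, {compressed} bytes gzip-compressed")
```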

When looking at flagged issues regarding common third-party code such as website analytics scripts, one factor that can also play a role is how wide-spread these scripts are on the web. If they are common across the web, chances are that the average user's browser will have already cached the DNS lookup and the content of the script. While these scripts will still be flagged as separate DNS lookups, in practice they might not play a strong role in the actual load time.

We offer these suggestions as a useful guideline regarding possible first performance improvement steps and recommend using the Page Speed plugin (or a similar tool) directly when working on your website. This allows you to better recognize the blocking issues and makes it easy to see how modifications on the server affect the total load time.


For questions about Webmaster Tools and this new feature, feel free to read the Help Center article, search and post in the Webmaster Help Forums or in the Page Speed discussion group. We hope this information helps you make your website even faster!

How fast is your site?

Wednesday, December 02, 2009 at 2:04 PM

We've just launched Site Performance, an experimental feature in Webmaster Tools that shows you information about the speed of your site and suggestions for making it faster.

This is a small step in our larger effort to make the web faster. Studies have repeatedly shown that speeding up your site leads to increased user retention and activity, higher revenue and lower costs. Towards the goal of making every webpage load as fast as flipping the pages of a magazine, we have provided articles on best practices, active discussion forums and many tools to diagnose and fix speed issues.

Now we bring data and statistics specifically applicable to your site. On Site Performance, you'll find how fast your pages load, how they've fared over time, how your site's load time compares to that of other sites, examples of specific pages and their actual page load times, and Page Speed suggestions that can help reduce user-perceived latency. Our goal is to bring you specific and actionable speed information backed by data, so stay tuned for more of this in the future.

screenshot of Site Performance

The load time data is derived from aggregated information sent by users of your site who have installed the Google Toolbar and opted in to its enhanced features. We only show the performance charts and tables when there's enough data, so not all of them may be shown if your site has little traffic. The data currently represents a global average; a specific user may experience your site faster or slower than the average depending on their location and network conditions.

This is a Labs product that is still in development. We hope you find it useful. Please let us know your feedback through the Webmaster Tools Forum.

Update on 12/04/2009: Our team just reconvened to provide you more information on this feature. Check out JohnMu's latest post on Site Performance!

New User Agent for News

Webmaster Level: Intermediate

Today we are announcing a new user agent for robots.txt called Googlebot-News that gives publishers even more control over their content. In case you haven't heard of robots.txt, it's a web-wide standard that has been in use since 1994 and which has support from all major search engines and well-behaved "robots" that process the web. When a search engine checks whether it has permission to crawl and index a web page, the "check if we're allowed to crawl this page" mechanism is robots.txt.

Until now, publishers could contact us via a form if they didn't want to be included in Google News but did want to be in Google's web search index. Now, publishers can manage their content in Google News in an even more automated way. Site owners can just add Googlebot-News specific directives to their robots.txt file. Similar to the Googlebot and Googlebot-Image user agents, the new Googlebot-News user agent can be used to specify which pages of a website should be crawled and ultimately appear in Google News.

Here are a few examples for publishers:

Include pages in both Google web search and News:
User-agent: Googlebot
Disallow:

This is the easiest case. In fact, a robots.txt file is not even required for this case.

Include pages in Google web search, but not in News:
User-agent: Googlebot
Disallow:

User-agent: Googlebot-News
Disallow: /

This robots.txt file says that no files are disallowed from Google's general web crawler, called Googlebot, but the user agent "Googlebot-News" is blocked from all files on the website.

Include pages in Google News, but not Google web search:
User-agent: Googlebot
Disallow: /

User-agent: Googlebot-News
Disallow:

When parsing a robots.txt file, Google obeys the most specific directive. The first two lines tell us that Googlebot (the user agent for Google's web index) is blocked from crawling any pages from the site. The next directive, which applies to the more specific user agent for Google News, overrides the blocking of Googlebot and gives permission for Google News to crawl pages from the website.

Block different sets of pages from Google web search and Google News:
User-agent: Googlebot
Disallow: /latest_news

User-agent: Googlebot-News
Disallow: /archives

The pages blocked from Google web search and Google News can be controlled independently. This robots.txt file blocks recent news articles (URLs in the /latest_news folder) from Google web search, but allows them to appear on Google News. Conversely, it blocks premium content (URLs in the /archives folder) from Google News, but allows them to appear in Google web search.

Stop Google web search and Google News from crawling pages:
User-agent: Googlebot
Disallow: /

This robots.txt file tells Google that Googlebot, the user agent for our web search crawler, should not crawl any pages from the site. Because no specific directive for Googlebot-News is given, our News search will abide by the general guidance for Googlebot and will not crawl pages for Google News.
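
The behavior in the examples above (the most specific user agent group wins, and Googlebot-News falls back to the Googlebot group when it has none of its own) can be sketched in Python as follows. This is a simplified, hand-written model for reasoning about your own robots.txt file, not Google's actual parser: it only understands User-agent and Disallow lines with plain path prefixes, and the function names are assumptions of this example.

```python
def parse_groups(robots_txt):
    """Map each lowercased user agent to its list of Disallow prefixes."""
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                current_agents = []
                seen_rule = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            seen_rule = True
            if value:  # an empty Disallow blocks nothing
                for agent in current_agents:
                    groups[agent].append(value)
    return groups


def pick_group(groups, crawler):
    """Choose the longest user agent name that the crawler name starts with."""
    crawler = crawler.lower()
    candidates = [a for a in groups if a != "*" and crawler.startswith(a)]
    if candidates:
        return max(candidates, key=len)
    return "*" if "*" in groups else None


def is_allowed(robots_txt, crawler, path):
    """Return True if this simplified model lets the crawler fetch the path."""
    groups = parse_groups(robots_txt)
    agent = pick_group(groups, crawler)
    if agent is None:
        return True
    return not any(path.startswith(prefix) for prefix in groups[agent])


NEWS_ONLY = """User-agent: Googlebot
Disallow: /

User-agent: Googlebot-News
Disallow:
"""

print(is_allowed(NEWS_ONLY, "Googlebot", "/story.html"))       # False
print(is_allowed(NEWS_ONLY, "Googlebot-News", "/story.html"))  # True
```

With only the Googlebot group present, the same model reports that Googlebot-News is blocked as well, which matches the last example above.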

For some queries, we display results from Google News in a discrete box or section on the web search results page, along with our regular web search results. We sometimes do this for Images, Videos, Maps, and Products, too. This is known as Universal search results. Since Google News powers Universal "News" search results, if you block the Googlebot-News user agent then your site's news stories won't be included in Universal search results.

We are currently testing our support for the new user agent. If you see any problems please let us know. Note that it is possible for Google to return a link to a page in some situations even when we didn't crawl that page. If you'd like to read more about robots.txt, we provide additional documentation on our website. We hope webmasters will enjoy the flexibility and easier management that the Googlebot-News user agent provides.

Region Tags in Google Search Results

Tuesday, December 01, 2009 at 4:25 PM

Webmaster Level: All

Country-code top-level domains (or ccTLDs) can provide people with a quick and valuable clue about the location of a website—for example, ".fr" for France or ".co.jp" for Japan. However, for certain top level domains like .com, .info and .org, it's not as easy to figure out the location. That's why today we're adding region information supplied by webmasters to the green address line on some Google search results.

This feature is easiest to explain through an example. Let's say you've heard about a boxing club in Canada called "Capital City Boxing." You try a search for [capital city boxing] to find out more, but it's hard to tell which result is the one you're looking for. Here's a screen shot:


None of the results provide any location information in the title or snippet, nor do they have a regional TLD (such as .ca for Canada). The only way to find the result you're looking for is to refine your search ([capital city boxing canada] works) or click through the various links to figure it out. Clicking through the first result reveals that there's apparently another "Capital City Boxing" club in Alabama.

Region tags improve search results by providing valuable information about website location right in the green URL line. Continuing our prior example, here's a screen shot of the new region tag (circled in red):



As you can see, the fourth result now includes the region name "Canada" after the green URL, so you can immediately tell that this result relates to the boxing club in Canada. With the new display, you no longer need to refine your search or click through the results to figure out which page is the one you're looking for. In general, our hope is that these region tags will help searchers more quickly identify which results are most relevant to their queries.

As a webmaster, you can control how this feature works by adjusting your Geographic Targeting settings. Log in to Webmaster Tools and choose Site configuration > Settings > Geographic Target. From here you can associate a particular country/region with your site. These settings will determine the name that appears as a region tag. You can learn more about using the Geographic Target tool in a prior blog post and in our Help Center.

We currently show region tags only for certain domains such as .com and .net where the location information would otherwise be unclear. We don't show region tags for results on domains like .br for Brazil, because the location is already implied by the green URL line in our default display. In addition, we only display region tags when the region supplied by the site owner is different from the domain where the search was entered. For example, if you do a search from the Singapore Google domain (google.com.sg), we won't show you region tags for all the websites webmasters have targeted to Singapore because we'd end up tagging too many results, and the tag is really most relevant for foreign regions. For the initial release, we anticipate roughly 1% of search results pages will include webpages with a region tag.
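
The two display conditions described above can be restated as a short Python sketch. It is only an illustrative reading of this paragraph, not the logic Google runs: the set of generic top-level domains, the function name and its parameters are all assumptions made for the example.

```python
# Assumption: a small sample of generic TLDs where location is unclear.
GENERIC_TLDS = {"com", "net", "org", "info"}


def shows_region_tag(result_host, targeted_region, search_region):
    """Return True if a region tag would be shown under the two rules above."""
    tld = result_host.rsplit(".", 1)[-1].lower()
    if tld not in GENERIC_TLDS:
        return False  # a country-code domain already implies the location
    if targeted_region is None:
        return False  # the webmaster has not set a geographic target
    # Only tag results whose region differs from where the search was made.
    return targeted_region != search_region


print(shows_region_tag("www.example.com", "Canada", "United States"))   # True
print(shows_region_tag("www.example.com", "Singapore", "Singapore"))    # False
```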

We hope you'll find this new feature useful, and we welcome your feedback.

Changes in First Click Free

Webmaster level: Intermediate

We love helping publishers make their content available to large groups of readers, and working on ways to make the world's information useful and accessible through our search results. At the same time, we're also aware of the fact that creating high-quality content is not easy and, in many cases, expensive. This is one of the reasons why we initially launched First Click Free for Google News and Google Web Search -- to allow publishers to sell access to their content in general while still allowing users to find it through our search results.

While we're happy to see that a number of publishers are already using First Click Free, we've found that some who might try it are worried about people abusing the spirit of First Click Free to access almost all of their content. As most users are generally happy to be able to access just a few pages from these premium content providers, we've decided to allow publishers to limit the number of accesses under the First Click Free policy to five free accesses per user each day. This change applies both to Google News publishers and to websites indexed in Google's Web Search. We hope that this encourages even more publishers to open up more content to users around the world!

Questions and answers about First Click Free

Q: Do the rest of the old guidelines still apply?
A: Yes, please check the guidelines for Google News as well as the guidelines for Web Search and the associated blog post for more information.

Q: Can I apply First Click Free to only a section of my site / only for Google News (or only for Web Search)?
A: Sure! Just make sure that both Googlebot and users from the appropriate search results can view the content as required. Keep in mind that showing Googlebot the full content of a page while showing users a registration page would be considered cloaking.

Q: Do I have to sign up to use First Click Free?
A: Please let us know about your decision to use First Click Free if you are using it for Google News. There's no need to inform us of the First Click Free status for Google Web Search.

Q: What is the preferred way to count a user's accesses?
A: Since there are many different site architectures, we believe it's best to leave this up to the publisher to decide.
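
Purely as one possible approach, and not a recommended or required one, a publisher could keep a per-user, per-day counter along these lines. The class name, the user_key argument (for example a cookie or session identifier) and the in-memory storage are all assumptions of this sketch; a real site would choose its own way of identifying users and storing counts.

```python
from collections import defaultdict
from datetime import date

FREE_ACCESSES_PER_DAY = 5  # the daily limit mentioned in this post


class FirstClickFreeCounter:
    """Track free accesses per user per day in memory (illustrative only)."""

    def __init__(self, limit=FREE_ACCESSES_PER_DAY):
        self.limit = limit
        self._counts = defaultdict(int)

    def allow(self, user_key, today=None):
        """Return True and record the access if the user is under the limit."""
        today = today or date.today()
        key = (user_key, today)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


counter = FirstClickFreeCounter()
day = date(2009, 12, 1)
results = [counter.allow("reader-1", day) for _ in range(6)]
print(results)  # five True values, then False
```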

(Please see our related blog post for more information on First Click Free for Google News.)

GENERIC CIALIS on my website? I think my site has been hacked!

Thursday, November 26, 2009 at 3:23 AM

How to use "Fetch as Googlebot", part 1
Webmaster level: Intermediate

Has your site ever dropped suddenly from the index or disappeared mysteriously from search results? Have you ever received a notice that your site is using cloaking techniques? Unfortunately, sometimes a malicious party "hacks" a website: they penetrate the security of a site and insert undesirable content. Sophisticated attackers can camouflage this spammy or dangerous content so that it doesn't appear for normal users, and appears only to Googlebot, which could negatively impact your site in Google's results.

In such cases it used to be very difficult to detect the problem, because the site would appear normal in the eyes of the user. The hidden content might be visible only to requests that both carry a Googlebot User-agent and come from one of Googlebot's IP addresses. But that's over: with Fetch as Googlebot, the new Labs feature in Webmaster Tools, you can see exactly what Googlebot is seeing, and avoid any kind of cloaking problem. We'll show you how:

Let's imagine that Bob, the administrator of www.example.com, is searching for his site but he finds this instead:



That's strange, because when he looks at the source code of www.example.com, it looks fine:



Bob may be surprised to receive a notice from Google warning him that his site is not complying with Google's quality guidelines. Fortunately, he has his site registered with Webmaster Tools. Let's see how he can check what Googlebot sees:

First Bob logs into Webmaster Tools and selects www.example.com. The Fetch as Googlebot feature will be at the bottom of the navigation menu, in the Labs section:



The page will contain a field where you can insert the URL to fetch. It can also be left blank to fetch the homepage.



Bob can simply click Fetch and wait a few seconds. After refreshing the page, he can see the status of the fetch request. If it succeeds, he can click on the "Success" link...



...and that will show the details, with the content of the fetched page:



Aha! There's the spammy content! Now Bob can be certain that www.example.com has been hacked.
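
If Bob saves both versions of the page (the source he sees in his browser and the content shown by Fetch as Googlebot), the comparison can also be done mechanically. The Python sketch below uses the standard difflib module to list lines that appear only in the Googlebot version. The two HTML snippets and the function name are invented for illustration; real pages will be larger and may differ for harmless reasons too.

```python
import difflib


def lines_only_seen_by_googlebot(browser_html, googlebot_html):
    """Return the lines that were added in the Googlebot version of a page."""
    diff = difflib.ndiff(browser_html.splitlines(), googlebot_html.splitlines())
    # ndiff marks lines present only in the second input with "+ ".
    return [line[2:].strip() for line in diff if line.startswith("+ ")]


# Hypothetical saved copies of the same page.
browser_version = "<html>\n<body>\n<p>Welcome</p>\n</body>\n</html>"
googlebot_version = (
    "<html>\n<body>\n<p>Welcome</p>\n"
    '<a href="http://spam.example/">cheap pills</a>\n'
    "</body>\n</html>"
)

for line in lines_only_seen_by_googlebot(browser_version, googlebot_version):
    print("Only served to Googlebot:", line)
```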

Confirming that the website has been hacked (and perhaps is still hacked) is an important step. It is, however, only the beginning. For more information, we strongly suggest getting help from your server administrator or hoster and reading our previous blog posts on the subject of hacked sites:


If you have any questions about how to use the Fetch as Googlebot feature, feel free to drop by the Webmaster Help Forum. If you feel that your website might be hacked but are having problems resolving it, you might want to ask the experts in our "Malware and Hacked sites" category.

PS Keep in mind that once you have removed hacked content from your site, it will generally still take time for us to update our search results accordingly. There are a number of factors that affect crawling and indexing of your content so it's impossible to give a time-frame for that.

Hard facts about comment spam

Wednesday, November 25, 2009 at 7:17 PM

Webmaster Level: Beginner

It has probably happened to you: you're reading articles or watching videos on the web, and you come across some unrelated, gibberish comments. You may wonder what this is all about. Some webmasters abuse other sites by exploiting their comment fields, posting tons of links that point back to the poster's site in an attempt to boost their site's ranking. Others might tweak this approach a bit by posting a generic comment (like "Nice site!") with a commercial user name linking to their site.

Why is it bad?

FACT: Abusing comment fields of innocent sites is a bad and risky way of getting links to your site. If you choose to do so, you are tarnishing other people's hard work and lowering the quality of the web, transforming a potentially good resource of additional information into a list of nonsense keywords.

FACT: Comment spammers are often trying to improve their site's organic search ranking by creating dubious inbound links to their site. Google has an understanding of the link graph of the web, and has algorithmic ways of discovering those alterations and tackling them. At best, a link spammer might spend hours doing spammy linkdrops which would count for little or nothing because Google is pretty good at devaluing these types of links. Think of all the more productive things one could do with that time and energy that would provide much more value for one's site in the long run.


Promote your site without comment spam

If you want to improve your site's visibility in the search results, spamming comments is definitely not the way to go. Instead, think about whether your site offers what people are looking for, such as useful information and tools.

FACT: Having original and useful content and making your site search engine friendly is the best strategy for better ranking. With an appealing site, you'll be recognized by the web community as a reliable source and links to your site will build naturally.

Moreover, Google provides advice on improving the crawlability and indexability of your site. Check out our Search Engine Optimization Starter Guide.

What can I do to avoid spam on my site?

Comments can be a really good source of information and an efficient way of engaging a site's users in discussions. This valuable content should not be replaced by gibberish nonsense keywords and links. For this reason there are many ways of securing your application and disincentivizing spammers.
  • Disallow anonymous posting.
  • Use CAPTCHAs and other methods to prevent automated comment spamming.
  • Turn on comment moderation.
  • Use the "nofollow" attribute for links in the comment field.
  • Disallow hyperlinks in comments.
  • Block comment pages using robots.txt or meta tags.
For detailed information about these topics, check out our Help Center document on comment spam.
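
As an example of the "nofollow" item in the list above, here is a minimal Python sketch that renders a plain-text comment as HTML: all markup typed by the commenter is escaped, and any http or https URL becomes a link carrying rel="nofollow". The function name and the simple URL pattern are assumptions of this example; most comment systems and CMS platforms offer their own setting for this, which is usually the better choice.

```python
import html
import re

# Deliberately simple pattern: a URL ends at whitespace, quotes or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def render_comment(text):
    """Escape a plain-text comment and link its URLs with rel="nofollow"."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(render_comment("Nice site! http://spam.example/buy"))
```

To disallow hyperlinks in comments altogether, the same function could append the escaped URL text without wrapping it in a link.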

My site is full of comment spam, what should I do?

It's never too late! Don't let spammers ruin the experience for others. Adopt the security measures discussed above to stop the spam activity, then invest some time to clean up the spammy comments and ban the spammers from your site. Depending on your site's system, you may be able to save time by banning spammers and removing their comments all at once, rather than one by one.

If I spammed comment fields of third party sites, what should I do?

If you used this approach in the past and you want to solve this issue, you should have a look at your incoming links in Webmaster Tools. To do so, go to the Your site on the web section and click on Links to your site. If you see suspicious links coming from blogs or other platforms allowing comments, you should check these URLs. If you see a spammy link you created, try to delete it; otherwise, contact the webmaster and ask them to remove the link. Once you've cleared the spammy inbound links you made, you can file a reconsideration request.

For more information about this topic and to discuss it with others, join us in the Webmaster Help Forum. (But don't leave spammy comments!)

'New software version' notifications for your site

Friday, November 20, 2009 at 9:30 AM

Webmaster level: All

One of the great things about working at Google is that we get to take advantage of an enormous amount of computing power to do some really cool things. One idea we tried out was to let webmasters know about their potentially hackable websites. The initial effort was successful enough that we thought we would take it one step further by expanding our efforts to cover other types of web applications—for example, more content management systems (CMSs), forum/bulletin-board applications, stat-trackers, and so on.

This time, however, our goal is not just to isolate vulnerable or hackable software packages, but to also notify webmasters about newer versions of the software packages or plugins they're running on their website. For example, there might be a Drupal module or Joomla extension update available but some folks might not have upgraded. There are a few reasons a webmaster might not upgrade to the newer version and one of the reasons could be that they just don't know a new version exists. This is where we think we can help. We hope to let webmasters know about new versions of their software by sending them a message via Webmaster Tools. This way they can make an informed decision about whether or not they would like to upgrade.

One of the ways we identify sites to notify is by parsing source code of web pages that we crawl. For example, WordPress and other CMS applications include a generator meta tag that specifies the version number. This has proven to be tremendously helpful in our efforts to notify webmasters. So if you're a software developer, and would like us to help you notify your users about newer versions of your software, a great way to start would be to include a generator meta tag that tells the version number of your software. If you're a plugin or a widget developer, including a version number in the source you provide to your users is a great way to help too.
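
To make the generator meta tag idea concrete, here is a small Python sketch that pulls the tag's content out of a page's source using the standard library's HTML parser. It only illustrates the general technique and is not a description of how Google's own parsing works; the class name, function name, and sample page are hypothetical.

```python
from html.parser import HTMLParser

class GeneratorFinder(HTMLParser):
    """Remember the content of the first <meta name="generator"> tag seen."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generator = attributes.get("content")

def find_generator(page_source: str):
    """Return the generator string, or None if the page does not declare one."""
    finder = GeneratorFinder()
    finder.feed(page_source)
    return finder.generator

# Sample page source, made up for this example.
sample = '<html><head><meta name="generator" content="WordPress 2.8.6" /></head></html>'
print(find_generator(sample))
```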

We've seen divided opinions over time about whether it's a good security practice to include a version number in source code, because it lets hackers or worm writers know that the website might be vulnerable to a particular type of exploit. But as Matt Mullenweg pointed out, "Where [a worm writer's] 1.0 might have checked for version numbers, 2.0 just tests [a website's] capabilities...". Meanwhile, the advantage of a version number is that it can help alert site owners when they need to update their site. In the end, we tend to think that including a version number can do more good than harm.

We plan to begin sending out the first of these messages soon and hope that webmasters find them useful! If you have any questions or feedback, feel free to comment here.

Running desktop and mobile versions of your site

Wednesday, November 18, 2009 at 10:15 PM

(This post was largely translated from our Japanese version of the Webmaster Central Blog.)

Recently I introduced several methods to ensure your mobile site is properly indexed by Google. Today I'd like to share information useful for webmasters who manage both desktop and mobile phone versions of a site.

One of the most common problems for webmasters who run both mobile and desktop versions of a site is that the mobile version of the site appears for users on a desktop computer, or that the desktop version of the site appears when someone finds it from a mobile device. In dealing with this scenario, here are two viable options:

Redirect mobile users to the correct version
When a mobile user or crawler (like Googlebot-Mobile) accesses the desktop version of a URL, you can redirect them to the corresponding mobile version of the same page. Google notices the relationship between the two versions of the URL and displays the standard version for searches from desktops and the mobile version for mobile searches.

If you redirect users, please make sure that the content on the corresponding mobile/desktop URL matches as closely as possible. For example, if you run a shopping site and there's an access from a mobile phone to a desktop-version URL, make sure that the user is redirected to the mobile version of the page for the same product, and not to the homepage of the mobile version of the site. We occasionally find sites using this kind of redirect in an attempt to boost their search rankings, but this practice only results in a negative user experience, and so should be avoided at all costs.

On the other hand, when there's an access to a mobile-version URL from a desktop browser or by our web crawler, Googlebot, it's not necessary to redirect them to the desktop version. For instance, Google doesn't automatically redirect desktop users from its mobile site to its desktop site; instead, it includes a link on the mobile-version page to the desktop version. These links are especially helpful when a mobile site doesn't provide the full functionality of the desktop version -- users can easily navigate to the desktop version if they prefer.
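
The redirect rule described above might be sketched in Python as follows. The host names, the list of User-agent substrings, and the function names are all assumptions chosen for illustration; a real site would plug in its own URL scheme and device detection.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host names for the two versions of the site.
DESKTOP_HOST = "www.example.com"
MOBILE_HOST = "m.example.com"

# Illustrative substrings only; real device detection needs a fuller list.
MOBILE_UA_HINTS = ("Googlebot-Mobile", "iPhone", "Android", "BlackBerry")

def is_mobile(user_agent: str) -> bool:
    return any(hint in user_agent for hint in MOBILE_UA_HINTS)

def mobile_redirect_target(url: str, user_agent: str):
    """Return the mobile URL to redirect to, or None if no redirect applies."""
    parts = urlsplit(url)
    # Only mobile visitors to desktop URLs are redirected; desktop visitors
    # to mobile URLs are left where they are.
    if parts.netloc != DESKTOP_HOST or not is_mobile(user_agent):
        return None
    # Keep the path and query so the visitor lands on the same product page,
    # not on the mobile homepage.
    return urlunsplit((parts.scheme, MOBILE_HOST, parts.path, parts.query, parts.fragment))
```

For example, a request for http://www.example.com/products/42?color=red from a phone would map to http://m.example.com/products/42?color=red, while the same request from a desktop browser would return None.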

Switch content based on User-agent
Some sites have the same URL for both desktop and mobile content, but change their format according to User-agent. In other words, both mobile users and desktop users access the same URL (i.e. no redirects), but the content/format changes slightly according to the User-agent. In this case, the same URL will appear for both mobile search and desktop search, and desktop users can see a desktop version of the content while mobile users can see a mobile version of the content.

However, note that if you fail to configure your site correctly, your site could be considered to be cloaking, which can lead to your site disappearing from our search results. Cloaking refers to an attempt to boost search result rankings by serving different content to Googlebot than to regular users. This causes problems such as less relevant results (pages appear in search results even though their content is actually unrelated to what users see/want), so we take cloaking very seriously.

So what does "the page that the user sees" mean if you provide both versions at a single URL? As I mentioned in the previous post, Google uses "Googlebot" for web search and "Googlebot-Mobile" for mobile search. To remain within our guidelines, you should serve the same content to Googlebot as a typical desktop user would see, and the same content to Googlebot-Mobile as you would to the browser on a typical mobile device. It's fine if the content served to Googlebot differs from the content served to Googlebot-Mobile.

One example of how you could unintentionally be flagged for cloaking is if your site returns a message like "Please access from mobile phones" to desktop browsers, but then returns a full mobile version to both crawlers (so Googlebot receives the mobile version). In this case, the page which web search users see (e.g. "Please access from mobile phones") is different from the page which Googlebot crawls (e.g. "Welcome to my site"). Again, we detect cloaking because we want to serve users the same relevant content that Googlebot or Googlebot-Mobile crawled.
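
One way to avoid the mistake described above is to classify crawlers with exactly the same rule used for browsers. The Python sketch below shows the idea; the User-agent substrings and page bodies are placeholders invented for this example.

```python
# Illustrative substrings only; real device detection needs a fuller list.
MOBILE_UA_HINTS = ("Googlebot-Mobile", "iPhone", "Android", "BlackBerry")

DESKTOP_PAGE = "<html><body>Welcome to my site (desktop layout)</body></html>"
MOBILE_PAGE = "<html><body>Welcome to my site (mobile layout)</body></html>"

def page_for(user_agent: str) -> str:
    """Return the body served at the shared URL for this User-agent."""
    # Crawlers go through the same rule as browsers: there is no separate
    # branch that hands Googlebot something desktop users never see.
    if any(hint in user_agent for hint in MOBILE_UA_HINTS):
        return MOBILE_PAGE
    return DESKTOP_PAGE
```

With this rule, Googlebot receives the same page as a desktop browser and Googlebot-Mobile receives the same page as a phone, which is the behavior the guidelines above ask for.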

Diagram of serving content from your mobile-enabled site


We're working on a daily basis to improve search results and solve problems, but because the relationship between PC and mobile versions of a web site can be nuanced, we appreciate the cooperation of webmasters. Your help will result in more mobile content being indexed by Google, improving the search results provided to users. Thank you for your cooperation in improving the mobile search user experience.

Pros and cons of watermarked images

Tuesday, November 17, 2009 at 11:40 AM

Webmaster Level: All

What's our take on watermarked images for Image Search? It's a complicated topic. I talked with Peter Linsley—my friend at the 'plex, video star, and Product Manager for Image Search—to hear his thoughts.

Maile: So, Peter... "watermarked images". Can you break it down for us?
Peter: It's understandable that webmasters find watermarking images beneficial.
Pros of watermarked images
  • Photographers can claim credit/be recognized for their art.
  • Unknown usage of the image is deterred.
If search traffic is important to a webmaster, then he/she may also want to consider some of our findings:
Findings relevant to watermarked images
  • Users prefer large, high-quality images (high-resolution, in-focus).
  • Users are more likely to click on quality thumbnails in search results. Quality pictures (again, high-res and in-focus) often look better at thumbnail size.
  • Distracting features such as loud watermarks, text over the image, and borders are likely to make the image look cluttered when reduced to thumbnail size.
In summary, if a feature such as watermarking reduces the user-perceived quality of your image or your image's thumbnail, then searchers may select it less often. Preview your images at thumbnail size to get an idea of how users might perceive them.
Maile: Ahh, I see: Webmasters concerned with search traffic likely want to balance the positives of watermarking with the preferences of their users -- keeping in mind that sites that use clean images without distracting artifacts tend to be more popular, and that this can also impact rankings. Will Google rank an image differently just because it's watermarked?
Peter: Nope. The presence of a watermark doesn't itself cause an image to be ranked higher or lower.

Do you have questions or opinions on the topic? Let's chat in the webmaster forum.

Help Google index your mobile site

Friday, November 13, 2009 at 1:48 PM

(This post was largely translated from our Japanese Webmaster Central Blog.)

It seems the world is going mobile, with many people using mobile phones on a daily basis, and a large user base searching on Google’s mobile search page. However, as a webmaster, running a mobile site and tapping into the mobile search audience isn't easy. Mobile sites not only use a different format from normal desktop sites, but the management methods and expertise required are also quite different. This results in a variety of new challenges. As a mobile search engineer, it's clear to me that while many mobile sites were designed with mobile viewing in mind, they weren’t designed to be search friendly. I'd like to help ensure that your mobile site is also available for users of mobile search.

Here are troubleshooting tips to help ensure that your site is properly crawled and indexed:

Verify that your mobile site is indexed by Google

If your web site doesn't show up in the results of a Google mobile search even using the 'site:' operator, it may be that your site has one or both of the following issues:
Googlebot may not be able to find your site
Googlebot, our crawler, must crawl your site before it can be included in our search index. If you just created the site, we may not yet be aware of it. If that's the case, create a Mobile Sitemap and submit it to Google to inform us of the site’s existence. A Mobile Sitemap can be submitted using Google Webmaster Tools, in the same way as with a standard Sitemap.
Googlebot may not be able to access your site
Some mobile sites refuse access to anything but mobile phones, making it impossible for Googlebot to access the site, and therefore making the site unsearchable. Our crawler for mobile sites is "Googlebot-Mobile". If you'd like your site crawled, please allow any User-agent including "Googlebot-Mobile" to access your site. You should also be aware that Google may change its User-agent information at any time without notice, so it is not recommended that you check if the User-agent exactly matches "Googlebot-Mobile" (which is the string used at present). Instead, check whether the User-agent header contains the string "Googlebot-Mobile". You can also use DNS Lookups to verify Googlebot.
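
The two checks mentioned above could look roughly like this in Python. The function names are hypothetical, and the host suffixes in the DNS check are an assumption based on Google's published guidance on verifying Googlebot rather than anything stated in this post.

```python
import socket

def claims_googlebot_mobile(user_agent: str) -> bool:
    """Substring test, so the check survives changes elsewhere in the header."""
    return "Googlebot-Mobile" in user_agent

def verify_googlebot_host(ip_address: str) -> bool:
    """Reverse DNS lookup followed by a confirming forward lookup."""
    try:
        host, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        # Assumed suffixes; check Google's current documentation before relying on them.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # The forward lookup must map the host name back to the same address.
        return ip_address in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```

The substring test is cheap enough to run on every request; the DNS check is slower and is usually worth caching per IP address.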

Verify that Google can recognize your mobile URLs

Once Googlebot-Mobile crawls your URLs, we then check whether each URL is viewable on a mobile device. Pages we determine aren't viewable on a mobile phone won't be included in our mobile site index (although they may be included in the regular web index). This determination is based on a variety of factors, one of which is the "DTD (Document Type Definition)" declaration. Check that your mobile-friendly URLs' DTD declaration is in an appropriate mobile format such as XHTML Mobile or Compact HTML. If it's in a compatible format, the page is eligible for the mobile search index. For more information, see the Mobile Webmaster Guidelines.
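
If you want to spot-check your own pages, a quick Python sketch like the one below can tell you whether a page declares one of the two formats named above. It is only a self-check under simple assumptions (the marker strings and function name are invented for this example) and does not reproduce Google's determination, which uses a variety of factors.

```python
import re

# Marker substrings for the two formats named in the post.
MOBILE_DTD_MARKERS = ("xhtml mobile", "compact html")

def has_mobile_doctype(page_source: str) -> bool:
    """Return True if the page's DOCTYPE names a mobile format."""
    match = re.search(r"<!DOCTYPE[^>]*>", page_source, re.IGNORECASE)
    if match is None:
        return False
    doctype = match.group(0).lower()
    return any(marker in doctype for marker in MOBILE_DTD_MARKERS)
```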

If you have any questions about mobile sites, post them in our Webmaster Help Forum, where we and webmasters around the world will be happy to help.

Post-Halloween Treat: New Keywords User Interface!

Wednesday, November 11, 2009 at 12:37 AM

Our team had an awesome Halloween and we hope you did too. Yes, the picture below is our team; we take our Halloween costumes pretty seriously. :)


As a post-Halloween treat, we're happy to announce a brand new user interface for our Keywords feature. We'll now be updating the data daily, providing details on how often we found a specific keyword, and displaying a handful of URLs that contain a specific keyword. The significance column compares the frequency of a keyword to the frequency of the most popular keyword on your site. When you click on a keyword to view more details, you will get a list of up to 10 URLs which contain that keyword.

This will be really useful when you re-implement your site on a new technology framework, or need to identify which URLs may have been hacked. For example, if you start noticing your site appearing in search results for terms totally unrelated to your website (for example, "Viagra" or "casino"), you can use this feature to find those keywords and identify the pages that contain them. This will enable you to eliminate any hacked content quickly.
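
To get a feel for what the significance column expresses, here is a small Python sketch that computes a comparable ratio over your own page text. The exact formula Webmaster Tools uses isn't given here, so the simple ratio, the tokenizer, and the function names below are all assumptions.

```python
import re
from collections import Counter

def tokenize(text: str):
    # Naive tokenizer, assumed for this sketch.
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_significance(pages: dict) -> dict:
    """Map each keyword to its frequency relative to the most frequent keyword."""
    counts = Counter()
    for text in pages.values():
        counts.update(tokenize(text))
    if not counts:
        return {}
    top = max(counts.values())
    return {word: count / top for word, count in counts.items()}

def urls_containing(pages: dict, keyword: str, limit: int = 10) -> list:
    """List up to `limit` URLs whose text contains the keyword."""
    keyword = keyword.lower()
    return [url for url, text in pages.items() if keyword in tokenize(text)][:limit]
```

Here `pages` is a plain mapping from URL to page text. An unexpected keyword with a noticeable significance value, together with the short list of URLs that contain it, is a reasonable starting point when hunting for hacked pages.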

Let us know what you think!

New personalization features in Google Friend Connect

Wednesday, November 04, 2009 at 7:40 AM

Webmaster Level: All

Update: The described product or service is no longer available.


Just a few weeks ago, we made Google Friend Connect a lot easier to use by dramatically simplifying the setup process. Today, we're excited to announce several new features that make it possible for website owners to get to know their users, encourage users to get to know each other, and match their site content (including Google ads) to visitors' interests.

To learn more about these new features, check out the Google Social Web Blog.

Get your site ready for the holidays: Webmasters - make your list and check it twice!

Tuesday, November 03, 2009 at 2:25 PM

Webmaster Level: All

Are the holidays an important season for your website or online business? We think so! And to help make sure you're in good shape, we wanted to invite you to our Holiday Webmaster Webinar.

This Webex will be hosted by Senior Search Quality Engineer Greg Grothaus, and AdWords Evangelist Fred Vallaeys. They'll be discussing a range of webmaster best practices and useful Google tools followed by a Q&A session to make sure you and your site are well primed for the holiday rush!

Topic: Holiday Webmaster Webinar
Date: Friday, November 13, 2009
Time: 10:00 am, Pacific Standard Time (GMT -08:00, San Francisco)
Meeting Number: 574 659 815
Meeting Password: webmaster

Please click the link below to see more information, or to join the meeting.

-------------------------------------------------------
To join the online meeting (Now from iPhones too!)
-------------------------------------------------------
1. Go to https://googleonline.webex.com/googleonline/j.php?ED=133402392&UID=0&PW=db339c4e641e0f525412171e5646
2. Enter your name and email address.
3. Enter the meeting password: webmaster
4. Click "Join Now".

-------------------------------------------------------
To join the teleconference only
-------------------------------------------------------
Call-in toll-free number (US/Canada): 866-469-3239
Call-in toll number (US/Canada): 1-650-429-3300
Toll-free dialing restrictions: http://www.webex.com/pdf/tollfree_restrictions.pdf

-------------------------------------------------------
For assistance
-------------------------------------------------------
1. Go to https://googleonline.webex.com/googleonline/mc
2. On the left navigation bar, click "Support".

Using RSS/Atom feeds to discover new URLs

Thursday, October 29, 2009 at 5:50 PM

Webmaster Level: Intermediate

Google uses numerous sources to find new webpages, from links we find on the web to submitted URLs. We aim to discover new pages quickly so that users can find new content in Google search results soon after they go live. We recently launched a feature that uses RSS and Atom feeds for the discovery of new webpages.

RSS/Atom feeds have been very popular in recent years as a mechanism for content publication. They allow readers to check for new content from publishers. Using feeds for discovery allows us to get these new pages into our index more quickly than traditional crawling methods. We may use many potential sources to access updates from feeds including Reader, notification services, or direct crawls of feeds. Going forward, we might also explore mechanisms such as PubSubHubbub to identify updated items.

In order for us to use your RSS/Atom feeds for discovery, it's important that crawling these files is not disallowed by your robots.txt. To find out if Googlebot can crawl your feeds and find your pages as fast as possible, test your feed URLs with the robots.txt tester in Google Webmaster Tools.
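
If you'd like to check this locally as well, Python's standard library includes a robots.txt parser. The sketch below tests a feed URL against a robots.txt body; the sample rules, URLs, and function name are made up for illustration, and the result reflects Python's parser rather than Googlebot's own.

```python
from urllib.robotparser import RobotFileParser

def feed_is_crawlable(robots_txt: str, feed_url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules allow fetching feed_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, feed_url)

# Hypothetical rules that would hide a feed directory from crawlers.
sample_rules = "User-agent: *\nDisallow: /feeds/\n"
print(feed_is_crawlable(sample_rules, "http://www.example.com/feeds/posts.xml"))
print(feed_is_crawlable(sample_rules, "http://www.example.com/rss.xml"))
```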

Help us make the web better: An update on Rich Snippets

Monday, October 26, 2009 at 2:00 PM

Webmaster Level: All

In May this year we announced Rich Snippets, which make it possible to show structured data from your pages on Google's search results.


We're convinced that structured data makes the web better, and we've worked hard to expand Rich Snippets to more search results and collect your feedback along the way. If you have review or people/social networking content on your site, it's easier than ever to mark up your content using microformats or RDFa so that Google can better understand it to generate useful Rich Snippets. Here are a few helpful improvements on our end to enable you to mark up your content:

Testing tool. See what Google is able to extract, and preview how microformats or RDFa marked-up pages would look on Google search results. Test your URLs on the Rich Snippets Testing Tool.


Google Custom Search users can also use the Rich Snippets Testing Tool to test markup usable in their Custom Search engine.

Better documentation. We've extended our documentation to include a new section containing Tips & Tricks and Frequently Asked Questions. Here we have responded to common points of confusion and provided instructions on how to maximize the chances of getting Rich Snippets for your site.

Extended RDFa support. In addition to the Person RDFa format, we have added support for the corresponding fields from the FOAF and vCard vocabularies for all those of you who asked for it.

Videos. If you have videos on your page, you can now mark up your content to help Google find those videos.

As before, marking up your content does not guarantee that Rich Snippets will be shown for your site. We will continue to expand this feature gradually to ensure a great user experience whenever Rich Snippets are shown in search results.

Verifying a Blogger blog in Webmaster Tools

Thursday, October 22, 2009 at 2:48 PM

Webmaster Level: All

You may have seen our recent announcement of changes to the verification system in Webmaster Tools. One side effect of this change is that blogs hosted on Blogger (that haven't yet been verified) will have to use the meta tag verification method rather than the "one-click" integration from the Blogger dashboard. The "Webmaster Tools" auto-verification link from the Blogger dashboard is no longer working and will soon be removed. We're working to reinstate an automated verification approach for Blogger hosted blogs in the future, but for the time being we wanted you to be aware of the steps required to verify your Blogger blog in Webmaster Tools.

Step-By-Step Instructions:

In Webmaster Tools
1. Click the "Add a site" button on the Webmaster Tools Home page
2. Enter your blog's URL (for example, googlewebmastercentral.blogspot.com) and click the "Continue" button to go to the Manage verification page
3. Select the "Meta tag" verification method and copy the meta tag provided

In Blogger
4. Go to your blog and sign in
5. From the Blogger dashboard click the "Layout" link for the blog you're verifying
6. Click the "Edit HTML" link under the "Layout" tab which will allow you to edit the HTML for your blog's template
7. Paste the meta tag (copied in step 3) immediately after the <head> element within the template HTML and click the "SAVE TEMPLATE" button




In Webmaster Tools
8. On the Manage Verification page, confirm that "Meta tag" is selected as the verification method and click the "Verify" button

Your blog should now be verified. You're ready to start using Webmaster Tools!

One million YouTube views!

Wednesday, October 21, 2009 at 12:34 PM

Earlier this year, we launched our very own Webmaster Central channel on YouTube. Just today, we saw our total video views exceed one million! On the road to this milestone, we uploaded 154 videos, for a total of nearly 11 hours of webmaster-focused media. These videos have brought you conference presentations, updates on tools for webmasters, general tips, and of course answers to your "Grab bag" questions for Matt Cutts.

To celebrate our one million views, we're sharing a fun video with you in which Matt Cutts shows us what happened when he lost a bet with his team:



We're also pleased to announce that we've added captions to all of our videos and plan to do so for our future videos as well. Thank you to everyone who has watched, shared, and commented on our videos. We look forward to the next million views!

Dealing with low-quality backlinks

Friday, October 16, 2009 at 4:53 PM

Webmaster level: Intermediate/Advanced

Webmasters who check their incoming links in Webmaster Tools often ask us what they can do when they see low-quality links. Understandably, many site owners are trying to build a good reputation for their sites, and some believe that having poor-quality incoming links can be perceived as "being part of a bad neighbourhood," which over time might harm their site's ranking.

example of low-quality links
If your site receives links that look similarly dodgy, don't be alarmed... read on!

While it's true that linking is a significant factor in Google's ranking algorithms, it's just one of many. I know we say it a lot, but having something that people want to look at or use—unique, engaging content, or useful tools and services—is also a huge factor. Other factors can include how a site is structured, whether the words of a user's query appear in the title, how close the words are on the page, and so on. The point is, if you happen to see some low quality sites linking to you, it's important to keep in mind that linking is just one aspect among many of how Google judges your site. If you have a well-structured and regularly maintained site with original, high-quality content, those are the sorts of things that users will see and appreciate.

Having said that, in an ideal world you could have your cake and eat it too (or rather, you could have a high-quality site and high-quality backlinks). You may also be concerned about users' perception of your site if they come across it via a batch of spammy links. If the number of poor-quality links is manageable, and/or if it looks easy to opt out or get those links removed from the site that's linking to you, it may be worth it to try to contact the site(s) and ask them to remove their links. Remember that this isn't something that Google can do for you; we index content that we find online, but we don't control that content or who's linking to you.

If you run into some uncooperative site owners, however, don't fret for too long. Instead, focus on things that are under your control. Generally, you as a webmaster don't have much control over things like who links to your site. You do, however, have control over many other factors that influence indexing and ranking. Organize your content; do a mini-usability study with family or friends. Ask for a site review in your favorite webmaster forums. Use a website testing tool to figure out what gets you the most readers, or the biggest sales. Take inspiration from your favorite sites, or your competitors—what do they do well? What makes you want to keep coming back to their sites, or share them with your friends? What can you learn from them? Time spent on any of these activities is likely to have a larger impact on your site's overall performance than time spent trying to hunt down and remove every last questionable backlink.

Finally, keep in mind that low-quality links rarely stand the test of time, and may disappear from our link graph relatively quickly. They may even already be discounted by our algorithms. If you want to make sure Google knows about these links and is valuing them appropriately, feel free to bring them to our attention using either our spam report or our paid links report.

Let's make the mobile web faster

(Cross-posted on the Google Code Blog)

This week, we've been celebrating all things mobile across Google. Of course, this wouldn't be complete without a component for mobile web developers! Two months ago we asked you to make the web faster. Now, we've asked the Google Mobile team for some best practices, tips, and resources for mobile web development, and we've come up with a few things we wanted to share. "Go Mobile!" with our Make the mobile web faster article.

Managing your reputation through search results

Thursday, October 15, 2009 at 3:00 PM

(Cross-posted on the Official Google Blog)

A few years ago I couldn't wait to get married. Because I was in love, yeah; but more importantly, so that I could take my husband's name and people would stop getting that ridiculous picture from college as a top result when they searched for me on Google.

After a few years of working here, though, I've learned that you don't have to change your name just because it brings up some embarrassing search results. Below are some tips for "reputation management": influencing how you're perceived online, and what information is available relating to you.

Think twice

The first step in reputation management is preemptive: Think twice before putting your personal information online. Remember that although something might be appropriate for the context in which you're publishing it, search engines can make it very easy to find that information later, out of context, including by people who don't normally visit the site where you originally posted it. Translation: don't assume that just because your mom doesn't read your blog, she'll never see that post about the new tattoo you're hiding from her.

Tackle it at the source

If something you dislike has already been published, the next step is to try to remove it from the site where it's appearing. Rather than immediately contacting Google, it's important to first remove it from the site where it's being published. Google doesn't own the Internet; our search results simply reflect what's already out there on the web. Whether or not the content appears in Google's search results, people are still going to be able to access it — on the original site, through other search engines, through social networking sites, etc. — if you don't remove it from the original site. You need to tackle this at the source.
  • If the content in question is on a site you own, easy — just remove it. It will naturally drop out of search results after we recrawl the page and discover the change.
  • It's also often easy to remove content from sites you don't own if you put it there, such as photos you've uploaded, or content on your profile page.
  • If you can't remove something yourself, you can contact the site's webmaster and ask them to remove the content or the page in question.
After you or the site's webmaster has removed or edited the page, you can expedite the removal of that content from Google using our URL removal tool.

Proactively publish information

Sometimes, however, you may not be able to get in touch with a site's webmaster, or they may refuse to take down the content in question. For example, if someone posts a negative review of your business on a restaurant review or consumer complaint site, that site might not be willing to remove the review. If you can't get the content removed from the original site, you probably won't be able to completely remove it from Google's search results, either. Instead, you can try to reduce its visibility in the search results by proactively publishing useful, positive information about yourself or your business. If you can get stuff that you want people to see to outperform the stuff you don't want them to see, you'll be able to reduce the amount of harm that that negative or embarrassing content can do to your reputation.

You can publish or encourage positive content in a variety of ways:
  • Create a Google profile. When people search for your name, Google can display a link to your Google profile in our search results and people can click through to see whatever information you choose to publish in your profile.
  • If a customer writes a negative review of your business, you could ask some of your other customers who are happy with your company to give a fuller picture of your business.
  • If a blogger is publishing unflattering photos of you, take some pictures you prefer and publish them in a blog post or two.
  • If a newspaper wrote an article about a court case that put you in a negative light, but which was subsequently ruled in your favor, you can ask them to update the article or publish a follow-up article about your exoneration. (This last one may seem far-fetched, but believe it or not, we've gotten multiple requests from people in this situation.)
Hope these tips have been helpful! Feel free to stop by our Web Search Forum and share your own advice or stories about how you manage your reputation online.

Fetch as Googlebot and Malware details -- now in Webmaster Tools Labs!

Monday, October 12, 2009 at 3:15 PM

The Webmaster Tools team is lucky to have passionate users who provide us with a great set of feature ideas. Going forward, we'll be launching some features under the "Labs" label so we can quickly transition from concept to production, and hear your feedback ASAP. With Labs releases, you have the opportunity to play with features and have your feedback heard much earlier in the development lifecycle. On the flip side, since these features are available early in the release cycle they're not as robust, and may break at times.

Today we're launching two cool features:
  • Malware details
  • Fetch as Googlebot
Malware details (developed by Lucas Ballard)

Before today, you may have been relying on manual testing, our safe browsing API, and malware notifications to determine which pages on your site may be distributing malware. Sometimes finding the malicious code is extremely difficult, even when you do know which pages it was found on. Today we are happy to announce that we'll be providing snippets of code that exist on some of those pages that we consider to be malicious. We hope this additional information enables you to eliminate the malware on your site very quickly, and reduces the number of iterations many webmasters go through during the review process.

More information on this cool feature is available at our Online Security Blog.


Fetch as Googlebot (developed by Javier Tordable)

"What does Googlebot see when it accesses my page?" is a common question webmasters ask us on our forums and at conferences. Our keywords and HTML suggestions features help you understand the content we're extracting from your site, and any issues we may be running into at crawl and indexing time. However, we realized it was important to provide the ability for users to submit pages on their site and get real-time feedback on what Googlebot sees. This feature will help users a great deal when they re-implement their site with a new technology stack, find out that some of their pages have been hacked, or want to understand why they're not ranking for specific keywords.


We're pretty excited about this launch, and hope you are too. Let us know what you think!

A proposal for making AJAX crawlable

Wednesday, October 07, 2009 at 10:51 AM

Webmaster level: Advanced

Today we're excited to propose a new standard for making AJAX-based websites crawlable. This will benefit webmasters and users by making content from rich and interactive AJAX-based websites universally accessible through search results on any search engine that chooses to take part. We believe that making this content available for crawling and indexing could significantly improve the web.

While AJAX-based websites are popular with users, search engines traditionally have not been able to access any of the content on them. The last time we checked, almost 70% of the websites we know about use JavaScript in some form or another. Of course, most of that JavaScript is not AJAX, but the better search engines can crawl and index AJAX, the more developers can add rich features to their websites and still show up in search results.

Some of the goals that we wanted to achieve with this proposal were:
  • Minimal changes are required as the website grows
  • Users and search engines see the same content (no cloaking)
  • Search engines can send users directly to the AJAX URL (not to a static copy)
  • Site owners have a way of verifying that their AJAX website is rendered correctly and thus that the crawler has access to all the content


Here's how search engines would crawl and index AJAX in our initial proposal:
  • Slightly modify the URL fragments for stateful AJAX pages
    Stateful AJAX pages display the same content whenever accessed directly. These are pages that could be referred to in search results. Instead of a URL like http://example.com/page?query#state we would like to propose adding a token to make it possible to recognize these URLs: http://example.com/page?query#[FRAGMENTTOKEN]state . Based on a review of current URLs on the web, we propose using "!" (an exclamation point) as the token for this. The proposed URL that could be shown in search results would then be: http://example.com/page?query#!state.
  • Use a headless browser that outputs an HTML snapshot on your web server
    The headless browser accesses the AJAX page and generates HTML code based on the final state in the browser. Only specially tagged URLs are passed to the headless browser for processing. By doing this on the server side, the website owner is in control of the HTML code that is generated and can easily verify that all JavaScript is executed correctly. An example of such a browser is HtmlUnit, an open-source "GUI-less browser for Java programs."
  • Allow search engine crawlers to access these URLs by escaping the state
    As URL fragments are never sent with requests to servers, it's necessary to slightly modify the URL used to access the page. At the same time, this tells the server to use the headless browser to generate HTML code instead of returning a page with JavaScript. Other, existing URLs - such as those used by the user - would be processed normally, bypassing the headless browser. We propose escaping the state information and adding it to the query parameters with a token. Using the previous example, one such URL would be http://example.com/page?query&[QUERYTOKEN]=state . Based on our analysis of current URLs on the web, we propose using "_escaped_fragment_" as the token. The proposed URL would then become http://example.com/page?query&_escaped_fragment_=state .
  • Show the original URL to users in the search results
    To improve the user experience, it makes sense to refer users directly to the AJAX-based pages. This can be achieved by showing the original URL (such as http://example.com/page?query#!state from our example above) in the search results. Search engines can check that the indexable text returned to Googlebot is the same or a subset of the text that is returned to users.
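
To make the server-side step more concrete, here is a minimal Python sketch of how a web server might decide between returning the normal JavaScript page and returning an HTML snapshot. The function names (requested_state, handle_request) and the two callbacks (serve_ajax_page, render_snapshot) are hypothetical and not part of the proposal; render_snapshot stands in for whatever headless browser the site owner chooses. The sketch also assumes ordinary query-string decoding is an acceptable way to unescape the state, which the proposal does not spell out.

```python
from typing import Callable, Optional
from urllib.parse import parse_qs, urlsplit

# Query token proposed in the post; everything else here is illustrative.
QUERY_TOKEN = "_escaped_fragment_"


def requested_state(request_url: str) -> Optional[str]:
    """Return the unescaped AJAX state a crawler asked for, or None for a normal request."""
    query = urlsplit(request_url).query
    # parse_qs percent-decodes values; keep_blank_values keeps an empty state visible.
    values = parse_qs(query, keep_blank_values=True).get(QUERY_TOKEN)
    return values[0] if values else None


def handle_request(
    request_url: str,
    serve_ajax_page: Callable[[str], str],
    render_snapshot: Callable[[str, str], str],
) -> str:
    """Route crawler-style URLs to the snapshot renderer and all others to the normal page."""
    state = requested_state(request_url)
    if state is None:
        # Existing URLs, such as those used by the user, bypass the headless browser.
        return serve_ajax_page(request_url)
    # Hypothetical hook: the headless browser loads the page at this state
    # and returns the resulting HTML.
    return render_snapshot(request_url, state)
```

In a real deployment the two callbacks would be wired to the site's own page handler and to its headless browser setup; the routing decision itself only depends on whether the query token is present.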



(Graphic by Katharina Probst)

In summary, starting with a stateful URL such as
http://example.com/dictionary.html#AJAX , it could be available to both crawlers and users as
http://example.com/dictionary.html#!AJAX which could be crawled as
http://example.com/dictionary.html?_escaped_fragment_=AJAX which in turn would be shown to users and accessed as
http://example.com/dictionary.html#!AJAX
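
The mapping between these URL forms can be sketched in a few lines of Python. This is only an illustration of the proposal as described above: the function names to_crawl_url and to_user_url are made up for this example, and percent-encoding via urllib.parse.quote is an assumption about what "escaping the state" means, since the proposal does not yet define the exact escaping rules.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Tokens proposed in the post.
FRAGMENT_TOKEN = "!"
QUERY_TOKEN = "_escaped_fragment_"


def to_crawl_url(ajax_url: str) -> str:
    """Map .../page?query#!state to .../page?query&_escaped_fragment_=state."""
    parts = urlsplit(ajax_url)
    if not parts.fragment.startswith(FRAGMENT_TOKEN):
        return ajax_url  # not a tagged stateful URL; leave it untouched
    state = parts.fragment[len(FRAGMENT_TOKEN):]
    # Assumption: percent-encode the whole state so it is safe as a query value.
    param = f"{QUERY_TOKEN}={quote(state, safe='')}"
    query = f"{parts.query}&{param}" if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def to_user_url(crawl_url: str) -> str:
    """Map .../page?query&_escaped_fragment_=state back to .../page?query#!state."""
    parts = urlsplit(crawl_url)
    kept, state = [], None
    for piece in parts.query.split("&") if parts.query else []:
        name, _, value = piece.partition("=")
        if name == QUERY_TOKEN:
            state = unquote(value)
        else:
            kept.append(piece)  # preserve the other query parameters as written
    if state is None:
        return crawl_url  # no token present; nothing to convert
    fragment = FRAGMENT_TOKEN + state
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "&".join(kept), fragment))
```

For instance, to_crawl_url("http://example.com/dictionary.html#!AJAX") yields the crawlable form shown above, and to_user_url converts it back to the URL that would appear in search results.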

View the presentation

We're currently working on a proposal and a prototype implementation. Feedback is very welcome — please add your comments below or in our Webmaster Help Forum. Thank you for your interest in making the AJAX-based web accessible and useful through search engines!
