Wednesday, October 07, 2009 at 10:51 AMWebmaster level: Advanced
Today we're excited to propose a new standard for making AJAX-based websites crawlable. This will benefit webmasters and users by making content from rich and interactive AJAX-based websites universally accessible through search results on any search engine that chooses to take part. We believe that making this content available for crawling and indexing could significantly improve the web.
Some of the goals that we wanted to achieve with this proposal were:
- Minimal changes are required as the website grows
- Users and search engines see the same content (no cloaking)
- Search engines can send users directly to the AJAX URL (not to a static copy)
- Site owners have a way of verifying that their AJAX website is rendered correctly and thus that the crawler has access to all the content
Here's how search engines would crawl and index AJAX in our initial proposal:
- Slightly modify the URL fragments for stateful AJAX pages
Stateful AJAX pages display the same content whenever accessed directly. These are pages that could be referred to in search results. Instead of a URL like http://example.com/page?query#state we would like to propose adding a token to make it possible to recognize these URLs: http://example.com/page?query#[FRAGMENTTOKEN]state . Based on a review of current URLs on the web, we propose using "!" (an exclamation point) as the token for this. The proposed URL that could be shown in search results would then be: http://example.com/page?query#!state.
- Use a headless browser that outputs an HTML snapshot on your web server
- Allow search engine crawlers to access these URLs by escaping the state
- Show the original URL to users in the search results
To improve the user experience, it makes sense to refer users directly to the AJAX-based pages. This can be achieved by showing the original URL (such as http://example.com/page?query#!state from our example above) in the search results. Search engines can check that the indexable text returned to Googlebot is the same or a subset of the text that is returned to users.
(Graphic by Katharina Probst)
In summary, starting with a stateful URL such as
http://example.com/dictionary.html#AJAX , it could be available to both crawlers and users as
http://example.com/dictionary.html#!AJAX which could be crawled as
http://example.com/dictionary.html?_escaped_fragment_=AJAX which in turn would be shown to users and accessed as
View the presentation
We're currently working on a proposal and a prototype implementation. Feedback is very welcome — please add your comments below or in our Webmaster Help Forum. Thank you for your interest in making the AJAX-based web accessible and useful through search engines!