Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Retiring support for OAI-PMH in Sitemaps

Wednesday, April 23, 2008 at 10:20 AM



When we originally launched Sitemaps, we included support for the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) 2.0 protocol, an interoperability framework based on metadata harvesting. In the meantime, however, we've found that the information we gain from our support of OAI-PMH is disproportional to the amount of resources required to support it. Fewer than 200 sites are using OAI-PMH for Google Sitemaps at the moment.

In order to move forward with even better coverage of your websites, we have decided to support only the standard XML Sitemap format by May 2008. We are in the process of notifying sites using OAI-PMH to alert them of the change.

If you have been using OAI-PMH as a Google Sitemap feed, we would love to see you adopt the industry standard XML Sitemap format. This format is supported by all of the major search engines and helps to make sure that everyone is able to find your new and updated content as soon as you make it available.
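For readers making the switch, here is a minimal sketch of producing a file in the XML Sitemap format with Python's standard library. The element names (urlset, url, loc, lastmod) and the namespace come from the protocol published at sitemaps.org; the function name build_sitemap and the example.org URLs are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return an XML Sitemap document as a string.

    `pages` is an iterable of (url, lastmod) pairs, where lastmod is a
    'YYYY-MM-DD' string or None. Hypothetical helper, not a Google API.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' when serializing.
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder pages; replace with the URLs of your own site.
    print(build_sitemap([
        ("http://example.org/", "2008-04-23"),
        ("http://example.org/about", None),
    ]))
```

The resulting file would typically be saved on the site (for example as sitemap.xml) and submitted in the same way as any other Sitemap.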

If you have any questions regarding the move to XML Sitemap files, feel free to post in our Google discussion group for Sitemaps.

The comments you read here belong only to the person who posted them. We do, however, reserve the right to remove off-topic comments.

12 comments:

Jay said...

Can you list the sites?

mcockerill said...

I'm curious about this - it seems a bit of a shame.

The OAIster index (www.oaister.org) claims to include 15,000,000+ items, indexed from 950 OAI-compliant sources. Was there any particular reason that Google was only indexing 200 of these resources?

While 15m is only a small fraction of Google's total content, items available via OAI-compliant archives tend to be scholarly articles etc, and so those seem like 15m items worth having, and OAI is a very widely accepted standard within the specific domain that it was designed for.

Susan Moskwa said...

Hi MC,
It's not that we were only indexing < 200 of these resources, it's that < 200 of their owners were submitting Sitemaps using the OAI protocol. It's possible that the other 750+ site owners were submitting Sitemaps to us in a different format, or maybe they weren't submitting Sitemaps at all; either way, it doesn't mean that we can't still discover the URLs of OAI-compliant sites through other mechanisms, or continue to index their content. It just means that very few people were actually making use of this format for submitting Sitemaps to Google, which is why we're discontinuing support of it as a Sitemap format.

Danielle Cunniff Plumer said...

I'm disappointed about this, as I'd been working with libraries, archives, and museums across Texas to make their resources more findable. Because many of the items we work with are images of photos and hand-written documents and therefore can't be crawled effectively, and because what metadata we have for these documents is generally locked up in databases that currently can't be (or at least, aren’t being) crawled, I was hoping that OAI-PMH sitemaps would provide a viable option for findability.

I suppose my question is whether the difficulty in supporting OAI-PMH was in trying to harvest records using the approved syntax or whether there has been some issue with the XML produced. I had been working with institutions to produce what were effectively static repositories conforming to the Open Archives Initiative Static Repository specification (http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm), placing these at the top level of a site to allow robots to access site metadata and URLs for individual dynamically-created pages. The static repository is simply an XML file containing metadata records for individual items, and my feeling is that it should be straightforward to crawl this file and use the data it contains as a sitemap.

Because other search tools commonly used by libraries, archives, and museums widely support OAI-PMH, we will continue to support this protocol and devote resources to making more collections OAI-PMH compliant. While it is trivial to convert the static repository file for any given collection to the sitemap protocol using XSLT, I am not sure that the institutions I work with will be able to support yet another specification.
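To illustrate the kind of conversion described above, here is a minimal sketch in Python rather than XSLT. It rests on assumptions that may not hold for a given collection: that each record's page URL is stored in a dc:identifier element, and that the record's datestamp is an acceptable lastmod value. The sample repository, the example.org URLs, and the function name static_repository_to_sitemap are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Simplified, made-up static repository with a single record.
SAMPLE = """<Repository xmlns="http://www.openarchives.org/OAI/2.0/static-repository"
            xmlns:oai="http://www.openarchives.org/OAI/2.0/"
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <ListRecords metadataPrefix="oai_dc">
    <oai:record>
      <oai:header>
        <oai:identifier>oai:example.org:photo-1</oai:identifier>
        <oai:datestamp>2008-04-01</oai:datestamp>
      </oai:header>
      <oai:metadata>
        <oai_dc:dc>
          <dc:title>Hypothetical photograph</dc:title>
          <dc:identifier>http://example.org/items/photo-1</dc:identifier>
        </oai_dc:dc>
      </oai:metadata>
    </oai:record>
  </ListRecords>
</Repository>"""


def static_repository_to_sitemap(xml_text):
    """Build a Sitemap string from the URLs found in dc:identifier elements."""
    root = ET.fromstring(xml_text)
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    seen = set()
    for record in root.iter(f"{{{OAI_NS}}}record"):
        datestamp = record.findtext(f".//{{{OAI_NS}}}datestamp")
        for ident in record.iter(f"{{{DC_NS}}}identifier"):
            url = (ident.text or "").strip()
            # Assumption: only http(s) identifiers are page URLs; skip duplicates.
            if not url.startswith(("http://", "https://")) or url in seen:
                continue
            seen.add(url)
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
            if datestamp:
                ET.SubElement(entry, "lastmod").text = datestamp.strip()
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(static_repository_to_sitemap(SAMPLE))
```

A script along these lines could run whenever the static repository file is regenerated, so an institution would still maintain only the OAI metadata by hand.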

I hope that there may be some alternatives that can be explored on this issue! We very much want the rich resources held by libraries, archives, and museums to be discoverable through Google.

Danielle Cunniff Plumer, Coordinator
Texas Heritage Digitization Initiative
Texas State Library and Archives Commission

Media of Media said...

Good Website. Good job!

marco said...

I have a similar interest in Google's support of OAI-PMH as Danielle.

I am working for the Dutch organisation Digital Heritage Netherlands. We are commissioned by the Dutch Ministry of Education, Cultural Affairs and Science to collect and distribute knowledge about ICT standards and other quality instruments.

Last year we started a process aimed at a set of minimal standards to be used in digitization by all heritage organisations in our country.
The first topic we normalised was 'findability'. The standards that were selected as 'basic practice' were: HTTP, XML, UTF-8, Dublin Core, URI, SRU and OAI-PMH.
Of course, relatively new technologies like OAI-PMH are at the moment only supported by large collection owners, primarily research libraries. We do believe in the potential of protocols like OAI for better integrated services and presentations of heritage information for the benefit of the public. However, implementing these protocols is a major step for smaller institutions.

The improved indexing through OAI-PMH by internet search engines is an enormous help, since these organisations currently have difficulty opening their collection information to search engines. The potential criticism that our organisation was promoting the use of Google could easily be countered with the argument that we are just promoting the use of standards. It's unthinkable that we would include Google's SiteMaps in our minimum set of standards.

With Google's announcement that it will stop supporting OAI-PMH, this situation of mutual benefit seems to disappear.

Susan Moskwa said...

Hi Marco,
If it helps, the Sitemaps protocol is supported by all the major search engines in the US, not just Google:

www.sitemaps.org

Danielle Cunniff Plumer said...

Susan,

I'm familiar with the sitemaps protocol and the support for it from various search engines, but that's not the point. My question is: what is it about OAI-PMH that makes it difficult to support as an alternative to sitemaps? I'd think it would be trivial for Google to take an xml file with OAI metadata and extract all the URLs. It would be much easier for you to do it automatically than for us to explain sitemaps to hundreds of cultural heritage institutions, particularly those for whom OAI-PMH is already built into their asset management systems.

Danielle Cunniff Plumer

www.Treximet.net said...

Any idea how I can link our site, www.Treximet.org, with higher-ranking sites on Google? It's a homeschool site used by parents to break up the typical boring conversation: so, what did you do at school today? Gonna need some advice to get our site to a higher rank. Thanks Google, our kids' fav site.

Sincerely The Treximet Tram

Media of Media said...

Submitting sitemaps to Google can support a website.

chocolatecity.cc said...

I have a website, say www."my site".cc. I would like to completely redirect the index or main page to www."my site".cc/blog.

How can I do so without being penalized by Google or losing my link popularity?

Maile Ohye said...

Hi everyone,

Because several months have passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Help Group.

Thanks and take care,
The Webmaster Central Team