Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

1000 Words About Images

Wednesday, April 25, 2012 at 1:08 AM

Webmaster level: All

Creativity is an important aspect of our lives and can enrich nearly everything we do. Say I'd like to make my teammate a cup of cool-looking coffee, but my creative batteries are empty; this would be (and is!) one of the many times when I look for inspiration on Google Images.


The images you see in our search results come from publishers of all sizes — bloggers, media outlets, stock photo sites — who have embedded these images in their HTML pages. Google can index image types formatted as BMP, GIF, JPEG, PNG and WebP, as well as SVG.

But how does Google know that the images are about coffee and not about tea? When our algorithms index images, they look at the textual content on the page the image was found on to learn more about the image. We also look at the page's title and its body, and we might learn more from the image’s filename, the anchor text that points to it, and its "alt text". We may also use computer vision to learn more about the image, and we may use the caption provided in the Image Sitemap if that text also exists on the page.

To help us index your images, make sure that:
  • we can crawl both the HTML page the image is embedded in, and the image itself;
  • the image is in one of our supported formats: BMP, GIF, JPEG, PNG, WebP or SVG.
Additionally, we recommend:
  • that the image filename is related to the image’s content;
  • that the alt attribute of the image describes the image in a human-friendly way;
  • and finally, it also helps if the HTML page’s textual contents as well as the text near the image are related to the image.
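
The alt attribute recommendation above can be spot-checked before publishing. The following is a minimal sketch using only Python's standard library; the helper name `images_missing_alt` and the sample markup are hypothetical, and the check only flags images whose alt attribute is missing or blank. It does not judge whether the text is actually descriptive, and it is not a description of how Google evaluates pages.

```python
from html.parser import HTMLParser


class ImgAltAudit(HTMLParser):
    """Collects the src of every <img> whose alt attribute is missing or blank."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt")
        # A bare `alt` attribute parses as None; alt="" parses as an empty string.
        # Both are flagged here, although an empty alt can be a deliberate
        # choice for purely decorative images.
        if alt is None or not alt.strip():
            self.missing_alt.append(attr_map.get("src", ""))


def images_missing_alt(html):
    """Return the src values of images that have no usable alt text."""
    parser = ImgAltAudit()
    parser.feed(html)
    parser.close()
    return parser.missing_alt


# Hypothetical page fragment used only for illustration.
sample = """
<p>Latte art for beginners</p>
<img src="/img/latte-art-heart.jpg" alt="Heart-shaped latte art in a white cup">
<img src="/img/IMG_0042.jpg">
<img src="/img/espresso.png" alt="">
"""

print(images_missing_alt(sample))
```

In this sample, the second and third images are reported; the first one already has a filename and alt text that relate to its content.
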
Now some answers to questions we’ve seen many times:


Q: Why do I sometimes see Googlebot crawling my images, rather than Googlebot-Image?
A: Generally this happens when it’s not clear that a URL will lead to an image, so we crawl the URL with Googlebot first. If we find the URL leads to an image, we’ll usually revisit with Googlebot-Image. Because of this, it’s generally a good idea to allow crawling of your images and pages by both Googlebot and Googlebot-Image.
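
One way to confirm that a robots.txt file does not accidentally block one of the two crawlers is to test it locally. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the URLs and the helper name `crawlers_allowed` are assumptions made for illustration. Note that Python's parser uses its own matching rules, which may differ in detail from how Googlebot interprets the file, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser


def crawlers_allowed(robots_txt, url, agents=("Googlebot", "Googlebot-Image")):
    """Return, per user agent, whether the given robots.txt text permits the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical robots.txt that blocks only Googlebot-Image from /images/.
ROBOTS_TXT = "\n".join([
    "User-agent: Googlebot-Image",
    "Disallow: /images/",
    "",
    "User-agent: *",
    "Allow: /",
])

result = crawlers_allowed(ROBOTS_TXT, "https://example.com/images/latte.jpg")
print(result)
```

With this example file, Googlebot may fetch the image URL while Googlebot-Image may not, which is the kind of mismatch the answer above suggests avoiding.
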

Q: Is it true that there’s a maximum file size for the images?
A: We’re happy to index images of any size; there’s no file size restriction.

Q: What happens to the EXIF, XMP and other metadata my images contain?
A: We may use any information we find to help our users find what they’re looking for more easily. Additionally, information like EXIF data may be displayed in the right-hand sidebar of the interstitial page that appears when you click on an image.


Q: Should I really submit an Image Sitemap? What are the benefits?
A: Yes! Image Sitemaps help us learn about your new images and may also help us learn what the images are about.


Q: I’m using a CDN to host my images; how can I still use an Image Sitemap?
A: Cross-domain restrictions apply only to the Sitemaps’ <loc> tag. In Image Sitemaps, the <image:loc> tag is allowed to point to a URL on another domain, so using a CDN for your images is fine. We also encourage you to verify the CDN’s domain name in Webmaster Tools so that we can inform you of any crawl errors that we might find.
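
As an illustration of the CDN case, here is a minimal sketch that builds an Image Sitemap in which the page URL lives on one hostname and the image URL on another. It uses Python's standard `xml.etree.ElementTree`; the hostnames `www.example.com` and `cdn.example.net` and the function name `build_image_sitemap` are placeholders, and only the page location and image location elements are emitted.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Build sitemap XML from a mapping of page URL -> list of image URLs."""
    # Register prefixes so the output uses a default namespace and "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # The page location stays on the site's own hostname.
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image_el = ET.SubElement(url_el, f"{{{IMAGE_NS}}}image")
            # The image location may point to a different hostname, e.g. a CDN.
            ET.SubElement(image_el, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs for illustration only.
sitemap_xml = build_image_sitemap({
    "https://www.example.com/latte-art.html": [
        "https://cdn.example.net/img/latte-art-heart.jpg",
    ],
})
print(sitemap_xml)
```
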


Q: Is it a problem if my images can be found on multiple domains or subdomains I own — for example, CDNs or related sites?
A: Generally, the best practice is to have only one copy of any type of content. If you’re duplicating your images across multiple hostnames, our algorithms may pick one copy as the canonical copy of the image, which may not be your preferred version. This can also lead to slower crawling and indexing of your images.


Q: We sometimes see the original source of an image ranked lower than other sources; why is this?
A: Keep in mind that we use the textual content of a page when determining the context of an image. For example, if the original source is a page from an image gallery that has very little text, it can happen that a page with more textual context is chosen to be shown in search. If you feel you've identified very bad search results for a particular query, feel free to use the feedback link below the search results or to share your example in our Webmaster Help Forum.

SafeSearch

Our algorithms use a great variety of signals to decide whether an image — or a whole page, if we’re talking about Web Search — should be filtered from the results when the user’s SafeSearch filter is turned on. In the case of images some of these signals are generated using computer vision, but the SafeSearch algorithms also look at simpler things such as where the image was used previously and the context in which the image was used. 
One of the strongest signals, however, is self-marked adult pages. We recommend that webmasters who publish adult content mark up their pages with one of the following meta tags:
<meta name="rating" content="adult" />
<meta name="rating" content="RTA-5042-1996-1400-1577-RTA" />

Many users prefer not to have adult content included in their search results (especially if kids use the same computer). When a webmaster provides one of these meta tags, it helps to provide a better user experience because users don't see results which they don't want to or expect to see. 
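
For publishers who want to verify that their templates actually emit one of the meta tags above, a small publisher-side check could look like the following. This is a sketch using Python's standard `html.parser`; the function name `is_self_marked_adult` is hypothetical, and the code only detects the two rating values quoted in this post. It says nothing about how SafeSearch itself processes pages.

```python
from html.parser import HTMLParser

# The two content values mentioned in the post, compared case-insensitively.
ADULT_RATINGS = {"adult", "rta-5042-1996-1400-1577-rta"}


class RatingMetaFinder(HTMLParser):
    """Collects the content of every <meta name="rating"> tag."""

    def __init__(self):
        super().__init__()
        self.ratings = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "rating":
            self.ratings.append((attr_map.get("content") or "").strip())


def is_self_marked_adult(html):
    """Return True if the page carries one of the adult rating meta tags."""
    finder = RatingMetaFinder()
    finder.feed(html)
    finder.close()
    return any(rating.lower() in ADULT_RATINGS for rating in finder.ratings)


page = '<html><head><meta name="rating" content="adult" /></head><body></body></html>'
print(is_self_marked_adult(page))
```
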

As with all algorithms, SafeSearch may sometimes filter content inadvertently. If you think your images or pages are mistakenly being filtered by SafeSearch, please let us know using the following form.

If you need more information about how we index images, please check out the section of our Help Center dedicated to images and read our SEO Starter Guide, which contains lots of useful information. If you have more questions, please post them in the Webmaster Help Forum.

The comments you read here belong only to the person who posted them. We do, however, reserve the right to remove off-topic comments.

32 comments:

Lothmak said...

Interesting.. I always wondered how the indexing worked for images regardless of format or name. I love it.

Christian Schmidt said...

What should I do if I have two URLs representing the same image in different sizes in order to avoid having the image indexed twice?

According to http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066 the ''Link: <URL>; rel="canonical"'' HTTP header is only supported for Web Search (i.e. not Image search, I assume).

Cakap Niaga said...

Glad to know that Google takes alt="" seriously; it's a method that I like to implement. Thank you, Google!

Allen said...

Is there a reason that you do not look at the title attribute?

salhakim85 said...

It's interesting to note that many SEOs believe that optimizing images is less important than focusing on other SEO factors such as keyword research, architecture and building quality links. What are people's opinions on this?

Sergey Sypalo said...

Let's assume I have a photo gallery on my website with the following structure:
http://photo.sypalo.com/backstages/harmony/2011/Autumn/Standard/photo-1
http://photo.sypalo.com/backstages/harmony/2011/Autumn/Standard/photo-2
and so on...
so I have a structure like
DanceClubName -> Year -> Season -> DanceType -> PhotoName
If a user clicks save, they go to the http://photo.sy.. .../photo-1/save URL and receive the image photo-1.jpg
So I have the following questions:
1) Do I need to change the photo name to something that duplicates the folder structure, in my case Harmony-2011-Autumn-Standrad-photo-1.jpg
2) Is it OK to have a different URL and file name? I mean:
http://photo.sypalo.com/backstages/harmony/2011/Autumn/Standard/photo-1 will point to an action that returns not photo-1.jpg, but Harmony-2011-Autumn-Standrad-photo-1.jpg
3) What's the priority/importance order for additional image information, given that we can fill in the following things:
a)Image EXIF data
b)Image sitemap
c)Image filename
d)Image ALT tag

Gary Illyes said...

Hey Christian!

In general, we try to include only one copy of your images as usually only one copy of the image is relevant to the content.

Could you show us a search query/term or link to a search result page where we show more than one copy of an image? Thanks a lot!

Gary Illyes said...

Hi Sergey,

Excellent questions!

1) If the structure works for you, I'd leave it like that. We have other ways to learn what the image is about; having a descriptive filename is really optional.

2) I wouldn't worry about it, especially since it might cause you some headaches. If you rewrite your URLs now that your images are indexed, we will have to recrawl and reindex all those that had their URLs changed which might lead us to drop some of them.

3) I'd order them like this:
a)Image ALT tag
b)Image sitemap
c)Image EXIF data
d)Image filename


Hope this helps!

Johnny said...

My business has been smashed overnight and you're posting articles about "image optimization".

Animesh said...

The most interesting part of this post was about spun articles. Lots of people are using it but its clear that Google can catch such tactics.
Other points are also very informative like unusual link building.
Thanks

estellaEffects said...

Love these tips!

iWeb Square said...

I like to have interest in the creation of image sitemap and this is best for an ecommerce website containing large products with their images.

Garavi Gujarat said...

i just loved to have creation in images, it will be helpful for all news sites. l loved it.

Christian Schmidt said...

Hi Gary,

Thanks for responding.

I don't have any examples of pictures appearing twice (in fact I just checked and you always seem to pick up the largest picture), so from a user perspective it works fine.

My reason for asking is that, according to my log files, the same picture is being requested in different sizes, even for old pictures that were published and indexed many months ago. This generates a lot of traffic (it seems that the bot does not send an If-Modified-Since header; at least I see almost no "304 Not Modified" responses), and it triggers our on-the-fly resizing mechanism (the resized images are cached, but only for a limited period).

So what I am really looking for is a way to direct all requests for all resized versions to the original picture (or at least some large, high-quality version, if the original picture is extremely big). I don't necessarily have a link to the original picture from my site (the image sizes actually used on the site differ depending on various template logic). If I block access with robots.txt, the pictures will not get indexed at all. So I was hoping that there is some way to tell Googlebot-Image "You are allowed to download http://example.com/img/40x30/foo.jpg and http://example.com/img/80x60/foo.jpg, but in the future please only download the large version at http://example.com/img/800x600/foo.jpg".


Christian

Sergey Sypalo said...

Hello Gary,
Many thanks for explanation!
After reading the comments from Christian Schmidt, I want to ask one more question about describing the same image that I have in several sizes. So basically, on the site I have:
http://photo.sypalo.com/backstages/harmony/2011/Autumn/Standard - small images (previews)
http://photo.sypalo.com/backstages/harmony/2011/Autumn/Standard/photo-1 - medium images (fullscreen)
http://photo.sypalo.com/backstages/harmony/2011/Autumn/Standard/photo-1/save - big images (originals)

All images have same name but stored on server in different directories like:
.../Standard/small/photo-1.jpg
.../Standard/medium/photo-1.jpg
.../Standard/full/photo-1.jpg

What is the best way to tell Google that these are the same image in different sizes? For now I have the same ALT text for all sizes, as well as the same EXIF data and image sitemap. What would you recommend changing or improving? Many thanks in advance.

Sergey Sypalo said...

Hi Gary!

Forgot to add...
Do I need to include only the biggest image in the image sitemap, or can I add all of them with some description of the size? Same for the ALT text: do I need to add a size description, so that the ALT text will look like:
"Dancecleb Harmony Contest Autumn 2011 photo-1 small"
"Dancecleb Harmony Contest Autumn 2011 photo-1 medium"... and so on

Tom Conte said...

Hi Gary,

I have an image sitemap that is hosted on my domain, but all the images are hosted on a subdomain.

In order for the image URLs to be crawled, would I need to verify the subdomain hosting the images in GWT?

In addition, if the answer is yes, should I also verify the subdomain that I'm hosting videos on?

Thanks!

Tom

saffron said...

OK, I can follow your recommendations. When will I be penalized for this, as with the last "antispam" filter?

Arpit M said...

I've a question here...What if the image on one page (Page 1) is linked to the other page (Page 2)? Will Google still consider content on Page 1 for relating to the image or will Google consider the importance of the link and related content on Page 2 ?

swill388 said...

I am working as a designer and i love to make new and new designs. I love the coffee art images. I have learned some designs from it.

Byte64 said...

Hi,
I am wondering if it may be useful to go through each and every image included in my blogs hosted on Blogger and manually add the relevant ALT tag.
Since they have been already indexed, the question is if adding the ALT tag now will be picked up later or if it is basically a waste of time.

Thanks

emi_me said...

hey check this new website www.countcode.com. It's a social network made for programmers, where you can download,share or upload source codes, where you can count your own code lines for free. You have access to the web forum and the web chatroom. we are happy to have you joined to our community!

andrej said...

What if i have images that are displayed in different language versions of the same page. should i alter the url, to optimize it for the current language?
For example:
English url: marinas.info/germany/marina-image-1.jpg
German url: marinas.info/deutschland/marinas-bild-1.jpg

Anime Manga said...

means the details of the images should be described in order to get good search engine, and easy to find people and must be unique

Terrie@Basalite said...

Love the article! One question- I thought that the "alt" tag had been changed to "title" in regard to description. Does Google prefer one over the other ,or should we use both? thx!

Terrie@Basalite said...

Great article! However, one question. I thought the "alt" tag was changed to "title" for keyword/descriptive terms. Does Google prefer one command over the other, or is it best practice to use both?

Thx!

Agung Purnomo said...

whether the title attribute on the image also affects the ranking on google search result

sharethesepictures said...

will google crawl image titles ?

SENSER said...

I am wondering if google can crawl images in data uri format. What's the SEO impact by data uri?

Mikhail Gavryuchkov said...

Good article. I have a question about image originality. If one artist creates a great image and posts it in the gallery without much of textual content. Then after a while someone copies this image and posts on their page with additional textual embellishments. Following your logic, the second person will over-rank the original creator of the artwork. Is it fair? Google has always been promoting original textual content, why not do the same with images?

I played with google images by dragging and dropping different images to the search box. I can see that google can find whether this is a stock image or an original image pretty easy. So technology of identifying the original image is there. Why is it not used? Will it be used in the future?

Tom Conte said...

Is it possible to allow crawling of images hosted on a CDN as long as the robots.txt file for the CDN points to the Sitemap?