Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Using the site: command

Friday, March 02, 2007 at 1:49 PM

The site: command enables you to search through a particular site. For instance, a searcher could look for references to [Buffy] in this blog by doing the following search:

site:googlewebmastercentral.blogspot.com buffy

Webmasters sometimes use this command to see a list of indexed pages for a site, like this:

site:www.google.com

Note that with this command, there's no space between the colon and the URL. A search for site:www.site.com returns URLs that begin with www, and a search for site:site.com returns URLs for all subdomains. (So, site:google.com returns URLs such as www.google.com, checkout.google.com, and finance.google.com.) You can do this search from Google, or you can go to your webmaster tools account and use the link under Statistics > Index stats. Whether this link includes the www depends on how you added the site to your account.
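
As a rough illustration of the points above, the following Python sketch builds a site: query with no space after the colon and models the host matching described here. The function names are hypothetical, the https://www.google.com/search?q= URL form is an assumption about the public search page, and the matching logic only mirrors the behavior described in this post; it is not Google's implementation.

```python
from urllib.parse import urlencode


def build_site_query(site, terms=""):
    """Return a query string such as 'site:google.com buffy' (hypothetical helper)."""
    # No space between the colon and the site.
    query = "site:" + site.strip()
    if terms.strip():
        query += " " + terms.strip()
    return query


def search_url(site, terms=""):
    """Return a search URL for the query. The URL form is an assumption."""
    return "https://www.google.com/search?" + urlencode(
        {"q": build_site_query(site, terms)}
    )


def host_matches_site(host, site):
    """Model of the described behavior: a host matches when it equals the
    site or is a subdomain of it."""
    host, site = host.lower(), site.lower()
    return host == site or host.endswith("." + site)
```

Under this model, host_matches_site("checkout.google.com", "google.com") is true, while host_matches_site("finance.google.com", "www.google.com") is false, which matches the www versus non-www distinction above.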

Historically, Google has avoided showing pages that appear to be duplicates (e.g., pages with the same title and description) in search results, because our goal is to provide useful results to the searcher. However, with a site: command, searchers are likely looking for a full list of results from that site, so we are making a change to show the full list. In some cases, a site: search doesn't show a full list of results even when the pages are different, and we are resolving that issue as well. Note that this is a display issue only and doesn't in any way affect search rankings. If you see this behavior, simply click the "repeat the search with omitted results included" link to see the full list. The pages that initially don't display continue to show up for regular queries; the display issue affects only a site: search with no associated query. In addition, this display issue is unrelated to supplemental results. Any pages in supplemental results display "Supplemental Result" beside the URL.
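
To make the "same title and description" idea concrete, here is a minimal Python sketch that splits a list of results into those shown and those omitted. It is only an illustration of the example given above: the function name and the dictionary fields are hypothetical, and it says nothing about how Google actually detects duplicates.

```python
def collapse_duplicates(results):
    """Split results into (shown, omitted), treating results with the same
    (title, description) pair as apparent duplicates. Hypothetical model only."""
    seen = set()
    shown, omitted = [], []
    for result in results:
        key = (result["title"], result["description"])
        if key in seen:
            # Same title and description as an earlier result: hide it.
            omitted.append(result)
        else:
            seen.add(key)
            shown.append(result)
    return shown, omitted
```

In this model, "repeat the search with omitted results included" corresponds to displaying both lists rather than only the first.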

Because this change to show all results for site: queries doesn't affect search rankings at all, it will probably roll out in the normal course of events, merged into the next push of a new executable for handling the site: command. As a result, it may be several weeks before you start to see this change, but we'll keep monitoring it to make sure the change goes out.

18 comments:

LKBM said...

I don't use site: because I want a full listing; I use it because I know which site has (or doesn't have) the result I want, and using a website's internal search feature has problems:
* Sometimes there isn't one.
* Google knows search better.
* I know Google's query format: OR, +, -, doublequotes, and so on. I don't know example.com's.
* I have my browser configured to make searching Google from my location bar trivial. I don't have it configured for most websites.
* Google will fix my spelling better than anyone else can.

An easier way to say 'Don't hide results' would be nice, but when I search for 'intitle:"Marie Antoinette" site:everything2.com', I don't want to be shown all six results. (Sometimes it's a lot more than six dups.)

Just one will be sufficient.

Simon said...

"The site: command enables you to search through a particular site"

No, the site: command is far more useful than that. It returns results under a given part of the domain name hierarchy.

When I want sensible advice on nutrition, or health, I'll prefix a query with "site:gov" or "site:gov.uk" or "site:nhs.uk".

Want a UK vendor? "site:co.uk" is as useful as, and sometimes more useful than, the Google UK searches.

Perhaps this usage is more obvious/useful to those of us in countries with second level domains.

Jonathan Davis said...

Vanessa,
I am certainly glad you are fixing this command since us SEO folks rely on it a lot. How about the command link:domain.com -site:domain.com so we can see how many links google has in its index that are NOT from the domain we are reviewing?
Thanks for all your work.

jamiec said...

Vanessa,

Thanks for your efforts to make the site: query more accurate-- I think it's an extremely useful tool in gauging the success of different parts of our site.

Somewhat of a tangential comment-- I've noticed recently an issue with the cache: query. If you do a site: query on our domain (zoominfo.com) and then click the "Cached" link on our homepage, I get null results... which seems strange to me. Am I missing something there?

...I only mention it because we've seen a dramatic change in Google-referred traffic over the past ~week: a drop of ~80% for a site with a constant >1.2M pages indexed.

Anyway, thought I'd mention it. Thanks for all of your efforts.

Best regards,
Jamie

Digital Marketing South Africa said...

I use the site: command to check if Google sees any of my pages as duplicates. If there are any, I can then change them. Will Google still somehow show which pages it sees as duplicates?

Vanessa Fox said...

Googlepro, you can see a list of links to your site from other domains in webmaster tools:
http://googlewebmastercentral.blogspot.com/2007/02/discover-your-links.html

jamiec, I can see the cache of your home page. Maybe a difference in data centers? I'll look into it.

digital marketing, you can see if pages have the same title and description with a site: search, but not necessarily if they're duplicate. Two pages with the same title and description could have different content.

LNOF said...

Hi Vanessa,

I'm pretty sure that you won't be able to give specifics, but it would be interesting to see what exactly is regarded as duplicate content. Our site relaunch (which took 8 months of hard, hard work) seems to have been affected quite a bit.

e.g. our pages
http://www.lastnightoffreedom.co.uk/stag-weekends/bristol/ and http://www.lastnightoffreedom.co.uk/hen-weekends/bristol/ are very different pages (one aimed at men, one at women) but they do have some similar content and appear to be being penalised as such.

I'm happy that the pages describe perfectly what they are about but should I alter the title, description, h1 etc of one of the pages to eliminate any duplicate filters in order to rank better?

Best Regards

Matt

LNOF said...

Hi Vanessa,

I'm pretty sure that you won't be able to give specifics, but it would be interesting to see what exactly is regarded as duplicate content. Our site relaunch (which took 8 months of hard, hard work) seems to have been affected quite a bit.

e.g. our pages
http://www.lastnightoffreedom.co.uk/stag-weekends/bristol/ and http://www.lastnightoffreedom.co.uk/hen-weekends/bristol/ are very different pages (one aimed at men, one at women) but they do have some similar content and appear to be being penalised as such.

I'm happy that the pages describe perfectly what they are about but should I alter the title, description, h1 etc of one of the pages to eliminate any duplicate filters in order to rank better?

This would appear to be a bit of a backwards SEO step as we'd be doing it entirely for ranking purposes and not those of user experience.

Best Regards

Matt

jamiec said...

Thanks, Vanessa! Yes, could be a data center issue. I just checked and still see the problem. If it helps, a search from google.com led me to:

http://209.85.165.104/search?q=cache:zu7FWxeYTS4J:www.zoominfo.com/+site:zoominfo.com&hl=en&ct=clnk&cd=2&gl=us&client=firefox-a

Just trying to figure out if this is related to our site's traffic drop (i.e. if our site doesn't have a "valid" homepage, does that hurt us?)

Either way, I appreciate the help.

Best,
Jamie

Eliane Alhadeff said...

All,
I've recently noticed an issue with the cache query. Although my sites are being indexed they both present a recurring problem:

a) If I do a search query for my site: "future-making serious games" at http://elianealhadeff.blogspot.com and click the "Cached" link to my homepage, this is what I get: "Your search - cache:DfcYgG_IDEsJ:elianealhadeff.blogspot.com/ "future-making serious games" - did not match any documents", although I do have almost 200 pages.

b) If I carry out a cache command, exactly as it appears in Google Webmaster Tools under Index stats, this is what I get:
"Your search - cache:elianealhadeff.blogspot.com - did not match any documents"

The same happens with my other site Serious Games Portal at http://seriousgamesportal.blogspot.com

Could you please coach me on how to resolve this issue?
Warmly, Eliane

Blog Bloke said...

Can you please tell me why my site search does not show any information relevant to the link?

http://www.google.com/search?q=site%3Ablog.instabloke.com

It only shows the link title followed by the blog description. Very strange, and that goes for all of the links as well.

I have been plagued with this problem for years now and I can't resolve it.

I await your answer and thanks in advance.

...BB

Shackmaster said...

Perhaps, Vanessa, you'd like to apologise on behalf of Google for the inaccurate help pages it serves up regarding sitemaps.

The carefully crafted example sitemap tells us all that the sitemap protocol in use is 0.84.

IS IT?

John Lewis, overworked, underpaid and pissed off.

admin said...

I am certainly glad you are fixing this command since us SEO folks rely on it a lot. How about the command link:domain.com -site:domain.com so we can see how many links google has in its index that are NOT from the domain we are reviewing?
Thanks for all your work. syamsul farid http://wisataku.blogspot.com

top everything said...

What if the site: command doesn't show your site, but typing info: shows your site without a description? What does that mean?

Maureen said...

I have noted that when Compupaye is entered in the Google search box, it brings up accesstrust.org first instead of compupaye.com.

CompuPaye - UK Payroll Bureau Service Company in search shows www.accesstrust.org.uk/ instead of www.compupaye.com


I have complained to my webhosting company, as accesstrust.org (which has nothing to do with Compupaye) does not exist. accesstrust.org uses the same nameservers as compupaye.com, both hosted by Verio (NTTL Europe).
I cannot remove accesstrust.org using webmaster tools because I don't own that site.

Some days I am on top of the search engines, with webmaster tools confirming this on the info link. Some days I disappear altogether, and webmaster tools finds no information on the info link. My site now only gets crawled once a month. I have no idea how accesstrust.org got mixed up with Compupaye.com. As both names share the same nameserver, Google must have cached this from my webhosting company. Any ideas on how to fix this?

rpatton said...

I'm having trouble using the site: command. I'm trying to come up with the number of pages that Google has indexed on our site. If I do a search on "site:turbodieselregister.com" (without the quotes but I'm putting them here so you can see exactly what I'm searching on) I get 9,300 results. But if I do a search on "site:turbodieselregister.com forums" I get 1,620,000 results. Why don't all of those results come back on the basic site: search without additional terms?

lee said...

I have noticed a change over the last week with the site: command. Its results keep fluctuating on both the .co.uk search and the web search. The stats for indexed pages are very random at present, and I am still trying to figure out whether Google is doing an update or the site: command is broken.

Google Webmaster Central said...

Hi everyone,

Since over a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Help Group.

Thanks and take care,
The Webmaster Central Team