Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Validation: measuring and tracking code quality

Monday, July 11, 2011 at 6:39 AM

Webmaster level: All

Google’s Webmaster Team is responsible for most of Google’s informational websites like Google’s Jobs site or Privacy Centers. Maintaining tens of thousands of pages and constantly releasing new Google sites requires more than just passion for the job: it requires quality management.

In this post we won’t talk about all the different tests that can be run to analyze a website; instead we’ll just talk about HTML and CSS validation, and tracking quality over time.

Why does validation matter? There are different perspectives on validation—at Google there are different approaches and priorities too—but the Webmaster Team considers validation a baseline quality attribute. It doesn’t guarantee accessibility, performance, or maintainability, but it reduces the number of possible issues that could arise and in many cases indicates appropriate use of technology.

While paying a lot of attention to validation, we’ve developed a system to use it as a quality metric to measure how we’re doing on our own pages. Here’s what we do: we give each of our pages a score from 0-10 points, where 0 is worst (pages with 10 or more HTML and CSS validation errors) and 10 is best (0 validation errors). We started doing this more than two years ago, first by taking samples, now monitoring all our pages.
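
The scale above is defined only at its endpoints (0 errors scores 10; 10 or more errors scores 0). As a minimal sketch, assuming the scores in between fall off linearly at one point per error, the mapping might look like the following. The function name `validation_score` and the linear rule are illustrative assumptions, not a description of the team's actual tooling.

```python
def validation_score(html_errors: int, css_errors: int) -> int:
    """Map a page's combined validation error count to a 0-10 score.

    Hypothetical helper: the linear fall-off between the two endpoints
    is an assumption; only the endpoints come from the post.
    """
    if html_errors < 0 or css_errors < 0:
        raise ValueError("error counts must be non-negative")
    # HTML and CSS errors are summed, so their origin is not kept.
    total_errors = html_errors + css_errors
    # 0 errors gives 10 points; 10 or more errors gives 0 points.
    return max(0, 10 - total_errors)


if __name__ == "__main__":
    print(validation_score(0, 0))  # a page with no errors
    print(validation_score(2, 1))  # a page with three combined errors
    print(validation_score(7, 5))  # a page with more than ten errors
```

Capping the score at 0 keeps a single very broken page from dominating an average, at the cost of hiding how many errors that page really has.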

Since the beginning we’ve been documenting the validation scores we were calculating so that we could actually see how we’re doing on average and where we’re headed: is our output improving, or is it getting worse?

Here’s what our data say:


Figure: Validation score development, 2009-2011.


On average there are about three validation issues per page produced by the Webmaster Team (as we combine HTML and CSS validation in the scoring process, information about the origin gets lost), down from about four issues per page two years ago.

This information is valuable for us as it tells us how close we are to our goal of always shipping perfectly valid code, and it also tells us whether we’re on track or not. As you can see, with the exception of the 2nd quarter of 2009 and the 1st quarter of 2010, we are generally observing a positive trend.
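
Tracking a trend like this comes down to averaging the per-page scores for each period and comparing consecutive periods. Here is a rough sketch of that bookkeeping; the function names and the sample numbers are made up for illustration and are not the figures behind the chart.

```python
from statistics import mean


def quarterly_averages(scores_by_quarter):
    """Return (quarter, average score) pairs in chronological order.

    Assumes quarter labels such as "2009-Q1" that sort chronologically
    as plain strings.
    """
    return [(q, mean(s)) for q, s in sorted(scores_by_quarter.items())]


def quarter_over_quarter(averages):
    """Return (quarter, change versus the previous quarter) pairs."""
    return [(q2, a2 - a1) for (_, a1), (q2, a2) in zip(averages, averages[1:])]


if __name__ == "__main__":
    # Hypothetical sample scores, NOT real measurements.
    samples = {
        "2009-Q1": [6, 7, 5],
        "2009-Q2": [5, 6, 4],
        "2009-Q3": [7, 8, 6],
    }
    averages = quarterly_averages(samples)
    for quarter, change in quarter_over_quarter(averages):
        direction = "up" if change > 0 else "down" if change < 0 else "flat"
        print(quarter, direction, change)
```

A negative change marks a quarter where the average score dropped, which is the kind of exception called out above.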

What has to be kept in mind are issues with the integrity of the data, i.e. the sample size as well as “false positives” in the validators. We’re working with the W3C in several ways, including reporting and helping to fix issues in the validators; however, as software can never be perfect, sometimes pages get dinged for non-issues: see for example the border-radius issue that has recently been fixed. We know that this is negatively affecting the validation scores we’re determining, but we have no data yet to indicate how much.

Although we track more than just validation for quality control purposes, validation plays an important role in measuring the health of Google’s informational websites.

How do you use validation in your development process?


47 comments:

kaidez said...

Any news on HTML5 Semantic validation? I know Google's attitude on HTML5 is "wait-and-see," but can you give ANY indication as to where things are headed in this respect?

Portland said...

So does validation have any say in the SERPs or SEO?

aaron robb said...

How does the Google validation deal with Schema.org markup? w3.org validation sees errors if code includes any Schema tags. Is there a way around this? Or has Google taken this into consideration?

Noodles said...

Any chance we'll see these metrics in Webmaster Tools for everyone?

8fe7b65c-b1af-11e0-a489-000f20980440 said...

What's the purpose of the score (0-10) and what happens when a page has 20 errors?

We do automated scoring at our company and we always measure by number of errors. Zero = no errors = perfect web page. Using our method, at least we then know that we have 20 errors to fix because the result (score) for our page is 20.

Or is this like PageRank on Google Toolbar and you are hiding the actual score from the public? I would also like to see these metrics in Webmaster Tools for everyone.

cj said...

All of these comments are off-topic. And almost all of them are self-serving in a blatant, needy and unhelpful way. Good lord.

I like the idea of scoring pages to avoid overemphasizing the importance of zero-error validation. I also think it may be helpful to count errors by type, especially on a site with dynamically generated code.

For example, if the server side code generates an invalid attribute on an LI by default, like an href, and you have 12 generated LIs on a page, I'm not sure that page needs to score a zero.

Obviously, the validator has difficulty comprehending what's going on there, but it would be nice to have some kind of redundancy check, so that you're not penalized for making one error multiple times.

Fida said...

It's all over my head :(

DazzlinDonna said...

And here I've been, foolishly trying to figure out how to deal with the validation errors heaped on me by the new g:plusone button code for my XHTML site. I would not have guessed, based on that code, that Google cared at all about validation.

gavtaylor said...

Why do you stop at informational pages? Why not validate all of Google's web pages: the homepage, SERPs, etc.?

Sune said...

Validation is definitely a good thing. What puzzles me, then, is why Google+ snippets use iframes with the frameborder attribute, which is obsolete in HTML5 and XHTML.

Defomaz said...

+1 for this comment

So does validation have any say in the SERPs or SEO?

Dictina said...

+1 for this comment, me too

So does validation have any say in the SERPs or SEO?

kaustiko lemoni said...

The current page has:

203 Errors, 237 warning(s)...

:D

Kruno said...

Which tools do you use?

PageRank-SEO said...

Thanks for sharing this post.

Could you be a little more explicit about what the Webmaster Team's quality metric actually measures? While applause is certainly warranted for any sustained improvement in quality, and it's great that the number of "validation issues" has dropped from four to three in the last two years, there appears to be a discrepancy in the definition of a "validation issue".

If we run yesterday's post on Google Webmaster Central Blog, http://googlewebmastercentral.blogspot.com/2011/07/validation-measuring-and-tracking-code.html , through any of the World Wide Web Consortium's validation services, the page is strewn with errors.

On Unicorn - W3C's Unified Validator there are 156 errors. Screen shot: http://goo.gl/UtPpC Unicorn - W3C's Unified Validator: http://goo.gl/fvxsx

On W3C Markup Validation Service there are 211 errors. Screen shot: http://goo.gl/cEFjp W3C Markup Validation Service: http://goo.gl/348DV Please be aware this service indicates errors in the comments. Yesterday, when I ran the page there were 160 errors. After I post this comment the number of errors will increase, again.

On The W3C CSS Validation Service there are 60 errors. Screen shot: http://goo.gl/BkC72 The W3C CSS Validation Service: http://goo.gl/UId7N

Will there be a public version of the tool with which you're measuring the pages? It would be helpful if we were all on the same page (pun intended).

Diggler said...

Definitely much better than Yahoo's webmaster blog, congratulations

AJClarke said...

@DazzlinDonna - LOL. That's exactly what I was thinking.

Test said...

@DazzlinDonna, note the alternative +1 button code at http://code.google.com/apis/+1button/#plusonetag.

@Kruno, we use internal instances of the W3C’s HTML and CSS validators.

SkyHighSoftware said...

my only error was the google plus one code copy and pasted from google :(

DazzlinDonna said...

Test, I've already checked out the alternative code. Those attributes still don't validate with XHTML, so it really doesn't help. If no attributes are passed at all, then it's fine, but if you want tall instead of the default size, for example, forget it. No validation for you. :)

MK Safi said...

Excellent work with those metrics and stuff. But I'm a single webmaster, what can I do to make sure all my pages validate? I can't submit each and every URL to the online validator.

jobisez said...

Will this "validation score" be available to us on WMT?

Webnauts said...

@DazzlinDonna here I posted the solution to your problem http://community.seoworkers.com/threads/195-Googe-1-button?p=1312&viewfull=1#post1312

Ryan said...

There are some really awesome questions in the comments of this blog post. It would be great to see some response from the Google Webmaster Team. Any thoughts?

Вячеслав Вареня said...

I created my blog on blogspot.
How can I make blog template valid?
I asked this question on Russian help forum but no reply received.

Vinay said...

Does W3C code and CSS validation affect our keyword rankings or PageRank in Google?

jobisez said...

And by the way, that +1 code fails the W3C validator.

Here's the code... ""

Any suggestions on how to correct this?

Rick Vidallon said...

Like it or not, industry jargon often coughs up terms that become buzzwords. When this occurs—and it occurs across the board; web development is no exception—the terms can become diluted, even ambiguous. Two such terms lately include “validation” and “web standards.”

To be clear, the W3C provides specifications and recommendations, not mandates. In a rigorous sense, it can be argued that true web standards do not exist: they are a myth. Scary word! But don’t be alarmed. Don’t confuse myth with falsehood. So-called “web standards” are a myth in the sense that they describe an oft-repeated ideology that strives to establish popular convention.

Thinking optimistically, we might call these an ever-evolving ideal, something we as a community are still working toward perfecting. What we have, at present, are de facto guidelines, principles that serve an objective without being legally enforceable.

If a house’s wiring and electrical components are not UL-listed, the home inspector may refuse to issue occupancy permits. When ISO compliance isn’t met, products don’t ship. These are high stakes. On the other hand, in the face of invalid web markup, websites march on. The overwhelming majority of surfers don’t bat an eyelash and don’t need to.

Provided the developer has written functional markup, failure to meet W3C validation means nothing more than the fact that a document contains something that is either not in the specification or is in disagreement with the specification. Invalid markup is therefore not necessarily in violation of anything.

These strong words—“invalid,” “violation”—may pack a punch to the layman, but in context of the web developer’s lexicon, they reflect markup that may be an addition to the specification or something the validator simply doesn’t recognize. Certain JavaScript that is universally understood by user agents, for example, does not appear in the HTML specifications.

Let’s not misunderstand. Poorly formed HTML can be a hassle to update. It may be a factor in search engine optimization (whose “standards” change often, to the chagrin of SEO subject matter experts). In some cases, it can cause content to load slowly (or appear to load slowly).

Validators are great for quickly spot-checking possible deal-breaker gaffes among copious volumes of markup. But validators are servants, not masters. W3C badges are effectively academic badges of honor. Such validation is an admirable enough goal, but is not always worth the return on investment in a production environment. Far more important is to ensure that markup is efficiently written.

W3C validation is not the web developer’s Holy Grail. Validation does not guarantee a site will look the same from platform to platform, from browser to browser. Validation does not assure that markup is efficiently written or adheres to a given entity’s assessment of best practices. What it means is that the developer has coded a functional document and used no markup in addition to that specified by the guidelines.

Wearing suspenders in addition to a belt isn’t illegal, it’s just … extra.

No harm in that, is there?

Mary said...

I use Blogger as my development-environment. Any chance you could have a word with the lads behind it, and suggest to them that it should produce "valider" code?

Putri Arisnawati said...

Hello Google, I'm using blogger.com. I regularly check my Blogger code with the W3C validator, but I always find errors, and they always come from Blogger's default code. That code is no longer valid, and I don't know why.

wasaweb.net said...

There used to be no errors on my site when I validated my pages, until the addition of the +1 buttons (small without the counter). These errors are of no real consequence, but it would please my slightly OCD personality greatly if there were none. I'm not yet prepared to move to HTML5, but I suppose that would be the eventual solution.

"Guppy" Honaker said...

Code quality? How do link farms such as http://www.affi2009-brest.com (PR5) and http://hydrosymposium.org (PR4) get reported to Google (from the /webmasters/tools/paidlinks page) more than once, and yet they still rank high and remain on the Google search engine? These are just two pages run by the same person/organization that other websites use to link and boost their own Google PR.

Code quality my butt.

tobto said...

That's really good news! I've been doing it from day one. But what about validating Google's own sites, which have tons of mistakes?

Paul Ingraham said...

I validate as much as I can. I stalk perfection. I get as close as I can get, until some unavoidable snippet of code — I’m looking at you, g:plusone — barges in like the dork in the pack and loudly says, “Hey guys, what’s goin’ on?” scaring all the valid caribou away.

Then I have a martini.

papamike69 said...

[Rick Vidallon said... Wearing suspenders in addition to a belt isn’t illegal, it’s just … extra.]

This is true, but whether you use rope, a belt or suspenders, your pants are still going to fall down if they are not tied, buckled or fastened correctly.

I have had tons of sites that have not had all valid code, and they have worked well across all platforms; for the most part they have all been indexed and rank well in the SERPs. Granted, in HTML and XHTML there are certain things that just don't work and can break a site if not valid, but in my opinion, "if it ain't broke, don't fix it". If a site is working well and ranking well and doing its job to make you money, then why worry so much about how well it validates? No website is ever going to be perfect...

FfdE said...

It's not only the +1 button: Google Custom Search also creates warnings and errors in Firebug, and probably in the validator too. And they need many seconds to load...

Nuttakorn Rattanachaisit said...

I can't really find a page with zero validation errors; even Google.com gets 35 errors and 2 warnings from the W3C Markup Validation Service. Google might need to push CMS platforms hard on this, rather than individual websites.

zeidanconsulting said...

The quality of the code always matters in programming.

La Gringa said...

As Kaustiko pointed out above -- and it has since increased -- this Google page now has 346 Errors and 289 warnings on the Validator.

I'm not a professional but I really try to learn and do things properly on my blogspot blog. It's discouraging when so many things under the control of Google result in my blog having hundreds of errors and warnings. Some of those are simple careless CSS errors in Google's standard CSS files. The share feature with the Google Plus button alone adds dozens of errors to my site.

Similarly, we read that speed is important to Google, but some of the things I'm told to improve, again, are as a result of Google's coding, not mine. The Google search widget has a huge delay and several errors.

It is just very disappointing that Google has taken a "Do as we say, not as we do" attitude when it comes to Blogspot blogs.

Garini said...

Google has always looked at our code, as this is the best way to tell whether a site is OK and whether it is being handled in a professional way or not.

Google always gives us only a small amount of information, as it doesn't want people to do SEO.

Code is one of the best ways to tell whether a site is made by hand or by machine... Panda is based on this.

woodworker said...

You talk about validation but there are many sites in TOP10.
Explain it please.

Denis said...

Well, I'm using code validation during development of my pages. That's why my pages are almost validation error free.

The only errors I get on my homepage are... from google, facebook & twitter. Maybe you could ask your colleagues who are responsible for the code snippets to also produce valid code.

Thanks ;-)

yoetama said...

This is still confusing for a beginner like me; the default template still produces an error message.

Deddy Suyanto said...

so many errors

sprinklerk.blogspot.com said...

Can anyone tell me how to validate my Blogger blog against the W3C validation rules?

La Gringa said...

Sprinklerk, that is the ironic thing! You'll never be able to validate a blogspot blog until Google changes the erroneous code that they put in your blog.

It would be really nice if someone from Google would comment to say whether they are working on these errors but apparently that is never going to happen.

Ryan Riyanto said...

This blog has 267 errors and 341 warnings in validator.w3.org. Is that good or bad for SEO?