Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

On web semantics

Monday, July 16, 2012 at 10:42 AM

Webmaster level: All

In the context of web development, semantics refers to semantic markup: markup used according to its meaning and purpose.

Markup used according to its purpose means using heading elements (for instance, h1 to h6) to mark up headings, paragraph elements (p) for paragraphs, lists (ul, ol, dl, also datalist or menu) for lists, tables for data tables, and so on.
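As a minimal sketch of what this looks like in practice, the fragment below uses each element for its intended purpose. All text content is placeholder material invented for illustration.

```html
<!-- Illustrative fragment; headings, list items, and table values are made up. -->
<h1>Fruit guide</h1>
<p>This page lists a few fruits and their typical colors.</p>

<h2>Shopping list</h2>
<ul>
  <li>Apples</li>
  <li>Bananas</li>
</ul>

<h2>Colors</h2>
<table>
  <tr>
    <th>Fruit</th>
    <th>Color</th>
  </tr>
  <tr>
    <td>Apple</td>
    <td>Red</td>
  </tr>
</table>
```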

Stating the obvious became necessary in the old days, when the Web consisted of only a few web sites and authors used tables to code entire sites, table cells or paragraphs for headings, and thought about other creative ways to achieve the layout they wanted. (Admittedly, these authors had fewer instruments at their disposal than authors have today. There were times when coding a three column layout was literally impossible without using tables or images.)

Even today, though, authors are not always certain which HTML element to use for which functional unit of a page. “Living” specs like HTML 5 also require authors to keep an eye on which elements are being added, so that they can mark up what otherwise calls for “meaningless” fallback elements like div or span.

To know what elements HTML offers, and what meaning these elements have, it’s necessary to consult the HTML specs. There are indices—covering all HTML specs and elements—that make it a bit simpler to look up and find out the meaning of an element. However, in many cases it may be necessary to check what the HTML spec says.

For example, take the code element:

“The code element represents a fragment of computer code. This could be an XML element name, a filename, a computer program, or any other string that a computer would recognize.”
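Read against that definition, a paragraph that mentions a filename and an element name might be marked up roughly as follows. The sentence itself is invented for illustration.

```html
<!-- Both the filename and the element name are strings a computer would recognize. -->
<p>Save the file as <code>index.html</code> and wrap the headline in an <code>h1</code> element.</p>
```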

Author-controlled semantics

HTML elements carry meaning as defined by the HTML specs, yet ID and class names can bear meaning too. ID and class names, just like microdata, are typically under author control, the only exception being microformats. (We will not cover microdata or microformats in this article.)

ID and class names give authors a lot of freedom to work with HTML elements. There are a few basic rules of thumb that, when followed, make sure this freedom doesn’t turn into problems:

Advantages of using semantic markup

Using markup according to how it’s meant to be used, as well as modest use of functional ID and class names, has several advantages:

  • It’s the professional thing to do.
  • It’s more accessible.
  • It’s more maintainable.
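On the maintainability point, one common reading of “functional” ID and class names is that they describe what an element is for rather than how it currently looks. The sketch below assumes that reading; the class names `red-bold` and `error` are hypothetical and not taken from the post.

```html
<!-- Presentational name: describes how the element looks today. -->
<p class="red-bold">Please enter a valid email address.</p>

<!-- Functional name: describes what the element is for. -->
<p class="error">Please enter a valid email address.</p>
```

If the design later changes the error color, the functional name still fits, while the presentational one becomes misleading.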

Special cases

“Neutral” elements, elements with ambiguous meaning, and presentational elements constitute special cases.

div and span offer a “generic mechanism for adding structure to documents.” They can be used whenever there is no other element available that matches what the contents in question represent.
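A small sketch of that fallback role, assuming no more specific element fits the content: a div groups a heading and a paragraph, and a span marks an inline fragment. The class names and product text are illustrative only.

```html
<!-- Hypothetical product teaser; class names and content are illustrative. -->
<div class="teaser">
  <h2>Frobonitor 2000</h2>
  <p>Now available for <span class="price">49 USD</span>.</p>
</div>
```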

In the past a lot of confusion was caused by the b, strong, i, and em elements. Authors cursed b and i for being presentational, and typically suggested a 1:1 replacement with strong and em. Not to stir up the past, here’s what HTML 5 says, granting all four elements a raison d’être:

  • b: “a span of text to be stylistically offset from the normal prose without conveying any extra importance, such as key words in a document abstract, product names in a review, or other spans of text whose typical typographic presentation is boldened”
    Example: <p>The <b>frobonitor</b> and <b>barbinator</b> components are fried.</p>
  • strong: “strong importance for its contents”
    Example: <p><strong>Warning.</strong> This dungeon is dangerous.</p>
  • i: “a span of text in an alternate voice or mood, or otherwise offset from the normal prose, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose whose typical typographic presentation is italicized”
    Example: <p>The term <i>prose content</i> is defined above.</p>
  • em: “stress emphasis of its contents”
    Example: <p><em>Cats</em> are cute animals.</p>

Last but not least, there are truly presentational elements. User agents (browsers) will keep supporting these elements indefinitely, but they shouldn’t be used anymore: presentational markup is expensive to maintain, and presentation should be handled by style sheets instead. Some popular ones are:

  • center
  • font
  • s
  • u
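As a hedged sketch of moving presentation into a style sheet, centered text that might once have used the center element can be handled with a CSS rule instead. The class name `intro` is hypothetical.

```html
<!-- Instead of: <center>Welcome</center> -->
<style>
  /* "intro" is a hypothetical class name chosen for this example. */
  .intro { text-align: center; }
</style>
<p class="intro">Welcome</p>
```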

How to tell whether you’re on track

A quick and dirty way to check the semantics of your page and understand how it might be interpreted by a screen reader is to disable CSS, for example using the Web Developer Toolbar extension available for Chrome and Firefox. This only identifies issues around the use of CSS to convey meaning, but can still be helpful.

There are also tools like W3C’s semantic data extractor that provide cues on the meaningfulness of your HTML code.

Other methods range from peer reviews (coding best practices) to user testing (accessibility).

Do’s and Don’ts

Don’t:
  <p class="heading">foo</p>
Do:
  <h1>foo</h1>
Reason: For headings there are heading elements.

Don’t:
  <p><font size="2">bar</font></p>
Do:
  <p>bar</p>

  p { font-size: 1em; }
Reason: Presentational markup is expensive to maintain.

Don’t:
  <table>
    <tr>
      <td class="heading">baz</td>
    </tr>
    <tr>
      <td>scribble</td>
    </tr>
  </table>
Do:
  <h1>baz</h1>
  <p>scribble</p>
Reason: Use table elements for tabular data.

Don’t:
  <div class="newrow">foo</div>
  <div>1</div>
  <div class="newrow">bar</div>
  <div>2</div>
Do:
  <table>
    <tr>
      <th>foo</th>
      <td>1</td>
    </tr>
    <tr>
      <th>bar</th>
      <td>2</td>
    </tr>
  </table>
Reason: Use table elements for tabular data.

Don’t:
  foo bar.<br><br>baz scribble.
Do:
  <p>foo bar.</p>
  <p>baz scribble.</p>
Reason: Denote paragraphs by paragraph elements, not line breaks.


16 comments:

Marcos said...

So, what about Blogger?

For this post the source shows that you specifically fixed the use of <p>, but the previous post on this same blog shows how Blogger resorts to using line breaks to denote paragraphs.

What gives?

Matthew Brundage said...

Jens, for the record, the u and s tags have been resurrected in HTML5, for better or for worse: 3.3 Changed Elements

Vlad said...

Why doesn't google respect these specs on their own pages?

I know they want to adhere to standards and proselyte them because it makes their job easier but still...

sc said...

Good stuff, but I'd like to know whether these tags really matter to Google for treating some content as more "prominent" than other content (i.e., a heading being more important than a span or div). Thank you!

TheSEOGuru said...

Good post, Jens. Can you please share any benchmark site using these semantics?

şenol uysal said...

In the Do's and Don'ts part, "Use table elements for tabular data." has been written in two cells. The first cell (c4) should be different; the second one (c5) is OK. Please enlighten me if I am wrong.

Şenol uysal

Daniel Peiser said...

Şenol, the first example uses tables for non-tabular data. The comment written in Reason is right, but it could be "Use table elements only for tabular data." for more clarity.

kim Anderson said...

Well, nice post. The example uses a table for non-tabular data. Only use the table element for tabular data.

Unknown said...

This will help designers create an SEO-friendly business website. Great post, and thank you for sharing it with us.

Alex said...

Don't and Do are good. People will review their websites through this post

Tim Finin said...

I think talking about these issues using the terms "web semantics" and "semantic markup" introduces needless confusion with the concepts introduced by RDF and Microdata.

Ocean infosot - Web Development Company said...

Hi, thanks for sharing this information, i have read your whole article; I would like to share this information to my friends.

WANDI thok said...

Thanks for sharing this information, i have read your whole article; I would like to share this information to my friends

Scott Simpson said...

If I ever get around to refreshing the sites I made back in 1998 with FrontPage, I'll keep these tips in mind. Just goes to show that some of the lessons I learned back in the first edition of HTML For Dummies still hold true.

Aditya Solanki said...

something important for me. thanks a lot.

53north said...

..and, and, using headings for paragraphs will let googlebot know more about your page, your keyword intent, and the depth of the content.
It's quite possible to rank 1st page on a rare topic with a h3 paragraph..., but not with b, font size and br...