4 Easy Ways To Spruce Up Your HTML Markup

In the last issue of the Tech Times, I mentioned I was hard at work with the team here on a new front page design for sitepoint.com. While most of our time has been split between tweaking the CSS styles and crafting the PHP code that will generate the page, any major redesign brings with it the opportunity to improve the HTML code at the heart of your site.

Of course, any newly written HTML code these days should validate. But there’s more to good HTML code than validation, which is only the bare minimum you should be doing to assure the quality of your code.

This issue, I’d like to take a look at four simple things you can do to make sure your HTML has that nice, new markup smell.

Take Care Of Your Heading Structure

This is something I’ve banged on about in the Tech Times before. The headings in your document (<h1>, <h2>, etc.) should form a consistent hierarchy: one or more <h1> sections each containing <h2> sections, which in turn may contain <h3>s. Resist the urge to skip heading levels (e.g. placing a <h5> after a <h1>) to indicate levels of “importance” in your content.

The easiest way to check your heading structure nowadays is to use the Web Developer extension in Firefox. Simply click Information → View Document Outline to see the outline of your page in a new tab.
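If you would rather script the check, the same idea can be sketched in a few lines of Python using the standard library's html.parser module. This is only an illustration, not how the Web Developer extension works: the names HeadingOutline and skipped_levels are made up for this example, and it assumes that a heading more than one level deeper than the heading before it counts as a skipped level (so a page that opens with an <h2> is flagged too).

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every <h1>..<h6> in a document."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []   # finished headings as (level, text)
        self._level = None   # level of the heading currently being read
        self._text = []      # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def skipped_levels(html):
    """Return (previous_level, level, text) for each heading that skips a level."""
    parser = HeadingOutline()
    parser.feed(html)
    problems = []
    previous = 0  # treat the start of the document as level 0
    for level, text in parser.headings:
        if level > previous + 1:
            problems.append((previous, level, text))
        previous = level
    return problems


# Hypothetical markup in which an <h5> directly follows an <h1>.
sample = "<h1>SitePoint</h1><h5>Latest News</h5><h2>Articles</h2>"
for previous, level, text in skipped_levels(sample):
    print(f"h{level} '{text}' follows h{previous}: skipped a level")
```

Moving back up the hierarchy (an <h2> after an <h5>, say) is not reported, since closing a deeper section is perfectly normal.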

SitePoint’s current home page doesn’t hold up too well to this test. Heading tags are chosen very much by the perceived importance of the text they contain. Heading levels are skipped, and important sub-headings are routinely given lower heading levels than the titles of the sections that contain them. In some places, non-heading text is even marked up using heading tags.

This type of heading structure is virtually impossible for users of assistive technologies like screen readers to navigate. It’s valid HTML, but it does little to convey the structure of the content it marks up.

The new HTML front page solves all of these issues. Since the front page represents the site as a whole, the page contains a single <h1>. All of the sections of the page are then marked with <h2> tags within that top-level section.

Although the titles of articles may be styled with larger fonts than structural headings like “Latest News”, the actual tags that are used to mark them up are chosen to describe the structure of the page, providing a useful map for screen reader users to navigate.

Replace Named Anchors With IDs

This one’s dead simple, but it’s something I still see developers who have been writing HTML for a long time get wrong. If you want to provide links to particular spots within a page (e.g. http://www.sitepoint.com/#news), you don’t need to fill your HTML code with <a name=""> tags—just use the id attribute on the elements you already do have!

In the past, if you wanted to provide a link to the ‘News’ section of your page, you’d have to do something like this:

<h2><a name="news"></a>News</h2>

These days, all browsers support in-page links based on the id attribute as well as the old-fashioned <a name=""> tag. So instead of the above, you can just do this:

<h2 id="news">News</h2>

A link to "#news" in this page will find either of the above example headings, but the second one is a lot neater, and also gives you the ability to apply styles to the heading based on its unique identifier if you need to.
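To find out which style a page still relies on, a rough Python sketch like the one below can list every place a "#fragment" link could land. The names FragmentTargets and find_targets are invented for this illustration; the parser itself is Python's standard html.parser.

```python
from html.parser import HTMLParser


class FragmentTargets(HTMLParser):
    """Record every value that a '#fragment' link could land on."""

    def __init__(self):
        super().__init__()
        self.ids = set()            # id="..." on any element
        self.named_anchors = set()  # old-style <a name="...">

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        if tag == "a" and attributes.get("name"):
            self.named_anchors.add(attributes["name"])


def find_targets(html):
    """Parse the markup and return the populated FragmentTargets object."""
    parser = FragmentTargets()
    parser.feed(html)
    return parser


old_style = '<h2><a name="news"></a>News</h2>'
new_style = '<h2 id="news">News</h2>'

# Both snippets offer "news" as a link target, but only the first one
# needs an extra, empty <a> element to do it.
print(find_targets(old_style).named_anchors)
print(find_targets(new_style).ids)
```

Anything that shows up in named_anchors is a candidate for being replaced by an id on the surrounding element.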

Declare Your Language

It may be obvious to you what language the content of your document is written in, but to search engines and assistive technologies, this is an important piece of information that can be difficult to guess correctly.

Make sure the <html> tag in all your documents contains a lang attribute that identifies the primary language in use in your document. For English, set it to "en":

<html lang="en">

If your document is XHTML, you should also set the xml:lang attribute, which will be recognized by systems that understand XML:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
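Checking a whole site for missing language declarations by hand gets tedious, so here is a minimal Python sketch of an automated check. LangFinder and declared_language are hypothetical names chosen for this example; it only looks at the first <html> tag it meets and reports whatever lang and xml:lang values it finds there.

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Remember the lang and xml:lang attributes of the first <html> tag."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.lang = None
        self.xml_lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            attributes = dict(attrs)
            self.lang = attributes.get("lang")
            self.xml_lang = attributes.get("xml:lang")


def declared_language(html):
    """Return (lang, xml_lang); either value is None when it is not declared."""
    finder = LangFinder()
    finder.feed(html)
    return finder.lang, finder.xml_lang


html_page = '<html lang="en">'
xhtml_page = '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'

print(declared_language(html_page))
print(declared_language(xhtml_page))
print(declared_language("<html>"))
```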

Declare Your Character Encoding

This is another axe I have to grind with many developers who should know better. The Tech Times #134 focused entirely on character encodings, and what every web developer should know about them. SitePoint later published Tommy Olsson’s article, The Definitive Guide to Web Character Encoding.

In short, an encoding describes how the binary bytes that your web server sends to the browser may be translated into the characters of text that make up your HTML code. If you don’t specify an encoding, the browser has to guess.

Depending on how your code editor is configured, chances are that you are encoding your HTML in plain Latin 1 (ISO-8859-1), in the extended version of Latin 1 called Windows-1252, or in the Unicode encoding UTF-8. UTF-8 lets you include the widest range of characters in your code, but current browsers will assume your code is Windows-1252 unless you tell them otherwise.
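To see why the guess matters, here is a small Python illustration of what happens when bytes written as UTF-8 are read as Windows-1252. The word "café" is just an example chosen for this sketch, and decode_both is a made-up helper name.

```python
def decode_both(data):
    """Decode the same bytes as UTF-8 and as Windows-1252 (cp1252)."""
    return data.decode("utf-8"), data.decode("cp1252")


text = "café"
utf8_bytes = text.encode("utf-8")      # é becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # é is the single byte 0xE9

as_utf8, as_windows_1252 = decode_both(utf8_bytes)

# as_utf8 is "café" again, but as_windows_1252 is "cafÃ©": each of the
# two bytes that made up é has been turned into a character of its own.
```

The bytes sent over the wire are identical in both cases; only the encoding the reader assumes is different.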

To encode your HTML pages in UTF-8, first make sure your text editor (along with the text editors of everyone working on your site) is set to default to UTF-8 encoding. If you’re using a simple editor like Notepad, you may have to tell it to save with UTF-8 encoding every time you create a new file—one more reason to avoid Notepad.

Once you’ve done that, make sure to include a <meta> tag in the <head> of all your HTML documents that declares the page as being encoded using UTF-8:

<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

The earlier this tag appears in the document, the less time the browser will have to waste guessing the document’s encoding, so it should really be the first thing to follow your opening <head> tag.

Finally, test one of these pages on your site to make sure that browsers recognize it as being encoded with UTF-8. In Firefox, you can right-click on the background of the page and choose Page Info. The Encoding should be shown on the first tab of the Page Info window. If it isn’t UTF-8, talk to your server admin to see about either getting the correct encoding declared in your site’s HTTP Response headers, or simply removing the encoding from these headers so that the <meta> tag can do its thing.
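The relationship between the HTTP header and the <meta> tag can be sketched in Python as follows. This is a simplified model, not a description of any particular browser: it assumes a charset in the Content-Type response header always wins, that the <meta> tag is consulted only when the header names no charset, and it ignores other signals such as a byte order mark. The function and class names are invented for the example.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type value, if any."""
    if not value:
        return None
    message = Message()
    message["Content-Type"] = value
    return message.get_content_charset()  # lower-cased name, or None


class MetaCharsetFinder(HTMLParser):
    """Find the charset declared by the first content-type <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            self.charset = charset_from_content_type(attributes.get("content"))


def effective_charset(header_value, html):
    """Return (charset, source) under the simplified rules described above."""
    header_charset = charset_from_content_type(header_value)
    if header_charset:
        return header_charset, "HTTP header"
    finder = MetaCharsetFinder()
    finder.feed(html)
    if finder.charset:
        return finder.charset, "meta tag"
    return None, "browser guess"


page = '<head><meta http-equiv="content-type" content="text/html; charset=UTF-8"/></head>'

# A server that declares ISO-8859-1 overrides the page's own declaration.
print(effective_charset("text/html; charset=ISO-8859-1", page))
# With no charset in the header, the <meta> tag can do its thing.
print(effective_charset("text/html", page))
```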

Any Suggestions?

Those are just four ways to freshen up your markup with the latest HTML techniques. If you can think of any others, be sure to leave a comment! I’d love to hear how you go the extra mile to make your HTML markup look sexy and new.

  • http://www.tyssendesign.com.au Tyssen

    Make sure all your labels have for attributes that match ids of their associated form elements.

  • http://www.cemerson.co.uk Stormrider

    “Make sure all your labels have for attributes that match ids of their associated form elements.”

    Absolutely. You see a lot of people trying to link it with the name attribute of the form element, which is wrong – form elements can share the same name in some cases, but have different labels.
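The label check described in these two comments can be sketched in Python. LabelChecker and unmatched_labels are hypothetical names for this illustration, and the sample form is invented; the sketch simply reports every label whose for value does not match any id in the markup.

```python
from html.parser import HTMLParser


class LabelChecker(HTMLParser):
    """Collect all ids and all label 'for' values found in a document."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.label_targets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        if tag == "label" and attributes.get("for"):
            self.label_targets.append(attributes["for"])


def unmatched_labels(html):
    """Return the 'for' values that do not match any id in the document."""
    checker = LabelChecker()
    checker.feed(html)
    return [target for target in checker.label_targets if target not in checker.ids]


# The second label points at the field's name rather than its id.
form = (
    '<label for="email">Email</label><input id="email" name="email">'
    '<label for="phone">Phone</label><input id="tel" name="phone">'
)
print(unmatched_labels(form))
```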

  • http://www.ltheobald.co.uk Leesy

    “should form a consistent hierarchy: one or more <h1> sections”

    I thought best practice was to use only one H1 tag. Am I wrong in thinking this?

  • RyanR

    “I thought best practice was to use only one H1 tag. Am I wrong in thinking this?”

    On the home page it’s usually best to have one H1 tag for the name of the site, as the home page is usually about the site in some way. On subsequent pages there may be more than one top level heading, usually I use H2 but I can see possibilities for H1.

  • http://www.eclecticdreams.com Matt_Machell

    Be careful with replacing named anchors with ids. Everybody’s favourite browser (IE) can have problems with keyboard-based navigation and links to ids within elements without hasLayout. Not a problem in many cases, but sometimes very annoying.

  • CraigB

    “Be careful with replacing named anchors with ids. Everybody’s favourite browser (IE) can have problems…”

    Although I’ve not checked this for a while, a variety of mobile browsers used to just fail completely on links to id-based anchors. Not such a good thing when “skip navigation” links are in play at the top of the page.

  • http://www.sitepoint.com/ Kevin Yank

    “Be careful with replacing named anchors with ids. Everybody’s favourite browser (IE) can have problems with keyboard-based navigation and links to ids within elements without hasLayout. Not a problem in many cases, but sometimes very annoying.”

    Interesting! Have you got a testcase you can link to? I assume you meant the problem could occur with elements that have a layout, right? The vast majority of elements do not have a layout.

  • http://www.eclecticdreams.com Matt_Machell

    There’s an in-depth look over at Bruce’s Blog.

  • jaevans

    Also, watch your case on id’s or names.

  • http://www.silklink.co.uk silklink

    Great article. I learnt a thing or two from it – thanks.

    I agree with the comments so far on the use of form labels, as this is very important for accessibility. Many forms have no label elements at all.

    If, like me, you use a scripting language, like PHP, output is usually echo’d (printed) from within the script. It is easy to forge ahead with the html output with little regard for how it will look when viewed in the browser with the ‘view source’. What’s needed are tabs and line breaks to improve the layout, complete with indentation.

    Although this is more work for the coder, it can save a great deal of time when debugging. Missing end tags and the like, which can take an age to find, become much easier to spot in a nicely laid-out source.

  • linux-mike

    Notepad, UTF-8 and PHP do not mix. Notepad is one of the (mainly Windows) apps that save a UTF-8 BOM at the start of every file. This is optional and quite legal but totally unnecessary. Even in the latest Windows Server 2008, if you save a file from Notepad as UTF-8 with “foo” as its only content, it is shown in properties as six bytes long.

    Most browsers will ignore a UTF-8 BOM at the start of a file, but will choke and display rubbish if they find one in the middle of the file.

    PHP Includes will add the whole file being included at the given location, including the BOM if it exists, throwing a BOM into the middle of your HTML markup.
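The six-byte file and the stray BOM can be reproduced with a short Python sketch. It is only an illustration of the byte-level problem, not PHP's include mechanism: strip_utf8_bom and join_includes are made-up helper names, and the "files" are just byte strings built in memory.

```python
import codecs


def strip_utf8_bom(data):
    """Return data without a leading UTF-8 byte order mark, if it has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def join_includes(*parts):
    """Concatenate file contents, dropping each part's leading BOM first."""
    return b"".join(strip_utf8_bom(part) for part in parts)


# What an editor that writes a BOM produces for a file containing "foo":
# three BOM bytes (EF BB BF) followed by the three bytes of the text.
notepad_style = codecs.BOM_UTF8 + b"foo"
print(len(notepad_style))  # 6

# Naively gluing two such files together leaves a BOM in the middle.
naive = notepad_style + notepad_style
print(codecs.BOM_UTF8 in naive[3:])  # True

# Stripping the BOM from each part before joining avoids the problem.
print(join_includes(notepad_style, notepad_style))  # b'foofoo'
```

In practice the simpler fix is to configure the editor to save UTF-8 without a BOM in the first place.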

  • http://dyersweb.com/ dyer85

    I know it’s not strictly HTML, but a major problem crippling readability and maintainability of your code can be throwing in JS in your markup. For example, using the attribute event handlers can become a real pain to maintain. It’s better to utilize the DOM in JS to deploy your scripting. It also makes it easier to gracefully handle when users don’t have JS activated.

  • bishfish

    Two things:

    “Resist the urge to skip heading levels (e.g. placing a <h5> after a <h1>) to indicate levels of ‘importance’ in your content.”
    Why? The H1 can be attention grabbing, the H5 brief explanation. We should not let the code get in the way of the message. Resist the urge at all costs to let the code get in the way of visual impact!

    This banging on about reading assist machines should be set against a reality check – how many readers who are visually impaired to the point of requiring machine assistance would be reading a site on web design? The web is littered with examples of where reading assistance is of minimal importance, as are the numbers of visually impaired people visiting them. As sites become more and more visual in content (e.g. video and image manipulation) the problems for Vis Imp will sadly only multiply – but the reality is this is no different to the problems Vis Imp face watching television.

  • Tim

    @bishfish, you’ve missed the entire point of this article if you think this is about making sitepoint.com usable by people who, as you point out, may not be using the site. This article is about SitePoint the company practising what they preach. How can you take them seriously if they harp on about accessibility, but don’t make their own site follow the same rules they preach as being best practice? Just because one site may not be used by a certain demographic doesn’t mean you should ignore them when making your architectural and design choices. What Kevin is advocating here is simply good practice, regardless of whether or not visually impaired people are viewing the site. The concept of visually impaired translates directly to the world of the semantic web and making your site machine readable. It is not just about the human users any more. Good structure is good structure. Don’t use the default styles that come with headings, for instance, to influence the choice of markup you use when creating the structure of your site.

  • http://dyersweb.com/ dyer85

    “Why? The H1 can be attention grabbing, the H5 brief explanation. We should not let the code get in the way of the message. Resist the urge at all costs to let the code get in the way of visual impact!”

    I understand what you’re saying, but you’re not limiting yourself by respecting the <h#> order. With CSS, you can style the headers however you please. Also, for brief explanations, what’s wrong with using <p>, <div>, or <span> with CSS for style?

  • Tim

    Kevin, I’d be interested to hear your take on the little smiley face image in the bottom left corner of this page. It seems to exist for the purpose of tracking wordpress stats, yet it holds no structural or design relevance to sitepoint.com. But it impacts, albeit in a very minor way, on the visual presentation. Shouldn’t trackers and counters etc never be seen or heard?

    Similarly when viewing the source of this page, there is a comment after the closing html tag like this “<!-- Dynamic Page Served (once) in 0.521 seconds -->“. Is this valid markup putting comments outside the closing root tag on the document?

  • bishfish

    Tim, I did not miss the point. I was merely trying to point out that any site, not just SitePoint, may spend too much time making their site Vis Imp safe, when the likely audience is very unlikely to have Vis Imp visitors.
    For instance, a photographer’s gallery, art gallery, video collection, fly fishing site; the list goes on and on.
    To advocate strict linear presentation of headings does not allow for visual creativity in many cases.
    “Rules are for the obedience of fools, merely guides for the wise” – Douglas Bader, WWII legless pilot.

  • http://www.sitepoint.com/ Kevin Yank

    Tim,

    I’m not able to see the smiley face you mention. What browser are you seeing it in?

    Comments following the closing </html> tag are perfectly legal.

  • http://www.sitepoint.com/ Kevin Yank

    bishfish,

    “To advocate strict linear presentation of headings does not allow for visual creativity in many cases.”

    No one is arguing against the creative presentation of headings—you have complete freedom to control presentation using CSS.

    Can you give an example of how keeping to a correct linear heading structure might hinder visual creativity?

    It’s true that not all sites can justify going out of their way to accommodate non-visual user agents, but why not do it when it’s possible to do so with little or no inconvenience (as I believe is the case with heading structure)?

  • quba

    I agree with Leesy. There should only be one <h1> tag on a page. sub-sections are then divided by <h2>’s etc.

  • bishfish

    Kevin,

    I try to minimise the number of styles, classes and divs in my websites. I strive to make updates quick and clean.
    So, to be fair, it would be unusual to go from an H1 tag to an H5 – but it is not uncommon to go from an H1 to an H4.
    For example an article H1 heading, “New Egg Patterns Lethal on Trout”, but you want to ‘kick’ a sub-head/rider use a H4 “But only use if you don’t want to go to heaven”, then into the content itself.
    Sure you could use a span, or write a class, but I try to work on the KISS principle as far as possible.
    This page might give a clue as to how to mix and match headings:
    http://www.bishfish.co.nz/webbooks/smttrout/runlies.htm

  • meerkat

    bishfish, if you are for real then I really suggest that you lift your game and enter the world of 21st century web design and build.

    There is simply no excuse for misusing HTML headings in order to create visual effect.

    Your comments are a depressing indictment of the fact that there are still people in the Web world who don’t understand the benefits of standards compliance, who don’t understand technically how to achieve it and who have a depressingly insular view of the people who are using the Web and how they are using it.

  • Dan

    “The H1 can be attention grabbing, the H5 brief explanation.”

    Markup is about structure and content… NOT presentation. CSS determines what something looks like; there is no reason why an h5 can’t have bigger text than an h1, or vice versa. What’s important is that the document has a logical structure. You don’t read a book by looking at the 1st page, then the 3rd page, then the 5th page, then the 2nd page; there has to be a correct order. Even if a browser’s default styling of different headings isn’t how you want them to look, that’s irrelevant: that’s what CSS is for.

  • gigi

    Many H1 ? Does your page have more than a title ? There should be only one h1, you don’t have two titles for a page.

  • sitehatchery

    That the for value for a label corresponds with the “id” of the form field is new to me. If you can only have one ID per page, what would you do for a label representing a group of fields – such as with radio buttons? You wouldn’t want to add the same ID to each radio button.

  • Tim

    Kevin, the smiley face is a 6px x 5px GIF in the far bottom left corner of the footer, visible in FF3 and IE7. I tried to put the src value in this post, but it was flagged as spam.

  • Richard Bone

    @sitehatchery – When you’re dealing with a group of related form elements such as radio buttons it’s a good idea to wrap them in a fieldset and use the fieldset’s legend as the label for the group of elements while giving each element its own id and corresponding label. You can find a more in-depth look at this kind of stuff in the recent SitePoint article, Fancy Form Design Using CSS. Page 6 deals with this exact problem, but it’s all a good read.

  • Stevie D

    I can think of examples of when it may make sense to skip heading levels.

    Let’s say that you have a very structured document/set of pages, with the same subsections being repeated in each section – it may be that some of those sections can’t be subdivided.

    For example – if I was looking at a breakdown of local government services in this area, there is the region, then within that there are the counties, and within those are the districts, and then the individual departments – each of those would have a hierarchical heading. But what do you do when you have a county that doesn’t have any districts within it.

    ie.

    h1 – Yorkshire & Humber
    h2 – North Yorkshire
    h3 – Harrogate District
    h4 – Education
    h4 – Transport
    h4 – Health
    h3 – Scarborough District
    h4 – Education
    h4 – Transport
    h4 – Health
    h2 – West Yorkshire
    h3 – Leeds
    h4 – Education
    h4 – Transport
    h4 – Health
    h3 – Bradford
    h4 – Education
    h4 – Transport
    h4 – Health
    h2 – York
    h4 – Education
    h4 – Transport
    h4 – Health

    There is a consistent structure to this setup, with the subheadings of Education, Transport and Health being given the same heading level throughout the document. This makes more sense to me than to have them jumping up and down the hierarchy depending on how many tiers of government there are above them – it makes them easier to find and easier to style.

    One solution would be to have a “dummy heading” at h3 to fill the gap, but that could confuse users if they only find one sub-division and are looking for others.

  • Sarah

    “Many H1 ? Does your page have more than a title ? There should be only one h1, you don’t have two titles for a page.”

    Back in high school, when learning how to write outlines, I remember being taught that if you have a I. then you need a II. And if you have an A. then you need a B.

    So if you have a
    I.
    A.
    a.
    b.
    B.

    An instructor would say you need to set it to
    I.
    A.
    B.
    II.

    Should a Title and an H1 be the same? Or, like a formal outline, are they 2 different parts of the layout? If the HTML headings were originally styled after the formal outline (and I think they were) then why do we diverge here?

  • Sarah

    “One solution would be to have a ‘dummy heading’ at h3 to fill the gap, but that could confuse users if they only find one sub-division and are looking for others.”

    What about using CSS to style them and have them set as
    use classes
    .region
    .counties
    .districts
    .departments

    Or flip it
    Region > Counties > Departments > Districts

    Or if what you are really sorting is departments then maybe that is the most important thing
    Departments > Region > Counties > Districts

    Flipping to a bit of DB organization, What are you indexing..What should you be indexing? How is the list going to be used and what would be more efficient?
    I think that’s one of the big points of the Formal Outline. It allows you to organize and reorganize your thoughts in the best possible way while still working with pencil and paper. If it doesn’t fit, reexamine it and make sure you are right before you find yourself committed with 500 lines of code written.

  • perreault

    I don’t understand the comments from users who say there can only be one level on a page. Why not? I would consider the tag to be the first element, then the levels would be the next, etc.

    I also agree with the user listing the various country sub-divisions and using that as a logical reason to skip heading levels on a particular page. Going straight from to to on a page that stands alone is fine. But if you have a single style sheet that is applied across dozens of pages, it is far easier to be consistent in a tag of always defining a county (and therefore always having the same style applied) across all of the pages than to say, OK, on this page we have a county for xyz country, but no regions, so on this page the county will have an tag (which of course makes it look differently).

    I don’t like add a whole bunch of additional markup just to make the two LOOK the same when in fact a county on one page SHOULD BE a county on another page — whether visually impaired or not.

  • perreault

    Sorry… not used to this forum’s requirements for tags… This should read

    I don’t understand the comments from users who say there can only be one h1 level on a page. Why not? I would consider the title tag to be the first element, then the h1 levels would be the next, and then h2, etc.

    I also agree with the user listing the various country sub-divisions and using that as a logical reason to skip heading levels on a particular page. Going straight from h1 to h2 to h3 on a page that stands alone is fine. But if you have a single style sheet that is applied across dozens of pages, it is far easier to be consistent in a tag of h3 always defining a county (and therefore always having the same style applied) across all of the pages than to say, OK, on this page we have a county for xyz country, but no regions, so on this page the county will have an h2 tag (which of course makes it look differently).

    I don’t like adding a whole bunch of additional markup just to make the two LOOK the same when in fact a county on one page SHOULD BE a county on another page, whether visually impaired or not.

  • KenA

    I believe that in the HTML+CSS world the great issue is that it’s a combination of related tasks we as developers need to accomplish, and normally it leads to a “fruit salad” as a result.

    There’s the technical aspect of coding HTML, then there’s the semantic HTML part, and then we need to tie it all to the visual part via CSS, which by the way has its own intrinsic issues.

    Not sure if I made myself clear here, but what I am trying to point out is that HTML development can be very personal, and what appears to be right to one can look very wrong to another.

    Some rules are very clear, like Declare Your Language or Declare Your Character Encoding, but some, like using just one H1 per page, are very particular/personal rules.

    Maybe the best thing to do is to pre-establish some rules with your development team. This is not just for HTML, but for the whole development cycle too. IMHO the most important thing is creating a solid development architecture, establishing the bases and only then starting to develop. It’s hard work, since web development is a combination of client side and server side.

    Well, that’s it … hope it helps …

  • Roy

    @bishfish I don’t want to pick on you, but the Bish on Fish site you linked to is a terrible example of how to markup pages. Valid, semantic markup does not interfere with visual presentation. Off the top of my head I’d suggest reading Eric Meyer’s ‘CSS Web Site Design Hands on Training’ book. It will open your eyes as to how you can clean up your code, lean down the markup and improve your work.

  • http://autisticcuckoo.net/ AutisticCuckoo

    “The easiest way to check your heading structure nowadays is to use the Web Developer extension in Firefox.”

    Actually, using the ‘Table of Contents’ user style sheet in Opera 9.5 is even easier – you don’t even have to download and install an extension. ;)

    “Why? The H1 can be attention grabbing, the H5 brief explanation. We should not let the code get in the way of the message. Resist the urge at all costs to let the code get in the way of visual impact!”

    It appears you haven’t fully grasped the purpose of HTML. It’s about marking up semantics and structure. It has nothing whatsoever to do with presentation. For visual ‘impact’, use CSS. That’s its purpose, after all.

    “I was merely trying to point out that any site, not just SitePoint, may spend too much time making their site Vis Imp safe, when the likely audience is very unlikely to have Vis Imp visitors.”

    Search engine spiders are effectively visually impaired visitors. Even if you’re greedy and callous and couldn’t care less about people with disabilities, you probably want to be nice to the ‘bots.

  • http://www.silklink.co.uk silklink

    4 ways to spruce up your mark-up.

    Sorry, needed to repeat the H1 title to remind me what this is all about :-)

    The header structure is there, along with paragraph, block quotes, emphasis and text/font attributes to build structure into your ‘document’, just like a word processor.

    There is no such rule for having one or more H1 headings, but there is, as someone has already pointed out, a need to structure your headings so that they would, for example, make sense in a table of contents. You might make a rule to have one document title with a structure of headings below it. Think of it like a report structure.

    An H1 heading might give more emphasis to search engines when, for example, the words are wrapped in anchor tags, which are referenced from a TOC at the head of the page, like this…

    Sporting Today

    Being wrapped in an anchor, the words have some sort of importance to Google PageRank when compared to no anchors, so I have read. However, this is something I tend to do for larger documents to make it easier for viewers to navigate around them – just the same as I do in my word processor for lengthy reports.

    So for me, sprucing up the HTML is all about enabling people to view my sites’ content and move around the document with ease.

  • bishfish

    Roy said,
    “@bishfish I don’t want to pick on you, but the Bish on Fish site you linked to is a terrible example of how to markup pages”
    Actually I don’t mind being ‘picked on’. If you don’t learn something every day you better pinch yourself, you could be dead.
    I started the Bishfish site in 1995, and it has grown like topsy ever since. Two years ago it was still table based! But despite the poor mark-up it keeps growing in visitor numbers, and has become a very prominent site in its field. Ultimately one thing is true about web site design – content is more important than the mark-up. (Of course bad mark-up can make even great content hard to digest.)
    But slowly and surely I am trying to update the mark-up.
    This whole discussion has revealed a great deal to me, so the process will go on.

  • Bernhard

    If you list important parts from top to bottom then you should really start with:
    Declare Your Doctype!
    Too many docs on the web still begin with <html> and leave it to the browser to guess what’s being served up.