This is known as the doctype (short for Document Type Declaration). It must be the first item on a web page, appearing even before any spacing or carriage returns.
Have you ever taken a document you wrote in Microsoft Word 2007 on one computer, and tried to open it on another computer that only had Word 2000 on it? Frustratingly, without some preemptive massaging when the file is saved in the first place, this doesn’t work quite as expected. It fails because Word 2007 includes features that Bill Gates and his team hadn’t even dreamed of in 2000, and so Microsoft needed to create a new version of its file format to cater for these new features. Just as Microsoft has many different versions of Word, so too are there different versions of HTML, most recently XHTML and HTML5. Mercifully, the different versions of HTML have been designed so that they don’t suffer the same incompatibility gremlins as Word, but it’s still important to identify the version of HTML that you’re using. This is where the doctype comes in. The doctype’s job is to specify which version of HTML the browser should expect to see. The browser uses this information to decide how it should render items on the screen.
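For reference, the XHTML 1.0 Strict doctype is conventionally written as in the sketch here. It assumes the standard W3C form: the first quoted string is the public identifier that names the HTML version, and the second is a URL on the W3C’s site.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```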
The doctype above states that we’re using XHTML 1.0 Strict, and includes a URL to which the browser can refer: this URL points to the W3C’s specification for XHTML 1.0 Strict. Got all that? Okay, let’s take a jargon break! There are way too many abbreviations for this paragraph.
Note: HTML5 — The New Kid on the Block
Note that at the time of writing, the HTML5 specification is not yet finalized and browser support for it is also incomplete (understandable, given the moving goalposts). For this reason, we’ll not be covering HTML5 in this book. You should, however, be aware of its existence at the very least.
Note: Jargon Busting
URL stands for Uniform Resource Locator. It’s what some (admittedly more geeky) people refer to when they talk about a web site’s address. URL is definitely a useful term to know, though, because it’s becoming more and more common.
W3C is an abbreviation of the name World Wide Web Consortium, a group of smart people spread across the globe who, collectively, come up with proposals for the ways in which computing and markup languages used on the Web should be written. The W3C defines the rules, suggests usage, then publishes the agreed documentation for reference by interested parties, be they web site creators like yourself (once you’re done with this book, that is), or software developers who are building the programs that need to understand these languages (such as browsers or authoring software).
The W3C documents are the starting point, and indeed everything in this book is based on the original documents. But you won’t want to look at any W3C documents for a long time yet. They’re just plain scary for us mere mortals without Computer Science degrees. Just stick with this book for the time being and I’ll guide you through.
The html Element
So, the doctype has told the browser to expect a certain version of HTML. What comes next? Some HTML!
An XHTML document is built using elements. Remember, elements are the bricks that create the structures that hold a web page together. But what exactly is an element? What does an element look like, and what is its purpose?
- An XHTML element starts and ends with tags — the opening tag and the closing tag.
- A tag consists of an opening angled bracket (<), some text, and a closing bracket (>).
- Inside a tag, there is a tag name; there may also be one or more attributes.
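The three points above can be sketched with a single annotated element. The element, attribute, and text used here are hypothetical placeholders chosen only for illustration; they are not taken from the example page.

```html
<!-- Opening tag: angle bracket, tag name (p), one attribute (class), angle bracket -->
<p class="intro">
  This text is the content held between the two tags.
<!-- Closing tag: the same tag name, preceded by a forward slash -->
</p>
```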
Let’s take a look at the first element in the page: the html element. The figure below shows what we have.
The figure below depicts the opening tag, which marks the start of the element:
Below this we see the closing tag, which marks its end (and occurs right at the end of the document):
Here’s that line again, with the tag name in bold:
And there is one attribute in the opening tag:
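In markup, the html element described above typically looks like the following sketch. It assumes the standard XHTML namespace value for the xmlns attribute; the comment in the middle stands in for the rest of the page.

```html
<!-- Opening tag: tag name "html", plus one attribute (xmlns) and its value -->
<html xmlns="http://www.w3.org/1999/xhtml">

  <!-- the rest of the page goes here -->

<!-- Closing tag: right at the end of the document -->
</html>
```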
Note: What’s an Attribute?
HTML elements can have a range of different attributes; the available attributes vary depending on which element you’re dealing with. Each attribute is made up of a name and a value, and these are always written as name="value". Some attributes are optional while others are compulsory, but together they give the browser important information that the element wouldn’t offer otherwise. For example, the image element (which we’ll learn about soon) has a compulsory “image source” attribute, the value of which gives the filename of the image. Attributes appear only in the opening tag of any given element. We’ll see more attributes crop up as we work our way through this project, and, at least initially, I’ll be making sure to point them out so that you’re familiar with them.
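As a small illustration of the name="value" pattern, here is how an image element might carry its attributes. The filename and description are hypothetical; src is the “image source” attribute mentioned above, and alt (a text description, which XHTML 1.0 Strict also requires) is included for completeness.

```html
<!-- src and alt are attribute names; the quoted strings are their values -->
<img src="photo.jpg" alt="A sample photo" />
```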
Back to the purpose of the html element. This is the outermost “container” of our web page; everything else (apart from the doctype) is kept within that outer container. Let’s peel off that outer layer and take a peek at the contents inside.
There are two major sections inside the html element: the head and the body. It’s not going to be difficult to remember the order in which those items should appear, unless you happen to enjoy doing headstands.
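The nesting described above can be sketched as a bare skeleton. This is an outline only, assuming the standard XHTML namespace on the html element; the comments stand in for real content.

```html
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- information about the page -->
  </head>
  <body>
    <!-- everything that is displayed on the page -->
  </body>
</html>
```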
The head Element
The head element contains information about the page, but no information that will be displayed on the page itself. For example, it contains the title element, which tells the browser what to display in its title bar (the title bar is the very top part of the browser window — the part with minimize, maximize and close buttons):
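A minimal sketch of such a head section, assuming it holds only the placeholder title discussed in the next section:

```html
<head>
  <!-- shown in the browser's title bar, not in the page itself -->
  <title>Untitled Document</title>
</head>
```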
The title Element
The opening <title> and closing </title> tags are wrapped around the words “Untitled Document” in the markup above. Note that the <title> signifies the start, while the closing </title> signifies the end of the title. That’s how closing tags work: they have forward slashes just after the first < angle bracket.
The Untitled Document title is typical of what HTML authoring software provides as a starting point when you choose to create a new web page; it’s up to you to change those words. As the figure below shows, it really pays to put something meaningful as a title, and not just for the sake of those people who visit our web page.
The content of the title element is also used for a number of other purposes:
- It’s the name that appears in the Windows Taskbar (the strip along the bottom of your Windows desktop that shows all the currently open windows) for any open document. It also appears in the dock on a Mac. When you have a few windows open, you’ll appreciate those people who have made an effort to enter a descriptive title!
- If users decide to add the page to their bookmarks (or favorites), the title will be used to name the bookmark.
- Your title element is used heavily by search engines to ascertain what your page contains, and what information about it should be displayed in the search results. Just for fun, and to see how many people forget to type in a useful title, try searching for the phrase Untitled Document in the search engine of your choice.
Inside the head element in our simple example, we can see a meta element, which is shown in bold below:
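A sketch of how that head section might look, assuming the common XHTML form of a character-set meta element (the exact attribute values in the original example may differ):

```html
<head>
  <title>Untitled Document</title>
  <!-- the meta element: tells the browser that the page uses UTF-8 -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
```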
meta elements can be used in a web page for many different reasons. Some are used to provide the browser or search engines with additional information that’s not displayed on screen; for instance, the name of the page’s author or a copyright notice might be included in meta elements. In the example above, the meta tag tells the browser which character set to use (specifically, UTF-8, which includes the characters needed for web pages in just about any written language).
There are many different uses for meta elements, but most of them will make no discernible difference to the way your page looks, and as such, won’t be of much interest to you (at least at this stage).
The meta element is an example of a self-closing element (or an empty element). Unlike the title element, the meta element needn’t contain anything, so we could write it as follows:
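Written out in full, with an opening tag, a closing tag, and nothing in between, it might look like this sketch (the attribute values assume the common character-set form and are not confirmed by the text above):

```html
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
```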
XHTML contains a number of empty elements, and the boffins who put together XHTML decided that writing all those closing tags would get annoying pretty quickly, so they decided to use self-closing tags: tags that end with />. So our meta example becomes:
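In self-closing form, again assuming the common character-set attribute values, the element collapses into a single tag that ends with />:

```html
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
```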
The Memory Game: Remembering Difficult Markup
If you’re thinking that the doctype and meta elements are difficult to remember, and you’re wondering how on earth people commit them to memory, don’t worry: most people don’t. Even the most hardened and world-weary coders would have difficulty remembering these elements exactly, so most do the same thing: they copy from a source they know to be correct (most likely from their last project or piece of work). You’ll probably do the same as you work with project files for this book.
Fully-fledged web development programs, such as Dreamweaver, will normally take care of these difficult parts of coding. But if you are using a humble text editor and need some help, you need only remember that there is a completely searchable HTML reference, accessible at any time at SitePoint.com.
Finally, we get to the place where it all happens. The body element of the page contains almost everything that you see on the screen: headings, paragraphs, images, any navigation that’s required, and footers that sit at the bottom of the web page:
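A minimal sketch of a body element follows. The heading and paragraph text are hypothetical placeholders, with h1 used for a heading and p for a paragraph.

```html
<body>
  <!-- a heading, then a paragraph; both appear on screen -->
  <h1>A Sample Heading</h1>
  <p>A sample paragraph of text.</p>
</body>
```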