Understanding Browser Performance
What is performance?
The first is load time. From the time that a user clicks a link or enters a URL into the address bar, how long does it take to load that page? Even this has multiple definitions as there’s one hand where it’s how long it takes to completely load the page and on the other hand it’s the user’s perception of how long before the page does something useful or is responsive.
For example, you could load the main content and make the page available for interaction while you programmatically go back to the server and grab the rest of the content.
The second performance metric is more complicated. This one is about how fast animations and interactions on your site perform in the browser.
Web Runtime Architecture
In both cases, it takes a great understanding of the web runtime architecture to tune either of these cases. I’m deliberately saying web runtime architecture rather than the IE runtime architecture because all of the browsers have roughly the same architecture when talking about it at this level because they are all implementing the same standards. They may have different names for their subsystems or have optimized one over the other but in general this is the flow of every browser.
Roughly, there are three categories of things to think about, those that affect network, those that affect the CPU and those that affect the GPU.
The first thing that happens is networking. Networking starts when the URL is first entered. For now, I’ll be talking about starting with an HTML page but the reality is that networking handles whatever type of resource that’s referenced by the page including XML, JSON, IMG, PDF, etc. and different subsystems come into play for various resources.
It’s actually up from where the spec started which was two simultaneous connections! The reason for this limitation is to help a server detect a Denial of Service Attack (DOS or DDOS), where the client opens up as many connections as possible and takes as much of the bandwidth from the server as they possibly can. The good news about the six limit is that it allows the browser to go get more content at the same time.
I’m over-simplifying the job of the parsers, and just covering those could be an entire separate article. The important part about the conversation at the moment is that writing well-formed documents — HTML, XML, XSLT, JSON… — can improve your load time.
Internal Data Structures
Every time something touches the DOM, it affects everything downstream from there. Why that’s a problem will become clearer in just a bit.
$(“path statement”) syntax is incredibly powerful as a selector for the DOM. The
$.getJSON() method is an incredibly simple to use and fast method of making AJAX requests.
Styling – Formatting and Layout
The next step, styling, actually covers two subsystems.
The first styling subsystem is formatting. Formatting is important because the DOM tree is completely ignorant to anything visual. All it knows is the parent child relationships between the elements and the attributes. It’s the CSS cascade that knows all of that information. During formatting this information is joined up giving the DOM elements size, color, backgrounds, font sizes and so on.
At the end of this subsystem, the individual elements each know what they look like.
The next step, now that all of the DOM elements know what they look like, is to figure out how they look together on the page – this is the layout process.
CSS is inherently a block based layout. This means that everything; images, paragraphs, divs, spans, event shapes such as circles, are actually blocks to CSS. The big question is how big of a block. And HTML/CSS is inherently a flow based layout unless something overrides that. This means that the content, by default, will go to the next available top right-hand corner in which it fits.
So if your screen is 800px wide, and you have four elements that are 300px wide each, you’ll fit two across the top and the third one will go on the next row of items. The CSS Cascade can override this however and position items either in a relational or absolute manner. If it’s relational, its position is dictated to be a certain distance from the previous item that was laid out on the screen.
If it’s absolute, it ignores all of the rules and is laid out a certain distance from the uppermost right hand corner of the page. In this case, it’s likely that it will even be positioned over another item on the screen.
As such, the primary job of the layout engine is to put all of the blocks on the screen. This includes positioning objects based on their relative or absolute positioning, wrapping or scaling things that are too wide and all of the other things that go into that lightning round of Tetris that is required.
At the end of the layout phase, the browser has an internal structure called the display tree. This is an interesting data structure. Some folks ask why browsers don’t just use the DOM tree but it’s not a 1-1 relationship between the items in the DOM tree and the items in the display tree. While this may take a moment to get your head around, it makes sense.
There are things that are in the DOM tree that are not in the display tree such as elements that are display:none or input fields that are type=hidden. There are also things that are in the display tree that are not in the DOM tree such as the numbers on an ordered list (
Paint and Compositing
At this point, the styling is done and the display tree is ready to paint. There’s a careful choice of words there in that the browser is ready to paint but is not painting just yet. What’s next requires the hardware to tell the browser the next time to do a paint to the screen. Monitors can only paint so many times a second. Most modern day ones do that 60 times a second, thus the 60 hertz refresh rate.
This means that it’s 1000/60 (milliseconds divided by the refresh rate) on most monitors between refreshes which works out to 16.67 milliseconds (roughly). That doesn’t sound like a lot of time but in reality a modern day I7 processor can run roughly 2138333333 instructions (Millions of instructions per second as noted in Wikipedia on Instructions per Second/60). Between me and you, that’s a lot of instructions.
It’s not infinite but it’s a lot. It means that the computer can do roughly 35638889 instructions between monitor refreshes.
When the monitor is ready to paint, it sends the hardware interrupt which initiates the paint cycle. At this point, the browser paints the layers of the page to the surface which for IE is DirectX surfaces which are composited and displayed on the monitor for your user’s enjoyment by the Desktop Windows Manager (DWM).
At this point, the browser has shown something on the screen so this is load time. However, there is no animation yet or any interaction.
Hopefully at this point you can see the issue there. Touching the DOM tree means that the display tree is no longer an accurate representation of the DOM. This means that the browser needs to redo the layout and the formatting and get ready to re-paint.
This is what I meant earlier when I said that describing the DOM as being “slow” is complicated. It’s not that manipulating the DOM is particularly slow by itself. What’s slow is that touching it creates the domino effect of forcing the other subsystems to trigger.
Recap of the Web Runtime Architecture
Understanding what to do to get the best performance reduce the impact of those decisions is what makes the difference between a good and a truly great web developer.