The big question...'How does the browser render a web page?' Before we start, let's quickly answer a few sub-questions:

  • Is this important?
  • Will this help my day to day development?
  • Where can I learn more about this?

Is this important?

Yes, this is very important. Understanding how the browser renders a web page or application leads to better software design decisions, particularly on the Front End.

Will this help my day to day development?

Yes, it will. As mentioned, it leads to better software design decisions, and it also explains why certain industry best practices exist, particularly those relating to performance, code quality and getting the best out of the browser.

Where can I learn more about this?

Links to external resources will be provided throughout the article series.

Constructing the DOM Tree

The DOM tree...'DOM' being quite an interesting abbreviation for 'Document Object Model'. As I'm sure you're aware, the DOM is an application programming interface (API) that allows developers to interact with the underlying document nodes in the browser. Behind that API, every node is implemented natively by the browser itself. To see what this looks like, consider the following C++ code taken from Blink, the rendering engine used by Google's Chromium browser:

#include "core/html/HTMLParagraphElement.h"

#include "core/CSSPropertyNames.h"
#include "core/CSSValueKeywords.h"
#include "core/HTMLNames.h"

namespace blink {

using namespace HTMLNames;

inline HTMLParagraphElement::HTMLParagraphElement(Document& document)
    : HTMLElement(pTag, document)
{
}

DEFINE_NODE_FACTORY(HTMLParagraphElement)

void HTMLParagraphElement::collectStyleForPresentationAttribute(const QualifiedName& name, const AtomicString& value, MutableStylePropertySet* style)
{
    if (name == alignAttr) {
        if (equalIgnoringCase(value, "middle") || equalIgnoringCase(value, "center"))
            addPropertyToPresentationAttributeStyle(style, CSSPropertyTextAlign, CSSValueWebkitCenter);
        else if (equalIgnoringCase(value, "left"))
            addPropertyToPresentationAttributeStyle(style, CSSPropertyTextAlign, CSSValueWebkitLeft);
        else if (equalIgnoringCase(value, "right"))
            addPropertyToPresentationAttributeStyle(style, CSSPropertyTextAlign, CSSValueWebkitRight);
        else
            addPropertyToPresentationAttributeStyle(style, CSSPropertyTextAlign, value);
    } else {
        HTMLElement::collectStyleForPresentationAttribute(name, value, style);
    }
}

} // namespace blink

This is how a <p> element is represented inside Chromium: a native C++ object (HTMLParagraphElement) with its own properties and methods, in this case mapping the element's align attribute to the CSS text-align property. Constructing the DOM tree therefore means creating one of these objects for every element in the document and linking them together. To do that, the browser first needs the page's HTML, which arrives from the network as raw bytes. The HTML5 specification describes how these bytes are processed. Before they can be tokenized, they must first be decoded into characters using the document's character encoding (such as UTF-8), and only then do the characters go through a process called 'tokenization'.

To understand tokenization, it helps to think about what the browser needs in order to construct the DOM tree. Tokens are groups of characters that act as a template for the DOM tree: for example, the characters <input> become a start-tag token, written here as [input]. The browser uses these tokens to decide which DOM elements to create and how they relate to each other. If the bytes for a <p> element come in from the network, the tokenizer turns them into a [p] start-tag token (and, for the closing </p>, a [/p] end-tag token), which is processed along with all the other tokens. Next, these tokens are turned into nodes (I know, it's quite an extensive process, isn't it!), and finally the nodes are used to build the DOM tree. For example, the following tokens:

[div][span][p][/p][img/][/span][ul][li][/li][/ul][/div]

will be used to eventually create the following DOM tree:

       [div]
       /   \
      /     \
   [span]   [ul]
    /  \       \
   /    \       \
 [p]    [img]  [li]

Once the DOM tree is created, the browser has a skeleton it can use to compute the position and size of each DOM node on the page. This process is known as reflow (or layout), and it happens before the elements are painted onto the screen. For reflow to begin, however, the website's CSS must also be converted into a structure very similar to the DOM tree: the CSSOM tree, short for 'CSS Object Model'.

So, to recap: when constructing the DOM, the following sequence occurs (each browser implements it according to the HTML5 specification):

  1. Browser sends HTTP request for page
  2. Browser receives response from web server
  3. Browser decodes the response data (bytes) into characters, then converts the characters into tokens via tokenization
  4. Browser turns tokens into nodes
  5. Browser turns nodes into the DOM tree
  6. DOM tree construction finished, awaiting CSSOM tree construction

It's important to understand that this sequence is not exhaustive: the actual DOM construction process involves many more low-level steps. If you're interested in finding out more, I recommend reading the HTML5 specification. Although it's quite a lengthy document, you can jump straight to the sections you want, such as tokenization and tree construction. In addition, why not download the source code for Mozilla Firefox and/or Chromium (the open-source project Google Chrome is built on)? You can explore the codebase for yourself!

Next article...Constructing the CSSOM Tree