Code Safari: HAML, Compiling to Completion

Haml: Week 2

Last week we started working through the implementation of HAML, a popular templating language for HTML. We got through the parsing step, leaving the compile step to cover this week. If you haven’t read the previous article, I encourage you to do so now. This article will make a lot more sense that way. For reference, here is the example template we have been using:

%article
  %h1 My great article
  %p
    Here is the text of
    my article

And here is the particular line we left off at:

# lib/haml/engine.rb:124
compile(parse)

Compiling

We found this line last time and investigated the parse method. Let’s jump straight into the other side: compile. Searching the project for def compile we find this method in the expected module Haml::Compiler.

# lib/haml/compiler.rb:444
def compile(node)
  parent, @node = @node, node
  block = proc {node.children.each {|c| compile c}}
  send("compile_#{node.type}", &(block unless node.children.empty?))
ensure
  @node = parent
end

This is a neat little function that compiles a node, providing a block to compile all of the node’s children that can be yielded to as required. So compiling a tag with nested children, the compiler can emit the opening tag, yield to compile (and emit) the children, then after the yield completes add the closing tag.

The interesting call is the send on the third line. This calls a different method to compile each type of node. It’s a cute way of implementing the visitor pattern in Ruby that allows bypassing the traditional accept call of the algorithm. Rather than creating a class heirachy for each type of node that knows how to compile itself, HAML keeps all the compiler logic in the one place. This allows a conceptual grouping of like code, and also allows the compiler to easily maintain state (instance variables) without having to expose them outside of the class.

Let’s look at the compile methods for the two types of nodes we have in our example: tag and plain.

There is quite a lot going on in the compile_tag method, so we’ll need to apply our skills to filter it down to the essentials. There are a lot of conditionals to deal with different types of formatting: are we stripping whitespace? Are we outputting HTML5 or XHTML? Are we in ugly mode? These aren’t important for us to understand at the moment as they are not really essential to the algorithm.

# lib/haml/compiler.rb:91
# Heavily edited
def compile_tag
  t = @node.value

  object_ref = t[:object_ref]
  value = t[:value]

  tag_closed = !block_given?

  open_tag = prerender_tag(t[:name], t[:self_closing], t[:attributes])

  if tag_closed && value
    open_tag << "#{value}</#{t[:name]}>" # Point A
  end

  push_merged_text(open_tag, 0, true)
  return if tag_closed

  if value.nil?
    # Point B
    @output_tabs += 1 unless t[:nuke_inner_whitespace]
    yield if block_given?
    @output_tabs -= 1 unless t[:nuke_inner_whitespace]

    push_merged_text("</#{t[:name]}>", 0, true)
  end
end

I’ve stripped the preceding listing significantly to highlight just two code paths. Compare to the original source to get an idea of how I have “extracted” the essence of the method by deleting extraneous code. This is a hard skill, but becomes easier with practice.

With the new skeleton version, we can see two main branches: one for when a value is provided with the node (Point A), and one for when the value is nested beneath the tag (Point B). These correspond in order to the h1 and p tags in our example template. The yield just below point B will run the proc defined at line 2 of the compile method (listing above), which will in turn compile any child nodes (our text nodes for the p). push_merged_text pretty much just outputs the text to the buffer, with a bit of extra logic for things like ensuring that the appropriate indentation is preserved.

It’s quite a bit to take in, but the logic will become clearer below when we try to apply this concept to our own HAML interpreter.

After the behemoth compile_tag method, compile_plain is refreshingly short:

def compile_plain
  push_text @node.value[:text]
end

Yep, not much to it. It does help illustrate though the value of creating a parse tree before trying to emit any output — the complexity of parsing can be very different to the complexity of compiling, so it helps to separate out those concerns.

Building our own

Last week we built the beginnings of our own HAML interpreter. We got to the point of being able to create a tree of parse nodes, and this week we need to compile those nodes. For reference, here is the full listing of where we got to last time:

class HamlParser
  class Node < Struct.new(:type, :data)
    attr_accessor :children
    attr_accessor :parent

    def initialize(*args)
      super
      self.children = []
    end
  end

  def initialize(string)
    @string = string
  end

  def parse
    @root = Node.new(:root, {})
    @parent = @root
    @depth = 0
    @string.lines.each do |line|
      process_indent(line)

      push parse_line(line.strip)
    end
    @root
  end

  def process_indent(line)
    indent = line[/^s+/].to_s.length / 2
    if indent > @depth
      @parent = @parent.children.last
      @depth = indent
    end
  end

  def push(node)
    @parent.children << node
    node.parent = @parent
  end

  def parse_line(line)
    case line[0]
    when ?%
      name, value = line[1..-1].split(' ')
      Node.new(:tag, :name => name, :value => value)
    else
      Node.new(:plain, :value => line)
    end
  end
end

We won’t be changing that code, rather we will be adding a new HamlCompiler class to convert from the parse tree we have generated into HTML. First off, we will just implement the compile_plain method to convert plain text input into plain text output. As per last week, we are using test/unit to ensure we get things right. Run this script and you will see the tests being run automatically.

require 'test/unit'

class HamlCompilerTest < Test::Unit::TestCase
  def compile(input)
    HamlCompiler.new.compile(HamlParser.new(input).parse)
  end

  def test_compile_one_line_plain
    assert_equal 'hello', compile("hello")
  end
end

class HamlCompiler
  def compile(node)
    block = if node.children.any?
      proc { node.children.each {|x| compile(x) }}
    end
    send("compile_#{node.type}", node, &amp;block)
    @output
  end

  def compile_root(node)
    @output = ""
    yield
  end

  def compile_plain(node)
    @output << node.data[:value]
  end
end

Note we are taking advantage of using the visitor pattern by storing state in the @output variable. This means we don’t have to pass it through as arguments to all the compile methods, which we would have had to do if they were methods on ParseNode. As more state is needed (such as keeping track of indentation) this approach really starts to pay dividends. You can also see how always starting with a root node allows for a really simple algorithm that doesn’t need any special cases to get started, and even provides extra benefits such as giving us a good place to initialize the output.

Let’s add a compile_tag method so that we can completely interpret our example template. Hopefully it will be somewhat simpler than the one HAML uses!

class HamlCompilerTest < Test::Unit::TestCase
  # ... as above

  def test_compile_one_line_tag_with_value
    assert_equal '<em>hello there</em>', compile("%em hello there")
  end

  def test_compile_tag_with_nested_value
    assert_equal '<em>hello there</em>', compile("%em

hello there”)
end
end

class HamlCompiler
  # ... as above

  def compile_tag(node)
    tag_name = node.data[:name]

    @output << "<#{tag_name}>"
    if block_given?
      yield
    else
      @output << node.data[:value].to_s
    end
    @output << "</#{tag_name}>"
  end
end

Due to the simpler nature of our interpreter, we are able to get away with a simple conditional to cope with both one line and tags with nested text. And because of the recursive nature of the algorithm, it also deals with arbitrary nested nodes, such as the h1 and p tags inside article, with no special handling!

Wrapping it up

Over the course of this two-part series, we have dived into the code that drives the HAML templating engine, and discovered that is uses a two-pass method to render templates. On the first pass, it creates an abstract tree of parse nodes representing the document, and on the second it converts those nodes into HTML. This separation of responsiblities allows for a neat architecture.

In the process we took what we learned and created our own mini-HAML interpreter that was able to parse and compile simple documents.

For bonus points, here are some extra tasks you could tackle:

  • Add support for HTML attributes to our interpreter.
  • Our interpreter does not emit any significant whitespace. Fix it to nicely indent nested tags.
  • ARel, a relational algebra for Ruby that is used by Rails and ActiveRecord, also uses the visitor pattern to compile SQL. Compare it to HAML’s compiler.

Tune in next week for more adventures in the code jungle.

Free book: Jump Start HTML5 Basics

Grab a free copy of one our latest ebooks! Packed with hints and tips on HTML5's most powerful new features.

No Reader comments

Comments on this post are closed.