Code Safari: HAML, Compiling to Completion
Haml: Week 2
Last week we started working through the implementation of HAML, a popular templating language for HTML. We got through the parsing step, leaving the compile step to cover this week. If you haven’t read the previous article, I encourage you to do so now. This article will make a lot more sense that way. For reference, here is the example template we have been using:
%article
%h1 My great article
%p
Here is the text of
my article
And here is the particular line we left off at:
# lib/haml/engine.rb:124
compile(parse)
Compiling
We found this line last time and investigated the parse
method. Let’s jump straight into the other side: compile
. Searching the project for def compile
we find this method in the expected module Haml::Compiler
.
# lib/haml/compiler.rb:444
def compile(node)
parent, @node = @node, node
block = proc {node.children.each {|c| compile c}}
send("compile_#{node.type}", &(block unless node.children.empty?))
ensure
@node = parent
end
This is a neat little function that compiles a node, providing a block to compile all of the node’s children that can be yielded to as required. So compiling a tag with nested children, the compiler can emit the opening tag, yield to compile (and emit) the children, then after the yield completes add the closing tag.
The interesting call is the send
on the third line. This calls a different method to compile each type of node. It’s a cute way of implementing the visitor pattern in Ruby that allows bypassing the traditional accept
call of the algorithm. Rather than creating a class heirachy for each type of node that knows how to compile itself, HAML keeps all the compiler logic in the one place. This allows a conceptual grouping of like code, and also allows the compiler to easily maintain state (instance variables) without having to expose them outside of the class.
Let’s look at the compile methods for the two types of nodes we have in our example: tag
and plain
.
There is quite a lot going on in the compile_tag
method, so we’ll need to apply our skills to filter it down to the essentials. There are a lot of conditionals to deal with different types of formatting: are we stripping whitespace? Are we outputting HTML5 or XHTML? Are we in ugly mode? These aren’t important for us to understand at the moment as they are not really essential to the algorithm.
# lib/haml/compiler.rb:91
# Heavily edited
def compile_tag
t = @node.value
object_ref = t[:object_ref]
value = t[:value]
tag_closed = !block_given?
open_tag = prerender_tag(t[:name], t[:self_closing], t[:attributes])
if tag_closed && value
open_tag << "#{value}</#{t[:name]}>" # Point A
end
push_merged_text(open_tag, 0, true)
return if tag_closed
if value.nil?
# Point B
@output_tabs += 1 unless t[:nuke_inner_whitespace]
yield if block_given?
@output_tabs -= 1 unless t[:nuke_inner_whitespace]
push_merged_text("</#{t[:name]}>", 0, true)
end
end
I’ve stripped the preceding listing significantly to highlight just two code paths. Compare to the original source to get an idea of how I have “extracted” the essence of the method by deleting extraneous code. This is a hard skill, but becomes easier with practice.
With the new skeleton version, we can see two main branches: one for when a value is provided with the node (Point A), and one for when the value is nested beneath the tag (Point B). These correspond in order to the h1
and p
tags in our example template. The yield
just below point B will run the proc
defined at line 2 of the compile
method (listing above), which will in turn compile any child nodes (our text nodes for the p
). push_merged_text
pretty much just outputs the text to the buffer, with a bit of extra logic for things like ensuring that the appropriate indentation is preserved.
It’s quite a bit to take in, but the logic will become clearer below when we try to apply this concept to our own HAML interpreter.
After the behemoth compile_tag
method, compile_plain
is refreshingly short:
def compile_plain
push_text @node.value[:text]
end
Yep, not much to it. It does help illustrate though the value of creating a parse tree before trying to emit any output — the complexity of parsing can be very different to the complexity of compiling, so it helps to separate out those concerns.
Building our own
Last week we built the beginnings of our own HAML interpreter. We got to the point of being able to create a tree of parse nodes, and this week we need to compile those nodes. For reference, here is the full listing of where we got to last time:
class HamlParser
class Node < Struct.new(:type, :data)
attr_accessor :children
attr_accessor :parent
def initialize(*args)
super
self.children = []
end
end
def initialize(string)
@string = string
end
def parse
@root = Node.new(:root, {})
@parent = @root
@depth = 0
@string.lines.each do |line|
process_indent(line)
push parse_line(line.strip)
end
@root
end
def process_indent(line)
indent = line[/^s+/].to_s.length / 2
if indent > @depth
@parent = @parent.children.last
@depth = indent
end
end
def push(node)
@parent.children << node
node.parent = @parent
end
def parse_line(line)
case line[0]
when ?%
name, value = line[1..-1].split(' ')
Node.new(:tag, :name => name, :value => value)
else
Node.new(:plain, :value => line)
end
end
end
We won’t be changing that code, rather we will be adding a new HamlCompiler
class to convert from the parse tree we have generated into HTML. First off, we will just implement the compile_plain
method to convert plain text input into plain text output. As per last week, we are using test/unit
to ensure we get things right. Run this script and you will see the tests being run automatically.
require 'test/unit'
class HamlCompilerTest < Test::Unit::TestCase
def compile(input)
HamlCompiler.new.compile(HamlParser.new(input).parse)
end
def test_compile_one_line_plain
assert_equal 'hello', compile("hello")
end
end
class HamlCompiler
def compile(node)
block = if node.children.any?
proc { node.children.each {|x| compile(x) }}
end
send("compile_#{node.type}", node, &block)
@output
end
def compile_root(node)
@output = ""
yield
end
def compile_plain(node)
@output << node.data[:value]
end
end
Note we are taking advantage of using the visitor pattern by storing state in the @output
variable. This means we don’t have to pass it through as arguments to all the compile methods, which we would have had to do if they were methods on ParseNode
. As more state is needed (such as keeping track of indentation) this approach really starts to pay dividends. You can also see how always starting with a root node allows for a really simple algorithm that doesn’t need any special cases to get started, and even provides extra benefits such as giving us a good place to initialize the output.
Let’s add a compile_tag
method so that we can completely interpret our example template. Hopefully it will be somewhat simpler than the one HAML uses!
class HamlCompilerTest < Test::Unit::TestCase
# ... as above
def test_compile_one_line_tag_with_value
assert_equal '<em>hello there</em>', compile("%em hello there")
end
def test_compile_tag_with_nested_value
assert_equal '<em>hello there</em>', compile("%em
hello there”)
end
end
class HamlCompiler
# ... as above
def compile_tag(node)
tag_name = node.data[:name]
@output << "<#{tag_name}>"
if block_given?
yield
else
@output << node.data[:value].to_s
end
@output << "</#{tag_name}>"
end
end
Due to the simpler nature of our interpreter, we are able to get away with a simple conditional to cope with both one line and tags with nested text. And because of the recursive nature of the algorithm, it also deals with arbitrary nested nodes, such as the h1
and p
tags inside article
, with no special handling!
Wrapping it up
Over the course of this two-part series, we have dived into the code that drives the HAML templating engine, and discovered that is uses a two-pass method to render templates. On the first pass, it creates an abstract tree of parse nodes representing the document, and on the second it converts those nodes into HTML. This separation of responsiblities allows for a neat architecture.
In the process we took what we learned and created our own mini-HAML interpreter that was able to parse and compile simple documents.
For bonus points, here are some extra tasks you could tackle:
- Add support for HTML attributes to our interpreter.
- Our interpreter does not emit any significant whitespace. Fix it to nicely indent nested tags.
- ARel, a relational algebra for Ruby that is used by Rails and ActiveRecord, also uses the visitor pattern to compile SQL. Compare it to HAML’s compiler.
Tune in next week for more adventures in the code jungle.