Code Safari: Getting Started in HAML

With the recent release of HAML 3.1, I decided to venture into its depths to figure out what makes it tick. What beasts lurk in the bowels of a templating system?

HAML is a templating language that allows you to write HTML using a terse syntax:

%article
  %h1 My great article
  %p
    Here is the text of
    my article

Which compiles to:

<article>
  <h1>My great article</h1>
  <p>Here is the text of my article</p>
</article>

It allows some extra nifty things, such as inline Ruby blocks that are closed by significant whitespace. No doubt it has some interesting tricks up its sleeve.

Let’s go on safari.

Key Takeaways

HAML 3.1 is a templating language that allows you to write HTML using a terse syntax. It separates the parsing of the document from the compilation down to HTML, which is a standard technique that separates out two different concerns.
HAML’s parsing process involves taking a representation (in this case, the HAML template) and preparing it for output to another representation (HTML). The parse method creates a tree of Haml::Parser::ParseNode, creating an abstract representation of the document that decouples the syntax of HAML from the output.
To understand how the parsing side of HAML works, one can create a simple parser that transforms a sample document into a tree of nodes. This involves setting up the parser to have a concept of the current node to add children to and the current depth, and adding a parent accessor to nodes so that the tree can be traversed both down and up.

Safari Time

As always, start by grabbing the code:

git clone git://github.com/nex3/haml

I encourage you to read it alongside this article.

There are two places I always start when investigating a library: the README, and the main require. Unfortunately most libraries don’t have a guide to diving into the code in their README, but it doesn’t hurt to look. For HAML we find some very nice user documentation, but nothing to point us in the right direction. That’s OK though, since we are greeted with a very nice comment in lib/haml.rb that makes me smile:

# lib/haml.rb
# The module that contains everything Haml-related:
#
# * {Haml::Engine} is the class used to render Haml within Ruby code.
# * {Haml::Helpers} contains Ruby helpers available within Haml templates.
# * {Haml::Template} interfaces with web frameworks (Rails in particular).
# * {Haml::Error} is raised when Haml encounters an error.
# * {Haml::HTML} handles conversion of HTML to Haml.

We’ve found our guide! Class- and module-level header comments like this are a godsend. You can write the nicest code in the world, but the sheer weight of it can be intimidating for new developers. Welcome developers into your codebase.

It looks like Haml::Engine is going to be the money ticket, and opening up lib/haml/engine.rb we are welcomed by another comment that pays jackpot.

# This is the frontend for using Haml programmatically.
# It can be directly used by the user by creating a
# new instance and calling {#render} to render the template.
# For example:
#
#     template = File.read('templates/really_cool_template.haml')
#     haml_engine = Haml::Engine.new(template)
#     output = haml_engine.render
#     puts output

Let’s play along at home with irb, and confirm that the suggested syntax does in fact work. Launch irb from within the HAML directory. The -I flag adds a directory to the load path.

$ irb -Ilib
irb> require 'haml'
irb> Haml::Engine.new("%b hello").render
 => "<b>hello</b>"
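If the -I flag is new to you, it is roughly the same as putting the directory at the front of Ruby's load path yourself before the require. A minimal sketch, assuming you run it from the root of the cloned repository (the lib_dir name is mine):

```ruby
# Roughly what `irb -Ilib` arranges: ./lib goes to the front of the load
# path, so `require 'haml'` picks up the checked-out source first.
lib_dir = File.expand_path('lib', Dir.pwd)
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)

# require 'haml'  # uncomment when running inside the HAML checkout
```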

Search for “def initialize” in lib/haml/engine.rb to find our entry point. There are a lot of lines here. The trick to reading efficiently, when you just want the gist of a library, is to skip quickly over unimportant code and get to the guts of the program. Usually this means skipping over assignments and searching for method calls. I’ll often also work from the bottom up, starting at the return value. Typically methods are structured setup-action-return, and at the moment we are interested in the last two. Most of #initialize is variable setup, but right near the end you will find a very interesting line:

# lib/haml/engine.rb:124
compile(parse)

Our first insight! It would appear HAML separates parsing of the document from the compilation down to HTML. This is a standard technique, separating out two very different concerns.
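To see why the split is appealing, here is a toy two-phase engine. It is my own sketch, not HAML's code: TinyEngine and its hash-based nodes are made up, it only understands one-line "%tag text" input, and it does no escaping.

```ruby
# Hypothetical two-phase engine; every name here is invented for illustration.
class TinyEngine
  def initialize(template)
    @html = compile(parse(template))
  end

  def render
    @html
  end

  private

  # Phase 1: text in, plain data out. Only this method knows the input syntax.
  def parse(template)
    template.lines.map do |line|
      name, text = line.strip.split(' ', 2)
      { :tag => name.delete('%'), :text => text.to_s }
    end
  end

  # Phase 2: data in, HTML out. Only this method knows the output format.
  def compile(nodes)
    nodes.map { |n| "<#{n[:tag]}>#{n[:text]}</#{n[:tag]}>" }.join("\n")
  end
end

puts TinyEngine.new("%b hello\n%i world").render
```

Because compile only ever sees the intermediate data, you could change the input syntax without touching it, and vice versa.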

Parsing

Parsing is the act of taking a representation (in this case our HAML template) and preparing it for output to another representation (HTML). You can find the parsing code in lib/haml/parser.rb, either by searching the project for “def parse” or by noticing the include of Parser at the top of Haml::Engine. Starting at the bottom of the method, we see it returns the instance variable @root. This is handy: since Parser is included as a module into the Engine class, we should be able to easily inspect this instance variable. We can use the instance_eval method to evaluate code in the context of any object, giving us access to even private methods and instance variables. This is a really bad idea for production code, but it’s a great exploration tool.

irb> input = "%article ... sample from above ..."
irb> Haml::Engine.new(input).instance_eval { @root }
 => (root nil
  (tag {:name=>"article", :value=>nil}
    (tag {:name=>"h1", :value=>"My great article"})
    (tag {:name=>"p", :value=>nil}
      (plain {:text=>"Here is the text of"})
      (plain {:text=>"my article"})))
  (haml_comment {:text=>""})) 
irb> Haml::Engine.new(input).instance_eval { @root }.class
 => Haml::Parser::ParseNode
irb> Haml::Engine.new(input).instance_eval { @root }.children.map(&:class)
 => [Haml::Parser::ParseNode, Haml::Parser::ParseNode] 
# (I edited out some extra values from the hashes for clarity.)
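If instance_eval is unfamiliar, here is the same trick in isolation, away from HAML. The Safe class is a made-up stand-in for any object whose internals you want to peek at:

```ruby
# A stand-in class with hidden state, playing the role of Haml::Engine.
class Safe
  def initialize
    @combination = [12, 34, 56]
  end

  private

  def secret_note
    "the key is under the mat"
  end
end

safe = Safe.new

# Inside the block, self is `safe`, so instance variables and private
# methods are reachable even though there is no public accessor.
combination = safe.instance_eval { @combination }
note        = safe.instance_eval { secret_note }

# instance_variable_get is a narrower tool for the same kind of peek.
same = safe.instance_variable_get(:@combination)

p combination
p note
p same
```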

The parse method is creating a tree of Haml::Parser::ParseNode, creating an abstract representation of our document. In other words, this representation is not tied to the fact that our input was a string. This decouples the syntax of HAML from the output, which results in a nicer architecture. Note that there is always one special root node to attach the rest of the tree to.

Let’s delve into the parsing a bit more. Scanning the parse method, we get the following basic structure:

while next_line
  process_indent # decrease nesting if needed
  process_line
  if block_opened?
    increase nesting
  end
end
close open tags

There are two main functions here: dealing with indentation, and parsing the line. I’ll focus on the latter here, and leave reading the indentation code as an exercise for you to work on (see the end of the article). Once again, I’ll take a skeleton view of process_line:

case first_char_of_line
when '%'; push tag(text)
when '.'; push div(text)
# ... other cases
else;     push plain(text)
end

The tag, div and plain methods construct and return ParseNode objects, while push adds the node to the current node’s children.
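To make that shape concrete, here is a rough standalone version of the same dispatch. MiniNode, MiniLineParser and the hash keys are my own stand-ins rather than HAML's internals, and there is no indentation handling, so everything lands on the root:

```ruby
# Hypothetical stand-in for Haml::Parser::ParseNode.
MiniNode = Struct.new(:type, :value, :children)

class MiniLineParser
  attr_reader :root

  def initialize
    @root = MiniNode.new(:root, nil, [])
    @current = @root # no nesting in this sketch, so always the root
  end

  # Switch on the first character, build a node, attach it.
  def process_line(text)
    case text[0]
    when '%' then push tag(text)
    when '.' then push div(text)
    else          push plain(text)
    end
  end

  private

  # Each builder only constructs and returns a node.
  def tag(text)
    name, value = text[1..-1].split(' ', 2)
    MiniNode.new(:tag, { :name => name, :value => value }, [])
  end

  # In HAML, ".note Hi" is shorthand for a div with class "note".
  def div(text)
    klass, value = text[1..-1].split(' ', 2)
    MiniNode.new(:tag, { :name => 'div', :class => klass, :value => value }, [])
  end

  def plain(text)
    MiniNode.new(:plain, { :text => text }, [])
  end

  # push is the only place that mutates the tree.
  def push(node)
    @current.children << node
    node
  end
end

parser = MiniLineParser.new
['%h1 My great article', '.note Read me', 'just text'].each do |line|
  parser.process_line(line)
end
types = parser.root.children.map(&:type)
p types
```

Keeping construction (tag, div, plain) apart from attachment (push) means the indentation code only has to decide which node is current.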

Making our own

We now have enough of an idea of how the parsing side of HAML works to try and put together a script ourselves. This helps to confirm that we have read the code correctly, and also to cement any knowledge we’ve learned. Let’s create a simple parser that will be able to transform our sample document from above into a tree of nodes, by starting with a simple case ignoring indentation.

require 'test/unit'
class HamlParserTest < Test::Unit::TestCase
  def test_one_line_plain
    tree = HamlParser.new("hello").parse
    assert_equal 1, tree.children.size
    assert_equal :plain,  tree.children[0].type
    assert_equal 'hello', tree.children[0].data[:value]
  end
  def test_one_line_tag_with_value
    tree = HamlParser.new("%em hello").parse
    assert_equal 1, tree.children.size
    assert_equal :tag,    tree.children[0].type
    assert_equal 'em',    tree.children[0].data[:name]
    assert_equal 'hello', tree.children[0].data[:value]
  end
end
class HamlParser
  class Node < Struct.new(:type, :data)
    attr_accessor :children
    attr_accessor :parent   # Used in next example
    def initialize(*args)
      super
      self.children = []
    end
  end
  def initialize(string)
    @string = string
  end
  def parse
    @root = Node.new(:root, {})
    @root.children = @string.lines.map do |line|
      parse_line(line)
    end
    @root
  end
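  def parse_line(line)
    line = line.chomp # drop the trailing newline that String#lines keeps
    case line[0]
    when ?%
      # A limit of 2 keeps multi-word values such as "My great article" whole
      name, value = line[1..-1].split(' ', 2)
      Node.new(:tag, :name => name, :value => value)
    else
      Node.new(:plain, :value => line)
    end
  end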
end

Test::Unit is the unit testing framework provided in Ruby’s standard library. If you run this file you will see that it automatically runs the tests specified. It’s a great way to quickly build out a small project like this one. I’ve shaped the code similarly to the HAML code, with a parse_line method that switches on the first character of the line, and a root node to hold the tree.

To support indentation, we need to set the parser up so it has a concept of the current node to add children to (instead of always adding to root as per our first example), and also of the current depth. To facilitate this, we will add a parent accessor to nodes so that we can traverse both down and up the tree. This version is actually a bit simpler than the HAML code, but it gets the job done for now.

require 'test/unit'
class HamlParserTest < Test::Unit::TestCase
  def test_tag_with_nested_value
    tree = HamlParser.new("%em\n  hello").parse
    assert_equal 1, tree.children.size
    assert_equal :tag,    tree.children[0].type
    assert_equal 'em',    tree.children[0].data[:name]
    assert_equal 'hello', tree.children[0].children[0].data[:value]
  end
end

class HamlParser
  # Node and initialize as above
  def parse
    @root = Node.new(:root, {})
    @parent = @root
    @depth = 0
    @string.lines.each do |line|
      process_indent(line)
      push parse_line(line.strip)
    end
    @root
  end
  def process_indent(line)
    indent = line[/^\s+/].to_s.length / 2 # two spaces per nesting level
    if indent > @depth
      @parent = @parent.children.last
      @depth = indent
    end
  end
  def push(node)
    @parent.children << node
    node.parent = @parent
  end
  def parse_line(line)
    # ... as above
  end
end

This is a good start, and it parses our initial example code, but there is plenty more to do:

Fix process_indent in our example so it also “de-indents” correctly.
It’s hard to visualise our parser output because the default Ruby inspect implementation doesn’t include a node’s children. Override Node#inspect to provide a nice output like HAML does.
The HAML parser actually keeps track of two lines at once, rather than one as our parser does. Read through the HAML code to find instances of where this is useful.
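If you get stuck on the first two, here is one possible shape for them. Treat it as a sketch under my own assumptions rather than how HAML does it: indentation is always two spaces, never jumps more than one level deeper at a time, and blank lines are skipped.

```ruby
class HamlParser
  class Node < Struct.new(:type, :data)
    attr_accessor :children, :parent

    def initialize(*args)
      super
      self.children = []
    end

    # Render the subtree as a nested s-expression, one node per line.
    def inspect(level = 0)
      head = "#{'  ' * level}(#{type} #{data.inspect}"
      return head + ')' if children.empty?
      head + "\n" + children.map { |c| c.inspect(level + 1) }.join("\n") + ')'
    end
  end

  def initialize(string)
    @string = string
  end

  def parse
    @root = Node.new(:root, {})
    @parent = @root
    @depth = 0
    @string.lines.each do |line|
      next if line.strip.empty?
      process_indent(line)
      push parse_line(line.strip)
    end
    @root
  end

  private

  def process_indent(line)
    indent = line[/\A */].length / 2
    if indent > @depth
      # Going deeper: the most recently pushed node becomes the parent.
      @parent = @parent.children.last
    else
      # Coming back out: climb one parent per level of de-indent.
      (@depth - indent).times { @parent = @parent.parent }
    end
    @depth = indent
  end

  def parse_line(line)
    if line.start_with?('%')
      name, value = line[1..-1].split(' ', 2)
      Node.new(:tag, { :name => name, :value => value })
    else
      Node.new(:plain, { :value => line })
    end
  end

  def push(node)
    node.parent = @parent
    @parent.children << node
  end
end

tree = HamlParser.new(<<HAML).parse
%article
  %h1 My great article
  %p
    Here is the text of
    my article
%footer
HAML
p tree
```

The %footer line is there to exercise the de-indent path: it has to climb two levels to land back on the root.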

Let us know how you go in the comments. Join me next week as I continue working through the second half of the process: the compile step.
