With the recent release of HAML 3.1, I decided to venture into its depths to figure out what makes it tick. What beasts lurk in the bowels of a templating system?
HAML is a templating language that allows you to write HTML using a terse syntax:
%article
  %h1 My great article
  %p
    Here is the text of
    my article
Which compiles to:
<article>
  <h1>My great article</h1>
  <p>Here is the text of my article</p>
</article>
It allows some extra nifty things, such as inline Ruby blocks that are closed by the significant whitespace. No doubt it has some interesting tricks up its sleeve.
Let’s go on safari.
Safari Time
As always, start by grabbing the code:
git clone git://github.com/nex3/haml
I encourage you to read it alongside this article.
There are two places I always start when investigating a library: the README and the main require. Unfortunately, most libraries don’t include a guide to diving into the code in their README, but it doesn’t hurt to look. For HAML we find some very nice user documentation, but nothing to point us in the right direction. That’s OK though, since we are greeted with a very nice comment in lib/haml.rb that makes me smile:
# lib/haml.rb
# The module that contains everything Haml-related:
#
# * {Haml::Engine} is the class used to render Haml within Ruby code.
# * {Haml::Helpers} contains Ruby helpers available within Haml templates.
# * {Haml::Template} interfaces with web frameworks (Rails in particular).
# * {Haml::Error} is raised when Haml encounters an error.
# * {Haml::HTML} handles conversion of HTML to Haml.
We’ve found our guide! Class- and module-level header comments like this are a godsend. You can write the nicest code in the world, but the sheer weight of it can be intimidating for new developers. Welcome developers into your codebase.
It looks like Haml::Engine is going to be the money ticket, and opening up lib/haml/engine.rb we are welcomed by another comment that hits the jackpot.
# This is the frontend for using Haml programmatically.
# It can be directly used by the user by creating a
# new instance and calling {#render} to render the template.
# For example:
#
# template = File.read('templates/really_cool_template.haml')
# haml_engine = Haml::Engine.new(template)
# output = haml_engine.render
# puts output
Let’s play along at home with irb and confirm that the suggested syntax does in fact work. Launch irb from within the HAML directory. The -I flag adds a directory to the load path.
$ irb -Ilib
irb> require 'haml'
irb> Haml::Engine.new("%b hello").render
=> "<b>hello</b>"
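If you would rather poke at the library from a script than from irb, the effect of -Ilib can be approximated by hand. This is a rough sketch that assumes the script is run from the root of the HAML checkout, so that the relative path lib resolves to the right place:

```ruby
# Roughly what `irb -Ilib` arranges: put ./lib at the front of the load
# path so that a later `require 'haml'` picks up the checkout's copy.
# Assumption: the current working directory is the root of the checkout.
lib_dir = File.expand_path('lib')
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)

puts $LOAD_PATH.first
```

After this, require 'haml' behaves the same way it does in the irb session above.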
Search for “def initialize” in lib/haml/engine.rb to find our entry point. There are a lot of lines here. The trick to reading efficiently, when you just want the gist of a library, is to skip quickly over unimportant code and get to the guts of the program. Usually this means skipping over assignments and searching for method calls. I’ll often also work from the bottom up, starting at the return value. Typically methods are structured setup-action-return, and at the moment we are interested in the last two. Most of #initialize is variable setup, but right near the end you will find a very interesting line:
# lib/haml/engine.rb:124
compile(parse)
Our first insight! It would appear HAML separates parsing of the document from the compilation down to HTML. This is a standard technique, separating out two very different concerns.
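To make that split concrete, here is a deliberately tiny sketch of the same two-phase shape. TinyEngine and everything inside it is invented for illustration and is not how HAML is implemented; the only thing borrowed is the compile(parse) call. It assumes every input line looks like "%tag some text".

```ruby
# Hypothetical illustration of a parse/compile split; not HAML's code.
class TinyEngine
  def initialize(template)
    @template = template
    # Same shape as the line we found in Haml::Engine#initialize.
    @html = compile(parse)
  end

  def render
    @html
  end

  private

  # Phase 1: turn the input string into plain data (an array of hashes).
  def parse
    @template.lines.map do |line|
      name, text = line.strip.sub(/\A%/, '').split(' ', 2)
      { :name => name, :text => text.to_s }
    end
  end

  # Phase 2: turn that data into HTML. It never sees the original string.
  def compile(nodes)
    nodes.map { |n| "<#{n[:name]}>#{n[:text]}</#{n[:name]}>" }.join("\n")
  end
end

puts TinyEngine.new("%b hello\n%i world").render
```

Because compile only ever sees the array of hashes, either phase could be swapped out without touching the other, which is the benefit of separating the two concerns.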
Parsing
Parsing is the act of taking a representation (in this case our HAML template) and preparing it for output to another representation (HTML). You can find the parsing code in lib/haml/parser.rb, either by searching the project for “def parse” or by noticing the include of Parser at the top of Haml::Engine. Starting at the bottom of the method, we see it returns the instance variable @root. This is handy: since Parser is included as a module into the Engine class, we should be able to easily inspect this instance variable. We can use the instance_eval method to evaluate code in the context of any object, giving us access to even private methods and instance variables. This is a really bad idea for production code, but it’s a great exploration tool.
irb> input = "%article ... sample from above ..."
irb> Haml::Engine.new(input).instance_eval { @root }
=> (root nil
  (tag {:name=>"article", :value=>nil}
    (tag {:name=>"h1", :value=>"My great article"})
    (tag {:name=>"p", :value=>nil}
      (plain {:text=>"Here is the text of"})
      (plain {:text=>"my article"})))
  (haml_comment {:text=>""}))
irb> Haml::Engine.new(input).instance_eval { @root }.class
=> Haml::Parser::ParseNode
irb> Haml::Engine.new(input).instance_eval { @root }.children.map(&:class)
=> [Haml::Parser::ParseNode, Haml::Parser::ParseNode]
# (I edited out some extra values from the hashes for clarity.)
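The instance_eval trick is not specific to HAML, so it is worth seeing in isolation. The Vault class below is made up purely for this demonstration:

```ruby
# Hypothetical class, used only to demonstrate instance_eval.
class Vault
  def initialize
    @secret = 42
  end

  private

  def combination
    "left-right-left"
  end
end

vault = Vault.new

# Inside the block, self is the vault, so instance variables and
# private methods are reachable just as they are inside the class.
p vault.instance_eval { @secret }
p vault.instance_eval { combination }

# For a single instance variable, instance_variable_get is a more
# targeted alternative.
p vault.instance_variable_get(:@secret)
```

The same warning applies here as above: this is for exploring code, not for shipping it.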
The parse method is creating a tree of Haml::Parser::ParseNode objects, an abstract representation of our document. In other words, this representation is not tied to the fact that our input was a string. This decouples the syntax of HAML from the output, which results in a nicer architecture. Note that there is always one special root node to attach the rest of the tree to.
Let’s delve into the parsing a bit more. Scanning the parse method, we get the following basic structure:
while next_line
  process_indent # decrease nesting if needed
  process_line
  if block_opened?
    increase nesting
  end
end
close open tags
There are two main functions here: dealing with indentation, and parsing the line. I’ll focus on the latter here, and leave reading the indentation code as an exercise for you to work on (see the end of the article). Once again, I’ll take a skeleton view of process_line:
case first_char_of_line
when '%'; push tag(text)
when '.'; push div(text)
# ... other cases
else; push plain(text)
end
The tag, div and plain methods construct and return ParseNode objects, while push adds the node to the current node’s children.
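The skeleton can be turned into something runnable with very little code. In the sketch below the method names echo the skeleton, but LineDispatcher, its Node struct and all of the method bodies are my own invention rather than HAML's. It also ignores nesting entirely and pushes every node onto one flat list.

```ruby
# Hypothetical sketch of first-character dispatch; not taken from HAML.
class LineDispatcher
  Node = Struct.new(:type, :value)

  attr_reader :nodes

  def initialize
    @nodes = []
  end

  def process_line(text)
    case text[0]
    when '%' then push tag(text)
    when '.' then push div(text)
    else          push plain(text)
    end
  end

  private

  # Nesting is ignored here: every node lands in one flat list.
  def push(node)
    @nodes << node
    node
  end

  # "%h1 Title" becomes a tag named "h1" with text "Title".
  def tag(text)
    name, rest = text[1..-1].split(' ', 2)
    Node.new(:tag, { :name => name, :text => rest })
  end

  # ".intro Hi" is shorthand for a div with the class "intro".
  def div(text)
    klass, rest = text[1..-1].split(' ', 2)
    Node.new(:tag, { :name => 'div', :class => klass, :text => rest })
  end

  def plain(text)
    Node.new(:plain, { :text => text })
  end
end

dispatcher = LineDispatcher.new
["%h1 My great article", ".intro Welcome", "just text"].each do |line|
  dispatcher.process_line(line)
end
dispatcher.nodes.each { |node| p node.to_a }
```

Each constructor only knows how to read its own kind of line, and push is the single place that decides where a node ends up.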
Making our own
We now have enough of an idea of how the parsing side of HAML works to try and put together a script ourselves. This helps to confirm that we have read the code correctly, and also to cement any knowledge we’ve learned. Let’s create a simple parser that will be able to transform our sample document from above into a tree of nodes, by starting with a simple case ignoring indentation.
require 'test/unit'

class HamlParserTest < Test::Unit::TestCase
  def test_one_line_plain
    tree = HamlParser.new("hello").parse
    assert_equal 1, tree.children.size
    assert_equal :plain, tree.children[0].type
    assert_equal 'hello', tree.children[0].data[:value]
  end

  def test_one_line_tag_with_value
    tree = HamlParser.new("%em hello").parse
    assert_equal 1, tree.children.size
    assert_equal :tag, tree.children[0].type
    assert_equal 'em', tree.children[0].data[:name]
    assert_equal 'hello', tree.children[0].data[:value]
  end
end

class HamlParser
  # A node has a type (:root, :tag or :plain), a data hash and children.
  class Node < Struct.new(:type, :data)
    attr_accessor :children
    attr_accessor :parent # Used in next example

    def initialize(*args)
      super
      self.children = []
    end
  end

  def initialize(string)
    @string = string
  end

  def parse
    @root = Node.new(:root, {})
    # Every line becomes a direct child of the root; no nesting yet.
    @root.children = @string.lines.map do |line|
      parse_line(line.chomp)
    end
    @root
  end

  def parse_line(line)
    case line[0]
    when ?%
      # Split into at most two parts so multi-word values stay intact.
      name, value = line[1..-1].split(' ', 2)
      Node.new(:tag, :name => name, :value => value)
    else
      Node.new(:plain, :value => line)
    end
  end
end
Test::Unit is the unit testing framework provided in Ruby’s standard library. If you run this file you will see that it automatically runs the tests specified. It’s a great way to quickly build out a small project like this one. I’ve shaped the code similarly to the HAML code, with a parse_line method that switches on the first character of the line, and a root node to hold the tree.
To support indentation, we need to set the parser up so it has a concept of the current node to add children to (instead of always adding to root as per our first example), and also of the current depth. To facilitate this, we will add a parent accessor to nodes so that we can traverse both down and up the tree. This version is actually a bit simpler than the HAML code, but it gets the job done for now.
require 'test/unit'

class HamlParserTest < Test::Unit::TestCase
  def test_tag_with_nested_value
    tree = HamlParser.new("%em\n  hello").parse
    assert_equal 1, tree.children.size
    assert_equal :tag, tree.children[0].type
    assert_equal 'em', tree.children[0].data[:name]
    assert_equal 'hello', tree.children[0].children[0].data[:value]
  end
end
class HamlParser
  # Node and initialize as above

  def parse
    @root = Node.new(:root, {})
    @parent = @root
    @depth = 0
    @string.lines.each do |line|
      process_indent(line)
      push parse_line(line.strip)
    end
    @root
  end

  def process_indent(line)
    # Two spaces of leading whitespace per nesting level.
    indent = line[/^\s+/].to_s.length / 2
    if indent > @depth
      # Deeper than before: the most recently added node becomes the parent.
      @parent = @parent.children.last
      @depth = indent
    end
  end

  def push(node)
    @parent.children << node
    node.parent = @parent
  end

  def parse_line(line)
    # ... as above
  end
end
This is a good start, and it parses our initial example code, but there is plenty more to do:
- Fix process_indent in our example so it also “de-indents” correctly.
- It’s hard to visualise our parser output because the default Ruby inspect implementation doesn’t include a node’s children. Override Node#inspect to provide a nice output like HAML does.
- The HAML parser actually keeps track of two lines at once, rather than one as our parser does. Read through the HAML code to find instances of where this is useful.
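If you get stuck on the first two exercises, here is one possible approach (spoilers ahead). It is only a sketch: it assumes two-space indentation, no blank lines, and input that never indents by more than one level at a time. It is not how HAML itself handles indentation.

```ruby
# One possible take on the first two exercises; a sketch, not HAML's code.
class HamlParser
  class Node < Struct.new(:type, :data)
    attr_accessor :children, :parent

    def initialize(*args)
      super
      self.children = []
    end

    # Render the subtree as a nested s-expression, one node per line.
    def inspect(depth = 0)
      lines = ["#{'  ' * depth}(#{type} #{data.inspect}"]
      children.each { |child| lines << child.inspect(depth + 1) }
      lines.join("\n") + ")"
    end
  end

  def initialize(string)
    @string = string
  end

  def parse
    @root = Node.new(:root, {})
    @parent = @root
    @depth = 0
    @string.lines.each do |line|
      process_indent(line)
      push parse_line(line.strip)
    end
    @root
  end

  private

  def process_indent(line)
    indent = line[/\A */].length / 2
    if indent > @depth
      # One level deeper: the previous node becomes the parent.
      @parent = @parent.children.last
    else
      # Same level or shallower: climb one parent per level lost.
      (@depth - indent).times { @parent = @parent.parent }
    end
    @depth = indent
  end

  def push(node)
    node.parent = @parent
    @parent.children << node
  end

  def parse_line(line)
    if line.start_with?('%')
      name, value = line[1..-1].split(' ', 2)
      Node.new(:tag, { :name => name, :value => value })
    else
      Node.new(:plain, { :value => line })
    end
  end
end

input = "%article\n  %h1 My great article\n  %p\n    Here is the text of\n" \
        "    my article\n%footer The end"
p HamlParser.new(input).parse
```

The parent accessor is what makes de-indenting cheap: walking back up the tree is just a matter of following parent once per level lost.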
Let us know how you go in the comments. Join me next week as I continue working through the second half of the process: the compile step.
Xavier Shay is a DataMapper committer who has contributed to Ruby on Rails. He wrote the Enki blog platform, as well as other widely used libraries, including TufteGraph for JavaScript graphing and Kronic for date formatting and parsing. In 2010, he traveled the world running Ruby on Rails workshops titled "Your Database Is Your Friend." He regularly organizes and speaks at the Melbourne Ruby on Rails Meetup, and also blogs on personal development at TwoShay, and on more esoteric coding topics at Robot Has No Heart.