Understanding ASTs by Building Your Own Babel Plugin

This article was peer reviewed by Tim Severien. Thanks to all of SitePoint’s peer reviewers for making SitePoint content the best it can be!

Every day, thousands of JavaScript developers use versions of the language that browser vendors haven’t even implemented yet. Many of them use language features that are nothing more than proposals, with no guarantee they’ll ever make it into the specification. All of this is made possible by the Babel project.

Babel is best known for translating ES6 code into ES5 code that we can run safely today. However, it also allows developers to write plugins that transform the structure of JavaScript programs at compile time.

Today, we’ll look at how we can write a Babel plugin to add immutable data by default to JavaScript. The code for this tutorial can be downloaded from our GitHub repo.

Language Overview

We want to design a plugin that will allow us to use regular object and array literals, which will be transformed into persistent data structures using Mori.

We want to write code like this:

var foo = { a: 1 };
var baz = foo.a = 2;
foo.a === 1;
baz.a === 2;

And transform it into code like this:

var foo = mori.hashMap('a', 1);
var baz = mori.assoc(foo, 'a', 2);
mori.get(foo, 'a') === 1;
mori.get(baz, 'a') === 2;

Let’s get started with MoriScript!

Babel Overview

If we look beneath the surface of Babel, we’ll find three important tools that handle the majority of the process.

Babel Process

Parse

Babylon is the parser and it understands how to take a string of JavaScript code and turn it into a computer friendly representation called an Abstract Syntax Tree (AST).

Transform

The babel-traverse module allows you to explore, analyse and potentially modify the AST.

Generate

Finally, the babel-generator module is used to turn the transformed AST back into regular code.

What is an AST?

It’s fundamental that we understand the purpose of an AST before continuing with this tutorial. So let’s dive in to see what they are and why we need them.

JavaScript programs are generally made up of a sequence of characters, each with some visual meaning for our human brains. This works really well for us, as it allows us to use matching characters ([], {}, ()), pairs of characters ('', "") and indentation to make our programs easier for us to interpret.

However, this isn’t very helpful for computers. For them, each of these characters is just a numeric value in memory and they can’t use them to ask high level questions like “How many variables are there in this declaration?”. Instead we need to compromise and find a way to turn our code into something that we can program and computers can understand.

Have a look at the following code.

var a = 3;
a + 5

When we generate an AST for this program, we end up with a structure that looks like this:

AST Example

All ASTs start with a Program node at the root of the tree, which contains all of the top level statements in our program. In this case, we only have two:

A VariableDeclaration with one VariableDeclarator that assigns the NumericLiteral “3” to the Identifier “a”.
An ExpressionStatement, which in turn is made up of a BinaryExpression, described as an Identifier “a”, an operator “+” and another NumericLiteral “5”.
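To make that description concrete, here is a hand-written sketch of the same tree as a plain JavaScript object. It is trimmed down to the fields mentioned above; real parser output carries extra metadata such as source positions. The countDeclarators helper is a hypothetical name, included only to show that a question like “How many variables are there in this declaration?” becomes a simple property lookup once the code is a tree.

```javascript
// Hand-written, simplified AST for:
//   var a = 3;
//   a + 5
var ast = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'var',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'a' },
          init: { type: 'NumericLiteral', value: 3 }
        }
      ]
    },
    {
      type: 'ExpressionStatement',
      expression: {
        type: 'BinaryExpression',
        operator: '+',
        left: { type: 'Identifier', name: 'a' },
        right: { type: 'NumericLiteral', value: 5 }
      }
    }
  ]
};

// Hypothetical helper: how many variables does a declaration introduce?
function countDeclarators(node) {
  return node.type === 'VariableDeclaration' ? node.declarations.length : 0;
}

console.log(ast.body.length);               // number of top level statements
console.log(countDeclarators(ast.body[0])); // variables declared by `var a = 3;`
```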

Despite the fact that they are made up of simple building blocks, the size of ASTs means they are often quite complex, especially for nontrivial programs. Rather than trying to figure out ASTs ourselves, we can use astexplorer.net, which allows us to input JavaScript on the left, then outputs an explorable representation of the AST on the right. We’ll use this tool exclusively to understand and experiment with code as we continue.

To stay consistent with Babel, make sure you choose “babylon6” as a parser.

When writing a Babel plugin, it’s our job to take an AST then insert/move/replace/delete some nodes to create a new AST which can be used to generate code.

Setup

Make sure you have node and npm installed before you start. Then create a folder for the project, create a package.json file and install the following dev dependencies.

mkdir moriscript && cd moriscript
npm init -y
npm install --save-dev babel-core

Then we’ll create a file for our plugin and inside we’ll export a default function.

// moriscript.js
module.exports = function(babel) {
  var t = babel.types;
  return {
    visitor: {

    }
  };
};

This function exposes an interface for the visitor pattern, which we’ll come back to later.

Finally, we’ll create a runner that we can use to test our plugin as we go.

// run.js
var fs = require('fs');
var babel = require('babel-core');
var moriscript = require('./moriscript');

// read the filename from the command line arguments
var fileName = process.argv[2];

// read the code from this file
fs.readFile(fileName, function(err, data) {
  if(err) throw err;

  // convert from a buffer to a string
  var src = data.toString();

  // use our plugin to transform the source
  var out = babel.transform(src, {
    plugins: [moriscript]
  });

  // print the generated code to screen
  console.log(out.code);
});

We can call this script with the name of an example MoriScript file to check that it generates the JavaScript we are expecting. For example, node run.js example.ms.

Arrays

The first and foremost goal for MoriScript is to convert Object and Array literals into their Mori counterparts: HashMaps and Vectors. We’ll tackle arrays first, as they’re slightly simpler.

var bar = [1, 2, 3];
// should become
var bar = mori.vector(1, 2, 3);

Paste the code from above into astexplorer and highlight the array literal [1, 2, 3] to see the corresponding AST nodes.

For the sake of readability, we’ll omit the metadata fields that we don’t need to worry about.

{
  "type": "ArrayExpression",
  "elements": [
    {
      "type": "NumericLiteral",
      "value": 1
    },
    {
      "type": "NumericLiteral",
      "value": 2
    },
    {
      "type": "NumericLiteral",
      "value": 3
    }
  ]
}

Now let’s do the same with the call to mori.vector(1, 2, 3).

{
  "type": "CallExpression",
  "callee": {
    "type": "MemberExpression",
    "object": {
      "type": "Identifier",
      "name": "mori"
    },
    "property": {
      "type": "Identifier",
      "name": "vector"
    }
  },
  "arguments": [
    {
      "type": "NumericLiteral",
      "value": 1
    },
    {
      "type": "NumericLiteral",
      "value": 2
    },
    {
      "type": "NumericLiteral",
      "value": 3
    }
  ]
}

If we express this visually, we’ll get a better sense of what needs to change between the two trees.

Array AST

Now we can see quite clearly that we’ll need to replace the top level expression, but we’ll be able to share the numeric literals between the two trees.

Let’s start by adding an ArrayExpression method onto our visitor object.

module.exports = function(babel) {
  var t = babel.types;
  return {
    visitor: {
      ArrayExpression: function(path) {

      }
    }
  };
};

When Babel traverses the AST, it looks at each node. If it finds a method for that node’s type in our plugin’s visitor object, it calls that method with a path object, which wraps the node along with its position in the tree, so that we can analyse or manipulate it.
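The mechanism can be sketched without Babel at all. The traverse function below is a deliberately simplified stand-in, not Babel’s implementation: it walks plain node objects depth first and calls the visitor method named after each node’s type. Unlike Babel, it hands the visitor the bare node rather than a path.

```javascript
// Simplified stand-in for AST traversal (not Babel's real implementation).
function traverse(node, visitor) {
  if (!node || typeof node.type !== 'string') return;

  // call the visitor method named after this node's type, if there is one
  if (typeof visitor[node.type] === 'function') {
    visitor[node.type](node);
  }

  // then walk into every child node or array of child nodes
  Object.keys(node).forEach(function(key) {
    var child = node[key];
    if (Array.isArray(child)) {
      child.forEach(function(item) { traverse(item, visitor); });
    } else if (child && typeof child === 'object') {
      traverse(child, visitor);
    }
  });
}

// Simplified AST for the expression [1, [2, 3]]
var tree = {
  type: 'ArrayExpression',
  elements: [
    { type: 'NumericLiteral', value: 1 },
    {
      type: 'ArrayExpression',
      elements: [
        { type: 'NumericLiteral', value: 2 },
        { type: 'NumericLiteral', value: 3 }
      ]
    }
  ]
};

var seen = [];
traverse(tree, {
  ArrayExpression: function(node) {
    seen.push('array of ' + node.elements.length);
  },
  NumericLiteral: function(node) {
    seen.push('number ' + node.value);
  }
});

console.log(seen.join(', '));
```

Each handler only fires for its own node type, so a visitor can ignore everything it isn’t interested in.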

ArrayExpression: function(path) {
  path.replaceWith(
    t.callExpression(
      t.memberExpression(t.identifier('mori'), t.identifier('vector')),
      path.node.elements
    )
  );
}

We can find documentation for each type of expression with the babel-types package. In this case we’re going to replace the ArrayExpression with a CallExpression, which we can create with t.callExpression(callee, arguments). The thing we’re going to call is a MemberExpression which we can create with t.memberExpression(object, property).
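As a rough mental model, each builder just returns a plain node object of the matching type. The functions below are hypothetical stand-ins written for illustration, not the babel-types source (the real builders also validate their arguments and fill in further fields), but they show how the nested calls produce the CallExpression tree we inspected earlier.

```javascript
// Hypothetical stand-ins for the babel-types builders used above.
function identifier(name) {
  return { type: 'Identifier', name: name };
}
function numericLiteral(value) {
  return { type: 'NumericLiteral', value: value };
}
function memberExpression(object, property) {
  return { type: 'MemberExpression', object: object, property: property };
}
function callExpression(callee, args) {
  return { type: 'CallExpression', callee: callee, arguments: args };
}

// the elements of [1, 2, 3], which we can reuse unchanged
var elements = [numericLiteral(1), numericLiteral(2), numericLiteral(3)];

// the node for mori.vector(1, 2, 3)
var call = callExpression(
  memberExpression(identifier('mori'), identifier('vector')),
  elements
);

console.log(JSON.stringify(call, null, 2));
```

Note that the elements array is passed straight through as the arguments of the call, which is exactly the sharing of numeric literals between the two trees described above.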

You can also try this out in real time inside astexplorer by clicking on the “transform” dropdown and selecting “babelv6”.

Objects

Next let’s take a look at objects.

var foo = { bar: 1 };
// should become
var foo = mori.hashMap('bar', 1);

The object literal has a similar structure to the ArrayExpression we saw earlier.

{
  "type": "ObjectExpression",
  "properties": [
    {
      "type": "ObjectProperty",
      "key": {
        "type": "Identifier",
        "name": "bar"
      },
      "value": {
        "type": "NumericLiteral",
        "value": 1
      }
    }
  ]
}

This is quite straightforward. There is an array of properties, each with a key and a value. Now let’s highlight the corresponding Mori call to mori.hashMap('bar', 1) and see how that compares.

{
  "type": "CallExpression",
  "callee": {
    "type": "MemberExpression",
    "object": {
      "type": "Identifier",
      "name": "mori"
    },
    "property": {
      "type": "Identifier",
      "name": "hashMap"
    }
  },
  "arguments": [
    {
      "type": "StringLiteral",
      "value": "bar"
    },
    {
      "type": "NumericLiteral",
      "value": 1
    }
  ]
}

Again, let’s also look at a visual representation of these ASTs.

Object AST

Like before, we have a CallExpression wrapped around a MemberExpression which we can borrow from our array code, but we’ll have to do something a bit more complicated to get the properties and values into a flat array.

ObjectExpression: function(path) {
  var props = [];

  path.node.properties.forEach(function(prop) {
    props.push(
      t.stringLiteral(prop.key.name),
      prop.value
    );
  });

  path.replaceWith(
    t.callExpression(
      t.memberExpression(t.identifier('mori'), t.identifier('hashMap')),
      props
    )
  );
}

This is mostly quite similar to the implementation for arrays, except we have to convert the Identifier into a StringLiteral to prevent ourselves ending up with code that looks like this:

// before
var foo = { bar: 1 };
// after
var foo = mori.hashMap(bar, 1);
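The flattening step is easy to try on plain node objects. This sketch repeats the loop from the handler outside of Babel. The extra branch for keys that are already StringLiteral nodes (as in { "baz": 2 }) is an assumption about input that the handler above doesn’t cover, and flattenProperties is a hypothetical name rather than part of the original plugin.

```javascript
// Simplified node for the object literal { bar: 1, "baz": 2 }
var objectNode = {
  type: 'ObjectExpression',
  properties: [
    {
      type: 'ObjectProperty',
      key: { type: 'Identifier', name: 'bar' },
      value: { type: 'NumericLiteral', value: 1 }
    },
    {
      type: 'ObjectProperty',
      key: { type: 'StringLiteral', value: 'baz' },
      value: { type: 'NumericLiteral', value: 2 }
    }
  ]
};

// Turn [{ key, value }, ...] into a flat [key, value, key, value, ...] list
function flattenProperties(properties) {
  var flat = [];
  properties.forEach(function(prop) {
    // identifier keys become strings; string keys are passed through as-is
    var key = prop.key.type === 'Identifier'
      ? { type: 'StringLiteral', value: prop.key.name }
      : prop.key;
    flat.push(key, prop.value);
  });
  return flat;
}

var hashMapArgs = flattenProperties(objectNode.properties);
console.log(hashMapArgs.map(function(arg) { return arg.value; }));
```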

Finally, we’ll create a helper function for creating the Mori MemberExpressions that we will continue to use.

function moriMethod(name) {
  return t.memberExpression(
    t.identifier('mori'),
    t.identifier(name)
  );
}

// now rewrite
t.memberExpression(t.identifier('mori'), t.identifier('methodName'));
// as
moriMethod('methodName');

Now we can create some test cases and run them to see whether our plugin is working:

mkdir test
echo -e "var foo = { a: 1 };\nvar baz = foo.a = 2;" > test/case.ms
node run.js test/case.ms

You should see the following output to the terminal:

var foo = mori.hashMap("a", 1);
var baz = foo.a = 2;

Assignment

For our new Mori data structures to be effective we’ll also have to override the native syntax for assigning new properties to them.

foo.bar = 3;
// needs to become
mori.assoc(foo, 'bar', 3);

Rather than continue to include the simplified AST we’ll just work with the diagrams and plugin code for now, but feel free to keep running these examples through astexplorer.

Assignment AST

We’ll have to extract and translate nodes from each side of the AssignmentExpression to create the desired CallExpression.

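AssignmentExpression: function(path) {
  var lhs = path.node.left;
  var rhs = path.node.right;

  // only rewrite assignments to a property, such as foo.bar = 3
  if(t.isMemberExpression(lhs)) {
    // foo.bar means foo['bar'], so turn the name into a string;
    // computed keys such as foo[bar] are left untouched
    if(!lhs.computed && t.isIdentifier(lhs.property)) {
      lhs.property = t.stringLiteral(lhs.property.name);
    }

    path.replaceWith(
      t.callExpression(
        moriMethod('assoc'),
        [lhs.object, lhs.property, rhs]
      )
    );
  }
}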

Our handler for AssignmentExpressions makes a preliminary check to see whether the expression on the left hand side is a MemberExpression (because we don’t want to mess with plain assignments like a = 3). Then we replace the whole expression with a new CallExpression using Mori’s assoc method.

Like before, we also have to handle cases where an Identifier is used and convert it into a StringLiteral.

Now create another test case and run the code to see whether it works:

echo -e "foo.bar = 3;" >> test/case.ms
node run.js test/case.ms

$ mori.assoc(foo, "bar", 3);

Membership

Finally, we’ll also have to override the native syntax for accessing a member of an object.

foo.bar;
// needs to become
mori.get(foo, 'bar');

Here’s the visual representation for the two ASTs.

Member AST

We can almost use the properties of the MemberExpression directly. However, the property will come as an Identifier, so we’ll need to convert it.

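MemberExpression: function(path) {
  // assignments are handled by the AssignmentExpression visitor
  if(t.isAssignmentExpression(path.parent)) return;

  // foo.bar means foo['bar'], so turn the name into a string;
  // computed keys such as foo[bar] are left untouched
  if(!path.node.computed && t.isIdentifier(path.node.property)) {
    path.node.property = t.stringLiteral(path.node.property.name);
  }

  path.replaceWith(
    t.callExpression(
      moriMethod('get'),
      [path.node.object, path.node.property]
    )
  );
}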

The first important difference to note is that we’re exiting the function early if the parent of this node is an AssignmentExpression. This is because we want to let our AssignmentExpression visitor method deal with these cases.

This looks fine, but if you run this code, you’ll actually find yourself with a stack overflow error. This is because when we replace a given MemberExpression (foo.bar) we replace it with another one (mori.get). Babel then traverses this new node and passes it back into our visitor method recursively.

Hmm.

To get around this we can tag the return values from moriMethod and choose to ignore them in our MemberExpression method.

function moriMethod(name) {
  var expr = t.memberExpression(
    t.identifier('mori'),
    t.identifier(name)
  );

  expr.isClean = true;
  return expr;
}

Once it’s been tagged, we can add another return clause to our function.

MemberExpression: function(path) {
  if(path.node.isClean) return;
  if(t.isAssignmentExpression(path.parent)) return;

  // ...
}
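To see why the tag is enough to stop the recursion, here is a toy model of the problem that runs without Babel. It is a sketch under simplifying assumptions: visitMemberExpression stands in for the visitor method, and it re-visits the generated callee by hand to imitate Babel traversing the replacement node. The visitCount variable is only there to make the behaviour observable.

```javascript
// Toy model of "replace a node, then traverse the replacement".
// visitCount tracks how often the MemberExpression handler runs.
var visitCount = 0;

function moriMethod(name) {
  var expr = {
    type: 'MemberExpression',
    object: { type: 'Identifier', name: 'mori' },
    property: { type: 'Identifier', name: name }
  };
  expr.isClean = true; // tag nodes that we generated ourselves
  return expr;
}

function visitMemberExpression(node) {
  visitCount++;
  if (node.isClean) return node; // generated by us: leave it alone

  var replacement = {
    type: 'CallExpression',
    callee: moriMethod('get'),
    arguments: [
      node.object,
      { type: 'StringLiteral', value: node.property.name }
    ]
  };

  // imitate Babel visiting the new subtree: its callee is a MemberExpression too
  replacement.callee = visitMemberExpression(replacement.callee);
  return replacement;
}

// Simplified node for foo.bar
var result = visitMemberExpression({
  type: 'MemberExpression',
  object: { type: 'Identifier', name: 'foo' },
  property: { type: 'Identifier', name: 'bar' }
});

console.log(result.type, visitCount);
```

The handler runs twice: once for foo.bar and once for the generated mori.get, where the isClean check returns immediately. Without that check, every replacement would produce another mori.get to visit and the calls would never bottom out.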

Create a final test case and compile your code to check that it works.

echo -e "foo.bar" >> test/case.ms
node run.js test/case.ms

$ mori.get(foo, "bar");

All things being well, you now have a language that looks like JavaScript, but instead has immutable data structures by default, without compromising the original expressive syntax.

Conclusion

This was quite a code-heavy post, but we’ve covered all the basics for designing and building a Babel plugin that can be used to transform JavaScript files in a useful way. You can play with MoriScript in a REPL here and you can find the complete source on GitHub.

If you’re interested in going further and you want to read more about Babel plugins, then check out the fantastic Babel Handbook and refer to the babel-plugin-hello-world repository on GitHub. Or just read through the source code for any of the 700+ Babel plugins already on npm. There’s also a Yeoman generator for scaffolding out new plugins.

Hopefully this article has inspired you to write a Babel plugin! But before you head off to implement the next great transpile-to language, there are a few ground rules to be aware of. Babel is a JavaScript-to-JavaScript compiler. This means we can’t implement a language like CoffeeScript as a Babel plugin. We can only transform the slight superset of JavaScript that Babel’s parser can understand.

Here’s an idea for a novel plugin to get you started. You could abuse the bitwise | OR operator to create functional pipelines like you’d find in F#, Elm and LiveScript.

2 | double | square

// would become

square(double(2))

Or for example, inside an arrow function:

const doubleAndSquare = x => x | double | square

// would become

const doubleAndSquare = x => square(double(x));

// then use babel-preset-es2015

var doubleAndSquare = function doubleAndSquare(x) {
  return square(double(x));
};
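As a starting point, the core rewrite for the pipeline idea can be sketched on plain node objects before wiring it into a visitor. This is a hypothetical sketch, not a finished plugin: pipeToCall and print are illustrative names, and the printer only covers the three node types used in the example. It relies on | being left-associative, so 2 | double | square parses as (2 | double) | square.

```javascript
// Hypothetical sketch: rewrite `a | f | g` into `g(f(a))` on plain nodes.
function pipeToCall(node) {
  if (node.type !== 'BinaryExpression' || node.operator !== '|') {
    return node;
  }
  // `|` is left-associative, so the left side holds the rest of the pipeline
  return {
    type: 'CallExpression',
    callee: node.right,
    arguments: [pipeToCall(node.left)]
  };
}

// Minimal printer covering only the node types used here
function print(node) {
  switch (node.type) {
    case 'Identifier':
      return node.name;
    case 'NumericLiteral':
      return String(node.value);
    case 'CallExpression':
      return print(node.callee) + '(' + node.arguments.map(print).join(', ') + ')';
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

// Simplified AST for: 2 | double | square
var pipeline = {
  type: 'BinaryExpression',
  operator: '|',
  left: {
    type: 'BinaryExpression',
    operator: '|',
    left: { type: 'NumericLiteral', value: 2 },
    right: { type: 'Identifier', name: 'double' }
  },
  right: { type: 'Identifier', name: 'square' }
};

console.log(print(pipeToCall(pipeline))); // square(double(2))
```

In a real plugin the same logic would live in a BinaryExpression visitor method, and it would take over every use of |, so genuine bitwise OR would no longer be available in the transformed code.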

Once you understand the rules, the only limits are the parser and your imagination.

Have you made a Babel plugin you want to share? Let me know in the comments.

Frequently Asked Questions (FAQs) about Understanding ASTs and Building a Babel Plugin

What is an Abstract Syntax Tree (AST) and why is it important in programming?

An Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. ASTs are important in programming because they help in understanding the structure and flow of the code. They are used in various applications such as code linting, code transformation, and even in the process of minification. Understanding ASTs can significantly improve your ability to work with certain tools and can also help you in writing more efficient code.

How does Babel use ASTs?

Babel is a popular JavaScript compiler that uses ASTs to transform your code. It parses your code into an AST, traverses through the AST to apply any transformations, and then generates the transformed code back. This process allows Babel to convert ES6 code into a format that older browsers can understand, or even transform JSX into regular JavaScript.

How can I visualize an AST?

To visualize an AST, you can use online tools like AST Explorer. You simply need to paste your code into the tool, and it will generate the corresponding AST. This can be extremely helpful when you’re trying to understand how your code is being interpreted, or when you’re trying to debug a complex piece of code.

What is a Babel plugin and how can I create one?

A Babel plugin is a small JavaScript program that instructs Babel on how to transform code. Creating a Babel plugin involves defining a visitor object that will be used to traverse the AST. The visitor object has methods that correspond to different types of nodes in the AST. These methods are called when a node of the corresponding type is found, allowing you to transform the code.

How can I use Babel plugins to transform my code?

Once you’ve created a Babel plugin, you can use it to transform your code by adding it to your Babel configuration. When Babel compiles your code, it will use the plugin to transform the AST before generating the final code. This can be used to implement custom transformations that aren’t possible with Babel’s built-in plugins.

What is the difference between Babel’s parse and parseExpression methods?

The parse method in Babel is used to parse a piece of code into an AST. On the other hand, the parseExpression method is used to parse a single expression into an AST. The main difference between the two is that parse can handle a full program, while parseExpression can only handle a single expression.

How can I traverse an AST using Babel?

Babel provides a traverse method that you can use to traverse an AST. This method takes an AST and a visitor object as arguments. The visitor object should have methods that correspond to the types of nodes you want to visit. When a node of a certain type is found, the corresponding method on the visitor object is called.

What is the role of AST in code linting?

In code linting, an AST is used to check the code for potential errors and enforce a certain coding style. The linter parses the code into an AST and then traverses the AST to check for any nodes that violate the rules. This allows the linter to provide accurate and helpful error messages.

How can I use ASTs to improve my code?

Understanding ASTs can help you write more efficient and cleaner code. By visualizing your code as an AST, you can gain a better understanding of how your code is structured and how different parts of your code interact with each other. This can help you identify potential issues and areas for improvement in your code.

What are some common use cases for Babel plugins?

Babel plugins are commonly used for transforming code to ensure compatibility with older browsers, transforming JSX into regular JavaScript, and implementing custom transformations that aren’t possible with Babel’s built-in plugins. They can also be used for code linting and enforcing coding styles.