An Alternative to Regular Expressions: apg-exp

This article was peer reviewed by Sebastian Seitz and Almir Bijedic. Thanks to all of SitePoint’s peer reviewers for making SitePoint content the best it can be!

Hardly any programmer escapes the need to use regular expressions in one form or another from time to time. For many, the pattern syntax can seem cryptic and forbidding. This tutorial will introduce a new pattern-matching engine, apg-exp—a feature-rich alternative to RegExp with an ABNF pattern syntax that is a little easier on the eyes.

A Quick Comparison

Have you ever needed to verify an email address and come across something like this?

^[\w!#$%&'*+/=?^_`{|}~-]+(?:\.[\w!#$%&'*+/=?^_`{|}~-]+)*@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}$

A pattern-matching engine is the right tool for the job. This is a well-designed, well-written regular expression. It works great. So what’s not to like?

Well, if you are an expert with regular expressions, nothing at all. But for the rest of us, they may be

Hard to read
Even harder to write
Hard to maintain

The regular expression syntax has a long, time-honored history and is deeply integrated into many of the tools and languages that we, as programmers, use every day.

There is, however, an alternative syntax that has been around almost as long, is very popular with writers and users of Internet technical specifications, has all the power of regular expressions but is seldom used in the world of JavaScript programming. Namely, the Augmented Backus-Naur Form, or ABNF, formally defined by the IETF in RFC 5234 and RFC 7405.

Let’s see what that same email address might look like in ABNF.

email-address   = local "@" domain
local           = local-word *("." local-word)
domain          = 1*(sub-domain ".") top-domain
local-word      = 1*local-char
sub-domain      = 1*sub-domain-char
top-domain      = 2*6top-domain-char
local-char      = alpha / num / special
sub-domain-char = alpha / num / "-"
top-domain-char = alpha
alpha           = %d65-90 / %d97-122
num             = %d48-57
special         = %d33 / %d35 / %d36-39 / %d42-43 / %d45 / %d47 
                / %d61 / %d63 / %d94-96 / %d123-126

Not as compact, for sure, but like HTML and XML it is designed to be read by humans as well as machines. I’m guessing that with nothing more than a passing knowledge of wild card search patterns, you can just about read what is going on here in “plain English”.

the email address is defined as a local part and a domain separated by @
the local part is one word followed by optional dot-separated words
the domain is one or more dot-separated sub-domains followed by a single top domain
the only things you might not know here, but can probably guess, are:
- just as the wild card character * means “zero or more”, 1* means “one or more” and 2*6 means min 2 and max 6 repetitions
- / separates alternate choices
- %d defines decimal character codes and character code ranges
- for example, %d35 represents #, ASCII decimal 35
- %d65-90 represents any character in the range A-Z, ASCII decimals 65-90

RegExp and apg-exp are compared for this email address in example 1.

apg-exp is a pattern-matching engine designed to have the look and feel of RegExp but to use the ABNF syntax for pattern definitions. In the next few sections I’ll walk you through:

How to get apg-exp into your app
A short guide to the ABNF syntax
Working with apg-exp—a few examples
Where to go next—more details, advanced examples

Up and Running—How to Get It

npm

If you are working in a Node.js environment, from your project directory run:

npm install apg-exp --save

You can then access it in your code with require().

For example:

var ApgExp = require("apg-exp");
var exp = new ApgExp(pattern, flags);
var result = exp.exec(stringToMatch);

GitHub

To get a copy of the code from GitHub, you can clone the repository to your project directory:

git clone https://github.com/ldthomas/apg-js2-exp.git apg-exp

or download it as a zip file.

Then in page.html:

<!-- optional stylesheet used in tutorial examples -->
<link rel="stylesheet" href="./apg-exp/apgexp.css">
<script src="./apg-exp/apgexp-min.js"></script>

<script>
  var useApgExp = function(){
      var exp = new ApgExp(pattern, flags); 
      var result = exp.exec(stringToMatch);
      /* do something with the result */
  }
</script>

CDN

You can also create a CDN version directly from the GitHub source using RawGit. However, be sure to read the no uptime or support guarantees (In fact, be sure to read the entire FAQ).

The following are used in all of the examples in this tutorial.

<link rel="stylesheet"
 href="https://cdn.rawgit.com/ldthomas/apg-js2-exp/89c6681798ba9e47583b685c87b244406b18a26d/apgexp.css">
<script
 src="https://cdn.rawgit.com/ldthomas/apg-js2-exp/c0fc3adac954a6f6ad6f265fd2f8f06f68001e10/apgexp-min.js"
 charset="utf-8"></script>

These files are cached on the MaxCDN servers and you are free to use them for testing as long as they remain available. However, for production, you should place copies of apgexp-min.js and apgexp.css on your own servers for guaranteed access and include them in your pages as best suited to your application.

A Short Guide to ABNF

ABNF is a syntax to describe phrases, a phrase being any string. As you saw in the email example above, it allows you to break down complex phrases into a collection of simpler phrases. A phrase definition has the form:

name = elements LF

where LF is a line feed (newline \n) character.

The table below is a short guide to the elements (see SABNF for the full guide).

Element	Definition	Comments/Examples
name	rule name	alphanum + hyphen (see note 1 below)
%d32	single character code	decimal value of the character code
%d97.98.99	string of character codes	abc
%d48-57	character code range	any ASCII digit 0-9
“abc”	case-insensitive literal string	abc or ABC, etc.
%s”aBc”	case-sensitive literal string	aBc only (= %d97.66.99)
space	concatenates two elements	%d97 %d98 (= ab)
/	separates two alternate elements	%d97 / %d98 (= a or b)
*element	zero or more repetitions of element	(see note 2 below)
(elements)	grouping, treated as single element	(see note 3 below)
[elements]	optional grouping	[%d97] %d98 (ab or b)
%^	beginning of input string	matches position only as empty phrase
%$	end of input string	matches position only as empty phrase
&element	look ahead for element	element must follow current string position
!element	look ahead for no element	element must NOT follow current string position
&&element	look behind for element	element must precede current string position
!!element	look behind for no element	element must NOT precede current string position
\name	back reference to rule “name”	matches previous phrase found for “name”
;comment	comment	comment runs from ; to end of line

Note 1: Rule names must begin with an alphabetic character in the first column of the line. Continuation lines must begin with a space or tab. The first rule defines the entire phrase or pattern to be matched. The following rules define the named sub-phrases (or named capture) within the entire phrase.

Note 2: The general form of a repetition is n*m, defining a minimum of n and maximum of m repetitions. Short hand notation can be *m for zero to m, n* for n to infinity or just nfor n*n.

Note 3: Groupings are important for keeping alternation and concatenation behaving as expected. Concatenation has tighter binding than alternation. If

phrase1 = elem (foo / bar) blat LF
phrase2 = (elem foo) / (bar blat) LF
phrase3 = elem foo / bar blat LF

then, phrase1 matches elem foo blat or elem bar blat and both phrase2 and phrase3 match elem foo or bar blat. Be careful and use groupings liberally.

Using apg-exp—A Few Examples

Now that you have apg-exp included in your app and know the basics of writing the pattern syntax, let’s go straight to the fun part and look at some examples of how to use it.

I’ve put the boring details of the constructed object in the next section, below. You can refer to it as needed to understand the examples. I’ll also skip error handling but you should be aware that if there is a pattern error, the constructor will throw an ApgExpError exception object which has a couple of handy functions for formatted display of the pattern errors. Your try/catch block might look something like this:

try {
  var exp = new ApgExp(pattern, flags);
  var result = exp.exec(stringToMatch);
  if (result) {
    // do something with results
  } else {
    // handle failure
  }
} catch(e) {
  if (e.name === "ApgExpError") {
    // display pattern errors to console
    console.log(e.toText());
    // display pattern errors to HTML page
    $("#errors").html(e.toHtml());
  } else {
    // handle other exceptions
  }
}

Telephone Numbers

Telephone numbers present a common form validation problem and a good opportunity to present some basics. Often you just want to know that 10 digits have been entered in a form that looks something like a phone number without worrying about the details of the North American Plan or international numbers.

Let’s just require that it starts with a parenthesis, (, or a digit and has blocks of 3, 3 & 4 digits separated by zero to three non-digits. That’s general enough to accept the most common formats and specific enough to grab the area, office and subscriber codes as capture groups for any reformatting.

Here’s what it looks like in ABNF:

phone-number = ["("] area-code sep office-code sep subscriber
area-code    = 3digit                       ; 3 digits
office-code  = 3digit                       ; 3 digits
subscriber   = 4digit                       ; 4 digits
sep          = *3(%d32-47 / %d58-126 / %d9) ; 0-3 ASCII non-digits
digit        = %d48-57                      ; 0-9

Example 2 demonstrates this and gives you an opportunity to vary the phone number formats:

Example 3 let’s you play with the pattern syntax and get a look at the error messages you might expect to see:

The RegExp syntax,

\(?(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})

is not difficult either and example 4 gives a side-by-side comparison.

Dates

Now let’s ramp it up a little an see how apg-exp and RegExp compare when matching dates.

Our date format requirements are:

mm/dd/yy or mm/dd/yyyy
dd/mm/yy or dd/mm/yyyy
mm, 1-12 or 01-12, i.e. with or without leading zero
dd, 1-31 or 01-31, i.e. with or without leading zero
yy, 00-99
yyyy, 1900-1999 or 2000-2099

The mm/dd/yyyy format itself is not so difficult, but constraining the numeric ranges is what ramps it up. Getting there with ABNF looks like this:

date     = %^ (mm-first / dd-first) %$
mm-first = mm "/" dd "/" yyyy   ; month before day
dd-first = dd "/" mm "/" yyyy   ; day before month
dd       = "0" digit1           ;    01-09
         / ("1"/"2") digit      ; or 10-29
         / "3" %d48-49          ; or 30-31
         / digit1               ; or 1-9
mm       = "0" digit1           ;    01-09
         / "1" %d48-50          ; or 10-12
         / digit1               ; or 1-9
yyyy     = ("19" / "20") 2digit ; 1900-1999 or 2000-2099
         / 2digit               ; or 00-99
digit    = %d48-57              ; 0-9
digit1   = %d49-57              ; 1-9

The comments make it fairly self-explanatory. Note that dd, mm and yyyy have the shortest alternative last. This is very important, since apg-exp always takes a “first match wins” approach to the alternatives. Alternates are tried left to right and once a match is found any remaining alternates are ignored. Here, this means that when a pattern can match one or two digits, the two-digit pattern needs to be first.

Following exactly the same approach as above, breaking down the date into alternate patterns of dd, mm and [yy]yy then combining them for the full date results in the following RegExp syntax:

^(?:((?:0[1-9])|(?:1[0-2])|(?:[1-9]))/((?:0[1-9])|(?:(?:1|2)[0-9])|(?:3[0-1])|(?:[1-9]))|((?:0[1-9])|(?:(?:1|2)[0-9])|(?:3[0-1])|(?:[1-9]))/((?:0[1-9])|(?:1[0-2])|(?:[1-9])))/((?:19|20)?[0-9][0-9])$

I’m not a RegExp expert, so there may be ways to shorten this, but this one works. You can compare for yourself in example 5. Jump over there and give it a work out.

Matching nested pairs (()) with recursion

Finally, I’d like to show how to match nested pairs of parentheses, brackets and the like. While this is a very important pattern-matching problem, you can’t do it with RegExp. Consider the following ABNF for matched pairs of parentheses:

P = L P R / L R
L = "("
R = ")"

Notice that the rule P appears in its own definition. This is called recursion. And while some flavors of regular expression engines do support recursion and some RegExp tools do provide a measure of recursion capabilities, JavaScript’s RegExp has no support for it at all. L and R above have been chosen to match parentheses, but they can be most anything as long as L can’t match the same thing as R.

Head over to example 6 and we’ll have some fun with recursion.

Before leaving the subject of matching, nested pairs, I’d like to demonstrate a couple of real-world examples that apg-exp can help you with.

In example 6 you saw how to match pairs within pairs and include text between the brackets. Suppose your assignment were to write a program such that: if the cursor is on an opening curly bracket, {, highlight the matching closing bracket, } .

Example 7 shows you a solution to this problem. You will need to understand sticky mode.

One final nested-pair example. Did you ever want to comment out a large block of HTML only to find that there were already comments in the block? Frustrating? Just do an Internet search for “HTML nested comment problems” and feel the frustration. Example 8 shows you one possible solution to this problem. Warning—this one might stretch you a little. You will need to understand the result.rules object and global mode.

Bringing it all together

To close out and bring it all together let’s see how a full form-validation example might look.

Typically when creating a new account you are asked for user name, email address, password and password confirmation. Let’s require that user names be 3-32 ASCII letters, hyphens and periods. The password must be 8-16 upper case letters, lower case letters or digits and must have at least one of each.

The form will display a descriptive error message above any invalid entry before continuing. Example 9 brings it all together.

See the Pen apg-exp: A RegExp Alternative: Example 9 by SitePoint (@SitePoint) on CodePen.

The library API

I’ve tried to provide enough information in this section to understand the examples above. However, there are a number of advanced features that require a more in-depth understanding of parsing theory to use and I’ve just indicated those with an “advanced” comment.

The input arguments

The apg-exp constructor, ApgExp here, takes up to four arguments.

var exp = new ApgExp(pattern[, flags[, nodeHits[, treeDepth]]]);

pattern
- string: An SABNF pattern syntax* as described in the previous section.
- object: An instantiated APG parser object. (Advanced option.)
flags: string: any of the characters "gyud"
- g – global mode: Iterate over all pattern matches (see Example 8)
- y – sticky mode: Anchors the match at exp.lastIndex (see Example 7)
- u – Unicode mode: Returns results as arrays of integer character codes instead of strings
- d – debug mode: Advanced option – exposes APG trace object
nodeHits: integer > 0: default = Infinity: Limits the matching algorithm to "nodeHits" steps. Protects against catastrophic backtracking.
treeDepth: integer > 0: default = Infinity: Limits the matching parser’s tree depth.

* For example, to match an alphanumeric name you might use the input string:

var pattern = "alphanum = alpha *(alpha / num)\n"
            + "alpha = %d65-90 / %d97-122\n"
            + "num = %d48-57\n";

Properties and methods

The constructed object, exp, itself has 16 properties and 14 methods.

Property	Type	Description
ast	object	(Advanced: the APG Abstract Syntax Tree object)
debug	boolean	true if the debug flag, `d`, was set, false otherwise
flags	string	a re-formatted copy of the flags argument
global	boolean	true if the global flag, `g`, was set, false otherwise
input	string/array(*)	a copy of the input string
lastIndex	integer	the character index to begin the match attempt at(**)
leftContext	string/array(*)	prefix of the matched pattern
lastMatch	string/array(*)	the matched pattern (=`result[0]`)
nodeHits	integer	the input value (see `result.nodeHits` for actual value)
rightContext	string/array(*)	the suffix of the matched pattern
rules	object	See `result.rules`. Only the last matched phrase is retained here.
source	string	the input SABNF pattern syntax string
sticky	boolean	true if the sticky flag, `y`, was set, false otherwise
trace	object	(Advanced: see the APG trace object.) `undefined` if the debug flag is false.
treeDepth	integer	the input value (see `result.treeDepth` for actual value)
unicode	boolean	true if the Unicode flag, `u`, was set, false otherwise

Note 1: If the unicode flag is true this will be an array of integer character codes, otherwise, a string.

Note 2: The user is free to set this to any value. The pattern match always begins at this index value. Its use and value after the match attempt are affected by global and sticky mode. See examples 7 and 8.

Method	Arguments	Description
defineUdt(name, func)	string, function	Advanced: used for SABNF’s UDT feature.
exclude([string])	array of rule names	a list of rule names to exclude from the `result.rules` object
exec(str)	string	attemps a pattern match in str
include([string])	array of rule names	list of rule names to include exclusively in the `result.rules` object
maxCallStackDepth()	none	returns upper bound on the call stack depth
replace(str, repl)	string, string or function	match patten in `str`, replace with `repl`
sourceToHtml()	none	returns input pattern in HTML format
sourceToHtmlPage()	none	returns input pattern as complete HTML page
sourceToText()	none	returns input pattern in ASCII text format
split(str[, limit])	string, integer	splits input string at pattern matches
test(str)	string	returns `true` if a pattern match is found, `false` otherwise (See example 9.)
toHtml()	none	returns this object’s properties in HTML format
toHtmlPage()	none	returns this object’s properties as complete HTML page
toText()	none	returns this object’s properties in ASCII text format

The result object

A successful pattern match returns the results in a result object with 7 properties and 3 methods:

var result = exp.exec(str);

Property	Type	Description
[0]	string/array(*)	the matched pattern
input	string/array(*)	a copy of the input string
index	integer	index of first character of the matched pattern
length	integer	the number of characters in the matched pattern
nodeHits	integer	the actual number of parsing steps required for the match
rules	object	(named capture) An array of all matched phrases for each named rule(***)
treeDepth	integer	the actual maximum parse tree depth reached during the match

Note 3: For example, result.rules["name"][i] = {phrase: string/array, index: integer} See example 8.

Method	Arguments	Description
toHtml()	none	returns properties in HTML format
toHtmlPage()	none	returns properties as a complete HTML page
toText()	none	returns properties in ASCII text format

Where to Go Next

I’ve tried to keep it to a minimum here just to give you a taste of what the apg-exp library and ABNF look like and how they stack up against RegExp. But there is much, much more you can do. If you want to improve your pattern matching skills or you are just feeling adventurous take look at these more advanced examples and refer to the full user’s guide.

Have I convinced you to give apg-exp a try in your next project? Do you think ABNF is easier to work with than regular expressions? I’d love to hear from you in the comments below!

Frequently Asked Questions (FAQs) about Alternatives to Regular Expressions

What are the limitations of using regular expressions?

Regular expressions, while powerful, have certain limitations. They can become complex and difficult to understand, especially for larger patterns. This complexity can lead to errors and make debugging difficult. Regular expressions also lack the ability to process recursive patterns and can be inefficient for certain tasks, leading to slow performance.

Why should I consider alternatives to regular expressions?

If you’re dealing with complex patterns or large amounts of data, alternatives to regular expressions can offer more efficiency and readability. They can also handle recursive patterns, which regular expressions cannot. Additionally, these alternatives often provide more user-friendly syntax and better error handling.

What are some popular alternatives to regular expressions?

There are several alternatives to regular expressions, including string methods, parsing libraries, and domain-specific languages (DSLs). String methods are simple and efficient for basic tasks. Parsing libraries, such as PEG.js and Nearley, offer more power and flexibility. DSLs, like XPath and CSS Selectors, are designed for specific tasks and can be more intuitive to use.

How do string methods compare to regular expressions?

String methods, such as “indexOf”, “substring”, and “split”, are simpler and more straightforward than regular expressions. They are ideal for basic tasks like finding a character in a string or splitting a string into an array. However, they lack the power and flexibility of regular expressions for more complex pattern matching.

What are parsing libraries and how do they work?

Parsing libraries, like PEG.js and Nearley, are tools that can analyze and interpret code or data. They work by breaking down the input into smaller parts, analyzing each part, and building a data structure. This process allows them to handle complex patterns and recursive structures, which regular expressions cannot.

Can you give an example of a domain-specific language (DSL)?

XPath and CSS Selectors are examples of domain-specific languages. XPath is used for selecting nodes in an XML document, while CSS Selectors are used for selecting elements in an HTML document. These DSLs are designed for specific tasks and can be more intuitive and efficient than regular expressions for those tasks.

What are the benefits of using a DSL over regular expressions?

DSLs are designed for specific tasks, making them more intuitive and efficient for those tasks than regular expressions. They also offer better error handling and can handle recursive patterns. However, they may not be as flexible or powerful as regular expressions for other tasks.

How can I choose the best alternative to regular expressions for my needs?

The best alternative depends on your specific needs. If you’re doing basic tasks, string methods may be sufficient. For more complex patterns or large amounts of data, a parsing library or DSL might be more suitable. Consider factors like the complexity of the task, the size of the data, and your familiarity with the tool.

Are there any resources for learning more about these alternatives?

Yes, there are many resources available online. The documentation for each tool is a great place to start. There are also tutorials, blog posts, and forums where you can learn more and ask questions.

Can I use multiple alternatives together?

Yes, you can use multiple alternatives together to suit your needs. For example, you might use string methods for simple tasks and a parsing library for more complex patterns. The key is to choose the right tool for the task.