First syntax highlighter

I am trying to create a syntax highlighter for JS practice.

This is the code:

<html lang="en" dir="ltr">
	<head>
		<style media="screen">
			code {
				display: block;
				margin: auto;
				white-space: pre-wrap;
				border: 1px solid #000;
				padding: 10px;
				line-height: 1.5em;
				font-family: "Lucida Console", Monaco, monospace;
				max-width: 900px;
			}
			.code-elem {
				color: #D43B07;
				font-weight: bold;
			}
			.code-str {
				color: #090;
			}
			.margin {
				margin: auto;
			}
		</style>
		<script type="text/javascript">
			function syntaxhighlights() {
					var ca = document.getElementsByTagName("code");
					for(var i=0; i<ca.length; i++){
						var data =  ca[i].innerHTML;
						data = data.replace(/&quot;(.*?)&quot;/g,'<span class="code-str">&quot;$1&quot;</span>');
						data = data.replace(/&lt;(.*?)&gt;/g,'<span class="code-elem">&lt;$1&gt</span>');
						ca[i].innerHTML = data;
					}
			}
			window.addEventListener("load", syntaxhighlights);
		</script>
	</head>
	<body>
		<h2 class="margin">Code Example:</h2>
		<code>&lt;h2 id="h2class"&gt;Welcome Vistors&lt;/h2&gt;<br>&lt;p&gt;This place is a dream. Only a sleeper considers it real. Then death comes like dawn, and you wake up laughing at what you thought was your grief.&lt;/p&gt;
		</code>
	</body>
</html>

But the string part, h2class for example, is not turning green.

No it won’t, because &quot; is quite different from a " doublequote.

Thanks for joining the thread, sir. Is there some way to get it working?

You could use &quot; instead of " or " instead of &quot;. Those are two different options that can get it working.

Edit: Using &quot; seems to be troublesome, so stick with the " doublequote and you should be fine.

Please forgive me, I am slightly confused. Could you please break that statement down?

Sure thing. You could either use &quot; instead of ", or you could use " instead of &quot;.

I recommend the latter option.
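
Here is a minimal sketch of why the original pattern found nothing. It assumes, as discussed above, that the text read back from innerHTML contains a literal " rather than the &quot; entity. The variable names are only for illustration.

```javascript
// Assumed shape of what innerHTML returns for the <code> element:
// the angle brackets stay as entities, the quotes come back as plain ".
var fromInnerHTML = '&lt;h2 id="h2class"&gt;Welcome&lt;/h2&gt;';

// The entity-based pattern has nothing to match in that text.
var entityMatches = fromInnerHTML.match(/&quot;(.*?)&quot;/g);

// The plain-quote pattern finds the attribute value.
var quoteMatches = fromInnerHTML.match(/"(.*?)"/g);

console.log(entityMatches); // null
console.log(quoteMatches);  // an array containing '"h2class"'
```

With the g flag, match() returns either null or an array of the full matches, which makes it a quick way to check whether a pattern finds anything at all.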

Not sure if this is what you are looking for

data = data.replace(/(['"])([^'"]*)\1/g,`<span class='code-str'>&quot;$2&quot;</span>`)
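
A small sketch of how that pattern behaves, in case it helps: the first group captures the opening quote, \1 requires the same character to close it, and $2 is the text in between. Note that the replacement always writes &quot;, so single-quoted strings come out with double quotes. The function name highlightStrings is hypothetical.

```javascript
// Hypothetical wrapper around the replace call from the post above.
function highlightStrings(data) {
    return data.replace(
        /(['"])([^'"]*)\1/g,
        "<span class='code-str'>&quot;$2&quot;</span>"
    );
}

console.log(highlightStrings('id="h2class"'));
// id=<span class='code-str'>&quot;h2class&quot;</span>

console.log(highlightStrings("id='h2class'"));
// id=<span class='code-str'>&quot;h2class&quot;</span>

// Mismatched quotes are left alone because \1 cannot be satisfied.
console.log(highlightStrings("id='h2class\""));
// id='h2class"
```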

But sir, please see the image. Am I not using the same thing already and still not getting the result, or am I missing what you are trying to say?

This was creating the issue:

/&quot;(.*?)&quot;/g,

This is the fix:

/"(.*?)"/g,

I read your comments carefully and realized the mistake.

@Paul_Wilkins Why was I making such mistakes? I think I am missing some fundamentals. Is there anything I need to know and study? Could you direct me to some online resources to read?

Proper syntax highlighting is difficult and complex. You need to break the content down into tokens and other hierarchical structures.

What can help, though, is to look at how other code libraries achieve syntax highlighting, such as Prism, lolight, or McHighlight.

A deeper dive into the nuts and bolts of syntax highlighting is given in the following article too:

Implementing a Syntax-Highlighting JavaScript Editor—In JavaScript
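
To give a rough idea of what "breaking the content into tokens" can look like, here is a minimal sketch. It is not how Prism or the other libraries do it; the function names tokenize and render, and the token types, are made up for illustration. It assumes the input is escaped HTML source as read from innerHTML, and it would wrongly treat a stray &gt; in ordinary text as tag punctuation.

```javascript
// Hypothetical tokenizer: splits escaped HTML source into
// "string", "tag" and "text" tokens in a single pass.
function tokenize(source) {
    // Group 1: a double-quoted string. Group 2: tag punctuation.
    var pattern = /("[^"]*")|(&lt;\/?[\w-]+|\/?&gt;)/g;
    var tokens = [];
    var last = 0;
    var match;
    while ((match = pattern.exec(source)) !== null) {
        // Anything between two matches is plain text.
        if (match.index > last) {
            tokens.push({ type: "text", text: source.slice(last, match.index) });
        }
        tokens.push({ type: match[1] ? "string" : "tag", text: match[0] });
        last = pattern.lastIndex;
    }
    if (last < source.length) {
        tokens.push({ type: "text", text: source.slice(last) });
    }
    return tokens;
}

// Wraps each token in a span using the class names from the original CSS.
function render(tokens) {
    var classes = { string: "code-str", tag: "code-elem" };
    return tokens.map(function (token) {
        var cls = classes[token.type];
        return cls
            ? '<span class="' + cls + '">' + token.text + "</span>"
            : token.text;
    }).join("");
}

var tokens = tokenize('&lt;h2 id="h2class"&gt;Hi&lt;/h2&gt;');
console.log(tokens.map(function (t) { return t.type; }).join(" "));
// tag text string tag text tag tag
```

Because each piece of the source is classified exactly once, the markup added for one token can never be picked up by the pattern for another, which is a common problem with chained replace calls.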

Thanks, very enlightening, but sir, I was facing issues in this part:

data.replace(/"(.*?)"/g,'<span class="code-str">&quot;$1&quot;</span>');

Is there anything you would recommend reading so that I do not make mistakes in this part?

Using tests to exercise expected behaviour of your code is a very good way to ensure that your code continues to work correctly as you work on it.

Thanks for the input.

I was coming up with a few simple tests to exercise your code.

Making the code easier to test

To easily test the code, it helps if the functions require fewer assumptions. This was easily achieved by moving your conversion code out to a separate syntaxHighlight() function.

function syntaxHighlight(data) {
    data = data.replace(/"(.*?)"/g,'<span class="code-str">&quot;$1&quot;</span>');
    data = data.replace(/&lt;(.*?)&gt;/g,'<span class="code-elem">&lt;$1&gt</span>');
    return data;
}
function syntaxhighlights() {
    var ca = document.getElementsByTagName("code");
    for(var i=0; i<ca.length; i++){
        ca[i].innerHTML = syntaxHighlight(ca[i].innerHTML);
    }
}

That way, the syntaxHighlight function can be given a simple string as data, and the returned value can be easily checked.

Testing quote highlight

Here is the quote highlight test:

console.assert(
    syntaxHighlight('<h2 id="h2class">') === 
    '<h2 id=<span class="code-str">&quot;h2class&quot;</span>>',
    "Should highlight quotes"
);

And it works well, only outputting to the console when the test fails.

Testing tag highlight

Here is the tag highlight test:

console.assert(
    syntaxHighlight('&lt;p&gt;') === 
    '<span class="code-elem">&lt;p&gt;</span>',
    "Should highlight tags"
);

This test failed, outputting to the console that it should highlight tags.

Was something wrong with my test, or was it the code?

Investigating the cause

We can log out the result from syntaxHighlight to find out more details about things.

console.log(syntaxHighlight('&lt;p&gt;'));

And we get the following output:

<span class="code-elem">&lt;p&gt</span>

Do you see how the &gt is missing its semicolon? It should be &gt; instead.
Let’s investigate the code to find out why that’s happening.

Here is the line responsible:

    data = data.replace(/&lt;(.*?)&gt;/g,'<span class="code-elem">&lt;$1&gt</span>');

Fixing the cause of the problem

Near the end of the above line of code where it has &gt</span>, that should instead be &gt;</span>

    data = data.replace(/&lt;(.*?)&gt;/g,'<span class="code-elem">&lt;$1&gt;</span>');

And the asserts no longer log out any problems. The code is now better than it was.
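
A failing console.assert only prints its message, which is why the extra console.log was needed to investigate. As a possible shortcut, here is a sketch of a small helper that reports both values when they differ. The name assertEqual is hypothetical, not a built-in function.

```javascript
// Hypothetical helper: returns true on a match,
// otherwise logs the message with both values and returns false.
function assertEqual(actual, expected, message) {
    if (actual === expected) {
        return true;
    }
    console.error(
        message + "\n  expected: " + expected + "\n  actual:   " + actual
    );
    return false;
}

// Same conversion as above, with the &gt; fix applied.
function syntaxHighlight(data) {
    data = data.replace(/"(.*?)"/g, '<span class="code-str">&quot;$1&quot;</span>');
    data = data.replace(/&lt;(.*?)&gt;/g, '<span class="code-elem">&lt;$1&gt;</span>');
    return data;
}

var tagsOk = assertEqual(
    syntaxHighlight('&lt;p&gt;'),
    '<span class="code-elem">&lt;p&gt;</span>',
    "Should highlight tags"
);
```

With the earlier buggy line, the logged "actual" value would have shown the missing semicolon straight away.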

Summary

That has been a good demonstration of using tests. They help to ensure that the code works in expected and desirable ways.

The full code that was used is:

function syntaxHighlight(data) {
    data = data.replace(/"(.*?)"/g,'<span class="code-str">&quot;$1&quot;</span>');
    data = data.replace(/&lt;(.*?)&gt;/g,'<span class="code-elem">&lt;$1&gt;</span>');
    return data;
}
function syntaxhighlights() {
    var ca = document.getElementsByTagName("code");
    for(var i=0; i<ca.length; i++){
        ca[i].innerHTML = syntaxHighlight(ca[i].innerHTML);
    }
}
window.addEventListener("load", syntaxhighlights);

// Tests
console.assert(
    syntaxHighlight('<h2 id="h2class">') === 
    '<h2 id=<span class="code-str">&quot;h2class&quot;</span>>',
    "Should highlight quotes"
);
console.assert(
    syntaxHighlight('&lt;p&gt;') === 
    '<span class="code-elem">&lt;p&gt;</span>',
    "Should highlight tags"
);

The code with the asserts can be found at https://jsfiddle.net/rht5f976/?editor_console=0

Now that we have those tests in place, we can more easily work with the code while retaining the assurance that things work properly.

Updating element collection

For example, we can replace getElementsByTagName with querySelectorAll instead:

    var ca = document.querySelectorAll("code");

Using forEach method

Because querySelectorAll returns a NodeList, which has its own forEach method, we can use forEach to loop through the elements:

    // for(var i=0; i<ca.length; i++){
    //     ca[i].innerHTML = syntaxHighlight(ca[i].innerHTML);
    // }
    ca.forEach(function (code) {
        code.innerHTML = syntaxHighlight(code.innerHTML);
    });
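
One caveat: NodeList.forEach is missing in some older browsers such as Internet Explorer. If that matters, a possible workaround is to borrow forEach from Array.prototype. The sketch below uses a hypothetical highlightAll function, and plain objects stand in for the code elements so that it can run outside a browser.

```javascript
// Hypothetical helper: applies a highlight function to every element
// in any array-like collection (Array, NodeList, HTMLCollection).
function highlightAll(elements, highlight) {
    Array.prototype.forEach.call(elements, function (element) {
        element.innerHTML = highlight(element.innerHTML);
    });
}

// Stand-ins for DOM elements, only for this demonstration.
var fakeElements = [{ innerHTML: "&lt;p&gt;" }, { innerHTML: "&lt;h2&gt;" }];

highlightAll(fakeElements, function (html) {
    return html.replace(/&lt;(.*?)&gt;/g, '<span class="code-elem">&lt;$1&gt;</span>');
});

console.log(fakeElements[0].innerHTML);
// <span class="code-elem">&lt;p&gt;</span>
```

In the page itself the call would be highlightAll(document.querySelectorAll("code"), syntaxHighlight).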

Trying different regular expressions

I notice that the regular expressions are duplicating some things such as &lt;. Can they be included in the capture group?

    // data = data.replace(/&lt;(.*?)&gt;/g,'<span class="code-elem">&lt;$1&gt;</span>');
    data = data.replace(/(&lt;.*?&gt;)/g,'<span class="code-elem">$1</span>');

Yes, that works well.

Can the same be done with the quotes?

    // data = data.replace(/"(.*?)"/g,'<span class="code-str">&quot;$1&quot;</span>');
    data = data.replace(/&quot;(.*?)&quot;/g,'<span class="code-str">$1</span>');

We end up with a failed assertion, because the assertion expected &quot; and instead got ".

Can we keep &quot; in the regular expression instead? No, we can’t. Even when the highlighting code is stopped from running, the web browser converts &quot; into " when the content is read back from the DOM, but keeps &lt; and &gt; as they are.

That is why the regular expression needs to use " instead of &quot;

We can then properly update the test as we now better understand what’s going on there:

console.assert(
    syntaxHighlight('<h2 id="h2class">') === 
    '<h2 id=<span class="code-str">"h2class"</span>>',
    "Should highlight quotes"
);

And the code needs to use " instead of &quot; because it’s not possible to retrieve the &quot; entity code back from the DOM again.

    // data = data.replace(/&quot;(.*?)&quot;/g,'<span class="code-str">$1</span>');
    data = data.replace(/(".*?")/g,'<span class="code-str">$1</span>');

And the tests work fine again.

With the regular expressions written as:

    data = data.replace(/(".*?")/g,'<span class="code-str">$1</span>');
    data = data.replace(/(&lt;.*?&gt;)/g,'<span class="code-elem">$1</span>');

the replacement focuses only on what gets added around the matched text, without needing to repeat anything from the match itself.
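
As an aside, replace() also understands $& in the replacement string, which stands for the whole match, so the same wrapping can be written without a capture group at all. This is only a sketch of that alternative, not the code used in this thread, and the function name is hypothetical.

```javascript
// Hypothetical variant: $& inserts the entire matched text.
function syntaxHighlightWholeMatch(data) {
    data = data.replace(/".*?"/g, '<span class="code-str">$&</span>');
    data = data.replace(/&lt;.*?&gt;/g, '<span class="code-elem">$&</span>');
    return data;
}

console.log(syntaxHighlightWholeMatch('<h2 id="h2class">'));
// <h2 id=<span class="code-str">"h2class"</span>>

console.log(syntaxHighlightWholeMatch('&lt;p&gt;'));
// <span class="code-elem">&lt;p&gt;</span>
```

Either form works; the capture group version is perhaps easier to read for anyone who has not met $& before.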

Summary

The updated code is:

function syntaxHighlight(data) {
    data = data.replace(/(".*?")/g,'<span class="code-str">$1</span>');
    data = data.replace(/(&lt;.*?&gt;)/g,'<span class="code-elem">$1</span>');
    return data;
}
function syntaxhighlights() {
    var ca = document.querySelectorAll("code");
    ca.forEach(function (code) {
        code.innerHTML = syntaxHighlight(code.innerHTML);
    });
}
window.addEventListener("load", syntaxhighlights);

// Tests
console.assert(
    syntaxHighlight('<h2 id="h2class">') === 
    '<h2 id=<span class="code-str">"h2class"</span>>',
    "Should highlight quotes"
);
console.assert(
    syntaxHighlight('&lt;p&gt;') === 
    '<span class="code-elem">&lt;p&gt;</span>',
    "Should highlight tags"
);

And the updated code is found at: https://jsfiddle.net/f6x1a8zk/?editor_console=0

You have written a full tutorial. It will take some significant time to digest the content. I will come back to you if I have any questions. Thank you again. You are wonderful.
