Date: 2022-12-04

I am looking to import and parse Markdown from a MongoDB database. I have already achieved that using a package called ‘marked’.

I previously wrote a parser in PHP to style the JavaScript code blocks, and I am now rewriting it in JS.

Basically, it wraps span tags around keywords, comments, etc. For example:

<span class='keyword'>const</span> x = <span class='string'>'A string'</span>

The following seems to work, but I wondered whether there is a slicker approach.

Sample text file as source:

    const rx = /([0-9]+)|([a-z]+)/gmi

    const highlightJS = (markup) => {
      return markup.replaceAll(tokensRx, (...args) => {
        // last index is named matches e.g.
        // { string: undefined, comment: '// comment here', ...}
        const [matches] = args.slice(-1)

        for (const type in matches) {
          if (matches[type] !== undefined) {
            return `<span class='${type}'>${matches[type]}</span>`
          }
        }
      })
    }

JavaScript Parser

    // Using named capture groups
    const tokens = [
      // quoted strings; \\2 refers back to the opening quote character
      '(?<strings>([\\u0027"`])[\\s\\S]*?(?<!\\u005C)\\2)',
      // line comments (not preceded by ':', e.g. in URLs) and block comments
      '(?<comments>(?<!:)\\u002F{2}.*|\\u002F\\u002A[\\s\\S]*?\\u002A\\u002F)',
      '(?<regex>(?<!\\/)\\/[^\\/]+\\/[a-zA-Z]{0,3})',
      '(?<spread>(\\.{3}))',
      '(?<props>(?<=\\w\\.)\\w+)',
      '(?<numbers>\\b\\d+(?:\\.\\d+)?\\b)',
      // `class` is not matched when it is followed by `=`, e.g. class='keyword'
      '\\b(?<keywords>abstract|arguments|await|boolean|break|byte|case|catch|char|class(?!\\s*=)|const|continue|debugger|default|delete|do|double|else|enum|eval|export|extends|false|final|finally|float|for|function|goto|if|implements|import|in|instanceof|int|interface|isNaN|let|long|native|new|null|package|private|protected|public|return|short|static|super|switch|synchronized|this|throw|throws|transient|true|try|typeof|undefined|var|void|volatile|while|with|yield)\\b',
      '\\b(?<builtIn>Object|Array|Function|String|Number|null|undefined|Symbol|BigInt)\\b',
      '(?<brackets>[\\{\\(\\[\\]\\)\\}])'
    ]

    // join all regexes
    const tokensRx = new RegExp(tokens.join('|'), 'gm')

    const highlightJS = (markup) => {
      return markup.replaceAll(tokensRx, (...args) => {
        // last index contains named matches e.g.
        // { strings: undefined, comments: '// comment here', regex: undefined ...}
        const [matches] = args.slice(-1)

        for (const tokenType in matches) {
          if (matches[tokenType] !== undefined) {
            // build the span through the DOM so the token text is HTML-escaped
            const span = document.createElement('span')
            span.className = tokenType
            span.textContent = matches[tokenType]
            return span.outerHTML
          }
        }
      })
    }
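
To make the mechanism easier to follow, here is a minimal sketch of how the replacer callback picks the token type. It uses a hypothetical two-token pattern (`demoRx`) and a hypothetical helper (`wrapTokens`), neither of which is part of the parser above. It relies on the fact that, when a pattern contains named groups, the last argument passed to the replacer is the groups object. `Object.entries(...).find(...)` stands in for the `for...in` loop.

```javascript
// Hypothetical two-token pattern, far smaller than the real token list.
const demoRx = /(?<numbers>\b\d+\b)|(?<keywords>\b(?:const|let)\b)/g

// Hypothetical helper: wraps each matched token in a span named after its group.
const wrapTokens = (source) =>
  source.replace(demoRx, (...args) => {
    // With named groups in the pattern, the last argument is the groups object.
    const groups = args[args.length - 1]
    // Exactly one alternative matched, so exactly one group is defined.
    const [type, text] = Object.entries(groups).find(([, value]) => value !== undefined)
    return `<span class='${type}'>${text}</span>`
  })

console.log(wrapTokens('const x = 42'))
// <span class='keywords'>const</span> x = <span class='numbers'>42</span>
```

Because only one alternative can match at a time, the first defined group is always the token type, which is why the order of the entries in the token list matters when two patterns could match the same text.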

Here is a CodePen showing the output.



I am aware of preexisting packages like highlight.js.

From a quick play with highlight.js last night, it seems that everything needs to be rendered to the DOM before it styles the code blocks.

With my approach, I will be styling the HTML before rendering it to the page with EJS.
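
If the highlighting runs in Node without a DOM library, `document.createElement` will not be available, so the span would have to be built as a string instead. A minimal sketch, assuming that only `&`, `<`, and `>` need escaping in the token text; `escapeHtml` and `toSpan` are hypothetical names, not part of any library.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML text.
// The ampersand is replaced first so existing entities are not double-processed.
const escapeHtml = (text) =>
  text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')

// Hypothetical helper: string-based stand-in for createElement + outerHTML.
const toSpan = (tokenType, text) =>
  `<span class="${tokenType}">${escapeHtml(text)}</span>`

console.log(toSpan('comments', '// a < b && c'))
// <span class="comments">// a &lt; b &amp;&amp; c</span>
```

In the parser, `toSpan(tokenType, matches[tokenType])` could then take the place of the three DOM lines inside the loop.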