I am looking to import and parse markdown from a MongoDB database. I have already achieved that using a package called ‘marked’.
I previously wrote a parser in PHP to style the JavaScript code blocks, and I am now rewriting that in JS.
Basically, it wraps span tags around keywords, comments, etc., e.g.
<span class='keyword'>const</span> x = <span class='string'>'A string'</span>
The following seems to work, but I wondered if there was a slicker approach.
Sample text file as source:
const rx = /([0-9]+)|([a-z]+)/gmi
const highlightJS = (markup) => {
  return markup.replaceAll(tokensRx, (...args) => {
    // last index is named matches e.g.
    // { string: undefined, comment: '// comment here', ...}
    const [matches] = args.slice(-1)
    for (const type in matches) {
      if (matches[type] !== undefined) {
        return `<span class='${type}'>${matches[type]}</span>`
      }
    }
  })
}
JavaScript Parser
// Using named capture groups
const tokens = [
  // [\s\S] matches any character, newlines included (no '|' needed inside a class)
  '(?<strings>([\\u0027"`])[\\s\\S]*?(?<!\\u005C)\\2)',
  '(?<comments>(?<!:)\\u002F{2}.*|\\u002F\\u002A[\\s\\S]*?\\u002A\\u002F)',
  '(?<regex>(?<!\\/)\\/[^\\/]+\\/[a-zA-Z]{0,3})',
  '(?<spread>(\\.{3}))',
  '(?<props>(?<=\\w\\.)\\w+)',
  '(?<numbers>\\b\\d+(?:\\.\\d+)?\\b)',
  // class(?!\s*=) skips 'class' when it is followed by '=', e.g. class='keyword'
  '\\b(?<keywords>abstract|arguments|await|boolean|break|byte|case|catch|char|class(?!\\s*=)|const|continue|debugger|default|delete|do|double|else|enum|eval|export|extends|false|final|finally|float|for|function|goto|if|implements|import|in|instanceof|int|interface|isNaN|let|long|native|new|null|package|private|protected|public|return|short|static|super|switch|synchronized|this|throw|throws|transient|true|try|typeof|undefined|var|void|volatile|while|with|yield)\\b',
  '\\b(?<builtIn>Object|Array|Function|String|Number|null|undefined|Symbol|BigInt)\\b',
  '(?<brackets>[\\{\\(\\[\\]\\)\\}])'
]
// join all regexes
const tokensRx = new RegExp(tokens.join('|'), 'gm')
const highlightJS = (markup) => {
  return markup.replaceAll(tokensRx, (...args) => {
    // the last argument holds the named groups e.g.
    // { strings: undefined, comments: '// comment here', regex: undefined ...}
    const [matches] = args.slice(-1)
    for (const tokenType in matches) {
      if (matches[tokenType] !== undefined) {
        // build the span via the DOM so the token text is HTML-escaped
        const span = document.createElement('span')
        span.className = tokenType
        span.textContent = matches[tokenType]
        return span.outerHTML
      }
    }
  })
}
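Since the plan is to style the HTML before EJS renders the page, the highlighter would presumably run in Node, where `document` is not defined unless something like jsdom is loaded. Below is a minimal DOM-free sketch of the same idea. `escapeHtml` and `highlightToString` are hypothetical names, the token list is cut down to four of the patterns for brevity, and the input is assumed to be raw, unescaped source text.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML.
const escapeHtml = (text) =>
  text.replace(/[&<>"']/g, (ch) => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
  })[ch])

// Shortened token list (assumption: the full list would be used in practice).
const tokens = [
  '(?<strings>([\\u0027"`])[\\s\\S]*?(?<!\\u005C)\\2)',
  '(?<comments>(?<!:)\\u002F{2}.*|\\u002F\\u002A[\\s\\S]*?\\u002A\\u002F)',
  '(?<numbers>\\b\\d+(?:\\.\\d+)?\\b)',
  '\\b(?<keywords>const|let|var|return|function)\\b'
]
const tokensRx = new RegExp(tokens.join('|'), 'gm')

const highlightToString = (markup) => {
  let html = ''
  let last = 0
  for (const match of markup.matchAll(tokensRx)) {
    // escape the plain text sitting between the previous token and this one
    html += escapeHtml(markup.slice(last, match.index))
    // pick the single named group that actually matched
    const [type, text] = Object.entries(match.groups)
      .find(([, value]) => value !== undefined)
    html += `<span class="${type}">${escapeHtml(text)}</span>`
    last = match.index + match[0].length
  }
  // escape whatever follows the final token
  return html + escapeHtml(markup.slice(last))
}
```

Walking the matches with `matchAll` instead of `replaceAll` means the text between tokens is escaped as well, so a stray `<` in the source cannot end up in the page as markup.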
Here is a codepen showing the output
I am aware of preexisting packages like highlight.js.
From a quick play with highlight.js last night, it seems that everything needs to be rendered to the DOM before it styles the code blocks.
With my approach, I will be styling the HTML prior to rendering it to the page with EJS.
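One way this pre-render step might look is sketched below. It assumes the markdown renderer emits JavaScript code blocks as `<pre><code class="language-js">…</code></pre>` with the usual five HTML entities escaped inside; that shape should be checked against the actual output. `unescapeHtml`, `codeBlockRx` and `highlightCodeBlocks` are hypothetical names, and `highlight` stands for any function that takes raw source and returns safe HTML.

```javascript
// Hypothetical helper: undo the entity escaping applied inside <code>.
const unescapeHtml = (text) =>
  text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&') // last, so '&amp;lt;' becomes '&lt;' rather than '<'

// Assumed block shape: <pre><code class="language-js">...</code></pre>
const codeBlockRx =
  /(<pre><code class="language-(?:js|javascript)">)([\s\S]*?)(<\/code><\/pre>)/g

// Replace the body of each JavaScript code block with its highlighted version.
const highlightCodeBlocks = (html, highlight) =>
  html.replace(codeBlockRx, (_, open, body, close) =>
    open + highlight(unescapeHtml(body)) + close)
```

The resulting string could then be passed to the EJS template as data and output unescaped, so no DOM is involved at any point.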