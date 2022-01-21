To start with, my PHP is pretty poor.

My goal is to take a JavaScript code block (as a string) and wrap strings, comments, keywords, built-ins, etc. in span tags with the appropriate class name, so that I can do some syntax colouring.

I need to be able to isolate those parts, so for instance I don’t want keywords matching ‘this’ in a string, or ‘for’ in a comment. Order seems to be important here.
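To illustrate why a single pass helps with that isolation, here is a minimal sketch using preg_replace_callback with one combined pattern. The patterns are deliberately simplified assumptions, not my full lists, and the sample line is made up. Because a string or comment is consumed whole when it matches, the keyword alternative never gets a chance to look inside it.

```php
<?php
// Made-up sample line: keywords appear inside a string and inside a comment.
$code = "let a = 'for this' // if only";

// Assumed, simplified patterns combined into one alternation of named groups.
$pattern = '~(?<js_string>\'[^\']*\')'
         . '|(?<js_comment>//.*)'
         . '|(?<js_keyword>\b(?:let|for|this|if)\b)~';

$result = preg_replace_callback($pattern, function (array $m): string {
    foreach (['js_string', 'js_comment', 'js_keyword'] as $name) {
        // Groups that did not take part in the match are missing or empty.
        if (isset($m[$name]) && $m[$name] !== '') {
            // No HTML escaping here; the sample contains no special characters.
            return "<span class=\"$name\">{$m[0]}</span>";
        }
    }
    return $m[0];
}, $code);

echo $result, PHP_EOL;
```

Only the leading let ends up as a js_keyword; the for, this and if inside the string and the comment stay inside their own spans.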

I’m looking at preg_split, which is actually doing quite a nice job. The downside, though, is that I need a little bit of extra data in the form of a class name for each piece; a tuple, I think, is what I am after.

So instead of getting this

[1]=> string(5) "const"
[2]=> string(6) " x = "
[3]=> string(2) "10"

I end up with something like this

[1]=> array(2) ["const", "js_keyword"]
[2]=> array(1) [" x = "]
[3]=> array(2) ["10", "js_number"]

I’m thinking preg_split isn’t going to cut it. A preg_split_callback would have been nice (as far as I can tell, no such function exists), but it illustrates where I am going with this.
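One possible way to get that tuple shape without a split callback is preg_match_all with named groups and PREG_OFFSET_CAPTURE, filling in the unmatched gaps by hand. This is only a sketch under assumptions: the tokenize helper, the $tokenTypes array and its cut-down patterns are hypothetical names and values made up for illustration.

```php
<?php
// Simplified, assumed patterns (not the full lists from my sample).
$tokenTypes = [
    'js_string'  => '"[^"]*"|\'[^\']*\'',
    'js_comment' => '//.*',
    'js_keyword' => '\b(?:const|let|for|if|this)\b',
    'js_number'  => '\b\d+\b',
];

// Hypothetical helper: returns a list of [text] or [text, className] tuples.
function tokenize(string $code, array $types): array
{
    // Wrap each pattern in a named group so a match reports which type it was.
    $groups = [];
    foreach ($types as $name => $pattern) {
        $groups[] = "(?<$name>$pattern)";
    }
    $regex = '~' . implode('|', $groups) . '~';

    preg_match_all($regex, $code, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $tokens = [];
    $pos = 0; // byte offset of the first character not yet emitted
    foreach ($matches as $match) {
        [$text, $offset] = $match[0];
        if ($offset > $pos) {
            $tokens[] = [substr($code, $pos, $offset - $pos)]; // unclassified gap
        }
        foreach (array_keys($types) as $name) {
            // Groups that did not take part in the match are absent or have offset -1.
            if (isset($match[$name]) && $match[$name][1] !== -1) {
                $tokens[] = [$text, $name];
                break;
            }
        }
        $pos = $offset + strlen($text);
    }
    if ($pos < strlen($code)) {
        $tokens[] = [substr($code, $pos)]; // trailing plain text
    }
    return $tokens;
}

var_dump(tokenize('const x = 10 // for later', $tokenTypes));
```

The named groups do the job of the class name lookup, so the order of the keys in the array also decides which alternative is tried first at any given position.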

In the end, I want to re-assemble with something like array_reduce, wrapping the returned strings in spans if index 1 exists.
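The re-assembly step could look roughly like the sketch below. The token list is hand-written in the shape described above, renderToken is a hypothetical helper name, and the htmlspecialchars calls are an extra assumption on my part so that something like i < x survives being placed in HTML.

```php
<?php
// Hand-written tokens: [text] for plain text, [text, className] for classified text.
$tokens = [
    ['const', 'js_keyword'],
    [' x = '],
    ['10', 'js_number'],
];

// Hypothetical reducer: appends one token to the accumulated HTML,
// wrapping it in a span only when index 1 (the class name) exists.
function renderToken(string $html, array $token): string
{
    $text = htmlspecialchars($token[0]);
    if (isset($token[1])) {
        return $html . '<span class="' . htmlspecialchars($token[1]) . '">' . $text . '</span>';
    }
    return $html . $text;
}

$highlighted = array_reduce($tokens, 'renderToken', '');
echo $highlighted, PHP_EOL;
// prints: <span class="js_keyword">const</span> x = <span class="js_number">10</span>
```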

This is a sample of what I am playing with

<?php
$codeTypes = [
    'js_string' => '((["\'`])[^\2]+?\2)',
    'js_comment' => '((?<!:)\/\/.*|\/\*[\s\S]+\*\/)',
    'js_keyword' => '(\babstract\b|\barguments\b|\bawait\b|\bboolean\b|\bbreak\b|\bbyte\b|\bcase\b|\bcatch\b|\bchar\b|\bclass(?!=)\b|\bconst\b|\bcontinue\b|\bdebugger\b|\bdefault\b|\bdelete\b|\bdo\b|\bdouble\b|\belse\b|\benum\b|\beval\b|\bexport\b|\bextends\b|\bfalse\b|\bfinal\b|\bfinally\b|\bfloat\b|\bfor\b|\bfunction\b|\bgoto\b|\bif\b|\bimplements\b|\bimport\b|\bin\b|\binstanceof\b|\bint\b|\binterface\b|\blet\b|\blong\b|\bnative\b|\bnew\b|\bnull\b|\bpackage\b|\bprivate\b|\bprotected\b|\bpublic\b|\breturn\b|\bshort\b|\bstatic\b|\bsuper\b|\bswitch\b|\bsynchronized\b|\bthis\b|\bthrow\b|\bthrows\b|\btransient\b|\btrue\b|\btry\b|\btypeof\b|\bvar\b|\bvoid\b|\bvolatile\b|\bwhile\b|\bwith\b|\byield\b)'
];

$sampleHtml = <<<END
        const x = 10 // the number 10
        const entries = Object.entries({x: 2, y: 6})

        for(let i = 0; i < x; i++) {
            if (i % 2 == 0) console.log('this i is even')
        }
        /*
            this is a comment block
            not an Object
        */
        const elements = document.querySelectorAll('.my-elements') // a 'string'

        class MyClass {
            constructor(x, y) {
                this.x = x;
                this.y = y
            }
        }
END;

var_dump(preg_split('/' . implode('|', $codeTypes) . '/', $sampleHtml, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE));

output

array(31) {
  [0]=> string(8) " "
  [1]=> string(5) "const"
  [2]=> string(8) " x = 10 "
  [3]=> string(17) "// the number 10 "
  [4]=> string(9) " "
  [5]=> string(5) "const"
  [6]=> string(59) " entries = Object.entries({x: 2, y: 6}) "
  [7]=> string(3) "for"
  [8]=> string(1) "("
  [9]=> string(3) "let"
  [10]=> string(35) " i = 0; i < x; i++) { "
  [11]=> string(2) "if"
  [12]=> string(26) " (i % 2 == 0) console.log("
  [13]=> string(16) "'this i is even'"
  [14]=> string(1) "'"
  [15]=> string(22) ") } "
  [16]=> string(92) "/* this is a comment block not an Object */"
  [17]=> string(10) " "
  [18]=> string(5) "const"
  [19]=> string(38) " elements = document.querySelectorAll("
  [20]=> string(14) "'.my-elements'"
  [21]=> string(1) "'"
  [22]=> string(2) ") "
  [23]=> string(14) "// a 'string' "
  [24]=> string(11) " "
  [25]=> string(5) "class"
  [26]=> string(63) " MyClass { constructor(x, y) { "
  [27]=> string(4) "this"
  [28]=> string(25) ".x = x; "
  [29]=> string(4) "this"
  [30]=> string(32) ".y = y } }"
}

I am aware of highlight.js, which is some very clever coding, but I have got my teeth into this now, and rolling my own saves that extra dependency.

Advice would be appreciated.