To start with, my PHP is pretty poor.
My goal is to take a block of JavaScript code (as a string) and wrap strings, comments, keywords, built-ins, etc. in span tags with the appropriate class name, so that I can do some syntax colouring.
I need to be able to isolate those parts so that, for instance, keywords don’t match ‘this’ inside a string or ‘for’ inside a comment. The order in which the patterns are tried seems to be important here.
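To illustrate the ordering point, here is a minimal sketch, assuming a single alternation where strings and comments come before keywords. The function name highlightOrdered and the shortened keyword list are made up for the example, and HTML escaping is left out to keep it short.

```php
<?php
// Hypothetical helper: the regex engine tries the alternatives left to right
// at each position, so a string or comment that starts first consumes any
// keyword-looking text inside it before the keyword branch gets a chance.
function highlightOrdered(string $code): string
{
    $pattern = '~(?<js_string>(?<q>["\'`]).*?\k<q>)'         // 1st: strings
             . '|(?<js_comment>(?<!:)//.*|/\*[\s\S]*?\*/)'   // 2nd: comments
             . '|(?<js_keyword>\b(?:const|for|if|let|this)\b)~'; // last: keywords

    return preg_replace_callback($pattern, function (array $m): string {
        foreach (['js_string', 'js_comment', 'js_keyword'] as $class) {
            // Groups that did not take part in the match are missing or empty.
            if (isset($m[$class]) && $m[$class] !== '') {
                return '<span class="' . $class . '">' . $m[0] . '</span>';
            }
        }
        return $m[0];
    }, $code);
}

echo highlightOrdered("if (ok) log('this is for you') // for later");
```

In this input only the leading if should be wrapped as a keyword; the this and for inside the string, and the for inside the comment, are already consumed by the earlier branches.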
I’m looking at preg_split, which is actually doing quite a nice job. The downside is that I need a little extra data with each piece, in the form of a class name. A tuple, I think, is what I am after.
So instead of getting this:
[1]=>
string(5) "const"
[2]=>
string(6) " x = "
[3]=>
string(2) "10"
I would end up with something like this:
[1]=>
array(2) ["const", "js_keyword"]
[2]=>
array(1) [" x = "]
[3]=>
array(2) ["10", "js_number"]
I’m thinking preg_split isn’t going to cut it. A preg_split_callback would have been nice (as far as I know there is no such function), but it illustrates where I am going with this.
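One way the tuple idea might be sketched is with preg_match_all and named groups instead of preg_split. The function name tokenize, the js_number pattern, and the ~ delimiter are assumptions for the example, not something I have settled on.

```php
<?php
// Hypothetical tokenizer: returns a list of [text] or [text, className] arrays.
function tokenize(string $code, array $types): array
{
    // Build one alternation of named groups, e.g. (?<js_keyword>...)|(?<js_number>...)
    $parts = [];
    foreach ($types as $class => $regex) {
        $parts[] = "(?<$class>$regex)";
    }
    $pattern = '~' . implode('|', $parts) . '~';

    preg_match_all($pattern, $code, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $tokens = [];
    $pos = 0;
    foreach ($matches as $m) {
        [$text, $offset] = $m[0];
        if ($offset > $pos) {
            // Plain text between the previous match and this one.
            $tokens[] = [substr($code, $pos, $offset - $pos)];
        }
        foreach ($types as $class => $_) {
            // With PREG_OFFSET_CAPTURE, a group that did not match has offset -1.
            if (isset($m[$class]) && $m[$class][1] !== -1) {
                $tokens[] = [$text, $class];
                break;
            }
        }
        $pos = $offset + strlen($text);
    }
    if ($pos < strlen($code)) {
        $tokens[] = [substr($code, $pos)]; // trailing plain text
    }
    return $tokens;
}

$types = [
    'js_keyword' => '\b(?:const|let|for|if)\b',
    'js_number'  => '\b\d+\b',
];
var_dump(tokenize('const x = 10', $types));
// [['const', 'js_keyword'], [' x = '], ['10', 'js_number']]
```

The class name comes from whichever named group took part in the match, so the order of the keys in $types doubles as the priority order.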
In the end, I want to re-assemble with something like array_reduce, wrapping the returned strings in spans if index 1 exists.
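Roughly what I have in mind for the re-assembly step, as a sketch only. The function name renderTokens and the sample token list are made up, and it assumes each token is either [text] or [text, className].

```php
<?php
// Hypothetical token list in the [text] / [text, className] shape.
$tokens = [
    ['const', 'js_keyword'],
    [' x = '],
    ['10', 'js_number'],
];

function renderTokens(array $tokens): string
{
    return array_reduce($tokens, function (string $html, array $token): string {
        // Escape every piece so that < and & in the code survive as text.
        $text = htmlspecialchars($token[0], ENT_QUOTES);
        return isset($token[1])
            ? $html . '<span class="' . $token[1] . '">' . $text . '</span>'
            : $html . $text;
    }, '');
}

echo renderTokens($tokens);
// <span class="js_keyword">const</span> x = <span class="js_number">10</span>
```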
This is a sample of what I am playing with:
<?php
$codeTypes = [
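// A backreference is not valid inside a character class ([^\2] means "not the character with octal code 2"), so use a lookahead instead
'js_string' => '((["\'`])(?:(?!\2)[\s\S])+?\2)',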
'js_comment' => '((?<!:)\/\/.*|\/\*[\s\S]+\*\/)',
'js_keyword' => '(\babstract\b|\barguments\b|\bawait\b|\bboolean\b|\bbreak\b|\bbyte\b|\bcase\b|\bcatch\b|\bchar\b|\bclass(?!=)\b|\bconst\b|\bcontinue\b|\bdebugger\b|\bdefault\b|\bdelete\b|\bdo\b|\bdouble\b|\belse\b|\benum\b|\beval\b|\bexport\b|\bextends\b|\bfalse\b|\bfinal\b|\bfinally\b|\bfloat\b|\bfor\b|\bfunction\b|\bgoto\b|\bif\b|\bimplements\b|\bimport\b|\bin\b|\binstanceof\b|\bint\b|\binterface\b|\blet\b|\blong\b|\bnative\b|\bnew\b|\bnull\b|\bpackage\b|\bprivate\b|\bprotected\b|\bpublic\b|\breturn\b|\bshort\b|\bstatic\b|\bsuper\b|\bswitch\b|\bsynchronized\b|\bthis\b|\bthrow\b|\bthrows\b|\btransient\b|\btrue\b|\btry\b|\btypeof\b|\bvar\b|\bvoid\b|\bvolatile\b|\bwhile\b|\bwith\b|\byield\b)'
];
$sampleHtml = <<<END
const x = 10 // the number 10
const entries = Object.entries({x: 2, y: 6})
for(let i = 0; i < x; i++) {
if (i % 2 == 0) console.log('this i is even')
}
/*
this is a
comment block
not an Object
*/
const elements = document.querySelectorAll('.my-elements') // a 'string'
class MyClass {
constructor(x, y) {
this.x = x;
this.y = y
}
}
END;
var_dump(preg_split('/' . implode('|', $codeTypes) . '/', $sampleHtml, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE));
Output:
array(31) {
[0]=>
string(8) " "
[1]=>
string(5) "const"
[2]=>
string(8) " x = 10 "
[3]=>
string(17) "// the number 10
"
[4]=>
string(9) "
"
[5]=>
string(5) "const"
[6]=>
string(59) " entries = Object.entries({x: 2, y: 6})
"
[7]=>
string(3) "for"
[8]=>
string(1) "("
[9]=>
string(3) "let"
[10]=>
string(35) " i = 0; i < x; i++) {
"
[11]=>
string(2) "if"
[12]=>
string(26) " (i % 2 == 0) console.log("
[13]=>
string(16) "'this i is even'"
[14]=>
string(1) "'"
[15]=>
string(22) ")
}
"
[16]=>
string(92) "/*
this is a
comment block
not an Object
*/"
[17]=>
string(10) "
"
[18]=>
string(5) "const"
[19]=>
string(38) " elements = document.querySelectorAll("
[20]=>
string(14) "'.my-elements'"
[21]=>
string(1) "'"
[22]=>
string(2) ") "
[23]=>
string(14) "// a 'string'
"
[24]=>
string(11) "
"
[25]=>
string(5) "class"
[26]=>
string(63) " MyClass {
constructor(x, y) {
"
[27]=>
string(4) "this"
[28]=>
string(25) ".x = x;
"
[29]=>
string(4) "this"
[30]=>
string(32) ".y = y
}
}"
}
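One thing I notice in the output above: entries [14] and [21] are a lone quote character. That looks like a side effect of PREG_SPLIT_DELIM_CAPTURE returning every capture group, including the inner quote group used for the \2 backreference. A small sketch of the effect, where the alternation-based pattern in the second call is only one assumed workaround:

```php
<?php
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

// Nested capture group: the inner quote group is returned as its own piece.
$nested = preg_split('/((["\']).*?\2)/', "a 'b' c", 0, $flags);
// ['a ', "'b'", "'", ' c']

// Single capture group, one alternative per quote type: no stray quote.
$flat = preg_split('/("[^"]*"|\'[^\']*\')/', "a 'b' c", 0, $flags);
// ['a ', "'b'", ' c']

var_dump($nested, $flat);
```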
I am aware of highlight.js, which is some very clever coding, but I have got my teeth into this now, and doing it myself saves that extra dependency.
Advice would be appreciated.