I’m trying to write a function that encodes a string according to RFC 3987, similar to what rawurlencode() does for RFC 3986:
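function preg_iriencode($url){
    // The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0,
    // so the same replacement is done with a callback (same pattern, same behaviour).
    return preg_replace_callback('/[\\x0-\\xc2\\x9f]|[\\xef\\xbf\\xb0-\\xef\\xbf\\xbd]/u', function ($m) {
        return rawurlencode($m[0]);
    }, $url);
}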
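Because the first character class is meant to cover all of ASCII, letters and digits included, the function depends on rawurlencode() passing unreserved characters through unchanged. A quick check of that assumption, using only the built-in rawurlencode() (the sample list is arbitrary):

```php
<?php
// rawurlencode() leaves RFC 3986 unreserved characters (letters, digits, - . _ ~)
// unchanged and percent-encodes every other byte with uppercase hex digits.
$samples = ['E', '7', '-', '~', '!', '?', '@', '+', "\n", "\t", "\xC2\xA0"];

foreach ($samples as $s) {
    // json_encode() makes invisible characters such as \n, \t and NBSP readable.
    echo json_encode($s), ' => ', rawurlencode($s), PHP_EOL;
}
```

This is why matching plain letters in the pattern is harmless: they come back from rawurlencode() as-is.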
$url = "Exclamation!Question?NBSP\xC2\xA0Newline\nAtsign@Tab\tHyphen-Plus+Tilde~好"; // \xC2\xA0 = NBSP, \n = newline, \t = tab
echo preg_iriencode($url);
//Expected result: Exclamation%21Question%3FNBSP Newline%0AAtsign%40Tab%09Hyphen-Plus%2BTilde~好
//Actual result: Exclamation%21Question%3FNBSP%C2%A0Newline%0AAtsign%40Tab%09Hyphen-Plus%2BTilde~好
A non-breaking space (NBSP) is \xC2\xA0 in UTF-8, and the first character class in the regex only goes up to \xc2\x9f, so I don’t understand why the NBSP is being matched, and therefore encoded.
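A minimal sketch for checking how PCRE reads the class, assuming PCRE's documented behaviour that in UTF-8 mode (/u) an escape such as \xc2 denotes the code point U+00C2 rather than a single byte. Read that way, [\x0-\xc2\x9f] is the range U+0000 to U+00C2 plus the single code point U+009F, which would include NBSP (U+00A0). The function iriencode_sketch() is hypothetical, and its ranges assume the byte sequences in the original pattern were meant as the UTF-8 encodings of U+0000..U+009F and U+FFF0..U+FFFD.

```php
<?php
// Assumption: under /u, PCRE treats \xhh as a code point, not a raw byte.
$nbsp = "\xC2\xA0"; // UTF-8 bytes of U+00A0 (NO-BREAK SPACE)

// As written: range U+0000..U+00C2 plus U+009F, so U+00A0 falls inside the range.
var_dump(preg_match('/[\x0-\xc2\x9f]/u', $nbsp));

// Written as code points: range U+0000..U+009F, so U+00A0 falls outside it.
var_dump(preg_match('/[\x{0}-\x{9F}]/u', $nbsp));

// Hypothetical helper: percent-encode U+0000..U+009F and U+FFF0..U+FFFD,
// leaving every other character (e.g. NBSP, CJK) untouched.
function iriencode_sketch(string $url): string
{
    return preg_replace_callback(
        '/[\x{0}-\x{9F}\x{FFF0}-\x{FFFD}]/u',
        function (array $m): string {
            // rawurlencode() leaves unreserved ASCII (letters, digits, - . _ ~) as-is.
            return rawurlencode($m[0]);
        },
        $url
    );
}

echo iriencode_sketch("Exclamation!Question?NBSP\xC2\xA0Newline\nAtsign@Tab\tHyphen-Plus+Tilde~好"), PHP_EOL;
```

Under that reading, writing the intended ranges as \x{...} code points instead of UTF-8 byte sequences keeps the NBSP out of the match.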
(vBulletin seems to convert the NBSP in the test string into a *, but the actual code does contain a real NBSP, the bytes \xC2\xA0.)