For a name regular expression that should also allow accented characters, as in René, I made this regex:
!preg_match('~^[\'a-zÀ-ÿ \-]*$~i'
Is my use of À-ÿ good for that? I got that from the html entity list. Saw that all accented characters fell into that range.
But I’ve also just read about the alpha class, which should take care of that? So could I also use this? And did I write it correctly?
!preg_match('~^(<alpha|a-z>)([\' \-]<alpha|a-z>)*$~i'
Just did. Think alpha is just the same as A-Z…
But I just discovered that you can also use hex values in a PHP regex with \x. I looked up the hex values for accented characters and this worked: ^[\'a-z\xC0-\xFF \-]*$
But is there anything wrong with using À-ÿ? Or is it safer to use hex values in that case?
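A minimal sketch of the hex-escape variant, assuming the input is an ISO-8859-1 byte string and the pattern is anchored with ^. The function name isValidName is hypothetical. One difference between the two spellings: a literal À-ÿ only means this byte range when the script file itself is saved as ISO-8859-1, while \xC0-\xFF does not depend on how the file is saved. Either way, the range also contains × (0xD7) and ÷ (0xF7), which are not letters.

```php
<?php
// Hypothetical validator; assumes $name holds ISO-8859-1 bytes.
function isValidName(string $name): bool
{
    // \xC0-\xFF covers À through ÿ in ISO-8859-1.
    // Note: this range also includes × (0xD7) and ÷ (0xF7).
    return preg_match('~^[\'a-z\xC0-\xFF \-]*$~i', $name) === 1;
}

var_dump(isValidName("Ren\xE9")); // é is byte 0xE9 in ISO-8859-1
var_dump(isValidName("Ren3"));    // digits are not in the class
```

The first call should report true and the second false, since 0xE9 falls inside \xC0-\xFF and the digit 3 is not in the class.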
The “alpha” class (correct syntax is [:alpha:]) matches any “letter”. The meaning of “letter” depends on the locale; in the en-US locale this includes A-Z and the accented characters of the ISO-8859 encoding.
You can easily find out what it does on your system using test code like this:
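A minimal sketch of such a test (not necessarily the snippet originally posted): it tries every single-byte value against [[:alpha:]] and lists the ones that match. The helper name alphaBytes is hypothetical, and the result beyond the ASCII letters depends on the LC_CTYPE locale in effect on your system.

```php
<?php
// Hypothetical helper: collect every byte value (0-255) that
// [[:alpha:]] matches under the locale currently in effect.
function alphaBytes(): array
{
    $matched = [];
    for ($code = 0; $code <= 255; $code++) {
        if (preg_match('~^[[:alpha:]]$~', chr($code)) === 1) {
            $matched[] = $code;
        }
    }
    return $matched;
}

$bytes = alphaBytes();
printf("%d byte values match [[:alpha:]]:\n", count($bytes));
foreach ($bytes as $code) {
    printf("0x%02X ", $code);
}
echo "\n";
```

If only 0x41-0x5A and 0x61-0x7A show up, the class behaves like A-Z and a-z on that system; values from 0xC0 upwards would indicate that the locale treats accented ISO-8859 characters as letters.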
Does a PHP server have a standard charset setting? The hex codes work even if I haven’t set the charset in the PHP script specifically. Or is that because my browser has a default setting of ISO-8859?
Should this encoding always be ISO-8859-1?
I’ve already set index.php to ISO-8859-1, and all posted values are first converted to ISO-8859-1 with utf8_decode, so names like René display as René in sent email (and not with funny codes).
Encoding and locale are different issues. PHP strings are plain byte sequences, and many built-in functions treat each byte as one ISO-8859-1 character.
Browsers send form data back in the same charset the page was served in. Thus you shouldn’t use utf8_decode on incoming data.
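To illustrate why the charset of the incoming data matters for a byte-range pattern, here is a small sketch, assuming the \xC0-\xFF pattern from earlier in the thread with a ^ anchor. It compares the same name encoded as ISO-8859-1 and as UTF-8; the variable names are only for illustration.

```php
<?php
// Assumption: the validation pattern targets ISO-8859-1 bytes.
$pattern = '~^[\'a-z\xC0-\xFF \-]*$~i';

$latin1 = "Ren\xE9";     // "René" encoded as ISO-8859-1 (4 bytes)
$utf8   = "Ren\xC3\xA9"; // "René" encoded as UTF-8 (5 bytes)

echo bin2hex($latin1), "\n"; // 52656ee9
echo bin2hex($utf8), "\n";   // 52656ec3a9

$latin1Ok = preg_match($pattern, $latin1) === 1;
$utf8Ok   = preg_match($pattern, $utf8) === 1;

var_dump($latin1Ok); // bool(true)
var_dump($utf8Ok);   // bool(false): byte 0xA9 lies outside \xC0-\xFF
```

So the pattern only behaves as intended when the data really arrives as ISO-8859-1, which is the case when the page itself is served in that charset.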