UTF-8 Regular Expression

Hello,

I wrote a function that checks whether a username contains only letters and numbers, but it doesn’t work when I use UTF-8 letters (e.g. ĆŠĐ). My regex is “/^[a-zA-Z0-9]+$/”.

Can you help me to solve this problem?

Best regards,
Mark.

Did you try adding À-ÿ to the regex to include accented characters?

It fails again.

You don’t have to use a-zA-Z if you have \p{L}, since \p{L} already matches those letters. And remember that Unicode pattern matching can be much slower than plain ASCII matching, which is something to watch out for in loops, etc.

The previous regex that I posted doesn’t work; I was wrong.

How should I check if a string contains only letters (including ćšđž) and numbers?
Should I add something like šžđć to the pattern?

You could try \w (with the u modifier) if you don’t mind underscores also being accepted.
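
To illustrate the \w trade-off, here is a minimal sketch. It assumes PHP's preg_match() with the u modifier, under which \w covers letters and digits from any script plus the underscore. The helper name isWordOnly is made up for this example.

```php
<?php
// Hypothetical helper: true when the string consists only of "word" characters.
// With the u modifier, \w matches Unicode letters, digits and the underscore.
function isWordOnly(string $s): bool
{
    return preg_match('/^\w+$/u', $s) === 1;
}

var_dump(isWordOnly('ćšđ123'));    // letters and digits
var_dump(isWordOnly('user_name')); // the underscore is accepted too
var_dump(isWordOnly('user name')); // a space is not a word character
```

If the underscore must be rejected, a character class such as [\p{L}0-9] is the safer choice.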

Can you tell us how it doesn’t work? This:

preg_match('/^[\p{L}0-9]+$/u', 'abcABC123ćšđĆŠĐ', $matches);

works fine for me.
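
If it still fails on your side, one possible cause is that the input is not actually UTF-8. A rough sketch of a wrapper that separates "no match" from "error" follows. The function name isValidUsername is hypothetical, and the ISO-8859-2 test bytes are only an assumed example of badly encoded input.

```php
<?php
// Hypothetical helper built around the pattern above.
function isValidUsername(string $name): bool
{
    // u modifier: pattern and subject are treated as UTF-8.
    $result = preg_match('/^[\p{L}0-9]+$/u', $name);

    if ($result === false) {
        // preg_match() returns false on error, e.g. when the subject
        // is not valid UTF-8; preg_last_error() reports the reason.
        return false;
    }

    return $result === 1;
}

var_dump(isValidUsername('abcABC123ćšđĆŠĐ')); // letters and digits only
var_dump(isValidUsername('ćšđ!'));            // punctuation is rejected
var_dump(isValidUsername("\xE6\xB9\xF0"));    // "ćšđ" as ISO-8859-2 bytes, invalid as UTF-8
```

After the last call, preg_last_error() should equal PREG_BAD_UTF8_ERROR, which points at the encoding of the input rather than at the pattern.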

It works now, thank you very much sir!

BTW, you may want to be a bit more specific about what kind of letters you want to accept. \p{L} will accept letters from any script, including Cyrillic, Arabic, Chinese, Japanese and so on. You can restrict matching to specific scripts from the supported scripts table. For example, to accept only the Latin and Cyrillic alphabets (including accented characters):

preg_match('/^[\p{Latin}\p{Cyrillic}0-9]+$/u', 'abcABC123ЖжЗćšđĆŠĐß', $matches);
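
To see the difference from \p{L}, here is a small sketch that wraps the same pattern in a helper and tries it against other scripts. The name isLatinOrCyrillic is invented for this example.

```php
<?php
// Hypothetical helper: Latin and Cyrillic letters plus ASCII digits only.
function isLatinOrCyrillic(string $s): bool
{
    return preg_match('/^[\p{Latin}\p{Cyrillic}0-9]+$/u', $s) === 1;
}

var_dump(isLatinOrCyrillic('abcABC123ЖжЗćšđĆŠĐß')); // Latin and Cyrillic pass
var_dump(isLatinOrCyrillic('αβγ'));                 // Greek is rejected
var_dump(isLatinOrCyrillic('日本語'));               // Han is rejected

// For comparison, \p{L} accepts letters from any script, Greek included.
var_dump(preg_match('/^\p{L}+$/u', 'αβγ') === 1);
```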