Regular expression for all UTF-8 characters

Hi,

I am new to regular expressions. I am trying to validate a string that may contain any UTF-8 alphabetic characters (e.g. e, è, é, ĕ, E, etc.), whitespace (but no consecutive whitespace), no numbers, and single quotes (but no consecutive single quotes).

In my code (below) I tried this: after reading about Unicode character properties in the PHP manual, I put \pL in the pattern, along with an optional whitespace and a single quote, but the string is reported as not valid!

I also tried replacing the delimiters with ^ and $ to mark the start and end of the string, but then I get an error and the script doesn’t run.

Can you help me? Many thanks!

   $text = "tèst";

   if (preg_match("/\pL\s?'/", $text) == 1) {
      echo "valid!";
   } else {
      echo "not valid!";
   }

You didn’t mention the error message, so this is only a guess.

Use the UTF-8 pattern modifier?

u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern and the subject is checked since PHP 4.3.5. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8.
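
To make the effect of the modifier concrete, here is a minimal sketch (not taken from the manual). It assumes the subject is valid UTF-8 and PHP 7.0 or later; the variable names are only for illustration. Without u, each byte is treated as one character, so è (two bytes in UTF-8) looks like two characters to the pattern.

```php
<?php
// "\u{00E8}" is è; in UTF-8 it is encoded as two bytes (0xC3 0xA8).
// The "\u{...}" escape needs PHP 7.0 or later.
$subject = "\u{00E8}";

// Without the u modifier the dot matches a single byte, so one "è" is
// too long for /^.$/ and the match fails.
$withoutU = preg_match('/^.$/', $subject);

// With the u modifier the dot matches one whole UTF-8 character.
$withU = preg_match('/^.$/u', $subject);

var_dump(strlen($subject)); // int(2)
var_dump($withoutU);        // int(0)
var_dump($withU);           // int(1)
```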

The error message appears only when, instead of using / as delimiters, I use ^ and $ for the beginning and end of the line:
preg_match("^\pL\s?'$", $text)

Warning: preg_match(): No ending delimiter ‘^’

It also prints “not valid!”
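
The warning appears because PHP takes the first character of the pattern string as the delimiter, so here it looks for a closing ^ and never finds one. The anchors belong inside the delimiters. A small sketch, assuming for the moment that only letters should be accepted (the pattern is an illustration, not the full answer to the question):

```php
<?php
$text = "t\u{00E8}st"; // "tèst" (PHP 7.0+ escape syntax)

// Anchors go inside the / delimiters; the u modifier follows the closing one.
$slashes = preg_match('/^\pL+$/u', $text);

// Any non-alphanumeric, non-backslash, non-whitespace character can be
// used as the delimiter instead, for example #.
$hashes = preg_match('#^\pL+$#u', $text);

var_dump($slashes); // int(1)
var_dump($hashes);  // int(1)
```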

I have tried what you have said:

preg_match("/\pL\s/u", $text)
but I got “not valid”

That’s because tèst doesn’t match that pattern.
tè st will, if that’s what you want it to find.

Otherwise, try putting the “zero or one” quantifier ? after the \s whitespace metacharacter.
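
A quick sketch of the difference, assuming PHP 7.0 or later for the "\u{...}" escape. Note that without ^ and $ the pattern may match anywhere in the string, so making the whitespace optional lets almost anything through:

```php
<?php
$strict = '/\pL\s/u';  // one letter immediately followed by one whitespace
$loose  = '/\pL\s?/u'; // one letter, optionally followed by one whitespace

var_dump(preg_match($strict, "t\u{00E8}st"));  // int(0): no whitespace at all
var_dump(preg_match($strict, "t\u{00E8} st")); // int(1): "è " matches

// With ? a single letter anywhere in the subject is enough to match,
// so a string containing digits is accepted too.
var_dump(preg_match($loose, "t\u{00E8}st")); // int(1)
var_dump(preg_match($loose, "123a"));        // int(1)
```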

Found it!
I don’t know if this will be useful to someone, but this is the regex I wrote. I think it is right, but I don’t know if there is a better way to write it:
/^(\pL{1,}[ ]?)+$/u

i.e. it accepts “multiple words” (phrases) made of one or more UTF-8 letters followed by an optional space (but no consecutive spaces).

Or also this:
/^(\pL{1,}[ ]?)+[^ ]$/u
which doesn’t accept a space after the last character.
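
One caveat on the second pattern: [^ ] means “any character except a space”, so a trailing digit is accepted as well. Below is a possible alternative that tries to follow the rules from the first post more directly. It is only a sketch under some assumptions: “whitespace” is taken to mean the plain space character, leading or trailing spaces and quotes are tolerated, and the function name isValidPhrase is made up for the example.

```php
<?php
// Hypothetical helper: true if $text contains only letters, spaces and
// single quotes, with no two spaces and no two quotes in a row.
function isValidPhrase(string $text): bool
{
    // (?!.*(?: {2}|'{2}))  fails if two spaces or two quotes appear anywhere
    // [\pL ']+             then only letters, spaces and quotes are allowed
    return preg_match('/^(?!.*(?: {2}|\'{2}))[\pL \']+$/u', $text) === 1;
}

var_dump(isValidPhrase("t\u{00E8} st statement")); // bool(true)
var_dump(isValidPhrase("l'amico"));                // bool(true)
var_dump(isValidPhrase("t\u{00E8}  st"));          // bool(false): two spaces
var_dump(isValidPhrase("l''amico"));               // bool(false): two quotes
var_dump(isValidPhrase("test1"));                  // bool(false): digit

// The pattern with the trailing [^ ] accepts the digit:
var_dump(preg_match('/^(\pL{1,}[ ]?)+[^ ]$/u', "test1")); // int(1)
```

If a quote should only be allowed between two letters, or spaces at the ends should be rejected, the character class approach would need to be tightened accordingly.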

All the regexes I have used match the following string:

$text = "tè st statement";

But each of my attempts gives a different captured result. Can you help me understand what is happening?

I have also used the online tool https://regex101.com/, which shows the exact position of each match in the given string and accepts the u flag.

1)

As in my last post:

preg_match('/^(\pL{1,}[ ]?)+[^ ]$/u', $text, $matches)

Doing a var_dump of $matches, I get the following:

array (size=2)
  0 => string 'tè st statement' (length=16)
  1 => string 'statemen' (length=8)

2)

Here I have removed the “one or more” quantifier {1,}, since the plus sign + after the parentheses can do the same:

preg_match('/^(\pL[ ]?)+[^ ]$/u', $text, $matches)

Using var_dump:

array (size=2)
  0 => string 'tè st statement' (length=16)
  1 => string 'n' (length=1)

3)

Here I have removed the rule that denies a space after the last letter (maybe here var_dump produces the wanted result?):

preg_match('/^(\pL[ ]?)+$/u', $text, $matches)

Doing a var_dump, element 1 of the array contains the last letter:

array (size=2)
  0 => string 'tè st statement' (length=16)
  1 => string 't' (length=1)

Can you explain this to me? Many thanks!
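
A likely explanation for the three results: element 0 is always the whole match, and element 1 is the capturing group. When a group is repeated with +, each iteration overwrites the previous capture, so only the last iteration is kept. In 1) the final [^ ] has to consume the last t, so the last iteration of the group backs off to “statemen”; in 2) and 3) each iteration is a single letter, so the group ends up holding “n” or “t”. The sketch below reproduces this and shows one way to collect every word; it assumes PHP 7.0 or later and uses made-up variable names.

```php
<?php
$text = "t\u{00E8} st statement"; // "tè st statement"

// Case 1: the trailing [^ ] takes the final "t", so the last iteration
// of the repeated group is "statemen".
preg_match('/^(\pL{1,}[ ]?)+[^ ]$/u', $text, $first);
var_dump($first[1]); // string(8) "statemen"

// Case 3: every iteration is one letter (plus an optional space); only
// the last iteration, "t", is left in the group.
preg_match('/^(\pL[ ]?)+$/u', $text, $third);
var_dump($third[1]); // string(1) "t"

// To get every word, match the words themselves with preg_match_all.
preg_match_all('/\pL+/u', $text, $all);
print_r($all[0]); // the three words: tè, st, statement
```

If the goal is only validation, the group can be made non-capturing with (?: ... ), and $matches is then not needed at all.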
