  1. #1
    SitePoint Zealot
    Join Date
    Nov 2008
    Location
    Italy
    Posts
    151

    Regexp finding all accent combinations of the same word

    Hello everybody,

    I would like to know whether it is possible to match all accented variants of a given word. I'll try to explain better:
    - For example, I have words like 'dùck' or 'làmb' or 'lámb' or 'lìes' or 'ùniversity'.
    So I am asking whether it is possible to find these words (for instance, from an input box, in order to highlight them) even when I type them without accents, like duck, lamb, lies, etc.

    I hope I have been clear enough!

  2. #2
    SitePoint Addict wibble wobble's Avatar
    Join Date
    Dec 2008
    Posts
    242
    Hi,
    Functions soundex() and metaphone() do this.

  3. #3
    SitePoint Zealot
    Join Date
    Nov 2008
    Location
    Italy
    Posts
    151

    No, because that method is valid only for English words.
    I'll try to explain better with a rough example. Consider an array that maps the letter 'a' to all of its important accented variants, and the same for e, i, o, u:
    Code:
    
    array('a' => array('à', 'á'),
          'e' => array('è', 'é'),
          'i' => array('ì', 'í'))
    ...
    and so on. I want words that contain an accented letter from this array to be treated as words without accents for searching purposes. So even if I type a word without accents, I will find both the unaccented word and the same word with accents.
    I don't know, but maybe something like a character class in a regexp would do it. These are the characters I want to consider: [èéàáìíòóùú]. The others, such as âãäåėěîïôõöûü, are not important for my purposes.

  4. #4
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    There isn't a built-in PHP function that does this, but a simple Google search turned up this comment on a PHP.net page:
    http://uk3.php.net/strtr#85556
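
    The linked comment is not reproduced here. As a minimal sketch of the general idea, assuming UTF-8 strings and only the ten accented vowels mentioned earlier in the thread, accents can be folded away with the array form of strtr(). The helper name strip_accents() is made up for this illustration.

```php
<?php
// Hypothetical helper: fold the accented vowels discussed in this thread
// to their plain ASCII letters. Assumes the input is valid UTF-8.
// The array form of strtr() replaces whole substrings, so the two-byte
// UTF-8 sequences are swapped as units rather than byte by byte.
function strip_accents(string $text): string
{
    static $map = [
        'à' => 'a', 'á' => 'a',
        'è' => 'e', 'é' => 'e',
        'ì' => 'i', 'í' => 'i',
        'ò' => 'o', 'ó' => 'o',
        'ù' => 'u', 'ú' => 'u',
    ];
    return strtr($text, $map);
}

echo strip_accents('dùck, làmb, lìes'), "\n"; // prints "duck, lamb, lies"
```

    Note that this throws the accents away, so on its own it is only suitable for building a searchable copy of the text, not for the text that is displayed.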
    Jake Arkinstall
    "Sometimes you don't need to reinvent the wheel;
    Sometimes its enough to make that wheel more rounded"-Molona

  5. #5
    SitePoint Zealot
    Join Date
    Nov 2008
    Location
    Italy
    Posts
    151
    Yes, but I have already done the same with str_replace. Still, strtr() is faster, thanks.
    Let's go into detail, though.
    Earlier I was talking about regexps because I have used one; see this:
    Code:
    preg_replace("/($single_word)/i", '<span class="highlight_word">\1</span>', $text);
    This regexp finds words both with and without accents, and it is case-insensitive (the i flag). But it works correctly only if I type the accents wherever the word has them.
    I want to use this function to match an accented word even if I type the same word with unaccented letters.

    One solution is to strip the accents from all of $text, but then I lose all the original accents: the function would be right only for unaccented words and would return a 'useless' text full of unaccented words.

  6. #6
    SitePoint Member
    Join Date
    Dec 2008
    Posts
    22
    Maybe you can convert the text with iconv and then process the result with a regex.

  7. #7
    SitePoint Zealot
    Join Date
    Nov 2008
    Location
    Italy
    Posts
    151

    Yes, but after converting $text I lose my original string.

    The path to follow, so far, would be to run the regex against the converted string, apply the highlight, and return the original string with the original word highlighted, even though the check was done on the modified string without accents.

    That's why I am asking about a regex (if one exists). It would be faster than the other methods, if it is possible to add some kind of character class to the regex pattern I posted.
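
    The path described above (search a converted copy, highlight the original) can be sketched as follows. This is only one possible way to do it, assuming UTF-8 input; ACCENT_MAP and highlight_folded() are hypothetical names, and the text is not HTML-escaped here.

```php
<?php
// Hypothetical fold table: only the lowercase vowels named in this thread.
const ACCENT_MAP = [
    'à' => 'a', 'á' => 'a', 'è' => 'e', 'é' => 'e', 'ì' => 'i',
    'í' => 'i', 'ò' => 'o', 'ó' => 'o', 'ù' => 'u', 'ú' => 'u',
];

// Search an accent-folded copy of $text, but wrap the ORIGINAL characters.
function highlight_folded(string $text, string $word): string
{
    if ($word === '') {
        return $text;
    }
    // Split into UTF-8 characters and fold each one, recording at which
    // byte offset of the folded copy every original character starts.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $folded = '';
    $charAt = [];
    foreach ($chars as $i => $ch) {
        $charAt[strlen($folded)] = $i;
        $folded .= ACCENT_MAP[$ch] ?? $ch;
    }
    $charAt[strlen($folded)] = count($chars); // sentinel for end of text

    // Match the folded word against the folded copy (byte offsets).
    $pattern = '/' . preg_quote(strtr($word, ACCENT_MAP), '/') . '/iu';
    preg_match_all($pattern, $folded, $m, PREG_OFFSET_CAPTURE);

    $out = '';
    $next = 0; // index of the next original character still to copy
    foreach ($m[0] as [$match, $offset]) {
        $start = $charAt[$offset];
        $end = $charAt[$offset + strlen($match)];
        $out .= implode('', array_slice($chars, $next, $start - $next))
              . '<span class="highlight_word">'
              . implode('', array_slice($chars, $start, $end - $start))
              . '</span>';
        $next = $end;
    }
    return $out . implode('', array_slice($chars, $next));
}

echo highlight_folded('A làmb and a Lamb', 'lamb'), "\n";
// prints: A <span class="highlight_word">làmb</span> and a <span class="highlight_word">Lamb</span>
```

    The offset table is what lets the match positions found in the folded copy be translated back to the untouched original, so no accent is lost in the output.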

  8. #8
    SitePoint Member
    Join Date
    Dec 2008
    Posts
    22
    This code should work properly:

    PHP Code:
    <?php
    $temp = "any text containing unusual characters "; // input text

    $temp1 = explode(" ", $temp); // split into words

    $output = ""; // initialise before appending
    foreach ($temp1 as $value) {
        $temp2 = iconv("iso-8859-1", "utf-8//TRANSLIT", $value); // change the charsets to whatever you want

        if (stripos($temp2, "any") !== FALSE) {
            $output .= "<b>" . $value . "</b> ";
        } else {
            $output .= $value . " ";
        }
    }
    echo "<br>" . $output;
    ?>

  9. #9
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    But again that doesn't help at all because he's looking to apply regex.

    What regex would you like to apply if the letters were in normal format?

  10. #10
    SitePoint Member
    Join Date
    Dec 2008
    Posts
    22
    It works like his regex. Change "any" in this line:

    if (stripos($temp2,"any") !== FALSE) {

    My code searches for the word "any" and replaces it with "<b>any</b>". It also matches "anyone" and replaces it with "<b>anyone</b>".

  11. #11
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    That isn't regex.

    Regex has much more power than any flat string function could be stretched to.

    But, as he said, he wants to match:
    Enter Ány Charactér
    And return the string, along with the Á and é.

  12. #12
    SitePoint Zealot
    Join Date
    Nov 2008
    Location
    Italy
    Posts
    151
    MC_delta_T's idea is good, and it is the same as mine.
    Arkinstall is right: I want to return the string exactly as it is.
    The accent handling is needed only to recognize the string and apply the span tag.

  13. #13
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    If I understand correctly, you have a specific list of $words you want to match, and a specific list of accented characters.
    PHP Code:
    header('content-type:text/plain');
    $words = array(
        'duck',
        'lámb',
        'ùniversity',
    );
    $text = "
    'dùck' or 'lAmb' or 'l&aacute;mb' or 'lìes' or 'ùniversity'.
    ";
    $collation = array(
        array('a', 'à', 'á'),
        array('e', 'è', 'é'),
        // etc...
    );
    $find = array();
    $replace = array();
    foreach ($collation as $chars) {
        $tmp = '[' . join($chars) . ']';
        $find[] = "/$tmp/";
        $replace[] = $tmp;
    }

    $prepped = array();
    foreach ($words as $word) {
        $prepped[] = preg_replace($find, $replace, preg_quote($word, '/'));
    }

    echo $pattern = sprintf('/\b(%s)\b/i', join('|', $prepped));
    echo preg_replace($pattern, '<span>$1</span>', $text);
    The case-insensitive regex flag seems to work from my single test of the A character, but I believe that depends on the locale. You may want to explicitly include the uppercase variants in $collation to make sure it always works.

    Almost none of this really needs to be done at runtime. You could save $pattern to an array and just load it. Keep in mind there's a maximum length for a regex (I think around 36000 chars, but it's a setting), so if you have a lot of $words to search for, you will need to split them over multiple patterns.
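
    For text stored as UTF-8, a variant of the same idea might look like the sketch below. It is not the code posted above: the helper name accent_pattern() is hypothetical, the u pattern modifier is an addition, and the full five-vowel $collation is filled in as an assumption based on the characters listed earlier in the thread.

```php
<?php
// Hypothetical helper: build one alternation pattern in which every
// character from $collation is replaced by the class of its whole group.
// Assumes the source file, the words and the searched text are UTF-8.
function accent_pattern(array $words, array $collation): string
{
    // e.g. 'a', 'à' and 'á' all map to the class '[aàá]'
    $classes = [];
    foreach ($collation as $group) {
        $class = '[' . implode('', $group) . ']';
        foreach ($group as $char) {
            $classes[$char] = $class;
        }
    }

    $alternatives = [];
    foreach ($words as $word) {
        // strtr() with an array never rescans its own output, so an
        // inserted class is not expanded a second time.
        $alternatives[] = strtr(preg_quote($word, '/'), $classes);
    }

    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so each class matches one whole character instead of one byte.
    return '/\b(' . implode('|', $alternatives) . ')\b/iu';
}

$collation = [
    ['a', 'à', 'á'], ['e', 'è', 'é'], ['i', 'ì', 'í'],
    ['o', 'ò', 'ó'], ['u', 'ù', 'ú'],
];
$pattern = accent_pattern(['duck', 'lámb', 'ùniversity'], $collation);
$text = "'dùck' or 'lAmb' or 'lìes' or 'ùniversity'.";
$highlighted = preg_replace($pattern, '<span>$1</span>', $text);
echo $highlighted, "\n";
```

    Because the pattern depends only on $words and $collation, it can be built once and stored, as suggested above, and only the final preg_replace() has to run per request.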

  14. #14
    SitePoint Zealot
    Join Date
    Nov 2008
    Location
    Italy
    Posts
    151
    crmalibu, I think what you have posted is what I need.
    Now I ask: when I run this script, it doesn't seem to match the words 'duck' and 'lámb', even though these words are listed in the $words array. Why?

    By 'match' I mean here the wrapping within the span tags!

  15. #15
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Because you need to fill in the values in the $collation array.

  16. #16
    SitePoint Zealot
    Join Date
    Nov 2008
    Location
    Italy
    Posts
    151
    Yes, it works. But for the word 'l&aacute;mb', before passing $text to the function I should str_replace '&aacute;' with 'á' and then proceed. Is that right?

  17. #17
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Yes. You may be able to use html_entity_decode() here.
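
    A minimal sketch of that suggestion, assuming the text should end up as UTF-8. The flags and charset are written out explicitly because the defaults of html_entity_decode() have changed between PHP versions.

```php
<?php
// Decode HTML entities before matching, so that '&aacute;' becomes a
// real 'á' that an accent-aware pattern can see.
$text = "'l&aacute;mb'";
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n"; // prints 'lámb' (including the quotes)
```

    Decoding also turns entities such as &lt; back into literal characters, so the result may need escaping again before it is written into an HTML page.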

  18. #18
    SitePoint Zealot
    Join Date
    Nov 2008
    Location
    Italy
    Posts
    151
    OK, thanks. The problem has been resolved in the best manner. This code will be really useful. Many thanks to all. Bye bye!

