

  1. #1
    Non-Member
    Join Date
    Jan 2004
    Location
    Seattle
    Posts
    4,328

    Advanced preg_replace function

    I'm working with a content management system, with a static page at MySite/Reference/Glossary/index.php. Each page is identified as $MyName, where $MyName is the last portion of the URL. So, if the URL is MySite/Reference/Glossary/nation, then $MyName = "nation".
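For illustration only: the thread doesn't show how the CMS works out $MyName, but one way to take the last portion of a URL in PHP is parse_url() followed by basename(). The function name lastPathSegment() and the example.com URL are hypothetical.

```php
<?php
// Hypothetical helper: returns the last path segment of a URL,
// e.g. "http://example.com/Reference/Glossary/nation" -> "nation".
function lastPathSegment(string $url): string
{
    // Keep only the path part ("/Reference/Glossary/nation"); query strings are dropped.
    $path = parse_url($url, PHP_URL_PATH) ?? '';
    // Ignore a trailing slash, then take the final segment.
    return basename(rtrim($path, '/'));
}

$MyName = lastPathSegment('http://example.com/Reference/Glossary/nation');
echo $MyName, "\n"; // nation
```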

    Now imagine the following text drawn from a database:

    "Nationhood is the dream of millions of people. But becoming a nation takes work. Nations have certain duties and responsibilities. Like territories, nations must be defended. Are you familiar with the term nation-building?"

    Either of the scripts below...

    PHP Code:
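    // Lookarounds check the neighbouring characters without including them in $0
    $Text = preg_replace('/(?<![-a-zA-Z])' . preg_quote($MyName, '/') . '(?![-a-zA-Z])/', '<em>$0</em>', $Text);

    $Text = preg_replace("/\b$MyName\b/", '<em>$0</em>', $Text);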
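    ...will italicize every lowercase, stand-alone instance of "nation" while ignoring words like nationhood. (The first script also skips nation-building; the \b version doesn't, because \b treats a hyphen as a word boundary.) The result looks like this: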

    "Nationhood is the dream of millions of people. But becoming a <em>nation</em> takes work. Nations have certain duties and responsibilities. Like territories, nations must be defended. Are you familiar with the term nation-building?"
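A small check of the two approaches on part of the sample text shows where they differ. This sketch assumes $MyName holds a plain word; the character-class version is written with lookarounds, which test the neighbouring characters without including them in $0.

```php
<?php
$MyName = 'nation';
$Text = 'But becoming a nation takes work. Are you familiar with the term nation-building?';

// \b sits between a word character [A-Za-z0-9_] and anything else,
// so the hyphen in "nation-building" counts as a boundary and that "nation" matches too.
$withB = preg_replace('/\b' . preg_quote($MyName, '/') . '\b/', '<em>$0</em>', $Text);

// A letter or hyphen on either side blocks the match.
$withLookarounds = preg_replace(
    '/(?<![-a-zA-Z])' . preg_quote($MyName, '/') . '(?![-a-zA-Z])/',
    '<em>$0</em>',
    $Text
);

echo $withB, "\n";
// But becoming a <em>nation</em> takes work. Are you familiar with the term <em>nation</em>-building?
echo $withLookarounds, "\n";
// But becoming a <em>nation</em> takes work. Are you familiar with the term nation-building?
```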

    But there are two more things I'd like to italicize: instances of "nation" that begin a sentence ("Nation," with a capital N) and the plural form ("nations").

    It's a little more complex than that, though, since plurals take different forms. For example, the plural of echo is echoES, and the plural of goose is gEEse.

    Since I probably can't cover all the bases, I thought I'd try to handle the more common plural forms. So, is there a way of modifying either of the above scripts so that they italicize...

    1. $MyName, even when it begins with a capital letter

    2. $MyName, even when it ends in s or es

    3. $MyName, even when it begins with a capital letter AND ends in s or es
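One way to cope with irregular plurals like geese is to list them explicitly rather than guess suffixes. A sketch, assuming a hand-maintained array; $pluralForms and highlightTerm() are hypothetical names, and terms without an entry fall back to adding s or es.

```php
<?php
// Hypothetical map of irregular plurals; only terms listed here get special forms.
$pluralForms = [
    'echo'  => ['echoes'],
    'goose' => ['geese'],
];

// Hypothetical helper: italicizes $term and its plural forms as whole words.
function highlightTerm(string $term, array $pluralForms, string $text): string
{
    // Fall back to the regular "s" / "es" endings when no irregular plural is listed.
    $forms = array_merge([$term], $pluralForms[$term] ?? [$term . 's', $term . 'es']);

    // Escape each form and join them into one alternation, e.g. goose|geese
    $alternation = implode('|', array_map(fn($f) => preg_quote($f, '/'), $forms));

    // \b keeps matches to whole words; /i also catches "Geese" at the start of a sentence.
    return preg_replace('/\b(?:' . $alternation . ')\b/i', '<em>$0</em>', $text);
}

echo highlightTerm('goose', $pluralForms, 'One goose, two geese. Geese fly.'), "\n";
// One <em>goose</em>, two <em>geese</em>. <em>Geese</em> fly.
echo highlightTerm('nation', $pluralForms, 'Nations and a nation, but not nationhood.'), "\n";
// <em>Nations</em> and a <em>nation</em>, but not nationhood.
```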

    Thanks.

  2. #2
    is_empty(2); foofoonet
    Join Date
    Mar 2006
    Posts
    1,000
    Search for "regex coach" and download a little Windows program that lets you test different regexes on any text you like. It's a great time saver, and will save you many postings.

  3. #3
    Obey the Purebreed trib4lmaniac
    Join Date
    Dec 2004
    Location
    Cornwall, UK
    Posts
    594
    http://mq.astronomyforbeginners.com/regexp.php is a handy tool I made years ago (and still use) when testing my regular expressions.

    A simple fix would be:
    PHP Code:
    $Text = preg_replace('/\b' . preg_quote($MyName) . '(es|s)?\b/i', '<em>$0</em>', $Text);
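Here is that one-liner run against the sample text from the first post, with each piece of the pattern commented. Passing '/' as preg_quote()'s second argument (so a slash in the term would be escaped too) is an addition, not part of the original line.

```php
<?php
$MyName = 'nation';
$Text = 'Nationhood is the dream of millions of people. But becoming a nation takes work. '
      . 'Nations have certain duties and responsibilities. Like territories, nations must be defended. '
      . 'Are you familiar with the term nation-building?';

// \b ... \b   : whole words only, so "Nationhood" is skipped
// preg_quote  : escapes regex metacharacters in the term ('/' is the delimiter)
// (es|s)?     : optional plural ending
// /i          : case-insensitive, so "Nation" and "Nations" match
$Text = preg_replace('/\b' . preg_quote($MyName, '/') . '(es|s)?\b/i', '<em>$0</em>', $Text);
echo $Text, "\n";
```

Running it italicizes nation, Nations and nations and leaves Nationhood alone, but it also italicizes the "nation" in nation-building, because \b treats the hyphen as a word boundary.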

  4. #4
    Non-Member
    Join Date
    Jan 2004
    Location
    Seattle
    Posts
    4,328
    Quote Originally Posted by trib4lmaniac
    http://mq.astronomyforbeginners.com/regexp.php is a handy tool I made years ago (and still use) when testing my regular expressions.

    A simple fix would be:
    PHP Code:
    $Text = preg_replace('/\b' . preg_quote($MyName) . '(es|s)?\b/i', '<em>$0</em>', $Text);
    Wow, that nailed it. If I wanted to add a third plural ending, like "ates," I assume I would just change (es|s) to (es|s|ates)?

    I'll download your tool and Regex Coach. I can see I'm going to be using regex a lot. Thanks.
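Yes: each extra ending is just another alternative inside the group. If the list grows, building the group from an array saves editing the regex by hand. A sketch; $suffixes is an illustrative list.

```php
<?php
$MyName   = 'nation';
$suffixes = ['es', 's', 'ates']; // illustrative list of plural endings

// Escape each ending and join them into an optional group: (es|s|ates)?
$group   = '(' . implode('|', array_map(fn($s) => preg_quote($s, '/'), $suffixes)) . ')?';
$pattern = '/\b' . preg_quote($MyName, '/') . $group . '\b/i';

echo $pattern, "\n"; // /\bnation(es|s|ates)?\b/i
echo preg_replace($pattern, '<em>$0</em>', 'nation, nations, nationates, nationhood'), "\n";
// <em>nation</em>, <em>nations</em>, <em>nationates</em>, nationhood
```

The order of the alternatives doesn't matter here: if a shorter ending is tried first and the trailing \b then fails, the engine backtracks and tries the next one.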

  5. #5
    Non-Member
    Join Date
    Jan 2004
    Location
    Seattle
    Posts
    4,328
    2 more questions...

    1. Do you know how to modify this script so it treats hyphens (-) like letters? For example, suppose I'm focusing on the word "right" (as in the political right). If my text contains the terms left-right and right-winger, then the HTML would look like this:

    left-<em>right</em>, <em>right</em>-winger

    Instead, I want to leave those words alone. I figured out how to add the hyphens to the other script...

    PHP Code:
    [^-a-zA-Z](\$MyName)[^-a-zA-Z] with replacement '<em>' . $1 . '</em>'
    ...but I'm not sure how to add them to your script:

    PHP Code:
    $Text = preg_replace('/\b' . preg_quote($MyName) . '(es|s)?\b/i', '<em>$0</em>', $Text);
    2. Suppose I have some text that features a particular word over and over and over. Is there a way to modify your script so it only applies to the FIRST instance of a word? More precisely, could I change it so it modifies all words just once?

    Thanks.

  6. #6
    Obey the Purebreed trib4lmaniac
    Join Date
    Dec 2004
    Location
    Cornwall, UK
    Posts
    594
    PHP Code:
    $Text = preg_replace('/\b(?<!-)' . preg_quote($MyName) . '(es|s|ates)?(?!-)\b/i', '<em>$0</em>', $Text);
    Does that do what you want for question 1?
    Question 2 is a bit trickier. Do you mean that in the string "nation nation" you get back "<em>nation</em> nation"?
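A quick check of the lookbehind/lookahead version on the "right" examples from question 1 (the test string is made up for illustration):

```php
<?php
$MyName = 'right';
$Text   = 'left-right, right-winger, the right, Rights';

// (?<!-) : the match may not be preceded by a hyphen
// (?!-)  : the match may not be followed by a hyphen
$pattern = '/\b(?<!-)' . preg_quote($MyName, '/') . '(es|s|ates)?(?!-)\b/i';
echo preg_replace($pattern, '<em>$0</em>', $Text), "\n";
// left-right, right-winger, the <em>right</em>, <em>Rights</em>
```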

  7. #7
    Non-Member
    Join Date
    Jan 2004
    Location
    Seattle
    Posts
    4,328
    Quote Originally Posted by trib4lmaniac
    PHP Code:
    $Text = preg_replace('/\b(?<!-)' . preg_quote($MyName) . '(es|s|ates)?(?!-)\b/i', '<em>$0</em>', $Text);
    Does that do what you want for question 1?
    Perfect; that script is now complete.

    Question 2 is a bit trickier. Do you mean that in the string "nation nation" you get back "<em>nation</em> nation"?
    Exactly. I figured it would probably be pretty tricky, and it's not that big a deal. If there's a relatively simple way to do it, I'll go for it; otherwise, I can live with multiple linked words.

    Thanks.

  8. #8
    Obey the Purebreed trib4lmaniac
    Join Date
    Dec 2004
    Location
    Cornwall, UK
    Posts
    594
    PHP Code:
    $Text = preg_replace('/\b(?<!-)' . preg_quote($MyName) . '(es|s|ates)?(?!-)\b/i', '<em>$0</em>', $Text, 1);
    Just remembered: preg_replace() has a limit parameter (the fourth argument), so passing 1 replaces only the first match.
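If the goal is to italicize each of several words just once, the same limit can be applied per word in a loop. A sketch; highlightFirst() and the term list are hypothetical, and it assumes no term is the word "em" itself, since later passes would otherwise match inside tags added by earlier ones.

```php
<?php
// Hypothetical helper: italicizes only the first occurrence of each term.
function highlightFirst(array $terms, string $text): string
{
    foreach ($terms as $term) {
        $pattern = '/\b(?<!-)' . preg_quote($term, '/') . '(es|s|ates)?(?!-)\b/i';
        // 4th argument = limit: replace at most one match for this term.
        $text = preg_replace($pattern, '<em>$0</em>', $text, 1);
    }
    return $text;
}

echo highlightFirst(['nation', 'state'], 'A nation is not a state; every nation and every state differs.'), "\n";
// A <em>nation</em> is not a <em>state</em>; every nation and every state differs.
```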

  9. #9
    SitePoint Enthusiast
    Join Date
    May 2006
    Posts
    38
    here's what i could manage off the top of my head...
    PHP Code:
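    $MyName = "nation";
    $source = "nation nation national";
    // [nN] lets only the first letter be either case; \b and (\W|$) keep it to whole words,
    // including a match at the very end of the string
    $find = '@\b([' . strtolower($MyName[0]) . strtoupper($MyName[0]) . ']' . substr($MyName, 1) . '(e?s)?)(\W|$)@';
    $replace = '<em>$1</em>$3';
    $hilited = preg_replace($find, $replace, $source);
    echo $hilited;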
    should produce
    Code:
    <em>nation</em> <em>nation</em> national
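A variation on the [nN] trick: PCRE's scoped modifier (?i:...) makes just the first letter case-insensitive, so Nation and nation match but NATION doesn't. A sketch, assuming $MyName begins with a plain ASCII letter (indexing $MyName[0] reads a single byte).

```php
<?php
$MyName = 'nation';
$source = 'Nation nation NATION nations national';

// (?i:n) matches "n" or "N"; the rest of the word stays case-sensitive.
$find = '/\b(?i:' . preg_quote($MyName[0], '/') . ')'
      . preg_quote(substr($MyName, 1), '/')
      . '(e?s)?\b/';

echo preg_replace($find, '<em>$0</em>', $source), "\n";
// <em>Nation</em> <em>nation</em> NATION <em>nations</em> national
```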

