  1. #1
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676

    It's official: regular expressions are evil...

    Well, maybe that's not true, but it certainly feels that way today.

    Anyway, I'm trying to create a regular expression to look for the following search queries:

    • Show <assettypes>
    • Show <assettypes> from today [option: by <username / workgroup>]
    • Show <assettypes> from yesterday [option: by <username / workgroup>]
    • Show <assettypes> from <number> days ago [option: by <username / workgroup >]
    • Show <assettypes> from last <period: week / month / year> [option: by <username / workgroup >]

    And so on and so forth.

    So firstly, I need to be able to identify the pattern and then secondly lift out the salient details to build the query...

    If someone can help with the expression, that would be cool.

    I'm pretty sure I could figure the rest out from there...

  2. #2
    SitePoint Enthusiast gundari's Avatar
    Join Date
    Apr 2005
    Location
    Santa Fe, Argentina
    Posts
    38

  3. #3
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    I have no idea what any of that means! Sorry!

    That's just unintelligible to me.

    The screen shots are just frightening...

  4. #4
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Quote Originally Posted by Forbes
    Anyway, I'm trying to create a regular expression to look for the following search queries:
    What exactly is the problem? If you don't know regexp syntax, consider reading the manual.

  5. #5
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    I've been onto PHP.net and there are no examples that cover what I'm trying to do, that's why I'm posting here...

  6. #6
    SitePoint Wizard samsm's Avatar
    Join Date
    Nov 2001
    Location
    Atlanta, GA, USA
    Posts
    5,011
    I like this application: The Regex Coach. It allows you to quickly figure out what is working and what isn't... as you create your expression it will show you what part of a string you are matching.
    Using your unpaid time to add free content to SitePoint Pty Ltd's portfolio?

  7. #7
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    What about a Mac version? Well, I had a beta version for OS X online for some time but it had some serious problems and wasn't ready for distribution. The problem is that, while porting applications between Windows, Linux, and OS X is generally a breeze with LispWorks, it turns out that in the case of The Regex Coach it isn't - the way the application works on the other two platforms can't be mapped one-to-one to OS X because of threading issues...
    Fell at the first hurdle.

    I don't have access to either of the other two qualifying platforms.

    I'll keep an eye on this because it looks like a really nice application.

    Lord knows I need help with regular expressions...

  8. #8
    SitePoint Addict
    Join Date
    Sep 2004
    Posts
    211
    http://www.regexplib.com/

    maybe you'll find something in there to help you out or at least get you started.

  9. #9
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    I've sort of got things going:
    PHP Code:
    if (ereg("show ([A-Za-z]{1,})", $this->arrayInputValues['search'], $regExpMatches)) {

        echo "1st &raquo; " . $this->arrayInputValues['search'] . " / " . $regExpMatches[0];

    } elseif (ereg("show [A-Za-z]|(by [A-Za-z])", $this->arrayInputValues['search'], $regExpMatches)) {

        echo "2nd &raquo; " . $this->arrayInputValues['search'] . " / " . $regExpMatches[0];

    } // end if
    But the problem is, the return string includes the word: "show" which is what I want to exclude.

    So in the example in my opening post, all words in bold would need to be excluded from the returned array...

  10. #10
    SitePoint Evangelist jplush76's Avatar
    Join Date
    Nov 2003
    Location
    Los Angeles, CA
    Posts
    460
    so you don't want the word "show" in your echo statement?

    It's because you're looking in $regExpMatches[0] instead of $regExpMatches[1].

    Do you see how your [A-Za-z]{1,} is surrounded by parentheses? Each set of parentheses gets its own index in the matches array, starting at 1. So $regExpMatches[0] is the WHOLE string the regex matched on and $regExpMatches[1] is the first set of parentheses.

    PHP Code:
    if (ereg("show ([A-Za-z]{1,})", $this->arrayInputValues['search'], $regExpMatches)) {

        echo "1st &raquo; " . $this->arrayInputValues['search'] . " / " . $regExpMatches[0];

    }
    here is a working example
    PHP Code:
    <?php
    $string = "show assetgroup";

    if (preg_match("#show\s([A-Za-z]{1,})#", $string, $regExpMatches)) {

        echo "1st &raquo; " . $string . " / " . $regExpMatches[1];

    }

    ?>
    My-Bic - Easiest AJAX/PHP Framework Around
    Now Debug PHP scripts with Firebug!

  11. #11
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Hi Jim!

    In my example, if I search for: 'show contacts from yesterday by wayne'

    I get: '2nd show contacts from yesterday by wayne / show contacts / contacts /'

    So that's the whole string, then the first part of the match, including the word: 'show', but it's ignoring the last part of the string which should have caused the 3rd statement to fire, not the second...

  12. #12
    SitePoint Evangelist jplush76's Avatar
    Join Date
    Nov 2003
    Location
    Los Angeles, CA
    Posts
    460
    It's the same concept in the 2nd regex check as well: you're echoing out the matches[0] key, which is the whole string. Surround anything you want to pick out of your string with parentheses. So take the string

    "jim is sleepy today"

    if I wanted to extract each word with a regex I could do this:

    PHP Code:
    <?php
    $string = "jim is sleepy today";
    if (preg_match("#(jim)\s(is)\s(sleepy)\s(today)#", $string, $matches))
    {
       echo "whole string is {$matches[0]}<BR><BR>";
       echo "1st word is : {$matches[1]}<BR>";
       echo "2nd word is : {$matches[2]}<BR>";
       echo "3rd word is : {$matches[3]}<BR>";
       echo "4th word is : {$matches[4]}<BR>";
    }

    ?>
    which prints out...
    whole string is jim is sleepy today

    1st word is : jim
    2nd word is : is
    3rd word is : sleepy
    4th word is : today
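
    The same idea applied to one of the queries from the first post. This is only a minimal sketch: the variable names ($query, $assetType, $user) are made up for illustration, and it uses preg_match because ereg was removed in PHP 7.0. The literal words sit outside the parentheses, so only the asset type and the user name end up in the captured groups.

```php
<?php
// Hypothetical input, taken from the example query earlier in the thread.
$query = "show contacts from yesterday by wayne";

// Parentheses mark the parts to extract; the literal words stay outside them.
// ^ and $ anchor the pattern to the whole string, and /i ignores case.
$pattern = '/^show\s+([a-z]+)\s+from\s+yesterday\s+by\s+([a-z]+)$/i';

$assetType = null;
$user = null;

if (preg_match($pattern, $query, $matches)) {
    // $matches[0] is the full match; [1] and [2] are the captured groups.
    $assetType = $matches[1];
    $user = $matches[2];
    echo "asset type: {$assetType}, user: {$user}\n";
}
```

    For the query above this prints "asset type: contacts, user: wayne".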

  13. #13
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    OK, having some success, now:
    PHP Code:
    if (ereg("show ([a-zA-Z]{1,})$", $this->arrayInputValues['search'], $arrayRegExpMatches)) {

        echo "1st &raquo; ";

        foreach ($arrayRegExpMatches as $value) { echo ("$value / "); } // end foreach

    } elseif (ereg("show ([a-zA-Z]{1,}[^by ]$)|show ([a-zA-Z]{1,}) by ([a-zA-Z]{1,}$)", $this->arrayInputValues['search'], $arrayRegExpMatches)) {

        echo "2nd &raquo; ";

        foreach ($arrayRegExpMatches as $value) { echo ("$value / "); } // end foreach

    } elseif (ereg("show ([a-zA-Z]{1,}) from ([a-zA-Z]{1,}[^by ]$)|show ([a-zA-Z]{1,}) from ([a-zA-Z]{1,}) by ([a-zA-Z]{1,})", $this->arrayInputValues['search'], $arrayRegExpMatches)) {

        echo "3rd &raquo; ";

        foreach ($arrayRegExpMatches as $value) { echo ("$value / "); } // end foreach

    } // end if

  14. #14
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    I'm getting a stray empty element coming through the array which I need to keep out.

    But other than that, I think this is the basis of a working model...
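
    A likely cause of the stray empty element: when a pattern has two alternatives separated by |, each with its own parentheses, the groups belonging to the alternative that did not match can still show up in the array as empty entries. The sketch below shows this with preg_match (an assumption on my part, since ereg is gone from PHP 7.0 onwards) and one way around it, an optional non-capturing (?: ... )? clause. The variable names are hypothetical.

```php
<?php
// Two alternatives, each with its own capture groups
// (the same shape as the ereg patterns above).
$alternation = '/^show ([a-z]+)$|^show ([a-z]+) by ([a-z]+)$/i';
preg_match($alternation, 'show contacts by wayne', $withGap);
// $withGap[1] is '' because group 1 belongs to the alternative that was not used;
// the useful values are in $withGap[2] and $withGap[3].

// One pattern with an optional, non-capturing "by ..." clause instead.
$optional = '/^show ([a-z]+)(?: by ([a-z]+))?$/i';
preg_match($optional, 'show contacts by wayne', $withUser);
preg_match($optional, 'show contacts', $withoutUser);

print_r($withGap);
print_r($withUser);
print_r($withoutUser);
```

    With the second pattern the asset type is always in index 1 and the user, when present, is always in index 2, so there is no gap to filter out.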

  15. #15
    SitePoint Evangelist jplush76's Avatar
    Join Date
    Nov 2003
    Location
    Los Angeles, CA
    Posts
    460
    glad you're on the path, good luck

  16. #16
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676

    Just for the record...

    Hi guys and thanks for proddin' and a pokin' me in the right direction.

    As usual, I'm trying to do something that isn't normally done.

    So for the sake of all of those unfortunates who might be following my route into Regular Expression hell, here's the conundrum solved:
    PHP Code:
    // Show <assettypes> [option: by <username / workgroup>]
    if (ereg("show ([a-zA-Z]{1,})$|show ([a-zA-Z]{1,}) by ([a-zA-Z]{1,})$", $this->arrayInputValues['search'], $this->arrayRegExpMatches)) {

        echo "1st &raquo; ";

        foreach ($this->arrayRegExpMatches as $value) { echo ("$value / "); } // end foreach

    // Show <assettypes> from today [option: by <username / workgroup>]
    } elseif (ereg("show ([a-zA-Z]{1,}) from today$|show ([a-zA-Z]{1,}) from today by ([a-zA-Z]{1,}$)", $this->arrayInputValues['search'], $this->arrayRegExpMatches)) {

        echo "2nd &raquo; ";

        foreach ($this->arrayRegExpMatches as $value) { echo ("$value / "); } // end foreach

    // Show <assettypes> from yesterday [option: by <username / workgroup>]
    } elseif (ereg("show ([a-zA-Z]{1,}) from yesterday$|show ([a-zA-Z]{1,}) from yesterday by ([a-zA-Z]{1,})$", $this->arrayInputValues['search'], $this->arrayRegExpMatches)) {

        echo "3rd &raquo; ";

        foreach ($this->arrayRegExpMatches as $value) { echo ("$value / "); } // end foreach

    // Show <assettypes> from last <period: week / month / year> [option: by <username / workgroup>]
    } elseif (ereg("show ([a-zA-Z]{1,}) from last ([a-zA-Z]{1,})$|show ([a-zA-Z]{1,}) from last ([a-zA-Z]{1,}) by ([a-zA-Z]{1,}$)", $this->arrayInputValues['search'], $this->arrayRegExpMatches)) {

        echo "4th &raquo; ";

        foreach ($this->arrayRegExpMatches as $value) { echo ("$value / "); } // end foreach

    // Show <assettypes> from <number> days ago [option: by <username / workgroup>]
    } elseif (ereg("show ([a-zA-Z]{1,}) from ([0-9]{1,}) days ago$|show ([a-zA-Z]{1,}) from ([0-9]{1,}) days ago by ([a-zA-Z]{1,})$", $this->arrayInputValues['search'], $this->arrayRegExpMatches)) {

        echo "5th &raquo; ";

        foreach ($this->arrayRegExpMatches as $value) { echo ("$value / "); } // end foreach

    // Show <assettypes> from last <period: week / month / year> [option: by <username / workgroup>]
    } elseif (ereg("show ([a-zA-Z]{1,}) from last ([a-zA-Z]{1,})$|show ([a-zA-Z]{1,}) from last ([a-zA-Z]{1,}) by ([a-zA-Z]{1,})$", $this->arrayInputValues['search'], $this->arrayRegExpMatches)) {

        echo "6th &raquo; ";

        foreach ($this->arrayRegExpMatches as $value) { echo ("$value / "); } // end foreach

    // Show <assettypes> from <dateformat: DD/MM/YYYY> to <dateformat: DD/MM/YYYY> [option: by <username / workgroup>]
    } elseif (ereg("show ([a-zA-Z]{1,}) from (([0-9]{1,2})/([0-9]{1,2})/([0-9]{1,2})) to (([0-9]{1,2})/([0-9]{1,2})/([0-9]{1,2}))$|show ([a-zA-Z]{1,}) from (([0-9]{1,2})/([0-9]{1,2})/([0-9]{1,2})) to (([0-9]{1,2})/([0-9]{1,2})/([0-9]{1,2})) by ([a-zA-Z]{1,})$", $this->arrayInputValues['search'], $this->arrayRegExpMatches)) {

        echo "6th &raquo; ";

        foreach ($this->arrayRegExpMatches as $value) { echo ("$value / "); } // end foreach

    } // end if
    There's still room for filling out the expressions to make them more accurate and specific, but seeing as though I've got this far, I'm sure I'll figure the rest out.

    Cheers...
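
    For anyone landing here later: the chain of branches above could probably be folded into a single pattern. What follows is a sketch under a few assumptions, not the code from this thread. It uses preg_match with named groups rather than ereg (which was removed in PHP 7.0), needs PHP 7.1 or later for the ?array return type, takes a four-digit year to match the DD/MM/YYYY comment, and the function name parseShowQuery and the group names are invented for illustration.

```php
<?php
// Hypothetical helper: returns the named parts of a "show ..." query,
// or null when the query does not fit any of the supported shapes.
function parseShowQuery(string $query): ?array
{
    // ~ is used as the delimiter because the date format contains slashes.
    $pattern = '~^show\s+(?<asset>[a-z]+)'
        . '(?:\s+from\s+(?:'
        . '(?<day>today|yesterday)'
        . '|(?<days>\d+)\s+days\s+ago'
        . '|last\s+(?<period>week|month|year)'
        . '|(?<start>\d{1,2}/\d{1,2}/\d{4})\s+to\s+(?<end>\d{1,2}/\d{1,2}/\d{4})'
        . '))?'
        . '(?:\s+by\s+(?<by>[a-z]+))?$~i';

    if (!preg_match($pattern, trim($query), $m)) {
        return null;
    }

    // Keep only the named groups that actually matched something.
    $result = [];
    foreach (['asset', 'day', 'days', 'period', 'start', 'end', 'by'] as $name) {
        if (isset($m[$name]) && $m[$name] !== '') {
            $result[$name] = $m[$name];
        }
    }
    return $result;
}

print_r(parseShowQuery('show contacts from yesterday by wayne'));
print_r(parseShowQuery('show invoices from 3 days ago'));
```

    The caller can then build the database query from whichever keys are present, instead of working out which branch fired and which array index holds what.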

  17. #17
    SitePoint Wizard samsm's Avatar
    Join Date
    Nov 2001
    Location
    Atlanta, GA, USA
    Posts
    5,011
    I was going to suggest this earlier, but thought it might be too much of a distraction.

    You may consider tackling this problem without using any regex. It would probably be more verbose, but using switch statements and the str_* functions, I wouldn't be surprised if you could come up with something that (despite having more code) runs quicker.

    Nothing to toss what you have over, just something to consider if you find that your present solution becomes unmanageable.
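
    Roughly what that could look like, as a sketch only: split the query into words with explode and walk through them with plain comparisons and a switch. The function name and the returned keys are hypothetical, the date-range form is left out to keep it short, and it assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: parses "show <asset> [from ...] [by <name>]" without any regex.
function parseShowQueryWithoutRegex(string $query): ?array
{
    // Lower-case, split on spaces, and drop empty pieces left by repeated spaces.
    $words = array_values(array_filter(explode(' ', strtolower(trim($query))), 'strlen'));

    if (count($words) < 2 || $words[0] !== 'show') {
        return null;
    }
    $result = ['asset' => $words[1]];
    $rest = array_slice($words, 2);

    // Optional trailing "by <name>".
    $n = count($rest);
    if ($n >= 2 && $rest[$n - 2] === 'by') {
        $result['by'] = $rest[$n - 1];
        $rest = array_slice($rest, 0, $n - 2);
    }

    if ($rest === []) {
        return $result; // plain "show <asset>" with or without "by"
    }
    if ($rest[0] !== 'from' || count($rest) < 2) {
        return null;
    }

    switch ($rest[1]) {
        case 'today':
        case 'yesterday':
            if (count($rest) !== 2) {
                return null;
            }
            $result['day'] = $rest[1];
            return $result;
        case 'last':
            if (count($rest) !== 3 || !in_array($rest[2], ['week', 'month', 'year'], true)) {
                return null;
            }
            $result['period'] = $rest[2];
            return $result;
        default:
            // "<number> days ago"
            if (count($rest) === 4 && ctype_digit($rest[1]) && $rest[2] === 'days' && $rest[3] === 'ago') {
                $result['days'] = $rest[1];
                return $result;
            }
            return null;
    }
}

print_r(parseShowQueryWithoutRegex('show contacts from yesterday by wayne'));
```

    Whether this actually runs quicker than a regex would need measuring on real queries; the main gain is that each rule can be read and changed on its own.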

  18. #18
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Speed of execution could be a factor later on.

    What I'm working on is a framework that other applications will sit on top of, and this particular problem is the Search tool that all applications will use.

    I steered clear of Perl-compatible regular expressions because of their increased performance hit.

    So thanks for that, I'll have a look and see what I can do...

  19. #19
    SitePoint Wizard samsm's Avatar
    Join Date
    Nov 2001
    Location
    Atlanta, GA, USA
    Posts
    5,011
    Quote Originally Posted by Forbes
    I steered clear of Perl-compatible regular expressions because of their increased performance hit.
    I think you might have that backwards. Generally, I believe it is str, preg, then ereg; in order of fastest to slowest execution time. For example, on the ereg php manual page it is noted...
    Note: preg_match(), which uses a Perl-compatible regular expression syntax, is often a faster alternative to ereg().
    If there is something that indicates that ereg is faster, or faster in some situations, I'd be interested to read it so that I may correct my understanding.
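
    The simplest way to settle it is to time the candidates on your own server. A rough sketch, with a made-up helper (timeIt) and an arbitrary iteration count: it compares a plain string function against preg_match for the same check. ereg is left out because it was removed in PHP 7.0, so it can only be timed on older versions. The numbers will vary between machines and PHP builds, so treat them as a rough guide only.

```php
<?php
// Hypothetical helper: seconds taken to call $fn $iterations times.
function timeIt(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$query = 'show contacts from yesterday by wayne';

// Two ways of asking the same question: does the query start with "show "?
$withStr = function () use ($query) {
    return strncmp($query, 'show ', 5) === 0;
};
$withPreg = function () use ($query) {
    return preg_match('/^show /', $query) === 1;
};

$strSeconds = timeIt($withStr, 100000);
$pregSeconds = timeIt($withPreg, 100000);

printf("strncmp:    %.4f s\n", $strSeconds);
printf("preg_match: %.4f s\n", $pregSeconds);
```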

