Thread: REGEX help?

  1. #1
    SitePoint Enthusiast sabret00the's Avatar
    Join Date
    Jul 2003
    Location
    London
    Posts
    75
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    REGEX help?

    I have a string
    Code:
    When: When: Thu Aug 23, 2012 1pm to Fri Aug 24, 2012 10pm \nBST\u003cbr /\u003e\n\u003cbr /\u003eWho: Paul\n\u003cbr /\u003eWhere: The O2, Greenwich\n\u003cbr /\u003eEvent Status: confirmed
    And I'd like to extract two pieces of information from it

    First is
    Code:
    The O2, Greenwich
    Second is
    Code:
    1pm to 10pm
    I've always sucked at REGEXs so help me please?

  2. #2
    SitePoint Enthusiast sabret00the's Avatar
    Join Date
    Jul 2003
    Location
    London
    Posts
    75
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    To further explain: I'm trying to extract two pieces of data from a string (see below), the time and the location.

    String example two
    Code:
    When: When: Thu Aug 23, 2012 1pm to Fri Aug 24, 2012 10pm \nBST\u003cbr /\u003e\n\u003cbr /\u003eWho: Paul\n\u003cbr /\u003eWhere: Lost Region, Timbuktu\n\u003cbr /\u003eEvent Status: confirmed

  3. #3
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,716
    Mentioned
    103 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by sabret00the
    I have a string [...] And I'd like to extract two pieces of information from it
    The following should do the job.

    Code javascript:
    var eventInfo = "When: When: Thu Aug 23, 2012 1pm to Fri Aug 24, 2012 10pm \nBST\u003cbr /\u003e\n\u003cbr /\u003eWho: Paul\n\u003cbr /\u003eWhere: The O2, Greenwich\n\u003cbr /\u003eEvent Status: confirmed",
        whenLine = eventInfo.match(/(.*)/)[1],
        when = whenLine.match(/(\d+(?:am|pm) to )/)[1] + whenLine.match(/(\d+(?:am|pm))\s*$/)[1],
        where = eventInfo.match(/Where: (.*)/)[1];
     
    // when is "1pm to 10pm"
    // where is "The O2, Greenwich"
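
    For anyone wondering why /(.*)/ only picks up the first line: in JavaScript the dot does not match line terminators unless the s (dotAll) flag is set, so a greedy (.*) stops at the first "\n". A small sketch of that behaviour, using a shortened copy of the string with the \u003c and \u003e escapes written out as plain angle brackets (the variable names here are only for illustration):

```javascript
// "." stops at "\n", so (.*) from the start of the string captures line one only.
var sample = "When: Thu Aug 23, 2012 1pm to Fri Aug 24, 2012 10pm \nBST<br />\n<br />Where: The O2, Greenwich\n<br />Event Status: confirmed";

// First line, including its trailing space
var firstLineOnly = sample.match(/(.*)/)[1];

// Everything after "Where: " up to the end of that line
var whereOnly = sample.match(/Where: (.*)/)[1];

console.log(JSON.stringify(firstLineOnly));
console.log(JSON.stringify(whereOnly));
```

    The same rule is what keeps the Where capture from running on into the "Event Status" line.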
    Programming Group Advisor
    Reference: JavaScript, Quirksmode Validate: HTML Validation, JSLint
    Car is to Carpet as Java is to JavaScript

  4. #4
    SitePoint Enthusiast sabret00the's Avatar
    Join Date
    Jul 2003
    Location
    London
    Posts
    75
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you so much. I really appreciate the help.

  5. #5
    SitePoint Enthusiast sabret00the's Avatar
    Join Date
    Jul 2003
    Location
    London
    Posts
    75
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm finding that it errors out if it doesn't find a match, however I thought to myself that I should encapsulate it in a conditional and thus I've tried
    Code javascript:
    if (eventinfo[index]['details'].match(^(20|21|22|23|[01]\d|\d)(([:.][0-5]\d){1,2})$))

    But it's giving me a parse error, what am I doing wrong?

  6. #6
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,716
    Mentioned
    103 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by sabret00the
    I'm finding that it errors out if it doesn't find a match, however I thought to myself that I should encapsulate it in a conditional and thus I've tried
    Code javascript:
    if (eventinfo[index]['details'].match(^(20|21|22|23|[01]\d|\d)(([:.][0-5]\d){1,2})$))

    But it's giving me a parse error, what am I doing wrong?
    It errors when it doesn't find a match, so let's deal with that issue instead. We can get the match separately, and then check to see if the match contains anything useful before getting the [1] index from it. If it doesn't contain anything useful, we can give it a default value of an empty string instead.

    Code javascript:
    var eventInfo = "Some non-matching content",
        whenMatch = eventInfo.match(/(.*)/),
        whenLine = (whenMatch && whenMatch[1]) || '',
        fromWhenMatch = whenLine.match(/(\d+(?:am|pm) to )/),
        toWhenMatch = whenLine.match(/(\d+(?:am|pm))\s*$/),
        whereMatch = eventInfo.match(/Where: (.*)/),
        fromWhen = (fromWhenMatch && fromWhenMatch[1]) || '',
        toWhen = (toWhenMatch && toWhenMatch[1]) || '',
        when = fromWhen + toWhen,
        where = (whereMatch && whereMatch[1]) || '';


    We can even simplify things further by putting parts of this into some functions:

    Code javascript:
    function getFirstLine(info) {
        var match = info.match(/(.*)/),
            firstLine = (match && match[1]) || '';
     
        return firstLine;
    }
     
    function getWhenInfo(firstLine) {
        var fromMatch = firstLine.match(/(\d+(?:am|pm) to )/),
            toMatch = firstLine.match(/(\d+(?:am|pm))\s*$/),
            from = (fromMatch && fromMatch[1]) || '',
            to = (toMatch && toMatch[1]) || '',
            when = from + to;
     
        return when;
    }
     
    function getWhereInfo(info) {
        var match = info.match(/Where: (.*)/),
            where = (match && match[1]) || '';
     
        return where;
    }
     
    var eventInfo = "Some non-matching info",
        firstLine = getFirstLine(eventInfo),
        when = getWhenInfo(firstLine),
        where = getWhereInfo(eventInfo);
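
    On the parse error in the quoted conditional: a regular expression literal has to be wrapped in / ... / delimiters, otherwise the parser reads ^ and the rest as ordinary operators. A minimal sketch of the same pattern written as a literal follows. The helper name looksLikeTime is made up for this example, and because of the ^ and $ anchors the pattern only matches a string that is nothing but a 24-hour time, so it would not match the full event details string.

```javascript
// Hypothetical helper, not from the thread: the pattern from the question,
// wrapped in / ... / so that it parses as a regex literal.
var timePattern = /^(20|21|22|23|[01]\d|\d)(([:.][0-5]\d){1,2})$/;

function looksLikeTime(text) {
    // test() returns true or false and does not throw on a non-match
    return timePattern.test(text);
}

console.log(looksLikeTime("13:45")); // true
console.log(looksLikeTime("1pm"));   // false
```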
    Last edited by paul_wilkins; Aug 17, 2012 at 15:01.

  7. #7
    SitePoint Enthusiast sabret00the's Avatar
    Join Date
    Jul 2003
    Location
    London
    Posts
    75
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you Paul. You're awesome. That works perfectly.

  8. #8
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,716
    Mentioned
    103 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by sabret00the
    Thank you Paul. You're awesome. That works perfectly.
    You're welcome. Just to explain briefly, this is where the syntax comes from for:
    Code:
    where = (match && match[1]) || '';
    First we start with how that might be written in full, as:

    Code javascript:
    if (match.length > 0) {
        where = match[1];
    } else {
        where = '';
    }

    Which can then easily be turned into a ternary expression of the format (...) ? ... : ...;

    Code javascript:
    where = (match.length > 0) ? match[1] : '';

    And that might do just fine, except that it's not expressive enough. Currently the above code says that the where variable is either an array value, or an empty string. But it doesn't really make it clear why this should be the case. Sure, we can come up with some kind of correlation between the length check and the match[1], but there are more expressive ways to go about this.

    What we can do is to use the && operator as a guard condition. Only if the preceding condition is truthy will JavaScript carry on and check the next one. This is commonly used to check if something exists before using it.

    Code javascript:
    if (targ && targ.nodeName && targ.nodeName === 'A') {
        ...
    }

    And there is also the || operator, which is used to provide a default value, because JavaScript will keep on checking the different conditions until it comes across one that is truthy in nature. This is commonly used to assign default values to a variable:

    Code javascript:
    function onclickEventHandler(evt) {
        evt = evt || window.event;
        var targ = evt.target || evt.srcElement;
        ...
    }

    So we can put the two together: match[1] is guarded first, in case it doesn't exist, and if nothing is found there, a default value of an empty string is used instead.

    Code javascript:
    where = (match.length > 0) && match[1] || '';

    Now, an array is always considered to be a truthy value, even if the array is completely empty, and a failed regular expression match doesn't give an array, but null instead. That means the length check can't work as a guard here (reading match.length from null throws a TypeError), so we can just test to see if the match is truthy or not.

    Code javascript:
    where = match && match[1] || '';
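
    A quick sketch to check those claims in a console. The variable names are only for illustration and the input is the non-matching sample string used earlier in the thread:

```javascript
// An empty array is truthy, so it can never trigger the || default.
var emptyIsTruthy = Boolean([]);

// A failed match is null, not an empty array.
var failed = "Some non-matching content".match(/Where: (.*)/);

// Reading .length from null throws, which is why the length check
// cannot act as the guard.
var lengthCheckThrows = false;
try {
    failed.length;
} catch (e) {
    lengthCheckThrows = e instanceof TypeError;
}

// The truthiness guard handles the null case and falls through to the default.
var where = failed && failed[1] || '';

console.log(emptyIsTruthy, failed, lengthCheckThrows, JSON.stringify(where));
```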

    And finally, to help make things clearer to someone who is reading the code, we can use parentheses to help clarify that the first two parts are related to each other:

    Code javascript:
    where = (match && match[1]) || '';

    So you could have ended up with a whole lot of if/else statements in your code.

    Code:
    if (match.length > 0) {
        where = match[1];
    } else {
        where = '';
    }
    But you now have in return code that is more expressive instead. I wouldn't say that this is a one-liner, because reducing code to a single line is not a good goal to have. Instead, it makes use of some well-known JavaScript techniques to end up even more expressive than the if/else code from before.

    Code javascript:
    where = (match && match[1]) || '';

  9. #9
    SitePoint Enthusiast sabret00the's Avatar
    Join Date
    Jul 2003
    Location
    London
    Posts
    75
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks again Paul. Me and JS have never gotten along that well, but you're actually amazing at explaining this stuff. Truly, thank you.

  10. #10
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,716
    Mentioned
    103 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by sabret00the
    Thanks again Paul. Me and JS have never gotten along that well, but you're actually amazing at explaining this stuff. Truly, thank you.
    I eventually figured out how I wanted to wrap up the discussion.

    The ternary expression is equivalent to this code:

    Code javascript:
    // var where = (match.length > 0) ? match[1] : '';
     
    var where;
    if (match.length > 0) {
        where = match[1];
    } else {
        where = '';
    }

    Whereas the preferred code, using a guard operator and a default operator, is a lot closer to this:

    Code javascript:
    // var where = (match && match[1]) || '';
     
    var where = ''; // default value
    if (match) { // guard
        where = match[1];
    }
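
    "A lot closer" rather than identical, because there is one edge case where the two differ: when the match succeeds but the capture group took no part in it, match[1] is undefined. The guard-and-default form still falls back to the empty string, while the if form assigns undefined. A small sketch with made-up helper names and a contrived pattern, just to show the difference:

```javascript
// Hypothetical helpers for comparison; the names are not from the thread.
function withGuardAndDefault(match) {
    return (match && match[1]) || '';
}

function withIfStatement(match) {
    var where = ''; // default value
    if (match) {    // guard
        where = match[1];
    }
    return where;
}

// The optional group (x)? takes no part in this match, so match[1] is undefined.
var optionalGroup = "ab".match(/a(x)?b/);

console.log(withGuardAndDefault(optionalGroup)); // ''
console.log(withIfStatement(optionalGroup));     // undefined
console.log(withGuardAndDefault(null) === withIfStatement(null)); // true, both ''
```

    With the patterns used in this thread the group always participates when the match succeeds, so the two forms give the same result there.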
    Last edited by paul_wilkins; Aug 18, 2012 at 16:02.

  11. #11
    SitePoint Enthusiast sabret00the's Avatar
    Join Date
    Jul 2003
    Location
    London
    Posts
    75
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Before this discussion, I had no idea you could do something like:

    Code javascript:
    where = (match && match[1]) || '';

    I thought you'd have to use a ternary. It actually surprised me or maybe that's because my background is PHP.

  12. #12
    SitePoint Enthusiast sabret00the's Avatar
    Join Date
    Jul 2003
    Location
    London
    Posts
    75
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi Paul, I've discovered an issue.

    Sometimes I have
    Code javascript:
    function getWhenInfo(firstLine) {
    	// Written by Paul Wilkins @ Sitepoint
        var fromMatch = firstLine.match(/(\d+(?:am|pm) to )/),
            toMatch = firstLine.match(/(\d+(?:am|pm))\s*$/),
            from = (fromMatch && fromMatch[1]) || '',
            to = (toMatch && toMatch[1]) || '',
            when = from + to;
     
        return when;
    }

    only returning

    Code:
    6pm to
    From
    Code:
    When: Tue Oct 16, 2012 6pm to 10pm \nBST\u003cbr /\u003e\n\u003cbr /\u003e
    I've been unable to figure out how to make it recognise that without breaking multi-day usage. I thought it'd be something like
    Code javascript:
    fromMatch = firstLine.match(/(\d+(?:am|pm) to )/ || /(\d+(?:am|pm) to \d+(?:am|pm))/),
    but it's not.

  13. #13
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,716
    Mentioned
    103 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by sabret00the
    I've been unable to figure out how to make it recognise that without breaking multi-day usage. I thought it'd be something like
    Code javascript:
    fromMatch = firstLine.match(/(\d+(?:am|pm) to )/ || /(\d+(?:am|pm) to \d+(?:am|pm))/),
    but it's not.
    The fromMatch part doesn't need changing. You should put that back to what it was before.

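    It's the toMatch line that needs to be made more flexible, by having it match from the "to " keyword of the line. The lazy .*? skips over the date that sits between "to " and the end time in multi-day events, so both formats are covered.

    Code javascript:
    toMatch = firstLine.match(/to .*?(\d+(?:am|pm))/),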
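
    Here is a sketch of the whole function with a toMatch that anchors on "to ", checked against both sample lines from the thread. One assumption is added: a pattern that expects digits straight after "to " only fits the same-day format, so this version puts a lazy .*? in between to skip the date in the multi-day format.

```javascript
function getWhenInfo(firstLine) {
    var fromMatch = firstLine.match(/(\d+(?:am|pm) to )/),
        // .*? lazily skips an optional date such as "Fri Aug 24, 2012 "
        toMatch = firstLine.match(/to .*?(\d+(?:am|pm))/),
        from = (fromMatch && fromMatch[1]) || '',
        to = (toMatch && toMatch[1]) || '';

    return from + to;
}

// Same-day and multi-day first lines, as posted earlier in the thread
var sameDay = getWhenInfo("When: Tue Oct 16, 2012 6pm to 10pm ");
var multiDay = getWhenInfo("When: When: Thu Aug 23, 2012 1pm to Fri Aug 24, 2012 10pm ");

console.log(sameDay);  // 6pm to 10pm
console.log(multiDay); // 1pm to 10pm
```

    As for the attempt with || between two regex literals: a regex object is always truthy, so the expression evaluates to the first pattern and the second one is never used.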

  14. #14
    SitePoint Enthusiast sabret00the's Avatar
    Join Date
    Jul 2003
    Location
    London
    Posts
    75
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you Paul. I most definitely owe you a beer.
