  1. #1
    SitePoint Member
    Join Date
    Mar 2010
    Posts
    4

    Python Regex Help

    Hi everyone, I'm pulling my hair out over this.

    I have strings that are in this format:

    Code Python:
    line = "(optional text) this is required text +oneWordOptional @OneWordOptional"

    The optional parts can come in any order, except that the parentheses (if they exist) must be first. E.g., this is also valid:

    Code Python:
    line = "(optional text) this is required text @OneWordOptional +oneWordOptional"

    I've got this regex:

    Code Python:
    optionRe = re.compile(r'(?:\(.+\))?(.+)\+?|@?')

    However, it's including the first + or @ prefixed text in the result, meaning it's being greedy. Reading through my RE book, I found that doubling the ? (i.e. ??) makes the RE non-greedy. However, it appears Python's interpretation doesn't support this. How can I make this non-greedy?
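
    To make the symptom concrete, here is a small sketch that reproduces it and shows one possible non-greedy rewrite. The names required_re, tag_re and parse are made up for illustration, and the rewrite assumes the required text never contains a + or @ directly followed by a word character. Note that in the posted pattern the | splits the whole expression in two, and the greedy (.+) runs to the end of the line.

```python
import re

line1 = "(optional text) this is required text +oneWordOptional @OneWordOptional"
line2 = "this is required text @OneWordOptional +oneWordOptional"

# The pattern from the post. Alternation binds loosest, so it reads as
# "(parens)?(.+)\+?"  OR  "@?", and the greedy (.+) swallows the tags too.
original = re.compile(r'(?:\(.+\))?(.+)\+?|@?')

# Hypothetical rewrite: a lazy (.+?) plus a lookahead that stops at the
# first +word / @word, or at the end of the line when there are no tags.
required_re = re.compile(r'^(?:\(.+?\)\s*)?(.+?)\s*(?=[+@]\w|$)')
tag_re = re.compile(r'[+@]\w+')

def parse(line):
    """Return (required_text, tags) for one line."""
    m = required_re.match(line)
    required = m.group(1) if m else None
    return required, tag_re.findall(line)

print(repr(original.match(line1).group(1)))  # includes the +/@ words
print(parse(line1))
print(parse(line2))
```

    The lazy quantifier here is .+? (a ? after the +). Writing ?? only makes an optional item lazy, so it would not change how much (.+) grabs.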

  2. #2
    SitePoint Wizard Stomme poes's Avatar
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,283
    Hm, Python 2.7 should understand ?? fine. But I'm not sure I understand what you mean by greedy... greedy is "match as much as possible, in a given string". It's not "match as many strings as possible".
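
    A quick sketch of what greedy and non-greedy mean in practice, using Python's re module. The sample string is made up; the point is only that the greedy form takes the longest match and the lazy form the shortest, and that ?? is accepted as the lazy form of ?.

```python
import re

text = "(a) b (c)"

# Greedy: .+ takes as much as it can, then backs off to the LAST ")".
greedy = re.search(r'\(.+\)', text).group()

# Lazy: .+? takes as little as it can, stopping at the FIRST ")".
lazy = re.search(r'\(.+?\)', text).group()

# ? prefers to match once; ?? prefers to match zero times.
optional_greedy = re.match(r'a?', 'a').group()
optional_lazy = re.match(r'a??', 'a').group()

print(greedy)                                      # (a) b (c)
print(lazy)                                        # (a)
print(repr(optional_greedy), repr(optional_lazy))  # 'a' ''
```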

    I'm wondering if you want to say
    ^(?:\(stuff\))? ...

    Here, the "(stuff)" is still optional and non-captured, but the regex should be looking for it specifically at the beginning of the string.
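
    As a rough sketch of that idea (the name strip_leading_parens is hypothetical, and [^)]* assumes the optional text has no nested parentheses), anchoring with ^ means parentheses later in the line are left alone:

```python
import re

# ^ pins the optional group to the start of the line;
# (?:...) groups without capturing.
leading_parens = re.compile(r'^(?:\([^)]*\)\s*)?')

def strip_leading_parens(line):
    """Drop a parenthesised prefix, but only if it is the first thing on the line."""
    return leading_parens.sub('', line)

print(strip_leading_parens("(optional text) this is required text"))
print(strip_leading_parens("this is (not a prefix) required text"))
```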

    Quote:
    "however, it's including the first + or @ prefaced text in the result..."
    So, you have strings:
    Code:
    (optional text) this is required text +oneWordOptional @OneWordOptional
    and
    Code:
    this is required text +oneWordOptional @OneWordOptional
    but also
    Code:
    +oneWordOptional @OneWordOptional this is required text
    which you don't want to let match?

    You may want to forbid those symbols explicitly when at the beginning then:

    ^[^+@]\w ...
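
    A small sketch of how that check behaves, next to a negative-lookahead variant that is my own assumption and not part of the suggestion above. One thing to watch: [^+@]\w also requires the second character to be a word character, so a line starting with a one-letter word is rejected by it.

```python
import re

# The suggested form: first character is anything but + or @,
# and the second character must be a word character.
starts_ok = re.compile(r'^[^+@]\w')

# Hypothetical alternative: only refuse a leading + or @, consume nothing.
no_tag_first = re.compile(r'^(?![+@])')

samples = [
    "this is required text +oneWordOptional @OneWordOptional",
    "(optional text) this is required text",
    "+oneWordOptional @OneWordOptional this is required text",
    "a +oneWordOptional",
]

for s in samples:
    print(bool(starts_ok.match(s)), bool(no_tag_first.match(s)), s)
```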

    *edit: I may be confused by your post, are you grabbing strings or atoms of a string?

  3. #3
    SitePoint Member
    Join Date
    Mar 2010
    Posts
    4
    Thanks for your help! I have 2.6 installed, so I guess an update is in order. Thanks for the other tips too. I haven't done real regex in years, so I haven't gotten all the kinks out.

  4. #4
    SitePoint Wizard Stomme poes's Avatar
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,283
    Well, I don't know that 2.6 doesn't have ??... in general, the C-ish languages follow PCRE, and any exceptions to that are usually known in the community and listed around in various places. Nothing wrong with upgrading, but I would be surprised if your version were missing ??.

    *edit could you be more clear on what exactly you're doing with these strings?

