  1. #1
    SitePoint Wizard megamanXplosion's Avatar
    Join Date
    Jan 2004
    Location
    Kentucky, USA
    Posts
    1,099

    Match Hyphens with Regex

    I am building a simple CMS. The content is stored as pseudo-markup in a bunch of text files. Based on the URL, a specific text file is opened and read, and its pseudo-markup is parsed into HTML, which is then sent to the browser. I'm having a problem with the parsing, though.

    The text files look like this:

    The Art of Writing and Speaking the English Language
    ======================================================

    General Introduction
    ------------------------------------------------------------------------------------------

    If there is a subject of really universal interest and utility, it is the art of writing and speaking one's own language effectively. It is the basis of culture, as we all know; but it is infinitely more than that: it is the basis of business. No salesman can sell anything unless he can explain the merits of his goods in _effective_ English (among our people), or can write an advertisement equally effective, or present his ideas, and the facts, in a letter. Indeed, the way we talk, and write letters, largely determines our success in life.

    I want to parse the underlined text into <h#> elements, like this:

    Code HTML4Strict:
    <h1>The Art of Writing and Speaking the English Language</h1>
     
    <h2>General Introduction</h2>

    My code parses the <h1> section fine, but it doesn't work at all for the <h2> section. Here is my code...

    Code PHP:
    $this->data = preg_replace
    (
        array
        (
            '/^(.+)\n(=)+/',
            '/^(.+)\n(-)+/'
        ),
        array
        (
            '<h1>$1</h1>',
            '<h2>$1</h2>'
        ),
        $this->data
    );

    The only difference between the regexes is that one uses an equal sign and the other uses a hyphen. I tried a few things to get it to work, but no luck.

    I would appreciate it if someone could tell me what I'm doing wrong and perhaps show me how to do it right.

  2. #2
    PHP Guru lampcms.com's Avatar
    Join Date
    Jan 2009
    Posts
    921
    You probably need to use the multiline modifier, m, like this:

    '/^(.+)\n(=)+/m',
    '/^(.+)\n(-)+/m'

  3. #3
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    If not, try escaping the hyphen with a preceding backslash.
    Jake Arkinstall
    "Sometimes you don't need to reinvent the wheel;
    Sometimes its enough to make that wheel more rounded"-Molona

  4. #4
    SitePoint Wizard megamanXplosion's Avatar
    Sharedlog,

    The <h1> code works perfectly, which it wouldn't if there were a problem with multiple lines. The only difference between the <h1> regex and the <h2> regex is the equal sign in the first and the hyphen in the second. The problem has something to do with the hyphen.

    Arkinstall,

    I tried putting a backslash immediately before the hyphen, like this /^(.+)\n(\-)+/, but it makes no difference: the regex still doesn't match anything.

  5. #5
    PHP Guru lampcms.com's Avatar
    Are you sure that these are hyphens and not underscores? They look similar.

  6. #6
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,396
    Without the multiline modifier, the caret (^) only matches at the start of the subject string (your markup). That's why only the first pattern matches: the <h2> portion of the text is not at the beginning of the string.
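
To see the anchoring behaviour in isolation, here is a minimal sketch. The sample string and variable names are made up for illustration and are not taken from the original poster's files.

```php
<?php
// Hypothetical sample: an "=" underlined heading followed by a "-" underlined heading.
$text = "Title\n=====\n\nSection\n-------\n";

// Without /m, ^ anchors only at the very start of the subject string,
// so the "-" heading further down is never reached.
$withoutM = preg_match_all('/^(.+)\n-+/', $text, $a);

// With /m, ^ also matches immediately after each newline.
$withM = preg_match_all('/^(.+)\n-+/m', $text, $b);

var_dump($withoutM); // int(0)
var_dump($withM);    // int(1)
var_dump($b[1][0]);  // string(7) "Section"
```

The "=" pattern appeared to work in the original code only because that heading happens to sit at the very beginning of the file.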

    Did you even try Sharedlog's suggestion? Perhaps you should.

    Escaping the hyphen character is unnecessary because it has no special meaning outside of character sets ([]). The problem, as alluded to above, is the caret anchoring to the start of the subject string and not the start of any individual line within that string.
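
The point about hyphens can be checked with a short sketch. The test strings and variable names below are illustrative only.

```php
<?php
// Outside a character class, "-" is a plain literal, so escaping it changes nothing.
$plain   = preg_match('/^-+$/', '---');
$escaped = preg_match('/^\-+$/', '---');

// Inside a character class, an unescaped "-" between two characters forms a range.
$asRange   = preg_match('/^[a-c]$/', '-');   // a-c means a, b, or c
$asLiteral = preg_match('/^[a\-c]$/', '-');  // a, a literal hyphen, or c

var_dump($plain, $escaped);      // int(1) int(1)
var_dump($asRange, $asLiteral);  // int(0) int(1)
```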

    Also, a minor point, there is no need for the second capturing group (second set of parentheses) in each pattern since you won't be making use of whatever is captured (just a = or -).

    As such, amended patterns that should work as expected could look like:

    /^(.+)\n=+$/m
    /^(.+)\n-+$/m
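
Dropped into the preg_replace call from the first post, the amended patterns might be used roughly as follows. This is a sketch under a few assumptions: $data stands in for $this->data, the input is a shortened, made-up version of the sample text, and line endings are plain \n (which the patterns themselves require).

```php
<?php
// Hypothetical input, shortened from the sample in the first post.
$data = "The Art of Writing\n==================\n\n"
      . "General Introduction\n--------------------\n\n"
      . "If there is a subject of really universal interest...\n";

// Each pattern captures a line of text that is followed by an underline
// made of "=" (for <h1>) or "-" (for <h2>). The /m modifier lets ^ and $
// match at the start and end of every line, not just of the whole string.
$html = preg_replace(
    array('/^(.+)\n=+$/m', '/^(.+)\n-+$/m'),
    array('<h1>$1</h1>', '<h2>$1</h2>'),
    $data
);

echo $html;
```

With this input, both headings are converted and the paragraph text is left untouched.
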
    Salathe
    Software Developer and PHP Manual Author.

  7. #7
    SitePoint Wizard megamanXplosion's Avatar
    Ah, it was the /m modifier that I needed. I didn't even realize that I was assuming preg_replace worked line-by-line. Oops.

    Thank you, everyone.

