  1. #1
    Serenarules

    I need assistance with an advanced regex statement.

    What I have is a text file with some questions in it. It looks like this, if read using file_get_contents...

    *** Question 101 This is the question. (A) Choice a. (B) Choice b. (C) Choice c. (D) Choice d. (E) Choice e. Explanation: the explanation. Question 102 This is the question. (A) Choice a. (B) Choice b. (C) Choice c. (D) Choice d. (E) Choice e. Explanation: the explanation. *** Question 201 This is the question. (A) Choice a. (B) Choice b. (C) Choice c. (D) Choice d. (E) Choice e. Explanation: the explanation. Question 202 This is some instructions for latter questions. [non-question]

    This is what it looks like if formatted a bit...

    ***
    Question 101
    This is the question.
    (A) Choice a.
    (B) Choice b.
    (C) Choice c.
    (D) Choice d.
    (E) Choice e.
    Explanation: the explanation.

    Question 102
    This is the question.
    (A) Choice a.
    (B) Choice b.
    (C) Choice c.
    (D) Choice d.
    (E) Choice e.
    Explanation: the explanation.

    ***
    Question 201
    This is the question.
    (A) Choice a.
    (B) Choice b.
    (C) Choice c.
    (D) Choice d.
    (E) Choice e.
    Explanation: the explanation.

    Question 202
    This is some instructions for latter questions.
    [non-question]

    Notes: *** and [non-question] are flags which can be present or not. If [non-question] is present, there are no choices or explanations.

    What I want is to be able to do this:

    PHP Code:
    preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);
    foreach ($matches as $match)
    {
        // do something with $match['seen_on_exam'] or $match['number'] etc...
    }

    Of course, this means using named subpatterns such as (?P<seen_on_exam>\*{3}), which I can do in simpler cases. The problem is that this pattern is strange. Here's what I came up with.

    (?P<seen_on_exam>\*{3})?
    Question
    (?P<as_numbered>\d+)
    (?P<question_text>\w+)
    (\(A\) (?P<choice_a>\w+))?
    (\(B\) (?P<choice_b>\w+))?
    (\(C\) (?P<choice_c>\w+))?
    (\(D\) (?P<choice_d>\w+))?
    (\(E\) (?P<choice_e>\w+))?
    (Explanation: (?P<explanation>\w+))?
    (?P<non_question>\[non\])?

    The hard part is accounting for possible whitespace between optional and required parts (the only required things are the text "Question", the number, and the actual question text). However, every line needs to come through in the match array, leaving non-existent elements blank. I just can't get the final regex correct. Would somebody mind taking a look at this and helping me assemble it?

    My final version, which doesn't work, is this:

    /(?P<seen_on_exam>\*{3}\s)?Question\s(?P<as_numbered>\d+)\s(?P<question_text>\w+)\s?(\(A\) (?P<choice_a>\w+))?\s?(\(B\) (?P<choice_b>\w+))?\s?(\(C\) (?P<choice_c>\w+))?\s?(\(D\) (?P<choice_d>\w+))?\s?(\(E\) (?P<choice_e>\w+))?\s?(Explanation: (?P<explanation>\w+))?\s?(?P<non_question>\[non\])?/

  2. #2
    wonshikee
    I would just forgo the complicated regexp and use the fact that every entry starts with "Question 123" as the point to split on.

    http://php.net/manual/en/function.preg-split.php

    That gets you the question group, which you can then break apart further using simple logic with explode or an easier regexp.
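
    The splitting step might look roughly like the sketch below. The function name split_questions is made up for illustration, and it assumes the phrase "Question <number>" never appears inside a question's own text. Whitespace is collapsed first so that an optional "***" flag is always exactly one space away from its question, which keeps the lookbehind fixed-length.

```php
<?php
// Hypothetical helper (not from the thread): one array element per question block.
function split_questions(string $source): array
{
    // Step 1: collapse every run of whitespace (spaces, tabs, newlines) to one space.
    $flat = trim((string) preg_replace('/\s+/', ' ', $source));

    // Step 2: split at zero-width positions where a block starts, either at
    // "*** Question <n>" or at a "Question <n>" that is not preceded by "*** ".
    $pattern = '/(?=\*{3} Question \d)|(?<!\*{3} )(?=Question \d)/';
    $parts = preg_split($pattern, $flat, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Drop the single space left at the end of each block.
    return array_map('trim', $parts);
}
```

    Each returned element then holds one complete entry, with the "***" flag still attached to the question it belongs to.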

    Try to tackle complicated problems in small steps rather than one big one.
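
    As one possible way to take a single question block apart in small steps, here is a sketch. The function name parse_question is hypothetical; the array keys are borrowed from the named groups in the first post. It assumes the choice markers are always (A) to (E) and that "Explanation:" appears only once, after the choices. Every key is always present, so missing parts simply stay blank.

```php
<?php
// Hypothetical helper (not from the thread): parse one block into a fixed set of keys.
function parse_question(string $block): array
{
    $q = [
        'seen_on_exam' => false, 'as_numbered' => null, 'question_text' => '',
        'choice_a' => '', 'choice_b' => '', 'choice_c' => '',
        'choice_d' => '', 'choice_e' => '',
        'explanation' => '', 'non_question' => false,
    ];
    $rest = trim($block);

    // Optional leading "***" flag.
    if (strpos($rest, '***') === 0) {
        $q['seen_on_exam'] = true;
        $rest = ltrim(substr($rest, 3));
    }

    // Required "Question <number>" header; bail out with blanks if it is missing.
    if (!preg_match('/^Question\s+(\d+)\s*/', $rest, $m)) {
        return $q;
    }
    $q['as_numbered'] = (int) $m[1];
    $rest = substr($rest, strlen($m[0]));

    // Optional trailing "[non-question]" flag; $count reports whether it was removed.
    $rest = (string) preg_replace('/\s*\[non-question\]\s*$/', '', $rest, 1, $count);
    $q['non_question'] = $count > 0;

    // Optional "Explanation:" tail, cut off before the choices are split.
    $parts = explode('Explanation:', $rest, 2);
    if (isset($parts[1])) {
        $q['explanation'] = trim($parts[1]);
    }

    // Split on "(A)".."(E)", keeping each letter so it can name the key.
    $pieces = preg_split('/\s*\(([A-E])\)\s*/', $parts[0], -1, PREG_SPLIT_DELIM_CAPTURE);
    $q['question_text'] = trim($pieces[0]);
    for ($i = 1; $i + 1 < count($pieces); $i += 2) {
        $q['choice_' . strtolower($pieces[$i])] = trim($pieces[$i + 1]);
    }
    return $q;
}
```

    Because each step only looks for one thing, whitespace is handled by trim and \s* locally instead of having to be threaded through one long pattern.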

