  1. #1
    SitePoint Addict melchiorus's Avatar
    Join Date
    Jun 2004
    Location
    Indiana
    Posts
    283

    Creating a Parsing Engine

    I've been working with some regex replace functions lately and have noticed a small flaw in the way I was using them. A replace call handles every occurrence of one keyword before it moves on to the next keyword.

    For example, let's say there's a keyword of "%loopFunction:%" and it appears in two different places in a document. In the first place the function should output one thing, but in the second place it should output something totally different. The problem is that the replace function fills both places with the same code, namely the output meant for the first one.
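One way around this, sketched here under some assumptions, is PHP's `preg_replace_callback`, which calls a function once per match in document order, so each occurrence can get its own output. The `$outputs` array and its contents are made up for illustration, and the closure syntax assumes a reasonably recent PHP (7 or later for the `??` operator).

```php
<?php
// Hypothetical template with the same keyword in two places.
$template = "First: %loopFunction:%\nSecond: %loopFunction:%\n";

// Assumed per-occurrence outputs, consumed in document order.
$outputs = ['<ul>first list</ul>', '<ol>second list</ol>'];

$index = 0;
$result = preg_replace_callback(
    '/%loopFunction:%/',
    function (array $match) use (&$index, $outputs) {
        // The callback runs once per match, left to right.
        // If we run out of outputs, leave the keyword untouched.
        $replacement = $outputs[$index] ?? $match[0];
        $index++;
        return $replacement;
    },
    $template
);

echo $result;
```

The counter is only the simplest possible way to tell occurrences apart; the callback could just as well look at surrounding state or at arguments captured by the pattern.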

    So this got me thinking about a parsing engine that would parse a document from the top down. However, I can't really work out how this could be done with PHP.

    The only method I can really think of would be to read in X bytes of a file at a time and parse each chunk. However, this would have two major flaws. First, it would be extremely slow and wasteful. Second, it would need to make exceptions for properties or loops that span multiple lines or a large number of bytes.

    Does anyone have any experience in doing something like this? Even if you haven't done it in PHP, maybe you could post some concept code that could get the logical juices in the brain flowing. Thanks in advance for any help here.
    -Melchior (Stephen Craton)

  2. #2
    SitePoint Wizard silver trophy
    Join Date
    Mar 2006
    Posts
    6,132
    I've always been curious about this as well.

    I had some thoughts about defining a bunch of tokens. You go through the document, maybe character by character. Once you find a token, you check your list to see which other tokens, if any, may be contained within the current one. You continue that process until you reach the end of that token and of any tokens nested inside it.
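A rough sketch of that idea, not a finished engine: a recursive function walks the text character by character and, for each token it opens, only recognises the tokens that are allowed inside it. The `{loop}` and `{if}` markers, the `$tokens` table, and the `parseLevel` function are all hypothetical names chosen for the example (PHP 7.1+ assumed).

```php
<?php
// Hypothetical token table: opener => its closer and the openers allowed inside it.
$tokens = [
    '{loop}' => ['close' => '{/loop}', 'allows' => ['{if}']],
    '{if}'   => ['close' => '{/if}',   'allows' => []],
];

// Walk $text from $pos and return [nodes, position after this level].
// $allowed lists the openers recognised here; $close ends the level (null at top level).
// An unclosed token simply swallows the rest of the text.
function parseLevel(string $text, int $pos, array $allowed, ?string $close, array $tokens): array
{
    $nodes = [];
    $buffer = '';
    $length = strlen($text);
    while ($pos < $length) {
        if ($close !== null && substr($text, $pos, strlen($close)) === $close) {
            if ($buffer !== '') { $nodes[] = $buffer; }
            return [$nodes, $pos + strlen($close)];
        }
        foreach ($allowed as $open) {
            if (substr($text, $pos, strlen($open)) === $open) {
                if ($buffer !== '') { $nodes[] = $buffer; $buffer = ''; }
                // Descend into the token; only its permitted children are recognised.
                [$children, $pos] = parseLevel(
                    $text, $pos + strlen($open),
                    $tokens[$open]['allows'], $tokens[$open]['close'], $tokens
                );
                $nodes[] = ['token' => $open, 'children' => $children];
                continue 2; // resume the while loop at the new position
            }
        }
        $buffer .= $text[$pos]; // plain character: collect it as text
        $pos++;
    }
    if ($buffer !== '') { $nodes[] = $buffer; }
    return [$nodes, $pos];
}

[$tree] = parseLevel('a{loop}b{if}c{/if}{/loop}d', 0, ['{loop}', '{if}'], null, $tokens);
print_r($tree);
```

The result is a small tree of plain-text strings and token nodes, which a second pass could then turn into output from the top down.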

    Maybe a primitive system could be made with arrays, a loop, strpos, substr or substr_replace, and strlen.
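Something along those lines might look like the sketch below. It assumes keywords have the form `%name:%` as in the first post; the `renderTemplate` function and the `$handlers` array are invented for the example. It scans left to right with `strpos`, copies plain text with `substr`, and passes each handler the number of times its keyword has been seen so far, so every occurrence can produce different output.

```php
<?php
// Hypothetical handlers keyed by keyword name; each receives the occurrence number.
$handlers = [
    'loopFunction' => function (int $n): string {
        return '[loop output #' . $n . ']';
    },
];

function renderTemplate(string $template, array $handlers): string
{
    $output = '';
    $offset = 0;  // start of the text not yet copied to $output
    $seen = [];   // how many times each keyword has appeared so far
    while (($start = strpos($template, '%', $offset)) !== false) {
        $end = strpos($template, ':%', $start + 1);
        if ($end === false) {
            break; // no closing marker: treat the rest as plain text
        }
        $name = substr($template, $start + 1, $end - $start - 1);
        // Copy the plain text that precedes the keyword.
        $output .= substr($template, $offset, $start - $offset);
        if (isset($handlers[$name])) {
            $seen[$name] = ($seen[$name] ?? 0) + 1;
            $output .= $handlers[$name]($seen[$name]);
        } else {
            // Unknown keyword: copy it through unchanged.
            $output .= substr($template, $start, $end + 2 - $start);
        }
        $offset = $end + 2;
    }
    return $output . substr($template, $offset);
}

echo renderTemplate('A %loopFunction:% B %loopFunction:% C', $handlers);
```

This is deliberately primitive: a literal "%" elsewhere in the text would confuse it, and it does not handle nesting.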

