Results 1 to 7 of 7
  1. #1
    SitePoint Addict
    Join Date
    Jun 2008
    Posts
    205
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    content written in file

    The code below works fine if I define $text in the same file.

    Code:
    $text = '{AUTHOR}
    author1
    staff1
    {HEADLINE}
    DISPOSABLE DECOR: THE CUTTING EDGE DULLS FAST\
    STYLE AT A SPEED
    USUALLY ASSOCIATED WITH WARDROBE ITEMS.
    
    ';
    
    preg_match('~{AUTHOR}([^{]+)~is', $text, $matches);
    
    echo nl2br(trim($matches[1]));
    If I put the text in a file called new.txt:
    Code:
    {AUTHOR}
    author1
    staff1
    {HEADLINE}
    DISPOSABLE DECOR: THE CUTTING EDGE DULLS FAST\
    STYLE AT A SPEED
    USUALLY ASSOCIATED WITH WARDROBE ITEMS.
    {AUTHOR}
    author2
    staff2
    {HEADLINE}
    then the code below does not print the same value as above:

    Code:
    <?php
    $fcontents = file ('new.txt');
    foreach ($fcontents as $line => $str) {
    preg_match('~{AUTHOR}([^{]+)~is', $str, $matches);
    echo nl2br(trim($matches[1]));
    }
    
    ?>
    It does not work: it does not print all the values between {AUTHOR} and the next {.

  2. #2
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,396
    Mentioned
    54 Post(s)
    Tagged
    0 Thread(s)
    The file() function returns an array with each array item containing a line from the file. So you're trying to match against a single line of the file, where you want to be matching against multiple lines / the whole file.

    Why not use file_get_contents()? That way you get the whole file easily, with no need to loop.
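
    To illustrate the difference, here is a minimal sketch. The sample data is a shortened, hypothetical copy of new.txt written to a temporary file, and the variable names are only for illustration. With file(), the line holding {AUTHOR} contains nothing but the tag and a newline, so the capture group only ever sees that newline.

```php
<?php
// Hypothetical sample data, shortened from the new.txt shown in the first post.
$sample = "{AUTHOR}\nauthor1\nstaff1\n{HEADLINE}\nDISPOSABLE DECOR\n"
        . "{AUTHOR}\nauthor2\nstaff2\n{HEADLINE}\n";
$path = tempnam(sys_get_temp_dir(), 'authors');
file_put_contents($path, $sample);

// file(): one array element per line, so the regex never sees the names
// that follow the {AUTHOR} tag on later lines.
$perLine = [];
foreach (file($path) as $line) {
    if (preg_match('~\{AUTHOR\}([^{]+)~i', $line, $m)) {
        $perLine[] = trim($m[1]); // only the trailing newline was captured
    }
}

// file_get_contents(): the whole file as one string, so each block can match.
$whole = file_get_contents($path);
preg_match_all('~\{AUTHOR\}([^{]+)~i', $whole, $matches);
$authors = array_map('trim', $matches[1]);

unlink($path);

var_dump($perLine); // two empty strings
var_dump($authors); // the two author blocks
```

    The per-line loop does match, but each capture is just a newline that trim() reduces to an empty string, which would explain why nothing useful is printed.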
    Salathe
    Software Developer and PHP Manual Author.

  3. #3
    SitePoint Addict
    Join Date
    Jun 2008
    Posts
    205
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Code:
    <?php
    $fcontents = file_get_contents('file.txt');
    preg_match_all('~{AUTHOR}([^{]+)~is', $fcontents, $matches);
    
    foreach($matches[1] as $match)
    {
       echo nl2br(trim($match)) . '<br />';
    }
    ?>

    This works with a small file.

    If I run the script on a large file and redirect the output to a file, it gives this message:

    Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 261125730 bytes) in filename.php

  4. #4
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,396
    Mentioned
    54 Post(s)
    Tagged
    0 Thread(s)
    How big is the "large file"? You appear to need 249 MB of memory to hold it! If the file is indeed approaching that size (or even an order of magnitude smaller), then your regular expression approach may not be best suited to solving the problem.
    Salathe
    Software Developer and PHP Manual Author.

  5. #5
    SitePoint Enthusiast Logicb0x's Avatar
    Join Date
    Apr 2008
    Posts
    44
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Huh, I'm very confused with PHP

  6. #6
    SitePoint Addict
    Join Date
    Jun 2008
    Posts
    205
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Is there any approach to printing the values that does not consume a lot of memory?

  7. #7
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You can read the file in chunks using fopen(), fread(), and fseek().

    Read a generously sized chunk (more than double the size of the largest token you will be matching). Test whether the chunk contains the token. If it does, note the position at which the match occurred and the length of the match, then fseek() to the byte after the token and repeat, reading another chunk. If the chunk did not contain the token, fseek() to half a chunk past where that chunk began, so that consecutive chunks overlap, and repeat.

    An edge case: if the chunk contained a match and it occurred at the very end of the chunk, you cannot be sure you matched the entire token. Treat this as a "no match" and proceed accordingly.

    You can use the same regex, but you should use preg_match() instead of preg_match_all(), because you want to match only one token at a time. Use the PREG_OFFSET_CAPTURE flag so you can find the position of the match.

    If you can't anticipate a reasonable maximum size of a token, you will need to use a similar, but slightly different method.

    There are many methods like this, but they all revolve around reading smaller, more manageable chunks of the file at a time so that you don't consume too much memory.
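
    As one possible sketch of the chunked idea, the code below reads with fopen() and fread() and uses preg_match() with PREG_OFFSET_CAPTURE. Note that it is a variant of the steps above: instead of fseek()ing, it carries the unfinished tail of each chunk over in a buffer. The function name extractAuthors(), the $chunkSize parameter, and the sample data are hypothetical. It assumes a block ends at the next "{" (or at end of file), as in the regex from the earlier posts.

```php
<?php
/**
 * Yield each {AUTHOR} block without loading the whole file.
 * Memory use is bounded by roughly the largest span between two "{"
 * characters plus one chunk. Hypothetical helper, for illustration only.
 */
function extractAuthors(string $path, int $chunkSize = 8192): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $buffer = '';
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                break;
            }
            $buffer .= $chunk;

            // A block counts as complete only once its closing "{" is in the buffer.
            $offset = 0;
            while (preg_match('~\{AUTHOR\}([^{]+)\{~i', $buffer, $m, PREG_OFFSET_CAPTURE, $offset)) {
                yield trim($m[1][0]);
                // Resume at the "{" that closed this block; it may open the next tag.
                $offset = $m[1][1] + strlen($m[1][0]);
            }

            // Discard what has been handled, then keep only the text from the
            // last "{" onward, since a tag or block may continue in the next chunk.
            $buffer = (string) substr($buffer, $offset);
            $last = strrpos($buffer, '{');
            $buffer = ($last === false) ? '' : (string) substr($buffer, $last);
        }
        // A final block that runs to end of file has no closing "{".
        if (preg_match('~\{AUTHOR\}([^{]+)~i', $buffer, $m)) {
            yield trim($m[1]);
        }
    } finally {
        fclose($handle);
    }
}

// Usage with hypothetical sample data and a tiny chunk size to force
// tags and blocks to be split across chunk boundaries.
$sample = "{AUTHOR}\nauthor1\nstaff1\n{HEADLINE}\nDISPOSABLE DECOR\n"
        . "{AUTHOR}\nauthor2\nstaff2\n{HEADLINE}\n";
$path = tempnam(sys_get_temp_dir(), 'authors');
file_put_contents($path, $sample);

$found = iterator_to_array(extractAuthors($path, 5), false);
unlink($path);

foreach ($found as $author) {
    echo nl2br($author) . "<br />\n";
}
```

    Because the generator yields one block at a time, the caller can echo or write each result immediately instead of collecting them all in an array, which keeps memory use low on very large files.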

