  1. #1
    SitePoint Enthusiast
    Join Date
    Feb 2003
    Location
    l'Europe
    Posts
    94

    preg_replace in HTML-code

    Hi,

    I succeeded in retrieving the source code with PHP imap from email HTML-attachments. These attachments are (HTML)pages (forms with filled out data) produced by an external program.

    The source code is actually an HTML page with 4 or 5 tables, each table holding labels and data.
    The second or third table holds the data of two addresses, like

    HTML Code:
    <TABLE ....>...</TABLE>

    <TABLE style="FONT-SIZE: 10pt" width="100%" border=0>
    <TBODY>
    <TR>
    <TH align=left width="50%">address 1:</TH>
    <TH align=left width="50%">address 2:</TH></TR>
    <TR></TR>
    <TR>
    <TD width="50%">Name 1</TD>
    <TD width="50%">Name 2</TD></TR>
    <TR>
    <TD width="50%">Street 1</TD>
    <TD width="50%">Street 2</TD></TR>
    <TR>
    <TD width="50%">Zip 1 City 1</TD>
    <TD width="50%">Zip 2 City 2</TD></TR>
    <TR>
    <TD width="50%">Tel. 1 Fax 1</TD>
    <TD width="50%">Tel. 2 Fax 2</TD></TR>
    <TR>
    <TD width="50%">More info 1a</TD>
    <TD width="50%">&nbsp;</TD></TR>
    <TR>
    <TD width="50%">More info 1b</TD>
    <TD width="50%">&nbsp;</TD></TR>
    <TR>
    <TD width="50%">More info 1c</TD>
    <TD width="50%"></TD></TR>
    <TR>
    <TD width="50%">More info 1d</TD>
    <TD width="50%"></TD></TR></TBODY></TABLE>

    <TABLE ....>...</TABLE>

    I need to erase the data from address 2 and save the whole code in MySQL for later re-use (with other addresses 2).
    Therefore I wanted to use the function preg_replace to empty the data, but I don't know how to access the second cell of each row.
    Maybe I need to give each row an id='number' first to be able to pinpoint rows? But how can I use a var $i inside preg_replace?
    I've tried many things, but my problem is to pinpoint those rows of address 2...

    Any suggestions are more than welcome!!

    Thank you for your help,

    Ann

  2. #2
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    Perhaps something on the order of:
    PHP Code:
    $test = <<<EOS

    <TABLE ....>...</TABLE>
     
    <TABLE style="FONT-SIZE: 10pt" width="100%" border=0>
     
    <TBODY>
     
    <TR>
     
    <TH align=left width="50%">address 1:</TH>
     
    <TH align=left width="50%">address 2:</TH></TR>
     
    <TR></TR>
     
    <TR>
     
    <TD width="50%">Name 1</TD>
     
    <TD width="50%">Name 2</TD></TR>
     
    <TR>
     
    <TD width="50%">Street 1</TD>
     
    <TD width="50%">Street 2</TD></TR>
     
    <TR>
     
    <TD width="50%">Zip 1 City 1</TD>
     
    <TD width="50%">Zip 2 City 2</TD></TR>
     
    <TR>
     
    <TD width="50%">Tel. 1 Fax 1</TD>
     
    <TD width="50%">Tel. 2 Fax 2</TD></TR>
     
    <TR>
     
    <TD width="50%">More info 1a</TD>
     
    <TD width="50%">&nbsp;</TD></TR>
     
    <TR>
     
    <TD width="50%">More info 1b</TD>
     
    <TD width="50%">&nbsp;</TD></TR>
     
    <TR>
     
    <TD width="50%">More info 1c</TD>
     
    <TD width="50%"></TD></TR>
     
    <TR>
     
    <TD width="50%">More info 1d</TD>
     
    <TD width="50%"></TD></TR></TBODY></TABLE>
     
    <TABLE ....>...</TABLE>
    EOS;

    $regex = <<<EOS
    ~<TABLE[^>]*>\s*<TBODY>\s*<TR>\s*<TH[^>]*>[^<]*</TH>\s*
    <TH[^>]*>([^<]*)</TH></TR>\s*    # capture address 2
    <TR>\s*</TR>\s*<TR>\s*<TD[^>]*>[^<]*</TD>\s*
    <TD[^>]*>([^<]*)</TD>\s*</TR>\s*    # capture name 2
    <TR>\s*<TD[^>]*>[^<]*</TD>\s*
    <TD[^>]*>([^<]*)</TD>\s*</TR>\s*    # capture street 2
    <TR>\s*<TD[^>]*>[^<]*</TD>\s*
    <TD[^>]*>([^<]*)</TD>\s*</TR>\s*    # capture zip2 city2
    <TR>\s*<TD[^>]*>[^<]*</TD>\s*
    <TD[^>]*>([^<]*)</TD>\s*</TR>\s*    # capture tel2 fax2
    <TR>\s*<TD[^>]*>[^<]*</TD>\s*
    <TD[^>]*>([^<]*)</TD>\s*</TR>\s*    # capture info 2a
    <TR>\s*<TD[^>]*>[^<]*</TD>\s*
    <TD[^>]*>([^<]*)</TD>\s*</TR>\s*    # capture info 2b
    <TR>\s*<TD[^>]*>[^<]*</TD>\s*
    <TD[^>]*>([^<]*)</TD>\s*</TR>\s*    # capture info 2c
    <TR>\s*<TD[^>]*>[^<]*</TD>\s*
    <TD[^>]*>([^<]*)</TD>\s*</TR>\s*    # capture info 2d
    </TBODY>\s*</TABLE>
         # end regex; flags: i = case insensitive, u = UTF-8 mode,
        # m = multiline, s = dot-all, x = extended whitespace parsing
    ~iumsx
    EOS;

    preg_match($regex, $test, $matches);
    var_dump($matches); 
    makes the following captures:
    Code:
      [1]=>
      string(10) "address 2:"
      [2]=>
      string(6) "Name 2"
      [3]=>
      string(8) "Street 2"
      [4]=>
      string(12) "Zip 2 City 2"
      [5]=>
      string(12) "Tel. 2 Fax 2"
      [6]=>
      string(6) "&nbsp;"
      [7]=>
      string(6) "&nbsp;"
      [8]=>
      string(0) ""
      [9]=>
      string(0) ""
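
    The x modifier is what allows the pattern above to be spread over several lines with # comments. A minimal sketch of the same idea on a single cell, assuming a made-up two-cell input string (the variable names $pattern, $cells and $m are only for illustration):

    ```php
    <?php
    // With the x modifier, whitespace and "# ..." comments inside the
    // pattern are ignored, so each piece can sit on its own line.
    $pattern = '~
        <TD[^>]*>     # opening cell tag with any attributes
        ([^<]*)       # capture the cell text
        </TD>         # closing cell tag
    ~ix';

    // Hypothetical input: two cells, one lower case, one upper case.
    $cells = '<td width="50%">Name 1</td> <TD>Name 2</TD>';

    // The i modifier makes <td> and <TD> both match.
    preg_match_all($pattern, $cells, $m);
    var_dump($m[1]);
    ```

    Here $m[1] holds the text of both cells, in document order.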
    Jason Sweat ZCE - jsweat_php@yahoo.com
    Book: PHP Patterns
    Good Stuff: SimpleTest PHPUnit FireFox ADOdb YUI
    Detestable (adjective): software that isn't testable.

  3. #3
    SitePoint Enthusiast
    Dear sweatje, thank you for your reply!

    !Amazing!

    I've managed to 'translate' the answer with my PHP manual and Google. I have too little knowledge of this subject, but now I understand it a little bit more...

    I (or rather, you) have managed to 'pinpoint' the data of address 2.

    The function preg_match gives that data, but I want to replace that data immediately with other data (another 'address 2:', 'name 2', etc.).

    In that way, I can re-use the HTML-code as a sort of template for another form or letter.

    I've tried to use the values of the array $matches with the function str_replace() and that's OK for values that aren't empty, but I have a problem with values 6, 7 and 8 (space or empty) where I can't change the specific fields.

    I've tried to use the function preg_replace() with the same variables $regex and $test as before, and for $replacements an array containing values for the 9 data fields of another address 2, but then I end up with the following error:
    preg_replace(): Parameter mismatch, pattern is a string while replacement is an array.
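
    For context, preg_replace() pairs an array of replacements with an array of patterns, position by position; a single string pattern needs a single string replacement. A minimal sketch of the array form, assuming a made-up two-cell fragment (the names $html, $patterns and $replacements are only for illustration):

    ```php
    <?php
    // Hypothetical fragment; the real table has nine cells to replace.
    $html = '<TD>Name 2</TD><TD>Street 2</TD>';

    // One pattern per replacement: the arrays are paired by position.
    $patterns     = array('~Name 2~', '~Street 2~');
    $replacements = array('New name', 'New street');

    $out = preg_replace($patterns, $replacements, $html);
    echo $out, "\n";
    ```

    This only works when each cell has known text to match, so it does not cover the empty cells; those have to be located by their position in the table.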

    Could you save me one more time?

    Thank you very much for your cooperation!

    Sincerely,

    Ann

  4. #4
    eschew sesquipedalians silver trophy sweatje's Avatar
    Add this to the previous code and I think you have it. No points for style, but it seems to work:

    PHP Code:
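    $replace = array(
        'replacement for add2',
        'replacement for name2',
        'replacement for street2',
        'replacement for zip and city2',
        'replacement for tel and fax2',
        'replacement for info2a',
        'replacement for info2b',
        'replacement for info2c',
        'replacement for info2d',
    );

    // Replace one captured field per pass, feeding each result into the next.
    $out = $test;
    for ($i = 1; $i < 10; $i++) {
        $out = preg_replace(keep_match($regex, $i), '\\1' . $replace[$i - 1] . '\\2', $out);
    }

    echo htmlspecialchars($out);

    // Rewrite $subject so that everything before capture number $which
    // becomes \1 and everything after it becomes \2.
    function keep_match($subject, $which) {
        // Make the captures before the wanted one non-capturing.
        $ret = ($which > 1) ? preg_replace('/\(/', '(?:', $subject, (int)$which - 1) : $subject;
        // Mark the wanted capture, then make the remaining ones non-capturing.
        $ret = preg_replace('/\((?!\?:)/', 'OPEN_PARIN', $ret, 1);
        $ret = preg_replace('/\((?!\?:)/', '(?:', $ret);
        // Open \1 at the start of the pattern and close it at the marker.
        $ret = str_replace('~<', '~(<', $ret);
        // Open \2 right after the wanted content and close it after </TABLE>.
        $ret = str_replace('OPEN_PARIN[^<]*)', 'OPEN_PARIN[^<]*(', $ret);
        $ret = str_replace('OPEN_PARIN', ')', $ret);
        $ret = str_replace('</TABLE>', '</TABLE>)', $ret);
        return $ret;
    }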

    Basically the "keep_match" code changes all the other matches from the prior regex into "non-capturing" matches. Then it captures everything before what you want as \1 and everything after what you want as \2. Then in the loop you just do the replacement.
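
    To make the rewriting concrete, here is a small sketch that runs keep_match on a much shorter pattern. The pattern $small and the string $row are made up for illustration; the helper is repeated so the snippet runs on its own.

    ```php
    <?php
    // Same helper as in the post above.
    function keep_match($subject, $which) {
        $ret = ($which > 1) ? preg_replace('/\(/', '(?:', $subject, (int)$which - 1) : $subject;
        $ret = preg_replace('/\((?!\?:)/', 'OPEN_PARIN', $ret, 1);
        $ret = preg_replace('/\((?!\?:)/', '(?:', $ret);
        $ret = str_replace('~<', '~(<', $ret);
        $ret = str_replace('OPEN_PARIN[^<]*)', 'OPEN_PARIN[^<]*(', $ret);
        $ret = str_replace('OPEN_PARIN', ')', $ret);
        $ret = str_replace('</TABLE>', '</TABLE>)', $ret);
        return $ret;
    }

    // Hypothetical two-capture pattern, far shorter than the real one.
    $small = '~<TR><TD>([^<]*)</TD><TD>([^<]*)</TD></TR></TABLE>~i';

    $first  = keep_match($small, 1);
    $second = keep_match($small, 2);
    echo $first, "\n";   // ~(<TR><TD>)[^<]*(</TD><TD>(?:[^<]*)</TD></TR></TABLE>)~i
    echo $second, "\n";  // ~(<TR><TD>(?:[^<]*)</TD><TD>)[^<]*(</TD></TR></TABLE>)~i

    // Replace only the second cell, keeping the markup around it.
    $row = '<TR><TD>Name 1</TD><TD>Name 2</TD></TR></TABLE>';
    $result = preg_replace($second, '\\1' . 'New name' . '\\2', $row);
    echo $result, "\n";  // <TR><TD>Name 1</TD><TD>New name</TD></TR></TABLE>
    ```

    In each rewritten pattern the wanted cell content sits between the two groups, uncaptured, so it is dropped and the replacement text takes its place.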

    HTH
    Last edited by sweatje; Jan 13, 2004 at 08:33. Reason: bug: should be str_replace('~<', '~(<', $ret);

  5. #5
    SitePoint Enthusiast

    kisses

    Dear sweatje, thank you for your reply!

    It is wonderful! Your solution does exactly what I wanted to achieve. Thank you very much! You are a real professional...


    Sincerely,

    Ann

  6. #6
    eschew sesquipedalians silver trophy sweatje's Avatar
    No problem, glad it helped.

    It occurred to me that you should be able to reverse all the captures in the first regex I posted (capture all of the <TABLE... stuff, instead of the original capture of the content) and do all the replacements with one regex. The replacement would look like '\\1'.$replace_addr2.'\\2'.$replace_name2.'\\3'. and so on.

    Should be doable if you ever are worried about performance.
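
    A rough sketch of that single-pass idea, on a shortened two-row table rather than the full nine fields. The table in $html and the two replacement values are made up for illustration, and the sketch assumes the replacement values contain no $ or backslash sequences. It uses the ${1} form of the back-reference instead of \\1 so that a value starting with a digit is not read as part of the group number.

    ```php
    <?php
    // Hypothetical, shortened table: two rows instead of the nine in the thread.
    $html = '<TABLE><TR><TH>address 1:</TH><TH>address 2:</TH></TR>'
          . '<TR><TD>Name 1</TD><TD>Name 2</TD></TR></TABLE>';

    // The markup is captured; the second-column content is matched but not captured.
    $regex = '~(<TABLE[^>]*>\s*<TR>\s*<TH[^>]*>[^<]*</TH>\s*<TH[^>]*>)[^<]*'
           . '(</TH>\s*</TR>\s*<TR>\s*<TD[^>]*>[^<]*</TD>\s*<TD[^>]*>)[^<]*'
           . '(</TD>\s*</TR>\s*</TABLE>)~i';

    $replace_addr2 = 'delivery address:';
    $replace_name2 = '3rd Floor Tenants';  // starts with a digit on purpose

    // Interleave the captured markup with the new values in one call.
    $out = preg_replace(
        $regex,
        '${1}' . $replace_addr2 . '${2}' . $replace_name2 . '${3}',
        $html
    );
    echo $out, "\n";
    ```

    The full version would follow the same pattern with ten captured markup groups and nine replacement values between them.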

