  1. #1
    SitePoint Member Seinfeld's Avatar
    Join Date
    Jun 2004
    Location
    there's no place like 127.0.0.1
    Posts
    11

    Extracting data from an HTML document

    (My first post.)
    Is there an easy way to do this?
    I'm not good with regexes...
    My document is a list of tables like this:
    Code:
     	  <TABLE cellSpacing=0 cellPadding=2 width=500 border=0>
     		<TBODY>
     		<TR>
     		  <TD class=l vAlign=top width="50%" bgColor=#f0f0f0><FONT 
     		    face="Arial, Helvetica" color=#000000 size=2><STRONG>Almodovar , 
     		    Olivia <BR>Home Phone: 562-423-6578<BR>Voice Mail: 000-0000<BR>Fax: 
     			562-490-9715<BR>Pager: 000-0000<BR>Email: <A 
     		    href="mailto:zmalmodovar@charter.net">zmalmodovar@charter.net</A><BR>Public 
     			Id: PALMOOLI </STRONG></FONT><BR><FONT face="Arial, Helvetica" 
     		    color=#000000 size=2><STRONG></STRONG></FONT><A 
     		    href="http://www5.priv.socal.xmlsweb.com/Listings_Roster.asp?w=a&amp;roster=PALMOOLI&amp;a=active"><IMG 
     		    height=11 alt=Search src="0082_files/darkred_diam.gif" width=11 
     		    border=0 name=Search>&nbsp;View Agent's Active Listings</A><BR><A 
     		    href="http://www5.priv.socal.xmlsweb.com/Listings_Roster.asp?w=a&amp;roster=PALMOOLI&amp;a=sold"><IMG 
     		    height=11 alt=Search src="0082_files/darkred_diam.gif" width=11 
     		    border=0 name=Search>&nbsp;View Agent's Sold Listings</A><BR><A 
     		    href="http://www5.priv.socal.xmlsweb.com/Listings_Roster.asp?w=a&amp;roster=PALMOOLI&amp;a=pend"><IMG 
     		    height=11 alt=Search src="0082_files/darkred_diam.gif" width=11 
     		    border=0 name=Search>&nbsp;View Agent's Pending Listings</A><BR>
     		  <TD class=l vAlign=top width="50%" bgColor=#f0f0f0><FONT 
     		    face="Arial, Helvetica" color=#000000 size=2><STRONG>Coldwell Banker 
     		    Coast Alliance&nbsp;-&nbsp;0082 <BR>3826 Atlantic Ave.&nbsp;<BR>Long 
     		    Beach , CA &nbsp;90807 <BR>Office Phone: 562-426-6577<BR>Office Fax: 
     		    562-490-9715<BR></STRONG></FONT></TD></TR></TBODY></TABLE>
     	  <P>
    ... etc. (tons of tables like this follow)
    I want to extract the data from some of the table cells and serialize it to a CSV file.
    Please help.

  2. #2
    Fully Sweet Car noddy's Avatar
    Join Date
    Aug 2002
    Location
    Perth, Western Australia
    Posts
    759
    Strip the slashes from the document to start with.

  3. #3
    public static void brain Gybbyl's Avatar
    Join Date
    Jun 2002
    Location
    Montana, USA
    Posts
    647
    You may have to set up a parser, as you would when going through an XML document. This is made easy with PHP 5's SimpleXML API.

    That may be too complex, as well. I don't know if there is a truly simple solution for this.
    Ryan
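
    A minimal sketch of the parser approach, for illustration only. It is written in Python with the standard-library html.parser and csv modules rather than SimpleXML, because the sample is not well-formed XML (unquoted attributes, a <TD> that is never closed) and a tolerant HTML parser copes with that. The class and function names, the shortened SAMPLE, the column list, and the assumption that every table holds exactly two cells (agent first, office second) are all made up for this example.

```python
import csv
import io
from html.parser import HTMLParser


class CellLineParser(HTMLParser):
    """Collect each <td> as a list of text lines, splitting on <br>."""

    def __init__(self):
        super().__init__()      # convert_charrefs=True turns &nbsp; into \xa0
        self.cells = []         # finished cells, each a list of lines
        self._lines = None      # lines of the cell being read, or None
        self._buf = []          # text fragments of the current line

    def _end_line(self):
        # str.split() treats newlines and \xa0 as whitespace, so this undoes
        # the source's line wrapping and its non-breaking spaces in one step.
        text = " ".join("".join(self._buf).split())
        self._buf = []
        if text:
            self._lines.append(text)

    def _end_cell(self):
        if self._lines is not None:
            self._end_line()
            self.cells.append(self._lines)
            self._lines = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._end_cell()    # a new <td> also ends an unclosed one
            self._lines = []
        elif tag == "br" and self._lines is not None:
            self._end_line()

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._end_cell()

    def handle_data(self, data):
        if self._lines is not None:
            self._buf.append(data)


def cell_to_fields(lines):
    """First line becomes 'Name'; 'Label: value' lines become fields."""
    fields = {"Name": lines[0]} if lines else {}
    for line in lines[1:]:
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def html_to_csv(document, columns):
    parser = CellLineParser()
    parser.feed(document)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    # Assumption: cells come in (agent, office) pairs, one pair per table.
    for agent, office in zip(parser.cells[0::2], parser.cells[1::2]):
        row = cell_to_fields(agent)
        office_fields = cell_to_fields(office)
        row["Office"] = office_fields.pop("Name", "")
        row.update(office_fields)
        writer.writerow(row)
    return out.getvalue()


# Shortened version of the table from the first post.
SAMPLE = """<TABLE cellSpacing=0><TBODY><TR>
<TD class=l vAlign=top><FONT size=2><STRONG>Almodovar ,
Olivia <BR>Home Phone: 562-423-6578<BR>Email: <A
href="mailto:zmalmodovar@charter.net">zmalmodovar@charter.net</A><BR>Public
Id: PALMOOLI </STRONG></FONT><BR>
<TD class=l vAlign=top><FONT size=2><STRONG>Coldwell Banker
Coast Alliance&nbsp;-&nbsp;0082 <BR>Office Phone: 562-426-6577<BR></STRONG></FONT></TD></TR></TBODY></TABLE>"""

COLUMNS = ["Name", "Home Phone", "Email", "Public Id", "Office", "Office Phone"]
print(html_to_csv(SAMPLE, COLUMNS), end="")
```

    Lines without a "Label: value" shape (the "View Agent's ... Listings" links, the street address) are simply skipped here; whether that is acceptable depends on which cells you actually need.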

  4. #4
    SitePoint Member Seinfeld's Avatar
    Join Date
    Jun 2004
    Location
    there's no place like 127.0.0.1
    Posts
    11
    Quote Originally Posted by Gybbyl
    You may have to set up a parser, as you would when going through an XML document. This is made easy with PHP 5's SimpleXML API.

    That may be too complex, as well. I don't know if there is a truly simple solution for this.
    OK, I started doing it with regexes and I'm almost finished.
    Thanks, guys.
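
    The regex solution itself was never posted, so the following is only a guess at what such an approach might look like, sketched in Python with the standard re and html modules. The pattern names, the table_to_record helper, and the shortened SAMPLE are hypothetical. The idea: cut the document into tables, turn each <BR> into a line break, drop the remaining tags, then pick out the "Label: value" lines.

```python
import html
import re

TABLE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")
# A label made of letters and spaces at the start of a line, then a colon.
FIELD = re.compile(r"^([A-Za-z ]+):[ \t]*(.*)$", re.MULTILINE)


def table_to_record(table_html):
    """Return a dict with 'Name' plus every 'Label: value' line found."""
    text = " ".join(table_html.split())   # undo the source's line wrapping
    text = BR.sub("\n", text)             # <BR> is the real line separator
    text = TAG.sub("", text)              # drop every other tag
    text = html.unescape(text)            # &nbsp; -> \xa0, &amp; -> &
    lines = [" ".join(line.split()) for line in text.split("\n")]
    lines = [line for line in lines if line]
    record = {"Name": lines[0]} if lines else {}
    record.update(FIELD.findall("\n".join(lines)))
    return record


# Shortened version of the table from the first post.
SAMPLE = """<TABLE cellSpacing=0><TBODY><TR>
<TD class=l><STRONG>Almodovar ,
Olivia <BR>Home Phone: 562-423-6578<BR>Fax:
562-490-9715<BR>Email: <A
href="mailto:zmalmodovar@charter.net">zmalmodovar@charter.net</A><BR>Public
Id: PALMOOLI </STRONG><BR>
<TD class=l><STRONG>Coldwell Banker
Coast Alliance&nbsp;-&nbsp;0082 <BR>Office Phone: 562-426-6577<BR></STRONG></TD></TR></TBODY></TABLE>
<P>"""

records = [table_to_record(t) for t in TABLE.findall(SAMPLE)]
for record in records:
    print(record)
```

    This leans on every table having a closing </TABLE> and on labels never containing punctuation. It also ignores lines without a colon, such as the office name, so it is more fragile than a real parser if the markup varies.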

