  1. #1
    SitePoint Addict
    Join Date
    May 2008
    Location
    Missouri, USA
    Posts
    273
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Simple HTML Dom Parser Problem

    I'm trying to use the Simple HTML DOM Parser to parse an HTML file, but I'm getting an error.

    This is the error I'm getting:
    Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 16 bytes) in C:\xampp\htdocs\www\gathub\simple_html_dom.php on line 879

    However, I've tried increasing the memory limit in all of my php.ini files and it has made no difference. Is there anything else I can do?
    Follow Me On Twitter: BryceRay

  2. #2
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
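    Run phpinfo();
    Find the location of the php.ini file that is actually loaded (the "Loaded Configuration File" row).
    Edit memory_limit in that file.
    Restart the web server, or reboot.

    You could also call ini_set() at the top of the script.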
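
A minimal sketch of the ini_set() route, for anyone who lands here with the same error. The 33554432 bytes in the error message is 32 MB; the 128M value below is only an assumed example, not a figure recommended anywhere in this thread. php_ini_loaded_file() reports the same path that phpinfo() shows as "Loaded Configuration File".

```php
<?php
// Which php.ini did this PHP process actually load? Returns false if none.
$iniFile = php_ini_loaded_file();
echo 'Loaded php.ini: ', ($iniFile !== false ? $iniFile : '(none)'), PHP_EOL;

echo 'memory_limit before: ', ini_get('memory_limit'), PHP_EOL;

// Assumed example value; ini_set() returns false when the change is refused.
if (ini_set('memory_limit', '128M') === false) {
    echo 'Could not change memory_limit at runtime', PHP_EOL;
}

echo 'memory_limit after: ', ini_get('memory_limit'), PHP_EOL;
```

Run it through the web server rather than from the command line, since the two can load different php.ini files. If the "before" value does not reflect your edit, the file you changed is probably not the one printed on the first line.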

  3. #3
    SitePoint Addict
    Join Date
    May 2008
    Location
    Missouri, USA
    Posts
    273
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Any suggestions for parsing inconsistent data? I've never done this before and it's causing a lot of problems.

    This is the page I'm trying to parse:
    http://www.efluids.com/efluids/pages/ShowByAlphabet.htm

    This is my code using the Simple HTML DOM Parser:
    PHP Code:
    include('simple_html_dom.php');

    // Load the listing for the letter "a"
    $html = file_get_html('http://www.efluids.com/efluids/jsp/efWho_is_Who.jsp?alphabet=a');

    // Extract the text of every <font> element inside the 680px-wide table cell
    $return = array();
    foreach ($html->find('td[width=680] font') as $e) {
        array_push($return, $e->plaintext . '<br>');
    }

    print_r($return);
    And this is the output:
    Code:
    Array ( [0] => Abart Bruno Affiliation: SIRIA 
    [1] => Abart
    [2] => Bruno
    [3] => Affiliation: SIRIA 
    [4] => Address: 2 rue de la vienne,
    [5] => Saint Sebastien sur Loire
    [6] => France
    [7] => 44230
    [8] => Email: abart@ec-nantes.fr   Keywords: NA   Consulting: Yes   Expert witness: Yes   Review Papers and Proposals: Yes
    [9] => abart@ec-nantes.fr
    [10] => Keywords: NA  
    [11] => Consulting: Yes  
    [12] => Expert witness: Yes  
    [13] => Review Papers and Proposals:
    [14] => Yes
    [15] => Abarzhi Snezhana I;    Prof. Dr.;    Affiliation: University of Chicago 
    [16] => Abarzhi
    [17] => Snezhana
    [18] => I;   
    [19] => Prof. Dr.;   
    [20] => Affiliation: University of Chicago 
    [21] => Address: 5640 S. Ellis Ave RI-427,
    [22] => Chicago
    [23] => IL,
    [24] => USA
    [25] => 60637
    Looking at the output, I can't figure out a way to grab the information I need because of the way it's formatted. Is parsing this file doable, or should I just do it by hand?

    Thanks for your insight into the problem.
    Follow Me On Twitter: BryceRay
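
One possible way to handle rows like element [8] of the output above, where several "Label: value" pairs are run together, is to split on the known labels with a regular expression. This is only a sketch: the function name parseLabeledFields() and the $labels list are hypothetical, the labels are taken from the sample output, and it assumes the pairs are separated by ordinary whitespace (the live page may use non-breaking spaces, which would need converting first).

```php
<?php
// Hypothetical label list, copied from the sample output in this thread.
$labels = array(
    'Affiliation', 'Address', 'Email', 'Keywords',
    'Consulting', 'Expert witness', 'Review Papers and Proposals',
);

// Split a string of "Label: value" pairs into an associative array.
function parseLabeledFields($text, array $labels)
{
    // Escape each label so it is safe inside the pattern.
    $alternation = implode('|', array_map(function ($label) {
        return preg_quote($label, '/');
    }, $labels));

    // A label, a colon, then the shortest value that is followed
    // by either the next label or the end of the string.
    $pattern = '/(' . $alternation . '):\s*(.*?)\s*(?=(?:' . $alternation . '):|$)/s';

    $fields = array();
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $fields[$m[1]] = $m[2];
        }
    }
    return $fields;
}

// Element [8] from the sample output.
$row = 'Email: abart@ec-nantes.fr   Keywords: NA   Consulting: Yes   '
     . 'Expert witness: Yes   Review Papers and Proposals: Yes';

print_r(parseLabeledFields($row, $labels));
```

The name lines ("Abart Bruno ...") carry no label, so they would still need separate handling, for example by treating the text before the first "Affiliation:" as the name.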

