Script to parse HTML source with PHP and save it to a DB

Hello,

I have the following HTML source. How can I use PHP to parse it so that I can store different parts of the source in a database?

<h1>
Je 23 spelers</h1>

<div class=“playerList”>

<div id=“ctl00_CPMain_rep1_ctl00_ucPlayerFace_pnlAvatar” class=“faceCard” style=“background-image: url(/Img/Avatar/backgrounds/card1.png);”>

<img src=“/Img/Avatar/backgrounds/bg_blue_int.png” style=“left:9px;top:10px;” alt=“” /><img src=“http://res.hattrick.org/kits/8/72/711/710100/body5.png” style=“left:9px;top:10px;” alt=“” /><img src=“/Img/Avatar/faces/f8c.png” style=“left:9px;top:10px;” alt=“” /><img src=“/Img/Avatar/eyes/e10c.png” style=“left:21px;top:21px;” alt=“” /><img src=“/Img/Avatar/mouths/m9c.png” style=“left:30px;top:61px;” alt=“” /><img src=“/Img/Avatar/noses/n32.png” style=“left:17px;top:19px;” alt=“” /><img src=“/Img/Avatar/hair/f8h8b.png” style=“left:9px;top:10px;” alt=“” /><img src=“/Img/Avatar/misc/f8injury.png” style=“left:5px;top:5px;” alt=“” /><img src=“/Img/Avatar/numbers/1.png” style=“left:83px;top:130px;” alt=“” />
</div>

<div class=“playerInfo”>
<b>

  1. <a href=“/Club/Players/Player.aspx?PlayerID=213961711&BrowseIds=213961711,228426334,210017648,237064242,223921213,120067484,123565509,153866999,114631176,145763661,123761689,122913707,89484684,27434084,100300303,152587738,137842676,38301368,260509555,265898597,261378237,268073282,281058154” title=“Nuno Carballosa” alt=“Nuno Carballosa”>Nuno Carballosa</a> <img src=“/Img/Icons/injured.gif” class=“injuryInjured” title=“Geschatte hersteltijd: [3] weken” alt=“Geschatte hersteltijd: [3] weken”></img><span>3</span>
    </b>
    <p>
    22 jaar en 58 dagen, TSI = 40 370<br />
    <a href=“/Help/Rules/AppDenominations.aspx?lt=skillshort&ll=6#skillshort” class=“skill”>redelijk</a> in vorm, conditie <a href=“/Help/Rules/AppDenominations.aspx?lt=skill&ll=7#skill” class=“skill”>goed</a> </p>

<table class=“thin”>

<tr>
<td class=“right”>
Keepen:
</td>
<td>
<img src=“/App_Themes/Standard/progressBar/p3.png” alt=“16/20” title=“16/20” class=“percentImage” style=“background-position: -29px 0px” /> <a href=“/Help/Rules/AppDenominations.aspx?lt=skill&ll=16#skill” class=“skill”>buitenaards</a>
</td>
</tr>
<tr>
<td class=“right”>
Verdedigen:
</td>
<td>
<img src=“/App_Themes/Standard/progressBar/p3.png” alt=“3/20” title=“3/20” class=“percentImage” style=“background-position: -133px 0px” /> <a href=“/Help/Rules/AppDenominations.aspx?lt=skill&ll=3#skill” class=“skill”>slecht</a>
</td>
</tr>
<tr>
<td class=“right”>
Positiespel:
</td>
<td>
<img src=“/App_Themes/Standard/progressBar/p3.png” alt=“2/20” title=“2/20” class=“percentImage” style=“background-position: -141px 0px” /> <a href=“/Help/Rules/AppDenominations.aspx?lt=skill&ll=2#skill” class=“skill”>waardeloos</a>
</td>
</tr>
<tr>
<td class=“right”>
Vleugelspel:
</td>
<td>
<img src=“/App_Themes/Standard/progressBar/p3.png” alt=“1/20” title=“1/20” class=“percentImage” style=“background-position: -149px 0px” /> <a href=“/Help/Rules/AppDenominations.aspx?lt=skill&ll=1#skill” class=“skill”>rampzalig</a>
</td>
</tr>
<tr>
<td class=“right”>
Passen:
</td>
<td>
<img src=“/App_Themes/Standard/progressBar/p3.png” alt=“1/20” title=“1/20” class=“percentImage” style=“background-position: -149px 0px” /> <a href=“/Help/Rules/AppDenominations.aspx?lt=skill&ll=1#skill” class=“skill”>rampzalig</a>
</td>
</tr>
<tr>
<td class=“right”>
Scoren:
</td>
<td>
<img src=“/App_Themes/Standard/progressBar/p3.png” alt=“1/20” title=“1/20” class=“percentImage” style=“background-position: -149px 0px” /> <a href=“/Help/Rules/AppDenominations.aspx?lt=skill&ll=1#skill” class=“skill”>rampzalig</a>
</td>
</tr>
<tr>
<td class=“right”>
Spelhervatting:
</td>
<td>
<img src=“/App_Themes/Standard/progressBar/p3.png” alt=“1/20” title=“1/20” class=“percentImage” style=“background-position: -149px 0px” /> <a href=“/Help/Rules/AppDenominations.aspx?lt=skill&ll=1#skill” class=“skill”>rampzalig</a>
</td>
</tr>

<tr>
<td class=“right middle”>
<a href=“/Club/Matches/Match.aspx?matchID=269841317&TeamId=502131&BrowseIds=&UpdateViewedReport=False”>07-04-2010</a>
</td>
<td class=“middle”>
<img src=“/Img/Matches/star_half_yellow.png” class=“starHalf” alt=“+” title=“+” />
  <span class=“shy”>(Vleugelverdediger)</span>
</td>
</tr>

</table>

</div>
<div class=“borderSeparator”>
</div>

If you do find out, please let me know. I have HTML output from eBay that I’d like to convert to a PHP app. :stuck_out_tongue:

Do you have any knowledge of PHP?

I do have some knowledge of PHP. But for this question I don’t have a clue where to start :expressionless:

It’s not -that- hard to do… which parts do you need to isolate?

A few parts:

<div id="ctl00_CPMain_rep1_ctl00_ucPlayerFace_pnlAvatar" class="faceCard" style="background-image: url(/Img/Avatar/backgrounds/card1.png);"> 

[B]<img src="/Img/Avatar/backgrounds/bg_blue_int.png" style="left:9px;top:10px;" alt="" /><img src="http://res.hattrick.org/kits/8/72/711/710100/body5.png" style="left:9px;top:10px;" alt="" /><img src="/Img/Avatar/faces/f8c.png" style="left:9px;top:10px;" alt="" /><img src="/Img/Avatar/eyes/e10c.png" style="left:21px;top:21px;" alt="" /><img src="/Img/Avatar/mouths/m9c.png" style="left:30px;top:61px;" alt="" /><img src="/Img/Avatar/noses/n32.png" style="left:17px;top:19px;" alt="" /><img src="/Img/Avatar/hair/f8h8b.png" style="left:9px;top:10px;" alt="" /><img src="/Img/Avatar/misc/f8injury.png" style="left:5px;top:5px;" alt="" /><img src="/Img/Avatar/numbers/1.png" style="left:83px;top:130px;" alt="" /> [/B]
</div> 
1. <a href="/Club/Players/Player.aspx?PlayerID=213961711&amp;BrowseIds=213961711,228426334,210017648,237064242,223921213,120067484,123565509,153866999,114631176,145763661,123761689,122913707,89484684,27434084,100300303,152587738,137842676,38301368,260509555,265898597,261378237,268073282,281058154" title="Nuno Carballosa" alt="Nuno Carballosa">[B]Nuno Carballosa[/B]</a>&nbsp;<img src="/Img/Icons/injured.gif" class="injuryInjured" title="Geschatte hersteltijd: [3] weken" alt="Geschatte hersteltijd: [3] weken"></img><span>3</span> 
</b> 
<p> 
[B]22 jaar en 58 dagen, TSI = 40 370[/B]<br /> 
<a href="/Help/Rules/AppDenominations.aspx?lt=skillshort&amp;ll=6#skillshort" class="skill">[B]redelijk[/B]</a> in vorm, conditie <a href="/Help/Rules/AppDenominations.aspx?lt=skill&amp;ll=7#skill" class="skill">[B]goed[/B]</a> </p> 

For each TR:

<tr> 
<td class="right"> 
[B]Keepen[/B]:
</td> 
<td> 
<img src="/App_Themes/Standard/progressBar/p3.png" alt="16/20" title="16/20" class="percentImage" style="background-position: -29px 0px" /> <a href="/Help/Rules/AppDenominations.aspx?lt=skill&amp;ll=16#skill" class="skill">[B]buitenaards[/B]</a> 
</td> 
</tr> 

I’m assuming you’ve retrieved the HTML data as a string stored in $data. If you haven’t, file_get_contents() is probably your friend.
I’m also assuming you’ve shown me the -full- HTML data, and not hidden things that make this more difficult…

Give this a try:


<?php
// Assumes $data holds one player's HTML with plain double quotes (").

// Line 1 - Image Sources
$line1 = explode('class="faceCard"', $data);
$line1 = explode('</div>', $line1[1]); // Keep only the avatar <div>
$line1 = explode('src="', $line1[0]);
array_shift($line1); // Drop the text before the first src attribute
foreach ($line1 as $key => $value) {
  $parts = explode('"', $value);
  $line1[$key] = $parts[0];
}
// $line1 is now an array containing the SRC attributes of the images.

// Line 2 - Age, TSI, form and stamina
$p = explode('<p>', $data);
$p = explode('</p>', array_pop($p));
$text = trim(strip_tags(str_replace('<br />', '##', $p[0])));
$line2 = str_replace('##', '<br>', $text);
// If you'd rather have a 2-key array, use explode('##', $text) instead.

// Line 3 - Skill rows
$t = explode('</table>', $data);
$t = explode('<table class="thin">', $t[0]);
$line3 = preg_split('~</tr>\s*<tr>~', array_pop($t)); // Tolerate whitespace between rows
array_pop($line3); // Throw away last row
$l3o = array();
foreach ($line3 as $key => $value) {
  $cells = explode('##', strip_tags(preg_replace('~</td>\s*<td>~', '##', $value)));
  $l3o[$key] = array_map('trim', $cells);
}
$line3 = $l3o;
// $line3 is now a multidimensional array holding the two pieces of information isolated in your third block.
?>
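
An alternative to splitting strings is to let PHP's DOM extension parse the markup and then pick out nodes with XPath, which tends to survive whitespace and attribute-order changes better. The following is only a sketch: it assumes $data contains one player block with plain double quotes, and the inlined sample is a shortened, hand-trimmed version of the HTML posted above. The variable names ($images, $skills, and so on) are illustrative.

```php
<?php
// Shortened sample of the posted markup (assumption: plain double quotes).
$data = <<<'HTML'
<div class="playerList">
<div class="faceCard">
<img src="/Img/Avatar/faces/f8c.png" alt="" /><img src="/Img/Avatar/eyes/e10c.png" alt="" />
</div>
<div class="playerInfo">
<b>1. <a href="/Club/Players/Player.aspx?PlayerID=213961711" title="Nuno Carballosa">Nuno Carballosa</a></b>
<p>22 jaar en 58 dagen, TSI = 40 370<br />redelijk in vorm, conditie goed</p>
<table class="thin">
<tr><td class="right">Keepen:</td><td><img src="/p3.png" alt="16/20" /> <a class="skill">buitenaards</a></td></tr>
<tr><td class="right">Verdedigen:</td><td><img src="/p3.png" alt="3/20" /> <a class="skill">slecht</a></td></tr>
</table>
</div>
</div>
HTML;

libxml_use_internal_errors(true); // Scraped HTML is rarely valid; collect warnings quietly
$doc = new DOMDocument();
$doc->loadHTML($data);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Avatar image sources
$images = array();
foreach ($xpath->query('//div[contains(@class, "faceCard")]//img') as $img) {
    $images[] = $img->getAttribute('src');
}

// Player name and ID, taken from the link to the player page
$link = $xpath->query('//div[contains(@class, "playerInfo")]//a[contains(@href, "PlayerID=")]')->item(0);
$name = trim($link->textContent);
parse_str(parse_url($link->getAttribute('href'), PHP_URL_QUERY), $query);
$playerId = (int) $query['PlayerID'];

// Skill rows: label in the first cell, numeric level in the image's alt text ("16/20")
$skills = array();
foreach ($xpath->query('//table[contains(@class, "thin")]//tr[td[@class="right"]]') as $row) {
    $cells = $row->getElementsByTagName('td');
    $label = rtrim(trim($cells->item(0)->textContent), ':');
    $alt = $cells->item(1)->getElementsByTagName('img')->item(0)->getAttribute('alt');
    $parts = explode('/', $alt);
    $skills[$label] = (int) $parts[0];
}

print_r(array('id' => $playerId, 'name' => $name, 'images' => $images, 'skills' => $skills));
```

The row filter td[@class="right"] skips the final row of the table, whose first cell has class "right middle" and holds the match date rather than a skill.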

I haven’t tried it yet, but what I’ve shown you isn’t the full HTML data, so I guess it wouldn’t work.

To be honest, what I showed you is lines 219 (<h1>) to 315 (</div>).

In total the page runs from line 1 to line 2573.
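
For a full page with many players, one possible approach is to loop over each player block and write the rows to a database with a prepared statement. This is a rough sketch under several assumptions: that every player sits in its own div with class "playerInfo" (as in the excerpt), that an in-memory SQLite database via PDO stands in for the real database, and that the table player_skill and its columns are hypothetical. The second player in the sample is invented purely to show the loop.

```php
<?php
// Two trimmed player blocks; "Example Player" is a made-up placeholder.
$data = <<<'HTML'
<div class="playerList">
<div class="playerInfo">
<b>1. <a href="/Club/Players/Player.aspx?PlayerID=213961711">Nuno Carballosa</a></b>
<table class="thin">
<tr><td class="right">Keepen:</td><td><img src="/p3.png" alt="16/20" /></td></tr>
</table>
</div>
<div class="borderSeparator"></div>
<div class="playerInfo">
<b>2. <a href="/Club/Players/Player.aspx?PlayerID=228426334">Example Player</a></b>
<table class="thin">
<tr><td class="right">Keepen:</td><td><img src="/p3.png" alt="2/20" /></td></tr>
</table>
</div>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($data);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical schema; swap the DSN for your own database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE player_skill (player_id INTEGER, name TEXT, skill TEXT, level INTEGER)');
$insert = $pdo->prepare(
    'INSERT INTO player_skill (player_id, name, skill, level) VALUES (?, ?, ?, ?)'
);

foreach ($xpath->query('//div[@class="playerInfo"]') as $player) {
    // Passing $player as context makes the ".//" queries relative to this block.
    $link = $xpath->query('.//a[contains(@href, "PlayerID=")]', $player)->item(0);
    parse_str(parse_url($link->getAttribute('href'), PHP_URL_QUERY), $query);
    $name = trim($link->textContent);

    foreach ($xpath->query('.//tr[td[@class="right"]]', $player) as $row) {
        $cells = $row->getElementsByTagName('td');
        $skill = rtrim(trim($cells->item(0)->textContent), ':');
        $alt = $cells->item(1)->getElementsByTagName('img')->item(0)->getAttribute('alt');
        $parts = explode('/', $alt); // "16/20" -> level 16
        $insert->execute(array((int) $query['PlayerID'], $name, $skill, (int) $parts[0]));
    }
}

$rows = $pdo->query('SELECT name, skill, level FROM player_skill ORDER BY player_id')
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($rows);
```

Using a prepared statement keeps scraped text out of the SQL string itself, which matters because the values come from a page you do not control.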

Do you have a URL you can show me?

The original page is behind a login.

I zipped the page to: http://coding.duinmayer.nl/players.zip

This is called screen scraping, and there is a whole book written about it…
It can be considered illegal as well: if you do it in volume you are wasting their bandwidth and, of course, copying their content…
But it can be done…
When I do it (with permission), some of the functions I use are
curl, some regex, explode, the strip functions, var_dump…

I have to agree here… if the information is locked behind a login, more than likely the owner of the website doesn’t want you picking all the data out of the members-only area and putting it on your own site.

You will also almost certainly have to learn cURL, since file_get_contents() won’t transmit login session data on its own.
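
If the site owner has given permission, a logged-in fetch with cURL might look roughly like the sketch below. Everything site-specific here is an assumption: the function name fetch_with_login is made up, and the login URL and form field names in the usage comment are placeholders, not the real site's. ASP.NET login forms often also carry hidden fields such as __VIEWSTATE that have to be posted back, which this sketch does not handle.

```php
<?php
// Log in with a POST, keep the session cookie in memory, then fetch a protected page.
// Returns the page HTML, or null if either request fails.
function fetch_with_login($loginUrl, array $credentials, $pageUrl)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true, // Return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // Follow the redirect most logins issue
        CURLOPT_COOKIEFILE     => '',   // Empty string enables the in-memory cookie engine
    ));

    // Step 1: submit the login form.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($credentials));
    if (curl_exec($ch) === false) {
        return null;
    }

    // Step 2: reuse the same handle so the session cookie is sent along.
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    $html = curl_exec($ch);

    return $html === false ? null : $html;
}

// Usage (placeholder URLs and field names):
// $data = fetch_with_login(
//     'https://example.com/login',
//     array('username' => 'me', 'password' => 'secret'),
//     'https://example.com/Club/Players/'
// );
```

Reusing one handle for both requests is what carries the session from the login to the page fetch; a fresh handle would start without the cookie.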