Screen Scraping help

Hello!

I need some help with screen scraping.

I want to take part of a website and show it in my website.

I found a WordPress plugin that does the job. I have tried it and it works great, but my problem is that I don't use WordPress and want to use the script in a plain PHP file.
utd.hu/simple-and-easy-to-use-plugin-to-show-a-part-of-another-website-on-your-page
This is where I found the plugin.

I think this is the code that I need to use in my PHP file. The problem is that I don't know where to put everything, because in WordPress you just use a shortcode.

So I need your help to show me where I should insert the URL, the starting tag, and the ending tag, and how the script should look in my PHP file so that it works.

	

	// $url holds the settings (in WordPress these come from the shortcode).
	$myurlstring = $url['url']; 
	// Turn HTML-encoded angle brackets in each setting back into < and >.
	$sample1=str_replace('&gt;','>',str_replace('&lt;','<',$url['sample1']));
	$sample2=str_replace('&gt;','>',str_replace('&lt;','<',$url['sample2']));
	$prefix=str_replace('&gt;','>',str_replace('&lt;','<',$url['prefix']));
	$suffix=str_replace('&gt;','>',str_replace('&lt;','<',$url['suffix']));
	$rfrom=str_replace('&gt;','>',str_replace('&lt;','<',$url['replace_from']));
	$rto=str_replace('&gt;','>',str_replace('&lt;','<',$url['replace_to']));
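	// First request: visit the page once so any cookies it sets are saved to $ckfile.
	$ckfile = tempnam ("/tmp", "CURLCOOKIE");
	$ch = curl_init ($myurlstring);
	curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile); 
	curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
	$output = curl_exec ($ch);

	// Second request: fetch the page again, this time sending the saved cookies.
	$ch = curl_init ($myurlstring);
	//curl_setopt ($ch, CURLOPT_POST, 1); 
	//curl_setopt ($ch, CURLOPT_POSTFIELDS, "sType=Vehicle+VIN&vtype=A&vin=".$vinnumber); 
	curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile); 
	curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
	$all = curl_exec ($ch);
	// Position of the start marker, then of the end marker that follows it.
	$dfpos=@strpos($all,$sample1);	
	$depos=@strpos($all,$sample2,$dfpos);

	// Cut out the section (start marker included, end marker excluded),
	// wrap it in prefix/suffix, and apply the optional text replacement.
	$div=$prefix.substr($all,$dfpos,$depos-$dfpos).$suffix;
	$div=str_replace($rfrom,$rto,$div);
	echo $div;
	curl_close ($ch);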


I will be thankful for any help.

Hi,
Try using an HTML DOM library. There is a function called loadHtmlFile() that lets you get the source code of any file. After that you can edit it down to whatever you want to display.
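
As a rough illustration of that suggestion, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath classes to pull one element out of a page by its id. The function name extract_element_by_id and the idea of selecting by id are assumptions made for the example; the plugin above works with start and end strings instead.

```php
<?php
// Sketch only: returns the outer HTML of the first element whose id
// attribute equals $id, or null if there is none.
// extract_element_by_id is a hypothetical helper name.
function extract_element_by_id(string $html, string $id): ?string
{
    // Keep the XPath expression simple: refuse empty input and ids
    // containing a double quote.
    if ($html === '' || strpos($id, '"') !== false) {
        return null;
    }

    $doc = new DOMDocument();

    // Real pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $out = $doc->saveHTML($nodes->item(0));
    return $out === false ? null : $out;
}
```

To read straight from a URL, $doc->loadHTMLFile($url) could replace $doc->loadHTML($html), provided allow_url_fopen is enabled on the server.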

Certainly, when starting out with something like this, do it in two operations.

1 get the file, cache it
2 extract what you want

Actually, it's 3…

3 delete the old file

Try it this way because so much can go wrong with each operation (latency, cookie acceptance, negotiating redirects, then DOM traversal and data lifting from the source) that you need to be able to zoom in on where the error lies. Divide and conquer.
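
The three steps above could be sketched roughly like this. All function names, the one-hour default, and the cache path handling are assumptions for illustration; the point is only that fetching, caching, and expiry live in separate functions that can be tested one at a time.

```php
<?php
// Step 1: download the page and store it in a local cache file.
// Returns true on success. fetch_to_cache is a hypothetical name.
function fetch_to_cache(string $url, string $cacheFile): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up on slow servers
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($body === false || $status !== 200) {
        return false;
    }
    return file_put_contents($cacheFile, $body) !== false;
}

// Step 3: decide whether the cached copy is missing or too old to keep.
function cache_is_stale(string $cacheFile, int $maxAgeSeconds): bool
{
    if (!is_file($cacheFile)) {
        return true;
    }
    return (time() - filemtime($cacheFile)) > $maxAgeSeconds;
}

// Ties the steps together: delete an old copy, fetch a fresh one if
// needed, and hand back the cached HTML for step 2 (extraction).
function get_cached_page(string $url, string $cacheFile, int $maxAgeSeconds = 3600): ?string
{
    if (cache_is_stale($cacheFile, $maxAgeSeconds)) {
        if (is_file($cacheFile)) {
            unlink($cacheFile);
        }
        if (!fetch_to_cache($url, $cacheFile)) {
            return null; // the problem is in the fetch step
        }
    }
    $html = file_get_contents($cacheFile);
    return $html === false ? null : $html;
}
```

With the page sitting in a local file, the extraction step can be developed and debugged against that file without hitting the remote server on every attempt.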

Off Topic:

If you are doing this without permission, you should contact them first and ask, then suggest that they provide a data feed by way of an API or downloads. What country are you in?

Is there any ready-made script that I can use, or any tutorial?

I would rather use the WordPress plugin script, since I have tested it and it works exactly like I want. But since I'm not doing it in WordPress, I don't know where to insert all the variables (like URL, start tag, end tag, etc.) or how to make it work outside of WordPress.
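
For what it's worth, here is one way the plugin code quoted earlier in the thread might be reduced to a plain PHP file. This is a sketch under assumptions, not the plugin author's code: the function names are made up, the two-request cookie step is dropped (assuming the target page does not need cookies), and the URL and markers in the usage comment are placeholders.

```php
<?php
// Returns the text from the first occurrence of $start up to (but not
// including) the next occurrence of $end, or null if either is missing.
// Unlike the plugin, the end marker is searched for after the start marker.
function extract_between(string $html, string $start, string $end): ?string
{
    if ($start === '' || $end === '') {
        return null;
    }
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $to = strpos($html, $end, $from + strlen($start));
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Downloads a page with cURL and returns its HTML, or null on failure.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? null : $html;
}

// Usage (placeholder values, left commented out; fill in your own):
//
// $url      = 'https://example.com/';   // the page to copy from
// $startTag = '<div id="news">';        // where the wanted part begins
// $endTag   = '</div>';                 // where the wanted part ends
//
// $html = fetch_page($url);
// $part = $html === null ? null : extract_between($html, $startTag, $endTag);
// // The end marker is not included, so it is added back here,
// // which is presumably what the plugin's "suffix" setting is for.
// echo $part === null ? 'Section not found.' : $part . $endTag;
```

Because the start and end tags are ordinary PHP strings here, there is no need for the &lt; and &gt; decoding that the shortcode version performs.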