Grabbing text from a website

Hi

Not sure if I'm in the right place. We have a small ecommerce website that works a little like a dropshipper for a small number of products we don't want to stock.

We log into their website each week to see if each product is in stock or not and adjust our website accordingly, but we have to look at 100 pages in all.

Is there an easy way of automating this with some sort of code and exporting the results to Excel?

A bit like a website grabber application, but something that just grabs a piece of the website.

Paul

One way is to write PHP code that runs on your server and gets triggered every week by a cron job. The code would request a whole page's HTML, then parse that HTML to extract the bits you want; there's a rough sketch below.

If all 100 pages come from the same source, using the same structure, you'd probably be able to write one bit of parsing code that works for all of them. If the 100 pages come from different sources, each with an entirely different format/structure, you'll have to write different parsing code for each one; but once it's done, it's done, until they go and change their format.

It's also worth checking whether they provide their product info in any other format, such as an RSS feed or some other XML format, and if not, asking them to. It's better to get the data from that kind of feed than to dig it out of the HTML (screen or web page scraping, as it's known).
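Here's a rough sketch using PHP's built-in cURL extension. The URLs, product names, and the "In Stock" marker text are all placeholders I've made up; substitute whatever the real pages use. It writes a CSV file, which Excel opens directly:

<?php
// check_stock.php -- weekly stock check, run from cron.
// Everything below (URLs, product names, the "In Stock" marker)
// is a placeholder; substitute whatever the real pages use.

$products = [
    'widget-a' => 'http://supplier.example.com/products/widget-a',
    'widget-b' => 'http://supplier.example.com/products/widget-b',
    // ...the rest of the 100 pages...
];

// Write results as a CSV file, which Excel opens directly.
$out = fopen('stock_report.csv', 'w');
fputcsv($out, ['product', 'in_stock']);

foreach ($products as $name => $url) {
    // Fetch the whole page's HTML.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return HTML as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $html = curl_exec($ch);
    curl_close($ch);

    // Crude check: does the page contain the in-stock marker?
    $inStock = ($html !== false && strpos($html, 'In Stock') !== false);
    fputcsv($out, [$name, $inStock ? 'yes' : 'no']);

    sleep(2); // be polite to their server
}

fclose($out);

Then a crontab entry to run it, say, every Monday at 6am (adjust the path to wherever you put the script):

0 6 * * 1 php /path/to/check_stock.php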

How to page-scrape is covered in this book: http://nostarch.com/webbots.htm

Its source code is here: http://www.schrenk.com/nostarch/webbots/DSP_download.php

The code for getting a web page is LIB_http.php; that's the bit you want.
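If I remember the library right, LIB_http.php gives you an http_get() function that returns an array with the downloaded page under a 'FILE' key; this is from memory, so double-check it against the book's download:

<?php
include('LIB_http.php'); // from the book's source code download

$target = 'http://supplier.example.com/products/widget-a'; // placeholder URL
$ref = ''; // referer; empty is fine

// http_get() fetches the page; the HTML should end up in $result['FILE'].
$result = http_get($target, $ref);
$html = $result['FILE'];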

Then it's just a question of using various string-parsing techniques, strpos(), regexes, etc., to extract the bits you're interested in. It's a messy process, and it breaks if and when they change their HTML/web page.
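For example, if the page marks availability with something like <span class="availability">In Stock</span> (made up; check the real markup with View Source), you could pull it out either way:

<?php
// $html holds the fetched page. The "availability" class name is a
// placeholder; match it to whatever the real page's markup looks like.

// Regex approach: capture the text inside the availability span.
if (preg_match('/<span class="availability">([^<]*)<\/span>/i', $html, $m)) {
    $status = trim($m[1]); // e.g. "In Stock" or "Out of Stock"
} else {
    $status = 'unknown';   // markup changed, or marker not found
}

// strpos() approach: just test for a known marker string anywhere.
$inStock = (strpos($html, 'In Stock') !== false);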