I read this thread a couple of days ago and would like to ask something that is closely related.
My site is live: if content changes, it updates the user's current view. The check is made every 10 seconds. I created a page that fetches the RSS from the BBC feeds and embeds it into the current page. Rather than have the script read the BBC's RSS on the initial hit and on every subsequent "live" hit for each user, I fetch it once and save the resulting HTML to its own file on the server. Subsequent hits check when that was last done: if it was less than 10 seconds ago, they use the saved HTML; otherwise the script goes to the BBC and checks whether the feed has been updated. This ensures that however busy my site gets, it never sends the BBC more than one request every 10 seconds. It also improves the execution time for that particular script.

I was thinking of doing the same for the rest of my website. When I query the database to build a public page, instead of just outputting the result to the client, I would first save it as an HTML file on the server. All requests in the next 10 seconds would be served from that file, either with an include or by reading it as XML and outputting it to the screen. The first request after those 10 seconds repeats the process.
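To make the idea concrete, here is a rough sketch of that 10-second file cache in Python. It is not my actual script: the function name `get_cached_html`, the `rebuild` callback (standing in for "fetch the BBC feed" or "run the database query and render the page"), and the `now` parameter are all made up for illustration.

```python
import os
import time
from typing import Callable

CACHE_TTL_SECONDS = 10  # the 10-second window described in the post


def get_cached_html(
    cache_path: str,
    rebuild: Callable[[], str],
    ttl: float = CACHE_TTL_SECONDS,
    now: Callable[[], float] = time.time,
) -> str:
    """Return the saved HTML if it is younger than `ttl`, else rebuild it.

    `rebuild` is a hypothetical callback that produces fresh HTML,
    e.g. by fetching a feed or querying a database.
    """
    try:
        # Age of the saved file, based on its last-modified time.
        age = now() - os.path.getmtime(cache_path)
        if age < ttl:
            with open(cache_path, encoding="utf-8") as f:
                return f.read()
    except FileNotFoundError:
        # No saved copy yet: fall through and build one.
        pass

    html = rebuild()

    # Write to a temporary file and rename it into place, so other
    # requests never read a half-written cache file.
    tmp_path = f"{cache_path}.{os.getpid()}.tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp_path, cache_path)
    return html
```

A page would then call something like `get_cached_html("bbc_feed.html", fetch_and_render_feed)`, where both the file name and `fetch_and_render_feed` are placeholders. One caveat with this sketch: if several requests arrive at the same moment just after the 10 seconds expire, each of them may rebuild the file, so a lock would be needed to guarantee strictly one upstream request per window.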
One theory is that if they are going to scrape your content, they will find a way of doing it, so reducing the time it takes you to give them that data may be a good defense.
Would something like that help?