Convert JavaScript/PHP file to static HTML

I have a client that is moving their current site / shopping cart to a hosted accounting / CRM / etc. software package. They currently rank VERY high on Google for a number of keywords, so keeping their PageRank is the highest priority in the changeover.

For a variety of reasons that I won’t get into here, I need to create pages for the current products that will have the same URLs as the current pages, but get their content from the hosted database using <script> tags that are supplied by the hosted solution such as:


<script src='http://shopping.netsuite.com/app/site/query/getitemname.nl?c=724997&n=1&id=567'></script>

The problem is that Google won’t index the JavaScript-generated content, so I need to copy the completely loaded page source into a new file that will then have the same URL as the current page for the same product.

Is there an existing tool, in PHP, JavaScript, or anything else, that will do this? I am most fluent in PHP.

Cheers,

You could use cURL to “visit” that URL and return the data. Then, with PHP, echo that data to the browser before the page is displayed.

There are tons of cURL examples out there. The easiest place to get started, in my opinion, is: http://php.net/manual/en/book.curl.php

Read up on Apache’s mod_rewrite module. That allows your PHP script to take a URL as an argument and return the requested page. The process occurs server-side, so Google’s indexing bots won’t be able to see the difference.

I wrote a post containing a basic mod_rewrite rule that routes all pages to the same URL, but I’m in a bit of a hurry at the moment - if you search this site for posts I made with mod_rewrite in the text, it should be among the first few hits (other than this post).

cURL seems like it should be useful, but the output still just contains the <script> tag rather than the output from the script. In the case above:


<script src='http://shopping.netsuite.com/app/site/query/getitemname.nl?c=724997&n=1&id=567'></script>

outputs document.write('Laptop Lock PRO');

What I want in the output is simply: Laptop Lock PRO

I can use the Snoopy class to get the text of the JavaScript src file and then use a regular expression to grab the text inside the (' … ') of the document.write, but there are 25 - 30 of these little scripts on each page, so with the Snoopy approach the page will take a LONG time to load.

What I want is something like the cURL approach (or something similar in Snoopy) that waits for the page to finish loading (including all scripts) and then grabs the output.

I assumed that all of your scripts do document.write() - that seems the only logical reason to have multiple script tags throughout the body. If that’s not the case, I apologize.

If it is the case, however, it would be fairly simple to grab the text using preg_match (or even str_replace if you have to) and remove document.write(' from the beginning and '); from the end. Then, display what’s in the middle.

You could simply run a full local search-and-replace with regex across all the files and replace:
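Here is a rough sketch of that preg_match idea. The function name extractDocumentWrite() is made up for this post, and it assumes each snippet is a single document.write() call with one quoted string, as in the example above. Anything that doesn't fit that shape comes back as null.

```php
<?php
// Hypothetical helper (name invented for illustration): pull the text out of
// a snippet shaped like document.write('...'); or document.write("...");
function extractDocumentWrite(string $js): ?string
{
    // Group 1 captures the opening quote, group 2 the text, \1 the same closing quote.
    $pattern = '/^\s*document\.write\(\s*([\'"])(.*)\1\s*\)\s*;?\s*$/s';

    if (preg_match($pattern, $js, $m) !== 1) {
        return null; // not the single document.write() shape assumed here
    }

    // Remove backslash escapes such as \' from the JavaScript string literal.
    return stripslashes($m[2]);
}

echo extractDocumentWrite("document.write('Laptop Lock PRO');");
```

If some of the snippets turn out to do more than one document.write(), this will return null for them and you'd need something smarter.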

<script src='http://shopping.netsuite.com/app/site/query/getitemname.nl?c=724997&n=1&id=567'></script>

with:


<?php
echo cURLFunction('http://shopping.netsuite.com/app/site/query/getitemname.nl?c=724997&n=1&id=567');
?>

That would be easy enough to do using regex with a program like Dreamweaver or NetBeans.

Inside of cURLFunction() you grab the output and remove document.write(' and '); leaving only the raw text. Then, output it.
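Something along these lines is what I have in mind for cURLFunction(). Treat it as a sketch: the helper stripDocumentWrite() is a name I made up, the 5 second timeout is arbitrary, and returning an empty string on failure is just one possible choice.

```php
<?php
// Sketch of cURLFunction(): fetch the script URL, then strip the JS wrapper.
function cURLFunction(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, if any
        CURLOPT_TIMEOUT        => 5,    // seconds; arbitrary value for illustration
    ]);
    $body = curl_exec($ch);

    if ($body === false) {
        return ''; // fetch failed: output nothing rather than an error message
    }

    return stripDocumentWrite($body);
}

// Hypothetical helper: remove document.write(' from the start and '); from the end.
function stripDocumentWrite(string $js): string
{
    $js = trim($js);
    $js = preg_replace('/^document\.write\(\s*[\'"]/', '', $js);
    $js = preg_replace('/[\'"]\s*\)\s*;?$/', '', $js);

    return $js;
}
```

With that in place, the replacement line above would print just the product name instead of the script tag.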

Thanks telos…

I figured I had to do something like that, but that means that I have to load the html into a php var, then preg_match the <script> tags and then cURL (or Snoopy) the src file from the script tags. In this case, I have to do that about 25 times per page, which will make the page loading REALLY slow. I am guessing that in order to keep things happy with Google, I will either need to cache the results or create some sort of HTML static version of the output.

Actually now that I think about it, caching the results is a fairly good idea. This will allow me to update info (text, pricing, etc.) on the hosted shopping cart and have the info update fairly quickly on the static page side, while keeping the content loading quickly.
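For the caching, I am picturing something as simple as the sketch below. The function name cachedFetch(), the one-file-per-URL layout, and the one hour default lifetime are all my own assumptions, and $fetch stands in for whatever does the real work (cURL, Snoopy, etc.).

```php
<?php
// Hypothetical file cache: $fetch is any callable that returns the text for $url.
function cachedFetch(string $url, callable $fetch, string $cacheDir, int $ttl = 3600): string
{
    // One cache file per URL, named by a hash of the URL.
    $file = $cacheDir . '/' . md5($url) . '.txt';

    // Serve the cached copy while it is younger than $ttl seconds.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Cache miss or stale entry: fetch fresh text and store it.
    $text = (string) $fetch($url);
    file_put_contents($file, $text, LOCK_EX);

    return $text;
}
```

That way only the first visitor after the cache expires pays for the 25 or so remote requests, and a price change on the hosted cart shows up once the cached copy ages out.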

Thanks for the help.

If you have access to the HTML on your local machine, running a regex ONCE and then uploading the modified files would be easiest. That way, it doesn’t have to run preg_replace EVERY TIME the page loads - which is a bad idea.
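If you'd rather script the one-time replace than do it in an editor, a sketch could look like this. The function name replaceScriptTags() is made up, and the pattern assumes every tag looks like the NetSuite example earlier in the thread (a src on shopping.netsuite.com and an empty script body).

```php
<?php
// Hypothetical one-off converter: run it locally, then upload the rewritten files.
function replaceScriptTags(string $html): string
{
    // Match <script src='http://shopping.netsuite.com/...'></script>, either quote style.
    $pattern = '#<script\s+src=([\'"])(http://shopping\.netsuite\.com/[^\'"]+)\1\s*>\s*</script>#i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // var_export() emits the URL as a safely quoted PHP string literal.
        return '<?php echo cURLFunction(' . var_export($m[2], true) . '); ?>';
    }, $html);

    // preg_replace_callback() returns null on a regex error; keep the input in that case.
    return $result ?? $html;
}
```

You would read each file with file_get_contents(), pass it through this, and write it back with file_put_contents(). The rewritten pages still need to be parsed as PHP by the server and have cURLFunction() defined.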

After that, the page wouldn’t be any slower than it would with JS alone.

Obviously, there are much better solutions in the long run and this is just a quick fix.