Using PHP cURL to fix an article

@John_Betong,

In another thread of mine, you offered the code below…

I haven’t done any PHP code in a couple of years and it is like I am starting from scratch.

Can you help me understand what is going on in your code, and what exactly it does to that problematic NYTimes article I was complaining about in my other thread?

Thanks,

Cat

The only thing I see it doing is displaying an article from another website on your own. In other words: on this page of my website, display this article from nytimes.com.

It’s really that simple.

I was under the impression it also fixed the issues discussed here…

https://www.sitepoint.com/community/t/saved-webpages-unreadable/312893

John made it sound like it stripped out offending items, or maybe captured the missing dynamic JavaScript parts which seem to be blowing up my saved HTML files.

The web page was created to solve a problem which can be found by following the SP Forum link at the top right hand of the web page.

The web page code uses PHP Curl(…) to extract the HTML web page source from web pages.
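For anyone rusty on PHP, here is a minimal sketch of what fetching a page's HTML source with cURL usually looks like. This is not John's actual code (which isn't shown in this thread); the function name `fetchPageSource`, the timeout, the redirect limit and the user-agent string are all my own assumptions for illustration.

```php
<?php
// Hypothetical helper (not the code from the thread): fetch the raw HTML
// source of a page with PHP's cURL extension. Returns null on failure.
function fetchPageSource(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,      // assumed limit
        CURLOPT_TIMEOUT        => 15,     // assumed timeout in seconds
        // Only allow web URLs, so file:// and friends are refused.
        CURLOPT_PROTOCOLS      => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-fetcher)',
    ]);

    $html = curl_exec($ch);

    // curl_exec() returns false on error when RETURNTRANSFER is set.
    return is_string($html) ? $html : null;
}
```

Note that this only returns the HTML the server sends. cURL does not run any JavaScript, so content a page builds in the browser will not be in the result.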

As mentioned at the top of the page, appending ?url=https://example.com/web-page-to-curl.html to its address makes the web page display the results in several forms.
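A rough sketch of how a page might handle that `?url=` parameter follows. The helper names `cleanTargetUrl` and `renderSource` are hypothetical, and I am assuming the "source" view is simply the escaped HTML inside a `<pre>` element; the real page may do it differently.

```php
<?php
// Hypothetical helpers for a page that accepts ?url=... and shows the source.

// Accept only well-formed http/https URLs; anything else yields null.
function cleanTargetUrl($raw): ?string
{
    if (!is_string($raw)) {
        return null; // missing parameter, or an array such as ?url[]=x
    }
    $url = trim($raw);
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $url : null;
}

// Escape the fetched HTML so the browser shows it as text instead of rendering it.
function renderSource(string $html): string
{
    return '<pre>' . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</pre>';
}

// Typical use at the top of the page: null here means "show the instructions".
$target = cleanTargetUrl($_GET['url'] ?? null);
```

Validating the parameter matters because the page fetches whatever URL a visitor supplies, and escaping matters because the fetched markup would otherwise be rendered (scripts included) on your own site.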

I searched for the page you mentioned, appended the link and obtained the web page source.

The web page source was copied to the page you had problems viewing and saving.

The complete link to see the problematic web page HTML source:

Curled HTML Web Page Source

Beware: I think only the date is being used in the URL and the text changes depending on updates to the page…

So that code “crawls” a webpage, gathers the HTML, and displays it using PHP, right?

However, it doesn’t solve any issues that might be caused by external stylesheets or “rendered” Javascript pages, right?

I think for external sources you’ll need to parse out the src values and request those too.
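As a sketch of that idea, the snippet below pulls the `src` and stylesheet `href` values out of fetched HTML using PHP's built-in DOM extension. The function name `extractResourceUrls` and the choice of tags are my own assumptions; relative URLs would still need resolving against the page's address before you could request them.

```php
<?php
// Hypothetical helper: list external resources referenced by an HTML page.
// Collects script/img "src" values and stylesheet "href" values.
function extractResourceUrls(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    // Real-world HTML is rarely valid; silence libxml's parse warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach (['script' => 'src', 'img' => 'src', 'link' => 'href'] as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            // For <link>, keep only stylesheets (skip icons, canonical links, ...).
            if ($tag === 'link' && stripos($node->getAttribute('rel'), 'stylesheet') === false) {
                continue;
            }
            $value = trim($node->getAttribute($attr));
            if ($value !== '') {
                $urls[] = $value; // inline scripts have no src and are skipped
            }
        }
    }

    return array_values(array_unique($urls));
}
```

This only finds resources named in the static HTML. Anything a script loads or builds at run time stays invisible to it.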

IMHO the only way to solve issues caused by external stylesheets and JavaScript changes is to take a PDF snapshot, as mentioned by @AllanP in your other thread:

Post #28: Saved webpages unreadable - #28 by AllanP
