PHP or JS: Get dynamically generated data from another domain

Hi,

There’s a site that updates a number daily, and I want to grab that number with a cron script that runs once a day. The site’s content is generated dynamically by JS: the element containing that number does not appear in the page source shown by “Ctrl + U”.

Is it possible to grab that number somehow using PHP and/or JS?

I know about file_get_contents(), but it doesn’t work in this case.

Thanks.

You can scrape dynamically generated content with puppeteer.

https://manuelhans.com/blog/2020/01/17/scraping-a-dynamic-web-page-using-puppeteer/

It should also be possible to run that via a cron job, but be aware that it requires a Node.js runtime.
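
A minimal Node.js sketch of that approach, assuming puppeteer is installed (`npm install puppeteer`). The URL and CSS selector are placeholders: you would find the real selector by inspecting the rendered element in your browser’s developer tools. `parseDisplayedNumber` and `scrapeRenderedValue` are hypothetical helper names, not part of puppeteer.

```javascript
// Turn displayed text such as " 1,234 " or "$0.9876" into a number (NaN if nothing numeric remains).
function parseDisplayedNumber(text) {
  const cleaned = String(text).replace(/[^0-9.\-]/g, "");
  return cleaned === "" ? NaN : Number(cleaned);
}

// Render a JavaScript-driven page in headless Chrome and read one element's text.
// Assumption: `selector` matches the element that shows the number once the app has rendered.
async function scrapeRenderedValue(url, selector) {
  // Loaded lazily so the helper above can be used without puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until there have been no network connections for 500 ms,
    // so the app has had a chance to fetch its data.
    await page.goto(url, { waitUntil: "networkidle0" });
    await page.waitForSelector(selector);
    const text = await page.$eval(selector, (el) => el.textContent);
    return parseDisplayedNumber(text);
  } finally {
    await browser.close();
  }
}
```

Usage would look like `scrapeRenderedValue("https://example.com/dashboard", "#daily-number").then(console.log, console.error);`, where both arguments are made-up examples. Launching a full headless browser is heavy for one number a day, which is why finding an API or the underlying data request is usually preferable.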

Doesn’t that site provide an API, so you can access the data without scraping? I wonder whether part of the reason they build the site that way is to make it harder to scrape.

It’s probably just using some kind of JS framework.

They are relatively new and have no API yet. All I need is a 4-digit number, once a day. My script would be no different from me visiting the site manually and copying the number into my script.

I guess I can do it manually for the time being.

Thank you for the suggestion. I will look into it, but I’d prefer not to use any libraries or extra scripts unless there’s no other way. For example, I’m wondering whether it’s possible via Ajax…

An Ajax request from your own page to that domain will most likely be blocked by CORS.
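
To see why, here is a simplified model of the check a browser applies before letting page JavaScript read a cross-origin response. It is a sketch, not the full Fetch specification: it ignores preflight requests and looks only at the `Access-Control-Allow-Origin` and `Access-Control-Allow-Credentials` response headers. The function name and example origins are illustrative.

```javascript
// Simplified model of the browser's CORS check for a simple cross-origin GET.
// responseHeaders: object with lowercase header names as keys.
function corsAllowsRead(requestOrigin, responseHeaders, withCredentials = false) {
  const allowOrigin = responseHeaders["access-control-allow-origin"];
  // No header at all: the browser hides the response body from page JavaScript.
  if (allowOrigin === undefined) return false;
  if (withCredentials) {
    // Credentialed requests need an exact origin match (no "*")
    // plus Access-Control-Allow-Credentials: true.
    return (
      allowOrigin === requestOrigin &&
      responseHeaders["access-control-allow-credentials"] === "true"
    );
  }
  // Otherwise a wildcard or an exact origin match is enough.
  return allowOrigin === "*" || allowOrigin === requestOrigin;
}
```

Note that CORS is enforced only by browsers. A server-side cron job (PHP cURL, Node.js `fetch`, plain curl) is not subject to it, so CORS only rules out the in-browser Ajax route.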

Try PHP’s cURL functions or wget. Both should fetch the complete web page, though I’m not sure about the “dynamically produced JavaScript” part.

A site that loads its content via JavaScript will not magically produce that content when fetched with curl, wget, or anything else: the JavaScript has to be executed by a browser (or a headless browser) to produce the output.
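
One way to confirm this is to fetch the raw HTML the server sends, which is what file_get_contents(), curl, and wget all see, and list the script tags in it. For a client-rendered app the body is often little more than an empty mount point plus script references. A minimal sketch, assuming Node.js 18+ for the built-in `fetch`; the regex is a rough heuristic rather than a real HTML parser, and the helper names are hypothetical.

```javascript
// Fetch the HTML exactly as the server sends it; no JavaScript is executed.
async function fetchRawHtml(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}

// Rough heuristic: collect the src attributes of <script> tags (not a full HTML parser).
function listScriptSources(html) {
  const re = /<script\b[^>]*\bsrc=["']([^"']+)["']/gi;
  return [...html.matchAll(re)].map((m) => m[1]);
}
```

Running `fetchRawHtml(url).then((html) => console.log(html.includes("1234"), listScriptSources(html)))` against such a site would typically show the number missing and only bundle URLs present, where `"1234"` stands in for whatever value you expect.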

That said, the site seems to be going through a lot of steps to prevent its content from being read. What site is this, and have you read their Terms of Service?

Thank you for your insights. The site has no ToS or anything similar. I even contacted them to ask about an API. Here is the link:

https://www.ampleforth.org/dashboard/

They use JS to display their data. I need to get the Oracle Rate and Price Target values once a day, at the same time each day.

I’m doing it manually now, and if I manage to automate it with cron or similar, it will be no different from doing it manually: no extra load on the site and no scraping of extra pages, just a single request per day.

It’s a React app which is just pulling in some content dynamically. I don’t think they’re trying to prevent anything.

What you could do is open the site in a browser, press F12, and look at the Network tab. Refresh the page and see which request the JavaScript triggers to fetch your number. Duplicating that request with curl should not be difficult, though if they have any sort of security on it, you might still hit a roadblock.
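
Once you have found the request in the Network tab (in Chrome you can right-click it and choose Copy > Copy as cURL), replaying it from a script is often a plain HTTP GET that returns JSON. A minimal Node.js 18+ sketch; the URL, any extra headers, and the field path are placeholders you would take from what the Network tab shows, and `replayJsonRequest`/`getByPath` are hypothetical helpers.

```javascript
// Replay a JSON request spotted in the browser's Network tab.
// Pass along any headers the browser sent that the endpoint appears to require.
async function replayJsonRequest(url, headers = {}) {
  const res = await fetch(url, { headers: { Accept: "application/json", ...headers } });
  if (!res.ok) throw new Error(`Request failed: HTTP ${res.status}`);
  return res.json();
}

// Read a nested field such as "data.rates.oracle"; returns undefined if any step is missing.
function getByPath(obj, path) {
  return path.split(".").reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}
```

For example, `replayJsonRequest("https://example.com/api/stats").then((json) => console.log(getByPath(json, "data.value")));`, with both the URL and the path made up. If the endpoint checks cookies or tokens, those would have to be copied into `headers` as well, which is the roadblock mentioned above.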

The owner of the site replied with a link to their simple API, which is exactly what I needed. Oddly, I couldn’t find it on their GitHub earlier. Here’s the link, in case anyone needs it sometime:

https://github.com/ampleforth/Ampleforth-Wiki/wiki/Ampleforth-API
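
For anyone following the same route, here is a minimal sketch of the once-a-day job in Node.js 18+. The endpoint URL and the `oracleRate`/`priceTarget` field names are placeholders, not taken from the Ampleforth API; check the wiki page above for the real endpoints and response shape.

```javascript
const fs = require("node:fs/promises");

// Hypothetical endpoint: replace with the real one from the API documentation.
const API_URL = "https://api.example.com/daily-stats";

// One CSV line per day: ISO date (UTC), oracle rate, price target.
function formatDailyRecord(date, oracleRate, priceTarget) {
  const day = date.toISOString().slice(0, 10);
  return `${day},${oracleRate},${priceTarget}\n`;
}

// Fetch once, pick the two values, and append them to a local CSV file.
async function recordDailyValues(apiUrl, outFile) {
  const res = await fetch(apiUrl);
  if (!res.ok) throw new Error(`HTTP ${res.status} from ${apiUrl}`);
  const data = await res.json();
  // Placeholder field names: adjust to the actual response shape.
  const line = formatDailyRecord(new Date(), data.oracleRate, data.priceTarget);
  await fs.appendFile(outFile, line);
  return line;
}
```

To run it daily, the script could end with `recordDailyValues(API_URL, "/path/to/rates.csv").catch(console.error);` and be scheduled with a crontab entry such as `0 9 * * * /usr/bin/node /path/to/daily-rates.js`, which fires at 09:00 server time. The paths are examples; absolute paths avoid surprises, since cron does not run in your script’s directory.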