There’s a site which updates a number on a daily basis and I want to grab that number via a cron script that will run once a day. That site’s content is dynamically produced via JS, so the element containing that number does not appear in the page source (“Ctrl + U”).
Is it possible to somehow grab that number using PHP and/or JS?
I know file_get_contents(), but it does not work in this case.
Doesn’t that site provide an API so that you can access the data without having to scrape the site? I just wonder if part of the reason that they generate the site in that way is to make it difficult for people to scrape it.
They are relatively new, and have no API yet. What I need to get is a 4-digit number, once a day. My script will be no different than me manually accessing the site and writing down the number into my script.
Thank you for the suggestion. I will take a look into that, but I mostly prefer not to use any libraries or extra scripts, unless it is totally impossible otherwise. E.g. I’m wondering if it is possible via Ajax…
A site that loads its content via JavaScript will not magically produce the content via curl, or wget, or anything else - the JavaScript has to be run, normally by a browser, before the content exists in the page.
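To illustrate the point, here is a small sketch with a made-up page source. The markup, the element id "rate" and the "/data.json" path are all invented for the example and are not taken from the site in question. The raw source that curl or file_get_contents() receives contains only an empty placeholder, so searching it for the number finds nothing:

```javascript
// Made-up page source, similar to what curl or file_get_contents() would return.
// The number is only filled in later, when a browser runs the script.
const rawHtml = `
<div id="rate"></div>
<script>
  fetch("/data.json").then(r => r.json()).then(d => {
    document.getElementById("rate").textContent = d.rate;
  });
</script>`;

// Look for a four-digit number inside the element with the given id.
// Returns the number, or null if the element is empty or absent.
function findNumberInElement(html, id) {
  const re = new RegExp(`<[^>]*id="${id}"[^>]*>\\s*(\\d{4})\\s*<`);
  const m = html.match(re);
  return m ? Number(m[1]) : null;
}

// Raw source: the placeholder is empty, so nothing is found.
console.log(findNumberInElement(rawHtml, "rate")); // null

// The same element after a browser has run the script would look like this.
console.log(findNumberInElement('<div id="rate">1234</div>', "rate")); // 1234
```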
That said, the site seems to be going through a lot of steps to prevent its content from being read. What site is this, and have you read their Terms of Service?
They use JS to display their data. I need to get, once a day at the same time, Oracle Rate and Price Target values.
I’m doing it manually now, and if I manage to do it with cron or something, it will be no different than doing it manually - I mean no extra load on that site and no scraping of extra pages. Just a single request per day.
What you could do is bring up the site in a browser, press F12 and look at the network tab. Refresh the browser and see what sort of request is being triggered by the JavaScript to get your number. Duplicating the request itself with curl should not be difficult, though if they have any sort of security on it, you might still hit a roadblock.
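A minimal sketch of that approach in JS (Node 18 or later, which has a built-in fetch), assuming the request you find in the network tab returns JSON. The URL and the field name "value" below are placeholders, not the real endpoint; substitute whatever the network tab actually shows:

```javascript
// Hypothetical endpoint: replace with the request URL seen in the network tab.
const ENDPOINT = "https://example.com/api/daily-number";

// Pull a numeric field out of a parsed JSON payload.
// The default field name "value" is an assumption for illustration.
function extractNumber(payload, field = "value") {
  const raw = payload == null ? undefined : payload[field];
  // Accept numeric strings such as "1234" as well as real numbers.
  const n = typeof raw === "string" && raw.trim() !== "" ? Number(raw) : raw;
  if (typeof n !== "number" || !Number.isFinite(n)) {
    throw new Error(`Field "${field}" is missing or not numeric`);
  }
  return n;
}

// Replay the request that the page's own JavaScript makes.
async function fetchDailyNumber(url = ENDPOINT) {
  const res = await fetch(url, { headers: { Accept: "application/json" } });
  if (!res.ok) {
    throw new Error(`Request failed with HTTP ${res.status}`);
  }
  return extractNumber(await res.json());
}

// Offline check of the parsing step, using a made-up payload.
console.log(extractNumber({ value: "1234" }));
```

In the daily cron script you would call fetchDailyNumber() and store the result. If the site checks cookies, tokens or other headers, the request may need those copied from the network tab as well, and it may still be refused.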
The owner of the site replied with a link to their simple API, which is actually what I needed. I was not able to see it on their GitHub earlier, weird thing. Here’s the link, in case anyone needs it sometime: