I can't scrape a website

I am trying to scrape http://www.tsetmc.com/Loader.aspx?ParTree=15. I tried file_get_contents and cURL, but neither works at all. I searched a lot and tried different code, but reached no result.


This is the page I get as a result.
Please help me; I have no idea what to do.
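For reference, here is a minimal sketch of the kind of attempt described above, with full error reporting turned on so any failure is actually visible. It assumes the cURL extension is installed; checking `curl_error()` is the first step in finding out *why* a request "does not work" (sites like this often detect non-browser clients, so the response may be empty or an error page):

```php
<?php
// Develop with full error reporting enabled so nothing fails silently.
error_reporting(E_ALL);
ini_set('display_errors', '1');

$url = 'http://www.tsetmc.com/Loader.aspx?ParTree=15';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects like a browser would
$html = curl_exec($ch);

if ($html === false) {
    // curl_error() explains why the request failed (timeout, DNS, blocked, etc.)
    echo 'cURL error: ', curl_error($ch), PHP_EOL;
} else {
    echo 'Fetched ', strlen($html), " bytes\n";
}
curl_close($ch);
```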

Have you tried paying for their subscription web service, which they clearly advertise in their news section?

3 Likes

No, I hadn’t noticed that, and I don’t want to pay. I am learning PHP and want to make a little program for myself; it is not commercial. It would be nice of you to give me a solution. I am stuck on this little project and don’t have a clue.

If your goal is to learn how to use cURL to get JSON from a file it would be easier for you to:

  • create a file that outputs data as JSON *
  • create a file that uses cURL to get the JSON from your file *
  • write a script that works with the JSON *

* develop with full error reporting enabled

By having all the files be your own on your own server, you will have control and knowledge of what’s involved and can learn by changing things as often as you wish.
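The steps above can be sketched in one self-contained script, assuming the cURL extension is available. The localhost URL is a placeholder for wherever your own JSON-outputting file is actually served; since this sketch may run without a web server, it falls back to reading the file directly when the cURL request fails:

```php
<?php
// Develop with full error reporting enabled.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Step 1 (sketch): a file of your own that outputs data as JSON.
// Here it is simulated by writing JSON to a local file.
$data = ['symbol' => 'ABC', 'price' => 123.45];  // sample data, not real market data
file_put_contents('data.json', json_encode($data));

// Step 2 (sketch): use cURL to get the JSON from your file.
// The URL below is a placeholder for wherever your JSON script is served.
$ch = curl_init('http://localhost/data.json');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FAILONERROR, true);  // treat HTTP errors (404 etc.) as failures
$json = curl_exec($ch);
curl_close($ch);
if ($json === false) {
    // No local server in this offline sketch: read the file directly instead.
    $json = file_get_contents('data.json');
}

// Step 3: write code that works with the decoded JSON.
$decoded = json_decode($json, true);
echo $decoded['symbol'], ' ', $decoded['price'], PHP_EOL;  // ABC 123.45
```

Because every file is your own, you can break things on purpose (malformed JSON, a wrong URL) and watch exactly how each step fails.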

5 Likes

Scraping another website’s data violates their property rights when that data is obviously meant to be served, for automation purposes, through a paid service. Whether your use is ‘commercial’ or not doesn’t change that.

There are plenty of freely available JSON sources. The US runs a site called data.gov, which offers freely accessible and free-to-use datasets. There are over 15,000 JSON sets available there.

4 Likes

I am reluctant to help someone get data for free that they could otherwise pay for, especially if they intend to sell it themselves. We have no way of knowing what you intend to do, and it is foolish to trust someone we know nothing about.

It is likely difficult, probably impossible, to use cURL to do that, at least to get the live updates. The data appears to be generated dynamically, so a simple cURL request would not get it.

I have not looked at the code, but here is what could be happening: when the data is updated, a request is sent to the server, and the server (perhaps using PHP) sends data back to update the page. If that is happening, it will be quite difficult to bypass their system. You might be able to, but it is so technical that I don’t know how to do it.

2 Likes

Yes, I tried with my cURL utility and was unable to fetch the data, though it works OK with other sites that are not programmed to prevent scraping:

Jb-Curl

Source code is included.

1 Like

Thanks for all your answers :heart::heart::heart:
I tried a lot of PHP code, but none of it worked for me.
After searching for hours, I found the answer in Node.js.
It is a very easy task with the Nightmare library; Nightmare is a high-level browser automation library. Thanks again for your time and answers.

As long as the page can be viewed by a browser, curl can access it as well.

The key point is that many websites/services put systems in place to prevent automated access. But as mentioned earlier, as long as the site still needs to be accessible from a browser, all you need to do is emulate a user with cURL and you will get the content.

With correct use of cURL you can log in to a website, browse it as the logged-in user, and so on. Though depending on what you want to do, it might be easier to write a Behat script than to do it manually with cURL.
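As a sketch of what "emulate a user" means in PHP’s cURL API: send the headers a browser would send and persist cookies across requests so the session survives. The URL is a placeholder, and the exact headers a given site checks are an assumption; cookie handling and the User-Agent are usually the important parts:

```php
<?php
// Sketch: making a cURL request look like an ordinary browser visit.
// example.com is a placeholder URL; swap in the page you actually need.
$ch = curl_init('https://example.com/');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,              // follow redirects like a browser
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    CURLOPT_HTTPHEADER     => [
        'Accept: text/html,application/xhtml+xml',
        'Accept-Language: en-US,en;q=0.9',
    ],
    CURLOPT_COOKIEJAR      => 'cookies.txt',     // save session cookies after the request
    CURLOPT_COOKIEFILE     => 'cookies.txt',     // send them back on later requests
]);
$html = curl_exec($ch);
curl_close($ch);
```

Logging in is the same idea: POST the login form’s fields first (with `CURLOPT_POSTFIELDS`), then reuse the same cookie jar for the pages behind the login.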

1 Like

I feel the need to once again invoke Crichton.

"So they are focused on whether they can do something. They never stop to ask if they should do something."
-Michael Crichton (via Ian Malcolm), Jurassic Park

You can emulate a user logging into a website. But if it is in violation of a site’s ToS, you shouldn’t.

6 Likes