I can't scrape a website

I am trying to scrape http://www.tsetmc.com/Loader.aspx?ParTree=15. I tried file_get_contents and cURL, but neither works at all. I searched a lot and tried different code, but reached no result.


This is the page I get as a result.
Please help me; I have no idea what to do.
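For reference, here is a minimal sketch of the kind of attempt described above, with full error reporting turned on so any failure is actually visible. It assumes the cURL extension is installed; checking `curl_error()` is the first step in finding out *why* a request "does not work" (sites like this often detect non-browser clients, so the response may be empty or an error page):

```php
<?php
// Develop with full error reporting enabled so nothing fails silently.
error_reporting(E_ALL);
ini_set('display_errors', '1');

$url = 'http://www.tsetmc.com/Loader.aspx?ParTree=15';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects like a browser would
$html = curl_exec($ch);

if ($html === false) {
    // curl_error() explains why the request failed (timeout, DNS, blocked, etc.)
    echo 'cURL error: ', curl_error($ch), PHP_EOL;
} else {
    echo 'Fetched ', strlen($html), " bytes\n";
}
curl_close($ch);
```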

Have you tried paying for their subscription web service, which they clearly advertise in their news section?

3 Likes

No, I hadn’t noticed that, and I don’t want to pay. I am learning PHP and want to make a little program for myself; it is not commercial. It would be nice of you to give me a solution. I am stuck on this little project and don’t have a clue.

If your goal is to learn how to use cURL to get JSON from a file it would be easier for you to:

  • create a file that outputs data as JSON *
  • create a file that uses cURL to get the JSON from your file *
  • write a script that works with the JSON *

* develop with full error reporting enabled

By having all the files be your own on your own server, you will have control and knowledge of what’s involved and can learn by changing things as often as you wish.
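The steps above can be sketched in one self-contained script, assuming the cURL extension is available. The localhost URL is a placeholder for wherever your own JSON-outputting file is actually served; since this sketch may run without a web server, it falls back to reading the file directly when the cURL request fails:

```php
<?php
// Develop with full error reporting enabled.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Step 1 (sketch): a file of your own that outputs data as JSON.
// Here it is simulated by writing JSON to a local file.
$data = ['symbol' => 'ABC', 'price' => 123.45];  // sample data, not real market data
file_put_contents('data.json', json_encode($data));

// Step 2 (sketch): use cURL to get the JSON from your file.
// The URL below is a placeholder for wherever your JSON script is served.
$ch = curl_init('http://localhost/data.json');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FAILONERROR, true);  // treat HTTP errors (404 etc.) as failures
$json = curl_exec($ch);
curl_close($ch);
if ($json === false) {
    // No local server in this offline sketch: read the file directly instead.
    $json = file_get_contents('data.json');
}

// Step 3: write code that works with the decoded JSON.
$decoded = json_decode($json, true);
echo $decoded['symbol'], ' ', $decoded['price'], PHP_EOL;  // ABC 123.45
```

Because every file is your own, you can break things on purpose (malformed JSON, a wrong URL) and watch exactly how each step fails.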

5 Likes

Scraping another website’s data violates their property rights when that data is obviously meant to be served, for automation purposes, through a paid service. Whether your use is ‘commercial’ or not doesn’t change that.

There are plenty of freely available JSON sources. The US runs a site called data.gov, which offers freely accessible and free-to-use datasets. There are over 15,000 JSON sets available there.

4 Likes

I am reluctant to help someone get data for free that they could otherwise pay for, especially if they intend to sell it themselves. We have no way of knowing what you intend to do, and it is foolish to trust someone we know nothing about.

It is likely difficult, probably impossible, to use cURL to do that, at least to get the live updates. The data appears to be generated dynamically, so a simple cURL request would not get it.

I have not looked at the code, but here is what could be happening: when the data is updated, a request is sent to the server, and the server (perhaps using PHP) sends data back to update the page. If that is happening, it will be quite difficult to bypass their system. You might be able to, but it is so technical that I don’t know how to do it.

2 Likes

Yes, I tried with my cURL utility and was unable to fetch the data, though it works OK with other sites that are not programmed to prevent scraping:

Jb-Curl

Source code is included.

1 Like

Thanks for all your answers :heart::heart::heart:
I tried a lot of PHP code, but none of it worked for me.
After searching for hours, I found the answer in Node.js.
It is a very easy task with the Nightmare library; Nightmare is a high-level browser automation library. Thanks again for your time and answers.

As long as the page can be viewed by a browser, curl can access it as well.

The key point is that many websites/services put systems in place to prevent automated access. But as mentioned earlier, as long as the site still needs to be accessible from a browser, all you need to do is emulate a user with cURL and you will get the content.

With correct use of cURL you can log in to a website, browse it as the logged-in user, and so on. Though depending on what you want to do, it might be easier to write a Behat script than to do it manually with cURL.
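As a sketch of what "emulate a user" means in PHP’s cURL API: send the headers a browser would send and persist cookies across requests so the session survives. The URL is a placeholder, and the exact headers a given site checks are an assumption; cookie handling and the User-Agent are usually the important parts:

```php
<?php
// Sketch: making a cURL request look like an ordinary browser visit.
// example.com is a placeholder URL; swap in the page you actually need.
$ch = curl_init('https://example.com/');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,              // follow redirects like a browser
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    CURLOPT_HTTPHEADER     => [
        'Accept: text/html,application/xhtml+xml',
        'Accept-Language: en-US,en;q=0.9',
    ],
    CURLOPT_COOKIEJAR      => 'cookies.txt',     // save session cookies after the request
    CURLOPT_COOKIEFILE     => 'cookies.txt',     // send them back on later requests
]);
$html = curl_exec($ch);
curl_close($ch);
```

Logging in is the same idea: POST the login form’s fields first (with `CURLOPT_POSTFIELDS`), then reuse the same cookie jar for the pages behind the login.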

1 Like

I feel the need to once again invoke Crichton.

"So they are focused on whether they can do something. They never stop to ask if they should do something."
-Michael Crichton (via Ian Malcolm), Jurassic Park

You can emulate a user logging into a website. But if it is in violation of a site’s ToS, you shouldn’t.

6 Likes