How to retrieve certain data from a website in json format?

cdevl3749 · April 15, 2024, 8:30pm

Good evening

Do you know of any other tutorials?

Thank you for your help.

m_hutley · April 15, 2024, 8:42pm

Does the website offer the data in a json format?

(Is this not the same post you made 6 minutes prior?)

cdevl3749 · April 15, 2024, 8:53pm

I have items here https://jandjoy.com/collections/femme?filter.p.product_type=Jupes&sort_by=created-descending

and for each article I would like to retrieve the data here in json format

m_hutley · April 15, 2024, 9:00pm

Is this your website?

cdevl3749 · April 15, 2024, 9:03pm

No, I saw that it is possible via web scraping.

I just want to retrieve data in json format

m_hutley · April 15, 2024, 9:18pm

Right, so that we’re clear, the VAST majority of websites will not legally allow you to do so. You are responsible for checking the terms and conditions of a website to ensure it’s permitted; you may be prosecuted if found to be violating such a restriction. I assume no liability for your actions; this is your own dangerous road to tread.

You would pull the page’s HTML, use PHP’s functions (probably DOM functions, or string functions) to identify the products, and convert the HTML into the desired JSON format.

A cursory glance at the HTML structure tells me that the objects in question are found in HTML nodes with the “w-full” class.

cdevl3749 · April 15, 2024, 9:26pm

Thanks for this info.
I have understood the theory. My problem is that I am looking for a short tutorial on converting the data into JSON format, and the tutorials I find online explain how to work with an existing JSON file, not the other way around.

m_hutley · April 15, 2024, 9:33pm

Create an array of arrays with the data from the page, and then use json_encode to turn it into a JSON string.

I can’t give you a tutorial on how to generate the JSON you want, because I don’t know what JSON you want. And even if I did, frankly I’m not going to offer code in this field, to avoid said liabilities.

midnightbast · April 16, 2024, 3:20pm

You can retrieve data from a website in JSON format using web scraping techniques with tools like Python’s BeautifulSoup or Scrapy libraries. Check out tutorials on websites like W3Schools or Real Python for step-by-step guides.

Zensei · April 16, 2024, 6:08pm

Below is a quick example against the website you provided.
It uses Python requests, BeautifulSoup and json.
Make sure to install those libraries.

Docs for BeautifulSoup

If the website is rendered by a client-side library, then you have to use Selenium to get the page source and then continue with BeautifulSoup if so desired.

Good luck

import requests
from bs4 import BeautifulSoup
import json

page = requests.get('https://jandjoy.com/collections/femme?filter.p.m.filter.option_1=Femme&filter.p.product_type=Bonnets&filter.p.product_type=Jupes&filter.p.product_type=Polars&sort_by=created-descending')

# print(page.text)

soup = BeautifulSoup(page.text, 'html.parser')

# print(soup.prettify())

items = soup.find_all('div', class_='w-full')

products = []

for item in items:
    # print(item)
    product = {}
    try:
        product['item_id'] = item['data-item-id']
        image = item.article.find(
            'div', class_='ProductItem__Image tw-relative tw-overflow-hidden').div.img['data-src']
        product['image_src'] = image
        product['description'] = item.article.find(
            'div', class_='ProductItem__Meta tw-py-3 tw-flex tw-flex-col sm:tw-relative').h3.string.strip()
        product['price'] = item.article.find(
            'div', class_='ProductItem__Meta tw-py-3 tw-flex tw-flex-col sm:tw-relative').div.div.span.span.string.strip()

        products.append(product)

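    except (KeyError, AttributeError, TypeError) as e:
        # KeyError: attribute missing on the tag; AttributeError/TypeError:
        # an expected child tag was not found (find() returned None).
        print(f"----- {type(e).__name__}: {e}")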

json_string = json.dumps(products)
print(json_string)

with open('femme.json', 'w') as file:
    json.dump(products, file)  # write the list directly; no need to re-parse json_string
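
One detail worth checking in the output: `json.dumps` escapes non-ASCII characters by default, so French product names come out as `\uXXXX` sequences. Below is a minimal sketch that keeps accents readable and writes the file as UTF-8. The product values are made-up placeholders, not data from the site, and `to_json` and `save_json` are hypothetical helper names.

```python
import json

# Placeholder records in the same shape as the `products` list above.
products = [
    {"item_id": "1001", "description": "Jupe longue léopard", "price": "29,99 €"},
    {"item_id": "1002", "description": "Jupe plissée", "price": "24,99 €"},
]


def to_json(records):
    # ensure_ascii=False keeps "é" and "€" as-is; indent=2 makes the text readable.
    return json.dumps(records, ensure_ascii=False, indent=2)


def save_json(records, path):
    # Write explicitly as UTF-8 so the result does not depend on the OS default encoding.
    with open(path, "w", encoding="utf-8") as file:
        file.write(to_json(records))


if __name__ == "__main__":
    print(to_json(products))
```

Reading the file back with `json.load` (opened with `encoding="utf-8"`) should give the same list of dictionaries.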

cdevl3749 · April 16, 2024, 8:29pm

Good evening, thank you for your answer, I will test it.

I use VS Code.

cdevl3749 · April 16, 2024, 9:36pm

I tested in PHP and, unsurprisingly, got an error that blocks the request.

" Failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in C:\Users\cdevl\OneDrive\Bureau\scrapping\index.php on line 3 "

<?php 
    $api_url = 'https://jandjoy.com/products/2302wskirt03-jupe-femme-longue-leopard?_pos=3&_fid=e38af50c3&_ss=c';
    $data = file_get_contents($api_url);
    echo $data;
?>
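
One common cause of a 403 like this is that `file_get_contents` sends no `User-Agent` header by default, and some servers reject such requests. Whether that is what happens on this site is an assumption. Here is a minimal sketch, in Python to match the earlier example, of a request that identifies itself. The agent string, the contact address and the helper names `build_request` and `fetch` are placeholders.

```python
import urllib.request

# Placeholder identity: replace with something that describes your script.
USER_AGENT = "my-product-reader/0.1 (contact: you@example.com)"


def build_request(url, user_agent=USER_AGENT):
    # Attach an explicit User-Agent instead of sending none at all.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses such as 403.
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

If the server still answers 403 to a clearly identified request, the likeliest reading is that the site declines automated access.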

SamuelCalifornia · April 17, 2024, 6:14am

It is interesting to me that when questions in these forums ask how to protect a website from theft of code such as HTML, CSS and JavaScript, the answer often provided is that it is not possible to protect a website. In those responses, no one says anything about it being illegal.

There is a difference between criminal and civil. Is the law you are describing criminal or civil (at least in the USA)? I believe it is a civil matter and civil matters are not subject to prosecution. I am not an attorney but I believe that the penalty for a civil matter is monetary, never jail.

m_hutley · April 17, 2024, 8:12am

In my (no doubt somewhat jaded) experience, the answer to that is “it depends on who you do it to, how often you do it, and how much you take”.

And then, probably somewhere in there, how many people they know and/or how much money they have to throw at it.

Thallius · April 17, 2024, 8:18am

Very strange discussion.

So it’s better to do things if they are less criminal than others???

Sorry, but in my personal opinion you should avoid EVERYTHING which harms others. No matter if it is physical or material…

Zensei · April 17, 2024, 1:32pm

I read the terms and conditions for the website and did not find anything related to web scraping, although I agree with @m_hutley, especially about “how often you do it”: even if it is legal, you can overload the website.
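
On the “how often” point, here is a minimal sketch, in Python to match the script above, of two habits that keep load low: skipping paths that robots.txt disallows and pausing between requests. The robots.txt content, the two-second delay and the helper names are assumptions for illustration, not values taken from this site.

```python
import time
import urllib.robotparser


def make_robot_rules(robots_txt):
    # Parse robots.txt content that has already been downloaded as text.
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_urls(urls, rules, user_agent="*", delay_seconds=2.0, sleep=time.sleep):
    # Yield only the URLs robots.txt allows, pausing between consecutive ones.
    first = True
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue
        if not first:
            sleep(delay_seconds)
        first = False
        yield url


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /cart
"""
```

A scraping loop would then iterate over `polite_urls(...)` and fetch each URL it yields, so disallowed paths are never requested and requests are spaced out.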

Now, I know this company sells cloud web scraping (at scale), but it is interesting reading about the legality of web scraping:

The 8 biggest myths about web scraping

SamuelCalifornia · April 17, 2024, 5:57pm

Some more thorough and seemingly authoritative:

Thallius · April 17, 2024, 6:32pm

This is

Never read such bullshit.

There is no website in the world where the publisher would be glad if someone is scraping his page and using the data for their own purposes…. Normally 99% have this in their disclaimers, and even the 1% who forget about that don’t want to be „robbed“ of their content.

Zensei · April 17, 2024, 9:39pm

We are talking about the legality of web scraping, not about whether the owners are happy that their content is being “robbed”.

Although I understand the anger, I don’t see the bs in the article.

The second link from @SamuelCalifornia is a fascinating read.

A lot of gray areas there.
They keep saying how something that is publicly available is a secret. Or the scraping may not be unlawful but inappropriate.

There is the case of hiQ v. LinkedIn, where the defendant won the case:

In hiQ v. LinkedIn, the Ninth Circuit applied Van Buren to reject LinkedIn’s attempt to use the CFAA to prevent the scraping of public LinkedIn profiles. While LinkedIn implemented access restriction measures such as Internet Protocol address blocking, a terms of service prohibiting scraping, and a cease and desist letter, the court found these restrictions insufficient to trigger the CFAA.

Now Compulife Software, Inc. v. Rutstein. Start at the end of page 141. In my opinion it was highway robbery. Basically, what the defendant did was scrape the insurance quote data and create his own software and website providing the same service.

It went back and forth between the defendant and the plaintiff. Eventually the court ruled that the scraping was not illegal but was inappropriate because of the large amount of data it took.

Importantly, and correctly, the Eleventh Circuit did not hold that scraping is per se improper. Only scraping a substantial amount, such that “the block of data that the defendants took was large enough to constitute appropriation of the database itself,” is improper.
The substantial copying requirement allows courts to police the line between fair and unfair competition and between harmful free-riding and beneficial dissemination of information.