The function imgArray gives me just one image and not an array. That is, the third column of the spreadsheet, img, lists the same single image in every row. Here is my code:
const rp = require('request-promise');
const otcsv = require('objects-to-csv');
const cheerio = require('cheerio');

const baseURL = 'https://www.example.com';

const getCategories = async () => {
  const html = await rp(baseURL);
  const imgArray = () => {
    cheerio('td.productListing-data > a > img', html).each((i, image) => {
      img = cheerio(image).attr('src');
    });
  };
  imgArray();
  const businessMap = cheerio('.category', html).map(async (i, e) => {
    const link = e.attribs.href;
    const innerHtml = await rp(link);
    const cat = e.children[0].data;
    return {
      link,
      cat,
      img,
    };
  }).get();
  return Promise.all(businessMap);
};

getCategories()
  .then(result => {
    const transformed = new otcsv(result);
    return transformed.toDisk('./spreadsheets/output.csv');
  })
  .then(() => console.log('SUCCESSFULLY COMPLETED THE WEB SCRAPING SAMPLE'));
What on earth is cheerio?
Edit:
Ahh, it’s a cut-down version of jQuery.
Hi @makamo661, you’re assigning a new value to img in each iteration here; so after the loop completes, it will just hold the last one. So instead you might directly map() each image to its src attribute like so:
const images = cheerio('td.productListing-data > a > img', html)
  .map((i, image) => cheerio(image).attr('src'))
  .get()
Edit:
Oh boy I didn’t even get this LOL… it seems it does implement map() as well though.
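To see why the loop version ends up with a single value, here is a minimal sketch with plain objects standing in for the cheerio elements. The `elements` array and the file names are made up for illustration; only the `attribs` shape mirrors what the code in the question already reads.

```javascript
// Hypothetical stand-ins for the matched <img> elements (not real scraped data).
const elements = [
  { attribs: { src: 'a.jpg' } },
  { attribs: { src: 'b.jpg' } },
  { attribs: { src: 'c.jpg' } },
];

// Reassigning one variable in a loop: every pass overwrites the previous value,
// so only the last src survives once the loop is done.
let img;
elements.forEach((element) => {
  img = element.attribs.src;
});
console.log(img);

// Mapping returns one entry per element, so nothing gets overwritten.
const images = elements.map((element) => element.attribs.src);
console.log(images);
```

Note that a plain array's map() passes (element, index), while the cheerio callbacks in this thread receive (index, element).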
I changed the code as follows and it still just gives me one item:
let img;
const images = cheerio('td.productListing-data > a > img', html).map(async (i, image) => {
  img = cheerio(image).attr('src');
}).get();
I also tried:
const images = cheerio('td.productListing-data > a > img', html)
  .map((i, image) => cheerio(image).attr('src'))
  .get()
With that, every cell of the spreadsheet contained an array.
I tried iterating through the array but couldn’t make it work with the return statement.
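One possible way to get a single image per row is to use the index that the map callback receives and pick the matching entry from the images array. This is only a sketch with made-up data, and it assumes the i-th image belongs to the i-th category, which may not hold for the real page.

```javascript
// Hypothetical stand-ins for the scraped categories and image sources.
const categories = [
  { link: 'https://www.example.com/a', cat: 'A' },
  { link: 'https://www.example.com/b', cat: 'B' },
];
const images = ['a.jpg', 'b.jpg'];

// Assumption: images[i] is the image for categories[i].
// Each row then carries one image instead of the whole array.
const rows = categories.map((category, i) => ({
  link: category.link,
  cat: category.cat,
  img: images[i],
}));
console.log(rows);
```

In the cheerio version the callback arguments are (i, e), so the returned object would use img: images[i] in the same way. If the images don't line up one-to-one with the categories, something like images.join(' ') would at least put all sources into one cell as a plain string.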