Python Scrapy issue

Hey everyone. I have an issue using the Python library Scrapy to scrape glassdoor.com.

I get the user's location and job, and that works great. However, the issue I have is scraping the “Job Count” number. As an example: https://www.glassdoor.com/Job/jobs.htm?sc.keyword=web%20developer&locT=C&locId=1147070&locKeyword=Coachella,%20CA&srs=RECENT_SEARCHES

The job count here is listed as “10 jobs”. However, every time I try to scrape it, I get a different number, anywhere from 6 to 10. I tried adding a delay to the scrape, thinking the value might be generated by a JS script, but the delay did not solve it.

I just get the number from response.css:

job_count = response.css("p.jobsCount::text").extract()

Does anyone have any idea, or can point me in the right direction? I’m running out of ideas to try :x
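
Scrapy does not run JavaScript, so a download delay would not change what the selector sees. One way to narrow this down is to save response.text from a few runs and repeat the extraction offline, to see whether the HTML itself carries different numbers. Below is a minimal sketch using only the standard library. The class name jobsCount comes from the post above; the sample markup and the names JobsCountParser and parse_job_count are made up for illustration.

```python
import re
from html.parser import HTMLParser


class JobsCountParser(HTMLParser):
    """Collects the text inside every <p class="jobsCount"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "jobsCount" in classes:
            self._inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.texts[-1] += data


def parse_job_count(html):
    """Return the first job count found as an int, or None if absent."""
    parser = JobsCountParser()
    parser.feed(html)
    for text in parser.texts:
        match = re.search(r"\d[\d,]*", text)
        if match:
            return int(match.group().replace(",", ""))
    return None


# Made-up sample markup, not copied from the real page.
sample_html = '<div><p class="jobsCount"> 10 jobs </p></div>'
print(parse_job_count(sample_html))
```

If the saved pages already show different counts, the variation is coming from the server rather than from the selector.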

You want to not scrape
https://www.glassdoor.com/about/terms.htm


You agree that you will not:

  • Introduce software or automated agents to Glassdoor, or access Glassdoor so as to produce multiple accounts, generate automated messages, or to scrape, strip or mine data from Glassdoor without our express written permission;

You want to use the API
https://www.glassdoor.com/developer/index.htm


Ahh damn.

The only thing with that API is that they don’t accept any requests to get the API key. So I guess I’m kinda outta luck with glassdoor
