I have a python script (Script is attached) that works functionally but m getting a syntax error on the return section of the code, but cant figure out why it doesnt work despite indenting the line. can you please help solve the indentation or syntax issue? Sorry i m not very familiar with python indentation so any advise/help would be appreciated.
this is the error:
File “scrape.py”, line 26
return e.extract(r.text,base_url=url)
^
SyntaxError: ‘return’ outside function
Python is quite picky when it comes to indentation. I’ll assume you wanted to place your return statement inside of the scrape() function, but forgot to indent/space it and that is why python thinks the return statement isn’t a part of a the scrape function.
from selectorlib import Extractor
import requests
from time import sleep
import csv
# Create an Extractor by reading from the YAML file
e = Extractor.from_yaml_file('booking.yml')
def scrape(url):
headers = {
'Connection': 'keep-alive',
'Pragma': 'no-cache',
'Cache-Control': 'no-cache',
'DNT': '1',
'Upgrade-Insecure-Requests': '1',
# You may want to change the user agent if you get blocked
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Referer': 'https://www.booking.com/index.en-gb.html',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
}
# Download the page using requests
print("Downloading %s"%url)
r = requests.get(url, headers=headers)
# Pass the HTML of the page and create
return e.extract(r.text,base_url=url)
with open("urls.txt",'r') as urllist, open('data.csv','w') as outfile:
fieldnames = [
"name",
"location",
"price",
"price_for",
"room_type",
"beds",
"rating",
"rating_title",
"number_of_ratings",
"url"
]
writer = csv.DictWriter(outfile, fieldnames=fieldnames,quoting=csv.QUOTE_ALL)
writer.writeheader()
for url in urllist.readlines():
data = scrape(url)
if data:
for h in data['hotels']:
writer.writerow(h)
# sleep(5)
the one you post is better indented though it is now complaining about line 48
File “.\scrape.py”, line 48
for h in data[‘hotels’]:
^
TabError: inconsistent use of tabs and spaces in indentation