Downloaded PDF file is only 1KB

chenphilip14 · November 27, 2023, 4:39am

I have been asked to download a number of PDF files from a website. Since it’s too tiring to get them by hand, I found this code here: https://realpython.com/python-download-file-from-url/

from urllib.request import urlretrieve

url = (
    "https://online.personalcarecouncil.org/ctfa-static/online/lists/cir-pdfs/PRS572.pdf"
)
filename = "PDF File.pdf"

path, headers = urlretrieve(url, filename)
for name, value in headers.items():
    print(name, value)

print(f"Downloaded file {path}")

However, the file I downloaded is only 1KB, so I can neither open nor read it, and it’s nothing like the actual PDF I get when I click the link. How can I fix this?

Martyr2 · November 27, 2023, 6:17am

Are you sure the file isn’t just 1KB in size? I see the file contains some JavaScript and is no more than about 240 bytes, which a file browser displays as 1KB. Double-check that it is the right file.

chenphilip14 · November 27, 2023, 7:01am

That is the file I want. When I open the link directly, I get the file without any problems. But when I try to download it with my code, this happens.

mabismad · November 27, 2023, 7:12am

Open the downloaded file in a programming editor. You may find an error message in it that tells you why the PDF isn’t being downloaded.
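A quick way to do the same check from Python is to look at the file’s first bytes: a PDF file normally starts with the signature `%PDF-`. The sketch below is a minimal helper under that assumption; `looks_like_pdf` and `inspect_download` are hypothetical names made up for illustration.

```python
from pathlib import Path

# A PDF file normally begins with the ASCII signature "%PDF-".
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF signature."""
    return data.startswith(PDF_SIGNATURE)


def inspect_download(path: str, preview_bytes: int = 300) -> bool:
    """Check a downloaded file and show its beginning if it is not a PDF."""
    data = Path(path).read_bytes()
    if looks_like_pdf(data):
        print(f"{path} looks like a PDF ({len(data)} bytes).")
        return True
    # Not a PDF: print the start as text so an error page or a redirect
    # script returned by the server becomes visible.
    print(f"{path} is NOT a PDF ({len(data)} bytes). It begins with:")
    print(data[:preview_bytes].decode("utf-8", errors="replace"))
    return False
```

For example, calling `inspect_download("FR717.pdf")` right after `urlretrieve` would print the start of the file if the server sent back an HTML page instead of a PDF.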

chenphilip14 · November 27, 2023, 7:17am

I get this from my terminal:

Cache-Control private
Content-Type text/html;charset=ISO-8859-1
Server Microsoft-IIS/10.0
Set-Cookie JSESSIONID=91F77396F8BF9460366C88483F5351E2; Path=/; Secure; HttpOnly
X-AspNet-Version 4.0.30319
X-Powered-By ASP.NET
Date Mon, 27 Nov 2023 07:16:25 GMT
Connection close
Content-Length 240
Downloaded file FR717.pdf

I am using VSCode, and I did not see any errors in my terminal. The download runs, but the file size is wrong.
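The headers printed above already point at the cause: the server answered with `Content-Type text/html` and a `Content-Length` of 240, so what was saved is a small HTML page, not a PDF. `urlretrieve` saves whatever body a successful response returns without checking it. A minimal sketch of a stricter download, assuming you would rather get an error than a bogus file, might look like this (`media_type` and `download_pdf` are hypothetical helper names):

```python
from urllib.request import urlopen


def media_type(content_type: str) -> str:
    """Drop parameters: 'text/html;charset=ISO-8859-1' -> 'text/html'."""
    return content_type.split(";", 1)[0].strip().lower()


def download_pdf(url: str, filename: str) -> str:
    """Download url to filename, raising instead of saving a non-PDF body."""
    with urlopen(url) as response:
        ctype = media_type(response.headers.get("Content-Type", ""))
        data = response.read()
    # Trust the bytes, not just the header: a real PDF starts with "%PDF-".
    if not data.startswith(b"%PDF-"):
        raise ValueError(
            f"Expected a PDF but got {ctype or 'an unknown type'} "
            f"({len(data)} bytes); the server may require a login."
        )
    with open(filename, "wb") as f:
        f.write(data)
    return filename
```

Given the headers shown above, calling this with the URL from the question should raise a `ValueError` mentioning `text/html` and 240 bytes instead of silently writing a 1KB file.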

m3g4p0p · November 27, 2023, 10:57am

Hi @chenphilip14, it appears that this resource requires authentication; if you open the downloaded file in a plain text editor, you’ll see some JS that would just redirect to the home / login page:

<script language="javascript">
    document.location = 'https://online.personalcarecouncil.org/jsp/Home.jsp?pageName=https://online.personalcarecouncil.org/jsp/LoginAuth.jsp&pageName2=null&statuschecking=yes';
</script>
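If you are scripting many downloads, you can detect this kind of redirect page automatically. The sketch below uses a simple regular expression that only covers the `document.location = '...'` (or `window.location.href = "..."`) pattern shown above; it is not a general JavaScript parser, and `find_js_redirect` is a hypothetical name.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Matches e.g. document.location = '...' or window.location.href = "..."
REDIRECT_RE = re.compile(
    r"""(?:document|window)\.location(?:\.href)?\s*=\s*['"]([^'"]+)['"]"""
)


def find_js_redirect(html: str) -> Optional[str]:
    """Return the URL a simple JavaScript redirect points to, if any."""
    match = REDIRECT_RE.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    page = """<script language="javascript">
    document.location = 'https://online.personalcarecouncil.org/jsp/Home.jsp?pageName=https://online.personalcarecouncil.org/jsp/LoginAuth.jsp&pageName2=null&statuschecking=yes';
</script>"""
    target = find_js_redirect(page)
    print(urlsplit(target).path)  # /jsp/Home.jsp
    # The pageName parameter reveals the login page the script redirects to:
    print(parse_qs(urlsplit(target).query)["pageName"][0])
    # https://online.personalcarecouncil.org/jsp/LoginAuth.jsp
```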

Martyr2 · November 27, 2023, 6:55pm

What @m3g4p0p just pointed out is what I see too, which is why I said it contained some JS and was only roughly 240 bytes.

Thanks for the confirmation @m3g4p0p

chenphilip14 · November 28, 2023, 3:24am

Ah, yes. No wonder the file is so odd: I need to log in to get the file straight from that link. Thanks for the heads-up.

When I reached their website from Google and went through their A-Z index, I could download the files manually without problems.
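One possible explanation, not confirmed in this thread, is that browsing the site sets a session cookie (note the `Set-Cookie JSESSIONID=...` header above) that the PDF link then relies on. If so, a script has to keep and resend cookies the way a browser does. The sketch below assumes that; `make_session`, `fetch_with_session` and the `start_url` parameter are hypothetical, and if the site requires an actual account login you would instead copy the `Cookie` header value from a logged-in browser and pass it as `browser_cookie`.

```python
from http.cookiejar import CookieJar
from typing import Optional, Tuple
from urllib.request import HTTPCookieProcessor, OpenerDirector, Request, build_opener


def make_session() -> Tuple[OpenerDirector, CookieJar]:
    """Create an opener that stores cookies and resends them on later requests."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


def fetch_with_session(
    opener: OpenerDirector,
    pdf_url: str,
    start_url: Optional[str] = None,
    browser_cookie: Optional[str] = None,
) -> bytes:
    """Fetch pdf_url through a cookie-keeping opener.

    ASSUMPTION: visiting start_url first gives the session the cookie the
    PDF link needs. If the site requires a real login instead, pass the
    Cookie header value copied from a logged-in browser as browser_cookie.
    """
    headers = {}
    if browser_cookie:
        # An explicit Cookie header takes precedence over the jar's cookies.
        headers["Cookie"] = browser_cookie
    if start_url:
        with opener.open(Request(start_url, headers=headers)) as response:
            response.read()  # cookies from Set-Cookie are stored in the jar
    with opener.open(Request(pdf_url, headers=headers)) as response:
        return response.read()
```

Whatever the access mechanism turns out to be, it is worth checking that the returned bytes start with `%PDF-` before writing them to disk.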

system · February 27, 2024, 10:24am

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.