Whenever I find something interesting online, I save it as a complete HTML webpage so I have it for reference later on.
Today I went to open one of these pages, and while it opened, it was basically blank. (The webpage was from an online newspaper, and all that I saw was a basic header and footer.)
There is both a .html file and a folder for the article full of various files, so why can't I read the page I saved?
Except most online newspapers pull articles after a very short period of time, which is why I save things - so I have them forever!
That is material for another thread… I used to be able to PDF things just fine, but there is so much JavaScript crap online, most webpages I try to PDF turn out to be gobbledygook.
I’ve got several issues going on here with the Internet going farther down the toilet each day.
In my OP, the issue is that I save a webpage, and there is a .html file and an HTML folder on my hard drive. Then, later on - maybe a year later - when I double-click on the .html file I either get a blank page, or I see the webpage load for a second, and then get a white page.
After supper let me see if I can post some examples.
To think I have been saving things for the last decade, and apparently in the last year or two something changed online, and now what I thought was saved is total gibberish!
When I went back to the saved file and opened it, I got a New York Times heading and “Page Not Found” in the center of the page.
How can a saved, offline page not be found?
By the way, the .html file and associated folder were about 2MB and when I zipped them it became 8MB so I can’t upload the .zip which is probably what you need.
That would be consistent with my “source vs. computed” idea. That is, a lot of sites are “JavaScript required” instead of “JavaScript enhanced”; i.e., the browser has JavaScript enabled but the “save as” doesn’t.
As for revisiting older NYT articles, I could find a policy that seems to be about “using” them. Though I wouldn’t consider saving a local copy as “using” it, maybe they do? In any case it wouldn’t hurt to ask. Maybe there is some sort of “bookmarked favorites” feature.
You could do that to confirm, but basically there would be a “no JavaScript” page, and a “with JavaScript” page. You could also compare how “view-source” looks with how the dev tools’ page DOM looks.
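If comparing by eye gets tedious, here is a rough sketch of one way to check the “source” side of that comparison. It counts the words of readable text that are actually present in the raw HTML, ignoring anything inside script, style, and noscript tags. The names `VisibleText` and `visible_word_count` are just made up for this example, and a low count is only a hint, not proof, that the article text gets filled in later by JavaScript.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside <script>, <style> and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside a skipped element
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that a reader could plausibly see on the page.
        if self.depth == 0:
            self.words.extend(data.split())


def visible_word_count(html):
    """Return the number of readable words found in the raw HTML source."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)
```

To try it on a saved page, read the .html file into a string and pass it to `visible_word_count`. If a long article comes back with only a handful of words, the saved source probably holds just the page shell.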
So I turned off JavaScript in Firefox and when I went back to the original article on the NYT, it loaded for a second and then I got a blank page. Then I searched around and found some site claiming to have a non-JS version, although it looked like some foreign site. When I saved that page with JavaScript still turned off, it seems like I can read the page as I would expect. I was also able to create a decent PDF of it.
Are you saying that many modern websites are set up so that if you don’t have JavaScript enabled, the page breaks, including saved copies?
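Since the script-free page saved and opened fine, one experiment that might be worth trying on the older saves is to make a copy of the .html file with the script blocks removed, then open the copy. This is only a sketch under a big assumption: that the article text is actually in the saved file and a script is blanking or replacing it on load. If the text was never saved, this will not bring it back. The function names and the "-noscript" suffix are made up for the example, the regex is a crude approach rather than a real HTML parser, and it leaves inline handlers such as onload attributes alone.

```python
import re
from pathlib import Path

# Matches a whole <script ...>...</script> block, including multi-line ones.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_scripts(html):
    """Return the HTML text with every <script> block removed."""
    return SCRIPT_BLOCK.sub("", html)


def write_script_free_copy(path):
    """Write a script-free copy next to the original and return its path.

    The original file is left untouched. The copy stays in the same folder
    so relative links to the saved files folder keep working.
    """
    src = Path(path)
    html = src.read_text(encoding="utf-8", errors="replace")
    out = src.with_name(src.stem + "-noscript" + src.suffix)
    out.write_text(strip_scripts(html), encoding="utf-8")
    return out
```

Calling `write_script_free_copy("article.html")` would produce `article-noscript.html` in the same folder, which can then be opened in the browser to see whether the text shows up.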
If so, is this by design? Or is it just horrible web design?
You know another thing I just discovered is this…
So I have saved all of these news articles, and in my folder I see the .html file and the corresponding folder. But when I double-click on the .html file to re-read an article I saved weeks ago, the .html file suddenly disappears and all I am left with is the html folder, which renders the webpage unviewable since there is no longer an .html file to click on?!
Is this another conspiracy by media outlets? Or is it more sloppy programming?
Most importantly, how can I easily save web pages like I did in the past?
The reason I save stuff is for reference, and hoping that a webpage will still be around in a week, month or year is dreaming. Plus I want a way to easily access a web page offline when I need it. (For instance, tonight I was trying to read a webpage I saved about configuring Apache, and sadly it fell victim to the issues above, and so what was once a great reference to help me out is now apparently lost forever.)
Like I said, I have tried PDFing webpages from the get-go, but often there is so much JavaScript nonsense going on - sorta like on this website - you can never get a legible web page when you PDF things. (5-10 years ago I just relied on the “print version” and would save that and/or PDF it and things were golden.)
Now it is like companies don’t want anything to be permanent. You read it once and it is gone forever… Maybe we should move back into the cave and just tell stories by the firelight and hope we can remember things to tell our children?!