Web pages disappearing after save

This is what Internet Hell must feel like?!

I thought I was having enough problems with this: Saved Webpages Unreadable

But now I am discovering that most of the web pages I have been saving onto my computer aren’t really there.

I primarily use Firefox, and when I want to keep a copy of a good web page/article for later reference, I go to File > Save Page As and choose “HTML Document”, which creates an .html file plus a matching HTML folder with all of the related CSS, JavaScript, etc.

Last week I discovered, rather randomly, that when I went to my “ARTICLES” folder and found an article I wanted to revisit, there was an html file and the HTML folder; however, when I double-clicked on the html file it suddenly disappeared?!

All I was left with was the matching HTML folder, which is useless without the .html file!!

In the thread above, I have been having a terrible time trying to save online articles in a format that is legible and that retains the text, layout, photos, and graphics, so it is like reading the original article.

I thought the problem in that other thread might be my use of uMatrix and NoScript, so last night I spent over two hours tracking these recent articles down online (luckily, I found them), basically allowing everything on the pages I was saving, and then saving fresh copies into my “ARTICLES” folder.

I also went in and used Firefox’s Web Developer tools to remove some problematic widgets in the articles (likely JavaScript remnants) and resaved the files I edited.

Next, I systematically went into my “ARTICLES” folder, confirmed each article had a .html file and a corresponding HTML folder, and then I opened each one up in Firefox.

Everything looked peachy, and those New York Times articles I edited looked great!

All was well - or so I thought.

Tonight I went back into these articles to double-check things, and as I clicked on each .html file it suddenly disappeared, leaving me with only the HTML folder, which by itself is useless?!

How can you save a web page, have your computer save a .html file and the corresponding HTML folder, and then, when you double-click on the .html file a day, a week, or a month later, have it suddenly disappear???

All of that rework from last night has been destroyed a second time.

I am so pissed off right now…

Post a link to one of the New York Times articles you saved and had disappear upon clicking.

We need to work with the same page you’re trying to save in order to narrow down the problem, whether it’s the NY Times page, FF, or possibly even something to do with your PC.


What do you have for Virus/Malware protection?

Here are some of the many that have disappeared since last night…

Nothing on this page is real - How lies become truth in online America

How China walled off the Internet

Limiting social media use reduced loneliness and depression in new study

I don’t really want to work at Facebook

I thought the problem was just NYT articles, but it appears to be more widespread…

The problem is happening on a Mac, so no, no anti-virus.

Actually, I have experienced the problem you’re describing before. Click on the html file and it disappears. What I found when that happened was that the download itself had failed. I discovered this under the “Show Downloads” arrow at the top of the browser.

When I clicked Retry, it would succeed most of the time. I just had to do that with the “How China walled off the Internet” link. It saved to my desktop just fine and I was able to open it. The sticky header was not working, but the rest of the page worked and the images were intact.


I’m not familiar with Mac (or Win 10), but in Windows you could have the .html file paired with the folder that holds the saved content it points to.

IIRC the setting is somewhere under menu > Tools > Folders, or something like that. If you couple the .html file and its folder, the two become a pair: the folder is hidden and you can’t(?) access it by itself, and if you delete the .html file you also delete the folder.

Could that shed some light on what is happening to your saved pages? I’d guess the Mac file system is even more “userfriendlier” than Windows ever was.

Just two of my cents. (I might have a few more cents left.)

So this is an issue with Firefox and not the articles themselves?

I’m not following you.

As I said above, when I save a web page there is an HTML FILE and an HTML FOLDER.

If you don’t have both, you are screwed.

When I save the web page I see BOTH.

When I double-click on the HTML FILE it disappears!!

Thus leaving the HTML FOLDER useless to me.

And I have to go back and try and find that page on the Internet and save it again.

Of course, apparently saving a file and seeing it on your hard-drive isn’t enough to know it is actually there?!
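To see how many saved pages are affected, a small script could list every companion folder whose page file has gone missing. This is only a sketch: it assumes Firefox’s usual naming for complete saves, where “Article.html” sits next to a folder named “Article_files”, and the function name `find_orphaned_folders` is made up for illustration.

```python
from pathlib import Path


def find_orphaned_folders(articles_dir: str) -> list[Path]:
    """Return companion folders whose matching .html/.htm file is missing.

    Assumes the naming convention "<name>.html" + "<name>_files", which is
    what Firefox typically produces for a complete page save.
    """
    orphans = []
    for folder in sorted(Path(articles_dir).iterdir()):
        if not (folder.is_dir() and folder.name.endswith("_files")):
            continue
        stem = folder.name[: -len("_files")]
        # The page may have been saved with either extension.
        page_exists = any(
            (folder.parent / f"{stem}{ext}").exists() for ext in (".html", ".htm")
        )
        if not page_exists:
            orphans.append(folder)
    return orphans
```

Calling it with the path of the “ARTICLES” folder (for example a hypothetical `find_orphaned_folders("/Users/you/ARTICLES")`) would print nothing for healthy saves and list only the folders left behind.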

What I have had happen on rare occasion is that when a download is incomplete the file is created but not written.

That is, I can see the name of the file in my folder, but the filesize reveals there is a problem with it.

For example, if you downloaded “some-article.html” to your folder, in the list of files you would see something like:

some-article.html       text/html      0kb 
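A quick way to check a whole folder for that zero-byte symptom is sketched below. It is a hypothetical helper, not part of Firefox, and it only catches completely empty files; a download that was cut off partway can still have a non-zero size.

```python
from pathlib import Path


def find_empty_pages(articles_dir: str) -> list[Path]:
    """Return saved .html/.htm files (searched recursively) that contain 0 bytes.

    A zero-byte file suggests the download was created but never written.
    """
    return sorted(
        f
        for f in Path(articles_dir).rglob("*")
        if f.is_file()
        and f.suffix.lower() in {".html", ".htm"}
        and f.stat().st_size == 0
    )
```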

Except these files had size to them.

I just tried yet again to save a NYT article…

HTML FILE was 1.7 MB. The HTML FOLDER didn’t show a size, but was packed full of files.

I double-clicked on it, and this time it opened, but when I quit out of Firefox, I could see the 1.7 MB file disappear before my eyes?!

WTF?!

If you copy both of them to another directory before quitting FF, do they also disappear after FF quits?

I’m a longtime Linux guy, so of course the Mac and Win users around here think I know all there is to know about their computers too. :wink:

Your system makes little sense to me. Try to find out why quitting FF deletes the saved files: are they cached, synced, temporary, or what? I can’t deny I’m starting to get curious here.
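The copy-before-quitting test can be scripted so that both pieces of a saved page always travel together. This is a minimal sketch under the same assumption as before (Firefox names the companion folder “&lt;name&gt;_files”); `back_up_saved_page` is a hypothetical name, and `dirs_exist_ok` needs Python 3.8 or later.

```python
import shutil
from pathlib import Path


def back_up_saved_page(html_file: str, backup_dir: str) -> Path:
    """Copy a saved page and its companion "_files" folder into backup_dir.

    Returns the path of the copied .html file.
    """
    src = Path(html_file)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # copy2 also preserves timestamps, which helps when comparing copies later.
    copied = Path(shutil.copy2(src, dest_dir / src.name))
    companion = src.with_name(f"{src.stem}_files")
    if companion.is_dir():
        # dirs_exist_ok lets an existing backup be refreshed in place.
        shutil.copytree(companion, dest_dir / companion.name, dirs_exist_ok=True)
    return copied
```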

I cleaned up my “ARTICLES” folder, emptied the Trash on my Mac, and saved this article again…

What if we’re all coming back?

Next I quit Firefox and watched the 1.7 MB .html file go poof while the corresponding folder remained.

Deleted the folder, emptied my trash and saved it again.

This time I copied and pasted the .html file and folder to a temp directory.

Quit Firefox, and poof, there goes the original .html file again. However, both copies in the temp directory remained.

I don’t see any of that as an improvement, let alone a solution!

Here is a possible solution I just tried, and it worked for me :slight_smile:

What if we are all coming back

  1. Browsed to the above URL
  2. Right-clicked to view the page source
  3. Highlighted all of the HTML source text
  4. Copied and pasted it to “…/archives/what-if-we-are-all-coming-back.html”
  5. Used a text editor to open and edit the html file:
    a. Find <script and globally replace it with <!-- script
    b. Find </script> and globally replace it with <script -->
    c. Save the changes
  6. Browsed to “…/archives/what-if-we-are-all-coming-back.html”
  7. Total time taken: less than a minute
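Steps 5a and 5b can also be done with a short script instead of by hand. The sketch below mirrors those two plain find-and-replace operations exactly; the function names are hypothetical. Order matters: 5a must run before 5b, because the 5b replacement text itself contains “<script”. Being a plain text replace, it is case-sensitive (it misses “<SCRIPT”), it misses variants such as “</script >”, and an inline script whose body contains “-->” would end the comment early.

```python
from pathlib import Path


def comment_out_scripts(html: str) -> str:
    """Turn every <script>...</script> element into an HTML comment."""
    # Step 5a: open a comment in front of every script tag.
    html = html.replace("<script", "<!-- script")
    # Step 5b: close the comment at every closing tag. This must run after
    # step 5a, since the replacement text "<script -->" contains "<script".
    return html.replace("</script>", "<script -->")


def comment_out_scripts_in_file(path: str) -> None:
    """Apply comment_out_scripts to a saved page, overwriting it in place."""
    page = Path(path)
    text = page.read_text(encoding="utf-8")
    page.write_text(comment_out_scripts(text), encoding="utf-8")
```

For example, `<script src="x.js"></script>` becomes `<!-- script src="x.js"><script -->`, which browsers treat as a single comment, while `<noscript>` blocks are left untouched.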

Well in my case I think it was a slow internet connection.

But in your case, the more I read about what you’re experiencing, the more it sounds like it may be a combination of Firefox and your computer. I’m not sure about that, though. It might be a good time to make sure you have a current version of FF.

And what is your default location for saving the file upon download? I save to my desktop and then drag files where I want them later. Just wondering if where you’re saving them has anything to do with the conflict (though it shouldn’t).

So are you able to open the files that you copied to your temp folder? It seems they stay intact after the originals disappear. Do they work when you view them?

Another thing I thought of is the number of duplicate files you must have. I mean, if you’ve got 100 articles saved from NYT, that means you’ve got 100 copies of the same CSS and 100 copies of jQuery or similar JS files. That’s gonna add up over time. Have you considered setting up a directory that mimics NYT’s directory structure and saving just the html page?

That doesn’t answer your problem, but it sounds like a leaner approach for someone who saves lots of files from the same site.
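To get a rough idea of how much duplication there actually is, the files inside the companion folders could be grouped by content hash. This is a hypothetical sketch that assumes the “&lt;name&gt;_files” folder naming; it only reports identical files and does not build the mirrored directory suggested above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_assets(articles_dir: str) -> dict[str, list[Path]]:
    """Group files inside "*_files" folders by the SHA-256 of their contents.

    Only groups with more than one member are returned, i.e. assets such as
    shared CSS or jQuery that were saved again with every article.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for folder in Path(articles_dir).glob("*_files"):
        if not folder.is_dir():
            continue
        for f in folder.rglob("*"):
            if f.is_file():
                digest = hashlib.sha256(f.read_bytes()).hexdigest()
                groups[digest].append(f)
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}
```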

Questions:

1.) When you say “highlight all HTML source text” I assume you mean all code in the view source, right?

2.) After you “copied and pasted to “…/archives/what-if-we-are-all-coming-back.html””, I assume you saved the file first, right?

3.) What editor are you using?

4.) How foolproof do you think that approach is?

Can we assume that all script is Javascript and that Javascript is evil for what I am trying to do?

Could I be wiping out any “good” scripts?

@Ray.H,

I am doing this on a nearly new, top-of-the-line Macintosh with the latest version of Firefox.

Highly unlikely it is a problem with my computer.

I am saving these files into an “ARTICLES” folder nested down in my user profile.

I have been doing this for years with no issues until recently.


The text editor is Sublime Text. Why do you ask about the text editor? Any editor should do, as long as the html file is saved as a text file.

How foolproof do you think that approach is?

I tried it and it works. Did you try it?

Can we assume that all script is Javascript and that Javascript is evil for what I am trying to do?

It is a very poor web page if it needs JavaScript to render correctly.

Could I be wiping out any “good” scripts?

I have yet to come across a “good” JavaScript :slight_smile:

Why don’t you try what I suggested and see if it works. If there are problems then report back with details.

Now you’re confusing me…

You say copy & paste the html into a text file and then save it but then you refer back to the edited html file.

As far as I know, you can’t paste html in a true text editor like Notepad or TextEdit and save things back to html.

If you use a more advanced “text editor” like Notepad++ or BBEdit and so on, then you can do just about anything.

That is why I asked what you were using.

For instance, I have an IDE on my computer where I do development, so I wasn’t sure if I needed that or something lesser like Notepad++.

I am going to.

Isn’t that why my other thread exists, because web pages from places like the NYT are garbage from a coding standpoint?!

Could I be wiping out any “good” scripts?

I have yet to come across a “good” JavaScript :slightly_smiling_face:

Okay.

Will try quickly before bed…

Something has changed, wouldn’t you agree? :slightly_smiling_face:

If it were me I would go to the people who know all about FF and start a thread there.

http://forums.mozillazine.org/
