Firefox: Locating innerHTML

Disclaimer: I don’t know anything about JavaScript!

Is it true that, on web pages that are dynamically created by JavaScript, the “innerHTML” is the final code that actually gets rendered by the browser?

If so, how can I easily access this in Firefox?

I’m not sure tbh. I’ll leave that one to someone else to pick up.

Go into the Dev Tools in Firefox. You’ll find the HTML that’s being rendered in there, plus any files used in the creation of the page.
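
If you want to pull that rendered markup out yourself, one low-effort approach (just a sketch, and it assumes you’re comfortable opening the Web Console with Ctrl+Shift+K) is to ask the page for its current serialized DOM:

```javascript
// Run in the Firefox Web Console (Ctrl+Shift+K) on the page you want to keep.
// outerHTML here is the DOM as it exists *after* any JavaScript has run,
// i.e. the markup that actually got rendered - not the original source
// you'd see with View Source.
const renderedHtml = document.documentElement.outerHTML;

// copy() is a dev-tools console helper (not standard JavaScript); it puts
// the string on the clipboard so you can paste it into a .html file.
copy(renderedHtml);
```

Bear in mind the result still points at external stylesheets and images by URL, so on its own it won’t be a perfectly self-contained offline copy.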

Allow me to explain what I’m up to…

Whenever I find a useful web page online, I have always saved it to my computer. That way I always have an offline copy, and if the page ever disappears I still have it.

For the last 20 years this approach worked fine, but recently I discovered that many web pages that I thought had been saved to my computer were not.

Best case scenario, the web page is missing elements and is all jumbled up.

Worst case scenario, I open up the page and see a white screen in Firefox.

When I do File > Save Page As in Firefox, I get an HTML file and a folder, but from what I just learned, apparently a lot of these pages are created dynamically using JavaScript, so what gets saved is an incomplete page.

I have been researching this online and stumbled across an article that used Python to address this issue; it mentioned that you need to capture the “innerHTML” to get the page you are trying to save.

Like I said, I don’t know JavaScript, but I am wondering if there is some way I can grab that from Firefox and then maybe dump it into an HTML template and render a real, complete web page that way.
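
To be concrete about what I’m imagining (and treat this purely as a sketch of the idea, since I can’t vouch for the details myself): something run from the browser’s dev-tools console that takes the rendered markup and hands it back as a downloadable .html file, roughly like this:

```javascript
// Sketch only: run in the dev-tools console on the page you want to keep.
// It serializes the rendered DOM and triggers a download of an .html file.
// (The file name "saved-page.html" is just a placeholder.)
const html = "<!DOCTYPE html>\n" + document.documentElement.outerHTML;
const blob = new Blob([html], { type: "text/html" });

const link = document.createElement("a");
link.href = URL.createObjectURL(blob);
link.download = "saved-page.html";
document.body.appendChild(link);
link.click();
link.remove();
```

Even then, I gather the images and stylesheets would still be references back to the live site, so it wouldn’t be a complete offline copy by itself.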

Let me finish by saying that it INFURIATES ME that I just dropped $100 on The New York Times, and now all of these articles I have paid for and been saving on my laptop are really ghost files!!!

I hope there is a way to get around all of these garbage websites online and get back to basics when the Internet was pure…

The most reliable way to save these pages is to print them to a PDF file.

Unfortunately that often doesn’t work either. When I try to make a PDF of pages from the NYT, the resulting PDF is garbled.

Based on preliminary searching, there is a lot of information on “screen scraping”, but I don’t have the time to learn something like Python right now.

I was hoping there was an easy way for me to capture the final HTML that gets rendered and then save that so the saved web page is what I see on my computer as I read it.

The best solution would be some add-on to Firefox or Chrome that would do all of this for me.

There must be some solution to this vexing problem of the modern Internet?

Sadly, no, there is not, as the NYT has an active disincentive to let you save their articles.

I don’t know; it seems there must be some way, at least to some extent.

https://help.nytimes.com/hc/en-us/articles/115014893428-Terms-of-service

2.3 You may download or copy the Content and other downloadable items displayed on the Services for personal use only, provided that you maintain all copyright and other notices contained therein. Copying or storing of any Content for other than personal use is expressly prohibited without prior written permission from The New York Times Rights and Permissions Department, or the copyright holder identified in the copyright notice contained in the Content.

I would ask their “help” desk whether accessing content outside of a browser is considered “circumventing any restriction or condition” before you commit to developing anything like a cURL script, etc.

4.1 You may not access or use, or attempt to access or use, the Services to take any action that could harm us or a third party. You may not access parts of the Services to which you are not authorized. You may not attempt to circumvent any restriction or condition imposed on your use or access, or do anything that could disable or damage the functioning or appearance of the Services, including the presentation or display of advertising. Being exposed to advertising is a condition of accessing the Services.

So I meet their “fair use” terms. Problem is I haven’t found a way to do what I want yet.

The best way would be to ask them. I did notice mention on one of the help pages about a “reprint” (button?) that appears on article pages. I took it to be something pertaining to using the page rather than saving the page, but there’s a chance it’s for a “reprint” page that can be saved locally.

There is indeed such a feature built into Firefox. Here is how it is described in a Google search result.

How do I take a screenshot of a whole web page in Firefox?
Here is how to capture a scrolling screenshot:

1. Open the desired website in Firefox.
2. On the right-hand side of the address bar, click the Page Actions (three dots) button.
3. Choose the “Take a Screenshot” option from the drop-down menu.

I have tried it and it works fine. :grin:

Thanks, but I see no such button, plus that doesn’t solve the issue I have on dozens of other websites.

Looks like this is going to take a fair amount of work on my end. :unhappy:

Look at the right-hand side of the address bar. There are three horizontal dots. Click on them to get to Screenshot.

I forgot that existed.

Thanks for the suggestion, but that’s not what I am looking for; besides, I can already do that in Snagit.

In a worst-case scenario, that would help me preserve the look and feel of a web page (although it somewhat screwed things up), but ideally I would like a page that not only looks exactly like the original but is readable text rather than an image.

When I tried installing some add-ons in Chrome before supper, I found some choices that came close, which indicates to me that there is a way to programmatically do what I want.

Now how easy or hard that is remains to be seen.

I guess I am not really understanding why these pages are breaking lately.

I thought that when you view a web page in your browser that you are looking at the final HTML, right?

After all, I may have a PHP website that dynamically creates the page content, but at the end of the day the final product is static HTML, right?

So how are these modern JavaScript pages blowing things up on me?

And why can’t I save a web page in the exact format that my eyes see it?

You may be saving the HTML page, but you are not saving the external CSS and JavaScript files that influence the way the page renders in your browser. If you want to keep the look of the page, you need to use screen capture for a permanent record.
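
If you are curious which external files a given page is leaning on, a quick way to see them (again, just an illustration, run from the dev-tools console) is to list everything the HTML references by URL:

```javascript
// Illustration: list the external resources this page references.
// Anything in this list that isn't saved next to the HTML file can change
// how (or whether) the saved copy renders offline.
const externals = [
  ...document.querySelectorAll("link[rel='stylesheet']"), // external CSS
  ...document.querySelectorAll("script[src]"),            // external JavaScript
  ...document.querySelectorAll("img[src]"),                // images
].map(el => el.href || el.src);

console.log(externals.join("\n"));
```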

In the past, when I saved web pages as “HTML Documents”, the WYSIWYG look was preserved.

Why is that no longer the case?

There must be some way to accomplish what I want…

It is possible that you were really saving a link to the page rather than the page itself. Alternatively, all of the code required to render the page may have been embedded in the page itself; in other words, no external files were needed to render it.

Definitely not. I was saving the HTML and associated stylesheets (and I assumed any JavaScript too).

Possibly.

So when I am viewing a web page with external stylesheets and so on, isn’t there a way to suck those up and save them in the folder that goes along with the HTML file?

Also, would using a “screen-scraping” approach yield better results?

Yes, when you save the page, select Save as type: > Web Page, complete.

That will give you the HTML file and the associated folder (with files) that goes with it.

Apparently that is no longer the case, because as mentioned above, when I save certain fancier web pages and then go to view them, they do not appear as the original did and often will not even load…

You are getting back to the “screen shot” discussed earlier.

Another alternative is to save the page as a PDF file. The Opera browser allows you to do that.