How to download the HTML code of a website with Node.js?

rittubhansali · March 10, 2018, 12:27pm

I have an array of links like this one:
['example1.com', 'example2.com', 'example3.com', 'example4.com', 'example5.com']

I want to get just the HTML source code of all 5 links using Node.js. How can I do that?

chrisofarabia · March 10, 2018, 12:33pm

You’ll need to look up a technique called web scraping, I think.

James_Hibbard · March 11, 2018, 7:24pm

You should be able to adapt this to your needs.

bodybuildoo08 · March 12, 2018, 11:50am

W3Schools is a good place to start. The site has some great tutorials for Web technologies which are easy to follow.

Gandalf · March 12, 2018, 11:52am

It can be. But it is quite outdated on some topics. I’m not sure it is a good choice for Node.

tvilmart · March 12, 2018, 2:22pm

If it is static HTML, and you just need to download the HTML files, you can use an HTTP client such as the npm library request.
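For the static case, a minimal sketch might look like the following. The request package has since been deprecated, so this version assumes Node.js 18 or later and uses its built-in fetch instead. The names toUrl and downloadHtml are made up for this example, and defaulting bare host names to https:// is an assumption about the sites in question.

```javascript
// Turn a bare host name such as 'example1.com' into a full URL.
// Assumption: HTTPS is the right default for these sites.
function toUrl(link) {
  return /^https?:\/\//i.test(link) ? link : `https://${link}`
}

// Download the HTML of every link in parallel.
// fetchImpl defaults to the fetch built into Node.js 18+; it is a parameter
// only so that another implementation can be swapped in.
async function downloadHtml(links, fetchImpl = globalThis.fetch) {
  const results = await Promise.allSettled(
    links.map(async (link) => {
      const response = await fetchImpl(toUrl(link))
      if (!response.ok) {
        throw new Error(`HTTP ${response.status} for ${link}`)
      }
      return response.text()
    })
  )

  // One entry per link, so a single failure does not lose the other pages.
  return results.map((result, i) => ({
    link: links[i],
    html: result.status === 'fulfilled' ? result.value : null,
    error: result.status === 'rejected' ? result.reason.message : null
  }))
}
```

Calling downloadHtml(['example1.com', 'example2.com']) from an async function should resolve to an array of { link, html, error } objects, with html set to null for any link that failed.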

If it is dynamic HTML, generated by a library or written at runtime, you need to use a browser automation tool like selenium-webdriver. You can then extract the innerHTML from the <html> element. Another solution would be to use a mocked DOM, like JSDOM.

Once the page is loaded in selenium, the extraction of the HTML looks like this:

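// Assumes `driver` is an existing selenium-webdriver instance with the page
// already loaded, and that this code runs inside an async function.
const { By, until } = require('selenium-webdriver')

// findElement rejects if the application root is missing
await driver.findElement(By.id('react-application-root'))
const searchFormButton = await driver.findElement(By.id('search-form-button'))
await searchFormButton.click()

// Wait up to 5 seconds for the search results to be displayed
const timeoutToDisplaySearchResults = 5000
await driver.wait(until.elementLocated(By.id('tab0')), timeoutToDisplaySearchResults)

// Read the rendered markup and wrap it back into a full document
const htmlElement = await driver.findElement(By.tagName('html'))
const htmlElementInnerHTML = await htmlElement.getAttribute('innerHTML')
const fullHtml = '<!DOCTYPE HTML>\n<html lang="en">\n' + htmlElementInnerHTML + '</html>'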
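
To apply the same idea to a whole list of links, a rough sketch could loop over them with a single driver. It assumes selenium-webdriver is installed and a matching browser driver is available. It also uses getPageSource() rather than rebuilding the document from innerHTML, which in most browsers should return the rendered DOM, though that is worth checking for your setup. The names collectRenderedHtml and waitUntilReady are hypothetical.

```javascript
// Assumption: `driver` is a selenium-webdriver WebDriver created elsewhere, e.g.
//   const { Builder } = require('selenium-webdriver')
//   const driver = await new Builder().forBrowser('chrome').build()
// waitUntilReady is an optional hook for page-specific waits, such as the
// driver.wait(until.elementLocated(...)) call shown above.
async function collectRenderedHtml(driver, urls, waitUntilReady = async () => {}) {
  const pages = []
  // Visit one page at a time: a single driver controls a single browser window.
  for (const url of urls) {
    await driver.get(url)
    await waitUntilReady(driver)
    pages.push({ url, html: await driver.getPageSource() })
  }
  return pages
}
```

Remember to call driver.quit() once all the pages have been collected, otherwise the browser process stays open.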

system · June 11, 2018, 9:22pm

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.
