Parsing HTML content in JavaScript: too resource intensive?

jeremy58 · June 18, 2024, 1:37am

I maintain a WordPress gallery plugin that uses the Lightgallery plugin on the frontend. Lightgallery can create the gallery dynamically (by passing in an array of image URLs).

I’d like to add functionality that parses the main HTML content on the page (the main body of the article, in this case) and uses a regex to extract any <img> tags, which I would then pass to the backend via AJAX. The backend would query WordPress to find those images in the database (to retrieve details such as the copyright info, description, etc.) and return them, and the gallery would then be initialized.

Would using JavaScript to parse the HTML content (which might be several MB) be too slow/resource intensive for low-end devices to handle?

m_hutley · June 19, 2024, 10:02am

Probably not, but if your HTML content really is “several MB”, the HTML itself is likely to be the part that’s too slow/resource intensive.
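If you'd rather measure than guess, here's a rough sketch you could run on the target device. The roughly 2 MB of filler markup is made up for the test, and a plain substring scan stands in for whatever extraction you end up using. It only times the scan over the text, not network transfer or image loading.

```javascript
// Build a synthetic article body of roughly 2 MB:
// repeated paragraphs of filler text, each followed by one image tag.
const chunk = '<p>' + 'lorem ipsum '.repeat(80) + '</p><img src="photo.jpg">\n';
const html = chunk.repeat(2000);

const start = performance.now();
let imgCount = 0;
// Count occurrences of "<img" with a substring scan; no parsing is attempted.
for (let i = html.indexOf('<img'); i !== -1; i = html.indexOf('<img', i + 4)) {
  imgCount += 1;
}
const elapsedMs = performance.now() - start;

console.log(`${html.length} chars, ${imgCount} <img tags, ${elapsedMs.toFixed(1)} ms`);
```

The timing will vary by device, so treat the number as a relative indication only.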

Keep in mind that the HTML content is just the text and the tags; it’s not the images you load.
<img src="this_image_is_20_MB_big.jpg"> has an HTML content weight of 39 bytes (or 78 if you’re using badly formed multibytes). Not KB, not MB, bytes.
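The arithmetic is easy to check. A small sketch, assuming the 78 figure refers to a two-bytes-per-character encoding such as UTF-16 (the variable names are just for illustration):

```javascript
// Measure how many bytes one image tag occupies as markup.
const tag = '<img src="this_image_is_20_MB_big.jpg">';

// UTF-8: every character in this tag is ASCII, so one byte each.
const utf8Bytes = new TextEncoder().encode(tag).length;

// UTF-16: each of these characters is one 16-bit code unit, so two bytes each.
const utf16Bytes = tag.length * 2;

console.log(utf8Bytes, utf16Bytes); // 39 78
```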

The entire Bible (UTF-8, KJV, no references/footnotes) in plaintext is 4.2 MB [https://www.gutenberg.org/ebooks/10]. If your HTML content is bigger than that… I have questions.

EDIT: Cite your source, Marc…

SamuelCalifornia · June 24, 2024, 5:52am

Regexes do not parse. A regex can recognize portions of text, but it cannot recognize the entire grammar. Experts advise against using regexes on HTML.
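To make that concrete, here is a small sketch of how a plausible-looking pattern goes wrong, next to one possible alternative: querying the DOM the browser has already built for the page. The pattern, the function names and the `.entry-content` selector are all assumptions for illustration, not something from the original post.

```javascript
// Naive pattern: "<img", anything that is not ">", then a double-quoted src value.
const IMG_SRC = /<img[^>]*src="([^"]*)"/g;

function imageUrlsByRegex(html) {
  return Array.from(html.matchAll(IMG_SRC), (m) => m[1]);
}

// Browser alternative: ask the already-parsed DOM instead of re-reading the text.
// `root` is any element, e.g. document.querySelector('.entry-content'),
// where '.entry-content' is a hypothetical, theme-dependent selector.
function imageUrlsFromDom(root) {
  return Array.from(root.querySelectorAll('img'), (img) => img.getAttribute('src'))
    .filter(Boolean); // skip images that have no src attribute
}

const sample =
  '<p>Intro</p><!-- <img src="old.jpg"> -->' +
  '<img alt="a > b" src="real.jpg">';

// The regex reports the commented-out image and misses the real one,
// because [^>]* stops at the ">" inside the alt attribute.
console.log(imageUrlsByRegex(sample)); // [ 'old.jpg' ]
```

In a browser, `imageUrlsFromDom` would skip the comment and return the real image, since the HTML parser has already resolved both cases.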

There are compiler generators (lexer and parser generators) that are used to build processors for many things, including HTML, CSS and PHP. PHP, for example, uses re2c to generate its lexer. The earliest generators that are still popular are Lex (Flex) for lexical analysis and Yacc (Bison) for syntax.

I found Peggy – Parser Generator for JavaScript. It might help. I assume a grammar exists, or could be written, that lets Peggy generate an HTML parser.

For the server side there are parser generators for PHP as well; perhaps that is what you need.

system · September 23, 2024, 12:52pm

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.
