Is there a hack to run vanilla JavaScript in a shell terminal?

A webpage contains a certain element with a CSS class, and I want to echo the textContent property of this element in a shell terminal.

Pseudocode:

PieceOfWebData=$(curl example.com | SELECT document.querySelector('.a').textContent)
echo "${PieceOfWebData}"


Not generally, but there are command-line tools that can do what you’re trying to do, depending on what OS you’re running on…

As marc says, not really, but it all depends on what you are trying to do.

If all you want is the link text, you could do:

curl https://hibbard.eu/about/ | grep -oP '<a[^>]*>\K[^<]*(?=<\/a>)'

For simpler cases, the regex could be tweaked to achieve the desired result.
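For instance, roughly the same lookbehind-style extraction could be sketched in Python’s `re` module (a made-up snippet with invented sample markup, just to show the idea):

```python
import re

# Sample HTML standing in for a curl response (invented markup)
html = '<p>Intro</p><span class="a">hello world</span><a href="/x">link</a>'

# A capture group plays the role of grep's \K ... (?=...) trick:
# only the text between the tags is kept
match = re.search(r'<span class="a">(.*?)</span>', html)
print(match.group(1))  # → hello world
```

Same caveat applies: it works for simple, predictable markup and falls over as soon as the HTML gets creative.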

But if you are trying to do anything much more complicated than that, I would use Node, fetch the content of the page, parse it as HTML, then do whatever you need to do:

import fetch from 'node-fetch';
import { JSDOM } from 'jsdom';

const url = 'https://hibbard.eu/about/';
async function fetchData() {
  try {
    const response = await fetch(url);
    const text = await response.text();
    const dom = new JSDOM(text);
    const links = [...dom.window.document.querySelectorAll('a')]
      .map(link => link.textContent.trim())
      .filter(text => text);
    console.log(links);
  } catch (error) {
    console.error('Error:', error);
  }
}

fetchData();

Notice that if you were to run both of the above, the output is slightly different. The shell one-liner chokes on my site title:

<a href="/">
  <span class="back-arrow icon">
    <svg fill="#000000" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="none"/>
      <path d="M20 11H7.83l5.59-5.59L12 4l-8 8 8 8 1.41-1.41L7.83 13H20v-2z"/>
    </svg>
  </span>
  James Hibbard
</a>

Whereas the Node script is way more robust and picks it up.

LMK if you want some help going the Node route.


Yeah as mentioned there are many possibilities… FWIW I’d go with xmllint for stuff like this (using xpath instead of CSS selectors though):

curl https://example.com | xmllint --html --xpath './/a/text()' -    
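To show what that XPath is doing, here’s a rough Python equivalent using the standard library’s ElementTree (sample markup invented for illustration; ElementTree only accepts well-formed XML, which is exactly why xmllint needs --html for real pages):

```python
import xml.etree.ElementTree as ET

# A well-formed snippet standing in for the fetched page (invented markup)
doc = ET.fromstring(
    '<body><p>Hi</p><a href="/about">About</a><a href="/blog">Blog</a></body>'
)

# The same './/a' XPath step that xmllint evaluates above
texts = [a.text for a in doc.findall('.//a')]
print(texts)  # → ['About', 'Blog']
```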

Oh nice. I had no idea you can do that.

Just out of curiosity, if you run it with my example page, it spits out a bunch of warnings. Any idea what that is about?

curl https://hibbard.eu/about/ | xmllint --html --xpath './/a/text()' - 

-:63: HTML parser error : Tag header invalid
  <header>

...

It still outputs the correct results though.


Oh hm, I wasn’t aware of this; apparently xmllint just doesn’t recognize these tags. SO suggests redirecting stderr to /dev/null then… but as the comment says, that’s not nice indeed. :-/ I’ll admit I have only used xmllint for actual XML so far. ^^


I would not know about tools that hackers use. I do not do hacking.

You do not say what operating system you use. Microsoft has provided an official JavaScript engine (outside of a browser) for Windows for about as long as JavaScript has existed, no need to do any hacking.

Is any of the sample code you provided actually JavaScript? I do not know JavaScript really well but I cannot find anything that says your code is JavaScript.

Hi Samuel
I didn’t mean “hack” in its modern-day typical usage, but as a parallel to “code trick”.

Anyway, in this case I use the CentOS operating system, and the code example I gave is pseudocode, just to illustrate how abstract code that does what I’m trying to do might look.

See the dictionaries. The Merriam-Webster dictionary defines hack (in part) as “to gain access to a computer illegally” and in similar manners. There are unrelated definitions that are off-topic. It is true that in the modern day people are beginning to use hack with positive connotations, but you say you did not mean it in a modern-day usage.

Please do not think my response is due to anything in the past. I did not look at who posted this until now.

That helps. That is the type of information that is best provided in a question initially.

Hi James

I never worked with Node.js, but I would gladly try; getting the textContent of an element in a webpage from a CentOS Bash terminal would be a very nice tutorial to start with. However, on the shared hosting where I actually need to do this, there is no Node.js installed according to the node -v command, and I don’t have root access to install it directly. So even if I use some “Node.js application” cPanel tool, I don’t think I’ll be able to work with Node.js as freely as I could with Perl or Python, which are naturally shipped with this environment.

The farthest I got with grep was this:

curl -s https://packagist.org/packages/drupal/core | grep -oP '<span class="version-number">'.*

<span class="version-number">10.2.2</span>

I don’t know how best to isolate only the 10.2.2 part. How do I match only the numbers and dots from the entire match of grep -oP '<span class="version-number">'.*?

Like this?

curl -s https://packagist.org/packages/drupal/core | grep -oP '<span class="version-number">\K.*?(?=<\/span>)'
10.2.2

It is very brittle though, e.g. it relies on finding the exact string <span class="version-number"> and would break if it found something like <span class="version-number current"> or <span class='version-number'>
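If you wanted something less brittle without pulling in Node, a small sketch with Python’s built-in html.parser (sample markup invented for illustration) can match the class attribute properly, whatever the quoting or extra classes:

```python
from html.parser import HTMLParser

class VersionFinder(HTMLParser):
    """Collect text from <span> elements whose class list contains 'version-number'."""
    def __init__(self):
        super().__init__()
        self.capturing = False
        self.versions = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; split the class attribute
        # into individual class names instead of string-matching the raw markup
        classes = dict(attrs).get('class', '').split()
        if tag == 'span' and 'version-number' in classes:
            self.capturing = True

    def handle_endtag(self, tag):
        if tag == 'span':
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.versions.append(data.strip())

# Both variants that would defeat the exact-string grep are handled
parser = VersionFinder()
parser.feed('<span class="version-number current">10.2.2</span>'
            "<span class='version-number'>9.5.11</span>")
print(parser.versions)  # → ['10.2.2', '9.5.11']
```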

FWIW, you don’t need admin rights to install Node on a server, just shell access with git.

If you can install your own packages, here is the same script as above in Python (untested). You can use the requests library for fetching web content and BeautifulSoup for parsing HTML.

import requests
from bs4 import BeautifulSoup

url = 'https://example.com'

def fetch_data():
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        links = [link.get_text().strip() for link in soup.find_all('a') if link.get_text().strip()]
        print(links)
    except Exception as error:
        print('Error:', error)

fetch_data()

Like this?

Yes indeed :-)

Thanks for the examples with Node.js and Python; they’re elegant and interesting.

FWIW, you don’t need admin rights to install Node on a server, just shell access with git.

Should I Google how to install Node.js with git?

Nope. If you can SSH into your server and you have git installed on the server, you should use nvm.


I can indeed SSH into that environment.

The command git clearly indicates that git is installed, but the command nvm install node returns:

-bash: nvm: command not found

You need to install nvm, then use nvm to install Node. nvm is a version manager and an easy way to install and manage multiple instances of Node.

See: https://github.com/nvm-sh/nvm?tab=readme-ov-file#installing-and-updating

In CentOS, try:

yum install npm

Hi there! Well, I’ve tried this once. Basically, there is a way to run JavaScript in a shell terminal to extract data from a webpage: you can use a tool like node-fetch along with Node.js to achieve this.