This is quite useful. If one does .abort() on the XMLHttpRequest or on the fetch(), the backend request may still be hanging, for example fetching lots of data from a database.
First some knowledge. An abort on the front-end side is silent for the back-end side unless one builds the kind of mechanism I am talking about. With nodejs, the same request can do several things concurrently by using Promises.
Here are my ideas. Please comment or suggest other ideas.
One can use websockets. But I don’t see how it would be better than other solutions.
One can use the Keep-Alive header. This allows sending cancellation data in the same request. This seems the best option. I will explore it more.
One can use a separate request that sets a flag in redis, an in-memory OS database. The request that should abort itself will then find out by polling redis from a promise.
Is this actually a problem for you, or is it something you think might eventually become a problem someday?
To be honest this sounds like brutal over engineering. It’ll take you quite a lot of time to get it right, it’ll almost never happen, and when it does you’ll save a few seconds of computing power at best.
The costs way outweigh the benefits here.
Unless maybe you’re running something with reports that take hours to process I wouldn’t bother.
Processing that is done for nothing is a waste. Each request takes CPU and memory both on the nodejs server and on the database.
The case I have has images in Megabytes to download from a slow and crowded database. After the download, the images are converted from 10bit to 8bit, and sometimes they are converted to jpeg. Sometimes other databases are needed to add meta-data. The whole request takes a few seconds of processing.
We never hear about environment friendliness of back-ends. Wasted processing and over-dimensioned HW have an environmental impact. It is quite a problem that companies are so much driven by power struggles that the technical and the environmental aspects are ignored.
Beyond the morality of the environmental impact, commercial websites scale to thousands of users. I have worked with back-ends with thousands of users. And the HW was over-dimensioned with titanic machines to handle the amount of data. Over-dimensioned HW adds a huge cost to companies.
And the guys that installed the titanic HW often say that it is cool to be able to use powerful HW, in total ignorance of environmental or cost issues.
Furthermore, availability is always challenging. The more data flow and requests, the more availability is hard to achieve, especially for the database.
If I can find a good solution and then give it to people using nodejs, it may have an environmental impact planet-wide. Yes, nodejs is a booming technology that is as simple and attractive as Java or .NET for companies. I will submit a pull request to nodejs if I can find something good and ready-made for people to use.
There is another solution. If one uses streaming with the x-ndjson format, the back-end can try to send an empty string. If that fails because the front-end has aborted, a catch can trigger a stop of the back-end request.
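A rough sketch of this idea, assuming Node's built-in http server. Note that it swaps the "write and catch the failure" step for the 'close' event of the response, which Node emits both after a normal end and when the connection drops early; writableEnded tells the two cases apart. The helper name onClientAbort and the job object in the usage comment are hypothetical.

```javascript
// Hypothetical helper: calls `cancel` if the client goes away
// before the server has finished writing the response.
function onClientAbort(res, cancel) {
  res.on('close', () => {
    // 'close' also fires after a normal res.end(); in that case
    // writableEnded is true, so only an early disconnect cancels.
    if (!res.writableEnded) {
      cancel();
    }
  });
}

// Usage inside a handler (sketch only, names are assumptions):
// http.createServer((req, res) => {
//   const job = startExpensiveJob();
//   onClientAbort(res, () => job.cancel());
// });
```

What cancel() does to the database query is left open here; that part depends on the driver.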
The abort event that you mention is for http.ClientRequest. It is not relevant because it aborts the client of the HTTP request. It does nothing on the back-end side of the HTTP request. The library http provides a new client. It is confusing because nodejs is considered a backend. But you can have a client in your backend when you want your backend to send an HTTP request.
A websocket allows sending data in two directions over a continuous time period. We could imagine a framework with a sort of heartbeat ping in 2 directions. But I don’t think a websocket would provide a simple solution. The problem seems to only be in one direction, with the client informing the backend to abort. Hence, we do not need the 2 directions of the websocket.
I think now that my solution mentioning a persistent connection with keep-alive is not valid and was loaded with confusion. This is because the persistent connection is just reusing the TCP socket. But the requests are still separated. Moreover in HTTP1.1, all connections are persistent by default.
"Unlike HTTP/1.0+ keep-alive connections, HTTP/1.1 persistent connections are active by default. "
So I agree that the solution to this problem would need a mechanism as you suggest (m3g4popp). A second request will need an ID or token. But I stop you when you say the request would trigger an abort event with that token. The requests are independent. You need a way to make one request send a signal to another request.
The core of the solution lies in this communication between the 2 requests. I have suggested previously to use polling over an in-memory database like redis. It should probably work. But it does not look like a simple solution. For the best solution, one would need to read the nodejs source code and find a way to make one request abort another one.
If the server were not nodejs but Apache, all requests would have a different process. There are 4 ways to communicate between processes: files, pipes, signals, and UNIX sockets, which are lighter than network sockets. I don’t think you can scale the problem with signals. You cannot easily pipe 2 processes: piping must be set up before you create the processes using fork(). So the UNIX sockets would be preferable. Using an in-memory database is actually based on using UNIX sockets.
Nodejs has only one process in non-clustered mode. One could use the shared memory of the process to register flags. Instead of using an in-memory database, one could use the memory of the nodejs process.
An in-memory database solves the problem in a simple way. But it would be slower than using only the process memory.
Hence the solution with in-process memory:
request 1: normal request with lots of work on the database
request 2: to abort the request, it will set a flag in the process memory
the request 1 will poll the heap memory to see if the flag to abort is set
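The three steps above could be sketched like this. It is only a sketch under assumptions: a single non-clustered nodejs process, work that can be split into awaited steps, and hypothetical names (abortFlags, requestAbort, expensiveWork).

```javascript
// Flags live in the heap of the single Node.js process.
const abortFlags = new Map(); // token -> boolean

function register(token) {
  abortFlags.set(token, false);
}

// Request 2: sets the flag. Returns false for an unknown or finished token.
function requestAbort(token) {
  if (!abortFlags.has(token)) return false;
  abortFlags.set(token, true);
  return true;
}

function isAborted(token) {
  return abortFlags.get(token) === true;
}

function unregister(token) {
  abortFlags.delete(token);
}

// Request 1: polls the flag between steps. `steps` is an array of async
// functions standing in for the database calls (an assumption).
async function expensiveWork(token, steps) {
  register(token);
  try {
    const results = [];
    for (const step of steps) {
      if (isAborted(token)) {
        throw new Error('aborted');
      }
      results.push(await step());
    }
    return results;
  } finally {
    unregister(token); // do not leak flags for finished requests
  }
}
```

The flag is only seen between awaited steps, so one single long database call is not interrupted by this alone.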
request 1: tell me on which server you are; request 1 is load balanced
response 1: I am on server number N1
request 2: like above, do the work, but explicitly ask on server N1.
request 3: ask to abort.
Only request 1 is distributed and available. But if anything else fails, you can retry from request 1.
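To make the flow above concrete, here is a small in-memory simulation, not real networking. Every name in it (makeServer, loadBalanced, pinned, the server ids) is made up for illustration; the point is only that the abort has an effect when it reaches the server that holds the flag.

```javascript
// Each simulated server keeps its own abort flags; nothing is shared.
function makeServer(id) {
  const flags = new Map(); // token -> boolean
  return {
    id,
    whoAmI() { return id; },
    startWork(token) { flags.set(token, false); },
    abort(token) {
      if (!flags.has(token)) return false;
      flags.set(token, true);
      return true;
    },
  };
}

const servers = [makeServer('N1'), makeServer('N2')];

// Stand-in for the load balancer: the caller cannot choose the server.
let next = 0;
function loadBalanced() {
  const server = servers[next];
  next = (next + 1) % servers.length;
  return server;
}

// Stand-in for addressing one server explicitly.
function pinned(id) {
  return servers.find((server) => server.id === id);
}

// Request 1: ask the load balancer which server answers.
const serverId = loadBalanced().whoAmI();
// Request 2: start the work explicitly on that server.
pinned(serverId).startWork('token-1');
// Request 3: the abort only works on the server holding the flag.
const abortedOnSameServer = pinned(serverId).abort('token-1');
const abortedElsewhere = pinned('N2').abort('token-1');
```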
Redis is extremely popular. I was not even considering a distributed redis but it helps there too. I think that Redis is quite greedy in HW resources. As I said above, we need greener backends. So if a clever trick can be used to make a simpler solution, it is nice for the planet earth.
Or maybe they should improve HTTP to fix this. A new way to abort would be nice.
If one process writes and the other process reads, you do not need a semaphore. There is no risk of a bad scenario that requires you to protect the data with a semaphore.
A semaphore protects the access to the data.
In the simple set-up, nodejs has only one process.
The best is to search “semaphore” and read. There are inter-process semaphores and intra-process semaphores. I have already used both.
Each request will have access to this global data structure.
This solution being quite clear, there is still the problem of nodejs in cluster mode on one machine, and on several machines. For that we need to use the additional first request to know exactly where the server is located. This means nodejs would be used in non-cluster mode but on several machines.
Thanks a lot for the detailed reply @tvilmart! :-) I see my approach was a bit naive – didn’t think of different processes etc.
Hm… could you explain why this wouldn’t work in a non-clustered app though? As all requests are getting handled within the same process, any request handler would subscribe to the very same abort event emitter; so it is indeed possible to send a signal from one to another (works with my sample code at any rate). And even if you forked a child process, couldn’t you still communicate the event via .send() / .on('message')?
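For readers without the sample code at hand, a single-process version of that event approach might look roughly like the following. This is not the code referred to above, only a sketch using Node's EventEmitter; the names abortEvents, subscribeAbort and emitAbort are hypothetical.

```javascript
const { EventEmitter } = require('events');

// One emitter shared by every request handler in the process.
const abortEvents = new EventEmitter();

// Called by the expensive request: run `onAbort` if its token is aborted.
// Returns a function that removes the listener once the work is done.
function subscribeAbort(token, onAbort) {
  const eventName = `abort:${token}`;
  abortEvents.once(eventName, onAbort);
  return () => abortEvents.removeListener(eventName, onAbort);
}

// Called by the abort request. emit() returns true if a listener existed.
function emitAbort(token) {
  return abortEvents.emit(`abort:${token}`);
}
```

Using one event name per token keeps handlers for other requests from being woken up.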
You do not need the abort-token API point to get a token. The expensive request can, for example, have a token parameter as part of the URL search parameters (after the ? and separated by &).
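Reading such a token on the server side could be as small as this, assuming the parameter is literally called token and using the WHATWG URL class that ships with nodejs. The function name and the example path are made up.

```javascript
// Reads the abort token from the query string,
// e.g. /images?id=7&token=abc
function tokenFromUrl(requestUrl) {
  // req.url is relative in Node's http server, so a dummy base is needed.
  const url = new URL(requestUrl, 'http://localhost');
  return url.searchParams.get('token'); // null when the parameter is absent
}
```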
I am not satisfied with the way you send an event between 2 promises. We cannot say it is simple, there are lots of setTimeout and wrappers.
I think it would be better to have:
the expensive HTTP request registers the pair token promise P1 in a global object.
the abort HTTP request retrieves the Promise P1 from the global object.
then P1.reject() is called and it will reject the promise.
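A possible sketch of these three steps. One caveat: native promises do not expose a reject() method from the outside, so this version stores the reject function of a helper promise in the global map instead of the promise itself. The names pending, runAbortable and abort are hypothetical.

```javascript
// token -> reject function of the abort promise for that request
const pending = new Map();

// Expensive request: races the real work against an abort promise.
// `work` is any promise, for example a database query (an assumption).
function runAbortable(token, work) {
  const aborted = new Promise((_, reject) => {
    pending.set(token, reject);
  });
  return Promise.race([work, aborted])
    .finally(() => pending.delete(token));
}

// Abort request: looks up the token and rejects the matching promise.
function abort(token) {
  const reject = pending.get(token);
  if (!reject) return false; // unknown token or request already finished
  reject(new Error('aborted'));
  return true;
}
```

Rejecting only makes the waiting code give up. The underlying work keeps running unless the catch handler also cancels it, for example by closing the database cursor.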
The simplest solution is to have one request that stops the other request. I am not sure that in express each request belongs to a promise, but we can easily create a new one.
).then(res => res)
In general, I don’t think event systems bring clean code. Because the declaration of the event callback is done in a global context.
That would work too – in my code above requesting the token actually is optional – although I think that generally the backend should be responsible for generating the token, so that the consumer doesn’t have to worry about uniqueness etc.
Ah yes, true! Unless you need to communicate the event between separate processes, the promise could directly register an abort function with the token in a global lookup table – here’s a gist that incorporates your ideas. As for the timeout, that’s just to fake an expensive request though.
I’m not quite sure what you mean… by no means do you have to declare callbacks in the global context. Quite the opposite: unless explicitly exposed, they live isolated inside the modules where they are defined, often even passed in directly as anonymous function expressions so that they are not accessible from any scope but the function that receives them.
It is less memory greedy to just store the reference to the promise in the map. Storing a function callback is heavier. But it is somewhat equivalent. Not a big issue really.
eventEmitter.on('abort', function f() {})
What I mean is that eventEmitter is a global object, and everything you put in it is a global context.
With threads (and we can compare promises to mini-threads), we don’t send data because we do not want to copy it. So we share data in a global context. The minimum data required is the map from the tokens to references to the promises.
EventEmitter exists in nodejs, but not in the browser for some reason. They don’t want to encourage lots of global events affecting the code at many different places.
I have used an event system in the PIXI library. It made the code very spaghetti-like. There is a better way than using events. One can very clearly expose what the links between the different parts are and what data is shared. Separate actions and data.