Fetch newline-delimited JSON

This function uses fetch to retrieve and process a stream of NDJSON incrementally instead of parsing a potentially huge response all at once. It works so far, but I’m just starting to really get into JavaScript, so I was hoping folks could give me some tips on how to make this better, even if it’s just style-wise.

Comments?

function fetchNdjson(resource, init, resultfn) {

    function NdjsonReader(reader) {

        const that = this;

        this.decoder = new TextDecoder();
        this.buf = ""; 

        reader.read().then(function ndjsonproc(result) {

            if (result.done) {
                that.nextfn(JSON.parse(that.buf));
                return;
            }

            that.buf += that.decoder.decode(
                        result.value, { stream: true }); 

            const lines = that.buf.split(/[\r\n](?=.)/);
            const ln = lines.length - 1;
            that.buf = lines[ln];

            for (let li = 0; li < ln; li++) {
                that.nextfn(JSON.parse(lines[li]));
            }

            return reader.read().then(ndjsonproc);
        });

        this.next = function(fn) {
            this.nextfn = fn; 
        }
    }

    fetch(resource, init).then(
        resp => resp.body.getReader()
    ).then(reader => {
        (new NdjsonReader(reader)).next(result => {
            resultfn(result);
        });
    }).catch(console.error);
}

This is called mostly like fetch, but with a callback that gets each JSON object:

const body = {
    cmd: "userlist",
    param: 99,
};

fetchNdjson(form.action, {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
}, result => {
    console.log(result); // called for each result
});
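For reference, the wire format this expects is one complete JSON object per line. Here is a hypothetical payload (the field names are made up for illustration) and how the records fall out of it once fully received:

```javascript
// Hypothetical NDJSON payload a server might stream back --
// one complete JSON object per line:
const sample =
    '{"user":"alice","id":1}\n' +
    '{"user":"bob","id":2}\n' +
    '{"user":"carol","id":3}\n';

// Once fully received, the records are just the non-empty lines:
const records = sample
    .split('\n')
    .filter(line => line.length > 0)
    .map(JSON.parse);

console.log(records.length); // 3
```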


I can’t help but think these two lines are either incongruent or your userlist contains far more information than it should.

That is just a contrived bit of example usage. The body of the request could be anything. It doesn’t even need to be JSON.

Well, yes, by definition, it must be JSON. That’s what an NDJSON parser reads. Feeding it anything other than JSON will cause it to fail.

If you’re just trying to include bulk text files into your page such that a standard AJAX pull-and-insert won’t do, you’re probably putting too much text into a page…

That said, I don’t see anything particularly wrong with your code as-is. It’s about as compact as you can make it, and it doesn’t seem to break any ‘style’ guidelines that jump out at me.

You have misinterpreted the code. The “userlist” bit you’re referring to is in the request which has nothing to do with the function being critiqued.

So what you’re trying to say is that fetchNdjson is intended to fetch things other than NDJSON?

Nope

So I can see where we’re talking about two different things, and admittedly, it’s my fault for not being particularly scrupulous with my terminology.

That statement is true. It also is completely irrelevant.

fetch() by definition takes two parameters: a string and optionally an object. So… yes, that part must be an object.

Your response could be anything; but as you’ve put it inside a function called fetchNdjson and are passing it directly into an NDJSON parser… it definitely needs to be JSON. Not necessarily of the ND variety, but it’d have to be a pretty simple one otherwise.

The function is written to parse NDJSON; it parses NDJSON. Nothing really to critique there, it does what it says on the tin, and as far as I can see, does so in probably the most compact way it can, so there’s not really even any stylistic note for me to give. I’m not quite sure what you’re wanting a critique of? The response is already in memory (because fetch doesn’t stream, it returns the entire response), so I’m not entirely sure I understand the value of ‘streaming’ it, but that would, I suppose, depend on your end use case?

Ok, thanks. Like I said, I’m just starting to really get into JS. I normally write Java networking code and C. JS transformations seem to be pretty convoluted sometimes. There are so many equivalent ways to do things that, still being a little green, I’m not sure what the prevailing techniques are at this point. I know how to write good code, but I’m still googling a lot more than I should, I think.

No, it’s not in memory. It depends on what the server is doing precisely, but on my default LAMP test site it’s using 64K buffers. So the function supplied to resp.body.getReader().read().then() is called with each 64K chunk.
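A minimal sketch of that behavior with a synthetic ReadableStream (assumes a runtime with WHATWG streams and TextEncoder/TextDecoder globals, e.g. a browser or Node 18+); each read() resolves with one chunk rather than the whole body:

```javascript
// Two chunks are enqueued separately, so read() resolves twice
// before signalling done -- mirroring the chunked delivery from a server.
const stream = new ReadableStream({
    start(controller) {
        controller.enqueue(new TextEncoder().encode('chunk one\n'));
        controller.enqueue(new TextEncoder().encode('chunk two\n'));
        controller.close();
    }
});

const reader = stream.getReader();
const decoder = new TextDecoder();

const chunksPromise = (async () => {
    const chunks = [];
    for (;;) {
        const { value, done } = await reader.read();
        if (done) {
            return chunks;
        }
        chunks.push(decoder.decode(value, { stream: true }));
    }
})();
```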

Also, just as a general warning if you care, IE does not support fetch() (or ReadableStream at all). But Edge does. So… know your audience?

Hi @ioplex, FWIW that for loop looks a bit clumsy… you could just chain array methods instead like so:

that.decoder
  .decode(result.value, { stream: true })
  .split(/[\r\n]/)
  .map(JSON.parse)
  .forEach(that.nextfn)

Also, I don’t see much point in defining that NdjsonReader constructor (and a new one for each request!) and then assigning a nextfn to that – you might just call resultfn directly instead:

function fetchNdjson (resource, init, resultfn) {
  const decoder = new TextDecoder()

  return fetch(resource, init).then(
    resp => resp.body.getReader()
  ).then(reader => reader.read().then(function process ({ value, done }) {
    if (done) {
      return
    }

    decoder
      .decode(value, { stream: true })
      .split(/[\r\n]/)
      .map(JSON.parse)
      .forEach(resultfn)

    return reader.read().then(process)
  }))
}

(BTW I would leave catch()ing errors to the consumer so they can handle it appropriately – just return the promises you chained to the fetch() call.)
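A sketch of what that buys the caller: with the promise chain returned instead of swallowed, errors surface wherever the consumer attaches its own catch(). fakeFetchNdjson here is a hypothetical stand-in that simply rejects:

```javascript
// Stand-in for a fetchNdjson that returns its promise chain;
// it rejects immediately so we can watch the error reach the caller.
function fakeFetchNdjson() {
    return Promise.reject(new Error('server unreachable'));
}

let reported = null;

const handled = fakeFetchNdjson().catch(err => {
    // network failures, malformed JSON lines, etc. all land here,
    // and the caller decides what to do about them
    reported = err.message;
});
```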

If you’re into async / await (i.e. basically asynchronous code that looks synchronous), then another approach would be to define an async iterator to consume the reader… speaking of IE and all, LOL:

const readerIterator = reader => ({
  [Symbol.asyncIterator] () {
    return this
  },

  async next () {
    const value = await reader.read()
    return value
  }
})

function fetchNdjson (resource, init, resultfn) {
  const decoder = new TextDecoder()

  fetch(resource, init).then(
    resp => resp.body.getReader()
  ).then(async reader => {
    for await (const value of readerIterator(reader)) {
      decoder
        .decode(value, { stream: true })
        .split(/[\r\n]/)
        .map(JSON.parse)
        .forEach(resultfn)
    }
  })
}

That won’t work because each result.value chunk can break in the middle of an NDJSON record. So the clumsy-looking part is actually for saving the last NDJSON record until result.done. The object is necessary to keep the buf state.
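A small sketch of the failure mode, with one record split across two chunks (sample records made up): parsing each chunk in isolation throws, while buffering the trailing fragment recovers both records:

```javascript
// One NDJSON record arrives split across two chunks:
const chunk1 = '{"id":1}\n{"id"';
const chunk2 = ':2}\n';

// Parsing chunk1 on its own throws, because '{"id"' is not valid JSON:
let naiveFailed = false;
try {
    chunk1.split('\n').filter(Boolean).map(JSON.parse);
} catch (e) {
    naiveFailed = true;
}

// Buffering the (possibly partial) last line until the next chunk
// arrives recovers both records intact:
let buf = '';
const records = [];
for (const chunk of [chunk1, chunk2]) {
    buf += chunk;
    const lines = buf.split('\n');
    buf = lines.pop(); // keep the possibly-incomplete tail
    lines.filter(Boolean).forEach(line => records.push(JSON.parse(line)));
}
```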

I see, good point. Yet there is no need to instantiate something like an NdjsonReader to hold the buffer; you can simply keep that in memory as a regular variable over which you create a closure:

function fetchNdjson (resource, init, resultfn) {
  const decoder = new TextDecoder()
  let buffer = ''

  return fetch(resource, init).then(
    resp => resp.body.getReader()
  ).then(reader => {
    return reader.read().then(function process ({ value, done }) {
      if (done) {
        return
      }

      const lines = (
        buffer + decoder.decode(value, { stream: true })
      ).split(/[\r\n](?=.)/)

      buffer = lines.pop()
      lines.map(JSON.parse).forEach(resultfn)

      return reader.read().then(process)
    })
  })
}

And if at all, I’d pull that “class” out of the function so that you don’t have to re-define it for each fetchNdjson() call; in JS, you’d usually define classes to save memory, as all instances point to the same prototype. But your class doesn’t have a dedicated prototype, and it is getting discarded after its first use anyway, so you might as well use an object literal directly (or a closure, for that matter).

Yes! Nice work m3g4p0p. I think I assumed the object was necessary to keep it re-entrant, not fully visualizing the whole callback-tree paradigm.

For posterity, here’s the latest working and tested version which mostly just adds emitting the last line compared to your version. I couldn’t bring myself to drop semicolons.

function fetchNdjson(resource, init, resultfn) {
    // note that the server must flush
    // periodically to actually stream

    const decoder = new TextDecoder();
    let buf = ''; 

    return fetch(resource, init).then(
        resp => resp.body.getReader()
    ).then(reader => {
        return reader.read().then(function process({ value, done }) {
            if (done) {
                // guard against an empty body, which would leave buf === ''
                if (buf) {
                    resultfn(JSON.parse(buf));
                }
                return;
            }

            buf += decoder.decode(value, { stream: true }); 
            const lines = buf.split(/[\r\n](?=.)/);
            buf = lines.pop();
            lines.map(JSON.parse).forEach(resultfn);

            return reader.read().then(process);
        });
    });
}

One thing worth mentioning is that with my stock LAMP site I have to call ob_flush() to get the server to actually stream. Otherwise it just buffers and emits everything at once. That’s probably the right thing to do 90% of the time but if this code is really necessary, whatever is emitting the JSON data will need to call ob_flush to push each chunk.

And to hell with IE.


Ah yes thanks, missed that one…

Sry I didn’t mean to imply you should drop them… I’m just mostly writing standard style myself, so my editor will strip them automatically. ^^

Absolutely! :-) We have to acknowledge that there are times when we still do have to support IE, but fortunately these times are getting rarer… and as for myself, I’d rather use a transpiler and polyfills than bother to write IE-compliant code by hand.

BTW here’s a neat trick to make sure your users don’t get any surprises when visiting your site with IE:

<script nomodule>
  // Modern browsers will ignore this code
  document.write("Sorry, this site won't work in IE -- consider using a real browser.")
</script>

<script type="module">
  // Your actual code goes here; browsers that do support native modules
  // will also support (mostly) any other contemporary JS feature and API
</script>

You guys are amazing, I’m so happy I delayed working on exactly the same problem by a week! :sweat_smile:

Out of curiosity, why are you requiring the extra character after the newline when splitting the buffer?

When the backend streams lines like {"status": "1/10"}\n, this delays calling resultfn with the “1/10” data until after the “2/10” data has been streamed.

Is it to avoid trying to parse empty lines?

If the (?=.) look-ahead were not used, split would return an array where the last line of the chunk is either A) a fragment of an NDJSON record (because a chunk can break in the middle of a record) or B) an empty string (because the chunk ends with a newline). These two conditions would need to be tested and accounted for in the logic. So the look-ahead is a roundabout way of dodging that extra condition, so that the last NDJSON record can simply be emitted by itself after done == true.

It might be a worthwhile exercise to remove it and just test for the empty line. It probably is a little pedantic to do it in regex.

And yes, it does also eat up empty lines, but I would consider empty lines to be an error on the part of the server, so that’s not an important feature.
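To make the difference concrete, here’s what the two splits do to a chunk that ends in a newline (sample records made up):

```javascript
const chunk = '{"a":1}\n{"b":2}\n';

// Without the look-ahead, the trailing newline yields an empty string
// as the last element, which would need special-casing:
const plain = chunk.split(/[\r\n]/);
// -> ['{"a":1}', '{"b":2}', '']

// With it, the final record simply stays put, newline and all;
// JSON.parse() tolerates the trailing whitespace:
const lookahead = chunk.split(/[\r\n](?=.)/);
// -> ['{"a":1}', '{"b":2}\n']
```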