CSV to JSON conversion using NPM package 'csvtojson'

I’m partway through building a small JavaScript routine that imports data from a CSV file, converts it to JSON using the NPM package ‘csvtojson’ (the ‘From CSV String’ section of its docs), and then saves it out as a JSON file.

The code below works fine for downloading the CSV and saving it to a file with a .json extension, though at that point the contents are still in CSV format. What I’m scratching my head over is where in the code the JSON conversion should go, and how to integrate it with what already exists. Could someone give me a quick pointer?

const https = require('https');
const fs = require('fs');
const path = require('path');
const uuidv1 = require('uuid/v1');
const csvToJSON = require('csvtojson');

// Function to download the CSV file and save it with writeFileSync
const downloadCSV = (url='https://prod-edxapp.edx-cdn.org/assets/courseware/v1/07d100219da1a726dad5eddb090fa215/asset-v1:Microsoft+DEV283x+2T2017+type@asset+block/customer-data.csv') => {
  console.log('Downloading CSV file at: ', url);

  const fetchFile = (urlF, callback) => {
    https.get(urlF, (response) => {
      let buff = '';

      response.on('data', (chunk) => {
        buff += chunk;
      })

      response.on('end', () => {
        // console.log('buff is typeof: ', typeof buff);
        // console.log('buff contains: ', buff);
        callback(null, buff);
      })
    }).on('error', (error) => {
      console.log(`An error has occurred: ${error.message}`);
      callback(error);
    })
  }

  const folderName = uuidv1();  // Create a unique string based on system timestamp

  fs.mkdirSync(folderName);     // Create folder name using unique string (folderName)

  fetchFile(url, (error, data) => {  // Call fetchFile function
    if(error) {
      return console.log('An error occurred: ' + error);
    }

    fs.writeFileSync(path.join(__dirname, folderName, 'url.txt'), url); // Write URL downloaded out to url.txt
    fs.writeFileSync(path.join(__dirname, folderName, 'file.json'), data); // Write downloaded data out to file.json

    console.log('Your download is complete, and can be found in folder: ', folderName);
  })
}

downloadCSV(process.argv[2]);

Just a wild guess, @chrisofarabia, as I have never done this before with JS:

fs.writeFileSync(path.join(__dirname, folderName, 'url.txt'), url); // Write URL downloaded out to url.txt
// wouldn't it be here where you have to convert 'data' to JSON format before writing it into a file?
fs.writeFileSync(path.join(__dirname, folderName, 'file.json'), data); // Write downloaded data out to file.json
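Something along these lines, maybe (just an untested sketch using csvtojson’s fromString method):

csvToJSON()
  .fromString(data) // parse the CSV string held in 'data'
  .on('end_parsed', (jsonArray) => {
    // write the parsed rows out as JSON rather than the raw CSV
    fs.writeFileSync(path.join(__dirname, folderName, 'file.json'), JSON.stringify(jsonArray));
  });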


You could actually pass the response directly to csvToJSON() as a stream, and then pipe it to an fs write stream:

const jsonFilePath = path.join(__dirname, folderName, 'file.json')
const writeStream = fs.createWriteStream(jsonFilePath)

https.get(url, res => {
  csvToJSON()
    .fromStream(res)    // parse the response stream as CSV
    .pipe(writeStream)  // pipe the converted JSON straight into the file
})

This way you don’t even need that temporary buffer. :-)


Hey Chris,

@m3g4p0p has already given you the best answer (pass the response to csvtojson as a stream), but if you’d like to stay closer to your original code, here’s an alternative.

Inside the fetchFile callback you have the contents of the CSV file as a string (in the data variable). That means you can pass it to csvtojson’s fromString method and listen for the end_parsed event, which fires with the parsed JSON array once parsing is complete. You can then stringify that array and write it to a file.

fetchFile(url, (error, data) => {
  if(error) return console.log(`An error occurred: ${error}`);

  csvToJSON()
    .fromString(data)
    .on("end_parsed", (jsonArrayObj) => { 
      fs.writeFileSync(path.join(__dirname, folderName, 'file.json'), JSON.stringify(jsonArrayObj));
      console.log('Your download is complete, and can be found in folder: ', folderName);
    });
});

Thanks all (@Andres_Vaquero, @James_Hibbard & @m3g4p0p). I think I was getting very tired when I posted last night and should have been able to work that out for myself. I went with @James_Hibbard’s solution in the end, as it saved me pulling too much apart.

The only thing I’m curious about is why the streaming approach would be the better one. What advantages does it offer?

Less code, more obvious intent and clearer flow control, for one thing:

const https = require('https');
const fs = require('fs');
const path = require('path');
const uuidv1 = require('uuid/v1');
const csv = require('csvtojson');

const downloadCSV = (url) => {
  const folderName = uuidv1();
  fs.mkdirSync(folderName);

  const jsonFilePath = path.join(__dirname, folderName, 'file.json');
  const writeStream = fs.createWriteStream(jsonFilePath);

  https.get(url, (res) => {
    csv()
      .fromStream(res)
      .pipe(writeStream);

    console.log('Downloaded CSV file at: ', url);
  })
    .on('error', (err) => {
      console.error(err);
    });
};

const url = 'https://prod-edxapp.edx-cdn.org/assets/courseware/v1/07d100219da1a726dad5eddb090fa215/asset-v1:Microsoft+DEV283x+2T2017+type@asset+block/customer-data.csv';
downloadCSV(url);

You’ve done this before, haven’t you? :wink:

It’s certainly shorter, and I can see how it clarifies things. One thing I’ve found as I’ve been going along with this is that it’s not so much understanding the code as figuring out how to structure it. At the moment I’m still at the “write whatever I need to get something to happen” stage - elegance can come later.

No worries, Chris :slight_smile:

Fair play. The main thing is to keep chipping away at it and to do as much coding as you can. It’s always nice when you revisit old code and can apply techniques that you have learned in the meantime.

