CSV to JSON but columns instead of rows

http://techslides.com/convert-csv-to-json-in-javascript

First I take the whole CSV file and split it into an array of lines. Then, I take the first line, which should be the headers, and split that by a comma into an array.

I have a google spreadsheet which uses “column A” as its headers. It seems the typical CSV->JSON functions expect Row1 to be the headers. Is there a simple way to reverse the logic of this function?

function csvJSON(csv){
        console.log(csv);

        var lines=csv.split("\n");
        var result = [];
        var headers=lines[0].split(",");
        for(var i=1;i<lines.length;i++){
          var obj = {};
          var currentline=lines[i].split(",");
          for(var j=0;j<headers.length;j++){
            obj[headers[j]] = currentline[j];
          }
          result.push(obj);
        }
        return JSON.stringify(result); //JSON
      }

It’s certainly doable.
Let’s first shatter the CSV into a multidimensional array. (We’re gonna need all the data separated ahead of time.)

function csvJSONvert(csv) {
  var array = csv.split("\n").map((x) => x.split(","));

Now walk your shattered array; rather than walking the length of the lines, we walk the length of the first ‘line’, which should represent the ‘rows’ of the data. (Again, if your data in line 1 is not complete, this will cause problems.)

  var result = []; //holding place for my result
  for(var i = 1; i < array[0].length; i++) { //Foreach 'Row' of my Data
    var obj = {}; //take an empty object,
    for(var j=0; j < array.length; j++) { //foreach 'column' of my data
      obj[array[j][0]] = array[j][i]; //instantiate property. 
      //the i'th element of the j'th entry is the value for that cell.
      //the 0th element of the j'th entry is the property name.
    }
    result.push(obj); //Finished with this one, put it away.
  }
  return JSON.stringify(result); //Returns the JSON of the array of my results.
}

This does assume that the values in the ‘headers’ (column A) are valid javascript property names.

Thank you! I have been console logging the data. So, here is an example of a cell in column A - one of our headers.
Two Words: $5,000-$9,999

I am reading the code and can quite clearly see that it’s splitting the array of results by comma, so obviously these commas are issues. I assume there’s no other way to resume without modifying the spreadsheet to remove these commas? I can’t think of a way to ignore specific commas. I looked at the spreadsheet and there are several (280) commas, and not just used in a money format like above. E.g. Dr. and Mrs. Name Here, Jr.

Thank you @m_hutley you are always very knowledgeable and helpful.

Yeah, Javascript isnt going to accept “Two Words: $5,000-$9,999” as a property name.

If that’s the value in column A, you’re going to have a bad time. You might need to devise your own header array and use that instead. (which is really not that difficult to do.)

Regarding the commas, if you open the CSV in a text editor, are your fields wrapped in quotes? If so, we can work with that. If not, any CSV parser is going to choke on this file.

Fair enough. I figured as much.

From what I can tell, any cell in my spreadsheet that has a comma is getting quotes wrapped around it. That seems to be the pattern I’m identifying as I look at each instance of “quotes” in my CSV.

So you’d trim the ends of the string, and then split on a literal ","

If not every cell is string-encapsed… man your system’s making it annoying on people…
you’d have to get fancy with some regexing.

In JavaScript any string value can be used as a property name.

let obj = {};
obj['Two Words: $5,000-$9,999'] = 'whatever';
console.log(obj['Two Words: $5,000-$9,999']);

A better way to store the data though is probably a Map. The key can be the header and value an array of row data.

I suggest you download your Google spreadsheet in tab-separated values file format.

Strictly speaking, this is untrue; a property Identifier has stronger requirements than EMCAScript bracket notation’s workarounds. But this is semantic argument.

or any separator that’s not in the data, which is kind of where the idea was going for the quoted strings - slice out the values inside quotes, replacing them with a placeholder; shatter the text file, and then whenever you encounter a placeholder, plop back in the appropriate value, sans-quotes.

Google Sheets does not allow you to specify a separator. The options are:

If it works it works.

Right. The idea was to introduce his own through code, if one was not available in the data sets. If, for example, the data contains both a comma, and a tab, then both output types would be ‘dirty’ in data terms.

Do you mean if “every cell that has a comma in it IS NOT encapsulated with quoets, then we’d need fancy regexing”? Just making sure I understand you.

No, i mean every field.

0,1,"this is a, string","this is another string",4

How do you split that?

Can’t split on commas - you’ll split the one in the middle of a string.
Can’t split on "," because you’d end up with 2 fields.

If there are no tabs in the data, then as Archibald said, download it as a TSV instead of a CSV and split on tab.

If there are tabs and commas in the data… time to get fancy.

Based on what I know is in the google spreadsheet data, I don’t think there would be tabs in the data. I’ll work on converting my code to work with the TSV but I’ve never done TSV work before so I’ll see if I run into any snags.

1 Like

Just change split(",") to split("\t")

1 Like

I’ll try that! This work is a little on the backburner due to actual project work coming up, but I hope to try this within the next couple days!

It worked great! There’s just one thing I can’t figure out. It seems that the result is a string (as exemplified by the stringify name). I’m trying to parse this data. Let me back up here. I’m trying to fix a broken program that ran off of a JSON-generated google spreadsheet. There used to be a JSON view of a google spreadsheet and we could use that URL in our programs. That has now stopped.

So my issue with this result, is that while the data looks great, I think the result needs to be tweaked. Since I’m modifying this existing program and it used(?) to be a JSON object. With the JSON data, it would run…

for (var i = 0; i < JSON.length; i++) {

But since this is a string, it’s counting each letter as part of its .length where I think I need it to behave more like an object / key array.
Here is what the data looks like in its stringified version

var test = [
            {
              "Donors": "value",
              "Donors": "value",
              "Donors": "value"
            },
            {
              "Donors": "value",
              "Donors": "value",
              "Donors": "value"
            },
            ...
          ];

So if you don’t need a string… dont stringify the result and return it as is?

I’m not sure what the question is here.

I misunderstood some console logs - I had attempted just JSON.parse but I got an error, but returning just results was what I wanted.

I have a new issue as I’ve been working through this.

var JSON = tsvJSONvert(tsv);
          $('.list-search-hide-too').append('<table class="output"><tbody></tbody></table>');
          console.log(JSON);
          for (var i = 0; i < JSON.length; i++) {
            console.log(JSON[i]);
            for (const [key, value] of Object.entries(JSON[i])) {
              console.log(`${key}: ${value}`);
              $('.list-search-hide-too table.output tbody').append(
                '<tr><td>'+ key +'</td></tr>',
                '<tr><td>'+ value+'</td></tr>'
              );
            }
          }

This has been console logging for a few minutes now, so I’m guessing an infinite loop of some sort but I don’t see it? My issue did not appear until I introduced the Object.entries code. I threw that in because JSON[i] has multiple keys/values in it so I was trying to loop through them, because otherwise the <tr><td> has many values in it instead of 1 cell per value, if that makes sense.