Querying JSON files with SQL

Hi all

I have always used DBs for data storage and the general SQL syntax for SELECT, WHERE etc. In this instance, though, I need to query data from a JSON file rather than the DB directly, using the same technique.

I’ve just come across (drill.apache.org…querying-json-files)

SELECT * FROM cp.employee.json LIMIT 5;

Which I think is the equivalent to

SELECT * FROM employee LIMIT 5

Which has led me to ask: what is Drill?

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets.

My questions:

Do I need to configure Drill to query a JSON file on my page using SQL as above?

Is there an easier way? Adding a storage plugin seems a little overkill for a simple query (drill.apache.org…storage-plugin-registration)

Any views or information on how to do this would be much appreciated, thanks.

Barry

Update:

I’ve just come across JSON_OBJECT():

$json = json_decode(file_get_contents('events.json'), true);

SELECT * 
FROM JSON_OBJECT($json);

Any experience with this?

I’m having trouble using WHERE when selecting from the $json file below:

Say, for example, I want to SELECT the LOCATION FROM the VEVENT with a UID of 87846.
I have removed the UIDs for simpler viewing, though each item has a unique UID which we can query.

{
  "VCALENDAR": [
    {
      "METHOD": "PUBLISH",
      "VERSION": "2.0",
      "X-WR-CALNAME": "events",
      "PRODID": "-//Apple Inc.//iCal 4.0.4//EN",
      "X-APPLE-CALENDAR-COLOR": "#492BA1",
      "CALSCALE": "GREGORIAN",
      "VEVENT": [
        {
          "TRANSP": "OPAQUE",
          "DTEND": "20160601T020000Z",
          "DTSTAMP": "20160524T185925Z",
          "LOCATION": "Big Warehouse",
          "STATUS": "CONFIRMED",
          "SEQUENCE": "26533666",
          "CLASS": "PUBLIC",
          "SUMMARY": "Tickets OUT NOW!",
          "DTSTART": "20160531T210000Z",
          "CREATED": "20150722T163139Z"
        },
        {
          "TRANSP": "OPAQUE",
          "DTEND": "20160601T020000Z",
          "DTSTAMP": "20160524T185925Z",
          "LOCATION": "Small Warehouse",
          "STATUS": "CONFIRMED",
          "SEQUENCE": "26533666",
          "CLASS": "PUBLIC",
          "SUMMARY": "Tickets SOLD OUT!",
          "DTSTART": "20160531T210000Z",
          "CREATED": "20150722T163139Z"
        }
      ]
    }
  ]
}

As I’ve just discovered (JSON functions mysql):

JSON functions were added in MySQL 5.7.8

Which is a blow, as my shared hosting is running 5.6.26 :frowning:

Unless anybody knows of a simple way of adding JSON data like the above into a DB table?

Update:
I think there could be a solution importing JSON into MySQL using MySQL Workbench - (mysql.com/downloads/workbench/)

I’ll post back with my findings; otherwise, great to hear from anybody with knowledge.

Further update:
After downloading MySQL Workbench, installing and running it… only to realise I can only import CSV files.

Anybody?

Barry

I don’t have experience with Drill, but (sorry if this sounds stupid) if you want to perform queries like this, why not use a SQL db in the first place? ^^ Trying to make a document model behave like a relational one is a bit of a hopeless endeavour IMHE. (EDIT: I may have misread your post… are you trying to actually transfer a JSON file into a SQL db?)

Anyway to add something constructive, there are other ways to search an array… maybe something like this might do the trick?

function query($array, $column, $needle, $property) {
    $column = array_column($array, $column);
    $key = array_search($needle, $column);
    // array_search() returns false when nothing matches
    return $key === false ? null : $array[$key][$property];
}

// Output is "Small Warehouse"
echo query(
    $json["VCALENDAR"][0]["VEVENT"], 
    "DTSTAMP", 
    "20160524T185925Z", 
    "LOCATION"
);

Exactly! This is what I’ve been trying to do m3g4p0p… “hopeless endeavour IMHE” - I second that.

I may have misread your post… are you trying to actually transfer a JSON file into a SQL db?

Yes. I just can’t find a way of importing JSON data into the database. I’ve tried many times to convert the JSON into a more suitable CSV, but everything just turns into a mess. No structure.

Do you know how to easily import JSON into a SQL db using the exact JSON structure we have here?

Thanks, Barry

It depends on the structure of your SQL db… ultimately you might have to bite the bullet and write a custom script to insert the data from the JSON file into the db appropriately. :-/ I don’t think there’s a quick and easy solution as a relational db has a different structure than a JSON file by nature.

Is the SQL db already fixed BTW, or do you want to create it from scratch on the basis of that JSON file?

Is the SQL db already fixed BTW, or do you want to create it from scratch on the basis of that JSON file?

Create from scratch to match JSON :slight_smile:

I’m just looking into another converter I found, the opposite of what I used to convert the original iCal file into JSON: npmjs…ical2json

Currently trying json2csv - npmjs…json2csv
Having a small issue in the terminal…

Get back shortly.

Thanks, Barry

Vigorous testing at the minute :confused:

I’ve managed to convert to CSV using json2csv, though it’s not much different from my previous attempts; currently trying different options to see if I can solve it. No doubt a few more hours on this for me, ha.

Great to hear if you find a solution.
Again, I’ll report back once I have a conclusion from my attempt.

Barry

You just need to create a custom importer for the data. There is no way around that.

You’re not using a framework or CMS by chance? Most well-known CMSs have importers for common sources/vendor data. For example, Drupal has migrations and feeds.

This is what I looked at… though I wasn’t sure where and how to use an importer successfully without having to check every import for errors - maybe once I understand this process, things will seem a lot simpler :slight_smile:

You’re not using a framework or CMS by chance? Most well-known CMSs have importers for common sources/vendor data. For example, Drupal has migrations and feeds.

:grin: ha

This is the main reason I’m building my own oddz.

Spent months using Drupal 7 and have a very good system set up with iCal feed importers which update great. The problem is the amount of files and modules needed, patches etc., which is making things extremely slow, and 10 lines of code just to change a class… Don’t get me wrong, I love Drupal and still use it, though in this instance I think building this manually is the best bet.

Also, Drupal 8 is full steam ahead at the moment and I can’t use that for the importers because some modules I need and use haven’t all been updated for the latest D8 release… could be years :expressionless:

Anyhow, I’ve managed to get a good export using json2csv after further testing.
I’m just building a small DB for testing now…

Get back once complete.

Barry

You pretty much just iterate over each row and insert one by one via a php script that runs at the command line.

Ah ok, I thought you might have to somehow squeeze the data into an existing schema. :-P Yes as @oddz says… you don’t really need to query the JSON then as there are only child-parent relations anyway (I suppose), so just iterate your way through the tree and write it into the db. The additional step to convert it to CSV only to facilitate SQL importing seems like unnecessary bother to me.
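To sketch that iterate-through-the-tree idea out (just illustrative, not a finished importer - `eventRows` is a name I’ve made up, and the structure assumed is the events JSON you posted earlier):

```php
<?php
// Pull one flat row per event out of the decoded JSON tree.
// Each returned row would then become a single INSERT.
function eventRows($jsonString) {
    $data = json_decode($jsonString, true);
    $rows = array();

    foreach ($data['VCALENDAR'] as $calendar) {
        foreach ($calendar['VEVENT'] as $event) {
            $rows[] = $event;
        }
    }

    return $rows;
}
```

The import loop then just becomes `foreach (eventRows(file_get_contents('events.json')) as $row) { /* INSERT */ }` - no CSV step needed.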

Great, you both make it sound simple :slight_smile:

Ok, just to clarify. The best approach to import JSON data into the DB is by creating and using a snippet of PHP which will act as the importer?

If this is the case, a small example of how to do this based on the JSON above would be great, or a good resource that I can study for the best approach and ease of use.


On another note…

I’ve managed to populate the DB table with data from the CSV file. The only problem now is that the timestamps don’t work and some columns don’t match up - some more testing should fix this hopefully.

The process so far:

  1. Fire up terminal and run
    json2csv -i events.json -f ID,DTEND,UID,DTSTAMP,LOCATION,DESCRIPTION -o events.csv
    I can also exclude columns without any errors, which is convenient (ID is auto int)

  2. Import CSV into db_table
    Though this is causing some errors right now as mentioned above

  3. Write some PHP for output.
    And thats it :sunny:

Though saying all this, very keen to see if and how the importer could be a better approach… less work, smoother execution.

I also still seem to be getting lots of \\ \\ \ stored in the columns?

Barry

To simplify any advice on the importer recommendation…

I have reduced the JSON to the bare essentials, also removing VCALENDAR and VEVENT, which I think were confusing things, cheers :slight_smile:

[
  {
    "DTEND": "20160601T020000Z",
    "UID": "Data here...",
    "DTSTAMP": "20160524T185925Z",
    "LOCATION": "Data here...",
    "DESCRIPTION": "Data here...",
    "SUMMARY": "Data here...",
    "DTSTART": "20160531T210000Z",
    "CREATED": "20150722T163139Z"
  },
  {
    "DTEND": "20160608T020000Z",
    "UID": "Data here...",
    "DTSTAMP": "20160602T162640Z",
    "LOCATION": "Data here...",
    "DESCRIPTION": "Data here...",
    "SUMMARY": "Data here...",
    "DTSTART": "20160607T210000Z",
    "CREATED": "20151007T152306Z"
  }
]

The sort of columns I’ll be using to import JSON into MYSQL:

ID : AUTO_INCREMENT
DTEND : DATETIME
UID : VARCHAR
DTSTAMP : TIMESTAMP/DATETIME
LOCATION : VARCHAR
DESCRIPTION : TEXT
SUMMARY : VARCHAR
DTSTART : DATETIME
CREATED : DATETIME
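For reference, the table I have in mind would be something like this (just a sketch on my part - the column lengths, engine and charset are guesses):

```sql
CREATE TABLE events_test (
  ID INT UNSIGNED NOT NULL AUTO_INCREMENT,
  DTEND DATETIME,
  UID VARCHAR(255) NOT NULL,
  DTSTAMP DATETIME,
  LOCATION VARCHAR(255),
  DESCRIPTION TEXT,
  SUMMARY VARCHAR(255),
  DTSTART DATETIME,
  CREATED DATETIME,
  PRIMARY KEY (ID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```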

Thanks, Barry

Thanks again for the input @m3g4p0p and @oddz

Ok, I have an importer working after your advice, put me on the right track :smile:

Example code (just for testing)

A few questions below if you can offer any advice cheers.

$st = $mysqli->prepare("
  INSERT INTO events_test(
  DTEND
  , UID
  , DTSTAMP
  , LOCATION
  , DESCRIPTION)
  VALUES (?, ?, ?, ?, ?)
");

// bind_param() binds by reference, so the variables
// can be assigned later, inside the loop below
$st->bind_param('sssss'
  , $dtend
  , $uid
  , $dtstamp
  , $location
  , $description);

$json = json_decode(file_get_contents('events.json'), true);

foreach ($json as $row) {
  $dtend = $row['DTEND'];
  $uid = $row['UID'];
  $dtstamp = $row['DTSTAMP'];
  $location = $row['LOCATION'];
  $description = $row['DESCRIPTION'];

  $st->execute();
}

Questions

  1. The code above - Is this how you would code it for the importer?
  2. DTEND and DTSTAMP keep showing 0000-00-00 00:00:00 in DB - How do I fix this?
  3. Where do I run this code? Best place?
  4. How can I remove the \ and force \n to render inside the DESCRIPTION?

Appreciate your feedback here, and room for improvement maybe?

Update:

2. I changed the columns DTEND and DTSTAMP from DATETIME to VARCHAR :slight_smile:
I can format these on the page, though please correct me if this is wrong.
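Though from what I’ve read since, the zero dates were probably because MySQL doesn’t recognise the iCal stamp format. If I ever want to keep real DATETIME columns, something like this before the insert might do it (just a sketch I haven’t wired into the importer - `icalToDatetime` is a name I’ve made up):

```php
<?php
// Convert an iCal UTC stamp like 20160524T185925Z
// into the 'Y-m-d H:i:s' form a MySQL DATETIME expects.
function icalToDatetime($stamp) {
    $dt = DateTime::createFromFormat('Ymd\THis\Z', $stamp, new DateTimeZone('UTC'));
    return $dt === false ? null : $dt->format('Y-m-d H:i:s');
}
```

e.g. `$dtend = icalToDatetime($row['DTEND']);` inside the import loop.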

Barry :sunglasses:

I think you might even do this recursively and largely unaware of the actual data. Here’s a (very) rough gist to automatically create an SQL db schema and populate it with the data from an arbitrary JSON file… no actual db execution, sanitizing etc., just for the basic idea. :-) Here’s a sample output from the JSON you posted. Usage would be e.g.

echo json2sql("VCALENDARS", null, $json) . "\n";

and either simply save the output as an SQL dump to import into your db, or directly execute it within the PHP script.

Where you run your script doesn’t really matter, although if you want to execute SQL it would make sense to run it on the server, I suppose. As for the time stamp, maybe the database expects a different format? Anyway, formatting it on the page shouldn’t be a big deal.

Thanks a lot @m3g4p0p

Really appreciate the time spent here working on the code.
Feel a little bad now if I don’t use it, though giving more ideas and knowledge :sunglasses:

I’ve been working endlessly on my snippet above… hence the late reply.

What you have supplied is, or seems, a little advanced for what I’m after. I’ll be running this importer about once a week, populating one table which will be created in advance and will never change once created.

As mentioned. I have deleted the VCALENDAR and VEVENT now as shown in #13… was getting a little confusing but now we have one simple list of JSON objects.

I ran into an issue because the events kept uploading even when the same events were present… a UNIQUE INDEX on UID solved that :slight_smile:

I think you might even do this recursively and largely unaware of the actual data.

Not even sure what this was, then I came across a function on php.net which mentioned ‘recursively’…

A recursive function is a function that calls itself during its execution. This enables the function to repeat itself several times, outputting the result at the end of each iteration.

Maybe we should add my latest snippet above into a recursive function? Will this improve things, how will it benefit the importer? Big difference?

As for the time stamp, maybe the database expects a different format?

Yes, I needed to change to VARCHAR and convert on the page… still having a few problems with this, getting there.

I have a couple of questions, though this post is getting very long.

The main question for me right now m3g4p0p…

Is the code/prepared statement I have built above correct? Could I potentially use this in production?

Thanks, Barry

To query a file in a JAR file in the Drill classpath, you need to use the cp (classpath) storage plugin configuration, as shown in the sample query.

Sample Query

Start the Drill shell, and select five rows of data from the employee.json file installed with Drill.

SELECT * FROM cp.employee.json LIMIT 5;

I wouldn’t see how. The idea of the recursion was to iterate down the JSON tree and create and populate the SQL db accordingly, like (in pseudo-code):

function json2sql($element, $parent) {

    if ($element is associative array) {
        create an $element table if not exists;
        reference the $parent ID as foreign key;

        foreach ($property in $element) {
            $value = $element[$property];

            if ($value is another array) {

                // Next step of the recursion will reference
                // the current $element as its $parent
                call json2sql($value, $element);
            } else {
                add a column $property to the table;
                insert $value;
            }
        }
    } else {

        foreach ($index in $element) {
            call json2sql($element[$index], $parent);
        }
    }
}

and thereby retaining the child-parent relations with an ID map. However, if you’re explicitly importing certain properties by name (and your tree has only a depth of 1 anyway), the recursion is kind of redundant.

No worries if you’re doing it differently anyway. :-) That algo was supposed to be more of a general idea of how to do this generically; in reality, it would require quite a few additional checks and special cases (such as caching the properties of a given object and eventually creating a table with every possible column, type conditions etc.). And in your case, that might be overkill anyway. :-P

I don’t see an immediate error in the code… is it not working?

Yes this is what I thought, building extra functionality that’s not really needed, not just yet anyhow :slight_smile:

No worries if you’re doing it differently anyway.

Cool, still doing the same thing and some good code and ideas here if I decide to extend and build up on this thanks m3g4p0p

I don’t see an immediate error in the code… is it not working?

On the contrary… it works very well :grinning:

Truth be told, it was too easy, ha - this is why I thought something couldn’t be right. I think it’s because I spent a lot of time assuming things wouldn’t work, and all the time we spent on the JSON etc.

That’s nice to hear!

Thanks again, couple of questions below:

I’m now trying to return only the events that have a date/timestamp of today onwards using FROM_UNIXTIME(DTSTART) AS FT… though something’s not right?

$queryEvents = "
  SELECT ID
  , DTEND
  , DTSTAMP
  , LOCATION
  , DTSTART
  , SUMMARY
  , FROM_UNIXTIME(DTSTART) AS FT
  FROM events_test
  WHERE FT >= NOW()
  ORDER BY DTSTART ASC";

And remember that DTSTART is a VARCHAR timestamp inside the DB. Everything works if I remove FROM_UNIXTIME(DTSTART) AS FT and WHERE FT >= NOW()

But if I leave it in:

Call to a member function fetch_assoc() on a non-object in…
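From more reading, I suspect FROM_UNIXTIME() expects an integer Unix timestamp rather than the iCal string, and MySQL won’t let a WHERE clause reference the FT alias either - so the query fails, query() returns false, and I’m calling fetch_assoc() on that false. Something like this might work instead (untested on my side, and note NOW() is server local time while the stamps are UTC):

```sql
SELECT ID
, DTEND
, DTSTAMP
, LOCATION
, DTSTART
, SUMMARY
FROM events_test
WHERE STR_TO_DATE(DTSTART, '%Y%m%dT%H%i%SZ') >= NOW()
ORDER BY DTSTART ASC
```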

If it helps…

Further down from above I have the below which converts some dates:

$resultEvents = $mysqli->query($queryEvents);
$eventsArray = array();
while ($row = $resultEvents->fetch_assoc()) {
  $row['endTime'] = date('g:ia', strtotime($row['DTEND']));
  $row['fullDate'] = date('l jS F Y', strtotime($row['DTSTART']));
  $row['startTime'] = date('g:ia', strtotime($row['DTSTART']));
  $eventsArray[] = $row;
}

On the page I’m using things like:

foreach ($eventsArray as $value) {
  echo "{$value['LOCATION']} {$value['endTime']}";
}

And if I remember correctly, somebody once said it would be quicker to filter events >= NOW() in the PHP code instead of in the SQL? Not sure if there is much difference in speed.
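If I do try the PHP route, I guess the filter would look something like this (just a sketch - `upcomingEvents` is a name I’ve made up, and it relies on strtotime() understanding the compact iCal form, as in the date() calls above):

```php
<?php
// Keep only events whose DTSTART is now or later.
// $now is injectable so the filter is easy to test.
function upcomingEvents(array $events, $now = null) {
    $now = $now !== null ? $now : time();

    return array_values(array_filter($events, function ($row) use ($now) {
        return strtotime($row['DTSTART']) >= $now;
    }));
}
```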

Barry


Thanks again @m3g4p0p

I’ll bring this thread to a close now… I’ve continued with the code over on our previous thread in the PHP forum, if you would like to keep watch or get involved, thanks :sunglasses:

…json-will-not-show/227556/12

Barry