The entire quarter-billion-record GDELT Event Database is now available as a public dataset in Google BigQuery.
This is the sentence at the top of the release post, and it’s a really big deal.
The Global Database of Events, Language and Tone is one of the largest datasets on the planet. It is the quantitative database of human society, relying on thousands of news sources from every corner of the globe dating back to 1979.
It was conceived by Kalev Leetaru, who is also the author of the Google release post referenced above. GDELT covers every country in the world over a third of a century, and it is updated daily. It holds hundreds of millions of records, each with 59 fields describing in detail the actors involved and the events that took place. Every record is georeferenced, so you can place it on the globe, and all actors are tagged with the appropriate ethnic and religious affiliation. All of this is free and available for your perusal, and you don’t even need the computing power to handle it yourself.
Google BigQuery, “Google’s powerful cloud-based analytical database service”, is basically the world’s fastest SQL engine, and it’s completely free for any and all uses of GDELT. Thanks to the sheer power of BigQuery, you get results on GDELT queries in near real time, and no permutation of fields and values you can think of will grind it to a halt, unless you really mess things up and go against the grain. If you deal with databases in any capacity and the following paragraph doesn’t send chills down your spine, you’re probably dead:
For us, the most groundbreaking part of having GDELT in BigQuery is that it opens the door not only to fast complex querying and extracting of data, but also allows for the first time real-world analyses to be run entirely in the database. Imagine computing the most significant conflict interaction in the world by month over the past 35 years, or performing cross-tabbed correlation over different classes of relationships between a set of countries. Such queries can be run entirely inside of BigQuery and return in just a handful of seconds. This enables you to try out “what if” hypotheses on global-scale trends in near-real time.
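To make the first idea in that quote a little more concrete, here is a minimal sketch of what such an in-database analysis could look like. It is not the query from the release post. It assumes the public table is addressable as `gdelt-bq.full.events` and that it exposes the columns `MonthYear`, `QuadClass` (with 4 meaning material conflict), `Actor1CountryCode` and `Actor2CountryCode`. It also treats "most significant" as simply "most recorded events", which is only one possible reading. The helper name `build_monthly_conflict_query` is made up for this illustration.

```python
def build_monthly_conflict_query(table="gdelt-bq.full.events", quad_class=4):
    """Return SQL text that finds, for each month, the country pair with
    the most events of the given QuadClass.

    Assumed schema (check it against the GDELT codebook before relying on it):
    MonthYear (YYYYMM integer), QuadClass (4 = material conflict),
    Actor1CountryCode, Actor2CountryCode.
    """
    quad_class = int(quad_class)  # refuse anything that is not a number
    return f"""
SELECT MonthYear, Actor1CountryCode, Actor2CountryCode, events
FROM (
  SELECT
    MonthYear,
    Actor1CountryCode,
    Actor2CountryCode,
    events,
    -- rank the country pairs inside each month by event count
    ROW_NUMBER() OVER (PARTITION BY MonthYear ORDER BY events DESC) AS rank_in_month
  FROM (
    -- count events per month and per ordered country pair
    SELECT MonthYear, Actor1CountryCode, Actor2CountryCode, COUNT(*) AS events
    FROM `{table}`
    WHERE QuadClass = {quad_class}
      AND Actor1CountryCode IS NOT NULL
      AND Actor2CountryCode IS NOT NULL
      AND Actor1CountryCode != Actor2CountryCode
    GROUP BY MonthYear, Actor1CountryCode, Actor2CountryCode
  )
)
WHERE rank_in_month = 1
ORDER BY MonthYear
""".strip()


if __name__ == "__main__":
    # Print the SQL so it can be pasted into the BigQuery query editor.
    print(build_monthly_conflict_query())
```

The whole computation (filtering, grouping, ranking) happens inside BigQuery, so only one row per month comes back to the client. The query uses BigQuery's standard SQL dialect; the legacy dialect spells table names differently.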
Currently, GDELT on BigQuery is updated daily, but there are plans to move to a near real-time update schedule – updating the dataset every 15 minutes.
Before you get too excited – there is a limit, but it’s not a quota you’ll easily hit. To read more about free quotas, see here and keep in mind you can always pay for more if you actually develop a commercially viable application on top of this data.
Running a sample query
You can start playing around with GDELT on BigQuery by visiting this URL – you might have to make a new project if you don’t have one already. After gaining access, you should see a screen not unlike the following:
To run the sample query from the release post, click the red “Compose Query” button, paste the SQL into the newly opened textarea and click “Run Query”. Mine took 20 seconds, yours may take anywhere from 5 to 30, but you should get a result not unlike this one:
Using it with PHP
To see how you can use BigQuery and PHP, stay tuned on SitePoint for articles that target that specific combination – they’re coming in June. For now, you can check out this excellent Lever.rs post that runs through it in a very approachable manner.
In a nutshell, you need to use the PHP library Google provides and install it with Composer or through alternative means. Once done, you need to include the lib in your project as you normally would, through Composer’s autoload file, and you can start using the API.
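Whichever client library you pick, it ends up sending an HTTP request to BigQuery's REST API. The sketch below shows the rough shape of that request. It is written in Python with only the standard library, purely for illustration, and is not the PHP library's API. It assumes the `jobs.query` endpoint at `https://bigquery.googleapis.com/bigquery/v2/projects/{projectId}/queries` and an OAuth 2.0 access token you have already obtained. The function name `build_query_request` and the project ID in the usage lines are hypothetical.

```python
import json
import urllib.request


def build_query_request(project_id, sql, access_token, timeout_ms=30000):
    """Build (but do not send) a BigQuery jobs.query HTTP request."""
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/queries"
    )
    body = {
        "query": sql,           # the SQL text to run
        "useLegacySql": False,  # ask for the standard SQL dialect
        "timeoutMs": timeout_ms,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder project ID and token: nothing is sent over the network here.
    req = build_query_request("my-project", "SELECT 1", "ACCESS_TOKEN")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually run the query, provided the token is valid and the project has BigQuery enabled. A client library wraps this same exchange, along with authentication and result paging.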
For a full introduction on how to get started, obtain API keys and get deep into using Google APIs for access to BigQuery and similar services, please see this guide. You can also RSS subscribe to the Google App Engine tag and you’ll be instantly notified of new posts in that category.
The GDELT project has long been an admirable one, but its BigQuery release marks a new milestone: a level of public availability never seen before. Everyone now has the ability to query the world’s history, and we can’t wait to see what you build. Judging by Kalev, the author, neither can the GDELT team. They’re inviting you to share your queries and experiments with them and, if they are impressive enough, they just might share them with the world on the official blog. If you do come up with anything stunning, let us know: we’re keen to publish tutorials and analyses on it!
Bruno is a blockchain developer and technical educator at the Web3 Foundation, the foundation that's building the next generation of the free people's internet. He's also a DX person at Diffbot. He runs two newsletters you should subscribe to if you're interested in Web3.0: Dot Leap covers ecosystem and tech development of Web3, and NFT Review covers the evolution of the non-fungible token (digital collectibles) ecosystem inside this emerging new web. His current passion project is RMRK.app.