An excerpt from http://www.sitepoint.com/exploring-githubs-public-events-php-google-bigquery/, by Younes Rafie
If you’ve been following along with my previous articles about Github’s API, you know that Github’s developers are doing their best to ease the pain of interacting with Github data. In this article, we’re going to take a look at the Github public events API and we will build a small demo along the way.
What are Github Public Events?
Github events are interactions made by users, like pushing, merging, creating repositories, etc. It’s like storing the history of Github. However, we are limited to the last 300 events, and that’s a problem if you want to search through the data of the whole history.
Ilya Grigorik wanted to keep track of his open source project, but due to the limitation of the Github public events API, he decided to create the GithubArchive website where he queries the Github API and stores the results. You can query the archive with links like http://data.githubarchive.org/2015-01-01-15.json.gz and you’ll get an archive containing the JSON payload. This solution is not really efficient and doesn’t help with querying the history for data, so, when Google BigQuery was released, Grigorik moved all of his data to the new storage and made it public.
Why BigQuery?
Google BigQuery was created to solve the queries latency problem. When you’re dealing with big data, you need to have the right hardware and infrastructure to handle the load properly. Now, you have the opportunity to move your append-only database to Google’s infrastructure and see millions of data rows processed in seconds.
If you’re not familiar with Google APIs, be sure to check this guide about starting a new Google API project. After that, be sure to enable the BigQuery API on your Google developers console.
Setting Up the Environment
I will be using Laravel as my framework of choice, and Homestead Improved for my VM. We will grab our Google credentials from the config/google.php file, and to follow Laravel’s security conventions I will store those credentials inside the .env file.
// config/google.php
return [
'app_name' => env('app_name'),
'client_id' => env('client_id'),
'client_secret' => env('client_secret'),
'api_key' => env('api_key')
];
// .env
APP_DEBUG=true
app_name='Optional app name'
client_id='CHANGE IT'
client_secret='CHANGE IT'
api_key='CHANGE IT'
I will also create a new Service Provider to hold our Google services bindings.
// app/Providers/BigQueryServiceProvider.php
class BigQueryServiceProvider extends ServiceProvider
{
public function register()
{
$this->app->bind('GoogleClient', function () {
$googleClient = new \Google_Client();
$googleClient->setAccessToken(\Session::get("token"));
return $googleClient;
});
$this->app->bind('bigquery', function () {
$googleClient = \App::make('GoogleClient');
$bigquery = new \Google_Service_Bigquery($googleClient);
return $bigquery;
});
}
}
// config/app.php
// ...
'providers' => [
// ...
'App\Providers\BigQueryServiceProvider',
]
Since our application will require the user authorization to query the API, I tried to make the process easier by creating a Middleware for guarding our routes.
// app/Http/Middleware/GoogleLogin.php
class GoogleLogin
{
public function handle($request, Closure $next)
{
$ga = \App::make('\App\Services\GoogleLogin');
if (!$ga->isLoggedIn()) {
return redirect('login');
}
return $next($request);
}
}
// app/Kernel.php
protected $routeMiddleware = [
// ...
'google_login' => '\App\Http\Middleware\GoogleLogin'
];
Google BigQuery Query Console
Before we start working with BigQuery using the PHP library, lets start exploring the BigQuery Query console. The console has a Compose Query button to let you write your queries. Underneath, we have a Query History link which shows our queries history. Since our queries will take a while to respond, we can execute a query as a job and ask for its stats after a while – the Job History link holds the list of the current jobs.
If you want to take a look at the table schema and explore the available fields, you can navigate to the left panel and select the Github timeline table.