PINQ – querify your datasets – introduction

Taylor Ren
Tweet
This entry is part 1 of 2 in the series PINQ - Querify Your Datasets

PINQ - Querify Your Datasets

You may have heard of LINQ (Language-Integrated Query), a “set of features introduced in Visual Studio 2008 that extends powerful query capabilities to the language syntax of C# and Visual Basic”.

It provides necessary interfaces and syntax to perform various queries on a given dataset so that operations like filtering, sorting, grouping, aggregating, etc can be performed.

PINQ (PHP Integrated Query) is “based off the .NET’s Linq, and unifies querying across arrays/iterators and external data sources, in a single readable and concise fluent API”. (Excerpted from PINQ’s official site)

Why another query language?

PHPers are very much comfortable with executing queries with either raw SQL statements or via ORM. We select the data from a database, process it and display it in a chosen format (say, a table form). If we need another set of data, we issue another statement, process the returned dataset and display it.

In normal cirucumstances, this process is both sufficient and efficient.

But there are cases where this process simply won’t work. Take, for example, a typical e-Commerce website. The user enters one search keyword (say, “router”) and the site displays every matching item. The initial search may only search items with their description, category or tags containing the keyword. Next, the user will start to fine tune the search results by selecting a brand, a price range, etc.

This fine tuning process is called “faceted” search. Some database engines (like SOLR) have this capability built in (as described in this series: Using Solarium for SOLR Search) but obviously MySQL does not come with this functionality.

That does not mean, however, that MySQL can’t provide such features. After all, it is all about constructing a new SQL statement and fetching the data again. This has some disadvantages, however:

  1. The criteria of the SQL statement, i.e., the “where” and/or “group by” part, can get very complicated after SQL construction.
  2. As the SQL statement will be very dynamic, it can’t be optimized and will make indexing more difficult.
  3. It will create a huge overhead when communicating the SQL statement back to the db server.

In these cases, PINQ may come in handy. It is a PHP version of the LINQ library which provides filtering, sorting, grouping, aggregating, and indexing on a given dataset.

Preparation

In this series of two parts, we will demonstrate how to use PINQ to mimic a “faceted” search. We will use the sample book collection application’s data (see Data Fixtures in Symfony2 on how to dump the sample data) but with some slight modifications.

Also, we will use Silex, a very light-weight PHP framework and Bootstrap CSS to simplify the app setup. Please follow the instructions in their respective websites on how to set up a Silex web app and integrate Bootstrap CSS.

The sample data we used in this demo is slightly modified and I have uploaded it to the repo for this demo. The source code can also be found there.

PINQ Installation

The recommended PINQ installation is to modify the composer.json file that comes with Silex and add one more line in its require section:

{
    "require": {
        "silex/silex": "~1.1",
        "twig/twig": ">=1.8,<2.0-dev",
        "doctrine/dbal": "2.2.*",
        "symfony/twig-bridge": "~2.3",
        "timetoogo/pinq": "~2.0"
    }
} 

Please note that I have also added a few more dependencies: Twig (and Twig-Bridge) to display the results, and Doctrine as I am to grab data from my database for further processing.

After this, we can run a composer.phar update command to install all necessary packages.

Demo 1: the basic usage

We will first show a few lines of code to demonstrate the basic usage of PINQ. I will grab the books as they are, do a simple filter (for price between 90 and 99) and then display the aggregation information for different authors.

The display will be like this:

Let’s see how to make this demo page.

NOTE: I am not going to cover the basics on Silex app setup in this article. Silex’s default index.php does not include support for Twig/Twig-bridge and Doctrine-DBAL. You will need to enable these two modules in your app’s index.php file. Please refer to Silex’s official site to find out how.

We first create a Demo class in pinqDemo.php as the data provider for our app:

<?php

namespace pinqDemo
{

    class Demo
    {

        private $books = '';

        public function __construct($app)
        {
            $sql = 'select * from book_book order by id';
            $this->books = $app['db']->fetchAll($sql);
        }

        public function test1($app)
        {
            return $this->books;
        }

    }

}

This file is very simple. In the class declaration, we have a constructor that retrieves the data from the db server and a function that returns the dataset back to the calling function.

This class can of course be enhanced, for example, by introducing some typical design patterns.

Next we will move to the index.php file and see some of the basic PINQ usages applied on the data that we retrieved.

// Excerpt of index.php
use Pinq\ITraversable,
    Pinq\Traversable;

... ...

$app->get('/demo1', function () use ($app)
{
    global $demo;
    $books = $demo->test1($app);
    $data = Traversable::from($books);

    //Apply first filter
    $filter1 = $data
            ->where(function($row)
            {
                return $row['price'] > 90 && $row['price'] < 99;
            })
            ->orderByDescending(function($row)
    {
        return $row['id'];
    });

    $filter2 = $filter1
            ->groupBy(function($row)
            {
                return $row['author'];
            })
            ->select(
            function(ITraversable $filter1)
    {
        return ['author' => $filter1->first()['author'], 'count' => $filter1->count()];
    }
    );

    return $app['twig']->render('demo1.html.twig', array('orig' => $data, 'filter1' => $filter1, 'filter2'=>$filter2));
}
);

We mapped the URI /demo1 to the first sample. In the handling function for this route, we basically do 4 things:

  1. Get the data retrieved from our pinqDemo\Demo class
  2. Apply the first filter. In our case, we apply a price range to our original data
  3. Apply another operation (grouping) to the data generated in Step 2.
  4. Display data generated in Step 2 and 3.

Above all things, if we are going to use PINQ, we need to provide a dataset.

In PINQ’s terminology, the dataset to be manipulated is a “Traversable”. As expected, we can construct a “Traversable” from our dataset returned from our SQL queries:

$data = Traversable::from($books);

To apply an operation on the newly created “Traversable” object – I will use “dataset” onwards – PINQ provides a rich set of functions:

  • Filtering, where clause;
  • Ordering, such as orderByAscending and orderByDescending clause;
  • Grouping, groupBy clause;
  • Aggregating, such as count, average clause;
  • Selecting, to select and/or construct fields in the resulted dataset, such as select clause;
  • And others like joining, etc.

The complete API reference can be found in PINQ’s official documentation site.

In our demo1, the first operation we applied is filtering and sorting:

    $filter1 = $data
            ->where(function($row)
            {
                return $row['price'] > 90 && $row['price'] < 99;
            })
            ->orderByDescending(function($row)
    {
        return $row['id'];
    });

PINQ uses advanced PHP features like closures and anonymous functions to perform these tasks.

The syntax is quite similiar to the one we can see when we are using an ORM: the commands can be chained, each continuing on to the next command in the chain.

The main differences between this and an ORM are:

  1. We are not operating on a db connection or “Entity Manager” but directly on a dataset
  2. Filtering criteria (and other functions) is not expressed in SQL-like syntax but in PHP

After we applied the filter, which is a where clause, we also did a sorting of the filtered data by id in descending order.

If you are using the same sample data as I am, the resulting dataset ($filter1) will contain 9 records and they are displayed in the 2nd section of the final rendered page as shown above.

One of PINQ’s selling points is that the dataset generated can be re-used. This gives us the flexibility to apply multiple operations on the same dataset but provides different sub datasets and/or aggregation information, without destroying the original dataset.

In our sample code above, immediately after we have applied the first filter, we do an aggregation job to find out how many distinct authors there are and how many books they have written:

$filter2 = $filter1
            ->groupBy(function($row)
            {
                return $row['author'];
            })
            ->select(
    function(ITraversable $filter1)
    {
        return ['author' => $filter1->first()['author'], 'count' => $filter1->count()];
    }
    );

The groupBy function is familiar but the select function needs a bit more elaboration.

For a typical group by aggregation, we expect to return at least two values from the execution: one is the key used for grouping (author) and the aggregation information (count) related to that key.

Thus, the above statement inside the function(ITraversable $filter1) selects the author value from the first record in each group, and also the count for that group. This is exactly what we need.

The result is displayed in the bottom section of the page as shown above: in the total 9 books filtered, we have 2 books each for Author 0/1/23 and 1 book for Author 4.

If we want to show the total price in this grouping, we can further extend the select function statement:

->select(
    function(ITraversable $filter1)
    {
        return
		 ['author' => $filter1->first()['author'],
 		  'count' => $filter1->count(),
		   'sum'=>$filter1->sum(
				function($row)
                {
                    return $row['price'];
                }
           )
           ];
    }
    )

Anonymous functions get embedded and return the correct results.

We can then chain another orderByAscending to the select statement to order the dataset by sum:

->orderByAscending(function($row){return $row['sum'];});

This is very convenient and consistent.

Some may argue that in the above groupBy and select statements, the syntax used may not seem intuitive enough. Why not, say, use the below more straightforward syntax?

groupBy(function(return $row['author'];), $filter1->count());

Well, the author replied that it is impossible to do a grouping like this. And in PINQ’s current version (2.1), my approach is the best. However, the author hinted in his reply to me that in the next major release of PINQ (3.0), a much improved syntax will be introduced for such a common task, i.e., grouping by a key and also getting the respective aggregation information. Let’s keep an eye on it.

Conclusion and moving forward

In this article, we briefly introduced the basic functionality of PINQ and how to use it in a web app.

PINQ is under heavy development and thus its documentation is not fully there yet. Almost all functions lack a solid and practical demonstration. I hope this article can be of help in this sense, but I also hope the author of PINQ can put some effort in the documentation as well.

In the followup, I will try to illustrate how to use it to mimic a kind of “faceted” search capability.

Feel free to comment and give us your thoughts!

PINQ - Querify Your Datasets

PINQ – Querify Your Datasets – Faceted Search >>

Free book: Jump Start HTML5 Basics

Grab a free copy of one our latest ebooks! Packed with hints and tips on HTML5's most powerful new features.

  • Asad Hasan

    Wow, excellent. This can be a great approach in minimizing effects of object relational impedance mismatch.

  • gggeek

    About the advantages of doing the grouping/filtering in-memory in php instead of letting the database do it: I think that it will be in fact *less* efficient for most of the cases.
    Developers should be warned that letting the database return a big dataset to php is a good way to waste both memory and time – you mention the overhead of sending a big sql query to the db, but the overhead in getting back a big dataset is probably bigger.
    Plus the query planners and executors in the database are highly optimized, and written in C, and thus most likely faster than the corresponding PinQ code.

    I am not saying that there are no advantages to PinQ – just that performance is probably not the main one…

    • Taylor Ren

      +gggeek Yes, fetching a big dataset back is always a bad idea. The PINQ comes in handy when the result set is already there and we must do some further analysis on the dataset. So instead of using another SQL statement, which will be sent to the SQL server and executed and then return the desired result, PINQ can do the job at the “client” side.

      The application of this approach should be strict.

  • Taylor Ren

    @StePP

    In part 2, there will be a more real world application: doing a mini version of facet search. As my reply to +gggeek says, the application of this approach must be carefully planned. And one of the scenarios that I can think of is to mimic a facet search.

  • Charles Bryant

    This looks really good, I would agree that it may not be great for replacing database functionality, however there is a good chance you may have already cached your first database query. It has a really valid application when you are getting results from an api.