PINQ – Querify Your Datasets – Faceted Search
In part 1, we briefly covered the installation and basic syntax of PINQ, a PHP LINQ port. In this article, we will see how to use PINQ to mimic a faceted search feature with MySQL.
We are not going to cover the full aspect of faceted search in this series. Interested parties can refer to relevant articles published on Sitepoint and other Internet publications.
A typical faceted search works like this in a website:
- A user provides a keyword or a few keywords to search for. For example, “router” to search for products which contain “router” in the description, keyword, category, tags, etc.
- The site will return the products matching the criteria.
- The site will provide some links to fine tune the search. For example, it may prompt that there are different brands for a router, and there may be different price ranges and different features.
- The user can further screen the results by clicking the different links provided and eventually gets a more customized result set.
Faceted search is so popular and powerful and you can experience it in almost every e-Commerce site.
Unfortunately, faceted search is not a built-in feature provided by MySQL yet. What can we do if we are using MySQL but also want to provide our users with such a feature?
With PINQ, we’ll see there is an equally powerful and straightforward approach to achieving this as when we are using other DB engines – at least in a way.
Extending Part 1 Demo
NOTE: All code in this part and the part 1 demo can be found in the repo.
In this article, we will extend the demo we have shown in Part 1 and add in some essential faceted search features.
Let’s start with index.php
by adding the following few lines:
$app->get('demo2', function () use ($app)
{
global $demo;
$test2 = new pinqDemo\Demo($app);
return $test2->test2($app, $demo->test1($app));
}
);
$app->get('demo2/facet/{key}/{value}', function ($key, $value) use ($app)
{
global $demo;
$test3 = new pinqDemo\Demo($app);
return $test3->test3($app, $demo->test1($app), $key, $value);
}
);
We just created two more routes in our demo application (using Silex).
The first route is to bring us to the page showing all the records that match our first search behavior, i.e., search by providing a keyword. To keep the demo simple, we select all books from the sample book_book
table. It will also display the result set and faceted links for further navigation.
The second route brings us to another page showing the records matching further facet search criteria in the result set produced in the above step. It will display the faceted search links too.
In a real world implementation, after a faceted link is clicked, any faceted filtering in the result page will be adjusted to reflect the statistical information of the result data set. By doing this, the user can apply “add-on” screenings, adding “brand” first then “price range”, etc.
But in this simple demo, we will skip this approach, all faceted search and the links will only reflect the information on the original data set. This is the first restriction and the first area for improvement in our demo.
As we see from the code above, the real functions reside in another file called pinqDemo.php
. Let’s see the relevant code that provides the faceted search feature.
A facet class
First, we create a class to represent a facet. Generally, a facet should have a few properties:
- The data it operates on (
$data
) - The key it groups on (
$key
) - The key type (
$type
). It can be one of the below:- specify a full string to make an exact match
- specify partial (normally beginning) of a string to make a pattern match
- specify a value range to group by a value range
- If the key type is a range, there is a need to specify a value step to determine the upper/lower bound of the range; or if the key type is a partial string, we need to provide a number to specify how many first letters shall be used to group (
$range
)
The grouping is the most critical part in a facet. All the aggregating information that a facet could possibly return depends on the “grouping” criteria. Normally, “Full String”, “Partial String” and “Value Range” are the most commonly used ones.
namespace classFacet
{
use Pinq\ITraversable,
Pinq\Traversable;
class Facet
{
public $data; // Original data
public $key; // the field to be grouped on
public $type; // F: full string; S: start of a string; R: range;
public $range; // Only valid if $type is not F
...
public function getFacet()
{
$filter = '';
if ($this->type == 'F') // Full string
{
...
}
elseif ($this->type == "S") //Start of string
{
...
}
elseif ($this->type == "R") // A value range
{
$filter = $this->data
->groupBy(function($row)
{
return floor($row[$this->key] / $this->range) * $this->range;
})
->select(function (ITraversable $data)
{
return ['key' => $data->last()[$this->key], 'count' => $data->count()];
});
}
return $filter;
}
}
}
In this class, the key function is to return the faceted result set based on the data and the facet key properties. We noticed that for different types of keys, there are different ways to group the data. In the above, we have shown what the code will look like if we are grouping the data by a value range in a step specified by $range
.
Making facets and displaying the original data
public function test2($app, $data)
{
$facet = $this->getFacet($data);
return $app['twig']->render('demo2.html.twig', array('facet' => $facet, 'data' => $data));
}
private function getFacet($originalData)
{
$facet = array();
$data = \Pinq\Traversable::from($originalData);
// 3 samples on constructing different Facet objects and return the facet
$filter1 = new \classFacet\Facet($data, 'author', 'F');
$filter2 = new \classFacet\Facet($data, 'title', 'S', 6);
$filter3 = new \classFacet\Facet($data, 'price', 'R', 10);
$facet[$filter1->key] = $filter1->getFacet();
$facet[$filter2->key] = $filter2->getFacet();
$facet[$filter3->key] = $filter3->getFacet();
return $facet;
}
In the getFacet()
function, we do the following steps:
- Convert the original data to a
Pinq\Traversable
object for further processing. - We create 3 facets. The ‘author’ facet will group on the field
author
and it is a full string grouping; ‘title’ facet on fieldtitle
and a partial string grouping (the starting 6 letters count); ‘price’ facet on fieldprice
and a range grouping (by a step of 10). - Finally, we get the facets and return them back to
test2
function so that the template can render the data and the facets.
Displaying the facets and the filtered data
Most of the time, facets will be displayed as a link and bring us to a filtered data set.
We have already created a route ('demo2/facet/{key}/{value}'
) to display the faceted search results and the facet links.
The route takes two parameters, reflecting the key we facet on and the value of that key. The test3
function that eventually gets invoked from that route is excerpted below:
public function test3($app, $originalData, $key, $value)
{
$data = \Pinq\Traversable::from($originalData);
$facet = $this->getFacet($data);
$filter = null;
if ($key == 'author')
{
$filter = $data
->where(function($row) use ($value)
{
return $row['author'] == $value;
})
->orderByAscending(function($row) use ($key)
{
return $row['price'];
})
;
}
elseif ($key == 'price')
{
...
}
else //$key==title
{
...
}
return $app['twig']->render('demo2.html.twig', array('facet' => $facet, 'data' => $filter));
}
Basically, depending on the key, we apply filtering (the anonymous function in where
clause) corresponding to the value passed in and get the further screened data. We can also specify the order of the faceted data.
Finally, we display the data (along with the facets) in a template. This route renders the same template as that which is used by route 'demo2'
).
Next, let’s take a look at the template and see how the facet links are displayed. I am using Bootstrap so the CSS components used here should be quite familiar:
<div class="col col-md-4">
<h4>Search Bar</h4>
<ul>
{% for k, v in facet %}
<li><h5><strong>{{k|capitalize}}</strong></h5></li>
<ul class="list-group">
{% for vv in v %}
<li class="list-group-item"><span class="badge">{{vv.count}}</span><a href="/demo2/facet/{{k}}/{{vv.key}}">{{vv.key}}</a></li>
{%endfor%}
</ul>
{%endfor%}
</ul>
</div>
We have to remember that the facet generated by our app is a nested array. In the first layer, it is an array of all the facets, and in our case, we have a total of 3 (for author
, title
, author
, respectively).
For each facet, it is a “key-value” paired array so that we can iterate in a traditional way.
Please note how we construct the URIs of the links. We used both the outer loop’s key (k
) and inner loops key (vv.key
) to be the parameters in the route ('demo2/facet/{key}/{value}'
). The count of the key (vv.count
) is used to touch up the display in the template (as a Bootstrap badge).
The template will be rendered as shown below:
(The first shows the initial entry page and the second shows a faceted result with price between $0 to $10 and ordered by author
)
All right, so far we have managed to mimic a faceted search feature in our web app!
Before we conclude this series, we shall take a final look at this demo and see what can be done to improve it and what are the limitations.
Improvements to be made
Overall, this is a quite rudimentary demo. We just ran through the basic syntax and concepts and forged them into a can-run example. As we saw earlier, a few areas can be improved on to make it more flexible.
We need to consider providing “add-on” criteria searching capability. Our current implementation limits the facet search to be applied on the original only, instead of the screened data. This is the most important improvement I can think of.
Limitations
The faceted search implemented here has a deep-rooted limitation (and probably true for other faceted search implementations):
We are retrieving data from the MySQL server every time.
This app uses Silex as the framework. For any single-entrance framework like Silex, Symfony, Laravel, its index.php
(or app.php
) gets called every time a route is to be analyzed and a controller’s function is to be invoked.
Looking at the code in our index.php
, we will see that this also means the below line of code:
$demo = new pinqDemo\Demo($app);
gets called every time a page in the app is displayed, which then means the following lines are executed every time:
class Demo
{
private $books = '';
public function __construct($app)
{
$sql = 'select * from book_book order by id';
$this->books = $app['db']->fetchAll($sql);
}
Will it be better if we avoid using a framework? Well, besides the fact that it is not really a very good idea to develop an app without a framework, we are still facing the same issue: data (and status) are not persistent from one HTTP call to another. This is the fundamental characteristic of HTTP. This should be avoided with the use of a caching engine.
We do save some SQL statements being executed at the server side when we are constructing the facets. Instead of passing 1 select
query AND 3 different group by
queries with the same where
statement, we just issue one select
query with the where
statement and use PINQ to provide the aggregating information.
Conclusion
In this part, we managed to mimic a facet search capability for our book collection site. As I said, it is merely a can-run demo and has plenty of room of improvement and some default limitations. Let us know if you build on this example and can show us some more advanced use cases!
The author of PINQ is now working on the next major version release (version 3). I do hope it can get more powerful.
Feel free to leave your comments and thoughts below!