Introduction to Elasticsearch in PHP

By Wern Ancheta

In this tutorial, we’re going to take a look at Elasticsearch and how we can use it in PHP. Elasticsearch is an open-source search server based on Apache Lucene. We can use it to perform fast full-text searches and other complex queries. It also includes a REST API which allows us to easily issue requests for creating, deleting, updating and retrieving data.

Installing Elasticsearch

The installation instructions below assume you’re using a Debian-based environment such as Ubuntu.

To install Elasticsearch we first need to install Java. By default, Oracle Java is not available in the repositories that Ubuntu uses, so we need to add one.

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update

Once that’s done, we can install Java.

sudo apt-get install oracle-java8-installer

Next, let’s download Elasticsearch using wget.

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.5.2.tar.gz

At the time of writing, the most recent stable version is 1.5.2, so that is what we used above. If you want to make sure you get the most recent version, take a look at the Elasticsearch downloads page.

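Then, we extract the archive and start Elasticsearch. The archive unpacks into its own elasticsearch-1.5.2 directory, so we change into that before running the binary.

mkdir es
tar -xf elasticsearch-1.5.2.tar.gz -C es
cd es/elasticsearch-1.5.2
./bin/elasticsearch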

When we access http://localhost:9200 in the browser, we get something similar to the following:

{
  "status" : 200,
  "name" : "Rumiko Fujikawa",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.5.2",
    "build_hash" : "62ff9868b4c8a0c45860bebb259e21980778ab1c",
    "build_timestamp" : "2015-04-27T09:21:06Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}

Using Elasticsearch

Now we can start playing with Elasticsearch. First, let’s install the official Elasticsearch client for PHP.

composer require elasticsearch/elasticsearch

Next, let’s create a new PHP file for testing. The following code loads the Composer autoloader and creates the Elasticsearch client.

<?php
require 'vendor/autoload.php';

$client = new Elasticsearch\Client();

Indexing Documents

Indexing new documents can be done by calling the index method on the client. This method accepts an array as its argument. The array should contain body, index and type as its keys. The body is an array containing the data that you want to index. The index is the location where the document is stored (it corresponds to a database in a traditional RDBMS). Lastly, the type is how you want to categorize the document. It’s like a table in RDBMS land. Here’s an example:

$params = array();
$params['body']  = array(
  'name' => 'Ash Ketchum',
  'age' => 10,
  'badges' => 8 
);

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';

$result = $client->index($params);

If you print out the $result you get something similar to the following:

Array
(
    [_index] => pokemon
    [_type] => pokemon_trainer
    [_id] => AU1Bn51W5l_vSaLQKPOy
    [_version] => 1
    [created] => 1
)

In the example above, we haven’t specified an ID for the document. Elasticsearch automatically assigns a unique ID if nothing is specified. Let’s try assigning an ID to another document:

$params = array();
$params['body']  = array(
  'name' => 'Brock',
  'age' => 15,
  'badges' => 0 
);

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';
$params['id'] = '1A-001';

$result = $client->index($params);

When we print the $result:

Array
(
    [_index] => pokemon
    [_type] => pokemon_trainer
    [_id] => 1A-001
    [_version] => 1
    [created] => 1
)

When indexing documents, we’re not limited to a single-dimensional array. We can also index multi-dimensional ones:

$params = array();
$params['body']  = array(
  'name' => 'Misty',
  'age' => 13,
  'badges' => 0,
  'pokemon' => array(
    'psyduck' => array(
      'type' => 'water',
      'moves' => array(
        'Water Gun' => array(
          'pp' => 25,
          'power' => 40
        )
      ) 
    )
  ) 
);

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';
$params['id'] = '1A-002';

$result = $client->index($params);

We can go as deep as we want, but we still need to observe proper storage of data (not going too deep, keeping it structured and logical, etc) when we index it with Elasticsearch, just like we do in an RDBMS setting.

Searching for Documents

We can search for existing documents within a specific index using either the get or the search method. The main distinction between the two is that the get method is used when you already know the ID of the document, and it returns only a single document. The search method, on the other hand, is used for finding multiple documents, and you can use any field in the document for your query.

Get

First, let’s start with the get method. Just like the index method, this one accepts an array as its argument. The array should contain the index, type and id of the document that you want to find.

$params = array();
$params['index'] = 'pokemon';
$params['type'] = 'pokemon_trainer';
$params['id'] = '1A-001';

$result = $client->get($params);

The code above would return the following:

Array
(
    [_index] => pokemon
    [_type] => pokemon_trainer
    [_id] => 1A-001
    [_version] => 1
    [found] => 1
    [_source] => Array
        (
            [name] => Brock
            [age] => 15
            [badges] => 0
        )

)

Search with Specific Fields

The array argument for the search method needs to have the index, type and body keys. The body is where we specify the query. To start, here’s an example of how we use it to return all the documents which have an age of 15.

$params['index'] = 'pokemon';
$params['type'] = 'pokemon_trainer';
$params['body']['query']['match']['age'] = 15;

$result = $client->search($params);

This returns the following:

Array
(
    [took] => 177
    [timed_out] => 
    [_shards] => Array
        (
            [total] => 5
            [successful] => 5
            [failed] => 0
        )

    [hits] => Array
        (
            [total] => 1
            [max_score] => 1
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => pokemon
                            [_type] => pokemon_trainer
                            [_id] => 1A-001
                            [_score] => 1
                            [_source] => Array
                                (
                                    [name] => Brock
                                    [age] => 15
                                    [badges] => 0
                                )

                        )

                )

        )

)

Let’s break the results down:

  • took – number of milliseconds it took for the request to finish.
  • timed_out – true if the request timed out.
  • _shards – by default, Elasticsearch distributes the data of an index across 5 shards. If both total and successful are 5, the search ran successfully on every shard. You can find a more detailed explanation in this Stack Overflow thread.
  • hits – contains the results.
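
In practice we usually only need the documents themselves. As a minimal sketch, the snippet below pulls the _source of every hit out of a response shaped like the one above. The extractSources function is a hypothetical helper written for this illustration and is not part of the Elasticsearch client; the sample array is a trimmed copy of the output shown earlier, so no running server is needed.

```php
<?php
// Hypothetical helper (not part of the Elasticsearch client): collects the
// _source of every hit in a search response, keyed by document ID.
function extractSources(array $result)
{
    $sources = array();

    // hits.hits may be missing or empty when nothing matched.
    $hits = isset($result['hits']['hits']) ? $result['hits']['hits'] : array();

    foreach ($hits as $hit) {
        $sources[$hit['_id']] = $hit['_source'];
    }

    return $sources;
}

// Sample response, trimmed from the search output shown above.
$result = array(
    'took' => 177,
    'timed_out' => false,
    'hits' => array(
        'total' => 1,
        'max_score' => 1,
        'hits' => array(
            array(
                '_index' => 'pokemon',
                '_type' => 'pokemon_trainer',
                '_id' => '1A-001',
                '_score' => 1,
                '_source' => array('name' => 'Brock', 'age' => 15, 'badges' => 0),
            ),
        ),
    ),
);

$trainers = extractSources($result);
echo $trainers['1A-001']['name'], PHP_EOL; // Brock
```

With a live server, the array returned by the search method could be passed to the helper in place of the sample.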

The example above queries a first-level field. To query a field further down, we reference it by its full path, using . to traverse from the first-level field down to the field we want. In the example below, the match is wrapped in the must clause of a bool query, which also lets us combine several conditions in one query.

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';
$params['body']['query']['bool']['must'][]['match']['pokemon.psyduck.type'] = 'water';
$result = $client->search($params);
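
The chained bracket syntax is compact, but it can hide the structure that is actually sent to Elasticsearch. The sketch below builds the same parameters twice, once with the shorthand and once as an explicit nested array, and confirms that the two are identical. It only constructs arrays and prints the JSON body, so it runs without a server. The $explicit variable name is just for this illustration.

```php
<?php
// Shorthand: PHP creates the intermediate arrays automatically, and the
// empty [] appends a new element to the "must" list.
$params = array();
$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';
$params['body']['query']['bool']['must'][]['match']['pokemon.psyduck.type'] = 'water';

// The same parameters written out as a nested array literal.
$explicit = array(
    'index' => 'pokemon',
    'type'  => 'pokemon_trainer',
    'body'  => array(
        'query' => array(
            'bool' => array(
                'must' => array(
                    array('match' => array('pokemon.psyduck.type' => 'water')),
                ),
            ),
        ),
    ),
);

var_dump($params === $explicit); // bool(true)

// The request body as JSON.
echo json_encode($params['body']), PHP_EOL;
```

Printing the body with json_encode is a handy way to compare a query against examples written in JSON.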

Searching with Arrays

We can search using arrays as the query (to match several values) by specifying the bool item, followed by must, terms and then the field we want to use for the query. We specify an array containing the values that we want to match. In the example below we’re selecting documents which have an age equal to either 10 or 15.

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';

$params['body']['query']['bool']['must']['terms']['age'] = array(10, 15);
$result = $client->search($params);

This method only accepts one-dimensional arrays.

Next, let’s do a filtered search. To use filtered search, we have to specify the filtered item and set the range that we want to return for a specific field. In the example below, we’re using age as the field. We’re selecting documents which have ages greater than or equal to (gte) 11 but less than or equal to (lte) 20.

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';
$params['body']['query']['filtered']['filter']['range']['age']['gte'] = 11;
$params['body']['query']['filtered']['filter']['range']['age']['lte'] = 20;
$result = $client->search($params);
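
If range filters are needed in several places, the parameters can be wrapped in a small function. The following is a sketch only: buildRangeParams is a hypothetical helper, not something the client provides, and it assumes the same pokemon index and pokemon_trainer type used throughout this article. It builds the array and prints the body without contacting a server.

```php
<?php
// Hypothetical helper: builds search parameters for a range filter on one
// field. A bound that is null is left out of the filter.
function buildRangeParams($field, $gte = null, $lte = null)
{
    $range = array();
    if ($gte !== null) {
        $range['gte'] = $gte;
    }
    if ($lte !== null) {
        $range['lte'] = $lte;
    }
    if (!$range) {
        throw new InvalidArgumentException('At least one bound is required.');
    }

    return array(
        'index' => 'pokemon',
        'type'  => 'pokemon_trainer',
        'body'  => array(
            'query' => array(
                'filtered' => array(
                    'filter' => array(
                        'range' => array($field => $range),
                    ),
                ),
            ),
        ),
    );
}

$params = buildRangeParams('age', 11, 20);
echo json_encode($params['body']), PHP_EOL;
```

The returned array can then be handed to the search method as in the example above.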

OR and AND

In RDBMS land we are used to using the AND and OR keywords to specify two or more conditions. We can also do that with Elasticsearch using filtered search. In the example below we’re using the and filter to select documents which have an age of 10 and a badge count of 8. Only the documents which match both criteria are returned.

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';

$params['body']['query']['filtered']['filter']['and'][]['term']['age'] = 10;
$params['body']['query']['filtered']['filter']['and'][]['term']['badges'] = 8;

$result = $client->search($params);

If you want to select either of those then you can use or instead.

$params['body']['query']['filtered']['filter']['or'][]['term']['age'] = 10;
$params['body']['query']['filtered']['filter']['or'][]['term']['badges'] = 8;
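
Since the and and or filters differ only in the key name, both can be produced by one function. This is a minimal sketch under the same assumptions as the examples above (the pokemon index and pokemon_trainer type). The buildTermFilterParams name is hypothetical and not part of the client.

```php
<?php
// Hypothetical helper: combines one term filter per field with the given
// operator, which must be "and" or "or".
function buildTermFilterParams($operator, array $terms)
{
    if ($operator !== 'and' && $operator !== 'or') {
        throw new InvalidArgumentException('Operator must be "and" or "or".');
    }

    $params = array('index' => 'pokemon', 'type' => 'pokemon_trainer');

    foreach ($terms as $field => $value) {
        // Each iteration appends one {"term": {field: value}} entry.
        $params['body']['query']['filtered']['filter'][$operator][]['term'][$field] = $value;
    }

    return $params;
}

$params = buildTermFilterParams('or', array('age' => 10, 'badges' => 8));
echo json_encode($params['body']), PHP_EOL;
```

Passing 'and' instead of 'or' produces the parameters from the earlier example.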

Limiting Results

Results can be limited to a specific number by specifying the size field. Here’s an example:

$params['body']['query']['filtered']['filter']['and'][]['term']['age'] = 10;
$params['body']['query']['filtered']['filter']['and'][]['term']['badges'] = 8;
$params['size'] = 1;

This returns the first result since we limited the results to just one document.

Pagination

In RDBMS land we have limit and offset. In Elasticsearch we have size and from. The from parameter specifies the index of the first result to return, and results are zero-indexed. So for 10 results per page we use a size of 10 and add 10 to the from value every time the user navigates to the next page.

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';

$params['size'] = 10;
$params['from'] = 10; // <-- will return second page
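
The arithmetic is easy to get wrong by one page, so it can help to keep it in one place. Below is a small sketch that converts a 1-based page number into size and from. The paginate function is a hypothetical helper for this article, not part of the client.

```php
<?php
// Hypothetical helper: adds size and from to the search parameters for a
// 1-based page number. Page numbers below 1 are treated as page 1.
function paginate(array $params, $page, $perPage = 10)
{
    $page = max(1, (int) $page);

    $params['size'] = $perPage;
    $params['from'] = ($page - 1) * $perPage;

    return $params;
}

$params = array('index' => 'pokemon', 'type' => 'pokemon_trainer');

$secondPage = paginate($params, 2);
echo $secondPage['from'], PHP_EOL; // 10
```

The first page starts at from = 0, the second at 10, the third at 20, and so on.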

Updating a Document

One way to update a document is to fetch its current data first. To do that, we specify the index, type and id like we did earlier and then call the get method. The current data can be found in the _source item. We then update existing fields with new values or add new fields to that item. Finally, we call the update method with the same parameters used for the get method, plus a body whose doc item contains the modified data.

$params = array();
$params['index'] = 'pokemon';
$params['type'] = 'pokemon_trainer';
$params['id'] = '1A-001';
$result = $client->get($params);


$result['_source']['age'] = 21; //update existing field with new value

//add new field
$result['_source']['pokemon'] = array(
  'Onix' => array(
    'type' => 'rock',
    'moves' => array(
      'Rock Slide' => array(
        'power' => 100,
        'pp' => 40
      ),
      'Earthquake' => array(
        'power' => 200,
        'pp' => 100
      )
    )
  )
);

$params['body']['doc'] = $result['_source'];

$result = $client->update($params);

This returns something similar to the following:

Array
(
    [_index] => pokemon
    [_type] => pokemon_trainer
    [_id] => 1A-001
    [_version] => 2
)

Note that the _version is incremented every time you call the update method, regardless of whether things have actually been updated.

You might be wondering why documents have a version, or even be tempted to think that Elasticsearch lets us fetch a previous version of a document. Unfortunately, that isn’t so. The version merely serves as a counter of how many times a document has been changed.
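
The update example above sends the whole _source back. Because the doc item is merged into the stored document, it should also be possible to send only the fields that changed. The sketch below illustrates that idea under this assumption. The changedFields function is a hypothetical helper that compares top-level fields only and does not detect removed fields. The call to the client is commented out so that the snippet runs without a server.

```php
<?php
// Hypothetical helper (not part of the client): returns the top-level fields
// of $new that are missing from $old or hold a different value.
function changedFields(array $old, array $new)
{
    $changes = array();

    foreach ($new as $field => $value) {
        if (!array_key_exists($field, $old) || $old[$field] !== $value) {
            $changes[$field] = $value;
        }
    }

    return $changes;
}

// $old stands in for the _source returned by the get method.
$old = array('name' => 'Brock', 'age' => 15, 'badges' => 0);

$new = $old;
$new['age'] = 21;                                            // changed field
$new['pokemon'] = array('Onix' => array('type' => 'rock'));  // new field

$params = array(
    'index' => 'pokemon',
    'type'  => 'pokemon_trainer',
    'id'    => '1A-001',
    'body'  => array('doc' => changedFields($old, $new)),
);

echo json_encode($params['body']), PHP_EOL;

// $result = $client->update($params); // requires a running Elasticsearch server
```

Only age and pokemon end up in the doc item here, since name and badges are unchanged.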

Deleting a Document

Deleting a document can be done by calling the delete method. This method accepts an array containing the index, type and id as its argument.

$params = array();
$params['index'] = 'pokemon';
$params['type'] = 'pokemon_trainer';
$params['id'] = '1A-001';

$result = $client->delete($params);

This returns something similar to the following:

Array
(
    [found] => 1
    [_index] => pokemon
    [_type] => pokemon_trainer
    [_id] => 1A-001
    [_version] => 7
)

Note that you will get an error if you try to fetch a deleted document using the get method.

Conclusion

In this article, we looked at how we can work with Elasticsearch in PHP using the official Elasticsearch client. Specifically, we’ve taken a look at how to index new documents, search for documents, paginate results, update documents, and delete documents.

Overall, Elasticsearch is a nice way to add search functionality to your PHP applications. If you want to learn more about how to integrate Elasticsearch into your PHP applications, you can check out Daniel Sipos’ series on how to integrate Elasticsearch with Drupal and Silex.

If, however, you prefer more automatic solutions to adding in-depth search functionality to your applications, see this series.

  • Vendedor Buceta

    I was reading until I saw the PHP 5.3 syntax for array…

    • ahhhh

      Spoiled

    • snv

      Seriously? You’re that much of a php snob?

      Your life must suck in every imaginable way if you let shit like that bother you.

  • Ivan Panfilov

    nice article

  • hot_rush

    better to use https://github.com/ruflin/Elastica, official client too ugly

  • Mantas Urnieža

    There is a nice DSL https://github.com/ongr-io/ElasticsearchDSL and Symfony bundle https://github.com/ongr-io/ElasticsearchBundle for working with ES in PHP. I think they are worth mentioning :)

  • Mussa Mosws

    Worth reading,thanks

  • Gopal Sharma

    Incomplete for beginners.

  • ishwar

    What directory should I install elastic search and then vendor?

  • Adarsh Chacko

    How do I execute the following in php:

    curl -XHEAD -i 'http://localhost:9200/twitter/tweet/1'
