Enterprise Search with Apache Solr and WordPress
In this tutorial, we will look at what Apache Solr is and how it works. We will cover some powerful Solr features, how Solr differs from MySQL, and the benefits of integrating Solr into a WordPress website. We will also cover how to install a WordPress Solr plugin and how to host Apache Solr. Finally, we will wrap up by looking at some popular websites using Solr.
This tutorial is for both WordPress developers and users. If you’re not a developer and use WordPress just to set up websites, you only need the sections on Solr’s benefits and setup. If you are a WordPress developer, learning the internals of Solr and the technical implementation will help you.
What Is Apache Solr?
Apache Solr is an open source enterprise search server. It stores information in a way that makes searching very fast. In a nutshell, it’s also a storage system, like SQL and NoSQL databases.
Solr is written in Java and uses the Lucene search library for its core functionality. You don’t need to know Java to work with Solr.
How Is It Different from MySQL?
If you’re new to Solr, the best way to understand the internals of Solr is to compare it with MySQL.
- MySQL stores information in the form of tables and rows, whereas Solr stores information in the form of a schema and documents (commonly submitted as XML). The schema defines the structure of the documents.
- You can have multiple tables in MySQL; similarly, you can have multiple schemas in Solr.
- Columns define the structure of a table; similarly, fields define the structure of a schema in Solr.
- In MySQL you store data as rows, whereas in Solr you store data as documents.
- In MySQL, when a column is indexed, the rows are arranged in a tree-like structure. In Solr, when a field is indexed, its contents are arranged into an inverted index data structure.
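To make the table/schema comparison concrete, here is a minimal sketch in Python. It is not Solr's own schema format: the field names and the `validate` helper are hypothetical, chosen only to show how a schema constrains the fields a document may carry, much as column definitions constrain a row.

```python
# Hypothetical field names; a real Solr schema for WordPress defines its own.
SCHEMA = {
    "id": str,
    "title": str,
    "content": str,
    "categories": list,
}


def validate(document, schema=SCHEMA):
    """Return a list of problems found in a document (empty if it fits the schema)."""
    problems = []
    for field, value in document.items():
        if field not in schema:
            problems.append(f"unknown field: {field}")
        elif not isinstance(value, schema[field]):
            problems.append(f"wrong type for field: {field}")
    return problems


# A document is a set of field/value pairs, comparable to one row in a table.
post = {
    "id": "42",
    "title": "Hello Solr",
    "content": "Search made fast",
    "categories": ["search"],
}
print(validate(post))
```

A document that uses a field the schema does not define, or the wrong type for a field, is reported instead of being accepted silently.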
What Makes It Fast for Search?
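Solr uses an inverted index: for each word, it keeps a list of the documents that contain that word. To answer a query, it looks up each search term and intersects the resulting lists instead of scanning every document. General-purpose databases such as MySQL do not use this structure for ordinary column indexes.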
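The idea of an inverted index can be sketched in a few lines of Python. This is a toy illustration, not how Lucene is implemented: the whitespace tokenizer and the sample documents are assumptions made for the example, and real analyzers do far more (stemming, stop words, scoring).

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


documents = {
    1: "Solr is a search server",
    2: "WordPress stores posts in MySQL",
    3: "Solr can search WordPress posts",
}
index = build_inverted_index(documents)
print(search(index, "solr search"))
```

The lookup cost depends on the number of query words and the size of their document lists, not on the total number of documents, which is why this structure suits search so well.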
What Are Other Features of Solr?
Solr offers many other features, such as spell correction, faceting, highlighting, result grouping and auto completion. Implementing these features in your WordPress site will make it stand out from the crowd. They provide a better user experience and a new way to access content on your WordPress site.
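Faceting is the least self-explanatory of these features, so here is a rough sketch of what it computes: for a set of search results, how many results fall under each value of a field. The `facet_counts` helper and the sample data are hypothetical; Solr computes facets internally from its index.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count how many of the given documents carry each value of a field."""
    counts = Counter()
    for doc in documents:
        for value in doc.get(field, []):
            counts[value] += 1
    return counts


# Pretend these are the documents matched by a search query.
results = [
    {"title": "Install Solr", "categories": ["solr", "tutorial"]},
    {"title": "Tune Solr", "categories": ["solr"]},
    {"title": "WordPress search", "categories": ["wordpress", "tutorial"]},
]
facets = facet_counts(results, "categories")
print(facets)
```

On a search results page, these counts are what let visitors narrow the results by category or tag.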
Why Should You Integrate WordPress with Solr?
As the number of posts on your site grows, MySQL becomes slow at handling user searches. By default, WordPress searches by pattern matching the search terms against every post, which is a CPU-intensive task. Users sometimes get request timeout errors because the PHP script exceeds its execution time limit. With 10,000 posts, every search query has to examine all 10,000 rows, which is an expensive operation that will slow down your website.
Solr, by contrast, can typically search 10,000 documents in a fraction of a second. If you have a medium-sized blog, a single Solr instance is enough to serve all its posts.
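The scan-based search described above can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for MySQL, and it assumes the matching is done with a `LIKE` pattern that has a leading wildcard; the table layout and sample rows are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the WordPress MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO posts (title, content) VALUES (?, ?)",
    [
        ("Hello Solr", "Solr is a search server"),
        ("MySQL basics", "Tables, rows and columns"),
        ("Faster search", "An inverted index avoids scanning every post"),
    ],
)


def like_search(connection, term):
    """Find posts whose title or content contains the term."""
    # The leading wildcard prevents an ordinary index from being used,
    # so the database has to inspect every row.
    pattern = f"%{term}%"
    rows = connection.execute(
        "SELECT id FROM posts WHERE title LIKE ? OR content LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


print(like_search(conn, "search"))
```

With three rows this is instant, but the work grows in step with the number of posts, which is the behavior the section describes.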
How Do You Integrate Solr with WordPress?
To integrate Solr with WordPress you need two things: a Solr plugin and Solr hosting. The basic job of a Solr plugin is to intercept WordPress search requests and serve results from the Solr server instead of letting WordPress fetch them from MySQL.
Solr plugins also provide features like auto suggestion, spell correction, search term highlighting, and faceting on tags and categories. However, to use these extra features, the WordPress theme must be compatible with the APIs of that particular Solr plugin. Some plugins allow you to add a custom search.php file to the theme; the plugin then intercepts the template hierarchy and executes the custom search file, which has all these advanced features.
A Solr plugin only copies posts and pages to the Solr server. Solr is used only for search requests, for example http://example.com/?s=search_query. For all other operations and requests, WordPress follows its normal flow and uses MySQL.
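The routing decision can be sketched like this. It is an illustration in Python rather than the plugin's actual PHP code: the `SOLR_URL` value and the `solr_query_url` helper are hypothetical, while `select`, `q`, `rows` and `wt` are Solr's standard query handler and parameters.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical Solr index URL, for illustration only.
SOLR_URL = "https://solr.example.com/solr/wordpress"


def solr_query_url(request_url, solr_url=SOLR_URL, rows=10):
    """Return a Solr select URL for a WordPress search request, or None otherwise."""
    params = parse_qs(urlparse(request_url).query)
    terms = params.get("s")
    if not terms:
        # Not a search request: WordPress handles it the usual way with MySQL.
        return None
    query = urlencode({"q": terms[0], "rows": rows, "wt": "json"})
    return f"{solr_url}/select?{query}"


print(solr_query_url("http://example.com/?s=search_query"))
print(solr_query_url("http://example.com/about/"))
```

Only requests carrying the `s` parameter are sent to Solr; everything else falls through untouched.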
Solr Plugins for WordPress
There are two popular WordPress plugins to integrate Solr: Advanced Search by My Solr Server and WPSOLR Search Engine.
In this tutorial, I will be using the WPSOLR Search Engine plugin to integrate Solr into WordPress.
You can host Solr on a dedicated server, but this requires some maintenance. Therefore, Solr cloud hosting services are preferred. There are two popular Solr cloud hosting services: OpenSolr and GotoSolr.
In this tutorial, I will use GotoSolr to host our Solr server.
Installing the WPSOLR Search Engine Plugin
Navigate to ‘Plugins’ and then to ‘Add New’ in the WordPress Admin Dashboard. Then search for ‘WPSOLR Search Engine’. The plugin should be listed at the top. Install and activate it.
Hosting Solr on GotoSolr
Here are the steps to host Solr on GotoSolr:
- Create a new GotoSolr account. The first month is free, so it’s good to try it out with your WordPress site first. This way, there’s no risk if you don’t like it.
- Once you have created an account you will have access to the dashboard.
- An index is a collection of schema, documents and their configurations. You need to create an Index. Click on the + button on the Indexes tab.
- Now you need to download the configuration files of WPSOLR plugin for your Index and upload them.
- Click on tab “schema.xml” and use the Upload button to upload the previously downloaded schema.xml file. Then click the save button.
- Click on tab ‘Access keys’ and create a new security key/secret by clicking on ‘Add a new key/secret’. Later, you’ll use these keys to let the plugin (and only the plugin) connect to your Solr index, by entering their values in the user/password fields used in all Solr HTTPS basic authentication calls.
- Click on tab “URL of this index”. Paste the URL of your index in a document for later. Notice how complex the URL is, and that it’s using HTTPS. This, plus your access keys, ensures your index is secured.
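To show how the key/secret pair and the index URL fit together, here is a minimal Python sketch that builds (but does not send) a request using HTTPS basic authentication. The URL and credentials are placeholders, and the use of Solr's `/admin/ping` handler is an assumption; whether your hosted index exposes it depends on its configuration.

```python
import base64
import urllib.request


def build_ping_request(index_url, key, secret):
    """Build, without sending, an authenticated request to the index's ping handler."""
    # Basic authentication sends "key:secret" encoded as base64 in a header.
    token = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"{index_url.rstrip('/')}/admin/ping?wt=json")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder values; use the URL and key/secret from your own dashboard.
request = build_ping_request(
    "https://solr.example.com/solr/my-index", "my-key", "my-secret"
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually contact the server. Because the credentials travel in a header, they are only protected when the URL uses HTTPS.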
Configuring and Connecting WPSOLR Search Engine to GotoSolr Server
Here are the steps to connect to the GotoSolr server:
- Open the WPSOLR settings page in the WordPress administration dashboard. Click on the ‘I uploaded my 2 compatible configuration files to my Solr core’ button.
- Now select the “Cloud Hosting” radio button, then copy the server access information from the GotoSolr dashboard.
- Click on ‘Check Solr Status Then Save’. This validates your settings and tests the connection to your Solr index. If there is an error, you will be warned with a message. If (and only if) the connection is validated, your settings will be saved.
- Under the ‘Solr Options’ tab, select what you want to be indexed and which extra features you need.
- Now go to the ‘Solr Operations’ tab and click on the ‘Load data’ button, which copies all your WordPress content to the Solr server.
- From this point forward, whenever you change or create pages and posts, the plugin automatically copies them to the Solr index.
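Conceptually, loading data means turning each post into a Solr document and sending the batch to Solr's update handler, which accepts a JSON array of documents. The sketch below only builds that payload. `ID`, `post_title` and `post_content` are WordPress post fields, but the Solr field names and both helper functions are hypothetical; the plugin uses its own schema.

```python
import json


def post_to_solr_document(post):
    """Convert a WordPress-like post (a dict here) into a Solr document."""
    return {
        "id": str(post["ID"]),
        "title": post["post_title"],
        "content": post["post_content"],
        "categories": post.get("categories", []),
    }


def build_update_payload(posts):
    """Serialize posts as a JSON array of documents for Solr's update handler."""
    return json.dumps([post_to_solr_document(post) for post in posts])


posts = [
    {"ID": 1, "post_title": "Hello Solr", "post_content": "First post", "categories": ["news"]},
    {"ID": 2, "post_title": "Second", "post_content": "More text"},
]
payload = build_update_payload(posts)
print(payload)
```

Sending the same document again with the same `id` replaces the earlier copy, which is how edits to an existing post can be kept in sync.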
This plugin overrides the WordPress search form to implement the auto completion feature. If you are using this plugin, you don’t need to create a custom search.php file because the plugin creates it automatically. Here are some screenshots showing auto completion and other features of Solr on a sample WordPress site.
Websites Using Solr
There are a lot of popular websites which are using Solr to power their search. Here is just a short list:
- DuckDuckGo uses Solr for spell checking, storing web pages and more.
- Drupal.org uses Solr to power all their site features.
- Last.fm uses Solr for all its search features.
- Some other sites include AOL, Yahoo, Instagram, Yellow Pages etc.
If you’re a developer then you will definitely benefit from learning Solr. If you are a WordPress user, then integrating Solr will surely help you provide a powerful search engine.
Here are some resources where you can get some further information on Apache Solr:
- Solr in 5 minutes: This online resource covers the fundamentals of Solr and will quickly bring you up-to-speed.
- Solr in Action: If you prefer learning through a book, then this is a great choice.
- Learning Apache Solr with Big Data and Cloud Computing: This is my own personal video course, it’s a great way to get started with Solr.
As you’ve seen, it was simple enough to implement Solr in WordPress without too much fuss. Please share your experiences and any Solr and WordPress implementations you’ve come across below.