Drupal + Solr Search on Linux

By default, Drupal comes with very basic search functionality. There are contributed modules that extend this functionality, but they don't address a major issue -- performance. On smaller sites or sites that don't need to index a non-Drupal database, the core search and additional contributed modules will get the job done. Other limitations of Drupal's core search include:

  • Limited operators
  • SQL was not designed as a search language
  • Doesn't take the proximity of multiple keywords into account
  • Won't return variations in keywords

The benefits of using Solr search are:

  • Ability to index and retrieve search results more quickly
  • Expose all attributes of nodes to search
  • Provide faceted search-based navigation allowing users to drill down into content by date, author, tags, content type, and other attributes
  • Boosting or ignoring content types
  • Boosting by fields or HTML tags
  • Building content from Solr index alone (No SQL queries or node_load calls)

Features:

  • Hit and relevance highlighting
  • Dynamic clustering
  • Auto correction
  • Caching
  • Multi-site searches
  • Auto completion
  • Suggestions
  • Spell check
  • Faceted search
  • Highlighting
  • Content recommendations
  • Document handling (e.g., Word and PDF)

Assumptions

  • Single Core instance

Considerations

The Drupal 7 and 6.x-3.x versions of the apachesolr module work with Solr 1.4.1 and 3.6.x. Drupal 6 modules such as apachesolr_autocomplete and apachesolr_views are not compatible with the 6.x-3.x module because of changes in the module. If you want to use other Drupal 6 contributed Solr modules, you should use apachesolr-6.x-1.7. Using Drupal 7, however, allows you to use modules such as apachesolr_views, apachesolr_panels, apachesolr_autocomplete, richsnippets, etc. without compatibility issues.

Prerequisites

Installing Solr on Linux 

Java should already be installed; if it is not, install it first. If you wish to run Solr under Tomcat or Jetty, you can install either one using your distribution's package management system. The packages to install are tomcat6 and tomcat6-admin, or jetty. Neither is required, however.
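A quick way to confirm the Java prerequisite is to check whether a java binary is on the PATH before going any further. The following is only a minimal sketch; the helper name have_cmd is made up for illustration.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: it succeeds if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java prints its version banner on stderr, so redirect it to stdout
    java -version 2>&1 | head -n 1
else
    echo "java not found; install a JRE with your package manager"
fi
```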

For Solr 3.6.x:

# wget http://archive.apache.org/dist/lucene/solr/3.6.2/apache-solr-3.6.2.tgz
# tar zxf apache-solr-3.6.2.tgz
# cd apache-solr-3.6.2

For Solr 1.4.1:

# wget http://archive.apache.org/dist/lucene/solr/1.4.1/apache-solr-1.4.1.tgz
# tar zxf apache-solr-1.4.1.tgz
# cd apache-solr-1.4.1

Tomcat setup

# cp example/webapps/solr.war /var/lib/tomcat6/webapps
# cp -rf example/solr /var/lib/tomcat6/
# chown -R tomcat6:tomcat6 /var/lib/tomcat6
# service tomcat6 restart

If you have opted not to use Tomcat or Jetty, run the following command from the example directory inside the uncompressed Solr directory. It doesn't matter whether you downloaded 1.4.1 or 3.6.x: the example directory exists in both, and the command launches an instance of Solr on port 8983.

$ java -jar start.jar
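Once start.jar is running, it can be worth checking that Solr actually answers before moving on to Drupal. The sketch below assumes curl is installed and that the stock example configuration exposes its ping handler at /solr/admin/ping. The function names solr_ping_url and solr_is_up are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. Host, port and the /solr/admin/ping path are
# assumptions based on the stock example configuration.
solr_ping_url() {
    host=${1:-localhost}
    port=${2:-8983}
    printf 'http://%s:%s/solr/admin/ping\n' "$host" "$port"
}

# Succeeds only if the ping request returns an HTTP success status.
solr_is_up() {
    curl -sf -o /dev/null --max-time 5 "$(solr_ping_url "$@")"
}

if solr_is_up; then
    echo "Solr is answering on port 8983"
else
    echo "No answer from Solr on port 8983 yet"
fi
```

For a Tomcat setup, pass the host and port explicitly, for example: solr_is_up localhost 8080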

Drupal Configuration and Setup

From within the Drupal directory you wish to set up Solr search for, use drush to download the apachesolr module, then change into the module directory:

$ drush dl apachesolr
$ cd sites/all/modules/contrib/apachesolr

If you are using the Drupal 6.x-1.x module and Solr 1.4.1, run the commands below. If you have opted not to use Tomcat, replace /var/lib/tomcat6 with the path to where the solr directory lives. For example, if you downloaded the Apache Solr file to your home directory, the path would look like this: /home/paulus/apache-solr-1.4.1/example/solr/conf.

$ drush make --no-core -y --contrib-destination=. apachesolr.make.example
$ su
# cp *.xml /var/lib/tomcat6/solr/conf

The 7.x-1.x and 6.x-3.x modules support both Solr 1.4.1 and 3.6.x. These modules have a subdirectory called solr-config, which has two directories containing the schema.xml and solrconfig.xml files for either 1.4 or 3.6. Copy the files from the directory that matches your Solr version. The example below uses solr-3.6; substitute solr-1.4 if you installed Solr 1.4.1.

$ su
# cp solr-config/solr-3.6/*.xml /var/lib/tomcat6/solr/conf/
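If you script this step, the choice between the two configuration directories can be derived from the Solr version. This is a minimal sketch that only prints the copy command rather than running it. The function solr_conf_dir and the variables SOLR_VERSION and SOLR_HOME are names made up for this example, and the directory names are the ones given above.

```shell
#!/bin/sh
# Hypothetical helper: map a Solr version string to the matching
# subdirectory of solr-config (directory names as described in this guide).
solr_conf_dir() {
    case "$1" in
        1.4|1.4.*) echo "solr-config/solr-1.4" ;;
        3.6|3.6.*) echo "solr-config/solr-3.6" ;;
        *) echo "unsupported Solr version: $1" >&2; return 1 ;;
    esac
}

# SOLR_VERSION and SOLR_HOME are assumed variable names with example defaults.
SOLR_VERSION=${SOLR_VERSION:-3.6.2}
SOLR_HOME=${SOLR_HOME:-/var/lib/tomcat6/solr}

# Print the copy command instead of running it, so nothing is overwritten by accident.
if dir=$(solr_conf_dir "$SOLR_VERSION"); then
    echo "cp $dir/*.xml $SOLR_HOME/conf/"
fi
```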

Enable the apachesolr and apachesolr_search modules; using drush is easier:

$ drush en -y apachesolr apachesolr_search

If you would like to search multiple Drupal sites, download and enable apachesolr_multisitesearch:

$ drush dl apachesolr_multisitesearch
$ drush en apachesolr_multisitesearch

Go to the Solr configuration page at /admin/settings/apachesolr. By default, Tomcat runs on port 8080, which means you have to change the port on this page. You can leave the port as is if you are running Jetty or started the Solr instance from the command line by running:

$ java -jar start.jar
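As a reminder of which port goes with which setup, the sketch below prints the Solr URL for the two cases covered in this guide. The helper solr_url_for and the setup names tomcat and example are invented for illustration. A packaged Jetty is left out on purpose, so check its own configuration for the port it listens on.

```shell
#!/bin/sh
# solr_url_for is a hypothetical helper. It maps the two setups covered in this
# guide to the port each one listens on by default and prints the Solr URL.
solr_url_for() {
    host=${2:-localhost}
    case "$1" in
        tomcat)  port=8080 ;;  # Tomcat default, as noted above
        example) port=8983 ;;  # java -jar start.jar
        *) echo "unknown setup: $1" >&2; return 1 ;;
    esac
    echo "http://$host:$port/solr"
}

solr_url_for tomcat    # prints http://localhost:8080/solr
solr_url_for example   # prints http://localhost:8983/solr
```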

Other settings you may want to consider changing

  • Index Write Access - If you want the site to update the index, then leave it as Read and write
  • Make Apache Solr Search the Default - you most likely want to change this to Enabled
  • Enable spellchecker and suggestions

Next, look at the Enabled Filters tab to enable any additional filtering, such as by content type. If you want to filter certain content types out, you will need to configure that on the Content bias settings tab.

Once everything is configured, you need to index the site. In the Drupal 7 and Drupal 6.x-3.x modules, you can index everything from within the site. However, in the Drupal 6.x-1.x module you can only index up to 200 nodes every cron run. To overcome this, you can use drush to temporarily raise the apachesolr_cron_limit variable, run the indexer, and then set the limit back:

$ drush vset apachesolr_cron_limit 10000
$ drush solr-index
$ drush vset apachesolr_cron_limit 50
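The three commands above can be wrapped in a small function so that the limit is put back even when indexing fails. This is a sketch under assumptions: the function name index_with_limit and the DRUSH override variable are made up for this example, and only the drush commands already shown are used.

```shell
#!/bin/sh
# DRUSH can be overridden (for example DRUSH=echo) for a dry run that
# only prints the drush arguments instead of executing them.
DRUSH=${DRUSH:-drush}

# Hypothetical helper: raise the limit, index, then put the limit back.
# $1 = temporary limit, $2 = limit to restore afterwards.
index_with_limit() {
    $DRUSH vset apachesolr_cron_limit "$1" || return 1
    $DRUSH solr-index
    status=$?
    # Restore the limit even if indexing failed.
    $DRUSH vset apachesolr_cron_limit "$2"
    return $status
}
```

A call such as index_with_limit 10000 50 matches the commands above. Running it with DRUSH=echo first shows what would be executed without touching the site.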

Now that Solr is installed, configured, and returning search results, you should stop core search from indexing your site.