This add-on is operated by One More Cloud, Inc.
Index and search with Apache Solr, the most popular open source search engine.
Websolr
Last updated August 16, 2021
Websolr is a managed search service provided by One More Cloud and powered by Apache Solr. The Websolr add-on lets you use Solr's high-performance search in your application today.
Install the Add-on
$ heroku addons:create websolr
Websolr add-ons are not free. Please see the Websolr plans available and choose the version most suited to your needs and budget.
Choosing a Solr Client
The Apache Solr search server presents an API, and there are a number of open source clients to choose from. We recommend Sunspot, although you may already be using another. We provide more general client configuration at the end of this document.
Sunspot for Ruby on Rails
Sunspot provides a Rails plugin as a gem, named sunspot_rails; as of this writing, the gem is version 1.3.0. These instructions cover setting up the Sunspot gem in your Rails application with this version, but you can use whatever version is most appropriate for your use case. Websolr is generally client neutral, so users are not bound to specific clients or versions. If you plan to use a version of Sunspot other than 1.3.0, make sure to consult the documentation for the version you have chosen.
Installing Sunspot with Bundler
Rails 3 applications use Bundler by default. If you are developing a Rails 2.3 application, please review Using Bundler with Rails 2.3 to ensure that your application is configured to use Bundler correctly.
Once you have set up your application to use Bundler, add the sunspot_rails gem to your Gemfile:
gem 'sunspot_rails', '~> 1.3.0'
Run bundle install to install Sunspot and its dependencies into your local environment.
Configure Sunspot
When you add Websolr to your application, a new Solr index is automatically created with a unique URL. This URL is added to your Heroku environment as the WEBSOLR_URL variable. You can see it by running heroku config:get WEBSOLR_URL -a <your app name>.
By default, Sunspot 1.3.0 supports the WEBSOLR_URL environment variable used by your Heroku application in production. This lets Sunspot perform actions on your index without further configuration, so you can get search up and running quickly without changing your app’s codebase.
If you would like more fine-grained control over which Solr servers you are using in different environments, you may create a Sunspot configuration file at config/sunspot.yml. From a command line in your application’s root directory, run script/generate sunspot on Rails 2.3, or rails generate sunspot_rails:install on Rails 3.
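If you fill in config/sunspot.yml by hand, the entries for each environment are usually given as separate hostname, port, and path values rather than as a single URL. The following is a minimal sketch, using only Ruby's standard library, of how a Solr URL breaks down into those pieces. The helper name solr_settings and the example URL are hypothetical placeholders, not part of Sunspot or Websolr; substitute the value of WEBSOLR_URL for your own app.

```ruby
require 'uri'

# Split a Solr index URL into the pieces a sunspot.yml entry typically asks for.
# `solr_settings` is a hypothetical helper written for this illustration.
def solr_settings(url)
  uri = URI.parse(url)
  {
    'hostname' => uri.host,
    'port'     => uri.port, # falls back to the scheme's default port if none is given
    'path'     => uri.path
  }
end

# Placeholder URL for illustration only; use your app's WEBSOLR_URL instead.
example_url = 'http://index.websolr.com/solr/0123456789a'
settings = solr_settings(example_url)
settings.each { |key, value| puts "#{key}: #{value}" }
```

The printed values can then be copied into the matching environment section of config/sunspot.yml.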
Using Sunspot
With Sunspot you configure your models for searching and indexing using a Ruby DSL. By default, your records are automatically indexed when they are created and updated, and removed from the index when destroyed.
Indexing Models
Here is a simple example of using Sunspot’s searchable block and DSL to configure an ActiveRecord model.
class Post < ActiveRecord::Base
  searchable do
    text :title
    text :body
    string :permalink
    integer :category_id
    time :published_at
  end
end
To learn more about configuring models for indexing, refer to the Sunspot wiki.
Searching
To search the model in the above example, you may use something like the following:
@search = Post.search { keywords 'hello' }
@posts = @search.results
(If your model already defines a search method, you may use the solr_search method instead, for which search is an alias.)
Sunspot exposes the full functionality of Solr. To learn more about searching your models, refer to the Sunspot wiki.
Sunspot Rake Tasks
Sunspot provides Rake tasks to start and stop a local Solr server for development and testing. In order to use these Rake tasks, add the following line to your application’s Rakefile:
require 'sunspot/rails/tasks'
You may wish to familiarize yourself with the available tasks by running rake -T sunspot.
Running a local Solr server with Sunspot
Sunspot can start a local instance of Solr on your computer. This is for development only and is unrelated to your Websolr index: you do not need to start Solr yourself for Websolr to work. To start and stop a local Solr server for development, run the following rake tasks:
rake sunspot:solr:start
rake sunspot:solr:stop
Re-indexing Data with Sunspot
If you are adding Websolr to an application with existing data in your development or production environment, you will need to reindex your data. Likewise, if you make changes to a model’s searchable configuration, or change your index’s configuration at the Websolr control panel, you will need to reindex for your changes to take effect.
In order to reindex your production data, you may run a command similar to the following from your application’s directory:
heroku run rake sunspot:reindex
If you are indexing a large number of documents, or your models use a lot of memory, you may need to reindex in batches smaller than Sunspot’s default of 50. We recommend starting small and gradually experimenting to find the best results. To reindex with a batch size of 10, use the following:
heroku run rake sunspot:reindex[10]
Refer to rake -T sunspot to see the usage for the reindex task.
R14 Memory exceeded error in Heroku
In some rare cases, you may get an R14 Memory exceeded error when trying to reindex your data. This occurs when the application loads more data into memory than Heroku has allotted to your app. In this case, the memory is consumed by your application's own code before Sunspot batches anything, so reducing Sunspot's batch size will not help. You’ll need to find a way to load less data into memory before reindexing.
One possible solution is to replace Model.all with Model.find_each(batch_size: <batch size>). The find_each method loads records into memory in batches, which reduces the data’s footprint. The records can then be batched again (if needed) when Sunspot sends them to Websolr. You can also buy more memory from Heroku.
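To illustrate why batching keeps memory use bounded, here is a plain-Ruby sketch that needs neither Rails nor Sunspot. A lazy enumerator stands in for a large table, and each_slice stands in for the batched loading that find_each performs; both stand-ins and the batch size of 10 are assumptions chosen for this illustration, not the behavior of ActiveRecord itself.

```ruby
# Stand-in for a large table: rows are produced lazily, so nothing is built up front.
records = (1..25).lazy.map { |id| { id: id, title: "Post #{id}" } }

batch_size = 10 # hypothetical value; tune it to your app's memory budget
batch_sizes = []

records.each_slice(batch_size) do |batch|
  # Only one batch of rows is held in this block at a time. In a Rails app,
  # this is the point where each record would be handed to the indexer.
  batch_sizes << batch.size
end

puts batch_sizes.inspect
```

With 25 rows and a batch size of 10, the block runs three times, on batches of 10, 10, and 5 rows.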
Updating Asynchronously with Heroku Workers
Queuing your updates to Solr is a good fit for Delayed Job workers on Heroku. Sending updates to Solr in the background keeps indexing out of the request cycle, which improves your application’s performance and robustness. Add the following lines to your model after the searchable block:
handle_asynchronously :solr_index
handle_asynchronously :remove_from_index
Resque users should consult this gist: https://gist.github.com/1282013
Haystack for Django
If your application is using Django, you can use the Haystack Solr client. Once you have set up your application as per their official getting started tutorial, you should modify your application’s settings.py to use these settings:
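import os  # skip if settings.py already imports os

# Read the index URL that the Websolr add-on sets in the environment.
HAYSTACK_URL = os.environ.get('WEBSOLR_URL', '')
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': HAYSTACK_URL,
    },
}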
When you are ready to deploy to Heroku, use the following command to generate your Solr schema.xml, to be uploaded to your Websolr index:
./manage.py build_solr_schema > schema.xml
Copy the contents of the schema.xml file and open the Websolr add-on dashboard:
heroku addons:open websolr
Select your index, and select the “Advanced” tab to paste in the contents of your schema.xml. Your index will take a minute or two to reconfigure itself, and then you can run the following command to reindex your data:
heroku run python myproject/manage.py rebuild_index
Using a Different Solr Client
There are other Solr clients, including the venerable but still popular acts_as_solr. If you are already using one of these clients and are not interested in switching your application to Sunspot, here are a few pointers for using Websolr in production.
Your index’s URL is set in the WEBSOLR_URL environment variable. If your Solr client can be configured at runtime, we recommend creating an initializer file (such as config/initializers/websolr.rb in Rails) in which you instruct your client to connect to ENV['WEBSOLR_URL'] when present.
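A minimal sketch of the URL-selection logic such an initializer might contain is shown below. The helper name solr_url is hypothetical, and the local fallback URL assumes a development Solr server listening on port 8982; adjust both to your setup. How the chosen URL is passed to the client depends on the client you use, so that step is not shown.

```ruby
# Fallback for development. This URL is an assumption; change it to match
# wherever your local Solr server listens.
LOCAL_SOLR_URL = 'http://localhost:8982/solr'

# Return the Websolr index URL when the add-on has set it, otherwise the
# local fallback. `env` defaults to the process environment and accepts any
# hash-like object, which makes the helper easy to exercise in isolation.
def solr_url(env = ENV)
  url = env['WEBSOLR_URL'].to_s.strip
  url.empty? ? LOCAL_SOLR_URL : url
end

# Pass this value to your Solr client's own connection setting.
puts solr_url
```

Treating a blank value the same as a missing one avoids pointing the client at an empty URL if the variable is set but empty.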
Alternatively, you may run heroku config:get WEBSOLR_URL -a <your app name> from your application’s directory to view the value for WEBSOLR_URL and hard-code it in the relevant configuration file for your particular Solr client.
Configuring your index
When your index is first created, it will be automatically configured using the schema.xml for the latest version of Sunspot, which is a very flexible schema that can cover a lot of uses.
Websolr provides a control panel where you may make changes to your index, such as adding or removing different Solr features, selecting a different Solr client, providing your own schema.xml, and so on. You can access this dashboard by running heroku addons:open websolr.
Questions?
If you are experiencing a problem with installing or using the Websolr add-on, you may email us or visit http://support.heroku.com/ for assistance. Please provide your index URL and, if possible, a reproduction of the error using curl.
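As a rough sketch of what a curl reproduction could look like, the Ruby snippet below assembles a curl command against Solr's standard select handler. The helper name curl_command, the example URL, and the query parameters are all placeholders chosen for illustration; replace them with your index URL and the request that triggers the problem.

```ruby
require 'uri'

# Build a curl command line for a query against a Solr index.
# `curl_command` is a hypothetical helper, not part of any Solr client.
def curl_command(index_url, params)
  query = URI.encode_www_form(params)  # URL-encode the query parameters
  base  = index_url.chomp('/')         # avoid a double slash before "select"
  "curl -i '#{base}/select?#{query}'"
end

# Placeholder URL and a match-all query limited to one row.
puts curl_command('http://index.websolr.com/solr/0123456789a', { q: '*:*', rows: 1 })
```

Running the printed command and including its full output (the -i flag adds the response headers) gives support the request and the response in one place.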
Websolr is a popular service that receives many questions. We love to answer general questions about Solr integration, but need to prioritize support questions directly related to our service. If you have general questions about implementing search features, you may first want to try the relevant public forums:
- Sunspot mailing list
- Heroku users mailing list
- Stack Overflow — we watch the “websolr” tag fairly closely.
- The official solr-user mailing list.
If you have suggestions for our docs, we welcome comments here: https://gist.github.com/2333627.