SearchBox Elasticsearch

This add-on is operated by Sebula Bilisim Teknolojileri LTD STI

Search Made Simple.

Last Updated: 16 April 2014

SearchBox is an add-on for providing full-text hosted search functionality powered by ElasticSearch.

SearchBox offers real-time search, bulk indexing, faceting, geo-tagging, auto-complete, suggestions ("did you mean" support), saved queries (percolation), and more.

Installing the add-on

SearchBox can be installed to a Heroku application via the CLI:

$ heroku addons:add searchbox

Once SearchBox has been added, the SEARCHBOX_URL and SEARCHBOX_SSL_URL settings will be available in the app configuration and will contain the account name and API key used to access the SearchBox indices service. This can be confirmed using the heroku config command.

$ heroku config | grep SEARCHBOX_URL
SEARCHBOX_URL  => http://paas:8ed0986ecaabcb7c20b4b2bdd6251f2d@api.searchbox.io
$ heroku config | grep SEARCHBOX_SSL_URL
SEARCHBOX_SSL_URL  => https://paas:8ed0986ecaabcb7c20b4b2bdd6251f2d@api.searchbox.io
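
Both settings follow the usual scheme://user:password@host URL shape, so a client that needs the pieces separately can split them with a standard URL parser. The following is a minimal Python sketch using only the standard library; the helper name parse_searchbox_url is hypothetical and not part of any SearchBox client.

```python
from urllib.parse import urlsplit


def parse_searchbox_url(url):
    """Split a SearchBox connection URL into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,          # None when the URL has no explicit port
        "account": parts.username,   # the account name
        "api_key": parts.password,   # the API key
    }
```

In an application the argument would typically be os.environ["SEARCHBOX_URL"] or os.environ["SEARCHBOX_SSL_URL"].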

After installing SearchBox the application should be configured to fully integrate with the add-on.

SearchBox does NOT create an index automatically, so be sure to create one via the API or the dashboard.
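
Creating the index through the API can be sketched as follows. This is a minimal Python illustration, assuming SearchBox accepts the standard Elasticsearch create-index call (PUT /<index_name>) with HTTP basic auth taken from SEARCHBOX_URL. The helper names are hypothetical, and the name check mirrors the rule quoted in the Troubleshooting section (3 to 16 characters; letters, numbers and hyphens).

```python
import base64
import re
import urllib.request
from urllib.parse import urlsplit

# Naming rule as quoted in the Troubleshooting section of this page.
_INDEX_NAME = re.compile(r"[A-Za-z0-9-]{3,16}")


def is_valid_index_name(name):
    """Return True if `name` satisfies the documented naming rule."""
    return _INDEX_NAME.fullmatch(name) is not None


def build_create_index_request(searchbox_url, index_name):
    """Build (but do not send) a PUT request that creates `index_name`.

    Assumption: the service accepts PUT /<index_name> with basic auth.
    """
    if not is_valid_index_name(index_name):
        raise ValueError("invalid index name: %r" % index_name)

    parts = urlsplit(searchbox_url)
    # urllib does not read credentials embedded in the URL, so rebuild the
    # address without them and send them in an Authorization header instead.
    netloc = parts.hostname
    if parts.port:
        netloc = "%s:%d" % (netloc, parts.port)
    endpoint = "%s://%s/%s" % (parts.scheme, netloc, index_name)

    credentials = "%s:%s" % (parts.username, parts.password)
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")

    request = urllib.request.Request(endpoint, method="PUT")
    request.add_header("Authorization", "Basic " + token)
    return request
```

Passing the returned object to urllib.request.urlopen sends the request; an error status is raised as urllib.error.HTTPError.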

Using Tire with Rails

Tire is a Ruby client for the ElasticSearch search engine. It provides a Ruby-like API for fluent communication with the ElasticSearch server and blends with the ActiveModel class for convenient usage in Rails applications. It lets you delete and create indices, define mappings for them, use the bulk API, and construct queries with an easy-to-use DSL. It has full ActiveRecord/ActiveModel compatibility, allowing you to index your models (incrementally upon saving, or in bulk) and to search and paginate the results.

A sample Rails application using the Tire library can be found on GitHub https://github.com/searchbox-io/rails-sample.

Configuration

Ruby on Rails applications will need to add the following entry into their Gemfile.

gem 'tire'

Update application dependencies with bundler:

$ bundle install

Configure Tire in config/application.rb or config/environments/production.rb:

ENV['ELASTICSEARCH_URL'] = ENV['SEARCHBOX_URL']

Make your model searchable:

class Document < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
end

When you now save a record:

Document.create :name => "Cost",
               :text => "Cost is claimed to be reduced and in a public cloud delivery model capital expenditure is converted."

The included callbacks automatically add the document to a documents index, making the record searchable:

@documents = Document.search 'Cost'

Tire has very detailed documentation on its GitHub page.

Using Haystack with Django

Haystack provides modular search for Django. It features a unified, familiar API that allows you to plug in different search backends without having to modify your code.

A sample Django application using Haystack can be found on GitHub https://github.com/searchbox-io/django-haystack-sample.

Configuration

In the sample application the requirements.txt file is ready to install:

virtualenv venv
source venv/bin/activate
(venv)  pip install -r requirements.txt

As with most Django applications, you should add Haystack to the INSTALLED_APPS within your settings.py.

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',

    # Added.
    'haystack',

    # Then your usual apps...
]

Add the Haystack connection settings for SearchBox to settings.py and set a default index name.

import os

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': os.environ['SEARCHBOX_URL'],
        'INDEX_NAME': 'documents',
        },
    }

HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

Creating SearchIndexes

SearchIndex objects are the way Haystack determines what data should be placed in the search index and handles the flow of data in. You can think of them as being similar to Django Models or Forms in that they are field-based and manipulate/store data.

To build a SearchIndex, all that’s necessary is to subclass both indexes.SearchIndex and indexes.Indexable, define the fields you want to store data with, and define a get_model method. We’ll create the following DocumentIndex to correspond to our Document model. This code generally goes in a search_indexes.py file within the app it applies to, which allows Haystack to pick it up automatically, though that location is not required. The DocumentIndex should look like:

from haystack import indexes
from myapp.models import Document

class DocumentIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        return Document

Additionally, we’re providing use_template=True on the text field. This allows us to use a data template (rather than error-prone concatenation) to build the document the search engine will use in searching. You’ll need to create a new template inside your template directory called search/indexes/myapp/document_text.txt and place the following inside:

{{ object.name }}
{{ object.body }}

To integrate Haystack with the Django admin, also create search_sites.py inside your application:

import haystack

haystack.autodiscover()

Setup views

Add the SearchView to your URLconf:

(r'^search/', include('haystack.urls')),

Search template sample

With the default URL configuration, your search template should be placed under your template directory and named search/search.html.

{% for result in page.object_list %}
   <p>{{ result.object.name }}</p>
   <p>{{ result.object.body }}</p>
{% empty %}
   <p>No results found.</p>
{% endfor %}

Searching

With the default URL configuration, you need to make a GET request to /search with a parameter named q.

<form action="/search" method="get">
    <input type="text" name="q">
</form>
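
Because the form only sends q as a query-string parameter, the same request can be built from a script or a test. A minimal Python sketch, assuming the default URLconf pattern shown earlier (hence the trailing slash in /search/); the helper name and the example host are hypothetical:

```python
from urllib.parse import urlencode, urljoin


def build_search_url(site_root, terms):
    """Return the URL the search form would request for `terms`."""
    # urlencode escapes spaces and special characters in the query.
    return urljoin(site_root, "/search/") + "?" + urlencode({"q": terms})
```

For example, build_search_url("http://localhost:8000", "cloud cost") yields a URL ending in /search/?q=cloud+cost.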

The Haystack home page is a great resource for additional documentation.

Using ElasticSearchClient with Node.js

elasticsearchclient is a lightweight ElasticSearch client for Node.js. It is actively developed and covers core modules of ElasticSearch.

A sample Node.js application can be found on GitHub https://github.com/searchbox-io/node.js-sample.

Configuration

Add the elasticsearchclient dependency to your package.json file and use npm to install your dependencies:

"dependencies": {
   "elasticsearchclient":">=0.5.1"
}

Create a search client:

var ElasticSearchClient = require('elasticsearchclient'),
    url = require('url');

var connectionString = url.parse(process.env.SEARCHBOX_URL);

var serverOptions = {
    host: connectionString.hostname,
    port: connectionString.port,
    secure: false,
    auth: {
        username: connectionString.auth.split(":")[0],
        password: connectionString.auth.split(":")[1]
    }
};

var elasticSearchClient = new ElasticSearchClient(serverOptions);

Index a document

elasticSearchClient.index('sample', 'document', {'name':'Reliability', 'text':'Reliability is improved', id:"1"})
    .on('data', function(data) {
        console.log(data)
    }).exec()

Create a query and search it

var qryObj = {
    "query":{
        "query_string":{
            "query":"Reliability"
        }
    }
};

elasticSearchClient.search('sample', 'document', qryObj)
    .on('data', function (data) {
            console.log(data)
    }).on('error', function (error) {
            console.log(error)
    }).exec()

Using Elastisch with Clojure

Elastisch is a minimalistic Clojure client for ElasticSearch. It is reasonably feature complete, well documented, and tested. It closely follows the ElasticSearch API structure without adding new abstractions and targets Clojure 1.3.0 and later from the ground up.

A sample Clojure application using Elastisch can be found on GitHub https://github.com/searchbox-io/clojure-elastisch-sample.

Configuration

With Leiningen, add the Elastisch dependency to your project.clj file:

[clojurewerkz/elastisch "1.1.0"]

Install Elastisch via Leiningen

$ lein install

Connect to SearchBox Elasticsearch:

(require '[clojurewerkz.elastisch.rest :as esr] '[clojurewerkz.elastisch.rest.document :as esd])
(esr/connect! (System/getenv "SEARCHBOX_URL"))

Index a document:

(esd/create "tweets" "tweet" {:username "Tweety" :text "Tweety Bird (also known as Tweety Pie or simply Tweety) is a fictional Yellow Canary in the Warner Bros."})

Search indexed document:

(esd/search "tweets" "tweet" :query {:query_string {:query "tweety"}})

Elastisch has very detailed documentation on its website.

Using Jest with Java

Jest is a Java HTTP REST client for ElasticSearch. It is actively developed and tested by SearchBox.io.

A sample Java application using Jest can be found on GitHub https://github.com/searchbox-io/java-jest-sample.

Configuration

Ensure you have added the Sonatype repository to your pom.xml:

 <repositories>
 .
 .
   <repository>
     <id>sonatype</id>
     <name>Sonatype Groups</name>
     <url>https://oss.sonatype.org/content/groups/public/</url>
   </repository>
 .
 .
 </repositories>

With Maven, add the Jest dependency to your pom.xml:

 <dependency>
   <groupId>io.searchbox</groupId>
   <artifactId>jest</artifactId>
   <version>0.0.4</version>
 </dependency>

Install Jest via Maven

$ mvn clean install

Configuration

Create a Jest Client:

// Configuration
ClientConfig clientConfig = new ClientConfig.Builder(System.getenv("SEARCHBOX_URL"))
    .multiThreaded(true).build();

// Construct a new Jest client according to configuration via factory
JestClientFactory factory = new JestClientFactory();
factory.setClientConfig(clientConfig);
JestClient client = factory.getObject();

Indexing

Create an index via Jest:

client.execute(new CreateIndex.Builder("articles").build());

Create a new document:

Article source = new Article();
source.setAuthor("John Ronald Reuel Tolkien");
source.setContent("The Lord of the Rings is an epic high fantasy novel");

Index article to “articles” index with “article” type.

Index index = new Index.Builder(source).index("articles").type("article").build();
client.execute(index);

Searching

Search queries can be either a JSON string or an ElasticSearch SearchSourceBuilder object (you need to add the ElasticSearch dependency to use SearchSourceBuilder).

String query = "{\n" +
    "    \"query\": {\n" +
    "        \"filtered\" : {\n" +
    "            \"query\" : {\n" +
    "                \"query_string\" : {\n" +
    "                    \"query\" : \"Lord\"\n" +
    "                }\n" +
    "            }\n"+
    "        }\n" +
    "    }\n" +
    "}";

Search search = (Search) new Search.Builder(query)
// multiple index or types can be added.
.addIndexName("articles")
.addIndexType("article")
.build();

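// Execute the search, then map the hits onto Article objects
JestResult result = client.execute(search);
List<Article> articles = result.getSourceAsObjectList(Article.class);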

Jest has very detailed documentation on its GitHub page.

SearchBox ElasticSearch dashboard

The SearchBox dashboard allows you to create, delete and edit access configurations of your indices and also gives basic statistical information.

The dashboard can be accessed via the CLI:

$ heroku addons:open searchbox
Opening searchbox for sharp-mountain-4005…

or by visiting the Heroku apps web interface and selecting the application in question. Select SearchBox from the Add-ons menu.

Migrating between plans

Application owners should carefully manage the migration timing to ensure proper application function during the migration process.

Use the heroku addons:upgrade command to migrate to a new plan.

$ heroku addons:upgrade searchbox:basic
-----> Upgrading searchbox:basic to sharp-mountain-4005... done
Your plan has been updated to: searchbox:basic

Removing the add-on

SearchBox can be removed via the CLI.

This will destroy all associated data and cannot be undone!

$ heroku addons:remove searchbox
-----> Removing searchbox from sharp-mountain-4005... done, v20 (free)

Troubleshooting

SearchBox.io returns errors as JSON objects with a message property.

  • 400 - {"message":"You have reached your maximum index count, upgrade your plan to add more documents!"}
  • 400 - {"message":"You have reached your maximum storage size, upgrade your plan for more storage!"}
  • 403 - {"message":"At least one of given indices does not exist!"}
  • 403 - {"message":"Given api key is invalid!"}
  • 409 - {"message":"Index can not be deleted via api."}
  • 409 - {"message":"Index name is invalid. Index name should be between 3-16 characters and only letters, numbers and hyphens are allowed"}
  • 409 - {"message":"An index with given name already exists"}
  • 409 - {"message":"You have reached maximum index count for your current plan."}
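
A client can surface these messages by decoding the response body of any failed request. The following Python sketch shows one way to do that; extract_error_message is a hypothetical helper, and the fallback strings for bodies without a message property are an assumption of this example, not SearchBox behaviour.

```python
import json


def extract_error_message(status_code, body):
    """Return the error message for a failed response, or None on success.

    `body` is the raw response text. Anything that is not a JSON object
    with a "message" property yields a generic fallback string.
    """
    if status_code < 400:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        return "HTTP %d (unparseable body)" % status_code
    if isinstance(payload, dict) and "message" in payload:
        return payload["message"]
    return "HTTP %d (no message property)" % status_code
```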

API limitations

The index refresh interval is fixed at 1 second, and a refresh cannot be triggered via the API.
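
One practical consequence is that a document indexed a moment ago may not show up in search results until the next refresh. A script or test that indexes and then searches immediately can poll briefly instead of forcing a refresh. The Python sketch below illustrates the idea; wait_until_visible and its parameters are hypothetical names, and search_fn stands for whatever client call returns the hit count.

```python
import time


def wait_until_visible(search_fn, timeout=3.0, interval=0.25):
    """Poll `search_fn` until it reports a hit or `timeout` seconds pass.

    `search_fn` is any zero-argument callable returning the number of hits.
    Returns True when a hit is seen, False when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        if search_fn() > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```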

The following parameters are ignored when creating an index:

  • store
  • translog
  • cache
  • refresh_interval
  • compound_format
  • term_index_interval
  • term_index_divisor

Additionally, all administrative features of ElasticSearch are restricted from the API. The following ElasticSearch resources cannot be called:

  • _cluster
  • _shutdown
  • _local
  • _primary
  • _gateway
  • _settings
  • _template
  • _nodes
  • _segments
  • _cache

Please take into account that SearchBox Elasticsearch is under heavy development and this list may evolve.
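
A client can check a request path against the restricted resources listed above before sending it. The Python sketch below assumes a request is rejected when any path segment equals one of those names; the exact server-side matching rule is not documented here, and names passed as query parameters are not inspected. The helper name is hypothetical.

```python
# Names copied from the list of restricted resources on this page.
RESTRICTED_RESOURCES = frozenset([
    "_cluster", "_shutdown", "_local", "_primary", "_gateway",
    "_settings", "_template", "_nodes", "_segments", "_cache",
])


def is_restricted_path(path):
    """Return True if any segment of a request path is a restricted resource."""
    path = path.split("?", 1)[0]                    # drop the query string
    segments = [s for s in path.split("/") if s]    # ignore empty segments
    return any(segment in RESTRICTED_RESOURCES for segment in segments)
```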

Support

All SearchBox support and runtime issues should be submitted via one of the Heroku Support channels. Any non-support related issues or product feedback are welcome at SearchBox Support.

Additional resources