SearchBox Elasticsearch

This add-on is operated by Sebula Bilisim Teknolojileri LTD STI

Search Made Simple.

Last Updated: 13 October 2014

SearchBox is an add-on for providing full-text hosted search functionality powered by Elasticsearch.

SearchBox offers real-time search, bulk indexing, faceting, geo-tagging, auto-complete, suggestions ("did you mean" support), saved queries (percolation), and more.

Installing the add-on

SearchBox can be installed to a Heroku application via the CLI:

$ heroku addons:add searchbox

Once SearchBox has been added, the SEARCHBOX_URL and SEARCHBOX_SSL_URL settings will be available in the app configuration. Each contains the account name and API key needed to access the SearchBox indices service. This can be confirmed using the heroku config command.

$ heroku config | grep SEARCHBOX_URL
SEARCHBOX_URL  => http://paas:8ed0986ecaabcb7c20b4b2bdd6251f2d@.....searchly.com
$ heroku config | grep SEARCHBOX_SSL_URL
SEARCHBOX_SSL_URL  => https://paas:8ed0986ecaabcb7c20b4b2bdd6251f2d@.....searchly.com
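
As an illustration, an application might read these settings as in the following minimal Python sketch, which prefers the SSL variable when it is present and splits the URL into a base URL and credentials. The helper name searchbox_settings and the example hostname are assumptions made for this sketch, not part of the add-on.

```python
import os
from urllib.parse import urlparse


def searchbox_settings(environ=os.environ):
    """Return (base_url, username, api_key) from the add-on's config vars.

    Prefers SEARCHBOX_SSL_URL and falls back to SEARCHBOX_URL.
    Returns None when neither variable is set.
    """
    raw = environ.get("SEARCHBOX_SSL_URL") or environ.get("SEARCHBOX_URL")
    if not raw:
        return None
    parts = urlparse(raw)
    # Rebuild the URL without the embedded credentials.
    host = parts.hostname
    if parts.port:
        host = "%s:%d" % (host, parts.port)
    base_url = "%s://%s" % (parts.scheme, host)
    return base_url, parts.username, parts.password


if __name__ == "__main__":
    # Hypothetical value; the real one comes from `heroku config`.
    example = {"SEARCHBOX_URL": "http://paas:my-api-key@example.searchly.com"}
    print(searchbox_settings(example))
```

Keeping the credentials separate from the base URL makes it easier to avoid writing the API key into logs.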

After installing SearchBox the application should be configured to fully integrate with the add-on.

SearchBox does NOT create an index automatically, so be sure to create one via the API or the dashboard.
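
Creating an index via the API could look roughly like the Python sketch below. It only builds the request and does not send it. It relies on the standard Elasticsearch convention that a PUT to /<index name> creates an index. Sending the URL's embedded credentials as HTTP Basic auth is an assumption, and build_create_index_request is a hypothetical helper name.

```python
import base64
from urllib.parse import urlparse
from urllib.request import Request


def build_create_index_request(searchbox_url, index_name):
    """Build (but do not send) a PUT request that creates `index_name`."""
    parts = urlparse(searchbox_url)
    host = parts.hostname
    if parts.port:
        host = "%s:%d" % (host, parts.port)
    url = "%s://%s/%s" % (parts.scheme, host, index_name)
    request = Request(url, method="PUT")
    if parts.username:
        # Assumption: credentials embedded in the URL are sent as HTTP Basic auth.
        token = "%s:%s" % (parts.username, parts.password or "")
        encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
        request.add_header("Authorization", "Basic " + encoded)
    return request
```

To actually create the index, the returned request could be passed to urllib.request.urlopen, for example with os.environ["SEARCHBOX_URL"] and the index name "documents".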

Using with Rails

The Elasticsearch Rails client is a Ruby client for Elasticsearch that supports:

  • ActiveModel integration with adapters for ActiveRecord and Mongoid
  • Repository pattern based persistence layer for Ruby objects
  • Active Record pattern based persistence layer for Ruby models
  • Enumerable-based wrapper for search results
  • ActiveRecord::Relation-based wrapper for returning search results as records
  • Convenience model methods such as search, mapping, import, etc
  • Rake tasks for importing the data
  • Support for Kaminari and WillPaginate pagination
  • Integration with Rails' instrumentation framework
  • Templates for generating example Rails application

A sample Rails application can be found on GitHub https://github.com/searchly/searchly-rails-sample.

Configuration

Ruby on Rails applications will need to add the following entries to their Gemfile.

gem 'elasticsearch-model'
gem 'elasticsearch-rails'

Update application dependencies with Bundler:

$ bundle install

Configure the Elasticsearch client in config/application.rb or config/environments/production.rb:

Elasticsearch::Model.client = Elasticsearch::Client.new host: ENV['SEARCHBOX_URL']

Index Creation

From the Rails console, create the documents index for the Document model.

Document.__elasticsearch__.create_index! force: true

Make your model searchable:

class Document < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
end

When you now save a record:

Document.create :name => "Cost",
               :text => "Cost is claimed to be reduced and in a public cloud delivery model capital expenditure is converted."

The included callbacks automatically add the document to a documents index, making the record searchable:

@documents = Document.search('Cost').records

Elasticsearch Rails has very detailed documentation on the official Elasticsearch page.

Using Haystack with Django

Haystack provides modular search for Django. It features a unified, familiar API that allows you to plug in different search backends without having to modify your code.

A sample Django application using Haystack can be found on GitHub https://github.com/searchly/searchly-django-haystack-sample.

Configuration

In the sample application the requirements.txt file is ready to install:

virtualenv venv
source venv/bin/activate
(venv)  pip install -r requirements.txt

As with most Django applications, you should add Haystack to the INSTALLED_APPS within your settings.py.

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',

    # Added.
    'haystack',

    # Then your usual apps...
]

To integrate with SearchBox, add the Haystack connection settings to settings.py and set a default index name.

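import os

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

es = urlparse(os.environ.get('SEARCHBOX_URL') or 'http://127.0.0.1:9200/')

# Fall back to the scheme's default port when the URL does not specify one.
port = es.port or (443 if es.scheme == 'https' else 80)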

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': es.scheme + '://' + es.hostname + ':' + str(port),
        'INDEX_NAME': 'documents',
    },
}

if es.username:
    HAYSTACK_CONNECTIONS['default']['KWARGS'] = {"http_auth": es.username + ':' + es.password}

Creating SearchIndexes

SearchIndex objects are the way Haystack determines what data should be placed in the search index and handles the flow of data in. You can think of them as being similar to Django Models or Forms in that they are field-based and manipulate/store data.

To build a SearchIndex, all that’s necessary is to subclass both indexes.SearchIndex and indexes.Indexable, define the fields you want to store data with, and define a get_model method. We’ll create the following DocumentIndex to correspond to our Document model. This code generally goes in a search_indexes.py file within the app it applies to, which allows Haystack to pick it up automatically, though that location is not required. The DocumentIndex should look like:

from haystack import indexes
from myapp.models import Document

class DocumentIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        return Document

Additionally, we’re providing use_template=True on the text field. This allows us to use a data template (rather than error-prone concatenation) to build the document the search engine will use in searching. You’ll need to create a new template inside your template directory called search/indexes/myapp/document_text.txt and place the following inside:

{{ object.name }}
{{ object.body }}

To integrate Haystack with the Django admin, also create search_sites.py inside your application:

import haystack

haystack.autodiscover()

Setup views

Add the SearchView to your URLconf:

(r'^search/', include('haystack.urls')),

Search template sample

With the default URL configuration, your search template should be placed under your template directory and named search/search.html.

{% for result in page.object_list %}
   <p>{{ result.object.name }}</p>
   <p>{{ result.object.body }}</p>
{% empty %}
   <p>No results found.</p>
{% endfor %}

Searching

With the default URL configuration, searching means making a GET request to /search with a parameter named q.

<form action="/search" method="get">
    <input type="text" name="q">
</form>
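
The same request can be issued from a script. The Python sketch below only builds the query URL. The function name search_url and the base URL are placeholders, and the trailing slash assumes the ^search/ URLconf entry shown under "Setup views".

```python
from urllib.parse import urlencode


def search_url(base_url, terms):
    """Return the GET URL for a Haystack search, passing `terms` as the q parameter."""
    return "%s/search/?%s" % (base_url.rstrip("/"), urlencode({"q": terms}))


if __name__ == "__main__":
    # Hypothetical local development server.
    print(search_url("http://localhost:8000", "cost reduction"))
```

urlencode takes care of escaping spaces and special characters in the search terms.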

The Haystack home page is a great resource for additional documentation.

Using with Node.js

The Elasticsearch Node.js client is the official Elasticsearch client for Node.js.

A sample Node.js application can be found on GitHub https://github.com/searchly/searchly-nodejs-sample.

Configuration

Add the elasticsearch dependency to your package.json file and use npm to install your dependencies:

"dependencies": {
   "elasticsearch": ">=1.1.0"
}

Create a search client:

var elasticsearch = require('elasticsearch');

var connectionString = process.env.SEARCHBOX_URL;

var client = new elasticsearch.Client({
    host: connectionString
});

Index a document

client.index({
  index: 'sample',
  type: 'document',
  id: '1',
  body: {
          name: 'Reliability',
          text: 'Reliability is improved if multiple redundant sites are used, which makes well-designed cloud computing suitable for business continuity.'
  }
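}, function (error, response) {
  // Report failures instead of logging an undefined response.
  if (error) { return console.error(error.message); }
  console.log(response);
});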

Create a query and search it

client.search({
        index: 'sample',
        type: 'document',
        body: {
            query: {
                query_string:{
                   query:"Reliability"
                }
            }
        }
    }).then(function (resp) {
        console.log(resp);
    }, function (err) {
        console.log(err.message);
    });

Detailed documentation for the Node.js client is available in the client's official documentation.

Using Elastisch with Clojure

Elastisch is a minimalistic Clojure client for Elasticsearch. It is reasonably feature complete, well documented and tested. It closely follows the Elasticsearch API structure without adding new abstractions, and it targets Clojure 1.3.0 and later from the ground up.

A sample Clojure application using Elastisch can be found on GitHub https://github.com/searchbox-io/clojure-elastisch-sample.

Configuration

With Leiningen, add the Elastisch dependency to your project.clj file:

[clojurewerkz/elastisch "1.1.0"]

Install Elastisch via Leiningen

$ lein install

Connect to SearchBox Elasticsearch:

(require '[clojurewerkz.elastisch.rest :as esr] '[clojurewerkz.elastisch.rest.document :as esd])
(esr/connect! (System/getenv "SEARCHBOX_URL"))

Index a document:

(esd/create "tweets" "tweet" {:username "Tweety" :text "Tweety Bird (also known as Tweety Pie or simply Tweety) is a fictional Yellow Canary in the Warner Bros."})

Search indexed document:

(esd/search "tweets" "tweet" :query {:query_string {:query "tweety"}})

Elastisch has very detailed documentation on its website.

Using Jest with Java

Jest is a Java HTTP REST client for Elasticsearch. It is actively developed and tested by SearchBox.

A sample Java application using Jest can be found on GitHub https://github.com/searchly/searchly-java-sample.

Configuration

Ensure you have added the Sonatype repository to your pom.xml:

 <repositories>
 .
 .
   <repository>
     <id>sonatype</id>
     <name>Sonatype Groups</name>
     <url>https://oss.sonatype.org/content/groups/public/</url>
   </repository>
 .
 .
 </repositories>

With Maven, add the Jest dependency to your pom.xml:

 <dependency>
   <groupId>io.searchbox</groupId>
   <artifactId>jest</artifactId>
   <version>0.1.2</version>
 </dependency>

Install Jest via Maven

$ mvn clean install

Configuration

Create a Jest Client:

// Get connection url from env
String connectionUrl = System.getenv("SEARCHBOX_URL");

// Construct a new Jest client according to configuration via factory
JestClientFactory factory = new JestClientFactory();
factory.setHttpClientConfig(new HttpClientConfig
       .Builder(connectionUrl)
       .multiThreaded(true)
       .build());
JestClient client = factory.getObject();

Indexing

Create an index via Jest:

client.execute(new CreateIndex.Builder("articles").build());

Create a new document:

Article source = new Article();
source.setAuthor("John Ronald Reuel Tolkien");
source.setContent("The Lord of the Rings is an epic high fantasy novel");

Index the article into the “articles” index with the “article” type:

Index index = new Index.Builder(source).index("articles").type("article").build();
client.execute(index);

Searching

Search queries can be either a JSON string or an Elasticsearch SearchSourceBuilder object (using SearchSourceBuilder requires adding the Elasticsearch dependency).

String query = "{\n" +
    "    \"query\": {\n" +
    "        \"filtered\" : {\n" +
    "            \"query\" : {\n" +
    "                \"query_string\" : {\n" +
    "                    \"query\" : \"Lord\"\n" +
    "                }\n" +
    "            }\n"+
    "        }\n" +
    "    }\n" +
    "}";

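Search search = (Search) new Search.Builder(query)
        // Multiple indices or types can be added.
        .addIndex("articles")
        .addType("article")
        .build();

// Execute the search, then map the hits' sources onto Article objects.
JestResult result = client.execute(search);
List<Article> articles = result.getSourceAsObjectList(Article.class);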

Jest has very detailed documentation on its GitHub page.

SearchBox Elasticsearch dashboard

The SearchBox dashboard allows you to create, delete and edit access configurations of your indices and also gives basic statistical information.

The dashboard can be accessed via the CLI:

$ heroku addons:open searchbox
Opening searchbox for sharp-mountain-4005…

or by visiting the Heroku apps web interface and selecting the application in question. Select SearchBox from the Add-ons menu.

Migrating between plans

Application owners should carefully manage the migration timing to ensure proper application function during the migration process.

Use the heroku addons:upgrade command to migrate to a new plan.

$ heroku addons:upgrade searchbox:basic
-----> Upgrading searchbox:basic to sharp-mountain-4005... done
Your plan has been updated to: searchbox:basic

Removing the add-on

SearchBox can be removed via the CLI.

This will destroy all associated data and cannot be undone!

$ heroku addons:remove searchbox
-----> Removing searchbox from sharp-mountain-4005... done, v20 (free)

Troubleshooting

SearchBox.io returns errors as JSON objects with a message property.

  • 400 - {"message":"You have reached your maximum index count, upgrade your plan to add more documents!"}
  • 400 - {"message":"You have reached your maximum storage size, upgrade your plan for more storage!"}
  • 403 - {"message":"At least one of given indices does not exist!"}
  • 403 - {"message":"Given api key is invalid!"}
  • 409 - {"message":"Index can not be deleted via api."}
  • 409 - {"message":"Index name is invalid. Index name should be between 3-16 characters and only letters, numbers and hyphens are allowed"}
  • 409 - {"message":"An index with given name already exists"}
  • 409 - {"message":"You have reached maximum index count for your current plan."}
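
A client can surface these messages directly, and it can check an index name locally before sending a create request. The Python sketch below shows one way to do both. The function names are hypothetical, and restricting names to lowercase letters is an assumption based on Elasticsearch itself rejecting uppercase index names.

```python
import json
import re

# Naming rule from the 409 error above: 3-16 characters, letters, numbers, hyphens.
# Lowercase only is an assumption: Elasticsearch rejects uppercase index names.
INDEX_NAME_PATTERN = re.compile(r"[a-z0-9-]{3,16}")


def is_valid_index_name(name):
    """Return True if `name` satisfies the documented naming rule."""
    return INDEX_NAME_PATTERN.fullmatch(name) is not None


def error_message(body):
    """Extract the `message` property from an error body, or None if absent."""
    try:
        payload = json.loads(body)
    except ValueError:
        # The body was not JSON at all.
        return None
    if isinstance(payload, dict):
        return payload.get("message")
    return None
```

Checking the name first avoids a round trip that would end in a 409 response.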

API limitations

The index refresh interval is fixed at 1 second, and a refresh cannot be invoked via the API.

The following parameters are ignored when creating an index:

  • store
  • translog
  • cache
  • refresh_interval
  • compound_format
  • term_index_interval
  • term_index_divisor
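
Since these parameters are silently ignored, it can help to strip them from index settings before sending a create request, so the request reflects what will actually be applied. The Python sketch below assumes settings are passed as a flat dict with dotted keys, optionally prefixed with "index.". That layout and the helper name are assumptions for illustration.

```python
# Parameter names taken from the list above.
IGNORED_SETTINGS = {
    "store", "translog", "cache", "refresh_interval",
    "compound_format", "term_index_interval", "term_index_divisor",
}


def drop_ignored_settings(settings):
    """Return a copy of a flat, dotted-key settings dict without ignored entries."""
    kept = {}
    for key, value in settings.items():
        segments = key.split(".")
        # Treat "index.refresh_interval" and "refresh_interval" the same way.
        if segments[0] == "index":
            segments = segments[1:]
        if segments and segments[0] in IGNORED_SETTINGS:
            continue
        kept[key] = value
    return kept
```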

Additionally, all administrative features of Elasticsearch are restricted from the API. The following Elasticsearch resources cannot be called:

  • _cluster
  • _shutdown
  • _local
  • _primary
  • _gateway
  • _template
  • _nodes
  • _segments
  • _cache

Support

All SearchBox support and runtime issues should be submitted via one of the Heroku Support channels. Any non-support related issues or product feedback are welcome at SearchBox Support.

Additional resources