Found Elasticsearch

This add-on is operated by Found AS

Deliver great search experiences!

Last Updated: 12 March 2014

Elasticsearch is an open source, distributed, RESTful search engine. In addition to being a great search engine, it is also well suited to analytics and log storage, and can serve as a general-purpose “NoSQL” store.

Found Elasticsearch provides dedicated Elasticsearch clusters with reserved memory and storage, ensuring predictable performance. Replication and automatic failover are provided for production and mission-critical environments, protecting your cluster against unplanned downtime.

Installing the add-on

To use Found Elasticsearch on Heroku, install the add-on using the heroku command:

$ heroku addons:add foundelasticsearch

A list of all plans available can be found here.

Once Found Elasticsearch has been added, a FOUNDELASTICSEARCH_URL setting will be available in the app configuration and will contain the canonical URL used to access the newly provisioned cluster. This can be confirmed using the heroku config command:

$ heroku config | grep FOUNDELASTICSEARCH_URL
FOUNDELASTICSEARCH_URL => http://<cluster_id>.foundcluster.com:9200
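In application code, it is best to read this URL from the environment rather than hard-coding it. Below is a minimal sketch using only Ruby's standard library; the method name found_cluster_uri is hypothetical and not part of any Found or Heroku API.

```ruby
require 'uri'

# Return the cluster URL that the add-on stores in the app configuration.
# Hash#fetch (and ENV.fetch) raises KeyError when the variable is missing,
# so a misconfigured app fails fast instead of on its first search request.
def found_cluster_uri(env = ENV)
  uri = URI.parse(env.fetch('FOUNDELASTICSEARCH_URL'))
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass this check.
  raise ArgumentError, "not an http(s) URL: #{uri}" unless uri.is_a?(URI::HTTP)
  uri
end
```

found_cluster_uri.host then gives the cluster hostname, and found_cluster_uri.port the port (9200 in the output above).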

After installing Found Elasticsearch, the application should be configured to fully integrate with the add-on.

Specifying version and plugins

If you want a specific version of Elasticsearch, use the --elasticsearch-version option, e.g. --elasticsearch-version 1.0.1.

We also provide many of the plugins that are available for Elasticsearch. Use --plugins to specify a comma-separated list of plugins you want installed. For example, --plugins analysis-phonetic,river-rabbitmq.

Complete example:

$ heroku addons:add foundelasticsearch --elasticsearch-version 0.90.5 --plugins analysis-phonetic,river-rabbitmq

After the add-on has been added, version upgrades and plugin changes can be made through the add-on dashboard.

If you need to use custom plugins, you can upload and select plugins in the add-on dashboard.

Supported versions and plugins

We support versions in the 0.19, 0.20, 0.90 and 1.0 series.

New versions are made available for provisioning soon after they are released. You decide when, and whether, to upgrade.

You can also upload custom plugins.

Accessing the add-on dashboard

The Found Elasticsearch dashboard lets you manage the cluster: upgrading versions, enabling plugins, editing the access control lists (ACLs) and viewing the logs emitted by the nodes.

Found Elasticsearch Dashboard

The dashboard can be accessed via the CLI:

$ heroku addons:open foundelasticsearch
Opening foundelasticsearch for <your_app_name>

or by visiting the Heroku apps web interface and selecting Found Elasticsearch from the Add-ons menu.

Access control

We strongly advise configuring access control for your cluster.

Because not all Elasticsearch clients support HTTP Basic authentication, the default configuration does not require it. This means that anyone who knows the cluster ID has full access to your cluster.

We highly recommend using the access control feature to at least require authentication. Authentication uses HTTP Basic authentication, which most, but not all, HTTP and Elasticsearch libraries support.

You can limit access based on path, source IP, HTTP method, username/password and whether SSL is used. The access control section of the dashboard has annotated samples you can use as templates for your own ACLs.
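Most HTTP libraries let you attach Basic credentials to each request. As a minimal sketch with Ruby's standard Net::HTTP, assuming a username and password that you have defined in an ACL (the helper names authenticated_get and perform are hypothetical), a request can be prepared and sent like this:

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a GET request carrying HTTP Basic credentials.
# username/password stand for a user you defined in the dashboard's ACL.
def authenticated_get(url, username, password)
  request = Net::HTTP::Get.new(URI.parse(url))
  request.basic_auth(username, password) # sets the Authorization header
  request
end

# Send a prepared request, using TLS when the URL scheme is https.
def perform(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

Since Basic credentials are only base64-encoded, not encrypted, combining them with an https URL is the safer choice.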

Using the add-on

In this section, we briefly go through indexing, updating, retrieving, searching and deleting documents in an Elasticsearch cluster, using curl as our client from the command line.

Indexing

To index documents, simply POST documents to Elasticsearch:

$ curl http://<cluster_id>.foundcluster.com:9200/my_index/my_type -XPOST -d '{
    "title": "One", "tags": ["ruby"]
}'
{"ok":true,"_index":"my_index","_type":"my_type","_id":"HAJppjLLTROm8i35IJEQWQ","_version":1}

In the above example, the index my_index is created dynamically when the first document is inserted into it. All documents in Elasticsearch have a type and an id, which are echoed as _type and _id in the JSON responses. If no id is specified during indexing, a random id is generated.
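The same indexing request can be made from Ruby with Net::HTTP. This is a minimal sketch, not an official client; index_request is a hypothetical helper, and the base URL is assumed to come from FOUNDELASTICSEARCH_URL.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build a POST that indexes doc into index/type and lets Elasticsearch
# generate the id, mirroring the curl example above.
def index_request(base_url, index, type, doc)
  request = Net::HTTP::Post.new(URI.join(base_url, "/#{index}/#{type}"))
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(doc)
  request
end
```

To send it, open a connection to the cluster host and port with Net::HTTP.start and call http.request(request) inside the block; the JSON response has the same shape as the curl output.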

Bulk indexing

To achieve the best possible performance, using the Bulk API is highly recommended. So let us index a couple more documents using the bulk API:

$ curl http://<cluster_id>.foundcluster.com:9200/my_index/my_type/_bulk -XPOST -d '
{"index": {}}
{"title": "Two", "tags": ["ruby", "python"] }
{"index": {}}
{"title": "Three", "tags": ["java"] }
{"index": {}}
{"title": "Four", "tags": ["ruby", "php"] }
'

Elasticsearch should then give us output similar to this:

{"took":10, "items": [
    {"create":{"_index":"my_index","_type":"my_type","_id":"v7ufoXxSSuOTckcyL7hg4Q","_version":1,"ok":true}},
    {"create":{"_index":"my_index","_type":"my_type","_id":"wOzT31EnTPiOw1ICTGX-qA","_version":1,"ok":true}},
    {"create":{"_index":"my_index","_type":"my_type","_id":"_b-kbI1MREmi9SeixFNEVw","_version":1,"ok":true}}
]}
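When bulk requests are generated from code, the body is easiest to assemble line by line. Here is a minimal Ruby sketch (bulk_body is a hypothetical helper) that produces the same newline-delimited format as the curl example above. The Bulk API expects the body to end with a newline, which this helper guarantees.

```ruby
require 'json'

# Serialize documents into the newline-delimited Bulk API format: an action
# line followed by a source line for each document. The empty "index" action
# lets Elasticsearch assign ids, as in the curl example above.
def bulk_body(docs)
  docs.map { |doc| JSON.generate({ 'index' => {} }) + "\n" + JSON.generate(doc) + "\n" }.join
end
```

The resulting string is sent as the body of a POST to /my_index/my_type/_bulk.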

Updating

To update an existing document in Elasticsearch, simply POST the updated document to http://<cluster_id>.foundcluster.com:9200/my_index/my_type/<id>, where <id> is the id of the document. For example, to update the last document indexed above:

$ curl http://<cluster_id>.foundcluster.com:9200/my_index/my_type/_b-kbI1MREmi9SeixFNEVw -XPOST -d '{
    "title": "Four updated", "tags": ["ruby", "php"]
}'
{"ok":true,"_index":"my_index","_type":"my_type","_id":"_b-kbI1MREmi9SeixFNEVw","_version":2}

As you can see, the document is updated and the _version counter is automatically incremented.

Retrieving documents

We can take a look at the data we indexed by simply issuing a GET request to the document:

$ curl http://<cluster_id>.foundcluster.com:9200/my_index/my_type/_b-kbI1MREmi9SeixFNEVw
{"exists":true,"_index":"my_index","_type":"my_type","_id":"_b-kbI1MREmi9SeixFNEVw","_version":2,"_source":{"title": "Four updated", "tags": ["ruby", "php"]}}

If Elasticsearch finds the document, it returns an HTTP status code of 200 OK and sets exists: true in the result. Otherwise, it returns an HTTP status code of 404 Not Found and the result contains exists: false.
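Client code usually needs to tell these two cases apart. The following is a minimal Ruby sketch; source_from_get is a hypothetical helper. To our understanding, the 1.0 series reports this flag as found rather than exists, so the sketch accepts either key.

```ruby
require 'json'

# Return the _source of a GET-by-id response, or nil if the document does
# not exist. status may be an Integer or a String such as Net::HTTPResponse#code.
def source_from_get(status, body)
  code = status.to_i
  return nil if code == 404
  raise "unexpected HTTP status #{code}" unless code == 200

  parsed = JSON.parse(body)
  # 0.90-style responses use "exists"; 1.0-style responses use "found".
  (parsed['exists'] || parsed['found']) ? parsed['_source'] : nil
end
```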

Searching

Search requests may be sent to the following Elasticsearch endpoints:

http://<cluster_id>.foundcluster.com:9200/_search
http://<cluster_id>.foundcluster.com:9200/{index_name}/_search
http://<cluster_id>.foundcluster.com:9200/{index_name}/{type_name}/_search

We can search using either HTTP GET or HTTP POST requests. To search using HTTP GET, we specify the query with URI parameters:

$ curl 'http://<cluster_id>.foundcluster.com:9200/my_index/my_type/_search?q=title:T*'

A full explanation of the allowed parameters can be found in the Elasticsearch URI Request documentation.
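When building such URLs in code, the q parameter should be URL-encoded so that characters such as ':' and spaces arrive intact. This is a minimal Ruby sketch using the standard URI module; uri_search_url is a hypothetical helper.

```ruby
require 'uri'

# Build a URI Request search URL for path (e.g. "/my_index/my_type"),
# encoding the query string with www-form rules.
def uri_search_url(base_url, path, query)
  uri = URI.join(base_url, "#{path}/_search")
  uri.query = URI.encode_www_form(q: query)
  uri
end
```

For the example above, the query becomes q=title%3AT*, which Elasticsearch decodes back to title:T*.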

For more complex queries, we send the query as a JSON request body in an HTTP POST request. In the next example, we create a facet on the tags field:

Note that we added ?pretty=true to the request, which makes Elasticsearch return more human-readable JSON. For performance reasons, this is not recommended in production.

$ curl 'http://<cluster_id>.foundcluster.com:9200/my_index/my_type/_search?pretty=true' -XPOST -d '{
    "query": {
        "query_string": {"query": "*"}
    },
    "facets": {
        "tags": {
            "terms": {"field": "tags"}
        }
    }
}'

A full explanation of how the request body is structured can be found in the Elasticsearch Request Body documentation.
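In Ruby, the same request body can be built as a hash and serialized with JSON.generate, and the term counts read back from the response. This is a minimal sketch; tags_facet_body and facet_counts are hypothetical helpers, and the response shape assumed is the terms facet format (a terms array of term/count objects).

```ruby
require 'json'

# Ruby equivalent of the request body above: match all documents and count
# the most frequent values of the tags field (size caps the number of terms).
def tags_facet_body(size = 10)
  {
    query:  { query_string: { query: '*' } },
    facets: { tags: { terms: { field: 'tags', size: size } } }
  }
end

# Extract [term, count] pairs from a terms-facet search response.
def facet_counts(response_json, facet = 'tags')
  JSON.parse(response_json).fetch('facets').fetch(facet).fetch('terms')
      .map { |t| [t['term'], t['count']] }
end
```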

To execute multiple queries in one request, use the Multi Search API.
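Like the Bulk API, Multi Search takes a newline-delimited body: a header line naming the index (and optionally the type), followed by the search body, for each query. This is a minimal Ruby sketch; msearch_body is a hypothetical helper.

```ruby
require 'json'

# Build a Multi Search body from [header, body] pairs, one JSON object per
# line and a trailing newline after the last search body.
def msearch_body(searches)
  searches.map { |header, body| JSON.generate(header) + "\n" + JSON.generate(body) + "\n" }.join
end
```

The result is POSTed to /_msearch, and the response contains one result per search, in order.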

Deleting

Documents are deleted from Elasticsearch by sending HTTP DELETE requests.

  1. Delete a single document:

    $ curl http://<cluster_id>.foundcluster.com:9200/{index}/{type}/{id} -XDELETE
    
  2. Delete all documents of a given type:

    $ curl http://<cluster_id>.foundcluster.com:9200/{index}/{type} -XDELETE
    
  3. Delete a whole index:

    $ curl http://<cluster_id>.foundcluster.com:9200/{index} -XDELETE
    
  4. Delete all documents matching a query:

    For example, to delete all documents whose title starts with T:

        $ curl http://<cluster_id>.foundcluster.com:9200/{index}/{type}/_query -XDELETE -d '{
            "query_string" : { "query" : "title:T*" }
        }'
    

    See Elasticsearch Delete By Query for a complete overview of this functionality.
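The first three cases differ only in how much of the path is given. This is a minimal Ruby sketch of that pattern (delete_request is a hypothetical helper); it prepares the DELETE without sending it, so it can be passed to http.request on a Net::HTTP connection.

```ruby
require 'erb'
require 'net/http'
require 'uri'

# Build (but do not send) a DELETE for a single document (index, type, id),
# all documents of a type (index, type) or a whole index (index only).
def delete_request(base_url, index, type = nil, id = nil)
  raise ArgumentError, 'an id requires a type' if id && type.nil?
  # Percent-encode each path segment so unusual ids cannot alter the path.
  segments = [index, type, id].compact.map { |s| ERB::Util.url_encode(s) }
  Net::HTTP::Delete.new(URI.join(base_url, '/' + segments.join('/')))
end
```

Because deleting a type or an index removes data permanently, it is worth guarding calls without an id behind an explicit confirmation in application code.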

Elasticsearch clients

Any Elasticsearch client that uses the REST API can be used with this add-on, as can the Java Transport client. We do not support the Node client.

Elasticsearch comes with a REST API, which can be used directly via any HTTP client.

Many higher-level clients have been built on top of this API in various programming languages. A large list of Elasticsearch clients and integrations can be found here.

To use the Transport client, you need the Found Elasticsearch Transport Module. It enables authentication and encryption, which are not available with the regular Transport client.

Tire client (Ruby)

Tire is a rich and comfortable Ruby API on top of the REST API, with built-in support for Rails.

Configuring Tire

require 'rubygems'
require 'tire'

Tire::Configuration.url ENV['FOUNDELASTICSEARCH_URL']

Remember to add the tire gem to your Gemfile and update the application dependencies with bundler.

$ bundle install

Indexing documents

We start by indexing a couple of documents:

Tire.index 'articles' do
  delete
  create

  store :title => 'One',   :tags => ['ruby']
  store :title => 'Two',   :tags => ['ruby', 'python']
  store :title => 'Three', :tags => ['java']
  store :title => 'Four',  :tags => ['ruby', 'php']

  refresh
end

Searching

After indexing the documents, we search for articles that have a title starting with “T”:

s = Tire.search 'articles' do
  query do
    string 'title:T*'
  end
end

s.results.each do |document|
  puts "* #{ document.title } [tags: #{document.tags.join(', ')}]"
end

# * Two [tags: ruby, python]
# * Three [tags: java]
# (result order may vary)

ActiveModel integration

See the Tire documentation for more examples and in-depth explanations on how to use Tire to integrate with ActiveModel.

Removing the add-on

Found Elasticsearch can be removed via the CLI.

Warning: This will destroy all associated data and cannot be undone!

$ heroku addons:remove foundelasticsearch
-----> Removing foundelasticsearch from <your_app_name>... done, vX (free)

Migrating between plans

Application owners should carefully manage the migration timing to ensure proper application function during the migration process.

Available memory is a very important factor when sizing your Elasticsearch cluster, and replicating across multiple data centers is important for the resilience of production applications. Our plans are differentiated on the available reserved memory and disk quota, as well as on the number of data centers.

Use the heroku addons:upgrade command to migrate to a new plan:

$ heroku addons:upgrade foundelasticsearch:newplan
-----> Upgrading foundelasticsearch:newplan to <your_app_name>... done, vX ($YY/mo)
       Your plan has been updated to: foundelasticsearch:newplan

Upgrading to a new plan is done by extending the existing cluster with new nodes and migrating data from the old nodes to the new nodes. When the migration is finished, the old nodes are shut down and removed from the cluster. You can search and index while this happens.

Integrating with Heroku Postgres or another SQL database

While we generally recommend investing time in a well-performing indexing strategy, the JDBC river can be a great way to get started with Elasticsearch.

Connecting to PostgreSQL and MySQL is only supported on Elasticsearch 0.90.3 or newer. You can specify the version when provisioning or by using the add-on dashboard.

Using the jdbc-river, Elasticsearch can connect and fetch data directly from a PostgreSQL or MySQL database. A “river” is a pluggable service running within an Elasticsearch cluster pulling data (or being pushed with data) that is then indexed into the cluster.

To get started, add the jdbc-river plugin to your Elasticsearch cluster, either when adding the add-on or by using the add-on dashboard. See Accessing the add-on dashboard for more information about how to access it.

Once the required plugin has been added to your cluster, we need to create the river with the correct database information, which can be found by using the heroku config command when using Heroku Postgres:

$ heroku config
=== your-app-name Config Vars
HEROKU_POSTGRESQL_MAROON_URL: postgres://USERNAME:PASSWORD@HOSTNAME:PORT/DATABASE_NAME

To create the river instance, send an HTTP PUT request with the river configuration to Elasticsearch:

$ curl -XPUT 'https://<cluster_id>.foundcluster.com:9243/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "org.postgresql.Driver",
        "url" : "jdbc:postgresql://HOSTNAME:PORT/DATABASE_NAME",
        "user" : "USERNAME",
        "password" : "PASSWORD",
        "sql": "select * from TABLE_NAME"
    },
    "index" : {
        "index" : "jdbc",
        "type" : "jdbc"
    }
}'
{"ok":true,"_index":"_river","_type":"my_jdbc_river","_id":"_meta","_version":1}
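Rather than copying the credentials by hand, the river configuration can be derived from the Heroku Postgres config var. This is a minimal Ruby sketch, assuming a URL of the form shown by heroku config above and a user and password without percent-encoded characters; jdbc_river_meta is a hypothetical helper.

```ruby
require 'uri'

# Turn a Heroku Postgres URL (postgres://USER:PASSWORD@HOST:PORT/DB) into a
# JDBC river _meta document like the one in the curl example above.
def jdbc_river_meta(database_url, sql, index: 'jdbc', type: 'jdbc')
  db = URI.parse(database_url)
  unless %w[postgres postgresql].include?(db.scheme)
    raise ArgumentError, "expected a postgres:// URL, got #{db.scheme.inspect}"
  end

  {
    'type' => 'jdbc',
    'jdbc' => {
      'driver'   => 'org.postgresql.Driver',
      # Fall back to PostgreSQL's default port when the URL omits one.
      'url'      => "jdbc:postgresql://#{db.host}:#{db.port || 5432}#{db.path}",
      'user'     => db.user,
      'password' => db.password,
      'sql'      => sql
    },
    'index' => { 'index' => index, 'type' => type }
  }
end
```

Serializing the result with JSON.generate gives a body that can be sent with an HTTP PUT to /_river/my_jdbc_river/_meta, just like the curl request.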

By default, the data will be indexed once and the river will be stopped, but other strategies are available.

To remove the river at a later time, send an HTTP DELETE request:

$ curl -XDELETE 'https://<cluster_id>.foundcluster.com:9243/_river/my_jdbc_river/'
{"ok":true}

For more information about configuration options, please see the JDBC river documentation.

Support

Please mail support@found.no if you have any problems.

Additional resources

Found Elasticsearch exposes the majority of the Elasticsearch REST API, which means that most valid Elasticsearch API requests will work with your provisioned instance. Please refer to the Elasticsearch guide for more in-depth explanations of all the possibilities.