Building Full Text Search and Pattern Matching on iOS with Heroku Postgres

Last Updated: 25 March 2014

ios mobile postgres

Table of Contents

Search is an essential part of any modern applications. For mobile applications in particular, search may be the primary window into an application’s content, so it is crucial to provide a responsive, robust search interface to users.

By using the Heroku Postgres add-on in your Heroku applications, you can make use Postgres’s advanced full text search indexing and operators. Take advantage of these features to easily build search APIs that can be integrated into native iOS controls.

Overview

This article highlights some best practices in designing and implementing search interfaces on iOS using a Sinatra backend. Though the example application uses Sinatra and the Sequel ORM, one could easily implement an equivalent backend API using a comparable web framework and ORM in any other language.

Code for the iOS Client and the Sinatra application is available on GitHub.

Final Product - Master & Detail View

Our sample application lets you find information about common houseplants. It follows the master-detail format, such that plants are listed in a master table view, and tapping on a particular plant will push a detail view for that plant. However, rather than showing a paginated list of plants, in this example the master table view dynamically loads results as the user types in the search bar. The user can search for a plant by either its Latin name and common names, or the content of its full description.

Prerequisites

  • Basic Objective-C knowledge, including a development environment running Mac OS X with Xcode 4.2 installed.
  • The Heroku Toolbelt must be installed and configured.
  • Basic Ruby knowledge, including an installed version of Ruby 1.9.2, Rubygems, Bundler, and Rails 3.2.
  • A PostgreSQL server running locally. Installation instructions can be found in Local Postgres Installation.
  • A Heroku user account. Signup is free and instant.

Full text indexing on Postgres

Postgres comes with built-in full text search functionality. Here is a rundown of the salient data types and operators:

  • TSVector: a sorted list of distinct lexemes, or root words (for example, eat, eats, ate, and eating are all forms of the same lexeme).
  • TSQuery: a collection of lexemes to be searched for that respects Boolean operators (AND / OR / NOT, and their symbolic counterparts) and parentheses for logical groupings of terms.
  • @@: the match operator, which can be used in WHERE clauses in statements to find tsvector values that match a particular tsquery.

Creating the base schema

The example application uses a plants table, with varchar columns common_name, latin_name, & description:

CREATE TABLE plants (
    id SERIAL,
    latin_name varchar,
    common_name varchar,
    description varchar
);

Adding a TSVector column

In order to make the contents of the plants table searchable, a TSVector column is added:

ALTER TABLE plants
  ADD tsv TSVector;

Adding a trigger function

Although the column is in place, its contents must be updated to include the text from the searchable columns. The easiest way to ensure that this column is kept up-to-date is to use a trigger, which can be set to execute every time a row is inserted or updated:

CREATE TRIGGER TS_tsv
  BEFORE INSERT OR UPDATE ON plants
FOR EACH ROW EXECUTE PROCEDURE
  tsvector_update_trigger(tsv, 'pg_catalog.english', latin_name, common_name, description);

tsvector_update_trigger() is a function that updates a TSVector column from the contents of the specified columns. Here, the tsv column will store a TSVector representation from latin_name, common_name, and description.

Adding a full text index

The last step is to add an index on this column to maximize query performance:

CREATE INDEX tsv_GIN ON plants
  USING GIN(tsv);

For full text indexing, Postgres provides two index types, GIN and GiST, which have significantly different performance characteristics that are worth mentioning:

  • GIN index lookups are about three times faster than GiST
  • GIN indexes take about three times longer to build than GiST
  • GIN indexes are moderately slower to update than GiST indexes
  • GIN indexes are two-to-three times larger than GiST indexes

Since the content in the example will remain relatively static, the optimized performance on lookups make up for the extra time and space needed to create the index in the first place.

Adding a pattern matching index

Pattern matching is a standard feature of SQL databases. If a column contains relatively short strings, such as names or titles, then basic filtering on that column individually can be more easily accomplished using pattern matching than full text search.

In the example, plants can be filtered by name (common or Latin) as the user types. Since these columns will be used for lookup, it’s important that they are properly indexed:

CREATE INDEX latin_name_idx ON plants(latin_name);
CREATE INDEX common_name_idx ON plants(common_name);

Unlike the full text index, it is not important to specify the indexing scheme, as the default B-Tree index will suffice.

Full text search SQL

Using the full text index requires SQL of the following form:

SELECT * FROM plants
  WHERE tsv @@ to_tsquery('english', ':*?');

This statement returns all of the rows in which the tsv column (the sorted set of lexemes from the text of the searchable columns) matches the search query.

to_tsquery() is a function that returns a TSQuery object, and takes in two arguments: the dictionary used to normalize the query string, and the query string itself, which is preceded by :* in order to partially match on the initial content of the string.

Pattern matching SQL

Using the pattern matching index is just a matter of using the ILIKE clause:

SELECT * from plants
  WHERE common_name ILIKE '?%'
    OR latin_name ILIKE '?%';

This statement returns all of the rows in which the common_name or latin_name match the search query, using the ILIKE, or case-insensitive pattern match, operator. Here, the query string appends a %, the wildcard character, which allows for partial matches.

The API endpoint

Search results are provided through a single API endpoint, GET /plants/search, which takes two required params: q, the query string, and scope, either “title” or “content”. The appropriate search query is then performed:

case params[:scope]
when 'title'
  @plants = Plant.filter(:common_name.ilike("#{params[:q]}%") |
                         :latin_name.ilike("#{params[:q]}%"))
when 'content'
  @plants = Plant.filter("tsv @@ to_tsquery('english', ?)", "#{params[:q]}:*")
end

For the example project, the endpoint implementation can be found in app.rb. You can query the endpoint directly using curl:

$ curl "http://stark-galaxy-3381.herokuapp.com/plants/search?scope=title&q=nightshade"

{
    "plants": [
        {
            "common_name": "Nightshade, White Jasmine Nightshade",
            "description": "A large genus with plants of all life forms. Those usually grown are erect or vining shrubs producing star-shaped flowers.",
            "latin_name": "solanum"
        }
    ]
}

The API endpoint is really a thin veneer around the Postgres database, making use of the indexes and queries to perform the searches.

The iOS client

UISearchDisplayController manages the display of a search bar and a table view of search results.

On the client, PlantsViewController conforms to UISearchDisplayControllerDelegate, implementing -searchDisplayController:shouldReloadTableForSearchString: and -searchDisplayController:shouldReloadTableForSearchScope:; these delegate methods are called when either the search string or search scope are changed.

Both methods are set to update the search results through -filterContentsWithSearchString:searchScope, and returning NO, since search results will be updated manually once results are asynchronously loaded from the API:

PlantsViewController.m

- (void)filterContentsWithSearchString:(NSString *)searchString searchScope:(PlantSearchScope)searchScope {
    if ([[searchString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]] length] == 0) {
        return;
    }

    if (searchScope == TitleSearchScope) {
        NSPredicate *latinNamePredicate = [NSPredicate predicateWithFormat:@"latinName BEGINSWITH[cd] %@", searchString];
        NSPredicate *commonNamePredicate = [NSPredicate predicateWithFormat:@"commonName BEGINSWITH[cd] %@", searchString];

        self.filteredContents = [[self.contents filteredSetUsingPredicate:[NSCompoundPredicate orPredicateWithSubpredicates:[NSArray arrayWithObjects:latinNamePredicate, commonNamePredicate, nil]]] allObjects];
    }

    [Plant plantsWithSearchString:searchString searchScope:searchScope success:^(NSSet *plants) {
        self.contents = [self.contents setByAddingObjectsFromSet:plants];
        self.filteredContents = [plants allObjects];
    } failure:^(NSError *error) {
        NSLog(@"Error: %@", error);
    }];
}

First, the method returns early if the search string is blank. This is necessary since -searchDisplayController:shouldReloadTableForSearchScope: calls this method, and can be triggered with empty search text.

If the search is scoped to titles, local results can be filtered while search results are loaded asynchronously from the API. This pattern of filtering local data while waiting for remote data to be loaded is a great way to keep your app feeling fast and responsive, even when the network is slow or unavailable.

Plant +plantsWithSearchString:searchScope:success makes a call to the API. When the results are successfully loaded, they are added to a shared collection (used when filtering locally) and displayed in the search results table view.

Plant.m

Plant.m uses an HTTP client to talk to the API backend, passing in the appropriate flags and query, and processing the returned JSON.

Here’s an excerpt from the API call:

[[PlantsAPIClient sharedClient] getPath:@"/plants/search" parameters:mutableParameters success:^(AFHTTPRequestOperation *operation, id JSON) {
      NSMutableSet *mutablePlants = [NSMutableSet set];
      for (NSDictionary *attributes in [JSON valueForKey:@"plants"]) {
          Plant *plant = [[Plant alloc] initWithAttributes:attributes];
          [mutablePlants addObject:plant];
      }
  /* .. */
  } failure:^(AFHTTPRequestOperation *operation, NSError *error) {
    /* .. */
  }];

Next steps

This example provides a working template for searching records from an API. Depending on your particular needs for your application, you may want to look into additional PostgreSQL dictionaries and parsers, or at how to highlight search terms in search results.