
This article was contributed by Ben Wen

Ben works at MongoLab, a cloud hosting provider of the NoSQL MongoDB database. MongoLab monitors, backs up, and simplifies running MongoDB in production.


Building a Real-time, Polyglot Application with Node.js, Ruby, MongoDB and Socket.IO

Last Updated: 07 February 2014



Real-time apps, or evented apps that incorporate push-based interactivity, are the basis for a new generation of in-browser capabilities such as chat, large-scale games, collaborative editing and low-latency notifications. Though there are many technologies supporting the real-time movement, four stand out in particular: Ruby, Node.js, MongoDB and Socket.IO.

This article walks you through both the architecture and code for building a real-time app with these technologies.

If you have questions about Node.js on Heroku, consider discussing them in the Node.js on Heroku forums.

Prerequisites

To follow along you will need a Heroku account, the Heroku command-line tools, git, and a basic familiarity with Ruby and Node.js.

Overview

Sample code for this article's Ruby data-writer component and Node.js web application is available on GitHub and can be seen running at http://tractorpush.herokuapp.com/

The TractorPush real-time application uses a Ruby-based data component to push messages into a queue in MongoDB; the Node.js web app then receives those messages and pushes them to users' browsers via Socket.IO. In effect, the entire stack works in a push-notification manner.

Overview of TractorPush components

The two apps share a backing service (MongoDB), which acts as the glue connecting the independent components that make up the whole system.

MongoDB as a message queue

Queues are a powerful mechanism for connecting interoperating but independent processes. There are tens if not hundreds of commercially viable solutions, from the venerable and widely implemented IBM WebSphere MQ to newer open-source and open-standards ones like RabbitMQ and ZeroMQ.

MongoDB serves as a capable polyglot message queue because of its flexible document storage, wide variety of supported languages and tailable cursor “push” feature. Marshalling and unmarshalling of arbitrarily complex JSON messages is handled automatically. Safe writes are enabled for improved message durability and reliability, and tailable cursors are used to “push” data from MongoDB to Node.js.

Provision Ruby publisher app

Ruby’s mature web frameworks (Rails, Sinatra) make it ideal for the user-facing portion of most web apps, which is often the origin of queued messages. TractorPush simulates this with a data writer written in Ruby.

Create application

Clone the app from GitHub and create the app on Heroku.

$ git clone https://github.com/mongolab/tractorpush-writer-ruby.git
Cloning into tractorpush-writer-ruby...
...
Resolving deltas: 100% (8/8), done.

$ cd tractorpush-writer-ruby
$ heroku create tp-writer
Creating tp-writer... done, stack is cedar
http://tp-writer.herokuapp.com/ | git@heroku.com:tp-writer.git
Git remote heroku added

Attach MongoDB

Provision the MongoLab add-on to create the MongoDB instance that will contain the message queue.

$ heroku addons:add mongolab
----> Adding mongolab to tp-writer... done, v2 (free)
      Welcome to MongoLab.

Configure MongoDB capped collection

A MongoDB capped collection supports tailable cursors, which allow MongoDB to push data to listeners. When this type of cursor reaches the end of the result set, it does not close; instead it blocks until new documents are inserted into the collection and then returns them.

Capped collections are also extremely high-performance; in fact, MongoDB internally uses a capped collection to store its operations log (the oplog). As a trade-off, capped collections are fixed (i.e. “capped”) in size and cannot be sharded. For many applications this is acceptable.

The TractorPush application relies on a capped collection in MongoDB to store messages. Open the MongoLab add-on dashboard for the writer app with heroku addons:open mongolab and add a new collection using the “Add” button.

Add MongoLab collection

Name the collection messages and expand the advanced options to specify a capped collection of 8,000,000 bytes (ample space for the demo).

Create capped collection
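If you prefer the mongo shell to the dashboard UI, an equivalent capped collection can be created with db.createCollection. This is only a sketch; connect a shell to the database behind your MONGOLAB_URI first:

db.createCollection("messages", { capped: true, size: 8000000 })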

Deploy publisher app

We have created the message writer app on Heroku and have configured the MongoLab add-on. Next, deploy the application to Heroku.

$ git push heroku master
Counting objects: 20, done.
...
-----> Heroku receiving push
-----> Ruby app detected
...
-----> Launching... done, v4
       http://tp-writer.herokuapp.com deployed to Heroku

The TractorPush writer is a simple, headless Ruby program that writes messages to the MongoDB database. Looking at the Procfile reveals that the writer process type is labeled worker. Scale the worker process to a single dyno to begin queuing messages.

$ heroku ps:scale worker=1
Scaling worker processes... done, now running 1

Verify that the worker process is running with heroku ps.

$ heroku ps
=== worker: `ruby writer.rb`
worker.1: up for 47s
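The worker command reported by heroku ps comes straight from the app's Procfile, which consists of a single process declaration along these lines:

worker: ruby writer.rb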

Using Ruby to queue messages

The TractorPush application uses three types of document-based messages to demonstrate the flexibility of MongoDB object marshalling/unmarshalling: simple (or name-value), array and complex (or nested document) messages.
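As an illustration only (the field names below are hypothetical, apart from the messagetype label used later to filter complex messages), the three shapes look roughly like this once stored in MongoDB:

// Hypothetical examples of the three message shapes
{ "messagetype": "simple",  "name": "temperature", "value": 72 }
{ "messagetype": "array",   "values": [1, 1, 2, 3, 5, 8] }
{ "messagetype": "complex", "source": "tractor-7", "readings": [ { "t": 0, "value": 4 }, { "t": 1, "value": 9 } ] }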

The writer.rb script writes one of the three document types to a MongoDB collection at a default rate of one per second.

while(true)
  coll.insert(doc, :safe => true)
  sleep(rate)
end

The :safe write option ensures that the database has received and acknowledged the message document without error.

You can also access the MongoLab add-on dashboard with heroku addons:open mongolab to view the increase in the collection's contents and document count.

Viewing the logs shows the message types as they’re queued.

$ heroku logs -t --ps worker.1
2012-03-23T14:56:35+00:00 app[worker.1]: Inserting complex message
2012-03-23T14:56:36+00:00 app[worker.1]: Inserting simple message
2012-03-23T14:56:37+00:00 app[worker.1]: Inserting simple message
2012-03-23T14:56:38+00:00 app[worker.1]: Inserting array message

Because messages is a capped collection, old documents are discarded once the collection exceeds its size limit.
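You can confirm the cap from a mongo shell connected to the same database. These are standard collection helpers, shown here only as a quick check:

db.messages.isCapped()   // => true
db.messages.count()      // grows until the 8,000,000-byte cap forces old documents out
db.messages.stats()      // size and storage details for the collection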

Provision Node.js web app

The consumer side of the system is a Node.js web application that consumes messages from the capped collection. Clone the app locally and create the app on Heroku.

$ git clone https://github.com/mongolab/tractorpush-server.git
Cloning into tractorpush-server...
...
Resolving deltas: 100% (8/8), done.

$ cd tractorpush-server
$ heroku create tp-web
Creating tp-web... done, stack is cedar
http://tp-web.herokuapp.com/ | git@heroku.com:tp-web.git
Git remote heroku added

Sharing application resources

In order to use two language environments (Node.js and Ruby) as a single system, the two applications must share the message store. Share the MongoDB instance between the writer and web apps by copying the MONGOLAB_URI config var from the writer app and setting it on the Node.js web app.

Note that the MONGOLAB_URI includes your connection username and password. Please keep it confidential.

$ heroku config:set -a tp-web `heroku config -a tp-writer -s | grep MONGOLAB_URI`
Adding config vars and restarting app... done, v23
  MONGOLAB_URI => mongodb://heroku...eroku_app123456

Removing the mongolab add-on from tp-writer, or destroying the app itself, will irreversibly de-provision the database even though it's still referenced from tp-web. Be careful of such situations when working with shared resources.

Deploy web app

Applications deployed with a web process type will automatically be scaled to one web dyno.

Deploy the Node.js app to Heroku and check the status of the web process with heroku ps.

$ git push heroku master
Counting objects: 30, done.
...
-----> Heroku receiving push
-----> Node.js app detected
...
-----> Launching... done, v3
       http://tp-web.herokuapp.com deployed to Heroku

$ heroku ps
=== web: `node app.js`
web.1: up for 40s

Run heroku open to launch the application in your browser and see the JSON form of each message type being pushed, in real time, from the Ruby writer app to the Node.js web app and finally to your browser.

TractorPush screenshot

Consuming messages in Node.js

The readAndSend function in app.js of the Node.js web app is responsible for consuming the messages sent to the capped collection by the Ruby data-writer component.

function readAndSend(socket, collection) {
  // Open a tailable cursor over the capped collection in natural (insertion) order
  collection.find({}, {'tailable': 1, 'sort': [['$natural', 1]]}, function(err, cursor) {
    // Iterate the cursor on a 300ms interval, emitting each document to the connected client
    cursor.intervalEach(300, function(err, item) {
      if(item != null) {
        socket.emit('all', item);
      }
    });
  });
  // ...
}

The call to collection.find returns a cursor that iterates over all documents in the messages collection. The 'tailable' option tells the cursor to wait for additional data when it reaches the end of the result set rather than closing, mimicking real-time message delivery.

To demonstrate listening on multiple queues, an additional collection.find call does the same for complex message types only.

collection.find({'messagetype':'complex'}, {'tailable': 1, 'sort': [['$natural', 1]]}, function(err, cursor) {
  cursor.intervalEach(900, function(err, item) {
    if(item != null) {
      socket.emit('complex', item);
    }
  });
});

Notice that each iteration of the cursor emits the message document using socket.emit.

socket.emit('all', item);
// and
socket.emit('complex', item);

This pushes the message document (a JSON object) from the server to any connected browser clients. The library that powers this client-push feature is called Socket.IO.
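How readAndSend gets its socket and collection is specific to the sample's app.js, but conceptually the server side wires together along these lines. This is only a sketch under the assumption of the node-mongodb-native driver and the Socket.IO 0.9-style API used elsewhere in this article; it is not the app's exact code:

var http     = require('http');
var mongodb  = require('mongodb');
var socketio = require('socket.io');

// Start an HTTP server and attach Socket.IO to it (the real app also serves index.html).
var server = http.createServer();
server.listen(process.env.PORT || 3000);
var io = socketio.listen(server);

// Connect to the shared MongoDB instance provisioned earlier, then start
// tailing the capped collection for every browser that connects.
mongodb.MongoClient.connect(process.env.MONGOLAB_URI, function(err, db) {
  if (err) throw err;
  var collection = db.collection('messages');
  io.sockets.on('connection', function(socket) {
    readAndSend(socket, collection);   // readAndSend as shown above
  });
});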

Pushing messages to the browser with Socket.IO

The Node.js web application serves a single index.html page that uses Socket.IO to open a connection to the server and register listeners for the 'all' and 'complex' events.

var socket = io.connect('/');
socket.on('all', function (data) { ... });
socket.on('complex', function (data) { ... });
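The handler bodies are elided above. Purely as an illustration (the element id and rendering below are hypothetical, not taken from the sample's index.html), a handler might simply append each pushed document to the page:

socket.on('all', function (data) {
  // Hypothetical rendering: pretty-print each pushed document into a #messages element
  var pre = document.createElement('pre');
  pre.textContent = JSON.stringify(data, null, 2);
  document.getElementById('messages').appendChild(pre);
});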

True bi-directional messaging with WebSockets is now available as a Heroku labs feature.

In reality, the client is polling the server for more data, because the server-side Socket.IO configuration forces the connection to use XHR polling with a 10-second polling duration.

io.configure(function () {
  io.set("transports", ["xhr-polling"]);
  io.set("polling duration", 10);
});

Once the browser has displayed all available messages, the stream appears to stop. As new messages are inserted into the database, they are pushed to the browser and the display resumes.

Conclusion

The four technologies covered in this article are just one of many combinations that can support a componentized real-time app. More fundamental is the role that MongoDB's data flexibility and Cedar's polyglot capabilities play in eschewing monolithic applications in favor of a more modular system design.