DevOps is about more than just running infrastructure smarter, with more automation. It’s a means to engineering agile, robust, modern software by blending the traditionally siloed areas of administration and development. This means infrastructure as code, and developer concern for operations. It means more than calling sysadmins “DevOps Engineers”; it means operators and developers working together seamlessly.
NoSQL by its nature lends itself to DevOps: it thrives under teams that take a DevOps approach, and it can really hurt you under teams that don’t. MongoDB is no exception to this rule. It has gained enormous traction in the developer community by blending flexibility, features, and performance. Some teams have been seduced by the ease with which they set up MongoDB for development, and neglect to think about capacity planning, tuning, monitoring, and maintenance. Do not let this happen to you, or you will pay the price at 3 a.m. when you are scrambling to figure out why your swell app has ground to a halt. Instead, take the time to learn MongoDB’s needs and capabilities, because it doesn’t just make development easy; it can truly sweeten operations with things like hassle-free replication and full access to administrative tasks via the same API used to perform data access. Let’s have a look at MongoDB from a DevOps perspective.
DevOps Wants:
Clueful administrators.
There isn’t much room in the DevOps movement for system administrators who don’t speak at least one high-level language. I personally find Ruby most appealing for systems programming, but Python is also a fine choice. MongoDB admins should be practiced at performing administrative tasks in both the JavaScript shell and the programming language of their choice, setting the stage for a rich infrastructure monitoring and automation codebase. Administration best practices are extensively, if not exhaustively, documented in the administration docs, so start there. Pay particular attention to the Operational Strategies section of the manual, which provides a checklist of concerns you should address as you plan your system’s design and deployment.
Developers who know the ramifications of their infrastructure.
The age of throwing applications over the wall to operations is mercifully drawing to a close. Developers must understand the nature of their infrastructure, how best to utilize it, and how it handles failures. MongoDB affords developers the ability to code to infrastructure details without coupling to particular instances, or even hostnames. Here, tags, write concerns, and read preferences are your friends. Code that runs reports (which should be kept away from your production instances) can specify that it will read only from instances tagged “reporting”. Admins can add instances tagged “reporting”, or swap which instances carry that tag, with a few lines of code. Refer to the end of this article for example code.
Infrastructure state to alert cleanly, and trends to be visualized.
The Monitoring section of the MongoDB Manual provides an overview of the ways to monitor your instances, including third-party solutions, both self-hosted and SaaS. MongoDB Management Service (MMS) is MongoDB, Inc.’s free monitoring, trending, and alerting service, which you should set up from day one to monitor every single instance you rely on.
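Even if you adopt a hosted service, it’s worth knowing that the raw material for monitoring is one database command away. Here is a minimal sketch in Python of pulling a metric from serverStatus (the host and threshold are illustrative, not prescriptions):
import pymongo

c = pymongo.MongoClient('Kusanagi.local:29017')
status = c.admin.command('serverStatus')

# serverStatus exposes counters worth trending, e.g. open connections
current = status['connections']['current']
available = status['connections']['available']
if available < 100:  # an example threshold; tune to your workload
    print('connection headroom is low: %d in use, %d available' % (current, available))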
Infrastructure to be programmable.
Ok, you are monitoring your instances; why not work on automating your infrastructure to respond to faults and load requirements? While MMS is a no-brainer, it’s not going to plug into your infrastructure code. You can roll your own integration using MongoDB’s handy exposure of its internals via database commands and, for replication, the local database. For example, you can check on replica set lag in Python like so:
import pymongo

# three mongods running on my laptop
c = pymongo.MongoReplicaSetClient(
    'Kusanagi.local:29017,Kusanagi.local:29018,Kusanagi.local:29019',
    replicaSet='test'
)
rs_status = c.admin.command('replSetGetStatus')
lag_threshold = 120  # up to two minutes of replica lag is ok for this example

# state 1 is PRIMARY, state 2 is SECONDARY
primary_optime = [member['optime'].as_datetime()
                  for member in rs_status['members'] if member['state'] == 1][0]
secondary_optimes = [member['optime'].as_datetime()
                     for member in rs_status['members'] if member['state'] == 2]
for secondary_optime in secondary_optimes:
    # total_seconds() is a method; compare elapsed lag against the threshold
    if (primary_optime - secondary_optime).total_seconds() > lag_threshold:
        pass  # do stuff
Provisioning MongoDB instances programmatically on Amazon EC2 is easy with the official AMIs, but automating MongoDB infrastructure goes beyond that. Learn to use the database commands to configure replica sets, so you can script joining new instances to a set, as sketched below. If you use sharding, you can update tag ranges for sharded clusters to migrate chunks to shards specifically provisioned to warehouse data.
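Here is a minimal sketch of that in Python, appending a new member to the current config and reissuing it via replSetReconfig (the new host on port 29020 is a hypothetical addition, and the pymongo 2.x MongoReplicaSetClient is assumed):
import pymongo
from pymongo.errors import AutoReconnect

c = pymongo.MongoReplicaSetClient(
    'Kusanagi.local:29017,Kusanagi.local:29018,Kusanagi.local:29019',
    replicaSet='test'
)
# read the current config out of the local database
config = c['local']['system.replset'].find_one()
new_id = max(member['_id'] for member in config['members']) + 1
config['members'].append({'_id': new_id, 'host': 'Kusanagi.local:29020'})
config['version'] += 1  # reconfig demands a strictly increasing version
try:
    c.admin.command('replSetReconfig', config)
except AutoReconnect:
    pass  # connections are commonly reset during a reconfig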
Disaster recovery to be robust and automated.
There are numerous ways to back up MongoDB, and the one you pick should be appropriate for your deployment, so read through your options in the Backups section of the MongoDB Manual. You should probably read that section twice.
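That said, even a simple automated dump beats no backup at all. A sketch of a nightly job in Python (the host, the output path, and the choice of mongodump over the manual’s other strategies are all assumptions for illustration):
import datetime
import subprocess

# dump from a secondary to keep load off the primary
stamp = datetime.datetime.utcnow().strftime('%Y%m%d')
subprocess.check_call([
    'mongodump',
    '--host', 'Kusanagi.local:29018',
    '--out', '/backups/mongodb-%s' % stamp,
])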
Release management to be painless.
By being schema-free, MongoDB frees enterprises from the hassles of schema alterations and their attendant maintenance windows. Instead of altering a user table to include a new column for “birthplace”, a new release of your application simply begins to write user documents with that attribute included. Naturally, your user profile rendering will not assume this value is set and throw an exception; it will just not display “birthplace”. This does not excuse you from change management! In a DevOps shop, everything about monitoring that applies to production applies to your integration environment, which I sincerely hope is continuous. Configure all the monitors you’d have in production, and alert developers when metrics go out of range.
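A sketch of that defensive rendering in Python (the ‘app’ database, ‘users’ collection, and field names are illustrative):
import pymongo

c = pymongo.MongoClient('Kusanagi.local:29017')
users = c['app']['users']

def render_profile(user_id):
    user = users.find_one({'_id': user_id})
    profile = {'name': user['name']}
    # documents written before the new release lack 'birthplace';
    # read it defensively instead of assuming it is set
    if 'birthplace' in user:
        profile['birthplace'] = user['birthplace']
    return profile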
Closing Thoughts
In a DevOps culture, admins read code, developers plan infrastructure, and they both attend the same planning meetings. Engineers who work with MongoDB, both in operations and development, should read the WHOLE manual. If your developers read the development section and your admins read the administration section, you’re really only doing Dev and Ops.
Example Code – Tags
Tagging an instance of a replica set looks like this in the MongoDB shell:
test:PRIMARY> config = rs.config()
{
"_id" : "test",
"version" : 2,
"members" : [
{
"_id" : 0,
"host" : "Kusanagi.local:29017"
},
{
"_id" : 1,
"host" : "Kusanagi.local:29018"
},
{
"_id" : 2,
"host" : "Kusanagi.local:29019"
}
]
}
test:PRIMARY> config.members[0].tags = { "use": "production" }
{ "use" : "production" }
test:PRIMARY> config.members[1].tags = { "use": "reporting" }
{ "use" : "reporting" }
test:PRIMARY> config.members[2].tags = { "use": "production" }
{ "use" : "production" }
test:PRIMARY> rs.reconfig(config)
Fri Mar 7 18:35:25.040 DBClientCursor::init call() failed
Fri Mar 7 18:35:25.042 trying reconnect to 127.0.0.1:29017
Fri Mar 7 18:35:25.042 reconnect 127.0.0.1:29017 ok
reconnected to server after rs command (which is normal)
Here’s Ruby code, assuming the same config:
require 'mongo'

# I'm running 3 mongods on my laptop
rset = Mongo::MongoReplicaSetClient.new(["Kusanagi.local:29017",
                                         "Kusanagi.local:29018",
                                         "Kusanagi.local:29019"])
# using the local db to get current config
config = rset['local']['system.replset'].find_one()
admin_db = rset['admin']
config['members'][0]['tags'] = { "use" => "production" }
config['version'] += 1 # mongod avoids reconfig conflicts by insisting on an increasing version value
verify_version = config['version']
begin
  admin_db.command({ 'replSetReconfig' => config })
rescue Mongo::ConnectionFailure # expect your connection to close during this operation
  actual_version = rset['local']['system.replset'].find_one()['version']
  raise unless actual_version == verify_version
end
Specifying tag sets only applies to reads from secondaries (since there is only one primary to choose from). In Ruby, it looks like this:
rset = Mongo::MongoReplicaSetClient.new(["Kusanagi.local:29017",
                                         "Kusanagi.local:29018",
                                         "Kusanagi.local:29019"])
analytics = rset['analytics'] # say we have an 'analytics' DB
cursor = analytics['page_views'].find({'page' => 'home'},
                                      :read => :secondary,
                                      :tag_sets => {:use => 'reporting'})
This query specifies, with the :read option, that only secondaries should answer, and with the :tag_sets option, that only secondaries tagged with 'use': 'reporting' can be used. If I were to shut down members[0] (Kusanagi.local:29017) while it was primary, and members[1] (Kusanagi.local:29018) became primary, the above query would fail, because as primary it is no longer eligible to answer queries directed at secondaries, even though members[1] is tagged with 'use': 'reporting'. This is exactly what you want: it keeps your reporting queries from hosing your production instances. After taking stock of the situation, you might decide to assign 'use': 'reporting' to members[2], and the above query would then work again.
Note that the MongoReplicaSetClients created by the Ruby driver do not refresh replica set state by default (as of v1.9.2); they use the config that was present when the connection was established. This is not the case for all drivers, so make sure you determine your driver’s behavior. It means that if you reassign tag values, existing connections will not notice the change. If I were to change the tags for members[1] to 'use': 'not reporting', the above query would not fail; it would route right to members[1]. So if you are using tagging, and Ruby, instantiate your client with the :refresh_mode option:
rset = Mongo::MongoReplicaSetClient.new(["Kusanagi.local:29017",
                                         "Kusanagi.local:29018",
                                         "Kusanagi.local:29019"],
                                        :refresh_mode => :sync)
See the MongoReplicaSetClient documentation for details. The Python driver handles this by launching a background thread to monitor replica set health and config changes, though the API does not document any means to configure its polling interval.
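For comparison, here is roughly what the tagged read looks like in Python (a sketch assuming the pymongo 2.x API):
import pymongo
from pymongo import ReadPreference

c = pymongo.MongoReplicaSetClient(
    'Kusanagi.local:29017,Kusanagi.local:29018,Kusanagi.local:29019',
    replicaSet='test'
)
# read preference and tag sets can be passed per query
cursor = c['analytics']['page_views'].find(
    {'page': 'home'},
    read_preference=ReadPreference.SECONDARY,
    tag_sets=[{'use': 'reporting'}]
)
Because the Python driver refreshes replica set state in the background, reassigned tags will be noticed without reinstantiating the client.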