Any shop with a DevOps culture and MongoDB-backed applications uses replica sets… not just for production applications, but for those still in the prototyping phase. Replication is critical for uptime, so your Ops team has been working with your Devs throughout the development of your project, and you didn’t defer a replication strategy until after you had a working app.
Nor did you neglect to plan for your reporting needs. It is a given that beyond running your app, you need analytics on user behavior, general use, and other business-specific metrics. You can’t just run reports against your live database (for reasons I detail below, in case they are not yet obvious to you). In your DevOps shop, developers and admins discussed, at the inception of the project, a plan for sequestering reporting tasks so that they do not interfere with production tasks.
Keeping Reporting Reads Away From Production
Limiting reporting queries to dedicated nodes is a canonical use case, cited throughout the MongoDB replication documentation. Reporting does not require writes and tolerates eventually consistent data. Daily summaries do not suffer if they are derived from data that is seconds or minutes stale, and the fundamental meaning of a user behavior report does not change if the counts are missing a few actions or some tallies are slightly misaligned.
But why is it so important to take advantage of this?
Aside: The Problem With Mingling Production and Analytics
Skip this section if this is already a no-brainer for you. If you want some background, read on.
Consider the working set, the subset of your entire database that MongoDB reads from and writes to over any given time span. Your production environment will have a set of active users, and the documents pertaining to their data will be what the OS attempts to keep in physical memory.
(Do not let your working set grow larger than RAM! If it’s heading there, you need to shard, so capacity planning is critical. But that’s a separate lesson. Monitor your instances with MongoDB’s free monitoring service, and maybe take an upcoming webinar on capacity planning.)
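As a quick sanity check, you can ask the server for its own estimate; a rough sketch from the shell, assuming MongoDB 2.4’s optional working set estimator in serverStatus:
PRIMARY> db.serverStatus({ workingSet: 1 }).workingSet
// an estimate of pages touched over a recent window, not an exact figure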
Even if your database is hundreds or thousands of times the size of your available memory, if you have planned your schema and indices well, MongoDB will run efficiently because the data outside the working set can remain untouched on disk. As users become idle, their documents will fall out of use, and the RAM they occupy will become available to contain documents for new active users.
Reporting jobs, though, read a wide range of data, do not visit the same data repeatedly, and each reporting task may address a completely different data set. On any database of decent size, this means these jobs require constant ejection of recently used documents from RAM to make room for new documents to be read. If you run these jobs on the same instance that is backing your production workload, your reporting jobs will fight with your production apps for RAM, continuously ejecting your live users’ data while your app continuously reloads it. Congratulations, you have built a thrashing machine.
Replica Sets With Dedicated Reporting Instances
You can build dedicated reporting nodes atop MongoDB replication by taking advantage of hidden replica set members, or tag sets in concert with read preferences. The first method is simpler, the second is more flexible.
Reviewing MongoDB Replica Sets
MongoDB Replica Sets provide high availability by replicating data to all the nodes in a set and giving clients seamless failover. They contain one primary node that accepts writes, while the rest are read-only secondaries. The members manage among themselves which node is primary, holding elections when conditions require a new one. Replica sets should contain an odd number of members to facilitate rapid elections without ties.
It is fundamentally not knowable whether unreachable machines are down or if the network has been partitioned, so if a majority of the nodes in a replica set go offline (say, 2 out of a 3-member set), even if a healthy primary remains, it will step down to a read-only secondary. Not doing so could lead to multiple machines declaring themselves primary in the case of a network partition, and horrific data inconsistencies.
Thus a replica set contains a minimum of 3 members, providing a fault tolerance of one machine failure.
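For reference, bringing up such a minimal set is a single shell call once each mongod is started with the same --replSet name; a sketch, using the same illustrative hostnames and ports as the examples below:
// from a mongo shell connected to one of the new members
rs.initiate({
  _id: "test",
  members: [
    { _id: 0, host: "Kusanagi.local:29017" },
    { _id: 1, host: "Kusanagi.local:29018" },
    { _id: 2, host: "Kusanagi.local:29019" }
  ]
})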
Reporting Instance, Hidden Member
Hidden members of a replica set are configured with priority: 0, to prevent them from ever being elected primary, and hidden: true, which prevents clients connected to the replica set from routing reads to them, even if those clients specify a read preference of secondary.
To read from this hidden member, you will use a standalone connection rather than the MongoReplicaSetClient type, and specify slave_ok.
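In the mongo shell, the equivalent of a standalone slave_ok connection is to connect directly to the hidden member’s port and call rs.slaveOk(); a quick sketch (the database and collection names match the reporting example below):
# connect to the hidden member directly
Kusanagi:~ avery$ mongo --port 29018
SECONDARY> rs.slaveOk()
SECONDARY> db.getSiblingDB("my_application").users.count()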
Hidden Member Setup
We can use the mongo shell to hide a member of an existing replica set:
# connect to primary directly
Kusanagi:~ avery$ mongo --port 29019
MongoDB shell version: 2.4.3
connecting to: 127.0.0.1:29019/test
// using my local replica set playground
PRIMARY> conf = rs.config()
{
"_id" : "test",
"version" : 21,
"members" : [
{
"_id" : 0,
"host" : "Kusanagi.local:29017",
},
{
"_id" : 1,
"host" : "Kusanagi.local:29018",
},
{
"_id" : 2,
"host" : "Kusanagi.local:29019",
}
]
}
// we'll use members[1], the instance on port 29018
PRIMARY> conf.members[1].priority = 0
PRIMARY> conf.members[1].hidden = true
PRIMARY> conf.version += 1
PRIMARY> rs.reconfig(conf)
Tue Apr 1 12:24:34.045 DBClientCursor::init call() failed
Tue Apr 1 12:24:34.046 trying reconnect to 127.0.0.1:29019
Tue Apr 1 12:24:34.047 reconnect 127.0.0.1:29019 ok
reconnected to server after rs command (which is normal)
Kusanagi.local:29018 is now hidden. It will continue to replicate and vote in elections as usual, but clients connecting to the replica set will never read from it, even if Kusanagi.local:29019 is taken down:
irb(main):012:0> rs = Mongo::MongoReplicaSetClient.new(["Kusanagi.local:29017", "Kusanagi.local:29018", "Kusanagi.local:29019"])
=> <Mongo::MongoReplicaSetClient:0x3fe06e4fe564 @seeds=[["Kusanagi.local", 29017], ["Kusanagi.local", 29018], ["Kusanagi.local", 29019]] @connected=true>
irb(main):013:0> rs.primary
=> ["Kusanagi.local", 29017]
irb(main):014:0> rs.secondaries
=> #<Set: {}>
# an empty set -- as far as this connection is concerned, there are no secondaries.
Reporting code would look like this (in Ruby):
require 'mongo'
reporting = Mongo::MongoClient.new("Kusanagi.local", 29018, slave_ok: true)
# error checking goes here
reporting['my_application']['users'].aggregate(...)
Considerations
Using a hidden member is the simplest way to set up an instance for a dedicated workload such as reporting. There are, however, some trade-offs:
Hidden Members Cannot Be Read From In Case of Emergency
With two ordinary members and one hidden member in a replica set, fault tolerance for writes is identical to that of a regular 3-member set. However, should you lose two nodes, your production application will not be able to gracefully degrade to read-only mode, because your hidden member will not allow replica set client reads. If you like the simplicity of a hidden member and cost is not an issue, use a 5-member set (with one member hidden) instead, as sketched below.
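Starting from the 3-member set above, where Kusanagi.local:29018 is already hidden, that upgrade is just two more rs.add() calls; a sketch, with the extra hostnames and ports being illustrative:
PRIMARY> rs.add("Kusanagi.local:29020")
PRIMARY> rs.add("Kusanagi.local:29021")
// five members total, four of them visible to clients; only
// Kusanagi.local:29018 remains hidden and reserved for reporting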
Wrapper Code for Replica Sets Cannot Be Used
Many teams create application-specific wrappers to add infrastructure knowledge to the clients provided by MongoDB drivers. Since you need to address your reporting instance with a standalone connection, you will not be able to reuse this investment, which will make you sad.
Reporting Instance, Tagged Member
The more complex but more flexible method for routing reporting queries to a dedicated node is to use tagging and read preferences.
As with the hidden member, set one member to priority: 0, but do not set it to be hidden. Instead, assign a tag of use: reporting:
PRIMARY> conf = rs.config()
{
"_id" : "test",
"version" : 21,
"members" : [
{
"_id" : 0,
"host" : "Kusanagi.local:29017",
},
{
"_id" : 1,
"host" : "Kusanagi.local:29018",
},
{
"_id" : 2,
"host" : "Kusanagi.local:29019",
}
]
}
// we'll use members[1], the instance on port 29018
PRIMARY> conf.members[1].priority = 0
PRIMARY> conf.members[1].tags = { "use": "reporting" }
PRIMARY> conf.version += 1
PRIMARY> rs.reconfig(conf)
[...]
As before, Kusanagi.local:29018 will never become primary; however, if the other two machines become unreachable, your application will still be able to issue reads to the reporting server. It should go without saying that reporting jobs should be suspended during such an event.
Your reporting code will look something like this (in Python, this time):
from pymongo import MongoReplicaSetClient
from pymongo.read_preferences import ReadPreference
rep_set = MongoReplicaSetClient(
'Kusanagi.local:29017,Kusanagi.local:29018,Kusanagi.local:29019',
replicaSet = 'test',
read_preference = ReadPreference.SECONDARY,
tag_sets = [{'use':'reporting'}]
)
# check to ensure we're not running reporting against the sole remaining secondary
if rep_set.primary is not None:
    rep_set.my_application.users.aggregate(...)
The above will only send reporting queries to secondaries tagged use: reporting, and it guards against running at all if there is no available primary. In practice, you should raise exceptions and handle them in your escalation code if you find no primary! Better still, your monitoring could set runtime-available values so you can branch on, for example, reporting_system.ok().
Benefits and Considerations
Using tags and read preferences allows a few degrees of flexibility that are not possible with hidden members.
Reporting Instances Can Be Easily Added
Because your connection code is declarative rather than tied to a particular host, adding more nodes for reporting jobs is simply a matter of adding and tagging them:
PRIMARY> rs.add({_id:3, host:"Kusanagi.local:29020", priority:0, tags:{'use':'reporting'}})
Your existing code will make use of the new capacity, and the replica set will continue operating without triggering an election and disconnecting your clients.
Reporting Instances Can Be Shifted or Dropped
The reporting tag can be moved, or even removed if you need to offer read bandwidth to other jobs in a pinch. A reconfiguration like this will trigger an election and disconnect all your clients, but this is no worse than any other option. Note: It is an anti-pattern to increase your general capacity by distributing production reads to secondaries. This is only an emergency measure.
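Shifting or dropping the tag is just another reconfiguration, following the same pattern as the setup above; a sketch:
PRIMARY> conf = rs.config()
PRIMARY> conf.members[2].tags = { "use": "reporting" }  // hand the tag to another member...
PRIMARY> delete conf.members[1].tags                    // ...and take it away from the old one
PRIMARY> conf.version += 1
PRIMARY> rs.reconfig(conf)
// (to drop dedicated reporting entirely, just remove the tag without reassigning it)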
Some Drivers Require Manual Sync
The Ruby driver (as of 1.9.2), for example, does not refresh its view of the replica set unless the client is explicitly initialized to do so with refresh_mode: :sync. Check your driver documentation.
Conclusion
Simple replication setup has been one of my favorite operational aspects of MongoDB since its introduction pre-1.0, when it made MySQL replication look like something out of the stone age. It was a little rough around the edges back then, but has truly come into its own. Whether you use tag sets or hidden members, building a reporting infrastructure atop MongoDB’s replication features simplifies operations, letting you focus on building a great application.