Apache Cassandra is the right choice of database if you are looking for scalability and high availability without compromising performance for your mission-critical applications. Additionally, Cassandra’s support for replicating across multiple data centers is best-in-class, providing lower latency for users and the peace of mind of knowing that you can survive regional outages. To that end, we’ve compiled a list of the top 10 reasons why enterprises onboarding and deploying mission-critical applications should use Cassandra (whether open-source Apache Cassandra or DataStax Enterprise).
1) Requirement for fast writes: Easily deals with data velocity, data variety and data complexity issues
Many of the challenges associated with next-generation cloud applications center on data volume and data velocity. Can Cassandra handle the speed of data coming into the system? The answer is “yes,” provided the cluster is sized for the amount of data. Not only does Cassandra come with this ability out of the box, but entire data pipeline architectures are being built around its ingestion speed. And to top it off, it scales linearly, making it easy to determine the right amount of capacity based on data flow.
But there are also two often overlooked challenges it handles: 1) data variety and 2) data complexity. Data variety is another way of saying that data coming into one database can take different forms. An example of this would be sensor input from a heart monitor and sensor input from an IV, both for the same patient. The second challenge, data complexity, extends the previous example. The heart monitor might report 100 metrics twice per second under normal operating circumstances and up to 125 metrics once per second while the wearer is sleeping. This means the write patterns, locations, and frequencies can vary. Cassandra handles these situations gracefully.
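To make the capacity reasoning concrete, here is a minimal sizing sketch in Python. It assumes that throughput really does grow linearly with node count and that each metric value counts as one write. The fleet size, the per-node throughput figure, the headroom fraction and all function and class names are hypothetical, chosen only for illustration. It also ignores replication, which multiplies the writes each cluster must absorb.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportingMode:
    """One reporting pattern for a single device."""
    name: str
    metrics_per_report: int
    reports_per_second: float

    def values_per_second(self) -> float:
        # Assumption: every metric value is written individually.
        return self.metrics_per_report * self.reports_per_second


def nodes_needed(peak_writes_per_sec: float,
                 per_node_writes_per_sec: float,
                 headroom: float = 0.5) -> int:
    """Estimate node count, assuming throughput grows linearly with nodes.

    headroom is the fraction of each node's capacity kept in reserve.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = per_node_writes_per_sec * (1 - headroom)
    return max(1, math.ceil(peak_writes_per_sec / usable))


# The two heart-monitor modes from the example above.
awake = ReportingMode("awake", metrics_per_report=100, reports_per_second=2)
asleep = ReportingMode("asleep", metrics_per_report=125, reports_per_second=1)

# Hypothetical figures: fleet size and a benchmarked per-node write rate.
MONITORS = 5_000
PER_NODE_WRITES_PER_SEC = 20_000

# Size for the busier of the two modes.
peak_per_monitor = max(awake.values_per_second(), asleep.values_per_second())
peak_total = peak_per_monitor * MONITORS
cluster_size = nodes_needed(peak_total, PER_NODE_WRITES_PER_SEC)
```

With these made-up inputs, the waking mode dominates at 200 values per second per monitor, so the cluster is sized for that pattern rather than for the sleeping one. Real sizing should be based on benchmarks of your own workload.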
2) Can handle massive data sets
If there is any question about whether Cassandra is capable of handling large data sets, there is no need to look any further than the companies using it. Netflix, Hulu, Instagram, eBay, Apple, and Spotify all operate at massive scale and have Cassandra working in interesting ways as part of their offerings.
The other way you know Cassandra is up to the challenge is in use case examples. Many organizations use it for applications where data grows in an unbounded way very quickly. These include Twitter clones, web log analytics data warehouses, and stores for telemetry or sensor data.
3) Homogeneous environment
Unlike some legacy distributed systems, Cassandra does not require outside support for synchronization. All of the required components for basic operation are built in. Because Cassandra also operates in a peer-to-peer fashion, there is no master-slave or sharding setup, and all nodes in the ring are equal. Additionally, there is only one machine type that an administrator needs to worry about.
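The idea that all nodes in the ring are equal can be illustrated with a toy token ring in Python. This is a simplified sketch, not Cassandra's implementation. It uses MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3Partitioner), assigns one token per node (real clusters typically use several virtual nodes per machine), and places replicas by simply walking clockwise around the ring. The node names and the `Ring` class are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 here as a stand-in hash)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """A toy token ring in which every node plays the same role."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.replication_factor = replication_factor
        # One token per node, sorted by position on the ring.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Return the nodes holding a key by walking clockwise from its token."""
        start = bisect.bisect_left(self._tokens, token(partition_key))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.replication_factor)]


# Hypothetical node names; no node is designated as a master.
ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = ring.replicas("patient-42")
```

Because placement depends only on the hash of the key and the set of tokens, any node can work out which peers own a given key without consulting a central coordinator.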
4) Highly fault-tolerant
Cassandra employs many mechanisms for fault tolerance. Since it is masterless, there is no single point of failure. There is also the potential for zero-downtime rolling upgrades. This is because Cassandra can tolerate the temporary loss of multiple nodes (depending on cluster size) with negligible impact on the overall performance of the cluster.
The safety net Cassandra offers extends outside of your data center as well. Cassandra allows you to replicate your data to other data centers and keep multiple copies in multiple locations. This helps satisfy many regulatory requirements in addition to being part of a strong disaster recovery and business continuity plan.
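As a rough illustration of replication across data centers, the Python sketch below builds a CQL `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy` and works out how many replicas per data center can be unavailable while a quorum is still reachable. The keyspace name, data center names, replication factors and helper functions are assumptions made for the example; the statement is only generated as a string here, not executed against a cluster.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items()]
    return (f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
            f"WITH replication = {{{', '.join(options)}}};")


# Hypothetical keyspace and data center names, three copies in each.
layout = {"dc_east": 3, "dc_west": 3}
statement = create_keyspace_cql("patient_data", layout)

# Per data center: how many replicas may be lost with a local quorum intact.
slack = {dc: tolerated_failures(rf) for dc, rf in layout.items()}
```

With three replicas per data center, a local quorum needs two of them, so one replica in each location can be offline without blocking quorum reads and writes in that location.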
5) Proven success across enterprise applications and in many use cases already
There are already many examples where Cassandra is being used effectively. Banks and other financial institutions are storing large quantities of financial data in Cassandra. Analytics companies are using Cassandra to store web analytics data. Medical companies are using Cassandra to store sensor data and other time series inputs. There are also many companies making use of Cassandra for storing internet of things (IoT) data.
6) Ease of administration
Cassandra is a straightforward system to administer. Because Cassandra is masterless, all nodes in the ring are the same, which makes for a homogeneous system. It’s fault-tolerant and can support the temporary loss of nodes with minimal impact on production performance. This means that nodes are easy to replace and the requirement to replace downed nodes immediately isn’t as strict.
7) Custom tuning
There are a lot of knobs and levers that can be turned to get Cassandra to perform optimally for your workload and environment. For example, if you write lots of log data and read infrequently, then there are configuration tweaks to be made for write-heavy systems. If you write heavily to one data center and then do all your reading from another data center, then you can adjust the settings on a data center-by-data center basis. Tuning isn’t limited to Cassandra’s own configuration; you can also tune the JVM, including garbage collection (GC) settings, and logging levels. Settings can even be adjusted in the drivers at connection time to aid the performance of your system.
8) Easy to integrate core applications
A lot of work has been done on data manipulation and parsing systems to integrate with Cassandra. For instance, the full-text search engine Apache Solr has been packaged to work with Cassandra to provide full-featured search capabilities on an existing Cassandra database. Apache Spark, a big data analytics engine, has also been plugged in to work on an existing database. There are entire suites of tools that can be integrated or bolted on to increase its capabilities. These include Apache Mahout, Kafka and Zipkin, to name a few. This is important because the more tools you have available to you, the more powerful your data becomes. You also gain more insight into your data without having to build and maintain the application systems that were previously required.
9) Excellent monitoring options
Included in the set of tools referenced in No. 8 are monitoring packages. If you are a user of automated monitoring platforms such as Datadog or Netuitive, you’ll find examples of prepackaged agents to monitor the important parts of Cassandra. You can then tack on other metrics that are important to you. This is made possible by Cassandra exposing its metrics as Java MBeans that monitoring clients can read. You can use these to get at much of the internal information Cassandra uses to make its own decisions and assess its own health. DataStax also offers its own monitoring and control application called OpsCenter.
10) Amazing community
One of the best things about any piece of software is having a great community of developers and experts available to you for help or guidance. There is a huge yearly database summit put on by DataStax, the primary backer of Cassandra and the largest contributor of code to it. DataStax also sponsors community events all over the world so you can meet and interact with other developers around you.
For open-source software to be successful, there needs to be an ecosystem that develops around it. In the case of Cassandra, there are consultancies, monitoring and troubleshooting systems, plugins, instrumentation systems and backup systems. That is a whole set of competencies that your organization no longer needs to own, because it can use what the greater community has already created. There are even PaaS companies that will completely take over the management for you, leaving you just the development of your application to focus on.
Given the sizable number of organizations and people that are a part of the ecosystem and the Cassandra community as a whole, there is no shortage of articles, documentation and people willing to help. A welcoming and helpful community isn’t always a given, but in the case of Cassandra, it’s alive and thriving. This is important because software is always about people. The more people you can interact with who have shared your experiences, the better. It will also be easier to find solutions to your problems when you have a network of people who might have faced them before.
There are many reasons why Cassandra could be the right tool for your application. Knowing your system’s requirements, workloads and future will help you make the right choice. As you can see, if you do choose Cassandra, you are bound to be in good company.