Open source is the dominant model of software consumption in the modern era. Cutting-edge startups and entrenched incumbents alike find open source software development and community building a significant part of their overall business strategies.
For many open source projects, independent steering committees ensure that projects remain stable and are not beholden to a single company or profit motive. By contrast, for open core projects, where a single company financially supports the project and contributes the vast majority of its code, open sourcing makes the project publicly available and accessible to potential new users. Opening a project to public contribution widens the pool of potential users and contributors, which can drive more rapid innovation than closed commercial software offerings.
Choosing the Right Open Source Projects for Your Real-Time App
There are seemingly limitless open source projects to evaluate. Further complicating matters, each project has its own domain-specific concepts and cultural nuances. Because a single distributed application can have tens (or many more) open source dependencies, development teams must continuously test new open source offerings and build the skills to use them, all while architecting their applications.
To help sort through the noise, I’ve looked at six categories of open source projects for this article: data storage, message systems, service meshes, REST frameworks, streaming frameworks and DevOps. For each category, I’ll identify any dominant players as well as some other projects of note.
Here are the top 13 open source projects by category.
Data Storage: Cassandra, Hazelcast
Apache Cassandra, originally developed at Facebook, is one of the most widely used open source projects for real-time applications. Functionally, it’s a distributed NoSQL database management system. Practically, it’s become the linchpin of many at-scale, real-time applications, because Cassandra was designed to handle large amounts of real-time data across many servers and has robust clustering support.
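One way to picture how Cassandra spreads data across many servers is as a token ring. Each partition key hashes to a token, and the nodes that follow that token on the ring hold its replicas. The sketch below is a simplified illustration under stated assumptions:

- MD5 stands in for Cassandra's Murmur3 partitioner.
- Each node owns a single token (no virtual nodes).
- Replicas are placed the way SimpleStrategy places them.

The `TokenRing` class and the node names are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Simplified Cassandra-style token ring (illustrative sketch only).

    Assumptions: MD5 replaces Cassandra's Murmur3 partitioner, and each
    node owns exactly one token (no virtual nodes).
    """

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Place each node on the ring at the token derived from its name.
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _token(value):
        # 64-bit token taken from the first 8 bytes of an MD5 digest.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def replicas(self, partition_key):
        """Return the nodes storing a partition key.

        The first replica is the first node whose token is >= the key's
        token (wrapping around the ring); the rest are the next nodes
        clockwise, similar to Cassandra's SimpleStrategy.
        """
        start = bisect.bisect_left(self.tokens, self._token(partition_key))
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("sensor-42"))  # three distinct nodes
```

Because placement depends only on hashes, any node can compute where a key lives without a central lookup table.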
For applications that require extremely low latency or have exceptionally high data rates, Hazelcast may provide a useful alternative. Unlike Cassandra, Hazelcast is an in-memory data grid, which lets development teams benefit from the latency advantages of in-memory computing.
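An in-memory data grid typically splits its key space into a fixed number of partitions and spreads those partitions across cluster members, keeping entries in RAM. The toy class below sketches that idea. The assumptions are:

- `InMemoryGrid` is a hypothetical name.
- CRC32 stands in for Hazelcast's actual key hashing.
- Partition ownership is simple round-robin, with no backups or rebalancing.

The 271-partition count mirrors Hazelcast's default.

```python
import zlib

PARTITION_COUNT = 271  # mirrors Hazelcast's default partition count


class InMemoryGrid:
    """Toy partitioned in-memory map, loosely modeled on a data grid."""

    def __init__(self, members, partition_count=PARTITION_COUNT):
        self.members = list(members)
        self.partition_count = partition_count
        # Assign partitions to members round-robin. A real grid also keeps
        # backup copies and rebalances when members join or leave.
        self.owner = {
            p: self.members[p % len(self.members)] for p in range(partition_count)
        }
        # Each member keeps its entries in local memory (a dict per member).
        self.stores = {m: {} for m in self.members}

    def partition_of(self, key):
        # CRC32 is a stand-in for the grid's real key hashing (assumption).
        return zlib.crc32(key.encode()) % self.partition_count

    def put(self, key, value):
        self.stores[self.owner[self.partition_of(key)]][key] = value

    def get(self, key):
        return self.stores[self.owner[self.partition_of(key)]].get(key)


grid = InMemoryGrid(["member-1", "member-2", "member-3"])
grid.put("user:1", {"name": "Ada"})
print(grid.get("user:1"))
```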
Message Systems: Kafka, Pulsar, NATS
Apache Kafka is the dominant open source messaging system used by application development teams today. Originally developed at LinkedIn, Kafka has become the de facto standard for high-performance open source messaging. For stream processing, Kafka users can also turn to the Kafka Streams library.
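Kafka's core abstraction is an append-only, partitioned log. Producers append records, and each consumer group tracks its own read offset, so several groups can read the same data independently. The following sketch models a single partition in memory. `PartitionLog` is a hypothetical name, and the sketch ignores retention, replication and real offset-commit semantics.

```python
from collections import defaultdict


class PartitionLog:
    """Toy append-only log sketching one Kafka topic partition.

    Reading never removes records; each consumer group only tracks the
    offset of the next record it will read. Retention is ignored.
    """

    def __init__(self):
        self.records = []
        self.committed = defaultdict(int)  # consumer group -> next offset

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to this record

    def poll(self, group, max_records=10):
        start = self.committed[group]
        batch = self.records[start:start + max_records]
        # Advance the group's offset immediately (like auto-commit).
        self.committed[group] = start + len(batch)
        return batch


log = PartitionLog()
for event in ["login", "click", "logout"]:
    log.append(event)
print(log.poll("billing"))
print(log.poll("analytics"))  # a second group reads the same records
```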
Apache Pulsar, originally developed at Yahoo!, is an alternative messaging system optimized for streaming data sources. Pulsar further differentiates itself with message time-to-live (TTL) support and queuing via shared subscriptions.
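The two Pulsar features above can be illustrated together:

- With a shared subscription, each message goes to exactly one of several consumers, which gives queue-like load balancing.
- With a TTL, messages left unconsumed too long are expired rather than delivered.

This is a toy model, not the Pulsar client API. `SharedSubscription` is hypothetical, dispatch is strict round-robin, and expiry is checked only at dispatch time. The injectable `clock` exists so the behavior can be tested without waiting.

```python
import itertools
import time


class SharedSubscription:
    """Toy shared subscription with a subscription-wide message TTL.

    Assumptions: strict round-robin dispatch and expiry checked only at
    dispatch time; real Pulsar brokers behave in more nuanced ways.
    """

    def __init__(self, consumers, ttl_seconds, clock=time.monotonic):
        self._next_consumer = itertools.cycle(consumers)
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.backlog = []  # list of (publish_time, message)

    def publish(self, message):
        self.backlog.append((self.clock(), message))

    def dispatch(self):
        """Deliver each unexpired message to exactly one consumer."""
        now = self.clock()
        deliveries = []
        for published, message in self.backlog:
            if now - published > self.ttl_seconds:
                continue  # expired: dropped instead of delivered
            deliveries.append((next(self._next_consumer), message))
        self.backlog.clear()
        return deliveries


sub = SharedSubscription(["worker-1", "worker-2"], ttl_seconds=60)
for job in ["job-a", "job-b", "job-c"]:
    sub.publish(job)
print(sub.dispatch())  # jobs alternate between the two workers
```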
For applications where latency is a critical consideration, NATS.io (a CNCF project) is one of the fastest pub-sub messaging systems available.
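NATS routes messages by subject, a dot-separated string such as `sensors.room1.temp`. Subscribers can use two wildcards:

- `*` matches exactly one token.
- `>` matches one or more trailing tokens.

The function below is a minimal sketch of those matching rules. `subject_matches` is a hypothetical helper, not part of any NATS client library.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Check a NATS-style subject against a subscription pattern.

    Tokens are separated by "."; "*" matches exactly one token, and ">"
    (valid only as the last token) matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # ">" must be last and needs at least one remaining subject token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject ran out of tokens before the pattern
        if token != "*" and token != s_tokens[i]:
            return False
    # Without ">", pattern and subject must have the same number of tokens.
    return len(p_tokens) == len(s_tokens)


print(subject_matches("sensors.*.temp", "sensors.room1.temp"))
print(subject_matches("sensors.>", "sensors.room1.floor2.temp"))
```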
Service Meshes: Istio, Linkerd
Trends such as microservices, serverless and hybrid cloud deployment have contributed to the increasing architectural complexity of distributed applications. With hundreds or thousands of distributed application services, service meshes provide a mechanism for pulling all the disparate services together into a manageable and cohesive whole.
The most notable open source service mesh is Istio, which is primarily used for applications built on top of Kubernetes. Another option is Linkerd, a flexible service mesh for Kubernetes and other environments.
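Part of what makes a service mesh useful is that cross-cutting concerns such as retries and timeouts move out of application code and into sidecar proxies. Both Istio and Linkerd can apply retry policies this way. For illustration only, the function below shows the kind of retry-with-backoff loop a mesh takes off a service's hands. The function name, its parameters and the set of retryable status codes are assumptions, not Istio or Linkerd settings.

```python
import random
import time


def call_with_retries(send, attempts=3, base_delay=0.1,
                      retry_on=(502, 503, 504), sleep=time.sleep):
    """Retry a request with exponential backoff, as a mesh proxy might.

    `send` is any zero-argument callable returning an HTTP status code.
    All parameters are hypothetical defaults, not real mesh settings.
    """
    status = None
    for attempt in range(attempts):
        status = send()
        if status not in retry_on:
            return status  # success or a non-retryable error
        if attempt < attempts - 1:
            # Exponential backoff with jitter before the next attempt.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    return status


responses = iter([503, 503, 200])
print(call_with_retries(lambda: next(responses)))
```

With a mesh in place, the application would simply make one call and let the proxy apply an equivalent policy.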
REST Frameworks: Node.js, Spring
Node.js is a popular JavaScript runtime for building scalable web applications. In comparisons of web development frameworks, Node.js typically ranks among the top three most popular (along with the frontend frameworks React and Angular). While Node.js can support large-scale applications, it does not handle real-time use cases well at scale.
Spring is another application framework, optimized for distributed microservices-style architectures. Like Node.js, Spring “focuses on the ‘plumbing’ of applications so that teams can focus on application-level business logic, without unnecessary ties to specific deployment environments.”
Streaming Frameworks: swimOS
swimOS is a software framework for distributed streaming applications. Unlike REST-based frameworks, swimOS uses a lightweight Web Agent architecture optimized for composing and transforming many data streams in real time. Whereas DevOps-oriented projects such as Spinnaker provide an abstraction layer that converges distributed cloud services, swimOS provides an abstraction across distributed data streams, so teams can build applications by composing many distributed data streams in a single mesh environment.
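To make “composing and transforming many data streams” concrete, the sketch below merges two time-ordered streams and maintains a rolling average per key. It uses plain Python generators and is not the swimOS Web Agent API. The event shape `(timestamp, key, value)` and the function names are hypothetical.

```python
import heapq
from collections import defaultdict, deque


def merge_streams(*streams):
    """Merge streams of (timestamp, key, value) events, each already sorted
    by timestamp, into one stream ordered by timestamp."""
    return heapq.merge(*streams, key=lambda event: event[0])


def rolling_average(events, window=3):
    """For each event, yield (key, mean of that key's last `window` values)."""
    recent = defaultdict(lambda: deque(maxlen=window))
    for _, key, value in events:
        recent[key].append(value)
        yield key, sum(recent[key]) / len(recent[key])


stream_a = [(1, "s1", 10.0), (3, "s1", 20.0)]
stream_b = [(2, "s2", 5.0), (4, "s1", 30.0)]
for key, avg in rolling_average(merge_streams(stream_a, stream_b), window=2):
    print(key, avg)
```

Because both stages are lazy generators, events flow through one at a time instead of being collected first.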
To keep distributed services consistent, swimOS automatically maintains eventual consistency between all Web Agents. Complete with a UI toolkit and its own stateful messaging protocol, WARP, swimOS aims to be the fastest way to build stateful streaming applications.
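How swimOS implements eventual consistency internally isn't covered here. The following is therefore only a generic illustration of the concept, using a last-writer-wins register, which is one common technique. Each write carries a logical timestamp, and replicas that exchange states in any order converge on the same value. `Versioned`, `merge` and `Replica` are hypothetical names, and this is not swimOS's WARP protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Versioned:
    value: object
    timestamp: int  # logical (Lamport-style) clock value of the write
    origin: str     # replica id, used only to break timestamp ties


def merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-writer-wins merge. It is commutative, associative and
    idempotent, so replicas converge regardless of delivery order."""
    return max(a, b, key=lambda v: (v.timestamp, v.origin))


class Replica:
    """A hypothetical agent holding one eventually consistent value."""

    def __init__(self, name: str):
        self.name = name
        self.clock = 0
        self.state: Optional[Versioned] = None

    def write(self, value: object) -> None:
        self.clock += 1
        self.state = Versioned(value, self.clock, self.name)

    def receive(self, incoming: Versioned) -> None:
        # Advance the local clock so later local writes supersede what we saw.
        self.clock = max(self.clock, incoming.timestamp)
        self.state = incoming if self.state is None else merge(self.state, incoming)
```

Two replicas that write concurrently and then swap states end up agreeing. The tie is broken deterministically by replica id.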
DevOps: Kubernetes, Spinnaker, RancherOS
Kubernetes is one of the most talked about open source projects in the world. Kubernetes describes itself as “an open-source system for automating deployment, scaling and management of containerized applications.”
While it may simply be a platform for container orchestration, Kubernetes has, rightly or wrongly, become a stand-in for DevOps as a whole. In spite of its dominance, there are alternatives to Kubernetes; for example, RancherOS can be used to manage Docker containers.
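Kubernetes automates deployment and scaling through control loops. A controller repeatedly compares the desired state (say, three replicas) with the observed state and acts to close the gap. The function below sketches one pass of such a reconcile loop. `reconcile`, `start_pod` and `stop_pod` are hypothetical; they stand in for calls a real controller would make through the Kubernetes API.

```python
def reconcile(desired_replicas, running_pods, start_pod, stop_pod):
    """One pass of a controller-style reconcile loop (illustrative only).

    Compares desired state with observed state and acts to close the gap.
    `start_pod` and `stop_pod` are hypothetical callbacks. Returns the
    difference that was acted on (positive: started, negative: stopped).
    """
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        for _ in range(diff):
            start_pod()
    elif diff < 0:
        # Stop the surplus pods; a real controller chooses victims carefully.
        for pod in running_pods[:-diff]:
            stop_pod(pod)
    return diff


started, stopped = [], []
reconcile(3, ["pod-1"], lambda: started.append("new-pod"), stopped.append)
print(len(started))  # two pods started to reach three replicas
```

Running this pass repeatedly, rather than once, is what lets a controller recover when pods crash or are removed outside its control.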
Furthermore, as application architectures become more distributed and complex, there is a need for platforms that can converge multiple distributed edge and cloud environments. Spinnaker provides an abstraction layer that connects these environments and offers a simple mechanism for continuous delivery (CD).
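At its simplest, a continuous delivery pipeline of the kind Spinnaker manages is an ordered series of stages, where a failure stops promotion to the next environment. The sketch below models only that control flow. `run_pipeline` and the stage names are hypothetical, and real Spinnaker pipelines are defined in Spinnaker itself rather than written like this.

```python
def run_pipeline(stages, artifact):
    """Run delivery stages in order, halting at the first failure.

    Each stage is a (name, callable) pair; the callable receives the
    artifact and returns True on success. Returns the list of completed
    stage names and the name of the failed stage (or None).
    """
    completed = []
    for name, stage in stages:
        if not stage(artifact):
            return completed, name  # a failed stage blocks promotion
        completed.append(name)
    return completed, None


stages = [
    ("deploy-staging", lambda artifact: True),
    ("canary-check", lambda artifact: False),  # simulated failed canary
    ("deploy-production", lambda artifact: True),
]
print(run_pipeline(stages, "my-service:1.2.0"))
```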
Final Thoughts
Your particular use case should determine the appropriate open source technologies for your team’s real-time application. Whether you’re building an ultra-high-performance, low-latency application or simply trying to tame large volumes of real-time data, knowing your options and choosing the right framework or platform can save you significant time and effort.
The open source software landscape is changing every day. If you have something to add about one of the technologies above or think I skipped over the next OSS breakthrough, I’d love to hear about it. Let me know your thoughts in the comments section below.