We’re in the middle of what we at ScyllaDB dubbed The Next Tech Cycle. Not “The World of Tomorrow” nor “The Shape of Things to Come” nor “The Wave of the Future”—because it’s already here today. We’re right in the middle of it. It’s a wave that’s carrying us forward, but it began with trends that got their start earlier this century.
Everything is co-evolving, simultaneously, from the hardware you run on to the languages and operating systems you work with to the operating methodologies you use day-to-day. All of those familiar technologies and business models are themselves undergoing revolutionary change.
This Next Tech Cycle Goes Far Beyond “Big Data”
We’re talking about huge data. Welcome to the Zettabyte Era. Depending on who is defining it and what is being measured, this era started either in 2012 (if you’re measuring total data stored on Earth) or in 2016 (if you’re measuring total internet protocol traffic in a year).
Right now, individual data-intensive corporations are generating information at the rate of petabytes per day and storing exabytes in total. There are some prognosticators who believe we’ll see humanity, our computing systems and our IoT-enabled machinery generating half a zettabyte of data per day by 2025.
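To get a feel for that projection, here is a minimal back-of-the-envelope sketch. It assumes decimal (SI) units and an evenly spread daily volume; the helper name `sustained_rate` is purely illustrative.

```python
# Decimal (SI) byte units.
PETABYTE = 10**15
EXABYTE = 10**18
ZETTABYTE = 10**21

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_rate(bytes_per_day: int) -> float:
    """Average bytes per second needed to move a given daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


# The projection quoted above: half a zettabyte generated per day.
daily_volume = ZETTABYTE // 2
rate = sustained_rate(daily_volume)

print(f"{daily_volume // EXABYTE} EB per day")
print(f"{rate / PETABYTE:.2f} PB per second, sustained")
```

Under these assumptions, half a zettabyte per day works out to 500 exabytes per day, or roughly 5.8 petabytes every second, worldwide and around the clock.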
Yet, conversely, we’re also seeing the importance of small data. Look at the genomics revolution.
The RNA genome of SARS-CoV-2, the virus that causes COVID-19, is actually not that big, data-wise: you can store it in less than 100 KB, which is less than the capacity of an ancient floppy disk. But it is increasingly important to understand every single byte of that information, because vaccinating against this global pandemic requires understanding every change in that rapidly evolving pathogen.
So, this next tech cycle has to scale across everything from huge data to small data systems. And the database you will use and the data analytics you perform need to align with the volume, variety and velocity of the data you have under management.
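A rough size check makes the point concrete. This is a sketch under stated assumptions: the genome length of 29,903 nucleotides is the figure commonly cited for the SARS-CoV-2 reference sequence, the floppy is a 3.5-inch high-density disk, and the helper names are illustrative.

```python
# Assumption: approximate length of the SARS-CoV-2 reference genome.
GENOME_LENGTH_NT = 29_903
# Assumption: a 3.5-inch high-density floppy holds 1,440 KiB.
FLOPPY_BYTES = 1_440 * 1_024


def text_size_bytes(nucleotides: int) -> int:
    """Size when each base (A, C, G, U) is stored as one ASCII character."""
    return nucleotides


def packed_size_bytes(nucleotides: int) -> int:
    """Size when each base is packed into 2 bits, rounded up to whole bytes."""
    return (nucleotides * 2 + 7) // 8


as_text = text_size_bytes(GENOME_LENGTH_NT)
as_packed = packed_size_bytes(GENOME_LENGTH_NT)
copies_per_floppy = FLOPPY_BYTES // as_text

print(f"plain text: {as_text:,} bytes")
print(f"2-bit packed: {as_packed:,} bytes")
print(f"plain-text copies per floppy: {copies_per_floppy}")
```

Under these assumptions the plain-text form is about 30 KB and the packed form about 7.5 KB, so dozens of copies fit on a single floppy.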
“Great! Now Make it All Multi-Cloud!”
Also, this next tech cycle is not just the “cloud computing cycle.” AWS launched in 2006. Google Cloud was launched in 2008. And Microsoft Azure was formally launched in 2010. So, we’re already well over a decade past the dawn of the public cloud. Yet this next tech cycle definitely builds on the ecosystems, methodologies and technologies these hyperscalers provide.
The database you use also has to align with where you need to deploy it. Does it only work in the cloud, or can it be deployed on-premises far behind your firewall? Does it just work with one cloud vendor or is it deployable to any of them? Or all of them simultaneously? These are important questions.
Just as we do not want to be locked into old ways of thinking and doing, the industry does not want to be locked into any one technology provider.
If you’ve just been mastering the art of running stateful distributed databases on a single cloud using Kubernetes, that’s not good enough. Now you’re being asked to do it all over again, this time in a hybrid or multi-cloud environment using Anthos, OpenShift, Tanzu, EKS Anywhere or Azure Arc.
Computing Beyond Moore’s Law
This next tech cycle is also not just the basic broadband or wireless internet revolutions. We’re a full two decades into both of those. Yet the advent of gigabit broadband and the new diverse range of 5G services, also capable of scaling to a gigabit, enables incredible new opportunities in real-time data streaming services, IoT and more.
So how does your database work when you need to connect to systems far and near? How important are the limitations of the speed of light to your latencies? How well do you deal with data ingested from hundreds of millions of endpoints at gigabit-per-second scale?
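The speed-of-light question can be roughed out with simple arithmetic. The sketch below assumes light in optical fiber travels at about two-thirds of its vacuum speed and that the cable follows a straight path; the distances and the function name `min_round_trip_ms` are illustrative, and real routes add switching, queuing and processing time on top.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum, km/s
FIBER_FACTOR = 2 / 3               # assumption: typical slowdown in optical fiber


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, ignoring all processing."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_seconds * 1_000


# Illustrative distances, not measurements of any real network path.
for label, km in [("same metro", 50), ("cross-continent", 4_000), ("intercontinental", 10_000)]:
    print(f"{label:>16}: {min_round_trip_ms(km):7.2f} ms minimum round trip")
```

Under these assumptions, 4,000 km of fiber costs about 40 ms per round trip before a database does any work at all, which is why data placement matters as much as raw throughput.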
And finally, underpinning all of this are the raw capabilities of silicon, summed up by the transistor and core counts of current generation CPUs. We’ve already reached 64-core CPUs. The next generation(s) will double that, to the point where a single CPU will have more than 100 cores. Fill a rack-based high-performance computer with those and you can easily get into thousands of cores per server.
And all of this is just traditional CPU-based computing. You also have GPU advancements that are powering the world of distributed ledger technologies like blockchain. Plus, all of this is happening concurrently as IBM plans to deliver a 1,000 qubit quantum computer in 2023 and Google plans to deliver a computer with one million qubits by 2029.
This next tech cycle is powered by all of these fundamentally revolutionary capabilities. It’s what’s enabling real-time full streaming data from anyone to anywhere. And this is just the infrastructure.
If you dive deeper into that infrastructure, you know that each of the hardware architecture bottleneck points is undergoing its own revolution.
We’ve already seen CPU densities growing. Yet vanilla standalone CPUs themselves are also giving way to full systems on a chip (or SoCs).
And while they’ve been used in high-performance computing in the past, expect to see commonly available server systems with greater than 1,000 CPUs. These will be the workhorses (or, more specifically, the warhorses) of this next generation: huge beasts capable of carrying mighty workloads.
Memory, another classic bottleneck, is getting a huge boost from DDR5 today and DDR6 in just a few years. Densities are going up, so you can expect to see warhorse systems with a full terabyte of RAM. These and larger scales are going to be increasingly common—and, for businesses, increasingly affordable.
Storage is also seeing its own revolution with the recently approved NVMe base and transport specifications, which will enable much easier implementation of NVMe over Fabrics.
Now, software will have to play catch-up to these capabilities. Just as it took time for kernels and then applications inside of a vertically scaled box to be made async everywhere, shared-nothing, sharded-per-core, and NUMA-aware, this next tech cycle is going to require systems to adapt to whole new methodologies of getting the most from these new hardware capabilities. We will need to reconsider many foundational software assumptions.
Evolving Methodologies: Agile and Beyond
Speaking of methodologies, just look at how these have changed since the dawn of the millennium. As an industry, we’ve moved from batch operations and monolithic upgrades performed with multi-hour windows of downtime on the weekend to a world of streaming data and continuous software delivery performed 24 × 7 × 365 with zero downtime ever.
And by migrating to the cloud (and this always-on world) we’ve exposed ourselves and our organizations to a world of random chaos and security threats. We now have to operate fleets of servers autonomously and orchestrate them across on-premises, edge and multiple public cloud vendor environments.
Scrum has been around since the 1980s and continuous integration since 1991, but it was the Agile Manifesto of 2001, with its twelve principles, that altered the very philosophy (never mind the methodologies) underlying the way software is developed.
The very first of those principles states that the highest priority is “to satisfy the customer through early and continuous delivery of valuable software.”
However, the specific term continuous delivery (CD), as we know it today, didn’t take hold until 2009. It was then joined to continuous integration and coincided with the birth of what we now know as DevOps.
With that, you had a framework for defining change-oriented processes and software life cycles through a responsive developer culture that now, a decade or two into this revolution, everyone takes as a given.
This graph shows how continuous integration (CI) and continuous delivery (CD) evolved independently. They were eventually conjoined by the term CI/CD, whose popularity as a search term only began to rise around 2016 and did not displace the two separate terms until early in 2020. It is now increasingly rare to refer to CI or CD separately, according to Google Trends.
Onto that baseline were built tools, systems and philosophies that extended those fundamental principles. The Chaos Monkeys of the world, as well as the pentesters, want to break your systems, or break into them, to uncover flaws and defects long before a random catastrophic failure or a malicious actor does it for you.
Cloud-native technologies like Kubernetes and single-source-of-truth for infrastructure methodologies like GitOps were created out of sheer necessity to scale systems to the hundreds of thousands of production software deployments under management.
DevSecOps
And that’s still not enough. We’ve already seen software supply chain attacks like SolarWinds, low-level system attacks like Spectre, Meltdown and Zombieload, or human-factor threats like viral deep fakes and millions of fictitious social media accounts using profile images generated by generative adversarial networks, never mind millions of IoT-enabled devices being nefariously harnessed for distributed denial-of-service (DDoS) botnet attacks. Just recently, news broke of the Log4j vulnerability—our best wishes to anyone who was patching code this past holiday season.
These are all just the bow shocks of what’s still to come.
So now, your AI-powered security systems are locked in combat every day in real-time against the threat actors attempting to undermine your normal operations. We know this because a growing number of intrusion prevention and malware analysis systems are built at terabyte-and-beyond scale using distributed databases as their underlying storage engines.
Hence, these days, rather than just DevOps, increasingly we talk about DevSecOps—because security cannot be an afterthought. Not even for your MVP. Not in 2022.
And these methodologies are continuing to evolve.
Summary
This next tech cycle is already upon us. We’re in the thick of it. You can feel it in the same way you’d yearn for a major life or career change. Maybe it’s a programming language rebase. Is it time to rewrite some of your core code in Rust? Or maybe it’s in the way you are considering repatriating certain cloud workloads or extending your favorite cloud services on-premises, such as through AWS Outposts. Or maybe you’re actually looking to move data to the edge? Planning for a massive fan-out? Whatever this sea change means to you, you’ll need rock-solid infrastructure that’s available today to take workloads into production right now, but flexible enough to keep growing with your emerging, evolving requirements.