There’s a reason the term “data mining” came into existence. Enterprises sit on a vast resource of information—millions of digital gems that, if fully extracted, can not only answer many of their toughest questions, but also help them meet their most ambitious organizational goals.
DevOps professionals have a unique opportunity to impact this mining process. Their vision can transform the way their organizations use data by building fresh capabilities and strategies into the applications and platforms they create. At the same time, recent advances in data science, particularly parallel processing platforms that interrogate massive datasets in near-real time, are changing the game.
Time-to-Insight Is Shrinking
Parallel processing improves analytics along two important paths.
First is the time requirement. When organizations can process billions of rows of data in milliseconds, the value of that data changes. In many industries (financial services, for example) there is a big difference between a query answered in seconds or minutes and one answered in a fraction of a second. Those small deltas in latency can make a big difference to the bottom line. As they say, time is money.
The other path is the curiosity factor. In conventional analytics, the paradigm is to ask a question and get an answer. But if users can ask one question, get the answer and then ask three more questions based on that result, without worrying about a time penalty, they have room to be more curious. This pain point is underappreciated and largely overlooked in most organizations. Without speed and agility, decision-makers can end up in a rut, asking the same questions over and over again because the time/value equation never improves.
When queries take only milliseconds to answer, it's easy to ask new questions, dig deeper into data and look at problems in entirely new ways. DevOps teams can harness parallel processing to empower their organizations and significantly raise the value of their data.
Here are three of the more prominent ways.
Use Multiple Processors to Scale Processes
A central data repository facilitates analytics throughout the organization, especially for enterprises that make heavy use of business intelligence (BI). While a centralized location is advantageous, it also places huge demands on DevOps to deliver adequate performance.
Parallel processing can support this approach, even in mixed GPU/CPU environments. Parallelized code allows a processor to compute multiple data items simultaneously, which is essential for good performance on GPUs, with their thousands of execution units. The same optimization also translates well to modern CPUs, which increasingly offer wide vector units capable of processing multiple data items at once. Parallelized computation can therefore improve query performance even in CPU-only systems.
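The pattern is easy to sketch in miniature. The toy example below uses plain Python, with threads standing in only to show the shape of the computation; real engines get the actual speedup from SIMD lanes, GPU threads or multiple processes. It partitions a dataset, aggregates each partition independently and merges the partial results:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk):
    # Each partition is aggregated independently; no shared state.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Split the data into one partition per worker, aggregate the
    # partitions concurrently, then merge the partial results.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))

print(parallel_sum_of_squares(range(1000)))
```

The key property is that the partial aggregates are independent, so the work divides cleanly across however many execution units are available.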
Accelerated analytics platforms allow single queries to span more than one physical host when data is too large to fit on a single machine. Such platforms can scale up to tens, even hundreds of billions of rows on a single node and can also run across multiple nodes. Load times also shrink roughly in proportion to the number of nodes, because loading can proceed concurrently across them.
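A distributed query follows the same scatter-gather shape at the cluster level. In this toy sketch (the shard layout and row format are invented for illustration), a coordinator fans a query out to every shard concurrently and merges the partial results:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: three "nodes", each holding a shard of (region, amount) rows.
SHARDS = [
    [("east", 120), ("west", 80)],
    [("east", 40), ("north", 200)],
    [("west", 60), ("north", 10)],
]

def shard_total(rows, region):
    # Runs locally on each node: scan only this shard's rows.
    return sum(amount for r, amount in rows if r == region)

def query_total(region):
    # Coordinator: scatter the query to all shards, gather and merge.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda rows: shard_total(rows, region), SHARDS)
    return sum(partials)

print(query_total("east"))  # 160
```

Because each shard answers independently, adding nodes adds both capacity and concurrent scan throughput.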
Connect Data Silos for Deeper Understanding
New insights arise when questions are asked not simply within one silo or another, but at the place where those silos intersect and relate to each other. No longer do decision makers have to look under one rock and then move on to five other rocks; instead they can treat their organization’s data stores as one complete, interconnected resource.
The advantages of connected silos don’t stop there. Using accelerated analytics, it’s possible to combine internal datasets with external sources. Analysts may want to review the impact of demographic shifts on retail performance, for instance, or of weather on public safety.
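As a concrete, entirely hypothetical miniature, the snippet below joins an internal sales table with an external weather feed using Python's built-in sqlite3 module, asking a question neither silo can answer alone:

```python
import sqlite3

# Hypothetical tables: internal sales data and an external weather dataset.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales   (day TEXT, store TEXT, revenue REAL);
CREATE TABLE weather (day TEXT, rainfall_mm REAL);
INSERT INTO sales   VALUES ('2024-01-01','A',500),('2024-01-02','A',320);
INSERT INTO weather VALUES ('2024-01-01',0.0),('2024-01-02',12.5);
""")

# Cross-silo question: how did revenue look on rainy days?
rows = con.execute("""
    SELECT s.day, s.revenue, w.rainfall_mm
    FROM sales s JOIN weather w ON s.day = w.day
    WHERE w.rainfall_mm > 5
""").fetchall()
print(rows)  # [('2024-01-02', 320.0, 12.5)]
```

The join key (here, the date) is what turns two disconnected rocks into one interconnected resource.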
Use Advanced Analytics to Tighten Datacenter Security
No one has to tell developers that cybersecurity is essential. Yet, the difficulty often lies in staying on top of the challenge—log files, for instance, can grow to billions of rows of data every day.
The ability to pull security data into an accelerated analytics environment provides instant insight into data center security. Dashboards that incorporate such capabilities allow staff to interrogate log files and other security datasets in near-real time. Furthermore, spatiotemporal analysis (the ability to combine and analyze time and location data) is becoming increasingly valuable for intrusion forensics. When professionals understand what region of the world an attack is coming from, it becomes that much easier to minimize the threat or stop it altogether.
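In miniature, a spatiotemporal query is just a slice along both axes at once: the same source region, a bounded time window. The log records below are invented for illustration:

```python
from datetime import datetime

# Hypothetical log records: (timestamp, source_region, event)
LOGS = [
    (datetime(2024, 5, 1, 3, 12), "region-a", "failed_login"),
    (datetime(2024, 5, 1, 3, 13), "region-a", "failed_login"),
    (datetime(2024, 5, 1, 9, 45), "region-b", "login"),
]

def events_in_window(logs, region, start, end, event):
    # Spatiotemporal slice: one region, one time window, one event type.
    return [rec for rec in logs
            if rec[1] == region and rec[2] == event and start <= rec[0] < end]

suspicious = events_in_window(
    LOGS, "region-a",
    datetime(2024, 5, 1, 3, 0), datetime(2024, 5, 1, 4, 0),
    "failed_login",
)
print(len(suspicious))  # 2
```

At billions of rows per day, the same slice is what an accelerated platform computes in near-real time rather than in an overnight batch.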
Giving Data a Second Look
Despite the incredible rise in big data, organizations don’t always turn to their repositories to answer their hardest questions. They assume the platforms they have, or the data they possess, are too unwieldy, slow or difficult to leverage.
Organizations need to recover the confidence that their data can answer those really hard questions. Too many teams have settled for asking only the questions they know their data can answer, rather than the harder questions that more advanced analytics solutions can uncover.
With the right technologies and processes in place, the questions that seemed to be impossible are no longer insurmountable. The data that resides in the organization is more capable than anyone thinks. All one has to do is have the daring to ask.