The Role of DevOps in Building an AI-Ready Infrastructure

The successful application of machine learning for artificial intelligence (AI) requires attention to several contexts, primarily the technical, data and business considerations.

The Technical

From a technical standpoint, in building an AI-ready framework there are choices to be made, such as which machine learning framework to select, the type and structure of the learning function and the style of training algorithm to use. These decisions are normally based on the skill, experience and judgment of the computer scientist. A practitioner should be able to coordinate these aspects and focus on business outcomes without overly burdening the line of business with technical detail. The exception is when a core design decision will in some way affect the business outcome.

The Data

Equally critical to the efficacy of machine learning is the data available for development and training. A deployed algorithm is a function of the data inputs it received during training and testing; these inputs should be consistent and representative of the scenarios on which the algorithm will perform inference. It is difficult to accommodate so-called black swan events, as these typically require a level of adaptiveness beyond present machine learning technology.

However, we can still hope to achieve an effective and enriched algorithm that is appropriately responsive to real-world events by collecting and building data sets and training libraries that are truly representative of the environment in which the algorithm will operate. Ideally, machine learning data should include as many outliers as occur naturally within the real world. We should not be improving or hand-selecting the data, though granted most applications will use some form of normalization.
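As a minimal sketch of the point above, the following Python normalizes a feature while leaving a naturally occurring outlier in the data set rather than filtering it out. The readings, the function names and the choice of z-score standardization are assumptions made purely for illustration; the article does not prescribe a particular normalization method.

```python
from statistics import mean, stdev


def fit_scaler(values):
    """Compute the mean and sample standard deviation of the training values.

    Outliers are deliberately left in, so the statistics reflect the
    environment the algorithm will actually operate in.
    """
    return mean(values), stdev(values)


def standardize(values, mu, sigma):
    """Rescale values to zero mean and unit variance using training statistics."""
    return [(v - mu) / sigma for v in values]


# Hypothetical readings; 250.0 is a genuine outlier that we keep, not remove.
training = [10.0, 12.0, 11.0, 13.0, 250.0]

mu, sigma = fit_scaler(training)
scaled = standardize(training, mu, sigma)
```

The outlier is still present after scaling and remains the largest value, so a model trained on `scaled` continues to see the rare case. The same `mu` and `sigma` would be reused at inference time so that live inputs are treated consistently with the training data.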

From conversations with data scientists and machine learning specialists, it is clear that as individuals they have the technical skills and access to the core technology they need to do their jobs. However, more often than not, access to real-world, suitably random data is one of the biggest and most time-consuming challenges. Addressing this requires good engagement with the line of business, as access to sensitive data requires negotiation. But as a matter of course, any business that uses data analysts to help inform critical business decisions and wants to benefit from machine learning should take the necessary steps to ensure it offers accessible and open data services to its machine learning teams. Having an open, accessible and agile data architecture is an essential prerequisite for delivering an AI/machine learning-ready infrastructure.

The Business

On the business side, in addition to addressing the how and what questions of technology and data, we also need to ask why. Certainly, as with any IT project, the business does not want to engage machine learning or AI for its own sake. It is important to understand the business goals and seek new technology that will align with those objectives. This is where the application of DevOps techniques can greatly improve business outcomes.

By aligning software development with software operations, more emphasis can be placed on what is important to the business while also helping to support the effectiveness and usability of new machine learning functions. In their deployment and use, these new applications should also exhibit many of the efficiency features of modern IT technology. Thus, we expect new deployments to be able to make effective and efficient use of container technology. And where a particular function is required only infrequently, it is far more efficient to use a serverless function, perhaps with the data resident on private storage and the function called on demand in the cloud.

This type of capability is now readily available to machine learning thanks to the advent of virtualized GPUs offered by a choice of cloud services. This ability to use infrastructure dynamically and adaptively aligns well with the capabilities of DevOps practitioners, who for a long time now have been used to managing the deployment of continuously developed releases of code. This agility, driven by the need for adaptability and innovation, is very much what is required for successful machine learning and AI. But underpinning all of this is the need to maintain the conversation with the business about open, yet appropriate, access to data.

— Matt Watts