Machine learning (ML) is hard. Making it work within the confined environment of an embedded device can easily become a quagmire unless we consider, and frequently revisit, the design and deployment aspects crucially affected by ML requirements. A bit of upfront planning makes the difference between project success and failure.
For this article, our focus is on building commercial-grade applications with significant, or even dominant, ML components. Edge devices, especially ML enabled ones, don’t operate in isolation; they form just one element of a complex automated pipeline.
You have a device, or better yet, an idea for one, that will perform complex analytics, usually in something close to real time, and deliver results as network traffic, user data displays, machine control, or all three. The earlier you are in the design process, the better positioned you’ll be to adjust your hardware and software stack to match the ML requirements. The available tools (especially at the edge) are neither mature nor general purpose. The more flexible you are, the better your odds of building a viable product.
Let’s start by describing a hypothetical device and we’ll work through some ML considerations of the design. As we discuss the design, we’ll visit and revisit DevOps automations that go hand in hand with these other engineering processes.
A Smart Security Camera
For our design, let’s look at a networked security camera. As an IoT device, it’s continually connected to the internet. We’ll assume our device will have at least 4GB of SDRAM, a 64-bit ARM CPU and run an embedded Linux that supports an Anaconda Python distribution, OpenCV, DLIB and TensorFlow.
Our ML-related goals are, first, recording and labeling interesting frames, and second, alerting security personnel to suspicious activity. We are constrained by various physical, environmental and cost factors. To make the best use of the available data, we’ll need to use ML to examine and classify multiple objects in every image frame. We’ll use a cloud service to handle the second goal, so here, ML on the edge concerns image recognition. How should we proceed?
Process the Images in the Cloud?
We’ll need to recognize various objects in the frames, such as people, faces and vehicles. Each object set requires execution (inference) of an ML model, which produces a set of bounded, labeled objects. A typical camera records about 30 frames per second. Couldn’t our camera just send that data to a cloud provider? Ignoring other considerations, that’s about 2.6 million images a day; even at a deep discount this would be billed at about $1000 a day, and that’s for each recognition model applied. Clearly, we’ll need to make other choices.
Let’s start by examining the raw input stream. Each raw 720p (standard HD, 1280x720px) frame uses about 5MB, so if we were to send 30 frames/sec over the network we would need an incredible 1.5Gbps connection (about a terabyte every two hours). For full HD and higher, multiply this by anywhere from four to ten. We will not be sending raw, uncompressed video. The problem is that our ML models only work against individual image frames. Where should we make our trade-offs?
Video produces vast amounts of data, but very little of the information in a given frame is new; this is why various compression techniques work extremely well, and raw data reductions of 100-1000x are typical. These reduced rates, from 15 down to 1.5 Mbps, are fairly reasonable, so perhaps we can do our ML inference using cloud services, as long as we host our own machine instances rather than paying for ML services. Depending on our workloads, we can expect to process 24×7 video for between $50 and $200 per month.
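These figures are easy to sanity check with a quick back-of-envelope script. The per-frame size and the per-image API price below are illustrative assumptions, not quotes from any provider:

```python
# Back-of-envelope numbers for a single 720p camera.
# Per-frame size and per-image API price are rough assumptions for illustration.
FPS = 30
SECONDS_PER_DAY = 24 * 60 * 60
RAW_FRAME_MB = 5.0         # assumed size of one uncompressed 720p frame
PRICE_PER_IMAGE = 0.0004   # assumed deeply discounted per-image API price (USD)

frames_per_day = FPS * SECONDS_PER_DAY
raw_mbps = RAW_FRAME_MB * FPS * 8                     # megabits/sec, uncompressed
compressed_mbps = (raw_mbps / 100, raw_mbps / 1000)   # 100x to 1000x compression

print(f"Frames per day:     {frames_per_day:,}")              # ~2.6 million
print(f"Raw video bitrate:  {raw_mbps:,.0f} Mbps")            # ~1,200 Mbps
print(f"Compressed bitrate: {compressed_mbps[0]:.0f} to {compressed_mbps[1]:.1f} Mbps")
print(f"Per-model API cost: ${frames_per_day * PRICE_PER_IMAGE:,.0f}/day")
```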
Having decided to host our ML models in the cloud, we now need to manage a corresponding set of DevOps considerations. Do we have enough compute power to manage the service? How will we scale the workload as the number of edge machines increases? What are the storage and data retention policies? And of course, since we’re dealing with images of people, how will we manage privacy concerns and the associated legal and political ramifications?
What should bias us toward more cloud processing?
- Our initial development budget is more limited.
- Our ML models have not been proven effective in limited processing/memory environments.
- We need the flexibility to rapidly change the models we use for processing.
- We know the network connectivity is robust and offline operation is not required.
- Our budget for cloud services can significantly exceed the cost of our edge hardware.
The choice between processing in the cloud or on the edge is not a binary one. Designs can benefit massively from even limited edge computing; in some cases, simple detections (motion/vehicle/human/face) can be implemented on the edge to reduce cloud workloads by orders of magnitude with no loss in utility.
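As a concrete example, a few lines of OpenCV frame differencing can act as a crude motion gate in front of any cloud upload. This is a minimal sketch; the thresholds are placeholder assumptions to be tuned per installation:

```python
import cv2

# Minimal motion gate: only frames that differ meaningfully from the previous
# one are forwarded for heavier (cloud) inference.
MOTION_PIXEL_FRACTION = 0.01   # assumed: 1% of pixels changed => "interesting"

def frames_with_motion(capture):
    prev_gray = None
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (21, 21), 0)
        if prev_gray is not None:
            delta = cv2.absdiff(prev_gray, gray)
            changed = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)[1]
            if cv2.countNonZero(changed) > MOTION_PIXEL_FRACTION * changed.size:
                yield frame   # worth sending on for heavier analysis
        prev_gray = gray

# usage: for frame in frames_with_motion(cv2.VideoCapture(0)): upload(frame)
```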
Can We Do Most of the Work on the Edge?
If we’re using Python with C library acceleration on our quad-core ARM, simple ML models such as face detection (not recognition) may handle 10 to 20 frames per second, but 30 fps on anything like a full-sized image will not be possible. Deeper, more complex models won’t fit into our limited RAM, and even if they did fit, we could see inference times exceeding a second per frame. We’re now faced with the realities of life on the edge: trade-offs.
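To make that concrete, here is roughly what per-frame face detection looks like with OpenCV’s bundled Haar cascade. Downscaling before detection is the usual trade-off for holding the frame rate on a small ARM CPU; the scale factor here is an assumption, and real throughput depends heavily on the board:

```python
import cv2

# Classic Haar-cascade face *detection* (not recognition). The cascade files
# ship with OpenCV; the downscale factor trades accuracy for frame rate.
DOWNSCALE = 0.5

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    small = cv2.resize(frame, (0, 0), fx=DOWNSCALE, fy=DOWNSCALE)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    # Map the detected boxes back to full-frame coordinates.
    return [(int(x / DOWNSCALE), int(y / DOWNSCALE),
             int(w / DOWNSCALE), int(h / DOWNSCALE)) for (x, y, w, h) in faces]
```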
Once we’ve committed to placing our models into the edge device, we’ll need continuous integration (CI) and deployment services to support our product. These will let us train and retrain our models in the cloud, automate the process of validating improvements and produce the embedded code that updates the models in our target devices.
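On the device side, the deployment half of that pipeline can be as simple as a periodic check against a model registry. The URL, paths and checksum scheme below are hypothetical; the point is that the model swap should be verified and atomic:

```python
import hashlib
import os
import tempfile
import urllib.request

# Hypothetical registry layout: CI publishes a model file and its SHA-256
# digest; the device downloads, verifies, then atomically replaces the model.
MODEL_URL = "https://models.example.com/camera/detector.tflite"   # hypothetical
DIGEST_URL = MODEL_URL + ".sha256"                                # hypothetical
MODEL_PATH = "/opt/camera/models/detector.tflite"

def update_model():
    expected = urllib.request.urlopen(DIGEST_URL).read().decode().strip()
    data = urllib.request.urlopen(MODEL_URL).read()
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError("model download failed checksum verification")
    # Write to a temp file on the same filesystem, then atomically rename.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(MODEL_PATH))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, MODEL_PATH)
```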
Since we initially used models well suited to GPU-based servers, can we just add a GPU to our device? In a word, yes, but with some very real costs. Adopting an alternate SoM (system on module) with GPU support will provide an order-of-magnitude improvement in inference performance, but will double or treble the unit cost, and power consumption will increase. Such changes may incur significant non-recurring engineering (NRE), as adjustments, rewrites or retraining of the model will be required to take advantage of the new hardware. Non-recurring engineering is a one-shot cost associated with a change to a part, assembly or system.
In some cases we can plug an Intel Movidius Neural Compute Stick (NCS) into a spare USB port and see a 5x to 10x boost in per-image inference rates. These devices are supported by the Intel OpenVINO toolkit, but there will be some development time to get things running, and not every model can be made to work.
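For models that do convert, the device-side code is short. This sketch uses the pre-2022 Inference Engine Python API (newer OpenVINO releases expose a different interface); the model files are placeholders, and "MYRIAD" selects the NCS device:

```python
import cv2
import numpy as np
from openvino.inference_engine import IECore  # older Inference Engine API

# Run an already-converted (IR format) model on a Neural Compute Stick.
ie = IECore()
net = ie.read_network(model="face-detection.xml", weights="face-detection.bin")
exec_net = ie.load_network(network=net, device_name="MYRIAD")

input_name = next(iter(net.input_info))
_, c, h, w = net.input_info[input_name].input_data.shape  # assumes NCHW input

def infer(frame):
    # Resize and reorder the frame to the NCHW layout the model expects.
    blob = cv2.resize(frame, (w, h)).transpose((2, 0, 1))[np.newaxis, ...]
    return exec_net.infer(inputs={input_name: blob})
```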
The next tool in the box is Field Programmable Gate Array (FPGA) development. It’s a chore to incorporate these into our board design, but performance can be equal to or better than GPUs (on some models), and power consumption is lower. The Intel OpenVINO toolkit supports Arria and Stratix FPGAs for DNNs. Pricing can be prohibitive.
Finally, if our device will be produced in large quantities, it can be worth a few million dollars of NRE to build an Application Specific Integrated Circuit (ASIC). The sunk cost of ASIC design can be huge, but the lower unit costs will make up for it. We’ll go this route only once our design has been proven and locked down, and we are sure the market can absorb the volume needed to break even on these engineering costs. I should note that architectural improvements in dataflow processing of DNNs in ASICs, such as Vivienne Sze’s work, show promise for dramatic improvements in weight, power consumption and cost. If you are planning for the longer term, don’t rule out these developments.
What would make our edge processing easier?
- Start with low-footprint models such as TinyYOLO, MobileNet or SqueezeNet (see the sketch after this list).
- Accept lower frame rates for image processing in our design.
- Evaluate the ML models with the hardware specifics and tool sets in mind, up front.
- Provide automated DevOps services to update models in your devices.
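As an example of the first point above, a quantized MobileNet-class classifier runs through the TensorFlow Lite interpreter in a handful of lines. The model file is a placeholder and the preprocessing is deliberately minimal; this is a sketch of the interpreter workflow, not a tuned pipeline:

```python
import numpy as np
import tensorflow as tf

# Load a small, quantized classifier (e.g. a MobileNet variant) through the
# TensorFlow Lite interpreter. The model path is a placeholder.
interpreter = tf.lite.Interpreter(model_path="mobilenet_v1_quant.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

def classify(image):
    # image: HxWx3 uint8 array, already resized to the model's input shape.
    interpreter.set_tensor(input_detail["index"], np.expand_dims(image, 0))
    interpreter.invoke()
    scores = interpreter.get_tensor(output_detail["index"])[0]
    return int(np.argmax(scores))   # index of the most likely class
```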
Building our ML models to run on edge devices will involve significant development costs. Tool chains do not install seamlessly, and it’s common to spend days to weeks tweaking the environments. Development and testing on these platforms tend to be slower and involve more manual steps than we’re used to in our DevOps environments.
Running models on the edge will increase our DevOps complexity. Our cloud workload will be lower, but now we’ll include cross compilation, embedded code builds, download management of larger images and various additional test and validation stages. These will be on top of the training, load balancing, data retention, privacy and security issues we need to address for a cloud-only system.
Conclusions
What else will affect our choices here? The nature of the inputs can really matter. If our camera watches traffic on a busy roadway, or people in an airport terminal, we can expect almost every frame to contain something of interest; if we’re monitoring the bottom floor of a parking garage, not so much.
To recap:
- Cloud services for video rate ML are cost prohibitive.
- Integrate DevOps early to complete your product support system.
- Understand and characterize your input data by volume, scale and importance.
- Recognize not every model can execute on edge hardware.
- You have more choices in edge hardware than you thought.
Although there are still challenges to building more intelligence into devices, it’s important to realize that every year these constraints shrink, the tools get better and the devices become more powerful. A compelling argument can be made that most data processing should occur as close to its source as possible.
— John Fogarty