Developing Simple and Stable Machine Learning Models

By: Meir Maor on April 12, 2019

A current challenge and debate in artificial intelligence is building simple and stable machine learning models capable of identifying patterns and even objects. Is it possible to know which models will work best, or do we simply have to see the data? It is possible, and the question is figuring out how to get there.

Simple models are popular because they are easy to understand, trustworthy and tend to be more resilient to change than complex models. Simple models also have the benefit of easing transfer learning pains. While everyone wants simple models, they also want accurate models. In practice, there are always transfer learning issues when applying what was learned in the past to future practice.

An issue arises when trying to fit a model from a known family to the available data and hoping it will generalize effectively. It could be any proposed family: linear models, trees or deep neural networks.

There remains a fundamental problem with any overly complicated model, whether linear, tree-based or neural: the risk of "overfitting" the data, which results in poor generalization. At the same time, an overly simple model risks missing the nuances of the data and sacrificing accuracy. There are techniques to handle this trade-off.

The first is to reduce the size of the set of models, an approach that fits naturally with the PAC (probably approximately correct) learning framework. For example, reduce the number of parameters, limit the depth of the tree or reduce the number of nodes in the network. A smaller hypothesis set is easier to understand, and with fewer possibilities built into the model, it is less likely to overfit the data.
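As a rough illustration (not from the original article; the dataset and parameters are invented for the example), the following scikit-learn sketch contrasts an unconstrained decision tree with a depth-limited one. The deep tree will typically score near-perfectly on its training data but worse on held-out data, while the shallow, smaller-capacity tree tends to narrow that gap:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic data, used only to illustrate the capacity trade-off.
    X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # An unconstrained tree can memorize the training set (overfit);
    # capping max_depth shrinks the set of trees the learner may choose from.
    deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

    print("deep    train %.2f  test %.2f" % (deep.score(X_train, y_train),
                                             deep.score(X_test, y_test)))
    print("shallow train %.2f  test %.2f" % (shallow.score(X_train, y_train),
                                             shallow.score(X_test, y_test)))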

Next, add L1 or L2 regularization. Models with small weights, or with many zero weights, are preferable and will help the model generalize. The question is, why? Why do we use trees or neural networks rather than another hypothesis family? Why do we want models with small weights? Why do some network topologies work well?
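Again as an illustrative sketch rather than anything prescribed by the article, the snippet below compares L1 (Lasso) and L2 (Ridge) penalties on a linear model in scikit-learn; the L1 penalty tends to drive many coefficients to exactly zero, while the L2 penalty shrinks weights without zeroing them out:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    # Synthetic regression data with only a few truly informative features.
    X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                           noise=10.0, random_state=0)

    lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
    ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

    print("L1 zero weights:", int(np.sum(lasso.coef_ == 0)), "of", lasso.coef_.size)
    print("L2 zero weights:", int(np.sum(ridge.coef_ == 0)), "of", ridge.coef_.size)

The sparse L1 solution is one concrete sense in which a model can be "simpler": fewer non-zero weights means fewer features to explain.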

These questions bring in the actual world we live in and the types of problems that are likely to need solving. There is no learning when any function is allowed and the model family isn't limited. It is well-known that when the family of allowed hypotheses is too large, the model overfits the data ad absurdum and becomes useless. Thus, machine learning almost always optimizes over a well-understood family of hypotheses, while human feature engineering and representation construction capture our prior knowledge about the world. The question is how to exploit such knowledge when automating machine learning. Limiting the family arbitrarily is better than not limiting it at all, but we can do better.

The world is full of patterns that repeat across many domains. Different applications use similar building blocks to analyze rocket movement or medical sensors, or to solve a predictive maintenance task. And we are all experts in the world we live in: even when studying a domain where we have no specific expertise, we still recognize some patterns and rules.

For example, take a look at a photo of a ring-tailed lemur. Even if you've never seen one before, you are now likely to identify this type of primate if you come across one. My 5-year-old son was shown such a picture, and he could then visit the zoo and identify a ring-tailed lemur. It only took one photo.

Compare that result to modern neural network research, which typically requires many examples, often millions, to train a network to identify objects; even then, networks make embarrassing mistakes when new data differs from the data they were trained on.

How did my 5-year-old son learn to identify ring-tailed lemurs from a single picture? He's an expert. Not a lemur or primate expert, but an expert in the world we live in. He understands how to separate an object from its background. He knows what a tail, eyes, ears, legs and fur look like. He knows how to imagine a 3D object from different directions. At 5, he already brought a ton of knowledge with him when learning to identify lemurs.

This is the essence of learning. The right notion of simplicity is not the number of nodes in the network or the number of lines of code; it is how simply the model can be explained in human language.

This process can be formalized by codifying human knowledge to solve tough problems. We can formalize the many patterns seen before across domains, as well as the tools data scientists use when constructing features. Then we can look for patterns that make sense and are similar to what we have seen before.

We can also look for more novel patterns built from complex combinations of things we've seen before. As an industry, we need to do exactly this: codify human knowledge in curated code libraries and in structured knowledge and facts, then use them to search for patterns in data and build highly accurate, resilient models. By leveraging humanity's existing knowledge, this will bring simplicity and stability to machine learning models without sacrificing accuracy.
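As a hypothetical sketch of what such a curated library might look like (the function names and signals below are invented for illustration, not taken from the article or any particular product), recurring domain patterns can be captured once as reusable, explainable feature builders and then combined across problems:

    import numpy as np

    def rolling_mean(signal, window):
        """Smoothing: a building block that recurs across many sensor domains."""
        kernel = np.ones(window) / window
        return np.convolve(signal, kernel, mode="same")

    def rate_of_change(signal):
        """First difference: the common 'how fast is it changing?' pattern."""
        return np.diff(signal, prepend=signal[0])

    def time_since_last_event(events):
        """Steps since the last nonzero event: typical in predictive maintenance."""
        out, counter = np.zeros(len(events)), 0.0
        for i, e in enumerate(events):
            counter = 0.0 if e else counter + 1.0
            out[i] = counter
        return out

    # A model search can then combine these curated building blocks instead of
    # searching over arbitrary, hard-to-interpret functions.
    rng = np.random.default_rng(0)
    signal = np.sin(np.linspace(0, 10, 100)) + 0.1 * rng.standard_normal(100)
    features = np.column_stack([rolling_mean(signal, 5), rate_of_change(signal)])
    print(features.shape)  # (100, 2)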

— Meir Maor

Filed Under: AI, Blogs, DevOps Practice, Leadership Suite Tagged With: artificial intelligence, L1 regularization, L2 regularization, machine learning
