How TRI is using DevOps automation to drive its research and engineering
Toyota Research Institute (TRI) is working to build a future where everyone has the freedom to move, engage and explore, with a focus on reducing vehicle collisions, injuries and fatalities. Working tirelessly behind the scenes is TRI’s Infrastructure Engineering team, which is responsible for designing, deploying and maintaining the infrastructure that makes this possible.
I recently sat down with Mike Garrison, the technical lead for Infrastructure Engineering at TRI, to talk about DevOps automation and cloud-based deep learning. (Full disclosure: TRI is a customer of Flux7.)
Note: Mike Garrison and Adrien Gaidon, Machine Learning Lead at Toyota Research Institute, will share more about their experience with Distributed Deep Learning & High Resolution Driving Data at this week’s AWS re:Invent.
Can you start by giving us a broad idea of the project you’re working on at TRI?
Our mission at TRI is to improve the quality of human life through advances in artificial intelligence, automated driving and robotics. The Infrastructure Engineering team supports the researchers and engineers at TRI by making it easier for them to utilize the power of the cloud in a secure, automated and reliable fashion while not slowing them down. We see ourselves as partners who can help reduce the time and burden, so they can focus more on their deliverables.
Have you evolved as a team to enable the rapid innovation that this kind of project needs?
Yes. The biggest change we’re making is an evolution to embrace more DevOps ideas, methods and processes. Automation is a core tenet of DevOps. We are applying that in two ways. First, to reduce tactical, manual IT operations activities; second, to make it faster and easier for our researchers to test their ideas.
For example, we are creating a self-service portal that will allow researchers to easily and quickly provision the AWS assets they need to test new ideas. This helps researchers become more productive, as they won’t have to wait for the infrastructure team to spin up resources. Having a secure cloud sandbox environment enables them to try new ideas, fail fast, destroy the sandbox if needed, and start over. It’ll enable our researchers to innovate at velocity and at scale. Modern cloud infrastructure and DevOps automation are empowering us to quickly remove any barriers that get in the way, allowing the team to do their best work, advance research quickly, push boundaries and transform the industry.
How are you applying the latest technology advancements to help produce cars that are safer, more accessible and more environmentally friendly?
The process of developing autonomous vehicles is data intensive. At TRI we log a massive amount of data—terabytes of data per car per day—from our fleet of test vehicles. Researchers analyze the data and continuously refine machine learning algorithms to ensure the most accurate outcomes possible across a myriad of driving scenarios. Researchers train new models and update test cars for new test runs. We’re also using simulation to virtually recreate traffic situations, adding adverse conditions such as bad weather and difficult traffic patterns. It’s a process of continuous improvement that simply couldn’t be achieved without the high compute power of the cloud.
Supporting this high-powered data analysis is a cloud infrastructure that has been optimized to minimize the time researchers spend in this gather, learn, tune and test cycle. For example, at TRI, we have two driving technology modes: Guardian and Chauffeur. Guardian is designed to intervene only when a driver needs help avoiding an accident, whereas Chauffeur is meant to provide fully autonomous driving. We developed a single technology stack to support both modes.
Using the PyTorch deep learning framework, my teammates on the machine learning side developed a series of computer vision models that monitor and control both modes. Working together, we are able to collect information from their deep learning models, spinning up new compute and storage resources on demand. We’ve married this elasticity with advanced management and orchestration services to ensure that collected data can be retrieved and analyzed quickly. Through advanced automation, we are able to shrink model training times so they are measured in hours, or even minutes.
What lessons have you learned along the way? Any barriers or surprises that others could learn from?
While DevOps automation requires more upfront work, it pays dividends: it saves time, frees staff to focus on strategy rather than tactics, and provides a logged history that can be critical for security, problem resolution and much more. If I had any words of advice for others looking to adopt DevOps automation, it would be this: Don’t wait and then try, a year or two later, to reverse engineer what was done. Start on the right foot and automate as much as you possibly can.
— Aater Suleman