Large companies and monolithic applications are not usually associated with DevOps. But I recently discovered that some of the best DevOps operations have come from the largest organizations and most complex applications, and PTC is the latest example.
If you are not familiar with PTC, the company has a wide range of offerings for enterprises looking to improve their operations. Its solutions range from enterprise resource planning (ERP) to product lifecycle management (PLM) and Internet of Things (IoT) platforms as a service (PaaS). Its existing products started as on-premises solutions, and have expanded to modern solutions including IoT. But now there is a large demand to convert them to cloud-based offerings. And as PTC goes down this path, it needs to have a modern delivery chain to support it.
Enter Cloud Services
So PTC created its Cloud Services group to deliver cloud-based versions of its existing platforms to customers. Its cloud-based offerings are subject to the same demands as any other modern software-as-a-service (SaaS) or PaaS offering: faster releases, more frequent updates and higher quality. After a few years of building the Cloud Services group, the company has proven it can automate and improve delivery of its traditionally on-prem platforms in the same fashion as poster-child DevOps environments.
The journey was not easy, and it did not start with DevOps as a goal. The goal was simply to deliver solutions to its customers. DevOps was the natural outcome of the challenges it faced. Some of the challenges are common to all development operations, but others are unique to PTC:
- Lots of variables: PTC has to support a wide range of customers across a wide range of deployments. PTC works with its customers on their varying needs of platform versions and also understands their varying requirements around software updates. Each potential variation has a unique snowflake configuration. Unlike most cloud platforms, which offer one production version to all users, PTC's ecosystem of releases spans a wide range of configurations and versions.
- Data integrity: Data is important no matter what. But PTC works with customers that have had data in their on-premises solution for a long time, and that same data needs to make it to the cloud, securely, without a single flaw. That data is part of business-critical applications and ongoing business processes.
- Educating users: As customers continue on their journey to the cloud, PTC is there to support the changes and enable them through customer success, continuous knowledge transfers and an endless online platform of PTC educational services.
So PTC faces not only technical challenges, but also huge change control and people challenges. Before any software delivery can begin, a specific part of the Cloud Services group is responsible for the development needed to SaaS-ify the existing applications and build a gold master for each release.
The initial delivery chain was hand-to-hand combat: one solution, one customer, one release at a time. This required a lot of warm bodies to manage manual release processes and customers.
“We could not keep on throwing people at the problem,” notes Tameem Hourani, senior director of Cloud Services. Even though PTC had established workflows via ServiceNow, each step was manual and simply not scalable. But the customers and releases were increasing at an ever-climbing rate. It simply would not be feasible to on-board new team members to handle the load.
So it turned to good practices and tools to build a delivery chain that can take care of a brand-new deployment in less than two hours, instead of days, and upgrades in less than 10 minutes. Here is what the resulting setup looks like:
PTC was not new to automation. The company had been leveraging ITIL practices for some time. But what ITIL lacked was exposure to the customer. So with the ITIL backbone and ServiceNow in place, PTC added a pipeline-level orchestration tool called Foreman.
In this configuration, both ServiceNow and Foreman essentially act as a system of record for the entire environment. ServiceNow is the system of record for all states and workflows. It also is the base entry point for any new update or deployment. And Foreman is the system of record for infrastructure and configurations across all customers and all deployments. But it also serves as the “checklist” of all the aspects of orchestration.
To manage the infrastructure orchestration and create the templates for configurations, PTC turned to SaltStack. It originally used another tool, but the benefit of Salt is that it allows for more sequential orchestration, as opposed to a tool that offers only a stateful option.
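To make the idea of sequential orchestration concrete, here is a minimal Python sketch: steps run in a declared order, and the run stops at the first failure so that later, dependent steps never execute. This is not Salt code and not PTC's implementation; the `run_sequence` helper and the step names are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical step type: a name plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_sequence(steps: List[Step]) -> List[str]:
    """Run steps strictly in order; stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break  # later steps depend on this one, so do not continue
        completed.append(name)
    return completed


# Illustrative step names only; they are assumptions, not PTC's pipeline.
deployment: List[Step] = [
    ("provision_vm", lambda: True),
    ("install_database", lambda: True),
    ("load_customer_data", lambda: False),  # simulated failure
    ("start_application", lambda: True),
]

print(run_sequence(deployment))
```

In this sketch the application is never started because the data load failed first, which is the kind of ordering guarantee a purely declarative, end-state description does not give by itself.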
PTC uses another unique tool called Intigua. Intigua piqued my interest because I have previously talked about the overhead, in both resources and management, of server agents. And because so much of the headless monitoring and control of servers these days is done by agents, this becomes a serious management problem.
“The only way to grow a modern delivery chain is tools. And most new tooling is driven by agents,” Hourani notes. “It is easy to forget that the new tooling and processes also come at a cost. Usually the cost is less than the benefit they provide, but it is there nonetheless. Management of agents seems simple enough, but [when] actually doing this at scale, things can easily get out of hand.”
PTC had four-plus unique agents across a wide range of configurations. One of Intigua’s value points is the ability to “self-heal” and make sure the agents are always running correctly.
Once it is handed an IP, Intigua installs and configures agents based on tags, policies and AWS regions. While Intigua is running, it makes sure agents are up to date and that self-healing goes to work should something go wrong. The ability to use Intigua as a single pane of glass in a highly variable grid of configurations is a huge benefit.
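The tag, policy and region logic described above can be pictured as a small reconcile loop: work out which agents a server should have, compare that with what is running, and start whatever is missing. The Python sketch below is only an illustration under stated assumptions. It does not use Intigua's API, and the policy tables, agent names and the `desired_agents` and `heal` helpers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Server:
    ip: str
    region: str
    tags: Set[str]
    running_agents: Set[str] = field(default_factory=set)


# Hypothetical policy table: tag -> agents that servers with that tag need.
TAG_POLICIES: Dict[str, Set[str]] = {
    "all": {"monitoring"},
    "app": {"log-collector"},
    "db": {"log-collector", "backup"},
}

# Hypothetical per-region additions.
REGION_POLICIES: Dict[str, Set[str]] = {
    "eu-west-1": {"compliance"},
}


def desired_agents(server: Server) -> Set[str]:
    """Combine the baseline, tag and region policies into one agent set."""
    agents = set(TAG_POLICIES["all"])
    for tag in server.tags:
        agents |= TAG_POLICIES.get(tag, set())
    agents |= REGION_POLICIES.get(server.region, set())
    return agents


def heal(server: Server) -> List[str]:
    """Start any desired agent that is not running; return what was started."""
    missing = sorted(desired_agents(server) - server.running_agents)
    server.running_agents.update(missing)  # stand-in for a real install/restart
    return missing


server = Server(ip="10.0.0.5", region="eu-west-1", tags={"db"},
                running_agents={"monitoring"})
print(heal(server))  # first pass starts the missing agents
print(heal(server))  # second pass finds nothing left to do
```

Running the reconcile step repeatedly is what makes it "self-healing" in this sketch: a pass that finds nothing missing changes nothing, so it is safe to run on a schedule.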
Triggered by ServiceNow and configured by Foreman, Salt and Intigua, the infrastructure and code are deployed. But to learn about and maintain all environments for the customer, PTC needed some strong production monitoring tools. It uses Sumo Logic, deployed as an agent via Intigua, for log analysis. For most applications, using a single instance of Sumo is fine. But PTC has to segregate an instance per customer, and the process must be fully automated. There is some overlap with another monitoring tool, Zabbix, which has been set up to attempt self-healing upon a triggered alert.
And then there’s Jenkins. While not used for the entire deployment process, it is used to drive testing and manage states during the testing process. Upgrades to production releases are done incrementally, and complete deployments are done full-stack.
Services Layers Are All the Rage
Cloud services and shared services are not necessarily new. I recently did a write-up on Wix, which has a similar approach. But the complexities associated with SaaS-ifying on-premises solutions, along with the number of variables, add not only another layer of complexity, but also another layer of value to a repeatable delivery chain.
I am humbled by case studies like these, because they prove that those who say their environments are too complex for DevOps have yet to see what’s possible.