EnerNOC (Energy Network Operations Center) is an energy intelligence software provider with more than 1,300 employees at its Boston headquarters and international locations. The publicly traded company helps customers worldwide consume energy more efficiently. “Our mission is to change how the world uses energy, helping customers to improve profitability and visibility for better decision-making and sustainability by focusing on how they buy energy and when and how they use it,” says James Nichols, principal engineer and cloud architect at EnerNOC.
EnerNOC, a regular Splunk user that also relies on other DevOps tools and platforms, including AWS and Oracle, has fully automated its monitoring as part of its ongoing drive to move past continuous delivery and on to continuous deployment.
Embedded in EnerNOC’s story of why it uses continuous development and continuous deployment models is a chapter on its move to fully automated monitoring.
EnerNOC DevOps Models
EnerNOC’s DevOps model is in constant transition toward full automation. Currently, it has fully automated deployment, fully automated monitoring and semi-automated quality assurance (QA), which together drive the company’s continuous development approaches.
A change to “builds on commit” of smaller applications within the legacy software monolith at EnerNOC came hand in hand with further automation, targeting what the company calls “trustworthy continuous deployment,” Nichols says. That was the crux of EnerNOC extending itself into deployment automation.
Wherever the model is not fully automated and continuous in deployment, it remains continuous delivery. “Most of the variance between the approaches is due to aspects of the delivery pipeline that still have manual processes; these parts are usually comprised of the remaining manual testing procedures,” says Nichols.
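The distinction Nichols draws can be illustrated with a short Python sketch. It is not EnerNOC’s tooling: the `Stage` type, the `classify_pipeline` function and the stage names are hypothetical, chosen only to show that a single remaining manual step is enough to keep a pipeline in the continuous delivery category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step in a delivery pipeline (hypothetical model)."""
    name: str
    automated: bool


def classify_pipeline(stages):
    """Return the delivery model and the names of any manual stages.

    A pipeline counts as continuous deployment only when every stage
    is automated; otherwise it is treated as continuous delivery.
    """
    manual = [s.name for s in stages if not s.automated]
    model = "continuous deployment" if not manual else "continuous delivery"
    return model, manual


# Hypothetical pipeline with one remaining manual testing step.
pipeline = [
    Stage("build on commit", True),
    Stage("automated tests", True),
    Stage("user acceptance testing", False),
    Stage("deploy", True),
]

model, manual_stages = classify_pipeline(pipeline)
print(model, manual_stages)
```

Listing the manual stages alongside the label makes it plain which parts of the pipeline would have to be automated before the model could change.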
Change Control Necessitates Two Models
EnerNOC develops and maintains mission-critical software that must pass through a change management approval process. This process includes communications within the company and with customers, as well as user acceptance testing. This is where the manual testing comes in. Only at the end of this approval pipeline can EnerNOC pass the software baton into the hands of production.
Because of EnerNOC’s tightly coupled systems and its manual testing, change control remains in place, alongside a mix of continuous development, semi-automated QA and fully automated continuous deployment. “DevOps is a cultural movement as well as a technical one, and so the existing model could evolve with the organization’s quality and maturity evolutions,” says Nichols. If the manual testing and the tight coupling between systems give way to more fully automated testing and looser coupling, QA could move closer to full automation.
Monitoring Challenges/Achievements
EnerNOC’s application diversity includes hybrid software with a foothold in the Amazon cloud, in EnerNOC’s data centers and on-premises. “We use three tiers on top of Oracle in the data center. In AWS, we use a combination of big data, cloud-friendly and serverless technologies,” says Nichols. Business transaction data traverses many machines that belong to service groups’ stakeholders. “The distributed, decentralized architecture makes troubleshooting, performance management, security operations and production operations challenging,” notes Nichols. Better monitoring was the answer.
EnerNOC wanted to give nontechnical users easy access to systems-level monitoring data, with transparency into its many systems through a single-pane-of-glass dashboard that offers readily available system notices and troubleshooting capabilities. “We would use Splunk to do this and then feed back everything we learned from using Splunk into driving our DevOps and engineering processes into the future,” Nichols says.
The EnerNOC DevOps performance team built the dashboard that accompanied the Splunk service, ensuring it would register the performance metrics that are of value to the organization. The performance, product management and development teams maintain an awareness of the metrics and receive the related alerts. “The development team incorporates the ongoing metrics early in the development life cycle to improve application performance,” says Nichols.
EnerNOC first applied Splunk to web logs for operational capacity planning, then used it to give the Ops team transparency into the functioning of a new real-time data product running in production. As new products came on board, the organization added them to Splunk to provide Ops teams with dashboards, at the urgent request of Ops team members.
Current Status: All Systems Good
EnerNOC applies Splunk to test and production systems for views into consistency and acceleration. “Splunk is our primary alerting infrastructure for low-level system metrics like CPU utilization as well as for complex business transactions that span multiple systems,” says Nichols. Splunk has eyes on application health, cluster state across machines, expected systems tasks and suspect or known-bad system behavior, as well as the presence or lack of any significant event.
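The two kinds of checks described here, a threshold on a low-level metric and an alert on the absence of an expected event, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Splunk’s API or EnerNOC’s configuration: the function names, host names, task names and the 90 percent threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def cpu_alerts(samples, threshold=90.0):
    """Return hosts whose CPU utilization (percent) exceeds the threshold."""
    return sorted(host for host, pct in samples.items() if pct > threshold)


def missing_events(expected, last_seen, now, max_age):
    """Return expected tasks with no event seen within max_age of now.

    A task is flagged if it has never been seen or if its most recent
    event is older than max_age.
    """
    return sorted(
        task
        for task in expected
        if task not in last_seen or now - last_seen[task] > max_age
    )


# Hypothetical sample data.
now = datetime(2024, 1, 1, 12, 0)
cpu_samples = {"web-1": 97.5, "web-2": 42.0, "db-1": 91.0}
expected_tasks = ["nightly-rollup", "meter-ingest", "billing-export"]
last_seen = {
    "nightly-rollup": now - timedelta(hours=30),
    "meter-ingest": now - timedelta(minutes=5),
}

hot_hosts = cpu_alerts(cpu_samples)
overdue = missing_events(expected_tasks, last_seen, now, timedelta(hours=24))
print(hot_hosts, overdue)
```

The second check is the less obvious one: it fires on something that did not happen, which is why it needs a list of expected tasks to compare against rather than just the events that arrived.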