One of the key tenets of DevOps is that your environment has to be prepared for anything. If your shiny modern delivery chain does not support the adoption of new DevOps tools, or does not help you learn and iterate, then it will ultimately fall into the same category of solutions as Waterfall, and will be on a clear path to Legacy.
The problem is that most sustainability approaches equate to slowness and end up acting as a gate. To support the moving target that is modern development, teams need automated analytics that give visibility and help them respond and learn without manual, pull-based learning. These analytics must go far beyond server or application logging. Wix realized this six years ago, and did something unique to address it.
I’m sure you have heard of Wix. It’s a popular tool that lets typical business owners build a web presence. Wix is no stranger to adopting modern development tools. So when I got the opportunity to talk to them more about their DevOps environment, I was not surprised to discover a radically new concept they’ve implemented. Wix has made a deliberate effort to support a key aspect of DevOps, gaining tactical insights, by building a team specifically for it.
Around six years ago, Wix established a small monitoring department, with great success. In some organizations, the NOC team might seem to take on the same objectives as Wix’s monitoring team. But a typical NOC operates as a room full of IT folks focused only on red, yellow, and green light indicators of production environments, a very reactive and manual approach to monitoring. What is unique about the Wix monitoring team is not only that it exists, but that it focuses on the broader pipeline rather than on specific data. And it intends to be a service, not a specific solution. The monitoring team is responsible for standing up and sharing a library of analytics tools across their 300+ developers. You can imagine that their library has grown substantially.
“We use many monitoring tools. Something for server logs, something for APM, something for our application framework, and even internal tools. But they are static and only address specific needs,” says Mark Sonis, Wix Monitoring Architect. So in order to gain global visibility, the team decided to adopt a new tool called Anodot. Anodot is a pipeline analytics tool which recently came out of stealth mode. The objective: have a central place for all metrics and systems, and leverage a key feature Anodot has, which is the ability to correlate data into insights across all systems for full visibility.
Anodot is not the only tool that falls into this category. Moogsoft and arguably Datadog are addressing the visibility issue head-on. But Anodot’s promise is to go far beyond infrastructure and into all the systems that touch the delivery chain. “We rely heavily on BI to determine application functionality. And data is coming from everywhere in real-time. Our analysts need to review and respond immediately and cannot sift through data from each system,” adds Sonis.
Analytics tools can be confusing. There are a lot of them, in many sub-categories. One consequence is that many teams adopt the tool that is right for their portion of the delivery chain. IT Ops adopts server monitoring, and developers adopt APM. But the big picture is often neglected, which puts at risk not only the entire operation’s sustainability but also full-stack applications and deployments. Correlating issues across metric pools is manual, if it happens at all. This was a blind spot for Wix, and a serious one given how quickly their product team has to respond to application data and that they ship more than 60 releases a day.
“It is not scalable to create manual alerts, thresholds, and metrics in all the disparate systems,” explains Sonis. They needed machine learning on top of the data to produce insights. The promise of machine learning is not new, but I personally have found that its value is highly subjective, and can only be demonstrated in real use cases over several months. The performance of machine learning is also critical. What is learned in modern applications has a short shelf life, so the learning needs to happen in near-real time.
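To make the contrast with manual thresholds concrete, here is a minimal sketch of a learned baseline. It is not Anodot’s algorithm, which the article does not describe; it is a simple rolling z-score, and the function name, window size, and threshold are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=10, z_threshold=3.0):
    """Return the indices of points that deviate strongly from the recent baseline.

    Hypothetical helper: the baseline is the mean of the previous `window`
    points, so no per-metric threshold has to be configured by hand.
    """
    recent = deque(maxlen=window)  # sliding window of the latest observations
    anomalies = []
    for i, value in enumerate(values):
        if len(recent) == window:
            baseline = mean(recent)
            spread = stdev(recent)
            if spread == 0:
                # Perfectly flat history: any change at all is unusual.
                is_anomaly = value != baseline
            else:
                is_anomaly = abs(value - baseline) / spread > z_threshold
            if is_anomaly:
                anomalies.append(i)
        # Simplification: anomalous points also enter the baseline window.
        recent.append(value)
    return anomalies


requests_per_minute = [100, 101, 99, 100, 102, 98, 100, 101, 99, 100, 250, 100]
spikes = detect_anomalies(requests_per_minute)
```

The same function works for any metric regardless of scale, which is the practical advantage over hand-maintained thresholds. A production system would also need to account for seasonality and trend, which this sketch ignores.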
After the monitoring team deployed Anodot, they knew there would be good adoption, but were a little surprised at how it happened. “We just published it as a service and provided all data from every application version, every server, etc. Once we did that, backend developers started using it themselves via Slack. All the alerts go into a channel, and the best part is they set it up themselves. Additionally, because the channel also has a Git integration, they can see if a new commit impacted an alert right away,” Sonis added.
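The benefit of having alerts and commits in one channel is that they can be lined up by time. The sketch below illustrates that idea only; it is not Wix’s integration, and the `Commit` type, the `commits_before_alert` function, and the 30-minute lookback are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Commit:
    """Hypothetical record of a commit as it might appear in a channel feed."""
    sha: str
    author: str
    committed_at: datetime


def commits_before_alert(commits, alert_time, lookback=timedelta(minutes=30)):
    """Return commits made within `lookback` before the alert, newest first."""
    window_start = alert_time - lookback
    suspects = [c for c in commits if window_start <= c.committed_at <= alert_time]
    return sorted(suspects, key=lambda c: c.committed_at, reverse=True)


alert_time = datetime(2016, 1, 1, 12, 0)
history = [
    Commit("a1b2c3", "dana", datetime(2016, 1, 1, 9, 15)),   # too old to be relevant
    Commit("d4e5f6", "lee", datetime(2016, 1, 1, 11, 40)),   # inside the window
    Commit("0719aa", "sam", datetime(2016, 1, 1, 11, 55)),   # inside the window
    Commit("bb42cd", "dana", datetime(2016, 1, 1, 12, 5)),   # after the alert
]
suspects = commits_before_alert(history, alert_time)
```

With more than 60 releases a day, a short lookback keeps the list of candidate commits small enough for a developer to review at a glance.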
To keep track of who gets what, Wix leveraged ZooKeeper and a management layer they built on top of it to collect metadata and keep track of all developers and teams. The management layer knows what data is relevant to each team, which helps automate the creation of Slack channels. So all a team member needs to do is subscribe to the group to which they already belong, which also means Sonis’ team does not have to deal with a large up-front effort to encourage and train new users.
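The routing idea behind such a management layer can be sketched with plain dictionaries. This is an assumption-laden illustration, not Wix’s implementation: the article does not describe how the metadata is laid out in ZooKeeper, so the team names, service names, and `#alerts-` channel naming scheme below are all hypothetical.

```python
def build_routes(team_services):
    """Invert team -> services metadata into service -> alert channels."""
    routes = {}
    for team, services in team_services.items():
        for service in services:
            # Hypothetical convention: one alert channel per team.
            routes.setdefault(service, []).append(f"#alerts-{team}")
    return routes


def channels_for(service, routes, default="#alerts-unrouted"):
    """Return the channels that should receive an alert for `service`."""
    return routes.get(service, [default])


# Hypothetical metadata; in the article this role is played by the
# management layer built on top of ZooKeeper.
team_services = {
    "editor": ["editor-api", "media-upload"],
    "billing": ["payments", "media-upload"],
}
routes = build_routes(team_services)
targets = channels_for("media-upload", routes)
```

Because the routes are derived from metadata the organization already maintains, nobody has to decide by hand which alerts a new team member should see.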
Building a dedicated monitoring team is not common, and not required for most organizations. In fact, with a platform like Anodot, it likely isn’t necessary at all. If an organization leads with the high-level visibility concept, it will have the data it needs from day one, without manually rallying all sources.
Wix is a technical company that is not afraid of new technology. But they are also not brand new. What has been impressive about their adoption of DevOps all along is that they’ve kept pace with new tools and practices without disrupting what they do (for example, making a huge shift from a monolithic application to one composed of microservices). And they’ve sustained their modern delivery chain through a deliberate effort to monitor and learn from their pipeline.