Cloud computing is a paradigm shift in how applications are delivered, developed, and managed. Because of this, information that was once available in the IT operations infrastructure may no longer be accessible, putting the onus on architects, operations staff, developers, and DevOps teams to build capabilities into the application itself. Following these best practices can help establish a more efficient support process.
1) Application data must be accessible
Many of today’s cloud platforms, commercial frameworks, remote data centers, and multi-service, component-based architectures are app-centric and do not provide rich information or ongoing profiling access to the application infrastructure. Sometimes there isn’t even any access to the file system or other server-side resources that are required for effective troubleshooting, and the limited access that the user does receive isn’t sufficient for forensics. That holds true whether you’re using a third-party IaaS or PaaS, or installing your application in a remote datacenter. Therefore, it’s critically important to collect as much descriptive application data as possible. Agent-based or agentless approaches (and sometimes both) should be deployed to ensure that data is always readily available, and there are numerous solutions available for that purpose. Never rely on end users to supply your data for remote installations. Failing to build operational capabilities in during development results in long routines and processes that make supporting each new application deployment or upgrade too inefficient to be sustainable.
2) Data should be in a readable format
Automating data collection and understanding events can be very challenging. It’s not uncommon for remote application data to be scattered across distributed environments in multiple formats and naming conventions (that’s especially true of the dynamic environments that typify cloud platforms). Your data should be organized to support a multi-tier infrastructure with a uniform format, design pattern, and structure. That will enable you to automatically access, add, and scan data. Keep files and data sources readable, accessible, and coherent, and stay organized: for example, multiple Web servers can log to a common directory. Another helpful practice is to define tags in your application classes; doing so makes logs much more readable.
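As one illustration of a uniform format with tags, here is a minimal sketch using Python's standard logging module. The field names (ts, level, logger, tags, message), the JsonFormatter class, and the build_logger helper are assumptions made for this example, not a prescribed schema; the point is only that every tier emits the same keys, one JSON object per line.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object per line, always with the same keys."""

    def format(self, record):
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            # "tags" is a hypothetical field, passed in through logging's `extra` argument.
            "tags": getattr(record, "tags", []),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(name, stream=None):
    """Return a logger that writes uniform JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.info("payment declined for order %s", "A-1001", extra={"tags": ["billing", "tier:web"]})
```

Because each line is self-describing, logs from several servers can be written to a common location and scanned with the same parser.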
3) Log quality must be high
Gaining insights into remote systems is a one-shot deal: you won’t have other means to access data on remote systems. Gain as much understanding of the actual content as you can, and clean as much noise as possible from log files, up front. You will need as much detail as possible, high-quality patterns and formats, and rich descriptive and diagnostic data to traverse the cloud environment’s app-centric model.
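Cleaning noise up front could look something like the sketch below. The noise patterns (debug lines and health-check requests) and the clean function are assumptions chosen for illustration; what counts as noise depends entirely on your application.

```python
import re

# Hypothetical noise patterns; adjust to whatever your application treats as noise.
NOISE_PATTERNS = [
    re.compile(r"\bDEBUG\b"),
    re.compile(r"health[- ]?check", re.IGNORECASE),
]


def clean(lines):
    """Yield log lines with blanks, known noise, and back-to-back repeats removed."""
    previous = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if any(p.search(line) for p in NOISE_PATTERNS):
            continue  # skip lines matching a noise pattern
        if line == previous:
            continue  # collapse a line identical to the last one kept
        previous = line
        yield line
```

Note that the repeat check compares against the last line kept, so two identical lines separated only by noise are also collapsed.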
4) Maintaining and reusing knowledge
Scientists share their findings through peer review, which advances discoveries. DevOps teams can likewise save considerable time and resources by understanding which messages and events other users have already examined. Your applications will generate a massive volume of dynamic data, so maintain knowledge of what the data means and what has been learned from it in the past. It’s most helpful to perform analysis on your data, even when the log types, sources, and data patterns are all correct. Analytics helps DevOps support engineers and other stakeholders understand whether a problem is preexisting or new. Reusing knowledge also makes it much faster to classify whether an issue is related to infrastructure, to coding, or to another underlying cause. Always keep some form of a runbook, whether it’s an automated commercial tool or homegrown.
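A homegrown runbook can start out very small. The sketch below assumes a hypothetical KnownIssue record and two made-up entries; it simply matches an incoming message against patterns the team has already examined and reports the recorded category and resolution, or None when the problem looks new.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownIssue:
    pattern: str      # regular expression that identifies the message
    category: str     # "infrastructure", "coding", or "other"
    resolution: str   # what was learned the last time this was seen


# Illustrative entries only; a real runbook is built from your own incident history.
RUNBOOK = [
    KnownIssue(r"connection (refused|timed out)", "infrastructure",
               "Check network rules and service health."),
    KnownIssue(r"NullPointerException|KeyError", "coding",
               "Open a defect with the stack trace attached."),
]


def lookup(message, runbook=RUNBOOK):
    """Return the first known issue matching `message`, or None if it is new."""
    for issue in runbook:
        if re.search(issue.pattern, message, re.IGNORECASE):
            return issue
    return None
```

A None result is itself useful: it tells the support engineer that nobody has examined this message before and that a new runbook entry may be warranted.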
5) Proactive troubleshooting
In today’s 24/7 app economy, it’s more important than ever to quickly address errors and anomalies before end users realize that there are problems. Try to anticipate the unexpected, and know when new releases or deployments happen, because the problems may be coming from remote infrastructure that you don’t support or have control over. Make appropriate use of application tagging and pre-fault analytics (application support shouldn’t only be post mortem). Above all, kill your darlings: manual processes no longer have a place in IT operations.
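One very simple form of pre-fault analytics is to compare the current error count against recent history and flag a spike before users report it. The sketch below is an assumption-laden illustration: the is_anomalous function, the per-interval error counts, and the threshold of three standard deviations are all hypothetical choices, and production tools typically use richer models.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it sits more than `threshold` standard deviations above the baseline.

    `history` is a list of error counts from earlier intervals (for example, per minute).
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly flat history: any increase is worth a look.
        return current > baseline
    return (current - baseline) / spread > threshold
```

Running a check like this right after each release or deployment helps tie a spike to the change that most likely caused it.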
Haim Koshchitzky is the Founder and CEO of XpoLog and has over 20 years of experience in complex technology development and software architecture. Prior to XpoLog, he spent several years as the tech lead for Mercury Interactive (acquired by HP) and other startups. He has a passion for data analytics and technology, and is also an avid marathon runner and Judo black belt.