Continuous test and monitoring for SDN and NFV

The prior blog, Continuous Test Monitoring DevOps Healthbeat, discussed the importance of testing and monitoring test trends and suggested practices for both. This need is amplified when the system being tested and monitored is a network of co-operating applications, infrastructures and devices, such as the networked components of next-generation Software Defined Networks (SDN) and Network Functions Virtualization (NFV) systems. In networked systems, a problem with any component, such as end-user applications, network interface devices, the network control plane, the network data plane, load balancers, firewalls, network administration, orchestration or virtual network infrastructures, can seriously impact user and network service performance. The number, variety, topology and versions of components and their host environments present special challenges for DevOps continuous testing and monitoring.
In an SDN and NFV world, all production topologies and network variations must be tested and the results monitored; otherwise production behavior will be uncertain. Despite the complexity, or perhaps because of it, continuous testing and monitoring must be optimized so that testing and results analysis keep pace with each accelerated CI/CT cycle. Longer-term continuous test monitoring is also necessary to assure delivery confidence for each release and to confirm that product performance trends in a positive direction over time. Individual test results, or a fixed number of test campaigns, can only tell you so much.

Unless test results and test analysis are continuously monitored and aggregated over multiple test and release cycles, for all network nodes and all topologies, there is no way to build sufficient, sustainable confidence in the long-term health of the SDN network and NFV components. Topology-aware continuous testing solutions, combined with continuous test monitoring tools that aggregate results over the entire set of network resources, can provide the longer-term strategic view needed to gain confidence in network components and services for each release and as they evolve over multiple releases.
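
To make the idea of aggregation concrete, here is a minimal Python sketch of rolling individual test results up into a pass rate per node and topology for each release. The record layout, the node and topology names, and the `pass_rates` function are all hypothetical, chosen only for illustration; a real monitoring tool would read results from its own data store.

```python
from collections import defaultdict

# Hypothetical result records: (release, node, topology, passed).
# Names and values are made up for illustration only.
RESULTS = [
    ("r1", "vFirewall", "spine-leaf", True),
    ("r1", "vFirewall", "spine-leaf", True),
    ("r2", "vFirewall", "spine-leaf", True),
    ("r2", "vFirewall", "spine-leaf", False),
    ("r1", "vLB", "ring", True),
    ("r2", "vLB", "ring", True),
]


def pass_rates(results):
    """Return {(node, topology): {release: pass_rate}}."""
    # counts[(node, topology)][release] holds [passed, total].
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for release, node, topology, passed in results:
        tally = counts[(node, topology)][release]
        tally[0] += int(passed)
        tally[1] += 1
    return {
        key: {rel: p / t for rel, (p, t) in by_release.items()}
        for key, by_release in counts.items()
    }


if __name__ == "__main__":
    for (node, topology), by_release in pass_rates(RESULTS).items():
        print(node, topology, by_release)
```

Keeping the release as the inner key means the trend for one node in one topology can be read directly across releases, which is the longer-term view described above.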

Below are some suggestions, in checklist format, that have proven useful for continuous testing and monitoring of sustainable, high-performance SDN and NFV systems and services.

Determine continuous testing and monitoring priorities: Problems that continuous testing and monitoring can help with include intermittent failures caused by virtual network infrastructures, virtual network functions, or network administration, orchestration and management, as well as issues with third-party network and end-user system compatibility, network performance, and network fault-tolerance features. Best practice is to test for the problems of most concern to the specific network system or service, and to monitor the result trends for those problems.
Regression test network systems and services even when no changes are expected: Unintended consequences of indirect changes may impact performance, so automated regression suites should occasionally audit all production network test topologies and version variations, just to be sure. Problems often caught this way involve features important to audit functions, backwards-compatibility features, and upgrade/downgrade administration features.
Select continuous test and monitoring tools that collect and report trends:
Tools that can correlate and report test results across multiple network nodes and topology variations can find intermittent bugs or problem trends. Thresholds, email alerts and dashboards that distinguish short-term results views from long-term results views are especially valuable.
Use continuous test monitoring results to diagnose and resolve problems:
Intermittent failures, and problems that only become apparent when test results are viewed as long-term trends, are easier to diagnose when a large data set is accumulated and filtered by most-probable-cause tags. Once a diagnosis is made, the root cause can be verified through targeted retest cases that set all the conditions in accordance with the diagnosis. Once the root cause is confirmed, the offending design can be refactored to handle or avoid the failure condition.
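
The last two checklist items can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `cause_tags` field, the window size and the threshold are all hypothetical and do not describe any particular tool. The first function compares a short-term failure rate against the long-term rate; the second counts probable-cause tags across accumulated failures.

```python
from collections import Counter


def trend_alert(outcomes, short_window=5, threshold=0.2):
    """Return True when the recent failure rate exceeds the long-term
    failure rate by more than `threshold`.

    `outcomes` is a list of booleans (True = pass), oldest first.
    The window size and threshold are illustrative defaults.
    """
    if len(outcomes) < short_window:
        return False  # not enough history to compare
    long_rate = outcomes.count(False) / len(outcomes)
    recent = outcomes[-short_window:]
    short_rate = recent.count(False) / short_window
    return short_rate - long_rate > threshold


def top_cause_tags(failures, n=3):
    """Return the `n` most frequent probable-cause tags.

    Each failure is a dict with a hypothetical 'cause_tags' list.
    """
    tags = Counter(tag for f in failures for tag in f["cause_tags"])
    return tags.most_common(n)


if __name__ == "__main__":
    history = [True] * 20 + [True, False, False, True, False]
    print("alert:", trend_alert(history))

    failures = [
        {"cause_tags": ["orchestration", "timeout"]},
        {"cause_tags": ["timeout"]},
        {"cause_tags": ["vnf-upgrade"]},
    ]
    print("top tag:", top_cause_tags(failures, n=1))
```

A relative comparison like this flags a recent degradation; a steadily intermittent failure with an unchanged rate would not trigger it, which is where filtering the accumulated failures by cause tag becomes useful.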

The above is a partial list of suggestions for continuous test monitoring that have been proven to support sustainable SDN and NFV network systems and services. At Spirent and Zephyr we think continuous testing and monitoring is critical for successful DevOps. You can hear more about our views on this topic by reviewing our joint webinar at:
Overcoming DevOps Challenges

What do you think of these suggestions and do you have others that should be mentioned?

About The Authors ⁄ Marc Hornbeek & Sanjay Zalavadia

Marc Hornbeek is Sr. Solutions Architect of DevOps continuous test solutions at Spirent Communications, Infrastructure Test Optimization (ITO) BU. He recently managed DevOps at Spirent. He has served as the primary architect of test automation tools and champion of test automation for firms ranging from start-ups to large multi-national companies. He has published more than 30 articles and has spoken at numerous conferences and user forums, primarily on topics related to continuous automated testing and DevOps.

Sanjay Zalavadia is responsible for driving customer success at Zephyr. This includes training, consulting, customer support and client management. Most recently, as the Associate Vice President for Patni Computers Telecoms IT Managed Services Practice, he established IT Operations teams supporting mobile content providers. Sanjay brings more than 15 years of leadership experience in IT and Technical Support Services teams across multiple geographies for both large and small companies. Sanjay has a graduate degree from the Manipal Institute of Technology in India.