As you may recall, in part one we uncovered 1.5 of the top three must-haves in development/IT operations project handoffs. We now know that operations really needs to know what the normal and abnormal operating conditions are for any software that development produces. We know that development needs to share the accumulated technical debt for that software as well. And we had just begun to understand the role that Big-O Notation plays. If you have not read part one, you may want to do that first. Now here we go with part two.
Here is an example of O(n). Given the goal of computing the sum of all elements in an integer array, the algorithm would work like this: start with the first element, add the second, add the third, and so on until the last element. In Big O notation, this is O(n), which is linear, i.e. the number of steps grows in direct proportion to the number of elements, so adding one element means executing one additional step.
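The summation described above can be sketched in a few lines of Python. This is a minimal illustration rather than production code; the function name `sum_elements` and the explicit `steps` counter are assumptions added here only to make the linear growth visible.

```python
def sum_elements(values):
    """Return (total, steps), visiting each element exactly once."""
    total = 0
    steps = 0
    for value in values:
        total += value  # one addition per element
        steps += 1      # so the step count equals len(values)
    return total, steps


total, steps = sum_elements([3, 1, 4, 1, 5])
print(total, steps)  # 14 5
```

Adding a sixth element to the list would raise the step count to six, which is the behavior O(n) describes.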
Here is an example of O(log n). Given the goal of finding a particular value in a sorted array of integers, the binary search algorithm would work like this:
- Find the middle element and check to see whether it is the value you are looking for. If it is, return it.
- If the middle element is bigger than the searched value, repeat the search over the first half of the array.
- If the middle element is smaller than the searched value, repeat the search over the second half of the array.
- If no elements are left to search, the value is not in the array.
In Big O notation, this is O(log n), which represents logarithmic complexity, i.e. adding one element will (usually) not increase the number of steps, while doubling the number of elements adds only one step.
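A short Python sketch of binary search makes the logarithmic growth concrete. The function name `binary_search` and the `steps` counter are illustrative assumptions, not something from part one; the counter exists only to show how slowly the work grows as the array doubles.

```python
def binary_search(sorted_values, target):
    """Return (index, steps); index is -1 if target is absent."""
    low, high = 0, len(sorted_values) - 1
    steps = 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if sorted_values[middle] == target:
            return middle, steps
        if sorted_values[middle] > target:
            high = middle - 1  # keep searching the first half
        else:
            low = middle + 1   # keep searching the second half
    return -1, steps


# Searching for a missing value forces the longest path.
for size in (1_000, 2_000, 4_000):
    _, steps = binary_search(list(range(size)), -1)
    print(size, steps)
# 1000 9
# 2000 10
# 4000 11
```

Each doubling of the array size costs a single extra step, whereas the linear sum would need twice as many.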
Algorithms fall into different Big O complexity classes. As a general rule of thumb, ordered from slowest-growing to fastest-growing cost:
O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3)
So if, for example, operations knows that the algorithmic performance of a software component is O(n) (expected linear performance), but it behaves like O(n^2) (quadratic, i.e. doubling the number of elements roughly quadruples the operations needed for processing), they will realize that there is something wrong from an operational perspective.
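One rough way to spot this kind of mismatch is to compare the work done at one input size with the work done at twice that size: a ratio near 2 suggests linear behavior, a ratio near 4 suggests quadratic. The sketch below counts operations instead of measuring wall-clock time so the result is deterministic; the two "components" and the `growth_ratio` helper are hypothetical stand-ins, not real software from the article.

```python
def count_single_pass(values):
    """Hypothetical component that touches each element once: O(n)."""
    operations = 0
    for _ in values:
        operations += 1
    return operations


def count_pair_comparisons(values):
    """Hypothetical component that compares every pair: O(n^2)."""
    operations = 0
    for _ in values:
        for _ in values:
            operations += 1
    return operations


def growth_ratio(count_operations, size):
    """Ratio of work at 2 * size to work at size."""
    small = count_operations(list(range(size)))
    large = count_operations(list(range(2 * size)))
    return large / small


print(growth_ratio(count_single_pass, 500))       # 2.0
print(growth_ratio(count_pair_comparisons, 500))  # 4.0
```

In a real system the counts would more likely come from timings or metrics, which are noisier, so the ratios would only approximate these values.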
“I would like to have that information because it helps me do capacity planning, it helps me understand dependencies and intricacies, it helps me do my job in an optimal way (no need to get that expensive 10G network controller if our current implementation is unable to hit 1Gbps),” says Vachkov.
Number Three Must-Have: The Architectural Diagrams for the Overall System and Its Components
“This is somewhat understood and followed. However, the diagrams and documentation are not usually comprehensive enough, if at all comprehensive,” says Vachkov. Many operations people have a deep understanding of the software development lifecycle. They need to know the inner workings of software components to properly design reaction procedures, dependency checks or checklists, and runbooks.
Here are some examples. In many environments, operations personnel will develop their internal procedures in the form:
If you see A -> do X
If you see A, but B -> do Y
This applies to reaction procedures, scripts, and runbooks, according to Vachkov.
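The "if you see A, do X" form maps naturally onto an ordered rule table. The following is only a sketch under stated assumptions: `RULES` and `choose_reaction` are hypothetical names, and "A", "B", "X", and "Y" are the placeholders from the text rather than real conditions or actions.

```python
# Most specific rule first, so "A, but B" wins over plain "A".
RULES = [
    ({"A", "B"}, "Y"),
    ({"A"}, "X"),
]


def choose_reaction(observed):
    """Return the action for the first rule whose conditions are all observed."""
    for required, action in RULES:
        if required <= observed:  # subset test: every required condition is present
            return action
    return None  # no rule matched; escalate to a human


print(choose_reaction({"A"}))       # X
print(choose_reaction({"A", "B"}))  # Y
print(choose_reaction({"B"}))       # None
```

Ordering the rules from most to least specific is one design choice among several; the point is that operations can only write the table if development has documented what A and B actually look like.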
Dependency lists, sometimes called checklists, are a bit more complex. These have the form:
if you need to do A then:
-> check if AA
-> check if BB
-> check if CC
-> do X
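A checklist of this shape can also be expressed as code: run every precondition, and perform the action only if all of them pass. This is a minimal sketch; `run_checklist` is a hypothetical helper, and the lambdas standing in for AA, BB, and CC are placeholders for whatever real checks the software requires.

```python
def run_checklist(checks, action):
    """Run each named check in order; perform the action only if all pass.

    Returns (performed, failed_names).
    """
    failed = [name for name, check in checks if not check()]
    if failed:
        return False, failed
    action()
    return True, []


performed = []
checks = [
    ("AA", lambda: True),
    ("BB", lambda: True),
    ("CC", lambda: False),  # simulate one unmet dependency
]
ok, failed = run_checklist(checks, lambda: performed.append("X"))
print(ok, failed, performed)  # False ['CC'] []
```

Reporting every failed check, rather than stopping at the first one, gives the operator the full picture in one pass.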
“These can be complex and cover different areas, from installation of software components to deployments of new tenants in multi-tenant systems to troubleshooting and post-mortem analysis,” says Vachkov.
“When this kind of information is lacking, the software is a black box. Being a black box is bad for operations. I need to know that the software expects to find a MySQL server on the same host (because it only uses a UNIX socket to communicate with it) and I should not need to figure this out on launch day,” says Vachkov.
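Once a dependency like this is documented, operations can turn it into a pre-launch check. The sketch below only tests whether a UNIX domain socket exists at a given path; the function name is hypothetical, and the socket path shown is an assumption that varies by distribution and MySQL configuration, so it should be taken from the actual server settings.

```python
import os
import stat


def unix_socket_present(path):
    """Return True if path exists and is a UNIX domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # missing path or no permission to inspect it
    return stat.S_ISSOCK(mode)


# Assumed location; confirm it against the real MySQL configuration.
MYSQL_SOCKET = "/var/run/mysqld/mysqld.sock"
if not unix_socket_present(MYSQL_SOCKET):
    print(f"No UNIX socket at {MYSQL_SOCKET}; is MySQL running on this host?")
```

A check like this confirms only that the socket file exists, not that the server behind it is healthy, so it complements rather than replaces a real connection test.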
Extra Pointer!
“Very big Bonus Point to all developers who keep lists of all the hidden knobs and switches that operations can use in run-time to deal with a particular failure or abnormal scenario,” says Vachkov.