In 2011 I joined a company as their fifth employee. They had been in business a little over a year. They had a product. They had customers. Oh, and the other four people? They were rockstars.
Customers, however, were unhappy. The system was frequently down. Rebooting a machine was often necessary and led to poor performance for hours.
I was hired to fix “the database problem” in a multi-tenant SaaS system. On my first day I was given a tour of the technology by the engineer. The code was immaculate, but the database had some core issues. We had the database in good working order in a couple of weeks. We lost our first customer about two weeks after that.
Why did this customer leave? Ultimately because we were not working on the problem that mattered. We had created one local optimum in the database and another in code quality. It turned out that there were more fundamental problems.
Achieving local optima is a false economy. (1) We needed to think outside our core skill sets. Systems thinking pointed the way.
The Reaction that Mattered
We had three months of cash left. We had two months until our make-or-break event. The business closed a sale worth two months of operating capital! The customer was a very big brand whose needs would stress the existing software beyond its limits.
At this point the problem that mattered was immediate and clear. How do we guarantee this customer’s success? As with many things, the answer was obvious once we thought of it: We stood up a duplicate of the production software just for them.
The Patterns we Recognized
Standing up a second production environment was no easy task. Over 200 bash commands lovingly executed across five hand-crafted cloud servers. It took days to get our bespoke second system stitched together.
While it was ideal that we landed such a big contract, the truth of the matter was the business was not confident in our product’s technology. For each potential sale after landing the big one, I would get a call, “Are you sure we can handle this next customer?” The answer was already apparent. We, the tech team, were not confident either.
Changing the Software
We needed to improve the software so one customer’s activity had no possibility of impacting performance for another. This would localize problems to a single customer and increase the business’s confidence. We needed to do it quickly. We needed to do it with a single engineer and a single DBA.
Each customer would get their own full copy (share nothing) of the software. Deployment of the software would be reduced to the simplicity of stamping out parts in a factory. With this model we rewrote the system and closed three new customers on it for our big event.
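The "stamping out parts" idea can be sketched in a few lines. What follows is a minimal illustration, not the tooling we actually used: the names `InstanceSpec`, `stamp_instance`, and `shared_resources`, the `example.internal` domain, and the host naming scheme are all hypothetical. It shows only the core property of the model, which is that every resource name is derived from the customer, so no two customers can end up pointing at the same server or database.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    """Everything one customer's private copy of the system needs (hypothetical)."""
    customer: str
    app_host: str
    db_host: str
    db_name: str


def stamp_instance(customer: str, domain: str = "example.internal") -> InstanceSpec:
    """Derive a full, isolated instance description from the customer name alone."""
    slug = customer.strip().lower().replace(" ", "-")
    return InstanceSpec(
        customer=customer,
        app_host=f"app.{slug}.{domain}",
        db_host=f"db.{slug}.{domain}",
        # Database names conventionally avoid hyphens, so use underscores here.
        db_name=f"{slug.replace('-', '_')}_prod",
    )


def shared_resources(a: InstanceSpec, b: InstanceSpec) -> set[str]:
    """Return any resource names two instances have in common (should be empty)."""
    resources_a = {a.app_host, a.db_host, a.db_name}
    resources_b = {b.app_host, b.db_host, b.db_name}
    return resources_a & resources_b


if __name__ == "__main__":
    first = stamp_instance("Acme Corp")
    second = stamp_instance("Globex")
    print(first)
    print(second)
    print("shared:", shared_resources(first, second))
```

Because the spec is a pure function of the customer name, adding a customer is the same operation every time, and a check like `shared_resources` can confirm that two instances really do share nothing.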
Moving to this “share nothing” paradigm is likely the single biggest contributor to the company staying in business, thriving and eventually being acquired. The confidence of the sales force increased dramatically. Sales increased dramatically. Our runway moved out past the six-month mark.
Why Share Nothing
We recognized our system was a B2B offering, not B2C. This is a type of pivot known as a business architecture pivot. (2)
Trying to build all of the infrastructure for a self-service system that scales to unknown demand, while really cool, was not the problem we needed to be solving. Yes, there are inefficiencies in such a “share nothing” system. No, none of them impacted our bottom line.
For example, we didn’t care that we were wasting compute. The price of the compute was built into the product. The customer was not paying for compute. They were paying for the product and they were willing to pay so much that this cost didn’t matter.
We effectively rewrote the original system in two months. This still seems miraculous, and to some extent it is a testament to the amazing quality of the business logic code. There were, however, other consequences to the “share nothing” model that gave us a significant, if temporary, market advantage.
- We had no need to spend engineering cycles on plumbing such as user management.
- We had no need for a billing system. Human-time invoicing in a B2B world was plenty effective.
- We had no massive scale of data or load. We used familiar technologies.
- When an instance went down, only one customer called. We learned most of these things were training issues.
For each item above, the need for engineering was eliminated or dramatically decreased. This means the engineer used that time to create new features. More features means more revenue if you are building the right things. Our CEO and founder was a genius; we were building the right things.
And it Broke
Sales trusted the system. New customers were coming in. Existing customers were happy. Infrastructure costs decreased.
It was short-lived. The tools and technology used to implement the repeatable, “share nothing” software were not up to the task. The success of sales was creating a new problem, though a much nicer one to have! We needed more systems thinking.
Did DevOps Fail
We did not initially choose the right tool. Is this story a DevOps failure? If you subscribe to the idea that process informs tool choice, then the answer is a surprising “No.” The DevOps thinking was sound; we simply needed to choose better tools to support the model we had chosen to implement. Technology is all about the business at the end of the day.
Part Two of this case study will highlight some of the evolution of the tools and processes we put in place that led us to further success.
(1) The Goal – Eliyahu Goldratt, Jeff Cox
(2) The Lean Startup – Eric Ries