Released in March 2018, Accelerate: The Science of Lean Software and DevOps -- Building and Scaling High Performing Technology Organizations by Nicole Forsgren, Jez Humble, and Gene Kim is a breakdown of how to optimize your organization for the 24/7 uptime world that Continuous Integration/Continuous Deployment software development requires. I recently listened to the a16z podcast called [Feedback Loops: Company Change, Culture, and DevOps](https://soundcloud.com/a16z/devops-org-change-software-performance), where Forsgren and Humble discussed their findings and why DevOps is so essential.
The three authors are heavyweights in the DevOps space, combining decades of knowledge about how companies make microservices work. Or not work, as the case may be. Accelerate is based on research involving 23,000 data points across a wide range of industries and from all over the world, although the majority of respondents are from North America. The book goes into detail about the surveys used and the reasoning behind them, if you want to geek out on that sort of detail. For this article, let’s say the conclusions discussed here are backed up by a rigorous scientific method.
What is DevOps and Why is it Even Necessary?
Simply put, DevOps bookends the software development lifecycle. It starts with development and ends in operations, but it touches everything in between.
"The ability to develop and deliver software with speed and stability drives organizational performance," said Forsgren. “It drives things like profitability, productivity, and market share.”
This increase in speed and stability is a massive shift. We’ve been hearing “IT doesn’t matter” over the last 30-40 years, backed up by research showing that tech didn’t drive performance or ROI. That was then.
"Before, tech was on-prem plug and play," said Humble. Everyone buys it and plugs it in, so it simply creates a level playing field and doesn’t help deliver value. “IT was a point of parity, not a point of distinction.”
"However, in today’s continuous deployment world where companies have the in-house capability to improve your product, technology becomes a differentiator," said Humble. Your ability to upgrade your product and make it available round the clock around the globe sets you apart from your competitors. Leveraging tech to drive value makes you a tech company, no matter what your end product is.
Also, the research in Accelerate shows it adds to your bottom line.
“The companies whom I worry about,” said Forsgren, “are the companies who insist ‘but I’m not a tech company.’ Insisting you’re not a tech company leads to failure and quickly makes you irrelevant.”
Agile is Out
"DevOps was born out of necessity because Agile doesn’t work for microservices," Humble said. "It simply doesn’t scale."
The Agile process was built to make software development faster. It did, and it was in fact an improvement on the old waterfall method. However, in the world of continuous deployment, most of an engineer’s tasks relate to maintaining the system and its diverse services. Agile does not work for maintenance, upgrades, and fixes.
New Rule 1: “Day 1 is short, Day 2 is long”
Moving to microservices and continuous delivery requires a considerable shift in how companies think about their systems and processes. It’s a lot more than speeding up your old software delivery process.
In the waterfall world, you have a plan, you iterate on the plan, and then you’re done. "Day one is software creation, and Day two is deployment," explained Forsgren.
With waterfall or Agile, these "days" were somewhat similar in length because there was an endpoint. The software was delivered, and then you were done and moved on to the next project. Any bug fixes waited for the next release cycle, which typically came annually or a few times a year.
Now, Day one is short, and Day two, system maintenance, is extended. More accurately, Day two is never-ending. With continuous delivery, there is never an actual endpoint. "There are many milestones along the way, but never an actual end," said Forsgren.
Fortunately, the development pieces don’t change with DevOps. Forsgren said, “You still need development, you still need to test, you still need QA, you still need operations, you still need to deal with technical debt, you still need to deal with re-architecting monolithic code bases. What this enables you to do is find the problems quickly and enables you to move forward.”
New Rule 2: Speed and Stability Go Hand-in-Hand
“If you take one thing away from DevOps, take away this: high-performing companies don’t make those tradeoffs,” said Humble. They’re moving fast and building more stable, higher-quality systems. Facebook leadership is famous for saying “move fast and break things,” and the industry tends to treat speed and stability as a tradeoff. That is a false dichotomy.
“The capabilities that enable high performance in one field, if done right, enable it in other fields,” Humble said. “If you use version control for software, you should be using that for production infrastructure. If there’s a problem in production, we can reproduce the state of the production environment in a disaster recovery scenario that is predictable and repeatable.”
“Toyota didn’t win by making shitty cars faster; they won by making high-quality cars faster with a shorter time to market.”
How Can You Tell it’s Working?
In a world where Day two is endless, how do you measure productivity?
The authors picked several metrics to determine stability and success at speed. Across these metrics, an interesting finding emerged: Slowing down does not make you more performant.
Tempo is captured by two measures: lead time (how long it takes a change to go from being checked into version control to being released into production) and release frequency (how often you release). Both are predictable and measurable.
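As a rough illustration of the two tempo measures, here is a minimal Python sketch. The record format, the sample timestamps, and the function names are all hypothetical, and the choice of the median as the summary statistic is an assumption of this example, not something the book or podcast prescribes.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records: (check-in time, production release time) per change.
changes = [
    (datetime(2018, 3, 1, 9, 0), datetime(2018, 3, 1, 15, 0)),   # 6 hours
    (datetime(2018, 3, 2, 10, 0), datetime(2018, 3, 3, 10, 0)),  # 24 hours
    (datetime(2018, 3, 5, 8, 0), datetime(2018, 3, 5, 10, 0)),   # 2 hours
]

def median_lead_time(records):
    """Median time from version-control check-in to production release."""
    return median(released - committed for committed, released in records)

def release_frequency(records, period_days):
    """Distinct production releases per day over the observation period."""
    distinct_releases = {released for _, released in records}
    return len(distinct_releases) / period_days

print(median_lead_time(changes))                  # 6:00:00
print(release_frequency(changes, period_days=7))  # 3 releases over 7 days
```

In practice these timestamps would come from your version control and deployment tooling; the point is only that both numbers fall out of data most teams already have.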
For stability, they chose two metrics: Mean Time to Restore (MTTR) and Mean Time Between Failures (MTBF). For MTTR, they asked, “If something goes awry, how long does it take to restore service?”
MTBF is not just the time between failures, but how often the failures occur. If you only go down once a year, but it takes three days to fix, that could severely impact your company. "Especially if one of those three days includes Black Friday," said Humble.
If you are down in short bursts with a small blast radius, that is much better for your company. “When you can find a problem and fix it quickly, the customers might not even notice, then that’s fine for most companies,” said Humble.
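The two stability measures can be sketched the same way. The incident log below is made up, and this example assumes one common reading of MTBF: the average uptime between one restore and the start of the next failure. Other definitions exist.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (outage start, service restored).
incidents = [
    (datetime(2018, 1, 10, 12, 0), datetime(2018, 1, 10, 12, 30)),  # 30 min
    (datetime(2018, 2, 10, 12, 30), datetime(2018, 2, 10, 14, 0)),  # 90 min
    (datetime(2018, 3, 10, 14, 0), datetime(2018, 3, 10, 15, 0)),   # 60 min
]

def mean_time_to_restore(log):
    """Average outage duration."""
    durations = [restored - started for started, restored in log]
    return sum(durations, timedelta()) / len(durations)

def mean_time_between_failures(log):
    """Average uptime between a restore and the next outage (assumed definition)."""
    ordered = sorted(log)
    if len(ordered) < 2:
        raise ValueError("need at least two incidents to measure a gap")
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)

print(mean_time_to_restore(incidents))        # 1:00:00
print(mean_time_between_failures(incidents))  # 29 days, 12:00:00
```

Looking at the two numbers together reflects the point above: a long gap between failures is little comfort if each outage takes days to resolve.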
Next up is Change Fail Rate, which measures the quality of your process. When you push a service into production, what percentage of the time do you have to fix it because you didn’t test it well enough or something else went wrong?
“In a high-quality process,” explained Humble, “when I do something for Nicole, she can use it instead of sending it back to me.”
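Change fail rate is a simple ratio, which a short Python sketch makes concrete. The deployment history and the function name are hypothetical; what counts as a "failed" change (rollback, hotfix, patch) is something each team would have to define for itself.

```python
def change_fail_rate(deployments):
    """Fraction of deployments that needed a fix after reaching production.

    `deployments` is a list of booleans: True means the change had to be
    fixed, rolled back, or patched after release (hypothetical encoding).
    """
    if not deployments:
        raise ValueError("no deployments recorded")
    return sum(deployments) / len(deployments)

# Hypothetical history: 10 deployments, 2 of which needed remediation.
history = [False, False, True, False, False, False, False, True, False, False]
print(f"{change_fail_rate(history):.0%}")  # 20%
```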
Acceptance of Failure
There’s a significant shift here as well. Companies like to focus on not letting things break. “One of the paradigm shifts in moving to continuous delivery is the acceptance that systems fail,” Humble said. “The systems are complicated, and failure is inevitable.”
So the question becomes not how do we stop failure, but how fast can we fix it?
There’s much talk about how moving to microservices necessitates a change in culture, but what does that even mean? Research has answers.
Culture is transformative, explained Forsgren. “We measure this in our work, and we show that culture has a predictive effect on organizational outcomes and technological capabilities.” Take a look at how your company deals with failure and novelties.
Failure should be treated as a learning opportunity, Forsgren said. Wait, what? Since failure is inevitable, your culture needs to accept and accommodate that. “Instead of asking ‘whose fault is it?’” Forsgren suggested, “ask instead ‘what can we learn from this? What information/process was missing that led to the failure? How can we get the team members what they need to make this not happen again?’ If you’re seeing a system failure and looking for who to blame, your thinking needs an upgrade.”
Another critical aspect of needed change is how you deal with novelties. With continuous delivery, new ideas are the lifeblood of system maintenance. "In your company, are novelties crushed, or are they implemented?" Forsgren asked. Use of novelties, they found, is dependent on the strength of your teams.
Google did extensive research, both internally and externally, on what makes a great team. The number one ingredient was psychological safety, which leads to comfort with taking risks.
Their best advice for culture change: create teams where it’s safe to go wrong and make mistakes.
“What the data shows,” said Humble, “is that companies that do well in the performance measures we talked about outperform their low-performing peers by a factor of two.”
Organizations that perform well on speed and stability are creating feedback loops, Humble said. “What the feedback loop allows us to do is build a thin slice, a prototype of a feature, get feedback through some UX mechanism whether that's showing people the prototype and getting their feedback, whether it's running A/B tests or multivariate tests in production, it's what creates these feedback loops that allow you to shift direction very fast.”
Moving to microservices and continuous delivery requires a simultaneous move to DevOps to be successful, according to the research done by this team. Humble summed it up, quoting Jeff Patton: “Minimize output, maximize outcomes.”