How Spinnaker fits into the Continuous Delivery puzzle

Whenever a new tool appears in the IT landscape, people have a natural tendency to compare it to existing solutions. This is understandable, as we always build on existing knowledge and try to “fit” new tools into what we already know.

Comparing new things to old things is also great for elevator pitches. Product X comes out, and people describe it as “a direct competitor to product Y but with certain improvements in area Z”. This allows people to understand the scope and goal of Product X in a single sentence provided they are already familiar with its direct competitor.

Not all tools, however, are created as direct competitors to something else. Every now and then, a new tool arrives that attempts to push the envelope one step further: instead of improving an existing aspect of a workflow, it calls for a paradigm shift.

Spinnaker is one such tool: it has no direct competitor and expects a mindset change from its users. In a world where people talk about build servers, configuration management tools, cluster orchestrators and cloud consoles, Spinnaker redefines the basic concepts at a much higher level of abstraction, in a way that simply did not exist in any other tool.

In this article, we will see how Spinnaker fits into the Continuous Delivery puzzle and why it is different from all the tools before it. If, on the other hand, you insist on direct comparisons, feel free to consult the Spinnaker versus other solutions article first.

The basic software life-cycle

If you look at any software application from 10,000 feet, the basic lifecycle is the following:

lifecycle

The phases are:

  1. The software must be packaged in its final binary form
  2. The binary is copied to the machine that will run it
  3. The application is executed and after it becomes healthy, it is continuously monitored
  4. Whenever a new version of the application is created, the cycle starts from the beginning

Traditionally, the phases of this cycle could take several hours or even days. In the case of physical data centers, the deploy phase could even take months if the machine responsible for the software was not ready at the time of installation.

Thankfully, cloud infrastructure (first in the form of virtual machines and lately with containers) has removed the barriers that arose from the use of physical machines. Creating a new machine is now possible in a matter of minutes or even seconds.

This has allowed fast-moving companies to minimize the feedback time of the basic software cycle, resulting in multiple software releases every day or even every hour, when traditional releases would happen once a week or once a month.

Deploying an application very often is a huge advantage for the business owners of the application because it means that new features reach the application users in the most timely manner. Unfortunately, the frequent releases also strain the development and operations teams responsible for the application. The previous picture is too abstract, and a better look at what happens in reality is shown below:

lifecycle-parallel

There is not a “single” application cycle anymore. At any given point in time, there are at least 3 versions of the application (there can be more depending on the company):

  • The application that is already running in production and must stay healthy at all costs
  • The application that is waiting to be deployed. It may even be partially deployed into a subset of machines (canaries)
  • The application that developers are working on and will soon be released

In well-disciplined companies, more versions of the application might be present at any given time. For example, another version could exist that is currently passing load testing.

Moving from the data center to the cloud

Minimizing the length of the application lifecycle is a constant effort. Getting new features quickly into production is the ultimate goal of every software company. We have seen in the previous section that deployments account for a crucial part of the length of the cycle, making them one of the most common bottlenecks.

In fact, if we take a vertical slice of time in the application lifecycle from the previous picture, we can see that all 3 phases happen essentially at the same time. Even though each phase applies to a different version, it should be clear that deployments are now an intrinsic component of the whole cycle. At any single point in time:

lifecycle-status

This new reality has caught a lot of companies off guard. Traditionally, deployments were something that required special attention, happened infrequently, and usually required several manual steps. The problem of slow and error-prone deployments became even more apparent as companies moved their applications to public and private clouds.

Companies that still treat deployments as something exceptional that needs special attention have discovered that moving to the cloud is not as easy as replacing physical machines with virtual ones. After moving to the cloud, several companies discovered that deployments are now the bottleneck of each release cycle. Even if developers implement features as fast as possible, problematic deployments will always prevent those features from reaching the application users.

Tools that help the development part of the cycle can still work just fine in a cloud environment. Build servers and build systems that are responsible for preparing the application binary are still applicable even when the application is destined for the cloud.

Deploying the application, on the other hand, is a completely different problem, and the techniques that deal with physical servers are mostly obsolete when it comes to virtual machines.

It was obvious that a new solution was needed when it came to cloud deployments.

deployment-tools

For the compilation phase, developers can still use the platforms they know and love. For example, Jenkins is a very powerful build server that is responsible for compiling and packaging the application binary regardless of the final target of the application.

In the case of public clouds, Amazon, Google, Microsoft, etc. already offer cloud consoles that show the status of each virtual machine. Kubernetes clusters come with their own handy dashboard that allows everybody to quickly glance at the status of each machine.

But what happens in the middle? How is the application actually deployed to the cloud?

Deployment scripts - the dark ages of cloud delivery

When it comes to cloud deployments, most companies followed the path of least resistance. They just extended their existing tools to handle the new paradigm of virtual machines:

jenkins-scripts

The picture above shows one of the classic patterns of cloud adoption within companies that already had physical servers.

  • Jenkins is still used for compilation, but it is now extended with extra jobs that handle deployment
  • Configuration management tools (e.g. Puppet and Chef) are now tasked with application deployment
  • Custom scripts serve as glue code that brings the end result of Jenkins to what cloud APIs expect

This situation is the result of “patching” existing tools with the new requirements of cloud development. Even though this technique can work in a limited manner for small teams, it quickly shows its shortcomings when it comes to big companies with a large number of applications.

Jenkins was never designed with deployment capabilities in mind. Its basic construct is a job and nothing else. It knows nothing about environments or deployments. Mixing compilation and deployment jobs quickly becomes a mess. Jenkins is used as an example here, but any traditional build server suffers from the same issues.

Puppet, Chef and similar tools were designed for system administration and not application deployment. They are perfect for setting up a machine and changing its configuration to a new state. Attempting to use them for application deployment is a losing battle. Their fire-and-forget nature is a huge obstacle when it comes to safe deployments. Rolling back a release is a nightmare if the main application deployment fails. Configuration drift is a constant problem and can ruin deployments in the most unpredictable manner.

The truth is that custom deployment scripts essentially abuse all the existing tools (e.g. Jenkins and Chef/Puppet). These tools were never designed for cloud deployment in the first place. Any big company that tries this pattern will quickly see the limitations:

  • Deployments are stressful because rollbacks are hard
  • The custom deployment scripts are handled by a specialized team with tribal knowledge
  • Debugging a failing deployment is very hard due to all the moving parts
  • Rollbacks are next to impossible and require heavy manual intervention
  • Unplanned downtime is constantly happening either because of bad releases or unsuccessful rollbacks

Instead of writing custom deployment scripts, a better solution would be to adopt a cloud-native tool specifically designed to handle deployments and rollbacks. Enter Spinnaker!

Embracing immutable infrastructure with Spinnaker

We have seen in the previous section the problems that appear when a company attempts to adapt datacenter tools for cloud deployments.

The main issue here is that most existing tools are centered around the concept of mutable infrastructure. A physical server is used to deploy the initial version of an application, and any new version that appears is layered on top of the existing server filesystem.

This solution might work well with physical servers, but with a cloud server there is an alternative. We can instead create a completely new server on the fly whenever a new application version is created. This makes deployments (redirect traffic from the old server to the new server), canaries (keep both the old and the new server running) and rollbacks (destroy the new server) very easy. Rollbacks in particular can now happen in a completely automated manner without any human intervention.
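The three operations described above can be sketched in a few lines. This is only an illustrative model, not Spinnaker code: the `Server` and `LoadBalancer` classes and their methods are hypothetical names made up for this example, and a real load balancer would of course route to many servers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Server:
    """An immutable server: its application version never changes after creation."""
    version: str


@dataclass
class LoadBalancer:
    """Hypothetical load balancer that sends traffic to exactly one server."""
    active: Optional[Server] = None
    previous: Optional[Server] = None

    def deploy(self, version: str) -> Server:
        # Create a brand-new server instead of modifying the running one.
        new_server = Server(version)
        # Keep the old server around so that a rollback stays trivial.
        self.previous = self.active
        # The deployment itself is just a traffic switch.
        self.active = new_server
        return new_server

    def rollback(self) -> None:
        # Point traffic back at the untouched old server;
        # the new server is simply discarded.
        if self.previous is None:
            raise RuntimeError("no previous server to roll back to")
        self.active = self.previous
        self.previous = None


lb = LoadBalancer()
lb.deploy("1.0")
lb.deploy("1.1")
lb.rollback()
print(lb.active.version)  # 1.0
```

Note that the rollback never touches the old server. It only changes where traffic goes, which is why it can be automated safely.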

The benefits of immutable infrastructure are explained with more details in our immutable infrastructure post.

Once we understand why immutable infrastructure is the way forward, it becomes apparent why adapting existing tools is problematic when it comes to cloud deployments.

Most (if not all) configuration management tools are designed for mutable infrastructure. This makes them a bad fit for cloud deployments that want to use immutable infrastructure.

immutable

The truth is that all configuration management tools deploy applications in a destructive manner. They take an existing server, perform several changes and leave it in a different state. The old application is gone. Forever. A rollback is the reverse process of bringing that state back. This means that if there are a hundred things that can go wrong with a deployment, there are also a hundred things that can go wrong with the rollback itself.
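To make the symmetry between deployment and rollback concrete, here is a minimal sketch of an in-place upgrade. It does not reflect how any particular configuration management tool works internally; the server is modeled as a plain dictionary and the step format `(key, old value, new value)` is an assumption made for this example.

```python
from typing import Dict, List, Tuple

# Each step is (key, old value, new value) on a hypothetical server state.
Step = Tuple[str, str, str]


def deploy_in_place(server: Dict[str, str], steps: List[Step]) -> None:
    """Mutate the existing server; the old values are overwritten."""
    for key, _old, new in steps:
        server[key] = new


def rollback_in_place(server: Dict[str, str], steps: List[Step]) -> None:
    """Undo every step in reverse order; each undo is a new chance to fail."""
    for key, old, _new in reversed(steps):
        server[key] = old


server = {"app": "1.0", "config": "a", "libfoo": "2.3"}
upgrade = [("libfoo", "2.3", "2.4"), ("config", "a", "b"), ("app", "1.0", "1.1")]

deploy_in_place(server, upgrade)
after_deploy = dict(server)
rollback_in_place(server, upgrade)
print(after_deploy["app"], server["app"])  # 1.1 1.0
```

The rollback needs exactly as many steps as the deployment, and it only works if every old value was recorded correctly beforehand. With immutable servers there is nothing to undo.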

A quick note about Terraform

Terraform is one of the newer tools that has gained a lot of traction lately for deployment automation. Even though at first glance Terraform has adopted immutable infrastructure, in reality it is a low-level solution aimed at network constructs (e.g. load balancers) rather than application deployment. It does not support any kind of red/black deployment or canary states.

Netflix saw this need for a dedicated tool to handle cloud deployments in a native manner. Spinnaker is the first and currently the only tool that fully embraces immutable infrastructure.

spinnaker-logo

Spinnaker was used internally by Netflix and was recently open-sourced as a community project. It is already used in production by major companies and has contributors from major cloud providers. Spinnaker also works great with Kubernetes clusters (support was added by Google itself).

Cloud deployments with Spinnaker

We have already described the basic software lifecycle at the beginning of this article. Let’s see now what Spinnaker does differently for each phase.

First of all, for compilation/packaging Spinnaker just delegates to Jenkins. Spinnaker is not a build server and does not want to be a build server. Using the standard Jenkins API, Spinnaker can start Jenkins jobs, monitor their progress and obtain their results.

The difference here is the inversion of control. Spinnaker controls the main pipeline and Jenkins is just one of the build steps. This keeps the scope of Jenkins confined to what it does best: compiling code. All compilation jobs still stay with Jenkins, but the deployment responsibility stays with Spinnaker.
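The inversion of control can be illustrated with a toy orchestrator. Everything here is hypothetical: `FakeBuildServer` is a stand-in for a build server such as Jenkins and does not use the real Jenkins API, and `run_pipeline` is a simplified model of a pipeline engine, not Spinnaker's implementation.

```python
from typing import Callable, Dict, List

Context = Dict[str, str]
Stage = Callable[[Context], None]


class FakeBuildServer:
    """Stand-in for a build server such as Jenkins; not the real Jenkins API."""

    def run_job(self, job_name: str) -> str:
        # A real integration would trigger the job remotely and poll for its result.
        return f"{job_name}-artifact.tar.gz"


def build_stage(build_server: FakeBuildServer, job_name: str) -> Stage:
    def stage(ctx: Context) -> None:
        # The orchestrator delegates compilation and only keeps the result.
        ctx["artifact"] = build_server.run_job(job_name)
    return stage


def deploy_stage(ctx: Context) -> None:
    # Deployment stays with the orchestrator, not with the build server.
    ctx["deployed"] = ctx["artifact"]


def run_pipeline(stages: List[Stage]) -> Context:
    # The pipeline owns the control flow; the build is just one stage in it.
    ctx: Context = {}
    for stage in stages:
        stage(ctx)
    return ctx


result = run_pipeline([build_stage(FakeBuildServer(), "my-app"), deploy_stage])
print(result["deployed"])  # my-app-artifact.tar.gz
```

The build server never learns that a deployment exists. It returns an artifact and the orchestrator decides what happens next.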

spinnaker-and-jenkins

This pattern comes in contrast with the custom Jenkins script described in the previous section, where Jenkins is controlling the build and has special jobs for application deployment.

With the compilation phase out of the way, the next phase is deployment. This is the part where Spinnaker really shines. Spinnaker has built-in support for both basic (rolling) and advanced (red/black) deployments. This support is provided in a standardized way, without the need for special code or custom glue scripts.
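The difference between the two strategies is easiest to see by tracking how many instances of each version serve traffic over time. The following is a simplified model, not Spinnaker's implementation; the function names and the `(old, new)` snapshot format are assumptions made for this example.

```python
from typing import List, Tuple

# A snapshot is (instances serving the old version, instances serving the new version).
Snapshot = Tuple[int, int]


def rolling_deploy(size: int, batch: int) -> List[Snapshot]:
    """Replace instances batch by batch; both versions serve traffic meanwhile."""
    if batch <= 0:
        raise ValueError("batch must be positive")
    snapshots = [(size, 0)]
    replaced = 0
    while replaced < size:
        replaced = min(size, replaced + batch)
        snapshots.append((size - replaced, replaced))
    return snapshots


def red_black_deploy(size: int) -> List[Snapshot]:
    """Start a full new group next to the old one, then switch traffic at once."""
    return [
        (size, 0),  # old group serves; new group is starting but gets no traffic
        (0, size),  # traffic switched to the new group; old group disabled, kept for rollback
    ]


print(rolling_deploy(4, 2))  # [(4, 0), (2, 2), (0, 4)]
print(red_black_deploy(4))   # [(4, 0), (0, 4)]
```

In the rolling case there is an intermediate state in which users can hit either version. In the red/black case traffic only ever sees one version, at the cost of temporarily running twice the capacity.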

red-black

This means that by using Spinnaker you get, for free, the deployment patterns used by Netflix that allow for minimal (even zero) downtime when a new application version is released.

With immutable infrastructure, rollbacks are also very easy. Again, Spinnaker has native support for them. Rollbacks in Spinnaker are literally a single button push.

rollback3

Rolling back a cluster is this easy because Spinnaker has high-level knowledge of every cloud environment. Clusters, load balancers and security groups are natively modeled in Spinnaker.

The end result is a build pipeline within Spinnaker that is much richer than a pure Jenkins pipeline. Creating a cluster, deploying and resizing are native steps instead of custom Jenkins jobs that call external scripts. All the familiar capabilities (e.g. manual approval, parallel steps) that developers expect from pipelines are also offered by Spinnaker.

pipe9

Spinnaker uses the “infrastructure-as-code” concept too. All pipelines are also represented as JSON documents. The UI is completely optional and the underlying API that runs behind the scenes is fully open for everybody to call (if such flexibility is needed).

The final step is, of course, to make sure that the application is actually running once deployed. No guesswork is needed here. Spinnaker has native support for the APIs of all major cloud providers (Google, Azure, Amazon, OpenStack, etc.) and can query the status of clusters from the same interface used for deployment.

cloud-status

Therefore, it is no longer necessary to visit the console of your cloud provider to see what is happening. You can resize, delete or even create completely new clusters from Spinnaker itself.

It should be clear now that Spinnaker can handle all phases of cloud deployment with a single platform.

spinnaker-one-stop-shop

Here's a great 'Spinnaker 101' video by Google:

Spinnaker and Kubernetes

The icing on the cake is that Spinnaker has native support for Kubernetes. Kubernetes support was added by Google itself and it is an integral part of the basic Spinnaker distribution.

Spinnaker handles Kubernetes like any other cloud provider. It can read the health of a Kubernetes cluster and perform deployments using the same calls as any other external Kubernetes tool. In fact, Spinnaker augments the basic Kubernetes deployment capabilities by offering deployment strategies not yet offered by Kubernetes itself.

For a detailed description of the Kubernetes support, see the interview with the people who actually implemented it.
