Armory recently sponsored the EXO Cloud Summit, where we gave Amar Zumkhawala a hello world demo of Spinnaker. If you've heard of Spinnaker but never seen it in action, this is a great opportunity to see it for the first time:
Here's the transcript:
DROdio: Okay, so this is Daniel, Isaac, and Amar. Did I get that right? Okay. We’re at the Exo Cloud Summit in Aspen. And Amar was asking about Spinnaker. So to set the stage here, you are a technologist, but you’ve never used Spinnaker. You’ve heard a little bit about it.
Amar: Heard a little bit about it. I’ve seen Kubernetes and other orchestration systems. And I said, “Hey, using me as a guinea pig, show me. Start from scratch and explain to me what this is and what you guys want to offer.”
DROdio: So this will be great because it’ll be an opportunity for anybody who’s in the same shoes as you are – which is many people – to really understand what Spinnaker is. Our guide will be Isaac today, our CTO and founder. So let’s get to it, man.
Isaac: Sure. Okay, so Spinnaker is a framework, or a deployment tool, that is focused on immutable infrastructure. That’s something you already know; we discussed it. And because it’s built around immutable infrastructure, Spinnaker takes an infrastructure point of view of the world. When you land on the Spinnaker homepage, you’re literally searching infrastructure. You’re looking at everything from an infrastructure point of view. It does have a notion of applications, but really, applications are just servers and logical groupings of servers. I’ll just show you the Hello World server.
DROdio: And Isaac, when you say the Spinnaker homepage, can you describe what that is? Is this an implementation of Spinnaker? What is it that Amar is looking at?
Isaac: Sure. This is just a summary of your infrastructure. The infrastructure here is relatively small, so you’re able to see bits of data here, like things that have been recently changed in production. But if you were to look at Netflix, they do 4,000 deployments a day, so they’re destroying and starting up servers 4,000 times a day.
Amar: So to get here, I would download Spinnaker, stand up a server, and run it. Then I would start putting in applications and other infrastructure, and they would show up on this dashboard.
Isaac: You would actually start with no infrastructure, because you haven’t done a deployment yet. Spinnaker deploys in a very specific way. You would start with a new application.
Amar: Start with a new app. That’s my first step.
Isaac: Right. So we’ll start here. You give the application a name, who the owner is, who created it, [unintelligible – 02:38] type, and a few other things. One of those things is accounts. One of the accounts we have is a test account, which is our AWS account, and then there’s docker.io, which is where you push containers. So I’m actually pushing to both Amazon and Kubernetes with Spinnaker. That’s a really important thing to note: it deploys to multiple cloud targets, which is really its power. And it’s flexible in that sense, so it doesn’t lock you into a specific vendor. So you would add that application. I’ve already added the hello world one. Let’s see what that looks like now.
Amar: When we say not locked into a vendor, can you talk a little bit more about that? What kind of freedom am I getting here?
Isaac: What Spinnaker does… The way Netflix originally built it, they wanted to abstract out the implementation for Amazon. The reason they did that was that they ultimately wanted to get Google, Microsoft, and the cloud [founder – 03:34] teams involved. So what they started with was an interface and a specification, saying “this is how we’re going to deploy to the cloud,” with no specific implementation. Then they built a specific implementation for Amazon, and then they brought in Google, Microsoft, and Pivotal to build the implementations for their cloud services. So when you are deploying, you’re deploying to an interface, and you choose a specific implementation for where you want to go. You can literally click and change that implementation, as you saw with that drop-down.
Amar: So Spinnaker already has support and relationships with a lot of different cloud providers.
Isaac: Yes, and they’re all in production: the implementation for each cloud provider is being used in production. We do intend to release an official 1.0 version of Spinnaker in [Q1 – 04:23]. But a lot of companies are already using it, and it’s been proven to work in… Netflix has been using it in production for the last three years. So there’s a ton of support.
Let me show you how a deployment pipeline works. Let’s start by configuring one. The general idea of Spinnaker is that you start off with a trigger. That trigger can be one of these options here, or you can use an API call to kick off the pipeline. Typically, though, you’ll use something like Jenkins. They’re building out Travis CI support. One pipeline can also kick off another pipeline, or a push to a Docker registry can kick one off. The point here, too, is that we are not trying to replace what CI tools already do. Jenkins, Travis CI, and CircleCI all do a phenomenal job of building a package, and we don’t want to replace that. What we want to do is focus on continuous delivery of that particular package. So you choose your package and your job, the Jenkins job. Here’s Jenkins. This is the hello build. We’ll kick this thing off in a second, but the idea is that this is what triggers the pipeline: when it’s done building and running any unit tests, it kicks the pipeline off. Then it goes into a bake stage. Now for your traditional AMI or image [base – 05:48] for the cloud…
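The “deploy to an interface, pick an implementation” idea can be sketched in a few lines. This is a minimal illustration, not Spinnaker’s actual provider API: `CloudProvider`, `AmazonProvider`, `GoogleProvider`, and `run_deploy` are hypothetical names, and the implementations only return a label where real ones would call each cloud’s APIs.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical provider-neutral deploy interface (not Spinnaker's real API)."""

    @abstractmethod
    def deploy(self, image: str, region: str, capacity: int) -> str:
        """Create a server group from `image`; return an identifier for it."""


class AmazonProvider(CloudProvider):
    def deploy(self, image: str, region: str, capacity: int) -> str:
        # A real implementation would call the AWS APIs here.
        return f"aws:{region}:{image}:{capacity}"


class GoogleProvider(CloudProvider):
    def deploy(self, image: str, region: str, capacity: int) -> str:
        # A real implementation would call the GCP APIs here.
        return f"gcp:{region}:{image}:{capacity}"


# The "drop-down": switching clouds is just picking a different implementation.
PROVIDERS: dict[str, CloudProvider] = {
    "aws": AmazonProvider(),
    "gcp": GoogleProvider(),
}


def run_deploy(target: str, image: str, region: str, capacity: int) -> str:
    # Pipeline code only ever talks to the CloudProvider interface.
    return PROVIDERS[target].deploy(image, region, capacity)
```

For example, `run_deploy("aws", "hello-ami", "us-west-2", 2)` and `run_deploy("gcp", ...)` go through the same call site; only the registry lookup differs, which is what keeps the pipeline itself vendor-neutral.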
Amar: What’s cooking here.
Isaac: Yeah. So the idea here is that you take the package that was built in the previous step. So here’s our hello world. This is a Debian package that gets installed. But you can install a zip file, a [rar – 06:03] file, whatever you want.
Amar: And you have two base OSes here.
Isaac: Yeah. Those are the standard two OSes that this comes with, just because it was born out of Netflix. But if you go to advanced options, you can choose your own base AMI. So if you have a sanctioned or blessed AMI inside of your company that is not just standard Ubuntu, you can use that too. So it just gives you the option there.
Amar: So I can bring my own Linux if I want.
Isaac: Exactly. A lot of people use Amazon Linux.
Amar: And Windows?
Isaac: Yeah, I guess we could bake a Windows image. Anything that is bakeable inside of Amazon works, so anything you can create an Amazon image or an Azure image from (and Azure, obviously, supports Windows), as well as images on GCP. They’re all supported.
This will install this package from my Debian repository. It’ll bake it into an image. And that’s ultimately the basis for a deployment, this image.
Amar: And that image is your immutable infrastructure.
Isaac: That’s right. So from then on, we typically don’t log into servers because there’s nothing you would need to change. If you need to change anything in that particular image, you create a new image. You don’t just modify the servers themselves.
So now we’re going to deploy that image to stage. It already knows that the previous stage was a bake step, so it knows to grab the image from that step. You tell it what account you want to deploy to; this is my Amazon test account. What region? Any VPC information. Stack and Detail allow you to create logical groupings. Here I’ve created something called stage, but maybe it’s Stage DB, Stage API, or Stage whatever else you want. The power of Spinnaker is that when you keep these naming conventions consistent, you can use them later for analytics or with your APMs. You can export this information to your APMs, and then you can start seeing things like, “Oh, it’s Stage DB version 001 that’s having a problem. 002 looks like this. 001 looks like this.” And you can compare two versions. We’ll get into that in a second.
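The Stack and Detail fields feed the `app-stack-detail-vNNN` naming convention Spinnaker inherited from Netflix (server groups like `hello-stage-v014`, as seen later in this demo). The sketch below assumes that pattern and shows why consistent names make version comparison cheap: grouping by cluster is just string parsing. `ServerGroupName`, `parse`, and `versions_by_cluster` are hypothetical helpers, not Spinnaker APIs.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class ServerGroupName(NamedTuple):
    app: str
    stack: Optional[str]
    detail: Optional[str]
    version: Optional[int]

    @property
    def cluster(self) -> str:
        # A cluster is every server group sharing the same app-stack-detail prefix.
        parts = [self.app]
        if self.stack or self.detail:
            parts.append(self.stack or "")
        if self.detail:
            parts.append(self.detail)
        return "-".join(parts)


# Assumed version suffix: "-v" plus three digits, e.g. "hello-stage-v014".
_VERSION = re.compile(r"^(?P<base>.+)-v(?P<version>\d{3})$")


def parse(name: str) -> ServerGroupName:
    version = None
    match = _VERSION.match(name)
    if match:
        name, version = match.group("base"), int(match.group("version"))
    app, _, rest = name.partition("-")
    stack, _, detail = rest.partition("-")
    return ServerGroupName(app, stack or None, detail or None, version)


def versions_by_cluster(names: list[str]) -> dict[str, list[int]]:
    # Group versions so e.g. Stage DB v001 and v002 can be compared side by side.
    clusters: dict[str, list[int]] = defaultdict(list)
    for name in names:
        parsed = parse(name)
        if parsed.version is not None:
            clusters[parsed.cluster].append(parsed.version)
    return {cluster: sorted(vs) for cluster, vs in clusters.items()}
```

Because every team’s server groups follow the same pattern, the same few lines work across the whole organization, which is the point Isaac makes about exporting this data to analytics and APM tools.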
Amar: So this is going to solve the problem for me of coordinating different pieces of immutable infrastructure.
Isaac: Correct. It actually does it across your organization, and it keeps things consistent. Let’s say you’re an operations person who has to deal with team A, team B, and team C. They’re all [in a metrics – 08:49]. I’m able to see that there’s a hierarchy in the naming convention. It’s a consistent naming convention. [And it goes to – 08:59] versions in infrastructure, not versions in code [which is – 09:02] versions in the entire infrastructure. And I can compare those versions with previous versions of the infrastructure because they’re all tagged as such.
Amar: So Netflix has their own instance of Spinnaker. When they go to their Spinnaker dashboard, they see everything.
Isaac: They see everything.
Amar: And they can see the connections between all their services.
Isaac: Yes, exactly. And they can see the versions of the services, when each was revved, who deployed it, what [unintelligible – 09:33] incredible amount of transparency, because they keep everything consistent and they have a consistent way of deploying.
Amar: This is really a big-picture tool. It explains to you what’s going on with your software.
Isaac: Very much so. We call it a continuous delivery tool, but it really does much more than that. It covers everything from the moment somebody decides that code needs to be [pushed for – 09:54] production, through the deployment, and even into post-production, when you have operational visibility into what’s running.
So back to the deployments: you can deploy with a number of different strategies, one of them being Red/Black, or Blue/Green. Netflix calls it Red/Black because of the colors in their logo, but it’s commonly called Blue/Green in the marketplace. What Red/Black, or Blue/Green, allows you to do is deploy a new version of your infrastructure and then disable the old version, not destroy it, just disable it, so that once the new version is rolled out, if there are any problems, you can roll back very quickly. I’ll show you what that looks like in a second. You can attach it to a load balancer. Load balancers are more static resources; you don’t really need to recreate them, because they’re managed by Amazon or Azure or GCP. You can apply security groups so that you can only talk to certain specific instances. You can decide on sizing here, the capacity, meaning how many instances to run; we’ll be running two. And you can choose availability zones. So there’s a whole bunch of flexibility and options around how you want your cluster to look. But really, the thing here is that I’m taking the image I built on a t1.micro, and it’s going to apply it to two instances of this t2.small.
Then we add an additional step, a manual judgment step. This is where you can have manual QA. Most people have either manual QA or automated QA; you could add another Jenkins job here to run integration tests. This could be a gating factor for you to run the production deployment. Production looks very similar, but in production I chose Red/Black. For the previous one I chose the [unintelligible – 11:46] strategy, which only keeps one version, because I don’t need two versions of my staging infrastructure. But in production I do, because I may want to roll back.
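The core of Red/Black as described here (new version comes up next to the old one; the old one is disabled, not destroyed, once the new one is healthy) can be modelled as a tiny state machine. This is a simplified sketch under stated assumptions: `ServerGroup`, `Cluster`, and `red_black_deploy` are hypothetical, “enabled” stands for “receiving traffic from the load balancer”, and health checking is reduced to a boolean.

```python
from dataclasses import dataclass, field


@dataclass
class ServerGroup:
    name: str
    enabled: bool = True  # enabled == receiving traffic from the load balancer


@dataclass
class Cluster:
    groups: list[ServerGroup] = field(default_factory=list)

    def in_service(self) -> list[str]:
        return [g.name for g in self.groups if g.enabled]

    def running(self) -> list[str]:
        # Disabled groups keep running; nothing here ever destroys a group.
        return [g.name for g in self.groups]


def red_black_deploy(cluster: Cluster, new_name: str, healthy: bool) -> bool:
    """Bring the new group up beside the old ones; only if it is healthy,
    disable (not destroy) the old ones. Returns True if traffic moved."""
    new_group = ServerGroup(new_name, enabled=False)
    cluster.groups.append(new_group)
    if not healthy:
        return False  # old version keeps serving; new group never took traffic
    new_group.enabled = True
    for group in cluster.groups:
        if group is not new_group:
            group.enabled = False
    return True
```

After deploying `hello-prod-v009` into a cluster holding `hello-prod-v008`, only V009 is in service but both are still running, which is exactly what makes the quick rollback later in the demo possible.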
Let’s look at what this pipeline looks like. Here’s build 10, which corresponds to build 10 here. Now, I stopped it at the manual step, so it’s waiting for the manual judgment. Let’s review what the steps look like. It baked the image based on that last build, build 10. It deployed to stage; it deployed version 014. Now I’m going to show you this right here. This is a cluster view. If I look at hello stage, you can see V014 is there. And if I go to my load balancers, I have V014. My cluster is there. And I’ll show you what this looks like. This is what it looks like; simple enough. You can see every time I do a refresh, it goes to a different instance. Those are two different instances. So this is V014. Now I’m going to go to this manual step and click continue.
Amar: So basically you’re supervising, like QA.
Isaac: “QA”. Ideally, that’s an automated step. But I made it manual because it does take a little bit of time to start up instances inside of Amazon.
Amar: So like a release manager. They could step in here and push that button [unintelligible – 13:24] after doing there.
Isaac: Or a product manager. Anybody who you want, you can set up different roles. You can also set up different criteria.
Amar: Can I integrate my release go/no-go processes with this tool?
Isaac: It’s incredibly flexible. While this is spinning up: you’ll see that V009 is spinning up here, and V008 is going to go out of commission. Right now it’s waiting. You can see the spinner. This is the status: it’s deploying in us-west-2. There’s a whole bunch of other things that are going to happen, but eventually, you’ll see V008 go away.
In the meantime, let me show you how easy it is to change the pipeline. Let’s assume you wanted an additional stage to happen here. Let’s do another deploy, a deploy to production. Let’s just call it… there’s a data center in Brazil. If you want this to happen in parallel, you can easily just change that there. So now, the next time I do the manual judgment, it’ll deploy to us-west-2 and also into Brazil. You can also say we want this to happen after…
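The pipeline graph Amar points at is a dependency graph: a stage runs once every stage it depends on has finished, so two stages that depend only on the manual judgment run in parallel. The sketch below borrows the `refId` and `requisiteStageRefIds` field names that Spinnaker’s pipeline JSON uses for stage dependencies, but the stage list is a simplified, hypothetical version of this demo’s pipeline, and `execution_waves` is an illustrative helper, not Spinnaker code.

```python
def execution_waves(stages: list[dict]) -> list[list[str]]:
    """Group stages into waves; every stage in a wave can run in parallel."""
    remaining = {s["refId"]: set(s.get("requisiteStageRefIds", [])) for s in stages}
    names = {s["refId"]: s["name"] for s in stages}
    waves: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # Ready = all prerequisites already finished.
        ready = sorted(ref for ref, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("pipeline has a cycle or a missing dependency")
        waves.append([names[ref] for ref in ready])
        done.update(ready)
        for ref in ready:
            del remaining[ref]
    return waves


# Hypothetical pipeline mirroring the demo: two production deploys fan out
# from the same manual judgment, so they land in the same wave.
PIPELINE = [
    {"refId": "1", "name": "Bake", "requisiteStageRefIds": []},
    {"refId": "2", "name": "Deploy to stage", "requisiteStageRefIds": ["1"]},
    {"refId": "3", "name": "Manual judgment", "requisiteStageRefIds": ["2"]},
    {"refId": "4", "name": "Deploy to us-west-2", "requisiteStageRefIds": ["3"]},
    {"refId": "5", "name": "Deploy to Brazil", "requisiteStageRefIds": ["3"]},
]
```

Making the Brazil deploy run after us-west-2 instead would just mean changing its prerequisite from `"3"` to `"4"`, which is the kind of one-field edit the UI performs.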
Amar: That’s cool. I like this graph up here.
Isaac: Yeah. So now let’s say that after stage [O is – 14:48] deployed to Brazil, but in the United States we need a manual check. The configuration options here are incredible. Let me show you a quick blog post. Complex pipelines [unintelligible – 15:04] to show you how crazy it gets. This is a pipeline that I modeled on what they do at Netflix. What’s happening here is they build four different browser plugins, wait for all of them to build, run integration tests, and wait for those to pass. Once everything looks good, they run a canary analysis. Then they deploy to 10, 50, and then 100 percent. They roll it out in stages instead of doing a full deployment all at once.
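A staged 10/50/100 percent rollout reduces to a small calculation plus a gate at each step. This sketch assumes a hypothetical `canary_ok(count)` callback standing in for the canary analysis (it is not Spinnaker’s canary API), and rounds each step up so even a small cluster gets at least one new instance at 10%.

```python
from typing import Callable


def rollout_plan(target_capacity: int, percentages=(10, 50, 100)) -> list[int]:
    """New-version instance count at each step of a staged rollout."""
    plan = []
    for pct in percentages:
        # Integer ceiling, so a 10% step of a small cluster still gets one instance.
        count = (target_capacity * pct + 99) // 100
        plan.append(min(max(count, 1), target_capacity))
    return plan


def staged_rollout(target_capacity: int, canary_ok: Callable[[int], bool],
                   percentages=(10, 50, 100)) -> tuple[int, bool]:
    """Advance one step at a time and stop at the first failed check.

    `canary_ok(count)` is a hypothetical stand-in for canary analysis.
    Returns (instances of the new version when stopped, whether it completed).
    """
    deployed = 0
    for count in rollout_plan(target_capacity, percentages):
        deployed = count
        if not canary_ok(count):
            return deployed, False
    return deployed, True
```

For a 20-instance cluster the plan is 2, 10, then 20 instances; if the check fails at the 50% step, the rollout halts with only half the fleet on the new version instead of all of it.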
Amar: And I can get notified as it goes through that pipeline.
Isaac: Exactly. I’ll show you notifications in a second. Let’s go back. This is still going. Again, the wait time here is really just waiting on Amazon to start up these instances. Let’s check. They’re now healthy. What we’ll see over time is that the old ones, V008, will get disabled. And what you’ll see here is it’ll actually destroy… Why is that red? It’s destroying V007, because I asked it to only keep two old versions. This became the third old version, which is obviously more than two, so now it’s deleting it. It’s destroying the infrastructure. So you see what’s happening here. At a certain point, if I go look at my load balancers, V009 is now behind the load balancers. Once the load balancer reported the new version as healthy, it disabled the old one. Blue means it’s disabled; green means it’s good. So these servers are still around.
If I were to go look at this server… let me show you. If I look at my Amazon account, you’ll still see them there. You see, they’re running, but they’re not behind the load balancer. And you can see the value in what Spinnaker does here, which is that it tags them consistently. So it’s got an auto-scaling group. Where is 008? 008 is here somewhere. It tags it with additional metadata so that you can go and find it and feed that metadata back into your analytics. So that’s it.
Actually, you should do one more thing. You should roll it back so you can see. Let’s assume 009 was a problem. You’re like, “That doesn’t seem right. I’ll restore back to V008. I didn’t like this deployment; it caused problems for my customers.” And now it’s actually rolling back. I can close this; it’ll happen asynchronously. In a little bit, it’ll roll back to V008. It all happens super smoothly. I didn’t write any code for any of this functionality. This stuff just works out of the box.
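The cleanup Isaac describes (old server groups beyond a limit get destroyed) is a simple retention rule. This is a hypothetical sketch, not Spinnaker’s implementation: it keeps the newest `max_remaining` server groups by version number, and counting the just-deployed group toward that limit is an assumption chosen because, with a limit of two, it reproduces the demo’s outcome of destroying V007 once V009 is live.

```python
def version_of(name: str) -> int:
    # "hello-prod-v007" -> 7, assuming the -vNNN suffix naming convention.
    return int(name.rsplit("-v", 1)[1])


def prune(server_groups: list[str], max_remaining: int) -> tuple[list[str], list[str]]:
    """Split a cluster's server groups into (kept, destroyed), keeping the newest
    `max_remaining` groups, counted including the one just deployed."""
    if max_remaining < 1:
        raise ValueError("must keep at least the newly deployed server group")
    ordered = sorted(server_groups, key=version_of)
    cut = max(len(ordered) - max_remaining, 0)
    return ordered[cut:], ordered[:cut]
```

Only groups past the limit are destroyed; the ones kept but disabled are what a rollback can switch back to.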
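Because Red/Black only disabled V008, rolling back is a traffic switch rather than a rebuild, which is why it needs no code and finishes quickly. A minimal sketch, with hypothetical function names and a plain dict of server group name to “enabled” standing in for the cluster: it enables the restore target first and disables the others afterwards, so the load balancer is never left empty.

```python
def rollback_plan(enabled: dict[str, bool], restore_to: str) -> list[tuple[str, str]]:
    """Steps to roll a cluster back to an older, still-running server group.

    Enable the old group first, then disable the others, so there is never a
    moment with nothing behind the load balancer. Nothing is rebuilt.
    """
    if restore_to not in enabled:
        raise KeyError(f"{restore_to} was destroyed; it cannot be re-enabled")
    steps = []
    if not enabled[restore_to]:
        steps.append(("enable", restore_to))
    for name, is_enabled in enabled.items():
        if name != restore_to and is_enabled:
            steps.append(("disable", name))
    return steps


def apply_steps(enabled: dict[str, bool], steps: list[tuple[str, str]]) -> dict[str, bool]:
    # Return a new state with each step applied in order.
    state = dict(enabled)
    for action, name in steps:
        state[name] = action == "enable"
    return state
```

Rolling back to a version that has already been pruned raises an error here, which mirrors the trade-off behind how many old server groups to keep around.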
So how does that compare to what you expected to see in Spinnaker? It’d be really great to [unintelligible – 18:19] seeing it for the first time.
Amar: Let’s see. I wanted to see what it looks like in production. What is actually going… What is my production? You said that Netflix has 4000 services. I think I was expecting some sort of visualization of what it looks like in production. But now that I understand it better, what you’re really visualizing is the pipeline and everything that’s going on in the pipeline. It can be a dashboard of what’s going on in production as well, but the main emphasis is that this is what helps you get from code to production.
Isaac: In a safe manner.
Amar: In a safe manner.
Isaac: Cool, thank you for being the guinea pig, man. I really appreciate it.
Amar: Thank you.
Isaac: And we’ll put this up onto the site.