One of Spinnaker's core value propositions is the ability to perform safe & repeatable deployments. But how do you know if a deployment is truly safe?
A common way to quantify an application's general health is with an SLA (service level agreement). To measure how safe a deployment is, we can look at the change in SLA before and after that deployment. Over time, we'd hope to see your service's SLA improve as you add automation and adopt best practices for deployments, monitoring, and testing.
We've implemented a turn-key SLA measurement service directly within Armory Spinnaker in order to ensure you are constantly improving the safety of your deployments.
To calculate SLA, we look at uptime, response time, and error rates within CloudWatch. At a regular interval (every minute by default), we check whether those three metrics are within your specified thresholds. If all three pass, you are within your SLA for that time interval. If any one of them fails, you are NOT within your SLA for that time interval. The overall SLA score is simply the number of time intervals that are within SLA divided by the total number of intervals, expressed as a percentage.
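To make the calculation concrete, here is a minimal sketch of the scoring logic described above. It is not the service's actual implementation: the class names, field names, units, and threshold values are all hypothetical, and it assumes a metric that exactly equals its threshold still passes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical names and units; the real configuration may differ.
    min_uptime_pct: float
    max_response_ms: float
    max_error_rate_pct: float


@dataclass(frozen=True)
class IntervalSample:
    # One sample per check interval (every minute by default).
    uptime_pct: float
    response_ms: float
    error_rate_pct: float


def within_sla(sample: IntervalSample, limits: Thresholds) -> bool:
    """An interval passes only if all three metrics meet their thresholds."""
    return (
        sample.uptime_pct >= limits.min_uptime_pct
        and sample.response_ms <= limits.max_response_ms
        and sample.error_rate_pct <= limits.max_error_rate_pct
    )


def sla_score(samples: Iterable[IntervalSample], limits: Thresholds) -> float:
    """Percentage of intervals that were within SLA."""
    results = [within_sla(s, limits) for s in samples]
    if not results:
        raise ValueError("need at least one interval to compute an SLA score")
    return 100.0 * sum(results) / len(results)


# Illustrative thresholds and samples only (assumed values).
limits = Thresholds(min_uptime_pct=99.0, max_response_ms=500.0, max_error_rate_pct=1.0)
samples = [
    IntervalSample(100.0, 120.0, 0.0),  # passes all three checks
    IntervalSample(100.0, 800.0, 0.2),  # fails: response time too high
    IntervalSample(100.0, 200.0, 0.5),  # passes all three checks
    IntervalSample(100.0, 150.0, 3.0),  # fails: error rate too high
]
score = sla_score(samples, limits)
print(f"SLA score: {score:.1f}%")  # prints "SLA score: 50.0%"
```

Two of the four sample intervals pass, so the score is 50%. Running the same function over the intervals before and after a deployment gives the change in SLA attributable to that deployment.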
Here's how you configure the thresholds for your SLA.
Once you configure your SLA for each application, here's what your SLA dashboard will look like:
How to Enable the SLA Feature
- Add SLA_ENABLED=true to your prod.env file
- Restart Spinnaker with this command:
  service armory-spinnaker restart
  (or redeploy Spinnaker with your Spinnaker Deploy Spinnaker pipeline)
We realize that not every application's SLA can be effectively quantified with just uptime, response time, and error rate. You may want to look at custom metrics or perform other types of tests to truly determine if your service is available. In future versions of this service, we will allow you to define additional metrics that contribute to your SLA.
Let us know in the form below if there are specific types of metrics that you'd like to see us add.