Spinnaker frequently queries your cloud provider (AWS, GCP, Azure, Kubernetes, etc.) to understand the state of your existing infrastructure and current deployments. In doing so, however, it might run into rate limits imposed by the cloud provider. On AWS, you might see an exception similar to the following:
com.amazonaws.AmazonServiceException: CloudWatchAlarm Rate exceeded (Service: AmazonAutoScaling; Status Code: 400; Error Code: Throttling; Request ID: a217a0a2-da7e-11e5-734a-a1917861e2d6)
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1160)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:748)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:467)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:302)
    at com.amazonaws.services.autoscaling.AmazonAutoScalingClient.invoke(AmazonAutoScalingClient.java:3246)
This typically means you have hit the rate limits for the cloud provider.
Recently, the Spinnaker community has added service limits as part of Clouddriver to address growing concerns about these limits. Below is an example configuration for global rate limits for all services that you would place in
serviceLimits:
  defaults:
    rateLimit: 20
If you have multiple cloud providers, you can limit each one differently:
serviceLimits:
  cloudProviderOverrides:
    aws:
      rateLimit: 15
You can also provide account-specific overrides, which is useful when one account has significantly more resources than the others:
serviceLimits:
  accountOverrides:
    test:
      rateLimit: 5
    prod:
      rateLimit: 100
Lastly, you can get more fine-grained control over particular AWS endpoints that might have different rate limits:
implementationLimits:
  AmazonEC2:
    defaults:
      rateLimit: 200
    accountOverrides:
      prod:
        rateLimit: 500
  AmazonElasticLoadBalancing:
    defaults:
      rateLimit: 10
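For reference, the snippets above can be combined into a single block. This is only a sketch that reuses the example values from this article. It assumes that implementationLimits sits under serviceLimits alongside the other keys, which you should verify against your Clouddriver version, and the values themselves are illustrations rather than recommendations.

```yaml
# Sketch only: merges the example snippets from this article into one block.
# Assumption: implementationLimits is nested under serviceLimits, next to
# defaults, cloudProviderOverrides, and accountOverrides.
serviceLimits:
  defaults:
    rateLimit: 20              # global default for all services
  cloudProviderOverrides:
    aws:
      rateLimit: 15            # applies to the AWS provider only
  accountOverrides:
    test:
      rateLimit: 5             # smaller account, lower limit
    prod:
      rateLimit: 100           # larger account, higher limit
  implementationLimits:
    AmazonEC2:
      defaults:
        rateLimit: 200         # limit for the EC2 client
      accountOverrides:
        prod:
          rateLimit: 500       # EC2 client in the prod account
    AmazonElasticLoadBalancing:
      defaults:
        rateLimit: 10
```

Keeping everything under one top-level key avoids declaring serviceLimits more than once in the same file, which YAML parsers typically reject or resolve by keeping only one of the duplicates.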
Configuring these limits helps you stay under the cloud provider's rate limits and can make Spinnaker more responsive, because the cloud provider clients won't have to fall back on a back-off strategy to keep querying the infrastructure.
As always, we want to provide ridiculously responsive support. Let us know how we can help.