autoscaling 3.3 design

Table of Contents Overview Tracking Out of Scope Feature Dependencies Related Features Analysis Open Issues Resolved Issues Design Implementation Summary Entities Service Impact Input Validation Service vs Activities Tasks HA Testability Function Service Summary Supported Actions IAM Integration Eucalyptus Extensions IAM Administrative Functionality Listing Resources Deleting Resources Updating Resources Administrative Suspension Configuration Upgrade Packaging Documentation Security Testing References

Overview

This design covers our implementation of AWS Auto Scaling.

Tracking

Status	Draft
Updated	2012/12/19	Initial document, focus is preparation for Sprint 1
Updated	2013/01/28	Updated for Sprint 2 functionality (which was previously for Sprint 1)
Updated	2013/02/01	Updated with Sprint 2 final details
Updated	2013/04/03	Updated with Sprint 5 final details
Updated	2013/04/09	Added more developer targetted design info

Out of Scope

Implementation of items related to features that we do not support are out of scope. These features are:

Placement groups
Spot Instances
SNS
VPC

Feature Dependencies

CW - Listing alarms associated with a scaling policy
EC2 - Verifying metadata, managing instances, tagging
ELB - Health checks, register/deregister instances

Related Features

This feature relates to the following features in this release:

EC2 - Tagging / Filtering
EC2 - Instance health checking
EC2 - Idempotent instance launch
CloudWatch
ElasticLoadBalancing
VM Types API

Analysis

Open Issues

Are we supporting "pagination" for describe actions?
Where are ARNs supported instead of names? (are ARNs supported for the policy name in PutScalingPolicy?)
What error occurs when setting desired capacity or executing policy is rejected due to cooldown?
Are only security group names accepted? (or can group identifiers be used as well?)
Is manual termination of instances permitted when termination is suspended?
Timing of adding to load balancer (is running "running"?, InService?)

Resolved Issues

What is the expected behaviour when the desired capacity is set to less than min or more than max? (fail - 400 - ValidationError - New SetDesiredCapacity value 2 is above max value 1 for the AutoScalingGroup.)
Should resource type/id be included in the tag information when describing groups with tags? (Yes)

Design

This section provides design details relevant to developers.

Implementation Summary

DB	eucalyptus_autoscaling
Source modules	scaling	Contents under com.eucalyptus.scaling package
	scaling-common	Contents under com.eucalyptus.scaling.common package
Service class	AutoScalingService
Component ID class	AutoScaling
Service type name	autoscaling
Internal URI	/internal/AutoScaling	Internal SOAP

Entities

The following entities will be added:

Launch configuration
Auto scaling group
Auto scaling instance
Auto scaling policy
Scaling activity
Tag / Auto scaling group tag

Service Impact

New SOAP / Query API user facing services are added.

Input Validation

The majority of input validation for the service occurs prior to routing to the service implementation. The AutoScalingMessageValidator class processes all messages (as configured in autoscaling-model.xml) prior to service invocation.

AutoScalingMessage exposes a validate method that returns a list of validation errors. The validation for each field is defined by annotations and validation requiring message context is performed by overriding the validate method. The annotations used are:

FieldRegex - A string field must match a provided regular expression
FieldRange - A numeric value or length of a collection must match the specified range

Annotations are defined in AutoScalingMessageValidation along with related regular expressions.

Validation requiring (cloud) contextual information is performed in the service implementation.

Service vs Activities

The service implementation is generally disconnected from the back-end scaling activities. The service implementation (AutoScalingService) is similar to other services and follows the Gen2ools patterns.

Activities are performed by ActivityManager which allows a single task to be run for each auto scaling group subject to exponential backoff on failures. Every clock tick (ten seconds by default) work can be performed by the activity manager on the following tasks:

Timeout - Timeout scaling activities that presumably failed without a state update
Expiry - Delete records for scaling activities older than the maximum retained age
ZoneHealth - Update auto scalings record of which Availability Zones are up
Recovery - Resume activities for instances in unstable states (mid launch, etc)
Scaling - Perform auto scaling work (launch, terminate)
InstanceCleanup - Find and terminate unexpected auto scaling instances
MetricsSubmission - Submit auto scaling metrics to CloudWatch

The autoscaling.suspendedtasks property can be used to suspend any of the above tasks (e.g. "InstanceCleanup, MetricsSubmission" would suspend two tasks)

In addition to task suspension it is possible to suspend auto scaling processes for all groups using the autoscaling.suspendedprocesses property.

Tasks

ScalingProcessTask represents a task to run (which may be subject to backoff), each task is composed of one or more ScalingActivityTasks. A ScalingActivityTask typically sends a message to another component and does something with the response.

The task framework runs all activities for a task concurrently.

Tasks are executed with an ActivityContext which provides clients to access other (possibly remote) components.

HA

The following items support operation in a high availability environment where there may be primary/secondary auto scaling components.

Auto scaling activities are only performed when the AutoScaling component is locally enabled.
In memory state is persisted to the database or can be recovered.
Activities that do not complete are timed out and will be retried.

Failover can degrade auto scaling performance with activities taking longer to complete than would otherwise be the case.

Testability

Auto scaling uses thin abstraction layers to facilitate mocking for test purposes:

DispatchingClient - Allows simulation of message exchanges
LaunchConfigurations / PersistenceLaunchConfigurations - Allows simulation of persistence

Unit tests are added for the service implementation and activity manager.

Function

This section provides implementation details and descriptions of functionality beyond the core auto scaling features.

Service Summary

Version	2011-01-01	Version of AutoScaling API supported
Service URI	/services/AutoScaling	SOAP / Query API
Eucarc variable name	AWS_AUTO_SCALING_URL

Supported Actions

This sections describes implemented auto scaling service actions. Actions that are not implemented relate to scheduled scaling and notfications.

For more information on administrative support by action, see Administrative Functionality below.

Action	Policy Enforced	Admin Enabled	Notes
CreateAutoScalingGroup	Y	Placement group, VPC Zone Identifier not supported.
CreateLaunchConfiguration	Y	Spot Price, EBS Optimized not supported.
CreateOrUpdateTags	Y
DeleteAutoScalingGroup	Y	Y
DeleteLaunchConfiguration	Y	Y
DeletePolicy	Y	Y
DeleteTags	Y
DescribeAdjustmentTypes	Y
DescribeAutoScalingGroups	Y	Y
DescribeAutoScalingInstances	Y	Y
DescribeLaunchConfigurations	Y	Y
DescribeMetricCollectionTypes	Y
DescribePolicies	Y	Y
DescribeScalingActivities	Y	Y
DescribeScalingProcessTypes	Y
DescribeTags	Y
DescribeTerminationPolicyTypes	Y
DisableMetricsCollection	Y	Y
EnableMetricsCollection	Y	Y
ExecutePolicy	Y	Y
PutScalingPolicy	Y
ResumeProcesses	Y	Y
SetDesiredCapacity	Y	Y
SuspendProcesses	Y	Y
TerminateInstanceInAutoScalingGroup	Y	Y
UpdateAutoScalingGroup	Y	Y	Placement group, VPC Zone Identifier not supported.

IAM Integration

Item	Value	Notes
Vendor	autoscaling	Prefix for actions, e.g. autoscaling:DescribeLaunchConfigurations
Resources	autoscalinggroup
	instance
	launchconfiguration
	scalingactivity
	scalingpolicy
	tag
Actions	autoscaling:*	We will permit use in policy of any actions from the supported API version
Quotas	autoscaling:quota-autoscalinggroupnumber
	autoscaling:quota-launchconfigurationnumber
	autoscaling:quota-scalingpolicynumber

The ARN of an Auto Scaling resource has the form:

  arn:aws:autoscaling::<account id>:<resource type>/<resource id>:[<name>/<value>]+

An example policy for service access:

    {
       "Statement":[{
          "Effect":"Allow",
          "Action":"autoscaling:*",
          "Resource":"*"
       }]
    }

Eucalyptus Extensions

IAM

Support for quotas and resource (see above)

An example policy for limiting auto scaling groups:

    {
      "Statement":[{
        "Effect":"Limit",
        "Action":"autoscaling:createautoscalinggroup",
        "Resource":"*",
        "Condition":{
          "NumericLessThanEquals":{
            "autoscaling:quota-autoscalinggroupnumber":"2",
          }
        }
      }]
    }

Administrative Functionality

We will extend the standard functionality for administrative purposes.

Listing Resources

The following actions will support listing of all accounts/users item:

DescribeAutoScalingGroups
DescribeAutoScalingInstances
DescribeLaunchConfigurations
DescribePolicies
DescribeScalingActivities

To enable this the parameter verbose is passed as a name selector.

Deleting Resources

The following actions will support administrative deletion of resources:

DeleteAutoScalingGroup
DeleteLaunchConfiguration
DeletePolicy

For administrative deletion the ARN of the resource must be used (not the simple name), e.g.:

  arn:aws:autoscaling::013765657871:launchConfiguration:6789b01a-a9c9-489f-9d24-ed39533fca61:launchConfigurationName/Test

Updating Resources

The following actions will support administrative update of resources:

DisableMetricsCollection
EnableMetricsCollection
ExecutePolicy
ResumeProcesses
SetDesiredCapacity
SuspendProcesses
TerminateInstanceInAutoScalingGroup
UpdateAutoScalingGroup

For administrative modification the ARN of the resource must be used (not the simple name)

Administrative Suspension

Scaling processes for auto scaling groups are automatically placed under administrative suspension when failing to launch instances (subject to the related configuration properties). Scaling processes can be resumed the owning account or by an administrator.

If an administrator manually suspends scaling processes for an auto scaling group then the group is also shown as being administratively suspended.

Configuration

The following properties are added for auto scaling configuration:

Property	Default value	Notes
autoscaling.activityexpiry	42d	6 weeks
autoscaling.activitytimeout	5m
autoscaling.maxlaunchincrement	20
autoscaling.maxregistrationretries	5
autoscaling.suspendedprocesses
autoscaling.suspendedtasks
autoscaling.suspensionlaunchattemptsthreshold	15
autoscaling.suspensiontimeout	1d
autoscaling.untrackedinstancetimeout	5m
autoscaling.zonefailurethreshold	5m

Interval values have a suffix to denote the units:

ms - Milliseconds
s - Seconds
m - Minutes
h - Hours
d - Days

Upgrade

No upgrade impact noted.

Packaging

No specific packaging requirements.

Documentation

Administrative functionality should be documented.

Security

Use of ARN in place of simple name adds risk of permitting deletion of other accounts resources.

Testing

No specific test cases noted.

References

AutoScaling and IAM (amazonwebservices.com)

tag:rls-3.3 tag:autoscaling

Architecture Wiki
- Contributing to...
- Organization of...

Tags
- Releases: 2.0, 3.0, 3.1, 3.2, 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4, 5.0
- Features: dns, ebs, image management, selinux, vpc
- Services: autoscaling, cloudwatch, elb, iam, object storage, sqs
- Components: eucanetd

Contact Info
- email: [email protected]
- IRC: #eucalyptus-devel (freenode)

Eucalyptus Links