OUTDATED!! This feature is now incorporated into ECS.

See the following links for setting up the

This article describes a way to replace ECS instances without terminating them while they still have running tasks.

My solution is based on Amazon's ECS container draining code.

Advantages of this solution over AWS’s implementation:

  • Efficiency: Event-driven and does not use sleep(5) 🤯
  • Maintainable: Deployed with Terraform, not CloudFormation 🤮

Overview of Resources

Diagram of the resources that are needed

Logic Flow

  1. When the Auto Scaling group plans to remove an instance, it triggers a lifecycle hook and the instance enters the Terminating:Wait state
  2. The lifecycle hook publishes a notification to an SNS topic
  3. A Lambda function is subscribed to the SNS topic
  4. The function sets the ECS container instance to DRAINING
  5. Once the instance is fully drained (no running or pending tasks), the EventBridge rule matches
  6. The rule triggers the second Lambda function
  7. The function tells the Auto Scaling group to continue terminating the instance (Terminating:Proceed)
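
To make step 5 concrete, here is a minimal Python sketch of what "fully drained" could look like from the EventBridge side. It assumes the rule matches ECS "ECS Container Instance State Change" events and that the event detail carries `status`, `runningTasksCount`, and `pendingTasksCount`. The pattern and the helper name `is_fully_drained` are illustrative and are not taken from the module.

```python
# Hypothetical EventBridge event pattern, written as a Python dict.
# Assumption: the rule listens for ECS container instance state changes.
DRAINED_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Container Instance State Change"],
    "detail": {
        "status": ["DRAINING"],
        "runningTasksCount": [0],
        "pendingTasksCount": [0],
    },
}


def is_fully_drained(event: dict) -> bool:
    """Return True when the event describes a draining instance with no tasks left."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.ecs"
        and event.get("detail-type") == "ECS Container Instance State Change"
        and detail.get("status") == "DRAINING"
        and detail.get("runningTasksCount") == 0
        and detail.get("pendingTasksCount") == 0
    )
```

Because the check runs only when ECS emits a state change event, nothing has to poll the cluster in a loop.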

Code

To preface this part: I am not a developer, so my code could probably be improved.

If you have any improvements, please contribute them. The code can be found in this repository.

Drain ECS Instance

The function can be found here on GitHub.

  1. First, the function takes an SNS message as input. From it, the function checks whether the instance is terminating and reads the ID of the EC2 instance
  2. Then it looks up the ECS container instance ID from the EC2 instance ID
  3. Finally, it drains the instance using the ECS container instance ID
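
The three steps above can be sketched roughly as follows. This is not the repository's code; it is a minimal illustration that assumes Python with boto3, a hypothetical `ECS_CLUSTER_NAME` environment variable, and a cluster small enough that one page of `list_container_instances` results covers it. The helper names are made up for the example.

```python
import json

TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def parse_lifecycle_message(event: dict):
    """Return the EC2 instance ID for a termination notice, else None."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    if message.get("LifecycleTransition") != TERMINATING:
        return None  # e.g. the test notification sent when a hook is created
    return message["EC2InstanceId"]


def find_container_instance_arn(ecs, cluster: str, ec2_instance_id: str):
    """Map an EC2 instance ID to its ECS container instance ARN (first page only)."""
    arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
    if not arns:
        return None
    described = ecs.describe_container_instances(
        cluster=cluster, containerInstances=arns
    )
    for instance in described["containerInstances"]:
        if instance["ec2InstanceId"] == ec2_instance_id:
            return instance["containerInstanceArn"]
    return None


def drain(ecs, cluster: str, event: dict):
    """Set the matching container instance to DRAINING; return its ARN or None."""
    ec2_instance_id = parse_lifecycle_message(event)
    if ec2_instance_id is None:
        return None
    arn = find_container_instance_arn(ecs, cluster, ec2_instance_id)
    if arn is None:
        return None
    ecs.update_container_instances_state(
        cluster=cluster, containerInstances=[arn], status="DRAINING"
    )
    return arn


def handler(event, context):
    # Imported here so the helpers above can be exercised without AWS access.
    import os
    import boto3

    # ECS_CLUSTER_NAME is a hypothetical variable name used for this sketch.
    return drain(boto3.client("ecs"), os.environ["ECS_CLUSTER_NAME"], event)
```

Passing the ECS client in as an argument keeps the logic easy to try with a stand-in client before pointing it at a real cluster.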

Complete ECS Lifecycle

The function can be found here on GitHub.

  1. When the function is triggered by the EventBridge rule, it gets the Auto Scaling group's name from the tag on the EC2 instance
  2. Next, it gets the lifecycle hook's name
  3. Finally, it tells the Auto Scaling group to proceed with terminating the EC2 instance
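
A rough Python sketch of these steps is shown below, again as an illustration rather than the repository's code. It assumes boto3, that the triggering event carries the EC2 instance ID at `detail.ec2InstanceId`, and that the tag in question is `aws:autoscaling:groupName`, which Auto Scaling adds to the instances it launches. The helper names are hypothetical.

```python
ASG_TAG = "aws:autoscaling:groupName"
TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def asg_name_for_instance(ec2, instance_id: str):
    """Read the Auto Scaling group name from the instance's tag, or None."""
    tags = ec2.describe_tags(
        Filters=[
            {"Name": "resource-id", "Values": [instance_id]},
            {"Name": "key", "Values": [ASG_TAG]},
        ]
    )["Tags"]
    return tags[0]["Value"] if tags else None


def terminating_hook_name(autoscaling, asg_name: str):
    """Return the name of the group's termination lifecycle hook, or None."""
    hooks = autoscaling.describe_lifecycle_hooks(AutoScalingGroupName=asg_name)
    for hook in hooks["LifecycleHooks"]:
        if hook["LifecycleTransition"] == TERMINATING:
            return hook["LifecycleHookName"]
    return None


def complete_termination(ec2, autoscaling, event: dict) -> bool:
    """Tell the Auto Scaling group to continue terminating the drained instance."""
    instance_id = event["detail"]["ec2InstanceId"]  # assumed event shape
    asg_name = asg_name_for_instance(ec2, instance_id)
    if asg_name is None:
        return False
    hook_name = terminating_hook_name(autoscaling, asg_name)
    if hook_name is None:
        return False
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
    return True


def handler(event, context):
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    return complete_termination(
        boto3.client("ec2"), boto3.client("autoscaling"), event
    )
```

Identifying the lifecycle action by instance ID means the second function does not need the lifecycle action token from the original SNS message.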

Spinning it Up

Using Terraform, we can deploy all the resources. The only two resources not created by the module are the ECS cluster and the Auto Scaling group.

The full Terraform module can be found on GitHub.

module "ecs_asg_lifecycle" {
  source = "git::https://github.com/andrew-aiken/website-ref.git//ecs_asg_lifecycle"

  autoscaling_group_name = "ecs-asg-name"
  ecs_cluster_name       = "ecs-cluster-name"

  drain_ecs_instance_name     = "drain-ecs-instance"
  complete_ecs_lifecycle_name = "complete-ecs-lifecycle"

  tags = {
    key = "value"
  }
}

terraform {
  required_version = "1.5.5"

  backend "local" {
    path = "terraform.tfstate"
  }

  required_providers {
    archive = {
      source  = "hashicorp/archive"
      version = "2.4.0"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "5.12.0"
    }
  }
}

provider "aws" {
  region = "us-west-2"
}

Put the code above in a file (main.tf).

To deploy the code, you need the AWS CLI set up and permissions to create the resources.

terraform init

terraform apply

The module deploys 16 resources. Once you have reviewed the planned changes, type yes and Terraform will deploy them.