Oct 20 AWS Outage: The DNS Bug

AWS uses automated "DNS Enactors" to apply DNS plans to Route 53 records for internal services. Two Enactors tried to update the DynamoDB endpoint at the same time, causing a race condition:

Enactor A starts an update but experiences unusual delays
Enactor B completes a newer update and cleans up old plans
Enactor A finally applies its now-outdated plan, overwriting Enactor B's newer one
The cleanup process deletes this stale plan, removing all IP addresses from dynamodb.us-east-1.amazonaws.com
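The interleaving above can be replayed deterministically in a small Python sketch. It is a simplified, hypothetical model, not AWS's code: the names `DnsState`, `is_newer`, `apply_plan` and `cleanup`, the generation numbers, the IP addresses and the `max_age` threshold are all illustrative assumptions. The key assumption is that each Enactor checks freshness once, before its delay, and does not re-check when it finally writes.

```python
from dataclasses import dataclass, field


@dataclass
class DnsState:
    # Hypothetical model: plan generation -> IP addresses it publishes.
    plans: dict[int, list[str]] = field(default_factory=dict)
    # Generation of the plan currently applied to the DNS record.
    active: int = 0

    def resolve(self) -> list[str]:
        # If the applied plan has been deleted, the record has no addresses.
        return list(self.plans.get(self.active, []))


def is_newer(state: DnsState, generation: int) -> bool:
    # One-time freshness check done BEFORE the (possibly delayed) write.
    return generation > state.active


def apply_plan(state: DnsState, generation: int) -> None:
    # The write itself does not re-check freshness (check-then-act race).
    state.active = generation


def cleanup(state: DnsState, newest: int, max_age: int) -> None:
    # Delete plans more than max_age generations older than `newest`,
    # without looking at which plan is currently applied.
    for gen in [g for g in state.plans if newest - g > max_age]:
        del state.plans[gen]


def simulate_race() -> list[str]:
    state = DnsState(plans={0: ["10.0.0.1"], 1: ["10.0.0.2"], 9: ["10.0.0.9"]})
    a_ok = is_newer(state, 1)  # Enactor A checks plan 1 (1 > 0 passes), then stalls
    if is_newer(state, 9):     # Enactor B checks and applies the newer plan 9
        apply_plan(state, 9)
    if a_ok:
        apply_plan(state, 1)   # A's delayed write overwrites B's newer plan
    cleanup(state, newest=9, max_age=3)  # B's cleanup deletes plans 0 and 1
    return state.resolve()


print(simulate_race())  # []
```

Running the steps in a fixed order instead of with real threads keeps the race reproducible: the stale write wins because its check passed earlier, and cleanup then deletes the very plan the record points at.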

Result: an empty DNS record. Every service in us-east-1 that depended on DynamoDB could no longer resolve its endpoint and lost connectivity
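One generic way to close this kind of check-then-act race is to re-check the version at the moment of the write and to stop cleanup from deleting whatever plan is currently applied. The sketch below uses the same kind of hypothetical model (`DnsState`, `apply_if_newer`, `safe_cleanup` and all values are illustrative assumptions); it is not a description of the fix AWS deployed.

```python
from dataclasses import dataclass, field


@dataclass
class DnsState:
    # Hypothetical model: plan generation -> IP addresses it publishes.
    plans: dict[int, list[str]] = field(default_factory=dict)
    # Generation of the plan currently applied to the DNS record.
    active: int = 0

    def resolve(self) -> list[str]:
        # A deleted plan resolves to no addresses at all.
        return list(self.plans.get(self.active, []))


def apply_if_newer(state: DnsState, generation: int) -> bool:
    # Check freshness at write time. In a real concurrent system the check
    # and the write must be one atomic conditional update; this model is
    # single-threaded, so doing both in one call stands in for that.
    if generation <= state.active:
        return False  # stale plan: never overwrite a newer one
    state.active = generation
    return True


def safe_cleanup(state: DnsState, newest: int, max_age: int) -> None:
    # Delete old plans, but never the one the record currently points at.
    stale = [g for g in state.plans
             if newest - g > max_age and g != state.active]
    for gen in stale:
        del state.plans[gen]


def simulate_guarded() -> list[str]:
    state = DnsState(plans={0: ["10.0.0.1"], 1: ["10.0.0.2"], 9: ["10.0.0.9"]})
    apply_if_newer(state, 9)  # Enactor B applies the newer plan
    apply_if_newer(state, 1)  # Enactor A's delayed write is rejected (1 <= 9)
    safe_cleanup(state, newest=9, max_age=3)  # deletes plans 0 and 1 only
    return state.resolve()


print(simulate_guarded())  # ['10.0.0.9']
```

Either guard alone would have kept the record non-empty in this model: the conditional write rejects the stale plan, and the cleanup guard would refuse to delete a plan that is still applied.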

The Cloud

Agenda

Virtualization

Virtualization

Benefits of Virtualization

Virtualization

Traditional Infrastructure

Traditional Infrastructure

What's in a Rack?

The Cost of Traditional Infrastructure

Limitations of Traditional Infrastructure

The Cloud

What is "The Cloud"?

The Cloud Trade-Off: Costs

The Cloud Trade-Off: Security

The Cloud Trade-Off: Control

Managed Services

What are Managed Services?

Two Types of Managed Services

Open Source Managed Services

Proprietary Managed Services

Choosing Managed Services

Oct 20 AWS Outage

Oct 20 AWS Outage: The DNS Bug

Oct 20 AWS Outage: Cascading Failures

Oct 20 AWS Outage: Takeaways

Lab: AWS