The Cloud

Agenda

  1. Virtualization
  2. Traditional Infrastructure
  3. The Cloud
  4. Managed Services
  5. Cloud vs Local

Virtualization

Virtualization

  • Problem: A single physical server running one application is not resource efficient
  • Solution: Run multiple virtual machines (VMs) on one physical server
  • A hypervisor (e.g. VMware, KVM) splits physical resources among VMs
  • Each VM thinks it has its own dedicated hardware

[Image: VM diagram]
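
To make the resource-splitting idea concrete, here is a minimal Python sketch of a toy allocator. It is not how VMware or KVM work internally: the `Host`, `VM`, and `place` names and all sizes are hypothetical, and real hypervisors can also overcommit CPU and memory, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    vcpus: int
    memory_gb: int


@dataclass
class Host:
    """A physical server with a fixed pool of CPU cores and memory."""
    cpus: int
    memory_gb: int
    vms: list[VM] = field(default_factory=list)

    def free_cpus(self) -> int:
        return self.cpus - sum(vm.vcpus for vm in self.vms)

    def free_memory_gb(self) -> int:
        return self.memory_gb - sum(vm.memory_gb for vm in self.vms)

    def place(self, vm: VM) -> bool:
        """Admit the VM only if its slice fits in the remaining resources."""
        if vm.vcpus <= self.free_cpus() and vm.memory_gb <= self.free_memory_gb():
            self.vms.append(vm)
            return True
        return False


host = Host(cpus=16, memory_gb=64)
results = [
    host.place(VM("web", vcpus=4, memory_gb=16)),
    host.place(VM("db", vcpus=8, memory_gb=32)),
    host.place(VM("batch", vcpus=8, memory_gb=8)),  # rejected: only 4 cores are left
]
```

The first two VMs fit and the third is rejected, which is also a small preview of the capacity planning problem: someone still has to decide how big the physical host should be.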

Benefits of Virtualization

  • Better hardware utilization: Run multiple applications on one server
    • You can also run multiple applications on a single VM; this is what we use Docker/Kubernetes for!
  • Flexibility: Easier to create/destroy VMs compared to physical servers
  • Snapshots: Save and restore VM states
  • Isolation: VMs are separated from each other
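
The snapshot idea can be sketched in a few lines of Python. This is only an analogy, assuming a VM's state can be represented as a dictionary: `ToyVM` and its methods are hypothetical, and a real hypervisor snapshots disk and memory images rather than Python objects.

```python
import copy


class ToyVM:
    """Hypothetical stand-in for a VM: its 'state' is just a dictionary."""

    def __init__(self) -> None:
        self.state = {"packages": ["nginx"], "config": {"port": 80}}
        self._snapshots = {}

    def snapshot(self, label: str) -> None:
        # Deep copy so later changes to the live state do not leak into the snapshot.
        self._snapshots[label] = copy.deepcopy(self.state)

    def restore(self, label: str) -> None:
        # Copy again so the same snapshot can be restored more than once.
        self.state = copy.deepcopy(self._snapshots[label])


vm = ToyVM()
vm.snapshot("before-upgrade")
vm.state["config"]["port"] = 8080      # a risky change...
vm.state["packages"].append("broken")  # ...that goes wrong
vm.restore("before-upgrade")           # roll back to the saved state
```

After the restore, `vm.state` matches what it was when the snapshot was taken, which is the property that makes risky upgrades cheap to try on a VM.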

Virtualization

Virtualization does not solve everything. You still need to:

  • Own and maintain the physical hardware
  • Manage the hypervisor layer
  • Handle capacity planning

Traditional Infrastructure

Traditional Infrastructure

  • Before the cloud, companies had to manage their own physical servers
  • This meant purchasing, installing, and maintaining physical hardware in racks
  • Required dedicated facilities (data centers) with:
    • Power and cooling systems
    • Physical security
    • Network infrastructure

What's in a Rack?

  • Servers: Physical machines to run applications
  • Storage: Hard drives/SSDs for data persistence
  • Networking: Switches, routers, load balancers
  • Power: UPS (Uninterruptible Power Supply) for redundancy
  • Cooling: Prevent hardware from overheating

[Image: rack array]

[Image: rack cables]

The Cost of Traditional Infrastructure

  • Upfront capital: Expensive hardware purchases
  • Personnel: System administrators to manage hardware
  • Space: Physical data center space
  • Utilities: Power and cooling costs
  • Maintenance: Hardware failures, upgrades, replacements

Limitations of Traditional Infrastructure

  • Scaling is slow: Ordering and installing new hardware takes weeks/months
  • Over-provisioning: Must buy capacity for peak load, leading to waste during normal operation
  • Single points of failure: Hardware failures can bring down entire services
  • Geographic constraints: Limited to physical locations where you have data centers
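
The over-provisioning point is easy to quantify with a little arithmetic. The sketch below uses an invented daily demand curve (every number is an assumption for illustration) and compares the server-hours actually used with the server-hours paid for when capacity is sized for the peak.

```python
# Hypothetical hourly demand, in "servers needed", over one day.
hourly_demand = [2, 2, 2, 2, 2, 3, 4, 6, 8, 9, 10, 10,
                 9, 9, 8, 8, 9, 10, 8, 6, 4, 3, 2, 2]

provisioned = max(hourly_demand)                      # must buy enough for the peak
used_server_hours = sum(hourly_demand)                # what the workload needed
paid_server_hours = provisioned * len(hourly_demand)  # what the hardware provides
utilization = used_server_hours / paid_server_hours

print(f"provisioned servers: {provisioned}")
print(f"utilization: {utilization:.3f}")
```

With this made-up curve, 138 of 240 available server-hours are used, so utilization is 0.575 and the rest of the purchased capacity sits idle.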

The Cloud

What is "The Cloud"?

  • Instead of owning hardware, rent virtual machines (or bare-metal servers) from a cloud provider
  • Major providers: AWS (Amazon), Google Cloud, Microsoft Azure, Alibaba
  • They manage the racks and hypervisors, you manage your VMs and applications
  • Pay only for what you use; rentals can be short term or long term

The Cloud Trade-Off: Costs

Cloud Provider

  • Pay per-use pricing
  • No upfront capital
  • Easy to scale
  • Potentially expensive at scale

Your Own Sysadmin

  • Fixed costs
  • Large upfront hardware costs
  • Hard to scale
  • Predictable costs
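
The two columns above can be compared with a simple break-even calculation. The following sketch assumes a linear pay-per-use price and a one-time hardware purchase plus fixed monthly costs; the function names and every price are invented for illustration and do not reflect any provider's real pricing.

```python
def cloud_cost(months: int, servers: int, price_per_server_month: float) -> float:
    """Pay-per-use: cost grows linearly with usage, with nothing upfront."""
    return months * servers * price_per_server_month


def on_prem_cost(months: int, upfront: float, fixed_per_month: float) -> float:
    """Own hardware: large upfront purchase plus fixed running costs."""
    return upfront + months * fixed_per_month


def break_even_month(servers, price_per_server_month, upfront, fixed_per_month, horizon=120):
    """First month at which owning is cheaper than renting, or None within the horizon."""
    for month in range(1, horizon + 1):
        owned = on_prem_cost(month, upfront, fixed_per_month)
        rented = cloud_cost(month, servers, price_per_server_month)
        if owned < rented:
            return month
    return None


# All prices below are invented for illustration.
small = break_even_month(servers=2, price_per_server_month=300,
                         upfront=100_000, fixed_per_month=2_000)
large = break_even_month(servers=50, price_per_server_month=300,
                         upfront=100_000, fixed_per_month=2_000)
```

Under these assumptions the small deployment never breaks even within ten years (`small` is `None`), while the large one becomes cheaper to own in month 8. That is the "potentially expensive at scale" bullet in numbers.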

The Cloud Trade-Off: Security

  • Advantage: Professional security teams, compliance certifications, DDoS protection
  • Disadvantage: No physical access to hardware
    • Must trust the provider, e.g. Amazon
    • Potential for data exposure
    • Shared infrastructure with other customers

The Cloud Trade-Off: Control

  • Advantage: Don't need to worry about hardware failures, power outages, network issues
    • You may still want multi-cloud, e.g. because of the Oct 20 AWS outage!
  • Advantage: Access to global infrastructure instantly
  • Disadvantage: Limited control over underlying infrastructure
  • Disadvantage: Vendor lock-in (hard to switch providers)

Managed Services

What are Managed Services?

  • Cloud providers don't just rent bare servers
  • They also offer managed services: pre-configured software that they operate for you
  • Examples:
    • Managed databases (DynamoDB, RDS, Cloud SQL)
    • Managed Kubernetes (EKS, GKE)
    • Object storage (S3, Cloud Storage)

Two Types of Managed Services

Open Source

  • Based on existing open source projects
  • Examples: Managed PostgreSQL, Managed Kubernetes
  • Can migrate off more easily

Proprietary

  • Custom services built by provider
  • Examples: AWS Lambda, DynamoDB, BigQuery
  • Often more integrated

Open Source Managed Services

Pros:

  • Familiar APIs and interfaces
  • Can run locally for development
  • Easier migration between providers
  • Community support and documentation

Cons:

  • May not integrate as deeply with provider's ecosystem
  • Sometimes lag behind latest OSS versions

Proprietary Managed Services

Pros:

  • Often better performance
  • Unique features not available elsewhere
  • Deep integration with provider's other services

Cons:

  • Vendor lock-in: Hard/impossible to migrate
  • Hard to test locally
  • Learning curve is provider-specific

Choosing Managed Services

  • Consider your priorities:
    • Portability → favor open source services
    • Performance/features → proprietary may be worth it
    • Team expertise → stick with familiar technologies
  • Hybrid approaches are common
    • Use proprietary services for non-critical components
    • Use open source for core business logic
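
One way to keep the portability option open, whichever kind of service you choose, is to hide the backend behind a small interface so that core logic never imports a provider SDK directly. The sketch below is an assumption about how that might look, not a prescribed design: `KeyValueStore`, `InMemoryStore`, and `record_visit` are hypothetical names, and a real adapter for a service such as DynamoDB would implement the same two methods.

```python
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """The only storage API the business logic is allowed to depend on."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Local stand-in used for tests and development."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def record_visit(store: KeyValueStore, user: str) -> int:
    """Core logic: knows nothing about which provider backs the store."""
    count = int(store.get(user) or "0") + 1
    store.put(user, str(count))
    return count


store = InMemoryStore()
record_visit(store, "alice")
visits = record_visit(store, "alice")  # second visit for the same user
```

Swapping providers then means writing one new adapter class rather than touching every call site, and the in-memory version also softens the "hard to test locally" drawback of proprietary services.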

Oct 20 AWS Outage

https://health.aws.amazon.com/health/status

Oct 20 AWS Outage

A highly simplified view:

  1. DNS resolution for DynamoDB has issues
  2. Any code that uses DynamoDB can no longer connect to its databases, i.e. connections to dynamodb.us-east-1.amazonaws.com fail
  3. Applications cannot run without database access
  4. A number of cascading effects on other AWS services, primarily because of requeue mechanisms
  5. No initial recovery for almost 3 hours!
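
From a client's point of view, steps 1 and 2 look like a name lookup that raises an error. The sketch below shows one common client-side pattern, retrying with capped exponential backoff and jitter, so that many clients do not retry in lockstep and add load to a struggling service. It is not a reconstruction of AWS internals or of their fix. The function name, parameters, and the fake resolver are assumptions for illustration; only `socket.getaddrinfo` and `socket.gaierror` come from the standard library.

```python
import random
import socket
import time


def resolve_with_backoff(host, resolver=socket.getaddrinfo, attempts=5,
                         base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Resolve `host`, backing off exponentially (with jitter) between failures."""
    for attempt in range(attempts):
        try:
            return resolver(host, 443)
        except socket.gaierror:
            if attempt == attempts - 1:
                raise  # give up and let the caller degrade gracefully
            # Full jitter: wait a random time up to the capped exponential delay,
            # so that many clients do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


# Simulated outage: a fake resolver that fails twice, then recovers.
calls = []

def flaky_resolver(host, port):
    calls.append(host)
    if len(calls) < 3:
        raise socket.gaierror("simulated DNS failure")
    return [("resolved", host, port)]

waits = []  # record the delays instead of actually sleeping
result = resolve_with_backoff("dynamodb.us-east-1.amazonaws.com",
                              resolver=flaky_resolver, sleep=waits.append)
```

The resolver and sleep function are injectable so the example runs without network access. Backoff only limits how hard clients hammer a failing dependency; it does not bring the database back.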

Oct 20 AWS Outage

  • A case for multi-cloud infrastructure
  • Surviving this particular issue would have required a database distributed across cloud providers, something the vast majority of companies probably do not have
  • These solutions exist! Funnily enough, CockroachDB posted about this exact potential problem well over a year ago
    • Failover and disaster recovery: By deploying CockroachDB clusters in multiple cloud providers, you can implement effective failover and disaster recovery strategies. If one cloud provider experiences an outage or becomes unavailable, the application can automatically failover to another cloud provider where CockroachDB is running.
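
The client-side half of that failover story can be sketched as "try each provider in turn". This is a simplified illustration, not CockroachDB's API. The endpoints are hypothetical, the outage is simulated, and the hard part, keeping data replicated and consistent across providers, is assumed to be handled by the database itself.

```python
def query_with_failover(endpoints, run_query):
    """Try each endpoint in order and return the first successful result."""
    errors = {}
    for endpoint in endpoints:
        try:
            return endpoint, run_query(endpoint)
        except ConnectionError as exc:
            errors[endpoint] = exc  # remember why this provider was skipped
    raise ConnectionError(f"all endpoints failed: {list(errors)}")


# Hypothetical endpoints for one logical database deployed in two clouds.
ENDPOINTS = ["db.aws.example.com", "db.gcp.example.com"]


def fake_query(endpoint):
    # Simulate an outage that affects only the first provider.
    if "aws" in endpoint:
        raise ConnectionError("simulated regional outage")
    return "ok"


served_by, answer = query_with_failover(ENDPOINTS, fake_query)
```

Here the request is served by the second endpoint. In practice this logic often lives in a load balancer or database driver rather than in application code.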

Lab: AWS

Note: Add diagram of a typical server rack with labeled components