Database Migrations

Reminders

  • HW1 due next week
  • Labs are counted as attendance! Make sure to submit them.
    • Correctness is not prioritized, just make sure you have things working locally

Agenda

  1. Databases/Migrations
  2. Why are migrations hard?
  3. Migration ordering
  4. Versioned migrations
  5. Lab

Review of Databases

  • When we interact with a database, we must first define a schema
  • A schema defines the format of data we are storing, i.e. what tables, what columns, what relations, etc.
  • Schemas are necessary in order for database to optimize their queries

Trade-Offs of Schemas

  1. # Schema 1
  2. class Cat:
  3. name
  4. breed
  5. litter_preference
  6. class Dog:
  7. name
  8. breed
  9. ball_preference
  10. class Gerbil:
  11. name
  12. breed
  13. seed_preference
  1. # Schema 2
  2. class Pet:
  3. name
  4. breed
  5. species
  6. attributes
  7. class Attribute:
  8. name
  9. value
  • What are the differences?

Migrations

  • Inevitably, schemas will need to change
    • What if I want to add color to my pet db?
  • The process of making changes to a schema is called a migration. Things that could be modified:
    • Table columns
    • Relationships (foreign keys)
    • Constraints (e.g. uniqueness)

Why are Migrations hard?

  • Migration Ordering: Application code deployment needs to be coordinated with migrations
  • Versioned Migrations: The changes needed depend on the current state of the database
  • Depending on the type of change, migrations can potentially lock databases entirely while they occur
  • Large systems may have multiple databases that need to have coordinated migrations
  • If migrations alter not just the schema but also data, rollbacks are complex if something goes wrong

Migration Ordering

  • To make this change in lock-step, we must shut down the system
  • Otherwise, we must either deploy new code first or change db schema first

migration_ordering

Example: Data Addition

  • Let's say we want to add new data to our pet database to match a pet to their owner. Additionally, we want to add a new section in the frontend to display this owner.
  • What do we need to add to our database schema?
  • What do we need to add to our application code?

Example: Data Addition

  • We need to add a new table for Owner, which Pet has a foreign key to
  • We need to make changes on the backend to deliver this new data, and also changes on the frontend to display this data
  • What could happen if we deploy the code before the migration is done?

Example: Data Deletion

  • Now let's imagine the opposite sort of change, we want to remove the Owner table.
  • What do we need to change in our database schema?
  • What do we need to change in our application code?

Example: Data Deletion

  • Need to remove Owner table along with foreign key on Pet table
  • Need to change backend code to no longer send owner, and also have the frontend no longer display this data
  • What could happen if we make the migration before the code is deployed?

Code First Migrations

code_first_migration

Data First Migrations

code_first_migration

Versioned Migrations

  • Historically, migrations were manually run, e.g. SSHing into the database, copy pasting some SQL commands and hoping it worked without issues
    • The easy way out was to just take scheduled downtime
  • We've identified how to ensure this plays nicely with the current application code, but what else is not great about this?
    • We need to know the current state of the database before we know what changes to apply
    • Manual work is not ideal, e.g. run migrations, then trigger application deployment

Versioned Migrations

  • Idea: let's track migrations as code. Each migration can be thought of as a version update.
  • We assume the initial state of the database is completely empty, and each migration is a file that transitions the state of the database. These files form a sort of linked list since they depend on each other.
  • We can track the current state of the database in the database!
    • A table called Migrations can track the current files/transitions applied.

Versioned Migrations

Migrations then are applied with the deterministic process:

  1. Look at Migrations table to determine current state
  2. Compare to migration files we have
  3. Apply new migration files in order

Versioned Migrations

versioned_migrations

Migrations in Practice

  • Now, we can combine these two concepts to implement auto-migrations on deployment
  • Many packages exist for versioned migrations, we'll be using alembic
  • We'll create a manual method to migrate locally since there's no notion of deployment yet
  • We'll revisit this later when we start looking at Kubernetes

Lab: Migrations