Database Migrations & Job Queues

Reminders

  • HW2 released, due Feb 25
  • AWS invites will be sent out soon
  • Drop deadline is Feb 23

Agenda

  1. Why are migrations hard?
  2. Versioned migrations
  3. Job queues
  4. Agentic AI overview

Review of Databases

  • When we interact with a database, we must first define a schema
  • A schema defines the format of data we are storing, i.e. what tables, what columns, what relations, etc.
  • Schemas are necessary in order for databases to optimize their queries

Trade-Offs of Schemas

# Schema 1
class Cat:
  name
  breed
  litter_preference

class Dog:
  name
  breed
  ball_preference

class Gerbil:
  name
  breed
  seed_preference

# Schema 2
class Pet:
  name
  breed
  species
  attributes

class Attribute:
  name
  value

  • What are the differences?
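
To make the trade-off concrete, here is a hypothetical sketch (the variable names and values are assumptions, not from the lecture code) of reading a species-specific preference under each schema. Schema 1 gives direct, typed attribute access but needs a new class per species; Schema 2 is flexible but turns every lookup into a string-keyed scan with no guarantee the attribute exists.

```python
from dataclasses import dataclass, field

# Schema 1: one class per species; the preference is a dedicated field.
@dataclass
class Cat:
    name: str
    breed: str
    litter_preference: str

# Schema 2: one generic Pet with free-form key/value attributes.
@dataclass
class Attribute:
    name: str
    value: str

@dataclass
class Pet:
    name: str
    breed: str
    species: str
    attributes: list[Attribute] = field(default_factory=list)

cat1 = Cat("Mochi", "Siamese", "clumping")
pet1 = Pet("Mochi", "Siamese", "cat",
           [Attribute("litter_preference", "clumping")])

# Schema 1: direct, type-checked access.
print(cat1.litter_preference)  # clumping

# Schema 2: adding a new species needs no schema change, but lookups go
# through a string key and nothing guarantees the attribute is present.
pref = next(a.value for a in pet1.attributes if a.name == "litter_preference")
print(pref)  # clumping
```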

Migrations

  • Inevitably, schemas will need to change during development. What makes this tricky is that the database may already be serving live traffic.
  • The process of making changes to a schema is called a migration. Things that could be modified:
    • Table columns
    • Relationships (foreign keys)
    • Constraints (e.g. uniqueness)

Why are Migrations hard?

  • Migration Ordering: Application code deployment needs to be coordinated with migrations
  • Versioned Migrations: The changes needed depend on the current state of the database
  • Depending on the type of change, migrations can potentially lock databases entirely while they occur
  • Large systems may have multiple databases that need to have coordinated migrations
  • If migrations alter not just the schema but also data, rollbacks are complex if something goes wrong

Migration Ordering

  • To change the schema and the application code in lock-step, we must shut down the system entirely
  • Otherwise, we must either deploy new code first or change db schema first

[Diagram: migration ordering]

Example: Data Addition

  • Let's say we want to add new data to our pet database to match a pet to their owner. Additionally, we want to add a new section in the frontend to display this owner.
  • What do we need to add to our database schema?
  • What do we need to add to our application code?

Example: Data Addition

  • We need to add a new table for Owner, which Pet has a foreign key to
  • We need to make changes on the backend to deliver this new data, and also changes on the frontend to display this data
  • What could happen if we deploy the code before the migration is done?

Example: Data Deletion

  • Now let's imagine the opposite sort of change: we want to remove the Owner table.
  • What do we need to change in our database schema?
  • What do we need to change in our application code?

Example: Data Deletion

  • Need to remove Owner table along with foreign key on Pet table
  • Need to change backend code to no longer send owner, and also have the frontend no longer display this data
  • What could happen if we make the migration before the code is deployed?

Code First Migrations

[Diagram: code-first migration]

Data First Migrations

[Diagram: data-first migration]

Versioned Migrations

  • Historically, migrations were run manually, e.g. SSHing into the database server, copy-pasting some SQL commands, and hoping it worked without issues
    • The easy way out was to just take scheduled downtime
  • We've identified how to ensure this plays nicely with the current application code, but what else is not great about this?
    • We need to know the current state of the database before we know what changes to apply
    • Manual work is not ideal, e.g. run migrations, then trigger application deployment

Versioned Migrations

  • Idea: let's track migrations as code. Each migration can be thought of as a version update.
  • We assume the initial state of the database is completely empty, and each migration is a file that transitions the state of the database. These files form a sort of linked list since they depend on each other.
  • We can track the current state of the database in the database!
    • A table called Migrations can track the current files/transitions applied.

Versioned Migrations

Migrations are then applied with a deterministic process:

  1. Look at Migrations table to determine current state
  2. Compare to migration files we have
  3. Apply new migration files in order
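
The three steps above can be sketched with the stdlib SQLite driver (the table layout and version names here are illustrative assumptions, not alembic's actual internals):

```python
import sqlite3

# Hypothetical migration "files": ordered (version, SQL) transitions.
MIGRATIONS = [
    ("0001_create_pets", "CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT)"),
    ("0002_add_breed", "ALTER TABLE pets ADD COLUMN breed TEXT"),
]

def migrate(conn: sqlite3.Connection) -> list[str]:
    """Apply any migrations not yet recorded; return the ones applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS migrations (version TEXT PRIMARY KEY)")
    # 1. Look at the migrations table to determine the current state.
    applied = {row[0] for row in conn.execute("SELECT version FROM migrations")}
    newly_applied = []
    # 2-3. Compare against the files we have, apply the new ones in order.
    for version, sql in MIGRATIONS:
        if version not in applied:
            conn.execute(sql)
            conn.execute("INSERT INTO migrations (version) VALUES (?)", (version,))
            newly_applied.append(version)
    conn.commit()
    return newly_applied

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # ['0001_create_pets', '0002_add_breed']
print(migrate(conn))  # [] -- a second run is a no-op
```

Because the state lives in the database itself, re-running the process is always safe: it applies only what is missing.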

Versioned Migrations

[Diagram: versioned migrations]

Migrations in Practice

  • Now, we can combine these two concepts to implement auto-migrations on deployment
  • Many packages exist for versioned migrations; we'll be using alembic
  • We'll create a manual method to migrate locally since there's no notion of a deployment pipeline yet
  • We'll revisit this later when we start looking at Kubernetes

Jobs & Lambdas

Jobs: Motivation

  • Consider an endpoint like post upload for Instagram. We may want to do some additional actions like:
    • Resize/compress the image
    • Send notifications/emails to friends
    • Run it through content moderation
  • What could happen if we coded all of this logic within the endpoint handler itself?

Jobs: Motivation

  • If this logic was on the path of the web request handler:
    • The web request might take a long time to return or even timeout
    • This will block/reduce resources from other requests happening on the server
    • What if some parts fail, e.g. notifications?
  • We need a way to offload these tasks from the main thread that handles web requests

Jobs: Motivation

  • Within a system, you may find a need for asynchronous or long-running tasks
  • These tasks might need to be triggered in different ways
    • User action, e.g. web request or button press
    • Scheduled job
  • How can we achieve this?

Job Queues

A system for asynchronously processing tasks outside the request-response cycle, either offloading or scheduling them.

Often called by other names, but they're all essentially the same thing:

  • Message Queue
  • Publisher & Subscriber (pub/sub)

Core Components

  1. Producer - Creates jobs and adds them to the queue
  2. Queue/Broker - Stores jobs waiting to be processed
  3. Worker - Consumes and executes jobs

The Producer

Enqueues jobs in response to events. Importantly, it only needs to ensure that a description of the task is stored in the queue.

Responsibilities:

  1. Serialize job data
  2. Set priority/retry policies
  3. Return quickly to caller
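
A minimal producer sketch in pure Python, using the stdlib `queue.Queue` as a stand-in for a real broker (the job shape and field names are assumptions for illustration):

```python
import json
import queue
import time

broker: queue.Queue = queue.Queue()  # stand-in for Redis/RabbitMQ/etc.

def enqueue_job(job_type: str, payload: dict, max_retries: int = 3) -> dict:
    """Store a description of the task on the queue and return immediately."""
    job = {
        "type": job_type,
        "payload": json.dumps(payload),   # 1. serialize job data
        "max_retries": max_retries,       # 2. set retry policy
        "enqueued_at": time.time(),
    }
    broker.put(job)
    return job  # 3. return quickly -- the caller does not wait for the work

job = enqueue_job("process_profile_picture", {"path": "/uploads/cat.png"})
print(job["type"])  # process_profile_picture
```

The key design point is that the producer never executes the task itself; it only persists enough information for a worker to do so later.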

The Queue

Storage and distribution of jobs.

Responsibilities:

  1. Store job states
  2. Distribute jobs (push or pull)

Additional Features:

  • Persistence (survive crashes)
  • Ordering guarantees
  • Delivery semantics (at-least-once, exactly-once)

The Worker

Execute jobs from the queue.

Responsibilities:

  1. Poll or subscribe to the queue
  2. Process jobs
  3. Handle errors and retries
  4. Report completion status

Importantly, workers are horizontally scalable.
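
A worker loop can be sketched in pure Python against a stdlib `queue.Queue` broker (the handler registry and retry fields are hypothetical, for illustration only):

```python
import queue

broker: queue.Queue = queue.Queue()
results = []

def resize_image(payload):  # hypothetical job handler
    return f"resized {payload['path']}"

HANDLERS = {"resize_image": resize_image}

def worker_loop():
    """1. Poll the queue, 2. process jobs, 3. retry on error, 4. report status."""
    while True:
        try:
            job = broker.get(timeout=0.1)  # poll for work
        except queue.Empty:
            return  # queue drained (a real worker would keep polling)
        try:
            result = HANDLERS[job["type"]](job["payload"])  # process
            results.append(("ok", result))                  # report completion
        except Exception:
            job["retries"] = job.get("retries", 0) + 1
            if job["retries"] <= job.get("max_retries", 3):
                broker.put(job)  # re-enqueue for retry
            else:
                results.append(("failed", job["type"]))  # give up / dead-letter

broker.put({"type": "resize_image", "payload": {"path": "cat.png"},
            "max_retries": 3})
worker_loop()
print(results)  # [('ok', 'resized cat.png')]
```

Because each worker only pulls from the shared queue, you can run many of them in parallel against the same broker, which is exactly why workers scale horizontally.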

[Diagram: job queue]


Delivery Guarantees

We can implement various types of guarantees for job delivery/execution.

At-most-once: Job might be lost, never duplicated

At-least-once: Job always delivered, might be duplicated

Exactly-once: Job delivered once and only once
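
Exactly-once delivery is hard to guarantee at the queue level, so a common pattern (sketched here as an illustration, not from the lecture code) is at-least-once delivery plus an idempotent worker: processing is keyed by a job ID, so a duplicate delivery becomes a no-op.

```python
processed_ids = set()  # in production this would live in Redis or the database
side_effects = []

def handle_job(job_id: str, payload: str) -> str:
    """Idempotent handler: safe to run twice with the same job_id."""
    if job_id in processed_ids:
        return "skipped duplicate"
    processed_ids.add(job_id)
    side_effects.append(payload)  # the actual work happens exactly once
    return "processed"

# At-least-once delivery: the same job arrives twice.
print(handle_job("job-42", "send welcome email"))  # processed
print(handle_job("job-42", "send welcome email"))  # skipped duplicate
print(side_effects)  # ['send welcome email']
```

With multiple concurrent workers, the check-then-add above would need to be an atomic check-and-set (e.g. a conditional write in the database) to avoid a race between two workers holding the same duplicate.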

Real-World Example

User uploads profile photo:

  1. Web handler saves file, returns 200 OK
  2. Enqueues ProcessProfilePictureJob with file path
  3. Worker picks up job
  4. Generates thumbnails, runs moderation, sends notifications, etc.
  5. Updates database with results
  6. User's browser polls for completion

Aside: Redis

Redis is an in-memory data store

  • Extremely fast (microsecond latency)
  • Rich data structures (strings, lists, sets, sorted sets, hashes)
  • Optional persistence to disk
  • Pub/sub messaging
  • Atomic operations

Common uses: Cache, session store, real-time analytics, job queues

Agentic AI

How do LLMs like ChatGPT and Claude do things like search the internet?

Agentic AI

  • LLMs are effectively glorified "bags of words", spitting out text
  • If we want them to actually execute tasks like run code or search the web, we need to give them "tools"
  • These tools are functions we define that the LLM can call
  • We combine this with the LLM's reasoning ability to accomplish complex tasks autonomously

Simple Example

"Find the cheapest flights from NYC to London next week with reasonable travel times"

Task Execution:

  1. Search for flights on multiple sites (tool: web search)
  2. Compare prices across results
  3. Check baggage policies
  4. Verify travel times are reasonable
  5. Present top options

Agentic AI: Tools

How does an LLM know what tools it has?

  • Effectively sophisticated prompt engineering
  • We provide the full context of the tools in the prompt, e.g. name, params, types, etc.
  • When the model wants to call a tool, it outputs a structured output
  • We parse this output, execute the function call, then return the result to the model, which then continues
  • Multiple libraries exist that abstract all of this away for us; we'll be using pydantic-ai
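
The mechanism can be sketched without any LLM library: tool descriptions go into the prompt, the model's structured output is parsed, the named function is executed, and the result is fed back. Everything below, including the fake model output and the JSON shape, is a hypothetical illustration, not pydantic-ai's actual protocol.

```python
import json

def web_search(query: str) -> str:  # hypothetical tool implementation
    return f"top results for {query!r}"

TOOLS = {"web_search": web_search}

# 1. Tool descriptions are included in the prompt so the model knows
#    what it can call (names, params, types).
tool_context = [{"name": "web_search", "params": {"query": "str"}}]

# 2. Pretend the model emitted a structured tool call as JSON.
model_output = '{"tool": "web_search", "args": {"query": "flights NYC to London"}}'

# 3. Parse the structured output, execute the call, and return the
#    result to the model, which then continues reasoning.
call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["args"])
print(result)  # top results for 'flights NYC to London'
```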

Building Agentic AI: Core Loop

  1. Jarvis receives message and decides if it is actionable
  2. If there is a task, first determine execution plan
  3. Pass plan to execution agent that has access to tools
  4. Execution agent runs step by step and calls tools when necessary
  5. Returns final result to user

User Message
    ↓
┌──────────────────────┐
│ TaskDetectionAgent   │ ← Is this actionable?
└──────────┬───────────┘
           ↓
    ┌──────────────┐
    │ PlanningAgent│ ← Generate execution plan
    └──────┬───────┘
           ↓
    ┌──────────────────┐
    │ ExecutionAgent   │ ← Execute with tools
    └──────┬───────────┘
           ↓
    [Tools: web_search, mark_progress]
           ↓
    Redis Progress Updates → Final Result

User Message
    ↓
┌──────────────────────┐
│ TaskDetectionAgent   │ ← Is this actionable?
└──────────┬───────────┘
           ↓
    ┌──────────────┐
    │ PlanningAgent│ ←──────┐
    └──────┬───────┘        │
           ↓                │── Job Queue
    ┌──────────────────┐    │
    │ ExecutionAgent   │ ←──┘
    └──────┬───────────┘
           ↓
    [Tools: web_search, mark_progress]
           ↓
    Redis Progress Updates → Final Result

@self.agent.tool_plain
async def web_search(search_query: str, freshness: str = "pw") -> str:
    """Perform a real web search and return formatted results."""
    search_results = search_web(search_query, self.context.tools, freshness)
    return format_results(search_results)

@self.agent.tool_plain
async def mark_progress(message: str) -> str:
    """Stream a progress update to the user in real-time."""
    update = TaskUpdate(
        status=TaskStatus.EXECUTING,
        content=message,
        timestamp=datetime.now(timezone.utc)
    )
    self.context.execution.redis_client.publish_task_update(
        self.context.agent.chat_id, update
    )
    return "✅ Progress update sent"

# From agentic_job.py - RQ job function
import asyncio

def execute_agentic_task_job(task_context: AgentContext, plan: TaskPlan):
    """Job function executed by RQ worker"""

    # Convert to context bundle
    context_bundle = AgentContextBundle.create(
        user_id=task_context.user_id,
        chat_id=task_context.chat_id,
        response_id=task_context.response_id,
        original_message=task_context.original_message,
        redis_client=get_redis_client(settings.redis_url)
    )

    # Create the execution agent and run it; RQ job functions are synchronous,
    # so we drive the async execute_plan with asyncio.run
    execution_agent = ExecutionAgent(context_bundle)
    result = asyncio.run(execution_agent.execute_plan(plan))

    return result

Lab: Job Queues