
Migration Strategies

Safe deployment and migration strategies minimize risk while enabling continuous delivery. This guide covers patterns for zero-downtime deployments, database migrations, and cloud migrations.

Deployment Strategies Overview

┌─────────────────────────────────────────────────────────────┐
│                Deployment Strategy Spectrum                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Risk ◀────────────────────────────────────────────────────▶│
│  Low                                                   High │
│                                                             │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐      │
│  │ Feature │   │  Blue-  │   │ Canary  │   │ Rolling │      │
│  │  Flags  │   │  Green  │   │         │   │         │      │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘      │
│       │             │             │             │           │
│   Decouple       Instant       Gradual     Sequential       │
│    deploy        rollback      rollout      updates         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Blue-Green Deployment

Maintain two identical environments; switch traffic instantly.

Architecture

Before Deployment:
┌──────────┐     ┌───────────────┐
│   Load   │────▶│   Blue (v1)   │ ← Live
│ Balancer │     │  Production   │
└──────────┘     └───────────────┘
                 ┌───────────────┐
                 │  Green (v1)   │ ← Idle
                 │    Staging    │
                 └───────────────┘

During Deployment:
┌──────────┐     ┌───────────────┐
│   Load   │────▶│   Blue (v1)   │ ← Live
│ Balancer │     └───────────────┘
└──────────┘     ┌───────────────┐
                 │  Green (v2)   │ ← Deploy & Test
                 └───────────────┘

After Switch:
┌──────────┐     ┌───────────────┐
│   Load   │     │   Blue (v1)   │ ← Rollback ready
│ Balancer │     │               │
└─────┬────┘     └───────────────┘
      │          ┌───────────────┐
      └─────────▶│  Green (v2)   │ ← Live
                 └───────────────┘

Implementation

    # AWS ALB target group switch
    aws elbv2 modify-listener \
      --listener-arn "$LISTENER_ARN" \
      --default-actions Type=forward,TargetGroupArn="$GREEN_TG"

    # Kubernetes blue-green with a Service
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
    spec:
      selector:
        app: my-app
        version: green  # Switch from 'blue' to 'green'

Pros and Cons

| Pros | Cons |
| --- | --- |
| Instant rollback | 2x infrastructure cost |
| Full testing before switch | Database migration complexity |
| Zero downtime | Session handling needed |
| Simple mental model | All-or-nothing switch |

Best Practices

  • Warm up green before the switch (load balancer connections, caches)
  • Require health checks to pass before switching
  • Keep the database schema compatible with both versions
  • Plan session handling (sticky sessions or an external session store)
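
The health-check gate above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `check_green` and `switch_traffic` are hypothetical callables standing in for a real health probe and a real load balancer update, and a real gate would wait between polls.

```python
from typing import Callable


def ready_to_switch(
    check_health: Callable[[], bool],
    required_passes: int = 3,
    max_attempts: int = 10,
) -> bool:
    """Return True once `required_passes` consecutive health checks succeed."""
    consecutive = 0
    for _ in range(max_attempts):
        if check_health():
            consecutive += 1
            if consecutive >= required_passes:
                return True
        else:
            consecutive = 0  # A single failure resets the streak
    return False


def blue_green_switch(
    check_green: Callable[[], bool],
    switch_traffic: Callable[[str], None],
) -> str:
    """Send traffic to green only if it is healthy; otherwise stay on blue."""
    if ready_to_switch(check_green):
        switch_traffic("green")
        return "green"
    return "blue"
```

Requiring consecutive passes rather than a single pass guards against switching to an environment that is still flapping during warm-up.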

Canary Deployment

Route a small percentage of traffic to the new version, then gradually increase it.

Traffic Flow

Phase 1: 5% Canary
┌──────────┐     ┌───────────────┐
│   Load   │─95%▶│  v1 (Stable)  │
│ Balancer │     └───────────────┘
└─────┬────┘     ┌───────────────┐
      └──5%─────▶│  v2 (Canary)  │
                 └───────────────┘

Phase 2: 25% Canary
┌──────────┐     ┌───────────────┐
│   Load   │─75%▶│  v1 (Stable)  │
│ Balancer │     └───────────────┘
└─────┬────┘     ┌───────────────┐
      └─25%─────▶│  v2 (Canary)  │
                 └───────────────┘

Phase 3: 100% Promoted
┌──────────┐     ┌───────────────┐
│   Load   │────▶│  v2 (Stable)  │
│ Balancer │     └───────────────┘
└──────────┘

Automated Canary Analysis

    # Argo Rollouts canary strategy
    # (fragment: metadata, selector, and pod template omitted)
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    spec:
      strategy:
        canary:
          steps:
          - setWeight: 5
          - pause: {duration: 10m}
          - analysis:
              templates:
              - templateName: success-rate
          - setWeight: 25
          - pause: {duration: 10m}
          - setWeight: 50
          - pause: {duration: 10m}
          - setWeight: 100
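
To make the control flow of a stepped canary concrete, here is a small Python sketch of the idea. It is not how Argo Rollouts is implemented: `run_canary` and `analysis_passes` are hypothetical names, and, as a simplification, the sketch runs an analysis gate after every weight below 100 rather than only at the 5% step.

```python
from typing import Callable, Iterable, List, Tuple


def run_canary(
    weights: Iterable[int],
    analysis_passes: Callable[[int], bool],
) -> Tuple[str, List[int]]:
    """Apply each traffic weight in order; roll back on the first failed analysis.

    Returns the outcome and the sequence of weights that were applied.
    """
    applied: List[int] = []
    for weight in weights:
        applied.append(weight)  # Shift this share of traffic to the canary
        if weight < 100 and not analysis_passes(weight):
            applied.append(0)  # Send all traffic back to the stable version
            return "rolled_back", applied
    return "promoted", applied
```

For example, `run_canary([5, 25, 50, 100], lambda w: w < 25)` applies 5%, fails the analysis at 25%, and ends with the canary weight back at 0.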

Canary Metrics to Monitor

| Metric | Action |
| --- | --- |
| Error rate | Rollback if higher than baseline |
| Latency (P50, P99) | Rollback if significantly higher |
| CPU/Memory | Check for resource issues |
| Business metrics | Orders, conversions, etc. |
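
A comparison against the baseline might look like the sketch below. The `Metrics` type, the function name, and both tolerances (0.5 percentage points of extra errors, 20% extra P99 latency) are illustrative assumptions; "significantly higher" has to be defined per service.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate: float      # Fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float  # 99th percentile latency in milliseconds


def canary_verdict(
    baseline: Metrics,
    canary: Metrics,
    max_error_delta: float = 0.005,  # Assumed tolerance, tune per service
    max_latency_ratio: float = 1.2,  # Assumed tolerance, tune per service
) -> str:
    """Return "rollback" if the canary is clearly worse than the baseline."""
    if canary.error_rate > baseline.error_rate + max_error_delta:
        return "rollback"
    if canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio:
        return "rollback"
    return "promote"
```

Comparing against the live baseline, rather than a fixed number, keeps the check meaningful when overall traffic conditions change during the rollout.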

Pros and Cons

| Pros | Cons |
| --- | --- |
| Limited blast radius | Slower rollout |
| Real traffic testing | Complex traffic splitting |
| Data-driven decisions | Requires good metrics |
| Gradual risk exposure | Session affinity challenges |

Rolling Deployment

Update instances sequentially, maintaining availability.

Pattern

Initial State:
┌────┐ ┌────┐ ┌────┐ ┌────┐
│ v1 │ │ v1 │ │ v1 │ │ v1 │
└────┘ └────┘ └────┘ └────┘

Step 1: Update first instance
┌────┐ ┌────┐ ┌────┐ ┌────┐
│ v2 │ │ v1 │ │ v1 │ │ v1 │
└────┘ └────┘ └────┘ └────┘

Step 2: Update second instance
┌────┐ ┌────┐ ┌────┐ ┌────┐
│ v2 │ │ v2 │ │ v1 │ │ v1 │
└────┘ └────┘ └────┘ └────┘

Final State:
┌────┐ ┌────┐ ┌────┐ ┌────┐
│ v2 │ │ v2 │ │ v2 │ │ v2 │
└────┘ └────┘ └────┘ └────┘

Kubernetes Rolling Update

    # Fragment: metadata, selector, and pod template omitted
    apiVersion: apps/v1
    kind: Deployment
    spec:
      replicas: 4
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1        # Extra pods during update
          maxUnavailable: 1  # Pods that can be down
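
The two settings bound how many pods exist during the rollout. A short sketch of the arithmetic, assuming absolute values (Kubernetes also accepts percentages, which this hypothetical helper does not handle):

```python
from typing import Tuple


def rolling_update_bounds(
    replicas: int, max_surge: int, max_unavailable: int
) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""
    if max_surge == 0 and max_unavailable == 0:
        # With both at zero the rollout could never make progress
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    min_available = replicas - max_unavailable
    max_total = replicas + max_surge
    return min_available, max_total


print(rolling_update_bounds(replicas=4, max_surge=1, max_unavailable=1))  # (3, 5)
```

With the values from the manifest above, at least 3 pods stay available and at most 5 pods exist at any point in the update.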

Pros and Cons

| Pros | Cons |
| --- | --- |
| Resource efficient | Both versions run simultaneously |
| Built into orchestrators | Rollback is another rolling update |
| Gradual rollout | Longer deployment time |

Feature Flags

Decouple deployment from release; enable features without deploying.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Feature Flag System                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐    ┌──────────────┐                       │
│  │  Management  │───▶│   Feature    │                       │
│  │   Console    │    │  Flag Store  │                       │
│  └──────────────┘    └──────┬───────┘                       │
│                             │                               │
│         ┌───────────────────┼───────────────────┐           │
│         │                   │                   │           │
│         ▼                   ▼                   ▼           │
│    ┌─────────┐         ┌─────────┐         ┌─────────┐      │
│    │ Service │         │ Service │         │ Service │      │
│    │    A    │         │    B    │         │    C    │      │
│    └─────────┘         └─────────┘         └─────────┘      │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Flag Types

| Type | Description | Example |
| --- | --- | --- |
| Release | Enable/disable features | new_checkout_flow |
| Experiment | A/B testing | variant_a vs variant_b |
| Ops | Operational controls | enable_cache, rate_limit |
| Permission | User-based access | beta_users, premium |

Implementation

    # `feature_flags` stands in for whatever flag client the application uses

    # Simple feature flag check
    def checkout(cart):
        if feature_flags.is_enabled("new_checkout_v2", user=current_user):
            return new_checkout_flow(cart)
        return legacy_checkout_flow(cart)

    # Percentage rollout
    feature_flags.create(
        name="new_checkout_v2",
        rollout_percentage=10,           # 10% of users
        allowed_users=["beta_testers"],
        excluded_regions=["EU"],         # GDPR concerns
    )

    # Gradual rollout: raise the percentage once per day
    feature_flags.set_percentage("new_checkout_v2", 5)    # Day 1
    feature_flags.set_percentage("new_checkout_v2", 25)   # Day 2
    feature_flags.set_percentage("new_checkout_v2", 50)   # Day 3
    feature_flags.set_percentage("new_checkout_v2", 100)  # Day 4

Feature Flag Best Practices

| Practice | Description |
| --- | --- |
| Short-lived | Remove flags after rollout complete |
| Consistent evaluation | Same user gets same result |
| Fallback to safe default | If flag service down |
| Audit logging | Track flag changes |
| Testing both paths | Test enabled and disabled |
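
Two of these practices, consistent evaluation and a safe default, are easy to show in code. The sketch below is one common approach, not the behavior of any particular flag product: it hashes the flag name and user ID into a stable bucket from 0 to 99. All function names, and the `fetch_percentage` callable that represents a lookup against the flag store, are hypothetical.

```python
import hashlib
from typing import Callable


def bucket(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_name: str, user_id: str, rollout_percentage: int) -> bool:
    """The same user always gets the same answer for a given percentage."""
    return bucket(flag_name, user_id) < rollout_percentage


def is_enabled_safe(
    flag_name: str,
    user_id: str,
    fetch_percentage: Callable[[str], int],
    default: bool = False,
) -> bool:
    """Fall back to `default` if the flag store cannot be reached."""
    try:
        return is_enabled(flag_name, user_id, fetch_percentage(flag_name))
    except Exception:
        return default
```

Because a user's bucket never changes, raising the percentage only adds users: anyone enabled at 10% is still enabled at 50%. Including the flag name in the hash keeps the same users from always landing in the early cohort of every flag.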

Database Migration Strategies

Expand-Contract Pattern

Make backward-compatible changes in phases.

Phase 1: Expand (Add new)
┌─────────────────────────────────┐
│ users table                     │
├─────────────────────────────────┤
│ id                              │
│ name          ← old column      │
│ first_name    ← new column      │
│ last_name     ← new column      │
└─────────────────────────────────┘
Code: Write to both, read from old

Phase 2: Migrate
- Backfill: Copy name → first_name, last_name
- Verify: Data consistency checks
- Code: Switch reads to the new columns, then stop writing the old one

Phase 3: Contract (Remove old)
┌─────────────────────────────────┐
│ users table                     │
├─────────────────────────────────┤
│ id                              │
│ first_name    ← only column     │
│ last_name     ← only column     │
└─────────────────────────────────┘
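
The expand and migrate phases can be walked through end to end with an in-memory SQLite database. This is a sketch under simplifying assumptions: the helper names are hypothetical, the name split is naive (it assumes exactly "first last"), and the contract step is left as a comment because it should only run after all readers use the new columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")  # Pre-existing row

# Phase 1: expand. Add the new columns alongside the old one.
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")


def split_name(name):
    """Naive split on the first space; real names need more care."""
    first, _, last = name.partition(" ")
    return first, last


def create_user(conn, name):
    """Dual-write: populate the old and the new columns on every insert."""
    first, last = split_name(name)
    conn.execute(
        "INSERT INTO users (name, first_name, last_name) VALUES (?, ?, ?)",
        (name, first, last),
    )


def backfill(conn):
    """Phase 2: fill the new columns for rows written before the expand."""
    rows = conn.execute(
        "SELECT id, name FROM users WHERE first_name IS NULL"
    ).fetchall()
    for user_id, name in rows:
        first, last = split_name(name)
        conn.execute(
            "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
            (first, last, user_id),
        )
    return len(rows)


create_user(conn, "Grace Hopper")
backfilled = backfill(conn)

# Verify: every row has new columns that agree with the old one.
mismatches = conn.execute(
    "SELECT COUNT(*) FROM users "
    "WHERE first_name IS NULL OR name != (first_name || ' ' || last_name)"
).fetchone()[0]

# Phase 3 (contract) would drop `name` once no code reads or writes it.
```

Only the row that existed before the expand needs backfilling; rows created afterwards are already consistent because of the dual-write.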

Zero-Downtime Schema Changes

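    -- ❌ Risky: depending on the database and version, this can fail on
    -- existing rows or hold a blocking lock while the table is rewritten
    ALTER TABLE users ADD COLUMN email VARCHAR(255) NOT NULL;

    -- ✅ Zero-downtime approach
    -- Step 1: Add nullable column
    ALTER TABLE users ADD COLUMN email VARCHAR(255) NULL;

    -- Step 2: Backfill data (on large tables, run this in batches)
    UPDATE users SET email = CONCAT(username, '@example.com') WHERE email IS NULL;

    -- Step 3: Add NOT NULL constraint (after all rows filled)
    -- PostgreSQL syntax; MySQL uses ALTER TABLE ... MODIFY COLUMN instead
    ALTER TABLE users ALTER COLUMN email SET NOT NULL;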
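
On a large table, the backfill in Step 2 is usually split into small transactions so that no single statement holds locks for long. A minimal sketch using SQLite as a stand-in database; the function name and batch size are assumptions, and SQLite's `||` operator replaces `CONCAT`:

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Fill `email` a batch at a time, committing after each batch."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE users
            SET email = username || '@example.com'
            WHERE id IN (
                SELECT id FROM users
                WHERE email IS NULL
                ORDER BY id
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()  # Release locks between batches
        if cur.rowcount == 0:
            break  # Nothing left to backfill
        total += cur.rowcount
    return total


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL, email TEXT)"
)
conn.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [(f"user{i}",) for i in range(5)],
)
conn.commit()

updated = backfill_in_batches(conn, batch_size=2)
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL"
).fetchone()[0]
```

The loop is safe to re-run after an interruption because it only touches rows where `email` is still NULL.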

Online Schema Migration Tools

| Tool | Database | Features |
| --- | --- | --- |
| gh-ost | MySQL | Non-blocking, pausable |
| pt-online-schema-change | MySQL | Percona toolkit |
| pgroll | PostgreSQL | Expand-contract |
| LHM | MySQL | Large Hadron Migrator |

Cloud Migration Patterns

The 6 Rs

┌─────────────────────────────────────────────────────────────┐
│                 Cloud Migration Strategies                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Rehost              Replatform          Repurchase        │
│   (Lift &             (Lift &             (Replace)         │
│   Shift)              Tinker)                               │
│       │                   │                   │             │
│       ▼                   ▼                   ▼             │
│   ┌───────┐           ┌───────┐           ┌───────┐         │
│   │  VM   │           │ PaaS  │           │ SaaS  │         │
│   │ → VM  │           │ + DB  │           │       │         │
│   └───────┘           └───────┘           └───────┘         │
│                                                             │
│   Refactor            Retain              Retire            │
│   (Re-arch)           (Keep)              (Remove)          │
│       │                   │                   │             │
│       ▼                   ▼                   ▼             │
│   ┌───────┐           ┌───────┐           ┌───────┐         │
│   │ Cloud │           │On-prem│           │  EOL  │         │
│   │Native │           │       │           │       │         │
│   └───────┘           └───────┘           └───────┘         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

| Strategy | Effort | Benefit | When to Use |
| --- | --- | --- | --- |
| Rehost | Low | Quick migration | Legacy systems |
| Replatform | Medium | Some cloud benefits | DB to managed |
| Repurchase | Medium | Modern SaaS | CRM, email |
| Refactor | High | Full cloud-native | Core apps |
| Retain | None | No risk | Compliance needs |
| Retire | Low | Cost savings | Unused apps |

Data Migration Approaches

Option 1: Offline Migration
┌──────────┐   Export   ┌──────────┐   Import   ┌──────────┐
│  Source  │───────────▶│   File   │───────────▶│  Target  │
│    DB    │            │ (S3/GCS) │            │    DB    │
└──────────┘            └──────────┘            └──────────┘
Downtime: Hours to days

Option 2: Online Replication
┌──────────┐   CDC    ┌──────────┐
│  Source  │─────────▶│  Target  │
│    DB    │  (real-  │    DB    │
└──────────┘   time)  └──────────┘
Downtime: Minutes (for cutover)

Option 3: Dual-Write
┌──────────┐          ┌──────────┐
│   App    │─────────▶│  Source  │
└────┬─────┘          └──────────┘
     │                ┌──────────┐
     └───────────────▶│  Target  │
                      └──────────┘
Downtime: Zero (if done right)
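
The "if done right" caveat on dual-write mostly comes down to proving that the two stores agree before cutover. The sketch below uses plain dictionaries as stand-ins for the source and target databases; `write_both` and `find_drift` are hypothetical helpers, not part of any migration tool.

```python
from typing import Dict, List, Tuple


def write_both(source: Dict[str, int], target: Dict[str, int], key: str, value: int) -> None:
    """Dual-write: the source stays the system of record until cutover."""
    source[key] = value
    target[key] = value


def find_drift(source: Dict[str, int], target: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (keys missing from target, keys whose values differ)."""
    missing = sorted(k for k in source if k not in target)
    mismatched = sorted(
        k for k in source if k in target and source[k] != target[k]
    )
    return missing, mismatched


source = {"a": 1, "b": 2}  # "b" was written before dual-write started
target = {"a": 1}
write_both(source, target, "c", 3)
drift_before_backfill = find_drift(source, target)

target["b"] = source["b"]  # Backfill the historical row
drift_after_backfill = find_drift(source, target)
```

Dual-write alone only covers new writes, so rows that predate it still need a backfill, and cutover is safe only once the drift check comes back empty.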

Rollback Strategies

Automatic Rollback Triggers

| Trigger | Threshold |
| --- | --- |
| Error rate | Above 1% |
| Latency P99 | Above 500ms |
| Health check | 3 consecutive failures |
| Custom metric | Business-defined |
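
The first three triggers translate directly into a small check. This is an illustrative sketch using the thresholds from the table; the `Snapshot` type and function name are assumptions, and the business-defined trigger is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    error_rate: float       # Fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float   # 99th percentile latency in milliseconds
    health_check_ok: bool   # Result of the most recent health probe


def rollback_reasons(history: List[Snapshot]) -> List[str]:
    """Return the triggers that fired, oldest snapshot first in `history`."""
    reasons: List[str] = []
    if not history:
        return reasons
    latest = history[-1]
    if latest.error_rate > 0.01:
        reasons.append("error_rate")
    if latest.p99_latency_ms > 500:
        reasons.append("latency_p99")
    last_three = history[-3:]
    if len(last_three) == 3 and not any(s.health_check_ok for s in last_three):
        reasons.append("health_check")
    return reasons
```

A deployment controller would roll back whenever the returned list is non-empty, and the list doubles as the reason to record in the audit log.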

Rollback Implementation

    # Kubernetes rollback
    kubectl rollout undo deployment/my-app

    # Argo Rollouts automatic rollback
    apiVersion: argoproj.io/v1alpha1
    kind: AnalysisTemplate
    metadata:
      name: success-rate  # Matches templateName in the Rollout
    spec:
      metrics:
      - name: success-rate
        provider:
          prometheus:
            # A real template typically also sets `address` (the Prometheus URL)
            query: |
              sum(rate(requests_total{status="200"}[5m])) /
              sum(rate(requests_total[5m]))
        successCondition: result[0] >= 0.95
        failureLimit: 3

Interview Quick Reference

Common Questions

  1. “How would you deploy to production with zero downtime?”

    • Blue-green for instant switch
    • Canary for gradual rollout
    • Rolling for resource efficiency
    • Feature flags for decoupled release
  2. “How do you handle database migrations?”

    • Expand-contract pattern
    • Backward-compatible changes only
    • Online migration tools (gh-ost)
    • Separate deploy from migrate
  3. “What’s your rollback strategy?”

    • Automated triggers on metrics
    • Instant switch (blue-green)
    • Feature flags for instant disable
    • Database backward compatibility

Deployment Checklist

  • Deployment strategy chosen?
  • Rollback plan tested?
  • Monitoring/alerts configured?
  • Database migration backward-compatible?
  • Feature flags for risky changes?
  • Health checks defined?
  • Traffic routing verified?