Migration Strategies
Safe deployment and migration strategies minimize risk while enabling continuous delivery. This guide covers patterns for zero-downtime deployments, database migrations, and cloud migrations.
Deployment Strategies Overview
┌─────────────────────────────────────────────────────────────┐
│ Deployment Strategy Spectrum │
├─────────────────────────────────────────────────────────────┤
│ │
│ Risk ◀────────────────────────────────────────────────────▶│
│ Low High │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Feature │ │ Blue- │ │ Canary │ │ Rolling │ │
│ │ Flags │ │ Green │ │ │ │ │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │
│ Decouple Instant Gradual Sequential │
│ deploy rollback rollout updates │
│ │
└─────────────────────────────────────────────────────────────┘
Blue-Green Deployment
Maintain two identical environments; switch traffic instantly.
Architecture
Before Deployment:
┌──────────┐     ┌───────────────┐
│   Load   │────▶│   Blue (v1)   │ ← Live
│ Balancer │     │  Production   │
└──────────┘     └───────────────┘
                 ┌───────────────┐
                 │  Green (v1)   │ ← Idle
                 │    Staging    │
                 └───────────────┘
During Deployment:
┌──────────┐     ┌───────────────┐
│   Load   │────▶│   Blue (v1)   │ ← Live
│ Balancer │     └───────────────┘
└──────────┘     ┌───────────────┐
                 │  Green (v2)   │ ← Deploy & Test
                 └───────────────┘
After Switch:
┌──────────┐     ┌───────────────┐
│   Load   │     │   Blue (v1)   │ ← Rollback ready
│ Balancer │     └───────────────┘
└─────┬────┘     ┌───────────────┐
      └─────────▶│  Green (v2)   │ ← Live
                 └───────────────┘
Implementation
# AWS ALB Target Group Switch
aws elbv2 modify-listener \
  --listener-arn "$LISTENER_ARN" \
  --default-actions Type=forward,TargetGroupArn="$GREEN_TG"
# Kubernetes blue-green with Service
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: green  # Switch from 'blue' to 'green'
Pros and Cons
| Pros | Cons |
|---|---|
| Instant rollback | 2x infrastructure cost |
| Full testing before switch | Database migration complexity |
| Zero downtime | Session handling needed |
| Simple mental model | All-or-nothing switch |
Best Practices
- Warm up green before the switch (load balancer, caches)
- Require health checks to pass before switching
- Keep the database schema compatible with both versions
- Handle sessions with sticky sessions or an external session store
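The health-check practice above can be sketched as a gate in front of the traffic switch. This is a minimal illustration, not a real load balancer integration: `Environment`, `Router`, and the `checks` callables are hypothetical names introduced here, and the router is modeled as a plain Python object.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Environment:
    """A deployable environment (hypothetical model, not a cloud API)."""
    name: str
    version: str
    # Each check returns True when the environment is healthy.
    checks: List[Callable[[], bool]] = field(default_factory=list)

    def is_healthy(self) -> bool:
        return all(check() for check in self.checks)


@dataclass
class Router:
    """Stands in for the load balancer; `live` receives all traffic."""
    live: Environment
    idle: Environment

    def switch_traffic(self) -> bool:
        """Swap live and idle only if the idle environment passes its checks."""
        if not self.idle.is_healthy():
            return False  # keep serving from the current live environment
        self.live, self.idle = self.idle, self.live
        return True

    def rollback(self) -> None:
        """The previous version is still running, so rollback is another swap."""
        self.live, self.idle = self.idle, self.live
```

With `blue` live and `green` idle, `router.switch_traffic()` promotes green only when every check passes, and `router.rollback()` returns traffic to blue without a redeploy.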
Canary Deployment
Route a small percentage of traffic to the new version, then gradually increase it.
Traffic Flow
Phase 1: 5% Canary
┌──────────┐     ┌───────────────┐
│   Load   │─95%▶│  v1 (Stable)  │
│ Balancer │     └───────────────┘
└─────┬────┘     ┌───────────────┐
      └────5%───▶│  v2 (Canary)  │
                 └───────────────┘
Phase 2: 25% Canary
┌──────────┐     ┌───────────────┐
│   Load   │─75%▶│  v1 (Stable)  │
│ Balancer │     └───────────────┘
└─────┬────┘     ┌───────────────┐
      └───25%───▶│  v2 (Canary)  │
                 └───────────────┘
Phase 3: 100% Promoted
┌──────────┐     ┌───────────────┐
│   Load   │────▶│  v2 (Stable)  │
│ Balancer │     └───────────────┘
└──────────┘
Automated Canary Analysis
# Argo Rollouts Canary Strategy
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
      - setWeight: 5
      - pause: {duration: 10m}
      - analysis:
          templates:
          - templateName: success-rate
      - setWeight: 25
      - pause: {duration: 10m}
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 100
Canary Metrics to Monitor
| Metric | Action |
|---|---|
| Error rate | Roll back if higher than baseline |
| Latency (P50, P99) | Roll back if significantly higher than baseline |
| CPU/Memory | Check for resource issues |
| Business metrics | Watch orders, conversions, etc. for regressions |
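One way to turn the table above into a decision is to compare the canary against the stable baseline with explicit tolerances. The sketch below is illustrative only: `Metrics`, `canary_verdict`, and the default tolerances are assumptions made for this example, not values the guide prescribes.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate: float      # fraction of failed requests, e.g. 0.01 for 1%
    p99_latency_ms: float  # 99th percentile latency in milliseconds


def canary_verdict(baseline: Metrics, canary: Metrics,
                   max_error_delta: float = 0.005,
                   max_latency_ratio: float = 1.2) -> str:
    """Return "rollback" or "promote" by comparing canary to baseline.

    The tolerances are illustrative assumptions, not recommended values.
    """
    if canary.error_rate > baseline.error_rate + max_error_delta:
        return "rollback"
    if canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio:
        return "rollback"
    return "promote"
```

Comparing against a live baseline rather than a fixed number means the check keeps working when overall traffic conditions change during the rollout.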
Pros and Cons
| Pros | Cons |
|---|---|
| Limited blast radius | Slower rollout |
| Real traffic testing | Complex traffic splitting |
| Data-driven decisions | Requires good metrics |
| Gradual risk exposure | Session affinity challenges |
Rolling Deployment
Update instances sequentially, maintaining availability.
Pattern
Initial State:
┌────┐ ┌────┐ ┌────┐ ┌────┐
│ v1 │ │ v1 │ │ v1 │ │ v1 │
└────┘ └────┘ └────┘ └────┘
Step 1: Update first instance
┌────┐ ┌────┐ ┌────┐ ┌────┐
│ v2 │ │ v1 │ │ v1 │ │ v1 │
└────┘ └────┘ └────┘ └────┘
Step 2: Update second instance
┌────┐ ┌────┐ ┌────┐ ┌────┐
│ v2 │ │ v2 │ │ v1 │ │ v1 │
└────┘ └────┘ └────┘ └────┘
Final State:
┌────┐ ┌────┐ ┌────┐ ┌────┐
│ v2 │ │ v2 │ │ v2 │ │ v2 │
└────┘ └────┘ └────┘ └────┘
Kubernetes Rolling Update
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Extra pods during update
      maxUnavailable: 1  # Pods that can be down
Pros and Cons
| Pros | Cons |
|---|---|
| Resource efficient | Both versions run simultaneously |
| Built into orchestrators | Rollback is another rolling update |
| Gradual rollout | Longer deployment time |
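The effect of `maxSurge` and `maxUnavailable` on capacity can be worked out with simple arithmetic. The helper below is a sketch written for this guide (`rolling_update_bounds` is a hypothetical name) and assumes both settings are absolute pod counts; Kubernetes also accepts percentages, which are not handled here.

```python
def rolling_update_bounds(replicas: int, max_surge: int, max_unavailable: int):
    """Return (min_available, max_total) pods during a rolling update.

    Assumes max_surge and max_unavailable are absolute pod counts.
    """
    if max_surge == 0 and max_unavailable == 0:
        # With both at zero the update could never make progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    min_available = replicas - max_unavailable
    max_total = replicas + max_surge
    return min_available, max_total
```

For the Deployment above (`replicas: 4`, `maxSurge: 1`, `maxUnavailable: 1`) this gives at least 3 available pods and at most 5 pods in total at any point in the update.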
Feature Flags
Decouple deployment from release; enable features without deploying.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Feature Flag System │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Management │───▶│ Feature │ │
│ │ Console │ │ Flag Store │ │
│ └──────────────┘ └──────┬───────┘ │
│ │ │
│ ┌───────────────────┼───────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Service │ │ Service │ │ Service │ │
│ │ A │ │ B │ │ C │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Flag Types
| Type | Description | Example |
|---|---|---|
| Release | Enable/disable features | new_checkout_flow |
| Experiment | A/B testing | variant_a vs variant_b |
| Ops | Operational controls | enable_cache, rate_limit |
| Permission | User-based access | beta_users, premium |
Implementation
# Simple feature flag check
def checkout(cart):
    if feature_flags.is_enabled("new_checkout_v2", user=current_user):
        return new_checkout_flow(cart)
    else:
        return legacy_checkout_flow(cart)
# Percentage rollout
feature_flags.create(
    name="new_checkout_v2",
    rollout_percentage=10,           # 10% of users
    allowed_users=["beta_testers"],
    excluded_regions=["EU"],         # GDPR concerns
)
# Gradual rollout: raise the percentage once per day
feature_flags.set_percentage("new_checkout_v2", 5)    # day 1
feature_flags.set_percentage("new_checkout_v2", 25)   # day 2
feature_flags.set_percentage("new_checkout_v2", 50)   # day 3
feature_flags.set_percentage("new_checkout_v2", 100)  # day 4
Feature Flag Best Practices
| Practice | Description |
|---|---|
| Short-lived | Remove flags after rollout complete |
| Consistent evaluation | Same user gets same result |
| Fallback to safe default | If flag service down |
| Audit logging | Track flag changes |
| Testing both paths | Test enabled and disabled |
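Two of the practices above, consistent evaluation and a safe fallback, can be sketched with a stable hash of the flag name and user ID. This is one possible approach rather than how any particular flag service works; `bucket`, `is_enabled`, and the `fetch_percentage` callable are hypothetical names introduced for illustration.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, fetch_percentage,
               default: bool = False) -> bool:
    """Evaluate a percentage rollout with a safe fallback.

    `fetch_percentage` is a hypothetical callable that asks the flag store
    for the rollout percentage and may raise if the store is unreachable.
    """
    try:
        percentage = fetch_percentage(flag_name)
    except Exception:
        return default  # flag service down: fall back to the safe default
    return bucket(flag_name, user_id) < percentage
```

Because the bucket depends only on the flag name and user ID, a user who is enabled at 10% stays enabled when the rollout grows to 25% or 50%. Including the flag name in the hash keeps different flags from always selecting the same users.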
Database Migration Strategies
Expand-Contract Pattern
Make backward-compatible changes in phases.
Phase 1: Expand (Add new)
┌─────────────────────────────────┐
│ users table │
├─────────────────────────────────┤
│ id │
│ name ← old column │
│ first_name ← new column │
│ last_name ← new column │
└─────────────────────────────────┘
Code: Write to both, read from old
Phase 2: Migrate
- Backfill: Copy name → first_name, last_name
- Verify: Data consistency checks
- Switch: Read from the new columns, then stop writing to name
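The backfill step can be sketched as a batched copy. The example below uses Python's built-in `sqlite3` module so it runs anywhere; the `users` schema matches the diagram, while `split_name`, `backfill_names`, and the batch size are assumptions made for illustration. It also assumes `name` is never NULL.

```python
import sqlite3


def split_name(full_name: str):
    """Naive split on the first space; real names need more careful rules."""
    first, _, last = full_name.strip().partition(" ")
    return first, last


def backfill_names(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy `name` into `first_name`/`last_name` in small batches.

    Returns the number of rows updated. Small batches keep each
    transaction short so the backfill does not hold locks for long.
    """
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE first_name IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        updates = [(*split_name(name), row_id) for row_id, name in rows]
        conn.executemany(
            "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
            updates,
        )
        conn.commit()
        total += len(rows)
```

Because the function only selects rows where `first_name IS NULL`, it is safe to re-run after an interruption and returns 0 once every row has been filled.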
Phase 3: Contract (Remove old)
┌─────────────────────────────────┐
│ users table │
├─────────────────────────────────┤
│ id │
│ first_name ← only column │
│ last_name ← only column │
└─────────────────────────────────┘
Zero-Downtime Schema Changes
-- ❌ Locking operation
ALTER TABLE users ADD COLUMN email VARCHAR(255) NOT NULL;
-- ✅ Zero-downtime approach
-- Step 1: Add nullable column
ALTER TABLE users ADD COLUMN email VARCHAR(255) NULL;
-- Step 2: Backfill data (on large tables, update in batches to keep locks short)
UPDATE users SET email = CONCAT(username, '@example.com')
WHERE email IS NULL;
-- Step 3: Add NOT NULL constraint after all rows are filled
-- (PostgreSQL syntax; MySQL uses MODIFY COLUMN instead)
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
Online Schema Migration Tools
| Tool | Database | Features |
|---|---|---|
| gh-ost | MySQL | Non-blocking, pausable |
| pt-online-schema-change | MySQL | Percona toolkit |
| pgroll | PostgreSQL | Expand-contract |
| LHM | MySQL | Large Hadron Migrator |
Cloud Migration Patterns
The 6 Rs
┌─────────────────────────────────────────────────────────────┐
│ Cloud Migration Strategies │
├─────────────────────────────────────────────────────────────┤
│ │
│ Rehost Replatform Repurchase │
│ (Lift & (Lift & (Replace) │
│ Shift) Tinker) │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │ VM │ │ PaaS │ │ SaaS │ │
│ │ → VM │ │ + DB │ │ │ │
│ └──────┘ └──────┘ └──────┘ │
│ │
│ Refactor Retain Retire │
│ (Re-arch) (Keep) (Remove) │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │Cloud │ │On-prem│ │ EOL │ │
│ │Native│ │ │ │ │ │
│ └──────┘ └──────┘ └──────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
| Strategy | Effort | Benefit | When to Use |
|---|---|---|---|
| Rehost | Low | Quick migration | Legacy systems |
| Replatform | Medium | Some cloud benefits | DB to managed |
| Repurchase | Medium | Modern SaaS | CRM, email |
| Refactor | High | Full cloud-native | Core apps |
| Retain | None | No risk | Compliance needs |
| Retire | Low | Cost savings | Unused apps |
Data Migration Approaches
Option 1: Offline Migration
┌──────────┐ Export ┌──────────┐ Import ┌──────────┐
│ Source │───────────▶│ File │───────────▶│ Target │
│ DB │ │ (S3/GCS) │ │ DB │
└──────────┘ └──────────┘ └──────────┘
Downtime: Hours to days
Option 2: Online Replication
┌──────────┐ CDC ┌──────────┐
│ Source │─────────▶│ Target │
│ DB │ (real- │ DB │
└──────────┘ time) └──────────┘
Downtime: Minutes (for cutover)
Option 3: Dual-Write
┌──────────┐ ┌──────────┐
│ App │─────────▶│ Source │
└────┬─────┘ └──────────┘
│
└───────────────▶┌──────────┐
│ Target │
└──────────┘
Downtime: Zero (if done right)
Rollback Strategies
Automatic Rollback Triggers
| Trigger | Threshold |
|---|---|
| Error rate | above 1% |
| Latency P99 | above 500ms |
| Health check | 3 consecutive failures |
| Custom metric | Business-defined |
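The triggers in the table above can be combined into a single check. The sketch below is an illustration under stated assumptions: `Snapshot` and `should_rollback` are hypothetical names, metrics are assumed to arrive as periodic snapshots ordered oldest to newest, and the business-defined custom metric is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    error_rate: float        # fraction, so 0.01 means 1%
    p99_latency_ms: float
    health_check_ok: bool


def should_rollback(history: List[Snapshot],
                    max_error_rate: float = 0.01,
                    max_p99_ms: float = 500.0,
                    max_health_failures: int = 3) -> bool:
    """Check the latest snapshot against the thresholds in the table.

    `history` is ordered oldest to newest.
    """
    if not history:
        return False
    latest = history[-1]
    if latest.error_rate > max_error_rate:
        return True
    if latest.p99_latency_ms > max_p99_ms:
        return True
    # Count failed health checks at the end of the history.
    consecutive = 0
    for snap in reversed(history):
        if snap.health_check_ok:
            break
        consecutive += 1
    return consecutive >= max_health_failures
```

Counting only trailing failures means a single passing health check resets the counter, which matches the "3 consecutive failures" wording.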
Rollback Implementation
# Kubernetes rollback
kubectl rollout undo deployment/my-app
# Argo Rollouts automatic rollback
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate  # Referenced by templateName in the Rollout
spec:
  metrics:
  - name: success-rate
    provider:
      prometheus:
        query: |
          sum(rate(requests_total{status="200"}[5m])) /
          sum(rate(requests_total[5m]))
    successCondition: result[0] >= 0.95
    failureLimit: 3
Interview Quick Reference
Common Questions
- “How would you deploy to production with zero downtime?”
  - Blue-green for instant switch
  - Canary for gradual rollout
  - Rolling for resource efficiency
  - Feature flags for decoupled release
- “How do you handle database migrations?”
  - Expand-contract pattern
  - Backward-compatible changes only
  - Online migration tools (gh-ost)
  - Separate deploy from migrate
- “What’s your rollback strategy?”
  - Automated triggers on metrics
  - Instant switch (blue-green)
  - Feature flags for instant disable
  - Database backward compatibility
Deployment Checklist
- Deployment strategy chosen?
- Rollback plan tested?
- Monitoring/alerts configured?
- Database migration backward-compatible?
- Feature flags for risky changes?
- Health checks defined?
- Traffic routing verified?