Migration Playbook: Monolith to Services Without Breaking Customers

How to safely migrate from monolith to services without disrupting customers or accumulating technical debt.

Goal

Incrementally extract services from a monolith while:

In scope:

Out of scope:

Migrate by capability, not by code — Extract business capabilities that can stand alone, not arbitrary code boundaries
Parallel run everything — New service runs alongside old code, with comparison, before cutover
Reversibility over speed — Every migration step must be reversible within minutes

Deliverable: Migration sequencing document with rationale

Deliverable: New service running in production, receiving shadow traffic

Deliverable: Parallel run report showing match rates and discrepancy analysis

Deliverable: Traffic fully on new service, monolith code dormant

Deliverable: Clean codebase, no migration scaffolding remaining

Ritual	Frequency	Duration	Participants
Migration standup	Daily	15 min	Migration team
Parallel run review	Daily during Phase 3	30 min	Tech lead + SRE
Traffic shift decision	As needed	30 min	Tech lead + Product
Migration retrospective	After each service	1 hour	Full team

Migration tracker: Spreadsheet/board showing all services, status, and blockers
Parallel run dashboard: Real-time comparison of monolith vs new service responses
Rollback runbook: Step-by-step procedure to revert to monolith
Migration ADR: Decision record for each extracted service

Metric	Target	Warning	Critical
Availability during migration	99.9%	99.5%	99%
Parallel run match rate	99.9%	99%	95%
Rollback time	Under 5 min	Under 15 min	Under 30 min
Migration scaffolding age	Under 30 days	Under 60 days	Under 90 days

Never cut over without parallel run — No exceptions, even for “simple” changes
Feature freeze during cutover — No other changes to the affected code path
Rollback tested before cutover — Actually execute rollback in staging
Business metrics monitoring — Not just technical metrics (revenue, conversion, etc.)

Signs of breakdown:

Response:

Data synchronization lag: Dual-write had eventual consistency, causing parallel run mismatches. Fixed by synchronous dual-write or accepting comparison window.
Hidden dependencies: Extracted service depended on monolith state not captured in the API. Fixed by more thorough seam analysis.
Migration fatigue: Team burned out on long migration, started cutting corners. Fixed by celebrating milestones and rotating team members.

Feedback loop: Weekly migration retrospective, monthly stakeholder update
Review cadence: Quarterly review of migration strategy and sequencing
Change process: Major sequencing changes require tech lead + product approval