
Platform Design

Platform engineering builds internal developer platforms that enable self-service capabilities and improve developer productivity. This guide covers platform design principles, patterns, and implementation strategies.

What is Platform Engineering?

```
┌─────────────────────────────────────────────────────────────┐
│                    Platform Engineering                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Traditional Ops:                                           │
│  Developer → Ticket → Ops Team → Provision → Developer      │
│  (Days to weeks)                                            │
│                                                             │
│  Platform Engineering:                                      │
│  Developer → Self-Service Platform → Provision              │
│  (Minutes to hours)                                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

Platform vs DevOps vs SRE

| Role | Focus | Output |
|---|---|---|
| DevOps | Culture + practices | CI/CD, automation |
| SRE | Reliability + operations | SLOs, incident response |
| Platform | Developer experience | Self-service products |

Platform Design Principles

1. Platform as a Product

```
┌─────────────────────────────────────────────────────────────┐
│                    Platform as a Product                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Customers       │ Internal developers                      │
│  Product Manager │ Platform team lead                       │
│  User Research   │ Developer interviews, surveys            │
│  Features        │ Based on developer needs                 │
│  Roadmap         │ Prioritized by impact                    │
│  Success Metrics │ Developer productivity, adoption         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

2. Self-Service First

| Capability | Self-Service | Traditional |
|---|---|---|
| Create new service | Template in 10 min | Ticket, 3 days |
| Provision database | UI/CLI in 5 min | Request, 1 week |
| Set up CI/CD | Auto-generated | Manual config |
| Scale resources | One-click | Approval process |
| View logs/metrics | Dashboard access | Ask SRE |

3. Golden Paths

Opinionated, well-supported paths for common tasks.

Golden Path for New Service:

```
┌─────────────────────────────────────────────────────────────┐
│  1. Create Service                                          │
│     └── Run the go-api Software Template in Backstage       │
│                                                             │
│  2. Generated Automatically:                                │
│     ├── Repository with code template                       │
│     ├── CI/CD pipeline                                      │
│     ├── Kubernetes manifests                                │
│     ├── Monitoring dashboards                               │
│     ├── Alerts (SLO-based)                                  │
│     ├── Documentation skeleton                              │
│     └── Service catalog entry                               │
│                                                             │
│  3. Developer Focus:                                        │
│     └── Write business logic                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
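Behind the "Generated Automatically" step sits template rendering. The Python sketch below shows the idea under simple assumptions: a template is a map from output path to file body with `$name` and `$team` placeholders, rendered with the standard library's `string.Template`. `GO_API_TEMPLATE` and `scaffold_service` are hypothetical; Backstage's Scaffolder uses its own template format and actions, and a real template would also emit dashboards and alerts.

```python
from string import Template

# Hypothetical "go-api" golden-path template: output path -> file body.
# Only a few of the generated artifacts are shown, and bodies are stubs.
GO_API_TEMPLATE = {
    "main.go": "package main\n\n// $name is owned by $team.\nfunc main() {}\n",
    ".github/workflows/ci.yaml": "name: $name-ci\non: [push]\n",
    "k8s/deployment.yaml": "kind: Deployment\nmetadata:\n  name: $name\n",
    "catalog-info.yaml": "kind: Component\nmetadata:\n  name: $name\nspec:\n  owner: $team\n",
    "docs/index.md": "# $name\n",
}


def scaffold_service(template: dict, name: str, team: str) -> dict:
    """Render every file of a template for one new service."""
    values = {"name": name, "team": team}
    # substitute() raises KeyError on an unknown placeholder, so a broken
    # template fails loudly instead of emitting half-rendered files.
    return {path: Template(body).substitute(values) for path, body in template.items()}


files = scaffold_service(GO_API_TEMPLATE, name="order-service", team="team-commerce")
for path in sorted(files):
    print(path)
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice: a golden path should fail at creation time, not hand a developer a repository with stray placeholders.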

4. Paved Roads, Not Rails

```
Paved Road (Encouraged):
┌─────────────────────────────────────────────┐
│ Use standard templates, tools, patterns     │
│ → Fast, supported, best practices           │
└─────────────────────────────────────────────┘

Off-Road (Allowed but harder):
┌─────────────────────────────────────────────┐
│ Custom solutions when needed                │
│ → Possible but self-supported               │
└─────────────────────────────────────────────┘

Rails (Forced - avoid):
┌─────────────────────────────────────────────┐
│ Must use exactly this, no exceptions        │
│ → Frustrating, blocks innovation            │
└─────────────────────────────────────────────┘
```

Internal Developer Platform (IDP)

Platform Layers

```
┌───────────────────────────────────────────────────────────────────┐
│                         Developer Portal                          │
│                     (Backstage, Port, Cortex)                     │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                        Platform APIs                        │  │
│  │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │  │
│  │ │  Service   │ │  Database  │ │   CI/CD    │ │  Secrets   │ │  │
│  │ │  Creation  │ │Provisioning│ │   Config   │ │    Mgmt    │ │  │
│  │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                   │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                 Infrastructure Abstraction                  │  │
│  │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │  │
│  │ │    Helm    │ │ Terraform  │ │ Crossplane │ │   Pulumi   │ │  │
│  │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                   │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                    Cloud Infrastructure                     │  │
│  │               AWS / GCP / Azure / Kubernetes                │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
```

Service Catalog

```yaml
# Backstage catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  description: Handles order processing
  tags:
    - java
    - critical
  annotations:
    github.com/project-slug: myorg/order-service
    pagerduty.com/service-id: PXXXXXX
spec:
  type: service
  lifecycle: production
  owner: team-commerce
  system: e-commerce
  dependsOn:
    - component:payment-service
    - resource:orders-database
  providesApis:
    - orders-api
```
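Ownership and dependency metadata is what lets the portal answer "who owns this?" and "what breaks if this goes down?". A minimal Python sketch, assuming entities have already been loaded into dicts with the same fields as the `catalog-info.yaml` above (a real portal would query its catalog API); the `payment-service` entry and its `team-payments` owner are hypothetical.

```python
from typing import Optional

# Catalog entities as plain dicts with the same fields as catalog-info.yaml.
# The payment-service entry and team-payments owner are hypothetical.
CATALOG = [
    {
        "kind": "Component",
        "metadata": {"name": "order-service"},
        "spec": {
            "owner": "team-commerce",
            "dependsOn": ["component:payment-service", "resource:orders-database"],
        },
    },
    {
        "kind": "Component",
        "metadata": {"name": "payment-service"},
        "spec": {"owner": "team-payments"},
    },
]


def owner_of(catalog: list, name: str) -> Optional[str]:
    """Answer "who owns this service?" from ownership metadata."""
    for entity in catalog:
        if entity["metadata"]["name"] == name:
            return entity["spec"].get("owner")
    return None


def dependents_of(catalog: list, ref: str) -> list:
    """Names of entities whose dependsOn lists `ref` (its blast radius)."""
    return sorted(
        entity["metadata"]["name"]
        for entity in catalog
        if ref in entity["spec"].get("dependsOn", [])
    )


print(owner_of(CATALOG, "order-service"))                   # team-commerce
print(dependents_of(CATALOG, "component:payment-service"))  # ['order-service']
```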

Developer Portal Features

| Feature | Description |
|---|---|
| Service Catalog | All services, owners, dependencies |
| Templates | Create new services from blueprints |
| Documentation | Technical docs in one place |
| API Catalog | All APIs, specs, versioning |
| Tech Radar | Recommended technologies |
| Search | Find anything in the org |
| Scorecards | Service health and compliance |
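A scorecard is a set of checks run against each catalog entry. The sketch below assumes entities shaped like `catalog-info.yaml` and uses four illustrative checks (owner, description, a known lifecycle value, a linked PagerDuty service); the check list and the scoring rule are assumptions, not any particular portal's scorecard API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scorecard: each check is a (name, predicate) pair evaluated
# against a catalog entity shaped like catalog-info.yaml.
CHECKS: List[Tuple[str, Callable[[Dict], bool]]] = [
    ("has owner", lambda e: bool(e.get("spec", {}).get("owner"))),
    ("has description", lambda e: bool(e.get("metadata", {}).get("description"))),
    ("known lifecycle",
     lambda e: e.get("spec", {}).get("lifecycle") in {"experimental", "production", "deprecated"}),
    ("on-call linked",
     lambda e: "pagerduty.com/service-id" in e.get("metadata", {}).get("annotations", {})),
]


def score(entity: Dict) -> Tuple[float, List[str]]:
    """Return (fraction of checks passed, names of failed checks)."""
    failed = [name for name, check in CHECKS if not check(entity)]
    return 1 - len(failed) / len(CHECKS), failed


entity = {
    "metadata": {"name": "order-service", "description": "Handles order processing"},
    "spec": {"owner": "team-commerce", "lifecycle": "production"},
}
print(score(entity))  # (0.75, ['on-call linked'])
```

Returning the failed check names, not just the number, is what makes a scorecard actionable: the service owner sees exactly what to fix.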

Developer Experience (DevEx)

Measuring Developer Experience

| Metric | What it Measures |
|---|---|
| Lead Time | Commit to production |
| Deployment Frequency | How often teams deploy |
| Developer Survey (DX) | Satisfaction, friction points |
| Cognitive Load | Complexity to get things done |
| Time to First Deploy | New developer onboarding |
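The first two metrics fall straight out of deploy records. A sketch assuming each record pairs a commit timestamp with the time that commit reached production (the sample data is made up): lead time is summarised by its median, and deployment frequency is deploys per week over a fixed observation window.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deploy records: (commit time, time that commit reached production).
DEPLOYS = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 15, 0)),   # 6 h
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 3, 10, 0)),  # 24 h
    (datetime(2024, 5, 6, 8, 0), datetime(2024, 5, 6, 12, 0)),   # 4 h
]


def median_lead_time(deploys) -> timedelta:
    """Lead time (commit to production), summarised by the median so one
    slow change does not dominate."""
    return median(prod - commit for commit, prod in deploys)


def deploys_per_week(deploys, window_days: int) -> float:
    """Deployment frequency over a fixed observation window."""
    return len(deploys) / (window_days / 7)


print(median_lead_time(DEPLOYS))      # 6:00:00
print(deploys_per_week(DEPLOYS, 14))  # 1.5
```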

Reducing Friction

Common Friction Points:

```
┌─────────────────────────┬───────────────────────────────────┐
│ Friction Point          │ Platform Solution                 │
├─────────────────────────┼───────────────────────────────────┤
│ "How do I create a      │ Service templates with            │
│ new service?"           │ one-command creation              │
│                         │                                   │
│ "Where are the logs?"   │ Unified observability             │
│                         │ portal with service linking       │
│                         │                                   │
│ "How do I get a         │ Self-service provisioning         │
│ database?"              │ with guardrails                   │
│                         │                                   │
│ "Who owns this          │ Service catalog with              │
│ service?"               │ ownership metadata                │
│                         │                                   │
│ "What's the deploy      │ Standardized CI/CD,               │
│ process?"               │ auto-generated                    │
│                         │                                   │
│ "Is my service          │ Scorecards and                    │
│ compliant?"             │ automated checks                  │
└─────────────────────────┴───────────────────────────────────┘
```

Developer Onboarding

```
Day 1 Target: First commit merged

Hour 1-2: Environment Setup
├── Automated laptop provisioning
├── Access to all necessary systems
└── Development environment ready

Hour 2-4: Orientation
├── Platform overview
├── Key tools walkthrough
└── Find your team's services

Hour 4-8: First Contribution
├── Small starter task ready
├── Pair with teammate
└── PR merged to production

Week 1: Fully Productive
├── Owns small feature
├── Understands team processes
└── Can navigate platform
```

Self-Service Infrastructure

Infrastructure as Code Templates

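```hcl
# Terraform module for a standard service
module "standard_service" {
  # Hypothetical platform-owned module. Terraform needs a full source address
  # here, such as a registry address (namespace/name/provider), a "./" local
  # path, or a git:: URL; a bare "platform/standard-service" is rejected.
  source = "git::https://github.com/myorg/platform-modules.git//standard-service?ref=v1.0.0"

  name        = "order-service"
  team        = "commerce"
  environment = "production"

  # Reasonable defaults, overridable
  replicas = 3
  cpu      = "500m"
  memory   = "512Mi"

  # Auto-configured by the module:
  # - Kubernetes deployment
  # - Service mesh integration
  # - Monitoring dashboards
  # - Alerts
  # - Network policies
}
```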
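The "reasonable defaults, overridable" contract can be expressed in any typed language. Below is a hypothetical Python model of the module's inputs, not the Terraform module itself; the allowed environment names and the two-replica floor for production are assumed guardrails added for illustration.

```python
from dataclasses import dataclass

ALLOWED_ENVIRONMENTS = {"dev", "staging", "production"}  # assumed names


@dataclass(frozen=True)
class StandardService:
    """Hypothetical Python model of the module's inputs."""
    # Required: what the service is and who owns it.
    name: str
    team: str
    environment: str
    # Reasonable defaults, overridable (same values as the module call above).
    replicas: int = 3
    cpu: str = "500m"
    memory: str = "512Mi"

    def __post_init__(self) -> None:
        # Reject bad input up front, much as a Terraform variable
        # validation block would at plan time.
        if self.environment not in ALLOWED_ENVIRONMENTS:
            raise ValueError(f"unknown environment: {self.environment}")
        # Assumed guardrail: production never runs a single replica.
        if self.environment == "production" and self.replicas < 2:
            raise ValueError("production services need at least 2 replicas")


svc = StandardService("order-service", "commerce", "production")
dev = StandardService("order-service", "commerce", "dev", replicas=1)
print(svc.replicas, dev.replicas)  # 3 1
```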

Crossplane Example

```yaml
# Developer requests database
apiVersion: database.platform.io/v1
kind: PostgresInstance
metadata:
  name: orders-db
  namespace: commerce
spec:
  size: medium
  backups: true
# Platform provisions automatically:
# - RDS instance
# - Security groups
# - IAM roles
# - Secrets in Vault
# - Connection pooler
```
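Behind the claim, the platform expands a small, developer-facing spec into many concrete resources, roughly what a Crossplane Composition does. This Python sketch only models that mapping: the resource kinds, size table and Vault path layout are illustrative assumptions, not Crossplane or AWS provider APIs. The instance classes are real RDS class names, but which one backs "medium" is an assumption.

```python
# Hypothetical expansion of a PostgresInstance claim into concrete resources.
SIZES = {
    "small": {"instance_class": "db.t3.small", "storage_gb": 20},
    "medium": {"instance_class": "db.m5.large", "storage_gb": 100},
    "large": {"instance_class": "db.r5.xlarge", "storage_gb": 500},
}


def compose_postgres(claim: dict) -> list:
    name = claim["metadata"]["name"]
    namespace = claim["metadata"]["namespace"]
    spec = claim["spec"]
    size = SIZES[spec["size"]]  # KeyError for unknown sizes: only vetted sizes exist
    return [
        {
            "kind": "RDSInstance",
            "name": name,
            "instanceClass": size["instance_class"],
            "allocatedStorageGb": size["storage_gb"],
            "backupRetentionDays": 7 if spec.get("backups") else 0,
        },
        {"kind": "SecurityGroup", "name": f"{name}-sg"},
        {"kind": "IAMRole", "name": f"{name}-role"},
        {"kind": "VaultSecret", "path": f"secret/{namespace}/{name}"},
        {"kind": "ConnectionPooler", "name": f"{name}-pooler"},
    ]


claim = {
    "metadata": {"name": "orders-db", "namespace": "commerce"},
    "spec": {"size": "medium", "backups": True},
}
for resource in compose_postgres(claim):
    print(resource["kind"])
```

Keeping `size` abstract is the design point: the platform team can change what "medium" means without touching any developer's claim.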

Guardrails, Not Gates

```
┌─────────────────────────────────────────────────────────────┐
│                         Guardrails                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Policy Enforcement (OPA/Kyverno):                          │
│  ├── Resource limits required                               │
│  ├── Security context set                                   │
│  ├── Labels present                                         │
│  └── No privileged containers                               │
│                                                             │
│  Cost Controls:                                             │
│  ├── Instance size limits by environment                    │
│  ├── Budget alerts                                          │
│  └── Auto-shutdown for dev environments                     │
│                                                             │
│  Security:                                                  │
│  ├── Automatic vulnerability scanning                       │
│  ├── Secrets from Vault only                                │
│  └── Network policies enforced                              │
│                                                             │
│  All automated, no manual approval needed                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
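Each policy rule above is a predicate over the submitted manifest, evaluated automatically at admission. A minimal Python sketch of three of them, assuming a Pod manifest held as a dict; the required label set is an assumption, and only container-level `securityContext` is inspected to keep it short. In a real cluster these rules would be written as Kyverno policies or OPA Gatekeeper constraints.

```python
# Hypothetical admission check mirroring the policy rules above.
REQUIRED_LABELS = {"team", "app"}  # assumed label names


def violations(pod: dict) -> list:
    """Return human-readable violations; an empty list means admit."""
    problems = []
    labels = pod.get("metadata", {}).get("labels", {})
    missing = REQUIRED_LABELS - set(labels)
    if missing:
        problems.append(f"missing labels: {sorted(missing)}")
    for c in pod.get("spec", {}).get("containers", []):
        limits = c.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            problems.append(f"{c['name']}: cpu and memory limits are required")
        ctx = c.get("securityContext")
        if ctx is None:
            problems.append(f"{c['name']}: securityContext must be set")
        elif ctx.get("privileged", False):
            problems.append(f"{c['name']}: privileged containers are not allowed")
    return problems


pod = {
    "metadata": {"labels": {"team": "commerce", "app": "order-service"}},
    "spec": {"containers": [{
        "name": "app",
        "resources": {"limits": {"cpu": "500m", "memory": "512Mi"}},
        "securityContext": {"privileged": False, "runAsNonRoot": True},
    }]},
}
print(violations(pod))  # []
```

Collecting every violation instead of stopping at the first gives the developer one complete list to fix, which keeps the guardrail from feeling like a gate.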

API Design for Platforms

Platform API Principles

| Principle | Description |
|---|---|
| Declarative | Describe desired state, not steps |
| Versioned | Support multiple versions |
| Self-documenting | OpenAPI specs, examples |
| Idempotent | Safe to retry |
| Observable | Status, events, logs |
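Declarative and idempotent go together: the client sends the whole desired state, and the server compares it with what it already holds, so a retried request changes nothing. A minimal in-memory Python sketch; `PlatformStore` and its `apply` method are hypothetical, not a specific platform's API.

```python
from typing import Any, Dict


class PlatformStore:
    """Hypothetical in-memory backend for a declarative, idempotent apply."""

    def __init__(self) -> None:
        self._objects: Dict[str, Dict[str, Any]] = {}
        self.revision = 0  # increments only when stored state really changes

    def apply(self, name: str, desired: Dict[str, Any]) -> str:
        """Store the desired state; resending an identical request is a no-op,
        so clients can safely retry after timeouts."""
        current = self._objects.get(name)
        if current == desired:
            return "unchanged"
        self._objects[name] = dict(desired)
        self.revision += 1
        return "created" if current is None else "updated"


store = PlatformStore()
spec = {"image": "registry/order-service:v1.2.3", "replicas": 3}
print(store.apply("order-service", spec))  # created
print(store.apply("order-service", spec))  # unchanged
```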

Resource Model

```yaml
# Kubernetes-style API design
apiVersion: platform.company.io/v1
kind: Application
metadata:
  name: order-service
  namespace: commerce
  labels:
    team: commerce
    tier: backend
spec:
  # Desired state
  image: registry/order-service:v1.2.3
  replicas: 3
  resources:
    cpu: 500m
    memory: 512Mi
status:
  # Current state (managed by platform)
  phase: Running
  replicas: 3
  conditions:
    - type: Available
      status: "True"
    - type: Progressing
      status: "False"
```
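The split between `spec` and `status` means the platform, not the developer, writes `status` by comparing desired and observed state. A hypothetical Python version of that logic for the `Application` above: the phase names and condition types come from the example, while the rules that derive them and the `ready`/`updated` inputs are assumptions.

```python
def compute_status(desired_replicas: int, ready: int, updated: int) -> dict:
    """Hypothetical status logic for the Application resource above.

    ready:   replicas currently passing readiness checks
    updated: replicas already running the desired spec (e.g. the new image)
    """
    available = ready >= desired_replicas
    progressing = updated < desired_replicas  # rollout still in flight
    if ready == 0:
        phase = "Pending"
    elif available and not progressing:
        phase = "Running"
    else:
        phase = "Progressing"
    return {
        "phase": phase,
        "replicas": ready,
        # str(True) == "True", matching the quoted condition values in the YAML.
        "conditions": [
            {"type": "Available", "status": str(available)},
            {"type": "Progressing", "status": str(progressing)},
        ],
    }


print(compute_status(3, ready=3, updated=3)["phase"])  # Running
```

`Progressing` here follows the example YAML (False once the rollout is done), which is simpler than a Kubernetes Deployment, where that condition stays True after a successful rollout.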

Platform Team Structure

Team Responsibilities

```
┌─────────────────────────────────────────────────────────────┐
│                        Platform Team                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Core Infrastructure:                                       │
│  ├── Kubernetes clusters                                    │
│  ├── Networking                                             │
│  └── Security baseline                                      │
│                                                             │
│  Developer Platform:                                        │
│  ├── Service templates                                      │
│  ├── CI/CD pipelines                                        │
│  ├── Developer portal                                       │
│  └── Documentation                                          │
│                                                             │
│  Data Platform:                                             │
│  ├── Database provisioning                                  │
│  ├── Data pipelines                                         │
│  └── Analytics infrastructure                               │
│                                                             │
│  Observability:                                             │
│  ├── Logging infrastructure                                 │
│  ├── Metrics and alerting                                   │
│  └── Tracing                                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

Platform Team Size

| Org Size | Platform Team | Ratio (platform : developers) |
|---|---|---|
| 50 developers | 3-5 | ~1:10-17 |
| 200 developers | 10-15 | ~1:13-20 |
| 500 developers | 20-30 | ~1:17-25 |

Success Metrics

| Metric | Target |
|---|---|
| Time to first deploy | < 1 day |
| Service creation time | < 30 minutes |
| Platform adoption | > 90% of services |
| Developer satisfaction | > 4/5 |
| Support tickets | Decreasing trend |
| Deployment frequency | Increasing trend |
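Targets like these are easiest to keep honest when they are checked automatically against collected data. A sketch assuming hypothetical metric field names and made-up sample values; "decreasing trend" and "increasing trend" are interpreted as the sign of a least-squares slope over weekly samples, which is one reasonable reading among several.

```python
def slope(values: list) -> float:
    """Least-squares slope of evenly spaced samples (needs at least 2 values)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def evaluate(m: dict) -> dict:
    """Compare observed metrics (hypothetical field names) with the targets."""
    return {
        "time to first deploy < 1 day": m["first_deploy_hours"] < 24,
        "service creation < 30 min": m["service_creation_minutes"] < 30,
        "adoption > 90%": m["services_on_platform"] / m["total_services"] > 0.9,
        "satisfaction > 4/5": m["satisfaction"] > 4,
        "support tickets decreasing": slope(m["weekly_tickets"]) < 0,
        "deploy frequency increasing": slope(m["weekly_deploys"]) > 0,
    }


observed = {  # made-up sample values
    "first_deploy_hours": 6,
    "service_creation_minutes": 15,
    "services_on_platform": 85,
    "total_services": 100,
    "satisfaction": 4.2,
    "weekly_tickets": [40, 35, 30, 28],
    "weekly_deploys": [10, 12, 15, 18],
}
for target, met in evaluate(observed).items():
    print(f"{'PASS' if met else 'FAIL'}  {target}")
```

A slope over several weeks is less noisy than comparing only the last two data points, at the cost of reacting more slowly to a sudden change.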

Platform Evolution

Maturity Model

```
Level 1: Ad-hoc
├── Manual processes
├── Tribal knowledge
└── Hero culture

Level 2: Standardized
├── Documented processes
├── Some automation
└── Shared tools

Level 3: Self-Service
├── Templates and guardrails
├── Developer portal
└── Automated provisioning

Level 4: Optimized
├── Continuous improvement
├── Metrics-driven
└── Innovation enabled

Level 5: Autonomous
├── AI-assisted operations
├── Predictive scaling
└── Self-healing systems
```

Starting Your Platform Journey

```
Phase 1: Foundation (3-6 months)
├── Define platform vision
├── Identify top pain points
├── Build core team
└── Quick wins (CI/CD, templates)

Phase 2: Core Capabilities (6-12 months)
├── Service catalog
├── Self-service infrastructure
├── Observability stack
└── Developer portal MVP

Phase 3: Scale (12-24 months)
├── Advanced features
├── Multi-cloud/region
├── Platform API ecosystem
└── Community building
```

Interview Quick Reference

Common Questions

  1. “How would you design an internal developer platform?”

    • Start with developer pain points
    • Platform as a product mindset
    • Golden paths for common workflows
    • Self-service with guardrails
    • Measure adoption and satisfaction
  2. “How do you balance standardization vs flexibility?”

    • Paved roads, not rails
    • Golden path covers roughly 80% of use cases
    • The remaining ~20% can go off-road with more effort
    • Off-road teams contribute improvements back to the platform
  3. “How do you measure platform success?”

    • Delivery performance (DORA metrics: deployment frequency, lead time for changes, change failure rate, time to restore service)
    • Time to first deploy
    • Adoption rate
    • Developer satisfaction surveys
    • Support ticket trends

Platform Checklist

  • Developer pain points identified?
  • Platform vision and roadmap?
  • Service templates available?
  • Self-service provisioning?
  • Observability integrated?
  • Documentation centralized?
  • Guardrails (not gates)?
  • Success metrics defined?
  • Feedback loop with developers?

Key Technologies

| Category | Tools |
|---|---|
| Portal | Backstage, Port, Cortex |
| IaC | Terraform, Pulumi, Crossplane |
| CI/CD | GitHub Actions, GitLab CI, ArgoCD |
| Kubernetes | EKS, GKE, AKS |
| Policy | OPA, Kyverno, Sentinel |
| Secrets | Vault, AWS Secrets Manager |
| GitOps | ArgoCD, Flux |