
Platform Engineering Maturity

Platform engineering is the discipline of building and maintaining an Internal Developer Platform (IDP) — a self-service layer that abstracts infrastructure complexity so product teams can deploy, operate, and observe their services without filing tickets or needing deep infrastructure expertise.

The concept is not new. Google had Borg, its internal platform, for over a decade before Kubernetes was born. Netflix built its own internal platform around 2012. What changed is that the rest of the industry now has the tools (Kubernetes, Backstage, Terraform, ArgoCD) and the organizational models (Team Topologies) to build IDPs without being a FAANG-scale company.

The central problem platform engineering solves is cognitive load. As organizations scale from 5 to 50 to 500 engineers, the number of tools, processes, and infrastructure decisions grows faster than any individual can track. Without a platform, every team reinvents the wheel: their own CI/CD pipeline, their own monitoring setup, their own deployment scripts. Platform engineering centralizes these capabilities into a product that internal developers consume.

This page defines a four-level maturity model, then covers each component of a mature IDP: golden paths, self-service infrastructure, Backstage, developer experience metrics, platform team structure, and migration strategies.

Related: Backstage & Developer Portals | Developer Experience


The Platform Maturity Model

Level 0 — Ad Hoc

Characteristics:

  • Infrastructure is provisioned by filing Jira tickets to an ops team
  • Deployment processes vary per team (some use shell scripts, some use CI/CD, some deploy manually)
  • No service catalog — "who owns this service?" requires asking in Slack
  • Monitoring and alerting are set up differently (or not at all) for each service
  • New developers take 2-4 weeks to make their first commit
  • Knowledge lives in people's heads, not in documentation

Metrics (typical):

| Metric | Typical Value |
| --- | --- |
| Lead time for changes | 1-4 weeks |
| Deployment frequency | Monthly or less |
| Time to onboard new developer | 2-4 weeks |
| Time to provision new service | 1-2 weeks (ticket queue) |
| MTTR (Mean Time to Recovery) | Hours to days |

How you know you are here: Engineers spend more time fighting infrastructure than writing business logic. The answer to most "how do I..." questions is "ask Bob."

Level 1 — Standardized

Characteristics:

  • Standard CI/CD pipeline exists (all teams use the same basic structure)
  • Infrastructure as Code (Terraform/Pulumi) for provisioning
  • Documented runbooks for common operations
  • Consistent monitoring stack (Prometheus + Grafana, Datadog, etc.)
  • Service templates exist but are not enforced
  • Some automation, but still requires infrastructure team involvement for many tasks

What changes from Level 0:

  • Documented golden paths (even if not automated)
  • Standard Dockerfile template, Helm chart structure
  • Centralized logging and monitoring
  • Shared CI/CD pipeline with per-repo configuration
  • Service ownership tracked (even if in a spreadsheet)

Level 2 — Self-Service

Characteristics:

  • Internal Developer Portal (Backstage, Port, Cortex) provides a single pane of glass
  • Creating a new service takes minutes, not weeks (scaffolding templates)
  • Infrastructure provisioning is self-service (through platform API, not tickets)
  • Golden paths are automated, and standards are enforced through guardrails rather than manual reviews
  • Developer documentation is discoverable through the portal
  • Cost attribution per team/service is available
  • Most common operations require zero interaction with the platform team

What changes from Level 1:

  • Portal replaces documentation as the primary interface
  • Templates replace manual setup
  • Self-service replaces tickets for infrastructure
  • Guardrails replace reviews for standards enforcement

Level 3 — Optimized

Characteristics:

  • DORA and SPACE metrics are tracked and used for continuous improvement
  • Platform team operates as a product team (user research, backlog, roadmap)
  • Developer satisfaction is measured regularly (quarterly surveys)
  • Platform capabilities are continuously evolved based on data
  • Cost optimization is automated (right-sizing, scale-to-zero)
  • Security and compliance are built into the platform (shift-left)
  • Migration to new standards is platform-assisted (automated codemods, migration tooling)

What changes from Level 2:

  • Metrics drive decisions (not intuition)
  • Platform has its own product management
  • Continuous improvement loop is formalized
  • Developer experience is quantified, not assumed

Internal Developer Platforms (IDPs)

What an IDP Actually Provides

IDP vs PaaS

| Aspect | PaaS (Heroku, Railway) | IDP |
| --- | --- | --- |
| Ownership | Vendor-owned | Organization-owned |
| Customization | Limited to vendor's model | Fully customizable to your org |
| Abstraction level | Opaque (you cannot see the infrastructure) | Transparent (you can see and escape the abstraction) |
| Lock-in | High (vendor-specific APIs) | Low (built on open standards) |
| Cost model | Per-resource pricing | Infrastructure cost + platform team cost |
| Escape hatch | Migrate to different platform | Modify the platform itself |
| Best for | Small teams, startups, simple workloads | 50+ engineers, complex infrastructure, compliance requirements |

IDPs Are Products, Not Projects

A common failure mode in platform engineering is building the IDP as a project (fixed scope, fixed timeline, handed off to maintenance mode) instead of as a product (ongoing user research, backlog prioritization, continuous iteration). A platform that stops evolving stops being used.


Golden Paths

Golden paths (sometimes called "paved roads") are opinionated, pre-built paths through your infrastructure that represent the recommended way to accomplish common tasks. They are not mandatory — teams can go off-path — but the golden path should be so good that going off-path feels like unnecessary effort.

What Golden Paths Look Like

Golden Path: "Create a new backend service"

1. Developer opens Backstage → clicks "Create Component"
2. Selects "Backend Service (Go)" template
3. Fills in: service name, team, description, tier (critical/standard)
4. Platform automatically creates:
   ├── GitHub repo with standard structure
   │   ├── main.go (starter code with health check, metrics, tracing)
   │   ├── Dockerfile (multi-stage, distroless base)
   │   ├── helm/ (Helm chart with sane defaults)
   │   ├── .github/workflows/ (CI pipeline)
   │   ├── catalog-info.yaml (Backstage registration)
   │   └── docs/ (TechDocs skeleton)
   ├── Kubernetes namespace
   ├── ArgoCD Application (GitOps deploy)
   ├── Datadog monitors (golden signals)
   ├── PagerDuty service + escalation policy
   ├── Slack channel (#svc-<name>)
   └── Backstage catalog entry (owner, dependencies, docs)

Time: 3-5 minutes (vs 1-2 weeks without platform)

Designing Effective Golden Paths

Principles:

  1. Opinionated but not mandatory: The golden path works out of the box. Teams can customize or diverge, but they should not need to.

  2. Escape hatches at every layer: If a team needs to customize the Helm chart, they can. If they need a non-standard database, they can request it. The platform does not lock people in.

  3. Incrementally adoptable: Teams can adopt golden paths one component at a time. "Use our CI pipeline but keep your own Helm chart" is a valid intermediate state.

  4. Production-ready by default: A service created from a golden path template should have monitoring, alerting, health checks, structured logging, and security scanning from day one.

Golden Path Coverage Map

| Task | Golden Path (Mature) | Ad Hoc (Immature) |
| --- | --- | --- |
| Create new service | Template in Backstage, 5 min | Clone old repo, customize manually, 1-2 weeks |
| Add a database | Self-service in portal, 10 min | Jira ticket to DBA team, 3-5 days |
| Set up monitoring | Automatic with template | Manual Grafana dashboard creation |
| Deploy to production | git push triggers GitOps pipeline | SSH into server, run deploy script |
| Create a new environment | Self-service, 15 min | Jira ticket to infra team, 1-2 weeks |
| Onboard new developer | Portal + docs, 1-2 days | Shadow senior engineer, 2-4 weeks |
| Investigate production incident | Portal shows service graph, logs, metrics | SSH into multiple servers, grep logs |

Backstage Deep Dive

Backstage (covered in detail on the Backstage & Developer Portals page) is the most widely adopted open-source framework for building an IDP portal. Here we focus on how Backstage fits into platform maturity.

Backstage as a Platform Maturity Accelerator

Backstage Scaffolder Templates

Templates are the golden path automation engine in Backstage.

yaml
# backstage-template.yaml — Create a new Go microservice
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-microservice
  title: Go Microservice
  description: Create a new Go microservice with CI/CD, monitoring, and docs
  tags: ['go', 'microservice', 'recommended']
spec:
  owner: platform-team
  type: service

  parameters:
    - title: Service Details
      required: ['name', 'description', 'owner']
      properties:
        name:
          title: Service Name
          type: string
          pattern: '^[a-z][a-z0-9-]{2,30}$'
          ui:help: 'Lowercase letters, digits, and hyphens; must start with a letter; 3-31 characters'
        description:
          title: Description
          type: string
          maxLength: 200
        owner:
          title: Owner Team
          type: string
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
        tier:
          title: Service Tier
          type: string
          enum: ['critical', 'standard', 'experimental']
          default: 'standard'
          description: 'Critical = 99.99% SLO, Standard = 99.9%, Experimental = best-effort'

    - title: Infrastructure Options
      properties:
        database:
          title: Database
          type: string
          enum: ['none', 'postgresql', 'redis', 'both']
          default: 'none'
        cloudProvider:
          title: Cloud Provider
          type: string
          enum: ['aws', 'gcp']
          default: 'aws'

  steps:
    - id: fetch-template
      name: Fetch Template
      action: fetch:template
      input:
        url: ./template
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          tier: ${{ parameters.tier }}

    - id: create-repo
      name: Create GitHub Repository
      action: publish:github
      input:
        allowedHosts: ['github.com']
        repoUrl: 'github.com?owner=my-org&repo=${{ parameters.name }}'
        defaultBranch: main
        protectDefaultBranch: true
        requireCodeOwnerReviews: true

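    # NOTE: the next three actions (kubernetes:create-namespace,
    # argocd:create-application, datadog:create-monitors) are not part of the
    # core built-in action set. They assume custom or plugin-provided
    # scaffolder actions have been registered in your Backstage backend.
    - id: create-namespace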
      name: Create Kubernetes Namespace
      action: kubernetes:create-namespace
      input:
        namespace: ${{ parameters.name }}
        labels:
          team: ${{ parameters.owner }}
          tier: ${{ parameters.tier }}

    - id: create-argocd-app
      name: Create ArgoCD Application
      action: argocd:create-application
      input:
        appName: ${{ parameters.name }}
        repoUrl: ${{ steps['create-repo'].output.remoteUrl }}
        path: helm/
        destNamespace: ${{ parameters.name }}

    - id: create-monitors
      name: Create Monitoring
      action: datadog:create-monitors
      input:
        serviceName: ${{ parameters.name }}
        tier: ${{ parameters.tier }}
        owner: ${{ parameters.owner }}

    - id: register-catalog
      name: Register in Backstage Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'

  output:
    links:
      - title: Repository
        url: ${{ steps['create-repo'].output.remoteUrl }}
      - title: Open in Backstage
        icon: catalog
        entityRef: ${{ steps['register-catalog'].output.entityRef }}
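
The `pattern` on the `name` parameter is the first guardrail a developer hits, so it is worth checking what it actually accepts. The snippet below is a minimal sketch that reuses the same regular expression outside Backstage; the function name `is_valid_service_name` is hypothetical and not part of any Backstage API.

```python
import re

# Same pattern as the `name` parameter in the template above:
# one lowercase letter, then 2-30 lowercase letters, digits, or hyphens.
SERVICE_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{2,30}$")


def is_valid_service_name(name: str) -> bool:
    """Return True if the name would pass the template's pattern check."""
    return SERVICE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["payment-service", "Payment", "ab", "9lives", "a" * 32]:
        print(f"{candidate!r}: {is_valid_service_name(candidate)}")
```

A check like this can run in CI or a pre-commit hook so that names are rejected before they reach the scaffolder form.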

Backstage Alternatives

| Tool | Type | Strengths | Limitations |
| --- | --- | --- | --- |
| Backstage (Spotify, CNCF) | Open-source framework | Massive plugin ecosystem, fully customizable | Requires dedicated team to build and maintain |
| Port | Commercial SaaS | No-code portal builder, fast setup | Less customizable than Backstage |
| Cortex | Commercial SaaS | Scorecards, service maturity tracking | Focused on catalog, less on scaffolding |
| OpsLevel | Commercial SaaS | Service ownership, maturity rubrics | Less extensible |
| Kratix | Open-source | Kubernetes-native platform API | Narrower scope (infrastructure only) |

Developer Experience Metrics

Measuring platform success requires the right metrics. See the Developer Experience page for the full DORA and SPACE treatment. Here we focus on platform-specific metrics.

The Three Layers of Platform Metrics

Platform metrics fall into three layers:

  • Operational metrics: onboarding time, service creation time, build times (immediate and easy to measure)
  • Platform metrics: adoption rate (% of services on golden paths), self-service rate (% of requests automated), developer NPS (quarterly surveys)
  • Business outcomes: DORA metrics improvement, incident rate trends, developer retention

DORA Metrics for Platform Teams

| DORA Metric | What It Means for Platform | Target (Elite) |
| --- | --- | --- |
| Deployment Frequency | How often product teams deploy (enabled by platform) | Multiple times per day per team |
| Lead Time for Changes | Commit to production (CI/CD pipeline speed) | Less than 1 hour |
| Change Failure Rate | % of deployments causing failures (platform guardrails) | 0-5% |
| Mean Time to Recovery | Incident detection to resolution (observability quality) | Less than 1 hour |
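
To make the four definitions concrete, here is a minimal sketch that derives them from a list of deployment records. The `Deployment` record and its field names are assumptions for illustration; real data would come from your CI/CD and incident tooling, and recovery time is measured here from deploy to restore as a simplification of "detection to resolution".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deployments: list[Deployment], days: int) -> dict:
    """Compute the four DORA metrics over a window of `days` days."""
    if not deployments or days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deployments) / days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    sample = [
        Deployment(t0, t0 + timedelta(minutes=30)),
        Deployment(t0 + timedelta(hours=3), t0 + timedelta(hours=3, minutes=50),
                   failed=True, restored_at=t0 + timedelta(hours=4, minutes=10)),
    ]
    for metric, value in dora_summary(sample, days=1).items():
        print(f"{metric}: {value}")
```

Using the median for lead time keeps one unusually slow change from dominating the figure; a mean would work too if that is how your organization already reports it.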

SPACE Framework for Platform Teams

| Dimension | Example Metric | Measurement |
| --- | --- | --- |
| Satisfaction | Developer NPS | Quarterly survey |
| Performance | Service creation time | Automated measurement |
| Activity | Templates used per month | Backstage analytics |
| Communication | Platform support tickets | Ticket system |
| Efficiency | % of self-service vs ticket requests | Ticket vs portal analytics |
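
Two of these metrics reduce to simple arithmetic. The sketch below computes a Net Promoter Score using the standard definition (share of 9-10 ratings minus share of 0-6 ratings on a 0-10 "would you recommend" question) and a self-service rate. It assumes the survey includes such a 0-10 question; the function names are hypothetical.

```python
def net_promoter_score(ratings: list[int]) -> float:
    """NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("no ratings supplied")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


def self_service_rate(portal_requests: int, ticket_requests: int) -> float:
    """Percentage of infrastructure requests fulfilled without a ticket."""
    total = portal_requests + ticket_requests
    if total == 0:
        raise ValueError("no requests recorded")
    return 100 * portal_requests / total


if __name__ == "__main__":
    print(net_promoter_score([10, 9, 9, 8, 7, 6, 3, 10, 9, 5]))
    print(self_service_rate(portal_requests=92, ticket_requests=8))
```

NPS ranges from -100 to +100, which is why a value such as +42 is read as a score rather than a percentage.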

Developer Surveys

Surveys are the most direct way to measure developer experience. Run them quarterly.

Key survey questions (scored 1-5):

Infrastructure & Deployment:
1. "I can deploy my changes to production confidently and quickly."
2. "Creating a new service or environment is straightforward."
3. "I rarely need to file tickets for infrastructure tasks."

Observability & Debugging:
4. "When something breaks in production, I can find the root cause quickly."
5. "I have the monitoring and alerting I need for my services."

Documentation & Discovery:
6. "I can find documentation for internal tools and services easily."
7. "I know who owns a service and how to contact them."

Developer Tools:
8. "My local development environment works reliably."
9. "CI/CD pipelines are fast enough that they don't block my workflow."

Overall:
10. "The internal developer platform makes me more productive."

Free text:
11. "What is the biggest source of friction in your day-to-day work?"
12. "If you could change one thing about our developer tools, what would it be?"

Track Trends, Not Absolutes

The absolute score on developer satisfaction matters less than the trend. A score of 3.2 that improves to 3.5 quarter-over-quarter means the platform is getting better. A score of 4.0 that drops to 3.7 signals a problem. Always present metrics as trend lines, not snapshots.
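
As a rough illustration of reporting trends instead of snapshots, the sketch below turns per-quarter average scores into quarter-over-quarter deltas. The quarter labels and the `quarter_over_quarter` helper are hypothetical.

```python
def quarter_over_quarter(scores: dict[str, float]) -> list[tuple[str, float, str]]:
    """Return (quarter, delta, direction) for every quarter after the first.

    `scores` maps a quarter label to the average survey score, in
    chronological order (dicts preserve insertion order).
    """
    quarters = list(scores)
    trend = []
    for prev, curr in zip(quarters, quarters[1:]):
        delta = round(scores[curr] - scores[prev], 2)
        if delta > 0:
            direction = "improving"
        elif delta < 0:
            direction = "declining"
        else:
            direction = "flat"
        trend.append((curr, delta, direction))
    return trend


if __name__ == "__main__":
    print(quarter_over_quarter({"Q1": 3.2, "Q2": 3.5}))
    print(quarter_over_quarter({"Q1": 4.0, "Q2": 3.7}))
```

The two sample calls mirror the examples in the text: a lower score that is rising and a higher score that is falling.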


Platform Team Structure

Team Topologies for Platform Engineering

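Team Topologies (by Matthew Skelton and Manuel Pais) provides the organizational model for platform engineering. It defines four team types: stream-aligned, enabling, complicated-subsystem, and platform. The platform team is its own type, distinct from an enabling team: its purpose is to reduce the cognitive load of stream-aligned (product) teams by offering internal services they can consume on a self-service basis, whereas enabling teams coach other teams through capability gaps.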

Platform Team Roles

| Role | Responsibility | Skills |
| --- | --- | --- |
| Platform Product Manager | User research, roadmap, prioritization | Product management, developer empathy |
| Platform Engineer | Build and maintain IDP capabilities | Kubernetes, Terraform, CI/CD, Go/Python |
| Platform SRE | Reliability of the platform itself | Monitoring, incident response, SLOs |
| Developer Advocate (DX) | Documentation, onboarding, support | Technical writing, empathy, training |
| Security Engineer | Embed security into golden paths | AppSec, supply chain security, compliance |

Platform Team Sizing

| Organization Size | Platform Team Size | Ratio |
| --- | --- | --- |
| 20-50 engineers | 2-3 platform engineers | 1:15-20 |
| 50-200 engineers | 5-10 platform engineers | 1:15-25 |
| 200-500 engineers | 10-25 platform engineers | 1:20-30 |
| 500+ engineers | 25+ (multiple platform sub-teams) | 1:20-40 |

The "Platform Team of One" Trap

A single platform engineer cannot build and maintain an IDP alone. They become a bottleneck and a single point of failure — the exact problem the platform was supposed to solve. If you cannot invest at least 2-3 people, focus on standardization (Level 1) rather than attempting self-service (Level 2).


Service Catalogs

The service catalog is the foundation of an IDP. Without knowing what exists, who owns it, and how it connects to other services, everything else (golden paths, monitoring, cost attribution) falls apart.

What a Service Catalog Entry Contains

yaml
# catalog-info.yaml (Backstage format)
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  description: "Handles payment processing, billing, and invoicing"
  annotations:
    github.com/project-slug: my-org/payment-service
    backstage.io/techdocs-ref: dir:.
    pagerduty.com/service-id: P1234567
    datadoghq.com/dashboard-url: https://app.datadoghq.com/dashboard/abc-123
    argocd/app-name: payment-service
  tags:
    - go
    - critical
    - pci-compliant
  links:
    - url: https://wiki.internal/payment-service
      title: Wiki
    - url: https://grafana.internal/d/payment-service
      title: Grafana Dashboard
spec:
  type: service
  lifecycle: production
  owner: team-payments
  system: billing-system
  providesApis:
    - payment-api
  consumesApis:
    - user-api
    - notification-api
  dependsOn:
    - resource:postgresql-payments
    - resource:redis-payments
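
A catalog is only useful if entries are complete, so many teams lint `catalog-info.yaml` in CI. The sketch below checks a parsed entry for the fields Backstage requires on a Component (`metadata.name`, `spec.type`, `spec.lifecycle`, `spec.owner`). It takes a plain dict to stay dependency-free; loading the YAML file is left to whichever parser you use, and `catalog_entry_problems` is a hypothetical helper.

```python
REQUIRED_SPEC_FIELDS = ("type", "lifecycle", "owner")


def catalog_entry_problems(entity: dict) -> list[str]:
    """Return human-readable problems found in a parsed Component entry."""
    problems = []
    if entity.get("kind") != "Component":
        problems.append("kind is not Component")
    if not entity.get("metadata", {}).get("name"):
        problems.append("metadata.name is missing")
    spec = entity.get("spec", {})
    for field in REQUIRED_SPEC_FIELDS:
        if not spec.get(field):
            problems.append(f"spec.{field} is missing")
    return problems


if __name__ == "__main__":
    # Mirrors the payment-service entry above, trimmed to the checked fields.
    entry = {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": "payment-service"},
        "spec": {"type": "service", "lifecycle": "production", "owner": "team-payments"},
    }
    print(catalog_entry_problems(entry) or "entry looks complete")
```

An empty list means the entry passes; anything else can be printed as a CI failure message.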

Service Maturity Scorecards

Scorecards measure how well each service follows organizational standards:

| Category | Check | Weight |
| --- | --- | --- |
| Ownership | Has owner defined in catalog | Required |
| Documentation | TechDocs published and updated in last 90 days | 15% |
| Monitoring | Has SLO defined + alerting configured | 20% |
| Security | Passes vulnerability scan, no critical CVEs | 20% |
| Reliability | Has runbook, has on-call rotation | 15% |
| Testing | >80% code coverage, integration tests in CI | 15% |
| Cost | Has cost attribution labels | 15% |

Service Scorecard:
  payment-service: A (92/100)  ✅
  search-service:  B (78/100)  ⚠️  Missing: runbook, cost labels
  legacy-api:      D (41/100)  ❌  Missing: docs, monitoring, tests, owner unclear
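
The weighting can be sketched in a few lines. This version is a simplification that scores each category as pass or fail, so it will not reproduce partial-credit scores like those shown above. The grade cut-offs and the handling of a missing owner (reported first in the missing list) are assumptions, not something the scorecard table specifies.

```python
# Weights from the scorecard table; they sum to 100.
WEIGHTS = {
    "documentation": 15,
    "monitoring": 20,
    "security": 20,
    "reliability": 15,
    "testing": 15,
    "cost": 15,
}

# Hypothetical grade cut-offs; anything below the last one is a "D".
GRADE_THRESHOLDS = [(90, "A"), (75, "B"), (60, "C")]


def score_service(checks: dict[str, bool], has_owner: bool) -> tuple[int, str, list[str]]:
    """Return (score, grade, missing categories) for one service."""
    score = sum(weight for cat, weight in WEIGHTS.items() if checks.get(cat, False))
    missing = [cat for cat in WEIGHTS if not checks.get(cat, False)]
    if not has_owner:
        missing.insert(0, "owner")  # ownership is required, so list it first
    grade = next((g for cutoff, g in GRADE_THRESHOLDS if score >= cutoff), "D")
    return score, grade, missing


if __name__ == "__main__":
    checks = {cat: True for cat in WEIGHTS}
    checks["reliability"] = False
    checks["cost"] = False
    print(score_service(checks, has_owner=True))
```

Returning the missing categories alongside the grade is what makes the scorecard actionable: teams see what to fix, not only a letter.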

Scaffolding and Templates

Beyond Backstage Templates

Scaffolding goes beyond creating repos. A mature platform provides templates for every common task:

| Template | Output |
| --- | --- |
| New backend service | Repo + CI/CD + namespace + monitors + docs |
| New frontend app | Repo + CI/CD + CDN config + feature flags |
| New API | OpenAPI spec + generated server stub + docs + gateway registration |
| New database | Provisioned DB + secret in Vault + connection string injected |
| New team | Namespace + RBAC + cost center + Slack channel + on-call rotation |
| New environment | Full environment clone with all services + data seeding |

Cookiecutter vs Backstage Scaffolder vs Yeoman

| Tool | Strengths | Limitations |
| --- | --- | --- |
| Backstage Scaffolder | Integrated with catalog, executes actions (create repo, namespace, etc.) | Requires Backstage deployment |
| Cookiecutter | Simple, language-agnostic, widely used | Template-only (no orchestration of infrastructure) |
| Yeoman | Interactive prompts, generators | Requires the Node.js toolchain, less actively maintained |
| copier | Git-based templates with update support | Python ecosystem focus |
| Projen | Generates and manages project config files | AWS CDK focus, opinionated |

Documentation-as-Code

TechDocs in Backstage

TechDocs renders Markdown documentation from the repo and surfaces it in Backstage alongside the service it belongs to. This means developers discover docs when they discover services.

service-repo/
├── docs/
│   ├── index.md          # Overview
│   ├── architecture.md   # Architecture decisions
│   ├── runbook.md         # Operational runbook
│   └── api.md            # API reference
├── mkdocs.yml            # TechDocs config
└── catalog-info.yaml     # Links to TechDocs

yaml
# mkdocs.yml
site_name: Payment Service
nav:
  - Overview: index.md
  - Architecture: architecture.md
  - Runbook: runbook.md
  - API Reference: api.md

plugins:
  - techdocs-core

Documentation Standards

A mature platform enforces documentation standards through scorecards:

| Documentation Type | Required For | Freshness Target |
| --- | --- | --- |
| README | All services | Updated per release |
| Architecture Decision Records (ADRs) | All significant decisions | When decisions are made |
| Runbook | Production services | Reviewed quarterly |
| API documentation | All APIs | Auto-generated from spec |
| Onboarding guide | Per team | Updated quarterly |
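
Freshness targets are easy to check automatically once review dates are recorded. The sketch below flags documents whose last review is older than the target, treating "quarterly" as 90 days. The document type keys and the `stale_docs` helper are hypothetical, and where the review dates come from (git history, front matter, a catalog annotation) is left open.

```python
from datetime import date, timedelta

# Freshness windows in days; 90 days is used as an approximation of "quarterly".
MAX_AGE_DAYS = {"runbook": 90, "onboarding_guide": 90}


def stale_docs(last_reviewed: dict[str, date], today: date) -> list[str]:
    """Return document types that are missing or past their freshness window."""
    stale = []
    for doc_type, max_age in MAX_AGE_DAYS.items():
        reviewed = last_reviewed.get(doc_type)
        if reviewed is None or today - reviewed > timedelta(days=max_age):
            stale.append(doc_type)
    return stale


if __name__ == "__main__":
    reviews = {"runbook": date(2024, 5, 1), "onboarding_guide": date(2024, 1, 15)}
    print(stale_docs(reviews, today=date(2024, 6, 30)))
```

A result like this can feed the documentation category of a service scorecard.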

Cost Attribution

Why Cost Attribution Matters

Without cost attribution, infrastructure spending is an opaque number that grows until someone panics. With per-team, per-service cost attribution, teams own their costs and can make informed trade-offs.

Implementation

yaml
# Kubernetes labels for cost attribution
metadata:
  labels:
    cost-center: "engineering"
    team: "payments"
    service: "payment-service"
    environment: "production"
    tier: "critical"

Cost Attribution Pipeline:

1. All resources labeled with team + service + environment
2. Cloud provider cost data exported (AWS CUR, GCP billing export)
3. Kubernetes cost data from kubecost / OpenCost
4. Costs aggregated by team, service, environment
5. Dashboards in Backstage or Grafana show per-team spend
6. Monthly cost reports sent to team leads
7. Teams with unusual cost growth are flagged for review
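
Steps 4 and 7 of the pipeline can be sketched as follows. The line-item shape (a `cost` plus a `labels` dict) and the 25% growth threshold are assumptions for illustration; real billing exports have their own schemas.

```python
from collections import defaultdict


def spend_by_team(line_items: list[dict]) -> dict[str, float]:
    """Aggregate cost per `team` label; unlabeled spend is grouped separately."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        team = item.get("labels", {}).get("team", "unattributed")
        totals[team] += item["cost"]
    return dict(totals)


def flag_cost_growth(previous: dict[str, float], current: dict[str, float],
                     threshold: float = 0.25) -> list[str]:
    """Return teams whose spend grew by more than `threshold` (0.25 = 25%)."""
    flagged = []
    for team, cost in current.items():
        before = previous.get(team)
        if before and (cost - before) / before > threshold:
            flagged.append(team)
    return sorted(flagged)


if __name__ == "__main__":
    items = [
        {"cost": 1200.0, "labels": {"team": "payments", "service": "payment-service"}},
        {"cost": 300.0, "labels": {"team": "payments", "service": "billing-worker"}},
        {"cost": 500.0, "labels": {"team": "search"}},
        {"cost": 100.0, "labels": {}},  # missing team label
    ]
    this_month = spend_by_team(items)
    print(this_month)
    print(flag_cost_growth({"payments": 1000.0, "search": 480.0}, this_month))
```

Keeping an explicit "unattributed" bucket makes labeling gaps visible instead of silently dropping that spend.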

Tools for cost attribution:

| Tool | Scope | Model |
| --- | --- | --- |
| Kubecost | Kubernetes costs | Open-source + commercial |
| OpenCost | Kubernetes costs | CNCF open-source |
| Infracost | Terraform plan cost estimation | Open-source + commercial |
| CloudHealth | Multi-cloud cost management | Commercial (VMware, now part of Broadcom) |
| Vantage | Multi-cloud cost management | Commercial |

Migration: From Tickets to Self-Service

The Migration Path

Phase 1 — Audit (2-4 weeks):

  • Catalog all existing services (owner, language, deployment method, monitoring)
  • Count tickets by category (infrastructure, access, environment, deploy)
  • Interview developers (what takes the most time? what is the most frustrating?)
  • Map the current developer journey from "I have an idea" to "it is in production"

Phase 2 — Standardize (1-3 months):

  • Choose one CI/CD pipeline, one deployment method, one monitoring stack
  • Create templates for the top 2-3 service types (e.g., Go backend, React frontend)
  • Write runbooks for the top 10 operational tasks
  • Establish service ownership (every service has an owner in a spreadsheet or catalog)

Phase 3 — Automate (3-6 months):

  • Build self-service for the highest-volume ticket categories
  • Automate service creation from templates (Backstage scaffolder or equivalent)
  • Automate environment provisioning (Terraform modules + CI/CD)
  • Automate monitoring setup (included in templates)

Phase 4 — Portal (2-4 months):

  • Deploy Backstage (or alternative)
  • Migrate service catalog into Backstage
  • Surface key integrations (CI/CD status, Kubernetes health, cost, on-call)
  • Train developers, hold office hours

Phase 5 — Iterate (Ongoing):

  • Measure adoption, satisfaction, and operational metrics
  • Run quarterly developer surveys
  • Prioritize platform backlog based on data
  • Deprecate old processes as new paths mature
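
The Phase 1 ticket count is what drives the Phase 3 priorities, and it needs very little tooling. The sketch below ranks ticket categories by volume; the `category` field is an assumption about how your ticket export is structured.

```python
from collections import Counter


def rank_ticket_categories(tickets: list[dict]) -> list[tuple[str, int]]:
    """Return (category, count) pairs, highest volume first."""
    return Counter(ticket["category"] for ticket in tickets).most_common()


if __name__ == "__main__":
    exported = [
        {"id": 1, "category": "infrastructure"},
        {"id": 2, "category": "access"},
        {"id": 3, "category": "infrastructure"},
        {"id": 4, "category": "deploy"},
        {"id": 5, "category": "access"},
        {"id": 6, "category": "infrastructure"},
    ]
    for category, count in rank_ticket_categories(exported):
        print(f"{category}: {count}")
```

The top one or two categories in this ranking are the natural first candidates for self-service automation.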

Common Migration Mistakes

  1. Big bang migration: Forcing all teams to adopt the platform at once. Migrate 2-3 willing teams first, fix problems, then expand.

  2. Building without users: Building platform capabilities nobody asked for. Start with the most painful, most frequent tickets.

  3. Ignoring the developer experience: A self-service portal with a terrible UI is worse than a ticket system. Invest in UX.

  4. No escape hatches: Platforms that force everyone into one box will be circumvented. Allow customization at every layer.

  5. Platform team as gatekeepers: If the platform team reviews and approves every service change, you have not built self-service — you have built a different ticket queue.


Measuring Platform Success

The Platform Success Dashboard

┌──────────────────────────────────────────────────────────┐
│                     Platform Health                      │
├─────────────────────┬────────────────────────────────────┤
│ Adoption            │ 78% of services on golden paths    │
│ Self-Service Rate   │ 92% of requests automated          │
│ Developer NPS       │ +42 (up from +28 last quarter)     │
│ Onboarding Time     │ 1.5 days (was 3 weeks)             │
│ Service Creation    │ 8 minutes (was 2 weeks)            │
│ Deploy Frequency    │ 12 deploys/day/team (was 2/week)   │
│ MTTR                │ 22 minutes (was 4 hours)           │
│ Platform Uptime     │ 99.97% (SLO: 99.9%)                │
│ Cost per Developer  │ $1,200/month (was $1,800)          │
└─────────────────────┴────────────────────────────────────┘

Leading vs Lagging Indicators

| Type | Metric | What It Tells You |
| --- | --- | --- |
| Leading | Template usage trend | Future adoption |
| Leading | Developer survey scores | Future satisfaction |
| Leading | Platform support ticket trend | Future self-service rate |
| Lagging | DORA metrics improvement | Platform impact on delivery |
| Lagging | Developer attrition rate | Long-term developer satisfaction |
| Lagging | Incident count trend | Platform quality impact |

When NOT to Use Platform Engineering

  • Your org has fewer than 20 engineers: The overhead of building and maintaining a platform does not pay off at small scale. Use a PaaS (Heroku, Railway, Render) instead. You need the problems of scale before you need platform engineering.

  • You do not have executive sponsorship: Platform engineering requires sustained investment (18+ months to see full value). Without executive support, the platform team will be deprioritized when product deadlines hit.

  • Your infrastructure is simple and stable: If you have 5 services on a single cloud provider with a straightforward CI/CD pipeline, the current approach may be fine. Platform engineering solves complexity — if you do not have complexity, you do not need it.

  • You want to centralize control, not enable autonomy: Platform engineering is about self-service and developer empowerment. If the goal is to force standardization top-down without developer input, the platform will be resented and circumvented.

  • You are not willing to treat the platform as a product: If the plan is "build it once and hand it off," the platform will decay within a year. Platforms require continuous investment, user research, and iteration.

  • Your teams have fundamentally different needs: If your frontend team uses Vercel, your ML team uses Kubeflow, and your backend team uses raw EC2 instances, a single unified platform may not serve anyone well. Consider targeted automation per team instead.


Key Takeaway

  • Platform engineering is not about tools — it is about reducing cognitive load for product teams by providing self-service, opinionated defaults (golden paths), and a unified developer experience. The tools (Backstage, Terraform, ArgoCD) are means to that end.
  • Progress through maturity levels incrementally: standardize before you automate, automate before you build a portal, measure before you optimize. Skipping levels leads to fragile platforms.
  • The most important metric is developer satisfaction — measured through regular surveys. If developers do not find the platform useful, adoption will remain low regardless of how technically sophisticated it is.

Common Misconceptions

"Platform engineering is just DevOps with a new name." DevOps is a culture and set of practices for breaking down silos between development and operations. Platform engineering is a specific discipline: building a self-service product (the IDP) for internal developers. You can practice DevOps without a platform team, and you can have a platform team that does not follow DevOps principles. They are complementary but distinct.

"We need to build everything from scratch." No. A mature IDP is composed of existing tools (Backstage, Terraform, ArgoCD, Prometheus) glued together with automation and a portal. You build the integration layer and the developer experience, not the underlying tools.

"If we build it, they will come." The number one reason platform initiatives fail is building capabilities nobody asked for. Treat your internal developers as customers. Do user research. Prioritize based on actual pain points, not assumed ones. Run pilots with willing teams before organization-wide rollouts.

"The platform team should control all infrastructure." The platform team enables self-service — it does not gatekeep. If creating a database still requires a ticket to the platform team, you have not achieved self-service. The platform provides guardrails (security, cost limits, compliance), not approval workflows.

"Golden paths mean every team must use the same tools." Golden paths are recommendations, not mandates. They represent the well-supported, easy path. Teams can diverge if they have good reasons, but they lose platform support for the divergent parts. The goal is to make the golden path so good that diverging feels like unnecessary effort.

"DORA metrics measure platform success." DORA metrics measure software delivery performance, which the platform enables but does not solely determine. Platform-specific metrics (adoption rate, self-service rate, developer NPS, onboarding time) are more directly actionable. DORA metrics are a lagging indicator of platform impact.

"We need Backstage specifically." Backstage is the most popular choice, but Port, Cortex, OpsLevel, or even a custom portal built on your internal tools can serve the same purpose. What matters is the capability (unified developer portal with self-service), not the specific tool.


In Production

Spotify built Backstage to manage 2,000+ microservices across hundreds of engineering teams. They open-sourced it after proving it internally, and it is now a CNCF incubating project used by thousands of organizations.

Mercedes-Benz adopted Backstage as their internal developer portal, integrating it with their automotive software development workflows and compliance requirements.

Netflix built its own internal platform (not Backstage-based) that includes service creation, deployment, monitoring, and cost management — all self-service. Their platform engineering team is one of the most mature in the industry.

Airbnb runs an internal developer platform that reduced new service onboarding from weeks to hours. Their golden paths include service templates, CI/CD, monitoring, and automated security scanning.

Zalando uses a platform engineering approach with a centralized Kubernetes platform that serves hundreds of product teams across Europe. They measure platform success through developer satisfaction surveys and DORA metrics.

Adobe runs one of the largest Backstage deployments, with hundreds of custom plugins integrating their internal toolchain for 20,000+ engineers.


Quiz

1. What distinguishes Level 2 (Self-Service) from Level 1 (Standardized)?

At Level 1, standards exist but developers still need to interact with the infrastructure/platform team for many tasks (provisioning, environment creation, database setup). At Level 2, these tasks are automated and accessible through a self-service portal or API. The key shift is from "documented process that requires human involvement" to "automated workflow that developers execute independently."


2. Why should a platform team be structured as a product team, not a project team?

A project team builds something to a fixed spec and hands it off. A product team continuously iterates based on user feedback. Platforms that stop evolving stop being useful — developer needs change, tools evolve, and teams grow. Treating the platform as a product means having a product manager, a roadmap, user research (developer surveys), and an ongoing backlog. Without this, the platform decays within 12-18 months.


3. What is the difference between a golden path and a mandate?

A golden path is an opinionated, well-supported default that developers can follow for the fastest, easiest path to production. A mandate forces all teams to use specific tools or processes. Golden paths provide escape hatches — teams can diverge if they have good reasons, but they lose platform support for non-golden-path components. Mandates create resentment and shadow IT. The goal is to make the golden path so good that diverging is unnecessary, not to forbid it.


4. How do you measure whether a platform engineering initiative is succeeding?

Three tiers of metrics: (1) Operational metrics — onboarding time, service creation time, build times (immediate, easy to measure). (2) Platform metrics — adoption rate (% of services on golden paths), self-service rate (% of requests automated), developer NPS (quarterly surveys). (3) Business outcomes — DORA metrics improvement, incident rate trends, developer retention. Track trends over time, not absolute values.


5. Why is starting with 2-3 pilot teams better than an organization-wide rollout?

Pilot teams provide fast feedback in a low-risk environment. They discover usability issues, missing features, and edge cases before the platform is used by the whole organization. Pilots also create internal champions — developers who can advocate for the platform from personal experience. An organization-wide rollout amplifies every bug and missing feature across all teams simultaneously, creating frustration that can permanently damage the platform's reputation.


Exercise

Build a Platform Maturity Assessment for Your Organization

Assess your current platform maturity and create a 6-month roadmap:

  1. Audit current state (Level 0-3):

    • How long does it take to create a new service? (Time the entire process.)
    • How many tickets per month are filed for infrastructure tasks? (Count them.)
    • Can a new developer make their first commit within 48 hours? (Verify with a recent hire.)
    • Do you have a service catalog? (Check if it is complete and accurate.)
    • What percentage of services have monitoring and alerting configured? (Audit.)
  2. Score each dimension:

    | Dimension | Score (0-3) | Evidence |
    | --- | --- | --- |
    | Service creation | ? | Time to first deploy |
    | CI/CD standardization | ? | % of teams on standard pipeline |
    | Monitoring coverage | ? | % of services with golden signal alerts |
    | Documentation | ? | % of services with up-to-date docs |
    | Self-service infrastructure | ? | % of requests that need no ticket |
    | Developer onboarding | ? | Days to first commit |
    | Cost attribution | ? | Can you tell per-team spend? |
    | Service ownership | ? | Is every service owned and contactable? |
  3. Identify the top 3 pain points from developer interviews or surveys

  4. Create a 6-month roadmap that moves you up one maturity level:

    • Month 1-2: Address top pain point with quick wins
    • Month 3-4: Build the next most impactful capability
    • Month 5-6: Measure impact and plan next phase
  5. Define success metrics for the 6-month period:

    • One operational metric (e.g., onboarding time from 3 weeks to 3 days)
    • One adoption metric (e.g., 50% of services on golden path)
    • One satisfaction metric (e.g., developer NPS > 0)
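
Once the dimensions in step 2 are scored, a small script can summarize them. The sketch below is one possible reading, not a prescribed method: it assumes the weakest dimension caps the overall level, in line with the advice not to skip levels, and reports the average alongside it. The dimension keys and the `assess` helper are hypothetical.

```python
def assess(scores: dict[str, int]) -> dict:
    """Summarize per-dimension maturity scores (each 0-3)."""
    if not scores:
        raise ValueError("no dimensions scored")
    for dimension, score in scores.items():
        if not 0 <= score <= 3:
            raise ValueError(f"{dimension}: score must be between 0 and 3")
    lowest = min(scores.values())
    return {
        "average": round(sum(scores.values()) / len(scores), 2),
        # Assumption: the organization is only as mature as its weakest dimension.
        "overall_level": lowest,
        "weakest": sorted(d for d, s in scores.items() if s == lowest),
    }


if __name__ == "__main__":
    result = assess({
        "service_creation": 1,
        "ci_cd_standardization": 2,
        "monitoring_coverage": 1,
        "documentation": 0,
        "self_service_infrastructure": 0,
        "developer_onboarding": 1,
        "cost_attribution": 0,
        "service_ownership": 1,
    })
    print(result)
```

The `weakest` list is a reasonable starting point for step 3, to be cross-checked against what developers report as their top pain points.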

One-Liner Summary: Platform engineering maturity is the progression from ad-hoc, ticket-driven infrastructure to a self-service internal developer platform with golden paths, service catalogs, and measured developer experience — treating your internal platform as a product with real users, not a one-time project.

Last updated: 2026-04-04

"What I cannot create, I do not understand." — Richard Feynman