How long does it take a new developer to make their first commit at your organization? If the answer is measured in days or weeks rather than hours, you have a platform engineering problem.
As a Lead Engineer managing multiple projects across teams, I've witnessed firsthand how infrastructure complexity crushes developer productivity. Teams spend more time fighting Kubernetes configurations, waiting for cloud resources, and navigating approval workflows than actually writing code. Platform engineering changes this equation fundamentally.
In this guide, I'll walk you through building an Internal Developer Platform (IDP) that abstracts infrastructure complexity, enables developer self-service, and maintains the governance your organization requires, all while reducing cognitive load and accelerating delivery.
What is Platform Engineering?
Platform engineering is the discipline of designing and building self-service capabilities for software engineering organizations. It creates a foundation, an Internal Developer Platform, that enables developers to provision infrastructure, deploy applications, and manage services without deep infrastructure expertise or waiting on operations teams.
Platform Engineering vs DevOps
DevOps brought developers and operations closer together, but it also shifted significant operational burden onto development teams. The "you build it, you run it" philosophy, while valuable, created cognitive overload as developers needed to master Kubernetes, Terraform, CI/CD pipelines, observability stacks, and security policies alongside their actual application code.
Platform engineering addresses this by:
- Abstracting complexity: Developers interact with simplified interfaces, not raw infrastructure
- Encoding best practices: Security, compliance, and operational standards are built into the platform
- Enabling self-service: No more waiting for tickets or approvals for standard operations
- Reducing cognitive load: Developers focus on business logic, not infrastructure details
The Business Case
Commonly reported outcomes make the case:
- Organizations with mature IDPs report 60%+ reduction in developer onboarding time
- Self-service platforms reduce infrastructure provisioning from days to minutes
- Golden paths decrease configuration errors by standardizing deployments
- Platform teams typically serve 10-15 development teams effectively
- Gartner predicts that by 2026, 80% of large software engineering organizations will have established platform engineering teams
Core Principles of Platform Engineering
Before diving into implementation, internalize these principles that separate successful platforms from expensive failures.
1. Treat Your Platform as a Product
Your developers are your customers. The platform exists to make them productive, not to showcase infrastructure sophistication.
What this means in practice:
- Conduct user research with your development teams
- Prioritize features based on developer pain points
- Measure adoption and satisfaction, not just technical metrics
- Iterate based on feedback, not assumptions
- Market the platform internally: if developers don't know it exists, they won't use it
2. Enable Self-Service Without Sacrificing Governance
The tension between developer autonomy and organizational control is at the heart of every IDP conversation. The solution isn't choosing one over the other; it's building guardrails that make the right thing easy and the wrong thing hard.
Self-service with guardrails:
- Developers can provision resources instantly
- All resources automatically comply with security policies
- Cost controls are enforced by default
- Audit trails are captured automatically
- Non-compliant configurations are rejected before deployment
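One way to picture these guardrails is as a validation step that runs before any provisioning call. The sketch below is a minimal illustration only: the `ProvisionRequest` shape, the size cap for non-production environments, and the function names are all assumptions made for this example, not part of any specific platform tool.

```typescript
// Hypothetical request shape; field names are assumptions for illustration.
interface ProvisionRequest {
  team: string;
  resource: "database" | "cache" | "namespace";
  size: "small" | "medium" | "large";
  environment: "development" | "staging" | "production";
  labels: Record<string, string>;
}

interface GuardrailResult {
  allowed: boolean;
  violations: string[];
}

// Assumed cost policy: non-production environments are capped at "medium".
const SIZE_RANK: Record<ProvisionRequest["size"], number> = {
  small: 1,
  medium: 2,
  large: 3,
};

function checkGuardrails(req: ProvisionRequest): GuardrailResult {
  const violations: string[] = [];

  // Cost control: every resource must be attributable to a cost center.
  if (!req.labels["cost-center"]) {
    violations.push("missing 'cost-center' label");
  }

  // Cost control: large instances are only allowed in production.
  if (req.environment !== "production" && SIZE_RANK[req.size] > SIZE_RANK.medium) {
    violations.push(`size '${req.size}' is not allowed in ${req.environment}`);
  }

  return { allowed: violations.length === 0, violations };
}

// Audit trail: one structured record per decision, returned as a JSON string
// so the caller can ship it to whatever log pipeline it uses.
function auditRecord(req: ProvisionRequest, result: GuardrailResult): string {
  return JSON.stringify({
    team: req.team,
    resource: req.resource,
    environment: req.environment,
    allowed: result.allowed,
    violations: result.violations,
  });
}
```

A request that passes is provisioned immediately with no ticket; a request that fails gets the list of violations back, so the developer can fix it without asking anyone.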
3. Start with Golden Paths, Not Golden Cages
Golden paths are opinionated, supported routes to production that encode your organization's best practices. They're recommendations, not requirements.
Golden path characteristics:
- Cover 80%+ of common use cases
- Significantly easier than DIY alternatives
- Well-documented with clear benefits
- Allow escape hatches for legitimate edge cases
- Continuously improved based on usage patterns
4. Measure Developer Experience
You can't improve what you don't measure. Implement a Developer Experience Score (DXS) that tracks:
- Time to first commit for new developers
- Time from code commit to production deployment
- Self-service success rate (requests completed without human intervention)
- Developer satisfaction surveys (quarterly NPS)
- Platform adoption rate across teams
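Two of these metrics reduce to simple arithmetic over request and onboarding records. The following is a rough sketch assuming the platform records, per request, whether it completed and whether a human had to step in; the `PlatformRequest` shape and function names are hypothetical.

```typescript
// Hypothetical record emitted by the platform for each self-service request.
interface PlatformRequest {
  completed: boolean;
  neededHumanHelp: boolean;
}

// Percentage of requests that completed with no human intervention.
function selfServiceSuccessRate(requests: PlatformRequest[]): number {
  if (requests.length === 0) return 0;
  const selfServed = requests.filter(r => r.completed && !r.neededHumanHelp).length;
  return (selfServed / requests.length) * 100;
}

// Mean of a list of durations, e.g. hours from account creation to first commit.
function averageHours(durations: number[]): number {
  if (durations.length === 0) throw new Error("no durations recorded");
  const total = durations.reduce((sum, d) => sum + d, 0);
  return total / durations.length;
}
```

Counting a request as successful only when it both completed and needed no help keeps the metric honest: a request that finished after an engineer intervened is not self-service.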
IDP Architecture: The Platform Layer Cake
A well-architected Internal Developer Platform consists of six interconnected planes, each serving a specific purpose.
+--------------------------------------------------------------------------------+
  DEVELOPER INTERFACE PLANE
  Developer Portal (Backstage) | CLI Tools (kubectl+) | IDE Plugins (VS Code)
+--------------------------------------------------------------------------------+
  PLATFORM ORCHESTRATOR
  Score / Humanitec / Crossplane / Custom Orchestration
  - Workload specifications    - Dynamic configuration
  - Resource dependencies      - Environment management
+--------------------------------------------------------------------------------+
  SECURITY PLANE
  Secrets Mgmt (Vault) | Policy Engine (OPA) | RBAC & IAM | Supply Chain Security
+--------------------------------------------------------------------------------+
  DELIVERY PLANE
  CI/CD Pipelines | GitOps (ArgoCD) | Artifact Registry | Feature Flags
+--------------------------------------------------------------------------------+
  INFRASTRUCTURE PLANE
  Kubernetes Clusters | Databases (RDS/etc) | Message Queues | Cloud Services
+--------------------------------------------------------------------------------+
  OBSERVABILITY PLANE
  Metrics (Prometheus) | Logging (Loki) | Tracing (Jaeger) | Alerts (PagerDuty)
+--------------------------------------------------------------------------------+
1. Developer Interface Plane
This is what developers actually interact with. The interface should meet developers where they are:
Developer Portal (Backstage): A single pane of glass for service catalogs, documentation, and self-service workflows. Backstage, originally from Spotify, has become the de facto standard.
CLI Tools: Command-line interfaces for developers who prefer terminal workflows. Wrap complex operations in simple commands.
IDE Plugins: Bring platform capabilities directly into VS Code or JetBrains IDEs. Reduce context switching.
2. Platform Orchestrator
The brain of your IDP. Platform orchestrators handle the complex logic of translating developer intent into infrastructure reality.
Key capabilities:
- Workload specifications (what the developer wants)
- Resource matching (what infrastructure to provision)
- Dynamic configuration (environment-specific settings)
- Dependency management (ordering and relationships)
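Dependency management is the most algorithmic of these capabilities: the orchestrator has to provision resources in an order that respects their relationships. A minimal sketch of that idea, using a depth-first topological sort, is shown here. The graph format and the `provisioningOrder` name are assumptions for illustration; real orchestrators such as Crossplane or Humanitec handle this internally in their own ways.

```typescript
// Hypothetical resource graph: each key lists the resources it depends on.
type DependencyGraph = Record<string, string[]>;

// Returns an order in which every resource appears after its dependencies.
// Throws on dependency cycles and on references to undeclared resources.
function provisioningOrder(graph: DependencyGraph): string[] {
  const order: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  const visit = (name: string, path: string[]): void => {
    const deps = graph[name];
    if (deps === undefined) {
      throw new Error(`unknown resource '${name}'`);
    }
    if (state[name] === "done") return;
    if (state[name] === "visiting") {
      throw new Error(`dependency cycle: ${[...path, name].join(" -> ")}`);
    }
    state[name] = "visiting";
    for (const dep of deps) {
      visit(dep, [...path, name]);
    }
    state[name] = "done";
    order.push(name);
  };

  for (const name of Object.keys(graph)) {
    visit(name, []);
  }
  return order;
}

// Example: the service needs its database and namespace; the database needs the namespace.
const order = provisioningOrder({
  "my-api": ["my-api-db", "namespace"],
  "my-api-db": ["namespace"],
  namespace: [],
});
// order is ["namespace", "my-api-db", "my-api"]
```

Failing loudly on cycles matters here: a silent partial order would leave a service half-provisioned.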
Popular options:
- Score: Open-source workload specification
- Humanitec: Commercial platform orchestrator
- Crossplane: Kubernetes-native infrastructure abstraction
- Custom solutions: Built on Kubernetes operators
3. Security Plane
Security must be embedded, not bolted on. The security plane ensures every resource provisioned through the platform meets organizational standards.
Components:
- Secrets Management: Vault, AWS Secrets Manager, or similar
- Policy Engine: Open Policy Agent (OPA) for policy-as-code
- RBAC & IAM: Fine-grained access control
- Supply Chain Security: SBOM generation, image signing, vulnerability scanning
4. Delivery Plane
Automates the path from code to production:
- CI/CD Pipelines: GitHub Actions, GitLab CI, Jenkins, or cloud-native options
- GitOps Controllers: ArgoCD or Flux for declarative deployments
- Artifact Registries: Container registries with scanning
- Feature Flags: LaunchDarkly, Unleash, or similar
5. Infrastructure Plane
The underlying compute, storage, and services:
- Kubernetes Clusters: EKS, GKE, AKS, or self-managed
- Managed Databases: RDS, Cloud SQL, managed Redis
- Message Queues: SQS, Pub/Sub, Kafka
- Cloud Services: Integrated via Crossplane or Terraform
6. Observability Plane
You can't manage what you can't see:
- Metrics: Prometheus + Grafana
- Logging: Loki, Elasticsearch, or cloud-native
- Tracing: Jaeger, Tempo, or commercial APM
- Alerting: PagerDuty, Opsgenie integration
Building Your First Golden Path
Let's build a practical golden path for deploying a web service. This covers 80% of what development teams need.
Step 1: Define the Developer Experience
What should the developer do to deploy a new service?
# Ideal developer experience
$ platform create service my-api \
    --template nodejs-api \
    --environment staging
✓ Repository created: github.com/org/my-api
✓ CI/CD pipeline configured
✓ Kubernetes namespace provisioned
✓ Database (PostgreSQL) provisioned
✓ Secrets configured in Vault
✓ Monitoring dashboards created
✓ Service registered in Backstage
Your service is ready!
Dashboard: https://backstage.internal/catalog/my-api
Deployment URL: https://my-api.staging.internal
Step 2: Create the Service Template
Using Backstage scaffolder:
# templates/nodejs-api/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: nodejs-api-template
  title: Node.js API Service
  description: Create a production-ready Node.js API with all infrastructure
  tags:
    - recommended
    - nodejs
    - api
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Information
      required:
        - name
        - description
        - owner
      properties:
        name:
          title: Service Name
          type: string
          description: Unique name for your service
          pattern: '^[a-z0-9-]+$'
        description:
          title: Description
          type: string
        owner:
          title: Owner Team
          type: string
          ui:field: OwnerPicker
          ui:options:
            allowedKinds:
              - Group
    - title: Infrastructure Options
      properties:
        database:
          title: Database
          type: string
          default: postgresql
          enum:
            - postgresql
            - mysql
            - none
        cacheEnabled:
          title: Enable Redis Cache
          type: boolean
          default: false
  steps:
    # Create repository from template
    - id: fetch-base
      name: Fetch Base Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
    # Create GitHub repository
    - id: publish
      name: Create Repository
      action: publish:github
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&owner=org
        description: ${{ parameters.description }}
        defaultBranch: main
        protectDefaultBranch: true
    # Provision infrastructure via Crossplane
    - id: provision-infra
      name: Provision Infrastructure
      action: kubernetes:apply
      input:
        manifest:
          apiVersion: platform.company.io/v1alpha1
          kind: ServiceEnvironment
          metadata:
            name: ${{ parameters.name }}
          spec:
            database:
              type: ${{ parameters.database }}
              size: small
            cache:
              enabled: ${{ parameters.cacheEnabled }}
            namespace:
              create: true
    # Register in catalog
    - id: register
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
  output:
    links:
      - title: Repository
        url: ${{ steps.publish.output.remoteUrl }}
      - title: Service Dashboard
        url: https://backstage.internal/catalog/${{ parameters.name }}
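If you also expose this golden path through a CLI, it can help to check inputs against the same constraints before calling the scaffolder. The sketch below mirrors the `name` pattern, the required fields, and the `database` enum from the template above; the `ServiceParams` type and `validateServiceParams` function are hypothetical helpers, not Backstage APIs.

```typescript
// Constraints copied from the template's parameters section.
const NAME_PATTERN = /^[a-z0-9-]+$/;
const DATABASES: string[] = ["postgresql", "mysql", "none"];

// Hypothetical input type for a CLI wrapper around the template.
interface ServiceParams {
  name: string;
  description: string;
  owner: string;
  database?: string;
}

// Returns a list of human-readable problems; an empty list means the input is valid.
function validateServiceParams(p: ServiceParams): string[] {
  const errors: string[] = [];
  if (!NAME_PATTERN.test(p.name)) {
    errors.push("name must contain only lowercase letters, digits, and hyphens");
  }
  if (p.description.trim() === "") {
    errors.push("description is required");
  }
  if (p.owner.trim() === "") {
    errors.push("owner is required");
  }
  // The template defaults the database to postgresql when none is given.
  const database = p.database === undefined ? "postgresql" : p.database;
  if (DATABASES.indexOf(database) === -1) {
    errors.push(`database must be one of: ${DATABASES.join(", ")}`);
  }
  return errors;
}
```

Keeping the rules in one place (ideally generated from the template itself) avoids the portal and the CLI drifting apart.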
Step 3: Define Infrastructure as Code
Using Crossplane for Kubernetes-native infrastructure:
# crossplane/composite-resource-definition.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: serviceenvironments.platform.company.io
spec:
  group: platform.company.io
  names:
    kind: ServiceEnvironment
    plural: serviceenvironments
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                database:
                  type: object
                  properties:
                    type:
                      type: string
                      enum: [postgresql, mysql, none]
                    size:
                      type: string
                      enum: [small, medium, large]
                cache:
                  type: object
                  properties:
                    enabled:
                      type: boolean
                namespace:
                  type: object
                  properties:
                    create:
                      type: boolean

# crossplane/composition.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: serviceenvironment-aws
spec:
  compositeTypeRef:
    apiVersion: platform.company.io/v1alpha1
    kind: ServiceEnvironment
  resources:
    # Kubernetes Namespace
    - name: namespace
      base:
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object
        spec:
          forProvider:
            manifest:
              apiVersion: v1
              kind: Namespace
              metadata:
                labels:
                  platform.company.io/managed: "true"
      patches:
        - fromFieldPath: metadata.name
          toFieldPath: spec.forProvider.manifest.metadata.name
    # PostgreSQL Database (RDS)
    - name: database
      base:
        apiVersion: database.aws.crossplane.io/v1beta1
        kind: RDSInstance
        spec:
          forProvider:
            region: us-east-1
            dbInstanceClass: db.t3.micro
            engine: postgres
            engineVersion: "15"
            # "admin" is a reserved word for RDS PostgreSQL and is rejected
            masterUsername: dbadmin
            skipFinalSnapshotBeforeDeletion: true
            publiclyAccessible: false
          writeConnectionSecretToRef:
            namespace: crossplane-system
      patches:
        - fromFieldPath: metadata.name
          toFieldPath: metadata.name
          transforms:
            - type: string
              string:
                fmt: "%s-db"
        # The connection secret needs a name as well as a namespace
        - fromFieldPath: metadata.uid
          toFieldPath: spec.writeConnectionSecretToRef.name
          transforms:
            - type: string
              string:
                fmt: "%s-postgresql"
        - fromFieldPath: spec.database.size
          toFieldPath: spec.forProvider.dbInstanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.small
                large: db.t3.medium
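The two transforms on the database resource can be read as plain functions. As a mental model only (Crossplane applies these patches itself, and the types and function below are hypothetical), the string and map transforms behave roughly like this:

```typescript
type DatabaseSize = "small" | "medium" | "large";

// Same mapping as the Composition's map transform.
const SIZE_TO_INSTANCE_CLASS: Record<DatabaseSize, string> = {
  small: "db.t3.micro",
  medium: "db.t3.small",
  large: "db.t3.medium",
};

// Hypothetical, flattened view of the fields the patches read.
interface ServiceEnvironmentInput {
  name: string;
  databaseSize: DatabaseSize;
}

// Hypothetical, flattened view of the fields the patches write.
interface RenderedDatabase {
  name: string;
  dbInstanceClass: string;
}

function renderDatabase(input: ServiceEnvironmentInput): RenderedDatabase {
  return {
    // string transform with fmt "%s-db"
    name: `${input.name}-db`,
    // map transform from abstract size to a concrete instance class
    dbInstanceClass: SIZE_TO_INSTANCE_CLASS[input.databaseSize],
  };
}
```

The point of the map is that developers choose `small`, `medium`, or `large`, and the platform team can change what those mean without touching any service.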
Step 4: CI/CD Pipeline Template
# .github/workflows/deploy.yaml (generated for each service)
name: Build and Deploy
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
env:
  SERVICE_NAME: ${{ github.event.repository.name }}
  REGISTRY: ghcr.io/${{ github.repository_owner }}
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Run security scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'
      - name: Upload scan results
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'
      - name: Build and push container
        if: github.ref == 'refs/heads/main'
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          docker build -t $REGISTRY/$SERVICE_NAME:${{ github.sha }} .
          docker push $REGISTRY/$SERVICE_NAME:${{ github.sha }}
  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Update deployment
        uses: company/platform-deploy-action@v1
        with:
          service: ${{ env.SERVICE_NAME }}
          environment: staging
          image-tag: ${{ github.sha }}
          argocd-server: ${{ secrets.ARGOCD_SERVER }}
          argocd-token: ${{ secrets.ARGOCD_TOKEN }}
  deploy-production:
    needs: deploy-staging
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Update deployment
        uses: company/platform-deploy-action@v1
        with:
          service: ${{ env.SERVICE_NAME }}
          environment: production
          image-tag: ${{ github.sha }}
          argocd-server: ${{ secrets.ARGOCD_SERVER }}
          argocd-token: ${{ secrets.ARGOCD_TOKEN }}
Policy as Code: Guardrails That Scale
Security and compliance can't rely on manual reviews. Implement policy-as-code to enforce standards automatically.
Open Policy Agent (OPA) Example
# policy/kubernetes.rego
package kubernetes.admission

# Deny containers running as root
deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    container.securityContext.runAsUser == 0
    msg := sprintf("Container '%v' must not run as root", [container.name])
}

# Require resource limits
deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    not container.resources.limits.memory
    msg := sprintf("Container '%v' must specify memory limits", [container.name])
}

# Enforce approved registries (denied only if the image matches neither prefix)
deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    not startswith(container.image, "ghcr.io/company/")
    not startswith(container.image, "gcr.io/company/")
    msg := sprintf("Container '%v' uses unapproved registry", [container.name])
}

# Require labels
deny[msg] {
    input.request.kind.kind == "Deployment"
    not input.request.object.metadata.labels["app.kubernetes.io/name"]
    msg := "Deployments must have 'app.kubernetes.io/name' label"
}

# Enforce cost allocation tags
deny[msg] {
    input.request.kind.kind == "Namespace"
    not input.request.object.metadata.labels["cost-center"]
    msg := "Namespaces must have 'cost-center' label for cost allocation"
}
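To make the intent of the three Pod rules concrete, here is roughly the same logic written as an ordinary function. This is only an illustration for reasoning about expected outcomes; OPA is what enforces the policy in the cluster, and the flattened `ContainerSpec` shape is an assumption (real Pod specs nest these fields under `securityContext` and `resources.limits`).

```typescript
// Flattened, hypothetical view of a container; real Pod specs nest these fields.
interface ContainerSpec {
  name: string;
  image: string;
  runAsUser?: number;
  memoryLimit?: string;
}

const APPROVED_REGISTRIES = ["ghcr.io/company/", "gcr.io/company/"];

function podViolations(containers: ContainerSpec[]): string[] {
  const violations: string[] = [];
  for (const c of containers) {
    // Same condition as the Rego rule: only an explicit UID 0 is rejected.
    if (c.runAsUser === 0) {
      violations.push(`Container '${c.name}' must not run as root`);
    }
    if (!c.memoryLimit) {
      violations.push(`Container '${c.name}' must specify memory limits`);
    }
    // Denied only when the image matches none of the approved prefixes.
    const approved = APPROVED_REGISTRIES.some(prefix => c.image.startsWith(prefix));
    if (!approved) {
      violations.push(`Container '${c.name}' uses unapproved registry`);
    }
  }
  return violations;
}
```

Note that a container with no `runAsUser` set passes the root check in both versions, because the rule only matches an explicit value of 0.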
Integrating with Kubernetes
# gatekeeper-config.yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Namespace"
      - group: "apps"
        version: "v1"
        kind: "Deployment"
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-cost-center
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels:
      - key: cost-center
      - key: owner
Building the Developer Portal with Backstage
Backstage serves as the front door to your platform. Here's how to set it up effectively.
Installation and Configuration
# Create a new Backstage app
npx @backstage/create-app@latest
cd my-backstage-app
# Install useful plugins
yarn add --cwd packages/backend @backstage/plugin-kubernetes-backend
yarn add --cwd packages/app @backstage/plugin-kubernetes
yarn add --cwd packages/backend @backstage/plugin-scaffolder-backend
Service Catalog Configuration
# catalog-info.yaml (in each service repo)
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-api
  description: My API service
  annotations:
    github.com/project-slug: company/my-api
    backstage.io/kubernetes-id: my-api
    backstage.io/techdocs-ref: dir:.
  tags:
    - nodejs
    - api
  links:
    - url: https://my-api.staging.internal
      title: Staging
    - url: https://my-api.production.internal
      title: Production
spec:
  type: service
  lifecycle: production
  owner: team-a
  system: ecommerce
  dependsOn:
    - resource:default/my-api-db
  providesApis:
    - my-api
---
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: my-api-db
  description: PostgreSQL database for my-api
spec:
  type: database
  owner: team-a
  system: ecommerce
Custom Actions for Scaffolder
// packages/backend/src/plugins/scaffolder/actions/provision-infrastructure.ts
import { createTemplateAction } from '@backstage/plugin-scaffolder-node';
import { Config } from '@backstage/config';

export const createProvisionInfraAction = (config: Config) => {
  return createTemplateAction<{
    serviceName: string;
    environment: string;
    database: string;
    cacheEnabled: boolean;
  }>({
    id: 'platform:provision-infrastructure',
    description: 'Provisions infrastructure for a new service',
    schema: {
      input: {
        required: ['serviceName', 'environment'],
        type: 'object',
        properties: {
          serviceName: {
            type: 'string',
            title: 'Service Name',
          },
          environment: {
            type: 'string',
            title: 'Environment',
            enum: ['development', 'staging', 'production'],
          },
          database: {
            type: 'string',
            title: 'Database Type',
            enum: ['postgresql', 'mysql', 'none'],
          },
          cacheEnabled: {
            type: 'boolean',
            title: 'Enable Redis Cache',
          },
        },
      },
    },
    async handler(ctx) {
      const { serviceName, environment, database, cacheEnabled } = ctx.input;
      ctx.logger.info(`Provisioning infrastructure for ${serviceName}`);

      // Call your platform orchestrator API
      const response = await fetch(
        `${config.getString('platform.orchestrator.url')}/provision`,
        {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            Authorization: `Bearer ${config.getString('platform.orchestrator.token')}`,
          },
          body: JSON.stringify({
            serviceName,
            environment,
            database,
            cacheEnabled,
          }),
        }
      );
      if (!response.ok) {
        throw new Error(`Failed to provision infrastructure: ${response.statusText}`);
      }

      const result = await response.json();
      // Log only non-sensitive fields; the result includes a database connection string
      ctx.logger.info(`Infrastructure provisioned: ${result.namespaceUrl}`);

      // Output for subsequent steps
      ctx.output('namespaceUrl', result.namespaceUrl);
      ctx.output('databaseConnectionString', result.databaseConnectionString);
    },
  });
};
Measuring Platform Success
Key Metrics Dashboard
Track these metrics to measure platform effectiveness:
# grafana-dashboard.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: platform-metrics-dashboard
  labels:
    grafana_dashboard: "1"
data:
  platform-metrics.json: |
    {
      "title": "Platform Engineering Metrics",
      "panels": [
        {
          "title": "Developer Onboarding Time",
          "type": "stat",
          "targets": [
            {
              "expr": "avg(developer_first_commit_time_hours)"
            }
          ]
        },
        {
          "title": "Self-Service Success Rate",
          "type": "gauge",
          "targets": [
            {
              "expr": "sum(rate(platform_requests_success[24h])) / sum(rate(platform_requests_total[24h])) * 100"
            }
          ]
        },
        {
          "title": "Time to Production",
          "type": "stat",
          "targets": [
            {
              "expr": "avg(deployment_lead_time_minutes)"
            }
          ]
        },
        {
          "title": "Platform Adoption",
          "type": "graph",
          "targets": [
            {
              "expr": "count(kube_namespace_labels{label_platform_company_io_managed='true'})"
            }
          ]
        }
      ]
    }
Developer Experience Score (DXS)
Implement quarterly surveys with these questions:
- How easy is it to deploy a new service? (1-10)
- How often do you wait for infrastructure provisioning? (Never/Sometimes/Often)
- How clear is the platform documentation? (1-10)
- Would you recommend the platform to other teams? (NPS)
- What's the biggest pain point in your development workflow? (Open text)
Calculate DXS as a weighted average, with deployment ease and wait time weighted higher.
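As a sketch of what that weighted average could look like: the version below normalizes each answer to a 0-100 scale and then applies weights. The specific weights (0.35 each for deployment ease and wait time, 0.15 each for documentation and recommendation), the mapping of Never/Sometimes/Often to 100/50/0, and treating the recommendation question as a 0-10 rating per response rather than a true NPS calculation are all assumptions you should tune for your organization.

```typescript
type WaitFrequency = "Never" | "Sometimes" | "Often";

// One survey response; the open-text question is left out of the score.
interface SurveyResponse {
  deployEase: number;          // 1-10
  waitFrequency: WaitFrequency;
  docsClarity: number;         // 1-10
  recommend: number;           // 0-10
}

// Assumed weights; deployment ease and wait time count for more. They sum to 1.
const WEIGHTS = { deployEase: 0.35, wait: 0.35, docsClarity: 0.15, recommend: 0.15 };

// Assumed mapping: less waiting means a higher score.
const WAIT_SCORE: Record<WaitFrequency, number> = { Never: 100, Sometimes: 50, Often: 0 };

// Map a 1-10 rating onto 0-100.
function scaleRating(value: number): number {
  return ((value - 1) / 9) * 100;
}

function responseScore(r: SurveyResponse): number {
  return (
    WEIGHTS.deployEase * scaleRating(r.deployEase) +
    WEIGHTS.wait * WAIT_SCORE[r.waitFrequency] +
    WEIGHTS.docsClarity * scaleRating(r.docsClarity) +
    WEIGHTS.recommend * r.recommend * 10
  );
}

// DXS for a survey round: the mean of the per-response scores (0-100).
function developerExperienceScore(responses: SurveyResponse[]): number {
  if (responses.length === 0) throw new Error("no survey responses");
  const total = responses.reduce((sum, r) => sum + responseScore(r), 0);
  return total / responses.length;
}
```

Whatever weights you choose, keep them fixed between quarters so that changes in the score reflect changes in developer experience rather than changes in the formula.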
Common Pitfalls and How to Avoid Them
1. Building for Power Users First
Mistake: Designing the platform for your most sophisticated engineers.
Solution: Start with the 80% case. Build for the developer who just wants to deploy a standard web service. Add complexity only when clearly needed.
2. Mandatory Adoption
Mistake: Forcing teams to use the platform through mandates.
Solution: Make the platform so good that teams choose it voluntarily. If they don't, you have a product problem.
3. Ignoring Existing Workflows
Mistake: Requiring developers to completely change how they work.
Solution: Meet developers where they are. Support their existing tools (Git, VS Code, CLI) and gradually introduce new capabilities.
4. Over-Engineering the MVP
Mistake: Trying to build the perfect platform before launching.
Solution: Start with a Thinnest Viable Platform (TVP). Solve one painful problem well before expanding scope.
5. No Escape Hatches
Mistake: Forcing all use cases through standardized paths.
Solution: Provide escape hatches for legitimate edge cases. Document when and why to use them.
Getting Started: Your 90-Day Plan
Days 1-30: Foundation
- Interview developers: Identify top 3 pain points
- Choose your first golden path: Pick the most common service type
- Set up Backstage: Basic installation with service catalog
- Implement one self-service workflow: Service creation from template
Days 31-60: Expansion
- Add CI/CD integration: Automated pipelines for golden path services
- Implement basic policies: Security guardrails via OPA
- Set up observability: Basic dashboards for platform metrics
- Onboard pilot team: Get real feedback from early adopters
Days 61-90: Scale
- Add second golden path: Based on pilot team feedback
- Implement cost visibility: Show infrastructure costs per service
- Create documentation: Golden path guides and troubleshooting
- Expand adoption: Onboard 2-3 additional teams
Conclusion
Platform engineering isn't about building the most sophisticated infrastructure; it's about removing friction from developer workflows. The best platforms are invisible; developers simply get their work done without thinking about the underlying complexity.
Start small, measure everything, and treat your developers as customers. The organizations that master platform engineering will have a significant competitive advantage: their developers will spend more time building products and less time fighting infrastructure.
The tools are mature. The patterns are proven. The only question is whether you'll be the one who builds the platform your organization needs.
Resources and Further Reading
Community Resources
- Platform Engineering Community
- CNCF Platform Engineering Whitepaper
- Internal Developer Platform Reference Architectures
Books
- "Team Topologies" by Matthew Skelton & Manuel Pais
- "Platform Strategy" by Gregor Hohpe
- "Building Evolutionary Architectures" by Neal Ford et al.
Frequently Asked Questions
How many engineers do I need for a platform team? Start with 2-3 engineers. A mature platform team typically serves 10-15 development teams, so plan for roughly 1 platform engineer per 5 dev teams.
Should we build or buy? Start with open-source tools (Backstage, Crossplane, ArgoCD) and build integrations. Consider commercial platform orchestrators only when you hit scale limitations.
How do we handle legacy applications? Don't force legacy apps onto the platform. Focus on new services and migrations that teams choose voluntarily.
What if developers resist the platform? Resistance usually means the platform doesn't solve real problems. Go back to user research and fix the product.
How do we measure ROI? Track developer hours saved, infrastructure costs, deployment frequency, and time-to-production. Most organizations see 30-50% improvement in developer productivity within the first year.
Have questions about platform engineering? Connect with me on LinkedIn or explore more at abdulkadersafi.com.