KAI TANAKA
Senior Platform Engineer

I make infrastructure disappear. Kubernetes, GitOps, and the kind of observability that pages you before customers notice.

kai@kaitanaka.dev · Portland, OR · kaitanaka.dev · GitHub: https://github.com/kaitanaka · LinkedIn: https://linkedin.com/in/kaitanaka

ABOUT
─────────────
• Nine years building and operating production Kubernetes clusters serving 50K+ rps across three cloud providers.
• Built the platform that lets 200 engineers deploy to production 80 times per day with zero-downtime rollouts.
• Maintain four CNCF-adjacent open-source tools with a combined 8K GitHub stars.
• Deeply allergic to tickets that say "the deploy broke" — I build the systems that prevent them.

EXPERIENCE
──────────────────
Senior Platform Engineer — Datadog · Remote (Portland, OR)
2022 – Present
Internal platform team building deployment, observability, and developer tools for 1,200 engineers.
• Designed the multi-cluster GitOps deployment system handling 800+ microservices across 4 regions; zero-downtime canary rollouts reduced incident rate by 62%.
• Built the self-service namespace provisioning system that cut team onboarding from 3 weeks to 2 hours.
• Led the migration from Helm to Kustomize + Argo CD across 120 services; reduced config drift incidents to near-zero.
• On-call rotation lead; drove post-incident reviews that reduced MTTR from 45 to 12 minutes over 18 months.
Kubernetes · Argo CD · Terraform · Go · Prometheus

Infrastructure Engineer — Shopify · Ottawa, ON (remote)
2019 – 2022
Core infrastructure team supporting Shopify's multi-region Kubernetes platform.
• Co-architected the Black Friday/Cyber Monday capacity planning system; handled 1.3M rps peak without manual intervention.
• Built the cost attribution pipeline that tagged $42M/year in cloud spend to individual teams; drove a 28% reduction in waste.
• Implemented pod security policies and network policies across 6,000 namespaces; passed SOC 2 Type II audit with zero findings.
• Mentored 5 junior engineers; 3 promoted to mid-level within 18 months.
Kubernetes · GCP · Terraform · Ruby · Prometheus

Site Reliability Engineer — New Relic · Portland, OR
2017 – 2019
SRE for the core ingest pipeline processing 1 TB/hour of telemetry data.
• Reduced the Kafka consumer lag from 45 minutes to under 30 seconds through partition rebalancing and consumer tuning.
• Built the automated runbook system that resolved 40% of pages without human intervention.
• Authored the incident response playbook adopted company-wide.
Kafka · AWS · Ansible · Python · Grafana

CERTIFICATIONS
──────────────────────
Certified Kubernetes Administrator (CKA) — CNCF / Linux Foundation (2023)
Certified Kubernetes Security Specialist (CKS) — CNCF / Linux Foundation (2024)
AWS Solutions Architect — Professional — Amazon Web Services (2022)
HashiCorp Certified: Terraform Associate — HashiCorp (2021)
Google Professional Cloud Architect — Google Cloud (2020)

SKILLS
──────────────
Orchestration: Kubernetes, Argo CD, Flux, Helm, Kustomize, Istio
Infrastructure: Terraform, Pulumi, Crossplane, AWS, GCP, Azure
Observability: Prometheus, Grafana, Datadog, OpenTelemetry, Loki, Tempo
Languages: Go, Python, Rust, Bash, HCL, Rego
Data / Messaging: Kafka, NATS, Redis, Postgres, etcd
Practices: GitOps, SRE, Chaos engineering, Incident response, Capacity planning

OPEN SOURCE
───────────────────
kube-janitor — Creator · Maintainer (2021–)
https://github.com/kaitanaka/kube-janitor
A Kubernetes controller that automatically cleans up stale preview environments and expired resources based on TTL annotations. 3.2K stars.
Tech: Go, Kubernetes, controller-runtime

tf-cost-guard — Creator (2022)
https://github.com/kaitanaka/tf-cost-guard
A Terraform plan analyzer that estimates cost impact and blocks PRs exceeding budget thresholds. Integrates with GitHub Actions and GitLab CI.
Tech: Go, Terraform, GitHub Actions

prom-aggregator — Creator (2023)
https://github.com/kaitanaka/prom-aggregator
A Prometheus federation proxy that pre-aggregates high-cardinality metrics before they hit Thanos. Reduced Thanos query latency by 70% at Datadog.
Tech: Rust, Prometheus

EDUCATION
─────────────────
BSc, Computer Science — Oregon State University (2017)
Summa cum laude. Senior capstone: container orchestration for scientific computing.

LANGUAGES
─────────────────
English — Native · Japanese — Fluent