Experience
Engineering Manager, Online Databases
- Lead LinkedIn’s Online Analytics team, managing a unified multi-engine analytics platform across Pinot, ClickHouse, and related engines — ~13,000 servers and ~10 PB of data.
- Drive the AI-native evolution of the platform, applying AI to onboarding, query understanding, ingestion optimization, troubleshooting, and capacity planning.
- Operate as a Tech Lead Manager for a 15-engineer team, spending 50%+ of time on coding, architecture, design reviews, and platform tradeoff decisions.
- Established technical leadership across core platform areas by developing tech leads and clarifying ownership boundaries.
- Served on LinkedIn’s AI-first hiring council, helping define interview processes and evaluation criteria.
Engineering Manager, Reliability Infra
- Scaled ForgeFire from a nascent stress-testing platform into LinkedIn’s enterprise-wide reliability validation standard — 500+ critical services, ~30% lower change failure rate, ~25% lower MTTR.
- Defined and drove LinkedIn’s performance testing and disaster recovery strategy in partnership with Principal Staff engineers, VPs, and the infra, product, and SRE orgs.
- Led development of LinkedIn’s Disaster Recovery platform, automating failout of unhealthy datacenters and cutting time-to-mitigate by ~70%.
- Introduced AI agents to automate stress-test creation, environment setup, and result analysis — ~80% faster authoring, ~75% faster setup.
- Grew the team from 3 to 7 engineers and sponsored multiple promotions.
Site Reliability Engineer, Tech Lead
- Led reliability and performance engineering for LinkedIn Company Pages, redesigning a monolith into microservices and scaling to ~1M QPS serving 200M+ pages — ~10% lower P99 latency, ~40% less GC pause time.
- Established a reliability-first operating model: SLOs/SLAs, error budgets, automated monitoring, graceful degradation, capacity planning — ~50% less unplanned downtime.
- Introduced disaster recovery and chaos engineering as team standards.
- Designed and rolled out Investigator, a triage platform adopted by 4,000+ engineers and TSMs, cutting manual debugging toil by ~80%.
Software Engineer, Network Automation
- Implemented zero-touch provisioning for Juniper and Cisco devices, reducing deployment from hours to minutes.
- Built network configuration automation across a global fleet, eliminating recurring configuration drift.
- Migrated 3,000+ devices to enterprise observability platforms (PRTG, Grafana, PagerDuty) with automated discovery and alerting.
Education
University of Colorado, Boulder
Master's — Network Engineering · 2016 – 2018
Pune Institute of Computer Technology
Bachelor's — Computer Science (IT) · 2009 – 2013
Skills
Leadership & Execution: Team Building, Hiring, Mentorship, Technical Strategy, Roadmap & OKR Planning, Stakeholder Management
Systems & Platform Engineering: Distributed Systems, Microservices, Platform Engineering, Control Planes, Kubernetes, AWS, GCP
Reliability & Operations: SLO/SLA/SLI, Capacity Planning, Incident Management, Disaster Recovery, Chaos Engineering, Observability
AI, Data & Observability: Python, LangChain, LangGraph, OpenAI Agents SDK, RAG, LLM Evaluation, Guardrails, Claude Code