Experience

Engineering Manager, Online Databases

LinkedIn · 2025 – Present

Lead LinkedIn’s Online Analytics team, managing a unified multi-engine analytics platform across Pinot, ClickHouse, and related engines — ~13,000 servers and ~10 PB of data.
Drive the AI-native evolution of the platform, applying AI to onboarding, query understanding, ingestion optimization, troubleshooting, and capacity planning.
Operate as a Tech Lead Manager for a 15-engineer team, spending 50%+ of time on coding, architecture, design reviews, and platform tradeoff decisions.
Established technical leadership across core platform areas by developing tech leads and clarifying ownership boundaries.
Served on LinkedIn’s AI-first hiring council, helping define interview processes and evaluation criteria.

LinkedIn · 2022 – 2024

Scaled ForgeFire from a nascent stress-testing platform into LinkedIn’s enterprise-wide reliability validation standard — 500+ critical services, ~30% lower change failure rate, ~25% lower MTTR.
Defined and drove LinkedIn’s performance testing and disaster recovery strategy with Principal Staff engineers, VPs, infra, product, and SRE orgs.
Led development of LinkedIn’s Disaster Recovery platform, automating failout of unhealthy datacenters and cutting time-to-mitigate by ~70%.
Introduced AI agents to automate stress-test creation, environment setup, and result analysis — ~80% faster authoring, ~75% faster setup.
Grew the team from 3 to 7 engineers and sponsored multiple promotions.

LinkedIn · 2018 – 2022

Led reliability and performance engineering for LinkedIn Company Pages, redesigning a monolith into microservices and scaling to ~1M QPS serving 200M+ pages — ~10% lower P99 latency, ~40% less GC pause time.
Established a reliability-first operating model: SLOs/SLAs, error budgets, automated monitoring, graceful degradation, capacity planning — ~50% less unplanned downtime.
Introduced disaster recovery and chaos engineering as team standards.
Designed and rolled out Investigator, a triage platform adopted by 4,000+ engineers and TSMs, cutting manual debugging toil by ~80%.

Rakuten · 2013 – 2016

Implemented zero-touch provisioning for Juniper and Cisco devices, reducing deployment from hours to minutes.
Built network configuration automation across a global fleet, eliminating recurring configuration drift.
Migrated 3,000+ devices to enterprise observability platforms (PRTG, Grafana, PagerDuty) with automated discovery and alerting.

Education

University of Colorado, Boulder
Master's — Network Engineering · 2016 – 2018

Pune Institute of Computer Technology
Bachelor's — Computer Science (IT) · 2009 – 2013

Skills

Leadership & Execution: Team Building, Hiring, Mentorship, Technical Strategy, Roadmap & OKR Planning, Stakeholder Management

Systems & Platform Engineering: Distributed Systems, Microservices, Platform Engineering, Control Planes, Kubernetes, AWS, GCP

Reliability & Operations: SLO/SLA/SLI, Capacity Planning, Incident Management, Disaster Recovery, Chaos Engineering, Observability

AI, Data & Observability: Python, LangChain, LangGraph, OpenAI Agents SDK, RAG, LLM Evaluation, Guardrails, Claude Code