Amar Chaudhari

Engineering Manager · LinkedIn

Making AI agents reliable enough to trust in production.

I've spent a decade making large-scale systems dependable: reliability platforms, disaster recovery, and chaos engineering across hundreds of critical services. I'm now focused on the next frontier, reliability for agentic AI.

Where I'm headed

Agents are powerful but unpredictable. My work is making them trustworthy — applying hard-won reliability practice to a new class of systems.

01

Evaluation & guardrails

Designing evals, SLOs, and guardrails for non-deterministic agents — measuring quality where there is no single correct answer.

02

Failure-mode engineering

Bringing chaos engineering, disaster recovery, and incident practice to agentic workflows so failures are expected, contained, and recoverable.

03

Agents in production

The observability, capacity, and control-plane work that lets agents operate safely at scale — not just in a demo.

Recent writing
