Building AI agents that actually work in production

Agent demos look magical. Agent products look like distributed systems with a probabilistic component. Here is what survives contact with reality.
Tools, not autonomy
The reliable pattern is a constrained tool-use loop, not open-ended autonomy. Give the model a small toolkit and a clear goal.
Observability is the moat
Log every prompt, response, and tool call. Without traces you are debugging blind.
Evals, not vibes
Build an eval set early and run it on every prompt change. Vibes drift; evals do not.



