Design production-grade AI agents with tool calling, agent loops, parallel execution, human-in-the-loop checkpoints, state persistence, and error recovery.
Decide between fine-tuning and RAG with decision frameworks, cost/performance tradeoffs, hybrid approaches, and evaluation metrics like RAGAS and G-Eval.
Build comprehensive LLM observability using LangSmith, Langfuse, span instrumentation, and LLM-as-judge evaluations to catch quality regressions and optimize performance.
Defend against prompt injection: direct vs indirect attacks, input sanitization, system prompt isolation, output validation, sandboxed execution, and rate limiting.