Services · GenAI & LLM Engineering
We build LLM pipelines, RAG architectures, and agentic workflows that survive real enterprise traffic — not PoCs that live in a notebook.
Architecture
Every production LLM deployment at Anagha follows this pattern: isolated retrieval, deterministic prompt construction, model-agnostic serving, and an evaluation loop wired to CI.
Our Approach
We design chunking strategies, embedding pipelines, and re-ranking layers tailored to your document corpus — not boilerplate examples from a tutorial.
Every prompt is version-controlled and tested against a golden dataset. We wire Ragas or Phoenix evals into CI so regressions surface before they reach users.
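A minimal sketch of what a golden-dataset gate can look like, to make the idea concrete. It is a stand-in, not a Ragas or Phoenix integration: the substring check, the `GoldenCase` type, the `PROMPT_TEMPLATE` text, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class GoldenCase:
    """One hypothetical golden-dataset entry."""
    question: str
    must_contain: Tuple[str, ...]  # phrases a correct answer is expected to include


# Hypothetical prompt template; in practice this would live in version control.
PROMPT_TEMPLATE = "Answer using only the context.\nContext: {context}\nQuestion: {question}"


def prompt_version(template: str) -> str:
    """Short content hash so each eval run is tied to an exact prompt text."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def pass_rate(answer_fn: Callable[[str], str], cases: Sequence[GoldenCase]) -> float:
    """Fraction of cases whose answer contains every expected phrase."""
    passed = 0
    for case in cases:
        answer = answer_fn(case.question).lower()
        if all(phrase.lower() in answer for phrase in case.must_contain):
            passed += 1
    return passed / len(cases)


def check_regression(
    answer_fn: Callable[[str], str],
    cases: Sequence[GoldenCase],
    threshold: float = 0.9,  # assumed gate; tune per project
) -> Tuple[bool, float]:
    """Return (ok, rate); a CI job would fail the build when ok is False."""
    rate = pass_rate(answer_fn, cases)
    return rate >= threshold, rate
```

In CI, `answer_fn` would call the real pipeline, and the job would log `prompt_version(PROMPT_TEMPLATE)` next to the pass rate so a drop can be traced to a specific prompt change.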
We build multi-step tool-calling agents with deterministic fallback paths. We instrument every tool call with traces so you can debug a bad run in seconds.
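One way to picture a traced tool call with a deterministic fallback is the sketch below. The `Tracer`, `TraceEvent`, and `call_tool` names are hypothetical, and the in-memory event list stands in for whatever tracing backend a real deployment would use.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class TraceEvent:
    tool: str
    ok: bool
    duration_ms: float
    error: Optional[str] = None


@dataclass
class Tracer:
    """In-memory trace sink; an assumption standing in for a real backend."""
    events: List[TraceEvent] = field(default_factory=list)


def call_tool(
    tracer: Tracer,
    name: str,
    primary: Callable[..., Any],
    fallback: Callable[..., Any],
    *args: Any,
) -> Any:
    """Run the primary tool; on any exception, record it and run the fallback."""
    start = time.perf_counter()
    try:
        result = primary(*args)
    except Exception as exc:
        elapsed = (time.perf_counter() - start) * 1000
        tracer.events.append(TraceEvent(name, False, elapsed, repr(exc)))
        # The fallback is a plain function, so the same input always
        # produces the same output regardless of what the model does.
        return fallback(*args)
    elapsed = (time.perf_counter() - start) * 1000
    tracer.events.append(TraceEvent(name, True, elapsed))
    return result
```

Because every call appends exactly one event, a failed run can be read back in order: which tool ran, how long it took, and which error triggered the fallback.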
We abstract the model behind a provider interface. Swap GPT-4.1 for Claude or Llama 4 with a config change — your orchestration code never changes.
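A provider interface of this kind can be sketched as follows. All names here (`ChatProvider`, `StubProvider`, `REGISTRY`, `load_provider`, the config keys) are hypothetical, and the stub returns a formatted string instead of calling any vendor SDK.

```python
from typing import Callable, Dict, Protocol


class ChatProvider(Protocol):
    """The only surface the orchestration code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class StubProvider:
    """Stand-in adapter; a real adapter would wrap the vendor SDK here."""

    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


# Maps a config value to an adapter factory. Each entry would point at a
# different adapter class in practice; the stub is reused to stay runnable.
REGISTRY: Dict[str, Callable[[str], ChatProvider]] = {
    "openai": StubProvider,
    "anthropic": StubProvider,
    "local": StubProvider,
}


def load_provider(config: Dict[str, str]) -> ChatProvider:
    """Build the provider named in config, e.g. {"provider": "local", "model": "m"}."""
    name = config["provider"]
    if name not in REGISTRY:
        raise ValueError(f"unknown provider: {name!r}")
    return REGISTRY[name](config["model"])


def summarize(provider: ChatProvider, text: str) -> str:
    """Orchestration code: sees only ChatProvider, never a vendor type."""
    return provider.complete(f"Summarize: {text}")
```

Switching models then means editing the config dictionary; `summarize` and anything else written against `ChatProvider` stays untouched.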
What We Solved
Agents spent 8–12 minutes per call searching PDFs and wikis for policy clauses. Hallucinated answers were a compliance liability.
Built a RAG pipeline on LangChain + pgvector over 2.1M policy documents with hybrid BM25 + dense retrieval, re-ranking via Cohere, and a citation layer that pins every answer to a source paragraph.
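For readers unfamiliar with hybrid retrieval, the sketch below shows one common way to merge a BM25 ranking with a dense-vector ranking: reciprocal rank fusion. The fusion method used in this project is not stated above, so treat RRF, the constant `k = 60`, and the paragraph IDs as illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) per list it appears in (rank starts
    at 1), so items ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID to keep the output stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from each retriever for the same query.
bm25_hits = ["p1", "p2", "p3"]
dense_hits = ["p2", "p4", "p1"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

The fused list would then go to the re-ranker, and carrying the paragraph IDs through unchanged is what lets a citation layer point each answer back at its source.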
Tier-1 ops team resolved 300+ daily shipment exceptions manually — weather delays, address mismatches, customs holds — each requiring 3–4 system lookups.
Deployed a LangGraph multi-agent workflow whose tools call the TMS, carrier APIs, and weather data. The agent drafts a resolution, notifies the customer, and updates the ETA, with a human in the loop only on escalations.
Developers writing a proprietary DSL for risk models received poor suggestions from a general-purpose Copilot: it did not know the syntax and suggested the wrong library imports.
Fine-tuned Code Llama 13B on 180K lines of internal DSL using QLoRA. Served it via vLLM on A10G instances and integrated it into VS Code through a custom LSP extension.
Technologies We Deploy
Tell us what you're building — we'll scope it in a 30-minute call.