GenAI & LLM Engineering — Anagha Solutions

Architecture

How we wire LLMs into your systems

Every production LLM deployment at Anagha follows this pattern: isolated retrieval, deterministic prompt construction, model-agnostic serving, and an evaluation loop wired to CI.

Our Approach

Four disciplines we never skip

01

Retrieval-Augmented Generation

We design chunking strategies, embedding pipelines, and re-ranking layers tailored to your document corpus — not boilerplate examples from a tutorial.
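For readers new to chunking, the baseline we tune against looks roughly like the sketch below: fixed-size word windows that overlap, so a clause split across a boundary still appears whole in one chunk. It is an illustrative starting point, not a tailored strategy. The function name `chunk_words` and its default sizes are assumptions made for this example.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word windows; consecutive windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

In practice the window would usually follow document structure (sections, clauses, table rows) rather than a raw word count.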

02

Prompt Engineering & Evals

Every prompt is version-controlled and tested against a golden dataset. We wire Ragas or Phoenix evals into CI so regressions surface before they reach users.
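A minimal sketch of what a golden-dataset gate can look like, assuming a plain substring check in place of a real Ragas or Phoenix metric. The names `GoldenCase`, `PROMPT_TEMPLATE`, `pass_rate`, and `stub_model` are hypothetical. The stub stands in for a model call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical versioned prompt; in a real repo this would live in its own file.
PROMPT_VERSION = "v3"
PROMPT_TEMPLATE = (
    "Answer using only the context.\n"
    "Context: {context}\n"
    "Question: {question}"
)


@dataclass(frozen=True)
class GoldenCase:
    question: str
    context: str
    must_contain: str  # substring a correct answer is expected to include


def build_prompt(case: GoldenCase) -> str:
    """Deterministic prompt construction: same case in, same prompt out."""
    return PROMPT_TEMPLATE.format(context=case.context, question=case.question)


def pass_rate(generate: Callable[[str], str], cases: List[GoldenCase]) -> float:
    """Fraction of golden cases whose answer contains the expected substring."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(
        1 for case in cases
        if case.must_contain.lower() in generate(build_prompt(case)).lower()
    )
    return passed / len(cases)


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call: echoes the Context line of the prompt."""
    for line in prompt.splitlines():
        if line.startswith("Context: "):
            return line[len("Context: "):]
    return ""


GOLDEN_SET = [
    GoldenCase("What is the deductible?", "The deductible is 500 USD.", "500 USD"),
    GoldenCase("Is flood covered?", "Flood damage is excluded.", "excluded"),
]
```

A CI job could then fail the build when `pass_rate(model, GOLDEN_SET)` drops below an agreed threshold.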

03

Agentic Workflows

Multi-step tool-calling agents with deterministic fallback paths. We instrument every tool call with traces so you can debug a bad run in seconds.
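The idea can be sketched in a few lines: wrap each tool call so a failure takes a fixed fallback path, and record one trace entry per call. This sketch uses only the standard library rather than a tracing product. The names `traced_call`, `flaky_carrier_lookup`, and `cached_eta` are hypothetical.

```python
import time
from typing import Any, Callable, Dict, List


def traced_call(
    trace: List[Dict[str, Any]],
    name: str,
    tool: Callable[..., Any],
    fallback: Callable[..., Any],
    *args: Any,
) -> Any:
    """Run `tool`; on any exception run `fallback` instead. Always append a trace entry."""
    start = time.perf_counter()
    try:
        result = tool(*args)
        status = "ok"
    except Exception as exc:  # deterministic fallback path, never a silent failure
        result = fallback(*args)
        status = f"fallback:{type(exc).__name__}"
    trace.append({
        "tool": name,
        "args": args,
        "status": status,
        "elapsed_ms": (time.perf_counter() - start) * 1000,
    })
    return result


def flaky_carrier_lookup(shipment_id: str) -> str:
    """Hypothetical tool that always times out, to exercise the fallback."""
    raise TimeoutError(f"carrier API timed out for {shipment_id}")


def cached_eta(shipment_id: str) -> str:
    """Hypothetical fallback: return a fixed marker instead of guessing an ETA."""
    return "ETA_UNKNOWN"
```

Reading the trace list after a bad run shows which tool failed, with which arguments, and which path the agent took.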

04

Model-Agnostic Serving

We abstract the model behind a provider interface. Swap GPT-4.1 for Claude or Llama 4 with a config change — your orchestration code never changes.
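One way to picture the provider interface is the sketch below: orchestration code depends on a small protocol, and a config value selects the implementation. The names `ChatProvider`, `EchoProvider`, `PROVIDERS`, and `load_provider` are hypothetical. The echo class stands in for real vendor clients so the example runs without network access.

```python
from typing import Callable, Dict, Protocol


class ChatProvider(Protocol):
    """The only surface orchestration code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in for a real vendor client; it just labels and echoes the prompt."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


# Registry keyed by the config value; real entries would construct vendor clients.
PROVIDERS: Dict[str, Callable[[], ChatProvider]] = {
    "provider_a": lambda: EchoProvider("provider_a"),
    "provider_b": lambda: EchoProvider("provider_b"),
}


def load_provider(config: Dict[str, str]) -> ChatProvider:
    """Pick the provider named in config; fail loudly on an unknown name."""
    name = config["provider"]
    if name not in PROVIDERS:
        raise KeyError(f"unknown provider: {name}")
    return PROVIDERS[name]()


def summarize(provider: ChatProvider, text: str) -> str:
    """Orchestration code: identical no matter which provider is configured."""
    return provider.complete(f"Summarize: {text}")
```

Switching `{"provider": "provider_a"}` to `{"provider": "provider_b"}` changes the backend while `summarize` stays untouched.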

What We Solved

Real engagements, measurable outcomes

Insurance · Enterprise Search

RAG over 2M policy documents for a national insurer

The insurer's agents spent 8–12 minutes per call searching PDFs and wikis for policy clauses. Hallucinated answers were a compliance liability.

Built a RAG pipeline on LangChain + pgvector over 2.1M policy documents with hybrid BM25 + dense retrieval, re-ranking via Cohere, and a citation layer that pins every answer to a source paragraph.
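Hybrid retrieval has to merge a keyword ranking with a dense-vector ranking. A common way to do that is Reciprocal Rank Fusion, sketched below. The engagement description does not state which fusion method was used, so treat RRF and the conventional constant `k = 60` as assumptions. The paragraph IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical paragraph IDs returned by the two retrievers for one query.
bm25_hits = ["p1", "p2", "p3"]
dense_hits = ["p2", "p4", "p1"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
# fused == ["p2", "p1", "p4", "p3"]
```

Because the fused list holds paragraph IDs, a re-ranker and a citation layer can both operate on the same identifiers downstream.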

73% Faster agent lookup

0 Compliance citations issued

<180ms P95 retrieval latency

Logistics · Agentic AI

Multi-agent exception handler for a 3PL operator

Tier-1 ops team resolved 300+ daily shipment exceptions manually — weather delays, address mismatches, customs holds — each requiring 3–4 system lookups.

Deployed a LangGraph multi-agent workflow with tools that call the TMS, carrier APIs, and weather data. The agent drafts a resolution, notifies the customer, and updates the ETA, with a human in the loop only on escalations.
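The routing decision at the heart of such a workflow can be sketched in plain Python, without LangGraph. Which exception kinds count as auto-resolvable is an assumption made for this example, as are the names `ShipmentException`, `AUTO_RESOLVABLE`, and `plan_steps`.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical policy: kinds the agent may resolve without a human.
AUTO_RESOLVABLE = {"weather_delay", "address_mismatch"}


@dataclass(frozen=True)
class ShipmentException:
    shipment_id: str
    kind: str  # e.g. "weather_delay", "address_mismatch", "customs_hold"


def plan_steps(exc: ShipmentException) -> List[str]:
    """Return the ordered steps for an exception; escalate anything not allow-listed."""
    if exc.kind not in AUTO_RESOLVABLE:
        return ["escalate_to_human"]
    return ["draft_resolution", "notify_customer", "update_eta"]
```

Keeping the allow-list explicit means unknown exception kinds default to a human instead of an automated guess.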

68% Exceptions auto-resolved

4.1× Ops throughput increase

FinTech · Fine-tuning

Domain-adapted code generation for a financial platform

Developers writing a proprietary DSL for risk models received poor suggestions from general-purpose Copilot: unfamiliar syntax and wrong library imports.

Fine-tuned Code Llama 13B on 180K lines of internal DSL code using QLoRA. Served via vLLM on A10G instances and integrated into VS Code via a custom LSP extension.

40% Faster feature delivery

62% DSL suggestion acceptance rate

Technologies We Deploy

The bench behind the build

GPT-4.1 · Claude Sonnet 4.6 · Llama 4 Scout · Gemini 2.5 Pro
LangChain · LlamaIndex · LangGraph · Semantic Kernel
Pinecone · pgvector · Weaviate · Chroma
OpenAI Embeddings · Cohere Reranker · BGE-M3 · voyage-3
vLLM · TGI · Ollama
Ragas · Arize Phoenix
QLoRA · Axolotl

AI systems that work in production, not just demos.

Ready to ship AI that actually works?
