
AVAE — Agentic Verification & Audit Engine

Enterprise document verification for UK compliance: extract structured data from PDFs, verify against official sources (Companies House, EPC, Land Registry), and keep a full audit trail with human-in-the-loop resolution.

  • 🏛️ UK-first: compliance focus
  • ⚙️ LangGraph: orchestration
  • 👤 HITL: human review

Get oriented

Problem → product flow → outcome. Short walkthrough—not a code review.

TL;DR

  • Problem: Teams need evidence—not fluent guesses—from PDFs and LLMs for compliance decisions.
  • Solution: Structured extraction, verification against authority sources where integrated, full audit trail, and human review on mismatches.
  • Who it’s for: Ops, risk, and compliance-led teams handling high-stakes PDF evidence (UK-first MVP).
  • Differentiator: Verification-first—not “chat with documents.” Compare to registers, log the decision, route exceptions to humans.
  • Ask: If this matches a workflow you own, watch the short overview or get in touch—no need to read every section first.

The Problem

Teams in regulated workflows cannot safely treat “the model read the PDF” as evidence. Generic RAG or summarisation gives fluent answers without deterministic checks against authority sources, traceable decisions, or a clear path when data disagrees. For UK-facing operations, proving consistency with Companies House, EPC, or Land Registry-style references—and showing your working to auditors—is as important as speed. Silent failures and missing audit trails create legal and operational risk.

For compliance & risk

Regulators and auditors ask for proof: what the document claimed, what you checked it against, and who signed off when something didn’t match.

For engineering

RAG can sound confident while ungrounded; the system needs deterministic checks where the schema allows, traceable pipeline stages, and idempotent async work—not a single opaque completion.

The Engineering Solution

The browser and API stay responsive: files go to object storage, heavy work runs in background workers, and every meaningful step can be traced in the database—including when a human must decide. Under the hood that maps to concrete services and queues, not a single monolithic script.

AVAE is a verification-first pipeline: ingest documents, run structured extraction and orchestrated reasoning, compare claims to trusted external checks where integrated, persist results and decisions in a durable audit model, and route mismatches to human review instead of hiding them. The product is built as an MVP with a clear separation between a fast API surface, object storage for files, asynchronous workers for heavy processing, and a Next.js dashboard for outcomes and review.

Diagram D — System architecture. End-to-end path: browser and Next.js UI to FastAPI, file storage and queue, worker tier, PostgreSQL for outcomes and audit data, back to the UI. LangGraph runs in the worker path; Redis supports progress alongside Postgres.

Architecture overview

In plain terms: the browser talks to an API, files land in object storage, and heavy processing runs in background workers so uploads stay fast. Verified outcomes, audit metadata, and graph checkpoints live in PostgreSQL; Redis backs operational state like progress and rate limits.
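The "accept fast, process later" pattern above can be sketched in a few lines. This is an illustrative stdlib-only model, not the production code: a dict stands in for S3, a deque for SQS, and a status dict for Redis/Postgres; all function and key names are hypothetical.

```python
import uuid
from collections import deque

object_store: dict[str, bytes] = {}   # stand-in for S3
job_queue: deque[dict] = deque()      # stand-in for SQS
job_status: dict[str, str] = {}       # stand-in for Redis/Postgres status

def accept_upload(filename: str, content: bytes) -> str:
    """API-side handler: persist the file, enqueue work, return immediately."""
    job_id = str(uuid.uuid4())
    key = f"uploads/{job_id}/{filename}"
    object_store[key] = content                          # durable storage first
    job_queue.append({"job_id": job_id, "s3_key": key})  # then enqueue for workers
    job_status[job_id] = "queued"                        # UI can poll this right away
    return job_id

job_id = accept_upload("contract.pdf", b"%PDF-1.7 ...")
print(job_status[job_id])  # -> queued
```

The point is the ordering: the file is made durable and the job enqueued before the handler returns, so the browser never waits on extraction or LLM calls.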

Technical details (stack & services)

  • Frontend: Next.js (with Clerk authentication) talking to a FastAPI backend.
  • Storage & async: documents in Amazon S3; background processing driven by AWS SQS and workers, with Redis for caching, rate limiting, and task state.
  • Pipeline: LangGraph for multi-step orchestration with Postgres-backed checkpoints (including human-in-the-loop flows); PostgreSQL with pgvector for retrieval where needed; LlamaParse / LlamaIndex-family tooling plus traditional PDF libraries for parsing.
  • Verification hooks: UK-oriented integrations such as Companies House, EPC Open Data, and Land Registry–style checks as configured.
  • Extras: an optional Streamlit UI alongside the main Next.js app for experimentation.

Why async processing matters

Diagram E — Async processing. The API enqueues work to SQS; workers consume messages and write status to Redis and PostgreSQL so the UI stays responsive. Optional DLQ handles poison messages after retries.
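The retry-then-park behaviour described for Diagram E can be modelled without any AWS dependency. A hedged sketch, with a deque standing in for SQS and hypothetical message fields: messages that keep failing are redelivered a bounded number of times, then moved to a dead-letter queue instead of poisoning the main queue.

```python
from collections import deque

MAX_RECEIVES = 3                      # illustrative redrive policy
queue: deque[dict] = deque()
dead_letter_queue: list[dict] = []
results: dict[str, str] = {}

def process(message: dict) -> None:
    if message.get("poison"):
        raise ValueError("unparseable document")
    results[message["job_id"]] = "done"

def drain(queue: deque) -> None:
    while queue:
        msg = queue.popleft()
        try:
            process(msg)
        except Exception:
            msg["receive_count"] = msg.get("receive_count", 0) + 1
            if msg["receive_count"] >= MAX_RECEIVES:
                dead_letter_queue.append(msg)   # park for human inspection
            else:
                queue.append(msg)               # redeliver later

queue.extend([{"job_id": "a1"}, {"job_id": "b2", "poison": True}])
drain(queue)
print(results)                 # healthy message completed
print(len(dead_letter_queue))  # poison message parked after retries
```

In the real system SQS's receive count and redrive policy do this for free; the sketch just shows why the healthy message is never blocked by the poisoned one.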

Interested in this architecture or a similar build?

Short email works—mention regulated PDFs, verification, or LangGraph pipelines. Or watch the overview first, then reach out.

Product UI

Verification workspace: PDF on the left; on the right, extracted fields compared to official company records (e.g. Companies House), with mismatches flagged for human review before approval.

Technical challenges & solutions

Challenge 1:

Building an auditable path from raw PDF to verified fields—without relying on a single black-box summary—while keeping latency and cost manageable for an MVP.

My solution:

I combined structured parsing (LlamaParse / LlamaIndex and fallbacks like PyMuPDF, pdfplumber) with explicit pipeline stages in LangGraph so each step can be logged, retried, and inspected. Checkpoints in PostgreSQL support resumable runs and human-in-the-loop handoffs when the graph detects uncertainty or mismatch.

Takeaway: Every step is inspectable; the graph—not one black-box completion—owns resumability and HITL.
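The stage-plus-checkpoint idea can be shown in miniature. This is a stdlib-only sketch, not the LangGraph/Postgres implementation: stage names and the checkpoint shape are hypothetical, but the mechanism is the same, since every transition is logged and a run resumes from the last completed stage rather than restarting.

```python
import time

STAGES = ["parse", "extract", "verify"]
audit_log: list[dict] = []

def run_stage(name: str, state: dict) -> dict:
    state = {**state, name: f"{name}-output"}   # placeholder for real work
    audit_log.append({"stage": name, "ts": time.time(), "ok": True})
    return state

def run_pipeline(state: dict, checkpoint: dict) -> dict:
    last = checkpoint.get("last_done")
    start = STAGES.index(last) + 1 if last else 0
    for name in STAGES[start:]:
        state = run_stage(name, state)
        checkpoint["last_done"] = name          # durable in the real system
    return state

checkpoint = {"last_done": "parse"}             # simulate resuming mid-run
state = run_pipeline({"doc": "contract.pdf"}, checkpoint)
print([e["stage"] for e in audit_log])          # only the remaining stages ran
```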

Challenge 2:

Connecting document claims to external ground truth (UK registries and datasets) in a way that is configurable and traceable in the audit log.

My solution:

I integrated verification modules against official or open APIs (e.g. Companies House, EPC Open Data, Land Registry–oriented flows per environment keys) so results record what was queried, what matched, and what failed—feeding both automated decisions and review queues.

Takeaway: Ground-truth checks are logged as first-class outcomes, not side comments in a chat transcript.
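What "logged as first-class outcomes" means in practice can be sketched as a pure function: compare extracted fields to an authority record and emit a structured result recording what was queried, what matched, and whether review is needed. Field names and the normalisation rule here are illustrative, not the real Companies House integration.

```python
def normalise(value: str) -> str:
    # Naive normalisation for the sketch: lowercase, collapse whitespace.
    return " ".join(value.lower().split())

def verify_against_registry(extracted: dict, registry: dict, source: str) -> dict:
    outcomes = {}
    for field, claimed in extracted.items():
        official = registry.get(field)
        if official is None:
            outcomes[field] = "not_in_registry"
        elif normalise(claimed) == normalise(official):
            outcomes[field] = "match"
        else:
            outcomes[field] = "mismatch"        # routed to human review
    return {
        "source": source,
        "queried_fields": sorted(extracted),
        "outcomes": outcomes,
        "needs_review": any(v == "mismatch" for v in outcomes.values()),
    }

result = verify_against_registry(
    {"company_name": "ACME Widgets Ltd", "company_number": "01234567"},
    {"company_name": "Acme Widgets Ltd", "company_number": "07654321"},
    source="companies_house",
)
print(result["outcomes"], result["needs_review"])
```

Because the return value names the source and the queried fields, the same object can be persisted as an audit row and used to drive the review queue.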

Challenge 3:

Running CPU- and IO-heavy extraction and LLM steps without blocking uploads or the interactive UI.

My solution:

The FastAPI surface accepts work quickly while SQS-backed workers perform the heavy lifting; Redis supports progress and operational concerns like rate limits. This pattern keeps the web tier responsive and allows scaling workers independently.

Takeaway: Heavy work stays off the request path; the API stays fast while workers scale.
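One of the operational concerns mentioned above is rate limiting outbound calls (e.g. to registry APIs). A token bucket is a common shape for this; the real system keeps such state in Redis, while this stdlib sketch keeps it in-process, and the capacity and refill numbers are illustrative only.

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, refill_per_sec=0.5)
decisions = [bucket.allow() for _ in range(4)]
print(decisions)   # first calls pass; later ones are throttled until refill
```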

Challenge 4:

Storing embeddings and structured outcomes in one place while keeping migrations and vector use maintainable.

My solution:

I used PostgreSQL with SQLAlchemy and Alembic migrations, pgvector where vector search is required, and clear separation between application tables, audit-oriented records, and LangGraph checkpoint storage (including langgraph-checkpoint-postgres).

Takeaway: One Postgres estate for app data, vectors, and graph state—migrations stay coherent.
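The "separated concerns in one database" layout can be illustrated with a toy schema. SQLite stands in for PostgreSQL here, and the tables are hypothetical, not the production Alembic migrations: application rows, audit events, and graph checkpoints live side by side but never share columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Application data
    CREATE TABLE documents (id INTEGER PRIMARY KEY, s3_key TEXT NOT NULL);
    -- Audit-oriented records, keyed back to the document
    CREATE TABLE audit_events (
        id INTEGER PRIMARY KEY,
        document_id INTEGER REFERENCES documents(id),
        stage TEXT NOT NULL,
        outcome TEXT NOT NULL
    );
    -- Graph state, isolated from both of the above
    CREATE TABLE graph_checkpoints (
        thread_id TEXT PRIMARY KEY,
        state_json TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO documents (s3_key) VALUES ('uploads/1/contract.pdf')")
conn.execute(
    "INSERT INTO audit_events (document_id, stage, outcome) "
    "VALUES (1, 'verify', 'mismatch')"
)
rows = conn.execute(
    "SELECT stage, outcome FROM audit_events WHERE document_id = 1"
).fetchall()
print(rows)
```

Keeping the three concerns in one engine but separate tables is what lets a single migration history cover app data, audit rows, and checkpoint storage.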

Tech stack

Layer → choice → why—faster to scan than a flat tag list.

Frontend → Next.js, React, TypeScript, Clerk, TanStack Query → Dashboard, PDF viewing, authenticated sessions, and API-driven UI state.
API → FastAPI, Python, Pydantic → Fast request handling, uploads, job lifecycle, and integration boundary for the worker tier.
Async & workers → Amazon SQS, Celery, Redis → Heavy extraction and LLM work off the request path; progress, limits, and task state.
Orchestration & AI → LangGraph, LangChain, OpenAI, LlamaParse, LlamaIndex → Multi-step pipelines, checkpoints/HITL, parsing, and embeddings-backed retrieval where needed.
Data → PostgreSQL, pgvector, SQLAlchemy, Alembic, LangGraph checkpoint store → Structured outcomes, vectors, migrations, and resumable graph state.
Storage & cloud → Amazon S3, AWS SDK, Docker → Document storage, deployment, and environment parity for local vs cloud.
Full tag list (same stack, collapsed)
Next.js · React · TypeScript · Clerk · TanStack Query · FastAPI · Python · LangGraph · LangChain · OpenAI · LlamaIndex · LlamaParse · PostgreSQL · pgvector · SQLAlchemy · Alembic · Redis · Celery · Amazon S3 · Amazon SQS · AWS · Docker · Streamlit

Roadmap

Honest boundaries: what exists today vs what comes next—no fake ship dates.

Now

  • MVP: end-to-end upload → process → verify → audit → review flows with core UK verification hooks where keys are configured.
  • FastAPI + Next.js + SQS worker path with LangGraph checkpoints and Postgres audit-oriented storage.
  • Short Loom overview linked from the hero on this case study (problem → flow → UI).

Next

  • Deeper observability (metrics, tracing) and stronger automated tests on the graph and workers.
  • Vertical-specific verification packs and optional Loom refresh with industry-specific examples.

Later

  • Enterprise concerns (SSO/SCIM, stronger tenancy models) only where they align with product direction—stated here to set boundaries, not as a date-bound promise.

If you own a regulated PDF or verification workflow…

Insurance, legal, property, or similar high-stakes evidence—email, use the form below, or replay the overview in the hero. A short call is enough to see whether this architecture fits.