When the AI replies "based on internal policy section X," can you later prove that the citation came from the exact version of the policy that was live at the time? Are vector DBs being rebuilt, embeddings drifting, and citation-to-source mappings quietly slipping?
Cited ≠ verified. Do you have a path that cryptographically traces the basis behind an AI's answer back to the source?
This page is for
- Product leads and legal heads at legal-tech firms (case research, contract analysis AI)
- Teams running enterprise knowledge platforms where citation integrity in AI answers has become an operational issue
- Financial compliance functions generating regulatory reports and audit trails through AI
- AI governance leads preparing for the EU AI Act / ISO 42001 by setting up citation-source verification
- Engineering leads who see "same question, different citation" and "post-rebuild citation drift" as an operational risk
How Lemma approaches it
For every source the AI cites in an answer, Lemma generates a ZK proof binding the citation to the docHash of the precise document version it references. A citation stops being a label and becomes a cryptographically bound reference. Source documents themselves never enter the index or the answer; what crosses to the verifier is only the cryptographic fact "this citation traces to paragraph B of policy v3, which was live at answer time."
When the vector DB is rebuilt and the policy is revised, the citation proofs attached to past AI answers remain intact. There's no reconstructing "what was that answer based on" after the fact — you reference the citation proof directly.
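The binding described above can be illustrated with a minimal sketch. This is not Lemma's actual scheme (which uses ZK proofs rather than bare hashes, and whose field names are not public here); it only shows the core idea that a citation carries a hash of the exact cited content, so any later revision is detectable without storing the source text in the record. All names below are hypothetical.

```python
import hashlib

def doc_hash(section_text: str) -> str:
    """Hash the exact bytes of the cited section. Illustrative stand-in
    for a docHash; the real system would bind this inside a ZK proof."""
    return hashlib.sha256(section_text.encode("utf-8")).hexdigest()

def bind_citation(citation_label: str, section_text: str) -> dict:
    """Attach a content hash to a citation at answer time.
    Note: the source text itself is NOT stored in the record."""
    return {"citation": citation_label, "docHash": doc_hash(section_text)}

def verify_citation(record: dict, candidate_text: str) -> bool:
    """Check whether a candidate section matches the version that was cited."""
    return record["docHash"] == doc_hash(candidate_text)

# At answer time: the AI cites policy v3, paragraph B.
v3_text = "Employees must escalate conflicts of interest within 5 days."
record = bind_citation("policy v3 / paragraph B", v3_text)

# Later: the policy is revised and the index rebuilt.
v4_text = "Employees must escalate conflicts of interest within 3 days."

print(verify_citation(record, v3_text))  # True: matches the cited version
print(verify_citation(record, v4_text))  # False: revision detected
```

The key property carries over to the full design: the record attached to a past answer stays meaningful even after the vector DB is rebuilt, because verification depends only on content bytes, not on index state.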
Exactly where the citation-proof layer sits between your answer generation and your citation emission is what we map out in a first conversation.
Lemma Discovery Call — Start with a 30-minute conversation
Tell us how your AI workflow is wired today, what citations get attached to answers, and where citation integrity is hurting most. We'll explore together whether Lemma's citation-proof layer could fit. No answer logs or source documents required.
If we see a fit, we move to NDA and then into sector-specific citation requirement design, reference architecture, and PoC design.
A real-world example: a legal AI citing a pre-revision policy
An internal AI at a legal-tech firm reviews contracts and replies, "Clause 4.2 aligns with internal guideline GL-2025-08, section C." Six months later, after a regulator's reinterpretation, the internal guideline is revised to GL-2026-02. The RAG index has been rebuilt since.
A problem surfaces on one of those contracts, and the team needs to re-examine the past reviews. Which version of which guideline did the AI rely on, in which paragraph? The answer log says "GL-2025-08, section C," but the vector DB has been rebuilt and embeddings have shifted. There is no cryptographic evidence that the paragraph quoted at the time and the same-named paragraph in the current index are the same content.
With Lemma in place, every citation in an AI answer ships with a docHash proof. Any difference between section C of the guideline as it was cited and section C in the current revision is structurally detectable. There is no need to dig through answer logs; you reference the citation proof to cryptographically establish that the answer relied on this exact docHash of section C in GL-2025-08.
Sector-specific citation requirement design, integration patterns with RAG and retrieval frameworks (LangChain, LlamaIndex, etc.), and evidence-trail design for EU AI Act / ISO 42001 are shared in the sector-specific kit we send after the consultation call.
Architecture in concept
Lemma does not replace your AI answer generation (LLM, prompts, RAG retrieval). We add a citation-attestation step between answer generation and citation emission.
When the LLM produces an answer carrying citations, Lemma generates a ZK proof that bundles each citation's docHash, the answer timestamp, and the RAG index version in use. The source documents themselves do not appear in the citation or in the proof. Downstream audit and compliance checks simply reference the citation proof chain to cryptographically reproduce the original mapping between citation and source content.
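To make the bundling step concrete, here is a hedged sketch of what an attestation record could look like. The envelope hash stands in for the ZK proof, and every field name (`citations`, `indexVersion`, `llm`, `envelopeHash`, etc.) is an assumption for illustration, not Lemma's schema:

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def attest_answer(citations, index_version, llm_id, timestamp):
    """Bundle each citation's docHash with the answer timestamp, the RAG
    index version in use, and the LLM identifier into one record.
    Only hashes of the cited sections appear; the source text does not."""
    body = {
        "citations": [
            {"label": label, "docHash": sha256_hex(text.encode("utf-8"))}
            for label, text in citations
        ],
        "indexVersion": index_version,
        "llm": llm_id,
        "timestamp": timestamp,
    }
    # Canonical serialization so any verifier recomputes the same hash.
    envelope = sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))
    return {"body": body, "envelopeHash": envelope}

record = attest_answer(
    citations=[("GL-2025-08 / section C", "Liability caps apply to ...")],
    index_version="rag-index-2025-08-01",
    llm_id="model-x",
    timestamp="2025-08-14T10:22:00Z",
)

# Downstream audit: recompute the envelope hash from the body alone.
recomputed = sha256_hex(json.dumps(record["body"], sort_keys=True).encode("utf-8"))
assert recomputed == record["envelopeHash"]
```

A real ZK proof adds what this sketch cannot: a verifier can check the binding without ever seeing the section text or its plain hash preimage. The sketch only shows which facts get bundled together at attestation time.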
Integration patterns with LLM and RAG frameworks (Anthropic Claude, OpenAI, LangChain, LlamaIndex, etc.), citation-format design (footnotes, sidenotes, JSON metadata), and evidence-trail design for EU AI Act / ISO 42001 are detailed in the whitepaper and the post-call technical kit.
What Lemma cryptographically guarantees
- A cryptographic binding between every citation and the docHash of the referenced document version
- A tamper-evident record of the answer timestamp, the RAG index version in use, and the LLM identifier
- No exposure of source data, with the citation proof chain verifiable by third parties
- The cryptographic identity of past citations, unchanged across document revisions and index rebuilds
Ready to prove?
Talk to us about your use case. We respond within one business day.