Is your enterprise RAG indexing thousands of internal documents every day? At ingestion, are the binding to the source, the issuer's signature, and the revision history all being quietly dropped?
When the AI cites a specific sentence in an answer, can you cryptographically trace it back to the currently valid version of the original document, starting from the indexed chunk?
This page is for:
- Heads and architects of enterprise RAG and knowledge platforms
- AI product owners ingesting internal documents (policies, contracts, SOPs, technical specs) at scale
- Compliance leads in regulated industries (finance, healthcare, legal) where the authenticity of AI-referenced documents matters
- Teams running into "the AI just quoted an outdated policy" or "we're getting answers based on the pre-revision SOP" as an operational issue
- Engineering leads who have DLP and existing knowledge-management platforms but want post-ingestion document chunks anchored to verifiable provenance
How Lemma approaches it
At the moment an internal document enters the RAG pipeline, Lemma encrypts the source with AES-GCM and writes the docHash, content identifier (CID), issuer signature, and active-version metadata into the index. What the retrieval layer matches on is not the raw source but a fact carrying provenance.
A sentence quoted in an answer is cryptographically traceable, via the indexed docHash, to a specific version of a specific source. When the document is revised, a new docHash and issuer signature are attached; references to the prior version are structurally detectable. The AI only touches facts whose provenance can be verified.
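The ingest-time flow above can be sketched in a few lines. This is an illustrative sketch, not Lemma's actual API: every name here (`ingest_chunk`, the record fields, the HMAC stand-in for an issuer signature) is hypothetical, and a production deployment would use an asymmetric scheme such as Ed25519 so that third parties can verify the signature.

```python
import hashlib
import hmac

def ingest_chunk(source_bytes: bytes, chunk_text: str, issuer_key: bytes,
                 version: int) -> dict:
    """Build a provenance-carrying index record for one chunk.

    The raw source never enters the index; only hashes, a signature,
    and active-version metadata travel with the chunk.
    """
    doc_hash = hashlib.sha256(source_bytes).hexdigest()
    # Stand-in for an issuer signature (symmetric HMAC for brevity;
    # real deployments would sign asymmetrically, e.g. Ed25519).
    signature = hmac.new(issuer_key, doc_hash.encode(), hashlib.sha256).hexdigest()
    return {
        "chunk_text": chunk_text,        # what the embedding model sees
        "doc_hash": doc_hash,            # binds the chunk to this source version
        "cid": f"cid-{doc_hash[:16]}",   # illustrative content identifier
        "issuer_signature": signature,
        "version": version,
        "active": True,
    }

record = ingest_chunk(b"Refund policy v3 ...",
                      "Refunds are issued within 14 days.",
                      issuer_key=b"demo-key", version=3)
```

The point of the shape: the retrieval layer matches on `chunk_text`, but the record it returns carries `doc_hash`, `issuer_signature`, and `version` alongside it, so every downstream citation keeps its provenance.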
Where the provenance layer fits into your existing RAG ingest pipeline is what we map out in a first conversation.
Lemma Discovery Call — Start with a 30-minute conversation
Tell us how your RAG pipeline is wired today, which document classes flow through it, and where citation accuracy is hurting most. We'll explore together whether Lemma's provenance layer could fit. No source documents or index internals required.
If we see a fit, we move to NDA and then into document-class design, reference architecture, and PoC design.
A real-world example: an internal AI quoting a stale policy
An internal AI at a financial institution retrieves customer-service SOPs, compliance policies, product specs, and contract templates via RAG to answer the field team's questions. Policies are revised every few months; each department manages them in its own format. The RAG index is rebuilt after each revision, but operationally verifying that prior-version chunks were fully replaced — and that no stale quotes leak through — is hard.
One day a field rep acts on an AI answer, and audit comes back: "That policy was revised three months ago." Tracing which document version backs the chunk the AI quoted means digging across logs and reconstructing index snapshots. There is no cryptographic path that guarantees the authenticity of the citation.
With Lemma in place, each chunk carries its docHash, issuer signature, and active-version metadata from ingestion. A chunk quoted in an AI answer can be cryptographically verified as coming from the version that was live at answer time. Revisions attach a new docHash; references to prior versions are structurally detectable. At audit, the correspondence between AI answers and document versions is presentable without reconstruction.
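The audit check described above, matching a cited chunk against the version that was live at answer time, can be expressed as a simple lookup. The `version_log` schema below is purely illustrative (not Lemma's format): each entry records a docHash and the time window during which it was the active version.

```python
def verify_citation(chunk: dict, version_log: list[dict], answer_time: float) -> bool:
    """Check that a cited chunk came from the version active at answer time.

    version_log entries: {"doc_hash": str, "valid_from": float,
    "valid_to": float or None} -- an illustrative schema.
    """
    for entry in version_log:
        starts_before = entry["valid_from"] <= answer_time
        ends_after = entry["valid_to"] is None or answer_time < entry["valid_to"]
        if starts_before and ends_after:
            # Exactly one version is active at any instant; the cited
            # chunk must carry that version's docHash.
            return entry["doc_hash"] == chunk["doc_hash"]
    return False

log = [
    {"doc_hash": "aaa", "valid_from": 0.0, "valid_to": 100.0},   # superseded
    {"doc_hash": "bbb", "valid_from": 100.0, "valid_to": None},  # current
]
ok = verify_citation({"doc_hash": "bbb"}, log, answer_time=150.0)
stale = verify_citation({"doc_hash": "aaa"}, log, answer_time=150.0)
```

A stale citation fails the check structurally: no log reconstruction or snapshot digging, just a hash comparison against the version window.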
Industry-specific document class design, integration patterns with knowledge platforms (Confluence, Notion, Box, SharePoint, etc.), and regulatory mappings (FSA, FINRA, healthcare privacy) are shared in the sector-specific kit we send after the consultation call.
Architecture in concept
Lemma does not replace your RAG pipeline (vector DB, embedding model, retrieval layer). We add a provenance layer on the ingest and retrieval paths.
Source documents are encrypted with AES-GCM. Only the docHash, CID, issuer signature, and active-version metadata enter the vector DB and search index. Retrieved chunks emerge with their provenance attached, so each citation is cryptographically traceable. Document revisions are handled by issuing a new docHash; references to prior-version chunks are structurally detectable.
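The separation above can be made concrete: the source is sealed with AES-GCM, and only derived metadata enters the index. This sketch assumes the widely used `cryptography` package; `seal_source` and the `index_entry` fields are hypothetical names, not Lemma's interface.

```python
import hashlib
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def seal_source(source_bytes: bytes, key: bytes) -> tuple[bytes, bytes, str]:
    """Encrypt the source with AES-GCM; only the docHash leaves in cleartext."""
    nonce = os.urandom(12)  # 96-bit nonce, must be unique per encryption
    ciphertext = AESGCM(key).encrypt(nonce, source_bytes, None)
    doc_hash = hashlib.sha256(source_bytes).hexdigest()
    return nonce, ciphertext, doc_hash

key = AESGCM.generate_key(bit_length=256)
nonce, ct, doc_hash = seal_source(b"SOP v2: escalation steps ...", key)

# Only metadata like this reaches the vector DB / search index;
# nonce + ciphertext go to encrypted storage instead.
index_entry = {"doc_hash": doc_hash, "cid": f"cid-{doc_hash[:16]}", "version": 2}
```

The design choice this illustrates: the index can prove what it points at (via `doc_hash`) without ever holding the plaintext source.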
Integration patterns with vector DBs (Pinecone, Weaviate, Qdrant, etc.), retrieval frameworks (LangChain, LlamaIndex, etc.), and knowledge platforms are detailed in the whitepaper and the post-call technical kit.
What Lemma cryptographically guarantees
- The ingest time, issuer signature, docHash, and CID of every document chunk
- Bit-for-bit identity with the source — while the source itself stays encrypted under AES-GCM
- A new docHash on revision, with references to prior-version chunks structurally detectable
- A cryptographic binding between citations in AI answers and the underlying source version, verifiable by third parties
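The revision guarantee in the list above, a new docHash on each revision with prior-version references structurally detectable, can be sketched with a small version registry. The registry shape and `revise` function are illustrative assumptions, not Lemma's schema.

```python
import hashlib

def revise(registry: dict[str, dict], doc_id: str, new_source: bytes) -> str:
    """Register a revision: the new docHash becomes active, and all prior
    hashes for the document are flagged as superseded (illustrative only).
    """
    new_hash = hashlib.sha256(new_source).hexdigest()
    for entry in registry.values():
        if entry["doc_id"] == doc_id and entry["active"]:
            entry["active"] = False  # prior version is now structurally flagged
    registry[new_hash] = {"doc_id": doc_id, "active": True}
    return new_hash

registry: dict[str, dict] = {}
h1 = revise(registry, "policy-7", b"v1 text")
h2 = revise(registry, "policy-7", b"v2 text")

# A chunk still carrying h1 is detectable as referencing a superseded version:
stale = not registry[h1]["active"]
```

Because detection is a registry lookup on the docHash the chunk already carries, stale quotes surface at retrieval time rather than in a post-incident audit.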
Ready to prove it?
Talk to us about your use case. We respond within one business day.