P1 · Verifiable Origin

RAG Content Provenance

Keep the original document and its full text hidden
Prove that the version the AI used is an authentic publication

Anchor RAG documents to verifiable provenance at ingest — docHash, CID, issuer signature. Citation authenticity becomes cryptographically traceable.

Enterprise RAG platforms · Knowledge management · AI-native companies · 6 min read
Live in production since 2025 · Public-infrastructure PoC in production · ETHGlobal AI Agents 2026 Finalist

01 · THE PROBLEM

Three voices from the front line.

  • IT / business DX

    “We need to lock a document's provenance at the moment it's ingested into RAG”

  • Governance

    “We want to verify later that a document is untampered and who issued it”

  • AI engineering

    “We want document versioning and tamper detection operated as one”

02 · THE SHIFT

Hand over the source, or just the facts?

Change what reaches the AI, and the leakage risk goes with it.

Without Lemma
Hand over the original

  doc_path: /shared/work-rules.pdf
  uploaded_by: user-123
  content: body…
  version: untracked
  signed: none

↓ all of it goes to the AI / outside

With Lemma
Hand over just the facts

  subject: did:lemma:doc-policy-v3
  issuer: did:lemma:docs.internal
  sourceHash: 0x4f8a…
  lineageChain: [upload, index, embed]
  integrity: poseidon-merkle
  ZK verified: ✓ VALID

↓ only the necessary facts to the AI

At the moment an internal document is ingested into RAG, the original is encrypted, and its fingerprint (sourceHash), issuer signature and valid version are recorded on the index side. What the AI retrieves is not the original itself but only facts that carry provenance. A cited sentence can be traced through the recorded fingerprint to the version it came from. Once a document is revised, citations that still point to the old version are detected structurally.
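
The ingest step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Lemma's API: `ProvenanceRecord`, `ingest` and `verify` are hypothetical names, SHA-256 stands in for the fingerprint, and an HMAC stands in for the issuer signature (a real deployment would more likely use an asymmetric signature so that verifiers never hold the signing key). Encryption of the original and the ZK proof are omitted.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record stored on the index side instead of the original."""
    subject: str      # document identifier
    issuer: str       # who published the document
    version: int      # version fixed at ingest
    source_hash: str  # SHA-256 fingerprint of the original bytes
    signature: str    # issuer tag over the fields above (HMAC stand-in)


def _signed_payload(subject: str, issuer: str, version: int, source_hash: str) -> bytes:
    # Canonical byte string that the issuer tag covers.
    return "|".join([subject, issuer, str(version), source_hash]).encode("utf-8")


def ingest(doc: bytes, subject: str, issuer: str, version: int,
           issuer_key: bytes) -> ProvenanceRecord:
    """Fingerprint the document and bind the fingerprint to issuer and version."""
    source_hash = hashlib.sha256(doc).hexdigest()
    payload = _signed_payload(subject, issuer, version, source_hash)
    signature = hmac.new(issuer_key, payload, hashlib.sha256).hexdigest()
    return ProvenanceRecord(subject, issuer, version, source_hash, signature)


def verify(record: ProvenanceRecord, doc: bytes, issuer_key: bytes) -> bool:
    """Check that the record is untampered and that doc matches its fingerprint."""
    payload = _signed_payload(record.subject, record.issuer,
                              record.version, record.source_hash)
    expected = hmac.new(issuer_key, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(record.signature, expected)
    content_ok = hashlib.sha256(doc).hexdigest() == record.source_hash
    return signature_ok and content_ok


if __name__ == "__main__":
    key = b"demo-issuer-key"  # assumption: illustrative key only
    record = ingest(b"Work rules, version 3", "did:lemma:doc-policy-v3",
                    "did:lemma:docs.internal", 3, key)
    print(verify(record, b"Work rules, version 3", key))
    print(verify(record, b"Work rules, version 3 (edited)", key))
```

In this sketch, only the record would travel with an indexed chunk. Any change to the document bytes or to the recorded fields makes `verify` return `False`.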

See the technical details ↗
03 · HOW TO CHOOSE

Choose on three criteria.

Only work that needs all three at once — pass without exposing, independent verification, tamper-proof — is Lemma's domain.

Method | Pass without exposing | Independent verification | Tamper-proof
Access control only
Masking / anonymization
Encryption only
Lemma (ZK proof): the only one with all 3
04 · HOW IT WORKS

What's next

We start with RAG provenance design and a PoC, then stay alongside you through to operations.

  1. A 30-minute review: identify the path where citation authenticity matters most among the documents you ingest into AI.
  2. Narrow down to one or two decisions (results) to prove, e.g. "this citation derives from the valid version" or "this is not a prior-version chunk". These are the facts an AI answer relies on, not the originals.
  3. Design the connection and version fixing: how the provenance layer slots into your existing RAG ingest/retrieval pipeline, and how source versions are fixed.
  4. Prove one path via a (quote-based) PoC: confirm that provenance is traceable for one document class.
  5. Hands-on support from rollout through operations: existing plan tiers (Civic / Critical / Compliance) serve only as a cost reference; the setup and pricing are designed together.
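
The two example decisions in step 2 can be illustrated with a small retrieval-side check. This is a sketch under assumptions, not a description of Lemma's implementation: `VersionRegistry`, `Chunk` and `CitationStatus` are hypothetical names, and the registry is a plain in-memory dictionary that keeps each document's fingerprints in publication order.

```python
from dataclasses import dataclass
from enum import Enum


class CitationStatus(Enum):
    VALID = "derives from the valid version"
    STALE = "prior-version chunk"
    UNKNOWN = "no provenance on record"


@dataclass(frozen=True)
class Chunk:
    """Hypothetical retrieved chunk carrying the fingerprint of its source."""
    subject: str
    source_hash: str
    text: str


class VersionRegistry:
    """Keeps, per document, the fingerprints of every published version."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = {}

    def publish(self, subject: str, source_hash: str) -> None:
        # The most recently published fingerprint is the valid version.
        self._history.setdefault(subject, []).append(source_hash)

    def check(self, chunk: Chunk) -> CitationStatus:
        hashes = self._history.get(chunk.subject, [])
        if chunk.source_hash not in hashes:
            return CitationStatus.UNKNOWN
        if chunk.source_hash == hashes[-1]:
            return CitationStatus.VALID
        return CitationStatus.STALE


if __name__ == "__main__":
    registry = VersionRegistry()
    registry.publish("did:lemma:doc-policy-v3", "hash-of-v2")  # placeholder values
    registry.publish("did:lemma:doc-policy-v3", "hash-of-v3")
    old = Chunk("did:lemma:doc-policy-v3", "hash-of-v2", "Overtime requires approval.")
    print(registry.check(old).value)
```

Neither decision requires the original text: the check compares fingerprints only, which is why the originals and index internals can stay out of the conversation.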

In the first 30 minutes, tell us one document path where "indexed, but unsure the citation is trustworthy" applies. No source documents or index internals required.

The bigger picture

The bigger picture this use case belongs to.

We map use scenarios across industries and workflows along the four axes.

See use scenarios for Verifiable Origin in Solutions →

TRY LEMMA

Run it yourself.

No sales call needed — start hands-on with Lemma's products.