RAG Content Provenance
Anchor RAG documents to verifiable provenance at ingest — docHash, CID, issuer signature. Citation authenticity becomes cryptographically traceable.
Three voices from the front line.
- IT / business DX
“We need to lock a document's provenance at the moment it's ingested into RAG”
- Governance
“We want to verify later that a document is untampered and who issued it”
- AI engineering
“We want document versioning and tamper detection operated as one”
Hand over the source, or just the facts?
Change what reaches the AI, and the leakage risk changes with it.
- doc_path: /shared/work-rules.pdf
- uploaded_by: user-123
- content: body…
- version: untracked
- signed: none

- subject: did:lemma:doc-policy-v3
- issuer: did:lemma:docs.internal
- sourceHash: 0x4f8a…
- lineageChain: [upload, index, embed]
- integrity: poseidon-merkle
- ZK verified: ✓ VALID
When an internal document is ingested into RAG, the original is encrypted, and its fingerprint (sourceHash), issuer signature, and current valid version are recorded on the index side. The AI retrieves only facts that carry provenance, not the original itself. A cited sentence can be traced through the recorded fingerprint to the version it came from, and once the document is revised, citations of the old version are detected by construction.
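The ingest-time flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Lemma's implementation: `ProvenanceIndex`, `ProvenanceRecord`, `check_citation`, the `did:example:` identifiers, and the demo key are all hypothetical names. SHA-256 stands in for the fingerprint, an HMAC stands in for the issuer signature (a real system would presumably use an asymmetric signature), and encryption of the original is omitted.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical shared key, for illustration only. An HMAC is a stand-in
# for the issuer signature; it is not an asymmetric signature.
ISSUER_KEY = b"demo-issuer-key"


def fingerprint(document: bytes) -> str:
    """Hex SHA-256 digest, used here in the role of sourceHash."""
    return hashlib.sha256(document).hexdigest()


def sign(message: str) -> str:
    """HMAC-SHA256 over the record fields (stand-in for a signature)."""
    return hmac.new(ISSUER_KEY, message.encode(), hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    subject: str
    issuer: str
    version: int
    source_hash: str
    signature: str

    def payload(self) -> str:
        return f"{self.subject}|{self.issuer}|{self.version}|{self.source_hash}"


class ProvenanceIndex:
    """Keeps only the current valid record per document subject."""

    def __init__(self) -> None:
        self._current: dict[str, ProvenanceRecord] = {}

    def ingest(self, subject: str, issuer: str, version: int,
               document: bytes) -> ProvenanceRecord:
        source_hash = fingerprint(document)
        payload = f"{subject}|{issuer}|{version}|{source_hash}"
        record = ProvenanceRecord(subject, issuer, version,
                                  source_hash, sign(payload))
        self._current[subject] = record  # a revision replaces the old version
        return record

    def check_citation(self, subject: str, cited_hash: str) -> str:
        record = self._current.get(subject)
        if record is None:
            return "unknown"
        # Re-derive the signature to detect a tampered index entry.
        if not hmac.compare_digest(sign(record.payload()), record.signature):
            return "tampered"
        if hmac.compare_digest(cited_hash, record.source_hash):
            return "valid"
        return "stale"  # the citation points at a superseded version


index = ProvenanceIndex()
v1 = index.ingest("did:example:work-rules", "did:example:docs", 1,
                  b"Work rules, first edition")
v2 = index.ingest("did:example:work-rules", "did:example:docs", 2,
                  b"Work rules, revised edition")
print(index.check_citation("did:example:work-rules", v2.source_hash))
print(index.check_citation("did:example:work-rules", v1.source_hash))
```

In this sketch, a citation carrying the fingerprint of the first edition is reported as `stale` once the revised edition has been ingested, which is the behavior the paragraph describes for old-version citations.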
See the technical details ↗
Choose on three criteria.
Only work that needs all three at once — pass without exposing, independent verification, tamper-proof — is Lemma's domain.
| Method | Pass without exposing | Independent verification | Tamper-proof |
|---|---|---|---|
| Access control only | △ | ✗ | ✗ |
| Masking / anonymization | △ | ✗ | ✗ |
| Encryption only | ✓ | ✗ | ✗ |
| Lemma (ZK proof), the only one with all 3 | ✓ | ✓ | ✓ |
What's next
We enter through RAG provenance design and a PoC, and stay alongside you through to operations.
- A 30-minute review — identify the path where citation authenticity matters most among the documents you ingest into AI.
- Narrow to 1–2 decisions (results) to prove — e.g. "this citation derives from the valid version," "not a prior-version chunk" — the facts an AI answer relies on. Not the originals.
- Design connection and version-fixing — how the provenance layer slots into your existing RAG ingest/retrieval pipeline, and source-version fixing.
- Prove one path via a (quote-based) PoC — confirm provenance is traceable for one document class.
- Hands-on support from rollout through operations — existing plan tiers (Civic / Critical / Compliance) serve only as a cost reference; the setup and pricing are designed together.
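A decision such as "this citation derives from the valid version" or "not a prior-version chunk" can be framed as a Merkle inclusion check of one chunk against the root committed for the valid version. The sketch below is only an illustration under assumptions: it uses SHA-256 from the standard library in place of a Poseidon hash, involves no zero-knowledge proof, and the function names and sample chunks are hypothetical. It is also not hardened (no leaf/node domain separation).

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256, used here as a stand-in for a Poseidon hash."""
    return hashlib.sha256(data).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(chunks: list[bytes]) -> bytes:
    """Root over the hashed chunks; an odd level duplicates its last node."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    level = [h(c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = _next_level(level)
    return level[0]


def merkle_proof(chunks: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling path for chunks[index] as (sibling_hash, sibling_is_left)."""
    if not 0 <= index < len(chunks):
        raise IndexError("chunk index out of range")
    level = [h(c) for c in chunks]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # the neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_chunk(chunk: bytes, proof: list[tuple[bytes, bool]],
                 root: bytes) -> bool:
    """Recompute the path from the chunk and compare with the root."""
    node = h(chunk)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical chunks of a prior and a valid (revised) version.
prior_chunks = [b"Section 1", b"Section 2", b"Section 3"]
valid_chunks = [b"Section 1", b"Section 2 (revised)", b"Section 3"]

valid_root = merkle_root(valid_chunks)
proof = merkle_proof(valid_chunks, 1)

print(verify_chunk(valid_chunks[1], proof, valid_root))  # chunk of valid version
print(verify_chunk(prior_chunks[1], proof, valid_root))  # prior-version chunk
```

Only the root and the sibling hashes are needed for the check, so the verifier in this sketch never sees the other chunks of the document.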
Tell us one document path where "indexed, but unsure the citation is trustworthy" applies, in the first 30 minutes. No source documents or index internals required.
The bigger picture
The bigger picture this use case belongs to.
We map use scenarios across industries and workflows along four axes.
See use scenarios for Verifiable Origin in Solutions →
TRY LEMMA
Run it yourself.
No sales call needed — start hands-on with Lemma's products.