Cryptography Layer

docHash — document content digest

docHash

A cryptographic digest of a document's byte representation. Lemma uses it as the base identifier that pins the identity of every provenance, attribute, and citation unit.

Definition

docHash is the fixed-length output of a collision-resistant hash function (SHA-256, BLAKE3, etc.) applied to a document's canonicalized byte representation. Identical bytes always produce the same docHash, so canonicalization must be deterministic. Changing a single bit produces an unrelated value.
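The definition above can be sketched in a few lines. This is a minimal illustration rather than Lemma's actual canonicalization: the `canonicalize` rule (NFC normalization, LF newlines, UTF-8) and the function names are assumptions chosen for the example, and SHA-256 is used because it ships with the Python standard library.

```python
import hashlib
import unicodedata


def canonicalize(text: str) -> bytes:
    """Hypothetical canonical form: NFC-normalized text, LF newlines, UTF-8."""
    normalized = unicodedata.normalize("NFC", text)
    normalized = normalized.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.encode("utf-8")


def doc_hash(text: str) -> str:
    """Return the SHA-256 digest of the canonical bytes as a hex string."""
    return hashlib.sha256(canonicalize(text)).hexdigest()


# Same content with different line endings: identical canonical bytes.
hash_crlf = doc_hash("Quarterly report\r\nRevenue: 100\r\n")
hash_lf = doc_hash("Quarterly report\nRevenue: 100\n")

# One character changed: an unrelated digest.
hash_edited = doc_hash("Quarterly report\nRevenue: 101\n")

print(hash_crlf == hash_lf)      # the two encodings agree after canonicalization
print(hash_crlf == hash_edited)  # the edited document does not
```

Without the canonicalization step, the two line-ending variants would hash differently even though a reader would call them the same document.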

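On its own, a docHash does not expose the document's contents: recovering the preimage of a high-entropy document is computationally infeasible. Sharing the docHash therefore attests that a document exists without disclosing it. Two limits apply. Anyone who already holds a candidate document can confirm whether it matches, and a short or predictable document can be recovered by hashing guesses.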
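As a small illustration of how a shared docHash is used in practice, the sketch below checks a candidate document against a digest received earlier. The function name `matches` is hypothetical, and the example hashes raw bytes with SHA-256 and skips canonicalization for brevity.

```python
import hashlib
import hmac


def matches(candidate: bytes, shared_doc_hash: str) -> bool:
    """Check whether candidate bytes hash to the shared hex digest."""
    candidate_hash = hashlib.sha256(candidate).hexdigest()
    # compare_digest compares in constant time; with a public digest this
    # is not strictly required, but it is a harmless default.
    return hmac.compare_digest(candidate_hash, shared_doc_hash)


original = b"Contract v1: party A pays party B 100 units."
shared = hashlib.sha256(original).hexdigest()  # the only value that is published

same_document = matches(original, shared)
other_document = matches(b"Contract v1: party A pays party B 900 units.", shared)
```

The digest alone gives the holder nothing to read. It becomes useful only once a candidate document is in hand to test against it.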

For the in-circuit path, Lemma also computes a Poseidon hash representation of docHash. Byte-oriented hashes such as SHA-256 serve external interoperability; the Poseidon form serves the ZK circuit, where arithmetic-friendly hashes are far cheaper to prove. Two layers, one logical anchor.
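One practical detail of bridging the two layers is that a 256-bit digest does not fit in a single element of a typical ~254-bit SNARK field. The sketch below shows a common workaround, splitting the digest into two 128-bit limbs that could then be fed to an in-circuit hash. It assumes the BN254 scalar field, which the source does not specify, and it does not implement Poseidon itself; `digest_to_limbs` is a hypothetical helper name.

```python
import hashlib

# BN254 scalar field modulus. Assumed here for illustration only; the
# field Lemma's circuits actually use is not stated in this document.
BN254_P = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def digest_to_limbs(digest: bytes) -> tuple[int, int]:
    """Split a 32-byte digest into two 128-bit big-endian integers."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte digest")
    hi = int.from_bytes(digest[:16], "big")
    lo = int.from_bytes(digest[16:], "big")
    return hi, lo


digest = hashlib.sha256(b"example document bytes").digest()
hi, lo = digest_to_limbs(digest)

# Each limb is below 2**128, so it is a valid field element, and the
# pair reconstructs the original digest exactly.
reassembled = ((hi << 128) | lo).to_bytes(32, "big")
```

A Poseidon hash over `(hi, lo)` would then give the in-circuit form. That step needs a vetted Poseidon implementation with parameters matching the circuit, so it is left out here.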

Lemma Oracle implementation

Provenance, attributes, and AI inference traces all reduce to docHash. Documents, datasets, model weights, and logs are each identified by the digest of their bytes, which makes them auditable, verifiable, and comparable.

Combined with commitments, docHash lets Lemma prove that "a document with this attribute exists" via a zero-knowledge proof, without releasing the document. This is the foundation of selective disclosure.
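To make the commitment half of that pairing concrete, here is a minimal sketch of a salted hash commitment that binds a docHash to an attribute. It is an assumption-laden illustration, not Lemma's scheme: the function names, the byte layout, and the use of SHA-256 are all chosen for the example, and the zero-knowledge proof itself is out of scope.

```python
import hashlib
import hmac
import secrets


def _commitment_digest(salt: bytes, doc_hash: bytes, attribute: str) -> bytes:
    """Hash salt || docHash || len(attribute) || attribute."""
    attr = attribute.encode("utf-8")
    h = hashlib.sha256()
    h.update(salt)                       # 32 random bytes hide the inputs
    h.update(doc_hash)                   # fixed 32-byte document digest
    h.update(len(attr).to_bytes(4, "big"))
    h.update(attr)
    return h.digest()


def commit(doc_hash: bytes, attribute: str) -> tuple[bytes, bytes]:
    """Return (commitment, salt). Publish the commitment, keep the salt."""
    salt = secrets.token_bytes(32)
    return _commitment_digest(salt, doc_hash, attribute), salt


def open_commitment(commitment: bytes, salt: bytes,
                    doc_hash: bytes, attribute: str) -> bool:
    """Verify that the revealed values match the published commitment."""
    expected = _commitment_digest(salt, doc_hash, attribute)
    return hmac.compare_digest(commitment, expected)


doc_hash_bytes = hashlib.sha256(b"example document bytes").digest()
commitment, salt = commit(doc_hash_bytes, "jurisdiction=EU")
opens_correctly = open_commitment(commitment, salt, doc_hash_bytes, "jurisdiction=EU")
opens_wrongly = open_commitment(commitment, salt, doc_hash_bytes, "jurisdiction=US")
```

Opening a commitment this way reveals the salt, the docHash, and the attribute. A zero-knowledge proof would instead show that the prover knows a valid opening without revealing those values, which is what makes selective disclosure possible.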

A provenance chain is, structurally, a time-linked sequence of docHashes. docHash is the smallest atom in Lemma's cryptographic infrastructure.
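The chain structure can be illustrated with a small hash chain in which each link records a docHash, a timestamp, and the hash of the previous link. This is a sketch under stated assumptions: the `Link` layout, the all-zero genesis value, and the function names are hypothetical and do not describe Lemma's on-disk or on-chain format.

```python
import hashlib
from dataclasses import dataclass

GENESIS = b"\x00" * 32  # assumed placeholder for "no previous link"


@dataclass(frozen=True)
class Link:
    doc_hash: bytes   # SHA-256 digest of the document at this step
    timestamp: int    # seconds since the epoch
    prev: bytes       # link hash of the preceding entry

    def link_hash(self) -> bytes:
        payload = self.prev + self.timestamp.to_bytes(8, "big") + self.doc_hash
        return hashlib.sha256(payload).digest()


def append(chain: list[Link], document: bytes, timestamp: int) -> list[Link]:
    """Return a new chain with one more link for the given document."""
    prev = chain[-1].link_hash() if chain else GENESIS
    return chain + [Link(hashlib.sha256(document).digest(), timestamp, prev)]


def verify(chain: list[Link]) -> bool:
    """Check that every link points at its predecessor and time never runs backward."""
    prev, last_ts = GENESIS, None
    for link in chain:
        if link.prev != prev:
            return False
        if last_ts is not None and link.timestamp < last_ts:
            return False
        prev, last_ts = link.link_hash(), link.timestamp
    return True


chain: list[Link] = []
chain = append(chain, b"draft", 1_700_000_000)
chain = append(chain, b"reviewed draft", 1_700_000_600)
chain = append(chain, b"final", 1_700_001_200)
chain_is_valid = verify(chain)
```

Because each link hash covers the previous one, replacing any earlier docHash invalidates every link after it.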

Get started

Pin the unit of verification at the byte level.