| Internet-Draft | Knowledge Units | April 2026 |
| Farley | Expires 8 October 2026 | [Page] |
This document defines the Knowledge Unit (KU) format for representing verified knowledge produced through structured multi-model deliberation. A Knowledge Unit captures the question asked, the models that participated, the consensus achieved, the points of agreement and disagreement, and the cryptographic receipts that bind each deliberation round to an independently verifiable chain.¶
The format addresses the epistemic integrity gap in LLM-maintained knowledge bases: how to prove that knowledge was derived through a rigorous process, that disagreement was preserved rather than smoothed away, and that the record has not been tampered with.¶
This specification complements draft-farley-acta-signed-receipts, which defines the receipt format and verification protocol used to sign individual deliberation rounds.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 8 October 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
Large language models (LLMs) are increasingly used to produce, curate, and maintain knowledge bases. The "LLM Wiki" pattern, in which an LLM incrementally compiles and maintains a structured collection of interlinked documents from raw sources, has gained significant adoption. Multiple independent implementations have appeared across personal research, team knowledge management, and agent memory systems.¶
While effective for content maintenance, single-model knowledge bases lack three properties critical for shared knowledge: proof that the knowledge was derived through a rigorous process, preservation of disagreement rather than its silent removal, and tamper-evidence for the resulting record.¶
This document defines the Knowledge Unit (KU) format to address these gaps. A KU is the output of a structured multi-model deliberation process in which multiple models first answer a question independently, then critique one another's responses under assigned roles, and a synthesis engine finally extracts the agreed, disputed, and uncertain claims, with every round cryptographically signed.¶
The result is a self-contained knowledge artifact that records WHAT is known, HOW it was determined, WHERE models agree and disagree, and provides CRYPTOGRAPHIC PROOF of the entire process.¶
A Knowledge Unit is not a replacement for single-model wikis. It is the artifact produced when a question is important enough to warrant multi-model deliberation with cryptographic proof. Many wiki entries do not need this level of rigour. KUs are for the entries that do.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
A Knowledge Unit MUST be represented as a JSON object conforming to the schema defined in this section. Implementations MUST support JSON serialisation. Implementations MAY support additional serialisation formats (e.g., CBOR, YAML frontmatter). See Section 10 for alternative serialisation guidance.¶
Points where all participating models converge. Each element MUST be either a string containing the claim text, or an object with the fields: claim (string, REQUIRED), confidence (string, OPTIONAL: "high"/"medium"/"low"), evidence (string, OPTIONAL), source_refs (array of strings, OPTIONAL: references to source identifiers from the sources array).¶
The array MUST NOT be empty for KUs with consensus_level "unanimous" or "strong".¶
Source documents that provided context for the deliberation. Each element MUST be an object with: uri (string, REQUIRED), title (string, OPTIONAL), content_hash (string, REQUIRED: SHA-256 with "sha256:" prefix), ingested_at (string, REQUIRED: ISO 8601).¶
When a source's content changes (its hash no longer matches the stored content_hash), any KU derived from that source SHOULD be considered potentially stale, regardless of its fresh_until value.¶
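This staleness check can be sketched in Python; the dict layout mirrors the sources[] element defined above:

```python
import hashlib

def source_is_stale(source: dict, current_content: bytes) -> bool:
    """Return True when a source's current content no longer matches the
    content_hash recorded in the KU (a "sha256:"-prefixed hex digest)."""
    stored = source["content_hash"].removeprefix("sha256:")
    return hashlib.sha256(current_content).hexdigest() != stored
```

A consumer would run this over every entry in a KU's sources array and treat any mismatch as grounds for marking the KU potentially stale.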
Typed relations to other Knowledge Units. Each element MUST be an object with: target_ku_id (string, REQUIRED), relation (string, REQUIRED: one of "supports", "contradicts", "refines", "extends", "depends_on"), claims (array of strings, OPTIONAL: specific claims the relation applies to).¶
Relations are directional. Implementations SHOULD maintain bidirectional awareness by adding reciprocal relations when discovered.¶
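As a non-normative illustration, a minimal KU combining the fields defined in this section might look like the following. The ku_id format, key identifiers, and placeholder values are assumptions, not normative:

```json
{
  "ku_id": "ku:example-0001",
  "canonical_question": "what is the maximum validity of an internet-draft",
  "process_template": "3-round",
  "consensus_level": "unanimous",
  "agreed": [
    {
      "claim": "An Internet-Draft is valid for at most six months.",
      "confidence": "high",
      "source_refs": ["src-1"]
    }
  ],
  "disputed": [],
  "uncertain": [],
  "sources": [
    {
      "uri": "https://datatracker.ietf.org/drafts/current/",
      "content_hash": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
      "ingested_at": "2026-04-01T12:00:00Z"
    }
  ],
  "relations": [],
  "fresh_until": "2026-10-08T00:00:00Z",
  "receipt_hash": "sha256:…",
  "receipt_sig": "…",
  "receipt_kid": "2026-04-key-1"
}
```

Note that agreed is non-empty, as required for consensus_level "unanimous".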
The default deliberation process ("3-round") consists of three rounds. Implementations MAY define additional process templates; the process_template field MUST indicate which was used.¶
Each participating model independently answers the canonical question. Models MUST NOT be shown each other's responses during Round 1. Implementations SHOULD present responses to subsequent rounds using blind labels (e.g., "Response A", "Response B") to prevent anchoring on perceived model authority.¶
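Blind labelling of Round 1 responses for use in later rounds might be implemented as in the following sketch; the helper name and the optional seeding are assumptions:

```python
import random
import string

def blind_labels(responses, seed=None):
    """Shuffle Round 1 responses and assign neutral labels ("Response A",
    "Response B", ...) so that later rounds cannot anchor on model identity.
    Shuffling breaks any correlation between label order and model slot."""
    rng = random.Random(seed)
    order = list(responses)
    rng.shuffle(order)
    return {f"Response {letter}": resp
            for letter, resp in zip(string.ascii_uppercase, order)}
```

The implementation would keep the label-to-model mapping private until the deliberation completes.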
Implementations SHOULD include at least one model from a different training lineage than the majority (e.g., at least one non-US model when the majority are US-trained).¶
Each Round 1 response MUST be signed per [I-D.farley-acta-signed-receipts].¶
The response record MUST include: ku_id, round (1), slot (1-based), model identifier, role ("independent"), content, content_hash (SHA-256), receipt_sig (Ed25519), and receipt_kid.¶
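A Round 1 response record covering these fields might be serialised as follows. The model field name and all placeholder values are illustrative assumptions:

```json
{
  "ku_id": "ku:example-0001",
  "round": 1,
  "slot": 2,
  "model": "example-model-v1",
  "role": "independent",
  "content": "An Internet-Draft is valid for at most six months.",
  "content_hash": "sha256:…",
  "receipt_sig": "…",
  "receipt_kid": "2026-04-key-1"
}
```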
Models are presented with all Round 1 responses (using blind labels) and assigned critique roles. Recommended roles include: verifier (checks factual claims), devil_advocate (argues against the emerging consensus), synthesizer (identifies common ground), clarity_editor (ensures responses are unambiguous).¶
Implementations SHOULD assign different roles to different models. At least one model MUST be assigned a role that challenges the emerging consensus. Each Round 2 response MUST be signed.¶
A synthesis engine processes all Round 1 and Round 2 responses to produce: agreed (array), disputed (array), uncertain (array), consensus_level (string), and follow_ups (array, OPTIONAL).¶
The synthesis output MUST be signed. The synthesis engine MUST NOT invent claims that do not appear in Round 1 or Round 2 responses. Its role is extraction and classification, not generation.¶
Consensus levels are determined structurally by the synthesis engine based on agreement patterns. Consensus levels MUST NOT be assigned editorially.¶
IMPORTANT: Consensus among AI models is evidence, not proof. Strong consensus means multiple models with different training data, architectures, and potential biases independently arrived at similar conclusions. It does not establish truth.¶
Different phrasings of the same question SHOULD resolve to the same canonical_question. Canonicalization serves three purposes: deduplication, linking (hierarchical knowledge structures), and discovery.¶
The canonicalization model is hierarchical inheritance: a general question serves as the canonical anchor, and specific variants (audience, constraint, temporal) inherit from it and add specialized components. Each variant MAY produce its own KU with a parent_ku_id reference.¶
Implementations MUST normalise questions by: converting to lowercase, removing leading/trailing whitespace, removing trailing punctuation, reducing consecutive whitespace, and removing filler phrases. Implementations SHOULD additionally apply domain-specific synonym resolution and normalise named entities to canonical forms.¶
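A minimal sketch of these normalisation steps follows; the filler-phrase list is implementation-defined, so the one shown here is purely illustrative:

```python
import re

# Illustrative filler phrases; the actual list is implementation-defined.
FILLERS = ("please tell me", "can you explain", "i want to know")

def normalise_question(q: str) -> str:
    """Lowercase, trim surrounding whitespace, strip trailing punctuation,
    drop filler phrases, and collapse runs of whitespace."""
    q = q.lower().strip().rstrip(".!?")
    for filler in FILLERS:
        q = q.replace(filler, " ")
    return re.sub(r"\s+", " ", q).strip()
```

Domain-specific synonym resolution and named-entity normalisation would be applied on top of this baseline.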
Before initiating a new deliberation, implementations SHOULD check whether a KU already exists for the normalised canonical question. If an active KU exists, the implementation SHOULD return it rather than re-deliberating, unless the KU is stale or fresh deliberation is explicitly requested.¶
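The dedup check can be sketched as follows; the in-memory index, the lifecycle value "active", and the simplified normalisation (a full implementation would reuse the canonicalisation steps above) are all assumptions:

```python
def find_existing_ku(index: dict, question: str, now: str):
    """Look up an active, fresh KU by normalised canonical question.

    `index` maps normalised question -> KU dict. A KU is returned only if
    its lifecycle state is "active" and its fresh_until (ISO 8601 string,
    comparable lexicographically) has not yet passed."""
    key = question.lower().strip()
    ku = index.get(key)
    if ku and ku.get("lifecycle") == "active" and ku.get("fresh_until", "") > now:
        return ku
    return None  # caller initiates a fresh deliberation
```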
KEEP: Re-deliberation confirms the same conclusions; the KU's fresh_until is extended.¶
UPDATE: The new deliberation refines the previous one without contradicting it; a new KU is published with a supersedes link.¶
SUPERSEDE: The new deliberation contradicts the previous consensus; same mechanics as UPDATE.¶
MERGE: Overlapping KUs are combined into a single KU.¶
ARCHIVE: The KU is no longer relevant; it is superseded with no replacement.¶
IMPORTANT: Published KUs are NEVER modified in place. All changes produce new KUs with supersedes links.¶
For a standard 3-round deliberation with N models, the receipt_hash is computed as:¶
round1_hash  = SHA-256(r1_1.sig || r1_2.sig || ... || r1_N.sig)
round2_hash  = SHA-256(r2_1.sig || r2_2.sig || ... || r2_N.sig)
round3_hash  = SHA-256(r3_synth.sig)
receipt_hash = SHA-256(round1_hash || round2_hash || round3_hash)¶
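Assuming signatures are handled as raw bytes, the chain construction can be sketched as:

```python
import hashlib

def receipt_hash(round1_sigs, round2_sigs, synth_sig):
    """Compute the receipt hash for a 3-round deliberation.

    round1_sigs and round2_sigs hold the raw Ed25519 signature bytes for
    each model slot in order; synth_sig is the Round 3 synthesis signature.
    || (concatenation) in the formula becomes bytes joining here."""
    round1 = hashlib.sha256(b"".join(round1_sigs)).digest()
    round2 = hashlib.sha256(b"".join(round2_sigs)).digest()
    round3 = hashlib.sha256(synth_sig).digest()
    return hashlib.sha256(round1 + round2 + round3).digest()
```

A verifier recomputes this value from the individual receipts before checking receipt_sig against it.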
The receipt_sig is produced by signing receipt_hash with the gateway's Ed25519 private key per [RFC8032]. Verification uses Ed25519-Verify(public_key, receipt_hash, receipt_sig). The public key MUST be retrievable via the receipt_kid field or included directly in the KU.¶
To support efficient consumption across different contexts, implementations SHOULD support progressive disclosure at four standard levels:¶
An agent querying a KU corpus SHOULD read L0 for all candidates, L1 for top matches, and L2 or L3 only for the selected KU.¶
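A projection helper for the disclosure levels might look like the following. The field sets shown for L0 through L2 are assumptions for illustration only; they are not taken from this specification:

```python
# Illustrative field sets per disclosure level (assumed, not normative).
LEVELS = {
    "L0": ["ku_id", "canonical_question", "consensus_level"],
    "L1": ["ku_id", "canonical_question", "consensus_level", "agreed"],
    "L2": ["ku_id", "canonical_question", "consensus_level", "agreed",
           "disputed", "uncertain"],
}

def disclose(ku: dict, level: str) -> dict:
    """Project a KU down to one disclosure level; "L3" returns the full KU."""
    if level == "L3":
        return dict(ku)
    return {k: v for k, v in ku.items() if k in LEVELS[level]}
```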
The canonical representation is JSON. However, many knowledge management tools operate on YAML frontmatter with markdown bodies. Implementations MAY represent a KU as YAML frontmatter followed by a markdown body. The markdown body maps to the synthesis field (editorial, not canonical). When converting between formats, the agreed and disputed arrays MUST preserve their full structure.¶
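One way to sketch this conversion: YAML 1.2 treats JSON as a subset, so emitting structured fields as JSON values inside the frontmatter preserves the full structure of agreed and disputed, while synthesis becomes the markdown body. The helper below is a non-normative sketch:

```python
import json

def ku_to_markdown(ku: dict) -> str:
    """Render a KU as YAML frontmatter plus a markdown body.

    Structured fields are emitted as JSON values (valid YAML), so arrays
    like agreed/disputed keep their full structure. The synthesis field
    becomes the editorial, non-canonical markdown body."""
    body = ku.get("synthesis", "")
    front = "\n".join(f"{k}: {json.dumps(v)}"
                      for k, v in ku.items() if k != "synthesis")
    return f"---\n{front}\n---\n\n{body}\n"
```

Round-tripping back to JSON would parse the frontmatter and reattach the body as synthesis.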
Model Collusion: If all models share the same bias, consensus may be spurious. Implementations SHOULD include at least one model from a different training lineage.¶
Synthesis Bias: The synthesis field is NOT canonical. The agreed, disputed, and uncertain arrays are authoritative.¶
Receipt Replay: Verifiers MUST check that receipt_hash covers the claimed content by recomputing the chain construction.¶
Key Compromise: Implementations SHOULD support key rotation and publish revocation lists.¶
Prompt Injection: Source material is context, not instruction. The adversarial critique round surfaces injected inconsistencies.¶
Freshness: Consumers SHOULD check lifecycle state before relying on KU content.¶
Source Integrity: When sources are available, content_hash enables verification that the source has not changed. Implementations SHOULD flag KUs with source hash mismatches as stale.¶
Confidence Decay: KUs about rapidly evolving domains (volatility "volatile") should be consumed with appropriate scepticism even within their freshness window.¶
This document has no IANA actions. A future revision may request registration of a media type "application/vnd.acta.knowledge-unit+json".¶