AFiR: Post-Quantum Signed Inference Receipts as a TEE-Free Profile for IETF SPICE Inference Chain

Internet-Draft	AFiR SPICE Profile	June 2026
Rotzin	Expires 14 December 2026

Abstract

This document defines AFiR (Attested Fragmented Inference Routing) as a production profile of the IETF SPICE Inference Chain specification [I-D.draft-mw-spice-inference-chain].

The SPICE Inference Chain defines computational provenance via two mechanisms: Zero-Knowledge Machine Learning (ZKML) proofs and Trusted Execution Environment (TEE) attestation quotes. Each carries a deployment cost: ZKML incurs significant proof generation latency, and TEE attestation requires specialized hardware. Neither is deployable today in commodity serverless inference environments without infrastructure changes.

AFiR defines a third proof type -- post-quantum digital signature attestation using ML-DSA-65 (NIST FIPS 204) -- that is deployable on any inference platform, requires no specialized hardware, adds 0.785ms of overhead per fragment, and produces a 384-byte receipt anchored on a public blockchain. AFiR receipts are structurally compatible with the SPICE Inference Chain Merkle tree and can coexist with ZKML and TEE entries in the same session chain.

AFiR extends the SPICE inference chain with five concrete production primitives: Signed Tool Calls (P1), Cross-Agent Receipt Trees (P2), KV Cache Signing (P3), Model Manifest attestation (P4), and a Crypto-Agile Signature Layer (P5). All five are deployed and serving production traffic as of June 2026, making AFiR the first production implementation of the SPICE inference_root claim for multi-agent pipelines.

1. Introduction

1.1. The Deployment Gap in the SPICE Inference Chain

The SPICE Inference Chain [I-D.draft-mw-spice-inference-chain] defines two proof types for computational provenance:

ZKML proofs: mathematically certain, but proof generation takes minutes to hours per inference and is currently limited to models of approximately 100 million parameters or fewer.
TEE attestation: production-scale and real-time, but requires specific hardware (Intel TDX, AMD SEV-SNP, NVIDIA H100 Confidential Computing) and manufacturer PKI dependencies. Most serverless inference environments do not expose TEE primitives to the application layer.

The practical effect is that the SPICE Inference Chain, as currently defined, cannot be adopted in commodity cloud environments (serverless functions, container-based inference runtimes, shared GPU pools) without either accepting ZKML latency incompatible with real-time serving, or deploying specialized hardware unavailable in most production inference clouds. This leaves the majority of production AI inference volume outside the scope of any SPICE-conformant inference attestation.

1.2. AFiR Approach

AFiR addresses this gap by defining a third proof type: post-quantum digital signature attestation using ML-DSA-65 (NIST FIPS 204 [FIPS204]).

A post-quantum signature attestation makes the following proof statement:

"Agent A, at timestamp T, signed a commitment over (input_hash, output_hash, model_id, tool_name, session_id) using ML-DSA-65 with key K. Key K is registered and publicly verifiable. The signature is unforgeable under standard lattice hardness assumptions (Module Learning With Errors, MLWE). A cryptographic receipt anchored on Base Mainnet via USDC provides a tamper-evident timestamp independent of any single party's infrastructure."

This proof type does not require:

Specialized hardware (no TEE, no GPU confidential compute)
Proof generation delay (signing is 0.785ms per fragment)
Trust in a hardware manufacturer's PKI
Any changes to the inference runtime or model serving stack

AFiR is in production as of June 2026, operating on serverless infrastructure. All five primitives defined in this document are deployed, smoke-tested, and serving live traffic.

1.3. Relationship to Existing SPICE Drafts

This document is a companion to, not a replacement of:

[I-D.draft-mw-spice-inference-chain]: defines the inference chain Merkle structure and ZKML/TEE proof types. AFiR adds a third proof type to this framework.
[I-D.draft-mw-spice-actor-chain]: AFiR's P1 (Signed Tool Calls) extends the actor chain by adding per-tool-invocation receipts at the tool execution layer.
[I-D.draft-mw-spice-intent-chain]: AFiR's P3 (KV Cache Signing) addresses a gap not covered by the intent chain: provenance of cached token prefixes served from distributed KV stores.

AFiR receipt entries are structurally compatible with the SPICE inference chain Merkle tree and MAY coexist with ZKML and TEE entries in the same session's inference chain.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

AFiR Receipt:: A signed record produced by the AFiR signing layer before an inference output propagates to the next stage. Contains input commitment, output commitment, model identity, timestamp, nullifier, and a post-quantum digital signature.
Nullifier:: A unique, non-reusable identifier bound to each AFiR receipt, preventing replay of a valid receipt against a different output.
On-Chain Anchor:: A transaction on Base Mainnet containing the Merkle root of a session's inference chain, providing a tamper-evident timestamp independent of any single operator's infrastructure.
ML-DSA-65:: Module-Lattice-Based Digital Signature Algorithm, parameter set ML-DSA-65 (NIST security category 3), as defined in NIST FIPS 204 [FIPS204]. Post-quantum secure under MLWE hardness assumptions.
Fragment:: The smallest unit of inference output for which an AFiR receipt is produced. In streaming inference, a fragment is a single generation step. In non-streaming inference, a fragment is the complete response.
KV Cache Prefix:: The cached key-value state from prior turns in a multi-turn conversation or agentic session, reused by the inference engine to avoid recomputing attention over prior tokens.

3. The AFiR Proof Type

3.1. Algorithm: ML-DSA-65 (NIST FIPS 204)

AFiR uses ML-DSA-65 as its primary signature algorithm. ML-DSA-65 is a parameter set of ML-DSA, the NIST-standardized post-quantum digital signature algorithm (FIPS 204, August 2024), providing:

Security level: NIST security category 3 (comparable to the strength of AES-192; quantum-secure under MLWE)
Signature size: 3309 bytes
Public key size: 1952 bytes
Signing time: under 1ms on commodity hardware
Verification time: under 1ms on commodity hardware

The signed message for each AFiR receipt is the SHA-256 hash of the canonical JSON serialization [RFC8785] of the receipt payload fields: input_hash, output_hash, model_id, model_fingerprint, tool_name (if applicable), session_id, iat, nullifier.
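The digest computation described above can be sketched as follows. This is a minimal illustration, not the AFiR implementation: the helper names canonical_json and receipt_digest are hypothetical, and json.dumps with sorted keys and compact separators only approximates RFC 8785 for payloads whose keys are ASCII and whose values are strings or integers. A full JCS implementation would be needed for floats or non-ASCII keys.

```python
import hashlib
import json

# Field names come from Section 3.1; everything else here is illustrative.
SIGNED_FIELDS = (
    "input_hash", "output_hash", "model_id", "model_fingerprint",
    "tool_name", "session_id", "iat", "nullifier",
)

def canonical_json(payload: dict) -> bytes:
    """Approximate RFC 8785 for ASCII-keyed payloads of strings and integers."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

def receipt_digest(entry: dict) -> str:
    """SHA-256 over the canonical form of the signed fields that are present."""
    # tool_name is optional, so absent fields are simply left out.
    payload = {k: entry[k] for k in SIGNED_FIELDS if k in entry}
    return "sha256:" + hashlib.sha256(canonical_json(payload)).hexdigest()
```

Because the payload is canonicalized before hashing, two parties that hold the same field values derive the same digest regardless of the key order in which they received the entry. The ML-DSA-65 signature would then be computed over this digest.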

3.2. Performance Characteristics

AFiR measured performance on commodity serverless infrastructure (2026):

Signing overhead per fragment: 0.785ms
End-to-end median wall latency: 241ms
On-chain receipt anchoring: approximately 7ms (Base Mainnet via USDC)
Throughput cost vs. baseline: 98.5% cheaper (tiered routing)
Speed vs. prior signing approach: 6.1x faster (223ms vs 1,369ms P50 wall-clock)

These measurements are from production traffic and represent the overhead of the complete AFiR signing pipeline including on-chain anchoring.

3.3. On-Chain Anchoring

AFiR anchors the Merkle root of each session's inference chain on Base Mainnet via a USDC transfer carrying the root hash as calldata. This provides:

Tamper-evident timestamp from a public, decentralized ledger
Independence from any single operator's infrastructure
Permanent, publicly auditable record of the session root
Approximately 7ms latency from signing to on-chain confirmation

The on-chain anchor does not contain individual receipt payloads. Per-entry proof retrieval uses the inference registry URI, following the same architecture as defined in [I-D.draft-mw-spice-inference-chain] Section 5.

4. AFiR Entry Structure

4.1. Common Fields (SPICE-Compatible)

AFiR entries include all REQUIRED common fields from [I-D.draft-mw-spice-inference-chain] Section 4.1. The entry type value is afir_pq_signature.

4.2. AFiR-Specific Fields

input_hash:: SHA-256 hash of the inference input (prompt or tool call parameters).
nullifier:: Unique non-reusable identifier for this receipt. Format: a 32-byte value, hex-encoded (64 hexadecimal characters).
algorithm:: Signature algorithm used. One of: "ML-DSA-65" (primary, post-quantum), "ML-DSA-44" (compact, post-quantum), "Ed25519" (classical, transition support), "SLH-DSA" (reserved, FIPS 205), "FN-DSA" (reserved, FIPS 206).
public_key_hint:: First 8 bytes of the signing public key, hex-encoded as 16 characters, for key disambiguation without transmitting the full key inline.
receipt_chain:: URI of the AFiR inference registry partition for this session.
on_chain_anchor:: Base Mainnet transaction hash containing the session Merkle root. OPTIONAL at entry level; REQUIRED in the token's inference_registry response for completed sessions.
phase:: For P1 (Signed Tool Calls): "before" or "after", indicating whether the receipt was produced before or after tool execution.

4.3. Full Entry Example

The following is an example AFiR inference chain entry for a signed tool call (P1, before phase):

{
  "type": "afir_pq_signature",
  "sub": "spiffe://thehiveryiq.com/agent/orchestrator",
  "model_fingerprint": "sha256:a3f9...",
  "model_id": "claude-opus-4-20260401",
  "input_hash": "sha256:b7c2...",
  "output_hash": "sha256:d4e1...",
  "intent_entry_ref": 2,
  "iat": 1749780000,
  "nullifier": "8a3f2c91b0e74d56a1f3c8b2e9d07f4a...",
  "algorithm": "ML-DSA-65",
  "public_key_hint": "79c1383bb1ba226d",
  "phase": "before",
  "receipt_chain":
    "https://api.thehiveryiq.com/afir/receipts/sess-uuid-12345",
  "on_chain_anchor": null,
  "inference_digest": "sha256:f8a3...",
  "inference_sig": "eyJhbGciOiJNTC1EU0EtNjUi..."
}
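A verifier would typically run a structural check on the AFiR-specific fields before attempting signature verification. The sketch below is one possible check, not a normative validator: the function name check_afir_fields is hypothetical, and the nullifier rule assumes that "32 bytes" means a 64-character lowercase hex string.

```python
import re

# Algorithm identifiers as listed in Section 4.2.
ALGORITHMS = {"ML-DSA-65", "ML-DSA-44", "Ed25519", "SLH-DSA", "FN-DSA"}
HEX_RE = re.compile(r"^[0-9a-f]+$")

def check_afir_fields(entry: dict) -> list[str]:
    """Return the problems found in the AFiR-specific fields (empty if none)."""
    problems = []
    if entry.get("type") != "afir_pq_signature":
        problems.append("type must be afir_pq_signature")
    if entry.get("algorithm") not in ALGORITHMS:
        problems.append("unknown algorithm")
    nullifier = entry.get("nullifier", "")
    # Assumption: a 32-byte nullifier is carried as 64 lowercase hex characters.
    if not (isinstance(nullifier, str) and len(nullifier) == 64 and HEX_RE.match(nullifier)):
        problems.append("nullifier must be 32 bytes, hex-encoded")
    if "phase" in entry and entry["phase"] not in ("before", "after"):
        problems.append("phase must be 'before' or 'after'")
    if not str(entry.get("input_hash", "")).startswith("sha256:"):
        problems.append("input_hash must be a sha256: value")
    return problems
```

Returning a list of problems rather than raising on the first one lets a registry report every structural defect of a rejected entry in a single response.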

5. Five Signing Primitives

AFiR ships five production primitives, each corresponding to a distinct layer of the AI inference stack.

5.1. P1 -- Signed Tool Calls

Endpoints: POST /v1/afir/tool/sign and POST /v1/afir/tool/verify

P1 produces a before-and-after receipt for every MCP or Agent-to-Agent (A2A) tool invocation. The "before" receipt is produced before the tool executes, binding: tool_name, tool_version, input_hash, model_id, session_id, parent_receipt_nullifier, iat. The "after" receipt is produced after the tool returns, binding: output_hash, tool_exit_status, latency_ms, parent_receipt_nullifier (the nullifier of the "before" receipt), iat.

The nullifier chain from before to after ensures that a tool call receipt cannot be detached from its corresponding response receipt, and that replay of a valid before-receipt against a different tool response is detectable.

P1 directly addresses the unsigned tool invocation vulnerability class present in MCP deployments. The AFiR signing sidecar intercepts the call before the MCP transport layer, requiring no changes to MCP server implementations.
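The before/after linkage can be illustrated with a short sketch. It covers only receipt construction and the nullifier chain check. The signature step is omitted, and the function names and the choice of a random 32-byte nullifier are assumptions made for illustration rather than details taken from the AFiR service.

```python
import hashlib
import secrets
import time

def sha256_hex(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()

def before_receipt(tool_name, tool_version, tool_input: bytes,
                   model_id, session_id, parent=None) -> dict:
    """Receipt produced before the tool executes."""
    return {
        "phase": "before",
        "tool_name": tool_name,
        "tool_version": tool_version,
        "input_hash": sha256_hex(tool_input),
        "model_id": model_id,
        "session_id": session_id,
        "parent_receipt_nullifier": parent,
        "iat": int(time.time()),
        "nullifier": secrets.token_hex(32),  # assumed: random 32-byte value
    }

def after_receipt(before: dict, tool_output: bytes,
                  exit_status: int, latency_ms: float) -> dict:
    """Receipt produced after the tool returns, chained to the before-receipt."""
    return {
        "phase": "after",
        "output_hash": sha256_hex(tool_output),
        "tool_exit_status": exit_status,
        "latency_ms": latency_ms,
        "parent_receipt_nullifier": before["nullifier"],
        "iat": int(time.time()),
        "nullifier": secrets.token_hex(32),
    }

def is_linked(before: dict, after: dict, tool_output: bytes) -> bool:
    """True if `after` points at `before` and commits to this exact output."""
    return (
        before["phase"] == "before"
        and after["phase"] == "after"
        and after["parent_receipt_nullifier"] == before["nullifier"]
        and after["output_hash"] == sha256_hex(tool_output)
    )
```

A verifier that is handed a response together with its receipt pair can call is_linked to detect both a detached after-receipt and an after-receipt replayed against a different tool output.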

5.2. P2 -- Cross-Agent Receipt Trees

Endpoint: POST /v1/afir/tree/build

P2 implements the inference chain Merkle tree architecture defined in [I-D.draft-mw-spice-inference-chain] using AFiR receipt entries as leaf nodes. When Agent A calls Agent B which calls Agent C, P2 builds a Merkle tree across all receipts produced in the session. The root hash is the inference_root included in the OAuth token.

P2 is the AFiR reference implementation of the inference_root claim defined in [I-D.draft-mw-spice-inference-chain] Section 5.3. It is deployed and serving production traffic as of June 2026.
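As an illustration of how a session root could be derived from receipt digests, the sketch below builds a binary Merkle tree with SHA-256. The exact construction (domain-separation prefixes 0x00 and 0x01, promotion of an unpaired node) is an assumption chosen for clarity. The normative tree layout is the one defined in [I-D.draft-mw-spice-inference-chain].

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaf_digests: list[bytes]) -> bytes:
    """Root over receipt digests, given in session order."""
    if not leaf_digests:
        raise ValueError("a session needs at least one receipt")
    # Leaf and interior hashes use different prefixes so that an interior
    # node can never be presented as a leaf.
    level = [_h(b"\x00" + d) for d in leaf_digests]
    while len(level) > 1:
        carry = None
        if len(level) % 2:  # odd count: carry the last node up unchanged
            carry, level = level[-1], level[:-1]
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
        if carry is not None:
            level.append(carry)
    return level[0]
```

With receipts from Agents A, B, and C as leaves, the returned 32-byte value plays the role of the inference_root. Changing, removing, or reordering any receipt yields a different root.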

5.3. P3 -- KV Cache Signing

Endpoint: POST /v1/afir/cache/sign

P3 addresses a provenance gap not covered by the intent chain or the existing inference chain draft: the attestation of cached token prefixes served from distributed KV stores. In production agentic deployments using disaggregated prefill architectures, KV cache hit rates exceeding 90% have been measured. This means the majority of tokens served to the model in high-cache-hit deployments have no provenance attestation.

P3 signs each KV cache entry at write time and validates the signature at read time before cached tokens are injected into the model's context. If a cached prefix does not match its receipt on retrieval, the request MUST fail before the prefix is injected into the model's context.
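The sign-on-write, verify-on-read behavior can be sketched as below. To keep the example runnable with the standard library alone, HMAC-SHA256 stands in for ML-DSA-65. It is a symmetric authenticator, not a post-quantum signature, and a real deployment would substitute an ML-DSA-65 signer and verifier. The class and exception names are hypothetical.

```python
import hashlib
import hmac

class CacheIntegrityError(Exception):
    """Raised when a cached prefix does not match its receipt."""

class SignedKVCache:
    # HMAC-SHA256 is a stand-in for ML-DSA-65 so the sketch needs no
    # third-party library; it does not provide post-quantum signatures.
    def __init__(self, key: bytes):
        self._key = key
        self._store = {}  # cache_key -> (prefix_bytes, tag)

    def _tag(self, cache_key: str, prefix: bytes) -> bytes:
        digest = hashlib.sha256(prefix).digest()
        message = cache_key.encode("utf-8") + b"\x00" + digest
        return hmac.new(self._key, message, hashlib.sha256).digest()

    def write(self, cache_key: str, prefix: bytes) -> None:
        """Sign the entry at write time."""
        self._store[cache_key] = (prefix, self._tag(cache_key, prefix))

    def read(self, cache_key: str) -> bytes:
        """Validate at read time; fail before the prefix reaches the model."""
        prefix, tag = self._store[cache_key]
        if not hmac.compare_digest(tag, self._tag(cache_key, prefix)):
            raise CacheIntegrityError(cache_key)
        return prefix
```

Binding the cache key into the authenticated message means a valid entry cannot be moved under a different key, and raising from read ensures the caller never receives an unverified prefix.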

5.4. P4 -- Model Manifest

Endpoints: POST /v1/afir/manifest/publish and GET /v1/afir/manifest/{nullifier}

P4 provides TEE-free attestation of which model, which weights, and which quantization configuration served a given request. A Model Manifest is a signed document binding: model_id, model_fingerprint (SHA-256 of model weights plus architecture), quantization, serving_runtime, infrastructure, iat, and nullifier.

The Model Manifest nullifier is included in all subsequent AFiR receipt entries produced during a session, creating a binding between every inference receipt and the specific model configuration that produced it.

P4 addresses the Model Masquerading attack class identified in [I-D.draft-mw-spice-inference-chain] Section 1.1 without requiring TEE hardware. The trust basis is the operator's key management rather than hardware isolation. P4 is therefore appropriate for environments where TEE is unavailable, with this distinction explicitly understood.
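The manifest-to-receipt binding might look like the following sketch. Several details are assumptions and not taken from this document: the way weights and architecture are combined into one fingerprint (length-prefixed concatenation here), the receipt field name manifest_nullifier, and the helper names. Signing of the manifest is omitted.

```python
import hashlib
import secrets
import time

def model_fingerprint(weights: bytes, architecture: bytes) -> str:
    """SHA-256 over architecture and weights, each length-prefixed."""
    h = hashlib.sha256()
    for part in (architecture, weights):
        # The length prefix keeps the boundary between the parts unambiguous.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return "sha256:" + h.hexdigest()

def build_manifest(model_id, weights, architecture, quantization,
                   serving_runtime, infrastructure) -> dict:
    """Unsigned Model Manifest with the fields listed in Section 5.4."""
    return {
        "model_id": model_id,
        "model_fingerprint": model_fingerprint(weights, architecture),
        "quantization": quantization,
        "serving_runtime": serving_runtime,
        "infrastructure": infrastructure,
        "iat": int(time.time()),
        "nullifier": secrets.token_hex(32),
    }

def receipt_matches_manifest(receipt: dict, manifest: dict) -> bool:
    """Check that a receipt points at this manifest and the same fingerprint."""
    # "manifest_nullifier" is a hypothetical field name for the binding.
    return (
        receipt.get("manifest_nullifier") == manifest["nullifier"]
        and receipt.get("model_fingerprint") == manifest["model_fingerprint"]
    )
```

A verifier that retrieves the manifest by nullifier can use receipt_matches_manifest to confirm that every receipt in a session refers to the same model configuration.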

5.5. P5 -- Crypto-Agile Signature Layer

Endpoints: POST /v1/afir/sign and GET /v1/afir/algorithms

P5 implements a crypto-agile signing endpoint supporting multiple post-quantum and classical signature algorithms under a single API surface. The algorithm is specified per-request and recorded in the receipt entry, making receipts from different algorithm generations cross-verifiable via the Merkle structure.

Table 1: P5 Supported Algorithms
Algorithm	Status	Standard	Notes
ML-DSA-65	Active	NIST FIPS 204	Primary, post-quantum
ML-DSA-44	Active	NIST FIPS 204	Compact, post-quantum
Ed25519	Active	RFC 8032	Classical, transition support
SLH-DSA	Reserved	NIST FIPS 205	Planned
FN-DSA	Reserved	NIST FIPS 206	Planned

Algorithm negotiation follows the same model as TLS cipher suite negotiation. When a customer needs to upgrade from ML-DSA-65 to a future algorithm, they change a single configuration field. Prior receipts remain verifiable under their original algorithm.
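On the verifier side, crypto-agility amounts to dispatching on the algorithm recorded in each receipt entry. The sketch below shows one way to structure that dispatch. The VerifierRegistry class is hypothetical, and the verifier callables are supplied by the caller, since no particular ML-DSA or Ed25519 library is assumed here.

```python
from typing import Callable, Dict

# Status values mirror Table 1.
ALGORITHM_STATUS = {
    "ML-DSA-65": "active",
    "ML-DSA-44": "active",
    "Ed25519": "active",
    "SLH-DSA": "reserved",
    "FN-DSA": "reserved",
}

# A verifier takes (public_key, message, signature) and returns True or False.
Verifier = Callable[[bytes, bytes, bytes], bool]

class VerifierRegistry:
    def __init__(self) -> None:
        self._verifiers: Dict[str, Verifier] = {}

    def register(self, algorithm: str, verifier: Verifier) -> None:
        """Accept verifiers only for algorithms marked active."""
        if ALGORITHM_STATUS.get(algorithm) != "active":
            raise ValueError(f"{algorithm} is not an active algorithm")
        self._verifiers[algorithm] = verifier

    def verify(self, entry: dict, public_key: bytes,
               message: bytes, signature: bytes) -> bool:
        """Verify under the algorithm recorded in the receipt entry."""
        verifier = self._verifiers.get(entry["algorithm"])
        if verifier is None:
            return False  # no verifier configured for this algorithm
        return verifier(public_key, message, signature)
```

Because the algorithm travels with each entry, a registry that still holds the older verifier continues to validate prior receipts after the signing configuration has moved to a newer algorithm.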

Table 2: AFiR Tiered Verification
Risk Level	Actor Chain	Intent Chain	Inference Chain
Low	Sync	Skip	Skip
Medium	Sync	Cached proof	AFiR signature check (<1ms)
High	Sync	Full	AFiR + on-chain anchor (~7ms)
Critical	Sync	Full	AFiR + on-chain + ZKML/TEE
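The tiers in Table 2 can be expressed as a lookup that a verifier consults before deciding which inference-chain checks to run. The dictionary keys and check labels below are hypothetical names for the table's rows and cells, not identifiers defined by AFiR.

```python
# Keys and check names are illustrative labels for the rows of Table 2.
VERIFICATION_TIERS = {
    "low": {"actor": "sync", "intent": "skip", "inference": ()},
    "medium": {"actor": "sync", "intent": "cached",
               "inference": ("afir_signature",)},
    "high": {"actor": "sync", "intent": "full",
             "inference": ("afir_signature", "on_chain_anchor")},
    "critical": {"actor": "sync", "intent": "full",
                 "inference": ("afir_signature", "on_chain_anchor", "zkml_or_tee")},
}

def inference_checks(risk_level: str) -> tuple:
    """Return the inference-chain checks a verifier runs at this risk level."""
    tier = VERIFICATION_TIERS.get(risk_level.lower())
    if tier is None:
        raise ValueError(f"unknown risk level: {risk_level!r}")
    return tier["inference"]
```

Rejecting an unknown risk level, rather than defaulting to the lowest tier, avoids silently skipping verification when a caller passes an unexpected value.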

10. Security Considerations

10.1. Post-Quantum Security Basis

ML-DSA-65 is secure under the hardness of the Module Learning With Errors (MLWE) problem and a variant of the Module Short Integer Solution (MSIS) problem, both of which are believed to be hard for classical and quantum computers. NIST standardized ML-DSA in FIPS 204 [FIPS204] (August 2024) following an eight-year public evaluation process. The security basis of AFiR signatures is mathematical (lattice hardness), not hardware-rooted. Both trust bases are valid; they are appropriate for different deployment contexts and threat models.

10.2. On-Chain Anchoring and Tamper Evidence

The Base Mainnet on-chain anchor provides tamper evidence independent of AFiR operator infrastructure. An adversary wishing to forge an AFiR receipt for a past session must either forge an ML-DSA-65 signature (computationally infeasible under MLWE hardness) or rewrite the anchored Base Mainnet history. Base is a Layer 2 rollup that settles to Ethereum, so rewriting an anchor that has been finalized on Ethereum would require defeating Ethereum's proof-of-stake finality, which is economically rather than computationally infeasible. Neither attack is feasible under standard assumptions.

10.3. Threat Coverage Compared to ZKML and TEE

Table 3: Threat Coverage by Proof Type
Threat	ZKML	TEE	AFiR
Model substitution	Yes	Yes	P4
Weight tampering	Yes	Yes	P4
Environment spoofing	No	Yes	No*
Replay of stale proofs	Yes	Yes	Yes
Tool call repudiation	No	No	P1
Cache poisoning	No	No	P3
Cross-agent chain break	No	No	P2
Output repudiation	Yes	Yes	Yes

* AFiR does not provide hardware-rooted proof that inference ran inside an isolated enclave. For deployments requiring environment isolation proof, TEE entries SHOULD be used for the relevant chain segments, potentially coexisting with AFiR entries as described in Section 9.

10.4. Key Management

AFiR signing keys MUST be generated as ML-DSA-65 key pairs per FIPS 204, stored in a key management system with access logging, rotated on a configurable schedule (90 days RECOMMENDED), and bound to a single operator identity per key pair. Public keys SHOULD be published in a discoverable registry to allow verifiers to retrieve the full public key given the public_key_hint in an AFiR receipt entry.
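Two of these requirements lend themselves to a short sketch: the rotation schedule and key lookup by hint. The function names and the in-memory registry (a mapping from operator identity to public key bytes) are hypothetical. A production registry would be a network service.

```python
from datetime import datetime, timedelta

ROTATION_PERIOD = timedelta(days=90)  # RECOMMENDED value from this section

def needs_rotation(created_at: datetime, now: datetime,
                   period: timedelta = ROTATION_PERIOD) -> bool:
    """True once a key has been in service for the full rotation period."""
    return now - created_at >= period

def resolve_public_key(hint: str, registry: dict[str, bytes]) -> bytes:
    """Return the single registered key whose hex encoding starts with `hint`."""
    wanted = hint.lower()
    matches = [pk for pk in registry.values() if pk.hex().startswith(wanted)]
    # Zero matches means an unknown key; several mean the hint is ambiguous.
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} keys match hint {hint!r}")
    return matches[0]
```

Treating an ambiguous hint as an error, instead of returning the first match, prevents a verifier from checking a signature against the wrong operator's key.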

12. References

12.1. Normative References

[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]: Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8785]: Rundgren, A., Jordan, B., and S. Erdtman, "JSON Canonicalization Scheme (JCS)", RFC 8785, June 2020, <https://www.rfc-editor.org/info/rfc8785>.
[FIPS204]: National Institute of Standards and Technology, "Module-Lattice-Based Digital Signature Standard", NIST FIPS 204, August 2024, <https://csrc.nist.gov/pubs/fips/204/final>.
[I-D.draft-mw-spice-inference-chain]: Krishnan, R., Prasad, A., Lopez, D., and S. Addepalli, "Cryptographically Verifiable Inference Chain for AI Agent Computational Provenance", Work in Progress, Internet-Draft, draft-mw-spice-inference-chain-00, March 2026, <https://datatracker.ietf.org/doc/html/draft-mw-spice-inference-chain-00>.
[I-D.draft-mw-spice-actor-chain]: Prasad, A., Krishnan, R., Lopez, D., and S. Addepalli, "Cryptographically Verifiable Actor Chains for OAuth 2.0 Token Exchange", Work in Progress, Internet-Draft, draft-mw-spice-actor-chain-05, April 2026, <https://datatracker.ietf.org/doc/html/draft-mw-spice-actor-chain-05>.
[I-D.draft-mw-spice-intent-chain]: Krishnan, R., Prasad, A., Lopez, D., and S. Addepalli, "Cryptographically Verifiable Intent Chain for AI Agent Content Provenance", Work in Progress, Internet-Draft, draft-mw-spice-intent-chain-00, March 2026, <https://datatracker.ietf.org/doc/html/draft-mw-spice-intent-chain-00>.

12.2. Informative References

[RFC9334]: Birkholz, H., Thaler, D., Richardson, M., Smith, N., and W. Pan, "Remote ATtestation procedureS (RATS) Architecture", RFC 9334, January 2023, <https://www.rfc-editor.org/info/rfc9334>.