<?xml version="1.0" encoding="UTF-8"?>
<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  ipr="trust200902"
  category="info"
  submissionType="independent"
  docName="draft-dpa-uzpif-outbound-indexing-01"
  sortRefs="true"
  symRefs="true"
  tocInclude="true"
  version="3">

  <front>
    <title abbrev="UZPIF-OI">UZPIF Outbound Indexing for Search Engines and AI</title>

    <author fullname="Benjamin Anthony Fisher" initials="B.A." surname="Fisher">
      <organization abbrev="DPA R&amp;D">DPA R&amp;D Ltd (https://www.dpa-cloud.co.uk)</organization>
      <address>
        <email>b.fisher@dpa-cloud.co.uk</email>
        <uri>https://orcid.org/0009-0004-4412-2269</uri>
      </address>
    </author>

    <date year="2026" month="March" day="16"/>

    <keyword>UZPIF</keyword>
    <keyword>outbound indexing</keyword>
    <keyword>consent-first discovery</keyword>
    <keyword>search engines</keyword>
    <keyword>AI indexing</keyword>
    <keyword>zero-port</keyword>
    <keyword>identity-bound grants</keyword>

    <abstract>
      <t>
        This document proposes an outbound, opt-in mechanism for web content discovery and indexing,
        complementing or replacing traditional inbound crawling models such as those governed by the
        Robots Exclusion Protocol (REP; <xref target="RFC9309"/>).
        In the proposed approach, servers proactively initiate authenticated outbound connections to trusted
        indexers (search engines or AI systems) using identity-bound grants, enabling explicit consent for
        indexing, freshness signalling, and content usage policy communication.
      </t>

      <t>
        The mechanism integrates with identity-centric frameworks such as the Universal Zero-Port Interconnect
        Framework (UZPIF; <xref target="UZPIF"/>) and supports both traditional search engines and AI-driven
        indexing and retrieval systems. This document is part of an experimental, research-oriented Independent
        Stream suite and defines the current normative baseline for trust objects, validation rules, and security
        semantics within its scope. Hard interoperability is expected for shared object semantics and validation
        rules. Full wire-level, clustering, and proof-family interoperability is not yet claimed; the
        remaining details are intentionally profile-defined or deferred. This revision defines semantic transparency
        objects and baseline evaluation rules, and leaves append-only proof-family interoperability to deployment
        profiles. The design aims to reduce unsolicited crawling abuse and improve signal quality for authorised
        indexers without claiming universal control over discoverability.
      </t>
    </abstract>
  </front>

  <middle>

    <section anchor="scope-and-status" toc="include">
      <name>Scope and Status</name>
      <t>
        This Internet-Draft is part of an experimental, research-oriented suite prepared for the Independent Stream.
        It is published to enable structured technical review, interoperability discussion, and disciplined
        specification development around outbound, consent-first indexing mechanisms for UZPIF-style transports.
      </t>
      <t>
        Within that suite, this document defines the current normative baseline for trust objects, validation rules,
        and security semantics for outbound, consent-first indexing over UZPIF-style transports, especially Discovery
        Grants, policy objects, and freshness signalling. Hard interoperability is expected for shared object
        semantics and validation rules.
      </t>
      <t>
        The material is a research artefact.
        It does not claim technical completeness, production readiness, or endorsement by the IETF or any other
        standards body, and it is not presented as a standards-track specification.
      </t>
      <t>
        Full wire-level, clustering, and proof-family interoperability is not yet claimed. Message
        encodings, transport bindings, proof families, and deployment profiles remain intentionally profile-defined or
        deferred. This draft should therefore not be read as claiming a fully closed wire-level system, universal
        discoverability control, or a solution to availability.
      </t>
      <t>
        It is designed for experimentation, operator feedback, and profile-driven deployments.
        It does not require changes to the HTTP protocol, but it can carry or reference HTTP-origin content.
      </t>
      <t>
        During conversion from internal research documents into IETF XML, care has been taken to:
      </t>
      <ul>
        <li><t>preserve a clear distinction between normative and informative content;</t></li>
        <li><t>use requirement language (e.g., "MUST", "SHOULD", "MAY") only where behaviour is intentionally specified;</t></li>
        <li><t>avoid any implication of registry finalisation, mandatory implementation, or standards-track status; and</t></li>
        <li><t>maintain intellectual-property neutrality, with no implied patent grants or licensing commitments beyond the IETF Trust copyright licence applicable to Internet-Draft text.</t></li>
      </ul>
      <t>
        Ongoing research, implementation, performance validation, and real-world pilot work remain outside the scope
        of this Internet-Draft text and may be pursued separately.
      </t>
    </section>

    <section anchor="executive-summary" toc="include">
      <name>Executive Summary</name>
      <t>
        This document defines an outbound indexing model in which content publishers (servers) initiate outbound,
        authenticated sessions to trusted indexers to advertise content availability, request refresh, and communicate
        explicit usage constraints.
        The model is intended as a complement to inbound crawling and robots.txt-based opt-out signalling.
      </t>
      <t>
        The core components are:
      </t>
      <ul>
        <li><t><strong>Outbound Discovery Session:</strong> A publisher-initiated secure session to an indexer service, established over UZPIF and secured using identity-bound handshakes (e.g., TLS-DPA).</t></li>
        <li><t><strong>Discovery Grant:</strong> An identity-bound, purpose-scoped authorisation object that grants an indexer permission to index, cache, summarise, or otherwise process specific content for specific purposes.</t></li>
        <li><t><strong>Policy Communication:</strong> Machine-readable statements about permitted uses (e.g., search indexing, snippet generation, retrieval augmentation, AI training), retention, attribution, and derivative generation.</t></li>
        <li><t><strong>Freshness Signalling:</strong> A method for publishers to explicitly request refresh or indicate change without exposing unauthenticated inbound endpoints.</t></li>
      </ul>
      <t>
        The proposal is compatible with legacy web publishing and can be adopted incrementally.
        It is especially suited to "zero-port" deployments where inbound crawling is undesirable or impossible.
      </t>
    </section>

    <section anchor="terminology">
      <name>Terminology</name>
      <t>
        <strong>Requirements Language:</strong>
        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
        "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted
        as described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when,
        they appear in all capitals, as shown here.
      </t>

      <t>
        This Internet-Draft is primarily exploratory; requirement language is used sparingly and only where
        behaviour is intentionally specified.
      </t>

      <dl newline="false">
        <dt>Publisher</dt>
        <dd><t>A server or origin that wishes to make content available for indexing.</t></dd>
        <dt>Indexer</dt>
        <dd><t>A service (e.g., search engine, AI retrieval system, dataset builder) that consumes content for indexing, ranking, retrieval, summarisation, training, or related processing.</t></dd>
        <dt>Indexing Node</dt>
        <dd><t>An independently operated Indexer instance participating in outbound indexing.</t></dd>
        <dt>Trusted Indexer</dt>
        <dd><t>An Indexer whose identity is known to, and explicitly authorised by, a Publisher via a Discovery Grant.</t></dd>
        <dt>Outbound Discovery Session</dt>
        <dd><t>A Publisher-initiated secure session to an Indexer service endpoint; within this session, discovery, policy, and content transfer messages may be exchanged.</t></dd>
        <dt>Discovery Grant</dt>
        <dd><t>A cryptographically bound, application-specific Grant profile that conveys consent and scope (what may be indexed, by whom, and for what purposes) while preserving the suite-level Grant semantics defined by UZPIF (<xref target="UZPIF"/>); see <xref target="discovery-grants"/>.</t></dd>
        <dt>Content Scope</dt>
        <dd><t>A set of resources to which a Discovery Grant or policy applies (e.g., URL set, content-hash set, semantic collection, feed).</t></dd>
        <dt>Usage Purpose</dt>
        <dd><t>A declared intent for automated processing, such as traditional search indexing, snippet generation, retrieval augmentation, or AI training.</t></dd>
        <dt>Freshness Signal</dt>
        <dd><t>A notification indicating that indexed content may have changed and that refresh is desired.</t></dd>
        <dt>Inclusion Log</dt>
        <dd><t>A transparency record stream or append-only structure that records content accepted for indexing under a declared scope and purpose; the exact append-only proof family is profile-defined.</t></dd>
        <dt>UZPIF Session</dt>
        <dd><t>A secure, identity-bound connectivity substrate defined by <xref target="UZPIF"/>, typically established via outbound connections to one or more Rendezvous Nodes.</t></dd>
      </dl>
    </section>

    <section anchor="introduction" toc="include">
      <name>Introduction</name>
      <t>
        Traditional web indexing relies on inbound crawling, where automated clients (crawlers) initiate connections
        to servers and respect opt-out signals such as robots.txt (as standardised in the Robots Exclusion Protocol;
        <xref target="RFC9309"/>).
        While effective for many years, this model exposes servers to unsolicited traffic and to abuse from
        malicious crawlers, and it makes publisher preferences hard to enforce, particularly as AI systems
        increasingly use crawled content for training or real-time retrieval.
      </t>

      <t>
        Recent developments, including crawler best practices (<xref target="draft-illyes-aipref-cbcp"/>) and
        discussions on AI-specific controls, highlight limitations of opt-out regimes.
        Inbound crawling assumes servers are reachable and willing to respond, which conflicts with emerging
        zero-port and zero-trust architectures.
      </t>

      <t>
        This document describes an outbound opt-in alternative: servers explicitly initiate authenticated connections
        to authorised indexers (traditional search engines or AI agents) when they wish to be discovered or refreshed.
        Discovery requests are bound to cryptographic identities, Discovery Grants, and policy artefacts, enabling
        fine-grained control over who may index content and for what purpose (e.g., traditional search, AI training,
        or summarisation).
      </t>

      <t>
        The approach builds on identity-first transports such as the Universal Zero-Port Interconnect Framework
        (UZPIF; <xref target="UZPIF"/>) and Universal Zero-Port Transport Protocol (UZP; <xref target="UZP"/>),
        where endpoints establish outbound-only sessions.
        It provides a proactive, consent-first model suited to both human-readable web content and AI-driven search
        consumption.
      </t>

      <t>
        This is a research proposal intended for experimentation and discussion, particularly in contexts where
        reducing inbound exposure and strengthening consent are priorities.
      </t>
      <t>
        This draft should therefore be read as part of an experimental, research-oriented Independent Stream suite
        and as the current normative baseline for trust objects, validation rules, and security semantics within its
        scope. Hard interoperability is expected for shared object semantics and validation rules. Full wire-level,
        clustering, and proof-family interoperability is not yet claimed; the remaining details are
        intentionally profile-defined or deferred. Outbound initiation can reduce unsolicited crawling exposure, but
        it does not by itself provide traffic-pattern privacy or universal discoverability control, and it does not
        solve rendezvous or indexer availability.
      </t>
    </section>

    <section anchor="problem-statement" toc="include">
      <name>Problem Statement</name>

      <t>
        The inbound crawling model is built on an assumption of open reachability: crawlers discover a server,
        initiate inbound connections, and learn policies after connecting.
        For modern deployments, especially those seeking to minimise their exposed attack surface, this inverts
        the desired trust model.
      </t>

      <t>
        Specific limitations include:
      </t>
      <ul>
        <li><t><strong>Unsolicited load and abuse:</strong> REP is advisory, and many automated clients do not comply. Even compliant crawlers can produce significant load when scaled across multiple indexers and AI agents.</t></li>
        <li><t><strong>Weak identity and accountability:</strong> A User-Agent string is not a strong identity; it is easy to spoof and difficult to bind to policy obligations.</t></li>
        <li><t><strong>Purpose ambiguity:</strong> The same content acquisition can be used for search indexing, summarisation, retrieval augmentation, or AI training. Without explicit purpose signalling and enforcement, site operators cannot make informed consent decisions.</t></li>
        <li><t><strong>Incompatibility with zero-port architectures:</strong> If a Publisher exposes no public inbound listening ports, inbound crawling becomes impossible by design.</t></li>
        <li><t><strong>Policy enforcement gaps:</strong> Opt-out mechanisms require a crawler to first connect and then choose to comply, rather than enforcing access authorisation at session establishment.</t></li>
      </ul>

      <t>
        The outbound indexing model aims to preserve the benefits of web discoverability while shifting control to the
        Publisher, making consent explicit, identity-bound, and enforceable within authenticated channels.
      </t>
    </section>

    <section anchor="design-goals" toc="include">
      <name>Design Goals</name>

      <t>
        The mechanism defined in this document has the following goals:
      </t>

      <ul>
        <li><t><strong>Opt-in discoverability:</strong> indexing and refresh occur only when the Publisher chooses to contact an Indexer.</t></li>
        <li><t><strong>Identity binding:</strong> all sessions are authenticated and bound to cryptographic identities, enabling durable accountability.</t></li>
        <li><t><strong>Purpose limitation:</strong> Publishers can grant indexing permission for specific purposes (e.g., search indexing) while denying others (e.g., AI training).</t></li>
        <li><t><strong>Policy expressiveness:</strong> Publishers can express usage constraints, retention expectations, attribution requirements, and derivative permissions.</t></li>
        <li><t><strong>Freshness signalling:</strong> Publishers can efficiently request refresh without being continuously crawled.</t></li>
        <li><t><strong>Transport independence:</strong> the mechanism should operate over UZPIF sessions and may be profiled for other outbound-friendly transports.</t></li>
        <li><t><strong>Incremental adoption:</strong> the mechanism complements existing protocols such as REP and does not require immediate ecosystem-wide migration.</t></li>
      </ul>
    </section>

    <section anchor="architectural-overview" toc="include">
      <name>Architectural Overview</name>

      <t>
        At a high level, outbound indexing replaces "Indexer-driven fetch" with a Publisher-initiated session that can
        carry announcements, policy, and (optionally) content.
        Under UZPIF, both Publisher and Indexer may maintain outbound connectivity to one or more Rendezvous Nodes (RNs),
        which stitch permitted sessions.
      </t>

      <figure anchor="fig-model">
        <name>Outbound indexing model: Publisher initiates the session to a trusted Indexer</name>
        <artwork><![CDATA[
Publisher (Site)          RN(s)             Trusted Indexer
  |-- outbound setup ----->|<-- outbound presence --|
  |<== identity-bound secure session (via RN stitch) ==>|
  |-- ANNOUNCE + GRANT + POLICY ------------------------->|
  |<-- (optional) REQUEST(resource set) ------------------|
  |-- CONTENT(resource set/deltas) ---------------------->|
  |<-- RECEIPT / STATUS ----------------------------------|
]]></artwork>
      </figure>

      <t>
        The session is initiated by the Publisher, but once established it is a bidirectional secure channel in which
        the Indexer may request specific resources and the Publisher may provide them.
        The key property is that the Publisher does not expose an unauthenticated public inbound service for discovery.
      </t>
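      <t>
        The exchange in <xref target="fig-model"/> can be read as an ordering constraint on message types.
        The following Python sketch is illustrative only. The message names are taken from the figure, with
        "ANNOUNCE" standing for the combined announcement, grant, and policy message. The state names, the
        transition table, and the function name are assumptions made for this example; they are not defined
        by this document or by UZPIF.
      </t>
      <figure>
        <name>Illustrative sketch: message ordering within an Outbound Discovery Session</name>
        <sourcecode type="python"><![CDATA[
```python
# Illustrative only. Message names follow the figure above; the
# state names and the transition table are assumptions of this
# sketch, not protocol definitions.
ALLOWED_NEXT = {
    "SESSION_ESTABLISHED": {"ANNOUNCE"},
    "ANNOUNCE": {"REQUEST", "CONTENT"},  # REQUEST is optional
    "REQUEST": {"CONTENT"},
    "CONTENT": {"RECEIPT"},
    "RECEIPT": set(),
}


def is_valid_exchange(messages):
    """Return True if messages follow the order in the figure."""
    state = "SESSION_ESTABLISHED"
    for message in messages:
        if message not in ALLOWED_NEXT[state]:
            return False
        state = message
    return state == "RECEIPT"
```
]]></sourcecode>
      </figure>
      <t>
        In this sketch, an exchange that omits the optional request is still accepted, while content sent
        before any announcement is rejected.
      </t>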

      <section anchor="roles-and-relationships">
        <name>Roles and Relationships</name>

        <t>
          The model distinguishes three relationships:
        </t>

        <ul>
          <li><t><strong>Publisher &lt;-&gt; Indexer:</strong> A trust and consent relationship expressed via Discovery Grants and enforced via authenticated sessions.</t></li>
          <li><t><strong>Publisher &lt;-&gt; RN:</strong> A connectivity relationship in which the Publisher maintains outbound sessions to one or more RNs, as defined in <xref target="UZPIF"/>.</t></li>
          <li><t><strong>Indexer &lt;-&gt; RN:</strong> An analogous connectivity relationship enabling stitching to Publishers that authorise the Indexer.</t></li>
        </ul>

        <t>
          This document focuses on Publisher-to-Indexer semantics and does not redefine UZPIF stitching or transport
          behaviour.
        </t>
      </section>
    </section>

    <section anchor="trust-and-identity" toc="include">
      <name>Trust and Identity Model</name>

      <t>
        Outbound indexing relies on cryptographic identities for both Publishers and Indexers.
        In UZPIF deployments, these identities are typically represented by certificates or equivalent credentials
        issued within an identity plane (e.g., Pantheon as described in <xref target="UZPIF"/>), and sessions are
        established over secure channels such as TLS-DPA (<xref target="TLS-DPA"/>).
      </t>

      <t>
        A Publisher MUST authenticate the Indexer identity before sending any content beyond minimal discovery metadata.
        An Indexer MUST authenticate the Publisher identity before accepting Discovery Grants, policy, or content.
      </t>

      <t>
        Identity binding serves two purposes:
      </t>

      <ul>
        <li><t><strong>Consent enforcement:</strong> Discovery Grants are bound to specific identities and cannot be meaningfully replayed by unauthorised parties.</t></li>
        <li><t><strong>Operational accountability:</strong> Publishers can select and audit trusted Indexers, and Indexers can maintain verifiable provenance of content acquisition.</t></li>
      </ul>

      <section anchor="indexer-discovery">
        <name>Discovering Indexer Service Endpoints</name>

        <t>
          This document does not mandate a single Indexer discovery mechanism.
          A Publisher may discover Indexer identities and endpoints through out-of-band agreements, operator-curated
          trust lists, or an identity plane such as Pantheon (<xref target="UZPIF"/>).
        </t>

        <t>
          A Publisher SHOULD treat Indexer discovery as a trust decision comparable to granting API access.
          Blind acceptance of unsolicited Indexer identities reintroduces abuse vectors that outbound indexing is
          intended to reduce.
        </t>
      </section>
    </section>

    <section anchor="decentralisation-and-index-governance" toc="include">
      <name>Decentralisation and Index Governance</name>

      <t>
        Outbound indexing MUST NOT require a central registry for Indexer discovery, listing eligibility, or
        participation.
      </t>

      <t>
        Indexing Nodes MAY operate independently.
        Federation between Indexers is voluntary and MAY be bilateral or multilateral according to local policy.
      </t>

      <t>
        No entity SHALL possess mandatory inclusion authority.
        No index SHALL be required for network participation.
      </t>
      <t>
        Participation in outbound indexing SHALL NOT be required for transport-layer operability.
      </t>

      <t>
        These constraints are intended to prevent gatekeeper capture of discovery and listing decisions.
      </t>

        <section anchor="index-transparency">
          <name>Index Transparency</name>

          <t>
          To support transparency and accountability, Indexers that claim baseline semantic transparency support
          MUST publish transparency artefacts that are publicly retrievable and individually signature-verifiable
          under the common signed artefact envelope defined by UZPIF (<xref target="UZPIF"/>).
          This revision defines transparency semantics, object types, and baseline evaluation rules for those
          artefacts. It standardises semantic transparency objects now while leaving append-only proof-family
          interoperability to deployment profiles.
          </t>

          <t>
          Deployment profiles are responsible for fixing the append-only structure, digest algorithm, proof
          algorithm, checkpoint format, and consistency verification rules before proof-level interoperability can
          be claimed.
          </t>

          <t>
          At minimum, an implementation claiming baseline semantic transparency support MUST support the
          following concrete object types:
          </t>

        <section anchor="common-log-envelope">
          <name>Common Log Profile</name>

          <t>
            Outbound indexing uses the common signed artefact envelope for transparency artefacts and defines a common
            log body profile within that envelope. Index Transparency Entries, Signed Checkpoints, and Revocation
            Acknowledgement artefacts MUST use this profile for baseline semantic interoperability and artefact
            handling.
          </t>
          <t>
            This profile does not redefine the suite envelope. It inherits canonical serialisation, exact signature
            coverage, object_id derivation, unknown-field and unknown-extension handling, signature ordering,
            algorithm identifier matching, epoch-versus-sequence precedence, and the rule that detached signatures
            are not part of baseline interoperability for these registered object types.
          </t>
          <t>
            This profile also does not by itself define checkpoint construction, inclusion-proof shape,
            consistency-proof shape, proof verification inputs, or proof failure behaviour.
            Profiles that require interoperable proof verification MUST fix the append-only structure, digest
            algorithm, proof algorithm, checkpoint format, consistency verification rules, and any inclusion-proof
            verification inputs they require.
          </t>

          <t>A minimal common log profile MUST carry:</t>
          <ul>
            <li><t>the Indexer identity;</t></li>
            <li><t>a log object type;</t></li>
            <li><t>a sequence number or checkpoint high-water mark;</t></li>
            <li><t>a previous hash where chaining applies;</t></li>
            <li><t>a timestamp;</t></li>
            <li><t>a payload digest or the object-specific payload fields;</t></li>
            <li><t>an optional checkpoint reference; and</t></li>
            <li><t>a valid suite-envelope signature set.</t></li>
          </ul>
          <t>
            This profile binds transparency entries, checkpoints, and acknowledgement artefacts into an accountable
            log history while inheriting the suite-wide envelope unchanged. Append-only proof verification remains
            profile-defined.
          </t>
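          <t>
            The minimal profile items listed above can be pictured as a record with a presence check.
            The following Python sketch is non-normative. Every field name, the use of Unix seconds for the
            timestamp, and the "missing_items" helper are assumptions made for illustration; the actual
            encoding is inherited from the suite envelope and is not defined here. The check covers presence
            only and does not verify signatures.
          </t>
          <figure>
            <name>Illustrative sketch: presence check for the minimal common log profile</name>
            <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommonLogBody:
    """Hypothetical field names for the minimal log profile."""
    indexer_id: str
    object_type: str
    sequence: Optional[int]   # or checkpoint high-water mark
    timestamp: Optional[int]  # Unix seconds (assumption)
    signatures: Tuple[bytes, ...] = ()  # envelope signature set
    payload_digest: Optional[str] = None
    payload_fields: Optional[dict] = None
    previous_hash: Optional[str] = None   # where chaining applies
    checkpoint_ref: Optional[str] = None  # optional in the profile


def missing_items(body: CommonLogBody, chained: bool) -> list:
    """Return the minimal-profile items that are absent.

    Presence only: signature validity is an envelope concern.
    """
    missing = []
    if not body.indexer_id:
        missing.append("indexer_id")
    if not body.object_type:
        missing.append("object_type")
    if body.sequence is None:
        missing.append("sequence")
    if chained and not body.previous_hash:
        missing.append("previous_hash")
    if body.timestamp is None:
        missing.append("timestamp")
    if not body.payload_digest and not body.payload_fields:
        missing.append("payload")
    if not body.signatures:
        missing.append("signatures")
    return missing
```
]]></sourcecode>
          </figure>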
        </section>

        <section anchor="index-transparency-entry">
          <name>Index Transparency Entry</name>

          <t>
            An Index Transparency Entry records a single indexing decision or state transition.
            For baseline semantic interoperability, it MUST use the common log profile with "object_type" set to
            "index-transparency-entry".
          </t>

          <t>A minimal Index Transparency Entry MUST carry the following object-specific fields in addition to the common log profile:</t>
          <ul>
            <li><t>a decision type: include, exclude, revoke-ack, delete-ack, or update;</t></li>
            <li><t>a content-scope hash or object identifier;</t></li>
            <li><t>the relevant grant identifier;</t></li>
            <li><t>the relevant policy hash; and</t></li>
            <li><t>any object-specific status needed to interpret the decision.</t></li>
          </ul>
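          <t>
            As a non-normative illustration, the object-specific fields above could be checked as follows.
            The decision type values are taken from the list above. The dictionary keys ("decision",
            "scope_ref", "grant_id", "policy_hash") and the "entry_problems" helper are hypothetical names
            chosen for this Python sketch.
          </t>
          <figure>
            <name>Illustrative sketch: checking Index Transparency Entry fields</name>
            <sourcecode type="python"><![CDATA[
```python
# Decision types as listed in this section.
DECISION_TYPES = frozenset(
    {"include", "exclude", "revoke-ack", "delete-ack", "update"}
)

# Hypothetical key names for the object-specific fields.
REQUIRED_FIELDS = ("decision", "scope_ref", "grant_id", "policy_hash")


def entry_problems(body: dict) -> list:
    """Return human-readable problems found in an entry body."""
    problems = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if not body.get(name)
    ]
    decision = body.get("decision")
    if decision and decision not in DECISION_TYPES:
        problems.append(f"unknown decision type {decision!r}")
    return problems
```
]]></sourcecode>
          </figure>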
        </section>

        <section anchor="signed-checkpoint">
          <name>Signed Checkpoint</name>

          <t>
            A Signed Checkpoint provides a compact signed summary of the current transparency state. Relying
            parties can reference that state and, under a profile-defined proof family, evaluate log continuity
            without replaying the entire log.
            For baseline semantic interoperability, it MUST use the common log profile with "object_type" set to
            "signed-checkpoint".
          </t>

          <t>A minimal Signed Checkpoint MUST carry the following object-specific fields in addition to the common log profile:</t>
          <ul>
            <li><t>a tree size or sequence high-water mark;</t></li>
            <li><t>a root hash or chain-tip hash; and</t></li>
            <li><t>any checkpoint algorithm identifier required to interpret the root or tip hash.</t></li>
          </ul>
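          <t>
            One possible reading of a chain-tip hash is a running hash over log entries. The following Python
            sketch assumes SHA-256 and a simple hash chain. The algorithm choice, the field names, the genesis
            value, and the "sha256-chain" identifier are all hypothetical, because this document leaves the
            append-only structure and digest algorithm to deployment profiles.
          </t>
          <figure>
            <name>Illustrative sketch: a hash-chain checkpoint under an assumed profile</name>
            <sourcecode type="python"><![CDATA[
```python
import hashlib

# Assumed starting value for an empty log (not defined by the draft).
GENESIS = hashlib.sha256(b"").digest()


def chain_tip(entries, start=GENESIS):
    """Fold serialised entries into a running SHA-256 chain tip."""
    tip = start
    for entry in entries:
        # The tip has a fixed length, so tip + entry is unambiguous.
        tip = hashlib.sha256(tip + entry).digest()
    return tip


def checkpoint_body(entries):
    """Build the object-specific checkpoint fields for a log."""
    return {
        "object_type": "signed-checkpoint",
        "sequence_high_water": len(entries),
        "chain_tip_hash": chain_tip(entries).hex(),
        "checkpoint_alg": "sha256-chain",  # hypothetical identifier
    }
```
]]></sourcecode>
          </figure>
          <t>
            Under these assumptions, a relying party that holds an earlier tip can extend it with only the
            newer entries and compare the result with the tip in a later checkpoint, without replaying the
            entire log.
          </t>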
        </section>

        <section anchor="transparency-verification-procedure">
          <name>Transparency Evaluation and Profile-Defined Verification</name>

          <t>
            A client, Publisher, or auditor evaluating transparency artefacts SHOULD perform baseline semantic
            validation (steps 1 and 2 below). If a deployment profile defines an append-only proof family, the
            evaluator SHOULD also perform the applicable profile-defined proof verification (step 3):
          </t>
          <ol>
            <li>
              <t><strong>Signature check:</strong> verify the signature on the Index Transparency Entry, the Signed Checkpoint (<xref target="signed-checkpoint"/>), any associated Indexer Receipt (<xref target="indexer-receipt"/>), and any Revocation Acknowledgement artefact (<xref target="revocation-acknowledgement"/>) against the authenticated Indexer identity.</t>
            </li>
            <li>
              <t><strong>Object match and freshness check:</strong> verify that the entry and any associated receipt or revocation acknowledgement match the declared scope, the relevant grant identifier, the applicable policy hash or policy object, and the expected freshness or sequence context.</t>
            </li>
            <li>
              <t><strong>Profile-defined proof check:</strong> if the deployment profile defines an append-only proof family, verify checkpoint construction and any inclusion or consistency proofs using that profile's append-only structure, digest algorithm, proof algorithm, checkpoint format, verification inputs, and failure rules.</t>
            </li>
          </ol>
          <t>
            If the baseline semantic checks fail, the artefact set MUST NOT be treated as valid accountability
            evidence for the relevant scope.
          </t>
          <t>
            If profile-defined proof verification is unavailable or fails, the artefact set MUST NOT be treated as
            interoperably verified evidence of append-only consistency.
            It MAY still be used as signed local accountability input under deployment policy.
          </t>
          <t>
            Authenticity alone is insufficient for indexing authority.
            A well-signed Discovery Grant, Signed Checkpoint, Index Transparency Entry, Indexer Receipt, or
            Revocation Acknowledgement artefact MUST also be evaluated for freshness, scope, declared sequence or checkpoint
            position, and current policy eligibility before it is relied upon as current authority or accountability
            evidence.
          </t>
          <t>
            Clients MAY use these artefacts when evaluating Indexer transparency, accountability, detectability of
            divergence, and profile-defined consistency posture.
            They MUST NOT treat the log alone as objective proof of fairness, neutral ranking, or justified
            inclusion decisions.
          </t>
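          <t>
            The way the three checks combine can be sketched as a small decision function. The Python below is
            non-normative: the "Outcome" names and the "evaluate" signature are assumptions made for this
            example, and the individual checks (signature verification, object matching, proof verification)
            are represented only by their results.
          </t>
          <figure>
            <name>Illustrative sketch: combining baseline and profile-defined checks</name>
            <sourcecode type="python"><![CDATA[
```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for the three evaluation results."""
    INVALID = "not valid accountability evidence"
    LOCAL_ONLY = "signed local accountability input"
    VERIFIED = "interoperably verified append-only evidence"


def evaluate(signature_ok, matches_context, proof_result=None):
    """Combine the results of the three checks.

    proof_result is None when no profile-defined proof
    verification is available, otherwise True or False.
    """
    # Baseline semantic checks: signature, then match and freshness.
    if not (signature_ok and matches_context):
        return Outcome.INVALID
    # Only a successful profile-defined proof check upgrades the
    # artefact set beyond local accountability input.
    if proof_result is True:
        return Outcome.VERIFIED
    return Outcome.LOCAL_ONLY
```
]]></sourcecode>
          </figure>
          <t>
            In this sketch, whether a "LOCAL_ONLY" result is actually used remains a matter of deployment
            policy.
          </t>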
        </section>
      </section>
    </section>

    <section anchor="discovery-grants" toc="include">
      <name>Discovery Grants</name>

      <t>
        A Discovery Grant conveys explicit permission from a Publisher to an Indexer, scoped by content and purpose.
        It is an authorisation artefact, not merely a preference hint.
      </t>

      <t>
        A Discovery Grant is an application-specific Grant profile.
        It MUST preserve the baseline Grant semantics defined by UZPIF (<xref target="UZPIF"/>) while adding the
        indexing-specific body fields defined in this section.
      </t>

      <t>
        Discovery Grants MUST use the common signed artefact envelope defined by <xref target="UZPIF"/> with
        "object_type" set to "discovery-grant". This document defines only the additional indexing-specific fields
        and semantics carried in the object-specific body. Discovery Grants inherit the UZPIF common envelope
        unchanged, including canonical serialisation, exact signature coverage, object_id derivation, unknown-field
        and unknown-extension handling, signature ordering, algorithm identifier matching, epoch-versus-sequence
        precedence, and the rule that detached signatures are not part of baseline interoperability.
      </t>

      <section anchor="grant-properties">
        <name>Grant Properties</name>

        <t>
          A Discovery Grant MUST include at least:
        </t>

        <ul>
          <li><t><strong>Issuer:</strong> the Publisher identity that issues the grant.</t></li>
          <li><t><strong>Audience:</strong> the Indexer identity authorised to use the grant.</t></li>
          <li><t><strong>Scope:</strong> a Content Scope describing what may be indexed or retrieved.</t></li>
          <li><t><strong>Purposes:</strong> one or more Usage Purposes for which the Indexer is authorised.</t></li>
          <li><t><strong>Constraints:</strong> any limits the Publisher imposes, such as maximum fetch rate, retention period, or required attribution; this element may be empty.</t></li>
          <li><t><strong>Expiry:</strong> a time limit after which the grant is no longer valid.</t></li>
          <li><t><strong>Signature set:</strong> a valid suite-envelope signature set binding the grant to the Issuer.</t></li>
        </ul>

        <t>
          Discovery Grants MUST be bound to the authenticated identities observed in the Outbound Discovery Session.
          An Indexer MUST reject a grant if the authenticated Publisher identity is not the Issuer, or if the
          authenticated Indexer identity is not the intended Audience.
        </t>
        <t>
          In the common envelope, a Discovery Grant will normally use an issuer authority identifier associated with
          the Publisher's trust context, a subject identifier for the Publisher or relationship being authorised, an
          audience identifier for the intended Indexer, and scope and policy fields that capture the authorised content
          set and usage purposes.
        </t>
        <t>
          These fields populate the discovery-grant body only and MUST NOT redefine the suite envelope semantics.
        </t>
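        <t>
          As a non-normative illustration, the property list and the identity-binding rule above can be sketched in
          Python. The body field names ("issuer", "audience", "scope", "purposes", "expiry", "signatures"), the
          ISO 8601 expiry encoding, and the function name are assumptions made for this sketch; they are not defined
          by this document or by UZPIF. Signature verification is assumed to be handled by the envelope layer and is
          not shown.
        </t>
        <sourcecode type="python"><![CDATA[
```python
from datetime import datetime, timezone

# Hypothetical body field names; this document does not fix a wire syntax.
REQUIRED_FIELDS = ("issuer", "audience", "scope", "purposes", "expiry",
                   "signatures")


def grant_rejection_reasons(grant: dict, session_publisher: str,
                            session_indexer: str, now: datetime) -> list[str]:
    """Return reasons to reject the grant; an empty list means none found.

    Signature verification is assumed to be done by the envelope layer.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in grant]
    if missing:
        return [f"missing field: {name}" for name in missing]
    reasons = []
    if not grant["purposes"]:
        reasons.append("no Usage Purposes listed")
    if grant["issuer"] != session_publisher:
        reasons.append("authenticated Publisher is not the Issuer")
    if grant["audience"] != session_indexer:
        reasons.append("authenticated Indexer is not the Audience")
    # Expiry is assumed to be an ISO 8601 timestamp with an explicit offset.
    if now >= datetime.fromisoformat(grant["expiry"]):
        reasons.append("grant has expired")
    return reasons


example_grant = {
    "issuer": "publisher.example",
    "audience": "indexer.example",
    "scope": "https://example.com/docs/",
    "purposes": ["search.index"],
    "expiry": "2026-06-01T00:00:00+00:00",
    "signatures": ["(opaque)"],
}
now = datetime(2026, 3, 16, tzinfo=timezone.utc)
print(grant_rejection_reasons(example_grant, "publisher.example",
                              "indexer.example", now))
```
]]></sourcecode>
        <t>
          Returning a list of reasons rather than a boolean is a design choice of the sketch: it lets an Indexer log
          why a grant was refused, which may help when a Publisher audits the relationship.
        </t>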
      </section>

      <section anchor="grant-scope-examples">
        <name>Scope Examples</name>

        <t>
          Content Scope may be expressed in multiple ways, depending on deployment:
        </t>
        <ul>
          <li><t><strong>URL prefix scope:</strong> allow indexing for all resources under a given origin and path prefix.</t></li>
          <li><t><strong>Feed scope:</strong> allow indexing for resources enumerated in a signed feed.</t></li>
          <li><t><strong>Hash set scope:</strong> allow indexing for content objects identified by cryptographic hashes.</t></li>
          <li><t><strong>Semantic collection:</strong> allow indexing for a named collection (e.g., "docs", "blog", "product-catalogue") maintained by the Publisher.</t></li>
        </ul>
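        <t>
          The following non-normative Python sketch shows how an Indexer might test membership for two of the scope
          forms listed above: URL prefix scope and hash set scope. The function names and the choice of SHA-256 are
          assumptions for illustration. The sketch does not normalise case or percent-encoding, which a real
          deployment would need to address.
        </t>
        <sourcecode type="python"><![CDATA[
```python
import hashlib
from urllib.parse import urlsplit


def in_url_prefix_scope(url: str, origin: str, path_prefix: str) -> bool:
    """True if url is on the given origin and its path lies under the prefix."""
    parts = urlsplit(url)
    if f"{parts.scheme}://{parts.netloc}" != origin:
        return False
    # Refuse dot segments rather than trying to normalise them.
    if ".." in parts.path.split("/"):
        return False
    # Match on a segment boundary so "/docs" does not cover "/docs-private".
    base = path_prefix.rstrip("/")
    return parts.path == base or parts.path.startswith(base + "/")


def in_hash_set_scope(content: bytes, allowed_sha256: set[str]) -> bool:
    """True if the hex SHA-256 digest of content is in the allowed set."""
    return hashlib.sha256(content).hexdigest() in allowed_sha256
```
]]></sourcecode>
        <t>
          Matching on a path-segment boundary is one way to avoid the ambiguous scopes that the Security
          Considerations warn against.
        </t>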
      </section>
    </section>

    <section anchor="policy-communication" toc="include">
      <name>Policy Communication</name>

      <t>
        Outbound indexing provides a channel for Publishers to communicate content usage policy to Indexers in a form
        that is:
      </t>

      <ul>
        <li><t>bound to authenticated identities;</t></li>
        <li><t>associated with explicit Content Scope; and</t></li>
        <li><t>auditable and revocable.</t></li>
      </ul>

      <t>
        The policy data model is intentionally generic.
        Deployments MAY use the vocabulary defined by the IETF AIPREF working group (e.g., <xref target="draft-ietf-aipref-vocab"/>),
        or they MAY define private purpose tokens under bilateral agreement.
      </t>

      <section anchor="policy-elements">
        <name>Policy Elements</name>

        <t>
          A policy statement SHOULD be able to express:
        </t>
        <ul>
          <li><t><strong>Allowed purposes:</strong> which Usage Purposes are permitted (or denied) for a given scope.</t></li>
          <li><t><strong>Derivation:</strong> whether summaries, snippets, embeddings, or other derived artefacts are permitted.</t></li>
          <li><t><strong>Training:</strong> whether AI training or model fine-tuning is permitted.</t></li>
          <li><t><strong>Attribution:</strong> requirements for attribution or source linking in downstream displays.</t></li>
          <li><t><strong>Retention:</strong> permitted retention duration for cached copies or extracted features.</t></li>
          <li><t><strong>Redistribution:</strong> whether indexed content may be redistributed or provided to third parties.</t></li>
        </ul>

        <t>
          Policy is not an access-control mechanism by itself; its enforcement depends on the combination of Discovery Grants,
          authenticated sessions, and Indexer compliance.
          However, unlike REP, policy is exchanged in an authenticated context where non-compliance can be attributed to
          a specific identity.
        </t>
      </section>

      <section anchor="policy-example">
        <name>Illustrative Policy Example</name>

        <t>
          The following is a non-normative example of a policy object that permits traditional search indexing,
          snippet generation, and retrieval, but denies training and fine-tuning and limits retention to short periods:
        </t>

        <figure anchor="fig-policy-example">
          <name>Example policy object (illustrative)</name>
          <artwork>
{
  "scope": "https://example.com/docs/*",
  "allowed_purposes": [
    "search.index",
    "search.snippet",
    "rag.retrieve"
  ],
  "denied_purposes": ["ai.train", "ai.finetune"],
  "derivatives": {
    "summary": "allowed",
    "embeddings": "allowed",
    "snippets": "allowed"
  },
  "retention": {
    "cached_copy_days": 14,
    "embedding_days": 30
  },
  "attribution": {
    "required": true,
    "link_back": true
  }
}
</artwork>
        </figure>
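        <t>
          As a further non-normative illustration, an Indexer could evaluate a requested purpose against the example
          policy object shown above roughly as follows. The precedence rule (a denial overrides an allowance, and an
          unlisted purpose is treated as not permitted) is an assumption of this sketch, chosen to be consistent
          with the opt-in model; this document does not define it. The function names are likewise hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
```python
# Subset of the illustrative policy object shown above.
policy = {
    "scope": "https://example.com/docs/*",
    "allowed_purposes": ["search.index", "search.snippet", "rag.retrieve"],
    "denied_purposes": ["ai.train", "ai.finetune"],
    "retention": {"cached_copy_days": 14, "embedding_days": 30},
}


def purpose_permitted(policy: dict, purpose: str) -> bool:
    """Deny wins over allow; a purpose that is not listed is not permitted."""
    if purpose in policy.get("denied_purposes", []):
        return False
    return purpose in policy.get("allowed_purposes", [])


def retention_exceeded(policy: dict, kind: str, age_days: int) -> bool:
    """True if an artefact of the given kind is older than the policy permits."""
    limit = policy.get("retention", {}).get(kind)
    return limit is not None and age_days > limit
```
]]></sourcecode>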

        <t>
          The internal syntax of policy objects remains open for profiling and experimentation.
          However, when a policy object is referenced by an Index Transparency Entry, Signed Checkpoint, Indexer
          Receipt, or Revocation Acknowledgement, it MUST be identifiable by a stable policy hash or equivalent
          canonical reference.
          A deployment MAY sign policy objects as part of a Discovery Grant, or MAY carry them as a separate signed
          artefact within the session.
        </t>
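        <t>
          One possible way to derive a stable policy hash is sketched below in Python. The serialisation used here
          (JSON with sorted keys and no insignificant whitespace) and the "sha256:" prefix are stand-ins chosen for
          illustration only; interoperable deployments would be expected to use the canonical serialisation defined
          by UZPIF (<xref target="UZPIF"/>), which this sketch does not attempt to reproduce.
        </t>
        <sourcecode type="python"><![CDATA[
```python
import hashlib
import json


def policy_hash(policy: dict) -> str:
    """Hash a policy object over a deterministic JSON encoding.

    Assumption: sorted keys and compact separators stand in for the
    canonical serialisation that the envelope specification defines.
    """
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()
```
]]></sourcecode>
        <t>
          Because the encoding is deterministic, two parties that hold the same policy object obtain the same
          reference regardless of the key order in which the object was originally written.
        </t>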
      </section>
    </section>

    <section anchor="protocol-operation" toc="include">
      <name>Protocol Operation</name>

      <t>
        This section describes a baseline operational sequence.
        Session message framing remains intentionally abstract in this version of the document; the focus is on
        semantics and security properties.
        This abstraction does not apply to Index Transparency Entries, Signed Checkpoints
        (<xref target="signed-checkpoint"/>), Indexer Receipts (<xref target="indexer-receipt"/>), or Revocation
        Acknowledgement artefacts (<xref target="revocation-acknowledgement"/>), which use the interoperable object
        formats defined in this document.
      </t>

      <section anchor="session-establishment">
        <name>Session Establishment</name>

        <t>
          A Publisher initiates an Outbound Discovery Session to an Indexer endpoint using UZPIF connectivity
          (<xref target="UZPIF"/>) and an identity-bound secure channel (e.g., TLS-DPA; <xref target="TLS-DPA"/>).
        </t>

        <t>
          The Publisher MUST verify the authenticated Indexer identity before sending any sensitive content.
          The Indexer MUST verify the authenticated Publisher identity before accepting Discovery Grants, policy, or
          indexing data.
        </t>
      </section>

      <section anchor="announcement-and-grant-presentation">
        <name>Announcement and Grant Presentation</name>

        <t>
          Once the secure session is established, the Publisher sends:
        </t>

        <ul>
          <li><t>a discovery announcement identifying the Publisher, site scope, and desired indexing actions (new discovery, refresh, or revocation);</t></li>
          <li><t>one or more Discovery Grants defined in <xref target="discovery-grants"/>; and</t></li>
          <li><t>associated policy statements for the relevant Content Scopes.</t></li>
        </ul>

        <t>
          The Indexer MUST validate Discovery Grants and reject any announcement that lacks sufficient authorisation for
          the requested purposes.
        </t>
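        <t>
          The authorisation check above can be sketched, non-normatively, as a set difference between the purposes
          an announcement requests and the purposes that the presented grants authorise. The "purposes" field name
          and both function names are assumptions. The grants passed in are assumed to have been validated already
          and to cover the announced Content Scope.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def uncovered_purposes(requested: set[str], grants: list[dict]) -> set[str]:
    """Return the requested purposes that no presented grant authorises."""
    granted = set()
    for grant in grants:
        granted.update(grant.get("purposes", []))
    return requested - granted


def accept_announcement(requested: set[str], grants: list[dict]) -> bool:
    """Accept only if every requested purpose is covered by some grant."""
    return not uncovered_purposes(requested, grants)
```
]]></sourcecode>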
      </section>

      <section anchor="content-transfer-modes">
        <name>Content Transfer Modes</name>

        <t>
          This document defines two conceptual modes for content transfer:
        </t>

        <ul>
          <li><t><strong>Publisher Push:</strong> the Publisher proactively sends content objects or deltas to the Indexer.</t></li>
          <li><t><strong>Indexer Request within Session:</strong> the Indexer requests specific resources inside the established session, and the Publisher responds over the same channel.</t></li>
        </ul>

        <t>
          Both modes preserve the "no unauthenticated inbound ports" property because the Indexer does not initiate a new
          network connection to the Publisher.
        </t>

        <t>
          A Publisher MAY choose to offer only one mode.
          For example, a Publisher with strict egress policy may prefer request/response within a session, while a
          Publisher with pre-generated feeds may prefer push.
        </t>
      </section>

      <section anchor="freshness-signals">
        <name>Freshness Signalling</name>

        <t>
          A Publisher MAY send Freshness Signals to request that an Indexer refresh previously indexed content.
          Freshness Signals SHOULD be lightweight and SHOULD include enough metadata for the Indexer to prioritise work
          (e.g., change timestamps, object identifiers, or hashes).
        </t>

        <t>
          Freshness Signals do not, by themselves, grant permission; they operate under the permissions already
          established by Discovery Grants and policy.
        </t>
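        <t>
          A non-normative Python sketch of how an Indexer might turn Freshness Signals into a refresh plan follows.
          The "FreshnessSignal" record, its field names, and the Unix-time encoding are assumptions. The sketch
          drops signals for objects outside the already granted set, reflecting that a signal grants nothing by
          itself, skips objects whose hash is unchanged, and orders the remainder most recent first.
        </t>
        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreshnessSignal:
    object_id: str
    changed_at: int    # seconds since the Unix epoch (assumed encoding)
    content_hash: str


def plan_refresh(signals: list[FreshnessSignal],
                 indexed_hashes: dict[str, str],
                 granted_ids: set[str]) -> list[FreshnessSignal]:
    """Keep granted objects whose hash changed, most recent change first."""
    pending = [
        s for s in signals
        if s.object_id in granted_ids
        and indexed_hashes.get(s.object_id) != s.content_hash
    ]
    return sorted(pending, key=lambda s: s.changed_at, reverse=True)
```
]]></sourcecode>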
      </section>

      <section anchor="receipts-and-auditability">
        <name>Receipts and Auditability</name>

        <t>
          An Indexer MAY provide receipts indicating that it has accepted content for indexing and the purposes under
          which it will be processed.
          Receipts can improve auditability and facilitate contractual enforcement in commercial relationships.
          Their relationship to Signed Checkpoints is evaluated using
          <xref target="transparency-verification-procedure"/>.
        </t>

        <t>
          To make receipts interoperable, this document defines a minimal Indexer Receipt object.
          For interoperable exchange, it MUST use the common signed artefact envelope defined by UZPIF
          (<xref target="UZPIF"/>) with "object_type" set to "indexer-receipt".
        </t>

        <section anchor="indexer-receipt">
          <name>Indexer Receipt Format</name>

          <t>A minimal Indexer Receipt MUST carry:</t>
          <ul>
            <li><t>the Publisher identity;</t></li>
            <li><t>the Indexer identity;</t></li>
            <li><t>the grant identifier;</t></li>
            <li><t>the scope hash;</t></li>
            <li><t>the declared purpose or purposes;</t></li>
            <li><t>a timestamp; and</t></li>
            <li><t>a valid suite-envelope signature set.</t></li>
          </ul>
          <t>
            Receipts MUST be matchable to the relevant Index Transparency Entry and Signed Checkpoint so that a
            relying party can confirm that the declared grant, scope, and purposes align with logged behaviour.
          </t>
          <t>
            Profiles MAY extend receipts with retention commitments, policy hashes, receipt identifiers, or
            processing-mode declarations.
          </t>
          <t>
            These fields populate the indexer-receipt body only and MUST NOT redefine the suite envelope semantics.
            Indexer Receipts inherit the UZPIF common envelope unchanged, including canonical serialisation, exact
            signature coverage, object_id derivation, unknown extension handling, signature ordering, algorithm
            identifier matching, epoch-versus-sequence precedence, and the rule that detached signatures are not part
            of baseline interoperability.
          </t>
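          <t>
            As a non-normative illustration, matching a receipt body against a logged entry might look like the
            Python sketch below. All field names are hypothetical, and the comparison rule (exact equality of the
            linking fields and of the purpose sets) is an assumption of the sketch; the authoritative procedure is
            the one referenced in <xref target="transparency-verification-procedure"/>. Signature validation is
            not shown.
          </t>
          <sourcecode type="python"><![CDATA[
```python
# Hypothetical names for the minimal receipt body fields listed above.
RECEIPT_FIELDS = ("publisher", "indexer", "grant_id", "scope_hash",
                  "purposes", "timestamp", "signatures")
# Fields assumed to be shared with the Index Transparency Entry.
LINK_FIELDS = ("publisher", "indexer", "grant_id", "scope_hash")


def receipt_matches_entry(receipt: dict, entry: dict) -> bool:
    """True if the receipt is complete and agrees with the logged entry."""
    if any(name not in receipt for name in RECEIPT_FIELDS):
        return False
    if any(receipt[name] != entry.get(name) for name in LINK_FIELDS):
        return False
    return set(receipt["purposes"]) == set(entry.get("purposes", []))
```
]]></sourcecode>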
        </section>
      </section>

      <section anchor="revocation">
        <name>Revocation</name>

        <t>
          A Publisher MUST be able to revoke consent.
          Revocation may apply to a specific grant, a policy scope, or an entire Publisher-to-Indexer relationship.
        </t>

        <t>
          Upon receiving a valid revocation instruction from an authenticated Publisher, an Indexer SHOULD cease further
          acquisition under the revoked scope and SHOULD follow the Publisher's stated retention and deletion policy.
        </t>
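        <t>
          The three revocation granularities described above (a specific grant, a policy scope, or the whole
          relationship) can be sketched non-normatively in Python. The "kind" and "target" fields, and the
          "grant_id" and "scope_hash" names, are hypothetical. The revocation is assumed to have been received from
          an authenticated Publisher before this check is made.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def revocation_applies(revocation: dict, grant: dict) -> bool:
    """True if an authenticated revocation covers the given grant."""
    # A revocation only affects grants in the same relationship.
    if (revocation["publisher"] != grant["issuer"]
            or revocation["indexer"] != grant["audience"]):
        return False
    kind = revocation["kind"]
    if kind == "relationship":
        return True
    if kind == "grant":
        return revocation["target"] == grant["grant_id"]
    if kind == "scope":
        return revocation["target"] == grant["scope_hash"]
    # Unknown kinds are ignored here; a profile might instead reject them.
    return False
```
]]></sourcecode>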
        <t>
          If revocation is expressed as a signed object rather than an in-band session instruction, and interoperable
          exchange is required, it MUST use the common signed artefact envelope defined by UZPIF
          (<xref target="UZPIF"/>) with "object_type" set to "revocation", and it MUST carry scope and epoch or
          sequence values sufficient for freshness and conflict handling.
          Deployments that require interoperable signed revocation objects or quorum-backed revocation evidence SHOULD
          align those objects with the Revocation Signal and Threshold-Consensus Evidence models defined by TLS-DPA
          (<xref target="TLS-DPA"/>).
        </t>
        <t>
          A Publisher SHOULD treat revocation as an operational and legal relationship issue.
          Technical signalling can communicate intent and scope, but enforcement ultimately depends on Indexer compliance.
        </t>
        <section anchor="revocation-acknowledgement">
          <name>Revocation Acknowledgement Artefact</name>

          <t>
            To support accountable revocation handling, an Indexer that accepts and processes a revocation request
            SHOULD emit a Revocation Acknowledgement artefact.
            For baseline semantic interoperability, this artefact MUST be expressed either as an object using the
            common log profile with "object_type" set to "revocation-acknowledgement", or as an Index Transparency
            Entry with a decision type of "revoke-ack" or "delete-ack".
          </t>
          <t>A minimal Revocation Acknowledgement artefact MUST carry:</t>
          <ul>
            <li><t>the Publisher identity;</t></li>
            <li><t>the Indexer identity;</t></li>
            <li><t>the referenced revocation signal, grant identifier, or policy identifier;</t></li>
            <li><t>the affected scope hash or object identifier;</t></li>
            <li><t>an acknowledgement type of revoke-ack or delete-ack;</t></li>
            <li><t>a processing timestamp;</t></li>
            <li><t>an optional checkpoint reference; and</t></li>
            <li><t>a valid suite-envelope signature set.</t></li>
          </ul>
          <t>
            A relying party SHOULD be able to validate the acknowledgement's signatures, scope linkage, and any
            profile-defined checkpoint linkage using the procedure in
            <xref target="transparency-verification-procedure"/>.
          </t>
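          <t>
            A non-normative Python sketch of a structural check on an acknowledgement body follows. The field
            names are hypothetical stand-ins for the items listed above. The checkpoint reference is optional and
            is therefore not checked, and signature validation is out of scope for the sketch.
          </t>
          <sourcecode type="python"><![CDATA[
```python
ACK_TYPES = {"revoke-ack", "delete-ack"}
# Hypothetical names for the mandatory items; "checkpoint" is optional.
ACK_REQUIRED = ("publisher", "indexer", "reference", "affected",
                "ack_type", "processed_at", "signatures")


def ack_problems(ack: dict) -> list[str]:
    """Return structural problems found in an acknowledgement body."""
    problems = [f"missing field: {name}"
                for name in ACK_REQUIRED if name not in ack]
    if "ack_type" in ack and ack["ack_type"] not in ACK_TYPES:
        problems.append("unknown acknowledgement type")
    return problems
```
]]></sourcecode>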
        </section>
      </section>
    </section>

    <section anchor="relationship-to-existing-mechanisms" toc="include">
      <name>Relationship to Existing Mechanisms</name>

      <t>
        Outbound indexing is designed to be complementary to existing web controls and does not attempt to obsolete them.
      </t>

      <section anchor="rep-relationship">
        <name>Relationship to the Robots Exclusion Protocol</name>

        <t>
          REP (<xref target="RFC9309"/>) is an opt-out signalling mechanism interpreted by automated clients that initiate
          inbound connections.
          It is widely deployed and remains relevant for legacy crawling.
        </t>

        <t>
          Outbound indexing differs in that it:
        </t>

        <ul>
          <li><t>is opt-in rather than opt-out;</t></li>
          <li><t>operates over authenticated, identity-bound channels; and</t></li>
          <li><t>supports explicit purpose limitation and richer policy statements.</t></li>
        </ul>

        <t>
          A Publisher MAY use REP for the general web while using outbound indexing for high-value relationships with
          specific trusted indexers.
          An Indexer MAY choose to prioritise outbound indexing signals when present, as they can be higher quality and
          fresher than crawl-derived heuristics.
        </t>
      </section>

      <section anchor="aipref-relationship">
        <name>Relationship to AI Preference Signalling</name>

        <t>
          The IETF AIPREF working group is developing vocabulary and attachment mechanisms for expressing usage
          preferences (e.g., <xref target="draft-ietf-aipref-vocab"/> and <xref target="draft-ietf-aipref-attach"/>).
        </t>

        <t>
          Outbound indexing does not compete with these efforts.
          Instead, it provides an authenticated delivery channel for the same or compatible preference statements,
          including in environments where HTTP acquisition is not the primary mechanism or where inbound HTTP access is
          intentionally restricted.
        </t>
      </section>
    </section>

    <section anchor="security-considerations">
      <name>Security Considerations</name>

      <t>
        Outbound indexing reduces exposure to unsolicited inbound traffic by eliminating the need for publicly
        reachable discovery endpoints.
        However, it introduces new considerations around trust, grant handling, and policy enforcement.
      </t>

      <t>
        Implementations SHOULD consider:
      </t>

      <ul>
        <li><t><strong>Indexer impersonation:</strong> Publishers must authenticate Indexer identities; otherwise, an attacker could harvest content by posing as an Indexer.</t></li>
        <li><t><strong>Grant replay:</strong> Discovery Grants should be bound to identities and sessions; replay in a different context should be rejected.</t></li>
        <li><t><strong>Scope escalation:</strong> Indexers must enforce the declared Content Scope and Purposes; ambiguous scopes should be avoided.</t></li>
        <li><t><strong>Confused deputy:</strong> Publishers should avoid issuing broadly scoped grants that a downstream Indexer could use to justify unexpected processing.</t></li>
        <li><t><strong>Compromised Indexers:</strong> Compromise of a trusted Indexer can lead to large-scale misuse; Publishers should prefer short-lived grants and revocation readiness.</t></li>
        <li><t><strong>Metadata leakage:</strong> Even announcing content existence can reveal sensitive information; Publishers should consider minimal announcements and staged disclosure.</t></li>
      </ul>

      <t>
        When used with UZPIF (<xref target="UZPIF"/>) and TLS-DPA (<xref target="TLS-DPA"/>), outbound indexing benefits
        from identity-bound handshake properties and reduced scanning surface.
        This document does not define cryptographic primitives; it relies on the referenced transports for channel
        security.
      </t>
    </section>

    <section anchor="privacy-considerations">
      <name>Privacy Considerations</name>

      <t>
        Outbound indexing provides Publishers with positive control over who may access content for automated processing.
        This can reduce privacy harms associated with indiscriminate crawling and reduce exposed discovery surface for
        automated access.
      </t>

      <t>
        In this document, reduced publisher-side discoverability, encrypted content protection, and traffic-pattern
        privacy are distinct properties.
        Outbound indexing primarily changes discoverability and policy control.
        Confidentiality of transferred content depends on the authenticated encrypted session in use, and privacy
        against relationship or timing analysis depends on separate metadata-minimising measures.
        Indexers, rendezvous infrastructure, and network observers may still learn useful metadata about who contacted
        whom, when refreshes occurred, and how often updates were requested.
      </t>

      <t>
        Publishers SHOULD consider:
      </t>

      <ul>
        <li><t>minimising announcement metadata to what is necessary for indexing;</t></li>
        <li><t>scoping grants narrowly to avoid unintended disclosure;</t></li>
        <li><t>using short-lived grants and explicit retention policy; and</t></li>
        <li><t>auditing relationships with trusted indexers.</t></li>
      </ul>

      <t>
        Indexers SHOULD consider:
      </t>

      <ul>
        <li><t>providing transparency about processing purposes and retention;</t></li>
        <li><t>supporting Publisher revocation and deletion requests; and</t></li>
        <li><t>limiting onward disclosure of content to third parties unless explicitly permitted.</t></li>
      </ul>
    </section>

    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>
        This document has no IANA actions.
      </t>
    </section>

  </middle>

  <back>

    <references>
      <name>Normative References</name>
      <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.2119.xml"/>
      <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8174.xml"/>
    </references>

    <references>
      <name>Informative References</name>
      <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.9309.xml"/>

      <reference anchor="UZPIF">
        <front>
          <title>The Universal Zero-Port Interconnect Framework (UZPIF): An Identity-Centric Architecture for Post-Port Networking</title>
          <author fullname="Benjamin Anthony Fisher"/>
          <date/>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-dpa-uzpif-framework"/>
      </reference>

      <reference anchor="UZP">
        <front>
          <title>UZP: Universal Zero-Port Transport Protocol</title>
          <author fullname="Benjamin Anthony Fisher"/>
          <date/>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-dpa-uzp-transport"/>
      </reference>

      <reference anchor="TLS-DPA">
        <front>
          <title>TLS-DPA: An Identity-Bound Security Protocol for Traditional, Overlay, and Zero-Port Transports</title>
          <author fullname="Benjamin Anthony Fisher"/>
          <date/>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-dpa-tls-dpa"/>
      </reference>

      <reference anchor="draft-illyes-aipref-cbcp">
        <front>
          <title>Crawler best practices</title>
          <author fullname="G. Illyes"/>
          <author fullname="M. Kuehlewind"/>
          <author fullname="A. Kohn"/>
          <date/>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-illyes-aipref-cbcp"/>
      </reference>

      <reference anchor="draft-ietf-aipref-vocab">
        <front>
          <title>A Vocabulary For Expressing AI Usage Preferences</title>
          <author fullname="P. Keller"/>
          <author fullname="M. Thomson"/>
          <date/>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-ietf-aipref-vocab"/>
      </reference>

      <reference anchor="draft-ietf-aipref-attach">
        <front>
          <title>Associating AI Usage Preferences with Content in HTTP</title>
          <author fullname="M. Thomson"/>
          <author fullname="M. Nottingham"/>
          <date/>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-ietf-aipref-attach"/>
      </reference>
    </references>

  </back>

</rfc>
