<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!DOCTYPE rfc [
]>
<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  ipr="trust200902"
  docName="draft-han-ai-manifest-00"
  category="info"
  submissionType="independent"
  version="3">

  <front>
    <title abbrev="AI Manifest">
      AI Manifest: Embedded Workflow Instructions for AI Agents
    </title>
    <seriesInfo name="Internet-Draft" value="draft-han-ai-manifest-00"/>

    <author fullname="Won-pyo Han" initials="W." surname="Han">
      <organization>Individual</organization>
      <address>
        <postal>
          <country>KR</country>
        </postal>
        <email>pk102h@naver.com</email>
      </address>
    </author>

    <date year="2026" month="April" day="21"/>

    <area>General</area>
    <workgroup>Independent Submission</workgroup>

    <keyword>AI agents</keyword>
    <keyword>browser automation</keyword>
    <keyword>LLM</keyword>
    <keyword>web automation</keyword>
    <keyword>manifest</keyword>
    <keyword>MCP</keyword>

    <abstract>
      <t>
        This document specifies the AI Manifest protocol, a JSON-based
        format for websites to declare step-by-step user interface (UI)
        workflow instructions readable by autonomous AI agents. By
        embedding the manifest, website operators allow AI agents using
        browser-automation tools to execute multi-step transactions
        directly via Cascading Style Sheets (CSS) selectors, without
        repeated analysis of the full Document Object Model (DOM).
        The specification defines three interoperable embedding methods,
        a SHA-256 canonical hash verification procedure via a central
        trust registry, and security mitigations against prompt
        injection attacks.
      </t>
      <t>
        Empirical results from a reference implementation demonstrate
        an 81.9% reduction in input tokens consumed by the AI agent and
        an increase in task success rate from 20% to 100% on a
        representative multi-step transaction, compared with
        conventional DOM-analysis approaches.
      </t>
    </abstract>
  </front>

  <middle>

    <section anchor="introduction" numbered="true">
      <name>Introduction</name>
      <t>
        Large Language Model (LLM)-based AI agents increasingly interact
        with web services through browser-automation tools and protocols
        such as the Model Context Protocol (MCP), Playwright, Puppeteer,
        and Selenium WebDriver. Current agents typically parse entire
        DOM trees or screenshots on every page to infer UI structure,
        which leads to three recurring problems:
      </t>
      <ol>
        <li>
          Substantial token consumption due to repeated analysis of
          large DOMs on every session.
        </li>
        <li>
          High failure rates on complex multi-step transactional UIs
          such as enterprise resource planning (ERP) systems, academic
          manuscript submission portals, and government e-services.
        </li>
        <li>
          Absence of a standardized mechanism for a website operator to
          declare an AI-agent-friendly ("AI-Ready") operational surface.
        </li>
      </ol>
      <t>
        Related prior work includes <tt>robots.txt</tt>,
        <tt>llms.txt</tt>, <tt>agents.txt</tt>, and
        <tt>ai-plugin.json</tt>. These address, respectively, crawling
        permissions, LLM-friendly documentation, agent capability
        declarations, and API-level integration. None provides
        step-by-step UI workflow instructions for multi-page
        transactional flows.
      </t>
      <t>
        AI Manifest fills this gap by specifying a JSON format that
        enumerates ordered UI operations keyed to CSS selectors. An AI
        agent detects, parses, and verifies the manifest before
        executing the listed steps, and avoids further DOM-based
        inference for those steps.
      </t>
    </section>

    <section anchor="conventions" numbered="true">
      <name>Conventions and Definitions</name>
      <t>
        The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
        "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>",
        "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>",
        "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>",
        "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and
        "<bcp14>OPTIONAL</bcp14>" in this document are to be
        interpreted as described in BCP 14 <xref target="RFC2119"/>
        <xref target="RFC8174"/> when, and only when, they appear in
        all capitals, as shown here.
      </t>
      <dl>
        <dt>AI Manifest</dt>
        <dd>
          A structured data object, expressed in JSON, that a website
          operator embeds into a web page or serves at a well-known
          URI, containing at minimum a task identifier, an ordered
          <tt>steps</tt> array, and for each step an <tt>action</tt>
          field and a CSS selector.
        </dd>
        <dt>AI Agent</dt>
        <dd>
          A software process, typically driven by an LLM, that accesses
          a web page via a browser-automation tool and executes actions
          on behalf of a human user.
        </dd>
        <dt>Central Trust Registry</dt>
        <dd>
          A network service that stores SHA-256 hash values of
          manifests pre-registered by publishers, and responds to
          real-time trust lookups from AI agents with a status of
          white-listed, black-listed, or unknown.
        </dd>
        <dt>Canonical Form</dt>
        <dd>
          The representation of an AI Manifest obtained by
          lexicographically sorting all JSON object keys at every
          nesting level and serializing the result using the JSON
          encoding defined in <xref target="RFC8259"/> with UTF-8.
        </dd>
      </dl>
    </section>

    <section anchor="protocol-overview" numbered="true">
      <name>Protocol Overview</name>

      <section anchor="embedding-methods" numbered="true">
        <name>Embedding Methods</name>
        <t>
          A website <bcp14>MAY</bcp14> provide an AI Manifest via one
          or more of the following methods:
        </t>

        <section anchor="method-a-well-known" numbered="true">
          <name>Method A: Well-Known URI</name>
          <t>
            The server <bcp14>SHOULD</bcp14> make the manifest
            retrievable at the following well-known URI
            <xref target="RFC8615"/>:
          </t>
          <artwork><![CDATA[
  /.well-known/ai-manifest.json
          ]]></artwork>
          <t>
            In addition, the HTML document
            <bcp14>SHOULD</bcp14> declare the entry point via an
            HTML <tt>meta</tt> element:
          </t>
          <artwork><![CDATA[
  <meta name="ai-manifest"
        content="/.well-known/ai-manifest.json">
          ]]></artwork>
        </section>

        <section anchor="method-b-hidden-dom" numbered="true">
          <name>Method B: Hidden DOM Element</name>
          <t>
            The server <bcp14>MAY</bcp14> embed the manifest into the
            HTML response as a hidden element with the
            <tt>display:none</tt> style and
            <tt>aria-hidden="true"</tt> attribute:
          </t>
          <artwork><![CDATA[
  <div id="ai-manifest" style="display:none" aria-hidden="true"
       data-manifest='{"version":"1.0", ... }'></div>
          ]]></artwork>
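          <t>
            As a non-normative illustration, an agent could recover a
            Method B manifest from raw HTML roughly as follows. The
            sketch uses only the Python standard library; the names
            <tt>ManifestFinder</tt> and
            <tt>extract_embedded_manifest</tt> are hypothetical and
            are not defined by this specification.
          </t>
          <sourcecode type="python"><![CDATA[
import json
from html.parser import HTMLParser


class ManifestFinder(HTMLParser):
    """Finds data-manifest on the id="ai-manifest" element."""

    def __init__(self):
        super().__init__()
        self.raw = None  # attribute value, once found

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.raw is None and attr_map.get("id") == "ai-manifest":
            self.raw = attr_map.get("data-manifest")


def extract_embedded_manifest(html):
    """Return the parsed manifest dict, or None if absent/invalid."""
    finder = ManifestFinder()
    finder.feed(html)
    if finder.raw is None:
        return None
    try:
        manifest = json.loads(finder.raw)
    except json.JSONDecodeError:
        return None
    # A manifest is a JSON object; reject arrays, strings, etc.
    return manifest if isinstance(manifest, dict) else None
]]></sourcecode>
          <t>
            Returning <tt>None</tt> for a missing or malformed
            attribute lets the caller continue with DOM-based
            inference instead of failing outright.
          </t>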
        </section>

        <section anchor="method-c-http-header" numbered="true">
          <name>Method C: HTTP Response Header</name>
          <t>
            The server <bcp14>MAY</bcp14> declare the manifest location
            and hash in a response header:
          </t>
          <artwork><![CDATA[
  X-AI-Manifest: url=/.well-known/ai-manifest.json;
                 hash=sha256:<hex>
          ]]></artwork>
          <t>
            Method C is <bcp14>RECOMMENDED</bcp14> in conjunction with
            Method A, so that an AI agent learns both the manifest URL
            and its expected hash from the page response itself. The
            agent can then fetch the manifest body and compare its
            computed hash against the advertised value.
          </t>
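          <t>
            The header syntax is given above only by example. The
            following non-normative Python sketch assumes that the
            value is a list of semicolon-separated
            <tt>name=value</tt> parameters and that the URL itself
            contains no semicolon; the function name
            <tt>parse_x_ai_manifest</tt> is hypothetical.
          </t>
          <sourcecode type="python"><![CDATA[
def parse_x_ai_manifest(value):
    """Split 'url=...; hash=sha256:<hex>' into a dict.

    Returns {"url": str, "sha256": str or None}, or None when
    no url parameter is present.
    """
    params = {}
    for part in value.split(";"):
        # Split on the first "=" only, so URLs with a query
        # string keep their own "=" characters.
        name, sep, val = part.strip().partition("=")
        if sep:
            params[name.lower()] = val.strip()
    if "url" not in params:
        return None
    digest = None
    algo, sep, hex_value = params.get("hash", "").partition(":")
    if sep and algo.lower() == "sha256":
        digest = hex_value.lower()
    return {"url": params["url"], "sha256": digest}
]]></sourcecode>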
        </section>
      </section>

      <section anchor="manifest-schema" numbered="true">
        <name>Manifest Schema</name>
        <t>
          An AI Manifest is a JSON object <xref target="RFC8259"/> with
          the following top-level fields:
        </t>
        <dl>
          <dt><tt>version</tt> (string, <bcp14>REQUIRED</bcp14>)</dt>
          <dd>
            Specification version. This document defines version
            <tt>"1.0"</tt>.
          </dd>
          <dt><tt>publisher</tt> (string, <bcp14>REQUIRED</bcp14>)</dt>
          <dd>
            Canonical domain name of the publishing website.
          </dd>
          <dt><tt>manifestId</tt> (string, <bcp14>REQUIRED</bcp14>)</dt>
          <dd>
            Stable identifier scoped to the publisher, used for
            registry lookup.
          </dd>
          <dt><tt>registry_url</tt> (string, <bcp14>REQUIRED</bcp14>)</dt>
          <dd>
            HTTPS URL of the trust registry that the publisher has
            pre-registered this manifest with.
          </dd>
          <dt><tt>task</tt> (object, <bcp14>REQUIRED</bcp14>)</dt>
          <dd>
            Contains a task-level <tt>id</tt> and an ordered
            <tt>steps</tt> array. Each element of <tt>steps</tt> is an
            object with at least <tt>step</tt> (integer),
            <tt>action</tt> (string), and <tt>selector</tt> (string).
            The <tt>action</tt> value
            <bcp14>MUST</bcp14> be one of the registered actions
            (see <xref target="iana-actions"/>).
          </dd>
        </dl>
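        <t>
          The field list above can be checked mechanically. The
          following non-normative Python sketch shows one way to do
          so. The function name <tt>validate_manifest</tt> and every
          value in <tt>EXAMPLE</tt> (domain, identifiers, selectors,
          registry URL) are hypothetical; the action names are those
          of the initial registry in this document.
        </t>
        <sourcecode type="python"><![CDATA[
REGISTERED_ACTIONS = {
    "click", "fill", "select", "upload",
    "wait", "navigate", "assert",
}
REQUIRED_STRINGS = (
    "version", "publisher", "manifestId", "registry_url",
)

# Hypothetical manifest; all values are illustrative only.
EXAMPLE = {
    "version": "1.0",
    "publisher": "example.com",
    "manifestId": "order-entry",
    "registry_url": "https://registry.example/lookup",
    "task": {
        "id": "create-order",
        "steps": [
            {"step": 1, "action": "fill", "selector": "#qty"},
            {"step": 2, "action": "click", "selector": "#submit"},
        ],
    },
}


def validate_manifest(manifest):
    """Return a list of problems; an empty list means it passed."""
    if not isinstance(manifest, dict):
        return ["manifest is not a JSON object"]
    problems = []
    for field in REQUIRED_STRINGS:
        if not isinstance(manifest.get(field), str):
            problems.append(f"{field}: missing or not a string")
    task = manifest.get("task")
    if not isinstance(task, dict):
        return problems + ["task: missing or not an object"]
    if "id" not in task:
        problems.append("task.id: missing")
    steps = task.get("steps")
    if not isinstance(steps, list):
        return problems + ["task.steps: missing or not an array"]
    for index, step in enumerate(steps):
        where = f"task.steps[{index}]"
        if not isinstance(step, dict):
            problems.append(f"{where}: not an object")
            continue
        if not isinstance(step.get("step"), int):
            problems.append(f"{where}.step: not an integer")
        action = step.get("action")
        if not isinstance(action, str):
            problems.append(f"{where}.action: not a string")
        elif action not in REGISTERED_ACTIONS:
            problems.append(f"{where}.action: not registered")
        if not isinstance(step.get("selector"), str):
            problems.append(f"{where}.selector: not a string")
    return problems
]]></sourcecode>
        <t>
          Collecting every problem, rather than stopping at the
          first, gives a publisher a complete report from a single
          validation pass.
        </t>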
      </section>

      <section anchor="detection-algorithm" numbered="true">
        <name>Agent Detection Algorithm</name>
        <t>
          Upon loading a page, an AI agent implementing this
          specification <bcp14>SHOULD</bcp14> perform the following
          detection sequence before any full-DOM inference pass:
        </t>
        <ol>
          <li>
            Inspect the HTTP response headers for
            <tt>X-AI-Manifest</tt> (Method C).
          </li>
          <li>
            If absent, retrieve
            <tt>/.well-known/ai-manifest.json</tt> (Method A)
            or resolve the URI declared by the <tt>meta</tt> element.
          </li>
          <li>
            If still absent, search the DOM for an element with
            <tt>id="ai-manifest"</tt> and read its
            <tt>data-manifest</tt> attribute (Method B).
          </li>
          <li>
            If none is found, fall back to conventional DOM-based
            inference.
          </li>
        </ol>
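        <t>
          The ordering of the sequence above can be sketched as the
          non-normative Python function below. It assumes that each
          source is wrapped in a zero-argument callable returning
          either a parsed manifest or <tt>None</tt>; the function and
          parameter names are hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
def detect_manifest(from_header, from_well_known, from_dom):
    """Try each source in the order listed above.

    Each argument is a zero-argument callable returning a
    manifest dict or None. Returns (method, manifest), or
    (None, None) when the agent should fall back to DOM-based
    inference.
    """
    sources = (
        ("C", from_header),      # step 1: X-AI-Manifest header
        ("A", from_well_known),  # step 2: well-known URI or meta
        ("B", from_dom),         # step 3: hidden DOM element
    )
    for method, probe in sources:
        manifest = probe()
        if manifest is not None:
            return method, manifest  # later sources are skipped
    return None, None  # step 4: nothing found
]]></sourcecode>
        <t>
          Passing callables rather than pre-fetched values means a
          later, more expensive source is never consulted once an
          earlier one has produced a manifest.
        </t>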
      </section>

      <section anchor="hash-verification" numbered="true">
        <name>Canonical Hash and Trust Verification</name>
        <t>
          Prior to execution, the AI agent
          <bcp14>MUST</bcp14> compute a SHA-256 hash over the
          canonical form (see <xref target="conventions"/>) of the
          manifest and send a trust lookup request to the URI in the
          <tt>registry_url</tt> field. The request
          <bcp14>MUST</bcp14> use HTTPS <xref target="RFC2818"/> and
          <bcp14>MUST</bcp14> carry the tuple
          <tt>{publisher, manifestId, hash}</tt> as a JSON body.
        </t>
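        <t>
          A minimal, non-normative Python sketch of the hash
          computation and lookup body follows. The canonical form as
          defined in this document does not state how whitespace or
          non-ASCII characters are serialized; the sketch assumes no
          insignificant whitespace and unescaped UTF-8 output. It
          also assumes the <tt>hash</tt> member is a bare lowercase
          hexadecimal string. All function names are hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
import hashlib
import json


def canonical_bytes(manifest):
    """Serialize with sorted keys at every level, as UTF-8."""
    text = json.dumps(
        manifest,
        sort_keys=True,         # recursive for nested objects
        separators=(",", ":"),  # assumption: no whitespace
        ensure_ascii=False,     # assumption: raw UTF-8 output
    )
    return text.encode("utf-8")


def manifest_hash(manifest):
    """Lowercase hex SHA-256 digest of the canonical form."""
    return hashlib.sha256(canonical_bytes(manifest)).hexdigest()


def lookup_request_body(manifest):
    """JSON body carrying {publisher, manifestId, hash}."""
    return json.dumps({
        "publisher": manifest["publisher"],
        "manifestId": manifest["manifestId"],
        "hash": manifest_hash(manifest),
    })
]]></sourcecode>
        <t>
          Because keys are sorted before hashing, two manifests that
          differ only in member order produce the same digest.
        </t>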
        <t>
          The registry response is a JSON object containing a
          <tt>status</tt> field with one of the following values:
        </t>
        <ul>
          <li>
            <tt>"white"</tt>: the manifest is trusted; the agent
            <bcp14>MAY</bcp14> proceed to execution.
          </li>
          <li>
            <tt>"black"</tt>: the manifest is explicitly distrusted;
            the agent <bcp14>MUST</bcp14> abort and
            <bcp14>SHOULD</bcp14> alert the human user.
          </li>
          <li>
            <tt>"unknown"</tt>: the manifest is not registered; the
            agent <bcp14>SHOULD</bcp14> warn the user and
            <bcp14>MAY</bcp14> fall back to DOM-based inference.
          </li>
        </ul>
        <t>
          Implementations <bcp14>MAY</bcp14> cache a non-expired
          registry response keyed by the manifest hash, to avoid
          repeated network round trips for an identical manifest.
        </t>
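        <t>
          The status handling and optional caching described above
          might be combined as in the following non-normative Python
          sketch. This document does not define a cache lifetime, so
          the 300-second default is an assumption, as is the choice
          to treat an unrecognized status like <tt>"unknown"</tt>.
          The names <tt>TrustCache</tt>, <tt>decide</tt>, and the
          returned action labels are hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
import time

ACTIONS = {
    "white": "execute",              # agent MAY proceed
    "black": "abort_and_alert",      # MUST abort, SHOULD alert
    "unknown": "warn_and_fallback",  # SHOULD warn, MAY use DOM
}


class TrustCache:
    """Registry statuses keyed by manifest hash, with a TTL."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds  # assumption: not in the spec
        self.clock = clock
        self.entries = {}       # hash -> (status, expiry time)

    def get(self, manifest_hash):
        entry = self.entries.get(manifest_hash)
        if entry is None or entry[1] <= self.clock():
            self.entries.pop(manifest_hash, None)  # drop expired
            return None
        return entry[0]

    def put(self, manifest_hash, status):
        expiry = self.clock() + self.ttl
        self.entries[manifest_hash] = (status, expiry)


def decide(manifest_hash, cache, lookup):
    """Return the agent's next move for this manifest hash.

    lookup is a callable that queries the registry and returns
    the status string; it is called only on a cache miss.
    """
    status = cache.get(manifest_hash)
    if status is None:
        status = lookup(manifest_hash)
        cache.put(manifest_hash, status)
    # Assumption: an unrecognized status is handled as "unknown".
    return ACTIONS.get(status, ACTIONS["unknown"])
]]></sourcecode>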
      </section>

      <section anchor="execution" numbered="true">
        <name>Execution</name>
        <t>
          When trust is confirmed, the agent executes the
          <tt>steps</tt> array in declared order, mapping each step's
          <tt>action</tt> and <tt>selector</tt> to a browser-automation
          primitive (for example, <tt>click</tt>, <tt>fill</tt>,
          <tt>select</tt>, or <tt>upload</tt>). For
          the duration of a manifest-driven execution the agent
          <bcp14>SHOULD NOT</bcp14> perform additional LLM-based
          inference over the page DOM.
        </t>
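        <t>
          The mapping from steps to automation primitives can be
          sketched, non-normatively, as below. The <tt>Driver</tt>
          interface is a hypothetical stand-in for whatever
          browser-automation tool the agent uses. The schema in this
          document does not say where the values for
          <tt>fill</tt>, <tt>select</tt>, and <tt>upload</tt> come
          from, so the <tt>inputs</tt> mapping is an assumption. The
          <tt>wait</tt>, <tt>navigate</tt>, and <tt>assert</tt>
          actions are left out because their parameters are not
          specified here.
        </t>
        <sourcecode type="python"><![CDATA[
from typing import Protocol


class Driver(Protocol):
    """Hypothetical subset of a browser-automation driver."""

    def click(self, selector: str) -> None: ...
    def fill(self, selector: str, value: str) -> None: ...
    def select(self, selector: str, value: str) -> None: ...
    def upload(self, selector: str, path: str) -> None: ...


def run_steps(steps, driver, inputs):
    """Execute steps in array order; return the count executed.

    inputs maps a step number to the user-supplied value (text,
    option, or file path) needed by that step.
    """
    for step in steps:
        action, selector = step["action"], step["selector"]
        if action == "click":
            driver.click(selector)
        elif action in ("fill", "select", "upload"):
            value = inputs[step["step"]]
            # The action name doubles as the driver method name.
            getattr(driver, action)(selector, value)
        else:
            raise ValueError(f"unsupported action: {action}")
    return len(steps)
]]></sourcecode>
        <t>
          No page content is read or interpreted inside the loop,
          which reflects the guidance above against additional
          DOM inference during manifest-driven execution.
        </t>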
      </section>

    </section>

    <section anchor="registry" numbered="true">
      <name>Central Trust Registry</name>
      <t>
        A Central Trust Registry accepts manifest registrations from
        publishers and answers real-time hash lookups from AI agents.
        A conforming registry <bcp14>SHOULD</bcp14>:
      </t>
      <ul>
        <li>
          Store the SHA-256 hash of the canonical form of each
          registered manifest, together with the
          <tt>publisher</tt> and <tt>manifestId</tt> fields.
        </li>
        <li>
          Perform static analysis on the submitted <tt>steps</tt> array
          and reject or black-list manifests whose selectors or actions
          match a published pattern of prompt-injection risk (for
          example, selectors targeting <tt>iframe</tt> elements for
          cross-origin form submission, or actions outside the
          registered action set).
        </li>
        <li>
          Expose a community-driven mechanism for reporting and
          black-listing malicious manifests.
        </li>
      </ul>
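      <t>
        The static-analysis item above could be approximated by a
        check such as the following non-normative Python sketch. It
        covers only the two example patterns named in the list; the
        regular expression is a rough heuristic chosen for
        illustration, and the names <tt>screen_steps</tt> and
        <tt>IFRAME_TYPE</tt> are hypothetical.
      </t>
      <sourcecode type="python"><![CDATA[
import re

REGISTERED_ACTIONS = {
    "click", "fill", "select", "upload",
    "wait", "navigate", "assert",
}

# Heuristic: "iframe" used as a type selector, either at the
# start or after a combinator, comma, or opening parenthesis.
IFRAME_TYPE = re.compile(r"(^|[\s>+~,(])iframe(?![\w-])")


def screen_steps(steps):
    """Return reasons to reject; an empty list means accepted."""
    reasons = []
    for step in steps:
        number = step.get("step")
        action = step.get("action")
        registered = (
            isinstance(action, str)
            and action in REGISTERED_ACTIONS
        )
        if not registered:
            reasons.append(f"step {number}: unregistered action")
        selector = str(step.get("selector", "")).lower()
        if IFRAME_TYPE.search(selector):
            reasons.append(
                f"step {number}: selector targets iframe"
            )
    return reasons
]]></sourcecode>
      <t>
        A production registry would need a maintained pattern set;
        this check would not, for instance, catch an
        <tt>iframe</tt> reached through an id or class selector.
      </t>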
      <t>
        This document does not mandate a specific registry operator.
        Multiple interoperable registries <bcp14>MAY</bcp14> exist, and
        each manifest declares which registry is authoritative for it
        via <tt>registry_url</tt>.
      </t>
    </section>

    <section anchor="iana" numbered="true">
      <name>IANA Considerations</name>

      <section anchor="iana-well-known" numbered="true">
        <name>Well-Known URI Registration</name>
        <t>
          This document requests IANA to register the following entry
          in the "Well-Known URIs" registry established by
          <xref target="RFC8615"/>:
        </t>
        <dl>
          <dt>URI suffix:</dt>
          <dd><tt>ai-manifest.json</tt></dd>
          <dt>Change controller:</dt>
          <dd>Independent Submission Stream editor</dd>
          <dt>Specification document(s):</dt>
          <dd>This document</dd>
          <dt>Status:</dt>
          <dd>provisional</dd>
          <dt>Related information:</dt>
          <dd>None</dd>
        </dl>
      </section>

      <section anchor="iana-actions" numbered="true">
        <name>AI Manifest Actions Registry (initial)</name>
        <t>
          This document requests IANA to create a new registry named
          "AI Manifest Actions", with the following initial
          registrations. Registration policy:
          Specification Required <xref target="RFC8126"/>.
        </t>
        <dl>
          <dt><tt>click</tt></dt>
          <dd>Invoke a click event on the selected element.</dd>
          <dt><tt>fill</tt></dt>
          <dd>Type a value into a text input element.</dd>
          <dt><tt>select</tt></dt>
          <dd>Choose an option from a drop-down list element.</dd>
          <dt><tt>upload</tt></dt>
          <dd>Attach a file to a file input element.</dd>
          <dt><tt>wait</tt></dt>
          <dd>Pause for a condition or duration.</dd>
          <dt><tt>navigate</tt></dt>
          <dd>Change the current URL.</dd>
          <dt><tt>assert</tt></dt>
          <dd>Verify that a condition holds before proceeding.</dd>
        </dl>
      </section>
    </section>

    <section anchor="security" numbered="true">
      <name>Security Considerations</name>

      <section anchor="sec-prompt-injection" numbered="true">
        <name>Prompt Injection Risk</name>
        <t>
          A malicious website could embed an AI Manifest whose
          <tt>steps</tt> array leads an AI agent to perform actions
          harmful to the user (for example, submitting a form to a
          third party with user-supplied credentials). The Central
          Trust Registry mechanism (<xref target="registry"/>) is the
          primary mitigation. Agents <bcp14>MUST NOT</bcp14> execute a
          manifest whose registry lookup returns
          <tt>"black"</tt> and <bcp14>SHOULD</bcp14> warn the user
          before executing an <tt>"unknown"</tt> manifest.
        </t>
      </section>

      <section anchor="sec-canonical-hash" numbered="true">
        <name>Integrity of the Manifest</name>
        <t>
          The SHA-256 hash is computed over the canonical form of the
          manifest so that semantically equivalent encodings produce
          identical digests. Implementations
          <bcp14>MUST NOT</bcp14> rely on a hash computed over
          non-canonical bytes.
        </t>
      </section>

      <section anchor="sec-https" numbered="true">
        <name>Transport Security</name>
        <t>
          All communication with the registry
          <bcp14>MUST</bcp14> use HTTPS with server authentication per
          <xref target="RFC2818"/>. Registry operators
          <bcp14>SHOULD</bcp14> sign their responses with a private
          key whose corresponding public key is published out of band,
          so that an AI agent can verify the integrity of a cached
          response.
        </t>
      </section>

    </section>

    <section anchor="privacy" numbered="true">
      <name>Privacy Considerations</name>
      <t>
        Registry lookups necessarily expose to the registry operator
        the fact that a particular AI agent has visited a particular
        publisher's manifest. Registry operators
        <bcp14>SHOULD</bcp14> minimize the retention of client
        identifiers associated with lookup requests. Agents
        <bcp14>MAY</bcp14> employ private, time-limited caching of
        registry responses to reduce the frequency of such lookups.
      </t>
    </section>

    <section anchor="implementation" numbered="true">
      <name>Implementation Status</name>
      <t>
        Note to RFC Editor: This section is intended to be removed
        prior to publication as an RFC.
      </t>
      <t>
        A reference implementation, including an example publisher
        server, a reference registry, two AI agent variants
        (DOM-analysis baseline and manifest-aware), and an automated
        benchmark harness, is available at
        <eref target="https://github.com/11pyo/AINavManifest"/>
        under the MIT License.
      </t>
      <t>
        In the reference benchmark, a two-step ERP order-entry
        transaction was repeated 30 times, with input tokens counted
        using the tiktoken <tt>cl100k_base</tt> encoding. The
        manifest-aware agent consumed an average of 341 input tokens
        per task with a 100% task success rate (30 of 30 runs), while
        the DOM-analysis baseline consumed an average of 1887.6 input
        tokens with a 20% success rate (6 of 30 runs). This
        corresponds to an 81.9% reduction in input tokens
        ((1887.6 - 341) / 1887.6). Raw results
        accompany the reference implementation.
      </t>
    </section>

    <section anchor="ipr" numbered="true">
      <name>Intellectual Property Rights Disclosure</name>
      <t>
        The technology described in this document is the subject of
        Korean Patent Application No. 10-2026-0071716, filed on
        2026-04-21 by the author. The applicant commits to offer any
        essential claims under Fair, Reasonable, and Non-Discriminatory
        (FRAND) terms to implementers of this specification, as
        declared in the project repository.
      </t>
    </section>

    <section anchor="acknowledgments" numbered="true">
      <name>Acknowledgments</name>
      <t>
        The author thanks the Anthropic Claude Code, Model Context
        Protocol, and OpenAI function-calling communities for the
        empirical observations that motivated this work.
      </t>
    </section>

  </middle>

  <back>

    <references>
      <name>Normative References</name>

      <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
        <front>
          <title>Key words for use in RFCs to Indicate Requirement Levels</title>
          <author initials="S." surname="Bradner" fullname="S. Bradner"/>
          <date year="1997" month="March"/>
        </front>
        <seriesInfo name="BCP" value="14"/>
        <seriesInfo name="RFC" value="2119"/>
        <seriesInfo name="DOI" value="10.17487/RFC2119"/>
      </reference>

      <reference anchor="RFC2818" target="https://www.rfc-editor.org/info/rfc2818">
        <front>
          <title>HTTP Over TLS</title>
          <author initials="E." surname="Rescorla" fullname="E. Rescorla"/>
          <date year="2000" month="May"/>
        </front>
        <seriesInfo name="RFC" value="2818"/>
        <seriesInfo name="DOI" value="10.17487/RFC2818"/>
      </reference>

      <reference anchor="RFC8126" target="https://www.rfc-editor.org/info/rfc8126">
        <front>
          <title>Guidelines for Writing an IANA Considerations Section in RFCs</title>
          <author initials="M." surname="Cotton" fullname="M. Cotton"/>
          <author initials="B." surname="Leiba" fullname="B. Leiba"/>
          <author initials="T." surname="Narten" fullname="T. Narten"/>
          <date year="2017" month="June"/>
        </front>
        <seriesInfo name="BCP" value="26"/>
        <seriesInfo name="RFC" value="8126"/>
        <seriesInfo name="DOI" value="10.17487/RFC8126"/>
      </reference>

      <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174">
        <front>
          <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
          <author initials="B." surname="Leiba" fullname="B. Leiba"/>
          <date year="2017" month="May"/>
        </front>
        <seriesInfo name="BCP" value="14"/>
        <seriesInfo name="RFC" value="8174"/>
        <seriesInfo name="DOI" value="10.17487/RFC8174"/>
      </reference>

      <reference anchor="RFC8259" target="https://www.rfc-editor.org/info/rfc8259">
        <front>
          <title>The JavaScript Object Notation (JSON) Data Interchange Format</title>
          <author initials="T." surname="Bray" fullname="T. Bray" role="editor"/>
          <date year="2017" month="December"/>
        </front>
        <seriesInfo name="STD" value="90"/>
        <seriesInfo name="RFC" value="8259"/>
        <seriesInfo name="DOI" value="10.17487/RFC8259"/>
      </reference>

      <reference anchor="RFC8615" target="https://www.rfc-editor.org/info/rfc8615">
        <front>
          <title>Well-Known Uniform Resource Identifiers (URIs)</title>
          <author initials="M." surname="Nottingham" fullname="M. Nottingham"/>
          <date year="2019" month="May"/>
        </front>
        <seriesInfo name="RFC" value="8615"/>
        <seriesInfo name="DOI" value="10.17487/RFC8615"/>
      </reference>
    </references>

  </back>
</rfc>
