HTTP                                                        C. Lecklider
Internet-Draft                                               Independent
Intended status: Informational                               1 July 2026
Expires: 2 January 2027


 Content Negotiation for Consolidated Machine-Readable Representations
                     draft-consolidated-content-01

Abstract

   This document specifies the use of HTTP content negotiation and the
   Prefer header to request consolidated, machine-readable
   representations of web resources.  A client uses the Accept header to
   negotiate an appropriate media type and Prefer: return=consolidated
   to request a representation intended for automated consumption.  A
   server that honours the preference identifies the selected
   representation with Preference-Applied: return=consolidated and
   varies cached responses on Accept and Prefer.  The mechanism uses
   existing HTTP semantics and introduces a single new preference value
   for the existing Prefer header.  It provides a protocol-level
   alternative to ad-hoc client detection or path conventions when
   serving machine-readable representations, while also enabling
   publishers to reduce repeated fetching, bandwidth use, and server
   processing.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 2 January 2027.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.


Lecklider                Expires 2 January 2027                 [Page 1]

Internet-Draft      Consolidated Content Negotiation           July 2026


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Requesting Consolidated Representations . . . . . . . . . . .   4
     2.1.  Accept Header . . . . . . . . . . . . . . . . . . . . . .   4
     2.2.  Prefer Header . . . . . . . . . . . . . . . . . . . . . .   5
     2.3.  Combined Request  . . . . . . . . . . . . . . . . . . . .   5
   3.  Server Behaviour  . . . . . . . . . . . . . . . . . . . . . .   5
     3.1.  Honouring Preferences . . . . . . . . . . . . . . . . . .   5
     3.2.  Content Structure . . . . . . . . . . . . . . . . . . . .   6
       3.2.1.  Media Type Independence . . . . . . . . . . . . . . .   6
     3.3.  Discovery . . . . . . . . . . . . . . . . . . . . . . . .   7
     3.4.  When Consolidation is Not Practical . . . . . . . . . . .   7
   4.  Caching Benefits  . . . . . . . . . . . . . . . . . . . . . .   7
   5.  Applicability to HTTP Versions  . . . . . . . . . . . . . . .   8
   6.  Security Considerations . . . . . . . . . . . . . . . . . . .   8
     6.1.  Consolidated Representation Trust . . . . . . . . . . . .   9
     6.2.  Content Integrity . . . . . . . . . . . . . . . . . . . .   9
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  10
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  10
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  10
   Appendix A.  Practical Benefits . . . . . . . . . . . . . . . . .  10
     A.1.  Why This Matters  . . . . . . . . . . . . . . . . . . . .  10
     A.2.  Beyond Simple Conversion  . . . . . . . . . . . . . . . .  11
     A.3.  Quantifiable Benefits . . . . . . . . . . . . . . . . . .  11
     A.4.  Operational Benefits  . . . . . . . . . . . . . . . . . .  12
   Appendix B.  Deployment Lessons and Related Approaches  . . . . .  13
     B.1.  Relationship to Earlier Machine-Readable Web
           Approaches  . . . . . . . . . . . . . . . . . . . . . . .  13
     B.2.  Contemporary Approaches . . . . . . . . . . . . . . . . .  14
       B.2.1.  Representations Selected by Client Detection  . . . .  14
       B.2.2.  The llms.txt Convention . . . . . . . . . . . . . . .  16
     B.3.  Manipulation of Machine-Readable Representations  . . . .  17
   Appendix C.  Example Transformation . . . . . . . . . . . . . . .  18
     C.1.  Markdown Example: Product Website . . . . . . . . . . . .  18
     C.2.  JSON Example: Financial News Article  . . . . . . . . . .  20
   Appendix D.  Implementation . . . . . . . . . . . . . . . . . . .  22
     D.1.  Client Implementation . . . . . . . . . . . . . . . . . .  22
     D.2.  Server Implementation . . . . . . . . . . . . . . . . . .  22
     D.3.  Incremental Adoption  . . . . . . . . . . . . . . . . . .  23
     D.4.  Format Selection  . . . . . . . . . . . . . . . . . . . .  24
     D.5.  Validation and Testing  . . . . . . . . . . . . . . . . .  25
   Appendix E.  Motivation . . . . . . . . . . . . . . . . . . . . .  25
   Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . .  26
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  26

1.  Introduction

   Web content is traditionally structured for human navigation through
   HTML pages.  Automated agents retrieving this content for analysis or
   training must fetch multiple pages to obtain complete information
   about a topic.  This creates unnecessary server load from repeated
   page fetches, consumes bandwidth inefficiently, produces fragmented
   information requiring client-side reassembly, and makes change
   detection difficult.  The volume and fragmentation of web content
   make comprehensive caching impractical for automated systems,
   compounding the inefficiency.

   Since the initial publication of this draft, publisher and
   infrastructure deployments have emerged that provide machine-readable
   or agent-oriented representations of web resources, commonly using
   Markdown or other compact formats.  These deployments demonstrate
   demand for machine-optimised representations, but often rely on ad-
   hoc client detection rather than explicit representation negotiation.
   This document specifies a protocol-level mechanism for requesting
   such representations using existing HTTP semantics.

   HTTP provides content negotiation (Section 12 of [RFC9110]) and
   client preferences ([RFC7240]) to address varying client needs.  This
   document specifies how these existing mechanisms can be combined so
   that clients can request consolidated representations optimised for
   machine consumption in appropriate media types.  It introduces a new
   return=consolidated preference value for the existing Prefer header
   and does not define any new HTTP headers or media types.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.


   This document uses the terms "resource", "representation", "content
   negotiation", "client", and "server" as defined by HTTP semantics
   [RFC9110].

   A *machine-readable representation* is a representation whose format
   and structure are intended primarily for automated processing rather
   than direct human presentation.  Examples include Markdown, JSON,
   XML, CSV, and other structured or lightly structured formats.

   A *consolidated representation* is a representation selected in
   response to Prefer: return=consolidated.  It is intended to present a
   resource in a form more suitable for the requesting application, for
   example by omitting presentation chrome, restructuring information,
   combining related material, or providing additional context.  The
   term describes the requested representation form; it does not imply a
   particular media type, nor does it imply accuracy, completeness,
   equivalence, neutrality, or trustworthiness.

   A *navigational HTML representation* is an HTML representation
   intended primarily for human navigation in a web browser.  It
   typically includes navigation, layout, styling hooks, scripts,
   advertising, related links, and other material used for presentation
   or site traversal.

   An *automated client* is an HTTP client that retrieves
   representations for automated processing, whether directly or on
   behalf of another system.  Examples include crawlers, search systems,
   retrieval systems, monitoring tools, and agent infrastructure.

   A *publisher* is the party responsible for making a resource and its
   representations available.  In this document, the term is used
   informally and may refer to a site operator, origin server operator,
   content provider, or other party controlling publication of the
   representation.

2.  Requesting Consolidated Representations

   Clients request consolidated representations using two standard HTTP
   headers.

2.1.  Accept Header

   Clients indicate their preferred media types using the Accept
   header.  The examples below prefer Markdown or JSON over HTML,
   reflecting the primary use case for consolidated machine-readable
   representations:

   Accept: text/markdown;q=0.9, text/html;q=0.8


   or

   Accept: application/json;q=0.9, text/html;q=0.8

   Clients MAY specify multiple formats with appropriate quality values.
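
   As a non-normative illustration, a server might order the media
   types in an Accept header by quality value roughly as sketched
   below.  The function name rank_accept and the simplified parsing
   (no wildcard matching, no parameters other than q) are assumptions
   of this sketch, not requirements of this document.

```python
def rank_accept(accept_header):
    """Return media types from an Accept header, highest q first.

    Simplified sketch: ignores wildcards and any parameter other than q.
    """
    ranked = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed weight as not acceptable
        if q > 0:
            # Sort by descending q, then by original position.
            ranked.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


print(rank_accept("text/markdown;q=0.9, text/html;q=0.8"))
```

   A server would then pick the first ranked type for which it has a
   representation available, falling back to its default otherwise.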

2.2.  Prefer Header

   Clients indicate a preference for a consolidated representation
   using the Prefer header with the return preference [RFC7240]:

   Prefer: return=consolidated

2.3.  Combined Request

   A complete request combines both headers:

   GET /documentation HTTP/1.1
   Host: example.org
   Accept: text/markdown;q=0.9, text/html;q=0.8
   Prefer: return=consolidated
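
   The same request can be sketched with Python's standard library.
   This is a minimal illustration, not a reference client: the helper
   name build_consolidated_request and the use of an https URL are
   assumptions of this sketch.  The request object is only constructed
   here; no network access takes place.

```python
import urllib.request


def build_consolidated_request(url, media_type="text/markdown"):
    """Build a GET request asking for a consolidated representation."""
    request = urllib.request.Request(url, method="GET")
    # Prefer the machine-readable type, but accept HTML as a fallback.
    request.add_header("Accept", f"{media_type};q=0.9, text/html;q=0.8")
    request.add_header("Prefer", "return=consolidated")
    return request


req = build_consolidated_request("https://example.org/documentation")
print(req.get_method(), req.full_url)
for name, value in req.header_items():
    print(f"{name}: {value}")
```

   Passing the request to urllib.request.urlopen would send it; the
   response headers then indicate whether the preference was applied.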

3.  Server Behaviour

3.1.  Honouring Preferences

   Servers receiving requests with Prefer: return=consolidated SHOULD
   provide consolidated representations when practical.  Servers that
   honour the preference MUST include Preference-Applied in the
   response:

   HTTP/1.1 200 OK
   Content-Type: text/markdown
   Preference-Applied: return=consolidated
   ETag: "consolidated-v1-a3f8b2"
   Vary: Accept, Prefer

   The Vary header MUST include both Accept and Prefer to ensure proper
   caching behaviour by intermediaries (proxies, CDNs).  Other headers
   (such as Accept-Encoding) MAY also appear in Vary as appropriate.
   Without appropriate Vary headers, caches may incorrectly serve
   consolidated representations to clients that did not request them, or
   vice versa.
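
   A server-side sketch of this behaviour follows.  It is illustrative
   only: the function names, the can_consolidate flag, and the
   case-insensitive comparison are assumptions of this sketch.  It also
   assumes that Vary is sent on every response for the resource,
   including responses where the preference is not applied, since
   selection depends on both request fields.

```python
def prefers_consolidated(prefer_header):
    """Return True if a Prefer header value contains return=consolidated."""
    if not prefer_header:
        return False
    for preference in prefer_header.split(","):
        # Drop any preference parameters that follow a semicolon.
        token = preference.split(";")[0]
        name, _, value = token.partition("=")
        if (name.strip().lower() == "return"
                and value.strip().strip('"').lower() == "consolidated"):
            return True
    return False


def response_headers(prefer_header, content_type, etag, can_consolidate=True):
    """Build response headers for a resource varying on Accept and Prefer."""
    headers = {
        "Content-Type": content_type,
        "ETag": etag,
        # Assumption: sent whether or not the preference is applied.
        "Vary": "Accept, Prefer",
    }
    if can_consolidate and prefers_consolidated(prefer_header):
        headers["Preference-Applied"] = "return=consolidated"
    return headers


print(response_headers("return=consolidated", "text/markdown",
                       '"consolidated-v1-a3f8b2"'))
```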


3.2.  Content Structure

   Consolidated representations SHOULD differ in structure and
   organisation from their navigational HTML counterparts.  Servers
   SHOULD:

   *  Consolidate related content from multiple pages into hierarchical
      sections

   *  Organise information by semantic relationships rather than
      navigation structure

   *  Include appropriate context for understanding without navigation
      chrome

   *  Preserve information fidelity while restructuring for machine
      consumption

   *  Focus on coherent topics rather than consolidating entire sites

   Very small sites MAY consolidate all content into a single resource.
   Larger sites SHOULD create multiple focused consolidated resources,
   each addressing a specific topic or information need.  Content that
   is not directly relevant to understanding the primary topic SHOULD be
   excluded.

3.2.1.  Media Type Independence

   The return=consolidated preference describes the requested
   representation form, not a particular media type.  A client MAY
   request a consolidated representation in any media type that the
   server is willing to provide.

   The primary use case for this specification is machine-readable
   textual representations such as Markdown, JSON, XML, or other formats
   suited to automated processing.  However, this specification does not
   restrict consolidated representations to those formats.  A
   consolidated representation MAY be served as text/html, or as another
   media type including image, audio, or video, where such a
   representation is meaningful for the resource and supported by the
   server.

   This specification does not define what makes a consolidated
   representation meaningful for a particular media type or application.

   Servers are not required to support consolidated representations in
   any particular media type.


3.3.  Discovery

   Publishers MAY advertise the availability of alternate
   representations using HTML <link> elements in the same manner as feed
   discovery:

   <link rel="alternate" type="text/markdown"
         href="/products/catalogue"
         title="Product Catalogue (Markdown)">
   <link rel="alternate" type="application/ld+json"
         href="/products/catalogue"
         title="Product Catalogue (Linked Data)">

   This allows automated agents to discover available formats without
   relying solely on content negotiation.  The href attribute points to
   the resource URL, and the type attribute indicates the available
   media type.  These alternate representations may support
   consolidation via the Prefer header, or may simply be format
   conversions; the <link> element advertises format availability
   either way.  Publishers providing alternate representations for
   multiple resources MAY include multiple sets of <link> elements.
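
   An automated client could collect these <link> elements with an
   ordinary HTML parser.  The sketch below uses Python's standard
   html.parser module; the class name AlternateLinkParser and the
   tuple layout are assumptions made for illustration.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect (type, href, title) from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("type"):
            self.alternates.append((
                attributes["type"],
                attributes.get("href"),
                attributes.get("title"),
            ))


page = """
<link rel="alternate" type="text/markdown"
      href="/products/catalogue"
      title="Product Catalogue (Markdown)">
<link rel="alternate" type="application/ld+json"
      href="/products/catalogue"
      title="Product Catalogue (Linked Data)">
"""

parser = AlternateLinkParser()
parser.feed(page)
for media_type, href, title in parser.alternates:
    print(media_type, href, title)
```

   The discovered media types can then be used to build the Accept
   header of a follow-up request to the same URL.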

   Representations MAY also advertise alternate formats of the same
   resource within themselves.  For example, a Markdown representation
   might include in its frontmatter:

   alternate-formats:
     - type: application/ld+json
       title: Product Catalogue (Linked Data)
     - type: text/html
       title: Product Catalogue (HTML)

   This enables format discovery for agents that bypass HTML parsing and
   directly request machine-readable representations.  The URL for
   alternate formats is the same as the current resource.

3.4.  When Consolidation is Not Practical

   Servers MAY decline to provide consolidated representations by
   serving the standard representation without the Preference-Applied
   header.
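
   From the client's side, the presence or absence of
   Preference-Applied is therefore what distinguishes the two cases.
   A minimal sketch of that check follows; the function name and the
   plain-dictionary representation of response headers are assumptions
   of this sketch.

```python
def consolidation_applied(headers):
    """Return True if the response reports return=consolidated as applied."""
    for name, value in headers.items():
        if name.lower() != "preference-applied":
            continue
        for item in value.split(","):
            # Ignore any parameters that follow a semicolon.
            token, _, token_value = item.split(";")[0].partition("=")
            if (token.strip().lower() == "return"
                    and token_value.strip().strip('"').lower() == "consolidated"):
                return True
    return False


print(consolidation_applied({"Preference-Applied": "return=consolidated"}))
print(consolidation_applied({"Content-Type": "text/html"}))
```

   A client that receives a response without the header can process it
   as an ordinary representation of the resource.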

4.  Caching Benefits

   Consolidated representations can be cached independently with their
   own ETag values.  This enables efficient conditional requests:


   GET /documentation HTTP/1.1
   Host: example.org
   Accept: text/markdown;q=0.9
   Prefer: return=consolidated
   If-None-Match: "consolidated-v1-a3f8b2"

   HTTP/1.1 304 Not Modified
   Vary: Accept, Prefer

   Clients can verify whether content has changed with a single request
   rather than fetching multiple individual pages.  For publishers
   experiencing high load from automated crawlers, this can
   significantly reduce bandwidth and server processing costs.

   Servers SHOULD use appropriate cache-related headers (Cache-Control,
   Expires, etc. as specified in [RFC9111]) to enable intermediaries and
   clients to cache consolidated representations effectively.  While
   conditional requests avoid retransmitting unchanged content, a
   stored response that is still fresh can be reused without contacting
   the origin server at all.
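
   The conditional exchange shown earlier in this section can be
   sketched on the server side as follows.  This is an illustration
   under stated assumptions: the helper names, the SHA-256-derived
   ETag, and the simplified If-None-Match parsing (a comma-separated
   list or "*") are choices made for this sketch only.

```python
import hashlib


def make_etag(body, version="consolidated-v1"):
    """Derive an ETag from the representation bytes."""
    digest = hashlib.sha256(body).hexdigest()[:6]
    return f'"{version}-{digest}"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a consolidated resource."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Vary": "Accept, Prefer"}
    # If-None-Match uses weak comparison, so a W/ prefix is ignored here.
    candidates = [tag.strip() for tag in (if_none_match or "").split(",")]
    candidates = [tag[2:] if tag.startswith("W/") else tag
                  for tag in candidates]
    if "*" in candidates or etag in candidates:
        return 304, headers, b""
    headers["Content-Type"] = "text/markdown"
    headers["Preference-Applied"] = "return=consolidated"
    return 200, headers, body


body = b"# Documentation\n"
status, headers, payload = respond(body)
print(status, headers["ETag"])
status, _, payload = respond(body, if_none_match=headers["ETag"])
print(status, len(payload))
```

   The first call returns the full representation; the second, which
   presents the ETag from the first response, returns 304 with an
   empty payload.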

5.  Applicability to HTTP Versions

   This specification is protocol-agnostic and applies equally to
   HTTP/1.1, HTTP/2 [RFC9113], and HTTP/3 [RFC9114].  The examples in
   this document use HTTP/1.1 syntax for clarity, but the mechanisms
   work identically across all HTTP versions.

6.  Security Considerations

   This specification uses existing HTTP mechanisms.  Implementations
   need to consider the security properties of HTTP content negotiation
   (Section 12 of [RFC9110]), the Prefer header ([RFC7240]), and the
   trust properties of publisher-authored machine-readable
   representations.

   Per [RFC7240], a server that does not recognise a particular
   preference MUST ignore it and continue processing rather than
   signalling an error.  However, some non-compliant servers,
   frameworks, or Web Application Firewalls (WAFs) apply stricter
   validation and could reject requests containing unknown preference
   values.  Such implementations may need configuration updates to
   recognise return=consolidated as a valid preference value.

   Servers SHOULD apply the same access controls to consolidated
   representations as to their constituent pages.


6.1.  Consolidated Representation Trust

   Clients MUST treat consolidated representations as untrusted input.
   A consolidated representation is a publisher-authored representation
   of a resource, not an independent assessment of the resource's
   quality, relevance, authority, neutrality, or completeness.

   Consolidated representations are expected to differ from navigational
   HTML.  They SHOULD omit navigation, advertising, decorative material,
   repeated boilerplate, and other content intended primarily for human
   presentation.  They SHOULD also restructure, summarise, combine, and
   emphasise information in ways intended to make the resource easier
   for automated systems to process.  These differences are the purpose
   of consolidation.

   Clients MUST initially treat any response selected for automated
   consumption as potentially hostile.  They MUST NOT treat the presence
   of Preference-Applied: return=consolidated, or the existence of a
   machine-readable representation, as a trust or quality signal.

   Clients SHOULD apply the same or stronger provenance, corroboration,
   abuse-detection, prompt-injection, and content-integrity controls
   that they apply to equivalent content obtained from navigational HTML
   or other sources.

6.2.  Content Integrity

   While this specification does not introduce new attack vectors,
   consolidated representations may amplify the impact of existing
   content integrity issues.  A single poisoned consolidated resource in
   a cache could affect more automated systems than a single poisoned
   page would, and malicious content resulting from an origin server
   compromise may likewise have broader impact when consolidated.
   Publishers should re-evaluate their content integrity measures with
   this amplification in mind.

7.  IANA Considerations

   IANA is requested to update the HTTP Preferences registry established
   by [RFC7240] to add the following value for the return preference:

   *  Preference: return

   *  Value: consolidated

   *  Description: Indicates that the client prefers a consolidated
      representation of the resource

   *  Reference: this document


8.  References

8.1.  Normative References

   [RFC9110]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
              Ed., "HTTP Semantics", STD 97, RFC 9110,
              DOI 10.17487/RFC9110, June 2022,
              <https://www.rfc-editor.org/rfc/rfc9110>.

   [RFC9111]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
              Ed., "HTTP Caching", STD 98, RFC 9111,
              DOI 10.17487/RFC9111, June 2022,
              <https://www.rfc-editor.org/rfc/rfc9111>.

   [RFC7240]  Snell, J., "Prefer Header for HTTP", RFC 7240,
              DOI 10.17487/RFC7240, June 2014,
              <https://www.rfc-editor.org/rfc/rfc7240>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

8.2.  Informative References

   [RFC9113]  Thomson, M., Ed. and C. Benfield, Ed., "HTTP/2", RFC 9113,
              DOI 10.17487/RFC9113, June 2022,
              <https://www.rfc-editor.org/rfc/rfc9113>.

   [RFC9114]  Bishop, M., Ed., "HTTP/3", RFC 9114, DOI 10.17487/RFC9114,
              June 2022, <https://www.rfc-editor.org/rfc/rfc9114>.

Appendix A.  Practical Benefits

A.1.  Why This Matters

   Automated systems increasingly access web content: search engines, AI
   training systems, research tools, and monitoring services.  These
   systems must fetch multiple pages to assemble complete information,
   repeatedly crawl sites to detect changes, and employ heuristics to
   distinguish content from presentational markup.  This inefficiency
   costs money: publishers serve largely redundant traffic, consumers
   fetch and process bloated responses, and both produce suboptimal
   results.


   This specification provides a mechanism for publishers to serve
   consolidated, machine-optimised representations directly.  Publishers
   reduce costs and load, and consumers improve information quality and
   efficiency.

A.2.  Beyond Simple Conversion

   While format conversion alone reduces bandwidth and parsing overhead,
   consolidated representations provide something more valuable: a
   direct way to communicate relevance and context to automated systems.
   Rather than forcing machines to scrape multiple pages and infer
   relationships, consolidated representations explicitly state what
   information belongs together (see Appendix C for concrete
   illustrations).

   By consolidating related information, publishers guide automated
   systems to the complete picture while implicitly indicating, by
   omission, what is not relevant.  This benefits both parties: machines
   get better information with less work, and publishers reduce load
   while maintaining control over how their content is understood.

A.3.  Quantifiable Benefits

   Based on typical modern web architectures, implementing consolidated
   representations can yield substantial operational improvements.  The
   figures in this section are illustrative estimates intended to convey
   likely magnitudes of effect, not results of formal benchmarking.

   Server load can, in many cases, drop by roughly 70-90% per
   information retrieval session, as a single consolidated resource
   replaces five to twenty individual page fetches.  This reduction
   cascades through the infrastructure: fewer database queries, less
   server-side rendering overhead, and diminished load on application
   servers.

   Bandwidth consumption often decreases proportionally.  Consolidated
   representations in machine-readable formats are typically
   significantly smaller than equivalent HTML pages, lacking navigation
   chrome, advertisements, and presentational markup.  A single ETag
   check can replace multiple page checks for change detection, further
   reducing transfer overhead.  Network egress costs decline
   accordingly.

   Caching efficiency can also improve markedly.  Rather than caching
   dozens of individual pages, CDNs and intermediaries cache single
   consolidated resources.  Cache hit rates increase, invalidation
   simplifies, and CDN costs drop.


   *Illustrative example:* Consider a documentation site receiving
   10,000 automated agent visits daily.  Suppose each visit currently
   fetches an average of 10 pages to assemble complete information
   (typical for documentation traversal), representing 100,000 daily
   requests.  With consolidated representations reducing this to one
   request per visit, the site serves 90,000 fewer requests daily.

   Modern documentation pages of even modest visual complexity can
   easily reach 1 MB once navigation, styling, and scripts are included,
   while an equivalent consolidated Markdown representation might
   plausibly be 350 KB for the same underlying content.  In this
   scenario, bandwidth drops from approximately 100 GB daily (100,000
   HTML pages at 1 MB each) to 3.5 GB daily (10,000 consolidated
   Markdown responses at 350 KB each), a saving of about 96.5 GB daily
   or roughly 2.9 terabytes over a 30-day month.  Actual figures will
   vary by site design and traffic patterns, but these orders of
   magnitude are realistic for many contemporary sites.
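
   The arithmetic behind this illustrative example can be reproduced
   as follows.  The inputs are the figures stated above; the 30-day
   month and the use of decimal units (1 GB = 1,000,000 KB) are
   assumptions of this sketch.

```python
# Inputs taken from the illustrative example (decimal units).
VISITS_PER_DAY = 10_000
PAGES_PER_VISIT = 10
HTML_PAGE_KB = 1_000     # 1 MB per navigational HTML page
CONSOLIDATED_KB = 350    # per consolidated Markdown response
DAYS_PER_MONTH = 30      # assumption of this sketch

requests_before = VISITS_PER_DAY * PAGES_PER_VISIT
requests_after = VISITS_PER_DAY  # one request per visit
requests_saved = requests_before - requests_after

html_kb_per_day = requests_before * HTML_PAGE_KB
consolidated_kb_per_day = requests_after * CONSOLIDATED_KB
saved_kb_per_month = (
    (html_kb_per_day - consolidated_kb_per_day) * DAYS_PER_MONTH
)

KB_PER_GB = 1_000_000
KB_PER_TB = 1_000_000_000
print("requests saved per day:", requests_saved)
print("HTML GB per day:", html_kb_per_day / KB_PER_GB)
print("consolidated GB per day:", consolidated_kb_per_day / KB_PER_GB)
print("TB saved per month:", saved_kb_per_month / KB_PER_TB)
```

   Under these assumptions the monthly saving works out to 2,895 GB,
   consistent with the rounded figure given in the text.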

   Automated systems benefit as well.  Consolidated representations
   provide clearer information structure, reducing parsing errors from
   complex HTML and JavaScript execution.  Deliberate consolidation
   supplies context that scattered pages cannot, improving comprehension
   and result quality.

A.4.  Operational Benefits

   The benefits of explicit negotiation are not limited to reduced
   server load, improved cache efficiency, and lower bandwidth use.  A
   publisher can measure requests containing Prefer:
   return=consolidated, responses carrying Preference-Applied:
   return=consolidated, and cache behaviour varying on Accept and
   Prefer.  These are stable protocol signals rather than inferences
   from client identity.

   Client-detection approaches require servers to decide whether a
   request comes from a recognised crawler, agent, or intermediary.
   That decision may depend on User-Agent strings, IP ranges, reverse
   DNS, CDN-provided bot classifications, allow lists, or deployment-
   specific heuristics.  Each additional rule increases operational
   complexity and creates another failure mode: an unrecognised
   automated client receives navigational HTML, a recognised identity is
   imitated, or a cache cannot safely reuse a response because
   representation selection depends on implicit client classification.

   Explicit negotiation separates representation selection from access
   control, authorisation, commercial policy, abuse prevention, and
   telemetry.  Servers may still apply those policies, but the request
   for a machine-readable representation is expressed independently of
   the client's identity.  Unknown clients that implement the protocol
   can request the same representation form without being pre-
   registered, and publishers can evaluate support using ordinary HTTP
   request and response data.

   These benefits are difficult to express as universal percentages, but
   they are operationally significant.  They reduce dependence on
   brittle client-detection rules, improve cacheability, simplify
   measurement, and provide a cleaner path for new automated clients to
   interoperate with publishers.

Appendix B.  Deployment Lessons and Related Approaches

   This appendix describes deployment lessons that informed this
   specification.  The sections that follow draw on patterns in web
   deployment history that apply to machine-readable representations.

   First, mechanisms that work with HTTP deployment practice tend to
   outlast mechanisms that require publishers, clients, and
   intermediaries to adopt a separate architectural model, even where
   that model is technically well-designed.

   Second, selecting representations by detecting client identity has a
   poor history.  It can be useful during early deployment, but it does
   not scale cleanly and is difficult to unwind once clients and servers
   depend on it.

   Third, any representation used by automated systems for
   classification, ranking, summarisation, citation, recommendation, or
   other downstream action creates incentives for optimisation and
   manipulation.

   These lessons support the design choice made by this specification:
   consolidated machine-readable representations are requested and
   identified using existing HTTP semantics.  They are representations
   of resources, not side effects of path conventions, client identity,
   or deployment-specific inference.

B.1.  Relationship to Earlier Machine-Readable Web Approaches

   Various initiatives have attempted to make web content more machine-
   accessible.  The Semantic Web / RDF / Linked Data efforts, XHTML and
   XML-based publishing approaches, and embedded structured data
   approaches like Microdata and Schema.org have been under development
   for decades, yet broad and consistent adoption has not materialised.
   Feed formats like RSS and Atom achieved significant adoption but
   remained separate from the standard web browser model, requiring
   dedicated client software or an aggregator.  In practice, this
   separation was vulnerable: when major implementations were
   discontinued, the ecosystem fragmented and recovery proved difficult.

   These approaches addressed real problems, and some remain useful in
   particular domains.  The deployment lesson is narrower: mechanisms
   that require publishers, clients, and intermediaries to adopt a
   separate architectural model have struggled to displace ordinary HTTP
   deployment practice, even where the model is technically well-
   designed.

   This specification therefore takes a different approach by using only
   existing HTTP mechanisms:

   *  *Uses existing HTTP content negotiation* - RFC 9110

   *  *Uses existing Prefer header* - RFC 7240

   *  *Uses existing media types* - for example, text/markdown (RFC
      7763), application/json, application/xml

   *  *No new protocols, no new standards, no new infrastructure*

   Unlike approaches that prescribe specific data models or require
   adoption of complex frameworks, this specification leaves content
   organisation entirely to implementers.  The challenge shifts from
   technical implementation to editorial judgement: what information
   belongs together, what context is needed, what can be omitted.

B.2.  Contemporary Approaches

B.2.1.  Representations Selected by Client Detection

   Some deployments provide machine-readable or agent-oriented
   representations only to recognised automated clients.  These
   deployments are operationally useful and are not precluded by this
   specification.  They also demonstrate demand for the underlying
   pattern: publishers want to provide representations that differ from
   ordinary navigational HTML, and automated systems benefit from
   representations that are smaller, cleaner, better structured, or more
   complete for machine processing.

   However, client-detection-based deployment is not, by itself,
   representation negotiation.  Selecting a response because a server
   recognised, or believed it recognised, a particular crawler or agent
   does not provide an unknown client with a standard way to request the
   same kind of representation.  Nor does it provide a general mechanism
   for selecting between available machine-readable representations.


   During earlier phases of web deployment, servers often had genuine
   reasons to vary responses according to browser identity.  Browsers
   differed substantially in their support for HTML, CSS, scripting,
   media formats, and platform behaviours.  Server-side User-Agent
   detection was therefore a practical compatibility technique:
   operators wanted to route clients to content they could actually
   render or execute.

   That model took a long time to unwind, and it has not disappeared
   entirely.  The web eventually shifted toward capability detection,
   progressive enhancement, explicit negotiation, and well-defined
   protocol semantics, but browser-specific handling remained common for
   many years and survives in some deployments.

   Browser detection also changed client incentives.  When usable
   content depended on being recognised as a particular browser, clients
   sometimes had to identify themselves as that browser to receive a
   usable response.  The result was increasingly artificial User-Agent
   strings listing multiple products, engines, and compatibility claims.
   Even today, they are as much an accretion of historical deployment
   workarounds as a reliable means to identify a browser.

   The same failure mode will arise when machine-readable
   representations depend on detecting particular automated clients.  If
   access to a better, cleaner, cheaper, or more complete representation
   depends on being recognised as a favoured client, other clients will
   imitate that identity.  Detection rules then become a moving target,
   and deployment shifts toward a whack-a-mole model.

   A server can maintain an allow list of known crawlers or agents, but
   the set of possible clients is not knowable in advance.  Without a
   generic request mechanism, an unknown automated client must either be
   recognised heuristically, use a special-purpose path convention,
   impersonate a known client, or fall back to processing navigational
   HTML.

   This history is relevant because machine-readable representations are
   still an emerging deployment pattern.  Implementations do not have to
   repeat the failure modes of browser sniffing.  Client detection may
   remain useful for access control, abuse prevention, commercial
   policy, or telemetry, but it should not become part of the
   representation layer.  By using explicit HTTP negotiation for
   representation selection, new agent-facing systems can avoid making
   client identity a substitute for representation semantics.


B.2.2.  The llms.txt Convention

   The llms.txt convention was proposed in September 2024 as a pragmatic
   mechanism for helping automated systems use websites at inference
   time.  Its motivating concern was that context windows were too small
   to handle many websites in their entirety, and that converting
   complex HTML pages into LLM-friendly plain text was difficult and
   imprecise.  Since then, long-context models, retrieval systems, and
   infrastructure-level HTML-to-text conversion have reduced the force
   of those constraints, although precision and consistency at scale
   remain open questions.  The convention therefore reflects an earlier
   stage of deployment: a root /llms.txt Markdown file containing
   concise background information, guidance, and links to more detailed
   Markdown files, together with optional Markdown variants of
   individual pages using filename conventions.

   That approach remains useful for curated guidance, documentation
   sets, and cases where a client benefits from a compact orientation to
   a site.  It is simple to deploy, particularly for static sites and
   documentation generators, and can coexist with this specification.

   However, llms.txt is not a general representation-negotiation
   mechanism.  It does not define how a client requests a consolidated
   representation of a resource, how a server confirms that such a
   representation was supplied, or how intermediaries vary cached
   responses.  Relationships between HTML resources and Markdown
   variants are inferred from paths, links, or content discipline rather
   than expressed through HTTP semantics.

   As model context windows, retrieval systems, and agent infrastructure
   have evolved, the central deployment problem has shifted.  The issue
   is no longer only how to provide a compact guide to a site, but how
   automated clients and publishers can exchange machine-optimised
   representations predictably at web scale.  The llms.txt convention
   does not remove the need for protocol-level negotiation.

   The approaches therefore differ in architecture: llms.txt is a path
   and content convention, while this specification uses HTTP content
   negotiation and client preferences to identify representations of the
   same resource.  Both approaches recognise that automated systems
   benefit from clean machine-readable content, but they address
   different layers of the problem.


B.3.  Manipulation of Machine-Readable Representations

   Machine-readable representations create incentives.  Once automated
   systems use a representation to classify, rank, summarise, cite,
   recommend, or otherwise act on a resource, publishers and
   intermediaries have an incentive to optimise that representation for
   the consuming system.

   This is not new.  Search engines produced keyword stuffing, hidden
   text, doorway pages, cloaking, link farms, and other attempts to
   influence ranking systems.  Social platforms produced optimisation
   for preview cards, sharing heuristics, recommendation systems, and
   engagement metrics.  Machine-readable representations intended for
   automated agents are now producing the same pattern: representations
   are being created or modified specifically to influence downstream
   automated behaviour.

   Consolidated representations make the optimisation target more
   explicit.  That does not create a new class of risk.  Sites can
   already provide different content to automated clients through client
   detection, bot-specific routing, cloaking, or other response-
   selection mechanisms.  In those deployments, the consuming system may
   receive content that differs materially from the representation
   served to ordinary human navigation, without any protocol-level
   indication that this has occurred.

   This specification moves that risk into an explicit representation
   layer.  A client can request a consolidated representation, and a
   server can indicate that such a representation was supplied.  That
   improves protocol clarity, but it does not prove that the
   representation is accurate, complete, equivalent, benign, or faithful
   to any other representation of the resource.

   A publisher can decide what context to include, what to omit, what to
   emphasise, and how to structure the resource for automated
   processing.  A hostile or compromised origin can go further: it can
   serve a machine-readable representation that has little or no
   substantive relationship to the navigational HTML representation of
   the same resource.  Those behaviours are possible with or without
   this specification; the difference is that explicit negotiation makes
   the representation boundary visible.

   This is a deployment risk, not a protocol anomaly.  The protocol
   mechanism only identifies a requested representation of a resource;
   it does not attest to the accuracy, completeness, neutrality, or
   intent of that representation.  Consumers therefore need independent
   quality, provenance, corroboration, abuse-detection, and content-
   integrity controls.


   Optimisation is inevitable where automated systems create economic or
   reputational incentives.  The purpose of this specification is to
   avoid confusing representation availability with trustworthiness, and
   to ensure that clients have an explicit mechanism for requesting
   machine-readable representations without relying on client identity
   or ad-hoc detection.

Appendix C.  Example Transformation

   The following examples use text/markdown and application/json to
   illustrate consolidated representations.  These formats are chosen
   for clarity and represent common use cases, but the consolidation
   principle applies equally to any machine-readable format.

C.1.  Markdown Example: Product Website

   Consider a typical small business website with navigational
   structure:

   Site structure (navigational):
   /               (landing page: hero image, value proposition,
                    social proof testimonials, call-to-action buttons)
   /features/      (feature list with marketing copy)
   /features/a/    (detailed feature A with screenshots)
   /features/b/    (detailed feature B with screenshots)
   /pricing/       (pricing tiers with comparison table)
   /contact/       (contact form, office locations, map)
   /docs/          (technical documentation)

   A consolidated representation for the root resource provides an
   overview with links to detailed consolidated resources:


   GET / HTTP/1.1
   Accept: text/markdown;q=0.9
   Prefer: return=consolidated

   # Product Overview
   [Value proposition and core description from landing page.
    Hero images, testimonials, and CTAs omitted]

   High-level feature summary with key capabilities.

   For detailed information:
   - Features: /features/
   - Pricing: /pricing/
   - Documentation: /docs/

   [Note: Links point to same URLs; consolidated representations
    served based on Accept/Prefer headers]

   A consolidated representation for the features resource provides
   comprehensive technical detail:

   GET /features/ HTTP/1.1
   Accept: text/markdown;q=0.9
   Prefer: return=consolidated

   # Features

   ## Feature A
   ### What it does
   [Consolidated from /features/ and /features/a/ - technical
    description of capabilities]

   ### How it works
   [Implementation details from /docs/ where relevant]

   ### Pricing
   [Relevant pricing tier information for this feature, consolidated
    from /pricing/]

   ### Technical requirements
   [System requirements, API details, integration notes]

   ## Feature B
   [Similar comprehensive structure]

   Related resources:
   - Complete technical documentation: /docs/
   - Pricing comparison: /pricing/


   Note the key differences from simple page conversion:

   *  *Multiple consolidated views*: Different URLs provide different
      semantic organisations of the same underlying content

   *  *Deep consolidation*: Feature pages pull in relevant pricing,
      documentation, and technical details

   *  *Semantic restructuring*: Content organised by "what/how/
      requirements" rather than mirroring site navigation

   *  *Selective omission*: Marketing copy, testimonials, decorative
      elements excluded

   *  *Preserved navigation*: Links to other consolidated resources
      maintained for context

   This illustrates the key principle: consolidation is about semantic
   organisation and selective inclusion of substantive information, not
   mechanical conversion of all page content.  Each consolidated
   resource provides a complete, contextual view optimised for
   understanding that specific topic, drawing from multiple source pages
   as needed.

C.2.  JSON Example: Financial News Article

   Consolidated representations are not limited to text/markdown.
   Consider a financial news website:

   Site structure (navigational):
   /article/12345      (news article with ads, related links)
   /stock/ACME         (stock price chart and basics)
   /company/acme-corp  (company profile)
   /filings/acme-q3    (SEC filing summary)

   A consolidated JSON representation for the article provides
   structured data combining relevant financial information:

   GET /article/12345 HTTP/1.1
   Accept: application/json;q=0.9
   Prefer: return=consolidated

   HTTP/1.1 200 OK
   Content-Type: application/json
   Preference-Applied: return=consolidated
   ETag: "article-12345-consolidated-v2"
   Vary: Accept, Prefer


   {
     "alternateFormats": [
       {
         "type": "text/markdown",
         "title": "Article (Markdown)"
       }
     ],
     "article": {
       "headline": "ACME Corp Reports Strong Q3 Results",
       "published": "2024-12-16T14:30:00Z",
       "summary": "ACME Corp exceeded analyst expectations...",
       "content": "[Article text without ads/chrome]"
     },
     "financial_data": {
       "stock_symbol": "ACME",
       "current_price": 142.50,
       "change_percent": 8.3,
       "market_cap": "125B",
       "as_of": "2024-12-16T20:00:00Z"
     },
     "company": {
       "name": "ACME Corporation",
       "sector": "Technology",
       "employees": 15000,
       "founded": 1995
     },
     "quarterly_results": {
       "period": "Q3 2024",
       "revenue": "8.2B",
       "revenue_growth": 12.5,
       "eps": 1.85,
       "eps_expected": 1.72
     },
     "related": {
       "stock_details": "/stock/ACME",
       "company_profile": "/company/acme-corp",
       "sec_filings": "/filings/acme-q3"
     }
   }

   This JSON consolidation pulls key financial metrics from separate
   stock ticker, company profile, and filing pages, presenting them
   alongside the article content in a structured format optimised for
   automated analysis.  The same article URL serves both human-readable
   HTML and machine-readable consolidated JSON based on content
   negotiation.
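
   Because one URL now yields several representations, a cache has to
   key stored responses on the request fields named in Vary.  The
   following Python sketch shows the idea under simplifying
   assumptions: the function name cache_key and the tuple layout are
   invented for this example, and header values are only trimmed,
   whereas a production cache may normalise them further.

```python
def cache_key(method, url, request_headers, vary):
    """Build a cache key from the request fields named in Vary.

    `request_headers` maps header names to values; `vary` is the value
    of the stored response's Vary header.  Header names are matched
    case-insensitively; values are trimmed but otherwise left as sent.
    """
    lowered = {k.lower(): v for k, v in request_headers.items()}
    selected = []
    for name in vary.split(","):
        name = name.strip().lower()
        if name:
            selected.append((name, lowered.get(name, "").strip()))
    return (method.upper(), url, tuple(sorted(selected)))


# The HTML and consolidated JSON variants of the article get
# different keys, so neither is served in place of the other.
html_key = cache_key("GET", "/article/12345",
                     {"Accept": "text/html"}, "Accept, Prefer")
json_key = cache_key("GET", "/article/12345",
                     {"Accept": "application/json;q=0.9",
                      "Prefer": "return=consolidated"},
                     "Accept, Prefer")
```

   A request that omits Prefer contributes an empty value for that
   field, which keeps it distinct from a request that asked for
   return=consolidated.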


Appendix D.  Implementation

D.1.  Client Implementation

   Clients can begin requesting consolidated representations
   immediately.  The protocol is designed for graceful degradation: if a
   server does not support consolidated representations, it will simply
   return the standard representation (typically HTML), which clients
   already handle.  In conforming deployments, unsupported preferences
   are ignored rather than treated as errors.  Early adoption benefits
   clients immediately whenever any publisher implements support, with
   no cost when publishers have not yet done so.

   Clients SHOULD request text/markdown by default, as it suits text-
   heavy content such as documentation, articles, and guides.  When
   requesting resources known to contain structured data (API
   endpoints, financial feeds, datasets), clients SHOULD request
   application/json instead.  Both formats have widespread parser
   support.  Clients can refine their format preferences as publisher
   implementations mature and specific needs emerge.

   Clients SHOULD:

   1.  Include Prefer: return=consolidated in requests where
       consolidated content would be beneficial

   2.  Specify appropriate Accept headers (text/markdown by default for
       text-heavy resources, application/json for known structured
       resources)

   3.  Check for Preference-Applied: return=consolidated in responses to
       confirm support

   4.  Fall back to standard content processing when the preference is
       not honoured
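
   The client steps above can be sketched as follows.  This is a
   minimal illustration using only the Python standard library, not a
   reference implementation.  The function names, the structured flag,
   and the text/html fallback with q=0.5 are assumptions made for the
   example; quoted preference values are not handled.

```python
from urllib.request import Request

CONSOLIDATED = "return=consolidated"


def build_request(url, structured=False):
    """Build a GET request asking for a consolidated representation.

    `structured` is a hypothetical flag: callers set it when the
    resource is known to carry structured data.
    """
    preferred = "application/json" if structured else "text/markdown"
    # Assumption: keep text/html as a lower-priority fallback so that
    # a server without support can answer with its usual response.
    accept = f"{preferred}, text/html;q=0.5"
    return Request(url, headers={"Accept": accept,
                                 "Prefer": CONSOLIDATED})


def preference_applied(headers):
    """Return True if the response confirms return=consolidated.

    `headers` is any mapping of response header names to values.
    Names are matched case-insensitively, and the value is split on
    commas because several preferences can be listed.
    """
    for name, value in headers.items():
        if name.lower() != "preference-applied":
            continue
        for item in value.split(","):
            if item.replace(" ", "") == CONSOLIDATED:
                return True
    return False
```

   A caller would send the request, then branch on
   preference_applied(response.headers): process the consolidated
   body when it returns True, and fall back to ordinary HTML handling
   otherwise.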

D.2.  Server Implementation

   Implementation requires three steps:

   1.  Parse the Prefer header and recognise return=consolidated

   2.  Serve a machine-readable format when one is requested via the
       Accept header, confirming it with Preference-Applied:
       return=consolidated and Vary: Accept, Prefer

   3.  Generate appropriate consolidated content for the publisher's use
       case
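
   The first two steps can be sketched in Python as shown below.  This
   is an illustrative simplification rather than a complete parser:
   commas inside quoted strings, preference parameters, and wildcard
   media ranges such as */* are not handled.  The function names and
   the available lookup table are assumptions made for the example,
   and the sketch falls back to text/html whenever the preference is
   absent or no requested type has a consolidated variant.

```python
def parse_prefer(header_value):
    """Parse a Prefer header into {name: value or None}."""
    prefs = {}
    for item in (header_value or "").split(","):
        token = item.split(";", 1)[0].strip()  # drop parameters
        if not token:
            continue
        name, _, value = token.partition("=")
        name = name.strip().lower()  # names compare case-insensitively
        value = value.strip().strip('"') or None
        prefs.setdefault(name, value)  # first occurrence wins
    return prefs


def parse_accept(header_value):
    """Return [(media_type, q)] sorted by descending q."""
    ranges = []
    for item in (header_value or "").split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0
        ranges.append((parts[0].lower(), q))
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def select_representation(accept, prefer, available):
    """Choose a media type and the matching response headers.

    `available` maps media types to True when a consolidated variant
    of that type exists for the resource (a hypothetical table).
    """
    headers = {"Vary": "Accept, Prefer"}
    if parse_prefer(prefer).get("return") == "consolidated":
        for media_type, q in parse_accept(accept):
            if q > 0 and available.get(media_type):
                headers["Content-Type"] = media_type
                headers["Preference-Applied"] = "return=consolidated"
                return media_type, headers
    headers["Content-Type"] = "text/html"
    return "text/html", headers
```

   Vary is set on every response, including the HTML fallback, so that
   caches do not reuse one variant for a request that would have
   selected another.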


   Existing web servers, frameworks, or WAFs that already parse the
   Prefer header may need minor updates to recognise the
   return=consolidated value.  Beyond that, the technical implementation
   is straightforward.  Publishers SHOULD log when consolidated
   representations are served for analytics and capacity planning;
   logging the Preference-Applied response header provides one
   straightforward approach.

   Publishers SHOULD begin by implementing text/markdown for text-heavy
   content and application/json for structured data.  These formats have
   widespread parser support, are straightforward to generate, and align
   with client expectations.  Publishers may evolve their format
   offerings as the ecosystem matures and specific consumer needs
   emerge, but starting with these pragmatic defaults ensures immediate
   interoperability.

   The real work lies in content decisions: which information to
   consolidate, how to structure it, what context to include, what to
   omit.  These decisions depend on site architecture, content type, and
   audience needs.  This specification provides the mechanism;
   publishers provide the judgement.  (Large language models may prove
   surprisingly capable at making these editorial decisions, should
   publishers wish to automate the process.)

   Publishers can begin with minimal investment; even simple format
   conversion of existing pages provides immediate bandwidth and load
   reduction benefits.  The incremental adoption approach (below) allows
   publishers to start small and expand based on observed value.
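
   As an illustration of how small a first step can be, the following
   Python sketch converts headings, paragraphs, and links to Markdown
   with the standard library's html.parser and drops script, style,
   and navigation elements.  It is a deliberately limited example, not
   a recommended converter: the class name, the set of skipped
   elements, and the handling of only h1 to h3 are assumptions.

```python
from html.parser import HTMLParser

SKIP = {"script", "style", "nav", "header", "footer", "aside"}
HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}


class MarkdownSketch(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs, links."""

    def __init__(self):
        super().__init__()
        self.out = []        # finished Markdown blocks
        self.buf = []        # text of the block being built
        self.prefix = ""     # heading marker for the current block
        self.skip_depth = 0  # > 0 while inside a dropped element
        self.href = None     # target of the link being built

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.flush()
            self.prefix = HEADINGS[tag] + " "
        elif tag == "p":
            self.flush()
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.href = href
                self.buf.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "a" and self.href:
            self.buf.append(f"]({self.href})")
            self.href = None
        elif tag in HEADINGS or tag == "p":
            self.flush()

    def handle_data(self, data):
        if not self.skip_depth:
            self.buf.append(data)

    def flush(self):
        """Emit the current block with whitespace collapsed."""
        text = " ".join("".join(self.buf).split())
        if text:
            self.out.append(self.prefix + text)
        self.buf, self.prefix = [], ""


def html_to_markdown(html):
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.out)
```

   Lists, tables, images, and nested markup are left out; a deployment
   would normally use an established conversion tool and reserve
   hand-written logic for the editorial decisions described above.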

D.3.  Incremental Adoption

   Implementing consolidated representations does not require a complete
   site overhaul.  Publishers can adopt this specification
   incrementally:

   *Phase 1: Simple Format Conversion* Start by serving machine-readable
   versions of existing pages (ignoring the Prefer header initially).
   This provides immediate bandwidth and parsing benefits.  A simple
   conversion tool can generate Markdown from HTML with minimal effort.

   *Phase 2: Analyse Access Patterns* Monitor which pages automated
   agents fetch together.  Collect statistics on common access patterns:
   which documentation pages are read sequentially, which product pages
   are accessed alongside pricing information, and so on.


   *Phase 3: Create Targeted Consolidated Resources* Based on usage
   patterns, create consolidated representations for high-traffic
   combinations.  A site might start with just two or three strategic
   consolidated resources covering the most common information retrieval
   patterns.

   *Phase 4: Expand as Beneficial* Add consolidated representations
   where server load or bandwidth justify the effort.
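
   The Phase 2 analysis can start from ordinary access logs.  The
   Python sketch below counts which pairs of paths the same client
   fetches close together in time.  The record layout (client
   identifier, Unix timestamp, path), the 60-second gap, and the
   function names are assumptions for illustration; how a deployment
   identifies clients in its logs is left open.

```python
from collections import Counter
from itertools import combinations


def sessions(fetches, gap):
    """Split (timestamp, path) pairs into runs separated by > gap."""
    current, last_ts = [], None
    for ts, path in sorted(fetches):
        if last_ts is not None and ts - last_ts > gap:
            yield current
            current = []
        current.append(path)
        last_ts = ts
    if current:
        yield current


def co_fetch_counts(records, gap_seconds=60):
    """Count pairs of paths fetched together by the same client.

    `records` is an iterable of (client_id, unix_time, path) tuples.
    Each pair is counted once per session, however often it repeats.
    """
    by_client = {}
    for client_id, ts, path in records:
        by_client.setdefault(client_id, []).append((ts, path))

    pairs = Counter()
    for fetches in by_client.values():
        for session in sessions(fetches, gap_seconds):
            pairs.update(combinations(sorted(set(session)), 2))
    return pairs


records = [
    ("a", 0, "/features/"), ("a", 5, "/pricing/"),
    ("b", 10, "/pricing/"), ("b", 12, "/features/"),
    ("a", 500, "/docs/"),
]
counts = co_fetch_counts(records)
```

   Pairs with the highest counts are candidates for the targeted
   consolidated resources described in Phase 3.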

D.4.  Format Selection

   *Initial adoption*

   Publishers and consumers SHOULD begin with text/markdown for text-
   heavy content (documentation, articles, blogs, guides) and
   application/json for structured data (financial information, API
   responses, datasets, metrics).  These formats provide widespread
   parser availability, straightforward implementation, clear
   specifications, and existing adoption by automated consumers.
   Starting with these formats ensures immediate interoperability.

   Publishers MAY use alternative formats where compelling domain-
   specific reasons exist: for example, application/xml for complex
   structured documents where XML tooling provides clear value, or
   text/csv for tabular data in domains with established CSV
   conventions.

   *Format evolution*

   This specification is deliberately format-agnostic to accommodate
   future evolution.  Publishers and consumers SHOULD NOT treat Markdown
   and JSON as permanent requirements.  They are pragmatic starting
   points that work today.  Should new formats emerge with compelling
   advantages, this specification supports their adoption through
   discovery and standard content negotiation.  Migration is possible
   without specification change.

   *Selection principle*

   Keep it simple.  Use formats with widespread support, clear
   specifications, and proven parser availability.  Novel formats
   require compelling justification from major consumers demonstrating
   concrete benefits that outweigh implementation burden.


D.5.  Validation and Testing

   Just as social media platforms provide preview tools for Open Graph
   meta tags, organisations consuming consolidated
   representations SHOULD provide validation tools allowing publishers
   to verify their implementations.  Such tools should provide metrics
   and feedback beyond what simple command-line tools offer:
   consolidation quality assessment, structure analysis, and guidance on
   whether the representation meets consumer needs.  Only consumers
   possess the technical capability to evaluate whether consolidated
   representations suit their processing requirements.
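
   The protocol-level part of such a tool is small.  The Python sketch
   below checks one request/response exchange for the signals used in
   this document; it does not attempt the quality or structure
   assessment described above, which only a consumer can define.  The
   function name and the wording of the findings are assumptions made
   for the example.

```python
WANTED = "return=consolidated"


def check_response(request_headers, response_headers):
    """Return a list of findings for one exchange (empty if clean)."""
    req = {k.lower(): v for k, v in request_headers.items()}
    res = {k.lower(): v for k, v in response_headers.items()}
    findings = []

    asked = WANTED in req.get("prefer", "").replace(" ", "")
    applied = WANTED in res.get("preference-applied", "").replace(" ", "")
    if asked and not applied:
        findings.append("preference requested but not confirmed")

    if applied:
        # Caches need both fields in Vary to keep variants apart.
        vary = {v.strip().lower() for v in res.get("vary", "").split(",")}
        for needed in ("accept", "prefer"):
            if needed not in vary:
                findings.append(f"Vary is missing {needed.capitalize()}")

        # The selected media type should be one the client asked for.
        content_type = res.get("content-type", "")
        content_type = content_type.split(";")[0].strip().lower()
        accepted = [a.split(";")[0].strip().lower()
                    for a in req.get("accept", "").split(",")]
        if content_type not in accepted and "*/*" not in accepted:
            findings.append(
                f"Content-Type {content_type!r} was not requested")
    return findings
```

   Run against the exchange in Appendix C.2, the function returns an
   empty list; removing Prefer from the Vary header yields the single
   finding "Vary is missing Prefer".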

Appendix E.  Motivation

   The World Wide Web was originally conceived as a system for sharing
   information, with HTML providing semantic markup focused on content
   and structure.

   Over time, the web evolved to prioritise presentation.  Modern web
   pages often contain far more presentational markup, navigation,
   advertising, and scripts than actual content, so the informational
   payload is frequently only a small fraction of transmitted bytes.

   For human readers using browsers, this evolution has been successful.
   However, for automated agents attempting to extract information, this
   presentational complexity is counterproductive.  Agents must parse
   elaborate HTML, execute JavaScript, and employ heuristics to
   distinguish content from chrome.

   This specification provides a mechanism for automated agents to
   request consolidated, presentation-free representations.  The purpose
   of information-rich sites - documentation, news, research, technical
   content - is to convey information.  Whether that information is
   consumed directly by humans or via automated intermediaries is
   immaterial; the underlying purpose remains unchanged.  Sites whose
   primary value lies in substantive content benefit from making that
   content efficiently accessible to machines.  Sites whose value
   proposition is purely presentational will find this specification of
   limited relevance, which is as it should be.

   The challenge was to identify existing HTTP capabilities that could
   address this use case.  The HTTP specifications are both extensive
   and comprehensive; practical deployment requires working within these
   established capabilities.  Content negotiation with client
   preferences proved sufficient.


Acknowledgements

   The author thanks Darryl Hughes and Helen King for their careful
   review and thoughtful feedback on this document.

Author's Address

   Charles Lecklider
   Independent
   Email: charlesl@invis.net
   URI:   https://invis.net

