<?xml version="1.0" encoding="UTF-8"?>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude"
     version="3"
     ipr="trust200902"
     category="info"
     docName="draft-chen-quic-logical-vuln-mitigations-00"
     submissionType="IETF"
     consensus="false"
     sortRefs="true"
     symRefs="true"
     tocInclude="true">

  <front>
    <title abbrev="QUIC Logical Vulnerability Mitigations">
      Mitigating Logical Vulnerabilities in QUIC Implementations
    </title>

    <seriesInfo name="Internet-Draft" value="draft-chen-quic-logical-vuln-mitigations-00"/>

    <author fullname="Jianjun Chen" initials="J." surname="Chen">
      <organization>Tsinghua University</organization>
      <address>
        <postal>
          <country>China</country>
        </postal>
        <email>jianjun@tsinghua.edu.cn</email>
      </address>
    </author>

    <date year="2026" month="03" day="13"/>

    <area>Transport</area>

    <keyword>QUIC</keyword>
    <keyword>security</keyword>
    <keyword>robustness</keyword>
    <keyword>resource exhaustion</keyword>
    <keyword>state machine</keyword>

    <abstract>
      <t>
        This document describes protocol and implementation practices for
        mitigating logical vulnerabilities in QUIC implementations. It focuses
        on denial-of-service and correctness failures caused by unbounded
        resource retention, unsafe state transitions, and insufficiently
        defensive handling of adversarial but parseable protocol inputs.
      </t>
      <t>
        The document provides actionable guidance for protocol designers,
        implementers, and operators. It proposes concrete resource bounds,
        defensive validation rules, queue-management practices, and
        state-machine invariants intended to reduce exposure to memory
        exhaustion, CPU exhaustion, queue blow-up, connection starvation, and
        crash-triggering state confusion in QUIC endpoints.
      </t>
    </abstract>
  </front>

  <middle>
    <section numbered="true" anchor="intro">
      <name>Introduction</name>
      <t>
        QUIC provides low-latency connection establishment, encrypted transport,
        stream multiplexing, and path migration over UDP. The core transport is
        specified in <xref target="RFC9000"/>, TLS integration in
        <xref target="RFC9001"/>, and loss detection and congestion control in
        <xref target="RFC9002"/>.
      </t>
      <t>
        QUIC has broad deployment and therefore a broad attack surface. While
        protocol analyses and implementation testing have historically focused
        on memory corruption and parser defects, recent work shows that QUIC
        endpoints are also vulnerable to <em>logical vulnerabilities</em>:
        failures induced by valid or near-valid protocol behaviors that exploit
        ambiguous requirements, missing limits, or brittle implementation
        assumptions. Wang et al. evaluated 16 widely used QUIC implementations
        and reported 14 previously unknown logical vulnerabilities affecting
        projects such as quiche, xquic, aioquic, picoquic, h2o, lsquic, and
        neqo <xref target="QUICLOGIC"/>.
      </t>
      <t>
        This document does not redefine QUIC. Instead, it provides a defensive
        robustness profile for QUIC endpoints. Unless explicitly stated
        otherwise, the requirements in this document are implementation
        robustness recommendations and do not update the wire protocol
        specified by <xref target="RFC9000"/>, <xref target="RFC9001"/>, or
        <xref target="RFC9002"/>. The goal is to preserve interoperability
        while making denial-of-service and state-confusion attacks materially
        harder to trigger.
      </t>
    </section>

    <section numbered="true" anchor="conventions">
      <name>Conventions and Terminology</name>
      <t>
        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in BCP
        14 <xref target="BCP14"/> when, and only when, they appear in all
        capitals, as shown here.
      </t>
      <t>
        The term "logical vulnerability" refers to an implementation weakness
        that can be triggered without memory corruption and that results from
        unsafe protocol-state transitions, unbounded resource retention,
        insufficient input normalization, or inconsistent handling of malformed
        but parseable protocol events.
      </t>
    </section>

    <section numbered="true" anchor="threat-model">
      <name>Threat Model and Scope</name>
      <t>
        This document considers off-path and on-path attackers that can create
        new QUIC connections, send large numbers of frames, manipulate timing,
        withhold acknowledgments, or send frame sequences that are valid,
        ambiguous, duplicated, reordered, or otherwise adversarially chosen.
        The primary attacker objective is denial of service through memory
        exhaustion, CPU exhaustion, send-queue growth, connection starvation,
        or assertion failure.
      </t>
      <t>
        The recommendations in this document apply to both server and client
        implementations, though the operational risk is generally higher for
        publicly reachable servers. The document focuses on transport-layer
        behaviors and does not address application-layer misuse above QUIC
        except where application scheduling interacts with transport state.
      </t>
    </section>

    <section numbered="true" anchor="taxonomy">
      <name>Observed Vulnerability Classes</name>

      <section numbered="true" anchor="crypto-flood">
        <name>CRYPTO Flood</name>
        <t>
          QUIC CRYPTO frames carry TLS handshake data. <xref target="RFC9000"/>
          requires endpoints to support buffering at least 4096 bytes of
          out-of-order CRYPTO data, and it explicitly notes that, because
          CRYPTO frames are not flow-controlled, a peer could force unbounded
          buffering if implementations do not apply limits. Wang et al.
          observed implementations that retained many out-of-order CRYPTO
          fragments in memory, resulting in memory exhaustion
          <xref target="QUICLOGIC"/>.
        </t>
      </section>

      <section numbered="true" anchor="pc-flood">
        <name>PATH_CHALLENGE Flood</name>
        <t>
          During path validation, a peer sends PATH_CHALLENGE frames and expects
          corresponding PATH_RESPONSE frames as described in
          <xref target="RFC9000"/>. Wang et al. observed implementations in
          which rapid arrival of PATH_CHALLENGE frames, combined with delayed or
          missing acknowledgment of corresponding responses, caused memory
          growth, buffer retention, or unbounded send-queue expansion
          <xref target="QUICLOGIC"/>.
        </t>
      </section>

      <section numbered="true" anchor="nci-flood">
        <name>NEW_CONNECTION_ID Flood</name>
        <t>
          QUIC connection migration relies on NEW_CONNECTION_ID and
          RETIRE_CONNECTION_ID frame exchanges. <xref target="RFC9000"/>
          includes requirements around active_connection_id_limit and recommends
          limiting the number of locally retired connection IDs that remain
          unacknowledged. Wang et al. observed implementations where repeated
          NEW_CONNECTION_ID processing and delayed retirement acknowledgment
          created unbounded accumulation of RETIRE_CONNECTION_ID frames or
          send-queue growth <xref target="QUICLOGIC"/>.
        </t>
      </section>

      <section numbered="true" anchor="hold-on">
        <name>Connection Hold-On Flood</name>
        <t>
          Wang et al. reported a scheduling and lifecycle bug in which
          connection entries that were effectively closed or inactive still
          occupied positions in a traversal structure used to service active
          connections. Under attacker-induced connection churn, legitimate
          connections were starved even though CPU and memory usage appeared
          normal. This creates a "silent" denial of service that is visible in
          service availability rather than host resource alarms
          <xref target="QUICLOGIC"/>.
        </t>
      </section>

      <section numbered="true" anchor="double-unreg">
        <name>Double Unregistered CID</name>
        <t>
          Wang et al. also observed failures in connection ID retirement logic
          where an implementation could unregister or retire the same logical
          CID state twice under adversarial sequencing, ultimately triggering a
          crash condition <xref target="QUICLOGIC"/>.
        </t>
      </section>

      <section numbered="true" anchor="ack-confusion">
        <name>ACK Confusion</name>
        <t>
          ACK frames represent received packet ranges. Wang et al. generated ACK
          frames containing overlapping ranges and observed divergence among
          implementations: some ignored them, some closed the connection, and at
          least one repeatedly mutated acknowledgment state for already
          acknowledged packets, causing CPU exhaustion or crash behavior
          <xref target="QUICLOGIC"/>. Robust handling of malformed or redundant
          ACK range encodings is therefore an interoperability and security
          concern.
        </t>
      </section>
    </section>

    <section numbered="true" anchor="requirements">
      <name>Robustness Requirements for QUIC Endpoints</name>

      <section numbered="true" anchor="bounds">
        <name>Explicit Resource Bounds</name>

        <section numbered="true" anchor="bounds-crypto">
          <name>CRYPTO Reassembly Bounds</name>
          <t>
            An endpoint MUST maintain an explicit upper bound on buffered
            out-of-order CRYPTO data per connection. The implementation MUST
            apply the bound independently of allocator behavior or container
            growth policy.
          </t>
          <t>
            This document RECOMMENDS a default limit of 512 KiB of buffered
            out-of-order CRYPTO data per connection, consistent with the
            concrete example suggested by Wang et al. <xref target="QUICLOGIC"/>.
            An implementation MAY use a different value, but any larger value
            SHOULD be justified by deployment needs such as unusually large
            certificate chains or delegated credentials.
          </t>
          <t>
            If the configured bound is exceeded during the handshake, an
            endpoint SHOULD either temporarily expand the buffer only as needed
            to complete the handshake or close the connection with
            CRYPTO_BUFFER_EXCEEDED, consistent with <xref target="RFC9000"/>.
            After the handshake, an endpoint SHOULD prefer discarding excess
            CRYPTO data or terminating the connection rather than retaining
            unbounded state.
          </t>
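          <t>
            As a non-normative illustration, the following Python sketch shows
            one way such a bound could be enforced. The names
            CryptoReassembler, CryptoBufferExceeded, and max_buffered are
            hypothetical and are not taken from any QUIC stack, and the
            per-byte dictionary favors clarity over efficiency. In-order data
            is delivered immediately, so only out-of-order bytes count against
            the quota.
          </t>
          <sourcecode type="python"><![CDATA[
```python
class CryptoBufferExceeded(Exception):
    """Hypothetical signal; a real stack would close the connection
    with CRYPTO_BUFFER_EXCEEDED."""


class CryptoReassembler:
    """Reassembles one CRYPTO stream with a hard cap on buffered bytes."""

    def __init__(self, max_buffered=512 * 1024):
        self.max_buffered = max_buffered  # quota on out-of-order bytes
        self.next_offset = 0              # next in-order offset expected
        self.pending = {}                 # offset -> buffered byte value

    def receive(self, offset, data):
        """Accept one CRYPTO frame; return bytes that became deliverable."""
        out = bytearray()
        for i, byte in enumerate(data):
            pos = offset + i
            if pos < self.next_offset or pos in self.pending:
                continue  # duplicate: already delivered or buffered
            if pos > self.next_offset:
                # Out-of-order byte: charge it against the quota.
                if len(self.pending) >= self.max_buffered:
                    raise CryptoBufferExceeded(
                        f"more than {self.max_buffered} bytes buffered")
                self.pending[pos] = byte
                continue
            # In-order byte: deliver it, then drain any bytes it unblocks.
            out.append(byte)
            self.next_offset += 1
            while self.next_offset in self.pending:
                out.append(self.pending.pop(self.next_offset))
                self.next_offset += 1
        return bytes(out)
```
]]></sourcecode>
          <t>
            Because duplicates and overlapping fragments are not charged twice,
            the quota in this sketch reflects distinct retained bytes rather
            than the number of frames received.
          </t>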
        </section>

        <section numbered="true" anchor="bounds-path">
          <name>Path Validation Bounds</name>
          <t>
            An endpoint MUST bound the number of outstanding path-validation
            items it is willing to track per connection and per candidate path.
            This bound MUST apply to pending PATH_CHALLENGE state, queued
            PATH_RESPONSE state, retransmission bookkeeping, and any
            implementation-specific metadata derived from those frames.
          </t>
          <t>
            This document RECOMMENDS accepting no more than 256 outstanding
            PATH_CHALLENGE-related transactions per connection while path
            validation is in progress. Once validation completes, the counter
            can be reset, as suggested by Wang et al.
            <xref target="QUICLOGIC"/>. Endpoints SHOULD additionally apply
            per-path rate limits and SHOULD drop or coalesce redundant
            validation work when newer probes supersede older ones.
          </t>
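          <t>
            The bound and the coalescing behavior described above can be
            sketched as follows. This is a minimal, non-normative Python
            illustration; PathChallengeTracker, max_outstanding, and the use of
            a string as a path identifier are assumptions made for the example.
            It keeps at most one queued response per path, so a newer challenge
            on the same path replaces the older one instead of growing the
            queue.
          </t>
          <sourcecode type="python"><![CDATA[
```python
from collections import OrderedDict


class PathChallengeTracker:
    """Tracks queued PATH_RESPONSE work with a per-connection cap."""

    def __init__(self, max_outstanding=256):
        self.max_outstanding = max_outstanding
        self.pending = OrderedDict()  # path id -> latest challenge data

    def on_path_challenge(self, path, data):
        """Record a challenge; return False if it was dropped."""
        if path in self.pending:
            # A newer probe on the same path supersedes the older one.
            self.pending[path] = data
            return True
        if len(self.pending) >= self.max_outstanding:
            return False  # at the cap: drop instead of growing the queue
        self.pending[path] = data
        return True

    def next_response(self):
        """Pop the oldest queued (path, data) pair, or None if idle."""
        if not self.pending:
            return None
        return self.pending.popitem(last=False)

    def on_validation_complete(self):
        """Reset the tracked state once path validation finishes."""
        self.pending.clear()
```
]]></sourcecode>
          <t>
            Whether coalescing per path is acceptable depends on the
            deployment; an implementation that answers every challenge
            individually would keep the cap and omit the replacement step.
          </t>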
        </section>

        <section numbered="true" anchor="bounds-cid">
          <name>CID Retirement and Issuance Bounds</name>
          <t>
            Implementations MUST cap all internal state associated with
            NEW_CONNECTION_ID and RETIRE_CONNECTION_ID processing, including:
          </t>
          <ul spacing="normal">
            <li>active connection IDs, as bounded by active_connection_id_limit;</li>
            <li>locally retired but unacknowledged CIDs;</li>
            <li>queued RETIRE_CONNECTION_ID frames;</li>
            <li>per-path metadata associated with issued CIDs; and</li>
            <li>historical retirement bookkeeping used to prevent duplicate processing.</li>
          </ul>
          <t>
            <xref target="RFC9000"/> already recommends that endpoints limit the
            number of locally retired connection IDs whose
            RETIRE_CONNECTION_ID frames have not been acknowledged, and that
            they allow for sending and tracking a number of such frames equal
            to at least twice the value of the active_connection_id_limit
            transport parameter. This document further RECOMMENDS an
            implementation-level hard cap of 256 CID retirements in progress
            before migration completes, in line with the example suggested by
            Wang et al. <xref target="QUICLOGIC"/>.
          </t>
          <t>
            If the implementation reaches any retirement or issuance threshold,
            it SHOULD stop allocating additional per-CID state and SHOULD either
            reject the triggering input with a connection error or defer
            processing until existing retirement work is completed.
          </t>
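          <t>
            A minimal, non-normative Python sketch of a capped retirement
            tracker follows. RetireQueue, RetireQuotaExceeded, and hard_cap are
            hypothetical names. Taking the smaller of the hard cap and twice
            active_connection_id_limit is one possible policy, chosen here only
            for illustration.
          </t>
          <sourcecode type="python"><![CDATA[
```python
class RetireQuotaExceeded(Exception):
    """Hypothetical signal; a real stack might instead close with
    CONNECTION_ID_LIMIT_ERROR or defer the triggering frame."""


class RetireQueue:
    """Tracks RETIRE_CONNECTION_ID frames awaiting acknowledgment."""

    def __init__(self, active_connection_id_limit, hard_cap=256):
        # Assumed policy: twice the limit, but never above the hard cap.
        self.limit = min(hard_cap, 2 * active_connection_id_limit)
        self.unacked = set()  # CID sequence numbers being retired

    def retire(self, seq):
        """Queue retirement of seq; repeated requests are no-ops."""
        if seq in self.unacked:
            return
        if len(self.unacked) >= self.limit:
            raise RetireQuotaExceeded(
                f"{self.limit} retirements already in progress")
        self.unacked.add(seq)

    def on_retire_acked(self, seq):
        """Release the slot once the peer acknowledges the frame."""
        self.unacked.discard(seq)
```
]]></sourcecode>
          <t>
            Keying the set by sequence number means that a peer cannot inflate
            the tracked state by causing the same CID to be retired repeatedly.
          </t>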
        </section>
      </section>

      <section numbered="true" anchor="state-machine">
        <name>State-Machine Safety</name>

        <section numbered="true" anchor="closed-conns">
          <name>Closed or Inactive Connections</name>
          <t>
            Closed, draining, or otherwise non-serviceable connections MUST NOT
            remain in scheduling structures in a way that can block forward
            progress for active connections. Implementations SHOULD ensure that
            connection iteration skips closed entries without terminating the
            scheduling loop for the remaining active set.
          </t>
          <t>
            Implementations SHOULD also enforce a lifecycle invariant under
            which connection cleanup eventually removes all inactive connection
            entries from traversal and lookup structures, even if application
            streams close before transport state is fully reclaimed.
          </t>
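          <t>
            The traversal rule can be illustrated with a short, non-normative
            Python sketch. Conn and service_round are hypothetical names and
            the connection object is reduced to the fields needed for the
            example. The loop skips closed entries without stopping, and it
            returns a pruned list so that closed entries do not persist into
            the next round.
          </t>
          <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Conn:
    name: str
    closed: bool = False
    serviced: int = 0  # number of times this connection was serviced


def service_round(conns):
    """Service each open connection once; return the pruned list."""
    survivors = []
    for conn in conns:
        if conn.closed:
            # Skip the entry. Using `break` here would starve every
            # connection that follows a closed one in the structure.
            continue
        conn.serviced += 1
        survivors.append(conn)
    return survivors
```
]]></sourcecode>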
        </section>

        <section numbered="true" anchor="cid-idempotence">
          <name>CID Retirement Idempotence</name>
          <t>
            CID retirement and unregistration logic MUST be idempotent.
            Reprocessing a retirement event for the same CID sequence number
            MUST NOT free, unregister, erase, or otherwise invalidate the same
            underlying object twice.
          </t>
          <t>
            Implementations SHOULD maintain explicit per-CID state markers such
            as issued, active, retiring, retired, and destroyed. A transition
            to retired or destroyed MUST be monotonic and MUST be checked before
            any destructive action is taken.
          </t>
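          <t>
            The following non-normative Python sketch shows one way to make
            retirement idempotent with explicit state markers. CidState,
            CidTable, and the unregistered counter are hypothetical and exist
            only to make the invariant observable. The state is checked before
            the destructive step, so a replayed retirement has no effect.
          </t>
          <sourcecode type="python"><![CDATA[
```python
from enum import IntEnum


class CidState(IntEnum):
    ISSUED = 0
    ACTIVE = 1
    RETIRING = 2
    RETIRED = 3
    DESTROYED = 4


class CidTable:
    def __init__(self):
        self.state = {}        # sequence number -> CidState
        self.routes = {}       # CID bytes -> sequence number
        self.unregistered = 0  # how many destructive actions ran

    def register(self, seq, cid):
        self.state[seq] = CidState.ACTIVE
        self.routes[cid] = seq

    def retire(self, seq):
        """Retire seq; return True only the first time it takes effect."""
        current = self.state.get(seq)
        if current is None or current >= CidState.RETIRED:
            return False  # unknown, or already retired or destroyed
        # Record the new state first so the transition cannot repeat.
        self.state[seq] = CidState.RETIRED
        for cid in [c for c, s in self.routes.items() if s == seq]:
            del self.routes[cid]
        self.unregistered += 1
        return True
```
]]></sourcecode>
          <t>
            A production implementation would also bound the state map itself,
            for example by discarding markers below a retirement watermark.
          </t>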
        </section>

        <section numbered="true" anchor="ack-validation">
          <name>ACK Range Normalization and Validation</name>
          <t>
            Before mutating loss-recovery or acknowledgment state, an endpoint
            SHOULD validate that ACK ranges are well-formed, non-overlapping,
            and strictly descending according to the QUIC ACK encoding model.
            <xref target="RFC9000"/> already requires a connection error of
            type FRAME_ENCODING_ERROR when a computed packet number is
            negative. Receipt of an ACK frame whose reconstructed ranges
            overlap, or repeat packet numbers in a way the implementation does
            not intentionally support, SHOULD likewise be treated as a
            connection error of type FRAME_ENCODING_ERROR or
            PROTOCOL_VIOLATION.
          </t>
          <t>
            An endpoint MUST NOT repeatedly mutate per-packet acknowledgment
            state for the same packet number solely because that packet appears
            multiple times in a malformed ACK frame. If the implementation
            chooses not to close the connection immediately, it MUST at minimum
            normalize duplicate ranges so that each acknowledged packet is
            processed at most once.
          </t>
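          <t>
            As a non-normative illustration, the Python sketch below
            reconstructs ranges using the arithmetic in
            <xref target="RFC9000"/> and rejects underflow before any state is
            touched. It then acknowledges each outstanding packet at most once.
            FrameEncodingError, decode_ack_ranges, and apply_ack are
            hypothetical names, and the inputs are assumed to be non-negative
            integers already decoded from the frame.
          </t>
          <sourcecode type="python"><![CDATA[
```python
class FrameEncodingError(Exception):
    """Hypothetical stand-in for closing with FRAME_ENCODING_ERROR."""


def decode_ack_ranges(largest_acked, first_range, ranges):
    """Return [(smallest, largest), ...] in descending order.

    `ranges` holds the (gap, ack_range_length) pairs from the frame.
    """
    smallest = largest_acked - first_range
    if smallest < 0:
        raise FrameEncodingError("first ACK range underflows")
    decoded = [(smallest, largest_acked)]
    for gap, length in ranges:
        largest = smallest - gap - 2
        smallest = largest - length
        if smallest < 0:  # also catches largest < 0, since length >= 0
            raise FrameEncodingError("ACK range underflows")
        decoded.append((smallest, largest))
    return decoded


def apply_ack(decoded, outstanding):
    """Remove acknowledged packets from `outstanding`; return them.

    Iterating over sent packets, not over the ranges, keeps the work
    bounded by local state and touches each packet at most once.
    """
    acked = [pn for pn in sorted(outstanding)
             if any(lo <= pn <= hi for lo, hi in decoded)]
    outstanding.difference_update(acked)
    return acked
```
]]></sourcecode>
          <t>
            Applying the same decoded ranges a second time returns an empty
            list, because the acknowledged packets are no longer outstanding.
          </t>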
        </section>
      </section>

      <section numbered="true" anchor="queue-mgmt">
        <name>Queue Management and Backpressure</name>
        <t>
          Every QUIC endpoint SHOULD enforce quotas on internal queues whose
          growth can be influenced by peer input, including send queues,
          retransmission queues, CRYPTO reassembly buffers, path-validation
          queues, ACK processing work queues, and CID-retirement queues.
        </t>
        <t>
          Implementations SHOULD define admission-control behavior for each such
          queue. Once a queue reaches its configured limit, the endpoint SHOULD
          drop superseded work, coalesce equivalent work items, defer further
          processing, or close the connection. Blindly appending new work items
          is NOT RECOMMENDED.
        </t>
        <t>
          Endpoints SHOULD also maintain per-connection and per-peer fairness
          controls so that one connection cannot indefinitely consume processing
          budget needed by others.
        </t>
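        <t>
          A minimal, non-normative Python sketch of admission control and a
          per-connection processing budget follows. The function names admit
          and drain_round, and the parameters limit and budget_per_conn, are
          assumptions made for this example. The caller decides what to do when
          admission fails: drop, defer, or close the connection.
        </t>
        <sourcecode type="python"><![CDATA[
```python
from collections import deque


def admit(queue, item, limit):
    """Append item unless the queue is at its limit; report the result."""
    if len(queue) >= limit:
        return False  # caller drops, defers, or closes the connection
    queue.append(item)
    return True


def drain_round(queues, budget_per_conn):
    """Process at most budget_per_conn items from each connection.

    `queues` maps a connection id to a deque of pending work items.
    """
    processed = []
    for conn_id, queue in queues.items():
        for _ in range(min(budget_per_conn, len(queue))):
            processed.append((conn_id, queue.popleft()))
    return processed
```
]]></sourcecode>
        <t>
          With a budget of two items per round, a connection holding a hundred
          queued items cannot delay a connection holding one item by more than
          two processing steps.
        </t>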
      </section>

      <section numbered="true" anchor="implementation">
        <name>Implementation Guidance</name>
        <t>
          Wang et al. observed that resource-consumption bugs were more common
          in implementations written in higher-level languages using dynamic
          containers such as lists, dictionaries, or vectors
          <xref target="QUICLOGIC"/>. This document therefore RECOMMENDS that
          implementations:
        </t>
        <ul spacing="normal">
          <li>attach explicit quotas to all attacker-influenced dynamic containers;</li>
          <li>separate protocol correctness from allocator growth behavior;</li>
          <li>instrument memory, queue depth, and per-connection state cardinality;</li>
          <li>use invariant checks in debug and fuzzing builds for CID and ACK state; and</li>
          <li>treat path validation, retirement, and reassembly structures as security-sensitive resources.</li>
        </ul>
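        <t>
          The first and third recommendations can be combined in a small
          wrapper, sketched here in Python as a non-normative illustration.
          BoundedDict, QuotaExceeded, and high_water are hypothetical names.
          Building on collections.abc.MutableMapping routes update() and
          setdefault() through the same quota check as direct assignment.
        </t>
        <sourcecode type="python"><![CDATA[
```python
from collections.abc import MutableMapping


class QuotaExceeded(Exception):
    """Raised when an insert would exceed the configured quota."""


class BoundedDict(MutableMapping):
    """Mapping with a hard entry quota and a high-water-mark gauge."""

    def __init__(self, quota):
        self.quota = quota
        self.high_water = 0  # largest size observed, for monitoring
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # Overwriting an existing key never grows the container.
        if key not in self._data and len(self._data) >= self.quota:
            raise QuotaExceeded(f"quota of {self.quota} entries reached")
        self._data[key] = value
        self.high_water = max(self.high_water, len(self._data))

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```
]]></sourcecode>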
      </section>

      <section numbered="true" anchor="testing">
        <name>Continuous Differential and Adversarial Testing</name>
        <t>
          The vulnerabilities motivating this document were discovered through
          adversarial black-box fuzzing and differential testing across
          implementations <xref target="QUICLOGIC"/>. QUIC stacks SHOULD be
          tested continuously with:
        </t>
        <ul spacing="normal">
          <li>long frame sequences rather than single-frame mutations;</li>
          <li>delayed, dropped, duplicated, and reordered frame interactions;</li>
          <li>resource-usage assertions, not only crash detection;</li>
          <li>cross-implementation differential checks for connection behavior; and</li>
          <li>invariants covering connection cleanup, ACK processing, and CID retirement.</li>
        </ul>
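        <t>
          A resource-usage assertion over a long frame sequence can be as
          simple as the following non-normative Python sketch. The helper
          peak_cardinality is hypothetical, and the two toy handlers stand in
          for an endpoint under test; a real harness would drive an actual QUIC
          stack and read its internal counters.
        </t>
        <sourcecode type="python"><![CDATA[
```python
from collections import deque


def peak_cardinality(feed, state_size, frames):
    """Feed frames one by one; return the largest state size seen."""
    peak = 0
    for frame in frames:
        feed(frame)
        peak = max(peak, state_size())
    return peak


# Toy handlers: one retains every frame, one keeps a bounded window.
leaky = []
bounded = deque(maxlen=8)
frames = [("PATH_CHALLENGE", n) for n in range(1000)]

leaky_peak = peak_cardinality(leaky.append, lambda: len(leaky), frames)
bounded_peak = peak_cardinality(
    bounded.append, lambda: len(bounded), frames)

# A test would fail the leaky handler and pass the bounded one.
LIMIT = 256
results = {"leaky": leaky_peak <= LIMIT, "bounded": bounded_peak <= LIMIT}
```
]]></sourcecode>
        <t>
          Checks of this kind catch unbounded retention even when the process
          never crashes during the test run.
        </t>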
      </section>
    </section>

    <section numbered="true" anchor="updates">
      <name>Suggested Areas for Future QUIC Specification Clarification</name>
      <t>
        Wang et al. argue that some implementation divergence stems from places
        where existing RFC text provides minimum behavior but not explicit upper
        bounds <xref target="QUICLOGIC"/>. Based on that observation, future
        QUIC maintenance or applicability documents should consider clarifying:
      </t>
      <ul spacing="normal">
        <li>the expectation that CRYPTO buffering is both mandatory and bounded;</li>
        <li>the expectation that path-validation state is bounded and rate-limited;</li>
        <li>the expectation that CID retirement tracking is finite and subject to backpressure;</li>
        <li>the handling of malformed or overlapping ACK ranges; and</li>
        <li>implementation obligations for cleanup of inactive connection state.</li>
      </ul>
      <t>
        Such clarifications would improve interoperability by reducing the space
        of "technically plausible but operationally unsafe" implementation
        choices.
      </t>
    </section>

    <section numbered="true" anchor="security-considerations">
      <name>Security Considerations</name>
      <t>
        This entire document is about security considerations. Logical
        vulnerabilities in QUIC can enable denial-of-service attacks without
        violating packet protection or exploiting memory corruption. The
        recommended mitigations reduce exposure to memory exhaustion, CPU
        exhaustion, queue blow-up, connection starvation, and crash-triggering
        state confusion.
      </t>
      <t>
        None of the mitigations in this document eliminate the need for
        authentication, address validation, anti-amplification controls, or
        congestion control as defined by the base QUIC specifications. Instead,
        they complement those mechanisms by constraining state growth and
        hardening transport logic against adversarial but syntactically
        parseable inputs.
      </t>
    </section>

    <section numbered="true" anchor="iana">
      <name>IANA Considerations</name>
      <t>
        This document has no IANA actions.
      </t>
    </section>
  </middle>

  <back>
    <references>
      <name>Normative References</name>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml-rfcsubseries/reference.BCP.14.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9000.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9001.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9002.xml"/>
    </references>

    <references>
      <name>Informative References</name>

      <reference anchor="QUICLOGIC"
                 target="https://www.ndss-symposium.org/ndss-paper/identifying-logical-vulnerabilities-in-quic-implementations/">
        <front>
          <title>Identifying Logical Vulnerabilities in QUIC Implementations</title>
          <author initials="K." surname="Wang" fullname="Kaihua Wang"/>
          <author initials="J." surname="Chen" fullname="Jianjun Chen"/>
          <author initials="P." surname="Chen" fullname="Pinji Chen"/>
          <author initials="J." surname="Zhuge" fullname="Jianwei Zhuge"/>
          <author initials="J." surname="Bai" fullname="Jiaju Bai"/>
          <author initials="H." surname="Duan" fullname="Haixin Duan"/>
          <date month="February" year="2026"/>
        </front>
        <seriesInfo name="NDSS Symposium" value="2026"/>
      </reference>
    </references>
  </back>
</rfc>
