Media Over QUIC                                                 M. Jiang
Internet-Draft                                                    Y. Liu
Intended status: Standards Track                            Alibaba Inc.
Expires: 18 September 2026                                         R. Wu
                                                              Ant Group.
                                                           17 March 2026


                        MoQ Multimodal Feedback
                 draft-jiang-moq-multimodal-feedback-00

Abstract

   This document defines an extension to Media over QUIC Transport
   (MOQT) that enables MoQ receivers to report delivery quality
   information for media Objects to senders through MoQ Multimodal
   Feedback (MMF) reports.  The MoQ layer synthesizes MMF feedback and
   local congestion control (CC) output to compute control decisions
   such as bitrate, frame rate, and pacing, and informs the CC algorithm
   module via a cross-layer control interface.  This mechanism reuses
   the MOQT Track/Object data model without introducing new control
   message types.  QUIC ACK and reception timestamp extensions continue
   to provide per-packet CC signals; this mechanism adds per-Object
   media semantic feedback when the MMF extension is negotiated and
   enabled.

Discussion Venues

   This note is to be removed before publishing as an RFC.

   Discussion of this document takes place on the Media Over QUIC
   Working Group mailing list (moq@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/browse/moq/.

   Source for this draft and an issue tracker can be found at
   https://github.com/Yanmei-Liu/draft-moq-multimodal-feedback.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.


Jiang, et al.           Expires 18 September 2026               [Page 1]

Internet-Draft           MoQ Multimodal Feedback              March 2026


   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 18 September 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   4
     1.1.  Conventions and Definitions . . . . . . . . . . . . . . .   4
   2.  Motivation  . . . . . . . . . . . . . . . . . . . . . . . . .   5
   3.  Architecture  . . . . . . . . . . . . . . . . . . . . . . . .   5
     3.1.  Three-Layer Architecture: Application / MoQ /
           Transport . . . . . . . . . . . . . . . . . . . . . . . .   5
       3.1.1.  Application Layer: Media Sources  . . . . . . . . . .   5
       3.1.2.  MoQ Layer: Semantic Hub . . . . . . . . . . . . . . .   6
       3.1.3.  Transport Layer: CC Algorithms  . . . . . . . . . . .   7
     3.2.  Dual-Layer Feedback Model: QUIC receive-ts and MMF  . . .   7
       3.2.1.  QUIC receive-ts Requirements  . . . . . . . . . . . .   8
     3.3.  Cross-Layer Control Interface . . . . . . . . . . . . . .   8
     3.4.  Bidirectional MMF: Output Feedback and Input Feedback . .  10
   4.  Feedback Track  . . . . . . . . . . . . . . . . . . . . . . .  10
     4.1.  Track Definition  . . . . . . . . . . . . . . . . . . . .  10
     4.2.  Track Naming  . . . . . . . . . . . . . . . . . . . . . .  11
     4.3.  Track Establishment and Lifecycle . . . . . . . . . . . .  11
     4.4.  Transport and Priority  . . . . . . . . . . . . . . . . .  12
   5.  Feedback Report Format  . . . . . . . . . . . . . . . . . . .  12
     5.1.  Report Structure  . . . . . . . . . . . . . . . . . . . .  13
     5.2.  Object Entry  . . . . . . . . . . . . . . . . . . . . . .  13
       5.2.1.  Object ID . . . . . . . . . . . . . . . . . . . . . .  13
       5.2.2.  Status  . . . . . . . . . . . . . . . . . . . . . . .  14
       5.2.3.  Receive Timestamp Delta . . . . . . . . . . . . . . .  14
     5.3.  Delivery Status Codes . . . . . . . . . . . . . . . . . .  14
       5.3.1.  RECEIVED_LATE Determination . . . . . . . . . . . . .  15
       5.3.2.  NOT_RECEIVED Reporting Timing . . . . . . . . . . . .  15
       5.3.3.  PARTIALLY_RECEIVED  . . . . . . . . . . . . . . . . .  15
     5.4.  Summary Stats Block . . . . . . . . . . . . . . . . . . .  16
       5.4.1.  Report Interval . . . . . . . . . . . . . . . . . . .  16
       5.4.2.  Total Objects Evaluated . . . . . . . . . . . . . . .  16
       5.4.3.  Objects Received  . . . . . . . . . . . . . . . . . .  17
       5.4.4.  Objects Received Late . . . . . . . . . . . . . . . .  17
       5.4.5.  Objects Lost  . . . . . . . . . . . . . . . . . . . .  17
       5.4.6.  Avg Inter-Arrival Delta . . . . . . . . . . . . . . .  17
     5.5.  Optional Media Metrics  . . . . . . . . . . . . . . . . .  18
     5.6.  Report Size Control . . . . . . . . . . . . . . . . . . .  20
       5.6.1.  Encoding Example  . . . . . . . . . . . . . . . . . .  21
   6.  Negotiation . . . . . . . . . . . . . . . . . . . . . . . . .  22
     6.1.  Setup Parameter . . . . . . . . . . . . . . . . . . . . .  22
     6.2.  Capability Negotiation Rules  . . . . . . . . . . . . . .  23
     6.3.  Behavior When Parameter Not Declared  . . . . . . . . . .  23
     6.4.  Runtime Capability Change . . . . . . . . . . . . . . . .  24
   7.  Receiver Behavior . . . . . . . . . . . . . . . . . . . . . .  24
     7.1.  Arrival Time Recording  . . . . . . . . . . . . . . . . .  24
     7.2.  Delivery Status Determination . . . . . . . . . . . . . .  24
     7.3.  MMF Generation Frequency  . . . . . . . . . . . . . . . .  24
     7.4.  Object Entry Selection  . . . . . . . . . . . . . . . . .  25
     7.5.  Exception Handling  . . . . . . . . . . . . . . . . . . .  26
   8.  Sender Behavior . . . . . . . . . . . . . . . . . . . . . . .  26
     8.1.  Object to QUIC Packet Mapping . . . . . . . . . . . . . .  26
       8.1.1.  Object Granularity  . . . . . . . . . . . . . . . . .  26
       8.1.2.  Packet Not Crossing Object Boundaries . . . . . . . .  26
       8.1.3.  Sender-Side per-Object Transmission Statistics  . . .  27
       8.1.4.  Frame-Level Pacing  . . . . . . . . . . . . . . . . .  28
     8.2.  Application-Layer Consumption . . . . . . . . . . . . . .  28
     8.3.  Transport-Layer Consumption (Cross-Layer Control) . . . .  29
     8.4.  Example: MoQ Layer Controlling BBR  . . . . . . . . . . .  29
   9.  Application Scenarios: Streaming Media and AI Inference . . .  30
     9.1.  MMF-Driven Rate-Quality Adaptation  . . . . . . . . . . .  30
     9.2.  Typical Use Cases . . . . . . . . . . . . . . . . . . . .  32
       9.2.1.  Use Case A: Video Live Streaming ABR Adaptation . . .  32
       9.2.2.  Use Case B: Bandwidth Drop (Audio-Video Mixed)  . . .  32
       9.2.3.  Use Case C: Multi-Layer Quality Adaptation (AI
               Inference)  . . . . . . . . . . . . . . . . . . . . .  33
       9.2.4.  Use Case D: Generation Rate Overload (AI
               Inference)  . . . . . . . . . . . . . . . . . . . . .  33
       9.2.5.  Use Case E: Streaming Input Inference (Bidirectional
               MMF)  . . . . . . . . . . . . . . . . . . . . . . . .  33
   10. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  34
   11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  34
   12. Normative References  . . . . . . . . . . . . . . . . . . . .  35
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  35


1.  Introduction

   Media over QUIC Transport (MOQT, [MoQTransport]) is a QUIC-based
   publish/subscribe media transport framework.  In low-latency
   interactive scenarios, senders need media delivery quality
   information from the peer to adjust their sending strategies.
   Adjustments occur at two layers:

   *  *Application layer:* Encoders adjust bitrate/frame rate, inference
      systems adjust generation rate, and ABR switches Tracks.

   *  *Transport layer:* Congestion control (CC) algorithms adjust cwnd/
      pacing rate.

   QUIC transport-layer feedback (QUIC ACK, receive timestamps
   [quic-receive-ts]) covers only the transport layer, leaving blind
   spots at the MoQ semantic level (see Section 2 for details).  This
   document defines a MoQ-layer feedback mechanism that provides per-
   Object media semantic feedback to the application layer.  The MoQ
   layer also serves as the control layer for CC algorithms,
   synthesizing MMF signals and local transport state to issue control
   commands such as pacing rate and pacing gain to CC (see Section 3.3
   for details).

1.1.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms are used throughout this document:

   *  *MOQT:* Media over QUIC Transport, following [MoQTransport].

   *  *Object:* The smallest semantic unit of data delivery in MOQT.

   *  *Feedback Track:* A MoQ Track that carries MMF feedback.

   *  *MMF (MoQ Multimodal Feedback):* A receiver report at the MoQ
      layer containing per-Object delivery status.

   *  *Cross-Layer Control Interface:* A mechanism for the MoQ layer to
      issue control commands to CC algorithms; the specific form is
      implementation-defined.


2.  Motivation

   QUIC layer feedback mechanisms (ACK, Receive Timestamps) operate at
   the packet level, leaving the following blind spots at the MoQ layer:

   +==================+===============================================+
   | Blind Spot       | Description                                   |
   +==================+===============================================+
   | Object Semantics | QUIC ACK confirms packets but cannot perceive |
   |                  | frame integrity, type, or deadline            |
   +------------------+-----------------------------------------------+
   | Frame-Level      | QUIC ACK provides packet-level delay but      |
   | Timing           | cannot provide inter-Object arrival timing    |
   +------------------+-----------------------------------------------+
   | Playback         | QUIC layer cannot know peer's playback buffer |
   | Progress         | level or application-layer consumption state  |
   +------------------+-----------------------------------------------+

                                 Table 1

   This document defines a MoQ-layer feedback mechanism (MMF) that
   supplies Object-level semantic signals that are unavailable at the
   QUIC layer.  MMF serves both the application layer (per-Object
   delivery status) and CC algorithms (aggregate statistics injection).
   MMF enables senders to reduce sending quality before the peer's
   playback buffer is depleted, rather than reacting only after
   stuttering occurs.

3.  Architecture

3.1.  Three-Layer Architecture: Application / MoQ / Transport

   This mechanism adopts a three-layer architecture: the application
   layer serves as the media source, the transport layer implements CC
   algorithms, and the MoQ layer resides in between, responsible for
   semantic understanding and signal distribution.

3.1.1.  Application Layer: Media Sources

   MMF does not restrict the type of application-layer media sources.
   Two typical media source types are:


   +==================+====================+===========================+
   | Media Source     | Characteristics    | MMF-Driven Adjustments    |
   | Type             |                    |                           |
   +==================+====================+===========================+
   | Traditional      | Unidirectional     | Unidirectional            |
   | Media            | output,            | adjustment: encoding      |
   | (encoder/camera/ | controllable       | bitrate, frame rate,      |
   | live streaming)  | frame rate/        | resolution, ABR Track     |
   |                  | bitrate            | switching                 |
   +------------------+--------------------+---------------------------+
   | AI Inference     | Bidirectional      | Bidirectional adjustment: |
   | Pipeline         | interaction,       | above general adjustments |
   | (multimodal      | tunable            | + inference parameters    |
   | inference        | generation         | (chunk_size, flush        |
   | engine)          | parameters         | strategy)                 |
   +------------------+--------------------+---------------------------+

                                  Table 2

   Both media source types share the same MMF format; the difference
   lies only in the adjustable actions that can be taken after consuming
   MMF.

3.1.2.  MoQ Layer: Semantic Hub

   The MoQ layer assumes two responsibilities in the three-layer
   architecture:

   *  *Semantic Translation:* The MoQ Track/Object/Group model provides
      feedback with frame-level semantics.  MMF reports Object delivery
      status (complete, expired, lost) rather than packet-level status.

   *  *Control Hub:* The MoQ layer synthesizes MMF feedback and local
      transport state, driving application-layer adjustments upward via
      callbacks (encoding bitrate, frame rate) and instructing CC to
      adjust sending rate downward via the cross-layer control interface
      (pacing_rate, pacing_gain).  The MoQ layer can also directly
      execute certain adjustments (ABR Track switching, Object
      transmission frequency control) without requiring application-
      layer media source cooperation.

   The Feedback Track is a normal MoQ Track, at the same level as
   audio/video/text Tracks.  This mechanism introduces no new QUIC
   frames or MOQT control messages.


3.1.3.  Transport Layer: CC Algorithms

   CC algorithms (BBR, GCC, etc.) are responsible for congestion
   detection and bandwidth estimation based on local QUIC ACK and
   receive-ts.  In real-time media scenarios, the MoQ layer issues
   control commands (pacing_rate, pacing_gain) to CC via the cross-layer
   control interface (Section 3.3); CC algorithms SHOULD execute these
   commands.  The MoQ layer makes decisions by synthesizing three
   sources of information: MMF feedback, local CC output (BWE), and
   frame-level statistics (Section 8.1), achieving higher information
   completeness than CC algorithms alone.  An integration example is
   provided in Section 8.4.

3.2.  Dual-Layer Feedback Model: QUIC receive-ts and MMF

   CC algorithms rely on per-packet delay signals for bandwidth
   estimation and congestion detection; these signals are provided by
   QUIC receive-ts ([quic-receive-ts]).  MMF operates at the MoQ layer
   and complements QUIC receive-ts, forming a dual-layer feedback
   model.  Implementations can enable both mechanisms simultaneously.

    +==========+=============+============+=============+=============+
    |Feedback  | Granularity | Hop        | CC Role     | Application |
    |Layer     |             |            |             | Layer Role  |
    +==========+=============+============+=============+=============+
    |QUIC      | per-packet  | QUIC       | Primary     | None        |
    |receive-ts| (~us)       | connection | signal      |             |
    |          |             | layer      | (delay      |             |
    |          |             |            | gradient,   |             |
    |          |             |            | BW est.)    |             |
    +----------+-------------+------------+-------------+-------------+
    |MoQ MMF   | per-Object  | Client/    | MoQ layer   | Primary     |
    |          | (~ms)       | Server     | synthesizes | signal      |
    |          |             | direct     | and issues  | (bitrate/   |
    |          |             |            | control     | frame       |
    |          |             |            | commands    | rate/ABR/   |
    |          |             |            |             | inference   |
    |          |             |            |             | parameter   |
    |          |             |            |             | adjustment) |
    +----------+-------------+------------+-------------+-------------+

                                  Table 3

   The two layers cover different granularities:


   QUIC receive-ts provides per-packet inter-arrival delta, enabling CC
   algorithms to perform delay-based congestion detection and bandwidth
   estimation.

   MMF supplements additional signals:

   *  *Per-Object Signals:* The QUIC transport layer cannot determine
      whether an Object is complete or arrived within its deadline.  MMF
      provides per-Object status (RECEIVED_LATE, NOT_RECEIVED, etc.).

   *  *Application-Layer Metrics:* Playout headroom (PLAYOUT_AHEAD),
      receiver-side bandwidth estimation, etc.

3.2.1.  QUIC receive-ts Requirements

   When both extensions are negotiated, implementations need to support
   QUIC receive-ts ([quic-receive-ts]) and MMF simultaneously.

   QUIC receive-ts carries per-packet reception timestamps (Timestamp
   Range / Timestamp Delta) via ACK_EXTENDED frames, enabling CC
   algorithms to compute inter-arrival delta (delay gradient) and
   bandwidth estimation.  Its role is equivalent to TWCC in WebRTC:

   *  *QUIC receive-ts:* per-packet reception timestamps --> delay-based
      CC (GCC/SQP, etc.)

   *  *MMF:* per-Object delivery status --> application-layer adaptation
      + MoQ-layer CC control

   Compared to WebRTC GCC+TWCC, this framework adds a per-Object
   semantic feedback layer on top of per-packet feedback.  TWCC only
   provides packet-level arrival times and cannot express Object
   completeness, expiration status, or playout headroom.

3.3.  Cross-Layer Control Interface

   MoQ implementations can provide a cross-layer control interface
   that enables the MoQ layer to issue control commands to CC
   algorithms.  The specific form of the interface is implementation-
   defined.

   Unlike approaches that pass raw data to CC algorithms for
   independent judgment, this mechanism places the control decision in
   the MoQ layer.  The MoQ layer has three sources of information: MMF
   feedback, local CC output (BWE), and frame-level statistics
   (Section 8.1).  Together these give it a more complete view than CC
   algorithms have alone.


   CC algorithms can continue to run their own congestion detection
   logic, or can be adapted to accept control commands issued by the
   MoQ layer.

   *Control Commands:*

    +================+================================================+
    | Command        | Description                                    |
    +================+================================================+
    | target_bitrate | Target encoding bitrate computed by MoQ layer  |
    |                | from BWE and MMF; notifies CC of current       |
    |                | application-layer sending budget               |
    +----------------+------------------------------------------------+
    | pacing_gain    | Sending gain coefficient; MoQ layer calculates |
    |                | from MMF signals (Object loss rate, expiration |
    |                | rate, playout headroom); CC may execute as     |
    |                | pacing_rate = BWE x pacing_gain                |
    +----------------+------------------------------------------------+
    | pacing_rate    | Directly specify sending rate; MoQ layer       |
    |                | calculates and issues in scenarios like frame- |
    |                | level pacing (Section 8.1.4)                   |
    +----------------+------------------------------------------------+

                                  Table 4

   *MoQ Layer Decision Inputs:*

   The MoQ layer synthesizes the following signals for control
   decisions:

   *  *MMF Signals (from peer):* Object loss rate, expiration rate, Avg
      Inter-Arrival Delta, PLAYOUT_AHEAD_MS

   *  *CC Output (from local):* BWE, RTT, loss rate

   *  *Frame-Level Statistics (from local, Section 8.1.3):* per-Object
      transmission/loss/delivery duration

   *CC Algorithm Behavior:*

   CC algorithms SHOULD execute control commands after receiving them:

   *  *Upon pacing_gain:* Update sending rate as pacing_rate = BWE x
      pacing_gain

   *  *Upon pacing_rate:* Use this value directly as sending rate

   *  *Upon target_bitrate:* Record current application-layer sending
      budget for internal decision reference
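
   The command handling listed above can be sketched as follows.  This
   is a minimal illustration, not a defined API: the class name, method
   names, and fields are hypothetical, and the sketch assumes a CC
   module that paces at its own bandwidth estimate (BWE) until the MoQ
   layer issues a command.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CongestionController:
    """Hypothetical CC state; all names here are illustrative only."""

    bwe_bps: float                              # CC's own bandwidth estimate
    pacing_rate_bps: float = field(init=False)  # rate currently used for pacing
    target_bitrate_bps: Optional[float] = None  # last recorded sending budget

    def __post_init__(self) -> None:
        # Assumption: with no command from the MoQ layer, pace at the BWE.
        self.pacing_rate_bps = self.bwe_bps

    def on_pacing_gain(self, gain: float) -> None:
        # pacing_rate = BWE x pacing_gain
        self.pacing_rate_bps = self.bwe_bps * gain

    def on_pacing_rate(self, rate_bps: float) -> None:
        # Use the value issued by the MoQ layer directly as the sending rate.
        self.pacing_rate_bps = rate_bps

    def on_target_bitrate(self, bitrate_bps: float) -> None:
        # Only recorded for internal reference; pacing is left unchanged.
        self.target_bitrate_bps = bitrate_bps


cc = CongestionController(bwe_bps=4_000_000.0)
cc.on_pacing_gain(0.75)           # MoQ layer backs off on rising expiration rate
gain_rate = cc.pacing_rate_bps
cc.on_pacing_rate(2_500_000.0)    # e.g. frame-level pacing
cc.on_target_bitrate(2_000_000.0)
```

   In this sketch target_bitrate never alters the pacing rate, which
   matches its description as a recorded budget rather than a rate
   command.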


   When no control commands are issued by the MoQ layer, CC algorithms
   operate normally according to their own logic.

   The cross-layer control interface is optional.  CC algorithms that do
   not support this interface can still operate independently but cannot
   benefit from MMF-driven frame-level control capabilities.

3.4.  Bidirectional MMF: Output Feedback and Input Feedback

   The MMF report direction described in the preceding sections is from
   client to server, reporting delivery quality of downstream media
   (audio_response, text_response).  In streaming input scenarios, the
   server simultaneously subscribes to upstream media (audio_input,
   video_input) and MAY establish a reverse feedback Track to report
   upstream media delivery quality to the client.

   Client                           Server
   |                                |
   |----- audio_input ------------->| Server subscribes
   |<---- input_feedback (MMF) -----| Server reports input delivery quality
   |                                |
   |<---- audio_response -----------| Client subscribes
   |-- multimodal_feedback (MMF) -->| Client reports output delivery quality
   |                                |

   Output MMF (client-->server) and Input MMF (server-->client) use the
   same report format (Section 5).  Their purposes differ:

   *  *Output MMF:* Server adjusts encoding bitrate, inference
      parameters, and CC based on it.

   *  *Input MMF:* Client adjusts upstream behavior based on it:

      -  *Audio Input:* Adjust chunk size, encoding bitrate,
         transmission frequency

      -  *Video Input:* Adjust frame rate, resolution, pause/resume

   Input MMF is optional.  Support for Input MMF is declared via bit 2
   of the Setup Parameter bitmap (Section 6.1).

4.  Feedback Track

4.1.  Track Definition

   The Feedback Track is a normal MoQ Track where the Payload of each
   Object is an MMF.  Each MMF is published as an independent Object
   with monotonically increasing Object ID within a Group.


   The Group division strategy for Feedback Track is implementation-
   defined.  It is RECOMMENDED to use a single Group (Group ID = 0) with
   continuously incrementing Object IDs to simplify implementation.

   Each Feedback Track MUST be associated 1:1 with a media Track.  The
   association is established during the PUBLISH phase (Section 4.3) and
   is not repeated in MMF reports.

   When feedback is needed for multiple media Tracks, independent
   Feedback Tracks MUST be established separately.

4.2.  Track Naming

   Feedback Track naming SHOULD follow these conventions:

   *  *Namespace:* The same Namespace as the associated media Track.

   *  *Track Name:* multimodal-feedback/<media_track_name>.

   Examples: multimodal-feedback/audio_response, multimodal-feedback/
   video_response.

   Media Track Names MUST NOT contain the / character to avoid parsing
   ambiguity in Feedback Track Names.

   When reverse feedback exists in a session (see Section 3.4), the
   Input Feedback Track Name is input-feedback/<media_track_name>.

   When a sender receives a PUBLISH request for a Feedback Track, it
   MUST identify the associated media Track via the <media_track_name>
   portion of the Track Name.  If no matching established media Track is
   found, it SHOULD respond with REQUEST_ERROR.
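
   The naming convention and the sender-side lookup can be sketched as
   follows.  This is an illustrative sketch only: the function names
   are hypothetical, and established media Tracks are assumed to be
   available as a set of Track Names within the same Namespace.

```python
from typing import Optional, Set

FEEDBACK_PREFIX = "multimodal-feedback/"
INPUT_FEEDBACK_PREFIX = "input-feedback/"


def feedback_track_name(media_track_name: str, reverse: bool = False) -> str:
    """Build the Feedback Track Name for a media Track Name."""
    if not media_track_name or "/" in media_track_name:
        # Media Track Names MUST NOT contain "/" (parsing ambiguity).
        raise ValueError("invalid media Track Name: %r" % media_track_name)
    prefix = INPUT_FEEDBACK_PREFIX if reverse else FEEDBACK_PREFIX
    return prefix + media_track_name


def associated_media_track(feedback_name: str,
                           established: Set[str]) -> Optional[str]:
    """Return the media Track a Feedback Track refers to, or None.

    A None result is where a sender would respond with REQUEST_ERROR.
    """
    for prefix in (FEEDBACK_PREFIX, INPUT_FEEDBACK_PREFIX):
        if feedback_name.startswith(prefix):
            media = feedback_name[len(prefix):]
            if media and "/" not in media and media in established:
                return media
    return None


tracks = {"audio_response", "video_response"}
name = feedback_track_name("audio_response")
match = associated_media_track(name, tracks)
```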

4.3.  Track Establishment and Lifecycle

   The Feedback Track is established by the feedback generator
   (typically the media receiver) as Publisher through MOQT's PUBLISH /
   PUBLISH_OK negotiation.

   *Establishment Flow Example:*

   Media Receiver                       Media Sender
   (Media Subscriber /                  (Media Publisher /
    Feedback Publisher)                  Feedback Subscriber)
   |                                    |
   |  SUBSCRIBE(track=audio_response)   |  (1) Subscribe media Track
   |----------------------------------->|
   |  SUBSCRIBE_OK                      |
   |<-----------------------------------|
   |                                    |
   |  PUBLISH(track=multimodal-feedback/|  (2) Publish Feedback Track
   |    audio_response,                 |  (Role reversal: receiver
   |    namespace=same_as_media)        |   becomes feedback Publisher)
   |----------------------------------->|
   |  PUBLISH_OK                        |
   |<-----------------------------------|
   |                                    |
   |  [Object: MMF seq=0]               |  (3) Send MMF
   |----------------------------------->|
   |  [Object: MMF seq=1]               |
   |----------------------------------->|
   |  ...                               |

   *Lifecycle Rules:*

   The Feedback Track lifecycle SHOULD align with the subscribed media
   Track it covers.  After the media Track publisher sends PUBLISH_DONE
   for the media Track, the Feedback Track publisher (i.e., the media
   receiver) SHOULD send PUBLISH_DONE for the Feedback Track after
   transmitting the final MMF and stop publishing Objects.

   When re-establishing a Feedback Track after connection interruption,
   the Report Sequence SHOULD restart from 0.

4.4.  Transport and Priority

   The Feedback Track SHOULD be carried over QUIC streams (consistent
   with ordinary MoQ Objects).

   The Subscriber Priority for Feedback Track SHOULD be set lower than
   media Tracks ([MoQTransport], Section 7).  Example priority
   assignment:

      +=====================+=====================+================+
      | Track Type          | Subscriber Priority | Description    |
      +=====================+=====================+================+
      | audio_response      | 0 (highest)         | Audio media    |
      +---------------------+---------------------+----------------+
      | video_response      | 1                   | Video media    |
      +---------------------+---------------------+----------------+
      | multimodal-feedback | 3                   | Feedback Track |
      +---------------------+---------------------+----------------+

                                 Table 5

   When bandwidth contention occurs, media data SHOULD take precedence
   over feedback data transmission.

5.  Feedback Report Format

   This section defines the binary encoding format of MMF.  All integer
   fields use QUIC Variable-Length Integer encoding (RFC 9000,
   Section 16) unless otherwise specified.


   Fields marked as "signed encoding" use ZigZag mapping and are
   transmitted as unsigned QUIC varint:

   *  *Encoding:* unsigned = (signed << 1) ^ (signed >> 63)

   *  *Decoding:* signed = (unsigned >>> 1) ^ -(unsigned & 1)

   Mapping examples: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4.
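
   The ZigZag mapping and its combination with QUIC varint encoding can
   be sketched as follows.  This is an illustrative sketch with
   hypothetical function names.  Because Python integers are unbounded,
   the 64-bit expression (signed << 1) ^ (signed >> 63) is written with
   an explicit sign test.  Note that a QUIC varint carries at most 62
   bits, so the mapped value has to stay below 2^62.

```python
from typing import Tuple


def zigzag_encode(value: int) -> int:
    """Map a signed integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return (value << 1) if value >= 0 else ((-value << 1) - 1)


def zigzag_decode(value: int) -> int:
    """Inverse mapping: (unsigned >>> 1) ^ -(unsigned & 1)."""
    return (value >> 1) ^ -(value & 1)


def varint_encode(value: int) -> bytes:
    """Encode an unsigned integer as a QUIC varint (RFC 9000, Section 16)."""
    if value < 0 or value >= 1 << 62:
        raise ValueError("QUIC varints carry values in [0, 2^62 - 1]")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")


def varint_decode(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one varint; return (value, offset of the next field)."""
    length = 1 << (data[offset] >> 6)   # top two bits select 1, 2, 4, or 8
    if offset + length > len(data):
        raise ValueError("truncated varint")
    value = data[offset] & 0x3F
    for byte in data[offset + 1:offset + length]:
        value = (value << 8) | byte
    return value, offset + length


# A signed field such as a delta of -500 microseconds on the wire:
wire = varint_encode(zigzag_encode(-500))
decoded, next_offset = varint_decode(wire)
restored = zigzag_decode(decoded)
```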

5.1.  Report Structure

   MoQ Multimodal Feedback {
     Report Timestamp (i),
     Report Sequence (i),
     Object Entry Count (i),
     Object Entry (..) ...,
     Summary Stats (..),
     Optional Metric Count (i),
     Optional Metric (..) ...,
   }

   The MMF format version is negotiated via Setup Parameter
   (Section 6.1) and is not repeated in each report.  This document
   defines version 0.

   Each MMF reports Object delivery status for a single media Track
   associated with its Feedback Track (1:1 association, see
   Section 4.1).

   *Report Timestamp (varint):* The moment when the report is generated,
   taken from the receiver's local monotonic clock, in microseconds.
   This value is only used for Receive Timestamp Delta chain anchoring
   and report ordering, not for cross-end time alignment.  A monotonic
   clock MUST be used.

   *Report Sequence (varint):* Monotonically increasing from 0.  Senders
   detect feedback loss via sequence gaps (Section 10.1).

   *Object Entry Count (varint):* Number of Object Entries that follow.
   When the value is 0, the report contains only Summary Stats (for
   heartbeat purposes).

5.2.  Object Entry

   Object Entry {
     Object ID (i),
     Status (i),
     [Receive Timestamp Delta (i)],
   }

   Object Entries within the same MMF MUST be sorted in ascending order
   by Object ID.
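
   Parsing a run of Object Entries can be sketched as follows.  This is
   an illustrative sketch with hypothetical function names; it assumes
   the status values of Section 5.3 and the conditional presence rule
   for Receive Timestamp Delta given in Section 5.2.3.

```python
from typing import List, Optional, Tuple

RECEIVED, RECEIVED_LATE, NOT_RECEIVED, PARTIALLY_RECEIVED = 0x00, 0x01, 0x02, 0x03
CARRIES_DELTA = (RECEIVED, RECEIVED_LATE)

Entry = Tuple[int, int, Optional[int]]  # (Object ID, Status, delta in us)


def read_varint(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode one QUIC varint; return (value, offset of the next field)."""
    length = 1 << (data[offset] >> 6)
    if offset + length > len(data):
        raise ValueError("truncated varint")
    value = data[offset] & 0x3F
    for byte in data[offset + 1:offset + length]:
        value = (value << 8) | byte
    return value, offset + length


def parse_object_entries(data: bytes, offset: int,
                         count: int) -> Tuple[List[Entry], int]:
    """Parse `count` Object Entries starting at `offset`."""
    entries: List[Entry] = []
    previous_id = -1
    for _ in range(count):
        object_id, offset = read_varint(data, offset)
        status, offset = read_varint(data, offset)
        delta = None
        if status in CARRIES_DELTA:
            # Present only for RECEIVED / RECEIVED_LATE; ZigZag-decoded.
            raw, offset = read_varint(data, offset)
            delta = (raw >> 1) ^ -(raw & 1)
        if object_id < previous_id:
            raise ValueError("Object Entries are not in ascending order")
        previous_id = object_id
        entries.append((object_id, status, delta))
    return entries, offset


# Three entries: Object 10 RECEIVED (delta -500 us), Object 11 NOT_RECEIVED,
# Object 12 RECEIVED_LATE (delta +20 us).
payload = bytes.fromhex("0a0043e7" "0b02" "0c0128")
parsed, end = parse_object_entries(payload, 0, 3)
```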

5.2.1.  Object ID

   The Object ID (varint) within the media Track.


5.2.2.  Status

   Delivery status code (varint), see Section 5.3 for values.

5.2.3.  Receive Timestamp Delta

   Conditional presence field (varint, signed encoding): Present only
   when Status is RECEIVED (0x00) or RECEIVED_LATE (0x01).  Encoding
   rules:

   *  *First received Object within the same MMF* (i.e., the first entry
      with Status RECEIVED or RECEIVED_LATE when iterating through the
      list): Delta is the offset of the Object's arrival time relative
      to Report Timestamp in microseconds, encoded as signed varint
      (negative values indicate arrival time earlier than report
      generation time).

   *  *Subsequent received Objects:* Delta is the offset of the Object's
      arrival time relative to the arrival time of the most recent entry
      with Status RECEIVED or RECEIVED_LATE that precedes it in the
      list, encoded as signed varint.

   NOT_RECEIVED and PARTIALLY_RECEIVED entries are skipped in the delta
   chain.

   Since Object Entries are sorted by Object ID in ascending order while
   Objects may arrive out of order, delta values can be negative
   (indicating the Object arrived earlier than the previous received
   Object in the list).  Negative delta from reordering is a valid
   network state signal.

   This delta chain encoding follows the approach of QUIC Receive
   Timestamps (draft-ietf-quic-receive-ts-00), but at Object granularity
   rather than packet granularity.  For small Objects (e.g., audio
   frames of about 200 bytes, roughly one packet per Object), the delta
   sequence approaches per-packet precision.
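
   The decoding side of this delta chain can be sketched as follows.
   This is an illustrative, non-normative example: the function names
   are hypothetical, and the signed mapping assumes the ZigZag
   convention this document uses for signed fields.

```python
RECEIVED, RECEIVED_LATE, NOT_RECEIVED, PARTIALLY_RECEIVED = 0, 1, 2, 3


def zigzag_encode(n: int) -> int:
    """Map a signed integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return 2 * n if n >= 0 else -2 * n - 1


def zigzag_decode(u: int) -> int:
    """Inverse of zigzag_encode."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def arrival_times(report_ts_us, entries):
    """Rebuild arrival times (receiver clock, microseconds).

    entries: (object_id, status, encoded_delta_or_None) tuples, sorted by
    ascending Object ID as required by Section 5.2.
    """
    times = {}
    base = report_ts_us  # the first received entry is relative to this
    for object_id, status, encoded in entries:
        if status not in (RECEIVED, RECEIVED_LATE):
            continue  # not part of the delta chain
        base += zigzag_decode(encoded)
        times[object_id] = base
    return times
```

   With a Report Timestamp of 2000000 and three received entries
   carrying deltas of -60000, +30000, and -5000, this sketch yields
   arrival times 1940000, 1970000, and 1965000.  The last value is
   earlier than its predecessor, which is the reordering case described
   above.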

5.3.  Delivery Status Codes

        +=======+====================+============================+
        | Value | Name               | Description                |
        +=======+====================+============================+
        | 0x00  | RECEIVED           | Completely received and    |
        |       |                    | within delivery deadline   |
        +-------+--------------------+----------------------------+
        | 0x01  | RECEIVED_LATE      | Completely received but    |
        |       |                    | exceeded delivery deadline |
        +-------+--------------------+----------------------------+
        | 0x02  | NOT_RECEIVED       | No bytes received at       |
        |       |                    | report generation time     |
        +-------+--------------------+----------------------------+
        | 0x03  | PARTIALLY_RECEIVED | Partial bytes received but |
        |       |                    | Object is incomplete       |
        +-------+--------------------+----------------------------+

                                  Table 6

5.3.1.  RECEIVED_LATE Determination

   Receiver MUST determine whether an Object is late based on its local
   playback deadline: an Object is RECEIVED_LATE when it arrives after
   its expected playback time.

   When a playback deadline is unavailable (e.g., non-real-time playback
   scenarios), receiver SHOULD report RECEIVED rather than
   RECEIVED_LATE.

5.3.2.  NOT_RECEIVED Reporting Timing

   Receiver SHOULD report an Object as NOT_RECEIVED when either of the
   following conditions is met:

   1.  A subsequent Object with a larger Object ID has arrived, but
       this Object has not.

   2.  More than 2 x expected_interval (see Section 5.4.6) has elapsed
       since the Object's expected arrival time without the Object
       arriving, and no subsequent Object is available for reference.

   Condition 2 covers the "last Object lost" scenario (no subsequent
   Object to trigger Condition 1).

   Receiver MUST NOT report NOT_RECEIVED for Objects that are not yet
   reasonably expected to arrive.
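
   A receiver-side check for the two conditions above might look like
   the following sketch.  The function and parameter names are
   hypothetical, and reading the timeout condition as "overdue by more
   than 2 x expected_interval" is an interpretation rather than a
   normative definition.

```python
def should_report_not_received(object_id, highest_arrived_id,
                               expected_arrival_us, now_us,
                               expected_interval_us):
    """Return True if a missing Object may be reported as NOT_RECEIVED.

    highest_arrived_id is the largest Object ID seen so far on the
    Track, or None if nothing has arrived yet.
    """
    # A later Object has already arrived, so this one was skipped.
    if highest_arrived_id is not None and highest_arrived_id > object_id:
        return True
    # No later Object to compare against: fall back to the timeout,
    # which covers the "last Object lost" case.
    overdue_us = now_us - expected_arrival_us
    return overdue_us > 2 * expected_interval_us
```

   For a 50 Object/s audio Track (expected_interval of 20000us), a
   missing trailing Object would be reported once it is more than 40ms
   overdue, and not before.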

5.3.3.  PARTIALLY_RECEIVED

   Applicable to Objects carried over QUIC Stream: When a Stream is
   terminated by RESET_STREAM or closed due to timeout, Objects with
   partially received data SHOULD be reported as PARTIALLY_RECEIVED.

   The timeout threshold used to determine whether a partially received
   Object should be reported as PARTIALLY_RECEIVED is application-
   defined.  Different applications may have different latency
   requirements (e.g., real-time voice vs. file transfer), and the
   receiver's application layer is best positioned to decide when an
   incomplete Object is no longer useful.  Implementations SHOULD allow
   the application layer to configure or signal this timeout value.

   For Objects carried over QUIC Datagram: Since QUIC Datagrams are not
   retransmitted by the transport layer, any Object whose Datagram(s)
   are lost results in incomplete data at the receiver.  If the receiver
   detects that some but not all Datagrams constituting an Object have
   arrived, and the remaining Datagrams are not expected to arrive
   (e.g., a subsequent Object has arrived or the application-defined
   timeout has expired), the Object SHOULD be reported as
   PARTIALLY_RECEIVED.  If no Datagrams for the Object have arrived, the
   Object SHOULD be reported as NOT_RECEIVED instead.

5.4.  Summary Stats Block

   Summary Stats {
     Report Interval (i),
     Total Objects Evaluated (i),
     Objects Received (i),
     Objects Received Late (i),
     Objects Lost (i),
     Avg Inter-Arrival Delta (i),
   }

   Summary Stats MUST be included in every MMF; its presence is not
   controlled by the negotiation bitmap.  It provides windowed aggregate
   information, so that lightweight CC consumers that do not parse
   Object Entries can still obtain effective signals.  The MoQ layer can
   compute control decisions based on this block and issue them to CC
   algorithms via the cross-layer control interface (Section 3.3).

   The Report Interval window of Summary Stats is independent from the
   coverage of Object Entries.  Object Entries MAY include Objects
   outside the Report Interval window (e.g., continuously reported
   NOT_RECEIVED entries, see Section 7.4).

5.4.1.  Report Interval

   The time window length covered by this report (varint), in
   microseconds.  This window spans from Report Timestamp - Report
   Interval to Report Timestamp.  RECOMMENDED value is 50000-200000
   (50-200ms).

5.4.2.  Total Objects Evaluated

   Total number of Objects (varint) evaluated within the Report Interval
   window.  Includes Objects of all statuses.  MUST equal Objects
   Received + Objects Received Late + Objects Lost.


5.4.3.  Objects Received

   Number of Objects (varint) with status RECEIVED within the window.

5.4.4.  Objects Received Late

   Number of Objects (varint) with status RECEIVED_LATE within the
   window.

5.4.5.  Objects Lost

   Number of Objects (varint) with status NOT_RECEIVED or
   PARTIALLY_RECEIVED within the window.

5.4.6.  Avg Inter-Arrival Delta

   Average arrival interval deviation (varint, signed encoding) of
   consecutive received Object pairs within the window, in microseconds.
   Calculation method:

   For consecutive received Object pairs (i-1, i) sorted by arrival time
   within the window:

      delta(i) = (A(i) - A(i-1)) - expected_interval
      Avg Inter-Arrival Delta = mean(delta(i)) for all i

   Where A(i) is the arrival time of Object i, and expected_interval is
   the expected arrival interval.

   Methods to determine expected_interval (by priority):

   1.  Known media frame rate (e.g., 50 obj/s --> 20000us).

   2.  Historical average arrival interval within current session
       (sliding window).

   If method 1 is not available, receiver SHOULD use method 2.  If
   neither method is available, receiver SHOULD set this field to 0.

   Positive values indicate Object arrival interval greater than
   expected (increased queuing), negative values indicate less than
   expected.  When fewer than 2 received Objects exist in the window,
   this field MUST be 0.
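
   The calculation above can be sketched as follows.  This is a
   non-normative illustration; the function name is hypothetical, and
   rounding the mean to a whole number of microseconds is an assumption
   made so that the result can be encoded as a varint.

```python
from statistics import mean


def avg_inter_arrival_delta(arrivals_us, expected_interval_us):
    """Avg Inter-Arrival Delta in microseconds (signed, before encoding)."""
    ordered = sorted(arrivals_us)  # pairs are formed in arrival-time order
    if len(ordered) < 2:
        return 0  # fewer than 2 received Objects in the window
    deltas = [(later - earlier) - expected_interval_us
              for earlier, later in zip(ordered, ordered[1:])]
    return round(mean(deltas))
```

   For arrivals at 0, 50000, 70000, and 90000 with an
   expected_interval of 20000, the per-pair deviations are 30000, 0,
   and 0, giving a result of 10000 (arrival intervals 10ms larger than
   expected on average).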


5.5.  Optional Media Metrics

   Optional Media Metrics immediately follow Summary Stats.  Optional
   Metric Count (varint) specifies the number of subsequent Optional
   Metrics; a value of 0 indicates no optional metrics are included.

   Each metric uses Key-Value-Pair encoding (draft-ietf-moq-transport-
   17, Section 1.4.2):

   Optional Metric {
     Metric Type (i),
     Metric Value (i),
   }

   Defined metric types are divided into two categories: Application-
   Layer Metrics and QUIC Layer Summary Metrics.

   *Application-Layer Metrics:*


   +======+==========================+==============+==================+
   | Type | Name                     | Unit         | Description      |
   +======+==========================+==============+==================+
   | 0x02 | PLAYOUT_AHEAD_MS         | milliseconds | Remaining time   |
   |      |                          |              | until playback   |
   |      |                          |              | stall at         |
   |      |                          |              | receiver, i.e.,  |
   |      |                          |              | buffered but     |
   |      |                          |              | not yet played   |
   |      |                          |              | media duration.  |
   |      |                          |              | Smaller values   |
   |      |                          |              | indicate closer  |
   |      |                          |              | to stall (0 =    |
   |      |                          |              | currently        |
   |      |                          |              | stalled).        |
   +------+--------------------------+--------------+------------------+
   | 0x04 | ESTIMATED_BANDWIDTH_KBPS | kbps         | Available        |
   |      |                          |              | bandwidth        |
   |      |                          |              | estimate         |
   |      |                          |              | observed at      |
   |      |                          |              | receiver.  Can   |
   |      |                          |              | be calculated    |
   |      |                          |              | as bytes         |
   |      |                          |              | received in      |
   |      |                          |              | window / window  |
   |      |                          |              | duration.  For   |
   |      |                          |              | sender to        |
   |      |                          |              | cross-reference  |
   |      |                          |              | with local       |
   |      |                          |              | bandwidth        |
   |      |                          |              | estimation.      |
   +------+--------------------------+--------------+------------------+

                                  Table 7

   *QUIC Layer Summary Metrics:*

   The following metrics expose the receiver's local QUIC connection
   transport state to the sender for CC algorithm cross-validation.


     +======+================+==============+=======================+
     | Type | Name           | Unit         | Description           |
     +======+================+==============+=======================+
     | 0x10 | PEER_RTT_US    | microseconds | Receiver's local QUIC |
     |      |                |              | connection smoothed   |
     |      |                |              | RTT, corresponding to |
     |      |                |              | smoothed_rtt in RFC   |
     |      |                |              | 9002.  For sender to  |
     |      |                |              | cross-validate with   |
     |      |                |              | local RTT estimation. |
     +------+----------------+--------------+-----------------------+
     | 0x12 | PEER_LOSS_RATE | per mille    | Receiver's local QUIC |
     |      |                |              | connection packet     |
     |      |                |              | loss rate within      |
     |      |                |              | Report Interval,      |
     |      |                |              | expressed in per      |
     |      |                |              | mille (e.g., 50 =     |
     |      |                |              | 5.0%).                |
     +------+----------------+--------------+-----------------------+

                                 Table 8

   MMF's core CC signals rely on Receive Timestamp Delta in Object Entry
   (Section 5.2.3, referencing QUIC receive-ts / WebRTC TWCC per-packet
   delta encoding approach) and Summary Stats (Section 5.4), not
   Optional Metrics.  Optional Metrics serve only as supplements.

   Type values 0x00-0x1f are reserved for this specification.  Values
   0x20 and above are available for application-layer custom use.

   Receiver MUST ignore unrecognized Metric Types.

   Optional Media Metrics MAY be included in MMF only when both parties
   have declared bit1=1 in Setup negotiation (Section 6).  When not
   negotiated or bit1=0, Optional Metric Count MUST be 0.
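
   On the consuming side, the rules above can be sketched as follows.
   The function name and the dictionary result are hypothetical, the
   input is assumed to be already-decoded (type, value) pairs, and
   raising an error when metrics appear without negotiation is one
   possible policy that this document does not mandate.

```python
KNOWN_METRICS = {
    0x02: "PLAYOUT_AHEAD_MS",
    0x04: "ESTIMATED_BANDWIDTH_KBPS",
    0x10: "PEER_RTT_US",
    0x12: "PEER_LOSS_RATE",
}


def read_optional_metrics(pairs, metrics_negotiated):
    """Return recognized metrics by name, ignoring unrecognized types."""
    if not metrics_negotiated:
        if pairs:
            # Optional Metric Count MUST be 0 in this case (assumed
            # handling: treat a non-empty list as an error).
            raise ValueError("Optional Metrics present but not negotiated")
        return {}
    return {KNOWN_METRICS[metric_type]: value
            for metric_type, value in pairs
            if metric_type in KNOWN_METRICS}
```

   A metric with an application-defined type such as 0x20 is dropped
   silently by this sketch; an application that defines such types
   would extend the table.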

5.6.  Report Size Control

   A single MMF is RECOMMENDED not to exceed 1200 bytes to avoid QUIC
   packet fragmentation.

   When the number of Objects to report exceeds the capacity of a single
   MMF, receiver SHOULD:

   *  Prioritize including recent Object Entries (largest Object IDs).

   *  Trim the oldest Object Entries.

   *  Ensure Summary Stats covers the complete Report Interval window
      (Summary Stats is not affected by Object Entry trimming).
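
   One way to apply this trimming is sketched below.  The helper names
   are hypothetical, and the size calculation assumes QUIC (RFC 9000)
   variable-length integer sizes and ZigZag mapping for the signed
   delta.  Entries are trimmed before delta encoding, because dropping
   the oldest received entry changes which entry is anchored to Report
   Timestamp.

```python
def varint_len(value):
    """Encoded size in bytes, assuming RFC 9000 variable-length integers."""
    for bits, size in ((6, 1), (14, 2), (30, 4), (62, 8)):
        if value < 1 << bits:
            return size
    raise ValueError("value does not fit in a varint")


def encode_chain(report_ts_us, entries):
    """entries: (object_id, status, arrival_us or None), ascending by ID."""
    encoded, base = [], report_ts_us
    for object_id, status, arrival_us in entries:
        fields = [object_id, status]
        if arrival_us is not None:
            delta = arrival_us - base
            fields.append(2 * delta if delta >= 0 else -2 * delta - 1)
            base = arrival_us
        encoded.append(fields)
    return encoded


def trim_to_budget(report_ts_us, entries, budget_bytes):
    """Drop the oldest entries until the encoded entries fit the budget."""
    entries = list(entries)
    while entries:
        encoded = encode_chain(report_ts_us, entries)
        size = sum(varint_len(v) for fields in encoded for v in fields)
        if size <= budget_bytes:
            return encoded
        entries.pop(0)  # smallest Object ID is the oldest entry
    return []
```

   The budget passed in would be the 1200-byte target minus the space
   needed for the report header, Summary Stats, and any Optional
   Metrics.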

5.6.1.  Encoding Example

   The following is an encoding structure of a typical MMF reporting
   delivery status of the 5 most recent Objects on the audio_response
   Track.  Object Entries are sorted in ascending order by Object ID.
   Signed fields use ZigZag-encoded unsigned values (encoding convention
   at the beginning of Section 5).

   ```
   MoQ Multimodal Feedback:
     Report Timestamp: 2000000    (2000000us = 2s since setup)
     Report Sequence: 10          (11th report, counting from 0)
     Object Entry Count: 5        (5 Objects)

     Object Entry [0]:            (smallest Object ID, first in list)
       Object ID: 96
       Status: 0x00 (RECEIVED)
       Recv Ts Delta: 189999      (ZigZag(-95000): 95ms before Report
                                   Timestamp; first received Object,
                                   baseline=Report Timestamp)

     Object Entry [1]:
       Object ID: 97
       Status: 0x02 (NOT_RECEIVED)
                                  (No Recv Ts Delta)

     Object Entry [2]:
       Object ID: 98
       Status: 0x01 (RECEIVED_LATE)
       Recv Ts Delta: 100000      (ZigZag(+50000): 50ms later than
                                   Object 96; skip NOT_RECEIVED 97,
                                   baseline=Object 96)

     Object Entry [3]:
       Object ID: 99
       Status: 0x00 (RECEIVED)
       Recv Ts Delta: 40000       (ZigZag(+20000): 20ms later than
                                   Object 98)

     Object Entry [4]:
       Object ID: 100
       Status: 0x00 (RECEIVED)
       Recv Ts Delta: 40000       (ZigZag(+20000): 20ms later than
                                   Object 99)

     Summary Stats:
       Report Interval: 100000    (100000us = 100ms)
       Total Objects Evaluated: 5
       Objects Received: 3
       Objects Received Late: 1
       Objects Lost: 1
       Avg Inter-Arrival Delta: 6000
                                  (ZigZag(+3000): avg arrival interval
                                   3ms larger)

     Optional Metric Count: 2
     Optional Metric [0]:
       Metric Type: 0x02 (PLAYOUT_AHEAD_MS)
       Metric Value: 150          (playout headroom 150ms)
     Optional Metric [1]:
       Metric Type: 0x04 (ESTIMATED_BANDWIDTH_KBPS)
       Metric Value: 800          (estimated bandwidth 800kbps)
   ```


   In this example, Object 97 is lost and Object 98 arrived late.
   Object 98's delta (+50ms) is significantly larger than normal
   interval (~20ms).  Avg Inter-Arrival Delta is positive (+3ms)
   indicating larger-than-expected arrival intervals.  PLAYOUT_AHEAD_MS
   is only 150ms.  These signals combined indicate deteriorating network
   conditions.

6.  Negotiation

6.1.  Setup Parameter

   During MOQT Setup phase, both parties declare Multimodal Feedback
   capability via Setup Parameter.

   MOQT_MULTIMODAL_FEEDBACK Setup Parameter {
     Type = TBD1 (i),
     Length (i),
     Value (i),
   }

   The Value field is a capability bitmap (varint) with the following
   bit definitions:

   +======+==================+========================================+
   | Bit  | Name             | Description                            |
   +======+==================+========================================+
   | 0    | OUTPUT_FEEDBACK  | Support output direction Feedback      |
   |      |                  | Track (receiver-->sender)              |
   +------+------------------+----------------------------------------+
   | 1    | OPTIONAL_METRICS | Support Optional Media Metrics         |
   |      |                  | (Section 5.5)                          |
   +------+------------------+----------------------------------------+
   | 2    | INPUT_FEEDBACK   | Support input direction Feedback Track |
   |      |                  | (sender-->receiver, Section 3.4)       |
   +------+------------------+----------------------------------------+
   | 3-62 | Reserved         | Sender MUST set to 0, receiver MUST    |
   |      |                  | ignore                                 |
   +------+------------------+----------------------------------------+

                                 Table 9

   *Negotiation Example:*


   Client                           Server
   |                                  |
   | CLIENT_SETUP(                    |
   |   version=16,                    |
   |   params=[                       |
   |     {type=0x00, value=0x02},     | (ROLE=Publisher)
   |     {type=TBD1, value=0x03}      | (MULTIMODAL_FEEDBACK: bit0|1=1)
   |   ])                             |
   |--------------------------------->|
   |                                  |
   | SERVER_SETUP(                    |
   |   version=16,                    |
   |   params=[                       |
   |     {type=0x00, value=0x03},     | (ROLE=PubSub)
   |     {type=TBD1, value=0x01}      | (MULTIMODAL_FEEDBACK: bit0=1)
   |   ])                             |
   |<---------------------------------|
   |                                  |
   | Negotiation Result:              |
   |   bit0=1 (Output Feedback)       |
   |   bit1=0 (Client declared but    |
   |     Server did not,              |
   |     Optional Metrics disabled)   |

6.2.  Capability Negotiation Rules

   Feature enable conditions (both parties MUST declare corresponding
   bit as 1):

      +========================+==================+=================+
      | Feature                | Enable Condition | Dependency      |
      +========================+==================+=================+
      | Output Feedback        | Both bit0=1      | None            |
      +------------------------+------------------+-----------------+
      | Optional Media Metrics | Both bit1=1      | Output Feedback |
      |                        |                  | enabled         |
      +------------------------+------------------+-----------------+
      | Input Feedback         | Both bit2=1      | None            |
      +------------------------+------------------+-----------------+

                                  Table 10

   When a feature is not enabled:

   *  Receiver MUST NOT publish Feedback Track in corresponding
      direction.

   *  When sender receives PUBLISH request for Feedback Track in un-
      negotiated direction, it SHOULD respond with REQUEST_ERROR (draft-
      ietf-moq-transport-17, Section 9.8).

   *  MMF MUST NOT include un-negotiated optional fields (e.g., Optional
      Metrics).
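
   The enable conditions in Table 10 amount to a bitwise AND of the two
   declared bitmaps plus one dependency check, as in the following
   non-normative sketch.  The function name and result keys are
   hypothetical; an absent parameter is treated as Value=0, as
   specified in Section 6.3.

```python
OUTPUT_FEEDBACK = 1 << 0
OPTIONAL_METRICS = 1 << 1
INPUT_FEEDBACK = 1 << 2


def negotiate(local_value, peer_value):
    """Return the enabled features given both declared bitmaps.

    peer_value is None when the peer did not send the parameter.
    Reserved bits are never inspected, so they are effectively ignored.
    """
    common = local_value & (peer_value or 0)
    output = bool(common & OUTPUT_FEEDBACK)
    return {
        "output_feedback": output,
        # Optional Media Metrics depend on Output Feedback being enabled.
        "optional_metrics": output and bool(common & OPTIONAL_METRICS),
        "input_feedback": bool(common & INPUT_FEEDBACK),
    }
```

   With the values from the negotiation example (0x03 and 0x01), only
   Output Feedback is enabled.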

6.3.  Behavior When Parameter Not Declared

   When the peer's Setup message does not include the
   MOQT_MULTIMODAL_FEEDBACK parameter, this is equivalent to Value=0
   (no Multimodal Feedback capability supported).

   In that case, the local endpoint MUST NOT proactively establish a
   Feedback Track.


6.4.  Runtime Capability Change

   This version does not support runtime changes to Multimodal Feedback
   capability.  If change is needed, MoQ session MUST be re-established.

7.  Receiver Behavior

7.1.  Arrival Time Recording

   Receiver MUST record the arrival time of each Object.  Arrival time
   is defined as the moment when the last byte of the Object arrives at
   the receiver's MoQ layer.  Implementations MUST use a monotonic
   clock that is unaffected by system time adjustments.  Clock
   resolution SHOULD be 1 millisecond or finer; microsecond resolution
   is RECOMMENDED.

7.2.  Delivery Status Determination

   Receiver MUST maintain delivery status for each known Object:

   *  Last byte of Object arrives and within deadline: RECEIVED (0x00).

   *  Last byte of Object arrives but exceeds deadline: RECEIVED_LATE
      (0x01).

   *  Object has not arrived but is reasonably considered lost (see
      Section 5.3.2): NOT_RECEIVED (0x02).

   *  Object has partially arrived and the remainder is no longer
      expected, e.g., the carrying Stream is closed (see
      Section 5.3.3): PARTIALLY_RECEIVED (0x03).

   Object status MAY be updated in subsequent MMF.  For example, from
   NOT_RECEIVED to RECEIVED (when a delayed Object is eventually
   received).  Therefore, counts such as Objects Lost in Summary Stats
   reflect an observation snapshot at report generation time and are not
   guaranteed to be consistent with final statistics.  Sender SHOULD use
   them as immediate signals rather than precise statistics.

7.3.  MMF Generation Frequency

   Receiver SHOULD generate MMF at the following recommended
   frequencies:


   +==================+=============+==================================+
   | Scenario         | Recommended | Description                      |
   |                  | Frequency   |                                  |
   +==================+=============+==================================+
   | Audio Track (~50 | Every       | High-frequency Objects           |
   | Object/s)        | 50-100ms    | require dense feedback           |
   +------------------+-------------+----------------------------------+
   | Video Track      | Every       | Low-frequency Objects can        |
   | (~2-30 Object/s) | 100-200ms   | reduce feedback frequency        |
   +------------------+-------------+----------------------------------+
   | No new Objects   | Every       | Heartbeat to prevent             |
   | arriving         | 500ms-1s    | sender from misjudging           |
   |                  |             | connection state                 |
   +------------------+-------------+----------------------------------+

                                  Table 11

   Generation frequency SHOULD NOT exceed once per 50ms (to avoid
   feedback itself consuming excessive bandwidth).  Generation frequency
   SHOULD NOT be lower than once per 2s (to ensure sender receives
   timely feedback).

   When receiver detects rapid deterioration in delivery quality (e.g.,
   consecutive Object losses), it MAY immediately generate an additional
   MMF (without waiting for the scheduled cycle) to accelerate sender
   response.

7.4.  Object Entry Selection

   When generating MMF, receiver SHOULD follow these Object Entry
   selection strategies:

   *  Prioritize covering recent Objects (most recent in time).

   *  Total number of Object Entries per MMF is RECOMMENDED not to
      exceed 50.

   *  For Objects already reported in previous MMF with unchanged
      status, they MAY be omitted.

   *  Objects with status NOT_RECEIVED SHOULD be reported repeatedly
      (for at least 3 MMF cycles) until their status changes or they
      fall outside the report window.


7.5.  Exception Handling

   *  *Object Out-of-Order Arrival:* Receiver MUST record by actual
      arrival time without reordering.  Arrival time reflects actual
      network behavior; out-of-order itself is a valid network signal.

   *  *Duplicate Objects:* Receiver SHOULD ignore duplicate arrivals,
      retaining the first arrival time.

   *  *Feedback Track Publish Failure:* Receiver SHOULD retry
      establishing Feedback Track with backoff strategy.  Retry interval
      is RECOMMENDED to use exponential backoff (initial 1s, maximum
      30s).
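
   The recommended retry schedule can be sketched as follows.  The
   generator name and the attempt limit are hypothetical, and a
   doubling factor is assumed; this document only gives the initial
   and maximum intervals.

```python
def retry_delays(initial_s=1.0, maximum_s=30.0, attempts=7):
    """Yield the wait time in seconds before each retry attempt."""
    delay = initial_s
    for _ in range(attempts):
        yield min(delay, maximum_s)
        delay *= 2  # assumed backoff factor
```

   With the defaults, the waits are 1, 2, 4, 8, and 16 seconds,
   followed by 30 seconds for every later attempt.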

8.  Sender Behavior

8.1.  Object to QUIC Packet Mapping

   If a sender needs to correlate MMF feedback with local transmission
   events, it MUST maintain the mapping relationship between Objects and
   QUIC packets.  Without this mapping, the sender cannot determine the
   number of packets, packet loss count, and delivery duration for a
   single video frame, and the per-Object status reported by MMF cannot
   be aligned with sender-side statistics.

8.1.1.  Object Granularity

   In real-time media scenarios, a MoQ Object SHOULD correspond to an
   independently decodable media unit (a video frame or an audio frame).

   This enables per-Object feedback in MMF to directly carry frame-level
   semantics: Object loss = frame loss, Object expiration = frame
   expiration.  If an Object contains multiple frames or a single frame
   spans multiple Objects, there will be deviation between the delivery
   status reported by MMF and the actual media quality.

8.1.2.  Packet Not Crossing Object Boundaries

   Senders SHOULD avoid merging data from different Objects into the
   same QUIC packet.

   If a packet contains data from two Objects, then loss or delay of
   that packet cannot be attributed to a single Object, leading to:

   *  Frame-level packet loss rate statistics distortion (single packet
      loss affects statistics for both frames)

   *  Frame-level delivery time cannot be accurately measured
      (transmission times of two frames overlap)

   *  Per-Object arrival time at the receiver becomes inaccurate

   During implementation, when the QUIC send queue attempts to append
   new data to an existing packet, it SHOULD check whether both belong
   to the same Object; if not, it SHOULD create a new packet.

8.1.3.  Sender-Side per-Object Transmission Statistics

   Senders SHOULD maintain the following transmission statistics for
   each Object:

     +=================+=============================================+
     | Statistic       | Description                                 |
     +=================+=============================================+
     | sent_packets    | Number of QUIC packets sent for this Object |
     +-----------------+---------------------------------------------+
     | lost_packets    | Number of QUIC packets lost for this Object |
     +-----------------+---------------------------------------------+
     | first_sent_time | Transmission time of the first packet for   |
     |                 | this Object                                 |
     +-----------------+---------------------------------------------+
     | all_acked_time  | Time when all packets for this Object are   |
     |                 | acknowledged                                |
     +-----------------+---------------------------------------------+

                                  Table 12

   Based on these statistics, the sender can compute:

   *  *Frame-level bandwidth sample:* object_size / (all_acked_time -
      first_sent_time)

   *  *Frame-level packet loss rate:* lost_packets / sent_packets

   *  *Frame-level delivery duration:* all_acked_time - first_sent_time

   These metrics provide the foundation for frame-level BWE and frame-
   level pacing (Section 8.1.4).  Standard CC algorithms (BBR, CUBIC)
   sample bandwidth at the packet level.  Frame-level bandwidth sampling
   serves as a complementary approach, suitable for real-time video
   scenarios with large frame size variations (I-frames may be 10x
   larger than P-frames), helping to reduce noise from single-packet
   sampling.


   Whether a CC algorithm provides BWE output depends on the algorithm
   type.  Model-based algorithms such as BBR and GCC provide bandwidth
   estimation; pure loss-based algorithms like CUBIC only output cwnd
   without explicit BWE.  When the CC algorithm does not provide BWE,
   the MoQ layer MAY use frame-level bandwidth sampling as the bandwidth
   estimation source.
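
   The statistics in Table 12 and the three derived metrics can be
   represented as in the following sketch.  The class name, the
   object_size field, and the choice of microsecond timestamps and
   bytes-per-second output are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ObjectTxStats:
    object_size: int         # bytes sent for this Object
    sent_packets: int        # QUIC packets sent for this Object
    lost_packets: int        # QUIC packets declared lost
    first_sent_time_us: int  # sender clock, microseconds
    all_acked_time_us: int   # sender clock, microseconds

    def delivery_duration_us(self) -> int:
        """Frame-level delivery duration."""
        return self.all_acked_time_us - self.first_sent_time_us

    def loss_rate(self) -> float:
        """Frame-level packet loss rate (0.0 to 1.0)."""
        return self.lost_packets / self.sent_packets

    def bandwidth_sample_bytes_per_s(self) -> float:
        """Frame-level bandwidth sample."""
        return self.object_size * 1_000_000 / self.delivery_duration_us()
```

   For a 60000-byte frame sent in 50 packets, with 5 packets lost and
   all packets acknowledged 50ms after the first transmission, the
   sketch reports a loss rate of 0.1 and a bandwidth sample of
   1200000 bytes per second.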

8.1.4.  Frame-Level Pacing

   Real-time video exhibits significant frame size variations.  When
   using a global fixed pacing rate for transmission, large frames
   (I-frames) will burst a large number of packets in a short time,
   while small frames (P-frames) underutilize the sending window.

   Senders SHOULD calculate pacing intervals at Object granularity.

   The packet sending interval for each Object is RECOMMENDED to be
   computed as follows:

   pkt_send_interval = media_pace_duration / (sent_packets - 1)

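   Where media_pace_duration is the media duration corresponding to this
   Object (e.g., 33ms at 30fps for video, 20ms at 50fps for audio) and
   sent_packets is the number of packets carrying the Object
   (Section 8.1.3).  The first packet of an Object is sent immediately,
   and subsequent packets are sent at equal intervals of
   pkt_send_interval.  When an Object fits in a single packet
   (sent_packets = 1), the formula does not apply and the packet is
   sent immediately.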

   This approach distributes an Object's data uniformly across its frame
   interval.  I-frames have denser packet intervals, P-frames have
   sparser intervals, but neither produces bursts.

   Frame-level pacing may conflict with CC's pacing rate.  When they are
   inconsistent, senders SHOULD use the lower sending rate to prevent
   frame-level pacing from bypassing CC's congestion control.
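
   A minimal sketch of frame-level pacing follows.  The function names
   are hypothetical, and expressing the frame-level rate as Object size
   divided by media duration is an assumption used here to compare it
   with the CC pacing rate.

```python
def packet_send_offsets(media_pace_duration_us, sent_packets):
    """Send-time offsets in microseconds, relative to the first packet."""
    if sent_packets <= 1:
        # A single packet is sent immediately; this also avoids
        # dividing by zero in the interval formula.
        return [0] * sent_packets
    interval = media_pace_duration_us / (sent_packets - 1)
    return [round(i * interval) for i in range(sent_packets)]


def capped_rate_bps(object_size_bytes, media_pace_duration_us,
                    cc_pacing_rate_bps):
    """Use the lower of the frame-level rate and the CC pacing rate."""
    frame_rate_bps = object_size_bytes * 8 * 1_000_000 / media_pace_duration_us
    return min(frame_rate_bps, cc_pacing_rate_bps)
```

   A 4-packet Object with a 33000us media duration is sent at offsets
   0, 11000, 22000, and 33000.  When the frame-level rate exceeds the
   CC pacing rate, the CC rate wins and the Object takes longer than
   its media duration to send.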

8.2.  Application-Layer Consumption

   After the sender's MoQ layer receives MMF, it exposes the following
   information to the inference scheduler/ABR through application-layer
   callbacks:

   *  Delivery quality (loss rate, expiration rate) of the associated
      media Track

   *  Per-Object status and timestamps (can be used for detailed
      analysis)

   *  Optional media metrics (playout headroom, bandwidth estimation,
      etc.)


   The inference scheduler makes decisions based on the above
   information (see Section 9 for details).

8.3.  Transport-Layer Consumption (Cross-Layer Control)

   The MoQ layer synthesizes MMF signals and local CC output, computes
   control decisions, and issues them to CC algorithms via the cross-
   layer control interface (Section 3.3).  CC algorithms do not directly
   parse MMF, but instead execute control commands from the MoQ layer.
   Decision inputs, issuable commands, and CC behavior are described in
   Section 3.3.

8.4.  Example: MoQ Layer Controlling BBR

   The core formula of BBR is pacing_rate = bandwidth x pacing_gain.

   BBR itself estimates bandwidth through ACK sampling and controls
   pacing_gain via its state machine.  In cross-layer control mode, the
   MoQ layer takes over control of pacing_gain.  BBR remains responsible
   for bandwidth estimation, while pacing_gain is determined by the MoQ
   layer based on MMF feedback:

   ```
   MoQ Layer:
     1.  Read BBR's BWE
     2.  Read MMF: Objects Lost, Late, PLAYOUT_AHEAD_MS, Delta
     3.  Compute pacing_gain (comprehensive judgment)
     4.  Issue pacing_gain --> CC

   BBR:
     1.  Receive pacing_gain
     2.  pacing_rate = BWE x pacing_gain
     3.  Send at pacing_rate
   ```

   Example logic for MoQ layer computing pacing_gain:

   1.  *Objects Lost > 0 and BBR local loss = 0* --> Reduce pacing_gain
       to 1.0 (CC local ACK is normal, but peer is actually losing
       frames; stop probing upward)

   2.  *PLAYOUT_AHEAD_MS < 100ms* --> Reduce pacing_gain to 1.0
       (Insufficient playout headroom; avoid aggressive probing)

   3.  *Avg Inter-Arrival Delta consistently positive* --> Reduce
       pacing_gain to 0.9 (Receiver-side queuing is worsening;
       proactively reduce speed)

   4.  *High proportion of Objects Received Late* --> Reduce
       target_bitrate (Transmission has no packet loss but delay exceeds
       deadline; reduce per-frame data volume at the source)

   5.  *None of the above conditions met* --> Do not issue commands; BBR
       runs according to its own state machine


Jiang, et al.           Expires 18 September 2026              [Page 29]

   The MoQ layer overrides pacing_gain only when MMF signals indicate
   that intervention is needed.  Otherwise it issues no commands, and
   BBR operates autonomously according to its own state machine.
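
   The five example rules can be sketched as a single decision
   function.  This is one possible reading, not a normative algorithm.
   The names MmfSnapshot, Decision, and decide are hypothetical.  The
   threshold for a "high proportion" of late Objects, the number of
   samples that counts as "consistently positive", and the choice to
   apply the lowest gain when several rules fire at once are
   assumptions that the text above does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional

PLAYOUT_FLOOR_MS = 100        # from rule 2
LATE_RATIO_THRESHOLD = 0.2    # assumed meaning of "high proportion" (rule 4)
MIN_DELTA_SAMPLES = 3         # assumed meaning of "consistently" (rule 3)


@dataclass
class MmfSnapshot:
    objects_lost: int
    objects_late: int
    objects_total: int
    playout_ahead_ms: float
    recent_deltas_ms: List[float]  # recent Avg Inter-Arrival Delta samples


@dataclass
class Decision:
    pacing_gain: Optional[float] = None  # None: issue no command to BBR
    reduce_target_bitrate: bool = False


def decide(mmf: MmfSnapshot, bbr_local_loss: float) -> Decision:
    gains = []
    # Rule 1: the peer is losing Objects although local ACKs look clean.
    if mmf.objects_lost > 0 and bbr_local_loss == 0:
        gains.append(1.0)
    # Rule 2: insufficient playout headroom.
    if mmf.playout_ahead_ms < PLAYOUT_FLOOR_MS:
        gains.append(1.0)
    # Rule 3: receiver-side queuing is worsening.
    deltas = mmf.recent_deltas_ms
    if len(deltas) >= MIN_DELTA_SAMPLES and all(d > 0 for d in deltas):
        gains.append(0.9)
    # Rule 4: many Objects miss their deadline; act at the source.
    late_ratio = mmf.objects_late / mmf.objects_total if mmf.objects_total else 0.0
    # Rule 5: with no gain collected, pacing_gain stays None.
    return Decision(
        pacing_gain=min(gains) if gains else None,
        reduce_target_bitrate=late_ratio > LATE_RATIO_THRESHOLD,
    )
```

   A Decision whose pacing_gain is None corresponds to rule 5: no
   command is issued and BBR keeps running its own state machine.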

9.  Application Scenarios: Streaming Media and AI Inference

   This section describes the usage of MMF in streaming media and AI
   inference scenarios.  Section 9.1 presents a general adaptation
   framework, and Section 9.2 illustrates with concrete use cases.

9.1.  MMF-Driven Rate-Quality Adaptation

   The following adaptation rules apply to all MoQ streaming media
   scenarios:


    +==================+===============+==============+==============+
    | MMF Field        | Adaptation    | Effect       | Applicable   |
    | (Section)        | Action        |              | Scenarios    |
    +==================+===============+==============+==============+
    | PLAYOUT_AHEAD_MS | Reduce        | Less data    | Video/Audio/ |
    | downward trend   | encoding      | per frame    | Inference    |
    | (5.5)            | bitrate       |              |              |
    +------------------+---------------+--------------+--------------+
    | PLAYOUT_AHEAD_MS | Reduce frame  | Reduced      | Video        |
    | downward trend   | rate / Object | transmission |              |
    | (5.5)            | sending       | volume       |              |
    |                  | frequency     |              |              |
    +------------------+---------------+--------------+--------------+
    | Objects Lost > 0 | Reduce        | Match        | All          |
    | (5.4.5)          | sending rate  | available    |              |
    |                  | (with CC)     | bandwidth    |              |
    +------------------+---------------+--------------+--------------+
    | Objects Received | Reduce        | Trade        | All          |
    | Late > 0 (5.4.4) | quality to    | quality for  |              |
    |                  | meet deadline | timeliness   |              |
    +------------------+---------------+--------------+--------------+
    | Avg Inter-       | Preventive    | Avoid sudden | All          |
    | Arrival Delta    | quality       | degradation  |              |
    | increasing       | reduction     |              |              |
    | (5.4.6)          | (before       |              |              |
    |                  | packet loss)  |              |              |
    +------------------+---------------+--------------+--------------+
    | PLAYOUT_AHEAD_MS | Gradually     | Improve      | All          |
    | recovery (5.5)   | restore       | experience   |              |
    |                  | bitrate/frame |              |              |
    |                  | rate          |              |              |
    +------------------+---------------+--------------+--------------+

                                 Table 13

   These adaptations can be executed at the MoQ publishing layer with
   per-Object granularity and a latency of less than one frame cycle,
   without requiring cooperation from the application-layer media
   source.  When the application layer is an AI inference pipeline, MMF
   can also be used to adjust inference pipeline parameters (real-time
   flush, dynamic chunk_size).

   Unidirectional and bidirectional adaptation modes are described in
   Section 3.4 (Bidirectional MMF).
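
   The rules in Table 13 can be read as a mapping from MMF
   observations to a list of actions.  The sketch below is
   illustrative only.  The functions trend and adaptation_actions, the
   action names, the strictly monotonic definition of a trend, and the
   decision to treat "recovery" as an upward PLAYOUT_AHEAD_MS trend
   with no other rule active are all assumptions, not part of this
   specification.

```python
from typing import List, Sequence


def trend(samples: Sequence[float]) -> str:
    """Classify a series as 'down', 'up', or 'flat' (strictly monotonic only)."""
    if len(samples) < 2:
        return "flat"
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    if all(d < 0 for d in diffs):
        return "down"
    if all(d > 0 for d in diffs):
        return "up"
    return "flat"


def adaptation_actions(
    playout_ahead_ms: Sequence[float],
    avg_delta_ms: Sequence[float],
    objects_lost: int,
    objects_late: int,
    is_video: bool,
) -> List[str]:
    """Return the Table 13 actions that apply to the given observations."""
    actions: List[str] = []
    playout = trend(playout_ahead_ms)
    if playout == "down":
        actions.append("reduce_encoding_bitrate")
        if is_video:  # the frame rate row applies to video only
            actions.append("reduce_frame_rate")
    if objects_lost > 0:
        actions.append("reduce_sending_rate")
    if objects_late > 0:
        actions.append("reduce_quality_for_deadline")
    if trend(avg_delta_ms) == "up":
        actions.append("preventive_quality_reduction")
    if playout == "up" and not actions:
        actions.append("restore_bitrate_and_frame_rate")
    return actions
```

   Several rows can apply to the same report.  How a publisher orders
   or combines the resulting actions is left open here.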


9.2.  Typical Use Cases

   The following use cases assume BBR as the CC algorithm (integration
   method described in Section 8.4) and illustrate the effects of MMF in
   different scenarios.  All cases follow the strategy of prioritizing
   audio output and progressively reducing quality.

9.2.1.  Use Case A: Video Live Streaming ABR Adaptation

   *  *Scenario:* The sender publishes a live video stream as two
      Tracks: 1080p (3 Mbps) and 720p (1.5 Mbps).  The receiver is
      currently subscribed to the 1080p Track.

   *  *Trigger condition:* Available network bandwidth drops from
      4 Mbps to 2 Mbps.

   *  *MMF signals:* Proportion of Objects Received Late increases
      (video frames arrive but exceed deadline), PLAYOUT_AHEAD_MS
      gradually decreases.

   *  *CC response:* BBR reduces pacing_rate to match new bandwidth.

   *  *Application-layer response:* The MoQ publishing layer detects
      persistently high Objects Received Late on the 1080p Track and
      triggers ABR switching to the 720p Track at the next Group
      boundary (native MoQ Track switching mechanism).

   *  *Recovery:* After MMF reports Objects Received Late returning to
      normal and PLAYOUT_AHEAD_MS recovering, sender may attempt to
      switch back to 1080p.

   *  *Benefit:* Per-frame integrity and deadline information provided
      by MMF enables higher ABR decision accuracy than pure CC-driven
      approaches.
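
   The switching behavior in Use Case A can be sketched as a small
   controller that counts consecutive MMF reports and only changes
   Tracks at a Group boundary.  The class AbrController, the late-ratio
   threshold, and the number of consecutive reports required
   ("patience") are hypothetical; the use case above does not define
   what counts as persistently high.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AbrController:
    tracks: List[str]             # ordered from highest to lowest quality
    late_threshold: float = 0.1   # assumed: late ratio considered "high"
    patience: int = 3             # assumed: consecutive reports required
    current: int = 0              # index into tracks
    bad_reports: int = field(default=0, init=False)
    good_reports: int = field(default=0, init=False)

    def on_mmf_report(self, late_ratio: float, playout_recovering: bool) -> None:
        """Update the counters from one MMF report."""
        if late_ratio > self.late_threshold:
            self.bad_reports += 1
            self.good_reports = 0
        elif playout_recovering:
            self.good_reports += 1
            self.bad_reports = 0
        else:
            self.bad_reports = 0
            self.good_reports = 0

    def on_group_boundary(self) -> str:
        """Apply at most one switch and return the Track for the next Group."""
        if self.bad_reports >= self.patience and self.current < len(self.tracks) - 1:
            self.current += 1
            self.bad_reports = 0
        elif self.good_reports >= self.patience and self.current > 0:
            self.current -= 1
            self.good_reports = 0
        return self.tracks[self.current]
```

   Deferring the switch to on_group_boundary keeps the change aligned
   with the point at which a new Group can be decoded independently.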

9.2.2.  Use Case B: Bandwidth Drop (Audio-Video Mixed)

   *  *Trigger condition:* Available bandwidth drops sharply.

   *  *MMF signals:* Objects Lost increases, Avg Inter-Arrival Delta
      increases, PLAYOUT_AHEAD_MS decreases.

   *  *CC response:* MoQ layer issues pacing_gain=1.0 based on MMF loss
      signal, pauses upward probing.

   *  *Application-layer response:* Reduce Opus bitrate, pause video
      Track (prioritize audio).


   *  *Benefit:* The peer-side frame loss signal from MMF and the
      congestion signal from BBR's local ACKs cross-validate each
      other.  Bitrate is restored gradually after the network recovers.

9.2.3.  Use Case C: Multi-Layer Quality Adaptation (AI Inference)

   *  *Trigger condition:* Persistent congestion.

   *  *MMF signals:* Objects Lost persistently high, PLAYOUT_AHEAD_MS at
      low level.

   *  *CC response:* MoQ layer continuously issues low pacing_gain,
      pauses upward probing.

   *  *Application-layer response:* Building on the general adaptation
      in Use Case B, the inference pipeline reduces chunk_size,
      accelerates audio flush, and reduces output latency.  CC and the
      application layer adjust synchronously.

   *  *Benefit:* Per-frame adaptation takes effect within one frame
      cycle; audio remains uninterrupted throughout.  During recovery,
      chunk_size and bitrate are restored gradually.
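
   One way to picture the chunk_size behavior in Use Case C is a
   fast-decrease, slow-restore rule.  The sketch below is an
   assumption-laden illustration: the functions is_congested and
   next_chunk_size, the halving step, the additive restore step, the
   size bounds, and the 100 ms low watermark are hypothetical, and the
   unit of chunk_size is left abstract because it is not defined here.

```python
def is_congested(
    objects_lost: int,
    playout_ahead_ms: float,
    low_watermark_ms: float = 100.0,  # assumed "low level" for PLAYOUT_AHEAD_MS
) -> bool:
    """Use Case C condition: Objects are being lost and playout headroom is low."""
    return objects_lost > 0 and playout_ahead_ms < low_watermark_ms


def next_chunk_size(
    current: int,
    congested: bool,
    min_size: int = 1,      # assumed lower bound
    max_size: int = 16,     # assumed upper bound
    restore_step: int = 1,  # assumed additive restore per report
) -> int:
    """Halve chunk_size under congestion, otherwise restore it step by step."""
    if congested:
        return max(min_size, current // 2)
    return min(max_size, current + restore_step)


size = 16
trace = []
for congested in [True, True, True, True, True, False, False]:
    size = next_chunk_size(size, congested)
    trace.append(size)
# trace is now [8, 4, 2, 1, 1, 2, 3]
```

   The asymmetric steps reflect the text above: the reduction acts
   quickly, while recovery is gradual.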

9.2.4.  Use Case D: Generation Rate Overload (AI Inference)

   *  *Trigger condition:* The inference model suddenly generates a
      long response, and audio_response traffic spikes.

   *  *MMF signals:* High proportion of Objects Received Late (frames
      arrive but exceed playback deadline), Avg Inter-Arrival Delta is
      large.

   *  *CC response:* Reduce pacing_rate.

   *  *Application-layer response:* Reduce encoding bitrate, accelerate
      flush, guide inference to generate more concise responses (reduce
      audio duration at the source).

   *  *Benefit:* Trade quality for timeliness, avoid playback
      stuttering.

9.2.5.  Use Case E: Streaming Input Inference (Bidirectional MMF)

   *  *Trigger condition:* Uplink network jitter (streaming input
      inference scenario).

   *  *MMF signals:* Input MMF reports high proportion of NOT_RECEIVED.


   *  *Application-layer response:*

      -  *Client:* Increase chunk size, reduce uplink bitrate, pause
         video input.

      -  *Server:* Manage KV cache, adjust handling strategy for
         incomplete input.

   *  *Benefit:* Bidirectional MMF coordinates quality of both uplink
      and downlink paths.  MMF session activity signals (feedback
      frequency, PLAYOUT_AHEAD_MS) can be used for KV cache eviction and
      priority decisions.

10.  IANA Considerations

   This document defines the following code points for registration:

   *  *TBD1:* MOQT_MULTIMODAL_FEEDBACK Setup Parameter Type
      (Section 6.1)

   *  *Delivery Status Code Registry* (Section 5.3):

      -  0x00: RECEIVED

      -  0x01: RECEIVED_LATE

      -  0x02: NOT_RECEIVED

      -  0x03: PARTIALLY_RECEIVED

   *  *Optional Metrics Type Registry* (Section 5.5):

      -  0x02: PLAYOUT_AHEAD_MS

      -  0x04: ESTIMATED_BANDWIDTH_KBPS

      -  0x10: PEER_RTT_US

      -  0x12: PEER_LOSS_RATE
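
   For implementers, the code points listed above can be carried as
   enumerations.  The values below are copied from the two lists in
   this section; the class names DeliveryStatus and OptionalMetricType
   and the helper parse_delivery_status are hypothetical, and returning
   None for an unregistered code is an assumption rather than behavior
   specified by this document.

```python
from enum import IntEnum
from typing import Optional


class DeliveryStatus(IntEnum):
    """Delivery Status Code Registry (Section 5.3)."""

    RECEIVED = 0x00
    RECEIVED_LATE = 0x01
    NOT_RECEIVED = 0x02
    PARTIALLY_RECEIVED = 0x03


class OptionalMetricType(IntEnum):
    """Optional Metrics Type Registry (Section 5.5)."""

    PLAYOUT_AHEAD_MS = 0x02
    ESTIMATED_BANDWIDTH_KBPS = 0x04
    PEER_RTT_US = 0x10
    PEER_LOSS_RATE = 0x12


def parse_delivery_status(code: int) -> Optional[DeliveryStatus]:
    """Map a wire value to a DeliveryStatus, or None if it is not registered."""
    try:
        return DeliveryStatus(code)
    except ValueError:
        return None
```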

11.  Acknowledgments

   The design of this document draws on QUIC Extended ACK Receive
   Timestamps, RTP Transport Congestion Control Feedback, and drafts
   from the MoQ Working Group (MOQT / MSF / Metrics), as well as real-
   time multimodal inference systems (LongCat-Flash-Omni, Qwen3-Omni,
   Voxtral-Realtime), the vLLM-Omni Stage Pipeline framework, and vLLM
   streaming input mode.


12.  Normative References

   [MoQTransport]
              Nandakumar, S., Vasiliev, V., Swett, I., and A. Frindell,
              "Media over QUIC Transport", Work in Progress, Internet-
              Draft, draft-ietf-moq-transport-17, 2 March 2026,
              <https://datatracker.ietf.org/doc/html/draft-ietf-moq-
              transport-17>.

   [quic-receive-ts]
              Swett, I. and J. Beshay, "QUIC Extended Acknowledgement
              for Reporting Packet Receive Timestamps", Work in
              Progress, Internet-Draft, draft-ietf-quic-receive-ts-01, 2
              March 2026, <https://datatracker.ietf.org/doc/html/draft-
              ietf-quic-receive-ts-01>.

   [QUIC-TRANSPORT]
              Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport", RFC 9000,
              DOI 10.17487/RFC9000, May 2021,
              <https://www.rfc-editor.org/rfc/rfc9000>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

Authors' Addresses

   Minghui Jiang
   Alibaba Inc.
   Email: shimei.jmh@alibaba-inc.com


   Yanmei Liu
   Alibaba Inc.
   Email: miaoji.lym@alibaba-inc.com


   Ronghua Wu
   Ant Group.
   Email: r.wu@antgroup.com

