Internet Engineering Task Force                                  N. Kuhn
Internet-Draft                                       Thales Alenia Space
Intended status: Standards Track                              E. Stephan
Expires: 1 September 2025                                         Orange
                                                            G. Fairhurst
                                                               R. Secchi
                                                  University of Aberdeen
                                                              C. Huitema
                                                    Private Octopus Inc.
                                                        28 February 2025


         Convergence of Congestion Control from Retained State
                   draft-ietf-tsvwg-careful-resume-15

Abstract

   This document specifies a cautious method for IETF transports that
   enables fast startup of CC for a wide range of connections.  It
   reuses a set of computed CC parameters that are based on previously
   observed path characteristics between the same pair of transport
   endpoints.  These parameters are saved, allowing them to be later
   used to modify the CC behaviour of a subsequent connection.

   It describes assumptions and defines requirements for how a sender
   utilises these parameters to provide opportunities for a connection
   to more rapidly get up to speed and rapidly utilise available
   capacity.  It discusses how use of Careful Resume impacts the
   capacity at a shared network bottleneck and the safe response that is
   needed after any indication that the new rate is inappropriate.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 1 September 2025.




Kuhn, et al.            Expires 1 September 2025                [Page 1]

Internet-Draft               Careful Resume                February 2025


Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Use of saved CC parameters by a Sender  . . . . . . . . .   4
     1.2.  Receiver Preference . . . . . . . . . . . . . . . . . . .   5
     1.3.  Transport Protocol Interaction  . . . . . . . . . . . . .   5
     1.4.  Examples of Scenarios of Interest . . . . . . . . . . . .   5
     1.5.  Design Principles . . . . . . . . . . . . . . . . . . . .   6
   2.  Language, Notation and Terms  . . . . . . . . . . . . . . . .   7
     2.1.  Requirements Language . . . . . . . . . . . . . . . . . .   7
     2.2.  The Remote Endpoint . . . . . . . . . . . . . . . . . . .   8
     2.3.  Logging support . . . . . . . . . . . . . . . . . . . . .   8
     2.4.  Notation and Terms  . . . . . . . . . . . . . . . . . . .   8
   3.  The Phases of CC using Careful Resume . . . . . . . . . . . .   9
     3.1.  Observing . . . . . . . . . . . . . . . . . . . . . . . .  10
     3.2.  Reconnaissance Phase  . . . . . . . . . . . . . . . . . .  10
     3.3.  Unvalidated Phase . . . . . . . . . . . . . . . . . . . .  11
     3.4.  Validating Phase  . . . . . . . . . . . . . . . . . . . .  13
     3.5.  Safe Retreat Phase  . . . . . . . . . . . . . . . . . . .  14
       3.5.1.  Loss Recovery after entering Safe Retreat . . . . . .  15
     3.6.  RTO Expiry while using Careful Resume . . . . . . . . . .  15
     3.7.  Normal Phase  . . . . . . . . . . . . . . . . . . . . . .  15
   4.  Implementation Notes and Guidelines . . . . . . . . . . . . .  15
     4.1.  Observing the Path Capacity . . . . . . . . . . . . . . .  16
     4.2.  Confirming the Path in the Reconnaissance Phase . . . . .  16
       4.2.1.  Confirming the Path . . . . . . . . . . . . . . . . .  17
     4.3.  Safety in the Unvalidated Phase . . . . . . . . . . . . .  18
       4.3.1.  Lifetime of CC Parameters . . . . . . . . . . . . . .  18
       4.3.2.  Pacing in the Unvalidated Phase . . . . . . . . . . .  18
       4.3.3.  Exit from the Unvalidated Phase because of Variable
               Network Conditions  . . . . . . . . . . . . . . . . .  19
     4.4.  The Validating Phase  . . . . . . . . . . . . . . . . . .  19
     4.5.  Safety in the Safe Retreat Phase  . . . . . . . . . . . .  20
     4.6.  Returning to Normal Congestion Control  . . . . . . . . .  21
     4.7.  Limitations from Transport Protocols  . . . . . . . . . .  21
   5.  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  21
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  21
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  21
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  21
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  21
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  22
   Appendix A.  Notes on the Careful Resume Phases . . . . . . . . .  24
     A.1.  Example with No Loss  . . . . . . . . . . . . . . . . . .  26
     A.2.  Example with No Loss, Rate-Limited  . . . . . . . . . . .  27
     A.3.  Example with Loss detected in the Reconnaissance Phase  .  28
     A.4.  Example with Loss detected in the Validating Phase  . . .  28
   Appendix B.  Implementation Notes for using BBR . . . . . . . . .  29
     B.1.  Sending unvalidated packets using BBR . . . . . . . . . .  29
     B.2.  Validation for BBR  . . . . . . . . . . . . . . . . . . .  30
     B.3.  Safe Retreat for BBR  . . . . . . . . . . . . . . . . . .  30
   Appendix C.  Internet Draft Revision details  . . . . . . . . . .  30
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  32

1.  Introduction

   All Internet transports are required to either use a Congestion
   Control (CC) algorithm, or to constrain their rate of transmission
   [RFC8085].  In 2010, a survey of alternative CC algorithms [RFC5783]
   noted that there are challenges when a CC algorithm operates across
   an Internet path with a high and/or varying Bandwidth-Delay Product
   (BDP).  The mechanism specified here targets these challenges.

   A CC algorithm typically takes time to ramp up the sending rate,
   called the "Slow-Start phase", informally known as the time to "Get
   up to speed".  This defines a time in which a sender intentionally
   uses less capacity than might be available, with the intention to
   avoid or limit overshoot of the available capacity for the path.
   An overshoot can increase queuing (latency or jitter) and/or
   congestion packet loss for the flow.  Any overshoot can also have a
   detrimental effect on other flows sharing a common bottleneck.  A
   sender can use a method that observes the rate of acknowledged data
   and seeks to avoid an overshoot of the bottleneck capacity (e.g.,
   HyStart++ [RFC9406]).  In the extreme case, an overshoot can result
   in persistent congestion with unwanted starvation of other flows
   [RFC8867] (i.e., preventing other flows from successfully sharing
   the capacity at a common bottleneck).

   The present document specifies a mechanism, called Careful Resume,
   which is expected to reduce the time to complete a transfer when the
   transfer sends significantly more data than allowed by the Initial
   congestion Window (IW), and where the BDP of the path is also
   significantly more than the IW.  It introduces an alternative
   mechanism to select initial CC parameters that seeks to more rapidly
   and safely grow the sending rate controlled by the congestion window
   (CWND).  CC algorithms that are rate-based can make similar
   adjustments to their target sending rate.

   Careful Resume is based on temporal sharing (sometimes known as
   caching) of a saved set of CC parameters that relate to previous
   observations of the same path.  The parameters include: the
   saved_cwnd for the path and the minimum Round Trip Time (RTT).  These
   parameters are saved and used to modify the CC behaviour of a
   subsequent connection between the same endpoints.  Some CC algorithms
   might use other parameters.  For example, a rate-based CC algorithm
   also retains the value of the bottleneck bandwidth required to reach
   the capacity available to the flow (e.g., BBR.max_bw, see
   [I-D.ietf-ccwg-bbr]).

   When used with the QUIC transport, this provides transport services
   that resemble those that could be implemented in TCP, using methods
   such as TCP Control Block (TCB) [RFC9040] caching.

1.1.  Use of saved CC parameters by a Sender

   CC parameters are used by Careful Resume for three functions:

   1.  Information to confirm whether a saved path corresponds to the
       current path.

   2.  Information about the utilised path capacity to set CC
       parameters.

   3.  Information to check the CC parameters are not too old.

   "Generally, implementations are advised to be cautious when using
   saved CC parameters on a new path", as stated in [RFC9000].  While
   this statement has been proposed in the context of QUIC
   standardisation, this advice is appropriate for any IETF transport
   protocol.  Care is therefore needed to assure safe use and to be
   robust to changes in traffic patterns, network routing, and link/node
   conditions.  There are cases where using the saved parameters of a
   previous connection is not appropriate (see Section 3.2).

1.2.  Receiver Preference

   Whilst a sender could take optimisation decisions without considering
   the receiver's preference, there are cases where a receiver could
   have information that is not available at the sender, or might
   benefit from understanding that Careful Resume might be used.  In
   these cases, a receiver could explicitly ask to enable or inhibit
   Careful Resume when an application initiates a new connection.

   Examples where a receiver might request to inhibit using Careful
   Resume include:

   1.  a receiver that can predict the pattern of traffic (e.g., insight
       into the volume of data to be sent, the expected length of a
       connection, or the requested maximum transfer rate);

   2.  a receiver with a local indication that a path/local interface
       has changed since the CC parameters were saved;

   3.  a receiver with knowledge of its current hardware limitations;

   4.  a receiver that can predict additional capacity will be needed
       for other concurrent or later flows (i.e., prefers to activate
       Careful Resume for a different connection).

1.3.  Transport Protocol Interaction

   The CWND is one factor that limits the sending rate of a transport
   protocol.  Other mechanisms also constrain the maximum sending rate.
   These include the sender pacing rate and the receiver-advertised
   window (or flow credit); see Section 4.7.

1.4.  Examples of Scenarios of Interest

   This section provides a set of examples where Careful Resume is
   expected to improve performance.  Either endpoint can assume the role
   of a sender or a receiver.  Careful Resume also supports a
   bidirectional data transfer, where both endpoints simultaneously send
   data (e.g., remote execution of an application, or a bidirectional
   video conference call).

   In one example, an application uses a series of connections over a
   path.  Without a new method, each connection would need to
   individually discover appropriate CC parameters, whereas Careful
   Resume allows the flow to use a rate based on the previously observed
   CC parameters.

   In another example, an application connects after a disruption that
   temporarily reduced the path capacity.  When the endpoint resumes use
   of the path with Careful Resume, the sending rate can be based on the
   previously observed CC parameters.

   There is particular benefit for any path with an RTT that is much
   larger than typical Internet paths.  In a specific example, an
   application connected via a satellite access network [IJSCN] could
   take 9 seconds to complete a 5.3 MB transfer using standard CC,
   whereas a sender using Careful Resume could reduce this transfer time
   to 4 seconds.  The time to complete a 1 MB transfer could similarly
   be reduced by 62 % [MAPRG111].  This benefit is also expected for
   other sizes of transfer and for different path characteristics when a
   path has a large BDP.
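
   As a rough illustration of why a large BDP lengthens startup, the
   following sketch counts the RTTs that an idealised slow start needs
   before the CWND first reaches a target.  It assumes the CWND doubles
   once per RTT with no loss, pacing, or receiver limits.  The function
   name and the example path (a 10 x 1460 byte IW, 50 Mb/s, 600 ms RTT)
   are assumptions chosen for illustration; they are not taken from the
   measurements cited above.

```python
def slow_start_rtts_to_reach(target_cwnd: int, iw: int) -> int:
    """Count RTTs for an idealised slow start to grow the CWND from iw
    to at least target_cwnd, doubling once per RTT (illustrative model)."""
    rtts = 0
    cwnd = iw
    while cwnd < target_cwnd:
        cwnd *= 2
        rtts += 1
    return rtts


# Hypothetical path: 50 Mb/s with a 600 ms RTT (values are assumptions).
IW_BYTES = 10 * 1460
BDP_BYTES = 50_000_000 // 8 * 600 // 1000  # bytes per RTT at full rate
RTTS = slow_start_rtts_to_reach(BDP_BYTES, IW_BYTES)
```

   Under these assumptions the BDP is 3,750,000 bytes and the CWND
   needs 9 RTTs (about 5.4 seconds at 600 ms) to reach it.  This is the
   startup time that a resumed CWND seeks to shorten.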

1.5.  Design Principles

   Resuming a connection with parameters that were observed during a
   previous connection is inherently a tradeoff between the potential
   performance gains for the new connection and the risks of degraded
   performance for other connections that share a common bottleneck.  We
   describe a mechanism designed to obtain good performance when
   resuming is appropriate, while seeking to minimise the impact on
   other connections when it is not appropriate.

   The following design principles seek to mitigate the risk that a
   sender adds excessive congestion to an already congested path:

      The first precaution is to recognise whether the conditions have
      changed so much that the saved values are no longer valid.  We
      describe that as the "Reconnaissance Phase".  During that phase,
      the sender will not send more data than allowed for any new
      connection, e.g., using the recommended maximum IW for the first
      RTT of transmitting data [RFC6928] [RFC9002].  The sender will
      only proceed with the resume process if the reconnaissance
      succeeds.  If it fails, for example if previous packets in a
      connection experience congestion or the RTT is significantly
      different, the sender will follow the standard process for new
      connections.  This phase provides some protection against
      aggravating severe congestion and allows the sender to establish
      the minimum RTT.

      The second precaution is to cautiously use the saved parameters
      when resuming.  This is called the "Unvalidated Phase".  For
      example, the jump in the size of CWND/rate is restricted to a
      fraction (1/2) of the saved_cwnd, to avoid starving other flows
      that may have started or increased their capacity after the last
      measurement.  The same principle applies for algorithms that use
      different parameters to classic TCP CC: do not push more than a
      fraction of the remembered values.  For example, a connection
      using a rate-based CC algorithm (e.g., BBR) will set the pacing
      rate to half the remembered value of the "bottleneck bandwidth".
      The sender also needs to pace all unvalidated packets, to ensure
      the rate does not exceed the previously used rate.  This is
      intended to avoid a sudden influx of a large number of packets
      that could result in building bottleneck queues and disrupt
      existing flows.  Successful validation can allow further increases
      of the CWND; after validating that the used rate did not result in
      congestion, the sender can then increase CWND to saved_cwnd.

      The third precaution is to enter a "Safe Retreat Phase" if the
      validation fails, for example if congestion is detected during
      validation.  The risk here is that the trial use of the saved
      parameters could have disrupted existing connections.  Consider,
      for example, a connection using Reno TCP CC.  When exiting "slow
      start" mode due to loss, Reno would normally update the CWND to a
      "slow start threshold" set to half the volume of data in flight.
      However, during this validation the CWND is restored from the
      saved parameters.  The resultant sending rate could be much larger
      than the value that would have been reached by a "standard" slow
      start process, and the overload of the path could cause
      significant congestion to other flows.  Instead of continuing with
      that "too large" value, the retreat process resets the congestion
      window to a value no greater than what a standard process would
      have discovered.  For other CC algorithms, such as Cubic [RFC9438]
      or BBR, the implementation details may differ, but the principle
      remains: trying and failing should not confer an undue advantage
      (e.g., starving) over existing connections that share a common
      bottleneck.

2.  Language, Notation and Terms

   This section provides a brief summary of key terms and the
   requirements language.

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.2.  The Remote Endpoint

   The Remote Endpoint is an implementation-dependent value that
   identifies the sender's view of the network path being used.  This is
   used to match the current path with a set of CC parameters associated
   with a previously observed path.  It includes:

   *  an identifier representing the sending interface (e.g., a globally
      assigned address/prefix or other local identifier);

   *  an identifier representing the destination (e.g., a name or IP
      address).

   The Remote Endpoint could include information such as the DSCP, the
   transport ports, a flow label, etc.  This information needs to be set
   consistently for a resumed connection to the same endpoint.  Although
   additional information could improve the path differentiation, it
   could reduce the re-usability of saved parameters.
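
   One way to picture the Remote Endpoint is as a lookup key for saved
   CC parameters.  The sketch below is only an illustration: the class
   and field names are hypothetical, and the choice of which optional
   fields to include is implementation-dependent, as noted above.

```python
from typing import NamedTuple, Optional


class RemoteEndpoint(NamedTuple):
    """Hypothetical key for the sender's view of a network path."""
    local_id: str                # identifier for the sending interface
    destination: str             # name or IP address of the destination
    dscp: Optional[int] = None   # optional extra path differentiation


# Two connections with the same key can share saved CC parameters.
first = RemoteEndpoint("2001:db8:1::/48", "server.example")
second = RemoteEndpoint("2001:db8:1::/48", "server.example")

# Adding more information narrows the match, reducing re-usability.
marked = RemoteEndpoint("2001:db8:1::/48", "server.example", dscp=46)

saved_by_endpoint = {first: "saved CC parameters"}
```

   Here a lookup with second finds the entry stored under first,
   whereas a lookup with marked does not, which reflects the tradeoff
   between path differentiation and re-usability.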

2.3.  Logging support

   This document defines triggers to support logging key events.
   [I-D.custura-tsvwg-careful-resume-qlog] provides definitions that
   enable a Careful Resume implementation to generate qlog events when
   using QUIC.

2.4.  Notation and Terms

   The document uses language drawn from a range of IETF RFCs.  The
   following terms are defined:

      Beta: A scaling factor between 0.5 and 1; the default value is
      0.5.

      Careful Resume (CR): The method specified in this document to
      select initial CC parameters and to more rapidly and safely
      increase the initial sending rate.

      CC parameters: A set of saved CC parameters from observing the
      capacity of an established connection (see Section 1.1).

      CWND: The congestion window, or equivalent CC variable limiting
      the maximum sending rate;

      current_remote_endpoint: The Remote Endpoint;

      current_rtt: A sample measurement of the current RTT;

      flight_size: The current volume of unacknowledged data;

      jump_cwnd: The resumed CWND, used in the Unvalidated Phase.

      LifeTime: The time for which the saved CC parameters can be safely
      re-used.

      max_jump: The configured maximum jump_cwnd;

      PipeSize: A measure of the validated available capacity based on
      the acknowledged data;

      Remote Endpoint: See Section 2.2;

      saved_cwnd: The preserved capacity derived from observation of a
      previous connection (see Section 4.1);

      saved_remote_endpoint: The Remote Endpoint associated with a set
      of CC parameters;

      saved_rtt: The preserved minimum RTT (see Section 4.1).

      Unvalidated Packet: A packet sent when the CWND has been increased
      beyond the size normally permitted by the CC algorithm; if such a
      packet is acknowledged, it contributes to the PipeSize, but if
      congestion is detected, it triggers entry to the Safe Retreat
      Phase.

3.  The Phases of CC using Careful Resume

   This section defines a series of phases that the congestion
   controller moves through as a connection uses Careful Resume.


   Normal ...> Connect -> Reconnaissance --------------------> Normal
   (Observing)              |                                    ^
                            v                                    |
                           Unvalidated --------------------------+
                            |      |                             |
                            |      +--> Validating --------------+
                            |               |                    |
                            |               |                    |
                            +---------------+--> Safe Retreat ---+


         Figure 1: Key transitions between Phases in Careful Resume

   The key phases of Careful Resume are illustrated in Figure 1.
   Examples of these transitions between phases are provided in
   Appendix A.
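
   The transitions in Figure 1 can be written down as a small table.
   The following is a minimal sketch; the enum, the table, and the
   helper name are hypothetical and are not defined by this document.
   The Normal Phase has no outgoing transitions within one connection,
   because a later connection starts again in the Reconnaissance Phase.

```python
from enum import Enum, auto


class Phase(Enum):
    RECONNAISSANCE = auto()
    UNVALIDATED = auto()
    VALIDATING = auto()
    SAFE_RETREAT = auto()
    NORMAL = auto()


# Transitions read from Figure 1 (illustrative encoding only).
ALLOWED_TRANSITIONS = {
    Phase.RECONNAISSANCE: {Phase.UNVALIDATED, Phase.NORMAL},
    Phase.UNVALIDATED: {Phase.VALIDATING, Phase.SAFE_RETREAT,
                        Phase.NORMAL},
    Phase.VALIDATING: {Phase.NORMAL, Phase.SAFE_RETREAT},
    Phase.SAFE_RETREAT: {Phase.NORMAL},
    Phase.NORMAL: set(),
}


def can_transition(src: Phase, dst: Phase) -> bool:
    """Return True if Figure 1 shows an arrow from src to dst."""
    return dst in ALLOWED_TRANSITIONS[src]
```

   A table of this kind can be used to assert that a congestion
   controller never, for example, moves from the Reconnaissance Phase
   directly to the Safe Retreat Phase.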

3.1.  Observing

   An established connection in the Normal Phase can save a set of CC
   parameters for the specific path to the current endpoint.  Each set
   of CC parameters includes the saved_remote_endpoint and the LifeTime
   (e.g., as a timestamp after which the parameters must not be used).

   *  Observing (saved_cwnd): The saved_cwnd is a measure of the
      currently utilised capacity for the connection, measured as the
      volume of bytes sent during an RTT.  This could be computed by
      measuring the volume of data acknowledged in one RTT.  If the
      measured CWND is less than four times the Initial Window (IW), a
      sender can choose to not save the CC parameters, because the
      additional actions associated with performing Careful Resume for a
      small CWND would not justify its use.

   *  Observing (saved_rtt): The minimum RTT at the time of observation
      is saved as the saved_rtt.

   Implementation notes are provided in Section 4.1.
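
   The observation step can be sketched as follows.  This is not a
   definitive implementation: the record layout, the function name, and
   the representation of the LifeTime as an expiry timestamp are
   assumptions, and the skip for a small CWND is the optional choice
   described above rather than a requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SavedCCParams:
    """Hypothetical record of one set of saved CC parameters."""
    saved_remote_endpoint: str
    saved_cwnd: int     # bytes acknowledged in one RTT
    saved_rtt: float    # minimum RTT at observation, in seconds
    expires_at: float   # LifeTime expressed as an expiry timestamp


def observe(endpoint: str, acked_bytes_in_rtt: int, min_rtt: float,
            now: float, iw: int, lifetime: float
            ) -> Optional[SavedCCParams]:
    """Return parameters to save, or None when the CWND is too small."""
    if acked_bytes_in_rtt < 4 * iw:
        # Optional policy: a small CWND does not justify Careful Resume.
        return None
    return SavedCCParams(endpoint, acked_bytes_in_rtt, min_rtt,
                         now + lifetime)
```

   The lifetime argument is supplied by the caller; this sketch does
   not suggest a value for it.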

3.2.  Reconnaissance Phase

   A sender enters the Reconnaissance Phase after connection setup.  In
   this phase, the CWND is initialised to the IW, and the sender
   transmits initial data.  The CWND MAY be increased using normal CC as
   each ACK confirms delivery of previously unacknowledged data (i.e.,
   the CC is unchanged).

   The phase seeks to determine if the path is consistent with a
   previously observed path (saved as a set of CC parameters).  The
   following conditions need to be confirmed before the sender enters
   the Reconnaissance Phase:

   *  Reconnaissance Phase (Endpoint change): If the
      current_remote_endpoint does not match any
      saved_remote_endpoint, the sender MUST enter the Normal Phase.
      (A difference in the Remote Endpoint indicates that the network
      path differs from the one that was observed.)

   *  Reconnaissance Phase (Lifetime of saved CC parameters): The CC
      parameters are temporal.  If the LifeTime of the observed CC
      parameters is exceeded, the CC parameters are not used and the
      sender enters the Normal Phase.

   The following actions are performed during the Reconnaissance Phase:

   *  Reconnaissance Phase (Recording the current_rtt): During this
      phase, a sender MUST record the minimum RTT for the current
      connection as the current_rtt.

   *  Reconnaissance Phase (Detected congestion): If the sender detects
      congestion (e.g., packet loss or ECN-CE marking), the sender MUST
      enter the Normal Phase to respond to the detected congestion.

   *  Reconnaissance Phase (Using saved_cwnd): Only one connection can
      use a specific set of saved CC parameters.  If another connection
      has already started to use the saved_cwnd, the sender MUST enter
      the Normal Phase.

   *  Reconnaissance Phase (Path confirmed): When a sender has confirmed
      the RTT (see Section 4.2.1) and also has received an
      acknowledgement for the initial data without reported congestion,
      it MAY then enter the Unvalidated Phase.  Although a sender can
      immediately transition to the Unvalidated Phase, this transition
      MAY be deferred to the time at which more data is sent than would
      normally have been permitted by the CC algorithm.

   If a sender is rate-limited [RFC7661], it might send insufficient
   data to be able to validate transmission at the higher rate.  A
   sender is allowed to remain in the Reconnaissance Phase and to not
   transition to the Unvalidated Phase until there is more data in the
   transmission buffer than normally permitted by the CC algorithm.

   When a path is not confirmed, Careful Resume does not modify the CWND
   and the sender enters the Normal Phase.

   Implementation notes are provided in Section 4.2.
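
   The checks that cause a sender to leave the Reconnaissance Phase for
   the Normal Phase can be sketched as a single function.  The function
   name, the dictionary layout, and the returned reason strings are
   hypothetical; in particular, the strings are not the qlog trigger
   names.  The RTT confirmation on entry to the Unvalidated Phase is a
   separate check and is not shown here.

```python
from typing import Dict, Optional


def reason_to_enter_normal(current_remote_endpoint: str,
                           saved_params: Dict[str, dict],
                           now: float,
                           congestion_detected: bool,
                           already_in_use: bool) -> Optional[str]:
    """Return why Careful Resume is abandoned, or None to continue."""
    params = saved_params.get(current_remote_endpoint)
    if params is None:
        return "endpoint_changed"      # no matching saved endpoint
    if now > params["expires_at"]:
        return "lifetime_exceeded"     # saved parameters are too old
    if congestion_detected:
        return "congestion_detected"   # loss or ECN-CE marking seen
    if already_in_use:
        return "saved_cwnd_in_use"     # another connection is resuming
    return None
```

   A return value of None means only that none of these conditions
   prevents the sender from continuing with Careful Resume.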

3.3.  Unvalidated Phase

   The Unvalidated Phase is designed to enable the CWND to more rapidly
   get up to speed by using paced transmission of a tentatively
   increased CWND.  The following conditions need to be confirmed before
   the sender enters the Unvalidated Phase:

   *  Unvalidated Phase (Confirming the path on entry): If the
      current_rtt is less than or equal to (saved_rtt / 2) or the
      current_rtt is greater than (saved_rtt x 10) (see Section 4.2.1),
      the sender MUST enter the Normal Phase (logged as
      rtt_not_validated in [I-D.custura-tsvwg-careful-resume-qlog]).
      This is because the calculation of a sending rate from a
      saved_cwnd is directly impacted by the RTT, therefore a
      significant change in the RTT is a strong indication that the
      previously observed CC parameters are not valid for the current
      path.
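
   The RTT bounds above translate directly into a predicate.  The
   function name is hypothetical, and both RTT values are assumed to be
   in the same unit (milliseconds in the example).

```python
def rtt_confirms_path(current_rtt: float, saved_rtt: float) -> bool:
    """Return True if current_rtt is consistent with saved_rtt."""
    if current_rtt <= saved_rtt / 2:
        return False   # path is much shorter than the observed path
    if current_rtt > saved_rtt * 10:
        return False   # path is much longer than the observed path
    return True
```

   For a saved_rtt of 600 ms, a current_rtt of 300 ms fails the check
   and 301 ms passes it; 6000 ms passes and 6001 ms fails.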

   On entry to the Unvalidated Phase, the following actions are
   performed:

   *  Unvalidated Phase (Initialising PipeSize): The variable PipeSize
      is initialised to the flight_size on entry to the Unvalidated
      Phase.  This records the CWND before a jump is applied.

   *  Unvalidated Phase (Setting the jump_cwnd): To avoid starving other
      flows that could have either started or increased their use of
      capacity after the Observation Phase, the jump_cwnd MUST be no
      more than half of the saved_cwnd.  Hence, jump_cwnd is less than
      or equal to Min(max_jump,(saved_cwnd/2)).  CWND = jump_cwnd.
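
   The entry actions can be sketched as below.  The function names are
   hypothetical, sizes are in bytes, and integer division is an
   assumption used to keep the result no more than half of the
   saved_cwnd.

```python
from typing import Tuple


def compute_jump_cwnd(saved_cwnd: int, max_jump: int) -> int:
    """Largest jump_cwnd permitted: Min(max_jump, saved_cwnd / 2)."""
    return min(max_jump, saved_cwnd // 2)


def enter_unvalidated(flight_size: int, saved_cwnd: int,
                      max_jump: int) -> Tuple[int, int]:
    """Return (PipeSize, CWND) after the entry actions."""
    pipe_size = flight_size   # data in flight before the jump
    cwnd = compute_jump_cwnd(saved_cwnd, max_jump)
    return pipe_size, cwnd
```

   For example, a saved_cwnd of 1,000,000 bytes with a max_jump of
   400,000 bytes gives a CWND of 400,000 bytes, whereas a max_jump of
   2,000,000 bytes gives 500,000 bytes.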

   The following actions are performed during the Unvalidated Phase:

   *  Unvalidated Phase (Pacing transmission): All packets sent in the
      Unvalidated Phase MUST use pacing based on the current_rtt.

   *  Unvalidated Phase (Confirming the path during transmission): If a
      sender determines that the previous CC parameters are not valid
      (due to a detected path change), the Safe Retreat Phase is
      entered.  (This is because in the Unvalidated Phase, insufficient
      time has passed for a sender to receive feedback validating the
      jump in CWND.  Therefore, any detected congestion must have
      resulted from packets sent before the Unvalidated Phase.)

   *  Unvalidated Phase (Completed sending all unvalidated packets): The
      sender enters the Validating Phase when the flight_size equals the
      CWND.

   *  Unvalidated Phase (Tracking PipeSize): The variable PipeSize is
      increased by the volume of data acknowledged by each received ACK.
      (This indicates a previously unvalidated packet has been
      successfully sent over the path.)

   *  Unvalidated Phase (Receiving acknowledgement for an unvalidated
      packet): The sender enters the Validating Phase when an
      acknowledgement is received for the first packet number (or
      higher) that was sent in the Unvalidated Phase or greater than 1
      RTT has passed in the Unvalidated Phase (logged as
      first_unvalidated_packet_acknowledged, see Section 2.3).

   *  Unvalidated Phase (Limiting time in the Unvalidated Phase): A
      sender enters the Validating Phase if more than one RTT has
      elapsed while in the Unvalidated Phase (logged as rtt_exceeded,
      see Section 2.3).
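
   The ACK-driven and send-driven exits from the Unvalidated Phase can
   be sketched as two small functions.  This is an illustration under
   stated assumptions: the function names, argument names, and the
   phase strings are hypothetical, and the handling of a detected path
   change (entry to the Safe Retreat Phase) is omitted.

```python
from typing import Tuple


def unvalidated_on_ack(pipe_size: int, acked_bytes: int,
                       largest_acked_pn: int, first_unvalidated_pn: int,
                       time_in_phase: float, current_rtt: float
                       ) -> Tuple[int, str]:
    """Process one ACK; return the new PipeSize and the next phase."""
    pipe_size += acked_bytes   # acknowledged data adds to PipeSize
    if largest_acked_pn >= first_unvalidated_pn:
        # Trigger: first_unvalidated_packet_acknowledged
        return pipe_size, "validating"
    if time_in_phase > current_rtt:
        # Trigger: rtt_exceeded
        return pipe_size, "validating"
    return pipe_size, "unvalidated"


def unvalidated_after_send(flight_size: int, cwnd: int) -> str:
    """Return the next phase after sending a packet."""
    # The text uses "equals"; ">=" is a defensive assumption here.
    return "validating" if flight_size >= cwnd else "unvalidated"
```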

   Implementation notes are provided in Section 4.3.

   Notes describing how this phase is implemented for BBR are provided
   in Appendix B.1.
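
   The pacing requirement of the Unvalidated Phase can be illustrated
   with one simple reading: spread one CWND of data evenly over one
   current_rtt.  The function name is hypothetical, and this formula is
   an assumption for illustration rather than the method specified in
   Section 4.3.

```python
def pacing_interval(packet_size: int, cwnd: int,
                    current_rtt: float) -> float:
    """Seconds between packet departures so that cwnd bytes are
    spread evenly over one current_rtt (illustrative formula)."""
    if cwnd <= 0:
        raise ValueError("cwnd must be positive")
    return current_rtt * packet_size / cwnd
```

   With 1500 byte packets, a CWND of 150,000 bytes, and a current_rtt
   of 0.5 seconds, this gives one packet every 5 ms.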

3.4.  Validating Phase

   The Validating Phase checks whether all packets sent in the
   Unvalidated Phase were received without inducing congestion.  The
   CWND remains unvalidated and the sender typically remains in this
   phase for one RTT.  On entry to the Validating Phase, the sender
   performs the following actions:

   *  Validating Phase (Check flight_size on entry): On entry to the
      Validating Phase, if the flight_size is less than or equal to the
      PipeSize, the Normal Phase is entered with the CWND reset to the
      PipeSize.  (The PipeSize does not include the part of the
      jump_cwnd that was not utilised.)

   *  Validating Phase (Limiting CWND on entry): On entry to the
      Validating Phase (when flight_size is greater than the PipeSize),
      the CWND is set to the flight_size.
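
   The two entry checks can be combined into one decision, sketched
   below.  The function name and the phase strings are hypothetical.

```python
from typing import Tuple


def enter_validating(flight_size: int, pipe_size: int
                     ) -> Tuple[str, int]:
    """Return (next phase, CWND) on entry to the Validating Phase."""
    if flight_size <= pipe_size:
        # Nothing unvalidated remains in flight: resume normal CC.
        return "normal", pipe_size
    # Limit the CWND to what was actually sent.
    return "validating", flight_size
```

   For example, a flight_size of 300,000 bytes with a PipeSize of
   50,000 bytes keeps the sender in the Validating Phase with a CWND of
   300,000 bytes, even if the jump_cwnd was larger.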

   During the Validating Phase, the sender performs the following
   actions:

   *  Validating Phase (Tracking PipeSize): The PipeSize is increased by
      the volume of acknowledged data for each received ACK that
      indicates a packet was successfully sent over the path.

   *  Validating Phase (Updating CWND): The CWND is updated using the
      normal rules for the current congestion controller.  This
      typically uses "slow start", which allows CWND to be increased for
      each received acknowledgement that indicates a packet has been
      successfully sent across the path.

   *  Validating Phase (Congestion indication): If a sender determines
      that congestion was experienced (e.g., packet loss or ECN-CE
      marking), Careful Resume enters the Safe Retreat Phase (logged as
      packet_loss and ECN_CE, see Section 2.3).

   *  Validating Phase (Receiving acknowledgement of the unvalidated
      packets): The sender enters the Normal Phase when an
      acknowledgement is received for the last packet number (or higher)
      that was sent in the Unvalidated Phase (logged as
      last_unvalidated_packet_acknowledged, see Section 2.3).  This
      means that the packets sent in the Unvalidated Phase were
      acknowledged without congestion.

   Notes describing how this phase is implemented for BBR are provided
   in Appendix B.2.
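
   As an informative illustration, the Validating Phase rules above can
   be sketched in Python.  This is a minimal sketch, not a definitive
   implementation: the CRState class, its field names, the phase strings
   and the byte-counted slow-start increase are assumptions made for
   this example.  Updating flight_size after entry and the CWND
   reduction performed on entry to the Safe Retreat Phase are omitted.

```python
from dataclasses import dataclass


@dataclass
class CRState:
    """Hypothetical per-connection Careful Resume state (in bytes)."""
    cwnd: int
    pipe_size: int
    flight_size: int
    last_unvalidated_pn: int  # last packet sent in the Unvalidated Phase
    phase: str = "validating"


def enter_validating(s: CRState) -> None:
    """Apply the on-entry rules of the Validating Phase."""
    if s.flight_size <= s.pipe_size:
        # The jump was not utilised: keep only the validated part.
        s.cwnd = s.pipe_size
        s.phase = "normal"
    else:
        # Limit the CWND to what was actually sent.
        s.cwnd = s.flight_size
        s.phase = "validating"


def on_ack(s: CRState, acked_bytes: int, largest_acked_pn: int,
           congestion: bool) -> None:
    """Process one ACK while in the Validating Phase."""
    if s.phase != "validating":
        return
    if congestion:
        # Packet loss or ECN-CE marking was reported.
        s.phase = "safe_retreat"
        return
    s.pipe_size += acked_bytes   # Tracking PipeSize
    s.cwnd += acked_bytes        # Assumption: slow-start style increase
    if largest_acked_pn >= s.last_unvalidated_pn:
        # All unvalidated packets were acknowledged without congestion.
        s.phase = "normal"
```

   In this sketch, a sender that did not use any of the jump_cwnd
   leaves the phase immediately with the CWND equal to the PipeSize,
   whereas a sender that used the jump remains in the phase until the
   last unvalidated packet number (or a higher one) is acknowledged.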

3.5.  Safe Retreat Phase

   This phase is entered when congestion is detected for an unvalidated
   packet.  It drains the path of other unvalidated packets.

   On entry to the Safe Retreat Phase, the following actions are
   performed:

   *  Safe Retreat Phase (Removing saved information): The set of saved
      CC parameters for the path is deleted, to prevent it from being
      used again by other flows.

   *  Safe Retreat Phase (Re-initializing CWND): The CWND MUST be
      reduced to no more than (PipeSize/2).  This avoids persistent
      starvation by allowing capacity for other flows to regain their
      share of the total capacity.  (Note: The minimum CWND in QUIC is 2
      packets, see Section 4.8 of [RFC9002].)

   *  Safe Retreat Phase (Loss Recovery): When the CWND is reduced, a
      QUIC sender can immediately send a single packet prior to the
      reduction [RFC9002].  A similar method for TCP is described in
      Section 5 of [RFC6675].  This speeds up loss recovery if the data
      in the lost packet is retransmitted.

   In the Safe Retreat Phase, the sender performs the following actions:

   *  Safe Retreat Phase (Tracking PipeSize): The sender continues to
      update the PipeSize after processing each acknowledgement.  (This
      PipeSize is used to reset the ssthresh when leaving this phase; it
      does not modify the CWND.)

   *  Safe Retreat Phase (Maintaining CWND): The CWND MUST NOT be
      increased in the Safe Retreat Phase.

   *  Safe Retreat Phase (Acknowledgement of unvalidated packets): The
      sender enters the Normal Phase when the last packet (or a later
      packet) sent during the Unvalidated Phase has been acknowledged.
      On leaving the Safe Retreat Phase, the ssthresh MUST be set to no
      larger than the most recently measured PipeSize * Beta, where Beta
      is a scaling factor between 0.5 and 1.  The default value is 0.5,
      chosen to reduce the probability of inducing a second round of
      congestion.  Cubic defines a Beta_cubic of 0.7 [RFC9438] (logged
      as exit_recovery in [I-D.custura-tsvwg-careful-resume-qlog]).

   Implementation notes are provided in Section 4.5.

   Notes describing how this phase is implemented for BBR are provided
   in Appendix B.3.
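
   The entry and exit actions of the Safe Retreat Phase can be
   illustrated with a short Python sketch.  The function names, the
   dictionary used to hold saved CC parameters and the use of a
   two-packet floor for the CWND are assumptions made for this example;
   only the PipeSize/2 limit, the Beta range and the default Beta of 0.5
   are taken from the text above.

```python
MIN_CWND_PACKETS = 2  # QUIC minimum CWND, Section 4.8 of [RFC9002]


def enter_safe_retreat(pipe_size, max_datagram_size, saved, path_key):
    """Return the CWND (bytes) to use on entry to the Safe Retreat Phase.

    `saved` is a hypothetical dict of saved CC parameters keyed by path.
    """
    # Removing saved information: the parameters must not be reused.
    saved.pop(path_key, None)
    # Re-initialising CWND: no more than PipeSize/2, but (assumption)
    # never below the transport's minimum window.
    return max(pipe_size // 2, MIN_CWND_PACKETS * max_datagram_size)


def exit_safe_retreat(pipe_size, beta=0.5):
    """Return the ssthresh (bytes) to set when leaving the phase."""
    if not 0.5 <= beta <= 1.0:
        raise ValueError("Beta is a scaling factor between 0.5 and 1")
    # Truncation keeps the result no larger than PipeSize * Beta.
    return int(pipe_size * beta)
```

   For instance, with a PipeSize of 60000 bytes the sketch sets the
   CWND to 30000 bytes on entry.  If the PipeSize has grown to 80000
   bytes when the last unvalidated packet is acknowledged, the ssthresh
   is set to 40000 bytes with the default Beta.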

3.5.1.  Loss Recovery after entering Safe Retreat

   Unacknowledged packets that were sent in the Unvalidated Phase can be
   lost.  Loss recovery commences using the reduced CWND that was set on
   entry to the Safe Retreat Phase and continues until acknowledgment of
   the last packet number (or a later packet) sent in the Unvalidated
   Phase.  If the last unvalidated packet is not cumulatively
   acknowledged, then additional packets might need to be retransmitted.

3.6.  RTO Expiry while using Careful Resume

   A sender that experiences a Retransmission Time Out (RTO) expiry
   ceases to use Careful Resume.  The sender enters the Normal Phase.
   If using BBR, the normal processing of packet losses will cause it to
   enter the Drain state while the "carefully-resuming" flag is set to
   True, see Appendix B.3.

   As in loss recovery, data sent in the Unvalidated Phase could be
   later acknowledged after an RTO event (see Section 3.5.1).

3.7.  Normal Phase

   In the Normal Phase, the sender transitions to using the normal CC
   algorithm (e.g., in congestion avoidance when CWND is more than
   ssthresh, or slow start when less than or equal to ssthresh).  (Note
   that when the sender did not use the entire jump_cwnd the CWND was
   reduced on entering the Validating Phase.)

   Implementation notes are provided in Section 4.6.

4.  Implementation Notes and Guidelines

   This section provides guidance for implementation and use.

4.1.  Observing the Path Capacity

   There are various approaches to measuring the capacity used by a
   connection.  Congestion controllers, such as CUBIC or Reno, can
   estimate the capacity based on the CWND, flight_size, acknowledged
   rate, etc.  A different approach could estimate the same parameters
   for a rate-based congestion controller, such as BBR
   [I-D.ietf-ccwg-bbr], or by observing the rate at which data is
   acknowledged by the remote endpoint.

   Implementations are required to calculate a saved_rtt, measuring the
   minimum RTT while observing the capacity.  For example, this could be
   the minimum of a set of RTT measurements taken over the previous 5
   minutes.

   Implementations are expected to include a LifeTime parameter in the
   CC parameters that can be used to remove old CC parameters when no
   longer needed, or the CC parameters are out of date.

   *  There are cases where the current CWND does not reflect the path
      capacity.  At the end of slow start, the CWND can be significantly
      larger than needed to fully utilise the path (i.e., a CWND
      overshoot).  It is inappropriate to use an overshoot in the CWND
      as a basis for estimating the capacity.  In most cases, the CWND
      will converge to a stable value after several more RTTs.  One
      mitigation could be to set the saved_cwnd based on the
      flight_size, or an averaged CWND.

   *  When a sender is rate-limited, or in the RTT following a burst of
      transmission, a sender typically transmits less data than allowed
      by the CWND.  Such observations could be discounted when
      estimating the saved_cwnd (e.g., when a previous observation
      recorded a higher value).
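
   One possible way to observe these parameters is sketched below in
   Python.  The PathObserver class and its method names are
   hypothetical.  The sketch assumes that the saved_rtt is the minimum
   RTT sample of the previous 5 minutes and that the saved_cwnd is
   based on the flight_size, discounting any observation lower than one
   already recorded.

```python
from collections import deque

RTT_WINDOW_S = 300.0  # the 5-minute example window, in seconds


class PathObserver:
    """Hypothetical observer of saved_rtt and saved_cwnd for one path."""

    def __init__(self):
        self._rtts = deque()  # (timestamp, rtt) pairs, oldest first
        self.saved_cwnd = 0

    def on_rtt_sample(self, now, rtt):
        """Record an RTT sample and expire samples older than the window."""
        self._rtts.append((now, rtt))
        while self._rtts and now - self._rtts[0][0] > RTT_WINDOW_S:
            self._rtts.popleft()

    def saved_rtt(self):
        """Minimum RTT of the retained samples, or None if there are none."""
        return min((rtt for _, rtt in self._rtts), default=None)

    def on_flight_size(self, flight_size):
        """Keep the highest observed flight_size as the saved_cwnd."""
        self.saved_cwnd = max(self.saved_cwnd, flight_size)
```

   Basing the estimate on the flight_size rather than the CWND avoids
   recording a CWND overshoot at the end of slow start, and keeping the
   maximum discounts observations made while the sender was rate-
   limited.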

4.2.  Confirming the Path in the Reconnaissance Phase

   In the Reconnaissance Phase, a sender initiates a connection and
   starts sending initial data, while measuring the current_rtt.  The CC
   is not modified.  A sender therefore needs to limit the initial data,
   sent in the first RTT of transmitted data, to not more than the IW
   [RFC9002].  This transmission using the IW is assumed to be a safe
   starting point for any path to avoid adding excessive load to a
   potentially congested path.

   Careful Resume does not permit multiple concurrent reuse of the saved
   CC parameters.  When multiple new concurrent connections are made to
   a server, each can have a valid saved_remote_endpoint, but the
   saved_cwnd can only be used for one connection at a time.  This is to
   prevent a sender from performing multiple jumps in the CWND, each
   individually based on the same saved_cwnd, and hence creating an
   excessive aggregate load at the bottleneck.

   The method that is used to prevent re-use of the saved CC parameters
   will depend upon the design of the server (e.g., if all connections
   from a given client IP arrive at the same server process, then the
   server process could use a hash table, whereas when using some types
   of load balancing, a distributed system might be needed to ensure
   this invariant when the load balancing hashes connections by 4-tuple
   and hence multiple connections from the same client device are served
   by different server processes.)
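
   For the single-process case, the hash table mentioned above could
   look like the following Python sketch.  The SavedParamsTable class
   and its claim/release interface are assumptions made for this
   example; a deployment that spreads connections over several server
   processes would need a shared or distributed equivalent.

```python
import threading


class SavedParamsTable:
    """Hypothetical table allowing one user of saved_cwnd per endpoint."""

    def __init__(self):
        self._lock = threading.Lock()
        self._saved = {}      # saved_remote_endpoint -> saved_cwnd
        self._in_use = set()  # endpoints with a connection using the value

    def store(self, endpoint, saved_cwnd):
        with self._lock:
            self._saved[endpoint] = saved_cwnd

    def claim(self, endpoint):
        """Return the saved_cwnd, or None if absent or already claimed."""
        with self._lock:
            if endpoint not in self._saved or endpoint in self._in_use:
                return None
            self._in_use.add(endpoint)
            return self._saved[endpoint]

    def release(self, endpoint):
        """Allow a later connection to use the saved_cwnd again."""
        with self._lock:
            self._in_use.discard(endpoint)
```

   A second concurrent connection from the same endpoint receives None
   from claim() and therefore continues with normal CC, so that only
   one jump is based on a given saved_cwnd at a time.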

4.2.1.  Confirming the Path

   Path characteristics can change over time for many reasons.  This can
   result in the previously observed CC parameters becoming irrelevant.

   To help confirm the path, the sender compares the saved_rtt with each
   current_rtt sample.

   If the current_rtt is less than half of the saved_rtt, it is regarded
   as too small and is an indicator of a path change.  (This factor of
   two arises because the jump_cwnd is calculated as half the measured
   saved_cwnd, and the sending rate ought not to exceed the rate
   observed when the saved_cwnd was measured.)

   If the current_rtt is larger than the saved_rtt, this results in a
   proportionally lower resumed rate, because the transmission is paced
   based on the current_rtt, and hence is still safe.  If the
   current_rtt is incorrectly measured as larger than the actual path
   RTT, the sender will receive an ACK for an unvalidated packet before
   it completes the Unvalidated Phase.  This ACK resets the CWND to
   reflect the flight_size, and the sender then enters the Validating
   Phase.  A current_rtt more than ten times the saved_rtt is indicative
   of a path change.  (The value of ten accommodates both increases in
   latency from buffering on a path and any variation between RTT
   samples.)

   A sender also verifies that the initial data was acknowledged (i.e.,
   a loss could be indicative of persistent congestion).  A sender in
   Reconnaissance Phase reverts to the Normal Phase if congestion is
   detected.  Some transport protocols implement CC mechanisms that
   infer potential congestion from an increase in the current_rtt.
   Designs need to consider if such an indication is a suitable trigger
   to revert to the Normal Phase.
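
   The checks in this section can be summarised by the following Python
   sketch.  The function name and its arguments are hypothetical, and
   RTT values are assumed to be in milliseconds.  The sketch treats any
   detected congestion, or the absence of RTT samples, as a failure to
   confirm the path.

```python
def path_confirmed(saved_rtt, rtt_samples, congestion_seen):
    """Return True if Careful Resume may use the saved CC parameters.

    saved_rtt       -- RTT stored with the saved CC parameters
    rtt_samples     -- current_rtt samples from the Reconnaissance Phase
    congestion_seen -- True if loss or ECN-CE was detected for initial data
    """
    if congestion_seen or not rtt_samples:
        return False
    # Each sample must be at least half, and at most ten times, saved_rtt.
    return all(saved_rtt / 2 <= rtt <= 10 * saved_rtt
               for rtt in rtt_samples)
```

   With a saved_rtt of 100 ms, samples between 50 ms and 1000 ms
   confirm the path; a sample of 49 ms or 1001 ms is taken as an
   indication of a path change.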

4.3.  Safety in the Unvalidated Phase

   This section considers the safety for using saved CC parameters to
   tentatively update the CWND.  This is designed to mitigate the risk
   of adding excessive congestion to an already congested path.

   A connection MUST NOT use the saved_cwnd to directly initialise a new
   flow, causing it to resume sending at the same rate.  The jump_cwnd
   is therefore limited to half the previously observed saved_cwnd.

4.3.1.  Lifetime of CC Parameters

   The long-term use of the previously observed parameters is not
   appropriate.  A lifetime therefore needs to be specified during which
   the saved CC parameters can be safely re-used.  [RFC9040] provides
   guidance on the implementation of TCP Control Block Interdependence,
   but does not specify how long a saved parameter can safely be reused.
   [RFC7661] specifies a method for managing an unvalidated CWND.  This
   states: "After a fixed period of time (the non-validated period
   (NVP)), the sender adjusts the CWND (Section 4.4.3).  The NVP SHOULD
   NOT exceed five minutes."  Section 5 of [RFC7661] discusses the
   rationale for choosing that period.  However, RFC 7661 targets rate-
   limited connections using normal CC.  Careful Resume includes
   additional mechanisms to avoid and mitigate the effects of overshoot,
   and therefore this can be used to justify a longer lifetime of the
   saved_cwnd using Careful Resume.
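
   A lifetime check could be implemented along the lines of the
   following Python sketch.  The SavedParams record and its fields are
   hypothetical, and the lifetime value is left to the caller because
   this document does not specify one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedParams:
    """Hypothetical record of saved CC parameters for one path."""
    saved_cwnd: int     # bytes
    saved_rtt: float    # seconds
    stored_at: float    # time the parameters were saved, in seconds
    lifetime: float     # configured LifeTime, in seconds (assumption)

    def usable(self, now):
        """Return True while the parameters are within their lifetime."""
        return 0 <= now - self.stored_at <= self.lifetime
```

   A sender would call usable() in the Reconnaissance Phase and
   continue with normal CC when it returns False.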

4.3.2.  Pacing in the Unvalidated Phase

   A sender needs to avoid sending a burst of packets greater than IW as
   a result of a step-increase in the CWND.  This is consistent with
   [RFC8085], [RFC9000].

   Pacing packets as a function of the current_rtt, rather than the
   saved_rtt, provides additional safety during the Unvalidated Phase,
   because it avoids a smaller saved_rtt inflating the sending rate.  A
   limit to the minimum acceptable current_rtt avoids sending at a rate
   higher than was previously observed.

   The following example provides a relevant pacing rhythm: An Inter-
   packet Transmission Time (ITT) is determined by using the current
   Maximum Message Size (MMS), the saved_cwnd and the current_rtt.  A
   safety margin can be configured to avoid sending more than a maximum
   (max_jump):

      jump_cwnd = Min(max_jump, saved_cwnd/2)

      ITT = (current_rtt x MMS)/jump_cwnd

   This follows the idea presented in [RFC4782],
   [I-D.irtf-iccrg-sallantin-initial-spreading] and [CONEXT15].  Other
   sender mitigations have also been suggested to avoid line-rate bursts
   (e.g., [I-D.hughes-restart]).
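
   The pacing calculation in this section can be written as a short
   Python sketch.  The function names and the numeric values in the
   usage note are illustrative assumptions; windows and the MMS are in
   bytes and the ITT is returned in the unit used for the current_rtt.

```python
def compute_jump_cwnd(saved_cwnd, max_jump):
    """jump_cwnd = Min(max_jump, saved_cwnd/2), in bytes."""
    return min(max_jump, saved_cwnd // 2)


def inter_packet_time(current_rtt, mms, jump_cwnd):
    """ITT = (current_rtt x MMS)/jump_cwnd.

    Spreads jump_cwnd bytes, sent as MMS-sized packets, over one
    current_rtt.
    """
    return (current_rtt * mms) / jump_cwnd
```

   For example, a saved_cwnd of 360000 bytes with a max_jump of 1000000
   bytes gives a jump_cwnd of 180000 bytes.  With an MMS of 1200 bytes
   and a current_rtt of 600 ms, one packet is sent every 4 ms, so that
   the 150 packets of the jump are spread over the whole RTT.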

4.3.3.  Exit from the Unvalidated Phase because of Variable Network
        Conditions

   *  Careful Resume has been designed to be robust to changes in
      network conditions due to variations in the forwarding path, such
      as reconfiguration of equipment, or changes in the link
      conditions.  This is mitigated by path confirmation.

   *  Careful Resume has been designed to be robust to changes in
      network traffic, including the arrival of new flows that compete
      for capacity at a shared bottleneck.  This is mitigated by jumping
      to no more than a half of the saved_cwnd and by pacing.

   *  Careful Resume has been designed to avoid unduly suppressing flows
      that have used the capacity since the capacity was observed.  This
      is further mitigated by bounding the duration of the Unvalidated
      Phase and the following Validating Phase, and the conservative
      design of the Safe Retreat Phase.

4.4.  The Validating Phase

   The purpose of the Validating Phase is to trigger an entry to the
   Safe Retreat Phase if the capacity is not validated.

   When a sender completes the Unvalidated Phase, by sending a jump_cwnd
   of data, after one RTT has elapsed, or on receiving an acknowledgment
   for an unvalidated packet, it ceases to use the unvalidated CWND.

   If the flight_size was less than or equal to the PipeSize, the sender
   resets the CWND to the PipeSize, and enters the Normal Phase.

   Otherwise, if the CWND is larger than the flight_size, the CWND is
   reset to the flight_size.  The sender then awaits reception of ACKs
   to validate the use of this capacity.

   New packets are sent when previously sent data is newly acknowledged.
   The CWND is increased during the Validating Phase, based on received
   ACKs.  This allows new data to be sent, but this does not have any
   final impact on the CWND if congestion is subsequently detected.

4.5.  Safety in the Safe Retreat Phase

   This section considers the safety after congestion has been detected
   for unvalidated packets.

   The Safe Retreat Phase sets a safe CWND value to drain any
   unvalidated packets from the path after a packet loss has been
   detected or when ACKs indicate that sent packets were ECN CE-marked.
   The CC parameters that were used are invalid, and are removed.

   The Safe Retreat reaction differs from a traditional reaction to
   detected congestion, because a jump_cwnd can result in a
   significantly higher rate than would be allowed by Slow-Start.  Such
   a jump could aggressively feed a congested bottleneck, resulting in
   overshoot where a disproportionate number of packets from existing
   flows are displaced from the buffer at the congested bottleneck.  For
   this reason, a sender in the Safe Retreat Phase needs to react to
   detected congestion by reducing CWND significantly below the
   saved_cwnd.

      During loss recovery, a receiver can cumulatively acknowledge data
      that was previously sent in the Unvalidated Phase in addition to
      acknowledging the successful retransmission of data.  [RFC3465]
      describes how to appropriately account for such ACKs.  ACKs
      received for unvalidated packets are tracked to measure the
      maximum available capacity, called the PipeSize.  (The first
      unvalidated packet can be determined by recording the sequence
      number of the first packet sent in the Unvalidated Phase.)  This
      calculated PipeSize is later used to reset the ssthresh.  However,
      note that this is not a safe measure of the currently available
      share of the capacity whenever there was also a significant
      overshoot at the bottleneck, and must not be used to reinitialise
      the CWND.

      The Proportional Rate Reduction (PRR) [RFC6937] assumes that it is
      safe to reduce the rate gradually when in congestion avoidance.
      PRR is therefore not appropriate when there might be significant
      overshoot in the use of the capacity, which can be the case when
      the Safe Retreat Phase is entered.

      The recovery from loss depends on the design of a transport
      protocol.  A TCP or SCTP sender is required to retransmit all lost
      data [RFC5681].  For some transports (e.g., QUIC and DCCP), the
      need for loss recovery depends on the sender policy for
      retransmission.  On entry to the Safe Retreat Phase, the CWND can
      be significantly reduced.  When multiple packets were lost, a
      sender recovering all lost data could then take multiple RTTs to
      complete.

4.6.  Returning to Normal Congestion Control

   After using Careful Resume, the CC controller returns to the Normal
   Phase.  The implementation details for different transports depend on
   the design of the transport.  In the Normal Phase, a sender is
   permitted to start Observing the capacity of the path.

4.7.  Limitations from Transport Protocols

   The CWND is one factor that limits the sending rate of a sender.
   Other mechanisms can also constrain the maximum sending rate of a
   transport protocol.  A transport protocol might need to update these
   mechanisms to fully utilise the CWND made available by Careful
   Resume:

      A TCP sender is limited by the receiver window (rwnd).  Unless
      configured at a receiver, the rwnd constrains the rate of increase
      for a connection and reduces the benefit of Careful Resume.

      QUIC includes flow control mechanisms and mechanisms to prevent
      amplification attacks.  In particular, a QUIC receiver might need
      to issue proactive MAX_DATA frames to increase the flow control
      limits of a connection that is started when using Careful Resume
      to gain the expected benefit.

5.  Acknowledgments

   The authors would like to thank John Border, Gabriel Montenegro,
   Patrick McManus, Ian Swett, Igor Lubashev, Robin Marx, Roland Bless,
   Franklin Simo, Kazuho Oku, Tong, Ana Custura, Neal Cardwell, Marten
   Seemann, Matthias Hofstaetter and Joerg Deutschmann for their
   fruitful comments in developing this specification.

   The authors would like to thank Tom Jones for co-authoring several
   previous versions of this document.

6.  IANA Considerations

   No current parameters are required to be registered by IANA.

7.  Security Considerations

   This document does not exhibit specific security considerations.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
              March 2017, <https://www.rfc-editor.org/info/rfc8085>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC9000]  Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport", RFC 9000,
              DOI 10.17487/RFC9000, May 2021,
              <https://www.rfc-editor.org/info/rfc9000>.

8.2.  Informative References

   [CONEXT15] Li, Q., Dong, M., and P. B. Godfrey, "Halfback: Running
              Short Flows Quickly and Safely", ACM CoNEXT, 2015.

   [I-D.custura-tsvwg-careful-resume-qlog]
              Custura, A. and G. Fairhurst, "Quic Logging for
              Convergence of Congestion Control from Retained State",
              Work in Progress, Internet-Draft, draft-custura-tsvwg-
              careful-resume-qlog-01, 28 February 2025,
              <https://datatracker.ietf.org/api/v1/doc/document/draft-
              custura-tsvwg-careful-resume-qlog/>.

   [I-D.hughes-restart]
              Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
              Slow-Start Restart After Idle", Work in Progress,
              Internet-Draft, draft-hughes-restart-00, December 2001,
              <https://www.ietf.org/archive/id/draft-hughes-restart-
              00.txt>.

   [I-D.ietf-ccwg-bbr]
              Cardwell, N., Swett, I., and J. Beshay, "BBR Congestion
              Control", Work in Progress, Internet-Draft, draft-ietf-
              ccwg-bbr-01, 21 October 2024,
              <https://datatracker.ietf.org/doc/html/draft-ietf-ccwg-
              bbr-01>.

   [I-D.irtf-iccrg-sallantin-initial-spreading]
              Sallantin, R., Baudoin, C., Arnal, F., Dubois, E., Chaput,
              E., and A. Beylot, "Safe increase of the TCP's Initial
              Window Using Initial Spreading", Work in Progress,
              Internet-Draft, draft-irtf-iccrg-sallantin-initial-
              spreading-00, 15 January 2014,
              <https://datatracker.ietf.org/doc/html/draft-irtf-iccrg-
              sallantin-initial-spreading-00>.

   [IJSCN]    Thomas, L., Dubois, E., Kuhn, N., and E. Lochin, "Google
              QUIC performance over a public SATCOM access",
              International Journal of Satellite Communications and
              Networking 10.1002/sat.1301, 2019.

   [MAPRG111] Kuhn, N., Stephan, E., Fairhurst, G., Jones, T., and C.
              Huitema, "Feedback from using QUIC's 0-RTT-BDP extension
              over SATCOM public access", IETF 111 - MAPRG meeting,
              2022.

   [RFC3465]  Allman, M., "TCP Congestion Control with Appropriate Byte
              Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February
              2003, <https://www.rfc-editor.org/info/rfc3465>.

   [RFC4782]  Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick-
              Start for TCP and IP", RFC 4782, DOI 10.17487/RFC4782,
              January 2007, <https://www.rfc-editor.org/info/rfc4782>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <https://www.rfc-editor.org/info/rfc5681>.

   [RFC5783]  Welzl, M. and W. Eddy, "Congestion Control in the RFC
              Series", RFC 5783, DOI 10.17487/RFC5783, February 2010,
              <https://www.rfc-editor.org/info/rfc5783>.

   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
              and Y. Nishida, "A Conservative Loss Recovery Algorithm
              Based on Selective Acknowledgment (SACK) for TCP",
              RFC 6675, DOI 10.17487/RFC6675, August 2012,
              <https://www.rfc-editor.org/info/rfc6675>.

   [RFC6928]  Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis,
              "Increasing TCP's Initial Window", RFC 6928,
              DOI 10.17487/RFC6928, April 2013,
              <https://www.rfc-editor.org/info/rfc6928>.

   [RFC6937]  Mathis, M., Dukkipati, N., and Y. Cheng, "Proportional
              Rate Reduction for TCP", RFC 6937, DOI 10.17487/RFC6937,
              May 2013, <https://www.rfc-editor.org/info/rfc6937>.

   [RFC7661]  Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating
              TCP to Support Rate-Limited Traffic", RFC 7661,
              DOI 10.17487/RFC7661, October 2015,
              <https://www.rfc-editor.org/info/rfc7661>.

   [RFC8867]  Sarker, Z., Singh, V., Zhu, X., and M. Ramalho, "Test
              Cases for Evaluating Congestion Control for Interactive
              Real-Time Media", RFC 8867, DOI 10.17487/RFC8867, January
              2021, <https://www.rfc-editor.org/info/rfc8867>.

   [RFC9002]  Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection
              and Congestion Control", RFC 9002, DOI 10.17487/RFC9002,
              May 2021, <https://www.rfc-editor.org/info/rfc9002>.

   [RFC9040]  Touch, J., Welzl, M., and S. Islam, "TCP Control Block
              Interdependence", RFC 9040, DOI 10.17487/RFC9040, July
              2021, <https://www.rfc-editor.org/info/rfc9040>.

   [RFC9406]  Balasubramanian, P., Huang, Y., and M. Olson, "HyStart++:
              Modified Slow Start for TCP", RFC 9406,
              DOI 10.17487/RFC9406, May 2023,
              <https://www.rfc-editor.org/info/rfc9406>.

   [RFC9438]  Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed.,
              "CUBIC for Fast and Long-Distance Networks", RFC 9438,
              DOI 10.17487/RFC9438, August 2023,
              <https://www.rfc-editor.org/info/rfc9438>.

Appendix A.  Notes on the Careful Resume Phases

   The table below is provided to illustrate the operation of Careful
   Resume.  This table is informative; please refer to the body of the
   document for the normative specification.  The description is based
   on a Normal CC that uses Reno.  The PipeSize tracks the validated
   CWND.

   +------+---------+---------+------------+-----------+------------+
   |Phase |Normal   |Recon.   |Unvalidated |Validating |Safe Retreat|
   +------+---------+---------+------------+-----------+------------+
   |      |Observing|Confirm  |Send faster |Validate   |Drain path; |
   |      |CC params|path     |using       |new CWND;  |Update PS   |
   |      |         |         |saved_cwnd  |Update PS  |            |
   +------+---------+---------+------------+-----------+------------+
   |On    |    -    |CWND=IW  |PS=FS       |If (FS>PS) |CWND=(PS/2) |
   |entry:|         |         |jump_cwnd   |{CWND=FS}  |            |
   |      |         |         |=saved_cwnd |else       |            |
   |      |         |         |/2;         |{CWND=PS;  |            |
   |      |         |         |CWND        |enter      |            |
   |      |         |         |=jump_cwnd  |Normal}    |            |
   +------+---------+---------+------------+-----------+------------+
   |CWND: |When in  |CWND     |CWND is not |CWND can   |CWND is not |
   |      |observe, |increases|increased   |increase   |increased   |
   |      |measure  |using SS |            |using      |            |
   |      |saved    |         |            |normal CC  |            |
   |      |_cwnd    |         |            |           |            |
   +------+---------+---------+------------+-----------+------------+
   |PS:   |    -    |    -    |              PS+=ACked              |
   +------+---------+---------+------------+-----------+------------+
   |RTT:  |Measure  |Measure  |      -     |     -     |      -     |
   |      |saved_rtt|current  |            |           |            |
   |      |         |_rtt     |            |           |            |
   +------+---------+---------+------------+-----------+------------+
   |If    |Normal   |Normal   |          Enter         |      -     |
   |loss  |CC       |CC;      |          Safe          |            |
   |or    |         |CR is not|          Retreat       |            |
   |ECNCE:|         |allowed  |                        |            |
   +------+---------+---------+------------+-----------+------------+
   |Next  |Observing|If (     |If (FS=CWND |If (ACK    |If (ACK     |
   |Phase:|(as      |FS=CWND, |or >1 RTT   |>= last    |>= last     |
   |      |needed)  |Lifetime,|has passed  |unvalidated|unvalidated |
   |      |         |and RTT  |or ACK      |packet),   |packet),    |
   |      |         |confirmed|>= first    |enter      |ssthresh =  |
   |      |         |), enter |unvalidated |Normal     |PS x Beta;  |
   |      |         |Unvalidat|packet),    |           |and enter   |
   |      |         |ed else  |enter       |           |Normal      |
   |      |         |enter    |Validating  |           |            |
   |      |         |Normal   |            |           |            |
   +------+---------+---------+------------+-----------+------------+

         Figure 2: Illustration of the operation of Careful Resume

   The following abbreviations are used: SS = Slow-Start; FS =
   flight_size; PS = PipeSize; ACK = highest acknowledged packet.  The
   PipeSize tracks the validated portion of the CWND.  It is set to the
   CWND on entry to the Unvalidated Phase and is updated as each
   additional packet is acknowledged.  The default value of Beta is 0.5.

   Note: For an implementation that keeps track of transmitted data in
   terms of packets: In the Unvalidated Phase, the first unvalidated
   packet corresponds to the highest sent packet recorded on entry to
   this phase.  In the Validating Phase and Safe Retreat Phase, the
   sender tracks the last unvalidated packet (this is also the highest
   sent packet number recorded on entry to this phase).
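
   The "Next Phase" and "If loss or ECNCE" rows of Figure 2 can be read
   as a small state machine.  The Python sketch below is one informative
   reading of the table, not a normative definition.  The Phase
   enumeration and the boolean argument names are hypothetical.  It
   assumes that the sender stays in the Reconnaissance Phase until it
   is CWND-limited and that path_confirmed covers both the Lifetime and
   the RTT checks.  The immediate move to the Normal Phase on entry to
   the Validating Phase (when FS <= PS) is not modelled.

```python
from enum import Enum, auto


class Phase(Enum):
    NORMAL = auto()
    RECONNAISSANCE = auto()
    UNVALIDATED = auto()
    VALIDATING = auto()
    SAFE_RETREAT = auto()


def next_phase(phase, *, congestion=False, cwnd_limited=False,
               path_confirmed=False, rtt_elapsed=False,
               first_unvalidated_acked=False,
               last_unvalidated_acked=False):
    """Return the phase that follows `phase` for the given events."""
    if phase is Phase.RECONNAISSANCE:
        if congestion:
            return Phase.NORMAL          # CR is not allowed
        if cwnd_limited:                 # FS = CWND
            return (Phase.UNVALIDATED if path_confirmed
                    else Phase.NORMAL)
        return phase
    if phase is Phase.UNVALIDATED:
        if congestion:
            return Phase.SAFE_RETREAT
        if cwnd_limited or rtt_elapsed or first_unvalidated_acked:
            return Phase.VALIDATING
        return phase
    if phase is Phase.VALIDATING:
        if congestion:
            return Phase.SAFE_RETREAT
        if last_unvalidated_acked:
            return Phase.NORMAL
        return phase
    if phase is Phase.SAFE_RETREAT:
        # ssthresh = PS x Beta is set by the caller on this transition.
        return Phase.NORMAL if last_unvalidated_acked else phase
    return phase                         # Normal Phase: normal CC
```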

   The remaining subsections provide informative examples of use.

   Note: To simplify the description, these examples are described using
   packet numbers (whereas QLOG variables are expressed in bytes).

A.1.  Example with No Loss

   In the first example of using Careful Resume, the sender starts by
   sending IW packets, assumed to be 10 packets, in the Reconnaissance
   Phase, and then continues in a subsequent RTT to send more packets
   until the sender becomes CWND-limited (i.e., flight_size = CWND).

   The sender in the Reconnaissance Phase then confirms the RTT and
   other conditions for using Careful Resume.  In this example, this is
   confirmed when the sender has 29 packets in flight.

   The sender then enters the Unvalidated Phase.  (This path
   confirmation could have happened earlier if data had been available
   to send.)  The sender initialises the PipeSize to the flight_size (in
   this case, 29 packets) and then sets the CWND to 150 packets (based
   upon half of the previously observed saved_cwnd of 300 packets).

   The sender now sends 121 unvalidated packets (the unused portion of
   the current CWND).  Each time a packet is sent, the sender checks
   whether 1 RTT has passed since entering the Unvalidated Phase (if so,
   the Validating Phase is entered).  This check triggers only when a
   sender is rate-limited; see the following example.
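
   The arithmetic for entering the Unvalidated Phase in this example
   can be sketched as follows.  This is a minimal, non-normative
   illustration in Python; the helper name enter_unvalidated and its
   return values are hypothetical, and all quantities are in packets, as
   in the example.

```python
def enter_unvalidated(flight_size, saved_cwnd):
    """Return (pipe_size, cwnd, unvalidated_budget), all in packets.

    Hypothetical helper, for illustration only.
    """
    pipe_size = flight_size          # validated portion of the CWND on entry
    cwnd = saved_cwnd // 2           # jump to half of the saved CWND
    # Unused portion of the CWND that can be sent as unvalidated packets.
    unvalidated_budget = max(cwnd - flight_size, 0)
    return pipe_size, cwnd, unvalidated_budget


pipe_size, cwnd, budget = enter_unvalidated(flight_size=29, saved_cwnd=300)
print(pipe_size, cwnd, budget)  # prints: 29 150 121
```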

   The PipeSize increases after each ACK is received.

   When the first unvalidated packet (packet number 30) is acknowledged,
   the sender enters the Validating Phase.  (This transition would also
   occur if the flight_size increased to equal CWND.)  During this
   phase, the CWND can be increased for each ACK that acknowledges an
   unvalidated packet, because this indicates that the packet was
   validated.

   When an ACK is received for the last packet that was sent in the
   Unvalidated Phase, the sender has completed using Careful Resume.  It
   then enters the Normal Phase.  For example, if the CWND is less than
   ssthresh, a Reno or Cubic sender in the Normal Phase is permitted to
   use Slow-Start to grow the CWND towards the ssthresh, and will then
   enter congestion avoidance.
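
   The phase transitions in this example can be replayed with a short
   sketch.  It assumes, for illustration only, one in-order ACK per
   unvalidated packet and counts PipeSize in packets; the function
   walk_no_loss is hypothetical and does not model CWND growth.

```python
def walk_no_loss(flight_size=29, cwnd=150):
    """Replay the no-loss example; return (pipe_size, transitions)."""
    first_unvalidated = flight_size + 1   # packet number 30
    last_unvalidated = cwnd               # packet number 150
    pipe_size = flight_size
    phase = "Unvalidated"
    transitions = []
    # Assumption: one in-order ACK per unvalidated packet.
    for acked in range(first_unvalidated, last_unvalidated + 1):
        pipe_size += 1                    # PipeSize grows with each ACK
        if phase == "Unvalidated":
            phase = "Validating"          # first unvalidated packet ACKed
            transitions.append((acked, phase))
        if phase == "Validating" and acked == last_unvalidated:
            phase = "Normal"              # last unvalidated packet ACKed
            transitions.append((acked, phase))
    return pipe_size, transitions


pipe_size, transitions = walk_no_loss()
print(pipe_size)     # prints: 150
print(transitions)   # prints: [(30, 'Validating'), (150, 'Normal')]
```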

A.2.  Example with No Loss, Rate-Limited

   A rate-limited sender will not fully utilise the available CWND when
   using Careful Resume, and CWND is therefore reset on entry to the
   Validating Phase, as described below.

   The sender starts by sending up to IW packets (10) in the
   Reconnaissance Phase.  It commences as described in the first
   example, transitioning to the Unvalidated Phase, where the CWND is
   set to 150 packets, and the PipeSize is set to the flight_size (i.e.,
   29 packets).

   In this example, the sender then becomes rate-limited: it sends only
   50 unvalidated packets.

   After about one RTT (detected, e.g., by comparing the current time
   with local timestamps for each sent packet or by receiving an ACK for
   the first unvalidated packet), the sender will still not have fully
   used the CWND.  It then enters the Validating Phase and resets the
   CWND to the current flight_size (i.e., 50 packets).  During this
   phase, the CWND can be increased for each received ACK that validates
   reception of an unvalidated packet.  The PipeSize also increases with
   each ACK received, to reflect the discovered capacity.

   The sender completes its use of Careful Resume when an ACK is
   received for the last packet that was sent in the Unvalidated Phase.
   It then enters the Normal Phase, as in the previous example with no
   loss.
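
   The decision to leave the Unvalidated Phase, and the CWND reset for a
   rate-limited sender, can be sketched as follows.  The function
   leave_unvalidated and its parameters are hypothetical; times are in
   seconds and window sizes are in packets.

```python
def leave_unvalidated(elapsed, rtt, first_unvalidated_acked,
                      flight_size, cwnd):
    """Return (enter_validating, new_cwnd). Illustrative only."""
    cwnd_limited = flight_size >= cwnd
    rtt_elapsed = elapsed >= rtt
    if not (first_unvalidated_acked or cwnd_limited or rtt_elapsed):
        return False, cwnd        # remain in the Unvalidated Phase
    # A rate-limited sender has not used the whole CWND, so the CWND is
    # reset to the amount of data actually in flight.
    return True, min(cwnd, flight_size)


# Rate-limited case from this example: one RTT has passed, 50 in flight.
print(leave_unvalidated(0.6, 0.6, False, 50, 150))   # prints: (True, 50)
# Less than one RTT has passed and nothing has been validated yet.
print(leave_unvalidated(0.1, 0.6, False, 50, 150))   # prints: (False, 150)
```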

A.3.  Example with Loss detected in the Reconnaissance Phase

   When a packet is lost in the Reconnaissance Phase, the sender will
   enter the Normal Phase and recover the loss using the normal method.
   (The sender is considered to have discovered a potential capacity
   limit and is not allowed to continue to use Careful Resume.
   Therefore, there is no change to the CC method and the CWND is the
   same as if Careful Resume had not been attempted.)

A.4.  Example with Loss detected in the Validating Phase

   As in the first example, the sender enters the Unvalidated Phase with
   a CWND of 150 packets and with the PipeSize initialised to the
   flight_size (i.e., 29 packets).

   The sender now sends 121 unvalidated packets (consuming the remaining
   unused CWND).  This example considers the case when one of the
   unvalidated packets is lost.  The example assumes that the lost
   packet is packet number 64 (the 35th packet sent in the Unvalidated
   Phase).

   ACKs confirm reception of the first 34 unvalidated packets.  The
   PipeSize at this time is equal to 63 (29 + 34) packets.

   A loss is then detected (by a timer or by receiving three ACKs that
   do not cover packet number 64).  The sender then enters the Safe
   Retreat Phase because the CWND was not validated.  The PipeSize at
   this point is equal to 66 (29 + 34 + 3) packets, where the final 3
   are the packets acknowledged by the three ACKs that revealed the
   loss.  Assuming that the IW was 10 packets, the CWND is reset to
   Max(IW, PipeSize/2) = Max(10, 66/2) = 33 packets.  This CWND is used
   during the Safe Retreat Phase, because congestion was detected and
   the sender does not yet know if the remaining unvalidated packets
   will be successfully acknowledged.  This conservative CWND
   calculation ensures the sender drains the path after this potentially
   severe congestion event.  There is no further increase in CWND in
   this phase.
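
   The CWND reset on entry to the Safe Retreat Phase in this example is
   a single bounded halving.  A minimal sketch, using the PipeSize of 66
   packets and the IW of 10 packets stated above (the function name
   safe_retreat_cwnd is hypothetical):

```python
def safe_retreat_cwnd(pipe_size, iw=10):
    """CWND (in packets) on entry to Safe Retreat: Max(IW, PipeSize/2)."""
    return max(iw, pipe_size // 2)


print(safe_retreat_cwnd(66))   # prints: 33
print(safe_retreat_cwnd(12))   # prints: 10 (the IW acts as a floor)
```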

   The sender continues to receive ACKs for the remaining 86 (121-35)
   unvalidated packets.  Recall that the 35th unvalidated packet was
   lost and had packet number 64 (29+35).  The PipeSize tracks the
   capacity discovered by acknowledgments for the unvalidated packets
   (i.e., the PipeSize is increased for each received ACK that
   acknowledges new data).  Although this PipeSize cannot be used to
   safely initialise the CWND (because it was measured when the sender
   had aggressively created overload), the estimated PipeSize (which, in
   this case, is 121-1 = 120 packets) can be used to set the ssthresh on
   exit from Safe Retreat, since it does indicate a measured upper limit
   to the current capacity.

   At the point where all packets sent in the Unvalidated Phase have
   been either acknowledged or have been declared lost, the sender
   updates ssthresh and enters the Normal Phase.  Because CWND will now
   be less than ssthresh, a sender in the Normal Phase is permitted to
   use Slow-Start to grow the CWND towards the ssthresh, after which it
   will enter congestion avoidance.
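
   The exit from Safe Retreat can be illustrated with the numbers from
   this example.  The sketch below assumes that ssthresh is set to
   PipeSize * Beta with the default Beta of 0.5; this is an assumption
   made for illustration, and the normative rule is the one given in the
   main body of this document.  The function name exit_safe_retreat is
   hypothetical.

```python
BETA = 0.5  # default value of Beta stated in this appendix


def exit_safe_retreat(pipe_size, cwnd, beta=BETA):
    """Return (ssthresh, next_mode); sizes are in packets."""
    # Assumption: ssthresh = PipeSize * Beta on exit from Safe Retreat.
    ssthresh = int(pipe_size * beta)
    next_mode = "slow-start" if cwnd < ssthresh else "congestion-avoidance"
    return ssthresh, next_mode


# PipeSize estimate of 120 packets and Safe Retreat CWND of 33 packets.
print(exit_safe_retreat(pipe_size=120, cwnd=33))  # prints: (60, 'slow-start')
```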

Appendix B.  Implementation Notes for using BBR

   Bottleneck Bandwidth and Round-trip propagation time (BBR) uses
   recent measurements of a transport connection's delivery rate, round-
   trip time, and packet loss rate to build an explicit model of the
   network path.  BBR then uses this model to control both how fast it
   sends data and the maximum volume of data it allows in flight in the
   network at any time [I-D.ietf-ccwg-bbr].

   When the flow is controlled using BBR, Careful Resume is implemented
   by setting the pacing rate from the saved CC parameters, with the
   following precautions:

   *  The flag "carefully-resuming" is added to the BBR state to
      indicate that the sender is allowed to send unvalidated packets.
      This is initialised to "False" when the BBR flow starts.

   *  The prerequisites for using Careful Resume, described in
      Section 3.2, still apply.

B.1.  Sending unvalidated packets using BBR

   Careful Resume is allowed to transmit unvalidated packets only when
   the BBR flow is in the Startup state.

   The probing rate is configured to 1/2 of the bottleneck bandwidth,
   derived from the CWND calculation specified in the saved CC
   parameters according to the requirements in Section 3.3.

   The sender starts the Unvalidated Phase at the beginning of a BBR
   round, and sets the "carefully-resuming" flag to "True".  When this
   "carefully-resuming" flag is set, the BBR congestion controller sets
   the BBR pacing rate to the larger of the nominal pacing rate (BBR.bw
   multiplied by BBRStartupPacingGain) or the calculated probing rate.
   Then, CWND is set to the larger of BBR.bw and the probing rate,
   multiplied by BBR.rtt_min times BBRStartupCwndGain.
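
   The pacing rate and CWND calculation described above can be sketched
   as follows.  This is an illustration only: it assumes that the saved
   bottleneck bandwidth is estimated as saved_cwnd / saved_rtt, and the
   gain values in the usage example are placeholders rather than the
   constants defined by BBR.  The function and parameter names are
   hypothetical; rates are in bytes per second and CWND is in bytes.

```python
def careful_resume_bbr_targets(bbr_bw, rtt_min, saved_cwnd, saved_rtt,
                               startup_pacing_gain, startup_cwnd_gain):
    """Return (pacing_rate, cwnd) while "carefully-resuming" is True."""
    # Assumption: saved bottleneck bandwidth derived from the saved CWND.
    saved_bw = saved_cwnd / saved_rtt
    probing_rate = saved_bw / 2               # 1/2 of the saved bandwidth
    nominal_pacing_rate = bbr_bw * startup_pacing_gain
    pacing_rate = max(nominal_pacing_rate, probing_rate)
    cwnd = max(bbr_bw, probing_rate) * rtt_min * startup_cwnd_gain
    return pacing_rate, cwnd


pacing_rate, cwnd = careful_resume_bbr_targets(
    bbr_bw=500_000, rtt_min=0.5, saved_cwnd=3_000_000, saved_rtt=0.5,
    startup_pacing_gain=2.0, startup_cwnd_gain=2.0)  # placeholder gains
print(pacing_rate, cwnd)  # prints: 3000000.0 3000000.0
```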

   The "carefully-resuming" flag is reset to False two rounds after it
   is set, i.e., after all the packets sent in the first round of
   "carefully resuming" have been received and acknowledged by the peer.
   At that stage (after the capacity has been validated), the measured
   delivery rate is expected to reflect the probing rate.

   If congestion is experienced while the "carefully-resuming" flag is
   True, BBR exits the Startup state and enters the Drain state
   (implementing the Safe Retreat Phase).

B.2.  Validation for BBR

   When using BBR, the Validating Phase is realised using the BBR rules
   for exiting Startup.  Upon exiting Startup, the connection estimates
   that the measured delivery rate will reflect the flow's share of the
   actual bottleneck bandwidth.  If congestion is experienced (e.g.,
   packet losses were detected) while using Careful Resume (i.e., the
   "carefully-resuming" flag is True), BBR then exits the Startup state
   and enters the Drain state.

B.3.  Safe Retreat for BBR

   When using BBR, the Safe Retreat Phase is entered if the Drain state
   is entered while the "carefully-resuming" flag (see Appendix B) is
   still True, i.e., if fewer than two full rounds have elapsed after
   the sender entered the Unvalidated Phase.  The delivery rates
   measured in these conditions are tainted, because packets sent during
   the attempt are still queued at the bottleneck and may have "pushed
   out" competing traffic.  The delivery rates measured in Drain state
   MUST be discarded if the "carefully-resuming" flag is set to True.
   This flag is cleared upon exiting the Drain state.
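
   The handling of the "carefully-resuming" flag across Appendix B can
   be summarised in a small sketch.  The class CarefulResumeBbr and its
   method names are hypothetical and greatly simplify a real BBR
   implementation; only the flag lifetime and the discarding of tainted
   delivery rate samples are modelled.

```python
class CarefulResumeBbr:
    """Hypothetical tracker for the "carefully-resuming" flag."""

    def __init__(self):
        self.state = "Startup"
        self.carefully_resuming = False   # False when the flow starts
        self.rounds_since_set = 0
        self.bw_samples = []

    def start_unvalidated(self):
        if self.state == "Startup":       # only permitted in Startup
            self.carefully_resuming = True
            self.rounds_since_set = 0

    def on_round_end(self):
        if self.carefully_resuming and self.state == "Startup":
            self.rounds_since_set += 1
            if self.rounds_since_set >= 2:
                self.carefully_resuming = False   # capacity validated

    def on_congestion(self):
        if self.carefully_resuming:
            self.state = "Drain"          # implements Safe Retreat

    def on_delivery_rate_sample(self, rate):
        if self.state == "Drain" and self.carefully_resuming:
            return False                  # tainted sample: discard
        self.bw_samples.append(rate)
        return True

    def on_drain_exit(self):
        self.carefully_resuming = False   # cleared on leaving Drain
        self.state = "ProbeBW"


cc = CarefulResumeBbr()
cc.start_unvalidated()
cc.on_round_end()                 # one round elapsed, flag still True
cc.on_congestion()                # congestion within two rounds
print(cc.state, cc.on_delivery_rate_sample(1_000_000))  # prints: Drain False
cc.on_drain_exit()
print(cc.carefully_resuming)      # prints: False
```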

Appendix C.  Internet Draft Revision details

   Previous individual submissions were discussed in TSVWG and QUIC.

      WG -00 included clarifications and restructuring to form the 1st
      WG draft.

      WG -01 included review comments and suggestions from John Border,
      and follows the setting of the TSVWG milestone with an intended
      status of "Proposed Standard".

      WG -02 includes steps to complete the spec.  In particular,
      consideration of rate-limited senders; selection of reasoned
      parameters; specification of the Safe Retreat Phase; and
      improvements to the consistency throughout.  Added the Validating
      Phase.

      WG -03, explain entry to Validating Phase, editorial tidy.

      WG -04, update based on review comments from Kazuho Oku.

      WG-05, update based on review comments from Neal Cardwell.  WG
      feedback from IETF-118.  Reviewed the requirements v. guidelines;
      clarified that CC is not changed in recon., but the recon. info is
      used to steer the next phase; clarified saved_cwnd can be computed
      from ACK rate; use jump once; that real server platforms are
      complex.  Clarified lifetime for saved CC params.  Incorporates
      comments from Tong.

      WG-06, SR updated following Hackathon comments from Kazuho Oku,
      and rework of use of PipeSize.  Added an informative summary of
      actions, on suggestion by Tong.  Added examples based on text by
      Ana Custura.

      WG-07, Use "rate-limited" uniformly instead of application and
      data limited.

      Updated to exit early when the unvalidated CWND is not utilised,
      detected in tests by Q Misell.  Change pipe_size to be PipeSize.

      WG-08, Updated CDDL, and made the constraints on Observing into
      guidance: they say what makes sense, but do not need to be
      followed for conformance.  Updated table in the appendix to align
      with text.

      WG-09, Cleaning text to separate guidelines and specification and
      adjust wording to improve clarity based on questions received
      during implementation.

      WG-10, CH developed text to explain expected operation with BBR.
      This also fixed some typos introduced in previous edits.  Fix XML
      and fix CDDL bugs for submission.

      Changed the ssthresh value used after an exit of Safe Retreat to
      be (PipeSize/2).

      WG-11, JD fixed mistakes.  GF clarified text.  RS added that after
      SR, ssthresh ought to match the behaviour of Cubic/Reno.  Updated
      the text to allow for an implementation to update CWND ahead of
      entering the Unvalidated Phase, and to clarify that the
      Unvalidated Phase starts when the first unvalidated packet is
      actually sent.

      WG-12, JD, MH and GF clarified text, and checked for typos.

      WG-13, After WGLC including NC and MK's review comments.

      WG-14, moved qlog to a separate ID (LP), made relevant to other
      transports (MK).

      WG-15, fixes typos and aligns with new Rev of the Qlog spec.

Authors' Addresses

   Nicolas Kuhn
   Thales Alenia Space
   Email: nicolas.kuhn.ietf@gmail.com


   Emile Stephan
   Orange
   Email: emile.stephan@orange.com


   Godred Fairhurst
   University of Aberdeen
   Department of Engineering
   Fraser Noble Building
   Aberdeen
   AB24 3UE
   United Kingdom
   Email: gorry@erg.abdn.ac.uk


   Raffaello Secchi
   University of Aberdeen
   Department of Engineering
   Fraser Noble Building
   Aberdeen
   AB24 3UE
   United Kingdom
   Email: r.secchi@erg.abdn.ac.uk


   Christian Huitema
   Private Octopus Inc.
   Email: huitema@huitema.net
