<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>

<?xml-stylesheet type="text/xsl" href="rfc2629.xslt"?>

<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  category="std"
  docName="draft-song-rtgwg-falcon-00"
  ipr="trust200902"
  obsoletes=""
  updates=""
  submissionType="IETF"
  xml:lang="en"
  tocInclude="true"
  tocDepth="4"
  symRefs="true"
  sortRefs="true"
  version="3">

  <!-- ============================================================ -->
  <!--  FRONT MATTER                                                -->
  <!-- ============================================================ -->
  <front>
    <title abbrev="FALCON">
      Fast Latency and Congestion Notification
    </title>

    <seriesInfo name="Internet-Draft" value="draft-song-rtgwg-falcon-00"/>

    <!-- ── Authors ──────────────────────────────────────────────── -->
    <author fullname="Haoyu Song" initials="H." surname="Song" role="editor">
      <organization>Futurewei Technologies</organization>
      <address>
		<postal>
          <country>US</country>
        </postal>  
        <email>haoyu.song@futurewei.com</email>
      </address>
    </author>

	<author fullname="Ying Wan" initials="Y." surname="Wan">
      <organization>Southeast University</organization>
      <address>
		<postal>
          <country>CN</country>
        </postal>  
        <email>wy25@seu.edu.cn</email>
      </address>
    </author>
	
	<author fullname="Keyi Zhu" initials="K." surname="Zhu">
      <organization>Huawei Technologies</organization>
      <address>
		<postal>
          <country>CN</country>
        </postal>  
        <email>zhukeyi@huawei.com</email>
      </address>
    </author>
	
    <!-- ── Working Group / Area ─────────────────────────────────── -->
    <workgroup>Routing Area Working Group</workgroup>

    <!-- ── Keywords (for search indexing) ──────────────────────── -->
    <keyword>FANTEL</keyword>
    <keyword>IOAM</keyword>
    <keyword>SRv6</keyword>

    <!-- ── Abstract ─────────────────────────────────────────────── -->
    <abstract>
      <t>
        This document describes a standards-based method for fast latency and congestion notification.
        By combining in-network telemetry and source routing, it enables a source node to acquire a path's
        latency and congestion status in less than half of the baseline RTT.
        The more timely and accurate telemetry data allow the source node to apply more effective traffic steering and
        congestion control actions. The method is applicable to both wide area networks (WANs) and data center networks (DCNs),
        and can be realized with existing IETF standards.
      </t>
    </abstract>

  </front>

  <!-- ============================================================ -->
  <!--  MIDDLE (BODY)                                               -->
  <!-- ============================================================ -->
  <middle>

    <!-- ── Section 1: Introduction ─────────────────────────────── -->
    <section anchor="intro" numbered="true" toc="default">
      <name>Introduction</name>
      <t>
        Many congestion control (CC) and load balancing (LB) schemes rely on timely path congestion status 
		and/or the measurement of flow packet delay as the basis for path selection or rate limiting. 
		The problem and requirements are articulated in <xref target="I-D.dong-fantel-problem-statement" />.
      </t>
	  <t>
        However, all conventional methods, whether in-band (e.g., In-Network Telemetry or INT) or out-of-band (e.g., ping-mesh),
        can only return network status or measurements that are at least one RTT old. Unfortunately, the RTT grows with the degree of path congestion.
        The staleness is therefore aggravated as the path becomes more congested, which is exactly when real-time network conditions
        are needed the most. For example, suppose a packet experiences serious congestion on its forwarding path and its ECN bit is set.
        Because of the congestion, the signal takes longer than usual to be fed back to the source node.
        By the time the source node reacts to the outdated signal, it might be too late.
        The problem is more severe in WANs, where the RTT can be hundreds of milliseconds.
        A belated action can be futile or even counterproductive.
	  </t>
      <t>
        Therefore, it is critical for the source node to know the most up-to-date network congestion status when deciding how to react.
        The root cause of the staleness is that the data is collected in the probe's forward direction.
        The lag between the time the data is collected by the probe and the time it is received by the source node is determined by
        the physical distance as well as the path congestion status.
      </t>
      <t>
        This document introduces a new method, FALCON (FAst Latency and COngestion Notification), to improve the freshness of the
        network congestion status sensed by the source node.
        Specifically, the latency and congestion data are collected on the reverse path toward the source node rather than on the packet forwarding path, so the notification lag
        is reduced to less than half of the baseline RTT. The method combines In-Network Telemetry (INT) and Source Routing (SR). A standards-compliant
        implementation can take advantage of IETF IOAM <xref target="RFC9197" /> and SRv6 <xref target="RFC8754" />.
        The basic approach is as follows: in the forward direction, the source node initiates INT to track the path by recording the
        nodes (and ports if necessary); the receiver node uses the INT-tracked path to generate the reverse path toward the source node, and
        uses SR plus INT to send the source node a packet that collects the latency and congestion status along the path.
      </t>
    </section>

    <!-- ── Section 2: Terminology ───────────────────────────────── -->
    <section anchor="terminology" numbered="true" toc="default">
      <name>Terminology</name>

      <section anchor="requirements-language" numbered="true" toc="default">
        <name>Requirements Language</name>
        <t>
          The key words <bcp14>MUST</bcp14>, <bcp14>MUST NOT</bcp14>,
          <bcp14>REQUIRED</bcp14>, <bcp14>SHALL</bcp14>,
          <bcp14>SHALL NOT</bcp14>, <bcp14>SHOULD</bcp14>,
          <bcp14>SHOULD NOT</bcp14>, <bcp14>RECOMMENDED</bcp14>,
          <bcp14>NOT RECOMMENDED</bcp14>, <bcp14>MAY</bcp14>, and
          <bcp14>OPTIONAL</bcp14> in this document are to be interpreted
          as described in BCP 14 <xref target="RFC2119"/>
          <xref target="RFC8174"/> when, and only when, they appear in
          all capitals, as shown here.
        </t>
      </section>

      <section anchor="definitions" numbered="true" toc="default">
        <name>Definitions</name>
        <t>This document uses the following terms:</t>
        <dl newline="true" spacing="normal">
          <dt>INT:</dt>
          <dd>In-Network Telemetry. A normal data packet or a dedicated probe packet carries an instruction header to collect
              data from the network nodes along the packet forwarding path. The collected data can be added to the packet.
              IOAM trace mode <xref target="RFC9197" /> is an example of INT.
		  </dd>

          <dt>SR:</dt>
          <dd>Source Routing. The sender node designates the forwarding path of a packet by listing the network nodes on the path.
			  SRv6 <xref target="RFC8754" /> realizes SR in IPv6.
		  </dd>
        </dl>
      </section>
    </section>

    <!-- ── Section 3: Problem Statement ────────────────────────── -->
    <section anchor="problem" numbered="true" toc="default">
      <name>Problem Statement</name>
      <t>
        A network node (source) needs to sense the current network status (e.g., path latency and congestion) in order to react in a timely manner for traffic
        toward another network node (destination).
        Physics sets a lower bound on the data lag, determined by the physical distance between the source and
        the destination. In practice, the minimum lag also includes the basic forwarding delay (excluding queuing delay) at each network node
        on the path. This minimum lag is exactly half of the minimum RTT (i.e., the baseline RTT) and is usually less than half of the actually measured RTT because of queuing delay.
        This document aims to achieve this minimum lag for latency and congestion notifications.
      </t>
      <t>
        The solution described in this document <bcp14>MUST</bcp14>
        satisfy the following requirements:
      </t>
      <ul spacing="normal">
        <li>Accurate: The notification reflects the true status of the intended path.</li>
        <li>Timely: The notification freshness approaches the theoretical limit.</li>
        <li>Lightweight: The solution requires low packet/bandwidth overhead and low implementation complexity. </li>
        <li>Standards compliant: The solution can be implemented with existing protocols.</li>
      </ul>
    </section>

    <!-- ── Section 4: Solution Overview ────────────────────────── -->
    <section anchor="solution" numbered="true" toc="default">
      <name>Solution Overview</name>
	  
	  <t> FALCON meets all the requirements listed above.</t>
	  
      <t>
        In the forward direction from a sender node S to a receiver node R,
        an INT header is added to a packet P (which can be either a normal packet of a flow F or a dedicated probing packet).
        The header instructs the packet to record, in order, the sequence of routers or switches it traverses
        (e.g., S_1, S_2, ..., S_n). When R receives P, it constructs a response packet P’. An SR header is added to P’
        that dictates the path of P’ as S_n -> ... -> S_2 -> S_1 -> S (i.e., the reverse of the path taken by P).
        An INT instruction is also added to P’ so that it collects the congestion and/or delay information at each router or switch.
        P’ is given a high forwarding priority, so it experiences zero or negligible queuing delay and can be considered to
        experience only the baseline propagation delay (i.e., the minimum lag).
      </t>
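      <t>
        The path reversal performed by R can be sketched as follows. This is a minimal illustration only: the function name
        reverse_path and the string node labels are hypothetical, and a real implementation would operate on the identifiers
        or addresses recorded by INT.
      </t>
      <sourcecode type="python"><![CDATA[
```python
from typing import List


def reverse_path(forward_hops: List[str], source: str) -> List[str]:
    """Return the hop sequence for the response packet.

    forward_hops is the node list recorded by INT on the way from
    the sender to the receiver, in traversal order (S_1 ... S_n).
    The response visits the same nodes in reverse and ends at the
    sender.
    """
    return list(reversed(forward_hops)) + [source]


# Hypothetical three-hop path recorded in the forward direction.
print(reverse_path(["S_1", "S_2", "S_3"], "S"))
```
]]></sourcecode>
      <t>
        For the three-hop example, the response packet visits S_3, S_2, and S_1 before arriving at S.
      </t>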

	  <t>
        Specifically, if P’ reaches a switch S_i through its port r, then P left S_i through the same port r in the forward direction.
        The egress queue depth of port r can therefore be used to obtain a more up-to-date view of the congestion status and the queuing delay for the flows on
        this path.
	  </t>
	  
      <t>If the queue depth exceeds the ECN threshold, the ECN congestion bit can be set. As a result, S gets the ECN status with the
        minimum lag, which is shorter than half of the RTT.
        The queuing delay at S_i can be calculated by dividing the queue size in bytes by the egress port bandwidth (expressed in bytes per unit time).
        If P’ acquires and accumulates the queuing delay for each S_i on its path, then the queuing delay of the full path can be
        accurately approximated at S, and its staleness is also less than half of the RTT.
	  </t>
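      <t>
        The per-hop calculation can be illustrated with a short sketch. The function names, the unit choices (queue size in bytes,
        port bandwidth in bits per second), and the threshold value are assumptions made for this example and are not defined by
        this document.
      </t>
      <sourcecode type="python"><![CDATA[
```python
def queuing_delay_s(queue_bytes: int, port_bps: float) -> float:
    """Queuing delay in seconds at one egress port.

    The queue size is converted from bytes to bits because the
    port bandwidth is assumed to be given in bits per second.
    """
    if port_bps <= 0:
        raise ValueError("port bandwidth must be positive")
    return queue_bytes * 8 / port_bps


def ecn_marked(queue_bytes: int, ecn_threshold_bytes: int) -> bool:
    """True if the queue depth exceeds the ECN marking threshold."""
    return queue_bytes > ecn_threshold_bytes


# Example: 125,000 bytes queued on a 10 Gb/s port, with a
# hypothetical ECN threshold of 100,000 bytes.
delay = queuing_delay_s(125_000, 10e9)
marked = ecn_marked(125_000, 100_000)
print(f"{delay * 1e6:.1f} us, ECN={marked}")
```
]]></sourcecode>
      <t>
        In this example, 125,000 bytes correspond to 1,000,000 bits, which drain from a 10 Gb/s port in 100 microseconds.
      </t>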
	  
    </section>

    <!-- ── Section 5: Detail Description ───────────────────── -->
    <section anchor="spec" numbered="true" toc="default">
      <name>Detailed Description</name>
      <t>
        The path delay for P consists of two parts: the baseline propagation delay and the queuing delay.
        The queuing delay is a good indicator of the degree of congestion. It can be acquired through
        the back-tracing packet P’. The propagation delay can also be measured by P’. Since P’ suffers
        no queuing delay, the delay experienced by P’ is the propagation delay, which is identical to the propagation delay
        of P because the two paths are symmetric. Thus, the propagation delay of P’ can be added to the calculated queuing delay to
        obtain the full path delay for P.
      </t>
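      <t>
        As a small worked sketch of this sum, the function below combines the two components at S. The function and parameter
        names are hypothetical, and how the one-way delay of P’ is measured is outside the scope of the example.
      </t>
      <sourcecode type="python"><![CDATA[
```python
def full_path_delay_s(reverse_delay_s: float,
                      queuing_delay_s: float) -> float:
    """Estimate the forward path delay of P.

    reverse_delay_s: one-way delay observed for the response
        packet, taken as the baseline propagation delay because
        the response is assumed not to queue.
    queuing_delay_s: path queuing delay derived from the queue
        depths collected by the response packet.
    """
    if reverse_delay_s < 0 or queuing_delay_s < 0:
        raise ValueError("delays must be non-negative")
    return reverse_delay_s + queuing_delay_s


# Hypothetical values: 20 ms propagation, 150 us of queuing.
print(full_path_delay_s(0.020, 0.00015))
```
]]></sourcecode>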
	  
	  <t>
        If the switches/routers have sufficient computing capability, they can directly calculate the local queuing
        delay and add it to the delay accumulated at previous switches/routers. P’ then needs to carry only
        an accumulated queuing delay value, which reduces the packet overhead.
        When S receives P’, it can directly retrieve the path queuing delay.
        If the congestion control scheme needs only the information (e.g., queue length) of the bottleneck node, the switches/routers
        can also easily support it with a simple compare-and-swap operation.
	  </t>
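      <t>
        One way to picture the on-path update is the sketch below. The Telemetry fields, the nanosecond unit, and the choice of
        the largest queue depth as the bottleneck indicator are assumptions for illustration; this document does not define a
        data format for them.
      </t>
      <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Telemetry:
    """Hypothetical fields carried by the response packet."""
    acc_delay_ns: int = 0      # accumulated queuing delay
    max_queue_bytes: int = 0   # deepest queue seen so far


def update_at_hop(tel: Telemetry, queue_bytes: int,
                  port_bps: int) -> None:
    """Per-hop update a capable switch could apply in place."""
    # Local queuing delay in integer nanoseconds.
    tel.acc_delay_ns += queue_bytes * 8 * 10**9 // port_bps
    # Compare and replace: keep only the largest queue depth.
    if queue_bytes > tel.max_queue_bytes:
        tel.max_queue_bytes = queue_bytes


tel = Telemetry()
hops = [(125_000, 10 * 10**9),   # (queue bytes, port bit/s)
        (0, 10 * 10**9),
        (250_000, 40 * 10**9)]
for queue_bytes, port_bps in hops:
    update_at_hop(tel, queue_bytes, port_bps)
print(tel)
```
]]></sourcecode>
      <t>
        With these example hops, the packet arrives at S carrying 150,000 ns of accumulated queuing delay and a bottleneck
        queue depth of 250,000 bytes, regardless of the number of hops traversed.
      </t>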

	  <t>
        If the switches/routers have limited computing power, on-path accumulation and aggregation may be infeasible. In this case,
        P’ just needs to collect the queue size at each network node
        (or accumulate the sizes into a single value if all the links have the same bandwidth) and present the data to S,
        which calculates the actual queuing delay using its knowledge of the link bandwidth.
	  </t>
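      <t>
        The source-side calculation might look like the following sketch, which assumes that S knows the bandwidth of each link
        in bits per second. Both function names are hypothetical. The second function shows the shortcut available when all
        links share one bandwidth, so that a single summed queue size is sufficient.
      </t>
      <sourcecode type="python"><![CDATA[
```python
from typing import Sequence


def path_queuing_delay_s(queue_bytes: Sequence[int],
                         link_bps: Sequence[float]) -> float:
    """Sum per-hop delays from the raw queue sizes collected."""
    if len(queue_bytes) != len(link_bps):
        raise ValueError("one bandwidth value is needed per hop")
    return sum(q * 8 / b for q, b in zip(queue_bytes, link_bps))


def uniform_path_queuing_delay_s(total_queue_bytes: int,
                                 link_bps: float) -> float:
    """Shortcut when every link has the same bandwidth."""
    return total_queue_bytes * 8 / link_bps


# Hypothetical three-hop path with 10 Gb/s links.
queues = [125_000, 0, 62_500]
per_hop = path_queuing_delay_s(queues, [10e9] * 3)
summed = uniform_path_queuing_delay_s(sum(queues), 10e9)
print(per_hop, summed)
```
]]></sourcecode>
      <t>
        Both calls yield approximately 150 microseconds for this example, since summing the queue sizes first is equivalent
        when the divisor is the same at every hop.
      </t>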
	  
	</section>  
	
    <!-- 6 Implementation -->
    <section anchor="implementation" numbered="true" toc="default">
      <name>Implementation and Gap Analysis</name>
      <t>
        INT can use the IETF IOAM trace mode <xref target="RFC9197" /> to collect the node ID and possibly the egress interface.
        In an IPv6-based network, SR can use the SRv6 Segment Routing Header (SRH) <xref target="RFC8754" /> to backtrack the path.
	  </t>
	  
	  <t>
        The switches and routers on the path are addressed through IP addresses.
        Each L2 switch has only a single IP address, so it is enough for P to record that IP address and use it to construct the SRH for P’.
        In contrast, each port on a router has its own IP address. In this case, P records the egress port’s address on each router, which is used
        to construct the SRH for P’. In a managed network, it is also possible for P to record just the unique device ID; the receiver node R,
        when constructing the SRH, then translates each device ID to an IP address based on a pre-configured mapping table.
	  </t>
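      <t>
        The translation step at R can be sketched as below, assuming integer device IDs and a pre-configured dictionary that
        maps them to IPv6 addresses. The function name, the mapping, and the documentation-prefix addresses are hypothetical.
        The sketch relies on the SRH encoding in <xref target="RFC8754" />, in which Segment List[0] holds the last segment
        of the path.
      </t>
      <sourcecode type="python"><![CDATA[
```python
import ipaddress
from typing import Dict, List


def build_segment_list(recorded_ids: List[int],
                       id_to_addr: Dict[int, str],
                       source_addr: str) -> List[str]:
    """Build the SRH Segment List for the response packet.

    recorded_ids is in forward order (S_1 ... S_n). The response
    visits S_n ... S_1 and then the source. The list is returned
    in encoded order, so index 0 is the last segment (the source).
    """
    visit_order = [id_to_addr[i] for i in reversed(recorded_ids)]
    visit_order.append(source_addr)
    # Reject anything that is not a valid IPv6 address.
    for addr in visit_order:
        ipaddress.IPv6Address(addr)
    # Encoded order is the reverse of the visit order.
    return list(reversed(visit_order))


# Hypothetical mapping table configured on the receiver.
table = {1: "2001:db8::1", 2: "2001:db8::2", 3: "2001:db8::3"}
print(build_segment_list([1, 2, 3], table, "2001:db8::a"))
```
]]></sourcecode>
      <t>
        A device ID that is missing from the table raises a KeyError in this sketch; a deployment would need to decide how
        R handles such a case.
      </t>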
	  
      <t>A standard is needed to support IOAM encapsulation in SRv6 packets. A possible solution is given in <xref target="I-D.song-spring-siam"/>.
      </t>
	  
      <t>IOAM may need to be extended to support new data types (TBD).</t>
	  
	</section>  
    
    <!-- ── Section 7: Security Considerations ───────────────────── -->
    <section anchor="security" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>
        The security considerations for IOAM <xref target="RFC9197" /> and SRv6 <xref target="RFC8754" /> apply to implementations of this method.
      </t>
    </section>

    <!-- ── Section 8: IANA Considerations ──────────────────────── -->
    <section anchor="iana" numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>
        TBD.
      </t>
    </section>

  </middle>

  <!-- ============================================================ -->
  <!--  BACK MATTER                                                 -->
  <!-- ============================================================ -->
  <back>

    <!-- ── References ───────────────────────────────────────────── -->
    <references>
      <name>References</name>

      <!-- Normative References -->
      <references anchor="normative-refs">
        <name>Normative References</name>

		<?rfc include='reference.RFC.2119'?> 	
		<?rfc include='reference.RFC.9197'?>
		<?rfc include='reference.RFC.8754'?> 	
		<?rfc include='reference.RFC.8174'?> 		
		<?rfc include='reference.I-D.dong-fantel-problem-statement'?> 	

      </references><!-- end normative -->

      <!-- Informative References -->
      <references anchor="informative-refs">
        <name>Informative References</name>

		<?rfc include='reference.I-D.song-spring-siam'?>
		
      </references><!-- end informative -->

    </references><!-- end references -->

	
  </back>

</rfc>