| Internet-Draft | Fast CNP with Proxy | March 2026 |
| Min, et al. | Expires 16 September 2026 | [Page] |
This document describes the necessity and feasibility of introducing a proxy network node between the congested network node and the traffic sender. The proxy network node translates congestion notifications: the congested network node sends a congestion notification to the proxy network node in the format defined in this document, and the proxy network node translates it into a format known by the traffic sender and forwards the translated congestion notification to the traffic sender.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 16 September 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
[I-D.xiao-rtgwg-rocev2-fast-cnp] defines a congestion notification message used in Remote Direct Memory Access (RDMA) over Converged Ethernet version 2 (RoCEv2) networks. This kind of congestion notification message is called RoCEv2 Fast Congestion Notification Packet (Fast CNP), which can be sent by a congested network node to the traffic sender directly. The RoCEv2 Fast CNP extends the standard RoCEv2 CNP ([IBTA-SPEC]) consumed by the traffic sender supporting RoCEv2.¶
RoCEv2 has already been widely deployed. It runs the InfiniBand transport layer over UDP and IP on an Ethernet network, bringing many of the advantages of InfiniBand to Ethernet networks. For a traffic sender supporting RoCEv2, congestion control is important, so when congestion is detected the RoCEv2 CNP or RoCEv2 Fast CNP must be used to alert the traffic sender to slow down its sending rate. For a traffic sender not supporting RoCEv2, congestion control is still important, so when congestion is detected the corresponding congestion notification message supported by the sender must be used to alert the traffic sender to slow down its sending rate.¶
Because multiple different congestion notification messages exist for traffic senders, a congested network node that sends a congestion notification message to the traffic sender directly must first know what kind of congestion notification message each specific traffic sender supports. Besides that prerequisite, there are two more problems, as follows:¶
When the congested network node is a VPN Provider (P) router, it's difficult for the congested network node to send a congestion notification message to the traffic sender directly, because there are different routing domains for the VPN P router and VPN Customer Edge (CE) router.¶
When the traffic sender supports only standard RoCEv2 CNP, it's difficult for the congested network node to construct a standard RoCEv2 CNP (for details please refer to Section 3 of [I-D.xiao-rtgwg-rocev2-fast-cnp]).¶
A proxy network node between the congested network node and the traffic sender can help to resolve the problems described above, independently of the extensions proposed in [I-D.xiao-rtgwg-rocev2-fast-cnp]. Upon detecting congestion, the congested network node first sends a congestion notification message to a proxy network node. Based on the received congestion notification message, the proxy network node then notifies the traffic sender about the congestion using a congestion notification message (e.g., the standard RoCEv2 CNP) known by the traffic sender. Generally speaking, the proxy network node should be positioned as close to the traffic sender as possible (e.g., the leaf switch within a data center), and its selection is subject to at least three rules, as follows:¶
The selected proxy network node must know what kind of congestion notification message is supported by the traffic sender.¶
The selected proxy network node and the congested network node must be within the same routing domain.¶
For RoCEv2 networks where the traffic sender supports only standard RoCEv2 CNP, the selected proxy network node must be able to learn the mapping table between the Source Queue Pair and the Destination Queue Pair through data traffic, which means the selected proxy network node must be located where both the forward and the reverse data traffic traverse.¶
How to select a proxy network node for a specific traffic sender is deployment specific and beyond the scope of this document.¶
This document describes the necessity and feasibility of introducing a proxy network node between the congested network node and the traffic sender. Specifically, the problem statement is described in Sections 1 and 3, the format of the congestion notification message sent from the congested network node to the proxy network node is defined in Section 4, and the solution by which the congested network node learns the address mapping relationship between the proxy network node and the traffic sender is defined in Section 5.¶
ABR: Area Border Router¶
CNP: Congestion Notification Packet¶
DoS: Denial-of-Service¶
ECN: Explicit Congestion Notification¶
ELC: Entropy Label Capability¶
ELCv3: Entropy Label Characteristic¶
IBTA: InfiniBand Trade Association¶
PNC: Proxy Node Capability¶
RDMA: Remote Direct Memory Access¶
RoCEv2: RDMA over Converged Ethernet version 2¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
In the field of congestion control, at least three kinds of congestion notification mechanisms are already commonly referenced. This document introduces a fourth congestion notification mechanism called "Fast CNP with Proxy".¶
The first congestion notification mechanism is referred to as Classical Congestion Notification without Dedicated Packet, as shown in Figure 1.¶
              Congestion Notification by TCP Marking
    |<-------------------------------------------------------+
    |                                                        |
    |                   Congestion Notification by ECN Marking
    |                                          |------------>|
+--------+     +-------+     +-------+     +-------+     +--------+
|Traffic |====>|Network|====>|Network|====>|Network|====>|Traffic |
|Sender  |     |Node 1 |     |Node 2 |     |Node 3 |     |Receiver|
+--------+     +-------+     +-------+     +-------+     +--------+
                                          Congestion
                                             Point
With this congestion notification mechanism, the traffic sender indicates that it supports congestion notification from the traffic receiver by a specific Explicit Congestion Notification (ECN) marking within the IP header of the data packet, and the congested network node (Network Node 3 in Figure 1) notifies the traffic receiver about the congestion by a specific ECN marking. After receiving a data packet with the specific ECN marking, the traffic receiver notifies the traffic sender of the congestion by a specific TCP marking within the TCP header of the data packet. [RFC3168] details how this kind of congestion notification mechanism works.¶
The second congestion notification mechanism is referred to as Classical Congestion Notification with Dedicated Packet, as shown in Figure 2.¶
              Congestion Notification Packet Type 1
    |<-------------------------------------------------------+
    |                                                        |
    |                   Congestion Notification by ECN Marking
    |                                          |------------>|
+--------+     +-------+     +-------+     +-------+     +--------+
|Traffic |====>|Network|====>|Network|====>|Network|====>|Traffic |
|Sender  |     |Node 1 |     |Node 2 |     |Node 3 |     |Receiver|
+--------+     +-------+     +-------+     +-------+     +--------+
                                          Congestion
                                             Point
With this congestion notification mechanism, the traffic sender indicates that it supports congestion notification from the traffic receiver by a specific ECN marking within the IP header of the data packet, and the congested network node (Network Node 3 in Figure 2) notifies the traffic receiver about the congestion by a specific ECN marking. After receiving a data packet with the specific ECN marking, the traffic receiver notifies the traffic sender of the congestion by a dedicated congestion notification packet. [IBTA-SPEC] details an example of how this kind of congestion notification mechanism works.¶
The third congestion notification mechanism is referred to as Fast CNP without Proxy, as shown in Figure 3.¶
        Congestion Notification Packet Type 2
    |<-----------------------------------------+
    |                                          |
+--------+     +-------+     +-------+     +-------+     +--------+
|Traffic |====>|Network|====>|Network|====>|Network|====>|Traffic |
|Sender  |     |Node 1 |     |Node 2 |     |Node 3 |     |Receiver|
+--------+     +-------+     +-------+     +-------+     +--------+
                                          Congestion
                                             Point
With this congestion notification mechanism, the congested network node (Network Node 3 in Figure 3) notifies the traffic sender about the congestion directly by a dedicated congestion notification packet. [I-D.xiao-rtgwg-rocev2-fast-cnp] details an example of how this kind of congestion notification mechanism works.¶
The fourth congestion notification mechanism is referred to as Fast CNP with Proxy, as shown in Figure 4.¶
              Congestion Notification Packet Type 3
                   |<--------------------------+
                   |                           |
Congestion Notification Packet Type 4          |
    |<-------------+                           |
    |              |                           |
+--------+     +-------+     +-------+     +-------+     +--------+
|Traffic |====>|Network|====>|Network|====>|Network|====>|Traffic |
|Sender  |     |Node 1 |     |Node 2 |     |Node 3 |     |Receiver|
+--------+     +-------+     +-------+     +-------+     +--------+
              Congestion                  Congestion
             Notification                    Point
                 Proxy
With this congestion notification mechanism, the congested network node (Network Node 3 in Figure 4) notifies the proxy network node about the congestion by a dedicated congestion notification packet, and then, based on the received congestion notification packet, the proxy network node notifies the traffic sender about the congestion by a congestion notification message supported by the traffic sender. This document details how this kind of congestion notification mechanism works, except that the specific congestion notification message between the proxy network node and the traffic sender is beyond the scope of this document.¶
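The proxy behaviour described above can be sketched as a small dispatch step. This is only an illustration under stated assumptions: the notification is modelled as a dictionary, and the names `make_proxy`, `sender_formats`, and `translators` are hypothetical and not defined by this document. The actual message sent to the traffic sender remains out of scope.

```python
def make_proxy(sender_formats, translators):
    """Build a handler for notifications arriving at the proxy node.

    sender_formats: maps a traffic sender's IP address to the name of the
                    congestion notification format it supports (assumed to
                    be known to the proxy, e.g. by configuration).
    translators:    maps a format name to a callable that builds the message
                    for the traffic sender (hypothetical helpers).
    """
    def handle(notification):
        # The sender is identified by the source IP of the congested flow.
        sender = notification["src_ip"]
        fmt = sender_formats.get(sender)
        if fmt is None or fmt not in translators:
            return None  # unknown sender or unsupported format: nothing to send
        return sender, translators[fmt](notification)

    return handle
```

A caller would register one translator per supported format and send the returned message to the returned sender address.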
The congestion notification message sent from the congested network node to the proxy network node is a UDP message formatted as follows:¶
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        UDP Source Port        |  UDP Destination Port = TBD1  |
+-------------------------------+-------------------------------+
|          UDP Length           |         UDP Checksum          |
+-------------------------------+-------------------------------+
|                                                               |
~               IP Five-Tuple + Congestion Level                ~
|                                                               |
+---------------------------------------------------------------+
|          As much of the invoking packet as possible           |
+          without the UDP packet exceeding 576 bytes           +
|                 in IPv4 or 1280 bytes in IPv6                 |
UDP Header: The UDP header as specified in [RFC768] includes the UDP source port, UDP destination port, UDP length, and UDP checksum. A well-known UDP destination port (TBD1) needs to be allocated for this Congestion Notification Message.¶
IP Five-Tuple: The IP five-tuple as described in [RFC6438] includes the source IP address, destination IP address, protocol number, source port number, and destination port number. The IP five-tuple is data flow information copied from the data packets causing congestion, and it is used to identify the congested traffic flow whose sending rate needs to be reduced by the traffic sender. When the congested network node is a VPN P router, the IP five-tuple is carried below the VPN encapsulation. The source IP address within the IP five-tuple is the IP address of the traffic sender, so the receiving node (i.e., the proxy network node) can use it to identify the traffic sender to which it sends a second congestion notification message based on the received one. The format of the second congestion notification message, sent from the proxy network node to the traffic sender, is outside the scope of this document. Also note that RoCEv2 networks use more fine-grained data flow information, called the Queue Pair, to identify the congested traffic flow whose sending rate needs to be reduced by the traffic sender. In that case, the proxy network node can parse the Queue Pair from the invoking packet contained in the last part of the congestion notification UDP message. How the proxy network node parses the Queue Pair from the invoking packet is outside the scope of this document.¶
Congestion Level: This 3-bit field indicates the congestion level. Value 0 of this field represents the lowest congestion level and value 7 of this field represents the highest congestion level.¶
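As a rough illustration of the fields above, the following sketch assembles the UDP payload of the message. The byte layout of the five-tuple and congestion level is a hypothetical encoding chosen only for the example, since this text does not fix one. The size caps are assumed here to bound the whole IP datagram, with no IPv4 options or IPv6 extension headers. The function names are illustrative.

```python
import ipaddress
import struct

UDP_HEADER_LEN = 8
# Size caps taken from the message format; assumed to apply to the whole
# IP datagram (IP header + UDP header + payload).
MAX_DATAGRAM = {4: 576, 6: 1280}
IP_HEADER_LEN = {4: 20, 6: 40}  # assumes no options / extension headers


def encode_flow_info(src, dst, proto, sport, dport, level):
    """Hypothetical encoding of the IP five-tuple plus congestion level."""
    if not 0 <= level <= 7:
        raise ValueError("congestion level is a 3-bit field (0..7)")
    src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    if src_ip.version != dst_ip.version:
        raise ValueError("source and destination must share an address family")
    # Addresses, then protocol (1 octet), ports (2 octets each), and the
    # congestion level in the low 3 bits of a final octet (assumed layout).
    return (src_ip.packed + dst_ip.packed
            + struct.pack("!BHHB", proto, sport, dport, level))


def build_payload(flow_info, invoking_packet, outer_version):
    """Append as much of the invoking packet as fits under the size cap."""
    room = (MAX_DATAGRAM[outer_version] - IP_HEADER_LEN[outer_version]
            - UDP_HEADER_LEN - len(flow_info))
    return flow_info + invoking_packet[:max(room, 0)]
```

The truncation mirrors the "as much of the invoking packet as possible" rule: a large invoking packet is cut so the resulting datagram stays within 576 bytes for IPv4 or 1280 bytes for IPv6.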
Before the congested network node can send the congestion notification message to the proxy network node, the congested network node has to know about the IP address of the proxy network node and the IP addresses of the traffic senders behind the proxy network node.¶
The proxy network node notifies the congested network node of its IP address by advertising its proxy node capability in advance. Even though the Proxy Node Capability (PNC) is a property of the node, it is necessary to advertise the PNC with the prefixes attached to the traffic senders behind the proxy network node. That is, the proxy network node advertises the mapping relationship between its own IP address and the IP addresses of the traffic senders, which enables the congested network node to establish the address mapping table between the traffic senders and the proxy network node. Based on the established address mapping table, when congestion of the data traffic is detected, the congested network node looks up the source IP address of the data packet causing congestion and sends the congestion notification message to the proxy network node rather than to the traffic sender identified by that source IP address.¶
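The address mapping table described above can be sketched as a prefix table with a longest-prefix-match lookup. This is a minimal illustration, assuming the advertised prefixes and proxy addresses have already been extracted from the routing protocol. The class name `ProxyMap` and its methods are hypothetical.

```python
import ipaddress


class ProxyMap:
    """Maps advertised traffic-sender prefixes to a proxy node address."""

    def __init__(self):
        self._entries = []  # list of (network, proxy_address)

    def learn(self, prefix, proxy):
        """Record a prefix advertised with the PNC and the proxy's address."""
        self._entries.append(
            (ipaddress.ip_network(prefix), ipaddress.ip_address(proxy)))

    def notification_target(self, src_ip):
        """Return the proxy for the longest matching prefix, or None."""
        addr = ipaddress.ip_address(src_ip)
        matches = [(net, proxy) for net, proxy in self._entries
                   if net.version == addr.version and addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]
```

On congestion, the node would call `notification_target` with the source IP address of the packet causing congestion and address the notification to the returned proxy, if any.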
Analogous to the Entropy Label Capability (ELC) Flag (E-flag) defined in Section 3 of [RFC9088], a new bit PNC Flag (P-flag) is defined, which is Bit 7 in the Prefix Attribute Flags [RFC7794], as shown in Figure 6.¶
 0 1 2 3 4 5 6 7...
+-+-+-+-+-+-+-+-+...
|X|R|N|E|A|U|U|P|...
| | | | | | |P| |...
+-+-+-+-+-+-+-+-+...
P-Flag: PNC Flag (Bit 7)¶
Set for the local host prefix of the originating node if it's used as a congestion notification proxy node for the prefix.¶
The PNC signaling MUST be preserved when a router propagates a prefix between IS-IS levels [RFC5302].¶
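The bit numbering in the figure above counts from the most significant bit, so Bit 7 is the lowest-order bit of the first flags octet. A small sketch of setting and testing the P-flag follows, assuming a single flags octet and the suggested (not yet assigned) bit position. The helper names are illustrative.

```python
P_FLAG_BIT = 7  # suggested value in this document; not yet assigned by IANA


def flag_mask(bit, width=8):
    """Mask for a flag in MSB-first numbering (bit 0 is the leftmost bit)."""
    if not 0 <= bit < width:
        raise ValueError("bit outside the flags field")
    return 1 << (width - 1 - bit)


def set_p_flag(flags):
    """Return the flags octet with the P-flag set."""
    return flags | flag_mask(P_FLAG_BIT)


def has_p_flag(flags):
    """Report whether the P-flag is set in the flags octet."""
    return bool(flags & flag_mask(P_FLAG_BIT))
```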
Analogous to the ELC Flag (E-flag) defined in Section 3.1 of [RFC9089], a new bit PNC Flag (P-flag) is defined, which is Bit 2 in OSPFv2 Prefix Attribute Flags field [RFC9792], as shown in Figure 7.¶
 0 1 2 3 4...
+-+-+-+-+-+...
|U|U|P| | |...
| |P| | | |...
+-+-+-+-+-+...
P-Flag: PNC Flag (Bit 2)¶
Set for the local host prefix of the originating node if it's used as a congestion notification proxy node for the prefix.¶
The PNC signaling MUST be preserved when an OSPFv2 Area Border Router (ABR) distributes information between areas. To do so, an ABR MUST originate an OSPFv2 Extended Prefix Opaque LSA [RFC7684] including the received PNC setting.¶
Analogous to the ELC Flag (E-flag) defined in Section 3.2 of [RFC9089], a new bit PNC Flag (P-flag) is defined, which is Bit 2 in OSPFv3 Prefix Attribute Flags field [RFC9792], as shown in Figure 8.¶
 0 1 2 3 4...
+-+-+-+-+-+...
|U|U|P| | |...
| |P| | | |...
+-+-+-+-+-+...
P-Flag: PNC Flag (Bit 2)¶
Set for the local host prefix of the originating node if it's used as a congestion notification proxy node for the prefix.¶
The PNC signaling MUST be preserved when an OSPFv3 Area Border Router (ABR) distributes information between areas. The setting of the PNC Flag in the Inter-Area-Prefix-LSA [RFC5340] or in the Inter-Area-Prefix TLV [RFC8362], generated by an ABR, MUST be the same as the value of the PNC Flag associated with the prefix in the source area.¶
Analogous to the Entropy Label Characteristic (ELCv3) TLV defined in Section 2.1 of [I-D.ietf-idr-elc], a new PNC characteristic TLV is defined, which uses code value TBD2 in "BGP Next Hop Dependent Characteristic Codes" registry requested by [I-D.ietf-idr-nhc], as shown in Figure 9.¶
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Characteristic Code = TBD2   | Characteristic Length = 4/16  |
+-------------------------------+-------------------------------+
~             IP address of the Proxy Network Node              ~
+---------------------------------------------------------------+
PNC TLV: code TBD2, length 4 or 16, and carries the IP address (4 octets for IPv4 or 16 octets for IPv6) of the proxy network node.¶
Carried for the local host prefix of the originating node if it's used as a congestion notification proxy node for the prefix.¶
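The PNC TLV layout shown above (a 2-octet code, a 2-octet length, and a 4- or 16-octet address) can be sketched as follows. The code value TBD2 is not yet assigned, so the sketch takes it as a parameter. The function names are illustrative, and nothing here covers how the TLV is carried inside the enclosing BGP attribute.

```python
import ipaddress
import struct


def encode_pnc_tlv(code, proxy_ip):
    """Encode code, length, and the proxy node's IP address."""
    addr = ipaddress.ip_address(proxy_ip).packed  # 4 octets (IPv4) or 16 (IPv6)
    return struct.pack("!HH", code, len(addr)) + addr


def decode_pnc_tlv(data):
    """Return (code, proxy address), rejecting lengths other than 4 or 16."""
    if len(data) < 4:
        raise ValueError("truncated PNC TLV header")
    code, length = struct.unpack("!HH", data[:4])
    if length not in (4, 16) or len(data) < 4 + length:
        raise ValueError("malformed PNC TLV")
    return code, ipaddress.ip_address(data[4:4 + length])
```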
The congestion notification from the congested network node to the proxy network node MUST be applied only within a specific controlled domain. A limited administrative domain provides the network administrator with the means to select, monitor, and control access to the network, making it a trusted domain.¶
To avoid potential Denial-of-Service (DoS) attacks, it is RECOMMENDED that implementations apply rate-limiting policies when generating and receiving congestion notification messages.¶
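One common way to implement such a rate-limiting policy is a token bucket, sketched below. This document does not mandate any particular algorithm, so the technique, the class name, and the parameters `rate` and `burst` are assumptions made for illustration.

```python
import time


class TokenBucket:
    """Allow up to `burst` messages at once, refilled at `rate` per second."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self._clock = clock
        self._tokens = float(burst)
        self._last = clock()

    def allow(self):
        """Return True if one more notification may be sent or accepted."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(self.burst,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A node could keep one bucket per proxy (when generating) or per congested node (when receiving) and drop any notification for which `allow()` returns False.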
A deployment MUST ensure that border filtering drops inbound congestion notification messages arriving from outside the domain and outbound congestion notification messages leaving the domain.¶
A deployment MUST support the configuration option to enable or disable the congestion notification proxy feature defined in this document. By default, the congestion notification proxy feature MUST be disabled.¶
This document requests the following allocations from IANA:¶
- A well-known UDP port number TBD1 from the System Ports range of the "Service Name and Transport Protocol Port Number" registry [RFC6335] is requested to be assigned to the Congestion Notification to Proxy Message. Specifically, IANA is requested to assign a UDP port as shown below, for which the Assignee and Contact are the IESG and the IETF Chair, respectively.¶
| Service Name | Port Number | Transport Protocol | Description | Reference |
|---|---|---|---|---|
| Congestion Notification to Proxy | TBD1 | udp | Receiver Port for Fast CNP to Proxy | Section 4 of THIS_DOCUMENT |
- Bit 7 (suggested) in the "IS-IS Bit Values for Prefix Attribute Flags Sub-TLV" registry is requested to be assigned to the PNC Flag (P-Flag).¶
- Bit 2 (suggested) in the "OSPFv2 Prefix Extended Flags" registry is requested to be assigned to the PNC Flag (P-Flag).¶
- Bit 2 (suggested) in the "OSPFv3 Prefix Extended Flags" registry is requested to be assigned to the PNC Flag (P-Flag).¶
- Code value TBD2 in the "BGP Next Hop Dependent Characteristic Codes" registry (to be created based on the request from [I-D.ietf-idr-nhc]) is requested to be assigned to the PNC characteristic TLV.¶
Jeffrey Haas
HPE
Email: jeffrey.haas@hpe.com¶
The authors would like to acknowledge Jinghai Yu, Shaofu Peng, Liming Wu, Jin Yang, and Zheng Zhang for the very helpful discussions.¶