| Internet-Draft | FlowSpec by Destination-QP | July 2026 |
| Li, et al. | Expires 4 January 2027 | [Page] |
The BGP Flow Specification mechanism (BGP-FS) [RFC8955] [RFC8956] propagates both traffic Flow Specifications and Traffic Filtering Actions using the BGP NLRI and BGP Extended Community encoding formats. This document specifies a new BGP-FS component type named Destination-QP (Destination Queue Pair) to support filtering by Destination-QP.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 4 January 2027.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The BGP Flow Specification mechanism (BGP-FS) [RFC8955] [RFC8956] propagates both traffic Flow Specifications and Traffic Filtering Actions using the BGP NLRI and BGP Extended Community encoding formats.¶
In modern AI training clusters, especially those based on RoCEv2 and RDMA for high-performance inter-GPU communication, traffic differs significantly from traditional Internet flows. AI training communication typically consists of long-lived, high-throughput flows that are sensitive to both delay and jitter, such as those generated by collective communication operations like AllReduce. These flows are often bound to long-lived Queue Pair (QP) [IB-SPEC] instances with relatively stable five-tuple fields, which makes them prone to path polarization and uneven link utilization when they are scheduled solely by five-tuple-based hashing. Such static mapping cannot adapt to the dynamic communication patterns and strict performance requirements of AI workloads, leading to localized congestion, degraded training throughput, and reduced cluster efficiency.¶
For these reasons, five-tuple-only flow scheduling is no longer sufficient for AI-oriented lossless networks. Instead, scheduling mechanisms based on a combination of five-tuple and QP information have become necessary. By introducing QP-level identification into the flow distribution logic, networks can achieve finer-grained traffic steering, better load balancing across equal-cost multi-path (ECMP) groups, and improved isolation between communication streams.¶
As shown in Figure 1, the controller uses BGP Flow-Spec to distribute per-QP routes to the Ingress PE. Based on the QP value, traffic is redirected to different forwarding paths. Methods for redirecting traffic to different paths include redirection to IP [I-D.ietf-idr-flowspec-redirect-ip] and redirection to SRv6 [I-D.ietf-idr-srv6-flowspec-path-redirect], among others. Specific methods are beyond the scope of this document.¶
At the forwarding level, packets sent by the DC carry the QP associated with the task they belong to. The Ingress PE matches the QP of each packet against the QP routes distributed via BGP Flow-Spec and forwards the packet according to the matching route.¶
            +----------+
            |Controller|
            +----+-----+
                 |
                 |BGP Flow-Spec
                 |for Destination-QP
                 |
+----+       +---V----+       +-- Path1--+      +--------+
|DC  |       |Ingress |       |          |      |Egress  |
|    +-------+PE      +-------+          +------+PE      |
+----+       +--------+       +-- Path2--+      +--------+
This document specifies a new BGP-FS component type named Destination-QP (Destination Queue Pair) to support filtering by Destination-QP.¶
[I-D.ietf-idr-flowspec-v2] defines the Components in the IP Basic TLV. This document proposes a new Component for Destination-QP information.¶
The following new component type is defined:¶
Destination-QP¶
Type TBD - Destination-QP¶
Length: variable¶
Component Value format: [numeric_op, value]+¶
Each Destination-QP value is 4 octets.¶
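The value format above can be illustrated with a minimal Python sketch. It assumes the numeric operator octet layout defined in [RFC8955] (e, a, len, lt, gt, and eq bits) and encodes only the component value. The type code point is still TBD and the type/length framing comes from [I-D.ietf-idr-flowspec-v2], so both are deliberately left out. The function name `encode_destination_qp_value` is hypothetical, and only equality matches are shown.

```python
import struct

# numeric_op bits as laid out in RFC 8955 (most significant bit first).
OP_END = 0x80    # e: this is the last {op, value} pair in the list
OP_LEN_4 = 0x20  # len field = 2, meaning the value is 1 << 2 = 4 octets
OP_EQ = 0x01     # eq: match when the packet value equals this value


def encode_destination_qp_value(qps):
    """Encode [numeric_op, value]+ for equality matches on Destination-QP.

    Multiple values are ORed together because the AND bit is left clear.
    Assumption: each value is carried as a 4-octet unsigned integer in
    network byte order, per "Each Destination-QP value is 4 octets".
    """
    if not qps:
        raise ValueError("at least one Destination-QP value is required")
    out = bytearray()
    for index, qp in enumerate(qps):
        if not 0 <= qp <= 0xFFFFFFFF:
            raise ValueError(f"Destination-QP out of range: {qp}")
        op = OP_LEN_4 | OP_EQ
        if index == len(qps) - 1:
            op |= OP_END  # mark the final pair
        out.append(op)
        out += struct.pack("!I", qp)
    return bytes(out)


print(encode_destination_qp_value([1001]).hex())
```

Under these assumptions a single match on QP 1001 occupies five octets: one operator octet followed by the 4-octet value.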
Per Section 10 of [RFC8955], if a receiving BGP speaker cannot
support this new Flow Specification component type, it MUST discard
the NLRI value field that contains such unknown components. Since the
NLRI field encoding (Section 4 of [RFC8955]) is defined in the form
of a 2-tuple <length, NLRI value>, message decoding can skip over
the unknown NLRI value and continue with subsequent remaining NLRI.¶
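The skip-over behavior described above can be sketched as follows. The sketch assumes the NLRI length encoding of Section 4.1 of [RFC8955]: one octet for lengths below 240, otherwise two octets whose first nibble is all ones. `UnknownComponent`, `split_nlri`, and `accept_known` are hypothetical names, and `parse` stands in for an implementation's own component parser.

```python
class UnknownComponent(Exception):
    """Raised by a component parser that meets an unsupported type."""


def split_nlri(field):
    """Yield each NLRI value from a run of <length, NLRI value> tuples."""
    pos = 0
    while pos < len(field):
        length = field[pos]
        pos += 1
        if length >= 0xF0:
            # Extended length: low nibble of the first octet plus the
            # whole second octet form a 12-bit length.
            if pos >= len(field):
                raise ValueError("truncated extended length")
            length = ((length & 0x0F) << 8) | field[pos]
            pos += 1
        if pos + length > len(field):
            raise ValueError("truncated NLRI value")
        yield field[pos:pos + length]
        pos += length


def accept_known(field, parse):
    """Parse every NLRI value, discarding those with unknown components."""
    accepted = []
    for value in split_nlri(field):
        try:
            accepted.append(parse(value))
        except UnknownComponent:
            # The length prefix already told us where the next NLRI
            # starts, so decoding simply continues.
            continue
    return accepted
```

Because the length prefix delimits each NLRI value, a speaker that does not understand Destination-QP can drop the affected NLRI without losing synchronization with the rest of the message.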
The BGP agent specifies the traditional 5-tuple and the newly defined Destination-QP as matching criteria.¶
As shown in Figure 2, traffic with a Destination-QP value of 1001 is redirected to Path1, and traffic with a Destination-QP value of 2001 is redirected to Path2.¶
BGP-FS Route 1:¶
  FS Filters:¶
    Destination: 203.0.113.0/24¶
    Source address: 198.51.100.0/24¶
    Protocol: UDP¶
    Destination port: 4791 (RoCEv2 protocol)¶
    Source port: 10001¶
    Destination-QP value: 1001 (newly defined in this document)¶
  FS Action: Redirect Flow to Path1 (The specific format is not discussed in this document.)¶
BGP-FS Route 2:¶
  FS Filters:¶
    Destination: 203.0.113.0/24¶
    Source address: 198.51.100.0/24¶
    Protocol: UDP¶
    Destination port: 4791 (RoCEv2 protocol)¶
    Source port: 10001¶
    Destination-QP value: 2001 (newly defined in this document)¶
  FS Action: Redirect Flow to Path2 (The specific format is not discussed in this document.)¶
+----------+
|Controller|
+----------+
     |
     |BGP FS:
     |NLRI Filter
     |  Destination Prefix
     |  Source Prefix
     |  IP Protocol
     |  Destination Port
     |  Source Port
     |  Destination-QP
     |
     |Action:
     |  Redirect
     |
     V
+----------+
|Ingress PE|
+----------+
The Ingress PE receives the Flow-Spec route and installs it into the forwarding plane. Upon receiving AI data traffic, it redirects the traffic to the corresponding Path for forwarding based on the traffic 5-tuple and Destination-QP parameters.¶
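The lookup performed by the Ingress PE can be sketched in Python using the two example routes from this section. This is an illustration only: `FlowRule` and `lookup` are hypothetical names, the rules are evaluated first-match in list order (an assumption, not the ordering rules of [RFC8955]), and the packet fields are assumed to have been parsed already.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

UDP = 17  # IP protocol number for UDP


@dataclass(frozen=True)
class FlowRule:
    dst_prefix: str
    src_prefix: str
    protocol: int
    dst_port: int
    src_port: int
    dest_qp: int
    redirect: str


# The two example BGP-FS routes from this section.
RULES = [
    FlowRule("203.0.113.0/24", "198.51.100.0/24", UDP, 4791, 10001, 1001, "Path1"),
    FlowRule("203.0.113.0/24", "198.51.100.0/24", UDP, 4791, 10001, 2001, "Path2"),
]


def lookup(dst, src, protocol, dst_port, src_port, dest_qp, rules=RULES) -> Optional[str]:
    """Return the redirect target of the first rule matching all fields."""
    for rule in rules:
        if (
            ip_address(dst) in ip_network(rule.dst_prefix)
            and ip_address(src) in ip_network(rule.src_prefix)
            and protocol == rule.protocol
            and dst_port == rule.dst_port
            and src_port == rule.src_port
            and dest_qp == rule.dest_qp
        ):
            return rule.redirect
    return None  # no Flow-Spec match: fall back to normal forwarding


print(lookup("203.0.113.7", "198.51.100.9", UDP, 4791, 10001, 1001))
```

The two rules share the same 5-tuple, so only the Destination-QP value separates the flows onto Path1 and Path2.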
The example in this document uses the Destination-QP, destination prefix, and source prefix as Flow-Spec filtering criteria to redirect traffic to different paths.¶
| Destination QP | Dest Prefix | Source Prefix | Redirect |
|---|---|---|---|
| 1001 | 203.0.113.0/24 | 198.51.100.0/24 | Path1 |
| 2001 | 203.0.113.0/24 | 198.51.100.0/24 | Path2 |
No new security issues are introduced to the BGP protocol by this specification.¶
[I-D.ietf-idr-flowspec-v2] defines the Types for IP Filters. This document requests that IANA assign a new type code point from the “Non-IP Types for IP Filters” registry for Destination-QP.¶
| Type | Definition |
|---|---|
| 64 | Parts of SID |
| 65 | MPLS Match 1: Label in Label stack |
| 66 | MPLS Match 2: EXP bits in top Label |
| TBD | Destination-QP (This document) |
| 67-249 | Unassigned (reserved for now) |
| 250 | Filter Error handling |
| 251-255 | Reserved |
The authors would like to thank the contributors for their valuable feedback and contributions.¶
-00 Initial version.¶