Interdomain Working Group                                O. Vroonen, Ed.
Internet-Draft                                              S. Litkowski
Updates: 4271 (if approved)                                        Cisco
Intended status: Standards Track                               K. Chandi
Expires: 27 December 2026                                    Bell Canada
                                                                 J. Dong
                                                                  Huawei
                                                            25 June 2026


             BGP best path next-hop selection enhancements
             draft-vroonen-idr-bgp-bestpath-nh-selection-02

Abstract

   BGP [RFC4271] was originally designed to carry IPv4 routing
   information over the Internet.  IP routing being "hop-by-hop" in
   nature, the purpose of NEXT_HOP is to carry the address of the next
   router to which the IP packet is sent.  In BGP, the next-hop may not
   be a directly connected router; hence, when evaluating paths, a BGP
   speaker must determine if the next-hop is resolvable and, if so,
   determine the internal cost to reach it.

   The increasing use of tunneling technologies to carry traffic
   between routers (e.g., GRE, MPLS, SR-MPLS, SRv6) may violate the
   assumption that the address carried in the NEXT_HOP is
   representative of the actual forwarding next-hop.  These
   technologies decouple the BGP control-plane's view of the next-hop
   from the data-plane's actual forwarding endpoint.

   This document describes the problems that arise from this
   decoupling.  These problems include, among others, sub-optimal path
   selection and incorrect resolvability tracking of the forwarding
   path, leading to traffic drop or misrouting.  This document proposes
   modifications to the BGP path selection procedures to accommodate
   these use cases.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Vroonen, et al.
Expires 27 December 2026                                        [Page 1]

Internet-Draft        BGP nh selection enhancements            June 2026

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 27 December 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Use cases
     2.1.  Unicast over a BGP-free core
     2.2.  Segment Routing Traffic Engineering Policy
     2.3.  SRv6 services
   3.  Modification of the BGP route selection process
     3.1.  Forwarding address
     3.2.  Resolution constraints
     3.3.  Route resolvability condition
     3.4.  Internal cost determination
     3.5.  Next hop and forwarding address tracking
   4.  Example
   5.  Example with Intent-Aware BGP Transport
   6.  Operational Considerations
   7.  Relationship to
       draft-ietf-idr-bgp-bestpath-selection-criteria
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Acknowledgements
   Authors' Addresses

1.  Introduction

   BGP [RFC4271] is designed to exchange network reachability
   information between routing domains.  A BGP update typically contains
   a prefix and a set of path attributes, including NEXT_HOP.  The
   receiving router uses this next-hop address to determine the egress
   point for traffic destined for the advertised prefix.  The underlying
   assumption is that the next-hop address represents the actual next
   router in the data plane's forwarding path.

   [RFC4271] Section 9.1.2.1 defines the route resolvability condition:
   a BGP route is considered unresolvable if the BGP speaker's routing
   table has no route matching the BGP route next-hop address.

   As per [RFC4271] Section 9.1.2.2 e), when comparing paths received
   via internal BGP (IBGP), the metric to reach the BGP route's next-hop
   address, found in the routing table, is used to determine the best
   path.

                             AS 1
              <--------------------------------->

              +------+
       P/m  --| RTR1 |----------------+
              +------+ (N1, metric=1) |
                                      |
                                  +------+
                                  | RTR3 |
                                  +------+
                                      |
              +------+ (N2, metric=2) |
       P/m  --| RTR2 |----------------+
              +------+

                             Figure 1

   In Figure 1, RTR3 receives prefix P/m from RTR1 and RTR2 (IBGP peers)
   with NEXT_HOP N1 and N2.  RTR3 resolves N1 with a cost of 1 and N2
   with a cost of 2.
   Based on [RFC4271] procedures, RTR3 will select the path with the
   lowest cost to the NEXT_HOP address, so RTR3 will select the path
   received from RTR1.  In this example, the path used to resolve N1 and
   N2 reflects the actual forwarding path, which makes RTR3's best-path
   decision accurate.

   This document describes cases where the NEXT_HOP used in the BGP
   update is not representative of the actual forwarding path.  In these
   cases, the resolvability condition may fail in its goal and path
   selection may be based on inaccurate criteria, leading to suboptimal
   routing, network congestion, traffic drop, or misrouting.

   These use cases are not new and may have been partially addressed by
   IETF standards or drafts.  Some references are provided below:

   *  [I-D.ietf-idr-bgp-bestpath-selection-criteria] addresses the case
      of MPLS networks and proposes a modification of the route
      resolvability condition so that it is performed using the
      forwarding database of a particular data plane protocol.  It also
      proposes an optional data path verification.

   *  [RFC9012] Section 7 improves the route resolvability condition by
      verifying that there is a feasible tunnel.  However, it does not
      take into account that the cost associated with the tunnel may be
      different from the cost associated with the BGP next-hop.

   *  [RFC9252] defines SRv6 overlay services signaling using BGP.  The
      procedures involve the advertisement of an SRv6 Service TLV within
      the BGP Prefix SID attribute to signal the SID to be used for
      forwarding.  [RFC9252] highlights that the ingress PE must perform
      a resolvability check for the SRv6 SID in addition to the
      resolvability check done on the NEXT_HOP.  However, it does not
      take into account that the cost associated with the SRv6 SID may
      be different from the cost associated with the BGP next-hop.

   This document defines generic modifications to the BGP decision
   process that can apply to all the use cases.
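   As an illustration of the baseline behavior described for Figure 1,
   the following minimal sketch models only two steps of the [RFC4271]
   decision process: the resolvability check on the NEXT_HOP and the
   interior cost comparison.  All other tie-breakers are ignored.  The
   names ROUTING_TABLE, Path, interior_cost and select_best are
   hypothetical and chosen for this illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: RTR3's routing table reduced to "next-hop -> metric",
# using the metrics shown in Figure 1.
ROUTING_TABLE: Dict[str, int] = {"N1": 1, "N2": 2}


@dataclass(frozen=True)
class Path:
    peer: str       # IBGP peer that advertised the path
    next_hop: str   # value of the NEXT_HOP attribute


def interior_cost(path: Path, table: Dict[str, int]) -> Optional[int]:
    """Return the metric to the NEXT_HOP, or None if it is unresolvable."""
    return table.get(path.next_hop)


def select_best(paths: List[Path], table: Dict[str, int]) -> Optional[Path]:
    """Drop unresolvable paths, then prefer the lowest interior cost."""
    resolvable = [p for p in paths if interior_cost(p, table) is not None]
    if not resolvable:
        return None
    return min(resolvable, key=lambda p: interior_cost(p, table))


paths = [Path("RTR1", "N1"), Path("RTR2", "N2")]
best = select_best(paths, ROUTING_TABLE)
print(best.peer if best else "no resolvable path")
```

   In this sketch the decision depends only on the routing table entry
   for the NEXT_HOP, which is accurate as long as that entry reflects
   the actual forwarding path.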
1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Use cases

2.1.  Unicast over a BGP-free core

                 [D=6]         [D=6]
            +-----+ (M=1) +----+ (M=1)
    P/m --  | PE1 |-------| P1 |--------+
    (N1)    +-----+       +----+        |
             (M=1) \     /              |
            [D=14]  \   /               |
                     \ /             +-----+
                      X              | PE3 |
                     / \             +-----+
             (M=5)  /   \               |
            [D=14] /     \              |
    (N2)    +-----+       +----+        |
    P/m --  | PE2 |-------| P2 |--------+
            +-----+ (M=5) +----+ (M=5)
                 [D=2]         [D=2]

         Figure 2: Next hop attribute not used for forwarding

   In Figure 2, an IPv4 unicast service is deployed over a BGP-free MPLS
   core.  Let's assume PE3 receives prefix P/m from PE1 and PE2 with
   NEXT_HOP N1 and N2.  The existence of a route in PE3's IPv4 routing
   table to reach N1 or N2 (defined in [RFC4271]) is not sufficient to
   ensure that traffic can be carried from PE3 to PE1 or PE2.  PE3 needs
   to ensure that there is a tunnel available to reach PE1 or PE2 that
   can carry MPLS traffic (e.g., an MPLS Label Switched Path (LSP)).  By
   checking only for the existence of a route in its routing table, PE3
   could cause traffic drop or misrouting if there is no tunnel to carry
   the IPv4 unicast traffic.

   This use case is also considered in
   [I-D.ietf-idr-bgp-bestpath-selection-criteria].

2.2.  Segment Routing Traffic Engineering Policy

   When using SR Policies [RFC9256], the BGP NEXT_HOP may not accurately
   represent the actual forwarding path.  When SR Policy candidate paths
   are distributed via BGP [RFC9830], the metric of their segment lists
   can be carried as specified in [I-D.ietf-idr-sr-policy-metric],
   allowing the SR Policy metric to be taken into account in the
   best-path selection.
   In Figure 2, a BGP VPN service (as defined in [RFC4364]) is deployed
   between the PEs.  Let's assume PE3 receives prefix P/m from PE1 and
   PE2 with NEXT_HOP N1, N2 and color C1.  On PE3, SR Policies (C1, N1)
   and (C1, N2) are configured to use the low latency path.

   *  Costs M in parentheses represent the IGP cost.

   *  Costs D in brackets represent the latency-based metric cost.

   If PE3 performs next-hop resolution based solely on the BGP NEXT_HOP,
   it will not verify the existence and state of a corresponding SR
   Policy (C1, Nx).  Consequently, it may select a suboptimal path by
   considering the cost to the NEXT_HOP address instead of the cost of
   the SR Policy.  The best path considering the SR Policy (C1, Nx) cost
   is via PE2, while the best path considering the IGP metric is via
   PE1.

   +========+==========+======+
   | Prefix | Next hop | Cost |
   +========+==========+======+
   | N1     | P1       | 2    |
   +--------+----------+------+
   | N2     | P1       | 6    |
   +--------+----------+------+

                 Table 1: IPv4 routing table of PE3

   +===========+==========+======+=======+
   | SR Policy | Next hop | Cost | State |
   +===========+==========+======+=======+
   | (C1, N1)  | P1       | 12   | Up    |
   +-----------+----------+------+-------+
   | (C1, N2)  | P2       | 4    | Up    |
   +-----------+----------+------+-------+

                    Table 2: SR Policies of PE3

   Assume that PE3 takes the cost of the SR Policy into account and
   therefore picks the path from PE2 as best.  If the SR Policy (C1, N2)
   is down or not present, and considering that color C1 is using a
   CO-flag equal to 00 ([RFC9256] Section 8.8.1), the path from PE2 is
   authorized to use the IGP path to next-hop N2 (fallback to the IGP
   path via P1 with cost=6) as there is no policy (C1, N2) available.
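   The behavior described above can be sketched as follows, assuming the
   values of Table 1 and Table 2.  This is an illustration only, not a
   description of any implementation: the names IGP_COST, SR_POLICIES,
   PATHS, resolve and best_by_cost are hypothetical, and the fallback
   logic is a simplification of the CO-flag 00 steering behavior.

```python
from typing import Dict, Optional, Tuple

# Assumption: Table 1 reduced to "next-hop -> IGP cost".
IGP_COST: Dict[str, int] = {"N1": 2, "N2": 6}

# Assumption: Table 2 reduced to "(color, endpoint) -> cost and state".
SR_POLICIES: Dict[Tuple[str, str], dict] = {
    ("C1", "N1"): {"cost": 12, "up": True},
    ("C1", "N2"): {"cost": 4, "up": True},
}

# Paths for P/m as received by PE3: advertising PE -> NEXT_HOP.
PATHS: Dict[str, str] = {"PE1": "N1", "PE2": "N2"}


def resolve(next_hop, color, policies, igp) -> Optional[Tuple[str, int]]:
    """Use the SR Policy when it is up, otherwise fall back to the IGP."""
    policy = policies.get((color, next_hop))
    if policy is not None and policy["up"]:
        return ("sr-policy", policy["cost"])
    if next_hop in igp:
        return ("igp", igp[next_hop])
    return None


def best_by_cost(paths, color, policies, igp):
    """Pick the path with the lowest numeric cost, whatever its type."""
    resolved = {pe: resolve(nh, color, policies, igp)
                for pe, nh in paths.items()}
    usable = {pe: r for pe, r in resolved.items() if r is not None}
    if not usable:
        return None, usable
    return min(usable, key=lambda pe: usable[pe][1]), usable


# Both SR Policies are up: latency costs 12 and 4 are compared.
print(best_by_cost(PATHS, "C1", SR_POLICIES, IGP_COST))

# SR Policy (C1, N2) is down: latency cost 12 is compared to IGP cost 6.
degraded = dict(SR_POLICIES)
degraded[("C1", "N2")] = {"cost": 4, "up": False}
print(best_by_cost(PATHS, "C1", degraded, IGP_COST))
```

   In the second case, the sketch compares a latency-based cost of 12
   for the path from PE1 with an IGP metric of 6 for the path from PE2,
   and still picks the path from PE2 although it no longer uses an SR
   Policy.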
   This situation has multiple drawbacks:

   *  The costs of the two BGP paths are not directly comparable, which
      makes the comparison inaccurate: the path via PE1 uses a
      latency-based cost, while the fallback path via PE2 uses an IGP
      metric.

   *  Using a path without an SR Policy may violate the intent of the
      service, especially if there is an alternate path (from PE1) that
      fulfills the intent.

   It may be desirable for PE3 to prefer the path from PE1 that
   satisfies color C1 or even exclude any path that does not meet the
   color C1 requirement.

2.3.  SRv6 services

   When using SRv6 services as defined in [RFC9252], the BGP NEXT_HOP
   may not be representative of the actual forwarding path.

   Considering Figure 2, an SRv6-based BGP VPN is deployed between the
   PEs.  PE1 uses locator L1 for algorithm 0 and L1_FA for flexible
   algorithm 128 ([RFC9350]) optimized for low latency.  Similarly, PE2
   uses locator L2 for algorithm 0 and L2_FA for flexible algorithm 128.
   Let's assume PE3 receives prefix P/m from PE1 and PE2 with NEXT_HOP
   N1, N2 and SRv6 SID S1 and S2.  SIDs S1 and S2 are allocated from
   L1_FA and L2_FA respectively.

   *  Costs M in parentheses represent the cost of algorithm 0 (as
      defined in [RFC8665] Section 8.5).

   *  Costs D in brackets represent the cost of flexible algorithm
      [RFC9350] 128, which uses a low-latency metric.

   The IPv6 routing table of PE3 contains the following entries:

   +========+==========+======+
   | Prefix | Next hop | Cost |
   +========+==========+======+
   | N1     | P1       | 2    |
   +--------+----------+------+
   | N2     | P1       | 6    |
   +--------+----------+------+
   | L1     | P1       | 2    |
   +--------+----------+------+
   | L2     | P1       | 6    |
   +--------+----------+------+
   | L1_FA  | P1       | 12   |
   +--------+----------+------+
   | L2_FA  | P2       | 4    |
   +--------+----------+------+

                 Table 3: IPv6 routing table of PE3

   Based on [RFC4271] procedures, PE3 verifies that the NEXT_HOP address
   of each path is resolvable.
   Based on [RFC9252] procedures, PE3 verifies that the S1 and S2
   addresses are resolvable.  PE3 will then select the path with the
   lowest cost to the NEXT_HOP address according to [RFC4271].  The cost
   to N1 is lower than the cost to N2, so PE3 will select the path
   received from PE1.  However, from a latency perspective, the path to
   PE2 is the best one.

   The problem of path suboptimality may also happen with algorithm 0.
   For instance, if SRv6 traffic for algorithm 0 needs to be offloaded
   from PE1, the operator may increase the metric of the locator on PE1
   (while not changing the metric of the next-hop).  In the example
   above, if L1 is advertised by PE1 with an offset of 1,000,000, then
   PE3 will have a cost to L1 of 1000002, but N1 will still be reachable
   with a cost of 2.  The cost to reach the SID must also be taken into
   account in this scenario to ensure that the traffic offload works
   properly.

3.  Modification of the BGP route selection process

3.1.  Forwarding address

   This document defines the forwarding address as the IP address of the
   next router to which packets are sent.  The forwarding address may be
   the NEXT_HOP or a different address that has been signaled in a
   different attribute along with the path.

   The following data already defined in BGP standards SHOULD be
   considered as forwarding addresses:

   *  The Tunnel Egress Endpoint Sub-TLV contained in the Tunnel
      Encapsulation attribute defined in [RFC9012]

   *  The SRv6 SID Information Sub-TLV contained within the SRv6 L3 or
      L2 Service TLV defined in [RFC9252]

   Unless a BGP update contains another type of forwarding address, the
   BGP NEXT_HOP is considered as the forwarding address.

   The forwarding address MAY be complemented by a forwarding context.
   The forwarding context further characterizes the forwarding path to
   be used.
   The following attributes defined in BGP standards are examples of
   forwarding context:

   *  The Tunnel Encapsulation attribute defined in [RFC9012]

   *  The Color extended community defined in [RFC9012]

   *  The SRv6 Sub-TLVs and Sub-Sub-TLVs contained within the SRv6 L3 or
      L2 Service TLV as defined in [RFC9252], other than the SID itself

   Each new BGP extension SHOULD specify if an address carried by the
   extension must be considered as a forwarding address.  The procedures
   defined in the following sections SHOULD apply to any newly defined
   forwarding address without having to be redefined.

3.2.  Resolution constraints

   When the forwarding address and its context are identified for a BGP
   route, the implementation may know the required characteristics of
   the route to be used to resolve the forwarding address.  Depending on
   the type of forwarding address and context, the implementation may
   need to ensure that the forwarding address is resolved through a
   specific type of route in a specific table.  This resolution
   constraint may come from the forwarding context and/or may be
   configured locally.

   In some cases, an operator may want to enforce that the forwarding
   address is resolved through a specific type of route.  This can be
   achieved by configuring a local resolution constraint.  Reusing the
   example defined in Section 2.1, if we consider an IPv4 unicast BGP
   service carried over a BGP-free core, the BGP NEXT_HOP must be
   reachable through a tunnel to allow end-to-end packet delivery.  Such
   a constraint cannot be derived from the BGP update context, and BGP
   must be configured to resolve the NEXT_HOP only through tunnels (of
   any or specific types).  Similarly, when aggregate routes are present
   in the routing table, a user may want to prevent the forwarding
   address (which is a specific route) from being resolved over the
   aggregate routes.
   A resolution constraint based on prefix/mask can be configured to
   prevent such a resolution from being considered valid.  An
   implementation MAY provide a set of configuration options for
   resolution constraints.

   Another example of a resolution constraint is the "Resolution
   Scheme", defined in BGP Classful Transport [RFC9832].  The
   "Resolution Scheme" constrains the resolution of the NEXT_HOP to an
   ordered set of transport classes dictated by the intent (Mapping
   Community) attached to the route.

3.3.  Route resolvability condition

   This document updates [RFC4271] Section 9.1.2.1 as follows:

   *  The route resolvability check for the BGP NEXT_HOP MUST continue
      to be done.

   *  In addition, the route resolvability check SHOULD be performed
      based on the forwarding address.

   *  When resolving the forwarding address, the lookup SHOULD be
      performed by applying the resolution constraints defined in
      Section 3.2.

   The resolvability check based on the forwarding address MAY be
   enabled through a configuration knob.

3.4.  Internal cost determination

   For a prefix P/m, different BGP paths may use different forwarding
   addresses and contexts of various types.

   P/m
      Path1: NH=10.0.0.1 Color: 200
      Path2: NH=2001::2 SRv6-SID: cafe:0:2:e002::
      Path3: NH=10.0.0.3 Tunnel-encap(L2TPv3, endpoint: 10.0.0.3)
      Path4: NH=2001::4 SRv6-SID: cafe:0:4:e002::

                               Figure 3

   Costs retrieved from different types of forwarding addresses or
   contexts may not be comparable because they are based on different
   sets of rules.  For instance, Path1 may leverage an SR Policy (color
   200, endpoint R1) optimizing for latency, so the cost of Path1 will
   reflect the latency to R1.  Path2 may use the IGP cost to R2.  Path3
   may have no cost.  These values are not directly comparable.

   The issue of comparing costs of paths of various types is not new.
   In a regular IP routed network (without tunnels) that runs BGP, a
   prefix P/m may have paths with next-hops reachable through different
   IGPs or IGP route types, where costs computed by each IGP may use a
   different reference.  While the routing table can tie-break across
   different protocols or route types for a single prefix using an
   internal preference (administrative distance) mechanism, this does
   not help when comparing BGP paths whose next-hops are resolved
   through different IGPs.

   In order to compare the paths, this document introduces the concept
   of forwarding address preference.  The preference is a local
   numerical value.  An implementation SHOULD pick the lowest value as
   the most preferred.

   This document updates [RFC4271] Section 9.1.2.2 e) as follows:

   *  Remove from consideration any route whose forwarding address
      preference value is higher (less preferred) than the lowest value
      found among the candidate routes.  This preference MAY be
      retrieved from the resolution lookup of the forwarding address or
      MAY be configured locally.

   *  For the remaining paths, the interior cost of a route is
      determined by the metric of the route that resolves the forwarding
      address, applying the resolution constraints defined in
      Section 3.2.  If the forwarding address for a route is reachable,
      but no cost can be determined, the cost SHOULD be set by default
      to the maximum allowed cost.

   Using the forwarding address preference and the forwarding
   address-based cost SHOULD be enabled through a configuration knob.

   With the example above and the internal tables defined below, and
   considering the lowest preference value as the most preferred one,
   BGP would select Path2 as best.  Path2 and Path4 have the lowest
   preference (10), then Path2 has the lowest internal cost (12).
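   The two-step comparison described above can be sketched as follows,
   using the preference and cost values listed for this example in
   Figure 4.  This is a minimal illustration: the names Candidate,
   MAX_COST, effective_cost and select are hypothetical, and the value
   chosen for MAX_COST is an assumption standing in for "the maximum
   allowed cost", which this document does not quantify.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: placeholder for "the maximum allowed cost".
MAX_COST = 2**32 - 1


@dataclass(frozen=True)
class Candidate:
    name: str
    preference: int        # forwarding address preference, lowest wins
    cost: Optional[int]    # None: reachable, but no cost can be determined


def effective_cost(candidate: Candidate) -> int:
    """Default to the maximum cost when no cost is available."""
    return MAX_COST if candidate.cost is None else candidate.cost


def select(candidates: List[Candidate]) -> Optional[Candidate]:
    """Keep the most preferred candidates, then pick the lowest cost."""
    if not candidates:
        return None
    best_preference = min(c.preference for c in candidates)
    remaining = [c for c in candidates if c.preference == best_preference]
    return min(remaining, key=effective_cost)


candidates = [
    Candidate("Path1", preference=100, cost=1001),
    Candidate("Path2", preference=10, cost=12),
    Candidate("Path3", preference=1000, cost=None),
    Candidate("Path4", preference=10, cost=14),
]
print(select(candidates).name)
```

   The preference filter runs before any cost is compared, so the
   numerically large cost of Path1 and the missing cost of Path3 never
   compete with the costs of Path2 and Path4.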
   P/m
      Path1: NH=10.0.0.1 Color: 200,
             preference 100 (from table), cost 1001
      Path2: NH=2001::2 SRv6-SID: cafe:0:2:e002::,
             preference 10 (from BGP), cost 12
      Path3: NH=10.0.0.3 Tunnel-encap(L2TPv3, endpoint: 10.0.0.3,
             sessID: 1),
             preference 1000 (from BGP), cost max
      Path4: NH=2001::4 SRv6-SID: cafe:0:4:e002::,
             preference 10 (from BGP), cost 14

                               Figure 4

   +===============+============+========+========================+
   | Prefix, Color | Preference | Metric | Forwarding data        |
   +===============+============+========+========================+
   | 10.0.0.1, 200 | 10         | 1001   | interface IF1, label   |
   |               |            |        | stack {L1, L2, L3}     |
   +===============+============+========+========================+
   | 10.0.0.2, 200 | 10         | 1002   | interface IF2, label   |
   |               |            |        | stack {L4, L5}         |
   +===============+============+========+========================+
   | 10.0.0.3, 200 | 10         | 1003   | interface IF1, label   |
   |               |            |        | stack {L6, L7, L8, L9} |
   +===============+============+========+========================+

                   Table 4: IPv4 Color routing table

   +===============+============+========+========================+
   | Prefix        | Preference | Metric | Forwarding data        |
   +===============+============+========+========================+
   | cafe:0:1::/48 | 5          | 11     | interface IF1, label   |
   |               |            |        | stack {L1, L2, L3}     |
   +===============+============+========+========================+
   | cafe:0:2::/48 | 5          | 12     | interface IF2, label   |
   |               |            |        | stack {L4, L5}         |
   +===============+============+========+========================+
   | cafe:0:3::/48 | 5          | 13     | interface IF1, label   |
   |               |            |        | stack {L6, L7, L8, L9} |
   +===============+============+========+========================+
   | cafe:0:4::/48 | 5          | 14     | interface IF1, label   |
   |               |            |        | stack {L10, L11, L12}  |
   +===============+============+========+========================+

                      Table 5: IPv6 routing table

   +=============+============+========+
   | Destination | Session ID | Status |
   +=============+============+========+
   | 10.0.0.3    | 1          | up     |
   +=============+============+========+

                     Table 6: L2TP session table

   +==========================+==================================+
   | Type                     | Preference                       |
   +==========================+==================================+
   | MPLS LSP (any signaling) | 100                              |
   +==========================+==================================+
   | MPLS RSVP-TE LSP         | 50                               |
   +==========================+==================================+
   | SRv6 SID                 | 10                               |
   +==========================+==================================+
   | Default                  | use value from table lookup, use |
   |                          | 1000 if table provided no value  |
   +==========================+==================================+

        Table 7: BGP forwarding address preference configuration

3.5.  Next hop and forwarding address tracking

   A BGP speaker SHOULD track the resolvability of both the NEXT_HOP and
   the forwarding address.
   If either the NEXT_HOP or the forwarding address becomes
   unresolvable, or if the cost to reach either the NEXT_HOP or the
   forwarding address changes, the BGP speaker MUST re-evaluate the best
   path selection for all prefixes using the affected NEXT_HOP or
   forwarding address.  This tracking MUST be done for all paths,
   including the best path and non-best paths.

4.  Example

   The example below illustrates the logic of forwarding address
   preference and cost comparison.

            +-----+ (M=5)
    P/m --  | PE1 |----------------+
    (N1)    +-----+                |
                                   |
                                   |
            +-----+    (M=10)    +----+    (M=5)     +-----+
    P/m --  | PE2 |--------------| P1 |--------------| PE4 |
    (N2)    +-----+              +----+              +-----+
                                   |
                                   |
            +-----+ (M=15)         |
    P/m --  | PE3 |----------------+
    (N3)    +-----+

                               Figure 5

   In Figure 5, a prefix P/m is reachable by PE4 from PE1, PE2 and PE3
   (with NEXT_HOP N1, N2 and N3 respectively).  Considering that the
   network is a BGP-free core, traffic must be tunneled between edge
   devices.  Traffic destined to P/m is of high bandwidth and requires
   traffic engineering to spread the traffic across the available links
   of the core.  RSVP-TE is used to provide traffic-engineering MPLS
   tunnels.  SR-MPLS is also used to provide best-effort reachability.

   BGP is configured to use the route preference (or administrative
   distance) from the table lookup as forwarding address preference.
   RSVP-TE is given a better (lower) route preference than SR-MPLS.  The
   RSVP-TE tunnel to PE3 cannot be established.

   +========+==========+============+======+
   | Prefix | Protocol | Preference | Cost |
   +========+==========+============+======+
   | N1     | IS-IS    | 100        | 10   |
   +--------+----------+------------+------+
   | N2     | IS-IS    | 100        | 15   |
   +--------+----------+------------+------+
   | N3     | IS-IS    | 100        | 20   |
   +--------+----------+------------+------+

                 Table 8: IPv4 routing table of PE4

   +========+==========+============+======+
   | Prefix | Protocol | Preference | Cost |
   +========+==========+============+======+
   | N1     | IS-IS SR | 110        | 10   |
   +--------+----------+------------+------+
   | N1     | RSVP-TE  | 50         | 1000 |
   +--------+----------+------------+------+
   | N2     | IS-IS SR | 110        | 15   |
   +--------+----------+------------+------+
   | N2     | RSVP-TE  | 50         | 100  |
   +--------+----------+------------+------+
   | N3     | IS-IS SR | 110        | 20   |
   +--------+----------+------------+------+

              Table 9: MPLS ingress tunnel table of PE4

   As mentioned in Section 3.2, the case of a BGP-free core requires BGP
   on PE4 to be configured to allow the resolution of the NEXT_HOP
   address through tunnels (of any type).  Considering that PE4
   maintains a separate table for MPLS ingress tunnels, PE4 will look up
   the N1, N2 and N3 addresses only in this table.

   PE4 will first check the resolvability of N1, N2 and N3.  All are
   resolvable in the MPLS ingress tunnel table.

   PE4 will end up with the following information from the MPLS ingress
   tunnel table to compare the paths:

   P/m
      Path1: from PE1, NH=N1
             cost 1000, forwarding address preference 50
      Path2: from PE2, NH=N2
             cost 100, forwarding address preference 50
      Path3: from PE3, NH=N3
             cost 20, forwarding address preference 110

                               Figure 6

   PE4 will check the forwarding address preference of the paths.  PE4
   will not consider the path received from PE3 because its forwarding
   address preference value (110) is higher, hence less preferred, than
   that of the other paths (50).
   Finally, PE4 will compare the internal costs of the paths from PE1
   and PE2, as they have the same preference, and the path from PE2 will
   be elected as best because it has the lowest cost.

   As mentioned in Section 3.5, if the RSVP-TE tunnel to PE2 goes down,
   PE4 will re-evaluate the best path selection and will select the path
   from PE1 as best.  This is also true if the cost of the RSVP-TE LSP
   to reach N1 changes and becomes lower than the cost of the LSP to
   reach N2.

5.  Example with Intent-Aware BGP Transport

   This example illustrates the need for forwarding address preference
   when intent-aware BGP transport technologies are deployed.  Both BGP
   Classful Transport ([RFC9832]) and BGP Color-Aware Routing
   ([RFC9871]) can lead to situations where two paths for a prefix
   resolve through different transport mechanisms with incomparable
   costs.

            +-----+ (M=5)
    P/m --  | PE1 |----------------+
    (N1)    +-----+                |
                                 +----+    (M=5)     +-----+
                                 | P1 |--------------| PE3 |
                                 +----+              +-----+
            +-----+ (M=10)         |
    P/m --  | PE2 |----------------+
    (N2)    +-----+

                               Figure 7

   In Figure 7, PE3 receives a VPN prefix P/m from PE1 and PE2 with
   NEXT_HOP N1 and N2 respectively.  An intent-aware transport
   technology is deployed, requesting "Gold" SLA transport for this
   prefix.  Gold RSVP-TE tunnels exist from PE3 to PE1 but not from PE3
   to PE2.  Best-effort SR-MPLS tunnels exist to both PE1 and PE2.

   In a BGP Classful Transport ([RFC9832]) deployment, the prefix P/m is
   advertised with Mapping Community color:0:100 indicating the Gold SLA
   intent.  The Resolution Scheme for color:0:100 is configured with the
   following ordered set of Transport Route Databases (TRDBs):
   [TRDB-Gold, TRDB-Best-Effort], meaning the NEXT_HOP should be
   resolved first over Gold transport tunnels, with a fallback to
   best-effort tunnels.
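   The ordered lookup with fallback described above can be sketched as
   follows.  This is a simplified illustration and not the [RFC9832]
   procedure itself: each TRDB is modeled as a plain dictionary, the
   names TRDB_GOLD, TRDB_BEST_EFFORT, RESOLUTION_SCHEME and resolve are
   hypothetical, and the entries reuse the values of Table 10 and
   Table 11 of this document.

```python
from typing import Dict, List, Optional, Tuple

Entry = Dict[str, object]

# Assumption: TRDB contents taken from Table 10 (Gold) and
# Table 11 (best-effort).
TRDB_GOLD: Dict[str, Entry] = {
    "N1": {"protocol": "RSVP-TE", "preference": 50, "cost": 1000},
}
TRDB_BEST_EFFORT: Dict[str, Entry] = {
    "N1": {"protocol": "IS-IS SR", "preference": 110, "cost": 10},
    "N2": {"protocol": "IS-IS SR", "preference": 110, "cost": 15},
}

# Ordered set of TRDBs configured for Mapping Community color:0:100.
RESOLUTION_SCHEME: List[Tuple[str, Dict[str, Entry]]] = [
    ("TRDB-Gold", TRDB_GOLD),
    ("TRDB-Best-Effort", TRDB_BEST_EFFORT),
]


def resolve(next_hop: str,
            scheme: List[Tuple[str, Dict[str, Entry]]]
            ) -> Optional[Tuple[str, Entry]]:
    """Return the first TRDB, in order, that holds the next-hop."""
    for name, trdb in scheme:
        entry = trdb.get(next_hop)
        if entry is not None:
            return name, entry
    return None


for next_hop in ("N1", "N2"):
    print(next_hop, resolve(next_hop, RESOLUTION_SCHEME))
```

   Each NEXT_HOP is resolved independently, so two paths for the same
   prefix can end up resolved in different TRDBs, each carrying its own
   preference and cost.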
   Following [RFC9832] Section 7.8 procedures, the Resolution Scheme
   resolves the NEXT_HOP of each path independently:

   *  Path1 (NH=N1): N1 is found in TRDB-Gold (RSVP-TE tunnel to PE1).
      Resolution succeeds in the primary TRDB.

   *  Path2 (NH=N2): N2 is not found in TRDB-Gold.  Fallback occurs: N2
      is found in TRDB-Best-Effort (SR-MPLS tunnel to PE2).  Resolution
      succeeds in the backup TRDB.

   In a BGP Color-Aware Routing ([RFC9871]) deployment, the prefix P/m
   is advertised as a service route with BGP Color Extended Community
   C_Gold.  PE3 needs color-aware paths (N1, C_Gold) and (N2, C_Gold) to
   steer traffic.  Following [RFC9871] Section 2.5 procedures:

   *  Path1 (NH=N1): color-aware path (N1, C_Gold) is available via
      RSVP-TE.  Resolution succeeds via the primary mechanism.

   *  Path2 (NH=N2): color-aware path (N2, C_Gold) is not available.
      Local policy allows fallback: N2 is resolved via best-effort
      SR-MPLS.  Resolution succeeds via the fallback mechanism.

   In both cases, both paths are resolvable, but they are resolved
   through different transport mechanisms with different
   characteristics.  The Gold tunnel to PE1 has a cost of 1000 (RSVP-TE
   metric), while the best-effort tunnel to PE2 has a cost of 15 (IGP
   metric).  These costs are not comparable.

   +========+==========+============+======+
   | Prefix | Protocol | Preference | Cost |
   +========+==========+============+======+
   | N1     | RSVP-TE  | 50         | 1000 |
   +--------+----------+------------+------+

               Table 10: Gold transport tunnels on PE3

   +========+==========+============+======+
   | Prefix | Protocol | Preference | Cost |
   +========+==========+============+======+
   | N1     | IS-IS SR | 110        | 10   |
   +--------+----------+------------+------+
   | N2     | IS-IS SR | 110        | 15   |
   +--------+----------+------------+------+

           Table 11: Best-effort transport tunnels on PE3

   Without the forwarding address preference mechanism, PE3 would simply
   compare the costs: 1000 (path via PE1, Gold) vs 15 (path via PE2,
   best-effort), and would select the path via PE2.  This is problematic
   because:

   *  The path via PE2 uses best-effort transport, violating the Gold
      SLA intent.

   *  The path via PE1, which satisfies the Gold SLA, is discarded
      solely because the RSVP-TE metric is numerically higher than the
      IGP metric, even though these values are not comparable.

   Using the forwarding address preference, PE3 retrieves the preference
   value from the transport table lookup.  The Gold transport table
   provides a preference of 50, while the best-effort transport table
   provides a preference of 110.

   PE3 will first compare the forwarding address preference values and
   remove from consideration any path with the highest (least preferred)
   value.  Path2 (preference 110) is removed.  PE3 selects Path1 via
   PE1, which satisfies the Gold SLA intent.

   P/m
      Path1: from PE1, NH=N1
             resolved via Gold transport (RSVP-TE)
             cost 1000, forwarding address preference 50
      Path2: from PE2, NH=N2
             resolved via best-effort transport (SR-MPLS)
             cost 15, forwarding address preference 110

      => PE3 selects Path1 (lower preference value = more preferred)

                               Figure 8

6.  Operational Considerations

   In most cases, tunnels carry traffic from the ingress to the egress
   node that advertised the BGP service route, as transit routers are
   not aware of the BGP service routes.
   For instance, in SRv6 service cases, the SRv6 encapsulation that
   uses the advertised SID as its destination carries traffic all the
   way to the node that knows about the service route.  Determining
   the internal cost from the forwarding address (tunnel endpoint,
   SRv6 SID, ...) makes sense in these cases because it reflects the
   end-to-end cost.

   However, there are cases where the tunnel does not terminate on the
   service egress.  One example is a service route advertised with a
   color that allows the use of a null endpoint policy.  Another
   example is a service route advertised with a Tunnel Encapsulation
   attribute whose tunnel endpoint differs from the service egress.
   In such cases, the tail node of the tunnel must have routing
   entries for the service routes to ensure end-to-end reachability.

   When the internal cost is determined from the forwarding address in
   these cases, the retrieved cost may not reflect the end-to-end
   path.  However, using the NEXT_HOP address would not necessarily
   provide a better view: it would still give an inaccurate picture of
   distances from a traffic forwarding perspective, since the
   forwarding path does not follow the IGP-routed path.  Additionally,
   operators commonly manipulate the NEXT_HOP through routing
   policies, so there is no guarantee that the NEXT_HOP address
   correctly reflects the real egress node.  Hence, when the tunnel
   tail is not the service egress, using the forwarding address is no
   worse than using the NEXT_HOP.

   This document defines mechanisms that modify the BGP best path
   selection process defined in [RFC4271].  Consequently, the
   potential impact on routing loops needs to be analyzed.

   The mechanisms described in this document are typically deployed in
   environments that use tunneling technologies (such as MPLS or
   SRv6).  In these scenarios, the BGP path selection determines the
   specific Forwarding Address used for encapsulation.
   When the traffic is encapsulated at the ingress router,
   intermediate nodes forward packets based solely on the outer
   transport header rather than the inner destination.  Consequently,
   the specific BGP selection criteria are enforced only at the
   ingress, preventing forwarding loops that might otherwise arise
   from inconsistent policy decisions on intermediate nodes.

   To illustrate this loop prevention, consider the following
   topology, where PE1 is the ingress and both PE2 and PE3 provide
   reachability to Prefix X.

                              +-------+
                             /|  PE2  |--- Prefix X
               (Low Delay)  / |       |
                           /  +-------+
                          /
   +-------+    +-------+
   |  PE1  |----|   P   |
   +-------+    +-------+
   (Ingress)              \
                           \  +-------+
               (Low IGP)    \ |  PE3  |--- Prefix X
                              |       |
                              +-------+

             Figure 9: Loop Prevention via Encapsulation

   Assume PE1 is configured with the mechanisms defined in this
   document and selects PE2 as the BGP next-hop for Prefix X to
   optimize for low delay.  Router P is a transit node and uses
   standard BGP procedures, preferring PE3 due to a lower IGP metric.

   In a hop-by-hop IP routing paradigm, PE1 would forward a bare IP
   packet to P.  Router P would perform a lookup on the destination IP
   address (Prefix X) and deflect the packet toward PE3, directly
   contradicting PE1's decision and potentially causing a loop if PE3
   also had a conflicting policy.

   However, in a tunneling paradigm, PE1 encapsulates the packet with
   an outer header (e.g., an SRv6 SID or MPLS label) targeting PE2's
   Forwarding Address.  When the packet arrives at Router P, P only
   evaluates the outer header and forwards the packet along the IGP
   shortest path to PE2.  Router P's BGP policy regarding Prefix X is
   never invoked for in-flight traffic.

   Nevertheless, to ensure predictable routing behavior and simplify
   troubleshooting, it is recommended that network operators maintain
   a consistent configuration of these selection criteria across all
   BGP speakers within the same administrative domain.

7.
Relationship to draft-ietf-idr-bgp-bestpath-selection-criteria

   This document and [I-D.ietf-idr-bgp-bestpath-selection-criteria] are
   complementary.  Their relationship can be summarized as follows:

   *  Scope: [I-D.ietf-idr-bgp-bestpath-selection-criteria] performs
      the resolvability check only; this document also covers internal
      cost determination and path preference.

   *  Generality: [I-D.ietf-idr-bgp-bestpath-selection-criteria] is
      MPLS-centric; this document is generic and applies to MPLS, SR
      Policy, SRv6 and intent-aware BGP transport.

   *  This document introduces the forwarding address, decoupling path
      selection from the NEXT_HOP.

   *  This document introduces the forwarding address preference to
      compare paths of non-comparable types.

   *  This document derives the internal cost from the forwarding
      address, not from the NEXT_HOP only.

   *  The MPLS case covered by
      [I-D.ietf-idr-bgp-bestpath-selection-criteria] is one instance of
      the generic mechanism defined here.

8.  IANA Considerations

   This document does not require any IANA actions.

9.  Security Considerations

   This document modifies the BGP route selection process by using
   data other than the next-hop address to perform the resolvability
   check and to compute the internal cost.  This does not introduce
   any new security considerations compared to using the existing
   NEXT_HOP defined in [RFC4271].

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4364]  Rosen, E. and Y.
              Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)",
              RFC 4364, DOI 10.17487/RFC4364, February 2006.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174,
              DOI 10.17487/RFC8174, May 2017.

   [RFC8665]  Psenak, P., Ed., Previdi, S., Ed., Filsfils, C., Gredler,
              H., Shakir, R., Henderickx, W., and J. Tantsura, "OSPF
              Extensions for Segment Routing", RFC 8665,
              DOI 10.17487/RFC8665, December 2019.

   [RFC9012]  Patel, K., Van de Velde, G., Sangli, S., and J. Scudder,
              "The BGP Tunnel Encapsulation Attribute", RFC 9012,
              DOI 10.17487/RFC9012, April 2021.

   [RFC9246]  van Brandenburg, R., Leung, K., and P. Sorber, "URI
              Signing for Content Delivery Network Interconnection
              (CDNI)", RFC 9246, DOI 10.17487/RFC9246, June 2022.

   [RFC9252]  Dawra, G., Ed., Talaulikar, K., Ed., Raszuk, R., Decraene,
              B., Zhuang, S., and J. Rabadan, "BGP Overlay Services
              Based on Segment Routing over IPv6 (SRv6)", RFC 9252,
              DOI 10.17487/RFC9252, July 2022.

   [RFC9256]  Filsfils, C., Talaulikar, K., Ed., Voyer, D., Bogdanov,
              A., and P. Mattes, "Segment Routing Policy
              Architecture", RFC 9256, DOI 10.17487/RFC9256, July 2022.

   [RFC9350]  Psenak, P., Ed., Hegde, S., Filsfils, C., Talaulikar, K.,
              and A. Gulko, "IGP Flexible Algorithm", RFC 9350,
              DOI 10.17487/RFC9350, February 2023.

   [RFC9832]  Vairavakkalai, K., Ed. and N. Venkataraman, Ed., "BGP
              Classful Transport Planes", RFC 9832,
              DOI 10.17487/RFC9832, September 2025.

10.2.  Informative References

   [I-D.ietf-idr-bgp-bestpath-selection-criteria]
              Asati, R., "BGP Bestpath Selection Criteria Enhancement",
              Work in Progress, Internet-Draft,
              draft-ietf-idr-bgp-bestpath-selection-criteria-12,
              5 June 2019.

   [I-D.ietf-idr-sr-policy-metric]
              Li, Z., KaZhang, Dong, J., Talaulikar, K., and R. Gu,
              "BGP SR Policy Extensions for Metric", Work in Progress,
              Internet-Draft, draft-ietf-idr-sr-policy-metric-05,
              2 June 2026.

   [RFC9830]  Previdi, S., Filsfils, C., Talaulikar, K., Ed., Mattes,
              P., and D. Jain, "Advertising Segment Routing Policies
              in BGP", RFC 9830, DOI 10.17487/RFC9830, September 2025.

   [RFC9871]  Rao, D., Ed. and S. Agrawal, Ed., "BGP Color-Aware
              Routing (CAR)", RFC 9871, DOI 10.17487/RFC9871,
              November 2025.

Acknowledgements

   The authors would like to acknowledge Ketan Talaulikar, Serge Krier
   and Shyam Sethuram for review and comments.

Authors' Addresses

   Olivier Vroonen (editor)
   Cisco
   Email: ovroonen@cisco.com

   Stephane Litkowski
   Cisco
   Email: slitkows@cisco.com

   Kandhla Chandi
   Bell Canada
   Email: kandhla.chandi@bell.ca

   Jie Dong
   Huawei
   Email: jie.dong@huawei.com