| Internet-Draft | Edge Metadata Path | April 2026 |
| Dunbar, et al. | Expires 29 October 2026 | [Page] |
This draft describes a new Edge Metadata Path Attribute and some Sub-TLVs for egress routers to advertise the Edge Metadata about the attached edge services (ES). The edge service Metadata can be used by the ingress routers in the 5G Local Data Network to make path selections not only based on the routing cost but also the running environment of the edge services. The goal is to improve latency and performance for 5G edge services.¶
The extension enables an edge service at one specific location to be preferred over the others with the same IP address (ANYCAST) for receiving data flows from a specific source, such as a specific User Equipment (UE).¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 29 October 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
This document describes a new Edge Metadata Path Attribute added to a BGP UPDATE message [RFC4271] for egress routers to advertise the Metadata about 5G low latency edge services directly attached to the egress routers. 5G [TS.23.501-3GPP] is characterized by having edge services closer to the Cell Towers reachable by Local Data Networks (LDN). From an IP network perspective, the 5G LDN is a limited domain [RFC8799] with edge services a few hops away from the ingress nodes. Only selective UE services are considered as 5G low latency edge services.¶
Note: The proposed edge service Metadata Path Attribute is not intended for the best-effort services reachable via the public Internet. The information carried by the Edge Metadata Path Attribute can be used by the ingress routers to make path selections for selective low latency services based on not only the network distance but also the running environment of the edge cloud sites. The goal is to improve latency and performance for 5G ultra-low latency services.¶
This extension is targeted for a single domain with a BGP Route Reflector (RR) [RFC4456] controlling the propagation of the BGP UPDATEs. The edge service Metadata Path Attribute is only attached to the low latency services (routes) hosted in the 5G edge cloud sites. These routes are only a small subset of services initiated from UEs, not for UEs accessing many internet sites.¶
While the proposed Edge Metadata Path Attribute is particularly beneficial for low latency services, the Edge Metadata Path Attributes can be expanded to propagate information about GPU availability, power, or other resources necessary for compute-intensive services such as AI and machine learning. This flexibility makes it a valuable tool for a wide range of applications beyond just low latency services when used within a limited domain network.¶
The following conventions are used in this document.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The goal of this edge service Metadata Path Attribute is for egress routers to propagate the metrics about the running environment for a subset of edge services to ingress routers so that the ingress routers can make path selections based on not only the routing cost but also the running environment for those edge services. The BGP speakers that do not support the Edge Metadata Path Attribute can ignore the Edge Metadata Path Attribute in a BGP UPDATE Message. All intermediate nodes can forward the entire BGP UPDATE as it is. Multiple metrics can be attached to one Metadata Path Attribute. One Metadata Path Attribute can contain computing service capability information, computing service states, computing resource states of the corresponding edge site, or more. Computing service capability information can be used to record information of the computing power node or initialization deployment information for computing service initialization. Computing service states can include one of the service connection numbers, service duration, and so on. Computing resource states can be detailed information on computing resources such as CPU/GPU. They can also be an abstract metric from these detailed parameters to indicate the resource status of the edge site. There could be more metrics about the running environment being attached to the Metadata Path Attribute; e.g., some of the metrics being discussed by the IETF CATS Working Group. This document illustrates a few examples of Sub-TLVs of the metrics under the edge service Metadata Path Attribute:¶
This section specifies how those Metadata impact the ingress node's path selections.¶
When an ingress router receives BGP UPDATEs for the same IP prefix from multiple egress routers, all these egress routers' loopback addresses are considered as the next hops for the IP prefix. For the selected low latency edge services, the ingress router BGP engine would call an edge service Management function that can select paths based on the edge service Metadata received. Section 5.1 has an exemplary algorithm to compute the weighted path cost based on the edge service Metadata carried by the Sub-TLV(s) specified in this document.¶
Section 5 has the detailed description of the edge service Metadata influenced optimal path selection.¶
When the ingress router receives a packet and does a lookup on the route in the FIB, it determines the destination prefix's entire path including the optimal egress node. The ingress router encapsulates the packet destined towards the optimal egress router. For routes that carry the Metadata Path Attribute but lack the Tunnel Encapsulation Path Attribute [RFC9012], it is recommended that the ingress router encapsulate the original packet using an IP-in-IP header. This encapsulation ensures that intermediate nodes not supporting the Metadata Path Attribute do not forward the packet to unintended destinations. The outer header SHOULD set the destination address to the optimal egress router and the source address to the ingress router.¶
For routes without the Metadata Path Attribute, no changes are required. Packets are forwarded according to existing behavior: encapsulation is applied when Tunnel Attributes are present, and packets are forwarded without encapsulation when they are not.¶
For subsequent packets belonging to the same flow, the ingress router needs to forward them to the same egress router unless the selected egress router is no longer reachable. Forwarding packets for a particular flow to the same egress router, also known as Flow Affinity, is supported by many commercial routers. Most registered EC services have relatively short-lived flows.¶
How Flow Affinity is implemented is out of the scope for this document.¶
When a UE moves to a new 5G gNB which is anchored to the same UPF, the packets from the UE traverse to the same ingress router. Path selection and forwarding behavior are the same as before.¶
If the UE maintains the same IP address when anchored to a new UPF, the directly connected ingress router might use the information passed from a neighboring router to derive the optimal BGP Next Hop for this route. The detailed algorithm is out of the scope of this document.¶
The Edge Metadata Path Attribute is an optional non-transitive BGP Path attribute that carries metadata associated with edge services attached to the egress router. The attribute consists of one or more Edge Metadata Sub-TLVs, where each Sub-TLV encodes one specific metadata item associated with the advertised route or service.¶
The Edge Metadata Path Attribute MAY be included in a BGP UPDATE together with other BGP Path Attributes, such as Communities, NEXT_HOP, Tunnel Encapsulation Path Attribute, and other applicable attributes. The choice of which routes carry the Edge Metadata Path Attribute, and which Sub-TLVs are included for those routes, is determined by local policy. The fields within the Edge Metadata Path Attribute and all included Sub-TLVs MUST use network byte order.¶
Boundary filtering SHOULD be applied at the administrative boundary to prevent the Edge Metadata Path Attribute from being distributed beyond its intended scope.¶
The Edge Metadata Path Attribute has the following characteristics:¶
A BGP speaker that receives a BGP UPDATE containing the Edge Metadata Path Attribute and readvertises that route within the same metadata distribution domain SHOULD propagate the Edge Metadata Path Attribute without modification, unless local policy explicitly requires otherwise.¶
When advertising the route to a peer outside the intended metadata distribution domain, the speaker SHOULD remove the Edge Metadata Path Attribute.¶
If a BGP speaker originates or modifies a route and is configured to attach Edge Metadata, it MAY add the Edge Metadata Path Attribute to the UPDATE message, subject to local policy and the capability negotiation specified in Section 5.¶
A BGP speaker that receives a malformed Edge Metadata Path Attribute that cannot be parsed according to the attribute format and length rules MUST handle the error as specified in Section 9.¶
When a BGP speaker receives a well-formed Edge Metadata Path Attribute, it MUST process each included Sub-TLV independently.¶
If the attribute contains one or more Sub-TLVs whose types are recognized by the receiving speaker, the receiving speaker SHOULD process those recognized Sub-TLVs according to their definitions in this document and according to local policy.¶
If the attribute contains a Sub-TLV whose type is unknown or unsupported by the receiving speaker, the speaker MUST ignore that Sub-TLV and MUST continue processing the remaining Sub-TLVs. The presence of an unknown or unsupported Sub-TLV MUST NOT by itself cause the entire Edge Metadata Path Attribute to be considered malformed.¶
If a Sub-TLV type is recognized by the receiving speaker, but the value carried in that Sub-TLV is invalid according to the definition of that Sub-TLV, the speaker MUST treat that Sub-TLV as unusable and MUST ignore it for metadata-based route selection. The speaker SHOULD continue processing the remaining Sub-TLVs. An invalid value in one recognized Sub-TLV MUST NOT by itself cause the entire Edge Metadata Path Attribute to be considered malformed unless the corresponding Sub-TLV definition explicitly states otherwise.¶
If the Length field of a Sub-TLV is inconsistent with the encoding defined for that Sub-TLV, or if the Sub-TLV cannot be fully parsed based on the encoded length, the Edge Metadata Path Attribute MUST be treated as malformed, and error handling MUST follow the procedures specified in Section 9.¶
If a recognized Sub-TLV appears more times than allowed by its definition, the receiver SHOULD use only the first occurrence unless the specific Sub-TLV definition states otherwise, and SHOULD ignore the additional occurrences.¶
A BGP speaker that propagates the Edge Metadata Path Attribute SHOULD NOT delete unrecognized Sub-TLVs solely because they are unrecognized. If the route is propagated with the Edge Metadata Path Attribute, unrecognized Sub-TLVs SHOULD remain unchanged in the propagated attribute unless local policy requires removal of the entire attribute.¶
If some Sub-TLVs are absent, the receiving speaker MUST treat the attribute as carrying only the metadata explicitly present. The absence of a particular Sub-TLV MUST NOT be interpreted as a zero value, an infinite value, a degraded condition, or any other inferred semantic value unless the specific Sub-TLV definition explicitly states such behavior.¶
If none of the included Sub-TLVs are recognized by the receiving speaker, the speaker MUST treat the Edge Metadata Path Attribute as present but unusable for local metadata-based route selection. In that case, the speaker SHOULD fall back to route selection based on other applicable BGP attributes and local policy.¶
By default, a BGP speaker is not required to report unknown, unsupported, or unusable Sub-TLVs to its peer. Logging or notification to a local management system is OPTIONAL.¶
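The receive-side rules above can be sketched as a small Sub-TLV walker. This is a minimal illustration rather than a normative parser. It assumes each Sub-TLV starts with a 16-bit Sub-Type and an 8-bit Length, as drawn in the Sub-TLV figures of this document, and that Length counts every octet after the Length field; the names `walk_sub_tlvs` and `MalformedAttribute` are hypothetical.

```python
import struct


class MalformedAttribute(Exception):
    """The Sub-TLV framing is inconsistent with the attribute length."""


def walk_sub_tlvs(attr_value: bytes, known_types: set) -> dict:
    """Return {sub_type: body} for recognized Sub-TLVs (first occurrence only)."""
    usable = {}
    offset = 0
    while offset < len(attr_value):
        # Assumed header: 16-bit Sub-Type followed by an 8-bit Length.
        if len(attr_value) - offset < 3:
            raise MalformedAttribute("truncated Sub-TLV header")
        sub_type, length = struct.unpack_from("!HB", attr_value, offset)
        offset += 3
        # A Length that runs past the attribute makes the whole attribute malformed.
        if offset + length > len(attr_value):
            raise MalformedAttribute("Sub-TLV overruns the attribute")
        body = attr_value[offset:offset + length]
        offset += length
        # Unknown types are skipped; duplicates of a known type keep the first.
        if sub_type in known_types:
            usable.setdefault(sub_type, body)
    return usable
```

An empty result means the attribute is present but unusable, so the caller falls back to ordinary route selection. Per-type value validation, and Sub-TLVs that allow several occurrences distinguished by metric type, would need handling on top of this sketch.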
Ingress nodes that use Edge Metadata for route selection SHOULD apply a deployment-specific algorithm to the set of recognized Sub-TLVs that are present and usable in the received attribute. To ensure consistent route selection, nodes participating in the same deployment SHOULD use consistent policy regarding which Sub-TLVs are considered and how their values are incorporated into route selection.¶
Different services might have different preference index values configured for the same site. For example, Service-A requires high computing power, Service-B requires high bandwidth among its microservices, and Service-C requires high volume storage capacity. For a DC with relatively low storage capacity but high bisectional bandwidth, the preference index value is higher for Service-B and lower for Service-C. Site Preference Index can also be used to achieve stickiness for some services.¶
It is out of the scope of this document how the preference index is determined or configured.¶
The Site Preference Index Sub-TLV has the following format:¶
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Site-Preference-Index Sub-Type |    Length     |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Site Preference Index value                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The Site Physical Availability Index indicates the percentage of impact on a group of routes associated with a common physical characteristic, for example, a pod, a row of server racks, a floor, or an entire DC. The purpose is to use one UPDATE message to indicate a group of routes of different NLRIs impacted by a physical event. For example, a power outage to a pod can cause the Site Physical Availability Index to be 0% for all the routes in the pod. Partial fiber cut to a row of shelves can cause the Site Physical Availability Index to be 50% for all the routes in those shelves. The value is 0-100, with 100% indicating the site is fully functional, 0% indicating the site is entirely out of service, and 50% indicating the site is 50% degraded.¶
It is recommended to assign each route with one Site-ID. When a route is associated with multiple Site-IDs, the latest BGP UPDATE will override any previous associations. For example, one DC can use POD number as Site-ID, another DC can use Row of Shelves as the Site-ID.¶
Cloud Site/Pod failures and degradation include, but are not limited to, a site degradation or an entire site going down caused by a variety of reasons. Examples include fiber cuts impacting a site or among pods, cooling failures, insufficient backup power, cyberattacks, too many changes outside of the maintenance window, etc. Fiber cuts are not uncommon within a Cloud site or between sites.¶
When a physical failure occurs at an edge site (or a pod), many instances can be affected, and the associated routes (i.e., IP addresses) may not be easily aggregated. Instead of sending numerous BGP UPDATE messages to ingress routers for each impacted instance, the egress router can send a single BGP UPDATE to indicate the site's physical capacity availability. Based on this update, ingress routers can decide to reroute all or some of the affected instances, depending on the extent of the site's degradation. This approach significantly improves efficiency, particularly when fault detection within an edge site relies on proprietary or deployment-specific mechanisms.¶
The BGP UPDATE for the individual instances (i.e., the routes) can include the Capacity Availability Index solely for ingress routers to associate the routes with the Site-ID. The actual Capacity Availability Index value, i.e., the percentage for all the routes associated with the Site-ID, is generated by the egress routers with the egress routers' loopback address as the NLRI.¶
The Site Physical Availability Index Sub-TLV has fixed length of 8 Octets, including the Type field.¶
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     PhyAvailIdx Sub-Type      |    Length     |I|  Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Site-ID (2 octets)       | Site Availability Percentage  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
An egress router sets itself as the next hop for a BGP peer before sending an UPDATE with the Edge Metadata Path Attribute that includes the Site Physical Availability Index Sub-TLV. The Site Physical Availability Index Sub-TLV (with RouteFlag-I=1) is for ingress routers to associate the Site Identifier with the prefixes.¶
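As an illustration of the 8-octet layout, the following sketch packs a Site Physical Availability Index Sub-TLV. Several details are assumptions made for the example and are not taken from this document: the Sub-Type code point is a placeholder, the Length field is taken to count the five octets after the Length field, and the I flag is placed in the most significant bit of the flags octet, as the figure suggests.

```python
import struct

# Placeholder only: the real Sub-Type code point is not given in this text.
PHY_AVAIL_SUB_TYPE = 0xFFFF


def encode_phy_avail(site_id: int, percent: int, route_flag_i: bool) -> bytes:
    """Pack Sub-Type, Length, flags, Site-ID and percentage in network byte order."""
    if not 0 <= percent <= 100:
        raise ValueError("availability percentage must be within 0-100")
    if not 0 <= site_id <= 0xFFFF:
        raise ValueError("Site-ID must fit in 2 octets")
    flags = 0x80 if route_flag_i else 0x00  # assumed: I flag is the first bit
    length = 5  # assumed: flags octet + Site-ID + percentage
    return struct.pack("!HBBHH", PHY_AVAIL_SUB_TYPE, length, flags, site_id, percent)
```

For example, `encode_phy_avail(7, 50, True)` yields eight octets that associate a route with Site-ID 7 and report the site as 50% available.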
A BGP UPDATE that includes the Site Availability Index Sub-TLV without specifying attached routes in the NLRI, but instead using the egress router's loopback address in the NLRI, is referred to as a standalone Site Availability Index BGP UPDATE. When an ingress router receives such a BGP UPDATE containing the Edge Metadata Path Attribute with the standalone Site Physical Availability Index Sub-TLV from Router-X or its RR with the Originator-ID equal to Router-X, the ingress router SHOULD use the site availability index to efficiently reduce or increase the preference for all BGP routes attached to Router-X.¶
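One way an ingress router might apply a standalone Site Availability Index is sketched below. The linear scaling rule, the `Route` record, and the `effective_preference` function are assumptions made for illustration; this document leaves the actual adjustment to local policy.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    egress: str             # egress router that originated the route
    base_preference: float  # preference before metadata is considered


def effective_preference(route: Route, site_availability: dict) -> float:
    """Scale a route's preference by the availability reported for its egress."""
    percent = site_availability.get(route.egress)
    if percent is None:
        # No standalone UPDATE seen for this egress: do not infer any value.
        return route.base_preference
    return route.base_preference * percent / 100
```

With this rule, a single standalone UPDATE from Router-X changes the ranking of every route attached to Router-X without any per-prefix UPDATE being sent.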
The BGP UPDATE with a standalone Site Availability Index is not intended for resolving NextHop.¶
It is desirable for an ingress router to select a site with the shortest processing time for an ultra-low latency service. However, it is not easy to predict which site has "the fastest processing time" or "the shortest processing delay" for an incoming service request because:¶
Even though utilization measurements, like those below, are collected by most data centers, they cannot indicate which site has the shortest processing time. A service request might be processed faster on Site-A even if Site-A is overutilized.¶
The remaining available resource at a site is a more reasonable indication of process delay for future service requests.¶
The Service Delay Prediction Index is a value that predicts processing delays at the site for future service requests. The higher the value, the longer the delay.¶
While out of scope, we assume there is an algorithm that can derive the Service Delay Prediction Index that can be assigned to the egress router. When the Service Delay Prediction value is updated, which can be triggered by a change in available resources, etc., the egress router can attach the updated Service Delay Prediction value in a Sub-TLV under the Edge Metadata Path Attribute of the BGP Route UPDATE message to the ingress routers.¶
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ServiceDelayPredict Sub-Type  |    Length     |F|L| Reserved  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Service Delay Prediction Value                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
When ingress routers have an embedded analytics tool that relies on raw measurements, it is useful for the egress router to send the raw measurements.¶
Raw Measurement Sub-TLV has the following format:¶
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Raw-Measurement Sub-Type    |    Length     |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Value                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- Raw-Measurement Sub-Type (16 bits): 4 (specified in this document). Indicating raw measurements Metadata associated with the edge service address.¶
- Length (8 bits): specifies the total length, in octets, of the value field, excluding the Sub-Type and the Length fields. For the Raw-Measurement Sub-Type, the length is determined by the Value field, which carries one or more types of raw measurement.¶
- Reserved (8 bits): These bits are reserved for future use and MUST be set to zero. Future documents may specify different uses for these bits.¶
- Value: The Value field can contain multiple types of raw measurements, each represented as a Sub-Sub-TLV.¶
One example of a raw measurement Metadata Sub-sub-TLV is defined below to convey the total number of packets or bytes transmitted over a specified period for a particular edge service address. When a Data DC GW router cannot directly access the internal state of an edge service, the volume of incoming traffic can be a reliable indicator of its load. A sudden increase in packets or bytes can signal a surge in requests, potentially leading to performance issues or resource constraints on the service side.¶
To differentiate this measurement from others that may be defined in the future, this document assigns a Sub-sub-Type value of 1 to represent the total packets or bytes transmitted to an edge service address.¶
Future documents may define additional Sub-sub-types of raw measurement metadata. Each type of raw measurement will have a unique Sub-sub-type value assigned at the time of its specification.¶
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|RawPacketsMeasure Sub-sub-Type |    Length     |B|  Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Measurement Period                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    total number of packets (or bytes) to the Edge Service     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   total number of packets (or bytes) from the Edge Service    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- RawPacketsMeasure Sub-sub-Type (8 bits): 1 (specified in this document). Indicating raw measurements of packets or bytes transmitted to or from the edge service address.¶
- Length (8 bits): specifies the total length in octets of the value field, excluding the Sub-sub-Type and the Length fields. For the raw measurements of packets transmitted to or from the edge service address Sub-sub-Type, the length SHOULD be 22.¶
- B flag (1 bit): If set to 0, the raw measurement is the number of packets. If set to 1, the raw measurement is the number of bytes.¶
- Reserved (7 bits): These bits are reserved for future use and MUST be set to zero.¶
- Measurement Period: BGP Update period in Seconds or user-specified period.¶
- Total number of packets to the Edge Service (32 bits): This field specifies the total number of packets transmitted to the edge service address over the specified measurement period.¶
- Total number of packets from the Edge Service (32 bits): This field specifies the total number of packets from the edge service address over the specified measurement period.¶
The receiver nodes can compute the needed metrics, such as the Service Delay Prediction, for the service based on the raw measurements sent from the egress router and preconfigured algorithms.¶
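A receiver-side computation of this kind might look like the sketch below, which decodes the counters of a packets-or-bytes raw measurement and derives an inbound rate. It assumes the Sub-sub-Type and Length header has already been consumed, that the B flag is the most significant bit of the flags octet, and that the Measurement Period is a 32-bit value in seconds, as drawn in the figure. The function names and the choice of a simple rate as the derived metric are hypothetical.

```python
import struct


def decode_raw_packets_body(body: bytes) -> dict:
    """Decode the flags octet, Measurement Period and the two 32-bit counters."""
    if len(body) < 13:
        raise ValueError("body too short for flags, period and two counters")
    flags, period, to_service, from_service = struct.unpack_from("!BIII", body)
    return {
        "unit": "bytes" if flags & 0x80 else "packets",  # B flag selects the unit
        "period_s": period,
        "to_service": to_service,
        "from_service": from_service,
    }


def inbound_rate(measurement: dict) -> float:
    """Average units per second sent towards the edge service."""
    if measurement["period_s"] == 0:
        raise ValueError("measurement period must be non-zero")
    return measurement["to_service"] / measurement["period_s"]
```

A sudden rise in this rate between consecutive UPDATEs is the kind of signal that a preconfigured algorithm could translate into a higher predicted service delay.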
The service-oriented capability Sub-TLV is for distributing information regarding the capabilities of a specific service in a deployment environment. Depending on the deployment, a deployment environment can be an edge site or other types of environments. This information provides ingress routers or controllers with the available resources for the specific service in each deployment environment. It enables them to make well-informed decisions for the optimal paths to the selected deployment environment.¶
Currently, the Sub-TLV only has an abstract value derived from various metrics, although the specifics of this derivation are beyond the scope of this document. Importantly, this value is significant only when comparing multiple data center sites for the same service. This value is not meaningful when comparing different services, meaning the capability value relevant to Service A cannot be directly compared with that for Service B. Future enhancements may expand this sub-TLV to include more types of metrics or even raw data that represents direct metrics. This information is important in 5G network environments where efficient resource utilization is crucial for enhancing performance and service quality.¶
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ServiceOriented Cap Sub-Type  |    Length     |  Res  |  MT   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          SO-CapValue                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Multiple Service-Oriented Capability Sub-TLVs with different metric types can be encoded in an Edge Metadata Path Attribute, indicating that multiple metrics are carried. However, if more than one Service-Oriented Capability Sub-TLV with the same metric type is encoded in an Edge Metadata Path Attribute, only the first one is processed and the others are ignored.¶
The "Service-Oriented Available Resource Sub-TLV" is for distributing a metric that measures the real-time available resources allocated for processing specific services or applications at an edge site. This Sub-TLV complements the "Service-Oriented Capability Sub-TLV" described in Section 4.6, which addresses the static resource capability of a site for a service. While the Capability Abstract Value provides a baseline understanding of a site's potential to handle a service, the Available Resource metric offers a dynamic perspective by quantifying how much of this capacity is currently available. This distinction is crucial for managing resource efficiency and responsiveness in network operations, ensuring that capabilities are not only available but also optimally used to meet the actual service demands.¶
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|ServiceOriented Avail Sub-Type |    Length     |P| Res |  MT   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          SO-AvailRes                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Multiple Service-Oriented Available Resource Sub-TLVs with different metric types can be encoded in an Edge Metadata Path Attribute, indicating that multiple metrics are carried. However, if more than one Service-Oriented Available Resource Sub-TLV with the same metric type is encoded in an Edge Metadata Path Attribute, only the first one is processed and the others are ignored.¶
The BGP Capabilities Optional Parameter allows a BGP speaker to advertise, during the BGP OPEN message exchange, the set of capabilities supported on a session. As specified in [RFC5492], each capability is encoded as a Capability Code, a Capability Length, and a Capability Value.¶
To enable the exchange of the Edge Metadata Path Attribute on a BGP session, this document defines a new Edge Metadata Processing Capability (=78). This capability is used by a BGP speaker to indicate that it can send and receive the Edge Metadata Path Attribute for one or more AFI/SAFI pairs on that session.¶
The Value Field of the Edge Metadata Processing Capability has the following format:¶
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|A|AFI-SAFI-cnt |              AFI              |     SAFI      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              AFI              |     SAFI      |      ..       ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Where:¶
When the A Flag is set to 1, the capability applies to any AFI/SAFI enabled on the BGP session. In this case, AFI-SAFI-CNT SHOULD be set to 0 and no AFI/SAFI tuples need be present in the Capability Value. If one or more AFI/SAFI tuples are present when the A Flag is set to 1, the receiver SHOULD ignore those tuples and process the capability as applying to all AFI/SAFI enabled on the session.¶
When the A Flag is set to 0, AFI-SAFI-CNT indicates the exact number of AFI/SAFI pairs listed in the Capability Value, and the capability applies only to those listed AFI/SAFI pairs.¶
A BGP speaker MUST NOT attach the Edge Metadata Path Attribute to any UPDATE message sent on a BGP session unless both peers have advertised the Edge Metadata Processing Capability for the corresponding AFI/SAFI on that session. If one peer has advertised the capability with the A Flag set to 1, that advertisement is considered to cover any AFI/SAFI enabled on the session for the purpose of this check.¶
If a BGP speaker has not advertised the Edge Metadata Processing Capability on a session, or has not received this capability from its peer on that session, the speaker MUST NOT send any UPDATE on that session that carries the Edge Metadata Path Attribute.¶
If a BGP speaker receives an UPDATE carrying the Edge Metadata Path Attribute on a session for which the corresponding Edge Metadata Processing Capability was not successfully advertised by both peers for that AFI/SAFI, the receiver SHOULD ignore the Edge Metadata Path Attribute and process the remainder of the UPDATE according to local policy and the error-handling procedures specified in Section 9.¶
If a BGP speaker does not include the Edge Metadata Processing Capability in its BGP OPEN message for a specific BGP session, or if it does not receive the Edge Metadata Processing Capability from its peer on that session, it MUST NOT send any BGP UPDATE message on that session that binds the Edge Metadata Path Attribute to any prefix.¶
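The send-side check described above can be sketched as follows. The field widths are read off the Capability Value figure and are assumptions here: a 1-bit A flag in the most significant bit, a 7-bit AFI-SAFI-CNT, then 16-bit AFI and 8-bit SAFI tuples. All function names are hypothetical, and `None` stands for a capability that was not advertised.

```python
import struct
from typing import Optional


def encode_capability_value(pairs, all_families: bool = False) -> bytes:
    """Build the Capability Value from (AFI, SAFI) pairs, or set the A flag."""
    if all_families:
        return bytes([0x80])  # A=1, AFI-SAFI-CNT=0, no tuples
    if len(pairs) > 0x7F:
        raise ValueError("AFI-SAFI-CNT is limited to 7 bits")
    out = bytes([len(pairs)])
    for afi, safi in pairs:
        out += struct.pack("!HB", afi, safi)
    return out


def decode_capability_value(value: bytes):
    """Return (covers_all, set of (AFI, SAFI)) for a received Capability Value."""
    if not value:
        raise ValueError("empty Capability Value")
    if value[0] & 0x80:
        return True, set()  # any tuples that follow are ignored when A=1
    count = value[0] & 0x7F
    if len(value) != 1 + 3 * count:
        raise ValueError("AFI-SAFI-CNT does not match the tuples present")
    return False, {struct.unpack_from("!HB", value, 1 + 3 * i) for i in range(count)}


def may_attach(local: Optional[bytes], peer: Optional[bytes], afi: int, safi: int) -> bool:
    """True only if both sides advertised the capability for this AFI/SAFI."""
    for cap in (local, peer):
        if cap is None:
            return False
        covers_all, pairs = decode_capability_value(cap)
        if not covers_all and (afi, safi) not in pairs:
            return False
    return True
```

Whether the AFI/SAFI itself is enabled on the session is a separate check that this sketch does not perform.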
The propagation scope of the Edge Metadata Path Attribute needs careful consideration to ensure it does not inadvertently leak to other BGP domains. According to Section 3 of [ATTRIBUTE-ESCAPE], it is necessary for the Route Reflector (RR) to be upgraded to constrain the propagation scope when propagating the metadata path attributes. Therefore, the Edge Metadata Path Attribute originator sets the attribute as Non-transitive when sending the BGP UPDATE message to its corresponding RR. Non-transitive attributes are only guaranteed to be dropped during BGP route propagation by implementations that do not recognize them, ensuring that the Edge Metadata path attributes do not propagate beyond the intended scope.¶
The RR can append the NO-ADVERTISE well-known community to the BGP UPDATE message with the Edge Metadata Path Attribute when forwarding it to the ingress routers. This signals to the ingress nodes that the associated route's Metadata Path Attribute SHOULD NOT be further advertised beyond their scope. This precautionary measure ensures that the receiver of the BGP UPDATE message refrains from forwarding the received update to its peers, preventing the undesired propagation of the information carried by the Metadata Path Attribute.¶
To address the potential issue where the NO-ADVERTISE well-known community of the BGP UPDATE message can be dropped by some routers, a new AS-Scope Sub-TLV can be included in the Metadata Path Attribute to prevent the Metadata Path Attribute from being leaked to unintended Autonomous Systems (ASes). The AS-Scope Sub-TLV will enforce stricter control over the propagation of the metadata by associating it with specific AS numbers.¶
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       AS-Scope Sub-Type       |    Length     |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       In-Scope AS-Value                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
When a router receives a BGP UPDATE message containing the AS-Scope Sub-TLV, it must perform the following steps to process the AS-Scope value:¶
- AS Recognition: The router will check the AS value in the AS-Scope Sub-TLV.¶
- If the AS value matches the local AS or a recognized AS in its configuration, the router will process the update as usual. If the AS value does not match or is not recognized, the router SHOULD NOT process the Edge Metadata Path Attribute values in the BGP UPDATE and SHOULD NOT propagate the received BGP UPDATE to other nodes. That is, treat-as-withdraw behavior is used.¶
Example Usage:¶
Consider a scenario where a router in AS 65001 advertises a BGP UPDATE message with the AS-Scope Sub-TLV set to AS 65001. When another router in AS 65002 receives this UPDATE, it will check the AS-Scope Sub-TLV value:¶
Since AS 65002 does not match the AS value 65001, the router in AS 65002 will drop the UPDATE, preventing the metadata from leaking into AS 65002.¶
This mechanism ensures that the metadata remains confined to the intended ASes, enhancing the security and control over the propagation of BGP metadata.¶
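The check in the example above can be sketched as follows. The field widths (2-octet Sub-Type, 1-octet Length, 1-octet Reserved, 4-octet In-Scope AS-Value) and the meaning of Length (octets following the Length field) are assumptions made for this illustration, and `encode_as_scope` and `in_scope` are hypothetical helper names.¶

```python
import struct

AS_SCOPE_SUB_TYPE = 7  # Sub-Type value listed in the IANA sub-registry

# Assumed layout, network byte order: Sub-Type (2 octets), Length (1 octet),
# Reserved (1 octet), In-Scope AS-Value (4 octets).
_AS_SCOPE = struct.Struct("!HBBI")


def encode_as_scope(as_number: int) -> bytes:
    """Encode an AS-Scope Sub-TLV for one AS number."""
    # Assumption: Length counts the Reserved octet plus the 4-octet AS value.
    return _AS_SCOPE.pack(AS_SCOPE_SUB_TYPE, 5, 0, as_number)


def in_scope(sub_tlv: bytes, local_as: int, recognized_ases=frozenset()) -> bool:
    """Return True when the receiving router may process and propagate the UPDATE."""
    sub_type, _length, _reserved, as_value = _AS_SCOPE.unpack(sub_tlv)
    if sub_type != AS_SCOPE_SUB_TYPE:
        raise ValueError("not an AS-Scope Sub-TLV")
    # In scope when the value is the local AS or an AS recognized by configuration.
    return as_value == local_as or as_value in recognized_ases


sub_tlv = encode_as_scope(65001)
accepted_in_65001 = in_scope(sub_tlv, local_as=65001)
accepted_in_65002 = in_scope(sub_tlv, local_as=65002)
```

Here `accepted_in_65001` is true and `accepted_in_65002` is false, which corresponds to the router in AS 65002 dropping the UPDATE. A router in AS 65002 configured with 65001 as a recognized AS would pass `recognized_ases={65001}` and accept it.¶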
This section describes how the information carried in the Edge Metadata Path Attribute can be incorporated into BGP route selection by local policy. The procedures in this section do not modify the base BGP decision process defined in [RFC4271]. Instead, they describe how local policy can use recognized Edge Metadata values when comparing candidate routes for services configured for metadata-aware route selection.¶
To remain consistent with Section 9.1.1 of [RFC4271], metadata-aware policy evaluation MUST be applied after LOCAL_PREF has been set for iBGP routes, or after equivalent inbound policy has been applied for eBGP routes.¶
The use of Edge Metadata does not replace existing BGP routing policy. Rather, the Edge Metadata Path Attribute provides additional inputs that local policy MAY use when comparing candidate routes for selected services.¶
A deployment MAY use only a subset of the metadata attributes carried in the Edge Metadata Path Attribute. Which metadata attributes are considered, and for which services they are considered, is determined by local policy.¶
For example, one deployment may consider only the Service Delay Prediction Sub-TLV for latency-sensitive services, while another deployment may consider only availability-related or service-capability-related Sub-TLVs. A route that carries additional recognized metadata does not require all such metadata to be used in route selection.¶
If none of the recognized metadata carried by a route are selected by local policy for preference computation, the route is evaluated using ordinary BGP policy and tie-breaking procedures.¶
For services configured for metadata-aware route selection, local policy MAY use one or more recognized metadata values carried in the Edge Metadata Path Attribute, together with other routing attributes, to derive a preference for each candidate route.¶
The procedure for combining recognized metadata with traditional BGP attributes is deployment specific and outside the scope of this document. The preference computation MAY be performed at a Route Reflector (RR), at an ingress node, or at another policy decision point within the same administrative domain.¶
When metadata-aware policy is applied to a set of candidate routes, the route with the most preferred policy outcome is selected. If two or more routes remain equally preferred after metadata-aware policy evaluation, the normal BGP tie-breaking procedures defined in [RFC4271] apply.¶
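As a non-normative illustration, the selection step can be sketched as a function that returns every route sharing the most preferred policy outcome. The `Route` class, the `service_delay_prediction` key, and the "lower delay wins" policy are assumptions chosen for this sketch. The actual policy is deployment specific.¶

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    next_hop: str
    # Recognized metadata selected by local policy (hypothetical key names).
    metadata: dict = field(default_factory=dict)


def delay_policy(route: Route) -> float:
    """Hypothetical local policy: a lower predicted service delay is preferred."""
    # Routes that do not carry the selected metadata sort last.
    return route.metadata.get("service_delay_prediction", float("inf"))


def most_preferred(candidates, policy=delay_policy):
    """Return all candidate routes that share the best policy outcome.

    When more than one route is returned, the routes remain equally preferred
    and the normal BGP tie-breaking procedures decide among them.
    """
    if not candidates:
        return []
    best = min(policy(route) for route in candidates)
    return [route for route in candidates if policy(route) == best]
```

If no candidate carries the selected metadata, every route receives the same outcome and the whole set is handed to ordinary tie-breaking, which matches the fallback behavior described earlier in this section.¶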
Local policy MAY define threshold conditions for one or more metadata types. When the recognized metadata associated with a route indicates that such a threshold has been crossed, local policy MAY reduce the preference of that route or MAY treat the route as ineligible for metadata-aware service steering.¶
This document does not mandate a specific action for degraded metadata values. The action taken, if any, is determined by local policy. For example, local policy may de-prefer a route whose Service Delay Prediction exceeds a configured threshold, or a route whose availability-related metadata falls below a configured level.¶
If local policy excludes a route from metadata-aware service steering, the route MAY still remain valid for ordinary BGP reachability unless separate policy removes or suppresses that route.¶
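One possible threshold policy is sketched below. The function name, the dictionary key, and the choice to keep routes without the metadata eligible are all assumptions for illustration. The sketch only partitions routes and never removes them, so an excluded route stays available for ordinary BGP reachability.¶

```python
def apply_delay_threshold(routes, max_delay):
    """Split routes into (eligible, excluded) for metadata-aware steering.

    Each route is a dict. The hypothetical key "service_delay_prediction"
    holds the recognized metadata value when the route carries it.
    """
    eligible, excluded = [], []
    for route in routes:
        delay = route.get("service_delay_prediction")
        if delay is not None and delay > max_delay:
            # Threshold crossed: ineligible for steering, still a valid route.
            excluded.append(route)
        else:
            # Assumption: routes without the metadata are not penalized.
            eligible.append(route)
    return eligible, excluded
```

A deployment that prefers to de-prefer rather than exclude could instead keep all routes in one list and sort the excluded group after the eligible group.¶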
After metadata-aware policy evaluation, if multiple candidate routes remain equally preferred, BGP tie-breaking proceeds according to [RFC4271].¶
If the decision process results in multiple equally preferred paths and the deployment permits Equal Cost Multi Path (ECMP), those paths MAY be installed in the forwarding plane according to existing BGP procedures and platform capabilities.¶
Route Churn Considerations¶
While the mechanism detailed in this document aims to provide dynamic metrics like Capacity Availability Index, Site Delay Prediction Index, Service Delay Prediction Index, and Raw Measurement to optimize path selection, it is essential to consider the broader implications of metric-induced churn. Particularly, in the context of routes used for BGP nexthop resolution (e.g., labeled unicast), frequent changes in these metrics can lead to significant churn not only for the prefixes carrying the data but also for dependent routes.¶
In normal operation, the metadata associated with a prefix is propagated along with BGP UPDATE messages as per standard BGP behavior. The advertisement interval is governed by the underlying BGP mechanisms, such as the MinRouteAdvertisementIntervalTimer (MRAI), for which [RFC4271] suggests defaults of 30 seconds on eBGP sessions and 5 seconds on iBGP sessions. This document does not propose a new periodic advertisement mechanism independent of routing updates. If metadata attributes (e.g., compute availability, service locality) change, a BGP UPDATE is triggered accordingly. If there is no change to the advertised metadata, no additional UPDATE is sent, in order to avoid unnecessary update churn and to comply with BGP best practices. Any active or proactive refresh mechanisms for metadata would require explicit triggers and change detection mechanisms, which are outside the scope of this document.¶
This behavior is analogous to the impacts observed with RSVP auto-bandwidth, which can introduce considerable instability within a network. Such route churn can propagate through the network, causing a cascade of UPDATEs and potential route flaps, thereby affecting overall network stability and performance.¶
To mitigate these effects, network operators SHOULD carefully manage the advertisement intervals of these dynamic metrics, ensuring they are set to avoid unnecessary churn. The default minimum interval for metrics change advertisement, set at 30 seconds, is designed to balance responsiveness with stability. However, in scenarios with higher sensitivity to route stability, operators may consider increasing this interval further to reduce the frequency of UPDATEs.¶
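A minimal sketch of such interval management follows. The `MetricAdvertiser` class is a hypothetical helper, not part of any BGP implementation. It suppresses an UPDATE when the metric is unchanged or when the minimum interval since the last advertisement has not elapsed. The 30-second default mirrors the default interval mentioned above.¶

```python
class MetricAdvertiser:
    """Decide whether a metric change should trigger a BGP UPDATE (illustrative)."""

    def __init__(self, min_interval: float = 30.0):
        self.min_interval = min_interval   # seconds between advertisements
        self.last_sent_at = None           # time of the last advertisement
        self.last_sent_value = None        # metric value last advertised

    def should_advertise(self, value, now: float) -> bool:
        # No change to the advertised metadata: no additional UPDATE.
        if value == self.last_sent_value:
            return False
        # Changed, but the minimum interval has not elapsed yet.
        if (self.last_sent_at is not None
                and now - self.last_sent_at < self.min_interval):
            return False
        self.last_sent_at = now
        self.last_sent_value = value
        return True
```

An operator concerned about route stability could construct the helper with a larger `min_interval`. A fuller implementation would also need a timer to send the latest suppressed value once the interval expires, which this sketch omits.¶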
Significant load changes at EC data centers can be triggered by short-term gatherings of UEs, like conventions, lasting a few hours or days. Therefore, a high metrics change rate can persist for hours or days.¶
The Edge Metadata Path Attribute is an optional non-transitive BGP Path attribute that carries metrics and metadata about the edge services attached to the egress router. The Edge Metadata Path Attribute, to be assigned by IANA, consists of a set of Sub-TLVs, and each Sub-TLV contains information for specific metrics of the edge services.¶
When more than one Sub-TLV is present in an Edge Metadata Path Attribute, they are processed independently. If an Edge Metadata Path Attribute can be parsed correctly but contains a Sub-TLV whose type is not recognized by a particular BGP speaker, that BGP speaker MUST NOT consider the attribute malformed. Instead, it MUST interpret the attribute as if that Sub-TLV had not been present. Logging the event locally or to a management system is optional. If the route carrying the Edge Metadata Path Attribute is propagated with the attribute, the unrecognized Sub-TLV remains in the attribute.¶
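The handling of unrecognized Sub-TLVs can be sketched as a parser that skips them without rejecting the attribute. The Sub-TLV header assumed here (2-octet Sub-Type followed by a 1-octet Length that counts the value octets after it) is an assumption for illustration only, as is the function name `parse_sub_tlvs`.¶

```python
import struct

RECOGNIZED_SUB_TYPES = {1, 2, 3, 4, 5, 6, 7}  # values listed in the IANA sub-registry


def parse_sub_tlvs(data: bytes, recognized=RECOGNIZED_SUB_TYPES) -> dict:
    """Return {sub_type: value} for the recognized Sub-TLVs in an attribute value.

    Assumed header: 2-octet Sub-Type, then 1-octet Length of the value.
    """
    result = {}
    offset = 0
    while offset < len(data):
        if offset + 3 > len(data):
            raise ValueError("truncated Sub-TLV header")
        sub_type, length = struct.unpack_from("!HB", data, offset)
        offset += 3
        if offset + length > len(data):
            raise ValueError("truncated Sub-TLV value")
        value = data[offset:offset + length]
        offset += length
        if sub_type in recognized:
            result[sub_type] = value
        # An unrecognized Sub-Type is skipped as if it had not been present.
        # The attribute itself is not treated as malformed.
    return result
```

The parser never modifies `data`, so a speaker that propagates the attribute can forward the original octets and the unrecognized Sub-TLV remains in place.¶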
The edge service Metadata described in this document is intended to be propagated only between ingress and egress routers of a single BGP Administrative Domain [RFC1136]. A single BGP Administrative Domain can consist of one AS or multiple ASes.¶
Only a small subset of services are expected to require the Edge Metadata Path Attribute. These are typically services for which metadata-aware route selection is beneficial. The domain in which such metadata is propagated is typically operated under a common administrative policy, even when the routers are supplied by different vendors.¶
Additional non-normative examples of deployment models and metadata-aware route-selection procedures are provided in Appendix C.¶
The proposed edge service Metadata are advertised within the trusted domain of 5G LDN's ingress and egress routers. The ingress routers SHOULD NOT propagate the edge service Metadata to any nodes that are not within the trusted domain.¶
To prevent the BGP UPDATE receivers (a.k.a. ingress routers in this document) from leaking the Edge Metadata Path Attribute by accident to nodes outside the trusted domain [ATTRIBUTE-ESCAPE], the following practice SHOULD be enforced:¶
BGP Route Filtering or BGP Route Policies [RFC5291] can also be used to ensure that BGP UPDATE messages with Edge Metadata Path Attribute attached do not get forwarded out of the administrative domain. BGP route filtering [RFC5291] allows network administrators to control the advertisements and acceptance of BGP routes, ensuring that specific routes do not leak outside the intended administrative domain. Here are the steps to achieve this:¶
IANA has done early allocation [RFC7120] of the codepoint 42 to the "Edge Metadata Path Attribute" in the "BGP Path Attributes" registry in the BGP Parameters registry group. The reference for this assignment is [this document].¶
+=======+======================================+=================+
| Value | Description | Reference |
+=======+======================================+=================+
| 42 | Edge Metadata Path Attribute | [this document] |
+-------+--------------------------------------+-----------------+
¶
IANA has assigned a Capability Code of 78 from the "BGP Capability Codes" registry in "Capability Codes registry group" for the Edge Metadata Capability in the BGP OPEN message.¶
+=======+======================================+=================+
| Value | Description | Reference |
+=======+======================================+=================+
| 78 | Edge Metadata Capability | [This document] |
+-------+--------------------------------------+-----------------+
¶
IANA is requested to create a new sub-registry under the Edge Metadata Path Attribute registry as follows:¶
+==========+=====================================+=====================+
| Sub-Type | Description                         | Reference           |
+==========+=====================================+=====================+
| 0        | reserved                            | [this document]     |
+----------+-------------------------------------+---------------------+
| 1        | Site Preference Index               | [this document:4.3] |
+----------+-------------------------------------+---------------------+
| 2        | Site Physical Avail Index           | [this document:4.4] |
+----------+-------------------------------------+---------------------+
| 3        | Service Delay Prediction            | [this document:4.5] |
+----------+-------------------------------------+---------------------+
| 4        | Raw Measurement                     | [this document:4.6] |
+----------+-------------------------------------+---------------------+
| 5        | Service-Oriented Capability         | [this document:4.7] |
+----------+-------------------------------------+---------------------+
| 6        | Service-Oriented Available Resource | [this document:4.8] |
+----------+-------------------------------------+---------------------+
| 7        | AS-Scope                            | [this document:5.1] |
+----------+-------------------------------------+---------------------+
| 8-65534  | unassigned                          |                     |
+----------+-------------------------------------+---------------------+
| 65535    | reserved                            | [this document]     |
+----------+-------------------------------------+---------------------+
¶
Changwang Lin¶
New H3C Technologies¶
China¶
Email: linchangwang.04414@h3c.com¶
Acknowledgements to Jeff Haas, Tom Petch, Adrian Farrel, Alvaro Retana, Robert Raszuk, Sue Hares, Shunwan Zhuang, Donald Eastlake, Dhruv Dhody, Cheng Li, DongYu Yuan, and Vincent Shi for their suggestions and contributions.¶
When the detailed running status of data centers is not exposed to the network operator, historic traffic patterns through the egress routers can be utilized to predict the load to a specific service. For example, when traffic volume to one service at one data center suddenly increases by a large percentage compared with the average of the past 24 hours, it is likely caused by a larger than normal demand for the service. When this happens, another data center with lower-than-average traffic volume for the same service might have a shorter processing time for the same service.¶
Here are some measurements that can be utilized to derive the Service Delay Prediction for a service ID:¶
The Service Delay Prediction Index can be derived from LoadIndex/24Hour-Average. A higher value means a longer delay prediction. The egress router can use the ServiceDelayPred sub-TLV to indicate the delay prediction derived from the traffic pattern to the ingress routers.¶
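The ratio can be illustrated with a short sketch. Treating LoadIndex as a single current traffic measurement and 24Hour-Average as the arithmetic mean of samples collected over the past 24 hours is an assumption made here. The function name is hypothetical.¶

```python
from statistics import mean


def service_delay_prediction_index(load_index: float, past_24h_samples) -> float:
    """Return LoadIndex divided by the 24-hour average of the same measurement.

    A value above 1.0 means the current load exceeds the recent average,
    which this scheme reads as a longer predicted delay.
    """
    average = mean(past_24h_samples)  # raises StatisticsError on an empty list
    if average <= 0:
        raise ValueError("24-hour average must be positive")
    return load_index / average
```

For example, a current load of 300 against samples that average 100 gives an index of 3.0, while a site running at half of its average load gives 0.5 and would be predicted to respond faster.¶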
Note: The proposed IP layer load measurement is only an estimate based on the amount of traffic through the egress router, which might not truly reflect the load of the servers attached to the egress routers. They are listed here only for some special deployments where those metrics are helpful to the ingress routers in selecting the optimal paths.¶
Multiple instances of the same service could be attached to one egress router. When all instances of the same service are grouped behind one application layer load balancer, they appear to the egress router as one single route, i.e., the application load balancer's prefix. In this scenario, the compute metrics for all instances behind the application layer load balancer are aggregated and are visible to the egress router as associated with the load balancer's prefix. How the application layer load balancer distributes the traffic among different instances is out of the scope of this document.¶
When multiple instances of the same service are reachable from the egress router over different paths or links, multiple groups of metrics from the respective paths could be exposed to the egress router. The egress router can have preconfigured policies for aggregating the metrics from different paths, and corresponding policies for selecting a path when forwarding the packets received from ingress routers. The aggregated metrics, rather than detailed measurements, can be carried in the BGP UPDATE messages to reduce the number of entries advertised by the control plane and to dampen route updates in the forwarding plane. Upon receiving packets from ingress routers, the egress router can use its policies to choose an optimal path to one service instance. How the measurements are aggregated on egress routers, and how ingress routers are configured with the algorithms that integrate the aggregated metrics with network layer metrics, is out of the scope of this document.¶
Many measurements could impact and correspondingly reflect service performance. To simplify the selection process, egress routers can have preconfigured policies or algorithms that aggregate multiple metrics into a single metric advertised to ingress routers. Though out of the scope of this document, an egress router can also use an algorithm to convert multiple metrics into a network metric, such as an IGP cost for each instance, to pass to ingress nodes. The resulting decision process integrates network metrics computed by traditional IGP/BGP with the service delay metrics from egress routers, with the aim of improving service performance and resource utilization across the distributed infrastructure. When the egress router has merged the compute metrics from the local sites behind it, it can include one or more aggregated compute metrics in the Metadata Path Attribute in the BGP UPDATE sent to the ingress router. An identifier or flag can also be carried to indicate that the metrics are merged. After receiving the routes for the Service ID with the identifier, the ingress router performs route selection based on preconfigured algorithms (see Section 3 of this document).¶
As the service metrics and network delays are in different units, here is an exemplary algorithm for an ingress router to compare the cost to reach the service instances at Site-i or Site-j.¶
                   ServD-i * CP-j               Pref-j * NetD-i
Cost-i = min(w * (----------------) + (1-w) * (-----------------))
                   ServD-j * CP-i               Pref-i * NetD-j
¶
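The example algorithm above can be sketched in code under several assumptions: ServD is read as service delay, CP as capacity, NetD as network delay, Pref as site preference, w as a weight between 0 and 1, and min as the minimum over the other candidate sites j. These readings, and the names `Site`, `pairwise_cost`, and `cost`, are illustrative only.¶

```python
from dataclasses import dataclass


@dataclass
class Site:
    serv_d: float  # ServD (assumed: service delay)
    cp: float      # CP (assumed: capacity)
    net_d: float   # NetD (assumed: network delay)
    pref: float    # Pref (assumed: site preference)


def pairwise_cost(i: Site, j: Site, w: float) -> float:
    """Weighted sum of the two ratios in the formula for Site-i against Site-j."""
    service_term = (i.serv_d * j.cp) / (j.serv_d * i.cp)
    network_term = (j.pref * i.net_d) / (i.pref * j.net_d)
    return w * service_term + (1 - w) * network_term


def cost(i: Site, others, w: float) -> float:
    """Cost-i, taking the minimum over the other sites (an assumed reading of min)."""
    return min(pairwise_cost(i, j, w) for j in others)


site_i = Site(serv_d=10, cp=2, net_d=5, pref=1)
site_j = Site(serv_d=20, cp=2, net_d=5, pref=1)
cost_i = cost(site_i, [site_j], w=0.5)
cost_j = cost(site_j, [site_i], w=0.5)
```

Because every term is a ratio between the two sites, the differing units of service and network delay cancel out. In this example Site-i has half the service delay of Site-j and is otherwise identical, so `cost_i` is lower than `cost_j` and Site-i is preferred.¶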
When a set of service Metadata is converted to a simple metric, a decision process is determined by the metric semantics and deployment situations. The goal is to integrate the conventional network decision process with the service Metadata into a unified decision-making process for path selection.¶
Not all metadata attributes specified in this document are intended for use in every deployment. Each deployment may choose to consider only a subset of the available metadata attributes based on its specific service requirements.¶
- Deployment-Specific Attribute Selection:¶
A deployment may prioritize only certain metadata attributes relevant to its operational needs. For example, one deployment might only use the Service Delay Prediction Index for latency-sensitive applications, while another might focus solely on the Capacity Availability Index to manage resource availability.¶
- Influence on BGP Decision Process:¶
The edge service Metadata influences next-hop selection differently from traditional BGP metrics (e.g., Local Preference, MED). Unlike a general next-hop metric that can affect many routes, edge service Metadata selectively impacts optimal next-hop selection for specific routes configured to consider these service-specific attributes. This targeted influence allows for optimized path selection without disrupting broader route decisions.¶
- Handling Degraded Metrics (Policy-Based):¶
If a service-specific metric degrades beyond a configured threshold (e.g., the Service Delay Prediction Index exceeds an acceptable delay threshold or the Capacity Availability Index drops below a required level), the ingress router will treat that route as ineligible for traffic steering. This is similar to a BGP route withdrawal, where the degraded route is deprioritized or ignored, even if traditional BGP attributes would otherwise favor it. This ensures that traffic is directed only to service instances that meet the defined performance criteria.¶
- Fallback to Non-Metadata Routes:¶
If no suitable routes with the required metadata are available, the BGP decision process defaults to traditional attribute evaluation [RFC4271], ensuring consistent routing even when metadata-specific paths are absent.¶
This approach provides flexibility and adaptability in routing decisions, allowing each deployment to apply relevant metadata attributes and enforce performance thresholds for improved service quality.¶
This appendix provides non-normative examples of how a deployment may apply the procedures described in Section 7.¶
In a deployment where the Route Reflector (RR) is the primary policy decision point, the RR may apply metadata-aware local policy when selecting routes for reflection. In such a deployment, routers that rely on the RR for best-path selection receive routes that already reflect the policy outcome.¶
In deployments where the RR is responsible for pre-selecting routes, the RR may combine recognized Edge Metadata with traditional BGP attributes when determining the preferred route for a service. The RR can then reflect only the selected route to its client routers, such as ingress PEs, in accordance with local policy. This can help align reflected routes with service-specific requirements while limiting the number of routes distributed to clients.¶
Deployments using this model SHOULD consider Optimal Route Reflection (ORR) [RFC9107] so that route selection reflects the perspective of ingress routers rather than the physical location of the RR.¶
In some deployments, the RR may reflect multiple candidate routes, for example by using Add-Paths. In such a deployment, the ingress node receives those candidate routes and applies local metadata-aware policy to determine the preferred route for the selected service.¶
The ingress node may combine recognized metadata values with traditional BGP attributes when deriving route preference. This allows the ingress node to make service-specific routing decisions based on its local policy and on the metadata available for the candidate routes.¶
In a deployment where routers exchange iBGP routes directly in addition to receiving reflected routes, all participating nodes, including any RR, should apply consistent metadata-aware policy so that route selection remains aligned across the administrative domain.¶
In this model, the RR is not the sole policy decision point. Instead, each node that performs metadata-aware preference computation applies consistent policy to the same set of recognized metadata and routing attributes. This helps reduce the risk of inconsistent route selection among nodes that receive the same candidate routes.¶
A deployment may choose to assign greater weight to recognized metadata values than to traditional routing attributes, may weigh them equally, or may treat metadata only as a secondary refinement after traditional routing considerations. The weighting method is deployment specific and is not specified by this document.¶
For example, one deployment may emphasize service-delay-related metadata for latency-sensitive services, while another may emphasize availability-related or resource-related metadata. Another deployment may use metadata only after candidate routes have already been narrowed by traditional BGP policy. These examples are illustrative only and do not impose any required computation method.¶