A Framework for Fast Network Notifications

Internet-Draft	FANN Framework	June 2026
Song & Dong	Expires 25 December 2026

Abstract

Many network applications, ranging from Artificial Intelligence (AI) / Machine Learning (ML) training and inference to large-scale cloud services, require networks with various combinations of high bandwidth, low delay, low jitter, and minimal packet loss. Meeting these requirements depends on the network's ability to adapt rapidly to faults, signal degradation, and congestion. The companion problem statement describes why existing mechanisms are too slow, too coarse, or too resource-intensive to react within the timescales at which modern forwarding hardware can detect such conditions.¶

This document defines a framework for Fast Network Notifications (FANN). It describes a reference architecture, the functional roles involved in generating and consuming notifications, an information model, delivery and scoping models, procedures for discovery, registration, and subscription, and the integration of fast network notifications with existing Layer 2 to 4 mechanisms. This framework is intended to guide the development of one or more fast network notification protocol specifications.¶

1. Introduction

Modern high-performance networks, in particular data center (DC) and data center interconnect (DCI) fabrics serving AI/ML and cloud workloads, demand rapid adaptation to changing network conditions. A single fiber link failure, signal degradation, or transient congestion event can stall a distributed training job, waste compute and energy, and degrade service experience [I-D.ietf-rtgwg-net-notif-ps].¶

Contemporary forwarding hardware can detect link failures, signal degradation reported as link errors, queue buildup, microbursts, and output-queue congestion at microsecond to sub-millisecond timescales. However, the time required to disseminate this information to the remote nodes that can act on it typically far exceeds the detection time. This gap between detection and reaction is the central problem that fast network notifications address.¶

The Fast Network Notifications Problem Statement [I-D.ietf-rtgwg-net-notif-ps] documents the need for a fast notification mechanism and the limitations of existing approaches. The companion requirements [I-D.geng-fantel-fantel-requirements] and gap analysis [I-D.geng-fantel-fantel-gap-analysis] documents elaborate the requirements and the deficiencies of current technologies. Building on these documents, this document defines a framework that describes the overall architecture, the functional roles, the information carried, how notifications are delivered and scoped, and how the mechanism integrates with existing protocols and technologies across layers.¶

This informational document does not define a wire protocol, encoding, or YANG model. Those are expected to be specified in separate protocol and management documents that build on this framework.¶

1.1. Scope

This framework applies to limited-domain networks under a single administrative control, consistent with the deployment assumptions of the FANN charter. It prioritizes the requirements of DC and DCI networks where rapid responsiveness is critical, while remaining applicable to other deployments such as wide-area backbone networks.¶

The framework initially targets notifications for link failures, signal degradation reported as link errors, and port queue congestion, while remaining extensible to additional conditions in the future. The specific actions a recipient takes in response to a notification (for example fast reroute, adaptive load balancing, or rate adjustment) are out of scope of this framework; they are the responsibility of the consuming subsystem and the protocols that realize those actions.¶

In this document, "fast" does not denote a single rigid numerical threshold. It characterizes a class of mechanisms designed to minimize notification delivery time so that the latency is on the order of microseconds to milliseconds, depending on the operational objective and the diameter of the notification domain, and is substantially shorter than the Round-Trip Time (RTT) of the affected traffic.¶

This framework is solution-agnostic. It defines the functional roles, information model, and delivery and scoping models that a fast network notification solution is expected to instantiate, but it does not specify, mandate, or endorse any particular protocol, encoding, or solution document. It is intentionally general so that a range of realization approaches can conform to it, potentially in combination, without conflicting with one another or with this framework. Consistent with the FANN charter, fast generation and consumption in the forwarding plane (ideally in hardware) is the primary design point and the means of meeting the latency targets described above; consumption by the control plane or management plane is a secondary objective, permitted only where it preserves routing stability and does not compromise forwarding-plane responsiveness. Specific solutions are developed in separate documents; such documents are expected to map their behavior onto the roles and models defined here, and any capability they require that is not yet covered is expected to be accommodated as an extension of this framework rather than a departure from it.¶

1.2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶

3. Fast Network Notification Framework

This chapter defines the core of the framework: its design principles, the deployment scenarios it serves, the functional reference architecture, the information carried in notifications, and the models for delivering, scoping, discovering, and controlling them.¶

3.1. Design Principles and Goals

The framework is guided by the following principles, derived from the problem statement [I-D.ietf-rtgwg-net-notif-ps] and requirements [I-D.geng-fantel-fantel-requirements].¶

Event-driven, not periodic:: Notifications are generated in response to detected events. This distinguishes fast network notifications from preconfigured periodic mechanisms such as BFD [RFC5880], which detect rather than disseminate.¶
Forwarding-plane optimized:: The primary design point is fast generation and consumption in the forwarding plane, ideally in hardware, to meet responsiveness targets. Consumption by the control plane and management plane is a secondary objective and MUST NOT compromise routing stability.¶
Lightweight and bounded:: Notification messages are compact and the system is designed to bound the load it places on the network, especially during the very events it reports. The notification system MUST NOT exacerbate a failure or congestion event.¶
Action-agnostic:: A notification message conveys information; it does not mandate a specific reaction. A notification MAY explicitly indicate a recommended action, or the action MAY be determined implicitly by the consumer from the information carried.¶
Extensible:: The information model and event taxonomy are extensible to additional conditions, metrics, and scopes without redefining the architecture. Extensibility is equally a principle for the protocol design: the notification message encoding SHOULD allow new information elements, event types, and optional fields to be added in a forward-compatible way, such that a receiver can skip or ignore any element it does not understand rather than discarding the notification. This lets the set of carried information evolve over time without breaking interoperability with existing implementations or requiring a new protocol version.¶
Complementary:: Fast network notifications complement, and do not replace, existing OAM, control-plane, and telemetry mechanisms. They bridge the time gap between event onset and slower control-plane or telemetry-driven responses.¶
Scoped and isolated:: Notifications are confined to a notification domain. Domain identification and isolation are first-class concerns.¶
Decoupled from routing convergence:: Fast-changing network state SHOULD be conveyed by mechanisms that are separate from the routing protocol database and its flooding and best-path computation, so that high-frequency or transient events do not introduce churn, instability, or excessive recomputation in the routing control plane. This separation lets notifications be generated and refreshed at a fast pace independently of routing convergence, while any consumption by routing remains a secondary objective bounded by the routing-stability constraint above.¶
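
The forward-compatibility aspect of the Extensible principle can be illustrated with a minimal sketch. This framework defines no wire encoding, so everything here is hypothetical: the type-length-value layout (1-byte type, 2-byte length), the type codes, and the function names are assumptions chosen only to show a receiver skipping an element it does not understand while keeping the rest of the notification.

```python
import struct

# Hypothetical element type codes; the framework itself defines no encoding.
EVENT_TYPE = 1
LOCATION = 2
KNOWN_TYPES = {EVENT_TYPE: "event_type", LOCATION: "location"}


def encode_tlv(elem_type: int, value: bytes) -> bytes:
    """Encode one element as 1-byte type, 2-byte big-endian length, value."""
    return struct.pack("!BH", elem_type, len(value)) + value


def parse_elements(payload: bytes) -> dict:
    """Parse a TLV sequence, skipping elements whose type is not understood."""
    elements = {}
    offset = 0
    while offset + 3 <= len(payload):
        elem_type, length = struct.unpack_from("!BH", payload, offset)
        offset += 3
        if offset + length > len(payload):
            raise ValueError("truncated element")
        value = payload[offset:offset + length]
        offset += length
        name = KNOWN_TYPES.get(elem_type)
        if name is not None:
            elements[name] = value
        # Unknown types are stepped over using their length field, so the
        # rest of the notification is still usable.
    if offset != len(payload):
        raise ValueError("trailing bytes after last element")
    return elements
```

A receiver built against this sketch would accept a message that carries a later-defined element (for example type 99) and still extract the event type and location it knows about.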

3.2. Deployment Scenarios

Fast network notifications apply across a range of network scenarios, but the time budget, processing constraints, and the mechanisms that are practical differ substantially between them. This framework does not assume one-size-fits-all: the scenarios below have materially different characteristics, and a given deployment is expected to select the delivery mode, scope, and realization mechanism appropriate to its scenario rather than apply a single mechanism everywhere.¶

Intra-data-center fabric:: Within a single DC fabric (for example a Clos topology), originators and consumers are typically a small number of hops apart, propagation delay is very low, and forwarding hardware can both detect and consume notifications. This scenario has the tightest time budget (sub-millisecond) and is the most amenable to forwarding-plane, in-band, or scoped-flooding delivery with actions such as adaptive load balancing or local repair. The dominant challenge is volume and rate of change at scale rather than propagation distance.¶
Single-hop DCI / point-to-point WAN:: Between two sites or routers connected by one (logical) hop, the recipient set is small and often known in advance, favoring unicast or a directed notification to the upstream or ingress node. The time budget is dominated by the link propagation delay, which is fixed; the design goal is to add minimal processing delay on top of it so the notification still beats the affected traffic's E2E reaction loop.¶
Multi-hop managed DCI interconnect:: Data centers may also be interconnected across multiple IP hops by a managed network, for example for AI collaborative computing and other DCI services. Unlike arbitrary WAN paths, this case is typically highly engineered: traffic follows deterministic, traffic-engineered (TE) paths, and network slicing can be used to isolate tenants. Because the path and the relevant upstream or ingress nodes are known in advance, notifications can be delivered either unicast or hop-by-hop along the path and scoped per slice or per tenant, and the resulting action can be applied at tenant or path granularity rather than only per physical link or node.¶
Multi-hop / arbitrary WAN paths:: When notifications must reach nodes several hops away or across a wider domain, propagation delay, the number of potential recipients, and the risk of notification storms all grow. Time budgets are typically milliseconds rather than sub-millisecond, and subscription, relaying with filtering/aggregation, and bounded scoping become essential. Some timeliness targets achievable intra-DC may simply not be feasible here; the framework expects such cases to use more conservative techniques, and the feasibility of meeting a specific target is itself scenario-dependent.¶

The functional roles (Section 3.3), information model (Section 3.4), and delivery modes (Section 3.5) defined in this document are common across scenarios, but their realization and the achievable latency are not. Where a requirement (for example a sub-millisecond target) is stated, it SHOULD be understood as scoped to the scenario in which it is feasible.¶

3.3. Reference Architecture

3.3.1. Functional Roles

The framework defines four functional roles. A single physical or virtual network element MAY implement more than one role.¶

        +-------------------------------------------------+
        |              Notification Domain                |
        |                                                 |
        |   [Detect]      [Distribute]        [Act]       |
        |                                                 |
        |  +----------+  +-----------+    +-----------+   |
        |  |Originator|->|  Relay    |--->| Consumer  |   |
        |  | (detect/ |  | (forward/ |    | (receive/ |   |
        |  | generate)|  |  filter/  |    |  action)  |   |
        |  +----+-----+  |  damp)    |    +-----+-----+   |
        |       |        +-----+-----+          |         |
        |       |              |                |         |
        |       v              v                v         |
        |  ........................................       |
        |  :        Notification Controller       :       |
        |  : (discovery / registration / policy / :       |
        |  :        global optimization)          :       |
        |  ........................................       |
        +-------------------------------------------------+

Figure 1: FANN Functional Roles Within One Domain

Notification Originator:: Detects an event using local detection mechanisms (for example link fault detection, error counters, queue occupancy thresholds, or BFD [RFC5880] as a detection input) and generates a notification. The originator determines, by policy or signaling, the set of consumers and the delivery mode, and applies origination-side controls such as damping and rate limiting.¶
Notification Relay:: Receives a notification and forwards it toward additional consumers. A relay MAY filter, aggregate, deduplicate, or damp notifications. Relays enable hop-by-hop and scoped-flooding delivery and allow load to be bounded inside the domain.¶
Notification Consumer:: Receives a notification and may act on it in the data plane (for example rate adjustment, ECMP rebalancing, flow steering, or traffic pause), and/or pass the information to the control plane or management plane. A consumer MAY also be a relay.¶
Notification Controller:: An optional entity that supports discovery, registration, subscription, and policy distribution, and that may consume notifications for global traffic-engineering or load-balancing optimization. The controller is not required to be on the fast delivery path and SHOULD NOT be a single point of failure for forwarding-plane reactions.¶

3.3.2. Notification Lifecycle

A fast network notification proceeds through the following stages.¶

Detection. A node observes an event at the forwarding plane using a local detection mechanism. Detection mechanisms are out of scope of this framework, but their output is the trigger for notification generation.¶
Generation. The originator constructs a notification populated from the information model (Section 3.4), subject to origination policy, damping, and rate limiting.¶
Delivery. The notification is delivered to the intended consumers using one of the delivery modes (Section 3.5), possibly via one or more relays.¶
Consumption. A consumer parses the notification and decides whether and how to act, based on the information carried and on any local state it holds.¶
Action. The consumer performs an action (out of scope of this framework) and may relay the notification further.¶
Recovery and withdrawal. When the condition clears, the originator may generate a recovery notification so consumers can revert or update their state. Recovery notifications are subject to damping so that flapping conditions do not generate excessive traffic.¶
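
The onset and recovery stages above can be sketched as a small originator-side state machine. This is an illustration under stated assumptions, not a specified behavior: the class name `RecoveryDamper`, the `hold_down` parameter, and the choice to report onset immediately while delaying recovery until the condition has stayed clear for a hold-down interval are hypothetical, modeled on the damping policy described in Section 3.7.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryDamper:
    """Report onset at once; report recovery only after a stable hold-down."""
    hold_down: float                      # seconds the condition must stay clear
    reported_down: bool = False           # what consumers currently believe
    clear_since: Optional[float] = None   # time the condition last cleared

    def observe(self, condition_present: bool, now: float) -> Optional[str]:
        """Feed one detection sample; return 'onset', 'recovery', or None."""
        if condition_present:
            self.clear_since = None       # any recurrence restarts the hold-down
            if not self.reported_down:
                self.reported_down = True
                return "onset"
            return None
        if not self.reported_down:
            return None                   # nothing outstanding to withdraw
        if self.clear_since is None:
            self.clear_since = now
        if now - self.clear_since >= self.hold_down:
            self.reported_down = False
            self.clear_since = None
            return "recovery"
        return None
```

With this sketch a link that flaps several times inside the hold-down interval produces one onset notification and, once it stays up, one recovery notification.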

3.3.3. Detection Assumptions and Constraints

The mechanisms by which a node detects an event are out of scope of this framework, but the framework assumes their existence and depends on their characteristics. Two assumptions are important.¶

First, the E2E responsiveness of a fast notification system is bounded by detection time as well as delivery time: a notification cannot be faster than the moment the originating node becomes aware of the condition. Detection latency, accuracy, and false-positive behavior therefore directly shape what the notification system can achieve, and an event that is detected slowly or unreliably limits the value of fast delivery.¶

Second, detection itself has a cost that interacts with scaling. For example, achieving fast liveness detection by running BFD [RFC5880] at very short transmit intervals consumes forwarding and control resources and does not by itself notify any node beyond the BFD endpoints. Driving detection intervals down to obtain faster notification can impose significant load, and this trade-off between detection speed and detection cost SHOULD be considered together with the notification load discussed in Section 4.4. Where hardware can detect a condition directly (for example loss of signal, FEC errors, or queue-occupancy thresholds), such detection is generally preferable to mechanisms that rely on periodic message exchange such as BFD. The relevant distinction is between hardware-based and protocol-session-based detection in terms of speed and overhead, rather than between polling and non-polling as such: a hardware mechanism may itself poll internally, but its detection latency and per-event cost are typically far lower than those of a protocol session driven to an aggressive interval.¶

3.4. Information Model

A fast network notification carries one or more information elements. For a given scenario some elements are mandatory and others optional; the framework does not require all elements in every notification. The detailed encoding is left to protocol specifications. The information elements are:¶

Event Type:: The class of event, for example failure, signal degradation, congestion, or performance degradation, and whether the notification reports onset or recovery. The event taxonomy is extensible.¶
Location of Event:: An identifier of where the event occurred, for example a link, node, interface, or queue identifier. Location identifiers SHOULD be interpretable by consumers within the notification domain.¶
Fine-grained Network Status:: Quantifiable metrics such as link utilization, available bandwidth, link capacity, queue length, level of congestion, link or node delay, jitter, and packet loss. Conveying such quantitative metrics, rather than a binary up/down indication, enables graduated and proportional responses such as weighted load-sharing adjustments.¶
Path Identification:: Identification of the path affected by the event, allowing consumers to scope their reaction to specific paths.¶
Flow/Service Identification:: Identification of an affected flow (for example a 5-tuple) or service, allowing differentiated, per-flow or per-service responses.¶
Timing and Validity:: Optional event timestamp and a validity or hold time after which the reported condition should be considered stale absent refresh or recovery.¶
Action Hint:: An optional explicit indication of a recommended action. When absent, the consumer determines the action implicitly from the other elements.¶
Origin and Sequence:: Originator identity, which may be represented by the source IP address of the notification, and an optional sequence or epoch indicator. The sequence or epoch supports ordering, deduplication, and loop detection at relays and consumers; it is RECOMMENDED where notifications may be relayed, flooded, or reordered, and MAY be omitted in simple cases such as single-hop unicast delivery where such protection is unnecessary.¶

A consistent information model across implementations is necessary for interoperability; defining the normative model and encodings is a task for the protocol specification.¶
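
As a non-normative illustration, the elements listed above can be written down as a single record. The field names, the types, and the split between required and optional fields are assumptions made for this sketch; the framework leaves both the normative model and the encoding to the protocol specification. The `is_stale` helper shows one possible reading of the Timing and Validity element.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EventType(Enum):
    FAILURE = "failure"
    SIGNAL_DEGRADATION = "signal-degradation"
    CONGESTION = "congestion"
    PERFORMANCE_DEGRADATION = "performance-degradation"


@dataclass
class Notification:
    event_type: EventType                 # Event Type (class of event)
    recovery: bool                        # Event Type (False = onset, True = recovery)
    location: str                         # Location of Event
    origin: str                           # Origin (for example a source IP address)
    sequence: Optional[int] = None        # Sequence or epoch indicator
    metrics: dict = field(default_factory=dict)   # Fine-grained Network Status
    path_id: Optional[str] = None         # Path Identification
    flow_id: Optional[tuple] = None       # Flow/Service Identification (e.g. a 5-tuple)
    timestamp: Optional[float] = None     # Timing: event time, in seconds
    validity: Optional[float] = None      # Validity: hold time, in seconds
    action_hint: Optional[str] = None     # Action Hint

    def is_stale(self, now: float) -> bool:
        """True once the hold time has elapsed; never stale if timing is absent."""
        if self.timestamp is None or self.validity is None:
            return False
        return now - self.timestamp > self.validity
```

A consumer holding such a record could discard the reported condition once `is_stale` returns true and no refresh or recovery has arrived.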

3.5. Delivery and Scoping

3.5.1. Delivery Modes

Depending on the position and number of consumers, the framework supports the following delivery modes. A scenario MAY use more than one.¶

Unicast:: Direct delivery to a single consumer. Suitable when the originator knows the specific node that must react (for example a designated ingress or upstream node).¶
Multicast / Point-to-Multipoint:: Delivery to a selected group of consumers, for example along a service or forwarding path. Suitable when a defined set of nodes must react together.¶
Hop-by-hop:: Delivery along a series of nodes on a specified path, with each node acting as a relay and possibly a consumer. Suitable for propagating awareness upstream along an affected path.¶
Scoped Flooding:: Dissemination to all nodes within a bounded region of the domain. Suitable for critical events with many interested consumers, with special attention to control overhead and duplicate suppression.¶

3.5.2. Transport Considerations

Delivery MAY reuse existing messaging and transport mechanisms, or a new lightweight mechanism MAY be defined where existing ones cannot meet the latency or forwarding-plane processing targets. Regardless of the underlying transport, the delivery mechanism is responsible for timely delivery to the intended consumers and for bounding the load it introduces.¶

Because notifications are most valuable precisely when the network is under stress, the transport MUST support prioritization so that notifications are not delayed or dropped behind the very congestion they report. A notification that is queued behind the congested traffic loses most of its value. Prioritization can be realized using existing forwarding-plane mechanisms, including:¶

DiffServ marking, for example a dedicated DSCP [RFC2474] code point mapped to a high-priority or low-latency per-hop behavior (for example Expedited Forwarding [RFC3246]) along the notification path, so that classification and queuing of notifications can be done in hardware at every hop;¶
a strict-priority or low-latency queue, or a dedicated control-class queue, separated from user-data queues so notifications bypass congested data queues;¶
at Layer 2, priority marking such as IEEE 802.1p / PCP where the delivery path traverses bridged segments.¶

The chosen marking and per-hop behavior MUST be consistent across the notification domain so that priority is honored E2E within the domain. Operators MUST be able to configure the marking, and the markings used for notifications SHOULD be reserved so that ordinary traffic cannot claim the same priority and so that notification traffic itself cannot be abused to obtain preferential treatment (Section 5). Because notifications occupy a high-priority class, their volume MUST be bounded by the rate limiting, damping, and filtering of Section 3.7 to avoid starving other control traffic.¶
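
For a software originator, the DiffServ option above can be sketched with the standard socket API. The use of UDP over IPv4 and of the Expedited Forwarding code point (46) itself, rather than a separately reserved code point mapped to an EF-like behavior, are assumptions made for illustration; the function names are hypothetical. A hardware originator would apply the marking in the forwarding pipeline instead.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding code point (RFC 3246)


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper six bits of the IPv4 TOS / DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_notification_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # IP_TOS sets the whole DS byte; the low two bits (ECN) are left at zero.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Marking at the sender is only half of the picture: every hop in the notification domain still has to map that code point to the same high-priority queue for the priority to hold end to end.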

Reliability requirements vary by scenario: some events warrant best-effort, low-latency delivery, while others (for example recovery state) may warrant acknowledgement or periodic refresh.¶

3.5.3. Notification Domain and Isolation

Fast network notifications are confined to a notification domain. The framework requires mechanisms to:¶

identify a notification domain and its membership;¶
ensure notifications are not propagated outside the domain without explicit policy;¶
prevent notifications from one domain being injected into or trusted by another.¶

Domain scoping bounds the blast radius of both legitimate notification storms and malicious injection, and it aligns the trust boundary with the single administrative control assumed by the charter.¶

3.6. Discovery, Registration, and Subscription

To deliver notifications only to interested and authorized consumers, the framework supports the following procedures. A deployment MAY use configuration, dynamic signaling, or a combination.¶

Discovery:: Originators and consumers determine the existence, identity, and capabilities (event types, encodings, delivery modes) of relevant peers and relays within the domain.¶
Registration:: A node registers as a potential originator or consumer within the domain, establishing the trust and addressing state needed for delivery.¶
Subscription:: A consumer expresses interest in specific event types, locations, paths, flows, or metric thresholds. A subscription-based approach ensures each consumer receives only relevant information, reducing unnecessary overhead. Subscriptions MAY be brokered by a controller or established directly between nodes.¶

These procedures MAY be realized by reusing existing protocols where appropriate, or by new mechanisms defined in the protocol specification work.¶
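
The subscription procedure can be sketched as a simple match between a notification and each consumer's stated interest. The `Subscription` fields, the "at or above threshold" semantics for metrics, and the helper name `interested_consumers` are assumptions for this example; the framework does not define a subscription schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    consumer: str
    event_types: Optional[frozenset] = None   # None means any event type
    locations: Optional[frozenset] = None     # None means any location
    metric: Optional[str] = None              # metric name to test, if any
    threshold: float = 0.0                    # match when metric >= threshold

    def matches(self, event_type: str, location: str, metrics: dict) -> bool:
        if self.event_types is not None and event_type not in self.event_types:
            return False
        if self.locations is not None and location not in self.locations:
            return False
        if self.metric is not None:
            value = metrics.get(self.metric)
            if value is None or value < self.threshold:
                return False
        return True


def interested_consumers(subs, event_type, location, metrics):
    """Return the sorted names of consumers whose subscription matches."""
    return sorted({s.consumer for s in subs
                   if s.matches(event_type, location, metrics)})
```

An originator, relay, or controller holding such a table would deliver a notification only to the returned set, which is how subscription reduces unnecessary overhead.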

3.7. Loop Prevention, Filtering, and Damping

Because relays may forward notifications and consumers may relay further, the solution MUST provide for:¶

Loop prevention:: Use of origin identity, sequence/epoch indicators, scope limits (for example a hop or region bound), and duplicate suppression so that a notification does not circulate indefinitely.¶
Filtering and aggregation:: Relays MAY filter notifications that are not relevant to downstream consumers and MAY aggregate multiple related events to reduce volume.¶
Damping:: The solution MUST define where responsibility lies for handling rapidly changing conditions, such as a flapping link. Damping MAY be applied at the originator, at a transit relay, or at the consumer; the chosen location and its controls MUST be specified explicitly. A common policy is to report a degradation immediately but to delay reporting the corresponding recovery for a configurable interval to confirm stability.¶
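
The loop-prevention item above can be sketched as a relay-side check that combines a hop bound with duplicate suppression keyed on originator identity and sequence number. This is a minimal sketch under assumptions: the class and parameter names are hypothetical, sequence numbers are taken to increase monotonically per originator, and wraparound and epoch changes are deliberately not handled.

```python
class RelayFilter:
    """Decide whether a relay should forward a notification."""

    def __init__(self) -> None:
        self._highest_seq = {}   # originator identity -> highest sequence seen

    def should_forward(self, origin: str, sequence: int, hops_left: int) -> bool:
        """Return True if the notification is new and still within scope."""
        if hops_left <= 0:
            return False                      # scope limit reached
        last = self._highest_seq.get(origin)
        if last is not None and sequence <= last:
            return False                      # duplicate or older copy
        self._highest_seq[origin] = sequence
        return True
```

A relay would call `should_forward` on receipt and, when it returns true, forward the notification with the hop count decremented by one, so that a copy arriving over a second path or circulating in a loop is dropped.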

4. Realization and Operational Considerations

This chapter describes how the framework relates to existing technologies, the candidate mechanisms that could realize it, the applications it enables, and the scaling and operational considerations that apply when deploying it. It is informational and does not mandate any particular realization.¶

4.1. Integration with Existing Technologies

A central goal of the framework is integration with existing mechanisms across layers, as required by the charter. Fast network notifications are complementary to these mechanisms.¶

Layer 2:: Link-layer fault and error detection (for example physical-layer alarms, FEC error counters, and interface error statistics) are detection inputs to the originator. Layer 2 protection may act as a consumer's response.¶
Layer 3 / Routing:: Fast network notifications complement IGP/BGP-based dissemination and FRR [RFC4090] [RFC5714]. Whereas a Point of Local Repair acts on a local topology view and may cause congestion on a backup path, a notification can give upstream nodes a wider view before they react. Consumption by routing protocols is a secondary objective and MUST preserve routing stability; notifications MUST NOT be allowed to induce control-plane churn or instability. Topology and inventory models such as [RFC8345] may provide context for interpreting location and path identifiers.¶
Layer 4 / Transport:: ECN [RFC3168] signals congestion to the transport sender only coarsely and over a full RTT. Fast network notifications can deliver richer congestion information to network nodes far sooner, but they then act inside the network while end-to-end transport congestion control (TCP, QUIC, or RDMA/RoCE) acts at the endpoints, so the two loops run concurrently on the same traffic. A solution MUST ensure its actions remain a net-positive complement to transport: it SHOULD preserve per-flow ordering and avoid abrupt RTT or capacity changes where feasible, and SHOULD NOT suppress or rewrite end-to-end congestion signals such as ECN marks. Where the interaction cannot be shown to be benign, a conservative reaction is preferred; detailed coordination between network-side and transport-side reactions is for further study with the relevant transport working groups.¶
Detection and OAM:: BFD [RFC5880] provides fast bidirectional fault detection between endpoints but does not notify other nodes; it can serve as a detection input to the originator. Obtaining faster detection by shortening BFD transmit intervals increases resource consumption, as discussed in Section 3.3.3. IOAM [RFC9197], the Alternate-Marking Method (AM) [RFC9341], and IPFIX [RFC7011] provide detailed data-plane measurements but are not designed for lightweight, rapid alerts to specific nodes for immediate action. Performance metrics may be defined consistently with [RFC7799]. Fast network notifications fill this gap and feed, rather than replace, telemetry pipelines.¶

The interaction with each technology, including any required protocol extensions, is expected to be developed in the relevant IETF working groups.¶

4.2. Candidate Realization Approaches

This section surveys, non-normatively, classes of mechanism that could realize fast network notifications. It does not endorse a specific approach; the choice depends on the deployment scenario (Section 3.2), and a solution MAY combine more than one.¶

Advertisement of link/path status to neighbors, decoupled from the IGP:: A node advertises the up/down status and quality of its links or paths to neighboring nodes, separately from the IGP and its link-state database, so that fast or frequent updates do not perturb routing convergence (see the decoupling principle in Section 3.1). [I-D.zzhang-rtgwg-router-info] illustrates this approach; note that what is advertised is link/path reachability (up/down) and quality, which is distinct from the node and link state that an IGP floods. It suits cases where the consumers are routing or forwarding elements that benefit from a wider view without incurring IGP churn.¶
IGP/BGP protocol extensions:: Existing control-plane protocols could be extended to carry notification information. This reuses deployed machinery but must be weighed carefully against the routing-stability and overhead concerns in Section 4.1, and is generally better suited to slower-changing or control-plane-consumed information than to the fastest forwarding-plane reactions.¶
In-band / data-plane signaling:: Notifications are carried in the forwarding plane, for example in packet headers or lightweight dedicated packets, so that detection, delivery, and consumption can occur in hardware. This offers the lowest latency and best matches the intra-DC scenario, at the cost of requiring forwarding-hardware support and careful scoping. The need for such forwarding-plane notification in AI data center fabrics is motivated by [I-D.clad-rtgwg-ipfrr-aiml], which analyzes the limitations of existing IP-FRR in these fabrics and the requirements for its enhancement. As examples of the notification mechanisms, [I-D.camarillo-rtgwg-lsn] defines a fast notification protocol that operates above the Ethernet layer, and [I-D.csaszar-rtgwg-ipfrr-fn] proposed fast-notification-based IP-FRR optimization over a decade ago, with the companion [I-D.lu-fn-transport] defining a data-plane transport and message container for the notifications themselves.¶
Tunnel- or overlay-based delivery in the WAN:: For multi-site or WAN deployments, notifications may be delivered over established tunnels or overlays toward ingress or upstream nodes; work such as [I-D.hzh-fantel-wan-tunnel] explores fast notification in this context for tunnel-based transport.¶
Telemetry-assisted collection toward the traffic source:: Rather than pushing an alert outward from the detecting node, path latency and congestion state are accumulated in-band along the path and returned to the traffic source, so that the source obtains fresh path state and can steer traffic or adjust its congestion response. FALCON [I-D.song-rtgwg-falcon] realizes this by combining in-network telemetry with source routing and collecting the data on the reverse path toward the source, reducing the feedback lag to less than half the baseline RTT; it applies to both DCN and WAN and reuses existing IETF mechanisms.¶
Path- and slice-scoped flow control and backpressure:: Congestion or available-bandwidth information is notified to upstream nodes along the forwarding path and acted upon as flow control, scoped per path segment or per slice so that control applies at tenant or task granularity. [I-D.liu-rtgwg-srv6-cc] uses SRv6 segments and slicing to throttle specific flows for lossless transmission, and [I-D.han-rtgwg-fine-grained-backpressure] extends Layer 2 PFC into the WAN with hop-by-hop backpressure messages and slice-based isolation. This approach suits the managed multi-hop DCI scenario of Section 3.2.¶

Each approach trades off latency, hardware dependence, protocol reuse, and impact on routing stability differently, and fits some scenarios in Section 3.2 better than others. Coordination among multiple recipients that act on the same notification is out of scope and left for further study.¶
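
As a rough illustration of the telemetry-assisted approach listed above, the following Python sketch accumulates path state hop by hop, as it might be gathered before being returned to the traffic source. It is not taken from [I-D.song-rtgwg-falcon]; the PathState record, its field names, and the accumulate() helper are hypothetical and chosen only for illustration.¶

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathState:
    """Hypothetical per-path summary accumulated in-band along the path."""
    latency_us: float = 0.0      # sum of per-hop latencies, in microseconds
    max_queue_util: float = 0.0  # worst queue utilization seen so far (0.0 to 1.0)
    hops: int = 0                # number of hops that have contributed


def accumulate(state: PathState, hop_latency_us: float, queue_util: float) -> PathState:
    """Fold one hop's measurements into the running path state."""
    return PathState(
        latency_us=state.latency_us + hop_latency_us,
        max_queue_util=max(state.max_queue_util, queue_util),
        hops=state.hops + 1,
    )


# Three hops report (latency in microseconds, queue utilization).
state = PathState()
for hop_latency_us, queue_util in [(2.0, 0.1), (5.0, 0.9), (3.0, 0.4)]:
    state = accumulate(state, hop_latency_us, queue_util)
```

Latency is summed while congestion is tracked as a maximum, because the most congested hop, not the average, bounds what the source can usefully send along the path.¶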

4.3. Illustrative Applications

This section sketches, non-normatively, applications that fast network notifications enable. The actions taken on receipt of a notification are out of scope (Section 1.1); the applications below only illustrate what the information in Section 3.4 makes possible.¶

Upstream (remote) protection:: On a link or node failure notification, a node several hops upstream activates a pre-computed backup path instead of relying only on local repair, avoiding the hairpinning that purely local alternates (LFA or TI-LFA) can introduce. Efficient Remote Protection [I-D.clad-rtgwg-efficient-remote-protection] describes such a mechanism and applies both to failures and to degradations such as reduced capacity or congestion.¶
Fast protection in AI/ML fabrics:: AI/ML fabrics need convergence within tens of microseconds, requiring notification and reaction within the forwarding plane without CPU intervention. [I-D.clad-rtgwg-ipfrr-aiml] analyzes the limitations of existing IP-FRR in such fabrics and the requirements for fast, forwarding-plane protection.¶
Graduated load-sharing and flow control:: When a notification carries fine-grained status (utilization, available bandwidth, or capacity degradation) rather than a binary up/down, an upstream node can rebalance load-sharing weights in proportion to the reported severity, or apply per-flow, per-tenant, or per-slice flow control and backpressure toward the congestion point instead of the coarse, port-level pausing of link-layer PFC. Both responses are graduated and capacity-aware, shifting or throttling traffic gradually rather than all at once; the realizing mechanisms are surveyed in Section 4.2.¶
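
The graduated load-sharing response described above can be sketched in Python as follows. This is a minimal illustration, not a mechanism defined by this framework: the function name, the next-hop identifiers, and the representation of a notification as an "available fraction" of nominal capacity are all assumptions made for the example.¶

```python
def rebalance_weights(base_weights, available_fraction):
    """Scale each next hop's weight by its reported available capacity.

    base_weights:       dict of next hop -> positive configured weight
    available_fraction: dict of next hop -> reported fraction (0.0 to 1.0) of
                        nominal capacity still usable; hops with no report
                        are assumed to be fully available
    Returns normalized weights that sum to 1.0.
    """
    total_base = sum(base_weights.values())
    if total_base <= 0:
        raise ValueError("base weights must sum to a positive value")

    scaled = {}
    for next_hop, weight in base_weights.items():
        fraction = available_fraction.get(next_hop, 1.0)
        fraction = max(0.0, min(1.0, fraction))  # clamp malformed reports
        scaled[next_hop] = weight * fraction

    total = sum(scaled.values())
    if total == 0:
        # Every path reports zero capacity: keep the configured split
        # rather than dropping all traffic.
        return {nh: w / total_base for nh, w in base_weights.items()}
    return {nh: w / total for nh, w in scaled.items()}


# Path B reports that half of its capacity is degraded.
weights = rebalance_weights({"A": 1.0, "B": 1.0}, {"B": 0.5})
```

With equal base weights, a report that path B retains half its capacity shifts the split to roughly two thirds on A and one third on B, whereas a binary down signal (fraction 0.0) would move all traffic to A at once.¶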

4.4. Scaling Considerations

The solution must remain effective as the network grows. Scaling pressure arises from network size (the number of nodes and links that may report events), the volume and rate of change of reported information, and the number of consumers. The design assumption is that if anything can go wrong it will, so the system must cope with a high proportion of nodes and links reporting simultaneously.¶

The framework addresses scale through subscription (delivering only relevant information), scoping and domain isolation (bounding propagation), relay-based filtering and aggregation, damping of rapidly changing conditions, and transport prioritization and rate limiting. Protocol specifications SHOULD quantify the load their mechanisms place on the forwarding and control planes under worst-case event conditions.¶
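
Two of the tools named above, damping and rate limiting, can be combined in a few lines. The Python sketch below is illustrative only: the class name, parameters, and the choice of a per-object hold-down timer plus a token bucket are assumptions, not requirements of this framework. Time is passed in explicitly so the behavior is deterministic.¶

```python
class NotificationLimiter:
    """Hypothetical per-node limiter: per-object hold-down plus a token bucket."""

    def __init__(self, rate_per_s, burst, hold_down_s):
        self.rate_per_s = rate_per_s    # sustained notifications per second
        self.burst = float(burst)       # token bucket depth
        self.hold_down_s = hold_down_s  # minimum gap between reports on one object
        self.tokens = float(burst)
        self.last_refill = 0.0
        self.last_sent = {}             # object id -> time of last emitted report
        self.suppressed = 0             # count of damped or rate-limited events

    def allow(self, obj_id, now):
        """Return True if a notification about obj_id may be emitted at time now."""
        # Damping: suppress rapid repeats about the same link, queue, or node.
        last = self.last_sent.get(obj_id)
        if last is not None and now - last < self.hold_down_s:
            self.suppressed += 1
            return False

        # Rate limiting: refill the bucket for elapsed time, then spend a token.
        elapsed = now - self.last_refill
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_s)
        self.last_refill = now
        if self.tokens < 1.0:
            self.suppressed += 1
            return False

        self.tokens -= 1.0
        self.last_sent[obj_id] = now
        return True


limiter = NotificationLimiter(rate_per_s=1000, burst=2, hold_down_s=0.001)
results = [
    limiter.allow("link1", 0.0),     # first report: emitted
    limiter.allow("link1", 0.0005),  # same object within hold-down: damped
    limiter.allow("link2", 0.0005),  # different object, token available: emitted
    limiter.allow("link3", 0.0005),  # bucket exhausted: rate limited
    limiter.allow("link1", 0.002),   # hold-down expired, bucket refilled: emitted
]
```

The suppressed counter is kept so that the worst-case load a node generates, and the number of events it withheld, can both be observed.¶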

4.5. Operational Considerations

Fast network notifications introduce additional traffic. During the failures and congestion events they report, the notification system MUST NOT exacerbate the situation and SHOULD actively assist in mitigating it. Operators SHOULD be able to configure which event types trigger notifications, the delivery modes and scopes used, damping and rate-limiting parameters, and prioritization, so that notification behavior aligns with network operation policies.¶

Management and configuration of the solution are expected to be supported by YANG modules, to be defined as a separate deliverable consistent with the charter. Manageability includes observability of the notification system itself (counts, drops, damping events) so operators can verify it is helping rather than harming.¶
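
Pending the YANG modules, the kind of operator policy and self-observability described in this section can be pictured with the Python sketch below. Every name in it (the NotificationPolicy class, the event-type strings, the hop-based scope limit, and the counter names) is hypothetical and serves only to make the configurable knobs and counters concrete.¶

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NotificationPolicy:
    """Hypothetical operator policy with counters for observability."""
    # Event types that are allowed to trigger notifications.
    enabled_events: set = field(default_factory=lambda: {"link-failure", "congestion"})
    # Scope limit, expressed here as a maximum hop distance from the origin.
    max_scope_hops: int = 2
    # Counters an operator could read to check the system's own behavior.
    stats: Counter = field(default_factory=Counter)

    def admit(self, event_type, hops_from_origin):
        """Decide whether to forward a notification, recording the outcome."""
        if event_type not in self.enabled_events:
            self.stats["filtered_event_type"] += 1
            return False
        if hops_from_origin > self.max_scope_hops:
            self.stats["dropped_out_of_scope"] += 1
            return False
        self.stats["sent"] += 1
        return True


policy = NotificationPolicy()
decisions = [
    policy.admit("link-failure", 1),  # enabled and in scope
    policy.admit("degradation", 1),   # event type not enabled by the operator
    policy.admit("congestion", 3),    # beyond the configured scope
]
```

Recording why each notification was withheld, not only how many were sent, is what lets an operator judge whether the configured policy is too loose or too strict.¶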

7. References

7.1. Normative References

[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]: Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

7.2. Informative References

[I-D.camarillo-rtgwg-lsn]: Camarillo, P., Filsfils, C., Chachmon, N., Iny, O., Su, Y., and R. Jiang, "Lightspeed Notification Protocol", Work in Progress, Internet-Draft, draft-camarillo-rtgwg-lsn-00, 2 March 2026, <https://datatracker.ietf.org/doc/html/draft-camarillo-rtgwg-lsn-00>.
[I-D.clad-rtgwg-efficient-remote-protection]: Clad, F., Filsfils, C., Su, Y., and D. Cai, "Efficient Remote Protection", Work in Progress, Internet-Draft, draft-clad-rtgwg-efficient-remote-protection-00, 2 March 2026, <https://datatracker.ietf.org/doc/html/draft-clad-rtgwg-efficient-remote-protection-00>.
[I-D.clad-rtgwg-ipfrr-aiml]: Clad, F., Filsfils, C., Jiang, R., and D. Cai, "IP Fast Reroute for AI/ML Fabrics", Work in Progress, Internet-Draft, draft-clad-rtgwg-ipfrr-aiml-00, 2 March 2026, <https://datatracker.ietf.org/doc/html/draft-clad-rtgwg-ipfrr-aiml-00>.
[I-D.csaszar-rtgwg-ipfrr-fn]: Csaszar, A., Enyedi, G. S., Tantsura, J., Kini, S., Sucec, J., and S. Das, "IP Fast Re-Route with Fast Notification", Work in Progress, Internet-Draft, draft-csaszar-rtgwg-ipfrr-fn-01, 25 February 2013, <https://datatracker.ietf.org/doc/html/draft-csaszar-rtgwg-ipfrr-fn-01>.
[I-D.geng-fantel-fantel-gap-analysis]: Geng, X., Dong, J., Cheng, W., Li, D., Zhu, Y., and H. Zhengxin, "Gap Analysis of Fast Notification for Traffic Engineering and Load Balancing", Work in Progress, Internet-Draft, draft-geng-fantel-fantel-gap-analysis-02, 26 February 2026, <https://datatracker.ietf.org/doc/html/draft-geng-fantel-fantel-gap-analysis-02>.
[I-D.geng-fantel-fantel-requirements]: Geng, X., Dong, J., Zhu, Y., Li, D., Cheng, W., and C. Liu, "Requirements of Fast Notification for Traffic Engineering and Load Balancing", Work in Progress, Internet-Draft, draft-geng-fantel-fantel-requirements-03, 26 February 2026, <https://datatracker.ietf.org/doc/html/draft-geng-fantel-fantel-requirements-03>.
[I-D.han-rtgwg-fine-grained-backpressure]: Zhengxin, H., Ruan, Z., Pang, R., Yue, Y., Yao, J., and Q. Xiong, "Fine-Grained Flow Control Backpressure Mechanism for Wide Area Networks", Work in Progress, Internet-Draft, draft-han-rtgwg-fine-grained-backpressure-02, 7 June 2026, <https://datatracker.ietf.org/doc/html/draft-han-rtgwg-fine-grained-backpressure-02>.
[I-D.hzh-fantel-wan-tunnel]: Hu, Z., Zhu, Y., Hu, J., and T. Pi, "Fast Notification for tunnel-based lossless RDMA transmission in WAN", Work in Progress, Internet-Draft, draft-hzh-fantel-wan-tunnel-02, 1 March 2026, <https://datatracker.ietf.org/doc/html/draft-hzh-fantel-wan-tunnel-02>.
[I-D.ietf-rtgwg-net-notif-ps]: Dong, J., McBride, M., Clad, F., Zhang, Z. J., Zhu, Y., Xu, X., Zhuang, R., Pang, R., Lu, H., Liu, Y., Contreras, L. M., Mehmet, D., and R. Rahman, "Fast Network Notifications Problem Statement", Work in Progress, Internet-Draft, draft-ietf-rtgwg-net-notif-ps-02, 7 May 2026, <https://datatracker.ietf.org/doc/html/draft-ietf-rtgwg-net-notif-ps-02>.
[I-D.liu-rtgwg-srv6-cc]: Liu, Y., Yao, J., Lin, C., and X. Min, "Congestion Control Based on SRv6 Path", Work in Progress, Internet-Draft, draft-liu-rtgwg-srv6-cc-01, 27 February 2026, <https://datatracker.ietf.org/doc/html/draft-liu-rtgwg-srv6-cc-01>.
[I-D.lu-fn-transport]: Lu, W., Kini, S., Csaszar, A., Enyedi, G. S., and J. Tantsura, "Transport of Fast Notification Messages", Work in Progress, Internet-Draft, draft-lu-fn-transport-05, 19 August 2013, <https://datatracker.ietf.org/doc/html/draft-lu-fn-transport-05>.
[I-D.song-rtgwg-falcon]: Song, H., Wan, Y., and K. Zhu, "Fast Latency and Congestion Notification", Work in Progress, Internet-Draft, draft-song-rtgwg-falcon-00, 6 April 2026, <https://datatracker.ietf.org/doc/html/draft-song-rtgwg-falcon-00>.
[I-D.zzhang-rtgwg-router-info]: Zhang, Z. J., Wang, K., Lin, C., Vaidya, N., Tantsura, J., and Y. Liu, "Advertising Router Information", Work in Progress, Internet-Draft, draft-zzhang-rtgwg-router-info-06, 23 February 2026, <https://datatracker.ietf.org/doc/html/draft-zzhang-rtgwg-router-info-06>.
[RFC2474]: Nichols, K., Blake, S., Baker, F., and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, DOI 10.17487/RFC2474, December 1998, <https://www.rfc-editor.org/info/rfc2474>.
[RFC3168]: Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001, <https://www.rfc-editor.org/info/rfc3168>.
[RFC3246]: Davie, B., Charny, A., Bennet, J.C.R., Benson, K., Le Boudec, J.Y., Courtney, W., Davari, S., Firoiu, V., and D. Stiliadis, "An Expedited Forwarding PHB (Per-Hop Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, <https://www.rfc-editor.org/info/rfc3246>.
[RFC4090]: Pan, P., Ed., Swallow, G., Ed., and A. Atlas, Ed., "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090, DOI 10.17487/RFC4090, May 2005, <https://www.rfc-editor.org/info/rfc4090>.
[RFC5714]: Shand, M. and S. Bryant, "IP Fast Reroute Framework", RFC 5714, DOI 10.17487/RFC5714, January 2010, <https://www.rfc-editor.org/info/rfc5714>.
[RFC5880]: Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010, <https://www.rfc-editor.org/info/rfc5880>.
[RFC7011]: Claise, B., Ed., Trammell, B., Ed., and P. Aitken, "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information", STD 77, RFC 7011, DOI 10.17487/RFC7011, September 2013, <https://www.rfc-editor.org/info/rfc7011>.
[RFC7799]: Morton, A., "Active and Passive Metrics and Methods (with Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, May 2016, <https://www.rfc-editor.org/info/rfc7799>.
[RFC8345]: Clemm, A., Medved, J., Varga, R., Bahadur, N., Ananthakrishnan, H., and X. Liu, "A YANG Data Model for Network Topologies", RFC 8345, DOI 10.17487/RFC8345, March 2018, <https://www.rfc-editor.org/info/rfc8345>.
[RFC9197]: Brockners, F., Ed., Bhandari, S., Ed., and T. Mizrahi, Ed., "Data Fields for In Situ Operations, Administration, and Maintenance (IOAM)", RFC 9197, DOI 10.17487/RFC9197, May 2022, <https://www.rfc-editor.org/info/rfc9197>.
[RFC9341]: Fioccola, G., Ed., Cociglio, M., Mirsky, G., Mizrahi, T., and T. Zhou, "Alternate-Marking Method", RFC 9341, DOI 10.17487/RFC9341, December 2022, <https://www.rfc-editor.org/info/rfc9341>.