<?xml version="1.0" encoding="US-ASCII"?>
<!-- This Internet-Draft is rendered in the xml2rfc v3 vocabulary (RFC 7991). -->
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<rfc category="info" consensus="true" docName="draft-song-fann-framework-00"
     ipr="trust200902" sortRefs="true" submissionType="IETF" symRefs="true"
     tocInclude="true" version="3" xml:lang="en">
  <front>
    <title abbrev="FANN Framework">A Framework for Fast Network
    Notifications</title>

    <seriesInfo name="Internet-Draft" value="draft-song-fann-framework-00"/>

    <author fullname="Haoyu Song">
      <organization>Futurewei Technologies</organization>

      <address>
        <email>hsong@futurewei.com</email>
      </address>
    </author>

    <author fullname="Jie Dong">
      <organization>Huawei</organization>

      <address>
        <email>jie.dong@huawei.com</email>
      </address>
    </author>

    <date year="2026"/>

    <area>Routing</area>

    <workgroup>Fast Network Notifications</workgroup>

    <keyword>fast network notification</keyword>

    <keyword>traffic engineering</keyword>

    <keyword>load balancing</keyword>

    <keyword>fast reroute</keyword>

    <keyword>data center</keyword>

    <keyword>congestion</keyword>

    <abstract>
      <t>Many network applications, ranging from Artificial Intelligence (AI)
      / Machine Learning (ML) training and inference to large-scale cloud
      services, require networks with various combinations of high bandwidth,
      low delay, low jitter, and minimal packet loss. Meeting these
      requirements depends on the network's ability to adapt rapidly to
      faults, signal degradation, and congestion. The companion problem
      statement describes why existing mechanisms are too slow, too coarse, or
      too resource-intensive to react within the timescales at which modern
      forwarding hardware can detect such conditions.</t>

      <t>This document defines a framework for Fast Network Notifications
      (FANN). It describes a reference architecture, the functional roles
      involved in generating and consuming notifications, an information
      model, delivery and scoping models, procedures for discovery,
      registration, and subscription, and the integration of fast network
      notifications with existing Layer 2 to 4 mechanisms. This framework is
      intended to guide the development of one or more fast network
      notification protocol specifications.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="introduction">
      <name>Introduction</name>

      <t>Modern high-performance networks, in particular data center (DC) and
      data center interconnect (DCI) fabrics serving AI/ML and cloud
      workloads, demand rapid adaptation to changing network conditions. A
      single fiber link failure, signal degradation, or transient congestion
      event can stall a distributed training job, waste compute and energy,
      and degrade service experience <xref
      target="I-D.ietf-rtgwg-net-notif-ps"/>.</t>

      <t>Contemporary forwarding hardware can detect link failures, signal
      degradation reported as link errors, queue buildup, microbursts, and
      output-queue congestion at microsecond to sub-millisecond timescales.
      However, the time required to disseminate this information to the remote
      nodes that can act on it typically far exceeds the detection time. This
      gap between detection and reaction is the central problem that fast
      network notifications address.</t>

      <t>The Fast Network Notifications Problem Statement <xref
      target="I-D.ietf-rtgwg-net-notif-ps"/> documents the need for a fast
      notification mechanism and the limitations of existing approaches. The
      companion requirements <xref
      target="I-D.geng-fantel-fantel-requirements"/> and gap analysis <xref
      target="I-D.geng-fantel-fantel-gap-analysis"/> documents elaborate the
      requirements and the deficiencies of current technologies. Building on
      these documents, this document defines a framework that describes the
      overall architecture, the functional roles, the information carried, how
      notifications are delivered and scoped, and how the mechanism integrates
      with existing protocols and technologies across layers.</t>

      <t>This informational document does not define a wire protocol,
      encoding, or YANG model. Those are expected to be specified in separate
      protocol and management documents that build on this framework.</t>

      <section anchor="scope">
        <name>Scope</name>

        <t>This framework applies to limited-domain networks under a single
        administrative control, consistent with the deployment assumptions of
        the FANN charter. It prioritizes the requirements of DC and DCI
        networks where rapid responsiveness is critical, while remaining
        applicable to other deployments such as wide-area backbone
        networks.</t>

        <t>The framework initially targets notifications for link failures,
        signal degradation reported as link errors, and port queue congestion,
        while remaining extensible to additional conditions in the future. The
        specific actions a recipient takes in response to a notification (for
        example fast reroute, adaptive load balancing, or rate adjustment) are
        out of scope of this framework; they are the responsibility of the
        consuming subsystem and the protocols that realize those actions.</t>

        <t>In this document, "fast" does not denote a single rigid numerical
        threshold. It characterizes a class of mechanisms designed to minimize
        notification delivery time so that the latency is on the order of
        microseconds to milliseconds, depending on the operational objective
        and the diameter of the notification domain, and is substantially
        shorter than the Round-Trip Time (RTT) of the affected traffic.</t>

        <t>This framework is solution-agnostic. It defines the functional
        roles, information model, and delivery and scoping models that a fast
        network notification solution is expected to instantiate, but it does
        not specify, mandate, or endorse any particular protocol, encoding, or
        solution document. It is intentionally general so that a range of
        realization approaches can conform to it, potentially in combination,
        without conflicting with one another or with this framework.
        Consistent with the FANN charter, fast generation and consumption in
        the forwarding plane (ideally in hardware) is the primary design point
        and the means of meeting the latency targets described above;
        consumption by the control plane or management plane is a secondary
        objective, permitted only where it preserves routing stability and
        does not compromise forwarding-plane responsiveness. Specific
        solutions are developed in separate documents; such documents are
        expected to map their behavior onto the roles and models defined here,
        and any capability they require that is not yet covered is expected to
        be accommodated as an extension of this framework rather than a
        departure from it.</t>
      </section>

      <section anchor="requirements-language">
        <name>Requirements Language</name>

        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in
        BCP&#160;14 <xref target="RFC2119"/> <xref target="RFC8174"/> when,
        and only when, they appear in all capitals, as shown here.</t>
      </section>
    </section>

    <section anchor="terminology">
      <name>Terminology</name>

      <t>This document uses the following terms.</t>

      <dl>
        <dt>Fast Network Notification (FANN):</dt>

        <dd>An event-driven message that conveys a locally detected adverse
        network condition, or the recovery from such a condition, to one or
        more remote nodes within a defined notification domain, with the
        objective of low-latency delivery suitable for action in the
        forwarding plane.</dd>

        <dt>Notification Originator:</dt>

        <dd>A functional role that detects an adverse condition (or its
        recovery) and generates a fast network notification. Also referred to
        as the originating or reporting node.</dd>

        <dt>Notification Consumer:</dt>

        <dd>A functional role that receives a fast network notification and
        may take action based on it. Also referred to as the recipient
        node.</dd>

        <dt>Notification Relay:</dt>

        <dd>A functional role that forwards or redistributes a notification
        toward additional consumers, optionally applying filtering, damping,
        or aggregation. A node may be both a relay and a consumer.</dd>

        <dt>Notification Domain:</dt>

        <dd>A bounded region of the network, under a single administrative
        control, within which fast network notifications are generated,
        distributed, and consumed, and outside of which they are not
        propagated without explicit policy.</dd>

        <dt>Notification Controller:</dt>

        <dd>An optional control-plane or management-plane entity that assists
        with discovery, registration, subscription, policy, and global
        optimization. It is not required to be on the fast notification
        delivery path.</dd>

        <dt>Event:</dt>

        <dd>A change in network condition detected by a node, such as a link
        failure, signal degradation, or output-queue congestion, including the
        recovery from a previously reported condition.</dd>
      </dl>

      <t>The terms BFD <xref target="RFC5880"/>, ECN <xref target="RFC3168"/>,
      FRR <xref target="RFC4090"/> <xref target="RFC5714"/>, and IOAM <xref
      target="RFC9197"/> are used as defined in their respective
      references.</t>
    </section>

    <section anchor="framework">
      <name>Fast Network Notification Framework</name>

      <t>This section defines the core of the framework: its design
      principles, the deployment scenarios it serves, the functional reference
      architecture, the information carried in notifications, and the models
      for delivering, scoping, discovering, and controlling them.</t>

      <section anchor="design-principles">
        <name>Design Principles and Goals</name>

        <t>The framework is guided by the following principles, derived from
        the problem statement <xref target="I-D.ietf-rtgwg-net-notif-ps"/> and
        requirements <xref target="I-D.geng-fantel-fantel-requirements"/>.</t>

        <dl>
          <dt>Event-driven, not periodic:</dt>

          <dd>Notifications are generated in response to detected events. This
          distinguishes fast network notifications from preconfigured periodic
          mechanisms such as BFD <xref target="RFC5880"/>, which detect rather
          than disseminate.</dd>

          <dt>Forwarding-plane optimized:</dt>

          <dd>The primary design point is fast generation and consumption in
          the forwarding plane, ideally in hardware, to meet responsiveness
          targets. Consumption by the control plane and management plane is a
          secondary objective and <bcp14>MUST NOT</bcp14> compromise routing
          stability.</dd>

          <dt>Lightweight and bounded:</dt>

          <dd>Notification messages are compact and the system is designed to
          bound the load it places on the network, especially during the very
          events it reports. The notification system <bcp14>MUST NOT</bcp14>
          exacerbate a failure or congestion event.</dd>

          <dt>Action-agnostic:</dt>

          <dd>A notification message conveys information; it does not mandate
          a specific reaction. A notification <bcp14>MAY</bcp14> explicitly
          indicate a recommended action, or the action <bcp14>MAY</bcp14> be
          determined implicitly by the consumer from the information
          carried.</dd>

          <dt>Extensible:</dt>

          <dd>The information model and event taxonomy are extensible to
          additional conditions, metrics, and scopes without redefining the
          architecture. Extensibility is equally a principle for the protocol
          design: the notification message encoding <bcp14>SHOULD</bcp14>
          allow new information elements, event types, and optional fields to
          be added in a forward-compatible way, such that a receiver
          <bcp14>MUST</bcp14> skip or ignore any element it does not
          understand rather than discarding the notification. This lets the
          set of carried information evolve over time without breaking
          interoperability with existing implementations or requiring a new
          protocol version.</dd>

          <dt>Complementary:</dt>

          <dd>Fast network notifications complement, and do not replace,
          existing OAM, control-plane, and telemetry mechanisms. They bridge
          the time gap between event onset and slower control-plane or
          telemetry-driven responses.</dd>

          <dt>Scoped and isolated:</dt>

          <dd>Notifications are confined to a notification domain. Domain
          identification and isolation are first-class concerns.</dd>

          <dt>Decoupled from routing convergence:</dt>

          <dd>Fast-changing network state <bcp14>SHOULD</bcp14> be conveyed by
          mechanisms that are separate from the routing protocol's database
          and from its flooding and best-path computation, so that
          high-frequency or transient events do not introduce churn,
          instability, or
          excessive recomputation in the routing control plane. This
          separation lets notifications be generated and refreshed at a fast
          pace independently of routing convergence, while any consumption by
          routing remains a secondary objective bounded by the
          routing-stability constraint above.</dd>
        </dl>
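
        <t>As a non-normative illustration of the forward-compatibility
        principle above, the following Python sketch parses a sequence of
        type-length-value elements and steps over any type it does not
        recognize instead of discarding the whole notification. The element
        layout (a 1-octet type followed by a 2-octet length), the type codes,
        and the function names are assumptions made for this example only;
        this framework does not define an encoding.</t>

        <sourcecode type="python"><![CDATA[
import struct

# Hypothetical element type codes; no FANN encoding is defined yet.
KNOWN_TYPES = {1: "event_type", 2: "location"}


def encode_element(etype: int, value: bytes) -> bytes:
    """Encode one element as type (1), length (2), value."""
    return struct.pack("!BH", etype, len(value)) + value


def parse_elements(payload: bytes) -> dict:
    """Parse elements in order, skipping types that are unknown."""
    elements = {}
    offset = 0
    while offset + 3 <= len(payload):
        etype, length = struct.unpack_from("!BH", payload, offset)
        offset += 3
        if offset + length > len(payload):
            raise ValueError("truncated element")
        value = payload[offset:offset + length]
        offset += length
        name = KNOWN_TYPES.get(etype)
        if name is not None:
            elements[name] = value
        # Unknown types fall through: the length field lets the
        # parser step over them and keep the rest of the message.
    if offset != len(payload):
        raise ValueError("trailing octets after last element")
    return elements
]]></sourcecode>

        <t>In this sketch, a message that carries an element of an unassigned
        type between two known elements still yields both known elements; only
        a structurally malformed message is rejected.</t>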
      </section>

      <section anchor="scenarios">
        <name>Deployment Scenarios</name>

        <t>Fast network notifications apply across a range of network
        scenarios, but the time budget, processing constraints, and the
        mechanisms that are practical differ substantially between them. This
        framework does not assume a one-size-fits-all solution: the
        scenarios below have materially different characteristics, and a
        given deployment is
        expected to select the delivery mode, scope, and realization mechanism
        appropriate to its scenario rather than apply a single mechanism
        everywhere.</t>

        <dl>
          <dt>Intra-data-center fabric:</dt>

          <dd>Within a single DC fabric (for example a Clos topology),
          originators and consumers are typically a small number of hops
          apart, propagation delay is very low, and forwarding hardware can
          both detect and consume notifications. This scenario has the
          tightest time budget (sub-millisecond) and is the most amenable to
          forwarding-plane, in-band, or scoped-flooding delivery with actions
          such as adaptive load balancing or local repair. The dominant
          challenge is volume and rate of change at scale rather than
          propagation distance.</dd>

          <dt>Single-hop DCI / point-to-point WAN:</dt>

          <dd>Between two sites or routers connected by one (logical) hop, the
          recipient set is small and often known in advance, favoring unicast
          or a directed notification to the upstream or ingress node. The time
          budget is dominated by the link propagation delay, which is fixed;
          the design goal is to add minimal processing delay on top of it so
          that the notification still beats the end-to-end (E2E) reaction
          loop of the affected traffic.</dd>

          <dt>Multi-hop managed DCI:</dt>

          <dd>Data centers may also be interconnected across multiple IP hops
          by a managed network, for example for AI collaborative computing and
          other DCI services. Unlike arbitrary WAN paths, this case is
          typically highly engineered: traffic follows deterministic,
          traffic-engineered (TE) paths, and network slicing can be used to
          isolate tenants. Because the path and the relevant upstream or
          ingress nodes are known in advance, notifications can be delivered
          either unicast or hop-by-hop along the path and scoped per slice or
          per tenant, and the resulting action can be applied at tenant or
          path granularity rather than only per physical link or node.</dd>

          <dt>Multi-hop / arbitrary WAN paths:</dt>

          <dd>When notifications must reach nodes several hops away or across
          a wider domain, propagation delay, the number of potential
          recipients, and the risk of notification storms all grow. Time
          budgets are typically milliseconds rather than sub-millisecond, and
          subscription, relaying with filtering/aggregation, and bounded
          scoping become essential. Some timeliness targets achievable
          intra-DC may simply not be feasible here; the framework expects such
          cases to use more conservative techniques, and the feasibility of
          meeting a specific target is itself scenario-dependent.</dd>
        </dl>

        <t>The functional roles (<xref target="reference-architecture"/>),
        information model (<xref target="info-model"/>), and delivery modes
        (<xref target="delivery"/>) defined in this document are common across
        scenarios, but their realization and the achievable latency are not.
        Where a requirement (for example a sub-millisecond target) is stated,
        it <bcp14>SHOULD</bcp14> be understood as scoped to the scenario in
        which it is feasible.</t>
      </section>

      <section anchor="reference-architecture">
        <name>Reference Architecture</name>

        <section anchor="functional-roles">
          <name>Functional Roles</name>

          <t>The framework defines four functional roles. A single physical or
          virtual network element <bcp14>MAY</bcp14> implement more than one
          role.</t>

          <figure anchor="fig-roles">
            <name>FANN Functional Roles Within One Domain</name>

            <artwork type="ascii-art"><![CDATA[
        +-------------------------------------------------+
        |              Notification Domain                |
        |                                                 |
        |   [Detect]      [Distribute]        [Act]       |
        |                                                 |
        |  +----------+  +-----------+    +-----------+   |
        |  |Originator|->|  Relay    |--->| Consumer  |   |
        |  | (detect/ |  | (forward/ |    | (receive/ |   |
        |  | generate)|  |  filter/  |    |  action)  |   |
        |  +----+-----+  |  damp)    |    +-----+-----+   |
        |       |        +-----+-----+          |         |
        |       |              |                |         |
        |       v              v                v         |
        |  ........................................       |
        |  :        Notification Controller       :       |
        |  : (discovery / registration / policy / :       |
        |  :        global optimization)          :       |
        |  ........................................       |
        +-------------------------------------------------+
]]></artwork>
          </figure>

          <dl>
            <dt>Notification Originator:</dt>

            <dd>Detects an event using local detection mechanisms (for example
            link fault detection, error counters, queue occupancy thresholds,
            or BFD <xref target="RFC5880"/> as a detection input) and
            generates a notification. The originator determines, by policy or
            signaling, the set of consumers and the delivery mode, and applies
            origination-side controls such as damping and rate limiting.</dd>

            <dt>Notification Relay:</dt>

            <dd>Receives a notification and forwards it toward additional
            consumers. A relay <bcp14>MAY</bcp14> filter, aggregate,
            deduplicate, or damp notifications. Relays enable hop-by-hop and
            scoped-flooding delivery and allow load to be bounded inside the
            domain.</dd>

            <dt>Notification Consumer:</dt>

            <dd>Receives a notification and may act on it in the data plane
            (for example rate adjustment, ECMP rebalancing, flow steering, or
            traffic pause), pass the information to the control plane or
            management plane, or both. A consumer <bcp14>MAY</bcp14> also be a
            relay.</dd>

            <dt>Notification Controller:</dt>

            <dd>An optional entity that supports discovery, registration,
            subscription, and policy distribution, and that may consume
            notifications for global traffic-engineering or load-balancing
            optimization. The controller is not required to be on the fast
            delivery path and <bcp14>SHOULD NOT</bcp14> be a single point of
            failure for forwarding-plane reactions.</dd>
          </dl>
        </section>

        <section anchor="lifecycle">
          <name>Notification Lifecycle</name>

          <t>A fast network notification proceeds through the following
          stages.</t>

          <ol>
            <li>Detection. A node observes an event in the forwarding plane
            using a local detection mechanism. Detection mechanisms are out of
            scope of this framework, but their output is the trigger for
            notification generation.</li>

            <li>Generation. The originator constructs a notification populated
            from the information model (<xref target="info-model"/>), subject
            to origination policy, damping, and rate limiting.</li>

            <li>Delivery. The notification is delivered to the intended
            consumers using one of the delivery modes (<xref
            target="delivery"/>), possibly via one or more relays.</li>

            <li>Consumption. A consumer parses the notification and decides
            whether and how to act, based on the information carried and on
            any local state it holds.</li>

            <li>Action. The consumer performs an action (out of scope of this
            framework) and may relay the notification further.</li>

            <li>Recovery and withdrawal. When the condition clears, the
            originator may generate a recovery notification so consumers can
            revert or update their state. Recovery notifications are subject
            to damping so that flapping conditions do not generate excessive
            traffic.</li>
          </ol>
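
          <t>The damping of recovery notifications in the last stage can be
          sketched, non-normatively, as a hold-down timer: onset is reported
          immediately, while recovery is reported only after the condition
          has stayed clear for a configured interval. The class name, the
          hold-down parameter, and the caller-supplied clock are assumptions
          of this example and are not defined by the framework.</t>

          <sourcecode type="python"><![CDATA[
from typing import Optional


class RecoveryDamper:
    """Report onset at once; report recovery after a hold-down."""

    def __init__(self, hold_down: float) -> None:
        self.hold_down = hold_down   # seconds; assumed policy knob
        self.reported_bad = False    # last state sent to consumers
        self.clear_since: Optional[float] = None

    def observe(self, now: float, bad: bool) -> Optional[str]:
        """Return "onset", "recovery", or None for one sample."""
        if bad:
            self.clear_since = None  # a relapse restarts the timer
            if not self.reported_bad:
                self.reported_bad = True
                return "onset"
            return None
        if not self.reported_bad:
            return None              # nothing to withdraw
        if self.clear_since is None:
            self.clear_since = now
        if now - self.clear_since >= self.hold_down:
            self.reported_bad = False
            self.clear_since = None
            return "recovery"
        return None
]]></sourcecode>

          <t>With this approach, a condition that flaps faster than the
          hold-down produces a single onset notification and no recovery
          notification until it stabilizes.</t>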
        </section>

        <section anchor="detection-assumptions">
          <name>Detection Assumptions and Constraints</name>

          <t>The mechanisms by which a node detects an event are out of scope
          of this framework, but the framework assumes their existence and
          depends on their characteristics. Two assumptions are important.</t>

          <t>First, the E2E responsiveness of a fast notification system is
          bounded by detection time as well as delivery time: a notification
          cannot be generated before the originating node becomes aware
          of the condition. Detection latency, accuracy, and false-positive
          behavior therefore directly shape what the notification system can
          achieve, and an event that is detected slowly or unreliably limits
          the value of fast delivery.</t>

          <t>Second, detection itself has a cost that interacts with scaling.
          For example, achieving fast liveness detection by running BFD <xref
          target="RFC5880"/> at very short transmit intervals consumes
          forwarding and control resources and does not by itself notify any
          node beyond the BFD endpoints. Driving detection intervals down to
          obtain faster notification can impose significant load, and this
          trade-off between detection speed and detection cost
          <bcp14>SHOULD</bcp14> be considered together with the notification
          load discussed in <xref target="scaling"/>. Where hardware can
          detect a condition directly (for example loss of signal, FEC errors,
          or queue-occupancy threshold crossings), such detection is generally
          preferable to mechanisms that rely on periodic message exchange,
          such as BFD. The relevant distinction is between hardware-based and
          protocol-session-based detection in terms of speed and overhead,
          rather than between polling and non-polling as such: a hardware
          mechanism may itself poll internally, but its detection speed and
          per-event cost are typically far lower than those of a protocol
          session driven to an aggressive interval.</t>
        </section>
      </section>

      <section anchor="info-model">
        <name>Information Model</name>

        <t>A fast network notification carries one or more information
        elements. For a given scenario some elements are mandatory and others
        optional; the framework does not require all elements in every
        notification. The detailed encoding is left to protocol
        specifications. The information elements are:</t>

        <dl>
          <dt>Event Type:</dt>

          <dd>The class of event, for example failure, signal degradation,
          congestion, or performance degradation, and whether the notification
          reports onset or recovery. The event taxonomy is extensible.</dd>

          <dt>Location of Event:</dt>

          <dd>An identifier of where the event occurred, for example a link,
          node, interface, or queue identifier. Location identifiers
          <bcp14>SHOULD</bcp14> be interpretable by consumers within the
          notification domain.</dd>

          <dt>Fine-grained Network Status:</dt>

          <dd>Quantifiable metrics such as link utilization, available
          bandwidth, link capacity, queue length, level of congestion, link or
          node delay, jitter, and packet loss. Conveying such quantitative
          metrics, rather than a binary up/down indication, enables graduated
          and proportional responses such as weighted load-sharing
          adjustments.</dd>

          <dt>Path Identification:</dt>

          <dd>Identification of the path affected by the event, allowing
          consumers to scope their reaction to specific paths.</dd>

          <dt>Flow/Service Identification:</dt>

          <dd>Identification of an affected flow (for example a 5-tuple) or
          service, allowing differentiated, per-flow or per-service
          responses.</dd>

          <dt>Timing and Validity:</dt>

          <dd>An optional event timestamp and a validity or hold time after which
          the reported condition should be considered stale absent refresh or
          recovery.</dd>

          <dt>Action Hint:</dt>

          <dd>An optional explicit indication of a recommended action. When
          absent, the consumer determines the action implicitly from the other
          elements.</dd>

          <dt>Origin and Sequence:</dt>

          <dd>Originator identity, which may be represented by the source IP
          address of the notification, and an optional sequence or epoch
          indicator. The sequence or epoch supports ordering, deduplication,
          and loop detection at relays and consumers; it is
          <bcp14>RECOMMENDED</bcp14> where notifications may be relayed,
          flooded, or reordered, and <bcp14>MAY</bcp14> be omitted in simple
          cases such as single-hop unicast delivery where such protection is
          unnecessary.</dd>
        </dl>
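
        <t>The elements listed above can be pictured, purely for illustration,
        as a single record in which only a few fields are always present. The
        Python sketch below is one possible in-memory representation; the
        field names, the types, and the choice of which fields are mandatory
        are assumptions of this example and not a normative model. The
        staleness check uses the local receive time so that it does not
        depend on synchronized clocks.</t>

        <sourcecode type="python"><![CDATA[
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Notification:
    # Field names are illustrative, not a normative model.
    event_type: str                    # for example "congestion"
    onset: bool                        # False reports recovery
    location: str                      # link, node, or queue id
    origin: str                        # originator identity
    metrics: Dict[str, float] = field(default_factory=dict)
    path_id: Optional[str] = None
    flow_id: Optional[str] = None
    timestamp: Optional[float] = None  # originator clock, seconds
    hold_time: Optional[float] = None  # validity, seconds
    action_hint: Optional[str] = None
    sequence: Optional[int] = None

    def is_stale(self, received_at: float, now: float) -> bool:
        """True once the hold time has elapsed since reception."""
        if self.hold_time is None:
            return False               # no validity was carried
        return now - received_at > self.hold_time
]]></sourcecode>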

        <t>A consistent information model across implementations is necessary
        for interoperability; defining the normative model and encodings is a
        task for the protocol specification.</t>
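
        <t>To illustrate why quantitative metrics enable proportional
        responses, the following non-normative Python sketch derives
        load-sharing weights from the available bandwidth reported for each
        next hop. The function name, the input format, and the
        proportional-split policy are assumptions of this example; the
        action a consumer takes remains out of scope of this framework.</t>

        <sourcecode type="python"><![CDATA[
from typing import Dict


def sharing_weights(available_bw: Dict[str, float]) -> Dict[str, float]:
    """Split traffic across next hops in proportion to headroom."""
    if not available_bw:
        return {}
    usable = {hop: max(bw, 0.0) for hop, bw in available_bw.items()}
    total = sum(usable.values())
    if total == 0.0:
        # No headroom reported anywhere: fall back to an equal split.
        return {hop: 1.0 / len(usable) for hop in usable}
    return {hop: bw / total for hop, bw in usable.items()}
]]></sourcecode>

        <t>A binary up/down indication could only include or exclude a next
        hop, whereas the reported headroom lets the consumer shift traffic
        gradually.</t>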
      </section>

      <section anchor="delivery">
        <name>Delivery and Scoping</name>

        <section anchor="delivery-modes">
          <name>Delivery Modes</name>

          <t>Depending on the position and number of consumers, the framework
          supports the following delivery modes. A scenario <bcp14>MAY</bcp14>
          use more than one.</t>

          <dl>
            <dt>Unicast:</dt>

            <dd>Direct delivery to a single consumer. Suitable when the
            originator knows the specific node that must react (for example a
            designated ingress or upstream node).</dd>

            <dt>Multicast / Point-to-Multipoint:</dt>

            <dd>Delivery to a selected group of consumers, for example along a
            service or forwarding path. Suitable when a defined set of nodes
            must react together.</dd>

            <dt>Hop-by-hop:</dt>

            <dd>Delivery along a series of nodes on a specified path, with
            each node acting as a relay and possibly a consumer. Suitable for
            propagating awareness upstream along an affected path.</dd>

            <dt>Scoped Flooding:</dt>

            <dd>Dissemination to all nodes within a bounded region of the
            domain. Suitable for critical events with many interested
            consumers, with special attention to control overhead and
            duplicate suppression.</dd>
          </dl>
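
          <t>The duplicate suppression needed for scoped flooding can be
          sketched, non-normatively, as follows. A relay remembers the highest
          sequence value seen per originator, forwards a notification only if
          it is newer, and never sends it back to the neighbor it arrived
          from. The class and its names are hypothetical, and the sketch
          assumes per-originator sequence values that increase without
          wrapping; bounding the flooding region (for example with a hop
          limit) is not shown.</t>

          <sourcecode type="python"><![CDATA[
from typing import Dict, List


class FloodingRelay:
    """Forward each (origin, sequence) once; drop duplicates."""

    def __init__(self, neighbors: List[str]) -> None:
        self.neighbors = neighbors
        self.highest_seen: Dict[str, int] = {}

    def receive(self, origin: str, seq: int,
                arrived_from: str) -> List[str]:
        """Return the neighbors the notification is relayed to."""
        if seq <= self.highest_seen.get(origin, -1):
            return []                # duplicate or older: suppress
        self.highest_seen[origin] = seq
        # Split horizon: do not echo to the arrival neighbor.
        return [n for n in self.neighbors if n != arrived_from]
]]></sourcecode>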
        </section>

        <section anchor="transport">
          <name>Transport Considerations</name>

          <t>Delivery <bcp14>MAY</bcp14> reuse existing messaging and
          transport mechanisms, or a new lightweight mechanism
          <bcp14>MAY</bcp14> be defined where existing ones cannot meet the
          latency or forwarding-plane processing targets. Regardless of the
          underlying transport, the delivery mechanism is responsible for
          timely delivery to the intended consumers and for bounding the load
          it introduces.</t>

          <t>Because notifications are most valuable precisely when the
          network is under stress, the transport <bcp14>MUST</bcp14> support
          prioritization so that notifications are not delayed or dropped
          behind the very congestion they report. A notification that is
          queued behind the congested traffic loses most of its value.
          Prioritization can be realized using existing forwarding-plane
          mechanisms, including:</t>

          <ul>
            <li>DiffServ marking, for example a dedicated DSCP <xref
            target="RFC2474"/> code point mapped to a high-priority or
            low-latency per-hop behavior (for example Expedited Forwarding
            <xref target="RFC3246"/>) along the notification path, so that
            classification and queuing of notifications can be done in
            hardware at every hop;</li>

            <li>a strict-priority or low-latency queue, or a dedicated
            control-class queue, separated from user-data queues so
            notifications bypass congested data queues;</li>

            <li>at Layer 2, priority marking such as IEEE 802.1p / PCP where
            the delivery path traverses bridged segments.</li>
          </ul>

          <t>The chosen marking and per-hop behavior <bcp14>MUST</bcp14> be
          consistent across the notification domain so that priority is
          honored end to end within the domain. Operators <bcp14>MUST</bcp14> be able
          to configure the marking, and the markings used for notifications
          <bcp14>SHOULD</bcp14> be reserved so that ordinary traffic cannot
          claim the same priority and so that notification traffic itself
          cannot be abused to obtain preferential treatment (<xref
          target="security"/>). Because notifications occupy a high-priority
          class, their volume <bcp14>MUST</bcp14> be bounded by the rate
          limiting, damping, and filtering of <xref target="damping"/> to
          avoid starving other control traffic.</t>
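
          <t>As a non-normative illustration of the DiffServ option above, the
          following Python sketch shows how a software originator might mark
          its notification datagrams. It assumes IPv4 and a platform that
          exposes the IP_TOS socket option, and it uses the Expedited
          Forwarding code point recommended in <xref target="RFC3246"/> purely
          as an example; the function names are hypothetical, and a deployment
          would use whatever code point the operator has reserved for
          notifications.</t>

          <sourcecode type="python"><![CDATA[
```python
import socket

EF_DSCP = 46  # recommended Expedited Forwarding code point (RFC 3246)


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_notification_socket(dscp: int = EF_DSCP) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP.

    Hypothetical helper: the framework does not mandate UDP or any
    particular socket API.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```
]]></sourcecode>

          <t>Marking at the sender is only one half of the picture: the
          per-hop behavior still has to be configured consistently at every
          hop in the notification domain for the marking to have any
          effect.</t>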

          <t>Reliability requirements vary by scenario: some events warrant
          best-effort, low-latency delivery, while others (for example
          recovery state) may warrant acknowledgement or periodic refresh.</t>
        </section>

        <section anchor="domain-isolation">
          <name>Notification Domain and Isolation</name>

          <t>Fast network notifications are confined to a notification domain.
          The framework requires mechanisms to:</t>

          <ul>
            <li>identify a notification domain and its membership;</li>

            <li>ensure notifications are not propagated outside the domain
            without explicit policy;</li>

            <li>prevent notifications from one domain being injected into or
            trusted by another.</li>
          </ul>

          <t>Domain scoping bounds the blast radius of both legitimate
          notification storms and malicious injection, and it aligns the trust
          boundary with the single administrative control assumed by the
          charter.</t>
        </section>
      </section>

      <section anchor="discovery">
        <name>Discovery, Registration, and Subscription</name>

        <t>To deliver notifications only to interested and authorized
        consumers, the framework supports the following procedures. A
        deployment <bcp14>MAY</bcp14> use configuration, dynamic signaling, or
        a combination.</t>

        <dl>
          <dt>Discovery:</dt>

          <dd>Originators and consumers determine the existence, identity, and
          capabilities (event types, encodings, delivery modes) of relevant
          peers and relays within the domain.</dd>

          <dt>Registration:</dt>

          <dd>A node registers as a potential originator or consumer within
          the domain, establishing the trust and addressing state needed for
          delivery.</dd>

          <dt>Subscription:</dt>

          <dd>A consumer expresses interest in specific event types,
          locations, paths, flows, or metric thresholds. A subscription-based
          approach ensures each consumer receives only relevant information,
          reducing unnecessary overhead. Subscriptions <bcp14>MAY</bcp14> be
          brokered by a controller or established directly between nodes.</dd>
        </dl>

        <t>These procedures <bcp14>MAY</bcp14> be realized by reusing existing
        protocols where appropriate, or by new mechanisms defined in the
        protocol specification work.</t>
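
        <t>The subscription procedure can be pictured with the following
        non-normative Python sketch, in which a relay or controller selects
        the consumers whose subscriptions match a notification. All class,
        field, and function names are hypothetical, and the matching rules
        (set membership on event type and location, plus an optional metric
        threshold) are assumptions made for illustration rather than a
        definition of the subscription model.</t>

        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Notification:
    event_type: str                 # for example "link-down" or "congestion"
    location: str                   # hypothetical node or link identifier
    metric: Optional[float] = None  # for example utilization, if reported


@dataclass(frozen=True)
class Subscription:
    consumer: str
    event_types: frozenset
    locations: frozenset = frozenset()        # empty means any location
    metric_threshold: Optional[float] = None  # match only at or above this

    def matches(self, n: Notification) -> bool:
        if n.event_type not in self.event_types:
            return False
        if self.locations and n.location not in self.locations:
            return False
        if self.metric_threshold is not None:
            return n.metric is not None and n.metric >= self.metric_threshold
        return True


def select_consumers(subscriptions, notification):
    """Return the consumers whose subscriptions match the notification."""
    return [s.consumer for s in subscriptions if s.matches(notification)]
```
]]></sourcecode>

        <t>Evaluating subscriptions at the point closest to the originator
        keeps irrelevant notifications off the delivery path, which is the
        overhead reduction the subscription-based approach aims for.</t>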
      </section>

      <section anchor="damping">
        <name>Loop Prevention, Filtering, and Damping</name>

        <t>Because relays may forward notifications and consumers may relay
        further, the solution <bcp14>MUST</bcp14> provide for:</t>

        <dl>
          <dt>Loop prevention:</dt>

          <dd>Use of origin identity, sequence/epoch indicators, scope limits
          (for example a hop or region bound), and duplicate suppression so
          that a notification does not circulate indefinitely.</dd>

          <dt>Filtering and aggregation:</dt>

          <dd>Relays <bcp14>MAY</bcp14> filter notifications that are not
          relevant to downstream consumers and <bcp14>MAY</bcp14> aggregate
          multiple related events to reduce volume.</dd>

          <dt>Damping:</dt>

          <dd>The solution <bcp14>MUST</bcp14> define where responsibility
          lies for handling rapidly changing conditions, such as a flapping
          link. Damping <bcp14>MAY</bcp14> be applied at the originator, at a
          transit relay, or at the consumer (in which case undamped
          notifications are required to reach the consumer); the chosen
          location and its controls <bcp14>MUST</bcp14> be specified
          explicitly. A common policy is to report a degradation immediately
          but to delay reporting the corresponding recovery for a configurable
          interval to confirm stability.</dd>
        </dl>
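
        <t>The loop prevention and damping behaviors above can be sketched,
        non-normatively, as follows. The class and method names are
        hypothetical. The sketch assumes that (epoch, sequence) pairs from a
        given origin are totally ordered, that a remaining-hop count
        accompanies each notification, and that the caller supplies the
        current time in seconds; none of these is mandated by this
        framework.</t>

        <sourcecode type="python"><![CDATA[
```python
class DuplicateSuppressor:
    """Forward a notification only if it is newer than the last one
    accepted from the same origin and its scope limit is not exhausted."""

    def __init__(self):
        self._latest = {}  # origin -> (epoch, sequence) last accepted

    def should_forward(self, origin, epoch, seq, hops_left):
        if hops_left <= 0:
            return False  # scope limit reached
        key = (epoch, seq)
        if key <= self._latest.get(origin, (-1, -1)):
            return False  # duplicate or stale
        self._latest[origin] = key
        return True


class RecoveryDamper:
    """Report degradation immediately; report recovery only after the
    condition has stayed up for a configurable hold interval."""

    def __init__(self, hold_seconds):
        self.hold = hold_seconds
        self.reported_up = True
        self._up_since = None

    def observe(self, is_up, now):
        """Return "down" or "up" when a report is due, otherwise None."""
        if not is_up:
            self._up_since = None  # any flap restarts the hold interval
            if self.reported_up:
                self.reported_up = False
                return "down"
            return None
        if self.reported_up:
            return None
        if self._up_since is None:
            self._up_since = now
        if now - self._up_since >= self.hold:
            self.reported_up = True
            self._up_since = None
            return "up"
        return None
```
]]></sourcecode>

        <t>With a flapping link, this damper emits a single "down" report and
        withholds "up" until the link has been stable for the whole hold
        interval, so consumers do not see each flap.</t>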
      </section>
    </section>

    <section anchor="realization">
      <name>Realization and Operational Considerations</name>

      <t>This section describes how the framework relates to existing
      technologies, the candidate mechanisms that could realize it, the
      applications it enables, and the scaling and operational considerations
      that apply when deploying it. It is informational and does not mandate
      any particular realization.</t>

      <section anchor="integration">
        <name>Integration with Existing Technologies</name>

        <t>A central goal of the framework is integration with existing
        mechanisms across layers, as required by the charter. Fast network
        notifications are complementary to these mechanisms.</t>

        <dl>
          <dt>Layer 2:</dt>

          <dd>Link-layer fault and error detection mechanisms (for example
          physical-layer alarms, FEC error counters, and interface error
          statistics) provide detection inputs to the originator. Layer 2
          protection may serve as a consumer's response.</dd>

          <dt>Layer 3 / Routing:</dt>

          <dd>Fast network notifications complement IGP/BGP-based
          dissemination and FRR <xref target="RFC4090"/> <xref
          target="RFC5714"/>. Whereas a Point of Local Repair acts on a local
          topology view and may cause congestion on a backup path, a
          notification can give upstream nodes a wider view before they react.
          Consumption by routing protocols is a secondary objective and
          <bcp14>MUST</bcp14> preserve routing stability; notifications
          <bcp14>MUST NOT</bcp14> be allowed to induce control-plane churn or
          instability. Topology and inventory models such as <xref
          target="RFC8345"/> may provide context for interpreting location and
          path identifiers.</dd>

          <dt>Layer 4 / Transport:</dt>

          <dd>ECN <xref target="RFC3168"/> signals congestion to the transport
          sender only coarsely and over a full RTT. Fast network notifications
          can deliver richer congestion information to network nodes far
          sooner, but they then act inside the network while end-to-end
          transport congestion control (TCP, QUIC, or RDMA/RoCE) acts at the
          endpoints, so the two loops run concurrently on the same traffic. A
          solution <bcp14>MUST</bcp14> ensure its actions remain a
          net-positive complement to transport: it <bcp14>SHOULD</bcp14>
          preserve per-flow ordering and avoid abrupt RTT or capacity changes
          where feasible, and <bcp14>SHOULD NOT</bcp14> suppress or rewrite
          end-to-end congestion signals such as ECN marks. Where the
          interaction cannot be shown to be benign, a conservative reaction is
          preferred; detailed coordination between network-side and
          transport-side reactions is for further study with the relevant
          transport working groups.</dd>

          <dt>Detection and OAM:</dt>

          <dd>BFD <xref target="RFC5880"/> provides fast bidirectional fault
          detection between endpoints but does not notify other nodes; it can
          serve as a detection input to the originator. Obtaining faster
          detection by shortening BFD transmit intervals increases resource
          consumption, as discussed in <xref target="detection-assumptions"/>.
          IOAM <xref target="RFC9197"/>, the Alternate-Marking Method (AM) <xref
          target="RFC9341"/>, and IPFIX <xref target="RFC7011"/> provide
          detailed data-plane measurements but are not designed for
          lightweight, rapid alerts to specific nodes for immediate action.
          Performance metrics may be defined consistently with <xref
          target="RFC7799"/>. Fast network notifications fill this gap and
          feed, rather than replace, telemetry pipelines.</dd>
        </dl>

        <t>The interaction with each technology, including any required
        protocol extensions, is expected to be developed in the relevant IETF
        working groups.</t>
      </section>

      <section anchor="candidates">
        <name>Candidate Realization Approaches</name>

        <t>This section surveys, non-normatively, classes of mechanism that
        could realize fast network notifications. It does not endorse a
        specific approach; the choice depends on the deployment scenario
        (<xref target="scenarios"/>), and a solution <bcp14>MAY</bcp14>
        combine more than one.</t>

        <dl>
          <dt>Advertisement of link/path status to neighbors, decoupled from
          the IGP:</dt>

          <dd>A node advertises the up/down status and quality of its links or
          paths to neighboring nodes, separately from the IGP and its
          link-state database, so that fast or frequent updates do not perturb
          routing convergence (see the decoupling principle in <xref
          target="design-principles"/>). <xref
          target="I-D.zzhang-rtgwg-router-info"/> illustrates this approach;
          note that what is advertised is link/path reachability (up/down) and
          quality, which is distinct from the node and link state that an IGP
          floods. It suits cases where the consumers are routing or forwarding
          elements that benefit from a wider view without incurring IGP
          churn.</dd>

          <dt>IGP/BGP protocol extensions:</dt>

          <dd>Existing control-plane protocols could be extended to carry
          notification information. This reuses deployed machinery but must be
          weighed carefully against the routing-stability and overhead
          concerns in <xref target="integration"/>, and is generally better
          suited to slower-changing or control-plane-consumed information than
          to the fastest forwarding-plane reactions.</dd>

          <dt>In-band / data-plane signaling:</dt>

          <dd>Notifications are carried in the forwarding plane, for example
          in packet headers or lightweight dedicated packets, so that
          detection, delivery, and consumption can occur in hardware. This
          offers the lowest latency and best matches the intra-DC scenario, at
          the cost of requiring forwarding-hardware support and careful
          scoping. The need for such forwarding-plane notification in AI data
          center fabrics is motivated by <xref
          target="I-D.clad-rtgwg-ipfrr-aiml"/>, which analyzes the limitations
          of existing IP-FRR in these fabrics and the requirements for its
          enhancement. As examples of such notification mechanisms, <xref
          target="I-D.camarillo-rtgwg-lsn"/> defines a fast notification
          protocol that operates above the Ethernet layer, and <xref
          target="I-D.csaszar-rtgwg-ipfrr-fn"/> proposed
          fast-notification-based IP-FRR optimization over a decade ago, with
          the companion <xref target="I-D.lu-fn-transport"/> defining a
          data-plane transport and message container for the notifications
          themselves.</dd>

          <dt>Tunnel- or overlay-based delivery in the WAN:</dt>

          <dd>For multi-site or WAN deployments, notifications may be
          delivered over established tunnels or overlays toward ingress or
          upstream nodes; work such as <xref
          target="I-D.hzh-fantel-wan-tunnel"/> explores fast notification in
          this context for tunnel-based transport.</dd>

          <dt>Telemetry-assisted collection toward the traffic source:</dt>

          <dd>Rather than pushing an alert outward from the detecting node,
          path latency and congestion state are accumulated in-band along the
          path and returned to the traffic source, so that the source obtains
          fresh path state and can steer traffic or adjust its congestion
          response. FALCON <xref target="I-D.song-rtgwg-falcon"/> realizes
          this by combining in-network telemetry with source routing and
          collecting the data on the reverse path toward the source, reducing
          the feedback lag to less than half the baseline RTT; it applies to
          both DCN and WAN and reuses existing IETF mechanisms.</dd>

          <dt>Path- and slice-scoped flow control and backpressure:</dt>

          <dd>Congestion or available-bandwidth information is notified to
          upstream nodes along the forwarding path and acted upon as flow
          control, scoped per path segment or per slice so that control
          applies at tenant or task granularity. <xref
          target="I-D.liu-rtgwg-srv6-cc"/> uses SRv6 segments and slicing to
          throttle specific flows for lossless transmission, and <xref
          target="I-D.han-rtgwg-fine-grained-backpressure"/> extends Layer 2
          PFC into the WAN with hop-by-hop backpressure messages and
          slice-based isolation. This approach suits the managed multi-hop DCI
          scenario of <xref target="scenarios"/>.</dd>
        </dl>

        <t>Each approach trades off latency, hardware dependence, protocol
        reuse, and impact on routing stability differently, and fits some
        scenarios in <xref target="scenarios"/> better than others.
        Coordination when multiple recipients act on the same notification is
        out of scope and for further study.</t>
      </section>

      <section anchor="applications">
        <name>Illustrative Applications</name>

        <t>This section sketches, non-normatively, applications that fast
        network notifications enable. The actions themselves are out of scope
        (<xref target="scope"/>); they illustrate what the information in
        <xref target="info-model"/> makes possible.</t>

        <dl>
          <dt>Upstream (remote) protection:</dt>

          <dd>On receiving a link or node failure notification, a node several
          hops upstream activates a pre-computed backup path instead of relying
          only on local repair, avoiding the hairpinning that purely local
          alternates (LFA or TI-LFA) can introduce. Efficient Remote
          Protection <xref
          target="I-D.clad-rtgwg-efficient-remote-protection"/> describes such
          a mechanism and applies both to failures and to degradations such as
          reduced capacity or congestion.</dd>

          <dt>Fast protection in AI/ML fabrics:</dt>

          <dd>AI/ML fabrics need convergence within tens of microseconds,
          requiring notification and reaction within the forwarding plane
          without CPU intervention. <xref target="I-D.clad-rtgwg-ipfrr-aiml"/>
          analyzes the limitations of existing IP-FRR in such fabrics and the
          requirements for fast, forwarding-plane protection.</dd>

          <dt>Graduated load-sharing and flow control:</dt>

          <dd>When a notification carries fine-grained status (utilization,
          available bandwidth, or capacity degradation) rather than a binary
          up/down, an upstream node can rebalance load-sharing weights in
          proportion to the reported severity, or apply per-flow, per-tenant,
          or per-slice flow control and backpressure toward the congestion
          point instead of the coarse, port-level pausing of link-layer PFC.
          Both responses are graduated and capacity-aware, shifting or
          throttling traffic gradually rather than all at once; the realizing
          mechanisms are surveyed in <xref target="candidates"/>.</dd>
        </dl>
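
        <t>As a non-normative illustration of graduated load-sharing, the
        following Python sketch derives normalized load-sharing weights from
        the available bandwidth reported for each next hop. The function name
        and the proportional policy are assumptions chosen for illustration; a
        real consumer might apply smoothing, minimum weights, or hardware
        weight quantization.</t>

        <sourcecode type="python"><![CDATA[
```python
def rebalance_weights(available_bw):
    """Map per-next-hop available bandwidth to load-sharing weights.

    available_bw: dict of next hop -> reported available bandwidth (any
    consistent unit). Returns a dict of next hop -> weight; weights sum
    to 1.
    """
    if not available_bw:
        raise ValueError("at least one next hop is required")
    if any(bw < 0 for bw in available_bw.values()):
        raise ValueError("available bandwidth cannot be negative")
    total = sum(available_bw.values())
    if total == 0:
        # No usable capacity reported anywhere: fall back to equal sharing.
        share = 1.0 / len(available_bw)
        return {hop: share for hop in available_bw}
    return {hop: bw / total for hop, bw in available_bw.items()}
```
]]></sourcecode>

        <t>For example, if one of two equal next hops reports that its
        available bandwidth has halved, this policy shifts the split from
        one half each to two thirds and one third, rather than removing the
        degraded next hop entirely.</t>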
      </section>

      <section anchor="scaling">
        <name>Scaling Considerations</name>

        <t>The solution must remain effective as the network grows. Scaling
        pressure arises from network size (the number of nodes and links that
        may report events), the volume and rate of change of reported
        information, and the number of consumers. The design assumes
        worst-case conditions: the system must cope with a high proportion of
        nodes and links reporting simultaneously.</t>

        <t>The framework addresses scale through subscription (delivering only
        relevant information), scoping and domain isolation (bounding
        propagation), relay-based filtering and aggregation, damping of
        rapidly changing conditions, and transport prioritization and rate
        limiting. Protocol specifications <bcp14>SHOULD</bcp14> quantify the
        load their mechanisms place on the forwarding and control planes under
        worst-case event conditions.</t>
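
        <t>One well-known way to bound notification load is a token bucket.
        The following non-normative Python sketch is offered only as an
        example of rate limiting; the class name, parameters, and the choice
        of a token bucket are assumptions, and the caller supplies the
        current time in seconds.</t>

        <sourcecode type="python"><![CDATA[
```python
class TokenBucket:
    """Admit at most `burst` notifications at once and `rate_per_s`
    notifications per second on average."""

    def __init__(self, rate_per_s, burst):
        self.rate = float(rate_per_s)
        self.burst = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = None            # time of the previous call

    def allow(self, now):
        """Return True if a notification may be sent at time `now`."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```
]]></sourcecode>

        <t>With such a limiter, the number of notifications admitted in any
        interval of T seconds is at most burst + rate * T, which gives a
        simple worst-case figure that a protocol specification could
        quote.</t>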
      </section>

      <section anchor="operational">
        <name>Operational Considerations</name>

        <t>Fast network notifications introduce additional traffic. During the
        failures and congestion events they report, the notification system
        <bcp14>MUST NOT</bcp14> exacerbate the situation and
        <bcp14>SHOULD</bcp14> actively assist in mitigating it. Operators
        <bcp14>SHOULD</bcp14> be able to configure which event types trigger
        notifications, the delivery modes and scopes used, damping and
        rate-limiting parameters, and prioritization, so that notification
        behavior aligns with network operation policies.</t>

        <t>Management and configuration of the solution are expected to be
        supported by YANG modules, to be defined as a separate deliverable
        consistent with the charter. Manageability includes observability of
        the notification system itself (counts, drops, damping events) so
        operators can verify it is helping rather than harming.</t>
      </section>
    </section>

    <section anchor="security">
      <name>Security Considerations</name>

      <t>If not properly authenticated and rate-limited, fast network
      notifications could be a denial-of-service vector: an attacker that
      injects or floods spurious notifications could trigger unnecessary
      re-convergence, path changes, or repeated state updates, and could
      induce state flapping to keep an originator busy. Notifications may also
      reveal sensitive operational information, whether by inspection or by an
      adversary registering as a consumer.</t>

      <t>Accordingly, solutions built on this framework <bcp14>MUST</bcp14>
      provide integrity protection and origin authentication of notifications,
      <bcp14>MUST</bcp14> apply rate controls on both sending and receiving,
      and <bcp14>MUST</bcp14> address trust boundaries around domains and
      subscriptions, authorization of notification sources, and protection of
      sensitive operational data. Because stronger security can add latency,
      the trade-off between notification latency and security strength needs
      to be evaluated for each deployment scenario. Domain identification and
      isolation (<xref target="domain-isolation"/>) are central to confining
      notifications to the trusted administrative boundary.</t>
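
      <t>As a non-normative illustration of integrity protection, the
      following Python sketch appends an HMAC-SHA-256 tag to a notification
      payload and verifies it on receipt. The function names and message
      layout are hypothetical. A key shared across the domain only
      authenticates domain membership; per-originator keys or signatures would
      be needed for origin authentication, and replay protection (for example
      using the sequence or epoch indicators) is not shown.</t>

      <sourcecode type="python"><![CDATA[
```python
import hashlib
import hmac

TAG_LEN = 32  # SHA-256 digest size in bytes


def protect(key: bytes, payload: bytes) -> bytes:
    """Return the payload followed by its HMAC-SHA-256 tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify(key: bytes, message: bytes):
    """Return the payload if the tag verifies, otherwise None."""
    if len(message) < TAG_LEN:
        return None
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return payload if hmac.compare_digest(tag, expected) else None
```
]]></sourcecode>

      <t>A symmetric tag of this kind is cheap to compute, which is one
      reason the latency and security trade-off has to be weighed against
      what forwarding hardware can do at line rate.</t>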

      <t>The charter's restriction to a single administrative control reduces,
      but does not eliminate, the threat surface. Because the operator
      controls every originator, relay, and consumer and the trust boundary
      coincides with the notification domain (<xref target="domain-isolation"/>), the
      boundary can drop notifications arriving from outside it (constraining
      external injection and spoofing), exposure of operational data to third
      parties is bounded, and trust for discovery, registration, and
      subscription can reuse existing intra-domain infrastructure. This lets
      the design favor lightweight, low-latency mechanisms internally while
      concentrating stronger enforcement at the domain boundary.</t>

      <t>This assumption does not remove the need for in-domain protection:
      insider threats, compromised or malfunctioning nodes, and the
      self-inflicted denial of service of a flapping link all originate inside
      the boundary. The requirements above therefore still apply within the
      domain, and the single-administrative-control premise should be treated
      as defense in depth rather than a substitute for them.</t>
    </section>

    <section anchor="iana">
      <name>IANA Considerations</name>

      <t>This document requires no IANA actions.</t>
    </section>
  </middle>

  <back>
    <references>
      <name>References</name>

      <references>
        <name>Normative References</name>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>
      </references>

      <references>
        <name>Informative References</name>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2474.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3168.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3246.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4090.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5714.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5880.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7011.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7799.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8345.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9197.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9341.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-rtgwg-net-notif-ps.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.geng-fantel-fantel-requirements.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.geng-fantel-fantel-gap-analysis.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.liu-rtgwg-srv6-cc.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.han-rtgwg-fine-grained-backpressure.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.song-rtgwg-falcon.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.zzhang-rtgwg-router-info.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.hzh-fantel-wan-tunnel.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.clad-rtgwg-ipfrr-aiml.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.clad-rtgwg-efficient-remote-protection.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.camarillo-rtgwg-lsn.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.csaszar-rtgwg-ipfrr-fn.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.lu-fn-transport.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>
      </references>
    </references>

    <section anchor="acknowledgements" numbered="false">
      <name>Acknowledgements</name>

      <t/>
    </section>
  </back>
</rfc>
