Internet-Draft RIFT July 2024
Przygienda Expires 8 January 2025
Workgroup: RIFT Working Group
Internet-Draft: draft-przygienda-rift-adrift-00
Published: July 2024
Intended Status: Standards Track
Expires: 8 January 2025
Author: A. Przygienda, Ed., Juniper Networks

AD-RIFT: Adaptive RIFT

Abstract

Adaptive RIFT (AD-RIFT for short) extends RIFT to carry additional link and node information, primarily related to traffic engineering. This enables the southbound computation to place traffic optimally depending on available resources and usage. Additionally, a selective disaggregation, similar to negative disaggregation, is introduced that allows northbound forwarding toward specific prefixes to be balanced between nodes at the same level depending on available resources.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 8 January 2025.

1. Introduction

RIFT in its original form [I-D.ietf-rift-rift] does not propagate any metrics besides bandwidth and general cost. It also abstains from propagating dynamic metrics across the fabric. Environments that rely on, for lack of a better term, adaptive routing need additional metrics; adaptive here means reacting to link usage or to more complex metrics such as delay. One such use case is the placement of long-lived, fairly fat flows in the fabric. With some additional metrics and mechanisms, RIFT is ideally suited to performing this task in a distributed way because it does not depend on shortest-path forwarding.

2. Extra Node Information TIE AdditionalNodeInfoTIEElement

As a first, fairly obvious mechanism, a new TIE with the same scope as a node TIE is introduced. These TIEs SHOULD be flooded at low priority. They carry link metrics per traffic class in AdditionalNodeInfoTIEElement, which helps the computation of forwarding in the southbound direction.

Because the additional information contains many variables describing link properties for advanced computation, the lack of such TIEs from a node may lead to its links not being considered for forwarding.
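The following is a minimal sketch, not part of the specification, of how a computation might use this information to select links for a flow in one traffic class. The Python classes mirror only a subset of the schema fields defined in Section 4. The function name `eligible_links`, its parameters, and the admission policy (the link must advertise a maximum delay within budget and enough `maximum_available_bandwidth` for the class) are assumptions made for illustration. A link with no advertised information is represented as `None` and is skipped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PerClassInfo:
    # Subset of AdditionalLinkInfoTypePerTrafficClassType; every field is optional.
    used_bandwidth: Optional[int] = None               # Mbit/s
    maximum_available_bandwidth: Optional[int] = None  # Mbit/s


@dataclass
class LinkInfo:
    # Subset of AdditionalLinkInfoType.
    max_delay: Optional[int] = None                    # nanoseconds
    per_class_info: Dict[int, PerClassInfo] = field(default_factory=dict)


def eligible_links(links: Dict[int, Optional[LinkInfo]], traffic_class: int,
                   needed_mbps: int, delay_budget_ns: int) -> List[int]:
    """Return the link IDs that satisfy the (assumed) admission policy."""
    usable = []
    for link_id, info in sorted(links.items()):
        if info is None or info.max_delay is None:
            continue  # no additional information advertised: not considered
        per_class = info.per_class_info.get(traffic_class)
        if per_class is None or per_class.maximum_available_bandwidth is None:
            continue  # nothing known for this traffic class
        if info.max_delay > delay_budget_ns:
            continue  # too slow for the flow
        if per_class.maximum_available_bandwidth < needed_mbps:
            continue  # not enough room for the flow
        usable.append(link_id)
    return usable


links = {
    1: LinkInfo(max_delay=500,
                per_class_info={0: PerClassInfo(maximum_available_bandwidth=40_000)}),
    2: LinkInfo(max_delay=500,
                per_class_info={0: PerClassInfo(maximum_available_bandwidth=5_000)}),
    3: None,  # the neighbor flooded no additional node information
}
print(eligible_links(links, traffic_class=0, needed_mbps=10_000,
                     delay_budget_ns=1_000))  # prints [1]
```

Whether an implementation excludes such links outright or only deprioritizes them is a local policy choice; the text above only states that they may not be considered.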

3. Traffic Engineering Prefix Preference TEPrefixPreferenceTIEElement

Since, for scaling reasons, nodes toward the south do not see the full topology and under normal conditions operate using only a default route, another mechanism has to be used to steer traffic according to available resources for prefixes that are exceptional in terms of available resources. A similar mechanism already exists in RIFT in the form of negative disaggregation, which can be considered the most extreme form of traffic steering, namely complete refusal to accept traffic to a prefix. Accordingly, a new TIE type, TEPrefixPreferenceTIEElement, is introduced that carries a negative preference with regard to a traffic class for specific prefixes and follows the prefix TIE flooding scope.

This mechanism presupposes that, during northbound computation and just as with negative disaggregation, a node modifies the next-hop gateway weights based on the next hops of the less specific aggregate.
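As an illustration only, the weight modification could be sketched as follows. The function `te_gateway_weights`, the constant `MAX_PREFERENCE` (taken here as 127, the largest value an i8 can hold), and the rule of multiplying the aggregate's weight by the preference are assumptions; this document does not prescribe a formula. Gateways that advertised nothing for the prefix are treated as having maximum preference, and preferences below 1 are clamped to 1 so that the prefix stays reachable through every gateway.

```python
MAX_PREFERENCE = 127  # assumption: largest value of the i8 RelativePreferenceType


def te_gateway_weights(aggregate_next_hops, advertised_preference):
    """Derive next-hop weights for a more specific prefix.

    aggregate_next_hops: gateway -> weight inherited from the less specific
        aggregate (for example the default route).
    advertised_preference: gateway -> preference that gateway advertised for
        the prefix in one traffic class; silent gateways are absent.
    """
    weights = {}
    for gateway, base_weight in aggregate_next_hops.items():
        preference = advertised_preference.get(gateway, MAX_PREFERENCE)
        # Hypothetical scaling rule: weight grows linearly with preference.
        weights[gateway] = base_weight * max(preference, 1)
    return weights


default_route = {"spine1": 10, "spine2": 10, "spine3": 10}
print(te_gateway_weights(default_route, {"spine2": 32}))
# prints {'spine1': 1270, 'spine2': 320, 'spine3': 1270}
```

In this example spine2 still receives traffic for the prefix, but roughly a quarter of the share that each of the two silent gateways receives.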

In the southbound direction the analogy to negative disaggregation holds as well. A ToF node performs the computation from the viewpoints of all nodes in its plane and, upon realizing that it has fewer resources to a prefix than other ToFs, issues a TEPrefixPreferenceTIEElement.
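The decision a ToF node takes can be sketched as below. This is a hedged illustration: the function `preference_to_advertise`, the single scalar "resources" value per ToF, and the linear mapping of the resource ratio onto the range 1..127 are all assumptions, since this document does not define how resources are compared or how a preference value is derived.

```python
MAX_PREFERENCE = 127  # assumption: largest value of the i8 RelativePreferenceType


def preference_to_advertise(own_tof, resources_by_tof):
    """Return the preference own_tof would advertise for one prefix, or None.

    resources_by_tof: ToF name -> resources available toward the prefix
        (for example Mbit/s), as computed for every ToF in the plane.
    """
    best = max(resources_by_tof.values())
    own = resources_by_tof[own_tof]
    if best <= 0 or own >= best:
        return None  # the best-placed ToF abstains from advertising
    # Hypothetical mapping: scale the resource ratio into 1..MAX_PREFERENCE.
    return max(1, (own * MAX_PREFERENCE) // best)


plane = {"tof1": 40_000, "tof2": 10_000}
print(preference_to_advertise("tof1", plane))  # prints None
print(preference_to_advertise("tof2", plane))  # prints 31
```

An implementation would likely add hysteresis so that small resource fluctuations do not cause the TIE to be repeatedly originated and withdrawn.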

The analogy to negative disaggregation is imperfect, however, when it comes to transitivity. Generally, not all northbound nodes will issue a TEPrefixPreferenceTIEElement, since the node with the best resource availability will abstain from generating it. Hence, the prefix is always reachable, and in the northbound direction all nodes at the same level see the same preference. However, a node could generate a TEPrefixPreferenceTIEElement for the prefix again southbound if it sees a large imbalance north of it, though an extended BAD computation can take care of this as well.

In terms of precedence, the negative disaggregation computation has to be performed first; only after the corresponding gateways have been constructed or purged can the corresponding TE prefix preference modify the gateway preferences further. Observe that negative disaggregation can operate on a completely different prefix resolution than TE preference: negative disaggregation may suppress forwarding to a prefix, while TE preference may disaggregate a more specific prefix and still forward toward it, even though negative disaggregation has already suppressed a less specific prefix for some of the gateways.
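The ordering can be sketched for the simple case in which both mechanisms apply to the same prefix; interaction across different prefix lengths is deliberately left out. The function `resolve_gateways`, its arguments, and the multiplicative weighting are assumptions made for this illustration.

```python
MAX_PREFERENCE = 127  # assumption: largest value of the i8 RelativePreferenceType


def resolve_gateways(aggregate_next_hops, negative_from, te_preference):
    """Apply negative disaggregation first, then TE prefix preference.

    aggregate_next_hops: gateway -> weight from the less specific aggregate.
    negative_from: set of gateways that negatively disaggregated the prefix.
    te_preference: gateway -> advertised TE preference for the prefix.
    """
    # Stage 1: purge gateways that refuse traffic to the prefix altogether.
    gateways = {gw: weight for gw, weight in aggregate_next_hops.items()
                if gw not in negative_from}
    # Stage 2: only the surviving gateways have their weights adjusted.
    return {gw: weight * max(te_preference.get(gw, MAX_PREFERENCE), 1)
            for gw, weight in gateways.items()}


print(resolve_gateways({"spine1": 10, "spine2": 10, "spine3": 10},
                       negative_from={"spine3"},
                       te_preference={"spine2": 32, "spine3": 5}))
# prints {'spine1': 1270, 'spine2': 320}
```

The preference that spine3 advertised has no effect here because the gateway was already purged in the first stage.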

Just as with negative disaggregation, the lack of a prefix preference advertisement from a node MUST be interpreted as the node accepting traffic to the prefix with maximum preference.

4. Schema Extensions

Because the TIETypeType enumeration gains a new TIE type whose scope differs from the prefix TIE scope, a new major version of the schema MUST be introduced; that is, AD-RIFT is not compatible with base RIFT. Given that the new TIEs are completely optional, an attempt could instead be made to introduce only a minor version with corresponding NodeCapabilities extensions.



/** Type of TIE.
*/
enum TIETypeType {
    Illegal                                     = 0,
    TIETypeMinValue                             = 1,
    /** first legal value */
    NodeTIEType                                 = 2,
    PrefixTIEType                               = 3,
    PositiveDisaggregationPrefixTIEType         = 4,
    NegativeDisaggregationPrefixTIEType         = 5,
    PGPrefixTIEType                             = 6,
    KeyValueTIEType                             = 7,
    ExternalPrefixTIEType                       = 8,
    PositiveExternalDisaggregationPrefixTIEType = 9,
    TEPrefixPreferenceTIEType                   = 10,
    AdditionalNodeInfoTIEType                   = 11,
    TIETypeMaxValue                             = 12,
}

typedef i8  TrafficClassType
/** The preference to carry traffic to this prefix compared to other nodes in the same level. Higher numbers
    show higher preference to carry traffic */
typedef i8  RelativePreferenceType

struct TEPrefixPreferenceTIEElement {
    1: required map<TrafficClassType, map<common.IPPrefixType, RelativePreferenceType>> prefixes;
}

typedef i32 DelayinNanoSecType
typedef i32 DelayVariationPerMilliSecinNanoSecType
typedef i32 PacketLossPerBillionPacketsType
typedef i32 AdministrativeGroupType

struct AdditionalLinkInfoTypePerTrafficClassType {
    1: optional PacketLossPerBillionPacketsType           loss;
    2: optional DelayVariationPerMilliSecinNanoSecType    delay_variation;
    3: optional common.BandwithInMegaBitsType             used_bandwidth;
    4: optional common.BandwithInMegaBitsType             maximum_available_bandwidth;
}

struct AdditionalLinkInfoType {
    1: optional DelayinNanoSecType                                                  min_delay;
    2: optional DelayinNanoSecType                                                  max_delay;
    3: optional common.BandwithInMegaBitsType                                       bandwidth;
    4: optional AdministrativeGroupType                                             admin_group;
   10: optional map<TrafficClassType, AdditionalLinkInfoTypePerTrafficClassType>    per_class_info;
}

struct AdditionalNodeInfoTIEElement {
    1: optional map<common.LinkIDType, AdditionalLinkInfoType>      links;
}

/** Single element in a TIE. */
union TIEElement {
    /** Used in case of enum common.TIETypeType.NodeTIEType. */
    1: optional NodeTIEElement     node;
    /** Used in case of enum common.TIETypeType.PrefixTIEType. */
    2: optional PrefixTIEElement          prefixes;
    /** Positive prefixes (always southbound). */
    3: optional PrefixTIEElement   positive_disaggregation_prefixes;
    /** Transitive, negative prefixes (always southbound) */
    5: optional PrefixTIEElement   negative_disaggregation_prefixes;
    /** Externally reimported prefixes. */
    6: optional PrefixTIEElement          external_prefixes;
    /** Positive external disaggregated prefixes (always southbound). */
    7: optional PrefixTIEElement
            positive_external_disaggregation_prefixes;
    /** Key-Value store elements. */
    9: optional KeyValueTIEElement keyvalues;
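    /** <adrift> Elements introduced by AD-RIFT. */
    /** Used in case of enum common.TIETypeType.TEPrefixPreferenceTIEType. */
   10: optional TEPrefixPreferenceTIEElement
            traffic_engineering_preference_prefix;
    /** Used in case of enum common.TIETypeType.AdditionalNodeInfoTIEType. */
   11: optional AdditionalNodeInfoTIEElement
            additional_node;
    /** </adrift> */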
}

Figure 1: RIFT information distribution
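To make the nested map in TEPrefixPreferenceTIEElement concrete, the sketch below builds the same shape from a flat list of entries. It is an illustration under stated assumptions: the helper `build_te_prefix_preference` is hypothetical, prefixes are represented as strings rather than as the `common.IPPrefixType` union, and plain Python dictionaries stand in for Thrift-generated classes. The range check reflects that Thrift's i8 is a signed 8-bit integer.

```python
import ipaddress

I8_MIN, I8_MAX = -128, 127  # value range of a Thrift i8


def build_te_prefix_preference(entries):
    """Group (traffic_class, prefix, preference) triples into the nested
    map shape of TEPrefixPreferenceTIEElement.prefixes."""
    prefixes = {}
    for traffic_class, prefix, preference in entries:
        for name, value in (("traffic class", traffic_class),
                            ("preference", preference)):
            if not I8_MIN <= value <= I8_MAX:
                raise ValueError(f"{name} {value} does not fit in an i8")
        network = ipaddress.ip_network(prefix)  # rejects malformed prefixes
        prefixes.setdefault(traffic_class, {})[str(network)] = preference
    return {"prefixes": prefixes}


element = build_te_prefix_preference([
    (0, "10.1.0.0/24", 32),
    (0, "2001:db8::/48", 64),
    (3, "10.1.0.0/24", 100),
])
print(element)
```

Keying the outer map by traffic class means a receiver can discard whole classes it does not implement without parsing the per-prefix entries.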

5. Normative References

[I-D.ietf-rift-rift]
Przygienda, T., Head, J., Sharma, A., Thubert, P., Rijsman, B., and D. Afanasiev, "RIFT: Routing in Fat Trees", Work in Progress, Internet-Draft, draft-ietf-rift-rift-24, <https://datatracker.ietf.org/doc/html/draft-ietf-rift-rift-24>.

Author's Address

Tony Przygienda (editor)
Juniper Networks
1137 Innovation Way
Sunnyvale, CA 94089
United States of America