Internet-Draft draft-hegde-lsr-ospf-better-idbx July 2024
Hegde, et al. Expires 9 January 2025 [Page]
Workgroup:
Network Working Group
Internet-Draft:
draft-hegde-lsr-ospf-better-idbx-02
Published:
Intended Status:
Informational
Expires:
Authors:
S. Hegde
Juniper
A. Przygienda
Juniper
A. Lindem
LabN Consulting LLC

Improved OSPF Database Exchange Procedure

Abstract

When an OSPF router undergoes restart, previous instances of LSAs belonging to that router may remain in the databases of other routers in the OSPF domain until such LSAs are aged out. Hence, when the restarting router joins the network again, neighboring routers re-establish adjacencies while the restarting router is still bringing-up its interfaces and adjacencies and generates LSAs with sequence numbers that may be lower than the stale LSAs. Such stale LSAs may be interpreted as bi-directional connectivity before the initial database exchanges are finished and genuine bi-directional LSA connectivity exists. Such incorrect interpretation may lead to, among other things, transient traffic packet drops. This document suggests improvements in the OSPF database exchange process to prevent such problems due to stale LSA utilization. The solution does not preclude changes in the existing standard but presents an extension that will prevent this scenario.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 9 January 2025.

Table of Contents

1. Introduction

When an OSPF [RFC2328] router restarts, its stale LSAs are left in the database of other routers in the OSPF domain until the LSAs are aged out either intentionally or by the LSA age elapsing. The stale Router LSA can contain links to all the neighbors that had Full adjacencies before the router restarted.

                            ----------C--------
                            |                 |
                      A-----B                 E-----F
                            |                 |
                            --------D---------


Figure 1: OSPF Network

Figure 1 shows a very simple OSPF network. In case of C undergoing restart that does not generate purges, the other routers in the domain will hold the stale LSA of Router C in their database. The stale LSA may have links to B and E, which represents the topology of C before it went down. When C restarts again, it initiates the database exchange process with B and E. B and E may have C's stale LSA with a higher sequence number in their database than the new ones originated by C and hence erroneously assume those being the newest copy, successively bringing up the adjacency with C, and transitioning to Full state. Based on C's Stale LSA having LSA links to B and E, the Shortest Path First (SPF) back-link check is satisfied and B and E update their routing table to point to C. This may cause C to drop this traffic as C may not have all its previous adjacencies up and all LSAs in place to correctly compute the necessary routes. The situation corrects itself with C reissuing the LSAs with even higher sequence numbers over time so the condition is transient today already.

2. Solution

To prevent the transient condition described the database exchange procedure from [RFC2328] section 7.2 is extended with additional constraints to prevent an OSPF router from transitioning to Full state when it has stale LSAs originated by the database exchange neighbor in its Link State Database (LSDB).

During an IDBX the router determined it should become adjacent with another OSPF router and for that purpose initially it enters the ExStart state and creates a "Database summary list" for the neighbor. In addition to this list a "Stale database exchange list" is created for such neighbor and is initially populated with LSAs in LSDB which were originated by it. The neighbor will not transition to Full state until both the Link state request list and the Stale database exchange list are empty. During the Database Exchange process entries are removed from the Stale database exchange list when the same or a more recent LSAs is referenced in a Database Description packet or the neighbor originates a more recent instance of the LSA, and it is received in a Link State Update packet. For stale LSAs in the router's Link State Database, the neighbor will request stale LSAs and originate more recent instances via normal OSPF procedures as illustrated in Section 2.2.

2.1. OSPF Specification Additions and Modifications

The following additions and modifications to OSPF [RFC2328] are needed for this feature. The application of the feature SHOULD be based on local configuration (refer to Section 4).

  1. In section 10.3 "The Neighbor State Machine", State: ExStart, Event: NegotiationDone - The "Stale database exchange list" is created including LSAs with the "Advertising Router" set to the neighbor's Router ID.
  2. In section 10.3 "The Neighbor State Machine", State: Exchange, Event: ExchangeDone - The new neighbor state will not be Full unless both the Link state request list and the Stale database exchange list are empty.
  3. In section 10.3 "The Neighbor State Machine", State: Exchange or greater, Event: AdjOk? - If an adjacency will not be formed, the Stale database exchange list is also cleaned up.
  4. In section 10.3 "The Neighbor State Machine", State: Any state, Event: KillNbr - The Stale database exchange list is also cleaned up.
  5. In section 10.3 "The Neighbor State Machine", State: Any state, Event: LLDown - The Stale database exchange list is also cleaned up.
  6. In section 10.3 "The Neighbor State Machine", State: Any state, Event: InactivityTimer - The Stale database exchange list is is also cleaned up.
  7. In section 10.3 "The Neighbor State Machine", State: 2Way or greater, Event: 1-WayReceived - The Stale database exchange list is is also cleaned up.
  8. In section 10.6 "Receiving Database Description Packets", If the Database Description packet is accepted, LSAs originated by the neighbor that are not less recent than the local database copy will be removed from the Stale database exchange list. LSAs that are less recent than the local database copy will remain on th State database exchange list and will be updated by the neighbor through the Loading phase of the Database exchange process.
  9. In section 13, "The Flooding Procedure", (5), (g): If the orignator of the more recent LSA is a neighbor that is in Exchange or Loading state, remove the LSA from the Stale database exchange list. If the neighbor is in Loading state and both the List state request list and the Stale database exchange list are empty, the Loading Done event is generated. In most cases, the LSA will have been received from the neighbor in Exchange or Loading state. However, this cannot be guaranteed due to packet loss and different propagation delays. Additionally, the neighbor may be in Exchange and Loading state on more than one interface.

2.2. Example

Figure 2 provides an example of C restarting having originated an LSA with sequence number Y before. After restarting C originates the same LSA with sequence number X where X < Y since it is not aware of existence of version X yet.



                 C-----------------E
                               Create Stale DB Exchange
                                List

      DBD: LSA A origin:C  ------->
          Sequence:X
                              DBD: LSA A Origin:C
                      <------       Sequence:Y
      LSReq LSA A,
      origin C       ----->
                            Modified Procedure

                     <------- LSA  A, Origin C, Seq:Y
      C re-originating self LSA
      LSA A, Origin C, -------> Remove Stale
      Seq:Y+1                   LSA A, Origin C
                                From Stale DB Exchange
                                list.

                                Bring adjacency to Full state if
                                both LS Request list and Stale DB
                                Exchange list are empty.



Figure 2: Modified Database Exchange Procedure

As shown in figure Figure 2 above, E originates LSReq with Sequence number X but waits until the LSA with sequence number Y+1 (or strictly speaking, an LSA that compares as newer to the one it holds) arrives. As the LSA is still in the Stale DB Exchange List, the adjacency will remain in Loading state and will not move to Full state. All the neighbors of the restarting routers hold the neighbor FSM in Loading state and do not transition to Full state until the stale LSA is replaced with the new LSA that is more recent than the stale LSA. This ensures that other routers in the network do not compute a path through the restarting router since they cannot satisfy the bi-directionality condition in SPF computations.

3. Potential Optimizations

The solution described in Section 2 assures stale LSAs for a restarting router are either updated or purged before neighbors of the restarting router advertise a link to the restarting router. However, there are several optimizations being considered.

3.1. Restarting Neighbor Tracking

OSPF Routers would maintain a "Stale Neighbor List" containing the router IDs and interfaces for neighbor that were previously in Full state but recently went down. The procedures in Section 2 would then only be applied for neighbors on the Stale Neighbor List. Neighbors would be removed from the Stale Neighbor List when a new adjacency is formed with the neighbor or after a configured interval. The interval SHOULD be longer than the expected duration of an unplanned neighbor restart.

3.2. Limited LSA Tracking

Rather than adding all the restarting router's LSAs in the Link-State Database (LSDB) to the Stale DB Exchange list, only the following LSAs would be added:

  • Router-LSA containing one or more links to the local router
  • Network-LSAs containing a link to the local router

The procedures in Section 2 will guarantee that the LSAs on the Stale DB Exchange list are updated before the local router advertises a link to the restarting router. Given that the these stale LSAs are the only ones containing a link to the local router, their update will prevent usage of the link until the restarting router has established an adjacency with the local router. As part of the restarting router establishing an adjacency and advertising a link to the local router, the restarting router will synchronize its LSDB with the local router and will update or purge stale LSAs previously originated by the restarting router. These stale LSAs will be requested by the restarting router using Link State Request packets (section 10.9 in [RFC2328]) and, when received, updated or purgeed using the procedure specified in section 13.4 [RFC2328].

The restarting router's LSAs containing a link to the local router are added to the Stale DB Exchange list and will be updated or purged prior to the local router advertising a link to the restarting router. Additionally, the restarting router will not advertise a link to the local router until it has synchronized its LSDB with the local router. Hence, the restarting router's other stale LSAs will be updated or purged prior both routers advertising a link and these other stale LSAs need not be added to the Stale DB Exchange list.

4. Management Considerations

Application the Database exchange procedure specified in this documennt SHOULD be based on local configuration with the default behavior not to perform the addition Stale database exchange list processed specified in this document. Not all OSPF networks require elimination of traffic drops during the Database exchange process (as evidenced by the fact that OSPF has been deployed for several decades without this added packet loss mitigation). Additionally, optional configuration will encourage implementation and deployment.

5. IANA Considerations

No IANA Considerations

6. Security Considerations

7. Contributors

8. References

8.1. Informative References

8.2. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/info/rfc2119>.
[RFC2328]
Moy, J., "OSPF Version 2", STD 54, RFC 2328, DOI 10.17487/RFC2328, , <https://www.rfc-editor.org/info/rfc2328>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, , <https://www.rfc-editor.org/info/rfc8174>.

Authors' Addresses

Shraddha Hegde
Juniper
India
Tony Przygienda
Juniper
1133 Innovation Way
Sunnyvale, CA
United States of America
Acee Lindem
LabN Consulting LLC
301 Midenhall Way
Cary, North Carolina
United States of America