<?xml version='1.0' encoding='utf-8'?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->
<?xml-model href="rfc7991bis.rnc"?>  <!-- Required for schema validation and schema-aware editing -->
<!-- <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> -->
<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->

<rfc
      xmlns:xi="http://www.w3.org/2001/XInclude"
      category="std"
      docName="draft-zhang-idr-portid-ec-01"
      ipr="trust200902"
      obsoletes=""
      updates=""
      submissionType="IETF"
      xml:lang="en"
      tocInclude="true"
      tocDepth="4"
      symRefs="true"
      sortRefs="true"
      version="3">
  <!-- xml2rfc v2v3 conversion 2.38.1 -->
  <!-- category values: std, bcp, info, exp, and historic
    ipr values: trust200902, noModificationTrust200902, noDerivativesTrust200902,
       or pre5378Trust200902
    you can add the attributes updates="NNNN" and obsoletes="NNNN" 
    they will automatically be output with "(if approved)" -->

 <!-- ***** FRONT MATTER ***** -->

 <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
        full title is longer than 39 characters -->

   <title abbrev="BGP PORT EC for AIDC">BGP PORT EC for AIDC</title>
    <seriesInfo name="Internet-Draft" value="draft-zhang-idr-portid-ec-01"/>

   <author fullname="Junye Zhang" initials="J" surname="Zhang">
      <organization>China Mobile</organization>
      <address>
        <postal>
          <street/>
          <!-- Reorder these if your country does things differently -->

         <city></city>
          <region/>
          <code/>
          <country>China</country>
        </postal>
        <phone></phone>
        <email>zhangjunye@chinamobile.com</email>
        <!-- uri and facsimile elements may also be added -->
     </address>
    </author>
	
	<author fullname="Rui Zhuang" initials="R" surname="Zhuang">
      <organization>China Mobile</organization>
      <address>
        <postal>
          <street/>
          <!-- Reorder these if your country does things differently -->

         <city></city>
          <region/>
          <code/>
          <country>China</country>
        </postal>
        <phone></phone>
        <email>zhuangruiyjy@chinamobile.com</email>
        <!-- uri and facsimile elements may also be added -->
     </address>
    </author>
   
   <author fullname="Zheng Zhang" initials="Z" surname="Zhang">
      <organization>ZTE Corporation</organization>
      <address>
        <postal>
          <street/>
          <!-- Reorder these if your country does things differently -->

         <city></city>
          <region/>
          <code/>
          <country>China</country>
        </postal>
        <phone></phone>
        <email>zhang.zheng@zte.com.cn</email>
        <!-- uri and facsimile elements may also be added -->
     </address>
    </author>
	
	<author fullname="Dongyu Yuan" initials="D" surname="Yuan">
      <organization>ZTE Corporation</organization>
      <address>
        <postal>
          <street/>
          <!-- Reorder these if your country does things differently -->

         <city></city>
          <region/>
          <code/>
          <country>China</country>
        </postal>
        <phone></phone>
        <email>yuan.dongyu@zte.com.cn</email>
        <!-- uri and facsimile elements may also be added -->
     </address>
    </author>

    <date year="2026"/>
    <!-- If the month and year are both specified and are the current ones, xml2rfc will fill 
        in the current day for you. If only the current year is specified, xml2rfc will fill 
     in the current day and month for you. If the year is not the current one, it is 
     necessary to specify at least a month (xml2rfc assumes day="1" if not specified for the 
     purpose of calculating the expiry date).  With drafts it is normally sufficient to 
     specify just the year. -->

   <!-- Meta-data Declarations -->

   <area>Routing</area>
    <workgroup>IDR</workgroup>
    <!-- WG name at the upperleft corner of the doc,
        IETF is fine for individual submissions.  
     If this element is not present, the default is "Network Working Group",
        which is used by the RFC Editor as a nod to the history of the IETF. -->

   <keyword>BGP PORT AIDC</keyword>
    <!-- Keywords will be incorporated into HTML output
        files in a meta tag but they have no effect on text or nroff
        output. If you submit your draft to the RFC Editor, the
        keywords will be used for the search engine. -->

   <abstract>
      <t>This document defines a new BGP extended community for use in AI computing. 
	  It announces the ID of the port that connects a Leaf switch to a server, 
	  so that senders can prepare for large-scale traffic before AI tasks are initiated.</t>
    </abstract>
  </front>
  <middle>
    <section numbered="true" toc="default">
      <name>Introduction</name>
      <t>With the rapid development of Artificial Intelligence (AI) and Machine Learning (ML), 
	  AI tasks often generate large volumes of traffic due to the characteristics of large language model (LLM) computation. 
	  If the link bandwidth is insufficient, packet loss may occur. 
	  AI computation has very high reliability requirements and extremely low tolerance for packet loss and latency. 
	  Link congestion that leads to packet loss or excessive latency 
	  has a significant impact on the computational efficiency of AI tasks.</t>
	  
	  <t>In data centers used for AI and machine learning, BGP is often used as the routing protocol <xref target="RFC7938" format="default"/>. 
	  In some implementations, sufficient bandwidth between the destination server and its connected Leaf switches 
	  must be ensured before traffic for AI tasks is sent.
      On the network side, that is, the part of the network formed by the Leaf and Spine switches in Figure 1, 
	  there are numerous ECMP links, 
	  and techniques such as Packet Spray can be used to minimize congestion and packet loss. 
	  However, on the computing side, that is, the last hop between the Leaf switches and the server, 
	  congestion can easily lead to packet loss, significantly reducing the efficiency of AI tasks. 
	  To minimize or eliminate packet loss on the last hop, 
	  BGP needs to be extended to carry port information of the destination Leaf switch. 
	  The sender can then negotiate based on this information before sending traffic, 
	  ensuring that sufficient bandwidth is available on the last hop and preventing congestion and packet loss.
	  To reduce the load caused by full-mesh connections, Leaf switches do not establish BGP sessions with each other.</t>
	  
      <figure anchor="Fig1">
        <artwork align="left" name="Figure 1" type="" alt=""><![CDATA[
            +--------+                         +--------+
            | Spine1 |                         | Spine2 |
            +-+-+-+-++                         +-+-+-+-++
              | | | |                            | | | |
            +------------------------------------+ | | |
            | | | | |    +-------------------------+ | |
            | | | | |    |                   +-------+ |
            | | | | |    |                   |         |
            | | | | +-------------------------------------+
            | | | +-----------------------+  |         |  |
          +---+ +------+ |                |  |         |  |
          | |          | |                |  |         |  |
        +-+-+---+   +--+-+--+           +-+--+--+     ++--+---+
        | Leaf1 |   | Leaf2 |           | Leaf3 |     | Leaf4 |
        +--+-+--+   +---+-+-+           +--+-+--+     +---+-+-+
           | |          | |                | |            | |     
           | +--------+ | |                | +----------+ | |
           |          | | |                |            | | |
           | +--------|-+ |                | +----------|-+ |
           | |        |   |                | |          |   |
     +-----+-+-+    +-+-+-+---+        +---+-+---+    +-+---+---+
     | Server1 |    | Server2 |        | Server3 |    | Server4 |
     +---------+    +---------+        +---------+    +---------+
           ]]></artwork>
      </figure>
      
	  <t>Figure 1 shows a typical data center used for AI computing. 
	  In this network, when Server2, Server3, or Server4 sends traffic to Server1 through Leaf1, the common incast congestion problem may occur. 
	  That is, the link between Leaf1 and Server1 may become congested due to insufficient bandwidth, resulting in packet loss.</t>
	  
      <t>Currently, some implementations negotiate before devices such as Server2 and Server3 send traffic to Server1. 
	  The AI task traffic is sent only if the bandwidth of the link between the destination server 
	  and its connected Leaf switch (referred to as the destination switch) is sufficient. 
	  The negotiation method is outside the scope of this document. 
	  However, before negotiation can take place, the sender needs to learn which port connects the destination switch to the server. 
	  This information is carried in BGP by the new "Route Port ID" extended community.</t>

      <section numbered="true" toc="default">
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
       "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
       document are to be interpreted as described in <xref target="RFC2119" format="default"/>.</t>
      </section>
    </section>
	
    <section numbered="true" toc="default">
      <name>Format</name>
      <t>When announcing the route to a connected server, BGP on the Leaf switch 
	carries the address of the switch and the ID of the port that connects it to the destination server.</t>
	
	<t>The Transitive IPv4-Address-Specific Extended Community defined in <xref target="RFC7153"/> 
	and <xref target="I-D.ietf-idr-rfc4360-bis"/> 
	with the new sub-type "Route Port ID" is used to carry the IPv4 address of the Leaf switch 
	and the ID of its port connected to the destination server.</t>
	
	<t>The Transitive IPv6-Address-Specific Extended Community defined in <xref target="RFC5701"/> 
	with the new sub-type "Route Port ID" is used to carry the IPv6 address of the Leaf switch 
	and the ID of its port connected to the destination server.</t>
	
	  <figure anchor="Fig2">
        <artwork align="left" name="Figure 2" type="" alt=""><![CDATA[
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0x01 or 0x41  |   Sub-Type    |    Global Administrator       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Global Administrator (cont.)  |    Local Administrator        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           ]]></artwork>
      </figure>
	  
	  <t>Figure 2 shows the format of the IPv4-Address-Specific Extended Community, where:</t>
	  <ul spacing="normal">
        <li>Sub-Type: TBD. This indicates that this is the Route Port ID extended community;</li>
        <li>Global Administrator: 4 octets, set to the IPv4 address of the switch that advertises the server route. 
		This address can be the loopback address for establishing the BGP connection;</li>
		<li>Local Administrator: 2 octets, set to the ID of the port connecting the switch and the server, 
		with a value range of 1 to 65535.</li>
      </ul>
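	  <t>The IPv4 layout in Figure 2 can be illustrated with a minimal, non-normative sketch. 
	  It assumes the transitive type value 0x01 and, because the Sub-Type is still TBD, uses 0xFF as a hypothetical placeholder; 
	  the function names and the example address are illustrative only and are not defined by this document.</t>
      <sourcecode type="python"><![CDATA[
```python
import ipaddress
import struct

# Assumptions (not from this document): the Sub-Type is still TBD, so
# 0xFF is a hypothetical placeholder; function names are illustrative.
TYPE_TRANSITIVE_IPV4 = 0x01
SUBTYPE_ROUTE_PORT_ID = 0xFF  # placeholder until IANA assigns a value


def encode_route_port_id_v4(leaf_address, port_id):
    """Pack Type, Sub-Type, Global Administrator, Local Administrator."""
    if not 1 <= port_id <= 65535:
        raise ValueError("port ID must be in the range 1 to 65535")
    global_admin = ipaddress.IPv4Address(leaf_address).packed  # 4 octets
    return struct.pack("!BB4sH", TYPE_TRANSITIVE_IPV4,
                       SUBTYPE_ROUTE_PORT_ID, global_admin, port_id)


def decode_route_port_id_v4(data):
    """Return (leaf_address, port_id) from an 8-octet community."""
    if len(data) != 8:
        raise ValueError("expected 8 octets")
    type_, sub_type, global_admin, port_id = struct.unpack("!BB4sH", data)
    if type_ != TYPE_TRANSITIVE_IPV4 or sub_type != SUBTYPE_ROUTE_PORT_ID:
        raise ValueError("not a Route Port ID extended community")
    return str(ipaddress.IPv4Address(global_admin)), port_id
```
]]></sourcecode>
	  <t>In this sketch, a Leaf switch with loopback address 192.0.2.1 and server-facing port 7 
	  would call encode_route_port_id_v4("192.0.2.1", 7) to obtain the 8-octet value.</t>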
	  
		  <figure anchor="Fig3">
        <artwork align="left" name="Figure 3" type="" alt=""><![CDATA[
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0x00 or 0x40  |    Sub-Type   |    Global Administrator       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Global Administrator (cont.)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Global Administrator (cont.)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Global Administrator (cont.)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Global Administrator (cont.)  |    Local Administrator        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           ]]></artwork>
      </figure>
	  
	  <t>Figure 3 shows the format of the IPv6-Address-Specific Extended Community, where:</t>
	  <ul spacing="normal">
        <li>Sub-Type: TBD. This indicates that this is the Route Port ID extended community;</li>
        <li>Global Administrator: 16 octets, set to the IPv6 address of the switch that advertises the server route. 
		This address can be the loopback address for establishing the BGP connection;</li>
		<li>Local Administrator: 2 octets, set to the ID of the port connecting the switch and the server, 
		with a value range of 1 to 65535.</li>
      </ul>
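	  <t>The 20-octet IPv6 layout in Figure 3 can be illustrated in the same non-normative way. 
	  The sketch assumes the transitive type value 0x00 and again uses 0xFF as a hypothetical placeholder for the TBD Sub-Type; 
	  the function names and the example address are illustrative only.</t>
      <sourcecode type="python"><![CDATA[
```python
import ipaddress
import struct

# Assumptions (not from this document): 0xFF stands in for the TBD
# Sub-Type; function names are illustrative.
TYPE_TRANSITIVE_IPV6 = 0x00
SUBTYPE_ROUTE_PORT_ID = 0xFF  # placeholder until IANA assigns a value


def encode_route_port_id_v6(leaf_address, port_id):
    """Pack the 20-octet IPv6-Address-Specific Route Port ID value."""
    if not 1 <= port_id <= 65535:
        raise ValueError("port ID must be in the range 1 to 65535")
    global_admin = ipaddress.IPv6Address(leaf_address).packed  # 16 octets
    return struct.pack("!BB16sH", TYPE_TRANSITIVE_IPV6,
                       SUBTYPE_ROUTE_PORT_ID, global_admin, port_id)


def decode_route_port_id_v6(data):
    """Return (leaf_address, port_id) from a 20-octet community."""
    if len(data) != 20:
        raise ValueError("expected 20 octets")
    type_, sub_type, global_admin, port_id = struct.unpack("!BB16sH", data)
    if type_ != TYPE_TRANSITIVE_IPV6 or sub_type != SUBTYPE_ROUTE_PORT_ID:
        raise ValueError("not a Route Port ID extended community")
    return str(ipaddress.IPv6Address(global_admin)), port_id
```
]]></sourcecode>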
	  
    </section>
    
    <section numbered="true" toc="default">
      <name>Specification</name>
      <t>When a Leaf switch advertises the route to a connected server, it attaches the Route Port ID extended community to the advertisement, 
	  and the extended community is propagated along with the route.</t>
      
      <t>In the example shown in Figure 1, when Leaf1 advertises the route to Server1 to the Spine switches, 
	  it includes the Route Port ID extended community, which contains the loopback address used to establish the BGP connection 
	  and the ID of the port connected to the server. Leaf2 does the same.</t>
      
      <t>Upon receiving a route carrying the Route Port ID extended community, a Leaf switch checks whether the address 
	  in the Global Administrator field is reachable. 
	  If the address is unreachable, the extended community is ignored. 
	  If it is reachable, the address and port information are stored locally or sent to the server. 
	  How the information is stored or sent is outside the scope of this document.</t>
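      <t>The receive-side handling described above can be sketched as follows. 
	  This is a non-normative illustration: the function name, the is_reachable callback, and the dictionary used as a local store 
	  are all hypothetical, since this document does not specify how reachability is checked or how the information is stored.</t>
      <sourcecode type="python"><![CDATA[
```python
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout: this document does not define how the
# reachability check or the local store is implemented.
PortTable = Dict[str, List[Tuple[str, int]]]


def process_route_port_id(prefix: str, leaf_address: str, port_id: int,
                          is_reachable: Callable[[str], bool],
                          table: PortTable) -> bool:
    """Store (leaf_address, port_id) for prefix if the Leaf is reachable.

    Returns True when the entry is kept and False when it is ignored.
    """
    if not is_reachable(leaf_address):
        return False  # unreachable: ignore the extended community
    entries = table.setdefault(prefix, [])
    if (leaf_address, port_id) not in entries:
        entries.append((leaf_address, port_id))
    return True
```
]]></sourcecode>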
      
      <t>Because data centers used for AI computing have a large number of ECMP paths, 
	  deploying this feature requires enabling the ADD-PATH function defined in <xref target="RFC7911"/>, 
	  so that routes to the same server advertised by different Leaf switches, each carrying its own Route Port ID extended community, are all propagated. 
	  Spine or higher-level switches do not need to generate entries based on this extended community. 
	  To avoid the large number of route advertisements that may result from enabling ADD-PATH, 
	  this advertisement SHOULD be limited to a single PoD.</t>
	  
	  <t>When a server wants to send a large volume of traffic for AI tasks, it negotiates bandwidth based on the destination switch 
	  and port information obtained from BGP. 
	  Traffic is sent only after the negotiation succeeds, thus avoiding packet loss caused by congestion.
      The traffic is then sent to the destination server via the Leaf switch with which the negotiation succeeded. 
	  The negotiation process is outside the scope of this document.</t>
	  
	  <t>In the example shown in Figure 1, the routes to Server1 advertised by Leaf1 and Leaf2 
	  carry the Route Port ID extended community. 
	  When Server3 wants to send AI task traffic to Server1, it can first negotiate with Leaf1. 
	  If that negotiation fails, it may negotiate with Leaf2. Traffic is sent only after a negotiation succeeds.
	  In this example, assuming the negotiation with Leaf1 succeeds, traffic is sent to Server1 through Leaf1.</t>
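	  <t>The fallback order in this example can be sketched as follows. 
	  The sketch is non-normative: the negotiation itself is outside the scope of this document, 
	  so it is represented by a hypothetical negotiate callback that returns True on success, 
	  and the function and type names are illustrative only.</t>
      <sourcecode type="python"><![CDATA[
```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical types and names; negotiation is out of scope here and is
# modeled only as a callback that reports success or failure.
Candidate = Tuple[str, int]  # (Leaf switch address, port ID)


def select_leaf(candidates: Iterable[Candidate],
                negotiate: Callable[[str, int], bool]
                ) -> Optional[Candidate]:
    """Try each (Leaf, port) in order; return the first that succeeds.

    Returns None if every negotiation fails, in which case no AI task
    traffic is sent.
    """
    for leaf_address, port_id in candidates:
        if negotiate(leaf_address, port_id):
            return leaf_address, port_id
    return None
```
]]></sourcecode>
	  <t>With the candidates ordered as Leaf1 followed by Leaf2, this reproduces the behavior of the example: 
	  Leaf2 is tried only if the negotiation with Leaf1 fails.</t>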
	  
    </section>
    
    <section anchor="IANA" numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>IANA is requested to allocate one new code point, named "Route Port ID", from each of 
	  the "Transitive IPv4-Address-Specific Extended Community Sub-Types" 
	  and the "Transitive IPv6-Address-Specific Extended Community Sub-Types" registries.</t>
	    <table anchor="table_1" align="center">
          <name>Route Port ID Sub-Type</name>
          <thead>
            <tr>
              <th align="center">Type</th>
              <th align="center">Description</th>
			  <th align="center">Reference</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="center">TBD</td>
              <td align="center">Route Port ID</td>
			  <td align="center">This Document</td>
            </tr>
          </tbody>
        </table>
    </section>
	
    <section anchor="Security" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>This extension to BGP has security implications similar to those of BGP Extended Communities <xref target="RFC7153"/>,
	  <xref target="RFC5701"/>, and <xref target="I-D.ietf-idr-rfc4360-bis"/>.</t>
    </section>
  </middle>
  <!--  *****BACK MATTER ***** -->

 <back>

   <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
		<xi:include href="http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
		<xi:include href="http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5701.xml"/>
		<xi:include href="http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7153.xml"/>
		<xi:include href="https://datatracker.ietf.org/doc/bibxml3/draft-ietf-idr-rfc4360-bis.xml"/>
      </references>
      <references>
        <name>Informative References</name>
		<xi:include href="http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7911.xml"/>
		<xi:include href="http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7938.xml"/>
    </references>
    </references>
 </back>
</rfc>
