Internet-Draft CD-OAM-PS-REQ July 2026
Pang, et al. Expires 2 January 2027
Workgroup:
opsawg
Internet-Draft:
draft-pang-opsawg-cd-oam-problem-00
Published:
July 2026
Intended Status:
Informational
Expires:
2 January 2027
Authors:
R. Pang
China Unicom
J. Li
China Unicom
J. Zhao
China Unicom

Cross-Domain OAM Problem Statement and Requirements: Across Trust Boundaries

Abstract

As network service deployments increasingly span multiple administrative domains (such as operator edge clouds, tenant datacenters, multi-cloud environments, and managed security services), operators lack standardized OAM (Operations, Administration, and Maintenance) mechanisms to localize faults across these trust boundaries. Existing OAM mechanisms, such as RFC 9516 for Service Function Chaining (SFC), typically assume all network elements reside within a single administrative domain.

This document describes the generic problem space, typical use cases, and gaps in existing standards for cross-domain OAM across trust boundaries, and outlines the requirements for a cross-domain OAM proxy mechanism. This document is intended to provide a problem baseline and requirement definitions for the OPSAWG and the broader operations community; it does not specify protocols or data models.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 2 January 2027.

1. Introduction

The deployment models of modern network services are undergoing profound changes. Service paths are no longer confined within the core network of a single operator; instead, they are widely distributed across Multi-access Edge Computing (MEC) nodes, public cloud environments, and tenant-owned datacenters. In such cross-domain deployments, service chains (e.g., SFC), tunnels, or security paths frequently need to traverse distinct trust boundaries.

Existing IETF OAM standards, such as SFC OAM defined in [RFC9516], are primarily designed for single administrative domains, assuming that the entity initiating the probe and the probed network elements operate under unified management. When an OAM probe must cross administrative domains, existing mechanisms fail to provide secure, privacy-preserving fault localization because:

  • The target domain distrusts external probe packets.

  • The target domain is reluctant to expose its internal topology.

Consequently, operators currently resort to ad-hoc methods (e.g., manual Pings across VPNs, proprietary orchestrator APIs), which are non-interoperable and break OAM semantics.

This document uses cross-domain Service Function Chaining (SFC) as a primary use case, while also covering MEC offloading, multi-cloud interconnection, and managed security services, to describe the generic problem space of OAM across trust boundaries. It analyzes the gaps in current OAM standards and specifies requirements for a Cross-Domain OAM Proxy (CD-OAM Proxy).

The scope of this document is strictly limited to problem statements, use cases, gap analysis, and requirements for cross-domain OAM. The following items are explicitly out of scope:

  • Cross-domain service orchestration (assumed to be handled by existing orchestrators).

  • Automated discovery of cross-domain service capabilities.

  • Billing or SLA enforcement mechanisms.

  • Modifications to existing data plane forwarding behaviors (such as Network Service Header (NSH) encapsulation, VXLAN-GPE, or Service Function Forwarder (SFF) forwarding logic).

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Use Cases

3.1. MEC Offloading and Managed Security Services

An operator deploys service forwarding capabilities at an MEC site close to the user. Traffic must traverse a firewall and Deep Packet Inspection (DPI) functions. The firewall resides in the operator's core cloud, whereas the DPI is hosted in the tenant's private datacenter.

The operator needs to verify that the cross-domain link to the tenant SFF is healthy and that the tenant SFF is reachable, without requiring the tenant to expose its internal SF topology or modify its service function implementations.

In a variant of managed security services, a third-party security provider does not accept OAM probes directly entering its internal network and provides no guarantee that its SFs support specific technologies (e.g., NSH OAM).

3.2. Multi-Cloud Interconnection

An enterprise workload is distributed across two public clouds. A service path in Cloud A steers egress traffic through a security gateway hosted by Cloud B. Cloud A and Cloud B represent different administrative domains with no shared management plane.

Cloud A requires a standardized mechanism to probe the reachability of Cloud B's boundary node without establishing a full management-plane trust relationship.

3.3. Enterprise VPN and Segmented Security Services

A large enterprise connects hundreds of branches via an operator VPN. The operator provides a shared security service chain (Firewall, IDS, DLP) hosted across multiple vendor clouds.

When application performance degrades, the enterprise IT team needs to know which segment of the service chain is at fault. These vendors will not expose internal SF health to either the enterprise or the operator, and multi-tenancy constraints restrict what information can be exposed to an individual tenant.

4. Problem Statement

4.1. Trust Boundary Traversal

Existing single-domain OAM mechanisms assume that the network element initiating a probe can send probe packets directly to all downstream elements. In cross-domain scenarios:

  1. The source domain lacks the security credentials to inject packets into the target domain.

  2. The target domain does not trust unauthenticated external probes.

Feedback from operators indicates that cross-domain fault resolution times are typically orders of magnitude longer than intra-domain resolution, largely due to the need for manual escalation of trouble tickets between domains.

4.2. OAM-Unaware Service Nodes

Within a single domain, an operator can uniformly configure all nodes. However, in cross-domain scenarios, the source domain has no visibility into the capabilities of the target domain's nodes. Nodes in the target domain may be legacy devices, third-party virtualized functions, or cloud-native services that do not support specific OAM extensions (such as the NSH O-bit).

Standard OAM probes entering the target domain may be silently dropped by such nodes, which the source domain then misreads as a path fault (a false positive). Existing standards provide no mechanism for a target domain to signal "boundary reachable only, do not probe deeper", nor do they allow the source domain to gracefully degrade to boundary-only probing based on target domain capabilities.
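The graceful degradation described above can be illustrated with a minimal sketch. It assumes the target domain has shared a set of capability labels out of band; the label "internal-oam" and the names ProbeMode and select_probe_mode are hypothetical and are not defined by this document or by any existing standard.

```python
from enum import Enum


class ProbeMode(Enum):
    END_TO_END = "end-to-end"
    BOUNDARY_ONLY = "boundary-only"


def select_probe_mode(target_capabilities: set[str]) -> ProbeMode:
    """Pick the deepest probe mode the target domain says it supports.

    `target_capabilities` is a hypothetical set of labels exchanged out of
    band; "internal-oam" is an illustrative label, not a registered value.
    """
    if "internal-oam" in target_capabilities:
        return ProbeMode.END_TO_END
    # Unknown or missing capability: stop at the boundary node instead of
    # sending probes that OAM-unaware nodes would silently drop.
    return ProbeMode.BOUNDARY_ONLY
```

Defaulting to boundary-only when nothing is advertised means an unresponsive interior is never probed, and therefore never reported as a fault.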

4.3. Fault Localization vs. Privacy

When a cross-domain OAM probe fails, source domain operators cannot determine whether the fault lies within the source domain, on the inter-domain link, or within the target domain. Standard IP Ping or BFD can verify Layer 3 reachability but cannot interpret upper-layer (e.g., SFC/NSH) semantics.

Target domains are reluctant to expose internal topologies for security and commercial reasons. Any cross-domain OAM mechanism MUST support privacy-preserving fault localization, revealing only the minimum necessary information.
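As an illustration only, the three-way localization discussed in this section could be derived from three binary results without learning anything about the target domain's interior. The sketch assumes one probe to the source domain's own egress node, one to the target domain's boundary node, and one binary status reported by the target domain for its internal path. The names FaultSegment and localize_fault are hypothetical.

```python
from enum import Enum


class FaultSegment(Enum):
    NONE = "no fault detected"
    SOURCE_DOMAIN = "source domain"
    INTER_DOMAIN_LINK = "inter-domain link"
    TARGET_DOMAIN = "target domain"


def localize_fault(source_egress_ok: bool,
                   boundary_ok: bool,
                   target_path_ok: bool) -> FaultSegment:
    """Map three segment-level results to a coarse fault location.

    Results are evaluated in path order, so the first failing segment wins:
    a failure closer to the source masks anything further downstream.
    """
    if not source_egress_ok:
        return FaultSegment.SOURCE_DOMAIN
    if not boundary_ok:
        return FaultSegment.INTER_DOMAIN_LINK
    if not target_path_ok:
        return FaultSegment.TARGET_DOMAIN
    return FaultSegment.NONE
```

The target domain contributes a single bit in this sketch, which keeps the exposed information minimal while still letting the source domain decide where to escalate.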

5. Gap Analysis

5.1. Limitations of Single-Domain OAM Standards

SFC OAM [RFC9516], for example, explicitly places proxy functions out of scope. As a result, no IETF standard defines a CD-OAM proxy operating at administrative domain boundaries. Without such a proxy, there is no standardized way to authenticate the source domain, authorize service path probing, or relay OAM forwarding context across boundaries.

5.2. Semantic Disconnect in Generic L3 OAM Tools

Standard IP Ping or Bidirectional Forwarding Detection (BFD) can verify Layer 3 reachability, but in complex cross-domain service chaining scenarios, they are unaware of service function states and cannot locate which specific service node segment or security policy caused the failure. Application-layer monitoring is too coarse and introduces significant latency.

5.3. Mismatch between Security Protocols and OAM Real-Time Needs

IPsec and TLS could theoretically protect cross-domain OAM traffic. However:

  • IPsec tunnel establishment imposes heavy overhead, making it unsuitable for frequent, short-lived OAM probes.

  • TCP-based TLS introduces head-of-line blocking and connection setup latency, conflicting with real-time OAM constraints.

Neither efficiently carries the forwarding context (such as NSH SPI, SI, or other metadata) required to reconstruct OAM replies.
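For a sense of scale, the NSH part of this forwarding context is small: the NSH Service Path Header defined in RFC 8300 is a single 32-bit word holding a 24-bit Service Path Identifier (SPI) and an 8-bit Service Index (SI). The following sketch packs and unpacks that word. The function names are hypothetical, and how a proxy would carry this context between domains is not specified by this document.

```python
import struct


def pack_service_path(spi: int, si: int) -> bytes:
    """Encode SPI (24 bits) and SI (8 bits) as one network-order 32-bit word."""
    if not 0 <= spi < 2**24:
        raise ValueError("SPI must fit in 24 bits")
    if not 0 <= si < 2**8:
        raise ValueError("SI must fit in 8 bits")
    return struct.pack("!I", (spi << 8) | si)


def unpack_service_path(data: bytes) -> tuple[int, int]:
    """Decode a 4-byte Service Path Header into (spi, si)."""
    (word,) = struct.unpack("!I", data)
    return word >> 8, word & 0xFF
```

For example, pack_service_path(0x000102, 255) yields the four bytes 00 01 02 ff, and unpack_service_path reverses it.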

6. Requirements

6.1. Functional Requirements

  • REQ-1: The mechanism MUST operate without requiring any modifications to the data plane forwarding behavior of existing network elements (such as SFFs or SFs).

  • REQ-2: The mechanism SHOULD maintain backward compatibility with existing single-domain OAM standards (such as [RFC9516]).

  • REQ-3: To prevent spurious probe timeouts, the data plane components of boundary proxies SHOULD process and relay packets in a low-latency, stateless manner.

  • REQ-4: The mechanism MUST support both IPv4 and IPv6 transport.

  • REQ-5: The mechanism MUST provide a "boundary-reachability-only" mode to gracefully degrade when downstream nodes or SFs lack OAM capabilities.

6.2. Security and Privacy Requirements

  • REQ-6: The proxy MUST authenticate the source administrative domain and perform path-level authorization before processing any cross-domain probe.

  • REQ-7: Cross-boundary OAM traffic MUST be protected by integrity and anti-replay mechanisms, and SHOULD support confidentiality protection where policy demands.

  • REQ-8: By default, the information exposed externally MUST be restricted to the binary path status and end-to-end delay.

  • REQ-9: The mechanism MUST NOT disclose internal topology details (such as node types, hop counts, or internal IP addresses) unless explicitly authorized by the target domain's policy.

  • REQ-10: The mechanism MUST support rate limiting to mitigate DoS vectors and SHOULD provide the capability to coarsen or fuzz latency measurements against timing side-channel analysis.
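Some of the requirements above can be made concrete with a small, non-normative sketch: source-domain authentication and integrity via an HMAC-SHA256 tag (REQ-6, REQ-7), anti-replay via a strictly increasing sequence number (REQ-7), and latency coarsening by rounding up to a fixed step (REQ-10). Pre-provisioned per-domain keys, the message layout, the 5 ms step, and the names sign_probe and BoundaryProxy are all assumptions made for illustration. Path-level authorization, rate limiting, and confidentiality are deliberately left out.

```python
import hashlib
import hmac
import math
import struct


def sign_probe(key: bytes, domain_id: bytes, seq: int, payload: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over domain ID, sequence number, and payload."""
    # Length-prefix the domain ID so field boundaries are unambiguous.
    msg = (len(domain_id).to_bytes(2, "big") + domain_id
           + struct.pack("!Q", seq) + payload)
    return hmac.new(key, msg, hashlib.sha256).digest()


class BoundaryProxy:
    """Hypothetical proxy-side checks; not a protocol definition."""

    def __init__(self, keys: dict[bytes, bytes], delay_step_ms: int = 5):
        self.keys = keys                       # assumed pre-provisioned per domain
        self.last_seq: dict[bytes, int] = {}   # highest accepted seq per domain
        self.delay_step_ms = delay_step_ms

    def accept(self, domain_id: bytes, seq: int,
               payload: bytes, tag: bytes) -> bool:
        key = self.keys.get(domain_id)
        if key is None:
            return False                       # unknown source domain
        expected = sign_probe(key, domain_id, seq, payload)
        if not hmac.compare_digest(expected, tag):
            return False                       # integrity or authentication failure
        if seq <= self.last_seq.get(domain_id, -1):
            return False                       # replayed or stale probe
        self.last_seq[domain_id] = seq
        return True

    def coarsen_delay(self, delay_ms: float) -> int:
        """Round a measured delay up to the next multiple of the step."""
        return math.ceil(delay_ms / self.delay_step_ms) * self.delay_step_ms
```

The replay check keeps one counter per source domain, which is a simplification; an actual design would need to weigh that state against the stateless relay preference in REQ-3.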

6.3. Operations, Management, and Extensibility Requirements

  • REQ-11: The mechanism MUST provide a standardized fault localization model that unambiguously differentiates between faults in the source domain, the inter-domain link, and the internal target domain.

  • REQ-12: To minimize operational coupling, the mechanism SHOULD NOT require real-time control plane interactions between domains and MUST support asymmetric or offline policy configuration exchanges.

  • REQ-13: The mechanism SHOULD provide a YANG data model for policy configuration, and this model SHOULD align with generic OAM management models (such as [RFC8531]).

  • REQ-14: The framework MUST be extensible to support future network data plane technologies (such as SRv6-based service chaining or SAV rule-related information announcement scenarios) without modifying its core control logic.

7. Security Considerations

8. Acknowledgements

TBD.

9. IANA Considerations

TBD.

10. References

10.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC9516]
Mirsky, G., Meng, W., Ao, T., Khasnabish, B., Leung, K., and G. Mishra, "Active Operations, Administration, and Maintenance (OAM) for Service Function Chaining (SFC)", RFC 9516, DOI 10.17487/RFC9516, November 2023, <https://www.rfc-editor.org/info/rfc9516>.

10.2. Informative References

[RFC8531]
Kumar, D., Wu, Q., and Z. Wang, "Generic YANG Data Model for Connection-Oriented Operations, Administration, and Maintenance (OAM) Protocols", RFC 8531, DOI 10.17487/RFC8531, April 2019, <https://www.rfc-editor.org/info/rfc8531>.

Authors' Addresses

Ran Pang
China Unicom
Beijing
China
Jianfei Li
China Unicom
Beijing
China
Jing Zhao
China Unicom
Beijing
China