Internet Engineering Task Force B. Zhang, Ed.
Internet-Draft Pengcheng Laboratory
Intended status: Informational Y. Dai, Ed.
Expires: 9 August 2026 Sun Yat-sen University
B. Shen, Ed.
Harbin Institute of Technology
5 February 2026
Computing metrics as a service (CMAS) for facilitating traffic steering
in CATS framework
draft-zhangb-cats-cmas-01
Abstract
In the context of CATS applications, resource modeling and dynamic
scheduling face core challenges: heterogeneous computing resources
(e.g., CPUs, GPUs, FPGAs) with differentiated characteristics are
difficult to unify through traditional coarse-grained metrics (e.g.,
virtual machine/container counts). Moreover, dynamically changing
resource states (e.g., resource occupancy, service instance load
cycles) complicate routing table maintenance in network nodes,
creating bottlenecks for resource scheduling. This document
proposes a service-oriented computing capability modeling framework
that abstracts heterogeneous resources into standardized service
units for efficient resource modeling and traffic steering.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 9 August 2026.
Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved.
Zhang, et al. Expires 9 August 2026 [Page 1]
Internet-Draft cmas February 2026
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
2. Terminology
3. Computing Metrics As a Service (CMAS)
4. Service modelling with CMAS
5. Service Distribution Under CMAS
6. Service Consuming Under CMAS
7. References
7.1. Informative References
Authors' Addresses
1. Introduction
Computing-aware traffic steering (CATS) is a traffic engineering
approach that takes into account the dynamic nature of computing
resources and network state to optimize service-specific traffic
forwarding towards a given service instance. As described in
[I-D.ietf-cats-framework], the Computing-Aware Traffic Steering
(CATS) framework assumes that there might be multiple service
instances that are providing one given service, which are running in
one or more service sites. Each of these service instances can be
accessed via a service contact instance, which is a client-facing
service function instance. A single service site may host one or
multiple service contact instances. A single service site may have
limited computing resources available at a given time, whereas the
various service sites may experience different resource availability
issues over time. Therefore, steering traffic among different
service sites can mitigate a shortage of resources at a specific
service site. Based on this, [I-D.ietf-cats-framework] provides an
architectural framework that aims at facilitating compute- and
network-aware traffic steering decisions in networking environments
where computing service resources are deployed.
In the CATS framework, a CATS Service Metric Agent (C-SMA) collects
both computing-related capabilities and metrics, and associates them
with a CATS Service ID (CS-ID) that identifies the service. The C-SMA
then advertises CS-IDs along with metrics to related CATS Path
Selectors (C-PSes) in the network. Raw computing metrics are
voluminous and may change very frequently, which makes them
unsuitable for direct dissemination in the network.
[I-D.ietf-cats-metric-definition] proposes using normalized metrics
in CATS: Level 1 and Level 2 metrics are carried in the network
instead of Level 0 raw metrics. How Level 1 and Level 2 metrics can
be used to steer traffic remains for further study.
Unlike electricity, which can simply be quantified in units such as
kWh, computing metrics are difficult to measure in a unified way,
especially given the different types of CPUs, GPUs, FPGAs, ASICs and
other chips. Defining a concrete normalization method for computing
metrics is very hard and is a key factor hindering the development of
CATS. Normalization faces two challenges. First, different service
providers may use different normalization methods, which makes it
hard for a C-PS to interpret and compare normalized metrics. Second,
a normalized value may lose important information contained in the
raw metrics.
To solve this problem, this draft proposes a service-oriented
computing capability modeling framework, named Computing Metrics as a
Service (CMAS). CMAS is a standardization approach that packages
computing metrics (e.g., FLOPS, memory, latency) alongside services
rather than relying on pure resource aggregation. When deploying
specific services, the service site allocates resources based on
these bundled metric units, enabling efficient, service-oriented
resource modelling across heterogeneous infrastructures.
2. Terminology
This document makes use of the terms defined in
[I-D.ietf-cats-framework] and also makes use of the following terms:
* Computing Metrics as a Service (CMAS): CMAS is a standardization
approach that packages computing metrics (e.g., FLOPS, memory,
latency) alongside services. When deploying specific services,
the service site allocates resources based on these bundled metric
units, enabling efficient, service-oriented resource allocation
across heterogeneous infrastructures.
* Public service platform: The public service platform hosts the
complete set of CATS public services and acts as a bridge between
clients and service sites. From it, service sites can download
and deploy offerings, while clients can formulate and submit their
service requests.
3. Computing Metrics As a Service (CMAS)
CMAS is realized by building a public service platform, which hosts
the complete set of CATS public services and acts as a bridge between
clients and service sites. From it, service sites can download and
deploy offerings to serve clients, while clients can formulate and
submit their service requests. Most importantly, services and
computing metrics are bundled on this platform: when deploying a
specific service, a service site allocates resources according to
the computing metric units bundled with that service. A lightweight,
service-oriented resource model can therefore be built easily with
CMAS, making it possible for CATS to be widely deployed on the
Internet. Table 1 illustrates a typical public service table: an
openly searchable and browsable registry for both clients and service
sites.
+--------------+-----------------+---------------------+-------------------+---------------+-----------------+-----------------+-----------------+-----------------+
| Service ID   | Service Name    | Input               | Service           | Service       | Computing       | Storage         | Computing Time  | Software        |
|              |                 |                     | Description       | Running Code  | Requirement     | Requirement     |                 | Dependency      |
+--------------+-----------------+---------------------+-------------------+---------------+-----------------+-----------------+-----------------+-----------------+
|              |                 | Motion Capture      | This service      |               | Multi-thread    |                 |                 |                 |
|              |                 | Voice Tracking      | receives multiple |               | CPUs, minimum   | 16GB DRAM       |                 | Unity,          |
| AR1          | AR/VR           | Eye Tracking        | inputs from       | GitHub link   | 2.0GHz; GPU:    | 256GB SSD       | ≤ 1ms           | Unreal Engine,  |
|              |                 | Environmental       | sensors and       |               | higher than     |                 |                 | etc.            |
|              |                 | Sensing             | generates scenes  |               | RTX 4060        |                 |                 |                 |
+--------------+-----------------+---------------------+-------------------+---------------+-----------------+-----------------+-----------------+-----------------+
|              |                 | Transport standard  | Automated driving |               | CPU: ≥4.0GHz,   | 64GB DDR5 DRAM  |                 | Apache Kafka    |
| TP1          | Intelligent     | data, traffic       | environment       | GitHub link   | ≥24MB L3 cache; | ≥1TB NVMe SSD   | ≤ 20ms          | Apollo          |
|              | transportation  | info, etc.          | sensing           |               | GPU: ≥200 TOPS  |                 |                 | CUDA            |
+--------------+-----------------+---------------------+-------------------+---------------+-----------------+-----------------+-----------------+-----------------+
|              |                 | Video input source  | Video Game Live   |               | CPU: ≥4.5GHz,   | ≥32GB DDR5 DRAM | depending on    | OBS Studio      |
| LB1          | Live            | Audio input source  | Interaction Live  | GitHub link   | 12 cores; GPU:  | ≥5TB NVMe SSD   | specific scene: | WebRTC          |
|              | broadcast       | Interaction input   | Sport Live        |               | NVENC encoder   |                 | 0.5s - 3s       | FFmpeg          |
+--------------+-----------------+---------------------+-------------------+---------------+-----------------+-----------------+-----------------+-----------------+
|              |                 | speech input        | real-time caption |               | CPU: ≥3.5GHz,   | ≥32GB DDR5 DRAM |                 | CUDA/cuDNN      |
| ST1          | Simultaneous    | (optional) action   | conf. translation | GitHub link   | 16 threads;     | ≥1TB NVMe SSD   | ≤ 1s            | Apache Kafka    |
|              | interpretation  | capture             |                   |               | GPU: RTX 4090,  | ≥16GB GPU DRAM  |                 |                 |
|              |                 | interaction input   |                   |               | FP16            |                 |                 |                 |
+--------------+-----------------+---------------------+-------------------+---------------+-----------------+-----------------+-----------------+-----------------+
Table 1: Examples of the service table in the public service platform
The fields of the service table are:
* Service ID: identifies the service.
* Service Name: the name of the service.
* Input: the concrete information and format of the input data for
the service.
* Service Description: the concrete function of the service.
* Service Running Code: the location of the service code.
* Computing Requirement: the basic computing resource demands of the
service, such as detailed CPU/GPU/NPU information.
* Storage Requirement: the basic storage demands of the service, such
as detailed memory and disk information.
* Computing Time: the computing delay of the service when processing
one basic data sample.
* Software Dependency: the software environment required for
deploying the service.
Through the service table, each service ID identifies a specific
service capability. A client can query the service table to find the
service it is interested in, build its service request using the
service ID, and send the service request to its Ingress
CATS-Forwarder to obtain the service. A service site can query the
service table to find services of interest, allocate resources based
on the computing and storage requirements of each service, and deploy
these services as service instances. A service site can run one
service contact instance for all of its service instances that
provide the same service.
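The following Python sketch models the public service table as an in-memory registry keyed by service ID, together with the client-side lookup described above. The class and method names (ServiceEntry, PublicServicePlatform, publish, search, lookup) are hypothetical, since this document does not define a platform API; the requirement fields are kept as free text, as in Table 1, and the computing times are the upper bounds listed there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    """One row of the public service table (Table 1)."""
    service_id: str
    name: str
    inputs: tuple[str, ...]
    computing_requirement: str
    storage_requirement: str
    computing_time_ms: float   # upper bound for one basic data sample
    software: tuple[str, ...]


class PublicServicePlatform:
    """Hypothetical in-memory registry standing in for the platform."""

    def __init__(self) -> None:
        self._table: dict[str, ServiceEntry] = {}

    def publish(self, entry: ServiceEntry) -> None:
        # A service ID must identify exactly one service capability.
        if entry.service_id in self._table:
            raise ValueError(f"duplicate service ID: {entry.service_id}")
        self._table[entry.service_id] = entry

    def lookup(self, service_id: str) -> ServiceEntry:
        return self._table[service_id]

    def search(self, keyword: str) -> list[str]:
        # Case-insensitive match on the service name, as a client
        # browsing the catalogue might do.
        kw = keyword.lower()
        return [sid for sid, e in self._table.items() if kw in e.name.lower()]


platform = PublicServicePlatform()
platform.publish(ServiceEntry(
    "AR1", "AR/VR",
    ("Motion Capture", "Voice Tracking", "Eye Tracking", "Environmental Sensing"),
    "multi-thread CPUs >= 2.0GHz; GPU higher than RTX 4060",
    "16GB DRAM, 256GB SSD", 1.0, ("Unity", "Unreal Engine")))
platform.publish(ServiceEntry(
    "TP1", "Intelligent transportation",
    ("Transport standard data", "traffic info"),
    "CPU >= 4.0GHz, >= 24MB L3 cache; GPU >= 200 TOPS",
    "64GB DDR5 DRAM, >= 1TB NVMe SSD", 20.0, ("Apache Kafka", "Apollo", "CUDA")))

# A client finds the service it needs and keeps only its ID for the request.
service_id = platform.search("transport")[0]
print(service_id, platform.lookup(service_id).computing_time_ms)  # TP1 20.0
```

Only the service ID travels in the client's request; everything else in the row stays on the platform, where both clients and service sites can consult it.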
By designing the public service platform in this way, clients can
formulate their requirements in plain service language, while service
sites normalize their heterogeneous compute and storage resources into
service-specific units only; no unified abstraction across resource
types is required. The service table spells out a common resource
recipe (CPU, memory, runtime) for one logical service unit. A site may
therefore:
* allocate 3× that recipe and run three AR1 service instances, or
* allocate 4× and run four TP1 service instances, all according to
its own capacity and business goals.
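To illustrate sizing a site in whole recipes, the sketch below computes how many copies of a service recipe fit into a site's free capacity and then reserves them. It assumes that resources can be reduced to additive integer quantities; the AR1 GPU count and the site capacity are hypothetical numbers, not values defined by this document.

```python
# Hypothetical per-instance recipe for AR1. DRAM and SSD come from Table 1;
# counting "one GPU" is an assumption standing in for "higher than RTX 4060".
AR1_RECIPE = {"dram_gb": 16, "ssd_gb": 256, "gpus": 1}

# Hypothetical free capacity of one service site.
SITE_CAPACITY = {"dram_gb": 64, "ssd_gb": 1000, "gpus": 3}


def max_instances(capacity: dict[str, int], recipe: dict[str, int]) -> int:
    """Largest k such that k x recipe fits into capacity for every resource."""
    return min(capacity.get(res, 0) // need for res, need in recipe.items())


def allocate(capacity: dict[str, int], recipe: dict[str, int], k: int) -> dict[str, int]:
    """Reserve k x recipe and return the capacity that is left over."""
    if k > max_instances(capacity, recipe):
        raise ValueError(f"cannot fit {k} instances")
    remaining = dict(capacity)
    for res, need in recipe.items():
        remaining[res] = remaining.get(res, 0) - k * need
    return remaining


print(max_instances(SITE_CAPACITY, AR1_RECIPE))  # 3 (limited by SSD and GPUs)
print(allocate(SITE_CAPACITY, AR1_RECIPE, 3))    # {'dram_gb': 16, 'ssd_gb': 232, 'gpus': 0}
```

Whether the site actually commits all three copies, or keeps headroom for other services, remains its own business decision; the calculation only bounds what is feasible.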
The computing time listed in Table 1 is the delay measured when the
basic recipe processes the basic data sample. If a client wants
faster turnaround, it simply requests more service instances (higher
Gas); CATS will pick the site/instance combination whose real
computing time is ≤ the requested delay, while keeping the cost close
to the client's stated budget.
4. Service modelling with CMAS
A service site acting as a public service provider or contributor
can publish specific services to the public service platform using
the fields of the service table. A service site, such as a regional
cloud-computing pool, deploys services by browsing the public service
platform’s catalogue and applying for the specific services it
intends to host. The service site models its available compute and
storage resources in terms of the services it can actually deliver.
Table 2 illustrates such a service model table: each row describes
one service type (e.g., AR1), while the columns expose the site's
current capacity, economics and contact points.
* GAS (Global Available Slots) – the total number of identical
service instances that the site is willing to run concurrently.
Example: GAS = 3 for AR1 means the site will keep three AR1
service instances alive, so three clients can be served
simultaneously.
* Cost per instance – the site-declared price for one such slot.
Rule of thumb: an edge site with scarce GPUs may set a higher cost
than a central cloud with abundant resources.
* CSCI-ID – the identifier of the service contact instance, which is
a lightweight proxy VM created per service; the proxy's public IP
address is published as the CSCI-ID. Role: the proxy handles the
hand-over (token exchange, redirect, health ping) so that the main
service instances remain shielded from direct client traffic.
Thus, the service site realizes service-oriented computing metrics
modeling by:
1. inserting one row per service into its service model table;
2. spawning GAS identical service instances;
3. starting one proxy per service and storing its CSCI-ID in the
service model table;
4. updating cost and GAS in real time as local load or hardware
changes occur.
This allows CATS to rank and select the most economical or closest
instance for each client request, while the service site retains
full control over its own pricing and capacity policies.
+--------------+-----------------+---------------------+------------------------+
| Service ID   | Gas             | Cost                | CSCI-ID                |
+--------------+-----------------+---------------------+------------------------+
| AR1          | 3               | 4                   | IP address             |
+--------------+-----------------+---------------------+------------------------+
| TP1          | 6               | 5                   | IP address             |
+--------------+-----------------+---------------------+------------------------+
| LB1          | 2               | 7                   | IP address             |
+--------------+-----------------+---------------------+------------------------+
| ST1          | 1               | 2                   | IP address             |
+--------------+-----------------+---------------------+------------------------+
Table 2: Example of the service-oriented modelling table of a service site
In this way, a service site turns its raw compute and storage
capacity into service-oriented offers without ever exposing internal
computing metrics to the network. Instead of publishing FLOPS,
memory sizes or utilization curves, the site simply maintains and
distributes its service model table: a concise, standardized summary
of how many instances of each service type it can run and at what
cost.
* Initial state: the site sends the entire table to the C-SMA.
* Subsequent changes: only the delta (new deployments, added/removed
instances, price adjustments) is transmitted, keeping updates
lightweight and avoiding the complex normalization of raw
computing metrics.
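The full-table-then-delta update can be sketched as follows, assuming that each row is keyed by service ID and that a delta consists of upserted rows plus withdrawn service IDs. The wire encoding of a delta is not specified by this document; the function names and the documentation-range addresses (RFC 5737) used as CSCI-IDs are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    gas: int       # GAS advertised for the service
    cost: float    # cost per instance
    csci_id: str   # address of the service contact instance


def table_delta(old: dict[str, Row], new: dict[str, Row]):
    """Rows to add or update, plus service IDs to withdraw."""
    upserts = {sid: row for sid, row in new.items() if old.get(sid) != row}
    withdrawn = sorted(sid for sid in old if sid not in new)
    return upserts, withdrawn


def apply_delta(table: dict[str, Row], upserts: dict[str, Row],
                withdrawn: list[str]) -> dict[str, Row]:
    """What the C-SMA would do with a received delta."""
    merged = {**table, **upserts}
    for sid in withdrawn:
        merged.pop(sid, None)
    return merged


# Table 2 with hypothetical documentation addresses as CSCI-IDs.
t0 = {
    "AR1": Row(3, 4, "192.0.2.1"),
    "TP1": Row(6, 5, "192.0.2.2"),
    "LB1": Row(2, 7, "192.0.2.3"),
    "ST1": Row(1, 2, "192.0.2.4"),
}
# Initial state: a delta against an empty table is the whole table.
initial, _ = table_delta({}, t0)

# Later: one more AR1 instance, and ST1 is withdrawn.
t1 = {sid: row for sid, row in t0.items() if sid != "ST1"}
t1["AR1"] = t0["AR1"]._replace(gas=4)
upserts, withdrawn = table_delta(t0, t1)
print(upserts, withdrawn)  # only the AR1 row is sent; ST1 is withdrawn
```

Treating the initial transfer as a delta against an empty table lets the same code path serve both cases listed above.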
CMAS turns the traditional flood of raw computing metrics into a
single, lightweight service model table. Because the table contains
only service counts and cost values, the information volume shrinks
dramatically and is immediately understandable to the C-PS. Resource
management inside the site is equally simplified:
* To allocate resources for a service, the site simply increments
its GAS counter by one.
* To free resources, it decrements GAS by one.
No normalization, no complex resource aggregation and no metric
flooding are needed: the site simply applies “+1” or “-1” to its own
service model table.
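The four modeling steps and the “+1”/“-1” bookkeeping can be sketched as a small site-local table. This is an illustration under assumptions: the class and method names are hypothetical, the CSCI-ID is a documentation address, and actually launching the service instances and the proxy is left out.

```python
from dataclasses import dataclass


@dataclass
class ServiceModelRow:
    gas: int       # GAS: identical instances the site will run concurrently
    cost: float    # site-declared price for one slot
    csci_id: str   # public IP of the per-service proxy (contact instance)


class ServiceModelTable:
    """Hypothetical site-local service model table (Table 2)."""

    def __init__(self) -> None:
        self.rows: dict[str, ServiceModelRow] = {}

    def deploy(self, service_id: str, gas: int, cost: float, proxy_ip: str) -> None:
        # Steps 1-3: one row per service; the proxy's IP becomes the CSCI-ID.
        # Spawning the instances and the proxy themselves is out of scope.
        if service_id in self.rows:
            raise ValueError(f"{service_id} already deployed")
        self.rows[service_id] = ServiceModelRow(gas, cost, proxy_ip)

    def allocate(self, service_id: str) -> None:
        # "+1": resources for one more instance of this service.
        self.rows[service_id].gas += 1

    def free(self, service_id: str) -> None:
        # "-1": release the resources of one instance.
        row = self.rows[service_id]
        if row.gas == 0:
            raise ValueError(f"no {service_id} instance left to free")
        row.gas -= 1

    def reprice(self, service_id: str, cost: float) -> None:
        # Step 4: real-time cost update.
        self.rows[service_id].cost = cost


site = ServiceModelTable()
site.deploy("AR1", gas=3, cost=4, proxy_ip="192.0.2.1")
site.allocate("AR1")   # GAS 3 -> 4
site.free("AR1")       # GAS 4 -> 3
site.free("AR1")       # GAS 3 -> 2
site.reprice("AR1", 5)
print(site.rows["AR1"])  # ServiceModelRow(gas=2, cost=5, csci_id='192.0.2.1')
```

Each change to a row is exactly what the delta mechanism above would forward to the C-SMA.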
5. Service Distribution Under CMAS
[I-D.ietf-cats-framework] describes that a C-SMA collects both
computing-related capabilities and metrics, and associates them with
a CS-ID that identifies the service, then advertises CS-IDs along
with metrics to related C-PSes in the network. With the CMAS
mechanism, the C-SMA only needs to collect a minimal service-metric
tuple, (Service ID, CSCI-ID, GAS, cost), from each site. Meanwhile,
the C-NMA gathers purely network-related metrics such as delay,
jitter, and bandwidth; no raw computing figures are ever distributed,
keeping both data collection and cross-domain orchestration
lightweight and standardized.
Figure 1 illustrates how CATS metrics are disseminated under the CMAS
mechanism. A client reaches the network through “CATS-Forwarder 1”.
For the service identified by CS-ID “1”, two contact instances exist:
* Instance CSCI-ID “1” at Service Site 2 (reachable via CATS-
Forwarder 2)
* Instance CSCI-ID “3” at Service Site 3 (reachable via CATS-
Forwarder 3)
Additionally, two separate services (CS-ID “2” and CS-ID “3”) each
have one contact instance located at Service Site 2 and Service Site
3, respectively.
CS-ID 1, CSCI-ID 1, gas, cost
CS-ID 2, CSCI-ID 2, gas, cost
:<----------------------:
: : +---------+
+----------+ : : |CS-ID 1 |
|Public | : : .--|CSCI-ID 1|
|Service | : +----------------+ | +---------+
|Platform | : | C-SMA |----| Service Site 2
+----------+ : +----------------+ | +---------+
|computing| : |CATS-Forwarder 2| '--|CS-ID 2 |
| time | : +----------------+ |CSCI-ID 2|
+--------+ | : | +---------+
| Client | | : Network +----------------------+
+--------+ | : delay | +-------+ |
| | : :<---------| C-NMA | |
| | : : | +-------+ |
+---------------------+ | |
|CATS-Forwarder 1|C-PS|----| |
+---------------------+ | Underlay |
:Service | Infrastructure | +---------+
:table | | |CS-ID 1 |
: +----------------------+ .---|CSCI-ID 3|
: | | +---------+
: +----------------+ +------+
: |CATS-Forwarder 3|--|C-SMA | Service Site 3
: +----------------+ +------+
: : |
: : | +-----------+
: : '---|CS-ID 3 |
: : |CSCI-ID 4 |
:<-------------------------------: +-----------+
CS-ID 1, CSCI-ID 3, gas, cost
CS-ID 3, CSCI-ID 4, gas, cost
Figure 1: An Example of CATS Service Metric Dissemination Under CMAS.
The whole service table formed at the C-PS in this example is:
(CS-ID 1, CSCI-ID 1, gas, cost, Computing time, Network delay)
(CS-ID 1, CSCI-ID 3, gas, cost, Computing time, Network delay)
(CS-ID 2, CSCI-ID 2, gas, cost, Computing time, Network delay)
(CS-ID 3, CSCI-ID 4, gas, cost, Computing time, Network delay)
In Figure 1, the C-SMA co-located with “CATS-Forwarder 2” advertises
the CMAS metrics for both service contact instances it covers,
namely (CS-ID 1, CSCI-ID 1, gas, cost) and (CS-ID 2, CSCI-ID 2, gas,
cost). Likewise, the C-SMA at “Service Site 3” publishes the metrics
for the two services hosted by that site. All these service-metric
advertisements are received and processed by the C-PS hosted on
“CATS-Forwarder 1”, which also handles the network-metric
advertisements sent by the C-NMA.
Under CMAS, the C-PS can easily build and maintain a unified service
view for every offering. It simply collects all service metrics from
the sites, appends the network metrics (here, delay) advertised by
the C-NMA, and fetches each service's computing time from the public
service platform. This yields a comprehensive service table in the
form (Service ID, CSCI-ID, Gas, Cost, Computing time, Network delay),
which we call the whole service table.
Using this table, the C-PS selects the most suitable path to the
egress CATS-Forwarder by evaluating:
* the client's initial service request ("CS-ID 1" or "CS-ID 2"),
* the real-time state of each service-contact instance (gas, cost,
computing time), and
* the current network state (delay and other metrics).
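The join that produces the whole service table can be sketched as follows for the topology of Figure 1. All gas, cost, delay and computing-time numbers are hypothetical. Keying the network delay by CSCI-ID is a simplifying assumption that stands in for the delay of the path towards the egress CATS-Forwarder serving that contact instance.

```python
from typing import NamedTuple


class Advert(NamedTuple):      # (CS-ID, CSCI-ID, gas, cost) from a C-SMA
    cs_id: str
    csci_id: str
    gas: int
    cost: float


class WholeRow(NamedTuple):    # one row of the whole service table
    cs_id: str
    csci_id: str
    gas: int
    cost: float
    computing_time_ms: float
    network_delay_ms: float


def build_whole_table(adverts, computing_time_ms, network_delay_ms):
    """Join C-SMA adverts with platform computing times and C-NMA delays."""
    rows = [WholeRow(*a, computing_time_ms[a.cs_id], network_delay_ms[a.csci_id])
            for a in adverts]
    return sorted(rows, key=lambda r: (r.cs_id, r.csci_id))


# Figure 1 topology; every number below is hypothetical.
adverts = [
    Advert("1", "1", 3, 4), Advert("2", "2", 6, 5),   # from Service Site 2
    Advert("1", "3", 2, 3), Advert("3", "4", 1, 2),   # from Service Site 3
]
computing_time_ms = {"1": 1.0, "2": 20.0, "3": 1000.0}          # public service platform
network_delay_ms = {"1": 5.0, "2": 5.0, "3": 12.0, "4": 12.0}    # C-NMA

whole_table = build_whole_table(adverts, computing_time_ms, network_delay_ms)
for row in whole_table:
    print(row)
```

The four printed rows follow the same order as the whole service table listed for Figure 1.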
6. Service Consuming Under CMAS
In the example of Figure 1, the client first queries the public
service platform to build its request from Table 1. The request is
expressed as a 4-tuple: (Service ID, Gas, Cost, Delay). It is
injected into the network through "CATS-Forwarder 1" (ingress role),
which forwards it to the C-PS. The C-PS (co-located or centralised)
selects the CSCI-ID that best matches the tuple by comparing:
* Gas ≥ requested Gas,
* Cost closest to requested Cost,
* Real Delay ≤ requested Delay, where Real Delay = Computing Time +
Network Delay taken from the whole service table.
The C-PS returns a 6-tuple response: (Service ID, CSCI-ID, Gas, Real-
Cost, Real-Delay, success). When this response reaches "CATS-
Forwarder 1", the forwarder notifies the client that the service is
ready with a simplified acknowledgement: (Service ID, Gas, Real-Cost,
Real-Delay, success). The client then:
* pays the Real-Cost,
* assembles input data according to the "input" schema in Table 1,
* sends (Service ID, data) back to "CATS-Forwarder 1".
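The request matching and the two replies can be sketched as a filter followed by a minimum. This is a hypothetical illustration rather than a normative algorithm. The following are assumptions, because this document does not define them: breaking ties on lower Real Delay, returning the requested Gas as the granted Gas, and taking Real-Cost to be the selected row's declared Cost.

```python
from typing import NamedTuple, Optional


class WholeRow(NamedTuple):
    cs_id: str
    csci_id: str
    gas: int
    cost: float
    computing_time_ms: float
    network_delay_ms: float


class Request(NamedTuple):     # (Service ID, Gas, Cost, Delay)
    service_id: str
    gas: int
    cost: float
    delay_ms: float


class Response(NamedTuple):    # (Service ID, CSCI-ID, Gas, Real-Cost, Real-Delay, success)
    service_id: str
    csci_id: Optional[str]
    gas: int
    real_cost: Optional[float]
    real_delay_ms: Optional[float]
    success: bool


def select(req: Request, table: list[WholeRow]) -> Response:
    candidates = []
    for row in table:
        real_delay = row.computing_time_ms + row.network_delay_ms
        if row.cs_id == req.service_id and row.gas >= req.gas and real_delay <= req.delay_ms:
            candidates.append((abs(row.cost - req.cost), real_delay, row))
    if not candidates:
        return Response(req.service_id, None, 0, None, None, False)
    # Closest cost first; ties broken by lower real delay (an assumption).
    _, real_delay, best = min(candidates, key=lambda c: (c[0], c[1]))
    # Assumption: Real-Cost is the chosen row's declared cost.
    return Response(req.service_id, best.csci_id, req.gas, best.cost, real_delay, True)


def client_ack(resp: Response):
    """Simplified acknowledgement sent to the client: drop the CSCI-ID."""
    return (resp.service_id, resp.gas, resp.real_cost, resp.real_delay_ms, resp.success)


table = [
    WholeRow("1", "1", 3, 4, 1.0, 5.0),
    WholeRow("1", "3", 2, 3, 1.0, 12.0),
    WholeRow("2", "2", 6, 5, 20.0, 5.0),
]
print(select(Request("1", 2, 3, 10.0), table))  # only CSCI-ID 1 meets the delay bound
print(select(Request("1", 2, 3, 20.0), table))  # CSCI-ID 3 has the closest cost
```

The CSCI-ID stays between the C-PS and the ingress forwarder, which uses it to set up the tunnel; the client only sees the acknowledgement.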
Using the CSCI-ID returned in the response, the forwarder establishes
a direct data-plane tunnel between the client and the selected
service-contact instance. Client data and subsequent computing
results flow through this tunnel; no further routing decisions are
needed.
Immediately after the tunnel is set up, the forwarder signals the
C-PS to decrement the corresponding GAS value in the whole service
table. When the service completes, the contact instance itself sends
a “service-finished” heartbeat to the C-PS, which increments GAS
again, keeping the table accurate in real time.
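The following is a minimal sketch of this GAS bookkeeping at the C-PS. It assumes that GAS is tracked per CSCI-ID and that a session consumes as many slots as the Gas granted to it; the document itself only says that the value is decremented and later incremented. The class and handler names are hypothetical.

```python
class GasLedger:
    """Hypothetical C-PS view of available GAS per contact instance."""

    def __init__(self, gas_by_csci: dict[str, int]) -> None:
        self.gas = dict(gas_by_csci)

    def on_tunnel_established(self, csci_id: str, n: int = 1) -> None:
        # Signalled by the ingress CATS-Forwarder once the tunnel is up.
        if self.gas[csci_id] < n:
            raise RuntimeError(f"CSCI-ID {csci_id} has only {self.gas[csci_id]} slots")
        self.gas[csci_id] -= n

    def on_service_finished(self, csci_id: str, n: int = 1) -> None:
        # "service-finished" heartbeat from the contact instance.
        self.gas[csci_id] += n


ledger = GasLedger({"1": 3, "3": 2})
ledger.on_tunnel_established("1", n=2)
print(ledger.gas["1"])  # 1
ledger.on_service_finished("1", n=2)
print(ledger.gas["1"])  # 3
```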
7. References
7.1. Informative References
[I-D.ietf-cats-usecases-requirements]
Yao, K., "Computing-Aware Traffic Steering (CATS) Problem
Statement, Use Cases, and Requirements", June 2025.
[I-D.ietf-cats-metric-definition]
Kumari, W. and K. Yao, "CATS Metrics Definition", July
2025.
[I-D.ietf-cats-framework]
Li, C., "A Framework for Computing-Aware Traffic Steering
(CATS)", July 2025.
Authors' Addresses
Bin Zhang (editor)
Pengcheng Laboratory
Sibilong Street
Shenzhen
518055
China
Email: bin.zhang@pcl.ac.cn
Yina Dai (editor)
Sun Yat-sen University
Sun Yat-sen Street
Guangzhou
510080
China
Email: daiyn5@mail2.sysu.edu.cn
Bowen Shen (editor)
Harbin Institute of Technology
Taoyuan Street
Shenzhen
518055
China
Email: shenbowen@stu.hit.edu.cn