| Internet-Draft | Intelligent Hybrid Cloud Requirements | July 2026 |
| Li | Expires 4 January 2027 | [Page] |
This document specifies the general technical capability requirements for an intelligent hybrid cloud platform. An intelligent hybrid cloud combines compute, storage, and network resources across multiple cloud deployment models, leveraging artificial intelligence algorithms to implement active hybrid cloud management functions such as intelligent resource scheduling, intelligent analysis, intelligent statistics, and intelligent prediction. It also provides support for intelligent computing power and large model development-related service technical capabilities within the hybrid cloud. This document defines capability requirements across infrastructure, unified platform management, model cross-cloud development, and intelligent operations and maintenance.¶
This document is applicable to the design, development, and deployment of intelligent hybrid cloud platforms by cloud service providers, and provides reference and specifications for users designing and deploying intelligent hybrid cloud platforms.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 4 January 2027.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Intelligent hybrid cloud combines compute, storage, and network resources across multiple cloud deployment models, leveraging artificial intelligence algorithms to implement active hybrid cloud management functions such as intelligent resource scheduling, intelligent analysis, intelligent statistics, and intelligent prediction, including cloud-to-cloud collaboration. It also provides support for intelligent computing power and large model development-related service technical capabilities within the hybrid cloud. It enables enterprises to flexibly allocate resources according to business requirements, while improving efficiency and reliability through intelligent dynamic monitoring, AI-driven analysis, and decision-making capabilities.¶
This document applies to the design, development, and deployment of intelligent hybrid cloud platforms by cloud service providers, and provides reference and specifications for users designing and deploying intelligent hybrid cloud platforms.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This document uses the following terms and definitions established in [T_CCSA385.1] and [GB_T32400].¶
A cloud deployment model that contains at least two different cloud deployment models. [Source: [GB_T32400], 3.2.23]¶
A hybrid cloud deployment model enhanced by artificial intelligence technology, which implements active hybrid cloud management functions such as intelligent resource scheduling, intelligent analysis, intelligent statistics, and intelligent prediction through AI algorithms, and provides support for intelligent computing power and large model development-related service capabilities during hybrid cloud operation processes.¶
The dynamic allocation capability to divide physical GPUs, bare metal servers, and other heterogeneous hardware resources into virtual units on demand.¶
The capability to deploy and manage TensorFlow, PyTorch, PaddlePaddle, and Large Language Models (LLM) within the same computing resource pool.¶
The capability to achieve optimal allocation and elastic scaling of cross-cloud resources based on multi-objective optimization algorithms.¶
An intelligent operations and maintenance system that utilizes AI technology to implement operations data collection, anomaly detection, root cause analysis, and self-healing processing.¶
The following abbreviations are used in this document:¶
| Abbreviation | Full Name |
|---|---|
| API | Application Programming Interface |
| ARM | Advanced RISC Machines |
| CPU | Central Processing Unit |
| GPU | Graphics Processing Unit |
| LLM | Large Language Model |
| NPU | Neural Network Processing Unit |
| SDK | Software Development Kit |
| SSD | Solid State Drives |
| TLS | Transport Layer Security |
| VPN | Virtual Private Network |
| VXLAN | Virtual eXtensible Local Area Network |
This document primarily adopts technical testing and document review methods to conduct technical testing and protocol/SLA clause compliance checks on the resource integration capability, platform management capability, model development capability, and operations and maintenance capability of the evaluated hybrid cloud platform, and scores according to the pass status. Infrastructure capabilities are mandatory, while other capabilities require passing more than 80% of the evaluation items.¶
This standard applies to the intelligent transformation of existing hybrid cloud products based on hybrid cloud architecture, covering the integration of public cloud, private cloud, and edge cloud resources and the fusion of intelligent capabilities. It also applies to the needs of enterprises, government agencies, and other organizations to achieve intelligent services during digital transformation, helping enterprises clarify their capability levels and upgrade targets.¶
This indicator defines the heterogeneous cloud resource integration capability. The platform MUST meet the following requirements:¶
Support unified access to resources including public cloud, private cloud (dedicated cloud), edge cloud, and endpoint nodes;¶
Support unified management of computing resources, storage resources, network resources, database resources, middleware resources, and container platforms;¶
Provide edge node access capability, supporting local autonomous operation in offline network environments.¶
This indicator defines the hybrid cloud network access capability. The platform MUST meet the following requirements:¶
This indicator defines the hybrid cloud storage capability. The platform MUST meet the following requirements:¶
This indicator defines the specific capability requirements for heterogeneous resources and multi-source resource collaboration that an intelligent hybrid cloud SHOULD possess. The platform MUST meet the following requirements:¶
Support mapping different forms of similar resource services to unified standardized service APIs, enabling the system to identify and register different types of computing, storage, and network resources;¶
Support compatibility with mainstream GPU types and models;¶
Support compatibility with mainstream CPU hardware architectures (x86/ARM/RISC-V) for mixed deployment;¶
Support at least two types of heterogeneous hardware, such as CPU/GPU/FPGA/NPU;¶
Support multi-source heterogeneous data storage, such as structured, semi-structured, and unstructured data;¶
Support cross-cloud storage scheduling and dynamic performance specification adjustment;¶
Support optimizing cross-cloud traffic routing strategies based on AI algorithms (reinforcement learning/time series prediction);¶
Support automatic selection of low-latency links according to network congestion status;¶
Support predicting peak periods and proactively expanding bandwidth resources in advance.¶
This indicator defines the cross-cloud resource orchestration and scheduling capability requirements that an intelligent hybrid cloud SHOULD possess. The platform MUST meet the following requirements:¶
Support multi-cloud collaborative scheduling capability, supporting real-time resource scheduling across 3 or more cloud platforms (public cloud/private cloud/edge cloud);¶
Support automatic cross-cloud scaling under burst load, with resource recovery rate greater than or equal to 90% during scaling down;¶
Provide global capacity monitoring and capacity scheduling based on monitoring analysis results;¶
Support dynamic policy optimization, including multi-objective optimization algorithms based on cost, performance, and carbon emissions;¶
Predict business load changes through AI models;¶
Support visualized composite orchestration of basic resources;¶
Support generating orchestration templates, dynamically adding resource nodes to adjust orchestration;¶
Support cross-cloud resource orchestration recommendations based on scheduling strategies such as cost optimization and performance optimization.¶
This indicator defines the capability of an intelligent hybrid cloud to uniformly manage and intelligently analyze cross-cloud resources, tasks, and applications. The platform MUST meet the following requirements:¶
Support custom log storage for monitoring prediction, scheduling prediction, and cost analysis;¶
Support generating resource optimization reports and cost optimization reports;¶
Support multi-dimensional cost analysis and resource optimization recommendations;¶
Support cross-cloud log and metric correlation analysis;¶
Support implementing a visualized hybrid cloud management, operations, and operation interface.¶
This indicator defines the unified scheduling capability that an intelligent hybrid cloud platform can provide through collaborative resources. The platform MUST meet the following requirements:¶
Support dynamic allocation of hybrid cloud resources (public cloud computing power peak expansion, private cloud sensitive data processing) for AI model training;¶
Support sharding of training tasks, supporting deployment of compute-intensive tasks (such as large model pre-training) to public cloud, while retaining data preprocessing tasks in the local private cloud.¶
This indicator defines the model inference and evaluation capabilities that an intelligent hybrid cloud platform can provide through collaborative multi-cloud environments. The platform MUST meet the following requirements:¶
Support three fine-tuning modes: full update, low-rank adaptation (LoRA), and Prompt Tuning. Users MAY comprehensively select fine-tuning modes considering factors such as computing power, dataset size, downstream task type, and base model;¶
Support implementing model fine-tuning in public cloud and model testing in private cloud;¶
Support online testing functionality, allowing users to verify the accuracy and response effectiveness of models created on the platform online. Online testing supports selecting services and applications on the testing workbench for parameter configuration, inputting or referencing prompt templates for input, and completing testing;¶
Support evaluating pre-configured large models and trained models, supporting evaluation of models that have not been published as online services or have been published as online services; support configuring evaluation tasks to use public resource pools or dedicated resource pools, as well as resource size configuration.¶
This indicator defines the multi-cloud deployment capability that an intelligent hybrid cloud platform can provide. The platform MUST meet the following requirements:¶
Support seamless migration of TensorFlow/PyTorch models between X86 and ARM architectures;¶
Support multi-environment deployment of models in hybrid cloud, including deployment in private cloud, edge nodes, and other environments;¶
Support diversified model deployment strategies, such as blue-green deployment, canary deployment, and multi-replica deployment;¶
Support public resource pool and dedicated resource pool configuration. When services are published in a dedicated resource pool, services exclusively occupy resources, and corresponding computing units can be set to guarantee QPS;¶
Support deploying multiple models or multiple versions of a single model in the hybrid cloud.¶
The platform MUST meet the following requirements:¶
Support model inference service management, including start, stop, and traffic limiting;¶
Support elastic scaling of model inference services;¶
Support version updates of inference services, supporting updating specified versions for launch, and also supporting offline operations for published services, supporting smooth version replacement without directly affecting currently running version services;¶
Support a visualized interface for model inference services, including displaying model call frequency, inference performance metrics, computing resource occupancy, and other metrics;¶
Support abnormal data collection for model inference.¶
The platform MUST meet the following requirements:¶
Support cloud-edge training and inference capabilities, achieving distributed intelligence;¶
Support data cloud-edge collaborative deployment capability, allocating differently according to hot and cold data identifiers;¶
Support model lightweight compression technologies (such as quantization, pruning), adapting to resource constraints of edge devices.¶
This indicator defines the model management service capability that an intelligent hybrid cloud platform can provide. The platform MUST meet the following requirements:¶
Support viewing information of models deployed on the hybrid cloud, including model name, type, creation/modification time, etc.;¶
Support lifecycle management of large models already listed on the hybrid cloud, including model listing, publishing, querying, removing, and delisting;¶
Support version management of deployed large models, including version tracking, difference comparison between different versions, and version rollback;¶
Support at least 2 different model file storage formats.¶
This indicator defines the multi-cloud knowledge base management capability that an intelligent hybrid cloud platform can provide. The platform MUST meet the following requirements:¶
Support building knowledge bases from documents in multiple formats, including pdf, txt, md, docx, etc.;¶
Support text splitting by character, length, and semantics, as well as document cleaning;¶
Support vector databases to store vectorized text fragments, and support vector database similarity retrieval;¶
Support viewing the total number of documents and total number of characters in the knowledge base, and support document-level function configuration.¶
The capability to perform intelligent operations and maintenance optimization on the intelligent hybrid cloud platform and provide related monitoring data collection for intelligent operations and maintenance. The platform MUST meet the following requirements:¶
Support large-screen monitoring, including resource usage rate, allocation rate, alerts, and other information;¶
Support querying historical monitoring data within custom time periods;¶
Support noise reduction, classification, grading, and notification of alert information and events based on the platform's own algorithms;¶
Support unified convergence and status tracking of alarm information.¶
The capability to perform intelligent operations and maintenance log optimization on the intelligent hybrid cloud platform. The platform MUST meet the following requirements:¶
The capability to perform intelligent operations and maintenance fault optimization on the intelligent hybrid cloud platform. The platform MUST meet the following requirements:¶
Meet anomaly detection and pre-analysis for CPU/memory/ network;¶
Support root cause analysis, locating the root cause of complex faults within hours;¶
Support rapid recovery capability for minor faults within a short time;¶
Support building a fault knowledge base;¶
Support fault learning capability, building operations and maintenance intelligent agents.¶
The capability to perform intelligent operations and maintenance automation on the intelligent hybrid cloud platform. The platform MUST meet the following requirements:¶
Support self-healing capability through intelligent algorithms (causal reasoning graph engines): achieving 100% automatic repair for known fault modes (disk full, service process crash);¶
Support automated inspection: periodic health check coverage rate greater than or equal to 99%, executable rate of generated repair recommendations greater than or equal to 80%;¶
Support dynamic resource reclamation: identifying idle resources (utilization less than 10% for 24 consecutive hours), automatically releasing or hibernating them.¶
The capability to perform intelligent operations and maintenance metering and billing on the intelligent hybrid cloud platform. The platform MUST meet the following requirements:¶
Support basic resource usage metering and bill generation;¶
Provide a unified multi-cloud billing view;¶
Provide mixed billing modes for on-demand/reserved/spot instance AI tasks, such as resource-based billing, monthly subscription billing, and billing by GPU and model service (training/inference/knowledge base) call volume;¶
Support viewing bills on bill details and cost analysis pages;¶
Support multi-dimensional cost analysis and resource optimization recommendations;¶
Implement intelligent cost prediction and dynamic resource adjustment to optimize billing.¶
This memo includes no request to IANA.¶
This document specifies technical capability requirements for intelligent hybrid cloud platforms. Implementers SHOULD consider the following security aspects when deploying such platforms:¶
Data confidentiality and integrity when transferring data across public and private cloud boundaries;¶
Access control and authentication mechanisms for multi-tenant environments;¶
Network security for cross-cloud communication channels;¶
Model security, including protection of intellectual property for deployed AI models;¶
Compliance with applicable data protection regulations.¶
This document was drafted in accordance with the provisions of GB/T 1.1-2020 "Directives for Standardization - Part 1: Rules for the Structure and Drafting of Standardizing Documents."¶
Please note that certain contents of this document may involve patents. The publishing organization of this document assumes no responsibility for identifying such patents.¶
This document was proposed and administered by the China Communications Standards Association.¶
The following organizations contributed to the development of this document: China Academy of Information and Communications Technology, China Unicom Cloud Data Co., Ltd., Alibaba Cloud Technology Co., Ltd., China Mobile (Suzhou) Software Technology Co., Ltd., JD Cloud Computing Co., Ltd., Tianyi Cloud Technology Co., Ltd., and Lenovo (Beijing) Co., Ltd.¶