Internet-Draft Intelligent Hybrid Cloud Requirements July 2026
Li Expires 4 January 2027
Workgroup:
Cloud Computing Open Source Industry Alliance
Internet-Draft:
draft-lizihan-intelligent-hybrid-cloud-00
Published:
Intended Status:
Informational
Expires:
Author:
Z. Li
China Academy of Information and Communications Technology

General Technical Capability Requirements for Intelligent Hybrid Cloud Platform

Abstract

This document specifies the general technical capability requirements for an intelligent hybrid cloud platform. An intelligent hybrid cloud combines compute, storage, and network resources across multiple cloud deployment models and uses artificial intelligence algorithms to implement active hybrid cloud management functions such as intelligent resource scheduling, intelligent analysis, intelligent statistics, and intelligent prediction. It also supports intelligent computing power and the service capabilities needed for large model development within the hybrid cloud. This document defines capability requirements in four areas: infrastructure, unified platform management, cross-cloud model development, and intelligent operations and maintenance.

This document is applicable to the design, development, and deployment of intelligent hybrid cloud platforms by cloud service providers, and provides reference and specifications for users designing and deploying intelligent hybrid cloud platforms.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 4 January 2027.

1. Introduction

An intelligent hybrid cloud combines compute, storage, and network resources across multiple cloud deployment models and uses artificial intelligence algorithms to implement active hybrid cloud management functions, including cloud-to-cloud collaboration, such as intelligent resource scheduling, intelligent analysis, intelligent statistics, and intelligent prediction. It also supports intelligent computing power and the service capabilities needed for large model development within the hybrid cloud. This lets enterprises allocate resources flexibly according to business requirements while improving efficiency and reliability through intelligent dynamic monitoring, AI-driven analysis, and decision-making capabilities.

This document applies to the design, development, and deployment of intelligent hybrid cloud platforms by cloud service providers, and provides reference and specifications for users designing and deploying intelligent hybrid cloud platforms.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1.2. Terminology

This document uses the following terms and definitions established in [T_CCSA385.1] and [GB_T32400].

hybrid cloud:

A cloud deployment model that contains at least two different cloud deployment models. [Source: [GB_T32400], 3.2.23]

intelligent hybrid cloud:

A hybrid cloud deployment model enhanced by artificial intelligence technology. It uses AI algorithms to implement active hybrid cloud management functions such as intelligent resource scheduling, intelligent analysis, intelligent statistics, and intelligent prediction, and it supports intelligent computing power and the service capabilities needed for large model development while the hybrid cloud is in operation.

hardware resource splitting:

The capability to dynamically divide physical GPUs, bare metal servers, and other heterogeneous hardware resources into virtual units that are allocated on demand.

multi-model integration management:

The capability to deploy and manage models built with frameworks such as TensorFlow, PyTorch, and PaddlePaddle, as well as Large Language Models (LLMs), within the same computing resource pool.

intelligent resource orchestration:

The capability to achieve optimal allocation and elastic scaling of cross-cloud resources based on multi-objective optimization algorithms.

artificial intelligence for IT operations (AIOps):

An intelligent operations and maintenance system that utilizes AI technology to implement operations data collection, anomaly detection, root cause analysis, and self-healing processing.

2. Abbreviations

The following abbreviations are used in this document:

Table 1: Abbreviations Used in This Document

Abbreviation   Full Name
------------   ------------------------------------------
API            Application Programming Interface
ARM            Advanced RISC Machines
CPU            Central Processing Unit
GPU            Graphics Processing Unit
LLM            Large Language Model
NPU            Neural Network Processing Unit
SDK            Software Development Kit
SSD            Solid State Drive
TLS            Transport Layer Security
VPN            Virtual Private Network
VXLAN          Virtual eXtensible Local Area Network

3. Overview of Capability Requirements Methodology

This document evaluates a hybrid cloud platform through technical testing and document review. Technical tests and compliance checks against protocol and service level agreement (SLA) clauses are applied to the platform's resource integration, platform management, model development, and operations and maintenance capabilities, and each evaluation item is scored according to whether it passes. Infrastructure capabilities are mandatory, while the other capabilities require passing more than 80% of the evaluation items.

This document applies to the intelligent transformation of existing hybrid cloud products built on a hybrid cloud architecture, covering the integration of public cloud, private cloud, and edge cloud resources and the incorporation of intelligent capabilities. It also serves enterprises, government agencies, and other organizations that need intelligent services during digital transformation, helping them identify their current capability level and upgrade targets.
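
The pass rule above can be read as a simple check. The following is a minimal sketch, assuming each evaluation item is recorded as a pass/fail boolean grouped by capability category, that "mandatory" means every infrastructure item passes, and that the 80% rule is applied per category. The category keys and the `evaluate` function are hypothetical and are not defined by this document.

```python
from typing import Dict, List

# Hypothetical category key; this document does not define identifiers.
MANDATORY = "infrastructure"
PASS_THRESHOLD = 0.80  # "more than 80%" is read as strictly greater


def evaluate(results: Dict[str, List[bool]]) -> Dict[str, bool]:
    """Return a pass/fail verdict per capability category."""
    verdicts = {}
    for category, items in results.items():
        if not items:
            verdicts[category] = False  # nothing evaluated, nothing passed
        elif category == MANDATORY:
            verdicts[category] = all(items)  # assumption: every item must pass
        else:
            verdicts[category] = sum(items) / len(items) > PASS_THRESHOLD
    return verdicts
```

Under these assumptions, a category with 4 of 5 items passed (exactly 80%) does not pass, while 9 of 10 does.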

4. Infrastructure Capability Requirements

4.1. Computing Capability

This indicator defines the heterogeneous cloud resource integration capability. The platform MUST meet the following requirements:

  • Support unified access to resources including public cloud, private cloud (dedicated cloud), edge cloud, and endpoint nodes;

  • Support unified management of computing resources, storage resources, network resources, database resources, middleware resources, and container platforms;

  • Provide edge node access capability, supporting local autonomous operation in offline network environments.

4.2. Network Capability

This indicator defines the hybrid cloud network access capability. The platform MUST meet the following requirements:

  • Support the adoption of high-speed interconnection technologies, optimization of network topology structures, and other methods to improve data transmission efficiency and reduce network latency;

  • Implement intelligent network security protection, supporting dynamic threat detection and response.

4.3. Storage Capability

This indicator defines the hybrid cloud storage capability. The platform MUST meet the following requirements:

  • Implement high-performance storage (such as SSD and NVMe SSD) and tiered storage management;

  • Support cross-cloud storage scheduling and dynamic performance specification adjustment.

5. Unified Platform Management Capability

5.1. Cross-Cloud Resource Collaboration Capability

This indicator defines the capability requirements for collaboration across heterogeneous and multi-source resources in an intelligent hybrid cloud. The platform MUST meet the following requirements:

  • Support mapping different forms of similar resource services to unified standardized service APIs, enabling the system to identify and register different types of computing, storage, and network resources;

  • Support compatibility with mainstream GPU types and models;

  • Support compatibility with mainstream CPU hardware architectures (x86/ARM/RISC-V) for mixed deployment;

  • Support at least two types of heterogeneous hardware, such as CPU/GPU/FPGA/NPU;

  • Support multi-source heterogeneous data storage, such as structured, semi-structured, and unstructured data;

  • Support cross-cloud storage scheduling and dynamic performance specification adjustment;

  • Support optimizing cross-cloud traffic routing strategies based on AI algorithms (reinforcement learning/time series prediction);

  • Support automatic selection of low-latency links according to network congestion status;

  • Support predicting peak periods and proactively expanding bandwidth resources in advance.
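
The first requirement in this list, mapping different forms of similar resource services to a unified API, is commonly approached with per-provider adapters. The sketch below illustrates that idea only. The `Resource` schema, the provider names `cloud_a` and `cloud_b`, and their payload fields are all hypothetical and do not correspond to any real cloud API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Resource:
    """Normalized record exposed through the unified API (hypothetical schema)."""
    provider: str
    kind: str        # "compute", "storage", or "network"
    name: str
    capacity: float  # vCPUs for compute in this sketch


# Each adapter converts one provider-specific payload into a Resource.
Adapter = Callable[[dict], Resource]
_adapters: Dict[str, Adapter] = {}


def register_adapter(provider: str, adapter: Adapter) -> None:
    _adapters[provider] = adapter


def register_resources(provider: str, payloads: List[dict]) -> List[Resource]:
    """Normalize raw provider payloads; an unknown provider raises KeyError."""
    adapter = _adapters[provider]
    return [adapter(p) for p in payloads]


# Two made-up payload shapes that describe the same kind of service.
register_adapter(
    "cloud_a",
    lambda p: Resource("cloud_a", "compute", p["InstanceId"], float(p["VCpus"])),
)
register_adapter(
    "cloud_b",
    lambda p: Resource("cloud_b", "compute", p["vm_name"], float(p["cores"])),
)
```

Callers then work only with `Resource` records, regardless of which provider produced them.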

5.2. Cross-Cloud Orchestration and Scheduling Capability

This indicator defines the cross-cloud resource orchestration and scheduling capability requirements for an intelligent hybrid cloud. The platform MUST meet the following requirements:

  • Support multi-cloud collaborative scheduling, with real-time resource scheduling across three or more cloud platforms (public cloud/private cloud/edge cloud);

  • Support automatic cross-cloud scaling under burst load, with resource recovery rate greater than or equal to 90% during scaling down;

  • Provide global capacity monitoring and capacity scheduling based on monitoring analysis results;

  • Support dynamic policy optimization, including multi-objective optimization algorithms based on cost, performance, and carbon emissions;

  • Predict business load changes through AI models;

  • Support visualized composite orchestration of basic resources;

  • Support generating orchestration templates and dynamically adding resource nodes to adjust an orchestration;

  • Support cross-cloud resource orchestration recommendations based on scheduling strategies such as cost optimization and performance optimization.
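
One simple way to realize multi-objective optimization over cost, performance, and carbon emissions is a weighted sum of normalized objectives. The sketch below assumes that technique; this document does not mandate a specific algorithm. The `Placement` fields, the default weights, and the `recovery_rate` helper (one possible reading of the 90% scale-down requirement) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Placement:
    cloud: str
    cost: float        # currency units per hour (lower is better)
    latency_ms: float  # proxy for performance (lower is better)
    carbon: float      # gCO2e per hour (lower is better)


def _normalize(values: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; identical values all map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(candidates: List[Placement],
         weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> List[Placement]:
    """Order candidates by weighted sum of normalized objectives, best first."""
    cost = _normalize([c.cost for c in candidates])
    perf = _normalize([c.latency_ms for c in candidates])
    carbon = _normalize([c.carbon for c in candidates])
    w_cost, w_perf, w_carbon = weights
    scored = [
        (w_cost * cost[i] + w_perf * perf[i] + w_carbon * carbon[i], c)
        for i, c in enumerate(candidates)
    ]
    return [c for _, c in sorted(scored, key=lambda pair: pair[0])]


def recovery_rate(released: int, added_during_burst: int) -> float:
    """Share of burst-time resources released again after scale-down."""
    return released / added_during_burst if added_during_burst else 1.0
```

Changing the weights shifts the recommendation between cost-optimized and performance-optimized strategies without changing the ranking code.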

5.3. Cross-Cloud Platform Analysis Capability

This indicator defines the capability of an intelligent hybrid cloud to uniformly manage and intelligently analyze cross-cloud resources, tasks, and applications. The platform MUST meet the following requirements:

  • Support custom log storage for monitoring prediction, scheduling prediction, and cost analysis;

  • Support generating resource optimization reports and cost optimization reports;

  • Support multi-dimensional cost analysis and resource optimization recommendations;

  • Support cross-cloud log and metric correlation analysis;

  • Support a visualized interface for hybrid cloud management, operations, and maintenance.

6. Model Cross-Cloud Development Capability Requirements

6.1. Cross-Cloud Training Scheduling Capability

This indicator defines the unified scheduling capability that an intelligent hybrid cloud platform can provide through collaborative resources. The platform MUST meet the following requirements:

  • Support dynamic allocation of hybrid cloud resources for AI model training (for example, expanding into public cloud computing power at peak load and processing sensitive data in the private cloud);

  • Support sharding of training tasks, so that compute-intensive tasks (such as large model pre-training) can be deployed to the public cloud while data preprocessing tasks remain in the local private cloud.
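
The split described above can be sketched as a placement rule. This is a minimal illustration, assuming each task declares whether it touches sensitive data and how many GPU hours it needs. The `TrainingTask` fields and the 100 GPU-hour threshold are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TrainingTask:
    name: str
    touches_sensitive_data: bool
    gpu_hours: float


def place(tasks: List[TrainingTask],
          burst_threshold_gpu_hours: float = 100.0) -> Dict[str, str]:
    """Map each task name to "private" or "public".

    Sensitive-data tasks always stay private. Other tasks go to the public
    cloud only when they are compute-intensive enough to justify bursting.
    """
    placement = {}
    for task in tasks:
        if task.touches_sensitive_data:
            placement[task.name] = "private"
        elif task.gpu_hours >= burst_threshold_gpu_hours:
            placement[task.name] = "public"
        else:
            placement[task.name] = "private"
    return placement
```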

6.2. Model Fine-Tuning and Evaluation Collaboration Capability

This indicator defines the model fine-tuning and evaluation capabilities that an intelligent hybrid cloud platform can provide through collaborative multi-cloud environments. The platform MUST meet the following requirements:

  • Support three fine-tuning modes: full update, low-rank adaptation (LoRA), and Prompt Tuning. Users MAY select a fine-tuning mode based on factors such as computing power, dataset size, downstream task type, and base model;

  • Support implementing model fine-tuning in public cloud and model testing in private cloud;

  • Support online testing, allowing users to verify online the accuracy and response quality of models created on the platform. On the testing workbench, users select a service or application, configure parameters, enter a prompt or reference a prompt template, and run the test;

  • Support evaluating both pre-configured large models and trained models, whether or not they have been published as online services; support configuring evaluation tasks to use public or dedicated resource pools, and configuring the resource size.
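
Computing power is one of the factors named above for choosing a fine-tuning mode. As a rough illustration, the sketch below compares trainable parameter counts for a full update and for LoRA on a single d x k weight matrix, where LoRA trains a rank-r update B @ A with B of shape d x r and A of shape r x k. The function names are illustrative only.

```python
def full_update_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r update W + B @ A (B: d x r, A: r x k)."""
    return r * (d + k)


def trainable_fraction(d: int, k: int, r: int) -> float:
    """LoRA trainable parameters as a fraction of the full-update count."""
    return lora_params(d, k, r) / full_update_params(d, k)
```

For a 4096 x 4096 matrix with r = 8, LoRA trains 65,536 parameters instead of 16,777,216, which is 1/256 of the full-update count.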

6.3. Model Cross-Cloud Deployment Capability

This indicator defines the multi-cloud deployment capability that an intelligent hybrid cloud platform can provide. The platform MUST meet the following requirements:

  • Support seamless migration of TensorFlow/PyTorch models between x86 and ARM architectures;

  • Support multi-environment deployment of models in hybrid cloud, including deployment in private cloud, edge nodes, and other environments;

  • Support diversified model deployment strategies, such as blue-green deployment, canary deployment, and multi-replica deployment;

  • Support public resource pool and dedicated resource pool configuration. When a service is published in a dedicated resource pool, it occupies the resources exclusively, and the number of computing units can be set to guarantee a target queries-per-second (QPS) rate;

  • Support deploying multiple models or multiple versions of a single model in the hybrid cloud.
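
Canary deployment, one of the strategies listed above, sends a small share of traffic to a new model version. The following is a minimal sketch of one way to split traffic, assuming requests carry a stable identifier. The `route` function and the version labels are hypothetical.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for a request.

    Hashing the request id keeps the decision deterministic, so retries of
    the same request reach the same model version.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Raising `canary_percent` step by step to 100 completes the rollout; setting it back to 0 returns all traffic to the stable version.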

6.4. Model Cross-Cloud Inference Capability

6.4.1. Service Manageability

The platform MUST meet the following requirements:

  • Support model inference service management, including start, stop, and traffic limiting;

  • Support elastic scaling of model inference services;

  • Support version updates of inference services, including launching a specified version, taking published services offline, and replacing versions smoothly without directly affecting the currently running version;

  • Support a visualized interface for model inference services, including displaying model call frequency, inference performance metrics, computing resource occupancy, and other metrics;

  • Support abnormal data collection for model inference.
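
Traffic limiting for an inference service is often implemented with a token bucket. The sketch below assumes that technique; this document does not prescribe one. The class name and parameters are hypothetical, and the caller passes the current time explicitly so that the behavior is deterministic.

```python
class TokenBucket:
    """Token-bucket limiter for inference requests (illustrative only)."""

    def __init__(self, rate_per_s: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate_per_s        # tokens added per second
        self.capacity = float(burst)  # maximum tokens that can accumulate
        self.tokens = float(burst)    # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then consume one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A service wrapper would call `allow(time.monotonic())` before each inference call and reject or queue the request when it returns `False`.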

6.4.2. Cloud-Edge Collaborative Inference Capability

The platform MUST meet the following requirements:

  • Support cloud-edge training and inference capabilities, achieving distributed intelligence;

  • Support cloud-edge collaborative data placement, allocating data according to hot and cold data identifiers;

  • Support lightweight model compression techniques (such as quantization and pruning) to adapt to the resource constraints of edge devices.
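
As an illustration of the quantization technique named above, the sketch below applies symmetric per-tensor int8 quantization to a list of weights. It is a simplified example under stated assumptions (plain Python lists, one scale per tensor) and is not a production compression pipeline. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]
```

Each weight is stored in one byte instead of four, and the rounding error per weight is bounded by half the scale.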

6.5. Cross-Cloud Model Service Management Capability

This indicator defines the model management service capability that an intelligent hybrid cloud platform can provide. The platform MUST meet the following requirements:

  • Support viewing information of models deployed on the hybrid cloud, including model name, type, creation/modification time, etc.;

  • Support lifecycle management of large models already listed on the hybrid cloud, including model listing, publishing, querying, removing, and delisting;

  • Support version management of deployed large models, including version tracking, difference comparison between different versions, and version rollback;

  • Support at least 2 different model file storage formats.

6.6. Knowledge Base Management Capability

This indicator defines the multi-cloud knowledge base management capability that an intelligent hybrid cloud platform can provide. The platform MUST meet the following requirements:

  • Support building knowledge bases from documents in multiple formats, including pdf, txt, md, docx, etc.;

  • Support text splitting by character, length, and semantics, as well as document cleaning;

  • Support vector databases to store vectorized text fragments, and support vector database similarity retrieval;

  • Support viewing the total number of documents and total number of characters in the knowledge base, and support document-level function configuration.
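
Two of the requirements above, length-based text splitting and similarity retrieval, can be sketched in a few lines. The example assumes fragments have already been converted to vectors by some embedding model, which is outside the scope of this sketch, and uses cosine similarity as the similarity measure. The in-memory `store` stands in for a vector database, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def split_by_length(text: str, size: int, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most `size` characters with optional overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          store: List[Tuple[str, Sequence[float]]],
          k: int = 3) -> List[str]:
    """Return the k stored fragments most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [fragment for fragment, _ in ranked[:k]]
```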

7. Intelligent Operations and Maintenance Capability

7.1. Monitoring and Alerting Capability

This indicator defines the capability to optimize intelligent operations and maintenance on the intelligent hybrid cloud platform and to collect the monitoring data that intelligent operations and maintenance relies on. The platform MUST meet the following requirements:

  • Support large-screen monitoring, including resource usage rate, allocation rate, alerts, and other information;

  • Support querying historical monitoring data within custom time periods;

  • Support noise reduction, classification, grading, and notification of alert information and events based on the platform's own algorithms;

  • Support unified convergence and status tracking of alert information.
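
Alert noise reduction and convergence can be approximated by merging repeated alerts into a single tracked incident. The sketch below assumes alerts are identified by a (source, rule) pair and merged when they repeat within a time window. The data classes, the severity grades, and the 300-second window are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}  # hypothetical grades


@dataclass
class Alert:
    source: str
    rule: str
    severity: str
    timestamp: float


@dataclass
class Incident:
    source: str
    rule: str
    severity: str
    first_seen: float
    last_seen: float
    count: int = 1


def converge(alerts: Iterable[Alert], window_s: float = 300.0) -> List[Incident]:
    """Merge alerts with the same (source, rule) that repeat within window_s."""
    open_incidents: Dict[Tuple[str, str], Incident] = {}
    result: List[Incident] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.source, alert.rule)
        incident = open_incidents.get(key)
        if incident and alert.timestamp - incident.last_seen <= window_s:
            incident.last_seen = alert.timestamp
            incident.count += 1
            # Keep the highest grade seen so far for the incident.
            if SEVERITY_ORDER[alert.severity] > SEVERITY_ORDER[incident.severity]:
                incident.severity = alert.severity
        else:
            incident = Incident(alert.source, alert.rule, alert.severity,
                                alert.timestamp, alert.timestamp)
            open_incidents[key] = incident
            result.append(incident)
    return result
```

Operators are then notified once per incident rather than once per raw alert, and the `count` and `last_seen` fields support status tracking.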

7.2. Log Management Capability

This indicator defines the capability to manage and optimize logs for intelligent operations and maintenance on the intelligent hybrid cloud platform. The platform MUST meet the following requirements:

  • Support calling log analysis interfaces and querying log analysis system data;

  • Support cross-cloud log and metric correlation analysis.

7.3. Fault Management

This indicator defines the capability to handle faults through intelligent operations and maintenance on the intelligent hybrid cloud platform. The platform MUST meet the following requirements:

  • Support anomaly detection and pre-analysis for CPU, memory, and network;

  • Support root cause analysis, locating the root cause of complex faults within hours;

  • Support rapid recovery capability for minor faults within a short time;

  • Support building a fault knowledge base;

  • Support fault learning capability, building operations and maintenance intelligent agents.

7.4. Automation Capability

This indicator defines the automation capability of intelligent operations and maintenance on the intelligent hybrid cloud platform. The platform MUST meet the following requirements:

  • Support self-healing capability through intelligent algorithms (such as causal reasoning graph engines), achieving 100% automatic repair for known fault modes (such as disk full and service process crash);

  • Support automated inspection: periodic health check coverage rate greater than or equal to 99%, executable rate of generated repair recommendations greater than or equal to 80%;

  • Support dynamic resource reclamation: identifying idle resources (utilization less than 10% for 24 consecutive hours), automatically releasing or hibernating them.
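
The idle-resource rule above (utilization below 10% for 24 consecutive hours) can be sketched as follows. The example assumes one utilization sample per hour, expressed as a fraction between 0 and 1, and reads "24 consecutive hours" as the most recent 24 samples. The function names and the fleet structure are hypothetical.

```python
from typing import Dict, List, Sequence


def is_idle(hourly_utilization: Sequence[float],
            threshold: float = 0.10,
            hours: int = 24) -> bool:
    """True when the most recent `hours` samples are all below `threshold`."""
    if len(hourly_utilization) < hours:
        return False  # not enough history to decide
    return all(u < threshold for u in hourly_utilization[-hours:])


def reclaim_candidates(fleet: Dict[str, Sequence[float]]) -> List[str]:
    """Names of resources that qualify for release or hibernation."""
    return sorted(name for name, samples in fleet.items() if is_idle(samples))
```

A single sample at or above the threshold inside the window is enough to keep a resource off the candidate list.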

7.5. Metering and Billing Capability

This indicator defines the metering and billing capability of intelligent operations and maintenance on the intelligent hybrid cloud platform. The platform MUST meet the following requirements:

  • Support basic resource usage metering and bill generation;

  • Provide a unified multi-cloud billing view;

  • Provide mixed billing modes for on-demand/reserved/spot instance AI tasks, such as resource-based billing, monthly subscription billing, and billing by GPU and model service (training/inference/knowledge base) call volume;

  • Support viewing bills on bill details and cost analysis pages;

  • Support multi-dimensional cost analysis and resource optimization recommendations;

  • Implement intelligent cost prediction and dynamic resource adjustment to optimize billing.
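
A unified multi-cloud billing view over mixed billing modes can be sketched as a simple aggregation. The example below assumes usage has already been metered into records with a quantity and a unit price. The `UsageRecord` fields, the mode names, and the prices in any usage are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UsageRecord:
    cloud: str
    mode: str         # "on_demand", "monthly", or "per_call" (hypothetical)
    quantity: float   # hours, months, or calls, depending on mode
    unit_price: float


def unified_bill(records: List[UsageRecord]) -> Dict[str, float]:
    """Aggregate charges per cloud and add a grand total under "total".

    Assumes no cloud is itself named "total".
    """
    bill: Dict[str, float] = {}
    for r in records:
        bill[r.cloud] = bill.get(r.cloud, 0.0) + r.quantity * r.unit_price
    bill["total"] = sum(bill.values())
    return bill
```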

8. IANA Considerations

This memo includes no request to IANA.

9. Security Considerations

This document specifies technical capability requirements for intelligent hybrid cloud platforms. Implementers SHOULD consider the following security aspects when deploying such platforms:

10. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.
[GB_T32400]
Standardization Administration of China, "Information Technology - Cloud Computing - Overview and Vocabulary".
[T_CCSA385.1]
China Communications Standards Association, "Cloud Computing - Intelligent Hybrid Cloud - Part 1".

Acknowledgements

This document was drafted in accordance with the provisions of GB/T 1.1-2020 "Directives for Standardization - Part 1: Rules for the Structure and Drafting of Standardizing Documents."

Please note that certain contents of this document may involve patents. The publishing organization of this document assumes no responsibility for identifying such patents.

This document was proposed and administered by the China Communications Standards Association.

The following organizations contributed to the development of this document: China Academy of Information and Communications Technology, China Unicom Cloud Data Co., Ltd., Alibaba Cloud Technology Co., Ltd., China Mobile (Suzhou) Software Technology Co., Ltd., JD Cloud Computing Co., Ltd., Tianyi Cloud Technology Co., Ltd., and Lenovo (Beijing) Co., Ltd.

Author's Address

Zihan Li
China Academy of Information and Communications Technology
China