CAICT                                                              Z. Li
INTERNET DRAFT                                                     Y. Su
Intended status: Standards Track                                  J. Dou
Expires: 30 April 2025                                           R. Chen
                                                                   CAICT
                                                         16 October 2024

A method for evaluating the capabilities of 
large language models deployed on hybrid cloud

draft-lizihan-hybrid-cloudlargelanguagemodel-00


Abstract
This document establishes a group standard for evaluating
the capabilities of large language models deployed on hybrid cloud.

Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current
Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on 30 April 2025.


Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document.





Table of Contents

1. Introduction
2. Capability Overview
3. Cloud Infrastructure Capabilities
4. Model Capabilities
5. Application Capabilities
6. Security Considerations
7. Operation and Maintenance Capabilities
8. Completeness of Service Considerations
9. IANA Considerations
10. References
Acknowledgments
Authors' Addresses
 

1. Introduction

This document specifies maturity levels for large model
hybrid cloud capabilities, including
cloud infrastructure capabilities, model capabilities,
application capabilities, security capabilities,
and the completeness and standardization
of service indicators.

This document is intended to standardize products and solutions
in the field of large model hybrid cloud,
and to help enterprises build and improve
their own large model hybrid cloud.

Large model hybrid cloud is now a focus
for many service providers
and enterprise customers.
Since its emergence,
the concept of hybrid cloud has been integrated
with multiple technologies, and
hybrid cloud is now a common
form of enterprise technical architecture:
82% of global enterprises
have adopted hybrid cloud architecture.
With continuous breakthroughs
in artificial intelligence technology,
large model hybrid cloud architecture
has become a key means for
enterprises to promote technological innovation
and optimize data processing. Companies
around the world are adopting this architecture
to address the growing need
for computing and data management.
To meet this need, the large model
hybrid cloud offers core capabilities
including model scalability,
adaptive learning, and cross-platform
collaboration.

2. Capability Overview

2.1 Technical architecture of large model 
hybrid cloud capability maturity

The large model hybrid cloud technical architecture consists of
the following components:

- Cloud infrastructure layer: provides and manages computing
  resources for the system;
- Model layer: supports the full development process
  of large models;
- Application layer: develops or adapts large models
  for specific industry or scenario needs;
- Security capability: consists of hybrid cloud security
  and model security;
- Operation and maintenance capability: includes disaster
  recovery, monitoring, and log management.

3. Cloud Infrastructure Capabilities

3.1 Overview
Cloud infrastructure capability refers to the ability to
provide and manage hybrid cloud computing resources.
The following indicators are defined:

- Multiple computing power compatibility;
- Storage capability;
- Network capability;
- Multi-cloud access capability;
- Multi-cloud resource management capability.

3.2 Multiple computing power compatibility
Level 1: Supports at least one CPU architecture (x86, ARM, etc.)
and one GPU architecture;
Level 2: Supports various CPU (x86, ARM, etc.) 
and GPU architectures;
Level 3: Supports various CPU (x86, ARM, etc.), GPU, 
and NPU or DPU architectures.
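
As an illustration only, the levels in this section can be checked
mechanically. The sketch below is not part of this standard. It
assumes that "various" means at least two distinct architectures
and that each level includes the requirements of the levels below
it; neither assumption is stated in this document. The function
name compute_compatibility_level and its parameters are
hypothetical.

```python
def compute_compatibility_level(cpu_archs, gpu_archs, accelerator_archs=()):
    """Return the Section 3.2 level; 0 means Level 1 is not met.

    Assumptions (not stated in this document): "various" means two or
    more distinct architectures, and levels are cumulative.
    """
    cpus = set(cpu_archs)
    gpus = set(gpu_archs)
    accelerators = set(accelerator_archs)  # NPU or DPU architectures

    # Level 1: at least one CPU architecture and one GPU architecture.
    if not cpus or not gpus:
        return 0
    # Level 2: various CPU and GPU architectures.
    if len(cpus) < 2 or len(gpus) < 2:
        return 1
    # Level 3: Level 2 plus at least one NPU or DPU architecture.
    if not accelerators:
        return 2
    return 3
```

For example, a platform that reports two CPU architectures, two GPU
architectures, and one NPU architecture would be placed at Level 3
under these assumptions.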

3.3 Storage capability
Level 1: Supports distributed storage in the cloud 
and local storage systems;
Level 2: Supports object storage, parallel file access,
and lifecycle management;
Level 3: Supports high-performance object storage 
and distributed parallel file storage for AI.

3.4 Network capability
Level 1: Supports stable networks through dedicated lines, 
VPN, or SD-WAN, and basic QoS;
Level 2: Tests network performance and 
supports visual network monitoring;
Level 3: Supports RDMA and single-port high network bandwidth.

3.5 Multi-cloud access capability
Level 1: Supports access to at least one public cloud,
 private cloud, or local environment;
Level 2: Supports access to multiple public clouds, 
private clouds, and local environments;
Level 3: Supports access to multiple public clouds, 
private clouds, local environments, 
proprietary clouds, and edge nodes.

3.6 Multi-cloud resource management capability
Level 1: Supports management of computing resources, 
including allocation and lifecycle management;
Level 2: Supports load balancing 
and unified resource pooling;
Level 3: Supports visual orchestration
and partitioning of GPU or NPU computing power.

4. Model Capabilities

4.1 Overview
Model layer capability refers to the ability to support the full
development process of large models on the hybrid cloud,
with the following indicators defined:

- Data engineering capability;
- Model development or training capability;
- Model deployment or inference capability.

4.2 Data engineering capability
4.2.1 Overview
Data engineering capability refers to the ability
to process the data required for model training
or inference on the hybrid cloud.
The following indicators are defined:

- Data access capability;
- Data processing capability;
- Data management capability.

4.2.2 Data access capability
Level 1: Supports access to structured, semi-structured,
unstructured, and multi-modal data;
Level 2: Supports identification and access of data
from different sources and formats;
Level 3: Supports incremental data synchronization 
and custom data access filtering strategies.
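
Level 3 names incremental data synchronization and custom data
access filtering strategies without prescribing a mechanism. One
common approach is a watermark: each run picks up only records
changed since the previous run and applies a caller-supplied filter.
The sketch below is illustrative and not part of this standard. The
function incremental_sync, the "updated_at" field, and the keep
callback are hypothetical names.

```python
def incremental_sync(records, last_synced_at, keep=lambda record: True):
    """Select records newer than the watermark that pass the filter.

    Returns (selected_records, new_watermark). Each record is assumed
    to be a dict whose "updated_at" value can be compared with the
    watermark (an assumption of this sketch).
    """
    selected = []
    new_watermark = last_synced_at
    for record in records:
        if record["updated_at"] <= last_synced_at:
            continue  # already synchronized in an earlier run
        # Advance the watermark even for filtered-out records so that
        # they are not re-examined on the next run.
        new_watermark = max(new_watermark, record["updated_at"])
        if keep(record):
            selected.append(record)
    return selected, new_watermark
```

Passing the returned watermark into the next call keeps each run
limited to new or changed records, while the keep callback stands in
for a custom access filtering strategy.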

4.2.3 Data processing capability
Level 1: Supports data cleaning, completion, 
and annotation;
Level 2: Supports data standardization, enhancement, 
and both offline and online processing;
Level 3: Supports intelligent cleaning, 
multi-modal data annotation, 
and various annotation methods.

4.2.4 Data management capability
Level 1: Supports metadata management, data set construction, 
and version management;
Level 2: Supports data lifecycle management, classification, 
and migration between multi-cloud environments;
Level 3: Supports multi-dimensional analysis 
and construction of new data sets.

4.3 Model development or training capability
4.3.1 Overview
Model development or training capability refers to the ability
to conduct model development or training on the hybrid cloud,
with the following indicators defined:

- Model development or training environment capability;
- Training task orchestration capability;
- Model evaluation capability.

4.3.2 Model development or training environment capability
Level 1: Supports deep learning frameworks, heterogeneous 
computing frameworks, and distributed training frameworks;
Level 2: Supports preset model libraries and algorithm libraries.

4.3.3 Training task orchestration capability
Level 1: Supports configuration of common model hyperparameters
and visual display of training tasks;
Level 2: Supports various distributed training methods
and visual orchestration of model training tasks;
Level 3: Supports various model tuning strategies
and targeted training on abnormal data.

4.3.4 Model evaluation capability
Level 1: Supports multi-dimensional model evaluation
 and custom business-related indicators;
Level 2: Supports visual comparison 
of different model evaluation results
and generation of evaluation reports.

4.4 Model deployment or inference capability
4.4.1 Overview
Model deployment or inference capability refers
to the ability to deploy models or run inference on a hybrid cloud,
with the following indicators defined:

- Diversified deployment or inference capability;
- Model management capability.

4.4.2 Diversified deployment or inference capability
Level 1: Supports image-based model deployment
and service interfaces;
Level 2: Supports diversified model deployment strategies and
multiple hybrid cloud environments;
Level 3: Supports cloud-edge collaboration strategies
and model compression for edge deployment.

4.4.3 Model management capability
Level 1: Supports deployment of multiple models or versions 
and viewing of deployed model information;
Level 2: Supports lifecycle management of deployed models, 
version management, and multiple model file storage formats.

5. Application Capabilities

5.1 Overview
Application layer capability refers to the ability of large models 
to meet personalized needs of different industries or scenarios, 
with the following indicators defined:

- Model application scenario support capability.

5.2 Model application scenario support capability
Level 1: Supports at least one basic large model, one industry model,
and three scenario-specific large models;
Level 2: Supports at least two basic large models,
two industry models,
and five scenario-specific large models;
Level 3: Supports at least three basic large models,
three industry models, and ten scenario-specific large models.
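
Because the levels in this section are defined by minimum counts,
they can be expressed as a small threshold table. The following is
an illustrative sketch, not part of this standard. The names
SCENARIO_THRESHOLDS and scenario_support_level are hypothetical, and
a return value of 0 is used here to mean that Level 1 is not met.

```python
# Minimum counts per level, taken from this section:
# (basic large models, industry models, scenario or scene models).
SCENARIO_THRESHOLDS = {
    1: (1, 1, 3),
    2: (2, 2, 5),
    3: (3, 3, 10),
}


def scenario_support_level(basic_models, industry_models, scene_models):
    """Return the highest level whose three minimum counts are all met."""
    counts = (basic_models, industry_models, scene_models)
    level = 0  # 0 means Level 1 is not met
    for candidate in sorted(SCENARIO_THRESHOLDS):
        minimums = SCENARIO_THRESHOLDS[candidate]
        if all(have >= need for have, need in zip(counts, minimums)):
            level = candidate
    return level
```

A product with three basic large models and three industry models
but only nine scenario models would be placed at Level 2, because
the Level 3 minimum of ten scenario models is not met.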

6. Security Considerations

6.1 Overview
Security capability refers to the comprehensive security
capability of the large model hybrid cloud,
with the following indicators defined:

- Hybrid cloud security capability;
- Model security capability.

6.2 Hybrid cloud security capability
6.2.1 Overview
Hybrid cloud security capability refers 
to the comprehensive security capability 
of hybrid cloud, with the following indicators defined:

- Access control capability;
- Data security capability;
- Network security capability.

6.2.2 Access control capability
Level 1: Supports user account management, password modification,
 and identity authentication;
Level 2: Supports role-based access control and user login settings;
Level 3: Supports multi-factor authentication and integration
with independent user authentication systems.
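
Level 2 names role-based access control. As a minimal sketch of the
idea, and not part of this standard, permissions are attached to
roles rather than to individual users, and a request is allowed
when any of the user's roles grants the permission. The role names,
permission names, and the function is_allowed below are
hypothetical.

```python
# Hypothetical role-to-permission mapping; the role and permission
# names are illustrative and are not defined by this document.
ROLE_PERMISSIONS = {
    "admin": {"manage_users", "deploy_model", "view_logs"},
    "developer": {"deploy_model", "view_logs"},
    "auditor": {"view_logs"},
}


def is_allowed(user_roles, permission):
    """Return True if any of the user's roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in user_roles
    )
```

Unknown roles grant nothing in this sketch, so a user whose roles
are not in the mapping is denied by default.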

6.2.3 Data security capability
Level 1: Protects data storage and transmission through encryption,
key management, and database firewalls;
Level 2: Supports data classification and grading,
and various access strategies;
Level 3: Supports data domain management, encryption,
and desensitization (masking) for data sharing.

6.2.4 Network security capability
Level 1: Ensures network boundary security with security groups,
firewalls, IDS, and secure transmission protocols
such as HTTPS and TLS;
Level 2: Supports network isolation through
software-defined networks
and defends against traffic-based and application layer attacks;
Level 3: Supports simulated model attacks,
identification of model hallucinations,
and inspection of model-generated results for high-risk content.

6.3 Model security capability
6.3.1 Overview
Model security capability refers
to the comprehensive security capability
of large models, with the following indicators defined:

- Access control capability;
- Model service security capability.

6.3.2 Access control capability
Level 1: Supports identity authorization management 
and authentication for large model access;
Level 2: Supports access control strategies
 and diversified identity authentication functions.

6.3.3 Model service security capability
Level 1: Supports data review for model training, 
model file integrity verification, and encryption;
Level 2: Adopts security algorithms and protocols
for model training and has an emergency
response mechanism for model services;
Level 3: Supports simulated model attacks
and identification and analysis of model hallucinations.
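
Level 1 of this section names model file integrity verification
without specifying a method. One widely used approach is to compare
a cryptographic digest of the model file against a digest published
by the model provider. The sketch below is illustrative and not part
of this standard. The choice of SHA-256 and the function names
sha256_of_file and verify_model_file are assumptions.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path, expected_sha256):
    """Return True if the file's digest matches the expected digest."""
    actual = sha256_of_file(path)
    # compare_digest avoids leaking the mismatch position via timing.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

Reading in chunks keeps memory use bounded for large model files.
The expected digest must be obtained over a trusted channel;
otherwise the check only detects accidental corruption.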

7. Operation and Maintenance Capabilities

7.1 Overview
Operation and maintenance capability refers to
the comprehensive operation and maintenance capability
of the large model hybrid cloud system,
with the following indicators defined:

- Disaster recovery and backup capability;
- Metering and billing capability;
- Monitoring and alarm capability;
- Log management capability.

7.2 Disaster recovery and backup capability
Level 1: Supports disaster recovery task management, 
data backup, and consistency verification;
Level 2: Supports model state saving and 
offline model availability in case of network failure;
Level 3: Supports system disaster recovery
and disaster recovery drills.

7.3 Metering and billing capability
Level 1: Provides metering and billing rules 
for large model services covering various scenarios;
Level 2: Supports centralized display of billing records 
and multi-dimensional bill queries;
Level 3: Supports cost estimation, analysis,
 and custom billing models.

7.4 Monitoring and alarm capability
Level 1: Supports monitoring of computing resources, 
data access, model training, 
and model service stability;
Level 2: Supports fault simulation, 
customized alarm thresholds, 
log monitoring, and heartbeat detection;
Level 3: Supports unified alarm information collection, 
automated noise reduction, 
and fast fault location across hardware
 and software.
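
Level 2 names customized alarm thresholds and heartbeat detection.
The sketch below shows one simple way these two checks could work.
It is illustrative and not part of this standard. The function
names, the metric names used in examples, and the rule that an
alarm fires when a value is strictly greater than its threshold are
all assumptions.

```python
def evaluate_alarms(metrics, thresholds):
    """Return the sorted names of metrics that exceed their threshold.

    Metrics without a configured threshold are ignored.
    """
    return sorted(
        name for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    )


def missed_heartbeats(last_seen, now, timeout_seconds):
    """Return the sorted node names whose last heartbeat is too old."""
    return sorted(
        node for node, seen_at in last_seen.items()
        if now - seen_at > timeout_seconds
    )
```

Keeping thresholds in a separate mapping lets each tenant or
deployment customize them without changing the evaluation logic.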

7.5 Log management capability
Level 1: Supports logging of hybrid cloud access operations,
 large model access audits, and resource logging;
Level 2: Supports log viewing, management, download, 
synchronization, and cleaning;
Level 3: Supports recording of network attacks 
and querying of log analysis system data.

8. Completeness of Service Considerations

8.1 Product Cycle
Large model hybrid cloud solutions or products must commit
to delivery times and upgrade services,
including product delivery cycles
and update disclosures.
This indicator is evaluated through a review
of the submitted materials.

8.2 Operation and maintenance services
Large model hybrid cloud solutions or products 
must describe assisted or managed operation services, 
including service hours, training services, and operation 
and maintenance assistance.

8.3 Protection of rights and interests
Large model hybrid cloud solutions or products 
must promise user rights protection 
and risk control methods in service agreements, 
including continuous service durations, service fees, 
and privacy protection rules.

9. IANA Considerations

To be completed.

10. References

10.1 Normative References

The content of the following documents constitutes
essential provisions of this document.
For dated references,
only the version corresponding to that date
applies to this document. For undated references,
the latest version (including all amendments)
applies to this document.

GB/T 32400-2015, Information technology - Cloud computing -
Overview and vocabulary

GB/T 41867-2022, Information technology - Artificial intelligence -
Terminology

10.2 Terms, definitions, and abbreviations

The terms and definitions given in GB/T 32400-2015 and
GB/T 41867-2022, together with the following terms,
definitions, and abbreviations, apply to this document.

10.2.1 Terms and definitions

10.2.1.1 Public cloud
A cloud deployment model in which cloud services can be used
by any cloud service customer and resources
are controlled by a cloud service provider.
[Source: GB/T 32400-2015, 3.2.33]

10.2.1.2 Private cloud
A cloud deployment model in which cloud services are used only
by one cloud service customer and resources
are controlled by that cloud service customer.
[Source: GB/T 32400-2015, 3.2.32]

10.2.1.3 Hybrid cloud
A cloud deployment model that uses at least two different
cloud deployment models.
[Source: GB/T 32400-2015, 3.2.23]

10.2.1.4 Artificial intelligence
Research and development of mechanisms
and applications of artificial intelligence systems.
[Source: GB/T 41867-2022, 3.1.2]

10.2.1.5 Deep learning
An approach to creating rich hierarchical representations
by training neural networks with many hidden layers.
[Source: GB/T 41867-2022, 3.2.27]

10.2.1.6 Model evaluation
Assessment of the quality of a trained model using the established
evaluation metrics for the relevant AI task.

10.2.1.7 Sensitive data
Data recorded in a computer information system,
such as personal information, enterprise information,
or government department information,
that is not suitable for public release.

Acknowledgments

We are grateful to the authors of the referenced
documents for the time and effort they put into them.

Zihan Li 
China Academy of Information and Communications Technology

Yue Su

Ruihao Chen

Jiali Dou

Authors' Addresses

   Zihan Li (editor)
   China Academy of Information and Communications Technology
   Zhichunlu Road
   Beijing
   China
   Email: lizihan1@caict.ac.cn

   Yue Su 
   China Academy of Information and Communications Technology 
   Email: suyue1@caict.ac.cn

   Ruihao Chen 
   China Academy of Information and Communications Technology 
   Email: chenruihao@caict.ac.cn

   Jiali Dou 
   China Academy of Information and Communications Technology 
   Email: doujiali@caict.ac.cn