tbd                                                            R. Jesske
Internet-Draft                                                 M. Kreipl
Obsoletes: none (if approved)                           Deutsche Telekom
Updates: none (if approved)                                 26 June 2026
Intended status: Informational
Expires: 28 December 2026


       AI enablement interface for multimedia services platforms
              draft-jesske-ai-enablement-interface-00.txt

Abstract

This document specifies a generic interface enabling integration between a Multimedia Communication Framework, such as the 3GPP IP Multimedia Subsystem (IMS), and Large Language Models (LLMs) or other AI-based services that perform text, audio, video, image, and service processing. A Multimedia Communication Framework is a network supporting voice, video, and messaging services, such as a SIP network or the IMS defined by 3GPP.

The interface is designed to be platform-agnostic and flexible, allowing connection to different AI platforms (e.g., LLMs) independent of their underlying technology or deployment environment. This document is inspired by the IMS environment, which provides data channel capabilities offering a variety of services within the framework.

Such an interface makes advanced AI functions, such as natural language understanding, semantic analysis, and contextual processing, available through a standardized, extensible, and interoperable interface. This enables the IMS ecosystem to adopt and integrate emerging AI technologies seamlessly to enrich communication services.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 28 December 2026.

Roland & Kreipl          Expires 28 December 2026                [Page 1]

Internet-Draft    AI enablement interface for text, audio,       June 2026

Copyright Notice

Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements
     2.1.  Introduction
     2.2.  Bidirectional Communication Support
     2.3.  Standardized Message Formats
     2.4.  Multi-Modal Data Handling
     2.5.  Compatibility with Existing LLM Services
     2.6.  Authentication and Authorization
     2.7.  Interaction Models
     2.8.  Error Handling and Fallback Procedures
   3.  Interface Architecture
     3.1.  Communication Protocol
     3.2.  Message Formats
     3.3.  Security
   4.  IANA Considerations
   5.  Security Considerations
   6.  Operational considerations
   7.  Acknowledgments
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

This draft is inspired by the IP Multimedia Subsystem (IMS), which employs voice, video, and messaging as well as Data Channels (DC) to enable real-time, bidirectional data communication between various entities. Different Application Servers play a central role in managing and processing these data streams and the related services.

The world of Artificial Intelligence (AI), particularly Large Language Models (LLMs), offers significant opportunities to enhance communication services by enabling intelligent message processing, semantic analysis, and contextual understanding.

This document defines a generic, platform-agnostic interface between an application layer and AI-based services, facilitating seamless integration of AI capabilities. This interface may be provided for the 3GPP IMS or any other equivalent multimedia platform. Future AI services enabled through this interface include, but are not limited to, automated translation, speech transcription, and personalized concierge functionality.

By standardizing this interface, a multimedia platform such as the IMS can flexibly adopt AI technologies to enrich the user experience and provide advanced multimedia services.

2.  Requirements

2.1.  Introduction

This section outlines the necessary capabilities and design principles for the interface between the service domain and AI-based services such as Large Language Models (LLMs).
The interface must be flexible enough to accommodate existing and future LLM platforms such as OpenAI, Gemini, and others. It also needs to support various types of data, including text, audio, image, and video streams.

2.2.  Bidirectional Communication Support

The interface shall support bidirectional data exchange between the Multimedia Communication Framework and an AI service. This enables the real-time interaction required for conversational AI, continuous context synchronization, and feedback mechanisms. Typical transport protocols include WebSocket for low-latency, full-duplex communication, or HTTP/2 REST APIs for request-response workflows.

2.3.  Standardized Message Formats

Exchanged messages should use widely accepted serialization formats such as JSON. The message structure must clearly distinguish between text, audio, image, and video payloads and must include metadata such as encoding, language, and context information. The format should be extensible to support future AI service features and media types.

2.4.  Multi-Modal Data Handling

The interface must enable the transfer of multiple media types relevant to AI processing:

   text:  Raw or structured text for natural language understanding and generation.

   audio:  Audio streams or voice data requiring transcription, recognition, or speaker identification.

   image:  Images in various formats requiring analysis of different parameters, recognition, pattern analysis, or translation into descriptive output.

   video:  Multimedia streams potentially requiring analysis for context, lip reading, or sentiment detection.

Support for efficient streaming protocols, or for wrappers that encapsulate real-time media streams, is needed, aligned with the transport protocols used by the target AI services.

2.5.  Compatibility with Existing LLM Services

The interface design must accommodate popular LLM providers such as OpenAI, Gemini, and others, respecting their respective protocol preferences, authentication methods, and API semantics. The interface should be agnostic and flexible enough to integrate with both cloud-hosted and on-premises AI deployments. Support for common bearer tokens and authentication standards (e.g., OAuth 2.0 [RFC6749]) is mandatory.

2.6.  Authentication and Authorization

The interface must enforce strong authentication and authorization measures to secure communication between the Multimedia Communication Framework and AI services. Typical mechanisms include TLS encryption, token-based authentication (OAuth 2.0, JSON Web Tokens (JWTs)), and fine-grained access control policies to ensure data confidentiality and integrity.

2.7.  Interaction Models

The interface should support both synchronous and asynchronous interaction modes:

   Synchronous:  For real-time communication and session control where immediate AI responses are required.

   Asynchronous:  For longer-running AI tasks such as bulk transcription, translation, or offline content analysis. The ability to notify the Multimedia Communication Framework of task completion or partial results is essential.

2.8.  Error Handling and Fallback Procedures

The interface must provide standardized means to report errors, timeouts, and exceptional conditions. The service domain should provide fallback and retry mechanisms. This assumes that fallback triggers can be recognized and that a decision matrix for alternative processing paths or retries exists. Robustness against transient network or service failures must be considered.

3.  Interface Architecture

3.1.  Communication Protocol

Transport Protocols

The interface shall support WebSocket [RFC6455] as a primary transport protocol, enabling low-latency, full-duplex, bidirectional communication. This is essential for real-time interactions: it allows conversational AI, streaming transcription, and interactive multimedia processing at a quality that satisfies the user.

Additionally, HTTP/2 RESTful APIs [RFC9113] shall be supported to facilitate synchronous request-response interactions and to ensure compatibility with existing infrastructure. REST APIs are preferable for batch processing, management tasks, or asynchronous workflows. The interface shall be backward compatible and shall work with HTTP/1.0 [RFC1945] as a minimum, noting that the WebSocket opening handshake [RFC6455] itself requires at least HTTP/1.1.

WebSocket and HTTP/2 RESTful APIs can be used in parallel, with the choice of transport depending on use-case requirements or the network environment.

Session Management

For WebSocket connections, an initial HTTP handshake [RFC6455] establishes a persistent connection that permits continuous message exchange until it is explicitly closed.

REST APIs do not rely on connection-bound session state. Any conversational or application context must be conveyed explicitly in each request, either directly or by reference through a session, conversation, or transaction identifier.

3.2.  Message Formats

Encoding

Messages exchanged between the Multimedia Communication Framework and LLM services shall be encoded in JSON [RFC8259] due to its ubiquity and ease of integration. For performance-sensitive applications, Protocol Buffers may also be supported to reduce payload size and parsing overhead. The interface shall clearly specify the content type of each message (e.g., "application/json" or "application/protobuf") in message headers or associated metadata.
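The following is a minimal, non-normative Python sketch of the JSON encoding described above. It assumes a hypothetical envelope with "request_id", "session_id", and a "payload" object whose "type" field distinguishes text, audio, image, and video data (Section 2.3). Binary media is base64-encoded so that it can travel inside JSON, and the content type is returned alongside the body so that it can be carried in a message header. None of the field or function names are defined by this document.

```python
import base64
import json
import uuid
from typing import Optional, Tuple, Union

# Values of the hypothetical "type" discriminator (see Section 2.3).
MEDIA_TYPES = {"text", "audio", "image", "video"}


def encode_message(media_type: str, data: Union[str, bytes], encoding: str,
                   session_id: str,
                   language: Optional[str] = None) -> Tuple[str, bytes]:
    """Wrap one payload in a JSON envelope; return (content_type, body).

    The envelope layout is an illustrative assumption, not a defined schema.
    """
    if media_type not in MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {media_type}")
    if media_type == "text":
        payload_data = data  # text is carried directly as a JSON string
    else:
        # Binary media (audio, image, video) is base64-encoded for JSON.
        payload_data = base64.b64encode(data).decode("ascii")
    envelope = {
        "request_id": str(uuid.uuid4()),  # unique id for correlation
        "session_id": session_id,
        "payload": {
            "type": media_type,
            "encoding": encoding,
            "language": language,
            "data": payload_data,
        },
    }
    # The content type travels next to the body, e.g. as an HTTP header.
    return "application/json", json.dumps(envelope).encode("utf-8")


def decode_message(content_type: str, body: bytes) -> dict:
    """Parse a JSON envelope and restore binary payloads to bytes."""
    if content_type != "application/json":
        raise ValueError(f"unexpected content type: {content_type}")
    envelope = json.loads(body)
    payload = envelope["payload"]
    if payload["type"] != "text":
        payload["data"] = base64.b64decode(payload["data"])
    return envelope
```

For example, encode_message("audio", pcm_bytes, "opus", "sess-1") yields a body that a REST client could send with a "Content-Type: application/json" header, or that could be sent as a WebSocket text frame. Base64 inflates binary data by roughly one third, which is one reason the Protocol Buffers option above may be preferable for performance-sensitive media.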
Request Message Structure (Application Layer to LLM)

Each request message must include:

   *  A unique request identifier for correlation.

   *  User and session metadata such as user ID, session ID, language, and service context.

   *  A payload containing the input data (text, audio data as encoded audio or stream references, images in various formats, or video data), with a clear specification of format and encoding.

   *  Optional contextual parameters such as task type (e.g., transcription, translation), priority, or custom AI service options.

Response Message Structure (LLM to Application Layer)

Response messages must return:

   *  The original request identifier for matching responses to requests.

   *  Status fields indicating success, partial success, or errors.

   *  The processed output data, which can include generated text, classification labels, transcribed speech, semantic analysis results, or multimedia annotations.

   *  Confidence scores or quality metrics, where applicable.

For asynchronous operations, the response may include job IDs and estimated completion times.

Extensibility

Both request and response formats must allow for future extensions, such as new data types, enhanced metadata, or experimental AI features, by permitting custom fields or versioning.

3.3.  Security

Data Confidentiality and Integrity

All communication between the Multimedia Communication Framework and LLM services must be encrypted using Transport Layer Security (TLS) to prevent eavesdropping, tampering, and man-in-the-middle attacks. TLS 1.2 [RFC5246] or TLS 1.3 [RFC8446] should be mandated. Integrity checks via message authentication codes (MACs) or similar mechanisms may be used as supplementary safeguards, especially for streaming data.
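As a non-normative illustration of the two safeguards in this subsection, the Python sketch below uses only the standard ssl, hmac, and hashlib modules. It builds a client TLS context that rejects protocol versions older than TLS 1.2, and it applies a supplementary HMAC-SHA256 tag to each chunk of a media stream. The per-stream key (assumed to be provisioned out of band), the stream identifier, and the names tag_chunk and StreamVerifier are hypothetical; this document does not define a MAC format.

```python
import hashlib
import hmac
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses protocol versions below TLS 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname checks enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def _mac_input(stream_id: str, seq: int, chunk: bytes) -> bytes:
    sid = stream_id.encode("utf-8")
    # Length-prefix the stream id so field boundaries are unambiguous,
    # then append a fixed-width sequence number and the chunk itself.
    return len(sid).to_bytes(2, "big") + sid + seq.to_bytes(8, "big") + chunk


def tag_chunk(key: bytes, stream_id: str, seq: int, chunk: bytes) -> bytes:
    """Sender side: HMAC-SHA256 tag binding a chunk to its stream and position."""
    return hmac.new(key, _mac_input(stream_id, seq, chunk), hashlib.sha256).digest()


class StreamVerifier:
    """Receiver side: checks tags and requires strictly increasing sequence numbers."""

    def __init__(self, key: bytes, stream_id: str) -> None:
        self._key = key
        self._stream_id = stream_id
        self._last_seq = -1

    def accept(self, seq: int, chunk: bytes, tag: bytes) -> bool:
        expected = tag_chunk(self._key, self._stream_id, seq, chunk)
        if not hmac.compare_digest(expected, tag):
            return False  # altered chunk, wrong key, or chunk from another stream
        if seq <= self._last_seq:
            return False  # replayed or out-of-order chunk
        self._last_seq = seq
        return True
```

Binding the stream identifier and sequence number into each tag lets the receiver detect chunks that were altered, replayed, or spliced in from another stream, which may be useful where media crosses intermediaries that terminate TLS.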
Authentication

The interface shall employ token-based authentication, with OAuth 2.0 [RFC6749] being the preferred authorization protocol, enabling delegated access and token lifecycle management. Tokens (e.g., JWTs) may be carried in standardized HTTP headers for REST APIs, or during WebSocket connection initialization. The interface must support token renewal, revocation, and fine-grained scope control to limit permissions based on service roles or client identity.

Access Control

The application layer and LLM services must implement strict access controls, ensuring that only authorized requests are processed. This includes IP address allowlisting, rate limiting, and auditing. Role-based and attribute-based access control policies should be defined to protect sensitive data and restrict operation types.

Privacy and Compliance

The interface should support data minimization by transmitting only necessary user or session information in requests. Compliance with data protection regulations such as the GDPR must be considered, including capabilities for data anonymization and secure deletion.

4.  IANA Considerations

No IANA Considerations.

5.  Security Considerations

tbd

6.  Operational considerations

tbd

7.  Acknowledgments

The authors would like to acknowledge the .

8.  References

8.1.  Normative References

   [RFC1945]  Berners-Lee, T., Fielding, R., and H. Frystyk, "Hypertext Transfer Protocol -- HTTP/1.0", RFC 1945, DOI 10.17487/RFC1945, May 1996.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, DOI 10.17487/RFC5246, August 2008.

   [RFC6455]  Fette, I. and A. Melnikov, "The WebSocket Protocol", RFC 6455, DOI 10.17487/RFC6455, December 2011.

   [RFC6749]  Hardt, D., Ed., "The OAuth 2.0 Authorization Framework", RFC 6749, DOI 10.17487/RFC6749, October 2012.
   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data Interchange Format", STD 90, RFC 8259, DOI 10.17487/RFC8259, December 2017.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

   [RFC9113]  Thomson, M., Ed. and C. Benfield, Ed., "HTTP/2", RFC 9113, DOI 10.17487/RFC9113, June 2022.

8.2.  Informative References

Authors' Addresses

   Roland Jesske
   Deutsche Telekom
   Telekom Allee 9
   64295 Darmstadt
   Germany
   Email: r.jesske@telekom.de
   URI: www.telekom.com

   Michael Kreipl
   Deutsche Telekom
   Dieselstrasse 43
   90441 Nuernberg
   Germany
   Email: michael.kreipl@telekom.de
   URI: www.telekom.com