SML                                                         H.-J. Happel
Internet-Draft                                              audriga GmbH
Intended status: Standards Track                             8 July 2024
Expires: 9 January 2025


                            Structured Email
                   draft-ietf-sml-structured-email-02

Abstract

   This document specifies how a machine-readable version of the content
   of email messages can be added to those messages.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 9 January 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Happel                   Expires 9 January 2025                 [Page 1]

Internet-Draft              Structured Email                   July 2024

Table of Contents

   1.  Introduction
   2.  Conventions Used in This Document
   3.  Representing structured data
     3.1.  Knowledge representation language
     3.2.  Vocabularies
   4.  Structured data in email messages
     4.1.  Placement
       4.1.1.  Full representation
       4.1.2.  Partial representation
       4.1.3.  Non-representation
     4.2.  Identifiers
       4.2.1.  Using identifiers in structured data
       4.2.2.  Using structured data identifiers in text/html
   5.  Structured data across email messages
     5.1.  Forwarding
     5.2.  Replies
     5.3.  Error replies
     5.4.  Updates
   6.  Header fields and message flags
     6.1.  Presence of structured data
     6.2.  Action processing
   7.  Examples
     7.1.  Partial representation
   8.  Security and trust
   9.  Implementation status
     9.1.  Structured Email plugin for Roundcube Webmail
   10. Security considerations
   11. Privacy considerations
   12. IANA Considerations
   13. Informative References
   Author's Address

1.  Introduction

   Information on websites and in email messages mostly addresses human
   readers.
   However, various attempts have been made to make such
   information - fully or in part - machine-readable, so that tools can
   assist users in dealing with that information more efficiently.

   One widespread approach is the use of the [SchemaOrg] vocabulary,
   which can be embedded in the HTML markup of websites.  It is then
   crawled by web search engines and used to improve the quality of
   search result snippets (e.g., by displaying ratings, opening hours,
   or contact information).

   Similarly, a number of web shops, hotels, or airlines include
   Schema.org vocabulary in order receipt email messages sent to
   customers.  This information is extracted and used by some ISPs and
   open source tools ([SchemaOrgEmail]).

   However, these implementations differ in many details.  The goal of
   this document is to specify this practice clearly and comprehensively
   and to provide a basis for potential future extensions.

2.  Conventions Used in This Document

   The terms "message" and "email message" refer to "electronic mail
   messages" or "emails" as specified in [RFC5322].  The term "Message
   User Agent" (MUA) denotes an email client application as per
   [RFC5598].

   The terms "machine-readable data" and "structured data" are used in
   contrast to "human-readable" messages and denote information
   expressed "in a structured format (..) which can be consumed by
   another program using consistent processing logic"
   [MachineReadable].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Representing structured data

   In order to exchange structured data, one needs to choose a formal
   language and a serialization format.
   Based on this, vocabularies can help to establish a shared
   understanding of structured data among heterogeneous senders and
   receivers.

3.1.  Knowledge representation language

   The Resource Description Framework ([RDF]) is a formal language for
   knowledge representation standardized by the W3C.  It is already used
   for annotating websites and emails, as it underlies [SchemaOrg].

   Among the various serializations for RDF, JSON-LD ([JSONLD]) has
   become the most commonly used serialization on websites
   ([WDCStats]).

   Hence, structured data in email messages SHOULD be expressed in the
   JSON-LD serialization of RDF.

   For discussion, see also:
   https://github.com/hhappel/draft-happel-structured-email/issues/1

3.2.  Vocabularies

   Using RDF/JSON-LD, users are free to express any kind of information
   in structured data.  For reuse and reference, however, it is common
   to agree upon certain core concepts/entities and properties for a
   given domain.  Those are typically collected and shared in so-called
   vocabularies.

   [SchemaOrg] is a widespread vocabulary, which was designed for
   annotating content on websites.  A small subset of its concepts is
   already used by email senders and processed by email providers.

   Users that want to add structured data to email messages SHOULD use
   concepts from [SchemaOrg], if they fit their use case.  They MAY
   however use any valid JSON-LD.

   There might also be certain vocabularies for email-specific use cases
   (such as [I-D.happel-sml-structured-vacation-notices-00]) that will
   be specifically endorsed by the IETF or by respective RFCs.

   MUAs may choose freely if and how to use structured data extracted
   from messages.  If they do not explicitly support a certain
   vocabulary, MUAs may also rely on extensions or on passing data to
   outside applications, similar to the case of MIME body parts.
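
   As a non-normative illustration of Sections 3.1 and 3.2, the
   following Python sketch serializes a hypothetical order receipt as
   JSON-LD using Schema.org terms, and shows one way an MUA might decide
   whether to handle a vocabulary itself or pass the data on.  The
   function names, the sample values, and the SUPPORTED_TYPES set are
   assumptions made for this example only; they are not defined by this
   document.  The sketch also assumes a single top-level JSON object.

```python
import json

# Hypothetical: the Schema.org types this imaginary MUA renders natively.
SUPPORTED_TYPES = {"Order"}


def build_order_jsonld(order_number: str, seller_name: str, order_date: str) -> str:
    """Serialize an illustrative order receipt as JSON-LD with Schema.org terms."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Order",
        "orderNumber": order_number,
        "orderDate": order_date,
        "seller": {"@type": "Organization", "name": seller_name},
    }
    return json.dumps(doc, indent=2)


def choose_handler(payload: str) -> str:
    """Return "native" if a declared type is supported, otherwise "external".

    "external" stands for handing the data to an extension or an outside
    application, as Section 3.2 permits.
    """
    doc = json.loads(payload)
    declared = doc.get("@type", [])
    # In JSON-LD, "@type" may be a single string or a list of strings.
    types = declared if isinstance(declared, list) else [declared]
    return "native" if SUPPORTED_TYPES.intersection(types) else "external"


payload = build_order_jsonld("A-1001", "Example Shop", "2024-07-08")
print(payload)
print(choose_handler(payload))
```

   A real MUA would likely use a JSON-LD processor rather than plain
   JSON parsing, since JSON-LD allows contexts, aliases, and graphs that
   this sketch ignores.
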
   For discussion, see also:
   https://github.com/hhappel/draft-happel-structured-email/issues/2

4.  Structured data in email messages

   This section discusses the placement of structured data within email
   messages and identifiers for referencing between structured data and
   other parts of a message.

4.1.  Placement

   This document targets structured data describing the content of an
   email message itself.  Since users may add other arbitrary structured
   data (e.g., as MIME body parts of type "application/ld+json") to an
   email message, we need to define which kinds of structured data are
   supposed to be representative of the email message content.

   For this, we distinguish the cases of full, partial, and
   non-representation.

   For discussion, see also:
   https://github.com/hhappel/draft-happel-structured-email/issues/3

4.1.1.  Full representation

   If structured data is intended by the sender to _fully_ describe the
   human-readable content of an email message, it MUST be added as a
   multipart/alternative entity with the content type
   application/ld+json.

   The email message SHOULD in this case also contain a text/plain and a
   text/html version of the content.

   MUAs supporting this specification SHOULD prefer the
   application/ld+json representation when receiving such email messages
   if they can process the vocabulary used or can otherwise process the
   structured data.

4.1.2.  Partial representation

   If structured data is intended to describe only a _subset_ of the
   human-readable content, it must be enclosed in a

8.  Security and trust

   Email user agents that want to support structured email should follow
   guidance on trust and security.  This guidance will be elaborated in
   a separate specification.

9.  Implementation status

   < RFC Editor: before publication please remove this section and the
   reference to [RFC7942] >

   This section records the status of known implementations of the
   protocol defined by this specification at the time of posting of this
   Internet-Draft, and is based on a proposal described in [RFC7942].
   The description of implementations in this section is intended to
   assist the IETF in its decision processes in progressing drafts to
   RFCs.  Please note that the listing of any individual implementation
   here does not imply endorsement by the IETF.  Furthermore, no effort
   has been spent to verify the information presented here that was
   supplied by IETF contributors.  This is not intended as, and must not
   be construed to be, a catalog of available implementations or their
   features.  Readers are advised to note that other implementations may
   exist.

   According to [RFC7942], "this will allow reviewers and working groups
   to assign due consideration to documents that have the benefit of
   running code, which may serve as evidence of valuable experimentation
   and feedback that have made the implemented protocols more mature.
   It is up to the individual working groups to use this information as
   they see fit".

9.1.  Structured Email plugin for Roundcube Webmail

   An open source plugin for the Roundcube Webmail software is being
   developed to serve as a reference implementation for this
   specification ([RC-SML]).

   Beyond that, some ISPs and open source tools provide implementations
   that are partly compliant with this specification
   ([SchemaOrgEmail]).

10.  Security considerations

   See Section 8, "Security and trust".

11.  Privacy considerations

   See Section 8, "Security and trust".

12.  IANA Considerations

   This document has no IANA actions at this time.  (TBD IMAP flags)

13.  Informative References

   [HTMLData] WHATWG, "HTML Living Standard: Embedding custom
              non-visible data with the data-* attributes".

   [HasAttachment]
              IETF imapext WG mailing list, "Registering $hasAttachment
              & $hasNoAttachment".

   [JSONLD]   W3C JSON-LD Working Group, "JSON-LD 1.1".

   [MachineReadable]
              NIST, "NIST IR 7511 Rev. 4".

   [PotentialAction]
              W3C Schema.org Community Group, "Schema.org:
              potentialAction".

   [RC-SML]   audriga GmbH, "Structured Email plugin for Roundcube
              Webmail".

   [RDF]      W3C RDF Working Group, "RDF 1.1 Concepts and Abstract
              Syntax".

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2392]  Levinson, E., "Content-ID and Message-ID Uniform Resource
              Locators", RFC 2392, DOI 10.17487/RFC2392, August 1998.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987,
              January 2005.

   [RFC4021]  Klyne, G. and J. Palme, "Registration of Mail and MIME
              Header Fields", RFC 4021, DOI 10.17487/RFC4021,
              March 2005.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              DOI 10.17487/RFC5322, October 2008.

   [RFC5598]  Crocker, D., "Internet Mail Architecture", RFC 5598,
              DOI 10.17487/RFC5598, July 2009.

   [RFC7942]  Sheffer, Y. and A. Farrel, "Improving Awareness of Running
              Code: The Implementation Status Section", BCP 205,
              RFC 7942, DOI 10.17487/RFC7942, July 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC9051]  Melnikov, A., Ed. and B. Leiba, Ed., "Internet Message
              Access Protocol (IMAP) - Version 4rev2", RFC 9051,
              DOI 10.17487/RFC9051, August 2021.

   [SchemaOrg]
              W3C Schema.org Community Group, "Schema.org".

   [SchemaOrgEmail]
              Structured Email, "Schema.org for email".

   [WDCStats] Web Data Commons Project, "Web Data Commons - Microdata,
              RDFa, JSON-LD, and Microformat Data Sets".

Author's Address

   Hans-Joerg Happel
   audriga GmbH
   Email: happel@audriga.com
   URI:   https://www.audriga.com