Independent Submission                                         M. Caldas
Internet-Draft                                               Independent
Intended status: Informational                           12 January 2026
Expires: 16 July 2026


   CSV++ (CSV Plus Plus): Extension to RFC 4180 for Hierarchical Data
                        draft-mscaldas-csvpp-01

Abstract

   This document specifies CSV++ (CSV Plus Plus), an extension to the
   Comma-Separated Values (CSV) format defined in RFC 4180.  CSV++ adds
   support for repeating fields (one-to-many relationships) and
   hierarchicalcomponent structures while maintaining backward
   compatibility with standard CSV parsers.  The extension uses
   declarative syntax in column headers to define array fields and
   nested structures, enabling representation of complex real-world data
   while preserving the simplicity and human-readability of CSV.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 16 July 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.


Caldas                    Expires 16 July 2026                  [Page 1]

Internet-Draft                    CSV++                     January 2026


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Motivation  . . . . . . . . . . . . . . . . . . . . . . .   3
     1.2.  When to Use CSV++ . . . . . . . . . . . . . . . . . . . .   4
     1.3.  Design Principles . . . . . . . . . . . . . . . . . . . .   4
     1.4.  Requirements Language . . . . . . . . . . . . . . . . . .   5
   2.  Conformance with RFC 4180 . . . . . . . . . . . . . . . . . .   5
   3.  Field Separator Detection . . . . . . . . . . . . . . . . . .   5
   4.  Array Fields (Repetitions)  . . . . . . . . . . . . . . . . .   5
     4.1.  Syntax  . . . . . . . . . . . . . . . . . . . . . . . . .   6
     4.2.  Examples  . . . . . . . . . . . . . . . . . . . . . . . .   6
     4.3.  Empty Values  . . . . . . . . . . . . . . . . . . . . . .   6
     4.4.  Escaping  . . . . . . . . . . . . . . . . . . . . . . . .   7
   5.  Structured Fields (Components)  . . . . . . . . . . . . . . .   7
     5.1.  Syntax  . . . . . . . . . . . . . . . . . . . . . . . . .   7
     5.2.  Examples  . . . . . . . . . . . . . . . . . . . . . . . .   7
   6.  Nested Structures . . . . . . . . . . . . . . . . . . . . . .   8
     6.1.  Recursive Composition . . . . . . . . . . . . . . . . . .   8
     6.2.  Examples  . . . . . . . . . . . . . . . . . . . . . . . .   8
     6.3.  Delimiter Selection Guidelines  . . . . . . . . . . . . .   8
   7.  Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . .   9
   8.  Implementation Considerations . . . . . . . . . . . . . . . .   9
     8.1.  Validation  . . . . . . . . . . . . . . . . . . . . . . .   9
     8.2.  Limits  . . . . . . . . . . . . . . . . . . . . . . . . .   9
   9.  MIME Type and File Extension  . . . . . . . . . . . . . . . .   9
     9.1.  MIME Type . . . . . . . . . . . . . . . . . . . . . . . .   9
     9.2.  File Extensions . . . . . . . . . . . . . . . . . . . . .  10
   10. Security Considerations . . . . . . . . . . . . . . . . . . .  10
     10.1.  Injection and Interpretation Risks . . . . . . . . . . .  10
     10.2.  Complexity and Resource Exhaustion . . . . . . . . . . .  10
     10.3.  Mixed-Tool Interoperability  . . . . . . . . . . . . . .  11
     10.4.  Encoding Issues  . . . . . . . . . . . . . . . . . . . .  11
     10.5.  IANA Considerations  . . . . . . . . . . . . . . . . . .  11
   Change Log  . . . . . . . . . . . . . . . . . . . . . . . . . . .  11
   References  . . . . . . . . . . . . . . . . . . . . . . . . . . .  11
     Normative References  . . . . . . . . . . . . . . . . . . . . .  11
     Informative References  . . . . . . . . . . . . . . . . . . . .  12
   Appendix A.  Grammar (ABNF) . . . . . . . . . . . . . . . . . . .  12


Caldas                    Expires 16 July 2026                  [Page 2]

Internet-Draft                    CSV++                     January 2026


   Appendix B.  Complete Examples  . . . . . . . . . . . . . . . . .  13
   Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . .  13
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  13

1.  Introduction

   CSV++ extends the CSV format defined in [RFC4180] to support
   repeating fields (one-to-many relationships) and hierarchical
   component structures while maintaining backward compatibility with
   standard CSV parsers.

1.1.  Motivation

   Traditional CSV files represent flat, tabular data.  However, real-
   world data often contains:

   *  Repeated values (e.g., multiple phone numbers for one person)

   *  Structured components (e.g., addresses with street, city, state,
      zip)

   *  Nested hierarchies (e.g., addresses with multiple address lines)

   CSV++ addresses these limitations by introducing:

   While formats like JSON, XML, and YAML excel at representing
   hierarchical data, they introduce complexity and redundancy that may
   not be warranted for moderately structured datasets.  CSV++ occupies
   a middle ground by extending CSV's tabular simplicity with
   hierarchical capabilities, making it particularly suitable for:

   *  Data interchange where CSV is already established but structure is
      needed

   *  Spreadsheet applications where users expect tabular layouts

   *  Systems with existing CSV infrastructure that need enhanced
      capabilities

   *  Scenarios where human readability and editability in text editors
      is valued

   *  Applications requiring backward compatibility with legacy CSV
      parsers

   CSV++ maintains CSV's core strengths - simple tooling, wide
   compatibility, and human-readable plain text - while addressing its
   limitations with hierarchical data through declarative header syntax.


Caldas                    Expires 16 July 2026                  [Page 3]

Internet-Draft                    CSV++                     January 2026


1.2.  When to Use CSV++

   CSV++ is most appropriate for:

   *  Moderately structured data (1-3 levels of nesting)

   *  Environments where CSV is already the established interchange
      format

   *  Scenarios requiring backward compatibility with existing CSV
      infrastructure

   *  Applications that benefit from self-documenting tabular data with
      inline structure definitions

   *  Data that needs to be both machine-parseable and human-readable in
      plain text

   *  Large datasets where file size and bandwidth matter, as CSV's
      columnar format avoids repeating field names in every record
      (unlike JSON or XML)

   For deeply nested hierarchical data (4+ levels), document-oriented
   formats like JSON or XML may provide better readability and tooling
   support.  CSV++ aims to extend CSV's capabilities for moderately
   structured data while preserving its tabular nature, not to replace
   hierarchical data formats.

1.3.  Design Principles

   1.  *Backward Compatibility:* Standard CSV parsers can read CSV++
       files (though they won't interpret the enhanced structure)

   2.  *Self-Documenting:* Structure is defined in column headers

   3.  *Tabular Readability:* Data maintains a tabular layout suitable
       for spreadsheet viewing and editing, though deeply nested
       structures (3+ levels) may be more readable in hierarchical
       formats like JSON

   4.  *Explicit Over Implicit:* Delimiters are declared, not assumed

   5.  *Recursively Composable:* Structures can nest to any depth,
       though practical implementations SHOULD limit nesting to 3-4
       levels for readability


Caldas                    Expires 16 July 2026                  [Page 4]

Internet-Draft                    CSV++                     January 2026


1.4.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Conformance with RFC 4180

   CSV++ files MUST conform to [RFC4180] with these specifications:

   *  Fields are separated by a delimiter (comma by default)

   *  Records are separated by line breaks (CRLF or LF)

   *  Fields containing special characters MUST be enclosed in double-
      quotes

   *  Double-quotes within quoted fields MUST be escaped by doubling: ""

   *  First record MAY be a header record per RFC 4180.  However, CSV++
      array and structure features REQUIRE headers to declare field
      types

   *  MIME type: text/csv

3.  Field Separator Detection

   The field separator character is detected using the same rules as
   [RFC4180].  Parsers SHOULD auto-detect the field separator by:

   1.  Scanning the first line (header row)

   2.  Tracking bracket depth: [] and ()

   3.  Identifying characters that appear outside brackets (depth = 0)

   4.  Selecting the most common such character as the field separator

   5.  Common candidates: , (comma), \t (tab), | (pipe), ; (semicolon)

   The comma (,) is the conventional field separator for CSV++ files.

4.  Array Fields (Repetitions)


Caldas                    Expires 16 July 2026                  [Page 5]

Internet-Draft                    CSV++                     January 2026


4.1.  Syntax

   A field containing repeated values is declared in the header using
   square brackets:

   column_name[delimiter]
   column_name[]

   Where:

   *  column_name - The name of the field

   *  [delimiter] - Optional: The character used to separate repeated
      values

   *  [] - Empty brackets use the default array delimiter

   Delimiter Resolution:

   1.  If delimiter is specified: phone[|] uses |

   2.  If empty brackets: phone[] uses the tilde (~) as default
       delimiter

   The tilde (~) is recommended as the default array delimiter to avoid
   conflicts with common data characters and the field separator.

4.2.  Examples

   id,name,phone[|],email[;]
   1,John,555-1234|555-5678|555-9012,john@work.com;john@home.com
   2,Jane,555-4444,jane@company.com

                 Figure 1: Arrays with Explicit Delimiters

   id,name,phone[],email[]
   1,John,555-1234~555-5678~555-9012,john@work.com~john@home.com
   2,Jane,555-4444,jane@company.com

                  Figure 2: Arrays with Default Delimiters

4.3.  Empty Values

   Empty values in repetitions are represented by consecutive
   delimiters:

   id,tags[|]
   1,urgent||priority


Caldas                    Expires 16 July 2026                  [Page 6]

Internet-Draft                    CSV++                     January 2026


                                  Figure 3

   This represents three tags: "urgent", "" (empty), "priority"

4.4.  Escaping

   If a repetition delimiter appears within the data itself, that
   individual value MUST be quoted per [RFC4180].  The repetition
   delimiter outside quotes still functions as a separator:

   id,notes[|]
   1,First note|Second note|"Third note contains | pipe character"
   2,"Note with | pipe"|Another note|Final note

               Figure 4: Escaping Delimiters in Array Values

   In the first row, there are three notes.  The third note contains a
   literal pipe character.  In the second row, the first note contains a
   literal pipe character.

5.  Structured Fields (Components)

5.1.  Syntax

   A field containing structured components is declared using
   parentheses:

   column_name[repetition_delim]component_delim(
       comp1 component_delim comp2 ...)
   column_name[]component_delim(comp1 component_delim comp2 ...)
   column_name[](comp1 component_delim comp2 ...)
   column_name(comp1 component_delim comp2 ...)

   Component Delimiter Resolution:

   1.  If specified before (: address^(...) uses ^

   2.  If omitted: address(...) uses the caret (^) as default delimiter

   The caret (^) is recommended as the default component delimiter to
   avoid conflicts with common data characters.

5.2.  Examples

   id,name,geo^(lat^lon)
   1,Location A,34.0522^-118.2437
   2,Location B,40.7128^-74.0060


Caldas                    Expires 16 July 2026                  [Page 7]

Internet-Draft                    CSV++                     January 2026


                         Figure 5: Simple Structure

   id,name,address[~]^(street^city^state^zip)
   1,John,123 Main St^Los Angeles^CA^90210~456 Oak Ave^New York^NY^10001
   2,Jane,789 Pine St^Boston^MA^02101

                       Figure 6: Repeated Structures

6.  Nested Structures

6.1.  Recursive Composition

   Structures can nest arbitrarily deep.  Component names can themselves
   be arrays or structures.  Within component names in (...), array and
   structure syntax applies recursively.

6.2.  Examples

   id,name,address[~]^(type^lines[;]^city^state^zip)
   1,John,home^123 Main;Apt 4^LA^CA^90210~work^456 Oak^NY^NY^10001

                      Figure 7: Array Within Structure

   id,location^(name^coords:(lat:lon))
   1,Office^34.05:-118.24
   2,Home^40.71:-74.00

                    Figure 8: Structure Within Structure

6.3.  Delimiter Selection Guidelines

   To maintain readability and parseability:

   1.  *REQUIRED:* Use different delimiters at each nesting level.
       Nested structures MUST use different component delimiters than
       their parent

   2.  Use visually distinct delimiters at each level

   3.  *Recommended progression:* ~ -> ^ -> ; -> :

   4.  Avoid using the field separator as a component delimiter

   5.  Document delimiter choices for complex schemas

   6.  *Recommendation:* Limit nesting to 3-4 levels maximum


Caldas                    Expires 16 July 2026                  [Page 8]

Internet-Draft                    CSV++                     January 2026


7.  Parsing

   CSV++ parsers process files in two phases:

   1.  *Header Parsing:* Parse column headers to identify field types
       (simple, array, or structured) and extract delimiter information

   2.  *Data Parsing:* For each data row, split fields according to
       their declared type, respecting [RFC4180] quoting rules for
       nested delimiters

   The ABNF grammar in Appendix A provides a formal specification.
   Implementations MUST handle arbitrary nesting depth up to their
   documented limits.

8.  Implementation Considerations

8.1.  Validation

   Implementations SHOULD validate:

   *  Matching number of components across repeated structures

   *  Proper bracket nesting in headers

   *  Delimiter conflicts (same delimiter at multiple levels)

   *  MUST reject: Nested structures using the same component delimiter
      as their parent

   *  Reasonable nesting depth (recommend warning beyond 3-4 levels)

8.2.  Limits

   Implementations MAY impose reasonable limits on:

   *  Nesting depth (recommended minimum: 10 levels)

   *  Number of components per structure (recommended minimum: 100)

   *  Number of repetitions per array (recommended minimum: 1000)

9.  MIME Type and File Extension

9.1.  MIME Type

   CSV++ files use the text/csv media type defined in [RFC4180].


Caldas                    Expires 16 July 2026                  [Page 9]

Internet-Draft                    CSV++                     January 2026


9.2.  File Extensions

   *  .csv - Standard extension (recommended for compatibility)

   *  .csvpp - MAY be used to explicitly indicate CSV++ format

   *  .csvplus - Alternative explicit extension

10.  Security Considerations

   CSV is a long-established and widely deployed format with well-known
   security considerations.  As a result, most mature implementations
   already incorporate mitigations for common CSV-related risks.  This
   specification builds on [RFC4180] and remains fully backward
   compatible, but introduces additional structural semantics that may
   increase parser complexity and therefore require corresponding care
   in implementations.

10.1.  Injection and Interpretation Risks

   Malicious data may attempt to inject delimiters or structural markers
   to influence parsing behavior.  Implementations MUST respect
   [RFC4180] quoting rules.  Delimiters and structural markers appearing
   within quoted fields MUST be treated as literal values.

   The default delimiters defined by this specification are
   intentionally chosen to be neutral and to avoid characters commonly
   associated with executable or control semantics.  In addition, the
   explicit declaration of any non-default delimiters in the header
   allows parsers to establish expectations up front, reducing the
   likelihood of delimiter injection or ambiguous interpretation.

   As with traditional CSV, some spreadsheet applications interpret
   certain values (e.g. those beginning with "=", "+", "-", or "@") as
   formulas.  This specification does not attempt to redefine or
   mitigate spreadsheet formula evaluation; producers and consumers
   SHOULD continue to apply established best practices when targeting
   such environments.

10.2.  Complexity and Resource Exhaustion

   Deeply nested, malformed, or highly repetitive structures may lead to
   excessive CPU or memory consumption during parsing.

   Implementations SHOULD:

   *  Enforce configurable limits on nesting depth and repetition


Caldas                    Expires 16 July 2026                 [Page 10]

Internet-Draft                    CSV++                     January 2026


   *  Enforce reasonable limits on field sizes and record length

   *  Fail fast on structurally invalid input

   *  Prefer streaming or incremental parsing for large files

   *  Validate headers and structural definitions before processing data
      rows

10.3.  Mixed-Tool Interoperability

   CSV++ files may transit through tools unaware of the extended
   semantics, potentially resulting in loss of structure or unintended
   reinterpretation.  Implementations used in security-sensitive
   pipelines SHOULD explicitly validate inputs and avoid implicit trust
   when moving between CSV-aware and CSV++-aware tools.

10.4.  Encoding Issues

   Files SHOULD use UTF-8 encoding.  Implementations SHOULD detect and
   handle encoding errors.  A BOM (Byte Order Mark) MAY be present.

10.5.  IANA Considerations

   This document has no IANA actions.

   CSV++ files use the text/csv media type as defined in [RFC4180].  The
   format is fully backward compatible with standard CSV parsers;
   implementations unaware of the extensions defined in this document
   will process CSV++ files as conventional CSV, ignoring extended
   semantics.

Change Log

   Changes from -00 to -01:

   *  Enhanced Motivation section to contrast with JSON/XML

   *  Added "When to Use CSV++" section

   *  Improved scaping example on 4.4

   *  Updated Security section to include CSV injection considerations.

References

Normative References


Caldas                    Expires 16 July 2026                 [Page 11]

Internet-Draft                    CSV++                     January 2026


   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4180]  Shafranovich, Y., "Common Format and MIME Type for Comma-
              Separated Values (CSV) Files", RFC 4180,
              DOI 10.17487/RFC4180, October 2005,
              <https://www.rfc-editor.org/info/rfc4180>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Informative References

Appendix A.  Grammar (ABNF)

   csvpp-file     = header-row data-rows

   header-row     = field *(field-sep field) CRLF
   data-rows      = *(data-row CRLF)
   data-row       = value *(field-sep value)

   field          = simple-field / array-field /
                    struct-field / array-struct-field
   simple-field   = name
   array-field    = name "[" [delimiter] "]"
   struct-field   = name [component-delim] "(" component-list ")"
   array-struct-field = name "[" [delimiter] "]"
                        [component-delim] "(" component-list ")"

   component-list = component *(component-delim component)
   component      = simple-field / array-field /
                    struct-field / array-struct-field

   name           = 1*field-char
   field-char     = ALPHA / DIGIT / "_" / "-"
   delimiter      = CHAR
   component-delim = CHAR

   value          = quoted-value / unquoted-value
   quoted-value   = DQUOTE *(textdata / escaped-quote) DQUOTE
   unquoted-value = *textdata
   escaped-quote  = DQUOTE DQUOTE
   textdata       = <any character except DQUOTE, CRLF, or field-sep>


Caldas                    Expires 16 July 2026                 [Page 12]

Internet-Draft                    CSV++                     January 2026


Appendix B.  Complete Examples

   id,cust,items[~]^(sku^name^qty^price^opts[;]:(k:v))
   1,Alice,S1^Shirt^2^20^sz:M;col:blu~S2^Pant^1^50^sz:32

                         Figure 9: E-commerce Order

Acknowledgments

   This specification was inspired by the HL7 Version 2.x delimiter
   hierarchy and the need for a simple, human-readable format for
   hierarchical data that maintains compatibility with existing CSV
   tools.

Author's Address

   Marcelo Caldas
   Independent
   Roswell, Georgia
   United States of America
   Email: mscaldas@gmail.com


Caldas                    Expires 16 July 2026                 [Page 13]