YANG Message Keys for Message Broker Integration

Swisscom

Binzring 17 Zurich 8045 CH thomas.graf@swisscom.com

Swisscom

Binzring 17 Zurich 8045 CH ahmed.elhassany@swisscom.com

Deutsche Telekom

Barcelona ES alex.huang-feng@t-systems.com

Everything OPS

Liege BE benoit@everything-ops.net

NTT

Veemweg 23 Barneveld 3771 NL paolo@ntt.net

General NMOP YANG-Push Data Mesh Network Telemetry Network Analytics

This document specifies a mechanism to define a unique Message Key for a YANG to Message Broker integration and a topic addressing scheme based on the YANG-Push subscription type and the YANG Schema Node Identifier. This enables consumption of a subset of the subscribed YANG data, either per specific YANG data node, identifier or telemetry message type, by indexing and organizing it in Message Broker topics. It helps to index the information by using the data taxonomy and organizes data in partitions and shards of Message Brokers and time series databases.

Introduction

Nowadays network operators are using machine and human readable YANG to model their configurations and monitor YANG operational data from their networks according to . Most network analytic use cases require real-time data and the delivery of near real-time analytical and actionable insights. This demands high scalability, resilience and low overhead in the data processing pipeline. Accessing the right data for the right use case with minimal overhead and in the shortest period of time is therefore crucial.

Network operators organize their data in a Data Mesh according to where a Message Broker, such as Apache Kafka or Apache Pulsar, facilitates the exchange of Messages among data processing components in topics and subjects. Typically, data is stored in Message Broker topics for several hours or days to facilitate resilience in the data processing chain and is addressed in Subjects depending on the Schema, enabling a data consumer to re-consume data that was previously lost.

Dimensional data is structured information in a data store. It uses a model of dimension tables to organize business metrics and their descriptive context. This model, developed by Ralph Kimball, simplifies data analysis and reporting by creating denormalized, easy-to-understand structures for quick querying. It is optimized for online analytical processing (OLAP) and data warehouses by using the data taxonomy to scale in partitions and shards.

YANG, as a data modelling language based on hierarchical tree-based structures, facilitates the modelling of dimensional data. This is best shown with YANG Tree Diagrams.

An Architecture for YANG-Push to Message Broker Integration specifies an architecture for integrating YANG-Push with Message Brokers for a Data Mesh architecture. describes how the notification messages at a YANG-Push Receiver are transformed to the Message Broker, while specifies a Message Schema to contextualize telemetry data.

However, neither of these documents addresses how these messages should be indexed in a Message Broker, nor do they define how topics, partitioning and sharding must be used. Due to this missing dimensional indexing for YANG data stored in a Message Broker, all YANG data is stored in one single Topic. This leads to a round robin distribution across multiple Partitions where each YANG Schema ID is defined as a subject within that topic. Therefore, the entire Topic from all Partitions needs to be consumed before data selection can be applied. This leads to avoidable data processing overhead, which in turn impairs the scalability and real-time capabilities required for certain Network Analytics use cases.

YANG telemetry data can be used for several network analytic use cases. Importantly, depending on the use case, only a subset of the subscribed YANG data might be necessary (in time or space). For example, for specific use cases it is more important to know the current network state than to have the full series of state changes over time. In other use cases, instead of consuming data for all network nodes, only a specific network node or network node component requires YANG monitoring and hence a subscription.

This document defines how YANG Messages should be indexed and organized in Message Broker topics by leveraging the network node hostname, the YANG-Push subscription identifier, and concrete XPath data node instances derived from the YANG schema path for indexing. Then, a Message Broker topic naming scheme based on the YANG-Push subscription type and the YANG Schema name is defined to better organize YANG data. The network node hostname and the subtree and xpath filters are part of the "ietf-yang-push-telemetry-message" structured YANG data defined in . The Message Key is derived through a three-phase algorithm that normalizes subscription filters against the YANG schema path and extracts concrete data node instances from each notification message.

Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Terminology

The following terms are used as defined in :

Network Telemetry
Network Analytics
Value
State
Change

The following terms are used as defined in :

Message Broker
YANG Message Broker Producer
YANG Message Broker Consumer
YANG Schema Registry
YANG Schema ID
YANG Data Consumer

The following terms are used as defined in Apache Kafka and Apache Pulsar Message Broker:

Subject: Corresponds to a unique Schema Path within a Schema Registry and is used to identify Messages within a Topic.
Topic: A communication channel for publishing and subscribing messages with one or more subjects and partitions.
Topic Compaction: The act of compressing messages in a topic to the latest state. As used with Apache Pulsar. Apache Kafka uses the term Log Compaction with identical meaning.
Partition: Messages in a topic are spread over hash buckets where a hash bucket refers to a partition being stored within one message broker node. Message ordering is guaranteed within a partition.
Shard: The same as Partition but distributed among multiple message broker nodes. In this document, the term Partition is used primarily, but the described indexing concept applies equally to Shards.
Message: A piece of structured data sent between data processing components to facilitate communication in a distributed system.
Message Key: Metadata associated with a message to facilitate deterministic hash bucketing and indexing for instantiated YANG data.

The following terms are used as defined in The Log-Structured Merge-Tree scientific paper:

LSM Tree: Log-Structured Merge-Tree is a data structure with performance characteristics that makes it attractive for providing indexed access to files with high insert volume. LSM trees, like other search trees, maintain key-value pairs.

The following terms are used as defined in Confluent Schema Registry Documentation:

Schema: A formalized, documented structure that defines the shape and content of the messages exchanged.
Schema ID: A unique identifier of a schema associated with a Message Broker subject.
Schema Registry: A system where schemas are registered, compared and retrieved.

The following terms are used as defined in :

Periodic Subscription
On-change Subscription
Sync-On-Start
Xpath Filter
Subtree Filter

The following terms are used as defined in :

Notification
Hostname

The following terms are used as defined in :

Datastore

The following terms are used as defined in :

Schema Node Identifier
Data Node: A node in the schema tree that can be instantiated in a data tree: one of "container", "leaf", "leaf-list", "list", "anydata", and "anyxml".

The following terms are used as defined in this document:

Schema Path: A specific route to a single leaf in a schema tree, as opposed to a schema tree, which represents the entire hierarchical structure of a schema and shows how all these individual paths branch out and relate to each other as a whole.

Solution Design

To identify which network node produced which YANG data instance into which Message Broker Topic, Partition and Subject, YANG Message Keys and Indexes are introduced. These keys enable a deterministic distribution of YANG messages across Topics and Partitions, so that applications consume only the needed data from specific topics and partitions.

To facilitate Message Broker Topic Compaction, a topic naming scheme based on the YANG-Push subscription type is defined. This segregates statistical (Value), State and State change YANG metrics and lets a YANG Message Broker Consumer use the Topic wildcard consumption method to select by YANG-Push subscription type.

YANG Message Keys and Indexes

For topics that carry YANG telemetry messages as defined in , a Message Key MUST be used. If no Message Key is defined, the Messages are distributed in a round robin fashion across partitions. If a Message Key is defined, its value is used as input for the Message Broker Producer hash function to distribute Messages across Partitions. Message Keys therefore make Message distribution deterministic. The Message Key is used not only for Message indexing at the Message Producer but also at the Message Broker for topic compaction.

For YANG, the network node hostname, the YANG-Push subscription identifier, and concrete XPath data node instances are used to generate the Message Key. The Message Key MUST be derived through the three-phase algorithm described in to guarantee deterministic and unambiguous keys.

The following sections describe the Message Key format, the derivation algorithm, and how Message Keys are used in both Message producers and Message consumers.

Message Key Format

The Message Key is a UTF-8 encoded byte string consisting of exactly three fields separated by a single newline character (LF, U+000A):

Line 1: node-name. The managed device identifier, typically a hostname or FQDN, as defined in "ietf-yang-push-telemetry-message" .
Line 2: subscription-id. The YANG-Push subscription identifier as a decimal string.
Line 3: One or more concrete XPath expressions, sorted lexicographically and deduplicated, joined by " | " (space, pipe, space). Each XPath uniquely identifies one data node instance within the notification.

The key MUST NOT contain a trailing newline after the XPath line. This ensures byte-identical keys for identical inputs, which is the invariant required for Message Broker topic compaction. shows a Message Key for a single interface instance on node "router-nyc-01" with subscription identifier "1042".

shows a Message Key where a single notification carries two interface instances. The XPaths are sorted lexicographically and joined with " | ".

shows a Message Key for a container target (no list ancestor). The XPath is the path to the container itself with no key predicates.

This line-delimited format guarantees deterministic serialization without the ambiguities of structured encodings such as JSON (where key ordering and whitespace may vary across implementations). The key is a plain byte string suitable for direct use as a Message Broker record key.
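As an illustration of the format above, the following minimal sketch composes a Message Key from its three fields. The function name "build_message_key" and its error handling are assumptions made for this example and are not part of the specification.

```python
def build_message_key(node_name: str, subscription_id: int, xpaths: list[str]) -> bytes:
    """Compose a line-delimited Message Key (illustrative helper, not normative)."""
    if not xpaths:
        raise ValueError("at least one concrete XPath is required")
    for field in (node_name, *xpaths):
        # A newline inside a field would break the three-line layout.
        if "\n" in field:
            raise ValueError("fields must not contain a newline")
    # Line 3: deduplicate, sort lexicographically, join with " | ".
    xpath_line = " | ".join(sorted(set(xpaths)))
    # No trailing newline after the XPath line.
    return f"{node_name}\n{subscription_id}\n{xpath_line}".encode("utf-8")


key = build_message_key(
    "router-nyc-01",
    1042,
    [
        "/ietf-interfaces:interfaces/interface[name='eth1']",
        "/ietf-interfaces:interfaces/interface[name='eth0']",
    ],
)
print(key.decode("utf-8"))
```

Because the XPaths are deduplicated and sorted before joining, the same set of instances yields byte-identical keys regardless of the order in which they appear in the notification.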

YANG Message Broker Producer

The Message Key is derived through a three-phase algorithm. Phase 1 is only required when the YANG-Push subscription uses a subtree filter (); XPath subscriptions skip directly to Phase 2. Phase 2 runs once at subscription creation time. Phase 3 runs for every notification.

+-------------------+       +--------------+
| Subtree Filter    |------>| Phase 1:     |---> Normalized XPath(s)
| (if applicable)   |       | Normalize    |
+-------------------+       +--------------+
                                   |
                                   v
+-------------------+       +--------------+     +---------------+
| Subscription Path |------>|              |     | Key Templates |
| (XPath, possibly  |       | Phase 2:     |---->| + Extraction  |
| from Phase 1)     |       | Schema       |     | Specs         |
+-------------------+       | Resolution   |     +---------------+
| YANG Schema Path  |------>|              |
+-------------------+       +--------------+
                                   |
                                   v
+-------------------+       +--------------+     +--------------+
| Parsed Data Path  |------>|              |     | Message Key  |
+-------------------+       | Phase 3:     |---->| (line-       |
| Node Name         |------>| Data Walk    |     | delimited)   |
| Subscription ID   |------>|              |     |              |
+-------------------+       +--------------+     +--------------+

Phase 1: Subtree Filter Normalization

defines how YANG data nodes can be subscribed with subtree and xpath selection filters. When a subtree filter is used, the XML representation MUST first be normalized into one or more equivalent XPath expressions before proceeding to Phase 2. The normalization follows the element classification defined in . Each child element in the subtree filter is classified based on its content:

Content match node: The element has text content and no child elements. It becomes an XPath predicate of the form [module:key='value'] on the parent path step.
Selection node: The element has no text content and no child elements. It becomes a terminal XPath branch.
Containment node: The element has child elements (with no text content or only whitespace). It becomes an intermediate path step and the algorithm recurses into its children.

An element whose text content consists only of whitespace (spaces, tabs, newlines) MUST NOT be treated as a content match. It is classified as a selection or containment node depending on whether it has children. Each path segment in the output carries the YANG module-name prefix. Duplicate XPath branches MUST be deduplicated. Multiple branches are joined with " | ". For example, the subtree filter shown in normalizes to the XPath expression shown in .

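<interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
  <interface>
    <name>eth0</name>
    <oper-status/>
  </interface>
</interfaces>

/ietf-interfaces:interfaces/ietf-interfaces:interface[ietf-interfaces:name='eth0']/ietf-interfaces:oper-status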

In this example the "name" element is classified as a content match (it has text content "eth0"), producing the predicate [ietf-interfaces:name='eth0'] on the "interface" path step. The "oper-status" element is classified as a selection node, becoming the terminal branch.
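The classification rules above can be sketched with the Python standard library as follows. This is a minimal illustration under stated assumptions: the namespace-to-module table "NS_TO_MODULE" is a hypothetical stand-in for a lookup in the YANG schema context, every element is assumed to carry an XML namespace, and the filter XML is written for this example from the description above.

```python
import xml.etree.ElementTree as ET

# Assumption: a real implementation obtains this mapping from the schema context.
NS_TO_MODULE = {
    "urn:ietf:params:xml:ns:yang:ietf-interfaces": "ietf-interfaces",
    "urn:ietf:params:xml:ns:yang:ietf-hardware": "ietf-hardware",
}


def _qname(elem):
    # ElementTree encodes tags as "{namespace}local-name".
    namespace, _, local = elem.tag[1:].partition("}")
    return f"{NS_TO_MODULE[namespace]}:{local}"


def _is_content_match(elem):
    # Text content (not only whitespace) and no child elements.
    return len(elem) == 0 and bool((elem.text or "").strip())


def _branches(elem, parent_path):
    children = list(elem)
    # Content match children become predicates on this path step.
    predicates = "".join(
        f"[{_qname(c)}='{c.text.strip()}']" for c in children if _is_content_match(c)
    )
    path = f"{parent_path}/{_qname(elem)}{predicates}"
    others = [c for c in children if not _is_content_match(c)]
    if not others:
        return [path]  # selection node, or list entry with key predicates only
    found = []
    for child in others:  # containment node: recurse into the children
        for branch in _branches(child, path):
            if branch not in found:
                found.append(branch)
    return found


def subtree_to_xpath(filter_xml):
    # Wrap the filter so that several top-level elements can be parsed at once.
    root = ET.fromstring(f"<filter>{filter_xml}</filter>")
    found = []
    for top in root:
        for branch in _branches(top, ""):
            if branch not in found:
                found.append(branch)
    return " | ".join(found)


FILTER = """
<interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
  <interface>
    <name>eth0</name>
    <oper-status/>
  </interface>
</interfaces>
"""
print(subtree_to_xpath(FILTER))
```

Branches are emitted in document order here; ordering only becomes significant in Phase 3, where the concrete XPaths are sorted before the key is composed.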

Phase 2: Key Template Derivation

Given a subscription XPath (from Phase 1 or directly from an XPath subscription) and a compiled YANG schema context, Phase 2 resolves each union branch to its target schema node, walks the ancestor chain from root to target, and builds a key template.

The subscription XPath is first split on top-level "|" into individual branches (respecting brackets, quotes, and parentheses). Each branch is processed independently. For each branch, the algorithm resolves the XPath to its target schema node using the YANG Schema Path. It then walks the ancestor chain from the schema root to the target node, building the key template path. For each node in the path:

The path segment uses minimal-prefix style: the YANG module name appears only on the first segment or when the module changes from the previous segment.
If the node is a YANG list (), key predicates are appended for each schema-defined key leaf in schema order. If the subscription XPath contains a matching predicate with a literal value, that value is embedded in the template (pinned key). Otherwise, a placeholder marker is used indicating that the value must be extracted from the notification data at runtime (open key).
If the target node is a YANG leaf-list, a value predicate [.='...'] is appended, either as a pinned literal or as an open placeholder.
YANG choice and case nodes are skipped as they are not data nodes.
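The first step of Phase 2, splitting the subscription XPath on top-level "|", can be sketched as below. The function name "split_union" is hypothetical; the sketch tracks bracket and parenthesis depth and quoted strings so that a "|" inside a predicate or a literal is not treated as a branch separator.

```python
def split_union(xpath: str) -> list[str]:
    """Split an XPath on top-level '|' only (illustrative helper)."""
    branches, current = [], []
    depth = 0      # nesting depth of [...] and (...)
    quote = None   # active quote character while inside a literal
    for ch in xpath:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        elif ch == "|" and depth == 0:
            # Top-level union operator: close the current branch.
            branches.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    branches.append("".join(current).strip())
    return [b for b in branches if b]


print(split_union("/a:b/c[name='x|y'] | /d:e/f"))
```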

The following predicate normalizations are applied during template derivation:

No predicate in subscription: open placeholder + extraction spec.
[key='value']: literal value embedded as-is (pinned).
[module:key='value']: module prefix stripped, literal value embedded.
[key="value"]: double-quote normalized to single-quote.
[N] (positional predicate): treated as open (positional subscripts do not identify a specific instance by key).

shows the key template derived from the XPath subscription "/ietf-interfaces:interfaces/interface". Because the "interface" list has a key leaf "name" and the subscription does not pin it, the template contains an open placeholder for the "name" key.

shows the key template when the subscription pins the outer list key but leaves the inner list open: "/ietf-interfaces:interfaces/interface[name='eth0']/ietf-ip:ipv4/address".

Phase 2 produces one key template per union branch, together with extraction specifications that describe which key leaf values must be extracted from the data tree at runtime to fill each open placeholder. Each extraction is expressed as an absolute XPath that identifies the key leaf in the data tree. The path mirrors the template path from the root to the owning list (preserving any pinned predicates on ancestor lists) and appends the key leaf name without a predicate:

/MODULE:CONTAINER/.../LIST/KEY-LEAF-NAME

When evaluated against a notification data tree, this XPath selects the key leaf value(s) of the matching list instance(s). For leaf-list targets, the extraction is simply "." (the data node's own value).
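The template derivation can be illustrated with the sketch below. It rests on several assumptions: the table "LIST_KEYS" is a hypothetical stand-in for the compiled YANG schema context, the input branch is already written in minimal-prefix style, predicate values contain no "]" character, "%s" is used as the open placeholder marker, and leaf-list targets are not handled.

```python
import re

# Assumption: schema path of each YANG list -> its key leaves in schema order.
LIST_KEYS = {
    "/ietf-interfaces:interfaces/interface": ["name"],
    "/ietf-interfaces:interfaces/interface/ietf-ip:ipv4/address": ["ip"],
}

STEP = re.compile(r"/([^/\[]+)((?:\[[^\]]*\])*)")
# [key='v'], [module:key='v'] and [key="v"]; positional [N] does not match.
PRED = re.compile(r"""\[(?:[\w.-]+:)?([\w.-]+)\s*=\s*(['"])(.*?)\2\]""")


def derive_template(branch):
    """Return (key template, extraction XPaths) for one union branch."""
    schema_path = template = pinned_path = ""
    extractions = []
    for name, preds in STEP.findall(branch):
        schema_path += "/" + name
        template += "/" + name
        list_path = pinned_path + "/" + name
        pinned = {key: value for key, _, value in PRED.findall(preds)}
        pinned_preds = ""
        for key in LIST_KEYS.get(schema_path, []):
            if key in pinned:
                # Pinned key: embed the literal, normalized to single quotes.
                pinned_preds += f"[{key}='{pinned[key]}']"
                template += f"[{key}='{pinned[key]}']"
            else:
                # Open key: placeholder plus an extraction path to the key leaf.
                template += f"[{key}='%s']"
                extractions.append(f"{list_path}/{key}")
        # Ancestor paths used by later extractions keep pinned predicates only.
        pinned_path = list_path + pinned_preds
    return template, extractions


print(derive_template("/ietf-interfaces:interfaces/interface"))
print(derive_template(
    '/ietf-interfaces:interfaces/interface[ietf-interfaces:name="eth0"]'
    "/ietf-ip:ipv4/address"
))
```

A production implementation would resolve each step against the schema rather than a lookup table, which is also where choice and case nodes would be skipped.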

Phase 3: Runtime Key Production

For each notification, the parsed data tree is walked and matched against the branch templates from Phase 2. For each matching data node instance:

The open placeholders in the key template are filled with the actual key leaf values from the data tree by evaluating the extraction XPath from Phase 2. Each extraction XPath selects the key leaf value(s) of the matching list instance(s) in the notification data.
For leaf-list targets, the data node's own value fills the [.='%s'] placeholder.

As an optimization, implementations need not invoke a full XPath evaluator for each extraction. Because the extraction path always leads to a key leaf of an ancestor list, it can be rewritten to an equivalent "ancestor-or-self" axis expression evaluated relative to the matched data node. For example, the extraction "/ietf-interfaces:interfaces/interface/name" becomes "ancestor-or-self::ietf-interfaces:interface/name". This reduces evaluation to a simple upward tree walk from the matched data node to the first ancestor whose schema node matches the list name, followed by reading a direct child. This yields O(d) complexity where d is the depth of the data tree, which is typically small (3 to 8 levels for common YANG models).

All filled XPath expressions (concrete XPaths) are collected, deduplicated, and sorted lexicographically. The final Message Key is then composed as defined in : the node-name and subscription-id on separate lines, followed by the concrete XPaths joined by " | " on a third line.

shows the complete derivation for a notification carrying two interface instances.

Notification data (XML): list entries for interfaces "eth0" and "eth1", including a leaf with value "down".

Concrete XPaths (sorted):
  /ietf-interfaces:interfaces/interface[name='eth0']
  /ietf-interfaces:interfaces/interface[name='eth1']

Message Key:
  router-nyc-01
  1042
  /ietf-interfaces:interfaces/interface[name='eth0'] | /ietf-interfaces:interfaces/interface[name='eth1']

When the subscription targets a YANG container (with no list ancestor), there are no open placeholders and no instances to match. In that case, the key template itself is used as the single concrete XPath in the Message Key.
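The runtime step can be sketched as a recursive walk over the parsed notification data. The sketch assumes the data has been parsed into nested Python dictionaries and lists (in the spirit of the JSON encoding of YANG data), that "%s" marks an open placeholder in the key template, and that the function name "concrete_xpaths" and the sample data are illustrative only.

```python
import re

STEP = re.compile(r"/([^/\[]+)((?:\[[^\]]*\])*)")
PRED = re.compile(r"\[([\w.-]+)='([^']*)'\]")


def concrete_xpaths(template, data):
    """Fill a key template from parsed notification data (illustrative)."""
    steps = STEP.findall(template)
    results = set()  # a set deduplicates the concrete XPaths

    def walk(node, index, path):
        if index == len(steps):
            results.add(path)
            return
        name, preds = steps[index]
        child = node.get(name) if isinstance(node, dict) else None
        if child is None:
            return
        keys = PRED.findall(preds)
        if not keys:
            walk(child, index + 1, f"{path}/{name}")  # container or leaf step
            return
        for entry in child:  # list step: one candidate per list entry
            # Pinned keys must match; open keys ("%s") accept any entry.
            if all(v == "%s" or str(entry.get(k)) == v for k, v in keys):
                filled = "".join(f"[{k}='{entry[k]}']" for k, _ in keys)
                walk(entry, index + 1, f"{path}/{name}{filled}")

    walk(data, 0, "")
    return sorted(results)


DATA = {
    "ietf-interfaces:interfaces": {
        "interface": [
            {"name": "eth1", "oper-status": "down"},
            {"name": "eth0", "oper-status": "up"},
        ]
    }
}
xpaths = concrete_xpaths("/ietf-interfaces:interfaces/interface[name='%s']", DATA)
key = "\n".join(["router-nyc-01", "1042", " | ".join(xpaths)]).encode("utf-8")
print(key.decode("utf-8"))
```

This walk descends from the root for clarity; the upward "ancestor-or-self" evaluation described above is an equivalent alternative when the matched data node is already at hand.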

Subtree Filter End-to-End Example

illustrates the complete three-phase derivation starting from a subtree filter subscription that spans two YANG modules. The filter selects the "oper-status" leaf of interface "eth0" from "ietf-interfaces" (concrete branch, key pinned in the filter) and the "serial-num" leaf of every hardware component from "ietf-hardware" (non-concrete branch, no key value in the filter).

Phase 1 produces a union of two branches. Phase 2 derives one fully-concrete template (key pinned) and one template with an open placeholder for the component "name" key. Phase 3 evaluates the open placeholder against the notification data, expanding the single template into one concrete XPath per matching component instance.

Subtree filter (XML):

<interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
  <interface>
    <name>eth0</name>
    <oper-status/>
  </interface>
</interfaces>
<hardware xmlns="urn:ietf:params:xml:ns:yang:ietf-hardware">
  <component>
    <serial-num/>
  </component>
</hardware>

Phase 1 (Normalize):
  /ietf-interfaces:interfaces/ietf-interfaces:interface[ietf-interfaces:name='eth0']/ietf-interfaces:oper-status
  | /ietf-hardware:hardware/ietf-hardware:component/ietf-hardware:serial-num

Phase 2 (Two branch templates):
  Branch 0 (leaf target, key pinned):
    /ietf-interfaces:interfaces/interface[name='eth0']/oper-status
    (fully concrete)
  Branch 1 (leaf target, key open):
    /ietf-hardware:hardware/component[name='%s']/serial-num
    Extraction: /ietf-hardware:hardware/component/name
    (open placeholder: name)

Notification data (XML): interface "eth0"; hardware component "chassis" with serial-num "SN-12345"; hardware component "fan-1" with serial-num "SN-67890".

Phase 3 (Key production):
  Concrete XPaths (sorted):
    /ietf-hardware:hardware/component[name='chassis']/serial-num
    /ietf-hardware:hardware/component[name='fan-1']/serial-num
    /ietf-interfaces:interfaces/interface[name='eth0']/oper-status

  Message Key:
    router-nyc-01
    1042
    /ietf-hardware:hardware/component[name='chassis']/serial-num | /ietf-hardware:hardware/component[name='fan-1']/serial-num | /ietf-interfaces:interfaces/interface[name='eth0']/oper-status

When the Message is produced to the Message Broker, the network node hostname is taken from the structured YANG data defined in "ietf-yang-push-telemetry-message" . The concrete XPath expressions are derived from the subscription filter and the YANG Schema Path as described above, and then instantiated with values from each notification data path.

represent the Message Key and and the Message Broker headers with the Schema ID and content type of the Message and the Message itself.

YANG Message Broker Consumer

The consumer hashes the Message Key, applies modulo with the number of partitions, and determines the partition from which it should consume messages bearing that Message Key.

To parse the Message Key, the consumer splits the byte string on newline (LF) characters. The first line is the node-name, the second is the subscription-id, and the third line contains the concrete XPath expression(s) joined by " | ".

At a YANG data store, such as a Time Series database or stream processor, the YANG data could then be ingested into tables according to topic names and indexed per Message Key. If Topic Compaction is enabled, only current state is consumed.
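The consumer-side handling can be sketched as follows. Both function names are hypothetical. The hash is an assumption as well: "zlib.crc32" merely stands in for whatever hash function the producer's partitioner applies (Apache Kafka's default partitioner, for instance, uses a murmur2 hash of the key bytes), and the consumer must use the same function as the producer for the result to be meaningful.

```python
import zlib


def parse_message_key(key: bytes):
    """Split a Message Key into node-name, subscription-id and XPath list."""
    # Exactly three lines are expected; anything else raises ValueError.
    node_name, subscription_id, xpath_line = key.decode("utf-8").split("\n")
    return node_name, int(subscription_id), xpath_line.split(" | ")


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: hash(key) mod number of partitions."""
    # Stand-in hash; replace with the partitioner hash of the deployed broker client.
    return zlib.crc32(key) % num_partitions


key = (
    b"router-nyc-01\n1042\n"
    b"/ietf-interfaces:interfaces/interface[name='eth0'] | "
    b"/ietf-interfaces:interfaces/interface[name='eth1']"
)
print(parse_message_key(key))
print(partition_for(key, 12))
```

Splitting the third line on " | " is only safe when no key value itself contains that exact sequence, which is an assumption of this sketch.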

Time Series Database

Depending on whether the YANG Data Consumer knows the Message Key from the YANG Message Broker Consumer or the YANG Schema from the YANG Schema Registry, the network telemetry messages can be indexed in a time series database. The Message Key could serve as the primary key, while the individual fields (node-name, subscription-id, concrete XPaths) can be reflected in the indexing scheme using primary and secondary keys. Implementation examples can be found under .

YANG-Push Message Broker Topic Naming

Each YANG-Push subscription requires a deterministic, human-readable Message Broker topic name. The topic name MUST satisfy the following requirements:

Deterministic: The same subscription, regardless of syntactic form (XPath vs. subtree filter, redundant module prefixes, single-quoted vs. double-quoted predicates), MUST always produce the same topic name.
Unique: Two subscriptions targeting different YANG schema nodes MUST NOT share a topic name.
Human-readable: An operator inspecting a topic listing SHOULD be able to identify which YANG data the topic carries.
Stable under schema evolution: Augmenting the YANG schema with new nodes MUST NOT change existing topic names. Any optimization that depends on the current set of schema siblings (e.g. dropping zero-entropy wrapper containers or using shortest-unique-prefix abbreviation) is therefore unsafe and MUST NOT be used.
Within Message Broker limits: Topic names MUST contain only characters permitted by the Message Broker (for Apache Kafka: [a-zA-Z0-9._-], maximum 249 characters).

The topic name is derived from the Phase 2 key template (see ) through a purely mechanical transformation of the YANG schema DATA path. No additional schema resolution is needed beyond Phase 2.

Topic Name Derivation Algorithm

The input is the Phase 2 key template for one branch of the subscription. For union subscriptions with multiple branches, each branch produces its own topic name independently. The derivation proceeds in five steps:

Strip Predicates: Remove all predicate expressions ([...]) from the key template. This yields the schema DATA path, the structural identity of the subscription target, independent of any specific instance. For example, "/ietf-interfaces:interfaces/interface[name='%s']" becomes "/ietf-interfaces:interfaces/interface".
Replace Module Names with YANG Prefixes: Walk the path segments. Wherever a segment carries a "module-name:local-name" prefix, look up the module in the YANG schema context and substitute its prefix statement. YANG module prefixes are short by convention (2-6 characters), unique within any loaded schema context, and immutable once a module is published (). For example, "/ietf-interfaces:interfaces/interface" becomes "/if:interfaces/interface".
Flatten to Topic Name: Apply three mechanical substitutions: (a) remove the leading "/", (b) replace every ":" with "-", (c) replace every "/" with "-". For example, "if:interfaces/interface" becomes "if-interfaces-interface". All resulting characters ([a-z0-9-]) are valid in Message Broker topic names.
Prepend Organization Prefix (Optional): When an organization, team, or project prefix is configured, it is prepended with a "-" separator. For example, "if-interfaces-interface" becomes "netops-if-interfaces-interface". The prefix MUST contain only Message Broker safe characters. When no prefix is configured, this step is a no-op.
Handle Overflow: If the resulting name exceeds the maximum topic name length (configurable, default 255), it is truncated at the last "-" boundary that keeps the name within the budget, and an 8-character hexadecimal hash suffix (FNV-1a 64-bit of the full Schema Path) is appended for uniqueness. In practice, overflow rarely triggers: the longest realistic YANG paths produce topic names of 50-80 characters.
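The five steps above can be sketched as one function. The following points are assumptions of this illustration: the module-to-prefix table "MODULE_PREFIX" stands in for a lookup in the YANG schema context, the hash input is the predicate-free path with module names, and the 8-character suffix is taken from the low 32 bits of the 64-bit FNV-1a value, since the text does not state how the 64-bit hash is shortened.

```python
import re

# Assumption: prefix statements of the loaded modules.
MODULE_PREFIX = {
    "ietf-interfaces": "if",
    "ietf-ip": "ip",
    "ietf-hardware": "hw",
    "ietf-system": "sys",
}


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash."""
    value = 0xCBF29CE484222325          # FNV offset basis
    for byte in data:
        value ^= byte
        value = (value * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF  # FNV prime
    return value


def topic_name(template: str, org_prefix: str = "", max_len: int = 255) -> str:
    # Step 1: strip predicates.
    schema_path = re.sub(r"\[[^\]]*\]", "", template)
    # Step 2: replace module names with YANG prefixes.
    prefixed = re.sub(
        r"([\w.-]+):", lambda m: MODULE_PREFIX[m.group(1)] + ":", schema_path
    )
    # Step 3: flatten to a topic name.
    name = prefixed.lstrip("/").replace(":", "-").replace("/", "-")
    # Step 4: optional organization prefix.
    if org_prefix:
        name = f"{org_prefix}-{name}"
    # Step 5: overflow handling ("-" plus 8 hex characters need 9 characters).
    if len(name) > max_len:
        budget = max_len - 9
        cut = name.rfind("-", 0, budget + 1)
        head = name[:cut] if cut > 0 else name[:budget]
        suffix = fnv1a_64(schema_path.encode("utf-8")) & 0xFFFFFFFF
        name = f"{head}-{suffix:08x}"
    return name


print(topic_name("/ietf-interfaces:interfaces/interface[name='%s']"))
print(topic_name("/ietf-interfaces:interfaces/interface[name='%s']", "netops"))
```

The maximum length is a parameter so that a deployment can set it to the limit of the Message Broker in use.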

shows the derivation for three subscription paths.

shows the same paths with an organization prefix "netops".

Properties

The topic naming algorithm has the following properties:

The mapping is injective (one-to-one): given a topic name, the original Schema Path can be reconstructed by reversing the substitutions. Different Schema Paths always produce different topic names.
The topic name depends only on the subscription's own path segments and their YANG module prefixes. It does not depend on what other nodes exist at the same schema level. Augmenting the schema adds new topic names but never changes existing ones.
The "-" separator creates a natural hierarchy for pattern matching. For example, "^if-interfaces-interface-.*" matches all interface leaf topics, and "^if-.*" matches all topics from the ietf-interfaces module.

YANG Message Broker Producer

The YANG Message Broker Producer derives the topic name from the YANG-Push subscription's xpath or subtree filter by running Phases 1 and 2 (as described in ) and then applying the topic name derivation algorithm above.

The subscription type ("periodic", "on-change", or "on-change" with "sync-on-start") MAY be encoded in a separate topic hierarchy level, depending on the deployment's naming policy. In that case, "periodic" is encoded as "stats", "on-change" as "state-change", "on-change" with "sync-on-start" as "state", and "on-change" with "sync-on-start" where topic compaction is enabled as "current-state".
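The mapping from subscription type to topic hierarchy label can be sketched as a small function. The function name and its parameters are hypothetical and only serve to illustrate the four cases named above.

```python
def subscription_type_label(
    kind: str, sync_on_start: bool = False, compacted: bool = False
) -> str:
    """Return the topic hierarchy label for a YANG-Push subscription type."""
    if kind == "periodic":
        return "stats"
    if kind == "on-change":
        if not sync_on_start:
            return "state-change"
        # With sync-on-start, compaction decides between the two state labels.
        return "current-state" if compacted else "state"
    raise ValueError(f"unknown subscription type: {kind}")


print(subscription_type_label("periodic"))
print(subscription_type_label("on-change", sync_on_start=True, compacted=True))
```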

YANG Message Broker Consumer

The consumer can subscribe to multiple topics using wildcard or regex patterns. For example:

All interface data: "^if-interfaces-interface.*"
All data from a specific module: "^if-.*" (ietf-interfaces) or "^sys-.*" (ietf-system)
All data from an organization: "^netops-.*"
All current-state data from an organization: "^netops-current-state-.*"

The YANG data is then ingested into tables according to topic names and indexed per Message Key. If Topic Compaction is enabled, only the current state is consumed.
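How such patterns select topics can be illustrated with the Python standard library alone. The topic names in "TOPICS" are made up for this example, and the helper "select_topics" is hypothetical; an actual consumer would pass the pattern to the subscription call of its Message Broker client.

```python
import re

# Hypothetical topic listing of a Message Broker.
TOPICS = [
    "netops-stats-if-interfaces-interface-statistics",
    "netops-current-state-if-interfaces-interface-oper-status",
    "if-interfaces-interface",
    "if-interfaces-interface-oper-status",
    "sys-system-clock",
]


def select_topics(pattern: str, topics: list[str]) -> list[str]:
    """Return the topics whose name matches the pattern from the start."""
    regex = re.compile(pattern)
    return [topic for topic in topics if regex.match(topic)]


print(select_topics(r"^if-interfaces-interface.*", TOPICS))
print(select_topics(r"^netops-current-state-.*", TOPICS))
```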

Message Broker Implementations

Topic, Partitioning and Message Keys are generic concepts of Message Brokers. There are two known Message Broker implementations supporting all features described in this document.

Apache Kafka

Apache Kafka supports Message Keys, Partitioning and Log Compaction. With the following example from the Apache Kafka admin client API a new compacted Topic can be created.

    future = result.values().get(topicName);
    // Call get() to block until the topic creation is complete or has
    // failed. If creation failed, the ExecutionException wraps the
    // underlying cause.
    future.get();
  }

The most important configuration items from are "topicName", which defines the Topic name, "partitions", the number of partitions, and "replicationFactor", how many times each partition is replicated. With "compact" in "cleanup.policy", log compaction is turned on per topic. "min.cleanable.dirty.ratio" and "delete.retention.ms" configure how often and when Log Compaction occurs per topic, while "retention.bytes" and "retention.ms" limit the topic-specific retention.

Topic names are constrained to a length of 249 characters and to the following characters: "a-z", "A-Z", "0-9", ".", "_" and "-". Topics can be created on the fly by producing into a new Topic when "auto.create.topics.enable" has been configured beforehand. Topics should be deleted at the end of their lifecycle through the "kafka-topics.sh" command.

The Partition count for a given Topic can be increased but not decreased. When the Partition count changes, consumer groups re-join automatically and partitions are rebalanced across Message Broker nodes.

Apache Pulsar

Apache Pulsar supports Message Keys, Partitioning and Topic Compaction. "brokerServiceCompactionThreshold" configures when Topic Compaction occurs.

Topic names allow all characters except "/". Topics can be created on the fly by producing into a new Topic when "allowAutoTopicCreation" has been configured beforehand. Topics should be deleted at the end of their lifecycle through the pulsar-admin or pulsarctl tools.

The Partition count for a given Topic can be increased but not decreased. When the Partition count changes, consumer groups re-join automatically and partitions are rebalanced across Message Broker nodes.

Time Series Database Implementations

Tables, partitions and keys are generic concepts of time series databases. With ClickHouse, this document provides examples of how YANG message keys can be obtained from the Message Broker and used for indexing.

ClickHouse

Data Model

Unlike other realtime analytics databases, ClickHouse does not (necessarily) rely on partitioning data by timestamp. ClickHouse represents data in the MergeTree format, which is similar to an LSM tree: a table consists of data parts sorted by primary key.

When data is inserted into a table, separate data parts are created and each data part is lexicographically sorted by primary key. For example, if the primary key is ("MessageKey", "Date"), the data in the part is sorted by "MessageKey", and within each "MessageKey" it is ordered by "Date". Data belonging to different partitions is separated into different parts. In the background, ClickHouse merges data parts for more efficient storage. Parts belonging to different partitions are not merged. The merge mechanism does not guarantee that all rows with the same primary key will be in the same data part.

Each data part is logically divided into granules. A granule is the smallest indivisible data set that ClickHouse reads when selecting data. ClickHouse does not split rows or values, so each granule always contains an integer number of rows. The first row of a granule is marked with the value of the primary key for the row. For each data part, ClickHouse creates an index file that stores the marks. For each column, whether it is in the primary key or not, ClickHouse also stores the same marks. These marks make it possible to find data directly in column files and thus to quickly run queries on one or many ranges of the primary key.

Message Broker Integration ClickHouse integrates with Message Brokers through Integration Table Engines. Reading (selecting) data through the Kafka Table Engine follows the Apache Kafka semantics of advancing the offset, so each subsequent read starts at the offset where the previous read left off. It is therefore the responsibility of the data model designer to transfer data to a regular table:

Use the engine to create a Kafka consumer and consider it a data stream.

Example:

Create a table with the desired structure.

Example:

Create a materialized view that converts data from the engine and puts it into a previously created table.

The Message Key and partition ID are available as virtual (read only) columns _key and _partition.
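The three steps above can be sketched as a small Python helper that assembles the corresponding ClickHouse statements. This is an illustrative sketch only: the broker address, topic, consumer group, table names and the flattened columns (event_time, node_id, value) are hypothetical and do not reflect an actual YANG-Push Message Schema. The Kafka engine settings and the _key and _partition virtual columns are the ones ClickHouse documents.

```python
def build_statements(broker="localhost:9092", topic="yang-push",
                     group="clickhouse-yang-push"):
    """Return the three ClickHouse statements for the steps described above.
    All object names and columns are hypothetical placeholders."""
    # Step 1: Kafka engine table, acting as a consumer and data stream.
    kafka_table = f"""
CREATE TABLE yang_push_queue (
    event_time DateTime,
    node_id String,
    value Float64
) ENGINE = Kafka
SETTINGS kafka_broker_list = '{broker}',
         kafka_topic_list = '{topic}',
         kafka_group_name = '{group}',
         kafka_format = 'JSONEachRow'
""".strip()

    # Step 2: regular MergeTree table, sorted by Message Key and time.
    target_table = """
CREATE TABLE yang_push_data (
    message_key String,
    kafka_partition UInt64,
    event_time DateTime,
    node_id String,
    value Float64
) ENGINE = MergeTree
ORDER BY (message_key, event_time)
""".strip()

    # Step 3: materialized view copying rows, including the virtual columns.
    materialized_view = """
CREATE MATERIALIZED VIEW yang_push_mv TO yang_push_data AS
SELECT _key AS message_key,
       _partition AS kafka_partition,
       event_time,
       node_id,
       value
FROM yang_push_queue
""".strip()

    return [kafka_table, target_table, materialized_view]


statements = build_statements()
print(";\n\n".join(statements) + ";")
```

Placing the Message Key first in the ORDER BY clause of the target table is a design choice of this sketch, so that rows of one key are stored next to each other within each data part.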

Message Formats ClickHouse supports numerous Message formats natively. The example above uses the JSON Lines format but other (binary) formats, such as Apache Avro or Protobuf, are supported as well.

Schema Registry ClickHouse has built-in Schema Registry support. For Apache Avro, the Schema Registry and authentication are encoded in additional parameters to the Apache Kafka consumer. For formats such as Confluent JSON_SR, use the "kafka_schema_registry_skip_bytes" parameter to skip reading the Schema Registry preamble. The Schema can then be encoded explicitly.
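For illustration, the preamble that "kafka_schema_registry_skip_bytes" skips can be sketched in Python. In the Confluent wire format, each message starts with a magic byte of 0 followed by a 4-byte big-endian Schema ID, which gives 5 bytes in total for JSON_SR. The function name and the sample payload below are hypothetical.

```python
import struct

CONFLUENT_MAGIC_BYTE = 0
PREAMBLE_LEN = 5  # 1 magic byte + 4-byte big-endian Schema ID


def split_preamble(message: bytes):
    """Split a Confluent-framed message into (schema_id, payload)."""
    if len(message) < PREAMBLE_LEN:
        raise ValueError("message shorter than the Schema Registry preamble")
    magic, schema_id = struct.unpack(">BI", message[:PREAMBLE_LEN])
    if magic != CONFLUENT_MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[PREAMBLE_LEN:]


# Hypothetical framed message carrying Schema ID 42 and a JSON payload.
framed = struct.pack(">BI", CONFLUENT_MAGIC_BYTE, 42) + b'{"value": 1}'
schema_id, payload = split_preamble(framed)
print(schema_id, payload)
```

Under this framing, skipping 5 bytes leaves the plain JSON payload for the configured input format.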

IANA Considerations This document includes no request to IANA.

Security Considerations This document should not affect the security of the Internet.

Operational Considerations The YANG Message Broker Producer of a YANG-Push receiver should have three configuration knobs that make the features described in this document optional:

Topic Distribution: Select between "topic" and "subject" distribution. The default is "subject", to remain backward compatible with .
Distribution Type: Select between "none" and "YANG-Push subscription type".
YANG Message Key: Select between "enable" and "disable".
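A minimal Python sketch of how a producer might model these three knobs is shown below. The class names, the hyphenated setting names and the defaults for Distribution Type ("none") and YANG Message Key ("disable") are assumptions of this sketch; only the "subject" default is stated above.

```python
from dataclasses import dataclass
from enum import Enum


class TopicDistribution(Enum):
    TOPIC = "topic"
    SUBJECT = "subject"


class DistributionType(Enum):
    NONE = "none"
    SUBSCRIPTION_TYPE = "YANG-Push subscription type"


@dataclass(frozen=True)
class ProducerConfig:
    # "subject" is the default named in the text above.
    topic_distribution: TopicDistribution = TopicDistribution.SUBJECT
    # Assumed default: the feature stays off unless selected.
    distribution_type: DistributionType = DistributionType.NONE
    # Assumed default: "disable".
    yang_message_key: bool = False


def parse_config(raw: dict) -> ProducerConfig:
    """Validate raw string settings (hypothetical setting names)."""
    key_setting = raw.get("yang-message-key", "disable")
    if key_setting not in ("enable", "disable"):
        raise ValueError(f"invalid yang-message-key: {key_setting!r}")
    return ProducerConfig(
        # Enum lookup by value raises ValueError for unknown selections.
        topic_distribution=TopicDistribution(raw.get("topic-distribution", "subject")),
        distribution_type=DistributionType(raw.get("distribution-type", "none")),
        yang_message_key=(key_setting == "enable"),
    )


config = parse_config({"yang-message-key": "enable"})
print(config)
```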

Subject distribution enables message ordering for a set of YANG Message Keys on each partition, whereas in topic distribution messages are distributed randomly among partitions. To accommodate potential data loss throughout the data processing pipeline, a periodic update of the current State for State metrics is RECOMMENDED. This can be accommodated with YANG-Push as defined in by complementing "on-change sync on start" subscriptions with "periodic" subscriptions. Alternatively, in YANG-Push Lite defined in this is simplified into one subscription.

Implementation status This section provides pointers to existing open source implementations of this draft. Note to the RFC-editor: Please remove this before publishing.

yang-push-key A proof of concept implementing the three-phase algorithm described in . The open source code can be accessed here: .

References

Normative References

Informative References

"Toward Building a Semantic Network Inventory for Model-Driven Telemetry", IEEE.
"Toward Avoiding the Data Mess: Industry Insights From Data Mesh Implementations", IEEE.
"Data Mesh", O'Reilly Media.
"The Data Warehouse Toolkit", Wiley.
"The Log-Structured Merge-Tree", Acta Informatica.
"Apache Kafka", Apache Software Foundation.
"Apache Pulsar", Apache Software Foundation.
"Confluent Schema Registry Documentation", Confluent Community and Apache Software Foundation.
"yang-push-key".

Acknowledgements Thanks to Camilo Cardona, Rob Wilton, Holger Keller, Reshad Rahman, Nigel Davis, Olga Havel and Michael Mackey for their comments and reviews. We would also like to thank Victor Lopez for the initial idea on the network controller use case; Ashley Woods, Sivakumar Sundaravadivel and Rafael Julio for the idea of grouping topics by YANG-Push subscription type and for insisting that Topic Compaction is a key enabler for inventory metrics and YANG data consumer integration and should be supported from day one; Nigel Davis for confirming that Topic Compaction indeed simplifies the data processing system architecture; and Loic Monney for the operational configuration and monitoring details on Apache Kafka.

Contributors Many thanks go to Hellmar Becker, who contributed and on how YANG Message Keys can be obtained from the Message Broker, how time series databases can use them for indexing YANG data, and the example implementation in ClickHouse.

ClickHouse

601 Marshall Street Redwood City CA 94063 US hellmar.becker@clickhouse.com