| Internet-Draft | YANG Message Keys | July 2026 |
| Graf, et al. | Expires 3 January 2027 | [Page] |
This document specifies a mechanism to define a unique Message key for a YANG to Message Broker integration and a topic addressing scheme based on YANG-Push subscription type and YANG Schema Node Identifier. This enables YANG data consumption of a subset of subscribed YANG data, either per specific YANG data node, identifier or telemetry message type, by indexing and organizing in Message Broker topics. It helps to index the information by using data taxonomy and organizes data in partitions and shards of Message Brokers and time series databases.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 3 January 2027.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Nowadays network operators are using machine and human readable YANG [RFC7950] to model their configurations and monitor YANG operational data from their networks according to [Mar24].¶
Most network analytic use cases require real-time data and the delivery of near real-time analytical and actionable insights. This imposes high scalability, resilience and low overhead in the data processing pipeline. Accessing the right data for the right use case with minimal overhead and in the shortest period of time is therefore crucial.¶
Network operators organize their data in a Data Mesh [Deh22] according to [Bod24] where a Message Broker, such as Apache Kafka [Kaf11] or Apache Pulsar [Pul16], facilitates the exchange of Messages among data processing components in topics and subjects. Typically, data is being stored in Message Broker topics for several hours or days to facilitate resilience in the data processing chain and addressed in Subjects depending on Schema, enabling a data consumer to address and re-consume previously consumed data again if previously lost.¶
Dimensional data is structured information in a data store. It uses a model of dimension tables to organize business metrics and their descriptive context. This model, developed by Ralph Kimball [Kim96], simplifies data analysis and reporting by creating denormalized, easy-to-understand structures for quick querying. It is optimized for online analytical processing (OLAP) and data warehouses by using the data taxonomy to scale in partitions and shards. YANG [RFC7950] as a data modelling language based on hierarchical tree-based structures facilitates the modelling of dimensional data. This is best shown with YANG Tree Diagrams [RFC8340].¶
An Architecture for YANG-Push to Message Broker Integration [I-D.ietf-nmop-yang-message-broker-integration] specifies an architecture for integrating YANG-Push with Message Brokers for a Data Mesh architecture. Section 4.5 of [I-D.ietf-nmop-yang-message-broker-integration] describes how the notification messages at a YANG-Push Receiver are being transformed to the Message Broker while Section 3 of [I-D.ietf-nmop-message-broker-telemetry-message] specifies to a Message Schema to contextualize telemetry data. However, neither of these documents addresses how these messages should be indexed in a Message Broker, nor define how topics, partitioning and sharding must be used.¶
Due to this missing dimensional indexing for Message Broker stored YANG data, all YANG data is stored in one single Topic. This leads to a round robin distribution across multiple Partitions where each YANG Schema ID is defined as a subject within that topic. Therefore, the entire Topic from all Partitions needs to be consumed first before data selection can be applied. This leads to avoidable data processing overhead which in turn impairs scalability and real-time capabilities, required for certain Network Analytics use cases.¶
YANG telemetry data can be used for several network analytic use cases. Importantly, depending on the use case, only a subset of the subscribed YANG data might be necessary (in time or space). For example, for specific use cases, it is more important to know the current network state, as opposed to have the full series of the state changes over time. In other use cases, instead of consuming data for all network nodes, only a specific network node or network node component requires the YANG monitoring and hence subscription.¶
This document defines how YANG Messages [I-D.ietf-nmop-message-broker-telemetry-message] should be indexed and organized in Message Broker topics by leveraging the network node hostname, the YANG-Push subscription identifier, and concrete XPath data node instances derived from the YANG schema path for indexing. Then, a YANG-Push subscription type and YANG Schema name for a Message Broker topic naming scheme is defined to better organize YANG data.¶
Network node hostname and subtree and xpath filters are part of "ietf-yang-push-telemetry-message" structured YANG data defined in Section 3 of [I-D.ietf-nmop-message-broker-telemetry-message]. The Message Key is derived through a three-phase algorithm that normalizes subscription filters against the YANG schema path and extracts concrete data node instances from each notification message.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The following terms are used as defined in [I-D.ietf-nmop-terminology]:¶
The following terms are used as defined in [I-D.ietf-nmop-yang-message-broker-integration]:¶
The following terms are used as defined in Apache Kafka [Kaf11] and Apache Pulsar [Pul16] Message Broker:¶
The following terms are used as defined in The Log-Structured Merge-Tree [One96] scientific paper:¶
The following terms are used as defined in Confluent Schema Registry Documentation [ConDoc18]:¶
The following terms are used as defined in [RFC8641]:¶
The following terms are used as defined in [I-D.ietf-netconf-notif-envelope]:¶
The following terms are used as defined in [RFC8342]:¶
The following terms are used as defined in [RFC7950]:¶
The following terms are used as defined in this document:¶
To identify which network node produced which YANG data instance into which Message Broker Topic, Partition and Subject, YANG Message Keys and Indexes (Section 3.1) are being introduced. These keys enable a deterministic distribution of YANG messages across Topics and Partitions enabling applications to consume only the needed data from specific topics and partitions.¶
In order to facilitate Message Broker Topic Compaction, a YANG-Push subscription type based topic naming scheme (Section 3.2) is defined. This segregates statistical (Value), State and State change YANG metrics and facilitates a YANG Message Broker Consumer to use the Topic wild card consumption method to select based on YANG-Push subscription type.¶
For topics that carry YANG telemetry messages as defined in [I-D.ietf-nmop-message-broker-telemetry-message], a Message Key MUST be used. If no Message Key is defined then the Messages are distributed in a round robin fashion across partitions. If a Message Key is defined, then the value of the Message Key is being used as input for the Message Broker Producer hash function to distribute across Partitions. Therefore, Message Keys facilitate Message deterministic distribution.¶
The Message Key is not only used for Message indexing at the Message Producer but also at the Message Broker for topic compaction.¶
For YANG, the network node hostname, the YANG-Push subscription identifier, and concrete XPath data node instances are used to generate the Message Key. The Message Key MUST be derived through the three-phase algorithm described in Section 3.1.2 to guarantee deterministic and unambiguous keys.¶
The following sections describe the Message Key format, the derivation algorithm, and how Message Keys are used in both Message producers and Message consumers.¶
The Message Key is a UTF-8 encoded byte string consisting of exactly three fields separated by a single newline character (LF, U+000A):¶
The key MUST NOT contain a trailing newline after the XPath line. This ensures byte-identical keys for identical inputs, which is the invariant required for Message Broker topic compaction.¶
Figure 1 shows a Message Key for a single interface instance on node "router-nyc-01" with subscription identifier "1042".¶
router-nyc-01 1042 /ietf-interfaces:interfaces/interface[name='eth0']
Figure 2 shows a Message Key where a single notification carries two interface instances. The XPaths are sorted lexicographically and joined with " | ".¶
=============== NOTE: '\' line wrapping per RFC 8792 ================ router-nyc-01 1042 /ietf-interfaces:interfaces/interface[name='eth0']\ | /ietf-interfaces:interfaces/interface[name='eth1']
Figure 3 shows a Message Key for a container target (no list ancestor). The XPath is the path to the container itself with no key predicates.¶
router-nyc-01 1042 /ietf-system:system/clock
This line-delimited format guarantees deterministic serialization without the ambiguities of structured encodings such as JSON (where key ordering and whitespace may vary across implementations). The key is a plain byte string suitable for direct use as a Message Broker record key.¶
The Message Key is derived through a three-phase algorithm. Phase 1 is only required when the YANG-Push subscription uses a subtree filter (Section 6 of [RFC6241]); XPath subscriptions skip directly to Phase 2. Phase 2 runs once at subscription creation time. Phase 3 runs for every notification.¶
+-------------------+ +--------------+
| Subtree Filter |------>| Phase 1: |---> Normalized XPath(s)
| (if applicable) | | Normalize |
+-------------------+ +--------------+
|
v
+-------------------+ +--------------+ +---------------+
| Subscription Path |------>| | | Key Templates |
| (XPath, possibly | | Phase 2: |---->| + Extraction |
| from Phase 1) | | Schema | | Specs |
+-------------------+ | Resolution | +---------------+
| YANG Schema Path |------>| |
+-------------------+ +--------------+
|
v
+-------------------+ +--------------+ +--------------+
| Parsed Data Path |------>| | | Message Key |
+-------------------+ | Phase 3: |---->| (line- |
| Node Name |------>| Data Walk | | delimited) |
| Subscription ID |------>| | | |
+-------------------+ +--------------+ +--------------+
Section 3.6 of [RFC8641] defines how YANG data nodes can be subscribed with subtree and xpath selection filters. When a subtree filter is used, the XML representation MUST first be normalized into one or more equivalent XPath expressions before proceeding to Phase 2.¶
The normalization follows the element classification defined in Section 6.2 of [RFC6241]. Each child element in the subtree filter is classified based on its content:¶
An element whose text content consists only of whitespace (spaces, tabs, newlines) MUST NOT be treated as a content match. It is classified as a selection or containment node depending on whether it has children.¶
Each path segment in the output carries the YANG module-name prefix. Duplicate XPath branches MUST be deduplicated. Multiple branches are joined with " | ".¶
For example, the subtree filter shown in Figure 5 normalizes to the XPath expression shown in Figure 6.¶
<interfaces xmlns=
"urn:ietf:params:xml:ns:yang:ietf-interfaces">
<interface>
<name>eth0</name>
<oper-status/>
</interface>
</interfaces>
/ietf-interfaces:interfaces/ietf-interfaces:interface
[ietf-interfaces:name='eth0']
/ietf-interfaces:oper-status
In this example the "name" element is classified as a content match (it has text content "eth0"), producing the predicate [ietf-interfaces:name='eth0'] on the "interface" path step. The "oper-status" element is classified as a selection node, becoming the terminal branch.¶
Given a subscription XPath (from Phase 1 or directly from an XPath subscription) and a compiled YANG schema context, Phase 2 resolves each union branch to its target schema node, walks the ancestor chain from root to target, and builds a key template.¶
The subscription XPath is first split on top-level "|" into individual branches (respecting brackets, quotes, and parentheses). Each branch is processed independently.¶
For each branch, the algorithm resolves the XPath to its target schema node using the YANG Schema Path. It then walks the ancestor chain from the schema root to the target node, building the key template path. For each node in the path:¶
The following predicate normalizations are applied during template derivation:¶
Figure 7 shows the key template derived from the XPath subscription "/ietf-interfaces:interfaces/interface". Because the "interface" list has a key leaf "name" and the subscription does not pin it, the template contains an open placeholder for the "name" key.¶
Template: /ietf-interfaces:interfaces/interface[name='%s'] Extraction: /ietf-interfaces:interfaces/interface/name
Figure 8 shows the key template when the subscription pins the outer list key but leaves the inner list open: "/ietf-interfaces:interfaces/interface[name='eth0'] /ietf-ip:ipv4/address".¶
Template: /ietf-interfaces:interfaces/interface[name='eth0']/ietf-ip:ipv4/address[ip='%s'] Extraction: /ietf-interfaces:interfaces/interface[name='eth0']/ietf-ip:ipv4/address/ip
Phase 2 produces one key template per union branch, together with extraction specifications that describe which key leaf values must be extracted from the data tree at runtime to fill each open placeholder. Each extraction is expressed as an absolute XPath that identifies the key leaf in the data tree. The path mirrors the template path from the root to the owning list (preserving any pinned predicates on ancestor lists) and appends the key leaf name without a predicate:¶
/MODULE:CONTAINER/.../LIST/KEY-LEAF-NAME¶
When evaluated against a notification data tree, this XPath selects the key leaf value(s) of the matching list instance(s). For leaf-list targets, the extraction is simply "." (the data node's own value).¶
For each notification, the parsed data tree is walked and matched against the branch templates from Phase 2. For each matching data node instance:¶
As an optimization, implementations need not invoke a full XPath evaluator for each extraction. Because the extraction path always leads to a key leaf of an ancestor list, it can be rewritten to an equivalent "ancestor-or-self" axis expression evaluated relative to the matched data node. For example, the extraction "/ietf-interfaces:interfaces/interface/name" becomes "ancestor-or-self::ietf-interfaces:interface/name". This reduces evaluation to a simple upward tree walk from the matched data node to the first ancestor whose schema node matches the list name, followed by reading a direct child. This yields O(d) complexity where d is the depth of the data tree, which is typically small (3 to 8 levels for common YANG models).¶
All filled XPath expressions (concrete XPaths) are collected, deduplicated, and sorted lexicographically. The final Message Key is then composed as defined in Section 3.1.1: the node-name and subscription-id on separate lines, followed by the concrete XPaths joined by " | " on a third line.¶
Figure 9 shows the complete derivation for a notification carrying two interface instances.¶
=============== NOTE: '\' line wrapping per RFC 8792 ================
Subscription XPath:
/ietf-interfaces:interfaces/interface
Phase 2 Template:
/ietf-interfaces:interfaces/interface[name='%s']
Notification data (XML):
<interfaces xmlns=
"urn:ietf:params:xml:ns:yang:ietf-interfaces">
<interface>
<name>eth0</name>
<oper-status>up</oper-status>
</interface>
<interface>
<name>eth1</name>
<oper-status>down</oper-status>
</interface>
</interfaces>
Concrete XPaths (sorted):
/ietf-interfaces:interfaces/interface[name='eth0']
/ietf-interfaces:interfaces/interface[name='eth1']
Message Key:
router-nyc-01
1042
/ietf-interfaces:interfaces/interface[name='eth0']\
| /ietf-interfaces:interfaces/interface[name='eth1']
When the subscription targets a YANG container (with no list ancestor), there are no open placeholders and no instances to match. In that case, the key template itself is used as the single concrete XPath in the Message Key.¶
Figure 10 illustrates the complete three-phase derivation starting from a subtree filter subscription that spans two YANG modules. The filter selects the "oper-status" leaf of interface "eth0" from "ietf-interfaces" [RFC8343] (concrete branch, key pinned in the filter) and the "serial-num" leaf of every hardware component from "ietf-hardware" [RFC8348] (non-concrete branch, no key value in the filter). Phase 1 produces a union of two branches. Phase 2 derives one fully-concrete template (key pinned) and one template with an open placeholder for the component "name" key. Phase 3 evaluates the open placeholder against the notification data, expanding the single template into one concrete XPath per matching component instance.¶
=============== NOTE: '\' line wrapping per RFC 8792 ================
Subtree filter:
<interfaces xmlns=
"urn:ietf:params:xml:ns:yang:ietf-interfaces">
<interface>
<name>eth0</name>
<oper-status/>
</interface>
</interfaces>
<hardware xmlns=
"urn:ietf:params:xml:ns:yang:ietf-hardware">
<component>
<serial-num/>
</component>
</hardware>
Phase 1 (Normalize):
/ietf-interfaces:interfaces/ietf-interfaces:interface[ietf-interfaces\
:name='eth0']/ietf-interfaces:oper-status
| /ietf-hardware:hardware/ietf-hardware:component/ietf-hardware\
:serial-num
Phase 2 (Two branch templates):
Branch 0 (leaf target, key pinned):
/ietf-interfaces:interfaces/interface[name='eth0']/oper-status
(fully concrete)
Branch 1 (leaf target, key open):
/ietf-hardware:hardware/component[name='%s']/serial-num
Extraction:
/ietf-hardware:hardware/component/name
(open placeholder: name)
Notification data (XML):
<interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
<interface>
<name>eth0</name>
<oper-status>up</oper-status>
</interface>
</interfaces>
<hardware xmlns="urn:ietf:params:xml:ns:yang:ietf-hardware">
<component>
<name>chassis</name>
<serial-num>SN-12345</serial-num>
</component>
<component>
<name>fan-1</name>
<serial-num>SN-67890</serial-num>
</component>
</hardware>
Phase 3 (Key production):
Concrete XPaths (sorted):
/ietf-hardware:hardware/component[name='chassis']/serial-num
/ietf-hardware:hardware/component[name='fan-1']/serial-num
/ietf-interfaces:interfaces/interface[name='eth0']/oper-status
Message Key:
router-nyc-01
1042
/ietf-hardware:hardware/component[name='chassis']/serial-num | /ietf-hardware:hardware/component[name='fan-1']/serial-num | /ietf-interfaces:interfaces/interface[name='eth0']/oper-status
When the Message is being produced to the Message Broker, the network node hostname is used from the structured YANG data defined in "ietf-yang-push-telemetry-message" Section 3 of [I-D.ietf-nmop-message-broker-telemetry-message]. The concrete XPath expressions are derived from the subscription filter and the YANG Schema Path as described above, and then instantiated with values from each notification data path. Figure 11 represent the Message Key and and the Message Broker headers with the Schema ID and contend type of the Message and Figure 12 the Message itself.¶
NOTE: The Key is a two-line byte string; the line break between the
node hostname and the YANG path is a literal LF (U+000A).
Key:
router-nyc-01
1042
/ietf-interfaces:interfaces/interface[name='eth0']
Headers:
schema-id: 1
content-type: application/yang-data+json
{
"ietf-telemetry-message:message": {
"network-node-manifest": {
"name": "pe1",
"vendor": "open source",
"software-version": "1.1.2"
},
"telemetry-message-metadata": {
"node-export-timestamp": "2024-02-14T12:10:10.10+01:00",
"collection-timestamp": "2024-02-14T12:10:10.12+01:00",
"session-protocol": "yang-push",
"notification-event": "log",
"export-address": "192.168.1.100",
"export-port": 123,
"collection-address": "192.168.1.1",
"collection-port": 9991,
"ietf-yang-push-telemetry-message:yang-push-subscription": {
"id": 89,
"xpath-filter": "/ietf-interfaces:interfaces",
"datastore": "ietf-datastores:operational",
"transport": "ietf-udp-notif-transport:udp-notif",
"encoding": "ietf-subscribed-notifications:encode-json",
"module": [
{
"module": "ietf-interfaces",
"revision": "2018-02-20",
"version": "2.0.0"
}
],
"yang-library-content-id": "abc"
}
},
"data-collection-manifest": {
"name": "collector-1",
"vendor": "open source",
"software-version": "2.1.0"
},
"network-operator-metadata": {
"labels": [
{
"name": "platform-name",
"string-value": "name"
}
]
},
"payload": {
"ietf-yang-push:push-update": {
"id": 1042,
"datastore-contents": {
"ietf-interfaces:interfaces": {
"interface": [
{
"name": "eth0",
"oper-status": "down"
}
]
}
}
}
}
}
}
}
The consumer hashes the Message Key, applies modulo with the number of partitions, and determines the partition from which it should consume messages bearing that Message Key.¶
To parse the Message Key, the consumer splits the byte string on newline (LF) characters. The first line is the node-name, the second is the subscription-id, and the third line contains the concrete XPath expression(s) joined by " | ".¶
At a YANG data store, such as a Time Series database or stream processor, the YANG data could then be ingested into tables according to topic names and indexed per Message Key. If Topic Compaction is enabled, only current state is consumed.¶
Depending if the YANG Data Consumer knows the Message Key from the YANG Message Broker Consumer or the YANG Schema from the YANG Schema Registry the network telemetry messages can be indexed in a Time series database. The Message Key could serve as the primary key, while the individual fields (node-name, subscription-id, concrete XPaths) can be reflected in the indexing scheme using primary and secondary keys in a time series database. Implementation examples can be found under Section 5.¶
Each YANG-Push subscription requires a deterministic, human-readable Message Broker topic name. The topic name MUST satisfy the following requirements:¶
The topic name is derived from the Phase 2 key template (see Section 3.1.2.2) through a purely mechanical transformation of the YANG schema DATA path. No additional schema resolution is needed beyond Phase 2.¶
The input is the Phase 2 key template for one branch of the subscription. For union subscriptions with multiple branches, each branch produces its own topic name independently.¶
The derivation proceeds in five steps:¶
Figure 13 shows the derivation for three subscription paths.¶
=============== NOTE: '\' line wrapping per RFC 8792 ================ Subscription XPath | Topic Name ---------------------------------------+-------------------------------- /ietf-interfaces:interfaces/interface | if-interfaces-interface /ietf-interfaces:interfaces/interface\ | if-interfaces-interface-oper\ /oper-status | -status /ietf-system:system/clock | sys-system-clock /ietf-system:system/dns-resolver/server| sys-system-dns-resolver-server
Figure 14 shows the same paths with an organization prefix "netops".¶
Subscription XPath | Topic Name ---------------------------------------+-------------------------------- /ietf-interfaces:interfaces/interface | netops-if-interfaces-interface /ietf-interfaces:interfaces/interface\ | netops-if-interfaces-interface\ /oper-status | -oper-status /ietf-system:system/clock | netops-sys-system-clock
The topic naming algorithm has the following properties:¶
The YANG Message Broker Producer derives the topic name from the YANG-Push subscription's xpath or subtree filter by running Phases 1 and 2 (as described in Section 3.1.2) and then applying the topic name derivation algorithm above. The subscription type ("periodic", "on-change", or "on-change" with "sync-on-start") MAY be encoded in a separate topic hierarchy level, depending on the deployment's naming policy. Where "periodic" is encoded as "stats", "on-change" as "state-change", "on-change" with "sync-on-start" as "state" and "on-change" with "sync-on-start" whre topic compaction is enabled as "current-state".¶
The consumer can subscribe to multiple topics using wildcard or regex patterns. For example:¶
The YANG data is then ingested into tables according to topic names and indexed per Message Key. If Topic Compaction is enabled, only the current state is consumed.¶
Topic, Partitioning and Message Keys are generic concepts of Message Brokers. There are two known Message Broker implementations supporting all features described in this document.¶
Apache Kafka supports Message Keys, Partitioning and Log Compaction.¶
With the following example from the Apache Kafka admin client API https://kafka.apache.org/41/javadoc/org/apache/kafka/clients/admin/Admin.html a new compacted Topic can be created.¶
Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
try (Admin admin = Admin.create(props)) {
String topicName = "my-topic";
int partitions = 12;
short replicationFactor = 3;
// Create a compacted topic
CreateTopicsResult result = admin.createTopics(Collections.singleton(
new NewTopic(topicName, partitions, replicationFactor)
.configs(Collections.singletonMap(TopicConfig.CLEANUP_POLICY_CONFIG,/
TopicConfig.CLEANUP_POLICY_COMPACT))));
// Call values() to get the result for a specific topic
// KafkaFuture<Void> future = result.values().get(topicName);
// Call get() to block until the topic creation is complete or has
// failed if creation failed the ExecutionException wraps the
// underlying cause. future.get();
}
¶
The most important configuration items from https://kafka.apache.org/41/configuration/topic-configs/ are "topicName" defines the Topic name, "partitions" the amount of partitions, "replicationFactor" how many times the partition is being replicated.¶
With "compact" in "cleanup.policy" the log compaction can be turned on per topic. With "min.cleanable.dirty.ratio" and "delete.retention.ms" how often and when Log Compaction should occur per topic. Where with "retention.bytes" and with "retention.ms" the topic specific compaction configurations can be limited how often the topics are compacted.¶
The topic names are constrained to 249 character length and the following characters: "a-z", "A-Z", "0-9", ".", "_" and "-". Topics can be created on the fly by producing into a new Topic when "auto.create.topics.enable" has been configured prior. Topics should be deleted at the end of the lifecycle through the "kafka-topics.sh" command.¶
The Partition count for a given Topic can be increased but not decreased. Consumer groups are automatically re-joined and partitions are being rebalanced on Message Broker nodes when Partition count changed.¶
Apache Pulsar supports Message Keys, Partitioning and Topic Compaction.¶
With "brokerServiceCompactionThreshold" when Topic Compaction should occur is being configured.¶
The topic names allow all characters except: "/". Topics can be created on the fly by producing into a new Topic when "allowAutoTopicCreation" has been configured prior. Topics should be deleted at the end of the lifecycle through pulsar-admin or pulsarctl tools.¶
The Partition count for a given Topic can be increased but not decreased. Consumer groups are automatically re-joined and partitions are being rebalanced on Message Broker nodes when Partition count changed.¶
Tables, partition and keys are generic concepts of time series databases. With ClickHouse, this document provides examples of how YANG message keys can be obtained from the Message Broker and used for indexing.¶
Unlike other realtime analytics databases, ClickHouse does not (necessarily) rely on partitioning data by timestamp. ClickHouse represents data in the MergeTree format, which is similar to a LSM tree:¶
A table consists of data parts sorted by primary key.¶
When data is inserted in a table, separate data parts are created and each of data part is lexicographically sorted by primary key. For example, if the primary key is ("MessageKey", "Date"), the data in the part is sorted by "MessageKey", and within each "MessageKey", it is ordered by "Date".¶
Data belonging to different partitions are separated into different parts. In the background, ClickHouse merges data parts for more efficient storage. Parts belonging to different partitions are not merged. The merge mechanism does not guarantee that all rows with the same primary key will be in the same data part.¶
Each data part is logically divided into granules. A granule is the smallest indivisible data set that ClickHouse reads when selecting data. ClickHouse does not split rows or values, so each granule always contains an integer number of rows. The first row of a granule is marked with the value of the primary key for the row. For each data part, ClickHouse creates an index file that stores the marks. For each column, whether it's in the primary key or not, ClickHouse also stores the same marks. These marks let you find data directly in column files.¶
Thus, it is possible to quickly run queries on one or many ranges of the primary key.¶
ClickHouse integrates with Message Brokers through Integration Table Engines.¶
Reading (selecting) data through Kafka Table Engine follows Apache Kafka semantics of advancing the offset, so subsequent reads will start at the offset the previous read left off.¶
It is the responsibility of the data model designer to transfer data to a regular table:¶
Example:¶
CREATE TABLE queue (
timestamp UInt64,
level String,
message String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',
kafka_topic_list = 'topic',
kafka_group_name = 'group1',
kafka_format = 'JSONEachRow',
kafka_num_consumers = 4;
¶
Example:¶
CREATE TABLE messages (
key String,
timestamp UInt64,
level String,
message String
)
ENGINE = MergeTree
ORDER BY (key, timestamp);
¶
CREATE MATERIALIZED VIEW mv_messages TO messages AS
SELECT
_key AS key,
timestamp,
level,
message
FROM queue;
¶
The Message Key and partition ID are available as virtual (read only) columns _key and _partition.¶
ClickHouse supports numerous Message formats natively. The example above uses the JSON Lines format but other (binary) formats, such as Apache Avro or Protobuf, are supported as well.¶
ClickHouse has built in Schema Registry support. For Apache Avro, the Schema Registry and authentication are encoded in additional parameters to the Apache Kafka consumer.¶
For formats such as Confluent JSON_SR, use the "kafka_schema_registry_skip_bytes" parameter to skip reading the Schema Registry preamble. The Schema can then be encoded explicitly.¶
This document includes no request to IANA.¶
This document should not affect the security of the Internet.¶
The YANG Message Broker Producer of a YANG-Push receiver should have three config knobs facilitate the features described in this document as optional:¶
Subject distribution enables message ordering for a set of YANG Message Keys on each partition. Where in topic distribution messages are randomly being distributed among partitions.¶
To accommodate for potential date loss throughout the data processing pipeline, periodic update of the current State for State metrics is RECOMMENDED. This can be accommodated with YANG-Push as defined in [RFC8641] by complementing "on-change sync on start" subscriptions with "periodic" subscriptions. Alternatively, in YANG-Push Lite defined in Section 7.6 of [I-D.wilton-netconf-yang-push-lite] this simplified in one subscription.¶
This section provides pointers to existing open source implementations of this draft. Note to the RFC-editor: Please remove this before publishing.¶
A prove of concept implementing the three-phase algorithm described in Section 3.1.2.¶
The open source code can be accessed here: [yang-push-key].¶
Thanks to Camilo Cardona, Rob Wilton, Holger Keller, Reshad Rahman, Nigel Davis, Olga Havel and Michael Mackey for their comments and reviews.¶
We also like to thank Victor Lopez for the initial idea on the network controller use case. Ashley Woods, Sivakumar Sundaravadivel and Rafael Julio for the idea of grouping topics by YANG-Push subscription type and insisting that Topic Compaction is a key enabler for inventory metrics and YANG data consumer integration and should be supported day 1. Nigel Davis for confirming that Topic Compaction simplifies indeed data processing system architecture and Loic Monney for the operational configuration and monitoring details on Apache Kafka.¶
Many thanks goes to Hellmar Becker who contributed Section 3.1.4 and Section 5 on how YANG Message Keys can be obtained from Message Broker, how time series databases can use it for indexing YANG data and example implementation in ClickHouse.¶