Sparkplug B Protocol Fuzzing with AI Assistance

Author: bishopfox.com (2026-05-26)

TL;DR: The problem: Sparkplug B is the dominant MQTT-based protocol in industrial control and SCADA environments, but until now there was no publicly available security fuzzer for it.
What we built: A systematic Sparkplug B protocol fuzzer covering all 9 message types, 19 data types, and 87+ unique field paths defined by the Eclipse Sparkplug specification.
How AI helped: Claude Code read the Sparkplug protobuf definition and specification alongside our working prototype, identified the coverage gaps and Python defects in the original, and produced a hardened, self-contained tool with CLI, logging, and passive network discovery.
Why it matters: ICS and SCADA operators, device vendors, and their defenders now have a tool to exercise malformed-traffic behavior on Sparkplug B endpoints. Fuzzing can surface crashes, protocol violations, and state-handling bugs before an attacker does.
Where to get it: https://github.com/BishopFox/sparkplugFuzzer.

Sparkplug B is an open MQTT-based specification maintained by the Eclipse Foundation that standardizes how industrial control system (ICS) and SCADA (Supervisory Control and Data Acquisition) devices publish state, telemetry, and commands over a shared broker.

It serializes messages with Google Protocol Buffers, keeping payloads small enough for bandwidth-constrained links such as cellular or radio, and defines a structured topic namespace (spBv1.0/{group}/{message_type}/{node}/{device}) across nine message types and 19 data types. That combination is what makes it the default interoperability layer for modern unified namespace (UNS) deployments in manufacturing and critical infrastructure.

How Sparkplug B Works

Sparkplug B layers three things on top of vanilla MQTT:

Fixed topic namespace. Every message follows spBv1.0/{group_id}/{message_type}/{edge_node_id}/{device_id}. That structure lets a broker, and any subscribed tool, build a live map of the network simply by listening.
Defined message lifecycle. Nine message types cover birth, death, data, and commands: NBIRTH/DBIRTH announce a node's or device's metrics and state, NDATA/DDATA publish values, NCMD/DCMD carry commands, NDEATH announces edge-node disconnection (delivered via the MQTT Last Will and Testament) while DDEATH announces device disconnection, and STATE reports host-application liveness.
Protobuf-encoded payload. Each message is a Google Protocol Buffers blob with typed metrics, aliases for bandwidth compression, and sequence numbers for ordering. Any of the 19 data types, from Int8 through UUID plus DataSet, Bytes, File, and Template, can appear in a single payload.
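The topic structure above is regular enough to parse mechanically, which is what lets a listener map the network and what a topic-namespace fuzzer deliberately breaks. Below is a minimal sketch that assumes only the level layout described above. SparkplugTopic, parse_topic, and build_topic are hypothetical names and are not part of the fuzzer or the Eclipse Tahu libraries. STATE messages are out of scope here; in Sparkplug 3.0 they use a separate spBv1.0/STATE/{host_id} topic.

```python
from dataclasses import dataclass
from typing import Optional

NAMESPACE = "spBv1.0"
NODE_TYPES = {"NBIRTH", "NDEATH", "NDATA", "NCMD"}
DEVICE_TYPES = {"DBIRTH", "DDEATH", "DDATA", "DCMD"}


@dataclass(frozen=True)
class SparkplugTopic:
    group_id: str
    message_type: str
    edge_node_id: str
    device_id: Optional[str] = None  # None for node-level messages


def build_topic(t: SparkplugTopic) -> str:
    """Join the levels as spBv1.0/{group_id}/{message_type}/{edge_node_id}[/{device_id}]."""
    levels = [NAMESPACE, t.group_id, t.message_type, t.edge_node_id]
    if t.device_id is not None:
        levels.append(t.device_id)
    return "/".join(levels)


def parse_topic(topic: str) -> SparkplugTopic:
    """Parse a node- or device-level topic; raise ValueError if it breaks the namespace rules."""
    levels = topic.split("/")
    if len(levels) not in (4, 5) or levels[0] != NAMESPACE:
        raise ValueError(f"not a node/device topic: {topic!r}")
    if any(not lvl or lvl in ("+", "#") for lvl in levels):
        raise ValueError(f"empty or wildcard level in {topic!r}")
    group_id, message_type, edge_node_id = levels[1:4]
    device_id = levels[4] if len(levels) == 5 else None
    # Node message types live at depth 4, device message types at depth 5.
    allowed = NODE_TYPES if device_id is None else DEVICE_TYPES
    if message_type not in allowed:
        raise ValueError(f"{message_type} is not valid at this topic depth")
    return SparkplugTopic(group_id, message_type, edge_node_id, device_id)
```

Each rejection rule also suggests a fuzz case: publishing a DDATA topic with no device_id, or an NBIRTH topic with one, shows whether subscribers enforce the namespace or silently accept the message.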

This structure is what makes Sparkplug B efficient for OT (Operational Technology) networks, and it's also what makes it a rich target for protocol fuzzing: there are a lot of fields, sequence rules, and type expectations that a malformed payload can violate.

Why It Needed a Fuzzer

Protocol specifications define what is required but normally don't specify how to handle unexpected traffic. Protocol fuzzers work by sending malformed, random, or unexpected traffic to endpoints on the network. Security fuzzing goes beyond what the implementation and specification documents say and tests how devices actually behave when they receive unexpected traffic. Running a fuzzer uncovers device behavior that may introduce vulnerabilities or disrupt communication with critical devices.
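A mutation fuzzer is the simplest version of this idea. It takes a valid message, corrupts it in small, random ways, and sends each variant. The sketch below is a generic illustration, not the released tool's logic. The mutation operators and the INTERESTING_BYTES boundary values are assumptions, chosen because varint and length-prefix parsing tends to break at those edges.

```python
import random
from typing import List

MUTATIONS = ("flip", "replace", "insert", "delete", "truncate")
INTERESTING_BYTES = (0x00, 0x7F, 0x80, 0xFF)  # boundaries for varint continuation bits and lengths


def mutate(payload: bytes, rng: random.Random) -> bytes:
    """Return one mutated copy of payload using a single randomly chosen operator."""
    data = bytearray(payload)
    op = rng.choice(MUTATIONS) if data else "insert"  # only insertion works on empty input
    pos = rng.randrange(len(data)) if data else 0
    if op == "flip":
        data[pos] ^= 1 << rng.randrange(8)  # flip a single bit
    elif op == "replace":
        data[pos] = rng.choice(INTERESTING_BYTES)
    elif op == "insert":
        data[pos:pos] = bytes(rng.randrange(256) for _ in range(rng.randint(1, 8)))
    elif op == "delete":
        del data[pos]
    else:  # truncate everything from pos onward
        del data[pos:]
    return bytes(data)


def mutation_corpus(seed_payload: bytes, count: int, seed: int = 0) -> List[bytes]:
    """Generate `count` variants; the fixed seed makes the campaign reproducible."""
    rng = random.Random(seed)
    return [mutate(seed_payload, rng) for _ in range(count)]
```

Seeding random.Random makes a campaign reproducible. If a variant crashes a target, it can be regenerated and resent from just the seed and its index.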

Why Sparkplug B Security Matters

Sparkplug B rides on MQTT brokers that sit between the IT network and the OT floor – the same brokers that carry set-points to PLCs, telemetry from pumps and valves, and commands to robots and CNC machines. A malformed payload that crashes a device, corrupts its state, or gets silently accepted when it shouldn't be doesn't just cost a reboot. In an industrial environment, the downstream physical process can go with it.

Examples:

Availability: A single malformed DDATA message that crashes an edge node can take an entire production line's sensor feed offline. Recovering a hung PLC often requires manual intervention on the plant floor.

Integrity: Sparkplug B's alias mechanism, where short integer IDs stand in for named metrics, is powerful, but bugs in alias-collision or type-mismatch handling can cause the wrong metric to receive the wrong value: for example, a temperature field accepting a string, or alias 5 being rebound to a different metric mid-session.

Safety: In critical infrastructure such as water treatment, energy, pharma, and food production, incorrect telemetry or a missed command is a safety event, not just a software bug.

Blast radius: MQTT's publish-subscribe model means a single misbehaving or malicious publisher can reach every subscriber on a topic. Flat namespaces and permissive ACLs, both common in OT deployments, magnify that reach.

Sparkplug B is marketed as "self-describing" and easy to adopt. That ease of adoption has outpaced the security tooling that should accompany it.

Initial Effort and Issues

The initial script was based on Cirrus-Link's Sparkplug example code. With slight modifications, we added a file reader that loads a static list of strings to fuzz the fields with.

I tell everyone that my code is like a 1987 Honda: it will get you where you're going, but don't look under the hood. We looked under the hood anyway, and this was the state of testing:

The Good, what the script was able to do:
- Connect to an MQTT broker at localhost:1883
- Publish NBIRTH and DBIRTH certificates with hardcoded metrics
- Set NDEATH as the MQTT Last Will and Testament to handle unexpected disconnects
- Map 13 static aliases using an AliasMap class for metric aliasing
- Handle incoming NCMD/DCMD commands via on_message callback
- Read fuzz data from a static file and inject it as string metric values in DDATA messages
By the numbers:
- 4 of 9 message types covered (NBIRTH, DBIRTH, NDEATH, DDATA)
- Only 1 metric actually fuzzed (Device_Metric0 with string data from static file)
- 1 randomized boolean (Device_Metric1)
- All other metrics were static hardcoded values

The Ugly, what the script didn’t do:
- No logging, meaning no sample data for the client.
- No network discovery, so targets could be missed.
- No systematic approach to testing protocol fields, which is not ideal for giving the client the best possible results.
- No CLI, and the broker address was hardcoded.

With a working first prototype, we fed our script and the protocol specification into a Claude Code instance, along with a few requirements, such as that the resulting script should install its own dependencies and be self-contained. We also asked Claude to check for general programming issues. What it found makes for a painful read, but in the interest of full transparency, here it is:

The script mixed Python 2 print statements (without parentheses) with Python 3 print() calls.
The core fuzzing logic was inefficient: a mismatch between the speed of the for loop and network latency caused timeouts.
The initial MQTT constructor mqtt.Client(serverUrl, 1883, 60) passed the broker hostname as client_id, the port as clean_session, and the keepalive as userdata; the connection worked only because client.connect() was called separately with the correct arguments.
client.loop() was called in a tight loop instead of using the threaded client.loop_start(), which made the script fragile to timing issues and made the core fuzzing inefficiency even worse.
Fuzzing coverage was minimal: the initial script tested only the String and Boolean data types out of the 19 available, ignoring Float, Double, all unsigned integers, Bytes, DateTime, Template instances, and DataSet values.
Critical attack surface went untested: no type-mismatch testing, sequence number manipulation, alias collision testing, protocol ordering violations, binary protobuf corruption, or topic namespace fuzzing.
Sent and received messages were not logged, so there was no way to analyze what was tested or correlate it with observed behavior on the target.
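Two of the gaps above, sequence number manipulation and protocol ordering violations, come down to generating specific values and message orders. The following is a minimal sketch that assumes the Sparkplug B rule that seq counts from 0 to 255 and then wraps to 0. seq_cases and ORDERING_CASES are hypothetical names. Each case still has to be encoded and published by whatever MQTT client drives the test.

```python
SEQ_MODULUS = 256  # Sparkplug B seq runs 0..255 and wraps back to 0


def seq_cases(last_seq: int) -> dict:
    """Candidate seq values for the message that follows last_seq, keyed by test intent."""
    expected = (last_seq + 1) % SEQ_MODULUS
    return {
        "valid_next": expected,                     # control case
        "duplicate": last_seq,                      # replay of the previous value
        "gap": (expected + 1) % SEQ_MODULUS,        # as if one message was lost
        "backwards": (last_seq - 1) % SEQ_MODULUS,  # out-of-order delivery
        "out_of_range": SEQ_MODULUS,                # 256 never occurs in a conforming stream
        "uint64_max": 2**64 - 1,                    # largest value the protobuf uint64 field holds
    }


# Hypothetical message orderings that break the birth -> data -> death lifecycle.
ORDERING_CASES = [
    ["DDATA"],                                # device data with no NBIRTH/DBIRTH at all
    ["NDATA", "NBIRTH"],                      # node data before the node's birth
    ["NBIRTH", "DDATA"],                      # device data before DBIRTH
    ["NBIRTH", "DBIRTH", "DDEATH", "DDATA"],  # device data after the device died
    ["NBIRTH", "NDEATH", "NDATA"],            # node data after the node died
]
```

Keeping valid_next in the set acts as a control. If the target rejects the valid case too, the problem lies in the test harness, not the device.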

Did it work? Yes, by accident, and it was fragile and not the sort of testing we want to give our clients.

Improved Fuzzer

The end state fixed the above issues and improved the protocol fuzzer in multiple ways:

After analyzing the Sparkplug B protobuf definition and the Eclipse Sparkplug specification, the AI agent mapped out all 87+ unique field paths, all 19 data types, and all 9 message types.
It performed a systematic gap analysis, comparing the original script's coverage against the full protocol specification.
It redesigned the fuzzer's payload constructor to support both high-level construction of valid payloads (via the sparkplug_b helpers) and low-level raw protobuf manipulation (via sparkplug_b_pb2 directly). This was the significant improvement, because fuzzing requires intentionally invalid payloads that the helper libraries will reject.
It added a DeviceTracker class for passive network discovery. The script subscribes to the Sparkplug wildcard topic, parses NBIRTH/DBIRTH messages to build a live map of the network, and extracts metric definitions for targeted attacks.
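Raw protobuf access matters because a well-behaved helper will only emit payloads that make sense. As a dependency-free sketch, the code below writes protobuf wire format by hand instead of calling sparkplug_b_pb2. This lets it produce a metric whose datatype says Float while its value travels in string_value, or a seq outside 0 to 255. The field numbers and the Float code (9) come from sparkplug_b.proto as we understand it and should be checked against your copy. The function names are hypothetical.

```python
import struct

# Field numbers from the Payload and Payload.Metric messages in sparkplug_b.proto,
# and the Float code from its DataType enum (assumed; verify against your .proto).
PAYLOAD_TIMESTAMP, PAYLOAD_METRICS, PAYLOAD_SEQ = 1, 2, 3
METRIC_NAME, METRIC_DATATYPE = 1, 4
METRIC_FLOAT_VALUE, METRIC_STRING_VALUE = 12, 15
DATATYPE_FLOAT = 9

VARINT, LEN, FIXED32 = 0, 2, 5  # protobuf wire types


def varint(n: int) -> bytes:
    """Base-128 varint; negatives wrap to 64-bit two's complement like protobuf int64."""
    n &= (1 << 64) - 1
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field(number: int, wire_type: int, value) -> bytes:
    """Encode one tagged field; value is an int for VARINT, raw bytes otherwise."""
    key = varint((number << 3) | wire_type)
    if wire_type == VARINT:
        return key + varint(value)
    if wire_type == FIXED32:
        return key + value  # caller supplies exactly 4 bytes
    return key + varint(len(value)) + value  # LEN: length prefix, then the bytes


def float_metric(name: str, value: float) -> bytes:
    """Well-formed control case: datatype Float carried in float_value."""
    return (field(METRIC_NAME, LEN, name.encode())
            + field(METRIC_DATATYPE, VARINT, DATATYPE_FLOAT)
            + field(METRIC_FLOAT_VALUE, FIXED32, struct.pack("<f", value)))


def type_mismatch_metric(name: str, text: str) -> bytes:
    """Malformed case: datatype says Float but the value travels in string_value."""
    return (field(METRIC_NAME, LEN, name.encode())
            + field(METRIC_DATATYPE, VARINT, DATATYPE_FLOAT)
            + field(METRIC_STRING_VALUE, LEN, text.encode()))


def payload(timestamp_ms: int, seq: int, *metrics: bytes) -> bytes:
    """Assemble a Payload; seq is not range-checked, so 256 or 2**64-1 encode fine."""
    body = field(PAYLOAD_TIMESTAMP, VARINT, timestamp_ms)
    for metric in metrics:
        body += field(PAYLOAD_METRICS, LEN, metric)
    return body + field(PAYLOAD_SEQ, VARINT, seq)
```

Both malformed cases decode cleanly with a generic protobuf parser, because protobuf knows nothing about the pairing between datatype and value field or the valid range of seq. Catching them is the receiving application's job, and that is exactly what this kind of payload tests.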

The resulting fuzzer is robust, systematic, and a professional-grade tool for testing this critical protocol.

Coverage Comparison

Capability	Initial script	Improved fuzzer
Message types exercised	4 of 9 (NBIRTH, DBIRTH, NDEATH, DDATA)	All 9 (NBIRTH, DBIRTH, NDATA, DDATA, NCMD, DCMD, NDEATH, DDEATH, STATE)
Data types fuzzed	2 (String, Boolean)	All 19 (Int8/16/32/64, UInt8/16/32/64, Float, Double, Boolean, String, DateTime, Text, UUID, DataSet, Bytes, File, Template)
Field paths covered	Handful of hardcoded metrics	87+ unique field paths mapped from the protobuf definition
Type-mismatch testing	None	Yes
Sequence number manipulation	None	Yes
Alias collision testing	None	Yes
Protocol ordering violations	None	Yes
Raw protobuf corruption	None	Yes (via sparkplug_b_pb2 direct access)
Topic namespace fuzzing	None	Yes
Network discovery	None	Passive DeviceTracker via wildcard subscription
Logging & artifacts	None	Structured send/receive log for client correlation
CLI / broker config	Hardcoded localhost:1883	CLI interface, configurable broker
Python version hygiene	Mixed Py2/Py3 print	Clean Python 3
MQTT loop handling	Tight client.loop()	Threaded client.loop_start()

Sparkplug B Message Types

Type	Purpose
NBIRTH	Edge node announces itself and publishes its full metric set
DBIRTH	Device under an edge node announces itself and its metric set
NDATA	Edge node publishes metric value updates
DDATA	Device publishes metric value updates
NCMD	Command targeting an edge node
DCMD	Command targeting a device
NDEATH	Edge node disconnection (delivered via MQTT Last Will and Testament)
DDEATH	Device disconnection
STATE	Host application liveness announcement
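BIRTH and DEATH messages bracket every node and device session, so a passive listener subscribed to spBv1.0/# can follow the lifecycle in this table and keep a running inventory. The sketch below is a hypothetical PassiveMap, not the fuzzer's DeviceTracker. It assumes the caller decodes each BIRTH payload elsewhere, for example in an MQTT on_message callback, and passes in the metric names. It treats devices under a dead edge node as offline, which matches how Sparkplug host applications are expected to behave.

```python
from collections import defaultdict


class PassiveMap:
    """Hypothetical passive inventory of a Sparkplug B network built from observed topics."""

    def __init__(self):
        self.devices = defaultdict(set)  # (group_id, edge_node_id) -> device_ids seen in DBIRTH
        self.online = {}                 # (group, node) or (group, node, device) -> bool
        self.metrics = {}                # same keys -> metric names declared in the last BIRTH

    def observe(self, topic: str, metric_names=()) -> None:
        levels = topic.split("/")
        if len(levels) not in (4, 5) or levels[0] != "spBv1.0":
            return  # STATE and non-Sparkplug traffic are ignored in this sketch
        group, mtype, node = levels[1], levels[2], levels[3]
        if mtype.startswith("D") != (len(levels) == 5):
            return  # node messages have four levels, device messages five
        key = tuple([group, node] + levels[4:])
        if mtype in ("NBIRTH", "DBIRTH"):
            self.online[key] = True
            self.metrics[key] = list(metric_names)
            if mtype == "DBIRTH":
                self.devices[(group, node)].add(levels[4])
        elif mtype == "NDEATH":
            # Every device under a dead edge node is treated as offline too.
            for k in self.online:
                if k[:2] == (group, node):
                    self.online[k] = False
            self.online[(group, node)] = False
        elif mtype == "DDEATH":
            self.online[key] = False

    def live_targets(self):
        """Sorted keys currently believed online, i.e. candidates for targeted tests."""
        return sorted(k for k, up in self.online.items() if up)
```

The same bookkeeping supports detection. A key that stays online long after its node has gone quiet is the kind of orphaned session worth alerting on.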

Detection and Mitigation

If you operate a Sparkplug B environment, or you're the vendor shipping devices that speak it, the following controls meaningfully raise the bar against both opportunistic and targeted abuse:

Broker configuration

Require authenticated publishers and subscribers. Anonymous MQTT is a default that should be disabled in production.
Enforce TLS on broker connections (8883 not 1883), including between edge nodes and the broker, not just host applications.
Apply topic ACLs that match the Sparkplug B namespace, scoped per group and per role: spBv1.0/{group}/+/+ covers node-level messages and spBv1.0/{group}/+/+/+ covers device-level messages. A single subscriber authorized for spBv1.0/# is a blast-radius problem.
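What an ACL grants depends on MQTT wildcard semantics: + matches exactly one topic level, and # matches the current level and everything below it. The sketch below models only that matching rule, not any particular broker's ACL file syntax and not the special handling of topics that start with $. It checks which Sparkplug B topics a filter covers. A five-level filter ending in +/+/+ matches device-level topics but not four-level node topics such as NBIRTH.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT topic-filter match: '+' is exactly one level, '#' is this level and all below."""
    f_levels, t_levels = topic_filter.split("/"), topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # valid filters only place '#' as the last level
        if i >= len(t_levels) or (f != "+" and f != t_levels[i]):
            return False
    return len(f_levels) == len(t_levels)  # no leftover topic levels allowed


node_topic = "spBv1.0/Plant1/NBIRTH/Edge1"
device_topic = "spBv1.0/Plant1/DDATA/Edge1/Pump7"
for flt in ("spBv1.0/Plant1/+/+", "spBv1.0/Plant1/+/+/+", "spBv1.0/#"):
    print(flt, topic_matches(flt, node_topic), topic_matches(flt, device_topic))
```

Running a role's filters against topics captured from a live broker is a quick way to see whether an ACL is narrower or wider than intended.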

Protocol-layer hardening

Validate incoming protobuf payloads against the Sparkplug B schema before acting on them. Many device implementations will accept malformed payloads that a strict parser rejects.
Enforce type expectations on metric updates. A metric declared Float in NBIRTH should not accept a String in DDATA.
Check sequence numbers. Gaps, wraparound, and out-of-order deliveries are diagnostic signals, not things to silently paper over.
Handle alias collisions defensively. An alias rebound mid-session is either a bug or an attack, and should be handled as one.
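The type, sequence, and alias checks above can live in one small per-edge-node state machine. This is a minimal sketch under stated assumptions. EdgeSession and EXPECTED_VALUE_FIELD are hypothetical names. Payloads are assumed to be decoded already: into (name, alias, datatype) tuples for BIRTH messages and (name, alias, value_field) tuples for DATA messages. The datatype-to-value-field map is deliberately partial and based on our reading of sparkplug_b.proto. Each session tracks one edge node, because the Sparkplug B sequence number is shared across the messages that node publishes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SEQ_MODULUS = 256  # seq runs 0..255 and wraps

# Partial datatype code -> expected value field (assumed from sparkplug_b.proto; verify and extend).
EXPECTED_VALUE_FIELD = {
    1: "int_value", 2: "int_value", 3: "int_value",              # Int8, Int16, Int32
    4: "long_value", 8: "long_value", 13: "long_value",          # Int64, UInt64, DateTime
    9: "float_value", 10: "double_value", 11: "boolean_value",   # Float, Double, Boolean
    12: "string_value", 14: "string_value", 15: "string_value",  # String, Text, UUID
    17: "bytes_value",                                           # Bytes
}


@dataclass
class EdgeSession:
    """Hypothetical per-edge-node validator for seq, alias, and type checks."""
    aliases: Dict[int, str] = field(default_factory=dict)    # alias -> metric name
    datatypes: Dict[str, int] = field(default_factory=dict)  # metric name -> datatype code
    last_seq: Optional[int] = None

    def _check_seq(self, seq: int) -> List[str]:
        if self.last_seq is None:
            return ["message before NBIRTH"]
        expected = (self.last_seq + 1) % SEQ_MODULUS
        return [] if seq == expected else [f"seq {seq}, expected {expected}"]

    def birth(self, seq: int, metrics, is_node_birth: bool) -> List[str]:
        """Record NBIRTH (resets state) or DBIRTH; metrics are (name, alias, datatype)."""
        if is_node_birth:
            self.aliases.clear()
            self.datatypes.clear()
            problems: List[str] = []
        else:
            problems = self._check_seq(seq)
        self.last_seq = seq
        for name, alias, datatype in metrics:
            bound = self.aliases.get(alias) if alias is not None else None
            if bound is not None and bound != name:
                problems.append(f"alias {alias} rebound from {bound!r} to {name!r}")
                continue  # conservative: keep the original binding
            if alias is not None:
                self.aliases[alias] = name
            self.datatypes[name] = datatype
        return problems

    def data(self, seq: int, metrics) -> List[str]:
        """Check NDATA/DDATA; metrics are (name or None, alias or None, value_field)."""
        problems = self._check_seq(seq)
        self.last_seq = seq
        for name, alias, value_field in metrics:
            resolved = name if name is not None else self.aliases.get(alias)
            if resolved not in self.datatypes:
                problems.append(f"unknown metric (name={name}, alias={alias})")
                continue
            expected = EXPECTED_VALUE_FIELD.get(self.datatypes[resolved])
            if expected is not None and value_field != expected:
                problems.append(f"{resolved}: expected {expected}, got {value_field}")
        return problems
```

Keeping the original binding on a collision, rather than accepting the new one, is a conservative choice. Some deployments may prefer to discard the session entirely and wait for a fresh NBIRTH.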

Detection and monitoring

Log all NBIRTH / DBIRTH / NDEATH transitions. Unexpected re-births or orphaned devices are cheap, high-signal alerts.
Monitor for malformed protobuf parse failures at the broker or at a passive tap. A burst of parse errors is an early indicator of either a misbehaving device or a fuzzing campaign.
Watch for topic-namespace anomalies: publishers appearing under unexpected group_id values, or edge nodes publishing STATE messages they shouldn't be authoritative for.
Alert on stale NDEATH and orphaned-session patterns: a device that births but never dies, or vice versa, is a state-handling bug waiting to be exploited.
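A burst of parse errors is easy to detect with a sliding time window. The following is a minimal sketch that assumes you record the timestamp of every payload that fails to decode. With the generated sparkplug_b_pb2 classes, that would be a ParseFromString call raising google.protobuf.message.DecodeError. BurstDetector is a hypothetical name, and the threshold and window are placeholder values to tune for each environment.

```python
from collections import deque


class BurstDetector:
    """Hypothetical sliding-window alarm for parse failures (or any other event)."""

    def __init__(self, threshold: int, window_s: float):
        self.threshold = threshold  # alert when more than this many events...
        self.window_s = window_s    # ...fall inside this many seconds
        self._events = deque()

    def record(self, timestamp: float) -> bool:
        """Record one failure at `timestamp` (seconds, non-decreasing); True means alert."""
        self._events.append(timestamp)
        while self._events and self._events[0] <= timestamp - self.window_s:
            self._events.popleft()  # drop events that have aged out of the window
        return len(self._events) > self.threshold
```

Running a fuzzing campaign against a monitored environment should trip this alert quickly, which makes it a useful end-to-end test of the monitoring itself.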

Operational hygiene

Segment OT brokers from IT networks. A Sparkplug B broker reachable from a corporate workstation or, worse, the internet is the first finding on any assessment.
Test the above controls with a fuzzer. That's what this tool is for.

What we see in real engagements

Across ICS and OT assessments, a few Sparkplug B patterns come up reliably:

Brokers reachable from the business network with anonymous auth still enabled in production.
Devices that accept any alias rebinding mid-session, trusting the publisher to be well-behaved.
Stale NDEATH handling: dead sessions that never get cleaned up, leaving ghost devices in host-application state for hours.

On one automotive manufacturing assessment, all three failures lined up: corporate Wi-Fi bridged directly into the OT network, devices that required no authentication, and sessions that allowed mid-session alias rebinding without ever tombstoning a device after NDEATH.

There is a creativity gap in penetration testing that AI won't be able to overcome, i.e. sticking two square pegs in a round hole and jiggling them just right can't be trained. But if you have two almost-working square pegs and a vague idea of how to jiggle them, AI can help get the process across the finish line fairly quickly. Most of the recent news has been about how attackers are outpacing defenders. Bishop Fox hopes that this is proof that behind-the-scenes defenders are quietly fixing issues and hardening their attack surface.

Benefits of Using Claude Code

Could we have written this fuzzer without AI assistance? Yes. Could we have written it with the time constraints of the test and validated it against the specifications and protobuf definition? Maybe. I would need a bigger whiteboard. Would we have written the readme and usage documentation? Probably not.

Claude Code took our creative security testing idea and made us more efficient by reading documents and planning a better script. The bane of everyone who writes code is maintaining documentation and usage information. Claude generated that documentation for us, letting us spend more time on what we are good at, assessing critical infrastructure, and less on the chore of maintaining documentation.

Final Thoughts

Building the fuzzer was the easy part. If you want to know what else is lurking in a Sparkplug B environment, the three findings from that automotive engagement show up together more often than they should: bridged networks, no authentication, uncleaned alias sessions. That's what Bishop Fox penetration testing is built to surface before an attacker does.

Grab the Sparkplug B fuzzer on GitHub and put it to work, and for critical infrastructure operators specifically, see how Bishop Fox works with energy and utility clients and how a Fortune 500 utility stays ahead of emerging threats. 

Article source: https://bishopfox.com/blog/sparkplug-b-protocol-fuzzing-with-ai-assistance