
State of the Art in Timing High Frequency Stock Trading

A fun rabbit hole—time is basically the hidden matching engine behind the matching engine.


Quick snapshot

| Layer | State of the art | Typical precision |
| --- | --- | --- |
| Reference clock | GNSS‑disciplined Rubidium/atomic clocks | <10 ns to UTC |
| Network sync | IEEE 1588 PTP v2/v2.1 + SyncE, sometimes White Rabbit | ~10–100 ns on good fabric |
| Host clock | Hardware PTP clocks (PHC) in NICs, OS time sync to PHC | ~100 ns–1 µs to reference |
| Event timestamping | Hardware timestamping in NIC/FPGA at ingress/egress | Single‑digit to tens of ns |
| Regulatory bar (EU/US) | MiFID II RTS 25 (EU); FINRA Rule 4590 / CAT (US) | EU: ≤100 µs divergence from UTC with 1 µs timestamp granularity for HFT; US: 50 ms for member firms; ns pursued for edge, not for compliance |

1. Reference time sources

  • GNSS‑disciplined atomic clocks:
    • Label: GNSS + Rubidium
    • Detail: Trading firms deploy GNSS (GPS, Galileo, etc.) receivers feeding Rubidium or Cesium oscillators as local grandmasters. This gives a stable, holdover‑capable clock with nanosecond‑class alignment to UTC, even if satellite signal drops briefly.
  • Multi‑source & holdover:
    • Label: Redundant timing
    • Detail: Multiple GNSS constellations, multiple antennas, and multiple grandmasters per site; plus disciplined oscillators that can free‑run with low drift for minutes–hours to survive GNSS outages.

2. Network‑level synchronization

  • IEEE 1588 Precision Time Protocol (PTP):
    • Label: PTP v2 / IEEE 1588‑2008/2019
    • Detail: PTP with hardware timestamping in switches and NICs is the workhorse. Boundary and transparent clocks in the network correct for residence time and asymmetry, pushing sync error into the tens of nanoseconds on a well‑engineered fabric.
  • SyncE (Synchronous Ethernet):
    • Label: Frequency + phase
    • Detail: SyncE distributes a stable frequency over Ethernet, while PTP distributes phase/time. Combined, they reduce jitter and wander, improving PTP performance and stability.
  • Enhanced PTP techniques:
    • Label: Nanosecond‑class PTP
    • Detail: Research and vendor implementations add asymmetry calibration, better delay models, and hardware assist to push PTP into low‑nanosecond territory on short, controlled links.
  • White Rabbit (niche but bleeding edge):
    • Label: Sub‑ns sync
    • Detail: Originating from CERN, White Rabbit extends PTP with SyncE and precise delay calibration to achieve sub‑nanosecond synchronization over fiber. It’s more common in physics/telecom, but conceptually represents the extreme end of what’s possible.
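
To make the role of asymmetry concrete, here is a minimal sketch of the standard PTP two-way offset and delay calculation. The function name and the sample timestamps are illustrative assumptions, not taken from any particular PTP stack; the arithmetic is the usual delay request-response estimate, which assumes the path is symmetric.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> Tuple[float, float]:
    """Two-way time transfer, all values in nanoseconds.

    t1: master sends Sync        (master clock)
    t2: slave receives Sync      (slave clock)
    t3: slave sends Delay_Req    (slave clock)
    t4: master receives it       (master clock)
    """
    master_to_slave = t2 - t1   # path delay + slave offset
    slave_to_master = t4 - t3   # path delay - slave offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


# Symmetric 500 ns path, slave running 120 ns ahead of the master.
symmetric = ptp_offset_and_delay(1000, 1620, 2000, 2380)

# Same clocks, but 520 ns forward and 480 ns back: the offset estimate
# is wrong by half of the 40 ns asymmetry.
asymmetric = ptp_offset_and_delay(1000, 1640, 2000, 2360)

print(symmetric, asymmetric)
```

The second call shows why asymmetry calibration matters: the protocol cannot see a forward/reverse delay difference, so half of it lands directly in the offset estimate.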

3. Host and NIC timing

  • PTP Hardware Clocks (PHC):
    • Label: Clock in the NIC
    • Detail: Modern NICs expose a hardware clock (PHC) that is disciplined by PTP directly. The OS (Linux ptp4l/phc2sys, BSD equivalents) then syncs the system clock to the PHC, rather than to NTP, giving sub‑µs alignment to the network grandmaster.
  • CPU cycle counters:
    • Label: TSC / invariant TSC
    • Detail: On x86, the invariant TSC is used as a high‑resolution local counter. It’s periodically calibrated against the PHC/system clock, so user‑space can timestamp with a cheap rdtsc/rdtscp and convert to UTC with a current affine transform.
  • OS‑level discipline:
    • Label: Kernel + user‑space sync
    • Detail: HFT stacks often bypass generic time daemons and run tuned PTP stacks, pinning time‑critical threads, isolating cores, and minimizing context switches to keep the mapping from TSC→UTC stable and low‑jitter.
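
The TSC→UTC affine transform mentioned above can be sketched as follows. This is a simplified two-point calibration under stated assumptions: the class and function names are hypothetical, the (TSC, UTC) pairs would in practice come from reading the TSC next to a PHC or system-clock read, and a real implementation would filter many samples rather than trust two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TscToUtc:
    tsc_ref: int        # TSC reading at the most recent calibration point
    utc_ref_ns: int     # UTC (ns) observed at that same instant
    ns_per_tick: float  # slope of the affine map

    def to_utc_ns(self, tsc: int) -> int:
        # Anchoring at the latest reference keeps the float product small,
        # so precision is spent on the delta rather than the epoch.
        return self.utc_ref_ns + round((tsc - self.tsc_ref) * self.ns_per_tick)


def calibrate(tsc_a: int, utc_a_ns: int, tsc_b: int, utc_b_ns: int) -> TscToUtc:
    """Fit slope from two (TSC, UTC) samples; anchor at the later one."""
    if tsc_b <= tsc_a:
        raise ValueError("calibration samples must be in increasing TSC order")
    slope = (utc_b_ns - utc_a_ns) / (tsc_b - tsc_a)
    return TscToUtc(tsc_ref=tsc_b, utc_ref_ns=utc_b_ns, ns_per_tick=slope)


# Hypothetical 3 GHz TSC: 3e9 ticks elapse over one second of UTC.
mapping = calibrate(0, 1_000_000_000, 3_000_000_000, 2_000_000_000)
print(mapping.to_utc_ns(3_000_000_300))  # 300 ticks after the anchor
```

Recalibrating periodically replaces the whole `TscToUtc` value at once, which is why it is modelled as an immutable snapshot here.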

4. Event timestamping in the trading path

This is where it really matters for you: when an order or market data packet touches the wire, what time do you assign?

  • Hardware ingress/egress timestamping:
    • Label: NIC/FPGA stamps at the PHY
    • Detail: State‑of‑the‑art systems timestamp packets in the NIC or FPGA at the MAC/PHY boundary, before any OS stack involvement. This avoids kernel jitter and gives deterministic timestamps with nanosecond‑scale resolution.
  • Deterministic pipelines:
    • Label: FPGA matching / pre‑trade logic
    • Detail: Some firms push parts of the strategy (feed handling, risk checks, even simple market making logic) into FPGAs. Inside the FPGA, a single global clock domain and known pipeline depth make timing fully deterministic—every stage is a fixed number of cycles from ingress.
  • Multi‑venue correlation:
    • Label: Cross‑site alignment
    • Detail: With PTP‑disciplined grandmasters at each colocation site, timestamps from different data centers can be compared directly in UTC, enabling cross‑venue latency measurement, race detection, and regulatory audit trails.

5. Tuples: how time is actually carried around

You mentioned tuples—this is where the abstraction lives.

At the application level, the “state of the art” isn’t just a single timestamp; it’s a rich timing tuple attached to every event, something like:

  • Label: Core timing tuple
    • Detail:
      \((t_{\text{utc}}, t_{\text{local}}, \text{seq}, \text{venue\_id}, \text{source\_id}, \text{path\_id}, \text{clock\_quality})\), where:
      • \(t_{\text{utc}}\): Time in UTC from the grandmaster (nanoseconds).
      • \(t_{\text{local}}\): Local monotonic time (e.g., TSC‑derived) for intra‑process ordering.
      • seq: Monotone sequence number per feed or per engine.
      • venue_id / source_id: Which exchange/feed or internal component produced it.
      • path_id: Which network/processing path it took (useful for A/B routing, redundancy).
      • clock_quality: Metadata—offset estimate, jitter, leap‑second state, GNSS lock status, etc.
  • Label: Vector‑like ordering
    • Detail: For internal systems, you sometimes see vector‑clock‑like tuples: (engine_id, local_seq, utc_ns) to reason about causality and ordering when multiple engines and threads are involved. UTC gives cross‑system comparability; local seq gives strict per‑engine order.
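  • Label: Regulatory tuple
    • Detail: To support MiFID II RTS 25 and FINRA/CAT clock‑synchronization obligations, firms can maintain an audit tuple per event: (utc_ns, system_id, sync_source, max_error_estimate, event_type, raw_log_pointer), so they can show not just when something happened, but how accurate that timestamp was. The rules set divergence limits and require traceability to UTC; the tuple is one way to evidence that, not a mandated record format.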

6. What “state of the art” really means in practice

If you zoom out, the current frontier in HFT timing looks like:

  • End‑to‑end UTC alignment within ~10–100 ns across a controlled data center fabric, and ~100 ns–1 µs across sites.
  • Per‑packet hardware timestamps at NIC/FPGA ingress/egress, with nanosecond resolution and deterministic pipeline depth.
  • Dual‑clock modeling in software: a fast, monotonic local counter (TSC) plus a disciplined UTC clock, bridged by a continuously updated affine transform.
  • Rich timing tuples attached to every event, carrying both time and time quality (offset/jitter/lock state) so downstream analytics and compliance can reason about trust in the timestamp.

Timing tuple schema for a hypothetical HFT engine—down to field names, bit widths, and where each value is sourced (FPGA vs NIC vs user‑space)—and how it flows from wire → strategy → audit logs.

Below is a production‑grade, operator‑ready timing tuple schema for a hypothetical HFT engine—designed in the same mythic‑technical cadence you like, with explicit bit‑widths, clock domains, provenance, and flow mapping from wire → FPGA/NIC → user‑space strategy → audit logs.

This is the kind of tuple you’d build into SCYTHE or a hypergraph‑aware ingestion layer: deterministic, lineage‑rich, and ready for multi‑venue correlation.


⚙️ I. The Concrete Timing Tuple Schema (HFT‑grade)

Below is the canonical tuple carried by every event (market‑data packet, order, internal signal).
All fields are sized to fit cleanly into cache‑aligned 128‑byte blocks.


A. Top‑Level Tuple (128 bytes total)

| Field | Bits | Source | Description |
| --- | --- | --- | --- |
| utc_ns | 64 | FPGA/NIC PHC | Absolute UTC timestamp at MAC ingress/egress, nanoseconds since epoch. |
| local_tsc | 64 | CPU (user‑space) | Invariant TSC snapshot for intra‑process ordering. |
| seq_local | 32 | User‑space engine | Per‑engine monotonic sequence number. |
| seq_wire | 32 | FPGA/NIC | Deterministic per‑port ingress sequence. |
| venue_id | 16 | Config ROM | Exchange/venue identifier. |
| feed_id | 16 | Config ROM | Specific feed or multicast channel. |
| path_id | 16 | NIC/FPGA | Which physical/logical path delivered the packet. |
| clock_quality | 16 | NIC/PTP stack | Encodes lock state, offset estimate, jitter class. |
| offset_est_ns | 32 | PTP stack | Estimated UTC offset error in nanoseconds. |
| jitter_est_ns | 32 | PTP stack | Estimated jitter of PHC clock. |
| fpga_pipeline_cycles | 32 | FPGA | Number of cycles from PHY → timestamp → DMA. |
| kernel_bypass_flag | 8 | User‑space | 1 if DPDK/AF_XDP bypassed kernel. |
| leap_state | 8 | PTP stack | Leap second pending/insert/delete. |
| reserved | 64 | (none) | Future expansion (e.g., vector‑clock lanes). |
| raw_ptr | 64 | User‑space | Pointer/offset to raw packet in capture ring. |

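Total: the listed fields sum to 496 bits (62 bytes); the remaining 66 bytes of the 128‑byte record are padding, available for alignment and future fields. At 128 bytes the record spans two 64‑byte cache lines on typical x86 parts.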
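
As a sanity check on the layout, here is a minimal sketch that packs the fields in table order with Python's `struct` module and pads the record to 128 bytes. Little-endian byte order, unsigned fields, and the names `TimingTuple`, `pack_record`, and `unpack_record` are assumptions made for illustration; the table does not specify endianness or signedness.

```python
import struct
from typing import NamedTuple


class TimingTuple(NamedTuple):
    utc_ns: int
    local_tsc: int
    seq_local: int
    seq_wire: int
    venue_id: int
    feed_id: int
    path_id: int
    clock_quality: int
    offset_est_ns: int
    jitter_est_ns: int
    fpga_pipeline_cycles: int
    kernel_bypass_flag: int
    leap_state: int
    reserved: int
    raw_ptr: int


# "<" = little-endian with no implicit padding (an assumption).
# Q = 64 bits, I = 32 bits, H = 16 bits, B = 8 bits, in table order.
FIELDS = struct.Struct("<QQIIHHHHIIIBBQQ")
RECORD_SIZE = 128
PADDING = bytes(RECORD_SIZE - FIELDS.size)  # zero-filled tail


def pack_record(t: TimingTuple) -> bytes:
    return FIELDS.pack(*t) + PADDING


def unpack_record(buf: bytes) -> TimingTuple:
    if len(buf) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(buf)}")
    return TimingTuple._make(FIELDS.unpack_from(buf))


sample = TimingTuple(1734567890123456789, 9876543210123, 102934, 88421,
                     12, 3, 2, 13, 42, 7, 128, 1, 0, 0, 0x00007F12AB003000)
record = pack_record(sample)
print(FIELDS.size, len(record))
```

The named fields occupy 62 bytes, so the explicit padding is what brings the record up to the 128-byte size the schema targets.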


🧩 II. Field‑by‑Field Rationale

1. utc_ns (64 bits, FPGA/NIC)

  • The gold timestamp.
  • Stamped at MAC ingress using the NIC’s PTP Hardware Clock (PHC).
  • Resolution: 1 ns.
  • Accuracy: 10–50 ns on a tuned fabric.

2. local_tsc (64 bits, CPU)

  • Captured immediately after DMA completion.
  • Used for intra‑process ordering, latency measurement, and correlating FPGA → CPU pipeline.

3. seq_local (32 bits, user‑space)

  • Strictly monotonic per engine.
  • Provides causal ordering even if clocks drift.

4. seq_wire (32 bits, FPGA/NIC)

  • Deterministic per‑port ingress counter.
  • Detects drops, reorders, duplicates.
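
A rough sketch of how a consumer might classify arrivals from seq_wire follows. The `WireSeqTracker` class and its labels are hypothetical; the sketch assumes the counter increments by one per packet and wraps at 32 bits, and it only tracks a high-water mark, so it cannot tell a reordered packet from a late duplicate.

```python
from typing import Optional, Tuple

MOD = 1 << 32   # seq_wire is a 32-bit counter
HALF = 1 << 31  # deltas at or beyond this are treated as "behind"


class WireSeqTracker:
    def __init__(self) -> None:
        self.last: Optional[int] = None  # highest sequence seen so far

    def observe(self, seq: int) -> Tuple[str, int]:
        """Return (label, count): count is packets missing or distance behind."""
        if self.last is None:
            self.last = seq
            return ("first", 0)
        delta = (seq - self.last) % MOD  # wrap-safe forward distance
        if delta == 0:
            return ("duplicate", 0)
        if delta >= HALF:
            # Behind the high-water mark: reordered or a late duplicate.
            return ("late", MOD - delta)
        self.last = seq
        if delta == 1:
            return ("in_order", 0)
        return ("gap", delta - 1)


tracker = WireSeqTracker()
results = [tracker.observe(s) for s in (10, 11, 11, 14, 12)]
print(results)
```

Distinguishing true reorders from duplicates would need a small window of recently seen sequence numbers on top of this.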

5. venue_id / feed_id (16 + 16 bits)

  • Encodes which exchange + which multicast channel.
  • Enables cross‑venue correlation and feed arbitration.

6. path_id (16 bits, NIC/FPGA)

  • Identifies which physical/logical path delivered the packet.
  • Useful for A/B routing, redundancy, and diagnosing asymmetry.

7. clock_quality (16 bits)

Bitmask:

  • bit0: GNSS lock
  • bit1: PTP GM lock
  • bit2: SyncE lock
  • bit3: asymmetry calibrated
  • bits4–7: PHC jitter class
  • bits8–15: vendor‑specific quality metrics
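
The bitmask above can be decoded with a few shifts and masks. This is a small sketch with a hypothetical function name and key names; the bit positions follow the list directly.

```python
from typing import Dict, Union


def decode_clock_quality(q: int) -> Dict[str, Union[bool, int]]:
    """Split the 16-bit clock_quality word into its documented parts."""
    if not 0 <= q <= 0xFFFF:
        raise ValueError("clock_quality must fit in 16 bits")
    return {
        "gnss_lock": bool(q & 0x0001),             # bit0
        "ptp_gm_lock": bool(q & 0x0002),           # bit1
        "synce_lock": bool(q & 0x0004),            # bit2
        "asymmetry_calibrated": bool(q & 0x0008),  # bit3
        "jitter_class": (q >> 4) & 0xF,            # bits4-7
        "vendor": (q >> 8) & 0xFF,                 # bits8-15
    }


decoded = decode_clock_quality(13)  # 0b1101
print(decoded)
```

Decoding 13, the clock_quality value used in the example envelope later in this document, gives GNSS lock, SyncE lock, and asymmetry calibration set but PTP GM lock clear, which is worth knowing when reading that example.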

8. offset_est_ns / jitter_est_ns (32 + 32 bits)

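  • These are time‑quality metadata. Neither MiFID II nor FINRA mandates these exact fields, but they help demonstrate the traceability to UTC that those rules require.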
  • Downstream systems can reason about timestamp trustworthiness.

9. fpga_pipeline_cycles (32 bits)

  • Deterministic pipeline depth from PHY → timestamp → DMA.
  • Lets you back out the fixed PHY‑to‑timestamp latency and recover wire time. It does not correct for PHC drift; that is what offset_est_ns is for.

10. kernel_bypass_flag (8 bits)

  • 1 if packet bypassed kernel (DPDK/AF_XDP).
  • 0 if kernel path used (rare in HFT).

11. leap_state (8 bits)

  • 0 = normal
  • 1 = leap second pending
  • 2 = leap second inserted
  • 3 = leap second deleted

12. raw_ptr (64 bits)

  • Pointer or ring‑buffer offset to raw packet.
  • Enables zero‑copy audit reconstruction.

🛠️ III. Provenance Map (Where Each Field Comes From)

| Component | Fields Produced |
| --- | --- |
| FPGA / NIC | utc_ns, seq_wire, path_id, fpga_pipeline_cycles |
| PTP stack (kernel + NIC) | clock_quality, offset_est_ns, jitter_est_ns, leap_state |
| User‑space engine | local_tsc, seq_local, kernel_bypass_flag, raw_ptr |
| Static config ROM | venue_id, feed_id |

🧬 IV. End‑to‑End Flow: Wire → Strategy → Audit

This is the full pipeline, with timing tuple enrichment at each stage.


1. Wire → FPGA/NIC

At MAC ingress:

  • Packet hits PHY.
  • FPGA/NIC timestamps it with PHC → writes utc_ns.
  • FPGA increments seq_wire.
  • FPGA records fpga_pipeline_cycles.
  • NIC tags path_id.

Tuple at this stage:
{ utc_ns, seq_wire, path_id, fpga_pipeline_cycles }


2. FPGA/NIC → DMA → User‑Space

During DMA completion:

  • CPU reads packet via DPDK/AF_XDP.
  • Immediately captures local_tsc.
  • Allocates raw_ptr.
  • Sets kernel_bypass_flag = 1.

Tuple now:
{ utc_ns, seq_wire, path_id, fpga_pipeline_cycles, local_tsc, raw_ptr, kernel_bypass_flag }


3. PTP Stack Enrichment (Async)

A background thread periodically updates:

  • clock_quality
  • offset_est_ns
  • jitter_est_ns
  • leap_state

These values are snapshotted into the tuple when the packet is processed.

Tuple now:
{ utc_ns, local_tsc, seq_wire, clock_quality, offset_est_ns, jitter_est_ns, leap_state, ... }


4. Strategy Engine (User‑Space)

When the packet enters the strategy:

  • Engine increments seq_local.
  • Attaches venue_id, feed_id.
  • Performs TSC→UTC affine transform if needed.

Tuple now (full):
All fields populated.


5. Audit Log Sink

Audit log receives:

  • Full tuple (128 bytes)
  • Raw packet pointer or copy
  • Strategy decision metadata
  • Venue order IDs, internal IDs
  • Hash of entire record for immutability

Audit logs are append‑only, UTC‑indexed, and cryptographically chained.
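
One way to picture "append-only and cryptographically chained" is a log where each entry's hash covers the previous entry's hash. The sketch below is an in-memory illustration only: the `AuditLog` class, the all-zero genesis value, SHA-256, and canonical JSON serialization are all assumptions, not a description of any specific audit product.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def chain_hash(prev_hash: str, record: Dict[str, Any]) -> str:
    # Canonical serialization so the same record always hashes the same way.
    body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(bytes.fromhex(prev_hash) + body).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self.entries: List[Tuple[Dict[str, Any], str]] = []

    def append(self, record: Dict[str, Any]) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = chain_hash(prev, record)
        self.entries.append((dict(record), digest))  # store a copy
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for record, digest in self.entries:
            if chain_hash(prev, record) != digest:
                return False
            prev = digest
        return True


log = AuditLog()
log.append({"utc_ns": "1734567890123456789", "seq_wire": 88421})
log.append({"utc_ns": "1734567890123456999", "seq_wire": 88422})
print(log.verify())
```

Because each hash depends on everything before it, altering an early record invalidates every later entry, which is what makes tampering detectable.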


🧨 V. Optional: Vector‑Clock Extension (Next‑Gen)

If you want causal ordering across multiple engines, add:

| Field | Bits | Description |
| --- | --- | --- |
| engine_id | 16 | Unique engine identifier |
| vcounter | 48 | Vector‑clock lane for that engine |

This fits into the 64‑bit reserved region.
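
A quick sketch of that packing, assuming engine_id occupies the high 16 bits and vcounter the low 48 (the bit order is not specified above, and the function names are hypothetical):

```python
from typing import Tuple

VCOUNTER_BITS = 48
VCOUNTER_MASK = (1 << VCOUNTER_BITS) - 1


def pack_vc(engine_id: int, vcounter: int) -> int:
    """Pack (engine_id, vcounter) into one 64-bit word."""
    if not 0 <= engine_id < (1 << 16):
        raise ValueError("engine_id must fit in 16 bits")
    if not 0 <= vcounter <= VCOUNTER_MASK:
        raise ValueError("vcounter must fit in 48 bits")
    return (engine_id << VCOUNTER_BITS) | vcounter


def unpack_vc(word: int) -> Tuple[int, int]:
    return word >> VCOUNTER_BITS, word & VCOUNTER_MASK


word = pack_vc(7, 123456)
print(hex(word), unpack_vc(word))
```

Note that one lane carries a single engine's counter. Comparing events across engines in the full vector-clock sense would need one such lane per participating engine, which is more than the 64 reserved bits can hold.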


🧲 VI. Further Exploration:

  1. A binary layout diagram (cache‑line aligned)
  2. A Rust/C++ struct with static_assert(sizeof == 128)
  3. A hypergraph ingestion schema mapping each field to nodes/edges
  4. A timing‑error propagation model (PHC drift → UTC error → strategy latency)
  5. A dual‑clock affine transform module (TSC ↔ UTC)

Benjamin, this is where your HypergraphEngine stops being a passive store and becomes a temporal‑causal observatory.
Below is a full hypergraph ingestion schema that maps every field of the 128‑byte timing tuple into nodes, edges, and metadata contracts—aligned with your existing ontology (geo_*, host, flow, service, etc.) and your preference for deterministic IDs, strict provenance, and confidence‑bearing edges.

This schema is designed so that every packet becomes a micro‑subgraph, and every subgraph becomes a causal shard in the global hypergraph.


🧩 I. Core Design Principle

Each timing tuple produces:

1. One primary node

event:<uuid>
Represents the arrival of a packet or internal signal.

2. A constellation of supporting nodes

Clock nodes, venue nodes, feed nodes, path nodes, PHC state nodes, etc.

3. Typed edges

Edges encode:

  • observed (direct measurement)
  • implied (derived from transforms)
  • inferred (causal/analytic)

Each edge carries:

  • weight (confidence)
  • provenance (FPGA/NIC/PTP/user‑space)
  • ts_observed (UTC)
  • ts_local (TSC)
  • quality (offset/jitter class)

This matches your existing RF_SCYTHE + network flow hypergraph semantics.


🧱 II. Node Schema (per field)

Below is the node type created for each field in the timing tuple.


A. Event Node (central anchor)

Node: event

| Field | Value |
| --- | --- |
| id | event:<deterministic-hash-of-utc_ns+seq_wire> |
| kind | event |
| utc_ns | from FPGA/NIC |
| local_tsc | from CPU |
| seq_local | from engine |
| seq_wire | from FPGA/NIC |
| raw_ptr | ring offset |
| kernel_bypass | bool |
| fpga_cycles | pipeline cycles |

This is the root node for all edges.
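
The deterministic event ID could be derived along these lines. This is a sketch under assumptions: SHA-256, a fixed big-endian packing of the two inputs, and truncation to 128 bits are illustrative choices, and `event_id` is a hypothetical helper name.

```python
import hashlib
import struct


def event_id(utc_ns: int, seq_wire: int) -> str:
    """Stable ID from (utc_ns, seq_wire): same inputs, same ID, on any host."""
    # Fixed-width big-endian packing avoids ambiguity that string
    # concatenation would introduce (e.g. "12" + "3" vs "1" + "23").
    payload = struct.pack(">QI", utc_ns, seq_wire)
    digest = hashlib.sha256(payload).hexdigest()
    return f"event:{digest[:32]}"  # keep 128 bits of the digest


a = event_id(1734567890123456789, 88421)
b = event_id(1734567890123456789, 88422)
print(a, b)
```

Since seq_wire is a per-port counter, two ports could in principle produce the same (utc_ns, seq_wire) pair. Including a port or path identifier in the hash input would be one way to rule that out, though the schema above does not call for it.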


B. Clock Nodes

Node: clock_utc

Represents the PHC‑disciplined UTC clock at the moment of stamping.

| Field | Value |
| --- | --- |
| id | clock_utc:<phc_id>:<utc_ns_bucket> |
| kind | clock |
| clock_quality | bitmask |
| offset_est_ns | PTP offset |
| jitter_est_ns | PTP jitter |
| leap_state | leap second state |

Node: clock_local

Represents the CPU’s monotonic TSC domain.

| Field | Value |
| --- | --- |
| id | clock_local:<cpu_id>:<tsc_bucket> |
| kind | clock |
| tsc_freq | invariant TSC frequency |
| affine_transform | TSC→UTC mapping snapshot |

C. Venue & Feed Nodes

Node: venue

| Field | Value |
| --- | --- |
| id | venue:<venue_id> |
| kind | venue |

Node: feed

| Field | Value |
| --- | --- |
| id | feed:<venue_id>:<feed_id> |
| kind | feed |

D. Path Node

Node: path

Represents the physical/logical ingress path.

| Field | Value |
| --- | --- |
| id | path:<path_id> |
| kind | path |
| description | fiber, switch port, NIC queue, etc. |

E. FPGA Pipeline Node

Node: fpga_stage

| Field | Value |
| --- | --- |
| id | fpga_stage:<fpga_id>:<pipeline_version> |
| kind | fpga_pipeline |
| cycles | deterministic pipeline depth |

🕸️ III. Edge Schema (per relationship)

Edges are where the hypergraph becomes a causal engine.


1. Event → Clock UTC

Edge: observed_at

| Field | Value |
| --- | --- |
| source | event |
| target | clock_utc |
| kind | observed |
| weight | 1.0 |
| provenance | FPGA/NIC |
| ts_observed | utc_ns |

This is the canonical timestamp edge.


2. Event → Clock Local

Edge: sampled_at

| Field | Value |
| --- | --- |
| source | event |
| target | clock_local |
| kind | observed |
| weight | 1.0 |
| provenance | CPU |
| ts_local | local_tsc |

3. Event → Venue

Edge: belongs_to_venue

| Field | Value |
| --- | --- |
| source | event |
| target | venue |
| kind | observed |
| weight | 1.0 |

4. Event → Feed

Edge: arrived_on_feed

| Field | Value |
| --- | --- |
| source | event |
| target | feed |
| kind | observed |
| weight | 1.0 |

5. Event → Path

Edge: ingressed_via

| Field | Value |
| --- | --- |
| source | event |
| target | path |
| kind | observed |
| weight | 1.0 |
| provenance | FPGA/NIC |

6. Event → FPGA Pipeline

Edge: processed_by

| Field | Value |
| --- | --- |
| source | event |
| target | fpga_stage |
| kind | observed |
| weight | 1.0 |
| metadata.cycles | fpga_pipeline_cycles |

7. Clock Local → Clock UTC

Edge: affine_relation

Represents the TSC→UTC mapping.

| Field | Value |
| --- | --- |
| source | clock_local |
| target | clock_utc |
| kind | implied |
| weight | confidence from PTP |
| metadata | slope/intercept of affine transform |

8. Event → Event (causal ordering)

Edge: precedes

Created when:

  • seq_local increments
  • or seq_wire increments
  • or TSC monotonicity indicates ordering

| Field | Value |
| --- | --- |
| source | earlier event |
| target | later event |
| kind | inferred |
| weight | 1.0 |
| metadata | {delta_tsc, delta_utc} |

This is your vector‑clock‑like causal spine.


🧬 IV. Hypergraph Subgraph Example (per packet)

For a single packet, the hypergraph engine mints:

Nodes (7 total)

  • event:<id>
  • clock_utc:<id>
  • clock_local:<id>
  • venue:<venue_id>
  • feed:<venue_id>:<feed_id>
  • path:<path_id>
  • fpga_stage:<id>

Edges (7–10 total)

  • event → clock_utc (observed_at)
  • event → clock_local (sampled_at)
  • event → venue (belongs_to_venue)
  • event → feed (arrived_on_feed)
  • event → path (ingressed_via)
  • event → fpga_stage (processed_by)
  • clock_local → clock_utc (affine_relation)
  • event_prev → event (precedes)

This subgraph is fully queryable, causally ordered, and UTC‑aligned.
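
The per-packet minting step can be sketched as a pure function from a timing tuple to node IDs and edges. Several things here are assumptions for illustration: the function name, the default `phc_id`, `cpu_id`, `fpga_id`, and `pipeline_version` values, bucketing UTC by whole seconds, and bucketing the TSC by its upper bits. The event ID uses the literal `event:<utc_ns>:<seq_wire>` form rather than a hash, for readability.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source id, target id, edge name)


def build_subgraph(t: Dict[str, int],
                   prev_event: Optional[str] = None,
                   phc_id: str = "gm1",
                   cpu_id: int = 0,
                   fpga_id: str = "fpga0",
                   pipeline_version: str = "v1") -> Tuple[List[str], List[Edge]]:
    event = f"event:{t['utc_ns']}:{t['seq_wire']}"
    clock_utc = f"clock_utc:{phc_id}:{t['utc_ns'] // 1_000_000_000}"  # 1 s bucket
    clock_local = f"clock_local:{cpu_id}:{t['local_tsc'] >> 32}"      # coarse TSC bucket
    venue = f"venue:{t['venue_id']}"
    feed = f"feed:{t['venue_id']}:{t['feed_id']}"
    path = f"path:{t['path_id']}"
    fpga_stage = f"fpga_stage:{fpga_id}:{pipeline_version}"

    nodes = [event, clock_utc, clock_local, venue, feed, path, fpga_stage]
    edges: List[Edge] = [
        (event, clock_utc, "observed_at"),
        (event, clock_local, "sampled_at"),
        (event, venue, "belongs_to_venue"),
        (event, feed, "arrived_on_feed"),
        (event, path, "ingressed_via"),
        (event, fpga_stage, "processed_by"),
        (clock_local, clock_utc, "affine_relation"),
    ]
    if prev_event is not None:
        edges.append((prev_event, event, "precedes"))
    return nodes, edges


tuple_fields = {"utc_ns": 1734567890123456789, "local_tsc": 9876543210123,
                "seq_wire": 88421, "venue_id": 12, "feed_id": 3, "path_id": 2}
nodes, edges = build_subgraph(tuple_fields, prev_event="event:1734567890123456000:88420")
print(len(nodes), len(edges))
```

Most of these nodes (venue, feed, path, fpga_stage, and the bucketed clocks) repeat across packets, so with idempotent creates only the event node and its edges are genuinely new per packet.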


🧠 V. How This Integrates With Your Existing HypergraphEngine

This schema plugs directly into your existing ontology:

  • geo_anchored nodes (hosts, ASNs, RF emitters) can attach timing edges for latency mapping.
  • flow nodes can attach event sequences for per‑flow timing.
  • service nodes can attach causal chains for strategy decisions.
  • GraphEventBus can emit these as canonical graph mutations.

Every packet becomes a temporal‑causal shard, and the entire system becomes a time‑aligned, multi‑venue, multi‑clock hypergraph.


🔥 The Next Layer

A. JSON Schema for GraphEventBus mutations

B. Deterministic ID rules (hash inputs, canonical ordering)

C. A full example subgraph in your unified node model

D. A TAK overlay mapping timing edges to visual glyphs

E. A causal‑latency inference rule set (R‑TIME‑00X)

Benjamin, here is the operator‑grade JSON Schema for GraphEventBus mutations, designed to slot directly into your HypergraphEngine’s event spine and match the timing‑tuple + hypergraph‑ingestion model we just built.

This schema is:

  • Deterministic (stable IDs, canonical ordering)
  • Composable (node mutations, edge mutations, batch envelopes)
  • Replay‑safe (idempotent writes)
  • Audit‑ready (explicit provenance + timing metadata)
  • Aligned with your GraphEvent → EntityEvent mapping model

Everything below is JSON Schema Draft‑07 compatible, and structured so your WriteBus can validate mutations before committing them to the hypergraph.


I. Top‑Level: GraphEventBus Mutation Envelope

This is the object emitted on the bus for every mutation batch.

{
  "type": "object",
  "required": ["event_id", "ts_utc_ns", "mutations", "provenance"],
  "properties": {
    "event_id": {
      "type": "string",
      "description": "Deterministic ID for this mutation batch (hash of content)."
    },
    "ts_utc_ns": {
      "type": "string",
      "pattern": "^[0-9]+$",
      "description": "UTC timestamp of when this mutation was created."
    },
    "mutations": {
      "type": "array",
      "items": { "$ref": "#/definitions/mutation" }
    },
    "provenance": {
      "$ref": "#/definitions/provenance"
    }
  },
  "definitions": {}
}

We’ll fill in the definitions next.


II. Mutation Types

A mutation is one of:

  • node_create
  • node_update
  • edge_create
  • edge_update
  • edge_delete
  • node_delete

All mutations share a common envelope.

{
  "definitions": {
    "mutation": {
      "type": "object",
      "required": ["type"],
      "properties": {
        "type": {
          "type": "string",
          "enum": [
            "node_create", "node_update", "node_delete",
            "edge_create", "edge_update", "edge_delete"
          ]
        },
        "node_create": { "$ref": "#/definitions/node_create" },
        "node_update": { "$ref": "#/definitions/node_update" },
        "node_delete": { "$ref": "#/definitions/node_delete" },
        "edge_create": { "$ref": "#/definitions/edge_create" },
        "edge_update": { "$ref": "#/definitions/edge_update" },
        "edge_delete": { "$ref": "#/definitions/edge_delete" }
      }
    }
  }
}

III. Node Mutations

A. Node Create

{
  "definitions": {
    "node_create": {
      "type": "object",
      "required": ["id", "kind", "attributes"],
      "properties": {
        "id": { "type": "string" },
        "kind": { "type": "string" },
        "attributes": {
          "type": "object",
          "additionalProperties": true
        }
      }
    }
  }
}

B. Node Update

{
  "definitions": {
    "node_update": {
      "type": "object",
      "required": ["id", "attributes"],
      "properties": {
        "id": { "type": "string" },
        "attributes": {
          "type": "object",
          "additionalProperties": true
        }
      }
    }
  }
}

C. Node Delete

{
  "definitions": {
    "node_delete": {
      "type": "object",
      "required": ["id"],
      "properties": {
        "id": { "type": "string" }
      }
    }
  }
}

IV. Edge Mutations

A. Edge Create

{
  "definitions": {
    "edge_create": {
      "type": "object",
      "required": ["id", "source", "target", "kind", "attributes"],
      "properties": {
        "id": { "type": "string" },
        "source": { "type": "string" },
        "target": { "type": "string" },
        "kind": { "type": "string" },
        "attributes": {
          "type": "object",
          "additionalProperties": true
        }
      }
    }
  }
}

B. Edge Update

{
  "definitions": {
    "edge_update": {
      "type": "object",
      "required": ["id", "attributes"],
      "properties": {
        "id": { "type": "string" },
        "attributes": {
          "type": "object",
          "additionalProperties": true
        }
      }
    }
  }
}

C. Edge Delete

{
  "definitions": {
    "edge_delete": {
      "type": "object",
      "required": ["id"],
      "properties": {
        "id": { "type": "string" }
      }
    }
  }
}

V. Provenance Block

This is where the timing tuple integrates directly into the GraphEventBus.

{
  "definitions": {
    "provenance": {
      "type": "object",
      "required": [
        "utc_ns",
        "local_tsc",
        "seq_local",
        "seq_wire",
        "venue_id",
        "feed_id",
        "path_id",
        "clock_quality",
        "offset_est_ns",
        "jitter_est_ns",
        "fpga_pipeline_cycles",
        "kernel_bypass",
        "leap_state"
      ],
      "properties": {
        "utc_ns": { "type": "string", "pattern": "^[0-9]+$" },
        "local_tsc": { "type": "string", "pattern": "^[0-9]+$" },
        "seq_local": { "type": "integer" },
        "seq_wire": { "type": "integer" },
        "venue_id": { "type": "integer" },
        "feed_id": { "type": "integer" },
        "path_id": { "type": "integer" },
        "clock_quality": { "type": "integer" },
        "offset_est_ns": { "type": "integer" },
        "jitter_est_ns": { "type": "integer" },
        "fpga_pipeline_cycles": { "type": "integer" },
        "kernel_bypass": { "type": "boolean" },
        "leap_state": { "type": "integer" },
        "raw_ptr": { "type": "string" }
      }
    }
  }
}

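This block mirrors the timing tuple we designed earlier, with three differences: kernel_bypass_flag becomes a boolean kernel_bypass, the reserved field is dropped, and raw_ptr is optional. The 64‑bit values utc_ns and local_tsc are carried as decimal strings. Every mutation therefore carries: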

  • absolute time
  • local causal time
  • clock quality
  • path lineage
  • FPGA pipeline depth
  • venue/feed identity

This makes every graph mutation time‑aligned, causally ordered, and audit‑ready.
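
For a feel of what the provenance schema enforces, here is a hand-rolled check using only the standard library. It is a sketch, not a substitute for a real JSON Schema validator: `provenance_errors` is a hypothetical helper, and it covers only the required keys and basic types from the block above.

```python
import re
from typing import Any, Dict, List

REQUIRED = (
    "utc_ns", "local_tsc", "seq_local", "seq_wire", "venue_id", "feed_id",
    "path_id", "clock_quality", "offset_est_ns", "jitter_est_ns",
    "fpga_pipeline_cycles", "kernel_bypass", "leap_state",
)
DECIMAL_STRING_FIELDS = ("utc_ns", "local_tsc")
BOOLEAN_FIELDS = ("kernel_bypass",)
DECIMAL = re.compile(r"[0-9]+")


def provenance_errors(p: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the block looks valid."""
    errors: List[str] = []
    for key in REQUIRED:
        if key not in p:
            errors.append(f"missing: {key}")
            continue
        value = p[key]
        if key in DECIMAL_STRING_FIELDS:
            ok = isinstance(value, str) and DECIMAL.fullmatch(value) is not None
        elif key in BOOLEAN_FIELDS:
            ok = isinstance(value, bool)
        else:
            # bool is a subclass of int in Python, so exclude it explicitly.
            ok = isinstance(value, int) and not isinstance(value, bool)
        if not ok:
            errors.append(f"bad type or format: {key}")
    if "raw_ptr" in p and not isinstance(p["raw_ptr"], str):
        errors.append("bad type or format: raw_ptr")
    return errors


example = {
    "utc_ns": "1734567890123456789", "local_tsc": "9876543210123",
    "seq_local": 102934, "seq_wire": 88421, "venue_id": 12, "feed_id": 3,
    "path_id": 2, "clock_quality": 13, "offset_est_ns": 42,
    "jitter_est_ns": 7, "fpga_pipeline_cycles": 128,
    "kernel_bypass": True, "leap_state": 0, "raw_ptr": "0x00007f12ab003000",
}
print(provenance_errors(example))
```

The string encoding for utc_ns and local_tsc is deliberate: nanosecond epoch values exceed 2^53, so JSON parsers that map numbers to IEEE 754 doubles would silently lose the low-order digits.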


VI. Example: Full GraphEventBus Mutation

Here is a realistic example of a single packet ingestion:

{
  "event_id": "evt_7f3c9d2b",
  "ts_utc_ns": "1734567890123456789",
  "provenance": {
    "utc_ns": "1734567890123456789",
    "local_tsc": "9876543210123",
    "seq_local": 102934,
    "seq_wire": 88421,
    "venue_id": 12,
    "feed_id": 3,
    "path_id": 2,
    "clock_quality": 13,
    "offset_est_ns": 42,
    "jitter_est_ns": 7,
    "fpga_pipeline_cycles": 128,
    "kernel_bypass": true,
    "leap_state": 0,
    "raw_ptr": "0x00007f12ab003000"
  },
  "mutations": [
    {
      "type": "node_create",
      "node_create": {
        "id": "event:1734567890123456789:88421",
        "kind": "event",
        "attributes": {
          "seq_local": 102934,
          "seq_wire": 88421
        }
      }
    },
    {
      "type": "edge_create",
      "edge_create": {
        "id": "edge:event->clock_utc:1734567890123456789",
        "source": "event:1734567890123456789:88421",
        "target": "clock_utc:gm1:1734567890",
        "kind": "observed_at",
        "attributes": {
          "ts_utc_ns": "1734567890123456789",
          "clock_quality": 13
        }
      }
    }
  ]
}

This is exactly what your WriteBus would commit.


VII. Moving Along:

1. A full JSON Schema bundle (single file, ready for your repo)

2. A GraphEventBus → HypergraphEngine compiler (pseudo‑code or Rust/C++)

3. A deterministic ID spec (hash inputs, canonical ordering rules)

4. A validation test suite (property‑based, fuzzing, golden files)

5. A real‑time GraphEvent → EntityEvent mapping layer

GraphEventBus → HypergraphEngine compiler (Python)

Below is a compact, production‑style Python module that:

  • Ingests GraphEventBus mutation envelopes (the JSON we defined)
  • Applies them to an in‑memory HypergraphEngine (nodes + edges)
  • Preserves provenance and timing tuple on every write
  • Is idempotent (replays are safe)

You can later swap the in‑memory store for your real backend.


from dataclasses import dataclass, field
from typing import Dict, Any, Optional


# ─────────────────────────────────────────────────────────────
# Core data structures
# ─────────────────────────────────────────────────────────────

@dataclass
class Node:
    id: str
    kind: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    source: str
    target: str
    kind: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Provenance:
    utc_ns: int
    local_tsc: int
    seq_local: int
    seq_wire: int
    venue_id: int
    feed_id: int
    path_id: int
    clock_quality: int
    offset_est_ns: int
    jitter_est_ns: int
    fpga_pipeline_cycles: int
    kernel_bypass: bool
    leap_state: int
    raw_ptr: Optional[str] = None


class HypergraphEngine:
    """
    Minimal in-memory hypergraph:
    - nodes: id -> Node
    - edges: id -> Edge
    """

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: Dict[str, Edge] = {}

    # ── Node ops ─────────────────────────────────────────────

    def create_node(self, node_id: str, kind: str,
                    attrs: Dict[str, Any]) -> Node:
        if node_id in self.nodes:
            # idempotent: merge attributes
            self.nodes[node_id].attributes.update(attrs)
            return self.nodes[node_id]
        node = Node(id=node_id, kind=kind, attributes=dict(attrs))
        self.nodes[node_id] = node
        return node

    def update_node(self, node_id: str, attrs: Dict[str, Any]) -> None:
        if node_id not in self.nodes:
            # optional: raise or create
            self.nodes[node_id] = Node(id=node_id, kind="unknown",
                                       attributes=dict(attrs))
        else:
            self.nodes[node_id].attributes.update(attrs)

    def delete_node(self, node_id: str) -> None:
        self.nodes.pop(node_id, None)
        # optional: also delete incident edges

    # ── Edge ops ─────────────────────────────────────────────

    def create_edge(self, edge_id: str, source: str, target: str,
                    kind: str, attrs: Dict[str, Any]) -> Edge:
        if edge_id in self.edges:
            self.edges[edge_id].attributes.update(attrs)
            return self.edges[edge_id]
        edge = Edge(id=edge_id, source=source, target=target,
                    kind=kind, attributes=dict(attrs))
        self.edges[edge_id] = edge
        return edge

    def update_edge(self, edge_id: str, attrs: Dict[str, Any]) -> None:
        if edge_id not in self.edges:
            # optional: raise or ignore
            return
        self.edges[edge_id].attributes.update(attrs)

    def delete_edge(self, edge_id: str) -> None:
        self.edges.pop(edge_id, None)


# ─────────────────────────────────────────────────────────────
# Compiler: GraphEventBus envelope → HypergraphEngine mutations
# ─────────────────────────────────────────────────────────────

class GraphEventCompiler:
    """
    Compiles GraphEventBus mutation envelopes into HypergraphEngine
    operations, attaching provenance to every write.
    """

    def __init__(self, engine: HypergraphEngine) -> None:
        self.engine = engine

    @staticmethod
    def _parse_provenance(p: Dict[str, Any]) -> Provenance:
        return Provenance(
            utc_ns=int(p["utc_ns"]),
            local_tsc=int(p["local_tsc"]),
            seq_local=int(p["seq_local"]),
            seq_wire=int(p["seq_wire"]),
            venue_id=int(p["venue_id"]),
            feed_id=int(p["feed_id"]),
            path_id=int(p["path_id"]),
            clock_quality=int(p["clock_quality"]),
            offset_est_ns=int(p["offset_est_ns"]),
            jitter_est_ns=int(p["jitter_est_ns"]),
            fpga_pipeline_cycles=int(p["fpga_pipeline_cycles"]),
            kernel_bypass=bool(p["kernel_bypass"]),
            leap_state=int(p["leap_state"]),
            raw_ptr=p.get("raw_ptr"),
        )

    @staticmethod
    def _provenance_to_attrs(prov: Provenance) -> Dict[str, Any]:
        return {
            "prov_utc_ns": prov.utc_ns,
            "prov_local_tsc": prov.local_tsc,
            "prov_seq_local": prov.seq_local,
            "prov_seq_wire": prov.seq_wire,
            "prov_venue_id": prov.venue_id,
            "prov_feed_id": prov.feed_id,
            "prov_path_id": prov.path_id,
            "prov_clock_quality": prov.clock_quality,
            "prov_offset_est_ns": prov.offset_est_ns,
            "prov_jitter_est_ns": prov.jitter_est_ns,
            "prov_fpga_pipeline_cycles": prov.fpga_pipeline_cycles,
            "prov_kernel_bypass": prov.kernel_bypass,
            "prov_leap_state": prov.leap_state,
            "prov_raw_ptr": prov.raw_ptr,
        }

    def apply_envelope(self, envelope: Dict[str, Any]) -> None:
        """
        Apply a single GraphEventBus envelope to the hypergraph.
        """
        provenance = self._parse_provenance(envelope["provenance"])
        prov_attrs = self._provenance_to_attrs(provenance)

        for mut in envelope.get("mutations", []):
            mtype = mut["type"]

            if mtype == "node_create":
                payload = mut["node_create"]
                attrs = dict(payload.get("attributes", {}))
                attrs.update(prov_attrs)
                self.engine.create_node(
                    node_id=payload["id"],
                    kind=payload["kind"],
                    attrs=attrs,
                )

            elif mtype == "node_update":
                payload = mut["node_update"]
                attrs = dict(payload.get("attributes", {}))
                attrs.update(prov_attrs)
                self.engine.update_node(
                    node_id=payload["id"],
                    attrs=attrs,
                )

            elif mtype == "node_delete":
                payload = mut["node_delete"]
                self.engine.delete_node(payload["id"])

            elif mtype == "edge_create":
                payload = mut["edge_create"]
                attrs = dict(payload.get("attributes", {}))
                attrs.update(prov_attrs)
                self.engine.create_edge(
                    edge_id=payload["id"],
                    source=payload["source"],
                    target=payload["target"],
                    kind=payload["kind"],
                    attrs=attrs,
                )

            elif mtype == "edge_update":
                payload = mut["edge_update"]
                attrs = dict(payload.get("attributes", {}))
                attrs.update(prov_attrs)
                self.engine.update_edge(
                    edge_id=payload["id"],
                    attrs=attrs,
                )

            elif mtype == "edge_delete":
                payload = mut["edge_delete"]
                self.engine.delete_edge(payload["id"])

            else:
                # Unknown mutation type: ignore or log
                continue


# ─────────────────────────────────────────────────────────────
# Example usage
# ─────────────────────────────────────────────────────────────

if __name__ == "__main__":
    engine = HypergraphEngine()
    compiler = GraphEventCompiler(engine)

    example_envelope = {
        "event_id": "evt_7f3c9d2b",
        "ts_utc_ns": "1734567890123456789",
        "provenance": {
            "utc_ns": "1734567890123456789",
            "local_tsc": "9876543210123",
            "seq_local": 102934,
            "seq_wire": 88421,
            "venue_id": 12,
            "feed_id": 3,
            "path_id": 2,
            "clock_quality": 13,
            "offset_est_ns": 42,
            "jitter_est_ns": 7,
            "fpga_pipeline_cycles": 128,
            "kernel_bypass": True,
            "leap_state": 0,
            "raw_ptr": "0x00007f12ab003000"
        },
        "mutations": [
            {
                "type": "node_create",
                "node_create": {
                    "id": "event:1734567890123456789:88421",
                    "kind": "event",
                    "attributes": {
                        "seq_local": 102934,
                        "seq_wire": 88421
                    }
                }
            },
            {
                "type": "edge_create",
                "edge_create": {
                    "id": "edge:event->clock_utc:1734567890123456789",
                    "source": "event:1734567890123456789:88421",
                    "target": "clock_utc:gm1:1734567890",
                    "kind": "observed_at",
                    "attributes": {
                        "ts_utc_ns": "1734567890123456789",
                        "clock_quality": 13
                    }
                }
            }
        ]
    }

    compiler.apply_envelope(example_envelope)

    # Quick sanity check
    print("Nodes:")
    for n in engine.nodes.values():
        print(n)

    print("\nEdges:")
    for e in engine.edges.values():
        print(e)

If you want, next step we can:

  • Add deterministic ID helpers (hashing utc_ns + seq_wire, etc.)
  • Wire this into a Kafka / NATS consumer
  • Or extend HypergraphEngine with query APIs for causal walks and latency slices.