NCDB Coverage File Format

NCDB (Native Coverage DataBase) is a compact, ZIP-based binary format for storing and merging UCIS coverage data. A single .cdb file is a standard ZIP archive whose members encode the scope hierarchy, hit counts, test history, and source file references.

The format is designed to be:

Space-efficient — typically 100–200× smaller than the equivalent SQLite .cdb (see 13. Size and performance reference).
Merge-fast — same-schema merges reduce to element-wise integer addition over a flat array, with no SQL overhead.
Self-describing — a manifest.json at the root of the archive carries all metadata needed to read or merge the file without any external schema.
Readable without PyUCIS — every binary encoding is documented here in sufficient detail to write an independent parser.

1. File identification 

Both NCDB and the legacy SQLite backend use the .cdb extension. Format discrimination is done by inspecting the first 16 bytes of the file.

Format	Header (hex)	Description
SQLite	`53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00`	Literal ASCII `SQLite format 3\x00`
NCDB (non-empty)	`50 4B 03 04 …`	ZIP local-file header signature `PK\x03\x04`
NCDB (empty archive)	`50 4B 05 06 …`	ZIP end-of-central-directory signature `PK\x05\x06`

Detection algorithm:

Read the first 16 bytes of the file.
If bytes[0:16] equals the SQLite magic string → format is sqlite.
If bytes[0:4] is PK\x03\x04 or PK\x05\x06:
1. Open as ZIP.
2. Read manifest.json.
3. If manifest["format"] == "NCDB" → format is ncdb.
Otherwise → format is unknown.
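
The detection steps above can be sketched as follows. This is a minimal illustration rather than the PyUCIS implementation; the function name `detect_cdb_format` is hypothetical, and treating a ZIP with a missing or unparsable manifest.json as unknown is an assumption.

```python
import json
import zipfile

SQLITE_MAGIC = b"SQLite format 3\x00"          # 16-byte SQLite header
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06")    # non-empty / empty ZIP archive

def detect_cdb_format(path):
    """Return "sqlite", "ncdb", or "unknown" for the file at path."""
    with open(path, "rb") as f:
        head = f.read(16)
    if head == SQLITE_MAGIC:
        return "sqlite"
    if head[:4] in ZIP_MAGICS:
        try:
            with zipfile.ZipFile(path) as zf:
                manifest = json.loads(zf.read("manifest.json"))
        except (zipfile.BadZipFile, KeyError, ValueError):
            # Assumption: a ZIP without a readable manifest is "unknown".
            return "unknown"
        if isinstance(manifest, dict) and manifest.get("format") == "NCDB":
            return "ncdb"
    return "unknown"
```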

2. Archive structure 

An NCDB file is a standard ZIP archive (DEFLATE compression) whose members are named as follows. Members marked required must be present in every valid NCDB file; others are only written when the corresponding data is non-empty or non-default.

Member name	Required	Contents
`manifest.json`	✓	Format identity, version, statistics, and the schema hash.
`strings.bin`	✓	Deduplicated string table referenced by index throughout other members.
`scope_tree.bin`	✓	DFS-serialized scope hierarchy (V2 encoding). Counts are not stored here.
`counts.bin`	✓	Flat array of hit counts in the same DFS order as `scope_tree.bin`.
`history.json`	✓	Array of test-run and merge history records.
`sources.json`	✓	Ordered list of source file paths; indices match file IDs in `scope_tree.bin`.
`attrs.bin`	—	User-defined attribute assignments (V2 JSON: scopes, coveritems, history nodes, and global attributes).
`tags.json`	—	Tag assignments for scopes (sparse, DFS-indexed).
`toggle.bin`	—	Per-signal toggle metadata (JSON: canonical name, metric, type, direction).
`fsm.bin`	—	FSM state-index overrides (JSON, sparse; only written when state indices differ from the default 0, 1, 2, … sequence).
`cross.bin`	—	Cross-coverpoint link records (JSON: crossed coverpoint sibling names).
`properties.json`	—	Typed string property values (DFS scope-indexed).
`design_units.json`	—	Design-unit name-to-DFS-index lookup table (name, index, scope type).
`formal.bin`	—	Formal-verification assertion data (JSON: status, radius, witness).
`coveritem_flags.bin`	—	Per-coveritem non-default flags (sparse delta-encoded binary).
`contrib/<hist_idx>.bin`	—	Per-test coveritem contribution arrays (delta-encoded, sparse). One file per history node that has contributions; `<hist_idx>` is the integer history-node index (not zero-padded).

3. Primitive encodings 

3.1 Unsigned LEB128 varint 

All variable-length integers in NCDB are encoded as unsigned LEB128 (also called unsigned varint or ULEB128). This is the same encoding used by DWARF, WebAssembly, and Protocol Buffers (field type uint64).

Encoding:

Take the 7 least-significant bits of the value; set bit 7 to 1 if more bytes follow, 0 if this is the last byte.
Shift the value right by 7. Repeat until the value is zero.

value     bytes (hex)
────────────────────
0         00
1         01
127       7F
128       80 01
255       FF 01
16383     FF 7F
16384     80 80 01
2³²−1     FF FF FF FF 0F
2⁶⁴−1     FF FF FF FF FF FF FF FF FF 01
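
A minimal encoder that follows the two encoding steps and reproduces the table above; the name `encode_varint` is illustrative, not taken from the PyUCIS sources.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F      # 7 least-significant bits
        value >>= 7
        if value:                # more bytes follow: set bit 7
            out.append(low7 | 0x80)
        else:                    # last byte: bit 7 clear
            out.append(low7)
            return bytes(out)
```

For example, `encode_varint(128).hex()` gives `8001`, matching the table row for 128.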

Decoding:

Read bytes one at a time. For each byte, take the low 7 bits and OR them into the accumulator at the current bit position (starting at 0). Advance the bit position by 7. If bit 7 of the byte is set, continue reading; otherwise stop.

def decode_varint(buf: bytes, offset: int = 0):
    result, shift = 0, 0
    while True:
        byte = buf[offset]; offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, offset

3.2 UTF-8 strings 

All text is UTF-8. Strings stored inline (e.g. in JSON members) are standard JSON strings. Strings used by scope_tree.bin are stored once in strings.bin and referenced by their string-table index (a varint).

4. manifest.json 

A JSON object with the following fields (all present; unknown fields must be ignored by readers for forward compatibility):

{
  "format":          "NCDB",
  "version":         "1.0",
  "ucis_version":    "1.0",
  "created":         "2026-02-25T21:00:00Z",
  "path_separator":  "/",
  "scope_count":     42,
  "coveritem_count": 8800,
  "test_count":      64,
  "total_hits":      155432,
  "covered_bins":    7312,
  "schema_hash":     "sha256:a3f1...",
  "generator":       "pyucis-ncdb"
}

Field	Description
`format`	Always the string `"NCDB"`. Readers must reject files where this is not `"NCDB"`.
`version`	Format version string. Currently `"1.0"`. Readers should check the major component; a mismatch should produce a clear error.
`ucis_version`	UCIS standard version the data conforms to. Currently `"1.0"`.
`created`	ISO 8601 UTC timestamp when the file was written.
`path_separator`	Hierarchical path separator used in scope names. Typically `"/"`.
`scope_count`	Total number of scopes in `scope_tree.bin` (informational).
`coveritem_count`	Total number of coveritems. Must equal the length of the array in `counts.bin`.
`test_count`	Number of TEST-kind entries in `history.json`.
`total_hits`	Sum of all values in `counts.bin`.
`covered_bins`	Number of non-zero values in `counts.bin`.
`schema_hash`	`"sha256:"` followed by the lowercase hex SHA-256 digest of the uncompressed `scope_tree.bin` content. Used by the fast-merge path to verify schema identity without parsing the scope tree. (See 10. Merging NCDB files.)
`generator`	Free-form tool identification string.
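
The relationships in the field table (coveritem_count, total_hits, covered_bins, schema_hash) can be cross-checked once counts.bin has been decoded. The sketch below assumes the counts are already available as a list of integers; `compute_schema_hash` and `check_manifest` are hypothetical helper names.

```python
import hashlib

def compute_schema_hash(scope_tree_bytes: bytes) -> str:
    """Hash the uncompressed scope_tree.bin bytes as described for schema_hash."""
    return "sha256:" + hashlib.sha256(scope_tree_bytes).hexdigest()

def check_manifest(manifest: dict, scope_tree_bytes: bytes, counts: list) -> list:
    """Return a list of human-readable problems (empty when consistent)."""
    problems = []
    if manifest.get("format") != "NCDB":
        problems.append("format is not 'NCDB'")
    # Only the major version component is compared, as the table suggests.
    if str(manifest.get("version", "")).split(".")[0] != "1":
        problems.append("unsupported major version")
    expected = {
        "coveritem_count": len(counts),
        "total_hits": sum(counts),
        "covered_bins": sum(1 for c in counts if c != 0),
        "schema_hash": compute_schema_hash(scope_tree_bytes),
    }
    for key, value in expected.items():
        if manifest.get(key) != value:
            problems.append(
                f"{key}: manifest has {manifest.get(key)!r}, expected {value!r}")
    return problems
```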

5. strings.bin 

A deduplicated string table. Every string used anywhere in scope_tree.bin (scope names, coveritem names) is stored exactly once here and referenced by a zero-based integer index.

Binary layout:

[count   : varint]          — number of strings
[len_0   : varint]          — byte length of string 0 (UTF-8 encoded)
[bytes_0 : len_0 bytes]     — UTF-8 bytes of string 0
[len_1   : varint]
[bytes_1 : len_1 bytes]
...

Index 0 is always the empty string "".
String indices are stable: the same string always maps to the same index within a single file (indices are assigned in first-encounter DFS order).
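
As an illustration of the layout, a writer could build the table like this. This is a sketch, not the PyUCIS writer; `build_string_table` is a hypothetical name, and the caller is assumed to pass names in first-encounter DFS order.

```python
def _varint(n: int) -> bytes:
    """Unsigned LEB128 encoding (see 3.1)."""
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        out.append(low7 | 0x80 if n else low7)
        if not n:
            return bytes(out)

def build_string_table(names):
    """Return (strings_bin_bytes, {string: index}) for the given names."""
    index = {"": 0}              # index 0 is always the empty string
    ordered = [""]
    for name in names:
        if name not in index:    # each string is stored exactly once
            index[name] = len(ordered)
            ordered.append(name)
    out = bytearray(_varint(len(ordered)))
    for s in ordered:
        raw = s.encode("utf-8")
        out += _varint(len(raw)) + raw   # byte length, then UTF-8 bytes
    return bytes(out), index
```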

6. scope_tree.bin 

The complete scope hierarchy encoded as a depth-first traversal. The file contains a flat sequence of scope records with no explicit end marker; the count of child scopes embedded in each record defines the nesting.

Counts (hit values) are not stored in this member. Instead, each coveritem encountered during DFS appends its hit count to counts.bin in the same traversal order. A reader reconstructs the association by walking scope_tree.bin and consuming counts from counts.bin in lockstep.

6.1 Scope record types 

Every scope record begins with a one-byte marker:

Marker byte	Name	Description
`0x00`	`REGULAR`	Full scope record with type, name, presence bitfield, and children.
`0x01`	`TOGGLE_PAIR`	Compact 2-field record for BRANCH scopes that carry exactly two TOGGLEBIN coveritems with the implicit names `"0 -> 1"` and `"1 -> 0"`. Saves ~10 bytes per signal.

6.2 REGULAR scope record 

[marker    : 1 byte  ]  always 0x00
[scope_type: varint  ]  ScopeTypeT integer value
[name_ref  : varint  ]  index into strings.bin
[presence  : varint  ]  bitfield of optional fields present (see below)

— optional fields, each present only if the corresponding bit is set —
[flags       : varint  ]  only if PRESENCE_FLAGS       (bit 0) set
[file_id     : varint  ]  only if PRESENCE_SOURCE      (bit 1) set
[line        : varint  ]     "
[token       : varint  ]     "
[weight      : varint  ]  only if PRESENCE_WEIGHT      (bit 2) set
[at_least    : varint  ]  only if PRESENCE_AT_LEAST    (bit 3) set
[goal        : varint  ]  only if PRESENCE_GOAL        (bit 5) set
[source_type : varint  ]  only if PRESENCE_SOURCE_TYPE (bit 6) set

— always present —
[num_children : varint]  number of child scope records that follow
[num_covers   : varint]  number of coveritem records that follow

— present only when num_covers > 0 —
[cover_type   : varint]  CoverTypeT of all coveritems in this scope

— num_covers coveritem records —
[name_ref_ci  : varint]  × num_covers   (one per coveritem)

— num_children child scope records (recursive) —

Presence bitfield values:

Bit	Name	Meaning
0	`PRESENCE_FLAGS`	Non-default scope flags are stored.
1	`PRESENCE_SOURCE`	Source location (`file_id`, `line`, `token`) is stored.
2	`PRESENCE_WEIGHT`	Non-default scope weight (≠ 1) is stored.
3	`PRESENCE_AT_LEAST`	An `at_least` threshold that overrides the cover-type default is stored at the scope level (applies to all coveritems in the scope).
4	`PRESENCE_CVG_OPTS`	Reserved for covergroup options (not yet used by the writer).
5	`PRESENCE_GOAL`	Non-default scope goal (≠ −1) is stored.
6	`PRESENCE_SOURCE_TYPE`	Explicit `SourceT` enum value is stored. When absent, the source type defaults to `SourceT.NONE`.

Cover-type defaults (applied to any value that is not stored explicitly, e.g. at_least when PRESENCE_AT_LEAST is absent):

CoverTypeT	flags default	at_least default	weight default
`CVGBIN`	`0x19`	1	1
All others (TOGGLEBIN, STMTBIN, BRANCHBIN, …)	`0x01`	0	1

6.3 TOGGLE_PAIR record 

[marker   : 1 byte ]  always 0x01
[name_ref : varint ]  scope name index in strings.bin

A TOGGLE_PAIR record implicitly encodes:

Scope type: BRANCH
Two TOGGLEBIN coveritems with names "0 -> 1" and "1 -> 0" (in that order).
Two consecutive entries are consumed from counts.bin: first the "0 -> 1" count, then the "1 -> 0" count.

No child scope records follow a TOGGLE_PAIR.
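
The two record types can be decoded with a short recursive walker. The sketch below is illustrative only: the dictionary layout and function names are hypothetical, and it assumes counts are consumed in the order coveritem records appear in the file (a scope's own coveritems before those of its children).

```python
SCOPE_BRANCH = 30  # BRANCH value from the scope-type table in 6.4

# (presence bit, fields read when that bit is set), in on-disk order
OPTIONAL_FIELDS = [
    (1 << 0, ("flags",)),
    (1 << 1, ("file_id", "line", "token")),
    (1 << 2, ("weight",)),
    (1 << 3, ("at_least",)),
    (1 << 5, ("goal",)),
    (1 << 6, ("source_type",)),
]

def read_varint(buf, pos):
    result, shift = 0, 0
    while True:
        byte = buf[pos]; pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos

def read_scope(buf, pos, strings, counts, ci):
    """Decode one scope record and its subtree.

    Returns (scope_dict, next_pos, next_ci); ci indexes the next
    unconsumed entry of the decoded counts.bin array.
    """
    marker = buf[pos]; pos += 1
    if marker == 0x01:  # TOGGLE_PAIR: implicit names, two counts
        name_ref, pos = read_varint(buf, pos)
        covers = [("0 -> 1", counts[ci]), ("1 -> 0", counts[ci + 1])]
        scope = {"type": SCOPE_BRANCH, "name": strings[name_ref],
                 "covers": covers, "children": []}
        return scope, pos, ci + 2
    if marker != 0x00:
        raise ValueError(f"unknown scope marker 0x{marker:02x}")

    scope = {}
    scope["type"], pos = read_varint(buf, pos)
    name_ref, pos = read_varint(buf, pos)
    scope["name"] = strings[name_ref]
    presence, pos = read_varint(buf, pos)
    for bit, fields in OPTIONAL_FIELDS:
        if presence & bit:
            for field in fields:
                scope[field], pos = read_varint(buf, pos)
    num_children, pos = read_varint(buf, pos)
    num_covers, pos = read_varint(buf, pos)
    if num_covers:
        scope["cover_type"], pos = read_varint(buf, pos)
    scope["covers"] = []
    for _ in range(num_covers):
        ref, pos = read_varint(buf, pos)
        scope["covers"].append((strings[ref], counts[ci]))
        ci += 1
    scope["children"] = []
    for _ in range(num_children):
        child, pos, ci = read_scope(buf, pos, strings, counts, ci)
        scope["children"].append(child)
    return scope, pos, ci

def read_scope_tree(buf, strings, counts):
    """Decode every top-level record until the buffer is exhausted."""
    roots, pos, ci = [], 0, 0
    while pos < len(buf):
        scope, pos, ci = read_scope(buf, pos, strings, counts, ci)
        roots.append(scope)
    return roots
```

Here `strings` is the decoded strings.bin table and `counts` the decoded counts.bin array; the walker attaches each count to its coveritem as it goes.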

6.4 Scope-type integer values 

The scope_type varint uses the integer values of ScopeTypeT. The most common values are:

Value	ScopeTypeT name	Typical context
2	`DU_MODULE`	Design-unit scope for a Verilog module
16	`INSTANCE`	Instantiation of a design unit
22	`COVERGROUP`	SystemVerilog covergroup type or instance
23	`COVERPOINT`	SystemVerilog coverpoint
28	`CROSS`	SystemVerilog cross
30	`BRANCH`	Code-coverage branch (toggle pair or regular)
32	`TOGGLE`	Toggle scope (parent of BRANCH scopes)
33	`FSM`	Finite state machine
36	`BLOCK`	Statement block

The full set of values is defined in ucis/scope_type_t.py.

7. counts.bin 

A flat array of non-negative integers, one per coveritem, in the same DFS order as the coveritems encountered while reading scope_tree.bin. TOGGLE_PAIR scopes contribute two consecutive counts ("0 -> 1" then "1 -> 0").

The array length is given by coveritem_count in manifest.json.

7.1 Binary layout 

[mode  : 1 byte ]  0 = UINT32, 1 = VARINT
[count : varint ]  number of integers that follow
[data  : …      ]  mode-dependent encoding (see below)

Mode 0 — UINT32: Each integer is a 4-byte little-endian unsigned 32-bit value. Used when most counts are large (i.e. varint encoding would not save space).

[v_0 : 4 bytes LE] [v_1 : 4 bytes LE] … [v_{n-1} : 4 bytes LE]

Mode 1 — VARINT: Each integer is encoded as an unsigned LEB128 varint (see 3.1 Unsigned LEB128 varint). Used when most counts are small (0–127), which is the common case for per-test databases.

[varint_0] [varint_1] … [varint_{n-1}]

Mode selection: The writer computes both encodings and selects VARINT when len(varint_encoding) < count × 4 (i.e. when it is strictly smaller), falling back to UINT32 otherwise. A reader must support both modes.
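
A sketch of the writer-side mode selection described above. The function name `encode_counts` is hypothetical, and the UINT32 fallback assumes every count fits in 32 bits, which the text does not state explicitly.

```python
import struct

def _varint(n: int) -> bytes:
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        out.append(low7 | 0x80 if n else low7)
        if not n:
            return bytes(out)

def encode_counts(counts):
    """Encode counts.bin: [mode][count varint][data]."""
    varint_payload = b"".join(_varint(c) for c in counts)
    header_count = _varint(len(counts))
    if len(varint_payload) < 4 * len(counts):
        return b"\x01" + header_count + varint_payload          # VARINT
    # Assumption: values fit in 32 bits; struct.pack raises otherwise.
    return b"\x00" + header_count + struct.pack(f"<{len(counts)}I", *counts)
```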

7.2 Efficient single-byte fast path 

When mode is VARINT and all values fit in a single byte (0–127), each byte in the data section is equal to the corresponding count value (the high bit is never set). A parser can exploit this: scan the data section for any byte ≥ 0x80; if none are found, each byte is its value, and the entire section can be decoded with a single bytes → list conversion.

8. history.json 

A JSON array of history node records. Each element represents either a test run (kind: "TEST") or a merge operation (kind: "MERGE").

Record schema:

[
  {
    "logical_name":  "regression_seed_42",
    "physical_name": null,
    "kind":          "TEST",
    "test_status":   0,
    "tool_category": "sim",
    "date":          "2026-02-25",
    "sim_time":      1500.0,
    "time_unit":     "ns",
    "run_cwd":       "/home/user/sim",
    "cpu_time":      12.3,
    "seed":          "42",
    "cmd":           "vsim -seed 42 top",
    "args":          "",
    "compulsory":    null,
    "user_name":     "jsmith",
    "cost":          0.0,
    "ucis_version":  null,
    "vendor_id":     null,
    "vendor_tool":   null,
    "vendor_tool_version": null,
    "same_tests":    null,
    "comment":       null
  }
]

Field	Type	Description
`logical_name`	string	Unique name for this history node (test name or merge label).
`physical_name`	string | null	Physical file name associated with the history node, or `null`.
`kind`	`"TEST"` | `"MERGE"`	History node kind.
`test_status`	integer	Test status code: 0 = OK, 1 = WARNING, 2 = ERROR, 3 = FATAL, 4 = NOTRUN.
`tool_category`	string	Free-form tool category (e.g. `"sim"`, `"formal"`).
`date`	string	Date string (ISO 8601 recommended).
`sim_time`	number	Simulation end time in `time_unit` units.
`time_unit`	string	Simulation time unit (e.g. `"ns"`, `"ps"`).
`run_cwd`	string	Working directory of the simulation run.
`cpu_time`	number	CPU seconds consumed.
`seed`	string	Random seed used.
`cmd`	string	Simulator command line.
`args`	string	Additional arguments.
`compulsory`	any | null	Compulsory flag (tool-defined), or `null` if unset.
`user_name`	string	Username that ran the simulation.
`cost`	number	Simulation cost (tool-defined).
`ucis_version`	string | null	UCIS version associated with this history node, or `null`.
`vendor_id`	string | null	Vendor identifier, or `null`.
`vendor_tool`	string | null	Vendor tool name, or `null`.
`vendor_tool_version`	string | null	Vendor tool version, or `null`.
`same_tests`	integer | null	Number of identical tests merged, or `null`.
`comment`	string | null	Free-form comment, or `null`.

9. sources.json 

A JSON array of strings, where each element is an absolute or relative file path. The position of each path in the array is its file ID, which is the integer used as file_id in scope_tree.bin source references.

[
  "/home/user/design/top.sv",
  "/home/user/design/alu.sv",
  "/home/user/tb/coverage_pkg.sv"
]

File ID 0 corresponds to the first element. An empty sources.json ([]) is valid when no source information was recorded.

10. Merging NCDB files 

The key performance advantage of NCDB over SQLite is the same-schema fast merge path, which reduces a multi-file merge to element-wise integer addition.

10.1 Same-schema fast merge 

Two NCDB files are schema-compatible if and only if their schema_hash values are equal. The schema_hash is "sha256:" followed by the SHA-256 digest of the uncompressed scope_tree.bin bytes; equal hashes guarantee an identical scope hierarchy and coveritem ordering.

Algorithm for merging N same-schema files into one output file:

Read manifest.json from all N sources. Verify schema_hash is identical for all; if not, fall back to the cross-schema path.
Read counts.bin from all N sources → N lists of integers.
Compute the merged count array: element-wise sum of all N lists. (In Python: list(map(sum, zip(*all_counts))))
Concatenate all history.json arrays from all sources. Append a new MERGE history node that references all source names.
Copy strings.bin, scope_tree.bin, and sources.json verbatim from the first source (they are identical for same-schema files).
Write the output ZIP with the merged manifest, the copied schema members, the merged counts.bin, and the combined history.json.

The scope tree and string table never need to be decoded for a same-schema merge.
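
The count-merging core of this algorithm can be sketched on already-decoded data. `merge_same_schema` is a hypothetical name; the sketch leaves ZIP I/O and history concatenation to the caller, and summing test_count across sources is an assumption that follows from concatenating the history arrays.

```python
def merge_same_schema(sources):
    """sources: non-empty list of (manifest_dict, counts_list) pairs."""
    hashes = {manifest["schema_hash"] for manifest, _ in sources}
    if len(hashes) != 1:
        raise ValueError("schema_hash mismatch: use the cross-schema path")
    lengths = {len(counts) for _, counts in sources}
    if len(lengths) != 1:
        raise ValueError("counts arrays differ in length")

    # Element-wise sum across all sources.
    merged = [sum(column) for column in zip(*(c for _, c in sources))]

    manifest = dict(sources[0][0])           # schema fields are identical
    manifest["total_hits"] = sum(merged)
    manifest["covered_bins"] = sum(1 for v in merged if v)
    # Assumption: every TEST history node is carried over unchanged.
    manifest["test_count"] = sum(m.get("test_count", 0) for m, _ in sources)
    return manifest, merged
```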

10.2 Cross-schema merge 

When the schema hashes differ, the merger must parse both scope trees, match scopes by (path, type, name) key, and add counts for matched coveritems. Unmatched coveritems from either source are appended with their original counts. This path is slower but correct for merging databases from designs that have evolved between runs.
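
A simplified picture of the matching step, assuming each database has already been flattened into a mapping from a (path, type, name) key to a hit count. The flattening itself and the function name `merge_cross_schema` are illustrative assumptions, not part of the format.

```python
def merge_cross_schema(a: dict, b: dict) -> dict:
    """Merge two {(path, type, name): count} mappings.

    Matched keys have their counts added; keys found in only one source
    keep their original count. Python dicts preserve insertion order, so
    items unique to b end up after all items from a.
    """
    merged = dict(a)
    for key, count in b.items():
        merged[key] = merged.get(key, 0) + count
    return merged
```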

10.3 Merge history node 

A merge operation appends a "MERGE"-kind history node to history.json:

{
  "logical_name": "merge:output.cdb",
  "physical_name": null,
  "kind":   "MERGE",
  "test_status": 0,
  "tool_category": "merge",
  "date":   "2026-02-25T21:00:00Z"
}

11. Optional binary members 

These members are omitted from the archive when the corresponding data is absent or all-default. Readers must silently skip any optional member they do not support, and must not fail if an expected optional member is absent.

11.1 attrs.bin 

User-defined attribute assignments. Despite the .bin extension, this member is JSON-encoded.

Format v2 (current):

{
  "version": 2,
  "scopes": [
    {"idx": 0, "attrs": {"key": "value"}}
  ],
  "coveritems": [
    {"scope_idx": 0, "ci_idx": 1, "attrs": {"key": "value"}}
  ],
  "history": [
    {"idx": 0, "kind": "TEST", "attrs": {"key": "value"}}
  ],
  "global": {"key": "value"}
}

idx / scope_idx values are DFS scope indices (same ordering as scope_tree.bin). ci_idx is the zero-based coveritem position within its parent scope. Only objects with at least one attribute are included (sparse). The reader also accepts legacy v1 files that store only scope-level attributes.

11.2 tags.json 

Tag assignments for scopes (sparse, DFS-indexed).

{
  "version": 1,
  "entries": [
    {"idx": 0, "tags": ["tag_a", "tag_b"]}
  ]
}

idx is the DFS scope index. Only scopes with at least one tag are included.

11.3 toggle.bin 

Per-signal toggle metadata for TOGGLE-type scopes. Despite the .bin extension, this member is JSON-encoded.

{
  "version": 1,
  "entries": [
    {"idx": 5, "canonical": "top.clk", "metric": 0, "type": 1, "dir": 2}
  ]
}

idx is the DFS scope index. All fields except idx are optional and are omitted when they match the defaults (metric = ToggleMetricT._2STOGGLE, type = ToggleTypeT.NET, dir = ToggleDirT.INTERNAL). Only TOGGLE scopes with at least one non-default value are included.

11.4 fsm.bin 

FSM state-index overrides for FSM-type scopes. Despite the .bin extension, this member is JSON-encoded. State and transition names are already stored in scope_tree.bin as FSMBIN coveritems under FSM_STATES and FSM_TRANS sub-scopes; this member only records non-sequential state indices.

{
  "version": 1,
  "entries": [
    {"fsm_idx": 3, "states": [{"name": "IDLE", "index": 5}]}
  ]
}

fsm_idx is the DFS scope index of the FSM scope. Only FSM scopes whose state indices differ from the default 0, 1, 2, … sequence are included. The member is omitted entirely when all indices are sequential.

11.5 cross.bin 

Cross-coverpoint link records for CROSS-type scopes. Despite the .bin extension, this member is JSON-encoded.

{
  "version": 1,
  "entries": [
    {"idx": 12, "crossed": ["cp_a", "cp_b"]}
  ]
}

idx is the DFS scope index of the CROSS scope. crossed lists the getScopeName() values of each crossed coverpoint (sibling scopes within the same parent COVERGROUP/COVERINSTANCE).

11.6 properties.json 

Typed string property values for scopes (DFS-indexed).

{
  "version": 1,
  "entries": [
    {"kind": "scope", "idx": 0, "key": 1, "type": "str", "value": "comment text"}
  ]
}

key is the integer value of the StrProperty enum. Only scopes with explicitly-set properties are included.

11.7 design_units.json 

Design-unit name-to-DFS-index lookup table.

{
  "version": 1,
  "units": [
    {"name": "top", "idx": 0, "type": 2}
  ]
}

type is the integer value of ScopeTypeT (e.g. 2 = DU_MODULE). Only DU_ANY scopes are included. The member is omitted when no design units are present.

11.8 formal.bin 

Formal-verification assertion data. Despite the .bin extension, this member is JSON-encoded.

{
  "version": 1,
  "entries": [
    {"idx": 42, "status": 1, "radius": 100, "witness": "/path/to/witness.vcd"}
  ]
}

idx is the flat DFS coveritem index (same ordering as counts.bin). Fields status, radius, and witness are each omitted when they match their defaults: status = FormalStatusT.NONE (0), radius = 0, witness = null.

11.9 coveritem_flags.bin 

Per-coveritem non-default flags. This member uses a true binary encoding (sparse, delta-encoded varint pairs).

[version      : varint]  always 1
[num_entries  : varint]  number of (index, flags) pairs
per entry:
    [delta_idx : varint]  coveritem DFS index delta from previous entry
    [flags     : varint]  ucisFlagsT value

Only coveritems whose flags differ from the cover-type default (see cover-type defaults table in Section 6.2) are included. The member is omitted entirely when all coveritems use default flags.
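
A decoder for this layout might look like the sketch below. It assumes the first delta_idx is relative to index 0, which the layout does not state explicitly; `decode_coveritem_flags` is a hypothetical name.

```python
def read_varint(buf, pos):
    result, shift = 0, 0
    while True:
        byte = buf[pos]; pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos

def decode_coveritem_flags(buf: bytes) -> dict:
    """Return {coveritem DFS index: flags} for the non-default entries."""
    version, pos = read_varint(buf, 0)
    if version != 1:
        raise ValueError(f"unsupported coveritem_flags.bin version {version}")
    num_entries, pos = read_varint(buf, pos)
    flags_by_index, index = {}, 0   # assumption: deltas start from index 0
    for _ in range(num_entries):
        delta, pos = read_varint(buf, pos)
        flags, pos = read_varint(buf, pos)
        index += delta
        flags_by_index[index] = flags
    return flags_by_index
```

Coveritems missing from the returned mapping use the cover-type default flags.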

11.10 contrib/<hist_idx>.bin 

Per-test contribution arrays. One file per history node that recorded contributions; <hist_idx> is the integer history-node index (not zero-padded). Each file encodes a sparse, delta-encoded array of per-test hit counts, allowing reconstruction of which tests hit which bins.

[num_entries      : varint]
per entry (sorted by bin_index, ascending):
    [delta_bin_index : varint]  bin_index − previous bin_index
    [count           : varint]  hit count for this bin from this test
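
The sparse encoding can be illustrated with a small encoder and decoder pair. Both names are hypothetical, and the first delta is assumed to be relative to bin index 0.

```python
def _varint(n: int) -> bytes:
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        out.append(low7 | 0x80 if n else low7)
        if not n:
            return bytes(out)

def encode_contrib(hits: dict) -> bytes:
    """hits maps bin_index -> hit count for one test (non-zero bins only)."""
    out = bytearray(_varint(len(hits)))
    prev = 0                         # assumption: first delta is from 0
    for bin_index in sorted(hits):   # entries are sorted ascending
        out += _varint(bin_index - prev) + _varint(hits[bin_index])
        prev = bin_index
    return bytes(out)

def decode_contrib(buf: bytes) -> dict:
    def read(pos):
        result, shift = 0, 0
        while True:
            byte = buf[pos]; pos += 1
            result |= (byte & 0x7F) << shift
            shift += 7
            if not (byte & 0x80):
                return result, pos
    num_entries, pos = read(0)
    hits, bin_index = {}, 0
    for _ in range(num_entries):
        delta, pos = read(pos)
        count, pos = read(pos)
        bin_index += delta
        hits[bin_index] = count
    return hits
```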

12. Version history 

Version	Changes
1.0	Initial release. Scope-tree V2 encoding with presence bitfield and TOGGLE_PAIR optimization. Varint + UINT32 dual-mode counts encoding. Same-schema fast-merge path via schema_hash.
1.0 (UCIS compliance update)	Added presence bits 4–6 (PRESENCE_CVG_OPTS, PRESENCE_GOAL, PRESENCE_SOURCE_TYPE) to scope records. Added coveritem_flags.bin member for per-coveritem non-default flags. Updated history.json to UCIS-compliant field names (logical_name, physical_name, test_status, sim_time, time_unit, run_cwd, cpu_time, user_name) and added vendor/tool fields (ucis_version, vendor_id, vendor_tool, vendor_tool_version, same_tests, comment, compulsory). Upgraded attrs.bin to V2 format with sections for scopes, coveritems, history nodes, and global attributes. Updated cover-type default flags to 0x01 (most types) / 0x19 (CVGBIN). Documented cross.bin, properties.json, design_units.json, formal.bin, and contrib/ formats.

13. Size and performance reference 

Measurements using synthetic BM1–BM6 benchmark databases (pure Python, no C accelerator, median of 3 merge runs):

Workload	Bins	SQLite/test	NCDB/test	Size ratio	SQLite merge	NCDB merge
BM1 Counter	5	276 KB	1.3 KB	209×	22 ms	1.2 ms
BM2 ALU	104	276 KB	1.4 KB	196×	24 ms	1.7 ms
BM3 Protocol	180	276 KB	1.4 KB	195×	29 ms	3.5 ms
BM4 Hierarchy	117	276 KB	1.4 KB	195×	28 ms	4.0 ms
BM5 Bins (8K)	8 800	276 KB	2.3 KB	122×	40 ms	17 ms
BM6 SoC	256	276 KB	1.4 KB	192×	72 ms	12 ms

Merge seed counts: BM1=4, BM2=16, BM3=32, BM4=32, BM5=64, BM6=128.

The SQLite per-test size is dominated by the fixed B-tree page overhead (minimum 276 KB regardless of design size). NCDB scales with actual data: a design with 5 bins uses only 1.3 KB.

With a C accelerator for varint encode/decode, BM5 merge time is projected to drop to ~5 ms (~7.5× faster than SQLite).

14. Implementing a reader 

To read an NCDB file without PyUCIS:

import zipfile, json, struct

def read_varint(data, offset):
    result, shift = 0, 0
    while True:
        b = data[offset]; offset += 1
        result |= (b & 0x7F) << shift
        shift += 7
        if not (b & 0x80):
            return result, offset

def read_ncdb(path):
    with zipfile.ZipFile(path) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        if manifest.get("format") != "NCDB":
            raise ValueError("not an NCDB archive")

        strings_raw = zf.read("strings.bin")
        counts_raw  = zf.read("counts.bin")
        history     = json.loads(zf.read("history.json"))
        sources     = json.loads(zf.read("sources.json"))

    # Decode string table
    offset = 0
    n_strings, offset = read_varint(strings_raw, offset)
    strings = []
    for _ in range(n_strings):
        length, offset = read_varint(strings_raw, offset)
        strings.append(strings_raw[offset:offset+length].decode("utf-8"))
        offset += length

    # Decode counts
    mode = counts_raw[0]; offset = 1
    n_counts, offset = read_varint(counts_raw, offset)
    counts = []
    if mode == 1:  # VARINT
        # Fast path: all single-byte values
        payload = counts_raw[offset:offset + n_counts]
        if len(payload) == n_counts and all(b < 0x80 for b in payload):
            counts = list(payload)
        else:
            for _ in range(n_counts):
                v, offset = read_varint(counts_raw, offset)
                counts.append(v)
    elif mode == 0:  # UINT32
        counts = list(struct.unpack_from(f"<{n_counts}I", counts_raw, offset))
    else:
        raise ValueError(f"unknown counts.bin mode {mode}")

    return {
        "manifest": manifest,
        "strings":  strings,
        "counts":   counts,
        "history":  history,
        "sources":  sources,
    }

15. V2 binary test history

When manifest.json contains "history_format": "v2" the archive holds six additional binary members. All integers are little-endian unless noted.

15.1 `history/test_registry.bin`

Maps stable integer IDs to test names and seed strings. IDs are assigned by insertion order and never reassigned.

Header (17 bytes):
  magic       u32   0x54524547  ('TREG')
  version     u8    1
  next_run_id u32   monotonically-increasing run counter
  num_names   u32
  num_seeds   u32

Offset tables (immediately after header):
  name_offsets  u32[num_names]  byte offset into name heap
  seed_offsets  u32[num_seeds]  byte offset into seed heap

Heaps (NUL-terminated UTF-8 strings):
  name_heap  NUL-terminated strings in name_id order
  seed_heap  NUL-terminated strings in seed_id order
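
A parsing sketch for this layout using Python's struct module. It rests on two assumptions that the description leaves open: the magic is compared as a little-endian u32, and the seed heap starts immediately after the NUL terminator of the last name. `parse_test_registry` is a hypothetical name.

```python
import struct

TREG_HEADER = struct.Struct("<IBIII")   # magic, version, next_run_id, num_names, num_seeds
TREG_MAGIC = 0x54524547

def parse_test_registry(buf: bytes) -> dict:
    magic, version, next_run_id, num_names, num_seeds = TREG_HEADER.unpack_from(buf, 0)
    if magic != TREG_MAGIC or version != 1:
        raise ValueError("not a version-1 test_registry.bin")
    pos = TREG_HEADER.size              # 17 bytes
    name_offsets = struct.unpack_from(f"<{num_names}I", buf, pos)
    pos += 4 * num_names
    seed_offsets = struct.unpack_from(f"<{num_seeds}I", buf, pos)
    pos += 4 * num_seeds

    def read_heap(start, offsets):
        """Read NUL-terminated strings; return (strings, end of heap)."""
        strings, end = [], start
        for off in offsets:
            begin = start + off
            stop = buf.index(b"\x00", begin)
            strings.append(buf[begin:stop].decode("utf-8"))
            end = max(end, stop + 1)
        return strings, end

    names, seed_heap_start = read_heap(pos, name_offsets)
    # Assumption: the seed heap directly follows the name heap.
    seeds, _ = read_heap(seed_heap_start, seed_offsets)
    return {"next_run_id": next_run_id, "names": names, "seeds": seeds}
```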

15.2 `history/test_stats.bin`

One 72-byte entry per test name (indexed by name_id).

Header (9 bytes):
  magic      u32   0x54535453  ('TSTS')
  version    u8    1
  num_entries u32

Entry (72 bytes, repeated num_entries times):
  name_id      u32
  total_runs   u32
  pass_count   u32
  fail_count   u32
  error_count  u32
  skip_count   u32
  timeout_count u32
  _reserved    u32   (padding, always 0)
  mean_ms      f32   Welford running mean of runtime in milliseconds
  m2_ms        f32   Welford running sum of squared deviations from the mean (variance = m2/n)
  cusum_pos    f32   CUSUM positive accumulator for change detection
  cusum_neg    f32   CUSUM negative accumulator
  _pad1        f32   (reserved, 0.0)
  _pad2        f32   (reserved, 0.0)
  _pad3        f32   (reserved, 0.0)
  flakiness_score i16  fixed-point 0–10000 representing 0.00–100.00 %
  tag          u8[6] short ASCII label (NUL-padded)
  last_status  u8    most-recent HIST_STATUS_* value
  _trailing    u8    padding

15.3 `history/bucket_index.bin`

Index over the per-bucket run-record files.

Header (9 bytes):
  magic       u32   0x42494458  ('BIDX')
  version     u8    1
  num_buckets u32

Entry (28 bytes, sorted by bucket_seq):
  bucket_seq  u32
  ts_start    u32   Unix timestamp of first record in bucket
  ts_end      u32   Unix timestamp of last record in bucket
  num_records u32
  fail_count  u32
  min_name_id u32
  max_name_id u32
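
Because every entry is seven little-endian u32 values, the index can be read with a single struct format. The sketch below also shows one plausible use, selecting buckets that overlap a time window; both function names are hypothetical, and the magic is assumed to be stored as a little-endian u32.

```python
import struct

BIDX_HEADER = struct.Struct("<IBI")   # magic, version, num_buckets (9 bytes)
BIDX_ENTRY = struct.Struct("<7I")     # 28 bytes per entry
BIDX_FIELDS = ("bucket_seq", "ts_start", "ts_end", "num_records",
               "fail_count", "min_name_id", "max_name_id")

def parse_bucket_index(buf: bytes) -> list:
    magic, version, num_buckets = BIDX_HEADER.unpack_from(buf, 0)
    if magic != 0x42494458 or version != 1:
        raise ValueError("not a version-1 bucket_index.bin")
    entries = []
    for i in range(num_buckets):
        offset = BIDX_HEADER.size + i * BIDX_ENTRY.size
        entries.append(dict(zip(BIDX_FIELDS, BIDX_ENTRY.unpack_from(buf, offset))))
    return entries

def buckets_overlapping(entries, t0, t1):
    """Return bucket_seq values whose [ts_start, ts_end] overlaps [t0, t1]."""
    return [e["bucket_seq"] for e in entries
            if e["ts_start"] <= t1 and e["ts_end"] >= t0]
```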

15.4 `history/NNNNNN.bin`

Each bucket holds up to 10 000 run records, compressed with LZMA (sealed buckets) or DEFLATE level 1 (current open bucket). After decompression:

Header (16 bytes):
  magic       u32   0x42434B54  ('BCKT')
  version     u8    1
  num_records u32
  num_names   u16
  _pad        u8    (padding)
  ts_base     u32   Unix timestamp of first record

Name index (12 bytes per unique name in this bucket):
  name_id     u32   global name_id from test_registry
  offset      u32   byte offset into name's record data
  count       u16   number of records for this name
  _pad        u8[2]

Columnar record data (one column per name, name_id order):
  seeds[]         u8[count]           local seed index (≤ 255 unique/bucket)
  ts_deltas[]     varint[count]       delta-encoded seconds from ts_base
  status_flags[]  u8[count]           nibble-packed (high=status, low=flags)

Seed dictionary (appended after all record data):
  num_local_seeds u8
  seed_ids[]      u32[num_local_seeds]  global seed_ids

Varint encoding: the same unsigned LEB128 scheme as in 3.1 Unsigned LEB128 varint. Each value uses 1–5 bytes; the high bit of each byte indicates that more bytes follow, and the low 7 bits carry the value, least-significant group first.

15.5 `history/contrib_index.bin`

Tracks which test runs contributed coverage so that squash can be replayed.

Header (12 bytes):
  magic        u32   0x43494458  ('CIDX')
  version      u8    1
  policy       u8    merge-policy constant
  watermark    u32   highest squashed run_id
  num_active   u32

Entry (16 bytes, one per unsquashed run):
  run_id    u32
  name_id   u32
  status    u8
  flags     u8
  _pad      u8[2]
  ts        u32

15.6 `history/squash_log.bin`

Append-only provenance log for squash events.

Header (9 bytes):
  magic      u32   0x53514C47  ('SQLG')
  version    u8    1
  num_entries u32

Entry (24 bytes):
  ts        u32   Unix timestamp of squash operation
  policy    u8    merge-policy used
  _pad      u8[3]
  from_run  u32   first run_id squashed
  to_run    u32   last run_id squashed (inclusive)
  num_runs  u32   total runs processed
  pass_runs u32   runs that passed

16. Testplan and Waivers JSON

testplan.json and waivers.json are optional UTF-8 JSON members stored at the ZIP root. They are written by NcdbWriter when the corresponding objects are attached to the database and are read transparently by NcdbReader.

16.1 `testplan.json`

{
  "format_version": 1,
  "source_file": "uart.hjson",
  "import_timestamp": "2025-01-01T00:00:00+00:00",
  "testpoints": [
    {
      "name": "uart_reset",
      "stage": "V1",
      "desc": "Verify reset",
      "tests": ["uart_smoke", "uart_reset_*"],
      "tags": ["smoke"],
      "na": false,
      "source_template": "",
      "requirements": [
        {"id": "REQ-001", "desc": "Reset spec"}
      ]
    }
  ],
  "covergroups": [
    {"name": "cg_reset", "desc": "Reset coverage"}
  ]
}

testplan.json — top-level fields
Field	Type	Description
`format_version`	int	Schema version; currently `1`
`source_file`	string	Path to the Hjson/JSON source that produced this plan
`import_timestamp`	ISO-8601 string	UTC timestamp when the plan was last imported
`testpoints`	array	Ordered list of `Testpoint` objects
`covergroups`	array	Ordered list of `CovergroupEntry` objects

Merger behaviour

When merging two .cdb files that both contain testplan.json:

Same source_file: the entry with the later import_timestamp is kept.
Different source_file: a warning is emitted and the merged output contains no testplan.
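
The two rules can be sketched as a small function over the parsed JSON objects. `merge_testplans` is a hypothetical name; keeping the first plan on a timestamp tie is an assumption, and both timestamps are assumed to carry a UTC offset as in the example above.

```python
import warnings
from datetime import datetime

def merge_testplans(a: dict, b: dict):
    """Return the testplan to keep, or None when the sources conflict."""
    if a["source_file"] != b["source_file"]:
        warnings.warn("testplan source_file differs; merged output has no testplan")
        return None
    ts_a = datetime.fromisoformat(a["import_timestamp"])
    ts_b = datetime.fromisoformat(b["import_timestamp"])
    # Assumption: on a tie the first plan is kept.
    return b if ts_b > ts_a else a
```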

16.2 `waivers.json`

{
  "format_version": 1,
  "waivers": [
    {
      "id": "W-001",
      "scope_pattern": "top/uart/**",
      "bin_pattern": "reset_*",
      "rationale": "Deferred to V2",
      "approver": "jdoe",
      "approved_at": "2025-01-01T00:00:00",
      "expires_at": "2026-01-01T00:00:00",
      "status": "active"
    }
  ]
}

waivers.json — Waiver fields
Field	Type	Description
`id`	string	Unique waiver identifier
`scope_pattern`	glob string	Hierarchy path pattern; `*` = single segment, `**` = any depth
`bin_pattern`	glob string	Coverage bin name pattern; same glob syntax as scope_pattern
`rationale`	string	Human-readable reason for the waiver
`approver`	string	Name or email of the approver
`approved_at`	ISO-8601 string	Approval timestamp
`expires_at`	ISO-8601 string	Expiry timestamp; empty string means no expiry
`status`	`"active"` | `"expired"`	Current status; `active_at()` filters on both this field and `expires_at`

Merger behaviour: Waivers are unioned by id across all source files. When the same id appears in multiple sources the entry with the latest approved_at is kept.
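
A minimal sketch of this union rule, operating on the parsed waivers arrays. `merge_waivers` is a hypothetical name; keeping the first-seen entry when two approved_at values are equal is an assumption.

```python
from datetime import datetime

def merge_waivers(*waiver_lists):
    """Union waivers by id, keeping the entry with the latest approved_at."""
    by_id = {}
    for waivers in waiver_lists:
        for waiver in waivers:
            kept = by_id.get(waiver["id"])
            if kept is None or (datetime.fromisoformat(waiver["approved_at"])
                                > datetime.fromisoformat(kept["approved_at"])):
                by_id[waiver["id"]] = waiver
    return list(by_id.values())
```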