.. _ncdb-format:

############################
NCDB Coverage File Format
############################

NCDB (*Native Coverage DataBase*) is a compact, ZIP-based binary format for
storing and merging UCIS coverage data.  A single ``.cdb`` file is a standard
ZIP archive whose members encode the scope hierarchy, hit counts, test history,
and source file references.

The format is designed to be:

* **Space-efficient** — typically 100–200× smaller than the equivalent SQLite
  ``.cdb`` (see :ref:`ncdb-benchmarks`).
* **Merge-fast** — same-schema merges reduce to element-wise integer addition
  over a flat array, with no SQL overhead.
* **Self-describing** — a ``manifest.json`` at the root of the archive carries
  all metadata needed to read or merge the file without any external schema.
* **Readable without PyUCIS** — every binary encoding is documented here in
  sufficient detail to write an independent parser.

.. contents:: On this page
   :local:
   :depth: 2

-----------

**********************
1. File identification
**********************

Both NCDB and the legacy SQLite backend use the ``.cdb`` extension.
Format discrimination is done by inspecting the first 16 bytes of the file.

.. list-table::
   :header-rows: 1
   :widths: 20 30 50

   * - Format
     - Header (hex)
     - Description
   * - SQLite
     - ``53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00``
     - Literal ASCII ``SQLite format 3\x00``
   * - NCDB (non-empty)
     - ``50 4B 03 04 …``
     - ZIP local-file header signature ``PK\x03\x04``
   * - NCDB (empty archive)
     - ``50 4B 05 06 …``
     - ZIP end-of-central-directory signature ``PK\x05\x06``

**Detection algorithm:**

1. Read the first 16 bytes of the file.
2. If ``bytes[0:16]`` equals the SQLite magic string → format is ``sqlite``.
3. If ``bytes[0:4]`` is ``PK\x03\x04`` or ``PK\x05\x06``:

   a. Open as ZIP.
   b. Read ``manifest.json``.
   c. If ``manifest["format"] == "NCDB"`` → format is ``ncdb``.

4. Otherwise → format is ``unknown``.
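
The detection steps above can be sketched in Python as follows.  This is a
minimal illustration, not the PyUCIS implementation: the function name
``detect_cdb_format`` is hypothetical, and treating a ZIP archive without a
readable ``manifest.json`` as ``unknown`` is an assumption that the steps
above do not spell out.

.. code-block:: python

   import json
   import zipfile

   SQLITE_MAGIC = b"SQLite format 3\x00"
   ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06")

   def detect_cdb_format(path):
       """Return "sqlite", "ncdb", or "unknown" for the file at ``path``."""
       with open(path, "rb") as fh:
           header = fh.read(16)
       if header == SQLITE_MAGIC:
           return "sqlite"
       if header[:4] in ZIP_MAGICS:
           try:
               with zipfile.ZipFile(path) as zf:
                   manifest = json.loads(zf.read("manifest.json"))
           except (zipfile.BadZipFile, KeyError, ValueError):
               # Assumption: a ZIP that is corrupt, lacks manifest.json, or
               # holds invalid JSON is reported as "unknown".
               return "unknown"
           if isinstance(manifest, dict) and manifest.get("format") == "NCDB":
               return "ncdb"
       return "unknown"

Reading only 16 bytes up front keeps the SQLite check cheap; the archive is
opened only when the ZIP signature matches.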

-----------

***********************
2. Archive structure
***********************

An NCDB file is a **standard ZIP archive** (DEFLATE compression) whose members
are named as follows.  Members marked *required* must be present in every valid
NCDB file; others are only written when the corresponding data is non-empty or
non-default.

.. list-table::
   :header-rows: 1
   :widths: 25 12 63

   * - Member name
     - Required
     - Contents
   * - ``manifest.json``
     - ✓
     - Format identity, version, statistics, and the schema hash.
   * - ``strings.bin``
     - ✓
     - Deduplicated string table referenced by index throughout other members.
   * - ``scope_tree.bin``
     - ✓
     - DFS-serialized scope hierarchy (V2 encoding).  Counts are *not* stored here.
   * - ``counts.bin``
     - ✓
     - Flat array of hit counts in the same DFS order as ``scope_tree.bin``.
   * - ``history.json``
     - ✓
     - Array of test-run and merge history records.
   * - ``sources.json``
     - ✓
     - Ordered list of source file paths; indices match file IDs in ``scope_tree.bin``.
   * - ``attrs.bin``
     - —
     - User-defined attribute assignments (V2 JSON: scopes, coveritems,
       history nodes, and global attributes).
   * - ``tags.json``
     - —
     - Tag assignments for scopes (sparse, DFS-indexed).
   * - ``toggle.bin``
     - —
     - Per-signal toggle metadata (JSON: canonical name, metric, type,
       direction).
   * - ``fsm.bin``
     - —
     - FSM state-index overrides (JSON, sparse; only written when state
       indices differ from the default 0, 1, 2, … sequence).
   * - ``cross.bin``
     - —
     - Cross-coverpoint link records (JSON: crossed coverpoint sibling names).
   * - ``properties.json``
     - —
     - Typed string property values (DFS scope-indexed).
   * - ``design_units.json``
     - —
     - Design-unit name-to-DFS-index lookup table (name, index, scope type).
   * - ``formal.bin``
     - —
     - Formal-verification assertion data (JSON: status, radius, witness).
   * - ``coveritem_flags.bin``
     - —
     - Per-coveritem non-default flags (sparse delta-encoded binary).
   * - ``contrib/<hist_idx>.bin``
     - —
     - Per-test coveritem contribution arrays (delta-encoded, sparse).
       One file per history node that has contributions; ``<hist_idx>`` is
       the integer history-node index (not zero-padded).

-----------

***********************
3. Primitive encodings
***********************

.. _ncdb-varint:

3.1 Unsigned LEB128 varint
==========================

All variable-length integers in NCDB are encoded as **unsigned LEB128**
(also called unsigned varint or ULEB128).  This is the same encoding used
by DWARF, WebAssembly, and Protocol Buffers (field type ``uint64``).

**Encoding:**

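1. Take the 7 least-significant bits of the value, then shift the value right
   by 7.
2. If the shifted value is non-zero, emit the 7 bits with bit 7 set to ``1``
   and repeat from step 1.  Otherwise emit them with bit 7 set to ``0`` and
   stop.  At least one byte is always emitted, so the value 0 encodes as ``00``.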

.. code-block:: text

   value     bytes (hex)
   ────────────────────
   0         00
   1         01
   127       7F
   128       80 01
   255       FF 01
   16383     FF 7F
   16384     80 80 01
   2³²−1     FF FF FF FF 0F
   2⁶⁴−1     FF FF FF FF FF FF FF FF FF 01
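
A minimal encoder sketch that follows the steps above and reproduces the
table.  The name ``encode_varint`` is chosen here for illustration and is not
taken from PyUCIS.

.. code-block:: python

   def encode_varint(value: int) -> bytes:
       """Encode a non-negative integer as unsigned LEB128."""
       if value < 0:
           raise ValueError("unsigned LEB128 cannot encode negative values")
       out = bytearray()
       while True:
           low7 = value & 0x7F
           value >>= 7
           if value:
               out.append(low7 | 0x80)  # more bytes follow
           else:
               out.append(low7)         # last byte: high bit clear
               return bytes(out)

For example, ``encode_varint(128).hex(" ")`` yields ``80 01``, matching the
table row for 128.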

**Decoding:**

Read bytes one at a time.  For each byte, take the low 7 bits and OR them into
the accumulator at the current bit position (starting at 0).  Advance the bit
position by 7.  If bit 7 of the byte is set, continue reading; otherwise stop.

.. code-block:: python

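   def decode_varint(buf: bytes, offset: int = 0):
       """Decode one ULEB128 value; return (value, next_offset)."""
       result, shift = 0, 0
       while True:
           byte = buf[offset]  # IndexError here means a truncated varint
           offset += 1
           result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
           shift += 7
           if not (byte & 0x80):  # high bit clear marks the last byte
               return result, offset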

3.2 UTF-8 strings
=================

All text is UTF-8.  Strings stored inline (e.g. in JSON members) are standard
JSON strings.  Strings used by binary members such as ``scope_tree.bin`` are
stored once in ``strings.bin`` and referenced by their **string-table index**
(a varint).

-----------

********************
4. manifest.json
********************

A JSON object with the following fields.  All of them are present in every
file; readers must ignore unknown fields for forward compatibility:

.. code-block:: json

   {
     "format":          "NCDB",
     "version":         "1.0",
     "ucis_version":    "1.0",
     "created":         "2026-02-25T21:00:00Z",
     "path_separator":  "/",
     "scope_count":     42,
     "coveritem_count": 8800,
     "test_count":      64,
     "total_hits":      155432,
     "covered_bins":    7312,
     "schema_hash":     "sha256:a3f1...",
     "generator":       "pyucis-ncdb"
   }

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Field
     - Description
   * - ``format``
     - Always the string ``"NCDB"``.  Readers must reject files where this
       is not ``"NCDB"``.
   * - ``version``
     - Format version string.  Currently ``"1.0"``.  Readers should check
       the major component; a mismatch should produce a clear error.
   * - ``ucis_version``
     - UCIS standard version the data conforms to.  Currently ``"1.0"``.
   * - ``created``
     - ISO 8601 UTC timestamp when the file was written.
   * - ``path_separator``
     - Hierarchical path separator used in scope names.  Typically ``"/"``.
   * - ``scope_count``
     - Total number of scopes in ``scope_tree.bin`` (informational).
   * - ``coveritem_count``
     - Total number of coveritems.  Must equal the length of the array in
       ``counts.bin``.
   * - ``test_count``
     - Number of TEST-kind entries in ``history.json``.
   * - ``total_hits``
     - Sum of all values in ``counts.bin``.
   * - ``covered_bins``
     - Number of non-zero values in ``counts.bin``.
   * - ``schema_hash``
     - ``"sha256:"`` followed by the lowercase hex SHA-256 digest of the
       **uncompressed** ``scope_tree.bin`` content.  Used by the fast-merge
       path to verify schema identity without parsing the scope tree.
       (See :ref:`ncdb-merge`.)
   * - ``generator``
     - Free-form tool identification string.
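
The derived manifest fields follow directly from the definitions in the table.
The sketch below shows one way a writer might compute them; the helper names
``schema_hash`` and ``count_stats`` are hypothetical.

.. code-block:: python

   import hashlib

   def schema_hash(scope_tree_bytes: bytes) -> str:
       """Hash the uncompressed scope_tree.bin content."""
       return "sha256:" + hashlib.sha256(scope_tree_bytes).hexdigest()

   def count_stats(counts):
       """Compute the manifest statistics that depend on counts.bin."""
       return {
           "coveritem_count": len(counts),
           "total_hits": sum(counts),
           "covered_bins": sum(1 for c in counts if c != 0),
       }

A reader can run the same functions on a loaded file to check that the
manifest is consistent with the other members.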

-----------

********************
5. strings.bin
********************

A deduplicated string table.  Every string used anywhere in ``scope_tree.bin``
(scope names, coveritem names) is stored exactly once here and referenced by a
zero-based integer index.

**Binary layout:**

.. code-block:: text

   [count   : varint]          — number of strings
   [len_0   : varint]          — byte length of string 0 (UTF-8 encoded)
   [bytes_0 : len_0 bytes]     — UTF-8 bytes of string 0
   [len_1   : varint]
   [bytes_1 : len_1 bytes]
   ...

* **Index 0** is always the empty string ``""``.
* String indices are stable: the same string always maps to the same index
  within a single file (indices are assigned in first-encounter DFS order).
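
As an illustration of the layout and the index rules, here is a small writer
sketch.  The class ``StringTable`` and its methods are hypothetical names; the
caller is assumed to intern strings in DFS order so that indices follow the
first-encounter rule.

.. code-block:: python

   def encode_varint(value: int) -> bytes:
       """Unsigned LEB128 encoder (see section 3.1)."""
       out = bytearray()
       while True:
           low7 = value & 0x7F
           value >>= 7
           if value:
               out.append(low7 | 0x80)
           else:
               out.append(low7)
               return bytes(out)

   class StringTable:
       """Deduplicating string table; index 0 is always the empty string."""

       def __init__(self):
           self._index = {"": 0}
           self._strings = [""]

       def intern(self, text: str) -> int:
           """Return the index of ``text``, adding it on first encounter."""
           idx = self._index.get(text)
           if idx is None:
               idx = len(self._strings)
               self._index[text] = idx
               self._strings.append(text)
           return idx

       def to_bytes(self) -> bytes:
           """Serialize as [count][len_0][bytes_0][len_1][bytes_1]..."""
           out = bytearray(encode_varint(len(self._strings)))
           for text in self._strings:
               raw = text.encode("utf-8")
               out += encode_varint(len(raw))  # byte length, not char count
               out += raw
           return bytes(out)

Note that the length prefix counts UTF-8 bytes, so non-ASCII names occupy more
bytes than they have characters.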

-----------

************************
6. scope_tree.bin
************************

The complete scope hierarchy encoded as a depth-first traversal.  The file
contains a flat sequence of scope records with no explicit end marker; the
count of child scopes embedded in each record defines the nesting.

Counts (hit values) are **not** stored in this member.  Instead, each
coveritem encountered during DFS appends its hit count to ``counts.bin`` in
the same traversal order.  A reader reconstructs the association by walking
``scope_tree.bin`` and consuming counts from ``counts.bin`` in lockstep.

6.1 Scope record types
=======================

Every scope record begins with a one-byte **marker**:

.. list-table::
   :header-rows: 1
   :widths: 15 20 65

   * - Marker byte
     - Name
     - Description
   * - ``0x00``
     - ``REGULAR``
     - Full scope record with type, name, presence bitfield, and children.
   * - ``0x01``
     - ``TOGGLE_PAIR``
     - Compact 2-field record for BRANCH scopes that carry exactly two
       TOGGLEBIN coveritems with the implicit names ``"0 -> 1"`` and
       ``"1 -> 0"``.  Saves ~10 bytes per signal.

6.2 REGULAR scope record
=========================

.. code-block:: text

   [marker    : 1 byte  ]  always 0x00
   [scope_type: varint  ]  ScopeTypeT integer value
   [name_ref  : varint  ]  index into strings.bin
   [presence  : varint  ]  bitfield of optional fields present (see below)

   — optional fields, each present only if the corresponding bit is set —
   [flags       : varint  ]  only if PRESENCE_FLAGS       (bit 0) set
   [file_id     : varint  ]  only if PRESENCE_SOURCE      (bit 1) set
   [line        : varint  ]     "
   [token       : varint  ]     "
   [weight      : varint  ]  only if PRESENCE_WEIGHT      (bit 2) set
   [at_least    : varint  ]  only if PRESENCE_AT_LEAST    (bit 3) set
   [goal        : varint  ]  only if PRESENCE_GOAL        (bit 5) set
   [source_type : varint  ]  only if PRESENCE_SOURCE_TYPE (bit 6) set

   — always present —
   [num_children : varint]  number of child scope records that follow
   [num_covers   : varint]  number of coveritem records that follow

   — present only when num_covers > 0 —
   [cover_type   : varint]  CoverTypeT of all coveritems in this scope

   — num_covers coveritem records —
   [name_ref_ci  : varint]  × num_covers   (one per coveritem)

   — num_children child scope records (recursive) —

**Presence bitfield values:**

.. list-table::
   :header-rows: 1
   :widths: 10 20 70

   * - Bit
     - Name
     - Meaning
   * - 0
     - ``PRESENCE_FLAGS``
     - Non-default scope flags are stored.
   * - 1
     - ``PRESENCE_SOURCE``
     - Source location (``file_id``, ``line``, ``token``) is stored.
   * - 2
     - ``PRESENCE_WEIGHT``
     - Non-default scope weight (≠ 1) is stored.
   * - 3
     - ``PRESENCE_AT_LEAST``
     - An ``at_least`` threshold that overrides the cover-type default is
       stored at the scope level (applies to all coveritems in the scope).
   * - 4
     - ``PRESENCE_CVG_OPTS``
     - Reserved for covergroup options (not yet used by the writer).
   * - 5
     - ``PRESENCE_GOAL``
     - Non-default scope goal (≠ −1) is stored.
   * - 6
     - ``PRESENCE_SOURCE_TYPE``
     - Explicit ``SourceT`` enum value is stored.  When absent, the source
       type defaults to ``SourceT.NONE``.

**Cover-type defaults** (the value assumed when no explicit value is stored,
for example ``at_least`` when ``PRESENCE_AT_LEAST`` is clear):

.. list-table::
   :header-rows: 1
   :widths: 30 15 15 15

   * - CoverTypeT
     - flags default
     - at_least default
     - weight default
   * - ``CVGBIN``
     - ``0x19``
     - **1**
     - 1
   * - All others (TOGGLEBIN, STMTBIN, BRANCHBIN, …)
     - ``0x01``
     - 0
     - 1

6.3 TOGGLE_PAIR record
=======================

.. code-block:: text

   [marker   : 1 byte ]  always 0x01
   [name_ref : varint ]  scope name index in strings.bin

A TOGGLE_PAIR record implicitly encodes:

* Scope type: ``BRANCH``
* Two TOGGLEBIN coveritems with names ``"0 -> 1"`` and ``"1 -> 0"`` (in that
  order).
* Two consecutive entries are consumed from ``counts.bin``: first the
  ``"0 -> 1"`` count, then the ``"1 -> 0"`` count.

No child scope records follow a TOGGLE_PAIR.
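
The two record layouts can be decoded with a short recursive parser.  The
sketch below is illustrative only: the function names and the dictionary
layout are hypothetical, and it assumes that a scope's own coveritems consume
entries from ``counts.bin`` before those of its child scopes, mirroring the
order of the fields in the record.

.. code-block:: python

   def read_varint(buf, pos):
       """Decode one unsigned LEB128 value; return (value, next_pos)."""
       result = shift = 0
       while True:
           byte = buf[pos]
           pos += 1
           result |= (byte & 0x7F) << shift
           shift += 7
           if not byte & 0x80:
               return result, pos

   # (presence bit, fields stored when that bit is set), in on-disk order.
   OPTIONAL_FIELDS = (
       (0, ("flags",)),
       (1, ("file_id", "line", "token")),
       (2, ("weight",)),
       (3, ("at_least",)),
       (5, ("goal",)),
       (6, ("source_type",)),
   )

   def parse_scope(buf, pos, count_idx):
       """Parse one scope record; return (scope, next_pos, next_count_idx)."""
       marker = buf[pos]
       pos += 1
       if marker == 0x01:  # TOGGLE_PAIR: implicit BRANCH scope with two bins
           name_ref, pos = read_varint(buf, pos)
           scope = {"marker": "TOGGLE_PAIR", "name_ref": name_ref,
                    "cover_names": ["0 -> 1", "1 -> 0"],
                    "first_count": count_idx, "children": []}
           return scope, pos, count_idx + 2
       if marker != 0x00:
           raise ValueError(f"unknown scope marker 0x{marker:02x}")

       scope = {"marker": "REGULAR", "first_count": None,
                "cover_name_refs": [], "children": []}
       for field in ("scope_type", "name_ref", "presence"):
           scope[field], pos = read_varint(buf, pos)
       for bit, fields in OPTIONAL_FIELDS:
           if scope["presence"] & (1 << bit):
               for field in fields:
                   scope[field], pos = read_varint(buf, pos)
       num_children, pos = read_varint(buf, pos)
       num_covers, pos = read_varint(buf, pos)
       if num_covers:
           scope["cover_type"], pos = read_varint(buf, pos)
           scope["first_count"] = count_idx  # index into counts.bin
           for _ in range(num_covers):
               ref, pos = read_varint(buf, pos)
               scope["cover_name_refs"].append(ref)
           count_idx += num_covers
       for _ in range(num_children):
           child, pos, count_idx = parse_scope(buf, pos, count_idx)
           scope["children"].append(child)
       return scope, pos, count_idx

   def parse_scope_tree(buf):
       """Parse top-level records until the buffer is exhausted."""
       roots, pos, count_idx = [], 0, 0
       while pos < len(buf):
           scope, pos, count_idx = parse_scope(buf, pos, count_idx)
           roots.append(scope)
       return roots, count_idx

The second return value of ``parse_scope_tree`` is the total number of
coveritems seen, which a reader can compare with ``coveritem_count`` in the
manifest.  Very deep hierarchies may need an explicit stack instead of
recursion.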

6.4 Scope-type integer values
==============================

The ``scope_type`` varint uses the integer values of ``ScopeTypeT``.
The most common values are:

.. list-table::
   :header-rows: 1
   :widths: 15 45 40

   * - Value
     - ScopeTypeT name
     - Typical context
   * - 2
     - ``DU_MODULE``
     - Design-unit scope for a Verilog module
   * - 16
     - ``INSTANCE``
     - Instantiation of a design unit
   * - 22
     - ``COVERGROUP``
     - SystemVerilog covergroup type or instance
   * - 23
     - ``COVERPOINT``
     - SystemVerilog coverpoint
   * - 28
     - ``CROSS``
     - SystemVerilog cross
   * - 30
     - ``BRANCH``
     - Code-coverage branch (toggle pair or regular)
   * - 32
     - ``TOGGLE``
     - Toggle scope (parent of BRANCH scopes)
   * - 33
     - ``FSM``
     - Finite state machine
   * - 36
     - ``BLOCK``
     - Statement block

The full set of values is defined in ``ucis/scope_type_t.py``.

-----------

********************
7. counts.bin
********************

A flat array of non-negative integers, one per coveritem, in the **same DFS
order** as the coveritems encountered while reading ``scope_tree.bin``.  TOGGLE_PAIR
scopes contribute two consecutive counts (``"0 -> 1"`` then ``"1 -> 0"``).

The array length is given by ``coveritem_count`` in ``manifest.json``.

7.1 Binary layout
==================

.. code-block:: text

   [mode  : 1 byte ]  0 = UINT32, 1 = VARINT
   [count : varint ]  number of integers that follow
   [data  : …      ]  mode-dependent encoding (see below)

**Mode 0 — UINT32:**
Each integer is a 4-byte little-endian unsigned 32-bit value.  Used when
most counts are large (i.e. varint encoding would not save space).

.. code-block:: text

   [v_0 : 4 bytes LE] [v_1 : 4 bytes LE] … [v_{n-1} : 4 bytes LE]

**Mode 1 — VARINT:**
Each integer is encoded as an unsigned LEB128 varint
(see :ref:`ncdb-varint`).  Used when most counts are small (0–127), which is
the common case for per-test databases.

.. code-block:: text

   [varint_0] [varint_1] … [varint_{n-1}]

**Mode selection:** The writer computes both encodings and selects VARINT when
``len(varint_encoding) < count × 4`` (i.e. when it is strictly smaller),
falling back to UINT32 otherwise.  A reader must support both modes.
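
The mode-selection rule can be sketched as follows.  The helper names are
illustrative, and the sketch assumes every count fits in 32 bits whenever
UINT32 mode is chosen; the text above does not say how larger values are
handled in that case.

.. code-block:: python

   import struct

   def encode_varint(value: int) -> bytes:
       """Unsigned LEB128 encoder (see section 3.1)."""
       out = bytearray()
       while True:
           low7 = value & 0x7F
           value >>= 7
           if value:
               out.append(low7 | 0x80)
           else:
               out.append(low7)
               return bytes(out)

   def encode_counts(counts):
       """Serialize hit counts as [mode][count][data]."""
       n = len(counts)
       varint_data = b"".join(encode_varint(c) for c in counts)
       if len(varint_data) < n * 4:
           mode, data = 1, varint_data  # VARINT is strictly smaller
       else:
           # UINT32 fallback; struct.pack raises struct.error if a value
           # does not fit in 32 bits (assumption: this does not occur).
           mode, data = 0, struct.pack(f"<{n}I", *counts)
       return bytes([mode]) + encode_varint(n) + data

With small per-test counts such as ``[0, 1, 5]`` the result is five bytes: the
mode byte, the count, and one byte per value.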

7.2 Efficient single-byte fast path
=====================================

When mode is VARINT and all values fit in a single byte (0–127), each byte in
the data section is equal to the corresponding count value (the high bit is
never set).  A parser can exploit this: scan the data section for any byte
≥ 0x80; if none are found, each byte *is* its value, and the entire section
can be decoded with a single ``bytes → list`` conversion.

-----------

********************
8. history.json
********************

A JSON array of history node records.  Each element represents either a test
run (``kind: "TEST"``) or a merge operation (``kind: "MERGE"``).

**Record schema:**

.. code-block:: json

   [
     {
       "logical_name":  "regression_seed_42",
       "physical_name": null,
       "kind":          "TEST",
       "test_status":   0,
       "tool_category": "sim",
       "date":          "2026-02-25",
       "sim_time":      1500.0,
       "time_unit":     "ns",
       "run_cwd":       "/home/user/sim",
       "cpu_time":      12.3,
       "seed":          "42",
       "cmd":           "vsim -seed 42 top",
       "args":          "",
       "compulsory":    null,
       "user_name":     "jsmith",
       "cost":          0.0,
       "ucis_version":  null,
       "vendor_id":     null,
       "vendor_tool":   null,
       "vendor_tool_version": null,
       "same_tests":    null,
       "comment":       null
     }
   ]

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Field
     - Type
     - Description
   * - ``logical_name``
     - string
     - Unique name for this history node (test name or merge label).
   * - ``physical_name``
     - string | null
     - Physical file name associated with the history node, or ``null``.
   * - ``kind``
     - ``"TEST"`` | ``"MERGE"``
     - History node kind.
   * - ``test_status``
     - integer
     - Test status code: 0 = OK, 1 = WARNING, 2 = ERROR, 3 = FATAL,
       4 = NOTRUN.
   * - ``tool_category``
     - string
     - Free-form tool category (e.g. ``"sim"``, ``"formal"``).
   * - ``date``
     - string
     - Date string (ISO 8601 recommended).
   * - ``sim_time``
     - number
     - Simulation end time in ``time_unit`` units.
   * - ``time_unit``
     - string
     - Simulation time unit (e.g. ``"ns"``, ``"ps"``).
   * - ``run_cwd``
     - string
     - Working directory of the simulation run.
   * - ``cpu_time``
     - number
     - CPU seconds consumed.
   * - ``seed``
     - string
     - Random seed used.
   * - ``cmd``
     - string
     - Simulator command line.
   * - ``args``
     - string
     - Additional arguments.
   * - ``compulsory``
     - any | null
     - Compulsory flag (tool-defined), or ``null`` if unset.
   * - ``user_name``
     - string
     - Username that ran the simulation.
   * - ``cost``
     - number
     - Simulation cost (tool-defined).
   * - ``ucis_version``
     - string | null
     - UCIS version associated with this history node, or ``null``.
   * - ``vendor_id``
     - string | null
     - Vendor identifier, or ``null``.
   * - ``vendor_tool``
     - string | null
     - Vendor tool name, or ``null``.
   * - ``vendor_tool_version``
     - string | null
     - Vendor tool version, or ``null``.
   * - ``same_tests``
     - integer | null
     - Number of identical tests merged, or ``null``.
   * - ``comment``
     - string | null
     - Free-form comment, or ``null``.

-----------

********************
9. sources.json
********************

A JSON array of strings, where each element is an absolute or relative file
path.  The position of each path in the array is its **file ID**, which is the
integer used as ``file_id`` in ``scope_tree.bin`` source references.

.. code-block:: json

   [
     "/home/user/design/top.sv",
     "/home/user/design/alu.sv",
     "/home/user/tb/coverage_pkg.sv"
   ]

File ID 0 corresponds to the first element.  An empty ``sources.json`` (``[]``)
is valid when no source information was recorded.

-----------

.. _ncdb-merge:

**************************
10. Merging NCDB files
**************************

The key performance advantage of NCDB over SQLite is the **same-schema fast
merge path**, which reduces a multi-file merge to element-wise integer addition.

10.1 Same-schema fast merge
============================

Two NCDB files are *schema-compatible* if and only if their ``schema_hash``
values are equal.  The ``schema_hash`` is ``"sha256:"`` followed by the
SHA-256 digest of the uncompressed ``scope_tree.bin`` bytes; equal hashes
guarantee an identical scope hierarchy and coveritem ordering.

**Algorithm for merging N same-schema files into one output file:**

1. Read ``manifest.json`` from all N sources.  Verify ``schema_hash`` is
   identical for all; if not, fall back to the cross-schema path.
2. Read ``counts.bin`` from all N sources → N lists of integers.
3. Compute the merged count array: element-wise sum of all N lists.
   (In Python: ``list(map(sum, zip(*all_counts)))``)
4. Concatenate all ``history.json`` arrays from all sources.  Append a new
   MERGE history node that references all source names.
5. Copy ``strings.bin``, ``scope_tree.bin``, and ``sources.json`` verbatim
   from the first source (they are identical for same-schema files).
6. Write the output ZIP with the merged manifest, the copied schema members,
   the merged ``counts.bin``, and the combined ``history.json``.

The scope tree and string table never need to be decoded for a same-schema
merge.
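
The steps above can be sketched on already-decoded data.  The function below
is an illustration under stated assumptions, not the PyUCIS merger: each
source is assumed to be a dictionary with ``manifest``, ``counts``, and
``history`` keys (the shape returned by the example reader in section 14), and
the manifest statistics are recomputed from their definitions in section 4.
Writing the output ZIP and copying the schema members are left out.

.. code-block:: python

   def merge_same_schema(sources, merge_name):
       """Merge decoded NCDB files that share one schema_hash."""
       if not sources:
           raise ValueError("nothing to merge")
       hashes = {src["manifest"]["schema_hash"] for src in sources}
       if len(hashes) != 1:
           raise ValueError("schema_hash differs; use the cross-schema path")
       if len({len(src["counts"]) for src in sources}) != 1:
           raise ValueError("counts arrays differ in length")

       # Step 3: element-wise sum over all count arrays.
       counts = [sum(col) for col in zip(*(src["counts"] for src in sources))]

       # Step 4: concatenate histories and append a MERGE node.
       # Only a subset of the history fields is filled in here.
       history = [rec for src in sources for rec in src["history"]]
       history.append({
           "logical_name": merge_name,
           "physical_name": None,
           "kind": "MERGE",
           "test_status": 0,
           "tool_category": "merge",
       })

       # Recompute the statistics that depend on counts and history.
       manifest = dict(sources[0]["manifest"])
       manifest["test_count"] = sum(1 for r in history if r.get("kind") == "TEST")
       manifest["total_hits"] = sum(counts)
       manifest["covered_bins"] = sum(1 for c in counts if c)
       return {"manifest": manifest, "counts": counts, "history": history}

Because only ``counts.bin`` and the two JSON members are touched, the cost of
this path grows with the number of coveritems and sources, not with the
complexity of the scope hierarchy.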

10.2 Cross-schema merge
========================

When the schema hashes differ, the merger must parse both scope trees, match
scopes by ``(path, type, name)`` key, and add counts for matched coveritems.
Unmatched coveritems from either source are appended with their original
counts.  This path is slower but correct for merging databases from designs
that have evolved between runs.

10.3 Merge history node
========================

A merge operation appends a ``"MERGE"``-kind history node to ``history.json``:

.. code-block:: json

   {
     "logical_name": "merge:output.cdb",
     "physical_name": null,
     "kind":   "MERGE",
     "test_status": 0,
     "tool_category": "merge",
     "date":   "2026-02-25T21:00:00Z"
   }

-----------

*******************************
11. Optional binary members
*******************************

These members are omitted from the archive when the corresponding data is
absent or all-default.  Readers must silently skip any optional member they
do not support, and must not fail if an expected optional member is absent.

11.1 attrs.bin
==============

User-defined attribute assignments.  Despite the ``.bin`` extension, this
member is JSON-encoded.

**Format v2** (current):

.. code-block:: json

   {
     "version": 2,
     "scopes": [
       {"idx": 0, "attrs": {"key": "value"}}
     ],
     "coveritems": [
       {"scope_idx": 0, "ci_idx": 1, "attrs": {"key": "value"}}
     ],
     "history": [
       {"idx": 0, "kind": "TEST", "attrs": {"key": "value"}}
     ],
     "global": {"key": "value"}
   }

``idx`` / ``scope_idx`` values are DFS scope indices (same ordering as
``scope_tree.bin``).  ``ci_idx`` is the zero-based coveritem position
within its parent scope.  Only objects with at least one attribute are
included (sparse).  The reader also accepts legacy **v1** files that store
only scope-level attributes.

11.2 tags.json
==============

Tag assignments for scopes (sparse, DFS-indexed).

.. code-block:: json

   {
     "version": 1,
     "entries": [
       {"idx": 0, "tags": ["tag_a", "tag_b"]}
     ]
   }

``idx`` is the DFS scope index.  Only scopes with at least one tag are
included.

11.3 toggle.bin
================

Per-signal toggle metadata for ``TOGGLE``-type scopes.  Despite the ``.bin``
extension, this member is JSON-encoded.

.. code-block:: json

   {
     "version": 1,
     "entries": [
       {"idx": 5, "canonical": "top.clk", "metric": 0, "type": 1, "dir": 2}
     ]
   }

``idx`` is the DFS scope index.  All fields except ``idx`` are optional and
are omitted when they match the defaults (``metric`` = ``ToggleMetricT._2STOGGLE``,
``type`` = ``ToggleTypeT.NET``, ``dir`` = ``ToggleDirT.INTERNAL``).  Only
``TOGGLE`` scopes with at least one non-default value are included.

11.4 fsm.bin
=============

FSM state-index overrides for ``FSM``-type scopes.  Despite the ``.bin``
extension, this member is JSON-encoded.  State and transition names are
already stored in ``scope_tree.bin`` as FSMBIN coveritems under FSM_STATES
and FSM_TRANS sub-scopes; this member only records non-sequential state
indices.

.. code-block:: json

   {
     "version": 1,
     "entries": [
       {"fsm_idx": 3, "states": [{"name": "IDLE", "index": 5}]}
     ]
   }

``fsm_idx`` is the DFS scope index of the ``FSM`` scope.  Only FSM scopes
whose state indices differ from the default 0, 1, 2, … sequence are included.
The member is omitted entirely when all indices are sequential.

11.5 cross.bin
===============

Cross-coverpoint link records for ``CROSS``-type scopes.  Despite the
``.bin`` extension, this member is JSON-encoded.

.. code-block:: json

   {
     "version": 1,
     "entries": [
       {"idx": 12, "crossed": ["cp_a", "cp_b"]}
     ]
   }

``idx`` is the DFS scope index of the ``CROSS`` scope.  ``crossed`` lists
the ``getScopeName()`` values of each crossed coverpoint (sibling scopes
within the same parent COVERGROUP/COVERINSTANCE).

11.6 properties.json
=====================

Typed string property values for scopes (DFS-indexed).

.. code-block:: json

   {
     "version": 1,
     "entries": [
       {"kind": "scope", "idx": 0, "key": 1, "type": "str", "value": "comment text"}
     ]
   }

``key`` is the integer value of the ``StrProperty`` enum.  Only scopes with
explicitly-set properties are included.

11.7 design_units.json
=======================

Design-unit name-to-DFS-index lookup table.

.. code-block:: json

   {
     "version": 1,
     "units": [
       {"name": "top", "idx": 0, "type": 2}
     ]
   }

``type`` is the integer value of ``ScopeTypeT`` (e.g. 2 = ``DU_MODULE``).
Only design-unit scopes (those whose type matches the ``DU_ANY`` mask) are
included.  The member is omitted when no design units are present.

11.8 formal.bin
================

Formal-verification assertion data.  Despite the ``.bin`` extension, this
member is JSON-encoded.

.. code-block:: json

   {
     "version": 1,
     "entries": [
       {"idx": 42, "status": 1, "radius": 100, "witness": "/path/to/witness.vcd"}
     ]
   }

``idx`` is the flat DFS coveritem index (same ordering as ``counts.bin``).
The fields ``status``, ``radius``, and ``witness`` are each omitted when they
match their defaults: ``status`` = ``FormalStatusT.NONE`` (0), ``radius`` = 0,
and ``witness`` = ``null``.

11.9 coveritem_flags.bin
=========================

Per-coveritem non-default flags.  This member uses a true binary encoding
(sparse, delta-encoded varint pairs).

.. code-block:: text

   [version      : varint]  always 1
   [num_entries  : varint]  number of (index, flags) pairs
   per entry:
       [delta_idx : varint]  coveritem DFS index delta from previous entry
       [flags     : varint]  ucisFlagsT value

Only coveritems whose flags differ from the cover-type default (see
cover-type defaults table in Section 6.2) are included.  The member is
omitted entirely when all coveritems use default flags.
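
A decoder for this layout might look like the sketch below.  It assumes that
the first ``delta_idx`` is relative to index 0 and that later deltas
accumulate, which the layout above implies but does not state; the function
names are hypothetical.

.. code-block:: python

   def read_varint(buf, pos):
       """Decode one unsigned LEB128 value; return (value, next_pos)."""
       result = shift = 0
       while True:
           byte = buf[pos]
           pos += 1
           result |= (byte & 0x7F) << shift
           shift += 7
           if not byte & 0x80:
               return result, pos

   def decode_coveritem_flags(buf):
       """Return {coveritem DFS index: flags} for non-default entries."""
       pos = 0
       version, pos = read_varint(buf, pos)
       if version != 1:
           raise ValueError(f"unsupported coveritem_flags version {version}")
       num_entries, pos = read_varint(buf, pos)
       flags_by_index = {}
       index = 0  # assumption: the first delta is relative to 0
       for _ in range(num_entries):
           delta, pos = read_varint(buf, pos)
           flags, pos = read_varint(buf, pos)
           index += delta
           flags_by_index[index] = flags
       return flags_by_index

Coveritems missing from the returned mapping keep the default flags for their
cover type.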

11.10 contrib/<hist_idx>.bin
=============================

Per-test contribution arrays.  One file per history node that recorded
contributions; ``<hist_idx>`` is the integer history-node index (not
zero-padded).  Each file encodes a sparse, delta-encoded array of per-test
hit counts, allowing reconstruction of which tests hit which bins.

.. code-block:: text

   [num_entries      : varint]
   per entry (sorted by bin_index, ascending):
       [delta_bin_index : varint]  bin_index − previous bin_index
       [count           : varint]  hit count for this bin from this test
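
On the writer side, the same layout can be produced from a dense per-test
count list.  This sketch makes three assumptions that the layout does not
spell out: ``bin_index`` is the flat index used by ``counts.bin``, bins with a
zero count are left out, and the first delta is relative to index 0.  The
function names are illustrative.

.. code-block:: python

   def encode_varint(value: int) -> bytes:
       """Unsigned LEB128 encoder (see section 3.1)."""
       out = bytearray()
       while True:
           low7 = value & 0x7F
           value >>= 7
           if value:
               out.append(low7 | 0x80)
           else:
               out.append(low7)
               return bytes(out)

   def encode_contrib(per_test_counts):
       """Encode one test's hit counts as sparse (delta, count) pairs."""
       entries = [(i, c) for i, c in enumerate(per_test_counts) if c]
       out = bytearray(encode_varint(len(entries)))
       previous = 0
       for bin_index, count in entries:  # already ascending by bin_index
           out += encode_varint(bin_index - previous)
           out += encode_varint(count)
           previous = bin_index
       return bytes(out)

Delta encoding keeps most index values in a single byte even for large
databases, because neighbouring hit bins tend to be close together.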

-----------

***********************
12. Version history
***********************

.. list-table::
   :header-rows: 1
   :widths: 15 85

   * - Version
     - Changes
   * - ``1.0``
     - Initial release.  Scope-tree V2 encoding with presence bitfield and
       TOGGLE_PAIR optimization.  Varint + UINT32 dual-mode counts encoding.
       Same-schema fast-merge path via ``schema_hash``.
   * - ``1.0`` (UCIS compliance update)
     - Added presence bits 4–6 (``PRESENCE_CVG_OPTS``, ``PRESENCE_GOAL``,
       ``PRESENCE_SOURCE_TYPE``) to scope records.  Added
       ``coveritem_flags.bin`` member for per-coveritem non-default flags.
       Updated ``history.json`` to UCIS-compliant field names
       (``logical_name``, ``physical_name``, ``test_status``, ``sim_time``,
       ``time_unit``, ``run_cwd``, ``cpu_time``, ``user_name``) and added
       vendor/tool fields (``ucis_version``, ``vendor_id``, ``vendor_tool``,
       ``vendor_tool_version``, ``same_tests``, ``comment``, ``compulsory``).
       Upgraded ``attrs.bin`` to V2 format with sections for scopes,
       coveritems, history nodes, and global attributes.  Updated
       cover-type default flags to ``0x01`` (most types) / ``0x19``
       (``CVGBIN``).  Documented ``cross.bin``, ``properties.json``,
       ``design_units.json``, ``formal.bin``, and ``contrib/`` formats.

-----------

.. _ncdb-benchmarks:

*************************************
13. Size and performance reference
*************************************

Measurements using synthetic BM1–BM6 benchmark databases (pure Python,
no C accelerator, median of 3 merge runs):

.. list-table::
   :header-rows: 1
   :widths: 20 10 16 12 16 12 14

   * - Workload
     - Bins
     - SQLite/test
     - NCDB/test
     - Size ratio
     - SQLite merge
     - NCDB merge
   * - BM1 Counter
     - 5
     - 276 KB
     - 1.3 KB
     - **209×**
     - 22 ms
     - 1.2 ms
   * - BM2 ALU
     - 104
     - 276 KB
     - 1.4 KB
     - **196×**
     - 24 ms
     - 1.7 ms
   * - BM3 Protocol
     - 180
     - 276 KB
     - 1.4 KB
     - **195×**
     - 29 ms
     - 3.5 ms
   * - BM4 Hierarchy
     - 117
     - 276 KB
     - 1.4 KB
     - **195×**
     - 28 ms
     - 4.0 ms
   * - BM5 Bins (8K)
     - 8 800
     - 276 KB
     - 2.3 KB
     - **122×**
     - 40 ms
     - 17 ms
   * - BM6 SoC
     - 256
     - 276 KB
     - 1.4 KB
     - **192×**
     - 72 ms
     - 12 ms

*Number of per-seed test databases merged in each workload: BM1=4, BM2=16,
BM3=32, BM4=32, BM5=64, BM6=128.*

The SQLite per-test size is dominated by the fixed B-tree page overhead
(minimum 276 KB regardless of design size).  NCDB scales with actual data:
a design with 5 bins uses only 1.3 KB.

With a C accelerator for varint encode/decode, BM5 merge time is projected
to drop to ~5 ms (roughly 8× faster than the 40 ms SQLite merge).

-----------

*****************************
14. Implementing a reader
*****************************

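The following minimal reader decodes the manifest, string table, hit counts,
history, and source list without PyUCIS.  It does not parse
``scope_tree.bin``, so the counts are returned as a flat list in DFS order: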

.. code-block:: python

   import json, struct, zipfile

   def read_varint(data, offset):
       result, shift = 0, 0
       while True:
           b = data[offset]; offset += 1
           result |= (b & 0x7F) << shift
           shift += 7
           if not (b & 0x80):
               return result, offset

   def read_ncdb(path):
       with zipfile.ZipFile(path) as zf:
           manifest = json.loads(zf.read("manifest.json"))
           if manifest.get("format") != "NCDB":
               raise ValueError(f"{path}: not an NCDB archive")

           strings_raw = zf.read("strings.bin")
           counts_raw  = zf.read("counts.bin")
           history     = json.loads(zf.read("history.json"))
           sources     = json.loads(zf.read("sources.json"))

       # Decode string table
       offset = 0
       n_strings, offset = read_varint(strings_raw, offset)
       strings = []
       for _ in range(n_strings):
           length, offset = read_varint(strings_raw, offset)
           strings.append(strings_raw[offset:offset+length].decode("utf-8"))
           offset += length

       # Decode counts
       mode = counts_raw[0]; offset = 1
       n_counts, offset = read_varint(counts_raw, offset)
       counts = []
       if mode == 1:  # VARINT
           # Fast path: all single-byte values
           payload = counts_raw[offset:offset + n_counts]
           if len(payload) == n_counts and all(b < 0x80 for b in payload):
               counts = list(payload)
           else:
               for _ in range(n_counts):
                   v, offset = read_varint(counts_raw, offset)
                   counts.append(v)
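       elif mode == 0:  # UINT32: fixed-width little-endian values
           counts = list(struct.unpack_from(f"<{n_counts}I", counts_raw, offset))
       else:
           raise ValueError(f"unknown counts.bin mode byte: {mode}")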

       return {
           "manifest": manifest,
           "strings":  strings,
           "counts":   counts,
           "history":  history,
           "sources":  sources,
       }

.. seealso::

   * :doc:`sqlite-schema` — SQLite backend schema reference
   * :doc:`xml-interchange` — XML interchange format
   * :ref:`working-with-coverage-merging` — How to merge databases using the CLI

-----------

.. _ncdb-format-v2-history:

*************************
7. V2 binary test history
*************************

When ``manifest.json`` contains ``"history_format": "v2"``, the archive holds
six additional kinds of binary member under ``history/``.  All integers are
**little-endian** unless noted.

7.1 ``history/test_registry.bin``
==================================

Maps stable integer IDs to test names and seed strings.  IDs are assigned in
insertion order and are never reassigned.

.. code-block:: none

    Header (17 bytes):
      magic       u32   0x54524547  ('TREG')
      version     u8    1
      next_run_id u32   monotonically-increasing run counter
      num_names   u32
      num_seeds   u32

    Offset tables (immediately after header):
      name_offsets  u32[num_names]  byte offset into name heap
      seed_offsets  u32[num_seeds]  byte offset into seed heap

    Heaps (NUL-terminated UTF-8 strings):
      name_heap  NUL-terminated strings in name_id order
      seed_heap  NUL-terminated strings in seed_id order
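
The layout above can be parsed with the standard ``struct`` module.  The
following is a minimal sketch, not the PyUCIS implementation:
``parse_test_registry`` and ``_read_cstr`` are hypothetical names, and the
sketch assumes that offsets are relative to the start of their heap and that
the name heap is tightly packed, so the seed heap begins right after the last
name's NUL terminator.

.. code-block:: python

   import struct

   TREG_MAGIC = 0x54524547            # 'TREG'
   # magic, version, next_run_id, num_names, num_seeds (17 bytes, no padding)
   _HEADER = struct.Struct("<IBIII")


   def _read_cstr(buf, start):
       """Return (text, index just past the NUL) for the string at start."""
       end = buf.index(b"\x00", start)
       return buf[start:end].decode("utf-8"), end + 1


   def parse_test_registry(data):
       magic, version, next_run_id, num_names, num_seeds = _HEADER.unpack_from(data, 0)
       if magic != TREG_MAGIC or version != 1:
           raise ValueError("not a version-1 test_registry.bin")

       pos = _HEADER.size
       name_offsets = struct.unpack_from(f"<{num_names}I", data, pos)
       pos += 4 * num_names
       seed_offsets = struct.unpack_from(f"<{num_seeds}I", data, pos)
       pos += 4 * num_seeds          # pos now points at the name heap

       # Assumption: the seed heap starts right after the last name's NUL.
       names, heap_end = [], pos
       for off in name_offsets:
           text, end = _read_cstr(data, pos + off)
           names.append(text)
           heap_end = max(heap_end, end)

       seeds = [_read_cstr(data, heap_end + off)[0] for off in seed_offsets]
       return {"next_run_id": next_run_id, "names": names, "seeds": seeds}

The list index of each returned name or seed is its ``name_id`` or
``seed_id``.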

7.2 ``history/test_stats.bin``
================================

One 72-byte entry per test name (indexed by name_id).

.. code-block:: none

    Header (9 bytes):
      magic      u32   0x54535453  ('TSTS')
      version    u8    1
      num_entries u32

    Entry (72 bytes, repeated num_entries times):
      name_id      u32
      total_runs   u32
      pass_count   u32
      fail_count   u32
      error_count  u32
      skip_count   u32
      timeout_count u32
      _reserved    u32   (padding, always 0)
      mean_ms      f32   Welford running mean of runtime in milliseconds
      m2_ms        f32   Welford running sum of squared deviations from the mean (variance = m2/n)
      cusum_pos    f32   CUSUM positive accumulator for change detection
      cusum_neg    f32   CUSUM negative accumulator
      _pad1        f32   (reserved, 0.0)
      _pad2        f32   (reserved, 0.0)
      _pad3        f32   (reserved, 0.0)
      flakiness_score i16  fixed-point 0–10000 representing 0.00–100.00 %
      tag          u8[6] short ASCII label (NUL-padded)
      last_status  u8    most-recent HIST_STATUS_* value
      _trailing    u8    padding
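
As an illustration of how ``mean_ms`` and ``m2_ms`` evolve, the sketch below
applies the standard Welford update for one new runtime sample and converts a
``flakiness_score`` to a percentage.  The function names are hypothetical, the
sample count ``n`` is assumed to be ``total_runs``, and Python floats are
doubles whereas the file stores ``f32`` values.

.. code-block:: python

   def welford_update(n, mean, m2, runtime_ms):
       """Fold one runtime sample into (n, mean, m2) using Welford's method."""
       n += 1
       delta = runtime_ms - mean
       mean += delta / n
       m2 += delta * (runtime_ms - mean)   # uses the already-updated mean
       return n, mean, m2


   def flakiness_percent(score):
       """Convert the fixed-point 0-10000 score to a 0.00-100.00 percentage."""
       return score / 100.0


   n, mean, m2 = 0, 0.0, 0.0
   for sample in (10.0, 20.0, 30.0):
       n, mean, m2 = welford_update(n, mean, m2, sample)
   variance = m2 / n                       # population variance, as in the table

Keeping only ``mean`` and ``m2`` lets the statistics be updated without
retaining the individual runtimes.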

7.3 ``history/bucket_index.bin``
==================================

Index over the per-bucket run-record files.

.. code-block:: none

    Header (9 bytes):
      magic       u32   0x42494458  ('BIDX')
      version     u8    1
      num_buckets u32

    Entry (28 bytes, sorted by bucket_seq):
      bucket_seq  u32
      ts_start    u32   Unix timestamp of first record in bucket
      ts_end      u32   Unix timestamp of last record in bucket
      num_records u32
      fail_count  u32
      min_name_id u32
      max_name_id u32
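
A reader can use this index to decide which bucket files to open for a given
time window.  The following sketch assumes the entries follow the header
directly; ``parse_bucket_index`` and ``buckets_in_range`` are hypothetical
helper names.

.. code-block:: python

   import struct

   BIDX_MAGIC = 0x42494458              # 'BIDX'
   _HEADER = struct.Struct("<IBI")      # magic, version, num_buckets (9 bytes)
   _ENTRY = struct.Struct("<7I")        # seven u32 fields (28 bytes)
   _FIELDS = ("bucket_seq", "ts_start", "ts_end", "num_records",
              "fail_count", "min_name_id", "max_name_id")


   def parse_bucket_index(data):
       magic, version, num_buckets = _HEADER.unpack_from(data, 0)
       if magic != BIDX_MAGIC or version != 1:
           raise ValueError("not a version-1 bucket_index.bin")
       entries = []
       for i in range(num_buckets):
           values = _ENTRY.unpack_from(data, _HEADER.size + i * _ENTRY.size)
           entries.append(dict(zip(_FIELDS, values)))
       return entries


   def buckets_in_range(entries, t_start, t_end):
       """Return bucket_seq values whose time span overlaps [t_start, t_end]."""
       return [e["bucket_seq"] for e in entries
               if e["ts_start"] <= t_end and e["ts_end"] >= t_start]

The same overlap test can be applied to ``min_name_id`` and ``max_name_id`` to
skip buckets that cannot contain a given test.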

7.4 ``history/NNNNNN.bin``
============================

Each bucket holds up to 10 000 run records, compressed with LZMA (sealed
buckets) or DEFLATE level 1 (current open bucket).  After decompression:

.. code-block:: none

    Header (16 bytes):
      magic       u32   0x42434B54  ('BCKT')
      version     u8    1
      num_records u32
      num_names   u16
      _pad        u8    (padding)
      ts_base     u32   Unix timestamp of first record

    Name index (12 bytes per unique name in this bucket):
      name_id     u32   global name_id from test_registry
      offset      u32   byte offset into name's record data
      count       u16   number of records for this name
      _pad        u8[2]

    Columnar record data (one column per name, name_id order):
      seeds[]         u8[count]           local seed index (≤ 255 unique/bucket)
      ts_deltas[]     varint[count]       delta-encoded seconds from ts_base
      status_flags[]  u8[count]           nibble-packed (high=status, low=flags)

    Seed dictionary (appended after all record data):
      num_local_seeds u8
      seed_ids[]      u32[num_local_seeds]  global seed_ids

Varint encoding: each value uses 1–5 bytes; the high bit of each byte
indicates that more bytes follow (7 bits of value per byte, little-endian).
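
The building blocks of a bucket decoder can be sketched as follows.  The
helper names are hypothetical, and the sketch covers only the fixed header,
the varint codec, and the nibble split; it does not walk the name index or
decompress the member.

.. code-block:: python

   import struct

   BCKT_MAGIC = 0x42434B54               # 'BCKT'
   # magic, version, num_records, num_names, _pad, ts_base (16 bytes)
   _HEADER = struct.Struct("<IBIHBI")


   def parse_bucket_header(data):
       magic, version, num_records, num_names, _pad, ts_base = _HEADER.unpack_from(data, 0)
       if magic != BCKT_MAGIC or version != 1:
           raise ValueError("not a version-1 bucket")
       return {"num_records": num_records, "num_names": num_names, "ts_base": ts_base}


   def encode_varint(value):
       """Encode a non-negative integer, 7 bits per byte, low group first."""
       out = bytearray()
       while True:
           byte = value & 0x7F
           value >>= 7
           if value:
               out.append(byte | 0x80)   # more bytes follow
           else:
               out.append(byte)
               return bytes(out)


   def decode_varint(data, offset=0):
       """Return (value, next_offset) for the varint starting at offset."""
       result, shift = 0, 0
       while True:
           byte = data[offset]
           offset += 1
           result |= (byte & 0x7F) << shift
           shift += 7
           if not (byte & 0x80):
               return result, offset


   def split_status_flags(byte):
       """Split a nibble-packed byte into (status, flags)."""
       return byte >> 4, byte & 0x0F

For example, the value 300 encodes to the two bytes ``AC 02``, and the packed
byte ``0x21`` splits into status 2 and flags 1.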

7.5 ``history/contrib_index.bin``
====================================

Tracks which test runs contributed coverage so that squash can be replayed.

.. code-block:: none

    Header (12 bytes):
      magic        u32   0x43494458  ('CIDX')
      version      u8    1
      policy       u8    merge-policy constant
      watermark    u32   highest squashed run_id
      num_active   u32

    Entry (16 bytes, one per unsquashed run):
      run_id    u32
      name_id   u32
      status    u8
      flags     u8
      _pad      u8[2]
      ts        u32

7.6 ``history/squash_log.bin``
================================

Append-only provenance log for squash events.

.. code-block:: none

    Header (9 bytes):
      magic      u32   0x53514C47  ('SQLG')
      version    u8    1
      num_entries u32

    Entry (24 bytes):
      ts        u32   Unix timestamp of squash operation
      policy    u8    merge-policy used
      _pad      u8[3]
      from_run  u32   first run_id squashed
      to_run    u32   last run_id squashed (inclusive)
      num_runs  u32   total runs processed
      pass_runs u32   runs that passed
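
Because every entry has a fixed size, the log can be read with a single
``struct`` pattern.  The sketch below assumes the entries follow the header
directly; ``parse_squash_log`` and ``SquashEntry`` are hypothetical names.

.. code-block:: python

   import struct
   from collections import namedtuple

   SQLG_MAGIC = 0x53514C47               # 'SQLG'
   _HEADER = struct.Struct("<IBI")       # magic, version, num_entries (9 bytes)
   # ts, policy, 3 pad bytes, from_run, to_run, num_runs, pass_runs (24 bytes)
   _ENTRY = struct.Struct("<IB3xIIII")

   SquashEntry = namedtuple(
       "SquashEntry", "ts policy from_run to_run num_runs pass_runs")


   def parse_squash_log(data):
       magic, version, num_entries = _HEADER.unpack_from(data, 0)
       if magic != SQLG_MAGIC or version != 1:
           raise ValueError("not a version-1 squash_log.bin")
       return [
           SquashEntry(*_ENTRY.unpack_from(data, _HEADER.size + i * _ENTRY.size))
           for i in range(num_entries)
       ]

The ``3x`` in the format string skips the three padding bytes, so they do not
appear in the returned tuple.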

----

**********************************
8. Testplan and Waivers JSON
**********************************

``testplan.json`` and ``waivers.json`` are optional UTF-8 JSON members
stored at the ZIP root.  They are written by :class:`~ucis.ncdb.ncdb_writer.NcdbWriter`
when the corresponding objects are attached to the database and are read
transparently by :class:`~ucis.ncdb.ncdb_reader.NcdbReader`.

8.1 ``testplan.json``
======================

.. code-block:: json

    {
      "format_version": 1,
      "source_file": "uart.hjson",
      "import_timestamp": "2025-01-01T00:00:00+00:00",
      "testpoints": [
        {
          "name": "uart_reset",
          "stage": "V1",
          "desc": "Verify reset",
          "tests": ["uart_smoke", "uart_reset_*"],
          "tags": ["smoke"],
          "na": false,
          "source_template": "",
          "requirements": [
            {"id": "REQ-001", "desc": "Reset spec"}
          ]
        }
      ],
      "covergroups": [
        {"name": "cg_reset", "desc": "Reset coverage"}
      ]
    }

.. list-table:: testplan.json — top-level fields
   :header-rows: 1
   :widths: 25 15 60

   * - Field
     - Type
     - Description
   * - ``format_version``
     - int
     - Schema version; currently ``1``
   * - ``source_file``
     - string
     - Path to the Hjson/JSON source that produced this plan
   * - ``import_timestamp``
     - ISO-8601 string
     - UTC timestamp when the plan was last imported
   * - ``testpoints``
     - array
     - Ordered list of :class:`~ucis.ncdb.testplan.Testpoint` objects
   * - ``covergroups``
     - array
     - Ordered list of :class:`~ucis.ncdb.testplan.CovergroupEntry` objects

Merger behaviour
   When merging two ``.cdb`` files that both contain ``testplan.json``:

   * **Same** ``source_file``: the entry with the later
     ``import_timestamp`` is kept.
   * **Different** ``source_file``: a warning is emitted and the merged
     output contains no testplan.
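
The rule above can be sketched on the parsed JSON documents as follows.
``merge_testplans`` is a hypothetical helper, not the PyUCIS API; it assumes
both timestamps carry a UTC offset so that they compare directly, and it keeps
the first argument on a tie.

.. code-block:: python

   import warnings
   from datetime import datetime


   def merge_testplans(plan_a, plan_b):
       """Return the testplan to keep, or None if the sources differ."""
       if plan_a["source_file"] != plan_b["source_file"]:
           warnings.warn("testplan source_file mismatch; merged output has no testplan")
           return None
       ts_a = datetime.fromisoformat(plan_a["import_timestamp"])
       ts_b = datetime.fromisoformat(plan_b["import_timestamp"])
       # Assumption: on equal timestamps the first plan wins.
       return plan_b if ts_b > ts_a else plan_a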

8.2 ``waivers.json``
======================

.. code-block:: json

    {
      "format_version": 1,
      "waivers": [
        {
          "id": "W-001",
          "scope_pattern": "top/uart/**",
          "bin_pattern": "reset_*",
          "rationale": "Deferred to V2",
          "approver": "jdoe",
          "approved_at": "2025-01-01T00:00:00",
          "expires_at": "2026-01-01T00:00:00",
          "status": "active"
        }
      ]
    }

.. list-table:: waivers.json — Waiver fields
   :header-rows: 1
   :widths: 25 15 60

   * - Field
     - Type
     - Description
   * - ``id``
     - string
     - Unique waiver identifier
   * - ``scope_pattern``
     - glob string
     - Hierarchy path pattern; ``*`` = single segment, ``**`` = any depth
   * - ``bin_pattern``
     - glob string
     - Coverage bin name pattern; same glob syntax as ``scope_pattern``
   * - ``rationale``
     - string
     - Human-readable reason for the waiver
   * - ``approver``
     - string
     - Name or email of the approver
   * - ``approved_at``
     - ISO-8601 string
     - Approval timestamp
   * - ``expires_at``
     - ISO-8601 string
     - Expiry timestamp; empty string means no expiry
   * - ``status``
     - ``"active"`` | ``"expired"``
     - Current status; :meth:`~ucis.ncdb.waivers.WaiverSet.active_at` filters
       on both this field and ``expires_at``
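
One way to evaluate ``scope_pattern`` is to translate the glob into a regular
expression.  This is a sketch under stated assumptions, not the matcher PyUCIS
uses: the segment separator is taken to be ``/`` as in the example above,
``scope_glob_to_regex`` and ``scope_matches`` are hypothetical names, and a
trailing ``/**`` here does not match the parent path itself.

.. code-block:: python

   import re


   def scope_glob_to_regex(pattern):
       """Translate '*' (one segment) and '**' (any depth) into a regex."""
       parts = []
       i = 0
       while i < len(pattern):
           if pattern.startswith("**", i):
               parts.append(".*")        # may cross '/' boundaries
               i += 2
           elif pattern[i] == "*":
               parts.append("[^/]*")     # stays within one segment
               i += 1
           else:
               parts.append(re.escape(pattern[i]))
               i += 1
       return re.compile("".join(parts))


   def scope_matches(pattern, path):
       return scope_glob_to_regex(pattern).fullmatch(path) is not None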

Merger behaviour
   Waivers are unioned by ``id`` across all source files.  When the same
   ``id`` appears in multiple sources the entry with the latest
   ``approved_at`` is kept.
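
The union rule can be sketched on the parsed JSON documents as follows.
``merge_waivers`` is a hypothetical helper; it assumes every ``approved_at``
value uses the same timezone convention, keeps the first-seen entry on a tie,
and emits waivers in first-seen ``id`` order.

.. code-block:: python

   from datetime import datetime


   def merge_waivers(*docs):
       """Union waivers by id, keeping the entry with the latest approved_at."""
       merged = {}
       for doc in docs:
           for waiver in doc["waivers"]:
               kept = merged.get(waiver["id"])
               if kept is None or (
                   datetime.fromisoformat(waiver["approved_at"])
                   > datetime.fromisoformat(kept["approved_at"])
               ):
                   merged[waiver["id"]] = waiver
       return {"format_version": 1, "waivers": list(merged.values())}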