Test History

PyUCIS stores a rich, time-series history of every test run inside each NCDB .cdb file. Introduced in NCDB v2, this binary test history is separate from the legacy UCIS history-node model and is designed for:

  • Trend analysis — identify flaky or consistently-failing tests over hundreds or thousands of runs.

  • Regression detection — spot when a test’s pass rate drops using a CUSUM change-point algorithm.

  • Coverage provenance — trace exactly which test runs contributed to the squashed coverage numbers.
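The regression-detection bullet mentions CUSUM. As a rough illustration only, and not PyUCIS's actual implementation, a one-sided CUSUM over a sequence of pass/fail outcomes might look like the sketch below. The function name `cusum_drop_index` and the `target`, `slack`, and `threshold` parameters are assumptions made for this example.

```python
from typing import Optional, Sequence


def cusum_drop_index(outcomes: Sequence[int], target: float = 0.9,
                     slack: float = 0.05, threshold: float = 2.0) -> Optional[int]:
    """Return the index at which a downward shift in pass rate is flagged.

    outcomes  -- 1 for a passing run, 0 for a failing run, oldest first
    target    -- expected pass rate while the test is healthy (assumed value)
    slack     -- allowance subtracted per run so small dips do not accumulate
    threshold -- cumulative deficit at which a change is declared
    """
    deficit = 0.0
    for i, passed in enumerate(outcomes):
        # Accumulate how far this run falls below the target, clamped at zero.
        deficit = max(0.0, deficit + (target - passed) - slack)
        if deficit >= threshold:
            return i
    return None


healthy = [1] * 20
regressed = [1] * 20 + [0] * 5
print(cusum_drop_index(healthy))    # None: no sustained drop
print(cusum_drop_index(regressed))  # 22: flagged on the third consecutive failure
```

A larger `threshold` delays detection but tolerates more isolated failures; the values above are arbitrary and would need tuning against real history.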


Quick-start

Record test results with add_test_run():

from ucis.ncdb.ncdb_ucis import NcdbUCIS
from ucis.ncdb.ncdb_writer import NcdbWriter
from ucis.ncdb.constants import HIST_STATUS_OK, HIST_STATUS_FAIL
from ucis.mem.mem_ucis import MemUCIS

# Create or open a .cdb
NcdbWriter().write(MemUCIS(), "coverage.cdb")   # once, to initialise
db = NcdbUCIS("coverage.cdb")

# Record runs
db.add_test_run("uart_smoke", seed="42", status=HIST_STATUS_OK,
                ts=1700000000, has_coverage=True)
db.add_test_run("uart_smoke", seed="43", status=HIST_STATUS_FAIL,
                ts=1700003600, has_coverage=False)

# Save
NcdbWriter().write(db, "coverage.cdb")

Query the results:

db2 = NcdbUCIS("coverage.cdb")

# All runs for one test
history = db2.query_test_history("uart_smoke")
for rec in history:
    print(rec.ts, rec.status)

# Aggregate statistics
entry = db2.get_test_stats("uart_smoke")
print(f"total={entry.total_runs}  pass={entry.pass_count}  fail={entry.fail_count}")

# Top-flaky across all tests
for entry in db2.top_flaky_tests(10):
    print(entry.name_id, entry.flakiness_score)

API reference

NcdbUCIS.add_test_run(name: str, seed='0', status: int = 0, ts: int | None = None, cpu_time: float | None = None, has_coverage: bool = False, is_rerun: bool = False) -> int

Record one test run in the binary history store.

Automatically upgrades the manifest to history_format = "v2" on first call (no explicit opt-in required).

Parameters:
  • name – Test base-name (e.g. "uart_smoke").

  • seed – Test seed string or integer (converted to str).

  • status – One of the HIST_STATUS_* constants.

  • ts – Unix timestamp; defaults to int(time.time()).

  • cpu_time – CPU/wall time in seconds (optional).

  • has_coverage – True if this run produced coverage data.

  • is_rerun – True if this is a retry of a previously-failed run.

Returns:

The run_id assigned to this run.

NcdbUCIS.query_test_history(name: str, ts_from: int | None = None, ts_to: int | None = None) -> list

Return all BucketRecord objects for name across all buckets, optionally restricted to the inclusive timestamp range [ts_from, ts_to].

Parameters:
  • name – Test name to query.

  • ts_from – Optional lower bound timestamp (inclusive).

  • ts_to – Optional upper bound timestamp (inclusive).

Returns:

List of BucketRecord.

NcdbUCIS.get_test_stats(name: str)

Return the TestStatsEntry for name, or None if not seen.

Returns:

TestStatsEntry or None.
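Because get_test_stats() can return None, callers may want to guard before reading counters. The sketch below derives a pass rate from the total_runs and pass_count fields shown in the quick-start; `StatsLike` is a hypothetical stand-in used so the example runs without a database, and `pass_rate` is not part of PyUCIS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatsLike:
    """Stand-in carrying the three counters shown in the quick-start."""
    total_runs: int
    pass_count: int
    fail_count: int


def pass_rate(entry: Optional[StatsLike]) -> Optional[float]:
    """Return pass_count / total_runs, or None when there is nothing to report."""
    if entry is None or entry.total_runs == 0:
        return None
    return entry.pass_count / entry.total_runs


print(pass_rate(StatsLike(total_runs=4, pass_count=3, fail_count=1)))  # 0.75
print(pass_rate(None))                                                 # None
```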

NcdbUCIS.top_flaky_tests(n: int = 20) -> list

Return top-n flakiest tests.

Returns:

List of TestStatsEntry.

NcdbUCIS.top_failing_tests(n: int = 20) -> list

Return top-n consistently-failing tests.

Returns:

List of TestStatsEntry.

NcdbUCIS.squash_coverage(policy: int = 1) -> None

Squash all active contribution (contrib_index) entries into counts.bin.

Records the squash in the squash_log for provenance auditing.

Parameters:
  • policy – Merge policy constant from contrib_index.


Status and flag values

Status constants (in ucis.ncdb.constants):

Constant              Meaning
--------------------  ----------------------------------------------------
HIST_STATUS_OK        Run passed
HIST_STATUS_FAIL      Run failed
HIST_STATUS_ERROR     Test infrastructure error (not a test-logic failure)
HIST_STATUS_TIMEOUT   Run exceeded wall-clock budget
HIST_STATUS_SKIP      Run was explicitly skipped

Flag constants (combinable with |):

Constant            Meaning
------------------  ---------------------------------------------------
HIST_FLAG_HAS_COV   Run produced coverage data (counts.bin was updated)
HIST_FLAG_REGRESS   Run is part of a regression sweep
HIST_FLAG_RERUN     This is a re-run of a previously recorded test
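To show what "combinable with |" means in practice, here is a small bit-flag sketch. The `EX_FLAG_*` names and their numeric values are placeholders invented for this example; the real HIST_FLAG_* values should be imported from the library rather than copied from here.

```python
# Placeholder bit values for illustration only; not the real HIST_FLAG_* values.
EX_FLAG_HAS_COV = 0x01
EX_FLAG_REGRESS = 0x02
EX_FLAG_RERUN = 0x04


def describe(flags: int) -> list:
    """Return the names of the example flags that are set in `flags`."""
    names = []
    if flags & EX_FLAG_HAS_COV:
        names.append("has_cov")
    if flags & EX_FLAG_REGRESS:
        names.append("regress")
    if flags & EX_FLAG_RERUN:
        names.append("rerun")
    return names


flags = EX_FLAG_HAS_COV | EX_FLAG_RERUN   # combine two flags with |
print(describe(flags))                    # ['has_cov', 'rerun']
flags &= ~EX_FLAG_RERUN                   # clear one flag
print(describe(flags))                    # ['has_cov']
```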


Time-range queries

query_test_history() accepts optional ts_from and ts_to Unix-timestamp bounds:

import time
yesterday = int(time.time()) - 86400

# Only runs from the last 24 hours
recent = db.query_test_history("my_test", ts_from=yesterday)

The call uses the bucket index to skip buckets whose time ranges do not overlap, so queries over large history stores are fast even when only a small window is requested.
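The bucket-skipping idea can be sketched as a simple interval-overlap test. This is an illustration under assumptions, not the library's code: `BucketSpan`, its `ts_min`/`ts_max` fields, and `buckets_to_read` are hypothetical names, and the real index entry layout may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BucketSpan:
    """Hypothetical index entry: one bucket and the time range it covers."""
    bucket_id: int
    ts_min: int
    ts_max: int


def buckets_to_read(index: List[BucketSpan], ts_from: Optional[int] = None,
                    ts_to: Optional[int] = None) -> List[int]:
    """Return ids of buckets whose range overlaps [ts_from, ts_to] (inclusive)."""
    selected = []
    for span in index:
        if ts_from is not None and span.ts_max < ts_from:
            continue  # bucket ends before the window starts
        if ts_to is not None and span.ts_min > ts_to:
            continue  # bucket starts after the window ends
        selected.append(span.bucket_id)
    return selected


index = [BucketSpan(0, 1000, 1999), BucketSpan(1, 2000, 2999), BucketSpan(2, 3000, 3999)]
print(buckets_to_read(index, ts_from=2500))              # [1, 2]
print(buckets_to_read(index, ts_from=1000, ts_to=1999))  # [0]
```

Only the selected buckets would then need to be decompressed and scanned record by record.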


Merging history

History is merged automatically when two or more .cdb files are combined with NcdbMerger:

from ucis.ncdb.ncdb_merger import NcdbMerger

NcdbMerger().merge(["run_a.cdb", "run_b.cdb"], "merged.cdb")

The merger performs:

  1. Registry union — all test names and seed strings from all sources are collected into a single merged registry, preserving insertion order.

  2. Stats merge — per-test aggregate metrics (mean runtime, variance, pass rate) are combined using Chan’s parallel formula for numerically stable Welford-style mean/variance.

  3. Bucket remap — name_ids in each source’s bucket files are remapped to the merged registry before being written to the output.

  4. Contrib-index remap — run_ids in the contribution index are offset by the source’s base run_id so merged run_ids remain globally unique.
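Steps 1, 3, and 4 above can be pictured with the following sketch. It is a simplified model under assumptions: registries are treated as plain lists of names, `union_registries` and `run_id_bases` are hypothetical helpers, and choosing each source's base as the running total of earlier run counts is one plausible scheme rather than a documented one.

```python
from typing import Dict, List, Tuple


def union_registries(sources: List[List[str]]) -> Tuple[List[str], List[Dict[int, int]]]:
    """Merge name lists in insertion order; return merged list and per-source id maps."""
    merged: List[str] = []
    position: Dict[str, int] = {}
    remaps: List[Dict[int, int]] = []
    for names in sources:
        remap: Dict[int, int] = {}
        for old_id, name in enumerate(names):
            if name not in position:
                position[name] = len(merged)   # first sighting fixes the merged id
                merged.append(name)
            remap[old_id] = position[name]     # old name_id -> merged name_id
        remaps.append(remap)
    return merged, remaps


def run_id_bases(run_counts: List[int]) -> List[int]:
    """Base offset per source so that shifted run_ids never collide (assumed scheme)."""
    bases, total = [], 0
    for count in run_counts:
        bases.append(total)
        total += count
    return bases


merged, remaps = union_registries([["uart_smoke", "spi_basic"], ["spi_basic", "i2c_burst"]])
print(merged)                    # ['uart_smoke', 'spi_basic', 'i2c_burst']
print(remaps[1])                 # {0: 1, 1: 2}
print(run_id_bases([100, 40]))   # [0, 100]
```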

Note

Merging a file with itself produces the same per-test statistics as the original, except that run counts double, so the merge is not strictly idempotent.
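Step 2 of the merge combines per-test statistics with Chan's parallel formula. The sketch below shows the standard form of that formula on (count, mean, M2) triples, where M2 is the sum of squared deviations from the mean. The names `summarize` and `chan_merge` are illustrative, and how PyUCIS actually lays out these fields is not shown here.

```python
from typing import Sequence, Tuple

Stats = Tuple[int, float, float]  # (count, mean, M2 = sum of squared deviations)


def summarize(values: Sequence[float]) -> Stats:
    """Welford's single-pass update over a sequence."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def chan_merge(a: Stats, b: Stats) -> Stats:
    """Combine two summaries without revisiting the raw samples."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


left = summarize([1.0, 2.0, 3.0])
right = summarize([4.0, 5.0])
n, mean, m2 = chan_merge(left, right)
print(n, mean, m2 / n)   # count, mean, population variance of the combined data

# Merging a summary with itself: the count doubles, mean and population variance do not move.
n2, mean2, m2_2 = chan_merge(left, left)
print(n2, mean2, m2_2 / n2)
```

Note that the sample variance, M2 / (n - 1), does shift slightly under a self-merge because the divisor changes; only the population variance is exactly preserved.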


Squash coverage

Over time a .cdb accumulates contribution entries for every test run that produced coverage. Squashing compresses these entries into the main coverage counts and frees space:

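from ucis.ncdb.contrib_index import POLICY_PASS_ONLY  # module path assumed; the API reference only says "from contrib_index"

db.squash_coverage(policy=POLICY_PASS_ONLY)
NcdbWriter().write(db, "coverage.cdb")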

The squash event is recorded in the squash log so that provenance is never lost. The policy argument controls which runs are squashed:

Constant           Behaviour
-----------------  ------------------------------------
POLICY_PASS_ONLY   Squash only runs with HIST_STATUS_OK
POLICY_ALL         Squash all runs regardless of status
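The effect of the two policies can be modelled with a toy example. Everything here is an assumption made for illustration: the `Contribution` record, the `EX_*` constants and their values, and the idea that squashing sums per-bin hit counts. The real contribution index and counts.bin formats are binary and are not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List

# Placeholder constants for this sketch; not the real PyUCIS values.
EX_STATUS_OK, EX_STATUS_FAIL = 0, 1
EX_POLICY_PASS_ONLY, EX_POLICY_ALL = 0, 1


@dataclass
class Contribution:
    """Hypothetical per-run contribution: hit counts keyed by coverage bin name."""
    run_id: int
    status: int
    hits: Dict[str, int]


def squash(entries: List[Contribution], policy: int) -> Dict[str, int]:
    """Fold the selected contributions into one set of per-bin totals."""
    totals: Dict[str, int] = {}
    for entry in entries:
        if policy == EX_POLICY_PASS_ONLY and entry.status != EX_STATUS_OK:
            continue  # pass-only policy ignores runs that did not pass
        for bin_name, count in entry.hits.items():
            totals[bin_name] = totals.get(bin_name, 0) + count
    return totals


entries = [
    Contribution(0, EX_STATUS_OK, {"a": 2, "b": 1}),
    Contribution(1, EX_STATUS_FAIL, {"a": 5}),
]
print(squash(entries, EX_POLICY_PASS_ONLY))  # {'a': 2, 'b': 1}
print(squash(entries, EX_POLICY_ALL))        # {'a': 7, 'b': 1}
```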


Binary format overview

The v2 test history is stored as several members inside the NCDB ZIP archive. A history_format key in manifest.json selects the version:

  • "v1" — legacy UCIS history-node model (no binary history)

  • "v2" — binary test history (this section)

Binary members added for v2:

ZIP member                  Contents
--------------------------  -------------------------------------------------------------------
history/test_registry.bin   Ordered list of test names and seed strings with stable integer IDs
history/test_stats.bin      Per-test aggregate metrics (72 bytes/test)
history/bucket_index.bin    Index of time-bucketed run-record files (28 bytes/entry)
history/NNNNNN.bin          Individual run-record buckets (LZMA or DEFLATE compressed)
history/contrib_index.bin   Per-run coverage-contribution entries
history/squash_log.bin      Append-only log of squash events

For the full binary layout see section 7, "V2 binary test history", in the format reference.
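Since a .cdb is a ZIP archive, the history version and the history members can be inspected with the standard library alone. The helper below is a sketch: `inspect_history` is a hypothetical name, and treating a missing history_format key as "v1" is an assumption rather than documented behaviour. It builds a tiny stand-in archive in memory so it runs without a real database.

```python
import io
import json
import zipfile
from typing import List, Tuple


def inspect_history(cdb) -> Tuple[str, List[str]]:
    """Return (history_format, sorted history/ member names) for an NCDB archive.

    `cdb` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(cdb) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        members = sorted(n for n in zf.namelist() if n.startswith("history/"))
    # Falling back to "v1" when the key is absent is an assumption of this sketch.
    return manifest.get("history_format", "v1"), members


# Stand-in archive with just enough content to exercise the helper.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.json", json.dumps({"history_format": "v2"}))
    zf.writestr("history/test_registry.bin", b"")
buf.seek(0)
print(inspect_history(buf))   # ('v2', ['history/test_registry.bin'])
```

With a real file the call would simply be `inspect_history("coverage.cdb")`.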

See also