Test History
PyUCIS stores a rich, time-series history of every test run inside each NCDB
.cdb file. Introduced in NCDB v2, this binary test history is separate
from the legacy UCIS history-node model and is designed for:
- Trend analysis: identify flaky or consistently-failing tests over hundreds or thousands of runs.
- Regression detection: spot when a test’s pass rate drops using a CUSUM change-point algorithm.
- Coverage provenance: trace exactly which test runs contributed to the squashed coverage numbers.
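To make the CUSUM idea concrete, here is a rough, self-contained sketch of a one-sided CUSUM over pass/fail outcomes. It is not PyUCIS's implementation: the function name `cusum_drop_index` and the `target`, `slack`, and `threshold` values are assumptions chosen for illustration only.

```python
from typing import Optional, Sequence


def cusum_drop_index(outcomes: Sequence[int], target: float = 0.95,
                     slack: float = 0.05, threshold: float = 2.0) -> Optional[int]:
    """Return the index at which a drop in pass rate is flagged, else None.

    outcomes: 1 for a passing run, 0 for a failing run, in time order.
    target, slack, threshold: illustrative tuning values (assumptions).
    """
    s = 0.0
    for i, x in enumerate(outcomes):
        # Accumulate the shortfall below (target - slack); clamping at zero
        # lets isolated failures decay instead of building up forever.
        s = max(0.0, s + (target - slack - x))
        if s > threshold:
            return i
    return None


# Twenty passes followed by a run of failures: flagged on the third failure.
print(cusum_drop_index([1] * 20 + [0] * 5))
```

A single failure surrounded by passes raises the running sum and then decays back to zero, so only a sustained drop crosses the threshold.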
Quick-start
Record test results with add_test_run():
from ucis.ncdb.ncdb_ucis import NcdbUCIS
from ucis.ncdb.ncdb_writer import NcdbWriter
from ucis.ncdb.constants import HIST_STATUS_OK, HIST_STATUS_FAIL
from ucis.mem.mem_ucis import MemUCIS
# Create or open a .cdb
NcdbWriter().write(MemUCIS(), "coverage.cdb") # once, to initialise
db = NcdbUCIS("coverage.cdb")
# Record runs
db.add_test_run("uart_smoke", seed="42", status=HIST_STATUS_OK,
                ts=1700000000, has_coverage=True)
db.add_test_run("uart_smoke", seed="43", status=HIST_STATUS_FAIL,
                ts=1700003600, has_coverage=False)
# Save
NcdbWriter().write(db, "coverage.cdb")
Query the results:
db2 = NcdbUCIS("coverage.cdb")
# All runs for one test
history = db2.query_test_history("uart_smoke")
for rec in history:
    print(rec.ts, rec.status)
# Aggregate statistics
entry = db2.get_test_stats("uart_smoke")
print(f"total={entry.total_runs} pass={entry.pass_count} fail={entry.fail_count}")
# Top-flaky across all tests
for entry in db2.top_flaky_tests(10):
    print(entry.name_id, entry.flakiness_score)
API reference
- NcdbUCIS.add_test_run(name: str, seed='0', status: int = 0, ts: int | None = None, cpu_time: float | None = None, has_coverage: bool = False, is_rerun: bool = False) -> int
Record one test run in the binary history store.
Automatically upgrades the manifest to history_format = "v2" on first call (no explicit opt-in required).
- Parameters:
name – Test base-name (e.g. "uart_smoke").
seed – Test seed string or integer (converted to str).
status – One of the HIST_STATUS_* constants.
ts – Unix timestamp; defaults to int(time.time()).
cpu_time – CPU/wall time in seconds (optional).
has_coverage – True if this run produced coverage data.
is_rerun – True if this is a retry of a previously-failed run.
- Returns:
The run_id assigned to this run.
- NcdbUCIS.query_test_history(name: str, ts_from: int | None = None, ts_to: int | None = None) -> list
Return all BucketRecord objects for name across all buckets.
- Parameters:
name – Test name to query.
ts_from – Optional lower bound timestamp (inclusive).
ts_to – Optional upper bound timestamp (inclusive).
- Returns:
List of BucketRecord.
- NcdbUCIS.get_test_stats(name: str)
Return the TestStatsEntry for name, or None if not seen.
- Returns:
TestStatsEntry or None.
- NcdbUCIS.top_flaky_tests(n: int = 20) -> list
Return the top-n flakiest tests.
- Returns:
List of TestStatsEntry.
Status and flag values
Status constants (in ucis.ncdb.constants):
| Constant | Meaning |
|---|---|
| | Run passed |
| | Run failed |
| | Test infrastructure error (not a test-logic failure) |
| | Run exceeded wall-clock budget |
| | Run was explicitly skipped |
Flag constants (combinable with |):
| Constant | Meaning |
|---|---|
| | Run produced coverage data (counts.bin was updated) |
| | Run is part of a regression sweep |
| | This is a re-run of a previously recorded test |
Time-range queries
query_test_history() accepts optional
ts_from and ts_to Unix-timestamp bounds:
import time
yesterday = int(time.time()) - 86400
# Only runs from the last 24 hours
recent = db.query_test_history("my_test", ts_from=yesterday)
The call uses the bucket index to skip buckets whose time ranges do not overlap, so queries over large history stores are fast even when only a small window is requested.
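The bucket-skipping step can be pictured as an interval-overlap test against each index entry. The sketch below is only an illustration of that idea: `BucketRange` and `buckets_to_scan` are hypothetical names, and the real index entry layout is not described in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BucketRange:
    """Hypothetical index entry: a bucket id and the time span it covers."""
    bucket_id: int
    ts_min: int
    ts_max: int


def buckets_to_scan(index: List[BucketRange],
                    ts_from: Optional[int] = None,
                    ts_to: Optional[int] = None) -> List[int]:
    """Return ids of buckets whose [ts_min, ts_max] overlaps [ts_from, ts_to]."""
    selected = []
    for entry in index:
        if ts_from is not None and entry.ts_max < ts_from:
            continue  # bucket ends before the window starts
        if ts_to is not None and entry.ts_min > ts_to:
            continue  # bucket starts after the window ends
        selected.append(entry.bucket_id)
    return selected


index = [BucketRange(0, 0, 99), BucketRange(1, 100, 199), BucketRange(2, 200, 299)]
print(buckets_to_scan(index, ts_from=150))  # only buckets 1 and 2 need decompressing
```

Both bounds are treated as inclusive, matching the ts_from and ts_to descriptions in the API reference.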
Merging history
History is merged automatically when two or more .cdb files are combined
with NcdbMerger:
from ucis.ncdb.ncdb_merger import NcdbMerger
NcdbMerger().merge(["run_a.cdb", "run_b.cdb"], "merged.cdb")
The merger performs:
- Registry union: all test names and seed strings from all sources are collected into a single merged registry, preserving insertion order.
- Stats merge: per-test aggregate metrics (mean runtime, variance, pass rate) are combined using Chan’s parallel formula for numerically stable Welford-style mean/variance.
- Bucket remap: name_ids in each source’s bucket files are remapped to the merged registry before being written to the output.
- Contrib-index remap: run_ids in the contribution index are offset by the source’s base run_id so merged run_ids remain globally unique.
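For readers unfamiliar with Chan’s parallel formula, the following minimal sketch shows how two (count, mean, M2) summaries combine without revisiting the raw samples. The `RunningStats` type and function names are illustrative assumptions, not the fields NcdbMerger actually stores.

```python
from typing import Iterable, NamedTuple


class RunningStats(NamedTuple):
    """Count, mean, and sum of squared deviations from the mean (M2)."""
    n: int
    mean: float
    m2: float


def from_samples(samples: Iterable[float]) -> RunningStats:
    """Build a summary with Welford's single-pass update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in samples:
        n += 1
        d = x - mean
        mean += d / n
        m2 += d * (x - mean)
    return RunningStats(n, mean, m2)


def merge_stats(a: RunningStats, b: RunningStats) -> RunningStats:
    """Combine two summaries using Chan's parallel formula."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return RunningStats(n, mean, m2)


merged = merge_stats(from_samples([1.0, 2.0, 3.0]), from_samples([4.0, 5.0]))
print(merged.n, merged.mean, merged.m2 / merged.n)  # count, mean, population variance
```

The merged summary matches what a single pass over all five samples would produce, which is why per-source statistics can be combined without re-reading bucket data.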
Note
Merging is idempotent with respect to per-test statistics: merging a file with itself yields the same statistics as the original, although run counts double.
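The registry union and the two remap steps can be sketched as follows. This is a simplified model that assumes a name's ID is its position in the registry list and that each source numbers its run_ids densely from zero; `union_registries` and `run_id_bases` are hypothetical helpers, not part of the NcdbMerger API.

```python
from typing import Dict, List, Tuple


def union_registries(sources: List[List[str]]) -> Tuple[List[str], List[Dict[int, int]]]:
    """Merge name lists in insertion order.

    Returns the merged list plus, for each source, a map from its old
    name_id (list position) to the new name_id in the merged list.
    """
    merged: List[str] = []
    index: Dict[str, int] = {}
    remaps: List[Dict[int, int]] = []
    for names in sources:
        remap: Dict[int, int] = {}
        for old_id, name in enumerate(names):
            if name not in index:
                index[name] = len(merged)
                merged.append(name)
            remap[old_id] = index[name]
        remaps.append(remap)
    return merged, remaps


def run_id_bases(run_counts: List[int]) -> List[int]:
    """Base offset per source so that base + local run_id is globally unique."""
    bases, total = [], 0
    for count in run_counts:
        bases.append(total)
        total += count
    return bases


merged, remaps = union_registries([["uart_smoke", "spi_basic"], ["i2c_rw", "uart_smoke"]])
print(merged)                    # first-seen order is preserved
print(remaps[1])                 # second source's ids rewritten to merged ids
print(run_id_bases([3, 5, 2]))   # starting run_id for each source
```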
Squash coverage
Over time a .cdb accumulates contribution entries for every test run
that produced coverage. Squashing compresses these entries into the main
coverage counts and frees space:
db.squash_coverage(policy=POLICY_PASS_ONLY)
NcdbWriter().write(db, "coverage.cdb")
The squash event is recorded in the squash log so that provenance is never
lost. The policy argument controls which runs are squashed:
| Constant | Behaviour |
|---|---|
| | Squash only runs with |
| | Squash all runs regardless of status |
Binary format overview
The v2 test history is stored as several members inside the NCDB ZIP archive.
A history_format key in manifest.json selects the version:
- "v1": legacy UCIS history-node model (no binary history)
- "v2": binary test history (this section)
Binary members added for v2:
| ZIP member | Contents |
|---|---|
| | Ordered list of test names and seed strings with stable integer IDs |
| | Per-test aggregate metrics (72 bytes/test) |
| | Index of time-bucketed run-record files (28 bytes/entry) |
| | Individual run-record buckets (LZMA or DEFLATE compressed) |
| | Per-run coverage-contribution entries |
| | Append-only log of squash events |
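The two fixed record sizes in the table allow a quick back-of-the-envelope size estimate for the stats and bucket-index members. The sketch below uses only those two figures; it ignores any member headers and ZIP-level compression, which this section does not describe, and `fixed_member_bytes` is a hypothetical helper.

```python
STATS_BYTES_PER_TEST = 72    # per-test aggregate metrics, from the table
INDEX_BYTES_PER_ENTRY = 28   # per bucket-index entry, from the table


def fixed_member_bytes(num_tests: int, num_buckets: int) -> int:
    """Estimate the raw payload of the stats and bucket-index members."""
    return num_tests * STATS_BYTES_PER_TEST + num_buckets * INDEX_BYTES_PER_ENTRY


# Example: 10,000 distinct tests and 365 buckets.
print(fixed_member_bytes(10_000, 365))
```

Under these assumptions the fixed-size members stay well under a megabyte even for large regressions; the bulk of the history lives in the compressed run-record buckets.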
For the full binary layout see 7. V2 binary test history in the format reference.
See also
- NCDB Coverage File Format: Full NCDB binary format specification
- Merging Coverage: How to merge .cdb files on the command line
- Analyzing Coverage: Query and report coverage from the CLI