Add a dataset or a baseline

The benchmark harness is open. The fastest way to make the comparison fairer is to add a dataset where Ufinq falls short, or a baseline it should be measured against.

What is open

The harness — the datasets registry, the per-algorithm runners, the metric definitions, and the methodology — is public and reproducible. The Ufinq algorithm itself is closed; it is run as one baseline among several, on the same fixed configuration and the same held-out splits as every other algorithm.

Good contributions

A dataset with a known ground-truth law, for the symbolic-recovery metric.
A real-world dataset that exposes a failure mode the current corpus misses.
A new baseline runner that meets the one-declared-configuration policy.
A correction to a metric definition or a dataset's provenance.

Contributions follow the methodology: one declared configuration per algorithm, uniform across datasets and seeds, with failures reported rather than hidden.
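A minimal sketch of what "one declared configuration, uniform across datasets and seeds, with failures reported" can mean in a runner loop. Everything here is hypothetical and is not the harness's actual code: the names `Trial` and `run_grid`, the `fit(dataset, seed, config)` callback shape, and the `"ok"`/`"failed"` status values are assumptions for illustration. The config is frozen into a read-only view and shared by every run, so per-dataset or per-seed tuning fails loudly instead of silently, and exceptions become recorded failed trials rather than disappearing from the results.

```python
"""Sketch of the one-declared-configuration policy (all names hypothetical)."""
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, List, Mapping, Optional


@dataclass(frozen=True)
class Trial:
    algorithm: str
    dataset: str
    seed: int
    status: str                  # assumed values: "ok" or "failed"
    score: Optional[float]       # None when the run failed
    error: Optional[str] = None  # repr of the exception, if any


def run_grid(algorithm: str,
             config: Mapping[str, object],
             fit: Callable[[str, int, Mapping[str, object]], float],
             datasets: List[str],
             seeds: List[int]) -> List[Trial]:
    """Run one algorithm with ONE declared config over every dataset and seed."""
    frozen = MappingProxyType(dict(config))  # read-only view shared by all runs
    trials = []
    for dataset in datasets:
        for seed in seeds:
            try:
                score = fit(dataset, seed, frozen)
                trials.append(Trial(algorithm, dataset, seed, "ok", score))
            except Exception as exc:  # report the failure, do not drop it
                trials.append(Trial(algorithm, dataset, seed, "failed", None, repr(exc)))
    return trials


# Usage with a toy fit function (hypothetical): "hard" always fails to converge.
def toy_fit(dataset, seed, config):
    if dataset == "hard":
        raise RuntimeError("did not converge")
    return 1.0 / (1 + seed)


trials = run_grid("toy", {"max_iter": 100}, toy_fit, ["easy", "hard"], [0, 1])
for t in trials:
    print(t.dataset, t.seed, t.status, t.score)
```

Keeping failed trials in the output means any aggregate computed downstream has to account for them explicitly, which is the point of reporting failures rather than hiding them.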

How a baseline is vetted

A new-baseline PR is gated by an automated conformance canary. The canary validates the runner manifest and Dockerfile, builds and scans the image, and runs the runner over a small canary set to check that it produces well-formed, finite trials. It tests conformance, not accuracy. The canary posts an automated approve, request-changes, or block decision; a human reviewer decides whether to merge.
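The "well-formed, finite trials" check can be sketched as follows. This is an illustration under stated assumptions, not the canary's real implementation: the trial schema (`dataset`, `seed`, `status`, `metrics`), the function names `trial_problems` and `canary_decision`, and the mapping from results to approve / request-changes / block are all hypothetical. Only the shape of each trial and the finiteness of metrics on successful trials are checked; metric values are never compared against a threshold, matching the conformance-not-accuracy intent.

```python
"""Sketch of a conformance check on canary trials (hypothetical schema and policy)."""
import math
from typing import Any, Dict, List

# Assumed trial schema: field name -> expected Python type.
REQUIRED_FIELDS = {"dataset": str, "seed": int, "status": str, "metrics": dict}


def trial_problems(trial: Dict[str, Any]) -> List[str]:
    """Return a list of conformance problems; an empty list means well-formed."""
    problems = []
    for name, typ in REQUIRED_FIELDS.items():
        if name not in trial:
            problems.append(f"missing field {name!r}")
        elif not isinstance(trial[name], typ):
            problems.append(f"field {name!r} should be {typ.__name__}")
    if problems:
        return problems
    if trial["status"] not in ("ok", "failed"):
        problems.append(f"unknown status {trial['status']!r}")
    if trial["status"] == "ok":
        # Successful trials must report finite numeric metrics (no NaN/inf).
        for key, value in trial["metrics"].items():
            if (not isinstance(value, (int, float)) or isinstance(value, bool)
                    or not math.isfinite(value)):
                problems.append(f"metric {key!r} is not a finite number: {value!r}")
    return problems


def canary_decision(trials: List[Dict[str, Any]], build_ok: bool, scan_ok: bool) -> str:
    """Assumed policy: image problems block; malformed or missing trials request changes."""
    if not (build_ok and scan_ok):
        return "block"
    if not trials or any(trial_problems(t) for t in trials):
        return "request-changes"
    return "approve"


good = {"dataset": "canary-1", "seed": 0, "status": "ok", "metrics": {"rmse": 0.12}}
bad = {"dataset": "canary-2", "seed": 0, "status": "ok", "metrics": {"rmse": float("nan")}}
print(canary_decision([good], build_ok=True, scan_ok=True))
print(canary_decision([good, bad], build_ok=True, scan_ok=True))
```

Whatever the automated decision, the merge itself stays with a human reviewer; the sketch only produces the recommendation.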