We ran an independent, assumption-explicit audit instrument over the published DESI DR1 and DR2 BAO distance tables. It reproduced your BAO-alone Ωm to the fourth decimal, blind-localized the z = 0.51 feature your collaboration and the community debated, tracked its moderation into DR2 — and ships with a published fault-injection validation battery. Every number below is the verbatim output of a reproducible run (fixed seed, bit-identical re-runs).
DR1 and DR2 report a preference for evolving dark energy (w0 > −1, wa < 0) that appears when BAO is combined with CMB and supernovae — 2.6σ (DR1, +CMB) growing to 3.1σ (DR2, +CMB), and 2.8σ / 3.8σ / 4.2σ depending on whether Pantheon+, Union3 or DES-Y5 supernovae are added (arXiv:2404.03002; arXiv:2503.14738).
“DESI BAO data alone are consistent with the standard flat ΛCDM cosmological model with a matter density Ωm = 0.295 ± 0.015.”
DESI 2024 VI, arXiv:2404.03002, abstract (verbatim) — DR2 puts the BAO-alone w0wa preference at “just 1.7σ” (arXiv:2503.14738 §VII.1). The starting point of this audit, not its discovery.That structure — a signal living in dataset combinations, with significance swinging across external inputs, and independent reanalyses reaching materially different w0/wa — is precisely the situation where an independent, deterministic, calibrated audit layer adds value: not another posterior, but a reproducible instrument that regression-tests each data release against stated model classes and localizes worst-case tension, with measured error rates.
A deterministic engine (no neural networks, no LLM anywhere in the loop) that fits a stated model class by min-χ² with the BAO scale β = c/(H0rd) profiled analytically, measures per-bin Mahalanobis tension using your published (DM, DH) correlations, attributes misfit by leave-one-bin-out, and calibrates every threshold with a deterministic parametric bootstrap. If the optimum hits the grid boundary or the d.o.f. are insufficient — it abstains rather than reports.
Inputs: only the published fiducial BAO tables (DR1: 12 measurements, arXiv:2404.03002 Table 1; DR2: 13 measurements, arXiv:2503.14738 Table IV) with their per-bin correlations. No CMB, no supernovae, no priors beyond the stated assumption stack.
| Quantity | NOESIS-AUDIT | DESI published | Verdict |
|---|---|---|---|
| DR2 BAO-alone Ωm | 0.2975 ± 0.0085 | 0.2975 ± 0.0086 | dead-on |
| DR1 BAO-alone Ωm | 0.2945 ± 0.0145 | 0.295 ± 0.015 | dead-on |
| DR1 rd·h | 101.881 Mpc | ≈101.8 ± 1.3 Mpc | inside |
| DR2 rd·h | 101.538 Mpc | 101.54 ± 0.73 Mpc (Eq. 17) | dead-on |
| DR1 (2024) | DR2 (2025) | |
|---|---|---|
| Max-tension bin (found blind) | LRG1 z=0.510 · 2.461σ | LRG1 z=0.510 · 2.066σ |
| Leave-one-out attribution | drop LRG1 → χ² improves 6.12 — dominant single bin | LRG1 4.64 / LRG2 4.53 — no longer single-bin dominant |
| Calibrated verdict | CONSISTENT (bootstrap p = 0.138) | CONSISTENT (bootstrap p = 0.396) |
Given only the 12 published DR1 numbers, the instrument flagged LRG1 (z = 0.510) as the worst-case measurement — the bin your DR1 paper calls “the single most anomalous result” (§3.2) — and attributed the bulk of the misfit to it. Neither key paper runs a per-bin exclusion test (DR1's check was an SDSS substitution at z<0.6); the audit supplies that missing view, then measures the feature's moderation in DR2 (a shift your DR2 paper documents only in Table 4 / Fig. 6, without comment). Moderated, not resolved: your combined-dataset preference grew DR1→DR2. The calibration also quantifies why neither release is alarming BAO-alone: across 12–13 measurements, a worst bin of 2.85σ arises by chance 5% of the time.
| Expansion shape (w0, wa) | Ωm | rd·h (Mpc) | χ²min / 11 dof |
|---|---|---|---|
| ΛCDM (−1, 0) | 0.2975 | 101.538 | 10.539 |
| Pantheon+ best-fit (−0.827, −0.75) | 0.3210 | 99.420 | 9.489 |
| Union3 best-fit (−0.64, −1.27) | 0.3390 | 96.775 | 7.359 |
| DESY5 best-fit (−0.727, −1.05) | 0.3310 | 98.033 | 8.353 |
Honest reading: the evolving shapes fit DR2 BAO-alone mildly better (Δχ² = 1.1–3.2 at equal parameter count) — consistent with your own weak BAO-alone preference — but none is forced; both classes pass the calibrated audit. The Ωm↑ / rdh↓ shifts make the BAO degeneracy explicit and auditable.
σ-scaled fault injections into the DR2 data vector (per bin, per component, plus correlated two-component directions along/against your published r ≈ −0.4…−0.5), 200 mocks per configuration, flag = exceed the calibrated p95 threshold AND localize to the injected bin.
Zero cost, zero integration burden: we take your published (or pre-publication, under any agreement you prefer) compressed BAO tables, and return the deterministic audit — reproduction gates, calibrated per-bin tensions, release-over-release tracking, injection-validated. If it ever disagrees with your pipeline, that disagreement is itself the interesting number.