For the DESI collaboration · a working demonstration on your published DR1 / DR2 BAO data

A calibrated, deterministic consistency audit for BAO data releases.

We ran an independent, assumption-explicit audit instrument over the published DESI DR1 and DR2 BAO distance tables. It reproduced your BAO-alone Ωm to the fourth decimal, blind-localized the z = 0.51 feature your collaboration and the community debated, tracked its moderation into DR2 — and ships with a published fault-injection validation battery. Every number below is the verbatim output of a reproducible run (fixed seed, bit-identical re-runs).

Ωm · DR2 BAO-alone
0.2975 ± 0.0085
yours: 0.2975 ± 0.0086 — independent method, same answer
Blind localization
LRG1 z=0.510
DR1 max-tension bin, 2.46σ — found with no hints
False flags at 0.5σ
0–4%
measured across all 25 injection configs
Reproducibility
bit-exact
deterministic seed · 6/6 unit tests · no ML, no fitting priors beyond the stated stack
01 · Your result, your words

DESI's headline is a combination result — and DESI says so.

DR1 and DR2 report a preference for evolving dark energy (w0 > −1, wa < 0) that appears when BAO is combined with CMB and supernovae: 2.6σ (DR1, +CMB) growing to 3.1σ (DR2, +CMB), and in DR2 2.8σ / 3.8σ / 4.2σ depending on whether Pantheon+, Union3 or DES-Y5 supernovae are added (arXiv:2404.03002; arXiv:2503.14738).

“DESI BAO data alone are consistent with the standard flat ΛCDM cosmological model with a matter density Ωm = 0.295 ± 0.015.”

DESI 2024 VI, arXiv:2404.03002, abstract (verbatim) — DR2 puts the BAO-alone w0wa preference at “just 1.7σ” (arXiv:2503.14738 §VII.1). The starting point of this audit, not its discovery.

That structure — a signal living in dataset combinations, with significance swinging across external inputs, and independent reanalyses reaching materially different w0/wa — is precisely the situation where an independent, deterministic, calibrated audit layer adds value: not another posterior, but a reproducible instrument that regression-tests each data release against stated model classes and localizes worst-case tension, with measured error rates.

Fragility · SNe choice
2.8σ → 4.2σ
the same universe, three conclusions: Pantheon+ 2.8σ · Union3 3.8σ · DESY5 4.2σ (2503.14738 §IX); without CMB lensing 2.7σ, with primordial-only priors 2.4σ
Fragility · calibration dispute
4.2σ → 3.2σ
Efstathiou found a ~0.04 mag low/high-z offset in DES-Y5 (2408.07175); DES answered (2501.06664) — and DES's own 2025 Dovekie recalibration still moved the headline from 4.2σ to 3.2σ (2511.07517)
Fragility · CMB reanalysis
2.2–2.4σ
ACT DR6: 2.2σ with DESI DR1, 2.4σ with DR2 “with or without supernovae”; swapping DESI for BOSS BAO → “consistent with the cosmological constant” (2503.14454 §7.2)
Fragility · the z=0.51 bin
“most anomalous”
DESI DR1's own words (§3.2, ~2σ); Colgáin et al. showed it implies an in-bin Ωm = 0.668 and drives the BAO-alone w0 > −1 pull (2404.08633) — yet neither key paper runs a per-bin exclusion test
02 · The instrument

NOESIS-AUDIT: assumption-explicit, exact-or-abstain, never bluffs.

A deterministic engine (no neural networks, no LLM anywhere in the loop) that fits a stated model class by min-χ² with the BAO scale β = c/(H0rd) profiled analytically, measures per-bin Mahalanobis tension using your published (DM, DH) correlations, attributes misfit by leave-one-bin-out, and calibrates every threshold with a deterministic parametric bootstrap. If the optimum hits the grid boundary or the d.o.f. are insufficient — it abstains rather than reports.
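The analytic profiling step can be illustrated with a minimal sketch. It assumes flat ΛCDM with radiation omitted and a model vector that is linear in β (D_M/r_d = β·∫dz/E, D_H/r_d = β/E), so the best β at fixed Ωm has a closed form. The function names (`profiled_fit`, `template`, `rd_h_from_beta`) and the synthetic data are illustrative assumptions, not the tool's actual interface and not DESI values; D_V points are left out for brevity.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def inv_e(z, om):
    """1/E(z) for flat LCDM with radiation omitted."""
    return 1.0 / np.sqrt(om * (1.0 + z) ** 3 + (1.0 - om))

def comoving_integral(z, om, n=2001):
    """Trapezoid-rule integral of dz'/E(z') from 0 to z."""
    zs = np.linspace(0.0, z, n)
    f = inv_e(zs, om)
    return (zs[1] - zs[0]) * (f.sum() - 0.5 * (f[0] + f[-1]))

def template(zs, kinds, om):
    """Model vector per unit beta: 'DM' -> integral of 1/E, 'DH' -> 1/E."""
    return np.array([comoving_integral(z, om) if k == "DM" else inv_e(z, om)
                     for z, k in zip(zs, kinds)])

def profiled_fit(data, cov, zs, kinds, om_grid):
    """Grid over Om; beta is profiled analytically at each grid point.

    Returns None (abstains) when the optimum sits on the grid boundary.
    """
    cinv = np.linalg.inv(cov)
    best = None
    for i, om in enumerate(om_grid):
        t = template(zs, kinds, om)
        beta = (t @ cinv @ data) / (t @ cinv @ t)  # closed-form minimiser in beta
        r = data - beta * t
        chi2 = float(r @ cinv @ r)
        if best is None or chi2 < best["chi2"]:
            best = {"om": float(om), "beta": float(beta), "chi2": chi2, "index": i}
    if best["index"] in (0, len(om_grid) - 1):
        return None
    return best

def rd_h_from_beta(beta):
    """rd*h in Mpc from beta = c/(H0 rd), with H0 = 100 h km/s/Mpc."""
    return C_KM_S / (100.0 * beta)

# Synthetic, noise-free demonstration (NOT DESI data): truth Om = 0.3, beta = 29.5.
zs = [0.5, 0.5, 1.0, 1.0, 2.0, 2.0]
kinds = ["DM", "DH"] * 3
truth = 29.5 * template(zs, kinds, 0.3)
cov = np.diag((0.02 * truth) ** 2)  # assumed 2% uncorrelated errors
fit = profiled_fit(truth, cov, zs, kinds, np.linspace(0.1, 0.6, 501))
print(fit)
```

Because β enters linearly, only Ωm needs a numerical search, which keeps the fit deterministic and cheap. As a consistency check on the conversion, `rd_h_from_beta(29.5253)` gives about 101.54 Mpc, in line with the β and rd·h pair quoted in the DR2 run output.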

What it is not

  • × Not another cosmological inference pipeline: it produces no w0/wa posteriors
  • × Not a claim about dark energy: it audits data-model consistency, release over release
  • × Not a black box: every assumption is enumerated (flat FRW, GR distances, Gaussian bins, bin independence as in your own compressed-likelihood structure, Ωr omitted and documented at ~0.1% in E² at z=2.33)
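The size of the omitted radiation term can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: h = 0.68 and Ωr·h² ≈ 4.18e-5 (photons plus three massless neutrinos) are values assumed here for illustration and do not come from the audit's own configuration.

```python
def radiation_fraction(z, om, h=0.68, omega_r_h2=4.18e-5):
    """Ratio of the omitted radiation term to E^2 computed without radiation."""
    om_r = omega_r_h2 / h ** 2                     # assumed present-day radiation density
    e2_no_rad = om * (1.0 + z) ** 3 + (1.0 - om)   # flat LCDM, radiation omitted
    return om_r * (1.0 + z) ** 4 / e2_no_rad

frac = radiation_fraction(2.33, 0.2975)
print(f"radiation term / E^2 at z=2.33: {100 * frac:.2f}%")
```

With these inputs the ratio comes out at roughly 0.1%, the same order as the figure quoted in the assumption list.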

What it gives you

  • A deterministic cross-check that reproduces your compressed-likelihood fit from 13 published numbers
  • Calibrated worst-case tension localization — a raw “2.5σ!” is scored against its real null distribution
  • A published fault-injection ROC: measured detection and false-alarm rates, including the blind spots
  • Bit-identical re-runs (fixed seed) — anyone can verify every number in this deck
03 · Proof on your data

Run blind on DR1 and DR2. Here is everything it found.

Inputs: only the published fiducial BAO tables (DR1: 12 measurements, arXiv:2404.03002 Table 1; DR2: 13 measurements, arXiv:2503.14738 Table IV) with their per-bin correlations. No CMB, no supernovae, no priors beyond the stated assumption stack.

kodon-cosmoaudit · audit desi-dr2.csv · verbatim output
FIT w0=-1 wa=0 | Om = 0.2975 +/- 0.0085 (dchi2=1) | beta=c/(H0 rd) = 29.5253 | rd*h = 101.538 Mpc | chi2_min = 10.539 / dof 11
bin BGS z=0.295 tension 0.526 sigma
bin LRG1 z=0.510 tension 2.066 sigma
bin LRG2 z=0.706 tension 1.991 sigma
bin LRG3ELG1 z=0.934 tension 0.928 sigma
bin ELG2 z=1.321 tension 0.648 sigma
bin QSO z=1.484 tension 0.821 sigma
bin LYA z=2.330 tension 0.268 sigma
CALIBRATION (2000 mocks, deterministic): max-tension null p95 = 2.891
AUDIT VERDICT: CONSISTENT (observed max 2.066 sigma, bootstrap p = 0.396)

The reproduction gates

Quantity | NOESIS-AUDIT | DESI published | Verdict
DR2 BAO-alone Ωm | 0.2975 ± 0.0085 | 0.2975 ± 0.0086 | dead-on
DR1 BAO-alone Ωm | 0.2945 ± 0.0145 | 0.295 ± 0.015 | dead-on
DR1 rd·h | 101.881 Mpc | ≈101.8 ± 1.3 Mpc | inside
DR2 rd·h | 101.538 Mpc | 101.54 ± 0.73 Mpc (Eq. 17) | dead-on

The blind localization — and the release-over-release tracking

Check | DR1 (2024) | DR2 (2025)
Max-tension bin (found blind) | LRG1 z=0.510 · 2.461σ | LRG1 z=0.510 · 2.066σ
Leave-one-out attribution | drop LRG1 → χ² improves 6.12 (dominant single bin) | LRG1 4.64 / LRG2 4.53 (no longer single-bin dominant)
Calibrated verdict | CONSISTENT (bootstrap p = 0.138) | CONSISTENT (bootstrap p = 0.396)

Given only the 12 published DR1 numbers, the instrument flagged LRG1 (z = 0.510) as the worst-case measurement, the bin your DR1 paper calls “the single most anomalous result” (§3.2), and attributed the bulk of the misfit to it. Neither key paper runs a per-bin exclusion test (DR1's check was an SDSS substitution at z < 0.6); the audit supplies that missing view, then measures the feature's moderation in DR2 (a shift your DR2 paper documents only in Table IV / Fig. 6, without comment). Moderated, not resolved: your combined-dataset preference grew from DR1 to DR2. The calibration also quantifies why neither release is alarming BAO-alone: across 12–13 measurements, a worst bin of about 2.9σ arises by chance 5% of the time (the DR2 run gives a null p95 of 2.891).
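The idea of scoring a raw worst-bin tension against its null distribution can be sketched as follows. This is a simplified illustration, not the audit's procedure: it draws residuals directly under the null with no refit per mock, assumes six two-component bins with a common correlation r = −0.45 plus one single-component bin, and defines the per-bin tension as the Mahalanobis distance. Because of those simplifications it is not expected to reproduce the p95 or p-values reported in this deck. The names `draw_null_tensions` and `calibrate` are hypothetical.

```python
import numpy as np

def draw_null_tensions(n_mocks, n_pair_bins, r, rng):
    """Per-bin Mahalanobis distances for residuals drawn under the null."""
    cov = np.array([[1.0, r], [r, 1.0]])   # unit-variance pair, assumed correlation r
    chol = np.linalg.cholesky(cov)
    cinv = np.linalg.inv(cov)
    pairs = rng.standard_normal((n_mocks, n_pair_bins, 2)) @ chol.T
    d_pairs = np.sqrt(np.einsum("mbi,ij,mbj->mb", pairs, cinv, pairs))
    d_single = np.abs(rng.standard_normal((n_mocks, 1)))  # one single-component bin
    return np.concatenate([d_single, d_pairs], axis=1)

def calibrate(observed_max, n_mocks=2000, n_pair_bins=6, r=-0.45, seed=12345):
    """Return (null p95 of the max tension, bootstrap p-value of the observed max)."""
    rng = np.random.default_rng(seed)      # fixed seed -> identical re-runs
    null_max = draw_null_tensions(n_mocks, n_pair_bins, r, rng).max(axis=1)
    p95 = float(np.quantile(null_max, 0.95))
    p_value = float(np.mean(null_max >= observed_max))
    return p95, p_value

p95, p_value = calibrate(observed_max=2.066)
print(f"null p95 of max tension = {p95:.3f}, p(max >= observed) = {p_value:.3f}")
```

The point of the exercise is the look-elsewhere effect: with seven bins, the largest of the per-bin tensions is expected to be well above 2σ even when nothing is wrong, so a threshold has to come from the distribution of the maximum and not from a single-bin Gaussian table.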

Fixed-shape checks: all three of your published w0wa best-fits

Expansion shape (w0, wa) | Ωm | rd·h (Mpc) | χ²min / 11 dof
ΛCDM (−1, 0) | 0.2975 | 101.538 | 10.539
Pantheon+ best-fit (−0.827, −0.75) | 0.3210 | 99.420 | 9.489
Union3 best-fit (−0.64, −1.27) | 0.3390 | 96.775 | 7.359
DESY5 best-fit (−0.727, −1.05) | 0.3310 | 98.033 | 8.353

Honest reading: the evolving shapes fit DR2 BAO-alone mildly better (Δχ² = 1.1–3.2 at equal parameter count) — consistent with your own weak BAO-alone preference — but none is forced; both classes pass the calibrated audit. The Ωm↑ / rdh↓ shifts make the BAO degeneracy explicit and auditable.
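For reference, a fixed (w0, wa) shape changes only the expansion function that feeds the distances. The sketch below uses the standard CPL parametrisation w(a) = w0 + wa(1 − a) for a flat universe with radiation omitted; the function name `e2_w0wa` is an assumption made for illustration, and the (w0, wa) pairs are the ones listed in the table.

```python
import numpy as np

def e2_w0wa(z, om, w0=-1.0, wa=0.0):
    """E(z)^2 for flat w0waCDM (CPL dark energy), radiation omitted."""
    z = np.asarray(z, dtype=float)
    dark_energy = (1.0 + z) ** (3.0 * (1.0 + w0 + wa)) * np.exp(-3.0 * wa * z / (1.0 + z))
    return om * (1.0 + z) ** 3 + (1.0 - om) * dark_energy

# Fixed shapes from the table: label -> (w0, wa, Om)
shapes = {
    "LCDM": (-1.0, 0.0, 0.2975),
    "Pantheon+": (-0.827, -0.75, 0.3210),
    "Union3": (-0.64, -1.27, 0.3390),
    "DESY5": (-0.727, -1.05, 0.3310),
}
z_bins = np.array([0.295, 0.510, 0.706, 0.934, 1.321, 1.484, 2.330])
for label, (w0, wa, om) in shapes.items():
    e = np.sqrt(e2_w0wa(z_bins, om, w0, wa))
    print(label, np.round(e, 4))
```

With (w0, wa) = (−1, 0) the dark-energy factor is identically 1 and the expression reduces to flat ΛCDM, so the same fitting code can serve both model classes by swapping this one function.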

04 · Validation battery

An audit you can trust has measured error rates. Here are ours.

σ-scaled fault injections into the DR2 data vector (per bin, per component, plus correlated two-component directions along/against your published r ≈ −0.4…−0.5), 200 mocks per configuration, flag = exceed the calibrated p95 threshold AND localize to the injected bin.
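The flag criterion described above (exceed the calibrated threshold and localize to the injected bin) can be illustrated with a toy version. This is a sketch under simplifying assumptions: seven two-component bins with a common correlation r = −0.45, unit variances, no refit after injection, and a threshold taken from 2000 null mocks. It shows the shape of the test only; the rates it prints are not the deck's measured rates. All names are hypothetical.

```python
import numpy as np

R = -0.45                                  # assumed (DM, DH) correlation
COV = np.array([[1.0, R], [R, 1.0]])
CHOL = np.linalg.cholesky(COV)
CINV = np.linalg.inv(COV)

def tensions(resid):
    """Mahalanobis distance per mock and per bin; resid has shape (mocks, bins, 2)."""
    return np.sqrt(np.einsum("mbi,ij,mbj->mb", resid, CINV, resid))

def null_threshold(n_bins, n_mocks, rng):
    """95th percentile of the max per-bin tension under the null."""
    resid = rng.standard_normal((n_mocks, n_bins, 2)) @ CHOL.T
    return float(np.quantile(tensions(resid).max(axis=1), 0.95))

def flag_rate(shift_sigma, bin_index, component, n_bins=7, n_mocks=200, seed=7):
    """Fraction of mocks that exceed the threshold AND localize to the injected bin."""
    rng = np.random.default_rng(seed)      # fixed seed -> deterministic rates
    threshold = null_threshold(n_bins, 2000, rng)
    resid = rng.standard_normal((n_mocks, n_bins, 2)) @ CHOL.T
    resid[:, bin_index, component] += shift_sigma   # sigma-scaled fault injection
    t = tensions(resid)
    flagged = (t.max(axis=1) > threshold) & (t.argmax(axis=1) == bin_index)
    return float(flagged.mean())

rate_small = flag_rate(0.5, bin_index=1, component=0)
rate_large = flag_rate(4.0, bin_index=1, component=0)
print(f"0.5 sigma: {100 * rate_small:.1f}% flagged, 4 sigma: {100 * rate_large:.1f}% flagged")
```

Requiring localization as well as threshold exceedance makes the criterion stricter than a plain false-alarm test, which is why small injections can score below the nominal 5% rate. The toy omits the refit, so it cannot show the absorption of a corruption into (Ωm, β) that the real battery measures.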

0.5σ corruptions
0–4% flags
at/below the 5% false-alarm baseline across all 25 configs — the must-not-flag spec holds
4σ corruptions
53–99.5%
detection for 24 of 25 configs (median ≈75%; hardest honest directions: along the error degeneracy)
The measured blind spot
Lyα D_H: 12.5%
a 4σ corruption of the dominant high-z anchor is largely absorbed into (Ωm, β) — a real limitation of any BAO-alone consistency analysis, measured and published rather than hidden
Why this matters: tension discussions in the literature are usually ad hoc — a number in a paper, argued case by case. A release-over-release audit with a published ROC turns them into an instrument reading: when this flags, you know its false-alarm rate; when it stays quiet, you know what it can and cannot see.
05 · What we do not claim

The integrity is the product.

No new cosmology
“BAO alone is consistent with ΛCDM” is your published conclusion. We independently replicate it with a different, deterministic method — the replication plus the audit layer is the contribution.
Exact-or-abstain, by construction
On noisy sequences the exact-law engine abstains (measured: it refused to force a power law on H(z) at 2.5% residual). The audit abstains on boundary optima and d.o.f. deficits. It has never emitted a bluffed result in 1,500+ recorded runs across domains.
Known approximations
Bin-bin covariances beyond your own compression are not modeled; Lyα mild non-Gaussianity treated Gaussian; Ωr omitted (~0.1% in E² at z=2.33). All documented in the data-file headers.
Retraction culture
In prior external work (operational weather/solar forecasting) we published our losses next to our wins and retracted claims that failed verification. Anything in this deck that fails your recomputation, we retract publicly.
06 · The ask

Let us run NOESIS-AUDIT as a regression test on DR3.

Zero cost, zero integration burden: we take your published (or pre-publication, under any agreement you prefer) compressed BAO tables, and return the deterministic audit — reproduction gates, calibrated per-bin tensions, release-over-release tracking, injection-validated. If it ever disagrees with your pipeline, that disagreement is itself the interesting number.

Step 1
Recompute us
Every number here reproduces from the published tables + the stated assumption stack. We provide the tool and the seed.
Step 2
Audit DR2 splits
Run the same instrument over your tracer/hemisphere/systematics splits — calibrated localization of any internal inconsistency.
Step 3
DR3 regression
Standing audit on each future release: what moved, what moderated, what is newly in tension, all with measured error rates.
Contact: Koscak Research (independent, self-funded). Full run artifacts, data files with provenance headers, and the audit tool accompany this deck. We are not asking for endorsement — we are asking for one skeptical hour to recompute us.