NOESIS-AUDIT × DESI · A calibrated consistency audit for BAO data releases

01 · Your result, your words

DESI's headline is a combination result — and DESI says so.

DR1 and DR2 report a preference for evolving dark energy (w₀ > −1, w_a < 0) that appears when BAO is combined with CMB and supernovae: 2.6σ (DR1, +CMB) growing to 3.1σ (DR2, +CMB), and in DR2 2.8σ / 3.8σ / 4.2σ depending on whether Pantheon+, Union3 or DES-Y5 supernovae are added (arXiv:2404.03002; arXiv:2503.14738).

“DESI BAO data alone are consistent with the standard flat ΛCDM cosmological model with a matter density Ωm = 0.295 ± 0.015.”

DESI 2024 VI, arXiv:2404.03002, abstract (verbatim) — DR2 puts the BAO-alone w0wa preference at “just 1.7σ” (arXiv:2503.14738 §VII.1). The starting point of this audit, not its discovery.

That structure (a signal living in dataset combinations, significance that swings with the external inputs, and independent reanalyses reaching materially different w₀/w_a) is precisely where an independent, deterministic, calibrated audit layer adds value. It is not another posterior but a reproducible instrument that regression-tests each data release against stated model classes and localizes worst-case tension, with measured error rates.

Fragility · SNe choice

2.8σ → 4.2σ

the same universe, three conclusions: Pantheon+ 2.8σ · Union3 3.8σ · DESY5 4.2σ (2503.14738 §IX); without CMB lensing 2.7σ, with primordial-only priors 2.4σ

Fragility · calibration dispute

4.2σ → 3.2σ

Efstathiou found a ~0.04 mag low/high-z offset in DES-Y5 (2408.07175); DES answered (2501.06664) — and DES's own 2025 Dovekie recalibration still moved the headline from 4.2σ to 3.2σ (2511.07517)

Fragility · CMB reanalysis

2.2–2.4σ

ACT DR6: 2.2σ with DESI DR1, 2.4σ with DR2 “with or without supernovae”; swapping DESI for BOSS BAO → “consistent with the cosmological constant” (2503.14454 §7.2)

Fragility · the z=0.51 bin

“most anomalous”

DESI DR1's own words (§3.2, ~2σ); Colgáin et al. showed it implies an in-bin Ωm = 0.668 and drives the BAO-alone w₀ > −1 pull (2404.08633). Yet neither key paper runs a per-bin exclusion test.

02 · The instrument

NOESIS-AUDIT: assumption-explicit, exact-or-abstain, never bluffs.

A deterministic engine (no neural networks, no LLM anywhere in the loop). It fits a stated model class by min-χ² with the BAO scale β = c/(H₀r_d) profiled analytically, measures per-bin Mahalanobis tension using your published (D_M, D_H) correlations, attributes misfit by leave-one-bin-out, and calibrates every threshold with a deterministic parametric bootstrap. If the optimum hits the grid boundary or the degrees of freedom are insufficient, it abstains rather than reporting a result.
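Why β can be profiled analytically: for a fixed expansion shape, every predicted distance is β times a dimensionless function of redshift, so χ² is quadratic in β and its minimum has a closed form. The sketch below illustrates that step under stated assumptions; it is not the audit tool's code. The function names, the Ωm grid, and the three synthetic noise-free bins are all hypothetical (no DESI numbers are used), and only flat ΛCDM without radiation is covered.

```python
import math

def E(z, om):
    """H(z)/H0 for flat LCDM with radiation omitted."""
    return math.sqrt(om * (1 + z) ** 3 + (1 - om))

def shape(z, om, n=200):
    """Dimensionless shapes (f_M, f_H) with D_M/r_d = beta*f_M, D_H/r_d = beta*f_H."""
    h = z / n  # Simpson's rule on integral_0^z dz'/E(z'); n must be even
    s = 1 / E(0.0, om) + 1 / E(z, om)
    for k in range(1, n):
        s += (4 if k % 2 else 2) / E(k * h, om)
    return s * h / 3, 1 / E(z, om)

def quad(a, b, s_m, s_h, r):
    """a^T C^-1 b for the 2x2 covariance built from (s_m, s_h, r)."""
    return (a[0] * b[0] / s_m**2
            - r * (a[0] * b[1] + a[1] * b[0]) / (s_m * s_h)
            + a[1] * b[1] / s_h**2) / (1 - r * r)

def profile_beta(bins, om):
    """Best beta and chi2 at fixed om; chi2 is quadratic in beta, so no search is needed."""
    fd = ff = dd = 0.0
    for z, d_m, d_h, s_m, s_h, r in bins:
        f, d = shape(z, om), (d_m, d_h)
        fd += quad(f, d, s_m, s_h, r)
        ff += quad(f, f, s_m, s_h, r)
        dd += quad(d, d, s_m, s_h, r)
    return fd / ff, dd - fd * fd / ff

def fit(bins, grid):
    """Grid scan over om; returns None (abstains) if the optimum sits on a grid edge."""
    chi2 = [profile_beta(bins, om)[1] for om in grid]
    i = min(range(len(grid)), key=chi2.__getitem__)
    if i in (0, len(grid) - 1):
        return None
    beta, _ = profile_beta(bins, grid[i])
    return grid[i], beta, chi2[i]

# Synthetic, noise-free bins (NOT DESI data): truth is om = 0.30, beta = 29.5.
SYNTH = [(z, 29.5 * shape(z, 0.30)[0], 29.5 * shape(z, 0.30)[1], 0.3, 0.5, -0.45)
         for z in (0.5, 1.0, 2.0)]
GRID = [0.20 + 0.001 * i for i in range(201)]
print(fit(SYNTH, GRID))
```

On the synthetic bins the scan recovers the input Ωm and β with χ² ≈ 0. Moving the grid so that the true Ωm lies outside it makes `fit` return `None`, which is the abstain-on-boundary behaviour in miniature.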

What it is not

× Not another cosmological inference pipeline: it produces no w₀/w_a posteriors
× Not a claim about dark energy: it audits data-model consistency, release over release
× Not a black box: every assumption enumerated: flat FRW, GR distances, Gaussian bins, bin independence (your own compressed-likelihood structure), Ω_r omitted (documented, ~0.1% in E² at z=2.33)
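The size of the omitted-radiation term can be checked by hand. The sketch below compares Ω_r(1+z)⁴ with the matter-plus-Λ part of E² at z = 2.33. Two inputs are assumptions not taken from this deck: Ω_r h² ≈ 4.18 × 10⁻⁵ (photons plus three effectively massless neutrino species) and h ≈ 0.68. Ωm = 0.2975 is the DR2 BAO-alone value quoted in this deck.

```python
OMEGA_R_H2 = 4.18e-5  # assumed: photons + massless neutrinos, not from the deck
H_LITTLE = 0.68       # assumed illustrative value of h, not from the deck
OM = 0.2975           # DR2 BAO-alone matter density quoted in this deck
Z = 2.33              # effective redshift of the Lyman-alpha bin

omega_r = OMEGA_R_H2 / H_LITTLE**2
e2_without_radiation = OM * (1 + Z) ** 3 + (1 - OM)
radiation_term = omega_r * (1 + Z) ** 4
fraction = radiation_term / e2_without_radiation

print(f"radiation / E^2 at z={Z}: {100 * fraction:.3f}%")
```

Under these assumed inputs the ratio comes out at roughly 0.1%, in line with the documented approximation.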

What it gives you

✓ A deterministic cross-check that reproduces your compressed-likelihood fit from 13 published numbers
✓ Calibrated worst-case tension localization: a raw “2.5σ!” is scored against its real null distribution
✓ A published fault-injection ROC: measured detection and false-alarm rates, including the blind spots
✓ Bit-identical re-runs (fixed seed): anyone can verify every number in this deck
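What "deterministic bootstrap" means in practice can be shown with a toy version: a privately seeded generator produces the same mock stream on every run, so the threshold and p-value repeat exactly. The sketch is deliberately simplified and is not the tool's calibration. It draws independent unit-Gaussian pulls and skips the refit on each mock; the function names and the seed value are hypothetical.

```python
import random

def max_tension_null(n_meas, n_mocks, seed):
    """Sorted null distribution of max |pull| over n_meas unit-Gaussian pulls."""
    rng = random.Random(seed)  # private, explicitly seeded generator
    return sorted(
        max(abs(rng.gauss(0.0, 1.0)) for _ in range(n_meas))
        for _ in range(n_mocks)
    )

def calibrate(observed_max, n_meas=13, n_mocks=2000, seed=20250319):
    """Return (95th percentile of the null, p-value of observed_max)."""
    null = max_tension_null(n_meas, n_mocks, seed)
    p95 = null[(95 * n_mocks) // 100]
    p_value = sum(m >= observed_max for m in null) / n_mocks
    return p95, p_value

first = calibrate(2.066)   # 2.066 sigma is the DR2 worst-bin tension quoted in this deck
second = calibrate(2.066)
print(first, "identical re-run:", first == second)
```

Because the generator is seeded inside the function and never shared, two calls with the same arguments return identical tuples, which is the property a recomputation relies on.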

03 · Proof on your data

Run blind on DR1 and DR2. Here is everything it found.

Inputs: only the published fiducial BAO tables (DR1: 12 measurements, arXiv:2404.03002 Table 1; DR2: 13 measurements, arXiv:2503.14738 Table IV) with their per-bin correlations. No CMB, no supernovae, no priors beyond the stated assumption stack.

kodon-cosmoaudit · audit desi-dr2.csv · verbatim output

FIT w0=-1 wa=0 | Om = 0.2975 +/- 0.0085 (dchi2=1) | beta=c/(H0 rd) = 29.5253 | rd*h = 101.538 Mpc | chi2_min = 10.539 / dof 11
bin BGS z=0.295 tension 0.526 sigma
bin LRG1 z=0.510 tension 2.066 sigma
bin LRG2 z=0.706 tension 1.991 sigma
bin LRG3ELG1 z=0.934 tension 0.928 sigma
bin ELG2 z=1.321 tension 0.648 sigma
bin QSO z=1.484 tension 0.821 sigma
bin LYA z=2.330 tension 0.268 sigma
CALIBRATION (2000 mocks, deterministic): max-tension null p95 = 2.891
AUDIT VERDICT: CONSISTENT (observed max 2.066 sigma, bootstrap p = 0.396)
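The two scale numbers in the FIT line are tied together by the definition of β. With H₀ = 100 h km/s/Mpc, β = c/(H₀ r_d) gives r_d·h = c/(100 β). A quick check of the printed values, where the function name is ours and only the speed of light is added from outside the deck:

```python
C_KM_S = 299792.458  # speed of light in km/s (exact by definition)

def rd_h_from_beta(beta):
    """r_d * h in Mpc from beta = c / (H0 r_d), using H0 = 100 h km/s/Mpc."""
    return C_KM_S / (100.0 * beta)

rd_h = rd_h_from_beta(29.5253)  # beta as printed in the FIT line
print(f"r_d*h = {rd_h:.4f} Mpc")
```

The result agrees with the printed 101.538 Mpc to within the rounding of the four-decimal β.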

The reproduction gates

Quantity	NOESIS-AUDIT	DESI published	Verdict
DR2 BAO-alone Ω_m	0.2975 ± 0.0085	0.2975 ± 0.0086	dead-on
DR1 BAO-alone Ω_m	0.2945 ± 0.0145	0.295 ± 0.015	dead-on
DR1 r_d·h	101.881 Mpc	≈101.8 ± 1.3 Mpc	inside
DR2 r_d·h	101.538 Mpc	101.54 ± 0.73 Mpc (Eq. 17)	dead-on

The blind localization — and the release-over-release tracking

Diagnostic	DR1 (2024)	DR2 (2025)
Max-tension bin (found blind)	LRG1 z=0.510 · 2.461σ	LRG1 z=0.510 · 2.066σ
Leave-one-out attribution	drop LRG1 → χ² improves 6.12 — dominant single bin	LRG1 4.64 / LRG2 4.53 — no longer single-bin dominant
Calibrated verdict	CONSISTENT (bootstrap p = 0.138)	CONSISTENT (bootstrap p = 0.396)

Given only the 12 published DR1 numbers, the instrument flagged LRG1 (z = 0.510) as the worst-case measurement, the bin your DR1 paper calls “the single most anomalous result” (§3.2), and attributed the bulk of the misfit to it. Neither key paper runs a per-bin exclusion test (DR1's check was an SDSS substitution at z<0.6); the audit supplies that missing view, then measures the feature's moderation in DR2 (a shift your DR2 paper documents only in Table IV / Fig. 6, without comment). Moderated, not resolved: your combined-dataset preference grew DR1→DR2. The calibration also quantifies why neither release is alarming BAO-alone: across 12–13 measurements, a worst bin of about 2.85–2.9σ arises by chance 5% of the time (the DR2 run's calibrated p95 is 2.891σ).
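The look-elsewhere effect behind that last sentence has a simple closed form if the pulls are treated as independent unit Gaussians. That is an approximation we introduce here for illustration; the audit's own threshold comes from its bootstrap. Under that assumption, the 95th percentile x of the largest |pull| among N solves (2Φ(x) − 1)^N = 0.95:

```python
from statistics import NormalDist

def max_abs_pull_quantile(n_meas, q=0.95):
    """Quantile of max |z| over n_meas independent unit-Gaussian pulls."""
    per_pull = q ** (1.0 / n_meas)  # required P(|z| < x) for each single pull
    return NormalDist().inv_cdf(0.5 * (1.0 + per_pull))

x12 = max_abs_pull_quantile(12)  # DR1 measurement count
x13 = max_abs_pull_quantile(13)  # DR2 measurement count
print(f"N=12: {x12:.3f} sigma, N=13: {x13:.3f} sigma")
```

Both values land a little below 2.9σ, the same scale as the bootstrap-calibrated threshold, so a worst bin near 2σ to 2.5σ is unremarkable once the number of measurements is accounted for.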

Fixed-shape checks: all three of your published w₀w_a best-fits

Expansion shape (w0, wa)	Ωm	rd·h (Mpc)	χ²min / 11 dof
ΛCDM (−1, 0)	0.2975	101.538	10.539
Pantheon+ best-fit (−0.827, −0.75)	0.3210	99.420	9.489
Union3 best-fit (−0.64, −1.27)	0.3390	96.775	7.359
DESY5 best-fit (−0.727, −1.05)	0.3310	98.033	8.353

Honest reading: the evolving shapes fit DR2 BAO-alone mildly better (Δχ² = 1.1–3.2 at equal parameter count), consistent with your own weak BAO-alone preference, but none is forced; both classes pass the calibrated audit. The Ω_m↑ / r_d·h↓ shifts make the BAO degeneracy explicit and auditable.
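The quoted Δχ² range follows directly from the table above. A short recomputation, using only the tabulated χ²min values (the dictionary layout is ours):

```python
# chi2_min values copied from the fixed-shape table (11 dof each)
CHI2_MIN = {
    "LCDM": 10.539,
    "Pantheon+ best-fit": 9.489,
    "Union3 best-fit": 7.359,
    "DESY5 best-fit": 8.353,
}

# Improvement of each evolving shape over LCDM; every fit frees the same
# two parameters (Om and beta), so the differences compare like with like.
delta_chi2 = {
    name: round(CHI2_MIN["LCDM"] - value, 3)
    for name, value in CHI2_MIN.items()
    if name != "LCDM"
}

for name, delta in delta_chi2.items():
    print(f"{name}: delta chi2 = {delta}")
```

The differences span the smallest improvement (Pantheon+ shape) to the largest (Union3 shape), which is the 1.1–3.2 range after rounding.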

04 · Validation battery

An audit you can trust has measured error rates. Here are ours.

σ-scaled fault injections into the DR2 data vector (per bin, per component, plus correlated two-component directions along/against your published r ≈ −0.4…−0.5), 200 mocks per configuration. A mock counts as flagged only if it exceeds the calibrated p95 threshold AND the worst bin is the injected bin.
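Why the injection direction matters: with correlated (D_M, D_H) errors, the same-size shift produces a different Mahalanobis distance depending on whether it points along or against the long axis of the error ellipse. The sketch below works in units of each component's 1σ error and normalizes every shift to the same total length k. That normalization, the function name, and the choice r = −0.45 (inside the quoted range) are assumptions for illustration and may differ from how the tool scales its injections.

```python
import math

def mahalanobis(dx, dy, r):
    """Mahalanobis distance of a shift (dx, dy) given in 1-sigma units,
    for a 2x2 covariance with correlation coefficient r."""
    return math.sqrt((dx * dx - 2 * r * dx * dy + dy * dy) / (1 - r * r))

R = -0.45  # illustrative value inside the quoted r range
K = 4.0    # total shift length in sigma units
U = K / math.sqrt(2)

single = mahalanobis(K, 0.0, R)  # one component only
along = mahalanobis(U, -U, R)    # along the degeneracy (long axis for r < 0)
against = mahalanobis(U, U, R)   # across the degeneracy (short axis)

print(f"along {along:.2f} < single {single:.2f} < against {against:.2f}")
```

For negative r the long axis is the anti-correlated direction, so a shift along it yields the smallest distance, k/√(1 − r), while a shift across it yields the largest, k/√(1 + r).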

0.5σ corruptions

0–4% flags

at/below the 5% false-alarm baseline across all 25 configs — the must-not-flag spec holds

4σ corruptions

53–99.5%

detection for 24 of 25 configs (median ≈75%; hardest honest directions: along the error degeneracy)

The measured blind spot

Lyα D_H: 12.5%

a 4σ corruption of the dominant high-z anchor is largely absorbed into (Ωm, β) — a real limitation of any BAO-alone consistency analysis, measured and published rather than hidden

Why this matters: tension discussions in the literature are usually ad hoc — a number in a paper, argued case by case. A release-over-release audit with a published ROC turns them into an instrument reading: when this flags, you know its false-alarm rate; when it stays quiet, you know what it can and cannot see.

05 · What we do not claim

The integrity is the product.

No new cosmology

“BAO alone is consistent with ΛCDM” is your published conclusion. We independently replicate it with a different, deterministic method — the replication plus the audit layer is the contribution.

Exact-or-abstain, by construction

On noisy sequences the exact-law engine abstains (measured: it refused to force a power law on H(z) at 2.5% residual). The audit abstains on boundary optima and d.o.f. deficits. It has never emitted a bluffed result in 1,500+ recorded runs across domains.

Known approximations

Bin-bin covariances beyond your own compression are not modeled; Lyα mild non-Gaussianity treated Gaussian; Ω_r omitted (~0.1% in E² at z=2.33). All documented in the data-file headers.

Retraction culture

In prior external work (operational weather/solar forecasting) we published our losses next to our wins and retracted claims that failed verification. Anything in this deck that fails your recomputation, we retract publicly.

06 · The ask

Let us run NOESIS-AUDIT as a regression test on DR3.

Zero cost, zero integration burden: we take your published (or pre-publication, under any agreement you prefer) compressed BAO tables, and return the deterministic audit — reproduction gates, calibrated per-bin tensions, release-over-release tracking, injection-validated. If it ever disagrees with your pipeline, that disagreement is itself the interesting number.

Step 1

Recompute us

Every number here reproduces from the published tables + the stated assumption stack. We provide the tool and the seed.

Step 2

Audit DR2 splits

Run the same instrument over your tracer/hemisphere/systematics splits — calibrated localization of any internal inconsistency.

Step 3

DR3 regression

Standing audit on each future release: what moved, what moderated, what is newly in tension, with measured error rates.

Contact: Koscak Research (independent, self-funded). Full run artifacts, data files with provenance headers, and the audit tool accompany this deck. We are not asking for endorsement — we are asking for one skeptical hour to recompute us.