
Blaze Sports Intel
Born to Blaze the Path Beaten Less

Methodology · College Baseball
How BSI measures what consensus sees versus what BSI sees, and when the gap is big enough to name.
Every college baseball game on a game-thread page that qualifies for coverage shows a BSI Intel verdict in the right rail. The verdict compares two probabilities: a BSI model estimate of who wins the game, and a consensus estimate derived from polls and opponent-adjusted form. When they disagree enough, we name the disagreement.
This is an intelligence product, not a betting product. The math under the hood is the same math a gambling model would use; the output is editorial — what BSI sees that consensus doesn't.
The card only appears on two kinds of games:
All other games render the normal game thread without a verdict card. Broader coverage (ACC, Big 12 weekends, bubble-team matchups, mid-week non-conference) comes after the v1 backtest validates the thesis.
Team strength is a composite of team-level offensive and pitching metrics, schedule-adjusted. Specifically: PA-weighted team wRC+ (offense), IP-weighted team FIP (pitching, lower is better), and a conference strength index. The composite comes from the existing BSI Power Rankings surface, which recomputes every six hours when the Savant pipeline writes fresh advanced metrics to D1.
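The document names the three inputs but not the blend itself. A minimal sketch, assuming z-score normalization across the league and an illustrative 0.45/0.40/0.15 weight split (those weights are hypothetical, not the Power Rankings values):

```python
# Hypothetical strength composite: z-score each component across the league,
# invert FIP (lower is better), and blend with assumed weights.
from statistics import mean, stdev

def zscores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def composite(wrc_plus, fip, conf_index, weights=(0.45, 0.40, 0.15)):
    """Blend offense, pitching, and conference strength for a league of teams.

    wrc_plus, fip, conf_index: parallel lists, one entry per team.
    The FIP z-score is negated because a lower FIP means better pitching.
    """
    z_off = zscores(wrc_plus)
    z_pit = [-z for z in zscores(fip)]   # lower FIP -> higher strength
    z_conf = zscores(conf_index)
    w_off, w_pit, w_conf = weights
    return [w_off * o + w_pit * p + w_conf * c
            for o, p, c in zip(z_off, z_pit, z_conf)]
```

The z-score step matters because wRC+ (centered near 100) and FIP (centered near 4) live on incompatible scales; without it, the weights would be meaningless.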
Team strength converts to per-game win probability via a logistic curve with a home-field bonus. No XGBoost, no machine-learning training loop in v1 — the composite itself carries enough signal for a two-layer divergence product. A per-game model layer (incorporating starter and bullpen structure) is a v1.1 addition.
The consensus estimate is a weighted composite of four components. Two are active in v1; two are reserved.
| Component | Weight (v1) | Status |
|---|---|---|
| Poll composite | 0.60 | active in v1 |
| Opponent-adjusted form | 0.40 | active in v1 |
| RPI | 0.00 | reserved for v1.1 |
| Selection pressure | 0.00 | reserved for v1.1 |
RPI and selection-pressure ingestion are deferred to v1.1. Weights rescale dynamically when a component is missing — for an early-season team with fewer than five completed games, polls alone carry the composite. When RPI comes online, weights proportionally redistribute (polls 0.35, form 0.25, RPI 0.25, selection 0.15 are the current target).
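The dynamic rescaling described above can be sketched as follows; the function and component names are illustrative, not production code:

```python
def rescale_weights(weights, available):
    """Renormalize component weights so the present components sum to 1.

    weights: dict of component name -> base weight.
    available: set of components that actually have data for this team.
    Zero-weight (reserved) components are dropped even if present.
    """
    live = {k: w for k, w in weights.items() if k in available and w > 0}
    total = sum(live.values())
    return {k: w / total for k, w in live.items()}

base = {"polls": 0.60, "form": 0.40, "rpi": 0.00, "selection": 0.00}

# Early-season team, fewer than five completed games: no usable form window.
rescale_weights(base, {"polls"})           # -> {'polls': 1.0}

# Both active components present: weights pass through unchanged.
rescale_weights(base, {"polls", "form"})   # -> {'polls': 0.6, 'form': 0.4}
```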
Normalization to [0, 1]:

```
strength = 1 − (avg_rank − 1) / 39
```

Pairwise probability:

```
delta = strength_home − strength_away + HOME_BONUS
P(home wins) = 1 / (1 + exp(−K × delta))

HOME_BONUS = 0.04   (from 5-season CBB home win rate ~56%)
K = 4.0             (calibration, retuned post-backtest)
```
Applied once for BSI strength, once for consensus strength. Divergence is the difference between the two home-win probabilities, expressed in percentage points.
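In runnable form, assuming strengths already normalized to [0, 1] (the function names are mine, not the codebase's):

```python
import math

HOME_BONUS = 0.04   # from the ~56% CBB home win rate cited above
K = 4.0             # logistic steepness, per the v1 calibration

def p_home_win(strength_home, strength_away):
    """Logistic curve over the strength gap, with the home-field bonus."""
    delta = strength_home - strength_away + HOME_BONUS
    return 1.0 / (1.0 + math.exp(-K * delta))

def divergence_pp(bsi_home, bsi_away, cons_home, cons_away):
    """Divergence between BSI and consensus home-win probabilities,
    in percentage points (positive means BSI likes the home team more)."""
    return 100.0 * (p_home_win(bsi_home, bsi_away)
                    - p_home_win(cons_home, cons_away))
```

For example, if BSI rates the home side 0.70 vs 0.50 while consensus has it dead even at 0.55 vs 0.55, the divergence lands around 18 points, well past the naming threshold.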
Magnitude and stability stay on separate axes. A 20-point divergence on shaky inputs is not a verdict; it's a watchlist. Four binary flags, and any single FAIL sends the verdict to WATCHLIST.
v1 note: Starter-certainty and weather-freshness sources are not yet ingested. Until they are (v1.1), those two checks auto-PASS — the card's "Reliability: HIGH" in v1 means feature-completeness and lineup checks passed. The moment a starter or weather feed lands, those gates tighten automatically; the math already exists in code.
| Flag | PASS condition | FAIL condition |
|---|---|---|
| Starter certainty | Confirmed probable starters within 48h of first pitch OR deterministic rotation pattern. | TBD, unreliable source, or rotation disrupted. |
| Lineup certainty | Projected lineup available OR stable over last 5 games (≤1 position change). | ≥2 starters uncertain or high lineup churn. |
| Weather freshness | Outdoor forecast within 24h of first pitch; no active advisory. Indoor venues auto-pass. | Stale forecast OR advisory in window. |
| Feature completeness | ≥95% of quantitative features populated for both teams. | <95% populated or form window too small (early season). |
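The gate itself reduces to an all-of-four check. A minimal sketch, assuming the flags arrive as booleans (the class shape is illustrative; the v1 auto-PASS defaults match the note above):

```python
from dataclasses import dataclass

@dataclass
class ReliabilityFlags:
    starter_certainty: bool = True    # auto-PASS until the starter feed lands (v1)
    lineup_certainty: bool = True
    weather_freshness: bool = True    # auto-PASS until the weather feed lands (v1)
    feature_completeness: bool = True

    def reliability(self):
        """HIGH only if every flag passes; one FAIL demotes to WATCHLIST."""
        checks = (self.starter_certainty, self.lineup_certainty,
                  self.weather_freshness, self.feature_completeness)
        return "HIGH" if all(checks) else "WATCHLIST"
```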
| Verdict | Condition |
|---|---|
| BSI LEAN [team] | Model home-win probability minus consensus home-win probability ≥ +6pp AND reliability = HIGH. The team named is whichever side BSI rates higher than consensus does. |
| CONSENSUS LEAN [team] | Model is ≥6pp lower than consensus on the team consensus has as favorite. In other words: consensus loves this team, BSI does not. |
| NO EDGE | BSI and consensus agree within 6pp. Reliability passes. |
| WATCHLIST | Reliability gate fails on any of the four flags, regardless of divergence magnitude. |
The divergence threshold is 6 percentage points in v1. Below that, game-to-game noise dominates; above, the signal is strong enough to name. Threshold retunes after the first full backtest pass.
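The verdict table above can be read as a short decision function. One caveat: how the two lean rows split when BSI sits below consensus is not fully pinned down by the table, so the favorite-framing in the sketch below is my interpretation, not confirmed behavior:

```python
THRESHOLD_PP = 6.0  # v1 divergence threshold, retuned after the first backtest

def verdict(p_bsi_home, p_cons_home, reliability, home, away):
    """Map divergence (BSI minus consensus home-win prob) to a verdict string.

    Interpretation assumed here: when BSI out-rates consensus on the side
    consensus already favors, that's a BSI LEAN; when BSI is >=6pp cooler
    than consensus on the consensus favorite, that's a CONSENSUS LEAN.
    """
    if reliability != "HIGH":
        return "WATCHLIST"                       # reliability gate wins outright
    div = 100.0 * (p_bsi_home - p_cons_home)     # percentage points
    if abs(div) < THRESHOLD_PP:
        return "NO EDGE"
    bsi_higher = home if div > 0 else away       # side BSI rates above consensus
    cons_fav = home if p_cons_home >= 0.5 else away
    if bsi_higher == cons_fav:
        return f"BSI LEAN {bsi_higher}"
    return f"CONSENSUS LEAN {cons_fav}"
```

The team names in a usage call are placeholders, e.g. `verdict(0.72, 0.54, "HIGH", "Home U", "Away St")` yields a BSI LEAN on the home side.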
Claude writes the narrative — the headline and 2-3 grounded reasons beneath each verdict. Claude does not emit any number that feeds the model. Features are computed conventionally from D1 data; Claude reads them and interprets.
Every numeric claim in a Claude-written reason is checked server-side against the feature pack. If a reason cites a stat that isn't in the pack, the response is rejected and Claude retries once under a stricter prompt. A second failure falls back to a deterministic template narrative — the verdict still renders, the voice just becomes more clinical.
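A toy version of that server-side check, matching bare numeric tokens rather than named stats (the real validator is not specified here, and presumably matches stat names to values rather than numbers alone):

```python
import re

def validate_reasons(reasons, feature_pack):
    """Reject any reason that cites a number absent from the feature pack.

    reasons: list of narrative strings from the model.
    feature_pack: dict of stat name -> numeric value computed from D1 data.
    Returns (ok, error_message). Simplified illustration only.
    """
    allowed = {f"{v:g}" for v in feature_pack.values()}
    for reason in reasons:
        for tok in re.findall(r"\d+(?:\.\d+)?", reason):
            if f"{float(tok):g}" not in allowed:
                return False, f"uncited number: {tok}"
    return True, None
```

On rejection, the caller would retry once under the stricter prompt and then fall back to the deterministic template, per the flow described above.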
The v1 thesis is perception lag, not sportsbook alpha. The backtest measures whether BSI LEAN calls precede movement in consensus — do flagged teams move up in polls and RPI over the following three weeks, and do they outperform consensus-expected win rate over the following ten games? Brier score and calibration curves are secondary diagnostics. CLV against closing lines is a side check, not a pass/fail criterion.
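For reference, the Brier score used as a secondary diagnostic is just the mean squared error between forecast probabilities and binary outcomes:

```python
def brier(probs, outcomes):
    """Brier score: mean of (forecast probability - 0/1 outcome)^2.
    0.0 is perfect; 0.25 is what a constant 0.5 forecast earns."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```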