
Blaze Sports Intel
Born to Blaze the Path Beaten Less

Methodology · College Baseball
How BSI measures what consensus sees versus what BSI sees, and when the gap is big enough to name.
Every college baseball game on a game-thread page that qualifies for coverage shows a BSI Intel verdict in the right rail. The verdict compares two probabilities: a BSI model estimate of who wins the game, and a consensus estimate derived from polls and opponent-adjusted form. When they disagree enough, we name the disagreement.
This is an intelligence product, not a betting product. The math under the hood is the same math a gambling model would use; the output is editorial — what BSI sees that consensus doesn't.
The card only appears on two kinds of games:
All other games render the normal game thread without a verdict card. Broader coverage (ACC, Big 12 weekends, bubble-team matchups, mid-week non-conference) comes after the v1 backtest validates the thesis.
Team strength is a composite of team-level offensive and pitching metrics, schedule-adjusted. Specifically: PA-weighted team wRC+ (offense), IP-weighted team FIP (pitching, lower is better), and a conference strength index. The composite comes from the existing BSI Power Rankings surface, which recomputes every six hours when the Savant pipeline writes fresh advanced metrics to D1.
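The document names the three inputs but not the blend itself. A minimal sketch, assuming z-score normalization across the league and an illustrative 0.45/0.40/0.15 weight split (those weights are hypothetical, not the Power Rankings values):

```python
# Hypothetical strength composite: z-score each component across the league,
# invert FIP (lower is better), and blend with assumed weights.
from statistics import mean, stdev

def zscores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def composite(wrc_plus, fip, conf_index, weights=(0.45, 0.40, 0.15)):
    """Blend offense, pitching, and conference strength for a league of teams.

    wrc_plus, fip, conf_index: parallel lists, one entry per team.
    The FIP z-score is negated because a lower FIP means better pitching.
    """
    z_off = zscores(wrc_plus)
    z_pit = [-z for z in zscores(fip)]   # lower FIP -> higher strength
    z_conf = zscores(conf_index)
    w_off, w_pit, w_conf = weights
    return [w_off * o + w_pit * p + w_conf * c
            for o, p, c in zip(z_off, z_pit, z_conf)]
```

The z-score step matters because wRC+ (centered near 100) and FIP (centered near 4) live on incompatible scales; without it, the weights would be meaningless.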
Team strength converts to per-game win probability via a logistic curve with a home-field bonus. No XGBoost, no machine-learning training loop in v1 — the composite itself carries enough signal for a two-layer divergence product. A per-game model layer (incorporating starter and bullpen structure) is a v1.1 addition.
The consensus estimate is a weighted composite of four components. Two are active in v1; two are reserved.
| Component | Weight (v1) | Status |
|---|---|---|
| Poll composite | 0.60 | active in v1 |
| Opponent-adjusted form | 0.40 | active in v1 |
| RPI | 0.00 | reserved for v1.1 |
| Selection pressure | 0.00 | reserved for v1.1 |
RPI and selection-pressure ingestion are deferred to v1.1. Weights rescale dynamically when a component is missing — for an early-season team with fewer than five completed games, polls alone carry the composite. When RPI comes online, weights proportionally redistribute (polls 0.35, form 0.25, RPI 0.25, selection 0.15 are the current target).
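The dynamic rescaling described above can be sketched as follows; the function and component names are illustrative, not production code:

```python
def rescale_weights(weights, available):
    """Renormalize component weights so the present components sum to 1.

    weights: dict of component name -> base weight.
    available: set of components that actually have data for this team.
    Zero-weight (reserved) components are dropped even if present.
    """
    live = {k: w for k, w in weights.items() if k in available and w > 0}
    total = sum(live.values())
    return {k: w / total for k, w in live.items()}

base = {"polls": 0.60, "form": 0.40, "rpi": 0.00, "selection": 0.00}

# Early-season team, fewer than five completed games: no usable form window.
rescale_weights(base, {"polls"})           # -> {'polls': 1.0}

# Both active components present: weights pass through unchanged.
rescale_weights(base, {"polls", "form"})   # -> {'polls': 0.6, 'form': 0.4}
```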
Normalization to [0, 1]:

```
strength = 1 − (avg_rank − 1) / 39
```

Pairwise probability:

```
delta = strength_home − strength_away + HOME_BONUS
P(home wins) = 1 / (1 + exp(−K × delta))

HOME_BONUS = 0.04   (from 5-season CBB home win rate ~56%)
K = 4.0             (calibration, retuned post-backtest)
```
Applied once for BSI strength, once for consensus strength. Divergence is the difference between the two home-win probabilities, expressed in percentage points.
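In runnable form, assuming strengths already normalized to [0, 1] (the function names are mine, not the codebase's):

```python
import math

HOME_BONUS = 0.04   # from the ~56% CBB home win rate cited above
K = 4.0             # logistic steepness, per the v1 calibration

def p_home_win(strength_home, strength_away):
    """Logistic curve over the strength gap, with the home-field bonus."""
    delta = strength_home - strength_away + HOME_BONUS
    return 1.0 / (1.0 + math.exp(-K * delta))

def divergence_pp(bsi_home, bsi_away, cons_home, cons_away):
    """Divergence between BSI and consensus home-win probabilities,
    in percentage points (positive means BSI likes the home team more)."""
    return 100.0 * (p_home_win(bsi_home, bsi_away)
                    - p_home_win(cons_home, cons_away))
```

For example, if BSI rates the home side 0.70 vs 0.50 while consensus has it dead even at 0.55 vs 0.55, the divergence lands around 18 points, well past the naming threshold.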
Magnitude and stability stay on separate axes. A 20-point divergence on shaky inputs is not a verdict; it's a watchlist. Four binary flags, and any single FAIL sends the verdict to WATCHLIST.
v1 note: Starter-certainty and weather-freshness sources are not yet ingested. Until they are (v1.1), those two checks auto-PASS — the card's "Reliability: HIGH" in v1 means feature-completeness and lineup checks passed. The moment a starter or weather feed lands, those gates tighten automatically; the math already exists in code.
| Flag | PASS condition | FAIL condition |
|---|---|---|
| Starter certainty | Confirmed probable starters within 48h of first pitch OR deterministic rotation pattern. | TBD, unreliable source, or rotation disrupted. |
| Lineup certainty | Projected lineup available OR stable over last 5 games (≤1 position change). | ≥2 starters uncertain or high lineup churn. |
| Weather freshness | Outdoor forecast within 24h of first pitch; no active advisory. Indoor venues auto-pass. | Stale forecast OR advisory in window. |
| Feature completeness | ≥95% of quantitative features populated for both teams. | <95% populated or form window too small (early season). |
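The gate itself reduces to an all-of-four check. A minimal sketch, assuming the flags arrive as booleans (the class shape is illustrative; the v1 auto-PASS defaults match the note above):

```python
from dataclasses import dataclass

@dataclass
class ReliabilityFlags:
    starter_certainty: bool = True    # auto-PASS until the starter feed lands (v1)
    lineup_certainty: bool = True
    weather_freshness: bool = True    # auto-PASS until the weather feed lands (v1)
    feature_completeness: bool = True

    def reliability(self):
        """HIGH only if every flag passes; one FAIL demotes to WATCHLIST."""
        checks = (self.starter_certainty, self.lineup_certainty,
                  self.weather_freshness, self.feature_completeness)
        return "HIGH" if all(checks) else "WATCHLIST"
```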
| Verdict | Condition |
|---|---|
| BSI LEAN [team] | Model home-win probability minus consensus home-win probability ≥ +6pp AND reliability = HIGH. The team named is whichever side BSI rates higher than consensus does. |
| CONSENSUS LEAN [team] | Model is ≥6pp lower than consensus on the team consensus has as favorite. In other words: consensus loves this team, BSI does not. |
| NO EDGE | BSI and consensus agree within 6pp. Reliability passes. |
| WATCHLIST | Reliability gate fails on any of the four flags, regardless of divergence magnitude. |
The divergence threshold is 6 percentage points in v1. Below that, game-to-game noise dominates; above, the signal is strong enough to name. Threshold retunes after the first full backtest pass.
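The verdict table above can be read as a short decision function. One caveat: how the two lean rows split when BSI sits below consensus is not fully pinned down by the table, so the favorite-framing in the sketch below is my interpretation, not confirmed behavior:

```python
THRESHOLD_PP = 6.0  # v1 divergence threshold, retuned after the first backtest

def verdict(p_bsi_home, p_cons_home, reliability, home, away):
    """Map divergence (BSI minus consensus home-win prob) to a verdict string.

    Interpretation assumed here: when BSI out-rates consensus on the side
    consensus already favors, that's a BSI LEAN; when BSI is >=6pp cooler
    than consensus on the consensus favorite, that's a CONSENSUS LEAN.
    """
    if reliability != "HIGH":
        return "WATCHLIST"                       # reliability gate wins outright
    div = 100.0 * (p_bsi_home - p_cons_home)     # percentage points
    if abs(div) < THRESHOLD_PP:
        return "NO EDGE"
    bsi_higher = home if div > 0 else away       # side BSI rates above consensus
    cons_fav = home if p_cons_home >= 0.5 else away
    if bsi_higher == cons_fav:
        return f"BSI LEAN {bsi_higher}"
    return f"CONSENSUS LEAN {cons_fav}"
```

The team names in a usage call are placeholders, e.g. `verdict(0.72, 0.54, "HIGH", "Home U", "Away St")` yields a BSI LEAN on the home side.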
Claude writes the narrative — the headline and 2-3 grounded reasons beneath each verdict. Claude does not emit any number that feeds the model. Features are computed conventionally from D1 data; Claude reads them and interprets.
Every numeric claim in a Claude-written reason is checked server-side against the feature pack. If a reason cites a stat that isn't in the pack, the response is rejected and Claude retries once under a stricter prompt. A second failure falls back to a deterministic template narrative — the verdict still renders, the voice just becomes more clinical.
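A toy version of that server-side check, matching bare numeric tokens rather than named stats (the real validator is not specified here, and presumably matches stat names to values rather than numbers alone):

```python
import re

def validate_reasons(reasons, feature_pack):
    """Reject any reason that cites a number absent from the feature pack.

    reasons: list of narrative strings from the model.
    feature_pack: dict of stat name -> numeric value computed from D1 data.
    Returns (ok, error_message). Simplified illustration only.
    """
    allowed = {f"{v:g}" for v in feature_pack.values()}
    for reason in reasons:
        for tok in re.findall(r"\d+(?:\.\d+)?", reason):
            if f"{float(tok):g}" not in allowed:
                return False, f"uncited number: {tok}"
    return True, None
```

On rejection, the caller would retry once under the stricter prompt and then fall back to the deterministic template, per the flow described above.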
The v1 thesis is perception lag, not sportsbook alpha. The backtest measures whether BSI LEAN calls precede movement in consensus — do flagged teams move up in polls and RPI over the following three weeks, and do they outperform consensus-expected win rate over the following ten games? Brier score and calibration curves are secondary diagnostics. CLV against closing lines is a side check, not a pass/fail criterion.
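For reference, the Brier score used as a secondary diagnostic is just the mean squared error between forecast probabilities and binary outcomes:

```python
def brier(probs, outcomes):
    """Brier score: mean of (forecast probability - 0/1 outcome)^2.
    0.0 is perfect; 0.25 is what a constant 0.5 forecast earns."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```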