The Code behind Elite Edge
Mackey Arena is the loudest place I've ever been. It's the pride of Purdue Basketball and, most of the time, an impenetrable fortress. It's also where I fell in love with college basketball — the passion, the chaos, the upsets. And as someone who spends most of his time in data, it was only a matter of time before I tried to model it.
This post walks through the core of the Final Four prediction pipeline I built for Elite Edge — a live college basketball analytics platform. The model runs on KenPom efficiency data, trains an XGBoost classifier on historical seasons, and outputs a ranked probability list of Final Four contenders that updates throughout the season.
The Data
The backbone of the model is KenPom's adjusted efficiency metrics — the gold standard for evaluating college basketball teams independent of schedule strength. Each row in the dataset represents one team in one season, with features like:
AdjEM — Adjusted Efficiency Margin (offense minus defense, schedule-adjusted)
AdjOE / AdjDE — Adjusted Offensive and Defensive Efficiency per 100 possessions
AdjTempo — Adjusted pace of play
eFG% — Effective field goal percentage (accounts for 3-point value)
TO%, OR%, FT Rate — The four factors of basketball
SOS / Luck — Strength of schedule and luck-adjusted record
The target variable is binary: did this team reach the Final Four that year? Because Final Four teams are rare (4 out of ~350 each season), the dataset is heavily imbalanced. That's a problem we have to handle explicitly.
Feature Engineering & Scaling
The most important design decision in this pipeline is scaling within each season, not across all seasons. A team with an AdjEM of +20 means something very different in a weak year versus a loaded one. By z-score normalizing each feature per season, we let the model learn relative dominance rather than absolute numbers.
In [1]:
def scale_per_season(df, cols):
    """Z-score normalize features grouped by season (used during training)."""
    out = df.copy()
    grouped = out.groupby("Season")[cols]
    # One pass per season: subtract the season mean, divide by the season std
    out[cols] = grouped.transform(lambda x: (x - x.mean()) / x.std())
    return out
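To see why this matters, here's a toy example with invented numbers: a raw AdjEM of +20 is the best team in one season but merely average in another, and per-season z-scoring captures exactly that.

```python
import pandas as pd

# Toy data: the same raw AdjEM means different things in different seasons
toy = pd.DataFrame({
    "Season": [2021, 2021, 2021, 2024, 2024, 2024],
    "AdjEM":  [20.0, 10.0, 0.0,  30.0, 20.0, 10.0],
})
# Same per-season z-scoring as scale_per_season above
z = toy.groupby("Season")["AdjEM"].transform(lambda x: (x - x.mean()) / x.std())
print(z.round(2).tolist())  # [1.0, 0.0, -1.0, 1.0, 0.0, -1.0]
```

The +20 team is a full standard deviation above its 2021 field (z = +1) but dead average (z = 0) in the loaded 2024 field, even though the raw number is identical.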
At prediction time — when we only have the current season's data — we apply the same logic but over just that single season:
In [2]:
def scale_single_season(df, cols):
    """Z-score normalize features within one season (no other seasons needed)."""
    out = df.copy()
    for c in cols:
        mean = out[c].mean()
        std = out[c].std()
        # Guard against zero variance: a constant column scales to all zeros
        out[c] = ((out[c] - mean) / std) if std > 0 else 0.0
    return out
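A quick sanity check on toy numbers confirms the two scalers agree whenever only one season is present:

```python
import pandas as pd

# For a single season, the prediction-time scaler and the training-time
# groupby scaler should produce identical z-scores (toy values below)
df = pd.DataFrame({"AdjOE": [118.0, 112.0, 106.0]})
single = (df["AdjOE"] - df["AdjOE"].mean()) / df["AdjOE"].std()   # scale_single_season logic
grouped = df.assign(Season=2026).groupby("Season")["AdjOE"] \
            .transform(lambda x: (x - x.mean()) / x.std())        # scale_per_season logic
print(single.tolist())  # [1.0, 0.0, -1.0]
```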
Feature columns are selected dynamically — anything numeric that isn't a rank column, team metadata, or the target label. This keeps the pipeline clean as KenPom adds or removes columns over time.
In [3]:
DROP_COLS = ["TeamName", "Coach", "ConfShort", "Event", "Seed", "FinalFour"]

def get_feature_cols(df):
    rank_cols = [c for c in df.columns if c.startswith("Rank") or c.endswith("Rank")]
    drop = set(DROP_COLS + rank_cols + ["Season"])
    return [c for c in df.columns if c not in drop and df[c].dtype in ("float64", "int64")]
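To make the selection rule concrete, here's what it keeps on a mock row (the function is repeated verbatim so the snippet runs on its own; the column values are invented):

```python
import pandas as pd

DROP_COLS = ["TeamName", "Coach", "ConfShort", "Event", "Seed", "FinalFour"]

def get_feature_cols(df):
    rank_cols = [c for c in df.columns if c.startswith("Rank") or c.endswith("Rank")]
    drop = set(DROP_COLS + rank_cols + ["Season"])
    return [c for c in df.columns if c not in drop and df[c].dtype in ("float64", "int64")]

# Mock row: metadata, a rank column, and the label all get filtered out
mock = pd.DataFrame({
    "TeamName": ["Purdue"], "Season": [2026], "AdjEM": [25.1],
    "AdjOE": [121.3], "RankAdjEM": [3], "FinalFour": [0],
})
print(get_feature_cols(mock))  # ['AdjEM', 'AdjOE']
```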
The Model
I chose XGBoost for a few reasons: it handles the class imbalance well via scale_pos_weight, it's robust to correlated features (which efficiency metrics definitely are), and it gives reliable probability outputs that we can rank teams by.
In [4]:
pos = y_train.sum()
neg = len(y_train) - pos

model = XGBClassifier(
    n_estimators=300,
    max_depth=4,
    learning_rate=0.05,
    scale_pos_weight=neg / pos,   # ~85:1 ratio — critical for rare events
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric="logloss",
    random_state=42,
)
model.fit(X_train, y_train)
The scale_pos_weight parameter tells XGBoost how much more to penalize missing a Final Four team versus incorrectly flagging a non-contender. With roughly 85 non-Final-Four teams for every Final Four team in the training data, this ratio is essential — without it, the model would just predict "no" for everyone and be 98% accurate while being completely useless.
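A back-of-the-envelope illustration of that trap, using round numbers close to the real imbalance:

```python
# Why raw accuracy is meaningless here: toy season with an ~85:1 imbalance
n_neg, n_pos = 340, 4   # non-Final-Four vs Final Four teams (round numbers)

# A degenerate "model" that predicts "no Final Four" for every single team:
accuracy = n_neg / (n_neg + n_pos)   # looks great on paper
recall = 0 / n_pos                   # but it finds zero actual Final Four teams

print(f"accuracy={accuracy:.1%}, recall={recall:.0%}")  # accuracy=98.8%, recall=0%
```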
The Pipeline
The full pipeline has two modes. On first run, it trains from scratch on all historical seasons and pickles the model. On every subsequent run (daily during the season), it just loads the saved model and re-scores the current year's fresh data:
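The load-or-train branch can be sketched roughly like this; the pickle filename and the function shape are my stand-ins for illustration, not the real script's:

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("f4_model.pkl")  # filename is an assumption, not from the real script

def load_or_train(train_fn, retrain=False):
    """Train and pickle the model on first run (or --train); load it otherwise."""
    if retrain or not MODEL_PATH.exists():
        model = train_fn()  # e.g. the XGBClassifier fit from the previous cell
        MODEL_PATH.write_bytes(pickle.dumps(model))
    else:
        model = pickle.loads(MODEL_PATH.read_bytes())
    return model
```

On the daily run, `train_fn` is never called; the saved model is simply deserialized and pointed at the fresh season's data.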
In [5]:
# First-time setup: train on all seasons, save model
# python export_f4_predictions.py --train

# Daily update: load saved model, score fresh 2026 data
# python export_f4_predictions.py --year 2026

scaled = scale_single_season(raw, feature_cols)
scaled["prob"] = model.predict_proba(scaled[feature_cols].values)[:, 1]
top20 = scaled.nlargest(20, "prob")
The output is a JSON file consumed by the Elite Edge frontend — a ranked list of the top 20 Final Four contenders with their probability, efficiency margins, and four-factor stats. It looks like this:
Out [5]:
Exported 20 teams → f4_predictions.json

1. Auburn       (62.3%)
2. Duke         (58.1%)
3. Houston      (54.7%)
4. Florida      (51.2%)
...

lastUpdated: March 09, 2026  |  modelRecall: 0.45
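The export step itself might look something like this sketch. The field names mirror the output above, but the exact schema of the real script is an assumption:

```python
import json
from datetime import date

def export_predictions(top20, path="f4_predictions.json", model_recall=0.45):
    """Write the ranked contender list to JSON for the frontend."""
    payload = {
        "lastUpdated": date.today().strftime("%B %d, %Y"),
        "modelRecall": model_recall,
        "teams": [
            {"rank": i + 1, "team": row["TeamName"], "prob": round(row["prob"], 3)}
            for i, (_, row) in enumerate(top20.iterrows())
        ],
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)
    print(f"Exported {len(payload['teams'])} teams → {path}")
```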
What the Model Has Learned
After training on 12+ years of KenPom data, a few patterns are consistent. Final Four teams almost always sit in the top tier of AdjEM — the model rarely picks a team outside the top 15 nationally. Defensive efficiency tends to be a stronger signal than offensive efficiency; elite defenses are harder to replicate with a hot shooting streak. And luck-adjusted records matter: teams that look good but have been propped up by close wins tend to get filtered out.
The model achieves around 45% recall — meaning it correctly identifies roughly 2 of the 4 Final Four teams in a given year. That sounds modest, but picking a team at random gives you only about a 1% chance (4 in ~350) of landing on a Final Four team. The model is consistently finding real contenders; it just can't predict the chaos that makes March Madness what it is.
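For concreteness, recall here is just the fraction of actual Final Four teams that show up among the model's picks (team names below are placeholders, not real results):

```python
# Recall on a toy year: how many actual Final Four teams did the model find?
actual_f4 = {"TeamA", "TeamB", "TeamC", "TeamD"}
model_top4 = {"TeamA", "TeamC", "TeamE", "TeamF"}

recall = len(actual_f4 & model_top4) / len(actual_f4)
print(recall)  # 0.5 -> found 2 of the 4
```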
And honestly? That chaos is the whole point. The model is a tool for thinking, not a crystal ball. It tells you which teams the data says should be there. What actually happens in March is a different story — and that's what makes it worth watching.