HYP3 — Sophia Bae

Overview

HYP3 measures the distance between how loudly the internet talked about each team and how far they actually went in the NCAA Men's Basketball Tournament. The site ranks all 68 teams twice, once by Google Trends search interest and once by tournament wins, and shows you the gap.

HYP3.dev

year: 2026
time frame: 8 Weeks
role: Solo Designer & Developer
disciplines: Data Visualization, Web Development, Information Design
tools: Python [pytrends], D3 [ReCharts], React [Shadcn/ui], Tailwind CSS, Vercel, Claude Code, Figma, Git

The Brief

Did the teams the internet hyped actually perform?

This was a self-directed studio project. The prompt was open: I wanted to build something that uses data to make an argument, and I was heavily inspired by the 2026 March Madness NCAA Men's Basketball tournament. Google Trends is the closest thing we have to a public record of what the internet was paying attention to, day by day. It doesn't care what ESPN said. It records what people typed.

March Madness was the right test case. 68 teams, a fixed window, a clear ground truth (wins), and a fanbase that talks online constantly. If hype and performance ever diverge in a measurable way, they diverge here.

The Tension

How might we turn a feeling into a number that holds up every season?

Hype is a feeling. The site needs a number.

The hardest part of the project wasn’t visual. It was definitional. Hype could mean a lot of things: Twitter mentions, YouTube view counts, ESPN coverage volume, Vegas odds, bracket pick frequency. I had to pick one I could actually measure, repeatedly, for every team, in a way that wouldn’t fall apart next season. If the definition wavers, the whole site wavers with it. Every view, every color, every field in the JSON schema follows from this one choice.

I landed on Google Trends search interest with two modes. Tournament mode looks at the 15-day window around Selection Sunday. This is hype at peak attention: who the internet is thinking about at the moment the bracket goes live. Season mode looks at Nov 1 through nine days after Selection Sunday. This is hype as accumulated narrative: who built a story over the whole season. Same teams, same scoring rule, different windows. The contrast between modes is itself a finding.
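As a rough sketch of the two windows (not the actual pipeline code), the date ranges could be derived from a single Selection Sunday date. The function names are hypothetical, the example date is illustrative, and centering the 15-day window on Selection Sunday (seven days either side) is an assumption, since "around" could also mean an off-center window.

```python
from datetime import date, timedelta


def tournament_window(selection_sunday: date, days: int = 15) -> tuple[date, date]:
    """Inclusive window of `days` days centered on Selection Sunday.

    Assumption: the window is symmetric, so `days` should be odd.
    """
    half = days // 2
    return (selection_sunday - timedelta(days=half),
            selection_sunday + timedelta(days=half))


def season_window(selection_sunday: date, tail_days: int = 9) -> tuple[date, date]:
    """Nov 1 of the preceding calendar year through Selection Sunday + tail_days."""
    start = date(selection_sunday.year - 1, 11, 1)
    return start, selection_sunday + timedelta(days=tail_days)


# Illustrative Selection Sunday date (an assumption, passed in as data).
selection_sunday = date(2026, 3, 15)
t_start, t_end = tournament_window(selection_sunday)
s_start, s_end = season_window(selection_sunday)
print("tournament:", t_start, "to", t_end)
print("season:    ", s_start, "to", s_end)
```

Keeping both windows as functions of one date means rerunning for another year only requires changing that date.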

Thought Process

I focused on three decisions that shaped the rest of the build.

1. Define hype as a date range, not a feeling.

A team’s hype score is the mean of its daily Google Trends search interest across the window. That’s it. The definition is boring on purpose. Boring is repeatable. Boring is what lets the same pipeline run next March, and the March after that, without me re-deciding what I think hype means.
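In code, that definition is about as small as it sounds. A minimal sketch, assuming the daily values have already been pulled and placed on a common scale; the function name `hype_score` is hypothetical:

```python
from statistics import fmean


def hype_score(daily_interest: list[float]) -> float:
    """Mean of a team's daily search-interest values across the window."""
    if not daily_interest:
        raise ValueError("the window contains no daily values")
    return fmean(daily_interest)


# Three made-up daily values, purely for illustration.
example_score = hype_score([10, 20, 30])
print(example_score)
```

Because the score depends only on the values inside the window, switching between tournament mode and season mode is a matter of slicing a different date range before calling it.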

2. Solve the anchoring problem before anything else.

Google Trends returns values from 0 to 100, but that scale is only valid within a single query. Searching one team alone gives a curve normalized to its own peak. Searching five teams together rescales all of them relative to whichever was biggest. Two queries are not directly comparable.

This was fine for five teams, but it broke at 68. The fix is to pick a reference team: a year-round national program with reliable signal. Pull it alone first to record its true curve. Then, for every batch, fill four slots with teams and give the fifth slot to the anchor. Trends rescales the batch internally, but the anchor in this batch is now a known multiple of the true anchor curve. Divide every team in the batch by that multiple. They land on the anchor's true scale, comparable to teams in every other batch. A shared yardstick, derived from the data rather than assumed.
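The rescaling step can be sketched on plain lists, with no Trends calls involved. This is an illustration under stated assumptions, not the project's pipeline: the function names are hypothetical, the curves are made-up numbers, and the multiple is estimated as the ratio of the anchor's mean in the batch to its mean when queried alone (a ratio of peaks would be another reasonable choice).

```python
from statistics import fmean


def batch_multiple(anchor_solo: list[float], anchor_in_batch: list[float]) -> float:
    """How much the batch query shrank or stretched the anchor curve.

    Assumption: estimated as a ratio of means over the window.
    """
    solo_mean = fmean(anchor_solo)
    batch_mean = fmean(anchor_in_batch)
    if solo_mean == 0 or batch_mean == 0:
        raise ValueError("anchor has no signal; cannot rescale this batch")
    return batch_mean / solo_mean


def rescale_batch(batch: dict[str, list[float]], anchor_name: str,
                  anchor_solo: list[float]) -> dict[str, list[float]]:
    """Put every non-anchor team in the batch on the anchor's solo scale."""
    multiple = batch_multiple(anchor_solo, batch[anchor_name])
    return {
        team: [value / multiple for value in curve]
        for team, curve in batch.items()
        if team != anchor_name
    }


# Made-up curves: the anchor alone, then the same days inside a batch
# where a bigger team pushed the anchor down to half its solo height.
anchor_solo = [50, 100, 50]
batch = {"Anchor U": [25, 50, 25], "Team A": [100, 50, 0]}
rescaled = rescale_batch(batch, "Anchor U", anchor_solo)
print(rescaled)
```

Note that rescaled values can exceed 100. That is expected: the 0 to 100 ceiling only applies inside a single query, and the point of the anchor is to leave that per-query scale behind.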

Nothing in the Trends docs tells you to do this. I found it the way you find most things in data work. The numbers were obviously wrong, and I refused to ship them.

3. The dataset is the product, too.

The whole year’s data is one JSON file bundled with the site. No backend, no auth, no rate limits. GET /data/2026.json and you have everything. Every team, every daily curve, every computed field, including the gap value precomputed so the chart code stays dumb.

I did this because the project is more useful if other people can build on it. For a different chart, a different ranking, or someone’s own definition of performance plugged in, the file is right there. It’s also a constraint I set for myself. If the site runs as a static deploy with one JSON, I haven’t over-engineered it.
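To make the "precomputed gap" idea concrete, here is a minimal sketch of building such a file. Everything specific in it is an assumption: the field names are hypothetical rather than the real schema, the gap is taken as wins rank minus hype rank (so positive means overhyped), tied teams share an average rank, and the three teams are made up.

```python
import json


def rank_descending(values: dict[str, float]) -> dict[str, float]:
    """Rank 1 is the largest value; tied teams share the average of their ranks."""
    ordered = sorted(values, key=lambda team: values[team], reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every team tied with the team at position i.
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks


def build_records(hype: dict[str, float], wins: dict[str, float]) -> list[dict]:
    """One record per team, with the gap stored so chart code need not compute it."""
    hype_rank = rank_descending(hype)
    wins_rank = rank_descending(wins)
    return [
        {
            "team": team,
            "hype": hype[team],
            "wins": wins[team],
            "hype_rank": hype_rank[team],
            "wins_rank": wins_rank[team],
            # Assumed sign convention: positive = ranked higher in hype than in wins.
            "gap": wins_rank[team] - hype_rank[team],
        }
        for team in hype
    ]


hype = {"Team A": 80.0, "Team B": 40.0, "Team C": 10.0}
wins = {"Team A": 0, "Team B": 0, "Team C": 4}
records = build_records(hype, wins)
payload = json.dumps({"year": 2026, "teams": records}, indent=2)
print(payload)
```

Tie handling matters more than it might seem here, because a large share of the field exits with zero wins and would otherwise get arbitrary ranks.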

The Build

The site is live at hyp3.dev. Pipeline and source are on GitHub.

The data pipeline came first. There was nothing to design until the numbers existed, and the visual decisions came out of staring at the JSON. That constraint was useful. Every layout decision had to survive contact with real numbers, and the design got tighter because of it. I started in Figma but moved early on to building and iterating in a notebook and the browser, and that ended up being the right call.

One dataset, four views.

One view wasn’t enough. Each view answers a different question, and the site is shaped so a visitor can pick the question that matches how they think.

01 / Divergent ranks all 68 teams by gap score, from most overhyped to most underhyped. Each row shows the team's seed, name, and gap value. Tap any row to expand team details. This is the headline view, the full ranking readable at a glance.

02 / Scatter plots hype index on the x-axis and wins on the y-axis. The diagonal is the expected line. Teams above it were underhyped, teams below were overhyped, teams on it got it right. This is the view for skeptics, the relationship without ranking anyone.
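The above/below reading of the diagonal can be sketched as a small classifier. This is a guess at one workable setup rather than the site's logic: it assumes both axes are normalized to 0..1 by dividing by the maximum, and it introduces a hypothetical `tolerance` band so that "on the line" does not require exact equality. The team values are made up.

```python
def normalize(values: dict[str, float]) -> dict[str, float]:
    """Scale values to 0..1 by dividing by the largest one (an assumption)."""
    top = max(values.values())
    if top <= 0:
        raise ValueError("need at least one positive value to normalize")
    return {team: value / top for team, value in values.items()}


def side_of_diagonal(hype_norm: float, wins_norm: float,
                     tolerance: float = 0.1) -> str:
    """Above the diagonal (more wins than hype) reads as underhyped."""
    diff = wins_norm - hype_norm
    if abs(diff) <= tolerance:
        return "as expected"
    return "underhyped" if diff > 0 else "overhyped"


hype_index = {"Team A": 90.0, "Team B": 30.0, "Team C": 60.0}
wins = {"Team A": 1, "Team B": 4, "Team C": 3}

hype_norm = normalize(hype_index)
wins_norm = normalize(wins)
labels = {team: side_of_diagonal(hype_norm[team], wins_norm[team])
          for team in hype_index}
print(labels)
```

The width of the tolerance band is a judgment call, and it changes how many teams land in the "as expected" group.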

03 / Timeline is a heatmap. One row per team, one column per day across the 15-day window around Selection Sunday. The color intensity shows where search interest concentrated. Use it to ask when people started paying attention, whether a team's hype built gradually or spiked at the buzzer.

04 / Bracket is the actual tournament structure, arranged by region and seed, with every team colored by its gap category: overhyped, underhyped, as expected, or noise. The bracket as it played out, recolored by who was oversold.

What I Considered (and Cut)

A live API instead of a static JSON.

An earlier version had the site hit Trends and the NCAA endpoint at request time. It was slower, fragile, and offered nothing the static file doesn’t. I cached everything at build time and the experience got better in every direction, including for the people pulling the dataset themselves.

Predictive scoring.

I considered adding a model that uses early-season hype to predict tournament performance. I cut it. The whole premise of the site is descriptive, not predictive. Adding a forecast would have shifted the argument from “here is what happened” to “here is what should happen next,” and that’s a different project with a different burden of proof.

A “noise” filter.

Some teams have query strings that collide with non-basketball searches (Stanford, for example, surfaces a lot of university traffic). I considered filtering these out or flagging them as noisy. I left them in and documented the query strings instead. Hiding the messiness would have made the data look more authoritative than it is.

Future Steps

I'd like to rerun the pipeline next March. The schema is frozen, so the new year drops in without changes. After that, the plan is to backfill 2024 and 2023, since five years of data is when the patterns start to be patterns.

The longest-running open question is the anchor. The current reference team works, but the field of year-round national programs with reliable signal is small and changes over time. A composite anchor built from several stable teams would be sturdier across seasons.

The method is not basketball-specific either. The World Cup and the College Football Playoff are the next obvious targets, with the same window structure but on different scales.

Interested in Hearing More?

I’d love to chat more about my process: