A handful of high schools send hundreds of students to Harvard. The rest of the country sends almost none. The most consequential pipeline in American education is also one of the worst-documented — and the few datasets that do exist sit behind paywalls, court seals, and school logins.
Almost every high school senior who matriculates anywhere is, in principle, in a database. The National Student Clearinghouse covers some 3,600 institutions and roughly 97% of all postsecondary students. It can answer the question every guidance counselor wants to know: from this high school, who went where?
It just won't answer it for you. The Clearinghouse's flagship product — StudentTracker — is sold to subscribing high schools and districts. Schools upload rosters; they get back colleges. The general public sees only national aggregates, published in the High School Benchmarks Report and term-enrollment estimates.
Researchers can apply for microdata. Reporters can't. Parents and students can't. The single best dataset in the country for understanding feeder patterns is, by design, a paywalled product.
When the federal data goes dark, reporters turn to the states.
A handful of states publish college-going rates at the individual high school level. California's CDE leads the field, with downloadable CSVs at the state, county, district, and school level — broken down by race, ethnicity, and student group.
Texas publishes student-data dashboards. Illinois leans on the NCES State Dashboard. These are useful — they tell you whether graduates of a given high school go to college. They do not tell you which college.
That last mile — the actual school-to-college edge — sits inside StudentTracker. State systems track the source node and the destination type. The destination itself is dark.
For named-school-to-named-college data, you have to wait for journalists.
In November 2024, the Harvard Crimson published the most complete public look at Harvard's feeder schools ever produced. Reporters analyzed 15 matriculated classes of Freshman Register data — entries from 2009 through 2024 — and counted how often each high school appeared.
The headline: 21 schools sent 2,200 or more students to Harvard over those 15 years. That's roughly one in eleven Harvard admits. The country has more than 24,000 high schools.
Even within the elite tier, the distribution is steeply skewed. Phillips Andover, Boston Latin, Stuyvesant, and Phillips Exeter each crossed the 100-student mark on their own.
Tighten the lens further and the concentration sharpens.
Drop from 21 schools to seven. The Crimson found that 5% of Harvard freshmen over the same window came from Boston Latin, Phillips Andover, Stuyvesant, Noble & Greenough, Phillips Exeter, Trinity (NYC), and Lexington High School.
The list is not what most people imagine. Two are public exam schools — Boston Latin and Stuyvesant. One — Lexington High — is a regular public high school in an affluent suburb. The other four are private day or boarding schools clustered around Boston and New York.
These seven institutions occupy roughly 0.03% of American high schools. They produce 5% of Harvard's class. The ratio is not subtle.
Zoom out from those seven and another pattern shows up.
Of the 21 elite feeders, 12 are private, with average tuition near $64,000 per year. The other 9 are public — but public in a very specific way.
Four are selective magnets. Four are affluent suburban high schools. One is a single local public school. Even among the "public" feeders, the path runs almost entirely through admissions tests, neighborhood wealth, or both.
The pattern at peer institutions is consistent. Private-school students account for roughly 25-30% of Ivy League undergraduates, with some schools running considerably hotter than that.
The wealth on display in the high school list is also visible inside the Harvard admissions file.
The 2014-2019 Harvard admissions microdata, made public through the SFFA v. Harvard trial, gave economists a very rare look inside an Ivy admit pile. Arcidiacono, Kinsler, and Ransom did the math: 43% of white admits were ALDCs — Athletes, Legacies, Dean's-list applicants, or Children of faculty and staff.
Those tags are not garnish. The authors estimate that roughly three-quarters of ALDC admits would have been rejected without those preferences. Recruited athletes are admitted at 86%; legacies at 33%; dean's-list applicants at 42%; faculty children at 47%. The overall admit rate in the same data is roughly 6%.
ALDC status is also racially concentrated. Recruited athletes, legacies, and dean's-list applicants are 68% or more white. Among typical applicants, the share is below 41%.
A separate analysis of the same trial data tracks the other side of the ledger.
Chetty, Deming, and Friedman's 2023 paper used anonymized admissions records from the Ivy-Plus colleges — the eight Ivies plus Stanford, MIT, Duke, and the University of Chicago — linked to tax records and standardized scores.
Children from top-1% families are roughly twice as likely to attend an Ivy-Plus institution as middle-class students with comparable test scores. The advantage, the authors find, is almost entirely driven by attending an elite private high school — which boosts non-academic ratings — and by legacy and athletic preferences.
At flagship public colleges, that high-income advantage disappears. High-income applicants do not have an admissions edge there at all.
Looked at across the full earnings distribution, the gradient is even starker. Children with parents in the top 1% are 77 times more likely to attend Ivy-Plus than children whose parents are in the bottom 20%.
Move from national data to a single metro and the same shape repeats.
On the West Coast, a project called Chicardgo publishes an Elite College Placement Index for Los Angeles private schools. It uses publicly posted matriculation lists to compute the share of each school's class landing at a Top-25 national or Top-15 liberal arts college.
At the top of the LA list: Harvard-Westlake, where 50% of graduates land at one of those colleges. Polytechnic and Marlborough sit at 40% apiece.
These rates are not error bars on a national average. They are the average. For roughly half a graduating class, the elite-college pipeline is the default outcome, not a long shot.
All of this data has to be turned into something a simulation can use.
Stitching the available sources together — Crimson counts, Chetty multipliers, SFFA admit rates, California college-going files, Opportunity Insights mobility data — gives us enough to set feeder-school parameters in the simulation.
Each of the 20 modeled high schools is mapped to one of five archetypes, each with its own HYPSM and Top-20 placement rate. Elite boarding schools sit at one extreme; under-resourced public schools at the other. The order of magnitude between them is roughly 100×.
The point of building it this way is not to predict who gets in. It's to make the inequality in the pipeline a thing you can watch the simulation reproduce — and then a thing you can attempt to fix, by tugging on the parameters one at a time.
The American feeder-school map is mostly drawn from the outside. A handful of newspapers (mostly campus newspapers), a small set of academic papers tied to a single trial, and a few states with strong open-data laws together form most of what is publicly known.
The richest sources — NSC StudentTracker for actual school-to-college edges, Naviance for per-school admit/deny scattergrams, the SFFA microdata with Harvard's own ratings — are restricted, paywalled, or sealed in court records.
For the simulation, the gap is bridged in a specific, conservative way: feeder rates per archetype are calibrated to published numbers, the private-school multiplier is set near 2× to match the Chetty estimates, and the cross-tier reject cascade follows the same shape implied by the Crimson concentration.
| Archetype | HYPSM placement | Top-20 placement |
|---|---|---|
| Elite boarding school | 15-20% | 40-50% |
| Selective magnet | 8-12% | 30-40% |
| Affluent suburban | 5-8% | 20-30% |
| Average public | 0.5-1% | 5-10% |
| Under-resourced public | ~0.1% | 2-5% |
Source: research/data_feeder_schools.md, "Recommended Approach for the Simulation".