Most applicants don't pick a college list from scratch. They follow a pattern shaped by who they are and where they come from — and admissions officers can usually spot the pattern within the first page.
The simulation decomposes every applicant into two coordinates. The first is an identity dimension — the academic spike or interest pattern a student has built around. STEM, humanities, arts, athletics, a balanced profile, or simply "solid."
The second is a circumstance dimension — legacy access, first-gen status, household income, school resources. Read together, they predict everything from how many schools a student applies to, to which colleges feel like reaches versus realistic targets, to how strongly hooks like recruited-athlete or legacy actually move the needle.
Six identities × four positions = 24 distinct profiles. Most real people are a blend of two adjacent cells. That's normal — and it's exactly the kind of structure admissions readers see every cycle.
Start with the identity axis: the six recognizable types.
The mix isn't uniform. At an elite public magnet, more than a quarter of the class reads as a STEM spike. At an average suburban high school, that share collapses to 6 percent — not because the kids are different, but because the institutions filter and shape who emerges with which signature.
Boarding schools produce athletic recruits at three times the rate of urban publics. Affluent suburbs hold a long tail of average academic applicants — the largest single category outside elite settings. Humanities and arts spikes are rarer everywhere, and concentrated where money and coaching reach further into the curriculum.
Each archetype carries different baseline parameters: a STEM spike picks up +30 to +60 SAT points and an extra unit of EC quality; an arts spike has a smaller portfolio of applications (5 vs 7); a recruited athlete shows lower test scores and GPAs, partially offset by hook treatment downstream.
Now the second axis: the structural shape of each cohort.
Two students with identical scores and identical extracurriculars can occupy radically different positions. The model assigns one of four: high advantage (legacy 25%, donor 15%, income brackets 4–5), moderate (professional class), neutral (middle class), and disadvantaged (first-gen, Pell-eligible, URM 40%).
At an elite boarding school, 40 percent of the class lands in high advantage and only 5 percent in disadvantaged. Flip to an urban public, and the inverse holds: 64 percent disadvantaged, 1 percent high advantage. The school you attend is a strong prior on the resources you can deploy.
Structural position also moves a hidden lever — the application multiplier. High-advantage students apply to slightly fewer schools (×0.9) because they have anchor options; disadvantaged students apply to noticeably fewer (×0.75), constrained by fees and counseling load. That gap compounds over the cycle.
Identity and position together drive the first concrete number: how many schools you apply to.
Each student's list size K is drawn from a lognormal: take the mean for the archetype, scale by structural multiplier, take a log, add gaussian noise with standard deviation 0.4, exponentiate, clamp to the range 3–20. The result is a long-tailed distribution with a median around 6–8 and a fat upper end.
The base means split clearly by type. STEM spike applicants file the most (8); recruited athletes the fewest (4) — once a coach has slotted you, the list is largely settled. HSLS:09 corrections nudge Asian-American applicants up by one school and URM applicants up by one as well, both anchored to published cohort averages.
Income gradient also reshapes the distribution. The top quintile applies at 1.15× the population baseline; the bottom quintile at 0.85×. The stretch ratio is modest in absolute terms but large in tail behavior: top-quintile students are far more likely to file 12+ apps.
List size is a starting point. The much harder question: which colleges go on it.
Some pairings are obvious. MIT, Caltech, Harvey Mudd are 5/5 for STEM spike applicants
— perfect fit. Yale is 5/5 for humanities. The model's FIT_SCORES matrix encodes
these affinities as integers from 2 (default, no signal) to 5 (perfect match), and they enter the
utility function as a small but persistent thumb on the scale.
The structure is asymmetric. STEM spike students see strong fits at a small set of technical elites. Athletic spike students fan out across the Ivy & NESCAC recruiting circuit (Dartmouth, Cornell, Williams, Colgate). Well-rounded applicants get a long, flat list of 3-rated schools and zero 5s — broad compatibility, no strong signal.
And average academic students see almost nothing: four 3-rated schools (Northeastern, NYU, BC, USC) and the default 2 everywhere else. The fit-score channel is largely silent for them, which means utility ranking falls back almost entirely on prestige and admit odds.
Fit is one input. Now combine it with prestige and admit odds to actually rank the list.
Each student computes a utility for every college: U = prestige + fit + legacy + LAC + in-state + 4×log(P_admit). Prestige scales 0–25 (HYPSM tops the chart). The 4×log term means a 10× lower admission chance reduces utility by about 9 points — roughly the cost of dropping two tiers.
Schools are then sorted by utility and binned by log-odds of admission. The model's target allocation: 25% dreams (very low odds), 35% reaches, 25% targets, 15% safeties. This mirrors the Hossler & Gallagher 1987 framework taught in counseling offices.
The shape matters. A reach-heavy list is rational under the utility function: prestige dominates and the log-odds penalty is gentle. But it produces high-variance outcomes — many shutouts, a few wins. The dream/reach/target/safety ratios are an antidote, encoding the conventional wisdom that you should hedge.
Once the list is set, hooks and round timing do most of the rest.
Every admit decision runs through a logistic model: P = σ(academic + feeder + hooks + yield + in-state + round + interest). The hook multipliers are calibrated from SFFA v. Harvard testimony and Arcidiacono's published re-analyses. Donor at HYPSM: 12×. Legacy: 5.7×. Recruited athlete: 4.5×. First-gen and Pell get smaller bumps.
Multipliers compress as tier descends — legacy is worth 5.7× at HYPSM but only 2× at selective publics. Post-SFFA demographic adjustments are gentler still, ranging from 0.80× for URM at Ivy+ schools to 1.35× for Asian-American applicants at HYPSM. Gender weights sharpen at STEM-heavy schools: female applicants get a 1.9× lift at Caltech and MIT.
Round choice matters separately. ED adds a logit boost equal to log(rateE/rate), clamped between 1.2 and 8×. EDII gets a smaller version. EA is mostly neutral. RD is the baseline. These aren't free wins — ED is binding — but they explain why early apps consistently accept at multiples of regular round.
The model is detailed but not complete. Worth knowing what it leaves out.
The archetype distributions are calibrated by hand, not against HSLS:09 or ELS:2002 microdata with student-level activity profiles. The utility function is linear — no quadratic penalty for overreach — whereas Reardon (2016) penalizes both undermatching and overmatching with a quadratic term.
Students can't pivot mid-cycle. A STEM spike rejected at MIT doesn't reread herself as a humanities applicant. EC quality conflates a varsity letter with a portfolio: same numeric score, no type. And three real-world archetypes — pre-med spike, business spike, and social impact spike — are absent in the current six.
The hook multipliers themselves come from a single source for the largest two (legacy 5.7×,
donor 12.0× both from SFFA v. Harvard). First-gen 1.4× lacks a specific published
anchor. The phantom-applicant scaling factor of MODEL_SCALE = 0.013 has no published
justification at all — it's tuned to make total decisions match.
Treat the archetypes as a reading lens, not a verdict. Most applicants are a blend of two, and the most useful question isn't "which one am I?" but "which two, and what does that tell me about my list?"