Where's This Come From?
All of the data you see here comes from the Department of Education's wonderful
College Scorecard dataset. Specifically, it's mashed up from two
huge CSV files:
This data is compiled based on the subset of students who take out federal student loans or grants, so it's not by any means a complete picture.
There are also significant gaps in the data where costs or earnings are unknown or, as listed in the data, "Privacy Suppressed".
To be honest, some of the data is kind of a pain. For instance, multiple colleges share the same name, which is especially common among
cosmetology schools. Some are different branches in different areas and some are unrelated. In other cases, the OPEID or UNITID fields
are either blank or incorrect (for example, they refer to the main campus rather than the satellite campus that the data is for), or
the primary URL for a school is empty or just plain wrong. So there are issues here and there. As we find these, we'll try to get them updated.
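As a rough illustration of how the duplicate-name problem described above can be spotted, here is a minimal Python sketch. It assumes each row is a dict with an `INSTNM` (school name) column and a `UNITID` column. The function name and the sample rows are made up for illustration and are not taken from the real files or from the site's own code.

```python
from collections import defaultdict


def find_name_collisions(rows):
    """Return school names that appear under more than one UNITID.

    `rows` is an iterable of dicts. The column names INSTNM and UNITID
    are assumptions about how the CSV was loaded.
    """
    ids_by_name = defaultdict(set)
    for row in rows:
        # Normalize whitespace and case so trivially different spellings match.
        name = (row.get("INSTNM") or "").strip().lower()
        unitid = (row.get("UNITID") or "").strip()
        if name:
            # A blank UNITID is worth flagging too, so keep a placeholder for it.
            ids_by_name[name].add(unitid or "<missing>")
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical sample rows, not real schools.
sample_rows = [
    {"INSTNM": "Example Beauty Academy", "UNITID": "100001"},
    {"INSTNM": "Example Beauty Academy ", "UNITID": "100002"},
    {"INSTNM": "Sample State College", "UNITID": "100003"},
]
print(find_name_collisions(sample_rows))
```

A report like this only says that a name is shared. Deciding whether two entries are branches of one school or unrelated schools still takes a human look.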
The data is only based on federal student loan and grant programs, so it does not include students who take out private loans, pay with scholarships,
or have the good fortune to be able to cover the cost of college themselves.
I was surprised to learn that the Dept. of Education measures completion rates at the 8 year mark. So when you look up a specific school on
collegescorecard and see its graduation rate (for example,
Granite State College shows a rate of 42%),
this is the 8 year rate, which they mention in the small print infobox if you hover over it. There are probably good reasons for this. Granite State is
an online college, so its students are far more likely to already have full-time jobs and families. But it seems a bit disingenuous, especially
when the 6 year graduation rate is only 14% and the 4 year rate is 3%!
Incomes for both college and field rankings are based on numbers 1 year post-graduation. Median earnings at the college level are also measured at the
6-10 year marks and are listed for colleges, but there are some open questions about those numbers, detailed here.
DOE has started breaking this up based on family income, but this isn't represented on CV yet. They have three
terciles for this: low-income: $30,000 or less; middle-income: $30,001-$75,000; and high-income: $75,001+. How payback
rate affects what is reported is not specified. In other words, it seems like the only income statistics received are for those still paying loans.
If some students are able to pay off their loans much more quickly, their (possibly higher) income is not accounted for.
This could affect 6-10 year median incomes much more than
the 1 year post-graduation incomes used here to compile rankings.
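For reference, the three family-income brackets listed above can be written as a small lookup. This is only a sketch of the thresholds as quoted. The function name is hypothetical, and nothing like it is used on the site yet.

```python
def family_income_bracket(family_income):
    """Map a family income in dollars to the bracket names quoted above.

    Thresholds: $30,000 or less is low-income, $30,001-$75,000 is
    middle-income, and $75,001 or more is high-income.
    """
    if family_income < 0:
        raise ValueError("family income cannot be negative")
    if family_income <= 30_000:
        return "low-income"
    if family_income <= 75_000:
        return "middle-income"
    return "high-income"


for income in (30_000, 30_001, 75_000, 75_001):
    print(income, family_income_bracket(income))
```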
One of the reasons for building this site in the first place was to test the hypothesis that the combination of major and college often matters
more than either variable alone. It appears that this is the case. For example, the expected earnings across nursing degrees vary wildly, especially when
accounting for debt loads and graduation rates.
It's important to note that this is still a very limited and possibly skewed view. Some details noted by the DOE:
"One of the most common reasons students cite in choosing to go to college is the expansion of
employment opportunities. To that end, data on the earnings and employment prospects of former
students can provide key information. To measure the labor market outcomes of individuals attending
institutions of higher education, data on cohorts of federally aided undergraduate students were linked
with earnings data from de-identified tax records and reported back at the aggregate, institutional level.
Mean earnings data elements at the institution-level were last updated in the fall of 2018."
"There are two notable limitations that researchers should keep in mind for all of these metrics. First,
research suggests that the variation across programs within an institution may be even greater than
aggregate earnings across institutions. For information related to more recent earnings calculations by
field of study, please see the technical documentation for field of study data files. Second, the data
include only Title IV-receiving students, so figures may not be representative of institutions with a low
proportion of Title IV-eligible students. Additionally, the data are restricted to students who are not
enrolled (enrolled means having an in-school deferment status for at least 30 days of the measurement
year), so students who are currently enrolled in, for example, graduate school at the time of
measurement are excluded."
One key insight is that the amount of debt incurred is independent of completion rate, and students are still on the hook for that debt whether or not they finish!
As Bryan Caplan
and others have pointed out, most of the value of an undergraduate degree comes from the last year and actually receiving the diploma, rather than being
spread evenly over 4 years.
The DOE says:
"At institutions where large numbers of students withdraw before completion, a lower median debt level could simply reflect the lack of time that a typical student spends at the institution. Therefore, the Department uses the typical debt level for students who complete (GRAD_DEBT_MDN_SUPP or GRAD_DEBT_MDN10YR_SUPP for the debt level expressed in monthly payments) on the consumer website. Additionally, this measure can be placed in context by looking at the borrowing rate of students at the institution (FTFTPCTFLOAN; see above); at institutions where few students borrow, the numbers may represent outliers."
For colleges, we break this down and show median debt for graduates, withdrawals, and both. For individual majors, the debt is based on the loans for only those students who completed the program.
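The DOE quote above mentions debt expressed as monthly payments over 10 years. To give a feel for how a median debt figure turns into a monthly payment, here is a sketch using the standard fixed-rate amortization formula. The interest rate is a placeholder assumption, not the rate the DOE uses, and the function name is made up for illustration.

```python
def monthly_payment(principal, annual_rate, years=10):
    """Fixed monthly payment on a loan, using the standard amortization formula.

    principal:   amount borrowed, in dollars
    annual_rate: yearly interest rate as a fraction (0.05 means 5%);
                 this is an assumed input, not a DOE-published value
    years:       repayment period, 10 by default to match the 10 year figure
    """
    months = years * 12
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        # With no interest, the payment is just the principal split evenly.
        return principal / months
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)


# Example: $10,000 of debt at an assumed 5% rate over 10 years.
print(round(monthly_payment(10_000, 0.05), 2))
```

The same debt at a different assumed rate gives a different payment, so any comparison should hold the rate fixed across schools.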
This is a sparse matrix. A lot of the data is empty to maintain student privacy. We can still get bigger trends in a lot of cases, but
smaller institutions or fields will not have much data.
From the DOE:
"...Those data that do not meet
reporting standards are shown as PrivacySuppressed. Note that for many elements, we have also taken
additional steps to ensure data are stable from year to year and representative of a certain number of
students. For many elements, data are pooled across two years of data to reduce year-over-year
variability in figures (i.e. repayment rate, debt figures, earnings). Moreover, for elements that are
highlighted on the consumer-facing College Scorecard, a separate version of the element is available
that suppresses data for institutions with fewer than 30 students in the denominator to ensure data are
as representative as possible."
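In practice this means every numeric field has to be read defensively. Here is a minimal sketch of one way to do that: treat "PrivacySuppressed" (and blanks) as missing, then check how much of a column is actually usable. Treating the literal string "NULL" as missing is an assumption about the files, and the helper names and sample rows are hypothetical.

```python
MISSING_MARKERS = {"", "NULL", "PrivacySuppressed"}  # "NULL" is an assumption


def parse_value(raw):
    """Return the value as a float, or None if it is suppressed or missing."""
    text = (raw or "").strip()
    if text in MISSING_MARKERS:
        return None
    try:
        return float(text)
    except ValueError:
        # Anything else that is not a number is treated as missing as well.
        return None


def coverage(rows, column):
    """Fraction of rows that have a usable numeric value in `column`."""
    if not rows:
        return 0.0
    known = sum(1 for row in rows if parse_value(row.get(column)) is not None)
    return known / len(rows)


# Hypothetical rows showing the kinds of values that turn up.
sample_rows = [
    {"GRAD_DEBT_MDN_SUPP": "25000"},
    {"GRAD_DEBT_MDN_SUPP": "PrivacySuppressed"},
    {"GRAD_DEBT_MDN_SUPP": "NULL"},
    {"GRAD_DEBT_MDN_SUPP": ""},
]
print(coverage(sample_rows, "GRAD_DEBT_MDN_SUPP"))
```

Keeping suppressed values as None rather than 0 matters here, since a zero would drag down medians and make small schools look cheaper or poorer than they are.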
Naming and Franchises
Some schools, especially for-profit organizations, have many branches spread out across different cities, but they often report only
one set of statistics for the entire institution. Over time, we plan to decompose these into their proper groupings. Examples include
Strayer, University of Phoenix, Cortiva Institute, etc.