Today the National Research Council released its new rankings of doctoral programs — the first it has produced since 1995. You can read more about their report “here”:http://sites.nationalacademies.org/PGA/Resdoc/.

The NRC calculates these rankings in two different ways, using three inputs that largely come from data gathered in 2006-2007 (see "here":http://chronicle.com/article/After-Years-of-Delay-NRC-D/65918/). The three inputs are as follows:

1. Department-level data collected by the NRC on 20 different indicators (e.g., publications, citations, the proportion of students who graduate within six years, etc.).

2. Survey data in which respondents evaluated how important each of the 20 indicators is to graduate program quality.

3. Survey data in which respondents rated a random subsample of 15 programs on a six-point scale.

In the NRC S rankings, S denotes “survey-based.” In the S rankings, the indicator data obtained in (1) are weighted by the survey data in (2). Because there is inherent uncertainty in these rankings, this weighting was done 500 times on random half-samples of respondents, which generates a distribution of rankings for each program. The NRC presents the range from the 5th to the 95th percentile. In other words, the ranking for any school is not a single estimate (#1, #2, etc.) but a 90% confidence interval of sorts.
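To make the half-sample procedure concrete, here is a toy sketch of the S-style calculation. All numbers are made-up stand-ins, not actual NRC data, and the exact resampling details are simplified:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data, standing in for the NRC inputs.
n_programs, n_indicators, n_respondents = 30, 20, 200
indicators = rng.normal(size=(n_programs, n_indicators))         # standardized indicator data (input 1)
stated_weights = rng.random(size=(n_respondents, n_indicators))  # each respondent's stated importances (input 2)

ranks = np.empty((500, n_programs), dtype=int)
for b in range(500):
    # Draw a random half-sample of respondents and average their stated weights.
    half = rng.choice(n_respondents, size=n_respondents // 2, replace=False)
    w = stated_weights[half].mean(axis=0)
    scores = indicators @ w                         # weighted sum of indicators per program
    # Convert scores to ranks (rank 1 = highest score).
    ranks[b] = scores.argsort()[::-1].argsort() + 1

# Report the 5th-95th percentile range of each program's rank
# across the 500 half-samples, as the NRC does.
lo, hi = np.percentile(ranks, [5, 95], axis=0)
```

The key point is that each program ends up with an interval of plausible ranks rather than a single number.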

In the NRC R rankings, R denotes “regression-based.” In the R rankings, the ratings in (3) are regressed on the indicators in (1) to generate a regression-based set of weights. These weights were then applied to the indicators in (1) for the full set of schools to generate rankings. Because there is uncertainty in these rankings as well, the regressions were estimated 500 times on random half-samples of respondents, which generates the percentile rankings.
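The core of the R method — regress ratings of a subsample of programs on the indicators, then apply the implied weights to every program — can be sketched in a few lines. Again, all data here are simulated placeholders, not NRC numbers, and the 500-half-sample repetition is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data standing in for the NRC inputs.
n_programs, n_indicators = 120, 20
indicators = rng.normal(size=(n_programs, n_indicators))   # standardized indicator data (input 1)

# Simulate survey ratings of a random subsample of programs (input 3):
# ratings driven by some unknown "true" weights, plus noise.
true_w = rng.random(n_indicators)
rated = rng.choice(n_programs, size=60, replace=False)
ratings = indicators[rated] @ true_w + rng.normal(scale=0.5, size=60)

# Regress the ratings on the indicators to recover implicit weights.
X = np.column_stack([np.ones(len(rated)), indicators[rated]])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
weights = coef[1:]                  # drop the intercept

# Apply those weights to ALL programs, rated or not, and rank them.
scores = indicators @ weights
order = np.argsort(-scores)         # program indices, best first
```

Respondents never state what they value; the regression infers it from how they rated the programs they saw.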

Eric Lawrence and I constructed graphs of the top 25 schools in each of the NRC’s rankings. This is an arbitrary cut-off, of course. We indulged in a bit of self-congratulation by putting GW in red.

Three things to note.

First, the NRC’s methodology has changed a great deal since the 1995 rankings. The two sets of rankings are not comparable.

Second, the R and S rankings are not the same. Some schools appear in both, some do not, and the ordering of schools changes as well. What might account for the differences between the S and R ratings? The R ratings weight the 20 indicators so that the variables themselves can account for a program’s reputation. The traits of historically prestigious departments will be more important in the R ranking to the extent that (1) perceptions of these departments remain high and (2) the weights obtained via regression differ from the weights stated by respondents in the survey. And in fact, the weights do differ across the two sets of rankings. In the R method, publications per faculty member is the sixth most important criterion, behind (in order) the number of Ph.D.s granted, average GRE score, citations per publication, awards per faculty member, and the number of student activities offered. According to the survey items used in the S method, publications per faculty member is the most important criterion.

Third, and most importantly, the ratings contain considerable uncertainty. We simply cannot statistically distinguish many departments from one another. This is a good thing. Too much is made of specific ranks, even though no one department is clearly the “best” and many departments are quite similar.

UPDATE: See also this “Inside Higher Ed story”:http://www.insidehighered.com/news/2010/09/28/rankings.

{ 5 comments }

Wild differences between the two methods. The terminology (S-survey vs. R-regression) that the NRC uses isn’t very informative, however.

First of all, keep in mind that both the S and the R methods use exactly the same “objective” input data about departments and faculty (publications, citations, honors won, GRE-Q scores for grad students, etc.).

The difference between the derived R and S scores reported in the charts above is entirely a result of the different weights applied to the 20 objective input (performance) variables.

You can go to the NRC website to download the data, or to another that’s about to appear online, and change the weights to ones that you think might be more appropriate, and generate your own rankings.

The S-weights and R-weights were generated in very different ways. But neither is very close to what was done in the last NRC study.

The S-weights were generated by asking all respondents (roughly 85% of all eligible program faculty across the 72 disciplines responded) to specify how much each of the input/performance factors ought to matter in assessing program quality. They were not asked to actually evaluate programs — nothing akin to the “reputational” rating from the 1995 NRC report.

The R-weights were based on taking a sample of ca. 50-80 (a minimum of 40) respondents in each field (discipline) and asking them to rate 15 named programs in their field (each on a 6-point scale — not much different from the way U.S. News does it). The respondents weren’t asked what criteria they used to rank the programs. But descriptive information about each program was provided for them to use if they wanted to use it.

However, the ratings themselves were used only as an instrument to derive, via regression analysis, the implicit importance (beta weights) of the 20 “objective” factors in the ratings. The ratings themselves weren’t used directly in the NRC’s report. Rather, the weights derived via regression were applied to standardized input data (publications, citations, etc.) for each program, and the weighted scores were summed to get a total.

IMO, the R-method is somewhat similar to the old reputational method from the 1995 NRC ratings. Notably, the regression analysis revealed that total publications (or publications per faculty member) don’t matter much at all in how faculty evaluate programs. But citations to publications — i.e., the professional visibility or impact of the publications — do matter a lot, and are the strongest single factor in the regression results. As a result, in the R-ratings, the “weight” given to publications is perhaps only 10% as large as the weight given to citations (this from observation of some data in economics).

I suggest that somebody test my conjecture that the R-ratings are more similar to reputational ratings than are the S-ratings by regressing each of these ratings on the 1995 NRC ratings as well as on US News ratings.

(BTW/ keep in mind that the data for the latest study were current as of 2005.)

BTW/ to undertake the analysis that I propose, you would probably choose to take the midpoint estimate between the 5th and 95th percentiles.

I’ve seen a lot of claims that the data are from 2005, but I know that I filled out the NRC productivity form at my school in the fall of 2006 (I was not at the school in 2005).

It would be nice if they actually tracked their graduate students’ performance during and after their time in the program. While not a complete picture of the value of the program, it is certainly an important factor in determining the “value” of attaining a PhD from said program.

PhDs.org has all of the NRC data online, and it is interactive. You can change the weighting of criteria. This means that although the default is the R-ranking, the S-ranking is also an option. (Click on “More options.”)

Similarly, you can specify the importance of specific variables (publications, citations, grants, etc.) if you click on “More options”.

Here’s the link to the political science rankings: http://graduate-school.phds.org/rankings/political-science

Comments on this entry are closed.