Survey methodology in university rankings: a critical reader's guide
How to evaluate whether a survey-based ranking indicator is trustworthy, from sampling to question design to response aggregation.
The ubiquity and invisibility of surveys in rankings
Survey data powers some of the most influential components of university rankings. Academic reputation surveys ask scholars to name the best institutions in their field. Employer reputation surveys ask recruiters which universities produce the best graduates. Student satisfaction surveys ask current students about their experience. These surveys collectively account for a large share of the weighting in many major ranking systems, yet the survey methodology itself is often presented only in summary form, leaving users with insufficient information to assess the reliability of the results.
A survey is not a neutral measurement. Its results are shaped by who is invited to participate, who chooses to respond, how the questions are framed, the order in which they are asked, the scale on which responses are recorded, and how responses are aggregated and normalized. A well-designed survey can provide genuine insight; a poorly designed one can produce results that are misleading or meaningless. As a ranking user, you need enough information about the survey methodology to distinguish between the two.
Sampling: who is asked and who answers
The first question to ask about any survey-based indicator is who was surveyed. Is the sample drawn from a defined population, such as all academics in a particular discipline or all employers in a particular sector? How were potential respondents identified and invited? If the sample is self-selected—for example, an open online survey that anyone can complete—the results may be dominated by respondents with particularly strong opinions or institutional loyalties, and cannot be generalized to any broader population.
The response rate is equally important. If a survey is sent to 100,000 people and 5,000 respond, the response rate is 5%, and the results will be biased to the extent that respondents differ systematically from non-respondents. Academics from prestigious institutions may be more likely to respond to reputation surveys than those from less well-known institutions, reinforcing existing hierarchies. Employers from large multinational companies may be overrepresented compared with small and medium-sized enterprises. A transparent ranking publisher will report the sampling frame, the invitation method, the response rate, and any analysis of non-response bias.
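The two checks described above can be sketched in a few lines of Python. This is a minimal illustration, not any ranking publisher's actual procedure: it computes a response rate and compares each group's share among respondents with its share in the sampling frame. The function names, the "top" and "other" tiers, and all counts are hypothetical.

```python
from collections import Counter


def response_rate(invited, responded):
    """Share of invited people who responded."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return responded / invited


def representation_ratios(frame_groups, respondent_groups):
    """Ratio of each group's share among respondents to its share in the frame.

    A ratio above 1 means the group is overrepresented among respondents;
    a ratio below 1 means it is underrepresented.
    """
    if not frame_groups or not respondent_groups:
        raise ValueError("both group lists must be non-empty")
    frame_counts = Counter(frame_groups)
    resp_counts = Counter(respondent_groups)
    n_frame = len(frame_groups)
    n_resp = len(respondent_groups)
    return {
        group: (resp_counts[group] / n_resp) / (count / n_frame)
        for group, count in frame_counts.items()
    }


# Hypothetical data: the institution tier of each invited academic
# and of each academic who actually responded.
frame = ["top"] * 200 + ["other"] * 800
respondents = ["top"] * 30 + ["other"] * 20

print(f"response rate: {response_rate(100_000, 5_000):.1%}")
for group, ratio in representation_ratios(frame, respondents).items():
    print(f"{group}: {ratio:.2f}x its share of the sampling frame")
```

In this made-up example, academics from "top" institutions make up 20% of the frame but 60% of respondents, which is the kind of imbalance a non-response analysis should surface.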
Question design and measurement scales
How questions are worded profoundly affects the answers. A survey that asks respondents to name the top institutions in their field without further guidance will produce different results from one that asks them to rate a list of institutions on a scale of one to five. The first approach captures top-of-mind awareness, which favors well-known names. The second approach captures more considered evaluations but may introduce order and anchoring effects, where the sequence or framing of the list influences the ratings.
The measurement scale also matters. A five-point scale and a ten-point scale produce different distributions, even if the underlying opinions are identical. Respondents from different cultural backgrounds use scales differently; some cultures tend toward extreme responses while others tend toward moderation. If the survey does not account for these cultural response styles, cross-national comparisons may be distorted. A well-documented survey methodology will explain how these issues were addressed.
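A small Python sketch can make both points concrete. It assumes two simple techniques that the text above does not prescribe: linear rescaling onto a common 0 to 1 range, and standardizing each respondent's ratings by their own mean and spread as one possible adjustment for response style. The function names and ratings are hypothetical.

```python
from statistics import mean, pstdev


def rescale(score, low, high):
    """Map a score from the scale [low, high] linearly onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (score - low) / (high - low)


def standardize_within_respondent(ratings):
    """Center and scale one respondent's ratings by their own mean and spread.

    This removes differences in how widely a respondent uses the scale,
    but it also removes genuine differences in average opinion.
    """
    m = mean(ratings)
    s = pstdev(ratings)
    if s == 0:
        # The respondent gave every institution the same rating.
        return [0.0 for _ in ratings]
    return [(r - m) / s for r in ratings]


# The second-highest option on a five-point and on a ten-point scale
# does not land on the same point of a common scale.
print(rescale(4, 1, 5))
print(rescale(9, 1, 10))

# Hypothetical respondents with the same ordering of four institutions:
# one uses the extremes of the scale, the other stays near the middle.
extreme = [1, 5, 1, 5]
moderate = [2, 4, 2, 4]
print(standardize_within_respondent(extreme))
print(standardize_within_respondent(moderate))
```

After standardization the two respondents yield identical values, which shows how such an adjustment treats their differing use of the scale as style rather than substance. Whether that assumption is appropriate is exactly the kind of choice a methodology document should justify.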
Aggregation and what it obscures
Once individual responses are collected, they must be aggregated into institutional scores. This aggregation step involves choices that can significantly affect results. How are multiple responses for the same institution combined? Simple averaging gives equal weight to each respondent, but if some respondents are more knowledgeable than others, weighted averaging might be more appropriate. How are missing responses handled, that is, cases where a respondent does not rate an institution they do not know? Treating missing responses as zero unfairly penalizes less well-known institutions; treating them as neutral pulls scores toward the midpoint of the scale, which may inflate the scores of institutions that informed respondents rate poorly.
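The effect of the missing-response choice is easy to see in a short Python sketch. The three functions below are hypothetical illustrations of the options discussed above, applied to invented ratings on a five-point scale with 3 assumed as the neutral midpoint; they are not taken from any published ranking methodology.

```python
from typing import Optional, Sequence

# None marks a respondent who did not rate the institution.
Rating = Optional[float]


def mean_missing_as_zero(ratings: Sequence[Rating]) -> float:
    """Average with every missing rating counted as 0."""
    return sum(0.0 if r is None else r for r in ratings) / len(ratings)


def mean_missing_as_neutral(ratings: Sequence[Rating], neutral: float = 3.0) -> float:
    """Average with every missing rating replaced by the neutral midpoint."""
    return sum(neutral if r is None else r for r in ratings) / len(ratings)


def mean_observed_only(ratings: Sequence[Rating]) -> Optional[float]:
    """Average over respondents who actually rated the institution."""
    observed = [r for r in ratings if r is not None]
    return sum(observed) / len(observed) if observed else None


# Hypothetical lesser-known institution: two of five respondents know it
# and rate it highly; the other three leave it blank.
lesser_known = [5, None, None, 4, None]

print("missing as zero:   ", mean_missing_as_zero(lesser_known))
print("missing as neutral:", mean_missing_as_neutral(lesser_known))
print("observed only:     ", mean_observed_only(lesser_known))
```

The same five responses produce three quite different scores depending on the rule, so a published score is hard to interpret unless the rule is stated.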
Geographic and disciplinary weighting also affects aggregation. If more survey responses come from Europe than from Africa, a simple average will give more weight to European perspectives. Weighting responses to achieve regional balance may improve fairness but introduces additional assumptions. A transparent methodology explains these aggregation choices and provides sensitivity analyses showing how much results change under alternative aggregation methods. As a user, you should look for this information and, if it is not provided, treat the resulting scores with appropriate caution.
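A basic sensitivity check of this kind can be sketched in Python. The example compares a simple average with a region-balanced average in which each region contributes a chosen target share. The region names, ratings, and the 50/50 target shares are hypothetical assumptions; a real methodology would need to justify its own targets.

```python
from collections import defaultdict


def simple_mean(responses):
    """Unweighted mean over (region, rating) pairs."""
    return sum(rating for _, rating in responses) / len(responses)


def region_balanced_mean(responses, target_shares):
    """Weighted mean in which each region contributes its target share.

    target_shares maps region -> share and is assumed to sum to 1.
    """
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")
    by_region = defaultdict(list)
    for region, rating in responses:
        by_region[region].append(rating)
    missing = set(target_shares) - set(by_region)
    if missing:
        raise ValueError(f"no responses from: {sorted(missing)}")
    return sum(
        share * (sum(by_region[region]) / len(by_region[region]))
        for region, share in target_shares.items()
    )


# Hypothetical ratings of one institution: eight European respondents
# rate it 4, two African respondents rate it 2.
responses = [("Europe", 4)] * 8 + [("Africa", 2)] * 2

unweighted = simple_mean(responses)
balanced = region_balanced_mean(responses, {"Europe": 0.5, "Africa": 0.5})
print(f"simple mean:          {unweighted:.2f}")
print(f"region-balanced mean: {balanced:.2f}")
print(f"difference:           {unweighted - balanced:.2f}")
```

The gap between the two figures is a rough measure of how much the score depends on the weighting assumption. A small gap suggests a robust score; a large one suggests the result is partly an artifact of who happened to respond.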
The presence of surveys in rankings is not inherently problematic. Surveys capture dimensions of reputation and experience that quantitative indicators cannot reach. But the reliability of survey-based indicators depends entirely on the quality of the survey methodology. A ranking-literate user does not dismiss survey data or accept it uncritically, but evaluates it against the standards described here: sampling, response rates, question design, and aggregation methods. When these standards are met, survey data earns a place in your decision-making. When they are not, survey data deserves the skepticism you would apply to any poorly conducted poll.