An Introduction
Originally developed by Roger Tourangeau (1984), the Cognitive Process Model of Question Response posits that a respondent engages in four distinct cognitive processes when responding to a question: comprehending the question, retrieving relevant information from memory, forming a judgment, and reporting a response.
Here are two items (question + response scale) to help us think through the cognitive processes involved in responding to a question or statement:
My mentor treats me with respect. (Strongly Disagree to Strongly Agree)
Senior administrators at UGA show that they care about the university employees. (Never to All the Time)
Do you think these are well-designed items?
“Survey respondents are often asked to expend a great deal of cognitive effort for little or no apparent reward.”
Krosnick, 1991
Survey fatigue is real!
All of our scale design choices should strike a balance between scientific rigor and the burden we place on respondents.
Writing questions is easy. Writing good questions is hard! There are four important dimensions to writing a good question:
Questions should be relevant to the construct they are designed to measure.
Use the construct definition in your measurement framework to help generate questions.
Do not repeat questions; you need multiple distinct questions per scale.
Your question should measure one and only one construct.
Keep items, and the words within an item, relevant. Use only words that your respondents will understand and that are relevant to the study context.
Know your target respondents, and tailor the questions to them.
Consider the respondents’ cognitive skills when writing questions (are they adults or children?)
Consider if respondents will have sufficient information to accurately respond to the question
Consider if respondents can accurately recall the information needed to answer the question
Consider the diversity of your respondents (write multiculturally appropriate, unbiased questions)
Use clear and simple language that is familiar to the respondents.
Use simple and familiar words
Use objective, non-leading phrases (stay away from emotion-laden words)
Use concrete and precise language
Write questions at a middle school reading level when possible
Questions should be structured as brief, simple sentences (subject, verb, complete thought).
Keep questions to 20 words or fewer, with no more than three commas
Use complete sentences
Questions should present only a single idea—no conjunctions
Avoid negative phrases if possible
There are three important dimensions to writing a good response scale:
Response scales should have between five and seven response points.
Survey data collected using five to seven response points can be treated as continuous
Reliability levels off between five and seven response points
Consider how well respondents can discriminate among different response points
Every response point should have its own distinct verbal label.
Labels should be consistent with the question / statement
The two endpoint labels should be opposites (Strongly Disagree and Strongly Agree)
Labels should be equal intervals along a continuum (Strongly Disagree, Disagree, Agree, Strongly Agree)
Labels should be ordered from negative (Disagree) to positive (Agree)
Avoid using response options that make it difficult to distinguish between respondents who hold different opinions, judgments, or attitudes.
Use an even number of response points (e.g., six) to remove the neutral midpoint and force respondents to take a side
Only use a “Do Not Know” option if you believe uninformed respondents might provide inaccurate responses
Visually separate the “Do Not Know” option from the rest of the response scale
Before piloting your items, it is important to subject them to a content analysis or expert review:
Content Adequacy is an analysis that tells you how well your questions correspond to the construct they were designed to measure—it is similar to content validity. Colquitt et al. demonstrate how to expand this analysis to the scale level.
Have a sample of individuals (\(N \geq 50\)) rate the extent to which a question corresponds to the definition of the construct it is measuring (1 = Not at all to 5 = Completely). These ratings are referred to as the definitional correspondence ratings.
Have that same sample rate the extent to which a question corresponds to a definition of a different, but related construct—an orbiting construct. These ratings are referred to as the orbital correspondence ratings.
Use an ANOVA to compare the definitional correspondence ratings to the orbital correspondence ratings, and retain only those items whose average definitional correspondence ratings are statistically higher than their average orbital correspondence ratings.
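Below is a minimal Python sketch of this item-level comparison. The DataFrame name `ratings` and its columns (`item`, `rating_type`, `rating`) are hypothetical placeholders; the sketch assumes long-format data with one row per rater-item-definition combination and runs a one-way ANOVA per item, as described above.

```python
import pandas as pd
from scipy import stats

def content_adequacy(ratings: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Compare definitional vs. orbital correspondence ratings per item.

    Assumes hypothetical long-format columns:
      item        -- item identifier
      rating_type -- "definitional" or "orbital"
      rating      -- 1 (Not at all) to 5 (Completely)
    """
    rows = []
    for item, grp in ratings.groupby("item"):
        definitional = grp.loc[grp["rating_type"] == "definitional", "rating"]
        orbital = grp.loc[grp["rating_type"] == "orbital", "rating"]
        f_stat, p_value = stats.f_oneway(definitional, orbital)
        rows.append({
            "item": item,
            "mean_definitional": definitional.mean(),
            "mean_orbital": orbital.mean(),
            "F": f_stat,
            "p": p_value,
            # Retain only items whose definitional ratings are significantly
            # higher than their orbital ratings.
            "retain": p_value < alpha and definitional.mean() > orbital.mean(),
        })
    return pd.DataFrame(rows)
```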
Target Population: The population of inferential interest.
Sampling Frame: A list of units that make up the target population.
Sample: A randomly drawn set of units from the sampling frame.
\(\text{Sample Size} \approx\max(300, 10\times\text{number of questions})\)
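For example, a survey with 45 questions calls for \(\max(300,\, 10 \times 45) = 450\) respondents, while a 20-question survey still defaults to the 300-respondent floor.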
For each item, you will want to know if respondents had any difficulty responding to the question or using the response scale. There are two ways to gather this information:
Think-alouds: Gather a sub-sample of respondents and have them talk through their thought processes as they respond to each question.
Closed-Response Questions about Item Quality: For each question, ask the respondent if they had any difficulty understanding what the question was asking or if the response scale was difficult to use.
Item Response Distributions: Are respondents using the full response scale?
Item and Scale means: Are responses inflated or deflated?
Item and Scale variability: Are responses clumped together?
Corrected Item-Total Correlations: How correlated is the item with a sum score that consists of every other item in the scale?
Inter-item and inter-scale correlations: Are certain items / scales too related or not related enough?
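As a rough illustration, the sketch below computes these pilot statistics with pandas. The wide-format DataFrame `items` (one column per item in a single scale) is a hypothetical placeholder.

```python
import pandas as pd

def item_analysis(items: pd.DataFrame) -> pd.DataFrame:
    """Descriptive item statistics for one scale (one column per item)."""
    total = items.sum(axis=1)  # scale sum score
    return pd.DataFrame({
        "mean": items.mean(),   # are responses inflated or deflated?
        "sd": items.std(),      # are responses clumped together?
        "min": items.min(),     # is the full response scale being used?
        "max": items.max(),
        # Corrected item-total correlation: each item vs. the sum of
        # every *other* item in the scale.
        "item_total_r": items.apply(lambda col: col.corr(total - col)),
    })

# Inter-item correlations (too related, or not related enough?):
# items.corr()
```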
Reliability can be thought of as an index of how much of the observed variance in your scale is due to true variation on the focal construct.
Coefficient Alpha is easily the most commonly used and reported reliability estimate.
Alpha ranges between 0 and 1 and you typically want values greater than .80
For each scale and item in a scale, you will want to calculate alpha-if-item-deleted values
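A minimal sketch of both calculations, assuming the same hypothetical wide-format `items` DataFrame with complete data:

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of sum score)."""
    k = items.shape[1]
    item_variances = items.var(ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def alpha_if_item_deleted(items: pd.DataFrame) -> pd.Series:
    # Recompute alpha with each item dropped in turn; a value noticeably
    # higher than the full-scale alpha flags a candidate for removal.
    return pd.Series({
        col: cronbach_alpha(items.drop(columns=col)) for col in items.columns
    })
```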
“Validity refers to the degree to which evidence and theory support the interpretation of test scores for proposed uses of tests.”
Remove or change items based on the information you gathered from the pilot sample
Give out the revised scales (survey) to a new sample (or a holdout sample from your original sample) and repeat the same analyses.
Document all of your work in a test manual!