What Are the Best Practices for Writing a Scale?

Written by Inkblot Analytics Editorial Team | May 9, 2022 2:00:00 PM

Measurement is a roadblock in psychological research. To measure a person’s abilities, attitudes, and traits, you cannot simply place them on a literal scale and record the “weight.” But wouldn’t it be nice if you could?

Features that are not directly observable, called latent variables, require a different kind of scale: one fit for psychological research. That’s why response scales were created, so that the mind can be measured effectively.

If you want to learn how to write a scale for psychometric investigation, the following tips might help you create an accurate and reliable tool.

When to Use a Scale

In psychology, a scale is a series of statements or questions that target a specific construct, and it is used to measure psychological variables. Some common latent variables you can measure with a scale are:

Attitudes
Behaviors
Beliefs
Biases
Emotions
General intelligence
Personality traits
Satisfaction
Values

Whether your scope is experimental or market research, the best practices of writing a scale are generally the same.

Steps for Writing a Scale

The first thing to do when writing a scale is figure out what you want to measure.

Step 1: Define the topic.

While it may be tempting to jump right into writing items and questions, background research can save you time in the long run.

Take the latent variable of happiness, for example. Sure, humans have an intuitive idea of what happiness is, but researchers define it in slightly different ways. How you define happiness will affect the way you measure it in your scale.

A typical definition of happiness is the experience of positive emotions and satisfaction. Using this definition, you might start to construct a scale that detects the presence of positive feelings and contentment.

However, there are other scientific definitions of happiness. Some common definitions include both affective and cognitive (i.e., feeling and thinking) appraisals of satisfaction, along with the presence of positive experiences and the absence of negative ones (Garaigordobil, 2015). Under a definition like this, your scale could include new items relating to thoughts and negative experiences.

It is a best practice to form a definition based on existing scientific literature (Kyriazos and Stalikas, 2018).

Step 2: Define the population.

Not only will preliminary research help you define the topic, it will also put a spotlight on the population you wish to study, which tells you who a representative sample should include. Sample size matters too: if the sample is not large enough, the study may lack the statistical power to detect the effects you care about.

That’s why some researchers suggest that you consult a statistician during the early stages of scale writing (Jones et al., 2013). Plus, as you’ll find out later on, fostering a relationship with a statistician (or psychometrician) will be extremely beneficial when you need to validate your scale.
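To see how strongly sample size drives power, here is a minimal Python sketch of one common back-of-the-envelope calculation: the approximate number of participants needed to detect a correlation of a given size, using Fisher’s z transformation. The function name, the example correlations, and the defaults (alpha = .05, power = .80) are illustrative assumptions, not figures from the sources above, and the sketch is no substitute for a statistician’s advice on your actual design.

```python
import math
from statistics import NormalDist


def sample_size_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n needed to detect a Pearson correlation r with a two-sided test.

    Fisher z approximation (illustrative defaults, not a recommendation):
        n = ((z_{1 - alpha/2} + z_{power}) / C) ** 2 + 3,  C = 0.5 * ln((1 + r) / (1 - r))
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must lie strictly between -1 and 1 and not be 0")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    c = 0.5 * math.log((1 + r) / (1 - r))          # Fisher z of the expected correlation
    return math.ceil(((z_alpha + z_power) / abs(c)) ** 2 + 3)


# Smaller expected correlations need much larger samples for the same power.
for r in (0.5, 0.3, 0.1):
    print(r, sample_size_for_correlation(r))
```

With these defaults, detecting a correlation of 0.3 takes roughly 85 participants, and the required sample grows quickly as the expected correlation shrinks.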

Step 3: Generate the item pool.

Once you’ve defined your topic and population, you can start generating the items that will make up your scale. As a rule of thumb, you will need more than a single item, especially if you want to use statistical measures to demonstrate reliability (Boateng et al., 2018).

There are a few ways to generate items, ranging from expert advice to in-depth interviews. First, consult with experts in your field of study. It is a best practice to ask experts to generate a list of topics or items they feel are relevant to the construct of interest.

You can also conduct another literature review. You may find that there is an existing scale (or several) already created for your topic of study. This will give you an idea of how many items/questions are in validated scales, which can be extremely helpful in developing an item pool for your scale.

If an existing scale matches your operational definition of the topic, use it! Validated scales are a great way to ensure accuracy.

For instance, two common happiness scales are the Subjective Happiness Scale, a four-item scale that gives a general measure of happiness, and the Oxford Happiness Questionnaire, a 29-item scale that gives a measure of psychological well-being (Lyubomirsky and Lepper, 1999; Hills and Argyle, 2002).

If you decide to adapt or modify an existing scale, it’s a best practice to contact the researchers who created it and ask for permission; alterations can change the psychometric properties of the scale.

But depending on your research needs, you may need to create your own. One way to create your own items is by conducting in-depth interviews with a focus group, or individuals, from the population that your scale targets. These interviews can reveal themes common to your topic.

The purpose of conducting a literature review and in-depth interviews is to generate a large pool of items. Researchers recommend that your initial pool be at least twice the size of the final scale, so it’s okay to include items that are not a perfect fit (Boateng et al., 2018). You will filter those out later in the validation process.

Step 4: Format the items.

This step, in practice, is conducted in parallel with step three. While you collect and generate items for your pool, think about how you will format the items in terms of measurement scales, response types, and item wording.

Measurement: There are four main types of measurement scales.

Nominal scales can be used for descriptors and support only limited forms of analysis. They are most typically used to record things like gender or membership in an ethnic, religious, or political group. You can use bar or pie charts to display the frequency or percentage of each category.
Ordinal scales can rank things. For example, suppose you ask participants to rank three locations (Work, Home, and Outdoors (recreation)) from where they feel happiest (1) to where they feel least happy (3). With this data, you can order the locations and even find the median rank of each one, but you cannot say how much happier someone is at home than at work.
Interval scales have a definite order and meaningful distances between responses, but they do not have a true zero point. A great example is an IQ test. A score of 50 is 50 points lower than a score of 100, but you cannot say that a person with an IQ of 100 has twice the intelligence of someone with a score of 50.
Ratio scales have a true zero point. Counts and response times are common examples. For instance, you could count the number of times that a person smiles in conversation and use that as a measure of happiness. From the ratio data, you could conclude that one person smiles twice as much as another.
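The practical consequence of the four levels is which summary statistics make sense. The Python sketch below encodes that hierarchy, where each level allows everything the level below it allows, plus more. The `summarize` function and the example data are hypothetical, made up for illustration.

```python
from collections import Counter
from statistics import mean, median_low

LEVELS = ("nominal", "ordinal", "interval", "ratio")


def summarize(level: str, values: list) -> dict:
    """Return only the summary statistics that are meaningful for a level of measurement."""
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level}")
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    # Nominal and above: frequencies, percentages, and the mode.
    summary = {
        "counts": dict(counts),
        "percentages": {v: 100 * n / len(values) for v, n in counts.items()},
        "mode": counts.most_common(1)[0][0],
    }
    if level == "nominal":
        return summary
    # Ordinal and above: order is meaningful. median_low always returns an
    # observed value, which suits ordered categories.
    summary["median"] = median_low(values)
    if level == "ordinal":
        return summary
    # Interval and above: equal spacing makes means and differences meaningful.
    summary["mean"] = mean(values)
    summary["range"] = max(values) - min(values)
    if level == "interval":
        return summary
    # Ratio only: a true zero makes ratios ("twice as many smiles") meaningful.
    if min(values) > 0:
        summary["max_to_min_ratio"] = max(values) / min(values)
    return summary


print(summarize("nominal", ["Group A", "Group B", "Group A", "Group C"]))
print(summarize("ordinal", [1, 2, 1, 3, 2, 1]))  # ranks given to "Home", 1 = happiest
print(summarize("interval", [85, 100, 115]))     # IQ scores
print(summarize("ratio", [2, 4, 8]))             # smiles per conversation
```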

Response Options: Once you’ve decided how to measure your data, it’s time to select a response type. Here are four popular options:

Dichotomous Scales offer simple yes/no or true/false response options.
Likert Scales measure agreement on a scale of points, typically ranging from Strongly Disagree (1) to Strongly Agree (5). Researchers recommend using five to seven points to improve reliability. Including a midpoint means participants are not forced to choose a side, which matters when a neutral answer genuinely reflects their view. Notably, how to analyze Likert data is controversial. Some researchers think it is a cardinal sin to analyze them as anything other than ordinal data, while many others treat Likert scales as interval data, assuming equal spacing between response options (Wu and Leung, 2017).
Semantic Differential Scales present responses on a bipolar continuum and typically feature opposite adjectives at each end. For example, you could ask someone to note how unhappy or happy they are by labeling the response options from 1 to 7 (1 being “Very Unhappy” and 7 being “Very Happy”). For bipolar measures, seven-point scales are recommended (Boateng et al., 2018).
Graphic Scales are commonly used to rate experiences. For example, if you want to measure how happy a person is with a customer service experience, you could ask them to rate their satisfaction on a five-star graphic scale.

Response Labeling: Researchers have found that labeling all of the response options makes items clearer than labeling only the endpoints (Kyriazos and Stalikas, 2018). Below are two examples of fully labeled response options: a five-point Likert scale and a seven-point semantic differential scale.

Five-point Likert scale:

__Strongly Disagree __Disagree __Neutral __Agree __Strongly Agree

Seven-point semantic differential scale:

__Very Unhappy __Unhappy __Somewhat Unhappy __Neutral __Somewhat Happy __Happy __Very Happy
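Before analysis, fully labeled responses have to be converted to numbers. This is a minimal sketch, assuming responses are stored as the label text shown above and coded 1 for the first option; the `encode` helper and the answers are hypothetical. It also shows the two analytic stances on Likert data described earlier: a median treats the codes as ordinal, while a mean assumes equal spacing (interval data).

```python
from statistics import mean, median_low

# The fully labeled options shown above, ordered from lowest to highest.
LIKERT_5 = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
SEMANTIC_DIFF_7 = ["Very Unhappy", "Unhappy", "Somewhat Unhappy", "Neutral",
                   "Somewhat Happy", "Happy", "Very Happy"]


def encode(responses: list[str], options: list[str]) -> list[int]:
    """Convert response labels to numeric codes (1 = first option)."""
    codes = {label: i for i, label in enumerate(options, start=1)}
    unknown = [r for r in responses if r not in codes]
    if unknown:
        raise ValueError(f"unrecognized response labels: {unknown}")
    return [codes[r] for r in responses]


# Hypothetical answers to one Likert item.
answers = ["Agree", "Strongly Agree", "Neutral", "Agree", "Disagree"]
coded = encode(answers, LIKERT_5)
print(coded)              # [4, 5, 3, 4, 2]
print(median_low(coded))  # ordinal view: 4, i.e. "Agree"
print(mean(coded))        # interval view (assumes equal spacing): 3.6
```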

Item Wording: When you write and phrase the items in your scale, it is important to make sure that they are easily understood. Here are three things to consider when writing your items.

Keep It Short And Succinct. Using simple and clear language goes a long way. Double-barreled items, which ask about two things at once (e.g., “Are you a happy, healthy person?”), add ambiguity to the item and can distort the participant's response. It is generally recommended to keep an item under 20 words (Kyriazos and Stalikas, 2018).
Always Use Good Grammar. Avoid using the past tense, double negatives, or contractions.
Avoid Leading Questions. These types of questions introduce bias into the scale (Allen, 2017). For example, asking a participant a simple true or false question like “Do you feel happy after eating?” already leads them towards the response that they are happy. A better way to format this question is to ask “How do you feel after eating?” with a semantic differential scale ranging from Very Unhappy (1) to Very Happy (7).
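Some of these wording checks can be roughed out in code before items go to experts. The sketch below is a hypothetical helper built on simple keyword heuristics (an assumption, not an established tool): it flags items of 20 or more words, contractions, possible double negatives, and commas or “and”/“or” that may signal a double-barreled item. It cannot detect leading questions or judge meaning, so treat its output as prompts for human review.

```python
import re

MAX_WORDS = 20  # items should stay under 20 words (Kyriazos and Stalikas, 2018)

# Rough heuristics (assumptions, not validated rules). Possessive "'s" is not
# flagged, so contractions such as "it's" will slip through.
CONTRACTION = re.compile(r"\b\w+(?:n't|'re|'ve|'ll|'m|'d)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:not|no|never|nor|neither|nothing|nobody)\b|n't\b", re.IGNORECASE)
CONJUNCTION = re.compile(r",|\b(?:and|or)\b", re.IGNORECASE)


def review_item(item: str) -> list[str]:
    """Return warnings for one draft item; an empty list means no heuristic fired."""
    text = item.replace("\u2019", "'")  # treat curly apostrophes like straight ones
    warnings = []
    n_words = len(text.split())
    if n_words >= MAX_WORDS:
        warnings.append(f"{n_words} words (aim for fewer than {MAX_WORDS})")
    if CONTRACTION.search(text):
        warnings.append("contains a contraction")
    if len(NEGATION.findall(text)) >= 2:
        warnings.append("possible double negative")
    if CONJUNCTION.search(text):
        warnings.append("possible double-barreled item")
    return warnings


for draft in ["Are you a happy, healthy person?",
              "How do you feel after eating?",
              "I don't feel happy after eating."]:
    print(draft, review_item(draft))
```

A comma or an “and” does not always mean an item is double-barreled, which is why the warning says “possible.”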

Once your item pool is ready, it is time to get an outside opinion.

Step 5: Evaluate.

This step involves evaluating your item set based on expert opinion and cognitive interviewing.

First, you can consult experts in your field of study to evaluate the item list. This is a critical step in aligning your items with the topic you wish to measure. It can help you:

ensure that your items are relevant to your topic,
identify superfluous items that can be removed,
add missing items, and
remove bias (Kyriazos and Stalikas, 2018).

Next, you can administer drafts of your items to participants in cognitive interviews. During these interviews, you can ask the participants to rephrase the items in their own words and talk you through their answering process. This invaluable approach allows you to:

double-check that your items measure the target topic,
alter any unclear questions, and
optimize the ordering of the items along with the response options (Boateng et al., 2018).

It is a best practice to identify clear trends from several cognitive interviews before making item changes (Gehlbach and Brinkworth, 2011).
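One way to keep track of trends across cognitive interviews is to record which items caused trouble in each session and count how often each item comes up. In this minimal Python sketch, the `recurring_issues` helper, the item IDs, the interview notes, and the 50% threshold are all hypothetical; choose a rule that fits your study.

```python
from collections import Counter


def recurring_issues(interviews: list[set[str]], min_share: float = 0.5) -> dict[str, int]:
    """Return items flagged in at least `min_share` of the interviews, with counts.

    Each set holds the IDs of items that caused trouble (misreading, hesitation,
    a paraphrase that missed the intended meaning) in one cognitive interview.
    """
    if not interviews:
        return {}
    counts = Counter(item for flagged in interviews for item in flagged)
    threshold = min_share * len(interviews)
    return {item: n for item, n in counts.items() if n >= threshold}


# Hypothetical notes from four cognitive interviews.
notes = [{"Q3", "Q7"}, {"Q3"}, {"Q3", "Q5"}, {"Q7"}]
print(recurring_issues(notes))  # Q3 and Q7 recur; Q5 came up only once
```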

Step 6: Test and refine.

The final step is the most important, and the most complicated. For simplicity’s sake, this final step can be distilled down to pilot-testing, analysis, and validation.

Pilot-testing involves administering your scale to a representative sample from your target population. Then, once the results are obtained, you can statistically analyze them.

While complicated, statistical analysis methods, including those aimed at reliability and validity, can show whether your target population is responding to your scale in a way that is congruent with the underlying theory. In short, you will be able to empirically explore how people respond to your scale, how accurately your scale measures what you want it to measure, whether people respond consistently, and much more.
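As one concrete example of a reliability analysis, here is a minimal Python sketch of Cronbach’s alpha, a widely used internal-consistency statistic, along with “alpha if item deleted,” which can help identify items to filter out of an oversized pool. Note that summing Likert responses and computing their variances treats them as interval data. The function names and pilot numbers are made up for illustration, and in practice you would likely rely on an established statistics package and a psychometrician’s judgment.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][j] is respondent j's score on item i.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # one total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items: list[list[float]]) -> list[float]:
    """Alpha recomputed with each item left out in turn (needs 3 or more items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical pilot data: 4 items (rows) answered by 6 respondents (columns)
# on a five-point Likert scale. Item 4 is invented to fit poorly.
pilot = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 2, 4, 1, 4],
    [2, 4, 5, 1, 3, 3],
]
print(f"alpha = {cronbach_alpha(pilot):.2f}")
for i, a in enumerate(alpha_if_item_deleted(pilot), start=1):
    print(f"alpha without item {i} = {a:.2f}")
```

With these made-up numbers, leaving out item 4 raises alpha sharply, which is a cue to take a closer look at that item rather than an automatic reason to drop it.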

In order to perform these statistical operations, background knowledge of reliability and validity is essential. If you want to familiarize yourself with reliability and validity in psychological testing, check out our posts on those subjects.

References:

Allen, M. (2017). The SAGE encyclopedia of communication research methods (Vols. 1–4). Thousand Oaks, CA: SAGE Publications.

Boateng, G. O., Neilands, T. B., Frongillo, E. A., Melgar-Quiñonez, H. R., & Young, S. L. (2018). Best practices for developing and validating scales for health, social, and behavioral research: A primer. Frontiers in Public Health, 6, 149.

Garaigordobil, M. (2015). Predictor variables of happiness and its connection with risk and protective factors for health. Frontiers in Psychology, 6, 1176.

Gehlbach, H., & Brinkworth, M. E. (2011). Measure twice, cut down error: A process for enhancing the validity of survey scales. Review of General Psychology, 15(4), 380–387.

Hills, P., & Argyle, M. (2002). The Oxford Happiness Questionnaire: A compact scale for the measurement of psychological well-being. Personality and Individual Differences, 33, 1073–1082.

Jones, T. L., Baxter, M. A., & Khanduja, V. (2013). A quick guide to survey research. Annals of the Royal College of Surgeons of England, 95(1), 5–7.

Kyriazos, T. A., & Stalikas, A. (2018). Applied psychometrics: The steps of scale development and standardization process. Psychology, 9, 2531–2560.

Lyubomirsky, S., & Lepper, H. S. (1999). A measure of subjective happiness: Preliminary reliability and construct validation. Social Indicators Research, 46, 137–155.

Wu, H., & Leung, S. (2017). Can Likert scales be treated as interval scales? A simulation study. Journal of Social Service Research, 43(4), 527–532.
