When you think of the term experiment, what comes to mind? Perhaps you thought about trying a new soda or changing your cat’s litter to a different brand. We all design informal experiments in our lives. We try new things and seek to learn how those things changed us or how they compare to other things we might try. We even create entertainment programs like Mythbusters, whose hosts use experimental methods to test whether common myths or bits of folk knowledge are actually true. It’s likely you’ve already developed an intuitive sense of how experiments work. The content of this chapter will build on that existing competency in using experiments to learn about the social world.
Chapter Outline
- 8.1 Experimental design: What is it and when should it be used?
- 8.2 Quasi-experimental and pre-experimental designs
- 8.3 The logic of experimental design
Content Advisory
This chapter discusses or mentions the following topics: substance abuse; eating disorders; prejudice; Hurricane Katrina; domestic violence; racism; poverty; trauma; teen pregnancy; sexually transmitted infections; and condom use.
8.1 Experimental design: What is it and when should it be used?
Learning Objectives
- Define experiment
- Identify the core features of true experimental designs
- Describe the difference between an experimental group and a control group
- Identify and describe the various types of true experimental designs
Experiments are an excellent data collection strategy for social workers wishing to observe the effects of a clinical intervention or social welfare program. Understanding what experiments are and how they are conducted is useful for all social scientists, whether they actually plan to use this methodology or simply aim to understand findings from experimental studies. An experiment is a method of data collection designed to test hypotheses under controlled conditions. In social scientific research, the term experiment has a precise meaning and should not be used to describe all research methodologies.
Experiments have a long and important history in social science. Behaviorists such as John Watson, B. F. Skinner, Ivan Pavlov, and Albert Bandura used experimental design to demonstrate the various types of conditioning. Using strictly controlled environments, behaviorists were able to isolate a single stimulus as the cause of measurable differences in behavior or physiological responses. The foundations of social learning theory and behavior modification are found in experimental research projects. Moreover, behaviorist experiments brought psychology and social science away from the abstract world of Freudian analysis and towards empirical inquiry, grounded in real-world observations and objectively-defined variables. Experiments are used at all levels of social work inquiry, including agency-based experiments that test therapeutic interventions and policy experiments that test new programs.
Several kinds of experimental designs exist. In general, designs considered to be true experiments contain three basic key features:
- random assignment of participants into experimental and control groups
- a “treatment” (or intervention) provided to the experimental group
- measurement of the effects of the treatment in a post-test administered to both groups
Some true experiments are more complex: their designs can also include a pretest or more than two groups, but the three features above are the minimum requirements for a design to be a true experiment.
Experimental and control groups
In a true experiment, the effect of an intervention is tested by comparing two groups: one that is exposed to the intervention (the experimental group, also known as the treatment group) and another that does not receive the intervention (the control group). Importantly, participants in a true experiment need to be randomly assigned to either the control or experimental group. Random assignment uses a random number generator or some other random process to assign people into experimental and control groups. Random assignment is important in experimental research because it helps to ensure that the experimental group and control group are comparable and that any differences between the experimental and control groups are due to random chance. We will address more of the logic behind random assignment in the next section.
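To make random assignment concrete, here is a minimal sketch in Python of one way a researcher might split a participant list into two groups. The participant IDs and function name are hypothetical; any genuinely random process would serve the same purpose.

```python
import random

def randomly_assign(participants, seed=None):
    """Randomly split a list of participants into experimental and control groups."""
    rng = random.Random(seed)
    shuffled = participants[:]          # copy so the original list is untouched
    rng.shuffle(shuffled)               # a random order removes any systematic pattern
    midpoint = len(shuffled) // 2
    experimental = shuffled[:midpoint]  # first half receives the intervention
    control = shuffled[midpoint:]       # second half serves as the control group
    return experimental, control

participants = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]
experimental_group, control_group = randomly_assign(participants, seed=42)
print("Experimental:", experimental_group)
print("Control:", control_group)
```

Because the shuffle is random, any pre-existing differences among participants should be spread evenly across the two groups, which is the core logic behind random assignment.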
Treatment or intervention
In an experiment, the independent variable is receiving the intervention being tested (for example, a therapeutic technique, prevention program, or access to some service or support). It is less common in social work research, but social science research may also use a stimulus, rather than an intervention, as the independent variable. For example, an electric shock or a reading about death might be used as a stimulus to provoke a response.
In some cases, it may be unethical to withhold treatment completely from a control group within an experiment. If you recruited two groups of people with severe addiction and only provided treatment to one group, the other group would likely suffer. For these cases, researchers use a control group that receives “treatment as usual.” Experimenters must clearly define what treatment as usual means. For example, a standard treatment in substance abuse recovery is attending Alcoholics Anonymous or Narcotics Anonymous meetings. A substance abuse researcher conducting an experiment might use twelve-step programs in their control group and their experimental intervention in the experimental group. The results would show whether the experimental intervention worked better than treatment as usual, which is useful information.
Post-test
The dependent variable is usually the outcome the researcher intends the intervention to change. If the researcher is testing a new therapy for individuals with binge eating disorder, their dependent variable may be the number of binge eating episodes a participant reports. The researcher likely expects her intervention to decrease the number of binge eating episodes reported by participants. Thus, she must, at a minimum, measure the number of episodes that occur after the intervention, which is the post-test. In a classic experimental design, participants are also given a pretest to measure the dependent variable before the experimental treatment begins.
Types of experimental design
Let’s put these concepts in chronological order so we can better understand how an experiment runs from start to finish. Once you’ve collected your sample, you’ll need to randomly assign your participants to the experimental group and control group. In a common type of experimental design, you will then give both groups your pretest, which measures your dependent variable, to see what your participants are like before you start your intervention. Next, you will provide your intervention, or independent variable, to your experimental group, but not to your control group. Many interventions take a few weeks or months to complete, particularly therapeutic treatments. Finally, you will administer your post-test to both groups to observe any changes in your dependent variable. What we’ve just described is known as the classical experimental design and is the simplest type of true experimental design. All of the designs we review in this section are variations on this approach. Figure 8.1 visually represents these steps.
An interesting example of experimental research can be found in Shannon K. McCoy and Brenda Major’s (2003) study of people’s perceptions of prejudice. In one portion of this multifaceted study, all participants were given a pretest to assess their levels of depression. No significant differences in depression were found between the experimental and control groups during the pretest. Participants in the experimental group were then asked to read an article suggesting that prejudice against their own racial group is severe and pervasive, while participants in the control group were asked to read an article suggesting that prejudice against a racial group other than their own is severe and pervasive. Clearly, these were not meant to be interventions or treatments to help depression, but were stimuli designed to elicit changes in people’s depression levels. Upon measuring depression scores during the post-test period, the researchers discovered that those who had received the experimental stimulus (the article citing prejudice against their same racial group) reported greater depression than those in the control group. This is just one of many examples of social scientific experimental research.
In addition to classic experimental design, there are two other ways of designing experiments that are considered to fall within the purview of “true” experiments (Babbie, 2010; Campbell & Stanley, 1963). The posttest-only control group design is almost the same as classic experimental design, except it does not use a pretest. Researchers who use posttest-only designs want to eliminate testing effects, in which participants’ scores on a measure change because they have already been exposed to it. If you took multiple SAT or ACT practice exams before you took the real one you sent to colleges, you’ve taken advantage of testing effects to get a better score. Considering the previous example on prejudice and depression, participants who are given a pretest about depression before being exposed to the stimulus would likely assume that the intervention is designed to address depression. That knowledge could cause them to answer differently on the post-test than they otherwise would. In theory, as long as the control and experimental groups have been determined randomly and are therefore comparable, no pretest is needed. However, most researchers prefer to use pretests in case randomization did not result in equivalent groups and to help assess change over time within both the experimental and control groups.
Researchers wishing to account for testing effects but also gather pretest data can use a Solomon four-group design. In the Solomon four-group design, the researcher uses four groups. Two groups are treated as they would be in a classic experiment: pretest, intervention for the experimental group, and post-test. The other two groups do not receive the pretest, though one receives the intervention. All groups are given the post-test. Table 8.1 illustrates the features of each of the four groups in the Solomon four-group design. By having one set of experimental and control groups that complete the pretest (Groups 1 and 2) and another set that does not (Groups 3 and 4), researchers using the Solomon four-group design can account for testing effects in their analysis, as the sketch after the table illustrates.
Table 8.1 The Solomon four-group design

|         | Pretest | Stimulus | Posttest |
|---------|---------|----------|----------|
| Group 1 | X       | X        | X        |
| Group 2 | X       |          | X        |
| Group 3 |         | X        | X        |
| Group 4 |         |          | X        |
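Here is a minimal sketch of how the four groups can be used in analysis, assuming hypothetical mean posttest scores for each cell of Table 8.1: subtracting Group 4 from Group 3 estimates the treatment effect free of any pretest influence, while subtracting Group 4 from Group 2 estimates the testing effect itself.

```python
# Hypothetical mean posttest scores for each cell of Table 8.1.
posttest_means = {
    "group1": 14.0,  # pretest + stimulus
    "group2": 10.5,  # pretest only
    "group3": 13.5,  # stimulus only (no pretest)
    "group4": 10.0,  # posttest only
}

# Treatment effect estimated from the two groups that never took the pretest,
# so pretest exposure cannot contaminate the comparison.
treatment_effect = posttest_means["group3"] - posttest_means["group4"]

# Testing effect: the shift produced by merely taking the pretest,
# estimated from the two untreated groups.
testing_effect = posttest_means["group2"] - posttest_means["group4"]

print(f"Estimated treatment effect: {treatment_effect:.1f}")
print(f"Estimated testing effect:   {testing_effect:.1f}")
```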
Solomon four-group designs are challenging to implement in the real world because they are time- and resource-intensive. Researchers must recruit enough participants to create four groups and implement interventions in two of them.
Overall, true experimental designs are sometimes difficult to implement in a real-world practice environment. It may be impossible to withhold treatment from a control group or randomly assign participants in a study. In these cases, pre-experimental and quasi-experimental designs, which we will discuss in the next section, can be used. However, because they are less rigorous than true experimental designs, their conclusions are more open to critique.
Experimental design in macro-level research
You can imagine that social work researchers may be limited in their ability to use random assignment when examining the effects of governmental policy on individuals. For example, it is unlikely that a researcher could randomly assign some states to decriminalize recreational marijuana and others not to in order to assess the effects of the policy change. There are, however, important examples of policy experiments that use random assignment, including the Oregon Medicaid experiment. In that study, the wait list for Medicaid coverage in Oregon was so long that state officials conducted a lottery to determine who from the wait list would receive Medicaid (Baicker et al., 2013). Researchers used the lottery as a natural experiment that included random assignment: people selected to receive Medicaid were the experimental group, and those remaining on the wait list were the control group. There are some practical complications with macro-level experiments, just as with other experiments. For example, the ethical concern with using people on a wait list as a control group exists in macro-level research just as it does in micro-level research.
Key Takeaways
- True experimental designs require random assignment.
- Control groups do not receive an intervention, and experimental groups receive an intervention.
- The basic components of a classic experiment include random assignment, a pretest, posttest, control group, and experimental group.
- Testing effects may cause researchers to use variations on the classic experimental design.
Glossary
- Classic experimental design – uses random assignment, an experimental and control group, and pre- and posttesting
- Control group – the group in an experiment that does not receive the intervention
- Experiment – a method of data collection designed to test hypotheses under controlled conditions
- Experimental group – the group in an experiment that receives the intervention
- Posttest – a measurement taken after the intervention
- Posttest-only control group design – a type of experimental design that uses random assignment and experimental and control groups, but no pretest
- Pretest – a measurement taken prior to the intervention
- Random assignment – using a random process to assign people into experimental and control groups
- Solomon four-group design – uses random assignment, two experimental and two control groups, pretests for half of the groups, and posttests for all
- Testing effects – when a participant’s scores on a measure change because they have already been exposed to it
- True experiments – a group of experimental designs that use random assignment, experimental and control groups, and measurement of the dependent variable in a posttest, at minimum
8.2 Quasi-experimental and pre-experimental designs
Learning Objectives
- Distinguish true experimental designs from quasi-experimental and pre-experimental designs
- Identify and describe the various types of quasi-experimental and pre-experimental designs
As we discussed in the previous section, time, funding, and ethics may limit a researcher’s ability to conduct a true experiment. For researchers in the medical sciences and social work, conducting a true experiment could require denying needed treatment to clients, which is a clear ethical violation. Even those whose research may not involve the administration of needed medications or treatments may be limited in their ability to conduct a classic experiment. When true experiments are not possible, researchers often use quasi-experimental designs.
Quasi-experimental designs
Quasi-experimental designs are similar to true experiments, but they lack random assignment to experimental and control groups. Quasi-experimental designs have a comparison group that is similar to a control group, except that assignment to the comparison group is not determined randomly. The most basic of these quasi-experimental designs is the nonequivalent comparison groups design (Rubin & Babbie, 2017). The nonequivalent comparison group design looks a lot like the classic experimental design, except it does not use random assignment. In many cases, these groups may already exist. For example, a researcher might conduct research at two different agency sites, one of which receives the intervention and the other of which does not. No one was assigned to treatment or comparison groups; those groupings existed prior to the study. While this method is more convenient for real-world research, it is less likely that the groups are comparable than if they had been determined by random assignment. Perhaps the treatment group has a unique characteristic, such as higher income or different diagnoses, that makes the treatment appear more effective.
Quasi-experiments are particularly useful in social welfare policy research. Social welfare policy researchers often look for what are termed natural experiments, or situations in which comparable groups are created by differences that already occur in the real world. Natural experiments are a feature of the social world that allows researchers to use the logic of experimental design to investigate the connection between variables. For example, Stratmann and Wille (2016) were interested in the effects of a state healthcare policy called Certificate of Need on the quality of hospitals. They clearly could not randomly assign states to adopt one set of policies or another. Instead, the researchers used hospital referral regions, or the areas from which hospitals draw their patients, that spanned state lines. Because the hospitals were in the same referral region, the researchers could be reasonably confident that patient characteristics were similar across groups. In this way, they could classify patients into experimental and comparison groups without dictating state policy or telling people where to live.
Matching is another approach in quasi-experimental design for assigning people to experimental and comparison groups. It begins with researchers thinking about which variables are important in their study, particularly demographic variables or attributes that might impact their dependent variable. Individual matching involves pairing participants with similar attributes. Then, the matched pair is split, with one participant going to the experimental group and the other to the comparison group (see the sketch below). An ex post facto control group, in contrast, is created when a researcher matches individuals after the intervention is administered to some participants. Finally, researchers may engage in aggregate matching, in which the comparison group as a whole is determined to be similar to the experimental group on important variables, even though individuals are not matched one-to-one.
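As a minimal sketch, assuming a hypothetical participant list and matching on gender and age only, individual matching might look like this in Python. The split within each pair is made at random here for illustration; in a quasi-experiment, which member of a pair is treated may already be determined by circumstance.

```python
import random

# Hypothetical participants, to be matched on gender and age.
participants = [
    {"id": "P01", "gender": "F", "age": 23},
    {"id": "P02", "gender": "F", "age": 24},
    {"id": "P03", "gender": "M", "age": 31},
    {"id": "P04", "gender": "M", "age": 30},
    {"id": "P05", "gender": "F", "age": 40},
    {"id": "P06", "gender": "F", "age": 41},
]

rng = random.Random(7)
experimental, comparison = [], []

for gender in ("F", "M"):
    # Sorting by age within gender puts the most similar participants side by side.
    group = sorted(
        (p for p in participants if p["gender"] == gender),
        key=lambda p: p["age"],
    )
    # Walk the sorted list two at a time; each adjacent pair is a matched pair.
    # (With an odd count, the last unmatched participant is simply left out.)
    for first, second in zip(group[::2], group[1::2]):
        pair = [first, second]
        rng.shuffle(pair)  # decide at random which member of the pair is treated
        experimental.append(pair[0]["id"])
        comparison.append(pair[1]["id"])

print("Experimental group:", sorted(experimental))
print("Comparison group:  ", sorted(comparison))
```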
Time series design
There are many different quasi-experimental designs in addition to the nonequivalent comparison group design described earlier. Describing all of them is beyond the scope of this textbook, but one more design is worth mentioning. The time series design uses multiple observations before and after an intervention. In some cases, experimental and comparison groups are used. In other cases where that is not feasible, a single experimental group is used. By using multiple observations before and after the intervention, the researcher can better understand the true value of the dependent variable in each participant before the intervention starts. Additionally, multiple observations afterwards allow the researcher to see whether the intervention had lasting effects on participants. Time series designs are similar to single-subjects designs, which we will discuss in Chapter 15.
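A minimal sketch of the time series logic, using hypothetical weekly symptom scores for a single group:

```python
# Hypothetical weekly symptom scores in a simple time series design:
# four observations before the intervention and four after.
pre_scores = [18, 17, 19, 18]   # a stable baseline before the intervention
post_scores = [13, 12, 12, 11]  # repeated measurement after the intervention

pre_mean = sum(pre_scores) / len(pre_scores)
post_mean = sum(post_scores) / len(post_scores)

# Multiple baseline points reveal the true pre-intervention level of the
# dependent variable; multiple follow-up points show whether any change
# persists rather than fading away.
print(f"Baseline mean:  {pre_mean:.1f}")
print(f"Follow-up mean: {post_mean:.1f}")
print(f"Change:         {post_mean - pre_mean:+.1f}")
```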
Pre-experimental design
When true experiments and quasi-experiments are not possible, researchers may turn to a pre-experimental design (Campbell & Stanley, 1963). Pre-experimental designs are called such because they often happen as a precursor to conducting a true experiment. Researchers want to see if their interventions will have some effect on a small group of people before they seek funding and dedicate time to conduct a true experiment. Pre-experimental designs, thus, are usually conducted as a first step towards establishing the evidence for or against an intervention. However, this type of design comes with some unique disadvantages, which we’ll describe below.
A commonly used type of pre-experiment is the one-group pretest post-test design. In this design, pre- and posttests are both administered, but there is no comparison group. Researchers may be able to claim that participants receiving the treatment experienced a change in the dependent variable, but they cannot claim that the change was the result of the treatment without a comparison group. Imagine if the students in your research class completed a questionnaire about their level of stress at the beginning of the semester, your professor taught you mindfulness techniques throughout the semester, and at the end of the semester she administered the stress survey again. What if levels of stress went up? Could she conclude that the mindfulness techniques caused stress? Not without a comparison group! With a comparison group, she would be able to recognize that all students, not just those in her research class, experienced higher stress at the end of the semester than at the beginning.
In cases where the administration of a pretest is cost prohibitive or otherwise not possible, a one-shot case study design might be used. In this instance, no pretest is administered, nor is a comparison group present. If we wished to measure the impact of a natural disaster, such as Hurricane Katrina, we might conduct a pre-experiment by identifying a community that was hit by the hurricane and then measuring the levels of stress in the community. Researchers using this design must be extremely cautious about making claims regarding the effect of the treatment or stimulus. They have no idea what the levels of stress in the community were before the hurricane hit, nor can they compare the stress levels to those of a community that was not affected by the hurricane. Nonetheless, this design can be useful for exploratory studies aimed at testing a measure or the feasibility of further study.
In our example of the study of the impact of Hurricane Katrina, a researcher might choose to examine the effects of the hurricane by identifying a group from a community that experienced the hurricane and a comparison group from a similar community that had not been hit by the hurricane. This study design, called a static group comparison, has the advantage of including a comparison group that did not experience the stimulus (in this case, the hurricane). Unfortunately, the design only uses post-tests, so it is not possible to know whether the groups were comparable before the stimulus or intervention. As you might have guessed from our example, static group comparisons are useful in cases where a researcher cannot control or predict whether, when, or how the stimulus is administered, as in the case of natural disasters.
As implied by the preceding examples where we considered studying the impact of Hurricane Katrina, experiments, quasi-experiments, and pre-experiments do not necessarily need to take place in the controlled setting of a lab. In fact, many applied researchers rely on experiments to assess the impact and effectiveness of various programs and policies. You might recall our discussion of arresting perpetrators of domestic violence in Chapter 2, which is an excellent example of an applied experiment. Researchers did not subject participants to conditions in a lab setting; instead, they applied their stimulus (in this case, arrest) to some subjects in the field and they also had a control group in the field that did not receive the stimulus (and therefore were not arrested).
Key Takeaways
- Quasi-experimental designs do not use random assignment.
- Comparison groups are used in quasi-experiments.
- Matching is a way of improving the comparability of experimental and comparison groups.
- Quasi-experimental designs and pre-experimental designs are often used when experimental designs are impractical.
- Quasi-experimental and pre-experimental designs may be easier to carry out, but they lack the rigor of true experiments.
Glossary
- Aggregate matching – when the comparison group is determined to be similar to the experimental group along important variables
- Comparison group – a group in quasi-experimental design that does not receive the experimental treatment; it is similar to a control group except assignment to the comparison group is not determined by random assignment
- Ex post facto control group – a control group created when a researcher matches individuals after the intervention is administered
- Individual matching – pairing participants with similar attributes for the purpose of assignment to groups
- Natural experiments – situations in which comparable groups are created by differences that already occur in the real world
- Nonequivalent comparison group design – a quasi-experimental design similar to a classic experimental design but without random assignment
- One-group pretest post-test design – a pre-experimental design that applies an intervention to a single group, with a pretest and posttest but no comparison group
- One-shot case study – a pre-experimental design that applies an intervention to only one group without a pretest
- Pre-experimental designs – a variation of experimental design that lacks the rigor of experiments and is often used before a true experiment is conducted
- Quasi-experimental design – a design that lacks random assignment to experimental and control groups
- Static group comparison – a design that uses an experimental group and a comparison group, without random assignment or pretesting
- Time series design – a quasi-experimental design that uses multiple observations before and after an intervention
8.3 The logic of experimental design
Learning Objectives
- Apply the criterion of causality to experimental design
- Define internal validity and external validity
- Identify and define threats to internal validity
As we discussed at the beginning of this chapter, experimental design is commonly understood and implemented informally in everyday life. Trying out a new restaurant, dating a new person—we often call these things “experiments.” As you’ve learned over the past two sections, in order for something to be a true experiment, or even a quasi- or pre-experiment, you must rigorously apply the various components of experimental design. A true experiment for trying a new restaurant would include recruitment of a large enough sample, random assignment to control and experimental groups, pretesting and posttesting, as well as using clearly and objectively defined measures of satisfaction with the restaurant.
Social scientists use this level of rigor and control because they try to maximize the internal validity of their research. Internal validity is the confidence researchers have that the independent variable truly produced a change in the dependent variable. In the case of experimental design, the independent variable is the intervention or treatment. Experiments are attempts to establish causality between two variables: the treatment and its intended outcome.
As we talked about in Chapter 4, nomothetic causal explanations must establish four criteria: covariation, plausibility, temporality, and nonspuriousness. The logic and rigor of experimental design allows for causality to be established. Experimenters can assess covariation on the dependent variable through pre- and post-tests. The use of experimental and control conditions ensures that some people receive the intervention and others do not, providing variation in the independent variable (i.e., receiving the treatment). Moreover, since the researcher controls when the intervention is administered, she can be assured that changes in the independent variable (the treatment) happened before changes in the dependent variable (the outcome). In this way, experiments assure temporality. In our restaurant experiment, we would know through assignment to experimental and control groups that people varied in whether they attended the restaurant. We would also know whether their level of satisfaction changed, as measured by the pre- and posttest. And because of the pre- and post-tests, we would know that changes in our diners’ satisfaction occurred after they visited the restaurant, not before they walked in.
Experimenters also have a plausible reason why their intervention would cause changes in the dependent variable. Usually, a theory or previous empirical evidence indicates the potential for a causal relationship. Perhaps a national poll found that the type of food our experimental restaurant serves, let’s say pizza, is the most popular food in America. Perhaps the restaurant has good reviews on Yelp or Google. This evidence would give us a plausible reason to believe the restaurant could cause satisfaction.
One of the most important features of experiments is that they allow researchers to eliminate spurious variables. True experiments are usually conducted under strictly controlled conditions. The intervention is given in the same way to each person, with a minimal number of other variables that might cause their post-test scores to change. In our restaurant example, this level of control might prove difficult. We cannot control how many people are waiting for a table, whether participants saw someone famous there, or if there is bad weather. Any of these factors might cause a diner to be less satisfied with their meal. These spurious variables may cause changes in satisfaction that have nothing to do with the restaurant itself, an important problem in real-world research. For this reason, experiments try to control as many aspects of the research process as possible: using control groups, having large enough sample sizes, standardizing the treatment, etc. Researchers in large experiments often employ clinicians or other research staff to help them. Researchers train their staff members exhaustively, provide pre-scripted responses to common questions, and control the physical environment of the experiment so each person who participates receives the exact same treatment.
Experimental researchers also document their procedures, so that others can review them and make changes in future research if they think it will improve on the ability to control for spurious variables. An interesting example is Bruce Alexander’s (2010) Rat Park experiments. Much of the early research conducted on addictive drugs, like heroin and cocaine, was conducted on animals other than humans, usually mice or rats. The scientific consensus up until Alexander’s experiments was that cocaine and heroin were so addictive that rats, if offered the drugs, would consume them repeatedly until they perished. Researchers claimed this behavior explained how addiction worked in humans, but Alexander was not so sure. He knew rats were social animals and the experimental procedure from previous experiments did not allow them to socialize. Instead, rats were kept isolated in small cages with only food, water, and metal walls. To Alexander, social isolation was a spurious variable, causing changes in addictive behavior not due to the drug itself. Alexander created an experiment of his own, in which rats were allowed to run freely in an interesting environment, socialize and mate with other rats, and of course, drink from a solution that contained an addictive drug. In this environment, rats did not become hopelessly addicted to drugs. In fact, they had little interest in the substance. To Alexander, the results of his experiment demonstrated that social isolation was more of a causal factor for addiction than the drug itself.
One challenge with Alexander’s conclusions is that subsequent researchers have had mixed success replicating his findings (e.g., Petrie, 1996; Solinas, Thiriet, El Rawas, Lardeux, & Jaber, 2009). Replication involves conducting another researcher’s experiment in the same manner and seeing whether it produces the same results. If the causal relationship is real, it should occur in all (or at least most) replications of the experiment.
One of the defining features of experiments is that they report their procedures diligently, which allows for easier replication. Recently, researchers at the Reproducibility Project have caused a significant controversy in social science fields like psychology (Open Science Collaboration, 2015). The researchers attempted to reproduce the results of 100 experiments published in major psychology journals in 2008. What they found was shocking: the results of only 36% of the studies were reproducible. Despite coordinating closely with the original researchers, the Reproducibility Project found that nearly two-thirds of psychology experiments published in respected journals were not reproducible. The implications of the Reproducibility Project are staggering, and social scientists are coming up with new ways to ensure researchers do not cherry-pick data or change their hypotheses simply to get published.
Let’s return to Alexander’s Rat Park study and consider the implications of his experiment for substance use professionals. The conclusions he drew from his experiments on rats were meant to generalize to the population of people with substance use disorders. If they do, the experiment has a high degree of external validity, which is the degree to which conclusions generalize to larger populations and different situations. Alexander argues his conclusions about addiction and social isolation help us understand why people living in deprived, isolated environments may become addicted to drugs more often than those in more enriching environments. Similarly, earlier rat researchers argued their results showed these drugs were instantly addictive to humans, often to the point of death.
Neither study’s results will match up perfectly with real life. There are clients in social work practice who may fit into Alexander’s social isolation model, but social isolation is complex. Clients can live in environments with other sociable humans, work jobs, and have romantic relationships; does this mean they are not socially isolated? On the other hand, clients may face structural racism, poverty, trauma, and other challenges that shape their social environment. Alexander’s work helps us understand clients’ experiences, but the explanation is incomplete. Human existence is more complicated than the experimental conditions in Rat Park.
Social workers are especially attentive to how social context shapes social life, so we are likely to point out a specific disadvantage of experiments: they are rather artificial. How often do real-world social interactions occur in the same way that they do in a controlled experiment? Experiments conducted in community settings may not be as artificial as those in a research lab, but their conditions are less easily controlled. This demonstrates the tension between internal and external validity. The two are conceptually linked: internal validity refers to the degree to which the intervention causes its intended outcomes, and external validity refers to how well that relationship applies to different groups and circumstances. However, the more tightly researchers control the environment to ensure internal validity, the less they can claim external validity for generalizing their results to different populations and circumstances. Correspondingly, researchers whose settings are just like the real world will be less able to ensure internal validity, as there are many factors that could pollute the research process. This is not to suggest that experimental research cannot have external validity, but experimental researchers must always be aware that external validity problems can occur and be forthcoming about this potential weakness when reporting their findings.
Threats to internal validity
There are a number of factors that may influence a study’s internal validity. You might consider these threats to all be spurious variables, as we discussed at the beginning of this section. Each threat suggests that something other than the treatment (or intervention) is changing the outcome, introducing error and bias into the experiment.
Throughout this chapter, we reviewed the importance of experimental and control groups. These groups must be comparable in order for experimental design to work. Comparable groups are groups that are similar across factors important for the study. Researchers can help establish comparable groups by using probability sampling, random assignment, or matching techniques. Control or comparison groups give researchers an opportunity to explore what happens to similar people who do not receive the intervention. But if the experimental and control groups are not comparable, then differences in outcome may not be due to the intervention. No groups are ever perfectly comparable. What’s important is ensuring groups are as similar as possible along variables relevant to the research project.
In our restaurant example, if one of the groups had far more vegetarians or people with gluten intolerance, it might influence how satisfied they were with the restaurant. The groups, in that case, would not be comparable. Researchers can account for this by measuring other variables, like dietary preference, and controlling for their effects statistically after the data are collected (a brief sketch of this appears below). We discussed control variables like these in Chapter 4. When some factor related to selecting research participants prevents the groups from being comparable, selection bias is introduced into the sample. This could happen if a researcher chooses clients from one agency to belong to the experimental group and clients from another agency to be in the comparison group, when the agencies serve different types of people. Selection bias is a reason experimenters use random assignment, so that conscious and unconscious bias do not influence which group a participant is assigned to. Sometimes, the groups are comparable at the start of the experiment, but people drop out of the study. Mortality is the term we use to describe when a group changes because of people dropping out of the study. In our restaurant example, this could happen if vegetarians dropped out of the experimental group because the restaurant being tested didn’t have vegetarian options.
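Here is a minimal sketch, with simulated data and hypothetical effect sizes, of statistically controlling for dietary preference when estimating the restaurant’s effect on satisfaction. It uses an ordinary least squares regression from the statsmodels library:

```python
import numpy as np
import statsmodels.api as sm

# Simulate a hypothetical dataset of 200 diners.
rng = np.random.default_rng(0)
n = 200
treated = rng.integers(0, 2, n)      # 1 = ate at the new restaurant
vegetarian = rng.integers(0, 2, n)   # 1 = vegetarian diner
# Simulated satisfaction: the restaurant helps, vegetarians are less satisfied.
satisfaction = 5 + 1.5 * treated - 2.0 * vegetarian + rng.normal(0, 1, n)

# Regress satisfaction on group assignment while holding diet constant.
X = sm.add_constant(np.column_stack([treated, vegetarian]))
model = sm.OLS(satisfaction, X).fit()
print(model.params)  # intercept, restaurant effect, vegetarian effect
```

Because dietary preference is included as a control variable, the estimated restaurant effect is separated from the effect of having more vegetarians in one group.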
Experiments themselves are often the source of threats to validity. Experiments are different from participants’ normal routines. The novelty of a research environment or experimental treatment may cause them to expect to feel differently, independently of the actual intervention. Reactivity is a threat to internal validity that occurs because the participants realize they are being observed. In this case, being observed makes the difference in outcome, not the intervention.
What if the people in the control group are aware that they aren’t receiving the potential benefits from the experimental treatment? Maybe they respond by increasing their efforts to improve in spite of not receiving the treatment. This introduces a threat to internal validity called compensatory rivalry. On the other hand, it might have the opposite effect. Resentful demoralization occurs when people in the control group decrease their efforts because they aren’t getting the treatment. These threats could be decreased by keeping the experimental and control groups completely separate, so the control group isn’t aware of what’s happening with the experimental group. An advantage to this is that it can help prevent diffusion of treatment, in which members of the control group learn about the experimental treatment from people in the experimental group and start implementing the intervention for themselves. This can occur if participants in the experimental group begin to behave differently or share insights from the intervention with individuals in the control group. Whether through social learning or conversation, participants in the control group may receive parts of the intervention of which they were supposed to be unaware.
Researchers may also introduce error. For example, researchers may expect the experimental group to feel better and may give off conscious or unconscious cues to participants that influence their outcomes. Control groups could be expected to fare worse, and research staff might cue participants that they should feel worse than they otherwise would. It is also possible that research staff administering treatment as usual to the control group might try to equalize treatment or engage in a rivalry with the staff administering the experimental intervention (Engel & Schutt, 2016). To prevent these threats, which arise because researchers or participants are aware of their role in the experiment, double-blind designs keep both the research staff interacting with participants and the participants themselves from knowing who is assigned to which group.
There are some additional threats to internal validity that double-blind designs cannot reduce. You have likely heard of the placebo effect, in which a participant in the control group feels better because they think they are receiving treatment, despite not having received the experimental treatment at all. Researchers may introduce a threat to internal validity called instrumentation when they choose measures that do not accurately measure participants or implement the measure in a way that biases participant responses. Testing is a threat to internal validity in which the fact that participants take a pretest, not the intervention, affects their score on the post-test. The Solomon four-group and posttest-only designs are used to reduce the testing threat to internal validity. Sometimes, the change in an experiment would have happened even without any intervention because of the natural passage of time. This is called maturation. Imagine researchers testing the effects of a parenting class on the beliefs and attitudes of adolescent fathers. Perhaps the changes in their beliefs and attitudes are based on growing older, not on the class. Having a control or comparison group helps with this threat. It also helps reduce the threat of history, when something happens outside the experiment but affects its participants.
As you can see, there are several ways in which the internal validity of a study can be threatened. No study can eliminate all threats, but the best ones consider the threats and do their best to reduce them as much as is feasible based on the resources available. When you read and critique research articles, it is important to consider these threats so you can assess the validity of a study’s results.
Spotlight on UTA School of Social Work
Assessing a teen pregnancy and STI prevention program
Dr. Holli Slater and Dr. Diane Mitschke implemented an experimental design to conduct a randomized two-group cohort-based longitudinal study using repeated measures to assess outcomes related to a teen pregnancy prevention program (Slater & Mitschke, 2015). Crossroads was a co-ed program targeting academically at-risk youth enrolled in a local school district. It was administered by trained facilitators in a large-group setting across three consecutive days for a total of 18.75 hours of program instruction. Each day had a separate focus, including building relationships, prevention of pregnancy and sexually transmitted infections (STIs), and identifying resources available within the community.
Potential participants were recruited on an ongoing basis and put into a pool of potential candidates to join the study. Prior to each intervention series, 60 youth were randomly assigned to either the treatment or control group. Youth assigned to the treatment group attended the three-day intervention and received ongoing support from assigned facilitators. Youth assigned to the control group did not attend the intervention and continued to receive services as usual. Services as usual consisted of being assigned a graduation coach who provided dropout prevention services and assisted youth in meeting their academic goals. Graduation coach services were available to all at-risk students in the school district, regardless of their enrollment in the study or assignment to the treatment or control group.
The primary research aim of the study was to assess the impact of being offered participation in the intervention on condom use. Essentially, the researchers wanted to see if condom use increased more among sexually active youth who attended the intervention compared to youth who did not. In addition to this primary research aim, Drs. Mitschke and Slater explored whether this effect was sustained over time. They collected data through an online survey at four separate time points (baseline and 3, 6, and 12 months post-intervention). Due to the longitudinal nature of the study and the highly transient population, the researchers provided an incentive of a $20 gift card at each data collection point. Even so, retaining youth for the duration of the study remained a challenge.
An intent-to-treat framework was used to assess the impact of the program, meaning data analysis included all youth who were randomized, regardless of their level of participation in the program. The researchers compared the outcomes of youth in the treatment group with those of youth in the control group. Significant differences between the treatment and control groups (p<.05) would support the argument that changes in behavior (e.g., an increase in condom use) could be attributed to participation in the intervention. A sketch of this kind of comparison follows.
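As a minimal illustration, assuming invented counts rather than the study’s actual data, a two-proportion z-test is one way such a group comparison might be run:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: youth reporting condom use in each group.
condom_users = np.array([21, 12])  # treatment group, control group
group_sizes = np.array([30, 30])   # total randomized to each group

# Test whether the two proportions differ more than chance would predict.
z_stat, p_value = proportions_ztest(condom_users, group_sizes)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Difference is statistically significant at the .05 level.")
```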
Results of the study did not identify significant differences in condom use at 3 months and 12 months after the intervention. However, there were significant results at 6 months, indicating that youth who participated in the intervention were less likely to engage in intercourse without a condom than youth in the control group. While it is disappointing not to find significant results in a large-scale study such as this, negative results can be just as powerful.
Dr. Slater and Dr. Mitschke explored reasons why the intervention may not have been effective immediately following the intervention by talking with youth and their counselors to gain insight. One possible explanation is that youth enrolled in this study had already established their sexual norms prior to the intervention; the majority of youth in the study were already sexually active. If this was the case, then practitioners developing interventions for pregnancy prevention should take it into consideration when designing programs. Perhaps implementing an intervention at an earlier age, before youth are sexually active, would have a greater impact on behavior than waiting until they are already engaging in risky sexual behaviors and trying to create a change.
It is interesting that behaviors did seem to change among youth at the six-month follow-up. It is possible this is a spurious result, and it should be explored more fully. Interviews with youth indicated that the repeated follow-up from the intervention team over time resulted in increased trust between the youth and their counselor. Some even suggested they changed their behaviors because a caring adult took the time to continually follow up with them. This alternate explanation should also be further explored to better understand which components of the intervention have the greatest impact on the behavior of youth.
Key Takeaways
- Experimental design provides researchers with the ability to best establish causality between their variables.
- Experiments provide strong internal validity but may have trouble achieving external validity.
- Experimental designs should be reproducible by future researchers.
- Threats to internal validity come from many sources, including both experimenter and participant reactivity.
Glossary
- Comparable groups – groups that are similar across factors important for the study
- Compensatory rivalry – a threat to internal validity in which participants in the control group increase their efforts to improve because they know they are not receiving the experimental treatment
- Diffusion of treatment – a threat to internal validity in which members of the control group learn about the experimental treatment from people in the experimental group and start implementing the intervention for themselves
- Double-blind – when neither the research staff interacting with participants nor the participants themselves know who is in the control or experimental group
- External validity – the degree to which experimental conclusions generalize to larger populations and different situations
- Instrumentation – a threat to internal validity when measures do not accurately measure participants or are implemented in a way that biases participant responses
- Internal validity – the confidence researchers have about whether their intervention produced variation in their dependent variable
- Maturation – a threat to internal validity in which the change in an experiment would have happened even without any intervention because of the natural passage of time
- Mortality – a threat to internal validity caused when either the experimental or control group composition changes because of people dropping out of the study
- Placebo effect – when a participant in the control group feels better because they believe they are receiving treatment, despite having received no intervention at all
- Reactivity – a threat to internal validity that occurs because the participants realize they are being observed
- Replication – conducting another researcher’s experiment in the same manner and seeing if it produces the same results
- Resentful demoralization – a threat to internal validity that occurs when people in the control group decrease their efforts because they aren’t getting the experimental treatment
- Selection bias – when the elements selected for inclusion in a study do not represent the larger population from which they were drawn due to sampling method or thought processes of the researcher