Systematic Protocols for the Visual Analysis of Single-Case Research Data

Katie Wolfe

1 Department of Educational Studies, University of South Carolina, 820 Main St, Columbia, SC 29208 USA

Erin E. Barton

2 Department of Special Education, Vanderbilt University, Box 228 GPC, Nashville, TN 37203 USA

Hedda Meadan

3 Department of Special Education, University of Illinois at Urbana–Champaign, 1310 South Sixth Street, Champaign, IL 61820 USA

Researchers in applied behavior analysis and related fields such as special education and school psychology use single-case designs to evaluate causal relations between variables and to evaluate the effectiveness of interventions. Visual analysis is the primary method by which single-case research data are analyzed; however, research suggests that visual analysis may be unreliable. In the absence of specific guidelines to operationalize the process of visual analysis, it is likely to be influenced by idiosyncratic factors and individual variability. To address this gap, we developed systematic, responsive protocols for the visual analysis of A-B-A-B and multiple-baseline designs. The protocols guide the analyst through the process of visual analysis and synthesize responses into a numeric score. In this paper, we describe the content of the protocols, illustrate their application to 2 graphs, and describe a small-scale evaluation study. We also describe considerations and future directions for the development and evaluation of the protocols.

Single-case research (SCR) is the predominant methodology used to evaluate causal relations between interventions and target behaviors in applied behavior analysis and related fields such as special education and psychology (Horner et al., 2005 ; Kazdin, 2011 ). This methodology focuses on the individual case as the unit of analysis and is well suited to examining the effectiveness of interventions. SCR facilitates a fine-grained analysis of data patterns across experimental phases, allowing researchers to identify the conditions under which a given intervention is effective for particular participants (Horner et al., 2005 ; Ledford & Gast, 2018 ). In addition, the dynamic nature of SCR allows the researcher to make adaptations to phases and to conduct component analyses of intervention packages with nonresponders to empirically identify optimal treatment components (Barton et al., 2016 ; Horner et al., 2005 ).

Visual analysis is the primary method by which researchers analyze SCR data to determine whether a causal relation (i.e., functional relation, experimental control) is documented (Horner et al., 2005; Kratochwill et al., 2013). Visual analysis involves examining graphed data within and across experimental phases. Specifically, researchers look for changes in the level, trend, or variability of the data across phases that would not be predicted to occur without the active manipulation of the independent variable. Level is the amount of behavior that occurs in a phase relative to the y-axis (Barton, Lloyd, Spriggs, & Gast, 2018). Trend is the direction of the data over time, which may be increasing, decreasing, or flat (Barton et al., 2018). Variability is the spread or fluctuation of the data around the trend line (Barton et al., 2018). A change in the level, trend, or variability of the data between adjacent phases is a basic effect; to determine whether there is a causal relation, the researcher looks for multiple replications of the effect at different and temporally related time points (Kratochwill et al., 2013).
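To make these three characteristics concrete, the sketch below computes one common operationalization of each for a single phase of data. It is an illustration only, not part of the protocols described later; the function name, the use of the median for level, and the hypothetical data are our own choices.

```python
# Minimal sketch (not from the protocols): quantifying the within-phase
# characteristics of level, trend, and variability for one phase of data.
import numpy as np

def describe_phase(values):
    """Return level, trend, and variability for one phase of session data."""
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y))

    level = np.median(y)                    # level: where the data sit on the y-axis
    slope, intercept = np.polyfit(x, y, 1)  # trend: direction of change over time
    trend_line = slope * x + intercept
    variability = np.std(y - trend_line)    # variability: spread around the trend line

    return {"level": level, "trend": slope, "variability": variability}

print(describe_phase([8, 7, 7, 6]))  # hypothetical baseline data
```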

Despite this reliance on visual analysis, there have been long-standing concerns about interrater agreement, or the extent to which two visual analysts evaluating the same graph make the same determination about functional relations and the magnitude of change. In general, these concerns have been borne out by empirical research (e.g., Brossart, Parker, Olson, & Mahadevan, 2006; DeProspero & Cohen, 1979; Wolfe, Seaman, & Drasgow, 2016). In one study, Wolfe et al. (2016) asked 52 experts to report whether each of 31 published multiple-baseline design graphs depicted (a) a change in the dependent variable from baseline to intervention for each tier of the graph and (b) an overall functional relation for the entire multiple-baseline design graph. Interrater agreement was just at or just below minimally acceptable standards for both types of decisions (intraclass correlation coefficient [ICC] = .601 and .58, respectively). The results of this study are generally representative of the body of literature on interrater agreement among visual analysts (cf. Kahng et al., 2010). Given that visual analysis is integral to the evaluation of SCR data (Horner & Spaulding, 2010; Kazdin, 2011), research indicating that it is unreliable under many circumstances presents a significant challenge for the field—particularly the acceptance of SCR as a credible and rigorous research methodology.

Many researchers have argued that poor agreement among visual analysts may be due to the absence of formal guidelines to operationalize the process (Furlong & Wampold, 1982 ), which leaves the analysis vulnerable to idiosyncratic factors and individual variability related to “history, training, experience, and vigilance” (Fisch, 1998 , p. 112). Perhaps due to the lack of formal guidelines, single-case researchers rarely identify, let alone describe, the methods by which they analyze their data. Smith ( 2012 ) reported that authors in fewer than half of the SCR studies published between 2000 and 2010 ( n = 409) identified the analytic method they used; only 28.1% explicitly stated that they used visual analysis. Even less frequently do authors describe the specific procedure by which visual analysis was conducted. In a review of SCR articles published in 2008 ( n = 113), Shadish and Sullivan ( 2011 ) found that only one study reported using a systematic procedure for visual analysis (Shadish, 2014 ). Barton, Meadan, and Fettig ( 2019 ) found similar results in a review of parent-implemented functional assessment interventions; study authors rarely and inconsistently used visual analysis terms and procedures across SCR studies and were most likely to discuss results using only mean, median, and average rather than level, trend, or variability. Overall, it is difficult to identify specifically how single-case researchers are conducting visual analysis of their data, which might lead to high rates of disagreement and adversely impact interpretations of results and syntheses across SCR. In other words, unreliable data analysis may impede the use of SCR to identify evidence-based practices, which has important and potentially adverse practical and policy implications.

There have been a few recent efforts to produce and disseminate standards that may promote greater consistency in visual analysis. The What Works Clearinghouse (WWC) Single-Case Design Standards (Kratochwill et al., 2013 ; WWC, 2017 ) describe four steps for conducting visual analysis that consider six data characteristics (i.e., level, trend, variability, immediacy, overlap, and consistency). However, the WWC standards were not designed to provide a systematic, step-by-step protocol to guide the visual analysis process (Hitchcock et al., 2014 ) and do not assist researchers in synthesizing information about the data characteristics and across experimental phases. For example, the four steps do not explain the relative importance of the data characteristics in making determinations about basic effects and experimental control. This ambiguity could introduce subjectivity into the analysis and result in two visual analysts reaching different conclusions about the same graph despite using the same procedures.

To increase agreement among visual analysts working on reviews of SCR literature, Maggin, Briesch, and Chafouleas (2013) developed a visual analysis protocol based on the WWC SCR standards (Kratochwill et al., 2013). Using this protocol, the analyst answers a series of questions about the graph and then uses these responses to determine the number of basic effects and the level of experimental control demonstrated by the graph (Maggin et al., 2013). Maggin et al. (2013) reported high agreement between the three authors following training on the protocol (e.g., 86% agreement), which suggests that structured, step-by-step protocols could be an effective way to increase consistency among visual analysts. Their protocol guides researchers through visual analysis procedures; however, it does not assist the researcher in synthesizing the six data characteristics within and across phases to make determinations about basic effects and experimental control, or in weighing conflicting data patterns to reach a judgment about functional relations. This introduces potential variability that could produce inconsistencies across different individuals and studies. The study by Wolfe et al. (2016) provides empirical evidence of this variability. They found that experts vary in the minimum number of effects they require to identify a functional relation. Some experts identified functional relations when there were three basic effects, but other experts identified a functional relation with only two basic effects. In other words, two experts may come to the same conclusions about the presence of basic effects in a particular graph, but they may translate that information into different decisions about the presence of a functional relation. Structured criteria that systematize the process of translating the within- and across-phase analysis into a decision about the overall functional relation may reduce this variability and improve agreement.

Researchers have developed structured criteria for the analysis of a specific type of SCR design used for a specific purpose. Hagopian et al. ( 1997 ) developed criteria for evaluating multielement graphs depicting the results of a functional analysis. The criteria consist of a step-by-step process that leads to a conclusion about the function of the behavior depicted in the graph. Hagopian et al. ( 1997 ) evaluated the effects of the criteria with three predoctoral interns in a multiple-baseline design and showed that participants’ agreement with the first author increased from around 50% in baseline to an average of 90% following training in the use of the structured criteria. The work of Hagopian et al. ( 1997 ) demonstrates that structured criteria can be developed for SCR that synthesize the user’s responses and lead directly to a conclusion about the data. Further, the use of the criteria improved agreement between raters and experts. However, the Hagopian et al. ( 1997 ) criteria apply only to multielement graphs used for a specific purpose and cannot be applied to other SCR designs.

To address the shortcomings of current practice and standards in visual analysis, we developed systematic, web-based protocols for the visual analysis of A-B-A-B and multiple-baseline design SCR data. Each protocol consists of a series of questions for the analyst to answer and synthesizes the analyst’s responses into a numerical rating of experimental control for the graph. We designed our protocols to emphasize the six data characteristics outlined in the WWC (2017) SCR standards (i.e., level, trend, variability, immediacy, overlap, and consistency) and to support single-case researchers in making decisions about data patterns based on these characteristics. Further, our protocols guide the researcher in systematically making decisions about data patterns within and across phases and tiers to make judgments about functional relations. In this paper we describe the protocols, illustrate their application to two SCR graphs, and discuss findings from an initial evaluation study.

Content and Structure of the Protocols

We developed two step-by-step protocols, one for A-B-A-B designs and one for multiple-baseline designs, to guide the analyst through the process of evaluating SCR data. The protocols are accessible as web-based surveys and as Google Sheets; both formats can be accessed from https://sites.google.com/site/scrvaprotocols/. Each protocol consists of a series of questions with dichotomous response options (i.e., yes or no) about each phase and phase contrast in the design. The questions in each protocol are based on current published standards for SCR (Kratochwill et al., 2013), as well as guidelines for visual analysis published in textbooks on SCR (e.g., Cooper, Heron, & Heward, 2007; Kazdin, 2011; Ledford & Gast, 2018). Table 1 lists the relevant sources that support the inclusion of the questions in each protocol and also provides evidence of the protocols’ content validity. Each question in the protocols includes instructions and graphic examples illustrating potential “yes” and “no” responses. In the web-based survey, these instructions appear when the user hovers over a question. In Google Sheets, the instructions are accessed by clicking on a link in the spreadsheet.

Table 1 Alignment of protocol content with published recommendations for visual analysis

X = item is referenced in source

The basic process for assessing each phase using the protocols includes examining both within- and between-phase data patterns (Kratochwill et al., 2013). First, the protocol prompts the visual analyst to evaluate the stability of the data within a given phase. Second, if there is a predictable pattern, the visual analyst projects the trend of the data into the subsequent phase and determines whether the level, trend, or variability of the data in this subsequent phase differs from the pattern predicted from the previous phase. If there is a change in the data between the two phases, then the analyst determines whether that change was immediate and measures the data overlap between the two phases. If there is not a change between the two phases, the analyst is directed to proceed to the next phase contrast. If multiple data paths are depicted on an A-B-A-B or multiple-baseline design graph, the data paths typically represent different dependent variables. In these cases, each data path should be analyzed with a separate application of the protocol to determine the presence of a functional relation between the independent variable and each dependent variable.
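The projection step in this process can be illustrated with a short sketch. The code below fits a trend to a baseline phase, projects it into the next phase, and flags whether most of the subsequent data depart from that projection. The tolerance envelope, function name, and data are illustrative assumptions, not values taken from the protocols.

```python
# Illustrative sketch of projecting a baseline trend into the next phase and
# asking whether the new data differ from the predicted pattern.
import numpy as np

def differs_from_projection(baseline, intervention, tolerance=1.0):
    """Return True if most intervention points fall outside the projected envelope."""
    yb = np.asarray(baseline, dtype=float)
    yi = np.asarray(intervention, dtype=float)
    xb = np.arange(len(yb))
    xi = np.arange(len(yb), len(yb) + len(yi))

    slope, intercept = np.polyfit(xb, yb, 1)
    predicted = slope * xi + intercept              # pattern projected from baseline
    spread = np.std(yb - (slope * xb + intercept))  # baseline variability around its trend

    outside = np.abs(yi - predicted) > tolerance * spread
    return bool(outside.mean() > 0.5)  # "yes" if most points depart from the prediction

print(differs_from_projection([8, 7, 7, 6], [3, 2, 2, 1, 1]))  # hypothetical data
```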

The protocols are response guided (i.e., responsive to the analyst’s input) and route the analyst through the process based on responses to previous questions. For example, if there are not sufficient data in the baseline phase to predict the future pattern of behavior, then the analyst cannot project the trend of the baseline data into the intervention phase to evaluate whether the data changed from the predicted pattern. In this case, the protocol skips ahead to questions about the next phase. Likewise, if the analyst responds that there is not a change in the dependent variable from one phase to the next, the protocol skips questions about immediacy and overlap, which are not relevant if the data did not change. The protocols are dynamic—some questions act as gatekeepers, making other questions available or unavailable based on the user’s response.
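The gatekeeper logic can be sketched with a simple data structure in which each question names the question that must be answered “yes” before it becomes available. The question wording, identifiers, and routing rules below are hypothetical and only illustrate the kind of skip logic described above.

```python
# Hypothetical sketch of response-guided routing; not the protocols' actual items.
QUESTIONS = [
    {"id": "A1_stable",      "text": "Is the first baseline phase stable enough to predict?", "requires": None},
    {"id": "A1B1_effect",    "text": "Do the B1 data differ from the pattern projected from A1?", "requires": "A1_stable"},
    {"id": "A1B1_immediate", "text": "Was the change immediate?", "requires": "A1B1_effect"},
    {"id": "A1B1_overlap",   "text": "Is overlap between A1 and B1 30% or less?", "requires": "A1B1_effect"},
]

def administer(questions, answer):
    """Ask each question only if its gatekeeper question was answered 'yes'."""
    responses = {}
    for q in questions:
        gate = q["requires"]
        if gate is not None and not responses.get(gate, False):
            continue  # skip: the gatekeeper was answered "no" (or was itself skipped)
        responses[q["id"]] = answer(q)
    return responses

# A rater who sees a stable baseline but no basic effect is never asked about
# immediacy or overlap for that contrast.
print(administer(QUESTIONS, lambda q: q["id"] == "A1_stable"))
```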

Unlike other systematic guidelines for visual analysis (e.g., Maggin et al., 2013 ), the protocols generate an experimental control score for the graph based on the analyst’s responses to the questions. Specific questions in the protocols have weighted values based on their importance to demonstrating a functional relation, and the sum of these values produces the experimental control score for the graph. Scores generated by the protocols range from 0 (no functional relation) to 5 (functional relation with large behavioral change), with 3 being the minimum score for evidence of a functional relation. Published guidelines for the analysis of SCR suggest that three basic effects, or changes in the level, trend, or variability of the dependent variable from one phase to the next, are required to demonstrate a functional relation (Barton et al., 2018 ; Kratochwill et al., 2013 ). Therefore, the questions pertaining to changes between adjacent phases (i.e., phase contrast questions) have a value of 1 in the protocols. As a result, a study depicting three basic effects would earn a minimum score of 3, which is the minimum criterion for demonstrating a functional relation based on our proposed interpretation guidelines.

Other questions may not be critical to the demonstration of a functional relation but strengthen the evidence of a functional relation if one is present. For example, depending on the nature of the dependent variable, it may not be essential that the data change immediately after the introduction of the intervention (i.e., within 3–5 data points) to demonstrate a functional relation (Kazdin, 2011 ). However, an immediate change increases the analyst’s confidence that the intervention caused the change in the dependent variable. Therefore, questions about the immediacy of the effect have a smaller weight (e.g., 0.25; A-B-A-B protocol) compared to questions about identifying basic effects.

Similarly, minimal overlap between the data paths in adjacent phases is generally considered desirable but neither necessary nor always meaningful (e.g., data might have substantial overlap but contrasting trends) for demonstrating functional relations (Barton et al., 2018). Therefore, the overlap item also has a smaller weight (e.g., 0.25; A-B-A-B protocol). Phase contrasts must have 30% or fewer overlapping data points to receive points for this item in the protocol. This criterion is based on the interpretive guidelines proposed for the percentage of nonoverlapping data (Scruggs & Mastropieri, 1998), which suggest that at least 70% nonoverlapping data between phases indicates an effective intervention (note that the protocol asks the analyst to calculate the inverse, or the amount of overlapping data, and thus the criterion is set at 30%).

In the multiple-baseline design protocol, we assigned the questions pertaining to vertical analysis a negative value. Vertical analysis refers to the examination of the data in tiers that remain in baseline when the intervention is introduced to a previous tier (Horner, Swaminathan, Sugai, & Smolkowski, 2012 ). Other sources refer to this same feature as verification of the change in the previous tier (Cooper et al., 2007 ). If the baseline data for any tiers still in baseline change markedly when the intervention is introduced to another tier, this indicates a potential alternative explanation for any observed change (e.g., behavioral covariation, history, maturation) and decreases confidence that the intervention was causally related to the change in the dependent variable. This question has a negative value because if the analyst answers “yes,” it detracts from the overall experimental control score for the graph.
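Taken together, the weighting scheme described in the preceding paragraphs can be sketched as a simple roll-up of dichotomous responses into the 0–5 experimental control score. The basic-effect weight of 1 and the 0.25 weights for immediacy and overlap come from the text; the exact negative weight for vertical analysis and all item names are assumptions made for illustration.

```python
# Sketch of aggregating yes/no protocol responses into an experimental control score.
WEIGHTS = {
    "basic_effect": 1.0,      # each phase contrast showing a change
    "immediacy": 0.25,        # change within the first 3-5 data points
    "overlap": 0.25,          # 30% or less overlap between adjacent phases
    "vertical_change": -1.0,  # assumed penalty: a baseline tier changed when it should not have
}

def experimental_control_score(responses):
    """`responses` maps item type -> list of yes/no answers across contrasts or tiers."""
    score = sum(
        WEIGHTS[item] * sum(bool(r) for r in answers)
        for item, answers in responses.items()
    )
    return max(0.0, min(5.0, score))

responses = {
    "basic_effect": [True, True, True],   # three basic effects
    "immediacy": [True, True, False],
    "overlap": [True, False, True],
    "vertical_change": [False, False],
}
score = experimental_control_score(responses)
print(score, "functional relation" if score >= 3 else "no functional relation")
```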

Although we have proposed interpretation guidelines for the scores generated by the protocols, the score should be interpreted within the context of the study’s overall methodological quality and rigor; if the study has strong internal validity, minimizing plausible alternative explanations, then the score produced by the protocol can indicate the presence and strength of a functional relation. However, if the study is poorly designed or executed or is missing key features (e.g., interobserver agreement [IOA], procedural fidelity), or if key features are insufficient to rule out threats to internal validity (e.g., IOA is less than 80%, missing data), then the score produced by the protocol may be misleading because the lack of methodological rigor limits interpretation of the data.

Application of the Protocols

Although we cannot demonstrate the dynamic and responsive nature of the protocols in this article, we will walk through two examples to illustrate how they are applied to SCR data. Both of the graphs used to illustrate the application of the protocols were used in our reliability and validity evaluations of the protocols. We encourage the reader to access the protocols in one or both formats to explore the content, structure, routing, and scoring that will be illustrated in the next sections.

A-B-A-B Design Protocol

Figure 1 depicts a hypothetical A-B-A-B graph showing the number of talk-outs within a session, and Fig. 2 shows the completed protocol for this graph. Use of the protocol involves comparing the first baseline phase to the first treatment phase (A1 to B1), the first treatment phase to the second baseline phase (B1 to A2), and the second baseline phase to the second treatment phase (A2 to B2). We also compare the data patterns in similar phases (i.e., A1 to A2 and B1 to B2).

Fig. 1 Sample A-B-A-B graph

Fig. 2 Completed protocol for sample A-B-A-B graph

The protocol starts by prompting the visual analyst to examine the first baseline phase. There are three data points, and those data are stable—we predicted that if baseline continued, the data would continue to decrease—so we answered “yes” to the first question. The second question asks us to evaluate the first treatment phase in the same manner, and given the number of data points and the overall decreasing trend, we answered “yes” to this question as well. Next, we are directed to project the trend of the first baseline phase into the first treatment phase and evaluate whether the level, trend, or variability of the treatment data is different from our prediction. The level is different from our prediction, so we answered “yes,” identifying a basic effect between these phases. The identification of a basic effect for this phase contrast makes the next two questions available.

Regarding immediacy, the level of the data did change from the last three data points in baseline to the first three data points in treatment, so we selected “yes.” To identify the amount of overlap between the two phases, we drew a horizontal line extending from the highest baseline data point into the first treatment phase because the goal of the intervention was to increase the behavior. Next, we counted the number of data points in the first treatment phase that are the same or “worse” than this line. Whether “worse” data are higher or lower than the line will depend on the targeted direction of behavior change. In this case, the goal was to increase the behavior, so treatment data points that are the same as or below the line would be considered worse. There are no treatment data points below the line, so there are no overlapping data between these two phases. If there were data points below the line, we would divide the number of data points below the line by the total number of data points in the treatment phase to get the percentage of overlapping data. We answered “yes” because fewer than 30% of the data points overlap between the two phases.
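The overlap calculation just described can be expressed in a few lines. The 30% criterion comes from the protocol; the data values and function name are hypothetical, and the sketch assumes a contrast in which the goal is to increase the behavior.

```python
# Sketch of the overlap computation for a contrast where the goal is to increase behavior.
def percent_overlap_increase_goal(baseline, treatment):
    """Percentage of treatment points at or below the highest baseline point."""
    line = max(baseline)                                    # horizontal line from highest baseline point
    overlapping = sum(1 for point in treatment if point <= line)
    return 100 * overlapping / len(treatment)

baseline = [2, 3, 2]               # hypothetical baseline data
treatment = [5, 6, 7, 7, 8, 9]     # hypothetical treatment data
overlap = percent_overlap_increase_goal(baseline, treatment)
print(overlap, "yes" if overlap <= 30 else "no")  # answer to the overlap question
```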

The majority of the remaining A-B-A-B protocol involves answering this same series of questions about the remaining phases and phase contrasts; however, it is important to note that in the second phase contrast (i.e., the comparison from the first treatment phase to the second baseline phase), a basic effect would be demonstrated by a decrease in the number of talk-outs relative to our prediction from the treatment phase. Because the expected direction of behavior change is different for this particular phase contrast, the procedure for calculating overlapping data differs slightly as well (see instructions for this question in the protocol). The A-B-A-B protocol also includes two questions about the consistency of the data patterns across like phases. These questions involve examining the similarity of the level, trend, or variability of the data across (a) both baseline phases and (b) both treatment phases to evaluate if any of these characteristics are similar. For this graph, the data in the first baseline phase have a low level, little variability, and a decreasing trend. The data in the second baseline phase have a medium level, medium variability, and no clear trend. Therefore, we answered “no” to the question about consistency between the baseline phases. Based on our dichotomous responses to the questions in the protocol, the overall score for experimental control for this graph is 2.75, which does not provide evidence of a functional relation. To see answers and scoring for the complete protocol for this graph, as well as details about how the protocol routes the user through relevant questions based on responses, we encourage the reader to examine Fig. 2 in detail.

Multiple-Baseline Design Protocol

Similar to the A-B-A-B protocol, the multiple-baseline design protocol requires that the analyst examine each phase and phase contrast in the design. However, consistent with the logic of a multiple-baseline design, use of this protocol involves both comparing baseline to treatment for each tier (i.e., A to B) and determining if the introduction of the intervention was staggered in time across tiers and whether the dependent variable changed when and only when the intervention was applied (i.e., vertical analysis).
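A rough sketch of the vertical-analysis check might look like the following: it asks whether a tier still in baseline departs from its prior pattern in the sessions immediately after the intervention begins in an earlier tier. The window, threshold, and data are illustrative assumptions, not protocol values.

```python
# Minimal sketch (assumed, not from the protocols) of the vertical-analysis check.
import numpy as np

def baseline_changed(tier_baseline, intro_session, window=3, multiplier=2.0):
    """Did this tier's baseline change after the intervention began in an earlier tier?"""
    before = np.asarray(tier_baseline[:intro_session], dtype=float)
    after = np.asarray(tier_baseline[intro_session:intro_session + window], dtype=float)
    spread = np.std(before) if np.std(before) > 0 else 1.0
    # "Yes" if the post-introduction points depart markedly from the prior level.
    return bool(np.any(np.abs(after - before.mean()) > multiplier * spread))

# Tier 2 baseline stays flat after the intervention begins in Tier 1 at session 3.
print(baseline_changed([15, 12, 18, 14, 16, 13], intro_session=3))
```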

Figure 3 shows a hypothetical multiple-baseline design depicting the percentage of steps of a hygiene routine completed independently, and Fig. 4 is the completed protocol for this graph. The first question in the protocol involves the stability of the baseline data in the first tier. The phase does have three data points, but the variability of the data makes it difficult to project the overall pattern of the behavior, and as a result, we answered “no” to this question. This made the next four questions unavailable; if we cannot predict the future pattern of the baseline data, then we cannot project the trend into the treatment phase and make a confident determination about the presence of a basic effect. The next available question is about the stability of the baseline data in the second tier. This phase has more than three data points, and they are fairly stable around 10–20%, so we answered “yes.” Next, we looked at whether the baseline data in Tier 2 changed when the intervention began with Tier 1, which was after Session 3. The data in Tier 2 remain stable during and immediately after that session, so we answered “no” for this question. The next question asks if the treatment was introduced to Tier 2 after it was introduced to Tier 1; it was, so we answered “yes.” Had this question been answered “no,” the remaining questions for Tier 2 would become unavailable.

Fig. 3 Sample multiple-baseline design graph

Fig. 4 Completed protocol for sample multiple-baseline design graph

We continue by examining the stability of the Tier 2 treatment phase; there are more than three data points and a clear upward trend, so we answered “yes.” Projecting the trend of the baseline phase into the treatment phase for Tier 2, we see there is a change in both the level and trend of the treatment data compared to our prediction from baseline, so we answered “yes.” That change was immediate (i.e., within the first 3–5 data points of treatment), so we answered “yes” to the next question about immediacy. Calculating overlap as previously described, we calculated 13% overlap between the two phases (1 overlapping data point out of 8 total treatment data points), which is less than 30%, so we answered “yes.” The last question about this tier asks us to examine the similarity of data patterns between the treatment phases for Tier 1 and Tier 2. The tiers have similar levels, trends, and variability, so our response was “yes.”

The remainder of the multiple-baseline design protocol includes these same questions about the third tier in the design. Notably, the Tier 3 baseline data did change after Session 3, when the treatment was introduced to Tier 1, so we answered “yes” to the question about vertical analysis for Tier 3. Based on our dichotomous responses to the questions in the protocol, our overall score for experimental control for this graph was 2.32. To see answers and scoring for the complete protocol for this graph, as well as details about how the protocol routes the user through relevant questions based on responses, examine Fig. 4 in detail.

Evaluation of the Protocols

We conducted an initial evaluation of the reliability and validity of the protocols. We evaluated the reliability of the protocols by comparing the interrater agreement produced by the protocols to interrater agreement produced by a visual analysis rating scale. We evaluated the validity of the protocols by comparing scores produced by the protocols to scores assigned to the graphs by expert visual analysts using a rating scale.

Reliability Evaluation

To evaluate the reliability of the protocols, we recruited 16 attendees at an international early childhood special education conference held in a large city in the Southeastern United States. Attendees had to have taken a graduate-level course in SCR to participate in the evaluation. Nine participants reported that their terminal degree was a doctorate and designated their primary roles as university faculty or researchers, and seven reported that their terminal degree was a master’s and indicated that they were students. Participants were randomly assigned to the rating scale group ( n = 8) or the protocol group ( n = 8) and were split fairly evenly between the two groups based on highest degree earned (e.g., the protocol group consisted of three participants with doctorates and five with master’s degrees).

Each of the three authors independently used the protocols with 48 randomly selected published SCR graphs (24 A-B-A-B; 24 multiple-baseline design) during the iterative development process. From this set, we identified four A-B-A-B graphs and four multiple-baseline graphs with (a) ratings across the range of the protocol (i.e., 0–5) and (b) differences of 0.5 to 1.5 in our expert ratings based on our independent applications of the protocol. These criteria were used to ensure that we included diverse graphs in terms of both (a) the presence and absence of basic effects and functional relations and (b) graph difficulty (e.g., graphs with data with more variability or smaller changes might be difficult to visually analyze). We quantified difficulty using the range of scores produced by our independent applications of the protocol, such that graphs with more disparate scores between the authors were considered more difficult.
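As a sketch of this selection rule, the snippet below computes difficulty as the range of the three independent protocol scores for each graph and keeps graphs whose scores differ by 0.5 to 1.5 points. The graph identifiers and scores are hypothetical; only the 0.5–1.5 window comes from the text.

```python
# Illustrative sketch of the graph-selection rule; data are hypothetical.
expert_scores = {           # three independent protocol scores per graph
    "graph_01": [4.5, 4.0, 4.25],
    "graph_02": [1.0, 2.5, 2.0],
    "graph_03": [3.0, 3.0, 3.25],
}

def difficulty(scores):
    """Difficulty operationalized as the range of the authors' scores."""
    return max(scores) - min(scores)

eligible = {g: difficulty(s) for g, s in expert_scores.items() if 0.5 <= difficulty(s) <= 1.5}
print(eligible)  # graphs whose expert scores disagree by 0.5-1.5 points
```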

All study materials (i.e., graphs, rating scale, protocol) were uploaded into an online survey platform, and participants accessed the survey from the web browser on their personal laptop or tablet. All participants took a pretest on which they scored the eight graphs using a rating scale from 0 to 5. All points on the rating scale were defined as illustrated in Table 2, and the terms basic effect and functional relation were defined on each page of the pretest. Then, based on their random group assignments, participants rated the same eight graphs using either the rating scale or the systematic protocols.

Table 2 Visual analysis rating scale

To evaluate interrater agreement, we calculated the ICC (Shrout & Fleiss, 1979 ) on the scores produced by the rating scale and the protocols (i.e., 0–5). The ICC is an index of agreement across multiple judges making multiple decisions that takes into account the magnitude of difference between judges’ decisions, unlike other agreement indices that are calculated based on exact agreement (Hallgren, 2012 ). Suggested interpretation guidelines for ICCs are as follows: Values below .40 are considered poor, values between .41 and .59 are considered fair, values between .60 and .74 are considered good, and values at .75 and above are considered excellent (Cicchetti, 1994 ). We calculated the ICC for each group at each time point, which enabled us to evaluate (a) if the use of the protocols improved agreement compared to the use of the rating scale and (b) if we could attribute improvements in agreement to the protocols rather than to the evaluation of the same graphs a second time. We collected social validity data from the participants regarding the utility of each method for understanding the data and the extent to which each reflected how the analyst would typically analyze SCR data. We also asked the protocol group which method (i.e., rating scale or protocol) they would be more likely to use to conduct visual analysis and to teach others to conduct visual analysis.
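For readers who want to reproduce this kind of analysis, the sketch below computes one common ICC form, ICC(2,1) (two-way random effects, absolute agreement, single rater) following Shrout and Fleiss (1979). The specific ICC form used in the study is not stated here, so this choice, along with the hypothetical ratings, is an assumption.

```python
# Sketch of ICC(2,1) computed from an n_graphs x n_raters matrix of scores.
import numpy as np

def icc_2_1(ratings):
    """Two-way random effects, absolute agreement, single-rater ICC."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)   # per-graph means
    col_means = x.mean(axis=0)   # per-rater means

    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_error = np.sum((x - grand) ** 2) - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical ratings: 8 graphs (rows) x 4 raters (columns), scored 0-5.
ratings = np.array([
    [4, 4, 5, 4],
    [1, 2, 1, 1],
    [3, 3, 4, 3],
    [0, 1, 0, 1],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
    [3, 4, 3, 4],
    [1, 1, 2, 1],
])
print(round(icc_2_1(ratings), 2))
```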

Figure 5 shows the pretest and posttest ICCs for each group. Both groups had similar interrater agreement at pretest when using the rating scale (rating scale group ICC = .60; protocol group ICC = .58). However, the agreement of the protocol group improved at posttest (ICC = .78), whereas the agreement of the rating scale group remained relatively stable (ICC = .63). Based on the proposed guidelines for interpreting ICCs (Cicchetti, 1994), the agreement of the protocol group improved from fair at pretest when using the rating scale to excellent at posttest when using the protocol.

Fig. 5 Intraclass correlation coefficients for the rating scale group (n = 8) and the protocol group (n = 8) at pretest and posttest

We also examined percentage agreement across protocol questions, displayed in Table 3, to identify the types of questions that produced the most disagreement among participants. Participants disagreed most often about questions pertaining to phase stability, followed by questions about the presence of basic effects. Questions about immediacy, overlap, consistency, and staggered treatment introduction (multiple-baseline designs) produced the highest agreement. Most participants in the protocol group rated the protocol as easy or very easy to understand (n = 6), whereas half as many participants in the rating scale group reported the same about the rating scales (n = 3). Similarly, most participants who used the protocol rated it as either mostly or very reflective of how they would typically conduct visual analysis, whereas one participant in the rating scale group reported the same about the rating scale. Finally, almost all participants in the protocol group reported that they would choose the protocol over the rating scale to conduct visual analysis (n = 6) and to teach others to conduct visual analysis (n = 7).

Table 3 Percentage agreement on protocols by question type across graphs

n refers to the number of questions per graph

Validity Evaluation

We also evaluated the validity of the protocols by comparing decisions produced by them to decisions made by expert visual analysts. We recruited eight researchers with expertise in SCR, which we defined as having a doctorate and being an author on at least five SCR publications (Wolfe et al., 2016), to participate. All experts identified their current position as faculty member or researcher and reported that they were an author on an average of 21 SCR publications (range = 5–65; median = 10).

Using the graphs from the reliability evaluation, we asked the experts (a) to make a dichotomous judgment about whether there was a functional relation and (b) to use the rating scale in Table 2 for each graph. Experts accessed the materials from a link sent via e-mail, and we allowed 10 days for experts to participate in the validity evaluation. We told the experts that we were evaluating the validity of systematic protocols for visual analysis, but they did not have knowledge of or access to the protocols.

To evaluate the validity of the protocols, we calculated the percentage of experts who said there was a functional relation and the percentage of participants whose protocol score converted to a functional relation (i.e., ≥3) for each graph. Although we asked the experts to answer “yes” or “no” about the presence of a functional relation and then use the rating scale for each graph, the experts’ dichotomous decisions always aligned with their score on the rating scale. There was some disagreement among the experts on their ratings and dichotomous decisions, so we calculated the mean score of the experts using the rating scale and compared it to the mean score of the participants using the protocols.

The ICC for the experts using the rating scale was .73, which is considered good according to interpretive guidelines for the statistic. Table 4 displays the percentage of experts who said there was a functional relation for each graph and the percentage of participants whose protocol score indicated a functional relation for each graph, as well as the mean scores for each graph for each group. These results indicate similar levels of agreement among experts using the rating scale and among participants using the protocol.

Table 4 Percentage agreement and mean ratings for experts and protocol group

Figure 6 shows the mean scores for each graph for both groups of raters. Graphs 1–4 were multiple-baseline designs, and Graphs 5–8 were A-B-A-B designs. Across all graphs, the correlation between the mean scores produced by the experts using the rating scale and by the participants using the protocol was strong (r = 0.83). The mean difference between the expert rating scale score and the participant protocol score was 0.5, with a range of 0–1.2. For most of the graphs (63%), the difference between the scores was less than 0.5. Although the average difference score was 0.5 for both multiple-baseline designs and A-B-A-B designs, there was a larger range of difference scores for the multiple-baseline designs (0–1.2) than for the A-B-A-B designs (0.3–0.7). We dichotomized the mean scores for each group for each graph to obtain one “decision” for each group with respect to the presence or absence of a functional relation for the graph. The mean decision produced by the experts using the rating scale agreed with the mean decision produced by the participants using the protocol for all eight graphs. As shown in Fig. 6, the mean participant protocol score tended to be below the mean expert rating scale score for multiple-baseline designs, but the reverse was true for A-B-A-B designs. The lower score for the use of the protocol for multiple-baseline designs may be due to the question on vertical analysis, which subtracts a point if the participant indicated that the data in a tier that was still in baseline changed when the intervention was introduced to a previous tier.
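The comparison described above can be sketched as follows: compute the correlation between the two sets of per-graph means, the mean absolute difference, and the agreement of the dichotomized decisions using the same ≥3 cutoff. The per-graph means in the sketch are hypothetical placeholders, not the study's values; only the cutoff and the general procedure come from the text.

```python
# Sketch of the validity comparison; the per-graph means below are hypothetical.
import numpy as np

expert_means = np.array([2.1, 2.8, 3.4, 1.5, 2.0, 2.6, 3.1, 2.4])     # rating scale
protocol_means = np.array([1.6, 2.3, 3.0, 1.2, 2.3, 2.9, 3.5, 2.7])   # protocol

r = np.corrcoef(expert_means, protocol_means)[0, 1]        # strength of correspondence
mean_diff = np.mean(np.abs(expert_means - protocol_means)) # average score difference

expert_decisions = expert_means >= 3       # dichotomized: functional relation or not
protocol_decisions = protocol_means >= 3
agreement = np.mean(expert_decisions == protocol_decisions)

print(round(r, 2), round(mean_diff, 2), agreement)
```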

Fig. 6 Mean scores for each graph on the rating scale (expert visual analysis) and on the protocol (participant visual analysis). The dotted line indicates the criterion for demonstrating a functional relation

Further Development and Evaluation of the Protocols

Visual analysis of SCR data is the primary evaluative method to identify functional relations between experimental variables (Horner et al., 2005; Kazdin, 2011). However, visual analysis procedures are not standardized, subjective judgments about behavior change and magnitude of effects can be idiosyncratic, and interpretations often result in low agreement across analysts, all of which have led to criticism of the method (Kazdin, 2011; Lieberman, Yoder, Reichow, & Wolery, 2010). We developed our protocols to address these issues and provide standardized and systematic procedures to guide visual analysts through the comprehensive processes involved in making judgments about two common SCR designs: A-B-A-B and multiple baseline. Our initial evaluation of the protocols indicates that they improved reliability among visual analysts from fair to excellent, and the correspondence with expert visual analysis provides evidence of criterion validity. In addition, participants reported that they found the protocols easy to understand and navigate, supporting the social validity of the tools. These preliminary results are promising and highlight several areas for future research.

First, we plan to continue to examine the protocols’ reliability in a number of ways. Our results support the use of transparent and consistent visual analysis procedures for improving reliability. However, we included only a small sample of participants, which limits the interpretation of our results. Specifically, the limited number of participants in each group may affect the accuracy of the ICCs, and we were unable to statistically compare the ICCs between the two groups to identify whether the differences were likely due to chance. Evaluating the protocols across a larger pool of raters will increase the precision of our reliability estimates and provide important information about the utility of the protocols.

In addition, we included only eight graphs in this investigation, and only two of these received mean scores at or above 3, which is the cutoff for demonstrating a functional relation using either method. Although we did not purposefully select graphs that did not depict a functional relation, we did attempt to include graphs with a range of difficulty and may have eliminated graphs with large, obvious effects as a result. Thus, this evaluation provides more compelling evidence of the reliability and validity of the tool for graphs that do not demonstrate a functional relation than for those that do. Additional investigations of the protocols with graphs that demonstrate functional relations are warranted. The application of the protocols to a larger sample of graphs will allow us to (a) examine the validity of the scoring procedures for additional and varied data patterns and (b) evaluate the appropriateness of individual item weights and the proposed interpretation guidelines for the overall experimental control score. The scores produced by the protocols could also be compared to other analytical approaches, such as statistical analyses, to expand on the evaluation of the protocols’ validity.

In future investigations, we plan to compare the protocols to other methods of visual analysis with similar sensitivity. In the current study, we compared the protocols, which can produce scores with decimals (e.g., 2.5), to a rating scale, which could only produce integer-level scores (e.g., 2). This differential sensitivity may have affected our reliability estimates. There is some evidence that correlation coefficients increase but percentage agreement decreases when comparing reliability of a more sensitive rubric to a less sensitive version of the same rubric (Penny, Johnson, & Gordon, 2000a, 2000b). However, because these studies compared different versions of the same measure, it is not clear that their findings apply to the current results given the distinct structures of the protocols and the rating scale. Nonetheless, we could mitigate this factor in future studies by allowing raters using the rating scale to select a score on a continuum from 0 to 5 (i.e., including decimals).

Second, we developed the protocols to be comprehensive, transparent, and broadly accessible. We intend for visual analysts at any level of training to be able to use the protocols to make reliable and sound decisions about data patterns and functional relations. Thus, we plan to continue to test agreement across different groups, including single-case researchers with expertise in visual analysis, practitioners, and students in SCR coursework who are learning to conduct visual analysis.

Third, the usability of the protocols is critical. The results of the social validity survey suggest that participants found the protocols to be user-friendly; however, all participants in the evaluation had already completed a course on SCR. Although even expert analysts continually refine their visual analysis skills, we designed the protocols to support novices who are still acquiring visual analysis knowledge and skills. Future research should involve testing the use of the protocols as an instructional tool for individuals who are learning how to visually analyze SCR data.

Fourth, we plan to continue the iterative development of the protocols. This pilot investigation identified questions that were likely to produce discrepant responses among users; future versions of the protocols could address this by providing more explicit instructions for how to examine the data to answer those questions. Additional examples embedded in the instructions for these questions could also improve agreement. We plan to update the protocols as additional information is published on the process of visual analysis and on the variables that influence agreement among visual analysts. For example, Barton et al. ( 2018 ) recommend that visual analysts examine the scaling of the y -axis to determine whether it is appropriate for the dependent variable and, in multiple-baseline designs, whether it is consistent across tiers. This initial step of the visual analysis process could be included in the next version of the protocol to ensure that it remains up-to-date with current recommended practices.

In conclusion, there is a clear need for standardized visual analysis procedures that improve consistency and agreement across visual analysts with a range of professional roles (e.g., researchers, practitioners). We developed and evaluated protocols for two common SCR designs and plan to use an iterative process to continue to test and refine our protocols to improve their reliability, validity, and usability. Improved consistency of visual analysis also might improve SCR syntheses, which is important for ensuring aggregate findings from SCR can be used to identify evidence-based practices.

Compliance with Ethical Standards

Katie Wolfe declares that she has no conflict of interest. Erin E. Barton declares that she has no conflict of interest. Hedda Meadan declares that she has no conflict of interest.

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The University of South Carolina Institutional Review Board approved the procedures in this study.

Informed consent was obtained from all individual participants included in the study.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

  • Barton EE, Ledford JR, Lane JD, Decker J, Germansky SE, Hemmeter ML, Kaiser A. The iterative use of single case research designs to advance the science of EI/ECSE. Topics in Early Childhood Special Education. 2016;36(1):4–14. doi: 10.1177/0271121416630011.
  • Barton EE, Lloyd BP, Spriggs AD, Gast DL. Visual analysis of graphic data. In: Ledford JR, Gast DL, editors. Single-case research methodology: Applications in special education and behavioral sciences. New York, NY: Routledge; 2018. pp. 179–213.
  • Barton EE, Meadan H, Fettig A. Comparison of visual analysis, non-overlap methods, and effect sizes in the evaluation of parent implemented functional assessment based interventions. Research in Developmental Disabilities. 2019;85:31–41. doi: 10.1016/j.ridd.2018.11.001.
  • Brossart DF, Parker RI, Olson EA, Mahadevan L. The relationship between visual analysis and five statistical analyses in a simple AB single-case research design. Behavior Modification. 2006;30:531–563. doi: 10.1177/0145445503261167.
  • Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment. 1994;6(4):284–290. doi: 10.1037/1040-3590.6.4.284.
  • Cooper JO, Heron TE, Heward WL. Applied behavior analysis. St. Louis: Pearson Education; 2007.
  • DeProspero A, Cohen S. Inconsistent visual analyses of intrasubject data. Journal of Applied Behavior Analysis. 1979;12(4):573–579. doi: 10.1901/jaba.1979.12-573.
  • Fisch GS. Visual inspection of data revisited: Do the eyes still have it? The Behavior Analyst. 1998;21(1):111–123. doi: 10.1007/BF03392786.
  • Furlong MJ, Wampold BE. Intervention effects and relative variation as dimensions in experts’ use of visual inference. Journal of Applied Behavior Analysis. 1982;15(3):415–421. doi: 10.1901/jaba.1982.15-415.
  • Hagopian LP, Fisher WW, Thompson RH, Owen-DeSchryver J, Iwata BA, Wacker DP. Toward the development of structured criteria for interpretation of functional analysis data. Journal of Applied Behavior Analysis. 1997;30(2):313–326. doi: 10.1901/jaba.1997.30-313.
  • Hallgren KA. Computing inter-rater reliability for observational data: An overview and tutorial. Tutorials in Quantitative Methods for Psychology. 2012;8(1):23–34. doi: 10.20982/tqmp.08.1.p023.
  • Hitchcock JH, Horner RH, Kratochwill TR, Levin JR, Odom SL, Rindskopf DM, Shadish WM. The What Works Clearinghouse single-case design pilot standards: Who will guard the guards? Remedial and Special Education. 2014;35(3):145–152. doi: 10.1177/0741932513518979.
  • Horner RH, Carr EG, Halle J, McGee G, Odom SL, Wolery M. The use of single-subject research to identify evidence-based practices in special education. Exceptional Children. 2005;71:165–179.
  • Horner RH, Spaulding SA. Single-subject designs. In: Salkind NE, editor. The encyclopedia of research design. Thousand Oaks: Sage Publications; 2010. pp. 1386–1394.
  • Horner RH, Swaminathan H, Sugai G, Smolkowski K. Considerations for the systematic analysis and use of single-case research. Education and Treatment of Children. 2012;35(2):269–290. doi: 10.1353/etc.2012.0011.
  • Kahng SW, Chung KM, Gutshall K, Pitts SC, Kao J, Girolami K. Consistent visual analyses of intrasubject data. Journal of Applied Behavior Analysis. 2010;43(1):35–45. doi: 10.1901/jaba.2010.43-35.
  • Kazdin AE. Single-case research designs: Methods for clinical and applied settings. 2nd ed. New York: Oxford University Press; 2011.
  • Kratochwill TR, Hitchcock JH, Horner RH, Levin JR, Odom SL, Rindskopf DM, Shadish WR. Single-case intervention research design standards. Remedial and Special Education. 2013;34:26–38. doi: 10.1177/0741932512452794.
  • Ledford JR, Gast DL. Single case research methodology: Applications in special education and behavioral sciences. New York: Routledge; 2018.
  • Lieberman RG, Yoder PJ, Reichow B, Wolery M. Visual analysis of multiple baseline across participants graphs when change is delayed. School Psychology Quarterly. 2010;25(1):28–44. doi: 10.1037/a0018600.
  • Maggin DM, Briesch AM, Chafouleas SM. An application of the What Works Clearinghouse standards for evaluating single-subject research: Synthesis of the self-management literature base. Remedial and Special Education. 2013;34(1):44–58. doi: 10.1177/0741932511435176.
  • Penny J, Johnson RL, Gordon B. The effect of rating augmentation on inter-rater reliability: An empirical study of a holistic rubric. Assessing Writing. 2000a;7(2):143–164. doi: 10.1016/S1075-2935(00)00012-X.
  • Penny J, Johnson RL, Gordon B. Using rating augmentation to expand the scale of an analytic rubric. Journal of Experimental Education. 2000b;68(3):269–287. doi: 10.1080/00220970009600096.
  • Scruggs TE, Mastropieri MA. Summarizing single-subject research: Issues and applications. Behavior Modification. 1998;22(3):221–242. doi: 10.1177/01454455980223001.
  • Shadish WR. Statistical analyses of single-case designs: The shape of things to come. Current Directions in Psychological Science. 2014;23(2):139–146. doi: 10.1177/0963721414524773.
  • Shadish WR, Sullivan KJ. Characteristics of single-case designs used to assess intervention effects in 2008. Behavior Research Methods. 2011;43(4):971–980. doi: 10.3758/s13428-011-0111-y.
  • Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin. 1979;86(2):420–428. doi: 10.1037/0033-2909.86.2.420.
  • Smith JD. Single-case experimental designs: A systematic review of published research and current standards. Psychological Methods. 2012;17:510–550. doi: 10.1037/a0029312.
  • What Works Clearinghouse. Procedures and standards handbook (Version 4.0). 2017. Retrieved from https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_standards_handbook_v4.pdf. Accessed 9 Jan 2018.
  • Wolfe K, Seaman MA, Drasgow E. Interrater agreement on the visual analysis of individual tiers and functional relations in multiple baseline designs. Behavior Modification. 2016;40(6):852–873. doi: 10.1177/0145445516644699.

The Analysis of Single-Case Research Data: Current Instructional Practices

  • Original Paper
  • Published: 30 June 2020
  • Volume 31, pages 28–42 (2022)


  • Katie Wolfe, ORCID: orcid.org/0000-0002-6298-1815
  • Meka N. McCammon


Visual analysis is the predominant method of analysis in single-case research (SCR). However, most research suggests that agreement between visual analysts is poor, which may be due to a lack of clear guidelines and criteria for visual analysis, as well as variability in how individuals are trained. We developed a survey containing questions about the content and methods used to teach visual and statistical analysis of SCR data in verified course sequences (VCS) and distributed it via the VCS Coordinator Listserv. Thirty-seven instructors completed the survey. Results suggest that there is variability across instructors in some fundamental aspects of data analysis (e.g., number of effects required for a functional relation) but a great deal of consistency in others (e.g., emphasizing visual over statistical analysis). We discuss our results along with their implications both for teaching students to analyze SCR data and for conducting additional research on behavior-analytic training programs.



Improving the Methodological Quality of Single-Case Experimental Design Meta-Analysis

Laleh Jamshidi 1*, Lies Declercq 1, John M. Ferron 2, Mariola Moeyaert 3, S. Natasha Beretvas 4, and Wim Van den Noortgate 1

1 Faculty of Psychology and Educational Sciences & imec-Itec, KU Leuven (University of Leuven), Belgium
2 University of South Florida, Tampa, Florida, USA
3 University at Albany – State University of New York, New York, USA
4 University of Texas at Austin, Texas, USA

Single-case experimental design (SCED) studies are becoming more prevalent in a variety of fields and are increasingly included in meta-analyses (MAs) and systematic reviews (SRs). Because MA/SR conclusions are used as an evidence base for decisions in practice and policy, the methodological quality and reporting standards of SRs/MAs are of the utmost importance. One way to improve the reliability and validity of SCED MAs, and thereby give practitioners and clinicians more confidence in MA/SR findings when deciding on a particular intervention, is to apply high-quality standards when conducting and reporting MAs/SRs. In the current study, we briefly review some existing tools for assessing the quality of SRs/MAs that may also be helpful for SCED MAs. These tools and guidelines can help meta-analysts, reviewers, and users organize and evaluate the quality and reliability of the findings.

To investigate a certain intervention effect, the classic research design is a group-comparison experimental design. In such designs, participants are randomly assigned to either intervention or control groups, and the means of one or more dependent variables are compared to assess the effectiveness of the intervention. To obtain reliable effect size estimates and reach an acceptable level of statistical power, these designs require a large sample of participants. Single-case experimental designs (SCEDs) are alternative research designs that do not require many participants (or cases) and are therefore well suited to studying rare phenomena, e.g., specific diseases or disabilities 1–3 . In these designs, outcomes of interest are measured repeatedly for one or multiple cases under at least two conditions (typically a baseline or control phase followed by an intervention phase). Within each case, the measurements are compared across conditions or phases to investigate whether introducing the intervention has a causal effect on one or more outcomes 2,4–7 . SCEDs are frequently used in a variety of fields, such as psychology and educational sciences, to evaluate the effectiveness of interventions of interest 7–11 .
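
To make the within-case logic concrete, here is a minimal sketch with entirely hypothetical A-phase (baseline) and B-phase (intervention) data; the two summaries it computes (a change in mean level and a basic nonoverlap proportion) are illustrative choices, not specific methods prescribed in this review.

```python
# Minimal sketch with hypothetical data: comparing repeated measurements across
# a baseline (A) phase and an intervention (B) phase for a single case.
baseline = [2, 3, 2, 4, 3]          # A-phase observations of the outcome
intervention = [6, 7, 6, 8, 7, 9]   # B-phase observations of the outcome

mean_a = sum(baseline) / len(baseline)
mean_b = sum(intervention) / len(intervention)

# One simple within-case summary: the change in mean level between phases.
level_change = mean_b - mean_a

# Another common summary: the proportion of intervention points that exceed
# every baseline point (a basic nonoverlap index for an increasing target behavior).
nonoverlap = sum(b > max(baseline) for b in intervention) / len(intervention)

print(f"Level change: {level_change:.2f}, nonoverlap: {nonoverlap:.2f}")
```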

Due to the small number of participants, the main limitation of SCEDs is the limited generalizability of their findings. To overcome this limitation, SCEDs can be replicated across participants, and systematic review (SR) approaches can be applied to synthesize the results 4,12,13 . An SR is a form of literature review that identifies, evaluates, and aggregates all relevant studies on the same topic, using explicit methods to reduce possible systematic bias in answering particular research question(s) 14 . An SR can include a meta-analysis (MA), which refers to a statistical integration of the findings from individual studies, typically by combining and comparing observed effect sizes 15 .
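
As a concrete illustration of what "combining observed effect sizes" means, the sketch below pools hypothetical per-study effect sizes with an inverse-variance weighted average, the basic operation underlying a fixed-effect MA. The numbers are invented for illustration; random-effects models (discussed later) add a between-study variance component to the weights.

```python
# Minimal sketch with hypothetical numbers: pooling per-study effect sizes with
# an inverse-variance weighted average (a fixed-effect meta-analysis).
effect_sizes = [0.8, 1.2, 0.5, 1.0]   # effect size estimate from each study
variances = [0.10, 0.25, 0.15, 0.20]  # sampling variance of each estimate

weights = [1 / v for v in variances]  # more precise studies receive more weight
pooled = sum(w * d for w, d in zip(weights, effect_sizes)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5  # standard error of the pooled estimate

print(f"Pooled effect: {pooled:.2f} (SE = {pooled_se:.2f})")
```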

SCED data have specific features that should be taken into account when calculating effect sizes in individual studies and when synthesizing those effect sizes in a subsequent meta-analysis; otherwise, estimates may be biased and statistical inferences may be flawed. For instance, the outcome variable may systematically decrease or increase over time even in the absence of any intervention. Such a time trend should be accounted for when calculating effect sizes 4,16 . Another feature that has to be considered is the possible presence of serial dependency, or autocorrelation, in which adjacent measurements are more similar to one another than to measurements farther apart in time, violating the assumption of independent observations 17,18 .
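
The sketch below shows two quick diagnostics for these features on hypothetical within-phase data: an ordinary least-squares slope as a rough check for a time trend, and the lag-1 autocorrelation as a rough check for serial dependency. It only illustrates the concepts; it is not a prescribed analysis from this review.

```python
# Minimal sketch with hypothetical data: diagnosing a within-phase time trend
# (OLS slope) and serial dependency (lag-1 autocorrelation).
y = [2.0, 2.5, 3.0, 3.5, 4.5, 4.0, 5.0, 5.5]  # repeated measurements within one phase
t = list(range(len(y)))                        # measurement occasions

n = len(y)
mean_t, mean_y = sum(t) / n, sum(y) / n

# OLS slope: a clearly nonzero value suggests a trend that a simple comparison of
# phase means would misattribute to the intervention.
slope = (sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
         / sum((ti - mean_t) ** 2 for ti in t))

# Lag-1 autocorrelation: how strongly each observation resembles the previous one.
num = sum((y[i] - mean_y) * (y[i - 1] - mean_y) for i in range(1, n))
den = sum((yi - mean_y) ** 2 for yi in y)
lag1 = num / den

print(f"Within-phase slope: {slope:.2f}, lag-1 autocorrelation: {lag1:.2f}")
```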

Conducting a SCED SR or MA can provide better insight into the overall effectiveness of interventions, as well as into factors that moderate the effect. Yet poorly conducted SRs/MAs can lead to inaccurate inferences about intervention effectiveness, and conclusions may be affected by deficiencies in how these SRs/MAs are designed, performed, and reported. It is therefore important that users of SR/MA results (e.g., clinicians, researchers, and policy makers) consider the methodological quality of these studies. One way to do this is to assess their quality by means of a standardized tool. Such a tool may also be useful for meta-analysts and systematic reviewers to ensure that their studies are well designed, conducted, and reported. Beyond giving insight into the specific strengths and weaknesses of a study, such a tool can also be used to assess quality in general, although there is considerable debate over using a quantifiable summary score to assess and rate quality.

The results of our recent systematic review of 178 SCED MAs conducted between 1985 and 2015 19 indicate that, according to the R-AMSTAR, a considerable percentage of studies scored low on methodological quality. This tool assesses methodological quality based on 11 main items that are further operationalized by means of 44 criteria. To apply the scale to SCED MAs rather than to MAs of group-comparison studies, we had to reformulate some of the criteria. The MAs scored relatively high on some aspects, such as "providing the characteristics of the included studies" and "doing a comprehensive literature search". The main deficiencies were related to "reporting an assessment of the likelihood of publication bias" and "using the methods appropriately to combine the findings of studies".

In that review of SCED MAs, the methodological quality was evaluated by applying the modified R-AMSTAR, but other tools are available. In the review of Jamshidi et al. (in press) 19 , the R-AMSTAR was chosen because it was found to be more comprehensive and detailed than other tools and because it produces a quantifiable assessment of methodological quality. More details on the choice of the R-AMSTAR and the modified items can be found in that paper. In the current review, we give an overview of some of the frequently used tools for either assessing or improving the quality of SRs/MAs and discuss their appropriateness for SCED SRs/MAs. To the best of our knowledge, there is no specific validated tool to assess the quality of SCED MAs or SRs, and further research to produce a validated tool would be quite beneficial.
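
To show how a tool of this kind yields a quantifiable summary score, the sketch below scores a handful of quality items; the item labels and the 1–4 scoring rule are placeholders for illustration only and are not the actual R-AMSTAR items or criteria.

```python
# Minimal sketch with hypothetical items: turning per-item quality ratings into a
# summary score, as R-AMSTAR-style tools do. Item names and the 1-4 scale are
# placeholders, not the actual R-AMSTAR items or criteria.
ratings = {
    "a_priori_design": 3,
    "comprehensive_literature_search": 4,
    "characteristics_of_included_studies": 4,
    "quality_of_included_studies_assessed": 2,
    "appropriate_methods_to_combine_findings": 2,
    "publication_bias_assessed": 1,
}
max_per_item = 4  # assume each item is rated from 1 (poor) to 4 (fully met)

total = sum(ratings.values())
maximum = max_per_item * len(ratings)
weakest = min(ratings, key=ratings.get)

print(f"Summary score: {total}/{maximum} ({100 * total / maximum:.0f}%)")
print(f"Lowest-rated aspect: {weakest}")
```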

Approaches to Evaluate and Improve the Quality of Systematic Reviews and Meta-Analyses

To avoid inaccurate conclusions that might mislead decision-makers, meta-analysts and systematic reviewers should try to avoid key methodological deficiencies 20–22 , such as not applying a random-effects model in the presence of heterogeneity, not assessing the likelihood of publication bias, or not taking the scientific quality of the included studies into account when formulating conclusions. Such deficiencies can also be expected to occur in SCED MAs and SRs. Conflicting results from SRs may confuse readers 23 and make it more difficult for practitioners and clinicians to draw appropriate inferences. For systematic reviews and meta-analyses to provide valid and reliable evidence for informing decisions in research and policy-making, they must uphold high methodological standards 21,23–25 .
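
As one concrete example of the first deficiency, the sketch below (reusing the hypothetical effect sizes from the earlier pooling sketch) computes Cochran's Q and the I^2 statistic, two standard heterogeneity diagnostics that can inform the choice between a fixed-effect and a random-effects model. It is an illustration, not a prescribed workflow from this review.

```python
# Minimal sketch with hypothetical numbers: quantifying between-study heterogeneity
# with Cochran's Q and I^2 before choosing a fixed- or random-effects model.
effect_sizes = [0.8, 1.2, 0.5, 1.0]
variances = [0.10, 0.25, 0.15, 0.20]

weights = [1 / v for v in variances]
pooled = sum(w * d for w, d in zip(weights, effect_sizes)) / sum(weights)

# Cochran's Q: weighted squared deviations of the study effects around the pooled effect.
q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effect_sizes))
df = len(effect_sizes) - 1

# I^2: the proportion of total variability attributable to between-study heterogeneity.
i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Q = {q:.2f} on {df} df, I^2 = {i_squared:.0f}%")
```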

In addition, the users of SRs and MAs have a responsibility 26 : scientists, practitioners, and clinicians should critically examine the methodological quality of an SR to avoid relying on potentially misleading information when developing clinical decisions and guidelines 20,25,27,28 .

Several tools have been developed specifically to assess the quality of SRs and MAs, both for those who conduct an MA/SR and for those who use the results of MAs and SRs, such as practitioners and clinicians. By applying such tools, meta-analysts can ensure that their studies meet high standards of quality, while users can better judge the reliability of an MA or SR when basing decisions on it. Table 1 lists some of the better-known and commonly used tools 28–30 that have been specifically developed for assessing the quality of SRs/MAs, and describes their basic features and guidance for their use (e.g., the purpose of the tool, the number and content of its items, and the form of judgment it yields). Note that these tools are not specifically intended for meta-analyzing results from SCED studies; however, facets of these tools are useful and appropriate for judging the quality of SRs and MAs of SCED research.

Table 1: Overview of characteristics of tools for assessing the quality of SRs/MAs

Some of these tools focus on the reporting of the methodology and findings (e.g., PRISMA and QUOROM), whereas others concentrate on methodological quality and evaluate how well the SR was designed and performed (e.g., AMSTAR, R-AMSTAR, OQAQ) 31 . Some of the tools explicitly state that they can be used not only for conducting and reporting MAs/SRs but also for critically appraising published MAs/SRs (e.g., Sacks' checklist, PRISMA, QUOROM). Although the descriptions of the other tools do not state explicitly whether they can be applied by meta-analysts and reviewers while performing and reporting their own studies, we believe that awareness of the criteria that might later be used to appraise the quality of SRs/MAs can help researchers design, conduct, and report their SRs/MAs well.

Table 2 gives a further comparison of the content of the reviewed tools. Some tools assess an aspect of methodological quality with one general item, whereas others use multiple detailed criteria. The comparison indicates that reporting the search strategy, assessing the validity/quality of the primary studies, and checking whether it is appropriate to combine the results are the aspects considered in all of the reviewed tools.

Table 2: Comparison of aspects related to methodological quality of SRs/MAs among tools

SRs and MAs are essential methods for aggregating the results of primary studies in a specific field. Nevertheless, the reliability and validity of their conclusions can be compromised by methodological flaws. Because limited generalizability is a key limitation of SCED studies as a source of information for practitioners and clinicians who must make the best possible decisions and practice guidelines, conducting high-quality SCED MAs/SRs is of the utmost importance. The results of the recent review of the methodological quality of SCED MAs 19 indicate that improving the scientific quality of SCED MAs/SRs is necessary. Applying a validated tool of methodological standards (or a modified tool, or a combination of tools) might support meta-analysts and reviewers who are conducting studies, and might help users (e.g., clinicians, practitioners, and decision-makers) appraise the quality of the MA/SR they are referencing. Because there is no validated tool to assess specifically the methodological quality of SCED MAs and SRs, applied researchers can use one of the existing tools or a combination of multiple tools, or, better yet, develop and validate a new tool for conducting high-quality MAs/SRs of SCED studies. Most of the existing tools can be used to evaluate the quality of SCED MAs/SRs because they mainly focus on general aspects of methodological quality that do not depend heavily on the type of primary studies included in the review.

However, some of the detailed criteria of the reviewed tools may need to be modified, omitted, or supplemented to make a tool more applicable to SCED MAs, as was done in the study by Jamshidi et al. (in press) 19 . For instance, based on the recommendations of the What Works Clearinghouse (WWC) 2 for combining the results of multiple SCED studies into a single summary, MAs have to meet certain thresholds: (1) a minimum of five SCED studies examining the intervention that Meet Evidence Standards or Meet Evidence Standards with Reservations; (2) the SCED studies must be conducted by at least three different research teams at three different geographical locations; and (3) the aggregated number of experiments across the studies must total at least 20. Such criteria can help SCED meta-analysts ensure that they are following accepted standards while conducting their own reviews. In addition, features of SCED data such as time trends or serial dependency, which might lead to over- or underestimated intervention effects, should be taken into account in meta-analyses. None of the reviewed tools specifically addresses these SCED-specific recommendations, possibly because the tools were not developed for assessing the quality of SCED MAs in particular. These recommendations can be considered by meta-analysts and users alike when developing new tools or applying existing tools to assess the methodological quality of SCED MAs.
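
The WWC thresholds just listed lend themselves to a simple programmatic check. The sketch below encodes them for a set of hypothetical study records; the field names and values are placeholders, not a WWC data format.

```python
# Minimal sketch: checking hypothetical study records against the WWC thresholds
# described above (>= 5 qualifying studies, >= 3 research teams, >= 3 locations,
# >= 20 experiments in total). Field names and values are placeholders.
studies = [
    {"meets_standards": True, "team": "Team A", "location": "Site 1", "experiments": 4},
    {"meets_standards": True, "team": "Team B", "location": "Site 2", "experiments": 6},
    {"meets_standards": True, "team": "Team C", "location": "Site 3", "experiments": 5},
    {"meets_standards": True, "team": "Team D", "location": "Site 4", "experiments": 3},
    {"meets_standards": False, "team": "Team E", "location": "Site 5", "experiments": 4},
]

qualifying = [s for s in studies if s["meets_standards"]]

checks = {
    "at least 5 qualifying studies": len(qualifying) >= 5,
    "at least 3 research teams": len({s["team"] for s in qualifying}) >= 3,
    "at least 3 locations": len({s["location"] for s in qualifying}) >= 3,
    "at least 20 experiments": sum(s["experiments"] for s in qualifying) >= 20,
}

for label, passed in checks.items():
    print(f"{label}: {'met' if passed else 'not met'}")
print("Combined summary supported:", all(checks.values()))
```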

Acknowledgement

This project was supported in part by the Institute of Education Sciences, U.S. Department of Education, under Grant R305D150007. The content is solely the responsibility of the authors and does not represent the views of the Institute of Education Sciences or the U.S. Department of Education.

The authors declare that they have no conflict of interest.

  • Barlow DH, Nock MK, Hersen M. Single Case Experimental Designs. 3rd ed. Boston: Pearson/Allyn and Bacon; 2009.
  • Kratochwill TR, Hitchcock J, Horner RH, et al. Single-case design technical documentation. What Works Clearinghouse. http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf. Published 2010.
  • Onghena P. Single-case designs. In: Everittt BS, Howell DC, eds. Encyclopedia of Statistics in Behavioral Science. Chichester: John Wiley & Sons; 2005:1850-1854.
  • Beretvas SN, Chung H. An evaluation of modified R²-change effect size indices for single-subject experimental designs. Evid Based Commun Assess Interv. 2008; 2(3): 120-128. doi:10.1080/17489530802446328.
  • Moeyaert M, Ugille M, Ferron JM, et al. The influence of the design matrix on treatment effect estimates in the quantitative analyses of single-subject experimental design research. Behav Modif. 2014; 38(5): 665-704. doi:10.1177/0145445514535243.
  • Rogers LA, Graham S. A meta-analysis of single subject design writing intervention research. J Educ Psychol. 2008; 100(4): 879-906. doi:10.1037/0022-0663.100.4.879.
  • Smith JD. Single-case experimental designs: A systematic review of published research and current standards. Psychol Methods. 2012; 17(4): 1-70. doi:10.1037/a0029312.
  • Schlosser RW, Lee DL, Wendt O. Application of the percentage of non-overlapping data (PND) in systematic reviews and meta-analyses: A systematic review of reporting characteristics. Evidence-Based Commun Assess Interv. 2008; 2(3): 163-187. doi:10.1080/17489530802505412.
  • Shadish WR. Statistical analyses of single-case designs: The shape of things to come. Curr Dir Psychol Sci. 2014; 23(2): 139-146. doi:10.1177/0963721414524773.
  • Shadish WR, Hedges LV, Pustejovsky JE. Analysis and meta-analysis of single-case designs with a standardized mean difference statistic: A primer and applications. J Sch Psychol. 2014; 52(2): 123-147. doi:10.1016/j.jsp.2013.11.005.
  • Shadish WR, Rindskopf DM. Methods for evidence-based practice: Quantitative synthesis of single-subject designs. New Dir Eval. 2007; 113: 95-109. doi:10.1002/ev.217.
  • Petit-Bois M, Baek EK, Van den Noortgate W, et al. The consequences of modeling autocorrelation when synthesizing single-case studies using a three-level model. Behav Res Methods. 2016; 48(2): 803-812. doi:10.3758/s13428-015-0612-1.
  • Tincani M, De Mers M. Meta-analysis of single-case research design studies on instructional pacing. Behav Modif. 2016; 40(6): 799-824. doi:10.1177/0145445516643488.
  • Petticrew M, Roberts H. Systematic Reviews in the Social Sciences: A Practical Guide. Oxford: Blackwell Publishing Limited; 2006. doi:10.1002/9780470754887.
  • Cooper H, Hedges LV. Research synthesis as a scientific process. In: Cooper H, Hedges L V., Valentine JC, eds. Handbook of Research Synthesis and Meta-Analysis. 2nd ed. New York: Russell Sage Foundation; 2009:3-18. http://www.google.se/books?hl=sv&lr=&id=LUGd6B9eyc4C&pgis=1.
  • Campbell JM. Statistical comparison of four effect sizes for single-subject designs. Behav Modif. 2004; 28(2): 234-246. doi:10.1177/0145445503259264.
  • Van den Noortgate W, Onghena P. Hierarchical linear models for the quantitative integration of effect sizes in single-case research. Behav Res methods instruments Comput. 2003; 35(1): 1-10. doi:10.3758/BF03195492.
  • Van den Noortgate W, Onghena P. A multilevel meta-analysis of single-subject experimental design studies. Evid Based Commun Assess Interv. 2008; 2(3): 142-151. doi:10.1080/17489530802505362.
  • Jamshidi L, et al. (in press). Reference omitted for blind review.
  • Remschmidt C, Wichmann O, Harder T. Methodological quality of systematic reviews on influenza vaccination. Vaccine. 2014; 32(15): 1678-1684. doi:10.1016/j.vaccine.2014.01.060.
  • Faggion CMJ, Giannakopoulos NN. Critical appraisal of systematic reviews on the effect of a history of periodontitis on dental implant loss. J Clin Periodontol. 2013; 40(5): 542-552. doi:10.1111/jcpe.12096.
  • Pinnock H, Parke HL, Panagioti M, et al. Systematic meta-review of supported self-management for asthma: A healthcare perspective. BMC Med. 2017; 15(64). doi:10.1186/s12916-017-0823-7.
  • Wells C, Kolt GS, Marshall P, Hill B, et al. Effectiveness of Pilates exercise in treating people with chronic low back pain: A systematic review of systematic reviews. BMC Med Res Methodol. 2013; 13(7): 1-12. doi:10.1186/1471-2288-13-7.
  • Hall AM, Lee S, Zurakowski D. Quality assessment of meta-analyses published in leading anesthesiology journals from 2005 to 2014. Anesth Analg. 2017; 124(6): 2063-2067. doi:10.1213/ANE.0000000000002074.
  • Rotta I, Salgado TM, Silva ML, et al. Effectiveness of clinical pharmacy services: An overview of systematic reviews (2000–2010). Int J Clin Pharm. 2015; 37(5): 687-697. doi:10.1007/s11096-015-0137-9.
  • Pieper D, Mathes T, Eikermann M. Can AMSTAR also be applied to systematic reviews of non-randomized studies? BMC Res Notes. 2014; 7(609): 1-6. doi:10.1186/1756-0500-7-609.
  • Faggion CMJ. Critical appraisal of AMSTAR: challenges, limitations, and potential solutions from the perspective of an assessor. BMC Med Res Methodol. 2015; 15(63). doi:10.1186/s12874-015-0062-6.
  • Shea BJ, Grimshaw JM, Wells GA, et al. Development of AMSTAR: A measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007; 7(10): 1-7. doi:10.1186/1471-2288-7-10.
  • Zeng X, Zhang Y, Kwong JSW, et al. The methodological quality assessment tools for preclinical and clinical studies, systematic review and meta-analysis, and clinical practice guideline: A systematic review. J Evid Based Med. 2015; 8: 2-10. doi:10.1111/jebm.12141.
  • Pussegoda K, Turner L, Garritty C, et al. Systematic review adherence to methodological or reporting quality. Syst Rev. 2017; 6(131): 1-14. doi:10.1186/s13643-017-0527-2.
  • Pussegoda K, Turner L, Garritty C, et al. Identifying approaches for assessing methodological and reporting quality of systematic reviews: A descriptive study. Syst Rev. 2017; 6(1): 1-13. doi:10.1186/s13643-017-0507-6.
  • Sacks HS, Berrier J, Reitman D, et al. Meta-analyses of randomized controlled trials. N Engl J Med. 1987; 316(8): 450-455. doi:10.1056/NEJM198702193160806.
  • Oxman AD, Guyatt GH, Singer J, et al. Agreement among reviewers of review articles. J Clin Epidemiol. 1991; 44(1): 91-98. doi:10.1016/0895-4356(91)90205-N.
  • Salmos J, Gerbi MEMM, Braz R, et al. Methodological quality of systematic reviews analyzing the use of laser therapy in restorative dentistry. Lasers Med Sci. 2010; 25(1): 127-136. doi:10.1007/s10103-009-0733-9.
  • Kung J, Chiappelli F, Cajulis OO, et al. From systematic reviews to clinical recommendations for evidence- based health care: Validation of revised assessment of multiple systematic reviews (R-AMSTAR) for grading of clinical relevance. Open Dent J. 2010; 4: 84-91. doi:10.2174/1874210601004020084.
  • Methodology Checklist 1: Systematic Reviews and Meta-Analyses. Scottish Intercollegiate Guidelines Network; 2012. http://www.sign.ac.uk/checklists-and-notes.html.
  • Moher D, Cook DJ, Eastwood S, et al. Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Br J Surg. 2000; 87: 1448-1454. doi:10.1159/000055014.
  • Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. J Chinese Integr Med. 2009; 7(9): 889-896. doi:10.3736/jcim20090918.
  • Critical Appraisal Skills Programme. CASP Systematic Review checklist. https://casp-uk.net/casp-tools-checklists/. Published 2018.
  • The Social Care Guidance Manual:Process and Methods. National Institute for Health and Care Excellence; 2013. https://www.nice.org.uk/process/pmg10/chapter/introduction. Accessed April 13, 2018.
  • Churchill R, Lasserson T, Chandler J, et al. Standards for the reporting of new Cochrane Intervention Reviews. In: Higgins JP, Lasserson T, Chandler J, Tovey D, Churchill R, eds. Methodological Expectations of Cochrane Intervention Reviews. Cochrane: London; 2016:37-58.
  • Whiting P, Savović J, Higgins JPT, et al. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016; 69: 225-234. doi:10.1016/j.jclinepi.2015.06.005.

Article Info

  • Journal: Journal of Mental Health & Clinical Psychology
  • Article Type: Mini Review
  • DOI: 10.29245/2578-2959/2018/4.1140
  • Published on: July 07, 2018
  • Keywords: Meta-analysis; Systematic review; Methodological quality
