Single-Case Data Analysis: A Practitioner Guide for Accurate and Reliable Decisions

Affiliation: University of Hawai'i at Mānoa, Honolulu, HI, USA.

PMID: 31441315. DOI: 10.1177/0145445519867054

Practitioners frequently use single-case data for decision-making related to behavioral programming and progress monitoring. Visual analysis is an important and primary tool for reporting results of graphed single-case data because it provides immediate, contextualized information. Criticisms exist concerning the objectivity and reliability of the visual analysis process. When practitioners are equipped with knowledge about single-case designs, including threats and safeguards to internal validity, they can make technically accurate conclusions and reliable data-based decisions with relative ease. This paper summarizes single-case experimental design and considerations for professionals to improve the accuracy and reliability of judgments made from single-case data. This paper can also help practitioners to appropriately incorporate single-case research design applications in their practice.

Keywords: internal validity; reliability; single-case design; visual analysis.

Single-Case Experimental Designs: A Systematic Review of Published Research and Current Standards

Justin D. Smith

Child and Family Center, University of Oregon

This article systematically reviews the research design and methodological characteristics of single-case experimental design (SCED) research published in peer-reviewed journals between 2000 and 2010. SCEDs provide researchers with a flexible and viable alternative to group designs with large sample sizes. However, methodological challenges have precluded widespread implementation and acceptance of the SCED as a viable complementary methodology to the predominant group design. This article includes a description of the research design, measurement, and analysis domains distinctive to the SCED; a discussion of the results within the framework of contemporary standards and guidelines in the field; and a presentation of updated benchmarks for key characteristics (e.g., baseline sampling, method of analysis), and overall, it provides researchers and reviewers with a resource for conducting and evaluating SCED research. The results of the systematic review of 409 studies suggest that recently published SCED research is largely in accordance with contemporary criteria for experimental quality. Analytic method emerged as an area of discord. Comparison of the findings of this review with historical estimates of the use of statistical analysis indicates an upward trend, but visual analysis remains the most common analytic method and also garners the most support amongst those entities providing SCED standards. Although consensus exists along key dimensions of single-case research design and researchers appear to be practicing within these parameters, there remains a need for further evaluation of assessment and sampling techniques and data analytic methods.

The single-case experiment has a storied history in psychology dating back to the field’s founders: Fechner (1889) , Watson (1925) , and Skinner (1938) . It has been used to inform and develop theory, examine interpersonal processes, study the behavior of organisms, establish the effectiveness of psychological interventions, and address a host of other research questions (for a review, see Morgan & Morgan, 2001 ). In recent years the single-case experimental design (SCED) has been represented in the literature more often than in past decades, as is evidenced by recent reviews ( Hammond & Gast, 2010 ; Shadish & Sullivan, 2011 ), but it still languishes behind the more prominent group design in nearly all subfields of psychology. Group designs are often professed to be superior because they minimize, although do not necessarily eliminate, the major internal validity threats to drawing scientifically valid inferences from the results ( Shadish, Cook, & Campbell, 2002 ). SCEDs provide a rigorous, methodologically sound alternative method of evaluation (e.g., Barlow, Nock, & Hersen, 2008 ; Horner et al., 2005 ; Kazdin, 2010 ; Kratochwill & Levin, 2010 ; Shadish et al., 2002 ) but are often overlooked as a true experimental methodology capable of eliciting legitimate inferences (e.g., Barlow et al., 2008 ; Kazdin, 2010 ). Despite a shift in the zeitgeist from single-case experiments to group designs more than a half century ago, recent and rapid methodological advancements suggest that SCEDs are poised for resurgence.

Single case refers to the participant or cluster of participants (e.g., a classroom, hospital, or neighborhood) under investigation. In contrast to an experimental group design, in which one group is compared with another, participants in a single-case experiment provide their own control data for the purpose of comparison in a within-subject rather than a between-subjects design. SCEDs typically involve a comparison between two experimental time periods, known as phases, and usually include a representative baseline phase that serves as a comparison for subsequent phases. In studies in which the single subject is actually a group (e.g., a classroom or school), there are additional threats to the internal validity of the results, such as setting or site effects, as noted by Kratochwill and Levin (2010).

The central goal of the SCED is to determine whether a causal or functional relationship exists between a researcher-manipulated independent variable (IV) and a meaningful change in the dependent variable (DV). SCEDs generally involve repeated, systematic assessment of one or more IVs and DVs over time. The DV is measured repeatedly across and within all conditions or phases of the IV. Experimental control in SCEDs includes replication of the effect either within or between participants ( Horner et al., 2005 ). Randomization is another way in which threats to internal validity can be experimentally controlled. Kratochwill and Levin (2010) recently provided multiple suggestions for adding a randomization component to SCEDs to improve the methodological rigor and internal validity of the findings.

Examination of the effectiveness of interventions is perhaps the area in which SCEDs are most well represented (Morgan & Morgan, 2001). Researchers in behavioral medicine and in clinical, health, educational, school, sport, rehabilitation, and counseling psychology often use SCEDs because they are particularly well suited to examining the processes and outcomes of psychological and behavioral interventions (e.g., Borckardt et al., 2008; Kazdin, 2010; Robey, Schultz, Crawford, & Sinner, 1999). Skepticism about the clinical utility of the randomized controlled trial (e.g., Jacobsen & Christensen, 1996; Wachtel, 2010; Westen & Bradley, 2005; Westen, Novotny, & Thompson-Brenner, 2004) has renewed researchers’ interest in SCEDs as a means to assess intervention outcomes (e.g., Borckardt et al., 2008; Dattilio, Edwards, & Fishman, 2010; Horner et al., 2005; Kratochwill, 2007; Kratochwill & Levin, 2010). Although SCEDs are relatively well represented in the intervention literature, it is by no means their sole home: Examples appear in nearly every subfield of psychology (e.g., Bolger, Davis, & Rafaeli, 2003; Piasecki, Hufford, Solhan, & Trull, 2007; Reis & Gable, 2000; Shiffman, Stone, & Hufford, 2008; Soliday, Moore, & Lande, 2002). Aside from the current preference for group-based research designs, several methodological challenges have hindered the proliferation of the SCED.

Methodological Complexity

SCEDs undeniably present researchers with a complex array of methodological and research design challenges, such as establishing a representative baseline, managing the nonindependence of sequential observations (i.e., autocorrelation, serial dependence), interpreting single-subject effect sizes, analyzing the short data streams seen in many applications, and appropriately addressing the matter of missing observations. In the field of intervention research, for example, Hser et al. (2001) noted that studies using SCEDs are “rare” because of the minimum number of observations that are necessary (e.g., 3–5 data points in each phase) and the complexity of available data analysis approaches. Advances in longitudinal person-based trajectory analysis (e.g., Nagin, 1999), structural equation modeling techniques (e.g., Lubke & Muthén, 2005), time-series forecasting (e.g., autoregressive integrated moving averages; Box & Jenkins, 1970), and statistical programs designed specifically for SCEDs (e.g., Simulation Modeling Analysis; Borckardt, 2006) have provided researchers with robust means of analysis, but they might not be feasible methods for the average psychological scientist.

Application of the SCED has also expanded. Today, researchers use variants of the SCED to examine complex psychological processes and the relationship between daily and momentary events in people’s lives and their psychological correlates. Research in nearly all subfields of psychology has begun to use daily diary and ecological momentary assessment (EMA) methods in the context of the SCED, opening the door to understanding increasingly complex psychological phenomena (see Bolger et al., 2003; Shiffman et al., 2008). In contrast to the carefully controlled laboratory experiment that dominated research in the first half of the twentieth century (e.g., Skinner, 1938; Watson, 1925), contemporary proponents advocate application of the SCED in naturalistic studies to increase the ecological validity of empirical findings (e.g., Bloom, Fisher, & Orme, 2003; Borckardt et al., 2008; Dattilio et al., 2010; Jacobsen & Christensen, 1996; Kazdin, 2008; Morgan & Morgan, 2001; Westen & Bradley, 2005; Westen et al., 2004). Recent advancements and expanded application of SCEDs indicate a need for updated design and reporting standards.

Many current benchmarks in the literature concerning key parameters of the SCED were established well before current advancements and innovations, such as the suggested minimum number of data points in the baseline phase(s), which remains a disputed area of SCED research (e.g., Center, Skiba, & Casey, 1986; Huitema, 1985; R. R. Jones, Vaught, & Weinrott, 1977; Sharpley, 1987). This article comprises (a) an examination of contemporary SCED methodological and reporting standards; (b) a systematic review of select design, measurement, and statistical characteristics of published SCED research during the past decade; and (c) a broad discussion of the critical aspects of this research to inform methodological improvements and study reporting standards. The reader will garner a fundamental understanding of what constitutes appropriate methodological soundness in single-case experimental research according to the established standards in the field, which can be used to guide the design of future studies, improve the presentation of publishable empirical findings, and inform the peer-review process. The discussion begins with the basic characteristics of the SCED, including an introduction to time-series, daily diary, and EMA strategies, and describes how current reporting and design standards apply to each of these areas of single-case research. Interwoven within this presentation are the results of a systematic review of SCED research published between 2000 and 2010 in peer-reviewed outlets and a discussion of the way in which these findings support, or differ from, existing design and reporting standards and published SCED benchmarks.

Review of Current SCED Guidelines and Reporting Standards

In contrast to experimental group comparison studies, which conform to generally well agreed upon methodological design and reporting guidelines, such as the CONSORT ( Moher, Schulz, Altman, & the CONSORT Group, 2001 ) and TREND ( Des Jarlais, Lyles, & Crepaz, 2004 ) statements for randomized and nonrandomized trials, respectively, there is comparatively much less consensus when it comes to the SCED. Until fairly recently, design and reporting guidelines for single-case experiments were almost entirely absent in the literature and were typically determined by the preferences of a research subspecialty or a particular journal’s editorial board. Factions still exist within the larger field of psychology, as can be seen in the collection of standards presented in this article, particularly in regard to data analytic methods of SCEDs, but fortunately there is budding agreement about certain design and measurement characteristics. A number of task forces, professional groups, and independent experts in the field have recently put forth guidelines; each has a relatively distinct purpose, which likely accounts for some of the discrepancies between them. In what is to be a central theme of this article, researchers are ultimately responsible for thoughtfully and synergistically combining research design, measurement, and analysis aspects of a study.

This review presents the more prominent, comprehensive, and recently established SCED standards. Six sources are discussed: (1) Single-Case Design Technical Documentation from the What Works Clearinghouse (WWC; Kratochwill et al., 2010 ); (2) the APA Division 12 Task Force on Psychological Interventions, with contributions from the Division 12 Task Force on Promotion and Dissemination of Psychological Procedures and the APA Task Force for Psychological Intervention Guidelines (DIV12; presented in Chambless & Hollon, 1998 ; Chambless & Ollendick, 2001 ), adopted and expanded by APA Division 53, the Society for Clinical Child and Adolescent Psychology ( Weisz & Hawley, 1998 , 1999 ); (3) the APA Division 16 Task Force on Evidence-Based Interventions in School Psychology (DIV16; Members of the Task Force on Evidence-Based Interventions in School Psychology. Chair: T. R. Kratochwill, 2003); (4) the National Reading Panel (NRP; National Institute of Child Health and Human Development, 2000 ); (5) the Single-Case Experimental Design Scale ( Tate et al., 2008 ); and (6) the reporting guidelines for EMA put forth by Stone & Shiffman (2002) . Although the specific purposes of each source differ somewhat, the overall aim is to provide researchers and reviewers with agreed-upon criteria to be used in the conduct and evaluation of SCED research. The standards provided by WWC, DIV12, DIV16, and the NRP represent the efforts of task forces. The Tate et al. scale was selected for inclusion in this review because it represents perhaps the only psychometrically validated tool for assessing the rigor of SCED methodology. Stone and Shiffman’s (2002) standards were intended specifically for EMA methods, but many of their criteria also apply to time-series, daily diary, and other repeated-measurement and sampling methods, making them pertinent to this article. The design, measurement, and analysis standards are presented in the later sections of this article and notable concurrences, discrepancies, strengths, and deficiencies are summarized.

Systematic Review Search Procedures and Selection Criteria

Search strategy.

A comprehensive search was performed to identify SCED studies published in peer-reviewed journals meeting a priori search and inclusion criteria. First, a computer-based PsycINFO search of articles published between 2000 and 2010 (conducted in July 2011) used the following primary key terms and phrases, which could appear anywhere in the article (asterisks denote that any characters/letters can follow the last character of the search term): alternating treatment design, changing criterion design, experimental case*, multiple baseline design, replicated single-case design, simultaneous treatment design, time-series design. The search was limited to studies published in the English language and those appearing in peer-reviewed journals within the specified publication year range. Additional limiters of the type of article were also used in PsycINFO to increase specificity: The search was limited to include methodologies indexed as either quantitative study OR treatment outcome/randomized clinical trial and NOT field study OR interview OR focus group OR literature review OR systematic review OR mathematical model OR qualitative study.

Study selection

The author used a three-phase study selection, screening, and coding procedure to select the highest number of applicable studies. Phase 1 consisted of the initial systematic review conducted using PsycINFO, which resulted in 571 articles. In Phase 2, titles and abstracts were screened: Articles appearing to use a SCED were retained (451) for Phase 3, in which the author and a trained research assistant read each full-text article and entered the characteristics of interest into a database. At each phase of the screening process, studies that did not use a SCED or that either self-identified as, or were determined to be, quasi-experimental were dropped. Of the 571 original studies, 82 studies were determined to be quasi-experimental. The definition of a quasi-experimental design used in the screening procedure conforms to the descriptions provided by Kazdin (2010) and Shadish et al. (2002) regarding the necessary components of an experimental design. For example, reversal designs require a minimum of four phases (e.g., ABAB), and multiple baseline designs must demonstrate replication of the effect across at least three conditions (e.g., subjects, settings, behaviors). Sixteen studies were unavailable in full text in English, and five could not be obtained in full text and were thus dropped. The remaining articles that were not retained for review (59) were determined not to be SCED studies meeting our inclusion criteria, but had been identified in our PsycINFO search using the specified keyword and methodology terms. For this review, 409 studies were selected. The sources of the 409 reviewed studies are summarized in Table 1 . A complete bibliography of the 571 studies appearing in the initial search, with the included studies marked, is available online as an Appendix or from the author.

Table 1. Journal Sources of Studies Included in the Systematic Review (N = 409)

Note: Each of the following journal titles contributed 1 study unless otherwise noted in parentheses: Augmentative and Alternative Communication; Acta Colombiana de Psicología; Acta Comportamentalia; Adapted Physical Activity Quarterly (2); Addiction Research and Theory; Advances in Speech Language Pathology; American Annals of the Deaf; American Journal of Education; American Journal of Occupational Therapy; American Journal of Speech-Language Pathology; The American Journal on Addictions; American Journal on Mental Retardation; Applied Ergonomics; Applied Psychophysiology and Biofeedback; Australian Journal of Guidance & Counseling; Australian Psychologist; Autism; The Behavior Analyst; The Behavior Analyst Today; Behavior Analysis in Practice (2); Behavior and Social Issues (2); Behaviour Change (2); Behavioural and Cognitive Psychotherapy; Behaviour Research and Therapy (3); Brain and Language (2); Brain Injury (2); Canadian Journal of Occupational Therapy (2); Canadian Journal of School Psychology; Career Development for Exceptional Individuals; Chinese Mental Health Journal; Clinical Linguistics and Phonetics; Clinical Psychology & Psychotherapy; Cognitive and Behavioral Practice; Cognitive Computation; Cognitive Therapy and Research; Communication Disorders Quarterly; Developmental Medicine & Child Neurology (2); Developmental Neurorehabilitation (2); Disability and Rehabilitation: An International, Multidisciplinary Journal (3); Disability and Rehabilitation: Assistive Technology; Down Syndrome: Research & Practice; Drug and Alcohol Dependence (2); Early Childhood Education Journal (2); Early Childhood Services: An Interdisciplinary Journal of Effectiveness; Educational Psychology (2); Education and Training in Autism and Developmental Disabilities; Electronic Journal of Research in Educational Psychology; Environment and Behavior (2); European Eating Disorders Review; European Journal of Sport Science; European Review of Applied Psychology; Exceptional Children; Exceptionality; Experimental and Clinical Psychopharmacology; Family & Community Health: The Journal of Health Promotion & Maintenance; Headache: The Journal of Head and Face Pain; International Journal of Behavioral Consultation and Therapy (2); International Journal of Disability; Development and Education (2); International Journal of Drug Policy; International Journal of Psychology; International Journal of Speech-Language Pathology; International Psychogeriatrics; Japanese Journal of Behavior Analysis (3); Japanese Journal of Special Education; Journal of Applied Research in Intellectual Disabilities (2); Journal of Applied Sport Psychology (3); Journal of Attention Disorders (2); Journal of Behavior Therapy and Experimental Psychiatry; Journal of Child Psychology and Psychiatry; Journal of Clinical Psychology in Medical Settings; Journal of Clinical Sport Psychology; Journal of Cognitive Psychotherapy; Journal of Consulting and Clinical Psychology (2); Journal of Deaf Studies and Deaf Education; Journal of Educational & Psychological Consultation (2); Journal of Evidence-Based Practices for Schools (2); Journal of the Experimental Analysis of Behavior (2); Journal of General Internal Medicine; Journal of Intellectual and Developmental Disabilities; Journal of Intellectual Disability Research (2); Journal of Medical Speech-Language Pathology; Journal of Neurology, Neurosurgery & Psychiatry; Journal of Paediatrics and Child Health; Journal of Prevention and Intervention in the Community; Journal of Safety Research; 
Journal of School Psychology (3); The Journal of Socio-Economics; The Journal of Special Education; Journal of Speech, Language, and Hearing Research (2); Journal of Sport Behavior; Journal of Substance Abuse Treatment; Journal of the International Neuropsychological Society; Journal of Traumatic Stress; The Journals of Gerontology: Series B: Psychological Sciences and Social Sciences; Language, Speech, and Hearing Services in Schools; Learning Disabilities Research & Practice (2); Learning Disability Quarterly (2); Music Therapy Perspectives; Neurorehabilitation and Neural Repair; Neuropsychological Rehabilitation (2); Pain; Physical Education and Sport Pedagogy (2); Preventive Medicine: An International Journal Devoted to Practice and Theory; Psychological Assessment; Psychological Medicine: A Journal of Research in Psychiatry and the Allied Sciences; The Psychological Record; Reading and Writing; Remedial and Special Education (3); Research and Practice for Persons with Severe Disabilities (2); Restorative Neurology and Neuroscience; School Psychology International; Seminars in Speech and Language; Sleep and Hypnosis; School Psychology Quarterly; Social Work in Health Care; The Sport Psychologist (3); Therapeutic Recreation Journal (2); The Volta Review; Work: Journal of Prevention, Assessment & Rehabilitation.

Coding criteria amplifications

A comprehensive description of the coding criteria for each category in this review is available from the author by request. The primary coding criteria are described here and in later sections of this article.

  • Research design was classified into one of the types discussed later in the section titled Predominant Single-Case Experimental Designs on the basis of the authors’ stated design type. Secondary research designs were then coded when applicable (i.e., mixed designs). Distinctions between primary and secondary research designs were made based on the authors’ description of their study. For example, if an author described the study as a “multiple baseline design with time-series measurement,” the primary research design would be coded as being multiple baseline, and time-series would be coded as the secondary research design.
  • Observer ratings were coded as present when observational coding procedures were described and/or the results of a test of interobserver agreement were reported.
  • Interrater reliability for observer ratings was coded as present in any case in which percent agreement, alpha, kappa, or another appropriate statistic was reported, regardless of the amount of the total data that were examined for agreement.
  • Daily diary, daily self-report, and EMA codes were given when authors explicitly described these procedures in the text by name. Coders did not infer the use of these measurement strategies.
  • The number of baseline observations was either taken directly from values reported in the text or counted in graphical displays of the data when counting was determined to be a reliable approach. When the number of baseline data points could not be reliably determined from the graphical display, the “unavailable” code was assigned; the same code was assigned when the number of observations was unreported or ambiguous, or when only a range was provided and thus no mean could be determined. Because a number of studies reported means only, the mean number of baseline observations was calculated for each study prior to further descriptive statistical analyses.
  • The coding of the analytic method used in the reviewed studies is discussed later in the section titled Discussion of Review Results and Coding of Analytic Methods .

Results of the Systematic Review

Descriptive statistics of the design, measurement, and analysis characteristics of the reviewed studies are presented in Table 2 . The results and their implications are discussed in the relevant sections throughout the remainder of the article.

Table 2. Descriptive Statistics of Reviewed SCED Characteristics

Note. % refers to the proportion of reviewed studies that satisfied criteria for a given code (e.g., the percentage of studies reporting observer ratings).

Discussion of the Systematic Review Results in Context

The SCED is a very flexible methodology and has many variants. Those mentioned here are the building blocks from which other designs are then derived. For those readers interested in the nuances of each design, Barlow et al. (2008); Franklin, Allison, and Gorman (1997); Kazdin (2010); and Kratochwill and Levin (1992), among others, provide cogent, in-depth discussions. Identifying the appropriate SCED depends upon many factors, including the specifics of the IV, the setting in which the study will be conducted, participant characteristics, the desired or hypothesized outcomes, and the research question(s). Similarly, the researcher’s selection of measurement and analysis techniques is determined by these factors.

Predominant Single-Case Experimental Designs

Alternating/simultaneous designs (6%; primary design of the studies reviewed).

Alternating and simultaneous designs involve an iterative manipulation of the IV(s) across different phases to show that changes in the DV vary systematically as a function of manipulating the IV(s). In these multielement designs, the researcher has the option to alternate the introduction of two or more IVs or present two or more IVs at the same time. In the alternating variation, the researcher is able to determine the relative impact of two different IVs on the DV, when all other conditions are held constant. Another variation of this design is to alternate IVs across various conditions that could be related to the DV (e.g., class period, interventionist). Similarly, the simultaneous design would occur when the IVs were presented at the same time within the same phase of the study.

Changing criterion design (4%)

Changing criterion designs are used to demonstrate a gradual change in the DV over the course of the phase involving the active manipulation of the IV. The criterion for demonstrating change shifts in a stepwise manner as the participant responds to the presence of the manipulated IV. The changing criterion design is particularly useful in applied intervention research for a number of reasons. The IV is continuous and never withdrawn, unlike the strategy used in a reversal design. This is particularly important in situations where removal of a psychological intervention would be either detrimental or dangerous to the participant, or would be otherwise unfeasible or unethical. The multiple baseline design also does not withdraw intervention, but it requires replicating the effects of the intervention across participants, settings, or situations. A changing criterion design can be accomplished with one participant in one setting without withholding or withdrawing treatment.

Multiple baseline/combined series design (69%)

The multiple baseline or combined series design can be used to test within-subject change across conditions and often involves multiple participants in a replication context. The multiple baseline design is quite simple in many ways, essentially consisting of a number of repeated, miniature AB experiments or variations thereof. Introduction of the IV is staggered temporally across multiple participants or across multiple within-subject conditions, which allows the researcher to demonstrate that changes in the DV reliably occur only when the IV is introduced, thus controlling for the effects of extraneous factors. Multiple baseline designs can be used both within and across units (i.e., persons or groups of persons). When the baseline phase of each subject begins simultaneously, it is called a concurrent multiple baseline design. In a nonconcurrent variation, baseline periods across subjects begin at different points in time. The multiple baseline design is useful in many settings in which withdrawal of the IV would not be appropriate or when introduction of the IV is hypothesized to result in permanent change that would not reverse when the IV is withdrawn. The major drawback of this design is that the IV must be initially withheld for a period of time to ensure different starting points across the different units in the baseline phase. Depending upon the nature of the research questions, withholding an IV, such as a treatment, could be potentially detrimental to participants.

Reversal designs (17%)

Reversal designs are also known as introduction and withdrawal designs and are denoted as ABAB designs in their simplest form. As the name suggests, the reversal design involves collecting a baseline measure of the DV (the first A phase), introducing the IV (the first B phase), removing the IV while continuing to assess the DV (the second A phase), and then reintroducing the IV (the second B phase). This pattern can be repeated as many times as is necessary to demonstrate an effect or otherwise address the research question. Reversal designs are useful when the manipulation is hypothesized to result in changes in the DV that are expected to reverse or discontinue when the manipulation is not present. Maintenance of an effect is often necessary to uphold the findings of reversal designs. The demonstration of an effect is evident in reversal designs when improvement occurs during the first manipulation phase, compared with the first baseline phase, then reverts to or approaches original baseline levels during the second baseline phase when the manipulation has been withdrawn, and then improves again when the manipulation is reinstated. This pattern of reversal, when the manipulation is introduced and then withdrawn, is essential to attributing changes in the DV to the IV. However, demonstrating maintenance of an effect is not incompatible with a reversal design, even though the DV is hypothesized to reverse when the IV is withdrawn (Kazdin, 2010). Maintenance is demonstrated by repeating introduction–withdrawal segments until improvement in the DV becomes permanent even when the IV is withdrawn. There is not always a need to demonstrate maintenance in all applications, nor is it always possible or desirable, but it is paramount in the learning and intervention research contexts.

Mixed designs (10%)

Mixed designs include a combination of more than one SCED (e.g., a reversal design embedded within a multiple baseline) or an SCED embedded within a group design (i.e., a randomized controlled trial comparing two groups of multiple baseline experiments). Mixed designs afford the researcher even greater flexibility in designing a study to address complex psychological hypotheses while capitalizing on the strengths of the various designs. See Kazdin (2010) for a discussion of the variations and utility of mixed designs.

Related Nonexperimental Designs

Quasi-experimental designs.

In contrast to the designs previously described, all of which constitute “true experiments” ( Kazdin, 2010 ; Shadish et al., 2002 ), in quasi-experimental designs the conditions of a true experiment (e.g., active manipulation of the IV, replication of the effect) are approximated and are not readily under the control of the researcher. Because the focus of this article is on experimental designs, quasi-experiments are not discussed in detail; instead the reader is referred to Kazdin (2010) and Shadish et al. (2002) .

Ecological and naturalistic single-case designs

For a single-case design to be experimental, there must be active manipulation of the IV, but in some applications, such as those that might be used in social and personality psychology, the researcher might be interested in measuring naturally occurring phenomena and examining their temporal relationships. Thus, the researcher will not use a manipulation. An example of this type of research might be a study about the temporal relationship between alcohol consumption and depressed mood, which can be measured reliably using EMA methods. Psychotherapy process researchers also use this type of design to assess dyadic relationship dynamics between therapists and clients (e.g., Tschacher & Ramseyer, 2009 ).

Research Design Standards

Each of the reviewed standards provides some degree of direction regarding acceptable research designs. The WWC provides the most detailed and specific requirements regarding design characteristics. The guidelines presented in Tables 3, 4, and 5 are consistent with the methodological rigor necessary to meet the WWC distinction “meets standards.” The WWC also provides less-stringent standards for a “meets standards with reservations” distinction. When minimum criteria in the design, measurement, or analysis sections of a study are not met, it is rated “does not meet standards” (Kratochwill et al., 2010). Many SCEDs are acceptable within the standards of DIV12, DIV16, NRP, and the Tate et al. SCED scale. DIV12 specifies that replication occur across a minimum of three successive cases, which differs from the WWC specifications, which allow for three replications within a single-subject design that need not be across multiple subjects. DIV16 does not require, but seems to prefer, a multiple baseline design with a between-subject replication. Tate et al. state that the “design allows for the examination of cause and effect relationships to demonstrate efficacy” (2008, p. 400). Determining whether or not a design meets this requirement is left up to the evaluator, who might then refer to one of the other standards or another source for direction.

Table 3. Research Design Standards and Guidelines

Table 4. Measurement and Assessment Standards and Guidelines

Table 5. Analysis Standards and Guidelines

The Stone and Shiffman (2002) standards for EMA are concerned almost entirely with the reporting of measurement characteristics and less so with research design. One way in which these standards differ from those of other sources is in the active manipulation of the IV. Many research questions in EMA, daily diary, and time-series designs are concerned with naturally occurring phenomena, and a researcher manipulation would run counter to this aim. The EMA standards become important when selecting an appropriate measurement strategy within the SCED. In EMA applications, as is also true in some other time-series and daily diary designs, researcher manipulation occurs as a function of the sampling interval in which DVs of interest are measured according to fixed time schedules (e.g., reporting occurs at the end of each day), random time schedules (e.g., the data collection device prompts the participant to respond at random intervals throughout the day), or on an event-based schedule (e.g., reporting occurs after a specified event takes place).
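
To make the sampling-schedule distinction concrete, the sketch below generates a random time schedule of the kind described above. It is a minimal illustration only: the waking-hours window, number of prompts, minimum spacing, and function name are assumptions chosen for the example rather than requirements of the Stone and Shiffman (2002) standards.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_prompt_schedule(n_prompts=5, start_hour=9, end_hour=21, min_gap_minutes=30):
    """Draw prompt times (minutes since midnight) at random within a waking-hours
    window, redrawing until all prompts are at least min_gap_minutes apart."""
    window = np.arange(start_hour * 60, end_hour * 60)
    while True:
        times = np.sort(rng.choice(window, size=n_prompts, replace=False))
        if np.all(np.diff(times) >= min_gap_minutes):
            return times

print(random_prompt_schedule())  # e.g., five prompt times for one day
```

A fixed time schedule would instead use evenly spaced times, and an event-based schedule would record a time stamp whenever the specified event occurs.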

Measurement

The basic measurement requirement of the SCED is a repeated assessment of the DV across each phase of the design in order to draw valid inferences regarding the effect of the IV on the DV. In other applications, such as those used by personality and social psychology researchers to study various human phenomena ( Bolger et al., 2003 ; Reis & Gable, 2000 ), sampling strategies vary widely depending on the topic area under investigation. Regardless of the research area, SCEDs are most typically concerned with within-person change and processes and involve a time-based strategy, most commonly to assess global daily averages or peak daily levels of the DV. Many sampling strategies, such as time-series, in which reporting occurs at uniform intervals or on event-based, fixed, or variable schedules, are also appropriate measurement methods and are common in psychological research (see Bolger et al., 2003 ).

Repeated-measurement methods permit the natural, even spontaneous, reporting of information ( Reis, 1994 ), which reduces the biases of retrospection by minimizing the amount of time elapsed between an experience and the account of this experience ( Bolger et al., 2003 ). Shiffman et al. (2008) aptly noted that the majority of research in the field of psychology relies heavily on retrospective assessment measures, even though retrospective reports have been found to be susceptible to state-congruent recall (e.g., Bower, 1981 ) and a tendency to report peak levels of the experience instead of giving credence to temporal fluctuations ( Redelmeier & Kahneman, 1996 ; Stone, Broderick, Kaell, Deles-Paul, & Porter, 2000 ). Furthermore, Shiffman et al. (1997) demonstrated that subjective aggregate accounts were a poor fit to daily reported experiences, which can be attributed to reductions in measurement error resulting in increased validity and reliability of the daily reports.

The necessity of measuring at least one DV repeatedly means that the selected assessment method, instrument, and/or construct must be sensitive to change over time and be capable of reliably and validly capturing change. Horner et al. (2005) discuss the important features of outcome measures selected for use in these types of designs. Kazdin (2010) suggests that measures be dimensional, because dimensional measures can more readily detect effects than categorical and binary measures. Although using an established measure or scale, such as the Outcome Questionnaire System (M. J. Lambert, Hansen, & Harmon, 2010), provides empirically validated items for assessing various outcomes, most measure validation studies conducted on this type of instrument involve between-subject designs, which is no guarantee that these measures are reliable and valid for assessing within-person variability. Borsboom, Mellenbergh, and van Heerden (2003) suggest that researchers adapting validated measures should consider whether the items they propose using have a factor structure within subjects similar to that obtained between subjects. This is one of the reasons that SCEDs often use observational assessments from multiple sources and report the interrater reliability of the measure. Self-report measures are acceptable practice in some circles, but generally additional assessment methods or informants are necessary to uphold the highest methodological standards. The results of this review indicate that the majority of studies include observational measurement (76.0%). Within those studies, nearly all (97.1%) reported interrater reliability procedures and results. The results within each design were similar, with the exception of time-series designs, which used observer ratings in only half of the reviewed studies.

Time-series

Time-series designs are defined by repeated measurement of variables of interest over a period of time (Box & Jenkins, 1970). Time-series measurement most often occurs at uniform intervals; however, this is no longer a constraint of time-series designs (see Harvey, 2001). Although uniform-interval reporting is not necessary in SCED research, repeated measures often occur at uniform intervals, such as once each day or each week, which constitutes a time-series design. The time-series design has been used in various basic science applications (Scollon, Kim-Pietro, & Diener, 2003) across nearly all subspecialties in psychology (e.g., Bolger et al., 2003; Piasecki et al., 2007; for a review, see Reis & Gable, 2000; Soliday et al., 2002). The basic time-series model for a two-phase (AB) data stream is presented in Equation 1:

y_i = α + S·δ_i + ε_i  (1)

In this formula, α represents the level (intercept) of the data stream during the baseline phase; S represents the change in level between the first and second phases; δ_i is a step function equal to 0 at times i = 1, 2, 3, …, n1 and 1 at times i = n1+1, n1+2, n1+3, …, n; n1 is the number of observations in the baseline phase; n is the total number of data points in the data stream; i indexes time; and ε_i = ρε_{i−1} + e_i, which expresses the relationship between the autoregressive function (ρ) and the distribution of the data in the stream.
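
To make this notation concrete, the following sketch simulates a two-phase (AB) data stream under the model in Equation 1. It is a minimal illustration: the phase lengths, parameter values, and variable names are assumptions chosen for the example, not values drawn from the reviewed studies.

```python
import numpy as np

rng = np.random.default_rng(0)

n1, n = 10, 20                   # baseline observations and total observations
alpha, S, rho = 3.0, 2.0, 0.4    # baseline level, level change, lag-1 autocorrelation

# Step function: 0 during the baseline phase (i = 1..n1), 1 thereafter.
delta = np.array([0] * n1 + [1] * (n - n1))

# First-order autoregressive errors: eps_i = rho * eps_(i-1) + e_i.
e = rng.normal(scale=1.0, size=n)
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = rho * eps[i - 1] + e[i]

y = alpha + S * delta + eps      # the observed two-phase data stream
```

Here ρ carries the serial dependence (autocorrelation) that distinguishes such data streams from independent observations.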

Time-series formulas become increasingly complex when seasonality and autoregressive processes are modeled in the analytic procedures, but these are rarely of concern for short time-series data streams in SCEDs. For a detailed description of other time-series design and analysis issues, see Borckardt et al. (2008) , Box and Jenkins (1970) , Crosbie (1993) , R. R. Jones et al. (1977) , and Velicer and Fava (2003) .

Time-series and other repeated-measures methodologies also enable examination of temporal effects. Borckardt et al. (2008) and others have noted that time-series designs have the potential to reveal how change occurs, not simply if it occurs. This distinction is what most interested Skinner (1938), but it often falls outside the purview of today’s researchers in favor of group designs, which Skinner felt obscured the process of change. In intervention and psychopathology research, time-series designs can assess mediators of change (Doss & Atkins, 2006), treatment processes (Stout, 2007; Tschacher & Ramseyer, 2009), and the relationship between psychological symptoms (e.g., Alloy, Just, & Panzarella, 1997; Hanson & Chen, 2010; Oslin, Cary, Slaymaker, Colleran, & Blow, 2009), and might be capable of revealing mechanisms of change (Kazdin, 2007, 2009, 2010). Between- and within-subject SCEDs with repeated measurements enable researchers to examine similarities and differences in the course of change, both during and as a result of manipulating an IV. Temporal effects have been largely overlooked in many areas of psychological science (Bolger et al., 2003): Examining temporal relationships is sorely needed to further our understanding of the etiology and amplification of numerous psychological phenomena.

Time-series studies were very infrequently found in this literature search (2%). Time-series studies traditionally occur in subfields of psychology in which single-case research is not often used (e.g., personality, physiological/biological). Recent advances in methods for collecting and analyzing time-series data (e.g., Borckardt et al., 2008 ) could expand the use of time-series methodology in the SCED community. One problem with drawing firm conclusions from this particular review finding is a semantic factor: Time-series is a specific term reserved for measurement occurring at a uniform interval. However, SCED research appears to not yet have adopted this language when referring to data collected in this fashion. When time-series data analytic methods are not used, the matter of measurement interval is of less importance and might not need to be specified or described as a time-series. An interesting extension of this work would be to examine SCED research that used time-series measurement strategies but did not label it as such. This is important because then it could be determined how many SCEDs could be analyzed with time-series statistical methods.

Daily diary and ecological momentary assessment methods

EMA and daily diary approaches represent methodological procedures for collecting repeated measurements in time-series and non-time-series experiments, which are also known as experience sampling. Presenting an in-depth discussion of the nuances of these sampling techniques is well beyond the scope of this paper. The reader is referred to the following review articles: daily diary ( Bolger et al., 2003 ; Reis & Gable, 2000 ; Thiele, Laireiter, & Baumann, 2002 ), and EMA ( Shiffman et al., 2008 ). Experience sampling in psychology has burgeoned in the past two decades as technological advances have permitted more precise and immediate reporting by participants (e.g., Internet-based, two-way pagers, cellular telephones, handheld computers) than do paper and pencil methods (for reviews see Barrett & Barrett, 2001 ; Shiffman & Stone, 1998 ). Both methods have practical limitations and advantages. For example, electronic methods are more costly and may exclude certain subjects from participating in the study, either because they do not have access to the necessary technology or they do not have the familiarity or savvy to successfully complete reporting. Electronic data collection methods enable the researcher to prompt responses at random or predetermined intervals and also accurately assess compliance. Paper and pencil methods have been criticized for their inability to reliably track respondents’ compliance: Palermo, Valenzuela, and Stork (2004) found better compliance with electronic diaries than with paper and pencil. On the other hand, Green, Rafaeli, Bolger, Shrout, & Reis (2006) demonstrated the psychometric data structure equivalence between these two methods, suggesting that the data collected in either method will yield similar statistical results given comparable compliance rates.

Daily diary/daily self-report and EMA measurement were somewhat rarely represented in this review, occurring in only 6.1% of the total studies. EMA methods had been used in only one of the reviewed studies. The recent proliferation of EMA and daily diary studies in psychology reported by others ( Bolger et al., 2003 ; Piasecki et al., 2007 ; Shiffman et al., 2008 ) suggests that these methods have not yet reached SCED researchers, which could in part have resulted from the long-held supremacy of observational measurement in fields that commonly practice single-case research.

Measurement Standards

As was previously mentioned, measurement in SCEDs requires the reliable assessment of change over time. As illustrated in Table 4 , DIV16 and the NRP explicitly require that reliability of all measures be reported. DIV12 provides little direction in the selection of the measurement instrument, except to require that three or more clinically important behaviors with relative independence be assessed. Similarly, the only item concerned with measurement on the Tate et al. scale specifies assessing behaviors consistent with the target of the intervention. The WWC and the Tate et al. scale require at least two independent assessors of the DV and that interrater reliability meeting minimum established thresholds be reported. Furthermore, WWC requires that interrater reliability be assessed on at least 20% of the data in each phase and in each condition. DIV16 expects that assessment of the outcome measures will be multisource and multimethod, when applicable. The interval of measurement is not specified by any of the reviewed sources. The WWC and the Tate et al. scale require that DVs be measured repeatedly across phases (e.g., baseline and treatment), which is a typical requirement of a SCED. The NRP asks that the time points at which DV measurement occurred be reported.
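
As an illustration of the interrater reliability indices these standards ask authors to report, the sketch below computes percent agreement and Cohen's kappa for two observers' session-by-session codes. The observer data and function names are hypothetical; the standards themselves do not prescribe a particular statistic or implementation.

```python
import numpy as np

def percent_agreement(r1, r2):
    """Proportion of observations on which the two observers' codes match."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    return float(np.mean(r1 == r2))

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two observers' categorical codes."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_obs = np.mean(r1 == r2)
    # Expected agreement under independence, from each observer's marginal rates.
    categories = np.union1d(r1, r2)
    p_exp = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)
    return float((p_obs - p_exp) / (1 - p_exp))

# Hypothetical occurrence/nonoccurrence codes from two observers for 10 sessions.
observer_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
observer_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
print(percent_agreement(observer_1, observer_2), cohens_kappa(observer_1, observer_2))
```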

The baseline measurement represents one of the most crucial design elements of the SCED. Because subjects provide their own data for comparison, gathering a representative, stable sampling of behavior before manipulating the IV is essential to accurately inferring an effect. Some researchers have reported the typical length of the baseline period to range from 3 to 12 observations in intervention research applications (e.g., Center et al., 1986; Huitema, 1985; R. R. Jones et al., 1977; Sharpley, 1987); Huitema’s (1985) review of 881 experiments published in the Journal of Applied Behavior Analysis resulted in a modal number of three to four baseline points. Center et al. (1986) suggested five as the minimum number of baseline measurements needed to accurately estimate autocorrelation. Longer baseline periods suggest a greater likelihood of a representative measurement of the DVs, which has been found to increase the validity of the effects and reduce bias resulting from autocorrelation (Huitema & McKean, 1994). The results of this review are largely consistent with those of previous researchers: The mean number of baseline observations was found to be 10.22 (SD = 9.59), and 6 was the modal number of observations. Baseline data were available in 77.8% of the reviewed studies. Although the baseline assessment has tremendous bearing on the results of a SCED study, it was often difficult to locate the exact number of data points. Similarly, the number of data points assessed across all phases of the study was not easily identified.

The WWC, DIV12, and DIV16 agree that a minimum of three data points during the baseline is necessary. However, to receive the highest rating by the WWC, five data points are necessary in each phase, including the baseline and any subsequent withdrawal baselines as would occur in a reversal design. DIV16 explicitly states that more than three points are preferred and further stipulates that the baseline must demonstrate stability (i.e., limited variability), absence of overlap between the baseline and other phases, absence of a trend, and that the level of the baseline measurement is severe enough to warrant intervention; each of these aspects of the data is important in inferential accuracy. Detrending techniques can be used to address baseline data trend. The integration option in ARIMA-based modeling and the empirical mode decomposition method ( Wu, Huang, Long, & Peng, 2007 ) are two sophisticated detrending techniques. In regression-based analytic methods, detrending can be accomplished by simply regressing each variable in the model on time (i.e., the residuals become the detrended series), which is analogous to adding a linear, exponential, or quadratic term to the regression equation.
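
The regression-based detrending just described can be illustrated with a short sketch: a series is regressed on a time index, and the residuals form the detrended series. The simulated data and variable names below are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated baseline series with a linear upward trend plus noise.
t = np.arange(12, dtype=float)             # time index
y = 2.0 + 0.5 * t + rng.normal(size=t.size)

# Regress y on time; the residuals form the detrended series.
X = np.column_stack([np.ones_like(t), t])  # intercept plus linear time term
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
detrended = y - X @ beta
```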

NRP does not provide a minimum number of data points, nor does the Tate et al. scale, which requires only a sufficient sampling of baseline behavior. Although the mean and modal numbers of baseline observations are well within these parameters, seven (1.7%) studies reported mean baselines of fewer than three data points.

Establishing a uniform minimum number of required baseline observations would provide researchers and reviewers with only a starting guide. The baseline phase is important in SCED research because it establishes a trend that can then be compared with that of subsequent phases. Although a minimum number of observations might be required to meet standards, many more might be necessary to establish a stable pattern when the baseline data are variable or show a trend in the direction of the expected effect. The selected data analytic approach also has some bearing on the number of necessary baseline observations. This is discussed further in the Analysis section.

Reporting of repeated measurements

Stone and Shiffman (2002) provide a comprehensive set of guidelines for the reporting of EMA data, which can also be applied to other repeated-measurement strategies. Because the application of EMA is widespread and not confined to specific research designs, Stone and Shiffman intentionally place few restraints on researchers regarding selection of the DV and the reporter, which is determined by the research question under investigation. The methods of measurement, however, are specified in detail: Descriptions of prompting, recording of responses, participant-initiated entries, and the data acquisition interface (e.g., paper and pencil diary, PDA, cellular telephone) ought to be provided with sufficient detail for replication. Because EMA specifically, and time-series/daily diary methods similarly, are primarily concerned with the interval of assessment, Stone and Shiffman suggest reporting the density and schedule of assessment. The approach is generally determined by the nature of the research question and pragmatic considerations, such as access to electronic data collection devices at certain times of the day and participant burden. Compliance and missing data concerns are present in any longitudinal research design, but they are of particular importance in repeated-measurement applications with frequent measurement. When the research question pertains to temporal effects, compliance becomes paramount, and timely, immediate responding is necessary. For this reason, compliance decisions, rates of missing data, and missing data management techniques must be reported. The effect of missing data in time-series data streams has been the topic of recent research in the social sciences (e.g., Smith, Borckardt, & Nash, in press ; Velicer & Colby, 2005a , 2005b ). The results and implications of these and other missing data studies are discussed in the next section.

Analysis of SCED Data

Visual analysis.

Experts in the field generally agree about the majority of critical single-case experiment design and measurement characteristics. Analysis, on the other hand, is an area of significant disagreement, yet it has also received extensive recent attention and advancement. Debate regarding the appropriateness and accuracy of various methods for analyzing SCED data, the interpretation of single-case effect sizes, and other concerns vital to the validity of SCED results has been ongoing for decades, and no clear consensus has been reached. Visual analysis, following systematic procedures such as those provided by Franklin, Gorman, Beasley, and Allison (1997) and Parsonson and Baer (1978), remains the standard by which SCED data are most commonly analyzed (Parker, Cryer, & Byrns, 2006). Visual analysis can arguably be applied to all SCEDs. However, a number of baseline data characteristics must be met for effects obtained through visual analysis to be valid and reliable. The baseline phase must be relatively stable; must be free of significant trend, particularly in the hypothesized direction of the effect; must have minimal overlap of data with subsequent phases; and must include a sufficient sampling of behavior to be considered representative (Franklin, Gorman, et al., 1997; Parsonson & Baer, 1978). A discussion of the effect of baseline trend on visual analysis, and a technique to control baseline trend, are offered by Parker et al. (2006). Kazdin (2010) suggests using statistical analysis when a trend or significant variability appears in the baseline phase, two conditions that ought to preclude the use of visual analysis techniques. Visual analysis methods are especially adept at determining intervention effects and can be of particular relevance in real-world applications (e.g., Borckardt et al., 2008; Kratochwill, Levin, Horner, & Swoboda, 2011).
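
As a rough illustration of the graphing conventions that support visual analysis, the sketch below plots an AB data stream with a vertical phase-change line and a horizontal mean-level line for each phase. The data are fabricated for the example, and the plotting choices are one plausible rendering rather than a prescribed format.

```python
import numpy as np
import matplotlib.pyplot as plt

baseline = np.array([4, 5, 4, 6, 5, 4])    # A phase (hypothetical data)
treatment = np.array([7, 8, 9, 8, 10, 9])  # B phase (hypothetical data)
y = np.concatenate([baseline, treatment])
x = np.arange(1, y.size + 1)               # session numbers

plt.plot(x, y, marker="o", color="black")
plt.axvline(baseline.size + 0.5, linestyle="--")         # phase-change line
plt.hlines(baseline.mean(), 1, baseline.size)            # baseline mean level
plt.hlines(treatment.mean(), baseline.size + 1, y.size)  # treatment mean level
plt.xlabel("Session")
plt.ylabel("DV")
plt.show()
```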

However, visual analysis has its detractors. It has been shown to be inconsistent, can be affected by autocorrelation, and can result in overestimation of effects (e.g., Matyas & Greenwood, 1990). Relying on visual analysis to estimate an effect precludes the results of SCED research from being included in meta-analyses and makes it very difficult to compare results with the effect sizes generated by other statistical methods. Yet visual analysis persists in large part because SCED researchers are familiar with these methods, are generally less familiar with statistical approaches, and lack agreement about the appropriateness of those approaches. Still, top experts in single-case analysis champion the use of statistical methods alongside visual analysis whenever it is appropriate to do so (Kratochwill et al., 2011).

Statistical analysis

Statistical analysis of SCED data generally attempts to address one or more of three broad research questions: (1) Does introduction/manipulation of the IV result in a statistically significant change in the level of the DV (level-change or phase-effect analysis)? (2) Does introduction/manipulation of the IV result in a statistically significant change in the slope of the DV over time (slope-change analysis)? and (3) Do meaningful relationships exist between the trajectory of the DV and other potential covariates? Level- and slope-change analyses are relevant to intervention effectiveness studies and other research questions in which the IV is expected to change the DV in a particular direction. Visual analysis methods are most adept at addressing research questions pertaining to changes in level and slope (Questions 1 and 2), most often using some form of graphical representation and a standardized computation of a mean level or trend line within and between each phase of interest (e.g., Horner & Spaulding, 2010 ; Kratochwill et al., 2011 ; Matyas & Greenwood, 1990 ). Research questions in other areas of psychological science might address the relationship between DVs or the slopes of DVs (Question 3). A number of sophisticated modeling approaches (e.g., cross-lag, multilevel, panel, growth mixture, latent class analysis) may be used for this type of question, and some are discussed in greater detail later in this section; however, a discussion of the nuances of this type of analysis and the many possible methods is well beyond the scope of this article.
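As an illustration of how Questions 1 and 2 can be framed statistically, the sketch below codes level change and slope change as a segmented regression on hypothetical AB data using Python's statsmodels. It is a minimal sketch rather than a method prescribed by any of the sources reviewed here, and it uses ordinary least squares for simplicity, thereby ignoring the autocorrelation issues discussed below.

```python
# Sketch: level- and slope-change coded as a segmented (interrupted time-series)
# regression on hypothetical AB data. The 'phase' coefficient estimates an abrupt
# level change at the phase change; 'time_since_change' estimates a change in slope.
# OLS assumes independent errors, an assumption questioned in the autocorrelation
# section that follows.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "y":     [7, 8, 6, 7, 9, 8, 5, 4, 4, 3, 3, 2, 2, 1],
    "time":  list(range(1, 15)),
    "phase": [0] * 6 + [1] * 8,            # 0 = baseline (A), 1 = treatment (B)
})
data["time_since_change"] = (data["time"] - 6).clip(lower=0)

fit = smf.ols("y ~ time + phase + time_since_change", data=data).fit()
print(fit.summary())   # 'phase' = level change; 'time_since_change' = slope change
```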

The statistical analysis of SCEDs is a contentious issue in the field. Not only is there no agreed-upon statistical method, but the practice of statistical analysis in the context of the SCED is viewed by some as unnecessary (see Shadish, Rindskopf, & Hedges, 2008 ). Historical trends in the prevalence of statistical analysis among SCED researchers are revealing: Busk and Marascuilo (1992) found that only 10% of the published single-case studies they reviewed used statistical analysis; Brossart, Parker, Olson, and Mahadevan (2006) estimated that this figure had roughly doubled by 2006. A range of concerns regarding single-case effect size calculation and interpretation is discussed in significant detail elsewhere (e.g., Campbell, 2004 ; Cohen, 1994 ; Ferron & Sentovich, 2002 ; Ferron & Ware, 1995 ; Kirk, 1996 ; Manolov & Solanas, 2008 ; Olive & Smith, 2005 ; Parker & Brossart, 2003 ; Robey et al., 1999 ; Smith et al., in press ; Velicer & Fava, 2003 ). One concern is the lack of a clearly superior method across datasets: Although statistical methods for analyzing SCEDs abound, few studies have examined their comparative performance on the same dataset. The most recent studies of this kind, performed by Brossart et al. (2006) , Campbell (2004) , Parker and Brossart (2003) , and Parker and Vannest (2009) , found that the more promising available statistical analysis methods yielded moderately different results on the same data series, which led them to conclude that each available method is equipped to adequately address only a relatively narrow spectrum of data. Given these findings, analysts need to select an appropriate model for the research questions and data structure, being mindful of how modeling results can be influenced by extraneous factors.

The current standards unfortunately provide little guidance in the way of statistical analysis options. This article presents an admittedly cursory introduction to available statistical methods; many others are not covered in this review. The following articles provide more in-depth discussion and description of other methods: Barlow et al. (2008) ; Franklin et al., (1997) ; Kazdin (2010) ; and Kratochwill and Levin (1992 , 2010 ). Shadish et al. (2008) summarize more recently developed methods. Similarly, a Special Issue of Evidence-Based Communication Assessment and Intervention (2008, Volume 2) provides articles and discussion of the more promising statistical methods for SCED analysis. An introduction to autocorrelation and its implications for statistical analysis is necessary before specific analytic methods can be discussed. It is also pertinent at this time to discuss the implications of missing data.

Autocorrelation

Many repeated measurements within a single subject or unit create a situation that most psychological researchers are unaccustomed to dealing with: autocorrelated data, that is, the nonindependence of sequential observations, also known as serial dependence. Basic and advanced discussions of autocorrelation in single-subject data can be found in Borckardt et al. (2008) , Huitema (1985) , and Marshall (1980) , and discussions of autocorrelation in multilevel models can be found in Snijders and Bosker (1999) and Diggle and Liang (2001) . Along with trend and seasonal variation, autocorrelation is one example of the internal structure of repeated measurements. In the social sciences, autocorrelated data occur most naturally in fields such as physiological psychology, econometrics, and finance, where each phase of interest can have hundreds or even thousands of observations packed tightly across time (e.g., electroencephalography data, actuarial data, financial market indices). Applied SCED research in most areas of psychology is more likely to involve measurements taken at daily, weekly, or hourly intervals.
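For readers unfamiliar with the construct, the lag-1 autocorrelation of a data stream can be estimated directly. The sketch below uses hypothetical data and Python's statsmodels, which is simply one of many tools that return the same quantity.

```python
# Sketch: estimating lag-1 autocorrelation (serial dependence) in a single data stream.
import numpy as np
from statsmodels.tsa.stattools import acf

y = np.array([7.0, 8, 6, 7, 9, 8, 9, 10, 9, 11, 10, 12])

lag1 = acf(y, nlags=1, fft=False)[1]
print(f"lag-1 autocorrelation estimate: {lag1:.2f}")

# A closely related "by hand" estimate: the correlation of the series with itself
# shifted by one observation. It can differ slightly from acf() because acf() uses
# the overall series mean and a biased denominator.
lag1_by_hand = np.corrcoef(y[:-1], y[1:])[0, 1]
```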

Autocorrelation is a direct result of the repeated-measurement requirements of the SCED, but its effect is most noticeable and problematic when one is attempting to analyze these data. Many commonly used data analytic approaches, such as analysis of variance, assume independence of observations and can produce spurious results when the data are nonindependent. Even statistically insignificant autocorrelation estimates are generally viewed as sufficient to cause inferential bias when conventional statistics are used (e.g., Busk & Marascuilo, 1988 ; R. R. Jones et al., 1977 ; Matyas & Greenwood, 1990 ). The effect of autocorrelation on statistical inference in single-case applications has also been known for quite some time (e.g., R. R. Jones et al., 1977 ; Kanfer, 1970 ; Kazdin, 1981 ; Marshall, 1980 ). The findings of recent simulation studies of single-subject data streams indicate that autocorrelation is a nontrivial matter. For example, Manolov and Solanas (2008) determined that calculated effect sizes were linearly related to the autocorrelation of the data stream, and Smith et al. (in press) demonstrated that autocorrelation estimates in the vicinity of 0.80 negatively affect the ability to correctly infer a significant level-change effect using a standardized mean differences method. Huitema and colleagues (e.g., Huitema, 1985 ; Huitema & McKean, 1994 ) argued that autocorrelation is rarely a concern in applied research. Huitema’s methods and conclusions have been questioned and opposing data have been published (e.g., Allison & Gorman, 1993 ; Matyas & Greenwood, 1990 ; Robey et al., 1999 ), resulting in abandonment of the position that autocorrelation can be conscionably ignored without compromising the validity of the statistical procedures. Procedures for removing autocorrelation in the data stream prior to calculating effect sizes are offered as one option: One of the more promising analysis methods, autoregressive integrated moving averages (discussed later in this article), was specifically designed to remove the internal structure of time-series data, such as autocorrelation, trend, and seasonality ( Box & Jenkins, 1970 ; Tiao & Box, 1981 ).

Missing observations

Another concern inherent in repeated-measures designs is missing data. Daily diary and EMA methods are intended to reduce the risk of retrospection error by eliciting accurate, real-time information ( Bolger et al., 2003 ). However, these methods are subject to missing data as a result of honest forgetfulness, not possessing the diary collection tool at the specified time of collection, and intentional or systematic noncompliance. With paper and pencil diaries and some electronic methods, subjects might be able to complete missed entries retrospectively, defeating the temporal benefits of these assessment strategies ( Bolger et al., 2003 ). Methods of managing noncompliance through the study design and measurement methods include training the subject to use the data collection device appropriately, using technology to prompt responding and track the time of response, and providing incentives to participants for timely compliance (for additional discussion of this topic, see Bolger et al., 2003 ; Shiffman & Stone, 1998 ).

Even when efforts are made to maximize compliance during the conduct of the research, the problem of missing data is often unavoidable. Numerous approaches exist for handling missing observations in group multivariate designs (e.g., Horton & Kleinman, 2007 ; Ibrahim, Chen, Lipsitz, & Herring, 2005 ). Ragunathan (2004) and others concluded that full information and raw data maximum likelihood methods are preferable. Velicer and Colby (2005a , 2005b ) established the superiority of maximum likelihood methods over listwise deletion, mean of adjacent observations, and series mean substitution in the estimation of various critical time-series data parameters. Smith et al. (in press) extended these findings regarding the effect of missing data on inferential precision. They found that managing missing data with the EM procedure ( Dempster, Laird, & Rubin, 1977 ), a maximum likelihood algorithm, did not affect one’s ability to correctly infer a significant effect. However, lag-1 autocorrelation estimates in the vicinity of 0.80 resulted in insufficient power sensitivity (< 0.80), regardless of the proportion of missing data (10%, 20%, 30%, or 40%). 1 Although maximum likelihood methods have garnered some empirical support, methodological strategies that minimize missing data, particularly systematically missing data, are preferable to post hoc statistical remedies.
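One concrete route, sketched below under the assumption of a reasonably recent statsmodels release (whose state-space models accept missing entries), is to estimate time-series parameters by maximum likelihood via the Kalman filter rather than deleting incomplete rows. This is an illustration of the general maximum-likelihood strategy discussed above, not the specific EM procedure used by Smith et al. (in press).

```python
# Sketch: maximum-likelihood handling of missing observations in a single data stream.
# The state-space AR(1) model retains the NaN entries and handles them through the
# Kalman filter; listwise deletion is shown only as the contrast to avoid, because it
# collapses the time spacing and distorts time-series parameter estimates
# (Velicer & Colby, 2005a).
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

y = np.array([7, 8, np.nan, 7, 9, 8, np.nan, 10, 9, 11, np.nan, 12], dtype=float)

ml_fit = SARIMAX(y, order=(1, 0, 0), trend="c").fit(disp=False)
print(ml_fit.params)          # AR(1) and level estimated with every observation retained

listwise = y[~np.isnan(y)]    # the practice to avoid: gaps are silently closed
```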

Nonnormal distribution of data

In addition to the autocorrelated nature of SCED data, typical measurement methods also present analytic challenges. Many statistical methods, particularly those involving model finding, assume that the data are normally distributed. This assumption is often not satisfied in SCED research, in which measurements frequently involve count data, observer-rated behaviors, and other, similar metrics that result in skewed distributions. Techniques are available to manage nonnormal distributions in regression-based analysis, such as zero-inflated Poisson regression ( D. Lambert, 1992 ) and negative binomial regression ( Gardner, Mulvey, & Shaw, 1995 ), but many other statistical analysis methods do not include these sophisticated techniques. A skewed data distribution is perhaps one of the reasons Kazdin (2010) suggests not using count, categorical, or ordinal measurement methods.
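As one illustration of these regression-based techniques, the sketch below fits a negative binomial model to hypothetical count data using Python's statsmodels. It is a simplified stand-in for the approaches described by Gardner et al. (1995); the dispersion parameter is left at the library default rather than estimated.

```python
# Sketch: modeling a skewed count outcome (e.g., problem behaviors per session)
# with a negative binomial GLM instead of a normal-theory analysis. Hypothetical
# data; the dispersion parameter is left at statsmodels' default rather than estimated.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "count": [9, 7, 8, 10, 6, 3, 2, 1, 2, 0, 1, 0],
    "phase": [0] * 5 + [1] * 7,   # 0 = baseline, 1 = treatment
})

nb_fit = smf.glm("count ~ phase", data=data,
                 family=sm.families.NegativeBinomial()).fit()
print(nb_fit.summary())
```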

Available statistical analysis methods

Following is a basic introduction to the more promising and prevalent analytic methods for SCED research. Because there is little consensus regarding the superiority of any single method, the burden unfortunately falls on the researcher to select a method capable of addressing the research question and handling the data involved in the study. Some indications and contraindications are provided for each method presented here.

Multilevel and structural equation modeling

Multilevel modeling (MLM; e.g., Schmidt, Perels, & Schmitz, 2010 ) techniques represent the state of the art among parametric approaches to SCED analysis, particularly when synthesizing SCED results ( Shadish et al., 2008 ). MLM and related latent growth curve and factor mixture methods in structural equation modeling (SEM; e.g., Lubke & Muthén, 2005 ; B. O. Muthén & Curran, 1997 ) are particularly effective for evaluating trajectories and slopes in longitudinal data and relating changes to potential covariates. MLM and related hierarchical linear models (HLM) can also illuminate the relationship between the trajectories of different variables under investigation and clarify whether or not these relationships differ amongst the subjects in the study. Time-series and cross-lag analyses can also be used in MLM and SEM ( Chow, Ho, Hamaker, & Dolan, 2010 ; du Toit & Browne, 2007 ). However, they generally require sophisticated model-fitting techniques, making them difficult for many social scientists to implement. The structure (autocorrelation) and trend of the data can also complicate many MLM methods. The short data streams and small numbers of subjects common in SCED research present further problems for MLM and SEM approaches, which were developed for model fitting with either many observations per subject when subjects are few, or many subjects when observations per subject are few. Still, MLM and related techniques arguably represent the most promising analytic methods.

A number of software options 2 exist for SEM. Popular statistical packages in the social sciences provide SEM options, such as PROC CALIS in SAS ( SAS Institute Inc., 2008 ), the AMOS module ( Arbuckle, 2006 ) of SPSS ( SPSS Statistics, 2011 ), and the sem package for R ( R Development Core Team, 2005 ), the use of which is described by Fox (2006) . A number of stand-alone software options are also available for SEM applications, including Mplus ( L. K. Muthén & Muthén, 2010 ) and Stata ( StataCorp., 2011 ). Each of these programs also provides options for estimating multilevel/hierarchical models (for a review of using these programs for MLM analysis, see Albright & Marinova, 2010 ). Hierarchical linear and nonlinear modeling can also be accomplished using the HLM 7 program ( Raudenbush, Bryk, & Congdon, 2011 ).
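Outside the dedicated programs listed above, the general shape of a two-level model for SCED-style data can also be sketched in open-source tools. The example below uses Python's statsmodels with a hypothetical long-format file (the file name and the subject, time, phase, and outcome column names are assumptions); it illustrates the structure of a random-intercept, random-slope model rather than reproducing any cited program's defaults.

```python
# Sketch: a two-level model for repeated observations nested within subjects
# (hypothetical multiple-baseline-style data in long format). Random intercepts
# and random slopes for time let trajectories differ across subjects; 'phase'
# captures the within-subject intervention effect.
import pandas as pd
import statsmodels.formula.api as smf

long_data = pd.read_csv("sced_long_format.csv")   # hypothetical file with columns:
                                                  # subject, time, phase, y

mlm_fit = smf.mixedlm("y ~ time + phase",
                      data=long_data,
                      groups=long_data["subject"],
                      re_formula="~time").fit()
print(mlm_fit.summary())
```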

Autoregressive moving averages (ARMA; e.g., Browne & Nesselroade, 2005 ; Liu & Hudack, 1995 ; Tiao & Box, 1981 )

Two primary concerns have been raised regarding ARMA modeling: the length of the data stream and the feasibility of the modeling technique. ARMA models generally require 30–50 observations in each phase when analyzing a single-subject experiment (e.g., Borckardt et al., 2008 ; Box & Jenkins, 1970 ), a requirement that is often difficult to satisfy in applied psychological research. However, ARMA models in an SEM framework, such as those described by du Toit and Browne (2001) , are well suited for longitudinal panel data with few observations and many subjects. Autoregressive SEM models are also applicable under similar conditions. Model-fitting options are available in SPSS, R, and SAS (e.g., PROC ARIMA).

ARMA modeling also requires considerable training in the method and rather advanced knowledge about statistical methods (e.g., Kratochwill & Levin, 1992 ). However, Brossart et al. (2006) point out that ARMA-based approaches can produce excellent results when there is no “model finding” and a simple lag-1 model, with no differencing and no moving average, is used. This approach can be taken for many SCED applications when phase- or slope-change analyses are of interest with a single, or very few, subjects. As already mentioned, this method is particularly useful when one is seeking to account for autocorrelation or other over-time variations that are not directly related to the experimental or intervention effect of interest (i.e., detrending). ARMA and other time-series analysis methods require missing data to be managed prior to analysis by means of options such as full information maximum likelihood estimation, multiple imputation, or the Kalman filter (see Box & Jenkins, 1970 ; Hamilton, 1994 ; Shumway & Stoffer, 1982 ) because listwise deletion has been shown to result in inaccurate time-series parameter estimates ( Velicer & Colby, 2005a ).
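For a single short series, the simple lag-1 specification described by Brossart et al. (2006) can be written compactly. The sketch below uses hypothetical data and Python's statsmodels, one of several packages that fit such models.

```python
# Sketch: the simple lag-1 model described above, i.e., autoregressive order 1,
# no differencing, and no moving-average term: ARIMA order (1, 0, 0).
# Hypothetical data; the AR(1) coefficient summarizes the serial dependence
# that would otherwise bias conventional statistics.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.array([7.0, 8, 6, 7, 9, 8, 9, 10, 9, 11, 10, 12])

ar1_fit = ARIMA(y, order=(1, 0, 0)).fit()
print(ar1_fit.summary())
```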

Standardized mean differences

Standardized mean differences approaches include the common Cohen’s d, Glass’s delta, and Hedges’ g that are used in the analysis of group designs. The computational properties of mean differences approaches to SCEDs are identical to those used for group comparisons, except that the results represent within-case variation instead of the variation between groups, which suggests that the obtained effect sizes are not interpretively equivalent. The advantage of the mean differences approach is its simplicity of calculation and its familiarity to social scientists. The primary drawback of these approaches is that they were not developed to contend with autocorrelated data. However, Manolov and Solanas (2008) reported that autocorrelation least affected effect sizes calculated using standardized mean differences approaches. To the applied-research scientist this likely represents the most accessible analytic approach, because statistical software is not required to calculate these effect sizes. The resultant effect sizes of single-subject standardized mean differences analyses must be interpreted cautiously because their relation to standard effect size benchmarks, such as those provided by Cohen (1988) , is unknown. Standardized mean differences approaches are appropriate only when examining significant differences between phases of the study and cannot illuminate trajectories or relationships between variables.
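A minimal sketch of the within-case computation, using hypothetical data and a pooled standard deviation as in Cohen's d, illustrates why no specialized software is required; the caveat about benchmarks above applies to the resulting value.

```python
# Sketch: a within-case standardized mean difference between baseline and treatment
# phases (hypothetical data). The formula mirrors Cohen's d for two groups, but the
# result reflects within-case variation and should not be interpreted against
# conventional group-design benchmarks.
import numpy as np

baseline = np.array([7.0, 8, 6, 7, 9, 8])
treatment = np.array([5.0, 4, 4, 3, 3, 2, 2, 1])

pooled_sd = np.sqrt(
    ((len(baseline) - 1) * baseline.var(ddof=1) +
     (len(treatment) - 1) * treatment.var(ddof=1)) /
    (len(baseline) + len(treatment) - 2)
)
d = (treatment.mean() - baseline.mean()) / pooled_sd
print(f"within-case standardized mean difference: {d:.2f}")
```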

Other analytic approaches

Researchers have offered other analytic methods to deal with the characteristics of SCED data, and a number of methods for analyzing N-of-1 experiments have been developed. Borckardt’s Simulation Modeling Analysis (SMA; Borckardt, 2006 ) program provides a method for analyzing level- and slope-change in short (<30 observations per phase; see Borckardt et al., 2008 ), autocorrelated data streams that is statistically sophisticated yet accessible and freely available to typical psychological scientists and clinicians. A replicated single-case time-series design conducted by Smith, Handler, and Nash (2010) provides an example of SMA application. The Singwin package, described in Bloom et al. (2003) , is another easy-to-use parametric approach for analyzing single-case experiments. A number of nonparametric approaches that emerged from the visual analysis tradition have also been developed: Examples include percent nonoverlapping data ( Scruggs, Mastropieri, & Casto, 1987 ) and nonoverlap of all pairs ( Parker & Vannest, 2009 ); however, these methods have come under scrutiny, and Wolery, Busick, Reichow, and Barton (2010) have suggested abandoning them altogether. Each of these methods appears to be well suited for managing specific data characteristics, but they should not be used to analyze data streams beyond their intended purpose until additional empirical research is conducted.
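Because the two nonoverlap indices named above are defined simply, they can be computed in a few lines. The sketch below uses hypothetical data in which lower scores represent improvement and is provided for illustration only, given the concerns raised by Wolery et al. (2010).

```python
# Sketch: two nonoverlap indices computed on hypothetical AB data where lower
# scores are the desired direction of change.
import numpy as np

baseline = np.array([7, 8, 6, 7, 9, 8])
treatment = np.array([5, 4, 4, 3, 3, 2, 2, 1])

# Percent nonoverlapping data (PND): share of treatment points beyond the most
# extreme baseline point in the direction of improvement (here, below the minimum).
pnd = 100.0 * np.mean(treatment < baseline.min())

# Nonoverlap of all pairs (NAP): share of all baseline-treatment pairs showing
# improvement, with ties counted as half.
pairs = [(b, t) for b in baseline for t in treatment]
nap = np.mean([1.0 if t < b else 0.5 if t == b else 0.0 for b, t in pairs])

print(f"PND = {pnd:.1f}%, NAP = {nap:.2f}")
```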

Combining SCED Results

Beyond the issue of single-case analysis is the matter of integrating and meta-analyzing the results of single-case experiments. SCEDs have been given short shrift in the majority of the meta-analytic literature ( Littell, Corcoran, & Pillai, 2008 ; Shadish et al., 2008 ), with only a few exceptions ( Carr et al., 1999 ; Horner & Spaulding, 2010 ). Currently, few proven methods exist for integrating the results of multiple single-case experiments. Allison and Gorman (1993) and Shadish et al. (2008) present the problems associated with meta-analyzing single-case effect sizes, and W. P. Jones (2003) , Manolov and Solanas (2008) , Scruggs and Mastropieri (1998) , and Shadish et al. (2008) offer four different potential statistical solutions to this problem, none of which appears to have achieved consensus amongst researchers. The ability to synthesize single-case effect sizes, and particularly to compare them with effect sizes garnered through group design research, is undoubtedly necessary for SCED research to proliferate.

Discussion of Review Results and Coding of Analytic Methods

The coding criteria for this review were quite stringent in terms of what was considered to be either visual or statistical analysis. For visual analysis to be coded as present, it was necessary for the authors to self-identify as having used a visual analysis method. In many cases it could likely be inferred that visual analysis had been used, but it was often not specified. Similarly, statistical analysis was reserved for analytic methods that produced an effect estimate. 3 Analyses that involved comparing magnitude of change using raw count data or percentages were not considered rigorous enough. These two narrow definitions of visual and statistical analysis contributed to the high rate of unreported analytic method shown in Table 1 (52.3%). A better representation of the use of visual and statistical analysis is likely the percentage of studies among those that reported a method of analysis: Under these parameters, 41.5% used visual analysis and 31.3% used statistical analysis. Included in these figures are studies that used both visual and statistical methods (11%). The statistical analysis figure is somewhat higher than the roughly 20% of SCED studies estimated by Brossart et al. (2006) . Visual analysis undoubtedly remains the most prevalent method, but there appears to be a trend toward increased use of statistical approaches, one that is likely to gain momentum as analytic innovations continue.

Analysis Standards

The standards selected for inclusion in this review offer minimal direction in the way of analyzing the results of SCED research. Table 5 summarizes analysis-related information provided by the six reviewed sources for SCED standards. Visual analysis is acceptable to DIV12 and DIV16, along with unspecified statistical approaches. In the WWC standards, visual analysis is the acceptable method of determining an intervention effect, with statistical analyses and randomization tests permissible as complementary or supporting methods to the results of visual analysis. However, the authors of the WWC standards state, “As the field reaches greater consensus about appropriate statistical analyses and quantitative effect-size measures, new standards for effect demonstration will need to be developed” ( Kratochwill et al., 2010 , p. 16). The NRP and DIV12 seem to prefer statistical methods when they are warranted. The Tate et al. scale accepts only statistical analysis with the reporting of an effect size. Only the WWC and DIV16 provide guidance in the use of statistical analysis procedures: The WWC “recommends” nonparametric and parametric approaches, multilevel modeling, and regression when statistical analysis is used, and DIV16 refers the reader to Wilkinson and the Task Force on Statistical Inference of the APA Board of Scientific Affairs (1999) for direction in this matter. Statistical analysis of daily diary and EMA methods is similarly unsettled. Stone and Shiffman (2002) ask for a detailed description of the statistical procedures used, in order for the approach to be replicated and evaluated. They provide direction for analyzing aggregated and disaggregated data, and they aptly note that because many different modes of analysis exist, researchers must carefully match the analytic approach to the hypotheses being pursued.

Limitations and Future Directions

This review has a number of limitations that leave the door open for future study of SCED methodology. Publication bias is a concern in any systematic review. This is particularly true for this review because the search was limited to articles published in peer-reviewed journals. This strategy was chosen in order to inform changes in the practice of reporting and of reviewing, but it also is likely to have inflated the findings regarding the methodological rigor of the reviewed works. Inclusion of book chapters, unpublished studies, and dissertations would likely have yielded somewhat different results.

A second concern is the stringent coding criteria in regard to the analytic methods and the broad categorization into visual and statistical analytic approaches. The selection of an appropriate method for analyzing SCED data is perhaps the murkiest area of this type of research. Future reviews that evaluate the appropriateness of selected analytic strategies and provide specific decision-making guidelines for researchers would be a very useful contribution to the literature. Although six sources of standards apply to SCED research reviewed in this article, five of them were developed almost exclusively to inform psychological and behavioral intervention research. The principles of SCED research remain the same in different contexts, but there is a need for non–intervention scientists to weigh in on these standards.

Finally, this article provides a first step in the synthesis of the available SCED reporting guidelines. However, it does not resolve disagreements, nor does it purport to be a definitive source. In the future, an entity with the authority to construct such a document ought to convene and establish a foundational, adaptable, and agreed-upon set of guidelines that cuts across subspecialties but is applicable to many, if not all, areas of psychological research, which is perhaps an idealistic goal. Certain preferences will undoubtedly continue to dictate what constitutes acceptable practice in each subspecialty of psychology, but uniformity along critical dimensions will help advance SCED research.

Conclusions

The first decade of the twenty-first century has seen an upwelling of SCED research across nearly all areas of psychology. This article contributes updated benchmarks in terms of the frequency with which SCED design and methodology characteristics are used, including the number of baseline observations, assessment and measurement practices, and data analytic approaches, most of which are largely consistent with previously reported benchmarks. However, this review is much broader than those of previous research teams and also breaks down the characteristics of single-case research by the predominant design. With the recent SCED proliferation came a number of standards for the conduct and reporting of such research. This article also provides a much-needed synthesis of recent SCED standards that can inform the work of researchers, reviewers, and funding agencies conducting and evaluating single-case research, which reveals many areas of consensus as well as areas of significant disagreement. It appears that the question of where to go next is very relevant at this point in time. The majority of the research design and measurement characteristics of the SCED are reasonably well established, and the results of this review suggest general practice that is in accord with existing standards and guidelines, at least in regard to published peer-reviewed works. In general, the published literature appears to be meeting the basic design and measurement requirement to ensure adequate internal validity of SCED studies.

The lack of consensus regarding the superiority of any one analytic method stands out as an area of divergence. Judging by the current literature, researchers will need to carefully select a method that matches the research design, hypotheses, and intended conclusions of the study, while also considering the most up-to-date empirical support for the chosen analytic method, whether it be visual or statistical. In some cases the number of observations and subjects in the study will dictate which analytic methods can and cannot be used. In the case of the true N-of-1 experiment, there are relatively few sound analytic methods, and even fewer that are robust with shorter data streams (see Borckardt et al., 2008 ). As the number of observations and subjects increases, sophisticated modeling techniques, such as MLM, SEM, and ARMA, become applicable. Trends in the data and autocorrelation further complicate the development of a clear statistical analysis selection algorithm, which currently does not exist. Autocorrelation was rarely addressed or discussed in the articles reviewed, except when the selected statistical analysis dictated its consideration. Given the empirical evidence regarding the effect of autocorrelation on visual and statistical analysis, researchers need to address this issue more explicitly. Missing-data considerations are similarly omitted when the chosen analytic method does not require them. As newly devised statistical analysis approaches mature and are compared with one another for appropriateness in specific SCED applications, guidelines for statistical analysis will necessarily be revised. Similarly, empirically derived guidance, in the form of a decision tree, must be developed to ensure the application of appropriate methods based on the characteristics of the data and the research questions being addressed. Researchers could also benefit from tutorials and comparative reviews of different software packages; this is a needed area of future research. Powerful and reliable statistical analyses would help move the SCED up the ladder of experimental designs and attenuate the view that the method applies primarily to pilot studies and idiosyncratic research questions and situations.

Another potential future advancement of SCED research comes in the area of measurement. Currently, SCED research gives significant weight to observer ratings and seems to discourage other forms of data collection methods. This is likely due to the origins of the SCED in behavioral assessment and applied behavior analysis, which remains a present-day stronghold. The dearth of EMA and diary-like sampling procedures within the SCED research reviewed, yet their ever-growing prevalence in the larger psychological research arena, highlights an area for potential expansion. Observational measurement, although reliable and valid in many contexts, is time and resource intensive and not feasible in all areas in which psychologists conduct research. It seems that numerous untapped research questions are stifled because of this measurement constraint. SCED researchers developing updated standards in the future should include guidelines for the appropriate measurement requirement of non-observer-reported data. For example, the results of this review indicate that reporting of repeated measurements, particularly the high-density type found in diary and EMA sampling strategies, ought to be more clearly spelled out, with specific attention paid to autocorrelation and trend in the data streams. In the event that SCED researchers adopt self-reported assessment strategies as viable alternatives to observation, a set of standards explicitly identifying the necessary psychometric properties of the measures and specific items used would be in order.

Along similar lines, SCED researchers could take a page from other areas of psychology that champion multimethod and multisource evaluation of primary outcomes. In this way, the long-standing tradition of observational assessment and the cutting-edge technological methods of EMA and daily diary could be married with the goal of strengthening conclusions drawn from SCED research and enhancing the validity of self-reported outcome assessment. The results of this review indicate that they rarely intersect today, and I urge SCED researchers to adopt other methods of assessment informed by time-series, daily diary, and EMA methods. The EMA standards could serve as a jumping-off point for refined measurement and assessment reporting standards in the context of multimethod SCED research.

One limitation of the current SCED standards is their relatively limited scope. With the exception of the Stone and Shiffman EMA reporting guidelines, the other five sources of standards were developed in the context of designing and evaluating intervention research. Although intervention research is likely to remain the primary emphasis, SCEDs are capable of addressing other pertinent research questions in the psychological sciences, and the current standards only roughly approximate the salient crosscutting SCED characteristics. I propose developing broad SCED guidelines that address specific design, measurement, and analysis issues in a manner that makes them useful across applications, rather than focusing solely on intervention effects. To accomplish this task, methodology experts across subspecialties in psychology would need to convene; admittedly, this is no small task.

Perhaps funding agencies will also recognize the fiscal and practical advantages of SCED research in certain areas of psychology. One example is in the field of intervention effectiveness, efficacy, and implementation research. A few exemplary studies using robust forms of SCED methodology are needed in the literature. Case-based methodologies will never supplant the group design as the gold standard in experimental applications, nor should that be the goal. Instead, SCEDs provide a viable and valid alternative experimental methodology that could stimulate new areas of research and answer questions that group designs cannot. With the astonishing number of studies emerging every year that use single-case designs and explore the methodological aspects of the design, we are poised to witness and be a part of an upsurge in the sophisticated application of the SCED. When federal grant-awarding agencies and journal editors begin to use formal standards while making funding and publication decisions, the field will benefit.

Last, for the practice of SCED research to continue and mature, graduate training programs must provide students with instruction in all areas of the SCED. This is particularly true of statistical analysis techniques that are not often taught in departments of psychology and education, where the vast majority of SCED studies seem to be conducted. It is quite the conundrum that the best available statistical analytic methods are often cited as being inaccessible to social science researchers who conduct this type of research. This need not be the case. To move the field forward, emerging scientists must be able to apply the most state-of-the-art research designs, measurement techniques, and analytic methods.

Acknowledgments

Research support for the author was provided by research training grant MH20012 from the National Institute of Mental Health, awarded to Elizabeth A. Stormshak. The author gratefully acknowledges Robert Horner and Laura Lee McIntyre, University of Oregon; Michael Nash, University of Tennessee; John Ferron, University of South Florida; the Action Editor, Lisa Harlow, and the anonymous reviewers for their thoughtful suggestions and guidance in shaping this article; Cheryl Mikkola for her editorial support; and Victoria Mollison for her assistance in the systematic review process.

Appendix. Results of Systematic Review Search and Studies Included in the Review

PsycINFO search conducted July 2011.

  • Alternating treatment design
  • Changing criterion design
  • Experimental case*
  • Multiple baseline design
  • Replicated single-case design
  • Simultaneous treatment design
  • Time-series design
  • Quantitative study OR treatment outcome/randomized clinical trial
  • NOT field study OR interview OR focus group OR literature review OR systematic review OR mathematical model OR qualitative study
  • Publication range: 2000–2010
  • Published in peer-reviewed journals
  • Available in the English Language

Bibliography

(* indicates inclusion in study: N = 409)

1 Autocorrelation estimates in this range can be caused by trends in the data streams, which creates complications in terms of detecting level-change effects. The Smith et al. (in press) study used a Monte Carlo simulation to control for trends in the data streams, but trends are likely to exist in real-world data with high lag-1 autocorrelation estimates.

2 The author makes no endorsement regarding the superiority of any statistical program or package over another by their mention or exclusion in this article. The author also has no conflicts of interest in this regard.

3 However, it should be noted that it was often very difficult to locate an actual effect size reported in studies that used statistical analysis. Although this issue would likely have added little to this review, it does inhibit the inclusion of the results in meta-analysis.

  • Albright JJ, Marinova DM. Estimating multilevel models using SPSS, Stata, and SAS. Indiana University; 2010. Retrieved from http://www.iub.edu/%7Estatmath/stat/all/hlm/hlm.pdf . [ Google Scholar ]
  • Allison DB, Gorman BS. Calculating effect sizes for meta-analysis: The case of the single case. Behavior Research and Therapy. 1993; 31 (6):621–631. doi: 10.1016/0005-7967(93)90115-B. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Alloy LB, Just N, Panzarella C. Attributional style, daily life events, and hopelessness depression: Subtype validation by prospective variability and specificity of symptoms. Cognitive Therapy Research. 1997; 21 :321–344. doi: 10.1023/A:1021878516875. [ CrossRef ] [ Google Scholar ]
  • Arbuckle JL. Amos (Version 7.0) Chicago, IL: SPSS, Inc; 2006. [ Google Scholar ]
  • Barlow DH, Nock MK, Hersen M. Single case research designs: Strategies for studying behavior change. 3. New York, NY: Allyn and Bacon; 2008. [ Google Scholar ]
  • Barrett LF, Barrett DJ. An introduction to computerized experience sampling in psychology. Social Science Computer Review. 2001; 19 (2):175–185. doi: 10.1177/089443930101900204. [ CrossRef ] [ Google Scholar ]
  • Bloom M, Fisher J, Orme JG. Evaluating practice: Guidelines for the accountable professional. 4. Boston, MA: Allyn & Bacon; 2003. [ Google Scholar ]
  • Bolger N, Davis A, Rafaeli E. Diary methods: Capturing life as it is lived. Annual Review of Psychology. 2003; 54 :579–616. doi: 10.1146/annurev.psych.54.101601.145030. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Borckardt JJ. Simulation Modeling Analysis: Time series analysis program for short time series data streams (Version 8.3.3) Charleston, SC: Medical University of South Carolina; 2006. [ Google Scholar ]
  • Borckardt JJ, Nash MR, Murphy MD, Moore M, Shaw D, O’Neil P. Clinical practice as natural laboratory for psychotherapy research. American Psychologist. 2008; 63 :1–19. doi: 10.1037/0003-066X.63.2.77. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Borsboom D, Mellenbergh GJ, van Heerden J. The theoretical status of latent variables. Psychological Review. 2003; 110 (2):203–219. doi: 10.1037/0033-295X.110.2.203. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bower GH. Mood and memory. American Psychologist. 1981; 36 (2):129–148. doi: 10.1037/0003-066x.36.2.129. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Box GEP, Jenkins GM. Time-series analysis: Forecasting and control. San Francisco, CA: Holden-Day; 1970. [ Google Scholar ]
  • Brossart DF, Parker RI, Olson EA, Mahadevan L. The relationship between visual analysis and five statistical analyses in a simple AB single-case research design. Behavior Modification. 2006; 30 (5):531–563. doi: 10.1177/0145445503261167. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Browne MW, Nesselroade JR. Representing psychological processes with dynamic factor models: Some promising uses and extensions of autoregressive moving average time series models. In: Maydeu-Olivares A, McArdle JJ, editors. Contemporary psychometrics: A festschrift for Roderick P McDonald. Mahwah, NJ: Lawrence Erlbaum Associates Publishers; 2005. pp. 415–452. [ Google Scholar ]
  • Busk PL, Marascuilo LA. Statistical analysis in single-case research: Issues, procedures, and recommendations, with applications to multiple behaviors. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ, England: Lawrence Erlbaum Associates, Inc; 1992. pp. 159–185. [ Google Scholar ]
  • Busk PL, Marascuilo RC. Autocorrelation in single-subject research: A counterargument to the myth of no autocorrelation. Behavioral Assessment. 1988; 10 :229–242. [ Google Scholar ]
  • Campbell JM. Statistical comparison of four effect sizes for single-subject designs. Behavior Modification. 2004; 28 (2):234–246. doi: 10.1177/0145445503259264. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Carr EG, Horner RH, Turnbull AP, Marquis JG, Magito McLaughlin D, McAtee ML, Doolabh A. Positive behavior support for people with developmental disabilities: A research synthesis. Washington, DC: American Association on Mental Retardation; 1999. [ Google Scholar ]
  • Center BA, Skiba RJ, Casey A. A methodology for the quantitative synthesis of intra-subject design research. Journal of Educational Science. 1986; 19 :387–400. doi: 10.1177/002246698501900404. [ CrossRef ] [ Google Scholar ]
  • Chambless DL, Hollon SD. Defining empirically supported therapies. Journal of Consulting and Clinical Psychology. 1998; 66 (1):7–18. doi: 10.1037/0022-006X.66.1.7. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chambless DL, Ollendick TH. Empirically supported psychological interventions: Controversies and evidence. Annual Review of Psychology. 2001; 52 :685–716. doi: 10.1146/annurev.psych.52.1.685. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chow S-M, Ho M-hR, Hamaker EL, Dolan CV. Equivalence and differences between structural equation modeling and state-space modeling techniques. Structural Equation Modeling. 2010; 17 (2):303–332. doi: 10.1080/10705511003661553. [ CrossRef ] [ Google Scholar ]
  • Cohen J. Statistical power analysis for the behavioral sciences. 2. Hillsdale, NJ: Erlbaum; 1988. [ Google Scholar ]
  • Cohen J. The earth is round (p < .05) American Psychologist. 1994; 49 :997–1003. doi: 10.1037/0003-066X.49.12.997. [ CrossRef ] [ Google Scholar ]
  • Crosbie J. Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology. 1993; 61 (6):966–974. doi: 10.1037/0022-006X.61.6.966. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dattilio FM, Edwards JA, Fishman DB. Case studies within a mixed methods paradigm: Toward a resolution of the alienation between researcher and practitioner in psychotherapy research. Psychotherapy: Theory, Research, Practice, Training. 2010; 47 (4):427–441. doi: 10.1037/a0021181. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dempster A, Laird N, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977; 39 (1):1–38. [ Google Scholar ]
  • Des Jarlais DC, Lyles C, Crepaz N. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. American Journal of Public Health. 2004; 94 (3):361–366. doi: 10.2105/ajph.94.3.361. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Diggle P, Liang KY. Analyses of longitudinal data. New York: Oxford University Press; 2001. [ Google Scholar ]
  • Doss BD, Atkins DC. Investigating treatment mediators when simple random assignment to a control group is not possible. Clinical Psychology: Science and Practice. 2006; 13 (4):321–336. doi: 10.1111/j.1468-2850.2006.00045.x. [ CrossRef ] [ Google Scholar ]
  • du Toit SHC, Browne MW. The covariance structure of a vector ARMA time series. In: Cudeck R, du Toit SHC, Sörbom D, editors. Structural equation modeling: Present and future. Lincolnwood, IL: Scientific Software International; 2001. pp. 279–314. [ Google Scholar ]
  • du Toit SHC, Browne MW. Structural equation modeling of multivariate time series. Multivariate Behavioral Research. 2007; 42 :67–101. doi: 10.1080/00273170701340953. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Fechner GT. Elemente der psychophysik [Elements of psychophysics] Leipzig, Germany: Breitkopf & Hartel; 1889. [ Google Scholar ]
  • Ferron J, Sentovich C. Statistical power of randomization tests used with multiple-baseline designs. The Journal of Experimental Education. 2002; 70 :165–178. doi: 10.1080/00220970209599504. [ CrossRef ] [ Google Scholar ]
  • Ferron J, Ware W. Analyzing single-case data: The power of randomization tests. The Journal of Experimental Education. 1995; 63 :167–178. [ Google Scholar ]
  • Fox J. TEACHER’S CORNER: Structural equation modeling with the sem package in R. Structural Equation Modeling: A Multidisciplinary Journal. 2006; 13 (3):465–486. doi: 10.1207/s15328007sem1303_7. [ CrossRef ] [ Google Scholar ]
  • Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahwah, NJ: Lawrence Erlbaum Associates; 1997. [ Google Scholar ]
  • Franklin RD, Gorman BS, Beasley TM, Allison DB. Graphical display and visual analysis. In: Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahway, NJ: Lawrence Erlbaum Associates, Publishers; 1997. pp. 119–158. [ Google Scholar ]
  • Gardner W, Mulvey EP, Shaw EC. Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological Bulletin. 1995; 118 (3):392–404. doi: 10.1037/0033-2909.118.3.392. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Green AS, Rafaeli E, Bolger N, Shrout PE, Reis HT. Paper or plastic? Data equivalence in paper and electronic diaries. Psychological Methods. 2006; 11 (1):87–105. doi: 10.1037/1082-989X.11.1.87. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hamilton JD. Time series analysis. Princeton, NJ: Princeton University Press; 1994. [ Google Scholar ]
  • Hammond D, Gast DL. Descriptive analysis of single-subject research designs: 1983–2007. Education and Training in Autism and Developmental Disabilities. 2010; 45 :187–202. [ Google Scholar ]
  • Hanson MD, Chen E. Daily stress, cortisol, and sleep: The moderating role of childhood psychosocial environments. Health Psychology. 2010; 29 (4):394–402. doi: 10.1037/a0019879. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Harvey AC. Forecasting, structural time series models and the Kalman filter. Cambridge, MA: Cambridge University Press; 2001. [ Google Scholar ]
  • Horner RH, Carr EG, Halle J, McGee G, Odom S, Wolery M. The use of single-subject research to identify evidence-based practice in special education. Exceptional Children. 2005; 71 :165–179. [ Google Scholar ]
  • Horner RH, Spaulding S. Single-case research designs. In: Salkind NJ, editor. Encyclopedia of research design. Thousand Oaks, CA: Sage Publications; 2010. [ Google Scholar ]
  • Horton NJ, Kleinman KP. Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. The American Statistician. 2007; 61 (1):79–90. doi: 10.1198/000313007X172556. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hser Y, Shen H, Chou C, Messer SC, Anglin MD. Analytic approaches for assessing long-term treatment effects. Evaluation Review. 2001; 25 (2):233–262. doi: 10.1177/0193841X0102500206. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Huitema BE. Autocorrelation in applied behavior analysis: A myth. Behavioral Assessment. 1985; 7 (2):107–118. [ Google Scholar ]
  • Huitema BE, McKean JW. Reduced bias autocorrelation estimation: Three jackknife methods. Educational and Psychological Measurement. 1994; 54 (3):654–665. doi: 10.1177/0013164494054003008. [ CrossRef ] [ Google Scholar ]
  • Ibrahim JG, Chen M-H, Lipsitz SR, Herring AH. Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association. 2005; 100 (469):332–346. doi: 10.1198/016214504000001844. [ CrossRef ] [ Google Scholar ]
  • Institute of Medicine. Reducing risks for mental disorders: Frontiers for preventive intervention research. Washington, DC: National Academy Press; 1994. [ PubMed ] [ Google Scholar ]
  • Jacobsen NS, Christensen A. Studying the effectiveness of psychotherapy: How well can clinical trials do the job? American Psychologist. 1996; 51 :1031–1039. doi: 10.1037/0003-066X.51.10.1031. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jones RR, Vaught RS, Weinrott MR. Time-series analysis in operant research. Journal of Behavior Analysis. 1977; 10 (1):151–166. doi: 10.1901/jaba.1977.10-151. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jones WP. Single-case time series with Bayesian analysis: A practitioner’s guide. Measurement and Evaluation in Counseling and Development. 2003; 36 :28–39. [ Google Scholar ]
  • Kanfer H. Self-monitoring: Methodological limitations and clinical applications. Journal of Consulting and Clinical Psychology. 1970; 35 (2):148–152. doi: 10.1037/h0029874. [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Drawing valid inferences from case studies. Journal of Consulting and Clinical Psychology. 1981; 49 (2):183–192. doi: 10.1037/0022-006X.49.2.183. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Mediators and mechanisms of change in psychotherapy research. Annual Review of Clinical Psychology. 2007; 3 :1–27. doi: 10.1146/annurev.clinpsy.3.022806.091432. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Evidence-based treatment and practice: New opportunities to bridge clinical research and practice, enhance the knowledge base, and improve patient care. American Psychologist. 2008; 63 (3):146–159. doi: 10.1037/0003-066X.63.3.146. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Understanding how and why psychotherapy leads to change. Psychotherapy Research. 2009; 19 (4):418–428. doi: 10.1080/10503300802448899. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Single-case research designs: Methods for clinical and applied settings. 2. New York, NY: Oxford University Press; 2010. [ Google Scholar ]
  • Kirk RE. Practical significance: A concept whose time has come. Educational and Psychological Measurement. 1996; 56 :746–759. doi: 10.1177/0013164496056005002. [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR. Preparing psychologists for evidence-based school practice: Lessons learned and challenges ahead. American Psychologist. 2007; 62 :829–843. doi: 10.1037/0003-066X.62.8.829. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Hitchcock J, Horner RH, Levin JR, Odom SL, Rindskopf DM, Shadish WR. Single-case designs technical documentation. 2010. Retrieved from What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf .
  • Kratochwill TR, Levin JR. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; 1992. [ Google Scholar ]
  • Kratochwill TR, Levin JR. Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods. 2010; 15 (2):124–144. doi: 10.1037/a0017736. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Levin JR, Horner RH, Swoboda C. Visual analysis of single-case intervention research: Conceptual and methodological considerations (WCER Working Paper No. 2011-6) 2011 Retrieved from University of Wisconsin–Madison, Wisconsin Center for Education Research website: http://www.wcer.wisc.edu/publications/workingPapers/papers.php .
  • Lambert D. Zero-inflated poisson regression, with an application to defects in manufacturing. Technometrics. 1992; 34 (1):1–14. [ Google Scholar ]
  • Lambert MJ, Hansen NB, Harmon SC. Developing and Delivering Practice-Based Evidence. John Wiley & Sons, Ltd; 2010. Outcome Questionnaire System (The OQ System): Development and practical applications in healthcare settings; pp. 139–154. [ Google Scholar ]
  • Littell JH, Corcoran J, Pillai VK. Systematic reviews and meta-analysis. New York: Oxford University Press; 2008. [ Google Scholar ]
  • Liu LM, Hudack GB. The SCA statistical system. Vector ARMA modeling of multiple time series. Oak Brook, IL: Scientific Computing Associates Corporation; 1995. [ Google Scholar ]
  • Lubke GH, Muthén BO. Investigating population heterogeneity with factor mixture models. Psychological Methods. 2005; 10 (1):21–39. doi: 10.1037/1082-989x.10.1.21. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Manolov R, Solanas A. Comparing N = 1 effect sizes in presence of autocorrelation. Behavior Modification. 2008; 32 (6):860–875. doi: 10.1177/0145445508318866. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Marshall RJ. Autocorrelation estimation of time series with randomly missing observations. Biometrika. 1980; 67 (3):567–570. doi: 10.1093/biomet/67.3.567. [ CrossRef ] [ Google Scholar ]
  • Matyas TA, Greenwood KM. Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention effects. Journal of Applied Behavior Analysis. 1990; 23 (3):341–351. doi: 10.1901/jaba.1990.23-341. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Chair Members of the Task Force on Evidence-Based Interventions in School Psychology. Procedural and coding manual for review of evidence-based interventions. 2003 Retrieved July 18, 2011 from http://www.sp-ebi.org/documents/_workingfiles/EBImanual1.pdf .
  • Moher D, Schulz KF, Altman DF the CONSORT Group. The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomized trials. Journal of the American Medical Association. 2001; 285 :1987–1991. doi: 10.1001/jama.285.15.1987. [ PubMed ] [ CrossRef ] [ Google Scholar ]

Handbook of Research Methods in Health Social Sciences, pp. 581–602

Single-Case Designs

  • Breanne Byiers
  • Reference work entry
  • First Online: 13 January 2019


Single-case designs (also called single-case experimental designs) are a family of research design strategies that can provide strong evidence of intervention effectiveness by using repeated measurement to establish each participant (or case) as his or her own control. The flexibility of these designs, and their focus on the individual as the unit of measurement, have led to increased interest in single-case designs across many areas of intervention research. The purpose of this chapter is to introduce the reader to the basic logic underlying the conduct and analysis of single-case design research by describing its fundamental features, providing examples of several commonly used designs, and reviewing guidelines for the visual analysis of single-case data. Additionally, current areas of consensus and disagreement in the field of single-case design research are discussed.
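To make the design logic described above concrete, the following minimal sketch (not taken from the chapter; all phase lengths, data values, and labels are hypothetical) shows one common way to organize and inspect A-B-A-B single-case data in Python: the participant's repeated measurements are plotted phase by phase for visual analysis, and each baseline-to-intervention comparison is summarized with a simple nonoverlap index (the proportion of baseline/intervention score pairs in which the intervention score is higher, counting ties as half).

```python
# Minimal, hypothetical sketch of A-B-A-B single-case data: one participant,
# measured repeatedly across alternating baseline (A) and intervention (B)
# phases, plotted for visual analysis and summarized with a nonoverlap index.
import matplotlib.pyplot as plt

# Hypothetical session-by-session scores for a single participant.
phases = {
    "A1": [4, 5, 3, 4, 5],       # initial baseline
    "B1": [7, 8, 9, 8, 9, 10],   # first intervention phase
    "A2": [5, 4, 5, 3],          # withdrawal (return to baseline)
    "B2": [9, 10, 9, 11, 10],    # reintroduction of the intervention
}

def nonoverlap(baseline, intervention):
    """Proportion of baseline/intervention pairs in which the intervention
    score exceeds the baseline score; ties count as 0.5 (a NAP-style index)."""
    pairs = [(a, b) for a in baseline for b in intervention]
    wins = sum(1.0 if b > a else 0.5 if b == a else 0.0 for a, b in pairs)
    return wins / len(pairs)

# Plot the phases in sequence with dashed lines at each phase change, the
# conventional layout for inspecting level, trend, variability, and overlap.
session = 0
for name, scores in phases.items():
    xs = range(session, session + len(scores))
    plt.plot(xs, scores, marker="o", label=name)
    session += len(scores)
    plt.axvline(session - 0.5, color="gray", linestyle="--", linewidth=0.8)

plt.xlabel("Session")
plt.ylabel("Outcome score (hypothetical)")
plt.title("A-B-A-B single-case data (illustrative)")
plt.legend()
plt.show()

print("Nonoverlap, A1 vs. B1:", round(nonoverlap(phases["A1"], phases["B1"]), 2))
print("Nonoverlap, A2 vs. B2:", round(nonoverlap(phases["A2"], phases["B2"]), 2))
```

In practice, a visual analyst would weigh level, trend, variability, immediacy of change, and overlap across all four phases; a single summary statistic such as the one above is typically treated as a supplement to, rather than a replacement for, that judgment.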

Keywords: single-case designs; single-subject designs; small-N research; intervention research; idiographic research; operant psychology.

Author information

Breanne Byiers, Department of Educational Psychology, University of Minnesota, Minneapolis, MN, USA.

Correspondence to Breanne Byiers.

Editor information

Pranee Liamputtong, School of Science and Health, Western Sydney University, Penrith, NSW, Australia.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this entry

Byiers, B. (2019). Single-Case Designs. In: Liamputtong, P. (Ed.), Handbook of Research Methods in Health Social Sciences (pp. 581–602). Springer, Singapore. https://doi.org/10.1007/978-981-10-5251-4_92

  • DOI: https://doi.org/10.1007/978-981-10-5251-4_92
  • Published: 13 January 2019
  • Publisher: Springer, Singapore
  • Print ISBN: 978-981-10-5250-7
  • Online ISBN: 978-981-10-5251-4

