What is secondary analysis of qualitative data?


Video Type: Interview

Bishop, L. (2011). What is secondary analysis of qualitative data? [Video]. Sage Research Methods. https://doi.org/10.4135/9781412995573


Dr. Libby Bishop defines secondary data analysis as reusing an existing data set to pursue a different research question. She explains that it is very similar to the research methodology used in history, because few historical documents were created expressly for research purposes. Bishop also highlights the benefits of using secondary data analysis to study over-researched and vulnerable populations.

Chapter 1: What is Secondary Analysis of Qualitative Data?

  • Start time: 00:00:08
  • End time: 00:03:39

Chapter 2: How is Secondary Data Analysis Different From Historians Revisiting Data?

  • Start time: 00:03:40
  • End time: 00:05:13

Chapter 3: Who Does Secondary Data Analysis?

  • Start time: 00:05:14
  • End time: 00:08:56

Chapter 4: What Might Be the Problems of Using Secondary Data?

  • Start time: 00:08:57
  • End time: 00:13:55

Chapter 5: When Might Secondary Data Be Preferred to Original Data?

  • Start time: 00:13:56
  • End time: 00:18:17
  • Product: Sage Research Methods
  • Type of Content: Interview
  • Title: What is secondary analysis of qualitative data?
  • Publisher: SAGE Publications Ltd.
  • Publication year: 2011
  • Online pub date: January 01, 2011
  • Discipline: Anthropology
  • Methods: Secondary data analysis, Archival research, Qualitative data collection, Historical research
  • Duration: 00:18:17
  • DOI: https://doi.org/10.4135/9781412995573
  • Keywords: comparison, convenience, essays, fatherhood, finances, historical records, issues and controversies, motherhood, teaching, vulnerable populations

Things Discussed

Persons Discussed: Mike Savage

Interviewer: Patrick Brindle

Academic: Libby Bishop


Qualitative Secondary Research: A Step-By-Step Guide

  • Claire Largan
  • Theresa Morris, University College Birmingham, UK

Description

Perfect for those doing dissertations and research projects, this book provides an accessible introduction to the theory of secondary research and sets out the advantages and limitations of using this kind of research. Drawing on years of teaching and research experience, the authors:

  • Offer step-by-step advice on how to use qualitative secondary data
  • Walk you through each stage of the research process
  • Provide practical, ethical tools to help you with your project
  • Show you how to avoid the potential pitfalls of using secondary data

Clear and easy to understand, this book is a ready-made toolkit for successfully using qualitative secondary data. From beginner level and beyond, this no-nonsense guide takes the confusion and worry out of doing a secondary research project.


I recommend this book to students and more experienced researchers who want to conduct qualitative secondary research. It is a timely and accessible guide.

Overall, the authors have written a well thought out, accessible and comprehensive book, which is a welcome addition to a relatively small literature on secondary data and documentary analysis.

I will definitely be using this in my own research, to ensure that I have not forgotten important elements in my research design and analysis, and will be recommending chapter 5 (ethics in qualitative secondary research) as additional reading in the ethics chapter of my forthcoming book aimed at undergraduate students.

A concise and coherent text on QSR. I have been searching for a text that is accessible for students, that will allow them to reflect on their progress as researchers while also equipping them with a voice to justify their choices. This meets those parameters.

This is an excellent introductory text for a methodology that has become accepted practice and increasingly expected by research funding bodies. Making full use of collected data is an ethical principle and will prepare students well for future practice.

Very good resource for students and graduates alike. Definitely a must-read and should-work-with book :-)

The contents of the book allow students to carry out research with ease; the book has an easy flow and many useful areas for undergraduates to follow when completing any research work.

This book should be an essential companion for anyone undertaking a research project. This underrepresented topic area is broken down into comprehensive chapters that provide a practical approach whilst prompting critical reflection also. Highly recommended.

A well-crafted and accessibly written textbook which will be very useful to students at several levels.

This is an essential and accessible book for all undergraduate and postgraduate students wishing to carry out secondary research. This book offers a step-by-step guide into the processes of qualitative research, whilst allowing readers to develop their own critical thinking skills.

Miss Novlett Mitchell, University College Birmingham

This book provides insights about qualitative research and is very useful for every dissertation module. I am so glad I had the opportunity to include it in my module.


The Qualitative Report


Recommendations for Secondary Analysis of Qualitative Data

Sheryl L. Chatfield, Kent State University, Kent Campus

Publications and presentations resulting from secondary analysis of qualitative research are less common than similar efforts using quantitative secondary analysis, although online availability of high-quality qualitative data continues to increase. Advantages of secondary qualitative analysis include access to sometimes hard to reach participants; challenges include identifying data that are sufficient to respond to purposes beyond those the data were initially gathered to address. In this paper I offer an overview of secondary qualitative analysis processes and provide general recommendations for researchers to consider in planning and conducting qualitative secondary analysis. I also include a select list of data sources. Well-planned secondary qualitative analysis projects potentially reflect efficient use or reuse of resources and provide meaningful insights regarding a variety of subjects.

Keywords: Qualitative Research, Secondary Analysis, Online Research Data

Author Bio(s)

Dr. Sheryl L. Chatfield, C.T.R.S., is Assistant Professor in the College of Public Health at Kent State University in Kent, Ohio. She received her PhD in Health and Kinesiology with emphasis in health behavior and promotion from the University of Mississippi in University, MS, and received her M.S. degree in recreational therapy from The University of Southern Mississippi in Hattiesburg, MS. Dr. Chatfield completed the Nova Southeastern Graduate Certificate in Qualitative Research in 2014 and is currently one of the Senior Editors of TQR. Correspondence regarding this article can be addressed directly to: [email protected]

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License

DOI: 10.46743/2160-3715/2020.4092

Recommended APA Citation

Chatfield, S. L. (2020). Recommendations for Secondary Analysis of Qualitative Data. The Qualitative Report, 25(3), 833–842. https://doi.org/10.46743/2160-3715/2020.4092



What is Secondary Research? | Definition, Types, & Examples

Published on January 20, 2023 by Tegan George. Revised on January 12, 2024.

Secondary research is a research method that uses data that was collected by someone else. In other words, whenever you conduct research using data that already exists, you are conducting secondary research. On the other hand, any type of research that you undertake yourself is called primary research.

Secondary research can be qualitative or quantitative in nature. It often uses data gathered from published peer-reviewed papers, meta-analyses, or government or private sector databases and datasets.

Table of contents

  • When to use secondary research
  • Types of secondary research
  • Examples of secondary research
  • Advantages and disadvantages of secondary research
  • Other interesting articles
  • Frequently asked questions

When to use secondary research

Secondary research is a very common research method, used in lieu of collecting your own primary data. It is often used in research designs or as a way to start your research process if you plan to conduct primary research later on.

Since it is often inexpensive or free to access, secondary research is a low-stakes way to determine if further primary research is needed, as gaps in secondary research are a strong indication that primary research is necessary. For this reason, while secondary research can theoretically be exploratory or explanatory in nature, it is usually explanatory: aiming to explain the causes and consequences of a well-defined problem.


Types of secondary research

Secondary research can take many forms, but the most common types are statistical analysis, literature reviews, case studies, and content analysis.

Statistical analysis

There is ample data available online from a variety of sources, often in the form of datasets. These datasets are often open-source or downloadable at a low cost, and are ideal for conducting statistical analyses such as hypothesis testing or regression analysis (a minimal sketch follows the list below).

Credible sources for existing data include:

  • The government
  • Government agencies
  • Non-governmental organizations
  • Educational institutions
  • Businesses or consultancies
  • Libraries or archives
  • Newspapers, academic journals, or magazines
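As a minimal sketch of the kind of regression analysis such a dataset supports (not part of the original article; the file name and column names are hypothetical placeholders):

```python
# Minimal sketch: a regression analysis on an existing (secondary) dataset.
# "survey.csv" and its column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")  # e.g., an open government or institutional dataset
df = df.dropna(subset=["income", "education_years", "age"])

# Hypothesis test: does education predict income, adjusting for age?
model = smf.ols("income ~ education_years + age", data=df).fit()
print(model.summary())  # coefficients, p-values, and model fit statistics
```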

Literature reviews

A literature review is a survey of preexisting scholarly sources on your topic. It provides an overview of current knowledge, allowing you to identify relevant themes, debates, and gaps in the research you analyze. You can later apply these to your own work, or use them as a jumping-off point to conduct primary research of your own.

Structured much like a regular academic paper (with a clear introduction, body, and conclusion), a literature review is a great way to evaluate the current state of research and demonstrate your knowledge of the scholarly debates around your topic.

Case studies

A case study is a detailed study of a specific subject. It is usually qualitative in nature and can focus on a person, group, place, event, organization, or phenomenon. A case study is a great way to utilize existing research to gain concrete, contextual, and in-depth knowledge about your real-world subject.

You can choose to focus on just one complex case, exploring a single subject in great detail, or examine multiple cases if you’d prefer to compare different aspects of your topic. Preexisting interviews, observational studies, or other sources of primary data make for great case studies.

Content analysis

Content analysis is a research method that studies patterns in recorded communication by utilizing existing texts. It can be either quantitative or qualitative in nature, depending on whether you choose to analyze countable or measurable patterns, or more interpretive ones. Content analysis is popular in communication studies, but it is also widely used in historical analysis, anthropology, and psychology to make more semantic qualitative inferences.

Examples of secondary research

Secondary research is a broad research approach that can be pursued any way you’d like. Here are a few examples of different ways you can use secondary research to explore your research topic.

Advantages and disadvantages of secondary research

Secondary research is a very common research approach, but has distinct advantages and disadvantages.

Advantages of secondary research

Advantages include:

  • Secondary data is very easy to source and readily available.
  • It is also often free or accessible through your educational institution’s library or network, making it much cheaper to conduct than primary research.
  • As you are relying on research that already exists, conducting secondary research is much less time-consuming than primary research. Since your timeline is so much shorter, your research can be ready to publish sooner.
  • Using data from others allows you to show reproducibility and replicability, bolstering prior research and situating your own work within your field.

Disadvantages of secondary research

Disadvantages include:

  • Ease of access does not signify credibility. It’s important to be aware that secondary research is not always reliable, and can often be out of date. It’s critical to analyze any data you’re thinking of using prior to getting started, using a method like the CRAAP test.
  • Secondary research often relies on primary research already conducted. If this original research is biased in any way, those research biases could creep into the secondary results.

When many researchers use the same secondary data to reach similar conclusions, the uniqueness and reliability of your research can also suffer. Many datasets become “kitchen-sink” models, where too many variables are added in an attempt to draw increasingly niche conclusions from overused data. Data cleansing may be necessary to test the quality of the research.


Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Inclusion and exclusion criteria

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Frequently asked questions

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts and meanings, use qualitative methods.
  • If you want to analyze a large amount of readily-available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.

Sources in this article

We strongly encourage students to use sources in their work. You can cite our article (APA Style) or take a deep dive into the articles below.

George, T. (2024, January 12). What is Secondary Research? | Definition, Types, & Examples. Scribbr. Retrieved March 12, 2024, from https://www.scribbr.com/methodology/secondary-research/
Largan, C., & Morris, T. M. (2019). Qualitative Secondary Research: A Step-By-Step Guide (1st ed.). SAGE Publications Ltd.
Peloquin, D., DiMaio, M., Bierer, B., & Barnes, M. (2020). Disruptive and avoidable: GDPR challenges to secondary research uses of data. European Journal of Human Genetics, 28(6), 697–705. https://doi.org/10.1038/s41431-020-0596-x


Qualitative Secondary Analysis: A Case Exemplar

  • PMID: 29254902
  • PMCID: PMC5911239
  • DOI: 10.1016/j.pedhc.2017.09.007

Qualitative secondary analysis (QSA) is the use of qualitative data that was collected by someone else or was collected to answer a different research question. Secondary analysis of qualitative data provides an opportunity to maximize data utility, particularly with difficult-to-reach patient populations. However, qualitative secondary analysis methods require careful consideration and explicit description to best understand, contextualize, and evaluate the research results. In this article, we describe methodologic considerations using a case exemplar to illustrate challenges specific to qualitative secondary analysis and strategies to overcome them.

Keywords: Critical illness; ICU; qualitative research; secondary analysis.

Copyright © 2017 National Association of Pediatric Nurse Practitioners. Published by Elsevier Inc. All rights reserved.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Interpretation, Statistical*
  • Information Dissemination
  • Informed Consent
  • Qualitative Research*
  • Ventilator Weaning

Grants and funding

  • R01 NR007973/NR/NINR NIH HHS/United States
  • R03 AG063276/AG/NIA NIH HHS/United States


What is data saturation in qualitative research?

A crucial milestone in qualitative research, data saturation means you can end the data collection phase and move on to your analysis. Here we explain exactly what it means, the telltale signs that you’ve reached it, and how to get there as efficiently as possible.

Author:  Will Webster

Subject Matter Expert:  Jess Oliveros

Data saturation is a point in data collection when new information no longer brings fresh insights to the research questions.

Reaching data saturation means you’ve collected enough data to confidently understand the patterns and themes within the dataset – you’ve got what you need to draw conclusions and make your points. Think of it like a conversation where everything that can be said has been said, and now it’s just repetition.

Why is data saturation most relevant to qualitative research? Because qualitative research is about understanding something deeply, and you can reach a critical mass when trying to do that. Quantitative research, on the other hand, deals in numbers and with predetermined sample sizes, so the concept of data saturation is less relevant.


How to know when data saturation is reached

At the point of data saturation, you start to notice that the information you’re collecting is just reinforcing what you already know rather than providing new insights.

Knowing when you’ve reached this point is fairly subjective – there’s no formula or equation that can be applied. But there are some telltale signs that can apply to any qualitative research project .

When one or more of these signs are present, it’s a good time to begin finalizing the data collection phase and move on to a more detailed analysis.

Recurring themes

You start to notice that new data doesn’t bring up new themes or ideas. Instead, it echoes what you’ve already recorded.

This is a sign that you’ve likely tapped into all the main ideas related to your research question.

No new data

When interviews or surveys start to feel like you’re reading from the same script with each participant, you’ve probably reached the limit of diversity in responses. New participants will probably only confirm what you already know.

You’ve collected enough instances and evidence for each category of your analysis that you can support each theme with multiple examples. In other words, your data has become saturated with a depth and richness that illustrates each finding.
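To make the “no new data” sign concrete, here is a small illustrative sketch (not from the original article) that tracks how many new codes each successive interview contributes; the code labels are invented:

```python
# Illustrative sketch: count how many *new* codes each interview adds.
# `coded_transcripts` holds one (invented) set of codes per interview.
coded_transcripts = [
    {"cost", "access", "trust"},
    {"access", "wait_times", "trust"},
    {"trust", "cost", "stigma"},
    {"access", "trust"},           # contributes nothing new
    {"cost", "wait_times"},        # contributes nothing new
]

seen = set()
for i, codes in enumerate(coded_transcripts, start=1):
    new = codes - seen
    seen |= codes
    print(f"Interview {i}: {len(new)} new code(s) {sorted(new)}")

# A run of interviews adding zero new codes is one signal of saturation.
```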

Full understanding

You reach a level of familiarity with the subject matter that allows you to accurately predict what your participants will say next. If this is the case, you’ve likely reached data saturation.

Consistency

The data starts to show consistent patterns that support a coherent story. Crucially, inconsistencies and outliers don’t challenge your thinking and significantly alter the narrative you’ve formed.

This consistency across the data set strengthens the validity of your findings.

Is data saturation the goal of qualitative research?

In a word, no. But it’s often a critical milestone.

The true goal of qualitative research is to gain a deep understanding of the subject matter; data saturation indicates that you’ve gathered enough information to achieve that understanding.

That said, working to achieve data saturation in the most efficient way possible should be a goal of your research project.

How can a qualitative research project reach data saturation?

Reaching data saturation is a pivotal point in qualitative research as a sign that you’ve generated comprehensive and reliable findings.

There’s no exact science for reaching this point, but it does consistently demand two things: an adequate sample size and well-screened participants.

Adequate sample size

Achieving data saturation in qualitative research heavily relies on determining an appropriate sample size .

This is less about hitting a specific number and more about ensuring that the range of participants is broad enough to capture the diverse perspectives your research needs – while being focused enough to allow for thorough analysis.

Flexibility is crucial in this process. For example, in a study exploring patient experiences in a hospital, starting with a small group of patients from various departments might be the initial plan. However, as the interviews progress, if new themes continue to emerge, it might indicate the need to broaden the sample size to include more patients or even healthcare providers for a more comprehensive understanding.

An iterative approach like this can help your research to capture the complexity of people’s experiences without overwhelming the research with redundant information. The goal is to reach a point where additional interviews yield little new information, signaling that the range of experiences has been adequately captured.

While yes, it’s important to stay flexible and iterate as you go, it’s always wise to make use of research solutions that can make recommendations on suggested sample size. Such tools can also monitor crucial metrics like completion rate and audience size to keep your research project on track to reach data saturation.

Well-screened participants

In qualitative research, the depth and validity of your findings are of course totally influenced by your participants. This is where the importance of well-screened participants becomes very clear.

In any research project that addresses a complex social issue – from public health strategy to educational reform – having participants who can provide a range of lived experiences and viewpoints is crucial. Generating the best result isn’t about finding a random assortment of individuals, but instead about forming a carefully selected research panel whose experiences and perspectives directly relate to the research questions.

Achieving this means looking beyond surface criteria, like age or occupation, and instead delving into qualities that are relevant to the study, like experiences, attitudes or behaviors. This ensures that the data collected is rich and deeply rooted in real-world contexts, and will ultimately set you on a faster route to data saturation.

At the same time, if you find that your participants aren’t providing the depth or range of insights expected, you probably need to reevaluate your screening criteria. It’s unlikely that you’ll get it right first time – as with determining sample size, don’t be afraid of an iterative process.

To expedite this process, researchers can use digital tools to build ever-richer pictures of respondents, driving more targeted research and more tailored interactions.




Protecting against researcher bias in secondary data analysis: challenges and potential solutions

Jessie R. Baldwin

1 Department of Clinical, Educational and Health Psychology, Division of Psychology and Language Sciences, University College London, London, WC1H 0AP UK

2 Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK

Jean-Baptiste Pingault

Tabea schoeler, hannah m. sallis.

3 MRC Integrative Epidemiology Unit at the University of Bristol, Bristol Medical School, University of Bristol, Bristol, UK

4 School of Psychological Science, University of Bristol, Bristol, UK

5 Centre for Academic Mental Health, Population Health Sciences, University of Bristol, Bristol, UK

Marcus R. Munafò

6 NIHR Biomedical Research Centre, University Hospitals Bristol NHS Foundation Trust and University of Bristol, Bristol, UK

Analysis of secondary data sources (such as cohort studies, survey data, and administrative records) has the potential to provide answers to science and society’s most pressing questions. However, researcher biases can lead to questionable research practices in secondary data analysis, which can distort the evidence base. While pre-registration can help to protect against researcher biases, it presents challenges for secondary data analysis. In this article, we describe these challenges and propose novel solutions and alternative approaches. Proposed solutions include approaches to (1) address bias linked to prior knowledge of the data, (2) enable pre-registration of non-hypothesis-driven research, (3) help ensure that pre-registered analyses will be appropriate for the data, and (4) address difficulties arising from reduced analytic flexibility in pre-registration. For each solution, we provide guidance on implementation for researchers and data guardians. The adoption of these practices can help to protect against researcher bias in secondary data analysis, to improve the robustness of research based on existing data.

Introduction

Secondary data analysis has the potential to provide answers to science and society’s most pressing questions. An abundance of secondary data exists—cohort studies, surveys, administrative data (e.g., health records, crime records, census data), financial data, and environmental data—that can be analysed by researchers in academia, industry, third-sector organisations, and the government. However, secondary data analysis is vulnerable to questionable research practices (QRPs) which can distort the evidence base. These QRPs include p-hacking (i.e., exploiting analytic flexibility to obtain statistically significant results), selective reporting of statistically significant, novel, or “clean” results, and hypothesising after the results are known (HARK-ing [i.e., presenting unexpected results as if they were predicted]; [ 1 ]. Indeed, findings obtained from secondary data analysis are not always replicable [ 2 , 3 ], reproducible [ 4 ], or robust to analytic choices [ 5 , 6 ]. Preventing QRPs in research based on secondary data is therefore critical for scientific and societal progress.

A primary cause of QRPs is common cognitive biases that affect the analysis, reporting, and interpretation of data [ 7 – 10 ]. For example, apophenia (the tendency to see patterns in random data) and confirmation bias (the tendency to focus on evidence that is consistent with one’s beliefs) can lead to particular analytical choices and selective reporting of “publishable” results [ 11 – 13 ]. In addition, hindsight bias (the tendency to view past events as predictable) can lead to HARK-ing, so that observed results appear more compelling.

The scope for these biases to distort research outputs from secondary data analysis is perhaps particularly acute, for two reasons. First, researchers now have increasing access to high-dimensional datasets that offer a multitude of ways to analyse the same data [ 6 ]. Such analytic flexibility can lead to different conclusions depending on the analytical choices made [ 5 , 14 , 15 ]. Second, current incentive structures in science reward researchers for publishing statistically significant, novel, and/or surprising findings [ 16 ]. This combination of opportunity and incentive may lead researchers—consciously or unconsciously—to run multiple analyses and only report the most “publishable” findings.

One way to help protect against the effects of researcher bias is to pre-register research plans [ 17 , 18 ]. This can be achieved by pre-specifying the rationale, hypotheses, methods, and analysis plans, and submitting these to either a third-party registry (e.g., the Open Science Framework [OSF]; https://osf.io/ ), or a journal in the form of a Registered Report [ 19 ]. Because research plans and hypotheses are specified before the results are known, pre-registration reduces the potential for cognitive biases to lead to p-hacking, selective reporting, and HARK-ing [ 20 ]. While pre-registration is not necessarily a panacea for preventing QRPs (Table 1), meta-scientific evidence has found that pre-registered studies and Registered Reports are more likely to report null results [ 21 – 23 ], smaller effect sizes [ 24 ], and be replicated [ 25 ]. Pre-registration is increasingly being adopted in epidemiological research [ 26 , 27 ], and is even required for access to data from certain cohorts (e.g., the Twins Early Development Study [ 28 ]). However, pre-registration (and other open science practices; Table 2) can pose particular challenges to researchers conducting secondary data analysis [ 29 ], motivating the need for alternative approaches and solutions. Here we describe such challenges, before proposing potential solutions to protect against researcher bias in secondary data analysis (summarised in Fig. 1).

Table 1: Limitations in the use of pre-registration to address QRPs

Table 2: Challenges and potential solutions regarding sharing pre-existing data

Fig. 1: Challenges in pre-registering secondary data analysis and potential solutions (according to researcher motivations). Note: in the “Potential solution” column, blue boxes indicate solutions that are researcher-led; green boxes indicate solutions that should be facilitated by data guardians.

Challenges of pre-registration for secondary data analysis

Prior knowledge of the data

Researchers conducting secondary data analysis commonly analyse data from the same dataset multiple times throughout their careers. However, prior knowledge of the data increases risk of bias, as prior expectations about findings could motivate researchers to pursue certain analyses or questions. In the worst-case scenario, a researcher might perform multiple preliminary analyses, and only pursue those which lead to notable results (perhaps posting a pre-registration for these analyses, even though it is effectively post hoc). However, even if the researcher has not conducted specific analyses previously, they may be biased (either consciously or subconsciously) to pursue certain analyses after testing related questions with the same variables, or even by reading past studies on the dataset. As such, pre-registration cannot fully protect against researcher bias when researchers have previously accessed the data.

Research may not be hypothesis-driven

Pre-registration and Registered Reports are tailored towards hypothesis-driven, confirmatory research. For example, the OSF pre-registration template requires researchers to state “specific, concise, and testable hypotheses”, while Registered Reports do not permit purely exploratory research [ 30 ], although a new Exploratory Reports format now exists [ 31 ]. However, much research involving secondary data is not focused on hypothesis testing, but is exploratory, descriptive, or focused on estimation—in other words, examining the magnitude and robustness of an association as precisely as possible, rather than simply testing a point null. Furthermore, without a strong theoretical background, hypotheses will be arbitrary and could lead to unhelpful inferences [ 32 , 33 ], and so should be avoided in novel areas of research.

Pre-registered analyses are not appropriate for the data

With pre-registration, there is always a risk that the data will violate the assumptions of the pre-registered analyses [ 17 ]. For example, a researcher might pre-register a parametric test, only for the data to be non-normally distributed. However, in secondary data analysis, the extent to which the data shape the appropriate analysis can be considerable. First, longitudinal cohort studies are often subject to missing data and attrition. Approaches to deal with missing data (e.g., listwise deletion; multiple imputation) depend on the characteristics of missing data (e.g., the extent and patterns of missingness [ 34 ]), and so pre-specifying approaches to dealing with missingness may be difficult, or extremely complex. Second, certain analytical decisions depend on the nature of the observed data (e.g., the choice of covariates to include in a multiple regression might depend on the collinearity between the measures, or the degree of missingness of different measures that capture the same construct). Third, much secondary data (e.g., electronic health records and other administrative data) were never collected for research purposes, so can present several challenges that are impossible to predict in advance [ 35 ]. These issues can limit a researcher’s ability to pre-register a precise analytic plan prior to accessing secondary data.

Lack of flexibility in data analysis

Concerns have been raised that pre-registration limits flexibility in data analysis, including justifiable exploration [ 36 – 38 ]. For example, by requiring researchers to commit to a pre-registered analysis plan, pre-registration could prevent researchers from exploring novel questions (with a hypothesis-free approach), conducting follow-up analyses to investigate notable findings [ 39 ], or employing newly published methods with advantages over those pre-registered. While this concern is also likely to apply to primary data analysis, it is particularly relevant to certain fields involving secondary data analysis, such as genetic epidemiology, where new methods are rapidly being developed [ 40 ], and follow-up analyses are often required (e.g., in a genome-wide association study to further investigate the role of a genetic variant associated with a phenotype). However, this concern is perhaps over-stated – pre-registration does not preclude unplanned analyses; it simply makes it more transparent that these analyses are post hoc. Nevertheless, another understandable concern is that reduced analytic flexibility could lead to difficulties in publishing papers and accruing citations. For example, pre-registered studies are more likely to report null results [ 22 , 23 ], likely due to reduced analytic flexibility and selective reporting. While this is a positive outcome for research integrity, null results are less likely to be published [ 13 , 41 , 42 ] and cited [ 11 ], which could disadvantage researchers’ careers.

Potential solutions

In this section, we describe potential solutions to address the challenges involved in pre-registering secondary data analysis, including approaches to (1) address bias linked to prior knowledge of the data, (2) enable pre-registration of non-hypothesis-driven research, (3) ensure that pre-planned analyses will be appropriate for the data, and (4) address potential difficulties arising from reduced analytic flexibility.

Challenge: Prior knowledge of the data

Declare prior access to data

To increase transparency about potential biases arising from knowledge of the data, researchers could routinely report all prior data access in a pre-registration [ 29 ]. This would ideally include evidence from an independent gatekeeper (e.g., a data guardian of the study) stating whether data and relevant variables were accessed by each co-author. To facilitate this process, data guardians could set up a central “electronic checkout” system that records which researchers have accessed data, what data were accessed, and when [ 43 ]. The researcher or data guardian could then provide links to the checkout histories for all co-authors in the pre-registration, to verify their prior data access. If it is not feasible to provide such objective evidence, authors could self-certify their prior access to the dataset and where possible, relevant variables—preferably listing any publications and in-preparation studies based on the dataset [ 29 ]. Of course, self-certification relies on trust that researchers will accurately report prior data access, which could be challenging if the study involves a large number of authors, or authors who have been involved on many studies on the dataset. However, it is likely to be the most feasible option at present as many datasets do not have available electronic records of data access. For further guidance on self-certifying prior data access when pre-registering secondary data analysis studies on a third-party registry (e.g., the OSF), we recommend referring to the template by Van den Akker, Weston [ 29 ].

The extent to which prior access to data renders pre-registration invalid is debatable. On the one hand, even if data have been accessed previously, pre-registration is likely to reduce QRPs by encouraging researchers to commit to a pre-specified analytic strategy. On the other hand, pre-registration does not fully protect against researcher bias where data have already been accessed, and can lend added credibility to study claims, which may be unfounded. Reporting prior data access in a pre-registration is therefore important to make these potential biases transparent, so that readers and reviewers can judge the credibility of the findings accordingly. However, for a more rigorous solution which protects against researcher bias in the context of prior data access, researchers should consider adopting a multiverse approach.

Conduct a multiverse analysis

A multiverse analysis involves identifying all potential analytic choices that could justifiably be made to address a given research question (e.g., different ways to code a variable, combinations of covariates, and types of analytic model), implementing them all, and reporting the results [ 44 ]. Notably, this method differs from the traditional approach in which findings from only one analytic method are reported. It is conceptually similar to a sensitivity analysis, but it is far more comprehensive, as often hundreds or thousands of analytic choices are reported, rather than a handful. By showing the results from all defensible analytic approaches, multiverse analysis reduces scope for selective reporting and provides insight into the robustness of findings against analytical choices (for example, if there is a clear convergence of estimates, irrespective of most analytical choices). For causal questions in observational research, Directed Acyclic Graphs (DAGs) could be used to inform selection of covariates in multiverse approaches [ 45 ] (i.e., to ensure that confounders, rather than mediators or colliders, are controlled for).

Specification curve analysis [ 46 ] is a form of multiverse analysis that has been applied to examine the robustness of epidemiological findings to analytic choices [ 6 , 47 ]. Specification curve analysis involves three steps: (1) identifying all analytic choices – termed “specifications”, (2) displaying the results graphically with magnitude of effect size plotted against analytic choice, and (3) conducting joint inference across all results. When applied to the association between digital technology use and adolescent well-being [ 6 ], specification curve analysis showed that the (small, negative) association diminished after accounting for adequate control variables and recall bias – demonstrating the sensitivity of results to analytic choices.
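As a rough illustration of step (1) and of the estimates underlying steps (2) and (3), the sketch below (not from the paper) runs one simple multiverse: every subset of a small pool of optional covariates. The dataset and variable names are hypothetical, and a full specification curve analysis would add the graphical display and joint inference steps.

```python
# Sketch of a minimal specification curve: estimate the exposure effect
# under every combination of optional covariates, then sort the estimates.
# "cohort.csv" and all variable names are hypothetical.
from itertools import combinations

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("cohort.csv")
optional_covariates = ["age", "sex", "ses", "baseline_wellbeing"]

results = []
for k in range(len(optional_covariates) + 1):
    for subset in combinations(optional_covariates, k):
        rhs = " + ".join(("tech_use",) + subset)       # one "specification"
        fit = smf.ols(f"wellbeing ~ {rhs}", data=df).fit()
        results.append({"covariates": subset,
                        "beta": fit.params["tech_use"],
                        "p": fit.pvalues["tech_use"]})

# Sorting by effect size gives the x-axis ordering of a specification curve.
curve = pd.DataFrame(results).sort_values("beta")
print(curve)
```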

Despite the benefits of the multiverse approach in addressing analytic flexibility, it is not without limitations. First, because each analytic choice is treated as equally valid, including less justifiable models could bias the results away from the truth. Second, the choice of specifications can be biased by prior knowledge (e.g., a researcher may choose to omit a covariate to obtain a particular result). Third, multiverse analysis may not entirely prevent selective reporting (e.g., if the full range of results are not reported), although pre-registering multiverse approaches (and specifying analytic choices) could mitigate this. Last, and perhaps most importantly, multiverse analysis is technically challenging (e.g., when there are hundreds or thousands of analytic choices) and can be impractical for complex analyses, very large datasets, or when computational resources are limited. However, this burden can be somewhat reduced by tutorials and packages which are being developed to standardise the procedure and reduce computational time [see 48 , 49 ].

Challenge: Research may not be hypothesis-driven

Pre-register research questions and conditions for interpreting findings

Observational research arguably does not need to have a hypothesis to benefit from pre-registration. For studies that are descriptive or focused on estimation, we recommend pre-registering research questions, analysis plans, and criteria for interpretation. Analytic flexibility will be limited by pre-registering specific research questions and detailed analysis plans, while post hoc interpretation will be limited by pre-specifying criteria for interpretation [ 50 ]. The potential for HARK-ing will also be minimised because readers can compare the published study to the original pre-registration, where a-priori hypotheses were not specified.

Detailed guidance on how to pre-register research questions and analysis plans for secondary data is provided in Van den Akker’s [ 29 ] tutorial. To pre-specify conditions for interpretation, it is important to anticipate – as much as possible – all potential findings, and state how each would be interpreted. For example, suppose that a researcher aims to test a causal relationship between X and Y using a multivariate regression model with longitudinal data. Assuming that all potential confounders have been fully measured and controlled for (albeit a strong assumption) and statistical power is high, three broad sets of results and interpretations could be pre-specified. First, an association between X and Y that is similar in magnitude to the unadjusted association would be consistent with a causal relationship. Second, an association between X and Y that is attenuated after controlling for confounders would suggest that the relationship is partly causal and partly confounded. Third, a minimal, non-statistically significant adjusted association would suggest a lack of evidence for a causal effect of X on Y. Depending on the context of the study, criteria could also be provided on the threshold (or range of thresholds) at which the effect size would justify different interpretations [ 51 ], be considered practically meaningful, or the smallest effect size of interest for equivalence tests [ 52 ]. While researcher biases might still affect the pre-registered criteria for interpreting findings (e.g., toward over-interpreting a small effect size as meaningful), this bias will at least be transparent in the pre-registration.
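To make this concrete, the sketch below encodes the three pre-specified interpretations from the example above as an explicit rule. The 0.8 attenuation ratio and 0.05 smallest effect size of interest are hypothetical thresholds that a real pre-registration would need to justify.

```python
# Sketch: pre-registered interpretation rules for the X -> Y example above.
# Both thresholds are hypothetical placeholders.
SESOI = 0.05              # smallest effect size of interest
ATTENUATION_RATIO = 0.8   # "similar in magnitude" cut-off

def interpret(unadjusted_beta, adjusted_beta, p_adjusted):
    if p_adjusted >= 0.05 or abs(adjusted_beta) < SESOI:
        return "Lack of evidence for a causal effect of X on Y."
    if abs(adjusted_beta) >= ATTENUATION_RATIO * abs(unadjusted_beta):
        return "Consistent with a causal relationship."
    return "Relationship partly causal and partly confounded."

print(interpret(unadjusted_beta=0.40, adjusted_beta=0.38, p_adjusted=0.001))
```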

Use a holdout sample to delineate exploratory and confirmatory research

Where researchers wish to integrate exploratory research into a pre-registered, confirmatory study, a holdout sample approach can be used [ 18 ]. Creating a holdout sample refers to the process of randomly splitting the dataset into two parts, often referred to as ‘training’ and ‘holdout’ datasets. To delineate exploratory and confirmatory research, researchers can first conduct exploratory data analysis on the training dataset (which should comprise a moderate fraction of the data, e.g., 35% [ 53 ]). Based on the results of the discovery process, researchers can pre-register hypotheses and analysis plans to formally test on the holdout dataset. This process has parallels with cross-validation in machine learning, in which the dataset is split and the model is developed on the training dataset, before being tested on the test dataset. The approach enables a flexible discovery process, before formally testing discoveries in a non-biased way.

When considering whether to use the holdout sample approach, three points should be noted. First, because the training dataset is not reusable, there will be a reduced sample size and loss of power relative to analysing the whole dataset. As such, the holdout sample approach will only be appropriate when the original dataset is large enough to provide sufficient power in the holdout dataset. Second, when the training dataset is used for exploration, subsequent confirmatory analyses on the holdout dataset may be overfitted (due to both datasets being drawn from the same sample), so replication in independent samples is recommended. Third, the holdout dataset should be created by an independent data manager or guardian, to ensure that the researcher does not have knowledge of the full dataset. However, it is straightforward to randomly split a dataset into a holdout and training sample and we provide example R code at: https://github.com/jr-baldwin/Researcher_Bias_Methods/blob/main/Holdout_script.md .
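The linked script is in R; the following is a Python analogue of the same split, a sketch with hypothetical file names and the 35% training fraction mentioned in the text:

```python
# Python analogue of the holdout split described above.
# File names are hypothetical; the 35% fraction follows the text.
import pandas as pd

df = pd.read_csv("full_dataset.csv")
train = df.sample(frac=0.35, random_state=2024)   # exploratory "training" set
holdout = df.drop(train.index)                    # reserved for confirmatory tests

# Ideally an independent data manager runs this and releases only `train`
# until the confirmatory analyses have been pre-registered.
train.to_csv("training.csv", index=False)
holdout.to_csv("holdout.csv", index=False)
```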

Challenge: Pre-registered analyses are not appropriate for the data

Use blinding to test proposed analyses

One method to help ensure that pre-registered analyses will be appropriate for the data is to trial the analyses on a blinded dataset [ 54 ], before pre-registering. Data blinding involves obscuring the data values or labels prior to data analysis, so that the proposed analyses can be trialled on the data without observing the actual findings. Various types of blinding strategies exist [ 54 ], but one method that is appropriate for epidemiological data is “data scrambling” [ 55 ]. This involves randomly shuffling the data points so that any associations between variables are obscured, whilst the variable distributions (and amounts of missing data) remain the same. We provide a tutorial for how to implement this in R (see https://github.com/jr-baldwin/Researcher_Bias_Methods/blob/main/Data_scrambling_tutorial.md ). Ideally the data scrambling would be done by a data guardian who is independent of the research, to ensure that the main researcher does not access the data prior to pre-registering the analyses. Once the researcher is confident with the analyses, the study can be pre-registered, and the analyses conducted on the unscrambled dataset.
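The linked tutorial implements scrambling in R; below is a minimal Python sketch of the same idea, shuffling each column independently so that associations between variables are destroyed while each variable's marginal distribution (including missingness) is preserved:

```python
# Sketch of "data scrambling": permute each column independently.
import numpy as np
import pandas as pd

def scramble(df, seed=0):
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {col: rng.permutation(df[col].to_numpy()) for col in df.columns}
    )

# Toy example: the x-y association vanishes, but each column keeps its
# distribution and its amount of missing data.
df = pd.DataFrame({"x": [1, 2, 3, 4, None], "y": [2, 4, 6, 8, 10]})
print(scramble(df))
```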

Blinded analysis offers several advantages for ensuring that pre-registered analyses are appropriate, with some limitations. First, blinded analysis allows researchers to directly check the distribution of variables and amounts of missingness, without having to make assumptions about the data that may not be met, or spend time planning contingencies for every possible scenario. Second, blinded analysis prevents researchers from gaining insight into the potential findings prior to pre-registration, because associations between variables are masked. However, because of this, blinded analysis does not enable researchers to check for collinearity, predictors of missing data, or other covariances that may be necessary for model specification. As such, blinded analysis will be most appropriate for researchers who wish to check the data distribution and amounts of missingness before pre-registering.

Trial analyses on a dataset excluding the outcome

Another method to help ensure that pre-registered analyses will be appropriate for the data is to trial analyses on a dataset excluding outcome data. For example, data managers could provide researchers with part of the dataset containing the exposure variable(s) plus any covariates and/or auxiliary variables. The researcher can then trial and refine the analyses ahead of pre-registering, without gaining insight into the main findings (which require the outcome data). This approach is already used to mitigate bias in propensity score matching studies [26, 56], where researchers use data on the exposure and covariates to create matched groups prior to accessing any outcome data. Once the exposed and non-exposed groups have been matched effectively, researchers pre-register the protocol ahead of viewing the outcome data. Beyond propensity score matching, this approach could help researchers to identify and address other analytical challenges involving secondary data. For example, it could be used to check multivariable distributional characteristics, test for collinearity between multiple predictor variables, or identify predictors of missing data for multiple imputation.
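A minimal sketch of this workflow is given below, assuming a hypothetical dataset and variable names (the original article does not provide code for this approach).

```r
## Toy full dataset, as held by the data guardian (illustrative only)
set.seed(7)
full_data <- data.frame(exposure  = rnorm(500),
                        covariate = rnorm(500),
                        outcome   = rnorm(500))

## Guardian shares every variable except the outcome
shared <- full_data[, setdiff(names(full_data), "outcome")]

## Researcher trials analyses on the shared data, e.g. checking the
## exposure-covariate association, without any access to the outcome
cor(shared$exposure, shared$covariate)

## After pre-registration, the outcome is released and the
## pre-registered model is run on the complete dataset
fit <- lm(outcome ~ exposure + covariate, data = full_data)
summary(fit)
```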

This approach offers certain benefits for researchers keen to ensure that pre-registered analyses are appropriate for the observed data, though it too has some limitations. Regarding benefits, researchers will be able to examine associations between variables (excluding the outcome), unlike the data scrambling approach described above. This would be helpful for checking certain assumptions (e.g., collinearity, or characteristics of missing data such as whether it is missing at random). In addition, the approach is easy to implement, as the dataset can initially be created without the outcome variable, which can then be added after pre-registration, minimising the burden on data guardians. Regarding limitations, it is possible that accessing variables in advance could provide some insight into the findings. For example, if a covariate is known to be highly correlated with the outcome, testing the association between the covariate and the exposure could give some indication of the relationship between the exposure and the outcome. To make this potential bias transparent, researchers should report in the pre-registration which variables they have already accessed. Another limitation is that researchers will not be able to identify analytical issues relating to the outcome data in advance of pre-registration. Therefore, this approach will be most appropriate where researchers wish to check various characteristics of the exposure variable(s) and covariates, rather than the outcome. However, a “mixed” approach could be applied in which outcome data is provided in scrambled format, to enable researchers to also assess distributional characteristics of the outcome. This would substantially reduce the number of potential challenges to be considered in pre-registered analytical pipelines.

Pre-register a decision tree

If it is not possible to access any of the data prior to pre-registering (e.g., if neither a blinded dataset nor a dataset excluding outcome data can be provided), researchers could pre-register a decision tree. This defines the sequence of analyses, and the rules for choosing between them, based on characteristics of the observed data [17]. For example, the decision tree could specify testing a normality assumption and, based on the result, whether to use a parametric or non-parametric test. Ideally, the decision tree should provide a contingency plan for each of the planned analyses, in case assumptions are not fulfilled. Of course, it can be challenging and time-consuming to anticipate every potential issue with the data and plan contingencies. However, investing time in pre-specifying a decision tree (or a set of contingency plans) could save time should issues arise during data analysis, and can reduce the likelihood of deviating from the pre-registration.
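To make the normality example concrete, one branch of such a decision tree might be pre-registered as follows. This is an illustrative sketch with toy data, not a prescription; the test choices and threshold are assumptions.

```r
## Minimal sketch of one branch of a pre-registerable decision tree
## (illustrative): test normality of the outcome, then choose a
## parametric or non-parametric two-group comparison accordingly
set.seed(1)
outcome <- rexp(100)                    # toy, right-skewed outcome
group   <- rep(c("a", "b"), each = 50)  # toy two-level grouping variable

if (shapiro.test(outcome)$p.value > 0.05) {
  result <- t.test(outcome ~ group)      # normality plausible: parametric test
} else {
  result <- wilcox.test(outcome ~ group) # normality rejected: non-parametric test
}
result
```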

Challenge: Lack of flexibility in data analysis

Transparently report unplanned analyses

Unplanned analyses (such as applying new methods or conducting follow-up tests to investigate an interesting or unexpected finding) are a natural and often important part of the scientific process. Despite common misconceptions, pre-registration does not prevent such unplanned analyses from being included, as long as they are transparently reported as post-hoc. If there are methodological deviations, we recommend that researchers (1) clearly state the reasons for using the new method, and (2) if possible, report results from both methods, ideally to show that the change in methods was not driven by the results [57]. This information can be provided either in the manuscript or in an update to the original pre-registration (e.g., on a third-party registry such as the OSF), which can be useful when journal word limits are tight. Similarly, if researchers wish to include additional follow-up analyses to investigate an interesting or unexpected finding, these should be reported but labelled as “exploratory” or “post-hoc” in the manuscript.

Ensure a paper’s value does not depend on statistically significant results

Researchers may be concerned that the reduced analytic flexibility of pre-registration could increase the likelihood of reporting null results [22, 23], which are harder to publish [13, 42]. To address this, we recommend taking steps to ensure that the value and success of a study do not depend on a significant p-value. First, methodologically strong research (e.g., with high statistical power, valid and reliable measures, robustness checks, and replication samples) will advance the field, whatever the findings. Second, methods can be applied to allow for the interpretation of statistically non-significant findings (e.g., Bayesian methods [58] or equivalence tests, which determine whether an observed effect is surprisingly small [52, 59, 60]). This means that the results will be informative whatever they show, in contrast to approaches relying solely on null hypothesis significance testing, where statistically non-significant findings cannot be interpreted as meaningful. Third, researchers can submit the proposed study as a Registered Report, where it will be evaluated before the results are available. This is arguably the strongest protection against publication bias, as in-principle acceptance of the study is granted without any knowledge of the results. In addition, Registered Reports can improve the methodology, as suggestions from expert reviewers can be incorporated into the pre-registered protocol.
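To illustrate the equivalence-testing idea, a two one-sided tests (TOST) procedure can be sketched in base R. This is an illustrative sketch only; dedicated packages such as TOSTER implement it more fully, and the data and equivalence bound below are assumptions.

```r
## Minimal base-R sketch of a two one-sided tests (TOST) equivalence
## procedure: the effect is declared "surprisingly small" if it is
## significantly greater than the lower bound AND significantly less
## than the upper bound
set.seed(3)
x     <- rnorm(200, mean = 0.02)  # toy effect estimates near zero
bound <- 0.1                      # smallest effect size of interest (an assumption)

p_lower <- t.test(x, mu = -bound, alternative = "greater")$p.value
p_upper <- t.test(x, mu =  bound, alternative = "less")$p.value
max(p_lower, p_upper)             # equivalence is supported if this is < 0.05
```

Under this logic, a non-significant null-hypothesis test can still yield an informative conclusion: that any true effect is smaller than the pre-specified bound.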

Under a system that rewards novel and statistically significant findings, it is easy for subconscious human biases to lead to QRPs. However, researchers, along with data guardians, journals, funders, and institutions, have a responsibility to ensure that findings are reproducible and robust. While pre-registration can help to limit analytic flexibility and selective reporting, it involves several challenges for epidemiologists conducting secondary data analysis. The approaches described here aim to address these challenges (Fig. 1), either by improving the efficacy of pre-registration or by providing an alternative means of addressing analytic flexibility (e.g., a multiverse analysis). The responsibility for adopting these approaches should not fall on researchers’ shoulders alone; data guardians also have an important role to play in recording and reporting access to data, providing blinded datasets and holdout samples, and encouraging researchers to pre-register and adopt these solutions as part of their data request. Furthermore, wider stakeholders could incentivise these practices; for example, journals could provide a designated space for researchers to report deviations from the pre-registration, and funders could provide grants to establish best practice at the cohort level (e.g., data checkout systems, blinded datasets). Ease of adoption is key to wide uptake, and we therefore encourage efforts to evaluate, simplify, and improve these practices. Steps that could be taken to evaluate these practices are presented in Box 1.

More broadly, it is important to emphasise that researcher biases do not operate in isolation, but rather in the context of wider publication bias and a “publish or perish” culture. These incentive structures not only promote QRPs [61], but also discourage researchers from pre-registering and adopting other time-consuming reproducible methods. Therefore, in addition to targeting bias at the individual researcher level, wider initiatives from journals, funders, and institutions are required to address these institutional biases [7]. Systemic changes that reward rigorous and reproducible research will help researchers to provide unbiased answers to science and society’s most important questions.

Box 1. Evaluation of approaches

To evaluate, simplify and improve approaches to protect against researcher bias in secondary data analysis, the following steps could be taken.

Co-creation workshops to refine approaches

To obtain feedback on the approaches (including on any practical concerns or feasibility issues), co-creation workshops could be held with researchers, data managers, and wider stakeholders (e.g., journals, funders, and institutions).

Empirical research to evaluate efficacy of approaches

To evaluate the effectiveness of the approaches in preventing researcher bias and/or improving pre-registration, empirical research is needed. For example, to test the extent to which multiverse analysis can reduce selective reporting, effect sizes from multiverse analyses could be compared with effect sizes from meta-analyses (of non-pre-registered studies) addressing the same research question. If effect sizes were smaller in multiverse analyses, it would suggest that the multiverse approach can reduce selective reporting. In addition, to test whether providing a blinded dataset or a dataset missing outcome variables could help researchers develop an appropriate analytical protocol, researchers could be randomly assigned to receive such a dataset (or no dataset) prior to pre-registration. If researchers who received such a dataset showed fewer eventual deviations from the pre-registered protocol (in the final study), it would suggest that this approach can help ensure that proposed analyses are appropriate for the data.

Pilot implementation of the measures

To assess the practical feasibility of the approaches, data managers could pilot measures with users of the dataset (e.g., requiring pre-registration for access to data, or providing datasets that are blinded or missing outcome variables). Feedback could then be collected from researchers and data managers about the experience and ease of use.

Acknowledgements

The authors are grateful to Professor George Davey Smith for his helpful comments on this article.

Author contributions

JRB and MRM developed the idea for the article. The first draft of the manuscript was written by JRB, with support from MRM, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

J.R.B is funded by a Wellcome Trust Sir Henry Wellcome fellowship (grant 215917/Z/19/Z). J.B.P is supported by the Medical Research Foundation 2018 Emerging Leaders 1st Prize in Adolescent Mental Health (MRF-160-0002-ELP-PINGA). M.R.M and H.M.S work in a unit that receives funding from the University of Bristol and the UK Medical Research Council (MC_UU_00011/5, MC_UU_00011/7), and M.R.M is also supported by the National Institute for Health Research (NIHR) Biomedical Research Centre at the University Hospitals Bristol National Health Service Foundation Trust and the University of Bristol.

Declarations

The authors declare that they have no conflict of interest.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
