
Systematic Reviews: Step 7: Extract Data from Included Studies

Created by health science librarians.



About Step 7: Extract Data from Included Studies

This page covers the basics of data extraction, selecting a data extraction tool, deciding what to extract, and helpful tips for data extraction.



In Step 7, you will skim the full text of the included articles and collect information about each study in a table format (extract data), summarizing the studies so they are easier to compare. You will:

  • Make sure you have collected the full text of any included articles.
  • Choose the pieces of information you want to collect from each study.
  • Choose a method for collecting the data.
  • Create the data extraction table.
  • Test the data collection table (optional). 
  • Collect (extract) the data. 
  • Review the data collected for any errors. 

For accuracy, two or more people should extract data from each study. This process can be done by hand or by using a computer program. 

Click an item below to see how it applies to Step 7: Extract Data from Included Studies.

Reporting your review with PRISMA

If you reach the data extraction step and choose to exclude articles for any reason, update the number of included and excluded studies in your PRISMA flow diagram.

Managing your review with Covidence

Covidence allows you to assemble a custom data extraction template, have two reviewers conduct extraction, then send their extractions for consensus.

How a librarian can help with Step 7

A librarian can advise you on data extraction for your systematic review, including: 

  • What the data extraction stage of the review entails
  • Finding examples in the literature of similar reviews and their completed data tables
  • How to choose what data to extract from your included articles 
  • How to create a randomized sample of citations for a pilot test
  • Best practices for reporting your included studies and their important data in your review

In this step of the systematic review, you will develop your evidence tables, which give detailed information for each study (perhaps using a PICO framework as a guide), and summary tables, which give a high-level overview of the findings of your review. You can create evidence and summary tables to describe study characteristics, results, or both. These tables will help you determine which studies, if any, are eligible for quantitative synthesis.

Data extraction requires a lot of planning. We will review some of the tools you can use for data extraction, the types of information you will want to extract, and the options available in the systematic review software used here at UNC, Covidence.

How many people should extract data?

The Cochrane Handbook and other studies strongly suggest having at least two reviewers extract data independently to reduce the number of errors.

  • Chapter 5: Collecting Data (Cochrane Handbook)
  • A Practical Guide to Data Extraction for Intervention Systematic Reviews (Covidence)

Click on a type of data extraction tool below to see some more information about using that type of tool and what UNC has to offer.

Systematic Review Software (Covidence)

Most systematic review software tools have data extraction functionality that can save you time and effort. Here at UNC, we use a systematic review software called Covidence. You can see a more complete list of options in the Systematic Review Toolbox.

Covidence allows you to create and publish a data extraction template with text fields, single-choice items, section headings, and section subheadings; perform dual and single reviewer data extraction; review extractions for consensus; and export data extraction and quality assessment to a CSV with each item in a column and each study in a row.

  • Covidence@UNC Guide
  • Covidence for Data Extraction (Covidence)
  • A Practical Guide to Data Extraction for Intervention Systematic Reviews (Covidence)

Spreadsheet or Database Software (Excel, Google Sheets)

You can also use spreadsheet or database software to create custom extraction forms. Spreadsheet software (such as Microsoft Excel) has functions such as drop-down menus and range checks that can speed up the process and help prevent data entry errors. Relational databases (such as Microsoft Access) can help you organize extracted information into categories such as citation details, demographics, participant selection, intervention, and outcomes.
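For instance, a drop-down menu and a range check can be added to an Excel extraction form programmatically. The sketch below uses the openpyxl library; the column layout, allowed values, and file name are illustrative assumptions, not a prescribed template.

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

# Minimal sketch of an Excel extraction form with built-in validation.
wb = Workbook()
ws = wb.active
ws.append(["study_id", "study_design", "sample_size"])  # illustrative columns

# Drop-down menu restricting study design to a fixed set of labels
design_dv = DataValidation(type="list", formula1='"RCT,Cohort,Case-control,Other"', allow_blank=True)
ws.add_data_validation(design_dv)
design_dv.add("B2:B500")

# Range check preventing implausible sample sizes
size_dv = DataValidation(type="whole", operator="between", formula1="1", formula2="100000")
ws.add_data_validation(size_dv)
size_dv.add("C2:C500")

wb.save("extraction_form.xlsx")
```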

  • Microsoft Products (UNC Information Technology Services)

Cochrane RevMan

RevMan offers collection forms for descriptive information on population, interventions, and outcomes, and quality assessments, as well as for data for analysis and forest plots. The form elements may not be changed, and data must be entered manually. RevMan is a free software download.

  • Cochrane RevMan 5.0 Download
  • RevMan for Non-Cochrane Reviews (Cochrane Training)

Survey or Form Software (Qualtrics, Poll Everywhere)

Survey or form tools can help you create custom forms with many different question types, such as multiple choice, drop downs, ranking, and more. Content from these tools can often be exported to spreadsheet or database software as well. Here at UNC we have access to the survey/form software Qualtrics & Poll Everywhere.

  • Qualtrics (UNC Information Technology Services)
  • Poll Everywhere (UNC Information Technology Services)

Electronic Documents or Paper & Pencil (Word, Google Docs)

In the past, people often used paper and pencil to record the data they extracted from articles. Handwritten extraction is less popular now that electronic tools are widespread. You can record extracted data in electronic tables or forms created in Microsoft Word or other word processing programs, but this may take longer than the methods listed above. Electronic document and paper-and-pencil extraction should only be used for small reviews, as larger sets of articles can become unwieldy, and these methods may be more prone to data entry errors than more automated approaches.

There are benefits and limitations to each method of data extraction.  You will want to consider:

  • The cost of the software / tool
  • Shareability / versioning
  • Existing versus custom data extraction forms
  • The data entry process
  • Interrater reliability

For example, in Covidence you may spend more time building your data extraction form, but save time later in the extraction process as Covidence can automatically highlight discrepancies for review and resolution between different extractors. Excel may require less time investment to create an extraction form, but it may take longer for you to match and compare data between extractors. More in-depth comparison of the benefits and limitations of each extraction tool can be found in the table below.
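If you do extract into spreadsheets, the matching step can be partly automated once both extractors use the same column layout and study identifiers. This is a minimal sketch with pandas; the file names and the study_id key are assumptions.

```python
import pandas as pd

# Each extractor fills in an identically structured sheet, one row per study.
a = pd.read_csv("extractor_A.csv").set_index("study_id").sort_index()
b = pd.read_csv("extractor_B.csv").set_index("study_id").sort_index()

# List every cell where the two extractions disagree, for consensus discussion.
disagreements = a.compare(b)
print(disagreements)
```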

Sample information to include in an extraction table

It may help to consult other similar systematic reviews to identify what data to collect or to think about your question in a framework such as PICO.

Helpful data for an intervention question may include:

  • Information about the article (author(s), year of publication, title, DOI)
  • Information about the study (study type, participant recruitment / selection / allocation, level of evidence, study quality)
  • Patient demographics (age, sex, ethnicity, diseases / conditions, other characteristics related to the intervention / outcome)
  • Intervention (quantity, dosage, route of administration, format, duration, time frame, setting)
  • Outcomes (quantitative and / or qualitative)

If you plan to synthesize data, you will want to collect additional information such as sample sizes, effect sizes, dependent variables, reliability measures, pre-test data, post-test data, follow-up data, and statistical tests used.
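As a concrete illustration, the items above can become the columns of a single extraction table with one row per included study. The column names below are examples only; your review question determines the actual fields.

```python
import pandas as pd

# Illustrative extraction template for an intervention question.
columns = [
    "author", "year", "title", "doi",
    "study_type", "recruitment", "level_of_evidence", "study_quality",
    "age", "sex", "condition",
    "intervention", "dosage", "route", "duration", "setting",
    "outcome_measure", "sample_size", "effect_size", "statistical_test",
]
extraction_table = pd.DataFrame(columns=columns)

# One row is added per included study as data are extracted.
extraction_table.loc[0] = [None] * len(columns)
print(extraction_table.head())
```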

Extraction templates and approaches should be determined by the needs of the specific review.   For example, if you are extracting qualitative data, you will want to extract data such as theoretical framework, data collection method, or role of the researcher and their potential bias.

  • Supplementary Guidance for Inclusion of Qualitative Research in Cochrane Systematic Reviews of Interventions (Cochrane Collaboration Qualitative Methods Group)
  • Look for an existing extraction form or tool to help guide you.  Use existing systematic reviews on your topic to identify what information to collect if you are not sure what to do.
  • Train the review team on the extraction categories and what type of data would be expected.  A manual or guide may help your team establish standards.
  • Pilot the extraction / coding form to ensure data extractors are recording similar data. Revise the extraction form if needed.
  • Discuss any discrepancies in coding throughout the process.
  • Document any changes to the process or the form.  Keep track of the decisions the team makes and the reasoning behind them.


Summarising good practice guidelines for data extraction for systematic reviews and meta-analysis

  • Kathryn S Taylor1 (http://orcid.org/0000-0001-6589-5456), Kamal R Mahtani1 (http://orcid.org/0000-0002-7791-8552), Jeffrey K Aronson2 (http://orcid.org/0000-0003-1139-655X)
  • 1 Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
  • 2 Centre for Evidence Based Medicine, University of Oxford, Oxford, UK
  • Correspondence to Dr Kathryn S Taylor, Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford OX2 6GG, UK; kathryn.taylor{at}phc.ox.ac.uk

https://doi.org/10.1136/bmjebm-2020-111651


Data extraction is the stage of a systematic review that occurs between identifying eligible studies and analysing the data, whether the analysis is a qualitative synthesis or a quantitative synthesis involving the pooling of data in a meta-analysis. The aims of data extraction are to obtain information about the included studies in terms of the characteristics of each study and its population and, for quantitative synthesis, to collect the necessary data to carry out meta-analysis. In systematic reviews, information about the included studies will also be required to conduct risk of bias assessments, but these data are not the focus of this article.

Following good practice when extracting data will help make the process efficient and reduce the risk of errors and bias. Failure to follow good practice risks basing the analysis on poor quality data, and therefore providing poor quality inputs, which will result in poor quality outputs, with unreliable conclusions and invalid study findings. In computer science, this is known as ‘garbage in, garbage out’ or ‘rubbish in, rubbish out’. Furthermore, providing insufficient information about the included studies for readers to be able to assess the generalisability of the findings from a systematic review will undermine the value of the pooled analysis. Such failures will cause your systematic review and meta-analysis to be less useful than it ought to be.

Some guidelines for data extraction are formal, including those described in the Cochrane Handbook for Systematic Reviews of Interventions, 1 the Cochrane Handbook for Diagnostic Test Accuracy Reviews, 2 3 the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines for systematic reviews and their protocols 4–7 and other sources. 8 9 These formal guidelines are complemented by informal advice in the form of examples and videos on how to avoid possible pitfalls and guidance on how to carry out data extraction more efficiently. 10–12

Guidelines for data extraction involve recommendations for:

  • Duplication
  • Anticipation
  • Organisation
  • Documentation

Duplication

Ideally, at least two reviewers should extract data independently, 1 2 9–12 particularly outcome data, 1 as data extraction by only one person can generate errors. 1 13 Data will be extracted from the same sources into identical data extraction forms. If time or resources prevent independent dual extraction, one reviewer should extract the full data and another should independently check the extracted data for both accuracy and completeness. 8 In rapid or restricted reviews, an acceptable level of verification of the data extraction by the first reviewer may be achieved by a second reviewer extracting a random sample of data. 14 Then, before comparing the extracted data and seeking a consensus, the extent to which coded (categorical) data extracted by two different reviewers are consistent may be measured using kappa statistics, 1 2 12 15 or Fleiss' kappa statistics when more than two people have extracted the data. 16 Formal comparisons are not routine in Cochrane Reviews, and the Cochrane Handbook recommends that if agreement is to be formally assessed, it should focus only on key outcomes or risk of bias assessments. 1
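For coded (categorical) data, agreement between two extractors can be quantified with Cohen's kappa (Fleiss' kappa extends this to more than two extractors, for example via statsmodels). A minimal sketch with made-up judgements, using scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical categorical judgements recorded independently by two reviewers
# for the same five studies.
reviewer_1 = ["low", "high", "unclear", "low", "low"]
reviewer_2 = ["low", "high", "low", "low", "unclear"]

kappa = cohen_kappa_score(reviewer_1, reviewer_2)
print(f"Cohen's kappa: {kappa:.2f}")  # 1 = perfect agreement, 0 = chance-level agreement
```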

Anticipation

Disagreement between reviewers when extracting data . Some differences in extracted data are simply due to human error, and such conflicts can be easily resolved. Conflicts and questions about clinical issues, about which data to extract, or whether the relevant data have been reported can be addressed by involving both clinicians and methodologists in data extraction. 3 12 The protocol should set out the strategy for resolving disagreements between reviewers, using consensus and, if necessary, arbitration by another reviewer. If arbitration fails, the study authors should be contacted for clarification. If that is unsuccessful, the disagreement should be documented and reported. 1 6 7

Outcome data being reported in different ways, which are not necessarily suitable for meta-analysis . Many resources are available for helping with data extraction, involving various methods and equations to transform reported data or make estimates. 1 2 10 The protocol may acknowledge this by stating that any estimates made and their justification will be documented and reported.

Including estimates and alternative data . It is also important to anticipate the roles that extracted data will play in the analysis. Studies should be highlighted when multiple sets of outcome data are reported or when estimates have been made in extracting outcome data. 9 Clearly identifying these studies during the data extraction phase will ensure that the studies can be quickly identified later, during the data analysis phase.

Risk of double counting patients . Some studies involve multiple reports, but the study should be the unit of interest. 1 Tracking down multiple reports and ensuring that patients are not double-counted may require good detective skills.

Risk of human error, inconsistency and subjectivity when extracting data . The protocol should state whether data extraction was independent and carried out in duplicate, if a standardised data extraction form was used, and whether it was piloted. The protocol should also state any special instruction, for example, only extracting prespecified eligibility criteria. 1 2 6–9 11 12

Ambiguous or incomplete data . Authors should be contacted to seek clarification about data and make enquiries about the availability of unreported data. 1 2 9 The process of confirming and obtaining data from authors should be prespecified 6 7 including the number of attempts that will be made to make contact, who will be contacted (eg, the first author), and what form the data request will take. Asking for data that are likely to be readily available will reduce the risk of authors offering data with preconditions.

Extracting the right amount of data . Time and resources are wasted extracting data that will not be analysed, such as the language of the publication and the journal name when other extracted data (first author, title and year) adequately identify the publication. The aim of the systematic review will determine which study characteristics are extracted. 16 For example, if the prevalence of a disease is important and is known to vary across cities, the country and city should be extracted. Any assumptions and simplifications should be listed in the protocol. 6 7 The protocol should allow some flexibility for alternative analyses by not over-aggregating data, for example, collecting data on smoking status in categories ‘smoker/ex-smoker/never smoked’ instead of ‘smoker/non-smoker’. 11
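To illustrate the point about not over-aggregating, finer categories collected at extraction can always be collapsed at analysis time, but not the reverse. A minimal sketch, assuming ex-smokers are grouped with never-smokers in the coarser classification:

```python
import pandas as pd

# Extract the finer-grained smoking categories; collapse later if needed.
smoking = pd.Series(["smoker", "ex-smoker", "never smoked", "ex-smoker"])
collapsed = smoking.map({
    "smoker": "smoker",
    "ex-smoker": "non-smoker",      # assumption: ex-smokers counted as non-smokers
    "never smoked": "non-smoker",
})
print(collapsed.value_counts())
```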

Organisation

Guidelines recommend that the process of extracting data should be well organised. This involves having a clear plan, which should feature in the protocol, stating who will extract the data, the actual data that will be extracted, details about the use, development, piloting of a standardised data extraction form 1 6–9 and having good data management procedures, 10 including backing up files frequently. 11 Standardised data extraction forms can provide consistency in a systematic review, while at the same time reducing biases and improving validity and reliability. It may be possible to reuse a form from another review. 12 It is recommended that the data extraction form is piloted and that reviewers receive training in advance 1 2 12 and instructions should be given with extraction forms (eg, about codes and definitions used in the form) to reduce subjectivity and to ensure consistency. 1 2 12 It is recommended that instructions be integrated into the extraction form, so that they are seen each time data are extracted, rather than having instructions in a separate instruction document, which may be ignored or forgotten. 2 Data extraction forms may be paper based or electronic or involve sophisticated data systems. Each approach will have advantages and disadvantages. 1 11 17 For example, using a paper-based form does not require internet access or software skills, but using an electronic extraction form facilitates data analysis. Data systems, while costly, can provide online data storage and automated comparisons between data that have been independently extracted.

Documentation

Data extraction procedures and preanalysis calculations should be well documented 9 10 and based on ‘good bookkeeping’. 5 10 Having good documentation supports accurate reporting, transparency and the ability to scrutinise and replicate the analysis. Reporting guidelines for systematic reviews are provided by PRISMA, 4 5 and these correspond to the set of PRISMA guidelines for protocols of systematic reviews. 6 7 In cases where data are derived from multiple reports, documenting the source of each data item will facilitate the process of resolving disagreements with other reviewers, by enabling the source of conflict to be quickly identified. 10

Data extraction is both time consuming and error-prone, and automation of data extraction is still in its infancy. 1 18 Following both formal and informal guidelines for good practice in data extraction ( table 1 ) will make the process efficient and reduce the risk of errors and bias when extracting data. This will contribute towards ensuring that systematic reviews and meta-analyses are carried out to a high standard.


Table 1. Summarising guidelines for extracting data for systematic reviews and meta-analysis


Twitter @dataextips

Contributors KST and KRM conceived the idea of the series of which this is one part. KST wrote the first draft of the manuscript. All authors revised the manuscript and agreed the final version.

Funding This research was supported by the National Institute for Health Research Applied Research Collaboration Oxford and Thames Valley at Oxford Health NHS Foundation Trust.

Disclaimer The views expressed in this publication are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

Competing interests KRM and JKA were associate editors of BMJ Evidence-Based Medicine at the time of submission.

Provenance and peer review Commissioned; internally peer reviewed.


Cochrane Training

Chapter 5: Collecting data

Tianjing Li, Julian PT Higgins, Jonathan J Deeks

Key Points:

  • Systematic reviews have studies, rather than reports, as the unit of interest, and so multiple reports of the same study need to be identified and linked together before or after data extraction.
  • Because of the increasing availability of data sources (e.g. trials registers, regulatory documents, clinical study reports), review authors should decide on which sources may contain the most useful information for the review, and have a plan to resolve discrepancies if information is inconsistent across sources.
  • Review authors are encouraged to develop outlines of tables and figures that will appear in the review to facilitate the design of data collection forms. The key to successful data collection is to construct easy-to-use forms and collect sufficient and unambiguous data that faithfully represent the source in a structured and organized manner.
  • Effort should be made to identify data needed for meta-analyses, which often need to be calculated or converted from data reported in diverse formats.
  • Data should be collected and archived in a form that allows future access and data sharing.

Cite this chapter as: Li T, Higgins JPT, Deeks JJ (editors). Chapter 5: Collecting data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

5.1 Introduction

Systematic reviews aim to identify all studies that are relevant to their research questions and to synthesize data about the design, risk of bias, and results of those studies. Consequently, the findings of a systematic review depend critically on decisions relating to which data from these studies are presented and analysed. Data collected for systematic reviews should be accurate, complete, and accessible for future updates of the review and for data sharing. Methods used for these decisions must be transparent; they should be chosen to minimize biases and human error. Here we describe approaches that should be used in systematic reviews for collecting data, including extraction of data directly from journal articles and other reports of studies.

5.2 Sources of data

Studies are reported in a range of sources which are detailed later. As discussed in Section 5.2.1 , it is important to link together multiple reports of the same study. The relative strengths and weaknesses of each type of source are discussed in Section 5.2.2 . For guidance on searching for and selecting reports of studies, refer to Chapter 4 .

Journal articles are the source of the majority of data included in systematic reviews. Note that a study can be reported in multiple journal articles, each focusing on some aspect of the study (e.g. design, main results, and other results).

Conference abstracts are commonly available. However, the information presented in conference abstracts is highly variable in reliability, accuracy, and level of detail (Li et al 2017).

Errata and letters can be important sources of information about studies, including critical weaknesses and retractions, and review authors should examine these if they are identified (see MECIR Box 5.2.a ).

Trials registers (e.g. ClinicalTrials.gov) catalogue trials that have been planned or started, and have become an important data source for identifying trials, for comparing published outcomes and results with those planned, and for obtaining efficacy and safety data that are not available elsewhere (Ross et al 2009, Jones et al 2015, Baudard et al 2017).

Clinical study reports (CSRs) contain unabridged and comprehensive descriptions of the clinical problem, design, conduct and results of clinical trials, following a structure and content guidance prescribed by the International Conference on Harmonisation (ICH 1995). To obtain marketing approval of drugs and biologics for a specific indication, pharmaceutical companies submit CSRs and other required materials to regulatory authorities. Because CSRs also incorporate tables and figures, with appendices containing the protocol, statistical analysis plan, sample case report forms, and patient data listings (including narratives of all serious adverse events), they can be thousands of pages in length. CSRs often contain more data about trial methods and results than any other single data source (Mayo-Wilson et al 2018). CSRs are often difficult to access, and are usually not publicly available. Review authors could request CSRs from the European Medicines Agency (Davis and Miller 2017). The US Food and Drug Administration had historically avoided releasing CSRs but launched a pilot programme in 2018 whereby selected portions of CSRs for new drug applications were posted on the agency’s website. Many CSRs are obtained through unsealed litigation documents, repositories (e.g. clinicalstudydatarequest.com), and other open data and data-sharing channels (e.g. The Yale University Open Data Access Project) (Doshi et al 2013, Wieland et al 2014, Mayo-Wilson et al 2018).

Regulatory reviews such as those available from the US Food and Drug Administration or European Medicines Agency provide useful information about trials of drugs, biologics, and medical devices submitted by manufacturers for marketing approval (Turner 2013). These documents are summaries of CSRs and related documents, prepared by agency staff as part of the process of approving the products for marketing, after reanalysing the original trial data. Regulatory reviews often are available only for the first approved use of an intervention and not for later applications (although review authors may request those documents, which are usually brief). Using regulatory reviews from the US Food and Drug Administration as an example, drug approval packages are available on the agency’s website for drugs approved since 1997 (Turner 2013); for drugs approved before 1997, information must be requested through a freedom of information request. The drug approval packages contain various documents: approval letter(s), medical review(s), chemistry review(s), clinical pharmacology review(s), and statistical review(s).

Individual participant data (IPD) are usually sought directly from the researchers responsible for the study, or may be identified from open data repositories (e.g. www.clinicalstudydatarequest.com ). These data typically include variables that represent the characteristics of each participant, intervention (or exposure) group, prognostic factors, and measurements of outcomes (Stewart et al 2015). Access to IPD has the advantage of allowing review authors to reanalyse the data flexibly, in accordance with the preferred analysis methods outlined in the protocol, and can reduce the variation in analysis methods across studies included in the review. IPD reviews are addressed in detail in Chapter 26 .

MECIR Box 5.2.a Relevant expectations for conduct of intervention reviews

5.2.1 Studies (not reports) as the unit of interest

In a systematic review, studies rather than reports of studies are the principal unit of interest. Since a study may have been reported in several sources, a comprehensive search for studies for the review may identify many reports from a potentially relevant study (Mayo-Wilson et al 2017a, Mayo-Wilson et al 2018). Conversely, a report may describe more than one study.

Multiple reports of the same study should be linked together (see MECIR Box 5.2.b ). Some authors prefer to link reports before they collect data, and collect data from across the reports onto a single form. Other authors prefer to collect data from each report and then link together the collected data across reports. Either strategy may be appropriate, depending on the nature of the reports at hand. It may not be clear that two reports relate to the same study until data collection has commenced. Although sometimes there is a single report for each study, it should never be assumed that this is the case.

MECIR Box 5.2.b Relevant expectations for conduct of intervention reviews

It can be difficult to link multiple reports from the same study, and review authors may need to do some ‘detective work’. Multiple sources about the same trial may not reference each other, may not share common authors (Gøtzsche 1989, Tramèr et al 1997), or may report discrepant information about the study design, characteristics, outcomes, and results (von Elm et al 2004, Mayo-Wilson et al 2017a).

Some of the most useful criteria for linking reports are:

  • trial registration numbers;
  • authors’ names;
  • sponsor for the study and sponsor identifiers (e.g. grant or contract numbers);
  • location and setting (particularly if institutions, such as hospitals, are named);
  • specific details of the interventions (e.g. dose, frequency);
  • numbers of participants and baseline data; and
  • date and duration of the study (which also can clarify whether different sample sizes are due to different periods of recruitment), length of follow-up, or subgroups selected to address secondary goals.

Review authors should use as many trial characteristics as possible to link multiple reports. When uncertainties remain after considering these and other factors, it may be necessary to correspond with the study authors or sponsors for confirmation.
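Where a screening export records linking fields such as trial registration numbers, a first automated pass can group obviously related reports before the manual detective work begins. A minimal sketch, assuming a CSV with report_id and registration_id columns:

```python
import pandas as pd

# Hypothetical export: one row per report, with a registration number where known.
reports = pd.read_csv("included_reports.csv")

# Group reports sharing a registration number; reports without one still need
# manual linking using authors, setting, interventions, and sample sizes.
linked = (
    reports.dropna(subset=["registration_id"])
    .groupby("registration_id")["report_id"]
    .apply(list)
)
print(linked)
```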

5.2.2 Determining which sources might be most useful

A comprehensive search to identify all eligible studies from all possible sources is resource-intensive but necessary for a high-quality systematic review (see Chapter 4 ). Because some data sources are more useful than others (Mayo-Wilson et al 2018), review authors should consider which data sources may be available and which may contain the most useful information for the review. These considerations should be described in the protocol. Table 5.2.a summarizes the strengths and limitations of different data sources (Mayo-Wilson et al 2018). Gaining access to CSRs and IPD often takes a long time. Review authors should begin searching repositories and contact trial investigators and sponsors as early as possible to negotiate data usage agreements (Mayo-Wilson et al 2015, Mayo-Wilson et al 2018).

Table 5.2.a Strengths and limitations of different data sources for systematic reviews

5.2.3 Correspondence with investigators

Review authors often find that they are unable to obtain all the information they seek from available reports about the details of the study design, the full range of outcomes measured and the numerical results. In such circumstances, authors are strongly encouraged to contact the original investigators (see MECIR Box 5.2.c ). Contact details of study authors, when not available from the study reports, often can be obtained from more recent publications, from university or institutional staff listings, from membership directories of professional societies, or by a general search of the web. If the contact author named in the study report cannot be contacted or does not respond, it is worthwhile attempting to contact other authors.

Review authors should consider the nature of the information they require and make their request accordingly. For descriptive information about the conduct of the trial, it may be most appropriate to ask open-ended questions (e.g. how was the allocation process conducted, or how were missing data handled?). If specific numerical data are required, it may be more helpful to request them specifically, possibly providing a short data collection form (either uncompleted or partially completed). If IPD are required, they should be specifically requested (see also Chapter 26 ). In some cases, study investigators may find it more convenient to provide IPD rather than conduct additional analyses to obtain the specific statistics requested.

MECIR Box 5.2.c Relevant expectations for conduct of intervention reviews

5.3 What data to collect

5.3.1 What are data?

For the purposes of this chapter, we define ‘data’ to be any information about (or derived from) a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications, and investigators. Review authors should plan in advance what data will be required for their systematic review, and develop a strategy for obtaining them (see MECIR Box 5.3.a ). The involvement of consumers and other stakeholders can be helpful in ensuring that the categories of data collected are sufficiently aligned with the needs of review users ( Chapter 1, Section 1.3 ). The data to be sought should be described in the protocol, with consideration wherever possible of the issues raised in the rest of this chapter.

The data collected for a review should adequately describe the included studies, support the construction of tables and figures, facilitate the risk of bias assessment, and enable syntheses and meta-analyses. Review authors should familiarize themselves with reporting guidelines for systematic reviews (see online Chapter III and the PRISMA statement (Liberati et al 2009)) to ensure that relevant elements and sections are incorporated. The following sections review the types of information that should be sought, and these are summarized in Table 5.3.a (Li et al 2015).

MECIR Box 5.3.a Relevant expectations for conduct of intervention reviews

Table 5.3.a Checklist of items to consider in data collection

*Full description required for assessments of risk of bias (see Chapter 8 , Chapter 23 and Chapter 25 ).

5.3.2 Study methods and potential sources of bias

Different research methods can influence study outcomes by introducing different biases into results. Important study design characteristics should be collected to allow the selection of appropriate methods for assessment and analysis, and to enable description of the design of each included study in a table of ‘Characteristics of included studies’, including whether the study is randomized, whether the study has a cluster or crossover design, and the duration of the study. If the review includes non-randomized studies, appropriate features of the studies should be described (see Chapter 24 ).

Detailed information should be collected to facilitate assessment of the risk of bias in each included study. Risk-of-bias assessment should be conducted using the tool most appropriate for the design of each study, and the information required to complete the assessment will depend on the tool. Randomized studies should be assessed using the tool described in Chapter 8 . The tool covers bias arising from the randomization process, due to deviations from intended interventions, due to missing outcome data, in measurement of the outcome, and in selection of the reported result. For each item in the tool, a description of what happened in the study is required, which may include verbatim quotes from study reports. Information for assessment of bias due to missing outcome data and selection of the reported result may be most conveniently collected alongside information on outcomes and results. Chapter 7 (Section 7.3.1) discusses some issues in the collection of information for assessments of risk of bias. For non-randomized studies, the most appropriate tool is described in Chapter 25 . A separate tool also covers bias due to missing results in meta-analysis (see Chapter 13 ).

A particularly important piece of information is the funding source of the study and potential conflicts of interest of the study authors.

Some review authors will wish to collect additional information on study characteristics that bear on the quality of the study’s conduct but that may not lead directly to risk of bias, such as whether ethical approval was obtained and whether a sample size calculation was performed a priori.

5.3.3 Participants and setting

Details of participants are collected to enable an understanding of the comparability of, and differences between, the participants within and between included studies, and to allow assessment of how directly or completely the participants in the included studies reflect the original review question.

Typically, aspects that should be collected are those that could (or are believed to) affect presence or magnitude of an intervention effect and those that could help review users assess applicability to populations beyond the review. For example, if the review authors suspect important differences in intervention effect between different socio-economic groups, this information should be collected. If intervention effects are thought constant over such groups, and if such information would not be useful to help apply results, it should not be collected. Participant characteristics that are often useful for assessing applicability include age and sex. Summary information about these should always be collected unless they are not obvious from the context. These characteristics are likely to be presented in different formats (e.g. ages as means or medians, with standard deviations or ranges; sex as percentages or counts for the whole study or for each intervention group separately). Review authors should seek consistent quantities where possible, and decide whether it is more relevant to summarize characteristics for the study as a whole or by intervention group. It may not be possible to select the most consistent statistics until data collection is complete across all or most included studies. Other characteristics that are sometimes important include ethnicity, socio-demographic details (e.g. education level) and the presence of comorbid conditions. Clinical characteristics relevant to the review question (e.g. glucose level for reviews on diabetes) also are important for understanding the severity or stage of the disease.

Diagnostic criteria that were used to define the condition of interest can be a particularly important source of diversity across studies and should be collected. For example, in a review of drug therapy for congestive heart failure, it is important to know how the definition and severity of heart failure was determined in each study (e.g. systolic or diastolic dysfunction, severe systolic dysfunction with ejection fractions below 20%). Similarly, in a review of antihypertensive therapy, it is important to describe baseline levels of blood pressure of participants.

If the settings of studies may influence intervention effects or applicability, then information on these should be collected. Typical settings of healthcare intervention studies include acute care hospitals, emergency facilities, general practice, and extended care facilities such as nursing homes, offices, schools, and communities. Sometimes studies are conducted in different geographical regions with important differences that could affect delivery of an intervention and its outcomes, such as cultural characteristics, economic context, or rural versus city settings. Timing of the study may be associated with important technology differences or trends over time. If such information is important for the interpretation of the review, it should be collected.

Important characteristics of the participants in each included study should be summarized for the reader in the table of ‘Characteristics of included studies’.

5.3.4 Interventions

Details of all experimental and comparator interventions of relevance to the review should be collected. Again, details are required for aspects that could affect the presence or magnitude of an effect or that could help review users assess applicability to their own circumstances. Where feasible, information should be sought (and presented in the review) that is sufficient for replication of the interventions under study. This includes any co-interventions administered as part of the study, and applies similarly to comparators such as ‘usual care’. Review authors may need to request missing information from study authors.

The Template for Intervention Description and Replication (TIDieR) provides a comprehensive framework for full description of interventions and has been proposed for use in systematic reviews as well as reports of primary studies (Hoffmann et al 2014). The checklist includes descriptions of:

  • the rationale for the intervention and how it is expected to work;
  • any documentation that instructs the recipient on the intervention;
  • what the providers do to deliver the intervention (procedures and processes);
  • who provides the intervention (including their skill level), how (e.g. face to face, web-based) and in what setting (e.g. home, school, or hospital);
  • the timing and intensity;
  • whether any variation is permitted or expected, and whether modifications were actually made; and
  • any strategies used to ensure or assess fidelity or adherence to the intervention, and the extent to which the intervention was delivered as planned.

For clinical trials of pharmacological interventions, key information to collect will often include routes of delivery (e.g. oral or intravenous delivery), doses (e.g. amount or intensity of each treatment, frequency of delivery), timing (e.g. within 24 hours of diagnosis), and length of treatment. For other interventions, such as those that evaluate psychotherapy, behavioural and educational approaches, or healthcare delivery strategies, the amount of information required to characterize the intervention will typically be greater, including information about multiple elements of the intervention, who delivered it, and the format and timing of delivery. Chapter 17 provides further information on how to manage intervention complexity, and how the intervention Complexity Assessment Tool (iCAT) can facilitate data collection (Lewin et al 2017).

Important characteristics of the interventions in each included study should be summarized for the reader in the table of ‘Characteristics of included studies’. Additional tables or diagrams such as logic models ( Chapter 2, Section 2.5.1 ) can assist descriptions of multi-component interventions so that review users can better assess review applicability to their context.

5.3.4.1 Integrity of interventions

The degree to which specified procedures or components of the intervention are implemented as planned can have important consequences for the findings from a study. We describe this as intervention integrity ; related terms include adherence, compliance and fidelity (Carroll et al 2007). The verification of intervention integrity may be particularly important in reviews of non-pharmacological trials such as behavioural interventions and complex interventions, which are often implemented in conditions that present numerous obstacles to idealized delivery.

It is generally expected that reports of randomized trials provide detailed accounts of intervention implementation (Zwarenstein et al 2008, Moher et al 2010). In assessing whether interventions were implemented as planned, review authors should bear in mind that some interventions are standardized (with no deviations permitted in the intervention protocol), whereas others explicitly allow a degree of tailoring (Zwarenstein et al 2008). In addition, the growing field of implementation science has led to an increased awareness of the impact of setting and context on delivery of interventions (Damschroder et al 2009). (See Chapter 17, Section 17.1.2.1 for further information and discussion about how an intervention may be tailored to local conditions in order to preserve its integrity.)

Information about integrity can help determine whether unpromising results are due to a poorly conceptualized intervention or to an incomplete delivery of the prescribed components. It can also reveal important information about the feasibility of implementing a given intervention in real life settings. If it is difficult to achieve full implementation in practice, the intervention will have low feasibility (Dusenbury et al 2003).

Whether a lack of intervention integrity leads to a risk of bias in the estimate of its effect depends on whether review authors and users are interested in the effect of assignment to intervention or the effect of adhering to intervention, as discussed in more detail in Chapter 8, Section 8.2.2 . Assessment of deviations from intended interventions is important for assessing risk of bias in the latter, but not the former (see Chapter 8, Section 8.4 ), but both may be of interest to decision makers in different ways.

An example of a Cochrane Review evaluating intervention integrity is provided by a review of smoking cessation in pregnancy (Chamberlain et al 2017). The authors found that process evaluation of the intervention occurred in only some trials and that the implementation was less than ideal in others, including some of the largest trials. The review highlighted how the transfer of an intervention from one setting to another may reduce its effectiveness when elements are changed, or aspects of the materials are culturally inappropriate.

5.3.4.2 Process evaluations

Process evaluations seek to evaluate the process (and mechanisms) between the intervention’s intended implementation and the actual effect on the outcome (Moore et al 2015). Process evaluation studies are characterized by a flexible approach to data collection and the use of numerous methods to generate a range of different types of data, encompassing both quantitative and qualitative methods. Guidance for including process evaluations in systematic reviews is provided in Chapter 21 . When it is considered important, review authors should aim to collect information on whether the trial accounted for, or measured, key process factors and whether the trials that thoroughly addressed integrity showed a greater impact. Process evaluations can be a useful source of factors that potentially influence the effectiveness of an intervention.

5.3.5 Outcomes

An outcome is an event or a measurement value observed or recorded for a particular person or intervention unit in a study during or following an intervention, and that is used to assess the efficacy and safety of the studied intervention (Meinert 2012). Review authors should indicate in advance whether they plan to collect information about all outcomes measured in a study or only those outcomes of (pre-specified) interest in the review. Research has shown that trials addressing the same condition and intervention seldom agree on which outcomes are the most important, and consequently report on numerous different outcomes (Dwan et al 2014, Ismail et al 2014, Denniston et al 2015, Saldanha et al 2017a). The selection of outcomes across systematic reviews of the same condition is also inconsistent (Page et al 2014, Saldanha et al 2014, Saldanha et al 2016, Liu et al 2017). Outcomes used in trials and in systematic reviews of the same condition have limited overlap (Saldanha et al 2017a, Saldanha et al 2017b).

We recommend that only the outcomes defined in the protocol be described in detail. However, a complete list of the names of all outcomes measured may allow a more detailed assessment of the risk of bias due to missing outcome data (see Chapter 13 ).

Review authors should collect all five elements of an outcome (Zarin et al 2011, Saldanha et al 2014), as illustrated in the sketch after this list:

1. outcome domain or title (e.g. anxiety);

2. measurement tool or instrument (including definition of clinical outcomes or endpoints); for a scale, name of the scale (e.g. the Hamilton Anxiety Rating Scale), upper and lower limits, and whether a high or low score is favourable, definitions of any thresholds if appropriate;

3. specific metric used to characterize each participant’s results (e.g. post-intervention anxiety, or change in anxiety from baseline to a post-intervention time point, or post-intervention presence of anxiety (yes/no));

4. method of aggregation (e.g. mean and standard deviation of anxiety scores in each group, or proportion of people with anxiety);

5. timing of outcome measurements (e.g. assessments at end of eight-week intervention period, events occurring during eight-week intervention period).
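A sketch of how these five elements might be recorded in a structured extraction form (the field names and example values are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass

@dataclass
class OutcomeRecord:
    domain: str       # 1. outcome domain or title
    instrument: str   # 2. measurement tool or scale, limits, direction of benefit
    metric: str       # 3. metric characterizing each participant's result
    aggregation: str  # 4. method of aggregation across participants
    timing: str       # 5. timing of the outcome measurement

example = OutcomeRecord(
    domain="anxiety",
    instrument="Hamilton Anxiety Rating Scale (lower score favourable)",
    metric="change in anxiety from baseline to post-intervention",
    aggregation="mean and standard deviation per group",
    timing="end of eight-week intervention period",
)
print(example)
```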

Further considerations for economics outcomes are discussed in Chapter 20 , and for patient-reported outcomes in Chapter 18 .

5.3.5.1 Adverse effects

Collection of information about the harmful effects of an intervention can pose particular difficulties, discussed in detail in Chapter 19 . These outcomes may be described using multiple terms, including ‘adverse event’, ‘adverse effect’, ‘adverse drug reaction’, ‘side effect’ and ‘complication’. Many of these terminologies are used interchangeably in the literature, although some are technically different. Harms might additionally be interpreted to include undesirable changes in other outcomes measured during a study, such as a decrease in quality of life where an improvement may have been anticipated.

In clinical trials, adverse events can be collected either systematically or non-systematically. Systematic collection refers to collecting adverse events in the same manner for each participant using defined methods such as a questionnaire or a laboratory test. For systematically collected outcomes representing harm, data can be collected by review authors in the same way as efficacy outcomes (see Section 5.3.5 ).

Non-systematic collection refers to collection of information on adverse events using methods such as open-ended questions (e.g. ‘Have you noticed any symptoms since your last visit?’), or reported by participants spontaneously. In either case, adverse events may be selectively reported based on their severity, and whether the participant suspected that the effect may have been caused by the intervention, which could lead to bias in the available data. Unfortunately, most adverse events are collected non-systematically rather than systematically, creating a challenge for review authors. The following pieces of information are useful and worth collecting (Nicole Fusco, personal communication):

  • any coding system or standard medical terminology used (e.g. COSTART, MedDRA), including version number;
  • name of the adverse events (e.g. dizziness);
  • reported intensity of the adverse event (e.g. mild, moderate, severe);
  • whether the trial investigators categorized the adverse event as ‘serious’;
  • whether the trial investigators identified the adverse event as being related to the intervention;
  • time point (most commonly measured as a count over the duration of the study);
  • any reported methods for how adverse events were selected for inclusion in the publication (e.g. ‘We reported all adverse events that occurred in at least 5% of participants’); and
  • associated results.

Different collection methods lead to very different accounting of adverse events (Safer 2002, Bent et al 2006, Ioannidis et al 2006, Carvajal et al 2011, Allen et al 2013). Non-systematic collection methods tend to underestimate how frequently an adverse event occurs. It is particularly problematic when the adverse event of interest to the review is collected systematically in some studies but non-systematically in other studies. Different collection methods introduce an important source of heterogeneity. In addition, when non-systematic adverse events are reported based on quantitative selection criteria (e.g. only adverse events that occurred in at least 5% of participants were included in the publication), use of reported data alone may bias the results of meta-analyses. Review authors should be cautious of (or refrain from) synthesizing adverse events that are collected differently.

Regardless of the collection methods, precise definitions of adverse effect outcomes and their intensity should be recorded, since they may vary between studies. For example, in a review of aspirin and gastrointestinal haemorrhage, some trials simply reported gastrointestinal bleeds, while others reported specific categories of bleeding, such as haematemesis, melaena, and proctorrhagia (Derry and Loke 2000). The definition and reporting of severity of the haemorrhages (e.g. major, severe, requiring hospital admission) also varied considerably among the trials (Zanchetti and Hansson 1999). Moreover, a particular adverse effect may be described or measured in different ways among the studies. For example, the terms ‘tiredness’, ‘fatigue’ or ‘lethargy’ may all be used in reporting of adverse effects. Study authors also may use different thresholds for ‘abnormal’ results (e.g. hypokalaemia diagnosed at a serum potassium concentration of 3.0 mmol/L or 3.5 mmol/L).

No mention of adverse events in trial reports does not necessarily mean that no adverse events occurred. It is usually safest to assume that they were not reported. Quality of life measures are sometimes used as a measure of the participants’ experience during the study, but these are usually general measures that do not look specifically at particular adverse effects of the intervention. While quality of life measures are important and can be used to gauge overall participant well-being, they should not be regarded as substitutes for a detailed evaluation of safety and tolerability.

5.3.6 Results

Results data arise from the measurement or ascertainment of outcomes for individual participants in an intervention study. Results data may be available for each individual in a study (i.e. individual participant data; see Chapter 26 ), or summarized at arm level, or summarized at study level into an intervention effect by comparing two intervention arms. Results data should be collected only for the intervention groups and outcomes specified to be of interest in the protocol (see MECIR Box 5.3.b ). Results for other outcomes should not be collected unless the protocol is modified to add them. Any modification should be reported in the review. However, review authors should be alert to the possibility of important, unexpected findings, particularly serious adverse effects.

MECIR Box 5.3.b Relevant expectations for conduct of intervention reviews

Reports of studies often include several results for the same outcome. For example, different measurement scales might be used, results may be presented separately for different subgroups, and outcomes may have been measured at different follow-up time points. Variation in the results can be very large, depending on which data are selected (Gøtzsche et al 2007, Mayo-Wilson et al 2017a). Review protocols should be as specific as possible about which outcome domains, measurement tools, time points, and summary statistics (e.g. final values versus change from baseline) are to be collected (Mayo-Wilson et al 2017b). A framework should be pre-specified in the protocol to facilitate making choices between multiple eligible measures or results. For example, a hierarchy of preferred measures might be created, or plans articulated to select the result with the median effect size, or to average across all eligible results for a particular outcome domain (see also Chapter 9, Section 9.3.3 ). Any additional decisions or changes to this framework made once the data are collected should be reported in the review as changes to the protocol.

Section 5.6 describes the numbers that will be required to perform meta-analysis, if appropriate. The unit of analysis (e.g. participant, cluster, body part, treatment period) should be recorded for each result when it is not obvious (see Chapter 6, Section 6.2 ). The type of outcome data determines the nature of the numbers that will be sought for each outcome. For example, for a dichotomous (‘yes’ or ‘no’) outcome, the number of participants and the number who experienced the outcome will be sought for each group. It is important to collect the sample size relevant to each result, although this is not always obvious. A flow diagram as recommended in the CONSORT Statement (Moher et al 2001) can help to determine the flow of participants through a study. If one is not available in a published report, review authors can consider drawing one (available from www.consort-statement.org ).

The numbers required for meta-analysis are not always available. Often, other statistics can be collected and converted into the required format. For example, for a continuous outcome, it is usually most convenient to seek the number of participants, the mean and the standard deviation for each intervention group. These are often not available directly, especially the standard deviation. Alternative statistics enable calculation or estimation of the missing standard deviation (such as a standard error, a confidence interval, a test statistic (e.g. from a t-test or F-test) or a P value). These should be extracted if they provide potentially useful information (see MECIR Box 5.3.c ). Details of recalculation are provided in Section 5.6 . Further considerations for dealing with missing data are discussed in Chapter 10, Section 10.12 .
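
The conversions described in Section 5.6 can also be scripted, which keeps a permanent record of the calculation. The sketch below is a minimal illustration (not Cochrane-supplied code) of recovering a standard deviation from a standard error of the mean or from a 95% confidence interval, using the usual large-sample relationships SD = SE × √n and SE = (upper − lower)/(2 × 1.96).

```python
import math

def sd_from_se(se: float, n: int) -> float:
    """Standard deviation from a standard error of the mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """Standard deviation from a 95% confidence interval for a mean
    (large-sample approximation; use a t-value instead of 1.96 for small n)."""
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)

# Example: a group of 40 participants with mean 10.5 (95% CI 9.1 to 11.9).
print(round(sd_from_ci(9.1, 11.9, n=40), 2))   # ~4.52
```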

MECIR Box 5.3.c Relevant expectations for conduct of intervention reviews

5.3.7 Other information to collect

We recommend that review authors collect the key conclusions of the included study as reported by its authors. It is not necessary to report these conclusions in the review, but they should be used to verify the results of analyses undertaken by the review authors, particularly in relation to the direction of effect. Further comments by the study authors, for example any explanations they provide for unexpected findings, may be noted. References to other studies that are cited in the study report may be useful, although review authors should be aware of the possibility of citation bias (see Chapter 7, Section 7.2.3.2 ). Documentation of any correspondence with the study authors is important for review transparency.

5.4 Data collection tools

5.4.1 Rationale for data collection forms

Data collection for systematic reviews should be performed using structured data collection forms (see MECIR Box 5.4.a ). These can be paper forms, electronic forms (e.g. Google Form), or commercially or custom-built data systems (e.g. Covidence, EPPI-Reviewer, Systematic Review Data Repository (SRDR)) that allow online form building, data entry by several users, data sharing, and efficient data management (Li et al 2015). All different means of data collection require data collection forms.

MECIR Box 5.4.a Relevant expectations for conduct of intervention reviews

The data collection form is a bridge between what is reported by the original investigators (e.g. in journal articles, abstracts, personal correspondence) and what is ultimately reported by the review authors. The data collection form serves several important functions (Meade and Richardson 1997). First, the form is linked directly to the review question and criteria for assessing eligibility of studies, and provides a clear summary of these that can be used to identify and structure the data to be extracted from study reports. Second, the data collection form is the historical record of the provenance of the data used in the review, as well as the multitude of decisions (and changes to decisions) that occur throughout the review process. Third, the form is the source of data for inclusion in an analysis.

Given the important functions of data collection forms, ample time and thought should be invested in their design. Because each review is different, data collection forms will vary across reviews. However, there are many similarities in the types of information that are important. Thus, forms can be adapted from one review to the next. Although we use the term ‘data collection form’ in the singular, in practice it may be a series of forms used for different purposes: for example, a separate form could be used to assess the eligibility of studies for inclusion in the review to assist in the quick identification of studies to be excluded from or included in the review.

5.4.2 Considerations in selecting data collection tools

The choice of data collection tool is largely dependent on review authors’ preferences, the size of the review, and resources available to the author team. Potential advantages and considerations of selecting one data collection tool over another are outlined in Table 5.4.a (Li et al 2015). A significant advantage that data systems have is in data management ( Chapter 1, Section 1.6 ) and re-use. They make review updates more efficient, and also facilitate methodological research across reviews. Numerous ‘meta-epidemiological’ studies have been carried out using Cochrane Review data, resulting in methodological advances which would not have been possible if thousands of studies had not all been described using the same data structures in the same system.

Some data collection tools, such as Covidence and CSV (Excel) files, support automatic import of extracted data into RevMan (Cochrane’s authoring tool). Details are available at https://documentation.cochrane.org/revman-kb/populate-study-data-260702462.html

Table 5.4.a Considerations in selecting data collection tools

5.4.3 Design of a data collection form

Regardless of whether data are collected using a paper or electronic form, or a data system, the key to successful data collection is to construct easy-to-use forms and collect sufficient and unambiguous data that faithfully represent the source in a structured and organized manner (Li et al 2015). In most cases, a document format should be developed for the form before building an electronic form or a data system. This document can be distributed to others, including programmers and data analysts, and can serve as a guide for creating an electronic form and any guidance or codebook to be used by data extractors. Review authors also should consider compatibility of any electronic form or data system with analytical software, as well as mechanisms for recording, assessing and correcting data entry errors.

Data described in multiple reports (or even within a single report) of a study may not be consistent. Review authors will need to describe how they work with multiple reports in the protocol, for example, by pre-specifying which report will be used when sources contain conflicting data that cannot be resolved by contacting the investigators. Likewise, when there is only one report identified for a study, review authors should specify the section within the report (e.g. abstract, methods, results, tables, and figures) for use in case of inconsistent information.

If review authors wish to import their extracted data into RevMan automatically, it is advisable that their data collection forms match the data extraction templates available via the RevMan Knowledge Base. Details are available at https://documentation.cochrane.org/revman-kb/data-extraction-templates-260702375.html.

A good data collection form should minimize the need to go back to the source documents. When designing a data collection form, review authors should involve all members of the team, that is, content area experts, authors with experience in systematic review methods and data collection form design, statisticians, and persons who will perform data extraction. Here are suggested steps and some tips for designing a data collection form, based on the informal collation of experiences from numerous review authors (Li et al 2015).

Step 1. Develop outlines of tables and figures expected to appear in the systematic review, considering the comparisons to be made between different interventions within the review, and the various outcomes to be measured. This step will help review authors decide the right amount of data to collect (not too much or too little). Collecting too much information can lead to forms that are longer than original study reports, and can be very wasteful of time. Collection of too little information, or omission of key data, can lead to the need to return to study reports later in the review process.

Step 2. Assemble and group data elements to facilitate form development. Review authors should consult Table 5.3.a , in which the data elements are grouped to facilitate form development and data collection. Note that it may be more efficient to group data elements in the order in which they are usually found in study reports (e.g. starting with reference information, followed by eligibility criteria, intervention description, statistical methods, baseline characteristics and results).

Step 3. Identify the optimal way of framing the data items. Much has been written about how to frame data items for developing robust data collection forms in primary research studies. We summarize a few key points and highlight issues that are pertinent to systematic reviews.

  • Ask closed-ended questions (i.e. questions that define a list of permissible responses) as much as possible. Closed-ended questions do not require post hoc coding and provide better control over data quality than open-ended questions. When setting up a closed-ended question, one must anticipate and structure possible responses and include an ‘other, specify’ category because the anticipated list may not be exhaustive. Avoid asking data extractors to summarize data into uncoded text, no matter how short it is.
  • Avoid asking a question in a way that the response may be left blank. Include ‘not applicable’, ‘not reported’ and ‘cannot tell’ options as needed. The ‘cannot tell’ option tags uncertain items that may promote review authors to contact study authors for clarification, especially on data items critical to reach conclusions.
  • Remember that the form will focus on what is reported in the article rather than what has been done in the study. The study report may not fully reflect how the study was actually conducted. For example, a question ‘Did the article report that the participants were masked to the intervention?’ is more appropriate than ‘Were participants masked to the intervention?’
  • Where a judgement is required, record the raw data (i.e. quote directly from the source document) used to make the judgement. It is also important to record the source of information collected, including where it was found in a report or whether information was obtained from unpublished sources or personal communications. As much as possible, questions should be asked in a way that minimizes subjective interpretation and judgement to facilitate data comparison and adjudication.
  • Incorporate flexibility to allow for variation in how data are reported. It is strongly recommended that outcome data be collected in the format in which they were reported and transformed in a subsequent step if required. Review authors also should consider the software they will use for analysis and for publishing the review (e.g. RevMan).

Step 4. Develop and pilot-test data collection forms, ensuring that they provide data in the right format and structure for subsequent analysis. In addition to data items described in Step 2, data collection forms should record the title of the review as well as the person who is completing the form and the date of completion. Forms occasionally need revision; forms should therefore include the version number and version date to reduce the chances of using an outdated form by mistake. Because a study may be associated with multiple reports, it is important to record the study ID as well as the report ID. Definitions and instructions helpful for answering a question should appear next to the question to improve quality and consistency across data extractors (Stock 1994). Provide space for notes, regardless of whether paper or electronic forms are used.
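
To make Steps 3 and 4 concrete, the sketch below shows one possible way of encoding a closed-ended item, the form metadata (version, study ID, report ID, extractor), and simple response validation. The item wording and permitted responses are hypothetical and for illustration only; they should be replaced by each review's own codebook.

```python
# Illustrative sketch of a structured data collection form
# (hypothetical items and response options; adapt to your own protocol).

FORM_METADATA = {
    "review_title": "Example review",
    "form_version": "1.2",
    "version_date": "2024-01-15",
}

# Closed-ended items with pre-specified permitted responses,
# including 'other (specify)', 'not reported' and 'cannot tell'.
ITEMS = {
    "allocation_concealment_reported": {
        "question": "Did the article report how allocation was concealed?",
        "responses": ["central randomization", "sealed opaque envelopes",
                      "other (specify)", "not reported", "cannot tell"],
    },
}

def record_response(study_id, report_id, extractor, item, response, source=""):
    """Validate a response against the codebook and return one extraction record."""
    permitted = ITEMS[item]["responses"]
    if response not in permitted:
        raise ValueError(f"'{response}' is not a permitted response for {item}")
    return {**FORM_METADATA, "study_id": study_id, "report_id": report_id,
            "extractor": extractor, "item": item, "response": response,
            "source_quote": source}   # record the raw quote / location in the report

print(record_response("STUDY001", "REPORT001a", "LS",
                      "allocation_concealment_reported",
                      "sealed opaque envelopes",
                      source="Methods, p. 3: 'opaque envelopes were used'"))
```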

All data collection forms and data systems should be thoroughly pilot-tested before launch (see MECIR Box 5.4.a ). Testing should involve several people extracting data from at least a few articles. The initial testing focuses on the clarity and completeness of questions. Users of the form may provide feedback that certain coding instructions are confusing or incomplete (e.g. a list of options may not cover all situations). The testing may identify data that are missing from the form, or likely to be superfluous. After initial testing, accuracy of the extracted data should be checked against the source document or verified data to identify problematic areas. It is wise to draft entries for the table of ‘Characteristics of included studies’ and complete a risk of bias assessment ( Chapter 8 ) using these pilot reports to ensure all necessary information is collected. A consensus between review authors may be required before the form is modified to avoid any misunderstandings or later disagreements. It may be necessary to repeat the pilot testing on a new set of reports if major changes are needed after the first pilot test.

Problems with the data collection form may surface after pilot testing has been completed, and the form may need to be revised after data extraction has started. When changes are made to the form or coding instructions, it may be necessary to return to reports that have already undergone data extraction. In some situations, it may be necessary to clarify only coding instructions without modifying the actual data collection form.

5.5 Extracting data from reports

5.5.1 Introduction

In most systematic reviews, the primary source of information about each study is published reports of studies, usually in the form of journal articles. Despite recent developments in machine learning models to automate data extraction in systematic reviews (see Section 5.5.9 ), data extraction is still largely a manual process. Electronic searches for text can provide a useful aid to locating information within a report. Examples include using search facilities in PDF viewers, internet browsers and word processing software. However, text searching should not be considered a replacement for reading the report, since information may be presented using variable terminology and presented in multiple formats.

5.5.2 Who should extract data?

Data extractors should have at least a basic understanding of the topic, and have knowledge of study design, data analysis and statistics. They should pay attention to detail while following instructions on the forms. Because errors that occur at the data extraction stage are rarely detected by peer reviewers, editors, or users of systematic reviews, it is recommended that more than one person extract data from every report to minimize errors and reduce introduction of potential biases by review authors (see MECIR Box 5.5.a ). As a minimum, information that involves subjective interpretation and information that is critical to the interpretation of results (e.g. outcome data) should be extracted independently by at least two people (see MECIR Box 5.5.a ). In common with implementation of the selection process ( Chapter 4, Section 4.6 ), it is preferable that data extractors are from complementary disciplines, for example a methodologist and a topic area specialist. It is important that everyone involved in data extraction has practice using the form and, if the form was designed by someone else, receives appropriate training.

Evidence in support of duplicate data extraction comes from several indirect sources. One study observed that independent data extraction by two authors resulted in fewer errors than data extraction by a single author followed by verification by a second (Buscemi et al 2006). A high prevalence of data extraction errors (errors in 20 out of 34 reviews) has been observed (Jones et al 2005). A further study of data extraction to compute standardized mean differences found that a minimum of seven out of 27 reviews had substantial errors (Gøtzsche et al 2007).

MECIR Box 5.5.a Relevant expectations for conduct of intervention reviews

5.5.3 Training data extractors

Training of data extractors is intended to familiarize them with the review topic and methods, the data collection form or data system, and issues that may arise during data extraction. Results of the pilot testing of the form should prompt discussion among review authors and extractors of ambiguous questions or responses to establish consistency. Training should take place at the onset of the data extraction process and periodically over the course of the project (Li et al 2015). For example, when data related to a single item on the form are present in multiple locations within a report (e.g. abstract, main body of text, tables, and figures) or in several sources (e.g. publications, ClinicalTrials.gov, or CSRs), the development and documentation of instructions to follow an agreed algorithm are critical and should be reinforced during the training sessions.

Some have proposed that some information in a report, such as its authors, be blinded to the review author prior to data extraction and assessment of risk of bias (Jadad et al 1996). However, blinding of review authors to aspects of study reports generally is not recommended for Cochrane Reviews as there is little evidence that it alters the decisions made (Berlin 1997).

5.5.4 Extracting data from multiple reports of the same study

Studies frequently are reported in more than one publication or in more than one source (Tramèr et al 1997, von Elm et al 2004). A single source rarely provides complete information about a study; on the other hand, multiple sources may contain conflicting information about the same study (Mayo-Wilson et al 2017a, Mayo-Wilson et al 2017b, Mayo-Wilson et al 2018). Because the unit of interest in a systematic review is the study and not the report, information from multiple reports often needs to be collated and reconciled. It is not appropriate to discard any report of an included study without careful examination, since it may contain valuable information not included in the primary report. Review authors will need to decide between two strategies:

  • Extract data from each report separately, then combine information across multiple data collection forms.
  • Extract data from all reports directly into a single data collection form.

The choice of which strategy to use will depend on the nature of the reports and may vary across studies and across reports. For example, when a full journal article and multiple conference abstracts are available, it is likely that the majority of information will be obtained from the journal article; completing a new data collection form for each conference abstract may be a waste of time. Conversely, when there are two or more detailed journal articles, perhaps relating to different periods of follow-up, then it is likely to be easier to perform data extraction separately for these articles and collate information from the data collection forms afterwards. When data from all reports are extracted into a single data collection form, review authors should identify the ‘main’ data source for each study when sources include conflicting data and these differences cannot be resolved by contacting authors (Mayo-Wilson et al 2018). Flow diagrams such as those modified from the PRISMA statement can be particularly helpful when collating and documenting information from multiple reports (Mayo-Wilson et al 2018).

5.5.5 Reliability and reaching consensus

When more than one author extracts data from the same reports, there is potential for disagreement. After data have been extracted independently by two or more extractors, responses must be compared to assure agreement or to identify discrepancies. An explicit procedure or decision rule should be specified in the protocol for identifying and resolving disagreements. Most often, the source of the disagreement is an error by one of the extractors and is easily resolved. Thus, discussion among the authors is a sensible first step. More rarely, a disagreement may require arbitration by another person. Any disagreement that cannot be resolved should be addressed by contacting the study authors; if this is unsuccessful, the disagreement should be reported in the review.

The presence and resolution of disagreements should be carefully recorded. Maintaining a copy of the data ‘as extracted’ (in addition to the consensus data) allows assessment of reliability of coding. Examples of ways in which this can be achieved include the following:

  • Use one author’s (paper) data collection form and record changes after consensus in a different ink colour.
  • Enter consensus data onto an electronic form.
  • Record original data extracted and consensus data in separate forms (some online tools do this automatically).

Agreement of coded items before reaching consensus can be quantified, for example using kappa statistics (Orwin 1994), although this is not routinely done in Cochrane Reviews. If agreement is assessed, this should be done only for the most important data (e.g. key risk of bias assessments, or availability of key outcomes).
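
Where agreement is quantified, an unweighted Cohen’s kappa for two extractors can be computed in a few lines, as in the minimal sketch below (standard library only; a statistics package would normally be used for weighted kappa or more than two raters).

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters coding the same items."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Example: two extractors coding a key item for ten studies.
a = ["low", "low", "high", "unclear", "low", "high", "low", "low", "unclear", "high"]
b = ["low", "high", "high", "unclear", "low", "high", "low", "low", "low", "high"]
print(round(cohens_kappa(a, b), 2))   # ~0.67
```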

Throughout the review process informal consideration should be given to the reliability of data extraction. For example, if after reaching consensus on the first few studies, the authors note a frequent disagreement for specific data, then coding instructions may need modification. Furthermore, an author’s coding strategy may change over time, as the coding rules are forgotten, indicating a need for retraining and, possibly, some recoding.

5.5.6 Extracting data from clinical study reports

Clinical study reports (CSRs) obtained for a systematic review are likely to be in PDF format. Although CSRs can be thousands of pages in length and very time-consuming to review, they typically follow the content and format required by the International Conference on Harmonisation (ICH 1995). Information in CSRs is usually presented in a structured and logical way. For example, numerical data pertaining to important demographic, efficacy, and safety variables are placed within the main text in tables and figures. Because of the clarity and completeness of information provided in CSRs, data extraction from CSRs may be clearer and conducted more confidently than from journal articles or other short reports.

To extract data from CSRs efficiently, review authors should familiarize themselves with the structure of the CSRs. In practice, review authors may want to browse or create ‘bookmarks’ within a PDF document that record section headers and subheaders and search key words related to the data extraction (e.g. randomization). In addition, it may be useful to utilize optical character recognition software to convert tables of data in the PDF to an analysable format when additional analyses are required, saving time and minimizing transcription errors.
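
As one possible illustration of this workflow, the snippet below assumes the third-party Python packages pdf2image and pytesseract (and a local Tesseract OCR installation) are available; the file name is hypothetical, and OCR output of tables always needs manual checking and restructuring against the source document.

```python
# One possible approach (assumes pdf2image, pytesseract and a local
# Tesseract OCR installation are available).
from pdf2image import convert_from_path
import pytesseract

# Render the CSR page containing the table as an image, then run OCR.
pages = convert_from_path("csr_extract.pdf", dpi=300)   # hypothetical file name
text = pytesseract.image_to_string(pages[0])
print(text)   # raw text only; table structure must still be checked by hand
```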

CSRs may contain many outcomes and present many results for a single outcome (due to different analyses) (Mayo-Wilson et al 2017b). We recommend review authors extract results only for outcomes of interest to the review (Section 5.3.6 ). With regard to different methods of analysis, review authors should have a plan and pre-specify preferred metrics in their protocol for extracting results pertaining to different populations (e.g. ‘all randomized’, ‘all participants taking at least one dose of medication’), methods for handling missing data (e.g. ‘complete case analysis’, ‘multiple imputation’), and adjustment (e.g. unadjusted, adjusted for baseline covariates). It may be important to record the range of analysis options available, even if not all are extracted in detail. In some cases it may be preferable to use metrics that are comparable across multiple included studies, which may not be clear until data collection for all studies is complete.

CSRs are particularly useful for identifying outcomes assessed but not presented to the public. For efficacy outcomes and systematically collected adverse events, review authors can compare what is described in the CSRs with what is reported in published reports to assess the risk of bias due to missing outcome data ( Chapter 8, Section 8.5 ) and in selection of reported result ( Chapter 8, Section 8.7 ). Note that non-systematically collected adverse events are not amenable to such comparisons because these adverse events may not be known ahead of time and thus not pre-specified in the protocol.

5.5.7 Extracting data from regulatory reviews

Data most relevant to systematic reviews can be found in the medical and statistical review sections of a regulatory review. Both of these are substantially longer than journal articles (Turner 2013). A list of all trials on a drug usually can be found in the medical review. Because trials are referenced by a combination of numbers and letters, it may be difficult for the review authors to link the trial with other reports of the same trial (Section 5.2.1 ).

Many of the documents downloaded from the US Food and Drug Administration’s website for older drugs are scanned copies and are not searchable because of redaction of confidential information (Turner 2013). Optical character recognition software can convert most of the text. Reviews for newer drugs have been redacted electronically; documents remain searchable as a result.

Compared to CSRs, regulatory reviews contain less information about trial design, execution, and results. They provide limited information for assessing the risk of bias. In terms of extracting outcomes and results, review authors should follow the guidance provided for CSRs (Section 5.5.6 ).

5.5.8 Extracting data from figures with software

Sometimes numerical data needed for systematic reviews are only presented in figures. Review authors may request the data from the study investigators, or alternatively, extract the data from the figures either manually (e.g. with a ruler) or by using software. Numerous tools are available, many of which are free. Those available at the time of writing include tools called Plot Digitizer, WebPlotDigitizer, Engauge, Dexter, ycasd, GetData Graph Digitizer. The software works by taking an image of a figure and then digitizing the data points off the figure using the axes and scales set by the users. The numbers exported can be used for systematic reviews, although additional calculations may be needed to obtain the summary statistics, such as calculation of means and standard deviations from individual-level data points (or conversion of time-to-event data presented on Kaplan-Meier plots to hazard ratios; see Chapter 6, Section 6.8.2 ).
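
The underlying calculation in these tools is a linear mapping from pixel coordinates to data coordinates, calibrated from two known points on each axis. The sketch below uses invented calibration values purely for illustration; log-scaled axes would require transforming values before and after the interpolation.

```python
def calibrate(p0, p1, v0, v1):
    """Return a function mapping a pixel coordinate to a data value,
    given two calibration points (pixel p0 -> value v0, pixel p1 -> value v1)."""
    def to_value(p):
        return v0 + (p - p0) * (v1 - v0) / (p1 - p0)
    return to_value

# Hypothetical calibration: x-axis pixels 100->0 weeks, 700->24 weeks;
# y-axis pixels 550->0 mmHg change, 50->-20 mmHg change (pixel y grows downwards).
x_of = calibrate(100, 700, 0, 24)
y_of = calibrate(550, 50, 0, -20)

# Pixel coordinates of data points identified on the figure.
points_px = [(250, 420), (400, 330), (650, 180)]
points_data = [(round(x_of(x), 1), round(y_of(y), 1)) for x, y in points_px]
print(points_data)   # [(6.0, -5.2), (12.0, -8.8), (22.0, -14.8)]
```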

It has been demonstrated that software is more convenient and accurate than visual estimation or use of a ruler (Gross et al 2014, Jelicic Kadic et al 2016). Review authors should consider using software for extracting numerical data from figures when the data are not available elsewhere.

5.5.9 Automating data extraction in systematic reviews

Because data extraction is time-consuming and error-prone, automating or semi-automating this step may make the extraction process more efficient and accurate. The state of science relevant to automating data extraction is summarized here (Jonnalagadda et al 2015).

  • At least 26 studies have tested various natural language processing and machine learning approaches for facilitating data extraction for systematic reviews.

  • Each tool focuses on only a limited number of data elements (ranging from one to seven). Most of the existing tools focus on PICO information (e.g. number of participants, their age, sex, country, recruiting centres, intervention groups, outcomes, and time points). A few are able to extract study design and results (e.g. objectives, study duration, participant flow), and two extract risk of bias information (Marshall et al 2016, Millard et al 2016). To date, well over half of the data elements needed for systematic reviews have not been explored for automated extraction.

  • Most tools highlight the sentence(s) that may contain the data elements as opposed to directly recording these data elements into a data collection form or a data system.
  • There is no gold standard or common dataset to evaluate the performance of these tools, limiting our ability to interpret the significance of the reported accuracy measures.

At the time of writing, we cannot recommend a specific tool for automating data extraction for routine systematic review production. There is a need for review authors to work with experts in informatics to refine these tools and evaluate them rigorously. Such investigations should address how the tool will fit into existing workflows. For example, the automated or semi-automated data extraction approaches may first act as checks for manual data extraction before they can replace it.
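
As a toy illustration of the ‘highlighting’ behaviour described above, and not a re-implementation of any of the tools cited, a simple rule-based pattern can flag sentences that may report the number randomized; real systems rely on trained classifiers rather than hand-written patterns.

```python
import re

# Toy example only: flag sentences that may report the number randomized.
PATTERN = re.compile(
    r"\b(\d{2,5})\s+(patients|participants|subjects)\b.*?\brandomi[sz]ed\b",
    re.IGNORECASE)

abstract = ("Background: hypertension is common. Methods: 240 participants were "
            "randomised to drug A or placebo. Results: blood pressure fell in both groups.")

for sentence in re.split(r"(?<=[.!?])\s+", abstract):
    match = PATTERN.search(sentence)
    if match:
        print("Candidate sentence:", sentence)
        print("Possible sample size:", match.group(1))
```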

5.5.10 Suspicions of scientific misconduct

Systematic review authors can uncover suspected misconduct in the published literature. Misconduct includes fabrication or falsification of data or results, plagiarism, and research that does not adhere to ethical norms. Review authors need to be aware of scientific misconduct because the inclusion of fraudulent material could undermine the reliability of a review’s findings. Plagiarism of results data in the form of duplicated publication (either by the same or by different authors) may, if undetected, lead to study participants being double counted in a synthesis.

It is preferable to identify potential problems before, rather than after, publication of the systematic review, so that readers are not misled. However, empirical evidence indicates that the extent to which systematic review authors explore misconduct varies widely (Elia et al 2016). Text-matching software and systems such as CrossCheck may be helpful for detecting plagiarism, but they can detect only matching text, so data tables or figures need to be inspected by hand or using other systems (e.g. to detect image manipulation). Lists of data such as in a meta-analysis can be a useful means of detecting duplicated studies. Furthermore, examination of baseline data can lead to suspicions of misconduct for an individual randomized trial (Carlisle et al 2015). For example, Al-Marzouki and colleagues concluded that a trial report was fabricated or falsified on the basis of highly unlikely baseline differences between two randomized groups (Al-Marzouki et al 2005).
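
A greatly simplified illustration of this kind of baseline check, inspired by but much cruder than the methods of Carlisle et al (2015), is to compute p-values for baseline comparisons across many reported variables and inspect whether their distribution is plausibly uniform. The summary statistics below are invented, and the sketch assumes the SciPy package is available.

```python
# Crude illustration only (not the method of Carlisle et al 2015):
# compare reported baseline summary statistics between randomized groups
# and inspect the distribution of the resulting p-values.
from scipy import stats

# Hypothetical baseline variables: (mean, sd, n) per group.
baseline = {
    "age":         ((54.2, 9.8, 120),  (54.5, 10.1, 118)),
    "weight_kg":   ((81.3, 14.2, 120), (80.9, 13.8, 118)),
    "systolic_bp": ((148.6, 12.4, 120), (148.1, 12.9, 118)),
}

p_values = []
for name, ((m1, s1, n1), (m2, s2, n2)) in baseline.items():
    t, p = stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2)
    p_values.append(p)
    print(f"{name}: p = {p:.2f}")

# With many variables, p-values from genuine randomization should look
# roughly uniform on (0, 1); a formal check could use a Kolmogorov-Smirnov test.
print(stats.kstest(p_values, "uniform"))
```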

Cochrane Review authors are advised to consult with Cochrane editors if cases of suspected misconduct are identified. Searching for comments, letters or retractions may uncover additional information. Sensitivity analyses can be used to determine whether the studies arousing suspicion are influential in the conclusions of the review. Guidance for editors for addressing suspected misconduct will be available from Cochrane’s Editorial Publishing and Policy Resource (see community.cochrane.org ). Further information is available from the Committee on Publication Ethics (COPE; publicationethics.org ), including a series of flowcharts on how to proceed if various types of misconduct are suspected. Cases should be followed up, typically including an approach to the editors of the journals in which suspect reports were published. It may be useful to write first to the primary investigators to request clarification of apparent inconsistencies or unusual observations.

Because investigations may take time, and institutions may not always be responsive (Wager 2011), articles suspected of being fraudulent should be classified as ‘awaiting assessment’. If a misconduct investigation indicates that the publication is unreliable, or if a publication is retracted, it should not be included in the systematic review, and the reason should be noted in the ‘excluded studies’ section.

5.5.11 Key points in planning and reporting data extraction

In summary, the methods section of both the protocol and the review should detail:

  • the data categories that are to be extracted;
  • how extracted data from each report will be verified (e.g. extraction by two review authors, independently);
  • whether data extraction is undertaken by content area experts, methodologists, or both;
  • pilot testing, training and existence of coding instructions for the data collection form;
  • how data are extracted from multiple reports from the same study; and
  • how disagreements are handled when more than one author extracts data from each report.

5.6 Extracting study results and converting to the desired format

In most cases, it is desirable to collect summary data separately for each intervention group of interest and to enter these into software in which effect estimates can be calculated, such as RevMan. Sometimes the required data may be obtained only indirectly, and the relevant results may not be obvious. Chapter 6 provides many useful tips and techniques to deal with common situations. When summary data cannot be obtained from each intervention group, or where it is important to use results of adjusted analyses (for example to account for correlations in crossover or cluster-randomized trials), effect estimates may be available directly.
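
For example, arm-level counts for a dichotomous outcome can be converted into an effect estimate and confidence interval before entry into analysis software. The sketch below uses invented counts and the standard large-sample log-scale method for a risk ratio; it is illustrative only and not a substitute for the methods in Chapter 6.

```python
import math

def risk_ratio(events_1, n_1, events_2, n_2, z=1.96):
    """Risk ratio and 95% CI from arm-level counts (large-sample log-scale method)."""
    rr = (events_1 / n_1) / (events_2 / n_2)
    se_log_rr = math.sqrt(1/events_1 - 1/n_1 + 1/events_2 - 1/n_2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Example: 12/100 events in the intervention group vs 20/98 in the control group.
rr, lo, hi = risk_ratio(12, 100, 20, 98)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```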

5.7 Managing and sharing data

When data have been collected for each individual study, it is helpful to organize them into a comprehensive electronic format, such as a database or spreadsheet, before entering data into a meta-analysis or other synthesis. When data are collated electronically, all or a subset of them can easily be exported for cleaning, consistency checks and analysis.

Tabulation of collected information about studies can facilitate classification of studies into appropriate comparisons and subgroups. It also allows identification of comparable outcome measures and statistics across studies. It will often be necessary to perform calculations to obtain the required statistics for presentation or synthesis. It is important through this process to retain clear information on the provenance of the data, with a clear distinction between data from a source document and data obtained through calculations. Statistical conversions, for example from standard errors to standard deviations, ideally should be undertaken with a computer rather than using a hand calculator to maintain a permanent record of the original and calculated numbers as well as the actual calculations used.

Ideally, data only need to be extracted once and should be stored in a secure and stable location for future updates of the review, regardless of whether the original review authors or a different group of authors update the review (Ip et al 2012). Standardizing and sharing data collection tools as well as data management systems among review authors working in similar topic areas can streamline systematic review production. Review authors have the opportunity to work with trialists, journal editors, funders, regulators, and other stakeholders to make study data (e.g. CSRs, IPD, and any other form of study data) publicly available, increasing the transparency of research. When legal and ethical to do so, we encourage review authors to share the data used in their systematic reviews to reduce waste and to allow verification and reanalysis because data will not have to be extracted again for future use (Mayo-Wilson et al 2018).

5.8 Chapter information

Editors: Tianjing Li, Julian PT Higgins, Jonathan J Deeks

Acknowledgements: This chapter builds on earlier versions of the Handbook . For details of previous authors and editors of the Handbook , see Preface. Andrew Herxheimer, Nicki Jackson, Yoon Loke, Deirdre Price and Helen Thomas contributed text. Stephanie Taylor and Sonja Hood contributed suggestions for designing data collection forms. We are grateful to Judith Anzures, Mike Clarke, Miranda Cumpston and Peter Gøtzsche for helpful comments.

Funding: JPTH is a member of the National Institute for Health Research (NIHR) Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JJD received support from the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

5.9 References

Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 2005; 331 : 267-270.

Allen EN, Mushi AK, Massawe IS, Vestergaard LS, Lemnge M, Staedke SG, Mehta U, Barnes KI, Chandler CI. How experiences become data: the process of eliciting adverse event, medical history and concomitant medication reports in antimalarial and antiretroviral interaction trials. BMC Medical Research Methodology 2013; 13 : 140.

Baudard M, Yavchitz A, Ravaud P, Perrodeau E, Boutron I. Impact of searching clinical trial registries in systematic reviews of pharmaceutical treatments: methodological systematic review and reanalysis of meta-analyses. BMJ 2017; 356 : j448.

Bent S, Padula A, Avins AL. Better ways to question patients about adverse medical events: a randomized, controlled trial. Annals of Internal Medicine 2006; 144 : 257-261.

Berlin JA. Does blinding of readers affect the results of meta-analyses? University of Pennsylvania Meta-analysis Blinding Study Group. Lancet 1997; 350 : 185-186.

Buscemi N, Hartling L, Vandermeer B, Tjosvold L, Klassen TP. Single data extraction generated more errors than double data extraction in systematic reviews. Journal of Clinical Epidemiology 2006; 59 : 697-703.

Carlisle JB, Dexter F, Pandit JJ, Shafer SL, Yentis SM. Calculating the probability of random sampling for continuous variables in submitted or published randomised controlled trials. Anaesthesia 2015; 70 : 848-858.

Carroll C, Patterson M, Wood S, Booth A, Rick J, Balain S. A conceptual framework for implementation fidelity. Implementation Science 2007; 2 : 40.

Carvajal A, Ortega PG, Sainz M, Velasco V, Salado I, Arias LHM, Eiros JM, Rubio AP, Castrodeza J. Adverse events associated with pandemic influenza vaccines: Comparison of the results of a follow-up study with those coming from spontaneous reporting. Vaccine 2011; 29 : 519-522.

Chamberlain C, O'Mara-Eves A, Porter J, Coleman T, Perlen SM, Thomas J, McKenzie JE. Psychosocial interventions for supporting women to stop smoking in pregnancy. Cochrane Database of Systematic Reviews 2017; 2 : CD001055.

Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implementation Science 2009; 4 : 50.

Davis AL, Miller JD. The European Medicines Agency and publication of clinical study reports: a challenge for the US FDA. JAMA 2017; 317 : 905-906.

Denniston AK, Holland GN, Kidess A, Nussenblatt RB, Okada AA, Rosenbaum JT, Dick AD. Heterogeneity of primary outcome measures used in clinical trials of treatments for intermediate, posterior, and panuveitis. Orphanet Journal of Rare Diseases 2015; 10 : 97.

Derry S, Loke YK. Risk of gastrointestinal haemorrhage with long term use of aspirin: meta-analysis. BMJ 2000; 321 : 1183-1187.

Doshi P, Dickersin K, Healy D, Vedula SS, Jefferson T. Restoring invisible and abandoned trials: a call for people to publish the findings. BMJ 2013; 346 : f2865.

Dusenbury L, Brannigan R, Falco M, Hansen WB. A review of research on fidelity of implementation: implications for drug abuse prevention in school settings. Health Education Research 2003; 18 : 237-256.

Dwan K, Altman DG, Clarke M, Gamble C, Higgins JPT, Sterne JAC, Williamson PR, Kirkham JJ. Evidence for the selective reporting of analyses and discrepancies in clinical trials: a systematic review of cohort studies of clinical trials. PLoS Medicine 2014; 11 : e1001666.

Elia N, von Elm E, Chatagner A, Popping DM, Tramèr MR. How do authors of systematic reviews deal with research malpractice and misconduct in original studies? A cross-sectional analysis of systematic reviews and survey of their authors. BMJ Open 2016; 6 : e010442.

Gøtzsche PC. Multiple publication of reports of drug trials. European Journal of Clinical Pharmacology 1989; 36 : 429-432.

Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA 2007; 298 : 430-437.

Gross A, Schirm S, Scholz M. Ycasd - a tool for capturing and scaling data from graphical representations. BMC Bioinformatics 2014; 15 : 219.

Hoffmann TC, Glasziou PP, Boutron I, Milne R, Perera R, Moher D, Altman DG, Barbour V, Macdonald H, Johnston M, Lamb SE, Dixon-Woods M, McCulloch P, Wyatt JC, Chan AW, Michie S. Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ 2014; 348 : g1687.

ICH. ICH Harmonised Tripartite Guideline: Structure and content of clinical study reports E3. 1995. www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E3/E3_Guideline.pdf .

Ioannidis JPA, Mulrow CD, Goodman SN. Adverse events: The more you search, the more you find. Annals of Internal Medicine 2006; 144 : 298-300.

Ip S, Hadar N, Keefe S, Parkin C, Iovin R, Balk EM, Lau J. A web-based archive of systematic review data. Systematic Reviews 2012; 1 : 15.

Ismail R, Azuara-Blanco A, Ramsay CR. Variation of clinical outcomes used in glaucoma randomised controlled trials: a systematic review. British Journal of Ophthalmology 2014; 98 : 464-468.

Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJM, Gavaghan DJ, McQuay H. Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials 1996; 17 : 1-12.

Jelicic Kadic A, Vucic K, Dosenovic S, Sapunar D, Puljak L. Extracting data from figures with software was faster, with higher interrater reliability than manual extraction. Journal of Clinical Epidemiology 2016; 74 : 119-123.

Jones AP, Remmington T, Williamson PR, Ashby D, Smyth RL. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. Journal of Clinical Epidemiology 2005; 58 : 741-742.

Jones CW, Keil LG, Holland WC, Caughey MC, Platts-Mills TF. Comparison of registered and published outcomes in randomized controlled trials: a systematic review. BMC Medicine 2015; 13 : 282.

Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Systematic Reviews 2015; 4 : 78.

Lewin S, Hendry M, Chandler J, Oxman AD, Michie S, Shepperd S, Reeves BC, Tugwell P, Hannes K, Rehfuess EA, Welch V, McKenzie JE, Burford B, Petkovic J, Anderson LM, Harris J, Noyes J. Assessing the complexity of interventions within systematic reviews: development, content and use of a new tool (iCAT_SR). BMC Medical Research Methodology 2017; 17 : 76.

Li G, Abbade LPF, Nwosu I, Jin Y, Leenus A, Maaz M, Wang M, Bhatt M, Zielinski L, Sanger N, Bantoto B, Luo C, Shams I, Shahid H, Chang Y, Sun G, Mbuagbaw L, Samaan Z, Levine MAH, Adachi JD, Thabane L. A scoping review of comparisons between abstracts and full reports in primary biomedical research. BMC Medical Research Methodology 2017; 17 : 181.

Li TJ, Vedula SS, Hadar N, Parkin C, Lau J, Dickersin K. Innovations in data collection, management, and archiving for systematic reviews. Annals of Internal Medicine 2015; 162 : 287-294.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Medicine 2009; 6 : e1000100.

Liu ZM, Saldanha IJ, Margolis D, Dumville JC, Cullum NA. Outcomes in Cochrane systematic reviews related to wound care: an investigation into prespecification. Wound Repair and Regeneration 2017; 25 : 292-308.

Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. Journal of the American Medical Informatics Association 2016; 23 : 193-201.

Mayo-Wilson E, Doshi P, Dickersin K. Are manufacturers sharing data as promised? BMJ 2015; 351 : h4169.

Mayo-Wilson E, Li TJ, Fusco N, Bertizzolo L, Canner JK, Cowley T, Doshi P, Ehmsen J, Gresham G, Guo N, Haythomthwaite JA, Heyward J, Hong H, Pham D, Payne JL, Rosman L, Stuart EA, Suarez-Cuervo C, Tolbert E, Twose C, Vedula S, Dickersin K. Cherry-picking by trialists and meta-analysts can drive conclusions about intervention efficacy. Journal of Clinical Epidemiology 2017a; 91 : 95-110.

Mayo-Wilson E, Fusco N, Li TJ, Hong H, Canner JK, Dickersin K, MUDS Investigators. Multiple outcomes and analyses in clinical trials create challenges for interpretation and research synthesis. Journal of Clinical Epidemiology 2017b; 86 : 39-50.

Mayo-Wilson E, Li T, Fusco N, Dickersin K. Practical guidance for using multiple data sources in systematic reviews and meta-analyses (with examples from the MUDS study). Research Synthesis Methods 2018; 9 : 2-12.

Meade MO, Richardson WS. Selecting and appraising studies for a systematic review. Annals of Internal Medicine 1997; 127 : 531-537.

Meinert CL. Clinical trials dictionary: Terminology and usage recommendations . Hoboken (NJ): Wiley; 2012.

Millard LAC, Flach PA, Higgins JPT. Machine learning to assist risk-of-bias assessments in systematic reviews. International Journal of Epidemiology 2016; 45 : 266-277.

Moher D, Schulz KF, Altman DG. The CONSORT Statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 2001; 357 : 1191-1194.

Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 2010; 340 : c869.

Moore GF, Audrey S, Barker M, Bond L, Bonell C, Hardeman W, Moore L, O'Cathain A, Tinati T, Wight D, Baird J. Process evaluation of complex interventions: Medical Research Council guidance. BMJ 2015; 350 : h1258.

Orwin RG. Evaluating coding decisions. In: Cooper H, Hedges LV, editors. The Handbook of Research Synthesis . New York (NY): Russell Sage Foundation; 1994. p. 139-162.

Page MJ, McKenzie JE, Kirkham J, Dwan K, Kramer S, Green S, Forbes A. Bias due to selective inclusion and reporting of outcomes and analyses in systematic reviews of randomised trials of healthcare interventions. Cochrane Database of Systematic Reviews 2014; 10 : MR000035.

Ross JS, Mulvey GK, Hines EM, Nissen SE, Krumholz HM. Trial publication after registration in ClinicalTrials.Gov: a cross-sectional analysis. PLoS Medicine 2009; 6 .

Safer DJ. Design and reporting modifications in industry-sponsored comparative psychopharmacology trials. Journal of Nervous and Mental Disease 2002; 190 : 583-592.

Saldanha IJ, Dickersin K, Wang X, Li TJ. Outcomes in Cochrane systematic reviews addressing four common eye conditions: an evaluation of completeness and comparability. PloS One 2014; 9 : e109400.

Saldanha IJ, Li T, Yang C, Ugarte-Gil C, Rutherford GW, Dickersin K. Social network analysis identified central outcomes for core outcome sets using systematic reviews of HIV/AIDS. Journal of Clinical Epidemiology 2016; 70 : 164-175.

Saldanha IJ, Lindsley K, Do DV, Chuck RS, Meyerle C, Jones LS, Coleman AL, Jampel HD, Dickersin K, Virgili G. Comparison of clinical trial and systematic review outcomes for the 4 most prevalent eye diseases. JAMA Ophthalmology 2017a; 135 : 933-940.

Saldanha IJ, Li TJ, Yang C, Owczarzak J, Williamson PR, Dickersin K. Clinical trials and systematic reviews addressing similar interventions for the same condition do not consider similar outcomes to be important: a case study in HIV/AIDS. Journal of Clinical Epidemiology 2017b; 84 : 85-94.

Stewart LA, Clarke M, Rovers M, Riley RD, Simmonds M, Stewart G, Tierney JF, PRISMA-IPD Development Group. Preferred reporting items for a systematic review and meta-analysis of individual participant data: the PRISMA-IPD statement. JAMA 2015; 313 : 1657-1665.

Stock WA. Systematic coding for research synthesis. In: Cooper H, Hedges LV, editors. The Handbook of Research Synthesis . New York (NY): Russell Sage Foundation; 1994. p. 125-138.

Tramèr MR, Reynolds DJ, Moore RA, McQuay HJ. Impact of covert duplicate publication on meta-analysis: a case study. BMJ 1997; 315 : 635-640.

Turner EH. How to access and process FDA drug approval packages for use in research. BMJ 2013; 347 .

von Elm E, Poglia G, Walder B, Tramèr MR. Different patterns of duplicate publication: an analysis of articles used in systematic reviews. JAMA 2004; 291 : 974-980.

Wager E. Coping with scientific misconduct. BMJ 2011; 343 : d6586.

Wieland LS, Rutkow L, Vedula SS, Kaufmann CN, Rosman LM, Twose C, Mahendraratnam N, Dickersin K. Who has used internal company documents for biomedical and public health research and where did they find them? PloS One 2014; 9 .

Zanchetti A, Hansson L. Risk of major gastrointestinal bleeding with aspirin (Authors' reply). Lancet 1999; 353 : 149-150.

Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. The ClinicalTrials.gov results database: update and key issues. New England Journal of Medicine 2011; 364 : 852-860.

Zwarenstein M, Treweek S, Gagnier JJ, Altman DG, Tunis S, Haynes B, Oxman AD, Moher D. Improving the reporting of pragmatic trials: an extension of the CONSORT statement. BMJ 2008; 337 : a2390.



Data extraction methods for systematic review (semi)automation: A living review protocol

Lena Schmidt

1 Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK

Babatunde K. Olorisade

Luke A. McGuinness

James Thomas

2 UCL Social Research Institute, University College London, London, WC1H 0AL, UK

Julian P. T. Higgins

Associated data

  • Schmidt L, McGuinness LA, Olorisade BK, et al. Protocol. 2020. 10.17605/OSF.IO/ECB3T

Underlying data

No underlying data are associated with this article.

Extended data

Open Science Framework: Data Extraction Methods for Systematic Review (semi)Automation: A Living Review / Protocol. https://doi.org/10.17605/OSF.IO/ECB3T 11

This project contains the following extended data:

  • Additional_Fields.docx (overview of data fields of interest for text mining in clinical trials)
  • Search.docx (additional information about the searches, including full search strategies)

Reporting guidelines

Data are available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) data waiver.

Version Changes

Revised. Amendments from version 1

In the second version we incorporated changes based on the first two peer reviews. These changes included a clarification that the scope of data extraction we are interested in goes beyond PICOs; we also elaborated on the rationale for looking at data extraction specifically, clarified the objectives, and added some more recent related literature. We also clarified the proposed schedule for updating the review in living mode, and we have added a new author, James Thomas.


Background: Researchers in evidence-based medicine cannot keep up with the volume of both old and newly published primary research articles. Support for the early stages of the systematic review process – searching and screening studies for eligibility – is necessary because it is currently impossible to search for relevant research with precision. Better automated data extraction may not only facilitate the stage of the review traditionally labelled ‘data extraction’, but also change earlier phases of the review process by making it possible to identify relevant research. Exponential improvements in computational processing speed and data storage are fostering the development of data mining models and algorithms. This, in combination with quicker pathways to publication, has led to a large landscape of tools and methods for data mining and extraction.

Objective: To review published methods and tools for data extraction to (semi)automate the systematic reviewing process.

Methods: We propose to conduct a living review. With this methodology we aim to maintain constant evidence surveillance, with bi-monthly search updates and review updates every 6 months if new evidence permits. In a cross-sectional analysis we will extract methodological characteristics and assess the quality of reporting in our included papers.

Conclusions: We aim to increase transparency in the reporting and assessment of automation technologies to the benefit of data scientists, systematic reviewers and funders of health research. This living review will help to reduce duplicate efforts by data scientists who develop data mining methods. It will also serve to inform systematic reviewers about possibilities to support their data extraction.

Introduction

Research on systematic review (semi)automation sits at the interface between evidence-based medicine and data science. The capacity of computers to support humans is increasing along with processing power and storage space. Data extraction for systematic reviewing is a repetitive task, which opens opportunities for support through intelligent software. Tools and methods in this domain have frequently focused on automatic processing of information related to the PICO framework (Population, Intervention, Comparator, Outcome). A 2017 analysis of 195 systematic reviews investigated the workload associated with authoring a review. On average, the analysed reviews took 67 weeks to write and publish. Although review size and the number of authors varied between the analysed reviews, the authors concluded that supporting the reviewing process with technological means is important in order to save thousands of working hours of trained and specialised staff 1 . The potential workload for systematic reviewers is increasing, because the evidence base of clinical studies that can be reviewed is growing rapidly ( Figure 1 ). This entails not only a need to publish new reviews, but also a need to commit to them and keep the evidence continually up to date.

Figure 1. Rapid development in the field of systematic review (semi)automation.

Language processing toolkits and machine learning libraries are well documented and available to use free of charge. At the same time, freely available training data make it easy to train classic machine-learning classifiers, such as support vector machines, or even complex deep neural networks such as long short-term memory (LSTM) networks. These are reasons why health data science, much like the rest of computer science and natural language processing, is a rapidly developing field. There is a need for fast publication, because trends and state-of-the-art methods change at a fast pace. Preprint repositories, such as the arXiv, offer rapid publication after a short moderation process rather than full peer review. Consequently, publishing research is becoming easier.
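
As an indication of how low the barrier to entry has become, a classic linear classifier of the kind mentioned above can be trained in a few lines with scikit-learn. The sentences and labels below are invented toy data for illustration only; real systems are trained on large annotated corpora.

```python
# Toy illustration of a classic machine-learning text classifier
# (invented training sentences; assumes scikit-learn is installed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

sentences = [
    "Participants were adults with type 2 diabetes.",        # Population
    "Patients received 10 mg of the study drug daily.",       # Intervention
    "The primary outcome was change in HbA1c at 12 weeks.",   # Outcome
    "We enrolled 120 children with asthma.",                  # Population
    "The control group received placebo injections.",         # Intervention
    "Mortality at one year was the main endpoint.",           # Outcome
]
labels = ["P", "I", "O", "P", "I", "O"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(sentences, labels)

print(model.predict(["The outcome of interest was hospital readmission."]))
```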

Why this review is needed

An easily updatable review of available methods and tools is needed to inform systematic reviewers, data scientists or their funders about the status quo of (semi)automated data extraction methodology. For data scientists, it contributes to reducing waste and duplication in research. For reviewers, it highlights the current possibilities for data extraction and empowers them to choose the right tools for their task. Currently, data extraction represents one of the most time-consuming 1 and error-prone (Jones, Remmington, Williamson, Ashby, & Smyth, 2005) elements of the systematic review process, particularly if a large number of primary studies meet the inclusion criteria. Data mining, paralleled by automatic extraction of the data relevant to any specific systematic review project, has the potential to disrupt the traditional systematic reviewing process. This workflow usually follows the steps of searching, screening, and extracting data. If high-quality and curated data mining results become available, then the searching and screening processes are likely to change in the future. This review will provide constant surveillance of emerging data extraction tools.

Many systematic reviewers are free to use any tool available to them and need sufficient information to make informed decisions about which tools to prefer. Our proposed continuous analysis of the available tools will include the final performance metrics that a model achieves, and will also assess dimensions such as transparency of methods, reproducibility, and how these items are reported. Reported pitfalls of applying health data science methods to systematic reviewing tasks will be summarised to highlight risks that current, as well as future, systems face. Reviewing the available literature on systematic review automation is one of many small steps towards supporting evidence synthesis of all available medical research data. If the evidence arising from a study is never reviewed, and as a result never noticed by policy makers and providers of care, then it counts towards waste in research.

Aims of this review

This review aims to:

  • 1. Review published methods and tools aimed at automating or semi-automating the process of data extraction in the context of a systematic review of medical research studies.
  • 2. Review this evidence in the scope of a living review, keeping information up to date and relevant to the challenges faced by systematic reviewers at any time.

Our objectives are two-fold. First, we want to examine the methods and tools from the data science perspective, seeking to reduce duplicate efforts, summarise current knowledge, and encourage comparability of published methods. Second, we seek to highlight the contributions of methods and tools from the perspective of systematic reviewers who wish to use (semi)automation for data extraction: what is the extent of automation, is it reliable, and can we identify important caveats discussed in the literature, as well as factors that facilitate the adoption of tools in practice?

Related research

We have identified three previous reviews of tools and methods, two documents providing overviews and guidelines relevant to our topic, and an ongoing effort to characterise published tools for different parts of the systematic reviewing process with respect to interoperability and workflow integration. In 2014, Tsafnat et al. provided a broad overview on automation technologies for different stages of authoring a systematic review 2 . O'Mara-Eves et al. published a systematic review focusing on text-mining approaches in 2015. It includes a summary of methods for the evaluation of systems (such as recall, F1 and related scores). The reviewers focused on tasks related to PICO classification and supporting the screening process 3 . In the same year, Jonnalagadda et al. described methods for data extraction, focusing on PICOs and related fields 4 .

These reviews present an overview of classical machine learning and natural language processing (NLP) methods applied to tasks such as data mining in the field of evidence-based medicine. At the time these documents were published, methods such as topic modelling (latent Dirichlet allocation) and support vector machines constituted the state of the art for language models. Because of their age, these documents do not cover the latest static or contextual embedding-based and neural methods. These modern methods, however, are used in contemporary systematic review automation software 5 .

Marshall and Wallace (2019) 6 present a more recent overview of automation technologies, with a focus on the availability of tools and their adoption into practice. They conclude that tools facilitating screening are widely accessible and usable, while data extraction tools are still at the piloting stage or require a greater amount of human input.

Beller et al. 7 present a brief overview of tools for systematic review automation. They discuss principles for systematic review automation from a meeting of the International Collaboration for the Automation of Systematic Reviews (ICASR). They highlight that low levels of funding, as well as the complexity of integrating tools for different systematic reviewing tasks, have led to many small and isolated pieces of software. A working group formed at the ICASR 2019 Hackathon is compiling an overview of tools published on the Systematic Review Toolbox website 8 . This ongoing work focuses on assessing the maintenance status, accessibility and supported reviewing tasks of the 120 tools, listed as of November 2019, that can be used in any part of the systematic reviewing process.

Prospective registration of this review

We registered this protocol via OSF ( https://doi.org/10.17605/OSF.IO/ECB3T ). PROSPERO was initially considered as a platform for registration, but it is limited to reviews with health-related outcomes.

Choosing to maintain this review as a living review

The challenges highlighted in the previous section create several problems. A large variety of approaches and different ways of expressing results create uncertainty in the existing evidence, while new evidence is published constantly. Rapid means of publication necessitate a structured, yet easily updatable, review of published methods and tools in the field. We therefore chose a living review approach as the updating strategy for this review.

Search and updates

For literature searches and updates we follow the living review recommendations published by Elliott et al. 9 and Brooker et al. 10 , as well as the F1000Research guidelines for projects included in their living evidence collection. We plan to run searches for new studies every second month, which will also include screening the abstracts of newly retrieved reports. The bi-monthly interval for screening was chosen because we expect no sudden rise in relevant publications that could justify daily, weekly or monthly screening. The review itself will be updated every six months, provided that a sufficient number of new records is identified for inclusion. As a threshold for updating we plan to use 10 new records, but we will consider updating the review earlier if new impactful evidence is published. We define impactful evidence as, for example, the publication of a tool that is immediately accessible to systematic reviewers and offers substantial automation of the data extraction process, or a tool that aims to change the traditional SR workflow. Figure 2 describes the anticipated reviewing process in more detail.

Figure 2. The anticipated reviewing process.
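As an illustration only (this is not code used by the review team), the update rule described above can be written as a short function; the parameter names are ours, and the default threshold is the one stated in the text.

```python
# Sketch of the stated update rule: update the review when at least 10 newly
# included records have accumulated, or earlier if any new record is judged
# "impactful" (e.g. an immediately usable data extraction tool).
def review_update_due(new_included_records, impactful_evidence=False, threshold=10):
    """Return True if a new version of the living review should be prepared."""
    return impactful_evidence or new_included_records >= threshold

print(review_update_due(4))                            # False: keep monitoring
print(review_update_due(4, impactful_evidence=True))   # True: impactful tool published
print(review_update_due(12))                           # True: threshold reached
```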

Our Medline search strategy was developed with the help of an information specialist. Because of the interdisciplinary topic of this review, we plan to search bibliographic databases related to both medicine and computer science: Medline via Ovid and Web of Science Core Collection, as well as the computer science arXiv and the DBLP computer science bibliography. We aim to retrieve publications related to two clusters of search terms: the first cluster covers computational aspects such as data mining, while the second cluster identifies publications related to systematic reviews. The Medline search strategy is provided as Extended data 11 , and we aim to adapt it for searches in all of the databases mentioned. Previous reviews of data mining in systematic reviewing contexts identified the earliest text mining application in 2005 3 , 4 ; we therefore plan to search all databases from 2005 onwards. In a preliminary test, our search strategy identified 4320 Medline records, including all Medline-indexed records included by O’Mara-Eves et al. 3 . We plan to search the Systematic Review Toolbox website for further information on any published or unpublished tools 8 .
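The two-cluster structure of the search can be sketched as follows. The terms and the simple keyword syntax are invented for illustration; the actual Ovid Medline strategy is the one provided in the Extended data.

```python
# Illustrative only: combining an "automation" concept cluster with a
# "systematic review" concept cluster using OR within clusters and AND
# between them. These terms are examples, not the published strategy.
automation_cluster = [
    '"text mining"', '"data mining"', '"machine learning"',
    '"natural language processing"', '"information extraction"',
]
review_cluster = ['"systematic review"', '"evidence synthesis"', '"data extraction"']

query = "({}) AND ({})".format(" OR ".join(automation_cluster),
                               " OR ".join(review_cluster))
print(query)
```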

Workflow and study design

All titles and abstracts will be screened independently by two reviewers. Any differences in judgement will be discussed, and resolved with the help of a third reviewer if necessary. The process for assessing full texts will be the same. Data extraction will be carried out by single reviewers, and random 10% samples from each reviewer will be checked independently. If needed, we plan to contact the authors of reports for clarification or further information. In the base review, as well as in every published update, we will present a cross-sectional analysis of the evidence from our searches. This analysis will include the characteristics of each reviewed method or tool, as well as a summary of our findings. In addition, we will assess the quality of reporting at publication level. This assessment will focus on transparency, reproducibility and both internal and external validity of the described data extraction algorithms. If we at any point deviate from this protocol, we will discuss this in the final publication.

All search results will be de-duplicated and managed with EndNote. The screening and data extraction process will be managed with the help of Abstrackr 12 and customised data extraction forms in Excel. All data, including bi-monthly screening results, will be continuously available on our Open Science Framework (OSF) repository, as discussed in the Data availability section.

Which systematic reviewing tasks are supported by the methods we review

Tsafnat et al. 2 categorised the sub-tasks of the systematic reviewing process for which published tools and methods for automation exist. In our overview, we follow this categorisation and focus on tasks related to data retrieval. More specifically, we will focus on software architectures that receive as input a set of full texts or abstracts of clinical trial reports; report types of interest are randomised controlled trials, cohort studies, or case-control studies. As output, the tools of interest should produce structured data representing features or findings from the study described. A comprehensive list of data fields of interest can be found in the supplementary material for this protocol.

Eligibility criteria

Eligible papers

  • We will include full text publications that describe an original natural language processing approach to extract data related to systematic reviewing tasks. Data fields of interest are adapted from the Cochrane Handbook for Systematic Reviews of Interventions 13 , and defined in the Extended data 11 . We will include the full range of natural language processing (NLP) methods, including for example regular expressions, rule-based systems, machine learning, and deep neural networks.
  • Papers must describe a full cycle of implementation and evaluation of a method.
  • We will include reports published from 2005 until the present day, similar to O’Mara-Eves et al. 3 and Jonnalagadda et al. 4 . We will translate non-English reports where feasible.
  • The data that included papers use for mining must be texts from randomised controlled trials, comparative cohort studies or case-control studies, in the form of abstracts, conference proceedings, full texts or parts of the text body.

Ineligible papers

We will exclude papers reporting:

  • methods and tools related solely to image processing or to importing biomedical data from PDF files without any NLP approach, including data extraction from graphs;
  • research that focuses exclusively on protocol preparation, synthesis of already extracted data, write-up, pre-processing of text, or dissemination;
  • methods or tools that provide no natural language processing approach and offer only organisational interfaces, document management, databases or version control; or
  • publications related to electronic health reports or to mining genetic data.

Key items of interest

  • Machine learning approaches used
  • Reported performance metrics used for evaluation
  • Scope: abstract, conference proceeding, or full text
  • Target design: randomised controlled trial, cohort, or case-control study
  • Type of input: the format in which data enter the system, for example structured literature search results (e.g. RIS files), APIs, PDF files, or plain text
  • Type of output: the format in which data are exported after extraction, for example a text file
  • Granularity of data mining: does the system extract specific entities, sentences, or larger parts of the text?
  • Other reported metrics, such as impacts on systematic review processes (e.g. time saved during data extraction)
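As a rough sketch (our illustration, not the authors' actual extraction form), the key items above could be captured as one structured record per included publication, for example:

```python
# Hypothetical extraction record for one included publication; field names
# mirror the key items of interest listed above, values are invented.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ExtractionRecord:
    publication_id: str
    ml_approaches: List[str]             # e.g. ["LSTM", "SVM"]
    performance_metrics: dict            # e.g. {"F1": 0.81, "recall": 0.85}
    scope: str                           # "abstract", "conference proceeding" or "full text"
    target_designs: List[str]            # e.g. ["RCT", "cohort"]
    input_format: str                    # e.g. "RIS", "API", "PDF", "plain text"
    output_format: str                   # e.g. "text file", "JSON"
    granularity: str                     # "entity", "sentence" or "passage"
    other_metrics: Optional[str] = None  # e.g. "time saved during extraction"

record = ExtractionRecord(
    publication_id="example2020",
    ml_approaches=["LSTM"],
    performance_metrics={"F1": 0.78},
    scope="abstract",
    target_designs=["RCT"],
    input_format="plain text",
    output_format="JSON",
    granularity="sentence",
)
print(record.publication_id, record.performance_metrics)
```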

Assessment of the quality of reporting: We will extract information related to the quality of reporting and the reproducibility of text mining methods 14 . The domains of interest, adapted for our reviewing task, are listed below.

  • Are the sources for training/testing data reported?
  • If pre-processing techniques were applied to the data, are they described?
  • Is there a description of the algorithms used?
  • Is there a description of the dataset used and of its characteristics?
  • Is there a description of the hardware used?
  • Is the source code available?
  • Is there a justification/an explanation of the model assessment?
  • Are basic metrics reported (true/false positives and negatives)?
  • Does the assessment include any information about trade-offs between recall and precision (also known as sensitivity and positive predictive value)?
  • Can we obtain a runnable version of the software based on the information in the publication?
  • Persistence: is the dataset likely to be available for future use?
  • Is the use of third-party frameworks reported and are they accessible?
  • Does the dataset or assessment measure make it possible to compare with other tools in the same domain?
  • Are explanations for the influence of both visible and hidden variables in the dataset given?
  • Is the process of avoiding over- or underfitting described?
  • Is the process of splitting training from validation data described?
  • Is the model’s adaptability to different formats and/or environments beyond training and testing data described?
  • Does the paper describe caveats for using the method?
  • Are sources of funding described?
  • Are conflicts of interest reported?

Dissemination of information

We plan to publish the finished review, along with future updates, via F1000Research.

All data will be available via a project on Open Science Framework (OSF): https://osf.io/4sgfz/ (see Data availability ).

Study status

Protocol published. We did a preliminary Medline search as described in this protocol and the supplementary material. The final search, including all additional databases, will be conducted as part of the full review.

Data availability

Acknowledgements

We thank Sarah Dawson for developing and evaluating the search strategy, and providing advice on databases to search for this review. Many thanks also to Alexandra McAleenan and Vincent Cheng for providing valuable feedback on this protocol.

[version 2; peer review: 2 approved]

Funding Statement

This work was funded by the National Institute for Health Research [RM-SR-2017-09-028; NIHR Systematic Review Fellowship to LS and DRF-2018-11-ST2-048; NIHR Doctoral Research Fellowship to LAM]. The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. LS funding ends in September 2020, but ideally further updates to this review will continue after this date.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Reviewer response for version 2

Matt Carter

1 Bond University Centre for Research in Evidence-Based Practice, Bond University, Robina, Australia

No further comments.


Reviewer Expertise: Automation of Systematic Reviews

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer response for version 1

I believe that the research proposal clearly lays out its objectives and aims. Although there are some minor edits as per the below list:

  • The abstract should specifically mention PICO data extraction rather than data extraction generally. The "Aims" section outlines this but seems to contradict with the more general "Introduction / Objective" section.
  • While the Tsafnat et al. workflow is considered the first, other papers such as Clark et al. (2019) or Macleod et al. for animal testing are a little more current and also fix some of the gaps with Tsafnat. Since there are a few in this area now a reason should be given for the preference.
  • "We plan to run searches for new studies every second month" - There is currently an effort to get this adjusted to bi-monthly as you suggest but the guidelines specify that this should be monthly. Perhaps just a quick sentence acknowledging this and saying that it is excessive for the subject area?
  • The generated search strategy has a specific filter for English / German papers which seems to contradict the more general phrasing of "We will translate non-English reports where feasible".
  • Data sets are obviously not provided as this is an ongoing paper.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

University of Bristol, UK

“I believe that the research proposal clearly lays out its objectives and aims. Although there are some minor edits as per the below list:

  • The abstract should specifically mention PICO data extraction rather than data extraction generally. The "Aims" section outlines this but seems to contradict with the more general "Introduction / Objective" section.”

Thank you for pointing this out, we have made some changes to the abstract and to the aims. Firstly, in the abstract we clarified that the data extraction goal is wider (not limited to PICO, it included fields that are generally of interest in systematic reviews as defined in the Cochrane Handbook). We have made changes in the aims section to reflect that, and to make it more consistent throughout. In the method section, we deleted the section “Objectives” and summarised everything under “Aims of this review” in the introduction.

In response to this comment we added a more recent review paper to our summary of related research (Marshall and Wallace, 2019). The related research was chosen because it was either in the form of a systematic review or an overview directly related to our topic of interest. We also added an explanation of our preference to focus on the data extraction stage of the systematic review process, in response to this and to a previous peer review comment. To summarise briefly: data extraction is one of the most time-consuming and error-prone tasks in the systematic reviewing process, and by reviewing its automation we aim to summarise the current knowledge. Furthermore, data extraction has the potential to disrupt the “traditional” systematic review process in the future: if data are extracted and well classified centrally, then the searching and screening workflow can change as well.

Thank you, yes. We have added a statement: “The bi-monthly interval for screening was chosen because we expect no sudden rise in relevant publications that could justify daily, weekly or monthly screening”. Furthermore, we added a statement defining impactful research that would lead to a review update even if the threshold of new studies is not met (please see our reply to the first peer review for reference).

  • “The generated search strategy has a specific filter for English / German papers which seems to contradict the more general phrasing of "We will translate non-English reports where feasible".

Thank you, this item was unclear. The initial draft of the search strategy was created for the protocol publication and will be minimally altered for the full search, when the remaining databases are searched. At that point it will not be possible to use a language filter in most databases. The initial Medline search specifically included German studies because they are feasible for us to assess; for the other database searches we have no language restrictions. The process of searching all databases will be described in detail in the full review.

  • “Data sets are obviously not provided as this is an ongoing paper.”

Thank you for your peer review and for helping us to improve the quality of our paper.

Emma McFarlane

1 National Institute for Health and Care Excellence, Manchester, UK

This protocol outlines a project to review methods and tools for data extraction to help automate a step in the systematic reviewing process. This will be done in the context of a living systematic review with the aim of providing guidance to reviewers who may want to semi-automate their work.

The objectives of the study are described clearly and are set within the current context of increasing publications. However, it would be helpful, as part of the aim and purpose of the study, to note why the focus is around the data extraction stage specifically.

The methods of the study are described in enough detail to be replicated. In terms of the methods, can the authors double check the searching approach for accuracy? The abstract notes searches will be conducted monthly whereas the body of the protocol states every two months. The authors indicated they will update their systematic review when evidence expected to impact is identified however, it would be helpful to include some detail to note how impactful evidence will be defined for people doing similar work. 

A comment from the authors about their choice of study design for the included papers would be helpful. Identifying RCTs in the context of data science is likely to be challenging so it would be interesting to understand the expectations about the evidence base and how study design could link to the point about impact on the results of the review and need to update.

The outcomes listed in the protocol appear to be comprehensive. However, it is not clear if consideration was given to accuracy of the tools identified as that could link to the objective of identifying whether automation is reliable. 

Overall, this appears to be an interesting study straddling the fields of systematic reviewing and data science.


I am currently conducting a systematic review of automation in systematic reviewing or guideline development. I am also working on a research project on machine learning within the context of guideline recommendations.

Thank you for providing this very helpful peer review. We tried to address the concerns below:

“This protocol outlines a project to review methods and tools for data extraction to help automate a step in the systematic reviewing process. This will be done in the context of a living systematic review with the aim of providing guidance to reviewers who may want to semi-automate their work.”

“The objectives of the study are described clearly and are set within the current context of increasing publications. However, it would be helpful, as part of the aim and purpose of the study, to note why the focus is around the data extraction stage specifically.”

Thank you for pointing this out. In the current revision we added more details about why we chose to focus on data extraction. To summarise this quickly, data extraction is one of the most time-consuming and error-prone tasks in the systematic reviewing process. By reviewing automation of data extraction, we aim to summarise the current knowledge. Furthermore, the area of data extraction has future potential to disrupt the “traditional” systematic review process – if data are extracted and well classified centrally then the searching and screening workflow can change as well.

“The methods of the study are described in enough detail to be replicated. In terms of the methods, can the authors double check the searching approach for accuracy? The abstract notes searches will be conducted monthly whereas the body of the protocol states every two months. “

We addressed this, thank you. New articles will be screened every two months.

“The authors indicated they will update their systematic review when evidence expected to impact is identified however, it would be helpful to include some detail to note how impactful evidence will be defined for people doing similar work.“

Thank you, this was not clear previously, and we added some further explanation: “We define impactful evidence as, for example, the publication of a tool that is immediately accessible to systematic reviewers and offers substantial automation of the data extraction process, or a tool that aims to change the traditional SR workflow”

“A comment from the authors about their choice of study design for the included papers would be helpful. Identifying RCTs in the context of data science is likely to be challenging so it would be interesting to understand the expectations about the evidence base and how study design could link to the point about impact on the results of the review and need to update.”

We clarified that we are open to include any paper, as long as the paper reports an automation technology for systematic reviewing that processes texts from clinical studies (not electronic health reports). To clarify further, the texts that the technologies process are likely to be RCT reports, but we are not looking to identify RCTs that compare data extraction tools.

“The outcomes listed in the protocol appear to be comprehensive. However, it is not clear if consideration was given to accuracy of the tools identified as that could link to the objective of identifying whether automation is reliable.“

Thank you for this comment. In response we clarified that we will extract the reported performance metrics. We do not plan to rank tools based on these metrics because results can vary throughout different datasets. Instead, we aim to assess quality of reporting (reproducibility, transparency…) in detail.

“Overall, this appears to be an interesting study straddling the fields of systematic reviewing and data science.”

Thank you very much for your review, it was very helpful in improving the protocol.

Systematic Reviews & Evidence Synthesis Methods

  • Schedule a Consultation / Meet our Team
  • What is Evidence Synthesis?
  • Types of Evidence Synthesis
  • Evidence Synthesis Across Disciplines
  • Finding and Appraising Existing Systematic Reviews
  • 1. Develop a Protocol
  • 2. Draft your Research Question
  • 3. Select Databases
  • 4. Select Grey Literature Sources
  • 5. Write a Search Strategy
  • 6. Register a Protocol
  • 7. Translate Search Strategies
  • 8. Citation Management
  • 9. Article Screening
  • 10. Risk of Bias Assessment
  • 11. Data Extraction
  • 12. Synthesize, Map, or Describe the Results
  • Open Access Evidence Synthesis Resources

Data Extraction

Whether you plan to perform a meta-analysis or not, you will need to establish a regimented approach to extracting data. Researchers often use a form or table to capture the data they will then summarize or analyze. The amount and types of data you collect, as well as the number of collaborators who will be extracting it, will dictate which extraction tools are best for your project. Programs like Excel or Google Spreadsheets may be the best option for smaller or more straightforward projects, while systematic review software platforms can provide more robust support for larger or more complicated data.

It is recommended that you pilot your data extraction tool, especially if you will code your data, to determine if fields should be added or clarified, or if the review team needs guidance in collecting and coding data.

Data Extraction Tools

  • Excel Excel is the most basic tool for the management of the screening and data extraction stages of the systematic review process. Customized workbooks and spreadsheets can be designed for the review process.
  • Covidence Covidence is a software platform built specifically for managing each step of a systematic review project, including data extraction. Read more about how Covidence can help you customize extraction tables and export your extracted data.
  • RevMan RevMan is free software used to manage Cochrane reviews. For more information on RevMan, including an explanation of how it may be used to extract and analyze data, watch Introduction to RevMan - a guided tour .
  • SRDR SRDR (Systematic Review Data Repository) is a Web-based tool for the extraction and management of data for systematic review or meta-analysis. It is also an open and searchable archive of systematic reviews and their data. Access the help page for more information.
  • DistillerSR DistillerSR is a systematic review management software program, similar to Covidence. It guides reviewers in creating project-specific forms, extracting, and analyzing data.
  • Sumari JBI Sumari (the Joanna Briggs Institute System for the Unified Management, Assessment and Review of Information) is a systematic review software platform geared toward fields such as health, social sciences, and humanities. Among the other steps of a review project, it facilitates data extraction and data synthesis. View their short introductions to data extraction and analysis for more information.
  • The Systematic Review Toolbox The SR Toolbox is a community-driven, searchable, web-based catalogue of tools that support the systematic review process across multiple domains. Use the advanced search option to restrict to tools specific to data extraction.

Additional Information

These resources offer additional information and examples of data extraction forms:​

Brown, S. A., Upchurch, S. L., & Acton, G. J. (2003). A framework for developing a coding scheme for meta-analysis.  Western Journal of Nursing Research ,  25 (2), 205–222. https://doi.org/10.1177/0193945902250038

Elamin, M. B., Flynn, D. N., Bassler, D., Briel, M., Alonso-Coello, P., Karanicolas, P. J., … Montori, V. M. (2009). Choice of data extraction tools for systematic reviews depends on resources and review complexity.  Journal of Clinical Epidemiology ,  62 (5), 506–510. https://doi.org/10.1016/j.jclinepi.2008.10.016

Higgins, J.P.T., & Thomas, J. (Eds.) (2022). Cochrane handbook for systematic reviews of interventions, Version 6.3. The Cochrane Collaboration. Available from https://training.cochrane.org/handbook/current (see Part 2: Core Methods, Chapters 4, 5)

Research guide from the George Washington University Himmelfarb Health Sciences Library.



Data Extraction Tools for Systematic Reviews


Data extraction tools such as DistillerSR, for example, can help better manage, categorize, and refine data gathered from studies in a systematic review, leading to more accessible, understandable, and organized results.

What Is Data Extraction?

Data extraction is a step of the systematic review process that involves collecting, analyzing, and organizing data to develop evidence and summary tables based on the characteristics of a study, its results, or both. These tables help determine which studies are eligible for synthesis and provide both detailed information and a high-level overview of research findings. The data used in systematic reviews usually come from a variety of sources, many of which are unstructured or poorly organized.

Data extraction requires a lot of planning, and it’s recommended that a research team have at least two reviewers to reduce bias and errors and to properly organize a systematic review. This helps ensure that information is relevant, accurate, and complete, and is presented in an accessible way for data sharing and future review updates.

What Data To Extract For A Systematic Review

Systematic reviews gather data from several sources of primary research. These include:

  • Journal articles
  • Conference abstracts
  • Errata and letters
  • Trial registers
  • Clinical study reports (CSRs)
  • Regulatory reviews
  • Participant data

Data are extracted from these sources according to pre-established guidelines, which define the types of data and sources to be analyzed based on the purpose of the systematic review. Information to be considered includes:

  • Year of publication
  • Topic, research question, or hypothesis
  • Conceptual framework
  • Research methodology or study type
  • Interventions
  • Outcomes and conclusions


What Is a Data Extraction Tool?

Data extraction tools are used to collect and manage data in systematic reviews. Several types are available, including paper forms, electronic forms, and specialized software.

Each tool offers benefits and drawbacks: specialized web-based software is well suited in most ways but is associated with higher setup costs. Other approaches vary in their setup costs and difficulty, training requirements, portability and accessibility, versatility, progress tracking, and the ability to manage, present, store, and retrieve data.

Paper Forms

Paper forms refer to the manual recording of data, i.e., with pen and paper. This is a low-cost, low-resource option that’s suitable for small-scale reviews (fewer than 10 included studies, reviewed by a small team). That said, extra care must be taken with this method, since it’s susceptible to human error and can be difficult to amend when changes arise.

Electronic Forms

Electronic forms, which refer to online software, allow data to be processed, edited, stored, shared, and collated electronically. They make the step easier and faster while reducing the risk of errors (compared with paper forms). These are best suited to small- to medium-scale reviews by teams with a few more resources, provided the reviewers are familiar with the software.

Data Systems

Data systems, like DistillerSR, are the most versatile and efficient data extraction tools, suitable for systematic reviews of any scale and for teams at any resource level. They offer a wide range of benefits, including systematic review-specific functions (data automation, integration, and export) and simultaneous access for all reviewers. Depending on your needs, they can require some investment, both financially and in the effort it takes to learn the software, but the tradeoff is often worth it.


Health Sciences Literature Reviews

  • Literature review services
  • Types of Reviews
  • Steps in the Evidence Synthesis process
  • Search Filters & Tools
  • Databases & Gray Literature
  • Screening tools & Data extraction
  • Citation Management
  • University Resources
  • Health and Biomedical Library Services

Screening Tools

The screening process is done in two stages:

  • Title/Abstract: Include or exclude articles for your review by reading the titles and abstracts of each result and determining if they fit your eligibility criteria.
  • Full Text: Results that have made it through the first stage are now screened by reading the full text of the article and determining if they fit your eligibility criteria. Document your reasons for exclusion at this stage.

The following tools can be used to screen through the search results of your reviews and select articles for inclusion in the review based on your eligibility criteria.

  • Covidence Covidence is a screening and data extraction tool for comprehensive literature reviews that is available to the Brown community. It automatically de-duplicates imported results, and users can create PRISMA flow diagrams easily, along with templates for data extraction and quality assessment.
  • Abstrackr From the Center for Evidence Synthesis in Health (CeSH) here at Brown University. Free, open-source tool for collaborative screening of abstracts.
  • EndNote EndNote is citation management software that is available to the Brown community. With EndNote, you can: import references directly from multiple databases; organize and manage references; and format bibliographies and manuscripts.
  • Rayyan Software for screening through titles and abstracts for reviews. Free version includes unlimited reviews and AI-powered features.

Data extraction tools

  • Systematic Review Data Repository (SRDR) From AHRQ, SRDR is a powerful and easy-to-use tool for extraction and management of data for systematic review or meta-analysis. It is also an open and searchable archive of systematic reviews and their data.
  • MS Excel (PIECES Workbook) An advanced approach to using MS Excel for data extraction, developed by Margaret J. Foster.

Systematic Reviews: Data Extraction/Coding/Study characteristics/Results

  • Types of literature review, methods, & resources
  • Protocol and registration
  • Search strategy
  • Medical Literature Databases to search
  • Study selection and appraisal
  • Data Extraction/Coding/Study characteristics/Results
  • Reporting the quality/risk of bias
  • Manage citations using RefWorks This link opens in a new window
  • GW Box file storage for PDF's This link opens in a new window

Data Extraction: PRISMA Item 10

The next step is for the researchers to read the full text of each article identified for inclusion in the review and  extract the pertinent data using a standardized data extraction/coding form.  The data extraction form should be as long or as short as necessary and can be coded for computer analysis if desired.

If you are writing a narrative review that summarizes information reported in a small number of studies, you probably don't need to go to the trouble of coding the data variables for computer analysis; instead, summarize the information from the data extraction forms for the included studies.

If you are conducting an analytical review with a meta-analysis to compare data outcomes from several clinical trials, you may wish to computerize the data collection and analysis processes. Reviewers can use fillable forms to collect and code data reported in the studies included in the review; the data can then be uploaded to analytical software such as Excel or SPSS for statistical analysis. GW School of Medicine, School of Public Health, and School of Nursing faculty, staff, and students can use the various statistical analytical software in the Himmelfarb Library , and watch online training videos from LinkedIn Learning at the Talent@GW website to learn how to perform statistical analysis with Excel and SPSS.

Software to help you create coded data extraction forms from templates includes Covidence , DistillerSR (subscription required), EPPI Reviewer (subscription, free trial), and AHRQ's SRDR tool (free), which is web-based and has a training environment, tutorials, and example templates of systematic review data extraction forms . If you prefer to design your own coded data extraction form from scratch, Elamin et al (2009) offer advice on how to decide what electronic tools to use to extract data for analytical reviews. The process of designing a coded data extraction form and codebook is described in Brown, Upchurch & Acton (2003) and Brown et al (2013) . Assign a unique identifying number to each variable field so the fields can be programmed into fillable form fields in whatever software you decide to use for data extraction/collection. You can use AHRQ's Systematic Review Data Repository SRDR tool , online survey forms such as Qualtrics, REDCap , or Survey Monkey, or design and create your own coded fillable forms using Adobe Acrobat Pro or Microsoft Access. You might also like to include on the data extraction form a field for grading the quality of the study; see the Screening for quality page for examples of quality scales you might choose to apply.
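A minimal sketch of such a codebook and one coded study record is shown below; the variable codes and values are invented, and the same structure could live in Excel, Access, SRDR, or any of the survey tools mentioned above.

```python
# Hypothetical codebook: each variable field gets a unique identifier so it
# can be programmed into fillable form fields and later analysed.
CODEBOOK = {
    "V01": "First author and year",
    "V02": "Study design (1=RCT, 2=cohort, 3=case-control)",
    "V03": "Sample size",
    "V04": "Intervention",
    "V05": "Primary outcome",
    "V06": "Quality score (0-10)",
}

# One coded record for an invented study.
study_record = {"V01": "Smith 2019", "V02": 1, "V03": 240,
                "V04": "Nurse-led triage", "V05": "ED length of stay", "V06": 7}

for code, value in study_record.items():
    print(f"{code} ({CODEBOOK[code]}): {value}")
```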

Three examples of a data extraction form are below:  

  • Data Extraction Form Example (suitable for small-scale literature review of a few dozen studies) This example was used to gather data for a poster reporting a literature review of studies of interventions to increase Emergency Department throughput. The poster can be downloaded from http://hsrc.himmelfarb.gwu.edu/libfacpres/62/
  • Data Extraction Form for the Cochrane Review Group (uncoded & used to extract fine-detail/many variables) This is one example of a form, illustrating the thoroughness of the Cochrane research methodology. You could devise a simpler one page data extraction form for a more simple literature review.
  • Coded data extraction form (fillable form fields that can be computerized for data analysis) See Table 1 of Brown, Upchurch & Acton (2013)

Study characteristics: PRISMA Item 18

The data extraction forms can be used to produce a summary table of study characteristics that were considered important for inclusion. 

In the final report in the results section the characteristics of the studies that were included in the review should be reported for PRISMA Item 18 as:

  • Summary PICOS (Patient/Population, Intervention, Comparison if any, Outcomes, Study design type) and other pertinent characteristics of the reviewed studies should be reported both in the text of the Results section and in the form of a table. Here is an example of a table that summarizes the characteristics of studies in a review; note that this table could be improved by adding a column for the quality score you assigned to each study, or a column giving the time period in which each study was carried out, if this would be useful for the reader. The summary table can either be an appendix or appear in the text itself if it is small enough, e.g. similar to Table 1 of Shah et al (2007) . A minimal sketch of building such a table programmatically follows below.
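The sketch below shows one way such a summary table could be assembled programmatically, here with the Python pandas library instead of a hand-built Excel table; the studies and values are invented.

```python
# Invented study characteristics arranged as a PICOS-style summary table.
import pandas as pd

studies = [
    {"Study": "Smith 2019", "Population": "Adults in ED", "Intervention": "Nurse-led triage",
     "Comparison": "Usual care", "Outcomes": "Length of stay", "Design": "RCT", "Quality": 7},
    {"Study": "Lee 2021", "Population": "Adults in ED", "Intervention": "Fast-track unit",
     "Comparison": "Usual care", "Outcomes": "Time to physician", "Design": "Cohort", "Quality": 5},
]

summary = pd.DataFrame(studies).set_index("Study")
print(summary.to_string())
```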

A bibliography of the included studies should always be created, particularly if you are intending to publish your review. Read the advice for authors page on the journal website, or ask the journal editor to advise you on what citation format the journal requires you to use. Himmelfarb Library recommends using  RefWorks  to manage your references.

Results: PRISMA Item 20

In the final report the results from individual studies should be reported for PRISMA Item 20 as follows:

For all outcomes considered (benefits or harms) from each included study write in the results section:

  • (a) simple summary data for each intervention group
  • (b) effect estimates and confidence intervals

In a review where you are reporting a binary outcome (e.g. intervention vs placebo or control) and you are able to combine/pool results from several experimental studies done using the same methods on similar populations in similar settings, the results section should report the relative strength of treatment effects from each study in your review as well as the combined effect from your meta-analysis. For a meta-analysis of randomized trials you should represent the meta-analysis visually on a “forest plot” (see fig. 2). Here is another example of a meta-analysis forest plot , and on page 2 a description of how to interpret it.
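For readers who want to see the arithmetic behind a pooled estimate, here is a minimal sketch of fixed-effect, inverse-variance pooling of odds ratios; the 2x2 counts are invented, and a real analysis should use dedicated software such as RevMan or a validated statistics package.

```python
# Fixed-effect, inverse-variance pooling of odds ratios from invented 2x2
# tables: (events, non-events) in the intervention and control arms.
import math

trials = [
    # (events_int, no_events_int, events_ctl, no_events_ctl)
    (12, 88, 20, 80),
    (8, 42, 15, 35),
    (30, 170, 45, 155),
]

weights, weighted_effects = [], []
for a, b, c, d in trials:
    log_or = math.log((a * d) / (b * c))   # log odds ratio for this trial
    var = 1 / a + 1 / b + 1 / c + 1 / d    # its approximate variance
    weights.append(1 / var)
    weighted_effects.append(log_or / var)

pooled = sum(weighted_effects) / sum(weights)
se = math.sqrt(1 / sum(weights))
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"Pooled OR {math.exp(pooled):.2f} (95% CI {low:.2f} to {high:.2f})")
```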

If your review included heterogeneous study types (i.e. some combination of experimental trials and observational studies), you won't be able to do a meta-analysis. Instead, your analysis could follow the Synthesis Without Meta-analysis (SWiM) guideline , and you could consider presenting your results in an alternative, visually arresting graphic using a template in Excel or SPSS or a web-based application for infographics . GW faculty, staff, and students may watch online training videos from LinkedIn Learning at the Talent@GW website to learn how to work with charts and graphs and design infographics.


Systematic Review Toolbox

Data Extraction

  • Guidelines & Rubrics
  • Databases & Indexes
  • Reference Management
  • Quality Assessment
  • Data Analysis
  • Manuscript Development
  • Software Comparison
  • Systematic Searching This link opens in a new window
  • Authorship Determination This link opens in a new window
  • Critical Appraisal Tools This link opens in a new window

Requesting Research Consultation

The Health Sciences Library provides consultation services for University of Hawaiʻi-affiliated students, staff, and faculty. The John A. Burns School of Medicine Health Sciences Library does not have staffing to conduct or assist researchers unaffiliated with the University of Hawaiʻi. Please utilize the publicly available guides and support pages that address research databases and tools.

Before Requesting Assistance

Before requesting systematic review assistance from the librarians, please review the relevant guides and the various pages of the Systematic Review Toolbox . Most inquiries received have been answered there previously. Support for research software issues is limited to help with basic installation and setup. Please contact the software developer directly if further assistance is needed.

Data extraction is the process of extracting the relevant pieces of information from the studies you have assessed for eligibility in your review and organizing the information in a way that will help you synthesize the studies and draw conclusions.

Extracting data from reviewed studies should be done in accordance with pre-established guidelines, such as those from PRISMA . From each included study, the following data may need to be extracted, depending on the review's purpose: title, author, year, journal, research question and specific aims, conceptual framework, hypothesis, research methods or study type, and concluding points. Special attention should be paid to the methodology, in order to organize studies by study type in the review's results section. If a meta-analysis is also being completed, extract the raw and refined data from each result in the study.

Established frameworks for extracting data have been created. Common templates are offered by Cochrane  and supplementary resources have been collected by the George Washington University Libraries . Other forms are built into systematic review manuscript development software (e.g., Covidence, RevMan), although many scholars prefer to simply use Excel to collect data.

  • Data Collection Form A template developed by the Cochrane Collaboration for data extraction of both RCTs and non-RCTs in a systematic review
  • Data Extraction Template A comprehensive template for systematic reviews developed by the Cochrane Haematological Malignancies Group
  • A Framework for Developing a Coding Scheme for Meta-Analysis

Choice of data extraction tools for systematic reviews depends on resources and review complexity

  • Affiliation: Knowledge and Encounter Research Unit, College of Medicine, Mayo Clinic, Rochester, MN 55905, USA (Elamin et al., Journal of Clinical Epidemiology, 2009)
  • PMID: 19348977
  • DOI: 10.1016/j.jclinepi.2008.10.016

Objective: To assist investigators planning, coordinating, and conducting systematic reviews in the selection of data-extraction tools for conducting systematic reviews.

Study design and setting: We constructed an initial table listing available data-collection tools, reflecting our experience with these tools and their performance. An international group of experts iteratively reviewed the table and reflected on the performance of the tools until no new insights emerged and consensus was reached.

Results: Several tools are available to manage data in systematic reviews, including paper and pencil, spreadsheets, web-based surveys, electronic databases, and web-based specialized software. Each tool offers benefits and drawbacks: specialized web-based software is well suited in most ways, but is associated with higher setup costs. Other approaches vary in their setup costs and difficulty, training requirements, portability and accessibility, versatility, progress tracking, and the ability to manage, present, store, and retrieve data.

Conclusion: Available funding, number and location of reviewers, data needs, and the complexity of the project should govern the selection of a data-extraction tool when conducting systematic reviews.

Automating data extraction in systematic reviews: a systematic review

Siddhartha R. Jonnalagadda, Pawan Goyal & Mark D. Huffman

Systematic Reviews, volume 4, Article number 78 (2015). Published 15 June 2015 (open access).

Automation of parts of the systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper reports a systematic review of published and unpublished methods to automate data extraction for systematic reviews.

We systematically searched PubMed, IEEE Xplore, and the ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: 1) the methods or results section described what entities were or needed to be extracted, and 2) at least one entity was automatically extracted, with evaluation results presented for that entity. We also reviewed the citations from included reports.

Out of a total of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of the data elements used in systematic reviews, there were attempts from various researchers to extract information automatically from the publication text. Out of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most of the data elements were extracted with F-scores (the harmonic mean of sensitivity and positive predictive value) of over 70 %.
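For reference, the short sketch below shows how precision, recall, and the F-score relate, using invented counts rather than figures from any included report.

```python
# Invented counts from a hypothetical extractor's output.
tp, fp, fn = 70, 20, 30   # true positives, false positives, false negatives

precision = tp / (tp + fp)   # positive predictive value, about 0.78 here
recall = tp / (tp + fn)      # sensitivity, 0.70 here
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean, about 0.74

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```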

Conclusions

We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited (1–7) number of data elements. Biomedical natural language processing techniques have not been fully utilized to fully or even partially automate the data extraction step of systematic reviews.


Systematic reviews identify, assess, synthesize, and interpret published and unpublished evidence, which improves decision-making for clinicians, patients, policymakers, and other stakeholders [ 1 ]. Systematic reviews also identify research gaps to develop new research ideas. The steps to conduct a systematic review [ 1 – 3 ] are:

  • 1. Define the review question and develop criteria for including studies
  • 2. Search for studies addressing the review question
  • 3. Select studies that meet criteria for inclusion in the review
  • 4. Extract data from included studies
  • 5. Assess the risk of bias in the included studies, by appraising them critically
  • 6. Where appropriate, analyze the included data by undertaking meta-analyses
  • 7. Address reporting biases

Despite their widely acknowledged usefulness [ 4 ], the process of systematic review, specifically the data extraction step (step 4), can be time-consuming. In fact, it typically takes 2.5–6.5 years for a primary study publication to be included and published in a new systematic review [ 5 ]. Further, within 2 years of the publication of systematic reviews, 23 % are out of date because they have not incorporated new evidence that might change the systematic review’s primary results [ 6 ].

Natural language processing (NLP), including text mining, involves information extraction: the discovery by computer of new, previously unfound information through automatic extraction from different written resources [ 7 ]. Information extraction primarily consists of concept extraction, also known as named entity recognition, and relation extraction, also known as association extraction. NLP handles written text at the level of documents, words, grammar, meaning, and context. NLP techniques have been used to automate the extraction of genomic and clinical information from the biomedical literature. Similarly, automating the data extraction step of the systematic review process through NLP may be one strategy to reduce the time necessary to complete and update a systematic review. The data extraction step is one of the most time-consuming steps of a systematic review; automating or even semi-automating it could substantially decrease the time taken to complete systematic reviews and thus decrease the time lag for research evidence to be translated into clinical practice. Despite these potential gains from NLP, the state of the science of automating data extraction has not been well described.
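As a deliberately simple illustration of information extraction (not a method evaluated in this review), a single regular expression can pull a sample size and a population phrase out of an invented abstract sentence; the systems reviewed here use far richer NLP pipelines.

```python
# Toy rule-based extraction from one invented sentence.
import re

sentence = "A total of 248 patients with chronic heart failure were randomised."

pattern = r"(\d+)\s+(patients|participants|adults|children)\s+with\s+([\w\s]+?)\s+were"
match = re.search(pattern, sentence)
if match:
    print("sample size:", match.group(1))                             # 248
    print("population:", f"{match.group(2)} with {match.group(3)}")   # patients with chronic heart failure
```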

To date, knowledge of and methods for automating the data extraction phase of systematic reviews remain limited, despite this being one of the most time-consuming steps. To address this gap, we sought to perform a systematic review of methods to automate the data extraction component of the systematic review process.

Our methodology was based on the Standards for Systematic Reviews set by the Institute of Medicine [ 8 ]. We conducted our study procedures as detailed below with input from the Cochrane Heart Group US Satellite.

Eligibility criteria

We included a report that met the following criteria: 1) the methods or results section describes what entities were or needed to be extracted, and 2) at least one entity was automatically extracted with evaluation results that were presented for that entity.

We excluded a report that met any of the following criteria: 1) the methods were not applied to the data extraction step of a systematic review; 2) the report was an editorial, commentary, or other non-original research report; or 3) there was no evaluation component.

Information sources and searches

For collecting the initial set of articles for our review, we developed search strategies with the help of the Cochrane Heart Group US Satellite, which includes systematic reviewers and a medical librarian. We refined these strategies using relevant citations from related papers. We searched three databases: PubMed, IEEExplore, and the ACM Digital Library, limiting our searches to January 1, 2000 through January 6, 2015 (see Appendix 1). We restricted our search to these dates because biomedical information extraction algorithms developed before 2000 are unlikely to be accurate enough to be used for systematic reviews.

We retrieved articles that dealt with the extraction of various data elements from included study reports; data elements are defined as categories of data pertaining to any information about or derived from a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications, and investigators [1]. After we retrieved the initial set of reports from the search results, we evaluated the reports cited in their references. We also sought expert opinion for additional relevant citations.

Study selection

We first de-duplicated the retrieved citations. To calibrate and refine the inclusion and exclusion criteria, 100 citations were randomly selected and independently reviewed by two authors (SRJ and PG). Disagreements were resolved by consensus with a third author (MDH). In a second round, another set of 100 randomly selected abstracts was independently reviewed by the same two authors (SRJ and PG), achieving a strong level of agreement (kappa = 0.97). Given this high level of agreement, the remaining citations were reviewed by one author (PG). In this phase, we classified reports as “not relevant” or “potentially relevant”.
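
For reference, an agreement statistic of this kind can be computed directly from two screeners' decisions. The snippet below is a minimal sketch using scikit-learn's cohen_kappa_score, with invented include/exclude decisions rather than our screening data.

```python
# Minimal sketch: inter-rater agreement (Cohen's kappa) between two screeners.
# The decision lists below are invented for illustration only.
from sklearn.metrics import cohen_kappa_score

reviewer_1 = ["include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "include", "include", "exclude"]

kappa = cohen_kappa_score(reviewer_1, reviewer_2)
print(f"Cohen's kappa: {kappa:.2f}")
```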

Two authors (PG and SRJ) independently reviewed the full text of all citations (N = 74) that were identified as “potentially relevant”. We classified included reports into categories based on the particular data element that they attempted to extract from the original scientific articles. Examples of these data elements include overall evidence and specific interventions, among others (Table 1). We resolved disagreements between the two reviewers through consensus with a third author (MDH).

Data collection process

Two authors (PG and SRJ) independently reviewed the included articles to extract data, such as the particular entity automatically extracted by the study, algorithm or technique used, and evaluation results into a data abstraction spreadsheet. We resolved disagreements through consensus with a third author (MDH).

We reviewed the Cochrane Handbook for Systematic Reviews [ 1 ], the CONsolidated Standards Of Reporting Trials (CONSORT) [ 9 ] statement, the Standards for Reporting of Diagnostic Accuracy (STARD) initiative [ 10 ], and PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks to obtain the data elements to be considered. PICO stands for Population, Intervention, Comparison, Outcomes; PECODR stands for Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results; and PIBOSO stands for Population, Intervention, Background, Outcome, Study Design, Other.

Data synthesis and analysis

Because of the large variation in study methods and measurements, a meta-analysis of methodological features and contextual factors associated with the frequency of data extraction methods was not possible. We therefore present a narrative synthesis of our findings. We did not thoroughly assess risk of bias, including reporting bias, for these reports because their study designs did not match the domains evaluated in commonly used instruments such as the Cochrane Risk of Bias tool [1] and the QUADAS-2 instrument, which are used for systematic reviews of randomized trials and of diagnostic test accuracy studies, respectively [14].

Of 1190 unique citations retrieved, we selected 75 reports for full-text screening, and we included 26 articles that met our inclusion criteria (Fig. 1). Agreement on abstract and full-text screening was 0.97 and 1.00, respectively.

Fig. 1 Process of screening the articles to be included in this systematic review

Study characteristics

Table 1 provides a list of items to be considered in the data extraction process based on the Cochrane Handbook (Appendix 2) [1], the CONSORT statement [9], the STARD initiative [10], and the PICO [11], PECODR [12], and PIBOSO [13] frameworks. We provide the major group for each field and report which standard focused on that field. Finally, we report whether there was a published method to extract that field. Table 1 thus identifies the data elements relevant to the systematic review process, categorized by domain and by the standard from which each element was adopted, together with existing automation methods, where present.

Results of individual studies

Table 2 summarizes the existing information extraction studies. For each study, the table provides the citation (study: column 1), the data elements the study focused on (extracted elements: column 2), the dataset used (dataset: column 3), the algorithm and methods used for extraction (method: column 4), whether the study extracted only the sentence containing the data elements, the full concept, or neither (sentence/concept/neither: column 5), whether extraction was done from full text or abstracts (full text/abstract: column 6), and the main accuracy results reported by the system (results: column 7). The studies are arranged in order of increasing complexity: studies that classified sentences are listed before those that extracted concepts, and studies that extracted data from abstracts before those that extracted data from full-text reports.

The accuracy of most studies (N = 18, 69 %) was measured using a standard text mining metric known as the F-score, the harmonic mean of precision (positive predictive value) and recall (sensitivity). Some studies (N = 5, 19 %) reported only the precision of their method, while others (N = 2, 8 %) reported accuracy values. One study (4 %) reported P5 precision, the fraction of positive predictions among the top 5 results returned by the system.
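
For readers less familiar with these metrics, the short sketch below shows how precision, recall, and the F-score relate to counts of true positives, false positives, and false negatives; the counts used are purely illustrative.

```python
# Illustrative computation of precision, recall, and the F-score (F1).
def f_score(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)            # positive predictive value
    recall = tp / (tp + fn)               # sensitivity
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts: 80 correctly extracted elements,
# 20 spurious extractions, 10 missed elements.
p, r, f1 = f_score(tp=80, fp=20, fn=10)
print(f"precision={p:.2f} recall={r:.2f} F-score={f1:.2f}")
```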

Studies that did not implement a data extraction system

Dawes et al. [12] identified 20 evidence-based medicine journal synopses with 759 extracts in the corresponding PubMed abstracts. Annotators agreed with the identification of an element 85 % and 87 % of the time for the evidence-based medicine synopses and PubMed abstracts, respectively. After consensus among the annotators, agreement rose to 97 and 98 %, respectively. The authors proposed various lexical patterns and developed rules to discover each PECODR element from the PubMed abstracts and the corresponding evidence-based medicine journal synopses, which might make it possible to partially or fully automate the data extraction process.

Studies that identified sentences but did not extract data elements from abstracts only

Kim et al. [ 13 ] used conditional random fields (CRF) [ 15 ] for the task of classifying sentences in one of the PICO categories. The features were based on lexical, syntactic, structural, and sequential information in the data. The authors found that unigrams, section headings, and sequential information from preceding sentences were useful features for the classification task. They used 1000 medical abstracts from PIBOSO corpus and achieved micro-averaged F-scores of 91 and 67 % over datasets of structured and unstructured abstracts, respectively.
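
As a rough illustration of this kind of CRF-based sentence labelling (not Kim et al.'s implementation), the sketch below uses the sklearn-crfsuite package, treating each abstract as a sequence of sentences described by simple lexical, positional, and sequential features; the abstracts, features, and labels are invented.

```python
# Minimal sketch of CRF sentence labelling in the spirit of PICO sentence
# classification (not Kim et al.'s system). Requires: pip install sklearn-crfsuite
import sklearn_crfsuite

def sentence_features(sentences, i):
    """Lexical, structural, and sequential features for sentence i of one abstract."""
    tokens = sentences[i].lower().split()
    feats = {
        "position": str(i),                                   # structural information
        "first_token": tokens[0],
        "has_randomized": 1.0 if "randomized" in tokens else 0.0,
        "has_outcome": 1.0 if "outcome" in tokens else 0.0,
    }
    if i > 0:                                                  # sequential information
        feats["prev_first_token"] = sentences[i - 1].lower().split()[0]
    return feats

# Two toy "abstracts" with hypothetical PICO-style labels.
abstracts = [
    ["120 adults with asthma were recruited.",
     "Patients received inhaled corticosteroids or placebo.",
     "The primary outcome was symptom score at 12 weeks."],
    ["We enrolled 45 children with otitis media.",
     "The intervention was a 7-day antibiotic course."],
]
labels = [["P", "I", "O"], ["P", "I"]]

X = [[sentence_features(a, i) for i in range(len(a))] for a in abstracts]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```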

Boudin et al. [16] utilized a combination of multiple supervised classification techniques for detecting PICO elements in medical abstracts. They used features such as MeSH semantic types, word overlap with the title, and the number of punctuation marks, feeding them to random forest (RF), naive Bayes (NB), support vector machine (SVM), and multi-layer perceptron (MLP) classifiers. Using 26,000 abstracts from PubMed, the authors took the first sentence of each structured abstract and assigned a label automatically to build a large training dataset. They obtained an F-score of 86 % for identifying participants (P), 67 % for interventions (I) and controls (C), and 56 % for outcomes (O).
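
A combination of classifiers along these lines can be approximated with scikit-learn's VotingClassifier; the sketch below is a toy version that uses bag-of-words features rather than the MeSH-based features Boudin et al. describe, and the sentences and labels are invented.

```python
# Rough sketch of combining several classifiers for PICO sentence detection
# (features and data are placeholders, not those of Boudin et al.).
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

sentences = [
    "Sixty patients with type 2 diabetes were enrolled.",       # P
    "Participants received metformin or placebo.",              # I
    "The primary outcome was change in HbA1c at 24 weeks.",     # O
    "Mean HbA1c decreased by 0.8 % in the treatment arm.",      # O
]
y = ["P", "I", "O", "O"]

ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100)),
            ("nb", MultinomialNB()),
            ("svm", SVC()),
            ("mlp", MLPClassifier(max_iter=1000)),
        ],
        voting="hard",  # majority vote over the four classifiers
    ),
)
ensemble.fit(sentences, y)
print(ensemble.predict(["The intervention group received daily insulin."]))
```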

Huang et al. [17] used a naive Bayes classifier for the PICO classification task. The training data were generated automatically from structured abstracts. For instance, all sentences in the section of a structured abstract that started with the term “PATIENT” were used to identify participants (P). In this way, the authors generated a dataset of 23,472 sentences, with which they obtained an F-score of 91 % for identifying participants (P), 75 % for interventions (I), and 88 % for outcomes (O).
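
The idea of generating training labels automatically from the section headings of structured abstracts can be sketched as follows; the heading-to-label mapping, the abstracts, and the classifier configuration are hypothetical rather than taken from Huang et al.

```python
# Sketch of weak labelling from structured-abstract headings plus a naive
# Bayes sentence classifier (data invented for illustration).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical structured abstracts as (section heading, sentence) pairs.
structured_sentences = [
    ("PATIENTS", "We studied 200 adults with chronic migraine."),
    ("PATIENTS", "Eligible participants were aged 18 to 65 years."),
    ("INTERVENTION", "Subjects received topiramate 100 mg daily."),
    ("OUTCOMES", "The primary outcome was monthly headache days."),
]

heading_to_label = {"PATIENTS": "P", "INTERVENTION": "I", "OUTCOMES": "O"}

X = [sentence for heading, sentence in structured_sentences]
y = [heading_to_label[heading] for heading, sentence in structured_sentences]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X, y)
print(model.predict(["Patients were randomized to receive propranolol."]))
```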

Verbeke et al. [ 18 ] used a statistical relational learning-based approach (kLog) that utilized relational features for classifying sentences. The authors also used the PIBOSO corpus for evaluation and achieved micro-averaged F-score of 84 % on structured abstracts and 67 % on unstructured abstracts, which was a better performance than Kim et al. [ 13 ].

Huang et al. [ 19 ] used 19,854 structured extracts and trained two classifiers: one by taking the first sentences of each section (termed CF by the authors) and the other by taking all the sentences in each section (termed CA by the authors). The authors used the naive Bayes classifier and achieved F-scores of 74, 66, and 73 % for identifying participants (P), interventions (I), and outcomes (O), respectively, by the CF classifier. The CA classifier gave F-scores of 73, 73, and 74 % for identifying participants (P), interventions (I), and outcomes (O), respectively.

Hassanzadeh et al. [ 20 ] used the PIBOSO corpus for the identification of sentences with PIBOSO elements. Using conditional random fields (CRF) with discriminative set of features, they achieved micro-averaged F-score of 91 %.

Robinson [21] used four machine learning models, 1) support vector machines, 2) naive Bayes, 3) multinomial naive Bayes, and 4) logistic regression, to identify whether medical abstracts contained patient-oriented evidence, which included morbidity, mortality, symptom severity, and health-related quality of life. On a dataset of 1356 PubMed abstracts, the support vector machine model performed best, achieving an F-measure of 86 %.

Chung [ 22 ] utilized a full sentence parser to identify the descriptions of the assignment of treatment arms in clinical trials. The authors used predicate-argument structure along with other linguistic features with a maximum entropy classifier. They utilized 203 abstracts from randomized trials for training and 124 abstracts for testing and achieved an F-score of 76 %.

Hara and Matsumoto [23] dealt with the problem of extracting “patient population” and “compared treatments” from medical abstracts. Given a sentence from an abstract, the authors first performed base noun-phrase chunking and then categorized each base noun phrase into one of five classes, “disease”, “treatment”, “patient”, “study”, and “others”, using support vector machine and conditional random field models. After categorization, the authors used regular expressions to extract the target words for patient population and comparison. They used 200 abstracts including terms such as “neoplasms” and “clinical trial, phase III” and obtained 91 % accuracy for the task of noun phrase classification. For sentence classification, they obtained a precision of 80 % for patient population and 82 % for comparisons.

Studies that identified only sentences but did not extract data elements from full-text reports

Zhao et al. [24] used two classification tasks, one at the sentence level and another at the keyword level, to extract study data including patient details. The authors first used a five-class scheme, 1) patient, 2) result, 3) intervention, 4) study design, and 5) research goal, and classified sentences into one of these five classes. They then used six keyword classes: sex (e.g., male, female), age (e.g., 54-year-old), race (e.g., Chinese), condition (e.g., asthma), intervention, and study design (e.g., randomized trial). They utilized conditional random fields for the classification task. Using 19,893 medical abstracts and full-text articles from 17 journal websites, they achieved F-scores of 75 % for identifying patients, 61 % for intervention, 91 % for results, 79 % for study design, and 76 % for research goal.

Hsu et al. [ 25 ] attempted to classify whether a sentence contains the “hypothesis”, “statistical method”, “outcomes”, or “generalizability” of the study and then extracted the values. Using 42 full-text papers, the authors obtained F-scores of 86 % for identifying hypothesis, 84 % for statistical method, 90 % for outcomes, and 59 % for generalizability.

Song et al. [ 26 ] used machine learning-based classifiers such as maximum entropy classifier (MaxEnt), support vector machines (SVM), multi-layer perceptron (MLP), naive Bayes (NB), and radial basis function network (RBFN) to classify the sentences into categories such as analysis (statistical facts found by clinical experiment), general (generally accepted scientific facts, process, and methodology), recommendation (recommendations about interventions), and rule (guidelines). They utilized the principle of information gain (IG) as well as genetic algorithm (GA) for feature selection. They used 346 sentences from the clinical guideline document and obtained an F-score of 98 % for classifying sentences.

Marshall et al. [ 27 ] used soft-margin support vector machines in a joint model for risk of bias assessment along with supporting sentences for random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, among others. They utilized presence of unigrams in the supporting sentences as features in their model. Working with full text of 2200 clinical trials, the joint model achieved F-scores of 56, 48, 35, and 38 % for identifying sentences corresponding to random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, respectively.

Studies that identified data elements only from abstracts but not from full texts

Demner-Fushman and Lin [ 28 ] used a rule-based approach to identify sentences containing PICO. Using 275 manually annotated abstracts, the authors achieved an accuracy of 80 % for population extraction and 86 % for problem extraction. They also utilized a supervised classifier for outcome extraction and achieved accuracy from 64 to 95 % across various experiments.

Kelly and Yang [ 29 ] used regular expressions and gazetteer to extract the number of participants, participant age, gender, ethnicity, and study characteristics. The authors utilized 386 abstracts from PubMed obtained with the query “soy and cancer” and achieved F-scores of 96 % for identifying the number of participants, 100 % for age of participants, 100 % for gender of participants, 95 % for ethnicity of participants, 91 % for duration of study, and 87 % for health status of participants.
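
In greatly simplified form, a regular-expression-plus-gazetteer extractor of the kind Kelly and Yang describe might resemble the sketch below; the patterns and the gazetteer are illustrative and far smaller than anything used in practice.

```python
# Greatly simplified sketch of regex + gazetteer extraction of study
# characteristics (patterns and gazetteer invented for illustration).
import re

ETHNICITY_GAZETTEER = {"caucasian", "african american", "hispanic", "asian"}

def extract_characteristics(abstract: str) -> dict:
    text = abstract.lower()
    out = {}

    # Number of participants, e.g. "120 participants", "54 patients".
    m = re.search(r"\b(\d{1,6})\s+(?:participants|patients|subjects|women|men)\b", text)
    if m:
        out["n_participants"] = int(m.group(1))

    # Age phrases, e.g. "aged 18 to 65 years", "aged 54 years".
    m = re.search(r"aged?\s+(\d{1,3})(?:\s*(?:to|-)\s*(\d{1,3}))?\s*years", text)
    if m:
        out["age"] = m.group(0)

    # Ethnicity terms from a small gazetteer.
    out["ethnicities"] = sorted(t for t in ETHNICITY_GAZETTEER if t in text)
    return out

print(extract_characteristics(
    "We recruited 386 participants aged 40 to 75 years; most were Caucasian."
))
```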

Hansen et al. [ 30 ] used support vector machines [ 31 ] to extract number of trial participants from abstracts of the randomized control trials. The authors utilized features such as part-of-speech tag of the previous and next words and whether the sentence is grammatically complete (contained a verb). Using 233 abstracts from PubMed, they achieved an F-score of 86 % for identifying participants.
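
Feature engineering of this sort (part-of-speech tags of neighbouring words plus a check for grammatical completeness) can be illustrated with a small function that builds one feature dictionary per numeric token, which would then be fed to a classifier such as an SVM. The example below uses NLTK and invented text, and is not Hansen et al.'s implementation.

```python
# Sketch of candidate-number features in the spirit of Hansen et al.
# (not their implementation). Requires: pip install nltk, plus the POS
# tagger data, e.g. python -m nltk.downloader averaged_perceptron_tagger
# (resource names can vary across NLTK versions).
import nltk

def number_candidate_features(sentence: str):
    """Return one feature dict per integer token in the sentence."""
    tokens = sentence.split()                 # crude whitespace tokenization
    tagged = nltk.pos_tag(tokens)
    has_verb = any(tag.startswith("VB") for _, tag in tagged)  # grammatically complete?
    candidates = []
    for i, (tok, tag) in enumerate(tagged):
        if tok.isdigit():
            candidates.append({
                "value": int(tok),
                "prev_pos": tagged[i - 1][1] if i > 0 else "BOS",
                "next_pos": tagged[i + 1][1] if i + 1 < len(tagged) else "EOS",
                "sentence_has_verb": has_verb,
            })
    return candidates

print(number_candidate_features(
    "A total of 233 patients were randomized to two treatment arms."
))
```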

Xu et al. [32] utilized text classification augmented with hidden Markov models [33] to identify sentences about subject demographics. These sentences were then parsed to extract information regarding participant descriptors (e.g., men, healthy, elderly), number of trial participants, disease/symptom name, and disease/symptom descriptors. After testing on 250 RCT abstracts, the authors obtained accuracies of 83 % for participant descriptors, 93 % for number of trial participants, 51 % for diseases/symptoms, and 92 % for descriptors of diseases/symptoms.

Summerscales et al. [ 34 ] used a conditional random field-based approach to identify various named entities such as treatments (drug names or complex phrases) and outcomes. The authors extracted 100 abstracts of randomized trials from the BMJ and achieved F-scores of 49 % for identifying treatment, 82 % for groups, and 54 % for outcomes.

Summerscales et al. [ 35 ] also proposed a method for automatic summarization of results from the clinical trials. The authors first identified the sentences that contained at least one integer (group size, outcome numbers, etc.). They then used the conditional random field classifier to find the entity mentions corresponding to treatment groups or outcomes. The treatment groups, outcomes, etc. were then treated as various “events.” To identify all the relevant information for these events, the authors utilized templates with slots. The slots were then filled using a maximum entropy classifier. They utilized 263 abstracts from the BMJ and achieved F-scores of 76 % for identifying groups, 42 % for outcomes, 80 % for group sizes, and 71 % for outcome numbers.

Studies that identified data elements from full-text reports

Kiritchenko et al. [36] developed ExaCT, a tool that assists users with locating and extracting key trial characteristics such as eligibility criteria, sample size, drug dosage, and primary outcomes from full-text journal articles. The authors utilized a text classifier in the first stage to recover the relevant sentences and, in the next stage, applied extraction rules to find the correct solutions. They evaluated the system using 50 full-text articles describing randomized trials, with 1050 test instances, and achieved a P5 precision of 88 % for the sentence classifier. The precision and recall of their extraction rules were 93 and 91 %, respectively.
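
The two-stage architecture described for ExaCT, a sentence classifier followed by extraction rules, can be sketched generically as below; the classifier, the single rule, and the example sentences are placeholders rather than the authors' system.

```python
# Generic sketch of a two-stage extraction pipeline (sentence classifier,
# then rule-based extraction); not ExaCT itself.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Stage 1: classify sentences as containing a sample-size statement or not.
train_sentences = [
    "A total of 120 patients were enrolled.",
    "The trial randomized 58 participants.",
    "Adverse events were mild and transient.",
    "Funding was provided by a national grant.",
]
train_labels = ["sample_size", "sample_size", "other", "other"]

sentence_classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
sentence_classifier.fit(train_sentences, train_labels)

# Stage 2: apply an extraction rule to sentences flagged by stage 1.
SAMPLE_SIZE_RULE = re.compile(
    r"\b(\d{1,6})\s+(?:patients|participants)\b|"
    r"\b(?:enrolled|randomized)\s+(\d{1,6})\b", re.I)

def extract_sample_size(article_sentences):
    for s in article_sentences:
        if sentence_classifier.predict([s])[0] == "sample_size":
            m = SAMPLE_SIZE_RULE.search(s)
            if m:
                return int(next(g for g in m.groups() if g))
    return None

print(extract_sample_size([
    "Background: diabetes is common.",
    "We enrolled 86 participants across three sites.",
]))
```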

Restificar et al. [ 37 ] utilized latent Dirichlet allocation [ 38 ] to infer the latent topics in the sample documents and then used logistic regression to compute the probability that a given candidate criterion belongs to a particular topic. Using 44,203 full-text reports of randomized trials, the authors achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.
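
This combination of topic modelling and classification can be roughly approximated with scikit-learn's LatentDirichletAllocation feeding a logistic regression; the toy eligibility criteria below stand in for the authors' corpus of trial reports.

```python
# Rough sketch: LDA topic features feeding a logistic-regression classifier
# (toy data, not the corpus used by Restificar and Ananiadou).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

criteria = [
    "adults aged 18 years or older with confirmed type 2 diabetes",
    "patients with HbA1c above 7.5 percent at screening",
    "pregnant or breastfeeding women",
    "severe renal impairment or dialysis",
]
labels = ["inclusion", "inclusion", "exclusion", "exclusion"]

model = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=2, random_state=0),  # latent topics
    LogisticRegression(max_iter=1000),  # P(criterion type | topic mixture)
)
model.fit(criteria, labels)
print(model.predict(["history of myocardial infarction in the last 6 months"]))
```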

Lin et al. [ 39 ] used linear-chain conditional random field for extracting various metadata elements such as number of patients, age group of the patients, geographical area, intervention, and time duration of the study. Using 93 full-text articles, the authors achieved a threefold cross validation precision of 43 % for identifying number of patients, 63 % for age group, 44 % for geographical area, 40 % for intervention, and 83 % for time period.

De Bruijn et al. [40] first used a support vector machine classifier to identify sentences describing information elements such as eligibility criteria and sample size. The authors then used manually crafted weak extraction rules to extract the various information elements. Testing this two-stage architecture on 88 randomized trial reports, they obtained a precision of 69 % for identifying eligibility criteria, 62 % for sample size, 94 % for treatment duration, 67 % for intervention, 100 % for primary outcome estimates, and 67 % for secondary outcomes.

Zhu et al. [ 41 ] also used manually crafted rules to extract various subject demographics such as disease, age, gender, and ethnicity. The authors tested their method on 50 articles and for disease extraction obtained an F-score of 64 and 85 % for exactly matched and partially matched cases, respectively.

Risk of bias across studies

In general, many studies have a high risk of selection bias because the gold standards used were not randomly selected. The risk of performance bias is also likely to be high because the investigators were not blinded. For the systems that used rule-based approaches, it was unclear whether the gold standard was used to train the rules or whether there was a separate training set. The risk of attrition bias is unclear given the design of these non-randomized studies evaluating the performance of NLP methods. Lastly, the risk of reporting bias is unclear because of the lack of protocols for the development, implementation, and evaluation of NLP methods.

Summary of evidence

Extracting the data elements.

Participants — Sixteen studies explored the extraction of the number of participants [ 12 , 13 , 16 – 20 , 23 , 24 , 28 – 30 , 32 , 39 ], their age [ 24 , 29 , 39 , 41 ], sex [ 24 , 39 ], ethnicity [ 41 ], country [ 24 , 39 ], comorbidities [ 21 ], spectrum of presenting symptoms, current treatments, and recruiting centers [ 21 , 24 , 28 , 29 , 32 , 41 ], and date of study [ 39 ]. Among them, only six studies [ 28 – 30 , 32 , 39 , 41 ] extracted data elements as opposed to highlighting the sentence containing the data element. Unfortunately, each of these studies used a different corpus of reports, which makes direct comparisons impossible. For example, Kelly and Yang [ 29 ] achieved high F-scores of 100 % for age of participants, 91 % for duration of study, 95 % for ethnicity of participants, 100 % for gender of subjects, 87 % for health status of participants, and 96 % for number of participants on a dataset of 386 abstracts.

Intervention — Thirteen studies explored the extraction of interventions [ 12 , 13 , 16 – 20 , 22 , 24 , 28 , 34 , 39 , 40 ], intervention groups [ 34 , 35 ], and intervention details (for replication if feasible) [ 36 ]. Of these, only six studies [ 28 , 34 – 36 , 39 , 40 ] extracted intervention elements. Unfortunately again, each of these studies used a different corpus. For example, Kiritchenko et al. [ 36 ] achieved an F-score of 75–86 % for intervention data elements on a dataset of 50 full-text journal articles.

Outcomes and comparisons — Fourteen studies explored the extraction of outcomes and the time points of their collection and reporting [12, 13, 16–20, 24, 25, 28, 34–36, 40] and the extraction of comparisons [12, 16, 22, 23]. Of these, only six studies [28, 34–36, 40] extracted the actual data elements. For example, De Bruijn et al. [40] obtained an F-score of 100 % for extracting primary outcomes and 67 % for secondary outcomes from 88 full-text articles. Summerscales et al. [35] utilized 263 abstracts from the BMJ and achieved an F-score of 42 % for extracting outcomes.

Results — Two studies [36, 40] extracted the sample size data element from full text, using two different datasets. De Bruijn et al. [40] obtained an accuracy of 67 %, and Kiritchenko et al. [36] achieved an F-score of 88 %.

Interpretation — Three studies explored extraction of overall evidence [ 26 , 42 ] and external validity of trial findings [ 25 ]. However, all these studies only highlighted sentences containing the data elements relevant to interpretation.

Objectives — Two studies [24, 25] explored the extraction of research questions and hypotheses. However, both studies only highlighted the sentences containing these data elements.

Methods — Twelve studies explored the extraction of the study design [ 13 , 18 , 20 , 24 ], study duration [ 12 , 29 , 40 ], randomization method [ 25 ], participant flow [ 36 , 37 , 40 ], and risk of bias assessment [ 27 ]. Of these, only four studies [ 29 , 36 , 37 , 40 ] extracted the corresponding data elements from text using different sets of corpora. For example, Restificar et al. [ 37 ] utilized 44,203 full-text clinical trial articles and achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.

Miscellaneous — One study [ 26 ] explored extraction of key conclusion sentence and achieved a high F-score of 98 %.

Related reviews and studies

Previous reviews on the automation of systematic review processes describe technologies for automating the overall process or other steps. Tsafnat et al. [ 43 ] surveyed the informatics systems that automate some of the tasks of systematic review and report systems for each stage of systematic review. Here, we focus on data extraction. None of the existing reviews [ 43 – 47 ] focus on the data extraction step. For example, Tsafnat et al. [ 43 ] presented a review of techniques to automate various aspects of systematic reviews, and while data extraction has been described as a task in their review, they only highlighted three studies as an acknowledgement of the ongoing work. In comparison, we identified 26 studies and critically examined their contribution in relation to all the data elements that need to be extracted to fully support the data extraction step.

Thomas et al. [ 44 ] described the application of text mining technologies such as automatic term recognition, document clustering, classification, and summarization to support the identification of relevant studies in systematic reviews. The authors also pointed out the potential of these technologies to assist at various stages of the systematic review. Slaughter et al. [ 45 ] discussed necessary next steps towards developing “living systematic reviews” rather than a static publication, where the systematic reviews can be continuously updated with the latest knowledge available. The authors mentioned the need for development of new tools for reporting on and searching for structured data from clinical trials.

Tsafnat et al. [46] described four main tasks in systematic review: identifying the relevant studies, evaluating the risk of bias in selected trials, synthesizing the evidence, and publishing the systematic review by generating human-readable text from trial reports. They mentioned text extraction algorithms for risk of bias assessment and evidence synthesis but remained limited to one particular method for extracting PICO elements.

Most natural language processing research has focused on reducing the workload for the screening step of systematic reviews (Step 3). Wallace et al. [ 48 , 49 ] and Miwa et al. [ 50 ] proposed an active learning framework to reduce the workload in citation screening for inclusion in the systematic reviews. Jonnalagadda et al. [ 51 ] designed a distributional semantics-based relevance feedback model to semi-automatically screen citations. Cohen et al. [ 52 ] proposed a module for grouping studies that are closely related and an automated system to rank publications according to the likelihood for meeting the inclusion criteria of a systematic review. Choong et al. [ 53 ] proposed an automated method for automatic citation snowballing to recursively pursue relevant literature for helping in evidence retrieval for systematic reviews. Cohen et al. [ 54 ] constructed a voting perceptron-based automated citation classification system to classify each article as to whether it contains high-quality, drug-specific evidence. Adeva et al. [ 55 ] also proposed a classification system for screening articles for systematic review. Shemilt et al. [ 56 ] also discussed the use of text mining to reduce screening workload in systematic reviews.

Research implications

No common gold standard or dataset

Among the 26 studies included in this systematic review, only three used a common corpus, namely the 1000 medical abstracts of the PIBOSO corpus. Unfortunately, even that corpus supports only the classification of sentences according to whether they contain one of the data elements corresponding to the PIBOSO categories. No other two studies shared the same gold standard or dataset for evaluation, which made it impossible for us to compare and assess the relative significance of the reported accuracy measures.

Separate systems for each data element

A few data elements that are relatively straightforward to extract automatically, such as the total number of participants (explored by 14 studies overall, five of which extracted the actual data element), have attracted a comparatively large number of studies; this is not the case for other data elements. Of 52 potential data elements, 27 have not been explored for automated extraction even at the level of highlighting the sentences that contain them, and seven more were explored by only one study. For extraction of the actual data elements, 38 of the 52 potential data elements (>70 %) have not been explored at all, and three more were explored by only one study. The highest number of data elements extracted by a single study is only seven (14 %). These findings mean not only that more studies are needed to explore the remaining 70 % of data elements, but also that there is an urgent need for a unified framework or system to extract all necessary data elements. The current state of informatics research for data extraction is exploratory, and multiple studies need to be conducted on the same gold standard and on the same data elements to allow meaningful comparison.

Limitations

Our study has limitations. First, there is a possibility that data extraction algorithms were not published in journals or that our search might have missed them. We sought to minimize this limitation by searching in multiple bibliographic databases, including PubMed, IEEExplore, and ACM Digital Library. However, investigators may have also failed to publish algorithms that had lower F-scores than were previously reported, which we would not have captured. Second, we did not publish a protocol a priori, and our initial findings may have influenced our methods. However, we performed key steps, including screening, full-text review, and data extraction in duplicate to minimize potential bias in our systematic review.

Future work

“On demand” access to summarized evidence and best practices has been considered a sound strategy to satisfy clinicians’ information needs and enhance decision-making [57–65]. A systematic review of 26 studies concluded that information-retrieval technology has a positive impact on physicians in terms of decision enhancement, learning, recall, reassurance, and confirmation [62]. Slaughter et al. [45] discussed necessary next steps towards developing “living systematic reviews”, which can be continuously updated with the latest available knowledge rather than remaining static publications, and noted the need for new tools for reporting on and searching for structured data from the published literature. Automated information extraction frameworks that extract data elements have the potential to assist systematic reviewers and, eventually, to automate the screening and data extraction steps.

Medical knowledge is being created at a rapid pace, with roughly 75 clinical trials published every day [66]. Evidence-based medicine [67] requires clinicians to keep up with published scientific studies and use them at the point of care. However, it has been shown that doing so is practically impossible even within a narrow specialty [68]. A critical barrier is that finding relevant information, which may be spread across several documents, takes an amount of time and cognitive effort that is incompatible with the busy clinical workflow [69, 70]. Rapid systematic reviews using automation technologies would provide clinicians with up-to-date, systematic summaries of the latest evidence.

Our systematic review describes previously reported methods to identify sentences containing some of the data elements needed for systematic reviews, and the few studies that have reported methods to extract these data elements. However, most of the data elements that would need to be considered for systematic reviews have been insufficiently explored to date, which identifies a major scope for future work. We hope that these automated extraction approaches might first act as checks on the manual data extraction currently performed in duplicate; then serve to validate manual data extraction done by a single reviewer; then become the primary source for data element extraction, validated by a human; and eventually completely automate data extraction to enable living systematic reviews.

Abbreviations

NLP: natural language processing
CONSORT: CONsolidated Standards Of Reporting Trials
STARD: Standards for Reporting of Diagnostic Accuracy
PICO: Population, Intervention, Comparison, Outcomes
PECODR: Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results
PIBOSO: Population, Intervention, Background, Outcome, Study Design, Other
CRF: conditional random fields
NB: naive Bayes
RCT: randomized controlled trial
BMJ: British Medical Journal

References

Higgins J, Green S. Cochrane handbook for systematic reviews of interventions version 5.1.0 [updated March 2011]. The Cochrane Collaboration. 2011. Available at [ http://community.cochrane.org/handbook ]

Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J. Undertaking systematic reviews of research on effectiveness: CRD’s guidance for carrying out or commissioning reviews, NHS Centre for Reviews and Dissemination. 2001.


Woolf SH. Manual for conducting systematic reviews, Agency for Health Care Policy and Research. 1996.

Field MJ, Lohr KN. Clinical practice guidelines: directions for a new program, Clinical Practice Guidelines. 1990.

Elliott J, Turner T, Clavisi O, Thomas J, Higgins J, Mavergames C, et al. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med. 2014;11:e1001603.


Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med. 2007;147(4):224–33.


Hearst MA. Untangling text data mining. Proceedings of the 37th annual meeting of the Association for Computational Linguistics. College Park, Maryland: Association for Computational Linguistics; 1999. p. 3–10.

Morton S, Levit L, Berg A, Eden J. Finding what works in health care: standards for systematic reviews. Washington D.C.: National Academies Press; 2011. Available at [ http://www.nap.edu/catalog/13059/finding-what-works-in-health-care-standards-for-systematic-reviews ]

Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA. 1996;276(8):637–9.


Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Chem Lab Med. 2003;41(1):68–73. doi: 10.1515/CCLM.2003.012 .

Richardson WS, Wilson MC, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions. ACP J Club. 1995;123(3):A12–3.


Dawes M, Pluye P, Shea L, Grad R, Greenberg A, Nie J-Y. The identification of clinically important elements within medical journal abstracts: Patient–Population–Problem, Exposure–Intervention, Comparison, Outcome, Duration and Results (PECODR). Inform Prim Care. 2007;15(1):9–16.


Kim S, Martinez D, Cavedon L, Yencken L. Automatic classification of sentences to support evidence based medicine. BMC Bioinform. 2011;12 Suppl 2:S5.


Whiting P, Rutjes AWS, Reitsma JB, Bossuyt PMM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3(1):25.

Lafferty J, McCallum A, Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data, Proceedings of the Eighteenth International Conference on Machine Learning. 2001. p. 282–9.

Boudin F, Nie JY, Bartlett JC, Grad R, Pluye P, Dawes M. Combining classifiers for robust PICO element detection. BMC Med Inform Decis Mak. 2010;10:29. doi: 10.1186/1472-6947-10-29 .

Huang K-C, Liu C-H, Yang S-S, Liao C-C, Xiao F, Wong J-M, et al, editors. Classification of PICO elements by text features systematically extracted from PubMed abstracts. Granular Computing (GrC), 2011 IEEE International Conference on; 2011: IEEE.

Verbeke M, Van Asch V, Morante R, Frasconi P, Daelemans W, De Raedt L, editors. A statistical relational learning approach to identifying evidence based medicine categories. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning; 2012: Association for Computational Linguistics.

Huang K-C, Chiang IJ, Xiao F, Liao C-C, Liu CC-H, Wong J-M. PICO element detection in medical text without metadata: are first sentences enough? J Biomed Inform. 2013;46(5):940–6.

Hassanzadeh H, Groza T, Hunter J. Identifying scientific artefacts in biomedical literature: the evidence based medicine use case. J Biomed Inform. 2014;49:159–70.

Robinson DA. Finding patient-oriented evidence in PubMed abstracts. Athens: University of Georgia; 2012.

Chung GY-C. Towards identifying intervention arms in randomized controlled trials: extracting coordinating constructions. J Biomed Inform. 2009;42(5):790–800.

Hara K, Matsumoto Y. Extracting clinical trial design information from MEDLINE abstracts. N Gener Comput. 2007;25(3):263–75.

Zhao J, Bysani P, Kan MY. Exploiting classification correlations for the extraction of evidence-based practice information. AMIA Annu Symp Proc. 2012;2012:1070–8.


Hsu W, Speier W, Taira R. Automated extraction of reported statistical analyses: towards a logical representation of clinical trial literature. AMIA Annu Symp Proc. 2012;2012:350–9.

Song MH, Lee YH, Kang UG. Comparison of machine learning algorithms for classification of the sentences in three clinical practice guidelines. Healthcare Informatics Res. 2013;19(1):16–24.

Marshall IJ, Kuiper J, Wallace BC, editors. Automating risk of bias assessment for clinical trials. Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics; 2014: ACM.

Demner-Fushman D, Lin J. Answering clinical questions with knowledge-based and statistical techniques. Comput Linguist. 2007;33(1):63–103.

Kelly C, Yang H. A system for extracting study design parameters from nutritional genomics abstracts. J Integr Bioinform. 2013;10(2):222. doi: 10.2390/biecoll-jib-2013-222 .

Hansen MJ, Rasmussen NO, Chung G. A method of extracting the number of trial participants from abstracts describing randomized controlled trials. J Telemed Telecare. 2008;14(7):354–8. doi: 10.1258/jtt.2008.007007 .

Joachims T. Text categorization with support vector machines: learning with many relevant features, Machine Learning: ECML-98, Tenth European Conference on Machine Learning. 1998. p. 137–42.

Xu R, Garten Y, Supekar KS, Das AK, Altman RB, Garber AM. Extracting subject demographic information from abstracts of randomized clinical trial reports. 2007.

Eddy SR. Hidden Markov models. Curr Opin Struct Biol. 1996;6(3):361–5.

Summerscales RL, Argamon S, Hupert J, Schwartz A. Identifying treatments, groups, and outcomes in medical abstracts. The Sixth Midwest Computational Linguistics Colloquium (MCLC 2009). 2009.

Summerscales R, Argamon S, Bai S, Huperff J, Schwartzff A. Automatic summarization of results from clinical trials, the 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2011. p. 372–7.

Kiritchenko S, de Bruijn B, Carini S, Martin J, Sim I. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Med Inform Decis Mak. 2010;10:56.

Restificar A, Ananiadou S. Inferring appropriate eligibility criteria in clinical trial protocols without labeled data, Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics. 2012. ACM.

Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3(4–5):993–1022.

Lin S, Ng J-P, Pradhan S, Shah J, Pietrobon R, Kan M-Y, editors. Extracting formulaic and free text clinical research articles metadata using conditional random fields. Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents; 2010: Association for Computational Linguistics.

De Bruijn B, Carini S, Kiritchenko S, Martin J, Sim I, editors. Automated information extraction of key trial design elements from clinical trial publications. AMIA Annual Symposium Proceedings; 2008: American Medical Informatics Association.

Zhu H, Ni Y, Cai P, Qiu Z, Cao F. Automatic extracting of patient-related attributes: disease, age, gender and race. Stud Health Technol Inform. 2011;180:589–93.

Davis-Desmond P, Mollá D, editors. Detection of evidence in clinical research papers. Proceedings of the Fifth Australasian Workshop on Health Informatics and Knowledge Management-Volume 129; 2012: Australian Computer Society, Inc.

Tsafnat G, Glasziou P, Choong M, Dunn A, Galgani F, Coiera E. Systematic review automation technologies. Syst Rev. 2014;3(1):74.

Thomas J, McNaught J, Ananiadou S. Applications of text mining within systematic reviews. Res Synthesis Methods. 2011;2(1):1–14.

Slaughter L, Berntsen CF, Brandt L, Mavergames C. Enabling living systematic reviews and clinical guidelines through semantic technologies. D-Lib Magazine. 2015;21(1/2). Available at [ http://www.dlib.org/dlib/january15/slaughter/01slaughter.html ]

Tsafnat G, Dunn A, Glasziou P, Coiera E. The automation of systematic reviews. BMJ. 2013;346:f139.

O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015;4(1):5.

Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinformatics. 2010;11(1):55.

Wallace BC, Small K, Brodley CE, Trikalinos TA, editors. Active learning for biomedical citation screening. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining; 2010: ACM.

Miwa M, Thomas J, O’Mara-Eves A, Ananiadou S. Reducing systematic review workload through certainty-based screening. J Biomed Inform. 2014;51:242–53.

Jonnalagadda S, Petitti D. A new iterative method to reduce workload in systematic review process. Int J Comput Biol Drug Des. 2013;6(1–2):5–17. doi: 10.1504/IJCBDD.2013.052198 .

Cohen A, Adams C, Davis J, Yu C, Yu P, Meng W, et al. Evidence-based medicine, the essential role of systematic reviews, and the need for automated text mining tools. Proceedings of the 1st ACM International Health Informatics Symposium. 2010:376–80.

Choong MK, Galgani F, Dunn AG, Tsafnat G. Automatic evidence retrieval for systematic reviews. J Med Inter Res. 2014;16(10):e223.

Cohen AM, Hersh WR, Peterson K, Yen P-Y. Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc. 2006;13(2):206–19.


García Adeva JJ, Pikatza Atxa JM, Ubeda Carrillo M, Ansuategi ZE. Automatic text classification to support systematic reviews in medicine. Expert Syst Appl. 2014;41(4):1498–508.

Shemilt I, Simon A, Hollands GJ, Marteau TM, Ogilvie D, O’Mara‐Eves A, et al. Pinpointing needles in giant haystacks: use of text mining to reduce impractical screening workload in extremely large scoping reviews. Res Synthesis Methods. 2014;5(1):31–49.

Cullen RJ. In search of evidence: family practitioners’ use of the Internet for clinical information. J Med Libr Assoc. 2002;90(4):370–9.

Hersh WR, Hickam DH. How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review. JAMA. 1998;280(15):1347–52.

Lucas BP, Evans AT, Reilly BM, Khodakov YV, Perumal K, Rohr LG, et al. The impact of evidence on physicians’ inpatient treatment decisions. J Gen Intern Med. 2004;19(5 Pt 1):402–9. doi: 10.1111/j.1525-1497.2004.30306.x .

Magrabi F, Coiera EW, Westbrook JI, Gosling AS, Vickland V. General practitioners’ use of online evidence during consultations. Int J Med Inform. 2005;74(1):1–12. doi: 10.1016/j.ijmedinf.2004.10.003 .

McColl A, Smith H, White P, Field J. General practitioner’s perceptions of the route to evidence based medicine: a questionnaire survey. BMJ. 1998;316(7128):361–5.

Pluye P, Grad RM, Dunikowski LG, Stephenson R. Impact of clinical information-retrieval technology on physicians: a literature review of quantitative, qualitative and mixed methods studies. Int J Med Inform. 2005;74(9):745–68. doi: 10.1016/j.ijmedinf.2005.05.004 .

Rothschild JM, Lee TH, Bae T, Bates DW. Clinician use of a palmtop drug reference guide. J Am Med Inform Assoc. 2002;9(3):223–9.

Rousseau N, McColl E, Newton J, Grimshaw J, Eccles M. Practice based, longitudinal, qualitative interview study of computerised evidence based guidelines in primary care. BMJ. 2003;326(7384):314.

Westbrook JI, Coiera EW, Gosling AS. Do online information retrieval systems help experienced clinicians answer clinical questions? J Am Med Inform Assoc. 2005;12(3):315–21. doi: 10.1197/jamia.M1717 .

Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7(9):e1000326. doi: 10.1371/journal.pmed.1000326 .

Lau J. Evidence-based medicine and meta-analysis: getting more out of the literature. In: Greenes RA, editor. Clinical decision support: the road ahead. 2007. p. 249.

Fraser AG, Dunstan FD. On the impossibility of being expert. BMJ (Clinical Res). 2010;341:c6815.

Ely JW, Osheroff JA, Chambliss ML, Ebell MH, Rosenbaum ME. Answering physicians’ clinical questions: obstacles and potential solutions. J Am Med Inform Assoc. 2005;12(2):217–24. doi: 10.1197/jamia.M1608 .

Ely JW, Osheroff JA, Maviglia SM, Rosenbaum ME. Patient-care questions that physicians are unable to answer. J Am Med Inform Assoc. 2007;14(4):407–14. doi: 10.1197/jamia.M2398 .


Author information

Authors and affiliations.

Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 North Lake Shore Drive, 11th Floor, Chicago, IL, 60611, USA

Siddhartha R. Jonnalagadda

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India

Pawan Goyal

Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, USA

Mark D. Huffman


Corresponding author

Correspondence to Siddhartha R. Jonnalagadda .

Additional information

Competing interests.

The authors declare that they have no competing interests.

Authors’ contributions

SRJ and PG had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design were done by SRJ. SRJ, PG, and MDH did the acquisition, analysis, or interpretation of data. SRJ and PG drafted the manuscript. SRJ, PG, and MDH did the critical revision of the manuscript for important intellectual content. SRJ obtained funding. PG and SRJ provided administrative, technical, or material support. SRJ did the study supervision. All authors read and approved the final manuscript.

Funding/Support

This project was partly supported by the National Library of Medicine (grant 5R00LM011389). The Cochrane Heart Group US Satellite at Northwestern University is supported by an intramural grant from the Northwestern University Feinberg School of Medicine.

Role of the sponsors

The funding source had no role in the design and conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine.

Additional contributions

Mark Berendsen (Research Librarian, Galter Health Sciences Library, Northwestern University Feinberg School of Medicine) provided insights on the design of this study, including the search strategies, and Dr. Kalpana Raja (Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine) reviewed the manuscript. None of them received compensation for their contributions.

Appendix 1: Search strategies

Below, we provide the search strategies used in PubMed, IEEExplore, and the ACM Digital Library. The search was conducted on January 6, 2015.

PubMed

(“identification” [Title] OR “extraction” [Title] OR “extracting” [Title] OR “detection” [Title] OR “identifying” [Title] OR “summarization” [Title] OR “learning approach” [Title] OR “automatically” [Title] OR “summarization” [Title] OR “identify sections” [Title] OR “learning algorithms” [Title] OR “Interpreting” [Title] OR “Inferring” [Title] OR “Finding” [Title] OR “classification” [Title]) AND (“medical evidence”[Title] OR “PICO”[Title] OR “PECODR” [Title] OR “intervention arms” [Title] OR “experimental methods” [Title] OR “study design parameters” [Title] OR “Patient oriented Evidence” [Title] OR “eligibility criteria” [Title] OR “clinical trial characteristics” [Title] OR “evidence based medicine” [Title] OR “clinically important elements” [Title] OR “evidence based practice” [Title] “results from clinical trials” [Title] OR “statistical analyses” [Title] OR “research results” [Title] OR “clinical evidence” [Title] OR “Meta Analysis” [Title] OR “Clinical Research” [Title] OR “medical abstracts” [Title] OR “clinical trial literature” [Title] OR ”clinical trial characteristics” [Title] OR “clinical trial protocols” [Title] OR “clinical practice guidelines” [Title]).

IEEExplore

We performed this search only in the metadata.

(“identification” OR “extraction” OR “extracting” OR “detection” OR “Identifying” OR “summarization” OR “learning approach” OR “automatically” OR “summarization” OR “identify sections” OR “learning algorithms” OR “Interpreting” OR “Inferring” OR “Finding” OR “classification”) AND (“medical evidence” OR “PICO” OR “intervention arms” OR “experimental methods” OR “eligibility criteria” OR “clinical trial characteristics” OR “evidence based medicine” OR “clinically important elements” OR “results from clinical trials” OR “statistical analyses” OR “clinical evidence” OR “Meta Analysis” OR “clinical research” OR “medical abstracts” OR “clinical trial literature” OR “clinical trial protocols”).

ACM Digital Library

((Title: “identification” or Title: “extraction” or Title: “extracting” or Title: “detection” or Title: “Identifying” or Title: “summarization” or Title: “learning approach” or Title: “automatically” or Title: “summarization “or Title: “identify sections” or Title: “learning algorithms” or Title: “scientific artefacts” or Title: “Interpreting” or Title: “Inferring” or Title: “Finding” or Title: “classification” or “statistical techniques”) and (Title: “medical evidence” or Abstract: “medical evidence” or Title: “PICO” or Abstract: “PICO” or Title: “intervention arms” or Title: “experimental methods” or Title: “study design parameters” or Title: “Patient oriented Evidence” or Abstract: “Patient oriented Evidence” or Title: “eligibility criteria” or Abstract: “eligibility criteria” or Title: “clinical trial characteristics” or Abstract: “clinical trial characteristics” or Title: “evidence based medicine” or Abstract: “evidence based medicine” or Title: “clinically important elements” or Title: “evidence based practice” or Title: “treatments” or Title: “groups” or Title: “outcomes” or Title: “results from clinical trials” or Title: “statistical analyses” or Abstract: “statistical analyses” or Title: “research results” or Title: “clinical evidence” or Abstract: “clinical evidence” or Title: “Meta Analysis” or Abstract:“Meta Analysis” or Title:“Clinical Research” or Title: “medical abstracts” or Title: “clinical trial literature” or Title: “Clinical Practice” or Title: “clinical trial protocols” or Abstract: “clinical trial protocols” or Title: “clinical questions” or Title: “clinical trial design”)).

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


Cite this article.

Jonnalagadda, S.R., Goyal, P. & Huffman, M.D. Automating data extraction in systematic reviews: a systematic review. Syst Rev 4 , 78 (2015). https://doi.org/10.1186/s13643-015-0066-7


Received : 20 March 2015

Accepted : 21 May 2015

Published : 15 June 2015

DOI : https://doi.org/10.1186/s13643-015-0066-7


Keywords: Support Vector Machine; Data Element; Conditional Random Field; PubMed Abstract; Systematic Review Process


  • Mayo Clinic Libraries
  • Systematic Reviews
  • Data Extraction

Systematic Reviews: Data Extraction

  • Knowledge Synthesis Comparison
  • Knowledge Synthesis Decision Tree
  • Standards & Reporting Results
  • Materials in the Mayo Clinic Libraries
  • Training Resources
  • Review Teams
  • Develop & Refine Your Research Question
  • Develop a Timeline
  • Project Management
  • Communication
  • PRISMA-P Checklist
  • Eligibility Criteria
  • Register your Protocol
  • Other Resources
  • Other Screening Tools
  • Grey Literature Searching
  • Citation Searching
  • Data Extraction Tools
  • Minimize Bias
  • Critical Appraisal by Study Design
  • Synthesis & Meta-Analysis
  • Publishing your Systematic Review

Data Collection


This stage of the systematic review process involves transcribing information from each study using a structured, piloted format designed to capture the relevant details consistently and objectively. Two reviewers working independently are preferred for accuracy. Data must be managed transparently and kept available for future updates of the systematic review and for data sharing. A sampling of data collection tools is listed here; a hypothetical extraction template covering typical elements is sketched after the list below.

Data Extraction Elements:

  • Consider your research question components and objectives
  • Consider study inclusion / exclusion criteria
  • Full citation 
  • Intervention
  • Study Design and methodology
  • Participant characteristics
  • Outcome measures
  • Study quality factors

Consult Cochrane Interactive Learning Module 4: Selecting Studies and Collecting Data for further information.  *Please note you will need to register for a Cochrane account while initially on the Mayo network. You'll receive an email message containing a link to create a password and activate your account.*

References & Recommended Reading

1. Li T, Higgins JPT, Deeks JJ. Collecting data. In: Higgins J, Thomas J, Chandler J, et al, eds. Cochrane Handbook for Systematic Reviews of Interventions. Version 6.2. Cochrane; 2021: chap 5. https://training.cochrane.org/handbook/current/chapter-05

2. Page MJ, Moher D, Bossuyt PM, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372:n160. doi: https://dx.doi.org/10.1136/bmj.n160

   See Item 9 – Data Collection Process and Item 10 – Data Items

3. Buchter RB, Weise A, Pieper D. Development, testing and use of data extraction forms in systematic reviews: a review of methodological guidance. BMC Med Res Methodol. 2020;20(1):259. doi: https://dx.doi.org/10.1186/s12874-020-01143-3

4. Mathes T, Klasen P, Pieper D. Frequency of data extraction errors and methods to increase data extraction quality: a methodological review. BMC Med Res Methodol. 2017;17(1):152. doi: https://dx.doi.org/10.1186/s12874-017-0431-4

5. Hartling L. Creating efficiencies in the extraction of data from randomized trials: a prospective evaluation of a machine learning and text mining tool. BMC Med Res Methodol. 2021;21(1):169. doi: 10.1186/s12874-021-01354-2. PMID: 34399684; PMCID: PMC8369614.



We generate robust evidence fast

What is Silvi.ai?

Silvi is an end-to-end screening and data extraction tool supporting Systematic Literature Review and Meta-analysis.

Silvi helps create systematic literature reviews and meta-analyses that follow Cochrane guidelines in a highly reduced time frame, giving a fast and easy overview. It supports the user through the full process, from literature search to data analysis. Silvi is directly connected with databases such as PubMed and ClinicalTrials.gov and is always updated with the latest published research. It also supports RIS files, making it possible to upload search results from your favorite search engine (e.g., Ovid). Silvi has a tagging system that can be tailored to any project.
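As a rough illustration of the RIS format mentioned above, the sketch below parses a small RIS export into tagged records. It is a generic example, not Silvi-specific, and the sample record simply reuses the Jonnalagadda et al. (2015) citation that appears elsewhere on this page.

# Generic illustration of the RIS citation format; not Silvi-specific.
SAMPLE_RIS = """TY  - JOUR
AU  - Jonnalagadda, S.R.
TI  - Automating data extraction in systematic reviews: a systematic review
PY  - 2015
ER  - 
"""

def parse_ris(text):
    """Split an RIS export into one dict of tag -> values per record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("ER  -"):          # end-of-record tag
            records.append(current)
            current = {}
        elif len(line) > 6 and line[2:6] == "  - ":
            current.setdefault(line[:2], []).append(line[6:].strip())
    return records

print(parse_ris(SAMPLE_RIS))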

Silvi is transparent, meaning it documents and stores the choices (and the reasons behind them) the user makes. Whether publishing the results from the project in a journal, sending them to an authority, or collaborating on the project with several colleagues, transparency is optimal to create robust evidence.

Silvi is developed with the user experience in mind. The design is intuitive and easily accessible to new users; there is no need to become a super-user. If any questions do arise, a series of short instructional videos will get you back on track.

To see Silvi in use, watch our short introduction video.



Learn more about Silvi’s specifications here.

"I like that I can highlight key inclusions and exclusions which makes the screening process really quick - I went through 2000+ titles and abstracts in just a few hours"

Eishaan Kamta Bhargava 

Consultant Paediatric ENT Surgeon, Sheffield Children's Hospital

"I really like how intuitive it is working with Silvi. I instantly felt like a superuser."

Henriette Kristensen

Senior Director, Ferring Pharmaceuticals

"The idea behind Silvi is great. Normally, I really dislike doing literature reviews, as they take up huge amounts of time. Silvi has made it so much easier! Thanks."

Claus Rehfeld

Senior Consultant, Nordic Healthcare Group

"AI has emerged as an indispensable tool for compiling evidence and conducting meta-analyses. Silvi.ai has proven to be the most comprehensive option I have explored, seamlessly integrating automated processes with the indispensable attributes of clarity and reproducibility essential for rigorous research practices."

Martin Södermark

M.Sc. Specialist in clinical adult psychology


Silvi.ai was founded in 2018 by Professor in Health Economic Evidence, Tove Holm-Larsen, and expert in Machine Learning, Rasmus Hvingelby. The idea for Silvi stemmed from their own research, and the need to conduct systematic literature reviews and meta-analyses faster.

The ideas behind Silvi were originally a component of a larger project. In 2016, Tove founded the group “Evidensbaseret Medicin 2.0” in collaboration with researchers from Ghent University, Technical University of Denmark, University of Copenhagen, and other experts. EBM 2.0  wanted to optimize evidence-based medicine to its highest potential using Big Data and Artificial Intelligence, but needed a highly skilled person within AI.

Around this time, Tove met Rasmus, who shared the same visions. Tove teamed up with Rasmus, and Silvi.ai was created.


A Guide to Evidence Synthesis: 10. Data Extraction


Data Extraction

Whether you plan to perform a meta-analysis or not, you will need to establish a regimented approach to extracting data. Researchers often use a form or table to capture the data they will then summarize or analyze. The amount and types of data you collect, as well as the number of collaborators who will be extracting it, will dictate which extraction tools are best for your project. Programs like Excel or Google Spreadsheets may be the best option for smaller or more straightforward projects, while systematic review software platforms can provide more robust support for larger or more complicated data.

It is recommended that you pilot your data extraction tool, especially if you will code your data, to determine if fields should be added or clarified, or if the review team needs guidance in collecting and coding data.
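One lightweight way to run that pilot when extraction lives in spreadsheets is to export each reviewer's sheet to CSV and compare them field by field. The hypothetical sketch below assumes two files keyed by a study_id column; the file names and column are assumptions, not part of any particular platform.

# Hypothetical pilot check: compare two reviewers' extractions of the same
# studies and list the fields where they disagree, so the form or coding
# guidance can be refined before full extraction begins.
import csv

def load(path):
    with open(path, newline="", encoding="utf-8") as f:
        return {row["study_id"]: row for row in csv.DictReader(f)}

def disagreements(path_a, path_b):
    a, b = load(path_a), load(path_b)
    diffs = []
    for study_id in sorted(set(a) & set(b)):        # studies both reviewers extracted
        for field, value_a in a[study_id].items():
            value_b = b[study_id].get(field, "")
            if value_a.strip() != value_b.strip():
                diffs.append((study_id, field, value_a, value_b))
    return diffs

if __name__ == "__main__":
    for study_id, field, value_a, value_b in disagreements("reviewer_a.csv", "reviewer_b.csv"):
        print(f"{study_id} | {field}: '{value_a}' vs '{value_b}'")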

Data Extraction Tools

Excel is the most basic tool for managing the screening and data extraction stages of the systematic review process. Customized workbooks and spreadsheets can be designed for the review process. A more advanced approach to using Excel for this purpose is the PIECES approach, designed by a librarian at Texas A&M; the PIECES workbook can be downloaded from this guide.

Covidence  is a software platform built specifically for managing each step of a systematic review project, including data extraction. Read more about how Covidence can help you customize extraction tables and export your extracted data.  

RevMan  is free software used to manage Cochrane reviews. For more information on RevMan, including an explanation of how it may be used to extract and analyze data, watch Introduction to RevMan - a guided tour .

SRDR  (Systematic Review Data Repository) is a Web-based tool for the extraction and management of data for systematic review or meta-analysis. It is also an open and searchable archive of systematic reviews and their data. Access the help page  for more information.

DistillerSR

DistillerSR is a systematic review management software program, similar to Covidence. It guides reviewers in creating project-specific forms, extracting, and analyzing data. 

JBI SUMARI (the Joanna Briggs Institute System for the Unified Management, Assessment and Review of Information) is a systematic review software platform geared toward fields such as health, social sciences, and humanities. Among the other steps of a review project, it facilitates data extraction and data synthesis. View their short introductions to data extraction and analysis for more information.

The Systematic Review Toolbox

The SR Toolbox  is a community-driven, searchable, web-based catalogue of tools that support the systematic review process across multiple domains. Use the advanced search option to restrict to tools specific to data extraction. 

Additional Information

These resources offer additional information and examples of data extraction forms:​

Brown, S. A., Upchurch, S. L., & Acton, G. J. (2003). A framework for developing a coding scheme for meta-analysis.  Western Journal of Nursing Research ,  25 (2), 205–222. https://doi.org/10.1177/0193945902250038

Elamin, M. B., Flynn, D. N., Bassler, D., Briel, M., Alonso-Coello, P., Karanicolas, P. J., … Montori, V. M. (2009). Choice of data extraction tools for systematic reviews depends on resources and review complexity.  Journal of Clinical Epidemiology ,  62 (5), 506–510. https://doi.org/10.1016/j.jclinepi.2008.10.016

Higgins, J.P.T., & Deeks, J.J. (Eds.) (2011). Chapter 7: Selecting studies and collecting data. In J.P.T. Higgins & S. Green (Eds.), Cochrane handbook for systematic reviews of interventions, Version 5.1.0 (updated March 2011). The Cochrane Collaboration. Available from www.handbook.cochrane.org.

Research guide from the George Washington University Himmelfarb Health Sciences Library: https://guides.himmelfarb.gwu.edu/c.php?g=27797&p=170447



Gerstein Science Information Centre

Knowledge syntheses: systematic & scoping reviews, and other review types.


Data extraction forms



The next step after completing the second stage of screening is for the researchers to read the full text of each article identified for inclusion in the review and extract the pertinent data using a standardized data extraction/coding form. The data extraction form should be as long or as short as necessary and can be coded for computer analysis if desired.

If you are writing a narrative review to summarize information reported in a small number of studies, you probably don't need to go to the trouble of coding the data variables for computer analysis; instead, summarize the information from the data extraction forms for the included studies.

If you are conducting an analytical review with a meta-analysis to compare data outcomes from several clinical trials, you may wish to computerize the data collection and analysis processes. Reviewers can use fillable forms to collect and code data reported in the studies included in the review; the data can then be uploaded to analytical software such as Excel or SPSS for statistical analysis. GW School of Medicine, School of Public Health, and School of Nursing faculty, staff, and students can use the various statistical analytical software in the Himmelfarb Library, and watch online training videos from LinkedIn Learning at the Talent@GW website to learn how to perform statistical analysis with Excel and SPSS.

*Our librarians do not provide consultations on data extraction; however, we're happy to provide you with the information and resources below.

The table below provides some software to help you create coded data extraction forms using templates.

If you prefer to design your own coded data extraction form from scratch,  Elamin et al. (2009)   offer advice on how to decide what electronic tools to use to extract data for analytical reviews.

The process of designing a coded data extraction form and codebook is described in Brown, Upchurch & Acton (2003) and Brown et al. (2013). You should assign a unique identifying number to each variable field so that the fields can be programmed into fillable form fields in whatever software you decide to use for data extraction/collection.
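As a loose illustration of that advice, the sketch below stores a codebook as a small data structure in which every variable field carries a unique identifying number and a set of numeric codes. The variables and codes shown are invented for the example and are not taken from the cited forms.

# Illustrative codebook sketch: each variable field has a unique ID and
# numeric codes, ready to be mirrored in fillable form fields and later
# exported to Excel or SPSS. Variables and codes here are hypothetical.
CODEBOOK = [
    {"var_id": 1, "name": "study_design",
     "codes": {1: "RCT", 2: "cohort", 3: "case-control", 9: "other/unclear"}},
    {"var_id": 2, "name": "setting",
     "codes": {1: "inpatient", 2: "outpatient", 3: "community"}},
    {"var_id": 3, "name": "quality_grade",
     "codes": {1: "low risk of bias", 2: "some concerns", 3: "high risk of bias"}},
]

def decode(var_id, code):
    """Translate a numeric code back into its label for reporting."""
    entry = next(v for v in CODEBOOK if v["var_id"] == var_id)
    return entry["codes"].get(code, "unknown code")

# One coded study record, keyed by variable ID.
record = {1: 1, 2: 3, 3: 2}
print({var_id: decode(var_id, code) for var_id, code in record.items()})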

The table below provides some software to help you design data extraction forms using templates.

You may choose to design and create your own coded fillable forms using Adobe Acrobat Pro or Microsoft Access. You might also like to include on the data extraction form a field for grading the quality of the study; see the section on reporting quality/risk of bias for examples of the quality scales you might choose to apply.

Three examples of data extraction forms are below:    

Data Extraction Form Example (suitable for small-scale literature review of a few dozen studies)

This form was used to gather data for a poster reporting a literature review of studies of interventions to increase Emergency Department throughput.

Data Extraction Form for the Cochrane Review Group (uncoded & used to extract fine-detail/many variables)

This form illustrates the thoroughness of the Cochrane research methodology. You could devise a simpler one-page data extraction form for a smaller literature review.

Coded data extraction form 

Fillable form fields that can be computerized for data analysis. See Table 1 of Brown, Upchurch & Acton (2013).


Unriddle

15 Best AI Literature Review Tools

Conducting a literature review? Make your research process easier with these literature review tools designed to enhance your review experience.

Table of Contents (abridged)

The post explains what a literature review is and why it is done, outlines four common methods (narrative review, systematic review, meta-analysis, and scoping review), and then profiles the following tools:

  • 1. Unriddle.ai
  • 2. ScienceOpen
  • 3. Semantic Scholar
  • 4. SciSpace
  • 5. Elicit.org
  • 7. The Lens
  • 8. Scite.ai
  • 9. Citation Gecko
  • 10. DistillerSR
  • 12. ResearchRabbit
  • 13. Consensus
  • 14. Read by QxMD
  • 15. Colandr

It closes with a step-by-step guide to Unriddle's AI research tool, covering interacting with documents, automatic relations, citing your sources, writing with AI, and chat settings.

  • Interact with documents via AI so you can quickly find and understand info.
  • Then start writing in a new Note and Unriddle will show you relevant content from your library as you type.


Unriddle's chat settings include:

  • Model: the machine learning model used to generate responses.
  • Temperature: the amount of creative license you give to the AI.
  • Max length: the maximum number of words generated in a response.
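Purely as a generic illustration of what these three settings usually control in an LLM-backed tool (the dictionary below is not Unriddle's actual API or configuration format):

# Generic illustration only; parameter names are not Unriddle's API.
generation_settings = {
    "model": "example-model-name",   # which machine learning model generates responses
    "temperature": 0.2,              # lower = more literal, higher = more creative
    "max_length": 300,               # cap on the length of a generated response
}
print(generation_settings)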

SYSTEMATIC REVIEW article

Predicting histologic grades for pancreatic neuroendocrine tumors by radiologic image-based artificial intelligence: a systematic review and meta-analysis.

Qian Yan et al.

  • 1 Department of General Surgery, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
  • 2 School of Medicine, South China University of Technology, Guangzhou, China
  • 3 Department of General Surgery, Heyuan People’s Hospital, Heyuan, China

Background: Accurate detection of the histological grade of pancreatic neuroendocrine tumors (PNETs) is important for patients’ prognoses and treatment. Here, we investigated the performance of radiological image-based artificial intelligence (AI) models in predicting histological grades using meta-analysis.

Methods: A systematic literature search was performed for studies published before September 2023. Study characteristics and diagnostic measures were extracted. Estimates were pooled using random-effects meta-analysis. Risk of bias was evaluated using the QUADAS-2 tool.

Results: A total of 26 studies were included, 20 of which met the meta-analysis criteria. We found that the AI-based models had high area under the curve (AUC) values and showed moderate predictive value. The pooled ability to distinguish between different grades of PNETs was 0.89 [0.84-0.90]. In subgroup analysis, the radiomics feature-only models had a predictive value of 0.90 [0.87-0.92] with I² = 89.91%, while the pooled AUC value of the combined group was 0.81 [0.77-0.84] with I² = 41.54%. The validation group had a pooled AUC of 0.84 [0.81-0.87] without heterogeneity, whereas the validation-free group had high heterogeneity (I² = 91.65%, P = 0.000). The machine learning group had a pooled AUC of 0.83 [0.80-0.86] with I² = 82.28%.

Conclusion: AI can be considered a potential tool to detect histological PNET grades. Sample diversity, lack of external validation, imaging modalities, inconsistent radiomics feature extraction across platforms, different modeling algorithms, and software choices were sources of heterogeneity. Standardized imaging and transparent statistical methodologies for feature selection and model development are still needed to translate radiomics results into clinical applications.

Systematic Review Registration: https://www.crd.york.ac.uk/prospero/ , identifier CRD42022341852.

Introduction

Pancreatic neuroendocrine tumors (PNETs), which account for 3–5% of all pancreatic tumors, are a heterogeneous group of tumors derived from pluripotent stem cells of the neuroendocrine system ( 1 – 3 ). In the past 10 years, the incidence and prevalence of PNETs have steadily increased ( 4 – 6 ). Unlike malignant tumors, PNETs are heterogeneous: they range from indolent to aggressive ( 7 , 8 ). The World Health Organization (WHO) histological grading system is used to evaluate the features of PNETs, and a treatment strategy is developed accordingly ( 9 , 10 ). Therefore, accurate evaluation of the histological grade is crucial for patients with PNETs; non-invasive methods are helpful, especially for tumors that are difficult to biopsy.

The application of artificial intelligence (AI) to medicine is becoming more common; it is useful in areas such as radiology, pathology, genomics, and proteomics ( 11 – 14 ), with broad applications in disease diagnosis and treatment ( 15 – 18 ). Owing to developments in AI technology, radiomic analysis can now be used to predict PNET grade, with promising results ( 19 , 20 ). A study by Guo et al. ( 21 ), which included 37 patients with PNETs, showed that the portal enhancement ratio, arterial enhancement ratio, mean grey-level intensity, kurtosis, entropy, and uniformity were significant predictors of histological grade. Luo et al. ( 22 ) found that, using specific computed tomography (CT) images, a deep learning (DL) algorithm achieved a higher accuracy rate than radiologists (73.12% vs. 58.1%) in distinguishing G3 from G1/G2. Despite promising results, other studies with different methodologies have produced different findings. Thus, quantitative analysis will be valuable for comparing study efficacy and assessing the overall predictive power of AI in detecting the histological grade of PNETs.

In this review, we aimed to systematically summarize the latest literature on AI histological grade prediction for PNETs. By performing a meta-analysis, we aimed to evaluate AI accuracy and provide evidence for its clinical application and role in decision making.

Materials and methods

This combined systematic review and meta-analysis was based on the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. The study was registered in the Prospective Register of Systematic Reviews (PROSPERO ID: CRD42022341852).

Search strategy

Primary publications were extracted from multiple electronic databases (PubMed, MEDLINE, Cochrane and Web of Science) in September 2023, covering the use of radiomics/DL/machine learning (ML)/AI on CT/magnetic resonance imaging (MRI) examinations for PNET grading. The search terms consisted of ML, AI, radiomics, or DL, along with PNETs grade. The detailed search string was as follows: (radiomics or machine learning or deep learning or artificial intelligence) and (PNETs or pancreatic neuroendocrine tumors). The reference lists of the retrieved studies were then screened for eligibility.

Study selection

Two researchers determined the eligibility of each article by evaluating the title and abstract and removed duplicates. Case reports, non-original investigations (e.g., editorials, letters, and reviews), and studies that did not focus on the topic of interest were excluded. Based on the "PICOS" principle, the following inclusion criteria were designed: 1) studies on PNET grading that trained models using only histology (not biopsy) as the gold standard; 2) PNET grading predictive models built with AI; 3) models compared with physicians or with models obtained from clinical and traditional imaging characteristics; 4) studies whose main purpose was to differentiate the grades of PNETs; 5) study types: case-control studies, cohort studies, nested case-control studies, and case-cohort studies; 6) English language. Exclusion criteria were: 1) only influencing factors were analyzed and a complete risk model was not built; 2) guides, case reports and non-original investigations (e.g., editorials, letters, meta-analyses and reviews); 3) non-English and animal studies. Any disagreements were resolved by consensus arbitrated by a third author.

Data extraction

Data extraction was performed independently by two reviewers, and any discrepancies were resolved by a third reviewer. The extracted data included first author, country, year of publication, study aim, study type, number of patients, sample size, validation, treatment, reference standard, imaging modality and features, methodology, model features and algorithm, software segmentation, and use of clinical information (e.g., age, tumor stage, and expression biomarkers). The true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts, as well as sensitivity and specificity, were recorded. The AUC value of the validation group, along with the 95% confidence interval (CI) or standard error (SE) of the model, was also collected if reported.
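For readers less familiar with these measures, the short sketch below shows how sensitivity, specificity, the likelihood ratios and the diagnostic odds ratio follow from a single study's 2x2 counts; the counts used are invented purely for illustration.

# Worked sketch of the per-study diagnostic measures, computed from a 2x2
# table of TP/FP/FN/TN. The counts below are invented for illustration.
def diagnostic_measures(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                         # diagnostic odds ratio
    return {"sensitivity": sensitivity, "specificity": specificity,
            "PLR": plr, "NLR": nlr, "DOR": dor}

print(diagnostic_measures(tp=40, fp=10, fn=8, tn=42))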

Quality assessment

All included studies were independently assessed using the radiomics quality score (RQS), which covers image acquisition, radiomics feature extraction, data modeling, model validation, and data sharing. The sixteen items yield a total score ranging from -8 to 36. The score was then converted to a percentage, where -8 to 0 was defined as 0% and 36 as 100% ( 23 ).
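A worked check of that conversion, assuming the mapping described above (total scores of -8 to 0 count as 0% and the maximum of 36 as 100%):

# Sketch of the RQS-to-percentage conversion described above.
def rqs_percent(total_score):
    return max(0, total_score) / 36 * 100

print(round(rqs_percent(9.58), 1))   # the reported average total score of 9.58 -> 26.6%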

The methodological quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) criteria ( 24 ). Two reviewers independently performed data extraction and quality assessment. Disagreements between the two reviewers were discussed at a research meeting until a consensus was reached.

Statistical analysis

Three software packages, Stata version 12.0, MedCalc for Windows version 16.4.3 (MedCalc Software, Ostend, Belgium), and RevMan version 5.3.21, were used for statistical analysis. A bivariate meta-analysis model was employed to calculate the pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR). The summary receiver operating characteristic (SROC) curve was generated. The I² value was used to assess statistical heterogeneity and estimate the percentage of variability among the included studies. An I² value >50% indicated substantial heterogeneity, and a random-effects model was used to analyze the differences within and between studies; if the value was <50%, it signified less heterogeneity and a fixed-effects model was used ( 25 ). Meta-regression and subgroup analysis were conducted to explore the sources of heterogeneity, and sensitivity analysis was performed to evaluate stability. Deeks' funnel plot was used to examine publication bias; a p value less than 0.05 was considered significant. Fagan's nomogram was employed to examine the post-test probability.
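As a minimal sketch of the heterogeneity rule described above (following Higgins et al., reference 25), the snippet below computes Cochran's Q and I² from inverse-variance weights and picks the pooling model accordingly; the effect estimates and variances are invented for illustration.

# Minimal sketch of Cochran's Q and I^2 used to choose between fixed- and
# random-effects pooling. Effect estimates and variances are illustrative.
def i_squared(effects, variances):
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

i2 = i_squared(effects=[0.81, 0.74, 0.90, 0.68], variances=[0.004, 0.006, 0.003, 0.008])
model = "random-effects" if i2 > 50 else "fixed-effects"
print(f"I^2 = {i2:.1f}% -> {model} model")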

Results

Literature selection

We retrieved 260 articles from PubMed and 156 from Web of Science; 137 were duplicates and were eliminated, resulting in 343 studies. After screening titles and abstracts, 85 potentially eligible articles were identified. After full-text review, six articles were excluded because of insufficient information; thus, 26 articles were included in this systematic review ( 21 , 22 , 26 – 49 ). Among them, six studies lacked information on positive and negative diagnosis values; therefore, only 20 articles were eligible for the meta-analysis. The results of the literature search are shown in Figure 1 .


Figure 1 Flowchart of the article selection.

Quality and risk bias assessment

As shown in Table 1 , the selected articles were published between 2018 and 2023. The average total and relative RQS scores were 9.58 (range 2–20) and 26.60% (5.56–55.56%), respectively. Thirteen studies had no validation group, and five were based on datasets from two or more distinct institutes. Because of the lack of prospective studies, the absence of phantom studies on all scanners, the absence of imaging at multiple time points, the lack of cost-effectiveness analysis, and unavailable open science and data, all of the 11 included studies scored zero on these items. A detailed report of the RQS allocated by the expert reader is presented in Supplementary Table S1 .


Table 1 Characteristic of all included studies.

Study quality and risk of bias were assessed using the QUADAS-2 criteria; the details are presented in Supplementary Figure S1 . A majority of studies showed a low or unclear risk of bias in each domain. In the Patient Selection domain, one study was at high risk and 25 studies were at moderate risk, mainly arising from discontinuous patient inclusion. In the Index Test domain, nine studies were at moderate risk because insufficient information was provided to make a judgment, while the others were at low risk. In the Reference Standard domain, only one study was at high risk, because some patients could not be accurately categorized to a specific grade in that study. In the Flow and Timing domain, all studies were at low risk.

Publication bias

Deeks’ funnel plot asymmetry test was adopted to detect publication bias: no bias was detected within the meta-analysis (p=0.347, Figure 2 ).


Figure 2 Deeks’ funnel plot evaluating the potential publication bias (p=0.034).

Clinical diagnostic value of grading PNETs

As shown in Figure 3 , Fagan's nomogram was used to evaluate the diagnostic value of PNET grading and its clinical application. With a pre-test probability of 50%, a positive result increased the post-test probability to 81%, while a negative result decreased it to 4%.
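As a worked check of the positive-result figure, applying Bayes' theorem in odds form with the pooled positive likelihood ratio of 4.382 reported in the meta-analysis below and the 50% pre-test probability:

# Worked check of the post-test probability for a positive result, using
# Bayes' theorem in odds form with the pooled PLR of 4.382 reported in the
# meta-analysis and an assumed 50% pre-test probability.
def post_test_probability(pre_test_prob, likelihood_ratio):
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

print(round(post_test_probability(0.50, 4.382), 2))   # -> 0.81, i.e. about 81%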


Figure 3 Fagan’s nomogram assessing the clinical diagnostic value of PNETs.

Study characteristics

Study characteristics are summarized in Table 1 . All studies employed a retrospective design, were published between 2018 and 2023, and included between 32 and 270 patients. Among the 26 studies, China was the main publication country (15 studies), followed by Italy (5 studies), the USA (3 studies), Korea (2 studies), and Japan (1 study). Nineteen studies were based on CT and eight on MRI images, while two combined images from CT and MRI, and one used PET-CT. Thirteen of the 26 studies had validation sets; five were externally validated using data from another institute. The majority (20/26) used various ML classifiers (such as Random Forest (RF), Support Vector Machine (SVM), and Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression), and two adopted convolutional neural networks (CNN). About half of the included studies (11/21) used models combining radiomics with clinical features (such as tumor size, tumor margin, TNM stage, etc.), while the others used only radiomics features. Thirteen studies applied cross-validation to select features that were stable between observers.

The details of TP/FP/FN/TN and the models' sensitivity and specificity are shown in Table 2 . The highest AUC value of an AI-based validation model was 0.99 (95% CI: 0.97–1.00). Six studies offered no details regarding TP/FP/FN/TN, and the AUC values of four studies were incomplete; these six studies were therefore excluded from the meta-analysis.


Table 2 Results for accuracy to predict grade of PNETs.

Meta-analysis

Overall performance of the AI models

Twenty studies with 2639 patients were included in the meta-analysis, which provided data on TP/FP/FN/TN and model sensitivity and specificity, and 19 studies offered the AUC with 95% CI of the models. The results are reported in Tables 2 and 3 and Figure 4 . The AI models for PNETs showed an overall pooled sensitivity of 0.826 [0.759, 0.877], a pooled specificity of 0.812 [0.765, 0.851], and pooled PLR and NLR of 4.382 [3.509, 5.472] and 0.215 [0.155, 0.298], respectively. Moreover, the pooled DOR was 20.387 [13.108, 31.706], and the AUC of the SROC curve was 0.89 [0.84-0.90] with I² = 90.42% [81.10-99.73], P = 0.000.


Table 3 Subgroup analysis and estimates pooled of PNETs.


Figure 4 Pooled diagnostic accuracy of PNETs. (A, B) Forest plots of sensitivity and specificity; (C) summary receiver operating characteristic curve.

Subgroup analysis based on the image source and AI methodology

Meta-regression was conducted and found no significant differences between groups ( Supplementary Table S2 ). Subgroup analysis was then performed to compare studies by image source: CT and MRI. Two models used both CT and MRI images; thus, 16 models extracted radiomic features from CT images and six models from MRI. The pooled SE, SP, PLR, and NLR were 0.849 [0.786, 0.895], 0.803 [0.748, 0.847], 4.297 [3.386, 5.451], and 0.189 [0.134, 0.266], respectively, for CT models, and 0.791 [0.643, 0.888], 0.820 [0.764, 0.866], 4.407 [3.206, 6.058], and 0.255 [0.141, 0.459], respectively, for MRI models. The pooled DOR was 22.769 [14.707, 35.250] and 17.304 [7.713, 38.822] for CT and MRI models, respectively. The AUC of the SROC curve was 0.88 [0.85-0.91] with heterogeneity (I² = 79.25% [55.20-100.00], P = 0.004) for CT images, compared with MRI (AUC = 0.83 [0.79-0.86], I² = 71.55% [36.80-100.00], P = 0.015).

Subgroup analysis of different AI methodologies was used to compare algorithm architectures; most models applied not just one ML classifier but several. In total, 15 models were built using ML for PNETs. The pooled SE, SP, PLR, and NLR were 0.806 [0.727, 0.867], 0.789 [0.742, 0.829], 3.813 [3.156, 4.606], and 0.246 [0.175, 0.346], respectively. The pooled DOR was 15.508 [10.196, 23.589] and the AUC of the SROC curve was 0.84 [0.81-0.87] with heterogeneity, I² = 89.88% [79.90-99.86]. For the remaining three non-ML models, the pooled AUC value was 0.89 [0.86-0.92] with I² = 28.80% [0.00-100.00] ( Table 3 ).

There were ten models using cross-validation to select the best features and models. The group with cross-validation had a pooled AUC of 0.87 [0.83-0.91] with I² = 78.98%, while the group without had 0.88 [0.84-0.90] with I² = 75.30%. The pooled SE, SP, PLR, and NLR were 0.831 [0.784, 0.871], 0.785 [0.737, 0.828], 3.523 [2.812, 4.414], and 0.196 [0.127, 0.302], respectively, for the cross-validation group, and 0.799 [0.670, 0.866], 0.835 [0.772, 0.884], 4.849 [3.365, 6.698], and 0.241 [0.141, 0.413], respectively, for the group without ICC. The pooled DORs were 20.262 [12.084, 33.973] and 20.120 [9.171, 44.140] for the groups with and without ICC, respectively.

Subgroup analysis based on dataset characteristics

We also compared the models that included clinical data with those using radiomics features only, and found that including clinical features reduced heterogeneity. The pooled SE, SP, PLR, and NLR were 0.801 [0.707, 0.870], 0.795 [0.739, 0.842], 3.906 [2.983, 5.115], and 0.251 [0.166, 0.379], respectively, for the group including clinical data, and 0.847 [0.747, 0.913], 0.829 [0.749, 0.888], 4.970 [3.349, 7.377], and 0.184 [0.109, 0.310], respectively, for the radiomics-only group. The pooled DOR for the radiomics-only group was 27.034 [13.412, 54.492] with an AUC of 0.90 [0.87-0.92] (I² = 89.91%), somewhat higher than those of the group including clinical data (DOR 16.581 [9.466, 29.044]; AUC 0.81 [0.77-0.84] with I² = 41.54%) ( Table 3 ).

Moreover, 11 models were validated, while nine models were not. The pooled SE, SP, PLR, and NLR were 0.823 [0.754, 0.876], 0.799 [0.744, 0.846], 4.106 [3.128, 5.389], and 0.221 [0.155, 0.315], respectively, for the validated group, and 0.836 [0.708, 0.914], 0.824 [0.741, 0.884], 4.741 [3.248, 6.920], and 0.199 [0.110, 0.361], respectively, for the control group. The pooled DOR was 15.574 [8.579, 28.273] for the validated group and 23.766 [11.504, 49.095] for the control group. The AUC of the SROC curve was 0.84 [0.81-0.87] without heterogeneity for the validation group and 0.89 [0.86-0.91] with I² = 91.65% for the control group.

In a subgroup analysis based on the number of patients, the pooled results of 12 models, which included >100 patients, were 0.815 [0.737, 0.874], 0.784 [0.735, 0.826], 3.769 [3.086, 4.603], and 0.236 [0.165, 0.337] for the pooled SE, SP, PLR, and NLR, respectively. For the remaining eight models, the pooled SE, SP, PLR, and NLR were 0.847 [0.715, 0.925], 0.871 [0.799, 0.920], 6.560 [4.224, 10.187], and 0.175 [0.091, 0.338], respectively. The pooled DOR and the AUC values for the two groups were 15.974 [10.228, 24.948] and 0.84 [0.81-0.87] vs. 37.404 [16.542, 84.577] and 0.91 [0.88-0.93] ( Table 3 ).

Discussion

PNETs are a heterogeneous group of malignancies: they can be grouped into grades G1, G2, and G3 according to mitotic count and Ki-67 index ( 1 – 3 ). Accurate classification of PNET grade is important for treatment selection, prognosis, and follow-up. However, because of the heterogeneity of PNETs, tumor grading may not be uniform within a single lesion or between different lesions in the same patient ( 7 , 8 ). Moreover, histology is currently the only validated tool to grade these tumors and describe their characteristics; surgery and endoscopic biopsy are used clinically to analyze the histological grade of PNETs. However, it is difficult to perform a satisfactory biopsy for PNETs located around major vessels or for small tumors, especially using fine-needle aspiration biopsy ( 50 – 53 ). Therefore, detection of histological grade based on radiological images is also an important diagnostic tool. With increasing AI application in medical fields, we believe that AI-based models can enhance the prediction of tumor grading. To the best of our knowledge, only a few systematic reviews, none of them fully up to date, have evaluated the diagnostic accuracy of radiomics on this topic.

In our study, we investigated the ability of imaging-based AI to detect the histologic grade of PNETs. Our results showed that AI-based grading of PNETs, with an AUC of 0.89 [0.84-0.90], exhibited good performance but high heterogeneity (I² = 90.42% [81.10-99.73], P = 0.000). Among the included studies, we found considerable heterogeneity in pooled sensitivity and specificity. Moreover, according to our sensitivity analysis, three articles ( 29 , 40 , 46 ) had poor robustness and may be one of the sources of heterogeneity ( Figure S2 ). There was no significant publication bias between studies.

The diagnostic performance of the radiomics models varied with the strategies employed. CT and MRI images are the main sources for analyzing PNETs. Because of its high availability and low cost, CT is more widely used than MRI. In this study, we found that imaging technique may be an influencing factor for predictive power, but not independently so. CT was more commonly used (16 studies) and showed better performance than MRI (6 studies) in grading PNETs, with an AUC of 0.88 [0.85-0.91] vs. 0.83 [0.79-0.86]. Although unconfirmed, we speculate that CT may be more powerful for capturing vessel enhancement characteristics and observing the neo-vascular distribution, which is useful in vascularly rich PNETs ( 54 ). Future studies are needed to validate this finding. Only one included study applied PET-CT to grading PNETs; it reported an AUROC of 0.864 for the tumor grade prediction model, showing good forecasting ability ( 47 ). Thus, further investigation of PET-CT, which has shown good predictive performance (AUC = 0.992) and can detect cell surface expression of somatostatin receptors ( 55 , 56 ), will be useful in developing AI models.

Clinical data such as age, gender, tumor size, tumor shape, tumor margin and CT stage are closely related to the pathogenic process of PNETs and therefore should not be ignored in diagnostic models ( 27 – 29 , 47 , 49 ). Liang et al. ( 37 ) built a combined model that improved performance (0.856 [0.730–0.939] vs. 0.885 [0.765–0.957]). Wang et al. ( 42 ) found that the addition of clinical features improved the radiomics models (from 0.837 [0.827–0.847] to 0.879 [0.869–0.889]). However, we found that including clinical factors did not always result in better performance, although it did decrease heterogeneity (AUC of 0.81 [0.77-0.84] with I² = 41.94% vs. 0.90 [0.87-0.92] with I² = 89.91%). This may be because the data are processed differently: age and other numerical clinical data can be easily quantified in radiomic modeling (i.e., age as a variable in an algorithm or function), whereas in clinical models age is treated as a risk factor whose effect varies across situations. Therefore, future radiomics analyses should incorporate clinical features to create more reliable models, or add radiomics features to existing diagnostic models to validate their true diagnostic power.

The lack of standardized quality control and reporting throughout the workflow limits the application of radiomics ( 17 , 57 ). For example, validation/testing data must remain completely independent, or hidden until validation/testing is performed, in order to create generalizable predictive models at each step of a radiomics study. In our study, 11 of the 20 studies had a validation set and only three had external validation. Lack of proper external validation influences the transportability and generalizability of the models and also hampers the conclusions of the review. Moreover, according to our findings, the lack of validation sets was also one of the main causes of heterogeneity. There should be no direct comparison between results obtained by studying only the primary cohort and those obtained by studying both the primary and validation cohorts. Validated models should be considered more reliable and promising, even if the reported performance is lower.

As shown in Table 1 , there was also a wide variety of feature extraction and model selection methods, and although AI classifiers did not show outstanding diagnostic performance in our evaluation, they are undeniably a future research direction and trend. Most of the included studies used more than one machine learning or deep learning method for feature selection or classification, but the best-performing AI classifiers varied from study to study. To date, there are no universal and well-recognized classifiers, and the characteristics of the samples are a key factor affecting classifier performance ( 58 , 59 ). Finding uniform and robust classifiers for specific medical problems has always been a challenge.

Despite the encouraging results of this meta-analysis, the overall methodological quality of the included literature was poor, reducing the reliability and reproducibility of radiomics models for clinical applications. The lack of prospective studies and scanner phantom studies, the lack of imaging at multiple time points, insufficient validation and calibration of the models, the short time frame for, and insufficiency of, cost-effectiveness analyses, and the lack of publicly available science and data contributed to the low RQS scores. In addition, only half of the studies were internally validated, and even fewer had independent external validation. To further standardize the process and improve the quality of radiomics, the RQS should be used not only to assess the methodological quality of radiomics studies but also to guide their design ( 17 ).

Diversity of the samples, inconsistencies in radiomics feature extraction across platforms, different modeling algorithms, and the simultaneous incorporation of clinical features may all account for the high heterogeneity of the combined models. According to our subgroup results, the heterogeneity mainly came from the different imaging materials (CT vs. MRI), the algorithm architecture (ML vs. non-ML), validation status, and whether clinical features were included. Thus, standardized imaging, a standardized independent and robust set of features, and validation (including external validation) are all approaches to lower heterogeneity and are highlights for attention in future research. In sum, the AI method was effective in the preoperative prediction of PNET grade; this may help with the understanding of tumor behavior and facilitate decision-making in clinical practice.

Our study has several limitations. First, most included studies were single-center and retrospective, inevitably causing patient selection bias. Second, different methods were investigated, including the type of imaging scans utilized, the type and number of radiological features studied, the choice of software, and the type of analysis/methods implemented, thus leading to the high heterogeneity among studies. Therefore, some pooled estimates of the quantitative results must be interpreted with caution. Further prospective studies could validate these results; a stable method of data extraction and analysis is important for developing a reproducible AI model.

Conclusions

Overall, this meta-analysis demonstrated the value of AI models in predicting PNET grade. According to our results, diversity of the samples, lack of external validation, imaging modalities, inconsistencies in radiomics feature extraction across platforms, different modeling algorithms, and the choice of software are all sources of heterogeneity. Thus, standardized imaging, as well as a standardized, independent and robust set of features, will be important for future applications. Multi-center, large-sample, randomized clinical trials could be used to confirm the predictive power of image-based AI systems in clinical practice. In sum, AI can be considered a potential tool to detect histological PNET grades.

Data availability statement

The original contributions presented in the study are included in the article/ Supplementary Material . Further inquiries can be directed to the corresponding authors.

Author contributions

QY: Conceptualization, Methodology, Visualization, Writing – original draft. YC: Data curation, Methodology, Software, Writing – original draft. CL: Data curation, Investigation, Software, Writing – original draft. HS: Data curation, Investigation, Validation, Writing – original draft. MH: Formal analysis, Software, Writing – original draft. ZW: Investigation, Validation, Writing – original draft. SH: Data curation, Formal analysis, Funding acquisition, Writing – review & editing. CZ: Funding acquisition, Project administration, Supervision, Writing – review & editing. BH: Funding acquisition, Project administration, Supervision, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was supported by the Natural Science Foundation of China (82102961, 82072635 and 82072637), the High-level Hospital Construction Research Project of Heyuan People's Hospital (YNKT202202), the Science and Technology Program of Heyuan (23051017147335), and the Science and Technology Program of Guangzhou (2024A04J10016 and 202201011642).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2024.1332387/full#supplementary-material

Supplementary Figure 1 | The quality assessment of 26 included studies by QUADAS-2 tool.

Supplementary Figure 2 | The sensitive analysis of 26 included studies.

1. Chang A, Sherman SK, Howe JR, Sahai V. Progress in the management of pancreatic neuroendocrine tumors. Annu Rev Med . (2022) 73:213–29. doi: 10.1146/annurev-med-042320-011248


2. Ma ZY, Gong YF, Zhuang HK, Zhou ZX, Huang SZ, Zou YP, et al. Pancreatic neuroendocrine tumors: A review of serum biomarkers, staging, and management. World J Gastroenterol . (2020) 26:2305–22. doi: 10.3748/wjg.v26.i19.2305

3. Cives M, Strosberg JR. Gastroenteropancreatic neuroendocrine tumors. CA Cancer J Clin . (2018) 68:471–87. doi: 10.3322/caac.21493

4. Pulvirenti A, Marchegiani G, Pea A, Allegrini V, Esposito A, Casetti L, et al. Clinical implications of the 2016 international study group on pancreatic surgery definition and grading of postoperative pancreatic fistula on 775 consecutive pancreatic resections. Ann Surg . (2018) 268:1069–75. doi: 10.1097/SLA.0000000000002362

5. Fan JH, Zhang YQ, Shi SS, Chen YJ, Yuan XH, Jiang LM, et al. A nation-wide retrospective epidemiological study of gastroenteropancreatic neuroendocrine neoplasms in China. Oncotarget . (2017) 8:71699–708. doi: 10.18632/oncotarget.17599

6. Yao JC, Hassan M, Phan A, Dagohoy C, Leary C, Mares JE, et al. One hundred years after "carcinoid": epidemiology of and prognostic factors for neuroendocrine tumors in 35,825 cases in the United States. J Clin Oncol . (2008) 26:3063–72. doi: 10.1200/JCO.2007.15.4377

7. Yang Z, Tang LH, Klimstra DS. Effect of tumor heterogeneity on the assessment of Ki67 labeling index in well-differentiated neuroendocrine tumors metastatic to the liver: implications for prognostic stratification. Am J Surg Pathol . (2011) 35:853–60. doi: 10.1097/PAS.0b013e31821a0696

8. Partelli S, Gaujoux S, Boninsegna L, Cherif R, Crippa S, Couvelard A, et al. Pattern and clinical predictors of lymph node involvement in nonfunctioning pancreatic neuroendocrine tumors (NF-PanNETs). JAMA Surg . (2013) 148:932–9. doi: 10.1001/jamasurg.2013.3376

9. Marchegiani G, Landoni L, Andrianello S, Masini G, Cingarlini S, D'Onofrio M, et al. Patterns of recurrence after resection for pancreatic neuroendocrine tumors: who, when, and where? Neuroendocrinology . (2019) 108:161–71. doi: 10.1159/000495774

10. Nagtegaal ID, Odze RD, Klimstra D, Paradis V, Rugge M, Schirmacher P, et al. The 2019 WHO classification of tumours of the digestive system. Histopathology . (2020) 76:182–8. doi: 10.1111/his.13975

11. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer . (2018) 18:500–10. doi: 10.1038/s41568-018-0016-5

12. Jin P, Ji X, Kang W, Li Y, Liu H, Ma F, et al. Artificial intelligence in gastric cancer: a systematic review. J Cancer Res Clin Oncol . (2020) 146:2339–50. doi: 10.1007/s00432-020-03304-9

13. Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat BioMed Eng . (2018) 2:719–31. doi: 10.1038/s41551-018-0305-z

14. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA . (2018) 319:1317–8. doi: 10.1001/jama.2017.18391

15. Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol . (2022) 23:40–55. doi: 10.1038/s41580-021-00407-0

16. Issa NT, Stathias V, Schürer S, Dakshanamurthy S. Machine and deep learning approaches for cancer drug repurposing. Semin Cancer Biol . (2021) 68:132–42. doi: 10.1016/j.semcancer.2019.12.011

17. Bezzi C, Mapelli P, Presotto L, Neri I, Scifo P, Savi A, et al. Radiomics in pancreatic neuroendocrine tumors: methodological issues and clinical significance. Eur J Nucl Med Mol Imaging . (2021) 48:4002–15. doi: 10.1007/s00259-021-05338-8

18. Rauschecker AM, Rudie JD, Xie L, Wang J, Duong MT, Botzolakis EJ, et al. Artificial intelligence system approaching neuroradiologist-level differential diagnosis accuracy at brain MRI. Radiology . (2020) 295:626–37. doi: 10.1148/radiol.2020190283

19. Caruso D, Polici M, Rinzivillo M, Zerunian M, Nacci I, Marasco M, et al. CT-based radiomics for prediction of therapeutic response to Everolimus in metastatic neuroendocrine tumors. Radiol Med . (2022) 127:691–701. doi: 10.1007/s11547-022-01506-4

20. Yang J, Xu L, Yang P, Wan Y, Luo C, Yen EA, et al. Generalized methodology for radiomic feature selection and modeling in predicting clinical outcomes. Phys Med Biol . (2021) 66:10.1088/1361-6560/ac2ea5. doi: 10.1088/1361-6560/ac2ea5


21. Guo C, Zhuge X, Wang Z, Wang Q, Sun K, Feng Z, et al. Textural analysis on contrast-enhanced CT in pancreatic neuroendocrine neoplasms: association with WHO grade. Abdom Radiol (NY) . (2019) 44:576–85. doi: 10.1007/s00261-018-1763-1

22. Luo Y, Chen X, Chen J, Song C, Shen J, Xiao H, et al. Preoperative prediction of pancreatic neuroendocrine neoplasms grading based on enhanced computed tomography imaging: validation of deep learning with a convolutional neural network. Neuroendocrinology . (2020) 110:338–50. doi: 10.1159/000503291

23. Lambin P, Leijenaar RTH, Deist TM. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol . (2017) 14:749–62. doi: 10.1038/nrclinonc.2017.141

24. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med . (2011) 155:529–36. doi: 10.7326/0003-4819-155-8-201110180-00009

25. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ . (2003) 327:557–60. doi: 10.1136/bmj.327.7414.557

26. Benedetti G, Mori M, Panzeri MM, Barbera M, Palumbo D, Sini C, et al. CT-derived radiomic features to discriminate histologic characteristics of pancreatic neuroendocrine tumors. Radiol Med . (2021) 126:745–60. doi: 10.1007/s11547-021-01333-z

27. Bian Y, Jiang H, Ma C, Wang L, Zheng J, Jin G, et al. CT-based radiomics score for distinguishing between grade 1 and grade 2 nonfunctioning pancreatic neuroendocrine tumors. AJR Am J Roentgenol . (2020) 215:852–63. doi: 10.2214/AJR.19.22123

28. Bian Y, Zhao Z, Jiang H, Fang X, Li J, Cao K, et al. Noncontrast radiomics approach for predicting grades of nonfunctional pancreatic neuroendocrine tumors. J Magn Reson Imaging . (2020) 52:1124–36. doi: 10.1002/jmri.27176

29. Bian Y, Li J, Cao K, Fang X, Jiang H, Ma C, et al. Magnetic resonance imaging radiomic analysis can preoperatively predict G1 and G2/3 grades in patients with NF-pNETs. Abdom Radiol (NY) . (2021) 46:667–80. doi: 10.1007/s00261-020-02706-0

30. Canellas R, Burk KS, Parakh A, Sahani DV. Prediction of pancreatic neuroendocrine tumor grade based on CT features and texture analysis. AJR Am J Roentgenol . (2018) 210:341–6. doi: 10.2214/AJR.17.18417

31. Choi TW, Kim JH, Yu MH, Park SJ, Han JK. Pancreatic neuroendocrine tumor: prediction of the tumor grade using CT findings and computerized texture analysis. Acta Radiol . (2018) 59:383–92. doi: 10.1177/0284185117725367

32. Gao X, Wang X. Deep learning for World Health Organization grades of pancreatic neuroendocrine tumors on contrast-enhanced magnetic resonance images: a preliminary study. Int J Comput Assist Radiol Surg . (2019) 14:1981–91. doi: 10.1007/s11548-019-02070-5

33. Gu D, Hu Y, Ding H, Wei J, Chen K, Liu H, et al. CT radiomics may predict the grade of pancreatic neuroendocrine tumors: a multicenter study. Eur Radiol . (2019) 29:6880–90. doi: 10.1007/s00330-019-06176-x

34. Guo CG, Ren S, Chen X, Wang QD, Xiao WB, Zhang JF, et al. Pancreatic neuroendocrine tumor: prediction of the tumor grade using magnetic resonance imaging findings and texture analysis with 3-T magnetic resonance. Cancer Manag Res . (2019) 11:1933–44. doi: 10.2147/CMAR

35. Liu C, Bian Y, Meng Y, Liu F, Cao K, Zhang H, et al. Preoperative prediction of G1 and G2/3 grades in patients with nonfunctional pancreatic neuroendocrine tumors using multimodality imaging. Acad Radiol . (2022) 29:e49–60. doi: 10.1016/j.acra.2021.05.017

36. Li W, Xu C, Ye Z. Prediction of pancreatic neuroendocrine tumor grading risk based on quantitative radiomic analysis of MR. Front Oncol . (2021) 11:758062. doi: 10.3389/fonc.2021.758062

37. Liang W, Yang P, Huang R, Xu L, Wang J, Liu W, et al. A combined nomogram model to preoperatively predict histologic grade in pancreatic neuroendocrine tumors. Clin Cancer Res . (2019) 25:584–94. doi: 10.1158/1078-0432.CCR-18-1305

38. Ohki K, Igarashi T, Ashida H, Takenaga S, Shiraishi M, Nozawa Y, et al. Usefulness of texture analysis for grading pancreatic neuroendocrine tumors on contrast-enhanced computed tomography and apparent diffusion coefficient maps. Jpn J Radiol . (2021) 39:66–75. doi: 10.1007/s11604-020-01038-9

39. D'Onofrio M, Ciaravino V, Cardobi N, De Robertis R, Cingarlini S, Landoni L, et al. CT enhancement and 3D texture analysis of pancreatic neuroendocrine neoplasms. Sci Rep . (2019) 9:2176. doi: 10.1038/s41598-018-38459-6

40. Pulvirenti A, Yamashita R, Chakraborty J, Horvat N, Seier K, McIntyre CA, et al. Quantitative computed tomography image analysis to predict pancreatic neuroendocrine tumor grade. JCO Clin Cancer Inform . (2021) 5:679–94. doi: 10.1200/CCI.20.00121

41. Ricci C, Mosconi C, Ingaldi C, Vara G, Verna M, Pettinari I, et al. The 3-dimensional-computed tomography texture is useful to predict pancreatic neuroendocrine tumor grading. Pancreas . (2021) 50:1392–9. doi: 10.1097/MPA.0000000000001927

42. Wang X, Qiu JJ, Tan CL, Chen YH, Tan QQ, Ren SJ, et al. Development and validation of a novel radiomics-based nomogram with machine learning to preoperatively predict histologic grade in pancreatic neuroendocrine tumors. Front Oncol . (2022) 12:843376. doi: 10.3389/fonc.2022.843376

43. Zhao Z, Bian Y, Jiang H, Fang X, Li J, Cao K, et al. CT-radiomic approach to predict G1/2 nonfunctional pancreatic neuroendocrine tumor. Acad Radiol . (2020) 27:e272–81. doi: 10.1016/j.acra.2020.01.002

44. Zhou RQ, Ji HC, Liu Q, Zhu CY, Liu R. Leveraging machine learning techniques for predicting pancreatic neuroendocrine tumor grades using biochemical and tumor markers. World J Clin Cases . (2019) 7:1611–22. doi: 10.12998/wjcc.v7.i13.1611

45. Chiti G, Grazzini G, Flammia F, Matteuzzi B, Tortoli P, Bettarini S, et al. Gastroenteropancreatic neuroendocrine neoplasms (GEP-NENs): a radiomic model to predict tumor grade. Radiol Med . (2022) 127:928–38. doi: 10.1007/s11547-022-01529-x

46. Mori M, Palumbo D, Muffatti F, Partelli S, Mushtaq J, Andreasi V, et al. Prediction of the characteristics of aggressiveness of pancreatic neuroendocrine neoplasms (PanNENs) based on CT radiomic features. Eur Radiol . (2023) 33:4412–21. doi: 10.1007/s00330-022-09351-9

47. Park YJ, Park YS, Kim ST, Hyun SH. A machine learning approach using [18F]FDG PET-based radiomics for prediction of tumor grade and prognosis in pancreatic neuroendocrine tumor. Mol Imaging Biol . (2023) 25:897–910. doi: 10.1007/s11307-023-01832-7

48. Javed AA, Zhu Z, Kinny-Köster B, Habib JR, Kawamoto S, Hruban RH, et al. Accurate non-invasive grading of nonfunctional pancreatic neuroendocrine tumors with a CT derived radiomics signature. Diagn Interv Imaging . (2024) 105:33–39. doi: 10.1016/j.diii.2023.08.002

49. Zhu HB, Zhu HT, Jiang L, Nie P, Hu J, Tang W, et al. Radiomics analysis from magnetic resonance imaging in predicting the grade of nonfunctioning pancreatic neuroendocrine tumors: a multicenter study. Eur Radiol . (2023) 34:90–102. doi: 10.1007/s00330-023-09957-7

50. Sadula A, Li G, Xiu D, Ye C, Ren S, Guo X, et al. Clinicopathological characteristics of nonfunctional pancreatic neuroendocrine neoplasms and the effect of surgical treatment on the prognosis of patients with liver metastases: A study based on the SEER database. Comput Math Methods Med . (2022) 2022:3689895. doi: 10.1155/2022/3689895

51. Wallace MB, Kennedy T, Durkalski V, Eloubeidi MA, Etamad R, Matsuda K, et al. Randomized controlled trial of EUS-guided fine needle aspiration techniques for the detection of Malignant lymphadenopathy. Gastrointest Endosc . (2001) 54:441–7. doi: 10.1067/mge.2001.117764

52. Canakis A, Lee LS. Current updates and future directions in diagnosis and management of gastroenteropancreatic neuroendocrine neoplasms. World J Gastrointest Endosc . (2022) 14:267–90. doi: 10.4253/wjge.v14.i5.267

53. Sallinen VJ, Le Large TYS, Tieftrunk E, Galeev S, Kovalenko Z, Haugvik SP, et al. Prognosis of sporadic resected small (≤2 cm) nonfunctional pancreatic neuroendocrine tumors—a multiinstitutional study. HPB . (2018) 20:251–9. doi: 10.1016/j.hpb.2017.08.034

54. Liu Y, Shi S, Hua J, Xu J, Zhang B, Liu J, et al. Differentiation of solid-pseudopapillary tumors of the pancreas from pancreatic neuroendocrine tumors by using endoscopic ultrasound. Clin Res Hepatol Gastroenterol . (2020) 44:947–53. doi: 10.1016/j.clinre.2020.02.002

55. Mapelli P, Bezzi C, Palumbo D, Canevari C, Ghezzo S, Samanes Gajate AM, et al. 68Ga-DOTATOC PET/MR imaging and radiomic parameters in predicting histopathological prognostic factors in patients with pancreatic neuroendocrine well-differentiated tumours. Eur J Nucl Med Mol Imaging . (2022) 49:2352–63. doi: 10.1007/s00259-022-05677-0

56. Atkinson C, Ganeshan B, Endozo R, Wan S, Aldridge MD, Groves AM, et al. Radiomics-based texture analysis of 68Ga-DOTATATE positron emission tomography and computed tomography images as a prognostic biomarker in adults with neuroendocrine cancers treated with 177Lu-DOTATATE. Front Oncol . (2021) 11:686235. doi: 10.3389/fonc.2021.686235

57. Jha AK, Mithun S, Sherkhane UB, Dwivedi P, Puts S, Osong B, et al. Emerging role of quantitative imaging (radiomics) and artificial intelligence in precision oncology. Explor Target Antitumor Ther . (2023) 4:569–82. doi: 10.37349/etat

58. Parmar C, Grossmann P, Bussink J, Lambin P, Aerts HJWL. Machine learning methods for quantitative radiomic biomarkers. Sci Rep . (2015) 5:13087. doi: 10.3389/fonc.2015.00272

59. Avanzo M, Wei L, Stancanello J, Vallières M, Rao A, Morin O, et al. Machine and deep learning methods for radiomics. Med Phys . (2020) 47:e185–202. doi: 10.1002/mp.13678

Keywords: pancreatic neuroendocrine tumors, meta-analysis, radiomics, machine learning, deep learning

Citation: Yan Q, Chen Y, Liu C, Shi H, Han M, Wu Z, Huang S, Zhang C and Hou B (2024) Predicting histologic grades for pancreatic neuroendocrine tumors by radiologic image-based artificial intelligence: a systematic review and meta-analysis. Front. Oncol. 14:1332387. doi: 10.3389/fonc.2024.1332387

Received: 02 November 2023; Accepted: 02 April 2024; Published: 23 April 2024.

Copyright © 2024 Yan, Chen, Liu, Shi, Han, Wu, Huang, Zhang and Hou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shanzhou Huang, [email protected] ; Chuanzhao Zhang, [email protected] ; Baohua Hou, [email protected]

† These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

  • Open access
  • Published: 19 April 2024

Person-centered care assessment tool with a focus on quality healthcare: a systematic review of psychometric properties

  • Lluna Maria Bru-Luna 1 ,
  • Manuel Martí-Vilar 2 ,
  • César Merino-Soto 3 ,
  • José Livia-Segovia 4 ,
  • Juan Garduño-Espinosa 5 &
  • Filiberto Toledano-Toledano 5 , 6 , 7  

BMC Psychology, volume 12, Article number: 217 (2024)


The person-centered care (PCC) approach plays a fundamental role in ensuring quality healthcare. The Person-Centered Care Assessment Tool (P-CAT) is one of the shortest and simplest tools currently available for measuring PCC. The objective of this study was to conduct a systematic review of the evidence in validation studies of the P-CAT, taking the “Standards” as a frame of reference.

First, a systematic literature review was conducted following the PRISMA method. Second, a systematic descriptive literature review of validity tests was conducted following the “Standards” framework. The search strategy and information sources were obtained from the Cochrane, Web of Science (WoS), Scopus and PubMed databases. With regard to the eligibility criteria and selection process, a protocol was registered in PROSPERO (CRD42022335866), and articles had to meet criteria for inclusion in the systematic review.

A total of seven articles were included. Empirical evidence indicates that these validations offer a high number of sources related to test content, internal structure for dimensionality and internal consistency. A moderate number of sources pertain to internal structure in terms of test-retest reliability and the relationship with other variables. There is little evidence of response processes, internal structure in measurement invariance terms, and test consequences.

The various validations of the P-CAT are not framed in a structured, valid, theory-based procedural framework like the “Standards” are. This can affect clinical practice because people’s health may depend on it. The findings of this study show that validation studies continue to focus on the types of validity traditionally studied and overlook interpretation of the scores in terms of their intended use.

Person-centered care (PCC)

Quality care for people with chronic diseases, functional limitations, or both has become one of the main objectives of medical and care services. The person-centered care (PCC) approach is an essential element not only in achieving this goal but also in providing high-quality health maintenance and medical care [ 1 , 2 , 3 ]. In addition to guaranteeing human rights, PCC provides numerous benefits to both the recipient and the provider [ 4 , 5 ]. Additionally, PCC includes a set of necessary competencies for healthcare professionals to address ongoing challenges in this area [ 6 ]. PCC includes the following elements [ 7 ]: an individualized, goal-oriented care plan based on individuals’ preferences; an ongoing review of the plan and the individual’s goals; support from an interprofessional team; active coordination among all medical and care providers and support services; ongoing information exchange, education and training for providers; and quality improvement through feedback from the individual and caregivers.

There is currently a growing body of literature on the application of PCC. A good example of this is McCormack’s widely known mid-range theory [ 8 ], an internationally recognized theoretical framework for PCC and how it is operationalized in practice, which serves as a guide for care practitioners and researchers in hospital settings. Within this framework, PCC is conceived of as “an approach to practice that is established through the formation and fostering of therapeutic relationships between all care providers, service users, and others significant to them, underpinned by values of respect for persons, [the] individual right to self-determination, mutual respect, and understanding” [ 9 ].

Thus, as established by PCC, it is important to emphasize that reference to the person who is the focus of care refers not only to the recipient but also to everyone involved in a care interaction [ 10 , 11 ]. PCC ensures that professionals are trained in relevant skills and methodology since, as discussed above, carers are among the agents who have the greatest impact on the quality of life of the person in need of care [ 12 , 13 , 14 ]. Furthermore, due to the high burden of caregiving, it is essential to account for caregivers’ well-being. In this regard, studies on professional caregivers are beginning to suggest that the provision of PCC can produce multiple benefits for both the care recipient and the caregiver [ 15 ].

Despite a considerable body of literature and the frequent inclusion of the term in health policy and research [ 16 ], PCC involves several complications. There is no standard consensus on the definition of this concept [ 17 ], which includes problematic areas such as efficacy assessment [ 18 , 19 ]. In addition, the difficulty of measuring the subjectivity involved in identifying the dimensions of PCC and the infrequent use of standardized measures are acute issues [ 20 ]. These limitations motivated the creation of the Person-Centered Care Assessment Tool (P-CAT; [ 21 ]), which emerged from the need for a brief, economical, easily applied, versatile and comprehensive assessment instrument to provide valid and reliable measures of PCC for research purposes [ 21 ].

Person-centered care assessment tool (P-CAT)

There are several instruments that can measure PCC from different perspectives (i.e., the caregiver or the care recipient) and in different contexts (e.g., hospitals and nursing homes). However, from a practical point of view, the P-CAT is one of the shortest and simplest tools and contains all the essential elements of PCC described in the literature. It was developed in Australia to measure the approach of long-term residential settings to older people with dementia, although it is increasingly used in other healthcare settings, such as oncology units [ 22 ] and psychiatric hospitals [ 23 ].

Due to the brevity and simplicity of its application, the versatility of its use in different medical and care contexts, and its potential emic characteristics (i.e., constructs that can be cross-culturally applicable with reasonable and similar structure and interpretation; [ 24 ]), the P-CAT is one of the most widely used tests by professionals to measure PCC [ 25 , 26 ]. Since its creation, it has been adapted in countries separated by wide cultural and linguistic differences, such as Norway [ 27 ], Sweden [ 28 ], China [ 29 ], South Korea [ 30 ], Spain [ 25 ], and Italy [ 31 ].

The P-CAT comprises 13 items rated on a 5-point ordinal scale (from “strongly disagree” to “strongly agree”), with high scores indicating a high degree of person-centeredness. The scale consists of three dimensions: person-centered care (7 items), organizational support (4 items) and environmental accessibility (2 items). In the original study (n = 220; [ 21 ]), the internal consistency of the instrument yielded satisfactory values for the total scale (α = 0.84) and good test-retest reliability (r = .66) at one-week intervals. A reliability generalization study conducted in 2021 [ 32 ], which estimated the internal consistency of the P-CAT and analyzed possible factors that could affect it, revealed that the mean α value for the 25 meta-analysis samples (some of which were part of the validations included in this study) was 0.81, and the only variable with a statistically significant relationship with the reliability coefficient was the mean age of the sample. With respect to internal structure validity, three factors (56% of the total variance) were obtained, and content validity was assessed by experts, literature reviews and stakeholders [ 33 ].
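For readers less familiar with these indices, the sketch below shows how Cronbach's α and a test-retest Pearson correlation of the kind reported for the P-CAT can be computed from item-level data. It is a minimal, hypothetical illustration in Python on simulated 13-item, 5-point responses; the data, sample size and function names are assumptions, not the original authors' analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated responses: 220 respondents, 13 items on a 1-5 scale (hypothetical, P-CAT-like).
true_score = rng.normal(size=(220, 1))
time1 = np.clip(np.rint(3 + true_score + rng.normal(scale=0.9, size=(220, 13))), 1, 5)
time2 = np.clip(np.rint(3 + true_score + rng.normal(scale=0.9, size=(220, 13))), 1, 5)

alpha = cronbach_alpha(time1)

# Test-retest reliability: correlation of total scores across the two administrations.
retest_r = np.corrcoef(time1.sum(axis=1), time2.sum(axis=1))[0, 1]

print(f"Cronbach's alpha: {alpha:.2f}")
print(f"Test-retest r: {retest_r:.2f}")
```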

Although not explicitly stated, the apparent commonality between validation studies of different versions of the P-CAT seems to reflect a decades-old, influential validity framework that differentiates three categories: content validity, construct validity, and criterion validity [ 34 , 35 ]. However, a reformulation of the validity of the P-CAT within a modern framework, which would provide a different definition of validity, has not been performed.

Scale validity

Traditionally, validation is a process focused on the psychometric properties of a measurement instrument [ 36 ]. In the early 20th century, with the frequent use of standardized measurement tests in education and psychology, two definitions emerged: the first defined validity as the degree to which a test measures what it intends to measure, while the second described the validity of an instrument in terms of the correlation it presents with a variable [ 35 ].

However, over the past century, validity theory has evolved, leading to the understanding that validity should be based on specific interpretations for an intended purpose. It should not be limited to empirically obtained psychometric properties but should also be supported by the theory underlying the construct measured. Thus, speaking of classical versus modern validity theory reflects an evolution in how the concept of validity is understood, and a classical approach (classical test theory, CTT) is specifically differentiated from a modern approach. In general, recent concepts associated with a modern view of validity are based on (a) a unitary conception of validity and (b) validity judgments based on inferences and interpretations of the scores of a measure [ 37 , 38 ]. This conceptual advance led to the creation of a guiding framework for obtaining evidence to support the use and interpretation of the scores obtained by a measure [ 39 ].

This purpose is addressed by the Standards for Educational and Psychological Testing (“Standards”), a guide created by the American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME) in 2014 with the aim of providing guidelines to assess the validity of the interpretations of scores of an instrument based on their intended use. Two conceptual aspects stand out in this modern view of validity: first, validity is a unitary concept centered on the construct; second, validity is defined as “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” [ 37 ]. Thus, the “Standards” propose several sources that serve as a reference for assessing different aspects of validity. The five sources of validity evidence are as follows [ 37 ]: test content, response processes, internal structure, relations to other variables and consequences of testing. According to AERA et al. [ 37 ], test content validity refers to the relationship of the administration process, subject matter, wording and format of test items to the construct they are intended to measure. It is measured predominantly with qualitative methods but without excluding quantitative approaches. The validity of response processes is based on analysis of the cognitive processes and interpretation of the items by respondents and is measured with qualitative methods. Internal structure validity is based on the interrelationship between the items and the construct and is measured by quantitative methods. Validity in terms of the relationship with other variables is based on comparison between the variable that the instrument intends to measure and other theoretically relevant external variables and is measured by quantitative methods. Finally, validity based on the consequences of testing analyzes the consequences, both intended and unintended, that may be due to a source of invalidity. It is measured mainly by qualitative methods.

Thus, although validity plays a fundamental role in providing a strong scientific basis for interpretations of test scores, validation studies in the health field have traditionally focused on content validity, criterion validity and construct validity and have overlooked the interpretation and use of scores [ 34 ].

The “Standards” are considered a suitable validity theory-based procedural framework for reviewing the validity of questionnaires due to their ability to analyze sources of validity from both qualitative and quantitative approaches and their evidence-based method [ 35 ]. Nevertheless, due to a lack of knowledge or the lack of a systematic description protocol, very few instruments to date have been reviewed within the framework of the “Standards” [ 39 ].

Current study

Although the P-CAT is one of the most widely used instruments by professionals and has seven validations [ 25 , 27 , 28 , 29 , 30 , 31 , 40 ], no analysis has been conducted of its validity within the framework of the “Standards”. That is, empirical evidence of the validity of the P-CAT has not been obtained in a way that helps to develop a judgment based on a synthesis of the available information.

A review of this type is critical given that some methodological issues seem to have not been resolved in the P-CAT. For example, although the multidimensionality of the P-CAT was identified in the study that introduced it, Bru-Luna et al. [ 32 ] recently stated that in adaptations of the P-CAT [ 25 , 27 , 28 , 29 , 30 , 40 ], the total score is used for interpretation and multidimensionality is disregarded. Thus, the multidimensionality of the original study was apparently not replicated. Bru-Luna et al. [ 32 ] also indicated that the internal structure validity of the P-CAT is usually underreported due to a lack of sufficiently rigorous approaches to establish with certainty how its scores are calculated.

The validity of the P-CAT, specifically its internal structure, appears to be unresolved. Nevertheless, substantive research and professional practice point to this measure as relevant to assessing PCC. This perception is contestable and judgment-based and may not be sufficient to assess the validity of the P-CAT from a cumulative and synthetic angle based on preceding validation studies. An adequate assessment of validity requires a model to conceptualize validity followed by a review of previous studies of the validity of the P-CAT using this model.

Therefore, the main purpose of this study was to conduct a systematic review of the evidence provided by P-CAT validation studies while taking the “Standards” as a framework.

The present study comprises two distinct but interconnected procedures. First, a systematic literature review was conducted following the PRISMA method ( [ 41 ]; Additional file 1; Additional file 2) with the aim of collecting all validations of the P-CAT that have been developed. Second, a systematic description of the validity evidence for each of the P-CAT validations found in the systematic review was developed following the “Standards” framework [ 37 ]. The work of Hawkins et al. [ 39 ], the first study to review validity sources according to the guidelines proposed by the “Standards”, was also used as a reference. Both provided conceptual and pragmatic guidance for organizing and classifying validity evidence for the P-CAT.

The procedure conducted in the systematic review is described below, followed by the procedure for examining the validity studies.

Systematic review

Search strategy and information sources.

Initially, the Cochrane database was searched with the aim of identifying systematic reviews of the P-CAT. When no such reviews were found, subsequent preliminary searches were performed in the Web of Science (WoS), Scopus and PubMed databases. These databases play a fundamental role in recent scientific literature since they are the main sources of published articles that undergo high-quality content and editorial review processes [ 42 ]. The search strategy was as follows: the original P-CAT article [ 21 ] was located, after which all articles that cited it through 2021 were identified and analyzed. This approach ensured the inclusion of all validations. No articles were excluded on the basis of language to avoid language bias [ 43 ]. Moreover, to reduce the effects of publication bias, a complementary search in Google Scholar was also performed to allow the inclusion of “gray” literature [ 44 ]. Finally, a manual search was performed through a review of the references of the included articles to identify other articles that met the search criteria but were not present in any of the aforementioned databases.
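As an illustration of the record-merging and de-duplication step implied by searching several databases before screening, a minimal sketch follows. The CSV file names and column names (title, doi, source) are assumptions for demonstration only; in practice, dedicated review software can perform the same step.

```python
import pandas as pd

# Hypothetical exports, one CSV per database; file and column names are assumptions.
files = ["wos.csv", "scopus.csv", "pubmed.csv", "google_scholar.csv"]
records = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

# Normalize the keys used for duplicate detection.
records["doi_norm"] = records["doi"].str.lower().str.strip()
records["title_norm"] = (records["title"].str.lower()
                         .str.replace(r"[^a-z0-9 ]", "", regex=True).str.strip())

# Remove duplicates by DOI where a DOI exists, then by normalized title.
has_doi = records["doi_norm"].notna()
deduped = pd.concat([records[has_doi].drop_duplicates(subset="doi_norm"),
                     records[~has_doi]])
deduped = deduped.drop_duplicates(subset="title_norm", keep="first")

print(f"{len(records)} records retrieved, {len(deduped)} unique records for screening")
```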

This process was conducted by one of the authors and corroborated by another using the Covidence tool [ 45 ]. A third author was consulted in case of doubt.

Eligibility criteria and selection process

The protocol was registered in PROSPERO, and the search was conducted according to these criteria. The identification code is CRD42022335866.

The articles had to meet the following criteria for inclusion in the systematic review: (a) a methodological approach to P-CAT validation, (b) experimental or quasi-experimental studies, (c) studies with any type of sample, and (d) studies in any language. We discarded studies that met at least one of the following exclusion criteria: (a) systematic reviews or bibliometric reviews of the instrument or meta-analyses or (b) studies published after 2021.

Data collection process

After the articles were selected, the most relevant information was extracted from each article. Fundamental data were recorded in an Excel spreadsheet for each of the sections: introduction, methodology, results and discussion. Information was also recorded about the limitations mentioned in each article as well as the practical implications and suggestions for future research.

Given the aim of the study, information was collected about the sources of validity of each study, including test content (judges’ evaluation, literature review and translation), response processes, internal structure (factor analysis, design, estimator, factor extraction method, factors and items, interfactor R, internal replication, effect of the method, and factor loadings), relationships with other variables (convergent, divergent, concurrent and predictive validity), and consequences of measurement.

Description of the validity study

To assess the validity of the studies, an Excel table was used. Information was recorded for the seven articles included in the systematic review. The data were extracted directly from the texts of the articles and included information about the authors, the year of publication, the country where each P-CAT validation was produced and each of the five standards proposed in the “Standards” [ 37 ].

The validity source related to internal structure was divided into three sections to record information about dimensionality (e.g., factor analysis, design, estimator, factor extraction method, factors and items, interfactor R, internal replication, effect of the method, and factor loadings), reliability expression (i.e., internal consistency and test-retest) and the study of factorial invariance according to the groups into which it was divided (e.g., sex, age, profession) and the level of study (i.e., metric, intercepts). This approach allowed much more information to be obtained than relying on internal structure as a single, undifferentiated source of validity. This division was performed by the same researcher who performed the previous processes.
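A minimal sketch of how such an extraction table might be set up programmatically rather than in a spreadsheet is shown below; the column names and the example row are illustrative assumptions based on the fields described above, not the authors' actual template.

```python
import pandas as pd

# Illustrative extraction template: one row per included validation study.
# Column names are assumptions mirroring the fields described in the text.
columns = [
    "authors", "year", "country",
    "content_translation", "content_expert_judges", "content_experiential_judges",
    "response_processes",
    "structure_factor_analysis", "structure_extraction_method", "structure_factors_items",
    "structure_interfactor_r", "structure_internal_consistency", "structure_test_retest",
    "structure_invariance",
    "relations_convergent", "relations_discriminant", "relations_concurrent",
    "relations_predictive",
    "test_consequences",
]

# A single hypothetical row showing how one study's evidence could be coded.
example_row = {col: None for col in columns}
example_row.update({
    "authors": "Example et al.", "year": 2016, "country": "Spain",
    "content_translation": True, "response_processes": False,
    "structure_factor_analysis": "EFA", "structure_internal_consistency": "alpha = 0.88",
    "structure_test_retest": "r = .79 (7 days)", "structure_invariance": False,
    "relations_convergent": True, "test_consequences": False,
})
extraction_table = pd.DataFrame([example_row], columns=columns)
print(extraction_table.T)  # transpose for easier reading of a single study
```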

Study selection and study characteristics

The systematic review process was developed according to the PRISMA methodology [ 41 ].

The WoS, Scopus, PubMed and Google Scholar databases were searched on February 12, 2022 and yielded a total of 485 articles. Of these, 111 were found in WoS, 114 in Scopus, 43 in PubMed and 217 in Google Scholar. In the first phase, the title and abstracts of all the articles were read. In this first screening, 457 articles were eliminated because they did not include studies with a methodological approach to P-CAT validation and one article was excluded because it was the original P-CAT article. This resulted in a total of 27 articles, 19 of which were duplicated in different databases and, in the case of Google Scholar, within the same database. This process yielded a total of eight articles that were evaluated for eligibility by a complete reading of the text. In this step, one of the articles was excluded due to a lack of access to the full text of the study [ 31 ] (although the original manuscript was found, it was impossible to access the complete content; in addition, the authors of the manuscript were contacted, but no reply was received). Finally, a manual search was performed by reviewing the references of the seven studies, but none were considered suitable for inclusion. Thus, the review was conducted with a total of seven articles.
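The selection counts reported above can be cross-checked with a few lines of arithmetic; the sketch below simply re-derives the totals from the figures given in the text.

```python
# Consistency check of the study-selection counts reported in the text.
retrieved = {"WoS": 111, "Scopus": 114, "PubMed": 43, "Google Scholar": 217}
total_records = sum(retrieved.values())      # 485 records retrieved
after_screening = total_records - 457 - 1    # 27 left after removing non-validation articles and the original P-CAT article
unique_full_text = after_screening - 19      # 8 unique articles assessed in full text
included = unique_full_text - 1              # 7 included (one lacked full-text access)

assert (total_records, after_screening, unique_full_text, included) == (485, 27, 8, 7)
print(total_records, after_screening, unique_full_text, included)
```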

Of the seven studies, six were original validations in other languages. These included Norwegian [ 27 ], Swedish [ 28 ], Chinese (which has two validations [ 29 , 40 ]), Spanish [ 25 ], and Korean [ 30 ]. The study by Selan et al. [ 46 ] included a modification of the Swedish version of the P-CAT and explored the psychometric properties of both versions (i.e., the original Swedish version and the modified version).

The article selection and screening process is illustrated in detail in Fig. 1.

Figure 1. PRISMA 2020 flow diagram for new systematic reviews including database searches.

Validity analysis

To provide a clear overview of the validity analyses, Table 1 descriptively shows the percentages of articles that provide information about the five standards proposed by the “Standards” guide [ 37 ].

The table shows a high number of validity sources related to test content and internal structure in relation to dimensionality and internal consistency, followed by a moderate number of sources for test-retest and relationship with other variables. A rate of 0% is observed for validity sources related to response processes, invariance and test consequences. Below, different sections related to each of the standards are shown, and the information is presented in more detail.
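The percentages in Table 1 follow directly from counts over the seven included validations. The sketch below shows that tally; the per-source counts summarize the narrative in the following subsections (all seven articles report content, dimensionality and internal consistency evidence; four report test-retest and convergent evidence; none report response processes, invariance or consequences) and are not a verbatim reproduction of the authors' table.

```python
# Tally of validity sources across the seven included studies; counts are taken
# from the narrative below rather than reproducing Table 1 itself.
n_studies = 7
reported = {
    "test content": 7,
    "internal structure: dimensionality": 7,
    "internal structure: internal consistency": 7,
    "internal structure: test-retest": 4,
    "internal structure: invariance": 0,
    "relations to other variables": 4,
    "response processes": 0,
    "consequences of testing": 0,
}
for source, count in reported.items():
    print(f"{source}: {count}/{n_studies} ({100 * count / n_studies:.0f}%)")
```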

Evidence based on test content

The first standard, which focused on test content, was met by all articles (100%). Translation, which refers to the equivalence of content between the original language and the target language, was addressed in the six articles that conducted validation in another language and/or culture. These studies reported that the validations were translated by bilingual experts and/or experts in the area of care. In addition, three studies [ 25 , 29 , 40 ] reported that the translation process followed International Test Commission guidelines, such as those of Beaton et al. [ 47 ], Guillemin [ 48 ], Hambleton et al. [ 49 ], and Muñiz et al. [ 50 ]. Evaluation by judges, which referred to the relevance, clarity and importance of the content, was divided into two categories: expert evaluation (a panel of expert judges for each of the areas considered in the evaluation instrument) and experiential evaluation (potential participants testing the test). The first type of evaluation occurred in three of the articles [ 28 , 29 , 46 ], while the other occurred in two [ 25 , 40 ]. Only one of the articles [ 29 ] reported that the scale contained items that reflected the dimensions described in the literature. The validity evidence related to the test content presented in each article can be found in Table 2.

Evidence based on response processes

The second standard, related to the validity of response processes, is obtained, according to the “Standards”, from the analysis of individual responses: “questioning test takers about their performance strategies or response to particular items (…), maintaining records that monitor the development of a response to a writing task (…), documentation of other aspects of performance, like eye movement or response times…” [ 37 ] (p. 15). None of the articles provided this type of evidence.

Evidence based on internal structure

The third standard, validity related to internal structure, was divided into three sections. First, the dimensionality of each study was examined in terms of factor analysis, design, estimator, factor extraction method, factors and items, interfactor R, internal replication, effect of the method, and factor loadings. Le et al. [ 40 ] used an exploratory-confirmatory design, while Sjögren et al. [ 28 ] used a confirmatory-exploratory design to assess construct validity using confirmatory factor analysis (CFA) and investigated it further using exploratory factor analysis (EFA). The remaining articles employed only a single form of factor analysis: three employed EFA, and two employed CFA. Regarding the factor extraction method, only three of the articles reported the method used, including Kaiser’s eigenvalue criterion, the scree plot test, parallel analysis and Velicer’s MAP test. Instrument validations yielded a total of two factors in five of the seven articles, while one yielded a single dimension [ 25 ] and another yielded three dimensions [ 29 ], as in the original instrument. The interfactor R was reported only in the study by Zhong and Lou [ 29 ], whereas in the study by Martínez et al. [ 25 ], it could be easily obtained since the scale consisted of only one dimension. Internal replication was also calculated in the Spanish validation by randomly splitting the sample into two halves to test the correlations between factors. The effect of the method was not reported in any of the articles. This information is presented in Table 3 in addition to a summary of the factor loadings.
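For readers unfamiliar with the factor retention criteria named here, the sketch below illustrates two of them, Kaiser's eigenvalue-greater-than-one rule and parallel analysis, on simulated item data using plain NumPy. It is a didactic illustration under assumed data (a single underlying factor, 13 items), not a re-analysis of any P-CAT dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated 13-item dataset driven by a single underlying factor (hypothetical, P-CAT-sized).
n, k = 300, 13
factor = rng.normal(size=(n, 1))
data = factor @ rng.uniform(0.4, 0.8, size=(1, k)) + rng.normal(scale=0.7, size=(n, k))

# Eigenvalues of the observed correlation matrix, sorted in descending order.
eigvals = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

# Kaiser criterion: retain factors with eigenvalue greater than 1.
kaiser_factors = int(np.sum(eigvals > 1))

# Parallel analysis: compare against the 95th percentile of eigenvalues from random data.
sim_eigs = np.array([
    np.sort(np.linalg.eigvalsh(np.corrcoef(rng.normal(size=(n, k)), rowvar=False)))[::-1]
    for _ in range(200)
])
threshold = np.percentile(sim_eigs, 95, axis=0)
pa_factors = int(np.sum(eigvals > threshold))

print(f"Kaiser criterion suggests {kaiser_factors} factor(s)")
print(f"Parallel analysis suggests {pa_factors} factor(s)")
```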

The second section examined reliability. All the studies reported internal consistency, in every case using Cronbach’s α coefficient for both the total scale and the subscales; McDonald’s ω coefficient was not used in any case. Four of the seven articles performed a test-retest analysis. Martínez et al. [ 25 ] conducted the retest after a period of seven days, while Le et al. [ 40 ] and Rokstad et al. [ 27 ] performed it between one and two weeks later, and Sjögren et al. [ 28 ] allowed approximately two weeks to pass after the initial test.

The third section analyzes the calculation of invariance, which was not reported in any of the studies.

Evidence based on relationships with other variables

In the fourth standard, based on validity according to the relationship with other variables, the articles that reported it used only convergent validity (i.e., it was hypothesized that the variables related to the construct measured by the test—in this case, person-centeredness—were positively or negatively related to another construct). Discriminant validity hypothesizes that the variables related to the PCC construct are not correlated in any way with any other variable studied. No article (0%) measured discriminant evidence, while four (57%) measured convergent evidence [ 25 , 29 , 30 , 46 ]. Convergent validity was obtained through comparisons with instruments such as the Person-Centered Climate Questionnaire–Staff Version (PCQ-S), the Staff-Based Measures of Individualized Care for Institutionalized Persons with Dementia (IC), the Caregiver Psychological Elder Abuse Behavior Scale (CPEAB), the Organizational Climate (CLIOR) and the Maslach Burnout Inventory (MBI). In the case of Selan et al. [ 46 ], convergent validity was assessed on two items considered by the authors as “crude measures of person-centered care (i.e., external constructs) giving an indication of the instruments’ ability to measure PCC” (p. 4). Concurrent validity, which measures the degree to which the results of one test are or are not similar to those of another test conducted at more or less the same time with the same participants, and predictive validity, which allows predictions to be established regarding behavior based on comparison between the values of the instrument and the criterion, were not reported in any of the studies.
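To make the idea of convergent evidence concrete, the sketch below correlates simulated P-CAT total scores with simulated totals from a second, generic person-centeredness measure; the sample size, score distributions and the pairing itself are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated total scores for 150 hypothetical staff respondents.
latent = rng.normal(size=150)
pcat_total = 39 + 8 * latent + rng.normal(scale=5, size=150)    # P-CAT-like totals
other_total = 50 + 10 * latent + rng.normal(scale=7, size=150)  # e.g., a climate questionnaire

# Convergent validity: the two person-centeredness measures should correlate positively.
r = np.corrcoef(pcat_total, other_total)[0, 1]
print(f"Convergent validity (Pearson r): {r:.2f}")
```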

Evidence based on the consequences of testing

The fifth and final standard was related to the consequences of the test. It analyzed the consequences, both intended and unintended, of applying the test to a given sample. None of the articles presented explicit or implicit evidence of this.

The last two sources of validity can be seen in Table  4 .

Table  5 shows the results of the set of validity tests for each study according to the described standards.

The main purpose of this article is to analyze the evidence of validity in different validation studies of the P-CAT. To gather all existing validations, a systematic review of all literature citing this instrument was conducted.

The publication of validation studies of the P-CAT has been constant over the years. Since the publication of the original instrument in 2010, seven validations have been published in other languages (taking into account the Italian version by Brugnolli et al. [ 31 ], which could not be included in this study) as well as a modification of one of these versions. The very unequal distribution of validations between languages and countries is striking. A recent systematic review [ 51 ] revealed that in Europe, the countries where the PCC approach is most widely used are the United Kingdom, Sweden, the Netherlands, Northern Ireland, and Norway. It has also been shown that the neighboring countries seem to exert an influence on each other due to proximity [ 52 ] such that they tend to organize healthcare in a similar way, as is the case for Scandinavian countries. This favors the expansion of PCC and explains the numerous validations we found in this geographical area.

Although this approach is conceived as an essential element of healthcare for most governments [ 53 ], PCC varies according to the different definitions and interpretations attributed to it, which can cause confusion in its application (e.g., between Norway and the United Kingdom [ 54 ]). Moreover, facilitators of or barriers to implementation depend on the context and level of development of each country, and financial support remains one of the main factors in this regard [ 53 ]. This fact explains why PCC is not globally widespread among all territories. In countries where access to healthcare for all remains out of reach for economic reasons, the application of this approach takes a back seat, as does the validation of its assessment tools. In contrast, in a large part of Europe or in countries such as China or South Korea that have experienced decades of rapid economic development, patients are willing to be involved in their medical treatment and enjoy more satisfying and efficient medical experiences and environments [ 55 ], which facilitates the expansion of validations of instruments such as the P-CAT.

Regarding validity testing, the guidelines proposed by the “Standards” [ 37 ] were followed. According to the analysis of the different validations of the P-CAT instrument, none of the studies used a structured validity theory-based procedural framework for conducting validation. The most frequently reported validity tests were on the content of the test and two of the sections into which the internal structure was divided (i.e., dimensionality and internal consistency).

In the present article, the most cited source of validity in the studies was the content of the test because most of the articles were validations of the P-CAT in other languages, and the authors reported that the translation procedure was conducted by experts in all cases. In addition, several of the studies employed International Test Commission guidelines, such as those by Beaton et al. [ 47 ], Guillemin [ 48 ], Hambleton et al. [ 49 ], and Muñiz et al. [ 50 ]. Several studies also assessed the relevance, clarity and importance of the content.

The third source of validity, internal structure, was the next most often reported, although it appeared unevenly among the three sections into which this evidence was divided. Dimensionality and internal consistency were reported in all studies, followed by test-retest consistency. In relation to the first section, factor analysis, a total of five EFAs and four CFAs were presented in the validations. Traditionally, EFA has been used in research to assess dimensionality and identify key psychological constructs, although this approach involves a number of inconveniences, such as difficulty testing measurement invariance and incorporating latent factors into subsequent analyses [ 56 ] or the major problem of factor loading matrix rotation [ 57 ]. Studies eventually began to employ CFA, a technique that overcame some of these obstacles [ 56 ] but had other drawbacks; for example, the strict requirement of zero cross-loadings often does not fit the data well, and misspecification of zero loadings tends to produce distorted factors [ 57 ]. Recently, exploratory structural equation modeling (ESEM) has been proposed. This technique is widely recommended both conceptually and empirically to assess the internal structure of psychological tools [ 58 ] since it overcomes the limitations of EFA and CFA in estimating their parameters [ 56 , 57 ].

The next section, reliability, was reported in all of the articles by means of Cronbach’s α reliability coefficient. Reliability is defined as a combination of systematic and random influences that determine the observed scores on a psychological test. Reporting a reliability measure ensures that item-based scores are consistent, that the tool’s responses are replicable and that they are not driven solely by random noise [ 59 , 60 ]. Currently, the most commonly employed reliability coefficient in studies with a multi-item measurement scale (MIMS) is Cronbach’s α [ 60 , 61 ].

Cronbach’s α [ 62 ] is based on numerous strict assumptions (e.g., the test must be unidimensional, factor loadings must be equal for all items and item errors should not covary) to estimate internal consistency. These assumptions are difficult to meet, and their violation may produce small reliability estimates [ 60 ]. One of the alternative measures to α that is increasingly recommended by the scientific literature is McDonald’s ω [ 63 ], a composite reliability measure. This coefficient is recommended for congeneric scales in which tau equivalence is not assumed. It has several advantages. For example, estimates of ω are usually robust when the estimated model contains more factors than the true model, even with small samples, or when skewness in univariate item distributions produces lower biases than those found when using α [ 59 ].
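To make the contrast concrete, the following sketch computes McDonald's ω (total) for a unidimensional congeneric model directly from standardized factor loadings, using the usual composite reliability formula; the loading values are illustrative assumptions, not estimates from any P-CAT sample.

```python
import numpy as np

def mcdonald_omega(loadings: np.ndarray) -> float:
    """McDonald's omega (total) for a unidimensional congeneric model,
    computed from standardized factor loadings with uncorrelated errors."""
    lam_sum_sq = loadings.sum() ** 2
    error_var = (1 - loadings ** 2).sum()
    return lam_sum_sq / (lam_sum_sq + error_var)

# Illustrative standardized loadings for a 13-item scale (values are assumptions).
loadings = np.array([0.72, 0.65, 0.58, 0.70, 0.61, 0.55, 0.68,
                     0.63, 0.59, 0.66, 0.57, 0.62, 0.60])

print(f"McDonald's omega: {mcdonald_omega(loadings):.2f}")
```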

The test-retest method was the next most commonly reported internal structure section in these studies. This type of reliability considers the consistency of the scores of a test between two measurements separated by a period of time [ 64 ]. It is striking that test-retest consistency is not reported as often as internal consistency since, unlike internal consistency, test-retest consistency can be assessed for practically all types of patient-reported outcomes. Some measurement experts even consider it a more relevant way of reporting reliability than internal consistency, since it plays a fundamental role in the calculation of parameters for health measures [ 64 ]. However, the literature provides little guidance regarding the assessment of this type of reliability.

The internal structure section that was least frequently reported in the studies in this review was invariance. A lack of invariance refers to a difference between scores on a test that is not explained by group differences in the structure it is intended to measure [ 65 ]. The invariance of the measure should be emphasized as a prerequisite in comparisons between groups since “if scale invariance is not examined, item bias may not be fully recognized and this may lead to a distorted interpretation of the bias in a particular psychological measure” [ 65 ].

Evidence related to other variables was the next most reported source of validity in the studies included in this review. Specifically, the four studies that reported this evidence did so according to convergent validity and cited several instruments. None of the studies included evidence of discriminant validity, although this may be because there are currently several obstacles related to the measurement of this type of validity [ 66 ]. On the one hand, different definitions are used in the applied literature, which makes its evaluation difficult; on the other hand, the literature on discriminant validity focuses on techniques that require the use of multiple measurement methods, which often seem to have been introduced without sufficient evidence or are applied randomly.

Validity related to response processes was not reported by any of the studies. There are several methods to analyze this validity. These methods can be divided into two groups: “those that directly access the psychological processes or cognitive operations (think aloud, focus group, and interviews), compared to those which provide indirect indicators which in turn require additional inference (eye tracking and response times)” [ 38 ]. However, this validity evidence has traditionally been reported less frequently than others in most studies, perhaps because there are fewer clear and accepted practices on how to design or report these studies [ 67 ].

Finally, the consequences of testing were not reported in any of the studies. There is debate regarding this source of validity, with two main opposing streams of thought. On the one hand, some authors [ 68 , 69 ] suggest that consequences that appear after the application of a test should not derive from any source of test invalidity and that “adverse consequences only undermine the validity of an assessment if they can be attributed to a problem of fit between the test and the construct” (p. 6). In contrast, Cronbach [ 69 , 70 ] notes that adverse social consequences resulting from the application of a test may call into question its validity. However, the potential risks that may arise from the application of a test should be minimized in any case, especially in regard to health assessments. To this end, it is essential that this aspect be assessed by instrument developers and that the experiences of respondents be protected through the development of comprehensive and informed practices [ 39 ].

This work is not without limitations. First, not all published validation studies of the P-CAT, such as the Italian version by Brugnolli et al. [ 31 ], were available. These studies could have provided relevant information. Second, many sources of validity could not be analyzed because the studies provided scant or no data, such as response processes [ 25 , 27 , 28 , 29 , 30 , 40 , 46 ], relationships with other variables [ 27 , 28 , 40 ], consequences of testing [ 25 , 27 , 28 , 29 , 30 , 40 , 46 ], or invariance [ 25 , 27 , 28 , 29 , 30 , 40 , 46 ] in the case of internal structure and interfactor R [ 27 , 28 , 30 , 40 , 46 ], internal replication [ 27 , 28 , 29 , 30 , 40 , 46 ] or the effect of the method [ 25 , 27 , 28 , 29 , 30 , 40 , 46 ] in the case of dimensionality. In the future, it is hoped that authors will become aware of the importance of validity, as shown in this article and many others, and provide data on unreported sources so that comprehensive validity studies can be performed.

The present work also has several strengths. The search was extensive, and many studies were obtained using three different databases, including WoS, one of the most widely used and authoritative databases in the world. This database includes a large number and variety of articles and is not fully automated due to its human team [ 71 , 72 , 73 ]. In addition, to prevent publication bias, gray literature search engines such as Google Scholar were used to avoid the exclusion of unpublished research [ 44 ]. Finally, linguistic bias was prevented by not limiting the search to articles published in only one or two languages, thus avoiding the overrepresentation of studies in one language and underrepresentation in others [ 43 ].

Conclusions

Validity is understood as the degree to which tests and theory support the interpretations of instrument scores for their intended use [ 37 ]. From this perspective, the various validations of the P-CAT are not presented in a structured, valid, theory-based procedural framework like the “Standards” are. After integration and analysis of the results, it was observed that these validation reports offer a high number of sources of validity related to test content, internal structure in dimensionality and internal consistency, a moderate number of sources for internal structure in terms of test-retest reliability and the relationship with other variables, and a very low number of sources for response processes, internal structure in terms of invariance, and test consequences.

Validity plays a fundamental role in ensuring a sound scientific basis for test interpretations because it provides evidence of the extent to which the data provided by the test are valid for the intended purpose. This can affect clinical practice as people’s health may depend on it. In this sense, the “Standards” are considered a suitable and valid theory-based procedural framework for studying this modern conception of questionnaire validity, which should be taken into account in future research in this area.

Although the P-CAT is one of the most widely used instruments for assessing PCC, as this study shows, its validity has rarely been examined within such a framework. The developers of measurement tests applied to the healthcare setting, on which the health and quality of life of many people may depend, should use this validity framework to reflect the clear purpose of the measurement. This approach is important because the equity of decision making by healthcare professionals in daily clinical practice may depend on the sources of validity. Through a more extensive study of validity that includes the interpretation of scores in terms of their intended use, the applicability of the P-CAT, an instrument that was initially developed for long-term care homes for elderly people, could be expanded to other care settings. However, the findings of this study show that validation studies continue to focus on traditionally studied types of validity and overlook the interpretation of scores in terms of their intended use.

Data availability

All data relevant to the study were included in the article or uploaded as additional files. Additional template data extraction forms are available from the corresponding author upon reasonable request.

Abbreviations

AERA: American Educational Research Association

APA: American Psychological Association

CFA: Confirmatory factor analysis

CLIOR: Organizational Climate

CPEAB: Caregiver Psychological Elder Abuse Behavior Scale

EFA: Exploratory factor analysis

ESEM: Exploratory structural equation modeling

IC: Staff-Based Measures of Individualized Care for Institutionalized Persons with Dementia

MBI: Maslach Burnout Inventory

MIMS: Multi-item measurement scale

ML: Maximum likelihood

NCME: National Council on Measurement in Education

P-CAT: Person-Centered Care Assessment Tool

PCC: Person-centered care

PCQ-S: Person-Centered Climate Questionnaire–Staff Version

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

PROSPERO: International Register of Systematic Review Protocols

“Standards”: Standards for Educational and Psychological Testing

WLSMV: Weighted least square mean and variance adjusted

WoS: Web of Science

Institute of Medicine. Crossing the quality chasm: a new health system for the 21st century. Washington, DC: National Academy; 2001.

International Alliance of Patients’ Organizations. What is patient-centred healthcare? A review of definitions and principles. 2nd ed. London, UK: International Alliance of Patients’ Organizations; 2007.

World Health Organization. WHO global strategy on people-centred and integrated health services: interim report. Geneva, Switzerland: World Health Organization; 2015.

Britten N, Ekman I, Naldemirci Ö, Javinger M, Hedman H, Wolf A. Learning from Gothenburg model of person centred healthcare. BMJ. 2020;370:m2738.

Van Diepen C, Fors A, Ekman I, Hensing G. Association between person-centred care and healthcare providers’ job satisfaction and work-related health: a scoping review. BMJ Open. 2020;10:e042658.

Ekman N, Taft C, Moons P, Mäkitalo Å, Boström E, Fors A. A state-of-the-art review of direct observation tools for assessing competency in person-centred care. Int J Nurs Stud. 2020;109:103634.

American Geriatrics Society Expert Panel on Person-Centered Care. Person-centered care: a definition and essential elements. J Am Geriatr Soc. 2016;64:15–8.

McCormack B, McCance TV. Development of a framework for person-centred nursing. J Adv Nurs. 2006;56:472–9.

McCormack B, McCance T. Person-centred practice in nursing and health care: theory and practice. Chichester, England: Wiley; 2016.

Nolan MR, Davies S, Brown J, Keady J, Nolan J. Beyond person-centred care: a new vision for gerontological nursing. J Clin Nurs. 2004;13:45–53.

McCormack B, McCance T. Person-centred nursing: theory, models and methods. Oxford, UK: Wiley-Blackwell; 2010.

Abraha I, Rimland JM, Trotta FM, Dell’Aquila G, Cruz-Jentoft A, Petrovic M, et al. Systematic review of systematic reviews of non-pharmacological interventions to treat behavioural disturbances in older patients with dementia. The SENATOR-OnTop series. BMJ Open. 2017;7:e012759.

Anderson K, Blair A. Why we need to care about the care: a longitudinal study linking the quality of residential dementia care to residents’ quality of life. Arch Gerontol Geriatr. 2020;91:104226.

Bauer M, Fetherstonhaugh D, Haesler E, Beattie E, Hill KD, Poulos CJ. The impact of nurse and care staff education on the functional ability and quality of life of people living with dementia in aged care: a systematic review. Nurse Educ Today. 2018;67:27–45.

Smythe A, Jenkins C, Galant-Miecznikowska M, Dyer J, Downs M, Bentham P, et al. A qualitative study exploring nursing home nurses’ experiences of training in person centred dementia care on burnout. Nurse Educ Pract. 2020;44:102745.

McCormack B, Borg M, Cardiff S, Dewing J, Jacobs G, Janes N, et al. Person-centredness– the ‘state’ of the art. Int Pract Dev J. 2015;5:1–15.

Wilberforce M, Challis D, Davies L, Kelly MP, Roberts C, Loynes N. Person-centredness in the care of older adults: a systematic review of questionnaire-based scales and their measurement properties. BMC Geriatr. 2016;16:63.

Rathert C, Wyrwich MD, Boren SA. Patient-centered care and outcomes: a systematic review of the literature. Med Care Res Rev. 2013;70:351–79.

Sharma T, Bamford M, Dodman D. Person-centred care: an overview of reviews. Contemp Nurse. 2016;51:107–20.

Ahmed S, Djurkovic A, Manalili K, Sahota B, Santana MJ. A qualitative study on measuring patient-centered care: perspectives from clinician-scientists and quality improvement experts. Health Sci Rep. 2019;2:e140.

Edvardsson D, Fetherstonhaugh D, Nay R, Gibson S. Development and initial testing of the person-centered Care Assessment Tool (P-CAT). Int Psychogeriatr. 2010;22:101–8.

Tamagawa R, Groff S, Anderson J, Champ S, Deiure A, Looyis J, et al. Effects of a provincial-wide implementation of screening for distress on healthcare professionals’ confidence and understanding of person-centered care in oncology. J Natl Compr Canc Netw. 2016;14:1259–66.

Degl’ Innocenti A, Wijk H, Kullgren A, Alexiou E. The influence of evidence-based design on staff perceptions of a supportive environment for person-centered care in forensic psychiatry. J Forensic Nurs. 2020;16:E23–30.

Hulin CL. A psychometric theory of evaluations of item and scale translations: fidelity across languages. J Cross Cult Psychol. 1987;18:115–42.

Martínez T, Suárez-Álvarez J, Yanguas J, Muñiz J. Spanish validation of the person-centered Care Assessment Tool (P-CAT). Aging Ment Health. 2016;20:550–8.

Martínez T, Martínez-Loredo V, Cuesta M, Muñiz J. Assessment of person-centered care in gerontology services: a new tool for healthcare professionals. Int J Clin Health Psychol. 2020;20:62–70.

Rokstad AM, Engedal K, Edvardsson D, Selbaek G. Psychometric evaluation of the Norwegian version of the person-centred Care Assessment Tool. Int J Nurs Pract. 2012;18:99–105.

Sjögren K, Lindkvist M, Sandman PO, Zingmark K, Edvardsson D. Psychometric evaluation of the Swedish version of the person-centered Care Assessment Tool (P-CAT). Int Psychogeriatr. 2012;24:406–15.

Zhong XB, Lou VW. Person-centered care in Chinese residential care facilities: a preliminary measure. Aging Ment Health. 2013;17:952–8.

Tak YR, Woo HY, You SY, Kim JH. Validity and reliability of the person-centered Care Assessment Tool in long-term care facilities in Korea. J Korean Acad Nurs. 2015;45:412–9.

Brugnolli A, Debiasi M, Zenere A, Zanolin ME, Baggia M. The person-centered Care Assessment Tool in nursing homes: psychometric evaluation of the Italian version. J Nurs Meas. 2020;28:555–63.

Bru-Luna LM, Martí-Vilar M, Merino-Soto C, Livia J. Reliability generalization study of the person-centered Care Assessment Tool. Front Psychol. 2021;12:712582.

Edvardsson D, Innes A. Measuring person-centered care: a critical comparative review of published tools. Gerontologist. 2010;50:834–46.

Hawkins M, Elsworth GR, Nolte S, Osborne RH. Validity arguments for patient-reported outcomes: justifying the intended interpretation and use of data. J Patient Rep Outcomes. 2021;5:64.

Sireci SG. On the validity of useless tests. Assess Educ Princ Policy Pract. 2016;23:226–35.

Hawkins M, Elsworth GR, Osborne RH. Questionnaire validation practice: a protocol for a systematic descriptive literature review of health literacy assessments. BMJ Open. 2019;9:e030753.

American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for educational and psychological testing. Washington, DC: American Educational Research Association; 2014.

Padilla JL, Benítez I. Validity evidence based on response processes. Psicothema. 2014;26:136–44.

Hawkins M, Elsworth GR, Hoban E, Osborne RH. Questionnaire validation practice within a theoretical framework: a systematic descriptive literature review of health literacy assessments. BMJ Open. 2020;10:e035974.

Le C, Ma K, Tang P, Edvardsson D, Behm L, Zhang J, et al. Psychometric evaluation of the Chinese version of the person-centred Care Assessment Tool. BMJ Open. 2020;10:e031580.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int J Surg. 2021;88:105906.

Falagas ME, Pitsouni EI, Malietzis GA, Pappas G. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB J. 2008;22:338–42.

Grégoire G, Derderian F, Le Lorier J. Selecting the language of the publications included in a meta-analysis: is there a tower of Babel bias? J Clin Epidemiol. 1995;48:159–63.

Arias MM. Aspectos metodológicos del metaanálisis (1) [Methodological aspects of meta-analysis (1)]. Pediatr Aten Primaria. 2018;20:297–302.

Covidence. Covidence systematic review software. Veritas Health Innovation, Australia. 2014. https://www.covidence.org/ . Accessed 28 Feb 2022.

Selan D, Jakobsson U, Condelius A. The Swedish P-CAT: modification and exploration of psychometric properties of two different versions. Scand J Caring Sci. 2017;31:527–35.

Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine (Phila Pa 1976). 2000;25:3186–91.

Guillemin F. Cross-cultural adaptation and validation of health status measures. Scand J Rheumatol. 1995;24:61–3.

Hambleton R, Merenda P, Spielberger C. Adapting educational and psychological tests for cross-cultural assessment. Mahwah, NJ: Lawrence Erlbaum Associates; 2005.

Muñiz J, Elosua P, Hambleton RK. International test commission guidelines for test translation and adaptation: second edition. Psicothema. 2013;25:151–7.

Rosengren K, Brannefors P, Carlstrom E. Adoption of the concept of person-centred care into discourse in Europe: a systematic literature review. J Health Organ Manag. 2021;35:265–80.

Alharbi T, Olsson LE, Ekman I, Carlström E. The impact of organizational culture on the outcome of hospital care: after the implementation of person-centred care. Scand J Public Health. 2014;42:104–10.

Bensbih S, Souadka A, Diez AG, Bouksour O. Patient centered care: focus on low and middle income countries and proposition of new conceptual model. J Med Surg Res. 2020;7:755–63.

Stranz A, Sörensdotter R. Interpretations of person-centered dementia care: same rhetoric, different practices? A comparative study of nursing homes in England and Sweden. J Aging Stud. 2016;38:70–80.

Zhou LM, Xu RH, Xu YH, Chang JH, Wang D. Inpatients’ perception of patient-centered care in Guangdong province, China: a cross-sectional study. Inquiry. 2021. https://doi.org/10.1177/00469580211059482 .

Marsh HW, Morin AJ, Parker PD, Kaur G. Exploratory structural equation modeling: an integration of the best features of exploratory and confirmatory factor analysis. Annu Rev Clin Psychol. 2014;10:85–110.

Asparouhov T, Muthén B. Exploratory structural equation modeling. Struct Equ Model Multidiscip J. 2009;16:397–438.

Cabedo-Peris J, Martí-Vilar M, Merino-Soto C, Ortiz-Morán M. Basic empathy scale: a systematic review and reliability generalization meta-analysis. Healthc (Basel). 2022;10:29–62.

Flora DB. Your coefficient alpha is probably wrong, but which coefficient omega is right? A tutorial on using R to obtain better reliability estimates. Adv Methods Pract Psychol Sci. 2020;3:484–501.

McNeish D. Thanks coefficient alpha, we’ll take it from here. Psychol Methods. 2018;23:412–33.

Hayes AF, Coutts JJ. Use omega rather than Cronbach’s alpha for estimating reliability. But… Commun Methods Meas. 2020;14:1–24.

Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.

McDonald R. Test theory: a unified approach. Mahwah, NJ: Erlbaum; 1999.

Polit DF. Getting serious about test-retest reliability: a critique of retest research and some recommendations. Qual Life Res. 2014;23:1713–20.

Ceylan D, Çizel B, Karakaş H. Testing destination image scale invariance for intergroup comparison. Tour Anal. 2020;25:239–51.

Rönkkö M, Cho E. An updated guideline for assessing discriminant validity. Organ Res Methods. 2022;25:6–14.

Hubley A, Zumbo B. Response processes in the context of validity: setting the stage. In: Zumbo B, Hubley A, editors. Understanding and investigating response processes in validation research. Cham, Switzerland: Springer; 2017. pp. 1–12.

Messick S. Validity of performance assessments. In: Philips G, editor. Technical issues in large-scale performance assessment. Washington, DC: Department of Education, National Center for Education Statistics; 1996. pp. 1–18.

Moss PA. The role of consequences in validity theory. Educ Meas Issues Pract. 1998;17:6–12.

Cronbach L. Five perspectives on validity argument. In: Wainer H, editor. Test validity. Hillsdale, NJ: Erlbaum; 1988. pp. 3–17.

Birkle C, Pendlebury DA, Schnell J, Adams J. Web of Science as a data source for research on scientific and scholarly activity. Quant Sci Stud. 2020;1:363–76.

Bramer WM, Rethlefsen ML, Kleijnen J, Franco OH. Optimal database combinations for literature searches in systematic reviews: a prospective exploratory study. Syst Rev. 2017;6:245.

Web of Science Group. Editorial selection process. Clarivate. 2024. https://clarivate.com/webofsciencegroup/solutions/editorial-selection-process/. Accessed 12 Sept 2022.

Acknowledgements

The authors thank the assistants who helped with information processing and searching.

This work is one of the results of research project HIM/2015/017/SSA.1207, “Effects of mindfulness training on psychological distress and quality of life of the family caregiver” (principal investigator: Filiberto Toledano-Toledano, Ph.D.). The research was funded by federal funds for health research and was approved by the Commissions of Research, Ethics and Biosafety (Comisiones de Investigación, Ética y Bioseguridad), Hospital Infantil de México Federico Gómez, National Institute of Health. The funding source had no role in the study design; the collection, analysis, or interpretation of data; or decisions regarding publication.

Author information

Authors and affiliations

Departamento de Educación, Facultad de Ciencias Sociales, Universidad Europea de Valencia, 46010, Valencia, Spain

Lluna Maria Bru-Luna

Departamento de Psicología Básica, Universitat de València, Blasco Ibáñez Avenue, 21, 46010, Valencia, Spain

Manuel Martí-Vilar

Departamento de Psicología, Instituto de Investigación de Psicología, Universidad de San Martín de Porres, Tomás Marsano Avenue 242, Lima 34, Perú

César Merino-Soto

Instituto Central de Gestión de la Investigación, Universidad Nacional Federico Villarreal, Carlos Gonzalez Avenue 285, 15088, San Miguel, Perú

José Livia-Segovia

Unidad de Investigación en Medicina Basada en Evidencias, Hospital Infantil de México Federico Gómez, Instituto Nacional de Salud, Dr. Márquez 162, 06720, Doctores, Cuauhtémoc, Mexico City, Mexico

Juan Garduño-Espinosa & Filiberto Toledano-Toledano

Unidad de Investigación Multidisciplinaria en Salud, Instituto Nacional de Rehabilitación Luis Guillermo Ibarra Ibarra, México-Xochimilco 289, Arenal de Guadalupe, 14389, Tlalpan, Mexico City, Mexico

Filiberto Toledano-Toledano

Dirección de Investigación y Diseminación del Conocimiento, Instituto Nacional de Ciencias e Innovación para la Formación de Comunidad Científica, INDEHUS, Periférico Sur 4860, Arenal de Guadalupe, 14389, Tlalpan, Mexico City, Mexico

Filiberto Toledano-Toledano

Contributions

L.M.B.L. conceptualized the study, collected the data, performed the formal analysis, wrote the original draft, and reviewed and edited the subsequent drafts. M.M.V. conceptualized the study, collected the data, and reviewed and edited the subsequent drafts. C.M.S. collected the data, performed the formal analysis, wrote the original draft, and reviewed and edited the subsequent drafts. J.L.S. collected the data, wrote the original draft, and reviewed and edited the subsequent drafts. J.G.E. collected the data and reviewed and edited the subsequent drafts. F.T.T. conceptualized the study; provided resources, software, and supervision; wrote the original draft; and reviewed and edited the subsequent drafts.

Corresponding author

Correspondence to Filiberto Toledano-Toledano.

Ethics declarations

Ethics approval and consent to participate

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Commissions of Research, Ethics and Biosafety (Comisiones de Investigación, Ética y Bioseguridad), Hospital Infantil de México Federico Gómez, National Institute of Health, under project HIM/2015/017/SSA.1207, “Effects of mindfulness training on psychological distress and quality of life of the family caregiver”.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Bru-Luna, L.M., Martí-Vilar, M., Merino-Soto, C. et al. Person-centered care assessment tool with a focus on quality healthcare: a systematic review of psychometric properties. BMC Psychol 12, 217 (2024). https://doi.org/10.1186/s40359-024-01716-7

Received: 17 May 2023

Accepted: 07 April 2024

Published: 19 April 2024

DOI: https://doi.org/10.1186/s40359-024-01716-7


Keywords: Person-centered care assessment tool

