Automated systematic literature search using R, litsearchr and easyPubMed

Introduction.

You would like to do a systematic search of the scientific literature on a given topic. But a wild caveat appears - your familiarity (or lack thereof) with the topic will bias your search, while your field lacks standardized terminology and is fragmented into multiple nomenclature clusters.

Grames et al. (2019) devised a method to address this (the full reference is listed at the end of this post). A gentle introduction to both R and the litsearchr package, one that makes the analyses easy to follow, can be found on Library Carpentry.

Simply put, we can cast a wide net using a naïve search, retrieve relevant information from databases (e.g., titles, keywords, abstracts), and analyse this interlinked text data to derive a systematic search strategy. I view it as a way to bootstrap knowledge.

We will use the following packages:

  • easyPubMed, which simplifies the use of the PubMed API to query and extract article data.
  • litsearchr, for an automated approach to identifying search terms for systematic reviews using keyword co-occurrence networks.
  • stopwords, for the Stopwords ISO Dataset, which is the most comprehensive collection of stopwords for multiple languages.
  • igraph, for network analyses (this package is already a dependency of litsearchr, but it has many useful functions that are not wrapped by litsearchr functions).
  • ggplot2, ggraph, and ggrepel, for plotting.

1. Load or install packages
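
A minimal setup sketch, assuming the six packages listed above come from CRAN and litsearchr is installed from its GitHub repository (elizagrames/litsearchr) through the remotes package; adjust this if your setup differs. The later sketches in this post assume the easyPubMed 2.x interface, which may not match the version you have installed.

```r
# CRAN packages used in this post
cran_pkgs <- c("easyPubMed", "stopwords", "igraph", "ggplot2", "ggraph", "ggrepel")

# Install only what is missing
missing_pkgs <- setdiff(cran_pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

# litsearchr is installed from GitHub here (assumption: it is not on your CRAN mirror)
if (!requireNamespace("litsearchr", quietly = TRUE)) {
  if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
  remotes::install_github("elizagrames/litsearchr")
}

# Attach everything
invisible(lapply(c(cran_pkgs, "litsearchr"), library, character.only = TRUE))
```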

2. Query PubMed for relevant literature

2.1. Single query

This demonstrates the basic workflow for a single query. The term uses Boolean operators to define a query for articles that contain “psychotherapy” and “PTSD” and were published in 2020 only.

The easyPubMed steps are quite simple: (1) retrieve PubMed article ids for the corresponding query using get_pubmed_ids, (2) fetch the actual data corresponding to the retrieved ids, and (3) extract and restructure relevant information.

Check the first 10 entries:
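
A minimal sketch of the three steps, followed by the inspection of the first 10 entries. It assumes the easyPubMed 2.x functions get_pubmed_ids(), fetch_pubmed_data(), articles_to_list() and article_to_df(). The query string and the helper name run_pubmed_query are illustrative assumptions, not the exact ones used for the original results.

```r
# Assumed query string: "psychotherapy" AND "PTSD" in title/abstract, published in 2020
query_2020 <- '"psychotherapy"[Title/Abstract] AND "PTSD"[Title/Abstract] AND 2020[PDAT]'

# Hypothetical helper wrapping the three easyPubMed steps
run_pubmed_query <- function(query, max_records = 500) {
  # (1) retrieve the PubMed ids matching the query
  ids <- easyPubMed::get_pubmed_ids(query)
  # (2) fetch the raw XML records for those ids
  raw_xml <- easyPubMed::fetch_pubmed_data(ids, retmax = max_records)
  # (3) split into single records and restructure each into a one-row data frame
  records <- easyPubMed::articles_to_list(raw_xml)
  rows <- lapply(records, easyPubMed::article_to_df,
                 max_chars = -1,       # keep the full abstract
                 getAuthors = FALSE,   # one row per article, not per author
                 getKeywords = TRUE)
  do.call(rbind, rows)
}

# The calls below need an internet connection, so they are left commented out:
# articles_2020 <- run_pubmed_query(query_2020)
# head(articles_2020[, c("pmid", "year", "title")], 10)
```

Setting getAuthors = FALSE keeps one row per article, which is the shape the later term-extraction steps expect.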

2.2. Multiple queries

We will search in Title/Abstract for the presence of three terms (“psychotherapy” / “psychological intervention”, “trial” / “randomized”, “PTSD”) and constrain the search to the years 2022 and 2023, respectively. The following outlines how to use expand.grid to generate the combinations of terms that will subsequently be used for multiple queries.
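
One way to build these combinations is sketched below in base R. The column names and the exact query template are assumptions made for illustration; only the terms and years come from the text above.

```r
# Every combination of the term variants and the two years
query_grid <- expand.grid(
  intervention = c("psychotherapy", "psychological intervention"),
  design       = c("trial", "randomized"),
  disorder     = "PTSD",
  year         = c(2022L, 2023L),
  stringsAsFactors = FALSE
)

# Assumed query template: all three terms in Title/Abstract, one publication year
query_grid$query <- sprintf(
  '"%s"[Title/Abstract] AND "%s"[Title/Abstract] AND "%s"[Title/Abstract] AND %d[PDAT]',
  query_grid$intervention, query_grid$design, query_grid$disorder, query_grid$year
)

nrow(query_grid)   # 2 x 2 x 1 x 2 = 8 queries
query_grid$query[1]
```

Each row of query_grid can then be passed to the single-query workflow, and the resulting data frames bound together, dropping duplicated PubMed ids.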

Inspect terms:

Inspect literature trends with respect to our query:

3. Extract terms from retrieved data

Within extract_terms() use method = "tagged" if keywords are provided. For extracting interesting words out of titles use the Rapid Automatic Keyword Extraction (RAKE) algorithm (method = "fakerake" in litsearchr), coupled with a good stop word collection.

Here we will do both. My heuristic is: when in doubt, pool results together.
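
A sketch of doing both and pooling the results. The helper names pool_terms and extract_candidate_terms, the min_freq value, and the n-gram limits are assumptions; litsearchr::extract_terms() and stopwords::stopwords() are the package functions referred to above.

```r
# Pool any number of term vectors, dropping duplicates
pool_terms <- function(...) unique(unlist(list(...)))

# Hypothetical helper: extract terms from tagged keywords and from titles, then pool
extract_candidate_terms <- function(titles, keywords, min_freq = 3) {
  stop_words <- stopwords::stopwords("en", source = "stopwords-iso")

  # Author- or database-supplied keywords
  from_keywords <- litsearchr::extract_terms(
    keywords = keywords, method = "tagged",
    min_freq = min_freq, min_n = 1, max_n = 5
  )

  # RAKE-style extraction from titles, with the stop word list removed
  from_titles <- litsearchr::extract_terms(
    text = titles, method = "fakerake",
    min_freq = min_freq, min_n = 2, max_n = 5,
    stopwords = stop_words
  )

  pool_terms(from_keywords, from_titles)
}

# Usage, assuming a data frame `articles` with title and keywords columns:
# terms <- extract_candidate_terms(articles$title, articles$keywords)
```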

4. Create Co-Occurrence Network

We will consider the title and abstract of each article to represent the article’s “content” and we will consider a term to have appeared in the article if it appears in either the title or abstract. Based on this we will create the document-feature matrix, where the “documents” are our articles (title and abstract) and the “features” are the search terms. The Co-Occurrence Network is computed using this document-feature matrix.
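
To make these two objects concrete, here is a hand-rolled toy version in base R. It illustrates the idea only and is not litsearchr's implementation; the three documents and three terms are invented. In the actual workflow, litsearchr::create_dfm() and litsearchr::create_network() do this work.

```r
# Invented "documents" (title and abstract pasted together, lower-cased)
docs <- c(
  "exposure therapy for ptsd: a randomized trial",
  "randomized trial of cognitive therapy for ptsd",
  "exposure therapy outcomes in veterans with ptsd"
)
terms <- c("exposure therapy", "randomized trial", "ptsd")

# Document-feature matrix: 1 if the term appears in the document, else 0
dfm <- sapply(terms, function(term) as.integer(grepl(term, docs, fixed = TRUE)))

# Co-occurrence matrix: number of documents in which two terms appear together
cooc <- crossprod(dfm)
diag(cooc) <- 0   # a term co-occurring with itself is not an edge

cooc

# The litsearchr equivalents on real data would look like this:
# dfm <- litsearchr::create_dfm(elements = docs, features = terms)
# g   <- litsearchr::create_network(dfm, min_studies = 3, min_occ = 3)
```

The off-diagonal entries of cooc are the edge weights of the co-occurrence network.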

5. Prune the Network based on node strength

5.1 Compute node strength

Node strength in a network is calculated by summing up the weights of all edges connected to the respective node. Thus, node strength measures how strongly a node is directly connected to other nodes in the network.
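
A small igraph sketch of this definition on an invented three-node network; the edge list and weights are made up for illustration.

```r
library(igraph)

# Invented weighted co-occurrence edges
edges <- data.frame(
  from   = c("ptsd", "ptsd", "exposure therapy"),
  to     = c("exposure therapy", "randomized trial", "randomized trial"),
  weight = c(2, 2, 1)
)
g <- graph_from_data_frame(edges, directed = FALSE)

# strength() sums the "weight" attribute of all edges attached to each node
node_strength <- sort(strength(g), decreasing = TRUE)
node_strength
```

Here "ptsd" has strength 2 + 2 = 4, while the other two nodes each have strength 2 + 1 = 3.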

5.2 Prune based on chosen criteria

We want to keep only those nodes that have high strength, but how will we decide how many to prune out? litsearchr::find_cutoff() provides us with two ways to decide: cumulative cutoff and change points. The cumulative cutoff method simply retains a certain proportion of the total strength. The change points method uses changepoint::cpt.mean() under the hood to calculate optimal cutoff positions where the trend in strength shows sharp changes.

Again, we will use the heuristic when in doubt, pool results together, i.e. we will use the change point nearest to the cumulative cutoff value we set.
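
A sketch of this pooling rule. The helper names nearest_changepoint and prune_network and the default values (80% cumulative strength, 3 knots) are assumptions; find_cutoff(), reduce_graph() and get_keywords() are the litsearchr functions.

```r
# Pick the change point closest to the cumulative cutoff
nearest_changepoint <- function(change_points, cumulative_cut) {
  change_points[which.min(abs(change_points - cumulative_cut))]
}

# Hypothetical helper: compute both cutoffs, pool them, prune, return the kept terms
prune_network <- function(g, percent = 0.8, knot_num = 3) {
  cumulative_cut <- litsearchr::find_cutoff(g, method = "cumulative", percent = percent)
  change_points  <- litsearchr::find_cutoff(g, method = "changepoint", knot_num = knot_num)

  chosen_cut <- nearest_changepoint(change_points, cumulative_cut)

  reduced <- litsearchr::reduce_graph(g, cutoff_strength = chosen_cut)
  litsearchr::get_keywords(reduced)
}

# Toy check of the pooling rule with invented strength values
nearest_changepoint(c(10, 25, 60), cumulative_cut = 30)

# Usage on the real co-occurrence network `g`:
# selected_terms <- prune_network(g)
```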

Inspect selected terms:

We see that some term groupings are obvious: type of study (design), type of intervention, disorder/symptoms, and population. We are not interested in a particular population, so we will exclude terms like “child”, “adult”, “aged”, “veterans”. We will also exclude terms that are good candidates for these groups but are too wide for our scope: “mental health” is too wide for disorder/symptoms, while “psychological” and “treatment” are too wide for type of intervention. Afterwards we will manually define groupings for the rest of the terms.

6. Manual grouping into clusters
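
A base R sketch of the exclusion and the manual grouping. The excluded terms are the ones named above. The remaining terms and the group names are invented placeholders, not the actual output of the pruning step.

```r
# Placeholder for the terms that survived pruning (invented, apart from the excluded ones)
selected_terms <- c(
  "randomized controlled trial", "exposure therapy", "cognitive behavioral therapy",
  "posttraumatic stress disorder", "ptsd symptoms",
  "child", "adult", "aged", "veterans",
  "mental health", "psychological", "treatment"
)

# Population terms and terms that are too wide for our scope
excluded <- c("child", "adult", "aged", "veterans",
              "mental health", "psychological", "treatment")
kept <- setdiff(selected_terms, excluded)

# Manual grouping: one list element per concept group
grouped_terms <- list(
  design       = c("randomized controlled trial"),
  intervention = c("exposure therapy", "cognitive behavioral therapy"),
  disorder     = c("posttraumatic stress disorder", "ptsd symptoms")
)

# Sanity check: every kept term is assigned to exactly one group
stopifnot(setequal(unlist(grouped_terms), kept))
```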

7. Automatically write the search string and discuss the results
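
The logic of the search string is simple: terms within a group are joined with OR, and the groups are joined with AND. The hand-rolled helper build_boolean below illustrates that structure on invented groups. It is not litsearchr's implementation; the commented call shows how litsearchr::write_search() would be used on the real groups, with argument values that are one reasonable choice rather than the only one.

```r
# Illustrative helper: OR within groups, AND between groups
build_boolean <- function(groups) {
  per_group <- vapply(
    groups,
    function(terms) paste0("(", paste0('"', terms, '"', collapse = " OR "), ")"),
    character(1)
  )
  paste(per_group, collapse = " AND ")
}

# Invented groups for illustration
toy_groups <- list(
  design   = c("randomized controlled trial"),
  disorder = c("ptsd", "posttraumatic stress disorder")
)
search_string <- build_boolean(toy_groups)
cat(search_string, "\n")

# With litsearchr, on the manually grouped terms:
# litsearchr::write_search(grouped_terms, languages = "English",
#                          exactphrase = TRUE, stemming = FALSE,
#                          closure = "left", writesearch = FALSE)
```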

Only the last two steps, pertaining to term exclusion and term grouping, need the careful decisions of a human researcher. The automatic workflow, on its own, found some important terms that I would have surely omitted. I am especially pleased about “exposure therapy”, which is a prevalent class of psychotherapy interventions but is called a “therapy”, making it harder to distinguish from other terms that end in “therapy” but are not psychotherapies.

Grames, E. M., Stillman, A. N., Tingley, M. W., & Elphick, C. S. (2019). An automated approach to identifying search terms for systematic reviews using keyword co‐occurrence networks. Methods in Ecology and Evolution, 10(10), 1645-1654.

Claudiu-Cristian Papasteri

Psychologist, psychotherapist, researcher.

