
  • Review Article
  • Published: 28 April 2023

Applications of single-cell RNA sequencing in drug discovery and development

  • Bram Van de Sande 1   na1 ,
  • Joon Sang Lee   ORCID: orcid.org/0000-0003-0838-6166 2   na1 ,
  • Euphemia Mutasa-Gottgens   ORCID: orcid.org/0000-0001-6651-2592 3   na1 ,
  • Bart Naughton   ORCID: orcid.org/0000-0002-9121-7445 4 ,
  • Wendi Bacon   ORCID: orcid.org/0000-0002-8170-8806 3 , 5 ,
  • Jonathan Manning 3 ,
  • Yong Wang   ORCID: orcid.org/0000-0001-7183-3737 6 ,
  • Jack Pollard 7 ,
  • Melissa Mendez   ORCID: orcid.org/0000-0002-3995-2469 8 ,
  • Jon Hill   ORCID: orcid.org/0000-0003-1592-6482 9 ,
  • Namit Kumar   ORCID: orcid.org/0000-0002-5084-3490 10 ,
  • Xiaohong Cao   ORCID: orcid.org/0000-0002-2275-4060 11 ,
  • Xiao Chen 12 ,
  • Mugdha Khaladkar 13 ,
  • Ji Wen   ORCID: orcid.org/0000-0002-3327-5591 14 ,
  • Andrew Leach   ORCID: orcid.org/0000-0001-8178-0253 3 &
  • Edgardo Ferran   ORCID: orcid.org/0000-0003-2092-2486 3  

Nature Reviews Drug Discovery volume 22, pages 496–520 (2023)


  • Computational biology and bioinformatics
  • Gene expression profiling

Abstract

Single-cell technologies, particularly single-cell RNA sequencing (scRNA-seq) methods, together with associated computational tools and the growing availability of public data resources, are transforming drug discovery and development. New opportunities are emerging in target identification owing to improved disease understanding through cell subtyping, and highly multiplexed functional genomics screens incorporating scRNA-seq are enhancing target credentialling and prioritization. ScRNA-seq is also aiding the selection of relevant preclinical disease models and providing new insights into drug mechanisms of action. In clinical development, scRNA-seq can inform decision-making via improved biomarker identification for patient stratification and more precise monitoring of drug response and disease progression. Here, we illustrate how scRNA-seq methods are being applied in key steps in drug discovery and development, and discuss ongoing challenges for their implementation in the pharmaceutical industry.


Introduction

Drug discovery is generally an inefficient process characterized by rising costs 1 , 2 , long timelines 3 and high rates of attrition 4 . These inefficiencies are partly rooted in our limited understanding of human biology, in particular, disease-related mechanisms, actionable therapeutic targets and disease response heterogeneity 5 , 6 . The lack of sufficiently representative preclinical models, and the limitations of necessarily reductionist disease models, compound the challenges of understanding human systems.

Before single-cell (SC) approaches, cell and tissue characteristics could only be assessed in bulk and from relatively large amounts of starting material. Amplification-based techniques, such as microarrays, bulk RNA sequencing (RNA-seq) and quantitative PCR with reverse transcription (qRT–PCR) 7 , measured mRNA transcripts in pools of cells and could not distinguish relevant signals from heterogeneous subpopulations or rare cell types. Techniques capable of SC resolution, such as fluorescence-activated cell sorting (FACS), immunohistochemistry and cytometry by time of flight (CyTOF), were limited by the relatively small scale of testable targets and the need for a priori biological insights to enable experimental design 8 , 9 , 10 .

SC technologies that have been developed in the past decade (reviewed in refs. 11 , 12 , 13 ) have made significant inroads towards resolving some of these limitations, while at the same time being complementary to bulk applications that are still commonly used. Among the growing range of technologies, single-cell RNA sequencing (scRNA-seq; Box  1 ) has advanced substantially 14 , 15 since the demonstration of whole-transcriptome profiling from a single cell in 2009 (ref. 16 ), and has reached the point where it is being applied in the pharmaceutical industry to investigate key questions in drug discovery and development (Fig.  1 ). Consequently, scRNA-seq is the focus of this article. SC technologies that extend beyond mRNA to DNA, epigenetic, proteomic and other features 17 are also highlighted.

Fig. 1: Single-cell technologies are being applied to answer key questions at various stages in the drug discovery and development pipeline. These applications are anticipated to increase the probability of success in the clinic by improving the quality of both the drug candidates emerging from discovery programmes and the clinical development plans for those drug candidates in stratified disease populations.

The rapid and simultaneous development of scalable plate-based and microfluidic-based methods capable of profiling large numbers of single cells has enhanced the utility of SC techniques for industrial-scale applications. Novel computational techniques and other methods (Fig.  2 ; Supplementary Table  1 ; Boxes  2 and 3 ) have also played a key part in leveraging SC data, supported by a growing user community that has helped to improve public data access and generate best practices. The combination of SC profiling platforms and sophisticated computational methods is driving step-change improvements in our knowledge of disease biology and pharmacology. For example, the availability of SC sequencing data for animal model systems is improving our understanding of translatability to humans 18 . ScRNA-seq has enabled identification of molecular pathways that allow prediction of survival 19 , response to therapy 20 , likelihood of resistance 21 , 22 and candidacy for alternative intervention 23 . Further capabilities provided by SC technologies include the identification of novel cell types 24 and subtypes 25 , the refinement of cell differentiation trajectories and the dissection of heterogeneously manifested human traits 26 or constituent cell types that compose multicellular organs or tumours 27 .

Fig. 2: Representation of the computational tools and/or methods currently used by pharmaceutical companies for data handling and for probing biological insights through cell-type annotation, genotype and/or phenotype identification and functional assignment (see Supplementary Table 1 for further details and URLs for the various tools). BCR, B cell receptor; CNV, copy number variation; eQTL, expression quantitative trait loci; scATAC-seq, single-cell sequencing assay for transposase-accessible chromatin; scDNA-seq, single-cell DNA sequencing; scRNA-seq, single-cell RNA sequencing; SNV, single-nucleotide variant; ST, spatial transcriptomics; TCR, T cell receptor.

In this Review, we illustrate how SC technologies, primarily scRNA-seq methods, are being applied in the various steps of the drug discovery pipeline, from target identification to clinical decision-making. Ongoing challenges related to study design and data accessibility are also highlighted, as well as potential future directions for the use of SC techniques in drug discovery and development.

Box 1 Fundamentals of single-cell RNA sequencing

A typical single-cell RNA sequencing (scRNA-seq) workflow has three key phases: library generation, pre-processing and post-processing. The library generation process includes the isolation of individual cells or nuclei, mRNA capture and sequencing (see figure). Once sequences are obtained, the subsequent steps are computational. Pre-processing includes the initial analyses to count and clean the data. In post-processing, dimensionality is reduced, gene signatures and cell types are identified, and visualizations may be generated. Data integration and batch correction are optional steps, and ultimately may support the inference analyses. All or a subset of these steps are often performed iteratively to optimize outcomes. Key phases in the typical scRNA-seq workflow are described in more detail below and illustrated in Supplementary Fig. 1 .

Library generation and sequencing

Library generation transforms cells or nuclei into sequencer-ready samples. Sample preparation is a crucial step, which often requires tissue dissociation with mechanical or enzymatic stress, depending on sample type. This unavoidably releases RNA into the suspension, contributing to high background or noise if not removed during data processing. Fresh samples are ideal for high-quality scRNA-seq, and single-nucleus RNA sequencing is usually preferable when samples must be frozen.

Samples are then separated into reaction chambers for lysis and RNA capture, most commonly using 10X Chromium technology, which combines an aqueous flow of cells, barcoded primers carried in beads, lysis buffer and reverse transcription enzymes with oil to create microdroplet reaction chambers. Plate-based technologies perform this step in microwells, and automated microfluidic devices use other forms of microchamber. The common feature is that individual cells must be trapped within a space that is not continuous with spaces containing any other individual cells.

Next, the RNA transcripts of each cell are tagged with a barcoded unique molecular identifier (UMI), to help distinguish between cell transcripts and extraneous PCR amplicons generated during processing. A cDNA library is created by reverse transcription and amplified; depending on the tagging strategy, multiple amplification steps may be needed, and adapter sequences that bind to the flow cell may be ligated as well. The cDNA is then processed, similarly to bulk RNA-seq techniques, by fragmentation to create a homogeneously sized pool of molecules and the addition of index sequences useful for the identification of read origin (for example, to allow multiplexing). Like any sequencing protocol, this workflow contains several purification and quantification steps to ensure high quality. Multiple samples, with different indices, are finally loaded onto a flow cell and sequenced.
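The role of UMIs in separating true transcripts from PCR duplicates can be illustrated with a toy sketch. The barcodes, gene names and UMI sequences below are invented, and real pipelines also correct for sequencing errors in barcodes and UMIs, which this sketch ignores:

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse reads sharing a (cell barcode, gene, UMI) triple into one
    molecule, then count molecules per cell and gene."""
    molecules = {(cb, gene, umi) for cb, gene, umi in reads}
    counts = defaultdict(int)
    for cb, gene, _ in molecules:
        counts[(cb, gene)] += 1
    return dict(counts)

# Hypothetical reads for illustration only.
reads = [
    ("AAAC", "GAPDH", "TTGCA"),  # original molecule
    ("AAAC", "GAPDH", "TTGCA"),  # PCR duplicate, collapses with the above
    ("AAAC", "GAPDH", "GGATC"),  # distinct UMI, a second molecule
    ("CCGT", "ACTB", "TTGCA"),   # same UMI but different cell, kept
]
counts = count_umis(reads)
assert counts[("AAAC", "GAPDH")] == 2
assert counts[("CCGT", "ACTB")] == 1
```

The duplicate read contributes nothing to the count, which is exactly how UMI deduplication prevents PCR amplification bias from inflating expression estimates.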


Sequence data pre-processing

Reads from plate-based technologies (for example, SMART-seq2 (ref. 201 )) can be analysed by traditional bulk genome or transcriptome alignment and quantification pipelines. Droplet-based platforms require specific tools to handle highly cell-multiplexed data to correctly assign UMI counts to cell barcodes. For all methods, an RNA capture rate of between 10% and 20% is common and must be accounted for during analysis 202 .

The Cell Ranger pipeline from 10X Genomics is widely used to process 10X data. It is based on the STAR method for RNA-seq alignment and offers additional features such as cell counting and quality control summary reporting 203 . Academic efforts strengthened by the open-source community provide more recent solutions such as STARsolo 204 , Alevin 205 and Kallisto-BUStools 206 , 207 .

For all platforms, the next steps are to determine counts for each gene in each cell to generate a cell-by-gene matrix. For processing in droplet platforms, pre-emptive filtering to distinguish cells from empty droplets may first be applied 208 , 209 . Further filtering of ambient RNA 210 , 211 and/or methods for removing  doublets are also used 212 , 213 , 214 , and together help to clean the data and reduce data volume. The matrix is then normalized to take into account discrepancies in RNA capture for each cell 215 , 216 , 217 and finally, highly variable genes in a sample may be flagged for downstream analysis.
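As a rough illustration of these matrix-cleaning steps, the sketch below applies a naive total-count filter for empty droplets, library-size normalization with a log transform, and dispersion-based flagging of highly variable genes to a simulated count matrix. The thresholds and data are invented; real tools such as those cited above model the ambient RNA profile and gene-level variance far more carefully:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cell-by-gene count matrix: 6 barcodes x 5 genes. Barcode 0 mimics an
# empty droplet that captured only a little ambient RNA.
counts = rng.poisson(lam=5.0, size=(6, 5))
counts[0] = [1, 0, 0, 0, 0]

# 1. Crude empty-droplet filter: drop barcodes below a total-count threshold.
totals = counts.sum(axis=1)
cells = counts[totals >= 10]

# 2. Library-size normalization: scale each cell to a common total to correct
#    per-cell differences in RNA capture, then log-transform.
target_sum = 1e4
norm = cells / cells.sum(axis=1, keepdims=True) * target_sum
lognorm = np.log1p(norm)

# 3. Flag highly variable genes by a simple dispersion (variance/mean) statistic.
mean = lognorm.mean(axis=0)
dispersion = lognorm.var(axis=0) / (mean + 1e-8)
hvg = np.argsort(dispersion)[::-1][:2]  # indices of the top 2 variable genes
```

After step 2, every retained cell sums to the same target, so downstream comparisons between cells reflect relative expression rather than capture efficiency.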

Sequence data post-processing

Downstream of matrix generation and normalization, typical scRNA-seq workflows include unsupervised clustering 218 to group together cells with similar expression profiles, and dimensionality reduction via methods such as t-distributed stochastic neighbour embedding (t-SNE) 219 or uniform manifold approximation and projection (UMAP) 220 , which enable visualization of cell clustering in a 2D or 3D space. Marker genes associated with each cluster are detected via differential expression analysis. Cell-type annotation methods, integrative analysis to correct batch effects, trajectory mapping to trace cell differentiation and cell communication analysis can provide additional insights. These downstream analyses may need to be performed iteratively to optimize results.
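A minimal sketch of cluster marker detection, ranking genes by their mean expression difference between one cluster and the rest, might look as follows. This ranks by effect size only; real workflows add a statistical test (for example, the Wilcoxon rank-sum test) and multiple-testing correction, and the data here are simulated:

```python
import numpy as np

def marker_genes(expr, labels, cluster, top_n=3):
    """Rank genes by mean difference between one cluster and all others.

    expr: cells x genes matrix of log-normalized expression.
    labels: cluster assignment per cell.
    """
    in_cluster = expr[labels == cluster].mean(axis=0)
    rest = expr[labels != cluster].mean(axis=0)
    score = in_cluster - rest  # log fold change on log-scale data
    return np.argsort(score)[::-1][:top_n]

# Toy data: gene 2 is upregulated in cluster 1.
rng = np.random.default_rng(1)
expr = rng.normal(1.0, 0.1, size=(20, 5))
labels = np.array([0] * 10 + [1] * 10)
expr[labels == 1, 2] += 2.0

top = marker_genes(expr, labels, cluster=1, top_n=1)
assert top[0] == 2  # gene 2 is the strongest marker for cluster 1
```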

Box 2 Other single-cell technologies

Single-cell CRISPR screening technologies: pooled CRISPR screening is an efficient and scalable approach to drug-target discovery but is restricted to low-content readouts and can only identify genes yielding distinct phenotypes. To overcome this, single-cell (SC) CRISPR screening technologies such as Perturb-seq 84 , 86 , 221 , 222 were developed, coupling pooled CRISPR screening with single-cell RNA sequencing (scRNA-seq) or SC multi-omics. Several computational frameworks (MIMOSCA 84 , scMAGeCK 223 , MUSIC 224 , Mixscape 222 ) and a screening platform 85 allow decoding of the effect of individual perturbations on gene expression, their interactions or their cell-state dependence and prioritization of the cell types most sensitive to CRISPR-mediated perturbations at a SC level.

Single-cell DNA sequencing technologies: these have been mainly used to infer the cell lineage of cancers and to track cells with treatment-resistant mutations. To overcome technical limitations such as non-uniform coverage depth in scDNA-seq, several computational methods 225 , 226 , 227 , 228 , 229 , 230 have been developed for the identification of single-nucleotide variants (SNVs), short insertions and deletions (indels) and copy number variation (CNV). CNV detection methods for other technologies (for example, array-CGH, single-nucleotide polymorphism (SNP) arrays and whole-genome sequencing (WGS) or whole-exome sequencing (WES)) were also extended and applied to scDNA-seq data 231 . However, scWGS is still very expensive. Therefore, computational methods such as CopyKat 232 and InferCNV 233 have been developed to characterize copy number and intratumoural heterogeneity using scRNA-seq data instead. These methods are also used to infer aneuploidy in cells from scRNA-seq cancer data sets to better delineate host from cancer cells. In addition, scRNA-seq-based point mutation detection approaches 234 , 235 allow linkage of genotype to phenotype and make it possible to detect functional mutations that drive cell-type-specific gene expression. Best practices for the mapping of single-cell expression quantitative trait loci (sc-eQTL) have been assessed 236 .
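The core idea behind scRNA-seq-based CNV callers such as InferCNV, averaging relative expression over windows of genes ordered by genomic position, can be sketched as follows. The data are simulated; real tools add reference-based normalization, per-chromosome handling and segmentation:

```python
import numpy as np

def smoothed_cnv_profile(rel_expr, window=10):
    """Moving average of a cell's relative expression (log expression minus
    a normal reference) over genes ordered by genomic position. Sustained
    positive stretches suggest copy gains; negative stretches, losses."""
    kernel = np.ones(window) / window
    return np.convolve(rel_expr, kernel, mode="same")

# Simulated cell: 200 genes along a chromosome, with a gain at genes 80-119.
rng = np.random.default_rng(3)
rel_expr = rng.normal(0.0, 0.5, size=200)
rel_expr[80:120] += 1.0

profile = smoothed_cnv_profile(rel_expr)
# Smoothing suppresses per-gene noise so the amplified region stands out.
assert profile[80:120].mean() > profile[:80].mean() + 0.5
```

Averaging over neighbouring genes works because a true copy-number change shifts expression of many contiguous genes at once, whereas single-gene fluctuations cancel out.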

SC T cell receptor and B cell receptor sequencing technologies: scTCR-seq and scBCR-seq help to investigate the dynamics of T cell or B cell clones in tissues or peripheral blood by determining T cell or B cell clonotypes at a SC level. Cells from the adaptive immune system originating from a common ancestor, and therefore sharing the same TCR or BCR, are called clonotypes. Alternatively, TCR and BCR repertoire reconstruction and clonality inference can be performed based on scRNA-seq by using computational methods 237 , 238 , 239 , 240 , 241 . Clonotype dynamics can be examined using computational tools such as scRepertoire 242 and CellaRepertorium 243 . Coupling scTCR-seq or scBCR-seq with scRNA-seq can reveal the relationship between clonotype and phenotype (or transcriptional states) in T or B cell populations 244 . The detailed characterization of T and B cells provided by SC technologies has helped in understanding disease (for example, the cancer microenvironment and multiple sclerosis antigens) and in improving engineered T cell therapies such as chimeric antigen receptor (CAR) T cells.

SC epigenetics: various SC technologies capture epigenetic characteristics at near-nucleotide resolution. SC open chromatin structure (that is, transposase-accessible) can be revealed by single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) 245 , chromatin histone modifications by scCUT&Tag 246 or scChIP–seq 247 , and DNA methylation patterns by scBS-seq 248 . Understanding promoters and enhancers that are activated in a certain cell type or state can help in identifying tissues, cell types and/or biological conditions in which a target is more abundantly expressed and the transcriptional programmes that lead to expression of the target. Moreover, these techniques help to identify causal non-coding variants associated with a disease discovered by genome-wide association studies (GWAS) and map them to a specific cell type.

SC proteomics methods: emerging SC proteomics methods decode the variation of the proteome across individual cells 249 . SC proteomics (sc-proteomics — see reviews 250 , 251 ) methods typically focus on either absolute quantification of a small number of proteins or on highly multiplexed protein measurements. A method has been recently proposed for counting single proteins in single cells, based on nanopore single-molecule peptide reads, that is sensitive to single-amino acid substitutions within individual peptides 252 . This method opens the opportunity to develop single-molecule protein fingerprinting in the future.

SC multi-omics technologies: technologies such as ECCITE-seq 97 , scNMT-seq 161 and DOGMA-seq 253 now allow for the simultaneous measurement of different readouts (for example, RNA expression, surface protein expression, clonotypes, DNA methylation and/or chromatin accessibility) from the same single cells.

Emerging SC technologies and methods: methods for SC microRNA 254 and SC long non-coding RNA (see review 255 ) have expanded RNA transcriptomic profiling. SC metabolomics (sc-metabolomics) techniques have been proposed for cataloguing the chemical contents of a single cell or even a single organelle 256 . scRibo-seq, for SC ribosomal profiling, opens the possibility of exploring translation at the SC level; integrated with a machine learning approach, this method achieves single-codon resolution 257 . Methods such as scSPRITE 258 and Higashi 259 allow detection of high-order 3D genome structures in single cells (scHi-C).

Spatially resolved omics approaches: SC technologies lose spatial information during the tissue dissociation step. Spatially resolved omics approaches have been recently developed 260 , 261 , 262 to recover the spatial context. Excellent reviews on spatial transcriptomics and associated computational methods are available 263 , 264 , 265 , 266 .

Box 3 Computational methods used to infer insights from single-cell RNA sequencing data sets

Single-cell (SC) sequence data pre-processing is required before insights can be generated from a SC data set. Once a gene expression matrix has been generated, several methods exist to provide answers to relevant research questions. This box highlights these methods, focusing on areas of active development and concern.

Methods for addressing sparsity in scRNA-seq data sets: single-cell RNA sequencing (scRNA-seq) data sets are sparse in that many counts in the gene expression matrix are zero, that is, not a single RNA molecule is detected for those genes. The sources of this higher prevalence of zeros in comparison with bulk samples are diverse. Biological sources of sparsity are mainly driven by absent gene expression in the various cell types captured in a sample; in addition, gene expression is a stochastic process, which also contributes to a higher frequency of zero read counts. Technical sources of sparsity are inefficiencies in mRNA capture and/or sampling effects owing to limited sequencing depth. How to deal with these challenges is under discussion, and approaches range from the use of appropriate statistical models (for example, zero-inflated Poisson models) to imputation techniques. This topic is nicely reviewed in ref. 267 .
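To make the distinction between these zero sources concrete, the simulation below (with invented rate parameters) mixes structural biological zeros into Poisson counts and compares the observed zero fraction with the plain-Poisson expectation; the excess is the signature that motivates zero-inflated models:

```python
import numpy as np

rng = np.random.default_rng(42)
n_cells, n_genes = 500, 100

# Per-gene expression rates, plus structural (biological) zeros: each gene
# is simply not expressed in roughly 40% of cells.
rates = rng.gamma(shape=2.0, scale=1.0, size=n_genes)
expressed = rng.random((n_cells, n_genes)) > 0.4
counts = rng.poisson(rates, size=(n_cells, n_genes)) * expressed

obs_zero_frac = (counts == 0).mean()

# Under a plain Poisson model, the expected zero fraction for a gene with
# rate r is exp(-r); averaging over genes gives the matrix-wide expectation.
poisson_zero_frac = np.exp(-rates).mean()

# Structural zeros push the observed fraction above the Poisson prediction.
assert obs_zero_frac > poisson_zero_frac
```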

Batch-effect correction and data integration methods: SC data from large-scale or multiple studies are frequently generated by multiple institutions and/or in different experimental conditions. Two recent papers 268 , 269 comprehensively evaluate the performance of batch correction, that is, removing the variability in the data due to technical or other less relevant variables, and data integration methods, that is, methods that combine several data sets in an embedded space or provide a common expression matrix. These tools help to facilitate integrative analyses of SC data from various sources. However, the application of batch correction methods to SC data from heterogeneous diseases (for example, tumours) may risk obscuring true biological signals. Proper experimental planning is important and directly empowers these tools 270 .

Single-cell multi-omics analysis: joint analysis of SC multi-omics data enhances the ability to more deeply characterize cell types and states and their association with disease progression and drug effect 251 . Weighted nearest neighbour (WNN) analysis in Seurat 183 , CiteFuse 252 , MOFA+ 253 and totalVI 183 , 253 , 254 , 255 , 256 have been developed to improve the ability to resolve cell states and fates by integration of multimodal SC data. When generated from different cells, such multimodal measurements are projected into a common latent space by computational methods such as LIGER 255 , 257 , and canonical correlation analysis (CCA) in Seurat 256 to jointly model variation across sample groups and data modalities.

Cell-type annotation: for scRNA-seq data, cell-type annotation can be performed based on unsupervised cell clustering and marker genes identified per cluster, but this approach is very labour intensive. To facilitate the task, automated cell-type annotation tools have been developed, including Seurat label transfer 271 , Garnett 272 , scmap 273 , SingleR 274 , Cell-ID 275 and, more recently, CellTypist 189 .

Once a properly integrated, normalized and annotated data set is available, insights can be derived from these data sets using a wide variety of methods.

Trajectory inference or pseudo-time analysis: cells experience dynamic processes such as differentiation, response to treatment and disease evolution. A heterogeneous sample of cells represents a snapshot of cells in various phases of these processes. Trajectory inference (TI) is used to determine the pattern of such a dynamic process. Widely used TI computational tools include Monocle 276 , PAGA 277 , Slingshot 278 , STEMNET 279 and Scorpius 280 . Most TI methods require prior understanding of the anticipated topology and careful design considerations 169 . These methods are distinct from RNA velocity 281 , which exploits the presence of unspliced mRNA to derive an estimate of the rate of change of gene expression; Velocyto 281 implements this approach and methods such as scVelo 282 have extended it. CellRank 283 combines TI and RNA velocity techniques.
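As a minimal illustration of the idea behind TI, the sketch below orders cells along the first principal component of a simulated expression matrix as a crude pseudo-time. The tools cited above fit curves or graphs and handle branching topologies, which this sketch does not:

```python
import numpy as np

def naive_pseudotime(expr):
    """Order cells along the first principal axis of the expression matrix
    and rescale the resulting ranks to [0, 1] as a crude pseudo-time."""
    centred = expr - expr.mean(axis=0)
    # First right singular vector = first principal axis.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[0]
    ranks = scores.argsort().argsort()  # rank of each cell along the axis
    return ranks / (len(ranks) - 1)

# Toy differentiation: a gene programme ramps up smoothly across 50 cells.
t = np.linspace(0, 1, 50)
rng = np.random.default_rng(7)
expr = np.outer(t, [2.0, -1.0, 0.5]) + rng.normal(0, 0.05, size=(50, 3))

pt = naive_pseudotime(expr)
# Pseudo-time should correlate strongly with the true ordering (up to sign).
assert abs(np.corrcoef(pt, t)[0, 1]) > 0.95
```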

Pathway analysis tools: these provide cell-type specific functional annotation and new biological insights into disease and response to treatment. GSVA 284 and single sample gene set enrichment analysis (ssGSEA) 285 were designed for bulk RNA-seq but can be applied to scRNA-seq data for this purpose. Tools such as Pagoda2 (ref. 286 ) and Vision 287 were developed for the characterization of cell-type specific transcriptional heterogeneity. They allow interactive analysis of large SC data sets and identification of intercellular relationships in disease or in response to treatment.

Cell–cell communication analysis: disease can be caused by disruptions in cell–cell communications 288 , and a growing collection of computational tools support drawing inferences about these disruptions 183 , 289 , 290 , 291 , 292 , 293 , 294 , generating new hypotheses and potentially enhancing disease understanding 295 .

Cell-type deconvolution methods: most clinical transcriptomics data are currently generated with either bulk RNA-seq or microarray. Cell-type deconvolution methods 296 , 297 , 298 , 299 , 300 , 301 enable the estimation of cell-type composition based on gene signatures derived from scRNA-seq data and are especially useful in the drug development pipeline.
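The basic linear model underlying cell-type deconvolution, bulk expression as a weighted sum of cell-type signatures, can be sketched with an invented signature matrix; production tools use constrained or regularized fits (for example, support vector regression) rather than the plain least squares shown here:

```python
import numpy as np

# Hypothetical signature matrix (genes x cell types), e.g. average marker
# gene expression per cell type derived from scRNA-seq.
signature = np.array([
    [5.0, 0.1, 0.1],  # marker for cell type A
    [0.1, 4.0, 0.2],  # marker for cell type B
    [0.2, 0.1, 6.0],  # marker for cell type C
    [1.0, 1.0, 1.0],  # gene expressed in all three types
])

# Simulate a bulk profile as a mixture of the three cell types.
true_props = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_props

# Least-squares estimate of proportions, clipped to be non-negative and
# renormalized to sum to 1.
est, *_ = np.linalg.lstsq(signature, bulk, rcond=None)
est = np.clip(est, 0.0, None)
est = est / est.sum()

assert np.allclose(est, true_props, atol=1e-6)
```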

Methods of mapping disease-associated variants to scRNA-seq data sets: methods are emerging that integrate genetic cues from genome-wide association studies (GWAS) with SC phenotypic data sets such as transcriptomics. SC Linker combines GWAS summary statistics with SC transcriptomics to quantify the heritability of a gene expression signature derived from scRNA-seq data sets (capturing either a cell type or a biological process) 81 . Another method called scDRS looks for enrichment of polygenic GWAS-derived signatures in SC gene expression profiles 182 .

Applications in drug discovery and development

SC technologies can be applied throughout drug discovery and development (Fig. 1). Improved disease understanding gained through subtyping based on altered cell compositions and cell states can guide the identification of novel cellular and molecular targets. Target credentialling and validation can benefit from the use of SC sequencing in the identification of relevant preclinical models for a given disease subtype. Highly multiplexed functional genomics screens that merge CRISPR and SC sequencing (scCRISPR screening; Box 2) can enhance target credentialling throughput and augment the perturbation readouts with mechanistic information to improve target prioritization. SC sequencing technologies can provide insights on cell-type-specific compound actions, off-target effects and heterogeneous responses to inform drug candidate selection. In clinical development, these technologies can contribute by helping to identify biomarkers for patient stratification, elucidating drug mechanisms of action or resistance, or monitoring drug responses and disease progression. Opportunities to characterize and improve engineered biologics and cell therapies using SC technologies are also emerging (Box 4).

Below, we discuss representative published studies that demonstrate how SC technologies, particularly scRNA-seq approaches, can be applied in key steps in drug discovery and development, with a focus on those that are widely used in the pharmaceutical industry.

Box 4 Single-cell analysis for biologics and cell therapies

Monoclonal antibodies

Single-cell sequencing technologies can accelerate and improve therapeutic antibody identification and optimization. High-throughput single-cell B cell receptor sequencing (scBCR-seq) 302 (Box 2) enables charting of the full antibody repertoire of an immunized animal, tracking of its evolution during clonal selection, expansion and affinity maturation, and comparison with derived hybridoma cell lines at cellular resolution. These efforts can assist therapeutic antibody identification by expanding the available BCR repertoire, and may also improve the generation of large and diverse phage display libraries or the mining for therapeutic antibodies based on sequence similarity 303 . Moreover, technologies such as LIBRA-seq combine scBCR readouts with antigen specificity and thereby directly expedite lead discovery 304 . Finally, direct usage of the human B cell reservoir of convalescent donors as an antibody pool opens new avenues for the development of therapeutic monoclonal antibodies. This approach has been used to engineer neutralizing monoclonal antibodies for coronavirus disease 2019 (COVID-19) 305 .

CAR-T cell therapies

Chimeric antigen receptor (CAR)-T cell therapies have shown strong efficacy in the treatment of some B cell-originating haematological malignancies. Unfortunately, the toxicity induced by these treatments can be life-threatening, and efficacy is restricted to a subset of patients. Single-cell RNA sequencing (scRNA-seq) has been used as a complementary tool to investigate cellular heterogeneity and cell composition dynamics in patient peripheral blood mononuclear cell (PBMC) samples taken before treatment and at time points after CAR-T infusion 306 .

B cell maturation antigen (BCMA) CAR-T cells have demonstrated promising effects in patients with relapsed or refractory multiple myeloma. ScRNA-seq has been used to analyse the dynamics of BCMA CAR-T cells in a clinically successful case of relapsed or refractory primary plasma cell leukaemia (pPCL) 307 . At the peak phase, CAR-T cells were found to shift from a highly proliferative state to a highly cytotoxic state, finally changing to a memory-like state at remission phase.

Many SC studies focus on understanding factors that drive favourable outcomes in CAR-T cell therapies. In large B cell lymphoma (LBCL), complete response is associated with an increase in memory CD8 + T cells 308 . Multi-omic SC interrogation of T cells showed that interferon signalling controlled by IRF7 reduces persistence of CAR-T cells after treatment 309 . In parallel, efforts are under way to better understand and control the toxicity of these therapies. In normal brain tissue, a small population of mural cells — which surround the endothelium and are crucial for blood–brain barrier integrity — were shown to express CD19 and are therefore potentially targeted by CD19 CAR-T cells 310 . These findings may explain CAR-T cell-induced neurotoxicity due to increased vascular permeability in the brain. Investigation of expression patterns of CD19 using human SC reference atlases such as the Human Cell Landscape (HCL) revealed potentially on-target off-tumour toxic effects of CD19 CAR-T cell treatment 311 .

Improvements in CAR-T cell therapy are also being explored using genome-wide genetic perturbation techniques. CRISPR perturbation studies revealed that knocking out TLE4 and IKZF2 (encoding Helios) in CAR-T cells boosted their antitumour efficacy 312 . In a different approach, OverCITE-seq, which overexpresses open reading frames (ORFs) in T cells in a high-throughput fashion, was developed and combined with SC transcriptomics and epitope profiling 313 . Applying this to CAR-T cells, the gene LTBR was discovered to increase resistance to exhaustion and to augment the overall effector function of these cells.

Disease understanding

As most complex diseases involve multiple cell types, SC resolution can significantly advance disease understanding. ScRNA-seq captures differences in cell-type composition and changes in cellular phenotype that are characteristic of a pathological state. Moreover, the unbiased view of scRNA-seq can detect the presence of rare cell types that drive pathobiology.

SC technologies are providing detailed knowledge of underlying disease mechanisms, enabling the investigation of novel therapeutic approaches. Although an exhaustive review is outside the scope of this article, illustrative examples for cancer, neurodegenerative diseases, inflammatory and autoimmune diseases, as well as infectious diseases are presented.

SC molecular phenotyping has been extensively used to understand cancer development. Notable examples include the application of SC technologies to identify the cell of origin or cells associated with prostate carcinogenesis, heterogeneous papillary renal cell carcinoma (pRCC) and Barrett’s oesophagus leading to oesophageal adenocarcinoma 28 , 29 , 30 .

ScRNA-seq has revealed extensive cellular and transcriptional cell-state diversity in cancer and enabled tracking of cancer cell heterogeneity. This has been combined with immunophenotyping techniques to provide a view of stromal–immune niches (ecosystems or ecotypes) with unique cellular composition characterizing different types of tumour. Certain ecotypes are associated with tumour initiation or progression, sensitivity or resistance to therapeutic agents, or clinical outcome, as demonstrated by the application of this approach to capture the heterogeneity of diffuse large B cell lymphoma, breast cancer, oesophageal squamous cell carcinoma tumours and papillary thyroid carcinoma 31 , 32 , 33 , 34 .

SC technologies such as Perturb-seq hold promise in the mapping of genotype to phenotype changes — not only for oncology but also in other diseases — by assessing the impact of rare and common human disease genetic variants. This has been applied to assess the phenotypic consequences of somatic coding variants in the oncogene KRAS and the tumour suppressor gene TP53 in an unbiased and high-throughput fashion 35 .

As the extensive transcriptional cell-state diversity found in cancer is often observed independently of genetic heterogeneity, many studies have investigated the epigenetic coding of malignant cell states. Understanding epigenetic mechanisms is vital as they may enable adaptation to challenging microenvironments and may contribute to therapeutic resistance. Multi-omics SC profiling (Box  2 ) has provided insights into intratumoural heterogeneity in glioma and identified epigenetic mechanisms that underlie gliomagenesis 36 , 37 .

Longitudinal studies provide insights into the biological mechanisms associated with tumour progression and fitness of polyclonal tumours. Most studies have been carried out using mouse models or patient-derived xenografts (PDXs). Examples of this approach include a longitudinal SC analysis of samples from a myeloma mouse model that led to the identification of the GCN2 stress response as a potential therapeutic target 38 , and multi-year time-series SC whole-genome sequencing (scWGS; Box  2 ) of breast epithelium and primary triple-negative breast cancer (TNBC) PDX, which revealed how clonal fitness dynamics were shaped by TP53 mutations and cisplatin chemotherapy 39 .

SC studies have also improved understanding of metastasis. A Cas9-based, SC lineage tracer has been applied to study the rates, routes and drivers of metastasis in a lung cancer xenograft mouse model, revealing that metastatic capacity was heterogeneous, arising from pre-existing and heritable differences in gene expression, and uncovering a previously unknown suppressive role for KRT17 (ref. 40 ). This study demonstrated the power of tracing cancer progression at subclonal resolution and vast scale. Further, SC immune mapping of melanoma sentinel lymph nodes (SLNs) identified immunological changes that compromise anti-melanoma immunity and contribute to a high relapse rate 41 . The progressive immune dysfunction found to be associated with micro-metastasis in patients with stage I–III cutaneous melanoma may motivate new hypotheses for neoadjuvant therapy with potential to reinvigorate endogenous antitumour immunity 42 . A similar suppressed immune environment was observed in acral melanoma compared with that of cutaneous melanoma from non-acral skin 43 . Expression of multiple, therapeutically tractable immune checkpoints was observed, offering new options for clinical translation that may have been missed without SC approaches. Metastasis studies based on SC analysis of circulating tumour cells (CTCs) have also been carried out 44 , 45 . The spatial heterogeneity and the immune-evasion mechanism of CTCs in hepatocellular carcinoma (HCC) have been dissected using scRNA-seq 44 , identifying chemokine CCL5 as an important mediator of CTC immune evasion, and highlighting a potential anti-metastatic therapeutic strategy in HCC. Further, it was recently shown that the spread of breast cancer cells occurs predominantly during sleep. ScRNA-seq analysis of blood CTCs, which increase during rest in both patients and mouse models, revealed a marked upregulation of mitotic genes, exclusively during the resting phase, thus enabling metastasis proficiency 45 .

A step change in our understanding of cancer is anticipated from initiatives such as the Human Tumour Atlas Network (HTAN) 46 established by the National Cancer Institute, the primary focus of which is to elucidate the evolution of cancer from its pre-malignant forms to the state of metastasis at SC and spatial resolution. HTAN will generate SC, multiparametric, longitudinal atlases and integrate them with clinical outcomes. This initiative has already resulted in studies that capture in detail tumour initiation and progression as demonstrated by the creation of a SC tumour atlas covering the transition of polyps to malignant adenocarcinoma in colorectal cancer (CRC) 47 .

Neurodegenerative diseases

Parkinson disease is caused by the degeneration of dopaminergic neurons in the substantia nigra 48 , but not all dopamine-producing neurons degenerate. SC genomic profiling of human dopamine neurons found that although there are ten transcriptionally defined dopaminergic subpopulations in the human substantia nigra, only one population selectively degenerates in Parkinson disease, and the transcriptional signature of this population is highly enriched for the expression of genes associated with Parkinson disease risk 49 . The vulnerability of this population of dopaminergic neurons may provide insights for potential therapeutic interventions.

A different approach was used to study somatic DNA changes in single Alzheimer disease neurons. By comparing more than 300 individual neurons from the hippocampus and the prefrontal cortex of patients with Alzheimer disease with matched controls using scWGS, genomic alterations implicating nucleotide oxidation in the impairment of neural function were identified 50 . This work provided a different perspective on disease evolution, suggesting that the known pathogenic mechanisms in Alzheimer disease may lead to genomic damage in neurons that can progressively impair their function.

The role of immune cells in neurodegenerative diseases is posited in many recent studies. ScRNA-seq studies of brain tissues from both healthy mice and Alzheimer disease mouse models highlight disease-associated microglia, suggesting that a cell-state-targeting strategy may benefit patients with Alzheimer disease 51 (Fig.  3 ). In addition, SC transcriptome and T cell receptor (TCR) profiling (Box  2 ) has revealed T cell compartments that are activated and expanded in Parkinson disease 52 .

figure 3

Single-cell RNA sequencing (scRNA-seq) reveals a novel microglia type in an Alzheimer disease (AD) mouse model. Unbiased clustering of single immune cells (CD45 + ) sorted from wild-type (WT) and AD mouse brains classified the cells into ten subpopulations, according to the expression patterns of the 500 most variable genes. The analysis thus allowed for de novo identification of rare subpopulations and revealed three microglia types: 1 (yellow), 2 (orange) and 3 (red). As the distinct microglia states of the orange and red clusters are found only in the AD model mice, they are called ‘disease-associated microglia’ (DAM). The microglia 1 cluster corresponds to the homeostatic microglia state found in both WT and AD brains. Differential expression analysis between DAM (microglia 3) and homeostatic microglia (microglia 1) from the AD mouse brain shows that DAMs are characterized by a significant downregulation of homeostatic markers and upregulation of several known AD risk factors. Microglia 2 is an intermediate Trem2 -independent state between microglia 1 and microglia 3. t-Distributed stochastic neighbour embedding (t-SNE) map adapted with permission from ref. 51 , Elsevier.
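The workflow described in the legend (normalize, select the most variable genes, cluster cells, then contrast clusters) can be illustrated on simulated data. The matrix sizes, two-state structure and simple k-means below are invented for illustration and are not the published analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy counts: 300 cells x 2,000 genes, with two hypothetical cell states that
# each over-express a distinct block of 50 marker genes.
counts = rng.poisson(1.0, size=(300, 2000)).astype(float)
counts[:150, :50] += rng.poisson(5.0, size=(150, 50))
counts[150:, 50:100] += rng.poisson(5.0, size=(150, 50))

# Library-size normalisation and log transform, as is standard for scRNA-seq.
norm = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)

# Keep the most variable genes (the study used the 500 most variable genes;
# 100 here because the toy matrix is small).
n_hvg = 100
hvg_idx = np.argsort(norm.var(axis=0))[::-1][:n_hvg]
hvg = norm[:, hvg_idx]

# Minimal two-cluster k-means on the reduced matrix, seeded with one cell
# from each simulated state.
centres = hvg[[0, 299]].copy()
for _ in range(20):
    dist = ((hvg[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    for k in range(2):
        centres[k] = hvg[labels == k].mean(axis=0)

print(hvg.shape, np.bincount(labels))
```

In practice, toolkits such as Scanpy or Seurat implement these steps with more robust variance stabilization, graph-based clustering (for example, Leiden) and statistical tests for the differential expression contrast between clusters.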

Novel SC technologies have been developed to study the brain. Examples include Patch-seq 53 , 54 — a robust platform that combines scRNA-seq with patch clamp recording — and VINE-seq 55 , which is based on single-nucleus RNA sequencing (snRNA-seq). These approaches have been used to identify cell types in the neocortex that were selectively depleted in Alzheimer disease and to chart vascular and perivascular cell types at SC resolution in the human Alzheimer disease brain, respectively 55 , 56 .

Inflammatory and autoimmune diseases

ScRNA-seq was used to characterize a particular regulatory T cell population present in spondyloarthritis 57 and aided the discovery of cytotoxic T cells in the synovium in psoriatic arthritis; clonal expansion of these synovial immune cells was demonstrated via complementary TCR-seq 58 . SC-level comparison of peripheral blood mononuclear cell (PBMC) samples from patients with anti-citrullinated peptide antibody-positive (ACPA + ) and negative (ACPA − ) rheumatoid arthritis mapped immune correlates to each of these two rheumatoid arthritis subtypes 59 , while profiling of the immune compartment of skin biopsies revealed that common dermatological inflammatory diseases each have distinct T cell resident memory, innate lymphoid cell and CD8 + T cell gene signatures 59 , 60 .

In multiple sclerosis, comparing PBMC samples at SC resolution from sets of twins discordant for multiple sclerosis revealed an inflammatory shift in a monocyte cluster, together with a subset of naive helper T cells that are IL-2-hyper-responsive in the multiple sclerosis cohort 61 . SC techniques have also helped to explain epidemiological evidence implicating Epstein–Barr virus (EBV) as a necessary aetiological factor in multiple sclerosis 62 . Single-cell B cell receptor sequencing (scBCR-seq; Box  2 ) of both cerebrospinal fluid and blood from patients with multiple sclerosis revealed expansion of B cell clones that bind a similar antigen in glia (GlialCAM) and EBV (EBNA1) 63 .

Further studies in rheumatoid arthritis modelled expression quantitative trait loci (eQTLs) at SC resolution in memory T cells and found several autoimmune variants enriched in cell-state-dependent eQTLs 64 , identifying risk variants for rheumatoid arthritis enriched near the ORMDL3 and CTLA4 genes. Notably, because eQTLs depend on the functional cell state, their identification is complicated in studies that aggregate cells.
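Conceptually, a cell-state-dependent eQTL corresponds to a genotype × cell-state interaction term in a regression of expression on genotype, which is invisible when cells are aggregated. A minimal sketch on simulated data (the effect sizes and the continuous activation score are invented for illustration; real sc-eQTL analyses use mixed models with donor structure and many covariates):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # cells (or donor pseudobulk profiles)

genotype = rng.integers(0, 3, size=n).astype(float)  # alt-allele dosage 0/1/2
state = rng.random(n)                                # per-cell activation score

# Simulated expression in which the variant acts only in activated cells:
# no main genotype effect, but a genotype x state interaction of 0.8.
expr = 1.0 + 0.5 * state + 0.8 * genotype * state + rng.normal(0, 0.3, size=n)

# Ordinary least squares with intercept, main effects and the interaction.
X = np.column_stack([np.ones(n), genotype, state, genotype * state])
beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
print(beta.round(2))  # [intercept, genotype, state, genotype x state]
```

The fitted interaction coefficient recovers the state-dependent effect even though the marginal genotype effect is near zero, which is why aggregating cells across states can mask such eQTLs.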

Technological advancements building on SC protocols can further enhance disease understanding. For example, tetramer-associated T cell antigen receptor sequencing (TetTCR-SeqHD) helped to unravel the role of cytotoxic T cells in type 1 diabetes by combining TCR-seq readouts with cognate antigen specificity, gene expression and surface marker presence 65 .

Infectious diseases

A prominent example of the use of SC approaches to advance understanding of infectious diseases is the recent study of coronavirus disease 2019 (COVID-19) to identify immune correlates of disease severity in human tissue. Comparison of bronchoalveolar lavages from patients with COVID-19 of differing disease severity revealed local immune profiles associated with disease status 66 . Analyses of the SC transcriptome, surface proteome and T and B lymphocyte antigen receptors of PBMC samples from patients with COVID-19 identified a monocytic role in platelet aggregation, circulating follicular helper T cells in mild disease, and clonal expansion of cytotoxic CD8 + T cells together with an increased ratio of CD8 + effector T cells to effector memory T cells in the more severe cases 67 . These findings indicate cellular components that might be targeted therapeutically. Similarly, scRNA-seq of circulating immune cells and readouts of metabolites in plasma of patients with COVID-19 revealed an intricate interplay between immunophenotypes and metabolic reprogramming. Emerging rare, but metabolically dominant, T cell subpopulations were found, along with a bifurcation of monocytes into two metabolically distinct subsets that correlated with disease severity 68 . Further, combining SC transcriptomics and SC proteomics (Box  2 ) with mechanistic studies found that generation of the C3a complement protein fragment by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection drives differentiation of a CD16-expressing T cell population associated with severe COVID-19 disease outcomes 69 .

SC analysis of lung tissue samples collected post-mortem from patients with COVID-19 identified molecular fingerprints of hyperinflammation, alveolar epithelial cell exhaustion, vascular changes and fibrosis 70 . Data suggested FOXO3A suppression as a potential mechanism underlying the fibroblast-to-myofibroblast transition associated with COVID-19 pulmonary fibrosis, providing insights into potential symptomatic treatments for SARS-CoV-2. A complementary study compiling lethal COVID-19 multi-tissue SC data sets from scRNA-seq and snRNA-seq analyses identified potential disease-relevant mechanisms, such as defective alveolar type 2 differentiation, expansion of fibroblasts and putative TP63 + intrapulmonary basal-like progenitor cells in the lungs of deceased patients 71 . A review of the SC immunology of SARS-CoV-2 infection has provided interactive and downloadable curated SC data sets 72 .

Other notable applications of SC technologies in infectious diseases include the study of bacterial heterogeneous clonal evolution during infection and the characterization of granulomas in tuberculosis.

Parallel sequential fluorescence in situ hybridization (Par-seqFISH) was developed to capture gene expression profiles of individual prokaryotic cells while preserving spatial context 73 . This technology showed heterogeneity in growing Pseudomonas aeruginosa populations and demonstrated that individual multicellular biofilms can contain coexisting but separated subpopulations with distinct physiological activities 73 .

Coupling sophisticated SC analyses with detailed in vivo measurements of Mycobacterium tuberculosis- associated granulomas was used to define the cellular and transcriptional properties of a successful host immune response during tuberculosis 74 . Lack of clearance of granulomas and persistence of M. tuberculosis was characterized by type 2 immunity and a wound-healing involvement, whereas granulomas that drove bacterial control were dominated by the presence of pro-inflammatory type 1, type 17 and cytotoxic T cells 74 .

Target discovery

The precision and granularity that SC technologies bring to disease understanding can not only accelerate the discovery of new drug targets, but also potentially reduce attrition by providing insights into issues that affect the likelihood that drug candidates modulating these targets will progress successfully. Below, we discuss examples that illustrate the general impact of SC technologies in target discovery, while being mindful that the terms associated with target progression, such as identification, validation, credentialling and qualification, have different but overlapping meanings.

Oncology is at the forefront of the application of SC approaches to target identification. A clear example of the use of SC analysis in the discovery of novel cell-type-specific targets is the identification of S100A4 as a novel immunotherapy target in glioblastoma, following an integrated analysis of >200,000 glioma, immune and other stromal cells from human glioma samples at the SC level. Deleting this target in non-cancer cells reprogrammed the immune landscape and significantly improved survival 75 . Developing strategies to directly target cancer cells remains a primary focus, and SC technologies can also provide significant benefits here. As an example, SC genomics has recently provided a map charting potential new tumour antigens 76 . These are ideal targets for cell-depleting therapeutic monoclonal antibodies, as has been demonstrated for haematological cancers (for example, rituximab or alemtuzumab).

SC techniques have been applied in target identification in other therapeutic areas besides oncology. Of particular interest are studies in diseases with a fibrotic component, as there are few therapeutic options currently available. For example, scRNA-seq in mice comparing healthy and ischaemic hearts identified CKAP4 as a potential target for preventing fibroblast activation and thereby reducing the risk of cardiac fibrosis 77 . In cardiac samples from patients with ischaemic heart disease, expression of CKAP4 positively correlated with genes known to be induced in activated cardiac fibroblasts. In human chronic kidney disease, the creation of a multi-model SC atlas facilitated the discovery of myofibroblast-specific naked cuticle homologue 2 (NKD2) as a candidate therapeutic target in kidney fibrosis 78 . In addition, in a mouse model of kidney fibrosis, the transcription factor RUNX1 was identified as a potential target to block myofibroblast differentiation, after further analysis of sparse single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq; Box  2 ) data 79 .

Human genetic data are a key resource for target identification 4 . Integrating information on cell-type-specific expression with disease-associated genetic variants from genome-wide association studies (GWAS) — so-called sc-eQTL — can identify the cell types and effector genes that have a causal role in disease, providing insight into potential therapeutic approaches 80 . Other strategies that combine GWAS summary statistics with SC transcriptomics quantify the heritability of a gene expression signature derived from scRNA-seq data sets (capturing either a cell type or a biological process) 81 . Via a method called SC Linker (Box  3 ), novel disease associations have been identified, linking GABAergic neurons to major depressive disorder, disease progression programmes in M cells to ulcerative colitis and a disease-specific complement cascade process to multiple sclerosis 81 .

Computational frameworks integrating complementary molecular information have been used extensively to prioritize potential drug targets. For example, GuiltyTargets annotates protein–protein interaction networks with differentially expressed genes linked to a disease, learns an embedded representation and uses this to predict new targets 82 . The incorporation of SC data sets into these computational approaches enables the prediction of cell-specific targets. For example, a network-based approach built on SC data sets has been used to prioritize drug targets in arthritis 83 .
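The guilt-by-association intuition behind such network approaches can be caricatured with a toy score-propagation sketch. The network, seed genes and the GENE_X/GENE_Y placeholders below are hypothetical, and GuiltyTargets itself learns network embeddings rather than using this simple propagation:

```python
from collections import defaultdict

# Hypothetical protein-protein interaction edges: an inflammatory module
# (seeded by disease evidence) and an unrelated housekeeping module.
edges = [("TNF", "NFKB1"), ("NFKB1", "RELA"), ("RELA", "IL6"),
         ("IL6", "STAT3"), ("STAT3", "GENE_X"),
         ("ACTB", "TUBB"), ("TUBB", "GENE_Y")]
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

# Seed scores, e.g. from disease-associated differential expression.
seeds = {g: 0.0 for g in graph}
seeds["TNF"] = seeds["IL6"] = 1.0

# Iterative propagation: each gene keeps alpha of its own seed evidence and
# absorbs (1 - alpha) of its neighbours' average score.
alpha = 0.5
score = dict(seeds)
for _ in range(50):
    score = {g: alpha * seeds[g]
             + (1 - alpha) * sum(score[nb] for nb in graph[g]) / len(graph[g])
             for g in graph}

ranked = sorted(score, key=score.get, reverse=True)
print(ranked)
```

After propagation, the unannotated gene connected to the disease module (GENE_X) outranks the one in the housekeeping module (GENE_Y), which is the behaviour cell-specific SC inputs sharpen further by restricting the differential expression evidence to the relevant cell type.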

Target credentialling and validation

In target credentialling and validation, confidence in a gene target is established by acquiring and combining evidence from various sources (disease biology, target biology and tractability, genetic studies, etc.). The translational validity of study models may also be examined to better understand potential gaps between the models and the disease biology or therapeutic aim. ScRNA-seq data can inform each of these facets.

Routes to improving confidence in a target include validating functional linkages between the target and the disease biology. Gene targets, gene signatures and cell states affected by individual perturbations and their genetic interactions may all be assessed at once through a scCRISPR screen, allowing target categorization and prioritization. Traditionally, significant resources are involved in target credentialling, and so compromises are often made between the number of targets examined and the complexity and number of readouts. ScCRISPR screening alone or after a genome-wide pooled screen (Box  2 ) can mitigate this trade-off by allowing tens to hundreds of perturbations to be pooled and profiled at once 84 , 85 , 86 .

An application of this scCRISPR screening approach first involved the identification of regulators of T cell stimulation and immunosuppression using a genome-wide pooled CRISPR screen, with candidate hits followed up with functional assays and Perturb-seq to reveal affected gene programmes, leading to at least four potential antitumour targets 87 . More recently, the platform has been expanded to allow paired CRISPR activation (CRISPRa) and CRISPR interference (CRISPRi) screening and pooled scRNA-seq profiling, advancing the range and depth of target validation. Perturb-seq can also be performed in vivo 88 , allowing investigation of gene functions in multiple cell types in a physiological context.
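At its core, a pooled scCRISPR readout reduces to grouping cells by their assigned guide and contrasting each perturbation's pseudobulk profile against non-targeting controls. A minimal simulated sketch (the gene names, guide labels and effect sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical readout genes

# Guide identity captured per cell: 100 non-targeting controls (NTC) and
# 100 cells carrying a guide against GENE_A.
guides = np.array(["NTC"] * 100 + ["sgGENE_A"] * 100)
rng.shuffle(guides)
ko = guides == "sgGENE_A"

# Toy expression: the knockout silences GENE_A and de-represses GENE_C.
expr = rng.normal(5.0, 0.5, size=(200, len(genes)))
expr[ko, 0] -= 3.0
expr[ko, 2] += 2.0

# Contrast the perturbation's pseudobulk against the NTC cells.
delta = expr[ko].mean(axis=0) - expr[~ko].mean(axis=0)
effects = dict(zip(genes, delta))
print({g: round(v, 1) for g, v in effects.items()})
```

Because the perturbation label travels with each cell, tens to hundreds of such contrasts can be pooled in one experiment, which is the trade-off mitigation described above.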

Targets may be further credentialled and validated for their impact on disease-relevant mechanisms by using functional genomics or pharmacology studies in vitro or in vivo. Currently, readouts of these studies are usually low-dimensional, focusing on only dozens of predefined proteins or specific disease-related phenotypes 89 , 90 , 91 . However, coupling these studies with unbiased omics readouts can provide more granularity, allow exploration of drug mode of action (MoA) (see also next section) and even reveal any unexpected toxicity profiles. Transcriptomic readouts are often the most cost-effective and relatively straightforward to interpret, and SC transcriptomics has the additional advantage of high resolution, especially for complex models. For example, dual specificity phosphatase 6 (DUSP6) has been proposed as a potential target for inflammatory bowel disease (IBD) 92 and the roles of Dusp6 , which had remained unclear previously from a study using bulk RNA sequencing 93 , have been dissected in mice in a cell-type-specific manner using scRNA-seq 94 .

De-orphaning studies are typically needed if the target of the drug candidate is unknown. These studies are particularly interesting for drug combinations or bispecific treatments, because biological mechanisms that are different from those of the individual drugs may be involved. For example, scRNA-seq profiling of CD45 + -enriched cells from livers of mice treated with an anti-CTLA4 immune checkpoint inhibitor (ICI), and/or the IDO1 inhibitor epacadostat showed that the combination promotes CD8 + T cell proliferation and activation, and the enrichment of an interferon-γ (IFNγ) gene signature 95 . Similarly, flow cytometry and CyTOF were applied to demonstrate that anti-CD47–PDL1 bispecific treatment reduced binding on red blood cells and enhanced selectivity to the tumour microenvironment (TME), compared with anti-CD47 and anti-PDL1 monotherapies or combination therapies 96 . ScRNA-seq enabled further exploration of the mechanism, including myeloid population reprogramming, activation of the innate immune system and T cell differentiation, which cannot be directly measured using traditional methods.

ScRNA-seq can be conveniently combined with scATAC-seq for chromatin information, DNA-barcoded antibody staining for surface and/or intracellular protein expression (such as CITE-seq/ECCITE-seq 97 and INs-seq 98 ) and is therefore useful when target modulation results in pre- and/or post-transcriptional changes (Box  2 ). For instance, to study ICI resistance (ICR), Perturb-seq was extended and coupled with antibody staining and TCR profiling 99 . This work targeted 248 genes of the ICR signature identified in a previous study 22 and revealed novel ICR mechanisms including downregulation of CD58 along with known resistance mechanisms.

Preclinical studies

Selecting the appropriate models for target credentialling maximizes clinical translatability. In vitro models include cell lines, primary cells and patient-derived organoids (PDOs), the latter incorporating some elements of higher-order tissue organizational complexity. In vivo models include syngeneic models, in which murine cancer cells are isografted into genotypically similar mice, PDX in immunodeficient mice, and genetically engineered mouse models (GEMMs), which recapitulate genetic alterations crucial to human carcinogenesis. Before the advent of SC omics technologies, the relative translatability of derived research models could be assessed using bulk and/or antibody-targeted SC methods (for example, flow cytometry) capable of demonstrating that characteristics of patients or donors were, in fact, recapitulated by the research models 100 . SC sequencing methods expand the granularity with which model or patient fidelity can be examined by shifting assessments from bulk pools or averages to measurements of cell-type composition, intra-tissue heterogeneity and detection of rare cell phenotypes.

It has long been suggested that therapeutic strategies that account for the cellular pathogenic diversity present in complex diseases such as cancer are more likely to be successful in patients. ScRNA-seq profiling of the Cancer Cell Line Encyclopedia (CCLE) revealed patterns of heterogeneity shared between tumour lineages and specific cell model lines, suggesting that derivative cell models are promising tools for the discovery of therapeutic strategies that are not compromised by cellular heterogeneity 101 .

Although cell lines are easy to manipulate and have limited associated costs, more complex biological model systems better recapitulate the cell–cell interplay and emergent functions of human physiology. Using scRNA-seq to expand and quantify the extent of this recapitulation helps to guide efforts towards the most translatable systems for preclinical development, and recent areas of focus include mouse 102 and human organoids 103 . Human liver organoids have been shown to be highly predictive for drug-induced liver injury (DILI) 104 , and human PDOs derived from pancreatic duct adenocarcinoma malignant ductal cells have been assessed as a good model for the human counterpart 105 .

Taking model complexity a step further, SC sequencing studies of hepatoblastoma and lung adenocarcinoma have demonstrated that tumour state and heterogeneity are preserved in PDX models despite differences in TME 106 and that they can help to identify heterogeneity in drug responses and likely associations with anti-drug resistance 107 .

Characterization of well-established GEMMs at SC resolution 108 and compendiums of mouse SC transcriptomic data have facilitated the identification of genes with similar murine and human expression profiles 109 , ligand–receptor interactions across all cell types in a microenvironment of syngeneic mouse models 110 , and similarities across murine–human cell populations or subpopulations in lung cancer 18 (Supplementary Fig. 2 ). Similarly, recent SC studies revealed mechanisms underlying chemotherapy-induced ototoxicity after comparing healthy and cisplatin-exposed mice 111 , as well as mechanisms of ICI-induced liver injury following comparisons of treated versus untreated mice 95 .

A growing number of public SC data sets, representing models of interest, healthy and diseased human donors, are enabling researchers to better assess translatability 18 , 109 , 112 (Table  1 ).

Drug screening and MoA analysis

High-throughput screening (HTS) in drug discovery is traditionally performed using coarse (cell viability or proliferation) or highly specific (marker expression) readouts. If a more unbiased phenotypic assessment is chosen, using bulk assessments such as RNA-seq assumes that all cells in the assay behave similarly. In comparison with bulk RNA-seq, SC transcriptomics offers more detailed views of the responding cell types, and the corresponding cell-type-specific changes (pathway, off-target effects, dose–response profiles), allowing for separation of confounding factors such as cell cycle stage. Therefore, HTS approaches have recently been combined with scRNA-seq readouts. Standard HTS tests a much larger number of compounds but typically at a single dose and under very limited biological conditions, whereas the novel HTS approaches that use SC gene expression readouts test several doses and conditions at the same time and are well adapted for drug MoA studies (Fig.  4 ).

figure 4

a , Standard high-throughput screening (HTS) tests a much larger number of compounds than HTS using single cells, but typically at a single dose and a single biological condition. The most active compounds obtained by standard HTS must be further studied (for example, dose–response analysis) but finally provide hits that are the starting point for drug discovery of active and safe drugs. b , HTS using single-cell approaches allows for testing of several doses and conditions at the same time and it is mainly used for drug mode of action (MoA) studies. In the uniform manifold approximation and projection (UMAP) embeddings shown, each cell is coloured either by the type of perturbation or the perturbation dose. k, thousand; M, million; t-SNE, t-distributed stochastic neighbour embedding. Elements of part b adapted from: ref. 200 , CC BY 4.0 ; ref. 115 . © The Authors, some rights reserved; exclusive licensee AAAS.

To mitigate the costs of scRNA-seq as a readout for chemical perturbation studies and to increase its throughput, multiplexing techniques have been developed. Hundreds of compounds can now be simultaneously profiled across multiple doses, time points and cell types, leading to a comprehensive understanding of compound function at scale and SC resolution. Using pre-existing genetic diversity or barcode-labelled antibodies or lipids, samples originating from different experimental conditions (time points, compounds, doses) can be pooled together; these techniques are collectively called hashing. For example, MIX-seq increases throughput using single-nucleotide polymorphism (SNP)-based demultiplexing of scRNA-seq readouts of cell lines and has been used to identify treatment-induced transcriptional changes for 13 drugs on up to 99 cell lines 113 . Another application of this approach relied on transient transfection of cells with short oligo barcodes 114 . The technology was validated by first multiplexing cell samples from various species (human or mouse) and, in a subsequent experiment, by multiplexing different time exposures of a human chronic myelogenous leukaemia cell line to a drug perturbation (imatinib, a BCR–ABL-targeting drug). Multiplexing the response of this cell line to 45 drugs (mostly kinase inhibitors) revealed drug-induced differential gene expression. A recent extension of single-cell combinatorial indexing sequencing (sci-RNA-seq), called sci-Plex, introduces a precursory step for sample multiplexing by single-stranded DNA (ssDNA) oligo uptake in single nuclei. This technique has been applied to screen exposure to 188 compounds in three cancer cell lines and profiled up to 650,000 cells 115 . Common, dose-dependent pathways associated with histone deacetylase (HDAC) inhibitors, which interfere with epigenetic cellular mechanisms, were discovered across these three diverse cancer cell lines. Depletion of cellular acetyl-CoA reserves was identified as a metabolic consequence in HDAC-inhibited cells, providing insight into the MoA of HDAC inhibitors.
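The demultiplexing step that underpins hashing can be illustrated with simulated hashtag-oligo counts: each cell is assigned back to its sample of origin from its dominant tag after a centred log-ratio transform. The count magnitudes below are arbitrary, and production demultiplexers additionally model doublets and ambiguous cells:

```python
import numpy as np

rng = np.random.default_rng(3)
n_cells, n_tags = 300, 4  # four pooled conditions, one hashtag oligo each

# Simulated hashtag counts: low ambient background everywhere plus a strong
# signal on each cell's true sample tag.
truth = rng.integers(0, n_tags, size=n_cells)
hto = rng.poisson(5, size=(n_cells, n_tags)).astype(float)
hto[np.arange(n_cells), truth] += rng.poisson(200, size=n_cells)

# Centred log-ratio (CLR) transform across tags, then assign each cell to
# its highest-scoring hashtag.
log_hto = np.log1p(hto)
clr = log_hto - log_hto.mean(axis=1, keepdims=True)
assigned = clr.argmax(axis=1)
accuracy = float((assigned == truth).mean())
print(accuracy)
```

Once each cell carries its condition label, a single sequencing run yields per-dose, per-compound expression profiles without separate library preparations, which is where the cost saving comes from.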

The field of deep learning has embraced the rich and high-dimensional data sets generated by SC multiplexed perturbation experiments (see review 116 ). These methods enable the prediction of the cellular changes induced by a drug 117 or exploration of the otherwise prohibitively large combinatorial space of combined chemical perturbations (for example, the compositional perturbation autoencoder (CPA) 118 ). The latter can identify potential combination treatments from the large multiplexed SC data sets generated by techniques such as sci-Plex.
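The compositional idea these models exploit, namely treating a perturbation as a shift applied to a basal cell state so that unseen drug plus cell type or drug plus drug combinations can be predicted by recombining learned components, can be caricatured with a purely linear, simulated sketch. CPA itself learns such components with a neural autoencoder; everything below (profiles, shifts, noise levels) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n_genes = 50

# Invented ground truth: basal profiles for two cell types plus additive,
# cell-type-independent shifts for two drugs.
control = {"typeA": rng.normal(0, 1, n_genes), "typeB": rng.normal(0, 1, n_genes)}
shift = {"drug1": rng.normal(0, 1, n_genes), "drug2": rng.normal(0, 1, n_genes)}

def observe(cell_type, drugs, n=200):
    """Simulate n noisy single-cell profiles for one condition."""
    mu = control[cell_type] + sum((shift[d] for d in drugs), np.zeros(n_genes))
    return mu + rng.normal(0, 0.5, size=(n, n_genes))

# Learn each drug's shift from cell type A only (treated minus control mean).
learned = {d: observe("typeA", [d]).mean(0) - observe("typeA", []).mean(0)
           for d in ("drug1", "drug2")}

# Predict the unseen drug combination in cell type B by composing the shifts.
pred = observe("typeB", []).mean(0) + learned["drug1"] + learned["drug2"]
true = control["typeB"] + shift["drug1"] + shift["drug2"]
r = float(np.corrcoef(pred, true)[0, 1])
print(round(r, 3))
```

The prediction correlates strongly with the simulated ground truth because the shifts here are additive by construction; the value of the deep learning approaches is in capturing the non-linear, dose-dependent and cell-type-dependent deviations from this linear picture.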

SC approaches using human samples can also help to explore the MoA of drugs or vaccines. As an example, elucidating the nature of the induced immunological memory after SARS-CoV-2 vaccination from real-world evidence has complemented the preclinical and clinical studies of these vaccines. SC technologies were used to compare the immunological changes induced by natural infection, vaccine-based antigen exposure or a combination of the two. The immunological B cell response to BNT162b2 vaccination was charted using scRNA-seq and scBCR-seq (Box  2 ), and the effectiveness of this mRNA vaccine against emerging variants of concern was analysed 119 . On the basis of SC data, it was discovered that the antibody response resulting from hybrid exposure (previously infected people vaccinated with the BNT162b2 mRNA vaccine) has an increased potency for neutralization 120 . These findings were later proved to be clinically relevant in a much larger cohort of patients 121 . Regarding therapies, the RECOVERY trial established dexamethasone as an effective treatment for hospitalized patients with COVID-19 receiving oxygen or mechanical ventilation 122 . Subsequent SC studies unravelled the immunological components that underlie the effectiveness of dexamethasone. A prominent role for neutrophils in response to this potent corticosteroid in patients with severe COVID-19 was discovered 123 . These insights may thus help the development of more targeted treatment options for severe COVID-19.

Finally, SC expression profiling has also been applied to study the biological mechanisms of drug resistance at cellular resolution. Analysis of SC data from pre-treatment and multiple post-treatment time points in a lung adenocarcinoma cell line revealed the mechanism of acquired resistance to epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors such as erlotinib in non-small-cell lung carcinoma, as well as cell-to-cell heterogeneity in treatment sensitivity, highlighting the importance of unbiased SC readouts 124 .

Biomarkers and patient stratification

In some settings, patients can be stratified into refined populations on the basis of disease prognosis or therapeutically relevant markers that predict drug response. These prognostic or predictive biomarkers are often used as eligibility criteria in clinical trials to identify patients who are more likely to have disease progression or respond to a drug, respectively (Fig.  5a ).

figure 5

a , Single-cell RNA sequencing or single-cell multi-omics technologies enable the identification of a predictive biomarker from a cohort of patients enrolled in an early-phase clinical study. Such a predictive biomarker can be used to identify patients who can benefit from a given treatment as a biomarker enrichment strategy. b , Single-cell analysis of immune cells from samples from patients with metastatic melanoma treated with immune checkpoint inhibitor (ICI) therapies uncovers a TCF7 + memory-like state in the cytotoxic T cell population associated with a positive outcome. t-SNE, t-distributed stochastic neighbour embedding. Elements of part b reprinted with permission from ref. 19 , Elsevier.

Bulk transcriptomic signatures have typically been used to determine prognostic biomarkers in cancer, as in the case of the four consensus molecular subtypes (CMS1–4) defined by an international consortium for CRC 125 . However, the CMS classification has not yet proved convincingly useful in the clinic 126 . Bulk sequencing inherently lacks the resolution to capture crucial cell populations of CRC tumours and their complex microenvironment, and the underlying epithelial cell diversity remains unresolved within the CMSs. Recently, scRNA-seq has helped to define more precise prognostic biomarkers in CRC 127 , 128 . Analysis of the transcriptomes of single cells from tumour and adjacent normal samples led to the definition of two epithelial cell groups with distinct intrinsic CMSs (named iCMS2 and iCMS3). By combining these with microsatellite instability and fibrosis status, a new classification called IMF has been proposed 128 . IMF comprises five subtype classes with distinct signalling pathways, mutational profiles and transcriptional programmes. Although promising, the value of this new classification is yet to be proved in the clinic.

ICI therapy has been successful in achieving durable responses in a subset of patients across a wide range of malignancies. However, there are still many unanswered questions around why not all patients respond to ICI therapy, and identification of predictive biomarkers for the response to ICI remains a key goal. Through these efforts, several predictive biomarkers, including tumour mutation burden (TMB), have been discovered 129 , 130 . Unfortunately, these predictive biomarkers fail to explain response to ICI for all patients. Recent SC sequencing studies have demonstrated the ability to identify new predictive biomarkers for the response or resistance to ICI. A study of CD8 + T cellular states at baseline 19 revealed that responders to checkpoint inhibitors are enriched in the TCF7 + CD8 + T cell state, which is also present in other indications responsive to checkpoint blockade (Fig.  5b ). Beyond the conventional CD8 + T cell mediated mechanisms associated with ICI response, SC sequencing is also highlighting other cell types that shape response, such as TREM2 hi macrophages, γδ T cells, CXCL9 + tumour-associated macrophages, T cell exclusion signatures and the lung cancer activation module (LCAM hi ) characterized by PDCD1 + CXCL13 + activated T cells, IgG + plasma cells and SPP1 + macrophages 131 , 132 , 133 , 134 , 135 , 136 . Promisingly, some of these cell types and states have recurred in multiple independent studies across tumour types 137 and have outperformed currently used predictors such as TMB, tumour infiltrating lymphocyte (TIL) levels and PDL1 expression. In addition to scRNA-seq, there are examples of SC spatial analysis being applied to the identification of potential predictive biomarkers of response.
The proximity of exhausted CD8 + T cells to PDL1 + cells has been reported to predict the clinical response of combined PARP and PD1 inhibition in ovarian cancer 138 , while the proximity of antigen-presenting cells to stem-like CD8 + T cells in intratumoural tertiary lymphoid structures has been reported to predict ICI efficacy 139 , 140 .

ScRNA-seq has also been applied to characterize chemotherapy resistance processes in cancer, as exemplified by a study in high-grade serous ovarian cancer (HGSOC). SC analysis of tissue samples collected before and after chemotherapy showed that stress-associated cancer cell populations pre-exist and are subclonally enriched during chemotherapy. The stress-associated gene signature also predicted poor prognosis in HGSOC 141 . In addition, scRNA-seq may be applied to predict future relapse, as seen in MLL-rearranged acute lymphoblastic leukaemia (ALL) by quantifying the proportion of cells that are identified as resistant or sensitive to treatment 142 . In this study, the relapse prediction outperformed the current risk stratification scheme 143 .

Outside oncology, SC studies are, for the first time, providing an opportunity to stratify disease into actionable subtypes. In IBD, scRNA-seq identified a cellular module called GIMATS in inflamed tissues from patients with Crohn’s disease 144 , consisting of IgG plasma cells, inflammatory mononuclear phagocytes, activated T cells and stromal cells. A high GIMATS score in patients was associated with failure to achieve durable remission after anti-tumour necrosis factor (TNF) therapy. In addition, profiling patients with ulcerative colitis and healthy individuals identified immune and stromal cells (including inflammation-associated fibroblasts) associated with resistance to anti-TNF treatment 145 . Furthermore, scRNA-seq analysis of PBMCs from patients with acute Kawasaki disease revealed the decreased abundance of CD16 + monocytes and downregulation of pro-inflammatory cytokines such as TNF and IL-1β in response to high-dose intravenous immunoglobulin (IVIG) therapy 146 . Several studies have now applied scRNA-seq approaches to diseased tissues and reported biomarkers predictive of drug response or resistance 124 , 131 , 147 ; however, there is still a gap in understanding how well these findings translate into the clinic.

Although these SC studies are limited in terms of patient numbers, conditions and samples, methods such as cell-type deconvolution allow them to be used to complement existing bulk RNA-seq studies that typically have more mature response and outcome data 22 .
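Under simplifying assumptions, cell-type deconvolution of this kind can be sketched as a non-negative regression of a bulk expression profile onto cell-type signature profiles derived from scRNA-seq. The toy signature matrix and mixture below are synthetic; production tools (for example, CIBERSORTx) add batch correction and marker-gene selection on top of this basic idea.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)

# Signature matrix: mean expression of 3 cell types across 30 genes, as
# would be derived from an annotated scRNA-seq reference of the tissue.
n_genes, n_types = 30, 3
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))

# Simulate a bulk RNA-seq sample as a weighted mixture of the cell types.
true_props = np.array([0.6, 0.3, 0.1])
bulk = signatures @ true_props + rng.normal(0.0, 0.01, size=n_genes)

# Solve bulk ≈ signatures @ proportions with proportions >= 0, then
# renormalize so the estimated proportions sum to 1.
raw, _ = nnls(signatures, bulk)
proportions = raw / raw.sum()

print(np.round(proportions, 2))  # close to the simulated [0.6, 0.3, 0.1]
```

Applied to a clinical bulk RNA-seq cohort, the estimated per-sample proportions can then be correlated with the mature response and outcome data that such cohorts typically carry.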

Monitoring of drug response and disease progression

Clinical monitoring of both disease progression and response to therapy with SC sequencing approaches is starting to influence clinical decision-making. The field of oncology has taken the lead in this area. The concept of minimal residual disease (MRD) as a metric to indicate remaining cancer cells during or after therapy has been a central tenet in measuring drug response. For example, patients with acute myeloid leukaemia (AML) often harbour multiple subclones, each with complex molecular abnormalities 148 . Clinical practice today defines complete remission as <5% blasts detected by morphological evaluation in the bone marrow, without an assessment of subclonal molecular abnormalities or their evolution during therapy. Evidence is mounting that residual disease below this 5% threshold is a relapse risk factor and could therefore guide treatment decisions 149 . MRD assessment with SC mutational profiling (in contrast to more traditional MRD methods) allows for subclonal assessment at lower detection limits and for analysis of subclonal evolution throughout treatment 150 . SC mutational profiling improved the sensitivity and specificity of MRD detection and was also able to identify relapse-causing resistant clones.
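The gain in detection limit can be made concrete with a simple sampling calculation: if a resistant subclone makes up a fraction f of cells and n cells are profiled independently, the chance of missing it entirely is (1 − f)^n. A minimal sketch (the function name and parameters are illustrative):

```python
from math import ceil, log

def cells_for_detection(clone_fraction, conf=0.95):
    """Number of cells to profile so that, with probability >= conf, at
    least one cell of a subclone at the given fraction is observed
    (independent sampling: P(miss all) = (1 - f) ** n)."""
    return ceil(log(1.0 - conf) / log(1.0 - clone_fraction))

# Detecting a subclone at the ~5% morphological threshold versus one at
# 0.1%, each with 95% confidence:
print(cells_for_detection(0.05), cells_for_detection(0.001))
```

A few dozen cells suffice at the 5% threshold, whereas thousands must be profiled to see a 0.1% subclone, which is why the per-cell resolution and throughput of SC mutational profiling matter for MRD.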

The relapse risk associated with MRD is partially explained by the presence of persister cells that are induced in response to treatment. This type of drug resistance is often driven by non-genetic adaptive mechanisms, although these are poorly understood. To study the rare and transiently resistant persister cells, a high-complexity lentiviral barcode library called Watermelon was developed to simultaneously trace the clonal lineage, proliferation status and transcriptional profile of individual cells during drug treatment 151 (Supplementary Fig. 3 ). This approach identified rare cancerous persister lineages that are preferentially poised to proliferate under drug pressure and found that upregulation of antioxidant gene programmes and a metabolic shift to fatty acid oxidation are associated with persister proliferative capacity. Blocking oxidative stress or rewiring the metabolic programme of these cells alters their proportion. In human tumours, programmes associated with cycling persisters are induced in response to multiple targeted therapies. Targeting persister cell states could thus delay or even prevent cancer recurrence. In addition, the PERSIST-SEQ consortium ( https://persist-seq.org/ ) was initiated to create a SC atlas of persister cells to improve the understanding of therapeutic resistance in cancer. Similarly, initiatives like HTAN 46 could potentially contribute to consistent mapping of persister cell states among the set of clinical transitions of adult and paediatric malignancies when exploring therapeutic resistance. A study in triple-negative breast cancer (TNBC) showed that treatment-resistant clones originated from pre-existing cancer cells. By combining bulk whole-exome sequencing (WES) with SC transcriptomics, it was demonstrated that some of these adaptive changes were not induced by somatic mutations but were characterized by transcriptional reprogramming of these cells 152 .

As discussed previously, ICI therapy is a promising new therapeutic modality for some cancer patients, and understanding which patient subpopulations benefit from this treatment option is important. In addition, monitoring pharmacodynamic changes and closely following the response to ICI treatment at the molecular level are required for better patient selection and improved overall treatment outcomes. Mechanisms by which PD1/PDL1 blockade either revives pre-existing TILs or recruits novel T cells have been examined recently with the application of paired scRNA-seq and scTCR-seq on site-matched tumours from patients with basal or squamous cell carcinoma before and after anti-PD1 therapy 153 . Analysis of TCR clones and their transcriptional phenotypes revealed that drug response is driven by the expansion of novel T cell clones not previously observed in the same tumour, probably derived from a distinct repertoire of T cell clones that recently migrated into the tumour. Another SC study 154 showed that CXCL13 + CD8 + T cells were expanded in response to PDL1 treatment and identified a circulating T cell subtype that shared higher levels of TCR clones with tumour CXCL13 + CD8 + T cells. The number of T cell clonotypes induced during early treatment provides a good proxy for future treatment success. This metric was used to identify SC changes induced by successful ICI treatment during a window-of-opportunity study 155 . These findings have also recently been confirmed in studies across multiple tumour types 155 , 156 , thereby not only providing insight into the PD1/PDL1 blockade MoA, but also suggesting that liquid biopsies that sample the TCR repertoire and identify clonal changes upon treatment may provide an actionable pharmacodynamic readout.

Current challenges

Several challenges remain for industry to harness the transformational capabilities of scRNA-seq technologies, which will require changes to infrastructure and ways of working. Moreover, as the generation of scRNA-seq data in the public domain has outpaced that of internal efforts from any single pharmaceutical company, effective integration of all relevant scRNA-seq data is particularly challenging. In addition, owing in part to sample requirements and cost of scRNA-seq data generation, it is not likely to quickly replace bulk molecular profiling of early discovery or clinical samples, and so effective integration of scRNA-seq and bulk molecular profiling data is also needed.

Study design and implementation

Standardized design and implementation of SC experiments are still in their infancy. Although SC resolution has the potential to improve understanding of cell states and subsets of rare populations, discerning a cell type precisely and consistently across different experiments is difficult for rare cell populations, especially when fine distinctions guide cell-type identification. A uniform analysis pipeline, together with consistent methodology and vocabulary, is a prerequisite to addressing this. Multi-omics approaches, by providing orthogonal indicators including cell surface and intracellular proteins or epigenetic markers, can further refine cell-state delineation but also introduce new analysis challenges 157 , 158 , 159 , 160 , 161 .

SC sequencing throughput is limited primarily by cost, but also by sample processing and computational capacity. For scRNA-seq, tissue samples need to be dissociated and processed immediately after collection to preserve high RNA quality 145 , 162 . SC library preparation poses a challenge to clinical sites where personnel may not necessarily be trained to handle sample preparation and specialized equipment. Sample quality and consistency are also hard to control, especially in large-scale multi-site clinical studies. The development of single-nucleus sequencing on cryopreserved or even formalin-fixed paraffin-embedded (FFPE) samples provides a potential solution to this issue, allowing clinical sites to bank biopsies for later processing 163 , 164 , 165 . This technology also makes it possible to take advantage of banked samples from previous studies. However, care should be taken when selecting technologies as each has its own limitations 166 , 167 .

An online calculator ( https://satijalab.org/howmanycells/ ) can help to determine the number of cells to be interrogated in a sample given prior assumptions on the diversity and relative composition of cells in the biology under investigation. Guidance in deciding which protocol to use or how deeply to sequence the collected cells has been provided 168 . In addition, design considerations for setting up longitudinal SC experiments have been reported 169 .
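The underlying calculation is a binomial tail probability: among n sampled cells, the number drawn from a population at frequency p follows X ~ Binomial(n, p), and one seeks the smallest n for which P(X ≥ k) exceeds the desired confidence. A brute-force sketch of this calculation (illustrative, not the calculator's own code):

```python
from math import comb

def min_cells(p, k=10, conf=0.95, n_max=200_000):
    """Smallest n such that, sampling n cells independently, at least k
    cells from a population at frequency p are captured with
    probability >= conf (X ~ Binomial(n, p))."""
    for n in range(k, n_max):
        # P(X <= k - 1), the chance of capturing fewer than k cells.
        p_fewer = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
        if 1.0 - p_fewer >= conf:
            return n
    raise ValueError("n_max too small for the requested confidence")

# Cells needed to capture >= 10 cells of a 1% population with 95% confidence:
print(min_cells(0.01, k=10, conf=0.95))
```

Because the required n scales roughly inversely with p, rare populations dominate the cell-number budget of an experiment.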

Design of SC experiments presents unique opportunities and challenges compared with bulk transcriptomics assays. On the one hand, the large number of cells profiled within an experiment allows the application of machine learning approaches that may be inappropriate for a typically powered bulk experiment. However, the results may have limited generalizability, owing to the low number of biological samples used to generate the SC data. On the other hand, compared with bulk RNA-seq, scRNA-seq is more expensive, and samples are more difficult to access and process. Bulk techniques have been optimized to deal with poor-quality RNA, frozen samples and even FFPE samples, whereas SC technology is only recently expanding beyond the use of fresh tissue. Enabling technologies, such as cryopreservation 170 or snRNA-seq 165 , are still undergoing considerable optimization. A balance between complexity and budget can be achieved by combining bulk and scRNA-seq in a single experiment. SC samples can be used to computationally deconvolute cell-type abundance from bulk samples collected using an experimental set-up that favours fewer SC samples and more bulk-sequenced samples. In addition, leveraging publicly available SC data sets can mitigate budget constraints.

Data accessibility

The current organization of public SC data generally falls short of the FAIR principles for data stewardship in several aspects 171 , in particular with respect to data accessibility. Ongoing cataloguing efforts (for example, the BROAD Single Cell Portal — https://singlecell.broadinstitute.org/single_cell , spreadsheet of data set metadata 172 ) and international collaborations to generate healthy reference databases (for example, Human Cell Landscape (HCL) 173 , Tabula Sapiens 174 — https://tabula-sapiens-portal.ds.czbiohub.org/ ) provide an initial entry point for discovery of data sets. However, none of these initiatives is comprehensive, so publication databases (for example, PubMed) and omics repositories (for example, GEO) must still be searched manually. Without uniform metadata across these databases, the search strategy must also be adapted to each resource to ensure completeness.

Within a given organization, some data are likely to be accessible only to a subset of analysts. Tracking designations that flag permissible data use, whether in the metadata or in an external system, presents barriers related to internal risk management and compliance, as well as to scientists and analysts seeking to use those data or to build on previously completed analyses. For public data sets, similar issues exist: data access might be restricted behind security portals, as in the case of dbGaP and EGA, because of privacy laws, contractual considerations or the sensitivity of human data. This is especially true for raw reads from full-transcript protocols such as Smart-Seq2 and is equally likely to be applicable to internally generated data.

Data interoperability and reusability

Most SC transcriptomics data sets from published work are made publicly available. Unfortunately, there is considerable variability in their format and layout. Digital formats for expression or count matrices (scRNA-seq) and experimental metadata are not standardized 175 . In addition, the lack of comprehensive sample metadata is a common problem. The interoperability of these data sets is therefore limited.

Moreover, the non-uniformity of data processing (including quality control (QC) and cell-type annotation) and the lack of a well-defined cell-type nomenclature (that is, either ‘flat’ or ‘shallow’ nomenclatures are used, with different levels of detail across studies) necessitate reprocessing of the data sets to interrogate them for new research questions.

Currently, the pharmaceutical industry resorts to in-house curation efforts to augment internal libraries of SC data sets with uniformly processed public entries, engages with external vendors for this service, or both (see Box  5 for an example from a company and Box  6 for general use of SC public data sets by industry). The maturity, range and type of services provided by vendors vary greatly, from project-based and ad hoc curation of a small set of data sets, to platforms that house an industrialized pipeline, SC web viewers and exploratory research environments. The extent of the curation is also highly variable: some vendors start from raw sequence reads, whereas others reuse published gene expression matrices and cell-type annotations. Another big challenge to overcome is the technical variation in SC data introduced by factors such as differences between laboratories and experimental conditions. It is crucial to handle this technical variation properly in the data integration and curation step (see Box  3 for computational tools for batch-effect correction and data integration). However, these curation approaches are expensive and time-consuming. To avoid duplication of work across companies and academic institutions, the community could benefit from collaboratively adopting and developing common standards. The academic sector has clearly paved the way by showing the value generated by creating repositories of uniformly processed and/or integrated data sets (Table  1 ).
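As a minimal illustration of why batch handling matters, the sketch below simulates an additive per-gene batch offset and removes it by centring each batch at the global mean. This naive correction assumes both batches contain the same cell composition; real integration tools such as Harmony or scVI are designed to remove technical variation while preserving genuine biological differences between data sets.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two batches profiling the same cell population; batch 2 carries an
# additive per-gene technical offset (a crude model of batch effects).
n_cells, n_genes = 100, 20
biology = rng.normal(0.0, 1.0, size=(2 * n_cells, n_genes))
offset = rng.normal(0.0, 1.0, size=n_genes)
batch = np.repeat([0, 1], n_cells)
x = biology.copy()
x[batch == 1] += 3.0 * offset

# Naive correction: centre each batch per gene, then restore the global
# mean. Valid only because both batches share the same composition.
corrected = x.copy()
for b in (0, 1):
    corrected[batch == b] -= corrected[batch == b].mean(axis=0)
corrected += x.mean(axis=0)

gap_before = np.abs(x[batch == 0].mean(0) - x[batch == 1].mean(0)).mean()
gap_after = np.abs(corrected[batch == 0].mean(0) - corrected[batch == 1].mean(0)).mean()
print(round(float(gap_before), 2), round(float(gap_after), 10))
```

When the composition assumption fails, this kind of centring removes real biology along with the batch effect, which is precisely the failure mode the dedicated integration methods guard against.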

Direct exploration of published data sets is being facilitated by both online viewers hosted by some researchers and general-purpose scRNA-seq platforms that provide more elaborate exploratory analysis capabilities. Researcher-hosted viewers are useful to quickly check the expression of a gene but do not support maximal reuse of published data sets. Even the most advanced viewers, such as Cellxgene 176 , limit the scope of interrogation to selected use cases. These viewers are not a durable resource, often relying on temporary web hosting, and are therefore most appropriate for accessing the data immediately after publication. By contrast, general-purpose platforms such as Cumulus/Pegasus, which runs on Terra.Bio 177 , provide a cloud infrastructure tailored to run scRNA-seq bioinformatics pipelines and a notebook system for exploratory analysis. The EMBL-EBI Single Cell Expression Atlas (SCEA) 178 has built a uniform pipeline for transcript quantification, quality control and cell-type annotation, and it runs on the browser-based Galaxy platform 179 . A final example, the HCA Data Coordination Platform (DCP), is a public, cloud-based platform on which scientists can share, organize and interrogate SC data.

Box 5 Harmonizing metadata across single-cell data sets

Single-cell (SC) sequencing performs unbiased profiling of individual cells and enables evaluation of rare cellular populations, often missed using bulk sequencing. However, the diversity and multiplicity of the SC data sets pose a challenge, further exacerbated when working with large data sets typically generated by complex organizations such as the Human Cell Atlas (HCA) consortium. Merging public domain SC data sets with those generated within the private sector adds another complication. As the number and scale of SC data sets increase, there is an unmet technological need to develop suitable database platforms to evaluate key biological hypotheses across this multiplicity of data sets. In addition to the absence of a common processing workflow mapping raw sequences to gene expression matrices in a uniform way, the lack of standardized metadata collection is a primary challenge.

To address this challenge, the REVEAL:SingleCell platform, built by a pharma company on top of SciDB, provides unified scientific data management and computational tools to load, store, retrieve and query multiple SC data sets 314 . Its data model accommodates FAIR access to heterogeneous, multi-attribute data as well as metadata such as ontologies and reference data sets. Multiple users can load, read and write data in a secure, transactionally safe manner. REVEAL:SingleCell provides purpose-built data schema, interfaces and task-focused functionality, using a controlled vocabulary. R and Python APIs provide direct, ad hoc access and analysis, as well as extensibility via the integration of additional library packages. A Flask REST API implements a web interface. A Shiny GUI supports data visualization and exploration by non-programmers.

The platform was applied to coronavirus disease 2019 (COVID-19) research, integrating a collection of 32 disease-related data sets available at that time (comprising 2.2 million cells in all), including public data from the HCA Census of Immune Cells data set and the COVID-19 Cell Atlas 314 . As the data sets were generated by different groups and metadata standardization was completely lacking, the company harmonized metadata for cell-type annotations, a crucial factor when performing cross-data set analysis. Harmonizing cell-type annotations (T cell, B cell, etc.) is highly desirable because they are typically captured as free text under variable field names (Cell type, CellType, etc.). To solve the lack of metadata standardization, a workflow was created that identified and captured the cell-type information for each data set in a predefined variable name (Celltype.select) and mapped it back to unique Ontobee cell ontology CL identifiers ( https://ontobee.org/ontology/CL ). This step harmonized the cell-type annotations from a free text format to controlled Ontobee CL identifiers. In parallel, raw expression data from the multiple SC studies were normalized into a common format. These expression counts, along with the harmonized metadata, were then loaded into SciDB, which allows profiling queries across data sets with user-defined thresholds of gene expression values and metadata features to select cells of interest. For example, using this platform it was found that more than 40% of gallbladder cells co-express ACE2 and TMPRSS2 and may thus be susceptible to infection by the virus. The workflow is generalizable to other metadata features such as tissues and diseases.
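A toy version of such a harmonization-and-query workflow can be sketched with pandas. The data sets, label spellings and field names below are invented for illustration; only the two Cell Ontology identifiers are real (CL:0000084, T cell; CL:0000236, B cell).

```python
import pandas as pd

# Free-text cell-type labels as found in two invented public data sets;
# field names and label spellings differ between studies.
ds1 = pd.DataFrame({"CellType": ["T cell", "b-cell", "T cell"],
                    "ACE2": [0, 5, 0], "TMPRSS2": [1, 4, 0]})
ds2 = pd.DataFrame({"cell_type": ["T-cells", "B cell"],
                    "ACE2": [3, 0], "TMPRSS2": [2, 0]})

# Free-text labels mapped to Cell Ontology identifiers
# (CL:0000084 = T cell, CL:0000236 = B cell).
cl_map = {"t cell": "CL:0000084", "t-cell": "CL:0000084",
          "b cell": "CL:0000236", "b-cell": "CL:0000236"}

def harmonize(df, label_col):
    # Capture the cell-type field under one predefined name, normalize
    # the free text, and map it to controlled ontology identifiers.
    out = df.rename(columns={label_col: "Celltype.select"})
    labels = (out["Celltype.select"].str.lower()
              .str.replace("cells", "cell", regex=False))
    out["Celltype.select"] = labels.map(cl_map)
    return out

cells = pd.concat([harmonize(ds1, "CellType"),
                   harmonize(ds2, "cell_type")], ignore_index=True)

# Cross-data set query: fraction of T cells co-expressing ACE2 and
# TMPRSS2 above a user-defined count threshold of zero.
t_cells = cells[cells["Celltype.select"] == "CL:0000084"]
frac = float(((t_cells["ACE2"] > 0) & (t_cells["TMPRSS2"] > 0)).mean())
print(frac)
```

The same pattern of renaming, normalizing and ontology-mapping generalizes to other metadata fields such as tissue or disease labels.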

Box 6 Public single-cell data in drug discovery and development

The vast array of publicly available single-cell (SC) data is crucial for the industrial use of SC technologies. Table  1 shows selected key public SC data resources of interest to pharmaceutical companies. Some of these resources originate from academic initiatives to assemble pre-existing data sets into harmonized resources and atlases. The original data sets and these secondary resources can be used to complement internal research programmes in several ways.

Access to a uniform pipeline is a first step that many companies take to ensure compatibility between internally generated and public data. Unfortunately, reprocessing of public data at each company still results in considerable duplication of effort. As with bulk RNA-seq projects (ARCHS4 (ref. 315 ), recount2 (ref. 316 ) or UCSC Toil 317 ), academic initiatives are also leading in the creation of uniform catalogues or integrated SC data sets. Sometimes this is because of an immediate need (for example, Conquer 318 created a benchmark of SC data sets to assess differential expression methods), but most initiatives were driven by the added value generated. An example is the EMBL-EBI Single Cell Expression Atlas (SCEA) 178 , which, in addition to a uniform pipeline, also provides the original author cell-type labels as well as cell ontology-matched labels.

SC atlases, such as those produced by the Human Tumour Atlas Network 46 initiative, can be used as a reference for cell-type annotation of internal research data sets (see Box  3 for relevant methods). Multimodal technologies that enrich SC transcriptomics with matched cell surface protein (for example, CITE-Seq or REAP-Seq) and/or open chromatin data, are also yielding public data sets. For instance, many CITE-Seq data sets have been generated, are publicly available and can be used to predict protein expression from internally generated single-cell RNA sequencing (scRNA-seq) experiments 319 .

Benchmarking of the many available computational methods in the SC field also benefits strongly from the availability of public data. Benchmarking is necessary to assess method performance and guide the development of best practices 320 . Synthetically generated data sets can help to assess methods, but creating such synthetic data sets is difficult 321 . Publicly available data sets can be used instead either to define the starting data for generative methods 322 or to benchmark the generated data sets, for example, in Splatter 323 . Public data sets can also be used directly in other benchmarking exercises, for example, benchmarking trajectory inference methods that rely on a synthetic and public repository of data sets 324 .

Bulk transcriptomics assays, which provide an unbiased view of the effect of a drug, are now an integral part of internal research programmes in industry. The tools to deconvolute the cellular composition of bulk RNA-seq samples need prior knowledge of the cell types present in the sample and their associated gene expression profiles or marker genes. Public scRNA-seq from matching tissues is an excellent source of this information. In addition, as recently illustrated using EcoTyper in diffuse large B cell lymphoma 31 , SC data can be used to reanalyse bulk RNA-seq from previous studies to further define cell states or classes linked to outcome. As a huge amount of public and internal bulk RNA-seq data is available, re-analysis of public data with SC data sets focusing on specific clinical questions is of interest.

Similarly, integration of SC analysis with other types of internal or public bulk assay (for example, epigenomics, proteomics, metabolomics) would also be of value. In fact, this is an emerging frontier in research, with tools such as flux analysis and others being explored. However, although relevant for research, these approaches are not yet adopted by industry.

Public data can also serve as independent cohorts to verify internal findings, and integrative methods (for example, Harmony 325 ) allow the generation of SC atlases by combining cellular spaces from several experiments, increasing the generalizability of exploratory research. This approach has been successfully applied to uncover biomarkers and improve disease understanding in lung fibrosis, when internal scRNA-seq data were combined with two public data sets with a similar experimental set-up (that is, control versus disease) 326 .

Finally, public data studies can serve as pilot experiments when performing power calculations (that is, to define the number of samples required to demonstrate predetermined effect size) and can be helpful for getting basic information related to experimental design (for example, to decide experimental protocols) 168 , 327 , 328 .

Conclusions and future perspectives

Most complex diseases for which treatment remains elusive have a multicellular aetiology, and a SC perspective could be crucial in advancing our understanding and ability to select the most therapeutically impactful cellular or molecular targets. SC protocols combined with sophisticated multiplex strategies have increased the scale and resolution at which assays can be performed. In addition, SC profiling of commonly used preclinical models enables researchers to select the model that best recapitulates essential human pathobiology. Interrogating human samples at cellular resolution can help to advance personalized medicine, by expediting the discovery of new biomarkers to help stratify patients on the basis of prognosis or prediction of treatment effect. A longitudinal SC view on diseased tissues during treatment can also provide physicians with a more direct and mechanistic view on response to treatment.

With the more mature scRNA-seq-based methods established for routine use in industry, effort is increasingly focused on adopting other methods, such as SC proteomics and spatial omics technologies, as industrial SC capabilities are expanded. As the core technologies become standardized, the requisite skills become more widely available and the costs fall, the rate of SC data generation is likely to continue to accelerate 180 , 181 .

As the technical challenges involved in SC data generation, curation and access are addressed, new opportunities are emerging. For example, upstream of target discovery, the focus is already shifting from the discovery of novel cell types and cellular marker genes towards hypothesis generation rooted in deeper understanding of cellular mechanisms. The integration of additional data types supports this shift as omics and other multiparametric data enhance the granularity of insight into the cellular environment. For example, mapping genetic cues on disease provided by GWAS on SC profiles from scRNA-seq experiments can help to elucidate cellular phenotypes linked to complex diseases 81 , 182 .

With the increasing maturity of spatial profiling technologies, we are beginning to better understand human tissue organization and microenvironment niches. Spatial profiling enables cell types to be accurately counted and localized within the broader tissue architecture. In addition, it facilitates the mapping of intricate auto- and paracrine interactions between cell types within a tissue. However, the resolution of the most unbiased and comprehensive approaches (for example, 10x Visium) remains supracellular. We expect that such approaches will evolve to provide SC resolution, and thus complement and extend the pipeline of methods applicable to intercellular interaction discovery from scRNA-seq (for example, CellPhoneDB 183 ). Moreover, advances in spatial profiling are converging with recent progress in digital pathology. Combined with automated feature extraction and molecular classification of digitized pathology images via deep learning techniques 184 , orthogonal informational cues assayed via sequencing or multiplex imaging technologies will enable researchers to develop a deeper knowledge of the complex biology involved in some diseases.
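The logic behind ligand–receptor-based intercellular interaction discovery can be sketched very simply: in the spirit of tools such as CellPhoneDB, an interaction between a sender and a receiver cell type is scored from the paired expression of a ligand and its receptor, and retained only when both partners are expressed. The pairs, expression values and threshold below are all hypothetical, and actual tools additionally use curated ligand–receptor databases and permutation-based significance testing.

```python
# Minimal sketch of ligand-receptor interaction scoring between two cell
# types. All gene names, values and the min_expr threshold are hypothetical.

lr_pairs = [("LIG1", "REC1"), ("LIG2", "REC2")]

# Hypothetical mean expression per cell type.
mean_expr = {
    "fibroblast": {"LIG1": 1.5, "LIG2": 0.0, "REC1": 0.25, "REC2": 0.5},
    "t_cell":     {"LIG1": 0.0, "LIG2": 0.75, "REC1": 1.25, "REC2": 0.0},
}

def interaction_scores(sender, receiver, pairs, expr, min_expr=0.1):
    """Score each ligand-receptor pair as the mean of ligand expression in
    the sender and receptor expression in the receiver; drop pairs in which
    either partner falls below the expression threshold."""
    scores = {}
    for lig, rec in pairs:
        l = expr[sender].get(lig, 0.0)
        r = expr[receiver].get(rec, 0.0)
        if l >= min_expr and r >= min_expr:
            scores[(lig, rec)] = (l + r) / 2
    return scores

print(interaction_scores("fibroblast", "t_cell", lr_pairs, mean_expr))
# {('LIG1', 'REC1'): 1.375}
```

Here only the LIG1–REC1 axis survives the co-expression filter, illustrating how such scoring prunes the space of candidate cell–cell communication channels before statistical testing.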

Given the enormous technical, computational and scientific complexities involved in generating SC data and translating those data into benefits for patients, collaboration has a key role. This is clearly demonstrated by the Accelerating Medicines Partnership and LifeTime initiatives, and by the rapid growth of SC research around SARS-CoV-2 (ref. 185 ). LifeTime established a special task force to study COVID-19 and to identify SC-based biomarkers and novel modalities; here, the HCA and LifeTime created a common framework for sharing knowledge, data, tools and other resources. As the scale and complexity of SC data grow and our understanding of human biology continues to deepen, collaborative efforts between academia and industry will be increasingly vital to realize the transformational potential of SC technologies.

DiMasi, J. A., Grabowski, H. G. & Hansen, R. W. Innovation in the pharmaceutical industry: new estimates of R&D costs. J. Health Econ. 47 , 20–33 (2016).

Wouters, O. J., McKee, M. & Luyten, J. Estimated research and development investment needed to bring a new medicine to market, 2009–2018. JAMA 323 , 844–853 (2020).

Paul, S. M. et al. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat. Rev. Drug Discov. 9 , 203–214 (2010).

Nelson, M. R. et al. The support of human genetic evidence for approved drug indications. Nat. Genet. 47 , 856–860 (2015).

1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491 , 56–65 (2012).

Sernoskie, S. C., Jee, A. & Uetrecht, J. P. The emerging role of the innate immune response in idiosyncratic drug reactions. Pharmacol. Rev. 73 , 861–896 (2021).

Heid, C. A., Stevens, J., Livak, K. J. & Williams, P. M. Real time quantitative PCR. Genome Res. 6 , 986–994 (1996).

Cheung, R. K. & Utz, P. J. CyTOF — the next generation of cell detection. Nat. Rev. Rheumatol. 7 , 502–503 (2011).

Bendall, S. C. et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science 332 , 687–696 (2011).

Nassar, A. F., Ogura, H. & Wisnewski, A. V. Impact of recent innovations in the use of mass cytometry in support of drug development. Drug Discov. Today 20 , 1169–1175 (2015).

Wen, L. & Tang, F. Recent advances in single-cell sequencing technologies. Precis. Clin. Med. 5 , pbac002 (2022).

Jovic, D. et al. Single‐cell RNA sequencing technologies and applications: a brief overview. Clin. Transl. Med. 12 , e694 (2022).

Kashima, Y. et al. Single-cell sequencing techniques from individual to multiomics analyses. Exp. Mol. Med. 52 , 1419–1427 (2020).

Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13 , 599–604 (2018).

Aldridge, S. & Teichmann, S. A. Single cell transcriptomics comes of age. Nat. Commun. 11 , 4307 (2020).

Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6 , 377–382 (2009). Successful attempt to sequence the full transcriptome of a single cell in an unbiased way.

Navin, N. E., Rozenblatt-Rosen, O. & Zhang, N. R. New frontiers in single-cell genomics. Genome Res. 31 , ix–x (2021).

Zilionis, R. et al. Single-cell transcriptomics of human and mouse lung cancers reveals conserved myeloid populations across individuals and species. Immunity 50 , 1317–1334.e10 (2019). A detailed study correlating immune cell populations in mouse and human lung cancer.

Sade-Feldman, M. et al. Defining T cell states associated with response to checkpoint immunotherapy in melanoma. Cell 175 , 998–1013.e20 (2018). Illustration of how scRNA-seq approaches can be used to identify new predictive biomarkers for the response or resistance to ICI therapies in cancer.

Jang, J. S. et al. Molecular signatures of multiple myeloma progression through single cell RNA-Seq. Blood Cancer J. 9 , 2 (2019).

Tanaka, N. et al. Single-cell RNA-seq analysis reveals the platinum resistance gene COX7B and the surrogate marker CD63. Cancer Med. 7 , 6193–6204 (2018).

Jerby-Arnon, L. et al. A cancer cell program promotes T cell exclusion and resistance to checkpoint blockade. Cell 175 , 984–997.e24 (2018). This work demonstrates the utility of scRNA-seq for the identification of an immune resistance programme associated with T cell exclusion and immune evasion. It also provides new therapeutic approaches to overcome resistance to ICI.

Cohen, Y. C. et al. Identification of resistance pathways and therapeutic targets in relapsed multiple myeloma patients through single-cell sequencing. Nat. Med. 27 , 491–503 (2021).

Villani, A.-C. et al. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science 356 , eaah4573 (2017).

Park, J.-E. et al. A cell atlas of human thymic development defines T cell repertoire formation. Science 367 , eaay3224 (2020).

GTEx Consortium. Landscape of X chromosome inactivation across human tissues. Nature 550 , 244–248 (2017).

Ramachandran, P. et al. Resolving the fibrotic niche of human liver cirrhosis at single-cell level. Nature 575 , 512–518 (2019).

Song, H. et al. Single-cell analysis of human primary prostate cancer reveals the heterogeneity of tumor-associated epithelial cell states. Nat. Commun. 13 , 141 (2022).

Wang, Q. et al. Single-cell chromatin accessibility landscape in kidney identifies additional cell-of-origin in heterogenous papillary renal cell carcinoma. Nat. Commun. 13 , 31 (2022).

Nowicki-Osuch, K. et al. Molecular phenotyping reveals the identity of Barrett’s esophagus and its malignant transition. Science 373 , 760–767 (2021). Illustrative example of how SC studies can help to understand tumorigenesis.

Steen, C. B. et al. The landscape of tumor cell states and ecosystems in diffuse large B cell lymphoma. Cancer Cell 39 , 1422–1437.e10 (2021).

Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53 , 1334–1347 (2021).

Zhang, X. et al. Dissecting esophageal squamous-cell carcinoma ecosystem by single-cell transcriptomic analysis. Nat. Commun. 12 , 5291 (2021).

Pu, W. et al. Single-cell transcriptomic analysis of the tumor ecosystems underlying initiation and progression of papillary thyroid carcinoma. Nat. Commun. 12 , 6058 (2021).

Ursu, O. et al. Massively parallel phenotyping of coding variants in cancer with Perturb-seq. Nat. Biotechnol. 40 , 896–905 (2022). High-throughput analysis of oncogene and tumour suppressor variant phenotypes at single-cell level.

Chaligne, R. et al. Epigenetic encoding, heritability and plasticity of glioma transcriptional cell states. Nat. Genet. 53 , 1469–1479 (2021).

Johnson, K. C. et al. Single-cell multimodal glioma analyses identify epigenetic regulators of cellular plasticity and environmental stress response. Nat. Genet. 53 , 1456–1468 (2021).

Croucher, D. C. et al. Longitudinal single-cell analysis of a myeloma mouse model identifies subclonal molecular programs associated with progression. Nat. Commun. 12 , 6322 (2021).

Salehi, S. et al. Clonal fitness inferred from time-series modelling of single-cell cancer genomes. Nature 595 , 585–590 (2021). SC-based study showing how TP53 mutations alter tumour clonal fitness in TNBC and the impact on resistance to cisplatin chemotherapy.

Quinn, J. J. et al. Single-cell lineages reveal the rates, routes, and drivers of metastasis in cancer xenografts. Science 371 , eabc1944 (2021).

Yaddanapudi, K. et al. Single-cell immune mapping of melanoma sentinel lymph nodes reveals an actionable immunotolerant microenvironment. Clin. Cancer Res. 28 , 2069–2081 (2022).

Lund, A. W. Standing watch: immune activation and failure in melanoma sentinel lymph nodes. Clin. Cancer Res. 28 , 1996–1998 (2022).

Li, J. et al. Single-cell characterization of the cellular landscape of acral melanoma identifies novel targets for immunotherapy. Clin. Cancer Res. 28 , 2131–2146 (2022).

Sun, Y.-F. et al. Dissecting spatial heterogeneity and the immune-evasion mechanism of CTCs by single-cell RNA-seq in hepatocellular carcinoma. Nat. Commun. 12 , 4091 (2021).

Diamantopoulou, Z. et al. The metastatic spread of breast cancer accelerates during sleep. Nature 607 , 156–162 (2022).

Rozenblatt-Rosen, O. et al. The Human Tumor Atlas Network: charting tumor transitions across space and time at single-cell resolution. Cell 181 , 236–249 (2020). Description of the goals of the Human Tumor Atlas Network project — building a SC and spatially resolved pan-cancer atlas also covering the dynamics from cancer initiation to metastasis.

Becker, W. R. et al. Single-cell analyses define a continuum of cell state and composition changes in the malignant transformation of polyps to colorectal cancer. Nat. Genet. 54 , 985–995 (2022).

Arenas, E. Parkinson’s disease in the single-cell era. Nat. Neurosci. 25 , 536–538 (2022).

Kamath, T. et al. Single-cell genomic profiling of human dopamine neurons identifies a population that selectively degenerates in Parkinson’s disease. Nat. Neurosci. 25 , 588–595 (2022). Identification and characterization of a dopamine neuron subpopulation that selectively degenerates in Parkinson disease.

Miller, M. B. et al. Somatic genomic changes in single Alzheimer’s disease neurons. Nature 604 , 714–722 (2022).

Keren-Shaul, H. et al. A unique microglia type associated with restricting development of Alzheimer’s disease. Cell 169 , 1276–1290.e17 (2017). Identification and characterization of a disease-associated microglia population in Alzheimer disease.

Wang, P. et al. Single-cell transcriptome and TCR profiling reveal activated and expanded T cell populations in Parkinson’s disease. Cell Discov. 7 , 52 (2021).

Cadwell, C. R. et al. Electrophysiological, transcriptomic and morphologic profiling of single neurons using Patch-seq. Nat. Biotechnol. 34 , 199–203 (2016).

Fuzik, J. et al. Integration of electrophysiological recordings with single-cell RNA-seq data identifies neuronal subtypes. Nat. Biotechnol. 34 , 175–183 (2016).

Yang, A. C. et al. A human brain vascular atlas reveals diverse mediators of Alzheimer’s risk. Nature 603 , 885–892 (2022).

Berg, J. et al. Human neocortical expansion involves glutamatergic neuron diversification. Nature 598 , 151–158 (2021).

Simone, D. et al. Single cell analysis of spondyloarthritis regulatory T cells identifies distinct synovial gene expression patterns and clonal fates. Commun. Biol. 4 , 1395 (2021).

Penkava, F. et al. Single-cell sequencing reveals clonal expansions of pro-inflammatory synovial CD8 T cells expressing tissue-homing receptors in psoriatic arthritis. Nat. Commun. 11 , 4767 (2020).

Wu, X. et al. Single-cell sequencing of immune cells from anticitrullinated peptide antibody positive and negative rheumatoid arthritis. Nat. Commun. 12 , 4977 (2021).

Liu, Y. et al. Classification of human chronic inflammatory skin disease based on single-cell immune profiling. Sci. Immunol. 7 , eabl9165 (2022).

Ingelfinger, F. et al. Twin study reveals non-heritable immune perturbations in multiple sclerosis. Nature 603 , 152–158 (2022).

Bjornevik, K. et al. Longitudinal analysis reveals high prevalence of Epstein-Barr virus associated with multiple sclerosis. Science 375 , 296–301 (2022).

Lanz, T. V. et al. Clonally expanded B cells in multiple sclerosis bind EBV EBNA1 and GlialCAM. Nature 603 , 321–327 (2022).

Nathan, A. et al. Single-cell eQTL models reveal dynamic T cell state dependence of disease loci. Nature 606 , 120–128 (2022). Describes the discovery of cell-state-specific and dynamic eQTL patterns in human memory T cells revealing new eQTL associations for non-coding variants linked to disease.

Ma, K.-Y. et al. High-throughput and high-dimensional single-cell analysis of antigen-specific CD8+ T cells. Nat. Immunol. 22 , 1590–1598 (2021).

Wauters, E. et al. Discriminating mild from critical COVID-19 by innate and adaptive immune single-cell profiling of bronchoalveolar lavages. Cell Res. 31 , 272–290 (2021).

Stephenson, E. et al. Single-cell multi-omics analysis of the immune response in COVID-19. Nat. Med. 27 , 904–916 (2021).

Lee, J. W. et al. Integrated analysis of plasma and single immune cells uncovers metabolic changes in individuals with COVID-19. Nat. Biotechnol. 40 , 110–120 (2022).

Georg, P. et al. Complement activation induces excessive T cell cytotoxicity in severe COVID-19. Cell 185 , 493–512.e25 (2022).

Wang, S. et al. A single-cell transcriptomic landscape of the lungs of patients with COVID-19. Nat. Cell Biol. 23 , 1314–1328 (2021). Study using SC sequencing to better understand severe COVID-19.

Delorey, T. M. et al. COVID-19 tissue atlases reveal SARS-CoV-2 pathology and cellular targets. Nature 595 , 107–113 (2021). Study using SC sequencing to better understand severe COVID-19.

Tian, Y. et al. Single-cell immunology of SARS-CoV-2 infection. Nat. Biotechnol. 40 , 30–41 (2022).

Dar, D., Dar, N., Cai, L. & Newman, D. K. Spatial transcriptomics of planktonic and sessile bacterial populations at single-cell resolution. Science 373 , eabi4882 (2021).

Gideon, H. P. et al. Multimodal profiling of lung granulomas in macaques reveals cellular correlates of tuberculosis control. Immunity 55 , 827–846.e10 (2022).

Abdelfattah, N. et al. Single-cell analysis of human glioma and immune cells identifies S100A4 as an immunotherapy target. Nat. Commun. 13 , 767 (2022).

Lareau, C. A., Parker, K. R. & Satpathy, A. T. Charting the tumor antigen maps drawn by single-cell genomics. Cancer Cell 39 , 1553–1557 (2021).

Gladka, M. M. et al. Single-cell sequencing of the healthy and diseased heart reveals cytoskeleton-associated protein 4 as a new modulator of fibroblasts activation. Circulation 138 , 166–180 (2018). Illustrative example of how SC approaches can help to identify candidate targets. Here, CKAP4 for cardiac fibrosis.

Kuppe, C. et al. Decoding myofibroblast origins in human kidney fibrosis. Nature 589 , 281–286 (2021).

Li, Z. et al. Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen. Nat. Commun. 12 , 6386 (2021).

Cano-Gamez, E. & Trynka, G. From GWAS to function: using functional genomics to identify the mechanisms underlying complex diseases. Front. Genet. 11 , 424 (2020).

Jagadeesh, K. A. et al. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat. Genet. 54 , 1479–1492 (2022). The method scLinker combines GWAS summary statistics with scRNA-seq data sets and thereby enables the discovery of cell types (and biological processes) linked to disease.

Muslu, O., Hoyt, C. T., Lacerda, M., Hofmann-Apitius, M. & Frohlich, H. Guiltytargets: prioritization of novel therapeutic targets with network representation learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 19 , 491–500 (2022).

Gawel, D. R. et al. A validated single-cell-based strategy to identify diagnostic and therapeutic targets in complex diseases. Genome Med. 11 , 47 (2019).

Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167 , 1853–1866.e17 (2016). Technique for pooled CRISPR screening with scRNA-seq readouts.

Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167 , 1867–1882.e21 (2016).

Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14 , 297–301 (2017).

Shifrut, E. et al. Genome-wide CRISPR screens in primary human T cells reveal key regulators of immune function. Cell 175 , 1958–1971.e15 (2018).

Jin, X. et al. In vivo Perturb-Seq reveals neuronal and glial abnormalities associated with autism risk genes. Science 370 , eaaz6063 (2020).

Lazo, J. S. et al. Credentialing and pharmacologically targeting PTP4A3 phosphatase as a molecular target for ovarian cancer. Biomolecules 11 , 969 (2021).

Wang, W. et al. MAPK4 promotes triple negative breast cancer growth and reduces tumor sensitivity to PI3K blockade. Nat. Commun. 13 , 245 (2022).

Wang, P.-X. et al. Targeting CASP8 and FADD-like apoptosis regulator ameliorates nonalcoholic steatohepatitis in mice and nonhuman primates. Nat. Med. 23 , 439–449 (2017).

Bertin, S. et al. Dual-specificity phosphatase 6 regulates CD4+ T-cell functions and restrains spontaneous colitis in IL-10-deficient mice. Mucosal Immunol. 8 , 505–515 (2015).

Ruan, J.-W. et al. Dual-specificity phosphatase 6 deficiency regulates gut microbiome and transcriptome response against diet-induced obesity in mice. Nat. Microbiol. 2 , 16220 (2016).

Chang, C.-S. et al. Single-cell RNA sequencing uncovers the individual alteration of intestinal mucosal immunocytes in Dusp6 knockout mice. iScience 25 , 103738 (2022).

Llewellyn, H. P. et al. T cells and monocyte-derived myeloid cells mediate immunotherapy-related hepatitis in a mouse model. J. Hepatol. 75 , 1083–1095 (2021).

Chen, S.-H. et al. Dual checkpoint blockade of CD47 and PD-L1 using an affinity-tuned bispecific antibody maximizes antitumor immunity. J. Immunother. Cancer 9 , e003464 (2021).

Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16 , 409–412 (2019).

Katzenelenbogen, Y. et al. Coupled scRNA-Seq and intracellular protein activity reveal an immunosuppressive role of TREM2 in cancer. Cell 182 , 872–885.e19 (2020).

Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53 , 332–341 (2021).

Schütte, M. et al. Molecular dissection of colorectal cancer in pre-clinical models identifies biomarkers predicting sensitivity to EGFR inhibitors. Nat. Commun. 8 , 14262 (2017).

Kinker, G. S. et al. Pan-cancer single-cell RNA-seq identifies recurring programs of cellular heterogeneity. Nat. Genet. 52 , 1208–1218 (2020).

Mead, B. E. et al. Screening for modulators of the cellular composition of gut epithelia via organoid models of intestinal stem cell differentiation. Nat. Biomed. Eng. 6 , 476–494 (2022).

Bock, C. et al. The organoid cell atlas. Nat. Biotechnol. 39 , 13–17 (2021).

Shinozawa, T. et al. High-fidelity drug-induced liver injury screen using human pluripotent stem cell-derived organoids. Gastroenterology 160 , 831–846.e10 (2021). Characterization of organoid preclinical models for liver injury drug screening using scRNA-seq.

Krieger, T. G. et al. Single-cell analysis of patient-derived PDAC organoids reveals cell state heterogeneity and a conserved developmental hierarchy. Nat. Commun. 12 , 5826 (2021).

Bondoc, A. et al. Identification of distinct tumor cell populations and key genetic mechanisms through single cell sequencing in hepatoblastoma. Commun. Biol. 4 , 1049 (2021).

Kim, K.-T. et al. Single-cell mRNA sequencing identifies subclonal heterogeneity in anti-cancer drug responses of lung adenocarcinoma cells. Genome Biol. 16 , 127 (2015).

Hosein, A. N. et al. Cellular heterogeneity during mouse pancreatic ductal adenocarcinoma progression at single-cell resolution. JCI Insight 5 , 129212 (2019).

Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562 , 367–372 (2018). The Tabula Muris project generated a SC multi-tissue atlas at SC resolution for the frequently used Mus musculus animal model in preclinical research.

Kumar, M. P. et al. Analysis of single-cell RNA-Seq identifies cell-cell communication associated with tumor characteristics. Cell Rep. 25 , 1458–1468.e4 (2018).

Taukulis, I. A. et al. Single-cell RNA-Seq of cisplatin-treated adult stria vascularis identifies cell type-specific regulatory networks and novel therapeutic gene targets. Front. Mol. Neurosci. 14 , 718241 (2021). Illustrative example of how SC approaches can be used to explain toxic undesirable effects of therapies.

Yofe, I., Dahan, R. & Amit, I. Single-cell genomic approaches for developing the next generation of immunotherapies. Nat. Med. 26 , 171–177 (2020).

McFarland, J. M. et al. Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action. Nat. Commun. 11 , 4296 (2020).

Shin, D., Lee, W., Lee, J. H. & Bang, D. Multiplexed single-cell RNA-seq via transient barcoding for simultaneous expression profiling of various drug perturbations. Sci. Adv. 5 , eaav2249 (2019).

Srivatsan, S. R. et al. Massively multiplex chemical transcriptomics at single-cell resolution. Science 367 , 45–51 (2020). Illustration of how a high-content screening method that uses scRNA-seq as readout can provide new hints on HDAC inhibitor MoA in cancer.

Ji, Y., Lotfollahi, M., Wolf, F. A. & Theis, F. J. Machine learning for perturbational single-cell omics. Cell Syst. 12 , 522–537 (2021).

Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16 , 715–721 (2019).

Lotfollahi, M. et al. Learning interpretable cellular responses to complex perturbations in high-throughput screens. Preprint at  bioRxiv   https://doi.org/10.1101/2021.04.14.439903 (2021).

Brewer, R. C. et al. BNT162b2 vaccine induces divergent B cell responses to SARS-CoV-2 S1 and S2. Nat. Immunol. 23 , 33–39 (2022).

Andreano, E. et al. Hybrid immunity improves B cells and antibodies against SARS-CoV-2 variants. Nature 600 , 530–535 (2021).

Hall, V. et al. Protection against SARS-CoV-2 after Covid-19 vaccination and previous infection. N. Engl. J. Med. 386 , 1207–1220 (2022).

RECOVERY Collaborative Group. Dexamethasone in hospitalized patients with Covid-19. N. Engl. J. Med. 384 , 693–704 (2021).

Sinha, S. et al. Dexamethasone modulates immature neutrophils and interferon programming in severe COVID-19. Nat. Med. 28 , 201–211 (2022).

Aissa, A. F. et al. Single-cell transcriptional changes associated with drug tolerance and response to combination therapies in cancer. Nat. Commun. 12 , 1628 (2021).

Guinney, J. et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 21 , 1350–1356 (2015).

Mehrvarz Sarshekeh, A. et al. Consensus molecular subtype (CMS) as a novel integral biomarker in colorectal cancer: a phase II trial of bintrafusp alfa in CMS4 metastatic CRC. JCO 38 , 4084–4084 (2020).

Khaliq, A. M. et al. Refining colorectal cancer classification and clinical stratification through a single-cell atlas. Genome Biol. 23 , 113 (2022).

Joanito, I. et al. Single-cell and bulk transcriptome sequencing identifies two epithelial tumor cell states and refines the consensus molecular classification of colorectal cancer. Nat. Genet. 54 , 963–975 (2022). Novel classification of CRC for biomarker prognosis proposed by using SC approaches and the tumour environment.

Litchfield, K. et al. Meta-analysis of tumor- and T cell-intrinsic mechanisms of sensitization to checkpoint inhibition. Cell 184 , 596–614.e14 (2021).

Li, H., van der Merwe, P. A. & Sivakumar, S. Biomarkers of response to PD-1 pathway blockade. Br. J. Cancer 126 , 1663–1675 (2022).

Leader, A. M. et al. Single-cell analysis of human non-small cell lung cancer lesions refines tumor classification and patient stratification. Cancer Cell 39 , 1594–1609.e12 (2021).

Xiong, D., Wang, Y. & You, M. A gene expression signature of TREM2hi macrophages and γδ T cells predicts immunotherapy response. Nat. Commun. 11 , 5084 (2020).

Kieffer, Y. et al. Single-cell analysis reveals fibroblast clusters linked to immunotherapy resistance in cancer. Cancer Discov. 10 , 1330–1351 (2020).

Dominguez, C. X. et al. Single-cell RNA sequencing reveals stromal evolution into LRRC15+ myofibroblasts as a determinant of patient response to cancer immunotherapy. Cancer Discov. 10 , 232–253 (2020).

Guo, X. et al. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat. Med. 24 , 978–985 (2018).

Zheng, C. et al. Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing. Cell 169 , 1342–1356.e16 (2017).

Pittet, M. J., Michielin, O. & Migliorini, D. Clinical relevance of tumour-associated macrophages. Nat. Rev. Clin. Oncol. 19 , 402–421 (2022).

Färkkilä, A. et al. Immunogenomic profiling determines responses to combined PARP and PD-1 inhibition in ovarian cancer. Nat. Commun. 11 , 1459 (2020).

Jansen, C. S. et al. An intra-tumoral niche maintains and differentiates stem-like CD8 T cells. Nature 576 , 465–470 (2019).

Vanhersecke, L. et al. Mature tertiary lymphoid structures predict immune checkpoint inhibitor efficacy in solid tumors independently of PD-L1 expression. Nat. Cancer 2 , 794–802 (2021).

Zhang, K. et al. Longitudinal single-cell RNA-seq analysis reveals stress-promoted chemoresistance in metastatic ovarian cancer. Sci. Adv. 8 , eabm1831 (2022).

Candelli, T. et al. Identification and characterization of relapse-initiating cells in MLL-rearranged infant ALL by single-cell transcriptomics. Leukemia 36 , 58–67 (2022).

Pieters, R. et al. A treatment protocol for infants younger than 1 year with acute lymphoblastic leukaemia (Interfant-99): an observational study and a multicentre randomised trial. Lancet 370 , 240–250 (2007).

Martin, J. C. et al. Single-cell analysis of Crohn’s disease lesions identifies a pathogenic cellular module associated with resistance to anti-TNF therapy. Cell 178 , 1493–1508.e20 (2019).

Smillie, C. S. et al. Intra- and inter-cellular rewiring of the human colon during ulcerative colitis. Cell 178 , 714–730.e22 (2019).

Wang, Z. et al. Single-cell RNA sequencing of peripheral blood mononuclear cells from acute Kawasaki disease patients. Nat. Commun. 12 , 5444 (2021).

Zhang, Y. et al. Single-cell analyses of renal cell cancers reveal insights into tumor microenvironment, cell of origin, and therapy response. Proc. Natl Acad. Sci. USA 118 , e2103240118 (2021).

Tyner, J. W. et al. Functional genomic landscape of acute myeloid leukaemia. Nature 562 , 526–531 (2018).

Schuurhuis, G. J. et al. Minimal/measurable residual disease in AML: a consensus document from the European LeukemiaNet MRD Working Party. Blood 131 , 1275–1291 (2018).

Ediriwickrema, A. et al. Single-cell mutational profiling enhances the clinical evaluation of AML MRD. Blood Adv. 4 , 943–952 (2020). Minimal residual disease in acute myeloid leukaemia can be better assessed by using SC mutational profiling.

Oren, Y. et al. Cycling cancer persister cells arise from lineages with distinct programs. Nature 596 , 576–582 (2021). Shows that SC approaches are key for the identification of cancer persister cells induced in response to treatment.

Kim, C. et al. Chemoresistance evolution in triple-negative breast cancer delineated by single-cell sequencing. Cell 173 , 879–893.e13 (2018).

Yost, K. E. et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nat. Med. 25 , 1251–1259 (2019).

Zhang, Y. et al. Single-cell analyses reveal key immune cell subsets associated with response to PD-L1 blockade in triple-negative breast cancer. Cancer Cell 39 , 1578–1593.e8 (2021).

Bassez, A. et al. A single-cell map of intratumoral changes during anti-PD1 treatment of patients with breast cancer. Nat. Med. 27 , 820–832 (2021).

Wu, T. D. et al. Peripheral T cell expansion predicts tumour infiltration and clinical response. Nature 579 , 274–278 (2020).

Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14 , 865–868 (2017). Explains the CITE-seq technique, which enables researchers to simultaneously assess the full transcriptome at SC resolution with the protein expression of selected cell surface markers.

Peterson, V. M. et al. Multiplexed quantification of proteins and transcripts in single cells. Nat. Biotechnol. 35 , 936–939 (2017).

Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37 , 1452–1457 (2019).

Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183 , 1103–1116.e20 (2020).

Clark, S. J. et al. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat. Commun. 9 , 781 (2018).

Ren, X. et al. COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas. Cell 184 , 1895–1913.e19 (2021).

Mathys, H. et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570 , 332–337 (2019).

Melms, J. C. et al. A molecular single-cell lung atlas of lethal COVID-19. Nature 595 , 114–119 (2021).

Ding, J. et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat. Biotechnol. 38 , 737–746 (2020).

Thrupp, N. et al. Single-nucleus RNA-Seq is not suitable for detection of microglial activation genes in humans. Cell Rep. 32 , 108189 (2020).

Der, E. et al. Tubular cell and keratinocyte single-cell transcriptomics applied to lupus nephritis reveal type I IFN and fibrosis relevant pathways. Nat. Immunol. 20 , 915–927 (2019).

Haque, A., Engel, J., Teichmann, S. A. & Lönnberg, T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Med. 9 , 75 (2017).

Ding, J., Sharon, N. & Bar-Joseph, Z. Temporal modelling using single-cell transcriptomics. Nat. Rev. Genet. 23 , 355–368 (2022). An excellent review on how to design and analyse SC time-series experiments.

Guillaumet-Adkins, A. et al. Single-cell transcriptome conservation in cryopreserved cells and tissues. Genome Biol. 18 , 45 (2017).

Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3 , 160018 (2016).

Svensson, V., da Veiga Beltrame, E. & Pachter, L. A curated database reveals trends in single-cell transcriptomics. Database 2020 , baaa073 (2020).

Han, X. et al. Construction of a human cell landscape at single-cell level. Nature 581 , 303–309 (2020).

Tabula Sapiens Consortium. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376 , eabl4896 (2022). The Tabula Sapiens consortium created and publicly released a multi-tissue transcriptome SC atlas covering 15 human donors.

Füllgrabe, A. et al. Guidelines for reporting single-cell RNA-seq experiments. Nat. Biotechnol. 38 , 1384–1386 (2020).

Megill, C. et al. cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices. Preprint at bioRxiv https://doi.org/10.1101/2021.04.05.438318 (2021).

Li, B. et al. Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq. Nat. Methods 17 , 793–798 (2020).

Papatheodorou, I. et al. Expression Atlas update: from tissues to single cells. Nucleic Acids Res. 48 , D77–D83 (2020). EMBL-EBI SCEA is a valuable public SC resource used by industry.


Moreno, P. et al. User-friendly, scalable tools and workflows for single-cell RNA-seq analysis. Nat. Methods 18 , 327–328 (2021).

Lähnemann, D. et al. Eleven grand challenges in single-cell data science. Genome Biol. 21 , 31 (2020).

Angerer, P. et al. Single cells make big data: new challenges and opportunities in transcriptomics. Curr. Opin. Syst. Biol. 4 , 85–91 (2017).

Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54 , 1572–1580 (2022).

Efremova, M., Vento-Tormo, M., Teichmann, S. A. & Vento-Tormo, R. CellPhoneDB: inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes. Nat. Protoc. 15 , 1484–1506 (2020).

Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1 , 800–810 (2020).

Warnat-Herresthal, S. et al. Swarm Learning as a privacy-preserving machine learning approach for disease classification. Preprint at  bioRxiv https://doi.org/10.1101/2020.06.25.171009 (2020).

Regev, A. et al. The human cell atlas. eLife 6 , e27041 (2017). Clearly explains the idea and goals of the HCA project.

Han, L. et al. Cell transcriptomic atlas of the non-human primate Macaca fascicularis . Nature 604 , 723–731 (2022).

Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184 , 3573–3587.e29 (2021).

Domínguez Conde, C. et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376 , eabl5197 (2022).

Qian, J. et al. A pan-cancer blueprint of the heterogeneous tumor microenvironment revealed by single-cell profiling. Cell Res. 30 , 745–762 (2020).

Zheng, L. et al. Pan-cancer single-cell landscape of tumor-infiltrating T cells. Science 374 , abe6474 (2021).

Sun, D. et al. TISCH: a comprehensive web resource enabling interactive single-cell transcriptome visualization of tumor microenvironment. Nucleic Acids Res. 49 , D1420–D1430 (2021).

Nieto, P. et al. A single-cell tumor immune atlas for precision oncology. Genome Res. 31 , 1913–1926 (2021).

Zhang, F. et al. Defining inflammatory cell states in rheumatoid arthritis joint synovial tissues by integrating single-cell transcriptomics and mass cytometry. Nat. Immunol. 20 , 928–942 (2019).

Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566 , 496–502 (2019).

Cusanovich, D. A. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell 174 , 1309–1324.e18 (2018).

Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357 , 661–667 (2017).

Cusanovich, D. A. et al. The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555 , 538–542 (2018).

Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184 , 5985–6001.e19 (2021).

Cheng, J., Liao, J., Shao, X., Lu, X. & Fan, X. Multiplexing methods for simultaneous large‐scale transcriptomic profiling of samples at single‐cell resolution. Adv. Sci. 8 , 2101229 (2021).


Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10 , 1096–1098 (2013).

Hwang, B., Lee, J. H. & Bang, D. Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp. Mol. Med. 50 , 1–14 (2018).

Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29 , 15–21 (2013).

Kaminow, B., Yunusov, D. & Dobin, A. STARsolo: accurate, fast and versatile mapping/quantification of single-cell and single-nucleus RNA-seq data. Preprint at  bioRxiv https://doi.org/10.1101/2021.05.05.442755 (2021).

Srivastava, A., Malik, L., Smith, T., Sudbery, I. & Patro, R. Alevin efficiently estimates accurate gene abundances from dscRNA-seq data. Genome Biol. 20 , 65 (2019).

Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34 , 525–527 (2016).

Melsted, P., Ntranos, V. & Pachter, L. The barcode, UMI, set format and BUStools. Bioinformatics 35 , 4472–4473 (2019).

Lun, A. T. L. et al. EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biol. 20 , 63 (2019).

Muskovic, W. & Powell, J. E. DropletQC: improved identification of empty droplets and damaged cells in single-cell RNA-seq data. Genome Biol. 22 , 329 (2021).

Yang, S. et al. Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol. 21 , 57 (2020).

Young, M. D. & Behjati, S. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. GigaScience 9 , giaa151 (2020).

Wolock, S. L., Lopez, R. & Klein, A. M. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 8 , 281–291.e9 (2019).

McGinnis, C. S., Murrow, L. M. & Gartner, Z. J. DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. Cell Syst. 8 , 329–337.e4 (2019).

DePasquale, E. A. K. et al. DoubletDecon: deconvoluting doublets from single-cell RNA-sequencing data. Cell Rep. 29 , 1718–1727.e8 (2019).

Lun, A. T. L., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res 5 , 2122 (2016).


Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20 , 296 (2019).

Bacher, R. et al. SCnorm: robust normalization of single-cell RNA-seq data. Nat. Methods 14 , 584–586 (2017).

Duò, A., Robinson, M. D. & Soneson, C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 7 , 1141 (2020).

Kobak, D. & Berens, P. The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 10 , 5416 (2019). Best practices on applying tSNE non-linear projections on scRNA-seq data sets.

Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37 , 38–44 (2019). Comparison of UMAP with respect to other non-linear projection methods when applied to scRNA-seq data sets.

Jaitin, D. A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-Seq. Cell 167 , 1883–1896.e15 (2016).

Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53 , 322–331 (2021).

Yang, L. et al. scMAGeCK links genotypes with multiple phenotypes in single-cell CRISPR screens. Genome Biol. 21 , 19 (2020).

Duan, B. et al. Model-based understanding of single-cell CRISPR screening. Nat. Commun. 10 , 2233 (2019).

Wang, R., Lin, D.-Y. & Jiang, Y. SCOPE: a normalization and copy-number estimation method for single-cell DNA sequencing. Cell Syst. 10 , 445–452.e6 (2020).

Zaccaria, S. & Raphael, B. J. Characterizing allele- and haplotype-specific copy numbers in single cells with CHISEL. Nat. Biotechnol. 39 , 207–214 (2021).

Zafar, H., Wang, Y., Nakhleh, L., Navin, N. & Chen, K. Monovar: single-nucleotide variant detection in single cells. Nat. Methods 13 , 505–507 (2016).

Dong, X. et al. Accurate identification of single-nucleotide variants in whole-genome-amplified single cells. Nat. Methods 14 , 491–493 (2017).

Luquette, L. J., Bohrson, C. L., Sherman, M. A. & Park, P. J. Identification of somatic mutations in single cell DNA-seq using a spatial model of allelic imbalance. Nat. Commun. 10 , 3908 (2019).

Singer, J., Kuipers, J., Jahn, K. & Beerenwinkel, N. Single-cell mutation identification via phylogenetic inference. Nat. Commun. 9 , 5144 (2018).

Mallory, X. F., Edrisi, M., Navin, N. & Nakhleh, L. Methods for copy number aberration detection from single-cell DNA-sequencing data. Genome Biol. 21 , 208 (2020).

Gao, R. et al. Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat. Biotechnol. 39 , 599–608 (2021).

Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344 , 1396–1401 (2014).

Petti, A. A. et al. A general approach for detecting expressed mutations in AML cells using single cell RNA-sequencing. Nat. Commun. 10 , 3660 (2019).

Vu, T. N. et al. Cell-level somatic mutation detection from single-cell RNA sequencing. Bioinformatics 35 , 4679–4687 (2019).

Cuomo, A. S. E. et al. Optimizing expression quantitative trait locus mapping workflows for single-cell studies. Genome Biol. 22 , 188 (2021).

Stubbington, M. J. T. et al. T cell fate and clonality inference from single-cell transcriptomes. Nat. Methods 13 , 329–332 (2016).

Lindeman, I. et al. BraCeR: B-cell-receptor reconstruction and clonality inference from single-cell RNA-seq. Nat. Methods 15 , 563–565 (2018).

Song, L. et al. TRUST4: immune repertoire reconstruction from bulk and single-cell RNA-seq data. Nat. Methods 18 , 627–630 (2021).

Upadhyay, A. A. et al. BALDR: a computational pipeline for paired heavy and light chain immunoglobulin reconstruction in single-cell RNA-seq data. Genome Med. 10 , 20 (2018).

Rizzetto, S. et al. B-cell receptor reconstruction from single-cell RNA-seq with VDJPuzzle. Bioinformatics 34 , 2846–2847 (2018).

Borcherding, N., Bormann, N. L. & Kraus, G. scRepertoire: an R-based toolkit for single-cell immune receptor analysis. F1000Res 9 , 47 (2020).

McDavid, A., Gu, Y. & VonKaenel, E. CellaRepertorium: data structures, clustering and testing for single cell immune receptor repertoires (scRNAseq RepSeq/AIRR-seq). https://rdrr.io/bioc/CellaRepertorium (2021).

Zhang, Z., Xiong, D., Wang, X., Liu, H. & Wang, T. Mapping the functional landscape of T cell receptor repertoires by single-T cell transcriptomics. Nat. Methods 18 , 92–99 (2021).

Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523 , 486–490 (2015).

Wu, S. J. et al. Single-cell CUT&Tag analysis of chromatin modifications in differentiation and tumor progression. Nat. Biotechnol. 39 , 819–824 (2021).

Grosselin, K. et al. High-throughput single-cell ChIP-seq identifies heterogeneity of chromatin states in breast cancer. Nat. Genet. 51 , 1060–1066 (2019).

Clark, S. J. et al. Genome-wide base-resolution mapping of DNA methylation in single cells using single-cell bisulfite sequencing (scBS-seq). Nat. Protoc. 12 , 534–547 (2017).

Slavov, N. Learning from natural variation across the proteomes of single cells. PLoS Biol. 20 , e3001512 (2022).

Vistain, L. F. & Tay, S. Single-cell proteomics. Trends Biochem. Sci. 46 , 661–672 (2021).

Perkel, J. M. Single-cell proteomics takes centre stage. Nature 597 , 580–582 (2021).

Brinkerhoff, H., Kang, A. S. W., Liu, J., Aksimentiev, A. & Dekker, C. Multiple rereads of single proteins at single–amino acid resolution using nanopores. Science 374 , 1509–1513 (2021).

Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat. Biotechnol. 39 , 1246–1258 (2021).

Hücker, S. M. et al. Single-cell microRNA sequencing method comparison and application to cell lines and circulating lung tumor cells. Nat. Commun. 12 , 4316 (2021).

Gawronski, K. A. B. & Kim, J. Single cell transcriptomics of noncoding RNAs and their cell-specificity. WIREs RNA 8 , e1433 (2017).

Seydel, C. Single-cell metabolomics hits its stride. Nat. Methods 18 , 1452–1456 (2021).

VanInsberghe, M., van den Berg, J., Andersson-Rolf, A., Clevers, H. & van Oudenaarden, A. Single-cell Ribo-seq reveals cell cycle-dependent translational pausing. Nature 597 , 561–565 (2021).

Arrastia, M. V. et al. Single-cell measurement of higher-order 3D genome organization with scSPRITE. Nat. Biotechnol. 40 , 64–73 (2022).

Zhang, R., Zhou, T. & Ma, J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nat. Biotechnol. 40 , 254–261 (2022).

Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348 , aaa6090 (2015).

Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363 , 1463–1467 (2019).

Vickovic, S. et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat. Methods 16 , 987–990 (2019).

Liu, B., Li, Y. & Zhang, L. Analysis and visualization of spatial transcriptomic data. Front. Genet. 12 , 785290 (2022).

Hu, J. et al. Statistical and machine learning methods for spatially resolved transcriptomics with histology. Comput. Struct. Biotechnol. J. 19 , 3829–3841 (2021).

Zeng, Z., Li, Y., Li, Y. & Luo, Y. Statistical and machine learning methods for spatially resolved transcriptomics data analysis. Genome Biol. 23 , 83 (2022).

Palla, G., Fischer, D. S., Regev, A. & Theis, F. J. Spatial components of molecular tissue biology. Nat. Biotechnol. 40 , 308–318 (2022).

Jiang, R., Sun, T., Song, D. & Li, J. J. Statistics or biology: the zero-inflation controversy about scRNA-seq data. Genome Biol. 23 , 31 (2022).

Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19 , 41–50 (2022). Benchmark of data integration methods of scRNA-seq data sets.

Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21 , 12 (2020).

Song, F., Chan, G. M. A. & Wei, Y. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nat. Commun. 11 , 3274 (2020).

Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177 , 1888–1902.e21 (2019).

Pliner, H. A., Shendure, J. & Trapnell, C. Supervised classification enables rapid annotation of cell atlases. Nat. Methods 16 , 983–986 (2019).

Kiselev, V. Y., Yiu, A. & Hemberg, M. scmap: projection of single-cell RNA-seq data across data sets. Nat. Methods 15 , 359–362 (2018).

Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20 , 163–172 (2019).

Cortal, A., Martignetti, L., Six, E. & Rausell, A. Gene signature extraction and cell identity recognition at the single-cell level with Cell-ID. Nat. Biotechnol. 39 , 1095–1102 (2021).

Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14 , 979–982 (2017).

Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20 , 59 (2019).

Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19 , 477 (2018).

Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19 , 271–281 (2017).

Schlitzer, A. et al. Identification of cDC1- and cDC2-committed DC progenitors reveals early lineage priming at the common DC progenitor stage in the bone marrow. Nat. Immunol. 16 , 718–728 (2015).

La Manno, G. et al. RNA velocity of single cells. Nature 560 , 494–498 (2018).

Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38 , 1408–1414 (2020).

Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19 , 159–170 (2022).

Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinforma. 14 , 7 (2013).

Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102 , 15545–15550 (2005).

Fan, J. et al. Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis. Nat. Methods 13 , 241–244 (2016).

DeTomaso, D. et al. Functional interpretation of single cell similarity maps. Nat. Commun. 10 , 4376 (2019).

Wei, C.-J., Xu, X. & Lo, C. W. Connexins and cell signaling in development and disease. Annu. Rev. Cell Dev. Biol. 20 , 811–838 (2004).

Noël, F. et al. Dissection of intercellular communication using the transcriptome-based framework ICELLNET. Nat. Commun. 12 , 1089 (2021).

Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat. Methods 17 , 159–162 (2020).

Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12 , 1088 (2021).

Cabello-Aguilar, S. et al. SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics. Nucleic Acids Res. 48 , e55–e55 (2020).

Wang, S., Karikomi, M., MacLean, A. L. & Nie, Q. Cell lineage and communication network inference via optimization for single-cell transcriptomics. Nucleic Acids Res. 47 , e66 (2019).

Dimitrov, D. et al. Comparison of methods and resources for cell-cell communication inference from single-cell RNA-Seq data. Nat. Commun. 13 , 3224 (2022). A review of methods for inferring intercellular interactions from SC transcriptomics data sets.

Zhang, Q. et al. Landscape and dynamics of single immune cells in hepatocellular carcinoma. Cell 179 , 829–845.e20 (2019).

Wang, X., Park, J., Susztak, K., Zhang, N. R. & Li, M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun. 10 , 380 (2019).

Erdmann-Pham, D. D., Fischer, J., Hong, J. & Song, Y. S. Likelihood-based deconvolution of bulk gene expression data using single-cell references. Genome Res. 31 , 1794–1806 (2021).

Wang, J., Roeder, K. & Devlin, B. Bayesian estimation of cell type-specific gene expression with prior derived from single-cell data. Genome Res. 31 , 1807–1818 (2021).

Sokolowski, D. J. et al. Single-cell mapper (scMappR): using scRNA-seq to infer the cell-type specificities of differentially expressed genes. NAR Genom. Bioinform. 3 , lqab011 (2021).

Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37 , 773–782 (2019). This paper presents CIBERSORTx — a method to computationally infer cell-type-specific gene expression profiles and their relative proportions from bulk RNA-seq samples relying on scRNA-seq data sets as the reference for relevant cell types and their markers.

Luca, B. A. et al. Atlas of clinically distinct cell states and ecosystems across human solid tumors. Cell 184 , 5482–5496.e28 (2021).

Goldstein, L. D. et al. Massively parallel single-cell B-cell receptor sequencing enables rapid discovery of diverse antigen-reactive antibodies. Commun. Biol. 2 , 304 (2019).

Marks, C. & Deane, C. M. How repertoire data are changing antibody science. J. Biol. Chem. 295 , 9823–9837 (2020).

Setliff, I. et al. High-throughput mapping of B cell receptor sequences to antigen specificity. Cell 179 , 1636–1646.e15 (2019).

Peng, L. et al. Monospecific and bispecific monoclonal SARS-CoV-2 neutralizing antibodies that maintain potency against B.1.617. Nat. Commun. 13 , 1638 (2022).

Castellanos-Rueda, R., Di Roberto, R. B., Schlatter, F. S. & Reddy, S. T. Leveraging single-cell sequencing for chimeric antigen receptor T cell therapies. Trends Biotechnol. 39 , 1308–1320 (2021). Review paper on how SC sequencing is helping to characterize and identify CAR-T cells.

Li, X. et al. Single-cell transcriptomic analysis reveals BCMA CAR-T cell dynamics in a patient with refractory primary plasma cell leukemia. Mol. Ther. 29 , 645–657 (2021). Illustrative example of how scRNA-seq can be used to analyse the dynamics of CAR-T cells in a clinically successful case of relapsed or refractory primary plasma cell leukaemia.

Deng, Q. et al. Characteristics of anti-CD19 CAR T cell infusion products associated with efficacy and toxicity in patients with large B cell lymphomas. Nat. Med. 26 , 1878–1887 (2020).

Chen, G. M. et al. Integrative bulk and single-cell profiling of premanufacture T-cell populations reveals factors mediating long-term persistence of CAR T-cell therapy. Cancer Discov. 11 , 2186–2199 (2021).

Parker, K. R. et al. Single-cell analyses identify brain mural cells expressing CD19 as potential off-tumor targets for CAR-T immunotherapies. Cell 183 , 126–142.e17 (2020).

Jing, Y. et al. Expression of chimeric antigen receptor therapy targets detected by single-cell sequencing of normal cells may contribute to off-tumor toxicity. Cancer Cell 39 , 1558–1559 (2021).

Wang, D. et al. CRISPR screening of CAR T cells and cancer stem cells reveals critical dependencies for cell-based therapies. Cancer Discov. 11 , 1192–1211 (2021).

Legut, M. et al. A genome-scale screen for synthetic drivers of T cell proliferation. Nature 603 , 728–735 (2022).

Kumar, N. et al. Rapid single cell evaluation of human disease and disorder targets using REVEAL: SingleCell™. BMC Genomics 22 , 5 (2021). Illustrative example of how the pharmaceutical industry is using publicly available SC resources internally.

Lachmann, A. et al. Massive mining of publicly available RNA-seq data from human and mouse. Nat. Commun. 9 , 1366 (2018).

Collado-Torres, L. et al. Reproducible RNA-seq analysis using recount2. Nat. Biotechnol. 35 , 319–321 (2017).

Vivian, J. et al. Toil enables reproducible, open source, big biomedical data analyses. Nat. Biotechnol. 35 , 314–316 (2017).

Soneson, C. & Robinson, M. D. Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods 15 , 255–261 (2018).

Stuart, T. & Satija, R. Integrative single-cell analysis. Nat. Rev. Genet. 20 , 257–272 (2019).

Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15 , e8746 (2019). Provides best practices on analysing SC transcriptomics data sets.

Cannoodt, R., Saelens, W., Deconinck, L. & Saeys, Y. Spearheading future omics analyses using dyngen, a multi-modal simulator of single cells. Nat. Commun. 12 , 3942 (2021).

Treppner, M. et al. Synthetic single cell RNA sequencing data from small pilot studies using deep generative models. Sci. Rep. 11 , 9403 (2021).

Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18 , 174 (2017).

Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37 , 547–554 (2019). A comprehensive review that compares trajectory inference methods for SC data sets and provides guidance on their limitations and usage.

Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16 , 1289–1296 (2019).

Mayr, C. H. et al. Integrative analysis of cell state changes in lung fibrosis with peripheral protein biomarkers. EMBO Mol. Med. 13 , e12871 (2021).

Nguyen, Q. H., Pervolarakis, N., Nee, K. & Kessenbrock, K. Experimental considerations for single-cell RNA sequencing approaches. Front. Cell Dev. Biol. 6 , 108 (2018).

Dal Molin, A. & Di Camillo, B. How to design a single-cell RNA-sequencing experiment: pitfalls, challenges and perspectives. Brief. Bioinform. 20 , 1384–1394 (2019).


Acknowledgements

The authors thank I. Papatheodorou (Research Group Leader, EMBL-EBI), B. Kidd (Director, Bristol Myers Squibb (BMS)), R. Loos (Director, BMS) and M. Hall (Senior Scientific Officer, EMBL-EBI) for constructive criticism and proofreading of the original article before this revision.

Author information

These authors contributed equally: Bram Van de Sande, Joon Sang Lee, Euphemia Mutasa-Gottgens.

Authors and Affiliations

UCB Pharma, Braine l’Alleud, Belgium

Bram Van de Sande

Precision Oncology, Sanofi, Cambridge, MA, USA

Joon Sang Lee

EMBL-EBI, Wellcome Genome Campus, Hinxton, UK

Euphemia Mutasa-Gottgens, Wendi Bacon, Jonathan Manning, Andrew Leach & Edgardo Ferran

Computational Neurobiology, Eisai, Cambridge, MA, USA

Bart Naughton

The Open University, Milton Keynes, UK

Wendi Bacon

Precision Bioinformatics, Prometheus Biosciences, San Diego, CA, USA

Yong Wang

Moderna Inc., Cambridge, MA, USA

Jack Pollard

Genomic Sciences, GlaxoSmithKline, Collegeville, PA, USA

Melissa Mendez

Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, USA

Jon Hill

Informatics & Predictive Sciences, Bristol Myers Squibb, San Diego, CA, USA

Namit Kumar

Genomic Research Center, AbbVie Inc., Cambridge, MA, USA

Xiaohong Cao

Magnet Biomedicine, Cambridge, MA, USA

Xiao Chen

Human Genetics and Computational Biology, GlaxoSmithKline, Collegeville, PA, USA

Mugdha Khaladkar

Oncology Research and Development Unit, Pfizer, La Jolla, CA, USA

Ji Wen


Corresponding author

Correspondence to Euphemia Mutasa-Gottgens .

Ethics declarations

Competing interests.

N.K. is an employee and shareholder of BMS. M.M. is an employee and shareholder of GSK. B.V.d.S. is an employee and shareholder of UCB Pharma. M.K. is an employee and shareholder of GSK. J.H. is an employee of Boehringer Ingelheim Pharmaceuticals, Inc. B.N. is an employee of Eisai, Inc. J.S.L. is an employee and shareholder of Sanofi. Y.W. was previously a shareholder of BMS. J.P. was previously an employee and shareholder of Sanofi. J.W. is an employee of Pfizer. E.F. is a shareholder of Sanofi and Board Director of Pulmobiotics. A.L. is a GSK shareholder, has consulted for Astex Therapeutics, LifeArc and Syncona and has received research funding from Novo Nordisk and AstraZeneca. X.C. is a former employee and shareholder of AbbVie. E.M.-G., W.B. and J.M. declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Glossary

Barcode
A short DNA sequence ‘tag’ to identify reads that originate from the same cell.
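Matching observed tags against a known whitelist, while tolerating a single sequencing error, is what makes such tags usable in practice. The sketch below illustrates the idea only: the six-base whitelist and function names are invented for this toy example, and real pipelines (for example, Cell Ranger or STARsolo) use curated whitelists and more elaborate, quality-aware correction.

```python
# Toy barcode assignment: exact match first, then unambiguous 1-mismatch rescue.
# The whitelist and reads are illustrative, not from any real kit.

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, whitelist):
    """Return the whitelist barcode for `observed`, or None if absent/ambiguous."""
    if observed in whitelist:
        return observed
    # Allow one sequencing error, but only if the correction is unambiguous.
    hits = [bc for bc in whitelist if hamming(observed, bc) == 1]
    return hits[0] if len(hits) == 1 else None

whitelist = {"AACGTG", "TTGCAA", "GGATCC"}
print(assign_barcode("AACGTG", whitelist))  # exact match -> "AACGTG"
print(assign_barcode("AACGTT", whitelist))  # one mismatch -> "AACGTG"
print(assign_barcode("CCCCCC", whitelist))  # no close match -> None
```

Reads whose barcodes cannot be assigned are typically discarded rather than guessed.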

Biomarkers
Readouts used to classify biological states, often in the context of patient stratification.

Deconvolution
Estimation of the proportion of particular cell types in a bulk RNA sequencing sample, based on cell markers or a labelled single-cell expression matrix.
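As a minimal illustration of the principle, the toy example below recovers mixing proportions exactly for two cell types and two marker genes. Real deconvolution methods (for example, CIBERSORTx, cited above) fit many genes with noise-aware regression; this sketch only conveys the underlying linear-mixture idea.

```python
# Toy deconvolution: given per-cell-type signature expression (as might be
# derived from scRNA-seq) and a bulk measurement, recover mixing proportions
# by solving the exact 2x2 linear system signature @ p = bulk.

def deconvolve_2x2(signature, bulk):
    """Solve for two cell-type proportions via Cramer's rule,
    then renormalize so they sum to 1."""
    (a, b), (c, d) = signature
    det = a * d - b * c
    p1 = (bulk[0] * d - b * bulk[1]) / det
    p2 = (a * bulk[1] - bulk[0] * c) / det
    total = p1 + p2
    return p1 / total, p2 / total

# Rows are marker genes, columns are cell types A and B.
signature = [[10.0, 1.0],
             [2.0, 8.0]]
bulk = [7.3, 3.8]  # a 70/30 mixture of A and B
print(deconvolve_2x2(signature, bulk))  # ~ (0.7, 0.3)
```

With measurement noise and hundreds of genes, the system becomes overdetermined and is solved by (regularized) least squares instead.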

CRISPR screen
A pooled or arrayed screen of cells harbouring CRISPR-mediated gene edits.

Doublets
Sets of two (or more) cells mistakenly considered as single cells, owing to being captured and processed in the same droplet and thus with the same barcode in data.
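How often this happens is largely a loading statistic: if cells enter droplets approximately following a Poisson distribution, the expected doublet fraction among cell-containing droplets follows directly from the mean occupancy. The back-of-envelope sketch below assumes ideal Poisson loading; real rates depend on the platform and are usually calibrated empirically.

```python
# Back-of-envelope doublet estimate under Poisson droplet loading:
# the fraction of cell-containing droplets holding two or more cells.
import math

def doublet_fraction(lam):
    """P(k >= 2 | k >= 1) for droplet occupancy k ~ Poisson(lam)."""
    p0 = math.exp(-lam)       # empty droplets
    p1 = lam * p0             # exactly one cell
    return (1 - p0 - p1) / (1 - p0)

for lam in (0.05, 0.1, 0.3):
    print(f"mean occupancy {lam}: {doublet_fraction(lam):.1%} doublets")
```

The trade-off is visible at a glance: loading more cells per run raises throughput but inflates the doublet rate, which is why computational doublet detection (for example, Scrublet or DoubletFinder, cited above) remains necessary.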

Cell hashing
A labelling technique that attaches barcoded antibodies to cell surface proteins, allowing multiplexing of samples for single-cell sequencing, and subsequent disambiguation of sample of origin during analysis.
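A minimal demultiplexing rule based on such hashtag-oligo (HTO) counts might look like the following. The count threshold and ratio cutoff here are arbitrary illustrative values, not recommendations; production tools fit per-hashtag background distributions instead.

```python
# Sketch of hashtag-based demultiplexing: assign each cell to the sample whose
# hashtag count dominates; flag cells with two strong hashtags as doublets.

def demultiplex(hto_counts, min_count=50, max_second_ratio=0.3):
    """hto_counts: dict of sample -> hashtag count for one cell.
    Returns a sample name, 'doublet', or 'unassigned'."""
    ranked = sorted(hto_counts.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top), (_, second) = ranked[0], ranked[1]
    if top < min_count:
        return "unassigned"          # no hashtag convincingly present
    if second / top > max_second_ratio:
        return "doublet"             # two samples' hashtags both strong
    return top_name

print(demultiplex({"sampleA": 400, "sampleB": 12}))   # -> sampleA
print(demultiplex({"sampleA": 300, "sampleB": 250}))  # -> doublet
print(demultiplex({"sampleA": 8, "sampleB": 5}))      # -> unassigned
```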

Metadata
A set of data that describe and give information about other data (Oxford dictionary). For example, patient or sample characteristics in an RNA sequencing experiment.

Seurat
A popular R package for the quality control, analysis and exploration of single-cell RNA sequencing data.

Target credentialling
Also called target qualification. Exploration of target quality more expansively than a straightforward target validation. May include contextually informed enquiries into biological characteristics such as network, pathway or interactome mapping, regulatory landscape or other investigations intended to either help rank target quality or inform on-target biology.

t-Distributed stochastic neighbour embedding (t-SNE)
A popular dimensionality reduction technique for the visualization of single-cell experiments.

Trajectory analysis
Inference from single-cell data of the order of cells along a dynamic biological process (for example, developmental trajectory). Relies on the fact that a heterogeneous sample provides a snapshot view on a mixture of cells in different phases along the developmental or dynamic biological process. Also called ‘pseudo-time analysis’.
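One simple way to realize this idea is to treat pseudo-time as graph distance from a chosen root cell in a nearest-neighbour graph over a low-dimensional embedding. The sketch below does exactly that with Dijkstra's algorithm on toy 2D coordinates; published trajectory tools (Monocle, Slingshot, PAGA, cited above) add branch detection, smoothing and uncertainty estimates on top of this principle.

```python
# Minimal pseudo-time sketch: k-nearest-neighbour graph + shortest-path
# (geodesic) distance from a root cell, via Dijkstra.
import heapq
import math

def knn_graph(points, k=2):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def pseudotime(points, root, k=2):
    """Shortest-path distance from the root cell to every reachable cell."""
    graph = knn_graph(points, k)
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Five cells roughly along a line: pseudo-time increases with position.
cells = [(0, 0), (1, 0.1), (2, 0), (3, 0.1), (4, 0)]
print(pseudotime(cells, root=0))
```

In real analyses the coordinates would come from a PCA or diffusion-map embedding of expression profiles, and the root cell is chosen from biological knowledge (for example, a known progenitor state).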

Uniform manifold approximation and projection (UMAP)
A popular dimensionality reduction technique for the visualization of single-cell experiments, with some advantages in preservation of global data structure and performance compared with t-distributed stochastic neighbour embedding.

Unique molecular identifier (UMI)
Reads with the same UMI are from the same mRNA molecule. UMIs help in the assessment of sequencing accuracy and precision.
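In practice this means expression is quantified as the number of distinct UMIs per gene per cell, rather than the number of reads, which removes PCR-amplification bias. A minimal collapsing step could look like this (the barcodes, UMIs and gene names are toy values; real pipelines also correct sequencing errors within UMIs):

```python
# Sketch of UMI collapsing: reads sharing the same (cell barcode, gene, UMI)
# derive from one original mRNA molecule and are counted once.
from collections import defaultdict

def umi_counts(reads):
    """reads: iterable of (cell_barcode, gene, umi) tuples.
    Returns {cell: {gene: molecule_count}}."""
    molecules = defaultdict(set)
    for cell, gene, umi in reads:
        molecules[(cell, gene)].add(umi)
    counts = defaultdict(dict)
    for (cell, gene), umis in molecules.items():
        counts[cell][gene] = len(umis)
    return dict(counts)

reads = [
    ("AAC", "GAPDH", "TTG"),  # PCR duplicate: same cell/gene/UMI ...
    ("AAC", "GAPDH", "TTG"),  # ... as the read above, counted once
    ("AAC", "GAPDH", "CCA"),  # a second GAPDH molecule in the same cell
    ("GGT", "ACTB", "ATA"),
]
print(umi_counts(reads))  # {'AAC': {'GAPDH': 2}, 'GGT': {'ACTB': 1}}
```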

Unsupervised clustering
Analysis that groups similar samples together without requiring labels or prior knowledge.
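As a toy illustration of grouping without labels, the sketch below runs plain k-means on four 2D points. scRNA-seq pipelines more commonly apply graph-based community detection (for example, Leiden) on a nearest-neighbour graph in PCA space, so this conveys the principle only.

```python
# Unsupervised clustering in miniature: Lloyd's k-means on 2D points,
# with fixed initial centroids for reproducibility of this toy example.
import math

def kmeans(points, centroids, iters=10):
    """Alternate assignment and centroid-update steps; return both."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*members)) if members else cen
            for members, cen in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(0, 0), (0.5, 0.2), (5, 5), (5.5, 4.8)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (5, 5)])
print(clusters)  # two well-separated groups of two points each
```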

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article.

Van de Sande, B., Lee, J.S., Mutasa-Gottgens, E. et al. Applications of single-cell RNA sequencing in drug discovery and development. Nat Rev Drug Discov 22 , 496–520 (2023). https://doi.org/10.1038/s41573-023-00688-4


Accepted : 10 March 2023

Published : 28 April 2023

Issue Date : June 2023

DOI : https://doi.org/10.1038/s41573-023-00688-4





  • Open access
  • Published: 26 March 2023

A systematic review of substance use screening in outpatient behavioral health settings

  • Diana Woodward 1 ,
  • Timothy E. Wilens 1 ,
  • Meyer Glantz 2 ,
  • Vinod Rao 1 ,
  • Colin Burke 1 &
  • Amy M. Yule   ORCID: orcid.org/0000-0002-2409-9426 3  

Addiction Science & Clinical Practice volume  18 , Article number:  18 ( 2023 ) Cite this article


Background

Despite the frequent comorbidity of substance use disorders (SUDs) and psychiatric disorders, it remains unclear if screening for substance use in behavioral health clinics is a common practice. The aim of this review is to examine what is known about systematic screening for substance use in outpatient behavioral health clinics.

Methods

We conducted a PRISMA-based systematic literature search assessing substance use screening in outpatient adult and pediatric behavioral health settings in PubMed, Embase, and PsycINFO. Quantitative studies published in English before May 22, 2020 that reported the percentage of patients who completed screening were included.

Results

Only eight articles met our inclusion and exclusion criteria. Reported prevalence of screening ranged from 48 to 100%, with half of the studies successfully screening more than 75% of their patient population. There were limited data on patient demographics for individuals who were and were not screened (e.g., gender, race) and screening practices (e.g., electronic versus paper/pencil administration).

Conclusions

The results of this systematic review suggest that successful screening for substance use in behavioral health settings is possible, yet it remains unclear how frequently screening occurs. Given the high rates of comorbid SUD and psychopathology, future research is necessary regarding patient and clinic-level variables that may impact the successful implementation of substance use screening.

Trial registry

A methodological protocol was registered with the PROSPERO systematic review protocol registry (ID: CRD42020188645).

Introduction

Substance use disorders (SUD) pose a substantial societal burden in the United States. In 2020 alone, an estimated 28.3 million people aged 12 or older met criteria for a past-year alcohol use disorder, while 18.4 million people aged 12 or older experienced a past-year illicit drug use disorder [ 1 ]. Risky substance use and SUD are associated with substantial disability and mortality, with an estimated 480,000 tobacco-related deaths and 95,000 alcohol-related deaths annually in the United States [ 2 , 3 ]. Of particular concern, drug-related overdose deaths have risen in recent years, increasing from 70,630 deaths in 2019 to 92,000 deaths in 2020 [ 4 , 5 ].

Prior research has established psychopathology as a significant risk factor for developing a SUD [ 6 , 7 , 8 , 9 ]. For example, individuals with depression are approximately 2 times more likely to develop a SUD, and those with attention deficit hyperactivity disorder exhibit a 2.3 times greater risk [ 10 ]. Furthermore, individuals with one or more psychiatric diagnoses experience greater SUD severity [ 11 , 12 ]. The sequelae of co-occurring SUD and psychiatric disorders include increased odds of additional psychopathology [ 15 ], hospitalizations [ 16 ], suicide attempts [ 13 , 17 , 18 ], overdose [ 19 , 20 , 21 ], criminal behavior [ 22 ], and homelessness [ 23 ]. Additionally, adults with co-occurring disorders report overall lower quality of life [ 24 ] and lower social and occupational functioning [ 13 , 25 , 26 ].

Despite the imposed burden of comorbid SUD and psychopathology, in 2019, 51.4% of individuals in the United States with co-occurring disorders received no treatment, 38.7% received mental health treatment only, 7.8% received treatment for both mental health and SUD, and 1.9% received SUD treatment only [ 27 ]. Given that many treatment-seeking individuals with co-occurring SUD and psychopathology obtain mental health treatment rather than substance use treatment, screening for substance use concerns in behavioral health settings is necessary to identify individuals at the greatest risk for maladaptive outcomes.

To this end, both the Substance Abuse and Mental Health Services Administration (SAMHSA) and the National Institute for Health and Clinical Excellence (NICE) have urged mental health providers to routinely administer patient self-report questionnaires to screen for substance use [ 28 , 29 ]. Most efforts to integrate substance use screening into clinical care have focused on primary care settings [ 30 , 31 , 32 , 33 ]. As such, the success of substance use screening tools in other outpatient settings remains unclear. Because behavioral health clinics generally have fewer ancillary supports to assist with screening than primary care settings, as well as higher staff turnover rates [ 34 , 35 ], research is needed on screening for substance use in these settings. Hence, we aim to summarize the extant literature on systematic screening for substance use in behavioral health, with a focus on the prevalence of screening within these clinics, the characteristics of the screening tools used, and screening practices.

Methods

A methodological protocol was registered with the PROSPERO systematic review protocol registry (ID: CRD42020188645).

Search strategy

We conducted a search of the peer-reviewed literature in the PubMed, Embase, and PsycINFO databases, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, through May 22, 2020 with no restriction on start date. We examined both the prevalence and frequency of substance use screening in outpatient behavioral health clinics as well as the characteristics of the outpatient behavioral health clinics that screen for substance use. We searched each database using various combinations of search terms, which can be found in Additional file 1 . Bibliographies of reviewed articles were also examined for additional studies to ensure that no relevant articles were omitted.

Inclusion criteria were quantitative studies examining substance use screening in outpatient adult and pediatric behavioral health clinics published in English. This included general psychiatric clinics, community mental health organizations, university counseling centers, and other specialty services. Studies were only included if they implemented systematic screening for substance use and reported the percentage of patients who completed the screener. Editorials, commentaries, opinion papers, chapters, and research studies that recruited participants to complete screening tools were excluded. Studies examining screening for substance use only in integrated behavioral health settings within primary care, emergency rooms, or inpatient settings were also excluded. If studies examined screening for substance use in behavioral health-only clinics alongside integrated behavioral health settings, they were included only if they stratified screening rates by clinic type.

Selection of studies

Two reviewers independently screened the titles and abstracts of all papers. Any disagreements were resolved by consensus, and irrelevant titles were excluded. A record was kept of all irrelevant and duplicate articles. The full text of each remaining paper was then reviewed by the two investigators and included or excluded accordingly. A third senior investigator reviewed all of the included papers to confirm that they met the inclusion/exclusion criteria.

Data extraction, analysis, and synthesis

Data were extracted from the quantitative studies by one reviewer and discussed with the senior reviewer. The following variables were extracted: setting, sample size, percentage of patients screened, patient demographics, language of screening tool, screener administered, substances screened, date of study, frequency of screening, and method of screening (e.g., computer, paper, self-report, clinician report).

Results

Our initial search yielded 362 non-duplicate articles (Fig.  1 ). Eighty-four articles were determined to be potentially relevant and therefore reviewed in full. Of the 84 potentially relevant articles, 76 were excluded based on eligibility criteria (see Fig.  1 ), and eight articles were included in the final review (Table 1 ). The most common reasons for exclusion after full-text review were that the article reported on data from a sample recruited for a research study (N = 23), the authors did not report the percentage of patients screened (N = 13), or screening was implemented in a non-behavioral health setting (N = 11). The eight articles included in this systematic review were published between 1992 and 2018. The sample sizes ranged from 88 to 22,956 screened patients.

Figure 1: PRISMA diagram

Setting

Six of the eight studies were conducted in behavioral health clinics within a larger healthcare system, two of which took place in Veterans Affairs (VA) facilities [ 36 , 37 , 38 , 39 , 40 , 41 ]. The two studies that were not conducted in healthcare systems were conducted in a university counseling center [ 42 ] and community mental health organizations [ 43 ]. All studies were conducted in the United States. Four studies were single-site [ 37 , 40 , 41 , 42 ], three studies included multiple sites ranging from 2 to 48 [ 36 , 39 , 43 ], and one study did not report the number of sites [ 38 ]. The majority of the studies (62.5%) were conducted in adult clinics [ 37 , 39 , 40 , 42 , 44 ], with one study focused on college students [ 42 ]. Two studies included pediatric patients [ 36 , 43 ], and one study did not report age [ 41 ].

Screener and substances screened

All of the studies screened for alcohol, the majority screened for drugs (N = 6) [ 36 , 37 , 39 , 40 , 42 , 43 ], and half of the studies screened for tobacco (N = 4) [ 36 , 39 , 40 , 42 ]. Of those that screened for drugs, two studies administered screeners which did not differentiate type of substance [ 36 , 37 ]. Of the remaining four, all specifically queried about marijuana/cannabis [ 39 , 40 , 42 , 43 ], and three screened for other drugs, including opioids [ 39 , 40 , 42 ]. A range of 1 to 5 screeners was used to assess for substance use. Additionally, one study administered both a pre-screening instrument and a screening instrument [ 42 ]. The most commonly used screeners were the Alcohol Use Disorders Identification Test-Concise (AUDIT-C) [ 38 , 42 , 45 ], the CRAFFT [ 36 , 43 , 46 ], the Alcohol, Smoking, and Substance Involvement Screening Test (ASSIST) [ 39 , 42 , 47 ], and the Short Michigan Alcohol Screening Test (SMAST) [ 40 , 41 , 48 ] (all N = 2).

Frequency and methods of screening

The majority of studies reported screening only at intake (N = 6) [ 37 , 38 , 39 , 40 , 41 , 42 ]. One clinic implemented different screening instruments at intake, quarterly, and at one year [ 36 ], and Stanhope et al. [ 43 ] did not report the frequency of screening across community mental health organizations. Of the eight studies, five relied solely on self-administration [ 36 , 37 , 39 , 40 , 41 ], one used both self-administration (prescreen) and clinician administration (screen) [ 42 ], and two did not report how the screening was administered [ 38 , 43 ]. Additionally, although the majority (N = 5) of authors did not report how information was collected [ 36 , 38 , 41 , 42 , 43 ], two studies utilized an electronic screen [ 39 , 40 ] and one study relied on paper and pencil [ 37 ]. Finally, none of the studies reported the language of their screening instrument(s) [ 37 ].

Screening rate

One study reported screening all patients [ 42 ]. The screening rates of the remaining studies ranged from 48 to 93.5% of patients. Screening in adult-only clinics ranged from 48 to 100% of patients [ 37 , 38 , 39 , 40 , 42 ] while screening from clinics with adult and pediatric patients ranged from 84 to 93.5% [ 36 , 43 ]. The screening rate using an electronic screen ranged from 48 to 75% of patients [ 39 , 40 ], and the rate for paper/pencil was 74.9%.

Demographics

Five studies reported on the gender of screened patients [ 36 , 37 , 39 , 40 , 43 ], and one study reported on gender across the total study population (patients who did and did not complete the screening) [ 38 ]. Of those that reported on the gender of screened patients, the range was 30 to 86% male. Two studies that did not report gender across the total study population did report that there were no significant gender differences between patients who did and did not complete screening [ 40 , 43 ].

Four studies reported the mean age of screened patients, with a range of 16.6 to 42.9 years [ 37 , 39 , 40 , 43 ]. Three studies reported mean age across the total study population, with a range of 36.1 to 53.5 years [ 38 , 39 , 40 ]. One additional study that did not report mean age across the total study population reported no significant difference in mean age between screened patients and the total study population [ 43 ].

The three studies that reported on the race of screened patients included predominantly white patients, with these participants ranging from 52.8 to 72% of the sample [ 39 , 40 , 43 ]. The next most represented race was Asian, ranging from 9 to 10.5% of the sample. No studies reported race across the total study population; however, two studies reported no significant racial differences between patients who did and did not complete screening [ 40 , 43 ].

The two studies that reported on the ethnicity of screened patients included predominantly non-Hispanic patients, with these patients ranging from 73 to 93% of the sample [ 40 , 43 ]. The one study that reported ethnicity across the total study population (patients who did and did not complete screening) was also largely non-Hispanic (94.2%) [ 38 ]. Two additional studies reported no significant differences in ethnicity between patients who did and did not complete screening; however, they did not report ethnicity type for the study population [ 40 , 43 ].

Psychiatric comorbidities

Though two studies provided descriptive information on psychopathology, neither compared psychopathology between those who were and were not screened in the clinic [ 37 , 38 ]. Karno et al. reported rates of depressive disorder (48%), anxiety disorder (15%), bipolar disorder (13%), and schizophrenia/schizoaffective disorder (11%) in screened patients. King and colleagues found that 15.1% of all clinic patients had a trauma/stressor-related disorder (including post-traumatic stress disorder), and 12.9% of all clinic patients had a mood disorder.

Discussion

Our aim in this review was to determine the prevalence and the characteristics of screening practices for substance use in outpatient behavioral health clinics. Though we identified only eight studies that met review criteria, half of these studies reported screening more than 75% of their patient population [ 36 , 41 , 42 , 43 ].

The screening rates in the identified studies are comparable to those reported in a recent examination of substance use screening in primary care settings, which found that 71.8% of eligible patients were screened after implementation efforts [ 49 ]. However, whether existing research on screening for substance use represents standard practice in all behavioral health clinics remains unclear given limited reporting on this practice. While the 2020 National Mental Health Services Survey (N-MHSS) reported that approximately 54% of the 4,941 surveyed outpatient mental health treatment facilities offered screening for tobacco use, it did not specify whether this screening was systematic and routine and did not report on screening for non-nicotine substances [ 50 ]. Furthermore, the intent to screen for substance use does not always translate into clinical practice. A large survey found that although 93.1% and 78.9% of mental health clinic directors reported having screening guidelines for alcohol and illicit substance use, respectively, only 66.6% and 57.8% of clinic staff reported conducting said screening [ 51 ].

Several patient- and clinic-level variables influence the successful implementation of systematic screening. Unfortunately, few studies in the current review reported patient demographic information. We were therefore unable to identify specific patient demographics associated with a high prevalence of screening for substance use or demographic differences between patients who were and were not screened to help identify patient groups who did not complete screening. This is notable since research from the primary care setting has found differences in screening for substance use based on demographics. For example, Black and Hispanic patients and adults over the age of 65 may require more assistance to complete electronic screening for substance use due to problems with comprehension or technical issues [ 52 ]. In light of increasing overdose deaths among Black and Hispanic youth [ 53 ], research examining the barriers to screening for substance use in particular demographic groups is needed to ensure equitable care.

Clinic factors that influence the successful implementation of screening center on the method of screening administration. For most studies in our review, screening tools were administered as patient self-report [ 36 , 37 , 39 , 40 , 41 ]. This is consistent with recent research in primary care and emergency department settings showing increased patient comfort with self-report screening compared to clinician-administered screening [ 54 , 55 , 56 ], particularly amongst individuals who belong to groups who are more stigmatized for substance use [ 52 , 57 , 58 ]. Another notable finding of our review was the omission of data regarding screening tools (paper and pencil versus electronic) and language of screening. A review of screening in primary care found that electronic questionnaires using patient self-report in both pediatric and adult settings improved data quality and completion time, decreased costs, and were preferred by patients. However, the use of electronic questionnaires also led to increased privacy concerns and access challenges [ 59 ]. Electronic measures, particularly those linked to the electronic medical record, may also result in racial and ethnic disparities in screening completion rates [ 60 ]. Additional research in the behavioral health setting is needed to determine patient and clinician preferences regarding the method of screening, particularly for more stigmatized conditions such as substance use [ 61 , 62 ].

Finally, the timing and frequency of screening are also important factors to consider during implementation. Most studies in our systematic review reported screening patients for substance use only at intake [ 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 ]. Although screening at intake identifies patients who may benefit from SUD treatment [ 63 ], ongoing screening and progress monitoring improves engagement in SUD treatment and SUD outcomes [ 64 , 65 ], and a recent consensus panel organized by SAMHSA recommended screening patients with psychiatric disorders for substance use annually [ 66 ]. Thus, future research should examine the prevalence and success of repeated screening for substance use.

The results of our review need to be considered in light of methodological limitations. The generalizability of the findings may be limited given the small number of eligible manuscripts. Moreover, several of these studies were missing information on patient- and clinic-level variables related to implementation that has recently been identified as necessary to report in studies evaluating patient self-report questionnaires, in order to improve the methodological quality, transparency, and applicability of findings [ 67 ]. Hence, it was difficult to conclude which variables contributed to the successful implementation of screening for substance use in the behavioral health setting. Furthermore, of those studies that did report patient demographics, the majority of the subjects were adults, white, and non-Hispanic. As such, the results may not be generalizable to pediatric or more diverse racial and ethnic groups. Additionally, to narrow the scope of the current review, we excluded manuscripts that examined substance use screening in integrated behavioral health clinics within primary care. Although implementation in these settings is important to investigate to better understand the overall landscape of screening for substance use in settings that provide behavioral health care, integrated behavioral health clinics likely face different barriers and facilitators. Lastly, more clinics may be systematically screening for substance use and not reporting their findings in published results. Thus, this topic is at risk for publication bias, as behavioral health clinics that have struggled to implement systematic screening for substance use may not pursue publication.

In summary, the results of our review indicate that screening for substance use in the outpatient behavioral health setting can be successfully implemented at initial intake. Our review highlights the need for further examination of patient- and clinic-level variables that may impact the successful implementation of screening in behavioral health. Future research should include these variables to inform implementation efforts, ensure equity in screening, and achieve consistency with recent reporting guidelines [ 67 ].

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its Additional information files.

Abbreviations

  • SUD: Substance use disorder
  • SAMHSA: Substance Abuse and Mental Health Services Administration
  • NICE: National Institute for Health and Clinical Excellence
  • PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
  • VA: Veterans Affairs
  • AUDIT: Alcohol Use Disorders Identification Test
  • CRAFFT: Car, Relax, Alone, Forget, Friends, Trouble
  • ASSIST: Alcohol, Smoking and Substance Involvement Screening Test
  • SMAST: Short Michigan Alcohol Screening Test
  • N-MHSS: National Mental Health Services Survey

References

Substance Abuse and Mental Health Services Administration. Key substance use and mental health indicators in the United States: results from the 2020 National Survey on Drug Use and Health. 2020. https://www.samhsa.gov/data/sites/default/files/reports/rpt35325/NSDUHFFRPDFWHTMLFiles2020/2020NSDUHFFR1PDFW102121.pdf . Accessed 14 July 2022.

Esser MB, Sherk A, Liu Y, Naimi TS, Stockwell T, Stahre M, et al. Deaths and years of potential life lost from excessive alcohol use—United States, 2011–2015. MMWR Morb Mortal Wkly Rep. 2020;69(39):1428–33. https://doi.org/10.15585/mmwr.mm6939a6 .

National Center for Chronic Disease Prevention and Health Promotion (US) Office on Smoking and Health. The health consequences of smoking—50 years of progress: a report of the Surgeon General. Centers for Disease Control and Prevention (US). 2014. https://www.ncbi.nlm.nih.gov/books/NBK179276/ . Accessed 2 Aug. 2022.

Ahmad FB, Cisewski JA, Rossen LM, Sutton P. Provisional drug overdose death counts. National Center for Health Statistics. 2022. https://www.cdc.gov/nchs/nvss/vsrr/drug-overdose-data.htm . Accessed 2 June 2022.

Multiple Cause of Death 1999–2019 on CDC WONDER Online Database. Centers for Disease Control and Prevention, National Center for Health Statistics. 2020. https://www.cdc.gov/nchs/nvss/vsrr/drug-overdose-data.htm . Accessed 18 May 2021.

Abraham HD, Fava M. Order of onset of substance abuse and depression in a sample of depressed outpatients. Compr Psychiatry. 1999;40(1):44–50. https://doi.org/10.1016/s0010-440x(99)90076-7 .

Kessler RC. The epidemiology of dual diagnosis. Biol Psychiatry. 2004;56(10):730–7. https://doi.org/10.1016/j.biopsych.2004.06.034 .

Merikangas KR, McClair VL. Epidemiology of substance use disorders. Hum Genet. 2012;131(6):779–89. https://doi.org/10.1007/s00439-012-1168-0 .

Wilens TE, Martelon M, Joshi G, Bateman C, Fried R, Petty C, et al. Does ADHD predict substance-use disorders? A 10-year follow-up study of young adults with ADHD. J Am Acad Child Adolesc Psychiatry. 2011;50(6):543–53. https://doi.org/10.1016/j.jaac.2011.01.021 .

Groenman AP, Janssen TWP, Oosterlaan J. Childhood psychiatric disorders as risk factor for subsequent substance abuse: a meta-analysis. J Am Acad Child Adolesc Psychiatry. 2017;56(7):556–69. https://doi.org/10.1016/j.jaac.2017.05.004 .

Russell BS, Trudeau JJ, Leland AJ. Social influence on adolescent polysubstance use: the escalation to opioid use. Subst Use Misuse. 2015;50(10):1325–31. https://doi.org/10.3109/10826084.2015.1013128 .

Shane PA, Jasiukaitis P, Green RS. Treatment outcomes among adolescents with substance abuse problems: The relationship between comorbidities and post-treatment substance involvement. Eval Program Plann. 2003;26(4):393–402. https://doi.org/10.1016/S0149-7189(03)00055-7 .

Baker KD, Lubman DI, Cosgrave EM, Killackey EJ, Yuen HP, Hides L, et al. Impact of co-occurring substance use on 6 month outcomes for young people seeking mental health treatment. Aust N Z J Psychiatry. 2007;41(11):896–902. https://doi.org/10.1080/00048670701634986 .

Tolliver BK, Anton RF. Assessment and treatment of mood disorders in the context of substance abuse. Dialogues Clin Neurosci. 2015;17(2):181–90. https://doi.org/10.31887/DCNS.2015.17.2/btolliver .

Mitchell JD, Brown ES, Rush AJ. Comorbid disorders in patients with bipolar disorder and concomitant substance dependence. J Affect Disord. 2007;102(1–3):281–7. https://doi.org/10.1016/j.jad.2007.01.005 .

Curran GM, Sullivan G, Williams K, Han X, Allee E, Kotrla KJ. The association of psychiatric comorbidity and use of the emergency department among persons with substance use disorders: an observational cohort study. BMC Emerg Med. 2008;8:17. https://doi.org/10.1186/1471-227X-8-17 .

Appleby L, Shaw J, Amos T, McDonnell R, Harris C, McCann K, et al. Suicide within 12 months of contact with mental health services: national clinical survey. BMJ. 1999;318(7193):1235–9. https://doi.org/10.1136/bmj.318.7193.1235 .

Oquendo MA, Currier D, Liu SM, Hasin DS, Grant BF, Blanco C. Increased risk for suicidal behavior in comorbid bipolar disorder and alcohol use disorders: results from the national epidemiologic survey on alcohol and related conditions (NESARC). J Clin Psychiatry. 2010;71(7):902–9. https://doi.org/10.4088/JCP.09m05198gry .

Bohnert AS, Ilgen MA, Ignacio RV, McCarthy JF, Valenstein M, Blow FC. Risk of death from accidental overdose associated with psychiatric and substance use disorders. Am J Psychiatry. 2012;169(1):64–70. https://doi.org/10.1176/appi.ajp.2011.10101476 .

Park TW, Lin LA, Hosanagar A, Kogowski A, Paige K, Bohnert AS. Understanding risk factors for opioid overdose in clinical populations to inform treatment and policy. J Addict Med. 2016;10(6):369–81. https://doi.org/10.1097/ADM.0000000000000245 .

Yule AM, Carrellas NW, Fitzgerald M, McKowen JW, Nargiso JE, Bergman BG, et al. Risk factors for overdose in treatment-seeking youth with substance use disorders. J Clin Psychiatry. 2018. https://doi.org/10.4088/JCP.17m11678 .

Wilton G, Stewart LA. Outcomes of offenders with co-occurring substance use disorders and mental disorders. Psychiatr Serv. 2017;68(7):704–9. https://doi.org/10.1176/appi.ps.201500391 .

Gonzalez G, Rosenheck RA. Outcomes and service use among homeless persons with serious mental illness and substance abuse. Psychiatr Serv. 2002;53(4):437–46. https://doi.org/10.1176/appi.ps.53.4.437 .

Saatcioglu O, Yapici A, Cakmak D. Quality of life, depression and anxiety in alcohol dependence. Drug Alcohol Rev. 2008;27(1):83–90. https://doi.org/10.1080/09595230701711140 .

Kronenberg LM, Slager-Visscher K, Goossens PJJ, van den Brink W, van Achterberg T. Everyday life consequences of substance use in adult patients with a substance use disorder (SUD) and co-occurring attention deficit/hyperactivity disorder (ADHD) or autism spectrum disorder (ASD): a patient’s perspective. BMC Psychiatry. 2014;14(1):264. https://doi.org/10.1186/s12888-014-0264-1 .

Olfson M, Shea S, Feder A, Fuentes M, Nomura Y, Gameroff M, et al. Prevalence of anxiety, depression, and substance use disorders in an urban general medicine practice. Arch Fam Med. 2000;9(9):876. https://doi.org/10.1001/archfami.9.9.876 .

Substance Abuse and Mental Health Services Administration. 2019 National Survey on Drug Use and Health (NSDUH) releases. U.S. Department of Health and Human Services. 2019. https://www.samhsa.gov/data/release/2019-national-survey-drug-use-and-health-nsduh-releases . Accessed 2 Aug. 2022.

National Collaborating Centre for Mental Health (UK). Common mental health disorders: identification and pathways to care. British Psychological Society (UK). 2011. NICE Clinical Guidelines, No. 123. https://www.ncbi.nlm.nih.gov/books/NBK92266/ . Accessed 2 Aug. 2022.

Substance Abuse and Mental Health Services Administration. White paper on the evidence supporting screening, brief intervention and referral to treatment (SBIRT). 2011. https://www.samhsa.gov/sites/default/files/sbirtwhitepaper_0.pdf . Accessed 2 Aug. 2022.

McNeely J, Kumar PC, Rieckmann T, Sedlander E, Farkas S, Chollak C, et al. Barriers and facilitators affecting the implementation of substance use screening in primary care clinics: a qualitative study of patients, providers, and staff. Addict Sci Clin Pract. 2018;13(1):8. https://doi.org/10.1186/s13722-018-0110-8 .

McPherson TL, Hersch RK. Brief substance use screening instruments for primary care settings: a review. J Subst Abuse Treat. 2000;18(2):193–202. https://doi.org/10.1016/s0740-5472(99)00028-8 .

Pilowsky DJ, Wu L-T. Screening for alcohol and drug use disorders among adults in primary care: a review. Subst Abuse Rehabil. 2012;3:25. https://doi.org/10.2147/SAR.S30057 .

Rahm AK, Boggs JM, Martin C, Price DW, Beck A, Backer TE, et al. Facilitators and barriers to implementing screening, brief intervention, and referral to treatment (SBIRT) in primary care in integrated health care settings. Subst Abus. 2015;36(3):281–8. https://doi.org/10.1080/08897077.2014.951140 .

Paris M Jr, Hoge MA. Burnout in the mental health workforce: a review. J Behav Health Serv Res. 2010;37(4):519–28. https://doi.org/10.1007/s11414-009-9202-2 .

Woltmann EM, Whitley R, McHugo GJ, Brunette M, Torrey WC, Coots L, et al. The role of staff turnover in the implementation of evidence-based practices in mental health care. Psychiatr Serv. 2008;59(7):732–7. https://doi.org/10.1176/ps.2008.59.7.732 .

Gabel S, Radigan M, Wang R, Sederer LI. Health monitoring and promotion among youths with psychiatric disorders: program development and initial findings. Psychiatr Serv. 2011;62(11):1331–7. https://doi.org/10.1176/ps.62.11.pss6211_1331 .

Karno M, Granholm E, Lin A. Factor structure of the alcohol use disorders identification test (audit) in a mental health clinic sample. J Stud Alcohol. 2000;61(5):751–8. https://doi.org/10.15288/jsa.2000.61.751 .

King PR, Beehler GP, Wade M, Buchholz LJ, Funderburk JS, Lilienthal KR, et al. Opportunities to improve measurement-based care practices in mental health care systems: a case example of electronic mental health screening and measurement. Fam Syst Health. 2018. https://doi.org/10.1037/fsh0000379 .

Ramo DE, Bahorik AL, Delucchi KL, Campbell CI, Satre DD. Alcohol and drug use, pain and psychiatric symptoms among adults seeking outpatient psychiatric treatment: latent class patterns and relationship to health status. J Psychoactive Drugs. 2018;50(1):43–53. https://doi.org/10.1080/02791072.2017.1401185 .

Satre D, Wolfe W, Eisendrath S, Weisner C. Computerized screening for alcohol and drug use among adults seeking outpatient psychiatric services. Psychiatr Serv. 2008;59(4):441–4. https://doi.org/10.1176/ps.2008.59.4.441 .

Silverman DC, O’Neill SF, Cleary PD, Barwick C, Joseph R. Recognition of alcohol abuse in psychiatric outpatients and its effect on treatment. Psychiatr Serv. 1992;43(6):644–6. https://doi.org/10.1176/ps.43.6.644 .

Denering LL, Spear SE. Routine use of screening and brief intervention for college students in a university counseling center. J Psychoactive Drugs. 2012;44(4):318–24. https://doi.org/10.1080/02791072.2012.718647 .

Stanhope V, Manuel JI, Jessell L, Halliday TM. Implementing SBIRT for adolescents within community mental health organizations: a mixed methods study. J Subst Abuse Treat. 2018;90:38–46. https://doi.org/10.1016/j.jsat.2018.04.009 .

King WM, Restar A, Operario D. Exploring multiple forms of intimate partner violence in a gender and racially/ethnically diverse sample of transgender adults. J Interpers Violence. 2021;36(19–20):10477–98. https://doi.org/10.1177/0886260519876024 .

Bush K, Kivlahan DR, McDonell MB, Fihn SD, Bradley KA, Ambulatory Care Quality Improvement Project (ACQUIP). The AUDIT alcohol consumption questions (AUDIT-C): an effective brief screening test for problem drinking. Arch Intern Med. 1998. https://doi.org/10.1001/archinte.158.16.1789 .

Knight JR, Sherritt L, Shrier LA, Harris SK, Chang G. Validity of the CRAFFT substance abuse screening test among adolescent clinic patients. Arch Pediatr Adolesc Med. 2002;156(6):607–14. https://doi.org/10.1001/archpedi.156.6.607 .

WHO Assist Working Group. The alcohol, smoking and substance involvement screening test (ASSIST): development, reliability and feasibility. Addiction. 2002;97(9):1183–94. https://doi.org/10.1046/j.1360-0443.2002.00185.x .

Selzer ML, Vinokur A, van Rooijen L. A self-administered short Michigan alcoholism screening test (SMAST). J Stud Alcohol. 1975;36(1):117–26. https://doi.org/10.15288/jsa.1975.36.117 .

McNeely J, Adam A, Rotrosen J, Wakeman SE, Wilens TE, Kannry J, et al. Comparison of methods for alcohol and drug screening in primary care clinics. JAMA Netw Open. 2021;4(5):e2110721. https://doi.org/10.1001/jamanetworkopen.2021.10721 .

Substance Abuse and Mental Health Services Administration. National Mental Health Services Survey (N-MHSS) 2020 data on mental health treatment facilities. 2020. https://www.samhsa.gov/data/report/national-mental-health-services-survey-n-mhss-2020-data-mental-health-treatment-facilities . Accessed 2 Aug. 2022.

Sundström C, Petersén E, Sinadinovic K, Gustafsson P, Berman AH. Identification and management of alcohol use and illicit substance use in outpatient psychiatric clinics in Sweden: a national survey of clinic directors and staff. Addict Sci Clin Pract. 2019;14(1):10. https://doi.org/10.1186/s13722-019-0140-x .

Adam A, Schwartz RP, Wu L-T, Subramaniam G, Laska E, Sharma G, et al. Electronic self-administered screening for substance use in adult primary care patients: feasibility and acceptability of the tobacco, alcohol, prescription medication, and other substance use (myTAPS) screening tool. Addict Sci Clin Pract. 2019;14(1):39. https://doi.org/10.1186/s13722-019-0167-z .

Spencer M, Warner M, Bastian BA, Trinidad JP, Hedegaard H. Drug overdose deaths involving fentanyl, 2011–2016. Natl Vital Stat Rep. 2019;68(3):1–19.

PubMed   Google Scholar  

Chisolm DJ, Gardner W, Julian T, Kelleher KJ. Adolescent satisfaction with computer-assisted behavioural risk screening in primary care. Child Adolesc Ment Health. 2008;13(4):163–8. https://doi.org/10.1111/j.1475-3588.2007.00474.x .

Paperny DM, Aono JY, Lehman RM, Hammar SL, Risser J. Computer-assisted detection and intervention in adolescent high-risk health behaviors. J Pediatr. 1990;116(3):456–62. https://doi.org/10.1016/s0022-3476(05)82844-6 .

Rhodes KV, Lauderdale DS, Stocking CB, Howes DS, Roizen MF, Levinson W. Better health while you wait: a controlled trial of a computer-based intervention for screening and health promotion in the emergency department. Ann Emerg Med. 2001;37(3):284–91. https://doi.org/10.1067/mem.2001.110818 .

Jimenez DE, Bartels SJ, Cardenas V, Alegría M. Stigmatizing attitudes toward mental illness among racial/ethnic older adults in primary care. Int J Geriatr Psychiatry. 2013;28(10):1061–8. https://doi.org/10.1002/gps.3928 .

Small J, Curran GM, Booth B. Barriers and facilitators for alcohol treatment for women: are there more or less for rural women? J Subst Abuse Treat. 2010;39(1):1–13. https://doi.org/10.1016/j.jsat.2010.03.002 .

Meirte J, Hellemans N, Anthonissen M, Denteneer L, Maertens K, Moortgat P, et al. Benefits and disadvantages of electronic patient-reported outcome measures: systematic review. JMIR Perioper Med. 2020;3(1):e15588. https://doi.org/10.2196/15588 .

Sisodia RC, Rodriguez JA, Sequist TD. Digital disparities: lessons learned from a patient reported outcomes program during the COVID-19 pandemic. JAMIA Open. 2021;28(10):2265–8. https://doi.org/10.1093/jamia/ocab138 .

Earnshaw VA, Bogart LM, Menino DD, Kelly JF, Chaudoir SR, Reed NM, et al. Disclosure, stigma, and social support among young people receiving treatment for substance use disorders and their caregivers: a qualitative analysis. Int J Ment Health Addict. 2019;17(6):1535–49. https://doi.org/10.1007/s11469-018-9930-8 .

Kelly JF, Greene MC, Abry A. A US national randomized study to guide how best to reduce stigma when describing drug-related impairment in practice and policy. Addiction. 2021;116(7):1757–67. https://doi.org/10.1111/add.15333 .

Simon KM, Harris SK, Shrier LA, Bukstein OG. Measurement-based care in the treatment of adolescents with substance use disorders. Child Adolesc Psychiatr Clin N Am. 2020;29(4):675–90. https://doi.org/10.1016/j.chc.2020.06.006 .

Fadus MC, Squeglia LM, Valadez EA, Tomko RL, Bryant BE, Gray KM. Adolescent substance use disorder treatment: an update on evidence-based strategies. Curr Psychiatry Rep. 2019;21(10):96. https://doi.org/10.1007/s11920-019-1086-0 .

Van Horn DHA, Goodman J, Lynch KG, Bonn-Miller MO, Thomas T, Del Re AC, et al. The predictive validity of the progress assessment, a clinician administered instrument for use in measurement-based care for substance use disorders. Psychiatry Res. 2020. https://doi.org/10.1016/j.psychres.2020.113282 .

Substance Abuse and Mental Health Services Administration. TIP 42: Substance use disorder treatment for people with co-occurring disorders. 2020; 42. https://store.samhsa.gov/product/tip-42-substance-use-treatment-persons-co-occurring-disorders/PEP20-02-01-004 . Accessed 28 July 2022.

Gagnier JJ, Lai J, Mokkink LB, Terwee CB. COSMIN reporting guideline for studies on measurement properties of patient-reported outcome measures. Qual Life Res. 2021. https://doi.org/10.1007/s11136-021-02822-4 .


Acknowledgements

We would like to acknowledge Melissa Lydston for her essential role in developing our search terms. We would like to acknowledge Sylvia Lanni for her important role in reviewing and editing our manuscript.

AY and TW received funding for this project from the National Institutes of Health through the NIH HEAL Initiative under award number 4UH3DA050252-0. In addition, AY was supported by research funding from the AACAP-NIDA Career Development Award in Substance Use Research (K12), Award Number 5K12DA000357-17, Boston University Doris Duke Charitable Foundation’s Fund to Retain Clinical Scientists, and a Boston University Clinical and Translational Science Institute voucher. She also has funding for clinical program development from the Jack Satter Foundation. CB receives funding from the Harvard Medical School Zinberg Fellowship in Addiction Psychiatry Research, the Massachusetts General Hospital Louis V. Gerstner Research Scholarship, and the AACAP-NIDA Career Development Award in Substance Use Research (K12), Award Number 3K12DA000357-22S1.

Author information

Authors and affiliations

Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA

Diana Woodward, Timothy E. Wilens, Vinod Rao & Colin Burke

Private Clinical Practice, Rockville, MD, USA

Meyer Glantz

Department of Psychiatry, Boston Medical Center, 850 Harrison Avenue, Boston, MA, 02118, USA

Amy M. Yule


Contributions

DW, TW, VR, and AY contributed to the development of the study protocol. DW screened the titles and abstracts of all papers. TW and AY reviewed included papers. VR extracted data from quantitative studies. All authors provided contributions to the writing of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Amy M. Yule .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

AY is a consultant to the Gavin House and BayCove Human Services (clinical services), as well as the American Psychiatric Association's Providers Clinical Support System Sub-Award. TW has been a consultant for Neurovance/Otsuka, Ironshore, KemPharm, and Vallon, and he has a licensing agreement with Ironshore for a copyrighted diagnostic questionnaire that he co-owns (Before School Functioning Questionnaire). TW also serves as a clinical consultant to the US National Football League (ERM Associates), US Minor/Major League Baseline, Phoenix House/Gavin Foundation, and Bay Cove Human Services. There are no disclosures to report for the remaining authors.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Search terms used in the systematic review of substance use screening in outpatient behavioral health settings.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Woodward, D., Wilens, T.E., Glantz, M. et al. A systematic review of substance use screening in outpatient behavioral health settings. Addict Sci Clin Pract 18, 18 (2023). https://doi.org/10.1186/s13722-023-00376-z


Received: 25 August 2022

Accepted: 15 March 2023

Published: 26 March 2023

DOI: https://doi.org/10.1186/s13722-023-00376-z


  • Substance use
  • Behavioral health

Addiction Science & Clinical Practice

ISSN: 1940-0640


Review Article

Structure-Based Virtual Screening: From Classical to Artificial Intelligence


  • 1 Laboratory of Pharmaceutical Medicinal Chemistry, Federal University of São João Del Rei, Divinópolis, Brazil
  • 2 Federal Center for Technological Education of Minas Gerais—CEFET-MG, Belo Horizonte, Brazil

The drug development process is a major challenge in the pharmaceutical industry, since it takes a substantial amount of time and money to move through all the phases of developing a new drug. One extensively used method to minimize the cost and time of the drug development process is computer-aided drug design (CADD). CADD allows better focusing of experiments, which can reduce the time and cost involved in researching new drugs. In this context, structure-based virtual screening (SBVS) is a robust and useful approach and one of the most promising in silico techniques for drug design. SBVS attempts to predict the best interaction mode between two molecules to form a stable complex, and it uses scoring functions to estimate the strength of the non-covalent interactions between a ligand and a molecular target. Scoring functions are thus the main reason for the success or failure of SBVS software. Many software programs are used to perform SBVS, and since they use different algorithms, it is possible to obtain different results from different software using the same input. In the last decade, a technique called consensus virtual screening (CVS) has been used in some studies to increase the accuracy of SBVS and to reduce the false positives obtained in these experiments. An indispensable condition for SBVS is the availability of a 3D structure of the target protein. Some virtual databases, such as the Protein Data Bank, have been created to store the 3D structures of molecules. However, when it is not possible to obtain the 3D structure experimentally, homology modeling allows the prediction of the 3D structure of a protein from its amino acid sequence.
This review presents an overview of the challenges involved in the use of CADD to perform SBVS, the areas where CADD tools support SBVS, a comparison between the most commonly used tools, and the techniques currently used in an attempt to reduce the time and cost of the drug development process. The final considerations underline the importance of using SBVS in the drug development process.

Introduction

In the past, new drugs were discovered through random screening and empirical observations of the effects of natural products on known diseases.

This random screening process, although inefficient, led to the identification of several important compounds until the 1980s. Currently, this process is improved by high-throughput screening (HTS), which automates the screening of many thousands of compounds against a molecular target or cellular assay very quickly. A milestone for HTS was the identification of cyclosporine A as an immunosuppressant (von Wartburg and Traber, 1988). Subsequently, several drugs such as nevirapine (Merluzzi et al., 1990), gefitinib (Ward et al., 1994), and maraviroc (Wood and Armour, 2005) have reached the market. Notably, gefitinib was discovered by computational methods through a collection of 1500 compounds screened with the ALLADIN software (Martin, 1992). In addition, computational methods have been used to search for active compounds against malaria (Nunes et al., 2019). The structures of these molecules are shown in Figure 1.


Figure 1. Examples of structures identified by HTS. (A) Cyclosporine A, (B) Nevirapine, (C) Gefitinib, (D) Clioquinol, and (E) Maraviroc.

The evolution and increased availability of medicines over the last century have led to an improvement in the quality of life of the world population. However, while the average quality of life has improved, a third of the population is still without access to essential medicines, which means that more than 2 billion people cannot afford to buy basic medicines (Leisinger et al., 2012). This problem is even worse in some places in Africa and Asia, where more than 50% of the people face problems obtaining medicines (Leisinger et al., 2012). Moreover, more than 18 million deaths that occur every year throughout the world could be avoided, many of them related to poverty and lack of access to essential medicines (Sridhar, 2008). The price of many medicines remains unaffordable for limited-income populations and middle-income countries (Stevens and Huys, 2017).

While there is a need to increase the population's access to medicines, the pharmaceutical industry is facing unprecedented challenges in its business model (Paul et al., 2010). The current process of developing new drugs began to mature only in the second half of the twentieth century. It evolved from observations correlating certain physicochemical properties of organic molecules with biological potency, and optimization of these compounds by the incorporation of more favorable substituents resulted in more potent drugs. X-ray crystallography and nuclear magnetic resonance (NMR) techniques have provided information on the structures of enzymes and drug receptors, and many drugs, such as angiotensin-converting-enzyme (ACE) inhibitors, have been introduced into clinical practice on the basis of this structural information.

The drug development process aims to identify bioactive compounds to assist in the treatment of diseases. In summary (Figure 2), the process starts with the identification of molecular targets for a given compound (natural or synthetic), followed by their validation. Then, virtual screening (VS) can be used to assist in hit identification (identification of active drug candidates) and lead optimization (in which biologically active compounds are transformed into appropriate drugs by improving their physicochemical properties). Finally, these optimized leads undergo preclinical and clinical trials before ultimately being approved by regulatory bodies (Lima et al., 2016).


Figure 2 . Drug development timeline.

In general, this process is time-consuming, laborious and expensive. The development of a new drug has an average cost between 1 and 2 billion USD and can take 10–17 years (Leelananda and Lindert, 2016), since it has to move through all the phases of new drug development, from target discovery to drug registration. Moreover, Arrowsmith (2012) showed that the probability of a drug candidate reaching the market after entering Phase I clinical trials fell from 10% in the 2002–2004 period to approximately 5% between 2006 and 2008, a 50% decrease in just four years.

Thus, researchers are constantly investing in the development of new methods to increase the efficiency of the drug discovery process (Hillisch et al., 2004). The computer-aided drug design (CADD) approach, which employs molecular modeling techniques, has been used to increase efficiency in the development of new drugs through in silico simulations. Molecular modeling allows the analysis of many molecules in a short period of time, showing how they interact with targets of pharmacological interest even before their synthesis. The technique allows the simulation and prediction of several essential factors, such as toxicity, activity, bioavailability and efficacy, even before the compound undergoes in vitro testing, thus allowing better planning and direction of the research (Ferreira et al., 2011). Better planning means, in this case, fewer in vitro and in vivo experiments, which reduces the overall research time and costs.

In this context, virtual screening (VS) is a promising in silico technique used in the drug discovery process. An indispensable condition for performing virtual screening is the availability of a 3D structure of the target protein (Cavasotto, 2011); therefore, some virtual databases were created to store 3D structures of molecules. Virtual screening is now widely applied in the development of new drugs and has already contributed to compounds on the market. Examples of drugs that came to the market with the assistance of VS include captopril (an antihypertensive drug), saquinavir, ritonavir, and indinavir (three drugs for the treatment of human immunodeficiency virus), tirofiban (a fibrinogen antagonist), dorzolamide (used to treat glaucoma), zanamivir (a selective antiviral for influenza virus), aliskiren (an antihypertensive drug), boceprevir (a protease inhibitor used for the treatment of hepatitis C), and nolatrexed (in phase III clinical trials for the treatment of liver cancer) (Talele et al., 2010; Sliwoski et al., 2013; Devi et al., 2015; Nunes et al., 2019). The structures of these molecules are shown in Figures 3, 4.


Figure 3 . Drugs that came to the market with the assistance of VS: (A) Captopril, (B) Saquinavir, (C) Tirofiban, (D) Indinavir, (E) Ritonavir.


Figure 4. Drugs that came to the market with the assistance of VS. (A) Dorzolamide, (B) Zanamivir, (C) Aliskiren, (D) Boceprevir, (E) Nolatrexed.

This review presents an overview of the challenges involved in the development of new drugs. Section Computer-Aided Drug Design (CADD) describes CADD, while section Virtual Screening (VS) demonstrates how VS has been used in the process of developing new drugs. Section Structure-Based Virtual Screening (SBVS) explains SBVS itself, including the search algorithms and the main scoring functions used in recent scientific research. Section Consensus Docking explains consensus docking, a relatively unexplored topic in the virtual screening process. Section Virtual Databases lists the main virtual databases used in this task, and section Virtual Screening Algorithms presents the main VS algorithms used. Section Methods of Evaluating the Quality of a Simulation presents evaluation methods used to verify the quality of a model or simulation, and section VS Software Programs presents the main VS software currently used. Section Final Considerations closes the review.

Computer-Aided Drug Design (CADD)

One approach used to increase the effectiveness of the development of new drugs is computer-aided drug design (CADD, also known as an in silico method), which applies a computational chemistry approach to the drug discovery process. CADD is a cyclic process for developing new drugs, in which all stages of design and analysis are performed by computer programs operated by medicinal chemists (Oglic et al., 2018).

Strategies for CADD may vary, depending on what information about the target and ligand is available. In the early stages of the drug development process, it is normal for little or no information to exist about the target, the ligands, or their structures. CADD techniques can help obtain this information, such as which proteins can be targeted in pathogenesis and which ligands might inhibit these proteins. Kapetanovic (2008) briefly notes that CADD comprises (i) making the drug discovery and development process faster with the contribution of in silico simulations; (ii) optimizing and identifying new drugs using the computational approach to discover chemical and biological information about possible ligands and/or molecular targets; and (iii) using simulations to eliminate compounds with undesirable properties and selecting candidates with more chances of success. Recent software uses empirical molecular mechanics, quantum mechanics and, more recently, statistical mechanics; this last advancement allows the explicit effects of solvents to be incorporated (Das and Saha, 2017).

CADD gained prominence because it provides information about the specific properties of a molecule that can influence its interaction with the receptor, and it has therefore been considered a useful tool in the rational design and discovery of new bioactive compounds. On the other hand, CADD simulations can require a high computational cost, taking up to weeks when long molecular dynamics jobs are run. Therefore, it is a continuous challenge to find viable solutions that reduce the simulation runtime while simultaneously increasing the accuracy of the simulations (Ripphausen et al., 2011). In this context, VS is a promising approach.

Virtual Screening (VS)

Popular VS techniques originated in the 1980s, but the first publication about VS only appeared in 1997 (Horvath, 1997). In recent times, VS techniques have been shown to be an excellent alternative to high-throughput screening, especially in terms of cost-effectiveness and the probability of finding the most appropriate result in a large virtual database (Surabhi and Singh, 2018).

VS is an in silico technique used in the drug discovery process. During VS, large databases of known 3D structures are automatically evaluated using computational methods (Maia et al., 2017). VS works like a funnel, selecting the most promising molecules for in vitro assays. In the example shown in Figure 5, a virtual screening is performed on 500 possible active ligands for a target. First, VS with AutoDock Vina (Trott and Olson, 2009) is carried out and the top 50 ligands are selected. Then, a VS using DOCK 6 (Allen et al., 2015) with the Amber scoring function is performed; DOCK 6 with the Amber scoring function takes longer, because it performs molecular dynamics, but it promises better results. Finally, the top 5 active compounds are selected to be purchased and then tested in vitro. With the use of VS, the identified molecules are expected to be more likely to bind to the molecular target, which is typically a protein or enzyme receptor. Therefore, VS assists in identifying the most promising hits able to bind to the target protein or enzyme receptor, and only the most promising molecules are synthesized. In addition, VS can flag compounds that may be toxic or have unfavorable pharmacodynamic (for example, potency, affinity, selectivity) or pharmacokinetic (for example, absorption, metabolism, bioavailability) properties. Thus, VS techniques play a prominent role among strategies for the identification of new bioactive substances (Berman et al., 2013).

www.frontiersin.org

Figure 5 . VS scheme.
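The screening funnel described above can be sketched as a simple two-stage ranking pipeline. Everything below is illustrative: the ligand names and docking scores are invented stand-ins for the output of tools such as AutoDock Vina and DOCK 6.

```python
# Sketch of a two-stage VS funnel: a fast first pass over the whole library,
# then a slower, more accurate rescoring of the survivors. All scores are
# hypothetical; more negative means a more favorable predicted binding.

def select_top(scored_ligands, k):
    """Keep the k ligands with the most negative (best) docking score."""
    return sorted(scored_ligands, key=lambda pair: pair[1])[:k]

# Stage 1: fast docking of 500 candidate ligands (name, score in kcal/mol).
library = [("lig%03d" % i, -4.0 - (i % 70) * 0.1) for i in range(500)]
top50 = select_top(library, 50)

# Stage 2: slower rescoring of the 50 survivors (simulated here by a
# uniform shift; a real pipeline would rerun docking with a better method).
rescored = [(name, score - 0.5) for name, score in top50]
top5 = select_top(rescored, 5)

print([name for name, _ in top5])  # the 5 compounds to purchase and test
```

Only these final five compounds would go on to in vitro assays, which is where the cost savings of the funnel come from.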

VS for drug discovery is becoming an essential tool to assist in fast and cost-effective lead discovery and drug optimization (Maia et al., 2017). This technique can aid in the discovery of bioactive molecules, since it allows the selection of the compounds in a structure database that are most likely to show biological activity against a target of interest. After identification, these bioactive molecules undergo biological assays. In addition, there are VS techniques using machine learning methods that predict compounds with specific pharmacodynamic, pharmacokinetic or toxicological properties based on structural and physicochemical properties derived from the ligand structure (Ma et al., 2009). Hence, VS tools play a prominent role among the strategies used for the identification of new bioactive substances, since they increase the speed of the drug discovery process by automatically evaluating large compound libraries through computational simulations (Maithri and Narendra, 2016).

Structure-based virtual screening (SBVS) is a robust, useful and promising in silico technique for drug design (Lionta et al., 2014). Therefore, this review will address SBVS, although there are other types of VS, such as ligand-based virtual screening (Banegas-Luna et al., 2018) and fragment-based virtual screening (Wang et al., 2015).

Structure-Based Virtual Screening (SBVS)

Structure-based virtual screening (SBVS), also known as target-based virtual screening (TBVS), attempts to predict the best interaction between ligands and a molecular target to form a complex. As a result, the ligands are ranked according to their affinity for the target, and the most promising compounds appear at the top of the list. SBVS methods require that the 3D structure of the target protein be known so that the interactions between the target and each chemical compound can be predicted in silico (Liu et al., 2018). In this strategy, the compounds are selected from a database and classified according to their affinity for the receptor site.

Among the techniques of SBVS, molecular docking is noteworthy due to its low computational cost and the good results it achieves (Meng et al., 2011). This technique emerged in the 1980s, when Kuntz et al. (1982) designed and tested a set of algorithms that could explore the geometrically feasible alignments of a ligand and target. Although the approach was promising, it only became widely used in the 1990s, after improvements in the techniques together with an increase in computational power and greater access to the structural data of target molecules. During the execution of SBVS, the evaluated molecules are sorted according to their affinity for the receptor site; hence, it is possible to identify the ligands that are most likely to show pharmacological activity against the molecular target. Scoring functions are used to estimate, for a given binding pose, the affinity between the ligand and the target, and a reliable scoring function is the critical component of the docking process (Leelananda and Lindert, 2016).

The use of SBVS has advantages and disadvantages. Among the advantages are the following:

I There is a decrease in the time and cost involved in the screening of millions of small molecules.

II There is no need for the physical existence of the molecule, so it can be tested computationally even before being synthesized.

III There are several tools available to assist SBVS.

The main disadvantages are the following:

I Some tools work best in specific cases, but not in more general cases ( Lionta et al., 2014 ).

II It is difficult to accurately predict the correct binding position and classification of compounds due to the difficulty of parameterizing the complexity of ligand-receptor binding interactions.

III It can generate false positives and false negatives.

Despite the disadvantages noted above, many studies using SBVS have been developed in recent years (Carregal et al., 2017; Mugumbate et al., 2017; Wójcikowski et al., 2017; Carpenter et al., 2018; Dutkiewicz and Mikstacka, 2018; Surabhi and Singh, 2018; Nunes et al., 2019), which shows that, despite its disadvantages, SBVS is still widely used for developing drugs due to the reduction of time and cost it offers. However, well-designed docking protocols are essential for achieving accurate SBVS. These protocols are composed of two main components: the search algorithm and the scoring function.

Search Algorithms

Search algorithms are used to systematically explore ligand orientations and conformations at the binding site. A good docking protocol will generate the most viable ligand conformations, in addition to the most realistic position of the ligand at the binding site.

Thus, the search algorithm explores different positions of the ligand at the active binding site using translational and rotational degrees of freedom in the case of rigid docking, while flexible docking adds conformational degrees of freedom to the translations and rotations of the ligand. To predict the correct conformation of ligands, search algorithms adopt various techniques, such as checking the chemistry and geometry of the atoms involved [DOCK 6 (Allen et al., 2015), FLEXX (Rarey et al., 1996)], genetic algorithms [GOLD (Verdonk et al., 2003)] and incremental construction (Friesner et al., 2004). Algorithms that consider ligand flexibility can be divided into three types: systematic, stochastic and deterministic (Ruiz-Tagle et al., 2018). Some software uses more than one of these approaches to obtain better results.

Systematic search algorithms exploit the degrees of freedom of the molecules, usually through their incremental construction at the binding site. Increasing the number of degrees of freedom (rotatable bonds) increases the number of evaluations the algorithm must perform, and hence the time required for its execution. To reduce the runtime, termination criteria are introduced that prevent the algorithm from exploring regions of the search space known to lead to poor solutions. DOCK 6 (Allen et al., 2015), FLEXX (Rarey et al., 1996), and Glide (Friesner et al., 2004) are examples of software that use systematic search algorithms.
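As a toy illustration of the combinatorial cost of a systematic search, the sketch below exhaustively scans a single rotatable bond on a 10-degree grid. The one-dimensional energy function is invented for the example and stands in for a real scoring function.

```python
import math

def torsion_energy(angle_deg):
    # Hypothetical torsion profile with its minimum at 180 degrees.
    return math.cos(math.radians(angle_deg)) + 1.0

# Systematic enumeration: 36 evaluations for one rotatable bond.
angles = range(0, 360, 10)
best = min(angles, key=torsion_energy)
print(best, round(torsion_energy(best), 3))
```

With n rotatable bonds the same grid would cost 36**n evaluations, which is why real systematic algorithms rely on incremental construction and pruning rather than brute-force enumeration.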

Stochastic search algorithms perform random changes in the spatial conformation of the ligand, usually changing one degree of freedom of the system at a time, which leads to the exploration of many possible conformations (Ruiz-Tagle et al., 2018). The main problem of stochastic algorithms is the uncertainty of converging to a good solution; to minimize this problem, several independent runs of a stochastic algorithm are usually performed. Examples of stochastic search algorithms are the Monte Carlo (MC) methods used by Glide (Friesner et al., 2004) and MOE (Vilar et al., 2008) and the genetic algorithms used by GOLD (Verdonk et al., 2003) and AutoDock4 (Morris et al., 2009).
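A minimal Metropolis Monte Carlo sketch of such a stochastic search over one torsion angle is shown below. Real docking programs operate on far more degrees of freedom; the energy function, step size and temperature here are invented for illustration.

```python
import math
import random

def energy(angle_deg):
    # Hypothetical torsion profile with its minimum at 180 degrees.
    return math.cos(math.radians(angle_deg)) + 1.0

random.seed(42)                      # fixed seed for reproducibility
angle, temperature = 0.0, 0.5
best_angle, best_e = angle, energy(angle)

for _ in range(5000):
    trial = (angle + random.uniform(-30.0, 30.0)) % 360.0
    delta = energy(trial) - energy(angle)
    # Metropolis criterion: always accept downhill moves, and accept
    # uphill moves with Boltzmann probability so barriers can be crossed.
    if delta <= 0 or random.random() < math.exp(-delta / temperature):
        angle = trial
        if energy(angle) < best_e:
            best_angle, best_e = angle, energy(angle)

print(round(best_angle, 1), round(best_e, 3))
```

Because a single run may still converge poorly, the independent-runs strategy mentioned above simply repeats this loop with different seeds and keeps the best result.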

During the execution of a deterministic search algorithm, the initial state determines the move that can be made to generate the next state, whose energy must generally be equal to or lower than that of the initial state. One problem with deterministic algorithms is that they are often trapped in local minima because they cannot cross energy barriers; approaches such as increasing the simulation temperature can be used to circumvent this problem. Energy minimization methods are one example of deterministic algorithms; molecular dynamics (MD), used by DOCK 6 ( Allen et al., 2015 ), is another. However, the computational demands of MD are very high: while MD promises better results and ensures full-system flexibility, its runtime becomes a limiting factor because structure databases can contain millions of ligands and targets.
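As a deterministic counterpart, here is a minimal energy-minimization sketch; the `toy_energy` surface and its gradient are hypothetical illustrations, not the force fields used by the cited programs. Because every step strictly follows the gradient, the trajectory is fully determined by the initial state, and on a rugged surface it would stop at the nearest local minimum.

```python
def toy_energy(x, y):
    # Hypothetical energy surface; global minimum at (1.0, -2.0).
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

def gradient(x, y):
    # Analytical gradient of toy_energy.
    return 2.0 * (x - 1.0), 2.0 * (y + 2.0)

def minimize(x, y, lr=0.1, steps=200):
    """Deterministic search: each state fully determines the next one,
    and the energy never increases along the trajectory."""
    for _ in range(steps):
        gx, gy = gradient(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

x, y = minimize(5.0, 5.0)  # converges to the (only) minimum of this surface
```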

Scoring Functions

Molecular docking software uses scoring functions to estimate the strength of non-covalent interactions between a ligand and a molecular target using mathematical methods. A scoring function is one of the most important components in SBVS ( Huang et al., 2010 ) as it is primarily responsible for predicting the binding affinity between a target and its candidate ligand. Thus, scoring functions are the main reason for the success or failure of docking tools ( ten Brink and Exner, 2009 ). Despite their wide use, however, estimating the interaction strength between a ligand and a molecular target remains a major challenge in VS. Figure 6 illustrates docking using AutoDock Vina between cyclooxygenase-2 (PDB ID: 4PH9) and two ligands: (a) an inactive ligand and (b) celecoxib (an anti-inflammatory). Compared to the inactive ligand, celecoxib forms many more interactions with the protein and therefore a more stable binding pose, leading the AutoDock Vina scoring function to assign a binding energy of −10.4 kcal/mol to celecoxib versus −5.4 kcal/mol to the inactive compound. The ligand with the highest binding affinity to the target can be selected for further testing; in this case, celecoxib would be chosen.


Figure 6 . Identification of a ligand candidate by using a typical scoring function. The hydrogens were omitted for better visualization. (A) Inactive ligand, (B) celecoxib.

In general, there are three important applications of scoring functions in molecular docking. First, they can be used to determine the ligand binding site and the conformation between a target and ligand. This approach can be used to search for allosteric sites. Second, they can be used to predict the binding affinity between a protein and ligand. Third, they can also be used in lead optimization ( Li et al., 2013 ).

Most authors classify scoring functions into three types ( Huang et al., 2010 ; Ferreira et al., 2015 ; Haga et al., 2016 ): force field (FF), empirical and knowledge-based. Liu and Wang (2015) define two additional types: machine-learning-based and hybrid methods.

The force field scoring functions are based on the intermolecular interactions between the ligand and target atoms, such as the van der Waals, electrostatic and bond stretching/bending/torsional force interactions, obtained from experimental data and in accordance with the principles of molecular mechanics ( Ferreira et al., 2015 ). Some published force-field scoring functions include the ones described in Li et al. (2015) , Goldscore ( Verdonk et al., 2003 ), and Sybyl/D-Score ( Ash et al., 1997 ).

Empirical scoring functions estimate the binding free energy as a weighted sum of structural parameters, with the weights obtained by fitting the scoring function to experimentally determined binding constants of a set of complexes ( Ferreira et al., 2015 ). To create an empirical scoring function, a training set of protein-ligand complexes with known affinities is first assembled; a linear regression is then performed to predict the values of the relevant variables ( Huang et al., 2010 ). The weight constants generated by the fit are used as coefficients for the terms of the equation. Each term of the function describes a type of physical event involved in the formation of the ligand-receptor complex; hydrogen bonding, ionic bonding, non-polar interactions, desolvation and entropic effects are thus considered. Some popular empirical scoring functions include Glide-Score ( Friesner et al., 2004 ), Sybyl-X/F-score ( Certara, 2016 ) and the DOCK 6 empirical force field ( Allen et al., 2015 ).
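The calibration step can be sketched as an ordinary least-squares fit. In this hypothetical example only two terms (hydrogen bonds and lipophilic contacts) are used, and the "experimental" affinities are generated from assumed weights so the fit can be checked; a real empirical function would use many more terms fitted to noisy measured constants.

```python
def fit_empirical_weights(descriptors, affinities):
    """Fit w1, w2 in score = w1*hbonds + w2*lipophilic by ordinary
    least squares (2x2 normal equations), mimicking how empirical
    scoring functions are calibrated against binding constants."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for (h, l), y in zip(descriptors, affinities):
        s11 += h * h; s12 += h * l; s22 += l * l
        b1 += h * y;  b2 += l * y
    det = s11 * s22 - s12 * s12
    w1 = (b1 * s22 - b2 * s12) / det
    w2 = (s11 * b2 - s12 * b1) / det
    return w1, w2

# Hypothetical training set: (hydrogen bonds, lipophilic contacts) per complex.
# The affinities are generated from assumed "true" weights -1.5 and -0.3
# so the recovered coefficients can be verified.
train = [(3, 10), (1, 25), (5, 5), (2, 18), (4, 12)]
y = [-1.5 * h - 0.3 * l for h, l in train]
w1, w2 = fit_empirical_weights(train, y)
```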

In knowledge-based scoring functions, the binding affinity is calculated by summing pairwise interactions between the atoms of the ligand and those of the molecular target ( Ferreira et al., 2015 ). These functions rely on statistical observations over large databases ( Ferreira et al., 2015 ): pairwise energy potentials are extracted from known ligand-receptor complexes to obtain a general scoring function. The underlying assumption is that intermolecular interactions between types of atoms or functional groups that occur more frequently are more likely to contribute favorably to the binding affinity. The final score is the sum of the scores of all individual interactions. One example of software that uses a knowledge-based scoring function is ParaDockS ( Meier et al., 2010 ).

In addition, machine-learning-based methods ( Liu and Wang, 2015 ) have been considered a fourth type of scoring function, and have gained attention for their reliable predictions ( Pereira et al., 2016 ; Chen et al., 2018 ). Many researchers have used machine learning to improve SBVS algorithms, although we are not aware of any approved drug developed by combining SBVS with machine learning; notably, however, machine learning techniques have been applied to discover a new antibiotic capable of inhibiting the growth of E. coli ( Stokes et al., 2020 ). These techniques have been used in quantitative structure-activity relationship (QSAR) analysis to predict various physical-chemical (for example, hydrophobicity and stereochemistry), biological (for example, activity and selectivity), and pharmaceutical (for example, absorption and metabolism) properties of small-molecule compounds. In these scoring functions, modern QSAR analyses are applied to derive statistical models that calculate protein-ligand binding scores. Some scoring functions of this type are NNScore 2.0 ( Durrant and McCammon, 2011 ), RF-Score-VS ( Wójcikowski et al., 2017 ), SFCscoreRF ( Zilian and Sotriffer, 2013 ), SVR-KB ( Li et al., 2011 ), SVR-EP ( Li et al., 2011 ), ID-Score ( Li et al., 2013 ) and CScore ( Ouyang et al., 2011 ).
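As a minimal illustration of a machine-learning-based score, the sketch below uses a k-nearest-neighbors regressor over a hypothetical descriptor space. Published functions such as RF-Score use far richer features and models (for example, random forests), so this conveys only the general idea of learning affinities from known complexes; every descriptor and value here is invented for demonstration.

```python
def knn_predict(train, query, k=3):
    """Machine-learning-style scoring sketch: predict binding affinity as
    the mean affinity of the k nearest training complexes in descriptor
    space (Euclidean distance)."""
    dists = []
    for features, affinity in train:
        d = sum((a - b) ** 2 for a, b in zip(features, query)) ** 0.5
        dists.append((d, affinity))
    dists.sort(key=lambda t: t[0])
    return sum(a for _, a in dists[:k]) / k

# Hypothetical training complexes: (hbond count, rotatable bonds, logP) -> pKd.
train = [((4, 2, 1.0), 7.5), ((5, 1, 0.5), 8.1), ((1, 8, 4.0), 4.2),
         ((0, 9, 4.5), 3.9), ((4, 3, 1.2), 7.2), ((2, 7, 3.5), 4.8)]
pred = knn_predict(train, (5, 2, 0.8), k=3)  # query resembles the tight binders
```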

There are some hybridized scoring functions that cannot easily be classified into any of the categories listed above because they combine two or more of the previously defined scoring function types [force field (FF), empirical, knowledge-based and machine-learning-based] into one scoring function. Therefore, they are called hybrid scoring functions. In general, a hybrid scoring function is a linear combination of two or more scoring function components derived from a multiple linear regression fitting procedure ( Tanchuk et al., 2016 ). For example, the GalaxyDock score function is a hybrid of physics-based, empirical, and knowledge-based score terms that has the advantages of each component; as a result, its performance was improved in decoy pose discrimination tests ( Baek et al., 2017 ). A few recently published examples of this type of scoring function include the hybrid scoring function developed by Tanchuk et al. ( Tanchuk et al., 2016 ), which combines force-field and machine-learning scoring functions; SMoG2016 ( Geng et al., 2019 ), which combines knowledge-based and empirical scoring functions; GalaxyDock BP2 ( Baek et al., 2017 ), which combines force-field, empirical, and knowledge-based scoring functions; and iScore ( Geng et al., 2019 ), which combines empirical and force-field scoring functions.

Consensus Docking

In the last decade, a new technique of VS called consensus docking (CD) has been used in some studies ( Park et al., 2014 ; Tuccinardi et al., 2014 ; Chermak et al., 2016 ; Poli et al., 2016 ; Aliebrahimi et al., 2017 ) to increase the accuracy of VS studies and to reduce the false positives obtained in VS experiments ( Aliebrahimi et al., 2017 ).

This technique is a combination of two different approaches, in which the resultant combination is better than a single approach alone. However, Poli et al. (2016) reported that there are few studies that evaluate the possibility of combining the results from different VS methods to achieve higher success rates in VS studies.

Houston and Walkinshaw (2013) described the main reason for using this combination: the individual program may present incorrect results and these errors are mostly random. Therefore, even when two programs present different results, the combination of these results may, in principle, be much closer to the correct answer than even the best program alone. Houston and Walkinshaw also suggest that CD approaches using two different docking programs improve the precision of the predicted binding mode for any VS study. The same study also verified that a greater level of consensus in a given pose indicates a greater reliability in this result. Finally, the results presented by the authors suggest that the CD approach works as well as the best VS approaches available in the literature.
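A simple consensus scheme, combining two programs' results by average rank, can be sketched as follows. The ligand names and binding energies are hypothetical, and real CD studies may apply more elaborate pose-level consensus criteria than this rank average.

```python
def consensus_rank(scores_a, scores_b):
    """Combine two docking programs' scores by average rank (lower
    binding energy = better in both programs), a simple
    consensus-docking scheme."""
    def ranks(scores):
        order = sorted(scores, key=scores.get)   # most negative energy first
        return {lig: r + 1 for r, lig in enumerate(order)}
    ra, rb = ranks(scores_a), ranks(scores_b)
    avg = {lig: (ra[lig] + rb[lig]) / 2 for lig in scores_a}
    return sorted(avg, key=avg.get)              # best consensus ligand first

# Hypothetical binding energies (kcal/mol) from two docking programs.
prog_a = {"celecoxib": -10.4, "decoy1": -5.4, "decoy2": -6.1}
prog_b = {"celecoxib": -9.8, "decoy1": -6.0, "decoy2": -5.2}
ordering = consensus_rank(prog_a, prog_b)
```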

Park et al. (2014) used a combination of the AutoDock 4.2 ( Morris et al., 2009 ) and FlexX ( Rarey et al., 1996 ) programs. These programs were chosen because they use different types of scoring functions (force field in AutoDock and empirical in FlexX). In this study, consensus docking achieved superior performance to either program alone.

A drawback of using two different VS programs is the extra time needed to run both tools and combine their results. However, Houston and Walkinshaw (2013) showed that the increase in runtime can be modest; using AutoDock Vina ( Trott and Olson, 2009 ) in a VS approach along with AutoDock4 ( Morris et al., 2009 ) increased the final runtime by only ~10%. This combination is attractive given the potential gains from its use.

Consensus docking is thus a recent technique, and although there are still few papers on the subject, it appears to be a promising approach for future VS studies.

Virtual Databases

An indispensable condition for performing VS is the availability of 3D structures of the target protein ( Cavasotto, 2011 ) and of the ligands to be docked. Several databases have been created to store 3D structures of molecules. Free databases include the Protein Data Bank (PDB) ( Berman et al., 2013 ), PubChem ( Kim et al., 2016 ), ChEMBL ( Bento et al., 2014 ), ChemSpider ( Pence and Williams, 2010 ), Zinc ( Sterling and Irwin, 2015 ), Brazilian Malaria Molecular Targets (BraMMT) ( Nunes et al., 2019 ), Drugbank ( Wishart et al., 2018 ), and Our Own Molecular Targets (OOMT) ( Carregal et al., 2013 ). In addition, there are commercially available databases such as the MDL Drug Data Report 1 . Below we briefly describe each of these databases:

• Protein Data Bank (PDB) ( Berman et al., 2013 ): PDB is the public database where three-dimensional structures of proteins, nucleic acids, and complexes have been deposited since 1971. The worldwide PDB organization ensures that PDB files are publicly available to the global community. The database is widely used by the academic community and has grown consistently: over the last 10 years, the number of 3D structures in the PDB increased from 48,169 at the end of 2008 to 147,604 at the end of 2018, an increase of nearly 207%. This corresponds to roughly 9,943 new structures per year, or just over 27 structures per day on average. The pace of this growth has accelerated: at the beginning of the decade approximately 25 new entries were added per day on average, whereas in 2018 over 31 new structures were added per day, about 24% more than in 2010.

• PubChem ( Kim et al., 2016 ): PubChem is a public database, aggregating information from smaller, more specific databases. It has more than 97 million compounds available.

• ChEMBL ( Bento et al., 2014 ): ChEMBL is a database of bioactive molecules with medicinal properties maintained by the European Institute of Bioinformatics (EBI) of the European Molecular Biology Laboratory (EMBL). Currently, it has almost 2.3 million compounds and 15.2 million known biological activities.

• Zinc ( Sterling and Irwin, 2015 ): Zinc is a free database of commercially available compounds for VS. Zinc has more than 230 million commercially available compounds in the 3D format. Zinc is maintained by Irwin and Shoichet Laboratories of the Department of Pharmaceutical Chemistry at the University of California, San Francisco (UCSF).

• NatProDB ( Paixão and Pita, 2016 ): NatProDB, made available by the State University of Feira de Santana, stores 3D structures of compounds from the Brazilian semiarid biome. The pharmacological profile of compounds from the semiarid flora has not yet been studied, which has motivated our research group to investigate their molecular targets ( Taranto et al., 2015 ).

• Our Own Molecular Target (OOMT) ( Carregal et al., 2013 ): OOMT is a special molecular target database because it has the biological assay for all its molecular targets, and includes specific targets for cancer, dengue, and malaria. OOMT was created by a group of researchers from Federal University of São João del-Rei (UFSJ).

• Brazilian Malaria Molecular Targets (BraMMT) ( Nunes et al., 2019 ): The BraMMT database comprises thirty-five molecular targets of Plasmodium falciparum retrieved from the PDB. It allows in silico virtual high-throughput screening (vHTS) experiments against a pool of P. falciparum molecular targets.

• Drugbank ( Wishart et al., 2018 ): DrugBank is a database that contains comprehensive molecular information about drugs, their mechanisms, their interactions, and their targets. The database contains more than 11,900 drug entries, including nearly 2,538 FDA-approved small molecule drugs, 1,670 biotechnology (protein / peptide) drugs approved by the FDA, 129 nutraceuticals and nearly 6,000 investigational drugs.

Commercially available Databases:

• MDL Drug Data Report (MDDR) ( Sci Tegic Accelrys Inc, 2019 ): MDDR is a commercial database built from patent databases, publications and congresses. It has more than 260,000 biologically relevant compounds and approximately 10,000 compounds are added every year.

• ChemSpider ( Pence and Williams, 2010 ): ChemSpider is a database of chemical substances owned by the Royal Society of Chemistry. It has more than 71 million chemical structures from over 250 data sources. ChemSpider allows downloading up to 1000 structures per day. Previous contact is needed for the download of more structures, and ChemSpider is therefore not a totally free database.

Virtual Screening Algorithms

In VS, we are targeting proteins in the human body to find novel ligands that will bind to them. VS can be divided into two classes: structure-based and ligand-based. In structure-based virtual screening, a 3D structure of the target protein is known, and the goal is to identify ligands from a database of candidates that will have better affinity with the 3D structure of the target. VS can be performed using molecular docking, a computational process where ligands are moved in 3D space to find a configuration of the target and ligand that maximizes the scoring function. The ligands in the database are ranked according to their maximum score, and the best ones can be investigated further, e.g., by examining the mode and type of interaction that occurs. Additionally, VS techniques can be divided according to the algorithms used as follows:

• Machine Learning-based Algorithms

• Artificial neural networks (ANNs) ( Ashtawy and Mahapatra, 2018 );

• Support vector machines ( Sengupta and Bandyopadhyay, 2014 );

• Bayesian techniques ( Abdo et al., 2010 );

• Decision tree ( Ho, 1998 );

• k-nearest neighbors (kNN) ( Peterson et al., 2009 );

• Kohonen's SOMs and counterpropagation ANNs ( Schneider et al., 2009 );

• Ensemble methods using machine learning ( Korkmaz et al., 2015 );

• Evolutionary Algorithms

• Genetic algorithms ( Xia et al., 2017 );

• Differential evolution ( Friesner et al., 2004 ), Gold ( Verdonk et al., 2003 ), Surflex ( Spitzer and Jain, 2012 ) and FlexX ( Hui-fang et al., 2010 );

• Ant colony optimization ( Korb et al., 2009 );

• Tabu search ( Baxter et al., 1998 );

• Particle swarm optimization ( Gowthaman et al., 2015 ) and PSOVina ( Ng et al., 2015 );

• Local search such as Autodock Vina ( Trott and Olson, 2009 ), SwissDock/EADock ( Grosdidier et al., 2011 ) and GlamDock ( Tietze and Apostolakis, 2007 );

• Exhaustive search such as eHiTS ( Zsoldos et al., 2007 );

• Linear programming methods such as Simplex Method ( Ruiz-Carmona et al., 2014 );

• Systematic methods such as incremental construction used by FlexX ( Rarey et al., 1996 ), Surflex ( Spitzer and Jain, 2012 ), and Sybyl-X ( Certara, 2016 );

• Statistical methods

• Monte Carlo ( Harrison, 2010 );

• Simulated annealing (SA) ( Doucet and Pelletier, 2007 ; Hatmal and Taha, 2017 );

• Conformational space annealing (CSA) ( Shin et al., 2011 );

• Similarity-based algorithms

• Based on substructures ( Tresadern et al., 2009 );

• Pharmacochemical ( Cruz-Monteagudo et al., 2014 );

• Overlapping volumes ( Leach et al., 2010 );

• Molecular interaction fields (MIFs) ( Willett, 2006 );

• Hybrid approach ( Morris et al., 2009 ; Haga et al., 2016 );

After performing a VS simulation, it is necessary to verify whether the quality of the generated protein-ligand complexes can represent a complex that could be reproduced in experiments. There are several methods that can perform this assessment, which will be explained in the next section.

Methods of Evaluating the Quality of a Simulation

To verify the quality of a docking approach, several methods are used to evaluate the generated complexes and to check whether the docking can reproduce the experimental data of the ligand-receptor complex. The most common evaluation methods are the root-mean-square deviation (RMSD) ( Hawkins et al., 2008 ), the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC-ROC) ( Flach and Wu, 2005 ; Trott and Olson, 2009 ), enrichment factors (EFs) ( Truchon and Bayly, 2007 ) and the Boltzmann-enhanced discrimination of ROC (BEDROC) ( Truchon and Bayly, 2007 ).

Root-Mean-Square Deviation (RMSD)

One of the aspects evaluated in docking programs is the accuracy of the generated geometry ( Jain, 2008 ). Docking programs attempt to reproduce the conformation of the ligand-receptor complex in a crystallographic structure. The root-mean-square deviation (RMSD) of atomic coordinates after ideal rigid-body superposition of two structures is a popular metric because it quantifies the differences between two structures, whether they have the same or different amino acid sequences ( Sargsyan et al., 2017 ). RMSD is widely used to evaluate the quality of a docking process performed by a program ( Ding et al., 2016 ). The RMSD between two structures can be calculated according to the following equation ( Sargsyan et al., 2017 ):

RMSD = sqrt( (1/N) Σ_i d_i^2 )

where d_i is the distance between the positions of atom i in the two structures and N is the total number of equivalent atoms. Since the calculation of RMSD requires the same number of atoms in both structures, it is often restricted to the heavy atoms or the backbone of each amino acid residue.
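A minimal implementation of the RMSD definition, assuming the two coordinate sets are already superposed and atom-matched (the coordinates below are invented for illustration):

```python
def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) atomic coordinates, assumed already superposed."""
    if len(coords_a) != len(coords_b):
        raise ValueError("RMSD requires the same number of atoms")
    n = len(coords_a)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return (sq / n) ** 0.5

# Toy 3-atom example: crystallographic pose vs. redocked pose.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked  = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0), (3.0, 0.2, 0.0)]
value = rmsd(crystal, docked)  # well under the 2.0 Å acceptance threshold
```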

Using the RMSD calculation, it is possible to evaluate whether a program can reliably reproduce a known crystallographic conformation, together with its intramolecular interactions. To verify whether a given program can accomplish this task, ligand-target complexes are subjected to a redocking process. After redocking, the crystallographic ligand is superposed with the conformation of the ligand obtained by the docking program, and the RMSD calculation is used to check the average distance between corresponding atoms (usually backbone atoms).

Generally, the RMSD threshold value is 2.0 Å ( Jain, 2008 ; Meier et al., 2010 ; Gowthaman et al., 2015 ). However, for ligands with several dihedral angles, an RMSD value of 2.5 Å is considered acceptable ( De Magalhães et al., 2004 ), and for large ligands some authors relax this criterion further ( Méndez et al., 2003 ; Verschueren et al., 2013 ). For a model generated by homology modeling, evaluating the RMSD value is important, although visual inspection of the generated model is also essential.

However, RMSD has some important limitations:

• RMSD can only compare structures with the same number of atoms;

• A small perturbation in just one part of the structure can create large RMSD values, suggesting that the two structures are very different, although they are not ( Carugo, 2007 );

• It has also been observed that RMSD values depend on the resolution of structures that are compared ( Carugo, 2003 );

• RMSD does not distinguish between a structure with some very rigid regions and some very flexible regions from a molecule in which all regions are semiflexible ( Sargsyan et al., 2017 );

For large structures, the RMSD value may be significantly distorted relative to the commonly used 2 Å threshold ( Méndez et al., 2003 ). Despite these limitations, RMSD remains one of the most commonly used metrics to quantify differences between structures ( Sargsyan et al., 2017 ).

Figure 7 shows the visualization of the FCP ligand superposed with its conformation after redocking to a protein (PDB ID: 1VZK, A Thiophene Based Diamidine Forms a “Super” AT Binding Minor Groove Agent). The RMSD between the crystallographic ligand and the same ligand after the redocking using DOCK6 is 0.97 Å. In the figure below, red represents the crystallographic ligand FCP and yellow represents FCP ligand after redocking using DOCK 6.


Figure 7 . RMSD between the ligand FCP with a protein (PDB ID: 1VZK) after redocking using DOCK6.

ROC Curve and AUC

One of the great challenges of VS methods is the ability to differentiate true positive compounds (TPCs) against the target from false positive compounds (FPCs) ( Awuni and Mu, 2015 ). Thus, it is important that VS tools have ways to assist their users in distinguishing TPCs from FPCs. The ROC curve and the area under the ROC curve (AUC-ROC) ( Lätti et al., 2016 ) are widely used methodologies for this purpose.

TPCs and decoys are used to create a ROC curve and AUC-ROC. TPCs are compounds with known biological activity against the molecular target of interest; some databases, such as ChEMBL ( Gaulton et al., 2012 ; Bento et al., 2014 ), allow users to search for these compounds. Decoys, by contrast, are compounds that, although possessing physical properties similar to those of a TPC (such as molecular mass, number of rotatable bonds, and logP), have different chemical structures that make them inactive. They are generated from random molecular modifications in the structure of a TPC ( Huang et al., 2006 ). Some databases, such as DUD-E ( Mysinger et al., 2012 ) and Zinc ( Sterling and Irwin, 2015 ), provide decoys for compounds of interest; DUD-E generates 50 different decoys for each TPC. The idea of using DUD-E decoys in VS is that the result is more reliable if the program can separate TPCs from the FPCs generated by DUD-E, because the FPCs share many TPC-like physical properties but are known to be inactive. At least a few (>2) known TPCs must be used to calculate an AUC-ROC ( Lätti et al., 2016 ).

After generating decoys, a VS process is performed using the known TPCs and decoys against a target of interest ( Yuriev and Ramsland, 2013 ). An affinity energy is then calculated for each ligand-target complex; TPCs are expected to have lower affinity energies than inactive compounds. The ROC curve plots the distribution of true and false results, while the AUC-ROC allows evaluation of the probability that a result is false. Hence, AUC-ROC reflects the probability of recovering an active compound preferentially to inactive compounds ( Triballeau et al., 2005 ; Zhao et al., 2009 ), allowing verification of the sensitivity of a VS experiment in relation to its specificity. The larger the area under the curve, the better the ability to recover TPCs while rejecting FPCs.

The AUC value can vary between 0 and 1. Hamza et al. (2012) proposed a practical way of interpreting AUC values:

• AUC between 0.90 and 1.00: Excellent

• AUC between 0.80 and 0.90: Good

• AUC between 0.70 and 0.80: Fair

• AUC between 0.60 and 0.70: Poor

• AUC between 0.50 and 0.60: Failure

Therefore, the closer the AUC is to 1, the greater the ability of the VS tool to separate between TPCs and FPCs. AUC-ROC values close to 0.5 indicate a random process ( Ogrizek et al., 2015 ). Acceptable values should be >0.7.

Figure 8 shows an example of an ROC curve generated in a VS performed with cyclooxygenase-1 complexed with meloxicam (PDB ID: 4O1Z) protein using five TPCs and 250 decoys. The VS tool was able to distinguish well between TPCs and FPCs with the generated ROC curve and its respective AUC, which was 0.8628.


Figure 8 . ROC curve example.
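The AUC-ROC can be computed without explicitly tracing the curve, since it equals the probability that a randomly chosen active outscores a randomly chosen decoy. The binding energies below are hypothetical illustrations, not the values behind Figure 8:

```python
def auc_roc(active_scores, decoy_scores):
    """AUC-ROC computed as the probability that a randomly chosen active
    outranks (here: has lower binding energy than) a randomly chosen
    decoy, with ties counted as half."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:        # lower energy = better pose
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))

# Hypothetical docking energies (kcal/mol): actives should score lower.
actives = [-10.4, -9.7, -8.9, -9.1, -7.0]
decoys  = [-6.5, -5.4, -7.2, -4.8, -6.0, -7.5, -5.1, -6.8, -4.2, -5.9]
auc = auc_roc(actives, decoys)  # 0.96, "excellent" on the scale above
```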

Boltzmann-Enhanced Discrimination of ROC (BEDROC)

There is much criticism of the use of the ROC curve as a method to measure virtual screening performance because it does not highlight the best-ranked active compounds that would be used in in vitro experiments, an issue called early recognition. Thus, Truchon and Bayly (2007) proposed the Boltzmann-enhanced discrimination of ROC (BEDROC), which uses exponential weighting to give early rankings of active compounds more weight than late rankings. However, Nicholls (2008) argues that AUC-ROC and BEDROC correlate in virtual screening simulations and that the ROC curve is therefore a sufficient metric for performance measurements.
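A sketch of the BEDROC metric following the Truchon and Bayly (2007) formulas (the robust initial enhancement, RIE, rescaled onto [0, 1]); the ranks used here are illustrative:

```python
import math

def bedroc(ranks, n_total, alpha=20.0):
    """Boltzmann-enhanced discrimination of ROC (Truchon & Bayly, 2007).
    `ranks` are the 1-based ranks of the active compounds among
    n_total screened compounds; larger alpha weights early hits more."""
    n = len(ranks)
    ra = n / n_total
    # RIE: exponentially weighted sum over active ranks, normalized.
    s = sum(math.exp(-alpha * r / n_total) for r in ranks)
    rie = (s / n) / ((1.0 / n_total) * (1.0 - math.exp(-alpha))
                     / (math.exp(alpha / n_total) - 1.0))
    # Rescale RIE to the [0, 1] BEDROC range.
    return (rie * ra * math.sinh(alpha / 2.0)
            / (math.cosh(alpha / 2.0) - math.cosh(alpha / 2.0 - alpha * ra))
            + 1.0 / (1.0 - math.exp(alpha * (1.0 - ra))))

perfect = bedroc([1, 2, 3, 4, 5], n_total=100)    # actives ranked first
worst   = bedroc([96, 97, 98, 99, 100], n_total=100)  # actives ranked last
```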

Enrichment Factors (EFs)

The enrichment factor (EF) measures the number of active compounds found in the top fraction χ (0 < χ < 1) of the ranked database relative to the number of active compounds that would be found by a random search ( Truchon and Bayly, 2007 ). EFs are often calculated for a given percentage of the database; for example, EF10% is the value obtained when the top 10% of the database is screened. EFs can be defined by the following formula (1):

EF(χ) = [ (1/n) Σ_i 1(r_i ≤ χN) ] / χ

where r_i is the rank of the i-th active compound in the list, N is the total number of compounds and n is the number of active compounds. The maximum value of EF is 1/χ if χ ≥ n/N and N/n if χ < n/N. The minimum value of EF is 0.
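The definition above translates directly into code; the active ranks used here are hypothetical:

```python
def enrichment_factor(active_ranks, n_total, chi):
    """EF at fraction chi: share of actives found in the top
    chi * n_total of the ranked list, relative to random expectation."""
    n = len(active_ranks)
    cutoff = chi * n_total
    found = sum(1 for r in active_ranks if r <= cutoff)
    return (found / n) / chi

# 5 actives, all ranked in the top 10 of a 100-compound screen.
ef10 = enrichment_factor([1, 3, 4, 7, 9], n_total=100, chi=0.10)
```

Here all actives fall inside the 10% cutoff, so EF10% reaches its maximum value of 1/χ = 10.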

The EF is quite simple, but it has some disadvantages. In addition to depending on the value set for χ, the EF depends on the numbers of true positives and true negatives, which makes it a measure of experiment performance rather than of method performance ( Nicholls, 2011 ). Another disadvantage is that the EF weighs all active compounds within the cutoff equally, so it cannot distinguish a better algorithm that ranks all active compounds at the very beginning of the ordered list from a worse one that ranks them immediately before the cutoff value [the saturation effect ( Lopes et al., 2017 )].

The relative enrichment factor (REF) proposed by von Korff et al. (2009) eliminates the problem associated with the saturation effect by normalizing the EF by the maximum possible enrichment. Consequently, REF has well-defined boundaries and is less subject to the saturation effect.

Vs Software Programs

There are several VS software programs, using different docking algorithms, that make the VS process easier for researchers to execute by removing the need for advanced knowledge of computer science or of how to implement the underlying algorithms. In this regard, VS software can act as a cost reducer, functioning as a filter that selects, from databases of thousands of molecules, those most likely to present biological activity against a target of interest. VS programs measure the affinity energy of a small molecule (ligand) for a molecular target of interest to determine the interaction energy of the resulting complex ( Carregal et al., 2017 ).

Table 1 summarizes the main characteristics of the most used software in VS. The first column contains the software used and its reference. The second column contains the type of software license: free for academic use, freeware, open-source, or commercial. The free for academic use license indicates that the software in question can be used for teaching and research in the academic world without a fee, but has restrictions for commercial use. A freeware license indicates that the software is free: users can use it without a fee, and all functions of the program are available without restrictions. An open-source license indicates that the software source code is accessible, so users can study, change, and distribute the software to anyone and for any purpose. Software developed under a commercial license is designed and developed for a commercial purpose, so in general a licensing fee must be paid for its use. The third column indicates on which platforms the software can be used (Windows, Linux, or Mac). The next column indicates whether the software can consider protein flexibility during docking. The docking algorithm column lists the algorithms used by the software to perform the docking. The sixth column, called the scoring function, indicates which scoring functions are used by the software.


Table 1 . Virtual screening software.

Final Considerations

CADD has been used to improve the drug development process. In the past, the discovery of new drugs was often conducted through empirical observation of the effects of natural products on known diseases; many candidate compounds were tested without efficacy, wasting resources. The use of CADD improves the development of new biologically active compounds and decreases the time and cost of developing a new drug. The emergence of SBVS has thus improved the drug discovery process, and it has become established as one of the most promising in silico techniques for drug design.

This review verified that CADD approaches can contribute to many stages of the drug discovery process, notably to perform a search for active compounds by VS.

The use of techniques such as SBVS has limitations, including the possibility of generating false positives and the difficulty of correctly ranking docked ligands. Moreover, there are several CADD methods, and different software can produce different results for the same input. However, the reduction in the time and cost of developing new drugs, together with the constant improvement of existing docking tools, indicates that CADD will remain among the most promising techniques in drug discovery over the coming years.

In the last decade, many studies have applied artificial intelligence in CADD to obtain more accurate models. Thus, most studies and future innovations will benefit from the application of AI in CADD.

Finally, the use of CADD tools requires varied expertise: researchers must select and prepare targets and ligands, analyse the results, and have broad knowledge of computing, chemistry, and biology. The researcher's background is therefore important both for selecting new hits and for enriching high-throughput experiments.

Author Contributions

All authors made substantial contributions to this review; all authors wrote the paper and approved the final version.

Funding

Funding sources for this project include FAPEMIG (APQ-02742-17 and APQ-00557-14), CNPq (449984/2014-1), CNPq Universal (426261/2018-6), and UFSJ/PPGBiotec. AT and LA are grateful to CNPq (305117/2017-3) and CAPES for their research fellowships.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling editor declared a past co-authorship with the author LA.

Acknowledgments

The authors would like to thank the Federal University of São João del-Rei (UFSJ) and the Federal Center for Technological Education of Minas Gerais (CEFET-MG) for providing the physical infrastructure.

1. ^ http://accelrys.com/products/collaborative-science/databases/bioactivity-databases/mddr.html

Abagyan, R., Totrov, M., and Kuznetsov, D. (1994). ICM - a new method for protein modeling and design: applications to docking and structure prediction from distorted native conformation. J. Comput. Chem. 15, 488–506. doi: 10.1002/jcc.540150503

Abdo, A., Chen, B., Mueller, C., Salim, N., and Willett, P. (2010). Ligand-based virtual screening using bayesian networks. J. Chem. Inf. Model. 50, 1012–1020. doi: 10.1021/ci100090p

Aliebrahimi, S., Montasser Kouhsari, S., Ostad, S. N., Arab, S. S., and Karami, L. (2017). Identification of phytochemicals targeting c-Met kinase domain using consensus docking and molecular dynamics simulation studies. Cell Biochem. Biophys. 76, 135–145. doi: 10.1007/s12013-017-0821-6

Allen, W. J., Balius, T. E., Mukherjee, S., Brozell, S. R., Moustakas, D. T., Lang, P. T., et al. (2015). DOCK 6: impact of new features and current docking performance. J. Comput. Chem. 36, 1132–1156. doi: 10.1002/jcc.23905

Arrowsmith, J. (2012). A decade of change. Nat. Rev. Drug Discov. 11, 17–18. doi: 10.1038/nrd3630

Ash, S., Cline, M. A., Homer, R. W., Hurst, T., and Smith, G. B. (1997). SYBYL line notation (SLN): a versatile language for chemical structure representation. J. Chem. Inf. Model. 37, 71–79. doi: 10.1021/ci960109j

Ashtawy, H. M., and Mahapatra, N. R. (2018). Task-specific scoring functions for predicting ligand binding poses and affinity and for screening enrichment. J. Chem. Inf. Model. 58, 119–133. doi: 10.1021/acs.jcim.7b00309

Awuni, Y., and Mu, Y. (2015). Reduction of false positives in structure-based virtual screening when receptor plasticity is considered. Molecules 20, 5152–5164. doi: 10.3390/molecules20035152

Baek, M., Shin, W-H., Chung, H. W., and Seok, C. (2017). GalaxyDock BP2 score: a hybrid scoring function for accurate protein–ligand docking. J. Comput. Aided. Mol. Des. 31, 653–666. doi: 10.1007/s10822-017-0030-9

Banegas-Luna, A.-J., Cerón-Carrasco, J. P., and Pérez-Sánchez, H. (2018). A review of ligand-based virtual screening web tools and screening algorithms in large molecular databases in the age of big data. Future Med. Chem. 10, 2641–2658. doi: 10.4155/fmc-2018-0076

Baxter, C. A., Murray, C. W., Clark, D. E., Westhead, D. R., and Eldridge, M. D. (1998). Flexible docking using Tabu search and an empirical estimate of binding affinity. Proteins Struct. Funct. Genet. 33, 367–382. doi: 10.1002/(SICI)1097-0134(19981115)33:3<367::AID-PROT6>3.0.CO;2-W

Bento, A. P., Gaulton, A., Hersey, A., Bellis, L. J., Chambers, J., Davies, M., et al. (2014). The ChEMBL bioactivity database: an update. Nucleic Acids Res. 42, D1083–D1090. doi: 10.1093/nar/gkt1031

Berman, H. M., Kleywegt, G. J., Nakamura, H., and Markley, J. L. (2013). The future of the protein data bank. Biopolymers 99, 218–222. doi: 10.1002/bip.22132

Carpenter, K. A., Cohen, D. S., Jarrell, J. T., and Huang, X. (2018). Deep learning and virtual drug screening. Future Med. Chem. 10, 2557–2567. doi: 10.4155/fmc-2018-0314

Carregal, A. P., Maciel, F. V., Carregal, J. B., dos Reis Santos, B., da Silva, A. M., and Taranto, A. G. (2017). Docking-based virtual screening of Brazilian natural compounds using the OOMT as the pharmacological target database. J. Mol. Model. 23:111. doi: 10.1007/s00894-017-3253-8

Carregal, A. P. Jr., Comar, M., and Taranto, A. G. (2013). Our own molecular targets data bank (OOMT). Biochem. Biotechnol. Reports 2, 14–16. doi: 10.5433/2316-5200.2013v2n2espp14

Carugo, O. (2003). How root-mean-square distance (r.m.s.d.) values depend on the resolution of protein structures that are compared. J. Appl. Crystallogr. 36, 125–128. doi: 10.1107/S0021889802020502

Carugo, O. (2007). Statistical validation of the root-mean-square-distance, a measure of protein structural proximity. Protein Eng. Des. Sel. 20, 33–37. doi: 10.1093/protein/gzl051

Cavasotto, C. N. (2011). Homology models in docking and high-throughput docking. Curr. Top. Med. Chem. 11, 1528–1534. doi: 10.2174/156802611795860951

Certara (2016). SYBYL-X Suite 2.1. Available online at: https://support.certara.com/software/molecular-modeling-and-simulation/sybyl-x/ (accessed August 23, 2017).

Chen, R., Liu, X., Jin, S., Lin, J., and Liu, J. (2018). Machine learning for drug-target interaction prediction. Molecules 23:2205. doi: 10.3390/molecules23092208

Chermak, E., De Donato, R., Lensink, M. F., Petta, A., Serra, L., Scarano, V., et al. (2016). Introducing a clustering step in a consensus approach for the scoring of protein-protein docking models. PLoS ONE 11:e0166460. doi: 10.1371/journal.pone.0166460

Chopade, A. R. (2015). Molecular docking studies of phytocompounds from phyllanthus species as potential chronic pain modulators. Sci. Pharm. 83, 243–267. doi: 10.3797/scipharm.1408-10

Corbeil, C. R., Englebienne, P., and Moitessier, N. (2007). Docking ligands into flexible and solvated macromolecules. 1. development and validation of FITTED 1.0. J. Chem. Inf. Model. 47, 435–449. doi: 10.1021/ci6002637

Corbeil, C. R., Englebienne, P., Yannopoulos, C. G., Chan, L., Das, S. K., Bilimoria, D., et al. (2008). Docking ligands into flexible and solvated macromolecules. 2. Development and application of FITTED 1.5 to the virtual screening of potential HCV polymerase inhibitors. J. Chem. Inf. Model. 902–909. doi: 10.1021/ci700398h

Cruz-Monteagudo, M., Medina-Franco, J. L., Pérez-Castillo, Y., Nicolotti, O., Cordeiro, M. N. D. S., and Borges, F. (2014). Activity cliffs in drug discovery: Dr Jekyll or Mr Hyde? Drug Discov. Today 19, 1069–1080. doi: 10.1016/j.drudis.2014.02.003

Das, P. S., and Saha, P. (2017). A review on computer aided drug design in drug discovery. World J. Pharm. Pharm. Sci. 6, 279–291. doi: 10.20959/wjpps20177-9450

De Magalhães, C. S., Barbosa, H. J. C., and Dardenne, L. E. (2004). “Selection-insertion schemes in genetic algorithms for the flexible ligand docking problem,” In: Genetic and Evolutionary Computation – GECCO 2004. GECCO 2004. Lecture Notes in Computer Science , ed K. Deb (Berlin, Heidelberg: Springer), 368–379. doi: 10.1007/978-3-540-24854-5_38

Devi, R. V., Sathya, S. S., and Coumar, M. S. (2015). Evolutionary algorithms for de novo drug design - a survey. Appl. Soft Comput. J. 27, 543–552. doi: 10.1016/j.asoc.2014.09.042

Ding, Y., Fang, Y., Moreno, J., Ramanujam, J., Jarrell, M., and Brylinski, M. (2016). Assessing the similarity of ligand binding conformations with the contact mode score. Comput. Biol. Chem. 64, 403–413. doi: 10.1016/j.compbiolchem.2016.08.007

Doucet, N., and Pelletier, J. N. (2007). Simulated annealing exploration of an active-site tyrosine in TEM-1β-lactamase suggests the existence of alternate conformations. Proteins Struct. Funct. Bioinforma. 69, 340–348. doi: 10.1002/prot.21485

Durrant, J. D., and McCammon, J. A. (2011). NNScore 2.0: a neural-network receptor-ligand scoring function. J. Chem. Inf. Model. 51, 2897–2903. doi: 10.1021/ci2003889

Dutkiewicz, Z., and Mikstacka, R. (2018). Structure-based drug design for cytochrome P450 family 1 inhibitors. Bioinorg. Chem. Appl. 2018:3924608. doi: 10.1155/2018/3924608

Fang, Y., Ding, Y., Feinstein, W. P., Koppelman, D. M., Moreno, J., Jarrell, M., et al. (2016). Geauxdock: accelerating structure-based virtual screening with heterogeneous computing. PLoS ONE 11:e0158898. doi: 10.1371/journal.pone.0158898

Ferreira, L., dos Santos, R., Oliva, G., and Andricopulo, A. (2015). Molecular docking and structure-based drug design strategies. Molecules 20, 13384–13421. doi: 10.3390/molecules200713384

Ferreira, R. S., Glaucius, O., and Andricopulo, A. D. (2011). Integração das técnicas de triagem virtual e triagem biológica automatizada em alta escala: oportunidades e desafios em P&D de fármacos. Quim. Nova 34, 1770–1778. doi: 10.1590/S0100-40422011001000010

Flach, P. A., and Wu, S. (2005). Repairing concavities in ROC curves. IJCAI Int. Joint Conf. Artif. Intell. 702–707. doi: 10.5555/1642293.1642406

Friesner, R. A., Banks, J. L., Murphy, R. B., Halgren, T. A., Klicic, J. J., Mainz, D. T., et al. (2004). Glide: a new approach for rapid, accurate docking and scoring. 1. method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749. doi: 10.1021/jm0306430

Gaulton, A., Bellis, L. J., Bento, A. P., Chambers, J., Davies, M., Hersey, A., et al. (2012). ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107. doi: 10.1093/nar/gkr777

Geng, C., Jung, Y., Renaud, N., Honavar, V., Bonvin, A. M. J. J., and Xue, L. C. (2019). iScore: a novel graph kernel-based function for scoring protein-protein docking models. Bioinformatics 36, 112–121. doi: 10.1101/498584

Gowthaman, R., Lyskov, S., and Karanicolas, J. (2015). DARC 2.0: improved docking and virtual screening at protein interaction sites. PLoS ONE 10:e0131612. doi: 10.1371/journal.pone.0131612

Grosdidier, A., Zoete, V., and Michielin, O. (2011). SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic Acids Res. 39, 270–277. doi: 10.1093/nar/gkr366

Haga, J. H., Ichikawa, K., and Date, S. (2016). Virtual screening techniques and current computational infrastructures. Curr. Pharm. Des. 22, 3576–3584. doi: 10.2174/1381612822666160414142530

Hamza, A., Wei, N-N., and Zhan, C-G. (2012). Ligand-based virtual screening approach using a new scoring function. J. Chem. Inf. Model. 52, 963–974. doi: 10.1021/ci200617d

Harrison, R. L. (2010). Introduction to monte carlo simulation. AIP Conf. Proc. 1204, 17–21. doi: 10.1063/1.3295638

Hatmal, M. M., and Taha, M. O. (2017). Simulated annealing molecular dynamics and ligand–receptor contacts analysis for pharmacophore modeling. Future Med. Chem. 9, 1141–1159. doi: 10.4155/fmc-2017-0061

Hawkins, P. C. D., Warren, G. L., Skillman, A. G., and Nicholls, A. (2008). How to do an evaluation: pitfalls and traps. J. Comput. Aided. Mol. Des. 22, 179–190. doi: 10.1007/s10822-007-9166-3

Hillisch, A., Pineda, L. F., and Hilgenfeld, R. (2004). Utility of homology models in the drug discovery process. Drug Discov. Today 9, 659–669. doi: 10.1016/S1359-6446(04)03196-4

Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 832–844. doi: 10.1109/34.709601

Horvath, D. (1997). A virtual screening approach applied to the search for trypanothione reductase inhibitors. J. Med. Chem. 2623, 2412–2423. doi: 10.1021/jm9603781

Houston, D. R., and Walkinshaw, M. D. (2013). Consensus docking: improving the reliability of docking in a virtual screening context. J. Chem. Inf. Model. 53, 384–390. doi: 10.1021/ci300399w

Hsu, K-C., Chen, Y-F., Lin, S-R., and Yang, J-M. (2011). iGEMDOCK: a graphical environment of enhancing GEMDOCK using pharmacological interactions and post-screening analysis. BMC Bioinformatics 12(Suppl. 1):S33. doi: 10.1186/1471-2105-12-S1-S33

Huang, N., Shoichet, B. K., and Irwin, J. J. (2006). Benchmarking sets for molecular docking. J. Med. Chem. 49, 6789–6801. doi: 10.1021/jm0608356

Huang, S-Y., Grinter, S. Z., and Zou, X. (2010). Scoring functions and their evaluation methods for protein–ligand docking: recent advances and future directions. Phys. Chem. Chem. Phys. 12, 12899–12908. doi: 10.1039/c0cp00151a

Hui-fang, L., Qing, S., Jian, Z., and Wei, F. (2010). Evaluation of various inverse docking schemes in multiple targets identification. J. Mol. Graph. Model. 29, 326–330. doi: 10.1016/j.jmgm.2010.09.004

Jain, A. N. (2008). Bias, reporting, and sharing: computational evaluations of docking methods. J. Comput. Aided. Mol. Des. 22, 201–212. doi: 10.1007/s10822-007-9151-x

Kapetanovic, I. M. (2008). Computer aided darug discovery and development: in silico -chemico-biological approach. Chem. Biol. Interact. 171, 165–176. doi: 10.1016/j.cbi.2006.12.006

Kim, S., Thiessen, P. A., Bolton, E. E., Chen, J., Fu, G., Gindulyte, A., et al. (2016). PubChem substance and compound databases. Nucleic Acids Res. 44, D1202–D1213. doi: 10.1093/nar/gkv951

Korb, O., Stutzle, T., and Exner, T. E. (2009). PLANTS: application of ant colony optimization to structure-based drug design. Theor. Comput. Sci. 49, 84–96. doi: 10.1007/11839088_22

Korkmaz, S., Zararsiz, G., and Goksuluk, D. (2015). MLViS: a web tool for machine learning-based virtual screening in early-phase of drug discovery and development. PLoS ONE 10:e0124600. doi: 10.1371/journal.pone.0124600

Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R., and Ferrin, T. E. (1982). A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 161, 269–288. doi: 10.1016/0022-2836(82)90153-X

Lätti, S., Niinivehmas, S., and Pentikäinen, O. T. (2016). Rocker: open source, easy-to-use tool for AUC and enrichment calculations and ROC visualization. J. Cheminform. 8:45. doi: 10.1186/s13321-016-0158-y

Leach, A. R., Gillet, V. J., Lewis, R. A., and Taylor, R. (2010). Three-dimensional pharmacophore methods in drug discovery. J. Med. Chem. 53, 539–558. doi: 10.1021/jm900817u

Leelananda, S. P., and Lindert, S. (2016). Computational methods in drug discovery. Beilstein J. Org. Chem. 12, 2694–2718. doi: 10.3762/bjoc.12.267

Leisinger, K. M., Garabedian, L. F., and Wagner, A. K. (2012). Improving access to medicines in low and middle income countries: corporate responsibilities in context. South. Med Rev. 5, 3–8.

Li, G. B., Yang, L. L., Wang, W. J., Li, L. L., and Yang, S. Y. (2013). ID-score: a new empirical scoring function based on a comprehensive set of descriptors related to protein-ligand interactions. J. Chem. Inf. Model. 53, 592–600. doi: 10.1021/ci300493w

Li, L., Wang, B., and Meroueh, S. O. (2011). Support vector regression scoring of receptor-ligand complexes for rank-ordering and virtual screening of chemical libraries. J. Chem. Inf. Model. 51, 2132–2138. doi: 10.1021/ci200078f

Li, Z., Gu, J., Zhuang, H., Kang, L., Zhao, X., and Guo, Q. (2015). Adaptive molecular docking method based on information entropy genetic algorithm. Appl. Soft Comput. 26, 299–302. doi: 10.1016/j.asoc.2014.10.008

Lima, A. N., Philot, E. A., Trossini, G. H. G., Scott, L. P. B., Maltarollo, V. G., and Honorio, K. M. (2016). Use of machine learning approaches for novel drug discovery. Expert Opin. Drug Discov. 11, 225–239. doi: 10.1517/17460441.2016.1146250

Lionta, E., Spyrou, G., Vassilatis, D., and Cournia, Z. (2014). Structure-based virtual screening for drug discovery: principles, applications and recent advances. Curr. Top. Med. Chem. 14, 1923–1938. doi: 10.2174/1568026614666140929124445

Liu, J., and Wang, R. (2015). On classification of current scoring functions. J. Chem. Inf. Model. 55, 475–482. doi: 10.1021/ci500731a

Liu, S., Alnammi, M., Ericksen, S. S., Voter, A. F., Ananiev, G. E., Keck, J. L., et al. (2018). Practical model selection for prospective virtual screening. J. Chem. Inf. Model. 59, 282–293. doi: 10.1101/337956

Lopes, J. C. D., Dos Santos, F. M., Martins-José, A., Augustyns, K., and De Winter, H. (2017). The power metric: a new statistically robust enrichment-type metric for virtual screening applications with early recovery capability. J. Cheminform. 9:7. doi: 10.1186/s13321-016-0189-4

Ma, X., Jia, J., Zhu, F., Xue, Y., Li, Z., and Chen, Y. (2009). Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries. Comb. Chem. High Throughput Screen. 12, 344–357. doi: 10.2174/138620709788167944

Maia, E. H. B., Campos, V. A., dos Reis Santos, B., Costa, M. S., Lima, I. G., Greco, S. J., et al. (2017). Octopus: a platform for the virtual high-throughput screening of a pool of compounds against a set of molecular targets. J. Mol. Model. 23:26. doi: 10.1007/s00894-016-3184-9

Maithri, G., Manasa, B., Vani, S. S., Narendra, A., and Harshita, T. (2016). Computational drug design and molecular dynamic studies-a review. Int. J. Biomed. Data Min. 6:123. doi: 10.4172/2090-4924.1000123

Martin, Y. C. (1992). 3D database searching in drug design. J. Med. Chem. 35, 2145–2154. doi: 10.1021/jm00090a001

McGann, M. (2011). FRED pose prediction and virtual screening accuracy. J. Chem. Inf. Model. 51, 578–596. doi: 10.1021/ci100436p

Meier, R., Pippel, M., Brandt, F., Sippl, W., and Baldauf, C. (2010). ParaDockS: a framework for molecular docking with population-based metaheuristics. J. Chem. Inf. Model. 50, 879–889. doi: 10.1021/ci900467x

Méndez, R., Leplae, R., De Maria, L., and Wodak, S. J. (2003). Assessment of blind predictions of protein-protein interactions: current status of docking methods. Proteins Struct. Funct. Genet. 52, 51–67. doi: 10.1002/prot.10393

Meng, X-Y., Zhang, H-X., Mezei, M., and Cui, M. (2011). Molecular docking: a powerful approach for structure-based drug discovery. Curr. Comput. Aided. Drug Des. 7, 146–157. doi: 10.2174/157340911795677602

Merluzzi, V., Hargrave, K., Labadia, M., Grozinger, K., Skoog, M., Wu, J., et al. (1990). Inhibition of HIV-1 replication by a nonnucleoside reverse transcriptase inhibitor. Science 250, 1411–1413. doi: 10.1126/science.1701568

Montes, M., Miteva, M. A., and Villoutreix, B. O. (2007). Structure-based virtual ligand screening with LigandFit: pose prediction and enrichment of compound collections. Proteins Struct. Funct. Bioinforma 68, 712–725. doi: 10.1002/prot.21405

Morris, G. M., Huey, R., Lindstrom, W., Sanner, M. F., Belew, R. K., Goodsell, D. S., et al. (2009). AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785–2791. doi: 10.1002/jcc.21256

Mugumbate, G., Mendes, V., Blaszczyk, M., Sabbah, M., Papadatos, G., Lelievre, J., et al. (2017). Target identification of Mycobacterium tuberculosis phenotypic hits using a concerted chemogenomic, biophysical, and structural approach. Front. Pharmacol. 8:681. doi: 10.3389/fphar.2017.00681

Mysinger, M. M., Carchia, M., Irwin, J. J., and Shoichet, B. K. (2012). Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 55, 6582–6594. doi: 10.1021/jm300687e

Ng, M. C. K., Fong, S., and Siu, S. W. I. (2015). PSOVina: the hybrid particle swarm optimization algorithm for protein-ligand docking. J. Bioinform. Comput. Biol. 13:1541007. doi: 10.1142/S0219720015410073

Nicholls, A. (2008). What do we know and when do we know it? J. Comput. Aided. Mol. Des. 22, 239–255. doi: 10.1007/s10822-008-9170-2

Nicholls, A. (2011). What do we know?: Simple statistical techniques that help. Chemoinform. Comput. Chem. Biol. 672, 531–581. doi: 10.1007/978-1-60761-839-3_22

Nunes, R. R., da Fonseca, A. L., Pinto, A. C. D. S., Maia, E. H. B., da Silva, A. M., Varotti, F. de P., et al. (2019). Brazilian malaria molecular targets (BraMMT): selected receptors for virtual high-throughput screening experiments. Mem. Inst. Oswaldo Cruz. 114, 1–10. doi: 10.1590/0074-02760180465

Oglic, D., Oatley, S. A., Macdonald, S. J. F., Mcinally, T., Garnett, R., Hirst, J. D., et al. (2018). Active search for computer-aided drug design. Mol. Inform. 37:1700130. doi: 10.1002/minf.201700130

Ogrizek, M., Turk, S., Lešnik, S., Sosic, I., Hodošcek, M., Mirković, B., et al. (2015). Molecular dynamics to enhance structure-based virtual screening on cathepsin B. J. Comput. Aided. Mol. Des. 29, 707–712. doi: 10.1007/s10822-015-9847-2

Ouyang, X., Handoko, S. D., and Kwoh, C. K. (2011). CScore: a simple yet effective scoring function for protein–ligand binding affinity prediction using modified CMAC learning architecture. J. Bioinform. Comput. Biol. 9(Suppl. 1), 1–14. doi: 10.1142/S021972001100577X

Paixão, V. G., and Pita, S. S. R. (2016). Virtual Screening applied to search of inhibitors of trypanosoma cruzi trypanothione reductase employing the Natural Products Database from Bahia state (NatProDB). Rev. Virtual Química 8, 1289–1310. doi: 10.21577/1984-6835.20160093

Park, H., Eom, J. W., and Kim, Y. H. (2014). Consensus scoring approach to identify the inhibitors of AMP-activated protein kinase α2 with virtual screening. J. Chem. Inf. Model. 54, 2139–2146. doi: 10.1021/ci500214e

Paul, S. M., Mytelka, D. S., Dunwiddie, C. T., Persinger, C. C., Munos, B. H., Lindborg, S. R., et al. (2010). How to improve R&D productivity: the pharmaceutical industry's grand challenge. Nat. Rev. Drug Discov. 9, 203–214. doi: 10.1038/nrd3078

Pence, H. E., and Williams, A. (2010). ChemSpider: an online chemical information resource. J. Chem. Educ. 87, 1123–1124. doi: 10.1021/ed100697w

Pereira, J. C., Caffarena, E. R., and Dos Santos, C. N. (2016). Boosting docking-based virtual screening with deep learning. J. Chem. Inf. Model. 56, 2495–2506. doi: 10.1021/acs.jcim.6b00355

Peterson, Y. K., Wang, X. S., Casey, P. J., and Tropsha, A. (2009). The discovery of geranylgeranyltransferase-i inhibitors with novel scaffolds by the means of quantitative structure-activity relationship modeling, virtual screening, and experimental validation. J. Med. Chem. 52, 83–88. doi: 10.1021/jm8013772

Poli, G., Martinelli, A., and Tuccinardi, T. (2016). Reliability analysis and optimization of the consensus docking approach for the development of virtual screening studies. J. Enzyme Inhib. Med. Chem. 31, 167–173. doi: 10.1080/14756366.2016.1193736

Rarey, M., Kramer, B., Lengauer, T., and Klebe, G. (1996). A fast flexible docking method using an incremental construction algorithm. J. Mol. Biol. 261, 470–489. doi: 10.1006/jmbi.1996.0477

Ripphausen, P., Nisius, B., and Bajorath, J. J. (2011). State-of-the-art in ligand-based virtual screening. Drug Discov. Today 16, 372–376. doi: 10.1016/j.drudis.2011.02.011

Ruiz-Carmona, S., Alvarez-Garcia, D., Foloppe, N., Garmendia-Doval, A. B., Juhos, S., Schmidtke, P., et al. (2014). rDock: a fast, versatile and open source program for docking ligands to proteins and nucleic acids. PLoS Comput. Biol. 10:e1003571. doi: 10.1371/journal.pcbi.1003571

Ruiz-Tagle, B., Villalobos-Cid, M., Dorn, M., and Inostroza-Ponta, M. (2018). “Evaluating the use of local search strategies for a memetic algorithm for the protein-ligand docking problem,” 2017 36th International Conference of the Chilean Computer Science Society (SCCC) (Arica), 1–12. doi: 10.1109/SCCC.2017.8405141

Sargsyan, K., Grauffel, C., and Lim, C. (2017). How molecular size impacts RMSD applications in molecular dynamics simulations. J. Chem. Theory Comput. 13, 1518–1524. doi: 10.1021/acs.jctc.7b00028

Schnecke, V., and Kuhn, L. A. (2000). Virtual screening with solvation and ligand-induced complementarity. Perspect. Drug Discov. Des. 20, 171–190. doi: 10.1023/A:1008737207775

Schneider, P., Tanrikulu, Y., and Schneider, G. (2009). Self-organizing maps in drug discovery: compound library design, scaffold-hopping, repurposing. Curr. Med. Chem. 16, 258–266. doi: 10.2174/092986709787002655

Sci Tegic Accelrys Inc (2019). The MDL Drug Data Report (MDDR) database . Available online at: http://accelrys.com/products/collaborative-science/databases/bioactivity-databases/mddr.html (accessed March 22, 2019).

Sengupta, S., and Bandyopadhyay, S. (2014). Application of support vector machines in virtual screening. Int. J. Comput. Biol. 1, 56–62. doi: 10.34040/IJCB.1.1.2012.20

Shin, W. H., Heo, L., Lee, J., Ko, J., Seok, C., and Lee, J. (2011). LigDockCSA: protein-ligand docking using conformational space annealing. J. Comput. Chem. 32, 3226–3232. doi: 10.1002/jcc.21905

Shin, W. H., Kim, J. K., Kim, D. S., and Seok, C. (2013). GalaxyDock2: protein-ligand docking using beta-complex and global optimization. J. Comput. Chem. 34, 2647–2656. doi: 10.1002/jcc.23438

Sliwoski, G., Kothiwale, S., Meiler, J., Edward, W., and Lowe, J. (2013). Computational methods in drug discovery. Pharmacol. Rev. 66, 334–395. doi: 10.1124/pr.112.007336

Spitzer, R., and Jain, A. N. (2012). Surflex-Dock: docking benchmarks and real-world application. J. Comput. Aided. Mol. Des. 26, 687–699. doi: 10.1007/s10822-011-9533-y

Sridhar, D. (2008). Improving access to essential medicines: how health concerns can be prioritised in the global governance system. Public Health Ethics 1, 83–88. doi: 10.1093/phe/phn012

Sterling, T., and Irwin, J. J. (2015). ZINC15–ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337. doi: 10.1021/acs.jcim.5b00559

Stevens, H., and Huys, I. (2017). Innovative approaches to increase access to medicines in developing countries. Front. Med. 4:218. doi: 10.3389/fmed.2017.00218

Stokes, J. M., Yang, K., Swanson, K., Jin, W., Cubillos-Ruiz, A., Donghia, N. M., et al. (2020). A deep learning approach to antibiotic discovery. Cell 180, 688-702.e13. doi: 10.1016/j.cell.2020.01.021

Surabhi, S., and Singh, B. (2018). Computer aided drug design: an overview. J. Drug Deliv. Ther. 8, 504–509. doi: 10.22270/jddt.v8i5.1894

Talele, T. T., Khedkar, S. A., and Rigby, A. C. (2010). Successful applications of computer aided drug discovery: moving drugs from concept to the clinic. Curr. Top. Med. Chem. 10, 127–41. doi: 10.2174/156802610790232251

Tanchuk, V. Y., Tanin, V. O., Vovk, A. I., and Poda, G. (2016). A new, improved hybrid scoring function for molecular docking and scoring based on AutoDock and AutoDock vina. Chem. Biol. Drug Des. 87, 618–625. doi: 10.1111/cbdd.12697

Taranto, A. G., dos Reis Santos, B., Costa, M. S., Campos, V. A., Lima, I. G., et al. (2015). "Octopus: a virtual high-throughput screening platform for multi-compounds and targets," in XVIII Simpósio Brasileiro de Química Teórica (Pirenópolis, GO, Brasil: Editora da UnB), 266.

ten Brink, T., and Exner, T. E. (2009). Influence of protonation, tautomeric, and stereoisomeric states on protein–ligand docking results. J. Chem. Inf. Model. 49, 1535–1546. doi: 10.1021/ci800420z

Tietze, S., and Apostolakis, J. (2007). GlamDock: development and validation of a new docking tool on several thousand protein-ligand complexes. J. Chem. Inf. Model. 47, 1657–1672. doi: 10.1021/ci7001236

Tresadern, G., Bemporad, D., and Howe, T. (2009). A comparison of ligand based virtual screening methods and application to corticotropin releasing factor 1 receptor. J. Mol. Graph. Model. 27, 860–870. doi: 10.1016/j.jmgm.2009.01.003

Triballeau, N., Acher, F., Brabet, I., Pin, J., and Bertrand, H. (2005). Virtual screening workflow development guided by the “receiver operating characteristic” curve approach. application to high-throughput docking on metabotropic glutamate receptor subtype 4. J. Med. Chem. 48, 2534–2547. doi: 10.1021/jm049092j

Trott, O., and Olson, A. J. (2009). AutoDock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461. doi: 10.1002/jcc.21334

Truchon, J. F., and Bayly, C. I. (2007). Evaluating virtual screening methods: good and bad metrics for the “early recognition” problem. J. Chem. Inf. Model. 47, 488–508. doi: 10.1021/ci600426e

Tuccinardi, T., Poli, G., Romboli, V., Giordano, A., and Martinelli, A. (2014). Extensive consensus docking evaluation for ligand pose prediction and virtual screening studies. J. Chem. Inf. Model. 54, 2980–2986. doi: 10.1021/ci500424n

Verdonk, M. L., Cole, J. C., Hartshorn, M. J., Murray, C. W., and Taylor, R. D. (2003). Improved protein-ligand docking using GOLD. Proteins Struct. Funct. Genet. 52, 609–623. doi: 10.1002/prot.10465

Verschueren, E., Vanhee, P., Rousseau, F., Schymkowitz, J., and Serrano, L. (2013). Protein-peptide complex prediction through fragment interaction patterns. Structure 21, 789–797. doi: 10.1016/j.str.2013.02.023

Vilar, S., Cozza, G., and Moro, S. (2008). Medicinal chemistry and the molecular operating environment (MOE): application of QSAR and molecular docking to drug discovery. Curr. Top. Med. Chem. 8, 1555–1572. doi: 10.2174/156802608786786624

von Korff, M., Freyss, J., and Sander, T. (2009). Comparison of ligand- and structure-based virtual screening on the DUD data set. J. Chem. Inf. Model. 49, 209–231. doi: 10.1021/ci800303k

von Wartburg, A., and Traber, R. (1988). 1 cyclosporins, fungal metabolites with immunosuppressive activities. Prog. Med. Chem. 25, 1–33. doi: 10.1016/S0079-6468(08)70276-5

Wang, T., Wu, M., Chen, Z., Chen, H., Lin, J., and Yang, L. (2015). Fragment-based drug discovery and molecular docking in drug design. Curr. Pharm. Biotechnol. 16, 11–25. doi: 10.2174/1389201015666141122204532

Ward, W. H. J., Cook, P. N., Slater, A. M., Davies, D. H., Holdgate, G. A., and Green, L. R. (1994). Epidermal growth factor receptor tyrosine kinase. Investigation of catalytic mechanism, structure-based searching and discovery of a potent inhibitor. Biochem. Pharmacol. 48, 659–666. doi: 10.1016/0006-2952(94)90042-6

Willett, P. (2006). Similarity-based virtual screening using 2D fingerprints. Drug Discov. Today 11, 1046–1053. doi: 10.1016/j.drudis.2006.10.005

Wishart, D. S., Feunang, Y. D., Guo, A. C., Lo, E. J., Marcu, A., Grant, R., et al. (2018). DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, 1074–1082. doi: 10.1093/nar/gkx1037

Wójcikowski, M., Ballester, P. J., and Siedlecki, P. (2017). Performance of machine-learning scoring functions in structure-based virtual screening. Sci. Rep. 7:46710. doi: 10.1038/srep46710

Wood, A., and Armour, D. (2005). The discovery of the CCR5 receptor antagonist, UK-427,857, a new agent for the treatment of HIV infection and AIDS. Prog. Med. Chem. 43, 239–271. doi: 10.1016/S0079-6468(05)43007-6

Xia, J., Feng, B., Shao, Q., Yuan, Y., Wang, X. S., Chen, N., et al. (2017). Virtual screening against phosphoglycerate kinase 1 in quest of novel apoptosis inhibitors. Molecules 22:1029. doi: 10.3390/molecules22061029

Yuriev, E., and Ramsland, P. A. (2013). Latest developments in molecular docking: 2010-2011 in review. J. Mol. Recognit. 26, 215–239. doi: 10.1002/jmr.2266

Zhao, W., Hevener, K. E., White, S. W., Lee, R. E., and Boyett, J. M. (2009). A statistical framework to evaluate virtual screening. BMC Bioinform. 10:225. doi: 10.1186/1471-2105-10-225

Zhao, Y., and Sanner, M. F. (2007). FLIPDock: docking flexible ligands into flexible receptors. Proteins Struct. Funct. Bioinforma 68, 726–737. doi: 10.1002/prot.21423

Zilian, D., and Sotriffer, C. A. (2013). SFCscoreRF: a random forest-based scoring function for improved affinity prediction of protein-ligand complexes. J. Chem. Inf. Model. 53, 1923–1933. doi: 10.1021/ci400120b

Zsoldos, Z., Reid, D., Simon, A., Sadjad, S. B., and Johnson, A. P. (2007). eHiTS: a new fast, exhaustive flexible ligand docking system. J. Mol. Graph. Model. 26, 198–212. doi: 10.1016/j.jmgm.2006.06.002

Keywords: SBVS, homology modeling, consensus virtual screening, scoring functions, computer-aided drug design

Citation: Maia EHB, Assis LC, Oliveira TA, Silva AM and Taranto AG (2020) Structure-Based Virtual Screening: From Classical to Artificial Intelligence. Front. Chem. 8:343. doi: 10.3389/fchem.2020.00343

Received: 27 June 2019; Accepted: 01 April 2020; Published: 28 April 2020.

Copyright © 2020 Maia, Assis, de Oliveira, da Silva and Taranto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Eduardo Habib Bechelane Maia, habib@cefetmg.br

This article is part of the Research Topic “In Silico Methods for Drug Design and Discovery.”

Using a swab inside the cheek and a sophisticated computer algorithm, a DNA test recently approved by federal regulators promises to assess genetic risk of opioid addiction. The test’s maker says results give doctors and patients a crucial tool when considering use of the very pain pills that ignited the nation’s opioid crisis.

But as the company, SOLVD Health, prepares to roll out AvertD in coming months, skeptics remain unconvinced. They worry that patients shown to have a low risk of addiction may feel emboldened to pop pain pills — then get hooked. Or that doctors will deny painkillers to patients errantly deemed at elevated risk.

Above all, some geneticists and public health experts say AvertD relies on unsound science. The Food and Drug Administration approved the AvertD cheek-swab test in December, despite an agency committee of experts voting overwhelmingly, 11-2, against recommending approval.

Andrew Kolodny, an opioid researcher at Brandeis University, blasted the test as a “sham.”

“You’ve got an alignment: profiteering by industry and federal agencies that feel they’re under the gun to do something about the problem, even if that something is counterproductive,” said Kolodny, president of Physicians for Responsible Opioid Prescribing, which aims to educate prescribers and patients.

The FDA says it approved the test because the severity of the opioid crisis “calls for innovative measures” to prevent addiction and save lives. The agency said it worked with SOLVD Health to address concerns raised by the advisory committee.

The company and the FDA stress that AvertD does not predict whether a person will develop opioid use disorder. They say doctors should use the test — which is available by prescription — only as part of an in-depth evaluation of adults who may experience pain or will undergo surgery and have no history of using opioids, not those already grappling with addiction. Patients identified by AvertD as having an elevated risk are 18 times more likely to develop an addiction after taking an opioid compared with people who take the test and are shown not to have an elevated risk, according to research cited by the company.

“We are extremely focused on informing and empowering a patient to understand their risk prior to taking that medication,” said Keri Donaldson, CEO of SOLVD Health, based in Carlsbad, Calif.

To use the test, a doctor swabs the inside of a patient’s cheek, then sends the sample overnight to the company. Its lab analyzes DNA for 15 genetic markers it says are associated with opioid use disorders. The data is run through an algorithm trained on genetic data from more than 7,000 people, some of whom were diagnosed with opioid use disorder. The computer generates a score of between zero and one. Someone who scores 0.33 or higher is considered at elevated genetic risk of getting hooked on an opioid medication.
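The scoring step described above (a fixed panel of 15 genetic markers combined into a single number between zero and one, then compared with the 0.33 cutoff) can be sketched as follows. Only the 15-marker count and the 0.33 threshold come from the article; the weights, the logistic form, and the genotype encoding are illustrative assumptions, since the actual AvertD model is proprietary.

```python
import math

# Hypothetical per-marker weights; the real model's parameters are proprietary.
WEIGHTS = [0.8, -0.3, 0.5, 0.2, -0.1, 0.4, 0.6, -0.2, 0.3, 0.1,
           0.7, -0.4, 0.2, 0.5, -0.2]  # one weight per marker (15 total)
INTERCEPT = -1.0
THRESHOLD = 0.33  # cutoff reported for AvertD

def risk_score(genotype):
    """Map 15 marker genotypes (0, 1, or 2 risk alleles each) to a 0-1 score."""
    z = INTERCEPT + sum(w * g for w, g in zip(WEIGHTS, genotype))
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing into (0, 1)

def classify(genotype):
    """Apply the reported threshold to label a score as elevated or not."""
    return "elevated risk" if risk_score(genotype) >= THRESHOLD else "not elevated"
```

Under this sketch, a genotype with no risk alleles scores about 0.27 and falls below the cutoff, while one loaded with risk alleles scores near 1.0 and is flagged as elevated.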

“We felt that was important information that providers and patients should have access to,” William Maisel, director of the office of product evaluation and quality at the FDA’s Center for Devices and Radiological Health, said in an interview.

“You could argue that it’s unethical to withhold that type of product and that type of information from people. … We recognize the potential shortcomings of the test. But there are also great benefits,” Maisel said.

The FDA has long been criticized for approving certain drugs despite concerns about their safety or effectiveness. The agency has moved to increase regulation of increasingly popular lab tests, after earlier scrutiny over limited oversight.

AvertD was developed by SOLVD subsidiary AutoGenomics. The company has yet to set prices for the test, which could be covered by Medicare and Medicaid.

The FDA advisory committee that reviewed AvertD was composed of independent experts who raised questions about the validity of the test during an Oct. 20, 2022, meeting. While their vote was not binding, committee members cited worries about algorithm bias, uncertainty about how much of a role the genetic variations play in addiction, and the design of the clinical trial.

The committee and speakers such as Kolodny and patient representative Elizabeth Joniak-Grant were troubled by how AvertD might be used in the real world — and how results will be understood by consumers.

Patient advocates wonder if time-strapped doctors, despite FDA-mandated labeling, will lean too heavily on results instead of asking probing questions of patients. Will doctors prescribe the test off-label? Could a test indicating a risk for addiction be memorialized in medical records and used against a patient years later — in a custody dispute in court, or by an employer? Will someone with an elevated test come to believe they are prone to addictions of all sorts?

Joniak-Grant, a North Carolina sociologist, worries about all of those questions — and whether pain patients might be treated unfairly if they decline to use AvertD. “Refusing the test could be interpreted by clinicians as a patient being drug-seeking,” she said.

FDA officials approved the test with conditions. The agency added a “black-box” label warning patients about the test’s limitations. Such warnings are rare for products newly approved by the FDA, said Michael Abrams, a senior researcher at Public Citizen, a nonprofit consumer advocacy group.

“Somehow, they justified approval based on putting on these warnings,” Abrams said. He said he fears that despite the warnings, the test could provide “very high cover for irresponsible opioid prescribing.”

In response to concerns raised by the advisory committee, the company also changed the wording of test results — patients now have an elevated, not “high,” risk of developing an opioid use disorder. Patients and doctors will need to undergo education programs such as videos, online classes or reading materials. The FDA is requiring the company to study how patients and doctors use the test. Donaldson, the company’s CEO, said SOLVD is committed to educating clinicians and patients about how the test results should be viewed responsibly.

“This type of information should be empowering, not fear inducing, and it should not be misinterpreted,” Donaldson said.

Products such as AvertD reflect hope that cutting-edge science can ease a drug epidemic killing more than 100,000 people in the United States each year, mostly from opioids. The crisis was kick-started by the proliferation of legal — and highly addictive — prescription pain pills in the late 1990s. Regulators and law enforcement cracked down on unscrupulous prescribers. Doctors began writing fewer prescriptions. Users switched to cheaper street heroin, now largely replaced by dangerous illicit fentanyl. Opioid use disorder affects at least 6 million Americans.

Genetic testing is playing an increasing role in identifying health risks. Tests for general wellness and low-risk medical purposes do not require FDA approval. In 2017, the FDA authorized 23andMe to market the first agency-approved direct-to-consumer genetics test that examines predisposition to diseases such as Parkinson’s and blood clotting disorders; the agency has approved several others since.

AvertD is the first FDA-authorized polygenic risk test. Such genetic tests analyze small variations in an array of genes to predict risk for chronic diseases or other traits. While insignificant on their own, these genetic variations combined may influence susceptibility to certain diseases.

Such testing isn’t supposed to be a definitive prediction of heart disease, diabetes or hypertension. Rather, the tests provide a risk score that can guide patients into changing their lifestyles, implementing preventive care and testing for diseases earlier. In most cases, genetics alone doesn’t determine likelihood of illness. Environmental and social factors such as poverty, clean air and access to healthy food play a significant role.

Well-known companies such as Ancestry and 23andMe offer direct-to-consumer reports that rely on polygenic risk scores and predict the probability of inheriting or passing down a certain disease, developing insomnia or even having an aversion to cilantro.

AvertD is intended for use in a medical setting, but critics fear that the imprimatur of the doctor’s office may lead patients and clinicians to assign too much weight to test results.

They note that substance use disorders are complex to ascertain through such testing because tiny genetic variations — say, ones affecting the reward center in the brain — may overlap with ones affecting other behavioral health disorders. Several researchers said they are concerned that AvertD’s analysis of 15 genetic mutations and the sample size of 7,000 people used to develop the test aren’t large enough to deliver meaningful information about risk for addiction to opioids.

“There should be hundreds or thousands of genetic variants going in that risk score, not 15,” said Danielle M. Dick, director of the Rutgers Addiction Research Center, who is part of a team developing a genetics test that would examine the broader risk of addiction by analyzing genetic data from more than 1.5 million people compiled by researchers during many years.

White people of European ancestry are overrepresented in most genetic data. Even algorithms using smaller, more diverse data sets primed to predict for opioid use disorders may be confounded by ancestry, potentially skewing results for African Americans and people of mixed heritage from different parts of the world, researchers say.

In a 2021 study, algorithms attempting to predict opioid addiction using 16 genetic markers performed “no better than a coin flip” after correcting for geographic ancestry, said lead author Alexander S. Hatoum, a researcher at Washington University in St. Louis. Hatoum said genetic testing remains less effective than a clinician asking simple questions about family history of drug use.

“It’s just an underdeveloped technology at this point,” Hatoum said.
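On a standard reading, “no better than a coin flip” means discrimination at chance level: an area under the ROC curve (AUC) of about 0.5. The small sketch below, with a hand-rolled `auc` helper of our own (not from the study), shows that scores carrying no signal about the labels land near that chance level.

```python
import random

def auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative),
    counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)
labels = [random.randint(0, 1) for _ in range(2000)]
scores = [random.random() for _ in range(2000)]  # scores carry no signal
print(round(auc(scores, labels), 2))  # ≈ 0.5: a coin flip
```

A perfect predictor, by contrast, would rank every positive above every negative and score an AUC of 1.0.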

Donaldson, of SOLVD Health, said about 30 percent of the people whose information shapes the algorithm identified as Black or African American, and that most volunteers hailed from the United States. He said the data was “purposely very diverse” and pointed to a clinical trial he asserts backs up the test’s validity.

Initially, AvertD won’t be available nationwide. Instead, the company will partner with institutions in five to 10 locations.

The test has drawn attention from families touched by the opioid crisis.

Ken Daniels, a play-by-play announcer for the Detroit Red Wings, lent his support to the test at the advisory committee meeting. His 23-year-old son, Jamie Daniels, died of an overdose in 2016 after a long struggle with addiction that started with prescription pills. In an interview, Ken Daniels said his son might have avoided taking that medication had he been able to access a test such as AvertD.

“If there is anything out there that can be done to give people more knowledge, I’m all for it,” said Daniels, who started an advocacy foundation in his son’s name.


Dr. Parang in the lab

AI viable alternative to traditional small-molecule drug discovery process

High throughput screening (HTS) is a standard method in pharmaceutical development for identifying bioactive small molecules crucial for drug discovery. However, HTS necessitates testing molecules before their synthesis.

Alternatively, computational approaches using AI and machine learning allow for pre-synthesis molecule testing. This method yields results that guide which molecules merit synthesis, promising improvements over HTS in terms of cost, speed, and experimental integrity.
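That pre-synthesis workflow can be sketched minimally: score each molecule in a virtual library with a predictive model, then pass only the top-ranked candidates to synthesis. The scoring function and names below (`mock_score`, `select_for_synthesis`) are placeholders for illustration, not the actual AtomNet model.

```python
def predicted_activity(molecule):
    # Stand-in for a trained model's binding-affinity prediction;
    # here we just read back a stored mock score for illustration.
    return molecule["mock_score"]

def select_for_synthesis(library, top_k):
    """Rank an unsynthesized virtual library and return the top_k candidate IDs."""
    ranked = sorted(library, key=predicted_activity, reverse=True)
    return [m["id"] for m in ranked[:top_k]]

library = [
    {"id": "cmpd-A", "mock_score": 0.91},
    {"id": "cmpd-B", "mock_score": 0.42},
    {"id": "cmpd-C", "mock_score": 0.77},
]
print(select_for_synthesis(library, top_k=2))  # ['cmpd-A', 'cmpd-C']
```

The point of the design is that the expensive step (chemistry) happens only after the cheap step (computation) has filtered the library, in contrast to HTS, which must synthesize before it can test.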

A recent study by The Atomwise AIMS Program, published in Scientific Reports on April 2, 2024, explored the potential of deep learning-based methods in identifying novel bioactive chemotypes. The study applied the AtomNet model to identify hits for 22 internal pharmaceutical targets.

Contributing to the study, Professor Keykavous Parang of Chapman University’s School of Pharmacy utilized the AtomNet model to discover drug-like hits for Src kinase, a key enzyme strongly linked to cancer development across various human cancers. Targeting Src kinase holds significant promise in anti-cancer drug discovery.

The study identified drug-like hits for 296 academic targets, followed by dose-response experiments for 49 targets, and further validation through analog exploration of initial hits. Parang shared, “Initially, a single dose was used to identify the most promising hits that interacted with selected targets. Subsequently, based on the results of these experiments, 49 targets were selected, and the effects of varying doses on responses were determined. Eventually, through these experiments, 21 targets were confirmed to interact with the initial compounds identified by the AtomNet model.”

“By showcasing the efficiency of computational techniques in locating new drug candidates, this study may represent a major advance in drug discovery. Computational methods like AtomNet can effectively find interesting compounds in a variety of therapeutic areas and protein classes. This might cause a paradigm change in drug development by increasing the accessible chemical space and decreasing the need for conventional high-throughput screening techniques.”

“While a scientist must conduct independent research in order to grow and as expected by their university, major discoveries are usually made in conjunction with other scientists. All things considered, the study highlights the need for collaboration and teamwork for scientific growth, particularly in fields as complex and multidisciplinary as drug discovery. It serves as an example of how pooling the expertise and assets of multiple stakeholders can produce high-quality data that significantly enhances our understanding and validation of computer models in real-world settings. By forming a partnership with multiple universities, organizations, and researchers, the study carried out 318 separate investigations internationally covering a wide range of targets and treatment domains.”

About Chapman University

Founded in 1861, Chapman University is a nationally ranked private university in Orange, California, about 30 miles south of Los Angeles. Chapman serves nearly 10,000 undergraduate and graduate students, with a 12:1 student-to-faculty ratio. Students can choose from 123 areas of study within 11 colleges for a personalized education. Chapman is categorized by the Carnegie Classification as an R2 “high research activity” institution. Students at Chapman learn directly from distinguished world-class faculty including Nobel Prize winners, MacArthur fellows, published authors and Academy Award winners. The campus has produced a Rhodes Scholar, been named a top producer of Fulbright Scholars and hosts a chapter of Phi Beta Kappa, the nation’s oldest and most prestigious honor society. Chapman also includes the Harry and Diane Rinker Health Science Campus in Irvine. The university features the No. 4 film school and No. 60 business school in the U.S. Learn more about Chapman University:   www.chapman.edu .

About Atomwise

Atomwise is a TechBio company leveraging AI/ML to revolutionize small-molecule drug discovery. The Atomwise team invented the use of deep learning for structure-based drug design, a core technology of Atomwise’s best-in-class AI discovery and optimization engine, which is differentiated by its ability to find and optimize novel chemical matter. Atomwise has extensively verified its discovery engine, having demonstrated the ability to find compounds with therapeutic potential hundreds of times across a wide variety of protein types and multiple “hard to drug” targets. Atomwise is advancing a proprietary pipeline of small-molecule drug candidates.

Carly Murphy




Ronna McDaniel, TV News and the Trump Problem

The former Republican National Committee chairwoman was hired by NBC and then let go after an outcry.

This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors. Please review the episode audio before quoting from this transcript and email [email protected] with any questions.

From “The New York Times,” I’m Michael Barbaro. This is “The Daily.”

[MUSIC PLAYING]

Today, the saga of Ronna McDaniel and NBC and what it reveals about the state of television news headed into the 2024 presidential race. Jim Rutenberg, a “Times” writer at large, is our guest.

It’s Monday, April 1.

Jim, NBC News just went through a very public, a very searing drama over the past week, that we wanted you to make sense of in your unique capacity as a longtime media and political reporter at “The Times.” This is your sweet spot. You were, I believe, born to dissect this story for us.

Oh, brother.

Well, on the one hand, this is a very small moment for a major network like NBC. They hire, as a contributor, not an anchor, not a correspondent, as a contributor, Ronna McDaniel, the former RNC chairwoman. It blows up in a mini scandal at the network.

But to me, it represents a much larger issue that’s been there since that moment Donald J. Trump took his shiny gold escalator down to announce his presidential run in 2015. This struggle by the news media to figure out, especially on television, how do we capture him, cover him for all of his lies, all the challenges he poses to Democratic norms, yet not alienate some 74, 75 million American voters who still follow him, still believe in him, and still want to hear his reality reflected in the news that they’re listening to?

Right. Which is about as gnarly a conundrum as anyone has ever dealt with in the news media.

Well, it’s proven so far unsolvable.

Well, let’s use the story of what actually happened with Ronna McDaniel and NBC to illustrate your point. And I think that means describing precisely what happened in this situation.

The story starts out so simply. It’s such a basic thing that television networks do. As elections get underway, they want people who will reflect the two parties.

They want talking heads. They want insiders. They want them on their payroll so they can rely on them whenever they need them. And they want them to be high level so they can speak with great knowledge about the two major candidates.

Right. And rather than needing to beg these people to come on their show at 6 o’clock, when they might be busy and it’s not their full-time job, they go off and they basically put them on retainer for a bunch of money.

Yeah. And in this case, here’s this perfect scenario because quite recently, Ronna McDaniel, the chairwoman of the Republican National Committee through the Trump era, most of it, is now out on the market. She’s actually recently been forced out of the party. And all the networks are interested because here’s the consummate insider from Trump world ready to get snatched up under contract for the next election and can really represent this movement that they’ve been trying to capture.

So NBC’S key news executives move pretty aggressively, pretty swiftly, and they sign her up for a $300,000 a year contributor’s contract.

Nice money if you can get it.

Not at millions of dollars that they pay their anchors, but a very nice contract. I’ll take it. You’ll take it. In the eyes of NBC execs she was perfect because she can be on “Meet the Press” as a panelist. She can help as they figure out some of their coverage. They have 24 hours a day to fill and here’s an official from the RNC. You can almost imagine the question that would be asked to her. It’s 10:00 PM on election night. Ronna, what are the Trump people thinking right now? They’re looking at the same numbers you are.

That was good, but that’s exactly it. And we all know it, right? This is television in our current era.

So last Friday, NBC makes what should be a routine announcement, but one they’re very proud of, that they’ve hired Ronna McDaniel. And in a statement, they say it couldn’t be a more important moment to have a voice like Ronna’s on the team. So all’s good, right? Except for there’s a fly in the ointment.

Because it turns out that Ronna McDaniel has been slated to appear on “Meet the Press,” not as a paid NBC contributor, but as a former recently ousted RNC chair with the “Meet The Press” host, Kristen Welker, who’s preparing to have a real tough interview with Ronna McDaniel. Because of course, Ronna McDaniel was chair of the party and at Trump’s side as he tried to refuse his election loss. So this was supposed to be a showdown interview.

From NBC News in Washington, the longest-running show in television history. This is “Meet The Press” with Kristen Welker.

And here, all of a sudden, Kristen Welker is thrown for a loop.

In full disclosure to our viewers, this interview was scheduled weeks before it was announced that McDaniel would become a paid NBC News contributor.

Because now, she’s actually interviewing a member of the family who’s on the same payroll.

Right. Suddenly, she’s interviewing a colleague.

This will be a news interview, and I was not involved in her hiring.

So what happens during the interview?

So Welker is prepared for a tough interview, and that’s exactly what she does.

Can you say, as you sit here today, did Joe Biden win the election fair and square?

He won. He’s the legitimate president.

Did he win fair and square?

Fair and square, he won. It’s certified. It’s done.

She presses her on the key question that a lot of Republicans get asked these days — do you accept Joe Biden was the winner of the election?

But, I do think, Kristen —

Ronna, why has it taken you until now to say that? Why has it taken you until now to be able to say that?

I’m going to push back a little.

McDaniel gets defensive at times.

Because I do think it’s fair to say there were problems in 2020. And to say that does not mean he’s not the legitimate president.

But, Ronna, when you say that, it suggests that there was something wrong with the election. And you know that the election was the most heavily scrutinized. Chris Krebs —

It’s a really combative interview.

I want to turn now to your actions in the aftermath of the 2020 election.

And Welker actually really does go deeply into McDaniel’s record in those weeks before January 6.

On November 17, you and Donald Trump were recorded pushing two Republican Michigan election officials not to certify the results of the election. And on the call —

For instance, she presses McDaniel on McDaniel’s role in an attempt to convince a couple county commissioner level canvassers in Michigan to not certify Biden’s victory.

Our call that night was to say, are you OK? Vote your conscience. Not pushing them to do anything.

McDaniel says, look, I was just telling them to vote their conscience. They should do whatever they think is right.

But you said, do not sign it. If you can go home tonight, do not sign it. How can people read that as anything other than a pressure campaign?

And Welker’s not going to just let her off the hook. Welker presses her on Trump’s own comments about January 6 and Trump’s efforts recently to gloss over some of the violence, and to say that those who have been arrested, he’ll free them.

Do you support that?

I want to be very clear. The violence that happened on January 6 is unacceptable.

And this is a frankly fascinating moment because you can hear McDaniel starting to, if not quite reverse some of her positions, though in some cases she does that, at least really soften her language. It’s almost as if she’s switching uniforms from the RNC one to an NBC one or almost like breaking from a role she was playing.

Ronna, why not speak out earlier? Why just speak out about that now?

When you’re the RNC chair, you kind of take one for the whole team, right? Now, I get to be a little bit more myself.

She says, hey, you know what? Sometimes as RNC chair, you just have to take it for the team sometimes.

Right. What she’s really saying is I did things as chairwoman of the Republican National committee that now that I no longer have that job, I can candidly say, I wished I hadn’t done, which is very honest. But it’s also another way of saying I’m two faced, or I was playing a part.

Ronna McDaniel, thank you very much for being here this morning.

Then something extraordinary happens. And I have to say, I’ve never seen a moment like this in decades of watching television news and covering television news.

Welcome back. The panel is here. Chuck Todd, NBC News chief political analyst.

Welker brings her regular panel on, including Chuck Todd, now the senior NBC political analyst.

Chuck, let’s dive right in. What were your takeaways?

And he launches right into what he calls —

Look, let me deal with the elephant in the room.

The elephant being this hiring of McDaniel.

I think our bosses owe you an apology for putting you in this situation.

And he proceeds, on NBC’S air, to lace into management for, as he describes it, putting Welker in this crazy awkward position.

Because I don’t know what to believe. She is now a paid contributor by NBC News. I have no idea whether any answer she gave to you was because she didn’t want to mess up her contract.

And Todd is very hung up on this idea that when she was speaking for the party, she would say one thing. And now that she’s on the payroll at NBC, she’s saying another thing.

She has credibility issues that she still has to deal with. Is she speaking for herself, or is she speaking on behalf of who’s paying her?

Todd is basically saying, how are we supposed to know which one to believe?

What can we believe?

It is important for this network and for always to have a wide aperture. Having ideological diversity on this panel is something I prided myself on.

And what he’s effectively saying is that his bosses should have never hired her in this capacity.

I understand the motivation, but this execution, I think, was poor.

Someone said to me last night we live in complicated times. Thank you guys for being here. I really appreciate it.

Now, let’s just note here, this isn’t just any player at NBC. Chuck Todd is obviously a major news name at the network. And him doing this appears to just open the floodgates across the entire NBC News brand, especially on its sister cable network, MSNBC.

And where I said I’d never seen anything like what I saw on “Meet the Press” that morning, I’d never seen anything like this either. Because now, the entire MSNBC lineup is in open rebellion. I mean, from the minute that the sun comes up. There is Joe Scarborough and Mika Brzezinski.

We weren’t asked our opinion of the hiring. But if we were, we would have strongly objected to it.

They’re on fire over this.

believe NBC News should seek out conservative Republican voices, but it should be conservative Republicans, not a person who used her position of power to be an anti-democracy election denier.

But it rolls out across the entire schedule.

Because Ronna McDaniel has been a major peddler of the big lie.

The fact that Ms. McDaniel is on the payroll at NBC News, to me that is inexplicable. I mean, you wouldn’t hire a mobster to work at a DA’s office.

Rachel Maddow devotes an entire half hour.

It’s not about just being associated with Donald Trump and his time in the Republican Party. It’s not even about lying or not lying. It’s about our system of government.

Thumbing their noses at our bosses and basically accusing them of abetting a traitorous figure in American history. I mean, just extraordinary stuff. It’s television history.

And let’s face it, we journalists, our bosses, we can be seen as crybabies, and we’re paid complaining. Yeah, that’s what we’re paid to do. But in this case, the NBC executives cannot ignore this, because in the outcry, there’s a very clear point that they’re all making. Ronna McDaniel is not just a voice from the other side. She was a fundamental part of Trump’s efforts to deny his election loss.

This is not inviting the other side. This is someone who’s on the wrong side —

Of history.

Of history, of these moments that we’ve covered and are still covering.

And I think it’s fair to say that at this point, everyone understands that Ronna McDaniel’s time at NBC News is going to be very short lived. Yeah, basically, after all this, the executives at NBC have to face facts. It’s over. And on Tuesday night, they release a statement to the staff saying as much.

They don’t cite the questions about red lines or what Ronna McDaniel represented or didn’t represent. They just say we need to have a unified newsroom. We want cohesion. This isn’t working.

I think in the end, she was a paid contributor for four days.

Yeah, one of the shortest tenures in television news history. And look, in one respect, by their standards, this is kind of a pretty small contract, a few hundred thousand dollars they may have to pay out. But it was way more costly because they hired her. They brought her on board because they wanted to appeal to these tens of millions of Americans who still love Donald J. Trump.

And what happens now is that this entire thing has blown up in their face, and those very same people now see a network that, in their view, in the view of Republicans across the country, will not accept any Republicans. So it becomes more about that. And Fox News, NBC’s longtime rival, goes wall to wall with this.

Now, NBC News just caved to the breathless demands from their far left, frankly, emotionally unhinged host.

I mean, I had it on my desk all day. And every minute I looked at that screen, it was pounding on these liberals at NBC News driving this Republican out.

It’s the shortest tenure in TV history, I think. But why? Well, because she supports Donald Trump, period.

So in a way, this leaves NBC worse off with that Trump Republican audience they had wanted to court than maybe even they were before. It’s like a boomerang with a grenade on it.

Yeah, it completely explodes in their face. And that’s why to me, the whole episode is so representative of this eight-year conundrum for the news media, especially on television. They still haven’t been able to crack the code for how to handle the Trump movement, the Trump candidacy, and what it has wrought on the American political system and American journalism.

We’ll be right back.

Jim, put into context this painful episode of NBC into that larger conundrum you just diagnosed that the media has faced when it comes to Trump.

Well, Michael, it’s been there from the very beginning, from the very beginning of his political rise. The media was on this kind of seesaw. They go back and forth over how to cover him. Sometimes they want to cover him quite aggressively because he’s such a challenging candidate. He was bursting so many norms.

But at other times, there was this instinct to understand his appeal, for the same reason. He’s such an unusual candidate. So there was a great desire to really understand his voters. And frankly, to speak to his voters, because they’re part of the audience. And we all lived it, right?

But just let me take you back anyway because everything’s fresh again with perspective. And so if you go back, let’s look at when he first ran. The networks, if you recall, saw him as almost like a novelty candidate.

He was going to spice up what was expected to be a boring campaign between the usual suspects. And he was a ratings magnet. And the networks, they just couldn’t get enough of it. And they allowed him, at times, to really shatter their own norms.

Welcome back to “Meet the Press,” sir.

Good morning, Chuck.

Good morning. Let me start —

He was able to just call into the studio and riff with the likes of George Stephanopoulos and Chuck Todd.

What does it have to do with Hillary?

She can’t talk about me because nobody respects women more than Donald Trump.

And CNN gave him a lot of unmitigated airtime, if you recall during the campaign. They would run the press conferences.

It’s the largest winery on the East Coast. I own it 100 percent.

And let him promote his Trump steaks and his Trump wine.

Trump steaks. Where are the steaks? Do we have steaks?

I mean, it got that crazy. But again, the ratings were huge. And then he wins. And because they had previously given him all that airtime, they’ve, in retrospect, sort of given him a political gift, and more than that now have a journalistic imperative to really address him in a different way, to cover him as they would have covered any other candidate, which, let’s face it, they weren’t doing initially. So there’s this extra motivation to make up for lost ground and maybe for some journalistic omissions.

Right. Kind of correct for the lack of a rigorous journalistic filter in the campaign.

Exactly. And the big thing that this will be remembered for is we’re going to call a lie a lie.

I don’t want to sugarcoat this because facts matter, and the fact is President Trump lies.

Trump lies. We’re going to say it’s a lie.

And I think we can’t just mince around it because they are lies. And so we need to call them what they are.

We’re no longer going to use euphemisms or looser language. We’re going to call it for what it is.

Trump lies in tweets. He spreads false information at rallies. He lies when he doesn’t need to. He lies when the truth is more than enough for him.

CNN was running chyrons. They would fact check Trump and call lies lies on the screen while Trump is talking. They were challenging Trump to his face —

One of the statements that you made in the tail end of the campaign in the midterms that —

Here we go.

That — well, if you don’t mind, Mr. President, that this caravan was an invasion.

— in these crazy press conferences —

They’re hundreds of miles away, though. They’re hundreds and hundreds of miles away. That’s not an invasion.

Honestly, I think you should let me run the country. You run CNN. And if you did it well, your ratings —

Well, let me ask — if I may ask one other question. Mr. President, if I may ask another question. Are you worried —

That’s enough. That’s enough.

And Trump is giving it right back.

I tell you what, CNN should be ashamed of itself having you working for them. You are a rude, terrible person. You shouldn’t be working for CNN.

Very combative.

So this was this incredibly fraught moment for the American press. You’ve got tens of millions of Trump supporters seeing what’s really basic fact checking. These look like attacks to Trump supporters. Trump, in turn, is calling the press, the reporters, enemies of the people. So it’s a terrible dynamic.

And when January 6 happens, it’s so obviously out of control. And what the traditional press that follows, traditional journalistic rules has to do is make it clear that the claims that Trump is making about a stolen election are just so abjectly false that they don’t warrant a single minute of real consideration once the reporting has been done to show how false they are. And I think that American journalism really emerged from that feeling strongly about its own values and its own place in society.

But then there’s still tens of millions of Trump voters, and they don’t feel so good about the coverage. And they don’t agree that January 6 was an insurrection. And so we enter yet another period, where the press is going to have to now maybe rethink some things.

In what way?

Well, there’s a kind of quiet period after January 6. Trump is off of social media. The smoke is literally dissipating from the air in Washington. And news executives are kind of standing there on the proverbial battlefield, taking a new look at their situation.

And they’re seeing that in this clearer light, they’ve got some new problems, perhaps none more important for their entire business models than that their ratings are quickly crashing. And part of that diminishment is that a huge part of the country, that Trump-loving part of the audience, is really now severed from him from their coverage.

They see the press as actually, in some cases, being complicit in stealing an election. And so these news executives, again, especially on television, which is so ratings dependent, they’ve got a problem. So after presumably learning all these lessons about journalism and how to confront power, there’s a first subtle and then much less subtle rethinking.

Maybe we need to pull back from that approach. And maybe we need to take some new lessons and switch it up a little bit and reverse some of what we did. And one of the best examples of this is none other than CNN.

It had come under new management, was being led by a guy named Chris Licht, a veteran of cable news, but also Stephen Colbert’s late night show in his last job. And his new job under this new management is we’re going to recalibrate a little bit. So Chris Licht proceeds to try to bring the network back to the center.

And how does he do that?

Well, we see some key personalities who represented the Trump combat era start losing air time and some of them lose their jobs. There’s talk of, we want more Republicans on the air. There was a famous magazine article about Chris Licht’s balancing act here.

And Chris Licht says to a reporter, Tim Alberta of the “Atlantic” magazine, look, a lot in the media, including at his own network, quote unquote, “put on a jersey, took a side.” They took a side. And he says, I think we understand that jersey cannot go back on him. Because he says in the end of the day, by the way, it didn’t even work. We didn’t change anyone’s mind.

He’s saying that confrontational approach that defined the four years Trump was in office, that was a reaction to the feeling that TV news had failed to properly treat Trump with sufficient skepticism, that that actually was a failure both of journalism and of the TV news business. Is that what he’s saying?

Yeah. On the business side, it’s an easier call, right? You want a bigger audience, and you’re not getting the bigger audience. But he’s making a journalistic argument as well: if the job is to convey the truth and take it to the people, so they take that into account as they make their own voting decisions and formulate their own opinions about American politics, and if tens of millions of people who do believe that election was stolen are completely tuning you out because now they see you as a political combatant, you’re not achieving your ultimate goal as a journalist.

And what does Licht’s “don’t put a jersey back on” approach look like on CNN for its viewers?

Well, it didn’t look good. People might remember this, but the most glaring example —

Please welcome, the front runner for the Republican nomination for president, Donald Trump.

— was when he held a town hall meeting featuring Donald J. Trump, now candidate Trump, before an audience packed with Trump’s fans.

You look at what happened during that election. Unless you’re a very stupid person, you see what happens. A lot of the people —

Trump let loose a string of falsehoods.

Most people understand what happened. It was a rigged election.

The audience, a pro-Trump audience, was cheering him on.

Are you ready? Are you ready? Can I talk?

Yeah, what’s your answer?

Can I? Do you mind?

I would like for you to answer the question.

OK. It’s very simple to answer.

That’s why I asked it.

It’s very simple. You’re a nasty person, I’ll tell you that.

And for the CNN anchor hosting this, Kaitlan Collins, on CNN’s own air, it was a disaster.

It felt like a callback to the unlearned lessons of 2016.

Yeah. And in this case, CNN’s staff was up in arms.

Big shakeup in the cable news industry as CNN makes another change at the top.

Chris Licht is officially out at CNN after a chaotic run as chairman and CEO.

And Chris Licht didn’t survive it.

The chief executive’s departure comes as he faced criticism in recent weeks after the network hosted a town hall with Donald Trump and the network’s ratings started to drop.

But I want to say that the CNN leadership still, even after that, as they brought new leadership in, said, this is still the path we’re going to go on. Maybe that didn’t work out, but we’re still here. This is still what we have to do.

Right. And this idea is very much in the water of TV news, that this is the right overall direction.

Yeah. This is, by no means, isolated to CNN. This is throughout the traditional news business. These conversations are happening everywhere. But CNN was living it at that point.

And this, of course, is how we get to NBC deciding to hire Ronna McDaniel.

Right. Because they’re picking up — right where that conversation leaves off, they’re having the same conversation. But for NBC, you could argue this tension between journalistic values and audience. It’s even more pressing. Because even though MSNBC is a niche cable network, NBC News is part of an old-fashioned broadcast network. It’s on television stations throughout the country.

And in fact, those networks, they still have 6:30 newscasts. And believe it or not, millions of people still watch those every night. Maybe not as many as they used to, but there’s still some six or seven million people tuning in to nightly news. That’s important.

Right. We should say that kind of number is sometimes double or triple that of the cable news prime time shows that get all the attention.

On their best nights. So this is big business still. And that business is based on broad — it’s called broadcast for a reason. That’s based on broad audiences. So NBC had a business imperative, and they argue they had a journalistic imperative.

So given all of that, Jim, I think the big messy question here is, when it comes to NBC, did they make a tactical error around hiring the wrong Republican which blew up? Or did they make an even larger error in thinking that the way you handle Trump and his supporters is to work this hard to reach them, when they might not even be reachable?

The best way to answer that question is to tell you what they’re saying right now, NBC management. What the management is saying is, yes, this was a tactical error. This was clearly the wrong Republican. We get it.

But they’re saying, we are going to — and they said this in their statement, announcing that they were severing ties with McDaniel. They said, we’re going to redouble our efforts to represent a broad spectrum of the American votership. And that’s what they meant was that we’re going to still try to reach these Trump voters with people who can relate to them and they can relate to.

But the question is, how do you even do that when so many of his supporters believe a lie? How is NBC, how is CNN, how are any of these TV networks, if they have decided that this is their mission, how are they supposed to speak to people who believe something fundamentally untrue as a core part of their political identity?

That’s the catch-22. How do you get that Trump movement person who’s also an insider, when the litmus test to be an insider in the Trump movement is to believe in the denialism or at least say you do? So that’s a real journalistic problem. And the thing that we haven’t really touched here is, what are these networks doing day in and day out?

They’re not producing reported pieces, which I think is a little easier. You just report the news. You go out into the world. You talk to people, and then you present it to the world as a nuanced portrait of the country. This thing is true. This thing is false. Again, in many cases, pretty straightforward. But their bread and butter is talking heads. It’s live. It’s not edited. It’s not that much reported.

So their whole business model especially, again, on cable, which has 24 hours to fill, is talking heads. And if you want the perspective from the Trump movement, journalistically, especially when it comes to denialism, but when it comes to some other major subjects in American life, you’re walking into a place where they’re going to say things that aren’t true, that don’t pass your journalistic standards, the most basic standards of journalism.

Right. So you’re saying if TV sticks with this model, the kind of low cost, lots of talk approach to news, then they are going to have to solve the riddle of who to bring on, who represents Trump’s America if they want that audience. And now they’ve got this red line that they’ve established, that that person can’t be someone who denies the 2020 election reality. But like you just said, that’s the litmus test for being in Trump’s orbit.

So this doesn’t really look like a conundrum. This looks like a bit of a crisis for TV news because it may end up meaning that they can’t hire that person that they need for this model, which means that perhaps a network like NBC does need to wave goodbye to a big segment of these viewers and these eyeballs who support Trump.

I mean, on the one hand, they are not ready to do that, and they would never concede that that’s something they’re ready to do. The problem is barring some kind of change in their news model, there’s no solution to this.

But why bar changes to their news model, I guess, is the question. Because over the years, it’s gotten more and more expensive to produce news, the news that I’m talking about, like recorded packages and what we refer to as reporting. Just go out and report the news.

Don’t gab about it. Just what’s going on, what’s true, what’s false. That’s actually very expensive in television. And they don’t have the kind of money they used to have. So talking heads are their way to do programming at a level where they can afford it.

They do some packages. “60 Minutes” still does incredible work. NBC does packages, but the lion’s share of what they do is what we’re talking about. And that’s not going to change because the economics aren’t there.

So then a final option, of course, to borrow something Chris Licht said, is that a network like NBC perhaps doesn’t put a jersey on, but accepts the reality that a lot of the world sees them wearing a jersey.

Yeah. I mean, nobody wants to be seen as wearing a jersey in our business. But maybe what they really have to accept is that we’re just sticking to the true facts, and that may look like we’re wearing a jersey, but we’re not. And that may, at times, look like it’s lining up more with the Democrats, but we’re not.

If Trump is lying about a stolen election, that’s not siding against him. That’s siding for the truth, and that’s what we’re doing. Easier said than done. And I don’t think any of these concepts are new.

I think there have been attempts to do that, but it’s the world they’re in. And it’s the only option they really have. We’re going to tell you the truth, even if it means that we’re going to lose a big part of the country.

Well, Jim, thank you very much.

Thank you, Michael.

Here’s what else you need to know today.

[PROTESTERS CHANTING]

Over the weekend, thousands of protesters took to the streets of Tel Aviv and Jerusalem in some of the largest domestic demonstrations against the government of Prime Minister Benjamin Netanyahu since Israel invaded Gaza in the fall.

[NON-ENGLISH SPEECH]

Some of the protesters called on Netanyahu to reach a cease fire deal that would free the hostages taken by Hamas on October 7. Others called for early elections that would remove Netanyahu from office.

During a news conference on Sunday, Netanyahu rejected calls for early elections, saying they would paralyze his government at a crucial moment in the war.

Today’s episode was produced by Rob Szypko, Rikki Novetsky, and Alex Stern, with help from Stella Tan.

It was edited by Brendan Klinkenberg with help from Rachel Quester and Paige Cowett. Contains original music by Marion Lozano, Dan Powell, and Rowan Niemisto and was engineered by Chris Wood. Our theme music is by Jim Brunberg and Ben Landsverk of Wonderly.

That’s it for “The Daily.” I’m Michael Barbaro. See you tomorrow.

  • April 1, 2024   •   36:14 Ronna McDaniel, TV News and the Trump Problem

Hosted by Michael Barbaro

Featuring Jim Rutenberg


Ronna McDaniel’s time at NBC was short. The former Republican National Committee chairwoman was hired as an on-air political commentator but released just days later after an on-air revolt by the network’s leading stars.

Jim Rutenberg, a writer at large for The Times, discusses the saga and what it might reveal about the state of television news heading into the 2024 presidential race.

On today’s episode


Jim Rutenberg , a writer at large for The New York Times.


Background reading

Ms. McDaniel’s appointment had been immediately criticized by reporters at the network and by viewers on social media.

The former Republican Party leader tried to downplay her role in efforts to overturn the 2020 election. A review of the record shows she was involved in some key episodes.


The Daily is made by Rachel Quester, Lynsea Garrison, Clare Toeniskoetter, Paige Cowett, Michael Simon Johnson, Brad Fisher, Chris Wood, Jessica Cheung, Stella Tan, Alexandra Leigh Young, Lisa Chow, Eric Krupke, Marc Georges, Luke Vander Ploeg, M.J. Davis Lin, Dan Powell, Sydney Harper, Mike Benoist, Liz O. Baylen, Asthaa Chaturvedi, Rachelle Bonja, Diana Nguyen, Marion Lozano, Corey Schreppel, Rob Szypko, Elisheba Ittoop, Mooj Zadie, Patricia Willens, Rowan Niemisto, Jody Becker, Rikki Novetsky, John Ketchum, Nina Feldman, Will Reid, Carlos Prieto, Ben Calhoun, Susan Lee, Lexie Diao, Mary Wilson, Alex Stern, Dan Farrell, Sophia Lanman, Shannon Lin, Diane Wong, Devon Taylor, Alyssa Moxley, Summer Thomad, Olivia Natt, Daniel Ramirez and Brendan Klinkenberg.

Our theme music is by Jim Brunberg and Ben Landsverk of Wonderly. Special thanks to Sam Dolnick, Paula Szuchman, Lisa Tobin, Larissa Anderson, Julia Simon, Sofia Milan, Mahima Chablani, Elizabeth Davis-Moorer, Jeffrey Miranda, Renan Borelli, Maddy Masiello, Isabella Anderson and Nina Lassam.

Jim Rutenberg is a writer at large for The Times and The New York Times Magazine and writes most often about media and politics.

Elsevier - PMC COVID-19 Collection

Artificial intelligence in drug discovery and development

  • Artificial Intelligence (AI) has revolutionized many aspects of the pharmaceutical industry.
  • AI assistance helps pharma companies to improve the overall product life cycle.
  • AI can be implemented in pharma from drug discovery to product management.
  • Future challenges related to AI, and their respective solutions, are expounded.

Artificial Intelligence (AI) has recently started to gear up its applications in various sectors of society, with the pharmaceutical industry as a front-runner beneficiary. This review highlights the impactful use of AI in diverse areas of the pharmaceutical sector, including drug discovery and development, drug repurposing, improving pharmaceutical productivity, and clinical trials, thus reducing the human workload as well as achieving targets in a short period. The tools and techniques utilized in enforcing AI, ongoing challenges and ways to overcome them, and the future of AI in the pharmaceutical industry are also discussed.

Artificial intelligence-integrated drug discovery and development has accelerated the growth of the pharmaceutical sector, leading to a revolutionary change in the pharma industry. Here, we discuss areas of integration, tools, and techniques utilized in enforcing AI, ongoing challenges, and ways to overcome them.


Artificial intelligence: things to know

Over the past few years, there has been a drastic increase in data digitalization in the pharmaceutical sector. However, this digitalization comes with the challenge of acquiring, scrutinizing, and applying that knowledge to solve complex clinical problems [1] . This motivates the use of AI, because it can handle large volumes of data with enhanced automation [2] . AI is a technology-based system involving various advanced tools and networks that can mimic human intelligence, while not threatening to replace human physical presence completely 3,4 . AI utilizes systems and software that can interpret and learn from the input data to make independent decisions for accomplishing specific objectives. Its applications are continuously being extended in the pharmaceutical field, as described in this review. According to the McKinsey Global Institute, the rapid advances in AI-guided automation are likely to change the work culture of society completely 5,6 .

AI: networks and tools

AI involves several method domains, such as reasoning, knowledge representation and solution search, among which machine learning (ML) is a fundamental paradigm. ML uses algorithms that can recognize patterns within a set of classified data. A subfield of ML is deep learning (DL), which engages artificial neural networks (ANNs). These comprise a set of interconnected, sophisticated computing elements involving ‘perceptrons’ analogous to human biological neurons, mimicking the transmission of electrical impulses in the human brain [7] . ANNs constitute a set of nodes, each receiving a separate input, ultimately converting them to output, either singly or multi-linked, using algorithms to solve problems [8] . ANNs come in various types, including multilayer perceptron (MLP) networks, recurrent neural networks (RNNs), and convolutional neural networks (CNNs), which utilize either supervised or unsupervised training procedures 9,10 .
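The ‘perceptron’ described above, a node that weights its inputs and converts them to an output, can be sketched in a few lines of Python. This is a toy illustration of the concept, not code from any of the cited tools; the data set (the logical AND function) is chosen only because it is small and linearly separable:

```python
# Minimal perceptron: a single artificial neuron that weights its
# inputs, adds a bias and fires through a step activation.
def perceptron(inputs, weights, bias):
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

# Supervised training with the classic perceptron learning rule:
# nudge the weights by the prediction error on each sample.
def train(samples, epochs=20, lr=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - perceptron(inputs, weights, bias)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Toy data: the logical AND function, which is linearly separable.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(AND)
print([perceptron(x, w, b) for x, _ in AND])  # learns AND: [0, 0, 0, 1]
```

Stacking many such units into layers, and replacing the step activation with a differentiable one, is what turns this into the ANNs discussed in the text.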

MLP networks, with applications including pattern recognition, optimization aids, process identification, and control, are usually trained by supervised training procedures, operate in a single direction only, and can be used as universal pattern classifiers [11] . RNNs are networks with a closed loop that can memorize and store information, such as Boltzmann machines and Hopfield networks 11,12 . CNNs are a series of dynamic systems with local connections, characterized by their topology, and are used in image and video processing, biological system modeling, processing of complex brain functions, pattern recognition, and sophisticated signal processing [13] . More complex forms include Kohonen networks, RBF networks, LVQ networks, counter-propagation networks, and ADALINE networks 9,11 . Examples of method domains of AI are summarized in Figure 1 .
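The single-direction data flow of an MLP can be illustrated with a minimal forward pass in NumPy. The weights below are random placeholders rather than trained parameters; a real classifier would learn them by the supervised procedures mentioned above:

```python
import numpy as np

# Activation functions: ReLU for the hidden layer, sigmoid to squash
# the output into a (0, 1) "class probability".
def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass of a tiny MLP: information flows in one direction only,
# input -> hidden layer -> output, as described for MLP networks.
def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = relu(x @ w_hidden + b_hidden)   # hidden-layer activations
    return sigmoid(hidden @ w_out + b_out)   # one probability per sample

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                      # 4 samples, 3 input features
w_h, b_h = rng.normal(size=(3, 5)), np.zeros(5)  # 5 hidden units
w_o, b_o = rng.normal(size=(5, 1)), np.zeros(1)

probs = mlp_forward(x, w_h, b_h, w_o, b_o)
print(probs.shape)  # (4, 1)
```

Training would add a backward pass that propagates the prediction error through these same layers to update the weights.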

Figure 1

Method domains of artificial intelligence (AI). This figure shows different AI method domains, along with their subfields, that can be implemented in different fields of drug discovery and development.

Several tools have been developed based on the networks that form the core architecture of AI systems. One such tool is the International Business Machines (IBM) Watson supercomputer (IBM, New York, USA). It was designed to assist in the analysis of a patient’s medical information and its correlation with a vast database, suggesting treatment strategies for cancer. The system can also be used for the rapid detection of diseases, as demonstrated by its ability to detect breast cancer in only 60 s 14,15 .

AI in the lifecycle of pharmaceutical products

Involvement of AI in the development of a pharmaceutical product from the bench to the bedside can be imagined, given that it can aid rational drug design [16] ; assist in decision making; determine the right therapy for a patient, including personalized medicines; and manage the clinical data generated and use it for future drug development [17] . E-VAI is an analytical and decision-making AI platform developed by Eularis, which uses ML algorithms along with an easy-to-use user interface to create analytical roadmaps based on competitors, key stakeholders, and currently held market share to predict the key drivers of pharmaceutical sales [18] . This helps marketing executives to allocate resources for maximum market share gain, reverse poor sales, and anticipate where to make investments. Different applications of AI in drug discovery and development are summarized in Figure 2 .

Figure 2

Applications of artificial intelligence (AI) in different subfields of the pharmaceutical industry, from drug discovery to pharmaceutical product management.

AI in drug discovery

The vast chemical space, comprising more than 10^60 molecules, fosters the development of a large number of drug molecules [19] . However, the lack of advanced technologies limits the drug development process, making it a time-consuming and expensive task, which can be addressed by using AI [15] . AI can recognize hit and lead compounds, and provide quicker validation of the drug target and optimization of the drug structure design 19,20 . Different applications of AI in drug discovery are depicted in Figure 3 .

Figure 3

Role of artificial intelligence (AI) in drug discovery. AI can be used effectively in different parts of drug discovery, including drug design, chemical synthesis, drug screening, polypharmacology, and drug repurposing.

Despite its advantages, AI faces some significant data challenges, such as the scale, growth, diversity, and uncertainty of the data. The data sets available for drug development in pharmaceutical companies can involve millions of compounds, and traditional ML tools might not be able to deal with data of this type. Quantitative structure-activity relationship (QSAR)-based computational models can quickly predict simple physicochemical parameters, such as log P or log D, for large numbers of compounds. However, these models are some way from predicting complex biological properties, such as the efficacy and adverse effects of compounds. In addition, QSAR-based models face problems such as small training sets, experimental error in training-set data, and lack of experimental validation. To overcome these challenges, recently developed AI approaches, such as DL and relevant modeling studies, can be implemented for safety and efficacy evaluations of drug molecules based on big data modeling and analysis. In 2012, Merck supported a QSAR ML challenge to observe the advantages of DL in the drug discovery process; DL models showed significantly better predictivity than traditional ML approaches for 15 absorption, distribution, metabolism, excretion, and toxicity (ADMET) data sets of drug candidates 21,22 .

The virtual chemical space is enormous and can be viewed as a geographical map of molecules, illustrating the distribution of molecules and their properties. The idea behind charting chemical space is to collect positional information about molecules within the space to search for bioactive compounds; virtual screening (VS) then helps to select appropriate molecules for further testing. Several chemical space databases are open access, including PubChem, ChemBank, DrugBank, and ChemDB.

Numerous in silico methods for the virtual screening of compounds from virtual chemical spaces, along with structure-based and ligand-based approaches, provide better profile analysis, faster elimination of nonlead compounds, and selection of drug molecules at reduced expense [19] . Drug design algorithms based on representations such as Coulomb matrices and molecular fingerprints consider the physical, chemical, and toxicological profiles of compounds to select a lead [23] .
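Fingerprint-based virtual screening of the kind these algorithms build on can be reduced to a similarity ranking against a query molecule. The sketch below uses the Tanimoto coefficient on fingerprints represented as sets of on-bits; the compound names and bit indices are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Hypothetical fingerprints: sets of hashed substructure indices.
query = {1, 4, 7, 9, 15}
library = {
    "cmpd_A": {1, 4, 7, 9, 16},
    "cmpd_B": {2, 5, 8},
    "cmpd_C": {1, 4, 9, 15, 20, 21},
}

# Rank the library by similarity to the query, most similar first.
ranked = sorted(library, key=lambda m: tanimoto(query, library[m]), reverse=True)
```

In practice, toolkits such as RDKit generate the fingerprints; the ranking step is exactly this.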

Various parameters, such as predictive models, the similarity of molecules, the molecule generation process, and the application of in silico approaches can be used to predict the desired chemical structure of a compound 20 , 24 . Pereira et al. presented a new system, DeepVS, for the docking of 40 receptors and 2950 ligands, which showed exceptional performance when 95 000 decoys were tested against these receptors [25] . Another approach applied a multiobjective automated replacement algorithm to optimize the potency profile of a cyclin-dependent kinase-2 inhibitor by assessing its shape similarity, biochemical activity, and physicochemical properties [26] .

QSAR modeling tools have been used to identify potential drug candidates and have evolved into AI-based QSAR approaches, such as linear discriminant analysis (LDA), support vector machines (SVMs), random forest (RF), and decision trees, which can be applied to speed up QSAR analysis [27,28,29] . King et al. found a negligible statistical difference when comparing the ability of six AI algorithms to rank anonymous compounds in terms of biological activity with that of traditional approaches [30] .

AI in drug screening

The process of discovering and developing a drug can take over a decade and costs US$2.8 billion on average. Even then, nine out of ten therapeutic molecules fail Phase II clinical trials or regulatory approval [31,32] . Algorithms such as nearest-neighbour classifiers, RF, extreme learning machines, SVMs, and deep neural networks (DNNs) are used for VS based on synthesis feasibility and can also predict in vivo activity and toxicity [31,33] . Several biopharmaceutical companies, such as Bayer, Roche, and Pfizer, have teamed up with IT companies to develop platforms for the discovery of therapies in areas such as immuno-oncology and cardiovascular diseases [19] . The aspects of VS to which AI has been applied are discussed below.

Prediction of the physicochemical properties

Physicochemical properties, such as solubility, partition coefficient (logP), degree of ionization, and intrinsic permeability, indirectly affect a drug's pharmacokinetic properties and its target receptor family and, hence, must be considered when designing a new drug [34] . Different AI-based tools can be used to predict physicochemical properties. For example, ML models can be trained on the large data sets produced during previous compound optimization campaigns [35] . Drug design algorithms use molecular descriptors, such as SMILES strings, potential energy measurements, electron density around the molecule, and 3D atomic coordinates, to generate feasible molecules via DNNs and thereby predict their properties [36] .

Zang et al. created a quantitative structure–property relationship (QSPR) workflow to determine six physicochemical properties of environmental chemicals drawn from the Environmental Protection Agency (EPA) Estimation Program Interface (EPI) Suite [35] . Neural networks based on the ADMET Predictor and the ALGOPS program have been used to predict the lipophilicity and solubility of various compounds [37] . DL methods, such as undirected graph recursive neural networks and graph-based convolutional neural networks (CVNN), have been used to predict the solubility of molecules [38] .

In several instances, artificial neural network (ANN)-based models, graph kernels, and kernel ridge-based models have been developed to predict the acid dissociation constants of compounds [35,39] . Similarly, cell lines, such as Madin–Darby canine kidney cells and human colon adenocarcinoma (Caco-2) cells, have been used to generate cellular permeability data for diverse classes of molecules, which are subsequently fed to AI-assisted predictors [34] .

Kumar et al. developed six predictive models [SVMs, ANNs, k-nearest neighbour algorithms, LDAs, probabilistic neural network algorithms, and partial least squares (PLS)] trained on 745 compounds; these were later used on 497 compounds to predict their intestinal absorptivity based on parameters including molecular surface area, molecular mass, total hydrogen count, molecular refractivity, molecular volume, logP, total polar surface area, the sum of E-state indices, solubility index (log S), and rotatable bonds [40] . Along similar lines, RF- and DNN-based in silico models were developed to determine the human intestinal absorption of a variety of chemical compounds [41] . Thus, AI has a significant role in drug development, predicting not only the desired physicochemical properties of a molecule, but also its desired bioactivity.
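A k-nearest-neighbour absorption predictor of the type listed among the models above can be sketched in a few lines. The descriptor values and absorption labels below are invented for illustration, and only two descriptors (polar surface area and logP) stand in for the full descriptor set described in the text.

```python
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Toy descriptor vectors: (polar surface area, logP) -> absorption class.
train = [
    ((30.0, 2.5), "high"), ((45.0, 1.8), "high"), ((50.0, 3.0), "high"),
    ((120.0, -0.5), "low"), ((140.0, 0.2), "low"), ((110.0, -1.0), "low"),
]
```

Real workflows would standardize the descriptors first, since raw Euclidean distance is dominated by the descriptor with the largest numeric range.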

Prediction of bioactivity

The efficacy of a drug molecule depends on its affinity for the target protein or receptor. Drug molecules that show no interaction with or affinity for the targeted protein cannot deliver a therapeutic response. In some instances, a developed drug molecule may also interact with unintended proteins or receptors, leading to toxicity. Hence, predicting drug target binding affinity (DTBA) is vital for assessing drug–target interactions. AI-based methods can estimate the binding affinity of a drug by considering either the features or the similarities of the drug and its target. Feature-based methods recognize the chemical moieties of the drug and the target to determine feature vectors. By contrast, similarity-based methods consider the similarity between drugs and between targets, and assume that similar drugs will interact with the same targets [42] .

Web applications, such as ChemMapper and the similarity ensemble approach (SEA), are available for predicting drug–target interactions [43] . Many ML and DL strategies have been used to determine DTBA, including KronRLS, SimBoost, DeepDTA, and PADME. ML-based approaches, such as Kronecker-regularized least squares (KronRLS), evaluate the similarity between drugs and between protein molecules to determine DTBA. Similarly, SimBoost uses regression trees to predict DTBA and considers both feature-based and similarity-based interactions. Drug features derived from SMILES, the ligand maximum common substructure (LMCS), extended connectivity fingerprints, or a combination thereof can also be considered [42] .
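The similarity-based assumption behind methods such as KronRLS can be illustrated with a toy predictor that scores an unseen drug–target pair as a similarity-weighted average of known affinities. All drug and target names, similarity values, and affinities below are invented for illustration; real methods fit regularized models rather than taking a direct weighted average.

```python
def predict_affinity(known, sim_drug, sim_target, drug, target):
    """Similarity-weighted average of known affinities, with
    weight(d', t') = sim_drug(drug, d') * sim_target(target, t')."""
    num = den = 0.0
    for (d, t), affinity in known.items():
        w = sim_drug[drug][d] * sim_target[target][t]
        num += w * affinity
        den += w
    return num / den if den else None

# Toy pairwise similarities (e.g., Tanimoto for drugs, sequence identity for targets).
sim_drug = {"D2": {"D1": 0.8, "D2": 1.0}}
sim_target = {"T1": {"T1": 1.0, "T2": 0.2}}
known = {("D1", "T1"): 7.0, ("D2", "T2"): 5.0}  # e.g., pKd values

# Predict the unmeasured pair (D2, T1): pulled toward the affinity of the
# similar drug D1 against the same target T1.
pred = predict_affinity(known, sim_drug, sim_target, "D2", "T1")
```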

DL approaches have shown improved performance compared with ML because they apply network-based methods that do not depend on the availability of a 3D protein structure [43] . DeepDTA, PADME, WideDTA, and DeepAffinity are DL methods used to measure DTBA. DeepDTA accepts drug input as SMILES strings, providing a 1D representation of the drug structure, and protein input as amino acid sequences [44] . WideDTA is a CVNN-based DL method that incorporates ligand SMILES (LS), amino acid sequences, the LMCS, and protein domains and motifs as input data for assessing binding affinity [45] .

DeepAffinity and Protein And Drug Molecule interaction prEdiction (PADME) are similar to the approaches described earlier [46] . DeepAffinity is an interpretable DL model that uses both recurrent neural networks (RNNs) and CNNs and both unlabeled and labeled data. It takes the compound in SMILES format and represents the protein sequence by its structural and physicochemical properties [47] . PADME is a DL-based platform that uses feed-forward neural networks to predict drug–target interactions (DTIs). It takes a combination of drug and target protein features as input and forecasts the interaction strength between the two; the drug and the target are represented by the SMILES string and the protein sequence composition (PSC), respectively [46] . Unsupervised ML techniques, such as MANTRA and PREDICT, can be used to forecast the therapeutic efficacy of drugs and the target proteins of known and unknown pharmaceuticals, and can also be extended to drug repurposing and to interpreting the molecular mechanisms of therapeutics. MANTRA groups compounds with similar gene expression profiles using the CMap data set, clustering compounds predicted to share a mechanism of action and a common biological pathway [43] . The bioactivity of a drug also includes ADME data. AI-based tools, such as XenoSite, FAME, and SMARTCyp, help to determine the sites of metabolism of a drug. In addition, software such as CypRules, MetaSite, MetaPred, SMARTCyp, and WhichCyp has been used to identify the specific CYP450 isoforms that mediate the metabolism of a particular drug. The clearance pathways of 141 approved drugs were predicted with high accuracy by SVM-based predictors [48] .

Prediction of toxicity

Predicting the toxicity of a drug molecule is vital to avoid toxic effects. Cell-based in vitro assays are often used as preliminary studies, followed by animal studies, to identify the toxicity of a compound, increasing the expense of drug discovery. Several web-based tools, such as LimTox, pkCSM, admetSAR, and Toxtree, are available to help reduce this cost [35] . Advanced AI-based approaches look for similarities among compounds or project the toxicity of a compound from its input features. The Tox21 Data Challenge, organized by the National Institutes of Health, the Environmental Protection Agency (EPA), and the US Food and Drug Administration (FDA), was an initiative to evaluate several computational techniques for forecasting the toxicity of 12 707 environmental compounds and drugs [35] ; an algorithm named DeepTox outperformed all other methods by identifying static and dynamic features within the chemical descriptors of the molecules, such as molecular weight (MW) and van der Waals volume, and could efficiently predict the toxicity of a molecule based on 2500 predefined toxicophore features [49] . The different AI tools used in drug discovery are listed in Table 1 .

Table 1. Examples of AI tools used in drug discovery.

SEA was used to evaluate the safety of 656 marketed drugs by predicting their activity against 73 unintended targets that might produce adverse effects [43] . eToxPred, developed using an ML-based approach, was applied to estimate the toxicity and synthetic feasibility of small organic molecules and achieved an accuracy of up to 72% [48] . Similarly, open-source tools, such as TargeTox and PrOCTOR, are used for toxicity prediction [50] . TargeTox is a biological-network-based drug toxicity risk prediction method that uses the guilt-by-association principle, whereby entities sharing functional properties occupy similar positions in biological networks [51] . It combines protein network data with pharmacological and functional properties in an ML classifier to predict drug toxicity [52] . PrOCTOR was trained using an RF model and takes into account drug-likeness properties, molecular features, target-based features, and properties of the protein targets to generate a 'PrOCTOR score' that forecasts whether a drug will fail in clinical trials owing to toxicity. It also identified FDA-approved drugs for which adverse drug events were later reported [53] . In another approach, Tox_(R)CNN, a deep CVNN method, evaluated the cytotoxicity of drugs applied to DAPI-stained cells [54] .
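Toxicophore-based featurization of the kind DeepTox builds on can be sketched as a simple presence/absence encoding over a predefined alert list. The toxicophore names below, and the idea of flagging a molecule once a threshold count of alerts is reached, are illustrative simplifications rather than a validated alert set.

```python
# Illustrative toxicophore alert list; real sets contain thousands of patterns.
TOXICOPHORES = {"nitro_aromatic", "epoxide", "michael_acceptor"}

def toxicophore_features(substructures):
    """Binary feature vector: which predefined toxicophores are present
    in the molecule's set of detected substructures."""
    return {t: (t in substructures) for t in sorted(TOXICOPHORES)}

def flag_toxic(substructures, threshold=1):
    """Flag a molecule once it contains at least `threshold` alerts."""
    return sum(toxicophore_features(substructures).values()) >= threshold
```

In DeepTox these binary features were inputs to a deep network rather than a hard threshold, but the encoding step is the same.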

AI in designing drug molecules

Prediction of the target protein structure

When developing a drug molecule, it is essential to assign the correct target for successful treatment. Numerous proteins are involved in the development of a disease and, in some cases, are overexpressed. Hence, for selective targeting of a disease, it is vital to predict the structure of the target protein when designing the drug molecule. AI can assist structure-based drug discovery by predicting the 3D protein structure, because a design tailored to the chemical environment of the target protein site helps to predict the effect of a compound on the target, along with safety considerations, before synthesis or production [55] . AlphaFold, an AI tool based on DNNs, analyzes the distances between adjacent amino acids and the corresponding angles of the peptide bonds to predict 3D target protein structures, and demonstrated excellent results by correctly predicting 25 of 43 structures.

In a study by AlQuraishi, an RNN was used to predict protein structure. The author devised a model with three stages (computation, geometry, and assessment), termed a recurrent geometric network (RGN). Here, the primary protein sequence is encoded, and the torsional angles for a given residue, together with the partially completed backbone from the upstream geometric unit, are taken as input to produce a new backbone as output; the final unit outputs the 3D structure. Deviation between predicted and experimental structures was assessed using the distance-based root mean square deviation (dRMSD) metric, and the RGN parameters were optimized to minimize the dRMSD between experimental and predicted structures [56] . AlQuraishi predicted that his method would be quicker than AlphaFold in terms of the time taken to predict a protein structure, although AlphaFold is likely to be more accurate for proteins with sequences similar to the reference structures [57] .
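The dRMSD metric used to train the RGN compares all pairwise intra-structure distances, which makes it invariant to rigid translation of the coordinates. A minimal implementation on toy 3-atom 'structures':

```python
import itertools
import math

def drmsd(coords_a, coords_b):
    """Distance-based RMSD: RMS difference between all pairwise
    intra-structure distances of two conformations with matched atom order."""
    pairs = list(itertools.combinations(range(len(coords_a)), 2))
    sq = sum(
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(sq / len(pairs))

# Toy 3-atom structures: b is a rigid translation of a, so dRMSD is zero;
# c is a scaled-up copy, so dRMSD is positive.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)]
c = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
```

Because only internal distances are compared, no structural superposition is needed before scoring, which is one reason the metric suits end-to-end training.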

A study was conducted to predict the 2D structure of proteins using MATLAB with a nonlinear three-layered NN toolbox based on feed-forward supervised learning and a backpropagation error algorithm. MATLAB was used to train the input and output data sets, with the NNs serving as learning algorithms and performance evaluators. The accuracy in predicting 2D structure was 62.72% [58] .

Predicting drug–protein interactions

Drug–protein interactions have a vital role in the success of a therapy. Predicting the interaction of a drug with a receptor or protein is essential to understand its efficacy and effectiveness, allows the repurposing of drugs, and helps to prevent unwanted polypharmacology [55] . Various AI methods have been useful in the accurate prediction of ligand–protein interactions, ensuring better therapeutic efficacy [55,59] . Wang et al. reported an SVM model, trained on 15 000 protein–ligand interactions based on primary protein sequences and structural characteristics of small molecules, that discovered nine new compounds and their interactions with four crucial targets [60] .

Yu et al. used two RF models to predict possible drug–protein interactions by integrating pharmacological and chemical data, validating them against established platforms, such as SVMs, with high sensitivity and specificity. These models could also predict drug–target associations that can be extended to target–disease and target–target associations, thereby speeding up the drug discovery process [61] . Xiao et al. adopted the synthetic minority over-sampling technique and the neighborhood cleaning rule to obtain optimized data for the subsequent development of iDrugTarget, a combination of four subpredictors (iDrug-GPCR, iDrug-Chl, iDrug-Enz, and iDrug-NR) for identifying interactions between a drug and G-protein-coupled receptors (GPCRs), ion channels, enzymes, and nuclear receptors (NRs), respectively. When this predictor was compared with existing predictors in target-jackknife tests, it surpassed them in both prediction accuracy and consistency [62] .

The ability of AI to predict drug–target interactions has also been used to assist the repurposing of existing drugs and to avoid polypharmacology. Repurposing an existing drug qualifies it directly for Phase II clinical trials [19] . This also reduces expenditure, because relaunching an existing drug costs ∼US$8.4 million, compared with ∼US$41.3 million for launching a new drug entity [63] . The 'guilt-by-association' approach, based on either knowledge-based or computationally driven networks, can be used to forecast novel associations between a drug and a disease [64] . In computationally driven networks, ML approaches are widely used, including SVMs, NNs, logistic regression, and DL. Logistic regression platforms, such as PREDICT and SPACE, and other ML approaches consider drug–drug and disease–disease similarity, the similarity between target molecules, chemical structure, and gene expression profiles when repurposing a drug [65] .
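A minimal guilt-by-association scorer, assuming only a drug–drug similarity matrix and a table of known indications (all drug names, diseases, and similarity values below are invented), might look like:

```python
def repurposing_scores(candidate, indications, similarity):
    """Guilt-by-association: score a candidate drug against each disease
    by its maximum similarity to any drug already indicated for that disease."""
    return {
        disease: max(similarity[candidate].get(d, 0.0) for d in drugs)
        for disease, drugs in indications.items()
    }

# Hypothetical data: known indications and pairwise drug-drug similarities.
indications = {"hypertension": ["drugA", "drugB"], "depression": ["drugC"]}
similarity = {"drugX": {"drugA": 0.9, "drugB": 0.4, "drugC": 0.2}}

scores = repurposing_scores("drugX", indications, similarity)
best = max(scores, key=scores.get)  # disease with the strongest association
```

Platforms such as PREDICT combine several similarity layers (chemical, target, expression) and fit a classifier over them; this sketch collapses that to a single similarity lookup.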

Cellular network-based deep learning technology (deepDTnet) has been explored to predict new therapeutic uses for topotecan, currently used as a topoisomerase inhibitor: it could also be used to treat multiple sclerosis by inhibiting human retinoic acid receptor-related orphan receptor-gamma t (ROR-γt) [66] . This platform is currently under a provisional US patent. Self-organizing maps (SOMs), an unsupervised category of ML, are used in drug repurposing. They apply a ligand-based approach to search for novel off-targets of a set of drug molecules, training the system on a defined number of compounds with recognized biological activities and later using it to analyze different compounds [67] . In a recent study, DNNs were used to repurpose existing drugs with proven activity against SARS-CoV, HIV, and influenza virus, as well as drugs that are 3C-like protease inhibitors. Extended connectivity fingerprints (ECFPs), functional-class fingerprints (FCFPs), and the octanol–water partition coefficient (ALogP_count) were used to train the AI platform. From the results, it was concluded that 13 of the screened drugs could be carried toward further development based on their cytotoxicity and viral inhibition [68] .

Drug–protein interactions can also predict the chances of polypharmacology, which is the tendency of a drug molecule to interact with multiple receptors producing off-target adverse effects [69] . AI can design a new molecule based on the rationale of polypharmacology and aid in the generation of safer drug molecules [70] . AI platforms such as SOM, along with the vast databases available, can be used to link several compounds to numerous targets and off-targets. Bayesian classifiers and SEA algorithms can be used to establish links between the pharmacological profiles of drugs and their possible targets [67] .

Li et al. demonstrated the use of KinomeX, an AI-based online platform that uses DNNs to detect the polypharmacology of kinase inhibitors based on their chemical structures. The platform uses a DNN trained with ∼14 000 bioactivity data points covering >300 kinases. It thus has practical application in studying the overall selectivity of a compound toward the kinase family, and particular subfamilies of kinases, helping to design novel chemical modifiers. The study used NVP-BHG712 as a model compound and predicted its primary targets and off-targets with reasonable accuracy [71] . Another prominent instance is Cyclica's cloud-based proteome-screening AI platform, Ligand Express, which is used to find receptors that can interact with a particular small molecule (described by its SMILES string) and to produce on- and off-target interactions, helping to understand the possible adverse effects of the drug [72] .

AI in de novo drug design

Over the past few years, the de novo drug design approach has been widely used to design drug molecules. The traditional method of de novo drug design is being replaced by evolving DL methods, because the traditional approach suffers from complicated synthesis routes and difficulty in predicting the bioactivity of novel molecules [36] . Computer-aided synthesis planning can suggest millions of synthesizable structures and predict several different synthesis routes for them [73] .

Grzybowski et al. developed the Chematica program [74] , now renamed Synthia, which encodes a set of rules into the machine and proposes possible synthesis routes for eight medicinally essential targets. The program has proven efficient both in improving yield and in reducing cost, is capable of providing alternative synthesis strategies for patented products, and is expected to help in the synthesis of compounds that have not yet been made. Similarly, DNNs trained on the rules of organic chemistry and retrosynthesis, aided by Monte Carlo tree search and symbolic AI, enable reaction prediction and drug discovery and design much faster than traditional methods [75,76] .

Coley et al. developed a framework in which a rigid forward reaction template was applied to a group of reactants to synthesize chemically feasible products with a significant rate of reaction; ML was used to determine the dominant product based on a score given by the NNs [23] . Putin et al. explored a DNN architecture called the reinforced adversarial neural computer (RANC), based on reinforcement learning (RL), for the de novo design of small organic molecules. The platform was trained with molecules represented as SMILES strings and then generated molecules with predefined chemical descriptors in terms of MW, logP, and topological polar surface area (TPSA). RANC was compared with another platform, ORGANIC, and outperformed it in generating unique structures without significant loss of structure length [77] .

An RNN based on long short-term memory (LSTM) was trained on molecules obtained from the ChEMBL database, fed in as SMILES strings, and used to generate a diverse library of molecules for VS. This approach was extended to procure novel molecules directed toward particular targets, such as the 5-HT2A receptor, Staphylococcus aureus , and Plasmodium falciparum [78] .

Popova et al. developed the Reinforcement Learning for Structural Evolution strategy for de novo drug design, which couples generative and predictive DNNs to develop new compounds. The generative model produces novel molecules as SMILES strings based on stack memory, whereas the predictive model forecasts the properties of the generated compounds [79] . Merk et al. also exploited a generative AI model to design retinoid X receptor and PPAR agonists with desired therapeutic effects without requiring complex rules; of the five molecules designed, four showed good modulatory activity in cell assays, emphasizing the value of generative AI for new molecule design [80] . The involvement of AI in the de novo design of molecules can benefit the pharmaceutical sector through advantages such as online learning, simultaneous optimization of already-learned data, and suggestions of possible synthesis routes, leading to swift lead design and development [78,81] .

AI in advancing pharmaceutical product development

The discovery of a novel drug molecule requires its subsequent incorporation in a suitable dosage form with desired delivery characteristics. In this area, AI can replace the older trial and error approach [82] . Various computational tools can resolve problems encountered in the formulation design area, such as stability issues, dissolution, porosity, and so on, with the help of QSPR [83] . Decision-support tools use rule-based systems to select the type, nature, and quantity of the excipients depending on the physicochemical attributes of the drug and operate through a feedback mechanism to monitor the entire process and intermittently modify it [84] .
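A rule-based decision-support tool of this kind can be sketched as a small set of if-then rules mapping drug attributes to excipient recommendations. The attribute names, thresholds, and recommendations below are purely illustrative, not validated formulation rules.

```python
def recommend_excipients(drug):
    """Toy rule base mapping physicochemical drug attributes to excipient
    choices. All thresholds and recommendations are hypothetical."""
    recs = []
    if drug["aqueous_solubility_mg_ml"] < 0.1:
        recs.append("surfactant (solubilizer)")
    if drug["flowability_index"] < 0.5:
        recs.append("glidant")
    if drug["moisture_sensitive"]:
        recs.append("desiccant / moisture-barrier coating")
    if not recs:
        recs.append("standard filler and binder only")
    return recs

# Two hypothetical drug profiles.
poorly_soluble = {"aqueous_solubility_mg_ml": 0.01, "flowability_index": 0.8,
                  "moisture_sensitive": False}
well_behaved = {"aqueous_solubility_mg_ml": 5.0, "flowability_index": 0.9,
                "moisture_sensitive": False}
```

Production expert systems chain many such rules and feed back measured responses, but the core mechanism is this attribute-to-recommendation mapping.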

Guo et al. integrated Expert Systems (ES) and ANN to create a hybrid system for the development of direct-filling hard gelatin capsules of piroxicam in accordance with the specifications of its dissolution profile. The MODEL EXPERT SYSTEM (MES) makes decisions and recommendations for formulation development based on the input parameters. By contrast, ANN uses backpropagation learning to link formulation parameters to the desired response, jointly controlled by the control module, to ensure hassle-free formulation development [82] .

Various mathematical tools, such as computational fluid dynamics (CFD), discrete element modeling (DEM), and the finite element method, have been used to examine the influence of powder flow properties on die filling and tablet compression [85,86] . CFD can also be used to study the impact of tablet geometry on its dissolution profile [87] . Combining these mathematical models with AI could prove to be of immense help in the rapid production of pharmaceutical products.

AI in pharmaceutical manufacturing

With the increasing complexity of manufacturing processes, along with rising demand for efficiency and better product quality, modern manufacturing systems are trying to confer human knowledge on machines, continuously changing manufacturing practice [88] . The incorporation of AI in manufacturing could be a boon for the pharmaceutical industry. Tools such as CFD use Reynolds-averaged Navier–Stokes solvers to study the impact of agitation and stress levels in different equipment (e.g., stirred tanks), supporting the automation of many pharmaceutical operations. Related approaches, such as direct numerical simulation and large eddy simulation, solve complicated flow problems in manufacturing [85] .

The novel Chemputer platform enables the digital automation of molecule synthesis and manufacturing, incorporating various chemical codes and operating through a scripting language known as Chemical Assembly [23] . It has been used successfully to synthesize and manufacture sildenafil, diphenhydramine hydrochloride, and rufinamide, with yield and purity similar to those of manual synthesis [89] . AI technologies can also efficiently estimate the completion of granulation in granulators with capacities ranging from 25 to 600 l [90] . Neuro-fuzzy logic has been used to correlate critical variables with their responses, deriving a polynomial equation that predicts the proportion of granulation fluid to be added and the required speed and diameter of the impeller in both geometrically similar and dissimilar granulators [91] .

DEM has been widely used in the pharmaceutical industry, for example to study the segregation of powders in a binary mixture and the effects of varying blade speed and shape, and to predict the path of tablets in the coating process together with the time tablets spend under the spray zone [85] . ANNs, along with fuzzy models, have been used to study the correlation between machine settings and capping, helping to reduce tablet capping on the manufacturing line [92] .

Meta-classifiers and tablet-classifiers are AI tools that help to govern the quality standard of the final product by indicating possible errors in tablet manufacturing [93] . A patent has been filed for a system capable of determining the optimal combination of drug and dosage regimen for each patient: a processor receives patient information and designs the desired transdermal patch accordingly [94] .

AI in quality control and quality assurance

Manufacturing the desired product from raw materials requires balancing various parameters [93] . Quality control tests on products, as well as maintenance of batch-to-batch consistency, currently require manual intervention, which might not be the best approach in every case, showing the need for AI implementation at this stage [85] . The FDA amended Current Good Manufacturing Practices (cGMP) by introducing a 'Quality by Design' approach to understand the critical operations and specific criteria that govern the final quality of the pharmaceutical product [95] .

Gams et al. used a combination of human efforts and AI, wherein preliminary data from production batches were analyzed and decision trees developed. These were further translated into rules and analyzed by the operators to guide the production cycle in the future [93] . Goh et al. studied the dissolution profile, an indicator of batch-to-batch consistency of theophylline pellets with the aid of ANN, which correctly predicted the dissolution of the tested formulation with an error of <8% [96] .

AI can also be implemented to regulate in-line manufacturing processes and achieve the desired product standard [95] . ANN-based monitoring of the freeze-drying process has been used, applying a combination of self-adaptive evolution with local search and backpropagation algorithms. This can predict the temperature and desiccated-cake thickness at a future time point ( t + Δ t ) for a particular set of operating conditions, eventually helping to keep the final product quality in check [97] .

An automated data entry platform, such as an Electronic Lab Notebook, along with sophisticated, intelligent techniques, can ensure the quality assurance of the product [98] . Also, data mining and various knowledge discovery techniques in the Total Quality Management expert system can be used as valuable approaches in making complex decisions, creating new technologies for intelligent quality control [99] .

AI in clinical trial design

Clinical trials are directed toward establishing the safety and efficacy of a drug product in humans for a particular disease condition and require 6–7 years along with substantial financial investment. However, only one out of ten molecules entering these trials gains successful clearance, which is a massive loss for the industry [100] . These failures can result from inappropriate patient selection, a shortage of technical requirements, and poor infrastructure. With the vast digital medical data now available, however, such failures can be reduced through the implementation of AI [101] .

The enrolment of patients takes up one-third of the clinical trial timeline. Recruitment of suitable patients is key to the success of a clinical trial; failure here contributes to ∼86% of failed trials [102] . AI can assist in selecting only a specific diseased population for recruitment into Phase II and III trials by using patient-specific genome–exposome profile analysis, which can help in the early prediction of the available drug targets in the selected patients [19,101] . Preclinical discovery of molecules, as well as prediction of lead compounds before the start of clinical trials using other aspects of AI, such as predictive ML and other reasoning techniques, helps in the early identification of lead molecules that would pass clinical trials in the selected patient population [101] .

Patient dropout accounts for the failure of 30% of clinical trials, creating additional recruitment requirements for completion of the trial and wasting time and money. This can be avoided by closely monitoring patients and helping them follow the trial protocol [102] . AiCure developed mobile software that monitored regular medication intake by patients with schizophrenia in a Phase II trial, increasing patient adherence by 25% and ensuring successful completion of the trial [19] .

AI in pharmaceutical product management

AI in market positioning

Market positioning is the process of creating an identity for a product in the market to attract consumers, making it an essential element of almost all business strategies as companies seek to establish their own unique identity 103 , 104 . This approach was used in marketing the pioneer brand Viagra, which the company positioned not only for the treatment of erectile dysfunction in men, but also for other problems affecting quality of life [105] .

With technology and e-commerce platforms, it has become easier for companies to gain organic recognition of their brand in the public domain. Search engines are one such technological platform: as the Internet Advertising Bureau has confirmed, they give companies a prominent position in online marketing and help position products in the market. Companies continuously try to rank their websites above those of competitors, gaining brand recognition in a short period [106] .

Other techniques, such as statistical analysis methods and particle swarm optimization algorithms (proposed by Eberhart and Kennedy in 1995) combined with NNs, provide a clearer picture of markets. They can inform the marketing strategy for a product through accurate consumer-demand prediction [107] .
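Particle swarm optimization itself is simple enough to sketch. The following minimal implementation fits the parameters of a hypothetical linear demand model (demand = a − b × price) to made-up observations by minimizing squared error; the data, bounds, and coefficients are illustrative, not from the cited work:

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, lo=-10.0, hi=10.0, seed=0):
    """Minimal particle swarm optimization (Eberhart & Kennedy, 1995)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    w, c1, c2 = 0.7, 1.5, 1.5                   # inertia and acceleration weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy data lying on demand = 9 - 2*price; minimize squared prediction error.
observed = [(1.0, 7.0), (2.0, 5.0), (3.0, 3.0)]
err = lambda p: sum((d - (p[0] - p[1] * pr)) ** 2 for pr, d in observed)
params, loss = pso_minimize(err, dim=2)
```

In a real demand-prediction pipeline the error function would wrap an NN's training loss rather than a two-parameter linear model, but the swarm update rule is the same.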

AI in market prediction and analysis

The success of a company lies in the continuous development and growth of its business. Even with access to substantial funds, R&D output in the pharmaceutical industry is falling because of the failure of companies to adopt new marketing technologies [108] . Advances in digital technologies, referred to as the 'fourth industrial revolution', are enabling innovative digitalized marketing via multicriteria decision-making approaches, which collect and analyze statistical and mathematical data and incorporate human inference to build AI-based decision-making models that explore new marketing methodologies [109] .

AI also helps to comprehensively analyze the fundamental requirements of a product from the customer's point of view and to understand the needs of the market, aiding decision-making through prediction tools. It can also forecast sales and analyze the market. AI-based software engages consumers and raises awareness among physicians by displaying advertisements that direct them to the product site with a single click [110] . In addition, these methods use natural language processing tools to analyze the keywords entered by customers and relate them to the probability of purchasing the product 111 , 112 .
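At its simplest, relating query keywords to purchase probability is a logistic model over keyword features. The sketch below uses entirely hypothetical keyword weights (a real system would learn them from purchase logs and use richer NLP features):

```python
import math

# Hypothetical keyword weights, standing in for coefficients learned
# from historical query-to-purchase data (illustrative values only).
KEYWORD_WEIGHTS = {"buy": 1.8, "price": 1.2, "discount": 1.5, "review": 0.4, "free": -0.6}
BIAS = -2.0

def purchase_probability(query: str) -> float:
    """Estimate purchase probability from a customer query via a
    logistic function over summed keyword weights."""
    tokens = query.lower().split()
    score = BIAS + sum(KEYWORD_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))

p_hot = purchase_probability("buy aspirin discount")     # high-intent query
p_cold = purchase_probability("side effects information")  # informational query
```

Queries containing purchase-intent keywords score well above informational ones, which is the signal an advertising platform would use to decide which users to target.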

Several business-to-business (B2B) companies have introduced self-service technologies that allow customers to browse health products freely, find them by specification, place orders, and track their shipping. Pharmaceutical companies are also introducing online applications, such as 1mg, Medline, Netmeds, and Ask Apollo, to fulfill the unmet needs of patients [109] . Market prediction is also essential for pharmaceutical distribution companies, which can apply AI approaches such as 'Business Intelligent Smart Sales Prediction Analysis', which combines time-series forecasting with real-time application. This helps pharmaceutical companies predict product sales in advance, preventing the cost of excess stock and the customer loss caused by shortages [113] .
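The time-series component of such a sales-prediction pipeline can be as simple as single exponential smoothing; the monthly unit figures below are made up for illustration:

```python
def exp_smooth_forecast(sales, alpha=0.5):
    """Single exponential smoothing: the forecast for the next period is a
    recursively weighted average of history, with weight alpha on recent data."""
    level = sales[0]
    for s in sales[1:]:
        level = alpha * s + (1 - alpha) * level
    return level

monthly_units = [100, 120, 110, 130, 125]       # hypothetical past sales
next_month = exp_smooth_forecast(monthly_units)  # stock-planning forecast
```

Production systems would add trend and seasonality terms (e.g., Holt–Winters) and real-time demand signals, but the stock-versus-shortage trade-off in the text is driven by exactly this kind of forecast.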

AI in product cost

Based on market analysis and the cost incurred in developing a pharmaceutical product, the company determines the final price of the product. The critical concept in applying AI to pricing is harnessing its ability to mimic the reasoning of a human expert when assessing the factors that control the price of a product after its manufacture [114] . Factors such as expenditure on research and development of the drug, strict price-regulation schemes in the country concerned, the length of the exclusivity period, the market share of the innovator drug in the year before patent expiry, the price of the reference product, and price-fixing policies determine the prices of branded and generic drugs [115] .

In ML, the software analyzes large sets of statistical data, such as product development cost, product demand in the market, inventory cost, manufacturing cost, and competitors' product prices, and subsequently develops algorithms for predicting the product price. AI platforms such as Incompetitor, launched by Intelligence Node (founded in 2012), provide a complete retail competitive-intelligence platform that analyzes competitor pricing data and helps retailers and brands to monitor the competition. Wise Athena and Navetti PricePoint enable users to determine the pricing of their products, suggesting that pharmaceutical companies could adopt the same tools to assist product costing [116] .
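The simplest version of "learn a pricing algorithm from cost data" is a least-squares regression from cost features to observed prices. The sketch below uses one hypothetical feature (total per-unit cost) and invented training prices; a real system would regress on all the factors listed above:

```python
def fit_linear(xs, ys):
    """Closed-form least-squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical training data: per-unit cost vs. final list price
# (constructed so that price = 5 + 2*cost exactly).
unit_cost = [10.0, 20.0, 30.0, 40.0]
list_price = [25.0, 45.0, 65.0, 85.0]
a, b = fit_linear(unit_cost, list_price)
suggested = a + b * 25.0   # price suggestion for a new product costing 25.0
```

Competitive-intelligence platforms layer competitor-price monitoring and demand elasticity on top, but the core prediction step is a regression of this shape.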

AI-based advanced applications

AI-based nanorobots for drug delivery

Nanorobots mainly comprise integrated circuits, sensors, a power supply, and secure data backup, all maintained via computational technologies such as AI 117 , 118 . They are programmed to avoid collisions, identify a target, detect and attach to it, and finally be excreted from the body. Advances in nano/microrobots give them the ability to navigate to the targeted site in response to physiological conditions, such as pH, thus improving efficacy and reducing systemic adverse effects [118] . Developing implantable nanorobots for the controlled delivery of drugs and genes requires consideration of parameters such as dose adjustment and sustained and controlled release, and release automation is controlled by AI tools such as NNs, fuzzy logic, and integrators [119] . Microchip implants are used for programmed release as well as to detect the location of the implant in the body.
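To make the fuzzy-logic idea concrete, here is a minimal pH-triggered release controller: two fuzzy rules with triangular membership functions, defuzzified by a weighted average (Sugeno-style). The pH breakpoints and release fractions are invented for illustration, not taken from any cited device:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def release_rate(ph):
    """Two-rule fuzzy controller:
    IF pH is acidic (e.g., tumor microenvironment) THEN release high;
    IF pH is neutral (healthy tissue) THEN release low."""
    acidic = tri(ph, 4.0, 5.5, 7.0)
    neutral = tri(ph, 6.0, 7.4, 9.0)
    high, low = 1.0, 0.1   # rule outputs: fraction of maximum pump rate
    denom = acidic + neutral
    return (acidic * high + neutral * low) / denom if denom else 0.0

r_acid = release_rate(5.5)     # fully acidic -> maximum release
r_neutral = release_rate(7.4)  # physiological pH -> trickle release
r_mid = release_rate(6.5)      # partial membership in both rules
```

The smooth interpolation between rules at intermediate pH is what distinguishes a fuzzy controller from a hard threshold.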

AI in combination drug delivery and synergism/antagonism prediction

Several drug combinations are approved and marketed to treat complex diseases, such as TB and cancer, because they can provide a synergistic effect for quicker recovery 120 , 121 . Selecting precise, promising drugs for combination requires high-throughput screening of a considerable number of candidates, making the process tedious; cancer therapy, for example, can require six or seven drugs in combination. ANNs, logistic regression, and network-based modeling can screen drug combinations and improve the overall dose regimen 120 , 122 . Rashid et al. developed a quadratic phenotype optimization platform to identify optimal combination therapy for bortezomib-resistant multiple myeloma from a collection of 114 FDA-approved drugs. This model recommended the combination of decitabine (Dec) and mitomycin C (MitoC) as the best two-drug combination and Dec, MitoC, and mechlorethamine as the superior three-drug combination [121] .

Combination drug delivery can be more efficient when backed by data on the synergism or antagonism of the drugs administered together. The Master Regulator Inference Algorithm uses 'master regulator genes' to predict synergism with 56% efficiency. Other methods, such as network-based Laplacian regularized least squares for synergistic drug combinations, and RF, can also be used for the same purpose [122] .
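The ML methods above are too involved for a short sketch, but the classical baseline they are typically benchmarked against, Bliss independence scoring, is easy to show. Under Bliss independence the expected combined fractional effect of two independent drugs is E = e_a + e_b − e_a·e_b; an observed combination effect above E indicates synergy, below it antagonism (the effect values below are invented):

```python
def bliss_score(e_a, e_b, e_ab):
    """Bliss excess: observed combination effect minus the effect expected
    if the two drugs acted independently. Positive -> synergy,
    negative -> antagonism. Effects are fractional inhibitions in [0, 1]."""
    expected = e_a + e_b - e_a * e_b
    return e_ab - expected

# Hypothetical fractional inhibitions of each drug alone and combined.
s_syn = bliss_score(0.3, 0.4, 0.8)  # observed 0.8 vs expected 0.58 -> synergy
s_ant = bliss_score(0.3, 0.4, 0.5)  # observed 0.5 vs expected 0.58 -> antagonism
```

Prediction models such as RF are then trained to anticipate this excess from gene expression or network features before any combination is tested in the lab.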

Li et al. developed a synergistic drug combination model using RF to predict synergistic anticancer drug combinations. The model was built on gene expression profiles and various networks, and the authors successfully predicted 28 synergistic anticancer combinations; three of these have been reported, and the remainder might also prove to be important [69] . Similarly, Mason et al. applied an ML approach, called Combination Synergy Estimation, to predict potential synergistic antimalarial combinations based on a data set of 1,540 antimalarial drug compounds [123] .

AI emergence in nanomedicine

Nanomedicine combines nanotechnology and medicine for the diagnosis, treatment, and monitoring of complex diseases, such as HIV, cancer, malaria, asthma, and various inflammatory diseases. In recent years, nanoparticle-modified drug delivery has become important in therapeutics and diagnostics because of its enhanced efficacy and treatment outcomes 121 , 124 . A combination of nanotechnology and AI could provide solutions to many problems in formulation development [125] .

A methotrexate nanosuspension was formulated computationally by studying the interaction energies between drug molecules and monitoring the conditions that could lead to aggregation of the formulation [83] . Coarse-grained simulation, along with chemical calculation, can aid the determination of drug–dendrimer interactions and the evaluation of drug encapsulation within the dendrimer. In addition, software such as LAMMPS and GROMACS 4 can be used to examine the impact of surface chemistry on the internalization of nanoparticles into cells [83] .

AI assisted in the preparation of silicasomes, a combination of iRGD, a tumor-penetrating peptide, with irinotecan-loaded multifunctional mesoporous silica nanoparticles. Because iRGD improves the transcytosis of silicasomes, their uptake increased three- to fourfold, with improved treatment outcomes and enhanced overall survival [124] .

Pharmaceutical market of AI

To decrease the financial cost and the risk of failure that accompany VS, pharmaceutical companies are shifting towards AI. The AI market grew from US$200 million in 2015 to US$700 million in 2018 and is expected to reach US$5 billion by 2024 [126] . A projected growth of 40% from 2017 to 2024 indicates that AI is likely to revolutionize the pharmaceutical and medical sectors. Various pharmaceutical companies have invested, and continue to invest, in AI and have collaborated with AI companies to develop essential healthcare tools. The collaboration between DeepMind Technologies, a subsidiary of Google, and the Royal Free London NHS Foundation Trust on the management of acute kidney injury is one example. Major pharmaceutical companies and AI players are detailed in Figure 4 [19] .

Figure 4

Leading pharmaceutical companies and their association with Artificial Intelligence (AI) organizations that are working in fields including oncology, cardiovascular diseases, and central nervous system disorders.

Ongoing challenges in adopting AI: leads on ways to overcome

The success of AI depends entirely on the availability of a substantial amount of data, because these data are used to train the system. Access to data from various database providers can incur extra costs for a company, and the data must also be reliable and of high quality to ensure accurate prediction. Other challenges that prevent full-fledged adoption of AI in the pharmaceutical industry include the lack of skilled personnel to operate AI-based platforms, limited budgets in small organizations, apprehension that replacing humans will lead to job losses, skepticism about the data generated by AI, and the black-box phenomenon (i.e., the opacity of how an AI platform reaches its conclusions) [6] .

Automation of certain tasks in drug development, manufacturing, supply chains, clinical trials, and sales will occur over time, but these applications all fall under 'narrow AI', in which a system must be trained on a large volume of data and is thus suited only to a particular task. Human intervention therefore remains mandatory for the successful implementation, development, and operation of AI platforms. However, the fear of unemployment could be a myth, given that AI currently takes over repetitive jobs while leaving scope for human intelligence to develop more complicated insights and creativity.

Nevertheless, several pharmaceutical companies have adopted AI, and AI-based solutions are expected to generate revenue of US$2.199 billion in the pharmaceutical sector by 2022, the industry having invested more than US$7.20 billion across 300+ deals between 2013 and 2018 [127] . Pharmaceutical organizations need clarity about what AI technology can realistically achieve and which problems it can solve once implemented. Skilled data scientists and software engineers with sound knowledge of AI technology, together with a clear understanding of the company's business targets and R&D goals, can be developed to exploit the full potential of AI platforms.

Concluding remarks and prospects

The advancement of AI and its remarkable tools continues to reduce the challenges faced by pharmaceutical companies, affecting both the drug development process and the overall lifecycle of the product, which could explain the increasing number of start-ups in this sector [23] . The current healthcare sector faces several complex challenges, such as the increasing cost of drugs and therapies, and society needs significant changes in this area. With the inclusion of AI in the manufacturing of pharmaceutical products, personalized medications with the desired dose, release parameters, and other required attributes can be manufactured according to individual patient need [85] . The latest AI-based technologies will not only shorten the time needed for products to reach the market, but will also improve product quality and the overall safety of the production process, provide better utilization of available resources, and be cost-effective, thereby increasing the importance of automation [128] .

The most significant concerns about incorporating these technologies are the job losses that could follow and the strict regulations needed for the implementation of AI. However, these systems are intended only to make work easier, not to replace humans completely [129] . AI can not only aid quick, hassle-free hit-compound identification, but can also suggest synthesis routes for these molecules, predict the desired chemical structure, and shed light on drug–target interactions and their SAR.

AI can also make major contributions to the incorporation of a developed drug into its correct dosage form and to its optimization, in addition to aiding rapid decision-making, leading to faster manufacturing of better-quality products along with assurance of batch-to-batch consistency. AI can further contribute to establishing the safety and efficacy of the product in clinical trials, and to ensuring proper positioning and costing in the market through comprehensive market analysis and prediction. Although no drugs developed with AI-based approaches are currently on the market and specific challenges remain with regard to the implementation of this technology, it is likely that AI will become an invaluable tool in the pharmaceutical industry in the near future.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in the paper.

Acknowledgments

The authors acknowledge the Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Government of India for financial support. R.K.T. would like to acknowledge the Science and Engineering Research Board (Statutory Body Established through an Act of Parliament: SERB Act 2008), Department of Science and Technology, Government of India for a grant (Grant #ECR/2016/001964) and N-PDF for funding (PDF/2016/003329) research in his laboratory.

Biographies


Rakesh K. Tekade, currently an associate professor at NIPER Ahmedabad, is an academic researcher with >10 years of teaching and research experience. Dr Tekade's research group investigates the design, development, and characterization of targeted nanotechnology-based products for the site-specific delivery of therapeutic drugs, siRNA, miRNA, and so on, for the treatment of cancer, diabetes, arthritis, and neurological disorders. He has coauthored >100 peer-reviewed publications in international journals, contributed >50 international reference book chapters, five invited editorial articles, and four patent applications. Dr Tekade is editor-in-chief of the book series Advances in Pharmaceutical Product Development and Research.


Debleena Paul received a BSc in pharmacy from Maulana Abul Kalam Azad University of Technology and is currently pursuing her MS Pharm at NIPER-Ahmedabad under the guidance of Rakesh K. Tekade. Her research focuses on the development of in situ gelling dusting powder for wound dressing applications.


Kiran Kalia is a professor of pharmacology and Director of NIPER, Ahmedabad; she is also a professor on lien in the Department of Biosciences, Sardar Patel University. She has research experience spanning over 35 years and has mentored several MSc and PhD students. She was awarded an Indian National Science Academy (INSA) Research Fellowship and has received several CSIR Fellowships. Professor Kalia serves on the editorial boards and review committees of several international journals. Her research interests encompass urinary proteomic markers of diabetic nephropathy, genomic markers of susceptibility to diabetic retinopathy, genomic alterations in oral cancer, and environmental biotechnology and toxicology.
