Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

stanford-oval/storm • 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models


We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.


InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability.

LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models


We present the LM Transparency Tool (LM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models.

Magic Clothing: Controllable Garment-Driven Image Synthesis

We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task.


Solving Data Quality Problems with Desbordante: a Demo

mstrutov/desbordante • 27 Jul 2023

However, most existing data profiling systems that focus on complex statistics do not provide proper integration with the tools used by contemporary data scientists.


Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis.


MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion

To overcome their inherent incompleteness, multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given MMKGs, leveraging both structural information from the triples and multi-modal information of the entities.


Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".



JMLR Papers

Select a volume number to see its table of contents with links to the papers.

Volume 23 (January 2022 - Present)

Volume 22 (January 2021 - December 2021)

Volume 21 (January 2020 - December 2020)

Volume 20 (January 2019 - December 2019)

Volume 19 (August 2018 - December 2018)

Volume 18 (February 2017 - August 2018)

Volume 17 (January 2016 - January 2017)

Volume 16 (January 2015 - December 2015)

Volume 15 (January 2014 - December 2014)

Volume 14 (January 2013 - December 2013)

Volume 13 (January 2012 - December 2012)

Volume 12 (January 2011 - December 2011)

Volume 11 (January 2010 - December 2010)

Volume 10 (January 2009 - December 2009)

Volume 9 (January 2008 - December 2008)

Volume 8 (January 2007 - December 2007)

Volume 7 (January 2006 - December 2006)

Volume 6 (January 2005 - December 2005)

Volume 5 (December 2003 - December 2004)

Volume 4 (Apr 2003 - December 2003)

Volume 3 (Jul 2002 - Mar 2003)

Volume 2 (Oct 2001 - Mar 2002)

Volume 1 (Oct 2000 - Sep 2001)

Special Topics

Bayesian Optimization

Learning from Electronic Health Data (December 2016)

Gesture Recognition (May 2012 - present)

Large Scale Learning (Jul 2009 - present)

Mining and Learning with Graphs and Relations (February 2009 - present)

Grammar Induction, Representation of Language and Language Learning (Nov 2010 - Apr 2011)

Causality (Sep 2007 - May 2010)

Model Selection (Apr 2007 - Jul 2010)

Conference on Learning Theory 2005 (February 2007 - Jul 2007)

Machine Learning for Computer Security (December 2006)

Machine Learning and Large Scale Optimization (Jul 2006 - Oct 2006)

Approaches and Applications of Inductive Programming (February 2006 - Mar 2006)

Learning Theory (Jun 2004 - Aug 2004)

Special Issues

In Memory of Alexey Chervonenkis (Sep 2015)

Independent Components Analysis (December 2003)

Learning Theory (Oct 2003)

Inductive Logic Programming (Aug 2003)

Fusion of Domain Knowledge with Data for Decision Support (Jul 2003)

Variable and Feature Selection (Mar 2003)

Machine Learning Methods for Text and Images (February 2003)

Eighteenth International Conference on Machine Learning (ICML2001) (December 2002)

Computational Learning Theory (Nov 2002)

Shallow Parsing (Mar 2002)

Kernel Methods (December 2001)

Machine Learning: Recently Published Documents


An explainable machine learning model for identifying geographical origins of sea cucumber Apostichopus japonicus based on multi-element profile

A comparison of machine learning- and regression-based models for predicting ductility ratio of RC beam-column joints

Alexa, is this a historical record?

Digital transformation in government has brought an increase in the scale, variety, and complexity of records and greater levels of disorganised data. Current practices for selecting records for transfer to The National Archives (TNA) were developed to deal with paper records and are struggling to deal with this shift. This article examines the background to the problem and outlines a project that TNA undertook to research the feasibility of using commercially available artificial intelligence tools to aid selection. The project AI for Selection evaluated a range of commercial solutions varying from off-the-shelf products to cloud-hosted machine learning platforms, as well as a benchmarking tool developed in-house. Suitability of tools depended on several factors, including requirements and skills of transferring bodies as well as the tools’ usability and configurability. This article also explores questions around trust and explainability of decisions made when using AI for sensitive tasks such as selection.

Automated Text Classification of Maintenance Data of Higher Education Buildings Using Text Mining and Machine Learning Techniques

Data-driven analysis and machine learning for energy prediction in distributed photovoltaic generation plants: a case study in Queensland, Australia

Modeling nutrient removal by membrane bioreactor at a sewage treatment plant using machine learning models

Big five personality prediction based in Indonesian tweets using machine learning methods

<span lang="EN-US">The popularity of social media has drawn the attention of researchers who have conducted cross-disciplinary studies examining the relationship between personality traits and behavior on social media. Most current work focuses on personality prediction analysis of English texts, but Indonesian has received scant attention. Therefore, this research aims to predict user’s personalities based on Indonesian text from social media using machine learning techniques. This paper evaluates several machine learning techniques, including <a name="_Hlk87278444"></a>naive Bayes (NB), K-nearest neighbors (KNN), and support vector machine (SVM), based on semantic features including emotion, sentiment, and publicly available Twitter profile. We predict the personality based on the big five personality model, the most appropriate model for predicting user personality in social media. We examine the relationships between the semantic features and the Big Five personality dimensions. The experimental results indicate that the Big Five personality exhibit distinct emotional, sentimental, and social characteristics and that SVM outperformed NB and KNN for Indonesian. In addition, we observe several terms in Indonesian that specifically refer to each personality type, each of which has distinct emotional, sentimental, and social features.</span>

Compressive strength of concrete with recycled aggregate; a machine learning-based evaluation

Temperature prediction of flat steel box girders of long-span bridges utilizing in situ environmental parameters and machine learning

Computer-assisted cohort identification in practice

The standard approach to expert-in-the-loop machine learning is active learning, where, repeatedly, an expert is asked to annotate one or more records and the machine finds a classifier that respects all annotations made until that point. We propose an alternative approach, IQRef , in which the expert iteratively designs a classifier and the machine helps him or her to determine how well it is performing and, importantly, when to stop, by reporting statistics on a fixed, hold-out sample of annotated records. We justify our approach based on prior work giving a theoretical model of how to re-use hold-out data. We compare the two approaches in the context of identifying a cohort of EHRs and examine their strengths and weaknesses through a case study arising from an optometric research problem. We conclude that both approaches are complementary, and we recommend that they both be employed in conjunction to address the problem of cohort identification in health research.



Machine Learning

  • Reports substantive results on a wide range of learning methods applied to various learning problems.
  • Provides robust support through empirical studies, theoretical analysis, or comparison to psychological phenomena.
  • Demonstrates how to apply learning methods to solve significant application problems.
  • Improves how machine learning research is conducted.
  • Prioritizes verifiable and replicable supporting evidence in all published papers.
  • Editor-in-Chief: Hendrik Blockeel


Latest issue

Volume 113, Issue 4

Latest articles

Coresets for kernel clustering.

  • Shaofeng H. -C. Jiang
  • Robert Krauthgamer


From MNIST to ImageNet and back: benchmarking continual curriculum learning

  • Kamil Faber
  • Dominik Zurek
  • Roberto Corizzo


Reversible jump attack to textual classifiers with modification reduction


A survey on interpretable reinforcement learning

  • Claire Glanois


PolieDRO: a novel classification and regression framework with non-parametric data-driven regularization

  • Tomás Gutierrez
  • Davi Valladão
  • Bernardo K. Pagnoncelli


Journal updates

CfP: Discovery Science 2023

Submission Deadline: March 4, 2024

Guest Editors: Rita P. Ribeiro, Albert Bifet, Ana Carolina Lorena

CfP: IJCLR Learning and Reasoning

Call for Papers: Conformal Prediction and Distribution-Free Uncertainty Quantification

Submission Deadline: January 7th, 2024

Guest Editors: Henrik Boström, Eyke Hüllermeier, Ulf Johansson, Khuong An Nguyen, Aaditya Ramdas

Call for Papers: DSAA 2024 Journal Track with Machine Learning Journal

Guest Editors: Longbing Cao, David C. Anastasiu, Qi Zhang, Xiaolin Huang

Journal information

  • ACM Digital Library
  • Current Contents/Engineering, Computing and Technology
  • EI Compendex
  • Google Scholar
  • Japanese Science and Technology Agency (JST)
  • Mathematical Reviews
  • OCLC WorldCat Discovery Service
  • Science Citation Index Expanded (SCIE)
  • TD Net Discovery Service
  • UGC-CARE List (India)



Analytics Insight

Top 10 Machine Learning Research Papers of 2021


Machine learning research papers showcasing the transformation of the technology

Unbiased gradient estimation in unrolled computation graphs with persistent evolution strategies

Solving high-dimensional parabolic PDEs using the tensor train format


Oops I took a gradient: Scalable sampling for discrete distributions

Optimal complexity in decentralized training

Understanding self-supervised learning dynamics without contrastive pairs

How transferable are features in deep neural networks?

Do we need hundreds of classifiers to solve real-world classification problems?

Knowledge Vault: a web-scale approach to probabilistic knowledge fusion

Scalable nearest neighbor algorithms for high dimensional data

Trends in extreme learning machines




Analytics Drift

Top Machine Learning (ML) Research Papers Released in 2022

For every Machine Learning (ML) enthusiast, we bring you a curated list of the major breakthroughs in ML research in 2022.

Preetipadma K

Machine learning (ML) has gained considerable traction in recent years owing to the advances it brings to existing technologies. Every month, hundreds of ML papers from organizations and universities are published online to share the latest breakthroughs in the field. As the year ends, we bring you the top 22 ML research papers of 2022 that made a major impact on the industry. The list is not a ranking; the papers were selected on the basis of the recognition and awards they received at international machine learning conferences.

  • Bootstrapped Meta-Learning

Meta-learning is a promising field that investigates how machine learners or RL agents, including their hyperparameters, can learn how to learn in a quicker and more robust manner; it is a crucial study area for enhancing the efficiency of AI agents.

This 2022 ML paper presents an algorithm that teaches the meta-learner how to overcome the meta-optimization challenge and myopic meta goals. The algorithm’s primary objective is meta-learning using gradients, which ensures improved performance. The research paper also examines the potential benefits due to bootstrapping. The authors highlight several interesting theoretical aspects of this algorithm, and the empirical results achieve new state-of-the-art (SOTA) on the ATARI ALE benchmark as well as increased efficiency in multitask learning.

  • Competition-level code generation with AlphaCode

One of the exciting uses for deep learning and large language models is programming. The rising need for coders has sparked a race to build tools that can increase developer productivity and give non-developers tools to create software. However, these models still perform poorly when tested on more challenging, unseen problems that require more than simply converting instructions into code.

This popular 2022 ML paper introduces AlphaCode, a code generation system that, in simulated evaluations of programming contests on the Codeforces platform, achieved on average a ranking in the top 54.3% of participants. The paper describes the architecture, training, and evaluation of the deep learning model.

  • Restoring and attributing ancient texts using deep neural networks

The epigraphic evidence of the ancient Greek era — inscriptions created on durable materials such as stone and pottery — was often already damaged by the time it was discovered, rendering the inscribed writing incomplete or illegible. Machine learning can help restore damaged inscriptions and identify their chronological and geographical origins, helping us better understand our past.

This ML paper proposed Ithaca, a machine learning model built by DeepMind for the textual restoration and geographical and chronological attribution of ancient Greek inscriptions. Ithaca was trained on a database of just under 80,000 inscriptions from the Packard Humanities Institute. It achieved 62% accuracy in restoring damaged texts, compared with an average of 25% for historians working alone; when historians used Ithaca, their accuracy quickly rose to 72%.

  • Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

Tuning hyperparameters for large neural networks is expensive because every trial requires training the full network. This groundbreaking 2022 ML paper suggests a zero-shot hyperparameter tuning paradigm for tuning massive neural networks more effectively. The research, co-authored by Microsoft Research and OpenAI, describes a method called µTransfer that leverages µP to transfer hyperparameters zero-shot from small models, producing nearly optimal hyperparameters for large models without tuning them directly.

This method has been found to reduce the amount of trial and error necessary in the costly process of training large neural networks. By drastically lowering the need to predict which training hyperparameters to use, this approach speeds up research on massive neural networks like GPT-3 and perhaps its successors in the future.

  • PaLM: Scaling Language Modeling with Pathways 

Large neural networks trained for language generation and understanding have demonstrated outstanding results on various tasks in recent years. This trending 2022 ML paper introduced the Pathways Language Model (PaLM), a 540-billion-parameter dense decoder-only autoregressive transformer trained on 780 billion tokens of high-quality text.

PaLM is based on a standard Transformer architecture, used in a decoder-only setup with modifications such as SwiGLU activations, parallel layers, multi-query attention, RoPE embeddings, shared input-output embeddings, and no bias terms. The paper describes Google's latest flagship model surpassing several human baselines while achieving state-of-the-art results on numerous zero-, one-, and few-shot NLP tasks.
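To make "parallel layers" and SwiGLU concrete, here is a minimal PyTorch sketch of a parallel decoder block, in which attention and the feed-forward path read the same normalized input and their outputs are summed. It is only an illustration under simplifying assumptions (it uses standard multi-head attention rather than multi-query attention, omits RoPE, and all dimensions are placeholders), not PaLM's implementation.

```python
import torch
import torch.nn as nn

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: out = W_out(Swish(xW) * xV)."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.w = nn.Linear(d_model, d_ff, bias=False)
        self.v = nn.Linear(d_model, d_ff, bias=False)
        self.out = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.out(nn.functional.silu(self.w(x)) * self.v(x))

class ParallelBlock(nn.Module):
    """'Parallel' decoder block: attention and MLP read the same normalized
    input and are summed, instead of being applied sequentially."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          bias=False, batch_first=True)
        self.mlp = SwiGLU(d_model, d_ff)

    def forward(self, x, attn_mask=None):
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask)
        return x + attn_out + self.mlp(h)

# Toy usage on a batch of 2 sequences of length 16.
block = ParallelBlock()
print(block(torch.randn(2, 16, 512)).shape)   # torch.Size([2, 16, 512])
```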

  • Robust Speech Recognition via Large-Scale Weak Supervision

Machine learning developers have long found it challenging to build speech-processing systems trained on the vast volume of audio transcripts available on the internet. This year, OpenAI released Whisper, a new state-of-the-art (SotA) speech-to-text model that can transcribe audio and translate it into English. It was trained on 680,000 hours of voice data gathered from the internet. According to OpenAI, the model is robust to accents, background noise, and technical terminology, and it supports transcription in 99 different languages as well as translation from those languages into English.

The OpenAI ML paper notes that the authors ensured about one-third of the audio data is non-English. Maintaining this diversified dataset helped the team outperform other supervised state-of-the-art models.
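As a usage illustration, here is a minimal sketch with the open-source openai-whisper package (pip install openai-whisper); the file names and checkpoint size are placeholders.

```python
import whisper

model = whisper.load_model("base")          # small checkpoint for a quick test

# Transcribe in the spoken language.
result = model.transcribe("interview.mp3")
print(result["text"])

# Translate non-English speech into English.
translated = model.transcribe("interview_fr.mp3", task="translate")
print(translated["text"])
```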

  • OPT: Open Pre-trained Transformer Language Models

Large language models have demonstrated extraordinary performance on numerous tasks (e.g., zero- and few-shot learning). However, these models are difficult to replicate without considerable funding due to their high computing costs. Even though the public can occasionally interact with these models through paid APIs, complete research access is still only available to a select group of well-funded labs. This limited access has hindered researchers' ability to understand how and why these language models work, which has stalled progress on initiatives to improve their robustness and mitigate ethical drawbacks such as bias and toxicity.

This popular 2022 ML paper introduces Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125 million to 175 billion parameters that the authors share freely and responsibly with interested researchers. The largest model, OPT-175B (not included in the code repository but accessible upon request), is shown to perform comparably to GPT-3 (which also has 175 billion parameters) while requiring only 15% of GPT-3's carbon footprint for development and training.

  • A Path Towards Autonomous Machine Intelligence

Yann LeCun is a prominent and widely respected researcher in the field of artificial intelligence and machine learning. In June, his much-anticipated paper "A Path Towards Autonomous Machine Intelligence" was published on OpenReview. In it, LeCun offered a number of approaches and architectures that might be combined to create self-supervised autonomous machines.

He presented a modular architecture for autonomous machine intelligence that combines various models operating as distinct elements of a machine's brain, mirroring the animal brain. Because all the modules are differentiable, they can be interconnected to support brain-like activities such as recognition and response to the environment. The architecture incorporates ideas like a configurable predictive world model, behavior driven by intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.

  • LaMDA: Language Models for Dialog Applications 

Despite tremendous advances in text generation, many of the chatbots available are still rather irritating and unhelpful. This 2022 ML paper from Google describes LaMDA — short for "Language Model for Dialogue Applications" — the system that caused an uproar this summer when a former Google engineer, Blake Lemoine, alleged that it was sentient. LaMDA is a family of large language models for dialog applications built on Google's Transformer architecture, known for its efficiency and speed on language tasks such as translation. Its most intriguing features are the ability to be fine-tuned on human-annotated data and the capability of consulting external sources.

The model, which has 137 billion parameters, was pre-trained on 1.56 trillion words from publicly accessible conversation data and web documents. It is also fine-tuned to improve on three metrics: quality, safety, and groundedness.

  • Privacy for Free: How does Dataset Condensation Help Privacy?

One of the primary proposals in the award-winning ML paper is to use dataset condensation methods to retain data efficiency during model training while also providing membership privacy. The authors argue that dataset condensation, which was initially created to increase training effectiveness, is a better alternative to data generators for producing private data since it offers privacy for free. 

Though existing data generators are used to produce differentially private data for model training to minimize unintended data leakage, they result in high training costs or subpar generalization performance for the sake of data privacy. This study was published by Sony AI and received the Outstanding Paper Award at ICML 2022. 

  • TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data

The use of a model that converts time series into anomaly scores at each time step is essential in any system for detecting time series anomalies. Recognizing and diagnosing anomalies in multivariate time series data is critical for modern industrial applications. Unfortunately, developing a system capable of promptly and reliably identifying abnormal observations is challenging. This is attributed to a shortage of anomaly labels, excessive data volatility, and the expectations of modern applications for ultra-low inference times. 

In this study, the authors present TranAD, a deep transformer network-based anomaly detection and diagnosis model that leverages attention-based sequence encoders to execute inference quickly while remaining aware of broader temporal patterns in the data. TranAD employs adversarial training to achieve stability and focus score-based self-conditioning to enable robust multi-modal feature extraction. The paper reports that extensive empirical experiments on six publicly available datasets show TranAD outperforming state-of-the-art baselines in detection and diagnosis, with data- and time-efficient training.

  • Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding 

In the last few years, generative models called “diffusion models” have been increasingly popular. This year saw these models capture the excitement of AI enthusiasts around the world. 

This outstanding 2022 ML paper introduced Imagen, the viral text-to-image diffusion model from Google. The model achieves a new state-of-the-art FID score of 7.27 on the COCO dataset by combining the deep language understanding of transformer-based large language models with the photorealistic image-generating capabilities of diffusion models. A frozen text-only language model provides the text representation, and a diffusion model with two super-resolution upsampling stages, up to 1024×1024, produces the images. Training uses several techniques, including classifier-free guidance, so the model learns both conditional and unconditional generation. Another important feature of Imagen is dynamic thresholding, which prevents the diffusion process from saturating parts of the image — a behavior that reduces image quality, particularly when the weight placed on text conditioning is large.
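To show what dynamic thresholding amounts to, here is a minimal NumPy sketch of the idea: pick a high percentile of the predicted image's absolute pixel values and rescale so extremes do not saturate. The percentile value and the toy data are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def dynamic_threshold(x0_pred, percentile=99.5):
    """Rescale a predicted clean image x0 so extreme pixel values do not
    saturate when the guidance weight is large."""
    s = np.percentile(np.abs(x0_pred), percentile)
    s = max(s, 1.0)                       # never shrink values already in [-1, 1]
    return np.clip(x0_pred, -s, s) / s    # back into the [-1, 1] range

# Toy usage: an over-saturated prediction gets pulled back into range.
x0 = np.random.randn(3, 64, 64) * 2.5
print(dynamic_threshold(x0).min(), dynamic_threshold(x0).max())
```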

  • No Language Left Behind: Scaling Human-Centered Machine Translation

This ML paper introduced one of the most popular Meta projects of 2022: NLLB-200. The paper describes how Meta built and open-sourced this state-of-the-art AI model at FAIR, capable of translating between 200 languages. It covers every aspect of the technology: language analysis, ethical issues, impact analysis, and benchmarking.

No matter what language a person speaks, accessibility via language ensures that everyone can benefit from the growth of technology. Meta claims that several languages that NLLB-200 translates, such as Kamba and Lao, are not currently supported by any translation systems in use. The tech behemoth also created a dataset called “FLORES-200” to evaluate the effectiveness of the NLLB-200 and show that accurate translations are offered. According to Meta, NLLB-200 offers an average of 44% higher-quality translations than its prior model.
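For readers who want to try the released model, a minimal sketch using the Hugging Face checkpoint "facebook/nllb-200-distilled-600M" (a distilled release of NLLB-200) follows; language codes use the FLORES-200 convention, and the example sentence is our own.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M",
                                          src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

inputs = tokenizer("Machine translation should leave no language behind.",
                   return_tensors="pt")
# Force the decoder to start with the target-language tag (here: French).
out = model.generate(**inputs,
                     forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
                     max_new_tokens=50)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```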

  • A Generalist Agent

AI pundits believe that multimodality will play a huge role in the future of Artificial General Intelligence (AGI). One of the most talked-about ML papers of 2022, from DeepMind, introduces Gato, a generalist agent. Gato is a multi-modal, multi-task, multi-embodiment network, which means that the same neural network (i.e. a single architecture with a single set of weights) can perform all tasks while integrating inherently diverse types of inputs and outputs.

DeepMind claims that the general agent can be improved with new data to perform even better on a wider range of tasks. They argue that having a general-purpose agent reduces the need for hand-crafting policy models for each region, enhances the volume and diversity of training data, and enables continuous advances in the data, computing, and model scales. A general-purpose agent can also be viewed as the first step toward artificial general intelligence, which is the ultimate goal of AGI. 

Gato demonstrates the versatility of transformer-based machine learning architectures by exhibiting their use in a variety of applications. Unlike previous neural network systems tailored to a single job — playing games, stacking blocks with a real robot arm, reading words, or captioning images — Gato is versatile enough to perform all of these tasks on its own, using only a single set of weights and a relatively simple architecture.

  • The Forward-Forward Algorithm: Some Preliminary Investigations 

AI pioneer Geoffrey Hinton is known for his foundational work on backpropagation and deep neural networks. In his latest paper, presented at NeurIPS 2022, Hinton proposed the "forward-forward algorithm," a new learning algorithm for artificial neural networks inspired by our understanding of neural activations in the brain. The approach draws on Boltzmann machines (Hinton and Sejnowski, 1986) and noise-contrastive estimation (Gutmann and Hyvärinen, 2010). According to Hinton, forward-forward, which is still experimental, replaces the forward and backward passes of backpropagation with two forward passes: one with positive data and the other with negative data that the network itself could generate. Further, the algorithm could map onto hardware more efficiently and provide a better account of the brain's cortical learning process.

Without employing complicated regularizers, the algorithm obtained a 1.4 percent test error rate on the MNIST dataset in an empirical study, suggesting it can be competitive with backpropagation on this task.

The paper also suggests a novel “mortal computing” model that can enable the forward-forward algorithm and understand our brain’s energy-efficient processes.
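Here is a minimal, layer-wise PyTorch sketch of the forward-forward idea: each layer maximizes a "goodness" score (sum of squared activations) on positive data and minimizes it on negative data, with no gradients flowing between layers. It is a simplification under our own assumptions; the paper's experiments embed labels in the input and use more careful negative-data generation.

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Length-normalize so only the direction of the previous layer's
        # activity is passed on, then apply the layer.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # "Goodness" = sum of squared activations; push it above the
        # threshold for positive data and below it for negative data.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        loss = torch.log1p(torch.exp(torch.cat([
            self.threshold - g_pos,      # want g_pos high
            g_neg - self.threshold,      # want g_neg low
        ]))).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Pass detached activations on, so no gradients flow between layers.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

# Toy usage with random "positive" and "negative" batches.
layers = [FFLayer(784, 500), FFLayer(500, 500)]
h_pos, h_neg = torch.rand(32, 784), torch.rand(32, 784)
for layer in layers:
    h_pos, h_neg = layer.train_step(h_pos, h_neg)
```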

  • Focal Modulation Networks

In humans, the ciliary muscles alter the shape of the eye's lens, and hence its radius of curvature, to focus on near or distant objects; changing the shape of the lens changes its focal length. Mimicking this kind of focal modulation in computer vision systems can be tricky.

This machine learning paper introduces FocalNet, a vision architecture in which a focal modulation mechanism replaces self-attention for modeling token interactions. Its attention-free design outperforms SoTA self-attention (SA) techniques on a wide range of visual benchmarks. According to the paper, focal modulation consists of three parts (a simplified code sketch follows the list below):

a. hierarchical contextualization, implemented using a stack of depth-wise convolutional layers, to encode visual contexts from close-up to a great distance; 

b. gated aggregation to selectively gather contexts for each query token based on its content; and  

c. element-wise modulation or affine modification to inject the gathered context into the query.
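A simplified PyTorch sketch of these three parts follows. It is our own reduced version under stated assumptions (two focal levels, a fixed kernel size, no normalization or downsampling), not the official FocalNet implementation.

```python
import torch
import torch.nn as nn

class FocalModulation(nn.Module):
    def __init__(self, dim, focal_levels=2, kernel_size=3):
        super().__init__()
        # One projection produces the query, the context seed, and the gates.
        self.proj_in = nn.Linear(dim, 2 * dim + (focal_levels + 1))
        # (a) hierarchical contextualization: stacked depth-wise convolutions.
        self.context_layers = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
                nn.GELU())
            for _ in range(focal_levels)])
        self.modulator_proj = nn.Conv2d(dim, dim, 1)
        self.proj_out = nn.Linear(dim, dim)
        self.focal_levels = focal_levels

    def forward(self, x):                      # x: (B, H, W, C)
        B, H, W, C = x.shape
        q, ctx, gates = torch.split(self.proj_in(x), [C, C, self.focal_levels + 1], dim=-1)
        ctx = ctx.permute(0, 3, 1, 2)          # (B, C, H, W)
        gates = gates.permute(0, 3, 1, 2)      # (B, L+1, H, W)
        # (b) gated aggregation over focal levels plus a global level.
        agg = 0
        for level, layer in enumerate(self.context_layers):
            ctx = layer(ctx)
            agg = agg + ctx * gates[:, level:level + 1]
        agg = agg + ctx.mean(dim=(2, 3), keepdim=True) * gates[:, -1:]
        # (c) element-wise modulation of the query by the gathered context.
        modulator = self.modulator_proj(agg).permute(0, 2, 3, 1)
        return self.proj_out(q * modulator)

# Toy usage on an 8x8 feature map with 64 channels.
block = FocalModulation(dim=64)
print(block(torch.randn(2, 8, 8, 64)).shape)   # torch.Size([2, 8, 8, 64])
```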

  • Learning inverse folding from millions of predicted structures

The field of structural biology is being fundamentally changed by cutting-edge technologies in machine learning, protein structure prediction, and innovative ultrafast structural aligners. Time and money are no longer obstacles to obtaining precise protein models and extensively annotating their functionalities. However, determining a protein sequence from its backbone atom coordinates has remained a challenge. To date, machine learning approaches to this problem have been constrained by the number of experimentally determined protein structures available.

In this ICML Outstanding Paper (Runner Up), the authors tackle this problem by expanding the training data by almost three orders of magnitude, using AlphaFold2 to predict structures for 12 million protein sequences. With this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers recovers the native sequence on structurally held-out backbones in 51% of cases and recovers buried residues in 72% of cases, an improvement of over 10% over previous techniques. The approach also generalizes to a range of more difficult tasks, including designing protein complexes, partially masked structures, binding interfaces, and multiple conformational states.

  • MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Within the AI research community, using video games as a training medium for AI has gained popularity, and autonomous agents have had great success in Atari games, StarCraft, Dota, and Go. Despite these advances, the agents do not generalize beyond a narrow range of activities, in contrast to humans, who continually learn from open-ended tasks.

This thought-provoking 2022 ML paper proposes MineDojo, a framework for embodied agent research based on the well-known game Minecraft. In addition to building an internet-scale knowledge base of Minecraft videos, tutorials, wiki pages, and forum discussions, MineDojo provides a simulation suite with tens of thousands of open-ended tasks. Using MineDojo data, the authors propose an agent learning methodology that employs massive pre-trained video-language models as a learned reward function. Without requiring an explicitly crafted dense shaping reward, the MineDojo agent can perform a wide range of open-ended tasks stated in free-form language.

  • Is Out-of-Distribution Detection Learnable?

Machine learning (supervised ML) models are frequently trained under the closed-world assumption, which assumes that the distribution of the test data will resemble that of the training data. This assumption often does not hold in real-world settings, causing a considerable decline in performance. While this performance loss may be acceptable for applications like product recommendation, out-of-distribution (OOD) detection is crucial for preventing ML systems from making dangerous predictions in applications where the data distribution drifts over time (e.g., self-driving cars).

In this paper, the authors explore the probably approximately correct (PAC) learning theory of OOD detection, previously posed as an open problem, to study when OOD detection is learnable. They first identify a necessary condition for the learnability of OOD detection, and then prove several impossibility theorems for its learnability in a handful of different scenarios.

  • Gradient Descent: The Ultimate Optimizer 

Gradient descent is a popular optimization approach for training machine learning models and neural networks. The ultimate aim of any machine learning (neural network) method is to optimize parameters, but selecting the ideal step size for an optimizer is difficult, since it entails lengthy and error-prone manual work. Many strategies exist for automated hyperparameter optimization; however, they often introduce additional hyperparameters to govern the hyperparameter optimization process itself. In this study, MIT CSAIL and Meta researchers offer an approach that allows gradient descent optimizers like SGD and Adam to tune their own hyperparameters automatically.

They propose learning the hyperparameters themselves by gradient descent, then learning the hyper-hyperparameters by gradient descent as well, and so on recursively. The paper describes an efficient way to let gradient descent optimizers adjust their own hyperparameters, stacked recursively to many levels. As these gradient-based optimizer towers grow taller, they become substantially less sensitive to the choice of top-level hyperparameters, reducing the burden on the user to search for optimal values.
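To illustrate the core mechanism, here is a toy sketch of adapting a learning rate by gradient descent through the parameter update itself, using PyTorch autograd. It is a simplified illustration of the idea, not the paper's implementation, and the objective and step sizes are arbitrary placeholders.

```python
import torch

w = torch.tensor([0.0], requires_grad=True)       # model parameter
lr = torch.tensor([0.01], requires_grad=True)     # hyperparameter we also learn
hyper_lr = 1e-4                                    # step size for the learning rate itself

def loss_fn(weight):
    return (weight - 3.0).pow(2).sum()             # toy quadratic objective

for step in range(200):
    # Differentiable SGD update: w_new depends on lr, so d(loss)/d(lr) exists.
    w_new = w - lr * torch.autograd.grad(loss_fn(w), w, create_graph=True)[0]
    loss = loss_fn(w_new)
    grad_lr, = torch.autograd.grad(loss, lr)
    with torch.no_grad():
        lr -= hyper_lr * grad_lr                   # adjust the learning rate itself
    w = w_new.detach().requires_grad_(True)

print(float(w), float(lr))                         # w approaches 3.0
```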

  • ProcTHOR: Large-Scale Embodied AI Using Procedural Generation 

Embodied AI is a developing research field influenced by recent advances in artificial intelligence, machine learning, and computer vision; it studies agents that learn by perceiving and acting within an environment, and attempts to translate that perception-action loop to artificial systems. The paper proposes ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR allows researchers to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments in order to train and evaluate embodied agents across navigation, interaction, and manipulation tasks.

According to the authors, models trained on ProcTHOR using only RGB images and without any explicit mapping or human task supervision achieve cutting-edge results on 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the ongoing Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. The paper received an Outstanding Paper award at NeurIPS 2022.

  • A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog

Emotion Recognition in Spoken Dialog (ERSD) has recently attracted a lot of attention thanks to the growth of open conversational data and the maturity of speech recognition systems that make it practical to integrate emotional states into spoken human-computer interaction. It has been demonstrated that recognizing emotions makes it possible to track the progress of human-computer interactions, allowing conversational strategies to be adjusted dynamically and influencing outcomes (e.g., customer feedback). However, the limited size of current ERSD datasets restricts model development.

This ML paper proposes a Commonsense Knowledge Enhanced Network (CKE-Net) with a retrospective loss to carry out dialog modeling, external knowledge integration, and historical state retrospect hierarchically. 



TOPBOTS

The Best of Applied Artificial Intelligence, Machine Learning, Automation, Bots, Chatbots

2020’s Top AI & Machine Learning Research Papers

November 24, 2020 by Mariya Yao


Despite the challenges of 2020, the AI research community produced a number of meaningful technical breakthroughs. GPT-3 by OpenAI may be the most famous, but there are definitely many other research papers worth your attention. 

For example, teams from Google introduced a revolutionary chatbot, Meena, and EfficientDet object detectors in image recognition. Researchers from Yale introduced a novel AdaBelief optimizer that combines many benefits of existing optimization methods. OpenAI researchers demonstrated how deep reinforcement learning techniques can achieve superhuman performance in Dota 2.

To help you catch up on essential reading, we’ve summarized 10 important machine learning research papers from 2020. These papers will give you a broad overview of AI research advancements this year. Of course, there are many more breakthrough papers worth reading as well.

We have also published the top 10 lists of key research papers in natural language processing and computer vision . In addition, you can read our premium research summaries , where we feature the top 25 conversational AI research papers introduced recently.

Subscribe to our AI Research mailing list at the bottom of this article to be alerted when we release new summaries.

If you’d like to skip around, here are the papers we featured:

  • A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning
  • Efficiently Sampling Functions from Gaussian Process Posteriors
  • Dota 2 with Large Scale Deep Reinforcement Learning
  • Towards a Human-like Open-Domain Chatbot
  • Language Models are Few-Shot Learners
  • Beyond Accuracy: Behavioral Testing of NLP models with CheckList
  • EfficientDet: Scalable and Efficient Object Detection
  • Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild
  • An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
  • AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

Best AI & ML Research Papers 2020

1. A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning, by Kévin Fauvel, Daniel Balouek-Thomert, Diego Melgar, Pedro Silva, Anthony Simonet, Gabriel Antoniu, Alexandru Costan, Véronique Masson, Manish Parashar, Ivan Rodero, and Alexandre Termier

Original Abstract

Our research aims to improve the accuracy of Earthquake Early Warning (EEW) systems by means of machine learning. EEW systems are designed to detect and characterize medium and large earthquakes before their damaging effects reach a certain location. Traditional EEW methods based on seismometers fail to accurately identify large earthquakes due to their sensitivity to the ground motion velocity. The recently introduced high-precision GPS stations, on the other hand, are ineffective to identify medium earthquakes due to their propensity to produce noisy data. In addition, GPS stations and seismometers may be deployed in large numbers across different locations and may produce a significant volume of data, consequently affecting the response time and the robustness of EEW systems. 

In practice, EEW can be seen as a typical classification problem in the machine learning field: multi-sensor data are given in input, and earthquake severity is the classification result. In this paper, we introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, a novel machine learning-based approach that combines data from both types of sensors (GPS stations and seismometers) to detect medium and large earthquakes. DMSEEW is based on a new stacking ensemble method which has been evaluated on a real-world dataset validated with geoscientists. The system builds on a geographically distributed infrastructure, ensuring an efficient computation in terms of response time and robustness to partial infrastructure failures. Our experiments show that DMSEEW is more accurate than the traditional seismometer-only approach and the combined-sensors (GPS and seismometers) approach that adopts the rule of relative strength.

Our Summary 

The authors claim that traditional Earthquake Early Warning (EEW) systems that are based on seismometers, as well as recently introduced GPS systems, have their disadvantages with regards to predicting large and medium earthquakes respectively. Thus, the researchers suggest approaching an early earthquake prediction problem with machine learning by using the data from seismometers and GPS stations as input data. In particular, they introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, which is specifically tailored for efficient computation on large-scale distributed cyberinfrastructures. The evaluation demonstrates that the DMSEEW system is more accurate than other baseline approaches with regard to real-time earthquake detection.


What’s the core idea of this paper?

  • Seismometers have difficulty detecting large earthquakes because of their sensitivity to ground motion velocity.
  • GPS stations are ineffective in detecting medium earthquakes, as they are prone to producing lots of noisy data.
  • The proposed DMSEEW system takes sensor-level class predictions from seismometers and GPS stations (i.e. normal activity, medium earthquake, large earthquake);
  • it then aggregates these predictions using a bag-of-words representation and defines a final prediction for the earthquake category (a minimal stacking sketch follows this list).
  • Furthermore, they introduce a distributed cyberinfrastructure that can support the processing of high volumes of data in real time and allows the redirection of data to other processing data centers in case of disaster situations.
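The following sketch shows the two-stage idea in miniature: per-sensor class votes are turned into a bag-of-words count vector and fed to a meta-classifier. The class names, the choice of random forest, and all data here are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CLASSES = ["normal", "medium", "large"]

def bag_of_words(sensor_predictions):
    """Count how many sensors voted for each class."""
    counts = np.zeros(len(CLASSES))
    for pred in sensor_predictions:
        counts[CLASSES.index(pred)] += 1
    return counts

# Toy training data: each row is the per-sensor votes for one event.
events = [
    (["normal", "normal", "normal", "medium"], "normal"),
    (["medium", "medium", "normal", "medium"], "medium"),
    (["large", "large", "medium", "large"], "large"),
    (["large", "medium", "large", "large"], "large"),
]
X = np.stack([bag_of_words(votes) for votes, _ in events])
y = [label for _, label in events]

meta = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(meta.predict([bag_of_words(["large", "large", "large", "medium"])]))
```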

What’s the key achievement?

  • precision – 100% vs. 63.2%;
  • recall – 100% vs. 85.7%;
  • F1 score – 100% vs. 72.7%.
  • precision – 76.7% vs. 70.7%;
  • recall – 38.8% vs. 34.1%;
  • F1 score – 51.6% vs. 45.0%.

What does the AI community think?

  • The paper received an Outstanding Paper award at AAAI 2020 (special track on AI for Social Impact).

What are future research areas?

  • Evaluating DMSEEW response time and robustness via simulation of different scenarios in an existing EEW execution platform. 
  • Evaluating the DMSEEW system on another seismic network.


2. Efficiently Sampling Functions from Gaussian Process Posteriors , by James T. Wilson, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, Marc Peter Deisenroth

Gaussian processes are the gold standard for many real-world modeling problems, especially in cases where a model’s success hinges upon its ability to faithfully represent predictive uncertainty. These problems typically exist as parts of larger frameworks, wherein quantities of interest are ultimately defined by integrating over posterior distributions. These quantities are frequently intractable, motivating the use of Monte Carlo methods. Despite substantial progress in scaling up Gaussian processes to large training sets, methods for accurately generating draws from their posterior distributions still scale cubically in the number of test locations. We identify a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data. Building off of this factorization, we propose an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time. In a series of experiments designed to test competing sampling schemes’ statistical properties and practical ramifications, we demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.

In this paper, the authors explore techniques for efficiently sampling from Gaussian process (GP) posteriors. After investigating the behaviors of naive approaches to sampling and fast approximation strategies using Fourier features, they find that many of these strategies are complementary. They, therefore, introduce an approach that incorporates the best of different sampling approaches. First, they suggest decomposing the posterior as the sum of a prior and an update. Then they combine this idea with techniques from literature on approximate GPs and obtain an easy-to-use general-purpose approach for fast posterior sampling. The experiments demonstrate that decoupled sample paths accurately represent GP posteriors at a much lower cost.
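The prior-plus-update decomposition they build on is often written in Matheron's form; with our own notation (GP with kernel k, training inputs X, noisy observations y, noise variance σ²), a sketch of the pathwise update is:

```latex
% Draw a sample f from the prior, then add a data-driven correction
% (Matheron's rule; notation assumed here, not quoted from the paper).
(f \mid y)(\cdot) \;=\; f(\cdot) \;+\; k(\cdot, X)\,\bigl(k(X, X) + \sigma^2 I\bigr)^{-1}\bigl(y - f(X) - \varepsilon\bigr),
\qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)
```

The first term is a sample from the prior and the second is the correction computed from the data, which is what allows the prior to be handled separately (e.g., with Fourier-feature approximations) from the data-dependent update.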

  • The introduced approach to sampling functions from GP posteriors centers on the observation that it is possible to implicitly condition Gaussian random variables by combining them with an explicit corrective term.
  • The authors translate this intuition to Gaussian processes and suggest decomposing the posterior as the sum of a prior and an update.
  • Building on this factorization, the researchers suggest an efficient approach for fast posterior sampling that seamlessly pairs with sparse approximations to achieve scalability both during training and at test time.
  • Introducing an easy-to-use and general-purpose approach to sampling from GP posteriors.
  • The proposed decoupled sampling avoids many shortcomings of alternative sampling strategies;
  • it accurately represents GP posteriors at a much lower cost: for example, simulating a well-known model of a biological neuron required only 20 seconds with decoupled sampling, while the iterative approach required 10 hours.
  • The paper received an Honorable Mention at ICML 2020. 

Where can you get implementation code?

  • The authors released the implementation of this paper on GitHub .

3. Dota 2 with Large Scale Deep Reinforcement Learning , by Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław “Psyho” Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.

The OpenAI research team demonstrates that modern reinforcement learning techniques can achieve superhuman performance in such a challenging esports game as Dota 2. The challenges of this particular task for an AI system lie in the long time horizons, partial observability, and high dimensionality of the observation and action spaces. To tackle this game, the researchers scaled existing RL systems to unprecedented levels, with thousands of GPUs utilized for 10 months. The resulting OpenAI Five model was able to defeat the Dota 2 world champions and won 99.4% of over 7000 games played during the multi-day showcase.


  • The goal of the introduced OpenAI Five model is to find the policy that maximizes the probability of winning the game against professional human players, which in practice implies maximizing the reward function with some additional signals like characters dying, resources collected, etc.
  • While the Dota 2 engine runs at 30 frames per second, the OpenAI Five only acts on every 4th frame.
  • At each timestep, the model receives an observation with all the information available to human players (approximated in a set of data arrays) and returns a discrete action , which encodes the desired movement, attack, etc.
  • A policy is defined as a function from the history of observations to a probability distribution over actions that are parameterized as an LSTM with ~159M parameters.
  • The policy is trained using a variant of advantage actor critic, Proximal Policy Optimization.
  • The OpenAI Five model was trained for 180 days spread over 10 months of real time.


  • OpenAI Five defeated the Dota 2 world champions in a best-of-three match (2–0);
  • won 99.4% of over 7000 games during a multi-day online showcase.
  • Applying introduced methods to other zero-sum two-team continuous environments.

What are possible business applications?

  • Tackling challenging esports games like Dota 2 can be a promising step towards solving advanced real-world problems using reinforcement learning techniques.

4. Towards a Human-like Open-Domain Chatbot , by Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le

We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated. 

In contrast to most modern conversational agents, which are highly specialized, the Google research team introduces a chatbot Meena that can chat about virtually anything. It’s built on a large neural network with 2.6B parameters trained on 341 GB of text. The researchers also propose a new human evaluation metric for open-domain chatbots, called Sensibleness and Specificity Average (SSA), which can capture important attributes for human conversation. They demonstrate that this metric correlates highly with perplexity, an automatic metric that is readily available. Thus, the Meena chatbot, which is trained to minimize perplexity, can conduct conversations that are more sensible and specific compared to other chatbots. Particularly, the experiments demonstrate that Meena outperforms existing state-of-the-art chatbots by a large margin in terms of the SSA score (79% vs. 56%) and is closing the gap with human performance (86%).

Meena chatbot

  • Despite recent progress, open-domain chatbots still have significant weaknesses: their responses often do not make sense or are too vague or generic.
  • Meena is built on a seq2seq model with Evolved Transformer (ET) that includes 1 ET encoder block and 13 ET decoder blocks.
  • The model is trained on multi-turn conversations with the input sequence including all turns of the context (up to 7) and the output sequence being the response.
  • The proposed Sensibleness and Specificity Average (SSA) metric captures two key aspects of a human-like response: making sense and being specific.
  • The research team discovered that the SSA metric shows a strong negative correlation with perplexity (R² = 0.93), a readily available automatic metric that Meena is trained to minimize (see the perplexity sketch after this list).
  • Proposing a simple human-evaluation metric for open-domain chatbots.
  • The best end-to-end trained Meena model outperforms existing state-of-the-art open-domain chatbots by a large margin, achieving an SSA score of 72% (vs. 56%).
  • Furthermore, the full version of Meena, with a filtering mechanism and tuned decoding, further advances the SSA score to 79%, which is not far from the 86% SSA achieved by the average human.
  • “Google’s “Meena” chatbot was trained on a full TPUv3 pod (2048 TPU cores) for 30 full days – that’s more than $1,400,000 of compute time to train this chatbot model.” – Elliot Turner, CEO and founder of Hyperia .
  • “So I was browsing the results for the new Google chatbot Meena, and they look pretty OK (if boring sometimes). However, every once in a while it enters ‘scary sociopath mode,’ which is, shall we say, sub-optimal” – Graham Neubig, Associate professor at Carnegie Mellon University .
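
Because SSA is reported to track perplexity so closely, it is worth recalling how perplexity is computed. The snippet below is the standard definition, not Meena's evaluation code; the example probabilities are invented.

```python
import math

def perplexity(token_log_probs):
    """Perplexity from per-token log-probabilities (natural log) that the model
    assigns to the reference next tokens; lower is better."""
    avg_neg_log_prob = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_prob)

# A model assigning probability 0.25 to each of four reference tokens:
print(perplexity([math.log(0.25)] * 4))  # ~4.0
```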

Meena chatbot

  • Lowering the perplexity through improvements in algorithms, architectures, data, and compute.
  • Considering other aspects of conversations beyond sensibleness and specificity, such as, for example, personality and factuality.
  • Tackling safety and bias in the models.
  • Possible applications include further humanizing computer interactions, improving foreign language practice, and making interactive movie and videogame characters more relatable.
  • Considering the challenges related to safety and bias in the models, the authors haven’t released the Meena model yet. However, they are still evaluating the risks and benefits and may decide otherwise in the coming months.

5. Language Models are Few-Shot Learners , by Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions – something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10× more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

The OpenAI research team draws attention to the fact that the need for a labeled dataset for every new language task limits the applicability of language models. Considering that there is a wide range of possible tasks and it’s often difficult to collect a large labeled training dataset, the researchers suggest an alternative solution, which is scaling up language models to improve task-agnostic few-shot performance. They test their solution by training a 175B-parameter autoregressive language model, called GPT-3, and evaluating its performance on over two dozen NLP tasks. The evaluation under few-shot learning, one-shot learning, and zero-shot learning demonstrates that GPT-3 achieves promising results and even occasionally outperforms the state of the art achieved by fine-tuned models.

GPT-3

  • The GPT-3 model uses the same model and architecture as GPT-2, including the modified initialization, pre-normalization, and reversible tokenization.
  • However, in contrast to GPT-2, it uses alternating dense and locally banded sparse attention patterns in the layers of the transformer, as in the Sparse Transformer .
  • GPT-3 is evaluated in three settings (a hedged prompt-format sketch follows this list): few-shot learning, when the model is given a few demonstrations of the task (typically 10 to 100) at inference time, but with no weight updates allowed;
  • one-shot learning, when only one demonstration is allowed, together with a natural language description of the task;
  • zero-shot learning, when no demonstrations are allowed and the model has access only to a natural language description of the task.
  • On the CoQA benchmark, 81.5 F1 in the zero-shot setting, 84.0 F1 in the one-shot setting, and 85.0 F1 in the few-shot setting, compared to the 90.7 F1 score achieved by fine-tuned SOTA.
  • On the TriviaQA benchmark, 64.3% accuracy in the zero-shot setting, 68.0% in the one-shot setting, and 71.2% in the few-shot setting, surpassing the state of the art (68%) by 3.2%.
  • On the LAMBADA dataset, 76.2% accuracy in the zero-shot setting, 72.5% in the one-shot setting, and 86.4% in the few-shot setting, surpassing the state of the art (68%) by 18%.
  • The news articles generated by the 175B-parameter GPT-3 model are hard to distinguish from real ones, according to human evaluations (with accuracy barely above the chance level at ~52%).
  • “The GPT-3 hype is way too much. It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out.” – Sam Altman, CEO and co-founder of OpenAI .
  • “I’m shocked how hard it is to generate text about Muslims from GPT-3 that has nothing to do with violence… or being killed…” – Abubakar Abid, CEO and founder of Gradio .
  • “No. GPT-3 fundamentally does not understand the world that it talks about. Increasing corpus further will allow it to generate a more credible pastiche but not fix its fundamental lack of comprehension of the world. Demos of GPT-4 will still require human cherry picking.” – Gary Marcus, CEO and founder of Robust.ai .
  • “Extrapolating the spectacular performance of GPT3 into the future suggests that the answer to life, the universe and everything is just 4.398 trillion parameters.” – Geoffrey Hinton, Turing Award winner .
  • Improving pre-training sample efficiency.
  • Exploring how few-shot learning works.
  • Distillation of large models down to a manageable size for real-world applications.
  • The model with 175B parameters is hard to apply to real business problems due to its impractical resource requirements, but if the researchers manage to distill this model down to a workable size, it could be applied to a wide range of language tasks, including question answering, dialog agents, and ad copy generation.
  • The code itself is not available, but some dataset statistics together with unconditional, unfiltered 2048-token samples from GPT-3 are released on GitHub .
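
To illustrate the difference between the zero-, one-, and few-shot settings, here is a hedged sketch of how an in-context prompt can be assembled purely as text, with no gradient updates. The task description, separator, and demonstration pairs are invented for illustration and are not the exact prompts used in the paper.

```python
def build_prompt(task_description, demonstrations, query):
    """Zero-shot if `demonstrations` is empty, one-shot with a single
    (input, output) pair, few-shot with several pairs."""
    lines = [task_description]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is asked to continue this line
    return "\n".join(lines)

# Hypothetical few-shot prompt for English-to-French translation
demos = [("cheese", "fromage"), ("house", "maison")]
print(build_prompt("Translate English to French:", demos, "book"))
```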

6. Beyond Accuracy: Behavioral Testing of NLP models with CheckList , by Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh

Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.

The authors point out the shortcomings of existing approaches to evaluating performance of NLP models. A single aggregate statistic, like accuracy, makes it difficult to estimate where the model is failing and how to fix it. The alternative evaluation approaches usually focus on individual tasks or specific capabilities. To address the lack of comprehensive evaluation approaches, the researchers introduce CheckList , a new evaluation methodology for testing of NLP models. The approach is inspired by principles of behavioral testing in software engineering. Basically, CheckList is a matrix of linguistic capabilities and test types that facilitates test ideation. Multiple user studies demonstrate that CheckList is very effective at discovering actionable bugs, even in extensively tested NLP models.

CheckList

  • The primary approach to the evaluation of models’ generalization capabilities, which is accuracy on held-out data, may lead to performance overestimation, as the held-out data often contains the same biases as the training data. Moreover, this single aggregate statistic doesn’t help much in figuring out where the NLP model is failing and how to fix these bugs.
  • The alternative approaches are usually designed for evaluation of specific behaviors on individual tasks and thus, lack comprehensiveness.
  • CheckList provides users with a list of linguistic capabilities to be tested, like vocabulary, named entity recognition, and negation.
  • Then, to break down potential capability failures into specific behaviors, CheckList suggests different test types, such as prediction invariance or directional expectation tests under certain perturbations (a minimal invariance-test sketch follows this list).
  • Potential tests are structured as a matrix, with capabilities as rows and test types as columns.
  • The suggested implementation of CheckList also introduces a variety of abstractions to help users generate large numbers of test cases easily.
  • Evaluation of state-of-the-art models with CheckList demonstrated that even though some NLP tasks are considered “solved” based on accuracy results, the behavioral testing highlights many areas for improvement.
  • In user studies, CheckList helped to identify and test for capabilities not previously considered;
  • it resulted in more thorough and comprehensive testing of previously considered capabilities;
  • and it helped to discover many more actionable bugs.
  • The paper received the Best Paper Award at ACL 2020, the leading conference in natural language processing.
  • CheckList can be used to create more exhaustive testing for a variety of NLP tasks.
  • Such comprehensive testing that helps in identifying many actionable bugs is likely to lead to more robust NLP systems.
  • The code for testing NLP models with CheckList is available on GitHub .
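
To give a flavor of what a behavioral test looks like, below is a generic invariance-style test written from scratch: a perturbation that should not matter (here, swapping a person's name) must not change a sentiment model's prediction. It assumes some `predict(text) -> label` function and is not the official CheckList API, which is available in the authors' GitHub repository.

```python
def invariance_test(predict, template, fill_values):
    """Return the inputs whose prediction differs from the first one;
    an empty list means the model is invariant to the perturbation."""
    sentences = [template.format(name=value) for value in fill_values]
    predictions = [predict(sentence) for sentence in sentences]
    return [(s, p) for s, p in zip(sentences, predictions) if p != predictions[0]]

# Hypothetical usage with any sentiment classifier:
# failures = invariance_test(my_model.predict,
#                            "{name} really loved this movie.",
#                            ["Anna", "Omar", "Mei", "Carlos"])
```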

7. EfficientDet: Scalable and Efficient Object Detection , by Mingxing Tan, Ruoming Pang, Quoc V. Le

Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and EfficientNet backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with single-model and single-scale, our EfficientDet-D7 achieves state-of-the-art 52.2 AP on COCO test-dev with 52M parameters and 325B FLOPs, being 4×–9× smaller and using 13×–42× fewer FLOPs than previous detectors. Code is available on https://github.com/google/automl/tree/master/efficientdet .

The large size of object detection models deters their deployment in real-world applications such as self-driving cars and robotics. To address this problem, the Google Research team introduces two optimizations, namely (1) a weighted bi-directional feature pyramid network (BiFPN) for efficient multi-scale feature fusion and (2) a novel compound scaling method. By combining these optimizations with the EfficientNet backbones, the authors develop a family of object detectors, called EfficientDet . The experiments demonstrate that these object detectors consistently achieve higher accuracy with far fewer parameters and multiply-adds (FLOPs).

EfficientDet

  • A weighted bi-directional feature pyramid network (BiFPN) for easy and fast multi-scale feature fusion. It learns the importance of different input features and repeatedly applies top-down and bottom-up multi-scale feature fusion (a sketch of the weighted fusion follows this list).
  • A new compound scaling method for simultaneous scaling of the resolution, depth, and width for all backbone, feature network, and box/class prediction networks.
  • These optimizations, together with the EfficientNet backbones, allow the development of a new family of object detectors, called EfficientDet .
  • The evaluation shows that the EfficientDet model with 52M parameters achieves state-of-the-art 52.2 AP on the COCO test-dev dataset, outperforming the previous best detector by 1.5 AP while being 4× smaller and using 13× fewer FLOPs;
  • with simple modifications, the EfficientDet model achieves 81.74% mIOU accuracy on Pascal VOC 2012 semantic segmentation, outperforming DeepLabV3+ by 1.7% with 9.8× fewer FLOPs;
  • the EfficientDet models are up to 3× to 8× faster on GPU/CPU than previous detectors.
  • The paper was accepted to CVPR 2020, the leading conference in computer vision.
  • The high level of interest in the code implementations of this paper makes this research one of the highest-trending papers introduced recently.
  • The high accuracy and efficiency of the EfficientDet detectors may enable their application for real-world tasks, including self-driving cars and robotics.
  • The authors released the official TensorFlow implementation of EfficientDet.
  • Third-party PyTorch implementations of this paper are also available.
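
The core of BiFPN is a learned, weighted fusion of feature maps coming from different scales. The sketch below shows the fast normalized fusion idea in PyTorch, assuming the inputs have already been resized to a common shape; the module name, number of inputs, and epsilon value are illustrative choices rather than the exact EfficientDet implementation.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fuse same-shaped feature maps with learned, normalized, non-negative weights."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, features):
        w = torch.relu(self.weights)      # keep the learned weights non-negative
        w = w / (w.sum() + self.eps)      # fast normalized fusion
        return sum(wi * fi for wi, fi in zip(w, features))

# Hypothetical usage: fuse three feature maps of identical shape
fuse = WeightedFusion(num_inputs=3)
out = fuse([torch.randn(1, 64, 32, 32) for _ in range(3)])
```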

8. Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild , by Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi

We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categories have, at least in principle, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method can recover very accurately the 3D shape of human faces, cat faces and cars from single-view images, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences.

The research group from the University of Oxford studies the problem of learning 3D deformable object categories from single-view RGB images without additional supervision. To decompose the image into depth, albedo, illumination, and viewpoint without direct supervision for these factors, they suggest starting by assuming objects to be symmetric. Then, considering that real-world objects are never fully symmetrical, at least due to variations in pose and illumination, the researchers augment the model by explicitly modeling illumination and predicting a dense map with probabilities that any given pixel has a symmetric counterpart. The experiments demonstrate that the introduced approach achieves better reconstruction results than other unsupervised methods. Moreover, it outperforms the recent state-of-the-art method that leverages keypoint supervision.

deformable 3D

  • The method has no access to 2D or 3D ground truth information such as keypoints, segmentation, depth maps, or prior knowledge of a 3D model;
  • it uses an unconstrained collection of single-view images, without multiple views of the same instance.
  • The approach relies on leveraging symmetry as a geometric cue to constrain the decomposition;
  • explicitly modeling illumination and using it as an additional cue for recovering the shape;
  • augmenting the model to account for a potential lack of symmetry, particularly by predicting a dense map that contains the probability of a given pixel having a symmetric counterpart in the image.
  • Qualitative evaluation of the suggested approach demonstrates that it reconstructs 3D faces of humans and cats with high fidelity, containing fine details of the nose, eyes, and mouth.
  • The method reconstructs higher-quality shapes compared to other state-of-the-art unsupervised methods, and even outperforms the DepthNet model, which uses 2D keypoint annotations for depth prediction.

deformable 3D reconstruction

  • The paper received the Best Paper Award at CVPR 2020, the leading conference in computer vision.
  • Reconstructing more complex objects by extending the model to use either multiple canonical views or a different 3D representation, such as a mesh or a voxel map.
  • Improving model performance under extreme lighting conditions and for extreme poses.
  • The implementation code and demo are available on GitHub .

9. An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale , by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer can perform very well on image classification tasks when applied directly to sequences of image patches. When pre-trained on large amounts of data and transferred to multiple recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer attain excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

The authors of this paper show that a pure Transformer can perform very well on image classification tasks. They introduce Vision Transformer (ViT) , which is applied directly to sequences of image patches by analogy with tokens (words) in NLP. When trained on large datasets of 14M–300M images, Vision Transformer approaches or beats state-of-the-art CNN-based models on image recognition tasks. In particular, it achieves an accuracy of 88.36% on ImageNet, 90.77% on ImageNet-ReaL, 94.55% on CIFAR-100, and 77.16% on the VTAB suite of 19 tasks.

Visual Transformer

  • When applying the Transformer architecture to images, the authors follow the design of the original NLP Transformer as closely as possible.
  • Images are processed by (see the patch-embedding sketch after this list): splitting them into fixed-size patches;
  • linearly embedding each of them;
  • adding position embeddings to the resulting sequence of vectors;
  • feeding the patches to a standard Transformer encoder;
  • adding an extra learnable ‘classification token’ to the sequence.
  • Similarly to Transformers in NLP, Vision Transformer is typically pre-trained on large datasets and fine-tuned to downstream tasks.
  • Vision Transformer achieves an accuracy of: 88.36% on ImageNet; 
  • 90.77% on ImageNet-ReaL; 
  • 94.55% on CIFAR-100; 
  • 97.56% on Oxford-IIIT Pets;
  • 99.74% on Oxford Flowers-102;
  • 77.16% on the VTAB suite of 19 tasks.
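
Below is a minimal PyTorch sketch of the patch-embedding steps listed above. The patch size, embedding width, and the strided-convolution trick are common illustrative choices, not necessarily the exact configuration of the paper's models.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches, embed them, add a class token and positions."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, dim=768):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to cutting patches + a shared linear layer.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))

    def forward(self, x):                                      # x: (batch, 3, H, W)
        patches = self.proj(x).flatten(2).transpose(1, 2)      # (batch, num_patches, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        return torch.cat([cls, patches], dim=1) + self.pos_embed  # input to the encoder

tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))  # shape (2, 197, 768)
```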

Visual Transformer

  • The paper is trending in the AI research community, as evident from the repository stats on GitHub .
  • It is also under review for ICLR 2021 , one of the key conferences in deep learning.
  • Applying Vision Transformer to other computer vision tasks, such as detection and segmentation.
  • Exploring self-supervised pre-training methods.
  • Analyzing the few-shot properties of Vision Transformer.
  • Exploring contrastive pre-training.
  • Further scaling ViT.
  • Thanks to their efficient pre-training and high performance, Transformers may substitute convolutional networks in many computer vision applications, including navigation, automatic inspection, and visual surveillance.
  • The PyTorch implementation of Vision Transformer is available on GitHub .

10. AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients , by Juntang Zhuang, Tommy Tang, Sekhar Tatikonda, Nicha Dvornek, Yifan Ding, Xenophon Papademetris, James S. Duncan

Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) or accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are typically the default because of their stability. We propose AdaBelief to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability. The intuition for AdaBelief is to adapt the step size according to the “belief” in the current gradient direction. Viewing the exponential moving average (EMA) of the noisy gradient as the prediction of the gradient at the next time step, if the observed gradient greatly deviates from the prediction, we distrust the current observation and take a small step; if the observed gradient is close to the prediction, we trust it and take a large step. We validate AdaBelief in extensive experiments, showing that it outperforms other methods with fast convergence and high accuracy on image classification and language modeling. Specifically, on ImageNet, AdaBelief achieves comparable accuracy to SGD. Furthermore, in the training of a GAN on Cifar10, AdaBelief demonstrates high stability and improves the quality of generated samples compared to a well-tuned Adam optimizer. Code is available at https://github.com/juntang-zhuang/Adabelief-Optimizer .

The researchers introduce AdaBelief , a new optimizer, which combines the high convergence speed of adaptive optimization methods and good generalization capabilities of accelerated stochastic gradient descent (SGD) schemes. The core idea behind the AdaBelief optimizer is to adapt step size based on the difference between predicted gradient and observed gradient: the step is small if the observed gradient deviates significantly from the prediction, making us distrust this observation, and the step is large when the current observation is close to the prediction, making us believe in this observation. The experiments confirm that AdaBelief combines fast convergence of adaptive methods, good generalizability of the SGD family, and high stability in the training of GANs.

  • The idea of the AdaBelief optimizer is to combine the advantages of adaptive optimization methods (e.g., Adam) and accelerated SGD optimizers. Adaptive methods typically converge faster, while SGD optimizers demonstrate better generalization performance.
  • If the observed gradient deviates greatly from the prediction, we have a weak belief in this observation and take a small step.
  • If the observed gradient is close to the prediction, we have a strong belief in this observation and take a large step (a sketch of the resulting update rule follows this list).
  • As a result, AdaBelief combines: fast convergence, like adaptive optimization methods;
  • good generalization, like the SGD family;
  • training stability in complex settings such as GANs.
  • In image classification tasks on CIFAR and ImageNet, AdaBelief demonstrates as fast convergence as Adam and as good generalization as SGD.
  • It outperforms other methods in language modeling.
  • In the training of a WGAN , AdaBelief significantly improves the quality of generated images compared to Adam.
  • The paper was accepted to NeurIPS 2020, the top conference in artificial intelligence.
  • It is also trending in the AI research community, as evident from the repository stats on GitHub .
  • AdaBelief can boost the development and application of deep learning models as it can be applied to the training of any model that numerically estimates parameter gradient. 
  • Both PyTorch and Tensorflow implementations are released on GitHub.
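
A minimal NumPy sketch of a single AdaBelief-style parameter update, to make the "belief" idea concrete. It follows the rule described above (an EMA of gradients as the prediction, plus an EMA of the squared deviation from that prediction); bias correction and other refinements of the official implementation are omitted, and the hyperparameter values are illustrative.

```python
import numpy as np

def adabelief_step(param, grad, m, s, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified AdaBelief update (bias correction omitted for brevity)."""
    m = beta1 * m + (1 - beta1) * grad             # EMA of gradients = the "prediction"
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2  # EMA of the squared "surprise"
    # A large deviation from the prediction -> large s -> small, cautious step.
    param = param - lr * m / (np.sqrt(s) + eps)
    return param, m, s
```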

If you like these research summaries, you might be also interested in the following articles:

  • GPT-3 & Beyond: 10 NLP Research Papers You Should Read
  • Novel Computer Vision Research Papers From 2020
  • AAAI 2021: Top Research Papers With Business Applications
  • ICLR 2021: Key Research Papers



About Mariya Yao

Mariya is the co-author of Applied AI: A Handbook For Business Leaders and former CTO at Metamaven. She "translates" arcane technical concepts into actionable business advice for executives and designs lovable products people actually want to use. Follow her on Twitter at @thinkmariya to raise your AI IQ.




Machine Learning-Based Research for COVID-19 Detection, Diagnosis, and Prediction: A Survey

Yassine Meraihi

1 LIST Laboratory, University of M’Hamed Bougara Boumerdes, Avenue of Independence, 35000 Boumerdes, Algeria

Asma Benmessaoud Gabis

2 Ecole Nationale Supérieure d’Informatique, Laboratoire des Méthodes de Conception des Systèmes, BP 68M, 16309 Oued-Smar, Algiers, Algeria

Seyedali Mirjalili

3 Centre for Artificial Intelligence Research and Optimisation, Torrens University Australia, Fortitude Valley, Brisbane, QLD 4006 Australia

4 Yonsei Frontier Lab, Yonsei University, Seoul, Korea

Amar Ramdane-Cherif

5 LISV Laboratory, University of Versailles St-Quentin-en-Yvelines, 10-12 Avenue of Europe, 78140 Velizy, France

Fawaz E. Alsaadi

6 Information Technology Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia

The year 2020 experienced an unprecedented pandemic called COVID-19, which impacted the whole world. The absence of treatment has motivated research in all fields to deal with it. In Computer Science, contributions mainly include the development of methods for the diagnosis, detection, and prediction of COVID-19 cases. Data science and Machine Learning (ML) are the most widely used techniques in this area. This paper presents an overview of more than 160 ML-based approaches developed to combat COVID-19. They come from various sources such as Elsevier, Springer, ArXiv, MedRxiv, and IEEE Xplore. They are analyzed and classified into two categories: Supervised Learning-based approaches and Deep Learning-based ones. For each category, the employed ML algorithms are specified, and the parameters used by each of them are gathered in different tables. They include the type of the addressed problem (detection, diagnosis, or prediction), the type of the analyzed data (text data, X-ray images, CT images, time series, clinical data, ...), and the evaluation metrics (accuracy, precision, sensitivity, specificity, F1-score, and AUC). The study discusses the collected information and provides a number of statistics drawing a picture of the state of the art. Results show that Deep Learning is used in 79% of cases, with 65% of these based on the Convolutional Neural Network (CNN) and 17% using specialized CNNs. For its part, Supervised Learning is found in only 16% of the reviewed approaches, and only Random Forest, Support Vector Machine (SVM), and Regression algorithms are employed.

Introduction

COVID-19 has led to one of the most disruptive disasters of the current century and is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The health system and economy of a large number of countries have been impacted. As per World Health Organization (WHO) data, there have been 225,024,781 confirmed cases of COVID-19, including 4,636,153 deaths, as of 14 September 2021. Immediately after its outbreak, several studies were conducted to understand the characteristics of this coronavirus.

It is argued that human-to-human transmission of SARS-CoV-2 typically occurs via direct contact and respiratory droplets [ 1 ]. On the other hand, the incubation period of the infection is estimated at 2–14 days, which helps in controlling it; preventing the spread of COVID-19 is the primary intervention being used. Moreover, studies on clinical forms reveal the presence of asymptomatic carriers in the population and identify the most affected age groups [ 2 ]. After almost a year in this situation, and given the large number of studies conducted in different disciplines to bring relief, a huge amount of data has been generated. Computer science researchers found themselves called upon to provide their help. One of the first registered contributions is the visualization of data. The latter was mapped and/or plotted in graphs, which allows one to: (i) better track the propagation of the virus over the globe in general and country by country in particular (Fig.  1 );

[Figure 1: Propagation of COVID-19 over the world]

(ii) better track the propagation of the pandemic over time; (iii) better estimate the number of confirmed cases and the number of deaths (Fig. 2a, b). Later, more advanced techniques based essentially on Artificial Intelligence (AI) were employed. Bringing AI to bear against COVID-19 has served the prevention and monitoring of infected patients. In fact, by using the geographical coordinates of people, some governments were able to limit their movements and locate people with whom they had been in contact. The second benefit of AI is the ability to classify individuals as affected or not. Finally, AI offers the ability to make predictions about possible future contaminations. For this purpose, Machine Learning (ML), which is often conflated with AI, is used. Among the different ML algorithms, Neural Networks (NN) are among the most widely used for solving real-world problems, which gave rise to Deep Learning (DL).

[Figure 2: Data visualization for tracking COVID-19 progress]

Deep learning is particularly suited to contexts where the data is complex and where large datasets are available, as is the case with COVID-19.

In this context, the present paper gives an overview of the Machine Learning research performed to handle COVID-19 data. It specifies for each work the targeted objectives and the type of data used to achieve them.

To accomplish this study, we used Google Scholar, employing the following search strings to build a database of COVID-19 related articles:

  • COVID-19 detection using Machine learning;
  • COVID-19 detection using Deep learning;
  • COVID-19 detection using Artificial intelligence;
  • COVID-19 diagnosis using Machine learning;
  • COVID-19 diagnosis using Deep learning;
  • COVID-19 diagnosis using Artificial intelligence;
  • COVID-19 prediction using Machine learning;
  • Deep learning for COVID-19 prediction;
  • Artificial intelligence for COVID-19 prediction.

We retain all articles in this field which:

  • Are published in scientific journals;
  • Propose new algorithms to deal with COVID-19;
  • Have more than 4 pages;
  • Are written in English;
  • Represent complete versions when several are available.

We exclude articles which:

  • Do not report the statistical tests used to assess the significance of the presented results;
  • Do not report details on the source of their data sets.

The result is impressive. In fact, since February 2020, several papers have been published in this area every month. As we can see in Fig.  3 , India and China appear to be the countries with the highest number of COVID-19 publications. However, many other countries have shown strong activity in the number of contributions. This is expected, as the situation affects the entire world. The different papers appeared from various well-known publishers such as IEEE, Elsevier, Springer, arXiv, and many others, as shown in Fig.  4 .

[Figure 3: Number of COVID-19 published articles by country]

[Figure 4: Percentage of identified COVID-19 papers in different scientific publishers]

In this paper, the surveyed approaches are presented according to the Machine Learning classification given in Fig.  8 . Techniques highlighted in yellow are those employed in the different propositions against COVID-19. We show that most of them are based on Convolutional Neural Networks (CNN), which enable Deep Learning. Almost half of these techniques use X-ray images. Nevertheless, several other data sources are used in different proportions, as shown in Fig.  5 . They include Computed Tomography (CT) images, text data, time series, sounds, coughing/breathing videos, and even blood samples. A word cloud of the works we have summarized, reviewed, and analyzed in this paper can be seen in Fig.  6 .

[Figure 5: Proportion of the different data sources used in COVID-19 publications]

[Figure 6: A word cloud of the works summarized, reviewed, and analyzed in this paper]

[Figure 8: Classification of Machine Learning Algorithms]

There are similar surveys on AI and COVID-19 (e.g. in the works of Rasheed et al. [ 3 ], Shah et al. [ 4 ], Mehta et al. [ 5 ], Shinde et al. [ 6 ] and Chiroma et al. [ 7 ]). What makes this survey different is the focus on specialized Machine Learning techniques proposed globally to detect, diagnose, and predict COVID-19.

The remainder of this paper is organized as follows. In the second section, the definition of Deep Learning and its connection with AI and Machine Learning is given with descriptions of the most used algorithms. The third section presents a classification of the different approaches proposed to deal with COVID-19. They are illustrated by multiple tables highlighting the most important parameters of each of them. The fourth section discusses the results revealed from the conducted study in regard to the techniques used and their evaluation. It notes the limitations encountered and possible solutions to overcome them. The last section concludes the present article.

Artificial Intelligence, Machine Learning and Deep Learning

Artificial Intelligence (AI), as it is traditionally known, is considered weak AI. Making it stronger means making it capable of reproducing human behavior with consciousness, sensitivity, and spirit. The appearance of Machine Learning (ML) was the means that made it possible to take a step towards this objective. By definition, Machine Learning is a subfield of AI concerned with giving computers the ability to learn without being explicitly programmed. It is based on the principle of reproducing a behavior thanks to algorithms, themselves fed by a large amount of data. Faced with many situations, the algorithm learns which decision to make and creates a model. The machine can therefore automate tasks according to the situation. The general process of carrying out Machine Learning requires a training dataset, a test dataset, and an algorithm to generate a predictive model (Fig.  7 ). Four types of ML can be distinguished, as shown in Fig.  8 .

[Figure 7: Machine learning prediction process]

Supervised Learning

Supervised learning is a form of machine learning that falls under artificial intelligence. The idea is to “guide” the algorithm during learning using pre-labeled examples of expected results. The artificial intelligence then learns from each example by adjusting its parameters to reduce the gap between the results obtained and the expected ones. The margin of error is thus reduced over the training sessions, with the aim of generalizing the learning so as to predict the outcome of new cases [ 8 , 9 ]. The task is called classification if the labels are discrete classes, and regression if they are continuous quantities. Within each category, there exist several algorithms [ 10 , 11 ]. We define below those which were applied to the detection/prediction of COVID-19.

Linear Regression

Linear regression can be considered one of the most conventional machine learning techniques [ 12 ], in which the best-fit line/hyperplane for the available training data is determined by minimizing the mean squared error. This algorithm assumes the predictive function is linear. Its general form is Y = a*X + b + ϵ, with a and b two constants. Y is the variable to be predicted, X the variable used to predict it, a is the slope of the regression, and b is the intercept, that is, the value of Y when X is zero.
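
As a concrete illustration of this definition, the sketch below fits the constants a and b by ordinary least squares on synthetic data; the data and noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
Y = 2.5 * X + 1.0 + rng.normal(scale=0.5, size=100)  # Y = a*X + b + noise

a_hat, b_hat = np.polyfit(X, Y, deg=1)               # least-squares estimates of a and b
print(f"estimated slope a = {a_hat:.2f}, intercept b = {b_hat:.2f}")
```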

Logistic Regression

Despite its name, Logistic Regression [ 13 ] can be employed to perform regression as well as classification. It is based on the sigmoid predictive function defined as h(z) = 1 / (1 + e^(-z)), where z is a linear function of the input. The function returns a probability score P between 0 and 1. In order to map this score to two discrete classes (0 or 1), a threshold value θ is fixed. The predicted class is equal to 1 if P ≥ θ, and to 0 otherwise.
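
A minimal sketch of the sigmoid scoring and thresholding described above; the feature vector, weights, and threshold value are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(x, w, b, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    p = sigmoid(np.dot(w, x) + b)  # z is a linear function of the features
    return int(p >= threshold)

print(predict_class(x=np.array([1.0, 2.0]), w=np.array([0.8, -0.3]), b=0.1))
```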

Support Vector Machine (SVM)

Similar to the previously defined algorithms, the idea behind SVM [ 14 , 15 ] is to distinctly classify data points by finding a hyperplane in an N-dimensional space. Since there are several possible choices of hyperplane, SVM computes a margin, the distance between the hyperplane and the data points of the two classes to be separated. The objective is to maximize this margin to get a clear decision boundary that helps in classifying future data points.
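
Many of the surveyed works rely on off-the-shelf SVM implementations. Below is a minimal sketch with scikit-learn on synthetic data; the features, labels, and kernel choice are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # e.g. numeric clinical/laboratory features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)  # maximum-margin classifier
print("test accuracy:", clf.score(X_te, y_te))
```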

Decision Tree

A Decision Tree [ 16 ] is an algorithm that seeks to partition the individuals into groups that are as similar as possible from the point of view of the variable to be predicted. The algorithm produces a tree that reveals hierarchical relationships between the variables. An iterative process is used where, at each iteration, a sub-population of individuals is obtained by choosing the explanatory variable that allows the best separation of individuals. The algorithm stops when no more splits are possible.

Random Forest Algorithms

Random Forest Algorithms are methods that provide predictive models for classification and regression [ 17 , 18 ]. They are composed of a large number of Decision Tree blocks used as individual predictors. The fundamental idea behind the method is that instead of trying to get an optimized method all at once, several predictors are generated and their different predictions are pooled. The final predicted class is the one having the most votes.

Artificial Neural Network (ANN)

Artificial Neural Networks (ANN) form a popular supervised classification approach that tries to mimic the way the human brain works. It is often used whenever there is abundant labeled training data with many features [ 19 ]. The network computes from the input a score (or a probability) of belonging to each class. The class assigned to the input object corresponds to the one with the highest score. A Neural Network is a system made up of neurons. It is divided into several layers connected to each other, where the output of one layer corresponds to the input of the next one [ 20 , 21 ]. The final score is computed using a linear function of the layer weights followed by an activation function. The weight values are randomly assigned at the beginning and are then learned (updated) by backpropagation of the gradient to minimize the loss function associated with the final layer. The optimization is done with a gradient descent technique [ 22 ].
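
To make this training loop concrete, here is a minimal one-hidden-layer network trained by plain gradient descent in NumPy; the architecture, learning rate, and synthetic data are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                           # 64 samples, 3 features
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)   # toy binary labels

W1, b1 = rng.normal(scale=0.1, size=(3, 8)), np.zeros(8)  # hidden layer
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)  # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    h = sigmoid(X @ W1 + b1)                  # forward pass through the hidden layer
    p = sigmoid(h @ W2 + b2)                  # predicted probabilities
    grad_out = (p - y) / len(X)               # gradient of cross-entropy w.r.t. output pre-activation
    grad_h = (grad_out @ W2.T) * h * (1 - h)  # backpropagation to the hidden layer
    W2 -= 0.5 * (h.T @ grad_out); b2 -= 0.5 * grad_out.sum(axis=0)
    W1 -= 0.5 * (X.T @ grad_h);   b1 -= 0.5 * grad_h.sum(axis=0)

print("training accuracy:", ((p > 0.5) == y).mean())
```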

Unsupervised Learning

Unsupervised learning is a type of self-organized learning that learns and creates models from unlabeled training datasets (unlike Supervised Learning). There are two main practices in Unsupervised Learning. The first one is clustering, which consists of gathering similar data into homogeneous groups. It is performed by applying one of the many existing clustering algorithms [ 23 ]: K-means, hierarchical clustering, Hidden Markov models, etc. The second practice is dimensionality reduction [ 24 ], which consists of reducing the number of features in highly dimensional data. The purpose is to extract new features and to find the best linear transformation representing as many data points as possible while guaranteeing a minimum loss of information.

Deep Learning

As illustrated in Fig.  9 , Deep Learning [ 25 , 26 ] is a branch of AI that focuses on creating large Neural Network models capable of making decisions from data; in essence, it is a Neural Network with many hidden layers. Indeed, it has been observed that the addition of layers of neurons has a great impact on the quality of the results obtained.

[Figure 9: Classification of Machine Learning Approaches]

There are many deep learning algorithms other than the basic ANN. In the following, we define the most widely used ones that have been applied in the context of COVID-19.

Convolutional Neural Network (CNN)

Convolutional Neural Networks, or ConvNets [ 27 , 28 ], are a type of ANN used for Deep Learning that is able to categorize information from the simplest to the most complex. They consist of a multilayer stack of neurons and of mathematical functions with several adjustable parameters that preprocess small amounts of information. Convolutional networks are characterized by their first convolutional layers (usually one to three), which seek to identify the presence of basic, abstract patterns in an object. Successive layers can use this information to distinguish objects from each other (classification/recognition).
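
A minimal PyTorch sketch of a small convolutional classifier of the kind used in many of the surveyed works; the layer sizes, input resolution, and two-class output (e.g. COVID-19 vs. normal chest X-rays) are illustrative assumptions, not a specific model from the literature.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Two convolutional blocks followed by a fully connected classifier."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # for 224x224 inputs

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = SmallCNN()(torch.randn(4, 1, 224, 224))  # e.g. a batch of grayscale X-rays
```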

Recurrent Neural Network (RNN)

A Recurrent Neural Network [ 29 , 30 ] is also a type of ANN used for Deep Learning, in which information can move in both directions between the deeper layers and the first layers. This allows it to keep information from the recent past in memory. For this reason, RNNs are particularly suited to applications involving context, and more particularly to the processing of temporal sequences such as learning and signal generation. However, for applications involving long time gaps (typically the classification of video sequences), this “short-term memory” is not sufficient, because forgetting begins after about fifty iterations.

Generative Adversarial Network (GAN)

A GAN [ 31 ] is a Deep Learning technique based on the competition of two networks within a single framework. These two networks are called the “generator” and the “discriminator”. The generator is typically a convolutional network whose role is to create new instances of an object, i.e., outputs that ideally cannot be identified as fake. The discriminator, on the other hand, is a second neural network that determines the authenticity of the object (whether or not it comes from the real data set).

Reinforcement Learning

Reinforcement Learning [ 32 , 33 ] is a method of learning for machine learning models. Basically, this method lets the algorithm learn from its own mistakes. To learn how to make the right decisions, the AI program is directly confronted with choices. If it is wrong, it is “penalized”. On the contrary, if it makes the right decision, it is “rewarded”. In order to get more and more rewards, AI will therefore do its best to optimize its decision-making.

Overview of Machine Learning approaches used to combat COVID-19

Zhang et al. [ 34 ] applied a Support Vector Machine (SVM) model for COVID-19 case detection and classification. Clinical information and blood/urine test data were used in their work to validate the SVM’s performance. Simulation results demonstrated the effectiveness of the SVM model, achieving an accuracy of 81.48%, sensitivity of 83.33%, and specificity of 100%.

Hassanien et al. [ 35 ] proposed a new approach based on the hybridization of SVM with Multi-Level Thresholding for detecting COVID-19 infected patients from X-ray images. The performance of the hybrid approach was evaluated using 40 contrast-enhanced lung X-ray images (15 normal and 25 with COVID-19). A similar work was done by Sethy et al. [ 36 ], in which a combined approach based on the combination of SVM with 13 pre-trained CNN models for COVID-19 detection from chest X-ray images was proposed. Experimental results showed that ResNet50 combined with SVM outperforms the other CNN models combined with SVM, achieving an average classification accuracy of 95.33%.

Sun et al. [ 37 ] used an SVM model for predicting COVID-19 patients with severe/critical symptoms. 220 clinical/laboratory observation records and 336 cases of patients infected with COVID-19, divided into training and testing datasets, were used to validate the performance of the SVM model. Simulation results showed that the SVM model achieves an Area Under the Curve (AUC) of 0.9996 and 0.9757 on the training and testing datasets, respectively.

Singh et al. [ 38 ] used four machine learning approaches (SVM with Bagging Ensemble, CNN, Extreme Learning Machine (ELM), Online Sequential ELM (OS-ELM)) for automatic detection of COVID-19 cases. The performance of the proposed approaches was tested using datasets of 702 CT scan images (344 with COVID-19 and 358 normal). Experimental results revealed the efficiency of SVM with Bagging Ensemble by obtaining an accuracy, precision, sensitivity, specificity, F1-score, and AUC of 95.70%, 95.50%, 96.30%, 94.80%, 95.90%, and 95.80%, respectively.

Singh et al. [ 39 ] proposed Least Square-SVM (LS-SVM) and Autoregressive Integrated Moving Average (ARIMA) models for the prediction of COVID-19 cases. A dataset of COVID-19 confirmed cases collected from the five most affected countries was used to validate the proposed models. It was demonstrated that the LS-SVM model outperforms the ARIMA model by obtaining an accuracy of 80%.

Nour et al. [ 40 ] applied machine learning approaches such as SVM, Decision tree (DT), and KNN for automatic detection of positive COVID-19 cases. The performance of the proposed approaches was validated on a public COVID-19 radiology database divided into training and test sets with 70% and 30% rates, respectively.

Tabrizchi et al. [ 41 ] used SVM with Naive Bayes (NB), Gradient boosting decision tree (GBDT), AdaBoost, CNN, and Multilayer perceptron (MLP) for rapid diagnosis of COVID-19. A dataset of 980 CT scan images (430 with COVID-19 and 550 normal) was used in the simulation and results showed that SVM outperforms other machine-learning approaches by achieving an average accuracy, precision, sensitivity, and F1-score of 99.20%, 98.19%, 100%, and 99.0%, respectively.

Regression Approaches

Yue et al. [ 42 ] used a linear regression model for the prediction of COVID-19 infected patients. CT images of 52 patients collected from five hospitals in Ankang, Lishui, Zhenjiang, Lanzhou, and Linxia were used to evaluate the performance of the regression model. Simulation results demonstrated that the linear regression model outperforms the Random Forest algorithm.

Another similar work was done by Shi et al. [ 43 ], in which a least absolute shrinkage and selection operator (LASSO) logistic regression model was proposed. The effectiveness of the proposed model was evaluated based on CT images taken from 196 patients (151 non-severe patients and 45 severe patients). Experimental results showed the high performance of the proposed model compared to quantitative CT parameters and PSI score by achieving an accuracy of 82.70%, sensitivity of 82.20%, specificity of 82.80%, and AUC of 89%.

Yan et al. [ 44 ] proposed a supervised regression model, called XGBoost, for predicting COVID-19 patients. A database of blood samples of 485 infected patients in the region of Wuhan, China was used in simulations and results showed that XGBoost gives good performance by achieving an overall accuracy of 90% in the detection of patients with COVID-19.

Salama et al. [ 45 ] used the linear regression model with SVM and ANN for the prediction of COVID-19 infected patients. The effectiveness of the proposed models was assessed based on the Epidemiological dataset collected from many health reports of real-time cases. Simulation results demonstrated that SVM has the lowest mean absolute error with the value of 0.21, while the regression model has the lowest root mean squared error with a value of 0.46.

Gupta et al. [ 46 ] proposed a linear regression technique with mathematical SEIR (Susceptible, Exposed, Infectious, Recovered) model for COVID-19 outbreak predictions. It was tested using data collected from John Hopkins University repository taking into account the root mean squared log error (RMSLE) metric. Simulation results showed that SEIR model has the lowest RMSLE with the value of 1.52.

In the work of Chen and Liu [ 47 ], Logistic Regression with Random Forest, Partial Least Squares Regression (PLSR), Elastic Net, and Bagged Flexible Discriminant Analysis (BFDA) were proposed for predicting the severity of COVID-19 patients. The efficiency of the proposed models was evaluated using data of 183 severely infected COVID-19 patients and results showed that the logistic regression model outperforms other machine learning models by achieving a sensitivity of 89.20%, specificity of 68.70%, and AUC of 89.20%.

Another similar work was done by Ribeiro et al. [ 48 ], in which six machine learning approaches such as stacking-ensemble learning (SEL), support vector regression (SVR), cubist regression (CUBIST), auto-regressive integrated moving average (ARIMA), ridge regression (RIDGE), and random forest (RF) were employed for prediction purposes in COVID-19 datasets.

Yadav et al. [ 49 ] used three machine learning approaches (Linear Regression, Polynomial Regression, and SVR) for COVID-19 epidemic prediction and analysis. A dataset containing the total number of COVID-19 positive cases was collected from different countries such as South Korea, China, the US, India, and Italy. Results showed the superiority of SVR compared to Linear Regression and Polynomial Regression. The average accuracies for SVR, Linear Regression, and Polynomial Regression are 99.47%, 65.01%, and 98.82%, respectively.

Matos et al. [ 50 ] proposed four linear regression-based models (Penalized Binomial Regression (PBR), Conditional Inference Trees (CIR), Generalised Linear models (GL), and SVM with a linear kernel) for COVID-19 diagnosis. CT images and clinical data collected from 106 patients were used in the simulation, and results showed that SVM with a linear kernel gives better results compared to the other models, providing an accuracy of 0.88, sensitivity of 0.90, specificity of 0.87, and AUC of 0.92.

Khanday et al. [ 51 ] proposed Logistic regression with six machine learning approaches (Adaboost, Stochastic Gradient Boosting, Decision Tree, SVM, Multinomial Naïve Bayes, and Random Forest) for COVID-19 detection and classification. It was evaluated using 212 clinical reports divided into four classes including COVID, ARDS, SARS, and Both (COVID, ARDS). Simulation results showed that logistic regression provides excellent performance by obtaining 94% of precision, 96% of sensitivity, accuracy of 96.20%, and 95% of F1-score.

Yang et al. [ 52 ] proposed Gradient Boosted Decision Tree (GBDT) with Decision Tree, Logistic Regression, and Random Forest for COVID-19 diagnosis. 27 routine laboratory tests collected from the New York Presbyterian Hospital/Weill Cornell Medicine (NYPH/WCM) were used to evaluate this technique. Experimental results revealed the efficiency of GBDT by achieving a sensitivity, specificity, and AUC of 76.10 %, 80.80%, and 85.40%, respectively.

Saqib [ 53 ] developed a novel model (PBRR) by combining Bayesian Ridge Regression (BRR) with n-degree Polynomial for forecasting COVID-19 outbreak progression. The performance of the PBRR model was validated using public datasets collected from John Hopkins University available until 11th May 2020. Experimental results revealed the good performance of PBRR with an average accuracy of 91%.

Random Forest Algorithm

Shi et al. [ 54 ] proposed an infection Size Aware Random Forest method (iSARF) for the diagnosis of COVID-19. A dataset of chest CT images (1658 with COVID-19 and 1027 with pneumonia) was used to assess the performance of iSARF. Simulation results demonstrated that iSARF provides good performance by yielding a sensitivity of 90.7%, specificity of 83.30%, and accuracy of 87.90% under five-fold cross-validation.

Iwendi et al. [ 55 ] combined RF model with AdaBoost algorithm for COVID-19 disease severity prediction. The efficiency of the boosted RF model was evaluated based on COVID-19 patient’s geographical, travel, health, and demographic data. Boosted RF model gives an accuracy of 94% and F1-Score of 86% on the dataset used.

In the work of Brinati et al. [ 56 ], seven machine learning approaches (Random Forest, Logistic Regression, KNN, Decision Tree, Extremely Randomized Trees, Naïve Bayes, and SVM) were proposed for the identification of COVID-19 positive patients. Routine blood exams collected from 279 patients were used in the simulation and results demonstrated the feasibility and effectiveness of the Random Forest algorithm by achieving an accuracy, precision, sensitivity, specificity, and AUC of 82%, 83%, 92%, 65%, and 84%, respectively.
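For tabular inputs such as routine blood exams, a random forest benchmark with cross-validated sensitivity and specificity can be sketched as follows (the synthetic data merely stands in for the 279-patient cohort):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=279, n_features=13, weights=[0.4, 0.6], random_state=0)
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    y_pred = cross_val_predict(rf, X, y, cv=5)

    tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
    print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))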

The main characteristics of the predefined Supervised Learning approaches are given in Table 1.

Summary of supervised learning approaches for detection, diagnosis, and prediction of COVID-19 cases

Deep Learning Approaches

Most of the methods applied to detect, predict, and diagnose COVID-19 are based on Deep Learning and its different techniques. In the following, we summarize the approaches found, following the classification given in Fig. 8, and gather their main features in Tables 2, 3, 4, 5 and 6.

Summary of convolutional neural networks (CNN) approaches for detection, diagnosis, and prediction of COVID-19 cases

Summary of Recurrent Neural Networks (RNN) approaches for detection, diagnosis, and prediction of COVID-19 cases

Summary of Specialized CNN approaches for detection, diagnosis, and prediction of COVID-19 cases

Summary of Generative Adversarial Network (GAN) approaches for detection, diagnosis, and prediction of COVID-19 cases

Summary of other deep learning approaches for detection, diagnosis, and prediction of COVID-19 cases

Wang et al. [ 60 ] proposed a deep CNN model, called Residual Network34 (ResNet34), for COVID-19 diagnosis in CT scan images. The effectiveness of ResNet34 was validated using CT scan images collected from 99 patients (55 patients with typical viral pneumonia and 44 patients with COVID-19). Simulation results showed that ResNet34 achieves an overall accuracy of 73.10%, specificity of 67%, and sensitivity of 74%.

Narin et al. [ 61 ] used three pre-trained techniques including ResNet50, InceptionV3, and InceptionResNetV2 for automatic diagnosis and detection of COVID-19. The case studies included four classes including normal, COVID-19, bacterial, and viral pneumonia patients. The authors demonstrated that ResNet50 gives the highest accuracy in three different datasets.
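The pre-trained networks in these studies are usually adapted by transfer learning: the ImageNet backbone is kept (often frozen), the classification head is replaced, and only the new layers or the last few blocks are trained on the chest images. A minimal Keras sketch under those assumptions (train_ds and val_ds are placeholder tf.data pipelines):

    import tensorflow as tf

    base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                          input_shape=(224, 224, 3), pooling="avg")
    base.trainable = False                      # freeze the ImageNet features

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(3, activation="softmax"),   # e.g. COVID-19 / pneumonia / normal
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=10)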

Maghdid et al. [ 62 ] proposed a CNN model with AlexNet for COVID-19 diagnosis. A dataset of 361 CT images and 170 X-ray images of COVID-19 disease collected from five different sources was used in the simulation. Quantitative results demonstrated that AlexNet achieves an accuracy of 98%, a sensitivity of 100%, and a specificity of 96% in X-ray images, while the modified CNN model achieves 94.10% of accuracy, 90% of sensitivity, and 100% of specificity in CT-images.

Wang et al. [ 63 ] employed eight deep learning (DL) models (fully convolutional network (FCN-8 s), UNet, VNet, 3D UNet++, dual-path network (DPN-92), Inceptionv3, ResNet50, and Attention ResNet50) for COVID-19 detection. The efficiency of the proposed models was evaluated using 1,136 CT images (723 with COVID-19 and 413 normal) collected from five hospitals. Simulation results demonstrated the superiority of 3D UNet++ compared to other CNN models.

In CT scan images, UNet++ was employed by Chen et al. [ 64 ] for COVID-19 detection. The performance of UNet++ was assessed based on a dataset of 106 CT scan images. Simulation results showed that UNet++ provides a per-patient accuracy of 95.24%, sensitivity of 100%, and specificity of 93.55%, and a per-image accuracy of 98.85%, sensitivity of 94.34%, and specificity of 99.16%.

Apostolopoulos et al. [ 65 ] proposed five deep CNN models (VGG19, MobileNetv2, Inception, Xception, and Inception ResNetv2) for COVID-19 detection cases. The proposed models were tested using two datasets of 1428 and 1442 images, respectively. In the first dataset (224 with COVID-19, 700 with bacterial pneumonia, and 504 normal), MobileNetv2 approach provided better results with a two-class problem accuracy, three-class problem accuracy, sensitivity, and specificity of 97.40%, 92.85%, 99.10%, and 97.09%, respectively. In the second dataset (224 with COVID-19, 714 with bacterial pneumonia, and 504 normal), MobileNetv2 approach also provided better performance by achieving a two-class problem accuracy, three-class problem accuracy, sensitivity, and specificity of 96.78%, 94.72%, 98.66%, and 96.46%, respectively.

Another deep CNN model was developed by Zhang et al. [ 66 ] which is composed of three components (a backbone network, a classification head, and an anomaly detection head). This technique was evaluated using 100 chest X-ray images of 70 patients taken from the Github repository. 1431 additional chest X-ray images of 1008 patients taken from the public Chest X-ray14 data were also used to facilitate deep learning. Simulation results showed that the proposed model is an effective diagnostic tool for low-cost and fast COVID-19 screening by achieving the accuracy of 96% for COVID-19 cases and 70.65% for non-COVID-19 cases.

Another interesting project was done by Ghoshal and Tucker [ 67 ], in which a Bayesian Convolutional Neural Network (BCNN) was used in conjunction with Dropweights for COVID-19 diagnosis and classification.

Toraman et al. [ 68 ] proposed a CNN model, called CAPSNET, for fast and accurate diagnostics of COVID-19 cases. CAPSNET model was evaluated using two datasets of 2100 and 13,150 cases, respectively. In the first dataset (1050 with COVID-19 and 1050 no-findings), CAPSNET provided better results by achieving an accuracy, precision, sensitivity, specificity, F1-score of 97.23%, 97.08%, 97.42%, 97.04%, and 97.24% respectively. In the second dataset (1050 with COVID-19, 1050 no-findings, and 1050 pneumonia), CAPSNET provided better performance by achieving an accuracy, precision, sensitivity, specificity, and F1-score of 84.22%, 84.61%, 84.22%, 91.79%, and 84.21% respectively.

Hammoudi et al. [ 69 ] investigated six deep CNN models (ResNet34, ResNet50, DenseNet169, VGG19, InceptionResNetV2, and RNN-LSTM) for COVID-19 screening and detection. A dataset of 5,863 children’s X-Ray images (Normal and Pneumonia) was exploited to evaluate the techniques proposed. Simulation results showed that DenseNet169 outperforms other deep CNN models by obtaining an average accuracy of 95.72%.

Ardakani et al. [ 70 ] proposed ten deep CNN models (AlexNet, VGG16, VGG19, SqueezeNet, GoogleNet, MobileNetV2, ResNet18, ResNet50, ResNet101, and Xception) for COVID-19 diagnosis. A dataset of 1020 CT images from 108 patients with COVID-19 and 86 patients with non-COVID pneumonia was used to benchmark their efficiency. Simulation results showed the high performance of ResNet101 compared to other deep CNN models, achieving an accuracy of 99.51%, sensitivity of 100%, AUC of 99.4%, and specificity of 99.02%. Xu et al. [ 71 ] proposed a hybrid deep learning model, called ResNet+, based on combining the traditional ResNet with a location-attention mechanism for COVID-19 diagnosis. The effectiveness of ResNet+ was evaluated using 618 Computed Tomography (CT) images (175 normal, 219 with COVID-19, 224 with Influenza-A viral pneumonia), and results demonstrated that ResNet+ provides an overall accuracy of 86.70%, sensitivity of 81.50%, precision of 80.80%, and F1-score of 81.10%. It was also suggested that the proposed ResNet+ is a promising supplementary diagnostic technique for clinical doctors.

Cifci [ 72 ] proposed two deep CNN models (AlexNet and InceptionV4) for diagnosis and prognosis analysis of COVID-19 cases. The effectiveness of the proposed models was evaluated using 5800 CT images divided into 80% training and 20% test sets. It was demonstrated that AlexNet outperforms InceptionV4 by achieving an overall accuracy of 94.74%, a sensitivity of 87.37%, and a specificity of 87.45%. Bai et al. [ 73 ] did a similar work by proposing an EfficientNet B4 CNN model with a fully connected neural network for the detection and classification of COVID-19 cases. CT scan images of 521 patients were used in the simulation.

Loey et al. [ 74 ] proposed three deep CNN approaches (AlexNet, GoogleNet, and ResNet18) with a GAN model for COVID-19 detection. The proposed approaches were evaluated using three scenarios: i) four classes (normal, viral pneumonia, bacterial pneumonia, and COVID-19 images); ii) three classes (COVID-19, normal, and pneumonia); and iii) two classes (COVID-19 and normal). Experimental results demonstrated that GoogleNet gives better performance in the first and third scenarios by achieving accuracies of 80.60% and 100%, respectively. AlexNet provides better results in the second scenario by achieving an accuracy of 85.20%.

Singh et al. [ 75 ] proposed a novel deep learning approach based on convolutional neural networks with multi-objective differential evolution (MODE) for the classification of COVID-19 patients. In addition, Mukherjee et al. [ 76 ] proposed a shallow light-weight CNN model for automatic detection of COVID-19 cases from Chest X-rays in a similar manner.

Ozkaya et al. [ 77 ] proposed an effective approach based on the combination of a CNN model with a ranking method and the SVM technique for COVID-19 detection. The case studies included two datasets generated from 150 CT images; each dataset contains 3000 normal images and 3000 with COVID-19. Simulation results showed the high performance and robustness of the proposed approach compared to VGG16, GoogleNet, and ResNet50 models in terms of accuracy, sensitivity, specificity, F1-score, and Matthews Correlation Coefficient (MCC) metrics.
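Hybrids of this kind generally use the CNN only as a feature extractor and hand the resulting vectors to an SVM. A hedged sketch (the VGG16 backbone and the random placeholder images are assumptions, not the cited architecture):

    import numpy as np
    import tensorflow as tf
    from sklearn.svm import SVC

    extractor = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                            input_shape=(224, 224, 3), pooling="avg")

    def cnn_features(images):
        # images: float array (n, 224, 224, 3), already preprocessed
        return extractor.predict(images, verbose=0)

    X_train = cnn_features(np.random.rand(8, 224, 224, 3).astype("float32"))  # placeholder images
    y_train = np.array([0, 1, 0, 1, 0, 1, 0, 1])                              # placeholder labels

    svm = SVC(kernel="linear").fit(X_train, y_train)
    print(svm.predict(cnn_features(np.random.rand(2, 224, 224, 3).astype("float32"))))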

Toğaçar et al. [ 78 ] proposed two CNN models (MobileNetV2, SqueezeNet) combined with SVM for COVID-19 detection. The efficiency of the proposed models was validated using a dataset of X-ray images divided into three classes: normal, with COVID-19, and with pneumonia. The accuracy obtained in their work is of 99.27%.

Pathak et al. [ 79 ] proposed a ResNet50 deep transfer learning technique for the detection and classification of COVID-19 infected patients. The effectiveness of ResNet50 was evaluated using 852 CT images collected from various datasets (413 COVID-19 (+) and 439 normal or pneumonia). Simulation results showed that ResNet50 model gives efficient performance by achieving a specificity, precision, sensitivity, accuracy of 94.78%, 95.19%, 91.48%, and 93.02%, respectively.

Elasnaoui et al. [ 80 ] proposed nine deep CNN models, including a baseline CNN, VGG16, VGG19, DenseNet201, InceptionResNetV2, InceptionV3, Xception, Resnet50, and MobileNetV2, for automatic classification of pneumonia images. Chest X-ray & CT datasets containing 5856 images (4273 pneumonia and 1583 normal) were used to validate the proposed models, and results demonstrated that Resnet50, MobileNetV2, and InceptionResnetV2 provide high performance with an overall accuracy of more than 96%, against an accuracy of around 84% for the other CNN models. Another similar work was done by Zhang et al. [ 81 ], in which a COVID-19 diagnosis system based on a 3D ResNet18 deep learning technique with five deep learning-based segmentation models (UNet, DRUNET, FCN, SegNet & DeepLabv3) was proposed for diagnosis and prognosis prediction of COVID-19 cases.

Rajaraman and Antani [ 82 ] used five deep CNN models (VGG16, InceptionV3, Xception, DenseNet201, and NasNetmobile) for COVID-19 screening. Six datasets of X-ray images, including Pediatric CXR, RSNA CXR, CheXpert CXR, NIH CXR-14, Twitter COVID-19 CXR, and Montreal COVID-19 CXR, were used to validate the effectiveness of the proposed models. The accuracy obtained was 99.26%.

Tsiknakis et al. [ 83 ] proposed a modified deep CNN model (Modified InceptionV3) for COVID-19 screening on chest X-rays. The Modified InceptionV3 was evaluated using two chest X-ray datasets, the first dataset was collected from [ 84 ], the second one was collected from the QUIBIM imagingcovid19 platform database and various public repositories. Experimental results showed that the modified InceptionV3 model gives an average accuracy, AUC, sensitivity, and specificity of 76%, 93%, 93%, and 91.80%, respectively.

Ahuja et al. [ 85 ] presented pre-trained transfer learning models (ResNet18, ResNet50, ResNet101, and SqueezeNet) for automatic detection of COVID-19 cases. Another similar work was done by Oh et al. [ 86 ], in which a patch-based convolutional neural network was proposed based on ResNet18.

Elasnaoui and Chawki [ 87 ] used seven pre-trained deep learning models (VGG16, VGG19, DenseNet201, InceptionResNetV2, InceptionV3, Resnet50, and MobileNetV2) for automated detection and diagnosis of COVID-19 disease. The effectiveness of the proposed models was assessed using chest X-ray & CT dataset of 6087 images. Simulation results showed the superiority of InceptionResNetV2 compared to other deep CNN models by achieving an accuracy, precision, sensitivity, specificity, and F1-score of 92.60%, 93.85%, 82.80%, 97.37%, and 87.98%, respectively.

Chowdhury et al. [ 88 ] introduced eight deep CNN models (DenseNet201, ResNet18, MobileNetv2, InceptionV3, VGG19, ResNet101, CheXNet, and SqueezeNet) for COVID-19 detection. A dataset of 3487 X-ray images (423 with COVID-19, 1485 with viral pneumonia, and 1579 normal), with and without image augmentation, was used in the validation of the proposed models. Simulation results showed that CheXNet gives better results when image augmentation was not applied, with an accuracy, precision, sensitivity, specificity, and F1-score of 97.74%, 96.61%, 96.61%, 98.31%, and 96.61%, respectively. However, when image augmentation was used, DenseNet201 outperforms the other deep CNN models by achieving an accuracy, precision, sensitivity, specificity, and F1-score of 97.94%, 97.95%, 97.94%, 98.80%, and 97.94%, respectively.
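Image augmentation in this context usually means random rotations, shifts, zooms, and flips applied on the fly during training, which effectively enlarges the small COVID-19 class. A small Keras sketch of such a pipeline (all parameter values are assumptions):

    import tensorflow as tf

    augment = tf.keras.Sequential([
        tf.keras.layers.RandomRotation(0.05),          # roughly +/- 18 degrees
        tf.keras.layers.RandomTranslation(0.1, 0.1),
        tf.keras.layers.RandomZoom(0.1),
        tf.keras.layers.RandomFlip("horizontal"),
    ])

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = augment(inputs)                                 # active only during training
    x = tf.keras.applications.DenseNet201(weights="imagenet", include_top=False,
                                          pooling="avg")(x)
    outputs = tf.keras.layers.Dense(3, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)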

Apostolopoulos et al. [ 89 ] proposed a deep CNN model (MobileNetv2) for COVID-19 detection and classification. The efficiency of MobileNetv2 was assessed using a large-scale dataset of 3905 X-ray images and results showed its excellent performance by achieving an accuracy, sensitivity, specificity of 99.18%, 97.36%, and 99.42%, respectively in the detection of COVID-19.

Rahimzadeh and Attar [ 90 ] proposed a modified deep CNN model based on the combination of Xception and ResNet50V2 for detecting COVID-19 from chest X-ray images. The proposed model was tested using 11,302 chest X-ray images (31 with COVID-19, 4420 with pneumonia, and 6851 normal cases). Experimental results showed that the combined model gives an average accuracy, precision, sensitivity, and specificity of 91.4%, 72.8%, 87.3%, and 94.2%, respectively. In a similar work, Abbas et al. [ 91 ] adapted a Convolutional Neural Network model, called Decompose Transfer Compose (DeTraC). The effectiveness of the DeTraC model was validated using a dataset of X-ray images collected from several hospitals and institutions around the world. As a result, 95.12% accuracy, 97.91% sensitivity, and 91.87% specificity were obtained.

Afshar et al. [ 92 ] developed a deep CNN model (COVID-CAPS) based on Capsule Networks for COVID-19 identification and diagnosis. The effectiveness of COVID-CAPS was tested using two publicly available chest X-ray datasets [ 84 , 93 ]. As a result, 98.30% accuracy, 80% sensitivity, and 98.60% specificity were obtained.

Brunese et al. [ 94 ] adopted a deep CNN approach (VGG-16) for automatic and faster COVID-19 detection from chest X-ray images. The robustness of VGG-16 was evaluated using 6523 chest X-ray images (2753 with pneumonia disease, 250 with COVID-19, while 3520 healthy) and results showed that VGG-16 achieves an accuracy of 97% for the COVID-19 detection and diagnosis.

Jin et al. [ 95 ] proposed a deep learning-based AI system for diagnosis of COVID-19 in CT images. 10,250 CT scan images (COVID-19, viral pneumonia, influenza-A/B, normal) taken from three centers in China and three publicly available databases were used in the simulation and results showed that the proposed model achieves an AUC of 97.17%, a sensitivity of 90.19%, and a specificity of 95.76%.

Truncated Inception Net was proposed by Das et al. [ 96 ] as a Deep CNN model for COVID-19 cases detection. Six different datasets were used in the simulation considering healthy, with COVID-19, with Pneumonia, and with Tuberculosis cases. It was demonstrated that Truncated Inception Net provides accuracy, precision, sensitivity, specificity, and F1-score of 98.77%, 99%, 95%, 99%, and 97%, respectively.

Asif et al. [ 97 ] proposed a deep CNN model (Inception V3) with transfer learning for the automatic detection of COVID-19 cases. A dataset consisting of 3550 chest X-ray images (864 with COVID-19, 1345 with viral pneumonia, and 1341 normal) was used to test Inception V3. Simulation results proved the efficiency of Inception V3 by achieving an accuracy of 98%.

Punn and Agrawal [ 98 ] introduced five fine-tuned deep learning approaches (baseline ResNet, Inceptionv3, InceptionResNetv2, DenseNet169, and NASNetLarge) for automated diagnosis and classification of COVID-19. The performance of the proposed approaches was validated using three datasets of X-ray and CT images collected from Radiological Society of North America (RSNA), [ 99 ] U.S. national library of medicine (USNLM), [ 100 ] and COVID-19 image data collection. [ 84 ] Simulation results showed that NASNetLarge outperforms other CNN models by achieving 98% of accuracy, 88% of precision, 90% of sensitivity, 95% of specificity, and 89% of F1-score.

Shelke et al. [ 101 ] proposed three CNN models (VGG16, DenseNet161, and ResNet18) for COVID-19 diagnosis and analysis. The proposed models were tested using two datasets of 1191 and 1000 X-ray images, respectively. In the first dataset (303 with COVID-19, 500 with bacterial pneumonia, and 388 normal), VGG16 approach provided better results with an accuracy of 95.9%. In the second dataset (500 with COVID-19 and 500 normal), DenseNet161 approach provided better performance by achieving an accuracy of 98.9%.

Rajaraman et al. [ 102 ] proposed eight deep CNN models (VGG16, VGG19, InceptionV3, Xception, InceptionResNetV2, MobileNetV2, DenseNet201, NasNetmobile) for COVID-19 screening. Four datasets of x-ray images including Pediatric CXR, RSNA CXR, Twitter COVID-19 CXR, and Montreal COVID-19 CXR were used to validate the effectiveness of the proposed models. Experimental results demonstrated that the weighted average of the best-performing pruned models enhances performance by providing an accuracy, precision, sensitivity, AUC, F1-score of 99.01%, 99.01%, 99.01%, 99.72%, and 99.01%, respectively.

Another similar work was done by Luz et al. [ 103 ], which can be considered an extension of EfficientNet for COVID-19 detection and diagnosis in chest X-ray images. It was compared with MobileNet, MobileNetV2, ResNet50, VGG16, and VGG19. Simulation results demonstrated the effectiveness of EfficientNet compared to other deep CNN models by achieving an overall accuracy of 93.9%, sensitivity of 96.8%, and a positive prediction rate of 100%.

Jaiswal et al. [ 104 ] employed DenseNet201-based transfer learning for COVID-19 detection and diagnosis. The performance of DenseNet201 was validated using 2492 chest CT-scan images (1262 with COVID-19 and 1230 healthy), taking into account precision, F1-measure, specificity, sensitivity, and accuracy metrics. Quantitative results showed the effectiveness of DenseNet201 compared to VGG16, Resnet152V2, and InceptionResNet by providing a precision, F1-measure, specificity, sensitivity, and accuracy of 96.29%, 96.29%, 96.29%, 96.21%, and 96.25%, respectively.

Sharma [ 105 ] employed a ResNet50 CNN-based approach for COVID-19 detection. 2200 CT images (800 with COVID-19, 600 viral pneumonia, and 800 normal healthy) collected from various hospitals in Italy, China, Moscow, and India were used in the simulation and results showed that ResNet50 outperforms ResNet+ by giving a specificity, sensitivity, accuracy of 90.29%, 92.1%, and 91.0%, respectively. Pu et al. [ 106 ] conducted a similar work.

Alotaibi [ 107 ] used four pre-trained CNN models (RESNET50, VGG19, DENSENET121, and INCEPTIONV3) for the detection of COVID-19 cases. A dataset of X-ray images (219 with COVID-19, 1341 Normal, and 1345 with Viral Pneumonia) was used in the experimentation and results demonstrated the better performance of DENSENET121 compared to RESNET50, VGG19, and INCEPTIONV3 by achieving an accuracy, precision, sensitivity, and F1-score of 98.71%, 98%, 98%, and 97.66%, respectively.

Goyal and Arora [ 108 ] proposed three CNN models (VGG16, VGG19, and ResNet50) for COVID-19 detection. This technique was evaluated using 748 chest X-ray images (250 with COVID-19, 300 normal, and 198 with bacterial pneumonia), and results showed that VGG19 outperforms VGG16 and ResNet50 by achieving accuracies of 98.79% and 98.12% in the training and testing cases, respectively. A similar work was done by Das et al. [ 109 ], in which an extreme version of the Inception (Xception) model was used for the automatic detection of COVID-19 infection cases in X-ray images.

Rahaman et al. [ 110 ] used 15 different pre-trained CNN models for COVID-19 cases identification. 860 chest X-Ray images (260 with COVID-19, 300 healthy, and 300 pneumonia) were employed to investigate the effectiveness of the proposed models. Simulation results showed that the VGG19 model outperforms other deep CNN models by obtaining an accuracy of 89.3%, precision of 90%, sensitivity of 89%, and F1-score of 90%.

Altan and Karasu [ 111 ] proposed a hybrid approach based on a CNN model (EfficientNet-B0), two-dimensional (2D) curvelet transformation, and the chaotic salp swarm algorithm (CSSA) for COVID-19 detection. 2905 real raw chest X-ray images (219 with COVID-19, 1345 with viral pneumonia, and 1341 normal) were used. Another similar work proposed a confidence-aware anomaly detection (CAAD) model based on EfficientNet-B0.

Ni et al. [ 112 ] proposed a CNN model, called MVPNet, for automatic detection of COVID-19 cases. 19,291 pulmonary CT scans images (3854 with COVID-19, 6871 with bacterial pneumonia, and 8566 healthy) were employed to validate the performance of the MVPNet model. Experimental results demonstrated that MVPNet achieves a sensitivity of 100%, specificity of 65%, accuracy of 98%, and F1-score of 97%.

Nguyen et al. [ 113 ] employed two deep CNN models (EfficientNet and MixNet) for the detection of COVID-19 infected patients from chest X-ray (CXR) images. The effectiveness of the proposed approach was validated using two real datasets consisting of: i) 13,511 training images and 1,489 testing images; ii) 14,324 training images and 3,581 testing images. Simulation results demonstrated that the proposed approach outperforms some well-established baselines by yielding an accuracy larger than 95%.

Islam et al. [ 114 ] proposed four CNN models (VGG19, DenseNet121, InceptionV3, and InceptionResNetV2) combined with a recurrent neural network (RNN) for COVID-19 diagnosis. A similar work was done by Mei et al. [ 115 ], who proposed a combination of SVM, random forest, MLP, and CNN.

Khan and Aslam [ 116 ] presented four CNN models (DenseNet121, ResNet50, VGG16, and VGG19) for COVID-19 diagnosis. The superiority of the proposed models was evaluated using a dataset of 1057 X-ray images including 862 normal and 195 with COVID-19. Experimental results demonstrated that VGG-19 model achieves better performance than DenseNet121, ResNet50, and VGG16 by achieving an accuracy, sensitivity, specificity, F1-score of 99.33%, 100%, 98.77%, and 99.27%, respectively.

Perumal et al. [ 117 ] used deep CNN models (VGG16, Resnet50, and InceptionV3) and Haralick features for the detection of COVID-19 cases. A dataset of X-ray and CT images collected from various resources available in Github open repository, RSNA, and Google images was used in the simulation and results showed that the proposed models outperform other existing models with an average accuracy of 93%, precision of 91%, and sensitivity of 90%.

Kumar et al. [ 118 ] used various deep learning models (VGG, DenseNet, AlexNet, MobileNet, ResNet, and Capsule Network) with blockchain and federated-learning technology for COVID-19 detection from CT images. These techniques were evaluated using a dataset of 34,006 CT scan images taken from the GitHub repository ( https://github.com/abdkhanstd/COVID-19 ). Simulation results revealed that the Capsule Network model outperforms the other deep learning models by achieving an accuracy of 0.83, sensitivity of 0.967, and precision of 0.83.

Zebin et al. [ 119 ] proposed three deep CNN models (modified VGG16, ResNet50, and EfficientNetB0) for COVID-19 detection. A dataset of X-ray images (normal, non-COVID-19 pneumonia, and COVID-19) taken from the COVID-19 Image Data Collection was used to evaluate them. Overall accuracies of 90%, 94.30%, and 96.80% were obtained for VGG16, ResNet50, and EfficientNetB0, respectively.

Abraham and Nair [ 120 ] proposed a combined approach based on the combination of five multi-CNN models (Squeezenet, Darknet-53, MobilenetV2, Xception, and Shufflenet) for the automated detection of COVID-19 cases from X-ray images.

Ismael and Şengür [ 121 ] proposed three deep learning techniques for COVID-19 detection from chest X-ray images. The first technique was based on five pre-trained deep CNN models (ResNet18, ResNet50, ResNet101, VGG16, and VGG19), the second was a CNN model with end-to-end training, and the third used pre-trained CNN models and SVM classifiers with various kernel functions. A dataset of 380 chest X-ray images (180 with COVID-19 and 200 normal (healthy)) was used in the validation experiments, and results showed the efficiency of the CNN techniques compared to various local texture descriptors.

Goel et al. [ 122 ] proposed an optimized convolutional neural network model, called OptCoNet, for COVID-19 diagnosis. A dataset of 2700 X-ray images (900 with COVID-19, 900 normal, and 900 with pneumonia) was employed to assess the performance of OptCoNet, and results showed its effectiveness by providing accuracy, precision, sensitivity, specificity, and F1-score values of 97.78%, 92.88%, 97.75%, 96.25%, and 95.25%, respectively.

Bahel and Pillali [ 123 ] proposed four deep CNN models (InceptionV4, VGG19, ResNetV2-152, and DenseNet) for detecting COVID-19 from chest X-ray images. These techniques were evaluated based on a dataset of 300 chest X-ray images of infected and uninfected patients. A heat map filter was applied to the images to help the CNN models perform better. Simulation results showed that DenseNet outperforms the other deep CNN models, namely InceptionV4, VGG19, and ResNetV2-152.

Sitaula and Hossain [ 124 ] proposed a novel deep learning model based on VGG-16 with the attention module for COVID-19 detection and classification. Authors conducted extensive experiments based on three X-ray image datasets D1 (Covid-19, No findings, and Pneumonia), D2 (Covid, Normal, Pneumonia Bacteria, Pneumonia Viral), and D3 (Covid, Normal, No findings, Pneumonia Bacteria, and Pneumonia Viral) to test this technique. Experimental results revealed the stable and promising performance compared to the state-of-the-art models by obtaining an accuracy of 79.58%, 85.43%, and 87.49% in D1, D2, and D3, respectively.

Jain et al. [ 125 ] proposed three CNN models (Inception V3, Xception, and ResNeXt) for COVID-19 detection and analysis. 6432 chest x-ray images divided into two classes including training set (5467) and validation set (965) were used to analyze the approaches performance. Simulation results showed that Xception model gives the highest accuracy with 97.97% as compared to other existing models.

Yasar and Ceylan [ 126 ] proposed a novel model based on CNN model with local binary pattern and dual-tree complex wavelet transform for COVID-19 detection on chest X-ray images. This approach was validated using two datasets of X-ray images: i) dataset of 230 images (150 with Covid-19 and 80 normal) and ii) dataset of 476 images (150 with Covid-19 and 326 normal). Experimental results showed that the proposed model gives good performance by achieving an accuracy, sensitivity, specificity, F1-score, and AUC of 98.43%, 99.47%, 98%, 98.81%, and 99.90%, respectively for the first dataset. For the second dataset, the proposed model achieves an accuracy, sensitivity, specificity, F1-score and, AUC of 98.91%, 99.20%, 99.39%, 98.28%, and 99.91%, respectively.

Khalifa et al. [ 127 ] proposed a new approach based on three deep learning models (Resnet50, Shufflenet, and Mobilenet) and GAN for detecting COVID-19 in CT chest Medical Images. In a similar work, Mukherjee et al. [ 128 ] proposed a lightweight (9 layered) CNN-tailored deep neural network model. It was demonstrated that the proposed model outperforms InceptionV3.

Hira et al. [ 142 ] used nine CNN models (AlexNet, GoogleNet, ResNet50, SeResNet50, DenseNet121, InceptionV4, InceptionResNetV2, ResNeXt50, and SeResNeXt50) for the detection of COVID–19 disease. The efficiency of the proposed models was validated using four scenarios: (i) two classes (224 with COVID–19 and 504 Normal); (ii) three classes (224 with COVID–19, 504 Normal, and 700 with bacterial Pneumonia); (iii) three classes (224 with COVID-19, 504 Normal, and 714 with bacterial and viral Pneumonia) and (iv) four classes (1346 normal, 1345 viral pneumonia, 2358 bacteria pneumonia, and with 183 COVID-19). Experimental results demonstrated that SeResNeXt50 outperforms other methods in terms of accuracy, precision, sensitivity, specificity, and F1-score.

Jelodar et al. [ 147 ] proposed a novel model based on LSTM with natural language process (NLP) for COVID-19 cases classification. The effectiveness of the proposed model was validated using a dataset of 563,079 COVID-19-related comments collected from the Kaggle website (between January 20, 2020 and March 19, 2020) and results showed its efficiency and robustness on this problem area to guide related decision-making.

Chimmula et al. [ 148 ] used LSTM model for forecasting of COVID-19 cases in Canada. The performance of LSTM was validated using data collected from Johns Hopkins University and Canadian Health Authority with several confirmed cases and results showed that the LSTM model achieves better performance when compared with other forecasting models.
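LSTM forecasting of case counts, as in this and the following studies, typically slides a fixed-length window over the series and predicts the next value. A minimal Keras sketch under that assumption (synthetic data, illustrative hyperparameters):

    import numpy as np
    import tensorflow as tf

    series = np.cumsum(np.random.default_rng(1).poisson(120, 200)).astype("float32")  # synthetic cumulative cases
    window = 7
    X = np.array([series[i:i + window] for i in range(len(series) - window)])[..., None]
    y = series[window:]

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(window, 1)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, verbose=0)
    print("next-day forecast:", model.predict(series[-window:][None, :, None], verbose=0)[0, 0])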

Jiang et al. [ 149 ] developed a novel model, called BiGRU-AT, based on bidirectional GRU with an attention mechanism for COVID-19 detection and diagnosis. The performance of BiGRU-AT was assessed using breathing and thermal data extracted from people wearing masks. Simulation results showed that BiGRU-AT achieves an accuracy, sensitivity, specificity, and F1-score of 83.69%, 90.23%, 76.31%, and 84.61%, respectively.

Mohammed et al. [ 150 ] proposed LSTM with ResNext+ and a slice attention module for COVID-19 detection. A total of 302 CT volumes (20 with confirmed COVID-19 and 282 normal) was used for testing and training the proposed model. According to the results, the proposed model provides an accuracy of 77.60%, precision of 81.90%, sensitivity of 85.50%, specificity of 79.30%, and F1-score of 81.40%.

Islam et al. [ 151 ] introduced a novel model based on the hybridization of LSTM with CNN for automatic diagnosis of COVID-19 cases. The effectiveness of the hybrid model was validated using a dataset of 4575 X-ray images (1525 images with COVID-19, 1525 with viral pneumonia, and 1525 normal). Simulation results showed that the hybrid model outperforms other existing models by achieving an accuracy, sensitivity, specificity, and F1-score of 99.20%, 99.30%, 99.20%, and 98.90%, respectively.

Aslan et al. [ 152 ] proposed a hybrid approach based on the hybridization of Bidirectional LSTM (BiLSTM) with CNN Transfer Learning (mAlexNet) for COVID-19 detection. A dataset of 2905 X-ray images (219 with COVID-19, 1345 with viral pneumonia, and 1341 normal) was used in the simulation, and results showed that the hybrid approach outperforms the mAlexNet model by giving an accuracy, precision, sensitivity, specificity, F1-score, and AUC of 98.70%, 98.77%, 98.76%, 99.33%, 98.76%, and 99%, respectively (Tables 2, 3, 4, 5, 6).

Specialized CNN Approaches for COVID–19

Song et al. [ 155 ] developed a deep-learning model, called Details Relation Extraction neural Network (DRE-Net), for accurate identification of COVID-19-infected patients. 275 chest scan images (86 normal, 88 with COVID-19, and 101 with bacteria pneumonia) were used to validate the performance of DRE-Net. Simulation results showed that DRE-Net can identify COVID-19 infected patients with an average accuracy of 94%, AUC of 99%, and sensitivity of 93%.

Li et al. [ 156 ] proposed a deep learning method, called COVNet, for COVID-19 diagnosis from CT scan images. A dataset of 4356 chest CT images from 3222 patients collected from six hospitals between August 2016 and February 2020 was used in the simulation and results showed that the proposed COVNet achieves an AUC, sensitivity, and specificity of 96%, 90%, and 96%, respectively. Zheng et al. conducted a similar study [ 157 ] by proposing a 3D deep CNN model, called DeCoVNet, for detecting COVID-19 from 3D CT images.

Ucar and Korkmaz [ 158 ] proposed a novel and efficient Deep Bayes-SqueezeNet-based system (COVIDiagnosis-Net) for COVID-19 Diagnosis. A dataset of 5949 chest X-ray images including 1583 normal, 4290 pneumonia, and 76 COVID-19 infection cases was employed in the simulation and results showed that COVIDiagnosis-Net outperforms existing network models by achieving 98.26% of accuracy, 99.13% of specificity, and 98.25% of F1-score.

DarkCovidNet was proposed by Ozturk et al. [ 159 ] for automated detection of COVID-19. The efficiency of DarkCovidNet was evaluated using two datasets: i) A COVID-19 X-ray image database developed by Cohen JP [ 84 ] and ii) ChestX-ray8 database provided by Wang et al. [ 160 ]. Simulation results showed that DarkCovidNet gives accurate diagnostics of 98.08% and 87.02% for binary classification (COVID vs. No-Findings) and multi-class classification (COVID vs. No-Findings vs. Pneumonia), respectively.

Wang and Wong [ 161 ] proposed a deep learning model, called Covid-Net, for detecting COVID-19 Cases from Chest X-Ray Images. Quantitative and qualitative results showed the efficiency and superiority of the proposed Covid-Net model compared to VGG-19 and ResNet-50 techniques.

In [ 162 ], Born et al. proposed POCOVID-Net for the automatic detection of COVID-19 cases. A lung ultrasound (POCUS) dataset consisting of 1103 images (654 COVID-19, 277 bacterial pneumonia, and 172 normal) sampled from 64 videos was used for evaluating the effectiveness of POCOVID-Net model. According to the results, POCOVID-Net model provides good performance with 0.89 accuracy, 0.88 precision, 0.96 sensitivity, 0.79 specificity, and 0.92 F1-score.

COVID-19Net was proposed by Wang et al. [ 163 ] for the diagnostic and prognostic analysis of COVID-19 cases in CT images. A dataset of chest CT images collected from six cities or provinces including Wuhan city in China was used for the simulation and results showed the good performance of COVID-19Net by achieving an AUC of 87%, an accuracy of 78.32%, a sensitivity of 80.39%, F1-score of 77%, and a specificity of 76.61%.

Khan et al. [ 164 ] proposed a new model (CoroNet) for COVID-19 detection and diagnosis. CoroNet was validated using three scenarios: i) 4-class CoroNet (normal, viral pneumonia, bacterial pneumonia, and COVID-19 images); ii) 3-class CoroNet (COVID-19, normal, and pneumonia); and iii) binary 2-class CoroNet. Experimental results demonstrated the superiority of CoroNet compared to some studies in the literature by achieving an accuracy of 89.5%, 94.59%, and 99% for the 4-class, 3-class, and binary 2-class scenarios, respectively.

Mahmud et al. [ 165 ] proposed a novel multi-dilation deep CNN model (CovXNet) based on depthwise dilated convolutions for automatic COVID-19 detection. Three datasets of 5856, 610, and 610 X-ray images were used for evaluating the effectiveness of CovXNet. Experimental results demonstrated the competitive performance of CovXNet compared to other approaches in the literature, providing an accuracy of 98.1%, 95.1%, and 91.70% for the dataset of 5856 images and the two datasets of 610 images, respectively.

Siddhartha and Santra [ 166 ] proposed a novel model, called COVIDLite, based on a depth-wise separable deep neural network (DSCNN) with white balance and CLAHE for the detection of COVID-19 cases. Two datasets of X-ray images, i) 1458 images (429 COVID-19, 495 viral pneumonia, and 534 normal) and ii) 365 images (107 COVID-19, 124 viral pneumonia, and 134 normal), were used for testing the effectiveness of COVIDLite. Simulation results revealed that COVIDLite performs well in both the 2-class and 3-class scenarios by achieving accuracies of 99.58% and 96.43%, respectively.

Ahmed et al. [ 167 ] proposed a novel CNN model, called ReCoNet, for COVID-19 detection. The effectiveness of ReCoNet was evaluated based on the COVIDx [ 161 ] and CheXpert [ 168 ] datasets containing 15,134 and 224,316 CXR images, respectively. Experimental results demonstrated that ReCoNet outperforms COVID-Net and other state-of-the-art techniques by yielding an accuracy, sensitivity, and specificity of 97.48%, 96.39%, and 97.53%, respectively.

Haghanifar et al. [ 169 ] developed a novel approach, called COVID-CXNET, based on the well-known CheXNet model for automatic detection of COVID-19 cases. The effectiveness of COVID-CXNET was tested using a dataset of 3,628 chest X-ray images (3,200 normal and 428 with COVID-19) divided into a training set (80%) and a validation set (20%). Experimental results showed that COVID-CXNET gives an accuracy of 99.04% and F1-score of 96%.

Turkoglu [ 170 ] proposed a COVIDetectioNet model with AlexNet and SVM for COVID-19 diagnosis and classification. A dataset of 6092 X-ray images (1583 Normal, 219 with COVID19, and 4290 with Pneumonia) collected from the Github and Kaggle databases was used in the experimentation. Simulation results demonstrated the better performance of COVIDetectioNet compared to other deep learning approaches by achieving an accuracy of 99.18%.

Tammina [ 171 ] proposed a novel deep learning approach, called CovidSORT, for COVID-19 detection. 5910 chest X-ray images collected from retrospective cohorts of pediatric patients at the Guangzhou Women and Children's Medical Center, Guangzhou, China, were used to validate CovidSORT's performance. Simulation results demonstrated that the CovidSORT model provides an accuracy of 96.83%, precision of 98.75%, sensitivity of 96.57%, and F1-score of 97.65%.

Al-Bawi et al. [ 172 ] developed an efficient model based on VGG with the convolutional COVID block (CCBlock) for the automatic diagnosis of COVID-19. To evaluate it, 1,828 X-ray images were used, including 310 COVID-19 cases, 864 with pneumonia, and 654 normal images. According to the results, the proposed model gives the highest diagnosis performance by achieving an accuracy of 98.52% and 95.34% for the two- and three-class scenarios, respectively.

Jamshidi et al. [ 181 ] used Generative Adversarial Network (GAN), Extreme Learning Machine (ELM), RNN, and LSTM for COVID–19 diagnosis and treatment. Sedik et al. [ 182 ] proposed a combined model based on GAN with CNN and ConvLSTM for COVID–19 infection detection. Two datasets of X-ray and CT images were used in the simulation and results showed the effectiveness and performance of the combined model by achieving 99% of accuracy, 97.70% of precision, 100% of sensitivity, 97.80% of specificity, and 99% of F1-score.
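In these works the GAN is used chiefly to synthesize extra chest images that are then mixed into the classifier's training set. A compact sketch of the two-network setup (layer sizes and the 28x28 image resolution are illustrative, far smaller than a practical model):

    import tensorflow as tf

    latent_dim = 64

    generator = tf.keras.Sequential([
        tf.keras.layers.Dense(7 * 7 * 64, activation="relu", input_shape=(latent_dim,)),
        tf.keras.layers.Reshape((7, 7, 64)),
        tf.keras.layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="sigmoid"),
    ])

    discriminator = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 4, strides=2, padding="same", activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1, activation="sigmoid"),    # real vs. synthetic
    ])
    # Standard adversarial training alternates discriminator and generator updates;
    # the generated images are then added to the CNN classifier's training data.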

Other Deep Learning Approaches

Farid et al. [ 184 ] proposed a stacked hybrid model, called the Composite Hybrid Feature Selection Model (CHFS), based on the hybridization of CNN and machine learning approaches for early diagnosis of COVID-19. The performance of CHFS was evaluated based on a dataset containing 51 CT images divided into training and testing sets. Simulation results showed that CHFS achieves an F1-score, precision, sensitivity, and accuracy of 96.10%, 96.10%, 96.10%, and 96.07%, respectively.

Hwang et al. [ 185 ] implemented a Deep Learning-Based Computer-Aided Detection (CAD) System for the identification of COVID-19 infected patients. CAD system was trained based on chest X-ray and CT images and results showed that CAD system achieves 68.80% of sensitivity, 66.70% of specificity with chest X-ray images and 81.5% of sensitivity, 72.3% of specificity with CT images.

Amyar et al. [ 186 ] proposed a multi-task deep learning approach for COVID-19 detection and classification from CT images. A dataset of images collected from 1369 patients (449 with COVID-19, 425 normal, 98 with lung cancer, and 397 with different kinds of pathology) was used to evaluate the performance of the proposed approach. Results showed that the proposed approach achieves an AUC of 0.97, an accuracy of 94.67%, a sensitivity of 0.96, and a specificity of 0.92.

For COVID-19 pneumonia diagnosis, Ko et al. [ 187 ] proposed fast-track COVID-19 classification network (FCONet), which uses as backbone one of the pre-trained deep learning models (VGG16, ResNet50, Inceptionv3, or Xception). A set of 3993 chest CT images divided into training and test classes were used to evaluate the performance of the proposed FCONet. Experimental results demonstrated that FCONet with ResNet50 gives excellent diagnostic performance by achieving a sensitivity of 99.58%, specificity 100%, accuracy 99.87%, and AUC of 100%.

Basu and Mitra [ 188 ] proposed a domain extension transfer learning (DETL) with three pre-trained deep CNN models (AlexNet, VGGNet, and ResNet) for COVID-19 screening. 1207 X-ray images (350 normal, 322 with pneumonia, 305 with COVID-19, and 300 other diseases) were employed to validate the proposed model. Experimental results showed that DETL with VGGNet gives a better accuracy of 90.13%.

Elghamrawy [ 189 ] developed a new approach (DLBD-COV) based on H2O's Deep-Learning-inspired model with Big Data analytics for COVID-19 detection. The efficiency of DLBD-COV was validated based on CT images collected from [ 84 ] and X-ray images collected from [ 190 ], taking into account metrics such as accuracy, precision, sensitivity, and computational time. Simulation results showed that DLBD-COV provides a superior accuracy compared to other CNN models such as DeConNet and ResNet+.

Sharma et al. [ 191 ] proposed a deep learning model for rapid identification and screening of COVID-19 patients. The efficiency of the proposed model was validated using chest X-ray images of adult COVID-19 patients (COVID-19, non-COVID-19, pneumonia, and tuberculosis images), and results showed its efficiency compared to previously published methods.

Hammam et al. [ 192 ] proposed a stacked ensemble deep learning model for COVID-19 vision diagnosis. The efficiency of the proposed model was validated using a dataset of 500 X-ray images divided into three classes including a training set (80%), validation set (10%), and testing set (10%). Simulation results showed the superior performance of the proposed model compared to any single model by achieving 98.60% test accuracy. A similar work was done by Mohammed et al. [ 193 ], in which a Corner-based Weber Local Descriptor (CWLD) was proposed for the diagnosis of COVID-19 from chest X-ray images.

Li et al. [ 194 ] proposed a stacked auto-encoder detector model for the diagnosis of COVID-19 Cases on CT scan images. Authors used in their experimentation a dataset of 470 CT images (275 with COVID-19 and 195 normal) collected from UC San Diego. According to the results, the proposed model performs well and achieves an average accuracy of 94.70%, precision of 96.54%, sensitivity of 94.10%, and F1-score of 94.80%. Al-antari et al. [ 195 ] introduced a novel model (CAD-based YOLO Predictor) based on fast deep learning computer-aided diagnosis system with YOLO predictor for automatic diagnosis of COVID-19 cases from digital X-ray images. The proposed system was trained using two different digital X-ray datasets: COVID-19 images [ 84 , 88 ] and ChestX-ray8 images [ 196 ]. According to the experimentation, CAD-based YOLO Predictor achieves an accuracy of 97.40%, sensitivity of 85.15%, specificity of 99.06%, and F1-score of 84.81%.

Gianchandani et al. [ 197 ] proposed two ensemble deep transfer learning models for Rapid COVID-19 diagnosis. The proposed models were validated using two datasets of X-ray images obtained from Kaggle datasets resource [ 198 ] and the University of Dhaka and Qatar University. [ 88 ]

Other Machine Learning Approaches

Chakraborty and Ghosh [ 204 ] developed a hybrid method (ARIMA–WBF) based on the hybridization of the ARIMA model and the Wavelet-based forecasting (WBF) model for predicting the number of daily confirmed COVID-19 cases. The effectiveness of ARIMA–WBF was validated using datasets of 346 cases taken from five countries (Canada: 70, France: 71, India: 64, South Korea: 76, and UK: 65). Simulation results showed the performance and robustness of ARIMA–WBF in the prediction of COVID-19 cases.
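The ARIMA component of such hybrids can be sketched with statsmodels; the model order and the synthetic series are illustrative, and the wavelet stage of the cited hybrid is omitted:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    daily_cases = np.random.default_rng(2).poisson(150, 90).astype(float)  # synthetic daily counts

    fit = ARIMA(daily_cases, order=(2, 1, 2)).fit()
    print(fit.forecast(steps=7))    # one-week-ahead forecast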

Tuncer et al. [ 205 ] proposed a feature generation technique, called Residual Exemplar Local Binary Pattern (ResExLBP) with iterative ReliefF (IRF) and five machine learning methods (Decision tree, linear discriminant, SVM, kNN, and subspace discriminant) for automatic COVID-19 detection. The efficiency of the proposed model was validated using datasets of X-ray images collected from the GitHub website and Kaggle site. Simulation results showed that ResExLBP with IRF and SVM gives better performance compared to other models by providing 99.69% accuracy, 98.85% sensitivity, and 100% specificity.

Tuli et al. [ 206 ] developed a novel model based on machine learning and Cloud Computing for real-time prediction of COVID-19. The effectiveness of the proposed model was validated using the Our World in Data COVID-19 dataset taken from the GitHub repository ( https://github.com/owid/covid-19-data/tree/master/public/data/ ). Simulation results showed that the proposed model gives good performance on this problem area.

Pereira et al. [ 207 ] used MLP with KNN, SVM, Decision Trees, and Random Forest for COVID-19 identification in chest X-ray images. The efficiency of the proposed models was evaluated based on the RYDLS-20 database of 1144 chest X-ray images divided into training and test sets with 70% and 30% rates. Experimental results showed the superiority of MLP compared to other machine learning approaches by providing an F1-score of 89%.

Albahri et al. [ 208 ] used a machine learning model combined with a novel Multi-criteria-decision-method (MCDM) for the identification of COVID-19 infected patients. The effectiveness of the proposed model was evaluated based on Blood sample images. Simulation results revealed that the proposed model is a good tool for identifying infected COVID-19 cases.

Wang et al. [ 209 ] developed a hybrid model based on FbProphet technique and Logistic Model for COVID-19 epidemic trend prediction. The hybrid model was validated using COVID-19 epidemiological time-series data and results revealed the effectiveness of the hybrid model for the prediction of the turning point and epidemic size of COVID-19.

Ardakani et al. [ 210 ] proposed a machine learning-based Computer-Aided Detection (CAD) System (COVIDiag) for COVID-19 diagnosis. The performance of COVIDiag was evaluated using CT images of 612 patients (306 with COVID-19 and 306 normal). Experimental results demonstrated the effectiveness of COVIDiag compared to SVM, KNN, NB, and DT by achieving the sensitivity, specificity, and accuracy of 93.54%, 90.32%, and 91.94%, respectively.

The summary of other Machine Learning approaches is given in Table 7.

Summary of other Machine Learning approaches for detection, diagnosis, and prediction of COVID-19 cases

Machine Learning is the field of AI that has been applied to deal with COVID-19. The findings from this study reveal that:

Fig. 10: Approaches of machine learning used to deal with COVID-19

  • Techniques traditionally associated with Unsupervised Learning did not appear in the reviewed papers. However, in the case of unlabeled data, Deep Learning performs automatic feature learning, which is a form of unsupervised learning;
  • Similarly, Reinforcement Learning techniques are not explored in the summarized approaches;

Fig. 11: Deep learning approaches used to deal with COVID-19

Fig. 12: Supervised learning techniques used to deal with COVID-19

Fig. 13: Metrics used in the evaluation of COVID-19 related approaches
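For reference, the metrics that dominate these evaluations can all be derived from the binary confusion matrix, plus predicted scores for the AUC; a short sketch with toy values:

    from sklearn.metrics import confusion_matrix, roc_auc_score

    def report(y_true, y_pred, y_score):
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        sens = tp / (tp + fn)                    # sensitivity (recall)
        spec = tn / (tn + fp)                    # specificity
        prec = tp / (tp + fp)                    # precision
        acc = (tp + tn) / (tp + tn + fp + fn)
        f1 = 2 * prec * sens / (prec + sens)
        return {"accuracy": acc, "sensitivity": sens, "specificity": spec,
                "precision": prec, "f1": f1, "auc": roc_auc_score(y_true, y_score)}

    print(report([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 0, 1], [0.9, 0.2, 0.8, 0.4, 0.1, 0.6]))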

Despite all these contributions, there are still some remaining challenges in applying ML to deal with COVID-19. Handling new datasets generated in real time raises several issues that limit the reliability of the results. Many of the proposed approaches are based on small datasets which are, in most cases, incomplete, noisy, ambiguous, and with a significant ratio of missing patterns. Consequently, training is not efficient and the risk of over-fitting is high because of the high variance and errors on the test set. Therefore, the need to build large datasets becomes unavoidable, but it is not sufficient on its own: without a complete and standard dataset, it is difficult to conclude which method provides the best results. To overcome that, a deep effort of merging existing datasets and cleaning them up, by removing or imputing missing data and removing redundancy, is required.

The COVID-19 pandemic has deeply marked the year 2020 and has prompted the research community in different fields to react. This paper demonstrated the interest data scientists have attached to this particular situation. It provided a survey of Machine Learning based research, classified into two categories (Supervised Learning approaches and Deep Learning approaches), for the detection, diagnosis, or prediction of COVID-19. Moreover, it gave an analysis and statistics on published works. The review included more than 160 publications coming from more than 6 well-known scientific publishers. The learning is based on various data supports such as X-ray images, CT images, text data, time series, sounds, coughing/breathing videos, and blood samples. Our study presented a synthesis with accurate ratios of use for each of the ML techniques. It also summarized the metrics employed to validate the different models. The statistical study showed that 6 metrics are frequently used, with preference for accuracy, sensitivity, and specificity, which are reported in almost equal proportions. Among the ML techniques, 79% are based on Deep Learning. In 65% of cases, a CNN architecture was used, and 17% of the reviewed papers proposed a specialized CNN architecture adapted to COVID-19. Supervised Learning is also present in 16% of cases, either for classification, mainly using SVM, or for regression, where Random Forest algorithms and Linear Regression are the most dominant techniques. In addition to these, hybrid approaches are also explored to address COVID-19; they represent 5% of the methods reviewed in this paper. Most of them combine CNNs with other techniques and/or meta-heuristics in order to outperform the classical ones. They demonstrated good performance in terms of accuracy and F1-score, so it would be worth investigating them further. Given this state of the art and the number of techniques proposed, research must now focus on the quality of the data used and their harmonization. Indeed, until now, the studies carried out have been based on different types and volumes of datasets. The data considered are, overall, those present in each country, where COVID-19 has not necessarily evolved in the same way. Thus, it is essential to create benchmarks with real-world datasets to train future models on them.

Declaration

The authors declare that there is no conflict of interest with any person(s) or Organization(s).

1 https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports .

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Yassine Meraihi, Email: [email protected] .

Asma Benmessaoud Gabis, Email: a_benmessaoud@esi.dz.

Seyedali Mirjalili, Email: [email protected] .

Amar Ramdane-Cherif, Email: rca@lisv.uvsq.fr.

Fawaz E. Alsaadi, Email: fawazkau@gmail.com.


Published: 17 April 2024

Machine learning reveals the control mechanics of an insect wing hinge

  • Johan M. Melis   ORCID: orcid.org/0000-0001-8966-9496 1 ,
  • Igor Siwanowicz 2 &
  • Michael H. Dickinson   ORCID: orcid.org/0000-0002-8587-9936 1  

Nature volume 628, pages 795–803 (2024)


  • Biomechanics
  • Motor control

Insects constitute the most species-rich radiation of metazoa, a success that is due to the evolution of active flight. Unlike pterosaurs, birds and bats, the wings of insects did not evolve from legs 1 , but are novel structures that are attached to the body via a biomechanically complex hinge that transforms tiny, high-frequency oscillations of specialized power muscles into the sweeping back-and-forth motion of the wings 2 . The hinge consists of a system of tiny, hardened structures called sclerites that are interconnected to one another via flexible joints and regulated by the activity of specialized control muscles. Here we imaged the activity of these muscles in a fly using a genetically encoded calcium indicator, while simultaneously tracking the three-dimensional motion of the wings with high-speed cameras. Using machine learning, we created a convolutional neural network 3 that accurately predicts wing motion from the activity of the steering muscles, and an encoder–decoder 4 that predicts the role of the individual sclerites on wing motion. By replaying patterns of wing motion on a dynamically scaled robotic fly, we quantified the effects of steering muscle activity on aerodynamic forces. A physics-based simulation incorporating our hinge model generates flight manoeuvres that are remarkably similar to those of free-flying flies. This integrative, multi-disciplinary approach reveals the mechanical control logic of the insect wing hinge, arguably among the most sophisticated and evolutionarily important skeletal structures in the natural world.
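The muscle-to-wing-motion network is only described at a high level in this abstract; purely as an illustration of the idea, a small 1D convolutional network mapping multi-channel muscle-activity traces to a vector of wing-kinematics coefficients could look like the sketch below. The layer sizes and dimensions are assumptions, not the authors' architecture:

    import tensorflow as tf

    n_timesteps, n_muscles, n_coeffs = 64, 12, 20    # assumed dimensions

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_timesteps, n_muscles)),   # per-wingbeat muscle activity
        tf.keras.layers.Conv1D(32, 5, padding="same", activation="relu"),
        tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_coeffs),                         # coefficients parameterizing wing angles
    ])
    model.compile(optimizer="adam", loss="mse")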


machine learning based research papers

Similar content being viewed by others

machine learning based research papers

Bridging two insect flight modes in evolution, physiology and robophysics

machine learning based research papers

Basal complex: a smart wing component for automatic shape morphing

machine learning based research papers

Birds can transition between stable and unstable states via wing morphing

Data availability

The data required to perform the analyses in this paper and reconstruct all the data figures are available in the following files, hosted on the Caltech Data website: https://doi.org/10.22002/aypcy-ck464 : main_muscle_and_wing_data.h5, flynet_data.zip and robofly_data.zip. main_muscle_and_wing_data.h5 contains the time series of muscle activity and wing kinematics used to train the muscle-to-wing motion CNN and the encoder–decoder used in the latent variable analysis. flynet_data.zip contains a series of data files for training and running Flynet: (1) camera/calibration/cam_calib.txt (example camera calibration data); (2) movies/session_01_12_2020_10_22 (a folder containing example movies); (3) labels.h5 and valid_labels.h5 (data for training); and (4) weights_24_03_2022_09_43_14.h5 (example weights). robofly_data.zip contains the MATLAB data files with force and torque data acquired using the dynamically scaled robotic fly.
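
For readers who want to explore the released data, a minimal sketch for opening the HDF5 file with h5py is shown below. The internal group and dataset names are not documented here, so the code does not assume any; it simply lists whatever keys the file actually contains.

import h5py

# Inspect main_muscle_and_wing_data.h5 without assuming its internal layout.
with h5py.File("main_muscle_and_wing_data.h5", "r") as f:
    def show(name, obj):
        # Print every group/dataset path, plus shape and dtype for datasets.
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
        else:
            print(f"{name}/ (group)")
    f.visititems(show)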

Code availability

The code required to perform the analyses in this paper and reconstruct all the data figures is available at https://github.com/FlyRanch/mscode-melis-siwanowicz-dickinson . The software is organized into seven submodules: flynet, flynet-kalman, flynet-optimizer, latent-analysis, mpc-simulations, robofly and wing-hinge-cnn. The installation instructions, system requirements and dependency information are given separately in their respective folders. flynet is a neural network and GUI application that requires the dataset flynet_data.zip, and may be used to create Extended Data Fig. 2. An example demonstrating how to train the network can be found in the examples sub-directory and is called train_flynet.py. flynet-kalman is a Kalman filter Python extension used by Flynet. flynet-optimizer is a particle swarm optimization extension module used by Flynet. latent-analysis is a Python library and Jupyter notebook for performing latent variable analysis that requires the dataset main_muscle_and_wing_data.h5, and may be used to create Fig. 6 and Extended Data Fig. 8. mpc-simulations is a Python library and Jupyter notebook for MPC simulations, and may be used to create Fig. 5 and Extended Data Fig. 7. robofly is a Python library and Jupyter notebook for extracting force and torque data from the robotic fly experiments and plotting forces superimposed on 3D wing kinematics. It requires the dataset robofly_data.zip, and may be used to create Extended Data Figs. 5 and 6. wing-hinge-cnn is a Python library and Jupyter notebook for creating the muscle-to-wing motion CNN. It requires main_muscle_and_wing_data.h5, and may be used to create Figs. 3 and 4 and Extended Data Fig. 3. An example demonstrating how to train the network can be found in the examples sub-directory and is called train_wing_hinge_cnn.py. The files containing the raw videos of the muscle Ca2+ images and high-speed videos of wing motion are too large to be hosted on a publicly accessible website. Example high-speed videos are provided in the folder movies/session_01_12_2020_10_22 mentioned in Data availability. Additional sequences are available upon request by contacting the corresponding author.

Grimaldi, D. & Engel, M. S. Evolution of the Insects (Cambridge Univ. Press, 2005).

Deora, T., Gundiah, N. & Sane, S. P. Mechanics of the thorax in flies. J. Exp. Biol. 220 , 1382–1395 (2017).

Gu, J. et al. Recent advances in convolutional neural networks. Pattern Recognit. 77 , 354–377 (2018).

Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AlChE J. 37 , 233–243 (1991).

Pringle, J. W. S. The excitation and contraction of the flight muscles of insects. J. Physiol. 108 , 226–232 (1949).

Josephson, R. K., Malamud, J. G. & Stokes, D. R. Asynchronous muscle: a primer. J. Exp. Biol. 203 , 2713–2722 (2000).

Gau, J. et al. Bridging two insect flight modes in evolution, physiology and robophysics. Nature 622 , 767–774 (2023).

Boettiger, E. G. & Furshpan, E. The mechanics of flight movements in diptera. Biol. Bull. 102 , 200–211 (1952).

Pringle, J. W. S. Insect Flight (Cambridge Univ. Press, 1957).

Miyan, J. A. & Ewing, A. W. How Diptera move their wings: a re-examination of the wing base articulation and muscle systems concerned with flight. Phil. Trans. R. Soc. B 311 , 271–302 (1985).

Wisser, A. Wing beat of Calliphora erythrocephala : turning axis and gearbox of the wing base (Insecta, Diptera). Zoomorph. 107 , 359–369 (1988).

Ennos, R. A. A comparative study of the flight mechanism of diptera. J. Exp. Biol. 127 , 355–372 (1987).

Dickinson, M. H. & Tu, M. S. The function of dipteran flight muscle. Comp. Biochem. Physiol. A 116 , 223–238 (1997).

Nalbach, G. The gear change mechanism of the blowfly ( Calliphora erythrocephala ) in tethered flight. J. Comp. Physiol. A 165 , 321–331 (1989).

Walker, S. M., Thomas, A. L. R. & Taylor, G. K. Operation of the alula as an indicator of gear change in hoverflies. J. R. Soc. Inter. 9 , 1194–1207 (2011).

Walker, S. M. et al. In vivo time-resolved microtomography reveals the mechanics of the blowfly flight motor. PLoS Biol. 12 , e1001823 (2014).

Wisser, A. & Nachtigall, W. Functional-morphological investigations on the flight muscles and their insertion points in the blowfly Calliphora erythrocephala (Insecta, Diptera). Zoomorph. 104 , 188–195 (1984).

Heide, G. Funktion der nicht-fibrillaren Flugmuskeln von Calliphora. I. Lage Insertionsstellen und Innervierungsmuster der Muskeln. Zool. Jahrb., Abt. allg. Zool. Physiol. Tiere 76 , 87–98 (1971).

Fabian, B., Schneeberg, K. & Beutel, R. G. Comparative thoracic anatomy of the wild type and wingless (wg1cn1) mutant of Drosophila melanogaster (Diptera). Arth. Struct. Dev. 45 , 611–636 (2016).

Tu, M. & Dickinson, M. Modulation of negative work output from a steering muscle of the blowfly Calliphora vicina . J. Exp. Biol. 192 , 207–224 (1994).

Tu, M. S. & Dickinson, M. H. The control of wing kinematics by two steering muscles of the blowfly ( Calliphora vicina ). J. Comp. Physiol. A 178 , 813–830 (1996).

Muijres, F. T., Iwasaki, N. A., Elzinga, M. J., Melis, J. M. & Dickinson, M. H. Flies compensate for unilateral wing damage through modular adjustments of wing and body kinematics. Interface Focus 7 , 20160103 (2017).

O’Sullivan, A. et al. Multifunctional wing motor control of song and flight. Curr. Biol. 28 , 2705–2717.e4 (2018).

Azevedo, A. et al. Tools for comprehensive reconstruction and analysis of Drosophila motor circuits. Preprint at BioRxiv https://doi.org/10.1101/2022.12.15.520299 (2022).

Donovan, E. R. et al. Muscle activation patterns and motor anatomy of Anna’s hummingbirds Calypte anna and zebra finches Taeniopygia guttata . Physiol. Biochem. Zool. 86 , 27–46 (2013).

Bashivan, P., Kar, K. & DiCarlo, J. J. Neural population control via deep image synthesis. Science 364 , eaav9436 (2019).

Lindsay, T., Sustar, A. & Dickinson, M. The function and organization of the motor system controlling flight maneuvers in flies. Curr. Biol. 27 , 345–358 (2017).

Reiser, M. B. & Dickinson, M. H. A modular display system for insect behavioral neuroscience. J. Neurosci. Meth. 167 , 127–139 (2008).

Albawi, S., Mohammed, T. A. & Al-Zawi, S. Understanding of a convolutional neural network. In 2017 International Conference on Engineering and Technology (ICET) 1–6 https://doi.org/10.1109/ICEngTechnol.2017.8308186 (2017).

Kennedy, J. & Eberhart, R. Particle swarm optimization. In Proc. ICNN’95—International Conference on Neural Networks Vol. 4, 1942–1948 (1995).

Dana, H. et al. High-performance calcium sensors for imaging activity in neuronal populations and microcompartments. Nat. Methods 16 , 649–657 (2019).

Muijres, F. T., Elzinga, M. J., Melis, J. M. & Dickinson, M. H. Flies evade looming targets by executing rapid visually directed banked turns. Science 344 , 172–177 (2014).

Gordon, S. & Dickinson, M. H. Role of calcium in the regulation of mechanical power in insect flight. Proc. Natl Acad. Sci. USA 103 , 4311–4315 (2006).

Nachtigall, W. & Wilson, D. M. Neuro-muscular control of dipteran flight. J. Exp. Biol. 47 , 77–97 (1967).

Heide, G. & Götz, K. G. Optomotor control of course and altitude in Drosophila melanogaster is correlated with distinct activities of at least three pairs of flight steering muscles. J. Exp. Biol. 199 , 1711–1726 (1996).

Balint, C. N. & Dickinson, M. H. The correlation between wing kinematics and steering muscle activity in the blowfly Calliphora vicina . J. Exp. Biol. 204 , 4213–4226 (2001).

Elzinga, M. J., Dickson, W. B. & Dickinson, M. H. The influence of sensory delay on the yaw dynamics of a flapping insect. J. R. Soc. Interface 9 , 1685–1696 (2012).

Dickinson, M. H., Lehmann, F.-O. & Sane, S. P. Wing rotation and the aerodynamic basis of insect flight. Science 284 , 1954–1960 (1999).

Lehmann, F. O. & Dickinson, M. H. The changes in power requirements and muscle efficiency during elevated force production in the fruit fly Drosophila melanogaster . J. Exp. Biol. 200 , 1133–1143 (1997).

Lucia, S., Tătulea-Codrean, A., Schoppmeyer, C. & Engell, S. Rapid development of modular and sustainable nonlinear model predictive control solutions. Control Eng. Pract. 60 , 51–62 (2017).

Cheng, B., Fry, S. N., Huang, Q. & Deng, X. Aerodynamic damping during rapid flight maneuvers in the fruit fly Drosophila . J. Exp. Biol. 213 , 602–612 (2010).

Collett, T. S. & Land, M. F. Visual control of flight behaviour in the hoverfly, Syritta pipiens L. J. Comp. Physiol. 99 , 1–66 (1975).

Muijres, F. T., Elzinga, M. J., Iwasaki, N. A. & Dickinson, M. H. Body saccades of Drosophila consist of stereotyped banked turns. J. Exp. Biol. 218 , 864–875 (2015).

Syme, D. A. & Josephson, R. K. How to build fast muscles: synchronous and asynchronous designs. Integr. Comp. Biol. 42 , 762–770 (2002).

Snodgrass, R. E. Principles of Insect Morphology (Cornell Univ. Press, 2018).

Williams, C. M. & Williams, M. V. The flight muscles of Drosophila repleta . J. Morphol. 72 , 589–599 (1943).

Wootton, R. The geometry and mechanics of insect wing deformations in flight: a modelling approach. Insects 11 , 446 (2020).

Lerch, S. et al. Resilin matrix distribution, variability and function in Drosophila . BMC Biol. 18 , 195 (2020).

Weis-Fogh, T. A rubber-like protein in insect cuticle. J. Exp. Biol. 37 , 889–907 (1960).

Weis-Fogh, T. Energetics of hovering flight in hummingbirds and in Drosophila . J. Exp. Biol. 56 , 79–104 (1972).

Ellington, C. P. The aerodynamics of hovering insect flight. VI. Lift and power requirements. Phil. Trans. R. Soc. B 305 , 145–181 (1984).

Alexander, R. M. & Bennet-Clark, H. C. Storage of elastic strain energy in muscle and other tissues. Nature 265 , 114–117 (1977).

Mronz, M. & Lehmann, F.-O. The free-flight response of Drosophila to motion of the visual environment. J. Exp. Biol. 211 , 2026–2045 (2008).

Ristroph, L., Bergou, A. J., Guckenheimer, J., Wang, Z. J. & Cohen, I. Paddling mode of forward flight in insects. Phys. Rev. Lett. 106 , 178103 (2011).

Takemura, S. et al. A connectome of the male Drosophila ventral nerve cord. Preprint at bioRxiv https://doi.org/10.1101/2023.06.05.543757 (2023).

Cheong, H. S. J. et al. Transforming descending input into behavior: The organization of premotor circuits in the Drosophila male adult nerve cord connectome. Preprint at BioRxiv https://doi.org/10.1101/2023.06.07.543976 (2023).

Martynov, A. B. Über zwei Grundtypen der Flügel bei den Insecten und ihre Evolution. Z. Morph. Ökol. Tiere 4 , 465–501 (1925).

Wipfler, B. et al. Evolutionary history of Polyneoptera and its implications for our understanding of early winged insects. Proc. Natl Acad. Sci. USA 116 , 3024–3029 (2019).

Hasenfuss, I. The evolutionary pathway to insect flight—a tentative reconstruction. Arthr. System. Phylog. 66 , 19–35 (2008).

Willkommen, J. & Hörnschemeyer, T. The homology of wing base sclerites and flight muscles in Ephemeroptera and Neoptera and the morphology of the pterothorax of Habroleptoides confusa (Insecta: Ephemeroptera: Leptophlebiidae). Arthro. Struc. Develop. 36 , 253–269 (2007).

Willmann, R. in Arthropod Relationships (eds Fortey, R. A. & Thomas, R. H.) 269–279 (Springer, 1998); https://doi.org/10.1007/978-94-011-4904-4_20 .

Shao, L. et al. A neural circuit encoding the experience of copulation in female Drosophila . Neuron 102 , 1025–1036.e6 (2019).

Suver, M. P., Huda, A., Iwasaki, N., Safarik, S. & Dickinson, M. H. An array of descending visual interneurons encoding self-motion in Drosophila . J. Neurosci. 36 , 11768–11780 (2016).

Götz, K. G. Course-control, metabolism and wing interference during ultralong tethered flight in Drosophila melanogaster . J. Exp. Biol. 128 , 35–46 (1987).

Klambauer, G., Unterthiner, T., Mayr, A. & Hochreiter, S. in Advances in Neural Information Processing Systems Vol. 30 (Curran Associates, 2017).

Grewal, M. S. & Andrews, A. P. Kalman Filtering: Theory and Practice with MATLAB (John Wiley & Sons, 2014).

Fischler, M. A. & Bolles, R. C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24 , 381–395 (1981).

Birch, J. M. & Dickinson, M. H. The influence of wing–wake interactions on the production of aerodynamic forces in flapping flight. J. Exp. Biol. 206 , 2257–2272 (2003).

Kouvaritakis, B. & Cannon, M. Model Predictive Control: Classical, Robust and Stochastic (Springer, 2016).

Acknowledgements

The authors thank W. Dickson for extensive expertise in instrumentation, programming, data analysis, formatting all the data and code for public repositories, and creating the animations of free flight data in Supplementary Videos  3 – 8 ; T. Lindsay for assistance in the design of the epifluorescence microscope and data acquisition software used for muscle imaging; A. Erickson for helpful comments on the manuscript and Supplementary Information ; A. Huda for assistance in the construction of genetic lines; J. Omoto for collecting confocal images of wings to visualize resilin using autofluorescence; J. Tuthill and T. Azevedo for a tomographic dataset of the Drosophila wing hinge that was collected at the European Synchrotron Radiation Facility in Grenoble, France; S. Whitehead for analysis of this tomography data to provide a preliminary reconstruction of the hinge sclerites, and for critical feedback on the manuscript text and data presentation; and B. Fabian and R. G. Beutel for providing μ-CT data from their publication on the morphology of the adult fly body. The research reported in this publication was supported by the National Institute of Neurological Disorders and Stroke of the NIH (U19NS104655). I.S. was supported through the AniBody Project Team at HHMI’s Janelia Research Campus for this work.

Author information

Authors and affiliations

Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA, USA

Johan M. Melis & Michael H. Dickinson

Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA

Igor Siwanowicz

Contributions

J.M.M. collected all the data presented in the manuscript and developed the software for data analysis. J.M.M. and M.H.D. collaborated on planning the experiments, preparing figures, and writing the manuscript. I.S. collected the high-resolution morphological images of the Drosophila thorax and created Supplementary Video  1 .

Corresponding author

Correspondence to Michael H. Dickinson .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Automated setup for simultaneous recording of muscle fluorescence and wing motion.

a , Illustration of experimental apparatus, created using Solidworks ( www.solidworks.com ). High-speed cameras, equipped with 0.5X telecentric lenses and collimated IR back-lighting capture synchronized frames of the fly from three orthogonal angles at a rate of 15,000 frames per second. An epi-fluorescence microscope with a muscle imaging camera records GCaMP7f fluorescence in the left steering muscles at approximately 100 frames per second, utilizing a strobing mechanism triggered every other wingbeat. A blue LED provides a brief, 1 ms illumination of the fly’s thorax during dorsal stroke reversal. A camera operating at 30 fps captures a top view of the fly for the kinefly wing tracker. b , Image of the flight arena featuring the components of the setup: LED panorama, IR diode and wingbeat analyzer for triggering the muscle camera and blue LED, prism for splitting the top view between the high-speed camera and kinefly camera, IR backlight, 4X lens of the epi-fluorescence microscope, and a tethered fly illuminated by the blue LED.

Extended Data Fig. 2 Flynet workflow and definitions of wing kinematic angles.

a , The Flynet algorithm takes three synchronized frames as input. Each frame undergoes CNN processing, resulting in a 256-element feature vector extracted from the image. These three feature vectors are concatenated and analyzed by a fully connected (dense) layer with Scaled Exponential Linear Unit (SELU) activation, consisting of 1024 neurons. The output of the neural network is the predicted state (37 elements) of the five model components represented by a quaternion (q), translation vector (p), and wing deformation angle (ξ). Subsequently, the state vector is refined using 3D model fitting and particle swarm optimization (PSO). Normally distributed noise is added to the predicted state, forming the initial state for 16 particles. During the 3D model fitting, the particles traverse the state-space, maximizing the overlap between binary body and wing masks of the segmented frames (I_b) and the binary masks of the 3D model projected onto the camera views (I_p). The cost function (I_b Δ I_p)/(I_b ∪ I_p) is evaluated iteratively for a randomly selected 3D model component. The PSO algorithm tracks the personal best cost encountered by each particle and the overall lowest cost (global best). After 300 iterations, the refined state is determined by selecting the global best for each 3D model component. See Supplementary Information for more details. b , Training and validation error of the Flynet CNN as a function of training epoch.
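
As a concrete illustration of that cost term, the snippet below evaluates the symmetric-difference-over-union cost for a pair of binary masks with NumPy; the masks here are toy arrays constructed for the example, not data from Flynet.

import numpy as np

def mask_overlap_cost(segmented_mask, projected_mask):
    """(I_b Δ I_p) / (I_b ∪ I_p): 0 when the masks match exactly, 1 when they are disjoint."""
    i_b = segmented_mask.astype(bool)
    i_p = projected_mask.astype(bool)
    union = np.logical_or(i_b, i_p).sum()
    if union == 0:          # both masks empty: define the cost as 0
        return 0.0
    sym_diff = np.logical_xor(i_b, i_p).sum()
    return sym_diff / union

# Toy example: two overlapping square masks on a 100 x 100 image.
a = np.zeros((100, 100), dtype=bool); a[20:60, 20:60] = True
b = np.zeros((100, 100), dtype=bool); b[30:70, 30:70] = True
print(mask_overlap_cost(a, b))   # about 0.61 for this toy pair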

Extended Data Fig. 3 CNN-predicted wing motion for example flight sequences.

a , The top five traces show the activity of the steering muscles in the four sclerite groups, as well as wingbeat frequency, during a full, 1.1 second recording. The bottom four traces show a comparison between the tracked (black) and CNN-predicted (red) wing kinematic angles throughout the sequence. Expanded plots of a 100-ms sequence (0.5 to 0.6 seconds) are plotted on the right. b , c , d , Same as a , but for different flight sequences.

Extended Data Fig. 4 Correlation analysis of steering muscle fluorescence and wingbeat frequency.

Linear models (colored lines) fitted to wingbeats in the entire dataset of 72,219 wingbeats from 82 flies. Gray dots represent the normalized baseline muscle activity level, while colored dots represent the normalized maximum muscle activity level. The correlation coefficients associated with these plots are provided in Extended Data Table 1 . For more detail on regression methods, see  Supplementary Information .
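
The caption above describes per-muscle linear fits between normalized activity and wingbeat frequency. The sketch below shows one way such a fit and its correlation coefficient could be computed; it uses synthetic stand-in data, not the 72,219 recorded wingbeats, and the numbers are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: normalized muscle activity and wingbeat frequency (Hz).
muscle_activity = rng.uniform(0.0, 1.0, size=5000)
wingbeat_freq = 200.0 + 15.0 * muscle_activity + rng.normal(0.0, 3.0, size=5000)

# Least-squares linear model: frequency ~ slope * activity + intercept.
slope, intercept = np.polyfit(muscle_activity, wingbeat_freq, deg=1)
r = np.corrcoef(muscle_activity, wingbeat_freq)[0, 1]

print(f"slope={slope:.2f} Hz per unit activity, intercept={intercept:.1f} Hz, r={r:.3f}")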

Extended Data Fig. 5 Aerodynamic force measurements and inertial force calculations.

a , Dynamically scaled flapping fly wing model immersed in mineral oil. b , Non-dimensional forces and torques in the strokeplane reference frame (SRF) for the baseline wingbeat. The four traces in each panel correspond to the total (black: F_total, T_total), aerodynamic (blue: F_aero, T_aero), inertial components due to acceleration (green: F_acc, T_acc), and inertial components due to angular velocity (red: F_angvel, T_angvel). See Supplementary Information for more details. c , Representation of total forces during the baseline wingbeat, viewed from the front, left, and top. Gray trace represents the wing trajectory; cyan arrows represent instantaneous total force on the wing. At the wing joint, three arrows depict the total mean force, half the body weight, and half the estimated body drag.

Extended Data Fig. 6 Aerodynamic and inertial forces for maximum muscle activity wingbeats.

Figures depict the CNN-predicted wing motion for maximum muscle activity patterns, viewed from the front, left, and top. Instantaneous vectors depicting the sum of aerodynamic and inertial forces are shown in cyan. The wingbeat-averaged force vector is indicated by the color corresponding to the specific steering muscle set to maximum activity. Note that the scaling for the wingbeat-averaged forces differs from that for the instantaneous forces. The black gravitational force and blue body drag force are plotted as in Extended Data Fig. 5c .

Extended Data Fig. 7 Simulation of free flight maneuvers using the state-space system and Model Predictive Control.

a , Schematic of the state-space system and MPC loop, including the system matrix (A), control matrix (B), state vector (x), its temporal derivative (ẋ), left and right steering muscle activity (u_L, u_R), initial state (x_init) and goal state (x_goal). b , Forward flight simulation with wingtip traces in red and blue. c , Wing motion during the forward flight simulation plotted in the stationary body frame. d , Backward flight simulation. e , Wing motion during the backward flight simulation plotted in the stationary body frame. f , Left and right steering muscle activity during the forward flight manoeuvre. g , State vector during the forward flight manoeuvre. h , Steering muscle activity for the backward flight manoeuvre. i , State vector for the backward flight manoeuvre. j , CNN-predicted left (red) and right (blue) wing kinematics for the forward flight manoeuvre. Note that because this is a bilaterally symmetric flight manoeuvre, the model generates left and right wing kinematics that are identical; the left wing kinematics are displayed underneath the right kinematics and thus cannot be seen. A baseline wingbeat is shown to emphasize the relative changes in wing motion. k , CNN-predicted wing motion for the backward flight manoeuvre.
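
To show the kind of loop the panel describes, here is a deliberately simplified Python sketch of a linear state-space model ẋ = Ax + Bu driven by a one-step receding-horizon controller. The matrices, dimensions, time step and cost weight are arbitrary placeholders invented for the example; they are not the identified flight dynamics or the MPC formulation used in the paper.

import numpy as np

rng = np.random.default_rng(1)

n_states, n_controls, dt = 4, 2, 0.005
A = -0.5 * np.eye(n_states) + 0.1 * rng.standard_normal((n_states, n_states))
B = rng.standard_normal((n_states, n_controls))

x = np.zeros(n_states)                       # x_init
x_goal = np.array([1.0, 0.0, -0.5, 0.2])
lam = 1e-3                                   # penalty on control effort

for step in range(400):
    # One-step receding horizon: choose u minimizing
    # ||x + dt*(A x + B u) - x_goal||^2 + lam*||u||^2 (closed-form least squares).
    M = dt * B
    residual = x_goal - (x + dt * (A @ x))
    u = np.linalg.solve(M.T @ M + lam * np.eye(n_controls), M.T @ residual)
    x = x + dt * (A @ x + B @ u)             # forward-Euler integration of ẋ = Ax + Bu

print("final state:", np.round(x, 3))
print("goal state: ", x_goal)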

Extended Data Fig. 8 Latent variable analysis reveals sclerite function using an encoder–decoder.

a , The network architecture consists of an encoder (red), muscle activity decoder (green), and wing kinematics decoder (blue). The encoder splits the input data into five streams corresponding to different muscle groups and frequency. Feature extraction is performed using convolutional and fully connected layers with SELU activation. Each stream is projected onto a single latent variable. In the muscle activity decoder, the latent variables are transformed back into the input data. A backpropagation stop prevents weight adjustments in the encoder based on the muscle activity reconstruction. The wing kinematics decoder predicts the Legendre coefficients of wing motion using the latent variables. See Supplementary Information for more details. b , Predicted muscle activity (replotted from Fig. 6) and normalized wingbeat frequency as a function of each latent parameter varied within the range of −3σ to +3σ. Color bar indicates the latent variable value in panels (c) and (e). c , Predicted wing motion by the wing kinematics decoder for the five latent parameters. d , Absolute angle-of-attack (|α|), wingtip velocity (u_tip) in mm s⁻¹, non-dimensional lift (L mg⁻¹), and non-dimensional drag (D mg⁻¹). The non-dimensional lift and drag were computed using a quasi-steady model as described in Supplementary Information.
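
The following PyTorch sketch illustrates the general shape of such an encoder–decoder with one latent variable per input stream; the stream sizes, layer widths and the number of Legendre coefficients are invented for illustration and do not reproduce the network described above.

import torch
import torch.nn as nn

class LatentEncoderDecoder(nn.Module):
    """Illustrative encoder-decoder: five input streams, one latent variable each.
    All dimensions below are placeholder assumptions."""
    def __init__(self, stream_dims=(3, 3, 3, 2, 1), n_legendre=80):
        super().__init__()
        # One small encoder per stream, each projecting to a single latent variable.
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Linear(d, 16), nn.SELU(), nn.Linear(16, 1))
            for d in stream_dims
        ])
        n_latents = len(stream_dims)
        # Decoder 1: reconstruct the concatenated muscle/frequency input.
        self.muscle_decoder = nn.Sequential(
            nn.Linear(n_latents, 32), nn.SELU(), nn.Linear(32, sum(stream_dims)))
        # Decoder 2: predict Legendre coefficients describing wing motion.
        self.wing_decoder = nn.Sequential(
            nn.Linear(n_latents, 64), nn.SELU(), nn.Linear(64, n_legendre))
        self.stream_dims = stream_dims

    def forward(self, x):
        # x: (batch, sum(stream_dims)); split into the five streams.
        streams = torch.split(x, self.stream_dims, dim=1)
        latents = torch.cat([enc(s) for enc, s in zip(self.encoders, streams)], dim=1)
        # Stop-gradient so the muscle reconstruction does not shape the encoder,
        # mirroring the backpropagation stop described in the caption.
        muscle_recon = self.muscle_decoder(latents.detach())
        wing_coeffs = self.wing_decoder(latents)
        return latents, muscle_recon, wing_coeffs

model = LatentEncoderDecoder()
x = torch.randn(4, sum(model.stream_dims))
latents, muscle_recon, wing_coeffs = model(x)
print(latents.shape, muscle_recon.shape, wing_coeffs.shape)  # (batch, 5), (batch, 12), (batch, 80)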

Extended Data Fig. 9 Flexible wing root facilitates elastic storage during wingbeat and allows wing to passively respond to changes in lift and drag throughout stroke.

a , Top view of ventral stroke reversal in free flight. Red circles mark the estimated position of the wing hinge; dotted lines indicate the expected position of the wing if a chord-wise flexure line were not present. Images are reproduced from previously published data 32 from Drosophila hydei . b , Composite confocal image of the wing base of Drosophila melanogaster , indicating a bright blue band of auto-fluorescence consistent with the presence of resilin and the existence of a chord-wise flexure line (dashed arrows). The image shown is characteristic of the 4 wings (from 4 individual female flies) that we processed for confocal microscopy.

Supplementary information

Supplementary Information

This file contains Supplementary Information, including Supplementary Figs. 1–4.

Reporting Summary

Supplementary Video 1

The left side of a Drosophila thorax, annotated to illustrate the arrangement of wing sclerites and associated musculature. The colour scheme used for the sclerites and muscles is consistent with Fig. 1.

Supplementary Video 2

Animations of seven simulated flight maneuvers shown in world and body frames (forward acceleration, backward acceleration, upward acceleration, downward acceleration, left saccade, right saccade, and sideways flight) generated using the CNN model of the wing hinge and state-space model operating with a MPC loop (see Fig. 5 and Extended Data Fig. 7).

Supplementary Video 3

A previously published free flight maneuver of D. hydei (Muijres et al., 2014), animated in the same format as that used to depict the simulated flight maneuvers in Supplementary Video 2. The sequence provides examples of slow and fast saccades and forward acceleration.

Supplementary Video 4

A previously published free flight maneuver of D. hydei (Muijres et al., 2014), animated in the same format as that used to depict the simulated flight maneuvers in Supplementary Video 2. The sequence provides examples of slow ascent, slow descent, and fast ascent.

Supplementary Video 5

A previously published free flight maneuver of D. hydei (Muijres et al., 2014), animated in the same format as that used to depict the simulated flight maneuvers in Supplementary Video 2. The sequence provides examples of slow ascent with sideslip and a saccade.

Supplementary Video 6

A previously published free flight maneuver of D. hydei (Muijres et al., 2014), animated in the same format as that used to depict the simulated flight maneuvers in Supplementary Video 2. The sequence provides an example of a very fast saccade.

Supplementary Video 7

A previously published free flight maneuver of D. hydei (Muijres et al., 2014), animated in the same format as that used to depict the simulated flight maneuvers in Supplementary Video 2. The sequence provides an example of backwards flight.

Supplementary Video 8

A previously published free flight maneuver of D. hydei (Muijres et al., 2014), animated in the same format as that used to depict the simulated flight maneuvers in Supplementary Video 2. The sequence provides examples of forward flight with ascent and descent.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Melis, J.M., Siwanowicz, I. & Dickinson, M.H. Machine learning reveals the control mechanics of an insect wing hinge. Nature 628 , 795–803 (2024). https://doi.org/10.1038/s41586-024-07293-4

Received : 24 July 2023

Accepted : 11 March 2024

Published : 17 April 2024

Issue Date : 25 April 2024

DOI : https://doi.org/10.1038/s41586-024-07293-4


JMLR Papers

Select a volume number to see its table of contents with links to the papers.

Volume 25 (January 2024 - Present)

Volume 24 (January 2023 - December 2023)

Volume 23 (January 2022 - December 2022)

Volume 22 (January 2021 - December 2021)

Volume 21 (January 2020 - December 2020)

Volume 20 (January 2019 - December 2019)

Volume 19 (August 2018 - December 2018)

Volume 18 (February 2017 - August 2018)

Volume 17 (January 2016 - January 2017)

Volume 16 (January 2015 - December 2015)

Volume 15 (January 2014 - December 2014)

Volume 14 (January 2013 - December 2013)

Volume 13 (January 2012 - December 2012)

Volume 12 (January 2011 - December 2011)

Volume 11 (January 2010 - December 2010)

Volume 10 (January 2009 - December 2009)

Volume 9 (January 2008 - December 2008)

Volume 8 (January 2007 - December 2007)

Volume 7 (January 2006 - December 2006)

Volume 6 (January 2005 - December 2005)

Volume 5 (December 2003 - December 2004)

Volume 4 (Apr 2003 - December 2003)

Volume 3 (Jul 2002 - Mar 2003)

Volume 2 (Oct 2001 - Mar 2002)

Volume 1 (Oct 2000 - Sep 2001)

Special Topics

Bayesian Optimization

Learning from Electronic Health Data (December 2016)

Gesture Recognition (May 2012 - present)

Large Scale Learning (Jul 2009 - present)

Mining and Learning with Graphs and Relations (February 2009 - present)

Grammar Induction, Representation of Language and Language Learning (Nov 2010 - Apr 2011)

Causality (Sep 2007 - May 2010)

Model Selection (Apr 2007 - Jul 2010)

Conference on Learning Theory 2005 (February 2007 - Jul 2007)

Machine Learning for Computer Security (December 2006)

Machine Learning and Large Scale Optimization (Jul 2006 - Oct 2006)

Approaches and Applications of Inductive Programming (February 2006 - Mar 2006)

Learning Theory (Jun 2004 - Aug 2004)

Special Issues

In Memory of Alexey Chervonenkis (Sep 2015)

Independent Components Analysis (December 2003)

Learning Theory (Oct 2003)

Inductive Logic Programming (Aug 2003)

Fusion of Domain Knowledge with Data for Decision Support (Jul 2003)

Variable and Feature Selection (Mar 2003)

Machine Learning Methods for Text and Images (February 2003)

Eighteenth International Conference on Machine Learning (ICML2001) (December 2002)

Computational Learning Theory (Nov 2002)

Shallow Parsing (Mar 2002)

Kernel Methods (December 2001)

Computer Science > Machine Learning

Title: Brain Storm Optimization Based Swarm Learning for Diabetic Retinopathy Image Classification

Abstract: The application of deep learning techniques to medical problems has garnered widespread research interest in recent years, such as applying convolutional neural networks to medical image classification tasks. However, data in the medical field is often highly private, preventing different hospitals from sharing data to train an accurate model. Federated learning, as a privacy-preserving machine learning architecture, has shown promising performance in balancing data privacy and model utility by keeping private data on the client's side and using a central server to coordinate a set of clients for model training through aggregating their uploaded model parameters. Yet, this architecture heavily relies on a trusted third-party server, which is challenging to achieve in real life. Swarm learning, as a specialized decentralized federated learning architecture that does not require a central server, utilizes blockchain technology to enable direct parameter exchanges between clients. However, the mining of blocks requires significant computational resources, limiting its scalability. To address this issue, this paper integrates the brain storm optimization algorithm into the swarm learning framework, named BSO-SL. This approach clusters similar clients into different groups based on their model distributions. Additionally, leveraging the architecture of BSO, clients are given the probability to engage in collaborative learning both within their cluster and with clients outside their cluster, preventing the model from converging to local optima. The proposed method has been validated on a real-world diabetic retinopathy image classification dataset, and the experimental results demonstrate the effectiveness of the proposed approach.
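
To make the clustering-plus-probabilistic-exchange idea concrete, below is a minimal Python sketch of one aggregation round: clients are grouped by the similarity of their model parameter vectors, and each client then averages with a partner drawn from inside or outside its own cluster with some probability. The clustering method (k-means), the exchange probability, and the plain parameter averaging are simplifying assumptions for illustration; they are not the exact BSO-SL procedure, and the blockchain layer is omitted entirely.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

n_clients, n_params, n_clusters = 12, 100, 3
p_within = 0.8                      # probability of exchanging within the own cluster
params = rng.normal(size=(n_clients, n_params))   # stand-in model parameter vectors

def aggregation_round(params):
    """One simplified round: cluster clients, then pairwise-average parameters."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(params)
    new_params = params.copy()
    for i in range(n_clients):
        same = [j for j in range(n_clients) if labels[j] == labels[i] and j != i]
        other = [j for j in range(n_clients) if labels[j] != labels[i]]
        # Prefer a partner from the same cluster, but occasionally reach outside it
        # so the population does not collapse onto local optima.
        if same and (rng.random() < p_within or not other):
            partner = rng.choice(same)
        else:
            partner = rng.choice(other if other else same)
        new_params[i] = 0.5 * (params[i] + params[partner])
    return new_params

for round_idx in range(5):
    params = aggregation_round(params)
print("parameter spread after 5 rounds:", float(params.std()))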


