
  • Open access
  • Published: 01 May 2024

Novel applications of Convolutional Neural Networks in the age of Transformers

  • Tansel Ersavas 1,
  • Martin A. Smith 1,2,3,4 &
  • John S. Mattick 1

Scientific Reports volume 14, Article number: 10000 (2024)


  • Computational science
  • Machine learning

Convolutional Neural Networks (CNNs) have been central to the Deep Learning revolution and played a key role in initiating the new age of Artificial Intelligence. However, in recent years newer architectures such as Transformers have come to dominate both research and practical applications. While CNNs still play critical roles in many newer developments such as Generative AI, they are far from being thoroughly understood or utilised to their full potential. Here we show that CNNs can recognise patterns in images with scattered pixels, and can be used to analyse complex, high-dimensional datasets by transforming them into pseudo-images with minimal processing, representing a more general approach to the application of CNNs to datasets such as those in molecular biology, text, and speech. We introduce a pipeline called DeepMapper, which allows analysis of very high-dimensional datasets without intermediate filtering and dimension reduction, thus preserving the full texture of the data and enabling detection of small variations normally dismissed as 'noise'. We demonstrate that DeepMapper can identify very small perturbations in large datasets composed mostly of random variables, and that it is superior in speed, and on par in accuracy, with prior work in processing large datasets with large numbers of features.


Introduction

Data are increasing exponentially 1, especially from highly complex systems whose non-linear interactions and relationships are not well understood, and which can display major or unexpected changes in response to small perturbations, known as the 'butterfly effect' 2.

In domains characterised by high-dimensional data, traditional statistical methods and Machine Learning (ML) techniques make heavy use of feature engineering, incorporating extensive filtering, selection of highly variable parameters, and dimension-reduction techniques such as Principal Component Analysis (PCA) 3. Most current tools filter out smaller changes in the data, which are mostly considered artefacts or 'noise' but may contain information that is paramount to understanding the nature and behaviour of such highly complex systems 4.
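As a minimal illustration of the cost of variance-based reduction, the following sketch (assuming scikit-learn; the shift size and shapes are arbitrary) shows how a small but real class difference carries almost no variance and is therefore prone to being discarded:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 500))   # 500 mostly uninformative variables
X[:500, 3] += 0.05                 # a tiny, real difference in one variable for half the samples

pca = PCA(n_components=50)         # keep only the 50 highest-variance directions
X_reduced = pca.fit_transform(X)

# The 0.05 shift adds almost nothing to total variance, so the direction
# carrying it is likely to be discarded along with the other 450 components.
print(pca.explained_variance_ratio_[:5])
```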

The emergence of Deep Learning (DL) offers a paradigm shift. DL algorithms, underpinned by adaptive learning mechanisms, can discern both linear and non-linear data intricacies, and open avenues to analyse data in ways that are not possible or practical with conventional techniques 5, particularly in complex domains such as image and temporal sequence analysis, molecular biology, and astronomy 6. DL models, such as Convolutional Neural Networks (CNNs) 7, Recurrent Neural Networks (RNNs) 8, Generative Networks 9 and Transformers 10, have demonstrated exceptional performance in various domains, such as image and speech recognition, natural language processing, and game playing 6. CNNs and LSTMs have also proven to be effective tools for predicting the behaviour of so-called 'chaotic' systems 11. Modern DL systems often surpass human-level performance and challenge humans even in creative endeavours.

CNNs utilise a unique architecture comprising several layers, including convolutional layers, pooling layers, and fully connected layers, to process and transform the input data hierarchically 5. CNNs have no knowledge of sequence and are therefore generally not used in analysing time-series or similar data, which is traditionally attempted with Recurrent Neural Networks (RNNs) 12 and Long Short-Term Memory networks (LSTMs) 8 because of their ability to capture temporal patterns. Where CNNs have been employed for sequence or time-series analysis, 1-dimensional (1D) CNNs have been selected because of their vector-based 1D input structure 13. However, attempts to analyse such data with 1D CNNs do not always give superior results 14. In addition, GPUs (Graphics Processing Units) are not always optimised for processing 1D CNNs; therefore, even though 1D CNNs have fewer parameters than 2-dimensional (2D) CNNs, 2D CNNs can outperform them 15.
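A minimal PyTorch sketch contrasting the two conventions; the shapes are arbitrary, and the 2D version simply folds the same values into a matrix:

```python
import torch
import torch.nn as nn

signal = torch.randn(8, 1, 1024)          # batch of 8 one-channel sequences
conv1d = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=7, padding=3)
print(conv1d(signal).shape)               # torch.Size([8, 16, 1024])

image = signal.view(8, 1, 32, 32)         # the same values folded into 2D 'images'
conv2d = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
print(conv2d(image).shape)                # torch.Size([8, 16, 32, 32])
```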

Transformers, introduced by Vaswani et al. 10, have recently come to prominence, particularly for tasks where data are in the form of time series or sequences, in domains ranging from language modelling to stock market prediction 16. Transformers leverage self-attention, a key component that allows a model to weigh and focus on various parts of an input sequence when producing an output, enabling the capture of long-range dependencies in data. Unlike CNNs, which use local receptive fields, self-attention weighs the significance of all parts of the input data 17.
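For reference, the scaled dot-product attention at the core of the Transformer 10 computes, for query, key and value projections \(Q\), \(K\), \(V\) of the input:

\[
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
\]

where \(d_k\) is the key dimension; every output position is a weighted mixture of all input positions, which is what allows long-range dependencies to be captured directly.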

Following success with sequence-based tasks, Transformers are being extended to image processing. Vision Transformers in object detection 18, Detection Transformers 19, and lately Real-time Detection Transformers 20 all claim superiority over CNNs. However, their inference operations demand far more resources than CNNs, and they trail CNNs in flexibility. They also suffer from augmentation problems similar to those of CNNs. More recently, Retentive Networks have been offered as an alternative to Transformers 21 and may soon challenge the Transformer architecture.

CNNs can recognise dispersed patterns

Even though CNNs are widely used, there are some misconceptions, notably that CNNs are largely limited to image data, and require established spatial relationships between pixels in images, both of which are open to challenge. The latter is of particular importance when considering the potential of CNNs to analyse complex non-image datasets, whose data structures are arbitrary.

Moreover, while CNNs are universal function approximators 22, they may not always generalise 23, especially if they are trained on data that is insufficient to cover the solution space 24. It is also known that they can spontaneously generalise after overfitting, even when supplied with a small number of training samples, a phenomenon called 'grokking' 25,26. CNNs can generalise from scattered data if given enough samples, or if they grok, and this can be determined by observing changes in training versus testing accuracy and loss.

Non-image processing with CNNs

While CNNs have achieved remarkable success in computer vision applications such as image classification and object detection 7,27, they have also been employed, to a lesser degree but with impressive results, in other domains, including: (1) natural language processing, text classification, sentiment analysis and named entity recognition, by treating text data as a one-dimensional image with characters represented as pixels 16,28; (2) audio processing, such as speech recognition, speaker identification and audio event detection, by applying convolutions over time-frequency representations of audio signals 29; (3) time-series analysis, such as financial market prediction, human activity recognition and medical signal analysis, using one-dimensional convolutions to capture local temporal patterns and learn features from time-series data 30; and (4) biopolymer (e.g., DNA) sequencing, using 2D CNNs to accurately classify molecular barcodes in raw signals from Oxford Nanopore sequencers by transforming a 1D signal into 2D images, improving barcode identification recovery from 38% to over 85% 31.

Indeed, CNNs are not perfect tools for image processing, as they do not develop a semantic understanding of images even though they can be trained to perform semantic segmentation 32. They cannot easily recognise negative images when trained on positive images 33. CNNs are also sensitive to the orientation and scale of objects and must rely on augmentation of image datasets, often involving hundreds of variations of the same image 34. There are no such changes in perspective and orientation in data converted into flat 2D images.

In complex domains that generate huge amounts of data, augmentation is usually not required for non-image datasets, as the datasets will be rich enough. Moreover, introducing arbitrary augmentation does not always improve accuracy; indeed, introducing hand-tailored augmentation may hinder analysis 35. If augmentation is required, it can be introduced in a data-oriented form, but even when using automated augmentation such as AutoAugment 35 or Faster AutoAugment 36, many of the augmentations (such as shearing, translation, rotation, inversion, etc.) should not be used, and the result should be tested carefully, as augmentation may introduce artefacts.

A frequent problem with handling non-image datasets with many variables is noise. Many algorithms have been developed for noise elimination, most of which are domain specific. CNNs can be trained to use the whole input space with minimal filtering and no dimension reduction, and can find useful information in what might be ascribed as ‘noise’ 4 , 37 . Indeed, a key reason to retain ‘noise’ is to allow discovery of small perturbations that cannot be detected by other methods 11 .

Conversion of non-image data to artificial images for CNN processing

Transforming sequence data to images without resorting to dimension reduction or filtering offers a potent toolset for discerning complex patterns in time-series and sequence data, and exploits the two major advantages of CNNs over RNNs, LSTMs and Transformers. First, CNNs do not depend on past data to recognise current patterns, which increases sensitivity to patterns that appear at the beginning of time-series or sequence data. Second, 2D CNNs are better optimised for GPUs and are highly parallelisable, and consequently faster than other current architectures, which accelerates training and inference while significantly reducing resource and energy consumption across all phases, including image transformation, training, and inference.

Image data such as MNIST, represented as a matrix, can be classified by basic deep networks such as Multi-Layer Perceptrons (MLPs) by turning the matrix representation into a vector (Fig. 1a). With this approach, analysis becomes increasingly complex as the image size grows, rapidly increasing the number of MLP input parameters and the computational cost. In contrast, 2D CNNs can handle the original matrix much faster than an MLP, with equal or better accuracy, and scale to much larger images.

Figure 1

Conversion of images to vectors and vice versa. (a) Basic operation of transforming an image to a vector, forming a sequence representation of the numeric values of pixels. (b) Transforming a vector to a matrix, forming an image by encoding numerical values as pixels. During this operation, if the vector size is smaller than the nearest m × n, the vector is padded with zeroes to fill the m × n matrix.

Just as a simple neural network analyses a 2D image by turning it into a vector, the reverse is also possible: data in a vector can be converted to a 2D matrix (Fig. 1b). Vectors converted to such matrices form arbitrary patterns that are incomprehensible to the human eye. A similar technique for such mapping has also been proposed by Kovalerchuk et al., using another algorithm called CPC-R 38.
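A minimal NumPy sketch of the Fig. 1b folding operation, assuming a square target matrix for simplicity (Fig. 1b allows a general m × n; the repository code 50 is the authoritative implementation):

```python
import numpy as np

def fold_to_matrix(vector: np.ndarray) -> np.ndarray:
    """Fold a 1D feature vector into the smallest square matrix that holds it,
    zero-padding the tail as described for Fig. 1b."""
    n = len(vector)
    side = int(np.ceil(np.sqrt(n)))
    padded = np.zeros(side * side, dtype=vector.dtype)
    padded[:n] = vector
    return padded.reshape(side, side)

image = fold_to_matrix(np.random.rand(18225))   # 18225 = 135 x 135, no padding needed
print(image.shape)                              # (135, 135)
```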

Attribution

An important aspect of any analysis is to be able to identify those variables that are most important and the degree to which they contribute to a given classification. Identifying these variables is particularly challenging in CNNs due to their complex hierarchical architecture, and many non-linear transformations 39 . To address this problem many ‘attribution methods’ have been developed to try to quantify the contribution of each variable (e.g., pixels in images) to the final output for deep neural networks and CNNs 40 .

Saliency maps serve as an intuitive attribution and visualisation tool for CNNs, spotlighting regions in input data that significantly influence the model's predictions 27 . By offering a heatmap representation, these maps illuminate key features that the model deems crucial, thus aiding in demystifying the model's decision-making process. For instance, when analysing an image of a cat, the saliency map would emphasise the cat's distinct features over the background. While their simplicity facilitates understanding even for those less acquainted with deep learning, saliency maps do face challenges, particularly their sensitivity to noise and occasional misalignment with human intuition 41 , 42 , 43 . Nonetheless, they remain a pivotal tool in enhancing model transparency and bridging the interpretability gap between ML models and human comprehension.

Several methods have been proposed for attribution, including Guided Backpropagation 44, Layer-wise Relevance Propagation 45, Gradient-weighted Class Activation Mapping 46, Integrated Gradients 47, DeepLIFT 48, and SHAP (SHapley Additive exPlanations) 49. Many of these methods were developed because it is challenging to identify important input features when different images with the same label (e.g., 'bird', with many species) are presented at different scales, colours, and perspectives. In contrast, most non-image data do not have such variations, as each pixel corresponds to the same feature. For this reason, attribution methods with minimal processing are sufficient to identify the salient input variables that have the maximal impact on classification.
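As a hedged sketch of such minimally processed attribution, assuming Captum's Integrated Gradients 47,51 and a placeholder (untrained) ResNet18 classifier over pseudo-images:

```python
import torch
from captum.attr import IntegratedGradients
from torchvision.models import resnet18

model = resnet18(num_classes=2)   # untrained placeholder classifier
model.eval()

x = torch.randn(1, 3, 135, 135, requires_grad=True)   # one pseudo-image
ig = IntegratedGradients(model)
attributions = ig.attribute(x, target=1)   # per-pixel contribution to class 1

# Each pixel maps to exactly one input feature, so ranking |attributions|
# directly ranks the input variables by influence on the prediction.
top = attributions.abs().flatten().topk(10).indices
```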

Here we introduce a new analytical pipeline, DeepMapper, which applies a non-indexed or indexed mapping to the data, representing each data point with one pixel, enabling the classification or clustering of data using 2D CNNs. This simple, direct mapping has been tried by others but has not previously been tested on sufficiently large datasets under varied conditions. We use raw data with minimal filtering and no dimension reduction to preserve the small perturbations that are normally removed, in order to assess their impact.

The pipeline comprises data conversion, separation into training and validation sets, assessment of training quality, attribution, and accumulation of results. It is run multiple times until a consensus is reached. The significant variables can then be identified using attribution and exported appropriately.

The DeepMapper architecture is shown in Fig. 2. The complete algorithm of DeepMapper is detailed in the "Methods" section, and the Python source code is available on GitHub 50.

Figure 2

DeepMapper architecture. DeepMapper uses sequence or multi-variate data as input. The first step of DeepMapper is to merge and, if required, index input files to prepare them in matrix format. The data are normalised using log normalisation, then folded into a matrix. Folding is performed either directly, in the natural order of the data, or by using the index that is generated or supplied during data import. After folding, the data are kept in temporary storage and separated into 'train' and 'test' sets using scikit-learn's train_test_split. Training is done either with CNNs supplied by the PyTorch libraries or with a custom CNN (ResNet18 is used by default). Intermediary results are run through attribution algorithms supplied by the Captum library 51 and saved to the run history log. The run is then repeated, shuffling the training, testing, and validation data, until convergence is achieved or a pre-determined number of iterations is performed. Results are summarised in a report with exportable tables and graphics. Attribution is applied to true positives and true negatives, and these are translated back to features to be added to the reports. Further details can be found directly in the accompanying code 50.

DeepMapper implements an approach for processing high-dimensional data without resorting to the extensive filtering and dimension-reduction techniques that eliminate smaller perturbations, enabling identification of differences that would otherwise be filtered out. The following algorithm is used to achieve this result (a minimal sketch of the iteration step follows the list):

1. Read and set up the running parameters.

2. Read the data into tabulated form as observations, features, and outcome (labels or, if self-supervised, the input itself).

3. If the input data include categorical features, convert them to numbers and normalise them before feeding them to DeepMapper.

4. Identify features and labels.

5. Perform only basic filtering, eliminating observations or features whose values are all 0 or empty.

6. Normalise features.

7. Transform the tabulated data to 2-dimensional matrices, as illustrated in Fig. 1b, by applying a vector-to-matrix transformation.

8. If the analysis is supervised, transform class labels to output matrices.

9. Begin iteration:
   • Separate the data into training and validation groups.
   • Train on the dataset for the required number of epochs, until satisfactory testing accuracy and loss are reached or a pre-determined maximum number of iterations is performed.
   • If satisfactory testing results are obtained, perform attributions by associating each result with its contributing input pixels using Captum, a Python library for attributions 51, and accumulate the attribution results for each class.

10. If training is satisfactory, tabulate the attribution results by averaging the accumulated attributions, save the model, and report the results.
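A minimal sketch of the iteration step (step 9), assuming PyTorch and scikit-learn; the function name and defaults below are illustrative, not the actual DeepMapper API 50:

```python
import torch
from sklearn.model_selection import train_test_split

def deepmapper_iteration(images, labels, model, epochs=50, lr=1e-3):
    """One pipeline iteration: shuffle/split, train, return test accuracy.
    `images` is an (N, C, H, W) array of folded pseudo-images."""
    X_tr, X_te, y_tr, y_te = train_test_split(images, labels, test_size=0.25)
    X_tr = torch.as_tensor(X_tr, dtype=torch.float32)
    X_te = torch.as_tensor(X_te, dtype=torch.float32)
    y_tr = torch.as_tensor(y_tr, dtype=torch.long)
    y_te = torch.as_tensor(y_te, dtype=torch.long)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):                 # full-batch training for brevity
        opt.zero_grad()
        loss = loss_fn(model(X_tr), y_tr)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (model(X_te).argmax(1) == y_te).float().mean().item()
```

In the full pipeline this call is repeated with fresh splits until the accuracies converge, and attribution is then run on the trained model.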

The results of DeepMapper analysis can be used in two ways:

Supervised: DeepMapper produces a list of features that played a prominent role in the differentiation of classes.

Self-supervised: DeepMapper highlights the most important features differentiating observations from each other in a non-linear fashion. The output can be used as an alternative feature-selection tool for dimension reduction.

In both modes, any hidden layer can be examined as a latent space. A special bottleneck layer can be introduced to reduce dimensions for clustering purposes.

We present a simple example to demonstrate that CNNs can readily interpret data with a well-dispersed pattern of pixels, using the MNIST dataset, which is widely used for hand-written digit recognition and which humans, as well as CNNs, can easily recognise and classify based on the obvious spatial relationships between pixels (Fig. 3). Our shuffled version is a more complicated problem than datasets such as the Gisette dataset 52, which was developed to distinguish between the digits 4 and 9: it includes all digits and uses a full randomisation of pixels. It can be regenerated with the script supplied 50, and changing the seed will generate different patterns.

Figure 3

A sample from the MNIST dataset (left side of each image) and its shuffled counterpart (right side).

We randomly shuffled the data in Fig. 3 using the same seed 50 to obtain 60,000 training images, such as those shown on the right side of each digit, and validated the results with a separate batch of 20,000 images (Fig. 3). Although the resulting images are no longer recognisable by eye, a CNN has no difficulty distinguishing and classifying each pattern, with ~2% testing error compared to the reference data (Fig. 4). This result demonstrates that CNNs can accurately recognise global patterns in images without relying on local relationships between neighbouring pixels. It also confirms the finding that shuffling images only marginally increases training loss 23, and extends it to testing loss (Fig. 4).
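A minimal sketch of the shuffling procedure, assuming torchvision's MNIST loader and an arbitrary seed (the published script 50 is the authoritative version):

```python
import torch
from torchvision import datasets, transforms

mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())

g = torch.Generator().manual_seed(42)     # fixed seed: same permutation for every image
perm = torch.randperm(28 * 28, generator=g)

def shuffle_pixels(img: torch.Tensor) -> torch.Tensor:
    """Apply one fixed pixel permutation, destroying local structure
    while preserving the global (dispersed) pattern."""
    return img.view(-1)[perm].view(1, 28, 28)

shuffled = shuffle_pixels(mnist[0][0])
```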

Figure 4

Results of training on the MNIST dataset (a) and the shuffled dataset (b) with the PyTorch model ResNet18 50. The charts demonstrate that although training continued for 50 epochs, about 15 epochs would have been enough for the shuffled images (b), as further training starts to cause overfitting. The decrease in accuracy between normal and shuffled images is about 3%, and this difference cannot be improved by using more sophisticated CNNs with more layers, indicating that shuffling images causes a measurable loss of information, yet the shuffled images still hold patterns recognisable by CNNs.

Testing DeepMapper

Finding slight changes in very few variables in otherwise seemingly random datasets with large numbers of variables is like finding a needle in a haystack. Such differences in data are almost impossible to detect using traditional analysis tools because small variations are usually filtered out before analysis.

We devised a simple test case to determine whether DeepMapper can detect one or more variables with small but distinct variations in otherwise randomly generated data. We generated a dataset of 10,000 items with 18,225 numeric variables as an example of a high-dimensional dataset, using PyTorch's uniform random number generator 53. The algorithm sets 18,223 of these variables to random numbers in the range 0–1, and sets the remaining two variables to two distinct distributions, as shown in Table 1.

We call this type of dataset a 'needle in a haystack' (NIHS) dataset: a very small amount of data with small variance is hidden among a set of random variables that is orders of magnitude larger than the meaningful components. We provide a script that can generate this and similar datasets with the supplied source 50.
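A minimal sketch of such a generator, assuming PyTorch's uniform random number generator 53; the shifts given to the two 'needle' variables below are illustrative, not the exact values of Table 1:

```python
import torch

def make_nihs(n_items=10000, n_features=18225, seed=0):
    """'Needle in a haystack': all features uniform random except two,
    which differ slightly between the two classes (illustrative shifts)."""
    torch.manual_seed(seed)
    X = torch.rand(n_items, n_features)        # noise in [0, 1)
    y = (torch.arange(n_items) % 2).long()     # two balanced classes
    n1 = int((y == 1).sum())
    X[y == 1, 0] = 0.45 + 0.05 * torch.rand(n1)   # needle 1
    X[y == 1, 1] = 0.55 + 0.05 * torch.rand(n1)   # needle 2
    return X, y

X, y = make_nihs()
```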

DeepMapper was able to accurately classify the two classes (Fig. 5). Furthermore, using attribution, DeepMapper was also able to determine the two variables whose variances differ between the two classes. Note that DeepMapper may not always find all the changes on the first attempt, as the initialisation of neural network weights is a stochastic process. However, DeepMapper overcomes this by running multiple iterations to establish acceptable training and testing accuracies, as described in the Methods.

Figure 5

In this demonstration of the analysis of high-dimensional data with very small perturbations, DeepMapper finds small variations in a few variables (in this example, two) out of a very large number of random variables (here 18,225). (a) DeepMapper representations of each record. (b) The result of the test run of the classification with unseen data (3750 elements). (c) The first and second variables in the graph are measurably higher than the other variables.

Comparison of DeepMapper with DeepInsight

DeepInsight 54 is the most general approach published to date for converting non-image data into image-like structures, with the claim that these processed structures allow CNNs to capture complex patterns and features in the data. DeepInsight offers an algorithm that collates similar features into a 'well organised image form' by applying one of several dimensionality-reduction algorithms (e.g., t-SNE, PCA or KPCA) 54. However, these algorithms add computational complexity, potentially eliminate valuable information, limit the ability of CNNs to find small perturbations, and make it more difficult to use attribution to determine the most notable features impacting the analysis, as multiple features may overlap in the transformed image. In contrast, DeepMapper uses a direct mapping mechanism in which each feature corresponds to one pixel.

To identify important input variables, the DeepInsight authors later developed DeepFeature 55, which uses an elaborate mechanism to associate image areas identified by attribution methods with the input variables. DeepMapper uses a simpler approach: as each pixel corresponds to only one variable, it can use any of the attribution methods to link results to its input space. While both DeepMapper and DeepInsight follow the general idea that non-image data can be processed with 2D CNNs, DeepMapper uses a much simpler and faster algorithm, whereas DeepInsight employs a sophisticated set of algorithms to convert non-image data to images, dramatically increasing computational cost. The DeepInsight conversion process is not designed to utilise GPUs, so it cannot be accelerated by better hardware, and the resulting images may be larger than the number of data points, further impacting performance.

One of the biggest differences between DeepFeature and DeepMapper is that DeepFeature in many cases selects multiple features during attribution, because DeepInsight pixels represent multiple values, whereas each DeepMapper pixel represents one input feature; DeepMapper can therefore determine differentiating features with pinpoint accuracy, at a resolution of one pixel per feature.

The DeepInsight manuscript offers various examples to demonstrate its abilities. However, many of the examples use low dimensionality (20–4000 features), whereas today's complex datasets regularly require tens of thousands to millions of features, as in genome analysis in biology and radio-telescope analysis in astronomy. As such, several examples provided by DeepInsight have insufficient dimensionality for a mechanism such as DeepMapper, which is aimed at the 10,000 or more dimensions required by modern complex datasets. The DeepInsight examples include a speech dataset from the TIMIT corpus, with 39 dimensions; the Relathe (text) dataset, derived from newsgroup documents partitioned evenly across different newsgroups, with 1427 samples and 4322 dimensions; and ringnorm-DELVE, an implementation of Leo Breiman's ringnorm example, a 20-dimensional, 2-class classification problem with 7400 samples 54. Another example, Madelon, is an artificially generated dataset of 2600 samples and 500 dimensions, in which only 5 principal and 20 derived variables contain information. Instead, we used a much more complicated example than Madelon: the NIHS dataset 50 that we used to test DeepMapper in the first place. We attempted to run DeepInsight with the NIHS data, but we could not get it to train properly, and for this reason we cannot supply a comparison.

The most complex problem published by DeepInsight was the analysis of a public RNA-sequencing gene expression dataset from TCGA (https://cancergenome.nih.gov/) containing 6216 samples of 60,483 genes or dimensions, of which DeepInsight used 19,319. We selected this example as a second demonstration of the application of DeepMapper to high-dimensional data, as well as a benchmark for comparison with DeepInsight.

We generated the data using the R script offered by DeepInsight 54 and ran both DeepMapper and DeepInsight on the generated dataset to compare accuracy and speed. In this test, DeepMapper exhibited much improved processing speed with near-identical accuracy (Table 2, Fig. 6).

Figure 6

Analysis of TCGA data by DeepInsight vs DeepMapper. The image on the top was generated by DeepInsight using its default values and the t-SNE transformer supplied by DeepInsight. The image at the bottom was generated by DeepMapper. Image conversion and training speeds, and the analysis results, can be found in Table 2.

CNNs are fundamentally sophisticated pattern matchers that can establish intricate mappings between input features and output representations 6 . They excel at transforming various inputs into outputs, including identifying classes or bounding boxes, through a series of operations involving convolution, pooling, and activation functions 7 , 56 .

Even though CNNs are at the centre of many of today's revolutionary AI systems, from self-driving cars to generative AI systems such as DALL-E 2, MidJourney and Stable Diffusion, they are still not well understood or efficiently utilised, and their use beyond image analysis has been limited.

While the CNNs used in image analysis are historically and practically constrained to a 224 × 224 matrix or a similar fixed-size input, this limitation applies to pre-trained models. When CNNs are not pre-trained, a much wider variety of input shapes can be selected, depending on the CNN architecture. Some CNNs are more flexible in their input size because they are implemented with adaptive pooling layers, such as ResNet18 with its adaptive pooling head 57. This provides the flexibility to choose optimal input sizes for the task at hand, as most non-image applications will not use pre-trained CNNs.
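A small sketch of that flexibility, using torchvision's (untrained) ResNet18, whose adaptive average-pooling head 57 maps a range of input sizes to the same output shape:

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=2)   # not pre-trained, so input size is not fixed
model.eval()

for side in (64, 135, 224):       # all map to the same 2-class output
    x = torch.randn(1, 3, side, side)
    with torch.no_grad():
        print(side, model(x).shape)   # torch.Size([1, 2]) in each case
```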

Here we have demonstrated uses of CNNs that are outside the norm. There is a need for analysis of complex data with many thousands of features that are not primarily images, and there is a lack of tools that offer minimal conversion of non-image data to image-like formats that can then easily be processed with CNNs for classification and clustering tasks. As much of these data come from complex systems with many features, DeepMapper offers a way of investigating such data that may not be possible with traditional approaches.

Although DeepMapper currently uses a CNN as its AI component, alternative analytic strategies, such as Vision Transformers 18 or RetNets 21, can easily be substituted for the CNN with minimal changes, and have great potential for this application. While Transformers and RetNets have input-size limitations for inference in terms of the number of tokens, Vision Transformers can handle much larger inputs by dividing images into segments that incorporate multiple pixels 18. This type of approach is applicable to Transformers, RetNets, and future architectures, and DeepMapper can leverage these newer architectures, and others, in the future 57.

Data availability

DeepMapper is released as an open-source tool on GitHub: https://github.com/tansel/deepmapper. Data that are not available from GitHub because of size constraints can be requested from the authors.

Taylor, P. Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025. https://www.statista.com/statistics/871513/worldwide-data-created/ (2023).

Ghys, É. The butterfly effect. in The Proceedings of the 12th International Congress on Mathematical Education: Intellectual and attitudinal challenges, pp. 19–39 (Springer). (2015).

Jolliffe, I. T. Mathematical and statistical properties of sample principal components. Principal Component Analysis , pp. 29–61 (Springer). https://doi.org/10.1007/0-387-22440-8_3 (2002).

Landauer, R. The noise is the signal. Nature 392 , 658–659. https://doi.org/10.1038/33551 (1998).


Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press). http://www.deeplearningbook.org (2016).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444. https://doi.org/10.1038/nature14539 (2015).


Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60 , 84–90. https://doi.org/10.1145/3065386 (2017).


Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9 , 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735 (1997).


Goodfellow, I. et al. Generative adversarial nets. Commun. ACM 63 , 139–144. https://doi.org/10.1145/3422622 (2020).

Vaswani, A. et al. Attention is all you need. NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems , pp. 6000–6010. https://doi.org/10.5555/3295222.3295349 (2017).

Barrio, R. et al. Deep learning for chaos detection. Chaos 33 , 073146. https://doi.org/10.1063/5.0143876 (2023).


Levin, E. A recurrent neural network: limitations and training. Neural Netw. 3 , 641–650. https://doi.org/10.1016/0893-6080(90)90054-O (1990).

LeCun, Y. & Bengio, Y. Convolutional networks for images, speech, and time series. in The handbook of brain theory and neural networks, pp. 255–258. https://doi.org/10.5555/303568.303704 (MIT Press, 1998).

Wu, Y., Yang, F., Liu, Y., Zha, X. & Yuan, S. A comparison of 1-D and 2-D deep convolutional neural networks in ECG classification. arXiv preprint arXiv:1810.07088 . https://doi.org/10.48550/arXiv.1810.07088 (2018).

Hu, J. et al. A multichannel 2D convolutional neural network model for task-evoked fMRI data classification. Comput. Intell. Neurosci. 2019 , 5065214. https://doi.org/10.1155/2019/5065214 (2019).


Zhang, S. et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 44 , e32. https://doi.org/10.1093/nar/gkv1025 (2016).


Maurício, J., Domingues, I. & Bernardino, J. Comparing vision transformers and convolutional neural networks for image classification: A literature review. Appl. Sci. 13 , 5521. https://doi.org/10.3390/app13095521 (2023).


Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 . https://doi.org/10.48550/arXiv.2010.11929 (2020).

Carion, N. et al. End-to-end object detection with transformers. Computer Vision-ECCV 2020 (Springer), pp. 213–229. https://doi.org/10.1007/978-3-030-58452-8_13 (2020).

Lv, W. et al. DETRs beat YOLOs on real-time object detection. arXiv preprint arXiv:2304.08069 . https://doi.org/10.48550/arXiv.2304.08069 (2023).

Sun, Y. et al. Retentive network: A successor to transformer for large language models. arXiv preprint arXiv:2307.08621 . https://doi.org/10.48550/arXiv.2307.08621 (2023).

Zhou, D.-X. Universality of deep convolutional neural networks. Appl. Comput. Harmonic Anal. 48 , 787–794. https://doi.org/10.1016/j.acha.2019.06.004 (2020).


Chiyuan, Z., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64 , 107–115. https://doi.org/10.1145/3446776 (2021).

Ma, W., Papadakis, M., Tsakmalis, A., Cordy, M. & Traon, Y. L. Test selection for deep learning systems. ACM Trans. Softw. Eng. Methodol. 30 , 13. https://doi.org/10.1145/3417330 (2021).

Liu, Z., Michaud, E. J. & Tegmark, M. Omnigrok: grokking beyond algorithmic data. arXiv preprint arXiv:2210.01117 . https://doi.org/10.48550/arXiv.2210.01117 (2022).

Power, A., Burda, Y., Edwards, H., Babuschkin, I. & Misra, V. Grokking: generalization beyond overfitting on small algorithmic datasets. arXiv preprint arXiv:2201.02177 . https://doi.org/10.48550/arXiv.2201.02177 (2022).

Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 . https://doi.org/10.48550/arXiv.1312.6034 (2013).

Kim, Y. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 . https://doi.org/10.48550/arXiv.1408.5882 (2014).

Abdel-Hamid, O. et al. Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 22 , 1533–1545. https://doi.org/10.1109/TASLP.2014.2339736 (2014).

Hatami, N., Gavet, Y. & Debayle, J. Classification of time-series images using deep convolutional neural networks. in Proceedings Tenth International Conference on Machine Vision (ICMV 2017) 10696 , 106960Y. https://doi.org/10.1117/12.2309486 (2018).

Smith, M. A. et al. Molecular barcoding of native RNAs using nanopore sequencing and deep learning. Genome Res. 30 , 1345–1353. https://doi.org/10.1101/gr.260836.120 (2020).


Emek Soylu, B. et al. Deep-Learning-based approaches for semantic segmentation of natural scene images: A review. Electronics 12 , 2730. https://doi.org/10.3390/electronics12122730 (2023).

Hosseini, H., Xiao, B., Jaiswal, M. & Poovendran, R. On the limitation of Convolutional Neural Networks in recognizing negative images. in 16th IEEE International Conference on Machine Learning and Applications, pp. 352–358. https://ieeexplore.ieee.org/document/8260656 (2017).

Montserrat, D. M., Lin, Q., Allebach, J. & Delp, E. J. Training object detection and recognition CNN models using data augmentation. Electron. Imaging 2017 , 27–36. https://doi.org/10.2352/ISSN.2470-1173.2017.10.IMAWM-163 (2017).

Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V. & Le, Q. V. Autoaugment: learning augmentation policies from data. arXiv preprint arXiv:1805.09501 . https://doi.org/10.48550/arXiv.1805.09501 (2018).

Hataya, R., Zdenek, J., Yoshizoe, K. & Nakayama, H. Faster AutoAugment: Learning augmentation strategies using backpropagation, in Computer Vision–ECCV 2020: 16th European Conference, Proceedings, Part XXV, pp. 1–16 (Springer). https://doi.org/10.1007/978-3-030-58595-2_1 (2020).

Xiao, K., Engstrom, L., Ilyas, A. & Madry, A. Noise or signal: the role of image backgrounds in object recognition. arXiv preprint arXiv:2006.09994 . https://doi.org/10.48550/arXiv.2006.09994 (2020).

Kovalerchuk, B., Kalla, D. C. & Agarwal, B., Deep learning image recognition for non-images, in Integrating artificial intelligence and visualization for visual knowledge discovery (eds. Kovalerchuk, B., et al. ) pp. 63–100 (Springer). https://doi.org/10.1007/978-3-030-93119-3_3 (2022).

Samek, W., Binder, A., Montavon, G., Lapuschkin, S. & Muller, K. R. Evaluating the visualization of what a deep neural network has learned. IEEE Trans. Neural Netw. Learn. Syst. 28 , 2660–2673. https://doi.org/10.1109/tnnls.2016.2599820 (2017).


Montavon, G., Samek, W. & Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digital Signal Process. 73 , 1–15. https://doi.org/10.1016/j.dsp.2017.10.011 (2018).

De Cesarei, A., Cavicchi, S., Cristadoro, G. & Lippi, M. Do humans and deep convolutional neural networks use visual information similarly for the categorization of natural scenes?. Cognit. Sci. 45 , e13009. https://doi.org/10.1111/cogs.13009 (2021).

Kindermans, P.-J. et al. The (un) reliability of saliency methods, in Explainable AI: Interpreting, explaining and visualizing deep learning. Lecture Notes in Computer Science 11700 , pp. 267–280 (Springer). https://doi.org/10.1007/978-3-030-28954-6_14 (2019).

Zeiler, M. D. & Fergus, R. Visualizing and understanding convolutional networks. Computer Vision—ECCV 2014, pp. 818–833 (Fleet, D., Pajdla T., Schiele, B., & Tuytelaars, T., eds) (Springer). https://doi.org/10.1007/978-3-319-10590-1_53 (2014).

Springenberg, J. T., Dosovitskiy, A., Brox, T. & Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806 . https://doi.org/10.48550/arXiv.1412.6806 (2014).

Binder, A., Montavon, G., Lapuschkin, S., Müller, K.-R. & Samek, W. Layer-wise relevance propagation for neural networks with local renormalization layers, in Artificial Neural Networks and Machine Learning–ICANN 2016: Proceedings 25th International Conference on Artificial Neural Networks, pp. 63–71 (Springer). https://doi.org/10.1007/978-3-319-44781-0_8 (2016).

Selvaraju, R. R. et al. Grad-cam: visual explanations from deep networks via gradient-based localization. Proceedings of the 2017 IEEE international conference on computer vision, pp. 618–626. https://ieeexplore.ieee.org/document/8237336 (2017).

Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. in Proceedings of the 34th International Conference on Machine Learning 70, 3319–3328. https://doi.org/10.5555/3305890.3306024 (2017).

Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. in Proceedings of the 34th International Conference on Machine Learning 70 , 3145–3153. https://doi.org/10.5555/3305890.3306006 (2017).

Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. in Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4768–4777. https://doi.org/10.5555/3295222.3295230 (2017).

Ersavas, T. DeepMapper. https://github.com/tansel/deepmapper (2023).

Kokhlikyan, N. et al. Captum: A unified and generic model interpretability library for pytorch. arXiv preprint arXiv:2009.07896 . https://doi.org/10.48550/arXiv.2009.07896 (2020).

Guyon, I., Gunn, S., Ben-Hur, A. & Dror, G. Gisette. UCI Machine Learning Repository. https://archive.ics.uci.edu/dataset/170/gisette (2008).

PyTorch, torch.rand. https://pytorch.org/docs/stable/generated/torch.rand.html (2023).

Sharma, A., Vans, E., Shigemizu, D., Boroevich, K. A. & Tsunoda, T. DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 9 , 11399. https://doi.org/10.1038/s41598-019-47765-6 (2019).


Sharma, A., Lysenko, A., Boroevich, K. A., Vans, E. & Tsunoda, T. DeepFeature: feature selection in nonimage data using convolutional neural network. Brief. Bioinform. 22 , bbab297. https://doi.org/10.1093/bib/bbab297 (2021).

Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 . https://doi.org/10.48550/arXiv.1409.1556 (2014).

PyTorch, AdaptiveAvgPool2d. https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html (2023).


Acknowledgements

We thank Murat Karaorman, Mitchell Cummins, and Fatemeh Vafaee for helpful advice and comments on the manuscript. This research is supported by an Australian Government Research Training Program Scholarships RSAI8000 and RSAP1000 to T.E., a Fonds de Recherche du Quebec Santé Junior 1 Award 284217 to M.A.S., and UNSW SHARP Grant RG193211 to J.S.M.

Author information

Authors and Affiliations

School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW, 2052, Australia

Tansel Ersavas, Martin A. Smith & John S. Mattick

Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, H3C 3J7, Canada

Martin A. Smith

CHU Sainte-Justine Research Centre, Montreal, Canada

UNSW RNA Institute, UNSW Sydney, Australia


Contributions

T.E. developed the methods, implemented DeepMapper and produced the first draft of the paper. J.S.M. provided advice, structured the paper, and edited it for improved readability and clarity. M.A.S. provided advice and edited the paper.

Corresponding authors

Correspondence to Tansel Ersavas or John S. Mattick .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Ersavas, T., Smith, M.A. & Mattick, J.S. Novel applications of Convolutional Neural Networks in the age of Transformers. Sci Rep 14 , 10000 (2024). https://doi.org/10.1038/s41598-024-60709-z

Download citation

Received : 16 January 2024

Accepted : 26 April 2024

Published : 01 May 2024

DOI : https://doi.org/10.1038/s41598-024-60709-z




Learning spiking neuronal networks with artificial neural networks: neural oscillations

  • Published: 17 April 2024
  • Volume 88, article number 65 (2024)



  • Ruilin Zhang 1,2,
  • Zhongyi Wang 1,3,
  • Tianyi Wu 1,3,
  • Yuhang Cai 4,
  • Louis Tao 1,5,
  • Zhuo-Cheng Xiao 6 &
  • Yao Li ORCID: orcid.org/0000-0002-4241-7723 7


First-principles-based models have been extremely successful in providing crucial insights and predictions for complex biological functions and phenomena. However, they can be hard to build and expensive to simulate for complex living systems. On the other hand, modern data-driven methods thrive at modeling many types of high-dimensional and noisy data, yet the training and interpretation of these data-driven models remain challenging. Here, we combine the two types of methods to model stochastic neuronal network oscillations. Specifically, we develop a class of artificial neural networks to provide faithful surrogates for the high-dimensional, nonlinear oscillatory dynamics produced by a spiking neuronal network model. Furthermore, when the training data set is enlarged within a range of parameter choices, the artificial neural networks become generalizable to these parameters, covering cases in distinctly different dynamical regimes. In all, our work opens a new avenue for modeling complex neuronal network dynamics with artificial neural networks.



The different criteria for MFE initiation in Algorithm 1 aim to ensure robust capture of MFEs, given the lack of network-state information within each timestep of the tau-leaping simulations.

Aggarwal CC et al (2018) Neural networks and deep learning, vol 10. Springer, Cham, p 3

Book   Google Scholar  

AlQuraishi M, Sorger PK (2021) Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms. Nat Methods 18(10):1169–1180

Article   Google Scholar  

Andrew Henrie J, Shapley R (2005) LFP power spectra in V1 cortex: the graded effect of stimulus contrast. J Neurophysiol 94(1):479–490

Azouz R, Gray CM (2000) Dynamic spike threshold reveals a mechanism for synaptic coincidence detection in cortical neurons in vivo. Proc Natl Acad Sci 97(14):8110–8115

Azouz R, Gray CM (2003) Adaptive coincidence detection and dynamic gain control in visual cortical neurons in vivo. Neuron 37:513–523

Barron AR (1994) Approximation and estimation bounds for artificial neural networks. Mach Learn 14(1):115–133

Bauer M et al (2006) Tactile spatial attention enhances gamma-band activity in somatosensory cortex and reduces low-frequency activity in parieto-occipital areas. J Neurosci 26(2):490–501

Bauer EP, Paz R, Paré D (2007) Gamma oscillations coordinate Amygdalo-Rhinal interactions during learning. J Neurosci 27(35):9369–9379

Börgers C, Kopell N (2003) Synchronization in networks of excitatory and inhibitory neurons with sparse, random connectivity. Neural Comput 15(3):509–538

Bressloff PC (1994) Dynamics of compartmental model recurrent neural networks. Phys Rev E 50(3):2308

Article   MathSciNet   Google Scholar  

Brosch M, Budinger E, Scheich H (2002) Stimulus-related gamma oscillations in primate auditory cortex. J Neurophysiol 87(6):2715–2725

Brunel N, Hakim V (1999) Fast global oscillations in networks of integrate-and-fire neurons with low firing rates. Neural Comput 11(7):1621–1671

Buice MA, Cowan JD (2007) Field-theoretic approach to fluctuation effects in neural networks. Phys Rev E 75(5):051919

Buschman TJ, Miller EK (2007) Top-down versus bottom-up control of attention in the prefrontal and posterior parietal cortices. Science 315:1860–1862

Cai D et al (2006) Kinetic theory for neuronal network dynamics. Commun Math Sci 4(1):97–127

Cai Y et al (2021) Model reduction captures stochastic Gamma oscillations on low-dimensional manifolds. Front Comput Neurosci 15:74

Chariker L, Young L-S (2015) Emergent spike patterns in neuronal populations. J Comput Neurosci 38(1):203–220

Chariker L, Shapley R, Young L-S (2016) Orientation selectivity from very sparse LGN inputs in a comprehensive model of macaque V1 cortex. J Neurosci 36(49):12368–12384

Chariker L, Shapley R, Young L-S (2018) Rhythm and synchrony in a cortical network model. J Neurosci 38(40):8621–8634

Chon KH, Cohen RJ (1997) Linear and nonlinear ARMA model parameter estimation using an artificial neural network. IEEE Trans Biomed Eng 44(3):168–174

Christof K (1999) Biophysics of computations. Oxford University Press, Oxford

Google Scholar  

Csicsvari J et al (2003) Mechanisms of gamma oscillations in the hippocampus of the behaving rat. Neuron 37:311–322

Erol B (2013) A review of gamma oscillations in healthy subjects and in cognitive impairment. Int J Psychophysiol 90(2):99–117. https://doi.org/10.1016/j.ijpsycho.2013.07.005

Frien A et al (2000) Fast oscillations display sharper orientation tuning than slower components of the same recordings in striate cortex of the awake monkey. Eur J Neurosci 12(4):1453–1465

Fries P et al (2001) Modulation of oscillatory neuronal synchronization by selective visual attention. Science 291:1560–1563

Fries P et al (2008) The effects of visual stimulation and selective visual attention on rhythmic neuronal synchronization in macaque area V4. J Neurosci 28(18):4823–4835

Gerstner W et al (2014) Neuronal dynamics: from single neurons to networks and models of cognition. Cambridge University Press, Cambridg

Ghosh-Dastidar S, Adeli H (2009) Spiking neural networks. Int J Neural Syst 19(04):295–308

Goodfellow IJ, Shlens J, Szegedy C (2014) Explaining and harnessing adversarial examples. In: arXiv preprint arXiv:1412.6572

Hasenauer J et al (2015) Data-driven modelling of biological multi-scale processes. J Coupled Syst Multiscale Dyn 3(2):101–121

He K et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

Hodgkin AL, Huxley AF (1952) A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol 117(4):500

Jack RE, Crivelli C, Wheatley T (2018) Data-driven methods to diversify knowledge of human psychology. Trends cognit Sci 22(1):1–5

Janes KA, Yaffe MB (2006) Data-driven modelling of signal-transduction networks. Nat Rev Mol Cell Biol 7(11):820–828

Krystal JH et al (2017) Impaired tuning of neural ensembles and the pathophysiology of schizophrenia: a translational and computational neuroscience perspective. Biol Psychiatr 81(10):874–885

Li Z et al (2020) Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895

Li H et al (2020) NETT: solving inverse problems with deep neural networks. Inverse Probl 36(6):065005

Li Y, Hui X (2019) Stochastic neural field model: multiple firing events and correlations. J Math Biol 79(4):1169–1204

Li Y, Chariker L, Young L-S (2019) How well do reduced models capture the dynamics in models of interacting neurons? J Math Biol 78(1):83–115

Liu J, Newsome WT (2006) Local field potential in cortical area MT: stimulus tuning and behavioral correlations. J Neurosci 26(30):7779–7790

Lu L et al (2021) Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat Mach Intell 3(3):218–229

Mably AJ, Colgin LL (2018) Gamma oscillations in cognitive disorders. Current Opin Neurobiol 52:182–187

Nikola K, Samuel L, Siddhartha M (2021) On universal approximation and error bounds for fourier neural operators. J Mach Learn Res 22:1–76

MathSciNet   Google Scholar  

Nobukawa S, Nishimura H, Yamanishi T (2017) Chaotic resonance in typical routes to chaos in the Izhikevich neuron model. Sci Rep 7(1):1–9

Pesaran B et al (2002) Temporal structure in neuronal activity during working memory in macaque parietal cortex. Nat Neurosci 5(8):805–811

Pieter Medendorp W et al (2007) Oscillatory activity in human parietal and occipital cortex shows hemispheric lateralization and memory effects in a delayed double-step saccade task. Cereb Cortex 17(10):2364–2374

Ponulak F, Kasinski A (2011) Introduction to spiking neural networks: information processing, learning and applications. Acta Neurobiol Exp 71(4):409–433

Popescu AT, Popa D, Paré D (2009) Coherent gamma oscillations couple the amygdala and striatum during learning. Nature Neurosci 12(6):801–807

Raissi M, Perdikaris P, Karniadakis GE (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J Comput Phys 378:686–707

Rangan AV, Young L-S (2013) Emergent dynamics in a model of visual cortex. J Comput Neurosci 35(2):155–167

Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117

Shorten C, Khoshgoftaar TM (2019) A survey on image data augmentation for deep learning. J Big Data 6(1):1–48

Solle D et al (2017) Between the poles of data-driven and mechanistic modeling for process operation. Chem Ing Tech 89(5):542–561

Tao L et al (2006) Orientation selectivity in visual cortex by fluctuation-controlled criticality. Proc Natl Acad Sci 103(34):12911–12916

Traub RD et al (2005) Single-column thalamocortical network model exhibiting gamma oscillations, sleep spindles, and epileptogenic bursts. J Neurophysiol 93(4):2194–2232

Van Der Meer MAA, David Redish A (2009) Low and high gamma oscillations in rat ventral striatum have distinct relationships to behavior, reward, and spiking activity on a learned spatial decision task. Front Integr Neurosci 3:9

van Wingerden M et al (2010) Learning-associated gamma-band phase-locking of action-outcome selective neurons in orbitofrontal cortex. J Neurosci 30(30):10025–10038

Wang S, Wang H, Perdikaris P (2021) Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Sci Adv 7(40):eabi8605

Whittington MA et al (2000) Inhibition-based rhythms: experimental and mathematical observations on network dynamics. Int J Psychophysiol 38(3):315–336

Wilson HR, Cowan JD (1972) Excitatory and inhibitory interactions in localized populations of model neurons. Biophys J 12(1):1–24

Womelsdorf T et al (2007) Modulation of neuronal interactions through neuronal synchronization. Science 316:1609–1612

Womelsdorf T et al (2012) Orientation selectivity and noise correlation in awake monkey area V1 are modulated by the gamma cycle. Proc Natl Acad Sci 109(11):4302–4307

Wu T et al (2022) Multi-band oscillations emerge from a simple spiking network. Chaos 33:043121

Xiao Z-C, Lin KK (2022) Multilevel Monte Carlo for cortical circuit models. J Comput Neurosci 50(1):9–15

Xiao Z-C, Lin KK, Young L-S (2021) A data-informed mean-field approach to mapping of cortical parameter landscapes. PLoS Comput Biol 17(12):e1009718

Yuan X et al (2019) Adversarial examples: attacks and defenses for deep learning. IEEE Trans Neural Netw Learn Syst 30(9):2805–2824

Zhang J et al (2014) A coarse-grained framework for spiking neuronal networks: between homogeneity and synchrony. J Comput Neurosci 37(1):81–104

Zhang J et al (2014) Distribution of correlated spiking events in a population-based approach for integrate-and-fire networks. J Comput Neurosci 36:279–295

Zhang JW, Rangan AV (2015) A reduction for spiking integrate-and-fire network dynamics ranging from homogeneity to synchrony. J Comput Neurosci 38:355–404

Zhang Y, Young L-S (2020) DNN-assisted statistical analysis of a model of local cortical circuits. Sci Rep 10(1):1–16

Acknowledgements

This work was partially supported by the National Science and Technology Innovation 2030 Major Program through grant 2022ZD0204600 (R.Z., Z.W., T.W., L.T.), and the Natural Science Foundation of China through grants 31771147 (R.Z., Z.W., T.W., L.T.) and 91232715 (L.T.). Z.X. is supported by the Courant Institute of Mathematical Sciences through a Courant Instructorship. Y.L. is supported by NSF DMS-1813246 and NSF DMS-2108628.

Directorate for Mathematical and Physical Sciences (2108628, 1813246), National Natural Science Foundation of China (31771147, 91232715, 2022ZD0204600).

Author information

Ruilin Zhang and Zhongyi Wang contributed equally to this work.

Authors and Affiliations

Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, School of Life Sciences, Peking University, Beijing, 100871, China

Ruilin Zhang, Zhongyi Wang, Tianyi Wu & Louis Tao

Yuanpei College, Peking University, 100871, Beijing, China

Ruilin Zhang

School of Mathematical Sciences, Peking University, 100871, Beijing, China

Zhongyi Wang & Tianyi Wu

Department of Mathematics, University of California, 94720, Berkeley, CA, USA

Center for Quantitative Biology, Peking University, 100871, Beijing, China

Courant Institute of Mathematical Sciences, New York University, 10003, New York, NY, USA

Zhuo-Cheng Xiao

Department of Mathematics and Statistics, University of Massachusetts Amherst, 01003, Amherst, MA, USA

Yao Li

Corresponding authors

Correspondence to Louis Tao, Zhuo-Cheng Xiao or Yao Li.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1.1 Tau-leaping and SSA algorithms

The simulations of the SNN dynamics are carried out by two algorithms: tau-leaping and the Stochastic Simulation Algorithm (SSA). The key difference is that the tau-leaping method processes all events that happen during a time step \(\tau \) in bulk, while SSA simulates the evolution event by event. Of the two, tau-leaping can be faster (with a properly chosen \(\tau \)), while SSA is usually more precise, at the cost of longer execution times. Here we illustrate both with a Markov jump process as an example.

Algorithms. Consider \(X(t)=\{x_1(t),x_2(t),..., x_N(t)\}\) , where X ( t ) can take values in a discrete state space

The transition from state X to state \(s_i\) at time t is denoted as \(T_{s_i}^t(X)\), with an exponentially distributed waiting time with rate \(\lambda _{s_i\leftarrow X}\). Here, \(s_i\in S(X)\), the set of states adjacent to X with non-zero transition probability. For simplicity, we assume \(\lambda _{s_i\leftarrow X}\) does not explicitly depend on t except via X(t).

Tau-leaping only considers X(t) on a time grid \(t = jh\), for \(j = 0,1,...,T/h\), assuming that at most one state transition occurs within each step:

On the other hand, SSA formulates the simulation problem as:

i.e., starting from \(X^0\) , X transitions to \(X_1, X_2,\ldots , X_k = X(T)\) at time \(0<t_1< t_2<\ldots< t_k <T \) .

For \(t_\ell<t<t_{\ell +1}\) , we sample the transition time from \(\text {Exp}(\sum _{s_i\in S(X(t))} \lambda _{s_i\leftarrow X(t)})\) . That is, for independent, exponentially distributed random variables

Therefore, in each step of an SSA simulation, the system state evolves forward by an exponentially distributed random time, whose rate is the sum of the rates of all exponential "clocks". We then randomly choose the state \(s_i\) to which the transition takes place, with probability weighted by the corresponding transition rates.
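
To make the two update rules concrete, the following is a minimal Python sketch contrasting SSA and tau-leaping on a toy birth-death Markov jump process. The process and all rate constants are illustrative stand-ins, not the spiking-network model itself; the tau-leaping step fires each clock at most once, loosely matching the at-most-one-transition assumption stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

def rates(x):
    """Transition rates out of state x (toy birth-death chain):
    birth at a constant rate, death proportional to the population."""
    return {+1: 2.0, -1: 0.1 * x}

def ssa(x0, T):
    """Event-by-event simulation: wait an exponential time whose rate is
    the sum of all clocks, then pick one transition weighted by its rate."""
    t, x = 0.0, x0
    while True:
        r = rates(x)
        total = sum(r.values())
        if total == 0:
            break
        t += rng.exponential(1.0 / total)
        if t > T:
            break
        moves, w = zip(*r.items())
        x += rng.choice(moves, p=np.array(w) / total)
    return x

def tau_leaping(x0, T, tau):
    """Fixed grid t = j*tau; each clock fires at most once per step with
    probability 1 - exp(-rate*tau), and all effects are applied in bulk."""
    x = x0
    for _ in range(int(T / tau)):
        pending = 0
        for move, rate in rates(x).items():
            if rng.random() < 1.0 - np.exp(-rate * tau):
                pending += move
        x = max(x + pending, 0)
    return x

print(ssa(10, 50.0), tau_leaping(10, 50.0, 0.05))
```

For small \(\tau \) the tau-leaping statistics approach the SSA ones; here both runs fluctuate around the toy chain's mean population of 20.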

Implementation on spiking networks. We note that X(t) changes when

neuron i receives an external input (\(v_i\) goes up by 1, possibly entering \(\mathcal {R}\));

neuron i receives a spike (\(H^E_i\) or \(H^I_i\) goes up by 1);

a pending spike takes effect on neuron i (\(v_i\) goes up or down according to the synaptic strengths);

neuron i exits the refractory state (\(v_i\) goes from \(\mathcal {R}\) to 0).

The corresponding transition rates are directly given ( \(\lambda ^E\) and \(\lambda ^I\) ) or the inverses of the physiological time scales ( \(\tau ^{E}\) , \(\tau ^{I}\) , and \(\tau ^{\mathcal {R}}\) ). In an SSA simulation, when the state transition elicits a spike in a neuron, the synaptic outputs generated by this spike are immediately added to the pool of corresponding types of effects, and the neuron goes into the refractory state. However, in a tau-leaping simulation, the spikes are recorded but the synaptic outputs are processed in bulk at the end of each time step. Therefore, all events within the same time step are uncorrelated.

1.2 The coarse-graining mapping

Here we give the definition of the coarse-graining mapping \(\mathcal {C}\) in Eq. 6. For every \(\omega \in \varvec{\Omega }\):

Here, \({\textbf {1}}_{\text {A}}(a)\) is the indicator function of a set A, i.e., \({\textbf {1}}_{\text {A}}(a) = 1\) for all \(a\in \text {A}\), and \({\textbf {1}}_{\text {A}}(a) = 0\) otherwise. \(\varGamma _i\) is a subset of the state space of the membrane potential, and
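
As a concrete illustration of this construction, the sketch below counts how many neurons have membrane potential in each bin \(\varGamma _i\), which is exactly what summing the indicator functions does. The 22-bin resolution (matching the 22-entry vectors used below), the bin edges, and the toy potentials are our assumptions.

```python
import numpy as np

def coarse_grain(v, bin_edges):
    """C(v)_i = sum_j 1_{Gamma_i}(v_j): the number of neurons whose
    membrane potential falls in the i-th bin Gamma_i."""
    counts, _ = np.histogram(v, bins=bin_edges)
    return counts

rng = np.random.default_rng(1)
v = rng.uniform(-1.0, 1.0, size=300)   # toy membrane potentials
edges = np.linspace(-1.0, 1.0, 23)     # 23 edges -> 22 bins Gamma_1..Gamma_22
print(coarse_grain(v, edges))          # counts per bin; divide by len(v) for a probability mass vector
```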

1.3 Pre-processing surrogate data: discrete cosine transform

Here we explain how discrete cosine transform (DCT) works in the pre-processing. For an input probability mass vector

its DCT output \(\mathcal {F}_c(\varvec{p}) = (c_1,c_2,..., c_{22})\) is given by

where \(\delta _{kl}\) is the Kronecker delta function. The iDCT mapping \(\mathcal {F}^{-1}_c\) is defined as the inverse function of \(\mathcal {F}_c\) .
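
For illustration, the sketch below applies an orthonormal DCT-II and its inverse to a toy 22-bin probability mass vector using SciPy. SciPy's normalization stands in for the Kronecker-delta convention above, and the truncation cutoff k is a hypothetical smoothing choice, not a value from the paper.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(22))           # toy probability mass vector, sums to 1

c = dct(p, norm='ortho')                 # F_c(p) = (c_1, ..., c_22)
k = 10                                   # hypothetical cutoff: keep low-frequency modes
c_trunc = np.where(np.arange(22) < k, c, 0.0)
p_smooth = idct(c_trunc, norm='ortho')   # smoothed profile via the inverse mapping

print(np.allclose(idct(c, norm='ortho'), p))  # True: inversion is exact without truncation
```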

1.4 The linear formula for firing rates

When preparing the parameter-generic training set, we use simple linear formulas to estimate the firing rates of E neurons and I neurons (\(f_E\) and \(f_I\); see Li and Hui 2019; Li et al. 2019). We take \(\theta \in {\Theta }\) for the synaptic coupling strengths, while all other constants are the same as in Table 1.

1.5 The deep network architecture

In general, artificial neural networks (ANNs) are interconnected computation units. Many different architectures are possible for ANNs; in this paper, we adopt the feedforward deep network architecture, which is one of the simplest (Fig.  7 ).

Figure 7: Diagram of a feedforward ANN.

A feedforward ANN has a layered structure, where units in the \(i\)-th layer drive the \((i+1)\)-th layer with a weight matrix \(\varvec{W}_i\) and a bias vector \(b_i\). Computation proceeds from one layer to the next. The first, "input" layer takes an input vector x, sending its output \(\varvec{W}_1x+b_1\) to the first "hidden" layer; the first hidden layer then sends output \(\varvec{W}_2 f(\varvec{W}_1x+b_1)+b_2\) to the next layer, and so on, until the last, "output" layer produces an output vector y. In this paper, we implemented a feedforward ANN with four layers containing 512, 512, 512, and 128 neurons, respectively. We chose the Leaky ReLU function with a default negative slope of 0.01 as our activation function \(f(\cdot )\).

The training of feedforward ANNs is achieved by the back-propagation (BP) algorithm. Let \(\mathcal{N}\mathcal{N}(x)\) denote the prediction of the ANN with input x, and \(L(\cdot )\) the loss function. For each entry (x, y) in the training data, we minimize the loss \(L(y-\mathcal{N}\mathcal{N}(x))\) following the gradients on each dimension of \(W_i\) and \(b_i\). The gradients are computed first for the last layer's \(W_n\) and \(b_n\), then "propagated back" to adjust the \(W_i\) and \(b_i\) of each preceding layer. We chose the mean-square error as our loss function, i.e. \(L(\cdot )=||\cdot ||_{L^2}^2\).
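
A minimal PyTorch sketch of this architecture and one back-propagation step follows. The input and output dimensions (the two 22-bin voltage profiles, concatenated) and the Adam optimizer with its learning rate are our assumptions; the paper specifies only the layer widths, the activation, and the loss.

```python
import torch
import torch.nn as nn

in_dim = out_dim = 44  # assumed: p^E and p^I stacked (22 + 22 bins)

# Four layers of 512, 512, 512, 128 units with Leaky ReLU (slope 0.01)
model = nn.Sequential(
    nn.Linear(in_dim, 512), nn.LeakyReLU(0.01),
    nn.Linear(512, 512), nn.LeakyReLU(0.01),
    nn.Linear(512, 512), nn.LeakyReLU(0.01),
    nn.Linear(512, 128), nn.LeakyReLU(0.01),
    nn.Linear(128, out_dim),            # readout to the predicted profiles
)
loss_fn = nn.MSELoss()                  # L(.) = ||.||_{L^2}^2
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer

# One BP step on a toy batch (x: pre-MFE profiles, y: post-MFE profiles)
x, y = torch.rand(64, in_dim), torch.rand(64, out_dim)
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()                         # gradients flow from the output layer back
opt.step()
print(float(loss))
```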

Figure 8: Left: pre-, post-, and predicted MFE profiles without DCT + iDCT; Middle: pre-, post-, and predicted MFE profiles with DCT + iDCT; Right: pre-, post-, and predicted MFE profiles with network parameters as additional inputs to the ANN. (Left and Middle: \(S^{EE}\), \(S^{IE}\), \(S^{EI}\), \(S^{II}\) = 4, 3, -2.2, -2; Right: 3.82, 3.24, -2.05, -1.87.)

1.6 Pre-processing in ANN predictions

Here we provide more examples of the ANN predictions of the voltage profiles. We compare how ANNs predict post-MFE voltage distributions \(\varvec{p}^E\) and \(\varvec{p}^I\) in three different settings in Fig.  8 . In each panel divided by red lines, the left column gives an example of pre-MFE voltage distributions, while the right column compares the corresponding post-MFE voltage distributions collected from ANN predictions (red) vs. SSA simulation. Results from ANNs without pre-processing, with pre-processing, and the parameter-generic ANN are depicted in the left, middle, and right panels.

1.7 Principal components of voltage distributions

The voltage distribution vectors in the form below are used to plot the distributions in the phase space shown in the middle panels of Figs. 5, 6, 9D, 10D, 11D, 12D, 13, 14, and 15:

The vectors from the training set (colored blue in the figures) are used to generate the basis of the phase space via the svd function in numpy.linalg in Python. The first two rows of \(V^\top \) are the first two PCs of the space. The scores of vectors from the training set and of the approximated results are the dot products of these vectors with the normalized PCs.

The ksdensity function in MATLAB is used to estimate the kernel-smoothed density of the profile distribution based on the data points generated above. The contours show levels at each tenth of the maximal height (with a 0.1% offset so that the peak itself is visible) in the distributions.
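
The projection can be reproduced in Python as sketched below: stack the training vectors, take an SVD, keep the first two rows of \(V^\top \) as PCs, and smooth the scores with a kernel density estimate (scipy's gaussian_kde standing in for MATLAB's ksdensity). The toy data and the centering step are our assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
train = rng.dirichlet(np.ones(44), size=500)   # toy stand-ins for (p^E, p^I) vectors

train_c = train - train.mean(axis=0)           # assumed: center before the SVD
_, _, Vt = np.linalg.svd(train_c, full_matrices=False)
pcs = Vt[:2]                                   # first two rows of V^T = first two PCs
pcs /= np.linalg.norm(pcs, axis=1, keepdims=True)  # normalized PCs

scores = train_c @ pcs.T                       # dot products with the normalized PCs
kde = gaussian_kde(scores.T)                   # smoothed density over the 2D scores
print(scores.shape, kde(scores.T[:, :3]))      # density evaluated at the first three points
```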

Figure 9: DNN predictions and surrogate dynamics in an ER random network with 400 neurons, for \(\theta = (S^{EE}, S^{EI}, S^{IE}, S^{II}) = (4, 3, -2.2, -2)\). A: Mapping \(\widehat{F}^{\theta }_1\) in the ER network. Left: a pre-MFE \(\varvec{p}^E(v)\) and \(\varvec{p}^I(v)\); Right: post-MFE \(\varvec{p}^E(v)\) and \(\varvec{p}^I(v)\) produced by the ANN (blue) vs. spiking network simulations (orange). B: Comparison of E and I spike numbers during MFEs, ANN predictions vs. SSA simulations; distributions depicted by tenth-of-max contours of the ks-density estimate. C: Example of pre- and post-MFE voltage distributions \(\varvec{p}^E\) and \(\varvec{p}^I\) in the surrogate dynamics. D: Distributions depicted by tenth contours of the first two principal components of \(\varvec{p}^E\) and \(\varvec{p}^I\). E: Raster plots of simulated surrogate dynamics and the real dynamics starting from the same initial profiles (color figure online).

Figure 10: DNN predictions and surrogate dynamics in a 400-neuron random network with log-normal degree distribution. A-E are in parallel to Fig. 9.

Figure 11: DNN predictions and surrogate dynamics in an ER random network with 4000 neurons. A-E are in parallel to Fig. 9, except that \(\theta = (S^{EE}, S^{EI}, S^{IE}, S^{II}) = (0.4, 0.3, -0.22, -0.2)\).

Figure 12: DNN predictions and surrogate dynamics in a 4000-neuron random network with log-normal degree distribution. A-E are in parallel to Fig. 11.

Figure 13: Surrogate dynamics produced by the parameter-generic MFE mapping \(\widehat{F}_1\) in two fixed networks with 400 neurons. Left: ER network; Right: network with log-normal degree distribution. A-B: Examples of pre- and post-MFE voltage distributions \(\varvec{p}^E\) and \(\varvec{p}^I\) in the surrogate dynamics. C-D: Distributions depicted by tenth contours of the first two principal components of \(\varvec{p}^E\) and \(\varvec{p}^I\). E-F: Raster plots of simulated surrogate dynamics and the real dynamics starting from the same initial profiles.

Figure 14: Surrogate dynamics produced by the parameter-generic MFE mapping \(\widehat{F}_1\) in two fixed networks with 4000 neurons. Left: ER network; Right: network with log-normal degree distribution. A-F are in parallel to Fig. 13.

Figure 15: Surrogate dynamics, produced by the parameter-generic MFE mapping \(\widehat{F}_1\), with parameter sets lying outside the sampled 4D cube. Left: \((S^{EE},S^{IE},S^{EI},S^{II}) = (5, 3, -2.2, -2)\); Right: \((S^{EE},S^{IE},S^{EI},S^{II}) = (4, 4, -2.2, -2)\). A-F are in parallel to Fig. 13.

Figure 16: Fixed random graphs used for SNN simulation. ER: Erdős-Rényi random graph; LN: random graph with log-normal degree distribution. A light-colored block at (i, j) represents a directed synaptic connection from i to j. The last quarter of the neurons are inhibitory; the others are excitatory.

Figure 17: Distributions of the magnitude of MFEs (number of spikes) from sampled parameter sets and from the specific parameter set \((S^{EE},S^{IE},S^{EI},S^{II}) = (4, 3, -2.2, -2)\).

Figure 18: Distributions of the magnitude of MFEs (number of spikes) from the enlarged set of initial profiles and from a single trajectory.

Figure 19: Firing rates of networks with parameter sets that produce accepted MFEs for training (n = 3000).

1.8 Consistent results from fixed random networks

Here we test the capability of our method with fixed network architectures. We select two types of random graphs: (1) the Erdős-Rényi random graph (ER), and (2) random graphs with a log-normal degree distribution (LN). In both types of graphs, the average edge density is consistent with the connection probabilities \(P\) in Table 1. Their adjacency matrices are shown in Fig. 16. The sampling of the four types of edges in LN random graphs uses a standard deviation of 0.2 in the logarithm, constrained to match the mean degrees. Results are obtained for both sizes of the random graphs, 400 and 4000 neurons. For each network, MFEs are captured from the original network simulations and from simulations with enlarged initial profiles (Fig. 18). The trained parameter-specific MFE mappings \(\widehat{F}^{\theta }_1\) are used to produce predictions of post-MFE states and surrogate network dynamics (Figs. 9, 10, 11, 12). MFEs are further captured in simulations with various parameter sets sampled from the 4D cube \(\varvec{\Theta }\). The trained parameter-generic \(\widehat{F}_1\) is used to produce predictions and surrogate dynamics, shown in Figs. 13 and 14. As seen in Figs. 9, 10, 11, 12, 13 and 14, the performance of the ANN surrogate is consistent with the case where postsynaptic connections are decided on the fly.
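
One plausible construction of the two graph types is sketched below. The edge density p, the rescaling of log-normal propensities, and the clipping are our assumptions; the paper's exact scheme for sampling the four edge types under the mean-degree constraints may differ.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 400, 0.15            # network size and edge density (p is illustrative)

# ER: each directed edge i -> j exists independently with probability p
er_adj = (rng.random((N, N)) < p).astype(int)
np.fill_diagonal(er_adj, 0)

# LN: per-neuron propensities with sigma = 0.2 in the logarithm, rescaled so
# the mean edge probability stays p, then independent Bernoulli edges
w = rng.lognormal(mean=0.0, sigma=0.2, size=N)
edge_p = np.clip(p * np.outer(np.ones(N), w / w.mean()), 0.0, 1.0)
ln_adj = (rng.random((N, N)) < edge_p).astype(int)
np.fill_diagonal(ln_adj, 0)

print(er_adj.sum(axis=0).mean(), ln_adj.sum(axis=0).std())  # mean and spread of in-degrees
```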

1.9 Varying the synaptic coupling strengths generates a broad range of firing rates and magnitudes of MFEs

Training the parameter-generic MFE mapping \(\widehat{F}_1\) requires MFEs from simulations with a variety of parameter sets \(\theta \). As introduced in Sect. 4.3, the sets sampled from the 4D cube \(\varvec{\Theta }\) are first filtered by the estimated firing rate computed with the linear formula. A large fraction (about 80%) of the sets passing this step generate MFEs that can be captured and accepted by our algorithm. These sets produce a wide range of firing rates (Fig. 19), and the simulated firing rates generally match the linear formula. The major rejected region occurs at high \(S^{EE}\), \(S^{IE}\) and low \(S^{II}\), \(S^{EI}\); it neighbors the accepted region with high firing rates and lies near the singular region of the linear formula.

The magnitudes of MFEs (number of spikes) from the various parameter sets show a much wider distribution than those from the original parameter set (Fig. 17).
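
The sampling-and-filtering step can be sketched as follows. The bounds of the 4D cube, the linear coefficients, and the acceptance window are all placeholder values; the actual linear formula is the one referenced in Sect. 1.4 (Li et al. 2019).

```python
import numpy as np

rng = np.random.default_rng(5)
lo = np.array([3.0, 2.0, -3.0, -3.0])   # placeholder bounds of the cube:
hi = np.array([5.0, 4.0, -1.5, -1.5])   # (S^EE, S^IE, S^EI, S^II)

def linear_rate_estimate(theta):
    """Placeholder linear map theta -> (f_E, f_I); coefficients are made up."""
    A = np.array([[3.0, 2.0, 4.0, 1.0],
                  [4.0, 3.0, 2.0, 2.0]])
    return A @ theta

thetas = lo + (hi - lo) * rng.random((3000, 4))                # uniform samples in the cube
rates = np.array([linear_rate_estimate(t) for t in thetas])
keep = (rates > 5.0).all(axis=1) & (rates < 60.0).all(axis=1)  # assumed acceptance window (Hz)
print(f"accepted {keep.sum()} / {len(thetas)} parameter sets")
```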

1.10 Extrapolating network dynamics out of \(\varvec{\Theta }\) with parameter-generic MFE mapping \(\widehat{F}_1\)

To test the extrapolation ability of our method, we generate surrogate dynamics in networks with \(\theta \) ’s outside of \(\varvec{\Theta }\) with the parameter-generic MFE mapping \(\widehat{F}_1\) . The two \(\theta \) ’s are \((S^{EE},S^{IE},S^{EI},S^{II}) = (5, 3, -2.2, -2)\) and \((S^{EE},S^{IE},S^{EI},S^{II}) = (4, 4, -2.2, -2)\) .

The parameter-generic MFE mapping \(\widehat{F}_1\), trained with MFEs in \(\varvec{\Theta }\), can still capture the neuronal oscillations and predict post-MFE states in the two networks (Fig. 15). The behaviors of the networks under these two parameter sets are also reproduced: strong recurrent excitation in the former \(\theta \) makes MFEs readily concatenate, while the less synchronized character of the latter \(\theta \) makes MFEs hard to trigger or identify. This result shows the robustness and extrapolation capability of our method; future work can focus on improving the precision of the predictions (Figs. 18, 19).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Zhang, R., Wang, Z., Wu, T. et al. Learning spiking neuronal networks with artificial neural networks: neural oscillations. J. Math. Biol. 88, 65 (2024). https://doi.org/10.1007/s00285-024-02081-0

Received: 22 November 2022

Revised: 30 June 2023

Accepted: 05 March 2024

Published: 17 April 2024

DOI: https://doi.org/10.1007/s00285-024-02081-0


Keywords

  • Artificial neural network
  • Gamma oscillations
  • Data-driven methods
  • Generalization


