Advances on Data Management and Information Systems

  • Published: 02 March 2022
  • Volume 24, pages 1–10 (2022)

  • Jérôme Darmont 1 ,
  • Boris Novikov 2 ,
  • Robert Wrembel 3 &
  • Ladjel Bellatreche 4  

1 Introduction

The research and technological area of data management encompasses various concepts, techniques, algorithms and technologies, including data modeling, data integration and ingestion, transactional data management, query languages, query optimization, physical data storage, data structures, analytical techniques (including On-Line Analytical Processing – OLAP), as well as service creation and orchestration (Garcia-Molina et al., 2009). Data management technologies are core components of every information system, either centralized or distributed, deployed in an on-premise hardware architecture or in a cloud ecosystem. Data management technologies have been used in commercial, mature products for decades. They were originally developed for managing structured data (mainly expressed in the relational data model).

Yet, the ubiquitous big data (Azzini et al., 2021) require the development of new data management techniques, suitable for the variety of data formats (from structured, through semi-structured, to unstructured), the overwhelming data volumes, and the velocity of big data generation. These new techniques draw upon the concepts applied to managing relational data.

One of the most frequently used data formats for big data is based on graphs, which are a natural way of representing relationships between entities, e.g., knowledge, social connections and components of a complex system. Such data not only need to be efficiently stored but also efficiently analyzed. Therefore, some OLAP-like analysis approaches for graph data have recently been proposed, e.g., Chen et al. (2020), Ghrab et al. (2018), Ghrab et al. (2021), and Schuetz et al. (2021). Thus, combining graph and OLAP technologies offers ways of analyzing graphs in a manner already well accepted by the industry (Richardson et al., 2021).

The complexity of ecosystems for managing big data results in challenges for orchestrating these components and in optimizing their performance, as there are too many parameters in each system to be manually tuned by a human administrator. Thus, more and more frequently, machine learning techniques are applied to performance optimization, e.g., Hernández et al. (2018) and Witt et al. (2019). Conversely, data management techniques are used to solve challenges in machine learning, such as building end-to-end data processing pipelines (Romero and Wrembel, 2020).

In this editorial to the special section of Information Systems Frontiers, we outline research problems in graph processing, OLAP and machine learning. These problems are addressed by the papers in this special section.

2 Selected Research Problems in Data Management and Information Systems

2.1 Graph Processing

Graph processing algorithms have been attracting the attention of researchers since the 1950s. Several knowledge representation techniques (such as semantic networks) studied in the 1970s utilize graph structures, including applications to rule-based systems (Griffith, 1982), data structures for efficient processing (Moldovan, 1984), and several other aspects.

The concepts of semantic networks served as a base for the Semantic Web and evolved into knowledge graphs, also known as knowledge bases. Processing of large distributed RDF knowledge bases with the SPARQL language is addressed in Peng et al. (2016).

Graph-based models became a natural choice for representing semi-structured data (McHugh et al., 1997). Graph representations proved their usefulness for modeling hypertexts, including the World Wide Web (Meusel et al., 2014). Documents (e.g., Web pages) are mapped to vertices, while directed edges represent links. The graph representations of the WWW provided several features (such as PageRank and SimRank) for deep analysis of its structure and definition. Sources of large graphs include social networks, bioinformatics, road networks, and other application domains.
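As an illustration of how a feature such as page rank can be derived from the vertex-and-link representation described above, the following is a minimal sketch of the classic power-iteration scheme. The damping factor of 0.85, the fixed iteration count, the uniform handling of pages without outgoing links, and the tiny `web` graph are all assumptions chosen for the example, not values taken from the cited works.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank sketch.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same small "teleport" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank uniformly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: vertices are documents, edges are links.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
best_page = max(ranks, key=ranks.get)
```

In this toy graph, page `c` collects links from three of the four pages and therefore ends up with the highest score.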

The need to store and process large graphs supports the growing interest in graph databases. An overview of several aspects of graph databases can be found in Deutsch and Papakonstantinou (2018). Typically, such databases can store graph vertices and edges, labeled with sets of attributes. A widespread opinion states that graph databases provide more powerful modeling features than the relational model used in traditional relational databases. This is doubtful, as a relational database schema (represented for example as an ER model) is also a graph (Pokorný, 2016). Actually, the advantage of graph databases is that the expensive and time-consuming modeling can be pushed forward to later phases of the information system lifecycle, providing more options for rapid prototyping and similar application development methodologies (Brdjanin et al., 2018).

The need for highly expressive tools for specifying graph processing triggered a number of efforts in the design of declarative query languages. A step toward the standardization of graph query languages (Angles et al., 2018) is focused on providing a balance between high expressiveness and computational performance, avoiding constructs that may result in unacceptable computational complexity. The GSQL graph query language (Deutsch et al., 2020) supports the specification of complex analytical queries over graphs, including pattern matching and aggregations. A comparison of different graph processing techniques, available in the Neo4j graph database management system, can be found in Holzschuher and Peinl (2013).

Several similarities can be found between relational and graph declarative query languages: as soon as sets of labeled nodes or edges are produced as intermediate results, the remaining processing is typically expressed in terms of relational operations. The most significant differences between graph and relational database query languages follow.

  • Graph traversal (implicit and rarely used in relational languages) requires the intensive use of recursion and an efficient implementation in graph databases.
  • Graph query languages provide support for computationally complex processing, such as weighted shortest path search, potentially with additional constraints.
  • Locality of data placement is essential for high performance of relational systems. Data placement in graph databases is much more complex and often results in poor performance when the size of the database exceeds the available main memory of a single server.

The items listed above are inter-related: the performance of graph processing depends on locality needed for efficient traversing of a graph. However, traversing depends on the problem being solved. A generic approach is to rely on certain graph properties to optimize the storage of graph nodes and edges, that is, graph partitioning.
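To make the weighted shortest path search mentioned above concrete, here is a minimal sketch using Dijkstra's algorithm over an adjacency-list graph. It assumes non-negative edge weights and a graph that fits in main memory; the `roads` graph and the function name `shortest_path` are hypothetical and only serve the example.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm.

    graph maps a node to a list of (neighbor, weight) pairs;
    weights are assumed to be non-negative.
    Returns (total cost, list of nodes on the path).
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical road network with travel costs on the edges.
roads = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 3)],
    "C": [("B", 2), ("D", 7)],
    "D": [],
}
cost, path = shortest_path(roads, "A", "D")
```

Each step of the search follows edges out of the node just visited, which is why the placement of neighboring nodes close to each other in storage matters so much for performance.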

2.2 On-Line Analytical Processing

The term “On-Line Analytical Processing” (OLAP) was coined by Edgar F. Codd in 1993 (Codd et al., 1993). OLAP is defined in contrast to operational database systems that run On-Line Transactional Processing (OLTP). In OLTP, data representing the current state of information may be frequently modified and are interrogated through relatively simple queries. OLAP’s data are typically sourced from one or several OLTP databases, consolidated and historicized for decision-support purposes. They are seldom modified and are queried by complex, analytical queries that run over large data volumes.

Conceptually, OLAP rests on a metaphor that is easy for business users to grasp: the (hyper)cube. Facts, made up of numerical Key Performance Indicators (KPIs), e.g., product sales, are the subjects of analysis. They are viewed as points in a multidimensional space whose dimensions are analysis axes, e.g., time, store, salesperson, etc. Dimensions may also have hierarchies, e.g., \(store \rightarrow city \rightarrow state\). Thus, dimensions represent the coordinates of facts in the multidimensional space.
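The cube metaphor can be sketched in a few lines: facts are measure values addressed by dimension coordinates, and a roll-up aggregates them one level up a hierarchy. The store, city and state names, the sales figures, and the helper `roll_up` below are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Facts: the sales measure addressed by (store, month) coordinates.
sales_by_store = {
    ("s1", "2021-01"): 100.0,
    ("s2", "2021-01"): 50.0,
    ("s3", "2021-01"): 70.0,
    ("s1", "2021-02"): 30.0,
}

# One hierarchy on the store dimension: store -> city -> state.
store_to_city = {"s1": "city_a", "s2": "city_a", "s3": "city_b"}
city_to_state = {"city_a": "state_x", "city_b": "state_x"}


def roll_up(cells, parent_of):
    """Sum a measure one level up the hierarchy of the first dimension."""
    result = defaultdict(float)
    for (member, month), value in cells.items():
        result[(parent_of[member], month)] += value
    return dict(result)


sales_by_city = roll_up(sales_by_store, store_to_city)
sales_by_state = roll_up(sales_by_city, city_to_state)
```

Drilling down is the reverse navigation: moving from `sales_by_state` back to the finer-grained `sales_by_city` or `sales_by_store` cells.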

In the 1990s, OLAP research mainly focused on designing efficient logical and physical models, synthetically surveyed by Vassiliadis and Sellis (1999). Relational OLAP (ROLAP) relies on storing data in time-tested relational Database Management Systems (DBMSs), complemented with new, OLAP-specific operators and queries available in SQL99. ROLAP is cheap and easy to implement, can handle large data volumes, and schema evolution is relatively easy. However, ROLAP induces numerous, costly joins that hinder query performance, and analysis results are not suitable for end-users, i.e., business users, and thus must be reformatted.

In contrast, Multidimensional OLAP (MOLAP) sticks to the cube metaphor. Hypercubes are natively stored in multidimensional tables, allowing quick aggregate computations. However, it turned out that MOLAP systems and languages (e.g., MDX) were mostly proprietary and difficult to implement. Moreover, data volume is limited to the RAM size and a cube can be quite sparse, wasting memory. Finally, refreshing the system is limited, inducing full and costly periodical reconstructions.

Eventually, Hybrid OLAP (HOLAP) was proposed as the best of both worlds (Salka, 1998), by storing atomic data in a relational DBMS and aggregated data in MOLAP cubes, thus achieving a good cost/performance tradeoff on large data volumes. However, HOLAP is difficult to implement and neither as fast as MOLAP nor as scalable as ROLAP. Later on, in 2014, Gartner introduced Hybrid Transaction/Analytical Processing (HTAP), where an in-memory DBMS helps process OLTP and OLAP simultaneously, which allows transactional data to be quickly available for analytics and induces fast, distributed query computation while avoiding data redundancy. However, this is a complex and drastic change in decision-support architectures.

Following the OLAP pioneers, many lines of research were pursued for more than fifteen years; they can be classified into two trends. In the first trend, OLAP is adapted to particular data formats. One of the most prominent of such adaptations is probably Spatial OLAP (SOLAP; Han, 2017), where OLAP is applied on spatial (and even spatio-temporal) data, allowing, for example, zooming in and out (i.e., drill-down and roll-up in terms of OLAP operations) on spatial representations such as maps. Another well-researched adaptation was XML-OLAP (also called XOLAP), which allows OLAP on semi-structured data. Related approaches are surveyed in Mahboubi et al. (2009). Other examples include OLAP on trajectory data (Marketos and Theodoridis, 2010) and mobile OLAP (Maniatis, 2004).

In the second trend, OLAP is hybridized with other techniques for specific purposes. Quite quickly, OLAP was associated with data mining, with OLAP providing data navigation and identifying a subset of a cube, and data mining featuring association, classification, prediction, clustering, and sequencing on this data subset (Han, 1997).

With the Web becoming an important source of data, OLAP systems could not rely only on internal data any more and had to discover external, Web data, as well as their semantics. This issue was addressed with the help of Semantic Web (SW) technologies that support inference and reasoning on data. An extensive survey covers this research trend (Abelló et al., 2015). OLAP was also combined with information networks akin to social media, in the sense that they can be represented by very similar graphs. A comprehensive survey of the so-called Graph OLAP, with a focus on bibliographic data analysis, is provided in Loudcher et al. (2015).

Eventually, the Big Data era made OLAP meet new challenges such as: (1) design methods that handle a high complexity that tends to make the number of dimensions explosive; (2) computing methodologies that leverage the cloud computing paradigm for scaling and performance; and (3) query languages that can manage data variety (Cuzzocrea, 2015).

Big Data also pushed forward the exploitation of textual documents, which are acknowledged to represent the majority of the information stored worldwide. In the context of OLAP, i.e., Text or Textual OLAP, the key issue is to find ways of aggregating textual documents instead of numerical KPIs. Two trends emerge, based on the hypercube structure and text mining, respectively. They are thoroughly surveyed and discussed in Bouakkaz et al. (2017).

Finally, with the emergence of Data Lakes (DLs) in the 2010s, the concept of Data Warehouse (DW), on which OLAP typically rests, is challenged in terms of data integration complexity, data siloing, data variety management and even scaling. However, DLs and DWs are actually synergistic. A DL can indeed be the source of a DW, and DWs can be components, among others, of DLs. Hence, OLAP remains very useful as an analytical tool in both cases. Two recent and complementary surveys cover DL, DW and OLAP-related issues (Sawadogo and Darmont, 2021; Hai et al., 2021).

2.3 Machine Learning

Artificial intelligence (AI) has been a hot research and technological topic for a few years. AI refers to the computing techniques that allow the simulation of human-like intelligence in machines. AI is a broad area of research and technology that includes a sub-area, Machine Learning (ML), which enables a computer system to learn models from data.

The most frequent ML techniques include regression, clustering, and classification, all of which are supported by multiple software tools (Krensky & Idoine, 2021). Regression aims at building statistical models to predict continuous values (e.g., electrical or thermal energy usage at a given point in time or over a time period). Clustering aims at dividing data items into a non-predefined number of groups, such that the instances in the same group have similar values of some features (e.g., grouping customers by their purchase behaviour). Classification aims at predicting the predefined class to which a given data item belongs (e.g., classifying patients into a class of high blood pressure risk or a class of no risk).
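As a small illustration of the regression task described above, the following sketch fits a straight line by ordinary least squares and uses it to predict a continuous value. The energy-usage numbers and the names `fit_line`, `hours` and `usage_kwh` are made up for the example; real models typically involve many features and a dedicated library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: energy usage (kWh) observed at four points in time.
hours = [0.0, 1.0, 2.0, 3.0]
usage_kwh = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(hours, usage_kwh)
# Predict the continuous target for an unseen input.
predicted = slope * 4.0 + intercept
```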

ML in turn includes a sub-area, Artificial Neural Networks (ANNs). ANNs are based on a statistical model that reflects the way a human brain is built; thus, it mathematically models how the brain works. ANNs are the foundation of Deep Learning (DL) (Bengio et al., 2021). DL applies algorithms that allow a machine to train itself from large volumes of data, in order to learn new models based on new input (data). DL turned out to be especially efficient in image and speech recognition.

In order to build prediction models by ML algorithms, massive amounts of pre-processed data are needed. The pre-processing includes a workflow of tasks (a.k.a. data wrangling (Bogatu et al., 2019), data processing/preparation pipeline (Konstantinou & Paton, 2020; Romero et al., 2020) or ETL (Ali & Wrembel, 2017)). The workflow includes the following tasks: data integration and transformation, data cleaning and homogenization, and data preparation for a particular ML algorithm. Based on pre-processed data, ML models are built (trained, validated, and tuned; Quemy, 2020). Since the whole workflow is very complex, constructing it requires deep knowledge from its developer in multiple areas, including software engineering, data engineering, performance optimization, and ML. Thus, multiple works focus on automating the construction of such workflows. This research area is commonly called AutoML. It has turned out to be a hot research area in recent years (Bilalli et al., 2019; Giovanelli et al., 2021; Koehler et al., 2021; Quemy, 2019). Kedziora et al. (2020) provide an excellent state of the art of this research area.

Other major trends in ML are pointed to by the Gartner report on strategic technological trends for 2021 (Panetta, 2020). Among these trends, AI engineering is listed. It is defined as a means to “facilitate the performance, scalability, interpretability and reliability of AI models”. Interpretability and reliability are crucial, since AI systems are typically applied to support decision making by providing prediction models and recommendations, which by definition must be reliable. Moreover, a decision maker must be able to figure out and understand how a decision was reached by a given model.

Unfortunately, models built by ML algorithms may be difficult to understand for a user, for two main reasons. First, a model may be too complex to be understood by a user. Second, a user typically has access only to the input and output of a model, i.e., the internals of the model are hidden. Such models are typically referred to as black-box models. They typically include ensemble models produced by classification techniques (e.g., Random Forest, Bagging, AdaBoost) and ANN models. Even a simple classification model may be difficult to understand if a decision tree is large. ANN models are by their nature non-interpretable (e.g., an ANN with a hundred inputs and several hidden layers). As a consequence, a user is not able to fully understand how decisions are reached by such complex models (Du et al., 2019).

Yet, in a decision making process, it is necessary to understand how a given decision was reached by an ML model. Therefore, there is a need for developing methods for explaining how ML black-box models work internally. In response to this need, the so-called Explainable Artificial Intelligence (EAI) (Biggio et al., 2021; Goebel et al., 2018; Liang et al., 2021; Langer et al., 2021; Miller, 2019) or Interpretable Machine Learning (IML) (Du et al., 2019) techniques are being developed. This research problem is defined as “investigating methods to produce or complement AI to make accessible and interpretable the internal logic and the outcome of the model, making such process human understandable” (Bodria et al., 2021). Explaining ML models turned out to be crucial in multiple business and engineering domains, such as system security (Mahbooba et al., 2021), health care (Danso et al., 2021), chemistry (Karimi et al., 2021), text processing (Moradi & Samwald, 2021), finance (Ohana et al., 2021), energy management (Sardianos et al., 2021), and IoT (García-Magariño et al., 2019).

A substantial growth of this research topic was observed in the years 2018-2019. The DBLP service includes a total of 215 papers on EAI and 163 papers on IML (as of September 25, 2021). Google Trends shows an increasing popularity trend for this research topic. Figure 1 shows an aggregated trend for EAI and IML.

Fig. 1 Aggregated popularity trend of EAI and IML topics (Google Trends)

Techniques for building interpretable ML models can be divided into two categories, namely intrinsic and post-hoc. An orthogonal classification divides the techniques into global and local interpretable models (Du et al., 2019).

Models in the intrinsic interpretability category incorporate interpretability directly into their structures, making them self-interpretable. Examples of such models include decision trees, rule-based models, and linear models. Models in the post-hoc interpretability category require constructing an additional model, which provides explanations for the main model.

A globally interpretable model means that a user is able to understand how a model works globally, i.e., in a generalized way. A locally interpretable model allows a user to understand how an individual prediction was made by the model.

Multiple approaches to explaining models have been proposed. They may be specific to a type of data used to build a model, i.e., there are specific approaches for table-like data, for images, and for texts.

For explaining models that use table-like data, the most popular methods are based either on rules or on feature importance. A rule-based explanation uses decision rules understandable by a user, which explain the reasoning that produced the final prediction (decision). A feature importance explanation assigns a value to each input feature. The value represents the importance of a given feature in the produced model.
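One common way to obtain such per-feature values, sketched below under simplifying assumptions, is permutation importance: a feature's values are rearranged across rows and the resulting increase in prediction error is taken as its importance. This is only one of several feature importance methods and is not attributed to any specific cited work. To keep the example deterministic, the column is cyclically shifted by one position instead of being randomly shuffled; the toy `model`, the data, and the function names are hypothetical.

```python
def mean_squared_error(model, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets):
    """Error increase observed when each feature column is permuted in turn."""
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        # Deterministic permutation for the sketch: cyclic shift by one row.
        shifted = column[1:] + column[:1]
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, shifted)]
        importances.append(mean_squared_error(model, permuted, targets) - baseline)
    return importances


def model(row):
    """Toy black box: it only uses feature 0 and ignores feature 1."""
    return 3.0 * row[0]


rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
targets = [3.0, 6.0, 9.0, 12.0]
importances = permutation_importance(model, rows, targets)
```

Here the ignored feature receives an importance of zero, while permuting the feature the model actually relies on degrades its predictions.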

For explaining models that work on images, the most frequently used technique is called the Saliency Map (SM). The SM is an image where the brightness and/or color of a pixel reflects how important the pixel is (it is typically visualized as a diverging color map). This way, it can be visualized whether and how strongly a given pixel in an image contributes to the given output of a model. The SM is typically modeled as a matrix whose dimensions are the sizes of the image being analyzed.
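The matrix view of a saliency map can be illustrated with a minimal occlusion-based sketch: each pixel is masked in turn and the change in the model output is recorded at the same position. Occlusion is just one assumed way of computing saliency (gradient-based methods are also widely used); the toy `model`, the 3×3 image, and the `fill` value are hypothetical.

```python
def occlusion_saliency(model, image, fill=0.0):
    """Saliency matrix with the same dimensions as the image.

    Entry (i, j) is the absolute change of the model output
    when pixel (i, j) is replaced by the fill value.
    """
    base = model(image)
    height, width = len(image), len(image[0])
    saliency = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            occluded = [row[:] for row in image]  # copy, then mask one pixel
            occluded[i][j] = fill
            saliency[i][j] = abs(base - model(occluded))
    return saliency


def model(image):
    """Toy scorer that only looks at the centre pixel and its right neighbour."""
    return 2.0 * image[1][1] + 1.0 * image[1][2]


image = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
saliency = occlusion_saliency(model, image)
```

Rendering `saliency` with a color map would highlight the two pixels the toy model depends on and leave the rest neutral.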

A concept similar to SM can be used to explain models that work on text data. When the SM is applied to a text, then every word in the text is assigned a color, which reflects the importance of a given word in the final output of a model.

An excellent overview of explanation methods in ML for various types of data is available in Bodria et al. (2021).

3 Special Section Content

This editorial paper overviews research topics covered in this special section of the Information Systems Frontiers journal. The special section contains papers invited from the 24th European Conference on Advances in Databases and Information Systems (ADBIS).

3.1 ADBIS Research Topics

The ADBIS conference has been running continuously since 1993. An overview of ADBIS past and present activities can be found in (Tsikrika and Manolopoulos, 2016) and at . ADBIS is considered among the core European conferences on practical and theoretical aspects of databases, data engineering, data management as well as information systems development and management. In this context, the most frequent research topics addressed by researchers submitting papers to ADBIS within the last ten years include: Data streams, Data models and modeling, Data cleaning and quality, Graph processing, Reasoning and intelligent systems, On-Line Analytical Processing, Software and systems, Ontologies and RDF, Algorithms, Indexing, Spatio-temporal data processing, Data integration, Query language and processing, Machine Learning.

All research topics covered within the last 10 editions (years 2012-2021) of ADBIS are visualized in Fig. 2. We constrained the analysis to this 10-year period in order to reflect recent research interests. Moreover, we analyzed only papers published in the LNCS volumes, to include the highest quality papers. In Fig. 2, the Y axis shows the total number of papers addressing a given topic (the median is equal to 7). Q1-Q4 represent the first, second, third, and fourth quartile, respectively.

Fig. 2 Research topics within the last 10 years of ADBIS (based on papers published only in LNCS volumes)

The three papers included in this special section address topics from Q3 and Q4, and thus represent frequent ADBIS topics. These papers cover: Graph processing, OLAP, and Machine Learning (marked in black in Fig. 2). These three topics are outlined in Section 2, whereas the papers included in this special section are summarized in Section 3. It is worth noting that the most frequent ADBIS topics reflect world research trends and follow the research topics of top world conferences in databases and data engineering, including SIGMOD, VLDB, and ICDE (Wrembel et al., 2019).

3.2 Papers in this Special Section

The first paper (Belayneh et al., 2022), Speeding Up Reachability Queries in Public Transport Networks Using Graph Partitioning, authored by Bezaye Tesfaye Belayneh, Nikolaus Augsten, Mateusz Pawlik, Michael H. Böhlen, and Christian S. Jensen, addresses the challenges discussed in Section 2.1 for a special case of temporal road network graphs and a special case of queries, namely, reachability queries over a public transport network.

An evaluation of such queries involves multiple computations of shortest paths with additional temporal constraints. Specifically, the connection time, calculated as the difference between the departure of the outgoing vehicle and the arrival of the previous incoming vehicle, is added to the length of a path. The problem is, in general, NP-hard. Therefore, an approximate algorithm is needed to solve the problem efficiently. To this end, the authors propose an algorithm based on graph partitioning: the problem is split into smaller problems. A set of boundary nodes is pre-calculated for each partition (called a cell in the paper). Shortest paths are found within each cell, and a path between cells is then chosen. Pre-calculated paths inside cells constitute an index that significantly speeds up the search. In the proposed evaluation, the search is limited to the startpoint and endpoint cells and to chains of cells, as the paths inside cells and the boundary nodes are pre-calculated. The partitioning provides locality, but of course actual performance depends on the choice of partitioning algorithm. The paper contains an in-depth performance analysis and a comparison of different partitioning algorithms.
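To give a feel for what a reachability query over a timetable computes, the sketch below answers the simplest variant with a plain scan of connections ordered by departure time. It is not the partition-based index proposed in the paper; it is an unindexed baseline written under several assumptions: times are plain numbers, every connection arrives strictly after it departs, and no minimum transfer time is required. The function names and the toy `timetable` are hypothetical.

```python
def earliest_arrival(connections, source, start_time):
    """Earliest arrival time at every stop reachable from source.

    connections is a list of (dep_stop, dep_time, arr_stop, arr_time) tuples,
    each assumed to arrive strictly after it departs.
    """
    arrival = {source: start_time}
    for dep_stop, dep_time, arr_stop, arr_time in sorted(connections, key=lambda c: c[1]):
        # A connection is usable only if we are already at its departure stop in time.
        if dep_stop in arrival and arrival[dep_stop] <= dep_time:
            if arr_time < arrival.get(arr_stop, float("inf")):
                arrival[arr_stop] = arr_time
    return arrival


def reachable(connections, source, target, start_time, deadline):
    """True if target can be reached from source within [start_time, deadline]."""
    arrival = earliest_arrival(connections, source, start_time)
    return arrival.get(target, float("inf")) <= deadline


# Hypothetical timetable; times are hours of the day.
timetable = [
    ("A", 8, "B", 9),
    ("B", 10, "C", 11),
    ("A", 9, "C", 13),
    ("B", 8, "C", 9),
]
arrivals = earliest_arrival(timetable, "A", 8)
```

Every query rescans the whole timetable in this baseline, which is the kind of repeated work that pre-calculated, per-cell information is meant to avoid.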

The second paper (Francia et al., 2022), entitled Enhancing Cubes with Models to Describe Multidimensional Data, by Matteo Francia, Patrick Marcel, Veronika Peralta, and Stefano Rizzi, presents a first step toward a proof of concept of the Intentional Analytics Model (IAM).

IAM mobilizes both Online Analytical Processing (OLAP) and various machine learning methods to allow users to express the so-called analysis intentions and obtain the so-called enhanced (annotated) data cube. Analysis intentions are expressed with five operators. The paper focuses on formalizing and implementing the describe operator, which describes cube measures. Enhanced cube cells are associated with interesting components of models (e.g., clustering models) that are automatically extracted from cubes. For example, cells containing outliers can be highlighted.

Moreover, the authors propose a measure to assess the interestingness of model components in terms of novelty, peculiarity and surprise during the user’s data navigation. A dataviz is also automatically produced by a heuristic to depict enhanced cubes, by coupling text-based representations (a pivot table and a ranked component list) and graphical representations, i.e., various possible charts. Eventually, the whole approach is evaluated through experiments that target efficiency, scalability, effectiveness, and formulation complexity.

The third paper (Ferrettini et al., 2022), entitled Coalitional Strategies for Efficient Individual Prediction Explanation, by Gabriel Ferrettini, Elodie Escriva, Julien Aligon, Jean-Baptiste Excoffier, and Chantal Soulé-Dupuy, addresses the problem of explaining machine learning models. The goal of this work was to develop a general method for facilitating the understanding of how a machine learning model works, with a particular focus on identifying groups of attributes that affect an ML model, i.e., the quality of prediction provided by the model.

The starting point of the investigation is the observation that attributes cannot be considered as independent of each other; therefore, it was required to verify the influence of all possible attribute combinations on the model quality. The influence of an attribute is measured according to its importance in each group the attribute can belong to. The complete influence of an attribute then takes into consideration its importance among all the possible attribute combinations. Computing the complete influence is of exponential complexity. For this reason, efficient methods for finding influential groups are needed.
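The exponential cost can be made concrete with a small sketch that enumerates every attribute combination. It uses the Shapley value weighting as one standard way of averaging an attribute's marginal contribution over all groups; treating the complete influence this way is an assumption made for illustration, and the `value` function (a stand-in for model quality on a group of attributes) and attribute names are hypothetical.

```python
from itertools import combinations
from math import factorial


def complete_influence(attributes, value):
    """Shapley-style influence of each attribute over all attribute subsets.

    value(frozenset_of_attributes) returns a quality score for that group.
    The number of value() evaluations grows exponentially with len(attributes).
    """
    n = len(attributes)
    influence = {}
    for attribute in attributes:
        others = [a for a in attributes if a != attribute]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                group = frozenset(subset)
                # Marginal contribution of the attribute to this group.
                total += weight * (value(group | {attribute}) - value(group))
        influence[attribute] = total
    return influence


def value(group):
    """Toy score: x and y only help together, z helps on its own."""
    score = 0.0
    if {"x", "y"} <= group:
        score += 0.5
    if "z" in group:
        score += 0.2
    return score


influence = complete_influence(["x", "y", "z"], value)
```

In this toy setting the interacting attributes `x` and `y` share their joint effect equally, which is the kind of dependence that an attribute-by-attribute analysis would miss.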

In this context, the paper describes a method for identifying groups of attributes that are crucial for the quality of an ML model. To this end, the authors proposed the so-called coalitions. A coalition includes those attributes that influence an ML model. In order to identify coalitions, the authors proposed to use the following techniques:

  • Model-based coalition, where interactions between attributes used in a model are detected by analyzing the usage of the attributes by the model. To this end, the values of attributes in an input data set are modified and it is observed how the model predictions vary.
  • PCA-based coalition, where the PCA method is applied to create a set of combined attributes, represented by a new attribute obtained from the PCA. This set is considered as an influential group of attributes.
  • Variance inflation factor-based coalition, where the standard variance inflation factor (VIF) is an estimation of the multicollinearity of the attributes in a dataset, w.r.t. a given target attribute. VIF is based on the R² coefficient of determination of the linear regression. Since the value of VIF is computed by means of a linear regression, this method is suitable for coalitions where a linear correlation between attributes exists.
  • Spearman correlation coefficient-based coalition, which takes into account non-linear correlations between attributes. The correlations are computed between all pairs of attributes and are represented by the Spearman coefficient.
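As an illustration of the last technique, the sketch below computes the Spearman coefficient for all pairs of attributes and reports the strongly correlated pairs as candidate groups. The no-ties assumption in the ranking, the threshold of 0.8, the attribute names, and the helper names are illustrative choices; how the paper turns pairwise correlations into coalitions is not reproduced here.

```python
def ranks(values):
    """Rank positions (1 = smallest); assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation, using the no-ties formula."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def correlated_pairs(columns, threshold):
    """All attribute pairs whose absolute Spearman coefficient reaches the threshold."""
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(spearman(columns[a], columns[b])) >= threshold
    ]


# Hypothetical data set: income grows non-linearly but monotonically with age.
columns = {
    "age": [1.0, 2.0, 3.0, 4.0, 5.0],
    "income": [1.0, 8.0, 27.0, 64.0, 125.0],
    "noise": [3.0, 1.0, 5.0, 2.0, 4.0],
}
pairs = correlated_pairs(columns, threshold=0.8)
```

Because the coefficient works on ranks, the cubic relationship between `age` and `income` is detected as a perfect correlation, whereas a linear measure would rate it lower.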

These methods were evaluated by extensive experiments on multiple data sets provided by, for two classification algorithms, namely Random Forest and Support Vector Machine. As the baseline, the so-called complete method was selected. The obtained results show that the proposed methods provided promising performance characteristics in terms of computation time and model accuracy.

Abelló, A., Romero, O., Pedersen, T.B., Llavori, R.B., Nebot, V., Cabo, M.J.A., & Simitsis, A. (2015). Using semantic web technologies for exploratory OLAP: a survey. IEEE Transactions on Knowledge and Data Enginering , 27 (2), 571–588.

Article   Google Scholar  

Ali, S.M.F., & Wrembel, R. (2017). From conceptual design to performance optimization of ETL workflows: current state of research and open problems. The VLDB Journal , 26 (6), 777–801.

Angles, R., Arenas, M., Barcelo, P., Boncz, P., Fletcher, G., Gutierrez, C., Lindaaker, T., Paradies, M., Plantikow, S., Sequeda, J., van Rest, O., & Voigt, H. (2018). G-core: a core for future graph query languages. In ACM SIGMOD Int. Conf. on management of data (pp. 1421–1432).

Azzini, A.S.B. Jr, Bellandi, V., Catarci, T., Ceravolo, P., Cudré-mauroux, P., Maghool, S., Pokorný, J., Scannapieco, M., Sédes, F., Tavares, G.M., & Wrembel, R. (2021). Advances in data management in the big data era. In Advancing research in information and communication technology, IFIP AICT , (Vol. 600 pp. 99–126). Springer.

Belayneh, B.T., Augsten, N., Pawlik, M., Böhlen, M. H., & Jensen, C.S. (2022). Speeding up reachability queries in public transport networks using graph partitioning. Inf. Syst Frontiers 24 (1). .

Bengio, Y., Lecun, Y., & Hinton, G. (2021). Deep learning for ai. Communcations of the ACM , 64 (7), 58–65.

Biggio, B., Diaz, C., Paulheim, H., & Saukh, O. (2021). Big minds sharing their vision on the future of ai (panel). In Database and expert systems applications (DEXA), LNCS , Vol. 12923. Springer.

Bilalli, B., Abelló, A., Aluja-banet, T., & Wrembel, R. (2019). PRESISTANT: learning based assistant for data pre-processing. Data & Knowledge Engineering 123.

Bodria, F., Giannotti, F., Guidotti, R., Naretto, F., Pedreschi, D., & Rinzivillo, S. (2021). Benchmarking and survey of explanation methods for black box models. arXiv: 2102.13076 .

Bogatu, A., Paton, N.W., Fernandes, A.A.A., & Koehler, M. (2019). Towards automatic data format transformations: Data wrangling at scale. The Computer Journal , 62 (7), 1044–1060.

Bouakkaz, M., Ouinten, Y., Loudcher, S., & Strekalova, Y.A. (2017). Textual aggregation approaches in OLAP context: a survey. Int. Journal of Information Management , 37 (6), 684–692.

Brdjanin, D., Banjac, D., Banjac, G., & Maric, S. (2018). An online business process model-driven generator of the conceptual database model. In Int. Conf. on web intelligence, mining and semantics .

Chen, H., Wu, B., Deng, S., Huang, C., Li, C., Li, Y., & Cheng, J. (2020). High performance distributed OLAP on property graphs with grasper. In Int. Conf. on management of data, SIGMOD (pp. 2705–2708). ACM.

Codd, E., Codd, S., & Salley, C. (1993). Providing OLAP to User-Analysts: an IT mandate. E.F. Codd & Associates.

Cuzzocrea, A. (2015). Data warehousing and OLAP over Big Data: a survey of the state-of-the-art, open problems and future challenges. Int. Journal of Business Process Integration and Management , 7 (4), 372–377.

Danso, S.O., Zeng, Z., Muniz, G.T., & Ritchie, C. (2021). Developing an explainable machine learning-based personalised dementia risk prediction model: a transfer learning approach with ensemble learning algorithms. Frontiers in Big Data, 4, 613047.

Deutsch, A., & Papakonstantinou, Y. (2018). Graph data models, query languages and programming paradigms. Proc. VLDB Endow. , 11 (12), 2106–2109.

Deutsch, A., Xu, Y., Wu, M., & Lee, V.E. (2020). Aggregation support for modern graph analytics in tigergraph. In ACM SIGMOD Int. Conf. on management of data (pp. 377–392).

Du, M., Liu, N., & Hu, X. (2019). Techniques for interpretable machine learning. Communications of the ACM, 63(1), 68–77.

Ferrettini, G., Escriva, E., Aligon, J., Excoffier, J.B., & Soulé-Dupuy, C. (2022). Coalitional strategies for efficient individual prediction explanation. Information Systems Frontiers, 24(1).

Francia, M., Marcel, P., Peralta, V., & Rizzi, S. (2022). Enhancing cubes with models to describe multidimensional data. Information Systems Frontiers, 24(1).

García-Magariño, I., Rajarajan, M., & Lloret, J. (2019). Human-centric AI for trustworthy IoT systems with explainable multilayer perceptrons. IEEE Access, 7, 125562–125574.

Garcia-Molina, H., Ullman, J.D., & Widom, J. (2009). Database systems - the complete book . London: Pearson Education.

Ghrab, A., Romero, O., Jouili, S., & Skhiri, S. (2018). Graph BI & analytics: Current state and future challenges. In Int. Conf. on big data analytics and knowledge discovery DAWAK, LNCS , (Vol. 11031 pp. 3–18). Springer.

Ghrab, A., Romero, O., Skhiri, S., & Zimányi, E. (2021). Topograph: an end-to-end framework to build and analyze graph cubes. Information Systems Frontiers , 23 (1), 203–226.

Giovanelli, J., Bilalli, B., & Abelló, A. (2021). Effective data pre-processing for automl. In Int. Workshop on design, optimization, languages and analytical processing of big data (DOLAP), CEUR workshop proceedings , (Vol. 2840 pp. 1–10).

Goebel, R., Chander, A., Holzinger, K., Lécué, F., Akata, Z., Stumpf, S., Kieseberg, P., & Holzinger, A. (2018). Explainable AI: the new 42?. In IFIP TC 5 Int. Cross-domain conf. on machine learning and knowledge extraction CD-MAKE, LNCS , (Vol. 11015 pp. 295–303). Springer.

Griffith, R.L. (1982). Three principles of representation for semantic networks. ACM Transactions on Database Systems 417–442.

Hai, R., Quix, C., & Jarke, M. (2021). Data lake concept and systems: a survey. arXiv: 2106.09592.

Han, J. (1997). OLAP Mining: Integration of OLAP with data mining. In Conf. on database semantics (DS), IFIP conference proceedings , (Vol. 124 pp. 3–20).

Han, J. (2017). OLAP, spatial. In Encyclopedia of GIS (pp. 809–812). Berlin: Springer.

Hernández, Á.B., Pérez, M.S., Gupta, S., & Muntés-Mulero, V. (2018). Using machine learning to optimize parallelism in big data applications. Future Gener. Comput. Syst., 86, 1076–1092.

Holzschuher, F., & Peinl, R. (2013). Performance of graph query languages: Comparison of cypher, gremlin and native access in neo4j. In Joint EDBT/ICDT workshops (pp. 195–204).

Karimi, M., Wu, D., Wang, Z., & Shen, Y. (2021). Explainable deep relational networks for predicting compound-protein affinities and contacts. Journal of Chemical Information and Modeling , 61 (1), 46–66.

Kedziora, D.J., Musial, K., & Gabrys, B. (2020). AutonoML: Towards an integrated framework for autonomous machine learning. arXiv: 2012.12600.

Koehler, M., Abel, E., Bogatu, A., Civili, C., Mazilu, L., Konstantinou, N., Fernandes, A.A.A., Keane, J.A., Libkin, L., & Paton, N.W. (2021). Incorporating data context to cost-effectively automate end-to-end data wrangling. IEEE Transactions on Big Data , 7 (1), 169–186.

Konstantinou, N., & Paton, N.W. (2020). Feedback driven improvement of data preparation pipelines. Information Systems , 92 , 101480.

Krensky, P., & Idoine, C. (2021). Magic quadrant for data science and machine learning platforms. Gartner.

Langer, M., Oster, D., Speith, T., Hermanns, H., Kästner, L., Schmidt, E., Sesing, A., & Baum, K. (2021). What do we want from explainable artificial intelligence (XAI)? - a stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research. Artificial Intelligence, 296, 103473.

Liang, Y., Li, S., Yan, C., Li, M., & Jiang, C. (2021). Explaining the black-box model: a survey of local interpretation methods for deep neural networks. Neurocomputing , 419 , 168–182.

Loudcher, S., Jakawat, W., Soriano-Morales, E.P., & Favre, C. (2015). Combining OLAP and information networks for bibliographic data analysis: a survey. Scientometrics , 103 (2), 471–487.

Mahbooba, B., Timilsina, M., Sahal, R., & Serrano, M. (2021). Explainable artificial intelligence (XAI) to enhance trust management in intrusion detection systems using decision tree model. Complexity , 2021 , 6634811:1–6634811:11.

Mahboubi, H., Hachicha, M., & Darmont, J. (2009). XML warehousing and OLAP. In Encyclopedia of Data Warehousing and Mining, Second Edition (Vol. IV, pp. 2109–2116). IGI Publishing.

Maniatis, A.S. (2004). The case for mobile OLAP. In Current trends in database technology – EDBT workshops, LNCS , (Vol. 3268 pp. 405–414).

Marketos, G., & Theodoridis, Y. (2010). Ad-hoc OLAP on Trajectory Data. In Int. Conf. on mobile data management (MDM) (pp. 189–198).

McHugh, J., Abiteboul, S., Goldman, R., Quass, D., & Widom, J. (1997). Lore: a database management system for semistructured data. SIGMOD Record , 26 (3), 54–66.

Meusel, R., Vigna, S., Lehmberg, O., & Bizer, C. (2014). Graph structure in the web — revisited: a trick of the heavy tail. In Int. Conf. on world wide web (pp. 427–432).

Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence , 267 , 1–38.

Moldovan, D.I. (1984). An associative array architecture intended for semantic network processing. In Annual conf. of the ACM on the fifth generation challenge (pp. 212–221). ACM.

Moradi, M., & Samwald, M. (2021). Explaining black-box models for biomedical text classification. IEEE Journal of Biomedical and Health Informatics , 25 (8), 3112–3120.

Ohana, J., Ohana, S., Benhamou, E., Saltiel, D., & Guez, B. (2021). Explainable AI (XAI) models applied to the multi-agent environment of financial markets. In Explainable and transparent AI and multi-agent systems, lecture notes in computer science , (Vol. 12688 pp. 189–207). Springer.

Panetta, K. (2020). Gartner top strategic technology trends for 2021. Gartner.

Peng, P., Zou, L., Özsu, M.T., Chen, L., & Zhao, D. (2016). Processing SPARQL queries over distributed RDF graphs. The VLDB Journal, 25, 243–268.

Pokorný, J. (2016). Conceptual and database modelling of graph databases. In Int. Symp. on database engineering and application systems (IDEAS) (pp. 370–377).

Quemy, A. (2019). Data pipeline selection and optimization. In Int. Workshop on design, optimization, languages and analytical processing of big data, CEUR workshop proceedings , Vol. 2324.

Quemy, A. (2020). Two-stage optimization for machine learning workflow. Information Systems , 92 , 101483.

Richardson, J., Schlegel, K., Sallam, R., Kronz, A., & Sun, J. (2021). Magic quadrant for analytics and business intelligence platforms. Gartner.

Romero, O., & Wrembel, R. (2020). Data engineering for data science: Two sides of the same coin. In Int. Conf. on big data analytics and knowledge discovery DAWAK, LNCS (Vol. 12393, pp. 157–166). Springer.

Romero, O., Wrembel, R., & Song, I. (2020). An alternative view on data processing pipelines from the DOLAP 2019 perspective. Information Systems, 92.

Salka, C. (1998). Ending the MOLAP/ROLAP debate: Usage based aggregation and flexible HOLAP. In Int. Conf. on data engineering (ICDE) (p. 180).

Sardianos, C., Varlamis, I., Chronis, C., Dimitrakopoulos, G., Alsalemi, A., Himeur, Y., Bensaali, F., & Amira, A. (2021). The emergence of explainability of intelligent systems: Delivering explainable and personalized recommendations for energy efficiency. Int. Journal of Intelligent Systems, 36(2), 656–680.

Sawadogo, P.N., & Darmont, J. (2021). On data lake architectures and metadata management. Journal of Intelligent Information Systems , 56 (1), 97–120.

Schuetz, C.G., Bozzato, L., Neumayr, B., Schrefl, M., & Serafini, L. (2021). Knowledge graph OLAP. Semantic Web , 12 (4), 649–683.

Tsikrika, T., & Manolopoulos, Y. (2016). A retrospective study on the 20 years of the ADBIS conference. In New trends in databases and information systems, communications in computer and information science , (Vol. 637 pp. 1–15). Springer.

Vassiliadis, P., & Sellis, T.K. (1999). A survey of logical models for OLAP databases. SIGMOD Record , 28 (4), 64–69.

Witt, C., Bux, M., Gusew, W., & Leser, U. (2019). Predictive performance modeling for distributed batch processing using black box monitoring and machine learning. Information Systems , 82 , 33–52.

Wrembel, R., Abelló, A., & Song, I. (2019). DOLAP Data warehouse research over two decades: Trends and challenges. Information Systems , 85 , 44–47.



The Guest Editors thank all friends and colleagues who contributed to the success of this special section. We appreciate the effort of all the authors who were attracted by the topics of the ADBIS conference and submitted scientific contributions.

Special thanks go to the Editors-In-Chief Prof. Ram Ramesh and Prof. H. Raghav Rao, for offering this special section to the ADBIS conference and to the Springer staff, namely to Kristine Kay Canaleja and Aila O. Asejo-Nuique, for efficient cooperation.

We appreciated the work done by the reviewers who offered their expertise in assessing the quality of the submitted papers by providing constructive comments to the authors. The list of reviewers includes:

–Andras Benczur (Eötvös Loránd University, Hungary)

–Paweł Boiński (Poznan University of Technology, Poland)

–Omar Boussaid (Université Lyon 2, France)

–Theo Härder (Technical University Kaiserslautern, Germany)

–Petar Jovanovic (Universitat Politècnica de Catalunya, Spain)

–Sebastian Link (University of Auckland, New Zealand)

–Angelo Montanari (University of Udine, Italy)

–Kestutis Normantas (Vilnius Gediminas Technical University, Lithuania)

–Carlos Ordonez (University of Houston, USA)

–Szymon Wilk (Poznan University of Technology, Poland)

–Vladimir Zadorozhny (University of Pittsburgh, USA)

Author information

Authors and affiliations.

Université Lumière Lyon 2, Lyon, France

Jérôme Darmont

HSE University, Saint Petersburg, Russia

Boris Novikov

Poznan University of Technology, Poznan, Poland

Robert Wrembel

École Nationale Supérieure de Mécanique et d’Aérotechnique, Poitiers, France

Ladjel Bellatreche


Corresponding author

Correspondence to Robert Wrembel .


About this article

Darmont, J., Novikov, B., Wrembel, R. et al. Advances on Data Management and Information Systems. Inf Syst Front 24 , 1–10 (2022).


Accepted : 22 December 2021

Published : 02 March 2022

Issue Date : February 2022



Advances in database systems education: Methods, tools, curricula, and way forward

Muhammad Ishaq

1 Department of Computer Science, National University of Computer and Emerging Sciences, Lahore, Pakistan

2 Department of Computer Science, Virtual University of Pakistan, Lahore, Pakistan

3 Department of Computer Science, University of Management and Technology, Lahore, Pakistan

Muhammad Shoaib Farooq

Muhammad Faraz Manzoor

4 Department of Computer Science, Lahore Garrison University, Lahore, Pakistan

Uzma Farooq

Kamran Abid

5 Department of Electrical Engineering, University of the Punjab, Lahore, Pakistan

Mamoun Abu Helou

6 Faculty of Information Technology, Al Istiqlal University, Jericho, Palestine

Associated Data

Not Applicable.

Fundamentals of Database Systems is a core course in computing disciplines, as almost all small, medium, large, or enterprise systems require a data storage component. Database System Education (DSE) provides the foundation as well as advanced concepts in the area of data modeling and its implementation. The first course in DSE plays a pivotal role in developing students’ interest in this area. Over the years, researchers have devised several tools and methods to teach this course effectively, and have also been revisiting the curricula for database systems education. This study presents a Systematic Literature Review (SLR) that distills the existing literature on DSE to discuss these three perspectives (tools, methods, and curricula) for the first course in database systems. The SLR also discusses how teaching and learning assistant tools, teaching and assessment methods, and database curricula have evolved over the years in response to rapid changes in database technology. To this end, more than 65 articles related to DSE published between 1995 and 2022 were shortlisted through a structured mechanism and reviewed to address these objectives. The article also provides useful guidelines for instructors and discusses ideas to extend this research from several perspectives. To the best of our knowledge, this is the first research work that presents a broad review of the research conducted in the area of DSE.


Database systems play a pivotal role in the successful implementation of the information systems to ensure the smooth running of many different organizations and companies (Etemad & Küpçü, 2018 ; Morien, 2006 ). Therefore, at least one course about the fundamentals of database systems is taught in every computing and information systems degree (Nagataki et al., 2013 ). Database System Education (DSE) is concerned with different aspects of data management while developing software (Park et al., 2017 ). The IEEE/ACM computing curricula guidelines endorse 30–50 dedicated hours for teaching fundamentals of design and implementation of database systems so as to build a very strong theoretical and practical understanding of the DSE topics (Cvetanovic et al., 2010 ).

In practice, most universities offer one user-oriented course at the undergraduate level that covers data modeling and design, querying, and a limited number of hours on theory (Conklin & Heinrichs, 2005; Robbert & Ricardo, 2003), and it is often debated whether to follow a design-first or a query-first approach. Furthermore, to keep the course contents up to date, recent trends such as big data and NoSQL should also be introduced in this basic course (Dietrich et al., 2008; Garcia-Molina, 2008). The graduate course, by contrast, is more theoretical and includes topics related to DB architecture, transactions, concurrency, reliability, distribution, parallelism, replication, and query optimization, along with some specialized classes.

Researchers have designed a variety of tools to make the concepts of the introductory database course more interesting and easier to teach and learn interactively (Brusilovsky et al., 2010), either using visual support (Nagataki et al., 2013) or with the help of gamification (Fisher & Khine, 2006). Similarly, instructors have been improvising different methods to teach (Abid et al., 2015; Domínguez & Jaime, 2010) and evaluate (Kawash et al., 2020) this theoretical and practical course. Emerging topics such as cloud computing and big data have also created the need to revise the curriculum and the methods used to teach DSE (Manzoor et al., 2020).

The research in database systems education has evolved over the years with respect to modern contents influenced by technological advancements, supportive tools that engage learners, and improvisations in teaching and assessment methods. In recent years in particular, there has been a shift from self-describing data-driven systems to a problem-driven paradigm, that is, a bottom-up approach in which data exists before being designed. This paradigm mainly relies on scientific, quantitative, and empirical methods for building models, while pushing the boundaries of typical data management by involving mathematics, statistics, data mining, and machine learning, thus opening a multidisciplinary perspective. Hence, it is important to devote a few lectures to introducing the relevance of such advanced topics.

Researchers have provided useful review articles on other areas, including introductory programming languages (Mehmood et al., 2020), the use of gamification (Obaid et al., 2020), research trends in the use of the enterprise service bus (Aziz et al., 2020), and the role of IoT in agriculture (Farooq et al., 2019, 2020). However, to the best of our knowledge, no such study exists in the area of database systems education. Therefore, this study discusses research work published in different areas of database systems education, involving the curricula, tools, and approaches that have been proposed to teach an introductory course on database systems effectively. The rest of the article is structured as follows: Sect. 2 presents related work and compares the related surveys with this study. Section 3 presents the research methodology for this study. Section 4 analyses the major findings of the literature reviewed in this research and categorizes them into different important aspects. Section 5 presents advice for instructors and future directions. Lastly, Sect. 6 concludes the article.

Related work

Systematic Literature Reviews have been found to be a very useful artifact for covering and understanding a domain. A number of interesting review studies have been found in different fields (Farooq et al., 2021 ; Ishaq et al., 2021 ). Review articles are generally categorized into narrative or traditional reviews (Abid et al., 2016 ; Ramzan et al., 2019 ), systematic literature review (Naeem et al., 2020 ) and meta reviews or mapping study (Aria & Cuccurullo, 2017 ; Cobo et al., 2012 ; Tehseen et al., 2020 ). This study presents a systematic literature review on database system education.

Database systems education has been discussed from many different perspectives, which include teaching and learning methods, curriculum development, and the facilitation of instructors and students through different tools. For instance, a number of research articles have focused on developing tools for teaching database systems courses (Abut & Ozturk, 1997; Connolly et al., 2005; Pahl et al., 2004). Furthermore, a few authors have evaluated DSE tools by conducting surveys and empirical experiments to gauge the effectiveness of these tools and their degree of acceptance among the important stakeholders, teachers and students (Brusilovsky et al., 2010; Nelson & Fatimazahra, 2010). Some case studies have also been used to evaluate the effectiveness of the improvised approaches and developed tools. For example, Regueras et al. (2007) presented a case study using the QUEST system, in which e-learning strategies are used to teach the database course at the undergraduate level, while Myers and Skinner (1997) identified the conflicts that arise when textbook theories on database development do not work for specific applications.

Another important facet of DSE research focuses on curriculum design and evolution for database systems. Several authors (Alrumaih, 2016; Bhogal et al., 2012; Cvetanovic et al., 2010; Sahami et al., 2011) have proposed improvements to the database curriculum for a better understanding of DSE among students, while also keeping evolving technology in perspective. Similarly, Mingyu et al. (2017) have shared their experience in reforming the DSE curriculum by adding topics related to Big Data. A few authors have also developed and evaluated different tools to help instructors teach DSE.

Further studies focus on other aspects, including specialized tools for specific topics in DSE (Mcintyre et al., 1995; Nelson & Fatimazahra, 2010). For instance, Mcintyre et al. (1995) conducted a survey about using state-of-the-art software tools to teach advanced relational database design courses at Cleveland State University. However, the authors did not discuss DSE curricula and pedagogy in their study. Similarly, Nelson and Fatimazahra (2010) conducted a review to highlight that a basic knowledge of databases is important for students of computer science as well as for those belonging to other domains. They highlighted the issues encountered while teaching the database course in universities and suggested that instructors investigate these difficulties so as to make the course more effective for students. Although these authors have discussed and analyzed the tools used to teach databases, the tools are yet to be categorized according to different methods and research types within DSE. There is also an interesting systematic mapping study by Taipalus and Seppänen (2020) that focuses on teaching SQL, which is a specific topic of DSE. They categorized the selected primary studies into six categories based on their research types. They also applied directed content analysis, which yielded topics such as student errors in query formulation, characteristics and presentation of the exercise database, specific or non-specific teaching approach suggestions, patterns and visualization, and easing teacher workload.

Another relevant study, focusing on collaborative learning techniques for teaching the database course, was conducted by Martin et al. (2013). This research discusses collaborative learning techniques and adapts them for the introductory database course at the Barcelona School of Informatics. The motive of the authors was to introduce active learning methods to improve learning and encourage the acquisition of competence. However, the study focused on only a few methods for teaching the database systems course, while other important perspectives, including database curricula and tools for teaching DSE, were not discussed.

The above discussion shows that a considerable amount of research has been conducted in the field of DSE to propose various teaching methods; to develop and test different supportive tools, techniques, and strategies; and to improve the curricula for DSE. However, to the best of our knowledge, no study puts all these relevant aspects together while also classifying and discussing the supporting methods and techniques. This review is considerably different from previous studies. Table 1 highlights the differences between this study and other relevant studies in the field of DSE, using the ✓ and – symbols to denote "included" and "not included" respectively. Therefore, this study aims to conduct a systematic mapping study on DSE that focuses on compiling, classifying, and discussing the existing work related to pedagogy, supporting tools, and curricula.

Comparison with other related research articles

Research methodology

To serve the principal aim of this study, which is to review the research conducted in the area of database systems education, the search for relevant papers followed guidance from established methods described in various studies (Elberzhager et al., 2012; Keele et al., 2007; Mushtaq et al., 2017). Research objectives were formulated first, and appropriate research questions and a search strategy were then derived from them, as shown in Fig. 1.

Research objectives

The following are the research objectives of this study:

  • i. To find high-quality research work in DSE.
  • ii. To categorize the different aspects of DSE covered by other researchers in the field.
  • iii. To provide a thorough discussion of the existing work, offering useful information in the form of evolution, teaching guidelines, and future research directions for instructors.

Research questions

To fulfill the research objectives, relevant research questions have been formulated. These questions, along with their motivations, are presented in Table 2.

Study selection results

Search strategy

The following search string was used to find relevant articles for this study: “Database” AND (“System” OR “Management”) AND (“Education*” OR “Train*” OR “Tech*” OR “Learn*” OR “Guide*” OR “Curricul*”).

Articles have been taken from different sources, namely IEEE, Springer, ACM, Science Direct, and other well-known journals and conferences such as Wiley Online Library, PLOS, and ArXiv. Planning the search for primary studies in the field of DSE is a vital task.
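The Boolean structure of the search string can be sketched in a few lines of Python. This is only an illustration: the names `SEARCH_GROUPS`, `term_to_regex`, and `matches_search_string` are hypothetical, and reading a trailing `*` as a prefix wildcard and every other term as an exact whole word is an assumption, since each portal applies its own matching rules.

```python
import re

# Each inner list is an OR-group; the groups are AND-ed together.
# A trailing "*" is treated as a prefix wildcard (an assumption, since
# each digital library defines its own wildcard behaviour).
SEARCH_GROUPS = [
    ["Database"],
    ["System", "Management"],
    ["Education*", "Train*", "Tech*", "Learn*", "Guide*", "Curricul*"],
]


def term_to_regex(term):
    """Compile one search term into a case-insensitive whole-word pattern."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def matches_search_string(text, groups=SEARCH_GROUPS):
    """Return True if every AND-group has at least one term found in text."""
    return all(
        any(term_to_regex(term).search(text) for term in group)
        for group in groups
    )
```

Under this strict reading, a title or abstract must contain "database", one of "system" or "management", and at least one education-related term to be retrieved.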

Study selection

A total of 29,370 initial studies were found. These articles went through a selection process, and two authors were designated to shortlist the articles based on the defined inclusion criteria, as shown in Fig. 2. Their conflicts were resolved by involving a third author, and the inclusion/exclusion criteria were refined after resolving the conflicts, as shown in Table 3. A Cohen’s Kappa coefficient of 0.89 was observed between the two authors who selected the articles, which reflects almost perfect agreement between them (Landis & Koch, 1977). The number of papers at each stage of the selection process for all involved portals is presented in Table 4.
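For readers unfamiliar with the statistic, the following minimal sketch shows how Cohen’s Kappa is computed from two raters’ decisions and how a value is mapped to the Landis and Koch (1977) agreement bands. The function names and the ten include/exclude decisions are hypothetical illustrations, not the study’s data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)


def landis_koch_band(kappa):
    """Agreement band for a kappa value, following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical decisions for ten papers (not taken from the study).
author_1 = ["in", "in", "in", "in", "out", "out", "out", "out", "out", "out"]
author_2 = ["in", "in", "in", "out", "out", "out", "out", "out", "out", "out"]
example_kappa = cohens_kappa(author_1, author_2)
```

With this banding, the reported coefficient of 0.89 falls in the "almost perfect" range, matching the interpretation given in the text.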

Selection criteria

Title-based search: Papers that are irrelevant based on their title are manually excluded in the first stage. A large portion of the papers was irrelevant at this stage, and only 609 papers remained after it.

Abstract-based search: At this stage, the abstracts of the papers selected in the previous stage are studied, and the papers are categorized for analysis along with their research approach. After this stage, only 152 papers were left.

Full-text-based analysis: At this stage, the empirical quality of the articles selected in the previous stage is evaluated by analyzing the full text of each article. A total of 70 papers were selected from the 152 as primary studies. The questions listed under the quality assessment criteria were defined to guide the final data extraction.
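The selection funnel described in these stages can be summarized with a short calculation. The sketch below uses only the counts stated in the text; the stage labels and the `retention` helper are hypothetical names chosen for illustration.

```python
# Paper counts after each stage, as reported in the text.
STAGES = [
    ("initial search", 29370),
    ("title-based search", 609),
    ("abstract-based search", 152),
    ("full-text-based analysis", 70),
]


def retention(stages):
    """Return (stage, kept, excluded, share kept of previous stage) per step."""
    rows = []
    for (_, previous), (name, kept) in zip(stages, stages[1:]):
        rows.append((name, kept, previous - kept, kept / previous))
    return rows


for name, kept, excluded, share in retention(STAGES):
    print(f"{name}: kept {kept}, excluded {excluded} ({share:.2%} retained)")

# Share of the initial hits that ended up as primary studies.
overall_share = STAGES[-1][1] / STAGES[0][1]
```

The calculation makes explicit that the title-based stage removes the vast majority of the initial hits, while the later stages each retain a noticeably larger share of what reaches them.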

Quality assessment criteria

The following criteria were used to assess the quality of the selected primary studies. This quality assessment was conducted by two authors, as explained above.

  • The study focuses on curricula, tools, approaches, or assessments in DSE: Yes (1), No (0).
  • The study presents a solution to a problem in DSE: Yes (1), Partially (0.5), No (0).
  • The study focuses on empirical results: Yes (1), No (0).
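As a rough illustration, the three listed criteria can be combined into a per-paper score as follows. The names `ALLOWED_ANSWERS` and `quality_score` are hypothetical. Note also that the score ranges reported later in the text go above 3, so the study’s overall score appears to include a further component that is not listed here; this sketch covers only the three criteria above.

```python
# Permitted answers per criterion, taken from the list above.
ALLOWED_ANSWERS = {
    "focus": {0.0, 1.0},          # curricula, tools, approaches, or assessments
    "solution": {0.0, 0.5, 1.0},  # presents a solution to a problem in DSE
    "empirical": {0.0, 1.0},      # focuses on empirical results
}


def quality_score(focus, solution, empirical):
    """Sum the three criterion scores after validating each answer."""
    answers = {"focus": focus, "solution": solution, "empirical": empirical}
    for name, value in answers.items():
        if float(value) not in ALLOWED_ANSWERS[name]:
            raise ValueError(f"invalid answer {value!r} for criterion {name!r}")
    return float(sum(answers.values()))


# A hypothetical paper that is on topic, partially presents a solution,
# and reports empirical results.
example_score = quality_score(focus=1, solution=0.5, empirical=1)
```

Validating the answers up front keeps a mistyped value, such as 0.5 for a Yes/No criterion, from silently skewing a paper’s score.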

Score pattern of publication channels

Almost 50.00% of the papers scored above the average range, and 33.33% scored within the average range, i.e., 2.50–3.50. Some articles with a score below 2.50 have also been included in this study, as they present useful information and were published in education-based journals. These studies also discuss important demographic and technology-based aspects that are directly related to DSE.

Threats to validity

The validity of this study could be influenced by the following factors during the literature review.

Construct validity

In this study, construct validity concerns the identification of primary studies (Elberzhager et al., 2012). To ensure that as many primary studies as possible were included, two authors proposed candidate search keywords over multiple iterations. The search string comprises different terms related to database systems (DS) and education. The list of terms might nevertheless be incomplete, and alternative terms could change the final count of papers found (Ampatzoglou et al., 2013). The search was conducted in the IEEE digital library, Science Direct, the ACM digital library, Wiley Online Library, PLOS, ArXiv, and Google Scholar. Based on the statistics of literature search engines, we believe that most of the relevant research can be found in these digital libraries (Garousi et al., 2013). The researchers also searched for related papers in the main DS research venues (VLDB, ICDM, EDBT) to minimize the risk of missing important publications.

Including papers that do not belong to top journals or conferences may reduce the quality of the primary studies in this research, but it improves their representativeness. Accordingly, certain papers that were not from top publication sources are included because of their relevance to the literature, even though they reduce the average score of the primary studies. Improper handling of duplicate papers could also have altered the results; the cases of duplication that were found were inspected to determine whether they reported the same study. The two authors who conducted the search made the final decision on paper selection; where they disagreed, they discussed the paper until an agreement was reached.

Internal validity

This validity deals with data extraction and analysis (Elberzhager et al., 2012). Two authors carried out the data extraction and the classification of primary studies, and conflicts between them were resolved by involving a third author. The Kappa coefficient was 0.89; according to Landis and Koch (1977), this value indicates an almost perfect level of agreement between the authors, which reduces this threat significantly.

Conclusion validity

This threat deals with the identification of improper results which may cause the improper conclusions. In this case this threat deals with the factors like missing studies and wrong data extraction (Ampatzoglou et al., 2013 ). The objective of this is to limit these factors so that other authors can perform study and produce the proper conclusions (Elberzhager et al., 2012 ).

The interpretation of results might be affected by the selection, classification and analysis of the primary studies. To minimize this threat, the previous section clearly described each step performed in primary study selection and data extraction. Traceability between the results and the extracted data is supported through the different charts. In our view, slight differences arising from publication selection and misclassification would not alter the main results.

External validity

This threat deals with the generalization of this research (Mateo et al., 2012). The results of this study were considered only in relation to the DSE field, and the validity of the conclusions drawn from this study concerns only the DSE context. The representativeness of the selected studies was not affected, because no time restriction was placed on the published research. Therefore, this external validity threat does not apply in the context of this research. DS researchers can take the search string and the paper classification scheme presented in this study as a starting point, and more papers can be searched and categorized according to this scheme.

Analysis of compiled research articles

This section presents the analysis of the compiled research articles carefully selected for this study. It presents the findings with respect to the research questions described in Table 2.

Selection results

A total of 70 papers were identified and analyzed to answer the RQs described above. Table 6 lists the selected papers with details of the classification results and their quality assessment scores.

Classification and quality assessment of selected articles

RQ1. Categorization of research work in DSE field

The analysis in this study reveals that the literature can be categorized as follows (Fig. 3): Tools, any additional application that helps instructors in teaching and students in learning; Methods, any improvisation aimed at improving pedagogy or cognition; and Curriculum, the course content domains and their relative importance in a degree program.

Fig. 3 Taxonomy of DSE study types

Most of the articles provide a solution by gathering data and demonstrate the novelty of their research through results. These papers are categorized as experiments with respect to their research type. Some of them are case study papers, which are used to generate an in-depth, multifaceted understanding of a complex issue in its real-life context, while a few others are review studies analyzing previously used approaches. A majority of the included articles evaluated their results with the help of experiments, while others conducted reviews to establish an opinion, as shown in Fig. 4.

Fig. 4 Cross mapping of DSE study types and research types

Educational tools, especially those related to technology, are entering the market faster than ever before (Calderon et al., 2011). The transition to active learning approaches, with the learner more engaged in the process rather than passively taking in information, necessitates a variety of tools to help ensure success. As with most educational initiatives, time should be taken to consider the goals of the activity, the type of learners, and the tools needed to meet the goals. Constant reassessment of tools is important to discover innovation and reforms that improve teaching and learning (Irby & Wilkerson, 2003). For this purpose, various types of educational tools, such as interactive, web-based and game-based tools, have been introduced to help instructors explain topics more effectively.

The inclusion of technology in the classroom may help learners compete in the job market at the start of their careers. Instructors should acknowledge that students are more interested in using technology to learn database courses than in being taught solely through traditional theory, project, and practice-based methods (Adams et al., 2004). With these aspects in view, many authors have conducted significant research, including on web-based and interactive tools, to help learners gain a better understanding of basic database concepts.

Considerable research has focused on student learning. In this study, we discuss tools that support student learning under two major objectives: tools that prove to be more helpful than other tools, and proposed tools that achieve the same outcome as the traditional classroom environment. For example, Abut and Ozturk (1997) proposed an interactive classroom environment for conducting database classes. Online tools such as an electronic “Whiteboard”, electronic textbooks and advanced telecommunication networks, along with a few other resources such as Matlab and the World Wide Web, were the main highlights of their proposed smart classroom. Also, Pahl et al. (2004) presented an interactive multimedia-based system for the knowledge- and skill-oriented Web-based education of database course students. The authors differentiated their proposed classroom environment from the traditional classroom-based approach by using tool-mediated independent learning and training in an authentic setting. On the other hand, some authors have evaluated educational tools based on their usage and impact on students’ learning. For example, Brusilovsky et al. (2010) evaluated the technical and conceptual difficulties of using several interactive educational tools in the context of a single course. They presented a combined Exploratorium for database courses and an experimental platform that delivers modified access to numerous types of interactive learning activities.

Also, Taipalus and Perälä (2019) investigated the types of errors that persist when students write SQL, and mapped these errors onto different query concepts. Moreover, Abelló Gamazo et al. (2016) presented a software tool named LearnSQL, which allows the automatic and efficient e-learning and e-assessment of relational database skills. Apart from these, Yue (2013) proposed the database tool named Sakila as a unified platform to support instruction and multiple assignments of a graduate database course over five semesters. According to this study, students find this tool more useful and interesting than the highly simplified databases developed by the instructor or obtained from textbooks. On the other hand, authors have proposed tools whose main objective is to strengthen students' grasp of the topic by addressing the pedagogical problems in using educational tools. Connolly et al. (2005) discussed some of the pedagogical problems underlying the development of a constructive learning environment that uses problem-based learning, a simulation game and interactive visualizations to help teach database analysis and design. Also, Yau and Karim (2003) proposed a smart classroom with pervasive computing technology that facilitates collaborative learning among learners. The major aim of this smart classroom is to improve the quality of interaction between instructors and students during lectures.

Student satisfaction is also an important factor in making educational tools more effective. While a tool supports the student's learning process, it should also be flexible enough to win the student's confidence by adapting to the student's needs (Brusilovsky et al., 2010; Connolly et al., 2005; Pahl et al., 2004). Also, Cvetanovic et al. (2010) proposed a web-based educational system named ADVICE, which helps students reduce the gap between DBMS theory and its practice. On the other hand, authors have enhanced existing educational tools in the traditional classroom environment to address students' concerns (Nelson & Fatimazahra, 2010; Regueras et al., 2007) (Table 7).

Tools: Adopted in DSE and their impacts

Hands-on database development is a main concern in most institutes as well as in industry. However, tools that assist students in database development and query writing, especially in SQL, remain a major concern (Brusilovsky et al., 2010; Nagataki et al., 2013).

Students' grades reflect their conceptual clarity and database development skills. Grades are also important for securing jobs and scholarships after graduation, which is why educational learning tools that help students perform well in exams are important (Cvetanovic et al., 2010; Taipalus et al., 2018). Wang et al. (2010) proposed Metube, which is a variation of YouTube. Existing educational tools need to be upgraded or replaced by more suitable assessment-oriented interactive tools to meet challenging student needs (Pahl et al., 2004; Yuelan et al., 2011).

Another objective of developing educational tools is to increase the interaction between students and instructors. In the modern era, almost every institute follows student-centered learning (SCL). In SCL, the interaction between students and instructor increases, with most of the interaction initiated by the students. To support SCL, interactive and web-based educational tools need to assign more roles to students than to instructors (Abbasi et al., 2016; Taipalus & Perälä, 2019; Yau & Karim, 2003).

Theory versus practice is still one of the main issues in DSE teaching methods. The traditional teaching method covers theory first, and the concepts learned in the theoretical lectures are then implemented in the lab. Others think that it is better to start by teaching how to write queries, followed by the design principles for databases, with a limited number of credit hours allocated to general database theory topics. This part of the article discusses different trends in teaching and learning styles, along with the curriculum and assessment methods discussed in the DSE literature.

A variety of teaching methods have been designed, tested, and evaluated by different researchers (Yuelan et al., 2011; Chen et al., 2012; Connolly & Begg, 2006). Some authors have reformed teaching methods to meet the requirements of modern lecture delivery. For example, Yuelan et al. (2011) reformed the teaching method using several approaches: (a) modern ways of education, which include multimedia sound, animation, and simulation of the processes and workings of database systems to motivate and inspire students; (b) a project-driven approach, which aims to familiarize students with system operations through implementing a project; (c) strengthening the experimental aspects, to help students gain a strong grip on basic database knowledge and develop a self-learning ability; and (d) improving the traditional assessment method, whereby students turn in their research and development work as the content of the exam, so that they can solve problems on their own.

The main aim of any teaching method is to make students learn the subject effectively. Students must show interest in order to gain something from the lectures delivered by the instructors. For this, teaching methods should be interactive and interesting enough to develop students' interest in the subject. Students can show interest in the subject by asking more relevant questions or by completing homework and assignments on time. Authors have proposed a few teaching methods to make topics more interesting. For example, Chen et al. (2012) proposed a scaffold concept mapping strategy, which considers a student’s prior knowledge and provides flexible learning aids (scaffolding and fading) for reading and drawing concept maps. Also, Connolly & Begg (2006) examined different problems in the teaching of database analysis and design, and proposed a teaching approach driven by principles found in the constructivist epistemology to overcome these problems. This constructivist approach is based on the cognitive apprenticeship model and project-based learning. Similarly, Domínguez & Jaime (2010) proposed an active method for database design through the development of practical tasks in a face-to-face course. They analyzed the results of five academic years using a quasi-experimental design. In the first three years, a traditional strategy was followed and a course management system was used as a material repository. On the other hand, Dietrich and Urban (1996) described the use of cooperative group learning concepts in support of an undergraduate database management course. They designed the project deliverables in such a way that students develop skills for database implementation. Similarly, Zhang et al. (2018) discussed several effective classroom teaching measures covering the innovation of teaching content, teaching methods, teaching evaluation and assessment methods. They practiced these teaching measures in teaching database technologies and applications at Qinghai University. Moreover, Hou and Chen (2010) proposed a new teaching method based on blending learning theory, which merges traditional and constructivist methods. They applied the blending learning theory to the teaching of an Access Database programming course.

Problem-solving skills are a key aspect of any type of learning at any age. Students must possess these skills to tackle hurdles in the institute and also in industry. Creative and innovative students find various and unique ways to solve daily tasks, which is why they are more likely to secure good grades and jobs. Authors have been working to introduce teaching methods that develop problem-solving skills in students (Al-Shuaily, 2012; Cai & Gao, 2019; Martinez-González & Duffing, 2007; Gudivada et al., 2007). For instance, Al-Shuaily (2012) explored four cognitive factors that might influence SQL teaching and learning methods and approaches: (i) novices’ ability to understand, (ii) novices’ ability to translate, (iii) novices’ ability to write, and (iv) novices’ skills. Also, Cai and Gao (2019) reformed the teaching method in the database course of two higher education institutes in China. Skills and knowledge, innovation ability, and data abstraction were the main objectives of their study. Similarly, Martinez-González and Duffing (2007) analyzed the impact of European Union (EU) convergence on different universities across Europe. According to their study, these institutes need to restructure their degree programs and teaching methodologies. Moreover, Gudivada et al. (2007) proposed a student learning method for working with large datasets. They used the Amazon Web Services API and a .NET/C# application to extract a subset of the product database to enhance student learning in a relational database course.

On the other hand, authors have also evaluated traditional teaching methods with a view to enhancing problem-solving skills among students (Eaglestone & Nunes, 2004; Wang & Chen, 2014; Efendiouglu & Yelken, 2010). For example, Eaglestone and Nunes (2004) shared their experiences of delivering a database design course at Sheffield University and discussed some of the issues they faced regarding teaching, learning and assessment. Likewise, Wang and Chen (2014) summarized the main problems in the teaching of traditional database theory and application. According to the authors, the teaching method is outdated and does not focus on the important combination of theory and practice. Moreover, Efendiouglu and Yelken (2010) investigated the effects of two different methods, Programmed Instruction (PI) and Meaningful Learning (ML), on primary school teacher candidates’ academic achievements and attitudes toward computer-based education, and sought to define their views on these methods. The results show that PI is not favoured for teaching applications because of its behavioural structure (Table 8).

Methods: Teaching approaches adopted in DSE

Students become creative and innovative when they try to study on their own and from different resources rather than from curriculum books only. In the modern era, various resources are available on both online and offline platforms. Modern teaching methods must emphasize making students independent of the curriculum books and educating them to learn independently (Amadio et al., 2003; Cai & Gao, 2019; Martin et al., 2013). Also, Kawash et al. (2020) proposed a group study-based learning approach called Graded Group Activities (GGAs). In this method, students team up to take the exam as a group. On the other hand, a few studies have emphasized course content to prepare students for the final exams. For example, Zheng and Dong (2011) discussed the issues of computer science teaching with a particular focus on database systems, presenting different characteristics of the course, its teaching content and suggestions for teaching the course effectively.

As technology is evolving rapidly, students need practical experience from the start. Basic theoretical database concepts are important, but they are of little use without implementation in real-world projects. Most students study in institutes with the sole aim of passing exams using theoretical knowledge, and very few want practical experience (Wang & Chen, 2014; Zheng & Dong, 2011). To reduce the gap between theory and its implementation, authors have proposed teaching methods that develop students' interest in real-world projects (Naik & Gajjar, 2021; Svahnberg et al., 2008; Taipalus et al., 2018). Moreover, Juxiang and Zhihong (2012) proposed organizing teaching around application scenarios and associating theoretical database knowledge with the process that runs from analysis and modeling to building a database application. Also, Svahnberg et al. (2008) explained that, under particular conditions, it is possible to use students as subjects for experimental studies in DSE and to influence them to provide responses that are in line with industrial practice.

On the other hand, Nelson et al. (2003) evaluated the different teaching methods used to teach different database modules in the School of Computing and Technology at the University of Sunderland. They outlined suggestions for changes to the database curriculum to further integrate research and state-of-the-art systems in databases.

  • III. Curriculum

The database curriculum has been revisited many times in the form of guidelines that not only present the contents but also suggest the approximate time needed to cover different topics. According to the ACM curriculum guidelines (Lunt et al., 2008) for undergraduate programs in computer science, the overall coverage time for this course is 46.50 h, of which 11 h are allocated to the core topics: Information Models (4 core hours), Database Systems (3 core hours) and Data Modeling (4 core hours). The remaining hours are allocated to elective topics such as Indexing, Relational Databases, Query Languages, Relational Database Design, Transaction Processing, Distributed Databases, Physical Database Design, Data Mining, Information Storage and Retrieval, Hypermedia, Multimedia Systems, and Digital Libraries (Marshall, 2012). According to the ACM curriculum guidelines (2013) for undergraduate programs in computer science, this course should be completed in 15 weeks, with a two-and-a-half-hour lecture per week and a lab session of four hours per week on average (Brady et al., 2004). Thus, the revised version emphasizes practice-based learning with the help of a lab component. Numerous organizations have exerted efforts in this field to classify DSE (Dietrich et al., 2008). DSE model curricula, bodies of knowledge (BOKs), and some standardization aspects in this field are discussed below:
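The hour allocations quoted above reduce to simple arithmetic, sketched below. The input figures are the ones stated in the text; the elective-hour remainder and the total contact hours for the 15-week layout are derived here for illustration and are not values quoted from the guidelines. The constant and function names are hypothetical.

```python
# Figures quoted in the text for the 2008 guidelines (hours).
TOTAL_HOURS_2008 = 46.5
CORE_HOURS_2008 = {
    "Information Models": 4,
    "Database Systems": 3,
    "Data Modeling": 4,
}


def core_and_elective_hours(total_hours, core_hours):
    """Return (core total, remainder left for elective topics)."""
    core_total = sum(core_hours.values())
    return core_total, total_hours - core_total


def total_contact_hours(weeks, lecture_hours_per_week, lab_hours_per_week):
    """Derived total contact hours for a weekly lecture-plus-lab layout."""
    return weeks * (lecture_hours_per_week + lab_hours_per_week)


core_total, elective_hours = core_and_elective_hours(TOTAL_HOURS_2008, CORE_HOURS_2008)
# 15 weeks, 2.5 h lecture and 4 h lab per week, as stated for the 2013 guidelines.
contact_hours_2013 = total_contact_hours(15, 2.5, 4)
```

Under these figures the lab sessions account for more weekly time than the lectures, which is consistent with the practice-based emphasis noted in the text.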

Model curricula

Standards bodies set the curriculum guidelines for teaching undergraduate degree programs in computing disciplines. The curricula that include guidelines for teaching databases are: Computer Engineering Curricula (CEC) (Meier et al., 2008), Information Technology Curricula (ITC) (Alrumaih, 2016), Computing Curriculum Software Engineering (CCSE) (Meyer, 2001), and Cyber Security Curricula (CSC) (Brady et al., 2004; Bishop et al., 2017).

Bodies of knowledge (BOK)

A BOK includes the set of thoughts and activities related to a professional area, while a model curriculum gives a set of guidelines to address education issues (Sahami et al., 2011). The database body of knowledge comprises (a) the Data Management Body of Knowledge (DM-BOK), (b) Software Engineering Education Knowledge (SEEK) (Sobel, 2003), and (c) the SE body of knowledge (SWEBOK) (Swebok Evolution: IEEE Computer Society n.d.).

Apart from the model curricula and bodies of knowledge, there also exist some standards related to the database and its different modules: ISO/IEC 9075-1:2016 (Computing Curricula, 1991) and ISO/IEC 10026-1:1998 (Suryn, 2003).

We also draw on advice from some studies (Elberzhager et al., 2012; Keele et al., 2007) to search for relevant papers. In order to conduct this systematic study, it is essential to formulate the primary research questions (Mushtaq et al., 2017). Since data management techniques and software are evolving rapidly, the database curriculum should also be updated accordingly to meet these new requirements. Some authors have described ways of updating the content of courses to keep pace with specific developments in the field, and others have developed new database curricula to keep up with the new data management techniques.

Furthermore, some authors have suggested updates to the database curriculum based on continuously evolving technology and the introduction of big data. For instance, Bhogal et al. (2012) showed that database curricula need to be updated and modernized, which can be achieved by extending the current database concepts to cover strategies for handling ever-changing user requirements and the ways in which database technology has evolved to meet those requirements. Likewise, Picciano (2012) examined the evolving world of big data and analytics in American higher education. According to the author, the “data driven” decision-making method should be used to help institutes evaluate strategies that can improve retention and to update the curriculum with basic big data concepts and applications, since data-driven decision making has already entered the era of big data and learning analytics. Furthermore, Marshall (2011) presented the challenges faced when developing a curriculum for a Computer Science degree program in the South African context that is earmarked for international recognition. According to the author, curricula need to adhere to both the policy and content requirements in order to be rated as being of a particular quality.

Similarly, some studies (Abourezq & Idrissi, 2016; Mingyu et al., 2017) described the influence of big data from a social perspective, examined the gaps in the computer science database curriculum, especially in the big data era, and identified teaching improvements in the practical and theoretical teaching modes, teaching content and teaching practice platform of the database curriculum. Also, Silva et al. (2016) proposed teaching SQL as a general language that can be used in a wide range of database systems, from traditional relational database management systems to big data systems.

On the other hand, different authors have developed database curricula based on the different academic backgrounds of students. For example, Dean and Milani (1995) recommended changes in computer science curricula based on the practice at the United States Military Academy (USMA). They placed great emphasis on the practical demonstration of a topic rather than its theoretical explanation, especially for students who are not computer science majors. Furthermore, Urban and Dietrich (2001) described the development of a second course on database systems for undergraduates, preparing students for the advanced database concepts that they will exercise in industry. They also shared their experience with teaching the course, elaborating on the topics and assignments. Also, Andersson et al. (2019) proposed variations in the core topics of the database management course for students with an engineering background. Moreover, Dietrich et al. (2014) described two animations developed with images and color that visually and dynamically introduce fundamental relational database concepts and querying to students of many majors. The goal is that educators in diverse academic disciplines should be able to incorporate these animations in their existing courses to meet their pedagogical needs.

Information systems have evolved into large-scale distributed systems that store a huge amount of data across different servers and process it using different distributed data processing frameworks. This evolution has given birth to new paradigms in the database systems domain, termed NoSQL and Big Data systems, which deviate significantly from conventional relational and distributed database management systems. To offer a sustainable and practical CS education, these new paradigms and methodologies, as shown in Fig. 5, should be included in database education (Kleiner, 2015). Tables 9 and 10 show the summarized findings of the reviewed curriculum-based studies. This section also proposes appropriate textbooks based on the theory, project, and practice-based teaching methodology, as shown in Table 9. The proposed books are selected purely on the basis of their usage in top universities around the world, such as Massachusetts Institute of Technology, Stanford University, Harvard University, University of Oxford, University of Cambridge and University of Singapore, and their coverage of the core topics mentioned in the database curriculum.

Fig. 5 Concepts in Database Systems Education (Kleiner, 2015)

Recommended text books for DSE

Curriculum: Findings of Reviewed Literature

RQ2. Evolution of DSE research

This section discusses the evolution of databases, with a focus on DSE over the past 25 years, as shown in Fig. 6.

Fig. 6 Evolution of DSE studies

This study shows a significant increase in DSE research after 2004, with 78% of the selected papers published after 2004. The main reason for this outcome is that some of the papers were published in well-recognized channels such as IEEE Transactions on Education, ACM Transactions on Computing Education, the International Conference on Computer Science and Education (ICCSE), and the Teaching, Learning and Assessment of Database (TLAD) workshop. It is also evident that several of these papers were published before 2004 and that only a few articles were published during the late 1990s. This is because DSE started to gain interest after the introduction of the Body of Knowledge and DSE standards. Data-intensive scientific discovery has been discussed as the fourth paradigm (Hey et al., 2009): the first involves empirical science and observations; the second, theoretical science and mathematically driven insights; the third, computational science and simulation-driven insights; and the fourth, the data-driven insights of modern scientific research.

Over the past few decades, students have gone from attending one-room classes to having the world at their fingertips, and it is a great challenge for instructors to develop students' interest in learning databases. This challenge has led to the development of different types of interactive tools to help instructors teach DSE in this technology-oriented era. Given the importance of interactive tools in DSE, various authors have proposed different interactive tools over the years. During 1995–2003, some studies (Abut & Ozturk, 1997; Mcintyre et al., 1995) introduced state-of-the-art interactive tools to teach and enhance collaborative learning among students. Similarly, during 2004–2005 more interactive tools were proposed in the field of DSE: Pahl et al. (2004) and Connolly et al. (2005) introduced a multimedia system-based interactive model and a game-based collaborative learning environment.

The Internet became more common in the first decade of the twenty-first century, and its positive impact on the education sector was undeniable. Cost effectiveness, interaction among students, teachers and peers, and access to the latest information were the main reasons that led instructors to employ web-based tools to teach databases. Due to this spike in the demand for web-based tools, authors also started to introduce new instruments to assist with teaching databases. Regueras et al. (2007) proposed an e-learning tool named QUEST with a feedback module to help students learn from their mistakes. Similarly, in 2010, multiple authors proposed and evaluated various web-based tools. Cvetanovic et al. (2010) proposed ADVICE, with the functionality to monitor students' progress, while Wang et al. (2010) proposed Metube, which is a variation of YouTube. Furthermore, Nelson and Fatimazahra (2010) evaluated different web-based tools to highlight the complexities of using these web-based instruments.

Technology has changed teaching methods in the education sector, but technology cannot replace teachers; despite the amount of time most students spend online, virtual learning will never recreate the teacher-student bond. In the modern era, innovation in the technology used in the education sector is not meant to replace instructors or teaching methods.

During the 1990s, some studies (Dietrich & Urban, 1996; Urban & Dietrich, 1997) proposed learning and teaching methods, respectively, with the evolving technology in view. The highlight of their work was project deliverables and assignments in which students progressed step by step, starting from a tutorial exercise and then attempting more difficult extensions of the assignment.

During 2002–2007, various authors discussed a number of teaching and learning methods to keep pace with ever-changing database technology. For example, Connolly and Begg (2006) proposed a constructive approach to teaching database analysis and design. Similarly, Prince and Felder (2006) reviewed the effectiveness of inquiry learning, problem-based learning, project-based learning, case-based teaching, discovery learning, and just-in-time teaching. Also, McIntyre et al. (1995) brought to light the impact of European Union (EU) convergence on different universities across Europe. They suggested a reconstruction of teaching and learning methodologies in order to teach databases effectively.

During 2008–2013, more work was done to address the different methods of teaching and learning in the field of DSE. Domínguez and Jaime (2010) proposed an active learning approach, with a focus on developing students' interest in designing and developing databases. Also, Zheng and Dong (2011) highlighted various characteristics of the database course and its teaching content. Similarly, Yuelan et al. (2011) reformed database teaching methods. The main focus of their study was on modern ways of education, a project-driven approach, strengthening the experimental aspects, and improving the traditional assessment method. Likewise, Al-Shuaily (2012) explored four cognitive factors that can affect the process of learning databases, with the main focus on helping students learn SQL. Chen et al. (2012) proposed a scaffolding-based concept mapping strategy, which helps students better understand database management courses. Martin et al. (2013) discussed various collaborative learning techniques in the field of DSE, treating databases as an introductory course.

Between 2014 and 2021, research in the field of DSE increased, which is the main reason that most of the teaching, learning and assessment methods were proposed and discussed during this period. Rashid and Al-Radhy (2014) discussed the issues of traditional teaching, learning, and assessment methods in database courses at different universities in Kurdistan; their study focused on reformation issues, such as the absence of teaching determination and the contradiction between content and theory. Similarly, Wang and Chen (2014) summarized the main problems in teaching traditional database theory and its application, with the curriculum assessment mode as their main focus. Eaglestone and Nunes (2004) shared their experiences of delivering a database design course at Sheffield University; their focus was on teaching the database design module to a diverse group of students from different backgrounds. Rashid (2015) discussed some important features of database courses, focusing on reforming the conventional teaching, learning, and assessment strategies of database courses at universities. Kui et al. (2018) reformed the teaching mode of database courses based on the flipped classroom, with initiative learning of database courses as their main focus. Similarly, Zhang et al. (2018) discussed several effective classroom teaching measures, focusing on teaching content, teaching methods, teaching evaluation and assessment methods. Cai and Gao (2019) also carried out teaching reforms in the database course for liberal arts majors; diversified teaching modes, such as the flipped classroom, case-oriented teaching and task-oriented teaching, were the focus of their study. Kawash et al. (2020) proposed a learning approach called Graded Group Activities (GGAs), with the main focus on reforming learning and assessment methods.

The database course covers several topics that range from data modeling to data implementation and examination. Over the years, various authors have suggested updates to these topics in the database curriculum to meet the requirements of modern technologies. Authors have also proposed new curricula for students of different academic backgrounds and different areas. These curriculum reforms helped students prepare, practically and theoretically, and enabled them to compete in the job market after graduation.

Between 2003 and 2006, authors proposed various suggestions to update and develop the computer science curriculum across different universities. Robbert and Ricardo (2003) evaluated three reviews from 1999 to 2002 that were given to groups of educators, with the aim of highlighting the trends that occurred in the database curriculum. Also, Calero et al. (2003) proposed a first draft of a Database Body of Knowledge (DBBOK), with Database (DB), Database Design (DBD), Database Administration (DBAd), Database Application (DBAp) and Advanced Databases (ADVDB) as the main focus of their study. Furthermore, Conklin and Heinrichs (2005) compared the content included in 13 database textbooks, with the IS 2002, CC2001, and CC2004 model curricula as the main focus of their study.

From 2007 to 2011, authors developed various database curricula. Luo et al. (2008) developed curricula at Zhejiang University City College, with the aim of nurturing students to be qualified computer scientists. Likewise, Dietrich et al. (2008) proposed techniques to assess the development of an advanced database course; the purpose of adding an advanced database course at the undergraduate level was to prepare students to respond to industrial requirements. Also, Marshall (2011) developed a new database curriculum for the Computer Science degree program in the South African context.

Between 2012 and 2021, various authors suggested updates to the database curriculum. Bhogal et al. (2012) suggested updating and modernizing the database curriculum, with data management and data analytics as the focus of their study. Similarly, Picciano (2012) examined the curriculum in American higher education, focusing on big data and analytics. Also, Zhanquan et al. (2016) proposed a design for course content and classroom teaching methods, with Massive Open Online Courses (MOOCs) as the focus of their study. Likewise, Mingyu et al. (2017) suggested updating the database curriculum while keeping new database technology in perspective; the focus of their study was big data.

The above discussion clearly shows that SQL is the most discussed topic in the literature: more than 25% of the studies have discussed it in the previous decade, as shown in Fig. 7. It is pertinent to mention that SQL databases such as Oracle and MS Access are discussed under the SQL banner (Chen et al., 2012; Hou & Chen, 2010; Wang & Chen, 2014). This prominence is mainly due to SQL's ability to handle data in a relational database management system and its direct implementation of theoretical database concepts. Other database topics, such as transaction management and application programming, are also among the main topics discussed in the literature.

Fig. 7 Evolution of database topics discussed in the literature

Research synthesis, advice for instructors, and way forward

This section presents the information synthesized from reading and analyzing the research articles considered in this study. It first contextualizes the tools and methods to help instructors find those suitable for their settings. Developments in curriculum design are then discussed, followed by general advice for instructors. Lastly, promising future research directions for developing new tools and methods and for revising the curriculum are discussed.

Methods, tools, and curriculum

Methods and tools.

Web-based tools proposed by Cvetanovic et al. (2010) and Wang et al. (2010) have been quite useful and are growing increasingly pertinent, as the online mode of education has become prevalent around the globe during COVID-19. Interactive tools and the smart classroom methodology have also been used successfully to develop students' interest in database classes (Brusilovsky et al., 2010; Connolly et al., 2005; Pahl et al., 2004; Canedo et al., 2021; Ko et al., 2021).

One of the most promising combinations of methodology and tool was proposed by Cvetanovic et al. (2010), who developed a tool named ADVICE that helps students learn and implement database concepts using a project-centric methodology. A game-based collaborative learning environment was proposed by Connolly et al. (2005); it involves a methodology comprising modeling, articulation, feedback, and exploration. As a whole, project-centric teaching (Connolly & Begg, 2006; Domínguez & Jaime, 2010) and teaching database design and problem-solving skills (Wang & Chen, 2014) are two successful approaches for DSE. Other studies (Urban & Dietrich, 1997) proposed teaching methods that are more inclined towards practicing database concepts. A topic-specific approach to teaching and learning SQL was proposed by Abbasi et al. (2016), Taipalus et al. (2018) and Silva et al. (2016). Cai and Gao (2019), in turn, developed a teaching method for students who do not have a computer science background. Lastly, some useful ways of defining assessments for DSE were proposed by Kawash et al. (2020) and Zhang et al. (2018).

The database curriculum adopted by various institutes around the world does not address how to teach the database course to students who do not have a strong computer science background. Marshall (2012), Luo et al. (2008) and Zhanquan et al. (2016) proposed updates to the current database curriculum for students who are not from a computer science background, while Abid et al. (2015) proposed combined course content and various methodologies that can be used for teaching a database systems course. In addition, the current database curriculum does not include topics related to the latest technologies in the database domain, a factor discussed by several other studies as well (Bhogal et al., 2012; Mehmood et al., 2020; Picciano, 2012).

Guidelines for instructors

The major conclusion of this study is a set of suggestions, based on impact and importance, for instructors who teach DSE. The empirical studies can also provide an overview of the productivity of each method. These guidelines are intended for instructors, who are the focal audience of this study. They are the authors' subjective opinions, formed after analyzing the literature, and the meaning and purpose of the original works have been maintained. The reviewed literature revealed various issues; some issues that were not relevant to DSE have been left out. The following suggestions address the relevant ones:

Project centric and applied approach

  • To inculcate database development skills in students, basic elements of database development need to be incorporated into teaching and learning at all levels, including undergraduate studies (Bakar et al., 2011). To fulfill this objective, instructors should also address data quality in DSE by assigning projects and assignments in which students assess, measure and improve data quality using already deployed databases. They should demonstrate that the quality of data is determined not only by the effective design of a database, but also by the perception of the end user (Mathieu & Khalil, 1997).
  • The gap between database course theory and industrial practice is large. Fresh graduates find it difficult to cope with industrial pressure because of the contrast between what they have been taught in institutes and its application in industry (Allsopp et al., 2006). Instructors should involve top performers from classes in industrial projects so that they are able to acquire sufficient knowledge and practice, especially in postgraduate courses. There should also be activities in which industry practitioners present real projects and share their industrial experiences with the students. The gap between the theoretical and practical sides of databases was identified by Myers and Skinner (1997). In order to build practical DS concepts, instructors should provide students with an accurate view of reality and proper tools.

Importance of software development standards and impact of DB in software success

  • Instructors should have the strategies, ability and skills to align the DSE course with contemporary Global Software Development (GSD) (Akbar & Safdar, 2015; Damian et al., 2006).
  • Instructors should enable students to explain the approaches to problem solving, development tools and methodologies. DS courses are usually taught in a normal lecture format. As a result, students do not realize the importance of DS activities and cannot see their influence on the success or failure of projects.

Pedagogy and the use of education technology

  • Some studies have shown that teaching through play and practical activities helps to improve the knowledge and learning outcomes of students (Dicheva et al., 2015).
  • Interactive classrooms can help instructors deliver their lectures more effectively by using virtual whiteboards, digital textbooks, and data over the network (Abut & Ozturk, 1997). We suggest that, in order to follow the new concept of the smart classroom, instructors draw on the experience of Yau and Karim (2003), which benefits cooperative learning among students and can also be adopted in DSE.
  • Instructors also need to keep themselves up to date with the full spectrum of technology in education in general, and for DSE in particular. This is becoming more imperative because, during COVID-19, the world has relied strongly on the use of technology, particularly in the education sector.

Periodic Curriculum Revision

  • There is also a need to revisit the existing series of courses periodically so that they (a) include modern-day database system concepts and (b) can be offered as a specialization track; (c) a specialized undergraduate degree program may also be designed.

DSE: Way forward

This research brings a significant body of work on DSE together in one place, thus providing a starting point for finding better ways to improve the different dimensions of teaching a database systems course in the future. This section discusses the technology, methods, and curriculum modifications that would most impact the delivery of lectures in the coming years.

Several tools have already been developed for effective teaching and learning in database systems. However, there is considerable room for developing new tools. The majority of the research work discussed in this review revolves around web-based tools. Meanwhile, the notion of “serious games” has recently risen and is proving successful in several domains. This success invites researchers to explore this new paradigm for developing useful tools for learning and practicing database systems concepts.

Likewise, due to COVID-19 the world is setting up new norms, which are expected to affect the methods of teaching as well. This invites researchers to design, develop, and test flexible tools for teaching online in a more interactive manner. At the same time, it is also imperative to devise new techniques for assessment, especially for conducting online exams at massive scale. Moreover, researchers can apply the idea of instructional design to web-based teaching, in which an online classroom is designed around the learners’ unique backgrounds and effectively delivers the concepts that instructors consider to be highly important.

The teaching, learning and assessment methods discussed in this study can help instructors improve their methods in order to teach the database systems course in a better way. Only 16% of authors have assessment methods as the focus of their study, which clearly highlights that plenty of work remains to be done in this particular domain. Assessment techniques in the database course help learners to learn from their mistakes. Also, instructors must realize that there is a massive gap between database theory and practice, which can only be reduced with maximum practice and real-world database projects.

Similarly, technology is continuously influencing the development and expansion of modern education, and instructors’ abilities to teach using online platforms are critical to the quality of online education.

In the same way, ideas like the flipped classroom, in which students prepare the lesson prior to the class, can be implemented in web-based teaching. This ensures that class time can be used to discuss the lesson further, share ideas and allow students to interact in a dynamic learning environment.

The increasing impact of big data systems and data science, and its anticipated impact on the job market, invites researchers to revisit the fundamental course of database systems as well. There is a need to extend the boundaries of existing contents by including concepts related to data storage, processing, and transaction management in distributed big data systems, with a possible glimpse of modern tools and technologies.

As a whole, an interesting long-term extension is to establish a generic and comprehensive framework that engages all stakeholders, with the support of technology, to make teaching, learning, practicing, and assessing easier and more effective.

This SLR presents a review of the research work published in the area of database system education, with particular focus on teaching the first course in database systems. The study was carried out by systematically selecting research papers published between 1995 and 2021. Based on the study, a high-level categorization presents a taxonomy of the published work under the heads of Tools, Methods, and Curriculum. All the selected articles were evaluated on the basis of quality criteria. Several methods have been developed to teach the database course effectively. These methods focus on improving the learning experience, student satisfaction, or students’ course performance, or on supporting the instructors. Similarly, many tools have been developed; some are topic-based, while others are general-purpose tools that apply to the whole course. Curriculum development activities have also been discussed, including some guidelines provided by ACM/IEEE along with certain standards. Apart from this, the evolution in these three areas has been presented, which shows that researchers have been presenting many different teaching methods throughout the selected period; however, there has been a decrease in research articles that address curriculum and tools in the past five years. Besides, some guidelines for instructors have been shared. This SLR also proposes a way forward in DSE by emphasizing the tools that need to be developed to facilitate instructors and students, especially in the post-COVID-19 era; the methods to be adopted by instructors to close the gap between theory and practice; and the updating of database curricula following the introduction of emerging technologies such as big data and data science. We also urge recognized publication venues for database research, including VLDB, ICDM, and EDBT, to consider publishing articles related to DSE.
The study also highlights the importance of reviving the curricula, tools, and methodologies to cater for recent advancements in the field of database systems.

  • Abbasi, S., Kazi, H., Khowaja, K., Abelló Gamazo, A., Burgués Illa, X., Casany Guerrero, M. J., Martin Escofet, C., Quer, C., Rodriguez González, M. E., Romero Moral, Ó., Urpi Tubella, A., Abid, A., Farooq, M. S., Raza, I., Farooq, U., Abid, K., Hussain, N., Abid, K., Ahmad, F., …, Yatim, N. F. M. (2016). Research trends in enterprise service bus (ESB) applications: A systematic mapping study. Journal of Informetrics, 27 (1), 217–220.
  • Abbasi, S., Kazi, H., & Khowaja, K. (2017). A systematic review of learning object oriented programming through serious games and programming approaches. 2017 4th IEEE International Conference on Engineering Technologies and Applied Sciences (ICETAS) , 1–6.
  • Abelló Gamazo A, Burgués Illa X, Casany Guerrero MJ, Martin Escofet C, Quer C, Rodriguez González ME, Romero Moral Ó, Urpi Tubella A. A software tool for E-assessment of relational database skills. International Journal of Engineering Education. 2016; 32 (3A):1289–1312.
  • Abid A, Farooq MS, Raza I, Farooq U, Abid K. Variants of teaching first course in database systems. Bulletin of Education and Research. 2015; 37 (2):9–25.
  • Abid A, Hussain N, Abid K, Ahmad F, Farooq MS, Farooq U, Khan SA, Khan YD, Naeem MA, Sabir N. A survey on search results diversification techniques. Neural Computing and Applications. 2016; 27 (5):1207–1229.
  • Abourezq, M., & Idrissi, A. (2016). Database-as-a-service for big data: An overview. International Journal of Advanced Computer Science and Applications (IJACSA) , 7 (1).
  • Abut, H., & Ozturk, Y. (1997). Interactive classroom for DSP/communication courses. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing , 1 , 15–18.
  • Adams ES, Granger M, Goelman D, Ricardo C. Managing the introductory database course: What goes in and what comes out? ACM SIGCSE Bulletin. 2004; 36 (1):497–498.
  • Akbar, R., & Safdar, S. (2015). A short review of global software development (gsd) and latest software development trends. 2015 International Conference on Computer, Communications, and Control Technology (I4CT) , 314–317.
  • Allsopp DH, DeMarie D, Alvarez-McHatton P, Doone E. Bridging the gap between theory and practice: Connecting courses with field experiences. Teacher Education Quarterly. 2006; 33 (1):19–35.
  • Alrumaih, H. (2016). ACM/IEEE-CS information technology curriculum 2017: status report. Proceedings of the 1st National Computing Colleges Conference (NC3 2016) .
  • Al-Shuaily, H. (2012). Analyzing the influence of SQL teaching and learning methods and approaches. 10th International Workshop on the Teaching, Learning and Assessment of Databases , 3.
  • Amadio, W., Riyami, B., Mansouri, K., Poirier, F., Ramzan, M., Abid, A., Khan, H. U., Awan, S. M., Ismail, A., Ahmed, M., Ilyas, M., Mahmood, A., Hey, A. J. G., Tansley, S., Tolle, K. M., others, Tehseen, R., Farooq, M. S., Abid, A., …, Fatimazahra, E. (2003). The fourth paradigm: data-intensive scientific discovery. Innovation in Teaching and Learning in Information and Computer Sciences , 1 (1), 823–828.
  • Amadio, W. (2003). The dilemma of Team Learning: An assessment from the SQL programming classroom . 823–828.
  • Ampatzoglou A, Charalampidou S, Stamelos I. Research state of the art on GoF design patterns: A mapping study. Journal of Systems and Software. 2013; 86 (7):1945–1964.
  • Andersson C, Kroisandt G, Logofatu D. Including active learning in an online database management course for industrial engineering students. IEEE Global Engineering Education Conference (EDUCON) 2019; 2019 :217–220.
  • Aria M, Cuccurullo C. bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics. 2017; 11 (4):959–975.
  • Aziz O, Farooq MS, Abid A, Saher R, Aslam N. Research trends in enterprise service bus (ESB) applications: A systematic mapping study. IEEE Access. 2020; 8 :31180–31197.
  • Bakar MA, Jailani N, Shukur Z, Yatim NFM. Final year supervision management system as a tool for monitoring computer science projects. Procedia-Social and Behavioral Sciences. 2011; 18 :273–281.
  • Beecham S, Baddoo N, Hall T, Robinson H, Sharp H. Motivation in Software Engineering: A systematic literature review. Information and Software Technology. 2008; 50 (9–10):860–878.
  • Bhogal, J. K., Cox, S., & Maitland, K. (2012). Roadmap for Modernizing Database Curricula. 10th International Workshop on the Teaching, Learning and Assessment of Databases , 73.
  • Bishop, M., Burley, D., Buck, S., Ekstrom, J. J., Futcher, L., Gibson, D., ... & Parrish, A. (2017, May). Cybersecurity curricular guidelines . In IFIP World Conference on Information Security Education (pp. 3–13). Cham: Springer.
  • Brady A, Bruce K, Noonan R, Tucker A, Walker H. The 2003 model curriculum for a liberal arts degree in computer science: preliminary report. ACM SIGCSE Bulletin. 2004; 36 (1):282–283.
  • Brusilovsky P, Sosnovsky S, Lee DH, Yudelson M, Zadorozhny V, Zhou X. An open integrated exploratorium for database courses. ACM SIGCSE Bulletin. 2008; 40 (3):22–26.
  • Brusilovsky P, Sosnovsky S, Yudelson MV, Lee DH, Zadorozhny V, Zhou X. Learning SQL programming with interactive tools: From integration to personalization. ACM Transactions on Computing Education (TOCE) 2010; 9 (4):1–15.
  • Cai, Y., & Gao, T. (2019). Teaching Reform in Database Course for Liberal Arts Majors under the Background of "Internet Plus". 2018 6th International Education, Economics, Social Science, Arts, Sports and Management Engineering Conference (IEESASM 2018) , 208–213.
  • Calderon KR, Vij RS, Mattana J, Jhaveri KD. Innovative teaching tools in nephrology. Kidney International. 2011; 79 (8):797–799.
  • Calero C, Piattini M, Ruiz F. Towards a database body of knowledge: A study from Spain. ACM SIGMOD Record. 2003; 32 (2):48–53.
  • Canedo, E. D., Bandeira, I. N., & Costa, P. H. T. (2021). Challenges of database systems teaching amidst the Covid-19 pandemic. In 2021 IEEE Frontiers in Education Conference (FIE) (pp. 1–9). IEEE.
  • Chen H-H, Chen Y-J, Chen K-J. The design and effect of a scaffolded concept mapping strategy on learning performance in an undergraduate database course. IEEE Transactions on Education. 2012; 56 (3):300–307.
  • Cobo MJ, López-Herrera AG, Herrera-Viedma E, Herrera F. SciMAT: A new science mapping analysis software tool. Journal of the American Society for Information Science and Technology. 2012; 63 (8):1609–1630.
  • Conklin M, Heinrichs L. In search of the right database text. Journal of Computing Sciences in Colleges. 2005; 21 (2):305–312.
  • Connolly, T. M., & Begg, C. E. (2006). A constructivist-based approach to teaching database analysis and design. Journal of Information Systems Education , 17 (1).
  • Connolly, T. M., Stansfield, M., & McLellan, E. (2005). An online games-based collaborative learning environment to teach database design. Web-Based Education: Proceedings of the Fourth IASTED International Conference(WBE-2005) .
  • Curricula Computing. (1991). Report of the ACM/IEEE-CS Joint Curriculum Task Force. Technical Report . New York: Association for Computing Machinery.
  • Cvetanovic M, Radivojevic Z, Blagojevic V, Bojovic M. ADVICE—Educational system for teaching database courses. IEEE Transactions on Education. 2010; 54 (3):398–409.
  • Damian, D., Hadwin, A., & Al-Ani, B. (2006). Instructional design and assessment strategies for teaching global software development: a framework. Proceedings of the 28th International Conference on Software Engineering , 685–690.
  • Dean, T. J., & Milani, W. G. (1995). Transforming a database systems and design course for non computer science majors. Proceedings Frontiers in Education 1995 25th Annual Conference. Engineering Education for the 21st Century , 2 , 4b2–17.
  • Dicheva, D., Dichev, C., Agre, G., & Angelova, G. (2015). Gamification in education: A systematic mapping study. Journal of Educational Technology & Society , 18 (3), 75–88.
  • Dietrich SW, Urban SD, Haag S. Developing advanced courses for undergraduates: A case study in databases. IEEE Transactions on Education. 2008; 51 (1):138–144.
  • Dietrich SW, Goelman D, Borror CM, Crook SM. An animated introduction to relational databases for many majors. IEEE Transactions on Education. 2014; 58 (2):81–89.
  • Dietrich, S. W., & Urban, S. D. (1996). Database theory in practice: learning from cooperative group projects. Proceedings of the Twenty-Seventh SIGCSE Technical Symposium on Computer Science Education , 112–116.
  • Dominguez, C., & Jaime, A. (2010). Database design learning: A project-based approach organized through a course management system. Computers & Education , 55 (3), 1312–1320.
  • Eaglestone, B., & Nunes, M. B. (2004). Pragmatics and practicalities of teaching and learning in the quicksand of database syllabuses. Journal of Innovations in Teaching and Learning for Information and Computer Sciences , 3 (1).
  • Efendioğlu A, Yelken TY. Programmed instruction versus meaningful learning theory in teaching basic structured query language (SQL) in computer lesson. Computers & Education. 2010; 55 (3):1287–1299.
  • Elberzhager F, Münch J, Nha VTN. A systematic mapping study on the combination of static and dynamic quality assurance techniques. Information and Software Technology. 2012; 54 (1):1–15.
  • Etemad M, Küpçü A. Verifiable database outsourcing supporting join. Journal of Network and Computer Applications. 2018; 115 :1–19.
  • Farooq MS, Riaz S, Abid A, Abid K, Naeem MA. A Survey on the role of IoT in agriculture for the implementation of smart farming. IEEE Access. 2019; 7 :156237–156271.
  • Farooq MS, Riaz S, Abid A, Umer T, Zikria YB. Role of IoT technology in agriculture: A systematic literature review. Electronics. 2020; 9 (2):319.
  • Farooq U, Rahim MSM, Sabir N, Hussain A, Abid A. Advances in machine translation for sign language: Approaches, limitations, and challenges. Neural Computing and Applications. 2021; 33 (21):14357–14399.
  • Fisher, D., & Khine, M. S. (2006). Contemporary approaches to research on learning environments: Worldviews . World Scientific.
  • Garcia-Molina, H. (2008). Database systems: the complete book . Pearson Education India.
  • Garousi V, Mesbah A, Betin-Can A, Mirshokraie S. A systematic mapping study of web application testing. Information and Software Technology. 2013; 55 (8):1374–1396.
  • Gudivada, V. N., Nandigam, J., & Tao, Y. (2007). Enhancing student learning in database courses with large data sets. 2007 37th Annual Frontiers In Education Conference-Global Engineering: Knowledge Without Borders, Opportunities Without Passports , S2D–13.
  • Hey, A. J. G., Tansley, S., Tolle, K. M., & others. (2009). The fourth paradigm: data-intensive scientific discovery (Vol. 1). Microsoft research Redmond, WA.
  • Holliday, M. A., & Wang, J. Z. (2009). A multimedia database project and the evolution of the database course. 2009 39th IEEE Frontiers in Education Conference , 1–6.
  • Hou, S., & Chen, S. (2010). Research on applying the theory of Blending Learning on Access Database Programming Course teaching. 2010 2nd International Conference on Education Technology and Computer , 3 , V3–396.
  • Irby DM, Wilkerson L. Educational innovations in academic medicine and environmental trends. Journal of General Internal Medicine. 2003; 18 (5):370–376.
  • Ishaq K, Zin NAM, Rosdi F, Jehanghir M, Ishaq S, Abid A. Mobile-assisted and gamification-based language learning: A systematic literature review. PeerJ Computer Science. 2021; 7 :e496.
  • Joint Task Force on Computing Curricula, Association for Computing Machinery (ACM), & IEEE Computer Society. (2013). Computer science curricula 2013: Curriculum guidelines for undergraduate degree programs in computer science . New York, NY, USA: Association for Computing Machinery.
  • Juxiang R, Zhihong N. Taking database design as trunk line of database courses. Fourth International Conference on Computational and Information Sciences. 2012; 2012 :767–769.
  • Kawash, J., Jarada, T., & Moshirpour, M. (2020). Group exams as learning tools: Evidence from an undergraduate database course. Proceedings of the 51st ACM Technical Symposium on Computer Science Education , 626–632.
  • Keele, S., et al. (2007). Guidelines for performing systematic literature reviews in software engineering .
  • Kleiner, C. (2015). New Concepts in Database System Education: Experiences and Ideas. Proceedings of the 46th ACM Technical Symposium on Computer Science Education , 698.
  • Ko J, Paek S, Park S, Park J. A news big data analysis of issues in higher education in Korea amid the COVID-19 pandemic. Sustainability. 2021; 13 (13):7347.
  • Kui, X., Du, H., Zhong, P., & Liu, W. (2018). Research and application of flipped classroom in database course. 2018 13th International Conference on Computer Science & Education (ICCSE) , 1–5.
  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics , 159–174.
  • Lunt, B., Ekstrom, J., Gorka, S., Hislop, G., Kamali, R., Lawson, E., ... & Reichgelt, H. (2008). Curriculum guidelines for undergraduate degree programs in information technology . ACM.
  • Luo, R., Wu, M., Zhu, Y., & Shen, Y. (2008). Exploration of Curriculum Structures and Educational Models of Database Applications. 2008 The 9th International Conference for Young Computer Scientists , 2664–2668.
  • Luxton-Reilly, A., Albluwi, I., Becker, B. A., Giannakos, M., Kumar, A. N., Ott, L., Paterson, J., Scott, M. J., Sheard, J., & Szabo, C. (2018). Introductory programming: a systematic literature review. Proceedings Companion of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education , 55–106.
  • Manzoor MF, Abid A, Farooq MS, Nawaz NA, Farooq U. Resource allocation techniques in cloud computing: A review and future directions. Elektronika Ir Elektrotechnika. 2020; 26 (6):40–51. doi: 10.5755/j01.eie.26.6.25865. [ CrossRef ] [ Google Scholar ]
  • Marshall, L. (2011). Developing a computer science curriculum in the South African context. CSERC , 9–19.
  • Marshall, L. (2012). A comparison of the core aspects of the acm/ieee computer science curriculum 2013 strawman report with the specified core of cc2001 and cs2008 review. Proceedings of Second Computer Science Education Research Conference , 29–34.
  • Martin C, Urpi T, Casany MJ, Illa XB, Quer C, Rodriguez ME, Abello A. Improving learning in a database course using collaborative learning techniques. The International Journal of Engineering Education. 2013; 29 (4):986–997. [ Google Scholar ]
  • Martinez-González MM, Duffing G. Teaching databases in compliance with the European dimension of higher education: Best practices for better competences. Education and Information Technologies. 2007; 12 (4):211–228. [ Google Scholar ]
  • Mateo PR, Usaola MP, Alemán JLF. Validating second-order mutation at system level. IEEE Transactions on Software Engineering. 2012; 39 (4):570–587. [ Google Scholar ]
  • Mathieu, R. G., & Khalil, O. (1997). Teaching Data Quality in the Undergraduate Database Course. IQ , 249–266.
  • Mcintyre, D. R., Pu, H.-C., & Wolff, F. G. (1995). Use of software tools in teaching relational database design. Computers \& Education , 24 (4), 279–286.
  • Mehmood E, Abid A, Farooq MS, Nawaz NA. Curriculum, teaching and learning, and assessments for introductory programming course. IEEE Access. 2020; 8 :125961–125981. [ Google Scholar ]
  • Meier, R., Barnicki, S. L., Barnekow, W., & Durant, E. (2008). Work in progress-Year 2 results from a balanced, freshman-first computer engineering curriculum. In 38th Annual Frontiers in Education Conference (pp. S1F-17). IEEE.
  • Meyer B. Software engineering in the academy. Computer. 2001; 34 (5):28–35. [ Google Scholar ]
  • Mingyu, L., Jianping, J., Yi, Z., & Cuili, Z. (2017). Research on the teaching reform of database curriculum major in computer in big data era. 2017 12th International Conference on Computer Science and Education (ICCSE) , 570–573.
  • Morien, R. I. (2006). A Critical Evaluation Database Textbooks, Curriculum and Educational Outcomes. Director , 7 .
  • Mushtaq Z, Rasool G, Shehzad B. Multilingual source code analysis: A systematic literature review. IEEE Access. 2017; 5 :11307–11336. [ Google Scholar ]
  • Myers M, Skinner P. The gap between theory and practice: A database application case study. Journal of International Information Management. 1997; 6 (1):5. [ Google Scholar ]
  • Naeem A, Farooq MS, Khelifi A, Abid A. Malignant melanoma classification using deep learning: Datasets, performance measurements, challenges and opportunities. IEEE Access. 2020; 8 :110575–110597. [ Google Scholar ]
  • Nagataki, H., Nakano, Y., Nobe, M., Tohyama, T., & Kanemune, S. (2013). A visual learning tool for database operation. Proceedings of the 8th Workshop in Primary and Secondary Computing Education , 39–40.
  • Naik, S., & Gajjar, K. (2021). Applying and Evaluating Engagement and Application-Based Learning and Education (ENABLE): A Student-Centered Learning Pedagogy for the Course Database Management System. Journal of Education , 00220574211032319.
  • Nelson, D., Stirk, S., Patience, S., & Green, C. (2003). An evaluation of a diverse database teaching curriculum and the impact of research. 1st LTSN Workshop on Teaching, Learning and Assessment of Databases, Coventry .
  • Nelson D, Fatimazahra E. Review of Contributions to the Teaching, Learning and Assessment of Databases (TLAD) Workshops. Innovation in Teaching and Learning in Information and Computer Sciences. 2010; 9 (1):78–86. [ Google Scholar ]
  • Obaid I, Farooq MS, Abid A. Gamification for recruitment and job training: Model, taxonomy, and challenges. IEEE Access. 2020; 8 :65164–65178. [ Google Scholar ]
  • Pahl C, Barrett R, Kenny C. Supporting active database learning and training through interactive multimedia. ACM SIGCSE Bulletin. 2004; 36 (3):27–31. [ Google Scholar ]
  • Park, Y., Tajik, A. S., Cafarella, M., & Mozafari, B. (2017). Database learning: Toward a database that becomes smarter every time. Proceedings of the 2017 ACM International Conference on Management of Data , 587–602.
  • Picciano AG. The evolution of big data and learning analytics in American higher education. Journal of Asynchronous Learning Networks. 2012; 16 (3):9–20. [ Google Scholar ]
  • Prince MJ, Felder RM. Inductive teaching and learning methods: Definitions, comparisons, and research bases. Journal of Engineering Education. 2006; 95 (2):123–138. [ Google Scholar ]
  • Ramzan M, Abid A, Khan HU, Awan SM, Ismail A, Ahmed M, Ilyas M, Mahmood A. A review on state-of-the-art violence detection techniques. IEEE Access. 2019; 7 :107560–107575. [ Google Scholar ]
  • Rashid, T. A., & Al-Radhy, R. S. (2014). Transformations to issues in teaching, learning, and assessing methods in databases courses. 2014 IEEE International Conference on Teaching, Assessment and Learning for Engineering (TALE) , 252–256.
  • Rashid, T. (2015). Investigation of instructing reforms in databases. International Journal of Scientific \& Engineering Research , 6 (8), 64–72.
  • Regueras, L. M., Verdú, E., Verdú, M. J., Pérez, M. A., & De Castro, J. P. (2007). E-learning strategies to support databases courses: a case study. First International Conference on Technology, Training and Communication .
  • Robbert MA, Ricardo CM. Trends in the evolution of the database curriculum. ACM SIGCSE Bulletin. 2003; 35 (3):139–143. [ Google Scholar ]
  • Sahami, M., Guzdial, M., McGettrick, A., & Roach, S. (2011). Setting the stage for computing curricula 2013: computer science--report from the ACM/IEEE-CS joint task force. Proceedings of the 42nd ACM Technical Symposium on Computer Science Education , 161–162.
  • Sciore E. SimpleDB: A simple java-based multiuser syst for teaching database internals. ACM SIGCSE Bulletin. 2007; 39 (1):561–565. [ Google Scholar ]
  • Shebaro B. Using active learning strategies in teaching introductory database courses. Journal of Computing Sciences in Colleges. 2018; 33 (4):28–36. [ Google Scholar ]
  • Sibia, N., & Liut, M. (2022, June). The Positive Effects of using Reflective Prompts in a Database Course. In 1st International Workshop on Data Systems Education (pp. 32–37).
  • Silva, Y. N., Almeida, I., & Queiroz, M. (2016). SQL: From traditional databases to big data. Proceedings of the 47th ACM Technical Symposium on Computing Science Education , 413–418.
  • Sobel, A. E. K. (2003). Computing Curricula--Software Engineering Volume. Proc. of the Final Draft of the Software Engineering Education Knowledge (SEEK) .
  • Suryn, W., Abran, A., & April, A. (2003). ISO/IEC SQuaRE: The second generation of standards for software product quality .
  • Svahnberg, M., Aurum, A., & Wohlin, C. (2008). Using students as subjects-an empirical evaluation. Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement , 288–290.
  • Swebok evolution: IEEE Computer Society. (n.d.). In IEEE Computer Society SWEBOK Evolution Comments . Retrieved March 24, 2021
  • Taipalus T, Seppänen V. SQL education: A systematic mapping study and future research agenda. ACM Transactions on Computing Education (TOCE) 2020; 20 (3):1–33. [ Google Scholar ]
  • Taipalus T, Siponen M, Vartiainen T. Errors and complications in SQL query formulation. ACM Transactions on Computing Education (TOCE) 2018; 18 (3):1–29. [ Google Scholar ]
  • Taipalus, T., & Perälä, P. (2019). What to expect and what to focus on in SQL query teaching. Proceedings of the 50th ACM Technical Symposium on Computer Science Education , 198–203.
  • Tehseen R, Farooq MS, Abid A. Earthquake prediction using expert systems: A systematic mapping study. Sustainability. 2020; 12 (6):2420. [ Google Scholar ]
  • Urban, S. D., & Dietrich, S. W. (2001). Advanced database concepts for undergraduates: experience with teaching a second course. Proceedings of the Thirty-Second SIGCSE Technical Symposium on Computer Science Education , 357–361.
  • Urban SD, Dietrich SW. Integrating the practical use of a database product into a theoretical curriculum. ACM SIGCSE Bulletin. 1997; 29 (1):121–125. [ Google Scholar ]
  • Wang, J., & Chen, H. (2014). Research and practice on the teaching reform of database course. International Conference on Education Reform and Modern Management, ERMM .
  • Wang, J. Z., Davis, T. A., Westall, J. M., & Srimani, P. K. (2010). Undergraduate database instruction with MeTube. Proceedings of the Fifteenth Annual Conference on Innovation and Technology in Computer Science Education , 279–283.
  • Yau, G., & Karim, S. W. (2003). Smart classroom: Enhancing collaborative learning using pervasive computing technology. II American Society… .
  • Yue K-B. Using a semi-realistic database to support a database course. Journal of Information Systems Education. 2013; 24 (4):327. [ Google Scholar ]
  • Yuelan L, Yiwei L, Yuyan H, Yuefan L. Study on teaching methods of database application courses. Procedia Engineering. 2011; 15 :5425–5428. [ Google Scholar ]
  • Zhang, X., Wang, X., Liu, Z., Xue, W., & ZHU, X. (2018). The Exploration and Practice on the Classroom Teaching Reform of the Database Technologies Course in colleges. 2018 3rd International Conference on Modern Management, Education Technology, and Social Science (MMETSS 2018) , 320–323.
  • Zhanquan W, Zeping Y, Chunhua G, Fazhi Z, Weibin G. Research of database curriculum construction under the environment of massive open online courses. International Journal of Educational and Pedagogical Sciences. 2016; 10 (12):3873–3877. [ Google Scholar ]
  • Zheng, Y., & Dong, J. (2011). Teaching reform and practice of database principles. 2011 6th International Conference on Computer Science \& Education (ICCSE) , 1460–1462.


Title: Large Language Models: A Survey

Abstract: Large Language Models (LLMs) have drawn a lot of attention since the release of ChatGPT in November 2022, owing to their strong performance on a wide range of natural language tasks. LLMs acquire their general-purpose language understanding and generation ability by training billions of model parameters on massive amounts of text data, as predicted by scaling laws (Kaplan et al., 2020; Hoffmann et al., 2022). The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.





  1. 19024 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on Database Management Systems. Find methods information, sources ...

  2. PDF Database management system performance comparisons: A systematic

    the database is used by one or several software applications via a DBMS. Collectively, the database, the DBMS, and the software application are referred to as a database system [31, p.7][17, p.65]. The separation of the database and the DBMS, especially in the realm of relational databases, is typically impossible without exporting the database ...

  3. (PDF) Database System: Concepts and Design

    Such related data are called a database. A database system is an integrated collection of related files, along with details of the interpretation of the data contained therein. Basically, the ...

  4. PDF Architecture of a Database System

    in database systems that may arise in the future. As a result, we focus on relational database systems throughout this paper. At heart, a typical RDBMS has five main components, as illustrated in Figure 1.1. As an introduction to each of these components and the way they fit together, we step through the life of a query in a database system.

  5. (PDF) Database Management Systems: A NoSQL Analysis

    A Comparison Of Relational, NoSQL and NewSQL Database Management Systems For The Persistence Of Time Series Data. Conference Paper. Nov 2022. Christoph Praschl. Sebastian Pritz. Oliver Krauss ...

  6. Database management system performance comparisons: A systematic

    Fig. 1. A simplified view of a database system and the end-user with the emphasis on components relevant to this study; the arrows represent the flow of information from the end-user's device to the database residing in persistent storage; the flow of information back to the software application is not illustrated ...

  7. PDF The Architecture of an Active Data Base Management System

    In this paper we propose an architecture for an active DBMS that supports ECA rules. This architecture provides new forms of interaction, in support of ECA rules, between application programs and the DBMS. This leads to a new paradigm for constructing database applications.

  8. Database Management Systems—An Efficient, Effective, and ...

    The object-oriented database management system has three main parts: object structure, object classes, and object identity. The term object-oriented database management system (OODBMS) first came into use circa 1985. Several research projects have been done on the subject, with the most notable one being ORION.

  9. PDF Database Management System Performance Comparisons: A Systematic Survey

    2 DATABASE SYSTEMS 2.1 Database System Overview A database is a collection of interrelated data, typically stored according to a data model. Typically, the data is used by one or several software applications via a DBMS. Collectively, the database, the DBMS, and the software application are referred to as a database system [31, p.7][17, p.65].

  10. Architecture of a Database System

    Abstract. Database Management Systems (DBMSs) are a ubiquitous and critical component of modern computing, and the result of decades of research and development in both academia and industry. Historically, DBMSs were among the earliest multi-user server systems to be developed, and thus pioneered many systems design techniques for scalability ...

  11. PDF Database Management Systems: A Case Study of Faculty of Open Education

    Database systems continue to be a key aspect of Computer Science & Engineering today. Representing knowledge within a computer is one of the central challenges of the field. Database research has focused primarily on this fundamental issue (6). This paper presents a database management system developed for AOF (Faculty of Open Education) course ...

  12. [2301.01095] Database management system performance comparisons: A

    Abstract: Efficiency has been a pivotal aspect of the software industry since its inception, as a system that serves the end-user fast, and the service provider cost-efficiently benefits all parties. A database management system (DBMS) is an integral part of effectively all software systems, and therefore it is logical that different studies have compared the performance of ...

  13. Advances on Data Management and Information Systems

    This editorial paper overviews research topics covered in this special section of the Information Systems Frontiers journal. The special section contains papers invited from the 24th European Conference on Advances in Databases and Information Systems (ADBIS). The ADBIS conference has been running continuously since 1993.

  14. PDF Principles of Database Management

    978-1-107-18612-5 — Principles of Database Management. Wilfried Lemahieu, Seppe vanden Broucke, Bart Baesens. Frontmatter ... ers: LCCN 2018023251 | ISBN 9781107186125 (hardback: alk. paper). Subjects: LCSH: Database management. Classi ... 1.5 Advantages of Database Systems and Database Management 12 1.5.1 Data Independence 12 1.5.2 Database ...

  15. [PDF] Distributed Database Management Systems: A Practical Approach

    Distributed Database Management Systems: A Practical Approach. S. Rahimi, F. S. Haug. Published 2 August 2010. Computer Science. TLDR. This book addresses issues related to managing data across a distributed database system and gives implementers guidance on hiding discrepancies across systems and creating the illusion of a single repository ...

  16. [PDF] Advanced Database Management Systems

    This is the list of references for the Advanced Database Management Systems course unit. Pay special attention to the course unit materials web pages in order to identify which items in the list are mandatory weekly readings and which are the subject of the final coursework.

  17. Relational data paradigms: What do we learn by taking the materiality

    Scientists are creating data objects to meet the needs of their projects, using platforms and languages to achieve database goals in systems not necessarily designed for database management. Beyond the working lives of scientists, large-scale data infrastructures are playing an increasing part in our everyday lives.

  18. PDF Distributed Database Management System (DBMS) Architectures and

    database administration system enables a person to organize, store and get the data from a computer. In the very early years of computers, punch cards were used for input, output and data storage, but this is a means of communicating the stored memory with a computer. Punch cards offered a fast way to enter data, and to retrieve it.

  19. (PDF) Impact of database management in modern world

    Database management systems are based on improvements brought about by integration. These systems provide essential means of simultaneously communicating to study the work, produce better ...

  20. Dec'12- Feb'13 JIT

    The concept of a Relational Database Management system (RDBMS) came to the fore in the 1970s. This concept was first advanced by Edgar F. Codd in his paper on database construction theory, "A Relational Model of Data for Large Shared Data Banks". The concept of a database table was formed where records of a fixed length would be

  21. Advances in database systems education: Methods, tools, curricula, and

    Abstract. Fundamentals of Database Systems is a core course in computing disciplines, as almost all small, medium, large, or enterprise systems require a data storage component. Database System Education (DSE) provides the foundation as well as advanced concepts in the area of data modeling and its implementation.

  22. [PDF] Restaurant management database system design and implement

    Through demand analysis, this paper draws an ER diagram and, after analyzing the relationship model, uses MySQL to build a restaurant project database that supports convenient queries on turnover, best-selling products, employee information, and other functions. It tests whether the function of the system is normal through several common daily ...

  23. (PDF) Role of Database Management Systems (DBMS) in Supporting

    Reference to produce graduates with understanding and skills in database design, the use of Database Management System (DBMS), the ability in administration and professional capability in the ...

  24. Usability of Pre-Flight Planning Interfaces for Supplemental Data

    Usability of Pre-Flight Planning Interfaces for Supplemental Data Service Provider Tools to Support Uncrewed Aircraft System Traffic Management. Small uncrewed aircraft systems (sUASs) operate in low-altitude, uncontrolled airspace, where support services for their operators (UASOs) are not currently provided. NASA's System-Wide Safety (SWS) project is identifying the potential risks and ...

  25. (PDF) A Literature Review on Evolving Database

    It summarizes the recent technologies that opened up new research areas for databases. The long journey from RDBMS to NewSQL [16] gave rise to terms like Google Spanner and polyglot persistence ...

  26. [2402.05929] An Interactive Agent Foundation Model

    Abstract: The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and ...

  27. [2402.13744] Reasoning Algorithmically in Graph Neural Networks

    Abstract: The development of artificial intelligence systems with advanced reasoning capabilities represents a persistent and long-standing research question. Traditionally, the primary strategy to address this challenge involved the adoption of symbolic approaches, where knowledge was explicitly represented by means of symbols and explicitly programmed rules.

  28. [2402.06196] Large Language Models: A Survey

    Abstract: Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks, since the release of ChatGPT in November 2022. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data, as predicted ...