• Survey Paper
  • Open access
  • Published: 25 July 2020

Predictive big data analytics for supply chain demand forecasting: methods, applications, and research opportunities

  • Mahya Seyedan 1 &
  • Fereshteh Mafakheri   ORCID: orcid.org/0000-0002-7991-4635 1  

Journal of Big Data, volume 7, Article number: 53 (2020)


Big data analytics (BDA) in supply chain management (SCM) is receiving growing attention because BDA has a wide range of applications in SCM, including customer behavior analysis, trend analysis, and demand prediction. In this survey, we investigate predictive BDA applications in supply chain demand forecasting in order to propose a classification of these applications, identify gaps, and provide insights for future research. We classify the algorithms and their applications in supply chain management into time-series forecasting, clustering, K-nearest-neighbors, neural networks, regression analysis, support vector machines, and support vector regression. This survey also shows that the literature is particularly lacking in applications of BDA for demand forecasting in closed-loop supply chains (CLSCs), and accordingly highlights avenues for future research.

Introduction

Nowadays, businesses adopt ever-increasing precision marketing efforts to remain competitive and to maintain or grow their profit margins. As such, forecasting models have been widely applied in precision marketing to understand and fulfill customer needs and expectations [ 1 ]. In doing so, there is growing attention to the analysis of consumption behavior and preferences using forecasts obtained from customer data and transaction records, in order to manage product supply chains (SCs) accordingly [ 2 , 3 ].

Supply chain management (SCM) focuses on the flow of goods, services, and information from points of origin to customers through a chain of interconnected entities and activities [ 4 ]. In typical SCM problems, capacity, demand, and cost are assumed to be known parameters [ 5 ]. In reality, however, this is not the case, as uncertainties arise from variations in customer demand, supply transportation, organizational risks, and lead times. Demand uncertainty, in particular, has the greatest influence on SC performance, with widespread effects on production scheduling, inventory planning, and transportation [ 6 ]. In this sense, demand forecasting is a key approach to addressing uncertainties in supply chains [ 7 , 8 , 9 ].

A variety of statistical analysis techniques have been used for demand forecasting in SCM including time-series analysis and regression analysis [ 10 ]. With the advancements in information technologies and improved computational efficiencies, big data analytics (BDA) has emerged as a means of arriving at more precise predictions that better reflect customer needs, facilitate assessment of SC performance, improve the efficiency of SC, reduce reaction time, and support SC risk assessment [ 11 ].

The focus of this meta-research (literature review) paper is on “demand forecasting” in supply chains. The characteristics of demand data in today’s ever-expanding and sporadic global supply chains make the adoption of big data analytics (and machine learning) approaches a necessity for demand forecasting. The digitization of supply chains [ 12 ] and the incorporation of Blockchain technologies [ 13 ] for better tracking of supply chains further highlight the role of big data analytics. Supply chain data are high-dimensional, generated across many points in the chain for varied purposes (products, supplier capacities, orders, shipments, customers, retailers, etc.). They come in high volumes, owing to the plurality of suppliers, products, and customers, and at high velocity, reflected in the many transactions continuously processed across supply chain networks. Given such complexities, there has been a departure from conventional (statistical) demand forecasting approaches, which identify statistically meaningful trends (characterized by mean and variance attributes) in historical data [ 14 ], towards intelligent forecasts that learn from historical data and evolve to predict the ever-changing demand in supply chains [ 15 ]. This capability is established using big data analytics techniques that extract forecasting rules by discovering the underlying relationships among demand data across supply chain networks [ 16 ]. These techniques are computationally intensive and require complex machine-programmed algorithms [ 17 ].

With SCM efforts aiming at satisfying customer demand while minimizing the total cost of supply, applying machine-learning/data analytics algorithms could facilitate precise (data-driven) demand forecasts and align supply chain activities with these predictions to improve efficiency and customer satisfaction. Reflecting on these opportunities, this paper first proposes a taxonomy of data sources in SCM. Then, the importance of demand management in SCs is investigated. Next, a meta-research (literature review) of BDA applications in SC demand forecasting is presented, organized by the categories of algorithms utilized. This review paves the way for a critical discussion of BDA applications in SCM, highlighting a number of key findings and summarizing the existing challenges and gaps in BDA applications for demand forecasting in SCs. On that basis, the paper concludes by presenting a number of avenues for future research.

Data in supply chains

Data in the context of supply chains can be categorized into customer, shipping, delivery, order, sale, store, and product data [ 18 ]. Figure 1 provides the taxonomy of supply chain data. As such, SC data originate from different (and segmented) sources such as sales, inventory, manufacturing, warehousing, and transportation. In this sense, competition, price volatility, technological development, and varying customer commitments could lead to underestimation or overestimation of demand in established forecasts [ 19 ]. Therefore, to increase the precision of demand forecasts, supply chain data should be carefully analyzed to enhance knowledge about market trends, customer behavior, suppliers, and technologies. Extracting trends and patterns from such data and using them to improve the accuracy of future predictions can help minimize supply chain costs [ 20 , 21 ].

Figure 1. Taxonomy of supply chain data

Analysis of supply chain data has become a complex task due to (1) the increasing multiplicity of SC entities, (2) the growing diversity of SC configurations depending on the homogeneity or heterogeneity of products, (3) interdependencies among these entities, (4) uncertainties in the dynamic behavior of these components, (5) lack of information about SC entities [ 11 ], (6) networked manufacturing/production entities, owing to their increasing coordination and cooperation to achieve a high level of customization and adaptation to varying customer needs [ 22 ], and finally (7) the increasing adoption of supply chain digitization practices (and use of Blockchain technologies) to track activities across supply chains [ 12 , 13 ].

Big data analytics (BDA) has been increasingly applied in the management of SCs [ 23 ]: for procurement management (e.g., supplier selection [ 24 ], sourcing cost improvement [ 25 ], and sourcing risk management [ 26 ]), product research and development [ 27 ], production planning and control [ 28 ], quality management [ 29 ], maintenance and diagnosis [ 30 ], warehousing [ 31 ], order picking [ 32 ], inventory control [ 33 ], logistics/transportation (e.g., intelligent transportation systems [ 34 ], logistics planning [ 35 ], and in-transit inventory management [ 36 ]), and demand management (e.g., demand forecasting [ 37 ], demand sensing [ 38 ], and demand shaping [ 39 ]). A key application of BDA in SCM is to provide accurate forecasts, especially demand forecasts, with the aim of reducing the bullwhip effect [ 14 , 40 , 41 , 42 ].

Big data is defined as high-volume, high-velocity, high-variety, high-value, and high-veracity data requiring innovative forms of information processing that enable enhanced insight, decision making, and process automation [ 43 ]. Volume refers to the extensive size of data collected from multiple sources (spatial dimension) and over an extended period of time (temporal dimension) in SCs. For example, freight data include ERP/WMS order and item-level data, tracking data, and freight invoice data. These data are generated by sensors, bar codes, enterprise resource planning (ERP) systems, and database technologies. Velocity can be defined as the rate of generation and delivery of specific data; in other words, it refers to the speed of data collection, the reliability of data transfer, the efficiency of data storage, and the speed of discovering useful knowledge relevant to decision-making models and algorithms. Variety refers to the varied types of data generated from diverse sources such as the Internet of Things (IoT), mobile devices, online social networks, and so on. For instance, SCM data are usually heterogeneous in format because they come from diverse sources, particularly the various sensors installed in manufacturing sites, highways, retail shops, and warehouses. Value refers to the useful knowledge that must be discovered in the data to support decision-making. It is the most important, yet the most elusive, of the 5 Vs. Veracity refers to the quality of data, which must be accurate and trustworthy, with the knowledge that uncertainty and unreliability may exist in many data sources. Veracity deals with the conformity and accuracy of data. Data should be integrated from disparate sources and formats, filtered, and validated [ 23 , 44 , 45 ]. In summary, big data analytics techniques can deal with collections of large and complex datasets that are difficult to process and analyze using traditional techniques [ 46 ].

The literature points to multiple sources of big data across supply chains, with varied trade-offs among volume, velocity, variety, value, and veracity attributes [ 47 ]. We have summarized these sources and trade-offs in Table 1. Although demand forecasts in supply chains sit at the lower bounds of volume, velocity, and variety, they can use data from all sources across the supply chain, ranging from low-volume/variety/velocity on-the-shelf inventory reports to high-volume/variety/velocity supply chain tracking information provided through IoT. This combination of data sources used in SC demand forecasts, with their diverse temporal and spatial attributes, places greater emphasis on the use of big data analytics in supply chains in general and in demand forecasting efforts in particular.

Big data analytics applications in supply chain demand forecasting have been reported in both the supervised and the unsupervised learning categories. In supervised learning, data are associated with labels, meaning that the inputs and outputs are known. Supervised learning algorithms identify the underlying relationships between inputs and outputs in order to map the inputs of a new, unlabeled dataset to corresponding outputs [ 48 ]. For example, with a supervised learning model for demand forecasting, future demand can be predicted from historical product demand data [ 41 ]. In unsupervised learning, data are unlabeled (i.e., the output is unknown), and BDA algorithms try to find the underlying patterns among the unlabeled data [ 48 ] by analyzing the inputs and their interrelationships. Customer segmentation is an example of unsupervised learning in supply chains, clustering customers into groups based on their similarity [ 49 ]. Many machine-learning/data analytics algorithms can facilitate both supervised learning (extracting input–output relationships) and unsupervised learning (extracting inputs, outputs, and their relationships) [ 41 ].
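To make the supervised framing concrete, the following Python sketch turns a single demand history into labeled examples: each input is a window of past demand and each label is the demand in the next period. The function name `make_supervised`, the window length, and the weekly figures are hypothetical, chosen only for illustration.

```python
def make_supervised(series, window):
    """Build (X, y) pairs: each row of X holds `window` consecutive past demands,
    and the matching entry of y is the demand in the following period."""
    if not 1 <= window < len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    X, y = [], []
    for t in range(window, len(series)):
        X.append(list(series[t - window:t]))  # features: the previous `window` periods
        y.append(series[t])                   # label: demand observed at period t
    return X, y


# Hypothetical weekly demand (units) for one product
weekly_demand = [120, 135, 128, 150, 160, 155]
X, y = make_supervised(weekly_demand, window=3)
print(X)  # [[120, 135, 128], [135, 128, 150], [128, 150, 160]]
print(y)  # [150, 160, 155]
```

Any of the supervised learners reviewed later in this paper (regression, neural networks, SVR) could then be trained on `X` and `y`.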

Demand management in supply chains

The term “demand management” emerged in practice in the late 1980s and early 1990s. Traditionally, there are two approaches to demand management: a forward approach, which looks at potential demand over the next several years, and a backward approach, which relies on past or ongoing capabilities in responding to demand [ 50 ].

In forward demand management, the focus is on demand forecasting and planning, data management, and marketing strategies. Demand forecasting and planning refer to predicting the quantities and timing of customer requests. Such predictions aim at achieving customer satisfaction by meeting customer needs in a timely manner [ 51 ]. Accurate demand forecasting could improve the efficiency and robustness of production processes (and the associated supply chains), as resources will be aligned with requirements, leading to reduced inventories and waste [ 52 , 53 ].

Accordingly, many approaches have been proposed in the literature and in practice for demand forecasting and planning, among them spreadsheet models, statistical methods (like moving averages), and benchmark-based judgments. Today, the most widely used demand forecasting and planning tool is Excel. The most widespread problem with spreadsheet models for demand forecasting is that they do not scale to large datasets. In addition, the complexities and uncertainties in SCM (with multiplicity and variability of demand and supply) cannot be extracted, analyzed, and addressed through simple statistical methods such as moving averages or exponential smoothing [ 50 ]. During the past decade, traditional solutions for SC demand forecasting and planning have faced many difficulties in driving costs down and reducing inventories [ 50 ]. Although in some cases these solutions have improved days payable outstanding, they have pushed up SC costs, shifting the burden to suppliers.
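For reference, the two simple statistical methods named above can be sketched in a few lines of Python. This is an illustrative sketch, not the tooling examined in [ 50 ]; the function names, the smoothing constant, and the demand figures are hypothetical. Both functions produce a single flat value from the demand history alone, which illustrates why such methods cannot take other demand drivers into account.

```python
def moving_average_forecast(history, window):
    """Next-period forecast = mean of the last `window` observations."""
    if not 1 <= window <= len(history):
        raise ValueError("window must be between 1 and len(history)")
    return sum(history[-window:]) / window


def exponential_smoothing_forecast(history, alpha):
    """Simple exponential smoothing: level = alpha * obs + (1 - alpha) * level.
    The final level is used as the one-step-ahead forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]  # initialise the level with the first observation
    for obs in history[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level


monthly_demand = [100, 110, 105, 120, 130]  # hypothetical units per month
print(moving_average_forecast(monthly_demand, window=3))          # ≈ 118.33
print(exponential_smoothing_forecast(monthly_demand, alpha=0.5))  # 121.25
```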

The era of big data and high-performance computing has enabled data processing at a large scale that is efficient, fast, and easy, with reduced concerns about data storage and collection thanks to cloud services. The emergence of new technologies in data storage and analytics and the abundance of quality data have created new opportunities for data-driven demand forecasting and planning. Demand forecast accuracy can be significantly improved with data-mining algorithms and tools that can sift through data, analyze the results, and learn the relationships involved. This could lead to highly accurate demand forecasting models that learn from data and are scalable for application in SCM. In the following section, a review of BDA applications in SCM is presented. These applications are categorized based on the techniques employed to establish data-driven demand forecasts.

BDA for demand forecasting in SCM

This survey aims at reviewing the articles published in the area of demand and sales forecasting in SC in the presence of big data to provide a classification of the literature based on algorithms utilized as well as a survey of applications. To the best of our knowledge, no comprehensive review of the literature specifically on SC demand forecasting has been conducted with a focus on classification of techniques of data analytics and machine learning. In doing so, we performed a thorough search of the existing literature, through Scopus, Google Scholar, and Elsevier, with publication dates ranging from 2005 to 2019. The keywords used for the search were supply chain, demand forecasting, sales forecasting, big data analytics, and machine learning.

Figure 2 shows the trend in publications on demand forecasting for SCs from 2005 to 2019. There is a steadily increasing trend in the number of publications over this period, and this growth is expected to continue in 2020. Reviewing the past 15 years of research on big data analysis/machine learning applications in SC demand forecasting, we identified 64 research papers (excluding books, book chapters, and review papers) and categorized them with respect to the methodologies adopted for demand forecasting. The five most frequently used techniques, listed in Table 2, are “Neural Network,” “Regression”, “Time-series forecasting (ARIMA)”, “Support Vector Machine”, and “Decision Tree” methods. This table reflects the growing use of big data analysis techniques in SC demand forecasting. It should be noted that a few articles used more than one of these techniques.

Figure 2. Distribution of literature in supply chain demand forecasting from 2005 to 2019

It should be noted that there are literature review papers exploring the use of big data analytics in SCM [ 10 , 16 , 23 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 , 65 , 66 , 67 ]. However, this study focuses on the specific topic of “demand forecasting” in SCM to explore BDA applications within this particular subtopic.

As Hofmann and Rutschmann [ 58 ] indicated in their literature review, the key questions to answer are why, what and how big data analytics/machine-learning algorithms could enhance forecasts’ accuracy in comparison to conventional statistical forecasting approaches.

Conventional methods face a number of limitations for demand forecasting in the context of SCs. Many parameters influence demand in supply chains; however, for the sake of simplicity, many of them were not captured in studies using conventional methods. As a result, the forecasts could provide only a partial understanding of demand variations in supply chains, and the unexplained demand variations could simply be treated as statistical noise. Conventional approaches could provide shorter processing times at the expense of the robustness and accuracy of predictions. Conventional SC demand forecasting is also mostly done manually, with high reliance on the planner’s skills and domain knowledge; it would be worthwhile to fully automate the forecasting process to reduce such dependency [ 58 ]. Finally, data-driven techniques can learn to incorporate non-linear behaviors and could thus provide better approximations in demand forecasting than conventional methods, which are mostly derived from linear models. There is a significant level of non-linearity in SC demand behavior, particularly due to competition among suppliers, the bullwhip effect, and mismatches between supply and demand [ 40 ].

To extract valuable knowledge from vast amounts of data, BDA is used as an advanced analytics technique to obtain the information needed for decision-making. Reduced operational costs, improved SC agility, and increased customer satisfaction are among the reported benefits of applying BDA in SCM [ 68 ]. Researchers have used various BDA techniques and algorithms in the SCM context, such as classification, scenario analysis, and optimization [ 23 ]. Machine-learning techniques have been used to forecast demand in SCs subject to uncertainties in prices, markets, competitors, and customer behaviors, in order to manage SCs more efficiently and profitably [ 40 ].

BDA has been applied in all stages of supply chains, including procurement, warehousing, logistics/transportation, manufacturing, and sales management. BDA consists of descriptive, predictive, and prescriptive analytics. Descriptive analytics describes and categorizes what happened in the past. Predictive analytics is used to predict future events and discover predictive patterns within data using techniques such as data mining, web mining, and text mining. Prescriptive analytics applies data and mathematical algorithms to support decision-making. Multi-criteria decision-making, optimization, and simulation are among the prescriptive analytics tools that help to improve the accuracy of forecasting [ 10 ].

Predictive analytics are the ones mostly utilized in SC demand and procurement forecasting [ 23 ]. In this sense, in the following subsections, we will review various predictive big data analytics approaches, presented in the literature for demand forecasting in SCM, categorized based on the employed data analytics/machine learning technique/algorithm, with elaborations of their purpose and applications (summarized in Table  3 ).

Time-series forecasting

Time-series analysis comprises methodologies for mining complex, sequential data types. Time-series data consist of long sequences of numeric values recorded at equal time intervals (e.g., per minute, per hour, or per day). Many natural and human-made processes, such as stock markets, medical diagnosis, or natural phenomena, generate time-series data [ 48 ].

In demand forecasting using time series, demand is recorded over time at equally sized intervals [ 69 , 70 ]. Combinations of time-series methods with product or market features have attracted much attention in demand forecasting with BDA. Ma et al. [ 71 ] proposed and developed a demand trend-mining algorithm for predictive life-cycle design. Their method combines three models: (a) a decision tree model for large-scale historical data classification, (b) a discrete choice analysis for modeling present and past demand, and (c) an automated time-series forecasting model for future trend analysis. They tested and applied their three-level approach to smartphone design, manufacturing, and remanufacturing.

A time-series approach was used to forecast search traffic (service demand) subject to changes in consumer attitudes [ 37 ]. Demand forecasting has also been achieved through time-series models using exponential smoothing with covariates (ESCov) to provide predictions of short-term, mid-term, and long-term demand trends in chemical industry SCs [ 7 ]. In addition, Hamiche et al. [ 72 ] used a customer-responsive time-series approach for SC demand forecasting.

For perishable products with short life cycles, appropriate (short-term) forecasting is extremely critical. Da Veiga et al. [ 73 ] forecasted the demand for a group of perishable dairy products using Autoregressive Integrated Moving Average (ARIMA) and Holt-Winters (HW) models. The results were compared based on the mean absolute percentage error (MAPE) and the Theil inequality index (U-Theil). The HW model showed a better goodness-of-fit on both performance metrics.
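The two error measures mentioned above can be computed as follows. Theil's inequality index exists in more than one variant; this Python sketch assumes the U2 form, which compares the forecast's relative errors with those of a naive no-change forecast, and it may differ from the variant used in [ 73 ]. The function names and data are hypothetical.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def theil_u2(actual, forecast):
    """Theil's U2 (assumed variant): relative one-step errors of the forecast divided by
    those of a naive no-change forecast. U2 < 1 means the forecast beats the naive one.
    forecast[0] is not used, because the first period has no preceding actual value."""
    if len(actual) < 2 or len(actual) != len(forecast):
        raise ValueError("need at least two periods and equal-length inputs")
    num = sum(((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2 for t in range(len(actual) - 1))
    den = sum(((actual[t + 1] - actual[t]) / actual[t]) ** 2 for t in range(len(actual) - 1))
    if den == 0:
        raise ValueError("U2 is undefined for a constant series")
    return math.sqrt(num) / math.sqrt(den)


# Hypothetical daily demand for a dairy product and one candidate forecast
actual = [100, 120, 110, 130, 125]
model_a = [100, 115, 112, 125, 128]
print(round(mape(actual, model_a), 2), round(theil_u2(actual, model_a), 3))
```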

With ARIMA, prediction accuracy can diminish when there is a high level of uncertainty in future patterns of parameters [ 42 , 74 , 75 , 76 ]. HW forecasting can yield better accuracy than ARIMA [ 73 ]. HW is simple and easy to use. However, the forecast horizon should not exceed one seasonal cycle; otherwise, forecast accuracy decreases sharply. This is because the inputs of an HW model are themselves predicted values, subject to longer-term inaccuracies and uncertainties [ 45 , 73 ].
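The Holt-Winters method referred to above can be sketched in its additive form as follows. This is a minimal illustration, assuming additive seasonality, a simple initialization from the first two seasons, and fixed smoothing constants; it is not the configuration used in [ 73 ]. The guard on `horizon` mirrors the caution above about forecasting beyond one seasonal cycle, rather than a limitation of the update equations themselves.

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: level, trend and m seasonal indices updated each period."""
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    if not 1 <= horizon <= m:
        raise ValueError("horizon should not exceed one seasonal cycle")
    first = sum(series[:m]) / m
    second = sum(series[m:2 * m]) / m
    level = first
    trend = (second - first) / m                     # average per-period change between seasons
    season = [series[i] - first for i in range(m)]   # initial seasonal deviations
    for t in range(m, len(series)):
        y, s = series[t], season[t % m]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y - level) + (1 - gamma) * s
    n = len(series)
    # the latest seasonal index for each future position was updated one cycle earlier
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]


# Hypothetical quarterly demand with a mild upward trend (three years, m = 4)
demand = [120, 150, 180, 130, 126, 158, 186, 137, 131, 165, 193, 143]
print(holt_winters_additive(demand, m=4, alpha=0.4, beta=0.1, gamma=0.3, horizon=4))
```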

Clustering analysis

Clustering analysis is a data analysis approach that partitions a set of data objects into subgroups based on their similarities. Several applications of clustering analysis have been reported in business analytics, pattern recognition, and web development [ 48 ]. Han et al. [ 48 ] have emphasized that, using clustering, customers can be organized into groups (clusters) such that customers within a group present similar characteristics.

A key target of demand forecasting is to identify the demand behavior of customers. Extracting similar behavior from historical data leads to the recognition of customer clusters or segments. Clustering algorithms such as K-means, self-organizing maps (SOMs), and fuzzy clustering have been used to segment customers with similar behavior. Clustering enhances the accuracy of SC demand forecasting because predictions are established for each segment of similar customers. As a limitation, clustering methods tend to identify customers who do not follow any pattern as outliers [ 74 , 77 ].
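The segment-then-forecast idea can be sketched with K-means in NumPy. Assumptions (all hypothetical): each row is one customer's weekly demand, the number of segments is fixed at two, and each segment's forecast is simply the mean of its last three weeks of aggregate demand; real studies would use richer features and forecasting models.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means (Lloyd's algorithm) with random initial centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every customer to every centre
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def segment_forecasts(X, labels, window=3):
    """Forecast each segment's next-week total as the mean of its last `window` weeks."""
    forecasts = {}
    for j in np.unique(labels):
        segment_total = X[labels == j].sum(axis=0)  # aggregate weekly demand of the segment
        forecasts[int(j)] = float(segment_total[-window:].mean())
    return forecasts


# Hypothetical: rows = customers, columns = six weeks of demand
X = np.array([[10, 12, 11, 13, 12, 14],
              [9, 11, 10, 12, 11, 13],
              [11, 13, 12, 14, 13, 15],
              [50, 55, 52, 58, 60, 62],
              [48, 53, 50, 56, 58, 60],
              [52, 57, 54, 60, 62, 64]], dtype=float)
labels, _ = kmeans(X, k=2)
print(labels, segment_forecasts(X, labels))
```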

Hierarchical forecasts of sales data are performed by clustering and categorizing sales patterns. Multivariate ARIMA models have been used for demand forecasting based on point-of-sale data in industrial bakery chains [ 19 ]. These bakery goods are ordered and clustered daily, with a continuous need for demand forecasts to avoid both shortage and waste [ 19 ]. Fuel demand forecasting in thermal power plants is another domain where clustering methods have been applied: electricity consumption patterns are derived by clustering consumers, and on that basis, demand for the required fuel is established [ 77 ].

K-nearest-neighbor (KNN)

KNN is a classification method that has been widely used for pattern recognition. The KNN algorithm identifies the similarity of a given object to the surrounding objects (called tuples) using a similarity index. These tuples are described by n attributes, so each tuple corresponds to a point in an n-dimensional space. The KNN algorithm searches for the k tuples closest to a given tuple [ 48 ]. These similarity-based classifications lead to the formation of clusters containing similar objects. KNN can also be integrated into regression analysis problems [ 78 ] for dimensionality reduction of the data [ 79 ]. In the realm of SC demand forecasting, Nikolopoulos et al. [ 80 ] applied KNN to forecast sporadic demand in an automotive spare parts supply chain. In another study, KNN was used to forecast future demand trends for Walmart’s supply chain planning [ 81 ].
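One common way to apply KNN to a demand series is "analogue" forecasting: find the k past windows most similar to the most recent window and average the values that followed them. The Python sketch below assumes Euclidean distance and an unweighted average; it illustrates the idea only and is not the specific procedure of [ 80 ] or [ 81 ].

```python
import math


def knn_forecast(series, window, k):
    """Analogue forecast: average the values that followed the k past windows
    closest (Euclidean distance) to the most recent window."""
    if window < 1 or len(series) - window < k:
        raise ValueError("series too short for this window and k")
    query = series[-window:]
    candidates = []
    for end in range(window, len(series)):
        past = series[end - window:end]  # a historical window ...
        following = series[end]          # ... and the demand that came right after it
        candidates.append((math.dist(past, query), following))
    candidates.sort(key=lambda c: c[0])  # nearest windows first
    return sum(value for _, value in candidates[:k]) / k


# Hypothetical intermittent spare-part demand
spare_part_demand = [0, 0, 3, 0, 0, 2, 0, 0, 4, 0, 0, 3, 0]
print(knn_forecast(spare_part_demand, window=3, k=2))
```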

Artificial neural networks

In artificial neural networks, a set of neurons (input/output units) are connected to one another in different layers in order to map inputs to outputs by finding the underlying correlations between them. Configuring such networks can become a complex problem, due to the high number of layers and neurons as well as the variability of their types (linear or nonlinear), and it must follow a data-driven learning process. In doing so, each connection between units carries a weight that is tuned during a training step [ 48 ]. In the end, a weighted network with a minimum number of neurons that maps the inputs to the outputs with minimum fitting error (deviation) is identified.

As the literature reveals, artificial neural networks (ANNs) are widely applied to demand forecasting [ 82 , 83 , 84 , 85 ]. To improve the accuracy of ANN-based demand predictions, Liu et al. [ 86 ] proposed a combination of a grey model and a stacked autoencoder, applied to a case study of predicting demand in a Brazilian logistics company subject to transportation disruption [ 87 ]. Amirkolaii et al. [ 88 ] applied neural networks to forecast spare parts demand in order to minimize supply chain shortages. In this spare parts supply chain, although multiple suppliers were available to satisfy demand for a variety of spare parts, demand was highly variable due to a varying number of customers and their varying needs. Their ANN-based forecasting approach considered four configurations: (1) one input demand feature with one Stock-Keeping Unit (SKU), (2) one input demand feature with all SKUs, (3) 16 input demand features with one SKU, and (4) 16 input demand features with all SKUs. They applied neural networks with backpropagation and compared the results with a number of benchmarks, reporting the Mean Square Error (MSE) for each configuration.

Huang et al. [ 89 ] compared a backpropagation (BP) neural network and a linear regression analysis for forecasting e-logistics demand in urban and rural areas in China using data from 1997 to 2015. By comparing the mean absolute error (MAE) and the average relative errors of the backpropagation neural network and linear regression, they showed that the backpropagation neural network could reach higher accuracy (reflected in smaller differences between predicted and actual data). This is because a sigmoid function was used as the transfer function in the hidden layer of the BP network; being nonlinear and differentiable, it allows the network to be trained by backpropagation to capture nonlinear relationships such as the one in their case study, whereas linear regression works well only for linear problems.
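The role of the sigmoid hidden layer can be illustrated with a small NumPy sketch that trains a one-hidden-layer network by backpropagation and compares it with a least-squares linear fit on a bell-shaped curve (a stylized demand profile over a product life cycle, with time standardized to [-2, 2]). The data, network size, learning rate, and epoch count are hypothetical choices, not those of [ 89 ].

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, y, hidden=8, lr=0.3, epochs=8000, seed=0):
    """One hidden layer (sigmoid units) and a linear output unit, trained by
    full-batch backpropagation on half the mean squared error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=1.0, size=(X.shape[1], hidden))
    b1 = rng.normal(scale=1.0, size=hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    n = len(X)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # forward pass: hidden activations
        err = (H @ W2 + b2) - y           # output error
        dH = (err @ W2.T) * H * (1 - H)   # backpropagate through the sigmoid derivative
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return lambda X_new: (sigmoid(X_new @ W1 + b1) @ W2 + b2).ravel()


def fit_linear(X, y):
    """Ordinary least squares with an intercept."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.hstack([X_new, np.ones((len(X_new), 1))]) @ coef


# Hypothetical life-cycle demand: rises, peaks mid-cycle, then declines
x = np.linspace(-2, 2, 50).reshape(-1, 1)
y = np.exp(-x ** 2).ravel()
bp, lin = train_bp(x, y), fit_linear(x, y)
mae_bp = np.mean(np.abs(bp(x) - y))
mae_lin = np.mean(np.abs(lin(x) - y))
print(f"MAE backprop: {mae_bp:.3f}  MAE linear: {mae_lin:.3f}")
```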

ANNs have also been applied to demand forecasting for server models, predicting demand one week ahead of order arrivals. In this regard, Saha et al. [ 90 ] proposed an ANN-based forecasting model using 52 weeks of time-series data fitted with both BP and Radial Basis Function (RBF) networks. An RBF network is a feed-forward network similar to a BP network, except that its hidden-layer activation (transfer) function is a radial basis function. RBF networks result in faster training and convergence of ANN weights than BP networks, without compromising forecasting precision.
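A minimal sketch of why RBF networks can train quickly: if the radial basis centers and width are fixed in advance, only the linear output weights remain, and they can be obtained in a single least-squares solve instead of iterative backpropagation. The centers, width, and 52-week seasonal series below are hypothetical, and [ 90 ] may have trained its RBF network differently.

```python
import numpy as np


def rbf_features(x, centers, width):
    """Gaussian radial basis activations plus a bias column."""
    phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))
    return np.hstack([phi, np.ones((len(x), 1))])


def fit_rbf(x, y, centers, width):
    """Fixed centres and width; output weights solved in one least-squares step."""
    weights, *_ = np.linalg.lstsq(rbf_features(x, centers, width), y, rcond=None)
    return lambda x_new: rbf_features(x_new, centers, width) @ weights


# Hypothetical 52-week demand with an annual seasonal swing
weeks = np.arange(52, dtype=float)
demand = 100 + 20 * np.sin(2 * np.pi * weeks / 52)
model = fit_rbf(weeks, demand, centers=np.linspace(0, 51, 13), width=4.0)
print(np.max(np.abs(model(weeks) - demand)))  # in-sample maximum absolute error
```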

Researchers have combined ANN-based machine-learning algorithms with optimization models to identify optimal courses of action, strategies, or decisions for the future. Chang et al. [ 91 ] employed a genetic algorithm in the training phase of a neural network using sales/supply chain data from the printed circuit board industry in Taiwan and presented an evolving neural network forecasting model. They proposed using Genetic Algorithm (GA)-based cost function optimization to arrive at the best configuration of the corresponding neural network for sales forecasting with respect to prediction precision. The proposed model was then compared with back-propagation and linear regression approaches using three performance indices, MAPE, Mean Absolute Deviation (MAD), and Total Cost Deviation (TCD), and showed superior prediction precision.

Regression analysis

Regression models generate continuous-valued functions used for prediction. These methods predict the value of a response (dependent) variable with respect to one or more predictor (independent) variables. There are various forms of regression analysis, such as linear, multiple, weighted, symbolic (random), polynomial, nonparametric, and robust regression. The last of these is useful when errors fail to satisfy normality assumptions or when dealing with big data that may contain a significant number of outliers [ 48 ].
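Robust regression can be illustrated with Huber weights fitted by iteratively reweighted least squares (IRLS), one standard approach among several. In the NumPy sketch below, the demand data and the two outliers (e.g., one-off bulk orders) are synthetic, and the tuning constant 1.345 and the MAD-based scale estimate are conventional choices assumed here.

```python
import numpy as np


def design(x):
    return np.column_stack([np.ones(len(x)), x])  # columns: intercept, slope


def ols(x, y):
    return np.linalg.lstsq(design(x), y, rcond=None)[0]


def huber_irls(x, y, delta=1.345, n_iter=50):
    """Huber M-estimator fitted by iteratively reweighted least squares."""
    A = design(x)
    beta = ols(x, y)                                          # start from the OLS fit
    for _ in range(n_iter):
        r = y - A @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745  # robust (MAD) scale
        u = np.abs(r) / max(scale, 1e-12)
        w = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))  # down-weight large residuals
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return beta


t = np.arange(20, dtype=float)           # hypothetical week index
demand = 10 + 2 * t + 0.5 * np.sin(t)    # underlying trend plus small fluctuations
demand[5] += 100                         # hypothetical one-off bulk orders (outliers)
demand[15] += 120
print("OLS slope:", ols(t, demand)[1], " Huber slope:", huber_irls(t, demand)[1])
```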

Merkuryeva et al. [ 92 ] analyzed three prediction approaches for demand forecasting in the pharmaceutical industry: a simple moving average model, multiple linear regression, and symbolic regression with the search conducted through evolutionary genetic programming. In this experiment, symbolic regression exhibited the best fit, with the lowest error.

Because perishable products must be sold within a very short shelf life, demand forecasting for this type of product has drawn increasing attention. Yang and Sutrisno [ 93 ] applied and compared regression analysis and neural network techniques to derive demand forecasts for perishable goods. They concluded that accurate daily forecasts are achievable with knowledge of sales in the first few hours of the day, using either of these methods.
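The idea of predicting a full day's sales from early-hour sales can be sketched as a simple least-squares line. The figures and the single-predictor model are hypothetical; Yang and Sutrisno [ 93 ] also used neural networks, and their exact specification is not reproduced here.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y ≈ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical history: units sold by 10 a.m. vs. units sold over the whole day
early_sales = [12, 18, 15, 22, 9, 20]
daily_sales = [61, 88, 74, 105, 47, 97]
a, b = fit_line(early_sales, daily_sales)
print(f"forecast for a day with 16 units sold by 10 a.m.: {a + b * 16:.1f}")
```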

Support vector machine (SVM)

SVM is an algorithm that uses a nonlinear mapping to transform a set of training data into a higher-dimensional space, in which it searches for the optimal separating hyperplane (the one with the maximum margin) that separates one class from another [ 48 ]. Villegas et al. [ 94 ] tested the applicability of SVMs for demand forecasting in household and personal care SCs with a dataset comprising 229 weekly demand series in the UK. Wu [ 95 ] applied an SVM, using particle swarm optimization (PSO) to search for the best separating hyperplane, to classify data related to car sales and forecast the demand in each cluster.
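The maximum-margin idea can be sketched with a linear SVM trained by stochastic subgradient descent on the regularized hinge loss (a Pegasos-style update). This sketch omits the nonlinear kernel mapping described above and folds the bias into the weight vector through a constant feature; the two standardized features and the labels are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularised hinge loss.
    Labels y must be +1/-1; a constant feature is appended so a bias is learned."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    rng = np.random.default_rng(seed)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                    # decreasing step size
            if y[i] * (Xb[i] @ w) < 1:               # inside the margin or misclassified
                w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
            else:
                w = (1 - eta * lam) * w
    return w


def predict(w, X):
    return np.sign(np.hstack([X, np.ones((len(X), 1))]) @ w)


# Hypothetical weeks described by standardized (price discount, ad spend); +1 = high demand
X = np.array([[2, 3], [3, 2], [3, 4], [4, 3], [-2, -3], [-3, -2], [-3, -4], [-4, -3]], dtype=float)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])
w = train_linear_svm(X, y)
print(w, predict(w, X))
```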

Support vector regression (SVR)

Regression problems with a continuous-valued target can be solved by support vector regression (SVR), the regression counterpart of SVM. The main idea behind SVR is to compute a linear regression function in a high-dimensional feature space. Support vector methods have been applied to financial/cost prediction, handwritten digit recognition, speaker identification, and object recognition, among other problems [ 48 ].
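
The ε-insensitive loss at the heart of SVR can be illustrated with a linear model trained by plain sub-gradient descent. This Python sketch is a simplification that rests on several assumptions. It works directly in the input space, with no kernel or feature map. It uses full-batch sub-gradient steps instead of the quadratic-programming solvers normally used for SVR. It fits synthetic noise-free data. Residuals inside the tube of half-width `eps` contribute neither loss nor gradient.

```python
def fit_linear_svr(xs, ys, eps=0.1, lam=1e-3, lr=0.01, epochs=20000):
    """Minimise lam/2 * w**2 + mean(max(0, |y - (w*x + b)| - eps))."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw, gb = lam * w, 0.0                  # the bias b is not regularised
        for x, y in zip(xs, ys):
            r = y - (w * x + b)
            if abs(r) > eps:                   # outside the tube: linear penalty
                s = 1.0 if r > 0 else -1.0
                gw -= s * x / n
                gb -= s / n
            # inside the tube (|r| <= eps): zero loss, zero gradient
        w -= lr * gw
        b -= lr * gb
    return w, b


xs = [i / 10 for i in range(11)]
ys = [1.0 + 2.0 * x for x in xs]               # true line: slope 2, intercept 1
w, b = fit_linear_svr(xs, ys)
max_dev = max(abs(y - (w * x + b)) for x, y in zip(xs, ys))
print(f"w = {w:.3f}, b = {b:.3f}, largest residual = {max_dev:.3f}")
```

On this noise-free line, the fit should settle at a slope somewhat below 2 (near 1.8), with all residuals at or just inside `eps`. Errors within the tube cost nothing, so the small L2 penalty flattens the line as far as the tube allows. The same property lets SVR disregard small deviations in noisy training data.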

Guanghui [ 96 ] used the SVR method for SC demand prediction. In demand forecasting, SVR can yield a lower mean square error than RBF neural networks because its ε-insensitive cost function ignores training points whose errors fall within a margin (ε) of the regression function and penalizes only deviations beyond that margin. This method can therefore lead to higher forecast accuracy. Note, however, that SVR produces continuous-valued predictions. It is the classification form of SVM that is inherently a two-class method, as in normal-versus-anomaly detection problems. Sarhani and El Afia [ 97 ] sought to forecast SC demand using SVR and applied particle swarm optimization (PSO) and genetic algorithms (GA) to optimize the SVR parameters. The SVR-PSO and SVR-GA approaches were compared with respect to prediction accuracy using the mean absolute percentage error (MAPE). The results showed that PSO outperformed GA in both computation time and MAPE when configuring the SVR parameters.
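
Parameter tuning by PSO against a MAPE criterion can be sketched generically. To keep the Python example short and self-contained, the objective here is a stand-in, named plainly: the hold-out MAPE of Holt's linear exponential smoothing with parameters (alpha, beta). It replaces the SVR parameters tuned in [ 97 ]. Swarm size, inertia, and acceleration coefficients are illustrative assumptions, and the demand series is synthetic.

```python
import random


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear exponential smoothing; returns `horizon` future values."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def pso(objective, bounds, n_particles=12, iters=40, inertia=0.7, c1=1.5, c2=1.5, seed=1):
    """Minimise `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # swarm best
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


demand = [120 + 3 * t + ((t * 5) % 9 - 4) for t in range(30)]  # synthetic series
train, valid = demand[:24], demand[24:]


def objective(params):
    alpha, beta = params
    return mape(valid, holt_forecast(train, alpha, beta, len(valid)))


best_params, best_mape = pso(objective, bounds=[(0.01, 1.0), (0.01, 1.0)])
print(f"alpha = {best_params[0]:.3f}, beta = {best_params[1]:.3f}, "
      f"validation MAPE = {best_mape:.2f}%")
```

In the setting of [ 97 ], `objective` would instead train an SVR with candidate values of parameters such as C and ε and return its validation MAPE. The swarm update itself would be unchanged.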

Mixed approaches

Some works in the literature have used a combination of the aforementioned techniques. In these studies, the data flow through a sequence of algorithms, and the outputs of one stage become the inputs of the next. Each algorithm in the sequence extracts useful information, so the final outputs are explanatory and combine qualitative and quantitative information. Examples of such studies include [ 15 , 98 , 99 , 100 , 101 , 102 , 103 , 104 , 105 ].

In more complex supply chains with several points of supply, different warehouses, varied customers, and several products, demand forecasting becomes a high-dimensional problem. To address this issue, Islek and Oguducu [ 100 ] applied a clustering technique called bipartite graph clustering to analyze the sales patterns of different products. They then combined a moving average model with a Bayesian belief network to improve the accuracy of demand forecasting for each cluster. Kilimci et al. [ 101 ] developed an intelligent demand forecasting system by applying time-series and regression methods, a support vector regression algorithm, and a deep learning model in sequence. They dealt with a large volume of data comprising 155 features over 875 million records. First, they used principal component analysis for dimension reduction. Then, data clustering was performed. This was followed by demand forecasting for each cluster using a novel decision integration strategy called boosting ensemble. They concluded that combining a deep neural network with a boosting strategy yielded the best accuracy, minimizing the prediction error for demand forecasting.
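
The cluster-then-forecast pattern described above can be sketched in a few lines of Python. The sketch makes several substitutions, stated here plainly. Plain k-means on scale-normalized weekly sales profiles stands in for the bipartite graph clustering of [ 100 ]. A 3-week moving average of each cluster's aggregate sales stands in for the per-cluster forecasting models. Product names and sales figures are hypothetical.

```python
def normalise(series):
    """Scale a sales series to unit total so clustering compares shapes only."""
    total = sum(series)
    return [v / total for v in series]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical weekly unit sales for five products.
sales = {
    "A": [10, 12, 14, 16, 18, 20],
    "B": [50, 61, 70, 79, 90, 101],
    "C": [30, 27, 24, 20, 17, 14],
    "D": [80, 72, 63, 55, 47, 38],
    "E": [5, 6, 7, 8, 9, 10],
}
names = list(sales)
labels = kmeans([normalise(sales[n]) for n in names], k=2)

forecast = {}
for j in sorted(set(labels)):
    members = [n for n, lab in zip(names, labels) if lab == j]
    aggregate = [sum(week) for week in zip(*(sales[n] for n in members))]
    forecast[j] = sum(aggregate[-3:]) / 3          # 3-week moving average
    print(f"cluster {j}: {members} -> next-week forecast {forecast[j]:.1f}")
```

Normalizing each series before clustering groups products by the shape of their sales pattern (growing versus declining) rather than by volume. A moving average is used only for brevity: it lags behind these trends, which is why richer per-cluster models are used in the cited studies.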

Chen and Lu [ 98 ] combined the clustering algorithms SOM, growing hierarchical self-organizing map (GHSOM), and K-means with two machine-learning techniques, SVR and extreme learning machine (ELM), for sales forecasting of computers. The authors found that the combination of GHSOM and ELM yielded better accuracy and performance in demand forecasts for their computer retailing case study. Forecasting is also difficult in cases with high product variety. For such products in an SC, sales patterns can be extracted for clustered products. Then, for each cluster, a machine-learning technique such as SVR can be employed to further improve the prediction accuracy [ 104 ].

Brentan et al. [ 106 ] used and analyzed various BDA techniques for demand prediction, including support vector machines (SVM) and adaptive neural fuzzy inference systems (ANFIS). They combined the values predicted by each machine learning technique through a linear regression process to arrive at an average prediction value, which was adopted as the benchmark forecast. The performance (accuracy) of each technique was then analyzed with respect to its root mean square error (RMSE) and mean absolute error (MAE), obtained by comparing the target values with the predicted ones.
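
The committee idea of combining several models' predictions through a linear regression can be sketched as follows. The numbers are made up and stand in for the SVM and ANFIS outputs of [ 106 ]. The combiner is ordinary least squares with an intercept. It is solved from the normal equations with a small Gaussian-elimination routine, so the example needs only the Python standard library.

```python
import math


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_combiner(preds, target):
    """OLS weights [intercept, w_model1, w_model2, ...] via normal equations."""
    rows = [[1.0] + list(p) for p in zip(*preds)]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(ata, aty)


def rmse(actual, pred):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(actual, pred)) / len(actual))


def mae(actual, pred):
    return sum(abs(x - y) for x, y in zip(actual, pred)) / len(actual)


target = [42, 38, 35, 40, 55, 70, 82, 76, 64, 58]      # observed hourly demand
svm_pred = [45, 41, 37, 44, 58, 74, 85, 80, 67, 61]    # hypothetical model 1
anfis_pred = [40, 37, 36, 37, 52, 66, 80, 74, 63, 55]  # hypothetical model 2

coef = fit_combiner([svm_pred, anfis_pred], target)
combined = [coef[0] + coef[1] * s + coef[2] * a for s, a in zip(svm_pred, anfis_pred)]

for name, pred in [("SVM", svm_pred), ("ANFIS", anfis_pred), ("committee", combined)]:
    print(f"{name:9s} RMSE = {rmse(target, pred):.2f}  MAE = {mae(target, pred):.2f}")
```

Because the combiner is fitted on the same observations, its in-sample RMSE cannot exceed that of either individual model. A fair comparison would evaluate all three on held-out data.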

In summary, Table 3 provides an overview of the recent literature on the application of predictive BDA in demand forecasting.

Discussions

The data produced in SCs contain a great deal of useful knowledge. Analysis of such massive data can help forecast trends in customer behavior, markets, prices, and so on, which can help organizations better adapt to competitive environments. To forecast demand in an SC in the presence of big data, different predictive BDA algorithms have been used. These algorithms could provide predictive analytics using time-series approaches, auto-regressive methods, and associative forecasting methods [ 10 ]. The demand forecasts from these BDA methods could be integrated with product design attributes as well as with online search traffic mapping to incorporate customer and price information [ 37 , 71 ].

Predictive BDA algorithms

Most of the studies examined developed and applied a specific data-mining algorithm to their case studies. However, very few comparative studies are available in the literature to provide a benchmark for understanding the advantages and disadvantages of these methodologies. Additionally, as depicted in Table 3, there is no clear relationship between the choice of BDA algorithm/method and the application domain or category.

Predictive BDA applicability

Most data-driven models used in the literature consider historical data. Such backward-looking forecasting ignores new trends as well as the highs and lows of different economic environments. Also, organizational factors, such as reputation and marketing strategies, as well as internal risks (related to the availability of SCM resources), could greatly influence demand [ 107 ] and thus contribute to the inaccuracy of BDA-based demand predictions that rely on historical data. Incorporating driving factors outside the historical data, such as economic instability, inflation, and purchasing power, could help adjust the predictions with respect to unseen future demand scenarios. Combining predictive algorithms with optimization or simulation can equip the models with prescriptive capabilities in response to future scenarios and expectations.
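
To illustrate how a forecast can be turned into a decision, the following Python sketch couples demand scenarios with a simple optimization step. It picks the order quantity that maximizes simulated expected profit in a single-period (newsvendor) setting. All figures are hypothetical: the base forecast of 100 units, the downturn and upturn multipliers standing in for external economic drivers, and the price, cost, and salvage values.

```python
import random


def profit(order_qty, demand, price=10.0, cost=6.0, salvage=2.0):
    """Single-period profit: sell up to demand, salvage the leftovers."""
    sold = min(order_qty, demand)
    leftover = order_qty - sold
    return price * sold + salvage * leftover - cost * order_qty


def best_order(scenarios, candidates):
    """Candidate quantity with the highest average profit over scenarios."""
    def expected_profit(q):
        return sum(profit(q, d) for d in scenarios) / len(scenarios)
    return max(candidates, key=expected_profit)


rng = random.Random(42)
# Hypothetical scenario mix: a base forecast of 100 units, scaled by an
# assumed downturn (x0.85), no change, or upturn (x1.10), plus forecast noise.
scenarios = []
for _ in range(2000):
    shift = rng.choices([0.85, 1.0, 1.10], weights=[0.25, 0.5, 0.25])[0]
    scenarios.append(max(0.0, rng.gauss(100 * shift, 10)))

q_star = best_order(scenarios, candidates=range(50, 151))
print("order quantity maximising simulated expected profit:", q_star)
```

With these prices, the margin lost on a missed sale (10 - 6) equals the loss on an unsold unit (6 - 2). The profit-maximizing quantity therefore lies near the median of the simulated demand. Changing the assumed economic scenarios shifts the recommended order accordingly.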

Predictive BDA in closed-loop supply chains (CLSC)

The combination of forward and reverse flows of material in an SC is referred to as a closed-loop supply chain (CLSC). A CLSC is a more complex system than a traditional SC because it consists of the forward and reverse SCs simultaneously [ 108 ]. Economic impact, environmental impact, and social responsibility are three significant factors in designing a CLSC network that includes product recycling, remanufacturing, and refurbishment functions. The complexity of a CLSC, compared to a conventional SC, results from the coordination required between the backward and forward flows. For example, transportation cost, holding cost, and demand forecasting are challenging issues because of uncertainties in the information flows from the forward chain to the reverse one. In addition, uncertainties about the rate of returned products and the efficiencies of recycling, remanufacturing, and refurbishment functions are some of the main barriers to establishing predictions for the reverse flow [ 5 , 6 , 109 ]. As such, one key finding from this literature survey is that CLSCs particularly suffer from a lack of quality data for remanufacturing. Remanufacturing refers to the disassembly, cleaning, inspection, storage, reconditioning, replacement, and reassembly of products. As a result of these data deficiencies, optimal scheduling of remanufacturing functions is cumbersome, owing to uncertainties in the quality and quantity of used products as well as in the timing of returns and delivery delays.
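
The difficulty of predicting reverse flows can be made concrete with a simple distributed-lag view of returns, sketched below in Python. The model, the lag probabilities, and the remanufacturable share are illustrative assumptions, not results from the cited works. Expected returns in the next period are computed as r = sum over k of p_k * s_(t-k), where p_k is the assumed probability that a unit sold k periods earlier comes back. In practice, these return rates and quality yields are exactly the uncertain, data-poor quantities discussed above.

```python
def expected_returns(sales, lag_probs):
    """Expected returns in the period after the last observed sales period.

    sales[-1] is the most recent period; lag_probs[k - 1] is the assumed
    probability that a unit sold k periods earlier is returned now.
    """
    t = len(sales)
    return sum(p * sales[t - k] for k, p in enumerate(lag_probs, start=1) if t - k >= 0)


sales = [100, 120, 90, 110, 130]           # hypothetical forward sales per period
lag_probs = [0.05, 0.10, 0.08, 0.04]       # hypothetical return profile, lags 1..4
quality_yield = 0.6                        # hypothetical share fit for remanufacturing

returns = expected_returns(sales, lag_probs)
print(f"expected returns: {returns:.1f} units, "
      f"remanufacturable: {quality_yield * returns:.1f} units")
```

Estimating `lag_probs` and `quality_yield` from tracked returns, for example with IoT or RFID data, is where data-driven methods could reduce the uncertainty in such a model.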

IoT-based approaches can overcome the difficulties of collecting data in a CLSC. In an IoT environment, objects are monitored and controlled remotely across existing network infrastructures. This enables more direct integration between the physical world and computer-based systems. The results include improved efficiency, accuracy, and economic benefit across SCs [ 50 , 54 , 110 ].

Radio frequency identification (RFID) is another technology that has become very popular in SCs. RFID can be used for automation of processes in an SC, and it is useful for coordination of forecasts in CLSCs with dispersed points of return and varied quantities and qualities of returned used products [ 10 , 111 , 112 , 113 , 114 ].

Conclusions

The growing need for customer behavior analysis and demand forecasting is driven by globalization and increasing market competition, as well as by the surge in supply chain digitization practices. In this study, we performed a thorough review of the applications of predictive big data analytics (BDA) in SC demand forecasting. The survey provided an overview of the BDA methods applied to supply chain demand forecasting and a comparative categorization of them. We collected and analyzed these studies with respect to the methods and techniques used in demand prediction. Seven mainstream techniques were identified and studied with their pros and cons. Neural networks and regression analysis were observed to be the two most frequently employed techniques. The review also pointed to the fact that optimization models or simulation can be used to improve the accuracy of forecasting by formulating and optimizing a cost function for fitting the predictions to data.

One key finding from reviewing the existing literature was that very limited research has been conducted on the applications of BDA in CLSCs and reverse logistics. There are key benefits in adopting a data-driven approach for the design and management of CLSCs. Due to increasing environmental awareness and government incentives, a vast quantity of returned (used) products of various types and conditions is nowadays collected, received, and sorted at many collection points. The resulting uncertainties have a direct impact on the cost-efficiency of remanufacturing processes, the final price of the refurbished products, and the demand for these products [ 115 ]. As such, the design and operation of CLSCs present a case for big data analytics from both supply and demand forecasting perspectives.

Availability of data and materials

The paper presents a review of the literature extracted from main scientific databases without presenting data.

Abbreviations

Adaptive neural fuzzy inference systems

Auto regressive integrated moving average

Artificial neural network

Big data analytics

Backpropagation

Closed-loop supply chain

Extreme learning machine

Enterprise resource planning

Genetic algorithms

Growing hierarchical self-organizing map

Holt-winters

Internet of things

K-nearest-neighbor

Mean absolute deviation

Mean absolute error

Mean absolute percentage error

Mean square error

Root mean square error

Radial basis function

Particle swarm optimization

Self-organizing maps

Stock-keeping unit

Supply chain analytics

Supply chain

Supply chain management

Support vector machine

Support vector regression

Total cost deviation

Theil inequality index

References

You Z, Si Y-W, Zhang D, Zeng X, Leung SCH, Li T. A decision-making framework for precision marketing. Expert Syst Appl. 2015;42(7):3357–67. https://doi.org/10.1016/J.ESWA.2014.12.022 .

Guo ZX, Wong WK, Li M. A multivariate intelligent decision-making model for retail sales forecasting. Decis Support Syst. 2013;55(1):247–55. https://doi.org/10.1016/J.DSS.2013.01.026 .

Wei J-T, Lee M-C, Chen H-K, Wu H-H. Customer relationship management in the hairdressing industry: an application of data mining techniques. Expert Syst Appl. 2013;40(18):7513–8. https://doi.org/10.1016/J.ESWA.2013.07.053 .

Lu LX, Swaminathan JM. Supply chain management. Int Encycl Soc Behav Sci. 2015. https://doi.org/10.1016/B978-0-08-097086-8.73032-7 .

Gholizadeh H, Tajdin A, Javadian N. A closed-loop supply chain robust optimization for disposable appliances. Neural Comput Appl. 2018. https://doi.org/10.1007/s00521-018-3847-9 .

Tosarkani BM, Amin SH. A possibilistic solution to configure a battery closed-loop supply chain: multi-objective approach. Expert Syst Appl. 2018;92:12–26. https://doi.org/10.1016/J.ESWA.2017.09.039 .

Blackburn R, Lurz K, Priese B, Göb R, Darkow IL. A predictive analytics approach for demand forecasting in the process industry. Int Trans Oper Res. 2015;22(3):407–28. https://doi.org/10.1111/itor.12122 .

Boulaksil Y. Safety stock placement in supply chains with demand forecast updates. Oper Res Perspect. 2016;3:27–31. https://doi.org/10.1016/J.ORP.2016.07.001 .

Tang CS. Perspectives in supply chain risk management. Int J Prod Econ. 2006;103(2):451–88. https://doi.org/10.1016/J.IJPE.2005.12.006 .

Wang G, Gunasekaran A, Ngai EWT, Papadopoulos T. Big data analytics in logistics and supply chain management: certain investigations for research and applications. Int J Prod Econ. 2016;176:98–110. https://doi.org/10.1016/J.IJPE.2016.03.014 .

Awwad M, Kulkarni P, Bapna R, Marathe A. Big data analytics in supply chain: a literature review. In: Proceedings of the international conference on industrial engineering and operations management, 2018(SEP); 2018, p. 418–25.

Büyüközkan G, Göçer F. Digital Supply Chain: literature review and a proposed framework for future research. Comput Ind. 2018;97:157–77.

Kshetri N. Blockchain’s roles in meeting key supply chain management objectives. Int J Inf Manage. 2018;39:80–9.

Michna Z, Disney SM, Nielsen P. The impact of stochastic lead times on the bullwhip effect under correlated demand and moving average forecasts. Omega. 2019. https://doi.org/10.1016/J.OMEGA.2019.02.002 .

Zhu Y, Zhao Y, Zhang J, Geng N, Huang D. Spring onion seed demand forecasting using a hybrid Holt-Winters and support vector machine model. PLoS ONE. 2019;14(7):e0219889. https://doi.org/10.1371/journal.pone.0219889 .

Govindan K, Cheng TCE, Mishra N, Shukla N. Big data analytics and application for logistics and supply chain management. Transport Res Part E Logist Transport Rev. 2018;114:343–9. https://doi.org/10.1016/J.TRE.2018.03.011 .

Bohanec M, Kljajić Borštnar M, Robnik-Šikonja M. Explaining machine learning models in sales predictions. Expert Syst Appl. 2017;71:416–28. https://doi.org/10.1016/J.ESWA.2016.11.010 .

Constante F, Silva F, Pereira A. DataCo smart supply chain for big data analysis. Mendeley Data. 2019. https://doi.org/10.17632/8gx2fvg2k6.5 .

Huber J, Gossmann A, Stuckenschmidt H. Cluster-based hierarchical demand forecasting for perishable goods. Expert Syst Appl. 2017;76:140–51. https://doi.org/10.1016/J.ESWA.2017.01.022 .

Ali MM, Babai MZ, Boylan JE, Syntetos AA. Supply chain forecasting when information is not shared. Eur J Oper Res. 2017;260(3):984–94. https://doi.org/10.1016/J.EJOR.2016.11.046 .

Bian W, Shang J, Zhang J. Two-way information sharing under supply chain competition. Int J Prod Econ. 2016;178:82–94. https://doi.org/10.1016/J.IJPE.2016.04.025 .

Mourtzis D. Challenges and future perspectives for the life cycle of manufacturing networks in the mass customisation era. Logist Res. 2016;9(1):2.

Nguyen T, Zhou L, Spiegler V, Ieromonachou P, Lin Y. Big data analytics in supply chain management: a state-of-the-art literature review. Comput Oper Res. 2018;98:254–64. https://doi.org/10.1016/J.COR.2017.07.004 .

Choi Y, Lee H, Irani Z. Big data-driven fuzzy cognitive map for prioritising IT service procurement in the public sector. Ann Oper Res. 2018;270(1–2):75–104. https://doi.org/10.1007/s10479-016-2281-6 .

Huang YY, Handfield RB. Measuring the benefits of erp on supply management maturity model: a “big data” method. Int J Oper Prod Manage. 2015;35(1):2–25. https://doi.org/10.1108/IJOPM-07-2013-0341 .

Miroslav M, Miloš M, Velimir Š, Božo D, Đorđe L. Semantic technologies on the mission: preventing corruption in public procurement. Comput Ind. 2014;65(5):878–90. https://doi.org/10.1016/J.COMPIND.2014.02.003 .

Zhang Y, Ren S, Liu Y, Si S. A big data analytics architecture for cleaner manufacturing and maintenance processes of complex products. J Clean Prod. 2017;142:626–41. https://doi.org/10.1016/J.JCLEPRO.2016.07.123 .

Shu Y, Ming L, Cheng F, Zhang Z, Zhao J. Abnormal situation management: challenges and opportunities in the big data era. Comput Chem Eng. 2016;91:104–13. https://doi.org/10.1016/J.COMPCHEMENG.2016.04.011 .

Krumeich J, Werth D, Loos P. Prescriptive control of business processes: new potentials through predictive analytics of big data in the process manufacturing industry. Bus Inform Syst Eng. 2016;58(4):261–80. https://doi.org/10.1007/s12599-015-0412-2 .

Guo SY, Ding LY, Luo HB, Jiang XY. A Big-Data-based platform of workers’ behavior: observations from the field. Accid Anal Prev. 2016;93:299–309. https://doi.org/10.1016/J.AAP.2015.09.024 .

Chuang Y-F, Chia S-H, Wong J-Y. Enhancing order-picking efficiency through data mining and assignment approaches. WSEAS Trans Bus Econ. 2014;11(1):52–64.

Ballestín F, Pérez Á, Lino P, Quintanilla S, Valls V. Static and dynamic policies with RFID for the scheduling of retrieval and storage warehouse operations. Comput Ind Eng. 2013;66(4):696–709. https://doi.org/10.1016/J.CIE.2013.09.020 .

Alyahya S, Wang Q, Bennett N. Application and integration of an RFID-enabled warehousing management system—a feasibility study. J Ind Inform Integr. 2016;4:15–25. https://doi.org/10.1016/J.JII.2016.08.001 .

Cui J, Liu F, Hu J, Janssens D, Wets G, Cools M. Identifying mismatch between urban travel demand and transport network services using GPS data: a case study in the fast growing Chinese city of Harbin. Neurocomputing. 2016;181:4–18. https://doi.org/10.1016/J.NEUCOM.2015.08.100 .

Shan Z, Zhu Q. Camera location for real-time traffic state estimation in urban road network using big GPS data. Neurocomputing. 2015;169:134–43. https://doi.org/10.1016/J.NEUCOM.2014.11.093 .

Ting SL, Tse YK, Ho GTS, Chung SH, Pang G. Mining logistics data to assure the quality in a sustainable food supply chain: a case in the red wine industry. Int J Prod Econ. 2014;152:200–9. https://doi.org/10.1016/J.IJPE.2013.12.010 .

Jun S-P, Park D-H, Yeom J. The possibility of using search traffic information to explore consumer product attitudes and forecast consumer preference. Technol Forecast Soc Chang. 2014;86:237–53. https://doi.org/10.1016/J.TECHFORE.2013.10.021 .

He W, Wu H, Yan G, Akula V, Shen J. A novel social media competitive analytics framework with sentiment benchmarks. Inform Manage. 2015;52(7):801–12. https://doi.org/10.1016/J.IM.2015.04.006 .

Marine-Roig E, Anton Clavé S. Tourism analytics with massive user-generated content: a case study of Barcelona. J Destination Market Manage. 2015;4(3):162–72. https://doi.org/10.1016/J.JDMM.2015.06.004 .

Carbonneau R, Laframboise K, Vahidov R. Application of machine learning techniques for supply chain demand forecasting. Eur J Oper Res. 2008;184(3):1140–54. https://doi.org/10.1016/J.EJOR.2006.12.004 .

Munir K. Cloud computing and big data: technologies, applications and security, vol. 49. Berlin: Springer; 2019.

Rostami-Tabar B, Babai MZ, Ali M, Boylan JE. The impact of temporal aggregation on supply chains with ARMA(1,1) demand processes. Eur J Oper Res. 2019;273(3):920–32. https://doi.org/10.1016/J.EJOR.2018.09.010 .

Beyer MA, Laney D. The importance of ‘big data’: a definition. Stamford: Gartner; 2012. p. 2014–8.

Benabdellah AC, Benghabrit A, Bouhaddou I, Zemmouri EM. Big data for supply chain management: opportunities and challenges. In: Proceedings of IEEE/ACS international conference on computer systems and applications, AICCSA, no. 11, p. 20–26; 2016. https://doi.org/10.1109/AICCSA.2016.7945828 .

Kumar M. Applied big data analytics in operations management. Appl Big Data Anal Oper Manage. 2016. https://doi.org/10.4018/978-1-5225-0886-1 .

Zhong RY, Huang GQ, Lan S, Dai QY, Chen X, Zhang T. A big data approach for logistics trajectory discovery from RFID-enabled production data. Int J Prod Econ. 2015;165:260–72. https://doi.org/10.1016/J.IJPE.2015.02.014 .

Varela IR, Tjahjono B. Big data analytics in supply chain management: trends and related research. In: 6th international conference on operations and supply chain management, vol. 1, no. 1, p. 2013–4; 2014. https://doi.org/10.13140/RG.2.1.4935.2563 .

Han J, Kamber M, Pei J. Data mining: concepts and techniques. Burlington: Morgan Kaufmann Publishers; 2013. https://doi.org/10.1016/B978-0-12-381479-1.00001-0 .

Arunachalam D, Kumar N. Benefit-based consumer segmentation and performance evaluation of clustering approaches: an evidence of data-driven decision-making. Expert Syst Appl. 2018;111:11–34. https://doi.org/10.1016/J.ESWA.2018.03.007 .

Chase CW. Next generation demand management: people, process, analytics, and technology. Hoboken: Wiley; 2016.

SAS Institute. Demand-driven forecasting and planning: take responsiveness to the next level. 13; 2014. https://www.sas.com/content/dam/SAS/en_us/doc/whitepaper2/demand-driven-forecasting-planning-107477.pdf .

Acar Y, Gardner ES. Forecasting method selection in a global supply chain. Int J Forecast. 2012;28(4):842–8. https://doi.org/10.1016/J.IJFORECAST.2011.11.003 .

Ma S, Fildes R, Huang T. Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information. Eur J Oper Res. 2016;249(1):245–57. https://doi.org/10.1016/J.EJOR.2015.08.029 .

Addo-Tenkorang R, Helo PT. Big data applications in operations/supply-chain management: a literature review. Comput Ind Eng. 2016;101:528–43. https://doi.org/10.1016/J.CIE.2016.09.023 .

Agrawal S, Singh RK, Murtaza Q. A literature review and perspectives in reverse logistics. Resour Conserv Recycl. 2015;97:76–92. https://doi.org/10.1016/J.RESCONREC.2015.02.009 .

Gunasekaran A, Kumar Tiwari M, Dubey R, Fosso Wamba S. Big data and predictive analytics applications in supply chain management. Comput Ind Eng. 2016;101:525–7. https://doi.org/10.1016/J.CIE.2016.10.020 .

Hazen BT, Skipper JB, Ezell JD, Boone CA. Big data and predictive analytics for supply chain sustainability: a theory-driven research agenda. Comput Ind Eng. 2016;101:592–8. https://doi.org/10.1016/J.CIE.2016.06.030 .

Hofmann E, Rutschmann E. Big data analytics and demand forecasting in supply chains: a conceptual analysis. Int J Logist Manage. 2018;29(2):739–66. https://doi.org/10.1108/IJLM-04-2017-0088 .

Jain A, Sanders NR. Forecasting sales in the supply chain: consumer analytics in the big data era. Int J Forecast. 2019;35(1):170–80. https://doi.org/10.1016/J.IJFORECAST.2018.09.003 .

Jin J, Liu Y, Ji P, Kwong CK. Review on recent advances in information mining from big consumer opinion data for product design. J Comput Inf Sci Eng. 2018;19(1):010801. https://doi.org/10.1115/1.4041087 .

Kumar R, Mahto D. Industrial forecasting support systems and technologies in practice: a review. Glob J Res Eng. 2013;13(4):17–33.

Mishra D, Gunasekaran A, Papadopoulos T, Childe SJ. Big Data and supply chain management: a review and bibliometric analysis. Ann Oper Res. 2016;270(1):313–36. https://doi.org/10.1007/s10479-016-2236-y .

Ren S, Zhang Y, Liu Y, Sakao T, Huisingh D, Almeida CMVB. A comprehensive review of big data analytics throughout product lifecycle to support sustainable smart manufacturing: a framework, challenges and future research directions. J Clean Prod. 2019;210:1343–65. https://doi.org/10.1016/J.JCLEPRO.2018.11.025 .

Singh Jain AD, Mehta I, Mitra J, Agrawal S. Application of big data in supply chain management. Mater Today Proc. 2017;4(2):1106–15. https://doi.org/10.1016/J.MATPR.2017.01.126 .

Souza GC. Supply chain analytics. Bus Horiz. 2014;57(5):595–605. https://doi.org/10.1016/J.BUSHOR.2014.06.004 .

Tiwari S, Wee HM, Daryanto Y. Big data analytics in supply chain management between 2010 and 2016: insights to industries. Comput Ind Eng. 2018;115:319–30. https://doi.org/10.1016/J.CIE.2017.11.017 .

Zhong RY, Newman ST, Huang GQ, Lan S. Big Data for supply chain management in the service and manufacturing sectors: challenges, opportunities, and future perspectives. Comput Ind Eng. 2016;101:572–91. https://doi.org/10.1016/J.CIE.2016.07.013 .

Ramanathan U, Subramanian N, Parrott G. Role of social media in retail network operations and marketing to enhance customer satisfaction. Int J Oper Prod Manage. 2017;37(1):105–23. https://doi.org/10.1108/IJOPM-03-2015-0153 .

Coursera. Supply chain planning. Coursera E-Learning; 2019. https://www.coursera.org/learn/planning .

Villegas MA, Pedregal DJ. Supply chain decision support systems based on a novel hierarchical forecasting approach. Decis Support Syst. 2018;114:29–36. https://doi.org/10.1016/J.DSS.2018.08.003 .

Ma J, Kwak M, Kim HM. Demand trend mining for predictive life cycle design. J Clean Prod. 2014;68:189–99. https://doi.org/10.1016/J.JCLEPRO.2014.01.026 .

Hamiche K, Abouaïssa H, Goncalves G, Hsu T. A robust and easy approach for demand forecasting in supply chains. IFAC-PapersOnLine. 2018;51(11):1732–7. https://doi.org/10.1016/J.IFACOL.2018.08.206 .

Da Veiga CP, Da Veiga CRP, Catapan A, Tortato U, Da Silva WV. Demand forecasting in food retail: a comparison between the Holt-Winters and ARIMA models. WSEAS Trans Bus Econ. 2014;11(1):608–14.

Murray PW, Agard B, Barajas MA. Forecasting supply chain demand by clustering customers. IFAC-PapersOnLine. 2015;48(3):1834–9. https://doi.org/10.1016/J.IFACOL.2015.06.353 .

Ramos P, Santos N, Rebelo R. Performance of state space and ARIMA models for consumer retail sales forecasting. Robot Comput Integr Manuf. 2015;34:151–63. https://doi.org/10.1016/J.RCIM.2014.12.015 .

Schaer O, Kourentzes N. Demand forecasting with user-generated online information. Int J Forecast. 2019;35(1):197–212. https://doi.org/10.1016/J.IJFORECAST.2018.03.005 .

Pang Y, Yao B, Zhou X, Zhang Y, Xu Y, Tan Z. Hierarchical electricity time series forecasting for integrating consumption patterns analysis and aggregation consistency. In: IJCAI international joint conference on artificial intelligence; 2018, p. 3506–12.

Goyal R, Chandra P, Singh Y. Suitability of KNN regression in the development of interaction based software fault prediction models. IERI Procedia. 2014;6:15–21. https://doi.org/10.1016/J.IERI.2014.03.004 .

Runkler TA. Data analytics: models and algorithms for intelligent data analysis. Wiesbaden: Springer Fachmedien Wiesbaden; 2016. https://doi.org/10.1007/978-3-658-14075-5 .

Nikolopoulos KI, Babai MZ, Bozos K. Forecasting supply chain sporadic demand with nearest neighbor approaches. Int J Prod Econ. 2016;177:139–48. https://doi.org/10.1016/j.ijpe.2016.04.013 .

Gaur M, Goel S, Jain E. Comparison between nearest Neighbours and Bayesian network for demand forecasting in supply chain management. In: 2015 international conference on computing for sustainable global development, INDIACom 2015, May; 2015, p. 1433–6.

Burney SMA, Ali SM, Burney S. A survey of soft computing applications for decision making in supply chain management. In: 2017 IEEE 3rd international conference on engineering technologies and social sciences, ICETSS 2017, 2018, p. 1–6. https://doi.org/10.1109/ICETSS.2017.8324158 .

González Perea R, Camacho Poyato E, Montesinos P, Rodríguez Díaz JA. Optimisation of water demand forecasting by artificial intelligence with short data sets. Biosyst Eng. 2019;177:59–66. https://doi.org/10.1016/J.BIOSYSTEMSENG.2018.03.011 .

Vhatkar S, Dias J. Oral-care goods sales forecasting using artificial neural network model. Procedia Comput Sci. 2016;79:238–43. https://doi.org/10.1016/J.PROCS.2016.03.031 .

Wong WK, Guo ZX. A hybrid intelligent model for medium-term sales forecasting in fashion retail supply chains using extreme learning machine and harmony search algorithm. Int J Prod Econ. 2010;128(2):614–24. https://doi.org/10.1016/J.IJPE.2010.07.008 .

Liu C, Shu T, Chen S, Wang S, Lai KK, Gan L. An improved grey neural network model for predicting transportation disruptions. Expert Syst Appl. 2016;45:331–40. https://doi.org/10.1016/J.ESWA.2015.09.052 .

Yuan WJ, Chen JH, Cao JJ, Jin ZY. Forecast of logistics demand based on grey deep neural network model. Proc Int Conf Mach Learn Cybern. 2018;1:251–6. https://doi.org/10.1109/ICMLC.2018.8527006 .

Amirkolaii KN, Baboli A, Shahzad MK, Tonadre R. Demand forecasting for irregular demands in business aircraft spare parts supply chains by using artificial intelligence (AI). IFAC-PapersOnLine. 2017;50(1):15221–6. https://doi.org/10.1016/J.IFACOL.2017.08.2371 .

Huang L, Xie G, Li D, Zou C. Predicting and analyzing e-logistics demand in urban and rural areas: an empirical approach on historical data of China. Int J Performabil Eng. 2018;14(7):1550–9. https://doi.org/10.23940/ijpe.18.07.p19.15501559 .

Saha C, Lam SS, Boldrin W. Demand forecasting for server manufacturing using neural networks. In: Proceedings of the 2014 industrial and systems engineering research conference, June 2014; 2015.

Chang P-C, Wang Y-W, Tsai C-Y. Evolving neural network for printed circuit board sales forecasting. Expert Syst Appl. 2005;29(1):83–92. https://doi.org/10.1016/J.ESWA.2005.01.012 .

Merkuryeva G, Valberga A, Smirnov A. Demand forecasting in pharmaceutical supply chains: a case study. Procedia Comput Sci. 2019;149:3–10. https://doi.org/10.1016/J.PROCS.2019.01.100 .

Yang CL, Sutrisno H. Short-term sales forecast of perishable goods for franchise business. In: 2018 10th international conference on knowledge and smart technology: cybernetics in the next decades, KST 2018, p. 101–5; 2018. https://doi.org/10.1109/KST.2018.8426091 .

Villegas MA, Pedregal DJ, Trapero JR. A support vector machine for model selection in demand forecasting applications. Comput Ind Eng. 2018;121:1–7. https://doi.org/10.1016/J.CIE.2018.04.042 .

Wu Q. The hybrid forecasting model based on chaotic mapping, genetic algorithm and support vector machine. Expert Syst Appl. 2010;37(2):1776–83. https://doi.org/10.1016/J.ESWA.2009.07.054 .

Guanghui W. Demand forecasting of supply chain based on support vector regression method. Procedia Eng. 2012;29:280–4. https://doi.org/10.1016/J.PROENG.2011.12.707 .

Sarhani M, El Afia A. Intelligent system based support vector regression for supply chain demand forecasting. In: 2014 2nd world conference on complex systems, WCCS 2014; 2015, p. 79–83. https://doi.org/10.1109/ICoCS.2014.7060941 .

Chen IF, Lu CJ. Sales forecasting by combining clustering and machine-learning techniques for computer retailing. Neural Comput Appl. 2017;28(9):2633–47. https://doi.org/10.1007/s00521-016-2215-x .

Fasli M, Kovalchuk Y. Learning approaches for developing successful seller strategies in dynamic supply chain management. Inf Sci. 2011;181(16):3411–26. https://doi.org/10.1016/J.INS.2011.04.014 .

Islek I, Oguducu SG. A retail demand forecasting model based on data mining techniques. In: IEEE international symposium on industrial electronics; 2015, p. 55–60. https://doi.org/10.1109/ISIE.2015.7281443 .

Kilimci ZH, Akyuz AO, Uysal M, Akyokus S, Uysal MO, Atak Bulbul B, Ekmis MA. An improved demand forecasting model using deep learning approach and proposed decision integration strategy for supply chain. Complexity. 2019;2019:1–15. https://doi.org/10.1155/2019/9067367 .

Loureiro ALD, Miguéis VL, da Silva LFM. Exploring the use of deep neural networks for sales forecasting in fashion retail. Decis Support Syst. 2018;114:81–93. https://doi.org/10.1016/J.DSS.2018.08.010 .

Punam K, Pamula R, Jain PK. A two-level statistical model for big mart sales prediction. In: 2018 international conference on computing, power and communication technologies, GUCON 2018; 2019. https://doi.org/10.1109/GUCON.2018.8675060 .

Puspita PE, İnkaya T, Akansel M. Clustering-based Sales Forecasting in a Forklift Distributor. In: Uluslararası Muhendislik Arastirma ve Gelistirme Dergisi, 1–17; 2019. https://doi.org/10.29137/umagd.473977 .

Thomassey S. Sales forecasts in clothing industry: the key success factor of the supply chain management. Int J Prod Econ. 2010;128(2):470–83. https://doi.org/10.1016/J.IJPE.2010.07.018 .

Brentan BM, Ribeiro L, Izquierdo J, Ambrosio JK, Luvizotto E, Herrera M. Committee machines for hourly water demand forecasting in water supply systems. Math Probl Eng. 2019;2019:1–11. https://doi.org/10.1155/2019/9765468 .

Mafakheri F, Breton M, Chauhan S. Project-to-organization matching: an integrated risk assessment approach. Int J IT Project Manage. 2012;3(3):45–59. https://doi.org/10.4018/jitpm.2012070104 .

Mafakheri F, Nasiri F. Revenue sharing coordination in reverse logistics. J Clean Prod. 2013;59:185–96. https://doi.org/10.1016/J.JCLEPRO.2013.06.031 .

Bogataj M. Closed Loop Supply Chain (CLSC): economics, modelling, management and control. Int J Prod Econ. 2017;183:319–21. https://doi.org/10.1016/J.IJPE.2016.11.020 .

Hopkins J, Hawking P. Big Data Analytics and IoT in logistics: a case study. Int J Logist Manage. 2018;29(2):575–91. https://doi.org/10.1108/IJLM-05-2017-0109 .

de Oliveira CM, Soares PJSR, Morales G, Arica J, Matias IO. RFID and its applications on supply chain in Brazil: a structured literature review (2006–2016). Espacios. 2017;38(31). https://www.scopus.com/inward/record.uri?eid=2-s2.0-85021922345&partnerID=40&md5=f062191611541391ded4cdb73eea55cb .

Griva A, Bardaki C, Pramatari K, Papakiriakopoulos D. Retail business analytics: customer visit segmentation using market basket data. Expert Syst Appl. 2018;100:1–16. https://doi.org/10.1016/J.ESWA.2018.01.029 .

Lee CKM, Ho W, Ho GTS, Lau HCW. Design and development of logistics workflow systems for demand management with RFID. Expert Syst Appl. 2011;38(5):5428–37. https://doi.org/10.1016/J.ESWA.2010.10.012 .

Mohebi E, Marquez L. Application of machine learning and RFID in the stability optimization of perishable foods; 2008.

Jiao Z, Ran L, Zhang Y, Li Z, Zhang W. Data-driven approaches to integrated closed-loop sustainable supply chain design under multi-uncertainties. J Clean Prod. 2018;185:105–27.

Levis AA, Papageorgiou LG. Customer demand forecasting via support vector regression analysis. Chem Eng Res Des. 2005;83(8):1009–18. https://doi.org/10.1205/CHERD.04246 .

Chi H-M, Ersoy OK, Moskowitz H, Ward J. Modeling and optimizing a vendor managed replenishment system using machine learning and genetic algorithms. Eur J Oper Res. 2007;180(1):174–93. https://doi.org/10.1016/J.EJOR.2006.03.040 .

Sun Z-L, Choi T-M, Au K-F, Yu Y. Sales forecasting using extreme learning machine with applications in fashion retailing. Decis Support Syst. 2008;46(1):411–9. https://doi.org/10.1016/J.DSS.2008.07.009 .

Efendigil T, Önüt S, Kahraman C. A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis. Expert Syst Appl. 2009;36(3):6697–707. https://doi.org/10.1016/J.ESWA.2008.08.058 .

Lee CC, Ou-Yang C. A neural networks approach for forecasting the supplier’s bid prices in supplier selection negotiation process. Expert Syst Appl. 2009;36(2):2961–70. https://doi.org/10.1016/J.ESWA.2008.01.063 .

Chen F-L, Chen Y-C, Kuo J-Y. Applying Moving back-propagation neural network and Moving fuzzy-neuron network to predict the requirement of critical spare parts. Expert Syst Appl. 2010;37(9):6695–704. https://doi.org/10.1016/J.ESWA.2010.04.037 .

Wu Q. Product demand forecasts using wavelet kernel support vector machine and particle swarm optimization in manufacture system. J Comput Appl Math. 2010;233(10):2481–91. https://doi.org/10.1016/J.CAM.2009.10.030 .

Babai MZ, Ali MM, Boylan JE, Syntetos AA. Forecasting and inventory performance in a two-stage supply chain with ARIMA(0,1,1) demand: theory and empirical analysis. Int J Prod Econ. 2013;143(2):463–71. https://doi.org/10.1016/J.IJPE.2011.09.004 .

Kourentzes N. Intermittent demand forecasts with neural networks. Int J Prod Econ. 2013;143(1):198–206. https://doi.org/10.1016/J.IJPE.2013.01.009 .

Lau HCW, Ho GTS, Zhao Y. A demand forecast model using a combination of surrogate data analysis and optimal neural network approach. Decis Support Syst. 2013;54(3):1404–16. https://doi.org/10.1016/J.DSS.2012.12.008 .

Arunraj NS, Ahrens D. A hybrid seasonal autoregressive integrated moving average and quantile regression for daily food sales forecasting. Int J Prod Econ. 2015;170:321–35. https://doi.org/10.1016/J.IJPE.2015.09.039 .

Di Pillo G, Latorre V, Lucidi S, Procacci E. An application of support vector machines to sales forecasting under promotions. 4OR. 2016. https://doi.org/10.1007/s10288-016-0316-0 .

da Veiga CP, da Veiga CRP, Puchalski W, dos Coelho LS, Tortato U. Demand forecasting based on natural computing approaches applied to the foodstuff retail segment. J Retail Consumer Serv. 2016;31:174–81. https://doi.org/10.1016/J.JRETCONSER.2016.03.008 .

Chawla A, Singh A, Lamba A, Gangwani N, Soni U. Demand forecasting using artificial neural networks—a case study of American retail corporation. In: Applications of artificial intelligence techniques in wind power generation. Integrated Computer-Aided Engineering; 2018, p. 79–90. https://doi.org/10.3233/ica-2001-8305 .

Pereira MM, Machado RL, Ignacio Pires SR, Pereira Dantas MJ, Zaluski PR, Frazzon EM. Forecasting scrap tires returns in closed-loop supply chains in Brazil. J Clean Prod. 2018;188:741–50. https://doi.org/10.1016/J.JCLEPRO.2018.04.026 .

Fanoodi B, Malmir B, Jahantigh FF. Reducing demand uncertainty in the platelet supply chain through artificial neural networks and ARIMA models. Comput Biol Med. 2019;113:103415. https://doi.org/10.1016/J.COMPBIOMED.2019.103415 .

Sharma R, Singhal P. Demand forecasting of engine oil for automotive and industrial lubricant manufacturing company using neural network. Mater Today Proc. 2019;18:2308–14. https://doi.org/10.1016/J.MATPR.2019.07.013 .

Tanizaki T, Hoshino T, Shimmura T, Takenaka T. Demand forecasting in restaurants using machine learning and statistical analysis. Procedia CIRP. 2019;79:679–83. https://doi.org/10.1016/J.PROCIR.2019.02.042 .

Wang C-H, Chen J-Y. Demand forecasting and financial estimation considering the interactive dynamics of semiconductor supply-chain companies. Comput Ind Eng. 2019;138:106104. https://doi.org/10.1016/J.CIE.2019.106104 .


Acknowledgements

The authors are very thankful to the anonymous reviewers, whose comments and suggestions were very helpful in improving the quality of the manuscript.

Author information

Authors and affiliations

Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, H3G 1M8, Canada

Mahya Seyedan & Fereshteh Mafakheri


Contributions

The authors contributed equally to the writing of the paper. The first author conducted the literature search. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Fereshteh Mafakheri .

Ethics declarations

Ethics approval

Not applicable.

Competing interests

The authors declare no competing or conflicting interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Seyedan, M., Mafakheri, F. Predictive big data analytics for supply chain demand forecasting: methods, applications, and research opportunities. J Big Data 7 , 53 (2020). https://doi.org/10.1186/s40537-020-00329-2


Received : 05 April 2020

Accepted : 17 July 2020

Published : 25 July 2020

DOI : https://doi.org/10.1186/s40537-020-00329-2


  • Demand forecasting
  • Closed-loop supply chains
  • Machine-learning


IEEE/CAA Journal of Automatica Sinica


Big Data Analytics in Healthcare — A Systematic Literature Review and Roadmap for Practical Implementation

DOI: 10.1109/jas.2020.1003384

  • Sohail Imran
  • Tariq Mahmood
  • Ahsan Morshed
  • Timos Sellis

Sohail Imran is an Assistant Professor and a doctoral candidate at the PAF-Karachi Institute of Economics and Technology, Pakistan. He has more than 15 years of teaching experience in databases, data science, and big data analytics, and more than 10 years of training experience in databases (SQL and NoSQL), big data infrastructure, and data science for various institutes, universities, and the corporate sector. His research focuses on mapping OLAP data warehousing schemas into the distributed Hadoop environment. Specifically, he has developed a framework that creates dimension and fact tables over HBase and Hive in a schema-less NoSQL manner and computes aggregates through SQL-over-Hadoop technologies (Presto, Drill, Spark SQL). This functionality is made scalable through containerization and more efficient through the use of Apache Spark.

Tariq Mahmood is an Associate Professor at the Faculty of Computer Science, Institute of Business Administration (IBA), Pakistan. He received the Ph.D. degree in machine learning from the University of Trento, Italy, and the M.S. degree in statistical machine learning from Université Pierre et Marie Curie (Paris 6), France. He has published around 20 international journal papers and 35 conference papers, with a total of 691 citations and an h-index of 12 (Google Scholar). His research interests include BDA, deep learning, and machine learning/data science. He heads the Big Data Analytics Laboratory at IBA, which focuses on providing data science and big data certifications to students and industry professionals, implementing BDA-related industrial projects, and researching the BDA technology stack, particularly the development of BDA architectures for different types of streaming and non-streaming data. He also consults for various local industries on business intelligence, data governance, BDA, and machine learning.

Ahsan Morshed is a Lecturer in ICT at CQ University, Australia. Previously, he was a Research Fellow in Data Analytics at Swinburne University of Technology and a Senior Project Officer at RMIT University. He was also a Postdoctoral Fellow at CSIRO (Australia) working on sensor data integration and machine learning, and an Information Management Specialist in the OEKC division of the Food and Agriculture Organization (FAO) of the UN in Rome, Italy. During his time at FAO, he acquired extensive skills in metadata standards, knowledge organization systems, ontologies, Linked Open Data management, and information management tools. His research interests include big data, data science, the semantic web, linked open data, and semantic machine learning. He holds a Ph.D. degree from the University of Trento, Italy. Dr. Morshed has 50 peer-reviewed publications (books, book chapters, journal, conference, and workshop papers), with 229 citations and an h-index of 6 (Google Scholar).

Timos Sellis (F’09) is a Professor at Swinburne University of Technology, Australia. He holds a diploma from the National Technical University of Athens (NTUA), Greece, the M.Sc. degree from Harvard University, USA, and the Ph.D. degree from the University of California at Berkeley, USA. Timos has a significant international research reputation in big data, data analytics, data integration, and spatiotemporal database systems. He is a Fellow of the Association for Computing Machinery (ACM) for his contributions to database query optimisation, spatial data management, and data warehousing, and an Institute of Electrical and Electronics Engineers (IEEE) Fellow for his contributions to database query optimisation and spatial data management. In 2018 he was awarded the IEEE TCDE Impact Award in recognition of his impact in the field and for contributions to database systems research and broadening the reach of data engineering research. Before joining Swinburne, Timos was the Director of the Institute for Management of Information Systems and a Professor at the National Technical University of Athens. He has also held the role of Director, Big Data Lab at RMIT University.

  • Corresponding author: T. Mahmood is with the Faculty of Computer Science, Institute of Business Administration, Karachi 75270, Pakistan (e-mail: [email protected] )
  • 1 https://neo4j.com
  • 2 http://www.hl7.org/implement/standards/fhir/
  • 3 A group of graduate students participated in this activity over a period of 3 months. For the sake of brevity, the details are outside the scope of this paper.
  • 4 To the best of our knowledge, this list is complete as of June 2020.
  • 5 A detailed discussion of the nine compared papers is outside the scope of this work; we invite the reader to go through these papers for more required information.
  • Revised Date: 2020-07-21
  • Accepted Date: 2020-07-22
  • Big data analytics (BDA)
  • big data architecture
  • healthcare
  • NoSQL data stores
  • patient care
  • roadmap
  • systematic literature review



Figures (13) / Tables (5)

  • The most thorough systematic literature review on big data analytics applications to healthcare
  • Focus on healthcare applications for NoSQL databases and Apache Hadoop ecosystem
  • Proposes the first-ever Zeta architecture called Med-BDA for big healthcare data analytics
  • Med-BDA has the potential to solve ALL current limitations for big healthcare data analytics
  • We present business strategies to successfully implement Med-BDA in any clinical organization

  • Figure 1. Year-wise distribution of selected 99 articles
  • Figure 2. Digital source distribution for six basic search queries
  • Figure 3. Digital source distribution for six basic search queries + healthcare (HC)
  • Figure 4. Digital source distribution for six basic search queries + healthcare analytics (HA)
  • Figure 5. Hadoop components and ecosystem
  • Figure 6. Data generators for an HIMS
  • Figure 7. The 4 V’s of big data identified in the healthcare research literature
  • Figure 8. The Challenges in Application of Big Data Analytics to Healthcare
  • Figure 9. A snapshot of key-value store from healthcare domain
  • Figure 10. A snapshot of columnar store from healthcare domain
  • Figure 11. A snapshot of a document store from healthcare domain
  • Figure 12. A snapshot of a graph store from healthcare domain
  • Figure 13. Med-BDA: A state-of-the-art BDA architecture for healthcare


  • Conference proceedings
  • © 2022

Contemporary Issues in Communication, Cloud and Big Data Analytics

Proceedings of CCB 2020

  • Hiren Kumar Deva Sarma
  • Valentina Emilia Balas   ORCID: https://orcid.org/0000-0003-0885-1283
  • Bhaskar Bhuyan
  • Nitul Dutta

Department of Information Technology, Sikkim Manipal Institute of Technology, Majitar, India


Department of Automatics and Applied Software, Aurel Vlaicu University of Arad, Arad, Romania

Department of Computer Science and Engineering, Marwadi University, Rajkot, India.

  • Presents research works in the field of communication, cloud and big data
  • Provides original works presented at CCB 2020 held in Sikkim, India
  • Serves as a reference for researchers and practitioners in academia and industry

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 281)

29k Accesses

50 Citations


About this book


Table of contents (38 papers)

Communication

Reliable Data Delivery in Software-Defined Networking: A Survey

  • Prerna Rai, Hiren Kumar Deva Sarma

Phishing Websites, Detection and Analysis: A Survey

  • Leena I. Sakri, Pushpalatha S. Nikkam, Madhuri Kulkarni, Priyanka Kamath, Shreedevi Subrahmanya Bhat, Swati Kamat

Analysis of Security Attacks in SDN Network: A Comprehensive Survey

  • Ali Nadim Alhaj, Nitul Dutta

An Overview of 51% Attack Over Bitcoin Network

  • Raja Siddharth Raju, Sandeep Gurung, Prativa Rai

An IPS Approach to Secure V-RSU Communication from Blackhole and Wormhole Attacks in VANET

  • Gaurav Soni, Kamlesh Chandravanshi, Mahendra Ku. Jhariya, Arjun Rajput

BER Analysis of FBMC for 5G Communication

  • Balwant Singh, Malay Ranjan Tripathy, Rishi Asthana

Impact of TCP-SYN Flood Attack in Cloud

  • Anurag Sharma, Md. Ruhul Islam, Dhruba Ningombam

An Efficient Cooperative Caching with Request Forwarding Strategy in Information-Centric Networking

  • Krishna Delvadia, Nitul Dutta

Instabilities of Consensus

  • Priya Ranjan

Delay-Based Approach for Prevention of Rushing Attack in MANETs

  • Ashwin Adarsh, Tshering Lhamu Tamang, Payash Pradhan, Vikash Kumar Singh, Biswaraj Sen, Kalpana Sharma

ASCTWNDN: A Simple Caching Tool for Wireless Named Data Networking

  • Dependra Dhakal, Mohit Rathor, Sudipta Dey, Prantik Dey, Kalpana Sharma

Design of MIMO Cylindrical DRA’s Using Metalstrip for Enhanced Isolation with Improved Performance

  • A. Jayakumar, K. Suresh Kumar, T. Ananth Kumar, S. Sundaresan

A Robust BSP Scheduler for Bioinformatics Application on Public Cloud

  • Leena I. Sakri, K. S. Jagadeeshgowda

Mobile Cloud-Based Framework for Health Monitoring with Real-Time Analysis Using Machine Learning Algorithms

  • Suman Mohanty, Ravi Anand, Ambarish Dutta, Venktesh Kumar, Utsav Kumar, Md. Ruhul Islam

Big Data Analytics

Genomic Data and Big Data Analytics

  • Hiren Kumar Deva Sarma

Image Processing

This book presents the outcomes of the First International Conference on Communication, Cloud, and Big Data (CCB), held on December 18–19, 2020, at Sikkim Manipal Institute of Technology, Majitar, Sikkim, India. It contains research papers and articles on the latest topics in communication networks, cloud computing, big data analytics, and various computing techniques, including papers that address security issues in these areas. The book is intended for researchers, engineers, practitioners, research students, and other interested readers.

  • Communication Networks
  • Cloud Computing
  • Network Security
  • Cloud Computing Platform
  • Big Data Open Platforms

Hiren Kumar Deva Sarma, Bhaskar Bhuyan

Valentina Emilia Balas

Nitul Dutta

Dr. Hiren Kumar Deva Sarma is a Professor in the Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim. He received a Bachelor of Engineering in Mechanical Engineering from Assam Engineering College, Guwahati, Assam (1998), a Master of Technology in Information Technology from Tezpur University, Assam (2000), and a Doctor of Philosophy in Computer Science & Engineering from Jadavpur University, West Bengal (2013). He has co-authored two books, edited three book volumes, and published more than seventy research papers in international journals and refereed international and national conferences of repute. He received the Young Scientist Award from the International Union of Radio Science (URSI) at the XVIII General Assembly 2005, held in New Delhi, India, and the IEEE Early Adopter Award in 2014. His current research interests are networks, network security, robotics, and big data analytics.

Dr. Bhaskar Bhuyan is an Associate Professor in the Department of Information Technology, Sikkim Manipal Institute of Technology, affiliated to Sikkim Manipal University, Sikkim, India. He received his B.E. (1997) in Computer Science & Engineering from Motilal Nehru Regional Engineering College (now NIT), Allahabad, India, and his M.Tech. (2000) in Information Technology and Ph.D. (2017) in Computer Science & Engineering from Tezpur University, Assam, India. He has more than 18 years of professional experience in teaching and industry. He has published several research papers in conferences and journals of repute and co-edited one book (conference proceedings). His research interests include computer networks, wireless sensor networks, mobile ad hoc networks, the Internet of Things, and cloud computing.

Book Title : Contemporary Issues in Communication, Cloud and Big Data Analytics

Book Subtitle : Proceedings of CCB 2020

Editors : Hiren Kumar Deva Sarma, Valentina Emilia Balas, Bhaskar Bhuyan, Nitul Dutta

Series Title : Lecture Notes in Networks and Systems

DOI : https://doi.org/10.1007/978-981-16-4244-9

Publisher : Springer Singapore

eBook Packages : Engineering, Engineering (R0)

Copyright Information : The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022

Softcover ISBN : 978-981-16-4243-2 Published: 02 December 2021

eBook ISBN : 978-981-16-4244-9 Published: 30 November 2021

Series ISSN : 2367-3370

Series E-ISSN : 2367-3389

Edition Number : 1

Number of Pages : XVIII, 476

Number of Illustrations : 41 b/w illustrations, 191 illustrations in colour

Topics : Communications Engineering, Networks, Professional Computing, Big Data, Computer Communication Networks

PeerJ Comput Sci


Artificial intelligence approaches and mechanisms for big data analytics: a systematic study

Amir Masoud Rahmani

1 Future Technology Research Center, National Yunlin University of Science and Technology, Yunlin, Taiwan

2 Department of Computer Science, Khazar University, Baku, Azerbaijan

Elham Azhir

3 Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

4 Department of Information Systems, College of Economics and Political Science, Sultan Qaboos University, Muscat, Oman

Mokhtar Mohammadi

5 Department of Information Technology, Lebanese French University, Erbil, Kurdistan Region, Iraq

Omed Hassan Ahmed

6 Department of Information Technology, University of Human Development, Sulaymaniyah, Iraq

Marwan Yassin Ghafour

7 Department of Computer Science, College of Science, University of Halabja, Halabja, Iraq

Sarkar Hasan Ahmed

8 Network Department, Sulaimani Polytechnic University, Sulaymaniyah, Iraq

Mehdi Hosseinzadeh

9 Institute of Research and Development, Duy Tan University, Da Nang, Vietnam

10 Mental Health Research Center, Psychosocial Health Research Institute, Iran University of Medical Sciences, Tehran, Iran

Associated Data

The following information was supplied regarding data availability:

This article is a literature review.

Recent advances in sensor networks and Internet of Things (IoT) technologies have led to the collection of data at an enormous scale. Exploring such huge quantities of data requires more efficient methods with high analysis accuracy. Artificial Intelligence (AI) techniques, such as machine learning and evolutionary algorithms, can provide more precise, faster, and more scalable outcomes in big data analytics. Despite this interest, to the best of our knowledge there is no comprehensive survey of the various AI techniques for big data analytics. The present survey studies the research on big data analytics using AI techniques. The authors select related research papers using the Systematic Literature Review (SLR) method. The mechanisms are investigated in four groups: machine learning, knowledge-based and reasoning methods, decision-making algorithms, and search methods and optimization theory, and a number of articles are examined within each category. The survey also identifies the strengths and weaknesses of the selected AI-driven big data analytics techniques, discusses the related parameters, and compares the techniques in terms of scalability, efficiency, precision, and privacy. Finally, several important areas are identified for enhancing big data analytics mechanisms in the future.

Introduction

With rapid innovation in digital technologies, the volume of digital data is growing fast (Klein, 2017). Consequently, large quantities of data are created from many sources, such as social networks, smartphones, and sensors. Data so large that conventional relational databases and analytical techniques cannot store and process it is called big data. New tools and analytical techniques are therefore required to discover patterns in large datasets. Big data is produced quickly, by numerous sources, in multiple formats; hence, new analytical tools should be able to detect correlations in rapidly changing data in order to exploit it better.

As mentioned, traditional processing techniques have difficulty coping with huge amounts of data, so effective methods for data analysis in big data problems are needed. Big data frameworks such as Hadoop and Spark enable large volumes of data to be distributed and analyzed (Oussous et al., 2018). In addition, different types of Artificial Intelligence (AI) techniques, such as Machine Learning (ML) and search-based methods, have been introduced to deliver faster and more precise results in large-scale data analytics. The combination of big data tools and AI techniques has created new opportunities in big data analysis.
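The map/shuffle/reduce model that Hadoop popularized can be illustrated without any framework. The following is a minimal single-process Python sketch, not Hadoop's or Spark's actual API: the function names (`map_split`, `shuffle`, `reduce_key`, `word_count`) are hypothetical, and the "splits" stand in for data blocks that a real cluster would distribute across machines.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_split(split: str) -> Iterator[tuple[str, int]]:
    """Map phase: each input split independently emits (word, 1) pairs."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group intermediate values by key.

    A distributed framework would route each key to a single reducer,
    possibly on another machine; here everything stays in one process.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_key(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: combine all values emitted for one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    intermediate = chain.from_iterable(map_split(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_key(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    blocks = ["big data needs big tools", "data tools scale out"]
    print(word_count(blocks))
    # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2, 'scale': 1, 'out': 1}
```

Because each split is mapped independently and each key is reduced independently, both phases can run in parallel; only the shuffle requires moving data between workers.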

There are several literature reviews on big data analytics techniques. However, none of them provides a complete and systematic review of artificial intelligence mechanisms for big data analytics. We have studied and classified the articles in the field of big data analytics that use artificial intelligence techniques, and we describe each AI-driven big data analytics technique together with its strengths and weaknesses. In this survey, the existing research on big data analytics techniques is categorized into four major groups: machine learning, knowledge-based and reasoning methods, decision-making algorithms, and search methods and optimization theory. The survey makes three main contributions:

  • Providing a systematic study of big data analytics using AI techniques.
  • Classifying and reviewing AI-driven big data analytics techniques in four main categories, and specifying their key advantages and disadvantages.
  • Discussing open issues to provide new research directions in the field of big data analysis.

The rest of the paper is organized as follows. Previous studies are reviewed in “Background and Related Work”. “Research Selection Method” describes the process of article selection. The proposed taxonomy of the selected big data analysis studies, and the studies themselves, are reviewed in “AI-driven big Data Analytics Mechanisms”. The investigated studies are compared in “Results and Comparisons”. Finally, open issues and the conclusion are presented in “Open Issues and Challenges” and “Conclusion”, respectively.

Background and Related Work

This section presents some preliminaries and related work on big data analytics.

Big data definition and characteristics

Huge volumes of data are gathered from various sources, such as sensors, transactional applications, and social media, in heterogeneous formats. Various definitions of big data have been presented (I.I.J, 2014; Gantz & Reinsel, 2012; Glossary, 2014; Manyika et al., 2011; Chang et al., 2015). Generally, the term big data refers to a growing body of data in varied formats: structured, unstructured, and semi-structured. Existing Database Management Systems (DBMSs) cannot process such huge volumes of heterogeneous data; therefore, powerful technologies and advanced algorithms are needed to process big data.

Big data can be characterized by several V’s, such as Volume, Velocity, Variety, and Veracity (Furht & Villanustre, 2016).

  • Volume: the huge quantities of data produced every second, which can be processed in big data frameworks.
  • Velocity: the speed at which data are produced and processed to extract valuable insights.
  • Variety: the various formats of data, such as documents, videos, and logs.
  • Veracity: data quality factors, that is, the biases, noise, abnormalities, etc. in the data.
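Velocity, together with the earlier point that tools must detect correlations in rapidly changing data, implies updating statistics as records arrive rather than re-scanning stored history. As an illustrative sketch (not a method from the surveyed papers), the following Python class maintains a Pearson correlation between two numeric streams in constant memory using Welford-style incremental updates; the class name and the example stream are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingCorrelation:
    """Pearson correlation of two numeric streams, updated one pair at a time."""

    n: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    m2_x: float = 0.0  # running sum of squared deviations of x
    m2_y: float = 0.0  # running sum of squared deviations of y
    c_xy: float = 0.0  # running sum of co-deviations of x and y

    def update(self, x: float, y: float) -> None:
        self.n += 1
        dx = x - self.mean_x  # deviation from the previous mean of x
        dy = y - self.mean_y  # deviation from the previous mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Welford-style updates: previous deviation times updated deviation.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def value(self) -> float:
        # Undefined with fewer than two points or a constant stream.
        if self.n < 2 or self.m2_x == 0.0 or self.m2_y == 0.0:
            return float("nan")
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)


if __name__ == "__main__":
    corr = StreamingCorrelation()
    # Hypothetical stream: (promotion spend, units sold) arriving one at a time.
    for spend, units in [(1.0, 12.0), (2.0, 15.0), (3.0, 21.0), (4.0, 22.0)]:
        corr.update(spend, units)
        print(corr.n, corr.value())
```

Each `update` call costs O(1) time and memory, so the estimate can be refreshed at stream speed; a sliding-window or decayed variant would be needed if older observations should stop influencing the result.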

More recently, additional V’s and other characteristics, such as Visualization, Value, and Volatility, have been used to better define big data (Patgiri & Ahmed, 2016).

Effective big data management is a prerequisite for high-quality data analytics. It includes efficient data collection from different sources; efficient storage using various mechanisms and tools; data cleansing to eliminate errors and transform the data into a uniform format; and data encoding for security and privacy. The goal of this process is to ensure that reliable data is available, well managed, and stored efficiently and securely.

Big data analytics

Through big data analytics, organizations can extract valuable information and patterns that may affect their business ( Gandomi & Haider, 2015 ). Advanced data analysis is needed to identify relations between features and to forecast future observations. Big data analytics refers to the techniques applied to gain insights from huge datasets ( Labrinidis & Jagadish, 2012 ), and its results can improve decision-making and increase organizational efficiency. Various analytical approaches have been developed to extract knowledge from data, such as:

  • Descriptive analytics analyzes a business’s historical data to describe what occurred in the past ( Joseph & Johnson, 2013 ).
  • Predictive analytics uses a variety of statistical modeling and machine learning techniques to predict future possibilities ( Waller & Fawcett, 2013 ).
  • Prescriptive analytics builds on descriptive and predictive analytics to recommend the most suitable actions for improving business practices ( Joseph & Johnson, 2013 ).
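
To make the distinction between the three analytics types concrete, the sketch below applies each one to a short sales history. It is a minimal illustration under assumed inputs: the sales figures, the least-squares linear trend used as the predictive model, and the simple order rule used as the prescriptive step are all hypothetical choices, not methods taken from the cited works.

```python
from statistics import mean


def describe(history):
    """Descriptive analytics: summarize what already happened."""
    return {"mean": mean(history), "min": min(history), "max": max(history)}


def linear_trend_forecast(history, steps_ahead=1):
    """Predictive analytics: fit y = a + b*t by ordinary least squares and extrapolate."""
    n = len(history)
    t = list(range(n))
    t_bar, y_bar = mean(t), mean(history)
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, history)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    a = y_bar - b * t_bar
    return a + b * (n - 1 + steps_ahead)


def recommend_order(forecast, stock_on_hand):
    """Prescriptive analytics (hypothetical rule): order enough to cover forecast demand."""
    return max(0.0, forecast - stock_on_hand)


sales = [100, 110, 120, 130]  # hypothetical monthly sales
print(describe(sales))
forecast = linear_trend_forecast(sales)
print(forecast)                        # 140.0
print(recommend_order(forecast, 100))  # 40.0
```

In practice the predictive step would use a richer model, but the division of labor (summarize, forecast, then act on the forecast) stays the same.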

Data mining, statistical analysis, machine learning, rule-based systems, and neural networks are among the analytics techniques used to uncover hidden patterns in big datasets and to support better and faster decisions. Many studies address this field by improving existing techniques, proposing novel methods, or combining different algorithms. However, further analytical improvements are required to meet the challenges of big data ( Oussous et al., 2018 ).

Big data platforms

Big data platforms fall into three types: batch processing, real-time processing, and interactive analytics ( Borodo, Shamsuddin & Hasan, 2016 ). Batch processing platforms perform extensive computations, and processing data on them takes time. Apache Hadoop, the most common batch processing platform, is widely used because of its scalability, cost-effectiveness, flexibility, and fault tolerance. Its modules, including the Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN), and the MapReduce distributed programming model, operate across the big data value chain, from aggregation and storage to processing and management.
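
The MapReduce model mentioned above splits a job into a map phase, a shuffle that groups intermediate key/value pairs by key, and a reduce phase. The following single-process Python sketch of the classic word-count job illustrates that data flow only; it does not use Hadoop's actual API, and the input splits are hypothetical. On a real cluster the splits would reside in HDFS and the phases would run in parallel on different nodes under YARN.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle: group all intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: aggregate the list of values for each key."""
    return {key: sum(values) for key, values in groups.items()}


splits = ["Big data needs big tools", "data drives decisions"]  # hypothetical input splits
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["big"], counts["data"])  # 2 2
```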

As noted earlier, velocity is another characteristic of big data: data arrives as continuous, high-speed streams that require continuous processing and analysis. Real-time processing platforms are used for fast and efficient analysis of such streams. Apache Spark ( Acharjya & Ahmed, 2016 ) and Storm ( Mazumder, 2016 ) are two common examples of stream processing platforms. Stream processing is required in applications such as weather and transport systems.
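
Stream processing engines typically compute results over windows of recent events rather than over the full dataset. As a minimal, single-process sketch (not the Spark or Storm API), the generator below maintains a sliding-window average that is updated in constant time as each new reading arrives; the window size and readings are hypothetical.

```python
from collections import deque


def sliding_window_average(stream, window_size):
    """Yield the average of the most recent `window_size` readings as each one arrives."""
    window = deque(maxlen=window_size)
    running_sum = 0.0
    for value in stream:
        if len(window) == window_size:
            running_sum -= window[0]  # the oldest reading is about to be evicted
        window.append(value)          # deque(maxlen=...) drops the oldest item automatically
        running_sum += value
        yield running_sum / len(window)


readings = [2, 4, 6, 8]  # hypothetical sensor readings arriving one at a time
print(list(sliding_window_average(readings, window_size=2)))  # [2.0, 3.0, 5.0, 7.0]
```

Keeping a running sum instead of re-summing the window is what lets the per-event cost stay constant, which matters when events arrive at high velocity.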

Interactive analytics platforms enable users to access datasets remotely and perform operations as needed, connecting directly to the system and interacting with the data. Apache Drill is an example of an interactive analytics platform.

Related work

This part briefly overviews previous surveys, which are classified into four main categories: the big data management process, big data analytics techniques, big data platforms, and big data analytics applications. These surveys are discussed in the following subsections.

Big data management process

Tsai et al. (2015) reviewed studies on traditional and recent big data analysis. The Knowledge Discovery in Databases (KDD) procedure, comprising input, analysis, and output, is used as the basis for these studies. Various data mining and big data mining techniques, such as clustering and classification, are discussed in the analysis step. Some open issues and future research directions toward efficient methods are also suggested. However, their survey was not conducted systematically, the studies are not compared thoroughly, and recently published articles are not included. They also focused only on the machine learning category of artificial intelligence techniques; other AI categories, such as computational intelligence, were not studied.

Siddiqa et al. (2016) presented a basic overview of big data management techniques, with a detailed taxonomy based on storage, pre-processing, processing, and security. Articles in each category were discussed, the features of the proposed methods were described, and different techniques were compared. Future work and open challenges were also discussed. However, the article selection method is not clearly described.

Big data analytics techniques

Athmaja, Hanumanthappa & Kavitha (2017) presented a systematic literature-based review of big data analytics approaches according to machine learning mechanisms. However, that paper provides no categorization for reviewing related studies, and the non-functional features of the studies are not investigated. Nor do the authors provide a systematic procedure for gathering the related studies.

Ghani et al. (2019) reviewed existing big social media analytics approaches in five classes: artificial neural networks, fuzzy systems, swarm intelligence, evolutionary computation, and deep learning. The authors assessed the reviewed techniques against quality metrics. However, there is no systematic procedure for selecting the articles.

Mittal & Sangwan (2019) presented a comprehensive study of big data analysis tools and techniques, focusing on machine learning. Accordingly, the selected techniques are reviewed in three categories: supervised learning, unsupervised learning, and reinforcement learning. Nevertheless, the article selection method is not clearly described, and the studies are not evaluated against quality parameters.

Qiu et al. (2016) presented a brief review of ML techniques, highlighting recent learning methods such as representation learning, deep learning, distributed and parallel learning, transfer learning, active learning, and kernel-based learning. However, they focused only on machine learning techniques and reviewed few papers in each class. The article selection procedure is not described, and no technical comparison of the proposed methods is made.

Sivarajah et al. (2017) also reviewed big data analysis techniques, categorizing them into three main groups: descriptive, predictive, and prescriptive analytics. However, there are gaps in the analysis of qualitative parameters and in the study selection process.

Oussous et al. (2018) investigated the impact of big data challenges and discussed numerous tools for big data processing. They divided the challenges of big data analytics into six general categories: big data management, big data cleansing, big data collection, imbalanced big data, big data analytics, and big data machine learning. However, the article selection process is not described, and the reviewed works are not categorized according to explicit criteria.

Nicolalde et al. (2018) investigated research efforts on big data processing technologies and discussed associated challenges, such as data storage and analysis, knowledge discovery and computational complexity, scalability and data visualization, and information security. However, the article selection process is not described, and the studies are not evaluated against quality parameters.

Big data analytics applications

Vaishya et al. (2020) studied the main applications of AI for preventing and fighting Coronavirus Disease 2019 (COVID-19). The authors identified seven applications of AI for the COVID-19 pandemic: (1) detecting the disease, (2) monitoring patient treatment, (3) contact tracing, (4) predicting cases and deaths, (5) drug production, (6) reducing workloads, and (7) preventing the disease. However, this paper has the following shortcomings: (1) few papers were investigated, (2) the study selection process is not stated, and (3) no qualitative parameters are provided. Furthermore, no detailed taxonomy based on AI techniques is presented.

Finally, Pham et al. (2020) discussed applications of AI techniques and big data for managing and analyzing the huge volume of data arising from the COVID-19 pandemic. The selected big data techniques are reviewed in five categories, which include prediction of the COVID-19 outbreak, tracking the spread of the virus, diagnosis and treatment, and drug discovery. The related challenges of the reviewed solutions are then highlighted. Nevertheless, the article selection method is not clearly described.

Based on the investigated studies, current big data analysis surveys have the following weaknesses:

  • Many articles did not assess qualitative metrics when investigating the techniques.
  • Some papers did not present a reasonable classification of data analytics techniques in the context of big data.
  • Some papers did not describe their paper selection procedure.
  • Many articles did not cover all categories of artificial intelligence techniques when reviewing big data analytics.

These shortcomings motivated us to write a survey of big data analysis using artificial intelligence mechanisms that addresses all of them.

Research selection method

This section provides guidelines for a systematic analysis of big data analytics approaches. The systematic analysis procedure specifies how related studies are found in scientific databases ( Charband & Navimipour, 2016 ). The following Research Questions (RQs) are defined and answered according to the objectives and scope of the present survey:

  • RQ1: What taxonomy can be designed for big data analytics techniques?
  • RQ2: Which artificial intelligence techniques are applied to big data analytics?
  • RQ3: What qualitative features are assessed in artificial intelligence approaches?
  • RQ4: What are the open issues in big data analytics?

After the research questions were defined, criteria were applied to select the final studies. The article selection process is shown in Fig. 1. This systematic procedure uses popular databases such as ScienceDirect, SpringerLink, IEEE Xplore, and ACM Digital Library. Master’s theses, doctoral dissertations, conference papers, book chapters, and non-English papers were excluded. The following keywords were searched for the period 2016 to 2021 ( Antonopoulos et al., 2020 ):

  • “Artificial Intelligence” AND “Big Data Analytics”
  • “Machine Learning” AND “Big Data Analytics”
  • “Neural Networks” AND “Big Data Analytics”

Initially, 4,291 papers plus 10 additional papers were identified through our keyword search strategy. In the next steps, duplicate records were removed and criteria were applied to select high-quality studies. Titles, abstracts, and keywords were examined to select articles for the next stage, after which 468 articles remained for re-evaluation. In the third stage, the full texts of the articles retained in the second stage were reviewed to confirm their relevance. A total of 32 articles were identified in the last step. The distribution of the articles by publisher and publication year is shown in Fig. 2. As Fig. 2 shows, the largest number of articles was published by Elsevier in 2018.

AI-driven big data analytics mechanisms

The selected big data analysis studies are classified and reviewed according to the AI subfields used in big data analytics. Figure 3 shows the resulting taxonomy of big data analytics techniques and places each article investigated in this survey within it. The taxonomy has four main categories: machine learning, knowledge-based and reasoning methods, decision-making algorithms, and search methods and optimization theory ( Russell & Norvig, 2020 ).

Furthermore, four key qualitative parameters are defined to assess each big data analysis method and identify its benefits and drawbacks:

  • Scalability: the mechanism’s ability to adapt to rapid changes without compromising the quality of the analysis.
  • Efficiency: the overall time and cost a method requires to complete the analysis.
  • Precision: measured through parameters such as data errors and the predictive ability of the algorithms.
  • Privacy: the practices that ensure data is used only for its intended purpose.

Finally, the papers are overviewed and compared in terms of their mechanisms’ goals.

Machine learning mechanisms

Machine learning algorithms can be divided into two main classes: supervised learning and unsupervised learning. Supervised learning requires considerable manual effort to put the training data into a proper, labeled format, whereas unsupervised learning algorithms can discover hidden patterns in huge amounts of unlabeled data.

Supervised learning

A supervised learning algorithm aims to predict the correct label for newly presented input data after being trained on a separate labeled dataset. During training, a set of inputs and corresponding outputs is presented and the relation between them is learned; the main objective is to model the dependency between the input features and the target outputs. As shown in Fig. 4, input examples are categorized into a known set of classes ( Kotsiantis, Zaharakis & Pintelas, 2007 ).
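
The supervised learning loop described above (learn from labeled input/output pairs, then label new inputs) can be illustrated with one of the simplest classifiers, a nearest-centroid model. It is an illustrative stand-in chosen for brevity, not a technique used by the studies reviewed below; the training points and class labels are hypothetical.

```python
from math import dist


def train_nearest_centroid(samples, labels):
    """Learn one centroid (mean feature vector) per class from labeled examples."""
    sums, counts = {}, {}
    for x, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}


def predict(centroids, x):
    """Label a new input with the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical labeled training data with two features per example.
X_train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
y_train = ["low", "low", "high", "high"]
model = train_nearest_centroid(X_train, y_train)
print(predict(model, (1.0, 0.0)), predict(model, (5.0, 6.0)))  # low high
```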

Carcillo et al. (2018) proposed a novel fraud detection platform named SCAlable Real-time Fraud Finder (SCARFF). The platform combines the Kafka, Spark, and Cassandra big data tools with a machine learning technique to process streaming data. Its machine learning engine is a weighted ensemble that employs two types of classifiers based on random forest ( Breiman, 2001 ; Rokach, 2016 ). It handles imbalanced data, non-stationarity, and feedback latency. The results indicate that the framework achieves satisfactory efficiency, accuracy, and scalability over a large stream of transactions.
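
The weighted-ensemble idea can be sketched as combining the fraud probabilities produced by several classifiers into a single score per transaction. The snippet below assumes the per-model probabilities are already available (for example, from two random forests) and uses fixed, hypothetical weights and a hypothetical 0.5 alert threshold; it is not SCARFF's actual weighting scheme, which the text above does not specify.

```python
def weighted_ensemble_score(model_scores, weights):
    """Weighted average of the fraud probabilities that each model assigns to one transaction."""
    return sum(w * s for w, s in zip(weights, model_scores)) / sum(weights)


def flag_transactions(score_table, weights, threshold=0.5):
    """Return the indices of transactions whose combined score reaches the alert threshold."""
    return [i for i, scores in enumerate(score_table)
            if weighted_ensemble_score(scores, weights) >= threshold]


# Hypothetical scores: each row holds (model A probability, model B probability).
scores = [(0.9, 0.3), (0.2, 0.1), (0.4, 0.8)]
weights = (0.7, 0.3)  # hypothetical: trust model A more than model B
print(flag_transactions(scores, weights))  # [0, 2]
```

In a streaming setting the weights themselves could be updated as delayed fraud labels arrive; that update rule is outside the scope of this sketch.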

Kannan et al. (2019) presented a predictive approach for demonetization data based on a support vector machine, called PAD-SVM, which has three stages: preprocessing, descriptive analysis, and prescriptive analysis. The first stage cleans the data, handles missing data fields, and extracts the essential data from the tweets. The descriptive analysis stage identifies the most influential individuals, performs analytical functions, and carries out semantic analysis. Predictive analysis is then used to predict people’s current mindset and society’s reaction to the problem. A series of experiments confirmed the performance of the method in terms of execution time and classification error.

Feng et al. (2019) applied several data mining and deep learning methods to visualize and predict trends in criminal data. The authors discovered various interesting facts and patterns in the criminal datasets of San Francisco, Chicago, and Philadelphia. The proposed method has lower complexity than LSTM. According to the predictive results, the Prophet model and Keras stateful LSTM outperform traditional neural networks.

Accurate and timely forecasting of television program popularity is of great value to content providers, advertisers, and broadcast television operators. Traditional prediction models require a huge number of samples and long training times, and their predictions are poor for programs whose popularity peaks sharply or drops severely. Zhu, Cheng & Wang (2017) proposed an enhanced prediction method based on trend detection. The authors clustered popularity trends using the Dynamic Time Warping (DTW) algorithm and K-medoids clustering, then applied random forest models to the resulting trends. New programs are assigned to the existing trends by a GBM classifier. According to the experimental results, combining the predictions of the trend-specific models with the classification probabilities yields better prediction results. The results also show that the forecasting period is effectively reduced compared with current forecasting methods.
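
Dynamic Time Warping measures the similarity of two time series that may be shifted or stretched in time by finding the cheapest alignment between them through dynamic programming. The sketch below computes the classic DTW distance with an absolute-difference cost, plus a helper that picks the medoid of a set of curves, which is the building block K-medoids repeats within each cluster. The popularity curves are hypothetical, and the cost function and warping constraints used by Zhu, Cheng & Wang are not stated in the text above.

```python
from math import inf


def dtw_distance(a, b):
    """Classic Dynamic Time Warping distance with an absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def medoid(series_list):
    """Return the series with the smallest total DTW distance to all the others."""
    return min(series_list, key=lambda s: sum(dtw_distance(s, t) for t in series_list))


curves = [[1, 2, 3], [1, 2, 2, 3], [5, 5, 5]]  # hypothetical popularity curves
print(dtw_distance(curves[0], curves[1]))      # 0.0: same shape, stretched in time
print(medoid(curves))                          # [1, 2, 3]
```

The quadratic table makes exact DTW expensive for long series; large-scale systems usually add a warping-window constraint or a lower bound to prune comparisons.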

Big data produced by social media is a great opportunity to extract valuable insights. As data size grows, distributed deep learning models become efficient for analyzing social data, so improving the performance of deep learning techniques is essential. Hammou, Lahcen & Mouline (2020) presented a novel, efficient technique for sentiment analysis that combines fastText with Recurrent Neural Network (RNN) variants to represent and classify textual data. They also proposed a distributed system based on distributed machine learning for real-time analytics. The experiments show that the method outperforms Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Gated Recurrent Unit (GRU) methods in classification accuracy and that it can handle large-scale data for sentiment analysis.

Urban networks now produce huge amounts of data, and the collection of private data by smart devices raises security challenges. Tian et al. (2020) sought to detect abnormal insider behavior in order to prevent urban big data leakage. The authors developed deep learning methods to analyze deviations between actual actions and the normal pattern of daily activities. Abnormal activities are then recognized by a Multi-Layer Perceptron (MLP) based on the computed deviations. According to the experimental results, the method can learn the normal pattern of behavior and identify abnormal activities with high precision.

Internet traffic is growing rapidly in the age of multimedia big data, making data processing and network overload two key challenges. Wang et al. (2016) proposed a hybrid-stream model to address these challenges for video analysis. It contains data preprocessing, data classification, and data-load-reduction modules. A modified CNN is developed to evaluate the importance of each video frame and thereby improve classification accuracy. The results confirm that the model reduces the data load, controls the video input size, and decreases network overload, effectively reducing the amount of processed video without compromising the quality of experience. The model also performs better than traditional models as large multimedia data continues to grow.

Kaur, Sharma & Mittal (2018) proposed a novel model for smart healthcare information systems using machine learning algorithms. The model includes four layers. The data source layer handles heterogeneous data sources. The data storage layer manages storage optimization, using techniques such as indexing and normalization to make optimal use of system resources. The data security layer applies data security and privacy techniques such as data masking, granular control over data access, activity monitoring, dynamic encryption, and endpoint validation. Finally, the application layer uses machine learning methods for early diagnosis of disease. According to the experimental results, fuzzy logic and information theory improved the accuracy of the model.

Nair, Shetty & Shetty (2018) introduced a novel health status prediction system that applies machine learning models to big data streams. The system is built using Apache Spark and deployed in the cloud. The user sends his or her health attributes, and the system predicts the user’s health status in real time. A decision tree model is created from existing healthcare data and applied to streaming data for health status prediction. The architecture makes the system time- and cost-efficient. Data privacy is addressed by using a secondary Twitter account.

AlZubi (2020) developed new big data technologies and machine learning methods to identify diabetes. First, data is gathered from a huge dataset, and the MapReduce model is used to efficiently combine the small chunks of data. Then, a normalization procedure is used to eliminate noise from the collected data, and an ant bee colony algorithm is applied to select statistical features. The chosen features are trained using an SVM with a multilayer neural network, and the associated neural network is applied to classify the learned features. The results show that the SVM neural network provides high accuracy and sensitivity and a lower error rate.

Detecting COVID-19 by analyzing chest X-ray and Computed Tomography (CT) scans has attracted researchers’ attention, because analyzing these scans with machine learning algorithms provides an automated and effective diagnostic tool. El-bana, Al-Kabbany & Sharkas (2020) proposed a multi-task pipeline based on deep neural networks for analyzing COVID-19 medical scans. In the first stage, an Inception-v3 deep model is fine-tuned using multi-modal learning. In the second stage, a Convolutional Neural Network (CNN) architecture identifies three types of manifestations. In the last stage, transfer learning from another knowledge domain is used to generate binary masks that segment the regions related to these manifestations. According to the experimental results, the framework improves efficiency in terms of computational time and achieves higher accuracy than recent work in the literature.

Ragab & Attallah (2020) proposed a novel AI-based Computer-Aided Diagnosis (CAD) system called FUSI-CAD. FUSI-CAD combines several different CNN architectures with three handcrafted features, including statistical features and textural analysis features, that had not previously been used in coronavirus diagnosis. The results show that FUSI-CAD distinguishes COVID-19 from non-COVID-19 images more precisely than recent related studies.

Ahmed, Bukhari & Keshtkar (2021) also proposed a deep CNN that detects COVID-19 from chest X-rays. Using 5-fold cross-validation on a multi-class dataset of COVID-19, viral pneumonia, and normal X-ray images, the method achieved a classification accuracy of 90.64%.
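
K-fold cross-validation partitions the dataset into k folds and trains k times, each time holding out one fold for testing, and then averages the k scores. The index-splitting step can be sketched as follows. This version assigns folds contiguously without shuffling or stratification; both are assumptions here, since the text does not say how the folds were formed, and for multi-class medical images shuffled, stratified folds are common in practice.

```python
def k_fold_indices(n_samples, k=5):
    """Split indices 0..n_samples-1 into k contiguous folds; yield (train, test) index lists."""
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical dataset of 12 images split into 5 folds.
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(12, k=5)):
    print(fold, len(train_idx), len(test_idx))
```

The reported accuracy would then be the mean of the five per-fold test accuracies.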

The novel coronavirus threatens human health, and the Internet of Things (IoT) and big data technologies play a vital role in fighting COVID-19. Ahmed et al. (0000) proposed a new framework for analyzing and forecasting COVID-19 by integrating big data analytics and IoT. The framework is based on neural networks. According to the experimental results, it performs well, achieving an accuracy of 99%, compared with traditional machine learning methods.

Asencio-Cortés et al. (2018) investigated regression algorithms with ensemble learning for predicting earthquake magnitudes. The paper employs the Apache Spark distributed processing framework together with linear regression, Gradient Boosting Machine (GBM), deep learning, and random forest models from the H2O library. The experiments demonstrate the accuracy of the tree-based methods. High levels of parallelism and scalability are the two main strengths of the method, but it has low efficiency when processing large datasets.

Wang et al. (2017) developed a new electricity price prediction model that combines several modules. To eliminate redundant features, a hybrid feature selection method based on Grey Correlation Analysis (GCA) is proposed, integrating random forest and the Relief-F algorithm. A combination of a kernel function and Principal Component Analysis (PCA) is developed for dimensionality reduction. Furthermore, a Differential Evolution (DE)-based Support Vector Machine (SVM) classifier is developed for price classification. The numerical results show the superior performance of the technique in terms of accuracy and time efficiency.
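
Grey Correlation Analysis (also called grey relational analysis) scores how closely each candidate sequence follows a reference sequence. A minimal sketch of the standard grey relational grade is shown below, using the customary distinguishing coefficient ρ = 0.5. How Wang et al. combine these grades with random forest and Relief-F is not described above, so the feature series here are hypothetical and assumed to be normalized already.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate series with respect to the reference series.

    All series are assumed to be normalized to a comparable scale beforehand.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    d_min = min(min(row) for row in deltas)  # smallest absolute difference overall
    d_max = max(max(row) for row in deltas)  # largest absolute difference overall
    if d_max == 0:
        return [1.0] * len(candidates)       # every candidate matches the reference exactly
    grades = []
    for row in deltas:
        # Grey relational coefficient at each point, averaged into one grade per candidate.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


price = [0.2, 0.5, 0.9]          # hypothetical normalized reference series (price)
features = [[0.1, 0.5, 1.0],     # hypothetical feature that tracks the price closely
            [0.9, 0.5, 0.1]]     # hypothetical feature that moves the opposite way
print(grey_relational_grades(price, features))
```

A higher grade means a stronger association with the reference; in a feature-selection setting, grades between pairs of features could likewise flag redundancy.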

Vu et al. (2020) used a deep learning method to capture the association between data distribution and the quality of partitioning methods. The method runs in two stages: offline training and application. In the training phase, synthetic data are generated from various distributions and divided using different partitioning techniques, and the quality of each partitioning is measured using different quality criteria. Each dataset is also summarized using histograms and skewness measures. The deep learning model is trained on the data summaries and quality metrics. The trained model is then applied to predict the best partitioning technique for a new dataset that needs to be partitioned. The experiments show that the method chooses the best partitioning method more precisely than the baseline method.
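
The data-summary step above, histograms plus skewness, is straightforward to compute. The sketch below summarizes a one-dimensional sample with an equal-width histogram and the population moment coefficient of skewness; the number of bins and the sample values are hypothetical, and the paper may use different summary settings.

```python
from statistics import mean, pstdev


def summarize(values, bins=4):
    """Summarize a 1-D sample by an equal-width histogram and its skewness."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum value falls in the last bin
        counts[idx] += 1
    mu, sigma = mean(values), pstdev(values)
    # Population skewness: mean of the cubed standardized values.
    skew = 0.0 if sigma == 0 else mean(((v - mu) / sigma) ** 3 for v in values)
    return counts, skew


counts, skew = summarize([1, 1, 1, 1, 10], bins=3)  # hypothetical sample
print(counts, skew)  # most values are small and one is large, so skewness is positive
```

Fixed-length summaries like these give a model a constant-size input regardless of how many records the original dataset contains.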

Huang et al. (2016) designed a Parallel Ensemble Online Sequential Extreme Learning Machine algorithm (PEOS-ELM) based on the MapReduce distributed model for large-scale learning. PEOS-ELM supports bagging, subspace partitioning, and cross-validation to analyze incremental data. Its performance was compared with that of the original Online Sequential Extreme Learning Machine (OS-ELM). The results show that the distributed algorithm can process large-scale datasets and performs well in terms of speed and accuracy.

Banchhor & Srinivasu (2020) presented a big data classification model using a Cuckoo–Grey Wolf based Correlative Naive Bayes classifier and MapReduce Model (CGCNBMRM). In this algorithm, the Correlative Naive Bayes (CNB) classifier is enhanced by the Cuckoo–Grey Wolf Optimization (CGWO) algorithm, which combines the Cuckoo Search (CS) and Grey Wolf Optimizer (GWO) algorithms. CGWO improves the CNB classifier by selecting optimal model parameters. The results demonstrate the effectiveness of the classification model in terms of accuracy, sensitivity, and specificity.

Table 1 compares the functional properties of the supervised learning-based big data analytics approaches in terms of scalability, efficiency, precision, and privacy, based on the results claimed by the investigated studies. Efficiency and precision are the factors improved by most supervised learning-based mechanisms, whereas scalability and privacy have received less attention from researchers.

Unsupervised learning

Unsupervised learning is used for input data without corresponding output variables; these algorithms detect hidden patterns in the data. Clustering is one of the major types of unsupervised algorithms. As shown in Fig. 5, inherent groups in the input objects are discovered based on their underlying patterns ( Bengio, Courville & Vincent, 2012 ).
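
k-means, the baseline against which CLUBS-P is compared below and one component of the hybrid customer model that follows, is the canonical example of such clustering: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. A minimal single-machine sketch follows; the points, k, and initialization from randomly chosen data points are hypothetical choices, and parallel versions distribute the assignment step across workers.

```python
import random
from math import dist


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of coordinate tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster
            else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # hypothetical 2-D data
centroids, clusters = k_means(points, k=2)
print(clusters)
```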

Ianni et al. (2020) introduced CLUBS-P, a parallel version of the CLUBS+ algorithm, for efficient centroid-based clustering. This unsupervised algorithm produces high-quality clusters of data around the cluster centroids. The authors compared its performance against parallel k-means clustering, and the results show that it achieves high accuracy and high scalability.

Wang, Tsai & Ciou (2020) proposed a hybrid model based on the Recency, Frequency, and Monetary (RFM) model, k-means clustering, the Naive Bayes algorithm, and linked Bloom filters to analyze customer data and derive intelligent strategies. Their experiments demonstrate the benefits of big data analytics for marketing strategies and for forecasting potential customer demand. In addition, linked Bloom filters can store inactive data more efficiently for future use.
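
A Bloom filter is a compact bit array that answers “has this key been seen?” with no false negatives and a tunable false-positive rate, which is why it suits storing inactive customer records cheaply. The sketch below shows a standard Bloom filter and one plausible way to link several of them, appending a new filter once the current one reaches a capacity limit. The bit-array size, number of hashes, capacity, salted SHA-256 hashing, and customer IDs are all assumptions; the linking rule used by Wang, Tsai & Ciou is not detailed in the text above.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability
        self.count = 0                   # number of items added

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1
        self.count += 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


class LinkedBloomFilter:
    """Hypothetical chain of Bloom filters: a new filter is appended when the last one is full."""

    def __init__(self, capacity_per_filter=100, **filter_args):
        self.capacity = capacity_per_filter
        self.filter_args = filter_args
        self.filters = [BloomFilter(**filter_args)]

    def add(self, item):
        if self.filters[-1].count >= self.capacity:
            self.filters.append(BloomFilter(**self.filter_args))
        self.filters[-1].add(item)

    def might_contain(self, item):
        return any(f.might_contain(item) for f in self.filters)


inactive = LinkedBloomFilter(capacity_per_filter=2, num_bits=512, num_hashes=3)
for customer_id in ["C001", "C002", "C003"]:  # hypothetical inactive customers
    inactive.add(customer_id)
print(len(inactive.filters), inactive.might_contain("C002"))  # 2 True
```

Chaining keeps each filter's false-positive rate bounded as the archive grows, at the cost of checking every filter on lookup.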

Ip et al. (2018) presented an overview of big data and machine learning techniques in crop protection. They also examined the use of Markov Random Fields (MRF), which account for the spatial component among neighboring sites, to model herbicide resistance in ryegrass. The experimental results demonstrate the performance of the approach.

Pulgar-Rubio et al. (2017) introduced MEFASD-BD, a novel method for subgroup discovery and the first big data approach based on evolutionary fuzzy systems for this task. MEFASD-BD is implemented with the MapReduce model on Apache Spark. In this paradigm, the quality of the subgroups obtained by each map is evaluated against the full dataset to improve subgroup quality. The method can efficiently process high-dimensional datasets, and the experimental results show a significant reduction in execution time while maintaining standard quality values.

Table 2 summarizes the reviewed techniques and their main benefits and drawbacks. Accuracy is the main parameter targeted by all the unsupervised learning-based mechanisms, whereas scalability, efficiency, and privacy have attracted less attention.

Search methods and optimization theory

Search-based methods can be used to find optimal solutions to a problem. In search-based optimization, optimal decisions are made according to given objectives within given constraints. Because the search space grows larger in a big data environment, powerful search algorithms must be developed for large-scale optimization problems ( Azhir et al., 2021 ). The selected studies on search-based methods and optimization theory are described below.

Alkurd, Abualhaol & Yanikomeroglu (2020) proposed applying AI, big data analytics, and real-time non-intrusive feedback to personalize wireless networks. The authors proposed a user satisfaction model that enables user feedback to be measured, together with an evolutionary multi-objective formulation that simultaneously optimizes the provided Quality of Service (QoS) and user satisfaction. The results show that personalization enables efficient optimization of network resources, achieving user satisfaction along with a certain level of revenue in the form of saved resources.
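
Evolutionary multi-objective methods do not return a single optimum; they search for the Pareto front, the set of configurations that cannot be improved in one objective without worsening another. The dominance test at the core of that search can be sketched as follows, assuming both objectives (here, hypothetical normalized QoS and user-satisfaction scores) are to be maximized. This is only the selection criterion, not the authors' evolutionary operators.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidate configurations scored as (QoS, user satisfaction).
candidates = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]
print(pareto_front(candidates))  # [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9)]
```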

The data generated in IoT environments must be processed by analytical applications. However, given issues such as data size, velocity, and locality, current infrastructures cannot efficiently allocate enough resources to an application's tasks. Ding et al. (2020) proposed two task allocation methods based on Particle Swarm Optimization (PSO) that improve resource utilization with an auto-scaling guarantee for batch and stream processing. Their experiments showed that the proposed methods increase the efficiency of resource utilization while effectively supporting task offloading.
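For readers unfamiliar with PSO, the following is a minimal sketch of standard global-best PSO on a continuous search space. It is not Ding et al.'s allocation method, which works on task assignments; the toy cost (squared deviation of each node's utilization from a hypothetical 0.6 target) and all parameter values are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(cost: Callable[[Sequence[float]], float], dim: int,
        bounds: Tuple[float, float] = (-5.0, 5.0), n_particles: int = 20,
        iters: int = 200, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> List[float]:
    """Global-best particle swarm optimization that minimizes `cost` inside a box."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # each particle's best position so far
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]                          # inertia
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull to own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # pull to swarm best
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest


def imbalance(u: Sequence[float]) -> float:
    """Hypothetical cost: squared deviation of each node's utilization from 0.6."""
    return sum((x - 0.6) ** 2 for x in u)


best = pso(imbalance, dim=3, bounds=(0.0, 1.0))
```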

Optimizing the performance of transport protocols for transmitting big data over dedicated channels in High-Performance Networks (HPNs) is a challenging task. Yun et al. (2019) proposed ProbData (PRofiling Optimization Based DAta Transfer Advisor), which uses stochastic approximation to adjust the number of parallel streams and the buffer size for Transmission Control Protocol (TCP) transmission. Specifically, ProbData uses the Simultaneous Perturbation Stochastic Approximation (SPSA) method to identify the best transmission configurations for TCP- and UDP-based transport methods. The performance of ProbData was assessed using real-life performance measurements over physical connections in current HPNs. The results show that the method significantly reduces the profiling overhead while still achieving good performance.
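SPSA suits profiling because every iteration needs only two measurements, whatever the number of tuned parameters. The sketch below is a textbook SPSA loop with Spall's usual gain exponents (0.602 and 0.101); the noisy quadratic loss, its optimum at 8 streams and a 4 MB buffer, and the remaining gain constants are hypothetical stand-ins for real transfer measurements, not ProbData's implementation.

```python
import random
from typing import Callable, List


def spsa_minimize(loss: Callable[[List[float]], float], theta0: List[float],
                  iters: int = 1000, a: float = 1.0, c: float = 0.5, A: float = 10.0,
                  alpha: float = 0.602, gamma: float = 0.101, seed: int = 0) -> List[float]:
    """Minimize a noisy loss with SPSA: two loss measurements per iteration, any dimension."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iters):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # simultaneous +/-1 perturbation
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # Gradient estimate for component i is diff / (2 * c_k * delta_i).
        theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta


noise_rng = random.Random(42)


def profiling_loss(config: List[float]) -> float:
    """Hypothetical noisy loss (e.g. negated, rescaled throughput), best at 8 streams and 4 MB."""
    streams, buffer_mb = config
    return (streams - 8.0) ** 2 + (buffer_mb - 4.0) ** 2 + noise_rng.gauss(0.0, 0.1)


tuned = spsa_minimize(profiling_loss, [1.0, 1.0])
```

In practice the continuous result would be rounded to a whole number of streams and a supported buffer size.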

With the growth of global services, the need for big data analytics across multiple Data Centers (DCs) located in different regions is increasing. Recent attempts to analyze geo-distributed big data cannot guarantee a predictable job completion time and generate excessive inter-DC network traffic. Li et al. (2016) minimized the inter-DC traffic produced by MapReduce jobs by directing geo-distributed big data while keeping job completion time predictable. The authors formulated an optimization problem over the movement of input data and the placement of tasks, and applied chance-constrained optimization so that each MapReduce job completes by a predetermined time with high probability. Several simulations were performed using real traces produced by a series of queries on Hive. According to these experiments, the proposed method reduces inter-DC traffic compared with centralized processing, which gathers all data in a single data center.
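A chance constraint such as P(T ≤ deadline) ≥ 1 − ε becomes an ordinary constraint once a distribution for the completion time T is assumed. The sketch below assumes T is normally distributed (mean + z·std ≤ deadline, with z the (1 − ε) quantile of the standard normal); that assumption, the candidate placements, and their numbers are hypothetical and do not reproduce Li et al.'s model.

```python
from statistics import NormalDist
from typing import List, NamedTuple, Optional


class Placement(NamedTuple):
    """Hypothetical candidate data/task placement and its estimated job statistics."""
    name: str
    inter_dc_traffic_gb: float
    mean_time_s: float   # estimated mean job completion time
    std_time_s: float    # estimated standard deviation of the completion time


def meets_deadline(p: Placement, deadline_s: float, epsilon: float) -> bool:
    """Deterministic equivalent of P(T <= deadline) >= 1 - epsilon when T ~ Normal(mean, std^2)."""
    z = NormalDist().inv_cdf(1.0 - epsilon)
    return p.mean_time_s + z * p.std_time_s <= deadline_s


def choose_placement(cands: List[Placement], deadline_s: float,
                     epsilon: float) -> Optional[Placement]:
    """Minimize inter-DC traffic over the placements that satisfy the chance constraint."""
    feasible = [p for p in cands if meets_deadline(p, deadline_s, epsilon)]
    return min(feasible, key=lambda p: p.inter_dc_traffic_gb, default=None)


candidates = [
    Placement("centralize", 120.0, 300.0, 20.0),
    Placement("local_only", 5.0, 350.0, 40.0),
    Placement("hybrid", 30.0, 320.0, 25.0),
]
chosen = choose_placement(candidates, deadline_s=400.0, epsilon=0.05)
```

With ε = 0.05 (z ≈ 1.645), "local_only" moves the least data but misses the probabilistic deadline, so the lowest-traffic feasible choice is "hybrid".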

Managing and evaluating a large set of criteria is challenging in facility layout design problems. Tayal & Singh (2018) proposed a framework that integrates big data analytics with a hybrid meta-heuristic method to design an efficient facility layout under multi-period stochastic demand. First, the factors affecting the design of the facility layout are identified. Then, big data analysis is used to obtain a reduced set of factors, which is used to model a weighted aggregate objective for the Multi-Objective Stochastic Dynamic Facility Layout Problem (MO-SDFLP). A combination of the Firefly Algorithm (FA) and Chaotic Simulated Annealing (CSA) is applied to solve the MO-SDFLP.
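The two ingredients named above, a weighted aggregate objective and chaotic simulated annealing, can be sketched together. This covers only the CSA part (no firefly stage) and uses one common way of adding chaos: the logistic map with r = 4 replaces uniform random numbers when picking swaps and testing acceptance. The objective (material-handling cost plus a penalty per department moved from the previous period's layout), its weights, and the data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def next_chaotic(x: float) -> float:
    """One step of the logistic map with r = 4 (chaotic regime), kept strictly inside (0, 1)."""
    x = 4.0 * x * (1.0 - x)
    return min(max(x, 1e-12), 1.0 - 1e-12)


def layout_cost(perm: Sequence[int], flow: Matrix, dist: Matrix, prev: Sequence[int],
                w_flow: float, w_move: float) -> float:
    """Weighted aggregate objective: handling cost plus rearrangement penalty."""
    n = len(perm)
    handling = sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))
    moved = sum(1 for i in range(n) if perm[i] != prev[i])
    return w_flow * handling + w_move * moved


def chaotic_sa(flow: Matrix, dist: Matrix, prev: Sequence[int], w_flow: float = 1.0,
               w_move: float = 5.0, t0: float = 50.0, cooling: float = 0.95,
               steps_per_temp: int = 50, t_min: float = 0.01,
               x0: float = 0.123) -> Tuple[List[int], float]:
    """Simulated annealing over department-to-location permutations, driven by chaos."""
    n = len(prev)
    current = list(prev)
    cur_cost = layout_cost(current, flow, dist, prev, w_flow, w_move)
    best, best_cost = list(current), cur_cost
    x, t = x0, t0
    while t > t_min:
        for _ in range(steps_per_temp):
            x = next_chaotic(x)
            i = int(x * n)
            x = next_chaotic(x)
            j = int(x * n)
            if i == j:
                continue
            cand = list(current)
            cand[i], cand[j] = cand[j], cand[i]  # swap the locations of departments i and j
            cand_cost = layout_cost(cand, flow, dist, prev, w_flow, w_move)
            delta = cand_cost - cur_cost
            x = next_chaotic(x)
            if delta <= 0 or x < math.exp(-delta / t):  # Metropolis acceptance test
                current, cur_cost = cand, cand_cost
                if cur_cost < best_cost:
                    best, best_cost = list(current), cur_cost
        t *= cooling
    return best, best_cost


flow = [[0, 2, 0, 9],
        [2, 0, 3, 0],
        [0, 3, 0, 1],
        [9, 0, 1, 0]]
dist = [[abs(a - b) for b in range(4)] for a in range(4)]  # four locations on a line
prev = [0, 1, 2, 3]  # previous-period layout: department i at location i
layout, cost = chaotic_sa(flow, dist, prev)
```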

Table 3 compares the most important strengths and weaknesses of the discussed mechanisms. According to the reviewed articles, search-based algorithms are highly efficient and achieve high-precision results; however, they are not suitable for large-scale data.

Knowledge-based and reasoning

Knowledge-based reasoning is one of the major fields of AI. Within a specified domain, a reasoning system can use its knowledge base to perform better than a human expert. Three selected knowledge-based mechanisms are discussed in this section.

Recently, various classifiers have been developed to classify big data. Extended Belief Rule Base (EBRB) systems have shown their capability for big data and multiclass problems; however, time complexity and computing efficiency are two key challenges of BRB methods. Yang et al. (2018) proposed three improvements to EBRB systems that reduce time complexity and increase computing efficiency for multiclass classification on large data: skipping the rule weight computation, an evidential reasoning (ER) algorithm, and a rule reduction method based on domain division. Moreover, parallel rule generation and inference schemes for the proposed classifier are implemented under Apache Spark. The results show that the improved EBRB classifier obtains good accuracy with better time complexity and computing efficiency than some popular classifiers.
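In belief-rule-based inference, each activated rule contributes a belief distribution over the classes, and the analytical ER algorithm fuses them using the rules' activation weights. The sketch below follows the widely used analytical ER formula; the activation weights are simplified to normalized (rule weight × matching degree), and the matching degrees, rule weights, and beliefs are hypothetical. It is not Yang et al.'s improved algorithm.

```python
from math import prod
from typing import List, Sequence


def activation_weights(matching: Sequence[float], rule_weights: Sequence[float]) -> List[float]:
    """Simplified activation weight of each rule: rule weight times matching degree, normalized."""
    raw = [theta * alpha for theta, alpha in zip(rule_weights, matching)]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else [0.0] * len(raw)


def er_aggregate(weights: Sequence[float], beliefs: Sequence[Sequence[float]]) -> List[float]:
    """Analytical ER: fuse each rule's belief distribution over N classes.

    Assumes at least one rule has a non-zero activation weight.
    """
    n = len(beliefs[0])
    totals = [sum(b) for b in beliefs]
    per_class = [prod(w * b[j] + 1 - w * s for w, b, s in zip(weights, beliefs, totals))
                 for j in range(n)]
    residual = prod(1 - w * s for w, s in zip(weights, totals))
    ignorance = prod(1 - w for w in weights)
    mu = 1.0 / (sum(per_class) - (n - 1) * residual)
    return [mu * (pc - residual) / (1 - mu * ignorance) for pc in per_class]


weights = activation_weights(matching=[0.8, 0.5, 0.1], rule_weights=[1.0, 1.0, 1.0])
fused = er_aggregate(weights, [[0.9, 0.1], [0.3, 0.7], [0.5, 0.5]])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

When every rule's beliefs sum to one, the fused distribution also sums to one, and the predicted class is the one with the highest fused belief (here, class 0).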

Recently, context-aware computing has received increasing attention in IoT and pervasive computing. Context acquisition, context modeling, and context-aware reasoning are its three major steps. However, developing context-aware applications that reason on resource-bounded mobile devices is challenging. Rakib & Uddin (2019) presented a context-aware framework that combines a lightweight rule engine with a wide range of user preferences to reduce the number of rules needed to infer personalized contexts. The authors showed that reducing the associated rules improves the efficiency of the inference engine in terms of accuracy, execution speed, total execution time, and execution cost.
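The idea of shrinking the rule set by user preferences before inference can be sketched with a tiny forward-chaining engine. The rule format, the preference tags, and the context names are hypothetical; the sketch only shows why filtering first means fewer rules are scanned on a constrained device.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Rule:
    premises: FrozenSet[str]
    conclusion: str
    tags: FrozenSet[str] = frozenset()  # hypothetical: user preferences the rule serves


def select_rules(rules: Iterable[Rule], preferences: Set[str]) -> List[Rule]:
    """Drop rules unrelated to the user's preferences; untagged rules are always kept."""
    return [r for r in rules if not r.tags or r.tags & preferences]


def forward_chain(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Fire rules until no new context can be derived (naive forward chaining)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for r in rules:
            if r.conclusion not in known and r.premises <= known:
                known.add(r.conclusion)
                changed = True
    return known


rules = [
    Rule(frozenset({"at_home", "evening"}), "relaxing"),
    Rule(frozenset({"relaxing"}), "suggest_music", frozenset({"music"})),
    Rule(frozenset({"relaxing"}), "suggest_news", frozenset({"news"})),
    Rule(frozenset({"at_office"}), "silent_mode", frozenset({"work"})),
]
active = select_rules(rules, {"music"})
contexts = forward_chain({"at_home", "evening"}, active)
```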

Araújo & Pestana (2017) proposed a novel solution to increase employees' motivation and encourage them to be more active. It works by automatically detecting stressful situations and offering recommendations when a stressful pattern is identified. Two notions of workplace well-being (i.e., physical and social) are combined with gamification methods to analyze how they can help employees acquire the soft and hard skills needed to enhance their curriculum.

Table 4 compares the most significant benefits and drawbacks of the discussed mechanisms. In general, the primary drawbacks of knowledge-based and reasoning mechanisms are the difficulties encountered during knowledge acquisition and their limited adaptability. In addition, these methods are not suited to large quantities of data.

Decision making algorithms

The aim of decision-making algorithms is to maximize expected utility. In these algorithms, the desirability of a state is quantified by a utility function, and the agent chooses the action that maximizes the expected value of that function. The selected decision-making mechanism is discussed below.
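The expected-utility rule can be made concrete in a few lines: for each action, weight the utility of each outcome state by its probability, then pick the action with the largest sum. The actions, outcome probabilities, and utilities below are hypothetical.

```python
from typing import Callable, Dict, Tuple

Distribution = Dict[str, float]  # outcome state -> probability


def expected_utility(outcomes: Distribution, utility: Callable[[str], float]) -> float:
    """EU(a) = sum over states s of P(s | a) * U(s)."""
    return sum(p * utility(s) for s, p in outcomes.items())


def best_action(model: Dict[str, Distribution],
                utility: Callable[[str], float]) -> Tuple[str, float]:
    """Return the action with maximal expected utility, together with that utility."""
    scores = {a: expected_utility(o, utility) for a, o in model.items()}
    action = max(scores, key=scores.__getitem__)
    return action, scores[action]


model = {  # hypothetical actions and outcome probabilities
    "keep_current": {"sla_met": 0.70, "sla_missed": 0.30},
    "scale_out": {"sla_met": 0.95, "sla_missed": 0.05},
}
utility = {"sla_met": 10.0, "sla_missed": -20.0}.__getitem__
action, eu = best_action(model, utility)
```

Here "keep_current" scores 0.7·10 − 0.3·20 = 1.0 and "scale_out" scores 0.95·10 − 0.05·20 = 8.5, so the agent scales out.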

Big data analytics applications need to be re-deployed when changes occur in the Cloud at runtime. Lu et al. (2017) presented a decision-making solution for selecting the most appropriate deployment of big data analytics applications. First, a new language called DepPolicy is presented to specify runtime deployment information as policies. Then, a MiniZinc model is developed to formulate the deployment decision problem as a constraint programming model. Next, a decision-making algorithm is introduced that makes deployment decisions by maximizing total utility while satisfying all given constraints. Finally, a decision-making middleware, called DepWare, is used to deploy the application in the Cloud. The results confirmed the functional correctness, performance, and scalability of the proposed method.
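The "maximize total utility subject to all constraints" step can be illustrated on a toy instance. A constraint solver such as one driven by a MiniZinc model would search this space far more cleverly; here plain enumeration stands in for the solver so the logic is easy to check. The VM catalogue, component memory needs, budget, and the utility (summed performance scores) are all hypothetical, and this is not DepPolicy or DepWare.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# Hypothetical VM catalogue: name -> (memory_gb, hourly_cost, performance_score)
VM_TYPES: Dict[str, Tuple[int, float, float]] = {
    "small": (4, 0.10, 1.0),
    "medium": (8, 0.20, 2.2),
    "large": (16, 0.40, 4.0),
}
# Hypothetical application components -> minimum memory in GB
COMPONENTS: Dict[str, int] = {"ingest": 3, "spark_worker": 12, "dashboard": 2}


def best_deployment(budget: float) -> Tuple[Optional[Dict[str, str]], float]:
    """Enumerate every assignment, keep the feasible ones, return the max-utility plan."""
    names = list(COMPONENTS)
    best: Optional[Dict[str, str]] = None
    best_utility = float("-inf")
    for choice in product(VM_TYPES, repeat=len(names)):
        plan = dict(zip(names, choice))
        # Constraint 1: each component fits in the memory of its VM type.
        if any(VM_TYPES[vm][0] < COMPONENTS[c] for c, vm in plan.items()):
            continue
        # Constraint 2: total hourly cost stays within budget (tolerance for float sums).
        if sum(VM_TYPES[vm][1] for vm in plan.values()) > budget + 1e-9:
            continue
        # Objective: total utility, here simply the summed performance scores.
        utility = sum(VM_TYPES[vm][2] for vm in plan.values())
        if utility > best_utility:
            best, best_utility = plan, utility
    return best, best_utility


plan, utility = best_deployment(budget=0.70)
```

If no assignment satisfies the constraints (for example, a budget of 0.50 here), the function returns `None`, which a middleware would treat as "keep the current deployment".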

Table 5 shows the most significant benefits and drawbacks of the discussed mechanism.

Results and comparisons

The selected AI-driven big data analysis mechanisms were surveyed in the previous section, which described the most important AI-driven big data analysis techniques up to 2021. As mentioned there, machine learning, knowledge-based and reasoning methods, decision-making algorithms, and search methods and optimization theory are the four main categories of big data analytics techniques. The main achievements of these techniques are as follows. First, AI reduces the time taken to perform big data analytics, because repetitive tasks can be handled by machine intelligence. Second, AI-driven big data analytics reduces errors and improves precision.

As shown in Fig. 6 , the most popular technique that researchers use to analyze big data is supervised learning, at 59%. Relevant techniques include regression, ensemble classifiers, naive Bayes, decision trees, random forests, support vector machines (SVMs), and neural networks. Fig. 7 displays the popularity of the various supervised learning techniques in big data analytics and clearly shows that neural networks, SVMs, and decision trees are the most popular ones.

We also evaluate the parameters that affect AI-based big data analysis. The main features of the studied techniques, namely scalability, efficiency, precision, and privacy, are provided in Tables 1–5. Based on the results claimed in the investigated articles, the machine learning-based mechanisms focus on improving the accuracy of big data analytics; however, they have higher complexity and overhead than the other mechanisms. The search-based methods focus on optimization and efficiency but suffer from low scalability for large-scale data. Knowledge-based and reasoning mechanisms achieve high accuracy. Finally, the investigated decision-making algorithm guarantees the scalability, efficiency, and precision metrics.

Figure 8 summarizes the results provided in Tables 1–5. It shows that precision and efficiency receive the most attention, whereas scalability, an important parameter, should be given more consideration in the future. Privacy is another challenging research area that many big data analysis techniques do not consider.

Open issues and challenges

This part discusses some challenges of big data analytics using AI techniques from various perspectives: (1) fog computing; (2) processing huge quantities of data; (3) security; (4) qualitative parameters and metrics; and (5) data quality.

  • Fog computing. The IoT architecture produces large quantities of data that need to be analyzed in real time. Fog computing employs edge devices to provide a considerable amount of computation, storage, and communication locally. More research is recommended on big IoT data analytics using fog computing architectures.
  • Processing huge quantities of data. Big data is produced by numerous, distributed, and heterogeneous sources and has characteristics such as high velocity, huge volume, heterogeneous data formats, incompleteness, and inconsistency. Processing an enormous amount of unstructured, inconsistent, incomplete, and imprecise data is a challenging task, and such data cannot be stored and processed by traditional data processing methods. Various AI techniques must be applied to analyze these quantities of data in real time. Hence, the efficiency and scalability of current analytics algorithms applied to big data must be investigated and improved.
  • Security. Without a secure way to handle the big data collected from various systems and environments, big data analytics cannot be reliable. The security issues of big data analytics should be addressed in several areas, such as protecting IoT devices from attacks, securing AI techniques, and securing communication with external systems. To the best of our knowledge, few studies focus on the security of big data analytics, so investigating security challenges and countermeasures is a promising line of future research.
  • Qualitative parameters and metrics. As shown in this paper, the various AI techniques were applied to different datasets, and the authors used different quality attributes to validate them. Comparing these techniques on the same real-world datasets and the same experimental infrastructure, and assessing them against the various quality attributes, would therefore be very valuable.
  • Data quality. Big data includes huge volumes of semi-structured and unstructured data, such as JSON and text documents. More research focusing on data quality problems for unstructured and semi-structured data formats is needed.

The state-of-the-art mechanisms in the field of big data analytics are surveyed in this article. Based on this study, we introduced a taxonomy for AI-driven big data analytics mechanisms. The 32 selected articles are investigated in four main categories: machine learning, knowledge-based and reasoning methods, decision-making algorithms, and search methods and optimization theory. The advantages and disadvantages of each of these mechanisms have been investigated. The machine learning-based mechanisms use a learning method to adapt automated decisions. Efficiency and precision, the major factors, are improved in most of the machine learning-based mechanisms; however, incomplete and inconsistent data may produce incorrect results. The search-based optimization methods use various objective functions to find an optimal solution among a number of alternatives. These methods have high efficiency and high precision, but they are not scalable enough. The knowledge-based and reasoning mechanisms improve analytics quality using the knowledge base. The major advantage of knowledge-based mechanisms is their relative simplicity of development. Although their coverage of different scenarios is lower, the scenarios they do cover are handled with high accuracy. In decision-making algorithms, a decision-making problem is modeled as a constraint programming problem, and the desirable decision is made by maximizing a utility function. These mechanisms perform well in terms of scalability, efficiency, and precision. Furthermore, this survey introduces some interesting lines for future research.

The data gathered in this paper help explain the state of the art in big data analysis. This survey attempts a detailed systematic study but has some limitations. It does not cover big data analysis techniques that are available in other sources. Furthermore, articles outside the context of big data are not fully investigated. Nevertheless, the results will help researchers develop more effective big data analysis methods for big data environments.

Funding Statement

The authors received no funding for this work.

Additional Information and Declarations

The authors declare there are no competing interests.

Amir Masoud Rahmani performed the experiments, prepared figures and/or tables, and approved the final draft.

Elham Azhir and Omed Hassan Ahmed conceived and designed the experiments, performed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Saqib Ali conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Mokhtar Mohammadi and Sarkar Hasan Ahmed analyzed the data, prepared figures and/or tables, and approved the final draft.

Marwan Yassin Ghafour analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.

Mehdi Hosseinzadeh conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
