
Systematic Review

Sentiment Analysis of Students’ Feedback in MOOCs: A Systematic Literature Review


  • 1 Faculty of Technology, Linnaeus University, Växjö, Sweden
  • 2 Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia

In recent years, sentiment analysis (SA) has gained popularity among researchers in various domains, including the education domain. In particular, sentiment analysis can be applied to review course comments in massive open online courses (MOOCs), which could enable instructors to easily evaluate their courses. This article is a systematic literature review on the use of sentiment analysis for evaluating students’ feedback in MOOCs, exploring works published between January 1, 2015, and March 4, 2021. To the best of our knowledge, this systematic review is the first of its kind. We applied a stepwise PRISMA framework to guide our search process, searching for studies in six electronic research databases (ACM, IEEE, ScienceDirect, Springer, Scopus, and Web of Science). Our review identified 40 relevant articles out of the 440 initially found in the first stage. From the reviewed literature, we found that the research has revolved around six areas: MOOC content evaluation, feedback contradiction detection, SA effectiveness, SA through social network posts, understanding course performance and dropouts, and MOOC design model evaluation. Finally, some recommendations are provided and directions for future research are identified.

Introduction

Recent innovations in digital learning have provided great opportunities to shift learning pedagogies away from conventional lecture methods toward more creative and effective teaching methods. These methods involve learners in collaborative learning and offer open access to course content to learners at a large scale. One such learning method that has received much attention is the Massive Open Online Course (MOOC), whose slogan is: “Education for anyone, anywhere, and any time” (Zemsky, 2014). MOOCs are online courses that offer free access via the Web to a huge number of learners around the world. They introduce interactive user forums that support and encourage collaborative learning and active participation of students (Rabbany et al., 2014). Moreover, their spread and popularity are enabling learners to satisfy their learning expectations and needs in an open, engaging, and distributed manner (Littlejohn et al., 2016; Dalipi et al., 2017). Students’ feedback represents an indispensable source of information that can be used by teachers or educational instructors to enhance learning procedures and training activities. The popularity and importance of students’ feedback have increased especially during the COVID-19 pandemic, when most educational institutions have transitioned from traditional face-to-face learning to online formats. However, due to the nature of the language used by students and the large volume of information expressing their points of view and emotions about different aspects in MOOC forums, dealing with and processing students’ opinions is a complex task. One way to overcome these challenges is by leveraging the advantages of sentiment analysis and opinion mining techniques.

Sentiment analysis, the process of finding sentiment words and phrases that exhibit emotions, has attracted a lot of research attention recently, especially in the education domain in general and in MOOCs in particular (Lundqvist et al., 2020; Onan, 2021). SA systems use natural language processing (NLP) and machine learning (ML) techniques to discover, retrieve, and distill information and opinions from vast amounts of textual information (Cambria et al., 2013).

Sentiments can provide a valuable source of information not only for analyzing students’ behavior towards a course topic, but also for helping higher education institutions improve their policies and practices (Kastrati et al., 2021). From this perspective, the past couple of years have seen an increasing number of publications in which different sentiment analysis techniques, including NLP and deep learning (DL), are successfully used for this purpose (Estrada et al., 2020; Zhou and Ye, 2020).

The main goal of this paper is to critically evaluate the body of knowledge related to sentiment analysis of students’ feedback in MOOCs, by answering research questions through a stepwise framework for conducting systematic reviews. By exploring the current state of knowledge in the field, we also demonstrate that the body of knowledge in educational technology research lacks a comprehensive and systematic review covering studies on sentiment analysis of MOOC learners’ feedback. Therefore, our study tries to fill this gap by analyzing and synthesizing research findings to describe the state of the art and to provide valuable guidelines for new research and development efforts in the field.

Furthermore, the findings derived from this review can serve as a basis and guide for future research and teaching practice, as MOOC-based teaching is becoming an approach widely implemented in the traditional curricula and educational practices of many higher education institutions.

The rest of the paper is organized as follows: Methodology describes the search strategy and methodology adopted in conducting the study. Results and Analysis presents the systematic review study results. Themes identified from the investigated papers are described in Discussion, which also outlines recommendations and future research directions for the development of effective sentiment analysis systems. Lastly, final conclusions are drawn in the Conclusion section.

Methodology

For this systematic literature review (SLR) study, the PRISMA guidelines provided by Liberati et al. (2009) were applied. An SLR is a thorough and comprehensive research method for conducting a literature review in a systematic manner by strictly following well-defined steps. The method is guided by specific research questions, and by being systematic and explicit it reduces biases in the review process. It also involves applying a structured and stepwise approach and designing a research protocol (Petticrew and Roberts, 2006; Staples and Niazi, 2007; Liberati et al., 2009; Onwuegbuzie et al., 2012). As also reported by Fink (2019), a systematic literature review is an organized, comprehensive, and reproducible method. Using these definitions, the main purpose of this study was to:

• report on previous research works on sentiment analysis applications in MOOC setting, and

• provide an exhaustive analysis that could serve as a platform for future opportunities and paths for research and implementation in the field.

With these purposes in mind, the paper identifies and reports the investigated entities/aspects, the most frequently used bibliographical sources, the research trends and patterns, and the scenarios, architectures, techniques, and tools used for performing sentiment analysis in MOOCs.

The following research questions guide this systematic literature review:

• RQ1 . What are the various techniques, tools, and architectures used to conduct sentiment analysis in MOOCs discussion forums?

• RQ2 . In what scenarios and for what purpose is the sentiment analysis performed in the selected papers?

Search Strategy and Data Collection

The JabRef® reference management software facilitated the article search and selection following the PRISMA approach. To ensure that all relevant studies were collected and reviewed, the search strategy involved a stepwise approach consisting of four stages. The overall search strategy process is shown in Figure 1.


FIGURE 1 . Implemented PRISMA search methodology.

The first stage entails the development of a research protocol by determining the research questions, defining the search keywords, and identifying the bibliographic databases for performing the search. For the search, the following online research databases and engines were systematically examined: ACM DL, IEEE Xplore, ScienceDirect, Scopus, SpringerLink, and Web of Science. In total, the first stage yielded 440 articles; after all duplicates were removed, this produced a reduced list of 359 articles to be processed in the subsequent screening stage.

The keywords used in this study were driven by the PICO framework and are shown in Table 1. PICO (Population, Intervention, Comparison, and Outcome) is aimed at helping researchers design a comprehensive set of search keywords for quantitative research (Schardt et al., 2007). As suggested by Gianni and Divitini (2015), a Context section was added to the PICO schema to avoid missing potentially relevant articles. Table 2 presents the final search keywords associated with PICO(C) used in the study.


TABLE 1 . PICO(C) driven keywords framing.


TABLE 2 . Search string (Query).

First, adequate keywords were identified for all sections of PICO(C) in Table 1; these were then combined into a self-constructed search string using Boolean operators, as shown in Table 2. To ensure that potentially relevant articles would not be omitted from the study, the Context section was added as a separate feature.
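
As an illustration, a PICO(C)-driven search string of this kind can be assembled programmatically. The sketch below is a minimal Python example; the keyword groups are hypothetical placeholders, not the exact terms from Tables 1 and 2.

```python
# Minimal sketch of assembling a PICO(C)-style search string with Boolean
# operators. The keyword groups below are illustrative placeholders only.
picoc_keywords = {
    "Population": ["MOOC", "massive open online course"],
    "Intervention": ["sentiment analysis", "opinion mining"],
    "Outcome": ["student feedback", "learner feedback"],
    "Context": ["education", "e-learning"],
}

def build_query(groups: dict) -> str:
    """Join synonyms with OR inside each group and join groups with AND."""
    clauses = []
    for terms in groups.values():
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)

print(build_query(picoc_keywords))
# e.g. (MOOC OR "massive open online course") AND ("sentiment analysis" OR ...) AND ...
```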

Screening refers to stage 2 of the search strategy process and involves the application of inclusion criteria. At this stage, the relevant studies were selected based on the following criteria: 1) the type of publication needs to be a peer-reviewed journal or conference paper, 2) papers should have been published between 2015 and 2021, and 3) papers should be in English. After applying these criteria, out of 359 papers a total of 110 records were accepted as relevant studies for further exploration. The authors agreed to encode the data using three different colors: 1) green—papers that passed the screening threshold, 2) red—papers that did not pass the screening threshold, and 3) yellow—papers that the authors were unsure how to classify (green or red). For such papers, a comprehensive discussion between the authors took place, and once a consensus was reached, those papers were classified into either the green or red category.
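
A minimal sketch of how such inclusion criteria can be applied automatically to exported records is given below; the record structure and field names are assumptions for illustration, not the authors’ actual export format.

```python
# Illustrative screening-stage filter over hypothetical bibliographic records.
records = [
    {"title": "SA of MOOC reviews", "year": 2019, "type": "journal", "language": "English"},
    {"title": "Older workshop paper", "year": 2013, "type": "workshop", "language": "English"},
]

def passes_screening(rec: dict) -> bool:
    """Apply the three stage-2 inclusion criteria described above."""
    return (
        rec["type"] in {"journal", "conference"}
        and 2015 <= rec["year"] <= 2021
        and rec["language"] == "English"
    )

screened = [r for r in records if passes_screening(r)]
print(len(screened))  # -> 1 (only the first record passes)
```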

Stage 3, which corresponds to eligibility in Figure 1, eliminated studies that were not explicitly 1) within the context of MOOCs or 2) concerned with sentiment analysis. At this stage, all titles, abstracts, and keywords were examined to determine the relevant records for the next stage. After applying these criteria, only 49 papers were considered eligible for further investigation in the last stage of analysis.

Moreover, after carefully reading the eligible papers, it was found that three of the 49 papers lacked full text, and another six papers were either review papers or only employed tools without providing rich information on the algorithmic applications for sentiment analysis. Those papers were therefore also excluded, which decreased the number of eligible papers to 40.

Limitations

When assessing this systematic literature review, several factors need to be considered, since they can potentially limit the validity of the findings. These factors include:

• Only papers written in English were selected for the study. While searching the research databases, we found related articles in other languages, such as Chinese and Spanish. Those articles are not included.

• The study includes papers collected from the six digital research databases shown in Figure 1. Thus, we might have missed papers indexed only in other digital libraries.

• Only peer-reviewed journal articles, conference papers, and book sections were selected for this study. Studies that are not peer reviewed are not included.

• Only works published between January 1, 2015, and March 4, 2021, are selected in this study. We note that conference papers presented before March 4, 2021, but not yet published by the cut-off date were not included in our literature review.

Results and Analysis

After determining the core set of eligible papers, both quantitative and qualitative analyses of the data were performed. In the quantitative approach, the findings were categorized by publication year, venue, publication type, and geographic region of the authors, as well as by the techniques, architectures, algorithms, and tools used. For the qualitative analysis, an open-coding content analysis method, as described by Braun and Clarke (2006), was used. This technique comprises two phases: first, reading all papers to extract themes, and second, classifying the identified themes. Figure 2 below shows the analysis process.


FIGURE 2 . Analysis process of the relevant contributions.

Quantitative Analysis

We conducted the quantitative analysis to answer the first research question, which deals with the techniques, tools, and architectures used to conduct sentiment analysis in MOOC discussion forums. Figure 3 presents the relevant studies distributed according to year and database source. From the figure, it can be observed that the database with the most relevant selected studies is IEEE Xplore with 13 studies, followed by Scopus with 8 studies. Moreover, as can be seen from Figure 4, which illustrates the distribution of conference and journal papers, there has been an increasing trend of research works appearing in journals in the last 2 years, whereas in previous years most of the studies were published at conferences.


FIGURE 3 . Distribution of studies in academic databases.


FIGURE 4 . The number of collected conference and journal papers in 2015–2021.

Based on the country of origin of the first author, most of the works are from Asia with 17 papers, followed by Europe with 10 papers and North America with 8 papers. Within Asia, most of the studies are from China. Figure 5 shows the distribution by country.


FIGURE 5 . The number of collected papers across different regions/countries of first author.

The techniques used to conduct sentiment analysis in MOOCs can be categorized into four main groups, namely supervised, unsupervised, and lexicon-based approaches, and statistical analysis. Table 3 presents a clustering of papers based on the learning approaches (techniques) that the authors applied. In total, 21 papers used supervised, unsupervised, or lexicon-based techniques, or a combination of the three. Nine papers used statistical analysis, while the rest of the papers did not explicitly specify the technique.


TABLE 3 . Papers based on used technique/learning approach.

In Table 4 , the most frequently used supervised learning algorithms are shown. As can be seen, Neural Networks (NN) and Naïve Bayes (NB) were used most often in the reviewed studies, followed by Support Vector Machines (SVM) and Decision Tree (DT) algorithms.


TABLE 4 . Most frequently used supervised learning algorithms.
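
As an illustration of how such supervised approaches are typically applied to learner feedback, the sketch below trains a Naïve Bayes and a linear SVM classifier with scikit-learn; the toy comments and labels are invented for illustration and do not come from any of the reviewed studies.

```python
# Minimal scikit-learn sketch of supervised sentiment classifiers of the kind
# reported in Table 4 (Naive Bayes and a linear SVM) on toy MOOC comments.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

comments = [
    "The lectures were clear and the pace was great",
    "I loved the quizzes and the examples",
    "The videos were too long and boring",
    "Assignments were confusing and poorly explained",
]
labels = ["positive", "positive", "negative", "negative"]

for clf in (MultinomialNB(), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    model.fit(comments, labels)
    print(type(clf).__name__, model.predict(["great course, clear videos"]))
```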

Table 5 lists the use of lexicon-based approaches, also known as rule-based sentiment analysis. The most frequently used lexicon among the reviewed articles is VADER (Valence Aware Dictionary and Sentiment Reasoner), followed by TextBlob and SentiWordNet.


TABLE 5 . Most frequently used lexicons.
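
For illustration, a minimal lexicon-based example using the VADER analyzer (assuming the vaderSentiment Python package is installed) is sketched below; the sample comment and the thresholding convention are illustrative choices.

```python
# Minimal rule/lexicon-based sentiment scoring with VADER.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
comment = "The course content was interesting, but the assignments were frustrating."
scores = analyzer.polarity_scores(comment)
print(scores)  # e.g. {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}

# A common convention is to threshold the compound score into a polarity label.
compound = scores["compound"]
label = "positive" if compound >= 0.05 else "negative" if compound <= -0.05 else "neutral"
print(label)
```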

Regarding architectures, ML, DL, and NLP approaches were represented in the reviewed articles. Figure 6 illustrates that NLP and DL have been used most often from 2020 onwards; overall, NLP is used in seven papers, followed by DL with five papers.


FIGURE 6 . Distribution of architectures during 2015–2021.

Figure 7 below shows the findings reviewed in the study with respect to the most frequently used packages, tools, libraries, etc. for the sentiment analysis task in MOOCs.


FIGURE 7 . Tools/packages/libraries used for sentiment analysis in the reviewed papers.

As presented in the figure, the most popular solution for conducting sentiment analysis is R, which was used in four studies. StanfordNLP, NLTK, spaCy, edX-CAS, WAT, and TAALES form the second most frequently used group of solutions, each appearing in two different articles. The third group is composed of a variety of solutions that appear only once across the reviewed articles.

Qualitative Analysis

To answer the second research question, the process continued with the strategy described by Braun and Clarke (2006). This encompasses an inductive thematic approach to identify common themes across the articles. The process involves six phases: familiarizing with the data, generating initial codes, searching for themes, reviewing themes, defining themes, and naming themes. Familiarization with the literature was achieved during screening. The authors then inductively coded the manuscripts. The codes were collected in an Excel file to prepare for the subsequent steps. Next, the codes were grouped and consolidated in order to identify themes. Upon final agreement on the themes and their definitions, a narrative was built through independent and collaborative writing and reviewing, following the recommendations of Lincoln and Guba (1985) and Creswell and Miller (2000). The overall process resulted in six themes, each discussed in detail in the Discussion section. A summary of this assessment is presented in Table 6.


TABLE 6 . Summary of identified themes.

Discussion

In this section, the types and trends of research conducted within each of the previously identified themes are explored and discussed. Finally, recommendations and suggestions for addressing the identified challenges are provided.

MOOC Content Evaluation

In order to create relevant and useful insights for MOOC content development, course designers and learning analytics experts need to process and analyze a complex set of unstructured learner-generated data from MOOC discussion forums. Course content evaluation via sentiment analysis approaches can provide substantial indications to instructional designers and teachers, enabling them to periodically evaluate courses and introduce potential improvements.

In a study with a small sample of 28 students, the learners had a positive attitude and perception towards the quality of the MOOC content (88.6%). Moreover, the text-mining-based evaluation of the content conducted in the study also confirmed high satisfaction with the MOOC content. Here, the positive features included “interesting,” “easy,” and “duration of video is appropriate” (Au et al., 2016).

Dina et al. (2021) explored the performance of a quantitative (SA-based) model to measure user preferences regarding course content. The sentiment classification was performed using a Support Vector Machine, and the accuracy, precision, recall, and F1 score were all above 80%. Some of the positive features produced by this model were “course-good,” “course-interesting,” “course-easy,” “course-understand,” “course-recommended,” and “material-good.” In another case study, a learner decision journey framework was proposed to analyze MOOC content development, to understand the circular learning process, and to generate further insights for course improvements (Lei et al., 2015). The study showed the presence of posts with significant positive sentiment scores during the entire course, meaning that learners were positive towards the content and towards completing the course.

An application framework of an intelligent system for learner emotion recognition in MOOCs was proposed by Liu et al. (2017a), where obtaining the learners’ emotion-topic feedback about content proved instrumental for teachers in analyzing and improving their teaching pedagogy. Furthermore, an analysis of the sentiments of MOOC learners’ posts via a deep learning approach was conducted by Li et al. (2019). The experiments in this study revealed that the approach could be effectively used to identify content-related problems and to improve educational outcomes. In contrast to the lexicon-based approaches that were also evaluated in the study, deep learning models could further reduce the effort of constructing sentiment dictionaries, among other benefits.

Review (Feedback) Contradiction Analysis

Although learner-generated reviews and opinions have great practical relevance to educators and instructional designers, learners’ comments sometimes tend to be contradictory (positive vs. negative), which makes it difficult for teachers to understand them. One possible explanation for such contradiction is that MOOC learners are quite heterogeneous, with different educational backgrounds, knowledge, and motivations (Nie and Luo, 2019). Moreover, in large-scale comments, negative opinions and emotions in particular can spread faster than positive ones (Pugh, 2001), and these could lead to dropouts. Only three studies were found to focus on the contradiction analysis of MOOC reviews (Badache and Boughanem, 2014; Liu et al., 2017a; Kastrati et al., 2020).

An experimental study on the detection of contradictory reviews in Coursera based on sentiment analysis around specific aspects was conducted by Badache and Boughanem (2014). Before extracting particular aspects according to the distribution of the emotional terms, the reviews were first grouped according to the session. Then, the polarity of each review segment holding an aspect was identified. The results of experiments with 2,244 courses and 73,873 reviews revealed the effectiveness of the proposed approach towards isolating and quantifying contradiction intensity. Another aspect-based sentiment analysis framework, tested and validated on a Coursera dataset, was proposed by Kastrati et al. (2020). The researchers achieved a high performance score (F1 = 86.13%) for aspect category identification, which demonstrates the reliability and comprehensiveness of the proposed framework.

Other scholars recommended a generative probabilistic model that extends Sentence-LDA (Latent Dirichlet Allocation) to explore negative opinions in terms of pairs of emotions and topics (Liu et al., 2017a). With this model, the detection precision of negative topics reached an acceptable rate of 85.71%. The negative comments mainly revolved around learning content, online assignments, and course certificates.
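
To illustrate the general idea of topic-based exploration of comments, the sketch below runs plain LDA with gensim over a handful of toy comments; note that this is standard LDA rather than the Sentence-LDA extension discussed above, and the corpus is invented for illustration.

```python
# Sketch of plain LDA topic modelling with gensim over toy course comments.
from gensim import corpora
from gensim.models import LdaModel

comments = [
    "assignments were graded too slowly",
    "the certificate process was confusing",
    "video lectures were clear and helpful",
    "grading of assignments felt unfair",
]
tokenized = [c.lower().split() for c in comments]

# Build a dictionary and a bag-of-words corpus, then fit a 2-topic LDA model.
dictionary = corpora.Dictionary(tokenized)
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenized]
lda = LdaModel(bow_corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)

for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```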

SA Effectiveness

The evaluation of the effectiveness of sentiment analysis models was a key focus of many of the reviewed papers, especially those published after 2019. This could be due to the recent trend of making datasets available and to the goals of MOOC providers, since sentiment analysis techniques can shed more light on how to improve enrollment and the learning experience. During 2015 and 2016, most of the works utilized clustering models to group similar MOOC discussion forum posts, along with topic modeling to capture the topical themes (Ezen-Can et al., 2015). The main reason behind some works was also to increase the satisfaction of teachers who themselves attend MOOCs to support their own professional development (Koutsodimou and Jimoyiannis, 2015; Holstein and Cohen, 2016).

However, most of the identified research papers that evaluated the effectiveness of sentiment analysis models were published during 2019 and 2020 (Cobos et al., 2019a; Cobos et al., 2019b; Yan et al., 2019; Capuano and Caballé, 2020; Capuano et al., 2020; Estrada et al., 2020; Hew et al., 2020; Onan, 2021). Cobos et al. (2019a, 2019b) compared and measured the effectiveness of machine learning (SVM, NB, ANN) and NLP approaches (VADER, TextBlob) to extract features and perform text analysis; their prototype was based on a content analyser system for edX MOOCs. Another group of researchers conducted a relevant study by applying unsupervised natural language processing techniques to explore students’ engagement in Coursera MOOCs (Yan et al., 2019). They evaluated the performance of LDA and LSA (Latent Semantic Analysis) topic modelling to discover the emerging topics in discussion forums and to investigate the sentiments associated with the discussions.

After 2019, along with machine learning and natural language processing techniques (Hew et al., 2020), researchers started to use and measure the effectiveness of deep learning architectures for sentiment analysis on MOOCs, which exhibit improved performance compared with conventional supervised learning methods (Capuano and Caballé, 2020; Capuano et al., 2020; Estrada et al., 2020; Onan, 2021). The most widely used deep learning approaches are CNN (Convolutional Neural Networks), LSTM (Long Short-Term Memory), BERT (Bidirectional Encoder Representations from Transformers), and RNN (Recurrent Neural Networks).
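
As a minimal sketch of the kind of deep learning classifier referred to here, the Keras model below uses an embedding layer followed by an LSTM; the vocabulary size, layer dimensions, and toy training data are arbitrary illustrative choices, not taken from any reviewed study.

```python
# Minimal Keras embedding + LSTM binary sentiment classifier on toy data.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, max_len = 1000, 20
model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 32),       # integer word ids -> dense vectors
    layers.LSTM(16),                         # sequence encoder
    layers.Dense(1, activation="sigmoid"),   # positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy integer-encoded "comments" and labels, just to show the training call.
x = np.random.randint(1, vocab_size, size=(8, max_len))
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])
model.fit(x, y, epochs=2, verbose=0)
print(model.predict(x[:2], verbose=0))
```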

SA Through Social Network Posts

Research has demonstrated that social networking sites can significantly impact the interaction of learners with courses (Georgios Paltoglou, 2012). With the growing popularity of social networking, sentiment analysis has been applied to social networks and microblogging sites, especially Twitter and blogs (Hong and Skiena, 2010; Miller et al., 2011). However, the texts published on social networks are largely scattered and unstructured. Therefore, many researchers have adopted various social media mining approaches to investigate the sentiments of Twitter messages related to MOOC learning (Shen and Kuo, 2015; Buenaño-Fernández et al., 2017). The main goal of these studies was to explore students’ tweets (positive and negative) about the course, and to evaluate instructors and the educational tools used in the course. Lundqvist et al. (2020) employed sentiment analysis, using the VADER algorithm, to investigate online comments about MOOCs; VADER includes sentiment ratings from 90,000 social-media-based posts. The analysis revealed a correlation between the sentiments of the posts and the feedback provided about the MOOC. Moreover, 78% of students were positive towards the MOOC structure. Almost all identified papers used Twitter to explain insights about MOOCs from social media platforms. Future investigations may also consider other platforms, such as Facebook or YouTube, and compare the findings with those obtained for Twitter.

Understanding Course Performance and Dropouts

The major challenge of MOOCs is massive dropout, or low retention (Chen et al., 2020). Alongside factors such as demographic characteristics, interaction, self-reported motivation, and commitment attitudes, that paper stresses that learners’ lack of self-regulation might create critical drop-off points that need to be overcome promptly for learners to benefit from MOOCs.

One way to predict prospective dropouts is to analyse learners’ reactions with SA and to extract the keywords revealing that dropouts are predominantly related to course performance. Such analysis was performed in more detail in five of the eligible papers (Crossley et al., 2015; Dowell et al., 2015; Crossley et al., 2016; Lubis et al., 2016; Nissenson and Coburn, 2016), showing that many researchers have been intrigued by poorer course performance and decreased interest in persisting in the course. Three of them concentrate on discussion forums (Crossley et al., 2015; Dowell et al., 2015; Crossley et al., 2016). While Crossley et al. (2015) use the language of the discussion forums as a predictive feature of successful class completion, Crossley et al. (2016) also examine online clickstream data alongside the language, and Dowell et al. (2015) additionally examine the social position of learners as they interact in a MOOC. The last two papers that investigate language to understand learners’ performance and dropouts focus mainly on the attributes that contribute towards predicting successful course completion (Lubis et al., 2016; Nissenson and Coburn, 2016). Both extract only the attributes that reflect learners’ satisfaction, rather than the factors that might keep learners from continuing their studies in the MOOCs. Lubis et al. (2016) is even more optimistic and never explicitly mentions dropouts, which is encouraging given that the analysis was performed over 20,000 reviews crawled from the Class Central website covering 1,900 topics.

The general objective of this cluster of papers is to analyse sentiment by examining the language used in learners’ contributions. Depending on their research hypotheses, the attributes used to explore learners’ opinions vary from moderately pessimistic to very optimistic. Undoubtedly, more papers implementing the same approach will contribute to increasing the impact of MOOCs on education and to minimizing the risk of premature dropout.

MOOC Design Model Evaluation

As elaborated in MOOC Content Evaluation, the evaluation of MOOC content is crucial for the evolution of MOOCs, since it determines and proposes the improvements necessary to extend the MOOC lifecycle. Somewhat unexpectedly, several of the surveyed papers suggested improving the design model as a complementary element that is essential to keep a MOOC active and prosperous. First, they note that there are many differences in the language used in MOOC-supported online classes and in real classes (Rahimi and Khosravizadeh, 2017); the distinction covers both text and speech analysis. Going further, Qi and Liu (2021) propose LDA for mining student-generated reviews, with the ultimate aim of objectively and accurately evaluating indicators that provide reliable references for both students and educators. Based on established means for text mining and sentiment analysis and on thorough processing of the results, reorganization of the design model can begin; such a strategy is proposed by Lee et al. (2016), who introduce 11 design criteria for organizing the model and examine MOOC characteristics and their impact on instructor and learner satisfaction.

The last two papers from this cluster are topic specific. Liu (2016) explores a new model based on English for Specific Purposes for a metallurgical English course. To strengthen the approach, the author suggests a symbiosis between MOOCs and flipped classrooms, in light of the course purpose, content, teaching organization, and, finally, teachers’ evaluation. By creating synergy between the two teaching methodologies, the course is expected to advance significantly. O’Malley et al. (2015) go one step further and suggest reconstructing a MOOC into a virtual laboratory using video and simulations. This is an ongoing project intended to adapt an online delivery format for a campus-based first-year module on physical chemistry at the University of Manchester. The experience of merging a MOOC with a virtual laboratory proved its efficiency: improvement of the content requires improvement of the design model.

On many occasions, the improvement of a product means an improvement of the technology that enables it, and the last theme of this survey supports this claim. Improvement can be achieved by adding new features, such as the flipped classroom (Liu, 2016) and the virtual laboratory (O’Malley et al., 2015). Such extensions should be made steadily and carefully to avoid the risk of ruining the product, and to enable them it is essential to maintain the existing features, which can be assessed by applying the design criteria (Lee et al., 2016). However, all improvements must be appreciated by their end users, the learners and the teachers; this evaluation includes SA performed using the techniques proposed in (Rahimi and Khosravizadeh, 2017; Qi and Liu, 2021). Last but not least is to support the philosophy of continuous improvement, which returns the sentiment analysis to the first theme, MOOC content evaluation, and then continues with all the remaining themes, creating a never-ending lifecycle for the evaluation of MOOCs.

Recommendations and Future Research Avenues

When considering the MOOC content evaluation studies documented in our reviewed sample, overall there is a favorable rating of course content among learners. As can be seen from the above discussion, most research on MOOC content evaluation is focused on learner feedback; however, future scholars could also consider investigating teachers’ feedback and perspectives towards content development, teaching pedagogy, experience, and assessment, among others. Moreover, it would also be interesting to explore the results provided by sentiment analysis techniques in collaboration with the instructors of the MOOC course, to determine whether their proposed materials could be improved.

Throughout the reviewed papers, imbalanced datasets with underrepresented categories were evident. Therefore, one recommendation for researchers seeking performance improvements is to apply data augmentation techniques. Classifier performance can also be improved by adopting more advanced word representation approaches, such as contextualized embeddings, as well as classical NLU (Natural Language Understanding) techniques, such as part-of-speech tagging and parsing.
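
As a toy illustration of the data augmentation idea, the sketch below generates extra variants of a minority-class comment by randomly swapping and dropping words; real studies may prefer richer techniques such as synonym replacement or back-translation, and the example text is invented.

```python
# Toy text augmentation (random word swap and deletion) for rebalancing an
# underrepresented sentiment class.
import random

def augment(text: str, n_swaps: int = 1, p_delete: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = text.split()
    # Randomly swap pairs of words to create a slightly perturbed variant.
    for _ in range(n_swaps):
        if len(words) > 1:
            i, j = rng.sample(range(len(words)), 2)
            words[i], words[j] = words[j], words[i]
    # Randomly drop words, but keep very short texts intact.
    words = [w for w in words if rng.random() > p_delete or len(words) <= 3]
    return " ".join(words)

minority_example = "the assignments were confusing and the feedback was late"
print(augment(minority_example, n_swaps=2))
```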

Furthermore, exploring the relationship between polarity markers and other feeling labels or emotions could help better identify and address issues related to the target subject, as has been studied in many relevant text-based emotion detection works (Acheampong et al., 2020).

A considerable number of the reviewed papers failed to report how the results were standardized in terms of participant numbers and characteristics, course subject and context, accuracy, and the metrics of SA approaches. Hence, we consider that special focus should be placed on enhancing the transparency of research results. This would be beneficial to other researchers when conducting comparative performance analyses between various sentiment analysis approaches.

Some of the studies related to the recognition of polarities and emotions in MOOCs were conducted in laboratory settings and utilize a limited set of algorithmic solutions and techniques. More standardized investigations need to be conducted with students, using more algorithms with different configurations of hyper-parameters and layers. In this way, standardization will contribute to assuring the quality, safety, and reliability of the solutions and techniques designed for sentiment analysis in MOOC learning environments. In addition, there is a lack of standardized datasets available for the evaluation of sentiment analysis models in MOOCs. Most researchers have used publicly available datasets from Coursera, edX, and FutureLearn, and even datasets from their own institutions (Ezen-Can et al., 2015; Moreno-Marcos et al., 2018; Cobos et al., 2019a; Yan et al., 2019; Estrada et al., 2020; Lee et al., 2020). The absence of standardized datasets plays a negative role when benchmarking or comparing the algorithmic solutions of different researchers. It is also worth mentioning that researchers used datasets predominantly from computer science courses to evaluate and explore sentiment analysis of students’ feedback in MOOCs (Moreno-Marcos et al., 2018; Estrada et al., 2020; Lee et al., 2020; Lundqvist et al., 2020). Thus, the research is mainly limited to one academic field.

It was also observed that the reviewed research papers have not taken into consideration different types of MOOCs, such as cMOOCs, xMOOCs, or sMOOCs. In the future, sentiment analysis of students’ feedback should also consider different types of MOOCs.

In addition, if enough suitable (standardized) datasets were available, it would be interesting to introduce more precise research questions and to attempt a meta-analysis, or even an advanced systematic quantitative literature review involving more complex statistical operations. This could serve as an insightful idea for future work.

Conclusion

Although introduced almost 75 years ago, sentiment analysis has recently become a very popular tactic for gathering and mining subjective information from the end users of various services. By implementing popular NLP, statistical, and ML techniques, sentiment analysis has grown into a cost-effective tool for distilling the sentiment patterns that reveal the potential challenges of existing services while, at the same time, identifying new opportunities and improvements. Its extensive implementation has contributed to increased accuracy and efficiency wherever it has been used.

The use of sentiment analysis techniques to understand students’ feedback in MOOCs represents an important factor in improving the learning experience. Moreover, sentiment analysis can also be applied to improve teaching by analyzing learners’ behavior towards courses, platforms, and instructors.

To evaluate these claims, a PRISMA-directed systematic review of the most recent and most influential scholarly publications has been conducted. The review applied an exhaustive quantitative and qualitative stepwise filtering of the initial corpus of 440 articles that fulfilled the search criteria associated with PICO(C). Together with the briefly introduced methodology, search strategy, and data selection, the authors have also addressed the potential limitations of the approach. After these introductory sections, the paper thoroughly presents the quantitative results for the 40 relevant papers, starting from the process of analysing the relevant contributions, their distribution across academic databases, and their annual and geographical distribution, then giving an overview of the implemented sentiment analysis techniques, supervised learning algorithms, and lexicons, and ending with the distribution of architectures and of the tools, packages, and libraries used for sentiment analysis in the reviewed papers. It is worth mentioning that from 2019 onwards researchers have started to apply deep learning in combination with NLP approaches to analyze the sentiments of students’ comments in MOOCs.

The qualitative analysis identified the following six major themes in the reviewed papers: MOOC content evaluation, review (feedback) contradiction analysis, SA effectiveness, SA through social network posts, understanding course performance and dropouts, and MOOC design model evaluation. As part of this analysis, each theme was carefully presented and illustrated with the corresponding filtered references that fulfil all the criteria.

We believe that this work can be a good inspiration for future research and that it will provide readers with useful information, in a wide context, about the current trends, challenges, and future directions in the field.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author Contributions

All authors listed have agreed on the design of this study and have performed literature reading and review of the relevant papers. Project administration, methodology, data abstraction, processing, and analysis were conducted by FD. FD and KZ contributed to the writing and editing of the original draft. FA was involved in reading, editing, and providing constructive feedback on the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Acheampong, F. A., Wenyu, C., and Nunoo-Mensah, H. (2020). Text-based Emotion Detection: Advances, Challenges, and Opportunities. Eng. Rep. 2, e12189. doi:10.1002/eng2.12189


Au, C. H., Lam, K. C. S., Fung, W. S. L., and Xu, X. (2016). “Using Animation to Develop a MOOC on Information Security,” in 2016 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM) , December 4-7, 2016 , Bali , 365–369. doi:10.1109/IEEM.2016.7797898

Badache, I., and Boughanem, M. (2014). “Harnessing Social Signals to Enhance a Search,” in Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) , August 11-14, 2014 , Washington, DC, USA , 303–309. WI-IAT ’14. doi:10.1109/wi-iat.2014.48

Braun, V., and Clarke, V. (2006). Using Thematic Analysis in Psychology. Qual. Res. Psychol. 3 (2), 77–101. doi:10.1191/1478088706qp063oa

Buenaño-Fernández, D., Luján-Mora, S., and Villegas-Ch, W. (2017). “Application of Text Mining on Social Network Messages about a MOOC,” in ICERI2017 Proceedings , November 16-18, 2017 , Seville, Spain , 6336–6344.


Cambria, E., Schuller, B., Xia, Y., and Havasi, C. (2013). New Avenues in Opinion Mining and Sentiment Analysis. IEEE Intell. Syst. 28 (2), 15–21. doi:10.1109/mis.2013.30

Capuano, N., and Caballé, S. (2020). “Multi-attribute Categorization of MOOC Forum Posts and Applications to Conversational Agents,” in Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2019 . Lecture Notes in Networks and Systems . Editors L. Barolli, P. Hellinckx, and J. Natwichai (Cham: Springer ), 96, 505–514. doi:10.1007/978-3-030-33509-0_47

Capuano, N., Caballé, S., Conesa, J., and Greco, A. (2020). Attention-based Hierarchical Recurrent Neural Networks for MOOC Forum Posts Analysis. J. Ambient Intell. Hum. Comput 2020, 1–13. doi:10.1007/s12652-020-02747-9

Chen, C., Sonnert, G., Sadler, P. M., Sasselov, D. D., Fredericks, C., and Malan, D. J. (2020). Going over the Cliff: MOOC Dropout Behavior at Chapter Transition. Distance Educ. 41 (1), 6–25. doi:10.1080/01587919.2020.1724772

Cobos, R., Jurado, F., and Villén, Á. (2019a). “Moods in MOOCs: Analyzing Emotions in the Content of Online Courses with edX-CAS,” in 2019 IEEE Global Engineering Education Conference (EDUCON) , April 9-11, 2019 , Dubai, UAE . doi:10.1109/educon.2019.8725107

Cobos, R., Jurado, F., and Blázquez-Herranz, A. (2019b). A Content Analysis System that Supports Sentiment Analysis for Subjectivity and Polarity Detection in Online Courses. IEEE R. Iberoam. Tecnol. Aprendizaje 14, 177–187. doi:10.1109/rita.2019.2952298

Creswell, J. W., and Miller, D. L. (2000). Determining Validity in Qualitative Inquiry. Theor. into Pract. 39 (3), 124–130. doi:10.1207/s15430421tip3903_2

Crossley, S., McNamara, D. S., Baker, R., Wang, Y., Paquette, L., Barnes, T., and Bergner, Y. (2015). “Language to Completion: Success in an Educational Data Mining Massive Open Online Class,” in Proceedings of the 7th Annual Conference on Educational Data Mining [EDM2015] , June 26-29, 2015 , Madrid, Spain .

Crossley, S., Paquette, L., Dascalu, M., McNamara, D. S., and Baker, R. S. (2016). “Combining Click-Stream Data with NLP Tools to Better Understand MOOC Completion,” in Proceedings of the Sixth International Conference on Learning Analytics and Knowledge (New York: ACM ), 6–14. LAK ’16. doi:10.1145/2883851.2883931

Dalipi, F., Kurti, A., Zdravkova, K., and Ahmedi, L. (2017). “Rethinking the Conventional Learning Paradigm towards MOOC Based Flipped Classroom Learning,” in Proceedings of the 16th IEEE International Conference on Information Technology Based Higher Education and Training (ITHET) , July, 10-12 2017 , Ohrid, North Macedonia , 1–6. doi:10.1109/ITHET.2017.8067791

Dina, N., Yunardi, R., and Firdaus, A. (2021). Utilizing Text Mining and Feature-Sentiment-Pairs to Support Data-Driven Design Automation Massive Open Online Course. Int. J. Emerging Tech. Learn. (Ijet) 16 (1), 134–151. doi:10.3991/ijet.v16i01.17095

Dowell, N. M., Skrypnyk, O., Joksimovic, S., et al. (2015). “Modeling Learners’ Social Centrality and Performance through Language and Discourse,” in Proceedings of the 8th International Conference on Educational Data Mining , June, 26-29 2015 , Madrid, Spain , 250–257.

Barrón Estrada, M. L., Zatarain Cabada, R., Oramas Bustillos, R., Graff, M., and Raúl, M. G. (2020). Opinion Mining and Emotion Recognition Applied to Learning Environments. Expert Syst. Appl. 150, 113265. doi:10.1016/j.eswa.2020.113265

Ezen-Can, A., Boyer, K. E., Kellogg, S., and Booth, S. (2015). “Unsupervised Modeling for Understanding MOOC Discussion Forums: a Learning Analytics Approach,” in Proceedings of the International Conference on Learning Analytics and Knowledge (LAK’15) , June, 26-29 2015 , Madrid, Spain .

Fink, A. (2019). Conducting Research Literature Reviews: From the Internet to Paper . Fifth edition. UCLA, California: Sage Publications .

Georgios Paltoglou, M. T. (2012). Twitter, MySpace, Digg: Unsupervised Sentiment Analysis in Social Media. ACM Trans. Intell. Syst. Technol. 3 (4), 1–9. doi:10.1145/2337542.2337551

Gianni, F., and Divitini, M. (2015). Technology-enhanced Smart City Learning: a Systematic Mapping of the Literature. Interaction Des. Architect.(s) J. - IxD&A, N. 27, 28–43.

Hew, K. F., Hu, X., Qiao, C., and Tang, Y. (2020). What Predicts Student Satisfaction with MOOCs: A Gradient Boosting Trees Supervised Machine Learning and Sentiment Analysis Approach. Comput. Educ. 145, 103724. doi:10.1016/j.compedu.2019.103724

Holstein, S., and Cohen, A. (2016). The Characteristics of Successful MOOCs in the Fields of Software, Science, and Management, According to Students' Perception. Ijell 12, 247–266. doi:10.28945/3614

Hong, Y., and Skiena, S. (2010). “The Wisdom of Bookies? Sentiment Analysis vs. The NFL point Spread,” in Proceedings of the International Conference on Weblogs and Social Media (IcWSm-2010) , May 23-26, 2010 , Washington DC, USA , 251–254.

Kastrati, Z., Imran, A. S., and Kurti, A. (2020). Weakly Supervised Framework for Aspect-Based Sentiment Analysis on Students' Reviews of MOOCs. IEEE Access 8, 106799–106810. doi:10.1109/access.2020.3000739

Kastrati, Z., Dalipi, F., Imran, A. S., Pireva Nuci, K., and Wani, M. A. (2021). Sentiment Analysis of Students' Feedback with NLP and Deep Learning: A Systematic Mapping Study. Appl. Sci. 11, 3986. doi:10.3390/app11093986

Koutsodimou, K., and Jimoyiannis, A. (2015). “MOOCs for Teacher Professional Development: Investigating Views and Perceptions of the Participants,” in Proceedings of the 8th international conference of education, research and innovation – ICERI2015, Seville, Spain ( IATED ), 6968–6977.

Lee, G., Keum, S., Kim, M., Choi, Y., and Rha, I. (2016). “A Study on the Development of a MOOC Design Model,” in Educational Technology International (Korea: Seoul National University ), 17, 1–37.

Lee, D., Watson, S. L., and Watson, W. R. (2020). The Relationships between Self-Efficacy, Task Value, and Self-Regulated Learning Strategies in Massive Open Online Courses. Irrodl 21 (1), 23–39. doi:10.19173/irrodl.v20i5.4389

Lei, C. U., Hou, X., Kwok, T. T., Chan, T. S., Lee, J., Oh, E., and Lai, C. (2015). “Advancing MOOC and SPOC Development via a Learner Decision Journey Analytic Framework,” in 2015 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE) , December 10–15, 2015 , Zhuhai, China , ( IEEE ), 149–156. doi:10.1109/tale.2015.7386034

Li, X., Zhang, H., Ouyang, Y., Zhang, X., and Rong, W. (2019). “A Shallow BERT-CNN Model for Sentiment Analysis on MOOCs Comments,” in 2019 IEEE International Conference on Engineering, Technology and Education (TALE) , December 10-13, 2019 , Yogyakarta, Indonesia , ( IEEE ), 1–6. doi:10.1109/tale48000.2019.9225993

Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gotzsche, P. C., Ioannidis, J. P. A., et al. (2009). The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies that Evaluate Healthcare Interventions: Explanation and Elaboration. BMJ 339, b2700. doi:10.1136/bmj.b2700


Lincoln, Y., and Guba, E. (1985). Naturalistic Inquiry . California: Sage Publications .

Littlejohn, A., Hood, N., Milligan, C., and Mustain, P. (2016). Learning in MOOCs: Motivations and Self-Regulated Learning in MOOCs. Internet Higher Educ. 29, 40–48. doi:10.1016/j.iheduc.2015.12.003

Liu, Z., Zhang, W., Sun, J., et al. (2017a). “Emotion and Associated Topic Detection for Course Comments in a MOOC Platform,” in IEEE International Conference on Educational Innovation Through Technology , September 22-24, 2016 , Tainan, Taiwan .

Liu, Z., Yang, C., Peng, X., Sun, J., and Liu, S. (2017b). “Joint Exploration of Negative Academic Emotion and Topics in Student-Generated Online Course Comments,” in Proceedings of the International Conference of Educational Innovation through Technology (EITT) , Osaka, Japan , 7–9 December 2017 , 89–93. doi:10.1109/eitt.2017.29

Liu, D. (2016). The Reform and Innovation of English Course: A Coherent Whole of MOOC, Flipped Classroom and ESP. Proced. - Soc. Behav. Sci. 232, 280–286. doi:10.1016/j.sbspro.2016.10.021

Lubis, F. F., Rosmansyah, Y., and Supangkat, S. H. (2016). “Experience in Learners Review to Determine Attribute Relation for Course Completion,” in Proceedings of the International Conference on ICT For Smart Society (ICISS) , Surabaya, Indonesia , 20–21 July 2016 , 32–36. doi:10.1109/ictss.2016.7792865

Lundqvist, K., Liyanagunawardena, T., and Starkey, L. (2020). Evaluation of Student Feedback within a MOOC Using Sentiment Analysis and Target Groups. Irrodl 21 (3), 140–156. doi:10.19173/irrodl.v21i3.4783

Martínez, G., Baldiris, S., and Salas, D. (2019). “The Effect of Gamification in User Satisfaction, the Approval Rate and Academic Performance,” in International Symposium on Emerging Technologies for Education (Cham: Springer ), 122–132.

Miller, M., Sathi, C., Wiesenthal, D., Leskovec, J., and Potts, C. (2011). “Sentiment Flow through Hyperlink Networks,” in Proceedings of the Fifth International Conference on Weblogs and Social Media , July 17-21, 2011 , Barcelona, Spain , 550–553.

Moreno-Marcos, P. M., Alario-Hoyos, C., Muñoz-Merino, P. J., Estévez-Ayres, I., and Kloos, C. D. (2018). “Sentiment Analysis in MOOCs: A Case Study,” in 2018 IEEE Global Engineering Education Conference (EDUCON) , April 17-20, 2018 , Santa Cruz de Tenerife, Spain , ( IEEE ), 1489–1496.

Nie, Y., and Luo, H. (2019). “Diagnostic Evaluation of MOOCs Based on Learner Reviews: The Analytic Hierarchy Process (AHP) Approach,” in Blended Learning: Educational Innovation for Personalized Learning. ICBL 2019 . Lecture Notes in Computer Science . Editors S. Cheung, L. K. Lee, I. Simonova, T. Kozel, and LF. Kwok, vol, 11546. doi:10.1007/978-3-030-21562-0_24

Nissenson, P. M., and Coburn, T. D. (2016). “Scaling-up a MOOC at a State University in a Cost-Effective Manner,” in Proceedings of the 2016 American Society for Engineering Education Annual Conference & Exposition , June 26-29, 2016 , New Orleans, USA , 26–29.

O’Malley, P. J., Agger, J. R., and Anderson, M. W. (2015). Teaching a Chemistry MOOC with a Virtual Laboratory: Lessons Learned from an Introductory Physical Chemistry Course. J. Chem. Educ. 92 (10), 1661–1666. doi:10.1021/acs.jchemed.5b00118

Onan, A. (2021). Sentiment Analysis on Massive Open Online Course Evaluations: A Text Mining and Deep Learning Approach. Comput. Appl. Eng. Educ. 29, 572–589. doi:10.1002/cae.22253

Onwuegbuzie, A., Leech, N., and Collins, K. (2012). Qualitative Analysis Techniques for the Review of the Literature. Qual. Rep. 17 (56), 1–28.

Petticrew, M., and Roberts, H. (2006). Systematic Reviews in the Social Sciences . Oxford: Blackwell Publishing .

Pugh, S. D. (2001). Service with a Smile: Emotional Contagion in the Service Encounter. Amj 44 (5), 1018–1027. doi:10.5465/3069445

Qi, C., and Liu, S. (2021). Evaluating On-Line Courses via Reviews Mining. IEEE Access 9, 35439–35451. doi:10.1109/access.2021.3062052

Rabbany, R., Elatia, S., Takaffoli, M., and Zaïane, O. R. (2014). “Collaborative Learning of Students in Online Discussion Forums: A Social Network Analysis Perspective,” in Educational Data Mining (Cham: Springer ), 441–466. doi:10.1007/978-3-319-02738-8_16

Rahimi, A., and Khosravizadeh, P. (2018). A Corpus Study on the Difference between MOOCs and Real Classes. BRAIN . Broad Res. Artif. Intelligence Neurosci. 9 (1), 36–43.

Sa'don, N. F., Alias, R. A., and Ohshima, N. (2014). “Nascent Research Trends in MOOCs in Higher Educational Institutions: A Systematic Literature Review,” in 2014 International Conference on Web and Open Access to Learning (ICWOAL) , November 25-27, 2014 , Dubai, UAE , ( IEEE ), 1–4. doi:10.1109/icwoal.2014.7009215

Schardt, C., Adams, M. B., Owens, T., Keitz, S., and Fontelo, P. (2007). Utilization of the PICO Framework to Improve Searching PubMed for Clinical Questions. BMC Med. Inform. Decis. Mak 7 (1), 16. doi:10.1186/1472-6947-7-16

Shen, C.-w., and Kuo, C.-J. (2015). Learning in Massive Open Online Courses: Evidence from Social media Mining. Comput. Hum. Behav. 51, 568–577. doi:10.1016/j.chb.2015.02.066

Staples, M., and Niazi, M. (2007). Experiences Using Systematic Review Guidelines. J. Syst. Softw. 80 (9), 1425–1437. doi:10.1016/j.jss.2006.09.046

Yan, W., Dowell, N., Holman, C., Welsh, S. S., Choi, H., and Brooks, C. (2019). “Exploring Learner Engagement Patterns in Teach-Outs Using Topic, Sentiment and On-Topicness to Reflect on Pedagogy,” in Proceedings of the 9th International Conference on Learning Analytics & Knowledge , 180–184. doi:10.1145/3303772.3303836

Zemsky, R. (2014). With a MOOC MOOC Here and a MOOC MOOC There, Here a MOOC, There a MOOC, Everywhere a MOOC MOOC. J. Gen. Educ. 63 (4), 237–243. doi:10.1353/jge.2014.0029

Zhang, H., Dong, J., Min, L., and Bi, P. (2020). A BERT Fine-tuning Model for Targeted Sentiment Analysis of Chinese Online Course Reviews. Int. J. Artif. Intell. Tools 29 (07n08), 2040018. doi:10.1142/s0218213020400187

Zhou, J., and Ye, J.-m. (2020). “Sentiment Analysis in Education Research: A Review of Journal Publications,” in Interactive Learning Environments , 1–13. doi:10.1080/10494820.2020.1826985

Keywords: massive open online courses, MOOCs, sentiment analysis, systematic review, student feedback, learning analytics, opinion mining

Citation: Dalipi F, Zdravkova K and Ahlgren F (2021) Sentiment Analysis of Students’ Feedback in MOOCs: A Systematic Literature Review. Front. Artif. Intell. 4:728708. doi: 10.3389/frai.2021.728708

Received: 21 June 2021; Accepted: 26 August 2021; Published: 09 September 2021.


Copyright © 2021 Dalipi, Zdravkova and Ahlgren. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Fisnik Dalipi, [email protected]

This article is part of the Research Topic

Learning Analytics: Trends and Challenges


Systematic reviews in sentiment analysis: a tertiary study

  • Open access
  • Published: 03 March 2021
  • Volume 54 , pages 4997–5053, ( 2021 )

Cite this article

You have full access to this open access article

Alexander Ligthart, Cagatay Catal (ORCID: orcid.org/0000-0003-0959-2930) and Bedir Tekinerdogan


With advancing digitalisation, we can observe a massive increase in user-generated content on the web that provides people's opinions on different subjects. Sentiment analysis is the computational study of people's feelings and opinions towards an entity. The field has been the topic of extensive research in the past decades. In this paper, we present the results of a tertiary study, which aims to investigate the current state of research in this field by synthesizing the results of published secondary studies (i.e., systematic literature reviews and systematic mapping studies) on sentiment analysis. This tertiary study follows the guidelines of systematic literature reviews (SLR) and covers only secondary studies. The outcome provides a comprehensive overview of the key topics and the different approaches for a variety of tasks in sentiment analysis. Different features, algorithms, and datasets used in sentiment analysis models are mapped. Challenges and open problems are identified that point to areas requiring further research effort in sentiment analysis. In addition to the tertiary study, we also identified 112 recent deep learning-based sentiment analysis papers and categorized them based on the applied deep learning algorithms. According to this analysis, LSTM and CNN are the most frequently used deep learning algorithms for sentiment analysis.


1 Introduction

Sentiment analysis or opinion mining is the computational study of people's opinions, sentiments, emotions, and attitudes towards entities such as products, services, issues, events, topics, and their attributes (Liu 2015). As such, sentiment analysis can allow tracking the mood of the public about a particular entity to create actionable knowledge. Also, this type of knowledge can be used to understand, explain, and predict social phenomena (Pozzi et al. 2017 ). For the business domain, sentiment analysis plays a vital role in enabling businesses to improve strategy and gain insight into customers' feedback about their products. In today's customer-oriented business culture, understanding the customer is increasingly important (Chagas et al. 2018 ).

The explosive growth of discussion platforms, product review websites, e-commerce, and social media facilitates a continuous stream of thoughts and opinions. This growth makes it challenging for companies to get a better understanding of customers' aggregate opinions and attitudes towards products. The explosion of internet-generated content coupled with techniques like sentiment analysis provides opportunities for marketers to gain intelligence on consumers' attitudes towards their products (Rambocas and Pacheco 2018 ). Extracting sentiments from product reviews helps marketers to reach out to customers who need extra care, which will improve customer satisfaction, sales, and ultimately benefits businesses (Vyas and Uma 2019 ).

Sentiment analysis is a multidisciplinary field, including psychology, sociology, natural language processing, and machine learning. Recently, the exponentially growing amounts of data and computing power enabled more advanced forms of analytics. Machine learning, therefore, became a dominant tool for sentiment analysis. There is an abundance of scientific literature available on sentiment analysis, and there are also several secondary studies conducted on the topic.

A secondary study can be considered as a review of primary studies that empirically analyze one or more research questions (Nurdiani et al. 2016 ). The use of secondary studies (i.e., systematic reviews) in software engineering was suggested in 2004, and the term “Evidence-based Software Engineering” (EBSE) was coined by Kitchenham et al. ( 2004 ). Nowadays, secondary studies are widely used as a well-established tool in software engineering research (Budgen et al. 2018 ). The following two kinds of secondary studies can be conducted within the scope of EBSE:

Systematic Literature Review (SLR): An SLR study aims to identify relevant primary studies, extract the required information regarding the research questions (RQs), and synthesize the information to respond to these RQs. It follows a well-defined methodology and assesses the literature in an unbiased and repeatable way (Kitchenham and Charters 2007 ).

Systematic Mapping Study (SMS): An SMS study presents an overview of a particular research area by categorizing and mapping the studies based on several dimensions (i.e., facets) (Petersen et al. 2008 ).

SLR and SMS studies differ from traditional review papers (a.k.a. survey articles) because they rely on systematic searches of electronic databases and follow a well-defined protocol to identify the articles. There are also several differences between SLR and SMS studies (Catal and Mishra 2013; Kitchenham et al. 2010b). For instance, while the RQs of SLR studies are very specific, the RQs of an SMS are general. The search process of an SLR is driven by research questions, whereas the search process of an SMS is based on the research topic. For an SLR, all relevant papers must be retrieved and quality assessments of the identified articles must be performed; the requirements for an SMS are less stringent.

When there is a sufficient number of secondary studies on a research topic, a tertiary study can be performed (Kitchenham et al. 2010a; Nurdiani et al. 2016). A tertiary study synthesizes data from secondary studies and provides a comprehensive review of research in a research area (Rios et al. 2018). Tertiary studies are used to summarize existing secondary studies and can be considered a special form of review that uses other secondary studies as primary studies (Raatikainen et al. 2019).

Although sentiment analysis has been the topic of some SLR studies, a tertiary study characterizing these systematic reviews has not been performed yet. As such, the aim of our study is to identify and characterize systematic reviews in sentiment analysis and present a consolidated view of the published literature to better understand the limitations and challenges of sentiment analysis. We follow the research methodology guidelines suggested for the tertiary studies (Kitchenham et al. 2010a ).

The objective of this study is thus to better understand the sentiment analysis research area by synthesizing results of these secondary studies, namely SLR and SMS, and providing a thorough overview of the topic. The methodology that we followed applies a systematic literature review to a sample of systematic reviews, and therefore, this type of tertiary study is valuable to determine the potential research areas for further research.

As part of this tertiary study, different models, tasks, features, datasets, and approaches in sentiment analysis have been mapped and also, challenges and open problems in this field are identified. Although tertiary studies have been performed for other topics in several fields such as software engineering and software testing (Raatikainen et al. 2019 ; Nurdiani et al. 2016 ; Verner et al. 2014 ; Cruzes and Dybå, 2011 ; Cadavid et al. 2020 ), this is the first study that performs a tertiary study on sentiment analysis.

The main contributions of this article are three-fold:

We present the results of the first tertiary study in the literature on sentiment analysis.

We identify systematic review studies of sentiment analysis systematically and explain the consolidated view of these systematic studies.

We support our study with recent survey papers that review deep learning-based sentiment analysis papers and explain the popular lexicons in this field.

The rest of the paper is organized as follows: Sect.  2 provides the background and related work. Section  3 explains the methodology, which was followed in this study. Section  4 presents the results in detail. Section  5 provides the discussion, and Sect.  6 explains the conclusions.

2 Background and related work

Sentiment analysis and opinion mining are often used interchangeably. Some researchers point to a subtle difference between sentiments and opinions, namely that opinions are more concrete thoughts, whereas sentiments are feelings (Pozzi et al. 2017). However, the two are closely related constructs, and both are implied when referring to either one. This research adopts sentiment analysis as a general term covering both opinion mining and sentiment analysis.

Sentiment analysis is a broad concept that consists of many different tasks, approaches, and types of analysis, which are explained in this section. In addition, an overview of sentiment analysis is represented in Fig.  1 , which is adapted from (Hemmatian and Sohrabi 2017 ; Kumar and Jaiswal 2020 ; Mite-Baidal et al. 2018 ; Pozzi et al. 2017 ; Ravi and Ravi 2015 ). Cambria et al. ( 2017 ) stated that a holistic approach to sentiment analysis is required, and only categorization or classification is not sufficient. They presented the problem as a three-layer structure that includes 15 Natural Language Processing (NLP) problems as follows:

Syntactics layer: Microtext normalization, sentence boundary disambiguation, POS tagging, text chunking, and lemmatization

Semantics layer: Word sense disambiguation, concept extraction, named entity recognition, anaphora resolution, and subjectivity detection

Pragmatics layer: Personality recognition, sarcasm detection, metaphor understanding, aspect extraction, and polarity detection

Figure 1: Sentiment analysis concept overview

Cambria (2016) states that approaches for sentiment analysis and affective computing can be divided into the following three categories: knowledge-based techniques, statistical approaches (e.g., machine learning and deep learning approaches), and hybrid techniques that combine knowledge-based and statistical techniques.

Sentiment analysis models can adopt different pre-processing methods and apply a variety of feature selection methods. While pre-processing means transforming the text into normalized tokens (e.g., removing article words and applying the stemming or lemmatization techniques), feature selection means determining what features will be used as inputs. In the following subsections, related tasks, approaches, and levels of analysis are presented in detail.
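As a concrete illustration of the pre-processing step mentioned above, the sketch below (an assumption-laden illustration, not the pipeline of any reviewed study) normalizes raw review text into stemmed tokens using NLTK's Porter stemmer and a small ad-hoc stopword list; the stopword set and the example sentence are invented for the example.

```python
# Minimal pre-processing sketch: lowercase, tokenize, drop stopwords, stem.
# The stopword list is a tiny ad-hoc stand-in for a real one.
import re
from nltk.stem import PorterStemmer  # pip install nltk

STOPWORDS = {"the", "a", "an", "and", "but", "is", "are", "was", "were", "of", "to"}

def preprocess(text: str) -> list[str]:
    stemmer = PorterStemmer()
    tokens = re.findall(r"[a-z']+", text.lower())       # simple word tokenizer
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove function words
    return [stemmer.stem(t) for t in tokens]            # normalize inflected forms

print(preprocess("The lectures were great, but the quizzes felt rushed"))
# roughly: ['lectur', 'great', 'quizz', 'felt', 'rush']
```

The resulting tokens are what feature selection then turns into model inputs, for example n-gram counts or TF-IDF weights.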

2.1 Tasks of sentiment analysis

2.1.1 Sentiment classification

One of the most widely known and researched tasks in sentiment analysis is sentiment classification. Polarity determination is a subtask of sentiment classification and is often improperly used when referring to sentiment analysis. However, it is merely a subtask aimed at identifying sentiment polarity in each text document. Traditionally, polarity is classified as either positive or negative (Wang et al. 2014 ). Some studies include a third class called neutral . Cross-domain and cross-language classification are subtasks of sentiment classification that aim to transfer knowledge from a data-rich source domain to a target domain where data and labels are limited. The cross-domain analysis predicts the sentiment of a target domain, with a model (partly) trained on a more data-rich source domain. A popular method is to extract domain invariant features whose distribution in the source domain is close to that of the target domain (Peng et al. 2018 ). The model can be extended with target domain-specific information. The cross-language analysis is practiced in a similar way by training a model on a source language dataset and testing it on a different language where data is limited, for example by translating the target language to the source language before processing (Can et al. 2018 ). Xia et al. ( 2015 ) stated that opinion-level context is beneficial to solve polarity ambiguity of sentiment words and applied the Bayesian model. Word polarity ambiguity is one of the challenges that need to be addressed for sentiment analysis. Vechtomova ( 2017 ) showed that the information retrieval-based model is an alternative to machine learning-based approaches for word polarity disambiguation.

2.1.2 Subjectivity classification

Subjectivity classification is a task to determine the existence of subjectivity in a text (Kasmuri and Basiron 2017). The goal of subjectivity classification is to filter out unwanted objective data objects before further processing (Kamal 2013). It is often considered the first step in sentiment analysis. Subjectivity classification detects subjective clues: words that carry emotion or subjective notions like ‘expensive’, ‘easy’, and ‘better’ (Kasmuri and Basiron 2017). These clues are used to classify text objects as subjective or objective.

2.1.3 Opinion spam detection

The growing popularity of e-commerce and review websites has made opinion spam detection a prominent issue in sentiment analysis. Opinion spams, also referred to as false or fake reviews, are intelligently written comments that either promote or discredit a product. Opinion spam detection aims to identify three types of features that relate to a fake review: review content, metadata of the review, and real-life knowledge about the product (Ravi and Ravi 2015). Review content is often analyzed with machine learning techniques to uncover deception. Metadata includes the star rating, IP address, geo-location, user-id, etc.; however, in many cases it is not accessible for analysis. The third method relies on real-life knowledge: for instance, if a product has a good reputation and an inferior product is suddenly rated as superior during some period, reviews from that period might be suspect.

2.1.4 Implicit language detection

Implicit language refers to humor, sarcasm, and irony. This form of speech involves vagueness and ambiguity and is sometimes hard to detect even for humans. However, an implicit meaning can completely flip the polarity of a sentence. Implicit language detection often aims at understanding facts related to an event. For example, in the phrase “I love pain”, pain is a factual word with a negative polarity load. The contradiction between the factual word ‘pain’ and the subjective word ‘love’ can indicate sarcasm, irony, or humor. More traditional methods for implicit language detection include exploring clues such as emoticons, expressions for laughter, and heavy punctuation mark usage (Filatova 2012).
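A clue-based detector of the kind just described can be sketched with a few regular expressions; the patterns and the example sentence are illustrative assumptions, not a recipe taken from the reviewed studies.

```python
# Surface clues for implicit language (sarcasm, irony, humor): emoticons,
# laughter expressions, and heavy punctuation. Patterns are illustrative.
import re

EMOTICON = re.compile(r"[:;=8][-^]?[)(DPp]")           # e.g. :) ;-) :D
LAUGHTER = re.compile(r"\b(haha+|lol+|lmao)\b", re.I)  # e.g. "hahaha", "lol"
HEAVY_PUNCT = re.compile(r"[!?]{2,}")                  # e.g. "!!!", "?!?"

def implicit_language_clues(text: str) -> dict:
    return {
        "emoticon": bool(EMOTICON.search(text)),
        "laughter": bool(LAUGHTER.search(text)),
        "heavy_punctuation": bool(HEAVY_PUNCT.search(text)),
    }

print(implicit_language_clues("Oh great, another pop quiz... I love pain!!! :D"))
# {'emoticon': True, 'laughter': False, 'heavy_punctuation': True}
```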

2.1.5 Aspect extraction

Aspect extraction refers to retrieving the target entity and aspects of the target entity in the document. The target entity can be a product, person, event, organization, etc. (Akshi Kumar and Sebastian 2012 ). People's opinions on various parts of a product need to be identified for fine-grained sentiment analysis (Ravi and Ravi 2015 ). Aspect extraction is especially important in sentiment analysis of social media and blogs that often do not have predefined topics.

Multiple methods exist for aspect extraction. The first and most traditional method is frequency-based analysis. This method finds frequently used nouns or compound nouns (POS tags), which are likely to be aspects. A rule of thumb that is often used is that if the (compound) noun occurs in at least 1% of the sentences, it is considered an aspect. This straightforward method turns out to be quite powerful (Schouten and Frasincar 2016 ). However, there are some drawbacks to this method (e.g., not all nouns are referring to aspects).
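The 1% rule of thumb can be sketched as follows, using NLTK's POS tagger to collect noun candidates. The threshold, the tiny corpus, and the download calls are assumptions for illustration (the tagger resource name differs slightly across NLTK versions).

```python
# Frequency-based aspect extraction: nouns occurring in at least `min_share`
# of the sentences are treated as aspect candidates.
import re
from collections import Counter
import nltk

for pkg in ("averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(pkg, quiet=True)  # tagger resource name varies by NLTK version

def frequent_aspects(sentences: list[str], min_share: float = 0.01) -> list[str]:
    counts = Counter()
    for sent in sentences:
        tokens = re.findall(r"[a-z]+", sent.lower())
        nouns = {w for w, tag in nltk.pos_tag(tokens) if tag in ("NN", "NNS")}
        counts.update(nouns)                  # count each noun once per sentence
    threshold = max(1, int(min_share * len(sentences)))
    return [noun for noun, c in counts.items() if c >= threshold]

reviews = ["The battery drains quickly.", "Battery life is poor.", "Great screen!"]
print(frequent_aspects(reviews))   # e.g. ['battery', 'life', 'screen', ...]
```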

Syntax-based methods find aspects by means of syntactic relations they are in. A simple example is identifying aspects that are preceded by a modifying adjective that is a sentiment word. This method allows for low-frequency aspects to be identified. The drawback of this method is that many relations need to be found for complete coverage, which requires knowledge of sentiment words. Extra aspects can be found if more sentiment words that serve as adjectives can be identified. Qiu et al. ( 2009 ) propose a syntax-based algorithm that identifies aspects as well as sentiment words that works both ways. The algorithm identifies sentiment words for known aspects and aspects for known sentiment words.

2.2 Approaches

2.2.1 Machine learning-based approaches

Machine learning approaches for sentiment analysis tasks can be divided into three categories: unsupervised learning, semi-supervised learning, and supervised learning.

The unsupervised learning methods group unlabelled data into clusters that are similar to each other. For example, the algorithm can consider data as similar based on common words or word pairs in the document (Li and Liu 2014 ).

Semi-supervised learning uses both labeled and unlabelled data in the training process (da Silva et al. 2016a, b). A set of unlabelled data is complemented with some (often limited) examples of labeled data to build a classifier. This technique can yield decent accuracy and requires less human effort compared to supervised learning. In cross-domain and cross-language classification, domain- or language-invariant features can be extracted with the help of unlabelled data, while fine-tuning the classifier with labeled target data (Peng et al. 2018). Semi-supervised learning is especially popular for Twitter sentiment analysis, where large sets of unlabelled data are available (da Silva et al. 2016a, b). Hussain and Cambria (2018) compared the computational complexity of several semi-supervised learning methods and presented a new semi-supervised model based on biased SVM (bSVM) and biased Regularized Least Squares (bRLS). Wu et al. (2019) developed a semi-supervised Dimensional Sentiment Analysis (DSA) model using the variational autoencoder algorithm. DSA calculates the sentiment score of texts along several dimensions, such as dominance, valence, and arousal. Xu and Tan (2019) proposed the target-oriented semi-supervised sequential generative model (TSSGM) for target-oriented aspect-based sentiment analysis and showed that this approach outperforms two semi-supervised learning methods. Han et al. (2019) developed a semi-supervised model using dynamic thresholding and multiple classifiers for sentiment analysis. They evaluated their model on the Large Movie Review dataset and showed that it provides higher performance than the other models. Duan et al. (2020) proposed the Generative Emotion Model with Categorized Words (GEM-CW) for stock message sentiment classification and demonstrated that this model is effective. Gupta et al. (2018) investigated semi-supervised approaches for low-resource sentiment classification and showed that their proposed methods improve model performance over supervised learning models.
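As a generic illustration of the semi-supervised idea (plain self-training, not one of the specific models cited above), scikit-learn's SelfTrainingClassifier can wrap a probabilistic classifier and gradually pseudo-label the unlabelled reviews; the data and the confidence threshold here are toy assumptions.

```python
# Self-training on a toy corpus: examples labelled -1 are unlabelled and get
# pseudo-labels during training when the classifier is confident enough.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.semi_supervised import SelfTrainingClassifier

texts = ["great course", "terrible pacing", "loved the quizzes",
         "boring lectures", "well structured", "waste of time"]
labels = [1, 0, -1, -1, -1, -1]   # -1 marks unlabelled examples

model = make_pipeline(
    TfidfVectorizer(),
    SelfTrainingClassifier(LogisticRegression(), threshold=0.6),
)
model.fit(texts, labels)
print(model.predict(["the course was great", "total waste"]))  # e.g. [1 0]
```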

The most widely known machine learning method is supervised learning. This approach trains a model with labeled source data. The trained model can subsequently make predictions for an output considering new unlabelled input data. In most cases, supervised learning often outperforms unsupervised and semi-supervised learning approaches, but the dependency on labeled training data can require lots of human effort and is therefore sometimes inefficient (Hemmatian and Sohrabi 2017 ).

Machine learning methods are increasingly popular for aspect extraction. The most commonly used approach for aspect extraction is topic modeling , an unsupervised method that assumes any document contains a certain amount of hidden topics (Hemmatian and Sohrabi 2017 ). Latent Dirichlet Allocation (LDA) algorithm, which has many different variations, is a popular topic modeling algorithm (Nguyen and Shirai 2015 ) that allows observations to be explained by unsupervised grouping of similar data. LDA outputs some topics of a text document and attributes each word in the document to one of the identified topics. The drawback of machine learning methods is that they require lots of labeled data.
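A minimal LDA sketch with scikit-learn shows the topic-modeling idea; the corpus, the number of topics, and the vectorizer settings are illustrative assumptions rather than recommended values.

```python
# Unsupervised topic (aspect) discovery with Latent Dirichlet Allocation.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the instructor explains the material clearly",
    "video and audio quality of the lectures is poor",
    "assignments and quizzes are graded fairly",
    "the instructor answers questions in the forum",
]
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-4:][::-1]]  # 4 most probable words
    print(f"topic {k}: {top_words}")
```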

2.2.2 Deep learning-based approaches

Deep learning is a sub-branch of machine learning that uses deep neural networks. Recently, deep learning algorithms have been widely applied for sentiment analysis. In this section, first, we discuss the articles that present an overview of papers that applied deep learning for sentiment analysis. These articles are neither SLR nor SMS papers. Instead, they are either traditional review (a.k.a., survey) articles or comparative assessment papers that explain the existing deep learning-based approaches in addition to the experimental analysis. Later, we also present some of the deep learning-based models used in sentiment analysis papers.

In Table  1 , we present the survey papers that analyzed deep learning-based sentiment analysis papers. In this table, we also show the number of papers investigated in these survey papers.

Dang et al. (2020) presented a summary of 32 deep learning-based sentiment analysis papers and analyzed the performance of Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN) on eight datasets. They selected these deep learning algorithms because they are the most widely used ones according to their analysis of the 32 papers. They used both word embedding and term frequency-inverse document frequency (TF-IDF) to prepare inputs for the classification algorithms and reported that the RNN-based model using word embedding achieved the best performance among the algorithms. However, the processing time of the RNN-based model is ten times longer than that of the CNN-based one. In addition, they reported that the following deep learning algorithms were used in the 32 deep learning-based sentiment analysis papers: CNN, Long Short-Term Memory (LSTM) (tree-LSTM, discourse-LSTM, coattention-LSTM, bi-LSTM), Gated Recurrent Units (GRU), RNN, Coattention-MemNet, Latent Rating Neural Network (LRNN), Simple Recurrent Networks (SRN), and Recurrent Neural Tensor Network (RNTN).

Yadav and Vishwakarma (2019) reviewed 130 research papers that apply deep learning techniques to sentiment analysis. They identified the following deep learning methods used for sentiment analysis: CNN, Recursive Neural Network (Rec NN), RNN (LSTM and GRU), Deep Belief Networks (DBN), Attention-based Network, Bi-RNN, and Capsule Network. They reported that LSTM provides better results and that the use of deep learning approaches for sentiment analysis is promising. However, they noted that these models require a huge amount of data and that training datasets are scarce.

Zhang et al. ( 2018 ) published a survey article on the application of deep learning methods for sentiment analysis. They explained several papers that address one of the following levels: document level, sentence level, and the aspect level sentiment classification. The applied algorithms per analysis level are listed as follows:

Document-level sentiment classification: Artificial Neural Networks (ANN), Stacked Denoising Autoencoder (SDA), Denoising Autoencoder, CNN, LSTM, GRU, Memory Network, and GRU-based Encoder

Sentence-level sentiment classification: CNN, RNN, Semi-supervised Recursive Autoencoders Network (RAE), Recursive Neural Network, Recursive Neural Tensor Network, Dynamic CNN, LSTM, CNN-LSTM, Bi-LSTM, and Recurrent Random Walk Network

Aspect-level sentiment classification: Adaptive Recursive Neural Network, LSTM, Bi-LSTM, Attention-based LSTM, Memory Network, Interactive Attention Network, Recurrent Attention Network, and Dyadic Memory Network

Rojas‐Barahona (2016) presented an overview of deep learning approaches used for sentiment analysis and divided the techniques into the following categories:

Non-Recursive Neural Networks: RNN (variant: Bi-RNN), LSTM (variant: Bi-LSTM), and CNN (variants: CNN-Multichannel, CNN-non-static, Dynamic CNN)

Recursive Neural Networks: Recursive Autoencoders and Constituency Tree Recursive Neural Networks

Combination of Non-Recursive and Recursive Methods: Tree-Long Short-Term Memory (Tree-LSTM) and Deep Recursive Neural Networks (Deep RsNN)

For the movie reviews dataset, Rojas‐Barahona (2016) showed that the Dynamic CNN model provides the best performance. For the Sentiment TreeBank dataset, the Constituency Tree‐LSTM that is a Recursive Neural Network outperforms all the other algorithms.

Habimana et al. ( 2020a ) reviewed papers that applied deep learning algorithms for sentiment analysis and also performed several experiments with the specified algorithms on different datasets. They reported that dynamic sentiment analysis, sentiment analysis for heterogeneous information, and language structure are the main challenges for the sentiment analysis research field. They categorized the techniques used in the papers based on several analysis levels that are listed as follows:

Document-level Sentiment Analysis: CNN-based models, RNN with attention-based models, RNN with the user and product attention-based models, Adversarial Network Models, and Hybrid Models

Sentence-Level Sentiment Classification: Unsupervised Pre-Trained Networks (UPN), CNN, Recurrent Neural Networks, Deep Reinforcement Learning (DRL), RNN, RNN with cognition attention-based models

Aspect-based Sentiment Analysis: Attention-based models with aspect information, attention-based models with the aspect context, RNN with attention memory model, RNN with commonsense knowledge model, CNN-based model, and Hybrid model

Do et al. ( 2019 ) presented an overview of over 40 deep learning approaches used for aspect-based sentiment analysis. They categorized papers based on the following categories: CNN, RNN, Recursive Neural Network, and Hybrid methods. Also, they presented the advantages, disadvantages, and implications for aspect-based sentiment analysis (ABSA). They concluded that deep learning and ABSA are still in the early stages, and there are four main challenges in this field, namely domain adaptation, multi-lingual application, technical requirements (labeled data and computational resources and time), and linguistic complications.

Minaee et al. ( 2020 ) reviewed more than 150 deep learning-based text classification studies and presented their strengths and contributions. 22 of these studies proposed approaches for sentiment analysis. They provided more than 40 popular text classification datasets and showed the performance of some deep learning models on popular datasets. Since they did not only focus on sentiment analysis problems, they explained other kinds of models used for other tasks such as news categorization, topic analysis, question answering (QA), and natural language inference. They explained the following deep learning models in their paper: Feed-forward neural networks, RNN-based models, CNN-based models, Capsule Neural Networks, Models with attention mechanism, Memory augmented networks, Transformers, Graph Neural Networks, Siamese Neural Networks, Hybrid models, Autoencoders, Adversarial training, and Reinforcement learning. The challenges reported in this study are new datasets for multi-lingual text classification, interpretable deep learning models, and memory-efficient models. They concluded that the use of deep learning in text classification improves the performance of the models.

Some of the highly cited deep learning-based sentiment analysis papers are shown in Table  2 .

Kim ( 2014 ) performed several experiments with the CNN algorithm for sentence classification and showed that even with little parameter tuning, the CNN model that includes only one convolutional layer provides better performance than the state-of-the-art models of sentiment analysis.
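In the spirit of that single-convolutional-layer design, a Keras sketch might look like the following; the vocabulary size, filter settings, and dummy data are assumptions for illustration and not Kim's exact configuration.

```python
# One embedding layer, one Conv1D layer with max-over-time pooling, and a
# sigmoid output for binary polarity. Hyperparameters are illustrative.
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN = 10_000, 100   # assumed tokenizer settings

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Conv1D(filters=100, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),            # max-over-time pooling
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy data standing in for padded, integer-encoded sentences and labels.
X = np.random.randint(0, VOCAB_SIZE, size=(32, MAX_LEN))
y = np.random.randint(0, 2, size=(32,))
model.fit(X, y, epochs=1, verbose=0)
```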

Wang et al. ( 2016 ) developed an attention-based LSTM approach that can learn aspect embeddings. These aspects are used to compute the attention weights. Their models provided a state-of-the-art performance on SemEval 2014 dataset. Similarly, Pergola et al. ( 2019 ) proposed a topic-dependent attention model for sentiment classification and showed that the use of recurrent unit and multi-task learning provides better representations for accurate sentiment analysis.

Chen et al. ( 2017 ) developed the Recurrent Attention on Memory (RAM) model and showed that their model outperforms other state-of-the-art techniques on four datasets, namely SemEval 2014 (two datasets), Twitter dataset, and Chinese news comment dataset. Multiple attentions were combined with a Recurrent Neural Network in this study.

Ma et al. ( 2018 ) incorporated a hierarchical attention mechanism to the LSTM network and also extended the LSTM cell to incorporate commonsense knowledge. They demonstrated that the combination of this new LSTM model called Sentic LSTM and the attention architecture outperforms the other models for targeted aspect-based sentiment analysis.

Chen et al. ( 2016 ) developed a hierarchical LSTM model that incorporates user and product information via different levels of attention. They showed that their model achieves significant improvements over models without user and product information on IMDB, Yelp2013, and Yelp2014 datasets.

Wehrmann et al. ( 2017 ) proposed a language-agnostic sentiment analysis model based on the CNN algorithm, and the model does not require any translation. They demonstrated that their model outperforms other models on a dataset, including tweets from four languages, namely English, German, Spanish, and Portuguese. The dataset consists of 1.6 million annotated tweets (i.e., positive, negative, and neutral) from 13 European languages.

Ebrahimi et al. ( 2017 ) presented the challenges of building a sentiment analysis platform and focused on the 2016 US presidential election. They reported that they reached the best accuracy using the CNN algorithm, and the content-related challenges were hashtags, links, and sarcasm.

Poria et al. ( 2018 ) investigated three deep learning-based architectures for multimodal sentiment analysis and created a baseline based on state-of-the-art models.

Xu et al. ( 2019 ) developed an improved word representation approach, used the weighted word vectors as input into the Bi-LSTM model, obtained the comment text representation, and applied the feedforward neural network classifier to predict the comment sentiment tendency.

Majumder et al. ( 2019 ) proposed a GRU-based Neural Network that can be trained on sarcasm or sentiment datasets. They demonstrated that multitask learning-based approaches provide better performance than standalone classifiers developed on sarcasm and sentiment datasets.

After investigating these above-mentioned survey and highly cited articles, we searched in Google Scholar by using our search criteria (i.e., “deep learning” and “sentiment analysis”) to reach the recent state-of-the-art deep learning-based studies published in 2020. We retrieved 112 deep learning-based sentiment analysis papers published in 2020 and extracted the applied deep learning algorithms from these papers. In Appendix (Table  16 ), we present these recent deep learning-based sentiment analysis papers. In Table  3 , we show the distribution of applied deep learning algorithms used in these 112 recent papers.

According to this table, the most applied algorithm is LSTM (35.53%), and the second most used algorithm is CNN (33.33%). The other widely used algorithms are GRU (8.77%) and RNN (7.89%). However, other well-known deep learning algorithms such as DNN, Recursive Neural Network (ReNN), Capsule Network (CapN), Generative Adversarial Network (GAN), Deep Q-Network, and Autoencoder have not been preferred much and were used only in a few studies. Most of the hybrid approaches combined the CNN and LSTM algorithms and were therefore represented under these categories. As this analysis indicates, most of the recent deep learning-based studies followed the supervised machine learning approach.

2.2.3 Lexicon-based approaches

The traditional approach for sentiment analysis is the lexicon-based approach (Hemmatian and Sohrabi 2017). Lexicon-based methods scan through the documents for words that express positive or negative feelings to humans. Negative words would be ‘bad’, ‘ugly’, and ‘scary’, while positive words are, for example, ‘good’ or ‘beautiful’. The values of these words are documented in a lexicon. Words with high positive or negative values are mostly adjectives and adverbs. Sentiment analysis is extremely dependent on the domain of interest (Vinodhini 2012). For example, analyzing movie reviews can yield very different results compared to analyzing Twitter data due to the different forms of language used. Therefore, the lexicon used for sentiment analysis needs to be adjusted to the domain of interest, which can be a time-consuming process. However, lexicon-based methods do not require training data, which is a big advantage (Shayaa et al. 2018).
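At its core, a lexicon-based method is no more than a lookup and a sum, as the toy sketch below shows; the lexicon values are invented stand-ins for a real, domain-adjusted lexicon.

```python
# Toy lexicon-based scorer: sum word values from the lexicon and map the
# total to a polarity label. The lexicon is a stand-in for a real one.
LEXICON = {"good": 1.0, "beautiful": 1.0, "great": 2.0,
           "bad": -1.0, "ugly": -1.0, "scary": -2.0}

def lexicon_polarity(text: str) -> str:
    score = sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_polarity("the scenery was beautiful but the plot was bad"))  # neutral
```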

There are two main approaches to creating sentiment lexicons: dictionary-based and corpus-based. The dictionary-based approach starts with a small set of sentiment words, and iteratively expands the lexicon with synonyms and antonyms from existing dictionaries. In most cases, the dictionary-based approach works best for general purposes. Corpus-based lexicons can be tailored to specific domains. The approach starts with a list of general-purpose sentiment words and discovers other sentiment words from a domain corpus based on co-occurring word patterns (Mite-Baidal et al. 2018 ).
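The dictionary-based expansion step can be sketched with NLTK's WordNet interface: seed words contribute their synonyms with the same polarity and their antonyms with the opposite polarity. The seed values are assumptions, and a real lexicon would add filtering and scoring refinements on top of this.

```python
# Dictionary-based lexicon expansion via WordNet synonyms and antonyms.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def expand(seeds: dict[str, float]) -> dict[str, float]:
    lexicon = dict(seeds)
    for word, value in seeds.items():
        for synset in wn.synsets(word):
            for lemma in synset.lemmas():
                lexicon.setdefault(lemma.name().lower(), value)          # synonym
                for antonym in lemma.antonyms():
                    lexicon.setdefault(antonym.name().lower(), -value)   # antonym
    return lexicon

expanded = expand({"good": 1.0, "bad": -1.0})
print(len(expanded), "entries; sample:", sorted(expanded)[:6])
```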

2.2.4 Hybrid approaches

There are different hybrid approaches in the literature. Some of them aim to extend machine learning models with lexicon-based knowledge (Behera et al. 2016 ). The goal is to combine both methods to yield optimal results using an effective feature set of both lexicon and machine learning-based techniques (Munir Ahmad et al. 2017 ). This way, the deficiencies and limitations of both approaches can be overcome.

Recently, researchers have focused on the integration of symbolic and subsymbolic Artificial Intelligence (AI) for sentiment analysis (Cambria et al. 2020). Machine learning (including deep learning) is considered a bottom-up approach and applies subsymbolic AI. This is extremely useful for exploring a huge amount of data and discovering interesting patterns in the data. Although this type of bottom-up approach works quite well for image classification tasks, it is not very effective for natural language processing tasks. For effective communication, we learn many things such as cultural awareness and commonsense in a top-down manner instead of a bottom-up manner (Cambria et al. 2020). Therefore, these researchers applied subsymbolic AI (i.e., deep learning) to recognize patterns in text and represented them in a knowledge base using symbolic AI (i.e., logic and semantic networks). They built a new commonsense knowledge base called SenticNet for the sentiment analysis problem and concluded that coupling symbolic and subsymbolic AI is crucial for moving from natural language processing to natural language understanding.

Minaee et al. (2019) developed an ensemble model using the LSTM and CNN algorithms and demonstrated that this ensemble model provides better performance than the individual models.

2.2.5 Milestones of sentiment analysis research

Recently, Poria et al. ( 2020 ) investigated the challenges and new research directions in sentiment analysis research. Also, they presented the key milestones of sentiment analysis for the last two decades. We adapted their timeline figure for the last decade. In Fig.  2 , we present the most promising works of sentiment analysis research. For a more detailed illustration of milestones, we refer the readers to the article of Poria et al. ( 2020 ).

Figure 2: Milestones of sentiment analysis research for the last decade

2.3 Levels of analysis

Sentiment analysis can be implemented at the following three levels: document, sentence, and aspect level. We elaborate on these in the next paragraphs.

2.3.1 Document-level

Document-level analysis considers the whole text document as the unit of analysis (Wang et al. 2014). It is a simplified task that presumes that the entire document originates from a single opinion holder. Document analysis comes with some issues, namely that there can be multiple and mixed opinions in a document, expressed in many different ways and sometimes with implicit language (Akshi Kumar and Sebastian 2012). Typically, documents are analyzed at the sentence or aspect level before the polarity of the entire text document is determined.

2.3.2 Sentence-level

Sentence-level analysis considers specific sentences in a text and is especially used for subjectivity classification. Text documents typically consist of sentences that either contain opinion or not. Subjectivity classification analyses individual sentences in a document to detect whether the sentence contains facts or emotions and opinions. The main goal of subjectivity classification is to exclude sentences that do not contain sentiment or opinion (Akshi Kumar and Sebastian 2012 ). This analysis often includes subjectivity classification as a step to either include or exclude sentences for analysis.

2.3.3 Aspect-level

Aspect-level analysis is a challenging topic in sentiment analysis. It refers to analyzing sentiments about specific entities and their aspects in a text document, not merely the overall sentiment of the document (Tun Thura Thet et al. 2010). It is also known as entity-level or feature-level analysis. Even though the general sentiment of a document may be classified as positive or negative, the opinion holder can have a divergent opinion about specific aspects of an entity (Akshi Kumar and Sebastian 2012). In order to measure aspect-level opinion, the aspects of the entity need to be identified. Valdivia et al. (2017) stated that aspect-based sentiment analysis is beneficial to business managers because customer opinions are extracted in a transparent way. They also reported that detecting ironic expressions in TripAdvisor reviews is still an open problem, and that labeling of reviews should not rely only on user ratings, because some users write positive sentences alongside negative ratings and vice versa. Poria et al. (2016) proposed a new algorithm called Sentic LDA (Latent Dirichlet Allocation), improving the LDA algorithm with semantic similarity for aspect-based sentiment analysis. They concluded that this new algorithm helps researchers move from syntactic to semantic analysis in aspect-based sentiment analysis by using common-sense computing (Cambria et al. 2009) and improves the clustering process (Poria et al. 2016).

2.4 Popular lexicons

Several survey articles discussed the popular lexicons used in sentiment analysis. Dang et al. ( 2020 ) reported the following popular sentiment analysis lexicons in their article: Sentiment 140, Tweets Airline, Tweets Semeval, IMDB Movie Reviews (1), IMDB Movie Reviews (2), Cornell Movie Reviews, Book Reviews, and Music Reviews datasets. Habimana et al. ( 2020a ) explained the following popular lexicons in their survey article: IMDB, IMDB2, SST-5, SST-2, Amazon, SemEval 2014-D1, SemEval 2014-D2, SemEval 2017, STS, STS-Gold, Yelp, HR (Chinese), MR, Sanders, Deutsche Bahn (Deutsch), ASTD (Arabic), YouTube, CMU-MOSI, and CMU-MOSEI. Do et al. ( 2019 ) reported the following datasets widely used in sentiment analysis papers: Customer review data, SemEval 2014, SemEval 2015, SemEval 2016, ICWSM 2010 JDPA Sentiment Corpus, Darmstadt Service Review Corpus, FiQA ABSA, and target-dependent Twitter sentiment classification dataset. Minaee et al. ( 2020 ) explained the following datasets used for sentiment analysis: Yelp, IMDB, Movie Review, SST, MPQA, Amazon, and aspect-based sentiment analysis datasets (SemEval 2014 Task-4, Twitter, and SentiHood). Researchers who would like to perform a new study are suggested to look at these articles because links and other details per lexicon are presented in detail in these articles.

2.5 Advantages, disadvantages, and performance of the models

Several studies have been performed to compare the performance of existing models for sentiment analysis. Each model has its own advantages and weaknesses. For aspect-based sentiment analysis, Do et al. (2019) divided the models into the following three categories: CNN, RNN, and Recursive Neural Networks. The advantages of CNN-based models are fast computation and the ability to extract local patterns and represent non-linear dynamics. The disadvantage of CNN-based models is their high demand for data. The advantages of RNN-based models are that they do not require a huge amount of data, they have a distributed hidden state that stores previous computations, and they require fewer parameters. The disadvantages are that they cannot capture long-term dependencies and that they select the last hidden state to represent the sentence. The advantages of Recursive Neural Networks are their simple architectures and their ability to learn tree structures. The disadvantages are that they require parsers, which might be slow, and that they are still at an early stage. It was reported that RNN-based models provide better performance than CNN-based models, and more research is required on Recursive Neural Networks.

Yadav and Vishwakarma (2019) reported that deep learning-based models are gaining popularity for different sentiment analysis tasks. They stated that CNN followed by LSTM (an RNN algorithm) provides the highest accuracy for document-level sentiment classification, that researchers have focused on RNN algorithms (particularly LSTM) for sentence-level and aspect-level sentiment classification, and that RNN models are the best-performing ones for multi-domain sentiment classification. They also discussed the merits and demerits of CNN, Recursive Neural Network (RecNN), RNN, LSTM, GRU, and DBN models.

The advantage of DBN is the ability to learn the dimension of vocabulary using different layers. The disadvantages of DBN are that they are computationally expensive and unable to remember the previous task.

The advantages of GRU are that it is computationally less expensive, has a less complex structure, and can capture interdependencies between sentences. The disadvantage of GRU is that it does not have a memory unit, and its performance is lower than the LSTM model on larger datasets.

The advantages of LSTM are that it performs better than CNN, can extract sequential information, and can forget or remember things selectively. The disadvantages of LSTM are that it is considerably slower, each output must be reconciled with a sentence, and it is computationally expensive.

The advantages of RNN models are that they provide better performance than CNN models, have fewer parameters, and capture long-distance dependency features. The disadvantage of RNN models is that they cannot process long sequences.

The advantages of CNN models are that they are less expensive in terms of computational complexity and faster compared to RNN, LSTM, and GRU algorithms. Also, they can discover relevant features from different parts of the text. The disadvantage of CNN models is that they cannot preserve long-term dependencies and ignore long-distance features.

The advantage of RecNN models is that they are good at learning hierarchical structure and therefore provide better performance for NLP tasks. The disadvantage of RecNN models is that their efficiency drops dramatically on informal data that does not follow grammatical rules, and training can be difficult because the structure changes for every sample.

Despite the excellent performance of deep learning models, there are some drawbacks. The following drawbacks are discussed by Yadav and Vishwakarma (2019):

A huge amount of data is required to train the models and finding these large datasets is not easy in many cases

They work like a black box; it is hard to understand how they predict the sentiment of a text

The performance of the models is affected by the hyperparameters, and the selection of these hyperparameters is very challenging

Training time is very long, and the models usually require GPU support and large amounts of RAM

Yadav and Vishwakarma (2019) performed experiments to compare the execution time and accuracy of several deep learning algorithms. They reported that the LSTM algorithm and its variations, such as Bi-LSTM and GRU, require long training and execution times compared to other deep learning models. However, these LSTM-based algorithms provide better performance. Therefore, there is a trade-off between time and accuracy when selecting a deep learning model.

3 Methodology

In this section, the methodology of our tertiary study is presented. This study can be considered as a systematic review study that targets secondary studies on sentiment analysis, which is a widely researched topic. There are several reviews and mapping studies available on sentiment analysis in the literature. In this section, we focus on synthesizing the results of these secondary studies. Hence, we conduct a tertiary study. The study design is based on the systematic literature review (SLR) protocol suggested by Kitchenham and Charters ( 2007 ) and the format followed by the tertiary study papers of Curcio et al. ( 2019 ); Raatikainen et al. ( 2019 ). This study reviews two types of secondary studies:

SLR: These studies are performed to aggregate results related to specific research questions.

SMS: These studies aim to find and classify primary studies in a specific research topic. This method is more explorative compared to the SLR and is used to identify available literature prior to undertaking an SLR.

Both are considered secondary studies as they review primary studies. A pragmatic comparison between SLR and SMS is discussed by Kitchenham et al. ( 2011 ). Three main phases for conducting this research are planning, conducting, and reporting the review (Kitchenham 2004 ). Planning refers to identifying the need for the review and developing the review protocol. The goal of this tertiary study is to gather a broad overview of the current state of the art in sentiment analysis and to identify open problems and challenges in the field.

3.1 Research questions

The following research questions have been defined for this study:

RQ1 What are the adopted features (input/output) in sentiment analysis?

RQ2 What are the adopted approaches in sentiment analysis?

RQ3 What domains have been addressed in the adopted data sets?

RQ4 What are the challenges and open problems with respect to sentiment analysis?

3.2 Search process

This section provides insight into the process of determining secondary studies to include. Not all databases are equally relevant to this research topic. Databases that are used to identify secondary studies are adopted from the search strategy of secondary studies on sentiment analysis (Genc-Nayebi and Abran 2017 ; Hemmatian and Sohrabi 2017 ; Kumar and Jaiswal 2020 ; Sharma and Dutta 2018 ). The following databases are included in this study: IEEE, Science Direct, ACM, Springer, Wiley, and Scopus . To find the relevant literature, databases are searched for the title, abstract, and keywords based on the following query:

(“sentiment analysis” OR “sentiment classification” OR “opinion mining”) AND (“SLR” OR “systematic literature review” OR “systematic mapping” OR “mapping study”)

This query results in 43 hits. As stated before, this study only considers systematic literature reviews and systematic mapping studies since they are considered of higher quality and more in-depth compared to survey articles. Inclusion and exclusion criteria are formulated, as shown in Table  4 .

All secondary studies are analyzed and classified according to the inclusion and exclusion criteria in Table  4 . After this process, 16 secondary studies are selected.

3.3 Quality assessment

The confidence placed in the secondary studies rests on the quality assessment of the articles. For a tertiary study, the quality assessment is especially important (Goulão et al. 2016). The DARE criteria proposed by the Centre for Reviews and Dissemination (CRD) at the University of York and adopted in this study are often used in the context of software engineering (Goulão et al. 2016; Rios et al. 2018; Curcio et al. 2019; Kitchenham et al. 2010a). The criteria are based on four questions (CQs), as shown in Table 5. For each selected article, the criteria are scored on a three-point scale, as described in Table 6, adopted from (Kitchenham et al. 2010a, b).

The scoring procedure is Yes = 1, Partial = 0.5, and No = 0. The assessment is conducted by the researchers. The results of the quality assessment are shown in Table  7 . Two studies are excluded based on the results, leaving a total amount of 14 studies remaining for analysis.

3.4 Additional data

In order to provide an overview of the selected secondary studies, Table  8 shows the following data extracted from the articles: Research focus, number of primary studies included in the review, year of publication, paper type (conference/journal/book chapter), and source. In addition, an overview of the research questions of the secondary studies is provided, as shown in Table  9 . The reference numbers in Table  8 are used throughout the rest of this paper.

4 Results

This section addresses the results of the research questions derived from the 14 secondary studies. For each research question, tables with aggregate results, in-depth descriptions, and interpretations are presented. The selected secondary studies discuss specific sentiment analysis tasks. It is important to note that different tasks in sentiment analysis require different features and approaches. Therefore, a brief overview of each paper is presented first. Note that in-depth analysis and synthesis of the articles are presented later in this section.

Genc-Nayebi and Abran ( 2017 ) identify mobile app store opinion mining techniques. Their paper is mainly focused on statistical data mining techniques based on manual classification and correlation analysis. Some machine learning algorithms are discussed in the context of cross-domain analysis and app aspect-extraction. Some interesting challenges in sentiment analysis are proposed.

Al-Moslmi et al. (2017) review cross-domain sentiment analysis. Specific algorithms for cross-domain sentiment analysis are described.

Qazi et al. (2017) research opinion types and sentiment analysis. Opinion types are classified into the following three categories: regular, comparative, and suggestive. Several supervised machine learning techniques are used. Sentiment classification algorithms are mapped.

Ahmed Ibrahim and Salim ( 2013 ) perform sentiment analysis of Arabic tweets. Their study is focused on mapping features and techniques used for general sentiment analysis.

Shayaa et al. ( 2018 ) research the big data approach to sentiment analysis. A solid overview of machine learning methods and challenges is presented.

A. Kumar and Sharma ( 2017 ) research sentiment analysis for government intelligence. Techniques and datasets are mapped.

M. Ahmad et al. ( 2018 ) focus their research on SVM classification. SVM is the most used machine learning technique in sentiment classification.

A. Kumar and Jaiswal ( 2020 ) discuss soft computing techniques for sentiment analysis on Twitter. Soft computing techniques include machine learning techniques. Deep learning (CNN in particular) is mentioned as upcoming in recent articles. KPIs are described thoroughly.

A. Kumar and Garg ( 2019 ) research context-based sentiment analysis. They stress the importance of subjectivity in sentiment analysis and show that deep learning offers opportunities for context-based sentiment analysis.

Kasmuri and Basiron (2017) research subjectivity analysis. Its purpose is to determine whether a text is subjective or objective using subjective clues. Subjectivity analysis is a classification problem, and thus machine learning algorithms are widely used.

Madhala et al. ( 2018 ) research customer emotion analysis. They review articles that classify emotions from 4 to 51 different classes.

Mite-Baidal et al. (2018) research sentiment analysis in the education domain. E-learning is on the rise, and due to its online nature, lots of review data is generated on MOOC forums and social media.

Salah et al. (2019) research social media sentiment analysis. Mainly Twitter data is used because of its high dimensionality (e.g., retweets, location, number of followers) and structure.

De Oliveira Lima et al. ( 2018 ) research opinion mining of hotel reviews specifically aimed at sustainability practices aspects. Limited information on used features is available. The following sections dive into the different models that are used in sentiment analysis, including adopted features, approaches, and datasets.

4.1 RQ1 “What are the adopted features in sentiment analysis?”

Table 10 depicts the common input and output features that the articles present for each sentiment analysis approach. Checkmarks indicate that the features are explicitly discussed in the referred article. Traditional approaches commonly use the Bag-Of-Words (BOW) method. BOW counts the words, referred to as n-grams, in the text and creates a sparse vector with 1s for present words and 0s for absent words. These vectors are used as input to machine learning models. N-grams are sets of adjacent words that are combined into one feature; this way, the order of words is maintained when the text is vectorized. Part-Of-Speech (POS) tags provide feature tags for similar words with a different part of speech in the context. The term frequency-inverse document frequency (TF-IDF) method highlights words or word pairs that occur often in one document but are low in frequency in the entire text corpus. Negation is an important feature to include in lexicon-based approaches. Negation means contradicting or denying something, which can flip the polarity of an opinion or sentiment.
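The two most common input representations can be produced directly with scikit-learn's vectorizers; the documents and parameter choices below are illustrative. Including bigrams (e.g., "not good") is one simple way to keep some negation context in a BOW representation.

```python
# N-gram bag-of-words counts and TF-IDF weights as model inputs.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["not a good course", "a very good course", "not good, quite bad"]

bow = CountVectorizer(ngram_range=(1, 2))      # unigrams and bigrams
tfidf = TfidfVectorizer(ngram_range=(1, 2))

print(bow.fit_transform(docs).toarray().shape)   # (3, number of n-gram features)
print(bow.get_feature_names_out()[:5])           # e.g. ['bad', 'course', ...]
print(tfidf.fit_transform(docs).toarray().round(2))
```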

Word embeddings are often used as feature learning techniques in deep learning models. Word embeddings are dense vectors with real numbers for a word or sentence in the text based on the context of the word in the text corpus. This approach, although considered promising, is only discussed to a limited extent in the selected articles.

Output variables differ per sentiment analysis task. Output classes of the identified secondary studies are polarity, subjectivity, emotion classes, or spam identification. Polarity indicates the extent to which the input is considered positive or negative in sentiment. In most cases, the output is classified in a binary way, either positive or negative; some models include a neutral class as well. Multiple classes of polarity are shown to drastically reduce performance (Al-Moslmi et al. 2017) and are, therefore, not frequently used. One study (Madhala et al. 2018) focuses specifically on emotion classification, with up to 51 different classes of emotions. Some studies (Ahmed Ibrahim and Salim 2013; Kasmuri and Basiron 2017) include subjectivity analysis as part of sentiment analysis. Finally, spam detection is an important task in sentiment analysis, referring to the identification of reviews produced by illegitimate means. Examples of spam are untruthful opinions, reviews of the brand instead of the product, and non-reviews like advertisements and random questions or text (Jindal and Liu 2008).

A clear pattern exists in the use of input and output features. Traditional machine learning models commonly use unigrams and n-grams as input. Variable features are TF-IDF values and POS-tags. Not every feature extraction method is as effective in differing domains. Combinations of input features are often made to reach better performance. Word embeddings are upcoming input features. The most recent articles (Kumar and Garg 2019 ; Kumar and Jaiswal 2020 ) explicitly discuss them. Text classification with word embeddings as input is considered a promising technique that is often combined with deep learning methods like recurrent neural networks. The output shows a similar pattern with common and variable features. The common feature is polarity, and variable output features include emotions, subjectivity, and spam type.

4.2 RQ2 “What are the adopted approaches in sentiment analysis?”

Different tasks in sentiment analysis require different approaches. Therefore, it is important to note which task requires which approach. Table  11 shows the categories that are used throughout different sentiment analysis tasks.

Table 12 depicts the commonly used approaches for sentiment analysis per selected paper. Machine learning algorithms, including deep learning (DL), unsupervised learning, and ensemble learning, are widely used for sentiment analysis tasks, as are lexicon-based and hybrid methods. Checkmarks indicate that approaches are explicitly discussed in the referred article. The results are divided into five categories with specific subcategories; each category and its corresponding subcategories are described as follows:

4.2.1 Deep learning

Deep learning models are complex architectures that stack multiple layers of neural networks to progressively extract higher-level features from the input. CNN uses convolutional filters to recognize patterns in data; it is widely used in image recognition and, to a lesser extent, in NLP. RNN is designed to recognize sequential patterns and is especially powerful where context is critical, which makes it very promising for sentiment analysis. LSTM networks are a special kind of RNN that is capable of learning long-term context and dependencies; LSTM is especially powerful in NLP, where long-term dependencies are often important. The discussed deep learning algorithms are considered promising techniques that are able to boost the performance of NLP tasks (Socher et al. 2013).
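
As a generic illustration (not a model taken from any of the selected papers), the following Keras sketch assembles a small LSTM classifier for binary polarity over integer-encoded word sequences; the vocabulary size and sequence length are placeholder values:

```python
import tensorflow as tf

VOCAB_SIZE = 10_000   # placeholder vocabulary size
MAX_LEN = 100         # placeholder review length in tokens (after padding)

# Embedding learns dense word vectors, the LSTM models long-range context,
# and the sigmoid output yields a positive/negative polarity score.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, ...) would then take integer-encoded, padded
# sequences of shape (num_reviews, MAX_LEN) and 0/1 polarity labels.
```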

4.2.2 Traditional machine learning

Traditional ML algorithms are still widely used in all kinds of sentiment analysis tasks, including sentiment classification. While deep learning is a promising field, in many cases traditional ML performs sufficiently well, or even better than deep learning methods for a specific task, usually on smaller datasets. The traditional supervised machine learning algorithms are Support Vector Machines (SVM), Naive Bayes (NB), Neural Networks (NN), Logistic Regression (LogR), Maximum Entropy (ME), k-Nearest Neighbor (kNN), Random Forest (RF), and Decision Trees (DT).
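
A typical traditional pipeline, sketched below with a toy, invented training set, combines TF-IDF features with a linear SVM or a Naive Bayes classifier using scikit-learn:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative training set; real studies use thousands of labeled reviews.
texts = ["great course", "awful lectures", "clear and helpful", "boring and confusing"]
labels = ["pos", "neg", "pos", "neg"]

svm_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
nb_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())

svm_clf.fit(texts, labels)
nb_clf.fit(texts, labels)
print(svm_clf.predict(["very clear and helpful course"]))
```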

4.2.3 Lexicon-based

Lexicon-based learning is a traditional approach to sentiment analysis. Lexicon-based methods scan documents for words that express positive or negative feelings. The words are defined in a lexicon beforehand, so no training data is required for this approach.
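
The following toy scorer illustrates the idea with a hand-made four-word lexicon (real systems rely on published resources such as SentiWordNet or VADER) and a simple negation flip:

```python
# Toy sentiment lexicon; real systems use resources such as SentiWordNet or VADER.
LEXICON = {"great": 1.0, "good": 0.5, "terrible": -1.0, "boring": -0.5}
NEGATIONS = {"not", "no", "never"}

def lexicon_score(text):
    """Sum lexicon scores, flipping polarity when the previous token is a negation."""
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            weight = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                weight = -weight   # "not great" counts as negative
            score += weight
    return score

print(lexicon_score("the course was not great"))       # -1.0
print(lexicon_score("good examples but boring slides"))  # 0.0
```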

4.2.4 Hybrid models

In the context of sentiment classification, hybrid models combine the lexicon-based approach with machine learning techniques (Behera et al. 2016) to create a lexicon-enhanced classifier. Lexicons are used to define domain-related features that serve as input to a machine learning classifier.
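
One simple way to build such a lexicon-enhanced classifier, sketched here under the assumption of a small hand-made lexicon and invented training sentences, is to append a lexicon-derived score as an extra column to the TF-IDF feature matrix before training the machine learning classifier:

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

LEXICON = {"great": 1.0, "good": 0.5, "terrible": -1.0, "boring": -0.5}

def lexicon_score(text):
    # Simplified lexicon score (no negation handling) used as one extra feature.
    return sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())

texts = ["great course", "terrible pacing", "good examples", "boring lectures"]
labels = ["pos", "neg", "pos", "neg"]

# Generic surface features from TF-IDF ...
X_tfidf = TfidfVectorizer().fit_transform(texts)
# ... enhanced with one lexicon-derived column per document.
lex_col = csr_matrix([[lexicon_score(t)] for t in texts])
X_hybrid = hstack([X_tfidf, lex_col])

clf = LinearSVC().fit(X_hybrid, labels)
print(clf.predict(X_hybrid))
```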

4.2.5 Ensemble classification

The ensemble classification approach adopts multiple learning algorithms to obtain better performance (Behera et al. 2016). The three main types of ensemble methods are bagging (bootstrap aggregating), boosting, and stacking. Bagging trains homogeneous learners independently on data points randomly sampled from the training set and combines their outputs through a deterministic averaging process. Boosting trains homogeneous learners sequentially and adaptively, each one focusing on the errors of its predecessors, before combining them through an averaging process. Stacking trains heterogeneous classifiers in parallel and combines them with a meta-model that predicts the final output. An overview of ensemble classifiers is shown in Table 13.
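
The three ensemble flavours map onto off-the-shelf scikit-learn estimators; the sketch below (with an invented toy dataset) wires up a bagged SVM, a boosted tree ensemble, and a stacked combination of two heterogeneous classifiers:

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

texts = ["great course", "awful lectures", "clear and helpful", "boring and confusing"] * 5
labels = ["pos", "neg", "pos", "neg"] * 5

X = TfidfVectorizer().fit_transform(texts)

# Bagging: homogeneous learners trained on bootstrap samples, predictions averaged.
bagged_svm = BaggingClassifier(LinearSVC(), n_estimators=10)

# Boosting: homogeneous learners trained sequentially, each focusing on prior errors.
boosted = AdaBoostClassifier(n_estimators=50)

# Stacking: heterogeneous learners combined by a meta-classifier.
stacked = StackingClassifier(
    estimators=[("nb", MultinomialNB()), ("svm", LinearSVC())],
    final_estimator=LogisticRegression(),
)

for clf in (bagged_svm, boosted, stacked):
    clf.fit(X, labels)
```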

Support Vector Machines (SVM) is the dominant algorithm in the field of sentiment classification. All selected papers include SVM for classification purposes, and in most cases, this technique yields the best performance. Naive Bayes is the second most used algorithm and is praised for its high performance despite the simplicity of the technique. Besides these two dominant algorithms, methods like NN, LogR, ME, kNN, RF, and DT are used throughout different tasks of sentiment analysis. A popular unsupervised approach for aspect extraction is LDA. Hybrid approaches to sentiment classification have been effective by using domain-specific knowledge to create extra features that enhance the performance of the model. Ensemble and hybrid methods often improve the performance and reliability of predictions.

Deep learning algorithms are emerging techniques in sentiment analysis; RNNs in particular, and the more complex RNN architecture LSTM, are increasing in popularity. Even though deep learning is promising for increasing the performance of NLP and sentiment analysis models (Al-Moslmi et al. 2017; Kumar and Garg 2019; Kumar and Jaiswal 2020; Socher et al. 2013), the selected papers discuss deep learning only to a limited extent. The papers that do discuss deep learning algorithms are recent papers published in 2018 and 2019, which stresses that sentiment analysis is a timely research subject and that the state of the art is evolving rapidly. Figure 3 shows the year-wise distribution of selected articles. Except for one study from 2013, all selected studies were published in 2017, 2018, and 2019.

Figure 3: Publication dates of the selected articles

4.3 RQ3 “What domains have been addressed in the adopted data sets?”

Datasets for sentiment analysis typically consist of user-generated textual content. The text differs greatly depending on the domain and platform from which the content is derived. For example, social media data is usually very subjective and full of informal speech, whereas news articles are mostly objective and formally written. Twitter data is limited to a certain number of characters and contains hashtags and references, whereas product review websites consider a specific product and describe it in depth. ML models trained on a specific domain perform poorly when tested on a dataset from a different domain. Different domains involve different language use and, therefore, require different methods of analysis. Table 14 depicts the domains of the adopted datasets per study. Checkmarks indicate that datasets from the domain are explicitly mentioned in the referred article.

Social media data is the most widely used source of data. Such data is usually easy to obtain through APIs. Tweets in particular are popular because they are relatively similar in format (e.g., a limited number of characters). Twitter offers an API through which tweets can be retrieved for specific subjects, time ranges, hashtags, and so on. Tweets contain worldwide, real-time information about entities. Furthermore, retrieved tweets include metadata such as the location, the number of retweets, the number of likes, and much more. Some reviewed articles focus specifically on Twitter data (Ahmed Ibrahim and Salim 2013; Kumar and Jaiswal 2020). Other social media platforms, such as Facebook and Tumblr, are also used for sentiment analysis.

Reviews of products, hotels, and movies are also commonly used for text classification models. Reviews usually come with a star rating (i.e., a label), which makes them suitable for machine learning models: the star rating indicates polarity, so no labor-intensive manual labeling process or predefined lexicon is required.
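
Converting star ratings into polarity labels is straightforward; a common, though not universal, convention, sketched below with made-up reviews, is to treat four or five stars as positive, one or two stars as negative, and to discard three-star reviews as ambiguous:

```python
reviews = [
    ("Loved the hotel, spotless rooms", 5),
    ("Average stay, nothing special", 3),
    ("Dirty bathroom and rude staff", 1),
]

def rating_to_label(stars):
    """Map a 1-5 star rating to a polarity label; 3 stars is treated as ambiguous."""
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return None  # drop neutral/ambiguous ratings

labeled = [(text, rating_to_label(stars)) for text, stars in reviews
           if rating_to_label(stars) is not None]
print(labeled)
```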

4.4 RQ4 “What are the challenges and open problems with respect to sentiment analysis?”

All of the 14 selected papers include challenges and open problems in sentiment analysis. Table  15 shows the challenges that are explicitly described in the papers. These challenges are categorized and sorted by the number of selected papers that explicitly mention the challenge.

Domain dependency is a well-known challenge in sentiment analysis: most models depend on the domain for which they were built. Linguistic dependency is the second most frequently stated challenge and originates from the same underlying problem. Specific text corpora per domain or language need to be available for the ML model to perform optimally. Some studies investigate multi-lingual or multi-domain models.

Most papers use English text corpora. Spanish and Chinese are the second most used languages in sentiment analysis, and only limited literature is available for other languages. Some studies have attempted to create multi-language models (Al-Moslmi et al. 2017), but this is still a challenging task (Kumar and Garg 2019; Qazi et al. 2017). Multi-lingual systems are an interesting topic for further research.

Deep learning is a promising but complex technique in which syntactic structure and word order can be retained. It still poses some challenges and is not widely researched in the selected articles. Opinion spam or fake review detection is a prominent issue in sentiment analysis now that the internet has become an integral part of life and false information spreads just as fast as accurate information on the web (Vosoughi et al. 2018). Another major challenge is multi-class classification: in general, more output classes in a classifier reduce its performance (Al-Moslmi et al. 2017), and multiple polarity classes as well as multiple classes of emotions (Madhala et al. 2018) have been shown to dramatically reduce model performance.

Further challenges are incomplete information, implicit language, typos, slang, and all other kinds of inconsistencies in language use. Combining text with corresponding pictures, audio, and video is also challenging.

5 Discussion

The goal of this study is to present an overview of the current application of machine learning models and the corresponding challenges in sentiment analysis. This is done by critically analyzing the selected secondary studies and extracting the relevant data with respect to the predefined research questions. This tertiary study follows the guidelines proposed by Kitchenham and Charters (2007) for conducting systematic literature reviews. The study initially selected 16 secondary studies; after the quality assessment, 14 secondary papers remained for data extraction. The research methodology is transparent and designed in such a way that it can be reproduced by other researchers. Like any secondary study, this tertiary study also has some limitations.

The SLRs included in this study each have their own research focus within sentiment analysis. Even though the methodologies of the 14 secondary studies are similar, the documentation of techniques and methods differs considerably, and some SLR papers are more comprehensive than others. This made the data extraction process harder and prone to mistakes. Another limitation concerns the selection process: the inclusion criteria are restricted to SLR and SMS papers. Some other studies chose to include non-systematic literature reviews as well to complement their results, but we did not include traditional survey papers because they do not systematically synthesize the papers in a field.

The first threat to validity is related to the inclusion criteria for methods in research questions. Checkmarks in the tables of RQ2, RQ3, and RQ4 are placed when something is explicitly mentioned in the referred paper. The included secondary studies have their specific research focus with different sentiment analysis tasks and corresponding machine learning approaches. For instance, Kasmuri and Basiron ( 2017 ) discuss subjectivity classification, which typically uses different approaches compared to other sentiment analysis tasks. This variation in research focus influences the checkmarks placed in the tables.

Another threat related to the inclusion criteria is that some secondary studies include more papers than others. For example, Kumar and Sharma (2017) included 194 primary studies, whereas Mite-Baidal et al. (2018) included only eight. Papers with a higher number of included primary articles are likely to mention more techniques and challenges, and thus receive more checkmarks in the tables than papers with a lower number of included primary articles.

Lastly, this tertiary study only considers the selected secondary papers and does not consult the primary papers selected by the secondary papers. If any mistakes are made in the documentation of results in the secondary articles, these mistakes will be reflected in this study as well.

6 Conclusion and future work

This study provides the results of a tertiary study on sentiment analysis methods, whereby we aimed to highlight the adopted (input/output) features, the adopted approaches, the adopted datasets, and the challenges with respect to sentiment analysis. The answers to the research questions were derived from an in-depth analysis of the selected secondary studies.

A number of different input and output features could be identified. Interestingly, some features are described in all the secondary studies, while other features are specific to a smaller set of secondary studies. The results further indicate that sentiment analysis has been applied in various domains, among which social media is the most popular. The study also showed that different domains require the use of different techniques.

There also seems to be a trend toward using more complex deep learning techniques, since they can detect more complex patterns in text and perform particularly well on larger datasets. In some use cases, for example advertisement, the slight improvements in performance that deep learning can provide may have a great impact. However, it should be noted that traditional machine learning models are less computationally expensive and perform sufficiently well for many sentiment analysis tasks; they are widely praised for their performance and efficiency.

This study showed that the most prominent challenges in sentiment analysis are domain and language dependency: specific text corpora are required for each language and domain of interest. Attempts at cross-domain and multi-lingual sentiment analysis models have been made, but this challenging task should be explored further. Other prominent challenges are opinion spam detection and the application of deep learning to sentiment analysis tasks. Overall, the study shows that sentiment analysis is a timely and important research topic. The adoption of a tertiary study provided additional value that could not be derived from any single secondary study.

The following future directions and challenges have also been mainly discussed in deep learning-based survey papers: New datasets are required for more challenging tasks, common sense knowledge must be modeled, interpretable deep learning-based models must be developed, and memory-efficient models are required (Minaee et al. 2020 ). Domain adaptation techniques are needed, multi-lingual applications should be addressed, technical requirements such as a huge amount of labeled data requirement must be considered, and linguistic complications must be investigated (Do et al. 2019 ). Popular deep learning techniques such as deep reinforcement learning and generative adversarial networks can be evaluated to solve some challenging tasks, advantages of the BERT algorithm can be considered, language structures (e.g., slangs) can be investigated in detail, dynamic sentiment analysis can be studied, and sentiment analysis for heterogeneous data can be implemented (Habimana et al. 2020a ). Dependency trees in recursive neural networks can be investigated, domain adaptation can be analyzed in detail, and linguistic-subjective phenomena (e.g., irony and sarcasm) can be studied (Rojas-Barahona 2016 ). Different applications of sentiment analysis (e.g., medical domain and security screening of employees) can be implemented, and transfer learning approaches can be analyzed for sentiment classification (Yadav and Vishwakarma 2019). Comparative studies should be extended with new approaches and new datasets, and also hybrid approaches to reduce computational cost and improve performance must be developed (Dang et al. 2020 ).

Abid F, Li C, Alam M (2020) Multi-source social media data sentiment analysis using bidirectional recurrent convolutional neural networks. Comput Commun 157:102–115


Ahmad M, Aftab S, Ali I, Hameed N (2017) Hybrid tools and techniques for sentiment analysis: a review 8(4):7

Ahmad M, Aftab S, Bashir MS, Hameed N (2018) Sentiment analysis using SVM: a systematic literature review. Int J Adv Comput Sci Appl 9(2):182–188 ( Scopus )


Ahmed Ibrahim M, Salim N (2013) Opinion analysis for twitter and Arabic tweets: a systematic literature review. J Theor Appl Inf Technol 56(3):338–348 ( Scopus )

Alam M, Abid F, Guangpei C, Yunrong LV (2020) Social media sentiment analysis through parallel dilated convolutional neural network for smart city applications. Comput Commun 154:129–137

Alarifi A, Tolba A, Al-Makhadmeh Z, Said W (2020) A big data approach to sentiment analysis using greedy feature selection with cat swarm optimization-based long short-term memory neural networks. J Supercomput 76(6):4414–4429

Alexandridis G, Michalakis K, Aliprantis J, Polydoras P, Tsantilas P, Caridakis G (2020) A deep learning approach to aspect-based sentiment prediction. In: IFIP International conference on artificial intelligence applications and innovations. Springer, Cham, pp 397–408

Al-Moslmi T, Omar N, Abdullah S, Albared M (2017) Approaches to cross-domain sentiment analysis: a systematic literature review. IEEE Access 5:16173–16192 ( Scopus )

Almotairi M (2009) A framework for successful CRM implementation. In: European and Mediterranean conference on information systems. pp 1–14

Aslam A, Qamar U, Saqib P, Ayesha R, Qadeer A (2020) A novel framework for sentiment analysis using deep learning. In: 2020 22nd International conference on advanced communication technology (ICACT). IEEE, pp 525–529

Basiri ME, Abdar M, Cifci MA, Nemati S, Acharya UR (2020) A novel method for sentiment classification of drug reviews using fusion of deep and machine learning techniques. Knowl-Based Syst 198:1–19

Becker JU, Greve G, Albers S (2009) The impact of technological and organizational implementation of CRM on customer acquisition, maintenance, and retention. Int J Res Mark 26(3):207–215

Behera RN, Manan R, Dash S (2016) Ensemble based hybrid machine learning approach for sentiment classification-a review. Int J Comput Appl 146(6):31–36

Beseiso M, Elmousalami H (2020) Subword attentive model for arabic sentiment analysis: a deep learning approach. ACM Trans Asian Low-Resour Lang Inf Process (TALLIP) 19(2):1–17

Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. Proc Eleventh Ann Conf Comput Learn Theory COLT’ 98:92–100


Bondielli A, Marcelloni F (2019) A survey on fake news and rumour detection techniques. Inf Sci 497:38–55

Budgen D, Brereton P, Drummond S, Williams N (2018) Reporting systematic reviews: some lessons from a tertiary study. Inf Softw Technol 95:62–74

Cadavid H, Andrikopoulos V, Avgeriou P (2020) Architecting systems of systems: a tertiary study. Inf Softw Technol 118(106202):1–18

Cai Y, Huang Q, Lin Z, Xu J, Chen Z, Li Q (2020) Recurrent neural network with pooling operation and attention mechanism for sentiment analysis: a multi-task learning approach. Knowl-Based Syst 203(105856):1–12

Cambria E (2016) Affective computing and sentiment analysis. IEEE Intell Syst 31(2):102–107

Cambria E, Hussain A, Havasi C, Eckl C (2009) Common sense computing: from the society of mind to digital intuition and beyond. In: European workshop on biometrics and ıdentity management. Springer, Berlin, Heidelberg, pp 252–259

Cambria E, Poria S, Gelbukh A, Thelwall M (2017) Sentiment analysis is a big suitcase. IEEE Intell Syst 32(6):74–80

Cambria E, Li Y, Xing FZ, Poria S, Kwok K (2020) SenticNet 6: ensemble application of symbolic and subsymbolic AI for sentiment analysis. In: Proceedings of the 29th ACM International conference on information & knowledge management. pp 105–114

Can EF, Ezen-Can A, Can F (2018) Multi-lingual sentiment analysis: an RNN-based framework for limited data. In: Proceedings of ACM SIGIR 2018 workshop on learning from limited or noisy data, Ann Arbor

Catal C, Mishra D (2013) Test case prioritization: a systematic mapping study. Softw Qual J 21(3):445–478

Chandra Y, Jana A (2020) Sentiment analysis using machine learning and deep learning. In: 2020 7th International conference on computing for sustainable global development (INDIACom). IEEE, pp. 1–4

Chapelle O, Zien A (2005) Semi-supervised classification by low density separation. In: AISTATS vol 2005, pp 57–64

Che S, Li X (2020) HCI with DEEP learning for sentiment analysis of corporate social responsibility report. Curr Psychol. https://doi.org/10.1007/s12144-020-00789-y

Chen IJ, Popovich K (2003) Understanding customer relationship management (CRM). Bus Process Manag J 9(5):672–688. https://doi.org/10.1108/14637150310496758

Chen L, Chen G, Wang F (2015) Recommender systems based on user reviews: the state of the art. User Model User-Adap Inter 25(2):99–154. https://doi.org/10.1007/s11257-015-9155-5

Chen H, Sun M, Tu C, Lin Y, Liu Z (2016) Neural sentiment classification with user and product attention. In: Proceedings of the 2016 conference on empirical methods in natural language processing. pp 1650–1659

Chen P, Sun Z, Bing L, Yang W (2017) Recurrent attention network on memory for aspect sentiment analysis. In: Proceedings of the 2017 conference on empirical methods in natural language processing. pp 452–461

Chen H, Liu J, Lv Y, Li MH, Liu M, Zheng Q (2018) Semi-supervised clue fusion for spammer detection in Sina Weibo. Inf Fusion 44:22–32. https://doi.org/10.1016/j.inffus.2017.11.002

Cheng Y, Yao L, Xiang G, Zhang G, Tang T, Zhong L (2020) Text sentiment orientation analysis based on multi-channel CNN and bidirectional GRU with attention mechanism. IEEE Access 8:134964–134975

Choi Y, Cardie C (2008) Learning with compositional semantics as structural inference for subsentential sentiment analysis. In: Proceedings of the 2008 conference on empirical methods in natural language processing. pp 793–801

Colón-Ruiz C, Segura-Bedmar I (2020) Comparing deep learning architectures for sentiment analysis on drug reviews. J Biomed Inform 110(103539):1–11

Crawford M, Khoshgoftaar TM, Prusa JD, Richter AN, Al Najada H (2015) Survey of review spam detection using machine learning techniques. J Big Data 2(1):23. https://doi.org/10.1186/s40537-015-0029-9

Cruzes DS, Dybå T (2011) Research synthesis in software engineering: a tertiary study. Inf Softw Technol 53(5):440–455

Curcio K, Santana R, Reinehr S, Malucelli A (2019) Usability in agile software development: a tertiary study. Comput Stand Interfaces 64:61–77. https://doi.org/10.1016/j.csi.2018.12.003

Da Silva NFF, Coletta LFS, Hruschka ER, Hruschka ER Jr (2016a) Using unsupervised information to improve semi-supervised tweet sentiment classification. Inf Sci 355:348–365. https://doi.org/10.1016/j.ins.2016.02.002

Dang NC, Moreno-García MN, De la Prieta F (2020) Sentiment analysis based on deep learning: a comparative study. Electronics 9(3):483

Dashtipour K, Gogate M, Li J, Jiang F, Kong B, Hussain A (2020) A hybrid persian sentiment analysis framework: Integrating dependency grammar based rules and deep neural networks. Neurocomputing 380:1–10

Da’u A, Salim N, Rabiu I, Osman A (2020a) Recommendation system exploiting aspect-based opinion mining with deep learning method. Inf Sci 512:1279–1292

Da’u A, Salim N, Rabiu I, Osman A (2020b) Weighted aspect-based opinion mining using deep learning for recommender system. Expert Syst Appl 140(112871):1–12

De Oliveira Lima T, Colaco Junior M, Nunes MASN (2018) Mining on line general opinions about sustainability of hotels: a systematic literature mapping. In: Gervasi O, Murgante B, Misra S, Stankova E, Torre CM, Rocha AMAC, Taniar D, Apduhan BO, Tarantino E, Ryu Y (eds) Computational science and ıts applications–ICCSA 2018. Springer, New York, pp 558–574


Dessí D, Dragoni M, Fenu G, Marras M, Recupero DR (2020) Deep learning adaptation with word embeddings for sentiment analysis on online course reviews. deep learning-based approaches for sentiment analysis. Springer, Singapore, pp 57–83

Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies vol 1 (Long and Short Papers). pp 4171–4186

Dietterich TG (2002) Machine learning for sequential data: a review. In: Caelli T, Amin A, Duin RPW, de Ridder D, Kamel M (eds) Structural, syntactic, and statistical pattern recognition. Springer, Berlin Heidelberg, pp 15–30

Do HH, Prasad PWC, Maag A, Alsadoon A (2019) Deep learning for aspect-based sentiment analysis: a comparative review. Expert Syst Appl 118:272–299

Dong M, Li Y, Tang X, Xu J, Bi S, Cai Y (2020a) Variable convolution and pooling convolutional neural network for text sentiment classification. IEEE Access 8:16174–16186

Dong Y, Fu Y, Wang L, Chen Y, Dong Y, Li J (2020b) A sentiment analysis method of capsule network based on BiLSTM. IEEE Access 8:37014–37020

Duan J, Luo B, Zeng J (2020) Semi-supervised Learning with generative model for sentiment classification of stock messages. Expert Syst Appl 158(113540):1–9

Ebrahimi M, Yazdavar AH, Sheth A (2017) Challenges of sentiment analysis for dynamic events. IEEE Intell Syst 32(5):70–75

Elmuti D, Jia H, Gray D (2009) Customer relationship management strategic application and organizational effectiveness: An empirical investigation. J Strateg Mark 17(1):75–96. https://doi.org/10.1080/09652540802619301

Filatova, E. (2012). Irony and sarcasm: corpus generation and analysis using crowdsourcing. In: Lrec, pp 392–398

Gan C, Wang L, Zhang Z, Wang Z (2020a) Sparse attention based separable dilated convolutional neural network for targeted sentiment analysis. Knowl-Based Syst 188(104827):1–10

Gan C, Wang L, Zhang Z (2020b) Multi-entity sentiment analysis using self-attention based hierarchical dilated convolutional neural network. Future Gener Comput Syst 112:116–125

Genc-Nayebi N, Abran A (2017) A systematic literature review: opinion mining studies from mobile app store user reviews. J Syst Softw 125:207–219. https://doi.org/10.1016/j.jss.2016.11.027

Ghorbani M, Bahaghighat M, Xin Q, Özen F (2020) ConvLSTMConv network: a deep learning approach for sentiment analysis in cloud computing. J Cloud Comput 9(1):1–12

Gieseke F, Airola A, Pahikkala T, Oliver K (2012) Sparse quasi-Newton optimization for semi-supervised support vector machines. In: Proceedings of the 1st international conference on pattern recognition applications and methods. pp 45–54. https://doi.org/10.5220/0003755300450054

Giménez M, Palanca J, Botti V (2020) Semantic-based padding in convolutional neural networks for improving the performance in natural language processing. a case of study in sentiment analysis. Neurocomputing 378:315–323

Gneiser MS (2010) Value-Based CRM. Bus Inf Syst Eng 2(2):95–103. https://doi.org/10.1007/s12599-010-0095-7

Goldberg AB, Zhu X (2006) Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. In: Proceedings of textgraphs: the first workshop on graph based methods for natural language processing. pp 45–52

Goulão M, Amaral V, Mernik M (2016) Quality in model-driven engineering: a tertiary study. Softw Qual J 24(3):601–633. https://doi.org/10.1007/s11219-016-9324-8

Gu T, Xu G, Luo J (2020) Sentiment analysis via deep multichannel neural networks with variational information bottleneck. IEEE Access 8:121014–121021

Gupta R, Sahu S, Espy-Wilson C, Narayanan S (2018) Semi-supervised and transfer learning approaches for low resource sentiment classification. In: 2018 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5109–5113

Habimana O, Li Y, Li R, Gu X, Yu G (2020a) Sentiment analysis using deep learning approaches: an overview. Sci China Inf Sci 63(1):1–36

Habimana O, Li Y, Li R, Gu X, Yan W (2020b) Attentive convolutional gated recurrent network: a contextual model to sentiment analysis. Int J Mach Learn Cybern 11:2637–2651

Hameed Z, Garcia-Zapirain B (2020) Sentiment classification using a single-layered BiLSTM model. IEEE Access 8:73992–74001

Han Y, Liu Y, Jin Z (2020a) Sentiment analysis via semi-supervised learning: a model based on dynamic threshold and multi-classifiers. Neural Comput Appl 32(9):5117–5129

Han Y, Liu M, Jing W (2020b) Aspect-level drug reviews sentiment analysis based on double BiGRU and knowledge transfer. IEEE Access 8:21314–21325

Haralabopoulos G, Anagnostopoulos I, McAuley D (2020) Ensemble deep learning for multilabel binary classification of user-generated content. Algorithms 13(4):83

Hassan R, Islam MR (2019) Detection of fake online reviews using semi-supervised and supervised learning. In: 2019 International conference on electrical, computer and communication engineering (ECCE). pp 1–5. https://doi.org/10.1109/ECACE.2019.8679186

Hemmatian F, Sohrabi MK (2017) A survey on classification techniques for opinion mining and sentiment analysis. Artif Intell Rev 52(3):1495–1545. https://doi.org/10.1007/s10462-017-9599-6

Huang M, Xie H, Rao Y, Feng J, Wang FL (2020b) Sentiment strength detection with a context-dependent lexicon-based convolutional neural network. Inf Sci 520:389–399

Huang F, Wei K, Weng J, Li Z (2020a) Attention-based modality-gated networks for image-text sentiment analysis. ACM Trans Multimed Comput Commun Appl (TOMM) 16(3):1–19

Huang M, Xie H, Rao Y, Liu Y, Poon LK, Wang FL (2020c) Lexicon-based sentiment convolutional neural networks for online review analysis. IEEE Trans Affect Comput (Early Access), 1–1

Hung BT (2020) Domain-specific versus general-purpose word representations in sentiment analysis for deep learning models. Frontiers in intelligent computing: theory and applications. Springer, Singapore, pp 252–264

Hung BT (2020) Integrating sentiment analysis in recommender systems. Reliability and statistical computing. Springer, Cham, pp 127–137

Hussain A, Cambria E (2018) Semi-supervised learning for big social data analysis. Neurocomputing 275:1662–1673

Ishaya T, Folarin M (2012) A service oriented approach to business intelligence in telecoms industry. Telemat Inform 29(3):273–285. https://doi.org/10.1016/j.tele.2012.01.004

Ji C, Wu H (2020) Cascade architecture with rhetoric long short-term memory for complex sentence sentiment analysis. Neurocomputing 405:161–172

Jia Z, Bai X, Pang S (2020) Hierarchical gated deep memory network with position-aware for aspect-based sentiment analysis. IEEE Access 8:136340–136347

Jiang T, Wang J, Liu Z, Ling Y (2020) Fusion-extraction network for multimodal sentiment analysis. Pacific-Asia conference on knowledge discovery and data mining. Springer, Cham, pp 785–797

Jin N, Wu J, Ma X, Yan K, Mo Y (2020) Multi-task learning model based on multi-scale CNN and LSTM for sentiment classification. IEEE Access 8:77060–77072

Josiassen A, Assaf AG, Cvelbar LK (2014) CRM and the bottom line: do all CRM dimensions affect firm performance? Int J Hosp Manag 36:130–136. https://doi.org/10.1016/j.ijhm.2013.08.005

Kabra A, Shrawne S (2020) Location-wise news headlines classification and sentiment analysis: a deep learning approach. International conference on intelligent computing and smart communication 2019. Springer, Singapore, pp 383–391

Kamal A (2013) Subjectivity classification using machine learning techniques for mining feature-opinion pairs from web opinion sources 10(5):191–200

Kamal N, Andrew M, Tom M (2006) Semi-supervised text classification using EM. In: Chapelle O, Scholkopf B, Zien A (eds) Semi-supervised learning. The MIT Press, Cambridge, pp 32–55. https://doi.org/10.7551/mitpress/9780262033589.003.0003

Kansara D, Sawant V (2020) Comparison of traditional machine learning and deep learning approaches for sentiment analysis. Advanced computing technologies and applications. Springer, Singapore, pp 365–377

Karimpour J, Noroozi AA, Alizadeh S (2012) Web spam detection by learning from small labeled samples. Int J Comput Appl 50(21):1–5. https://doi.org/10.5120/7924-0993

Kasmuri E, Basiron H (2017) Subjectivity analysis in opinion mining—a systematic literature review. Int J Adv Soft Comput Appl 9(3):132–159 ( Scopus )

Khan M, Malviya A (2020) Big data approach for sentiment analysis of twitter data using Hadoop framework and deep learning. In: 2020 International conference on emerging trends in information technology and engineering (ic-ETITE). IEEE, pp 1–5

Khedkar S, Shinde S (2020) Deep learning and ensemble approach for praise or complaint classification. Procedia Comput Sci 167:449–458

Khedkar S, Shinde S (2020a) Deep learning-based approach to classify praises or complaints. In: Proceedings of the international conference on computational science and applications: ICCSA 2019. Springer, New York, p 391

Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). pp 1746–1751

Kim H-S, Kim Y-G (2009) A CRM performance measurement framework: Its development process and application. Ind Mark Manag 38(4):477–489. https://doi.org/10.1016/j.indmarman.2008.04.008

Kiran R, Kumar P, Bhasker B (2020) OSLCFit (organic simultaneous LSTM and CNN fit): a novel deep learning based solution for sentiment polarity classification of reviews. Expert Syst Appl 157(113488):1–12

Kitchenham B (2004) Procedures for performing systematic reviews, vol 33. Keele University, Keele, UK, pp 1–26

Kitchenham B, Charters S (2007) Guidelines for performing systematic literature reviews in software engineering version 2.3. Engineering, 45(4ve), p 1051

Kitchenham BA, Dyba T, Jorgensen M (2004) Evidence-based software engineering. In: Proceedings 26th international conference on software engineering. IEEE, pp 273–281

Kitchenham B, Pretorius R, Budgen D, Pearl Brereton O, Turner M, Niazi M, Linkman S (2010) Systematic literature reviews in software engineering–a tertiary study. Inf Softw Technol 52(8):792–805. https://doi.org/10.1016/j.infsof.2010.03.006

Kitchenham BA, Budgen D, Brereton OP (2010b) The value of mapping studies–a participant-observer case study. In: 14th international conference on evaluation and assessment in software engineering (ease). pp 1–9

Kitchenham BA, Budgen D, Pearl Brereton O (2011) Using mapping studies as the basis for further research–a participant-observer case study. Inf Softw Technol 53(6):638–651. https://doi.org/10.1016/j.infsof.2010.12.011

Koksal O, Tekinerdogan B (2017) Feature-driven domain analysis of session layer protocols of internet of things. IEEE Int Congr Internet Things (ICIOT) 2017:105–112. https://doi.org/10.1109/IEEE.ICIOT.2017.19

Krouska A, Troussas C, Virvou M (2020) Deep learning for twitter sentiment analysis: the effect of pre-trained word embedding. Machine learning paradigms. Springer, Cham, pp 111–124

Kula S, Choraś M, Kozik R, Ksieniewicz P, Woźniak M (2020) Sentiment analysis for fake news detection by means of neural networks. International conference on computational science. Springer, Cham, pp 653–666

Kumar V (2010) Customer relationship management. In: Wiley international encyclopedia of marketing. American Cancer Society, Georgia. https://doi.org/10.1002/9781444316568.wiem01015

Kumar A, Garg G (2019) Systematic literature review on context-based sentiment analysis in social multimedia. Multimed Tools Appl 79:15349–15380

Kumar R, Garg S (2020) Aspect-based sentiment analysis using deep learning convolutional neural network. Information and communication technology for sustainable development. Springer, Singapore, pp 43–52

Kumar A, Jaiswal A (2020) Systematic literature review of sentiment analysis on Twitter using soft computing techniques. Concurr Comput Pract Exp 32(1):e5107

Kumar NS, Malarvizhi N (2020) Bi-directional LSTM–CNN combined method for sentiment analysis in part of speech tagging (PoS). Int J Speech Technol 23:373–380

Kumar V, Reinartz W (2016) Creating enduring customer value. J Mark 80(6):36–68. https://doi.org/10.1509/jm.15.0414

Kumar A, Sebastian TM (2012) Sentiment analysis: a perspective on its past, present and future. Int J Intell Syst Appl 4(10):1–14. https://doi.org/10.5815/ijisa.2012.10.01

Kumar A, Sharan A (2020) Deep learning-based frameworks for aspect-based sentiment analysis. Deep learning-based approaches for sentiment analysis. Springer, Singapore, pp 139–158

Kumar A, Sharma A (2017) Systematic literature review on opinion mining of big data for government intelligence. Webology 14(2):6–47 ( Scopus )

Kumar R, Pannu HS, Malhi AK (2020) Aspect-based sentiment analysis using deep networks and stochastic optimization. Neural Comput Appl 32(8):3221–3235

Kumar A, Srinivasan K, Cheng WH, Zomaya AY (2020) Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data. Inf Process Manag 57(1):102141

Ładyżyński P, Żbikowski K, Gawrysiak P (2019) Direct marketing campaigns in retail banking with the use of deep learning and random forests. Expert Syst Appl 134:28–35. https://doi.org/10.1016/j.eswa.2019.05.020

Lai Y, Zhang L, Han D, Zhou R, Wang G (2020) Fine-grained emotion classification of Chinese microblogs based on graph convolution networks. World Wide Web 23(5):2771–2787

Li G, Liu F (2014) Sentiment analysis based on clustering: a framework in improving accuracy and recognizing neutral opinions. Appl Intell 40(3):441–452. https://doi.org/10.1007/s10489-013-0463-3

Li F, Huang M, Yang Y, Zhu X (2011) Learning to identify review spam. In: Proceedings of the twenty-second international joint conference on artificial intelligence, vol 3. pp 2488–2493

Li W, Zhu L, Shi Y, Guo K, Zheng Y (2020) User reviews: Sentiment analysis using lexicon integrated two-channel CNN-LSTM family models. Appl Soft Comput 94(106435):1–11

Li L, Goh TT, Jin D (2020) How textual quality of online reviews affect classification performance: a case of deep learning sentiment analysis. Neural Comput Appl 32(9):4387–4415

Li D, Rzepka R, Ptaszynski M, Araki K (2020) HEMOS: a novel deep learning-based fine-grained humor detecting method for sentiment analysis of social media. Inf Process Manag 57(6):102290

Lim WL, Ho CC, Ting CY (2020) Tweet sentiment analysis using deep learning with nearby locations as features. Computational science and technology. Springer, Singapore, pp 291–299

Lin Y, Li J, Yang L, Xu K, Lin H (2020) Sentiment analysis with comparison enhanced deep neural network. IEEE Access 8:78378–78384

Ling M, Chen Q, Sun Q, Jia Y (2020) Hybrid neural network for sina weibo sentiment analysis. IEEE Trans Comput Soc Syst 7(4):983–990

Liu B (2020) Text sentiment analysis based on CBOW model and deep learning in big data environment. J Ambient Intell Humaniz Comput 11(2):451–458

Liu Q, Mukaidani H (2020) Effective-target representation via LSTM with attention for aspect-level sentiment analysis. In: 2020 international conference on artificial intelligence in information and communication (ICAIIC). IEEE, pp 336–340

Liu N, Shen B (2020) Aspect-based sentiment analysis with gated alternate neural network. Knowl-Based Syst 188(105010):1–14

Liu N, Shen B (2020) ReMemNN: a novel memory neural network for powerful interaction in aspect-based sentiment analysis. Neurocomputing 395:66–77

Lou Y, Zhang Y, Li F, Qian T, Ji D (2020) Emoji-based sentiment analysis using attention networks. ACM Trans Asian Low-Resour Lang Inf Process (TALLIP) 19(5):1–13

Lu Q, Zhu Z, Zhang D, Wu W, Guo Q (2020) Interactive rule attention network for aspect-level sentiment analysis. IEEE Access 8:52505–52516

Lu G, Zhao X, Yin J, Yang W, Li B (2020) Multi-task learning using variational auto-encoder for sentiment classification. Pattern Recogn Lett 132:115–122

Luo J, Huang S, Wang R (2020) A fine-grained sentiment analysis of online guest reviews of economy hotels in China. J Hosp Mark Manag 1–25

Ma Y, Peng H, Cambria E (2018) Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. In: Proceedings of the AAAI conference on artificial intelligence, vol 32. pp 5876–5883

Madhala P, Jussila J, Aramo-Immonen H, Suominen A (2018) Systematic literature review on customer emotions in social media. In: ECSM 2018 5th European conference on social media. Academic Conferences and publishing limited, South Oxfordshire, pp 154–162

Maglogiannis IG (ed) (2007) Emerging artificial intelligence applications in computer engineering: real word ai systems with applications in ehealth, hci, information retrieval and pervasive technologies, vol 160. Ios Press, Amsterdam

Mahmood Z, Safder I, Nawab RMA, Bukhari F, Nawaz R, Alfakeeh AS, Hassan SU (2020) Deep sentiments in Roman Urdu text using recurrent convolutional neural network model. Inf Process Manag 57(4):102233

Majumder N, Poria S, Peng H, Chhaya N, Cambria E, Gelbukh A (2019) Sentiment and sarcasm classification with multitask learning. IEEE Intell Syst 34(3):38–43

Meškelė D, Frasincar F (2020) ALDONAr: a hybrid solution for sentence-level aspect-based sentiment analysis using a lexicalized domain ontology and a regularized neural attention model. Inf Process Manag 57(3):102211

Minaee S, Azimi E, Abdolrashidi A (2019) Deep-Sentiment: sentiment analysis using ensemble of CNN and Bi-LSTM models. arXiv preprint arXiv:1904.04206

Minaee S, Kalchbrenner N, Cambria E, Nikzad N, Chenaghlu M, Gao J (2020) Deep learning based text classification: a comprehensive review. arXiv preprint arXiv:2004.03705, pp 1–43

Mite-Baidal K, Delgado-Vera C, Solís-Avilés E, Espinoza AH, Ortiz-Zambrano J, Varela-Tapia E (2018) Sentiment analysis in education domain: a systematic literature review. Commun Comput Inf Sci 883:285–297. https://doi.org/10.1007/978-3-030-00940-3_21 ( Scopus )

Naseem U, Razzak I, Musial K, Imran M (2020) Transformer based deep intelligent contextual embedding for twitter sentiment analysis. Future Gener Comput Syst 113:58–69

Nguyen TH, Shirai K (2015) Topic modeling based sentiment analysis on social media for stock market prediction. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing, vol 1. pp 1354–1364. https://doi.org/10.3115/v1/P15-1131

Nigam K, Ghani R (2000) Analyzing the effectiveness and applicability of co-training. In: Proceedings of the ninth international conference on information and knowledge management (CIKM '00). pp 86–93. https://doi.org/10.1145/354756.354805

Nurdiani I, Börstler J, Fricker SA (2016) The impacts of agile and lean practices on project constraints: a tertiary study. J Syst Softw 119:162–183

Ombabi AH, Ouarda W, Alimi AM (2020) Deep learning CNN–LSTM framework for Arabic sentiment analysis using textual information shared in social networks. Soc Netw Anal Min 10(1):1–13

Onan A (2020) Mining opinions from instructor evaluation reviews: a deep learning approach. Comput Appl Eng Edu 28(1):117–138

Onan A (2020a) Sentiment analysis on massive open online course evaluations: A text mining and deep learning approach. Comput Appl Eng Edu 1–18

Onan A (2020b) Sentiment analysis on product reviews based on weighted word embeddings and deep neural networks. Concurr Comput Pract Exp e5909

Ott M, Choi Y, Cardie C, Hancock JT (2011) Finding deceptive opinion spam by any stretch of the imagination. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies vol 1. pp 309–319

Pan Y, Liang M (2020) Chinese text sentiment analysis based on BI-GRU and self-attention. In: 2020 IEEE 4th information technology, networking, electronic and automation control conference (ITNEC), vol 1. IEEE, pp 1983–1988

Parimala M, Swarna Priya RM, Praveen Kumar Reddy M, Lal Chowdhary C, Kumar Poluru R, Khan S (2020) Spatiotemporal‐based sentiment analysis on tweets for risk assessment of event using deep learning approach. Softw Pract Exp 1–21

Park HJ, Song M, Shin KS (2020) Deep learning models and datasets for aspect term sentiment classification: implementing holistic recurrent attention on target-dependent memories. Knowl-Based Syst 187(104825):1–15

Patel P, Patel D, Naik C (2020) Sentiment analysis on movie review using deep learning RNN method. Intelligent data engineering and analytics. Springer, Singapore, pp 155–163

Pavlinek M, Podgorelec V (2017) Text classification method based on self-training and LDA topic models. Expert Syst Appl 80:83–93. https://doi.org/10.1016/j.eswa.2017.03.020

Payne A, Frow P (2005) A strategic framework for customer relationship management. J Mark 69(4):167–176. https://doi.org/10.1509/jmkg.2005.69.4.167

Peng M, Zhang Q, Jiang Y, Huang X (2018) Cross-domain sentiment classification with target domain specific information. In: Proceedings of the 56th annual meeting of the association for computational linguistics, vol 1. pp 2505–2513. https://doi.org/10.18653/v1/P18-1233

Peng H, Xu L, Bing L, Huang F, Lu W, Si L (2020) Knowing what, how and why: a near complete solution for aspect-based sentiment analysis. In AAAI. pp 8600–8607

Pergola G, Gui L, He Y (2019) TDAM: a topic-dependent attention model for sentiment analysis. Inf Process Manag 56(6):102084

Petersen K, Feldt R, Mujtaba S, Mattsson M (2008) Systematic mapping studies in software engineering. In: 12th international conference on evaluation and assessment in software engineering (EASE) 12. pp 1–10

Phillips-Wren G, Hoskisson A (2015) An analytical journey towards big data. J Decis Syst 24(1):87–102. https://doi.org/10.1080/12460125.2015.994333

Poria S, Chaturvedi I, Cambria E, Bisio F (2016) Sentic LDA: improving on LDA with semantic similarity for aspect-based sentiment analysis. In: 2016 international joint conference on neural networks (IJCNN). IEEE, pp 4465–4473

Poria S, Majumder N, Hazarika D, Cambria E, Gelbukh A, Hussain A (2018) Multimodal sentiment analysis: addressing key issues and setting up the baselines. IEEE Intell Syst 33(6):17–25

Poria S, Hazarika D, Majumder N, Mihalcea R (2020) Beneath the tip of the iceberg: current challenges and new directions in sentiment analysis research. arXiv preprint arXiv:2005.00357

Portugal I, Alencar P, Cowan D (2018) The use of machine learning algorithms in recommender systems: a systematic review. Expert Syst Appl 97:205–227. https://doi.org/10.1016/j.eswa.2017.12.020

Pozzi FA, Fersini E, Messina E, Liu B (2017) Challenges of sentiment analysis in social networks: an overview. In: Pozzi FA, Fersini E, Messina E, Liu B (eds) Sentiment analysis in social networks. Morgan Kaufmann, Burlington, pp 1–11

Pröllochs N, Feuerriegel S, Lutz B, Neumann D (2020) Negation scope detection for sentiment analysis: a reinforcement learning framework for replicating human interpretations. Inf Sci 536:205–221

Qazi A, Raj RG, Hardaker G, Standing C (2017) A systematic literature review on opinion types and sentiment analysis techniques: tasks and challenges. Internet Res 27(3):608–630. https://doi.org/10.1108/IntR-04-2016-0086 ( Scopus )

Qiu G, Liu B, Bu J, Chen C (2009) Expanding domain sentiment lexicon through double propagation. In: IJCAI vol 9. pp 1199–1204

Raatikainen M, Tiihonen J, Männistö T (2019) Software product lines and variability modeling: a tertiary study. J Syst Softw 149:485–510. https://doi.org/10.1016/j.jss.2018.12.027

Rababah K, Mohd H, Ibrahim H (2011) A unified definition of CRM towards the successful adoption and implementation. Acad Res Int 1(1):220–228

Rambocas M, Pacheco BG (2018) Online sentiment analysis in marketing research: a review. J Res Interact Mark 12(2):146–163. https://doi.org/10.1108/JRIM-05-2017-0030

Rao AVSR, Ranjana P (2020) Deep learning method to identify the demographic attribute to enhance effectiveness of sentiment analysis. Innovations in computer science and engineering. Springer, Singapore, pp 275–285

Ravi K, Ravi V (2015) A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl-Based Syst 89:14–46. https://doi.org/10.1016/j.knosys.2015.06.015

Ray P, Chakrabarti A (2020) A mixed approach of deep learning method and rule-based method to improve aspect level sentiment analysis. Appl Comput Inform. https://doi.org/10.1016/j.aci.2019.02.002

Reddy YCAP, Viswanath P, Eswara Reddy B (2018) Semi-supervised learning: a brief review. Int J Eng Technol 7(18):81

Reichheld FF, Schefter P (2000) E-loyalty: your secret weapon on the web. Harv Bus Rev 78(4):105–113

Reinartz W, Krafft M, Hoyer WD (2004) The customer relationship management process: its measurement and impact on performance. J Mark Res 41(3):293–305. https://doi.org/10.1509/jmkr.41.3.293.35991

Ren Z, Zeng G, Chen L, Zhang Q, Zhang C, Pan D (2020) A lexicon-enhanced attention network for aspect-level sentiment analysis. IEEE Access 8:93464–93471

Ren F, Feng L, Xiao D, Cai M, Cheng S (2020) DNet: a lightweight and efficient model for aspect based sentiment analysis. Expert Syst Appl 151(113393):1–10

Ren L, Xu B, Lin H, Liu X, Yang L (2020) Sarcasm detection with sentiment semantics enhanced multi-level memory network. Neurocomputing 401:320–326

Rios N, de Mendonça Neto MG, Spínola RO (2018) A tertiary study on technical debt: types, management strategies, research trends, and base information for practitioners. Inf Softw Technol 102:117–145. https://doi.org/10.1016/j.infsof.2018.05.010

Rodrigues Chagas BN, Nogueira Viana JA, Reinhold O, Lobato F, Jacob AFL, Alt R (2018) Current applications of machine learning techniques in CRM: a literature review and practical implications. IEEE/WIC/ACM Int Conf Web Intell (WI) 2018:452–458. https://doi.org/10.1109/WI.2018.00-53

Rojas-Barahona LM (2016) Deep learning for sentiment analysis. Lang Linguist Compass 10(12):701–719

Rout JK, Dalmia A, Choo K-KR, Bakshi S, Jena SK (2017) Revisiting semi-supervised learning for online deceptive review detection. IEEE Access 5:1319–1327. https://doi.org/10.1109/ACCESS.2017.2655032

Rygielski C, Wang J-C, Yen DC (2002) Data mining techniques for customer relationship management. Technol Soc 24(4):483–502. https://doi.org/10.1016/S0160-791X(02)00038-6

Sabbeh SF (2018) Machine-learning techniques for customer retention: A comparative study. Int J Adv Comput Sci Appl 9(2):273–281

Sadr H, Pedram MM, Teshnehlab M (2020) Multi-view deep network: a deep model based on learning features from heterogeneous neural networks for sentiment analysis. IEEE Access 8:86984–86997

Salah Z, Al-Ghuwairi A-RF, Baarah A, Aloqaily A, Qadoumi B, Alhayek M, Alhijawi B (2019) A systematic review on opinion mining and sentiment analysis in social media. Int J Bus Inf Syst 31(4):530–554. https://doi.org/10.1504/IJBIS.2019.101585 ( Scopus )

Salur MU, Aydin I (2020) A novel hybrid deep learning model for sentiment classification. IEEE Access 8:58080–58093

Sangeetha K, Prabha D (2020) Sentiment analysis of student feedback using multi-head attention fusion model of word and context embedding for LSTM. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-020-01791-9

Sankar H, Subramaniyaswamy V, Vijayakumar V, Arun Kumar S, Logesh R, Umamakeswari AJSP (2020) Intelligent sentiment analysis approach using edge computing-based deep learning technique. Softw Pract Exp 50(5):645–657

Sawant SS, Prabukumar M (2018) A review on graph-based semi-supervised learning methods for hyperspectral image classification. Egypt J Remote Sens Space Sci 23(2):243–248. https://doi.org/10.1016/j.ejrs.2018.11.001

Schouten K, Frasincar F (2016) Survey on aspect-level sentiment analysis. IEEE Trans Knowl Data Eng 28(3):813–830. https://doi.org/10.1109/TKDE.2015.2485209

Seo S, Kim C, Kim H, Mo K, Kang P (2020) Comparative study of deep learning-based sentiment classification. IEEE Access 8:6861–6875

Shakeel MH, Karim A (2020) Adapting deep learning for sentiment classification of code-switched informal short text. In: Proceedings of the 35th annual ACM symposium on applied computing. pp 903–906

Sharma SS, Dutta G (2018) Polarity determination of movie reviews: a systematic literature review. Int J of Innov Knowl Concepts 6:12

Shayaa S, Jaafar NI, Bahri S, Sulaiman A, Seuk Wai P, Wai Chung Y, Piprani AZ, Al-Garadi MA (2018) Sentiment analysis of big data: Methods, applications, and open challenges. IEEE Access 6:37807–37827. https://doi.org/10.1109/ACCESS.2018.2851311 ( Scopus )

Shirani-Mehr H (2014) Applications of deep learning to sentiment analysis of movie reviews. Tech Report 1–8

Shuang K, Yang Q, Loo J, Li R, Gu M (2020) Feature distillation network for aspect-based sentiment analysis. Inf Fusion 61:13–23

Silva NFFD, Coletta LFS, Hruschka ER (2016) A survey and comparative study of tweet sentiment analysis via semi-supervised learning. ACM Comput Surv 49(1):1–26. https://doi.org/10.1145/2932708

Singh PK, Sharma S, Paul S (2020) Identifying hidden sentiment in text using deep neural network. In: 2nd international conference on data, engineering and applications (IDEA). IEEE, pp 1–5

Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing. pp 1631–1642

Studiawan H, Sohel F, Payne C (2020) Sentiment analysis in a forensic timeline with deep learning. IEEE Access 8:60664–60675

Su YJ, Hu WC, Jiang JH, Su RY (2020) A novel LMAEB-CNN model for Chinese microblog sentiment analysis. J Supercomput 76:9127–9141

Sun X, He J (2020) A novel approach to generate a large scale of supervised data for short text sentiment analysis. Multimed Tools Appl 79(9):5439–5459

Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307

Tang D, Wei F, Yang N, Zhou M, Liu T, Qin B (2014) Learning sentiment-specific word embedding for twitter sentiment classification. In: Proceedings of the 52nd annual meeting of the association for computational linguistics, vol 1. pp 1555–1565

Tao J, Fang X (2020) Toward multi-label sentiment analysis: a transfer learning based approach. J Big Data 7(1):1–26

Thet TT, Na J-C, Khoo CSG (2010) Aspect-based sentiment analysis of movie reviews on discussion boards. J Inf Sci 36(6):823–848. https://doi.org/10.1177/0165551510388123

Tran TU, Hoang HTT, Huynh HX (2020) Bidirectional independently long short-term memory and conditional random field integrated model for aspect extraction in sentiment analysis. Frontiers in intelligent computing: theory and applications. Springer, Singapore, pp 131–140

Tsai C, Hu Y, Hung C, Hsu Y (2013) A comparative study of hybrid machine learning techniques for customer lifetime value prediction. Kybernetes 42(3):357–370. https://doi.org/10.1108/03684921311323626

Ullah MA, Marium SM, Begum SA, Dipa NS (2020) An algorithm and method for sentiment analysis using the text and emoticon. ICT Express 6(4):357–360

Usama M, Ahmad B, Song E, Hossain MS, Alrashoud M, Muhammad G (2020) Attention-based sentiment analysis using convolutional and recurrent neural network. Future Gener Comput Syst 113:571–578

Valdivia A, Luzón MV, Herrera F (2017) Sentiment analysis in tripadvisor. IEEE Intell Syst 32(4):72–77

Valdivia A, Martínez-Cámara E, Chaturvedi I, Luzón MV, Cambria E, Ong YS, Herrera F (2020) What do people think about this monument? understanding negative reviews via deep learning, clustering and descriptive rules. J Ambient Intell Humaniz Comput 11(1):39–52

Vechtomova O (2017) Disambiguating context-dependent polarity of words: an information retrieval approach. Inf Process Manag 53(5):1062–1079

Venkatakrishnan S, Kaushik A, Verma JK (2020) Sentiment analysis on google play store data using deep learning. Applications of machine learning. Springer, Singapore, pp 15–30

Verhoef PC, Venkatesan R, McAlister L, Malthouse EC, Krafft M, Ganesan S (2010) CRM in data-rich multichannel retailing environments: a review and future research directions. J Interact Mark 24(2):121–137. https://doi.org/10.1016/j.intmar.2010.02.009

Verner JM, Brereton OP, Kitchenham BA, Turner M, Niazi M (2014) Risks and risk mitigation in global software development: a tertiary study. Inf Softw Technol 56(1):54–78

Vinodhini G, Chandrasekaran RM (2012) Sentiment analysis and opinion mining: a survey. Int J 2(6):282–292

Vosoughi S, Roy D, Aral S (2018) The spread of true and false news online. Science 359(6380):1146–1151

Vyas V, Uma V (2019) Approaches to sentiment analysis on product reviews. Sentiment analysis and knowledge discovery in contemporary business. IGI Global, Pennsylvania, pp 15–30

Wadawadagi R, Pagi V (2020) Sentiment analysis with deep neural networks: comparative study and performance assessment. Artif Intell Rev 53:6155–6195

Wang G, Sun J, Ma J, Xu K, Gu J (2014) Sentiment classification: the contribution of ensemble learning. Decis Support Syst 57:77–93. https://doi.org/10.1016/j.dss.2013.08.002

Wang Y, Huang M, Zhu X, Zhao L (2016) Attention-based LSTM for aspect-level sentiment classification. In: Proceedings of the 2016 conference on empirical methods in natural language processing. pp 606–615

Wang S, Zhu Y, Gao W, Cao M, Li M (2020) Emotion-semantic-enhanced bidirectional LSTM with multi-head attention mechanism for microblog sentiment analysis. Information 11(5):280

Wehrmann J, Becker W, Cagnini HE, Barros RC (2017) A character-based convolutional neural network for language-agnostic Twitter sentiment analysis. In: 2017 International joint conference on neural networks (IJCNN). IEEE, pp 2384–2391

Wilcox PA, Gurău C (2003) Business modelling with UML: the implementation of CRM systems for online retailing. J Retail Consum Serv 10(3):181–191. https://doi.org/10.1016/S0969-6989(03)00004-3

Winer RS (2001) A framework for customer relationship management. Calif Manag Rev 43(4):89–105. https://doi.org/10.2307/41166102

Wu C, Wu F, Wu S, Yuan Z, Liu J, Huang Y (2019) Semi-supervised dimensional sentiment analysis with variational autoencoder. Knowl-Based Syst 165:30–39

Xi D, Zhuang F, Zhou G, Cheng X, Lin F, He Q (2020) Domain adaptation with category attention network for deep sentiment analysis. In: Proceedings of the web conference 2020. pp 3133–3139

Xia Y, Cambria E, Hussain A, Zhao H (2015) Word polarity disambiguation using bayesian model and opinion-level features. Cognit Comput 7(3):369–380

Xu W, Tan Y (2019) Semi-supervised target-oriented sentiment classification. Neurocomputing 337:120–128

Xu G, Meng Y, Qiu X, Yu Z, Wu X (2019) Sentiment analysis of comment texts based on BiLSTM. IEEE Access 7:51522–51532

Yadav A, Vishwakarma DK (2020) Sentiment analysis using deep learning architectures: a review. Artif Intell Rev 53(6):4335–4385

Yadav A, Vishwakarma DK (2020) A deep learning architecture of RA-DLNet for visual sentiment analysis. Multimed Syst 26:431–451

Yang L, Li Y, Wang J, Sherratt RS (2020) Sentiment analysis for E-commerce product reviews in chinese based on sentiment lexicon and deep learning. IEEE Access 8:23522–23530

Yao F, Wang Y (2020) Domain-specific sentiment analysis for tweets during hurricanes (DSSA-H): a domain-adversarial neural-network-based approach. Comput Environ Urban Syst 83(101522):1–14

Yarowsky D (1995) Unsupervised word sense disambiguation rivaling supervised methods. In: Proceedings of the 33rd annual meeting on association for computational linguistics. pp189–196. https://doi.org/ https://doi.org/10.3115/981658.981684

Yildirim S (2020) Comparing deep neural networks to traditional models for sentiment analysis in Turkish language. Deep learning-based approaches for sentiment analysis. Springer, Singapore, pp 311–319

Zerbino P, Aloini D, Dulmin R, Mininno V (2018) Big Data-enabled customer relationship management: a holistic approach. Inf Process Manag 54(5):818–846. https://doi.org/10.1016/j.ipm.2017.10.005

Zhang L, Wang S, Liu B (2018) Deep learning for sentiment analysis: a survey. Wiley Interdiscip Rev Data Min Knowl Discov 8(4):e1253

Zhang B, Li X, Xu X, Leung KC, Chen Z, Ye Y (2020) Knowledge guided capsule attention network for aspect-based sentiment analysis. IEEE/ACM Trans Audio Speech Lang Process 28:2538–2551

Zhang S, Xu X, Pang Y, Han J (2020) Multi-layer attention based CNN for target-dependent sentiment classification. Neural Process Lett 51(3):2089–2103

Zhao P, Hou L, Wu O (2020) Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification. Knowl-Based Syst 193(105443):1–10

Zhou J, Huang JX, Hu QV, He L (2020) Is position important? deep multi-task learning for aspect-based sentiment analysis. Appl Intell 50:3367–3378

Zhu X, Ghahramani Z (2002) Learning from labeled and unlabeled data with label propagation. Technical report CMU-CALD-02–107, Carnegie Mellon University. 8

Zhu X, Yin S, Chen Z (2020) Attention based BiLSTM-MCNN for sentiment analysis. In: 2020 IEEE 5th international conference on cloud computing and big data analytics (ICCCBDA). IEEE, pp 170–174

Zuo E, Zhao H, Chen B, Chen Q (2020) Context-specific heterogeneous graph convolutional network for implicit sentiment analysis. IEEE Access 8:37967–37975


Sentiments can provide a valuable source of information not only for analyzing a student’s behavior towards a course topic, but also for helping higher education institutions improve their policies (Kastrati et al., 2021). From this perspective, the past couple of years have seen a growing number of publications in which different sentiment analysis techniques, including NLP and deep learning (DL), are successfully used for this purpose (Estrada et al., 2020; Zhou and Ye, 2020).

The main goal of this paper is to critically evaluate the body of knowledge related to sentiment analysis of students’ feedback in MOOCs, by answering research questions through a stepwise framework for conducting systematic reviews. By exploring the current state of knowledge in the field, we also show that educational technology research lacks a comprehensive and systematic review covering studies on sentiment analysis of MOOC learners’ feedback. Our study therefore aims to fill this gap by analyzing and synthesizing research findings to describe the state of the art and to provide valuable guidelines for new research and development efforts in the field.

Furthermore, the findings derived from this review can serve as a basis and a guide for future research and teaching practice, as MOOC-based teaching is increasingly being integrated into the traditional curricula and educational practices of many higher education institutions.

The rest of the paper is organized as follows: Methodology describes the search strategy and methodology adopted in conducting the study. Results and Analysis presents the systematic review study results. Themes identified from the investigated papers are described in Discussion . Discussion also outlines recommendations and future research directions for the development of effective sentiment analysis systems. Lastly, final conclusions are drawn in the Conclusion section.

Methodology

For this systematic literature review (SLR) study, the PRISMA guidelines provided by Liberati et al. (2009) were applied. An SLR is a thorough and comprehensive research method for conducting a literature review in a systematic manner by strictly following well-defined steps. The method is guided by specific research questions, and by being systematic and explicit it reduces biases in the review process. It also includes applying a structured and stepwise approach and designing a research protocol (Petticrew and Roberts, 2006; Staples and Niazi, 2007; Liberati et al., 2009; Onwuegbuzie et al., 2012). As also reported by Fink (2019), a systematic literature review is an organized, comprehensive, and reproducible method. Following these definitions, the main purpose of this study was to:

  • report on previous research works on sentiment analysis applications in a MOOC setting, and
  • provide an exhaustive analysis that could serve as a platform for future opportunities and paths for research and implementation in the field.

With these purposes in mind, the paper identifies and reports the investigated entities/aspects, the most frequently used bibliographical sources, the research trends and patterns, and the scenarios, architectures, techniques, and tools used for performing sentiment analysis in MOOCs.

The following research questions guide this systematic literature review:

  • RQ1. What are the various techniques, tools, and architectures used to conduct sentiment analysis in MOOC discussion forums?
  • RQ2. In what scenarios and for what purpose is sentiment analysis performed in the selected papers?

Search Strategy and Data Collection

The online JabRef® software facilitated the article search and selection following the PRISMA approach. To ensure that all relevant studies were collected and reviewed, the search strategy involved a stepwise approach consisting of four stages. The overall search strategy process is shown in Figure 1.

Figure 1. Implemented PRISMA search methodology.

The first stage entails the development of a research protocol by determining the research questions, defining the search keywords, and identifying the bibliographic databases for performing the search. For this purpose, the following online research databases and engines were systematically examined: ACM DL, IEEE Xplore, ScienceDirect, Scopus, SpringerLink, and Web of Science. In total, the first stage yielded 440 articles; after all duplicates were removed, a reduced list of 359 articles remained to be processed in the upcoming screening stage.

The keywords used in this study were driven by the PICO framework and are shown in Table 1. PICO (Population, Intervention, Comparison, and Outcome) is aimed at helping researchers design a comprehensive set of search keywords for quantitative research (Schardt et al., 2007). As suggested by Gianni and Divitini (2015), a Context section was added to the PICO schema to avoid missing possibly relevant articles. Table 2 presents the final search keywords associated with PICO(C) used in the study.

Table 1. PICO(C)-driven keywords framing.

Table 2. Search string (Query).

First, adequate keywords were identified for all sections of PICO(C) in Table 1; these were then combined into a self-constructed search string using Boolean operators, as shown in Table 2. To ensure that no potentially relevant article was omitted from the study, a Context section was also added as a separate feature.

Screening refers to stage 2 of the search strategy process and involves the application of inclusion criteria. At this stage, the relevant studies were selected based on the following criteria: 1) the publication must be a peer-reviewed journal or conference paper, 2) the paper must have been published between 2015 and 2021, and 3) the paper must be in English. After applying these criteria, 110 of the 359 records were accepted as relevant studies for further exploration. The authors agreed to encode the data using three different colors: 1) green, for papers that passed the screening threshold; 2) red, for papers that did not pass the screening threshold; and 3) yellow, for papers whose classification (green or red) the authors were unsure about. For such papers, a comprehensive discussion between the authors took place, and once a consensus was reached, those papers were classified into either the green or red category.

Stage 3, which corresponds to eligibility in Figure 1, eliminated the studies that were explicitly not 1) within the context of MOOCs or 2) concerned with sentiment analysis. At this stage, all titles, abstracts, and keywords were examined to determine the relevant records for the next stage. After applying these criteria, only 49 papers were considered eligible for further investigation in the last stage of analysis.

Moreover, after carefully reading and examining the eligible papers, it was found that three of the 49 papers lacked full text, and another six papers were either review papers or only employed tools without providing rich information on the algorithms applied for sentiment analysis. Those papers were therefore also excluded, which reduced the number of eligible papers to 40.

Limitations

When assessing this systematic literature review, several factors need to be considered, since they can potentially limit the validity of the findings:

  • Only papers written in English were selected for the study. While searching the research databases, we found related articles in other languages, such as Chinese and Spanish; those articles are not included.
  • The study includes papers collected from the six digital research databases shown in Figure 1. Thus, we might have missed papers indexed in other digital libraries.
  • For this study, only peer-reviewed journal articles, conference papers, and book sections were selected. Studies that are not peer-reviewed are not included.
  • Only works published between January 1, 2015, and March 4, 2021, were selected for this study. We note that there may have been conference papers presented before March 4, 2021, that were not published by the cut-off date and were therefore not included in our literature review.

Results and Analysis

After determining the core set of eligible papers, both quantitative and qualitative analyses of the data were performed. In the quantitative approach, the findings were categorized based on publication year, venue, publication type, and geographic region of the authors, as well as on the techniques, architectures, algorithms, and tools used. For the qualitative analysis, an open coding content analysis method as described by Braun and Clarke (2006) was used. This technique comprises two phases: first, reading all papers to extract themes, and second, classifying the identified themes. Figure 2 below illustrates the analysis process.

Figure 2. Analysis process of the relevant contributions.

Quantitative Analysis

We conducted a quantitative analysis to answer the first research question, which deals with the techniques, tools, and architectures used to conduct sentiment analysis in MOOC discussion forums. Figure 3 presents the relevant studies distributed by year and database source. From the figure, it can be observed that the database contributing the most relevant studies is IEEE Xplore with 13 studies, followed by Scopus with eight. Moreover, as can be seen from Figure 4, which illustrates the distribution of conference and journal papers, there has been an increasing trend of journal publications in the last 2 years, whereas in earlier years most of the studies were published at conferences.

Figure 3. Distribution of studies in academic databases.

Figure 4. The number of collected conference and journal papers in 2015–2021.

Observing the country of origin of the first author, most of the works are from Asia with 17 papers, followed by Europe with 10 papers and North America with eight. Within Asia, most of the studies are from China. Figure 5 shows the distribution by country.

Figure 5. The number of collected papers across different regions/countries of first author.

When it comes to the techniques used to conduct sentiment analysis in MOOCs, they can be categorized into four main groups: supervised, unsupervised, lexicon-based, and statistical analysis. Table 3 presents the clustering of papers based on the learning approaches (techniques) that the authors applied. In total, 21 papers used supervised, unsupervised, or lexicon-based techniques, or a combination of the three; nine papers used statistical analysis, while the remaining papers did not explicitly specify the technique.

Table 3. Papers grouped by the technique/learning approach used.

In Table 4, the most frequently used supervised learning algorithms are shown. As can be seen, Neural Networks (NN) and Naïve Bayes (NB) were used most often in the reviewed studies, followed by Support Vector Machines (SVM) and Decision Tree (DT) algorithms.

Table 4. Most frequently used supervised learning algorithms.
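To make these approaches concrete, the following minimal Python sketch (not taken from any of the reviewed studies; the example comments, labels, and parameters are invented for illustration) shows how two of the most frequently reported supervised classifiers, Naïve Bayes and SVM, can be trained on TF-IDF features extracted from MOOC-style comments and evaluated with the standard precision, recall, and F1 metrics:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

# Hypothetical, hand-labelled forum comments (1 = positive, 0 = negative).
comments = [
    "The lectures are clear and the quizzes are well designed",
    "I could not follow the assignments, very confusing material",
    "Great course, the instructor explains everything step by step",
    "The videos are too long and the forum is not moderated",
]
labels = [1, 0, 1, 0]

for name, classifier in [("Naive Bayes", MultinomialNB()), ("Linear SVM", LinearSVC())]:
    # TF-IDF features over unigrams and bigrams feed the classifier.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), classifier)
    model.fit(comments, labels)
    predictions = model.predict(comments)  # a real study would use held-out data
    print(name)
    print(classification_report(labels, predictions, zero_division=0))

In practice, the reviewed studies train on much larger labelled corpora and report these metrics on held-out test data.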

Table 5 lists the lexicon-based approaches, which are also known as rule-based sentiment analysis. The most frequently used lexicon among the reviewed articles is VADER (Valence Aware Dictionary and sEntiment Reasoner), followed by TextBlob and SentiWordNet.

Table 5. Most frequently used lexicons.
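As an illustration of how such rule-based tools are typically applied, the following short Python sketch (assuming the third-party vaderSentiment and textblob packages are installed; the comment is an invented example, not taken from any reviewed dataset) scores a single MOOC-style comment with VADER and TextBlob:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

comment = "The course content is interesting, but the weekly workload is overwhelming."

# VADER returns negative/neutral/positive proportions and a compound score in [-1, 1].
vader_scores = SentimentIntensityAnalyzer().polarity_scores(comment)

# TextBlob returns polarity in [-1, 1] and subjectivity in [0, 1].
blob_sentiment = TextBlob(comment).sentiment

print("VADER:", vader_scores)
print("TextBlob:", blob_sentiment.polarity, blob_sentiment.subjectivity)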

Regarding architectures, ML, DL, and NLP approaches were represented in the reviewed articles. Figure 6 illustrates that NLP and DL have been used most often from 2020 onwards; NLP is used in seven papers, followed by DL with five.

Figure 6. Distribution of architectures during 2015–2021.

Figure 7 below shows the findings reviewed in the study with respect to the most frequently used packages, tools, libraries, etc. for the sentiment analysis task in MOOCs.

Figure 7. Tools, packages, and libraries used for sentiment analysis in the reviewed papers.

As presented in the figure, the most popular solution for conducting sentiment analysis is R, used in four studies. StanfordNLP, NLTK, spaCy, edX-CAS, WAT, and TAALES form the second most used group of solutions, each appearing in two different articles. The third group consists of a variety of solutions that appear only once across the reviewed articles.

Qualitative Analysis

To answer the second research question, the process continued with the strategy described by Braun and Clarke (2006), which encompasses an inductive thematic approach for identifying common themes across the articles. This process involves six phases: familiarizing with the data, generating initial codes, searching for themes, reviewing themes, defining themes, and naming themes. Familiarization with the literature was achieved during screening. The authors then inductively coded the manuscripts, and the codes were collected in an Excel file to prepare for the upcoming steps. The codes were then grouped and consolidated in order to identify themes. Upon final agreement on the themes and their definitions, a narrative was built through independent and collaborative writing and reviewing, following the recommendations of Lincoln and Guba (1985) and Creswell and Miller (2000). The overall process resulted in six themes, each discussed in detail in the Discussion section. A summary of this assessment is presented in Table 6.

Table 6. Summary of identified themes.

Discussion

In this section, the types and trends of research conducted within each of the previously identified themes are explored and discussed. Finally, recommendations and suggestions for addressing the identified challenges are provided.

MOOC Content Evaluation

In order to create relevant and useful insights for MOOC content development, course designers and learning analytics experts need to process and analyze a complex set of unstructured learner-generated data from MOOC discussion forums. Course content evaluation via sentiment analysis approaches can provide substantial indications to instructional designers and teachers, enabling them to periodically evaluate their courses and introduce potential improvements.

In a study with a small sample of 28 students, the learners had a positive attitude and perception towards the quality of the MOOC content (88.6%). Moreover, the text-mining-based evaluation of the content conducted in the study also confirmed high satisfaction with the MOOC content; here, the positive features included “interesting,” “easy,” and “duration of video is appropriate” (Au et al., 2016).

Dina et al. (2021) explored the performance of a quantitative (SA-based) model to measure user preferences regarding course content. The sentiment classification was performed using a Support Vector Machine, and the accuracy, precision, recall, and F1 score were all above 80%. Some of the positive features produced by this model were “course-good,” “course-interesting,” “course-easy,” “course-understand,” “course-recommended,” and “material-good.” In another case study, a learner decision journey framework was proposed to analyze MOOC content development, to understand the circular learning process, and to generate further insights for course improvements (Lei et al., 2015). The study showed the presence of posts with significant positive sentiment scores during the entire course, meaning that learners were positive both towards the content and about completing the course.

An application framework of an intelligent system for learner emotion recognition in MOOCs was proposed by Liu et al. (2017a), where obtaining the learners’ emotion-topic feedback about content proved to be instrumental for teachers in analyzing and improving their teaching pedagogy. Furthermore, an analysis of sentiments in MOOC learners’ posts via a deep learning approach was conducted by Li et al. (2019). The experiments in this study revealed that the approach could be effectively used to identify content-related problems and to improve educational outcomes. In contrast to lexicon-based approaches, which were also evaluated in the study, deep learning models could further reduce the effort of constructing sentiment dictionaries, among other benefits.

Review (Feedback) Contradiction Analysis

Although learner-generated reviews and opinions have great practical relevance to educators and instructional designers, learners’ comments sometimes tend to be contradictory (positive vs. negative), which makes it difficult for teachers to interpret them. One possible explanation for such contradictions is that MOOC learners are quite heterogeneous, with different educational backgrounds, knowledge, and motivations (Nie and Luo, 2019). However, in large-scale comments, negative opinions and emotions in particular can spread faster than positive ones (Pugh, 2001), and these could lead to dropouts. Only three studies were found to focus on the contradiction analysis of MOOC reviews (Badache and Boughanem, 2014; Liu et al., 2017a; Kastrati et al., 2020).

An experimental study on the detection of contradictory reviews in Coursera, based on sentiment analysis around specific aspects, was conducted by Badache and Boughanem (2014). Before extracting particular aspects according to the distribution of emotional terms, the reviews were first grouped by session. Then, the polarity of each review segment containing an aspect was identified. The results of experiments with 2,244 courses and 73,873 reviews revealed the effectiveness of the proposed approach for isolating and quantifying contradiction intensity. Another aspect-based sentiment analysis framework, tested and validated on a Coursera dataset, was proposed by Kastrati et al. (2020). The researchers achieved a high performance score (F1 = 86.13%) for aspect category identification, which demonstrates the reliability and comprehensiveness of the proposed framework.

Other scholars recommended a generative probabilistic model that extends Sentence-LDA (Latent Dirichlet Allocation) to explore negative opinions in terms of pairs of emotions and topics (Liu et al., 2017a). With this model, the detection precision of negative topics reached an acceptable accuracy rate of 85.71%. The negative comments mainly revolved around learning content, online assignments, and course certificates.
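As a rough, hypothetical approximation of this line of work, the following Python sketch applies standard LDA (with gensim) to a few invented negative comments; the Sentence-LDA extension used in the cited study additionally models emotion-topic pairs, which is not reproduced here:

from gensim import corpora
from gensim.models import LdaModel

negative_comments = [
    "the assignment deadlines are unclear and the grading is slow",
    "the certificate never arrived after finishing the course",
    "video lectures keep buffering and the slides are missing",
    "assignment feedback is generic and grading criteria are hidden",
]

# Naive tokenisation; real pipelines would remove stop words and lemmatise.
tokenised = [comment.lower().split() for comment in negative_comments]
dictionary = corpora.Dictionary(tokenised)
corpus = [dictionary.doc2bow(tokens) for tokens in tokenised]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=42)
for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)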

SA Effectiveness

The effectiveness evaluation of sentiment analysis models was a key focus of many of the reviewed papers, especially those published after 2019. This could be due to the recent trend of making datasets available and to the goals of the MOOC providers, because sentiment analysis techniques can shed more light on improving enrollment and the learning experience. During 2015 and 2016, most of the works utilized clustering models to group similar MOOC discussion forum posts, along with topic modeling to capture the topical themes (Ezen-Can et al., 2015). The main motivation behind some works was also to increase the satisfaction of teachers who themselves attend MOOCs to support their own professional development (Koutsodimou and Jimoyiannis, 2015; Holstein and Cohen, 2016).

However, most of the identified research papers that evaluated the effectiveness of sentiment analysis models were published during 2019 and 2020 (Cobos et al., 2019a; Cobos et al., 2019b; Yan et al., 2019; Capuano and Caballé, 2020; Capuano et al., 2020; Estrada et al., 2020; Hew et al., 2020; Onan, 2021). Cobos et al. (2019a, 2019b) compared and measured the effectiveness of machine learning (SVM, NB, ANN) and NLP approaches (VADER, TextBlob) for extracting features and performing text analysis; their prototype was based on a content analyzer system for edX MOOCs. Another group of researchers conducted a relevant study by applying unsupervised natural language processing techniques to explore students’ engagement in Coursera MOOCs (Yan et al., 2019). Further, they evaluated the performance of LDA, LSA (Latent Semantic Analysis), and topic modelling to discover emerging topics in discussion forums and to investigate the sentiments associated with the discussions.

After 2019, along with machine learning and natural language processing techniques (Hew et al., 2020), researchers started to use and measure the effectiveness of deep learning architectures for sentiment analysis in MOOCs, which exhibit improved performance compared to conventional supervised learning methods (Capuano and Caballé, 2020; Capuano et al., 2020; Estrada et al., 2020; Onan, 2021). The deep learning approaches most widely used by researchers are CNN (Convolutional Neural Networks), LSTM (Long Short-Term Memory), BERT (Bidirectional Encoder Representations from Transformers), and RNN (Recurrent Neural Networks).
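For illustration only, a minimal Keras sketch of an LSTM-based sentiment classifier of the kind reported in these studies is given below; the layer sizes, hyper-parameters, and placeholder data are invented and do not reproduce any of the cited architectures:

import numpy as np
from tensorflow.keras import layers, models

vocab_size, max_len = 5000, 100  # assumed vocabulary size and sequence length
x_train = np.random.randint(1, vocab_size, size=(32, max_len))  # placeholder token ids
y_train = np.random.randint(0, 2, size=(32,))                   # placeholder labels

model = models.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=64),
    layers.LSTM(64),                        # a CNN, BiLSTM, or Transformer block could be used instead
    layers.Dense(1, activation="sigmoid"),  # binary positive/negative output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=8, verbose=0)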

SA Through Social Networks Posts

Research has demonstrated that social networking sites can significantly impact the interaction of learners with courses (Georgios Paltoglou, 2012). With the growing popularity of social networking, sentiment analysis has been used with social networks and microblogging sites, especially Twitter and blogs (Hong and Skiena, 2010; Miller et al., 2011). However, the nature and structure of the texts published on social networks is largely scattered and unstructured. Therefore, many researchers have adopted various social media mining approaches to investigate the sentiments of Twitter messages related to MOOC learning (Shen and Kuo, 2015; Buenaño-Fernández et al., 2017). The main goal of these studies was to explore students’ positive and negative tweets about the course and to evaluate the instructors and the educational tools used in the course. Lundqvist et al. (2020) employed sentiment analysis to investigate online MOOC comments using the VADER (Valence Aware Dictionary for sEntiment Reasoning) algorithm, which includes sentiment ratings from 90,000 social media based posts. The analysis of the comments revealed a correlation between the sentiments of the posts and the feedback provided about the MOOC; moreover, 78% of students were positive towards the MOOC structure. Almost all identified papers used Twitter to gain insights into MOOCs from social media platforms. Future investigations may also consider other platforms, such as Facebook or YouTube, and compare the findings with those obtained for Twitter.

Understanding Course Performance and Dropouts

The major challenge of MOOCs is massive dropout, or low retention (Chen et al., 2020). Alongside factors such as demographic characteristics, interaction, self-reported motivation, and commitment attitudes, that paper stresses that learners’ lack of self-regulation might create obstacles that should be promptly overcome in order to benefit from MOOCs.

One way to predict prospective dropouts is to apply SA to learners’ reactions and to extract the keywords revealing that dropouts are predominantly related to course performance. Such analysis was performed in more detail in five of the eligible papers (Crossley et al., 2015; Dowell et al., 2015; Crossley et al., 2016; Lubis et al., 2016; Nissenson and Coburn, 2016), showing that many researchers have been intrigued by poorer course performance and decreased interest in persisting in the course. Three of them concentrate on discussion forums (Crossley et al., 2015; Dowell et al., 2015; Crossley et al., 2016). While Crossley et al. (2015) use the language of the discussion forums as a predictive feature of successful class completion, Crossley et al. (2016) also examine online clickstream data in addition to the language, and Dowell et al. (2015) additionally examine the social position of learners as they interact in a MOOC. The last two papers that investigate language to understand learners’ performance and dropouts focus mainly on the attributes that contribute to predicting successful course completion (Lubis et al., 2016; Nissenson and Coburn, 2016). Both extract attributes that reflect learners’ satisfaction only, rather than the factors that might prevent learners from continuing their studies in MOOCs. Lubis et al. (2016) is even more optimistic and never explicitly mentions dropouts, which is encouraging given that the analysis covered over 20,000 reviews crawled from Class Central websites containing 1,900 topics.

The general objective of this cluster of papers is to analyze learners’ sentiment by examining the language they use. Depending on the research hypotheses, the attributes used to explore learners’ opinions vary from moderately pessimistic to very optimistic. Undoubtedly, further papers implementing the same approach will contribute to increasing the impact of MOOCs on education and minimizing the risk of premature dropout.
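A deliberately simplified Python sketch of the general idea behind this cluster of papers is given below: language-derived features (here only a VADER compound score and post length, both chosen arbitrarily) are used to predict course completion. The data, features, and model are hypothetical and do not reproduce any of the cited studies:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.linear_model import LogisticRegression

analyzer = SentimentIntensityAnalyzer()
posts = [
    ("I love the weekly projects, learning a lot", 1),    # 1 = completed
    ("I am lost, the lectures move too fast for me", 0),  # 0 = dropped out
    ("great explanations, looking forward to week 5", 1),
    ("thinking about quitting, the workload is too heavy", 0),
]

# Two toy features per post: sentiment polarity and post length.
X = [[analyzer.polarity_scores(text)["compound"], len(text.split())] for text, _ in posts]
y = [label for _, label in posts]

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X)[:, 1])  # estimated completion probabilities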

MOOC Design Model Evaluation

As elaborated in MOOC Content Evaluation, the evaluation of MOOC content is crucial for the evolution of MOOCs, since it identifies and proposes the improvements needed to extend the MOOC lifecycle. Somewhat unexpectedly, several of the surveyed papers suggested improving the design model as a complementary element that is essential to keep a MOOC active and prosperous. First, they note that there are many differences between the language used in MOOC-supported online classes and in real classes (Rahimi and Khosravizadeh, 2017); the distinction covers both text and speech analysis. Going further, the research in Qi and Liu (2021) proposes LDA for mining student-generated reviews, with the ultimate aim of objectively and accurately evaluating indicators that provide reliable references for both students and educators. Based on established text-mining means for sentiment analysis and thorough processing of the results, reorganization of the model can start. Such a strategy is proposed in Lee et al. (2016): by introducing 11 design criteria for organizing the model, this paper examines MOOC characteristics and their impact on instructor and learner satisfaction.

The last two papers in this cluster are topic specific. Liu (2016) explores a new model based on English for Specific Purposes for a metallurgical English course. To strengthen the approach, the authors suggest a symbiosis between MOOCs and flipped classrooms, in light of the course purpose, content, teaching organization, and, finally, teachers’ evaluation; by creating a synergy between the two teaching methodologies, they believe the course will advance significantly. O’Malley et al. (2015) go one step further and suggest reconstructing MOOCs into a virtual laboratory using video and simulations. This is an ongoing project intended to adapt the online delivery format for a campus-based first-year module on physical chemistry at the University of Manchester. The experience of merging a MOOC with a virtual laboratory proved its efficiency: improvement of the content requires an improvement of the design model.

On many occasions, the improvement of a product means an improvement of the technology that enables it, and the last theme of this survey supports this claim. Improvement can be achieved by adding new features, such as the flipped classroom (Liu, 2016) and the virtual laboratory (O’Malley et al., 2015). Such extensions should be introduced steadily and carefully to avoid the risk of ruining the product, and to enable them it is essential to maintain the existing features, which can be assessed by applying the design criteria (Lee et al., 2016). However, all improvements must be appreciated by their end users, the learners and the teachers. This evaluation includes SA performed using the techniques proposed in Rahimi and Khosravizadeh (2017) and Qi and Liu (2021). Last but not least, the philosophy of continuous improvement should be supported: this returns sentiment analysis to the first theme, MOOC content evaluation, and then continues through all the remaining themes, creating a never-ending lifecycle for the evaluation of MOOCs.

Recommendations and Future Research Avenues

When considering the MOOC content evaluation in the relevant studies documented in our reviewed sample, overall, learners rated course content favorably. As can be seen from the above discussion, most research on MOOC content evaluation focuses on learner feedback; however, future scholars could also consider investigating the teacher’s feedback/perspective on content development, teaching pedagogy, experience, and assessment, among other aspects. Moreover, it would also be interesting to explore the results provided by sentiment analysis techniques in collaboration with the instructors of the MOOC course, to learn whether their proposed materials could be improved.

Throughout the reviewed papers, imbalanced datasets with underrepresented categories were evident. Therefore, one recommendation for researchers seeking performance improvements is to apply data augmentation techniques. Classifier performance can also be improved by adopting more advanced word representation approaches, such as contextualized embeddings, as well as classical NLU (Natural Language Understanding) techniques such as part-of-speech tagging and parsing.
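As a minimal sketch of the contextualized-embedding recommendation (assuming the transformers and torch packages are available; bert-base-uncased is chosen here only as a common pretrained model, not one used in the reviewed papers), the following Python code extracts one fixed-size vector per comment that could replace static word vectors as classifier input:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

comments = ["The peer assessments were unfair", "Really enjoyed the final project"]
batch = tokenizer(comments, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Mean-pool the token embeddings into one fixed-size vector per comment.
embeddings = outputs.last_hidden_state.mean(dim=1)
print(embeddings.shape)  # (2, 768) for bert-base-uncased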

Furthermore, exploring the relationship between polarity markers and other feeling labels or emotions could be beneficial for better identifying and addressing the issues related to the target subject, as has been studied in many relevant text-based emotion detection works (Acheampong et al., 2020).

A considerable number of reviewed papers failed to report how the results were standardized in terms of participant numbers and characteristics, course subject and context, accuracy, and the metrics of the SA approaches. Hence, we consider that a special focus should be placed on enhancing the transparency of research results. This would be beneficial to other researchers when conducting comparative performance analyses between various sentiment analysis approaches.

Some of the studies related to the recognition of polarities and emotions in MOOCs are conducted in laboratory settings and utilize a limited set of algorithmic solutions and techniques. However, more standardized investigations need to be conducted with students, using more algorithms with different configurations of hyper-parameters and layers. In this way, standardization will contribute to assuring the quality, safety, and reliability of the solutions and techniques designed for sentiment analysis in MOOC learning environments. In addition, there is also a lack of standardized datasets available for the evaluation of sentiment analysis models in MOOCs. Most researchers have used publicly available datasets from Coursera, edX, and FutureLearn, and even datasets from their own institutions (Ezen-Can et al., 2015; Moreno-Marcos et al., 2018; Cobos et al., 2019a; Yan et al., 2019; Estrada et al., 2020; Lee et al., 2020). The absence of standardized datasets plays a negative role when benchmarking or comparing the algorithmic solutions of different researchers. It is also worth mentioning that researchers have predominantly used datasets from computer science courses to evaluate and explore sentiment analysis of students’ feedback in MOOCs (Moreno-Marcos et al., 2018; Estrada et al., 2020; Lee et al., 2020; Lundqvist et al., 2020). Thus, the research is mainly limited to one academic field.

It was also observed that the reviewed research papers have not taken into consideration different types of MOOCs, such as cMOOCs, xMOOCs, or sMOOCs. In the future, sentiment analysis of students’ feedback should also consider different types of MOOCs.

In addition, if enough suitable (standardized) datasets were available, it would be interesting to introduce more meticulous research questions and to attempt a meta-analysis, or even an advanced systematic quantitative literature review involving more complex statistical operations. This could serve as an insightful idea for future work.

Conclusion

Although introduced almost 75 years ago, sentiment analysis has recently become a very popular tactic for gathering and mining subjective information from the end users of various services. By implementing popular NLP, statistical, and ML techniques, sentiment analysis has grown into a cost-effective tool for distilling the sentiment patterns that reveal the potential challenges of existing services and, at the same time, for identifying new opportunities and improvements. Its extensive implementation has contributed to increased accuracy and efficiency wherever it has been used.

The use of sentiment analysis techniques to understand students’ feedback in MOOCs represents an important factor in improving the learning experience. Moreover, sentiment analysis can also be applied to improve teaching by analyzing learners’ behavior towards courses, platforms, and instructors.

To evaluate these claims, a PRISMA-directed systematic review of the most recent and most influential scholarly publications has been carried out. The review went through an exhaustive quantitative and qualitative stepwise filtering of the initial corpus consisting of 440 articles that fulfilled the search criteria associated with PICO(C). Together with the briefly introduced methodology, search strategy, and data selection, the authors have also addressed the potential limitations of the adopted approach. After these introductory sections, the paper thoroughly presents the quantitative results for the 40 relevant papers, starting from the process of analyzing the relevant contributions and their distribution across academic databases, years, and geographical regions; it then gives an overview of the implemented sentiment analysis techniques, supervised learning algorithms, and lexicons, and ends with the distribution of architectures and the tools, packages, and libraries used for sentiment analysis in the reviewed papers. It is worth mentioning that from 2019 onwards researchers have started to apply deep learning in combination with NLP approaches to analyze the sentiments of students’ comments in MOOCs.

The qualitative analysis identified the following six major themes in the reviewed papers: MOOC content evaluation, review (feedback) contradiction analysis, SA effectiveness, SA through social network posts, understanding course performance and dropouts, and MOOC design model evaluation. As part of this analysis, each theme was carefully presented and illustrated with the corresponding filtered references that fulfil all the criteria.

We believe that this work can be a good source of inspiration for future research and that it provides readers with useful information, in a broad context, about the current trends, challenges, and future directions in the field.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author Contributions

All authors listed agreed on the design of this study and performed literature reading and review of the relevant papers. Project administration, methodology, data abstraction, processing, and analysis were conducted by FD. FD and KZ contributed to writing and editing the original draft. FA was involved in reading, editing, and providing constructive feedback on the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

  • Acheampong F. A., Wenyu C., Nunoo-Mensah H. (2020). Text-based Emotion Detection: Advances, Challenges, and Opportunities . Eng. Rep. 2 , e12189. 10.1002/eng2.12189 [ CrossRef ] [ Google Scholar ]
  • Au C. H., Lam K. C. S., Fung W. S. L., Xu X. (2016). “ Using Animation to Develop a MOOC on Information Security ,” in 2016 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), December 4-7, 2016, Bali, 365–369. 10.1109/IEEM.2016.7797898 [ CrossRef ] [ Google Scholar ]
  • Badache I., Boughanem M. (2014). “ Harnessing Social Signals to Enhance a Search ,” in Procee- dings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), August 11-14, 2014, Washington, DC, USA, 303–309. WI-IAT ’14. 10.1109/wi-iat.2014.48 [ CrossRef ] [ Google Scholar ]
  • Braun V., Clarke V. (2006). Using Thematic Analysis in Psychology . Qual. Res. Psychol. 3 ( 2 ), 77–101. 10.1191/1478088706qp063oa [ CrossRef ] [ Google Scholar ]
  • Buenaño-Fernández D., Luján-Mora S., Villegas-Ch W. (2017). “ Application of Text Mining on Social Network Messages about a MOOC ,” in ICERI2017 Proceedings, November 16-18, 2017, Seville, Spain, 6336–6344. [ Google Scholar ]
  • Cambria E., Schuller B., Xia Y., Havasi C. (2013). New Avenues in Opinion Mining and Sentiment Analysis . IEEE Intell. Syst. 28 ( 2 ), 15–21. 10.1109/mis.2013.30 [ CrossRef ] [ Google Scholar ]
  • Capuano N., Caballé S. (2020). “ Multi-attribute Categorization of MOOC Forum Posts and Applications to Conversational Agents ,” in Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2019 . Lecture Notes in Networks and Systems . Editors Barolli L., Hellinckx P., Natwichai J. (Cham: Springer; ), 96 , 505–514. 10.1007/978-3-030-33509-0_47 [ CrossRef ] [ Google Scholar ]
  • Capuano N., Caballé S., Conesa J., Greco A. (2020). Attention-based Hierarchical Recurrent Neural Networks for MOOC Forum Posts Analysis . J. Ambient Intell. Hum. Comput 2020 , 1–13. 10.1007/s12652-020-02747-9 [ CrossRef ] [ Google Scholar ]
  • Chen C., Sonnert G., Sadler P. M., Sasselov D. D., Fredericks C., Malan D. J. (2020). Going over the Cliff: MOOC Dropout Behavior at Chapter Transition . Distance Educ. 41 ( 1 ), 6–25. 10.1080/01587919.2020.1724772 [ CrossRef ] [ Google Scholar ]
  • Cobos R., Jurado F., Villén Á. (2019a). “ Moods in MOOCs: Analyzing Emotions in the Content of Online Courses with edX-CAS ,” in 2019 IEEE Global Engineering Education Conference (EDUCON), April 9-11, 2019, Dubai, UAE. 10.1109/educon.2019.8725107 [ CrossRef ] [ Google Scholar ]
  • Cobos R., Jurado F., Blázquez-Herranz A. (2019b). A Content Analysis System that Supports Sentiment Analysis for Subjectivity and Polarity Detection in Online Courses . IEEE R. Iberoam. Tecnol. Aprendizaje 14 , 177–187. 10.1109/rita.2019.2952298 [ CrossRef ] [ Google Scholar ]
  • Creswell J. W., Miller D. L. (2000). Determining Validity in Qualitative Inquiry . Theor. into Pract. 39 ( 3 ), 124–130. 10.1207/s15430421tip3903_2 [ CrossRef ] [ Google Scholar ]
  • Crossley S., McNamara D. S., Baker R., Wang Y., Paquette L., Barnes T., Bergner Y. (2015). “ Language to Completion: Success in an Educational Data Mining Massive Open Online Class ,” in Proceedings of the 7th Annual Conference on Educational Data Mining [EDM2015], June 26-29, 2015, Madrid, Spain. [ Google Scholar ]
  • Crossley S., Paquette L., Dascalu M., McNamara D. S., Baker R. S. (2016). “ Combining Click-Stream Data with NLP Tools to Better Understand MOOC Completion ,” in Proceedings of the Sixth International Conference on Learning Analytics and Knowledge (New York: ACM; ), 6–14. LAK ’16. 10.1145/2883851.2883931 [ CrossRef ] [ Google Scholar ]
  • Dalipi F., Kurti A., Zdravkova K., Ahmedi L. (2017). “ Rethinking the Conventional Learning Paradigm towards MOOC Based Flipped Classroom Learning ,” in Proceedings of the 16th IEEE International Conference on Information Technology Based Higher Education and Training (ITHET), July, 10-12 2017, Ohrid, North Macedonia, 1–6. 10.1109/ITHET.2017.8067791 [ CrossRef ] [ Google Scholar ]
  • Dina N., Yunardi R., Firdaus A. (2021). Utilizing Text Mining and Feature-Sentiment-Pairs to Support Data-Driven Design Automation Massive Open Online Course . Int. J. Emerging Tech. Learn. (Ijet) 16 ( 1 ), 134–151. 10.3991/ijet.v16i01.17095 [ CrossRef ] [ Google Scholar ]
  • Dowell N. M., Skrypnyk O., Joksimovic S., et al. (2015). “ Modeling Learners’ Social Centrality and Performance through Language and Discourse ,” in Proceedings of the 8th International Conference on Educational Data Mining, June, 26-29 2015, Madrid, Spain, 250–257. [ Google Scholar ]
  • Barrón Estrada M. L., Zatarain Cabada R., Oramas Bustillos R., Graff M., Raúl M. G. (2020). Opinion Mining and Emotion Recognition Applied to Learning Environments . Expert Syst. Appl. 150 , 113265. 10.1016/j.eswa.2020.113265 [ CrossRef ] [ Google Scholar ]
  • Ezen-Can A., Boyer K. E., Kellogg S., Booth S. (2015). “ Unsupervised Modeling for Understanding MOOC Discussion Forums: a Learning Analytics Approach ,” in Proceedings of the International Conference on Learning Analytics and Knowledge (LAK’15), June, 26-29 2015, Madrid, Spain. [ Google Scholar ]
  • Fink A. (2019). Conducting Research Literature Reviews: From the Internet to Paper . Fifth edition. UCLA, California: Sage Publications. [ Google Scholar ]
  • Georgios Paltoglou M. T. (2012). Twitter, MySpace, Digg: Unsupervised Sentiment Analysis in Social Media . ACM Trans. Intell. Syst. Technol. 3 ( 4 ), 1–9. 10.1145/2337542.2337551 [ CrossRef ] [ Google Scholar ]
  • Gianni F., Divitini M. (2015). Technology-Enhanced Smart City Learning: A Systematic Mapping of the Literature. Interaction Des. Architect.(s) J. (IxD&A), No. 27, 28–43.
  • Hew K. F., Hu X., Qiao C., Tang Y. (2020). What Predicts Student Satisfaction with MOOCs: A Gradient Boosting Trees Supervised Machine Learning and Sentiment Analysis Approach. Comput. Educ. 145, 103724. 10.1016/j.compedu.2019.103724
  • Holstein S., Cohen A. (2016). The Characteristics of Successful MOOCs in the Fields of Software, Science, and Management, According to Students’ Perception. IJELL 12, 247–266. 10.28945/3614
  • Hong Y., Skiena S. (2010). “The Wisdom of Bookies? Sentiment Analysis vs. the NFL Point Spread,” in Proceedings of the International Conference on Weblogs and Social Media (ICWSM-2010), May 23–26, 2010, Washington, DC, USA, 251–254.
  • Kastrati Z., Imran A. S., Kurti A. (2020). Weakly Supervised Framework for Aspect-Based Sentiment Analysis on Students’ Reviews of MOOCs. IEEE Access 8, 106799–106810. 10.1109/access.2020.3000739
  • Kastrati Z., Dalipi F., Imran A. S., Pireva Nuci K., Wani M. A. (2021). Sentiment Analysis of Students’ Feedback with NLP and Deep Learning: A Systematic Mapping Study. Appl. Sci. 11, 3986. 10.3390/app11093986
  • Koutsodimou K., Jimoyiannis A. (2015). “MOOCs for Teacher Professional Development: Investigating Views and Perceptions of the Participants,” in Proceedings of the 8th International Conference of Education, Research and Innovation (ICERI2015), Seville, Spain (IATED), 6968–6977.
  • Lee G., Keum S., Kim M., Choi Y., Rha I. (2016). “A Study on the Development of a MOOC Design Model,” in Educational Technology International (Korea: Seoul National University), 17 (1), 1–37.
  • Lee D., Watson S. L., Watson W. R. (2020). The Relationships between Self-Efficacy, Task Value, and Self-Regulated Learning Strategies in Massive Open Online Courses. IRRODL 21 (1), 23–39. 10.19173/irrodl.v20i5.4389
  • Lei C. U., Hou X., Kwok T. T., Chan T. S., Lee J., Oh E., Lai C. (2015). “Advancing MOOC and SPOC Development via a Learner Decision Journey Analytic Framework,” in 2015 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), December 10–15, 2015, Zhuhai, China (IEEE), 149–156. 10.1109/tale.2015.7386034
  • Li X., Zhang H., Ouyang Y., Zhang X., Rong W. (2019). “A Shallow BERT-CNN Model for Sentiment Analysis on MOOCs Comments,” in 2019 IEEE International Conference on Engineering, Technology and Education (TALE), December 10–13, 2019, Yogyakarta, Indonesia (IEEE), 1–6. 10.1109/tale48000.2019.9225993
  • Liberati A., Altman D. G., Tetzlaff J., Mulrow C., Gotzsche P. C., Ioannidis J. P. A., et al. (2009). The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies that Evaluate Healthcare Interventions: Explanation and Elaboration. BMJ 339, b2700. 10.1136/bmj.b2700
  • Lincoln Y., Guba E. (1985). Naturalistic Inquiry. California: Sage Publications.
  • Littlejohn A., Hood N., Milligan C., Mustain P. (2016). Learning in MOOCs: Motivations and Self-Regulated Learning in MOOCs. Internet Higher Educ. 29, 40–48. 10.1016/j.iheduc.2015.12.003
  • Liu Z., Zhang W., Sun J., et al. (2017a). “Emotion and Associated Topic Detection for Course Comments in a MOOC Platform,” in Proceedings of the IEEE International Conference on Educational Innovation through Technology (EITT), September 22–24, 2016, Tainan, Taiwan.
  • Liu Z., Yang C., Peng X., Sun J., Liu S. (2017b). “Joint Exploration of Negative Academic Emotion and Topics in Student-Generated Online Course Comments,” in Proceedings of the International Conference of Educational Innovation through Technology (EITT), Osaka, Japan, December 7–9, 2017, 89–93. 10.1109/eitt.2017.29
  • Liu D. (2016). The Reform and Innovation of English Course: A Coherent Whole of MOOC, Flipped Classroom and ESP. Proced. Soc. Behav. Sci. 232, 280–286. 10.1016/j.sbspro.2016.10.021
  • Lubis F. F., Rosmansyah Y., Supangkat S. H. (2016). “Experience in Learners Review to Determine Attribute Relation for Course Completion,” in Proceedings of the International Conference on ICT for Smart Society (ICISS), Surabaya, Indonesia, July 20–21, 2016, 32–36. 10.1109/ictss.2016.7792865
  • Lundqvist K., Liyanagunawardena T., Starkey L. (2020). Evaluation of Student Feedback within a MOOC Using Sentiment Analysis and Target Groups. IRRODL 21 (3), 140–156. 10.19173/irrodl.v21i3.4783
  • Martínez G., Baldiris S., Salas D. (2019). “The Effect of Gamification in User Satisfaction, the Approval Rate and Academic Performance,” in International Symposium on Emerging Technologies for Education (Cham: Springer), 122–132.
  • Miller M., Sathi C., Wiesenthal D., Leskovec J., Potts C. (2011). “Sentiment Flow through Hyperlink Networks,” in Proceedings of the Fifth International Conference on Weblogs and Social Media, July 17–21, 2011, Barcelona, Spain, 550–553.
  • Moreno-Marcos P. M., Alario-Hoyos C., Muñoz-Merino P. J., Estévez-Ayres I., Kloos C. D. (2018). “Sentiment Analysis in MOOCs: A Case Study,” in 2018 IEEE Global Engineering Education Conference (EDUCON), April 17–20, 2018, Santa Cruz de Tenerife, Spain (IEEE), 1489–1496.
  • Nie Y., Luo H. (2019). “Diagnostic Evaluation of MOOCs Based on Learner Reviews: The Analytic Hierarchy Process (AHP) Approach,” in Blended Learning: Educational Innovation for Personalized Learning. ICBL 2019. Lecture Notes in Computer Science, Vol. 11546. Editors Cheung S., Lee L. K., Simonova I., Kozel T., Kwok L. F. (Cham: Springer). 10.1007/978-3-030-21562-0_24
  • Nissenson P. M., Coburn T. D. (2016). “Scaling-up a MOOC at a State University in a Cost-Effective Manner,” in Proceedings of the 2016 American Society for Engineering Education Annual Conference & Exposition, June 26–29, 2016, New Orleans, USA.
  • O’Malley P. J., Agger J. R., Anderson M. W. (2015). Teaching a Chemistry MOOC with a Virtual Laboratory: Lessons Learned from an Introductory Physical Chemistry Course. J. Chem. Educ. 92 (10), 1661–1666. 10.1021/acs.jchemed.5b00118
  • Onan A. (2021). Sentiment Analysis on Massive Open Online Course Evaluations: A Text Mining and Deep Learning Approach. Comput. Appl. Eng. Educ. 29, 572–589. 10.1002/cae.22253
  • Onwuegbuzie A., Leech N., Collins K. (2012). Qualitative Analysis Techniques for the Review of the Literature. Qual. Rep. 17 (56), 1–28.
  • Petticrew M., Roberts H. (2006). Systematic Reviews in the Social Sciences. Oxford: Blackwell Publishing.
  • Pugh S. D. (2001). Service with a Smile: Emotional Contagion in the Service Encounter. Acad. Manage. J. 44 (5), 1018–1027. 10.5465/3069445
  • Qi C., Liu S. (2021). Evaluating On-Line Courses via Reviews Mining. IEEE Access 9, 35439–35451. 10.1109/access.2021.3062052
  • Rabbany R., Elatia S., Takaffoli M., Zaïane O. R. (2014). “Collaborative Learning of Students in Online Discussion Forums: A Social Network Analysis Perspective,” in Educational Data Mining (Cham: Springer), 441–466. 10.1007/978-3-319-02738-8_16
  • Rahimi A., Khosravizadeh P. (2018). A Corpus Study on the Difference between MOOCs and Real Classes. BRAIN: Broad Res. Artif. Intelligence Neurosci. 9 (1), 36–43.
  • Sa'don N. F., Alias R. A., Ohshima N. (2014). “Nascent Research Trends in MOOCs in Higher Educational Institutions: A Systematic Literature Review,” in 2014 International Conference on Web and Open Access to Learning (ICWOAL), November 25–27, 2014, Dubai, UAE (IEEE), 1–4. 10.1109/icwoal.2014.7009215
  • Schardt C., Adams M. B., Owens T., Keitz S., Fontelo P. (2007). Utilization of the PICO Framework to Improve Searching PubMed for Clinical Questions. BMC Med. Inform. Decis. Mak. 7 (1), 16. 10.1186/1472-6947-7-16
  • Shen C.-w., Kuo C.-J. (2015). Learning in Massive Open Online Courses: Evidence from Social Media Mining. Comput. Hum. Behav. 51, 568–577. 10.1016/j.chb.2015.02.066
  • Staples M., Niazi M. (2007). Experiences Using Systematic Review Guidelines. J. Syst. Softw. 80 (9), 1425–1437. 10.1016/j.jss.2006.09.046
  • Yan W., Dowell N., Holman C., Welsh S. S., Choi H., Brooks C. (2019). “Exploring Learner Engagement Patterns in Teach-Outs Using Topic, Sentiment and On-Topicness to Reflect on Pedagogy,” in Proceedings of the 9th International Conference on Learning Analytics & Knowledge, 180–184. 10.1145/3303772.3303836
  • Zemsky R. (2014). With a MOOC MOOC Here and a MOOC MOOC There, Here a MOOC, There a MOOC, Everywhere a MOOC MOOC. J. Gen. Educ. 63 (4), 237–243. 10.1353/jge.2014.0029
  • Zhang H., Dong J., Min L., Bi P. (2020). A BERT Fine-Tuning Model for Targeted Sentiment Analysis of Chinese Online Course Reviews. Int. J. Artif. Intell. Tools 29 (07n08), 2040018. 10.1142/s0218213020400187
  • Zhou J., Ye J.-m. (2020). “Sentiment Analysis in Education Research: A Review of Journal Publications,” in Interactive Learning Environments, 1–13. 10.1080/10494820.2020.1826985
