50 selected papers in Data Mining and Machine Learning
Here is the list of 50 selected papers in Data Mining and Machine Learning. You can download them for your detailed reading and research. Enjoy!
Data Mining and Statistics: What’s the Connection?
Data Mining: Statistics and More?, D. Hand, American Statistician, 52(2):112-118.
Data Mining, G. Weiss and B. Davison, in Handbook of Technology Management, John Wiley and Sons, expected 2010.
From Data Mining to Knowledge Discovery in Databases, U. Fayyad, G. Piatetsky-Shapiro & P. Smyth, AI Magazine, 17(3):37-54, Fall 1996.
Mining Business Databases, Communications of the ACM, 39(11): 42-48.
10 Challenging Problems in Data Mining Research, Q. Yang and X. Wu, International Journal of Information Technology & Decision Making, Vol. 5, No. 4, 2006, 597-604.
The Long Tail, by Anderson, C., Wired magazine.
AOL’s Disturbing Glimpse Into Users’ Lives, by McCullagh, D., News.com, August 9, 2006
General Data Mining Methods and Algorithms
Top 10 Algorithms in Data Mining, X. Wu, V. Kumar, J.R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G.J. McLachlan, A. Ng, B. Liu, P.S. Yu, Z. Zhou, M. Steinbach, D. J. Hand, D. Steinberg, Knowl Inf Syst (2008) 14:1-37.
Induction of Decision Trees , R. Quinlan, Machine Learning, 1(1):81-106, 1986.
Web and Link Mining
The Pagerank Citation Ranking: Bringing Order to the Web , L. Page, S. Brin, R. Motwani, T. Winograd, Technical Report, Stanford University, 1999.
The Structure and Function of Complex Networks , M. E. J. Newman, SIAM Review, 2003, 45, 167-256.
Link Mining: A New Data Mining Challenge , L. Getoor, SIGKDD Explorations, 2003, 5(1), 84-89.
Link Mining: A Survey , L. Getoor, SIGKDD Explorations, 2005, 7(2), 3-12.
Semi-supervised Learning
Semi-Supervised Learning Literature Survey , X. Zhu, Computer Sciences TR 1530, University of Wisconsin — Madison.
Introduction to Semi-Supervised Learning, in Semi-Supervised Learning (Chapter 1) O. Chapelle, B. Scholkopf, A. Zien (eds.), MIT Press, 2006. (Fordham’s library has online access to the entire text)
Learning with Labeled and Unlabeled Data , M. Seeger, University of Edinburgh (unpublished), 2002.
Person Identification in Webcam Images: An Application of Semi-Supervised Learning, M. Balcan, A. Blum, P. Choi, J. Lafferty, B. Pantano, M. Rwebangira, X. Zhu, Proceedings of the 22nd ICML Workshop on Learning with Partially Classified Training Data, 2005.
Learning from Labeled and Unlabeled Data: An Empirical Study across Techniques and Domains , N. Chawla, G. Karakoulas, Journal of Artificial Intelligence Research , 23:331-366, 2005.
Text Classification from Labeled and Unlabeled Documents using EM , K. Nigam, A. McCallum, S. Thrun, T. Mitchell, Machine Learning , 39, 103-134, 2000.
Self-taught Learning: Transfer Learning from Unlabeled Data , R. Raina, A. Battle, H. Lee, B. Packer, A. Ng, in Proceedings of the 24th International Conference on Machine Learning , 2007.
An iterative algorithm for extending learners to a semisupervised setting , M. Culp, G. Michailidis, 2007 Joint Statistical Meetings (JSM), 2007
Partially-Supervised Learning / Learning with Uncertain Class Labels
Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers , V. Sheng, F. Provost, P. Ipeirotis, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , 2008.
Logistic Regression for Partial Labels , in 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems , Volume III, pp. 1935-1941, 2002.
Classification with Partial labels , N. Nguyen, R. Caruana, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , 2008.
Imprecise and Uncertain Labelling: A Solution based on Mixture Model and Belief Functions, E. Come, 2008 (powerpoint slides).
Induction of Decision Trees from Partially Classified Data Using Belief Functions, M. Bjanger, Norwegian University of Science and Technology, 2000.
Knowledge Discovery in Large Image Databases: Dealing with Uncertainties in Ground Truth , P. Smyth, M. Burl, U. Fayyad, P. Perona, KDD Workshop 1994, AAAI Technical Report WS-94-03, pp. 109-120, 1994.
Recommender Systems
Trust No One: Evaluating Trust-based Filtering for Recommenders , J. O’Donovan and B. Smyth, In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), 2005, 1663-1665.
Trust in Recommender Systems, J. O’Donovan and B. Smyth, In Proceedings of the 10th International Conference on Intelligent User Interfaces (IUI-05), 2005, 167-174.
General resources available on this topic:
ICML 2003 Workshop: Learning from Imbalanced Data Sets II
AAAI ‘2000 Workshop on Learning from Imbalanced Data Sets
A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data , G. Batista, R. Prati, and M. Monard, SIGKDD Explorations , 6(1):20-29, 2004.
Class Imbalance versus Small Disjuncts , T. Jo and N. Japkowicz, SIGKDD Explorations , 6(1): 40-49, 2004.
Extreme Re-balancing for SVMs: a Case Study , B. Raskutti and A. Kowalczyk, SIGKDD Explorations , 6(1):60-69, 2004.
A Multiple Resampling Method for Learning from Imbalanced Data Sets , A. Estabrooks, T. Jo, and N. Japkowicz, in Computational Intelligence , 20(1), 2004.
SMOTE: Synthetic Minority Over-sampling Technique, N. Chawla, K. Boyer, L. Hall, and W. Kegelmeyer, Journal of Artificial Intelligence Research, 16:321-357.
Generative Oversampling for Mining Imbalanced Datasets, A. Liu, J. Ghosh, and C. Martin, Third International Conference on Data Mining (DMIN-07), 66-72.
Learning from Little: Comparison of Classifiers Given Little Training, G. Forman and I. Cohen, in 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, 161-172, 2004.
Issues in Mining Imbalanced Data Sets – A Review Paper, S. Visa and A. Ralescu, in Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67-73, 2005.
Wrapper-based Computation and Evaluation of Sampling Methods for Imbalanced Datasets , N. Chawla, L. Hall, and A. Joshi, in Proceedings of the 1st International Workshop on Utility-based Data Mining , 24-33, 2005.
C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling, C. Drummond and R. Holte, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.
C4.5 and Imbalanced Data sets: Investigating the effect of sampling method, probabilistic estimate, and decision tree structure , N. Chawla, in ICML Workshop on Learning from Imbalanced Datasets II , 2003.
Class Imbalances: Are we Focusing on the Right Issue?, N. Japkowicz, in ICML Workshop on Learning from Imbalanced Datasets II , 2003.
Learning when Data Sets are Imbalanced and When Costs are Unequal and Unknown , M. Maloof, in ICML Workshop on Learning from Imbalanced Datasets II , 2003.
Uncertainty Sampling Methods for One-class Classifiers, P. Juszczak and R. Duin, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.
Active Learning
Improving Generalization with Active Learning , D Cohn, L. Atlas, and R. Ladner, Machine Learning 15(2), 201-221, May 1994.
On Active Learning for Data Acquisition , Z. Zheng and B. Padmanabhan, In Proc. of IEEE Intl. Conf. on Data Mining, 2002.
Active Sampling for Class Probability Estimation and Ranking , M. Saar-Tsechansky and F. Provost, Machine Learning 54:2 2004, 153-178.
The Learning-Curve Sampling Method Applied to Model-Based Clustering , C. Meek, B. Thiesson, and D. Heckerman, Journal of Machine Learning Research 2:397-418, 2002.
Active Sampling for Feature Selection , S. Veeramachaneni and P. Avesani, Third IEEE Conference on Data Mining, 2003.
Heterogeneous Uncertainty Sampling for Supervised Learning , D. Lewis and J. Catlett, In Proceedings of the 11th International Conference on Machine Learning, 148-156, 1994.
Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction , G. Weiss and F. Provost, Journal of Artificial Intelligence Research, 19:315-354, 2003.
Active Learning using Adaptive Resampling , KDD 2000, 91-98.
Cost-Sensitive Learning
Types of Cost in Inductive Concept Learning , P. Turney, In Proceedings Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning.
Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection , P. Chan and S. Stolfo, KDD 1998.
data mining techniques Recently Published Documents
Prediction of Skin Diseases Using Machine Learning
Skin disease rates have been increasing over the past few decades, causing both fatal and non-fatal disability worldwide, especially in areas where medical resources are limited. Early diagnosis of skin diseases significantly increases the chances of cure. This work therefore compares six machine learning algorithms for predicting skin diseases: KNN, random forest, neural network, naïve Bayes, logistic regression, and SVM. Information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. The work comprises an introduction, a literature review, and the proposed methodology. It proposes a new method of analyzing skin disease in which the six data mining techniques are integrated into a single ensemble method. On the dermatology dataset, the ensemble method reaches 94% accuracy, an improvement over the other classifier algorithms, and is therefore more effective in this area.
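Of the feature-ranking criteria the abstract lists, information gain is the simplest to illustrate. The following is a minimal sketch, assuming categorical features and using hypothetical toy data (the feature names and class labels are invented for illustration and are not taken from the dermatology dataset). It computes the entropy of the labels minus the weighted entropy after splitting on each feature, then sorts features by that gain.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy after splitting on one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest information gain."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical toy data, invented for this sketch.
columns = {
    "feature_a": ["yes", "yes", "no", "no"],  # separates the classes perfectly
    "feature_b": ["yes", "no", "yes", "no"],  # carries no information about the class
}
labels = ["class_1", "class_1", "class_2", "class_2"]
ranking = rank_features(columns, labels)
```

Here `feature_a` ranks first because splitting on it leaves no uncertainty about the class, while `feature_b` leaves the label entropy unchanged.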
A Survey on Building Recommendation Systems Using Data Mining Techniques
Classification is a data mining technique used to estimate the group membership of items on the basis of a common feature. It is useful for future planning and for discovering new knowledge about a specific dataset. An in-depth study of previous literature implementing data mining techniques in the design of recommender systems was performed. This chapter provides a broad study of how recommender systems are designed using various data mining classification techniques from machine learning, and examines their methodological decisions in four aspects: recommendation approaches, data mining techniques, recommendation types, and performance measures. The study focuses on selected classification methods and can help both researchers and students in computer science and machine learning strengthen their knowledge of machine learning and data mining.
A Classification and Clustering Approach Using Data Mining Techniques in Analysing Gastrointestinal Tract
Diagnosis and Detection of Plant Diseases Using Data Mining Techniques
Location-Based Crime Prediction Using Multiclass Classification Data Mining Techniques
An Effective Approach to Test Suite Reduction and Fault Detection Using Data Mining Techniques
Software testing is used to find bugs so that a quality product reaches end users. Test suites detect failures in software, but they may be redundant and take a long time to execute. In this article, a very large number of test cases is created using combinatorial test design algorithms. Attribute reduction is an important preprocessing task in data mining: weak and irrelevant attributes are removed to reduce complexity. After preprocessing, it is not necessary to test the software with every combination of test cases, because the test cases are numerous and redundant; instead, the most effective test cases are identified using a data mining algorithm. The resulting test suite identifies the defects in the software, provides better coverage analysis, and reduces execution time.
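The abstract does not say which algorithm selects the retained test cases. As an illustration of the general idea of test suite reduction only, the sketch below swaps in a standard greedy set-cover heuristic: it repeatedly keeps the test that covers the most requirements not yet covered. The `coverage` mapping and its test and requirement names are hypothetical.

```python
def reduce_test_suite(coverage):
    """Greedy set cover over a mapping {test name: set of covered requirements}.

    Returns a list of test names that together cover every requirement
    covered by the full suite. Ties are broken by test name for determinism.
    """
    uncovered = set().union(*coverage.values()) if coverage else set()
    selected = []
    while uncovered:
        # Pick the test that covers the most still-uncovered requirements.
        best = max(sorted(coverage), key=lambda test: len(coverage[test] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected


# Hypothetical coverage data, invented for this sketch.
coverage = {
    "t1": {"r1", "r2"},
    "t2": {"r2"},
    "t3": {"r3"},
    "t4": {"r1", "r2", "r3"},
}
reduced = reduce_test_suite(coverage)
```

In this toy suite a single test, `t4`, covers everything the four tests cover together, so the other three are redundant. Greedy selection is a heuristic and does not guarantee the smallest possible suite in general.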
Applying data mining techniques to classify patients with suspected hepatitis C virus infection
Dengue Fever Prediction Modelling Using Data Mining Techniques
Fake News Detection Using Data Mining Techniques
The internet is now well known as an information source whose content may be real or fake. Fake news has existed on the web for several years. The main challenge is to determine the truthfulness of news. Fake news is written and published to mislead people, and it causes damage to an agency, entity, or person. This paper aims to detect fake news using semantic search.
A Leading Indicator Approach with Data Mining Techniques in Analysing Bitcoin Market Value
- NEWS FEATURE
- 17 July 2019
- Correction 19 July 2019
The plan to mine the world’s research papers
- Priyanka Pulla
Priyanka Pulla is a freelance journalist based in Bengaluru, India.
Carl Malamud in front of the data store of 73 million articles that he plans to let scientists text mine. Credit: Smita Sharma for Nature
Carl Malamud is on a crusade to liberate information locked up behind paywalls — and his campaigns have scored many victories. He has spent decades publishing copyrighted legal documents, from building codes to court records, and then arguing that such texts represent public-domain law that ought to be available to any citizen online. Sometimes, he has won those arguments in court. Now, the 60-year-old American technologist is turning his sights on a new objective: freeing paywalled scientific literature. And he thinks he has a legal way to do it.
Nature 571, 316-318 (2019)
doi: https://doi.org/10.1038/d41586-019-02142-1
Updates & Corrections
Correction 19 July 2019 : An earlier version of this feature used the term ‘fair use’ inappropriately — the term isn’t relevant under Indian law.
Applying Data Mining Research Methodologies on Information Systems
In this paper we considered several frameworks for data mining. These frameworks are based on different approaches, including the inductive databases approach, the reductionist statistical approaches, the data compression approach, the constructive induction approach, and others. We considered the advantages and limitations of these frameworks. We presented a view of data mining research as a continuous, never-ending development process of an adaptive DM system towards the efficient utilization of available DM techniques for solving a current problem affected by a dynamically changing environment. We discussed one of the traditional information systems frameworks and, drawing an analogy to it, considered a data mining system as a special kind of adaptive information system. We adapted the information systems development framework to the context of data mining systems development. Key words: Data Mining, Information Systems, Knowledge Discovery Databases
Related Papers
Information Systems Development
Seppo Puuronen
Abstract Data mining applications are typically used in the decision making process. The knowledge discovery process (KDD process for short) is a typical iterative process, in which not only the raw data can be mined several times, but also the mined patterns might constitute the starting point for further mining on them.
Data Mining and Knowledge Discovery
Jean-François Boulicaut
Discovery Science
Panče Panov
Lecture Notes in Computer Science
Christophe Rigotti , Toon Calders
Data Mining Workshops, …
Motivated by the need for unification of the field of data mining and the growing demand for formalized representation of outcomes of research, we address the task of constructing an ontology of data mining. The proposed ontology, named OntoDM, is based on a recent proposal of a general framework for data mining, and includes definitions of basic data mining entities, such as datatype and dataset, data mining task, data mining algorithm and components thereof (e.g., distance function), etc. It also allows for the definition of more complex entities, e.g., constraints in constraint-based data mining, sets of such constraints (inductive queries) and data mining scenarios (sequences of inductive queries). Unlike most existing approaches to constructing ontologies of data mining, OntoDM is a deep/heavy-weight ontology and follows best practices in ontology engineering, such as not allowing multiple inheritance of classes, using a predefined set of relations, and using a top-level ontology.
Grace L Samson , Aminat Showole
Spatial data mining is the quantitative study of phenomena that are located in space. This paper investigates methods of mining patterns in a complex spatial data set (a term that generally describes any kind of data in which the location of an object in space is important). We based this research on the analysis of some spatial characteristics of certain objects. We began by describing the spatial pattern of events or objects with respect to their attributes, and we looked at how to describe the spatial nature and characteristics of entities in an environment with respect to their spatial and non-spatial attributes. We also looked at modelling (predictive modelling and knowledge management of complex spatial systems), querying, and implementing a complex spatial database (using data structures and algorithms). The presence of spatial autocorrelation, and the fact that continuous data types are always present in spatial data, make it important to create methods, tools, and algorithms to mine spatial patterns in a complex spatial data set. This work is particularly useful to researchers in the field of data mining, as it contributes knowledge to different application areas of data mining, especially spatial data mining. It can also be useful in teaching and for other study purposes.
Abstract: In recent years the data mining research community has shown growing interest in constraint-based mining because it increases the relevance of the result set and reduces both its volume and the amount of computational work. However, constraint-based mining will be completely feasible only when efficient optimizers for mining languages are conceived and available. This paper is a first step towards the construction of optimizers for a constraint-based mining language.
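The abstract mentions spatial autocorrelation without naming a measure. One widely used statistic is Moran's I, chosen here as an assumption for illustration and not as the paper's method. For values $x_i$ with mean $\bar{x}$ and spatial weights $w_{ij}$ summing to $W$, it is $I = \frac{n}{W} \cdot \frac{\sum_{i}\sum_{j} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i}(x_i - \bar{x})^2}$. A minimal sketch with a hypothetical four-site chain:

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square matrix of spatial weights.

    weights[i][j] is the (non-negative) weight linking site i to site j;
    the values must not all be identical and the weights must not all be zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)


# Hypothetical layout: four sites in a row, each linked to its immediate neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1, 2, 3, 4], chain)          # neighbouring values are similar
alternating = morans_i([1, -1, 1, -1], chain)  # neighbouring values are opposite
```

A positive result indicates that neighbouring sites tend to have similar values, and a negative result indicates that they tend to differ, which is why patterns in spatial data cannot be treated as independent observations.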
Data Analytics in e-Learning: Approaches and Applications, pp. 21–42
Public Datasets and Data Sources for Educational Data Mining
- P. S. Popescu (ORCID: orcid.org/0000-0003-4504-6144)
- M. C. Mihăescu
- M. L. Mocanu
- First Online: 23 March 2022
Part of the book series: Intelligent Systems Reference Library (ISRL, volume 220)
Datasets are the starting point for any machine learning or data mining workflow, and they strongly affect the overall performance of the whole system. Data sources offer a variety of data that can be used directly or that needs some preprocessing to produce a suitable dataset, but the main problem is what data is available and what data should be used to solve a specific task. The datasets explored in this chapter focus on the educational area and can directly affect any educational environment, whether online or classical full-time education. This chapter aims to clarify which public datasets are currently available and which tasks they are suitable for, and presents a use case for a specific dataset in detail. The dataset examined in depth in this chapter was recently produced and includes attributes that can easily be mined from any e-Learning platform, so it can be a good baseline for anyone who is starting in Educational Data Mining or who wants to run sample experiments to gain better insight before implementing a workflow in their own system.
- Machine learning
- Educational data mining
- Decision trees
B.D.E, https://www.upenn.edu/learninganalytics/ryanbaker/bigdataeducation.html .
IEEEXplore, https://ieeexplore.ieee.org/Xplore/home.jsp .
Elsevier Science Direct, https://www.sciencedirect.com .
Research Gate, https://www.researchgate.net .
Data Collection Project, https://bluej.org/blackbox/ .
UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/index.php .
University Data Set, https://archive.ics.uci.edu/ml/datasets/University .
Data Set, https://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation .
Student Performance Data Set, http://archive.ics.uci.edu/ml/datasets/Student+Performance .
Student Alcohol Consumption, https://www.kaggle.com/uciml/student-alcohol-consumption .
User Knowledge, http://archive.ics.uci.edu/ml/datasets/User+Knowledge+Modeling .
Educational Process Mining (EPM), https://tinyurl.com/y27yduo3 .
Open University Learning Analytics dataset, https://tinyurl.com/y6ysfant .
S.A.P., https://archive.ics.uci.edu/ml/datasets/Student+Academics+Performance .
Mendeley Data Repository, https://data.mendeley.com .
MOOC lectures dataset, https://data.mendeley.com/datasets/xknjp8pxbj/1 .
Dataset, https://data.mendeley.com/datasets/68mt8gms4j/2 .
Dataset, https://data.mendeley.com/datasets/6jmv43nffk/2 .
Dataset, https://data.mendeley.com/datasets/83tcx8psxv/1 .
KEEL, http://www.keel.es/
Harvard Dataverse, https://dataverse.harvard.edu .
HarvardX Person-Course Academic Year 2013, https://doi.org/10.7910/DVN/26147 .
Canvas Network Person-Course, https://doi.org/10.7910/DVN/1XORAL .
(MOOC-Ed), https://doi.org/10.7910/DVN/ZZH3UB .
CAMEO Dataset, https://doi.org/10.7910/DVN/3UKVOR .
Nursing Student Data, https://doi.org/10.7910/DVN/MQ8EP0 .
Dataset, https://doi.org/10.7910/DVN/M07HQ7 .
Early Reading and Writing Assessment in Preschool, https://doi.org/10.7910/DVN/V7E9XD .
DataShop@CMU, https://pslcdatashop.web.cmu.edu/index.jsp?datasets=public .
KDD Cup 2010, https://pslcdatashop.web.cmu.edu/KDDCup .
What Do You Know? https://www.kaggle.com/c/WhatDoYouKnow .
Automated Essay Scoring, https://www.kaggle.com/c/asap-aes .
Short Answer Scoring, https://www.kaggle.com/c/asap-sas .
Students' Academic Performance Dataset, https://www.kaggle.com/aljarah/xAPI-Edu-Data .
NAEP 2017, https://sites.google.com/view/assistmentsdatamining .
NAEP 2019: https://sites.google.com/view/dataminingcompetition2019 .
Students Performance https://www.kaggle.com/spscientist/students-performance-in-exams .
CSEDM 2019 Data Challenge, https://sites.google.com/asu.edu/csedm-ws-lak-2019 .
KC Modeling for Programming, https://pslcdatashop.web.cmu.edu/Project?id=294 .
CSEDM 2020, https://sites.google.com/ncsu.edu/csedm-ws-edm-2020/data-challenge .
EdNet dataset, https://github.com/riiid/ednet .
Riiid AIEd Challenge 2020, https://www.kaggle.com/c/riiid-test-answer-prediction .
Multimodal learning Math Data Corpus, http://mla.ucsd.edu/data .
Learn Moodle August 2016, https://research.moodle.org/158 .
Lix Puzzle-game, https://sites.google.com/site/learninganalyticsforall/data-sets/lix-dataset .
Student Life Dataset, http://studentlife.cs.dartmouth.edu .
Dataset for empirical evaluation of entry requirements, https://tinyurl.com/yxqf2v42 .
MUTLA, https://tinyurl.com/SAILdata .
https://www.kaggle.com/cristianmihaescu/dsa-test-dataset/kernels .
Popescu, P.S., Mihaescu, M.C., Teodorescu, O.M., Mocanu, M.: Student testing activity dataset from data structures course. In: RoCHI-International Conference on Human-Computer Interaction, p. 157 (2020)
Mihaescu, M.C., Popescu, P.S.: Review on publicly available datasets for educational data mining. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 11 (3), e1403 (2021)
Romero, C., Ventura, S.: Educational data mining: a survey from 1995 to 2005. Expert Syst. Appl. 33 (1), 135–146 (2007)
Baker, R.S., Yacef, K.: The state of educational data mining in 2009: a review and future visions. J. Educ. Data Min. 1 (1), 3–17 (2009)
Romero, C., Ventura, S.: Educational data mining: a review of the state of the art. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 40 (6), 601–618 (2010)
Koedinger, K.R., Baker, R.S., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J.: A data repository for the EDM community: the PSLC DataShop. Handb. Educ. Data Min. 43 , 43–56 (2010)
Merceron, A.: Educational data mining/learning analytics: methods, tasks and current trends. In: DeLFI Workshops, pp. 101–109 (2015)
Dutt, A., Ismail, M.A., Herawan, T.: A systematic review on educational data mining. IEEE Access 5 , 15991–16005 (2017)
Silva, C., Fonseca, J.: Educational Data Mining: a literature review. In: Europe and MENA Cooperation Advances in Information and Communication Technologies, pp. 87–94 (2017)
Rodrigues, M.W., Isotani, S., Zarate, L.E.: Educational data mining: a review of evaluation process in the e-learning. Telemat. Inform. 35 (6), 1701–1717 (2018)
Romero, C., Ventura, S.: Data mining in education. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 3 (1), 12–27 (2013)
Romero, C., Ventura, S.: Educational data science in massive open online courses. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 7 (1), e1187 (2017)
Romero, C., Ventura, S.: Educational data mining and learning analytics: an updated survey. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 10 (3), e1355 (2020)
Slater, S., Joksimović, S., Kovanovic, V., Baker, R.S., Gasevic, D.: Tools for educational data mining: a review. J. Educ. Behav. Stat. 42 (1), 85–106 (2017)
Ihantola, P., Vihavainen, A., Ahadi, A., Butler, M., Börstler, J., Edwards, S.H., Toll, D.: Educational data mining and learning analytics in programming: Literature review and case studies. In: Proceedings of the 2015 ITiCSE on Working Group Reports, pp. 41–63 (2015)
Alon, U., Zilberstein, M., Levy, O., Yahav, E.: code2vec: learning distributed representations of code. Proc. ACM Programm. Lang. 3 (POPL), 1–29 (2019)
Vieira, C., Parsons, P., Byrd, V.: Visual learning analytics of educational data: a systematic literature review and research agenda. Comput. Educ. 122 , 119–135 (2018)
Ferreira‐Mello, R., André, M., Pinheiro, A., Costa, E., Romero, C.: Text mining in education. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 9 (6), e1332 (2019)
Bakhshinategh, B., Zaiane, O.R., ElAtia, S., Ipperciel, D.: Educational data mining applications and tasks: a survey of the last 10 years. Educ. Inf. Technol. 23 (1), 537–553 (2018)
Loh, W.Y., Shih, Y.S.: Split selection methods for classification trees. Stat. Sin. 815–840 (1997)
Cortez, P., Silva, A.M.G.: Using data mining to predict secondary school student performance (2008)
Kahraman, H.T., Sagiroglu, S., Colak, I.: The development of intuitive knowledge classifier and the modeling of domain dependent data. Knowl.-Based Syst. 37 , 283–295 (2013)
Vahdat, M., Oneto, L., Anguita, D., Funk, M., Rauterberg, M.: A learning analytics approach to correlate the academic achievements of students with interaction data from an educational simulator. In: European Conference on Technology Enhanced Learning, pp. 352–366. Springer, Cham (2015)
Kuzilek, J., Hlosta, M., Zdrahal, Z.: Open university learning analytics dataset. Sci. Data 4 (1), 1–8 (2017)
Hussain, S., Dahan, N.A., Ba-Alwib, F.M., Ribata, N.: Educational data mining and analysis of students’ academic performance using WEKA. Indones. J. Electr. Eng. Comput. Sci. 9 (2), 447–459 (2018)
Bhoi, N.K.: Mendeley data repository as a platform for research data management. Marching Libr.: Manag. Ski. Technol. Competencies, 481–487 (2018)
Kastrati, Z., Kurti, A., Imran, A.S.: WET: Word embedding-topic distribution vectors for MOOC video lectures dataset. Data Brief 28 , 105090 (2020)
Gómez-Tejedor, J.A., Vidaurre, A., Tort-Ausina, I., Mateo, J.M., Serrano, M.A., Meseguer-Dueñas, J.M., et al.: Data set on the effectiveness of flip teaching on engineering students’ performance in the physics lab compared to traditional methodology. Data Brief 28 , 104915 (2020)
Prasojo, L.D., Habibi, A., Yaakob, M.F.M., Pratama, R., Yusof, M.R., Mukminin, A., Hanum, F.: Teachers’ burnout: A SEM analysis in an Asian context. Heliyon 6 (1), e03144 (2020)
Delahoz-Dominguez, E., Zuluaga, R., Fontalvo-Herrera, T.: Dataset of academic performance evolution for engineering students. Data Brief 30 , 105537 (2020)
Hou, Y., Li, L., Li, B., Liu, J.: An anti-noise ensemble algorithm for imbalance classification. Intell. Data Anal. 23 (6), 1205–1217 (2019)
Ho, A., Reich, J., Nesterko, S., Seaton, D., Mullaney, T., Waldo, J., Chuang, I.: HarvardX and MITx: the first year of open online courses, fall 2012-summer 2013 (HarvardX and MITx Working Paper No. 1) (2014)
Kellogg, S., Edelmann, A.: Massively open online course for educators (MOOC-Ed) network dataset. Br. J. Educ. Technol. 46 (5), 977–983 (2015)
Northcutt, C.G., Ho, A.D., Chuang, I.L.: Detecting and preventing “multiple-account” cheating in massive open online courses. Comput. Educ. 100 , 71–80 (2016)
Stamper, J., Niculescu-Mizil, A., Ritter, S., Gordon, G.J., Koedinger, K.R.: Bridge to algebra 2006–2007. Development data set from KDD cup 2010 educational data mining challenge (2010)
Amrieh, E.A., Hamtini, T., Aljarah, I.: Mining educational data to predict student’s academic performance using ensemble methods. Int. J. Database Theory Appl. 9 (8), 119–136 (2016)
Choi, Y., Lee, Y., Shin, D., Cho, J., Park, S., Lee, S., et al.: Ednet: a large-scale hierarchical dataset in education. In: International Conference on Artificial Intelligence in Education, pp. 69–73. Springer, Cham (2020)
Oviatt, S., Cohen, A., Weibel, N.: Multimodal learning analytics: description of math data corpus for ICMI grand challenge workshop. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 563–568 (2013)
Vahdat, M., Carvalho, M.B., Funk, M., Rauterberg, M., Hu, J., Anguita, D.: Learning analytics for a puzzle game to discover the puzzle-solving tactics of players. In: European Conference on Technology Enhanced Learning, pp. 673–677. Springer, Cham (2016)
Wang, R., Chen, F., Chen, Z., Li, T., Harari, G., Tignor, S., et al.: StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 3–14 (2014)
Wang, R., Chen, F., Chen, Z., Li, T., Harari, G., Tignor, S., et al.: StudentLife: Using smartphones to assess mental health and academic performance of college students. In: Mobile Health, pp. 7–33. Springer, Cham (2017)
Odukoya, J.A., Popoola, S.I., Atayero, A.A., Omole, D.O., Badejo, J.A., John, T.M., Olowo, O.O.: Learning analytics: dataset for empirical evaluation of entry requirements into engineering undergraduate programs in a Nigerian university. Data Brief 17 , 998–1014 (2018)
Xu, F., Wu, L., Thai, K.P., Hsu, C., Wang, W., Tong, R.: MUTLA: a large-scale dataset for multimodal teaching and learning analytics. arXiv:1910.06078 (2019)
Teodorescu, O.M., Popescu, P.S., Mihaescu, M.C.: Taking e-assessment quizzes-a case study with an SVD based recommender system. In: International Conference on Intelligent Data Engineering and Automated Learning, pp. 829–837. Springer, Cham (2018)
Author information
Authors and affiliations.
Department of Computer Science and Information Technology, University of Craiova, Str. A.I. Cuza, Nr. 13, Craiova, Romania
P. S. Popescu, M. C. Mihăescu & M. L. Mocanu
Corresponding author
Correspondence to P. S. Popescu .
Editor information
Editors and affiliations.
University of Craiova, Craiova, Romania
Marian Cristian Mihăescu
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter.
Popescu, P.S., Mihăescu, M.C., Mocanu, M.L. (2022). Public Datasets and Data Sources for Educational Data Mining. In: Mihăescu, M.C. (eds) Data Analytics in e-Learning: Approaches and Applications. Intelligent Systems Reference Library, vol 220. Springer, Cham. https://doi.org/10.1007/978-3-030-96644-7_2
DOI : https://doi.org/10.1007/978-3-030-96644-7_2
Published : 23 March 2022
Publisher Name : Springer, Cham
Print ISBN : 978-3-030-96643-0
Online ISBN : 978-3-030-96644-7
eBook Packages : Intelligent Technologies and Robotics (R0)
Data mining for dummies
Uploaded by station65.cebu on November 4, 2022
Data Mining Algorithms and Techniques. Various algorithms and techniques like Classification, Clustering, Regression, Artificial Intelligence, Neural Networks, Association Rules, Decision Trees ...
The average accuracy is 74.05% for the existing COID algorithm and 77.21% for our proposed algorithm. The average recall is 81.19% for the existing algorithm and 89.51% for the proposed one, which shows that the proposed approach performs better than the existing COID algorithm.
To take a holistic view of the research trends in the area of data mining, a comprehensive survey is presented in this paper. This paper presents a systematic and comprehensive survey of various data mining tasks and techniques. Further, various real-life applications of data mining are presented in this paper.
Data mining plays an important role in various human activities because it extracts unknown, useful patterns (or knowledge). Due to its capabilities, data mining has become an essential task in a large number of application domains such as banking, retail, medical, insurance, bioinformatics, etc. To take a holistic view of the research trends in the area of data mining, a comprehensive survey is ...
Abstract - Data mining is used regularly in a variety of industries and is continuing to gain in both popularity and acceptance. However, applying data mining methods to complex real-world tasks is far from straightforward and many pitfalls face data mining practitioners. However, most research in the field tends to focus on the algorithmic ...
propose future directions for some data mining applications. We have added the scope of the data mining applications so that researchers can pinpoint the following areas. 2. The Data Mining Task. Data mining tasks are of different types; depending on the use of the data mining result, they are classified as [1,2]:
Data Mining. Jiawei Han, University of Illinois at Urbana-Champaign, Urbana, IL, USA. Synonyms: Data analysis; Knowledge discovery from data; Pattern discovery. Definition: Data mining is the process of discovering knowledge or patterns from massive amounts of data. As a young research field, data mining represents
Issues in Mining Imbalanced Data Sets - A Review Paper , S. Visa and A. Ralescu, in Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67-73, 2005.
Wrapper-based Computation and Evaluation of Sampling Methods for Imbalanced Datasets , N. Chawla, L. Hall, and A. Joshi, in Proceedings of the 1st ...
Statistical Analysis and Data Mining addresses the broad area of data analysis, including data mining algorithms, statistical approaches, and practical applications. Topics include problems involving massive and complex datasets, solutions utilizing innovative data mining algorithms and/or novel statistical approaches.
Information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. This work comprises the introduction, literature review, and proposed methodology parts. In this research paper, a new method of analyzing skin disease has been proposed in which six different data mining techniques are used to develop an ...
Abstract. Data mining is the process of extracting hidden and useful patterns and information from data. Data mining is a new technology that helps businesses predict future trends and behaviors, allowing them to make proactive, knowledge-driven decisions. The aim of this paper is to show the process of data mining and how it can help ...
Originally, "data mining" or "data dredging" was a derogatory term referring to attempts to extract information that was not supported by the data. Section 1.2 illustrates the sort of errors one can make by trying to extract what really isn't in the data. Today, "data mining" has taken on a positive meaning.
1.2 Data mining techniques. 1.2.1 A brief overview. Many data mining techniques have been developed over the years. Some of them are conceptually very simple, and some others are more complex and may lead to the formulation of a global optimization problem (see Section 1.4). In data mining, the goal is to split data into different categories, each ...
The power of data mining. ... Nehru University in New Delhi to extract text and images from 73 million research papers. ... sources that provide free-to-download versions of papers (such as PubMed ...
This paper investigates methods of mining patterns of a complex spatial data set (which generally describes any kind of data where the location in space of an object holds importance). ... From the data mining research point of view the constructive approach can be seen to help to manipulate and coordinate ...
Descriptive vs. predictive data mining • Multiple/integrated functions and mining at multiple levels • Techniques utilized • Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc. • Applications adapted • Retail, telecommunication, banking, fraud analysis, bio ...
Data Analytics in e-Learning: Approaches and Applications ... Of particular importance among general Educational Data Mining review papers are the works of [11,12,13]. ... This dataset overview is relevant for the Educational Data Mining research area, making it easier to pick a suitable dataset ...
"Learn to: understand key data mining concepts and best practices; create a data model and test its validity; interpret results and communicate your findings; make a business case for investing in data mining"--Cover