50 selected papers in Data Mining and Machine Learning

Here is a list of 50 selected papers in Data Mining and Machine Learning. You can download them for your detailed reading and research. Enjoy!

Data Mining and Statistics: What’s the Connection?

Data Mining: Statistics and More?, D. Hand, American Statistician, 52(2):112-118.

Data Mining, G. Weiss and B. Davison, in Handbook of Technology Management, John Wiley and Sons, expected 2010.

From Data Mining to Knowledge Discovery in Databases, U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, AI Magazine, 17(3):37-54, Fall 1996.

Mining Business Databases, Communications of the ACM, 39(11):42-48.

10 Challenging Problems in Data Mining Research, Q. Yang and X. Wu, International Journal of Information Technology & Decision Making, 5(4):597-604, 2006.

The Long Tail, C. Anderson, Wired magazine.

AOL’s Disturbing Glimpse Into Users’ Lives, D. McCullagh, News.com, August 9, 2006.

General Data Mining Methods and Algorithms

Top 10 Algorithms in Data Mining, X. Wu, V. Kumar, J.R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G.J. McLachlan, A. Ng, B. Liu, P.S. Yu, Z. Zhou, M. Steinbach, D.J. Hand, D. Steinberg, Knowledge and Information Systems, 14(1):1-37, 2008.

Induction of Decision Trees, J.R. Quinlan, Machine Learning, 1(1):81-106, 1986.

Web and Link Mining

The PageRank Citation Ranking: Bringing Order to the Web, L. Page, S. Brin, R. Motwani, T. Winograd, Technical Report, Stanford University, 1999.

The Structure and Function of Complex Networks, M.E.J. Newman, SIAM Review, 45:167-256, 2003.

Link Mining: A New Data Mining Challenge, L. Getoor, SIGKDD Explorations, 5(1):84-89, 2003.

Link Mining: A Survey, L. Getoor, SIGKDD Explorations, 7(2):3-12, 2005.

Semi-supervised Learning

Semi-Supervised Learning Literature Survey, X. Zhu, Computer Sciences TR 1530, University of Wisconsin-Madison.

Introduction to Semi-Supervised Learning, in Semi-Supervised Learning (Chapter 1), O. Chapelle, B. Schölkopf, A. Zien (eds.), MIT Press, 2006. (Fordham’s library has online access to the entire text.)

Learning with Labeled and Unlabeled Data, M. Seeger, University of Edinburgh (unpublished), 2002.

Person Identification in Webcam Images: An Application of Semi-Supervised Learning, M. Balcan, A. Blum, P. Choi, J. Lafferty, B. Pantano, M. Rwebangira, X. Zhu, in Proceedings of the Workshop on Learning with Partially Classified Training Data at the 22nd ICML, 2005.

Learning from Labeled and Unlabeled Data: An Empirical Study across Techniques and Domains, N. Chawla, G. Karakoulas, Journal of Artificial Intelligence Research, 23:331-366, 2005.

Text Classification from Labeled and Unlabeled Documents using EM, K. Nigam, A. McCallum, S. Thrun, T. Mitchell, Machine Learning, 39:103-134, 2000.

Self-taught Learning: Transfer Learning from Unlabeled Data, R. Raina, A. Battle, H. Lee, B. Packer, A. Ng, in Proceedings of the 24th International Conference on Machine Learning, 2007.

An iterative algorithm for extending learners to a semisupervised setting, M. Culp, G. Michailidis, 2007 Joint Statistical Meetings (JSM), 2007.

Partially-Supervised Learning / Learning with Uncertain Class Labels

Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers, V. Sheng, F. Provost, P. Ipeirotis, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

Logistic Regression for Partial Labels, in 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Volume III, pp. 1935-1941, 2002.

Classification with Partial Labels, N. Nguyen, R. Caruana, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

Imprecise and Uncertain Labelling: A Solution based on Mixture Model and Belief Functions, E. Come, 2008 (PowerPoint slides).

Induction of Decision Trees from Partially Classified Data Using Belief Functions, M. Bjanger, Norwegian University of Science and Technology, 2000.

Knowledge Discovery in Large Image Databases: Dealing with Uncertainties in Ground Truth, P. Smyth, M. Burl, U. Fayyad, P. Perona, KDD Workshop 1994, AAAI Technical Report WS-94-03, pp. 109-120, 1994.

Recommender Systems

Trust No One: Evaluating Trust-based Filtering for Recommenders, J. O’Donovan and B. Smyth, in Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), 2005, 1663-1665.

Trust in Recommender Systems, J. O’Donovan and B. Smyth, in Proceedings of the 10th International Conference on Intelligent User Interfaces (IUI-05), 2005, 167-174.

Learning from Imbalanced Data

General resources available on this topic:

ICML 2003 Workshop: Learning from Imbalanced Data Sets II

AAAI 2000 Workshop on Learning from Imbalanced Data Sets

A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data, G. Batista, R. Prati, and M. Monard, SIGKDD Explorations, 6(1):20-29, 2004.

Class Imbalance versus Small Disjuncts, T. Jo and N. Japkowicz, SIGKDD Explorations, 6(1):40-49, 2004.

Extreme Re-balancing for SVMs: a Case Study, B. Raskutti and A. Kowalczyk, SIGKDD Explorations, 6(1):60-69, 2004.

A Multiple Resampling Method for Learning from Imbalanced Data Sets, A. Estabrooks, T. Jo, and N. Japkowicz, Computational Intelligence, 20(1), 2004.

SMOTE: Synthetic Minority Over-sampling Technique, N. Chawla, K. Bowyer, L. Hall, and W. Kegelmeyer, Journal of Artificial Intelligence Research, 16:321-357.

Generative Oversampling for Mining Imbalanced Datasets, A. Liu, J. Ghosh, and C. Martin, Third International Conference on Data Mining (DMIN-07), 66-72.

Learning from Little: Comparison of Classifiers Given Little Training, G. Forman and I. Cohen, in 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, 161-172, 2004.

Issues in Mining Imbalanced Data Sets – A Review Paper, S. Visa and A. Ralescu, in Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67-73, 2005.

Wrapper-based Computation and Evaluation of Sampling Methods for Imbalanced Datasets, N. Chawla, L. Hall, and A. Joshi, in Proceedings of the 1st International Workshop on Utility-based Data Mining, 24-33, 2005.

C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling, C. Drummond and R. Holte, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

C4.5 and Imbalanced Data sets: Investigating the effect of sampling method, probabilistic estimate, and decision tree structure, N. Chawla, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Class Imbalances: Are we Focusing on the Right Issue?, N. Japkowicz, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Learning when Data Sets are Imbalanced and When Costs are Unequal and Unknown, M. Maloof, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Uncertainty Sampling Methods for One-class Classifiers, P. Juszczak and R. Duin, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Active Learning

Improving Generalization with Active Learning, D. Cohn, L. Atlas, and R. Ladner, Machine Learning, 15(2):201-221, May 1994.

On Active Learning for Data Acquisition, Z. Zheng and B. Padmanabhan, in Proceedings of the IEEE International Conference on Data Mining, 2002.

Active Sampling for Class Probability Estimation and Ranking, M. Saar-Tsechansky and F. Provost, Machine Learning, 54(2):153-178, 2004.

The Learning-Curve Sampling Method Applied to Model-Based Clustering, C. Meek, B. Thiesson, and D. Heckerman, Journal of Machine Learning Research, 2:397-418, 2002.

Active Sampling for Feature Selection, S. Veeramachaneni and P. Avesani, Third IEEE Conference on Data Mining, 2003.

Heterogeneous Uncertainty Sampling for Supervised Learning, D. Lewis and J. Catlett, in Proceedings of the 11th International Conference on Machine Learning, 148-156, 1994.

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction, G. Weiss and F. Provost, Journal of Artificial Intelligence Research, 19:315-354, 2003.

Active Learning using Adaptive Resampling, KDD 2000, 91-98.

Cost-Sensitive Learning

Types of Cost in Inductive Concept Learning, P. Turney, in Proceedings of the Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning.

Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection, P. Chan and S. Stolfo, KDD 1998.

Data Mining Techniques: Recently Published Documents

Prediction of Skin Diseases Using Machine Learning

Skin disease rates have been increasing over the past few decades, causing both fatal and non-fatal disabilities around the world, especially in areas where medical resources are inadequate. Early diagnosis of skin diseases significantly increases the chances of a cure. This work therefore compares six machine learning algorithms (KNN, random forest, neural network, naïve Bayes, logistic regression, and SVM) for predicting skin diseases. Information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. The paper comprises an introduction, a literature review, and the proposed methodology. It proposes a new method of analyzing skin disease in which the six data mining techniques are integrated into a single ensemble method. On the dermatology dataset, the ensemble achieves 94% accuracy, an improvement over the other classifier algorithms, and is therefore more effective in this area.
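To make the ranking criteria concrete, the sketch below computes two of them, information gain and gain ratio, for categorical features. It is a generic illustration, not the authors' code; the feature names, values, and labels are hypothetical toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the split information (entropy of the feature itself)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def rank_features(columns, labels, score=information_gain):
    """Return (name, score) pairs sorted from most to least informative."""
    scored = [(name, score(values, labels)) for name, values in columns.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: two categorical symptoms and a diagnosis label.
    columns = {
        "itching": ["yes", "yes", "no", "no", "yes", "no"],
        "scaling": ["yes", "no", "yes", "no", "yes", "no"],
    }
    labels = ["psoriasis", "eczema", "psoriasis", "healthy", "psoriasis", "healthy"]
    for name, value in rank_features(columns, labels):
        print(f"{name}: {value:.3f}")
```

On this toy data "scaling" ranks above "itching" because every "yes" row is psoriasis, so splitting on it removes more label uncertainty. Passing `score=gain_ratio` penalises features with many distinct values.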

A Survey on Building Recommendation Systems Using Data Mining Techniques

Classification is a data mining technique used to predict the group (class) membership of items on the basis of their common features. It is useful for future planning and for discovering new knowledge about a specific dataset. An in-depth study of previous literature applying data mining techniques to the design of recommender systems was performed. This chapter provides a broad study of how recommender systems are designed using various machine learning classification techniques, and examines their methodological decisions in four aspects: recommendation approaches, data mining techniques, recommendation types, and performance measures. The study focuses on selected classification methods and can help both researchers and students in computer science and machine learning strengthen their knowledge of machine learning hypotheses and data mining.

A Classification and Clustering Approach Using Data Mining Techniques in Analysing Gastrointestinal Tract

Diagnosis and Detection of Plant Diseases Using Data Mining Techniques

Location-Based Crime Prediction Using Multiclass Classification Data Mining Techniques

An Effective Approach to Test Suite Reduction and Fault Detection Using Data Mining Techniques

Software testing is used to find bugs so that end users receive a quality product. Test suites are used to detect failures, but they may contain redundant test cases and take a long time to execute. In this article, a large number of test cases are created using combinatorial test design algorithms. Attribute reduction is an important preprocessing task in data mining: weak and irrelevant attributes are removed to reduce complexity. After preprocessing, it is not necessary to test the software with every combination of test cases; because the test cases are numerous and redundant, the healthier (more useful) test cases are identified using a data mining algorithm. The resulting final test suite still identifies the defects in the software, provides better coverage analysis, and reduces execution time.
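The abstract does not name the algorithm that selects the healthier test cases. A common baseline for the same goal is greedy set-cover reduction: repeatedly keep the test that covers the most still-uncovered requirements. The sketch below uses that swapped-in technique, not the paper's method; the test names and requirement labels are hypothetical.

```python
def reduce_suite(coverage):
    """Greedy set-cover reduction of a test suite.

    coverage maps each test-case name to the set of requirements
    (for example parameter-value pairs or code branches) it exercises.
    Returns a list of tests that together cover every requirement
    covered by the full suite.
    """
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        # Pick the test covering the most still-uncovered requirements;
        # iterating in sorted order makes ties resolve deterministically.
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected


if __name__ == "__main__":
    # Hypothetical coverage data for five generated test cases.
    coverage = {
        "t1": {"r1", "r2", "r3"},
        "t2": {"r1", "r2"},
        "t3": {"r3", "r4"},
        "t4": {"r4", "r5"},
        "t5": {"r5"},
    }
    print(reduce_suite(coverage))
```

Here two of the five tests already cover all five requirements. Greedy selection is not guaranteed to find the smallest possible suite, but it is simple and usually close in practice.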

Applying data mining techniques to classify patients with suspected hepatitis C virus infection

Dengue Fever Prediction Modelling Using Data Mining Techniques

Fake News Detection Using Data Mining Techniques

The internet is now widely used as a source of information, but that information may be real or fake, and fake news has existed on the web for years. The main challenge is to determine whether a news item is truthful. Fake news is written and published to mislead people, and it can damage an agency, an organization, or a person. This paper aims to detect fake news using semantic search.

A Leading Indicator Approach with Data Mining Techniques in Analysing Bitcoin Market Value

17 July 2019 (correction 19 July 2019)

The plan to mine the world’s research papers

Priyanka Pulla

Priyanka Pulla is a freelance journalist based in Bengaluru, India.

Carl Malamud in front of the data store of 73 million articles that he plans to let scientists text mine. Credit: Smita Sharma for Nature

Carl Malamud is on a crusade to liberate information locked up behind paywalls — and his campaigns have scored many victories. He has spent decades publishing copyrighted legal documents, from building codes to court records, and then arguing that such texts represent public-domain law that ought to be available to any citizen online. Sometimes, he has won those arguments in court. Now, the 60-year-old American technologist is turning his sights on a new objective: freeing paywalled scientific literature. And he thinks he has a legal way to do it.

Nature 571 , 316-318 (2019)

doi: https://doi.org/10.1038/d41586-019-02142-1

Updates & Corrections

Correction 19 July 2019 : An earlier version of this feature used the term ‘fair use’ inappropriately — the term isn’t relevant under Indian law.

Applying Data Mining Research Methodologies on Information Systems

Grace L Samson

In this paper we considered several frameworks for data mining. These frameworks are based on different approaches, including the inductive databases approach, reductionist statistical approaches, the data compression approach, the constructive induction approach, and others. We considered the advantages and limitations of these frameworks. We presented a view of data mining research as a continuous, never-ending development process of an adaptive data mining (DM) system, aimed at using the available DM techniques efficiently to solve the current problem in a dynamically changing environment. We discussed one of the traditional information systems frameworks and, by analogy with it, considered a data mining system as a special kind of adaptive information system. We adapted the information systems development framework to the context of data mining systems development. Key words: Data Mining, Information Systems, Knowledge Discovery Databases

Related Papers

Information Systems Development

Seppo Puuronen

Abstract: Data mining applications are typically used in the decision-making process. The knowledge discovery process (KDD process for short) is a typical iterative process, in which not only can the raw data be mined several times, but the mined patterns may also constitute the starting point for further mining.

Data Mining and Knowledge Discovery

Jean-François Boulicaut

Discovery Science

Panče Panov

Lecture Notes in Computer Science

Christophe Rigotti , Toon Calders

Data Mining Workshops, …

Motivated by the need to unify the field of data mining and by the growing demand for formalized representation of research outcomes, we address the task of constructing an ontology of data mining. The proposed ontology, named OntoDM, is based on a recent proposal of a general framework for data mining, and includes definitions of basic data mining entities such as datatype and dataset, data mining task, data mining algorithm and its components (e.g., distance function), etc. It also allows the definition of more complex entities, e.g., constraints in constraint-based data mining, sets of such constraints (inductive queries), and data mining scenarios (sequences of inductive queries). Unlike most existing approaches to constructing ontologies of data mining, OntoDM is a deep/heavy-weight ontology and follows best practices in ontology engineering, such as not allowing multiple inheritance of classes, using a predefined set of relations, and using a top-level ontology.

Grace L Samson , Aminat Showole

Spatial data mining is the quantitative study of phenomena that are located in space. This paper investigates methods of mining patterns in a complex spatial data set (a term that generally describes any kind of data in which the location of objects in space matters). We based this research on an analysis of the spatial characteristics of certain objects. We began by describing the spatial pattern of events or objects with respect to their attributes, that is, how to describe the spatial characteristics of entities in an environment in terms of their spatial and non-spatial attributes. We also looked at modelling (predictive modelling and knowledge management of complex spatial systems), querying, and implementing a complex spatial database using data structures and algorithms. The presence of spatial autocorrelation, and the fact that continuous data types are always present in spatial data, make it important to create methods, tools, and algorithms for mining spatial patterns in complex spatial data sets. This work is particularly useful to researchers in the field of data mining, as it contributes to different application areas of data mining, especially spatial data mining. It can also be useful for teaching and other study purposes.
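Spatial autocorrelation is what separates spatial data from ordinary tabular data. One standard way to quantify it is global Moran's I; the abstract does not say which measure the authors use, so the sketch below is only an illustration. The weight matrix is an assumption: `weights[i][j]` is 1 for neighbouring locations and 0 otherwise.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights. weights[i][i] should be 0, and the
    values must not all be equal (the denominator would be zero).
    Positive I: similar values cluster together. Negative I: neighbouring
    values tend to differ. Under spatial randomness E[I] = -1 / (n - 1).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_sum)


if __name__ == "__main__":
    # Hypothetical example: four locations along a line, 0-1-2-3.
    line = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(morans_i([1, 1, 0, 0], line))  # clustered values: positive I
    print(morans_i([1, 0, 1, 0], line))  # alternating values: negative I
```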

Abstract: In recent years the data mining research community has shown growing interest in constraint-based mining, because constraints increase the relevance of the result set and reduce both its volume and the amount of computational work. However, constraint-based mining will be completely feasible only when efficient optimizers for mining languages are conceived and made available. This paper is a first step towards the construction of optimizers for a constraint-based mining language.
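Why constraints reduce computational work can be seen in a levelwise (Apriori-style) frequent-itemset search. An anti-monotone constraint such as "total item price at most a budget" can be checked while candidates are generated, so a violating itemset is pruned together with all of its supersets. This is a generic sketch, not the optimizer the paper proposes; the transactions, prices, and budget are hypothetical.

```python
from itertools import combinations


def constrained_frequent_itemsets(transactions, min_support, price, budget):
    """Levelwise frequent-itemset mining with an anti-monotone constraint.

    Returns {itemset: support} for itemsets contained in at least
    min_support transactions whose total price is <= budget. Both
    conditions are anti-monotone (if an itemset fails, every superset
    fails too), so failing candidates are pruned before the next level.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def satisfies(itemset):
        return sum(price[i] for i in itemset) <= budget

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if satisfies(frozenset([i]))]
    result = {}
    k = 1
    while level:
        frequent = {c: s for c in level if (s := support(c)) >= min_support}
        result.update(frequent)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: all (k-1)-subsets must be frequent and the constraint must hold.
        level = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and satisfies(c)
        ]
    return result


if __name__ == "__main__":
    tx = [
        {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
        {"milk", "butter"}, {"bread", "milk", "caviar"}, {"bread", "milk", "butter"},
    ]
    price = {"bread": 2, "milk": 1, "butter": 3, "caviar": 50}
    found = constrained_frequent_itemsets(tx, min_support=2, price=price, budget=5)
    for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), s)
```

With a budget of 5, {bread, milk, butter} is never counted even though it occurs in two transactions, because its price (6) violates the constraint. The walrus operator requires Python 3.8 or later.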



Data Analytics in e-Learning: Approaches and Applications, pp. 21–42

Public Datasets and Data Sources for Educational Data Mining

P. S. Popescu (ORCID: orcid.org/0000-0003-4504-6144), M. C. Mihăescu, and M. L. Mocanu

First Online: 23 March 2022

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 220)

Datasets are the starting point for any machine learning or data mining workflow, and they strongly affect the performance of the whole system. Data sources offer a variety of data that can be used directly or may need more or less preprocessing to produce a suitable dataset, but the main problem is knowing what data is available and what data should be used to solve a specific task. The datasets explored in this chapter focus on the educational area and can directly impact any educational environment, whether online or classical full-time education. This chapter aims to clarify which public datasets are currently available and which tasks they suit, and presents a use case for a specific dataset in detail. The dataset examined in depth in this chapter was recently produced and includes attributes that can easily be mined from any e-Learning platform, so it can be a good baseline for anyone starting out in Educational Data Mining or running sample experiments to gain insight before implementing a workflow in their own system.

  • Machine learning
  • Educational data mining
  • Decision trees

This is a preview of subscription content, log in via an institution .

Buying options

  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
  • Available as EPUB and PDF
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
  • Durable hardcover edition

Tax calculation will be finalised at checkout

Purchases are for personal use only

B.D.E, https://www.upenn.edu/learninganalytics/ryanbaker/bigdataeducation.html .

IEEEXplore, https://ieeexplore.ieee.org/Xplore/home.jsp .

Elsevier Science Direct, https://www.sciencedirect.com .

Research Gate, https://www.researchgate.net .

Data Collection Project, https://bluej.org/blackbox/ .

UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/index.php .

University Data Set, https://archive.ics.uci.edu/ml/datasets/University .

Data Set, https://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation .

Student Performance Data Set, http://archive.ics.uci.edu/ml/datasets/Student+Performance .

Student Alcohol Consumption, https://www.kaggle.com/uciml/student-alcohol-consumption .

User Knowledge, http://archive.ics.uci.edu/ml/datasets/User+Knowledge+Modeling .

Educational Process Mining (EPM), https://tinyurl.com/y27yduo3 .

Open University Learning Analytics dataset, https://tinyurl.com/y6ysfant .

S.A.P., https://archive.ics.uci.edu/ml/datasets/Student+Academics+Performance .

Mendeley Data Repository, https://data.mendeley.com .

MOOC lectures dataset, https://data.mendeley.com/datasets/xknjp8pxbj/1 .

Dataset, https://data.mendeley.com/datasets/68mt8gms4j/2 .

Dataset, https://data.mendeley.com/datasets/6jmv43nffk/2 .

Dataset, https://data.mendeley.com/datasets/83tcx8psxv/1 .

KEEL, http://www.keel.es/

Harvard Dataverse, https://dataverse.harvard.edu .

HarvardX Person-Course Academic Year 2013, https://doi.org/10.7910/DVN/26147 .

Canvas Network Person-Course, https://doi.org/10.7910/DVN/1XORAL .

(MOOC-Ed), https://doi.org/10.7910/DVN/ZZH3UB .

CAMEO Dataset, https://doi.org/10.7910/DVN/3UKVOR .

Nursing Student Data, https://doi.org/10.7910/DVN/MQ8EP0 .

Dataset, https://doi.org/10.7910/DVN/M07HQ7 .

Early Reading and Writing Assessment in Preschool, https://doi.org/10.7910/DVN/V7E9XD .

DataShop@CMU, https://pslcdatashop.web.cmu.edu/index.jsp?datasets=public .

KDD Cup 2010, https://pslcdatashop.web.cmu.edu/KDDCup .

What Do You Know? https://www.kaggle.com/c/WhatDoYouKnow .

Automated Essay Scoring, https://www.kaggle.com/c/asap-aes .

Short Answer Scoring, https://www.kaggle.com/c/asap-sas .

Students' Academic Performance Dataset, https://www.kaggle.com/aljarah/xAPI-Edu-Data .

NAEP 2017, https://sites.google.com/view/assistmentsdatamining .

NAEP 2019: https://sites.google.com/view/dataminingcompetition2019 .

Students Performance https://www.kaggle.com/spscientist/students-performance-in-exams .

CSEDM 2019 Data Challenge, https://sites.google.com/asu.edu/csedm-ws-lak-2019 .

KC Modeling for Programming, https://pslcdatashop.web.cmu.edu/Project?id=294 .

CSEDM 2020, https://sites.google.com/ncsu.edu/csedm-ws-edm-2020/data-challenge .

EdNet dataset, https://github.com/riiid/ednet .

Riiid AIEd Challenge 2020, https://www.kaggle.com/c/riiid-test-answer-prediction .

Multimodal learning Math Data Corpus, http://mla.ucsd.edu/data .

Learn Moodle August 2016, https://research.moodle.org/158 .

Lix Puzzle-game, https://sites.google.com/site/learninganalyticsforall/data-sets/lix-dataset .

Student Life Dataset, http://studentlife.cs.dartmouth.edu .

Dataset for empirical evaluation of entry requirements, https://tinyurl.com/yxqf2v42 .

MUTLA, https://tinyurl.com/SAILdata .

https://www.kaggle.com/cristianmihaescu/dsa-test-dataset/kernels .

Popescu, P.S., Mihaescu, M.C., Teodorescu, O.M., Mocanu, M.: Student testing activity dataset from data structures course. In: RoCHI-International Conference on Human-Computer Interaction, p. 157 (2020)

Mihaescu, M.C., Popescu, P.S.: Review on publicly available datasets for educational data mining. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 11 (3), e1403 (2021)

Romero, C., Ventura, S.: Educational data mining: a survey from 1995 to 2005. Expert Syst. Appl. 33 (1), 135–146 (2007)

Baker, R.S., Yacef, K.: The state of educational data mining in 2009: a review and future visions. J. Educ. Data Min. 1 (1), 3–17 (2009)

Romero, C., Ventura, S.: Educational data mining: a review of the state of the art. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 40 (6), 601–618 (2010)

Koedinger, K.R., Baker, R.S., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J.: A data repository for the EDM community: the PSLC DataShop. Handb. Educ. Data Min. 43, 43–56 (2010)

Merceron, A.: Educational data mining/learning analytics: methods, tasks and current trends. In: DeLFI Workshops, pp. 101–109 (2015)

Dutt, A., Ismail, M.A., Herawan, T.: A systematic review on educational data mining. IEEE Access 5, 15991–16005 (2017)

Silva, C., Fonseca, J.: Educational Data Mining: a literature review. In: Europe and MENA Cooperation Advances in Information and Communication Technologies, pp. 87–94 (2017)

Rodrigues, M.W., Isotani, S., Zarate, L.E.: Educational data mining: a review of evaluation process in the e-learning. Telemat. Inform. 35 (6), 1701–1717 (2018)

Romero, C., Ventura, S.: Data mining in education. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 3 (1), 12–27 (2013)

Romero, C., Ventura, S.: Educational data science in massive open online courses. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 7 (1), e1187 (2017)

Romero, C., Ventura, S.: Educational data mining and learning analytics: an updated survey. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 10 (3), e1355 (2020)

Slater, S., Joksimović, S., Kovanovic, V., Baker, R.S., Gasevic, D.: Tools for educational data mining: a review. J. Educ. Behav. Stat. 42 (1), 85–106 (2017)

Ihantola, P., Vihavainen, A., Ahadi, A., Butler, M., Börstler, J., Edwards, S.H., Toll, D.: Educational data mining and learning analytics in programming: Literature review and case studies. In: Proceedings of the 2015 ITiCSE on Working Group Reports, pp. 41–63 (2015)

Alon, U., Zilberstein, M., Levy, O., Yahav, E.: code2vec: learning distributed representations of code. Proc. ACM Programm. Lang. 3 (POPL), 1–29 (2019)

Vieira, C., Parsons, P., Byrd, V.: Visual learning analytics of educational data: a systematic literature review and research agenda. Comput. Educ. 122, 119–135 (2018)

Ferreira‐Mello, R., André, M., Pinheiro, A., Costa, E., Romero, C.: Text mining in education. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 9 (6), e1332 (2019)

Bakhshinategh, B., Zaiane, O.R., ElAtia, S., Ipperciel, D.: Educational data mining applications and tasks: a survey of the last 10 years. Educ. Inf. Technol. 23 (1), 537–553 (2018)

Loh, W.Y., Shih, Y.S.: Split selection methods for classification trees. Stat. Sin. 815–840 (1997)

Cortez, P., Silva, A.M.G.: Using data mining to predict secondary school student performance (2008)

Kahraman, H.T., Sagiroglu, S., Colak, I.: The development of intuitive knowledge classifier and the modeling of domain dependent data. Knowl.-Based Syst. 37, 283–295 (2013)

Vahdat, M., Oneto, L., Anguita, D., Funk, M., Rauterberg, M.: A learning analytics approach to correlate the academic achievements of students with interaction data from an educational simulator. In: European Conference on Technology Enhanced Learning, pp. 352–366. Springer, Cham (2015)

Kuzilek, J., Hlosta, M., Zdrahal, Z.: Open university learning analytics dataset. Sci. Data 4 (1), 1–8 (2017)

Hussain, S., Dahan, N.A., Ba-Alwib, F.M., Ribata, N.: Educational data mining and analysis of students’ academic performance using WEKA. Indones. J. Electr. Eng. Comput. Sci. 9 (2), 447–459 (2018)

Bhoi, N.K.: Mendeley data repository as a platform for research data management. Marching Libr.: Manag. Ski. Technol. Competencies, 481–487 (2018)

Kastrati, Z., Kurti, A., Imran, A.S.: WET: Word embedding-topic distribution vectors for MOOC video lectures dataset. Data Brief 28, 105090 (2020)

Gómez-Tejedor, J.A., Vidaurre, A., Tort-Ausina, I., Mateo, J.M., Serrano, M.A., Meseguer-Dueñas, J.M., et al.: Data set on the effectiveness of flip teaching on engineering students’ performance in the physics lab compared to traditional methodology. Data Brief 28, 104915 (2020)

Prasojo, L.D., Habibi, A., Yaakob, M.F.M., Pratama, R., Yusof, M.R., Mukminin, A., Hanum, F.: Teachers’ burnout: A SEM analysis in an Asian context. Heliyon 6 (1), e03144 (2020)

Delahoz-Dominguez, E., Zuluaga, R., Fontalvo-Herrera, T.: Dataset of academic performance evolution for engineering students. Data Brief 30, 105537 (2020)

Hou, Y., Li, L., Li, B., Liu, J.: An anti-noise ensemble algorithm for imbalance classification. Intell. Data Anal. 23 (6), 1205–1217 (2019)

Ho, A., Reich, J., Nesterko, S., Seaton, D., Mullaney, T., Waldo, J., Chuang, I.: HarvardX and MITx: the first year of open online courses, fall 2012-summer 2013 (HarvardX and MITx Working Paper No. 1) (2014)

Kellogg, S., Edelmann, A.: Massively open online course for educators (MOOC-Ed) network dataset. Br. J. Educ. Technol. 46 (5), 977–983 (2015)

Northcutt, C.G., Ho, A.D., Chuang, I.L.: Detecting and preventing “multiple-account” cheating in massive open online courses. Comput. Educ. 100, 71–80 (2016)

Stamper, J., Niculescu-Mizil, A., Ritter, S., Gordon, G.J., Koedinger, K.R.: Bridge to algebra 2006–2007. Development data set from KDD cup 2010 educational data mining challenge (2010)

Amrieh, E.A., Hamtini, T., Aljarah, I.: Mining educational data to predict student’s academic performance using ensemble methods. Int. J. Database Theory Appl. 9 (8), 119–136 (2016)

Choi, Y., Lee, Y., Shin, D., Cho, J., Park, S., Lee, S., et al.: Ednet: a large-scale hierarchical dataset in education. In: International Conference on Artificial Intelligence in Education, pp. 69–73. Springer, Cham (2020)

Oviatt, S., Cohen, A., Weibel, N.: Multimodal learning analytics: description of math data corpus for ICMI grand challenge workshop. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 563–568 (2013)

Vahdat, M., Carvalho, M.B., Funk, M., Rauterberg, M., Hu, J., Anguita, D.: Learning analytics for a puzzle game to discover the puzzle-solving tactics of players. In: European Conference on Technology Enhanced Learning, pp. 673–677. Springer, Cham (2016)

Wang, R., Chen, F., Chen, Z., Li, T., Harari, G., Tignor, S., et al.: StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 3–14 (2014)

Wang, R., Chen, F., Chen, Z., Li, T., Harari, G., Tignor, S., et al.: StudentLife: Using smartphones to assess mental health and academic performance of college students. In: Mobile Health, pp. 7–33. Springer, Cham (2017)

Odukoya, J.A., Popoola, S.I., Atayero, A.A., Omole, D.O., Badejo, J.A., John, T.M., Olowo, O.O.: Learning analytics: dataset for empirical evaluation of entry requirements into engineering undergraduate programs in a Nigerian university. Data Brief 17, 998–1014 (2018)

Xu, F., Wu, L., Thai, K.P., Hsu, C., Wang, W., Tong, R.: MUTLA: a large-scale dataset for multimodal teaching and learning analytics. arXiv:1910.06078 (2019)

Teodorescu, O.M., Popescu, P.S., Mihaescu, M.C.: Taking e-assessment quizzes - a case study with an SVD based recommender system. In: International Conference on Intelligent Data Engineering and Automated Learning, pp. 829–837. Springer, Cham (2018)

Author information

Authors and affiliations.

Department of Computer Science and Information Technology, University of Craiova, Str. A.I. Cuza, Nr. 13, Craiova, Romania

P. S. Popescu, M. C. Mihăescu & M. L. Mocanu

Corresponding author

Correspondence to P. S. Popescu.

Editor information

Editors and affiliations.

University of Craiova, Craiova, Romania

Marian Cristian Mihăescu

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter.

Popescu, P.S., Mihăescu, M.C., Mocanu, M.L. (2022). Public Datasets and Data Sources for Educational Data Mining. In: Mihăescu, M.C. (eds) Data Analytics in e-Learning: Approaches and Applications. Intelligent Systems Reference Library, vol 220. Springer, Cham. https://doi.org/10.1007/978-3-030-96644-7_2

DOI: https://doi.org/10.1007/978-3-030-96644-7_2

Published: 23 March 2022

Publisher Name: Springer, Cham

Print ISBN: 978-3-030-96643-0

Online ISBN: 978-3-030-96644-7

eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)

Data mining for dummies

Uploaded by station65.cebu on November 4, 2022

SIMILAR ITEMS (based on metadata)


  1. PDF Download Introduction to Data Mining Full Free Collection

  2. (PDF) A Study On Applications Of Data Mining

  3. (PDF) Trends in data mining research: A two-decade review using topic

  4. (PDF) Classification algorithm in Data mining: An Overview

  5. (PDF) A Review: Data Mining Techniques and Its Applications

  6. (PDF) Data mining in bioinformatics: Selected papers from BIOKDD


  1. Lecture 15: Data Mining CSE 2020 Fall

  2. 01 Lecture 6 Part 01

  3. Challenges and Opportunities for Educational Data Mining ! Research Paper review

  4. Unlock a Universe of PDFs with PDF Books & Downloads 📚

  5. Data Mining || NPTEL week 8 assignment answers 2024 #nptel #datamining #skumaredu #2024

  6. Data Mining Introduction


  1. (PDF) Data mining techniques and applications

    Data Mining Algorithms and Techniques. Various algorithms and techniques like Classification, Clustering, Regression, Artificial Intelligence, Neural Networks, Association Rules, Decision Trees ...

  2. data mining Latest Research Papers

    The average accuracy is 74.05% for the existing COID algorithm and 77.21% for the proposed algorithm. The average recall is 81.19% and 89.51% for the existing and proposed algorithms, respectively, which shows that the proposed work is more efficient than the existing COID algorithm.

  3. PDF A comprehensive survey of data mining

    To take a holistic view of the research trends in the area of data mining, a comprehensive survey is presented in this paper. This paper presents a systematic and comprehensive survey of various data mining tasks and techniques. Further, various real-life applications of data mining are presented in this paper.

  4. A comprehensive survey of data mining

    Data mining plays an important role in various human activities because it extracts the unknown useful patterns (or knowledge). Due to its capabilities, data mining has become an essential task in a large number of application domains such as banking, retail, medical, insurance, bioinformatics, etc. To take a holistic view of the research trends in the area of data mining, a comprehensive survey is ...

  5. PDF Data Mining in the Real World: Experiences, Challenges, and ...

    Abstract - Data mining is used regularly in a variety of industries and is continuing to gain in both popularity and acceptance. However, applying data mining methods to complex real-world tasks is far from straightforward and many pitfalls face data mining practitioners. However, most research in the field tends to focus on the algorithmic ...

  6. The Survey of Data Mining Applications And Feature Scope

    propose future directions for some data mining applications. We have added the scope of the data mining applications so that the researcher can pinpoint the following areas. 2. The Data Mining Task: The data mining tasks are of different types; depending on the use of the data mining result, the data mining tasks are classified as [1,2]:

  7. PDF Data Mining

    Data Mining Jiawei Han University of Illinois at Urbana-Champaign, Urbana, IL, USA Synonyms Data analysis; Knowledge discovery from data; Pattern discovery Definition Data mining is the process of discovering knowledge or patterns from massive amounts of data. As a young research field, data mining represents

  8. 50 selected papers in Data Mining and Machine Learning

    Issues in Mining Imbalanced Data Sets - A Review Paper, S. Visa and A. Ralescu, in Proceedings of the Sixteen Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67-73, 2005. Wrapper-based Computation and Evaluation of Sampling Methods for Imbalanced Datasets , N. Chawla, L. Hall, and A. Joshi, in Proceedings of the 1st ...

  9. Statistical Analysis and Data Mining: The ASA Data Science Journal

    Statistical Analysis and Data Mining addresses the broad area of data analysis, including data mining algorithms, statistical approaches, and practical applications. Topics include problems involving massive and complex datasets, solutions utilizing innovative data mining algorithms and/or novel statistical approaches.

  10. data mining techniques Latest Research Papers

    The information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. This work comprises the introduction, literature review, and proposed methodology parts. In this research paper, a new method of analyzing skin disease has been proposed in which six different data mining techniques are used to develop an ...

  11. Review Paper on Data Mining Techniques and Applications

    Abstract. Data mining is the process of extracting hidden and useful patterns and information from data. Data mining is a new technology that helps businesses to predict future trends and behaviors, allowing them to make proactive, knowledge driven decisions. The aim of this paper is to show the process of data mining and how it can help ...

  12. PDF Data Mining

    Originally, "data mining" or "data dredging" was a derogatory term referring to attempts to extract information that was not supported by the data. Section 1.2 illustrates the sort of errors one can make by trying to extract what really isn't in the data. Today, "data mining" has taken on a positive meaning.

  13. PDF Chapter 1 Introduction to Data Mining

    1.2 Data mining techniques 1.2.1 A brief overview Many data mining techniques have been developed over the years. Some of them are conceptually very simple, and some others are more complex and may lead to the formulation of a global optimization problem (see Section 1.4). In data mining, the goal is to split data in different categories, each ...

  14. The plan to mine the world's research papers

    The power of data mining. ... Nehru University in New Delhi to extract text and images from 73 million research papers. ... sources that provide free-to-download versions of papers (such as PubMed ...

  15. (PDF) Applying Data Mining Research Methodologies on Information

    This paper investigates methods of mining patterns of a complex spatial data set (which generally describes any kind of data where the location in space of objects holds importance). ... From the data mining research point of view the constructive approach can be seen to help to manipulate and coordinate ...


    Descriptive vs. predictive data mining • Multiple/integrated functions and mining at multiple levels • Techniques utilized • Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc. • Applications adapted • Retail, telecommunication, banking, fraud analysis, bio ...

  17. PDF www.scitepress.org


  18. Public Datasets and Data Sources for Educational Data Mining

    Data Analytics in e-Learning: Approaches and Applications ... Of particular importance in the area of general Educational Data Mining, review papers are the works of [11,12,13]. ... This datasets overview is relevant for the Educational Data Mining research area, making it easier to pick a suitable dataset ...

  19. Data mining for dummies : Seltzer, Meta Brown : Free Download, Borrow

    "Learn to: understand key data mining concepts and best practices; create a data model and test its validity; interpret results and communicate your findings; make a business case for investing in data mining"--Cover ...