Chapman University Digital Commons

Computational and Data Sciences (PhD) Dissertations

Below is a selection of dissertations from the Doctor of Philosophy in Computational and Data Sciences program in Schmid College that have been included in Chapman University Digital Commons. Additional dissertations from years prior to 2019 are available through the Leatherby Libraries' print collection or in Proquest's Dissertations and Theses database.

Dissertations from 2023

Computational Analysis of Antibody Binding Mechanisms to the Omicron RBD of SARS-CoV-2 Spike Protein: Identification of Epitopes and Hotspots for Developing Effective Therapeutic Strategies , Mohammed Alshahrani

Integration of Computer Algebra Systems and Machine Learning in the Authoring of the SANYMS Intelligent Tutoring System , Sam Ford

Voluntary Action and Conscious Intention , Jake Gavenas

Random Variable Spaces: Mathematical Properties and an Extension to Programming Computable Functions , Mohammed Kurd-Misto

Computational Modeling of Superconductivity from the Set of Time-Dependent Ginzburg-Landau Equations for Advancements in Theory and Applications , Iris Mowgood

Application of Machine Learning Algorithms for Elucidation of Biological Networks from Time Series Gene Expression Data , Krupa Nagori

Stochastic Processes and Multi-Resolution Analysis: A Trigonometric Moment Problem Approach and an Analysis of the Expenditure Trends for Diabetic Patients , Isaac Nwi-Mozu

Applications of Causal Inference Methods for the Estimation of Effects of Bone Marrow Transplant and Prescription Drugs on Survival of Aplastic Anemia Patients , Yesha M. Patel

Causal Inference and Machine Learning Methods in Parkinson's Disease Data Analysis , Albert Pierce

Causal Inference Methods for Estimation of Survival and General Health Status Measures of Alzheimer’s Disease Patients , Ehsan Yaghmaei

Dissertations from 2022

Computational Approaches to Facilitate Automated Interchange between Music and Art , Rao Hamza Ali

Causal Inference in Psychology and Neuroscience: From Association to Causation , Dehua Liang

Advances in NLP Algorithms on Unstructured Medical Notes Data and Approaches to Handling Class Imbalance Issues , Hanna Lu

Novel Techniques for Quantifying Secondhand Smoke Diffusion into Children's Bedroom , Sunil Ramchandani

Probing the Boundaries of Human Agency , Sook Mun Wong

Dissertations from 2021

Predicting Eye Movement and Fixation Patterns on Scenic Images Using Machine Learning for Children with Autism Spectrum Disorder , Raymond Anden

Forecasting the Prices of Cryptocurrencies using a Novel Parameter Optimization of VARIMA Models , Alexander Barrett

Applications of Machine Learning to Facilitate Software Engineering and Scientific Computing , Natalie Best

Exploring Behaviors of Software Developers and Their Code Through Computational and Statistical Methods , Elia Eiroa Lledo

Assessing the Re-Identification Risk in ECG Datasets and an Application of Privacy Preserving Techniques in ECG Analysis , Arin Ghazarian

Multi-Modal Data Fusion, Image Segmentation, and Object Identification using Unsupervised Machine Learning: Conception, Validation, Applications, and a Basis for Multi-Modal Object Detection and Tracking , Nicholas LaHaye

Machine-Learning-Based Approach to Decoding Physiological and Neural Signals , Elnaz Lashgari

Learning-Based Modeling of Weather and Climate Events Related To El Niño Phenomenon via Differentiable Programming and Empirical Decompositions , Justin Le

Quantum State Estimation and Tracking for Superconducting Processors Using Machine Learning , Shiva Lotfallahzadeh Barzili

Novel Applications of Statistical and Machine Learning Methods to Analyze Trial-Level Data from Cognitive Measures , Chelsea Parlett

Optimal Analytical Methods for High Accuracy Cardiac Disease Classification and Treatment Based on ECG Data , Jianwei Zheng

Dissertations from 2020

Development of Integrated Machine Learning and Data Science Approaches for the Prediction of Cancer Mutation and Autonomous Drug Discovery of Anti-Cancer Therapeutic Agents , Steven Agajanian

Allocation of Public Resources: Bringing Order to Chaos , Lance Clifner

A Novel Correction for the Adjusted Box-Pierce Test — New Risk Factors for Emergency Department Return Visits within 72 hours for Children with Respiratory Conditions — General Pediatric Model for Understanding and Predicting Prolonged Length of Stay , Sidy Danioko

A Computational and Experimental Examination of the FCC Incentive Auction , Logan Gantner

Exploring the Employment Landscape for Individuals with Autism Spectrum Disorders using Supervised and Unsupervised Machine Learning , Kayleigh Hyde

Integrated Machine Learning and Bioinformatics Approaches for Prediction of Cancer-Driving Gene Mutations , Oluyemi Odeyemi

On Quantum Effects of Vector Potentials and Generalizations of Functional Analysis , Ismael L. Paiva

Long Term Ground Based Precipitation Data Analysis: Spatial and Temporal Variability , Luciano Rodriguez

Gaining Computational Insight into Psychological Data: Applications of Machine Learning with Eating Disorders and Autism Spectrum Disorder , Natalia Rosenfield

Connecting the Dots for People with Autism: A Data-driven Approach to Designing and Evaluating a Global Filter , Viseth Sean

Novel Statistical and Machine Learning Methods for the Forecasting and Analysis of Major League Baseball Player Performance , Christopher Watkins

Dissertations from 2019

Contributions to Variable Selection in Complexly Sampled Case-control Models, Epidemiology of 72-hour Emergency Department Readmission, and Out-of-site Migration Rate Estimation Using Pseudo-tagged Longitudinal Data , Kyle Anderson

Bias Reduction in Machine Learning Classifiers for Spatiotemporal Analysis of Coral Reefs using Remote Sensing Images , Justin J. Gapper

Estimating Auction Equilibria using Individual Evolutionary Learning , Kevin James

Employing Earth Observations and Artificial Intelligence to Address Key Global Environmental Challenges in Service of the SDGs , Wenzhao Li

Image Restoration using Automatic Damaged Regions Detection and Machine Learning-Based Inpainting Technique , Chloe Martin-King

Theses from 2017

Optimized Forecasting of Dominant U.S. Stock Market Equities Using Univariate and Multivariate Time Series Analysis Methods , Michael Schwartz

ISSN 2572-1496

DiscoverDataScience.org

PhD in Data Science – Your Guide to Choosing a Doctorate Degree Program

Created by aasif.faizal

Professional opportunities in data science are growing incredibly fast. That’s great news for students looking to pursue a career as a data scientist. But it also means that there are a lot more options out there to investigate and understand before developing the best educational path for you.

A PhD is the most advanced data science degree you can get, reflecting a depth of knowledge and technical expertise that will put you at the top of your field.

As a result, PhD programs are the most time-intensive degree option out there, typically requiring students to complete a dissertation based on rigorous research, and they are not for everyone. Indeed, many who work in the world of big data hold master's degrees, which tend to involve the same coursework as PhD programs without a dissertation component. However, for the right candidate, a PhD program is the perfect way to become a true expert in your area of focus.

If you’ve concluded that a data science PhD is the right path for you, this guide is intended to help you choose the best program to suit your needs. It will walk through some of the key considerations while picking graduate data science programs and some of the nuts and bolts (like course load and tuition costs) that are part of the data science PhD decision-making process.

Data Science PhD vs. Masters: Choosing the right option for you

If you’re considering pursuing a data science PhD, it’s worth knowing that such an advanced degree isn’t strictly necessary to get good work opportunities. Many who work in the field of big data hold only a master’s degree, which is the level of education expected of a competitive candidate for data science positions.

So why pursue a data science PhD?

Simply put, a PhD in data science will leave you qualified to enter the big data industry at a high level from the outset.

You’ll be eligible for advanced positions within companies, holding greater responsibilities, keeping more direct communication with leadership, and having more influence on important data-driven decisions. You’re also likely to receive greater compensation to match your rank.

However, PhDs are not for everyone. Dissertations require a great deal of time and an interest in intensive research. If you are eager to jumpstart a career quickly, a master’s program will give you the preparation you need to hit the ground running. PhDs are appropriate for those who want to commit their time and effort to schooling as a long-term investment in their professional trajectory.

For more information on the difference between data science PhD and master’s programs, take a look at our guide.

Topics include:

  • Can I get an Online Ph.D. in Data Science?
  • Overview of Ph.D. Coursework
  • Preparing for a Doctorate Program
  • Building a solid track record of professional experience
  • Things to consider when choosing a school
  • What Does it Cost to Get a Ph.D. in Data Science?
  • School Listings

Data Science PhD Programs, Historically

Historically, data science PhD programs were one of the main avenues to a good data-related position in academia or industry. But PhD programs are heavily research oriented and require a long-term investment of time, money, and energy. The issue that some data science PhD holders report, especially in industry settings, is that the state of the art is moving so quickly, and the data science industry is evolving so rapidly, that an abundance of research-oriented expertise is not always what is most sought after.

Instead, many companies are looking for candidates who are up to date with the latest data science techniques and technologies, and are willing to pivot to match emerging trends and practices.

One recent development that is making data science graduate school decisions more complex is the introduction of specialty master’s degrees that focus on rigorous but compact professional training. Both students and companies are realizing the value of an intensive, more industry-focused degree that provides enough training to manage complex projects and that is client oriented rather than research oriented.

However, not all prospective data science PhD students are looking for jobs in industry. There are some pretty amazing research opportunities opening up across a variety of academic fields that are making use of new data collection and analysis tools. Experts that understand how to leverage data systems including statistics and computer science to analyze trends and build models will be in high demand.

Can You Get a PhD in Data Science Online?

While it is not common to get a data science Ph.D. online, there are currently two options for those looking to take advantage of the flexibility of an online program.

Indiana University Bloomington and Northcentral University both offer online Ph.D. programs with either a minor or specialization in data science.

Given the trend for schools to continue increasing online offerings, expect to see additional schools adding this option in the near future.

Overview of PhD Coursework

A PhD requires a lot of academic work and generally takes four to five years (sometimes longer) to complete.

Here are some of the high level factors to consider and evaluate when comparing data science graduate programs.

How many credits are required for a PhD in data science?

On average, it takes 71 credits to graduate with a PhD in data science, almost double what a traditional master’s degree program requires. In addition to coursework, most PhD students also have research and teaching responsibilities that can be demanding but are also excellent career preparation.

What’s the core curriculum like?

In a data science doctoral program, you’ll be expected to learn many skills and also how to apply them across domains and disciplines. Core curriculums will vary from program to program, but almost all will have a core foundation of statistics.

All PhD candidates will have to take a qualifying exam. The format varies from university to university; at Yale, for example, it is broken into three phases: a practical exam, a theory exam, and an oral exam. The goal is to make sure doctoral students are developing the appropriate level of expertise.

Dissertation

One of the final steps of a PhD program involves presenting original research findings in a formal document called a dissertation. A dissertation provides background and context, as well as findings and analysis, and can contribute to the understanding and evolution of data science. A dissertation idea most often provides the framework for how a PhD candidate’s graduate school experience will unfold, so it’s important to be thoughtful and deliberate while considering research opportunities.

Since data science is such a rapidly evolving field and because choosing the right PhD program is such an important factor in developing a successful career path, there are some steps that prospective doctoral students can take in advance to find the best-fitting opportunity.

Join professional associations

Even before being fully credentialed, joining professional associations and organizations such as the Data Science Association and the American Association of Big Data Professionals is a good way to get exposure to the field. Many professional societies are welcoming to new members and even encourage student participation with things like discounted membership fees and awards and contest categories for student researchers. One of the biggest advantages to joining is that these professional associations bring together other data scientists for conference events, research-sharing opportunities, networking and continuing education opportunities.

Leverage your social network

Be on the lookout to make professional connections with professors, peers, and members of industry. There are a number of LinkedIn groups dedicated to data science. A well-maintained professional network is always useful to have when looking for advice or letters of recommendation while applying to graduate school and then later while applying for jobs and other career-related opportunities.

Kaggle competitions

Kaggle competitions provide the opportunity to solve real-world data science problems and win prizes. A list of data science problems can be found at Kaggle.com. Winning one of these competitions is a good way to demonstrate professional interest and experience.

Internships

Internships are a great way to get real-world experience in data science while also getting to work for top names in the world of business. For example, IBM offers a data science internship, which would also help you stand out when applying for PhD programs and when seeking employment in the future.

Demonstrating professional experience is not only important when looking for jobs, but it can also help while applying for graduate school. There are a number of ways for prospective students to gain exposure to the field and explore different facets of data science careers.

Get certified

There are a number of data-related certificate programs that are open to people with a variety of academic and professional experience. DeZyre has an excellent guide to different certifications, some of which might help provide good background for graduate school applications.

Conferences

Conferences are a great place to meet people presenting new and exciting research in the data science field and bounce ideas off of newfound connections. As with professional societies and organizations, discounted student rates are often available to encourage student participation. In addition, some conferences will waive fees if you are presenting a poster or research at the conference, which is an extra incentive to present.

It can be hard to quantify what makes a good fit when it comes to data science graduate school programs. There are easy-to-evaluate factors, such as cost and location, and then there are harder-to-evaluate criteria, such as networking opportunities, accessibility to professors, and how current the program’s curriculum is.

Nevertheless, there are some key relevant considerations when applying to almost any data science graduate program.

What most schools will require when applying:

  • All undergraduate and graduate transcripts
  • A statement of intent for the program (reason for applying and future plans)
  • Letters of reference
  • Application fee
  • Online application
  • A curriculum vitae (outlining all of your academic and professional accomplishments)

What Does it Cost to Get a PhD in Data Science?

The great news is that many PhD data science programs are supported by fellowships and stipends. Some are completely funded, meaning the school will pay tuition and basic living expenses. Here are several examples of fully funded programs:

  • University of Southern California
  • University of Nevada, Reno
  • Kennesaw State University
  • Worcester Polytechnic Institute
  • University of Maryland

For all other programs, tuition ranges from about $1,300 to $2,000 per credit hour, depending on the school. Remember, typical PhD programs in data science are between 60 and 75 credit hours, meaning you could spend up to $150,000 over several years.
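The estimate above can be reproduced with a few lines of Python. This is only a rough sketch using the per-credit rates and credit-hour ranges quoted in this guide; the function name `tuition_range` is a hypothetical helper, and real programs may add fees or charge flat annual tuition instead.

```python
# Hypothetical helper for illustration; not taken from any school's calculator.
def tuition_range(min_credits, max_credits, min_rate, max_rate):
    """Return the (lowest, highest) total tuition in dollars."""
    return min_credits * min_rate, max_credits * max_rate

# Figures quoted in this guide: 60 to 75 credit hours at $1,300 to $2,000 each.
low, high = tuition_range(60, 75, 1_300, 2_000)
print(f"Estimated total tuition: ${low:,} to ${high:,}")
# prints "Estimated total tuition: $78,000 to $150,000"
```

The upper bound matches the $150,000 figure in the text (75 credit hours at $2,000 each), while a 60-credit program at the lowest quoted rate would come to roughly half of that.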

That’s why the financial aspects are so important to evaluate when assessing PhD programs, because some schools offer full stipends so that you are able to attend without having to find supplemental scholarships or tuition assistance.

Can I become a professor of data science with a PhD? Yes! If you are interested in teaching at the college or graduate level, a PhD is the degree needed to establish the full expertise expected to be a professor. Some data scientists who hold PhDs start by entering the field of big data and pivot over to teaching after gaining a significant amount of work experience. If you’re driven to teach others or to pursue advanced research in data science, a PhD is the right degree for you.

Do I need a master’s in order to pursue a PhD? No. Many who pursue PhDs in Data Science do not already hold advanced degrees, and many PhD programs include all the coursework of a master’s program in the first two years of school. For many students, this is the most time-effective option, allowing you to complete your education in a single pass rather than interrupting your studies after your master’s program.

Can I choose to pursue a PhD after already receiving my master’s? Yes. A master’s program can be an opportunity to get the lay of the land and determine the specific career path you’d like to forge in the world of big data. Some schools may allow you to simply extend your academic timeline after receiving your master’s degree, and it is also possible to return to school to receive a PhD if you have been working in the field for some time.

If a PhD isn’t necessary, is it a waste of time? While not all students are candidates for PhDs, for the right students – who are keen on doing in-depth research, have the time to devote to many years of school, and potentially have an interest in continuing to work in academia – a PhD is a great choice. For more information on this question, take a look at our article Is a Data Science PhD Worth It?

Complete List of Data Science PhD Programs

Below you will find the most comprehensive list of schools offering a doctorate in data science. Each school listing contains a link to the program-specific page, GRE or master’s degree requirements, and a link to a page with detailed course information.

Note that the listing only contains true data science programs. Other similar programs are often lumped together on other sites, but we have chosen to list programs such as data analytics and business intelligence in a separate section of the website.

Boise State University  – Boise, Idaho PhD in Computing – Data Science Concentration

The Data Science emphasis focuses on the development of mathematical and statistical algorithms, software, and computing systems to extract knowledge or insights from data.  

In 60 credits, students complete an Introduction to Graduate Studies, 12 credits of core courses, 6 credits of data science elective courses, 10 credits of other elective courses, a Doctoral Comprehensive Examination worth 1 credit, and a 30-credit dissertation.

Electives can be taken in focus areas such as Anthropology, Biometry, Ecology/Evolution and Behavior, Econometrics, Electrical Engineering, Earth Dynamics and Informatics, Geoscience, Geostatistics, Hydrology and Hydrogeology, Materials Science, and Transportation Science.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $7,236 total (Resident), $24,573 total (Non-resident)

Bowling Green State University  – Bowling Green, Ohio Ph.D. in Data Science

Data Science students at Bowling Green intertwine knowledge of computer science with statistics.

Students learn techniques in analyzing structured, unstructured, and dynamic datasets.

Courses train students to understand the principles of analytic methods and to articulate the strengths and limitations of those methods.

The program requires 60 credit hours in the studies of Computer Science (6 credit hours), Statistics (6 credit hours), Data Science Exploration and Communication, Ethical Issues, Advanced Data Mining, and Applied Data Science Experience.

Students must also complete 21 credit hours of elective courses, a qualifying exam, a preliminary exam, and a dissertation.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $8,418 (Resident), $14,410 (Non-resident)

Brown University  – Providence, Rhode Island PhD in Computer Science – Concentration in Data Science

Brown University’s database group is a world leader in systems-oriented database research; they seek PhD candidates with strong system-building skills who are interested in researching TupleWare, MLbase, MDCC, Crowd DB, or PIQL.

To gain entrance, applicants should consider first doing a research internship at Brown with this group. Other ways to boost an application are to take and do well in massive open online courses, do an internship at a large company, and get involved in a large open-source software project.

Coding well in C++ is preferred.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $62,680 total

Chapman University  – Irvine, California Doctorate in Computational and Data Sciences

Candidates for the doctorate in computational and data sciences at Chapman University begin by completing 13 core credits in basic methodologies and techniques of computational science.

Students complete 45 credits of electives, which are personalized to match the specific interests and research topics of the student.

Finally, students complete up to 12 credits in dissertation research.

Applicants must have completed courses in differential equations, data structures, and probability and statistics, or take specific foundation courses, before beginning coursework toward the PhD.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $37,538 per year

Clemson University / Medical University of South Carolina (MUSC) – Joint Program – Clemson, South Carolina & Charleston, South Carolina Doctor of Philosophy in Biomedical Data Science and Informatics – Clemson

The PhD in biomedical data science and informatics is a joint program offered by Clemson University and the Medical University of South Carolina (MUSC).

Students choose one of three tracks to pursue: precision medicine, population health, or clinical and translational informatics. Students complete 65-68 credit hours and take courses in each of 5 areas: biomedical informatics foundations and applications; computing/math/statistics/engineering; population health, health systems, and policy; biomedical/medical domain; and lab rotations, seminars, and doctoral research.

Applicants must have a bachelor’s in health science, computing, mathematics, statistics, engineering, or a related field, and it is recommended to also have competency in a second of these areas.

Program requirements include a year of calculus and college biology, as well as experience in computer programming.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $10,858 total (South Carolina Resident), $22,566 total (Non-resident)

George Mason University  – Fairfax, Virginia Doctor of Philosophy in Computational Sciences and Informatics – Emphasis in Data Science

George Mason’s PhD in computational sciences and informatics requires a minimum of 72 credit hours, though this can be reduced if a student has already completed a master’s. 48 credits are toward graduate coursework, and an additional 24 are for dissertation research.

Students choose an area of emphasis, either computer modeling and simulation or data science, and complete 18 credits of coursework in this area. Students are expected to complete the coursework in 4-5 years.

Applicants to this program must have a bachelor’s degree in a natural science, mathematics, engineering, or computer science, and must have knowledge and experience with differential equations and computer programming.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $13,426 total (Virginia Resident), $35,377 total (Non-resident)

Harrisburg University of Science and Technology  – Harrisburg, Pennsylvania Doctor of Philosophy in Data Sciences

Harrisburg University’s PhD in data science is a 4-5 year program, the first 2 of which make up the Harrisburg master’s in analytics.

Beyond this, PhD candidates complete six milestones to obtain the degree, including 18 semester hours in doctoral-level courses such as multivariate data analysis, graph theory, and machine learning.

Following the completion of ANLY 760 Doctoral Research Seminar, students in the program complete their 12 hours of dissertation research bringing the total program hours to 36.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $14,940 total

Icahn School of Medicine at Mount Sinai  – New York, New York Genetics and Data Science, PhD

As part of the Biomedical Science PhD program, the Genetics and Data Science multidisciplinary training offers research opportunities that expand on genetic research and modern genomics. The training also integrates several disciplines of biomedical sciences with machine learning, network modeling, and big data analysis.

Students in the Genetics and Data Science program complete a predetermined course schedule with a total of 64 credits and 3 years of study.

Additional course requirements and electives include laboratory rotations, a thesis proposal exam and thesis defense, Computer Systems, Intro to Algorithms, Machine Learning for Biomedical Data Science, Translational Genomics, and Practical Analysis of a Personal Genome.

Delivery Method: Campus
GRE: Not Required
2022-2023 Tuition: $31,303 total

Indiana University-Purdue University Indianapolis  – Indianapolis, Indiana PhD in Data Science PhD Minor in Applied Data Science

Doctoral candidates pursuing the PhD in data science at Indiana University-Purdue University Indianapolis (IUPUI) must display competency in research, data analytics, and management and infrastructure to earn the degree.

The PhD is comprised of 24 credits of a data science core, 18 credits of methods courses, 18 credits of a specialization, written and oral qualifying exams, and 30 credits of dissertation research. All requirements must be completed within 7 years.

Applicants are generally expected to have a master’s in social science, health, data science, or computer science. 

Currently, a majority of the PhD students at IUPUI are funded by faculty grants, and two are funded by the federal government. None of the students are self-funded.

IUPUI also offers a PhD Minor in Applied Data Science that is 12-18 credits. The minor is open to students enrolled at IUPUI or IU Bloomington in a doctoral program other than Data Science.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $9,228 per year (Indiana Resident), $25,368 per year (Non-resident)

Jackson State University – Jackson, Mississippi PhD Computational and Data-Enabled Science and Engineering

Jackson State University offers a PhD in computational and data-enabled science and engineering with 5 concentration areas: computational biology and bioinformatics, computational science and engineering, computational physical science, computational public health, and computational mathematics and social science.

Students complete 12 credits of common core courses, 12 credits in the specialization, 24 credits of electives, and 24 credits in dissertation research.

Students may complete the doctoral program in as little as 5 years and no more than 8 years.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $8,270 total

Kennesaw State University  – Kennesaw, Georgia PhD in Analytics and Data Science

Students pursuing a PhD in analytics and data science at Kennesaw State University must complete 78 credit hours: 48 course hours and 6 electives (spread over 4 years of study), a minimum 12 credit hours for dissertation research, and a minimum 12 credit-hour internship.

Prior to dissertation research, the comprehensive examination will cover material from the three areas of study: computer science, mathematics, and statistics.

Successful applicants will have a master’s degree in a computational field, calculus I and II, programming experience, modeling experience, and are encouraged to have a base SAS certification.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $5,328 total (Georgia Resident), $19,188 total (Non-resident)

New Jersey Institute of Technology  – Newark, New Jersey PhD in Business Data Science

Students may enter the PhD program in business data science at the New Jersey Institute of Technology with either a relevant bachelor’s or master’s degree. Students with bachelor’s degrees begin with 36 credits of advanced courses, and those with master’s take 18 credits before moving on to credits in dissertation research.

Core courses include business research methods, data mining and analysis, data management system design, statistical computing with SAS and R, and regression analysis.

Students take qualifying examinations at the end of years 1 and 2, and must defend their dissertations successfully by the end of year 6.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $21,932 total (New Jersey Resident), $32,426 total (Non-resident)

New York University  – New York, New York PhD in Data Science

Doctoral candidates in data science at New York University must complete 72 credit hours, pass a comprehensive and qualifying exam, and defend a dissertation within 10 years of entering the program.

Required courses include an introduction to data science, probability and statistics for data science, machine learning and computational statistics, big data, and inference and representation.

Applicants must have an undergraduate or master’s degree in fields such as mathematics, statistics, computer science, engineering, or other scientific disciplines. Experience with calculus, probability, statistics, and computer programming is also required.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $37,332 per year

Northcentral University  – San Diego, California PhD in Data Science-TIM

Northcentral University offers a PhD in technology and innovation management with a specialization in data science.

The program requires 60 credit hours, including 6-7 core courses, 3 in research, a PhD portfolio, and 4 dissertation courses.

The data science specialization requires 6 courses: data mining, knowledge management, quantitative methods for data analytics and business intelligence, data visualization, predicting the future, and big data integration.

Applicants must already hold a master’s degree.

Delivery Method: Online GRE: Required 2022-2023 Tuition: $16,794 total

Stevens Institute of Technology – Hoboken, New Jersey Ph.D. in Data Science

Stevens Institute of Technology has developed a data science Ph.D. program geared to help graduates become innovators in the space.

The rigorous curriculum emphasizes mathematical and statistical modeling, machine learning, computational systems and data management.

The program is directed by Dr. Ted Stohr, a recognized thought leader in the information systems, operations and business process management arenas.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $39,408 per year

University at Buffalo – Buffalo, New York PhD Computational and Data-Enabled Science and Engineering

The curriculum for the University at Buffalo’s PhD in computational and data-enabled science and engineering centers around three areas: data science, applied mathematics and numerical methods, and high performance and data intensive computing. Nine credit hours of courses must be completed in each of these three areas. Altogether, the program consists of 72 credit hours and should be completed in 4-5 years. A master’s degree is required for admission; courses taken during the master’s may be able to count toward some of the core coursework requirements.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $11,310 per year (New York Resident), $23,100 per year (Non-resident)

University of Colorado Denver – Denver, Colorado PhD in Big Data Science and Engineering

The University of Colorado – Denver offers a unique program for those students who have already received admission to the computer science and information systems PhD program.

The Big Data Science and Engineering (BDSE) program is a PhD fellowship program that allows selected students to pursue research in the area of big data science and engineering. This new fellowship program was created to train more computer scientists in data science application fields such as health informatics, geosciences, precision and personalized medicine, business analytics, and smart cities and cybersecurity.

Students in the doctoral program must complete 30 credit hours of computer science classes beyond a master’s level, and 30 credit hours of dissertation research.

The BDSE fellowship requires students to have an advisor both in the core disciplines (either computer science or mathematics and statistics) as well as an advisor in the application discipline (medicine and public health, business, or geosciences).

In addition, the fellowship covers a full stipend, tuition, and fees up to ~50k annually for BDSE fellows. Important eligibility requirements can be found here.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $55,260 total

University of Maryland  – College Park, Maryland PhD in Information Studies

Data science is a potential research area for doctoral candidates in information studies at the University of Maryland – College Park. This includes big data, data analytics, and data mining.

Applicants for the PhD must have taken the following courses in undergraduate studies: programming languages, data structures, design and analysis of computer algorithms, calculus I and II, and linear algebra.

Students must complete 6 qualifying courses, 2 elective graduate courses, and at least 12 credit hours of dissertation research.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $16,238 total (Maryland Resident), $35,388 total (Non-resident)

University of Massachusetts Boston  – Boston, Massachusetts PhD in Business Administration – Information Systems for Data Science Track

The University of Massachusetts – Boston offers a PhD in information systems for data science. As this is a business degree, students must complete coursework in their first two years with a focus on data for business; for example, taking courses such as business in context: markets, technologies, and societies.

Students must take and pass qualifying exams at the end of year 1, comprehensive exams at the end of year 2, and defend their theses at the end of year 4.

Those with a degree in statistics, economics, math, computer science, management sciences, information systems, and other related fields are especially encouraged, though a quantitative degree is not necessary.

Students accepted by the program are ordinarily offered full tuition credits and a stipend ($25,000 per year) to cover educational expenses and help defray living costs for up to three years of study.

During the first two years of coursework, students are assigned to a faculty member as a research assistant; in the third year, students are engaged in instructional activities. Funding for the fourth year is merit-based and drawn from a limited pool of program funds.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $18,894 total (in-state), $36,879 (out-of-state)

University of Nevada Reno – Reno, Nevada PhD in Statistics and Data Science

The University of Nevada – Reno’s doctoral program in statistics and data science comprises 72 credit hours to be completed over the course of 4-5 years. Coursework is all within the scope of statistics, with titles such as statistical theory, probability theory, linear models, multivariate analysis, statistical learning, statistical computing, and time series analysis.

The completion of a Master’s degree in mathematics or statistics prior to enrollment in the doctoral program is strongly recommended, but not required.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $5,814 total (in-state), $22,356 (out-of-state)

University of Southern California – Los Angeles, California PhD in Data Sciences & Operations

USC Marshall School of Business offers a PhD in data sciences and operations to be completed in 5 years.

Students can choose either a track in operations management or in statistics. Both tracks require 4 courses in fall and spring of the first 2 years, as well as a research paper and courses during the summers. Year 3 is devoted to dissertation preparation and year 4 and/or 5 to dissertation defense.

A bachelor’s degree is necessary for application, but no field or further experience is required.

Students should complete 60 units of coursework. If the students are admitted with Advanced Standing (e.g., Master’s Degree in appropriate field), this requirement may be reduced to 40 credits.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $63,468 total

University of Tennessee-Knoxville  – Knoxville, Tennessee The Data Science and Engineering PhD

The data science and engineering PhD at the University of Tennessee – Knoxville requires 36 hours of coursework and 36 hours of dissertation research. For those entering with an MS degree, only 24 hours of course work is required.

The core curriculum includes work in statistics, machine learning, and scripting languages and is enhanced by 6 hours in courses that focus either on policy issues related to data, or technology entrepreneurship.

Students must also choose a knowledge specialization in one of these fields: health and biological sciences, advanced manufacturing, materials science, environmental and climate science, transportation science, national security, urban systems science, and advanced data science.

Applicants must have a bachelor’s or master’s degree in engineering or a scientific field. 

All students that are admitted will be supported by a research fellowship and tuition will be included.

Many students will perform research with scientists from Oak Ridge National Laboratory, which is located about a 30-minute drive from campus.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $11,468 total (Tennessee Resident), $29,656 total (Non-resident)

University of Vermont – Burlington, Vermont Complex Systems and Data Science (CSDS), PhD

Through the College of Engineering and Mathematical Sciences, the Complex Systems and Data Science (CSDS) PhD program is pan-disciplinary and provides computational and theoretical training. Students may customize the program depending on their chosen area of focus.

Students in this program work in research groups across campus.

Core courses include Data Science, Principles of Complex Systems and Modeling Complex Systems. Elective courses include Machine Learning, Complex Networks, Evolutionary Computation, Human/Computer Interaction, and Data Mining.

The program requires at least 75 credits to graduate with approval by the student graduate studies committee.

Delivery Method: Campus GRE: Not Required 2022-2023 Tuition: $12,204 total (Vermont Resident), $30,960 total (Non-resident)

University of Washington Seattle Campus – Seattle, Washington PhD in Big Data and Data Science

The University of Washington’s PhD program in data science has 2 key goals: training of new data scientists and cyberinfrastructure development, i.e., development of open-source tools and services that scientists around the world can use for big data analysis.

Students must take core courses in data management, machine learning, data visualization, and statistics.

Students are also required to complete at least one internship that covers practical work in big data.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $17,004 per year (Washington resident), $30,477 (non-resident)

University of Wisconsin-Madison – Madison, Wisconsin PhD in Biomedical Data Science

The PhD program in Biomedical Data Science offered by the Department of Biostatistics and Medical Informatics at UW-Madison is unique in blending the best of statistics and computer science, biostatistics and biomedical informatics.

Students complete three year-long course sequences in biostatistics theory and methods, computer science/informatics, and a specialized sequence to fit their interests.

Students also complete three research rotations within their first two years in the program, to both expand their breadth of knowledge and assist in identifying a research advisor.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $10,728 total (in-state), $24,054 total (out-of-state)

Vanderbilt University – Nashville, Tennessee Data Science Track of the BMI PhD Program

The PhD in biomedical informatics at Vanderbilt has the option of a data science track.

Students complete courses in the areas of biomedical informatics (3 courses), computer science (4 courses), statistical methods (4 courses), and biomedical science (2 courses). Students are expected to complete core courses and defend their dissertations within 5 years of beginning the program.

Applicants must have a bachelor’s degree in computer science, engineering, biology, biochemistry, nursing, mathematics, statistics, physics, information management, or some other health-related field.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $53,160 per year

Washington University in St. Louis – St. Louis, Missouri Doctorate in Computational & Data Sciences

Washington University now offers an interdisciplinary Ph.D. in Computational & Data Sciences where students can choose from one of four tracks (Computational Methodologies, Political Science, Psychological & Brain Sciences, or Social Work & Public Health).

Students are fully funded and will receive a stipend for at least five years contingent on making sufficient progress in the program.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $59,420 total

Worcester Polytechnic Institute – Worcester, Massachusetts PhD in Data Science

The PhD in data science at Worcester Polytechnic Institute focuses on 5 areas: integrative data science, business intelligence and case studies, data access and management, data analytics and mining, and mathematical analysis.

Students first complete a master’s in data science, and then complete 60 credit hours beyond the master’s, including 30 credit hours of research.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $28,980 per year

Yale University – New Haven, Connecticut PhD Program – Department of Stats and Data Science

The PhD in statistics and data science at Yale University offers broad training in the areas of statistical theory, probability theory, stochastic processes, asymptotics, information theory, machine learning, data analysis, statistical computing, and graphical methods. Students complete 12 courses in the first year in these topics.

Students are required to teach one course each semester of their third and fourth years.

Most students complete and defend their dissertations in their fifth year.

Applicants should have an educational background in statistics, with an undergraduate major in statistics, mathematics, computer science, or similar field.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $46,900 total

17 Compelling Machine Learning Ph.D. Dissertations

Posted by Daniel Gutierrez, ODSC, on August 12, 2021, in Machine Learning, Modeling, Research

Working in the field of data science, I’m always seeking ways to keep current, and there are a number of important resources available for this purpose: new book titles, blog articles, conference sessions, Meetups, webinars/podcasts, not to mention the gems floating around in social media. But to dig even deeper, I routinely look at what’s coming out of the world’s research labs. One great way to keep a pulse on what the research community is working on is to monitor the flow of new machine learning Ph.D. dissertations. Admittedly, many such theses are laser-focused and narrow, but from previous experience reading these documents, you can learn an awful lot about new ways to solve difficult problems over a vast range of problem domains.

In this article, I present a number of hand-picked machine learning dissertations that I found compelling in terms of my own areas of interest and aligned with problems that I’m working on. I hope you’ll find a number of them that match your own interests. Each dissertation may be challenging to consume but the process will result in hours of satisfying summer reading. Enjoy!

Please check out my previous data science dissertation round-up article.

1. Fitting Convex Sets to Data: Algorithms and Applications

This machine learning dissertation concerns the geometric problem of finding a convex set that best fits a given data set. The overarching question serves as an abstraction for data-analytical tasks arising in a range of scientific and engineering applications with a focus on two specific instances: (i) a key challenge that arises in solving inverse problems is ill-posedness due to a lack of measurements. A prominent family of methods for addressing such issues is based on augmenting optimization-based approaches with a convex penalty function so as to induce a desired structure in the solution. These functions are typically chosen using prior knowledge about the data. The thesis also studies the problem of learning convex penalty functions directly from data for settings in which we lack the domain expertise to choose a penalty function. The solution relies on suitably transforming the problem of learning a penalty function into a fitting task; and (ii) the problem of fitting tractably-described convex sets given the optimal value of linear functionals evaluated in different directions.

2. Structured Tensors and the Geometry of Data

This machine learning dissertation analyzes data to build a quantitative understanding of the world. Linear algebra is the foundation of algorithms, dating back one hundred years, for extracting structure from data. Modern technologies provide an abundance of multi-dimensional data, in which multiple variables or factors can be compared simultaneously. To organize and analyze such data sets we can use a tensor, the higher-order analogue of a matrix. However, many theoretical and practical challenges arise in extending linear algebra to the setting of tensors. The first part of the thesis studies and develops the algebraic theory of tensors. The second part of the thesis presents three algorithms for tensor data. The algorithms use algebraic and geometric structure to give guarantees of optimality.

3. Statistical approaches for spatial prediction and anomaly detection

This machine learning dissertation is primarily a description of three projects. It starts with a method for spatial prediction and parameter estimation for irregularly spaced and non-Gaussian data. It is shown that by judiciously replacing the likelihood with an empirical likelihood in the Bayesian hierarchical model, approximate posterior distributions for the mean and covariance parameters can be obtained. Due to the complex nature of the hierarchical model, standard Markov chain Monte Carlo methods cannot be applied to sample from the posterior distributions. To overcome this issue, a generalized sequential Monte Carlo algorithm is used. Finally, this method is applied to iron concentrations in California. The second project focuses on anomaly detection for functional data, specifically for functional data where the observed functions may lie over different domains. Each function is approximated as a low-rank sum of spline basis functions, and the coefficients for each basis are then compared across functions. The idea is that if two functions are similar, their respective coefficients should not be significantly different. This project concludes with an application of the proposed method to detect anomalous behavior of users of a supercomputer at NREL. The final project is an extension of the second project to two-dimensional data. This project aims to detect location and temporal anomalies from ground motion data from a fiber-optic cable using distributed acoustic sensing (DAS).

4. Sampling for Streaming Data

Advances in data acquisition technology pose challenges in analyzing large volumes of streaming data. Sampling is a natural yet powerful tool for analyzing such data sets due to its competent estimation accuracy and low computational cost. Unfortunately, sampling methods and their statistical properties for streaming data, especially streaming time series data, are not well studied in the literature. Meanwhile, estimating the dependence structure of multidimensional streaming time-series data in real time is challenging. With large volumes of streaming data, the problem becomes more difficult when the multidimensional data are collected asynchronously across distributed nodes, which motivates sampling representative data points from streams. This machine learning dissertation proposes a series of leverage score-based sampling methods for streaming time series data. Simulation studies and real data analyses are conducted to validate the proposed methods. The theoretical analysis of the asymptotic behaviors of the least-squares estimator is developed based on the subsamples.
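To make the idea of leverage score-based sampling concrete, here is a minimal sketch for a static (non-streaming) simple linear regression. It is not the dissertation's streaming method; the function names, the subsample size r, and the toy data are all assumptions made for illustration. Points are drawn with probability proportional to their leverage, and a weighted least-squares fit is computed on the subsample.

```python
import random

def leverage_scores(xs):
    """Leverage of each point in a simple linear regression with intercept:
    h_i = 1/n + (x_i - mean)^2 / Sxx. The scores sum to 2."""
    n = len(xs)
    mean = sum(xs) / n
    sxx = sum((x - mean) ** 2 for x in xs)
    return [1.0 / n + (x - mean) ** 2 / sxx for x in xs]

def leverage_subsample_fit(xs, ys, r, seed=0):
    """Draw r points with probability proportional to leverage, then fit
    intercept and slope by weighted least squares (weights 1 / (r * p_i))."""
    rng = random.Random(seed)
    h = leverage_scores(xs)
    total = sum(h)
    probs = [hi / total for hi in h]
    idx = rng.choices(range(len(xs)), weights=probs, k=r)
    w = [1.0 / (r * probs[i]) for i in idx]

    sw = sum(w)
    xbar = sum(wi * xs[i] for wi, i in zip(w, idx)) / sw
    ybar = sum(wi * ys[i] for wi, i in zip(w, idx)) / sw
    sxy = sum(wi * (xs[i] - xbar) * (ys[i] - ybar) for wi, i in zip(w, idx))
    sxx = sum(wi * (xs[i] - xbar) ** 2 for wi, i in zip(w, idx))
    slope = sxy / sxx  # assumes the subsample has at least two distinct x values
    intercept = ybar - slope * xbar
    return intercept, slope

# Hypothetical toy data: a noise-free line y = 1 + 2x
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x for x in xs]
intercept, slope = leverage_subsample_fit(xs, ys, r=8)
```

High-leverage points (here, those far from the mean of x) are sampled more often, and the inverse-probability weights compensate for that unequal sampling in the fit.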

5.  Statistical Machine Learning Methods for Complex, Heterogeneous Data

This machine learning dissertation develops statistical machine learning methodology for three distinct tasks. Each method blends classical statistical approaches with machine learning methods to provide principled solutions to problems with complex, heterogeneous data sets. The first framework proposes two methods for high-dimensional shape-constrained regression and classification. These methods reshape pre-trained prediction rules to satisfy shape constraints like monotonicity and convexity. The second method provides a nonparametric approach to the econometric analysis of discrete choice. This method provides a scalable algorithm for estimating utility functions with random forests, and combines this with random effects to properly model preference heterogeneity. The final method draws inspiration from early work in statistical machine translation to construct embeddings for variable-length objects like mathematical equations.

6. Topics in Multivariate Statistics with Dependent Data

This machine learning dissertation comprises four chapters. The first is an introduction to the topics of the dissertation and the remaining chapters contain the main results. Chapter 2 gives new results for consistency of maximum likelihood estimators with a focus on multivariate mixed models. The presented theory builds on the idea of using subsets of the full data to establish consistency of estimators based on the full data. The theory is applied to two multivariate mixed models for which it was unknown whether maximum likelihood estimators are consistent. In Chapter 3 an algorithm is proposed for maximum likelihood estimation of a covariance matrix when the corresponding correlation matrix can be written as the Kronecker product of two lower-dimensional correlation matrices. The proposed method is fully likelihood-based. Some desirable properties of separable correlation in comparison to separable covariance are also discussed. Chapter 4 is concerned with Bayesian vector auto-regressions (VARs). A collapsed Gibbs sampler is proposed for Bayesian VARs with predictors and the convergence properties of the algorithm are studied. 
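As a small illustration of the separable structure mentioned for Chapter 3, the sketch below builds a correlation matrix as the Kronecker product of two lower-dimensional correlation matrices. The helper name kron and the example matrices are assumptions chosen for illustration; this shows only the structure, not the dissertation's estimation algorithm.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]

# Two hypothetical 2x2 correlation matrices (unit diagonal, symmetric)
r1 = [[1.0, 0.5], [0.5, 1.0]]
r2 = [[1.0, 0.2], [0.2, 1.0]]

# The 4x4 separable correlation matrix R = R1 (x) R2
r = kron(r1, r2)
```

The product of two correlation matrices again has a unit diagonal, and the 4x4 result is described by two free parameters instead of the six that an unstructured 4x4 correlation matrix would need.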

7.  Model Selection and Estimation for High-dimensional Data Analysis

In the era of big data, uncovering useful information and hidden patterns in the data is prevalent in different fields. However, it is challenging to effectively select input variables in data and estimate their effects. The goal of this machine learning dissertation is to develop reproducible statistical approaches that provide mechanistic explanations of the phenomenon observed in big data analysis. The research contains two parts: variable selection and model estimation. The first part investigates how to measure and interpret the usefulness of an input variable using an approach called “variable importance learning” and builds tools (methodology and software) that can be widely applied. Two variable importance measures are proposed, a parametric measure SOIL and a non-parametric measure CVIL, using the ideas of model combining and cross-validation, respectively. The SOIL method is theoretically shown to have the inclusion/exclusion property: When the model weights are properly around the true model, the SOIL importance can well separate the variables in the true model from the rest. The CVIL method possesses desirable theoretical properties and enhances the interpretability of many mysterious but effective machine learning methods. The second part focuses on how to estimate the effect of a useful input variable in the case where the interaction of two input variables exists. The dissertation investigates the minimax rate of convergence for regression estimation in high-dimensional sparse linear models with two-way interactions, and constructs an adaptive estimator that achieves the minimax rate of convergence regardless of the true heredity condition and the sparsity indices.

8.  High-Dimensional Structured Regression Using Convex Optimization

While the term “Big Data” can have multiple meanings, this dissertation considers the type of data in which the number of features can be much greater than the number of observations (also known as high-dimensional data). High-dimensional data is abundant in contemporary scientific research due to the rapid advances in new data-measurement technologies and computing power. Recent advances in statistics have witnessed great development in the field of high-dimensional data analysis. This machine learning dissertation proposes three methods that study three different components of a general framework of the high-dimensional structured regression problem. A general theme of the proposed methods is that they cast a certain structured regression as a convex optimization problem. In so doing, the theoretical properties of each method can be well studied, and efficient computation is facilitated. Each method is accompanied by a thorough theoretical analysis of its performance, and also by an R package containing its practical implementation. It is shown that the proposed methods perform favorably (both theoretically and practically) compared with pre-existing methods.

9. Asymptotics and Interpretability of Decision Trees and Decision Tree Ensembles

Decision trees and decision tree ensembles are widely used nonparametric statistical models. A decision tree is a binary tree that recursively segments the covariate space along the coordinate directions to create hyper rectangles as basic prediction units for fitting constant values within each of them. A decision tree ensemble combines multiple decision trees, either in parallel or in sequence, in order to increase model flexibility and accuracy, as well as to reduce prediction variance. Despite the fact that tree models have been extensively used in practice, results on their asymptotic behaviors are scarce. This machine learning dissertation presents analyses of tree asymptotics from the perspectives of tree terminal nodes, tree ensembles, and models incorporating tree ensembles, respectively. The study introduces a few new tree-related learning frameworks that provide provable statistical guarantees and interpretations. A study on the Gini index used in the greedy tree building algorithm reveals its limiting distribution, leading to the development of a test of better splitting that helps to measure the uncertain optimality of a decision tree split. This test is combined with the concept of decision tree distillation, which implements a decision tree to mimic the behavior of a black box model, to generate stable interpretations by guaranteeing a unique distillation tree structure as long as there are sufficiently many random sample points. Mild modification and regularization are also applied to standard tree boosting to create a new boosting framework named Boulevard. Boulevard integrates two new mechanisms: honest trees, which isolate the tree terminal values from the tree structure, and adaptive shrinkage, which scales the boosting history to create an equally weighted ensemble. This theoretical development provides the prerequisite for the practice of statistical inference with boosted trees. Lastly, the thesis investigates the feasibility of incorporating existing semi-parametric models with tree boosting.
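For readers unfamiliar with the Gini index used in greedy tree building, the following minimal sketch computes the Gini impurity of a set of labels and the impurity decrease of a candidate split on one feature. The function names and toy data are assumptions for illustration; the dissertation's test of better splitting is not reproduced here.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_gain(xs, ys, threshold):
    """Impurity decrease obtained by splitting on x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(ys) - weighted

def best_split(xs, ys):
    """Greedy search over midpoints between sorted distinct x values.
    Returns None when there is only one distinct value to split on."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        return None
    return max(candidates, key=lambda t: gini_gain(xs, ys, t))

# Hypothetical toy data: the classes separate cleanly at x = 2.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1]
threshold = best_split(xs, ys)
gain = gini_gain(xs, ys, threshold)
```

A greedy tree builder repeats this search over every feature at every node. Because the chosen split depends on a finite sample, two nearly tied candidates can swap places under resampling, which is the kind of uncertainty a test on the split criterion is meant to quantify.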

10. Bayesian Models for Imputing Missing Data and Editing Erroneous Responses in Surveys

This dissertation develops Bayesian methods for handling unit nonresponse, item nonresponse, and erroneous responses in large-scale surveys and censuses containing categorical data. The focus is on applications of nested household data where individuals are nested within households and certain combinations of the variables are not allowed, such as the U.S. Decennial Census, as well as surveys subject to both unit and item nonresponse, such as the Current Population Survey.

11. Localized Variable Selection with Random Forest  

Due to recent advances in computer technology, the cost of collecting and storing data has dropped drastically. This makes it feasible to collect large amounts of information for each data point. This increasing trend in feature dimensionality justifies the need for research on variable selection. Random forest (RF) has demonstrated the ability to select important variables and model complex data. However, simulations confirm that in some cases it fails to detect less influential features in the presence of variables with large impacts. This dissertation proposes two algorithms for localized variable selection: clustering-based feature selection (CBFS) and locally adjusted feature importance (LAFI). Both methods aim to find regions where the effects of weaker features can be isolated and measured. CBFS combines RF variable selection with a two-stage clustering method to detect variables whose effects can be detected only in certain regions. LAFI, on the other hand, uses a binary tree approach to split data into bins based on response variable rankings, and implements RF to find important variables in each bin. Larger LAFI is assigned to variables that get selected in more bins. Simulations and real data sets are used to evaluate these variable selection methods.

12. Functional Principal Component Analysis and Sparse Functional Regression

The focus of this dissertation is on functional data which are sparsely and irregularly observed. Such data require special consideration, as classical functional data methods and theory were developed for densely observed data. As is the case in much of functional data analysis, the functional principal components (FPCs) play a key role in current sparse functional data methods via the Karhunen-Loève expansion. Thus, after a review of relevant background material, this dissertation is divided roughly into two parts, the first focusing specifically on theoretical properties of FPCs, and the second on regression for sparsely observed functional data.

13. Essays In Causal Inference: Addressing Bias In Observational And Randomized Studies Through Analysis And Design

In observational studies, identifying assumptions may fail, often quietly and without notice, leading to biased causal estimates. Although less of a concern in randomized trials where treatment is assigned at random, bias may still enter the equation through other means. This dissertation has three parts, each developing new methods to address a particular pattern or source of bias in the setting being studied. The first part extends the conventional sensitivity analysis methods for observational studies to better address patterns of heterogeneous confounding in matched-pair designs. The second part develops a modified difference-in-difference design for comparative interrupted time-series studies. The method permits partial identification of causal effects when the parallel trends assumption is violated by an interaction between group and history. The method is applied to a study of the repeal of Missouri’s permit-to-purchase handgun law and its effect on firearm homicide rates. The final part presents a study design to identify vaccine efficacy in randomized controlled trials when there is no gold standard case definition. The approach augments a two-arm randomized trial with natural variation of a genetic trait to produce a factorial experiment.

14. Bayesian Shrinkage: Computation, Methods, and Theory

Sparsity is a standard structural assumption made when modeling high-dimensional statistical parameters. This assumption essentially entails a lower-dimensional embedding of the high-dimensional parameter, thus enabling sound statistical inference. Apart from this obvious statistical motivation, in many modern applications of statistics, such as genomics and neuroscience, the parameters of interest are indeed of this nature. For almost two decades, spike-and-slab priors have been the Bayesian gold standard for modeling sparsity. However, due to their computational bottlenecks, shrinkage priors have emerged as a powerful alternative. This family of priors can almost exclusively be represented as scale mixtures of Gaussian distributions, and posterior Markov chain Monte Carlo (MCMC) updates of the related parameters are then relatively easy to design. Although shrinkage priors were tipped as computationally scalable in high dimensions, they come with their own computational challenges when the number of parameters is in the thousands or more. Standard MCMC algorithms implementing shrinkage priors generally scale cubically in the dimension of the parameter, severely limiting real-life application of these priors.

The first chapter of this dissertation addresses this computational issue and proposes an alternative exact posterior sampling algorithm whose complexity scales linearly in the ambient dimension. The algorithm developed in the first chapter is specifically designed for regression problems. The second chapter develops a Bayesian method based on shrinkage priors for high-dimensional multiple response regression. Chapter three chooses a specific member of the shrinkage family known as the horseshoe prior and studies its convergence rates in several high-dimensional models.
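To make the "scale mixture of Gaussians" representation concrete, here is a minimal sketch that draws coefficients from the horseshoe prior in its usual hierarchical form: each coefficient is Gaussian with standard deviation equal to a local scale times a global scale, and the local scales are half-Cauchy. It samples from the prior only and is not the posterior sampling algorithm proposed in the dissertation; the function names and the fixed global scale `tau` are assumptions for illustration.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) by folding a Cauchy inverse-CDF draw."""
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))


def sample_horseshoe_prior(p, tau=1.0, seed=0):
    """Draw p coefficients from a horseshoe prior with fixed global scale tau.

    lambda_j ~ half-Cauchy(0, 1)             (local scales)
    beta_j | lambda_j ~ N(0, (lambda_j*tau)^2)  (Gaussian given its scale)
    """
    rng = random.Random(seed)
    local_scales = [half_cauchy(rng) for _ in range(p)]
    betas = [rng.gauss(0.0, lam * tau) for lam in local_scales]
    return local_scales, betas


local_scales, betas = sample_horseshoe_prior(p=1000, tau=0.1)
# Most draws are shrunk toward zero, while the heavy Cauchy tail lets a few be large.
near_zero = sum(abs(b) < 0.1 for b in betas)
print(near_zero, max(abs(b) for b in betas))
```

Conditional on the scales, every coefficient is Gaussian, which is what makes Gibbs-style MCMC updates straightforward to write down even though the marginal prior is heavy-tailed.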

15. Topics in Measurement Error Analysis and High-Dimensional Binary Classification

This dissertation proposes novel methods to tackle two problems: the misspecified model with measurement error and high-dimensional binary classification, both of which have a crucial impact on applications in public health. The first problem arises in epidemiological practice. Epidemiologists often categorize a continuous risk predictor since categorization is thought to be more robust and interpretable, even when the true risk model is not a categorical one. Thus, their goal is to fit the categorical model and interpret the categorical parameters. The second project considers the problem of high-dimensional classification between two groups with unequal covariance matrices. Rather than estimating the full quadratic discriminant rule, it is proposed to perform simultaneous variable selection and linear dimension reduction on the original data, with the subsequent application of quadratic discriminant analysis on the reduced space. Further, in order to support the proposed methodology, two R packages were developed, CCP and DAP, along with two vignettes as long-format illustrations of their usage.

16. Model-Based Penalized Regression

This dissertation contains three chapters that consider penalized regression from a model-based perspective, interpreting penalties as assumed prior distributions for unknown regression coefficients. The first chapter shows that treating a lasso penalty as a prior can facilitate the choice of tuning parameters when standard methods for choosing the tuning parameters are not available, and when it is necessary to choose multiple tuning parameters simultaneously. The second chapter considers a possible drawback of treating penalties as models, specifically possible misspecification. The third chapter introduces structured shrinkage priors for dependent regression coefficients which generalize popular independent shrinkage priors. These can be useful in various applied settings where many regression coefficients are not only expected to be nearly or exactly equal to zero, but also structured.
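The correspondence between a lasso penalty and a prior can be written out in a few lines. The derivation below is the standard one, given under assumed notation that does not come from the dissertation: a Gaussian linear model with known noise variance \sigma^2 and independent Laplace priors with rate \lambda on the coefficients.

```latex
\[
  y \mid \beta \sim N\bigl(X\beta, \sigma^2 I_n\bigr),
  \qquad
  p(\beta) = \prod_{j=1}^{p} \frac{\lambda}{2} \exp\bigl(-\lambda \lvert \beta_j \rvert\bigr),
\]
\[
  -\log p(\beta \mid y)
    = \frac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
      + \text{const},
\]
\[
  \hat{\beta}_{\mathrm{MAP}}
    = \arg\min_{\beta} \;
      \tfrac{1}{2} \lVert y - X\beta \rVert_2^2
      + \sigma^2 \lambda \, \lVert \beta \rVert_1 .
\]
```

The posterior mode is therefore a lasso estimate with tuning parameter \sigma^2 \lambda. Treating \lambda as a parameter of a probability model is what allows it to be estimated from the data rather than chosen by cross-validation.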

17. Topics on Least Squares Estimation

This dissertation revisits and makes progress on some old but challenging problems concerning least squares estimation, the workhorse of supervised machine learning. Two major problems are addressed: (i) least squares estimation with heavy-tailed errors, and (ii) least squares estimation in non-Donsker classes. For (i), this problem is studied both from a worst-case perspective, and a more refined envelope perspective. For (ii), two case studies are performed in the context of (a) estimation involving sets and (b) estimation of multivariate isotonic functions. Understanding these particular aspects of least squares estimation problems requires several new tools in empirical process theory, including a sharp multiplier inequality controlling the size of the multiplier empirical process, and matching upper and lower bounds for empirical processes indexed by non-Donsker classes.

How to Learn More about Machine Learning

At our upcoming event this November 16th-18th in San Francisco, ODSC West 2021 will feature a plethora of talks, workshops, and training sessions on machine learning and machine learning research. You can register now for 50% off all ticket types before the discount drops to 40% in a few weeks. Some highlighted sessions on machine learning include:

  • Towards More Energy-Efficient Neural Networks? Use Your Brain!: Olaf de Leeuw | Data Scientist | Dataworkz
  • Practical MLOps: Automation Journey: Evgenii Vinogradov, PhD | Head of DHW Development | YooMoney
  • Applications of Modern Survival Modeling with Python: Brian Kent, PhD | Data Scientist | Founder The Crosstab Kite
  • Using Change Detection Algorithms for Detecting Anomalous Behavior in Large Systems: Veena Mendiratta, PhD | Adjunct Faculty, Network Reliability and Analytics Researcher | Northwestern University

Sessions on MLOps:

  • Tuning Hyperparameters with Reproducible Experiments: Milecia McGregor | Senior Software Engineer | Iterative
  • MLOps… From Model to Production: Filipa Peleja, PhD | Lead Data Scientist | Levi Strauss & Co
  • Operationalization of Models Developed and Deployed in Heterogeneous Platforms: Sourav Mazumder | Data Scientist, Thought Leader, AI & ML Operationalization Leader | IBM
  • Develop and Deploy a Machine Learning Pipeline in 45 Minutes with Ploomber: Eduardo Blancas | Data Scientist | Fidelity Investments

Sessions on Deep Learning:

  • GANs: Theory and Practice, Image Synthesis With GANs Using TensorFlow: Ajay Baranwal | Center Director | Center for Deep Learning in Electronic Manufacturing, Inc
  • Machine Learning With Graphs: Going Beyond Tabular Data: Dr. Clair J. Sullivan | Data Science Advocate | Neo4j
  • Deep Dive into Reinforcement Learning with PPO using TF-Agents & TensorFlow 2.0: Oliver Zeigermann | Software Developer | embarc Software Consulting GmbH
  • Get Started with Time-Series Forecasting using the Google Cloud AI Platform: Karl Weinmeister | Developer Relations Engineering Manager | Google


Daniel Gutierrez, ODSC

Daniel D. Gutierrez is a practicing data scientist who’s been working with data long before the field came in vogue. As a technology journalist, he enjoys keeping a pulse on this fast-paced industry. Daniel is also an educator having taught data science, machine learning and R classes at the university level. He has authored four computer industry books on database and data science technology, including his most recent title, “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R.” Daniel holds a BS in Mathematics and Computer Science from UCLA.


Machine Learning - CMU

PhD Dissertations


[all are .pdf files].

Learning Models that Match Jacob Tyo, 2024

Improving Human Integration across the Machine Learning Pipeline Charvi Rastogi, 2024

Reliable and Practical Machine Learning for Dynamic Healthcare Settings Helen Zhou, 2023

Automatic customization of large-scale spiking network models to neuronal population activity (unavailable) Shenghao Wu, 2023

Estimation of BVk functions from scattered data (unavailable) Addison J. Hu, 2023

Rethinking object categorization in computer vision (unavailable) Jayanth Koushik, 2023

Advances in Statistical Gene Networks Jinjin Tian, 2023

Post-hoc calibration without distributional assumptions Chirag Gupta, 2023

The Role of Noise, Proxies, and Dynamics in Algorithmic Fairness Nil-Jana Akpinar, 2023

Collaborative learning by leveraging siloed data Sebastian Caldas, 2023

Modeling Epidemiological Time Series Aaron Rumack, 2023

Human-Centered Machine Learning: A Statistical and Algorithmic Perspective Leqi Liu, 2023

Uncertainty Quantification under Distribution Shifts Aleksandr Podkopaev, 2023

Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There Benjamin Eysenbach, 2023

Comparing Forecasters and Abstaining Classifiers Yo Joong Choe, 2023

Using Task Driven Methods to Uncover Representations of Human Vision and Semantics Aria Yuan Wang, 2023

Data-driven Decisions - An Anomaly Detection Perspective Shubhranshu Shekhar, 2023

Applied Mathematics of the Future Kin G. Olivares, 2023

METHODS AND APPLICATIONS OF EXPLAINABLE MACHINE LEARNING Joon Sik Kim, 2023

NEURAL REASONING FOR QUESTION ANSWERING Haitian Sun, 2023

Principled Machine Learning for Societally Consequential Decision Making Amanda Coston, 2023

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Maxwell B. Wang, 2023

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Darby M. Losey, 2023

Calibrated Conditional Density Models and Predictive Inference via Local Diagnostics David Zhao, 2023

Towards an Application-based Pipeline for Explainability Gregory Plumb, 2022

Objective Criteria for Explainable Machine Learning Chih-Kuan Yeh, 2022

Making Scientific Peer Review Scientific Ivan Stelmakh, 2022

Facets of regularization in high-dimensional learning: Cross-validation, risk monotonization, and model complexity Pratik Patil, 2022

Active Robot Perception using Programmable Light Curtains Siddharth Ancha, 2022

Strategies for Black-Box and Multi-Objective Optimization Biswajit Paria, 2022

Unifying State and Policy-Level Explanations for Reinforcement Learning Nicholay Topin, 2022

Sensor Fusion Frameworks for Nowcasting Maria Jahja, 2022

Equilibrium Approaches to Modern Deep Learning Shaojie Bai, 2022

Towards General Natural Language Understanding with Probabilistic Worldbuilding Abulhair Saparov, 2022

Applications of Point Process Modeling to Spiking Neurons (Unavailable) Yu Chen, 2021

Neural variability: structure, sources, control, and data augmentation Akash Umakantha, 2021

Structure and time course of neural population activity during learning Jay Hennig, 2021

Cross-view Learning with Limited Supervision Yao-Hung Hubert Tsai, 2021

Meta Reinforcement Learning through Memory Emilio Parisotto, 2021

Learning Embodied Agents with Scalably-Supervised Reinforcement Learning Lisa Lee, 2021

Learning to Predict and Make Decisions under Distribution Shift Yifan Wu, 2021

Statistical Game Theory Arun Sai Suggala, 2021

Towards Knowledge-capable AI: Agents that See, Speak, Act and Know Kenneth Marino, 2021

Learning and Reasoning with Fast Semidefinite Programming and Mixing Methods Po-Wei Wang, 2021

Bridging Language in Machines with Language in the Brain Mariya Toneva, 2021

Curriculum Learning Otilia Stretcu, 2021

Principles of Learning in Multitask Settings: A Probabilistic Perspective Maruan Al-Shedivat, 2021

Towards Robust and Resilient Machine Learning Adarsh Prasad, 2021

Towards Training AI Agents with All Types of Experiences: A Unified ML Formalism Zhiting Hu, 2021

Building Intelligent Autonomous Navigation Agents Devendra Chaplot, 2021

Learning to See by Moving: Self-supervising 3D Scene Representations for Perception, Control, and Visual Reasoning Hsiao-Yu Fish Tung, 2021

Statistical Astrophysics: From Extrasolar Planets to the Large-scale Structure of the Universe Collin Politsch, 2020

Causal Inference with Complex Data Structures and Non-Standard Effects Kwhangho Kim, 2020

Networks, Point Processes, and Networks of Point Processes Neil Spencer, 2020

Dissecting neural variability using population recordings, network models, and neurofeedback (Unavailable) Ryan Williamson, 2020

Predicting Health and Safety: Essays in Machine Learning for Decision Support in the Public Sector Dylan Fitzpatrick, 2020

Towards a Unified Framework for Learning and Reasoning Han Zhao, 2020

Learning DAGs with Continuous Optimization Xun Zheng, 2020

Machine Learning and Multiagent Preferences Ritesh Noothigattu, 2020

Learning and Decision Making from Diverse Forms of Information Yichong Xu, 2020

Towards Data-Efficient Machine Learning Qizhe Xie, 2020

Change modeling for understanding our world and the counterfactual one(s) William Herlands, 2020

Machine Learning in High-Stakes Settings: Risks and Opportunities Maria De-Arteaga, 2020

Data Decomposition for Constrained Visual Learning Calvin Murdock, 2020

Structured Sparse Regression Methods for Learning from High-Dimensional Genomic Data Micol Marchetti-Bowick, 2020

Towards Efficient Automated Machine Learning Liam Li, 2020

LEARNING COLLECTIONS OF FUNCTIONS Emmanouil Antonios Platanios, 2020

Provable, structured, and efficient methods for robustness of deep networks to adversarial examples Eric Wong, 2020

Reconstructing and Mining Signals: Algorithms and Applications Hyun Ah Song, 2020

Probabilistic Single Cell Lineage Tracing Chieh Lin, 2020

Graphical network modeling of phase coupling in brain activity (unavailable) Josue Orellana, 2019

Strategic Exploration in Reinforcement Learning - New Algorithms and Learning Guarantees Christoph Dann, 2019

Learning Generative Models using Transformations Chun-Liang Li, 2019

Estimating Probability Distributions and their Properties Shashank Singh, 2019

Post-Inference Methods for Scalable Probabilistic Modeling and Sequential Decision Making Willie Neiswanger, 2019

Accelerating Text-as-Data Research in Computational Social Science Dallas Card, 2019

Multi-view Relationships for Analytics and Inference Eric Lei, 2019

Information flow in networks based on nonstationary multivariate neural recordings Natalie Klein, 2019

Competitive Analysis for Machine Learning & Data Science Michael Spece, 2019

The When, Where and Why of Human Memory Retrieval Qiong Zhang, 2019

Towards Effective and Efficient Learning at Scale Adams Wei Yu, 2019

Towards Literate Artificial Intelligence Mrinmaya Sachan, 2019

Learning Gene Networks Underlying Clinical Phenotypes Under SNP Perturbations From Genome-Wide Data Calvin McCarter, 2019

Unified Models for Dynamical Systems Carlton Downey, 2019

Anytime Prediction and Learning for the Balance between Computation and Accuracy Hanzhang Hu, 2019

Statistical and Computational Properties of Some "User-Friendly" Methods for High-Dimensional Estimation Alnur Ali, 2019

Nonparametric Methods with Total Variation Type Regularization Veeranjaneyulu Sadhanala, 2019

New Advances in Sparse Learning, Deep Networks, and Adversarial Learning: Theory and Applications Hongyang Zhang, 2019

Gradient Descent for Non-convex Problems in Modern Machine Learning Simon Shaolei Du, 2019

Selective Data Acquisition in Learning and Decision Making Problems Yining Wang, 2019

Anomaly Detection in Graphs and Time Series: Algorithms and Applications Bryan Hooi, 2019

Neural dynamics and interactions in the human ventral visual pathway Yuanning Li, 2018

Tuning Hyperparameters without Grad Students: Scaling up Bandit Optimisation Kirthevasan Kandasamy, 2018

Teaching Machines to Classify from Natural Language Interactions Shashank Srivastava, 2018

Statistical Inference for Geometric Data Jisu Kim, 2018

Representation Learning @ Scale Manzil Zaheer, 2018

Diversity-promoting and Large-scale Machine Learning for Healthcare Pengtao Xie, 2018

Distribution and Histogram (DIsH) Learning Junier Oliva, 2018

Stress Detection for Keystroke Dynamics Shing-Hon Lau, 2018

Sublinear-Time Learning and Inference for High-Dimensional Models Enxu Yan, 2018

Neural population activity in the visual cortex: Statistical methods and application Benjamin Cowley, 2018

Efficient Methods for Prediction and Control in Partially Observable Environments Ahmed Hefny, 2018

Learning with Staleness Wei Dai, 2018

Statistical Approach for Functionally Validating Transcription Factor Bindings Using Population SNP and Gene Expression Data Jing Xiang, 2017

New Paradigms and Optimality Guarantees in Statistical Learning and Estimation Yu-Xiang Wang, 2017

Dynamic Question Ordering: Obtaining Useful Information While Reducing User Burden Kirstin Early, 2017

New Optimization Methods for Modern Machine Learning Sashank J. Reddi, 2017

Active Search with Complex Actions and Rewards Yifei Ma, 2017

Why Machine Learning Works George D. Montañez, 2017

Source-Space Analyses in MEG/EEG and Applications to Explore Spatio-temporal Neural Dynamics in Human Vision Ying Yang, 2017

Computational Tools for Identification and Analysis of Neuronal Population Activity Pengcheng Zhou, 2016

Expressive Collaborative Music Performance via Machine Learning Gus (Guangyu) Xia, 2016

Supervision Beyond Manual Annotations for Learning Visual Representations Carl Doersch, 2016

Exploring Weakly Labeled Data Across the Noise-Bias Spectrum Robert W. H. Fisher, 2016

Optimizing Optimization: Scalable Convex Programming with Proximal Operators Matt Wytock, 2016

Combining Neural Population Recordings: Theory and Application William Bishop, 2015

Discovering Compact and Informative Structures through Data Partitioning Madalina Fiterau-Brostean, 2015

Machine Learning in Space and Time Seth R. Flaxman, 2015

The Time and Location of Natural Reading Processes in the Brain Leila Wehbe, 2015

Shape-Constrained Estimation in High Dimensions Min Xu, 2015

Spectral Probabilistic Modeling and Applications to Natural Language Processing Ankur Parikh, 2015

Computational and Statistical Advances in Testing and Learning Aaditya Kumar Ramdas, 2015

Corpora and Cognition: The Semantic Composition of Adjectives and Nouns in the Human Brain Alona Fyshe, 2015

Learning Statistical Features of Scene Images Wooyoung Lee, 2014

Towards Scalable Analysis of Images and Videos Bin Zhao, 2014

Statistical Text Analysis for Social Science Brendan T. O'Connor, 2014

Modeling Large Social Networks in Context Qirong Ho, 2014

Semi-Cooperative Learning in Smart Grid Agents Prashant P. Reddy, 2013

On Learning from Collective Data Liang Xiong, 2013

Exploiting Non-sequence Data in Dynamic Model Learning Tzu-Kuo Huang, 2013

Mathematical Theories of Interaction with Oracles Liu Yang, 2013

Short-Sighted Probabilistic Planning Felipe W. Trevizan, 2013

Statistical Models and Algorithms for Studying Hand and Finger Kinematics and their Neural Mechanisms Lucia Castellanos, 2013

Approximation Algorithms and New Models for Clustering and Learning Pranjal Awasthi, 2013

Uncovering Structure in High-Dimensions: Networks and Multi-task Learning Problems Mladen Kolar, 2013

Learning with Sparsity: Structures, Optimization and Applications Xi Chen, 2013

GraphLab: A Distributed Abstraction for Large Scale Machine Learning Yucheng Low, 2013

Graph Structured Normal Means Inference James Sharpnack, 2013 (Joint Statistics & ML PhD)

Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data Hai-Son Phuoc Le, 2013

Learning Large-Scale Conditional Random Fields Joseph K. Bradley, 2013

New Statistical Applications for Differential Privacy Rob Hall, 2013 (Joint Statistics & ML PhD)

Parallel and Distributed Systems for Probabilistic Reasoning Joseph Gonzalez, 2012

Spectral Approaches to Learning Predictive Representations Byron Boots, 2012

Attribute Learning using Joint Human and Machine Computation Edith L. M. Law, 2012

Statistical Methods for Studying Genetic Variation in Populations Suyash Shringarpure, 2012

Data Mining Meets HCI: Making Sense of Large Graphs Duen Horng (Polo) Chau, 2012

Learning with Limited Supervision by Input and Output Coding Yi Zhang, 2012

Target Sequence Clustering Benjamin Shih, 2011

Nonparametric Learning in High Dimensions Han Liu, 2010 (Joint Statistics & ML PhD)

Structural Analysis of Large Networks: Observations and Applications Mary McGlohon, 2010

Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy Brian D. Ziebart, 2010

Tractable Algorithms for Proximity Search on Large Graphs Purnamrita Sarkar, 2010

Rare Category Analysis Jingrui He, 2010

Coupled Semi-Supervised Learning Andrew Carlson, 2010

Fast Algorithms for Querying and Mining Large Graphs Hanghang Tong, 2009

Efficient Matrix Models for Relational Learning Ajit Paul Singh, 2009

Exploiting Domain and Task Regularities for Robust Named Entity Recognition Andrew O. Arnold, 2009

Theoretical Foundations of Active Learning Steve Hanneke, 2009

Generalized Learning Factors Analysis: Improving Cognitive Models with Machine Learning Hao Cen, 2009

Detecting Patterns of Anomalies Kaustav Das, 2009

Dynamics of Large Networks Jurij Leskovec, 2008

Computational Methods for Analyzing and Modeling Gene Regulation Dynamics Jason Ernst, 2008

Stacked Graphical Learning Zhenzhen Kou, 2007

Actively Learning Specific Function Properties with Applications to Statistical Inference Brent Bryan, 2007

Approximate Inference, Structure Learning and Feature Estimation in Markov Random Fields Pradeep Ravikumar, 2007

Scalable Graphical Models for Social Networks Anna Goldenberg, 2007

Measure Concentration of Strongly Mixing Processes with Applications Leonid Kontorovich, 2007

Tools for Graph Mining Deepayan Chakrabarti, 2005

Automatic Discovery of Latent Variable Models Ricardo Silva, 2005


DigitalCommons@Kennesaw State University


Doctor of Data Science and Analytics Dissertations


The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus of application and research - where students will engage in real world business problems, which will inform and guide their research interests.

We launched the first formal PhD program in Data Science in 2015. Our program sits at the intersection of computer science, statistics, mathematics, and business. Our students engage in relevant research with faculty from across our eleven colleges. As one of the institutions on the forefront of the development of data science as an academic discipline, we are committed to developing the next generation of Data Science leaders, researchers, and educators. Culturally, we are committed to the discipline of Data Science, through ethical practices, attention to fairness, to a diverse student body, to academic excellence, and research which makes positive contributions to our local, regional, and global community. -Sherry Ni, Director, Ph.D. in Data Science and Analytics

This degree will train individuals to facilitate new, innovative research and to translate complex data, both structured and unstructured, into information that improves decision making. The curriculum places heavy emphasis on programming, data mining, statistical modeling, and the mathematical foundations that support these concepts. Importantly, the program also emphasizes communication skills – both oral and written – as well as application and tying results to business and research problems.


Dissertations from 2023

Quantification of Various Types of Biases in Large Language Models, Sudhashree Sayenju

Dissertations from 2022

Appley: Approximate Shapley Values for Model Explainability in Linear Time, Md Shafiul Alam

Ethical Analytics: A Framework for a Practically-Oriented Sub-Discipline of AI Ethics, Jonathan Boardman

Novel Instance-Level Weighted Loss Function for Imbalanced Learning, Trent Geisler

Debiasing Cyber Incidents – Correcting for Reporting Delays and Under-reporting, Seema Sangari

Dissertations from 2021

Integrated Machine Learning Approaches to Improve Classification performance and Feature Extraction Process for EEG Dataset, Mohammad Masum

A Distance-Based Clustering Framework for Categorical Time Series: A Case Study in Episodes of Care Healthcare Delivery System, Lauren Staples

Dissertations from 2020

A CREDIT ANALYSIS OF THE UNBANKED AND UNDERBANKED: AN ARGUMENT FOR ALTERNATIVE DATA, Edwin Baidoo

Quantitatively Motivated Model Development Framework: Downstream Analysis Effects of Normalization Strategies, Jessica M. Rudd

Data-driven Investment Decisions in P2P Lending: Strategies of Integrating Credit Scoring and Profit Scoring, Yan Wang

A Novel Penalized Log-likelihood Function for Class Imbalance Problem, Lili Zhang

ATTACK AND DEFENSE IN SECURITY ANALYTICS, Yiyun Zhou

Dissertations from 2019

One and Two-Step Estimation of Time Variant Parameters and Nonparametric Quantiles, Bogdan Gadidov

Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis, Jie Hao

Deep Embedding Kernel, Linh Le

Ordinal HyperPlane Loss, Bob Vanderheyden


DigitalCommons@Kennesaw State University ISSN: 2576-6805

PhD in Data Science

First Year Requirements

The standard first-year program requires students to complete nine courses: four required courses (1-4 below); one elective either in mathematical foundations or scalability and computing (pick from either 5 or 6); and finally four other electives that can come from proposed courses in data science or existing graduate courses in Computer Science or Statistics. Some students, after consulting with the committee graduate advisor, might decide to take the nine courses over the first two years.

Required courses:

  1. Foundations of Machine Learning and AI Part 1
  2. Responsible Use of Data and Algorithms
  3. Data Interaction
  4. Systems for Data and Computers/Data Design
  5. Foundations of Machine Learning and AI Part 2
  6. Data Engineering and Scalable Computing

Synthesis project

Students will take courses during the first two years, after which they focus primarily on their research. A milestone in this transition is completion of a synthesis project before the end of the second year in the program. Synthesis projects can be done in partnership with any of the DSI affiliates and aim to meaningfully connect PhD students to their chosen focus areas.

Thesis Advisor and Dissertation Committee

Students typically select a thesis advisor by the beginning of their second year. By the end of the third year, each PhD student, after consultation with their advisor, shall establish a thesis committee of at least three faculty members, including the advisor, with at least half of the members coming from the Committee on Data Science.

Proposal Presentation and Admission to Candidacy

By the end of the third year, students should have scheduled and completed a proposal presentation to their committee, in order to be advanced to candidacy. The proposal presentation is typically an hourlong meeting that begins with a 30-minute presentation by the student, followed by a question and discussion period with the committee.

Dissertation Defense

The PhD degree will be awarded following a successful defense and the electronic submission of the final version of the dissertation to the University’s Dissertation Office.


Analytics Insight

10 Best Research and Thesis Topic Ideas for Data Science in 2022


These research and thesis topics for data science will ensure more knowledge and skills for both students and scholars

  • Handling practical video analytics in a distributed cloud: With increased dependency on the internet, sharing videos has become a mode of data and information exchange. The role of the implementation of the Internet of Things (IoT), telecom infrastructure, and operators is huge in generating insights from video analytics. In this perspective, several questions need to be answered, like the efficiency of the existing analytics systems, the changes about to take place if real-time analytics are integrated, and others.
  • Smart healthcare systems using big data analytics: Big data analytics plays a significant role in making healthcare more efficient, accessible, and cost-effective. Big data analytics enhances the operational efficiency of smart healthcare providers by providing real-time analytics. It enhances the capabilities of the intelligent systems by using short-span data-driven insights, but there are still distinct challenges that are yet to be addressed in this field.
  • Identifying fake news using real-time analytics: The circulation of fake news has become a pressing issue in the modern era. The data gathered from social media networks might seem legit, but sometimes they are not. The sources that provide the data are unauthenticated most of the time, which makes it a crucial issue to be addressed.
  • Secure federated learning with real-world applications : Federated learning is a technique that trains an algorithm across multiple decentralized edge devices and servers. This technique can be adopted to build models locally, but if this technique can be deployed at scale or not, across multiple platforms with high-level security is still obscure.
  • Big data analytics and its impact on marketing strategy : The advent of data science and big data analytics has entirely redefined the marketing industry. It has helped enterprises by offering valuable insights into their existing and future customers. But several issues like the existence of surplus data, integrating complex data into customers’ journeys, and complete data privacy are some of the branches that are still untrodden and need immediate attention.
  • Impact of big data on business decision-making: Present studies signify that big data has transformed the way managers and business leaders make critical decisions concerning the growth and development of the business. It allows them to access objective data and analyse the market environments, enabling companies to adapt rapidly and make decisions faster. Working on this topic will help students understand the present market and business conditions and help them analyse new solutions.
  • Implementing big data to understand consumer behaviour : In understanding consumer behaviour, big data is used to analyse the data points depicting a consumer’s journey after buying a product. Data gives a clearer picture in understanding specific scenarios. This topic will help understand the problems that businesses face in utilizing the insights and develop new strategies in the future to generate more ROI.
  • Applications of big data to predict future demand and forecasting: Predictive analytics in data science has emerged as an integral part of decision-making and demand forecasting. Working on this topic will enable students to determine the significance of high-quality historical data analysis and the factors that drive higher consumer demand.
  • The importance of data exploration over data analysis : Exploration enables a deeper understanding of the dataset, making it easier to navigate and use the data later. Intelligent analysts must understand and explore the differences between data exploration and analysis and use them according to specific needs to fulfill organizational requirements.
  • Data science and software engineering : Software engineering and development are a major part of data science. Skilled data professionals should learn and explore the possibilities of the various technical and software skills for performing critical AI and big data tasks.
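
The federated learning topic above can be made concrete with a small sketch. The following is a minimal illustration of federated averaging, assuming a toy linear model and plain weight averaging; the function names (`local_update`, `federated_average`, `train`) and the learning-rate settings are hypothetical choices for this example, and it does not include the security measures (such as encrypted or secure aggregation) that a real deployment would need.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_update(weights: Vector, data: Dataset, lr: float = 0.05, epochs: int = 5) -> Vector:
    """Run a few gradient-descent steps for y ~ w*x + b on one client's own data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights: List[Vector], client_sizes: List[int]) -> Vector:
    """Combine client models into one, weighting each by its number of samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def train(clients: List[Dataset], rounds: int = 200) -> Vector:
    """Alternate local training and server-side averaging; raw data never leaves a client."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


if __name__ == "__main__":
    # Two hypothetical clients whose data both follow y = 2x + 1
    clients = [
        [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
        [(1.0, 3.0), (3.0, 7.0)],
    ]
    print(train(clients))
```

Only model weights are exchanged in this sketch, which is the core idea of the technique; how to protect those weights in transit and at scale is the open question the topic refers to.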


phd thesis in data science

PhD in Data Science and Analytics

We launched the first formal PhD program in Data Science in 2015. Our program sits at the intersection of computer science, statistics, mathematics, and business. Our students engage in relevant research with faculty from across our eleven colleges. As one of the institutions on the forefront of the development of data science as an academic discipline, we are committed to developing the next generation of Data Science leaders, researchers, and educators. Culturally, we are committed to the discipline of Data Science through ethical practices, attention to fairness, a diverse student body, academic excellence, and research that makes positive contributions to our local, regional, and global community.

Herman Ray, Director, Ph.D. in Data Science and Analytics

Sherry Ni

About the Doctoral Degree in Data Science and Analytics

This degree will train individuals to facilitate new, innovative research and to translate complex structured and unstructured data into information that improves decision making. The curriculum places heavy emphasis on programming, data mining, statistical modeling, and the mathematical foundations that support these concepts. Importantly, the program also emphasizes communication skills, both oral and written, as well as application and tying results to business and research problems.

Because this degree is a Ph.D., it creates flexibility. Graduates can either pursue a position in the private or public sector as a "practicing" Data Scientist – where continued demand is expected to greatly outpace the supply - or pursue a position within academia, where they would be uniquely qualified to teach these skills to the next generation.

Information Sessions for Fall 2025 Admission

To be announced

Data Science and Analytics PhD Curriculum

Stage One: Pre-Program Requirements

  • Successful applicants will have completed a master's degree in a computational field (e.g., engineering, computer science, statistics, economics, finance, etc.).
  • Applicants are expected to have deep proficiency in at least one analytical programming language (e.g., SAS, R, Python). SQL and Java are helpful but not required.
  • Interested applicants who have earned an undergraduate degree are encouraged to apply to the Ph.D. Program with the embedded MS in Computer Science or with the MS in Applied Statistics.

Stage Two: Coursework

The Ph.D. in Data Science and Analytics requires 78 total credit hours spread over four years of study. Example Program of Study: 

  • CS 8265  - Big Data Analytics
  • CS 8267  - Machine Learning
  • MATH 8010  - Theory of Linear Models (optional)
  • MATH 8020  - Graph Theory
  • MATH 8030  - Applied Discrete and Combinatorial Mathematics 
  • STAT 8240  - Data Mining I
  • STAT 8250  - Data Mining II
  • Comprehensive Exam 
  • 21 credit hours of electives in computer science, statistics, mathematics, information technology, or other area by permission. 
  • Research Proposal 
  • DS 9700 Doctoral Internship/Research Lab
  • DS 9900 Dissertation
  • Dissertation Proposal Defense
  • DS 9900 Dissertation
  • Final Dissertation Defense

Stage Three: Project Engagement and Research/Dissertation

Relevant, interdisciplinary research forms the foundation of the Ph.D. in Data Science and Analytics. While students are encouraged to engage in research from their first semester, the last two years of the program are structured to help students transition into becoming independent, lead researchers. In this last stage of the program, students will work with research faculty, including their advisor, in one of our data science research labs.

Program Student Learning Outcomes

At the end of the program, students will be able to:

  • Demonstrate their understanding of the research process
  • Demonstrate mastery of core concepts relevant to three key areas in mathematics, statistics and computer science
  • Develop themselves as professionals prepared for work as a doctoral-educated individual beyond graduation

Admission Requirements and Application

Frequently Asked Questions (FAQ)

How long will the program take?

How much does the program cost?

Who would be successful in the program?

Where do these graduates work after graduation?

What are the publication/research requirements?

What did Data Science Doctoral Students Study?

  • Applied Computer Science
  • Applied Economics and Statistics
  • Applied Statistics
  • Applied Mathematics
  • Bioinformatics
  • Business Analytics
  • Chemical Biology
  • Computer Science
  • Data Science
  • Forecasting & Strategic Management
  • Integrative Biology
  • Public Admin in Economic Policy Mgmt
  • Mathematics
  • Mechanical Engineering
  • Software Engineering

What is the Project Engagement requirement?

Can I pursue the program part-time while I am working full-time?

Can I live on campus?

Are the courses online?

Do I have to have a master's degree to apply?

Where did Data Science Doctoral Students Study?

  • Ajou University, South Korea
  • Albert-Ludwigs University of Freiburg
  • Auburn University
  • Bowling Green State University
  • Clemson University
  • Columbia University
  • Columbus State University
  • Florida State University
  • Georgia Southern University
  • Georgia State
  • Georgia Tech
  • Iran University of Science and Technology
  • Kennesaw State University
  • Marshall University
  • Michigan State University
  • Murray State University
  • North Carolina State University
  • St. Petersburg State University, Russia
  • University of KwaZulu-Natal, South Africa
  • University of Michigan
  • University of North Carolina
  • University of Toledo

Ph.D. in Data Science and Analytics Student Cohorts

Royce Alfred

Bachelor's Degree:   Psychology, Kennesaw State University

Master's Degree:   Applied Statistics and Analytics, Kennesaw State University

Work History:   4 years as a Data Scientist at Equifax

Professional Objective:   Work as a research data scientist in the corporate environment

Venkata Abhiram Chitty

Bachelor's Degree:   Mathematics, Statistics and Computer Science, Osmania University, Telangana, India

Master's Degree:   Data Science, VIT-AP University, Amaravati, Andhra Pradesh, India

Professional Objective:   To apply my Data Science skills in public health domain and help the society

Caleb Greski

Bachelor's Degree: 

Master's Degree: 

Work History: 

Courses Taught: 

Publications: 

Professional Objective: 

Moukthika Kadaparthi

Bachelor's Degree:   Electrical and Electronics Engineering, SASTRA Deemed University

Master's Degree:   Computers and Information Science, Cleveland State University

Work History:  

  • Business Intelligence Analyst, Philips Healthcare, Georgia
  • Graduate Research Assistant, Cleveland State University, Ohio 

Professional Objective:   My objective is to enter academia with the aim of sharing the practical applications of data science in diverse domains and its potential positive impacts. With my unique blend of academic rigor and industry experience, I am driven to analyze complex data sets using cutting-edge data science techniques, to provide actionable insights and support data-driven decision-making.

Qiaomu Li

Bachelor's Degree:   Civil Engineering, Huazhong University of Science and Technology, China

Master's Degree:   Business Analytics, Syracuse University

Work History:

  • Credit Modeling Analyst, Agricultural Development Bank of China
  • Research Assistant, Changjiang Securities
  • Graduate Assistant, Syracuse University

Courses Taught:  Calculus I, Marketing Analytics, Data Mining

Awards:   Merit-Based Scholarship, Syracuse University

Professional Objective:   To secure a challenging position in a reputable organization to expand myself within the field of Artificial Intelligence.

Kausar Perveen

Bachelor's Degree:   Bachelor in Engineering Software Engineering, National University of Sciences and Technology, Pakistan

Master's Degree:   Masters in Data Science, Illinois Institute of Technology, Chicago

Work History:

  • Fullstack Developer at ItRunsInMyFamily, Charleston, South Carolina
  • Software Engineer II , Xgrid Pakistan
  • Senior Research Coordinator, Aga Khan University Pakistan
  • Machine Learning Engineer, Agoda Thailand

Publications:  National cervical cancer burden estimation through systematic review and analysis of publicly available data in Pakistan 

Service and Awards:

  • Fulbright Scholarship award for Master’s degree in Data Science
  • Aga Khan Education Service Pakistan, merit cumulative need based scholarship for Bachelors in Software Engineering 

Professional Objective:  My main motivation behind getting a degree in Data Science is to receive and perform qualified research experience in Data Science and public health

Promi Roy

Bachelor's Degree:   Statistics, University of Dhaka, Dhaka, Bangladesh

Master's Degree:   Mathematics (Statistics Concentration), University of Toledo, Ohio

Work History:

  • Analytics Engineer Intern, Cooper Smith, Toledo, Ohio
  • Business Analyst, Akij Food and Beverage Limited, Dhaka, Bangladesh

Courses Taught:   Introduction to Statistics

Professional Objective:   I am interested to work as a data scientist in the industry

Ayomide Isaac Afolabi

Bachelor's Degree:  Chemical Engineering, Ladoke Akintola University of Technology 

Master's Degree:  Data Science, Auburn University 

Work History:   Graduate Research Assistant, Auburn University 

Courses Taught:   Python Programming 

Publications:   Larson EA, Afolabi A, Zheng J, Ojeda AS. Sterols and sterol ratios to trace fecal contamination: pitfalls and potential solutions. Environ Sci Pollut Res Int. 2022 Jul;29(35):53395-53402.  doi: 10.1007/s11356-022-19611-2 . Epub 2022 Mar 14. PMID: 35287190

Professional Objective:  To work as a research data scientist in the industry

Dinesh Chowdary Attota

Bachelor's Degree:   Computer Science, Jawaharlal Nehru Technological University Kakinada (JNTUK), India

Master's Degree:   Computer Science, Kennesaw State University

Work History:   Associate Consultant, SL Techknow Solutions India Pvt Ltd, India  2018 - 2020

Publications:  

  • An Ensemble Multi-View Federated Learning Intrusion Detection for IoT
  • A Conversational Recommender System for Exploring Pedagogical Design Patterns
  • An Ensembled Method For Diabetic Retinopathy Classification using Transfer Learning  

Professional Objective:   I'd like to be a faculty member at a university so that I can continue to do research.

Nzubechukwu Ohalete

Bachelor's Degree:   Mathematics, University of Nigeria, Nsukka

Master's Degree:   Applied Statistics, Bowling Green State University

Work History:   Graduate Assistant/Data Analyst, Federal University of Technology, Owerri - Mathematics Department

Courses Taught:  Elementary Mathematics, Mathematical Methods

Awards:   James A. Sullivan Outstanding Graduate Student Award, Applied Statistics and Operations Research Department, April 2022

Professional Objective:   To use data science techniques to solve problems which makes our lives better and also makes our world a better place

Ryan Parker

Bachelor's Degree:  Microbiology, University of Tennessee - Knoxville

Master's Degree:   Integrative Biology, Kennesaw State University

Work History:  Instructor of Biology, Kennesaw State University

Courses Taught:   Nursing Microbiology Lectures and Labs, Introductory Biology Labs, Biotechnology Lectures and Labs

Publications:

  • Parker RA, Gabriel KT, Graham K, Cornelison CT. Validation of methylene blue viability staining with the emerging pathogen Candida auris. J Microbiol Methods. 2020 Feb;169:105829.   doi: 10.1016/j.mimet.2019.105829 . Epub 2019 Dec 27. PMID: 31884053.
  • Parker RA, Gabriel KT, Graham KD, Butts BK, Cornelison CT. Antifungal Activity of Select Essential Oils against Candida auris and Their Interactions with Antifungal Drugs. Pathogens. 2022 Jul 22;11(8):821.   doi: 10.3390/pathogens11080821 . PMID: 35894044; PMCID: PMC9331469.

Awards:   Best Graduate Poster: Symposium for Student Scholars hosted by Kennesaw State University (Fall 2018) for Poster: "Antifungal Activity of Select Essential Oils and Synergism with Antifungal Drugs against Candida auris"

Professional Objective : To apply Data Science techniques to large scientific datasets, such as genomic and astronomical data, and to help bridge the gap between disparate fields by working in an interdisciplinary space to offer integrative and data-driven solutions to the increasingly complex problems presented to the traditional Sciences.

Askhat Yktybaev

Bachelor's Degree:   Forecasting and Strategic Management, Saint-Petersburg State University of Economics and Finance, Russia

Master's Degree:   Forecasting and Strategic Management, Saint-Petersburg State University of Economics and Finance, Russia; Public Administration in Economic Policy Management, School of International and Public Affairs, Columbia University

Work History:

  • from Data Analyst to Head of Research Unit, Central Bank of Kyrgyz Republic
  • Sr. Data Scientist in OJSC, Aiyl Bank, Kyrgyzstan
  • Consultant, The World Bank, Washington D.C.

Courses Taught:   Financial Programming in the Central Bank, Monetary Policy Transmission Mechanism

Service and Awards:   Winner of the Joint Japan/World Bank Graduate Scholarship Program, National Bank Silver Medal for Best Forecast

Professional Objective:   I want to found a successful Fintech startup one day.

Sanad Biswas

Bachelor's Degree:   Statistics, Biostatistics and Informatics, University of Dhaka, Bangladesh

Master's Degree:   Statistics, University of Toledo, OH

Work History:

  • Research Assistant: US Army Research Lab, Kennesaw State University
  • Consultant, Statistical Consulting Service, University of Toledo
  • Graduate Teaching Assistant, University of Toledo

Courses Taught:   Calculus and Business Calculus, Facilitated students’ study of Statistics courses at the University of Toledo.

Professional Objective:   To work as a researcher in the industry or as a faculty. I am primarily interested in the application of machine learning in different fields.

Mallika Boyapati

Bachelor's Degree:  Electronics and Computer Engineering, K L University, India

Master's Degree:  Applied Computer Science, Columbus State University

Work History:

  • T-Mobile, Seattle, WA, USA: Sr. Data analyst, 2018- 2021
  • UITS, Columbus State University, Columbus, GA, USA: Data Analyst -Graduate assistant, 2016-2018
  • Menlo Technologies, India: Jr. Data Analyst, Intern, 2014- 2016

Courses Taught:   DATA 4310 - Statistical Data Mining

Publications:

  • Anti-Phishing Approaches in the Era of the Internet of Things. In: Pathan, AS.K. (eds) Towards a Wireless Connected World: Achievements and New Technologies. Springer, Cham -   https://doi.org/10.1007/978-3-031-04321-5_3
  • An empirical analysis of image augmentation against model inversion attack in federated learning -   https://doi.org/10.1007/s10586-022-03596-1
  • M. Boyapati and R. Aygun, "Phishing Web Page Detection using Web Scraping," SoutheastCon 2023, Orlando, FL, USA, 2023, pp. 167-174, doi: 10.1109/SoutheastCon51012.2023.10115148.
  • M. Boyapati and R. Aygun, "Default Prediction on Commercial Credit Big Data Using Graph-based Variable Clustering," 2023 IEEE 17th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, 2023, pp. 139-142, doi: 10.1109/ICSC56153.2023.00029.
  • Boyapati, M., Aygun, R. (2023) Explainable Machine Learning for Default Prediction on Commercial Credit Big Data Using Graph-based Variable Clustering. In Encyclopedia with Semantic Computing and Robotic Intelligence VOL. 0 https://doi.org/10.1142/S2529737623500119

Awards:

  • Winners of Dataiku March Madness Bracket-thon, 2021 in predicting the NBA bracket
  • Winners of 2021 Analytics Day Ph.D. level research poster presentation

Professional Objective:   To leverage strong analytical and technical abilities to research and develop effective data models, visualize data, and uncover insights that make an impact in the field of data science

Nina Grundlingh

Bachelor's Degree:   Applied Mathematics and Statistics, University of KwaZulu-Natal, South Africa

Master's Degree:   Statistics, University of KwaZulu-Natal, South Africa

Courses Taught:   Introduction to Statistics, University of KwaZulu-Natal

Presentations:

  • Grundlingh, N., Zewotir, T., Roberts, D. & Manda, S. Modelling diabetes in South Africa. The 61st conference of the South African Statistical Association, 27-29 November 2019, Nelson Mandela University, South Africa.
  • Grundlingh, N., Zewotir, T., Roberts, D. & Manda, S. Modelling diabetes in the South African population. College of Agriculture, Engineering and Science Postgraduate Research & Innovation Symposium 2019, 17 October 2019, University of KwaZulu-Natal, Westville, South Africa (the award for best MSc presentation was also received for this).
  • Grundlingh, N., Zewotir, T., Roberts, D. & Manda, S. Modelling risk factors of diabetes and pre-diabetes in South Africa. IBS SUSAN-SSACAB 2019 Conference, 8-11 September 2019, Cape Town, South Africa.

Awards:

  • University of KwaZulu-Natal Postgraduate Research & Innovation Symposium 2019 – Best Masters oral presentation
  • South African Statistical Association Honours Project Competition 2018/2019 – 2nd place and special prize for best use of SAS

Professional Objective:   To work in a teaching position – sharing how data science can be applied to different fields and the positive impact it could have. I would like to use my theological background and passion to bring insight, clarity, and wisdom to data science problems. 

Namazbai Ishmakhametov

Bachelor's Degree:   Specialist in Mathematical Methods in Economics, Kyrgyz-Russian Slavic University

Master's Degree:   Analytics, Institute for Advanced Analytics at North Carolina State University

Work History:

  • Expert at the Centre for Economic Research, National bank of the Kyrgyz Republic
  • Consultant in World Bank project dedicated to strengthening the regulatory practices in Kyrgyz Republic
  • Consultant at Deloitte Consulting LLP, Science Based Services group, Analytics & Cognitive offering
  • Macroeconomic modeling expert in the Economic Department, National bank of the Kyrgyz Republic

Courses Taught:   Introductory statistics and econometrics (cross-sections, time series and panels) lecturer at Ata-Turk Alatoo International University, Kyrgyzstan

Publications:

  • Ishmakhametov Namazbai, Abdygulov Tolkunbek, Jenish Nurbek. 2020. “Impact of 2014-2015 shocks on economic behavior of the households in the Kyrgyz Republic”. Working Paper of the National Bank of the Kyrgyz Republic
  • Sherrill W. Hayes, Jennifer L. Priestley, Namazbai Ishmakhametov, Herman E. Ray. 2020. “‘I’m not Working from Home, I’m Living at Work’: Perceived Stress and Work-Related Burnout before and during COVID-19”. PsyArxiv Preprints
  • Ishmakhametov Namazbai, Arykov Ruslan. 2016. “Credit Risk Model on the Example of the Commercial Banks of the Kyrgyz Republic”. Working Paper of the National Bank of the Kyrgyz Republic
  • Namazbai Ishmakhametov, Anvar Muratkhanov. 2015. “Modeling strategy of the Bank of the Kyrgyz Republic”. National Bank of Poland – Swiss National Bank joint seminar. Zurich, Switzerland

Professional Objective:   To apply my quantitative skills in the field of biotech either in corporate or government sector

Symon Kimitei

Bachelor's Degrees:   Mathematics, Kennesaw State University, and Computer Science,  Kennesaw State University

Master's Degree:   Mathematics (Scientific Computing Concentration), Georgia State University 

Work History:   Senior Lecturer and Math Department Coordinator of Supplemental Instruction, Kennesaw State University

Courses Taught:   Calculus 1, Precalculus, Applied Calculus & College Algebra

Publications:

  • Haskin, S., Kimitei, S., Chowdhury, M., Rahman, F., Longitudinal Predictive Curves of Health-Risk Factors for American Adolescent Girls. Journal of Adolescent Health.  JAH-2021-00601R1
  • Symon K Kimitei,   Algorithms for Toeplitz Matrices with Applications to Image Deblurring . 2008. Georgia State University, Masters thesis. ScholarWorks 

Poster Presentations:

  • Kimitei, Symon & Sammie Haskin. "Nadaraya-Watson Kernel Regression Longitudinal Analysis of Healthcare Risk Factors of African American and Caucasian American Girls." Kennesaw State University R Day Presentation.  11 Nov. 2019. Poster presentation.
  • Kimitei, Symon. "Social Network Analysis in Supreme Court Case Rulings by Precedence Using SAS Optgraph/Python." 23rd Annual Symposium of Scholars. Kennesaw State University. 19 April 2018. Poster presentation.

Professional Objective:   As a Ph.D. student in Analytics & Data Science, I hope to gain skills in the program that will propel me into a Data Scientist / Machine Learning Engineer with a specialization in the design and implementation of deep learning & machine learning algorithms.

Jitendra Sai Kota

Bachelor's Degree:   Computer Science & Engineering, Amrita Vishwa Vidyapeetham, India

Master's Degree:   Computer Science, Florida State University

Work History:   Teaching Assistant Professor in Computer Science at an Engineering College in India

Courses Taught:   Problem Solving & Program Design through C, Artificial Intelligence, Data Mining

Publications:  Kota, Jitendra Sai, Vayelapelli, Mamatha. 2020. "Predicting the Outcome of a T20 Cricket Game Based on the Players' Abilities to Perform Under Pressure". IEIE Transactions on Smart Processing and Computing 9(3):230-237.   DOI: 10.5573/IEIESPC.2020.9.3.230

Professional Objective:   To work in Data Science in a corporate environment

Catrice Taylor

Bachelor's Degree:   Economics, Clemson University 

Master's Degrees:  Applied Economics and Statistics, Clemson University, and Applied Statistics, Kennesaw State University 

Professional Objective:   To work as an industry data scientist in a corporate environment 

Sahar Yarmohammadtoosky

Bachelor's Degree:   Applied Mathematics, Sheikh Bahaei University, Isfahan, Iran 

Master's Degree:   Applied Mathematics, Iran University of Science & Technology, Tehran, Iran

Courses Taught:  Numerical Analysis and Linear Algebra, Iran University of Science & Technology

Publications:   Noah, G., Sahar, Y., Anthony P. & Hung, C.C. "ISODS: An ISODATA-Based Initial Centroid Algorithm". Accepted to: 10th International Conference on Information, March 6 - 8, 2021, Hosei University, Tokyo, Japan

Professional Objective:   My goal is to become a competent Data Science specialist capable of using my skills to bring meaning to data, getting a faculty position at a university

Martin Brown

Bachelor's Degree:  Mathematics, Swansea University, United Kingdom

Master's Degree:  Mathematics, Murray State University

Work History:

  • Graduate Research Assistant, Kennesaw State University, August 2020 to present
  • Graduate Teaching Assistant, Murray State University, August 2018 to May 2020

Course Taught:  Problem Solving in Mathematics

Publications:   Brown, Martin K. W. "Evaluating an Ordinal Output using Data Modeling, Algorithmic Modeling, and Numerical Analysis" (2020).   Murray State Thesis and Dissertations 168 .

Awards:  David Pryce History of Mathematics Prize 2017-2018

Professional Objective:  To pursue a career in data science, machine learning, and predictive analytics to solve real-world issues 

Inchan Hwang

Bachelor’s Degree: Computer Science, Georgia Southwestern State University

Master’s Degree: Software Engineering, Ajou University, South Korea

Courses Tutored: Precalculus, College Algebra, Calculus I at Georgia Southwestern State University

Work History:

  • Tutoring College Algebra, Calculus I and II at Academic Skills Center, Georgia Southwestern State University
  • Research Assistant at Intelligence of HyperConnected Systems Lab of Ajou University
  • Fullstack web developer, Windows system programmer in the cybersecurity industry

Professional Objective: To work in big data analytics, and research and development of machine learning in engineering and security

Duleep Prasanna Rathgamage Don

Bachelor's degree:   Physics and Mathematics, The Open University of Sri Lanka

Master's degree:   Mathematics, Georgia Southern University

Work History:

  • Graduate Teaching Assistant, Georgia Southern University, 2016 - 2018
  • Graduate Teaching Assistant, University of Wyoming, 2019 - 2020

Courses Taught:   Trigonometry, and Calculus I & II

Publications/Presentations:

  • Don, R. D. and Iacob, I. E., ‘DCSVM: Fast Multi-class Classification using Support Vector Machines’,   International Journal of Machine Learning and Cybernetics .
  • Rathgamage Don, D., Iacob, E., ‘Divide and Conquer Support Vector Machine for Multiclass Classification’, Research Symposium (2018), Georgia Southern University.
  • Rathgamage Don, D., Iacob, E., ‘Multiclass Classification using Support Vector Machines’, MAA Southeastern Section Meeting (2018), Clemson University.

Professional Objective:   To work in big data analytics, and research and development of machine learning in engineering, and medicine

Linglin Zhang

Bachelor’s Degree:   Biological Sciences, Hubei University, China

Master’s Degree:   Chemical Biology, University of Michigan and Bioinformatics, Georgia Institute of Technology

Selected Publications:   Rebecca Shen, Zhi Li, Linglin Zhang, Yingqi Hua, Min Mao, Zhicong Li, Zhengdong Cai, Yunping Qiu, Jonathan Gryak, Kayvan Najarian. (2018). Osteosarcoma Patients Classification Using Plain X-Rays and Metabolomic Data. 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 690-693, 2018.

Professional Objective:  To become a researcher in industry or academia. My background in Biology and Bioinformatics could provide me with strong theoretical support for a research role in the health industry. My internship at Equifax equipped me with knowledge of business cases.

Yihong Zhang

Bachelor’s Degree:   Psychology Mathematics Interdisciplinary, Chatham University

Master’s Degree:   Mathematics and Statistics Allied with Computer Science, Georgia State University

Work History:

  • Research Assistant - Collaborated with biomedical department to analyze and visualize microarray gene expression data, Facilitated in data pre-processing and machine learning modeling of clinical liver cirrhosis image data, Assisted in feature engineering of image analysis in deep learning for pathology diagnosis with Mayo Clinic’s pilot project.
  • Graduate Lab Assistant - Tutored students with statistics and math subjects.

Professional Objective:   Make better use of data in healthcare and bioinformatic industry as a data scientist.

2019 - 2020

Trent Geisler

Graduation Date:   Summer 2022

Dissertation:   Novel Instance-Level Weighted Loss Function for Imbalanced Learning

Dissertation Advisor:   Dr. Herman Ray

Current Position:   Assistant Professor, Department of Systems Engineering, United States Military Academy West Point

Srivatsa Mallapragada

Bachelor’s Degree:  Mechanical Engineering, Andhra University College of Engineering, India

Master’s Degree: Mechanical Engineering, University of North Carolina at Charlotte

Work History:

  • Continuous Improvement Intern, Daimler Trucks North America at Cleveland, North Carolina, USA
  • Computational Fluid Dynamics (CFD) Graduate Research Assistant, NC Motorsports and Research Laboratory
  • Manufacturing Intern, Caterpillar India Pvt Ltd, Sriperambudur, India

Selected Publications/Presentations:

  • Mallapragada, S. (2017). Computational Investigations on the Aerodynamics of a Generic Car Model in Proximity to a Side Wall (Master’s thesis, The University of North Carolina at Charlotte).
  • Uddin, M., Mallapragada, S., & Misar, A. (2018). Computational Investigations on the Aerodynamics of a Generic Car Model in Proximity to a Side-Wall (No. 2018-01-0704). SAE Technical Paper.
  • Dimensionality Reduction of Hyperspectral Images for Classification, Srivatsa, M., Michael, W. & Hung, C. C. Ninth International Conference on Information ISSN: 1343-4500
  • Bounds, C., Mallapragada, S., and Uddin, M., "Overset Mesh-Based Computational Investigations on the Aerodynamics of a Generic Car Model in Proximity to a Side-Wall," SAE Int. J. Passeng. Cars - Mech. Syst. 12(3):211-223, 2019, https://doi.org/10.4271/06-12-03-0015.

Service and Awards: Base SAS Programmer V9

Professional Objectives: I am currently working in unsupervised pattern recognition in high dimensional data sets. After I graduate, I would like to pursue a career in Data Science and Machine Learning in the corporate environment.

Sudhashree Sayenju

Graduation Date:   Spring 2023

Dissertation:   Quantification and Mitigation of Various Types of Biases in Deep NLP Models

Dissertation Advisor:   Dr. Ramazan Aygun

Christina Stradwick

Bachelor’s Degree:  Music Performance and Mathematics, Marshall University

Master’s Degree:  Mathematics with Emphasis in Statistics, Marshall University

Courses Taught:  Prep for College Algebra at Marshall University

Selected Presentations:

  • Stradwick, C. Exploring the Variance of the Sample Variance. Spring Meeting of the Mathematical Association of America Ohio Section, University of Akron, 2019.
  • Stradwick, C., Vaughn, L., Hanan Khan, A. Data Modeling on Insurance Beneficiary Dataset. College of Science Research Expo 2018, Marshall University, 2018. Poster Presentation.
  • Stradwick, C. Disease modeling on networks. The 13th Annual UNCG Regional Mathematics and Statistics Conference, University of North Carolina at Greensboro, 2017. Poster Presentation.

Professional Objectives:  To work as a researcher in industry or in a laboratory setting. I would like to use my background in mathematics and statistics to develop novel solutions that address limitations in current data science techniques and to apply known data science methods to solve real-world problems.

2018 - 2019

Md Shafiul Alam

Graduation Date:   Fall 2022

Dissertation:   Appley: Approximate Shapley Values for Model Explainability in Linear Time

Dissertation Advisor:   Dr. Ying Xie

Current Position:   AI Framework Engineer, Intel Corporation

Jonathan Boardman

Dissertation:   Ethical Analytics: A Framework for a Practically-Oriented Sub-Discipline of AI Ethics

Current Position:   Data Scientist, Equifax

Tejaswini Mallavarapu

Bachelor’s Degree:   Pharmacy, Acharya Nagarjuna University, India

Master’s Degree:   Computer Science, Kennesaw State University

  • Graduate Research Assistant, Kennesaw State University, 2017-present
  • Research Analyst, Divis Laboratories, 2013-2014

Selected Publications:

  • T. Mallavarapu, Y. Kim, J.H. Oh, and M. Kang, "R-PathCluster: Identifying Cancer Subtype of Glioblastoma Multiforme Using Pathway-Based Restricted Boltzmann Machine," Proceedings of IEEE International Conference on Bioinformatics & Biomedicine (IEEE BIBM 2017), International Workshop on Deep Learning in Bioinformatics, Biomedicine, and Healthcare Informatics, Accepted, 2017.
  • M.R. Shivalingam, K.S.G. Arul Kumaran, D. Jeslin, Ch. MadhusudhanaRao, M. Tejaswini, "Design and Evaluation of Binding Properties of Cassia roxburghii Seed Galactomannan and Moringa oleifera Gum in the Formulation of Paracetamol Tablets," Research Journal of Pharmacy and Technology (RJPT). 3(1): Jan.-Mar. 2010; Page 254-256.
  • M.R. Shivalingam, K.S.G. Arul Kumaran, D. Jeslin, Y.V. Kishore Reddy, M. Tejaswini, Ch. MadhusudhanaRao, V. Tejopavan, "Cassia roxburghii Seed Galactomannan - a potential binding agent in the tablet formulation," Journal of Biomedical Science and Research (JBSR), Vol 2 (1), 2010, 18-22

Professional Objective:   To be a data scientist in the field of health care or bioinformatics where I can leverage my analytical skills and knowledge towards the advancement of the research field.

Seema Sangari

Dissertation:   Debiasing Cyber Incidents - Correcting for Reporting Delays and Under-reporting

Dissertation Advisor:   Dr. Michael Whitman

Current Position:   Principal Modeler, HSB 

Srivarna Settisara Janney

Bachelor’s Degree:   Mechanical Engineering, Visveswaraiah Technological University, India

  • Graduate Research Assistant, Kennesaw State University, 2016-2018
  • Senior Software Engineer, Torry Harris Business Solutions (THBS), United Kingdom, 2010-2012 and India, 2012-2014
  • Software Engineer, Torry Harris Business Solutions (THBS), India, 2007-2010

Selected Publications/Presentations:

  • S.S. Janney, S. Chakravarty, “New Algorithms for CS – MRI: WTWTS, DWTS, WDWTS”, One-page research paper, 40th International Conference of IEEE Engineering in Medicine and Biology Society (IEEE EMBC), Jul 2018
  • Master’s thesis presented at the Southeast Symposium on Contemporary Engineering Topics (SSCET), UAH Engineering Forum, Alabama, Aug 2018
  • Master’s thesis poster accepted for presentation at the Biomedical Engineering Society (BMES) 2018 Annual Meeting, Oct 2018
  • Submitted a draft book chapter contribution to “Bioelectronics and Medical Devices”, Elsevier, May 2018
  • Showcased 3MT, Georgia Council of Graduate Schools (GCGS), Apr 2018
  • Master’s thesis presented in the workshop “Medical Signal and Image Processing” at the Department of Biotechnology & Medical Engineering, NIT Rourkela, Feb 2018
  • S.S. Janney, I. Karim, J. Yang, C.C Hung, Y. Wang, “Monitoring and Assessing Traffic Safety Using Live Video Images”, GDOT project showcase, 4th Annual Transportation Research Expo, Sept 2016
  • 1st Place Winner, Graduate Research Project, C-day Poster Presentation, Kennesaw State University, Spring 2018
  • People's Choice Award, 3 Minute Thesis (3MT), Apr 2018
  • CCSE Dean’s 4.0 Club, Jan 2018
  • 3rd Place Winner, Hackathon 2017 - HPCC Systems Big Data
  • Foundation of Computer Science, Certified by Kennesaw State University, Jun 2016
  • Fundamental of RESTful API Design, Certified by APIGEE, Nov 2014
  • Member of HandsOnAtlanta, since 2014
  • SOA Associate, Certified by IBM, Jun 2008

Professional Objective:   I would like to be a researcher in Data Science and Analytics in medical imaging technologies contributing to advancements that would help medical and healthcare professionals provide value-based and personalized health care. I would like to look at career opportunities in industry and academia that fuel my interest in research.

2017 - 2018

Liyuan Liu

Graduation Date: Summer 2021

Dissertation: Incentive-based Data Sharing and Exchanging Mechanism Design

Dissertation Advisor: Dr. Meng Han

Current Position: Assistant Professor, Saint Joseph's University - Erivan K. Haub School of Business

Mohammad Masum

Dissertation: Integrated Machine Learning Approaches to Improve Classification Performance and Feature Extraction Process for EEG Dataset

Dissertation Advisor: Dr. Hossain Shahriar

Current Position: Assistant Professor, San Jose State University

Lauren Staples

Graduation Date: Fall 2021

Dissertation: A Distance-Based Clustering Framework for Categorical Time Series: A Case Study in the Episodes of Care Healthcare Delivery System

Dissertation Advisor: Dr. Joseph DeMaio

Current Position: Senior Data Scientist, Microsoft

2016 - 2017

Shashank Hebbar

Dissertation: Tree-BERT - Advanced Representation Learning for Relation Extraction

Dissertation Advisor: Dr. Ying Xie

Current Position: Data Scientist, Credigy

Jessica Rudd

Graduation Date: Summer 2020

Dissertation: Quantitatively Motivated Model Development Framework: Downstream Analysis Effects of Normalization Strategies

Dissertation Advisor: Dr. Herman Ray

Current Position: Senior Data Engineer, Intuit Mailchimp

Yan Wang

Graduation Date: Spring 2020

Dissertation: Data-driven Investment Decisions in P2P Lending: Strategies of Integrating Credit Scoring and Profit Scoring

Dissertation Advisor: Dr. Sherry Ni

Current Position: Applied Scientist II, Amazon

Lili Zhang

Dissertation: A Novel Penalized Log-likelihood Function for Class Imbalance Problem

Current Position: Data Scientist/Research Engineer, Hewlett Packard Enterprise

Yiyun Zhou

Dissertation: Attack and Defense in Security Analytics

Dissertation Advisor: Dr. Selena He

Current Position: NLP Data Scientist, NBME

2015 - 2016

Edwin Baidoo

Graduation Date:  Spring 2020

Dissertation: A Credit Analysis of the Unbanked and Underbanked: An Argument for Alternative Data

Dissertation Advisor:  Dr. Stefano Mazzotta

Current Position: Assistant Professor, Business Analytics, Tennessee Technological University

Bogdan Gadidov

Graduation Date:  Summer 2019

Dissertation: One- and Two-Step Estimation of Time Variant Parameters and Nonparametric Quantiles

Dissertation Advisor: Dr. Mohammed Chowdhury

Current Position: Data Scientist, Variant

Jie Hao

Dissertation:  Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis

Dissertation Advisor:  Dr. Mingon Kang

Current Position:  Assistant Professor, Chinese Academy of Medical Sciences, Peking Union Medical College

Linh Le

Graduation Date:  Spring 2019

Dissertation:  Deep Embedding Kernel

Current Position: Assistant Professor, Information Technology, Kennesaw State University

Bob Vanderheyden

Graduation Date: Fall 2019

Dissertation:  Ordinal Hyperplane Loss

Dissertation Advisor:  Dr. Ying Xie

Current Position:  Principal Data Scientist, Microsoft

PhD Program

Requirements for Doctor of Philosophy (Ph.D.) in Data Science

The goal of the doctoral program is to create leaders in the field of Data Science who will lay the foundation and expand the boundaries of knowledge in the field. The program provides a research-oriented education that teaches students the knowledge, skills, and awareness required to perform data-driven research and, building on this shared background, enables them to carry out research that expands the boundaries of knowledge in Data Science. The program spans foundational aspects, including computational methods, machine learning, mathematical models, and statistical analysis, as well as applications in data science.

Course Requirements

https://datascience.ucsd.edu/graduate/phd-program/phd-course-requirements/ 

Research Rotation Program

https://datascience.ucsd.edu/graduate/phd-program/research-rotation/

Preliminary Assessment Examination

The goal of the preliminary assessment examination is to assess students’ preparation for pursuing a PhD in data science, in terms of core knowledge and readiness for conducting research. The preliminary assessment is an advisory examination.

The preliminary assessment is an oral presentation that must be completed before the end of Spring quarter of the second academic year. To qualify for the assessment, students must have a GPA of 3.0 or above and must have completed three of the four required core courses. The student will choose a committee of three members, one of whom will be the student’s HDSI academic advisor. The other two committee members must be HDSI faculty members with 0% or more appointments; we encourage the student to select the second faculty member based on compatibility of research interests and the topic of the presentation. The student is responsible for scheduling the meeting and making a room reservation.

The student may choose to be evaluated on either (A) a scientific literature survey and data analysis or (B) a previous rotation project. The student will propose the topic of the presentation.

  • If the student chooses the survey theme, they should select a broad area that is well represented among HDSI faculty members, such as causal inference, responsible AI, or optimization. The student should survey at least 10 peer-reviewed conference or journal papers representative of at least the last 5 years of the field. The student should also present a novel, rigorous, and original analysis using publicly available data from the surveyed literature; this analysis may aim to answer a related or new research question.
  • If the student chooses the rotation project theme, they should prepare to discuss the motivation for the project, the analysis undertaken, and the outcome of the rotation.

For both themes, the student will describe their topic to the committee by writing a 1-2 page proposal that must be then approved by the committee. We emphasize that this is not a research proposal. The student will have 50 minutes to give an oral presentation which should include a comprehensive overview of previous work, motivation for the presented work or state-of-the-art studies, a critical assessment of previous work and of their own work, and a future outlook including logical next steps or unanswered questions. The presentation will then be followed by a Q&A session by the committee members; the entire exam is expected to finish within two hours. 

The committee will assess both the oral presentation and the student’s academic performance so far (especially in the required core courses). The committee will evaluate preparedness, technical skills, comprehension, critical thinking, and research readiness. Students who do not receive a satisfactory evaluation will receive a recommendation from the Graduate Program Committee regarding ways to remedy the gaps in preparation, or an opportunity to receive a terminal MS in Data Science degree, provided the student can meet the degree requirements of the MS program. If the lack of preparation is course-based, the committee can require that additional course(s) be taken to pass the exam. If the lack of preparation is research-based, the committee can require an evaluation after another quarter of research with an HDSI faculty member; the faculty member will provide this evaluation. The preliminary assessment must be successfully completed no later than completion of two years (or sixth quarter enrollment) in the Ph.D. program.

The oral presentation must be completed in-person. We recommend the following timeline so that students can plan their preliminary assessments:

  • Middle of winter quarter of second year: Student selects committee and proposes preliminary exam topic.  
  • Beginning of spring quarter of second year: Scheduling of exam is completed. 
  • End of spring quarter of second year: Exam. 

Research Qualifying Examination and Advancing to Candidacy

A research qualifying examination (UQE) is conducted by the dissertation committee, which consists of five or more members approved by the graduate division as per senate regulation 715(D). One senate faculty member must have a primary appointment in a department outside of HDSI. Faculty with a partial appointment of 25% or less in HDSI may be considered for meeting this requirement on an exceptional basis upon approval from the graduate division.

The goal of the UQE is to assess the ability of the candidate to perform independent critical research, as evidenced by a presentation and a written technical report at the level of a peer-reviewed journal or conference publication. The examination is taken after the student and his or her adviser have identified a topic for the dissertation and an initial demonstration of feasible progress has been made. The candidate is expected to describe his or her accomplishments to date as well as future work. The research qualifying examination must be completed no later than the fourth year or 12 quarters from the start of the degree program; the UQE is tantamount to the advancement to PhD candidacy exam.

A petition to the Graduate Committee is required for students who take the UQE after the 12-quarter deadline. Students who fail the research qualifying examination may file a petition to retake it; if the petition is approved, they will be allowed to retake it one (and only one) more time. Students who fail the UQE may also petition to transition to an MS in Data Science track.

Dissertation Defense Examination and Thesis Requirements

Students must successfully complete a final dissertation defense oral presentation and examination to the Dissertation Committee consisting of five or more members approved by the graduate division as per senate regulation 715(D).  One senate faculty member in the Dissertation Committee must have a primary appointment in a department outside of HDSI. Partially appointed faculty in HDSI (at 25% or less) are acceptable in meeting this outside-department requirement as long as their main (lead) department is not HDSI.

A dissertation in the scope of Data Science is required of every candidate for the PhD degree. HDSI PhD program thesis requirements must meet Regulation 715(D) requirements. The final form of the dissertation document must comply with published guidelines by the Graduate Division.

The dissertation topic will be selected by the student, under the advice and guidance of the Thesis Adviser and the Dissertation Committee. The dissertation must contain an original contribution of a quality that would be acceptable for publication in the academic literature and that either extends the theory or methodology of data science or uses data science methods to solve a scientific problem in applied disciplines.

The entire dissertation committee will conduct a final oral examination, which will deal primarily with questions arising out of the relationship of the dissertation to the field of Data Science. The final examination will be conducted in two parts. The first part consists of a presentation by the candidate followed by a brief period of questions pertaining to the presentation; this part of the examination is open to the public. The second part of the examination will immediately follow the first part; this is a closed session between the student and the committee and will consist of a period of questioning by the committee members.

Special Requirements: Generalization, Reproducibility and Responsibility

A candidate for the doctoral degree in data science is expected to demonstrate evidence of generalization skills as well as evidence of reproducibility in research results. Evidence of generalization skills may take the form of (but is not limited to) generalization of results across domains or across applications within a domain, generalization of the applicability of the proposed method(s), or generalization of thesis conclusions rooted in formal or mathematical proof or in quantitative reasoning supported by robust statistical measures. The reproducibility requirement may be satisfied by supplementary material consisting of a code and data repository. The dissertation will also be reviewed for responsible use of data.

Special Requirements: Professional Training and Communications

All graduate students in the doctoral program are required to complete at least one quarter of experience in the classroom as teaching assistants, regardless of their eventual career goals. Effective communication and the ability to explain deep technical subjects are considered key measures of a well-rounded doctoral education. Thus, Ph.D. students are also required to take the 1-unit course DSC 295 (Academia Survival Skills) and earn a Satisfactory grade.

Obtaining an MS in Data Science

PhD students may obtain an MS Degree in Data Science along the way or a terminal MS degree, provided they complete the requirements for the MS degree.

NYU Center for Data Science

Harnessing Data’s Potential for the World

PhD in Data Science

An NRT-sponsored program in Data Science

Degree Requirements

Degree requirements for the PhD in Data Science can be found in the NYU bulletin under Doctor of Philosophy in Data Science.

To be awarded the Ph.D. in Data Science, students must, within 10 years of first enrolling:

  • Complete 72 credit hours while maintaining a cumulative grade point average of 3.0 (out of 4.0) each semester.
  • Complete the teaching requirement (for incoming students Fall 2020 and later).
  • Pass a Comprehensive Exam.
  • Pass the Depth Qualifying Exam (DQE) by May 15 of their fourth semester.
  • Complete all the steps for approval of their Ph.D. dissertation.

For more information on the Ph.D. curriculum and requirements, please visit the Ph.D. Student Handbook. Please note that you will only be able to access the handbook through your NYU email address.

Required Course Information

Students must successfully complete the following courses by the end of their third semester unless otherwise stated, or show evidence that they have taken equivalent coursework elsewhere. Course descriptions can be found in NYU’s Albert Course Search.

  • DS-GA 2003 – Introduction to Data Science for PhD Students
  • DS-GA 1002 – Probability and Statistics for Data Science
  • DS-GA 1003 – Machine Learning
  • DS-GA 1004 – Big Data
  • DS-GA 1005 – Inference and Representation
  • DS-GA 2001 – Research Rotation: A research rotation is a semester-long guided research experience in which the student will have an opportunity to design and carry out original research in a collaborative setting. The idea is to help students identify research interests. Ph.D. students take this course 6 times.

39 credit hours of elective courses  (for incoming students starting Fall 2020 and later)

Students must successfully complete 39 credit hours of elective courses. Faculty at the Center for Data Science are experts in a broad range of data science topics, and the Center’s course offerings reflect that diversity. For example, students will be able to take courses in Deep Learning, Optimization, and Natural Language Processing.

Some of the electives offered at the Center for Data Science are below. Please see NYU’s Albert Course Search for course descriptions.

  • Deep Learning (DS-GA 1008)
  • Practical Training for Data Science (DS-GA 1009): Practical Training offers course credit for academically relevant internship experience. This is an integral part of the Ph.D. Program curriculum and supports students’ academic and professional development. The course allows students to apply their academic and research knowledge to real-world problems.
  • Independent Study (DS-GA 1010)
  • Natural Language Processing with Representation Learning (DS-GA 1011)
  • Natural Language Understanding and Computational Semantics (DS-GA 1012)
  • Mathematical Tools for Data Science (DS-GA 1013)
  • Optimization and Computational Linear Algebra (DS-GA 1014)
  • Text as Data (DS-GA 1015)
  • Computational Cognitive Modeling (DS-GA 1016)
  • Responsible Data Science (DS-GA 1017)
  • Probabilistic Time Series Analysis (DS-GA 1018)
  • Communication Skills (DS-GA 2002)

Students can take electives outside of the Center for Data Science with permission from the Director of Graduate Studies (DGS).

Typical Schedule (Incoming Students Fall 2020 and later)

Typically, a student’s first 3 years will follow a schedule like the one outlined below. The student’s remaining years will consist of electives and work on his or her research and dissertation.

  • DS-GA 2003 Introduction to Data Science for PhD Students
  • DS-GA 1002 Probability and Statistics for Data Science
  • DS-GA 2001 Research Rotation
  • DS-GA 1003 Machine Learning
  • DS-GA 1004 Big Data
  • DS-GA 2001 Research Rotation
  • DS-GA 1005 Inference and Representation
  • Approved Elective
  • Approved Elective

Teaching Requirement  (for incoming students starting Fall 2020 and later)

By the end of the fourth year of study, each student must have served as a section leader or instructor for at least two courses at the Center for Data Science (for students starting the program in Fall 2023 or later). For students who started the program between Fall 2020 – Fall 2022, the requirement is at least one course at the Center for Data Science.

Courses on related topics outside the Center may also be used to satisfy this requirement, subject to approval by the DGS. The student must also participate in the Center’s teacher training session during or before the semester in which they teach. In certain circumstances, the DGS may allow the student to satisfy this requirement by serving as a course assistant or as a grader. These exceptions will be determined by the DGS based on the availability of suitable recitations.

Comprehensive Exam

The comprehensive exam is designed to determine whether the candidate displays the requisite data science knowledge to pursue their research.

For students starting the program in Fall 2024 and later: To fulfill this requirement, students will submit a 4-page report describing their work during their first year and a plan of their future research at the end of their second semester. The student will also give a 10-minute presentation in front of a pre-committee of three faculty (which will include their research advisors). The committee will determine whether the student is progressing adequately based on their academic performance (including grades and feedback from course instructors), the presentation, and the report.

For students who started the program prior to Fall 2024: The comprehensive exam consists of material from DS-GA 1003 Machine Learning and DS-GA 1004 Big Data. To fulfill this requirement, students must receive an A- or above as their final grade for each of the courses above (for students starting Fall 2020 – Fall 2023). Students are expected to complete this requirement by the end of their second semester.

Depth Qualifying Exam (DQE)

No later than the end of the third semester, each student must:

  • Reach an agreement with a research advisor. The student is responsible for finding a research advisor, obtaining an agreement to advise the student, and informing the Director of Graduate Studies (DGS) of the agreement. Students must reach an agreement with the DGS and the Manager of Academic Affairs if they wish to change research advisors. If a research advisor determines that he or she no longer wishes to advise a student, the research advisor informs the DGS, who will begin working with the student to find another research advisor.
  • Agree with his or her research advisor on a research project, an exam topic, and a Depth Qualifying Exam (DQE) committee.
  • Obtain the approval of the DGS on the research project, exam topic, and DQE committee, as well as the date of the DQE exam.

No later than the end of their fourth semester, the student must pass the depth qualifying exam (DQE). The exam may be taken no more than twice. The content of the exam is defined by the student’s DQE Committee, which must present a syllabus to the student at least 2 months before the date of the exam.

For incoming students Fall 2020 and later, the exam itself consists of a presentation by the student on original research carried out independently or in collaboration with faculty, research staff, or other students. This can include research done in the research rotations or other research conducted by the student in their area of interest. The goal of the DQE is to confirm the student’s knowledge of research in their area of interest.

Ph.D. Dissertation

Dissertation Proposal Approval

CDS PhD students are encouraged to identify their dissertation proposal committee by the end of their second year. Students should consult with their advisor and/or the DGS. The student works with their research advisor to select a dissertation proposal approval committee, obtains approval of this committee from the DGS, submits a written dissertation proposal to the committee, and obtains the approval of the committee. The committee consists of at least three members and may include individuals of similar standing from outside CDS. At least one member must be a CDS faculty member (a CDS joint faculty member, a member of the CDS PhD Advisory Group, or CDS-affiliated faculty; see the Areas & Faculty page). Students should have their dissertation proposal approved no later than the end of their third year. However, this is a guideline. Students are encouraged to determine the timing of the dissertation proposal in consultation with their advisor and/or the DGS.

Dissertation Approval

A successful defense is required for award of the PhD. 

The PhD defense committee must have at least 5 members, including the advisor(s). Three of the members must be CDS faculty (a CDS joint faculty member, a member of the CDS PhD Advisory Group, or CDS-affiliated faculty; see the Areas & Faculty page), and 1 must be an external member (in a related area from another NYU department or from an area institution, with approval from the DGS). The membership of the defense committee is proposed by the student and approved by the DGS.

In addition, students must comply with all of the procedures of NYU’s Graduate School of Arts and Science related to the submission of their dissertation.

Boston University Academics

PhD in Computing & Data Sciences

For more information and to get in touch, please visit the Faculty of Computing & Data Sciences website.

The PhD program in Computing & Data Sciences (CDS) at Boston University prepares its graduates to make significant contributions to the art, science, and engineering of computational and data-driven processes that are woven into all aspects of society, economy, and public discourse. Their work leads to the solution of problems and the synthesis of knowledge related to the methodical, generalizable, and scalable extraction of insights from data, as well as to the design of new information systems and products that enable actionable use of those insights to advance scholarly and practical pursuits in a wide range of application domains.

Applicants to the PhD program in CDS are expected to have earned a bachelor’s or master’s degree in one of the methodological or applied disciplines relating to the computational and data-driven areas of scholarship in CDS. They are expected to possess basic mathematical and computational competencies, and demonstrable propensity for cross-disciplinary work. To accommodate a diversity of student backgrounds and preparations, a holistic admission review is utilized. As such, GRE tests and scores are not required, but could be optionally provided and considered as part of the applicant’s portfolio, which may also include evidence of prior, relevant preparation, including creative works, software code repositories, etc. Special attention will be paid to applicants from underrepresented minorities in computing and data science disciplines.

Completion of the PhD degree in CDS requires coursework covering breadth and depth topics spanning the foundational, applied, and sociotechnical dimensions of computing and data science; completion of research rotations that expose students to ongoing projects; completion of a cohort-based training on ethical and responsible computing; and successful proposal and defense of a doctoral thesis.

For their thesis work, and in preparation for careers in academia, industry, and government, CDS PhD students are expected to pursue theoretical, applied, or empirical studies leading to solution of new problems and synthesis of new knowledge in a topic area determined in consultation with their mentors and collaborators, which may include external researchers and practitioners in industrial and academic research laboratories.

Upon completion of the program, students will be prepared to pursue careers in which they lead independent cutting-edge research and development agendas, whether in academia (by teaching, mentoring, and supervising teams of students engaged in scholarly pursuits) or in industry (by collaborating, directing, and effectively managing diverse teams of practitioners working at the forefront of industrial R&D).

Learning Outcomes

The following learning outcomes explain what you will be able to do at the end of your time as a CDS PhD candidate, as a result of earning your degree.

  • Exhibit a strong grasp of the principles governing the design and implementation of the methodological approaches for computational and data-driven inquiry.
  • Identify the literature and demonstrate mastery of the compendium of works relevant to a well-defined area of research inquiry in computing and data sciences.
  • Show capacity to engage meaningfully in and materially contribute to multidisciplinary research and development endeavors.
  • Evidence a strong sense of social and professional responsibility for decisions related to the development and deployment of computational and data-driven technologies.
  • Assess and argue the merits, limitations, and possibilities of new research work in a specialized area at the level commensurate with standards of scholarly venues in that area.
  • Formulate and pursue a research agenda leading to solution of new problems and to synthesis of new knowledge shared through peer-reviewed publications.

Course Requirements

Sixteen semester courses (64 credits) are required for post-BA/BS students and 12 semester courses (48 credits) are required for post-MA/MS students. Students with prior graduate work (including master’s degrees) may be able to transfer up to two courses (8 credits) as long as these credits were not used to fulfill matriculation requirements, upon the recommendation of the student’s academic advisor, and subject to approval by the Associate Provost for CDS.

Of the 16 courses, up to 3 undergraduate courses (12 credits) may be counted as background courses, selected in consultation with the student’s academic advisor and subject to approval by the Associate Provost for CDS. Other than these remedial courses, all other courses must be graduate-level courses or directed studies offered by CDS or by other BU departments in order to satisfy the following degree requirements.
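The credit arithmetic in the two paragraphs above can be checked with a short sketch. It assumes every course carries 4 credits (implied by 16 courses = 64 credits) and that approved transfer courses count toward the total. The function name, track labels, and parameters are hypothetical and are not part of any BU system.

```python
CREDITS_PER_COURSE = 4  # assumption implied by "16 courses (64 credits)"

# Course counts stated in the bulletin text.
REQUIRED_COURSES = {"post-bachelors": 16, "post-masters": 12}
MAX_TRANSFER_COURSES = 2  # up to 8 credits, subject to approval


def credits_remaining(track, completed_courses=0, transferred_courses=0):
    """Return the credits still needed on a given track.

    Assumes approved transfer courses count toward the required total.
    """
    if track not in REQUIRED_COURSES:
        raise ValueError(f"unknown track: {track}")
    if not 0 <= transferred_courses <= MAX_TRANSFER_COURSES:
        raise ValueError("at most two courses (8 credits) may be transferred")
    required = REQUIRED_COURSES[track]
    remaining = max(required - completed_courses - transferred_courses, 0)
    return remaining * CREDITS_PER_COURSE


if __name__ == "__main__":
    print(credits_remaining("post-bachelors"))
    print(credits_remaining("post-masters", transferred_courses=2))
```

Whether a given transfer is accepted still depends on the advisor's recommendation and the Associate Provost's approval, which a calculation like this cannot capture.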

The methodology core requirement ensures that students possess foundational knowledge and competencies in a subset of the following eight methodological areas of CDS:

  • Mathematical Foundations of Data Science
  • Statistical Modeling and Inference
  • Efficient and Scalable Algorithms
  • Predictive Analytics and Machine Learning
  • Combinatorial Optimization and Algorithms
  • Computational Complexity
  • Programming and Software Design
  • Large-scale Data Management

A list of courses that can be used to satisfy these competencies will be maintained on the website for CDS. Students who start their PhD program in CDS are expected to satisfy at least six of these competencies. Students who complete the course requirement for the PhD program in a cognate discipline are expected to satisfy at least four of these competencies.

The subject core requirement ensures that students establish depth in one area of inquiry that is aligned with either the methodological or applied dimensions of CDS. Subject areas are defined by groups of CDS faculty members working in related disciplinary and/or interdisciplinary areas of research who expect their prospective students to have enough depth in the subset of topics to enable them to tackle doctoral-level research in these topics. The set of subject areas as well as a list of preapproved graduate-level courses offered in CDS or elsewhere at BU that can be used to satisfy each subject area will be maintained on the website for CDS.

During the first two years in the program, all PhD candidates in CDS must complete three cohort-based requirements; namely, a two-semester training course (4 credits) covering various aspects of the responsible and ethical conduct of computational and data-driven research, a two-semester doctoral seminar (4 credits) that introduces them to the research portfolios of CDS faculty members as well as to the skills and capacities needed for success as scholars, and at least two research or lab rotations (8 credits) that expose them to real-world computational and data-driven applications that must be tackled through effective multidisciplinary teamwork.

A cumulative GPA not less than 3.3 must be maintained for all non-Pass/Fail courses taken to satisfy the methodology core requirement and the subject core requirement of the degree, excluding any background courses and excluding any transferred credits. Students who receive grades of B– or lower in any three courses taken at BU will be withdrawn from the program.
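The GPA and low-grade rules above can be sketched as a small check. The grade-point values below are an assumed conventional 4.0 scale (the bulletin text does not list them), the GPA is assumed to be credit-weighted, and the `Course` fields and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumed 4.0-scale grade points; not taken from the bulletin text.
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D": 1.0, "F": 0.0,
}
MIN_GPA = 3.3
LOW_GRADE_CUTOFF = GRADE_POINTS["B-"]  # "B- or lower"
MAX_LOW_GRADES = 3


@dataclass
class Course:
    grade: str                 # letter grade, or "P" for a Pass/Fail course
    credits: int = 4
    core: bool = True          # counts toward methodology or subject core
    background: bool = False   # undergraduate background course
    transferred: bool = False  # credit transferred from elsewhere


def core_gpa(courses):
    """Credit-weighted GPA over graded core courses, excluding
    background and transferred courses. Returns None if none count."""
    counted = [c for c in courses
               if c.core and not c.background and not c.transferred
               and c.grade in GRADE_POINTS]
    total = sum(c.credits for c in counted)
    if total == 0:
        return None
    return sum(GRADE_POINTS[c.grade] * c.credits for c in counted) / total


def low_grade_count(courses):
    """Number of courses taken at BU (not transferred) graded B- or lower."""
    return sum(1 for c in courses
               if not c.transferred and c.grade in GRADE_POINTS
               and GRADE_POINTS[c.grade] <= LOW_GRADE_CUTOFF)


def standing(courses):
    if low_grade_count(courses) >= MAX_LOW_GRADES:
        return "withdrawn"
    gpa = core_gpa(courses)
    if gpa is not None and gpa < MIN_GPA:
        return "below minimum GPA"
    return "good standing"
```

Note that the two rules cover different sets of courses: the GPA threshold applies only to graded core courses, while the three-course limit applies to any course taken at BU.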

Language Requirement

There is no foreign language requirement for the PhD degree in CDS.

Qualifying Examinations

No later than the end of the sixth semester (third year), all PhD candidates in CDS must pass a public oral examination administered by a committee of three faculty members, chaired by the student’s research (and presumptive thesis) advisor or coadvisors. The oral area exam is meant to establish the student’s mastery of a well-defined area of scholarship and preparedness to pursue original research in that area. The oral area examination may require completion of a survey paper or a pilot project ahead of the examination. The scope of the examination, as well as any additional requirements, should be developed in consultation with, and with the approval of, the research advisor(s) at least one semester prior to the exam.

Dissertation and Final Oral Examination

Candidates shall demonstrate their abilities for independent study in a dissertation representing original research or creative scholarship. A prospectus for the dissertation must be successfully defended no later than the end of the eighth semester (fourth year) of study.

Candidates must undergo a final oral examination no later than the end of the 10th semester (fifth year) of study in which they defend their dissertation as a valuable contribution to knowledge in their field and demonstrate a mastery of their field of specialization in relation to their dissertation.

Both the prospectus and final dissertation must be administered by a dissertation committee of at least three readers (including the dissertation advisor or coadvisors) and chaired by a CDS faculty member who is not one of the readers.

Terms of Use

Note that this information may change at any time. Read the full terms of use .

Accreditation

Boston University is accredited by the New England Commission of Higher Education (NECHE).

William & Mary

Ph.D. with specialization in Data Science

The Ph.D. in Data Science Studies at William & Mary is offered as a specialization within Applied Science, with the core mission of training students in the use of exceptionally large, heterogeneous datasets to drive decision-making across a wide range of fields (from physics to the social sciences). Graduate students complete a core sequence of coursework as a cohort, and then work closely with an advisory committee to complete the degree program. Competitive stipends and tuition are provided to selected students (stipends for AY23-24 are about $30,000).

To receive a Doctor of Philosophy in Applied Science with a Specialization in Data Science, the candidate must:

  • Complete a sequence of coursework, normally lasting two years.
  • Pass a comprehensive qualifying examination designed to demonstrate competence.
  • Produce and defend a dissertation prospectus which details the research anticipated after coursework is completed, including preliminary quantitative work.
  • Carry out a substantive original research project, and produce a dissertation describing this research which is approved by the student’s advisory committee and successfully defended in a public oral examination.

What to Expect & What We Expect

Be prepared for a rigorous program that emphasizes the analysis of large datasets, frequently in applied domains using machine learning techniques. You will take courses in both the underlying mathematical foundations and computational techniques used to define, implement, and validate models across a range of disciplines. 

Generally, we expect students applying to this program will have a background in computer programming, probability and statistics.  Most successful candidates will also have some experience working with large datasets in applied contexts.  Python is the most commonly used language, though some courses and laboratories use alternatives such as R, Scala, or compiled languages.

Most students will start their program during the Fall semester (while spring admissions are possible, they will only occur under exceptional circumstances).  In many cases, a Ph.D. student might expect a schedule similar to:

  • Year 1:  Coursework (Mathematical and Computational Methods, Applied Machine Learning, Bayesian & Frequentist Statistics, Deep Learning, Network Analysis), first research lab experiences, TAing a course, Doctoral Research Seminar.
  • Year 2:  Coursework (Mathematical and Computational Methods II, Data Engineering, Natural Language Processing, Probabilistic Programming, Reinforcement Learning, Directed Research), qualifiers, dissertation prospectus defense, TAing a course.
  • Year 3+: Dissertation, full time research in a research lab, annual evaluation of progress to dissertation by the graduate committee.

See the Graduate Catalog for details.

How to Apply

Data Science Ph.D. students are admitted to the Graduate program in Applied Science, in which they will earn a specialization in Data Science. Applications are submitted online. Note that we do not currently require GRE scores, but they can be optionally submitted for consideration.

Applications are accepted on a rolling basis, but we aim to make our first round of decisions during the spring semester each year.  To be as competitive as you can be, we recommend your application be submitted by February 15th. 

The application process includes:

  • Registering for an Account, and clicking "Start New Application"
  • Choosing to apply to the "Graduate Arts and Sciences" school, and then the "Applied Science" program.  
  • Providing various background and demographic information.
  • Providing a synopsis (3,500 characters or fewer) of your past project experience, including any publications.
  • Providing a synopsis (3,500 characters or fewer) of your background, including extracurricular activities and general experience.
  • Writing a personal essay describing your career plans and rationale for the pursuit of graduate study in Data Science - we recommend you identify members of our faculty that may match your interests in this essay.
  • For your "Applied Science Research Interest", you will choose "Data Science".
  • For the advisor you are interested in working with, you will choose the graduate director, "Dan Runfola"; graduate advisor assignment will occur after admission.  You will leave your second choice blank.

If you have any questions about the admissions process, you can contact Dr. Dan Runfola, the Graduate Director of Applied Science ( [email protected] ).

What Is a PhD in Data Science?

If you have always been fascinated by science, especially if you are interested in statistics and the scientific method, then a PhD in Data Science might be for you.

Data science is a field of study dedicated to applying statistics to problems in data visualisation, data analysis and machine learning. In this field, the challenge is to use data analysis and mathematical formulas to predict data patterns and draw conclusions from them.

Data science has become popular because it covers a wide range of topics, including the use of statistical methods for analysing and interpreting data. The primary goal of the discipline is to explain the way data enters the scientific community and influences decisions. Data is analysed to find patterns and connections, and then possible solutions are explored. With big data and new statistical computing methods, patterns can be uncovered, and relationships can be tested.

As more and more industries rely on information generated by computers, data science will be one of the key players in the future.

Browse PhDs in Data Science

  • Application of artificial intelligence to multiphysics problems in materials design
  • Study of the human-vehicle interactions by a high-end dynamic driving simulator
  • Physical layer algorithm design in 6G non-terrestrial communications
  • Machine learning for autonomous robot exploration
  • Detecting subtle but clinically significant cognitive change in an ageing population

What Does a PhD in Data Science Focus On?

The primary focus of a PhD in Data Science is statistical methods. This means that you would study statistics in all its forms at the macroscopic and microscopic level, including statistical computer science, theory and applied mathematics. The advantage is that you gain insight into how large-scale data works. A PhD can therefore open up positions at companies where you analyse large amounts of project data.

PhD programmes in data science give students a thorough grounding in the theoretical aspects of the discipline, and also teach its practical aspects. PhD students learn how to conduct proper experiments and interpret the results of scientific studies.

Data and its interpretation are of paramount importance in all fields, and a PhD programme in data science addresses this topic, with some institutions also offering taught modules that doctoral students can use to deepen their knowledge.

Within the field of data science, there are several areas of focus. One of them is the analysis of large databases and their effective interpretation. With this doctoral qualification, you could conduct statistical analysis, research studies and even exploratory data analysis. You could see what kinds of relationships exist between variables. You can explore areas such as Databases, Human Resource Management, Machine Learning, or Information Technology during your studies.

Entry Requirements for a PhD in Data Science

A PhD in Data Science involves conducting original research in this area; therefore, applicants must have a good knowledge of statistical methods, computing, probability and other related topics.

Basic requirements are typically a strong Master’s degree in mathematics, computer science or statistics from an accredited university. International students will also need to meet several minimum English language requirements set by the university, usually as part of a TOEFL or IELTS exam.

Although there are many advantages to obtaining a PhD in Data Science, it requires hard work and perseverance to master the techniques of analysis; to become an effective researcher, you will need strong mathematical and logical skills.

If you are interested in a PhD in Data Science but are unsure whether you have the background or resources available, consider taking a Master’s degree in this subject, or if you are a prospective student, contact the department you are interested in to see if they have any advice for you.

Duration and Programme Types

You can earn a PhD in data science in as little as 3 years full-time or 6 years part-time at a leading university. There are also online courses; many universities offer online PhD programmes which allow you to complete your entire doctoral programme from home. You still need to meet your course requirements by attending lectures and doing laboratory work, but your work can be completed at your own pace and off-campus.

Costs and Funding

The cost of a PhD in Data Science will depend on the university you study with, but the average tuition fee is £4000-£6000 per academic year for UK/EU students and £16,000-£19,000 per academic year for international students.

Due to the popularity of Data Science PhD projects and the increasing demand for individuals who can analyse large data sets in depth, it is not difficult to obtain PhD funding in this area. In many cases, funding for full-time research can be obtained from the university’s Centre for Doctoral Training (CDT), covering tuition fees and living costs.

Available Career Paths

A PhD in Data Science will enhance your data analysis skills and allow you to specialise in areas not available to others. A PhD offers many opportunities for those interested in statistics; you could become an engineer, statistician, consultant or academic lecturer. There are even PhDs in Data Science that offer internships in financial institutions or government agencies. Upon completing your doctorate, you can enter the workforce in many areas depending on your aptitude and experience.

A PhD in Data Science can lead to a wide range of jobs in many fields. If you are interested in working for a company that uses data one way or another, a PhD would be the perfect choice for you. If you are interested in independent research and studying various scientific methods and data, you will do well with a PhD. You could also spend your time teaching or doing your own research.

A person who has a PhD in data science can work in many industry-related positions. For example, you may work in the financial industry as an analyst for mergers and acquisitions, in healthcare as a statistician, or as an information systems administrator. You can even get a job as an IT analyst, project manager, or software designer.

You can use your knowledge in the workplace to start up your own small business. Many small businesses today are founded on the back of a PhD. In fact, many Fortune 500 companies started as a result of a doctor trying to solve a problem or answer a long-standing question plaguing their industry.

Thesis/Capstone for Master's in Data Science | Northwestern SPS - Northwestern School of Professional Studies

Data Science

Capstone and Thesis Overview

Capstone and thesis are similar in that they both represent a culminating, scholarly effort of high quality. Both should clearly state a problem or issue to be addressed. Both will allow students to complete a larger project and produce a product or publication that can be highlighted on their resumes. Students should consider the factors below when deciding whether a capstone or thesis may be more appropriate to pursue.

A capstone is a practical or real-world project that can emphasize preparation for professional practice. A capstone is more appropriate if:

  • you don't necessarily need or want the experience of the research process or writing a big publication
  • you want more input on your project, from fellow students and instructors
  • you want more structure to your project, including assignment deadlines and due dates
  • you want to complete the project or graduate in a timely manner

A student can enroll in MSDS 498 Capstone in any term. However, capstone specialization courses can provide a unique student experience and may be offered only twice a year. 

A thesis is an academic-focused research project with broader applicability. A thesis is more appropriate if:

  • you want to get a PhD or other advanced degree and want the experience of the research process and writing for publication
  • you want to work individually with a specific faculty member who serves as your thesis adviser
  • you are more self-directed, are good at managing your own projects with very little supervision, and have a clear direction for your work
  • you have a project that requires more time to pursue

Students can enroll in MSDS 590 Thesis as long as there is an approved thesis project proposal, identified thesis adviser, and all other required documentation at least two weeks before the start of any term.

From Faculty Director, Thomas W. Miller, PhD

“Capstone projects and thesis research give students a chance to study topics of special interest to them. Students can highlight analytical skills developed in the program. Work on capstone and thesis research projects often leads to publications that students can highlight on their resumes.”

A thesis is an individual research project that usually takes two to four terms to complete. Capstone course sections, on the other hand, represent a one-term commitment.

Students need to evaluate their options prior to choosing a capstone course section because capstones vary widely from one instructor to the next. There are both general and specialization-focused capstone sections. Some capstone sections offer individual research projects, others offer team research projects, and a few give students a choice of individual or team projects.

Students should refer to the SPS Graduate Student Handbook for more information regarding registration for either MSDS 590 Thesis or MSDS 498 Capstone.

Capstone Experience

If students wish to engage with an outside organization to work on a project for capstone, they can refer to this checklist and lessons learned for some helpful tips.

Capstone Checklist

  • Start early — set aside a minimum of one to two months prior to the capstone quarter to determine the industry and modeling interests.
  • Networking — pitch your idea to potential organizations for projects and focus on the business benefits you can provide.
  • Permission request — make sure your final project can be shared with others in the course and the information can be made public.
  • Engagement — engage with the capstone professor prior to and immediately after getting the dataset to ensure appropriate scope for the 10 weeks.
  • Teambuilding — recruit team members who have similar interests for the type of project during the first week of the course.

Capstone Lessons Learned

  • Access to company data can take longer than expected; not having access before or at the start of the term can severely delay progress.
  • The project timeline should align with the coursework timeline as closely as possible.
  • Designate one point of contact (POC) for communication with the business, to keep messages streamlined and make time with the organization more effective.
  • Manage expectations on both sides: for the business, the work is pro bono; for students, it does not guarantee internship or job opportunities.
  • Data security or masking that is not completed in time can put the whole opportunity at risk.

Publication of Work

Northwestern University Libraries offers an option for students to publish their master’s thesis or capstone in Arch, Northwestern’s open access research and data repository.

Benefits of publishing your thesis:

  • Your work will be indexed by search engines and discoverable by researchers around the world, extending your work’s impact beyond Northwestern
  • Your work will be assigned a Digital Object Identifier (DOI) to ensure perpetual online access and to facilitate scholarly citation
  • Your work will help accelerate discovery and increase knowledge in your subject domain by adding to the global corpus of public scholarly information

Get started:

  • Visit Arch online
  • Log in with your NetID
  • Describe your thesis: title, author, date, keywords, rights, license, subject, etc.
  • Upload your thesis or capstone PDF and any related supplemental files (data, code, images, presentations, documentation, etc.)
  • Select a visibility: Public, Northwestern-only, Embargo (i.e. delayed release)
  • Save your work to the repository

Your thesis manuscript or capstone report will then be published on the MSDS page. You can view other published work here.

For questions or support in publishing your thesis or capstone, please contact [email protected] .

MIT Libraries home DSpace@MIT

This collection of MIT Theses in DSpace contains selected theses and dissertations from all MIT departments. Please note that this is NOT a complete collection of MIT theses. To search all MIT theses, use MIT Libraries' catalog.

MIT's DSpace contains more than 58,000 theses completed at MIT, dating as far back as the mid-1800s. Theses in this collection have been scanned by the MIT Libraries or submitted in electronic format by thesis authors. Since 2004, all new master's and Ph.D. theses have been scanned and added to this collection after degrees are awarded.

MIT Theses are openly available to all readers.

If you have questions about MIT theses in DSpace, email [email protected]. See also Access & Availability Questions or About MIT Theses in DSpace.

If you are a recent MIT graduate, your thesis will be added to DSpace within 3-6 months after your graduation date. Please email [email protected] with any questions.

Permissions

MIT Theses may be protected by copyright. Please refer to the MIT Libraries Permissions Policy for permission information. Note that the copyright holder for most MIT theses is identified on the title page of the thesis.

Theses by Department

  • Comparative Media Studies
  • Computation for Design and Optimization
  • Computational and Systems Biology
  • Department of Aeronautics and Astronautics
  • Department of Architecture
  • Department of Biological Engineering
  • Department of Biology
  • Department of Brain and Cognitive Sciences
  • Department of Chemical Engineering
  • Department of Chemistry
  • Department of Civil and Environmental Engineering
  • Department of Earth, Atmospheric, and Planetary Sciences
  • Department of Economics
  • Department of Electrical Engineering and Computer Sciences
  • Department of Humanities
  • Department of Linguistics and Philosophy
  • Department of Materials Science and Engineering
  • Department of Mathematics
  • Department of Mechanical Engineering
  • Department of Nuclear Science and Engineering
  • Department of Ocean Engineering
  • Department of Physics
  • Department of Political Science
  • Department of Urban Studies and Planning
  • Engineering Systems Division
  • Harvard-MIT Program of Health Sciences and Technology
  • Institute for Data, Systems, and Society
  • Media Arts & Sciences
  • Operations Research Center
  • Program in Real Estate Development
  • Program in Writing and Humanistic Studies
  • Science, Technology & Society
  • Science Writing
  • Sloan School of Management
  • Supply Chain Management
  • System Design & Management
  • Technology and Policy Program

Collections in this community

  • Doctoral Theses
  • Graduate Theses
  • Undergraduate Theses

Recent Submissions

  • The Role Of Repurposing Coal Plants to Thermal Energy Storage in the Context of India
  • Doing the Dirty Work: Employment vulnerability to the energy transition and its implications for climate policy and politics
  • BReach-LP: a Framework for Backward Reachability Analysis of Neural Feedback Loops

COMMENTS

  1. Computational and Data Sciences (PhD) Dissertations

    Computational and Data Sciences (PhD) Dissertations. Below is a selection of dissertations from the Doctor of Philosophy in Computational and Data Sciences program in Schmid College that have been included in Chapman University Digital Commons. Additional dissertations from years prior to 2019 are available through the Leatherby Libraries ...

  2. Getting a PhD in Data Science: What You Need to Know

    A PhD in Data Science is a research degree that typically takes four to five years to complete but can take longer depending on a range of personal factors. In addition to taking more advanced courses, PhD candidates devote a significant amount of time to teaching and conducting dissertation research with the intent of advancing the field ...

  3. 10 Compelling Machine Learning Ph.D. Dissertations for 2020

    This dissertation explores three topics related to random forests: tree aggregation, variable importance, and robustness. 10. Climate Data Computing: Optimal Interpolation, Averaging, Visualization and Delivery. This dissertation solves two important problems in the modern analysis of big climate data.

  4. PhD in Data Science

    PhD in Analytics and Data Science. Students pursuing a PhD in analytics and data science at Kennesaw State University must complete 78 credit hours: 48 course hours and 6 electives (spread over 4 years of study), a minimum 12 credit hours for dissertation research, and a minimum 12 credit-hour internship.

  5. 17 Compelling Machine Learning Ph.D. Dissertations

    This dissertation revisits and makes progress on some old but challenging problems concerning least squares estimation, the work-horse of supervised machine learning. Two major problems are addressed: (i) least squares estimation with heavy-tailed errors, and (ii) least squares estimation in non-Donsker classes.

  6. PhD Dissertations

    PhD Dissertations [all are .pdf files]. Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There, Benjamin Eysenbach, 2023. Data-driven Decisions - An Anomaly Detection Perspective, Shubhranshu Shekhar, 2023. Methods and Applications of Explainable Machine Learning, Joon Sik Kim, 2023. Applied Mathematics of the Future, Kin G. Olivares, 2023.

  7. Doctor of Data Science and Analytics Dissertations

    The PhD Website. The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus of application and research - where students will engage in real world business problems, which will inform and guide their research interests. We launched the first formal PhD program in Data Science in 2015.

  8. How to write a great data science thesis

    They will stress the importance of structure, substance and style. They will urge you to write down your methodology and results first, then progress to the literature review, introduction and conclusions and to write the summary or abstract last. To write clearly and directly with the reader's expectations always in mind.

  9. PhD in Data Science

    An NRT-sponsored program in Data Science. Overview: Advances in computational speed and data availability, and the development of novel data analysis methods, have birthed a new field: data science. This new field requires a new type of researcher and actor: the rigorously trained, cross-disciplinary, and ethically responsible data scientist. Launched in Fall 2017, the …

  10. PhD in Data Science

    The PhD curriculum combines the aspiration to train all students in mathematical foundations of data science, responsible data use and communication, and advanced computational methods, with an appreciation of the diverse research interests of the data science faculty. First Year Requirements. The standard first-year program requires students ...

  11. 10 Best Research and Thesis Topic Ideas for Data Science in 2022

    In this article, we have listed 10 such research and thesis topic ideas to take up as data science projects in 2022. Handling practical video analytics in a distributed cloud: With increased dependency on the internet, sharing videos has become a mode of data and information exchange. The role of the implementation of the Internet of Things ...

  12. Five Tips For Writing A Great Data Science Thesis

    Although educational programs, conventions and thesis requirements vary wildly, I hope to offer some common guidelines for any student currently working on a Data Science thesis. The article offers five guidance points, but may effectively be summarized in a single line: "Write for your reader, not for yourself."

  13. PhD in Data Science and Analytics

    Stage Two: Coursework. The Ph.D. in Data Science and Analytics requires 78 total credit hours spread over four years of study. Example Program of Study, Year 1: CS 8265 - Big Data Analytics; CS 8267 - Machine Learning; MATH 8010 - Theory of Linear Models (optional); MATH 8020 - Graph Theory.

  14. PhD Program

    A dissertation in the scope of Data Science is required of every candidate for the PhD degree. HDSI PhD program thesis requirements must meet Regulation 715(D) requirements. The final form of the dissertation document must comply with published guidelines by the Graduate Division.

  15. PhD in Data Science

    Degree requirements for the PhD in Data Science can be found in the NYU bulletin - Doctor of Philosophy in Data Science. To be awarded the Ph.D. in Data Science, students must, within 10 years of first enrolling: Complete 72 credit hours while maintaining a cumulative grade point average of 3.0 (out of 4.0) each semester. Complete the ...

  16. PhD in Computing & Data Sciences

    The PhD program in Computing & Data Sciences (CDS) at Boston University prepares its graduates to make significant contributions to the art, science, and engineering of computational and data-driven processes that are woven into all aspects of society, economy, and public discourse, leading to solution of problems and synthesis of knowledge related to the methodical, generalizable, and ...

  17. PDF Optimization-based Modeling in Investment and Data Science a

    scope and quality as a dissertation for the degree of Doctor of Philosophy. (Stephen P. Boyd) I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (Emmanuel J. Candes) Approved for the Stanford University Committee on Graduate ...

  18. Ph.D. with specialization in Data Science

    Year 3+: Dissertation, full time research in a research lab, annual evaluation of progress to dissertation by the graduate committee. See the Graduate Catalog for details. How to Apply. Data Science Ph.D. students are admitted to the Graduate program in Applied Science, in which they will earn a

  19. Doing a PhD in Data Science

    The cost of a PhD in Data Science will depend on the university you study with, but average tuition fee is £4000-£6000 per academic year for UK/EU students and £16,000-£19,000 per academic year for international students. Due to the popularity of Data Science PhD projects and the increasing demand for individuals who can elaborately analyse ...

  20. OATD

    Advanced research and scholarship. Theses and dissertations, free to find, free to use. Browse by author name ("Author name starts with…"). October 3, 2022. OATD is dealing with a number of misbehaved crawlers and robots, and is currently taking some steps to minimize their impact on the system.

  21. Ph.D. in Data Science

    The Ph.D. in Data Science program will provide the essential skills required to analyze big and complex data sets and equip students with a broad understanding of data challenges and opportunities, along with the research and inquiry skills necessary to independently conduct research and answer questions within their area of concentration.

  22. Thesis/Capstone for Master's in Data Science

    Thesis. A thesis is an academic-focused research project with broader applicability. A thesis is more appropriate if: you want to get a PhD or other advanced degree and want the experience of the research process and writing for publication; you want to work individually with a specific faculty member who serves as your thesis adviser

  23. MIT Theses

    If you are a recent MIT graduate, your thesis will be added to DSpace within 3-6 months after your graduation date. ... Institute for Data, Systems, and Society; Media Arts & Sciences; Operations Research Center; ... (2372) Electrical Engineering and Computer Science (2298) Civil and Environmental Engineering. (2145) Aeronautics and Astronautics.

  24. Graduate Studies

    University of Waterloo. 200 University Avenue West, Waterloo, ON, Canada N2L 3G1. +1 519 888 4567.