Chapman University Digital Commons


Computational and Data Sciences (PhD) Dissertations

Below is a selection of dissertations from the Doctor of Philosophy in Computational and Data Sciences program in Schmid College that have been included in Chapman University Digital Commons. Additional dissertations from years prior to 2019 are available through the Leatherby Libraries' print collection or in ProQuest's Dissertations and Theses database.

Dissertations from 2023

Computational Analysis of Antibody Binding Mechanisms to the Omicron RBD of SARS-CoV-2 Spike Protein: Identification of Epitopes and Hotspots for Developing Effective Therapeutic Strategies, Mohammed Alshahrani

Integration of Computer Algebra Systems and Machine Learning in the Authoring of the SANYMS Intelligent Tutoring System, Sam Ford

Voluntary Action and Conscious Intention, Jake Gavenas

Random Variable Spaces: Mathematical Properties and an Extension to Programming Computable Functions, Mohammed Kurd-Misto

Computational Modeling of Superconductivity from the Set of Time-Dependent Ginzburg-Landau Equations for Advancements in Theory and Applications, Iris Mowgood

Application of Machine Learning Algorithms for Elucidation of Biological Networks from Time Series Gene Expression Data, Krupa Nagori

Stochastic Processes and Multi-Resolution Analysis: A Trigonometric Moment Problem Approach and an Analysis of the Expenditure Trends for Diabetic Patients, Isaac Nwi-Mozu

Applications of Causal Inference Methods for the Estimation of Effects of Bone Marrow Transplant and Prescription Drugs on Survival of Aplastic Anemia Patients, Yesha M. Patel

Causal Inference and Machine Learning Methods in Parkinson's Disease Data Analysis, Albert Pierce

Causal Inference Methods for Estimation of Survival and General Health Status Measures of Alzheimer’s Disease Patients, Ehsan Yaghmaei

Dissertations from 2022

Computational Approaches to Facilitate Automated Interchange between Music and Art, Rao Hamza Ali

Causal Inference in Psychology and Neuroscience: From Association to Causation, Dehua Liang

Advances in NLP Algorithms on Unstructured Medical Notes Data and Approaches to Handling Class Imbalance Issues, Hanna Lu

Novel Techniques for Quantifying Secondhand Smoke Diffusion into Children's Bedroom, Sunil Ramchandani

Probing the Boundaries of Human Agency, Sook Mun Wong

Dissertations from 2021

Predicting Eye Movement and Fixation Patterns on Scenic Images Using Machine Learning for Children with Autism Spectrum Disorder, Raymond Anden

Forecasting the Prices of Cryptocurrencies using a Novel Parameter Optimization of VARIMA Models, Alexander Barrett

Applications of Machine Learning to Facilitate Software Engineering and Scientific Computing, Natalie Best

Exploring Behaviors of Software Developers and Their Code Through Computational and Statistical Methods, Elia Eiroa Lledo

Assessing the Re-Identification Risk in ECG Datasets and an Application of Privacy Preserving Techniques in ECG Analysis, Arin Ghazarian

Multi-Modal Data Fusion, Image Segmentation, and Object Identification using Unsupervised Machine Learning: Conception, Validation, Applications, and a Basis for Multi-Modal Object Detection and Tracking, Nicholas LaHaye

Machine-Learning-Based Approach to Decoding Physiological and Neural Signals, Elnaz Lashgari

Learning-Based Modeling of Weather and Climate Events Related To El Niño Phenomenon via Differentiable Programming and Empirical Decompositions, Justin Le

Quantum State Estimation and Tracking for Superconducting Processors Using Machine Learning, Shiva Lotfallahzadeh Barzili

Novel Applications of Statistical and Machine Learning Methods to Analyze Trial-Level Data from Cognitive Measures, Chelsea Parlett

Optimal Analytical Methods for High Accuracy Cardiac Disease Classification and Treatment Based on ECG Data, Jianwei Zheng

Dissertations from 2020

Development of Integrated Machine Learning and Data Science Approaches for the Prediction of Cancer Mutation and Autonomous Drug Discovery of Anti-Cancer Therapeutic Agents, Steven Agajanian

Allocation of Public Resources: Bringing Order to Chaos, Lance Clifner

A Novel Correction for the Adjusted Box-Pierce Test — New Risk Factors for Emergency Department Return Visits within 72 hours for Children with Respiratory Conditions — General Pediatric Model for Understanding and Predicting Prolonged Length of Stay , Sidy Danioko

A Computational and Experimental Examination of the FCC Incentive Auction, Logan Gantner

Exploring the Employment Landscape for Individuals with Autism Spectrum Disorders using Supervised and Unsupervised Machine Learning, Kayleigh Hyde

Integrated Machine Learning and Bioinformatics Approaches for Prediction of Cancer-Driving Gene Mutations, Oluyemi Odeyemi

On Quantum Effects of Vector Potentials and Generalizations of Functional Analysis, Ismael L. Paiva

Long Term Ground Based Precipitation Data Analysis: Spatial and Temporal Variability, Luciano Rodriguez

Gaining Computational Insight into Psychological Data: Applications of Machine Learning with Eating Disorders and Autism Spectrum Disorder, Natalia Rosenfield

Connecting the Dots for People with Autism: A Data-driven Approach to Designing and Evaluating a Global Filter, Viseth Sean

Novel Statistical and Machine Learning Methods for the Forecasting and Analysis of Major League Baseball Player Performance, Christopher Watkins

Dissertations from 2019

Contributions to Variable Selection in Complexly Sampled Case-control Models, Epidemiology of 72-hour Emergency Department Readmission, and Out-of-site Migration Rate Estimation Using Pseudo-tagged Longitudinal Data, Kyle Anderson

Bias Reduction in Machine Learning Classifiers for Spatiotemporal Analysis of Coral Reefs using Remote Sensing Images, Justin J. Gapper

Estimating Auction Equilibria using Individual Evolutionary Learning, Kevin James

Employing Earth Observations and Artificial Intelligence to Address Key Global Environmental Challenges in Service of the SDGs, Wenzhao Li

Image Restoration using Automatic Damaged Regions Detection and Machine Learning-Based Inpainting Technique, Chloe Martin-King

Theses from 2017

Optimized Forecasting of Dominant U.S. Stock Market Equities Using Univariate and Multivariate Time Series Analysis Methods, Michael Schwartz


ISSN 2572-1496


DigitalCommons@Kennesaw State University


Doctor of Data Science and Analytics Dissertations


The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus on application and research: students engage with real-world business problems, which inform and guide their research interests.

We launched the first formal PhD program in Data Science in 2015. Our program sits at the intersection of computer science, statistics, mathematics, and business. Our students engage in relevant research with faculty from across our eleven colleges. As one of the institutions on the forefront of the development of data science as an academic discipline, we are committed to developing the next generation of Data Science leaders, researchers, and educators. Culturally, we are committed to the discipline of Data Science, through ethical practices, attention to fairness, to a diverse student body, to academic excellence, and research which makes positive contributions to our local, regional, and global community. -Sherry Ni, Director, Ph.D. in Data Science and Analytics

This degree trains individuals to translate complex data, both structured and unstructured, into information that improves decision making, and to facilitate innovative research. The curriculum places heavy emphasis on programming, data mining, statistical modeling, and the mathematical foundations that support these concepts. Importantly, the program also emphasizes communication skills, both oral and written, as well as application and tying results to business and research problems.

Dissertations from 2023

Quantification of Various Types of Biases in Large Language Models, Sudhashree Sayenju

Dissertations from 2022

Appley: Approximate Shapley Values for Model Explainability in Linear Time, Md Shafiul Alam

Ethical Analytics: A Framework for a Practically-Oriented Sub-Discipline of AI Ethics, Jonathan Boardman

Novel Instance-Level Weighted Loss Function for Imbalanced Learning, Trent Geisler

Debiasing Cyber Incidents – Correcting for Reporting Delays and Under-reporting, Seema Sangari

Dissertations from 2021

Integrated Machine Learning Approaches to Improve Classification performance and Feature Extraction Process for EEG Dataset, Mohammad Masum

A Distance-Based Clustering Framework for Categorical Time Series: A Case Study in Episodes of Care Healthcare Delivery System, Lauren Staples

Dissertations from 2020

Quantitatively Motivated Model Development Framework: Downstream Analysis Effects of Normalization Strategies, Jessica M. Rudd

Data-driven Investment Decisions in P2P Lending: Strategies of Integrating Credit Scoring and Profit Scoring, Yan Wang

A Novel Penalized Log-likelihood Function for Class Imbalance Problem, Lili Zhang

Dissertations from 2019

One and Two-Step Estimation of Time Variant Parameters and Nonparametric Quantiles, Bogdan Gadidov

Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis, Jie Hao

Deep Embedding Kernel, Linh Le

Ordinal HyperPlane Loss, Bob Vanderheyden


ISSN: 2576-6805


Recent Dissertation Topics


Dan Kowal - "Bayesian Methods for Functional and Time Series Data"

Dissertation Advisors: David Matteson and David Ruppert

Initial job placement: assistant professor, Department of Statistics, Rice University

Keegan Kang - "Data Dependent Random Projections"

Dissertation Advisor: Giles Hooker

David Sinclair - "Model Selection Results for High Dimensional Graphical Models on Binary and Count Data with Applications to FMRI and Genomics"

Liu, Yanning – "Statistical Issues in the Design and Analysis of Clinical Trials"

Dissertation Advisor: Bruce Turnbull

Nicholson, William Bertil – "Tools for Modeling Sparse Vector Autoregressions"

Dissertation Advisor: David Matteson

Tupper, Laura Lindley – "Topics in Classification and Clustering of High-Dimensional Data"

Chetelat, Didier – "High-Dimensional Inference by Unbiased Risk Estimation"

Dissertation Advisor: Martin Wells

Initial Job Placement: Assistant Professor, Université de Montréal, Montreal, Canada

Gaynanova, Irina – "Estimation Of Sparse Low-Dimensional Linear Projections"

Dissertation Advisor: James Booth

Initial Job Placement: Assistant Professor, Texas A&M, College Station, TX

Mentch, Lucas – "Ensemble Trees and CLTs: Statistical Inference in Machine Learning"

Initial Job Placement: Assistant Professor, University of Pittsburgh, Pittsburgh, PA

Risk, Ben – "Topics in Independent Component Analysis, Likelihood Component Analysis, and Spatiotemporal Mixed Modeling"

Dissertation Advisors: David Matteson and David Ruppert

Initial Job Placement: Postdoctoral Fellow, University of North Carolina, Chapel Hill, NC

Zhao, Yue – "Contributions to the Statistical Inference for the Semiparametric Elliptical Copula Model"

Dissertation Advisor: Marten Wegkamp

Initial Job Placement: Postdoctoral Fellow, McGill University, Montreal, Canada

Chen, Maximillian Gene – "Dimension Reduction and Inferential Procedures for Images"

Dissertation Advisor: Martin Wells 

Earls, Cecelia – "Bayesian hierarchical Gaussian process models for functional data analysis"

Dissertation Advisor: Giles Hooker

Initial Job Placement: Lecturer, Cornell University, Ithaca, NY

Li, James Yi-Wei – "Tensor (Multidimensional Array) Decomposition, Regression, and Software for Statistics and Machine Learning"

Initial Job Placement: Research Scientist, Yahoo Labs

Schneider, Matthew John – "Three Papers on Time Series Forecasting and Data Privacy"

Dissertation Advisor: John Abowd

Initial Job Placement: Assistant Professor, Northwestern University, Evanston, IL

Thorbergsson, Leifur – "Experimental design for partially observed Markov decision processes"

Initial Job Placement: Data Scientist, Memorial Sloan Kettering Cancer Center, New York, NY

Wan, Muting – "Model-Based Classification with Applications to High-Dimensional Data in Bioinformatics"

Initial Job Placement: Senior Associate, 1010 Data, New York, NY

Johnson, Lynn Marie – "Topics in Linear Models: Methods for Clustered, Censored Data and Two-Stage Sampling Designs"

Dissertation Advisor: Robert Strawderman

Initial Job Placement: Statistical Consultant, Cornell, Statistical Consulting Unit, Ithaca, NY

Tecuapetla Gomez, Inder Rafael – "Asymptotic Inference for Locally Stationary Processes"

Dissertation Advisor: Michael Nussbaum

Initial Job Placement: Postdoctoral Fellow, Georg-August-Universität Göttingen, Göttingen, Germany

Bar, Haim – "Parallel Testing, and Variable Selection -- a Mixture-Model Approach with Applications in Biostatistics" 

Dissertation Advisor: James Booth

Initial Job Placement: Postdoc, Department of Medicine, Weill Medical Center, New York, NY

Cunningham, Caitlin – "Markov Methods for Identifying ChIP-seq Peaks"

Initial Job Placement: Assistant Professor, Le Moyne College, Syracuse, NY

Ji, Pengsheng – "Selected Topics in Nonparametric Testing and Variable Selection for High Dimensional Data" 

Dissertation Advisor: Michael Nussbaum 

Initial Job Placement: Assistant Professor, University of Georgia, Athens, GA

Morris, Darcy Steeg – "Methods for Multivariate Longitudinal Count and Duration Models with Applications in Economics" 

Dissertation Advisor: Francesca Molinari 

Initial Job Placement: Research Mathematical Statistician, Center for Statistical Research and Methodology, U.S. Census Bureau, Washington DC

Narayanan, Rajendran – "Shrinkage Estimation for Penalised Regression, Loss Estimation and Topics on Largest Eigenvalue Distributions" 

Initial Job Placement: Visiting Scientist, Indian Statistical Institute, Kolkata, India

Xiao, Luo – "Topics in Bivariate Spline Smoothing" 

Dissertation Advisor: David Ruppert 

Initial Job Placement: Postdoc, Johns Hopkins University, Baltimore, MD

Zeber, David – "Extremal Properties of Markov Chains and the Conditional Extreme Value Model" 

Dissertation Advisor: Sidney Resnick 

Initial Job Placement: Data Analyst, Mozilla, San Francisco, CA

Clement, David – "Estimating equation methods for longitudinal and survival data" 

Dissertation Advisor: Robert Strawderman 

Initial Job Placement: Quantitative Analyst, Smartodds, London UK

Eilertson, Kirsten – "Estimation and inference of random effect models with applications to population genetics and proteomics" 

Dissertation Advisor: Carlos Bustamante 

Initial Job Placement: Biostatistician, The J. David Gladstone Institutes, San Francisco CA

Grabchak, Michael – "Tempered stable distributions: properties and extensions" 

Dissertation Advisor: Gennady Samorodnitsky 

Initial Job Placement: Assistant Professor, UNC Charlotte, Charlotte NC

Li, Yingxing – "Aspects of penalized splines" 

Initial Job Placement: Assistant Professor, The Wang Yanan Institute for Studies in Economics, Xiamen University

Lopez Oliveros, Luis – "Modeling end-user behavior in data networks" 

Dissertation Advisor: Sidney Resnick  

Initial Job Placement: Consultant, Murex North America, New York NY

Ma, Xin – "Statistical Methods for Genome Variant Calling and Population Genetic Inference from Next-Generation Sequencing Data" 

Initial Job Placement: Postdoc, Stanford University, Stanford CA

Kormaksson, Matthias – "Dynamic path analysis and model based clustering of microarray data" 

Dissertation Advisor: James Booth 

Initial Job Placement: Postdoc, Department of Public Health, Weill Cornell Medical College, New York NY

Schifano, Elizabeth – "Topics in penalized estimation" 

Initial Job Placement: Postdoc, Department of Biostatistics, Harvard University, Boston MA

Hanlon, Bret – "High-dimensional data analysis" 

Dissertation Advisor: Anand Vidyashankar 

Shaby, Benjamin – "Tools for hard bayesian computations" 

Initial Job Placement: Postdoc, SAMSI, Durham NC

Zipunnikov, Vadim – "Topics on generalized linear mixed models" 

Initial Job Placement: Postdoc, Department of Biostatistics, Johns Hopkins University, Baltimore MD

Barger, Kathryn Jo-Anne – "Objective bayesian estimation for the number of classes in a population using Jeffreys and reference priors" 

Dissertation Advisor: John Bunge 

Initial Job Placement: Pfizer Incorporated

Chan, Serena Suewei – "Robust and efficient inference for linear mixed models using skew-normal distributions" 

Initial Job Placement: Statistician, Takeda Pharmaceuticals, Deerfield IL

Lin, Haizhi – "Distressed debt prices and recovery rate estimation" 

Dissertation Advisor: Martin Wells  

Initial Job Placement: Associate, Fixed Income Department, Credit Suisse Securities (USA), New York, NY


Thesis/Capstone for Master's in Data Science | Northwestern SPS - Northwestern School of Professional Studies


Data Science

Capstone and Thesis Overview

Capstone and thesis are similar in that they both represent a culminating, scholarly effort of high quality. Both should clearly state a problem or issue to be addressed. Both will allow students to complete a larger project and produce a product or publication that can be highlighted on their resumes. Students should consider the factors below when deciding whether a capstone or thesis may be more appropriate to pursue.

A capstone is a practical or real-world project that can emphasize preparation for professional practice. A capstone is more appropriate if:

  • you don't necessarily need or want the experience of the research process or writing a big publication
  • you want more input on your project, from fellow students and instructors
  • you want more structure to your project, including assignment deadlines and due dates
  • you want to complete the project or graduate in a timely manner

A student can enroll in MSDS 498 Capstone in any term. However, capstone specialization courses can provide a unique student experience and may be offered only twice a year. 

A thesis is an academic-focused research project with broader applicability. A thesis is more appropriate if:

  • you want to get a PhD or other advanced degree and want the experience of the research process and writing for publication
  • you want to work individually with a specific faculty member who serves as your thesis adviser
  • you are more self-directed, are good at managing your own projects with very little supervision, and have a clear direction for your work
  • you have a project that requires more time to pursue

Students can enroll in MSDS 590 Thesis as long as there is an approved thesis project proposal, identified thesis adviser, and all other required documentation at least two weeks before the start of any term.

From Faculty Director, Thomas W. Miller, PhD

“Capstone projects and thesis research give students a chance to study topics of special interest to them. Students can highlight analytical skills developed in the program. Work on capstone and thesis research projects often leads to publications that students can highlight on their resumes.”

A thesis is an individual research project that usually takes two to four terms to complete. Capstone course sections, on the other hand, represent a one-term commitment.

Students need to evaluate their options prior to choosing a capstone course section because capstones vary widely from one instructor to the next. There are both general and specialization-focused capstone sections. Some capstone sections offer individual research projects, others offer team research projects, and a few give students a choice of individual or team projects.

Students should refer to the SPS Graduate Student Handbook for more information regarding registration for either MSDS 590 Thesis or MSDS 498 Capstone.

Capstone Experience

If students wish to engage with an outside organization to work on a project for capstone, they can refer to this checklist and lessons learned for some helpful tips.

Capstone Checklist

  • Start early — set aside a minimum of one to two months prior to the capstone quarter to determine the industry and modeling interests.
  • Networking — pitch your idea to potential organizations for projects and focus on the business benefits you can provide.
  • Permission request — make sure your final project can be shared with others in the course and the information can be made public.
  • Engagement — engage with the capstone professor prior to and immediately after getting the dataset to ensure appropriate scope for the 10 weeks.
  • Teambuilding — recruit team members who have similar interests for the type of project during the first week of the course.

Capstone Lessons Learned

  • Access to company data can take longer than expected; not having this access before or at the start of the term can severely delay progress
  • The project timeline should align with the coursework timeline as closely as possible
  • Designate one point of contact (POC) for business-facing communication to ensure streamlined messaging and more effective time management with the organization
  • Manage expectations on both sides: for the business, the work is pro bono; for students, it does not guarantee internship or job opportunities
  • Data security or masking that is not completed in time can put the entire opportunity at risk

Publication of Work

Northwestern University Libraries offers an option for students to publish their master’s thesis or capstone in Arch, Northwestern’s open access research and data repository.

Benefits of publishing your thesis:

  • Your work will be indexed by search engines and discoverable by researchers around the world, extending your work’s impact beyond Northwestern
  • Your work will be assigned a Digital Object Identifier (DOI) to ensure perpetual online access and to facilitate scholarly citation
  • Your work will help accelerate discovery and increase knowledge in your subject domain by adding to the global corpus of public scholarly information

Get started:

  • Visit Arch online
  • Log in with your NetID
  • Describe your thesis: title, author, date, keywords, rights, license, subject, etc.
  • Upload your thesis or capstone PDF and any related supplemental files (data, code, images, presentations, documentation, etc.)
  • Select a visibility: Public, Northwestern-only, Embargo (i.e. delayed release)
  • Save your work to the repository

Your thesis manuscript or capstone report will then be published on the MSDS page, where you can also view other published work.

For questions or support in publishing your thesis or capstone, please contact [email protected].

Thesis Option

Data Science master’s students can choose to satisfy the research experience requirement by selecting the thesis option. Students will spend the majority of their second year working on a substantial data science project that culminates in the submission and oral defense of a master’s thesis. While all thesis projects must be related to data science, students are given leeway in finding a project in a domain of study that fits with their background and interest.

All students choosing the thesis option must find a research advisor and submit a thesis proposal by mid-April of their first year of study. Thesis proposals will be evaluated by the Data Science faculty committee and only those students whose proposals are accepted will be allowed to continue with the thesis option.  

To account for the time spent on thesis research, students choosing the thesis option are able to substitute AC 302 for three required courses: the Capstone and two "free" elective courses (as defined in the final bullet point on the degree requirement page).


PhD in Data Science – Your Guide to Choosing a Doctorate Degree Program


Created by aasif.faizal

Professional opportunities in data science are growing incredibly fast. That’s great news for students looking to pursue a career as a data scientist. But it also means that there are a lot more options out there to investigate and understand before developing the best educational path for you.

A PhD is the most advanced data science degree you can get, reflecting a depth of knowledge and technical expertise that will put you at the top of your field.


PhD programs are also the most time-intensive degree option, typically requiring students to complete a dissertation involving rigorous research, so they are not for everyone. Indeed, many who work in the world of big data hold master’s degrees rather than PhDs; master’s programs tend to involve the same coursework as PhD programs without a dissertation component. However, for the right candidate, a PhD program is the perfect choice for becoming a true expert in an area of focus.

If you’ve concluded that a data science PhD is the right path for you, this guide is intended to help you choose the best program to suit your needs. It will walk through some of the key considerations while picking graduate data science programs and some of the nuts and bolts (like course load and tuition costs) that are part of the data science PhD decision-making process.

Data Science PhD vs. Masters: Choosing the right option for you

If you’re considering pursuing a data science PhD, it’s worth knowing that such an advanced degree isn’t strictly necessary in order to get good work opportunities. Many who work in the field of big data hold only master’s degrees, the level of education typically expected of a competitive candidate for data science positions.

So why pursue a data science PhD?

Simply put, a PhD in data science will leave you qualified to enter the big data industry at a high level from the outset.

You’ll be eligible for advanced positions within companies, holding greater responsibilities, keeping more direct communication with leadership, and having more influence on important data-driven decisions. You’re also likely to receive greater compensation to match your rank.

However, PhDs are not for everyone. Dissertations require a great deal of time and an interest in intensive research. If you are eager to jumpstart a career quickly, a master’s program will give you the preparation you need to hit the ground running. PhDs are appropriate for those who want to commit their time and effort to schooling as a long-term investment in their professional trajectory.

For more information on the difference between data science PhD and master’s programs, take a look at our guide.

Topics include:

  • Can I Get an Online PhD in Data Science?
  • Overview of PhD Coursework
  • Preparing for a Doctorate Program
  • Building a Solid Track Record of Professional Experience
  • Things to Consider When Choosing a School
  • What Does It Cost to Get a PhD in Data Science?
  • School Listings


Data Science PhD Programs, Historically

Historically, data science PhD programs were one of the main avenues to a good data-related position in academia or industry. But PhD programs are heavily research oriented and require a fairly long-term investment of time, money, and energy. The issue some data science PhD holders report, especially in industry settings, is that the state of the art is moving so quickly, and the data science industry is evolving so rapidly, that an abundance of research-oriented expertise is not always what is most sought after.

Instead, many companies are looking for candidates who are up to date with the latest data science techniques and technologies, and are willing to pivot to match emerging trends and practices.

One recent development that makes data science graduate school decisions more complex is the introduction of specialty master’s degrees that focus on rigorous but compact professional training. Both students and companies are realizing the value of an intensive, more industry-focused degree that provides enough training to manage complex projects and that is client oriented rather than research oriented.

However, not all prospective data science PhD students are looking for jobs in industry. There are some pretty amazing research opportunities opening up across a variety of academic fields that are making use of new data collection and analysis tools. Experts who understand how to leverage data systems, including statistics and computer science, to analyze trends and build models will be in high demand.

Can You Get a PhD in Data Science Online?

While it is not common to get a data science Ph.D. online, there are currently two options for those looking to take advantage of the flexibility of an online program.

Indiana University Bloomington and Northcentral University both offer online Ph.D. programs with either a minor or specialization in data science.

Given the trend for schools to continue increasing online offerings, expect to see additional schools adding this option in the near future.


Overview of PhD Coursework

A PhD requires a lot of academic work, which generally takes between four and five years (sometimes longer) to complete.

Here are some of the high level factors to consider and evaluate when comparing data science graduate programs.

How many credits are required for a PhD in data science?

On average, it takes 71 credits to graduate with a PhD in data science, almost double the requirement of traditional master’s degree programs. In addition to coursework, most PhD students also have research and teaching responsibilities that can be simultaneously demanding and really great career preparation.

What’s the core curriculum like?

In a data science doctoral program, you’ll be expected to learn many skills and also how to apply them across domains and disciplines. Core curricula vary from program to program, but almost all have a foundation in statistics.

All PhD candidates will have to take a qualifying exam. The format varies from university to university; at Yale, for example, it is broken into three phases: a practical exam, a theory exam, and an oral exam. The goal is to make sure doctoral students are developing the appropriate level of expertise.


Dissertation

One of the final steps of a PhD program involves presenting original research findings in a formal document called a dissertation. A dissertation provides background and context, as well as findings and analysis, and can contribute to the understanding and evolution of data science. A dissertation idea most often provides the framework for how a PhD candidate’s graduate school experience will unfold, so it’s important to be thoughtful and deliberate while considering research opportunities.

Since data science is such a rapidly evolving field and because choosing the right PhD program is such an important factor in developing a successful career path, there are some steps that prospective doctoral students can take in advance to find the best-fitting opportunity.

Join professional associations

Even before being fully credentialed, joining professional associations and organizations such as the Data Science Association and the American Association of Big Data Professionals is a good way to get exposure to the field. Many professional societies are welcoming to new members and even encourage student participation with things like discounted membership fees and awards and contest categories for student researchers. One of the biggest advantages to joining is that these professional associations bring together other data scientists for conference events, research-sharing opportunities, networking and continuing education opportunities.

Leverage your social network

Be on the lookout to make professional connections with professors, peers, and members of industry. There are a number of LinkedIn groups dedicated to data science. A well-maintained professional network is always useful to have when looking for advice or letters of recommendation while applying to graduate school and then later while applying for jobs and other career-related opportunities.

Kaggle competitions

Kaggle competitions provide the opportunity to solve real-world data science problems and win prizes. Kaggle maintains a list of open data science problems. Winning one of these competitions is a good way to demonstrate professional interest and experience.


Internships

Internships are a great way to get real-world experience in data science while also getting to work for top names in the world of business. For example, IBM offers a data science internship, which can help you stand out when applying for PhD programs and when seeking employment in the future.

Demonstrating professional experience is not only important when looking for jobs, but it can also help while applying for graduate school. There are a number of ways for prospective students to gain exposure to the field and explore different facets of data science careers.

Get certified

There are a number of data-related certificate programs that are open to people with a variety of academic and professional experience. DeZyre has an excellent guide to different certifications, some of which might help provide good background for graduate school applications.


Conferences

Conferences are a great place to meet people presenting new and exciting research in the data science field and bounce ideas off of newfound connections. As with professional societies and organizations, discounted student rates are available to encourage student participation. In addition, some conferences will waive fees if you are presenting a poster or research at the conference, which is an extra incentive to present.


It can be hard to quantify what makes a good fit when it comes to data science graduate school programs. There are easy-to-evaluate factors, such as cost and location, and then there are harder-to-evaluate criteria such as networking opportunities, accessibility to professors, and how up to date the program’s curriculum is.

Nevertheless, there are some key relevant considerations when applying to almost any data science graduate program.

What most schools will require when applying:

  • All undergraduate and graduate transcripts
  • A statement of intent for the program (reason for applying and future plans)
  • Letters of reference
  • Application fee
  • Online application
  • A curriculum vitae (outlining all of your academic and professional accomplishments)

What Does it Cost to Get a PhD in Data Science?

The great news is that many PhD data science programs are supported by fellowships and stipends. Some are completely funded, meaning the school will pay tuition and basic living expenses. Here are several examples of fully funded programs:

  • University of Southern California
  • University of Nevada, Reno
  • Kennesaw State University
  • Worcester Polytechnic Institute
  • University of Maryland

For all other programs, tuition typically ranges from $1,300 to $2,000 per credit hour, depending on the school. Remember, typical PhD programs in data science are between 60 and 75 credit hours, meaning you could spend up to $150,000 over several years.
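To make the arithmetic behind that estimate concrete, here is a minimal sketch that multiplies the per-credit rates by the credit-hour range quoted above. The function and variable names are illustrative, and the calculation assumes a flat per-credit rate with no fees, aid, or yearly increases.

```python
# Figures quoted in the text above; names are illustrative assumptions.
MIN_RATE, MAX_RATE = 1_300, 2_000    # tuition in USD per credit hour
MIN_CREDITS, MAX_CREDITS = 60, 75    # typical program length in credit hours


def total_tuition(rate_per_credit: int, credits: int) -> int:
    """Return total tuition in USD, ignoring fees, aid, and rate increases."""
    return rate_per_credit * credits


low = total_tuition(MIN_RATE, MIN_CREDITS)    # cheapest rate, shortest program
high = total_tuition(MAX_RATE, MAX_CREDITS)   # priciest rate, longest program
print(f"Estimated unfunded tuition: ${low:,} to ${high:,}")
```

The upper bound matches the $150,000 figure, while the lower bound comes to $78,000 for an unfunded student at the cheapest rate and shortest program.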

That’s why the financial aspects are so important to evaluate when assessing PhD programs, because some schools offer full stipends so that you are able to attend without having to find supplemental scholarships or tuition assistance.

Can I become a professor of data science with a PhD? Yes! If you are interested in teaching at the college or graduate level, a PhD is the degree needed to establish the full expertise expected to be a professor. Some data scientists who hold PhDs start by entering the field of big data and pivot over to teaching after gaining a significant amount of work experience. If you’re driven to teach others or to pursue advanced research in data science, a PhD is the right degree for you.

Do I need a master’s in order to pursue a PhD? No. Many who pursue PhDs in Data Science do not already hold advanced degrees, and many PhD programs include all the coursework of a master’s program in the first two years of school. For many students, this is the most time-effective option, allowing you to complete your education in a single pass rather than interrupting your studies after your master’s program.

Can I choose to pursue a PhD after already receiving my master’s? Yes. A master’s program can be an opportunity to get the lay of the land and determine the specific career path you’d like to forge in the world of big data. Some schools may allow you to simply extend your academic timeline after receiving your master’s degree, and it is also possible to return to school to receive a PhD if you have been working in the field for some time.

If a PhD isn’t necessary, is it a waste of time? While not all students are candidates for PhDs, for the right students – who are keen on doing in-depth research, have the time to devote to many years of school, and potentially have an interest in continuing to work in academia – a PhD is a great choice. For more information on this question, take a look at our article Is a Data Science PhD. Worth It?

Complete List of Data Science PhD Programs

Below you will find the most comprehensive list of schools offering a doctorate in data science. Each school listing contains a link to the program specific page, GRE or a master’s degree requirements, and a link to a page with detailed course information.

Note that the listing only contains true data science programs. Other similar programs are often lumped together on other sites, but we have chosen to list programs such as data analytics and business intelligence in a separate section of the website.

Boise State University  – Boise, Idaho PhD in Computing – Data Science Concentration

The Data Science emphasis focuses on the development of mathematical and statistical algorithms, software, and computing systems to extract knowledge or insights from data.  

In 60 credits, students complete an Introduction to Graduate Studies, 12 credits of core courses, 6 credits of data science elective courses, 10 credits of other elective courses, a Doctoral Comprehensive Examination worth 1 credit, and a 30-credit dissertation.

Electives can be taken in focus areas such as Anthropology, Biometry, Ecology/Evolution and Behavior, Econometrics, Electrical Engineering, Earth Dynamics and Informatics, Geoscience, Geostatistics, Hydrology and Hydrogeology, Materials Science, and Transportation Science.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $7,236 total (Resident), $24,573 total (Non-resident)


Bowling Green State University  – Bowling Green, Ohio Ph.D. in Data Science

Data Science students at Bowling Green intertwine knowledge of computer science with statistics.

Students learn techniques in analyzing structured, unstructured, and dynamic datasets.

Courses train students to understand the principles of analytic methods and to articulate the strengths and limitations of those methods.

The program requires 60 credit hours in the studies of Computer Science (6 credit hours), Statistics (6 credit hours), Data Science Exploration and Communication, Ethical Issues, Advanced Data Mining, and Applied Data Science Experience.

Students must also complete 21 credit hours of elective courses, a qualifying exam, a preliminary exam, and a dissertation.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $8,418 (Resident), $14,410 (Non-resident)

Brown University  – Providence, Rhode Island PhD in Computer Science – Concentration in Data Science

Brown University’s database group is a world leader in systems-oriented database research; they seek PhD candidates with strong system-building skills who are interested in researching TupleWare, MLbase, MDCC, Crowd DB, or PIQL.

In order to gain entrance, applicants should consider first doing a research internship at Brown with this group. Other ways to boost an application are to take and do well at massive open online courses, do an internship at a large company, and get involved in a large open-source software project.

Coding well in C++ is preferred.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $62,680 total

Chapman University  – Irvine, California Doctorate in Computational and Data Sciences

Candidates for the doctorate in computational and data science at Chapman University begin by completing 13 core credits in basic methodologies and techniques of computational science.

Students complete 45 credits of electives, which are personalized to match the specific interests and research topics of the student.

Finally, students complete up to 12 credits in dissertation research.

Applicants must have completed courses in differential equations, data structures, and probability and statistics, or take specific foundation courses, before beginning coursework toward the PhD.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $37,538 per year

Clemson University / Medical University of South Carolina (MUSC) – Joint Program – Clemson, South Carolina & Charleston, South Carolina Doctor of Philosophy in Biomedical Data Science and Informatics – Clemson

The PhD in biomedical data science and informatics is a joint program co-authored by Clemson University and the Medical University of South Carolina (MUSC).

Students choose one of three tracks to pursue: precision medicine, population health, and clinical and translational informatics. Students complete 65-68 credit hours, and take courses in each of 5 areas: biomedical informatics foundations and applications; computing/math/statistics/engineering; population health, health systems, and policy; biomedical/medical domain; and lab rotations, seminars, and doctoral research.

Applicants must have a bachelor’s in health science, computing, mathematics, statistics, engineering, or a related field, and it is recommended to also have competency in a second of these areas.

Program requirements include a year of calculus and college biology, as well as experience in computer programming.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $10,858 total (South Carolina Resident), $22,566 total (Non-resident)


George Mason University  – Fairfax, Virginia Doctor of Philosophy in Computational Sciences and Informatics – Emphasis in Data Science

George Mason’s PhD in computational sciences and informatics requires a minimum of 72 credit hours, though this can be reduced if a student has already completed a master’s. 48 credits are toward graduate coursework, and an additional 24 are for dissertation research.

Students choose an area of emphasis, either computer modeling and simulation or data science, and complete 18 credits of coursework in this area. Students are expected to complete the coursework in 4-5 years.

Applicants to this program must have a bachelor’s degree in a natural science, mathematics, engineering, or computer science, and must have knowledge and experience with differential equations and computer programming.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $13,426 total (Virginia Resident), $35,377 total (Non-resident)

Harrisburg University of Science and Technology  – Harrisburg, Pennsylvania Doctor of Philosophy in Data Sciences

Harrisburg University’s PhD in data science is a 4-5 year program, the first 2 of which make up the Harrisburg master’s in analytics.

Beyond this, PhD candidates complete six milestones to obtain the degree, including 18 semester hours in doctoral-level courses, such as multivariate data analysis, graph theory, and machine learning.

Following the completion of ANLY 760 Doctoral Research Seminar, students in the program complete their 12 hours of dissertation research, bringing the total program hours to 36.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $14,940 total

Icahn School of Medicine at Mount Sinai  – New York, New York Genetics and Data Science, PhD

As part of the Biomedical Science PhD program, the Genetics and Data Science multidisciplinary training offers research opportunities that expand on genetic research and modern genomics. The training also integrates several disciplines of biomedical sciences with machine learning, network modeling, and big data analysis.

Students in the Genetics and Data Science program complete a predetermined course schedule with a total of 64 credits and 3 years of study.

Additional course requirements and electives include laboratory rotations, a thesis proposal exam and thesis defense, Computer Systems, Intro to Algorithms, Machine Learning for Biomedical Data Science, Translational Genomics, and Practical Analysis of a Personal Genome.

Delivery Method: Campus GRE: Not Required 2022-2023 Tuition: $31,303 total

Indiana University-Purdue University Indianapolis  – Indianapolis, Indiana PhD in Data Science PhD Minor in Applied Data Science

Doctoral candidates pursuing the PhD in data science at Indiana University-Purdue must display competency in research, data analytics, and management and infrastructure to earn the degree.

The PhD is comprised of 24 credits of a data science core, 18 credits of methods courses, 18 credits of a specialization, written and oral qualifying exams, and 30 credits of dissertation research. All requirements must be completed within 7 years.

Applicants are generally expected to have a master’s in social science, health, data science, or computer science. 

Currently, a majority of the PhD students at IUPUI are funded by faculty grants and two are funded by the federal government. None of the students are self-funded.

IUPUI also offers a PhD Minor in Applied Data Science that is 12-18 credits. The minor is open to students enrolled at IUPUI or IU Bloomington in a doctoral program other than Data Science.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $9,228 per year (Indiana Resident), $25,368 per year (Non-resident)

Jackson State University – Jackson, Mississippi PhD Computational and Data-Enabled Science and Engineering

Jackson State University offers a PhD in computational and data-enabled science and engineering with 5 concentration areas: computational biology and bioinformatics, computational science and engineering, computational physical science, computational public health, and computational mathematics and social science.

Students complete 12 credits of common core courses, 12 credits in the specialization, 24 credits of electives, and 24 credits in dissertation research.

Students may complete the doctoral program in as little as 5 years and no more than 8 years.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $8,270 total

Kennesaw State University  – Kennesaw, Georgia PhD in Analytics and Data Science

Students pursuing a PhD in analytics and data science at Kennesaw State University must complete 78 credit hours: 48 course hours and 6 elective hours (spread over 4 years of study), a minimum of 12 credit hours for dissertation research, and a minimum 12 credit-hour internship.

Prior to dissertation research, the comprehensive examination will cover material from the three areas of study: computer science, mathematics, and statistics.

Successful applicants will have a master’s degree in a computational field, calculus I and II, programming experience, modeling experience, and are encouraged to have a base SAS certification.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $5,328 total (Georgia Resident), $19,188 total (Non-resident)

New Jersey Institute of Technology  – Newark, New Jersey PhD in Business Data Science

Students may enter the PhD program in business data science at the New Jersey Institute of Technology with either a relevant bachelor’s or master’s degree. Students with bachelor’s degrees begin with 36 credits of advanced courses, and those with master’s take 18 credits before moving on to credits in dissertation research.

Core courses include business research methods, data mining and analysis, data management system design, statistical computing with SAS and R, and regression analysis.

Students take qualifying examinations at the end of years 1 and 2, and must defend their dissertations successfully by the end of year 6.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $21,932 total (New Jersey Resident), $32,426 total (Non-resident)

New York University  – New York, New York PhD in Data Science

Doctoral candidates in data science at New York University must complete 72 credit hours, pass a comprehensive and qualifying exam, and defend a dissertation within 10 years of entering the program.

Required courses include an introduction to data science, probability and statistics for data science, machine learning and computational statistics, big data, and inference and representation.

Applicants must have an undergraduate or master’s degree in fields such as mathematics, statistics, computer science, engineering, or other scientific disciplines. Experience with calculus, probability, statistics, and computer programming is also required.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $37,332 per year


Northcentral University  – San Diego, California PhD in Data Science-TIM

Northcentral University offers a PhD in technology and innovation management with a specialization in data science.

The program requires 60 credit hours, including 6-7 core courses, 3 in research, a PhD portfolio, and 4 dissertation courses.

The data science specialization requires 6 courses: data mining, knowledge management, quantitative methods for data analytics and business intelligence, data visualization, predicting the future, and big data integration.

Applicants must have a master’s already.

Delivery Method: Online GRE: Required 2022-2023 Tuition: $16,794 total

Stevens Institute of Technology – Hoboken, New Jersey Ph.D. in Data Science

Stevens Institute of Technology has developed a data science Ph.D. program geared to help graduates become innovators in the space.

The rigorous curriculum emphasizes mathematical and statistical modeling, machine learning, computational systems and data management.

The program is directed by Dr. Ted Stohr, a recognized thought leader in the information systems, operations and business process management arenas.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $39,408 per year

University at Buffalo – Buffalo, New York PhD Computational and Data-Enabled Science and Engineering

The curriculum for the University at Buffalo’s PhD in computational and data-enabled science and engineering centers around three areas: data science, applied mathematics and numerical methods, and high performance and data intensive computing. Nine credit hours of courses must be completed in each of these three areas. Altogether, the program consists of 72 credit hours, and should be completed in 4-5 years. A master’s degree is required for admission; courses taken during the master’s may be able to count toward some of the core coursework requirements.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $11,310 per year (New York Resident), $23,100 per year (Non-resident)

University of Colorado Denver – Denver, Colorado PhD in Big Data Science and Engineering

The University of Colorado – Denver offers a unique program for those students who have already received admission to the computer science and information systems PhD program.

The Big Data Science and Engineering (BDSE) program is a PhD fellowship program that allows selected students to pursue research in the area of big data science and engineering. This new fellowship program was created to train more computer scientists in data science application fields such as health informatics, geosciences, precision and personalized medicine, business analytics, and smart cities and cybersecurity.

Students in the doctoral program must complete 30 credit hours of computer science classes beyond a master’s level, and 30 credit hours of dissertation research.

The BDSE fellowship requires students to have an advisor both in the core disciplines (either computer science or mathematics and statistics) as well as an advisor in the application discipline (medicine and public health, business, or geosciences).

In addition, the fellowship covers full stipend, tuition, and fees up to ~50k for BDSE fellows annually. Important eligibility requirements can be found here.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $55,260 total

University of Maryland  – College Park, Maryland PhD in Information Studies

Data science is a potential research area for doctoral candidates in information studies at the University of Maryland – College Park. This includes big data, data analytics, and data mining.

Applicants for the PhD must have taken the following courses in undergraduate studies: programming languages, data structures, design and analysis of computer algorithms, calculus I and II, and linear algebra.

Students must complete 6 qualifying courses, 2 elective graduate courses, and at least 12 credit hours of dissertation research.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $16,238 total (Maryland Resident), $35,388 total (Non-resident)

University of Massachusetts Boston  – Boston, Massachusetts PhD in Business Administration – Information Systems for Data Science Track

The University of Massachusetts – Boston offers a PhD in information systems for data science. As this is a business degree, students must complete coursework in their first two years with a focus on data for business; for example, taking courses such as business in context: markets, technologies, and societies.

Students must take and pass qualifying exams at the end of year 1, comprehensive exams at the end of year 2, and defend their theses at the end of year 4.

Those with a degree in statistics, economics, math, computer science, management sciences, information systems, and other related fields are especially encouraged, though a quantitative degree is not necessary.

Students accepted by the program are ordinarily offered full tuition credits and a stipend ($25,000 per year) to cover educational expenses and help defray living costs for up to three years of study.

During the first two years of coursework, they are assigned to a faculty member as a research assistant; for the third year students will be engaged in instructional activities. Funding for the fourth year is merit-based from a limited pool of program funds.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $18,894 total (in-state), $36,879 (out-of-state)

University of Nevada Reno – Reno, Nevada PhD in Statistics and Data Science

The University of Nevada – Reno’s doctoral program in statistics and data science is comprised of 72 credit hours to be completed over the course of 4-5 years. Coursework is all within the scope of statistics, with titles such as statistical theory, probability theory, linear models, multivariate analysis, statistical learning, statistical computing, and time series analysis.

The completion of a Master’s degree in mathematics or statistics prior to enrollment in the doctoral program is strongly recommended, but not required.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $5,814 total (in-state), $22,356 (out-of-state)

University of Southern California – Los Angeles, California PhD in Data Sciences & Operations

USC Marshall School of Business offers a PhD in data sciences and operations to be completed in 5 years.

Students can choose either a track in operations management or in statistics. Both tracks require 4 courses in fall and spring of the first 2 years, as well as a research paper and courses during the summers. Year 3 is devoted to dissertation preparation and year 4 and/or 5 to dissertation defense.

A bachelor’s degree is necessary for application, but no field or further experience is required.

Students should complete 60 units of coursework. If the students are admitted with Advanced Standing (e.g., Master’s Degree in appropriate field), this requirement may be reduced to 40 credits.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $63,468 total

University of Tennessee-Knoxville  – Knoxville, Tennessee The Data Science and Engineering PhD

The data science and engineering PhD at the University of Tennessee – Knoxville requires 36 hours of coursework and 36 hours of dissertation research. For those entering with an MS degree, only 24 hours of course work is required.

The core curriculum includes work in statistics, machine learning, and scripting languages and is enhanced by 6 hours in courses that focus either on policy issues related to data, or technology entrepreneurship.

Students must also choose a knowledge specialization in one of these fields: health and biological sciences, advanced manufacturing, materials science, environmental and climate science, transportation science, national security, urban systems science, and advanced data science.

Applicants must have a bachelor’s or master’s degree in engineering or a scientific field. 

All students that are admitted will be supported by a research fellowship and tuition will be included.

Many students will perform research with scientists from Oak Ridge National Laboratory, which is located about a 30-minute drive from campus.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $11,468 total (Tennessee Resident), $29,656 total (Non-resident)

University of Vermont – Burlington, Vermont Complex Systems and Data Science (CSDS), PhD

Through the College of Engineering and Mathematical Sciences, the Complex Systems and Data Science (CSDS) PhD program is pan-disciplinary and provides computational and theoretical training. Students may customize the program depending on their chosen area of focus.

Students in this program work in research groups across campus.

Core courses include Data Science, Principles of Complex Systems and Modeling Complex Systems. Elective courses include Machine Learning, Complex Networks, Evolutionary Computation, Human/Computer Interaction, and Data Mining.

The program requires at least 75 credits to graduate with approval by the student graduate studies committee.

Delivery Method: Campus GRE: Not Required 2022-2023 Tuition: $12,204 total (Vermont Resident), $30,960 total (Non-resident)

University of Washington Seattle Campus – Seattle, Washington PhD in Big Data and Data Science

The University of Washington’s PhD program in data science has 2 key goals: training of new data scientists and cyberinfrastructure development, i.e., development of open-source tools and services that scientists around the world can use for big data analysis.

Students must take core courses in data management, machine learning, data visualization, and statistics.

Students are also required to complete at least one internship that covers practical work in big data.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $17,004 per year (Washington resident), $30,477 (non-resident)

University of Wisconsin-Madison – Madison, Wisconsin PhD in Biomedical Data Science

The PhD program in Biomedical Data Science offered by the Department of Biostatistics and Medical Informatics at UW-Madison is unique, in blending the best of statistics and computer science, biostatistics and biomedical informatics. 

Students complete three year-long course sequences in biostatistics theory and methods, computer science/informatics, and a specialized sequence to fit their interests.

Students also complete three research rotations within their first two years in the program, to both expand their breadth of knowledge and assist in identifying a research advisor.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $10,728 total (in-state), $24,054 total (out-of-state)

Vanderbilt University – Nashville, Tennessee Data Science Track of the BMI PhD Program

The PhD in biomedical informatics at Vanderbilt has the option of a data science track.

Students complete courses in the areas of biomedical informatics (3 courses), computer science (4 courses), statistical methods (4 courses), and biomedical science (2 courses). Students are expected to complete core courses and defend their dissertations within 5 years of beginning the program.

Applicants must have a bachelor’s degree in computer science, engineering, biology, biochemistry, nursing, mathematics, statistics, physics, information management, or some other health-related field.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $53,160 per year

Washington University in St. Louis – St. Louis, Missouri Doctorate in Computational & Data Sciences

Washington University now offers an interdisciplinary Ph.D. in Computational & Data Sciences where students can choose from one of four tracks (Computational Methodologies, Political Science, Psychological & Brain Sciences, or Social Work & Public Health).

Students are fully funded and will receive a stipend for at least five years contingent on making sufficient progress in the program.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $59,420 total

Worcester Polytechnic Institute – Worcester, Massachusetts PhD in Data Science

The PhD in data science at Worcester Polytechnic Institute focuses on 5 areas: integrative data science, business intelligence and case studies, data access and management, data analytics and mining, and mathematical analysis.

Students first complete a master’s in data science, and then complete 60 credit hours beyond the master’s, including 30 credit hours of research.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $28,980 per year

Yale University – New Haven, Connecticut PhD Program – Department of Stats and Data Science

The PhD in statistics and data science at Yale University offers broad training in the areas of statistical theory, probability theory, stochastic processes, asymptotics, information theory, machine learning, data analysis, statistical computing, and graphical methods. Students complete 12 courses in the first year in these topics.

Students are required to teach one course each semester of their third and fourth years.

Most students complete and defend their dissertations in their fifth year.

Applicants should have an educational background in statistics, with an undergraduate major in statistics, mathematics, computer science, or similar field.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $46,900 total


10 Compelling Machine Learning Ph.D. Dissertations for 2020


Machine Learning, Modeling, Research. Posted by Daniel Gutierrez, ODSC, August 19, 2020

As a data scientist, I find that an integral part of my work revolves around keeping current with research coming out of academia. I frequently scour for late-breaking papers that show trends and reveal fertile areas of research. Another valuable source of research developments is the Ph.D. dissertation, the culmination of a doctoral candidate’s work toward the degree. Ph.D. candidates are highly motivated to choose research topics that establish new and creative paths toward discovery in their field of study. Their dissertations are highly focused on a specific problem. If you can find a dissertation that aligns with your areas of interest, consuming the research is an excellent way to do a deep dive into the technology. After reviewing hundreds of recent theses from universities all over the country, I present 10 machine learning dissertations that I found compelling in terms of my own areas of interest.

[Related article: Introduction to Bayesian Deep Learning ]

I hope you’ll find several that match your own fields of inquiry. Each thesis may take a while to consume but will result in hours of satisfying summer reading. Enjoy!

1. Bayesian Modeling and Variable Selection for Complex Data

As we routinely encounter high-throughput data sets in complex biological and environmental research, developing novel models and methods for variable selection has received widespread attention. This dissertation addresses a few key challenges in Bayesian modeling and variable selection for high-dimensional data with complex spatial structures. 

2. Topics in Statistical Learning with a Focus on Large Scale Data

Big data vary in shape and call for different approaches. One type is tall data, i.e., a very large number of samples but not too many features. This dissertation describes a general communication-efficient algorithm for distributed statistical learning on this type of big data. The algorithm distributes the samples uniformly to multiple machines and uses a common reference data set to improve the performance of local estimates. The algorithm enables potentially much faster analysis, at a small cost to statistical performance.
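The divide-and-average idea behind this kind of algorithm can be illustrated with a minimal Python sketch. It is not the dissertation's method: it leaves out the common reference data step and simply averages local least-squares slopes. The function names (`local_slope`, `distributed_slope`), the synthetic data, and the choice of 10 machines are all assumptions made for illustration.

```python
import random


def local_slope(xs, ys):
    """Least-squares slope through the origin, computed on one machine's shard."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def distributed_slope(xs, ys, n_machines, seed=0):
    """Shuffle samples uniformly into shards, fit each shard locally, average the fits."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    # Strided slices of the shuffled indices give shards of (nearly) equal size.
    shards = [idx[m::n_machines] for m in range(n_machines)]
    estimates = [
        local_slope([xs[i] for i in shard], [ys[i] for i in shard])
        for shard in shards
    ]
    # Only one number per machine is communicated back, not the raw samples.
    return sum(estimates) / len(estimates)


# Synthetic "tall" data (hypothetical): many samples, one feature, true slope 3.0.
rng = random.Random(42)
xs = [rng.uniform(1.0, 2.0) for _ in range(10_000)]
ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]

full = local_slope(xs, ys)                         # estimate using all data at once
averaged = distributed_slope(xs, ys, n_machines=10)  # estimate from averaged local fits
print(full, averaged)
```

In this toy setting the averaged estimate lands close to the full-data estimate, which is the trade-off the abstract describes: less communication and parallel local fits in exchange for a small loss in statistical accuracy.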

Another type is wide data, i.e., many features but a limited number of samples. Such data are also called high-dimensional, and many classical statistical methods do not apply to them.

This dissertation discusses a method of dimensionality reduction for high-dimensional classification. The method partitions features into independent communities and splits the original classification problem into separate smaller ones. It enables parallel computing and produces more interpretable results.

3. Sets as Measures: Optimization and Machine Learning

The purpose of this machine learning dissertation is to address the following simple question:

How do we design efficient algorithms to solve optimization or machine learning problems where the decision variable (or target label) is a set of unknown cardinality?

Optimization and machine learning have proved remarkably successful in applications requiring the choice of single vectors. Some tasks, in particular many inverse problems, call for the design, or estimation, of sets of objects. When the size of these sets is a priori unknown, directly applying optimization or machine learning techniques designed for single vectors appears difficult. The work in this dissertation shows that a very old idea for transforming sets into elements of a vector space (namely, a space of measures), a common trick in theoretical analysis, generates effective practical algorithms.

4. A Geometric Perspective on Some Topics in Statistical Learning

Modern science and engineering often generate data sets with a large sample size and a comparably large dimension, which calls classical asymptotic theory into question in many ways. Therefore, the main focus of this dissertation is to develop a fundamental understanding of statistical procedures for estimation and hypothesis testing from a non-asymptotic point of view, where both the sample size and problem dimension grow hand in hand. A range of different problems are explored in this thesis, including work on the geometry of hypothesis testing, adaptivity to local structure in estimation, effective methods for shape-constrained problems, and early stopping with boosting algorithms. The treatment of these different problems shares the common theme of emphasizing the underlying geometric structure.

5. Essays on Random Forest Ensembles

A random forest is a popular machine learning ensemble method that has proven successful in solving a wide range of classification problems. While other successful classifiers, such as boosting algorithms or neural networks, admit natural interpretations as maximum likelihood, a suitable statistical interpretation is much more elusive for a random forest. The first part of this dissertation demonstrates that a random forest is a fruitful framework in which to study AdaBoost and deep neural networks. The work explores the concept and utility of interpolation, the ability of a classifier to perfectly fit its training data. The second part of this dissertation places a random forest on more sound statistical footing by framing it as kernel regression with the proximity kernel. The work then analyzes the parameters that control the bandwidth of this kernel and discusses useful generalizations.

6. Marginally Interpretable Generalized Linear Mixed Models

A popular approach for relating correlated measurements of a non-Gaussian response variable to a set of predictors is to introduce latent random variables and fit a generalized linear mixed model. The conventional strategy for specifying such a model leads to parameter estimates that must be interpreted conditional on the latent variables. In many cases, interest lies not in these conditional parameters, but rather in marginal parameters that summarize the average effect of the predictors across the entire population. Due to the structure of the generalized linear mixed model, the average effect across all individuals in a population is generally not the same as the effect for an average individual. Further complicating matters, obtaining marginal summaries from a generalized linear mixed model often requires evaluation of an analytically intractable integral or use of an approximation. Another popular approach in this setting is to fit a marginal model using generalized estimating equations. This strategy is effective for estimating marginal parameters, but leaves one without a formal model for the data with which to assess quality of fit or make predictions for future observations. Thus, there exists a need for a better approach.

This dissertation defines a class of marginally interpretable generalized linear mixed models that leads to parameter estimates with a marginal interpretation while maintaining the desirable statistical properties of a conditionally specified model. The distinguishing feature of these models is an additive adjustment that accounts for the curvature of the link function and thereby preserves a specific form for the marginal mean after integrating out the latent random variables. 
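The gap between conditional and marginal parameters can be made concrete with a small numerical sketch in Python. It assumes a logistic model with a single Gaussian random intercept, and the values of `beta0`, `beta1`, and `sigma` are hypothetical. It illustrates only the problem the dissertation addresses, not its proposed adjustment.

```python
import math


def expit(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Logit link."""
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n=4001):
    """E[expit(eta + u)] for u ~ N(0, sigma^2), via the midpoint rule on [-8*sigma, 8*sigma]."""
    lo, hi = -8.0 * sigma, 8.0 * sigma
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        u = lo + (k + 0.5) * h
        density = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += expit(eta + u) * density * h
    return total


# Hypothetical conditional model: logit P(y = 1 | x, u) = beta0 + beta1 * x + u
beta0, beta1, sigma = -1.0, 2.0, 2.0

# Effect of moving x from 0 to 1 for one individual (u held fixed), on the log-odds scale.
conditional_effect = beta1

# Effect of the same change on the population-averaged probability, on the log-odds scale.
marginal_effect = (
    logit(marginal_prob(beta0 + beta1, sigma)) - logit(marginal_prob(beta0, sigma))
)
print(conditional_effect, marginal_effect)
```

Under these assumptions the marginal effect comes out smaller than the conditional coefficient, because averaging the nonlinear link over the random intercept flattens the population-level curve.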

7. On the Detection of Hate Speech, Hate Speakers and Polarized Groups in Online Social Media

The objective of this dissertation is to explore the use of machine learning algorithms in understanding and detecting hate speech, hate speakers and polarized groups in online social media. Beginning with a unique typology for detecting abusive language, the work outlines the distinctions and similarities of different abusive language subtasks (offensive language, hate speech, cyberbullying and trolling) and how we might benefit from the progress made in each area. Specifically, the work suggests that each subtask can be categorized based on 1) whether the abusive language being studied is directed at a specific individual or targets a generalized “Other” and 2) the extent to which the language is explicit versus implicit. The work then uses knowledge gained from this typology to tackle the “problem of offensive language” in hate speech detection.

8. Lasso Guarantees for Dependent Data

Serially correlated high dimensional data are prevalent in the big data era. In order to predict and learn the complex relationship among the multiple time series, high dimensional modeling has gained importance in various fields such as control theory, statistics, economics, finance, genetics and neuroscience. This dissertation studies a number of high dimensional statistical problems involving different classes of mixing processes. 

9. Random forest robustness, variable importance, and tree aggregation

Random forest methodology is a nonparametric, machine learning approach capable of strong performance in regression and classification problems involving complex data sets. In addition to making predictions, random forests can be used to assess the relative importance of feature variables. This dissertation explores three topics related to random forests: tree aggregation, variable importance, and robustness. 

10. Climate Data Computing: Optimal Interpolation, Averaging, Visualization and Delivery

This dissertation solves two important problems in the modern analysis of big climate data. The first is the efficient visualization and fast delivery of big climate data, and the second is a computationally extensive principal component analysis (PCA) using spherical harmonics on the Earth’s surface. The second problem creates a way to supply the data for the technology developed in the first. Both problems are computationally difficult; one example is the representation of higher-order spherical harmonics such as Y400, which is critical for upscaling weather data to almost infinitely fine spatial resolution.

I hope you enjoyed learning about these compelling machine learning dissertations.

Editor’s note: Interested in more data science research? Check out the Research Frontiers track at ODSC Europe this September 17-19 or the ODSC West Research Frontiers track this October 27-30.


Daniel Gutierrez, ODSC

Daniel D. Gutierrez is a practicing data scientist who has been working with data since long before the field came into vogue. As a technology journalist, he enjoys keeping a pulse on this fast-paced industry. Daniel is also an educator, having taught data science, machine learning and R classes at the university level. He has authored four computer industry books on database and data science technology, including his most recent title, “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R.” Daniel holds a BS in Mathematics and Computer Science from UCLA.


Analytics Insight

10 Best Research and Thesis Topic Ideas for Data Science in 2022


These research and thesis topics in data science can help both students and scholars build knowledge and skills.

  • Handling practical video analytics in a distributed cloud: With increased dependence on the internet, sharing videos has become a common mode of data and information exchange. Internet of Things (IoT) deployments, telecom infrastructure, and operators play a large role in generating insights from video analytics. Several questions remain open here, such as how efficient existing analytics systems are and what would change if real-time analytics were integrated.
  • Smart healthcare systems using big data analytics: Big data analytics plays a significant role in making healthcare more efficient, accessible, and cost-effective. It enhances the operational efficiency of smart healthcare providers by providing real-time analytics, and it enhances the capabilities of intelligent systems by using short-span data-driven insights, but distinct challenges in this field have yet to be addressed.
  • Identifying fake news using real-time analytics: The circulation of fake news has become a pressing issue in the modern era. Data gathered from social media networks may seem legitimate but sometimes are not. The sources that provide the data are unauthenticated most of the time, which makes this a crucial issue to address.
  • Secure federated learning with real-world applications: Federated learning is a technique that trains an algorithm across multiple decentralized edge devices and servers. The technique can be adopted to build models locally, but whether it can be deployed at scale, across multiple platforms and with high-level security, is still unclear.
  • Big data analytics and its impact on marketing strategy: The advent of data science and big data analytics has entirely redefined the marketing industry. It has helped enterprises by offering valuable insights into their existing and future customers. But several issues, such as surplus data, integrating complex data into customers’ journeys, and complete data privacy, remain largely unexplored and need immediate attention.
  • Impact of big data on business decision-making: Current studies indicate that big data has transformed the way managers and business leaders make critical decisions concerning the growth and development of the business. It allows them to access objective data and analyse market environments, enabling companies to adapt rapidly and make decisions faster. Working on this topic will help students understand present market and business conditions and analyse new solutions.
  • Implementing big data to understand consumer behaviour: In understanding consumer behaviour, big data is used to analyse the data points depicting a consumer’s journey after buying a product. Data gives a clearer picture of specific scenarios. This topic will help students understand the problems that businesses face in using these insights and develop new strategies to generate more ROI.
  • Applications of big data to predict future demand and forecasting: Predictive analytics in data science has emerged as an integral part of decision-making and demand forecasting. Working on this topic will enable students to determine the significance of high-quality historical data analysis and the factors that drive higher demand among consumers.
  • The importance of data exploration over data analysis: Exploration enables a deeper understanding of the dataset, making it easier to navigate and use the data later. Intelligent analysts must understand the differences between data exploration and analysis and use each according to specific needs to fulfill organizational requirements.
  • Data science and software engineering: Software engineering and development are a major part of data science. Skilled data professionals should learn and explore the various technical and software skills needed to perform critical AI and big data tasks.
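For the federated learning topic in the list above, the core training loop can be sketched in a few lines of Python. This is a minimal illustration in the style of federated averaging, assuming a one-parameter model `y = w * x` and three hypothetical clients. It does not implement any of the security layers the topic asks about (for example secure aggregation or encryption), and the function names are illustrative.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one client's private data for y = w * x."""
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(clients, rounds=20, w0=0.0):
    """Each round: broadcast w, train locally on every client, average weighted by data size."""
    w = w0
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        local_ws = [local_update(w, data) for data in clients]
        # Only the updated weights reach the server; raw (x, y) pairs stay on the clients.
        w = sum(len(data) * lw for data, lw in zip(clients, local_ws)) / total
    return w


# Hypothetical client datasets, all generated from y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0), (1.0, 2.0)],
    [(2.0, 4.0)],
]
w = federated_average(clients)
print(w)
```

The open research questions named in the topic start where this sketch stops: scaling the loop to many heterogeneous devices and protecting the exchanged updates.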


Machine Learning - CMU

PhD Dissertations


[all are .pdf files].

Reliable and Practical Machine Learning for Dynamic Healthcare Settings Helen Zhou, 2023

Automatic customization of large-scale spiking network models to neuronal population activity (unavailable) Shenghao Wu, 2023

Estimation of BVk functions from scattered data (unavailable) Addison J. Hu, 2023

Rethinking object categorization in computer vision (unavailable) Jayanth Koushik, 2023

Advances in Statistical Gene Networks Jinjin Tian, 2023

Post-hoc calibration without distributional assumptions Chirag Gupta, 2023

The Role of Noise, Proxies, and Dynamics in Algorithmic Fairness Nil-Jana Akpinar, 2023

Collaborative learning by leveraging siloed data Sebastian Caldas, 2023

Modeling Epidemiological Time Series Aaron Rumack, 2023

Human-Centered Machine Learning: A Statistical and Algorithmic Perspective Leqi Liu, 2023

Uncertainty Quantification under Distribution Shifts Aleksandr Podkopaev, 2023

Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There Benjamin Eysenbach, 2023

Comparing Forecasters and Abstaining Classifiers Yo Joong Choe, 2023

Using Task Driven Methods to Uncover Representations of Human Vision and Semantics Aria Yuan Wang, 2023

Data-driven Decisions - An Anomaly Detection Perspective Shubhranshu Shekhar, 2023

Applied Mathematics of the Future Kin G. Olivares, 2023



Principled Machine Learning for Societally Consequential Decision Making Amanda Coston, 2023

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Maxwell B. Wang

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Darby M. Losey, 2023

Calibrated Conditional Density Models and Predictive Inference via Local Diagnostics David Zhao, 2023

Towards an Application-based Pipeline for Explainability Gregory Plumb, 2022

Objective Criteria for Explainable Machine Learning Chih-Kuan Yeh, 2022

Making Scientific Peer Review Scientific Ivan Stelmakh, 2022

Facets of regularization in high-dimensional learning: Cross-validation, risk monotonization, and model complexity Pratik Patil, 2022

Active Robot Perception using Programmable Light Curtains Siddharth Ancha, 2022

Strategies for Black-Box and Multi-Objective Optimization Biswajit Paria, 2022

Unifying State and Policy-Level Explanations for Reinforcement Learning Nicholay Topin, 2022

Sensor Fusion Frameworks for Nowcasting Maria Jahja, 2022

Equilibrium Approaches to Modern Deep Learning Shaojie Bai, 2022

Towards General Natural Language Understanding with Probabilistic Worldbuilding Abulhair Saparov, 2022

Applications of Point Process Modeling to Spiking Neurons (Unavailable) Yu Chen, 2021

Neural variability: structure, sources, control, and data augmentation Akash Umakantha, 2021

Structure and time course of neural population activity during learning Jay Hennig, 2021

Cross-view Learning with Limited Supervision Yao-Hung Hubert Tsai, 2021

Meta Reinforcement Learning through Memory Emilio Parisotto, 2021

Learning Embodied Agents with Scalably-Supervised Reinforcement Learning Lisa Lee, 2021

Learning to Predict and Make Decisions under Distribution Shift Yifan Wu, 2021

Statistical Game Theory Arun Sai Suggala, 2021

Towards Knowledge-capable AI: Agents that See, Speak, Act and Know Kenneth Marino, 2021

Learning and Reasoning with Fast Semidefinite Programming and Mixing Methods Po-Wei Wang, 2021

Bridging Language in Machines with Language in the Brain Mariya Toneva, 2021

Curriculum Learning Otilia Stretcu, 2021

Principles of Learning in Multitask Settings: A Probabilistic Perspective Maruan Al-Shedivat, 2021

Towards Robust and Resilient Machine Learning Adarsh Prasad, 2021

Towards Training AI Agents with All Types of Experiences: A Unified ML Formalism Zhiting Hu, 2021

Building Intelligent Autonomous Navigation Agents Devendra Chaplot, 2021

Learning to See by Moving: Self-supervising 3D Scene Representations for Perception, Control, and Visual Reasoning Hsiao-Yu Fish Tung, 2021

Statistical Astrophysics: From Extrasolar Planets to the Large-scale Structure of the Universe Collin Politsch, 2020

Causal Inference with Complex Data Structures and Non-Standard Effects Kwhangho Kim, 2020

Networks, Point Processes, and Networks of Point Processes Neil Spencer, 2020

Dissecting neural variability using population recordings, network models, and neurofeedback (Unavailable) Ryan Williamson, 2020

Predicting Health and Safety: Essays in Machine Learning for Decision Support in the Public Sector Dylan Fitzpatrick, 2020

Towards a Unified Framework for Learning and Reasoning Han Zhao, 2020

Learning DAGs with Continuous Optimization Xun Zheng, 2020

Machine Learning and Multiagent Preferences Ritesh Noothigattu, 2020

Learning and Decision Making from Diverse Forms of Information Yichong Xu, 2020

Towards Data-Efficient Machine Learning Qizhe Xie, 2020

Change modeling for understanding our world and the counterfactual one(s) William Herlands, 2020

Machine Learning in High-Stakes Settings: Risks and Opportunities Maria De-Arteaga, 2020

Data Decomposition for Constrained Visual Learning Calvin Murdock, 2020

Structured Sparse Regression Methods for Learning from High-Dimensional Genomic Data Micol Marchetti-Bowick, 2020

Towards Efficient Automated Machine Learning Liam Li, 2020

Learning Collections of Functions Emmanouil Antonios Platanios, 2020

Provable, structured, and efficient methods for robustness of deep networks to adversarial examples Eric Wong, 2020

Reconstructing and Mining Signals: Algorithms and Applications Hyun Ah Song, 2020

Probabilistic Single Cell Lineage Tracing Chieh Lin, 2020

Graphical network modeling of phase coupling in brain activity (unavailable) Josue Orellana, 2019

Strategic Exploration in Reinforcement Learning - New Algorithms and Learning Guarantees Christoph Dann, 2019

Learning Generative Models using Transformations Chun-Liang Li, 2019

Estimating Probability Distributions and their Properties Shashank Singh, 2019

Post-Inference Methods for Scalable Probabilistic Modeling and Sequential Decision Making Willie Neiswanger, 2019

Accelerating Text-as-Data Research in Computational Social Science Dallas Card, 2019

Multi-view Relationships for Analytics and Inference Eric Lei, 2019

Information flow in networks based on nonstationary multivariate neural recordings Natalie Klein, 2019

Competitive Analysis for Machine Learning & Data Science Michael Spece, 2019

The When, Where and Why of Human Memory Retrieval Qiong Zhang, 2019

Towards Effective and Efficient Learning at Scale Adams Wei Yu, 2019

Towards Literate Artificial Intelligence Mrinmaya Sachan, 2019

Learning Gene Networks Underlying Clinical Phenotypes Under SNP Perturbations From Genome-Wide Data Calvin McCarter, 2019

Unified Models for Dynamical Systems Carlton Downey, 2019

Anytime Prediction and Learning for the Balance between Computation and Accuracy Hanzhang Hu, 2019

Statistical and Computational Properties of Some "User-Friendly" Methods for High-Dimensional Estimation Alnur Ali, 2019

Nonparametric Methods with Total Variation Type Regularization Veeranjaneyulu Sadhanala, 2019

New Advances in Sparse Learning, Deep Networks, and Adversarial Learning: Theory and Applications Hongyang Zhang, 2019

Gradient Descent for Non-convex Problems in Modern Machine Learning Simon Shaolei Du, 2019

Selective Data Acquisition in Learning and Decision Making Problems Yining Wang, 2019

Anomaly Detection in Graphs and Time Series: Algorithms and Applications Bryan Hooi, 2019

Neural dynamics and interactions in the human ventral visual pathway Yuanning Li, 2018

Tuning Hyperparameters without Grad Students: Scaling up Bandit Optimisation Kirthevasan Kandasamy, 2018

Teaching Machines to Classify from Natural Language Interactions Shashank Srivastava, 2018

Statistical Inference for Geometric Data Jisu Kim, 2018

Representation Learning @ Scale Manzil Zaheer, 2018

Diversity-promoting and Large-scale Machine Learning for Healthcare Pengtao Xie, 2018

Distribution and Histogram (DIsH) Learning Junier Oliva, 2018

Stress Detection for Keystroke Dynamics Shing-Hon Lau, 2018

Sublinear-Time Learning and Inference for High-Dimensional Models Enxu Yan, 2018

Neural population activity in the visual cortex: Statistical methods and application Benjamin Cowley, 2018

Efficient Methods for Prediction and Control in Partially Observable Environments Ahmed Hefny, 2018

Learning with Staleness Wei Dai, 2018

Statistical Approach for Functionally Validating Transcription Factor Bindings Using Population SNP and Gene Expression Data Jing Xiang, 2017

New Paradigms and Optimality Guarantees in Statistical Learning and Estimation Yu-Xiang Wang, 2017

Dynamic Question Ordering: Obtaining Useful Information While Reducing User Burden Kirstin Early, 2017

New Optimization Methods for Modern Machine Learning Sashank J. Reddi, 2017

Active Search with Complex Actions and Rewards Yifei Ma, 2017

Why Machine Learning Works George D. Montañez, 2017

Source-Space Analyses in MEG/EEG and Applications to Explore Spatio-temporal Neural Dynamics in Human Vision Ying Yang, 2017

Computational Tools for Identification and Analysis of Neuronal Population Activity Pengcheng Zhou, 2016

Expressive Collaborative Music Performance via Machine Learning Gus (Guangyu) Xia, 2016

Supervision Beyond Manual Annotations for Learning Visual Representations Carl Doersch, 2016

Exploring Weakly Labeled Data Across the Noise-Bias Spectrum Robert W. H. Fisher, 2016

Optimizing Optimization: Scalable Convex Programming with Proximal Operators Matt Wytock, 2016

Combining Neural Population Recordings: Theory and Application William Bishop, 2015

Discovering Compact and Informative Structures through Data Partitioning Madalina Fiterau-Brostean, 2015

Machine Learning in Space and Time Seth R. Flaxman, 2015

The Time and Location of Natural Reading Processes in the Brain Leila Wehbe, 2015

Shape-Constrained Estimation in High Dimensions Min Xu, 2015

Spectral Probabilistic Modeling and Applications to Natural Language Processing Ankur Parikh, 2015

Computational and Statistical Advances in Testing and Learning Aaditya Kumar Ramdas, 2015

Corpora and Cognition: The Semantic Composition of Adjectives and Nouns in the Human Brain Alona Fyshe, 2015

Learning Statistical Features of Scene Images Wooyoung Lee, 2014

Towards Scalable Analysis of Images and Videos Bin Zhao, 2014

Statistical Text Analysis for Social Science Brendan T. O'Connor, 2014

Modeling Large Social Networks in Context Qirong Ho, 2014

Semi-Cooperative Learning in Smart Grid Agents Prashant P. Reddy, 2013

On Learning from Collective Data Liang Xiong, 2013

Exploiting Non-sequence Data in Dynamic Model Learning Tzu-Kuo Huang, 2013

Mathematical Theories of Interaction with Oracles Liu Yang, 2013

Short-Sighted Probabilistic Planning Felipe W. Trevizan, 2013

Statistical Models and Algorithms for Studying Hand and Finger Kinematics and their Neural Mechanisms Lucia Castellanos, 2013

Approximation Algorithms and New Models for Clustering and Learning Pranjal Awasthi, 2013

Uncovering Structure in High-Dimensions: Networks and Multi-task Learning Problems Mladen Kolar, 2013

Learning with Sparsity: Structures, Optimization and Applications Xi Chen, 2013

GraphLab: A Distributed Abstraction for Large Scale Machine Learning Yucheng Low, 2013

Graph Structured Normal Means Inference James Sharpnack, 2013 (Joint Statistics & ML PhD)

Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data Hai-Son Phuoc Le, 2013

Learning Large-Scale Conditional Random Fields Joseph K. Bradley, 2013

New Statistical Applications for Differential Privacy Rob Hall, 2013 (Joint Statistics & ML PhD)

Parallel and Distributed Systems for Probabilistic Reasoning Joseph Gonzalez, 2012

Spectral Approaches to Learning Predictive Representations Byron Boots, 2012

Attribute Learning using Joint Human and Machine Computation Edith L. M. Law, 2012

Statistical Methods for Studying Genetic Variation in Populations Suyash Shringarpure, 2012

Data Mining Meets HCI: Making Sense of Large Graphs Duen Horng (Polo) Chau, 2012

Learning with Limited Supervision by Input and Output Coding Yi Zhang, 2012

Target Sequence Clustering Benjamin Shih, 2011

Nonparametric Learning in High Dimensions Han Liu, 2010 (Joint Statistics & ML PhD)

Structural Analysis of Large Networks: Observations and Applications Mary McGlohon, 2010

Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy Brian D. Ziebart, 2010

Tractable Algorithms for Proximity Search on Large Graphs Purnamrita Sarkar, 2010

Rare Category Analysis Jingrui He, 2010

Coupled Semi-Supervised Learning Andrew Carlson, 2010

Fast Algorithms for Querying and Mining Large Graphs Hanghang Tong, 2009

Efficient Matrix Models for Relational Learning Ajit Paul Singh, 2009

Exploiting Domain and Task Regularities for Robust Named Entity Recognition Andrew O. Arnold, 2009

Theoretical Foundations of Active Learning Steve Hanneke, 2009

Generalized Learning Factors Analysis: Improving Cognitive Models with Machine Learning Hao Cen, 2009

Detecting Patterns of Anomalies Kaustav Das, 2009

Dynamics of Large Networks Jurij Leskovec, 2008

Computational Methods for Analyzing and Modeling Gene Regulation Dynamics Jason Ernst, 2008

Stacked Graphical Learning Zhenzhen Kou, 2007

Actively Learning Specific Function Properties with Applications to Statistical Inference Brent Bryan, 2007

Approximate Inference, Structure Learning and Feature Estimation in Markov Random Fields Pradeep Ravikumar, 2007

Scalable Graphical Models for Social Networks Anna Goldenberg, 2007

Measure Concentration of Strongly Mixing Processes with Applications Leonid Kontorovich, 2007

Tools for Graph Mining Deepayan Chakrabarti, 2005

Automatic Discovery of Latent Variable Models Ricardo Silva, 2005



37 Research Topics In Data Science To Stay On Top Of

As a data scientist, staying on top of the latest research in your field is essential.

The data science landscape changes rapidly, and new techniques and tools are constantly being developed.

To keep up with the competition, you need to be aware of the latest trends and topics in data science research.

In this article, we will provide an overview of 37 hot research topics in data science.

We will discuss each topic in detail, including its significance and potential applications.

These topics could be an idea for a thesis or simply topics you can research independently.

Stay tuned – this is one blog post you don’t want to miss!

37 Research Topics in Data Science

1.) Predictive Modeling

Predictive modeling is a significant portion of data science and a topic you must be aware of.

Simply put, it is the process of using historical data to build models that can predict future outcomes.

Predictive modeling has many applications, from marketing and sales to financial forecasting and risk management.

As businesses increasingly rely on data to make decisions, predictive modeling is becoming more and more important.

While it can be complex, predictive modeling is a powerful tool that gives businesses a competitive advantage.
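To make the idea concrete, here is a minimal sketch of predictive modeling: fit a straight line to historical data with ordinary least squares, then extrapolate one step ahead. The sales figures and the function names are made up for illustration, and a real project would likely use a richer model and a proper validation step.

```python
# Hypothetical monthly sales figures (illustrative data only).
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Train" on the historical data ...
slope, intercept = fit_line(months, sales)

# ... then predict an outcome that has not happened yet (month 7).
forecast = slope * 7 + intercept
print(f"Forecast for month 7: {forecast:.1f}")
```

The same pattern (learn from the past, score the future) carries over to more complex models; only the fitting step changes.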


2.) Big Data Analytics

These days, it seems like everyone is talking about big data.

And with good reason – organizations of all sizes are sitting on mountains of data, and they’re increasingly turning to data scientists to help them make sense of it all.

But what exactly is big data? And what does it mean for data science?

Simply put, big data is a term used to describe datasets that are too large and complex for traditional data processing techniques.

Big data typically refers to datasets of a few terabytes or more.

But size, or Volume (the amount of information), isn’t the only defining characteristic – big data is also characterized by its high Velocity (the speed at which data is generated) and Variety (the different types of data).

Given the enormity of big data, it’s not surprising that organizations are struggling to make sense of it all.

That’s where data science comes in.

Data scientists use various methods to wrangle big data, including distributed computing and other decentralized technologies.

With the help of data science, organizations are beginning to unlock the hidden value in their big data.

By harnessing the power of big data analytics, they can improve their decision-making, better understand their customers, and develop new products and services.

3.) Auto Machine Learning

Auto machine learning is a research topic in data science concerned with developing algorithms that can automatically learn from data without intervention.

This area of research is vital because it spares data scientists from writing custom code for every dataset.

This allows us to focus on other tasks, such as model selection and validation.

Auto machine learning algorithms can learn from data in a hands-off way for the data scientist – while still providing incredible insights.

This makes them a valuable tool for data scientists who either lack the skills to do their own analysis or are struggling with it.


4.) Text Mining

Text mining is a research topic in data science that deals with text data extraction.

This area of research is important because it allows us to get as much information as possible from the vast amount of text data available today.

Text mining techniques can extract information from text data, such as keywords, sentiments, and relationships.

This information can be used for various purposes, such as model building and predictive analytics.
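As a rough illustration of keyword extraction, the sketch below counts the most frequent non-trivial words in a piece of text. The stopword list, the sample review, and the `top_keywords` helper are all assumptions made for this example; real text mining pipelines usually add stemming, phrase detection, and a much larger stopword list.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this example).
STOPWORDS = {"the", "a", "is", "and", "of", "to", "it", "was", "but"}


def top_keywords(text, n=3):
    """Return the n most frequent words that are not stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Made-up product review used as sample input.
review = "The battery is great and the screen is great, but the battery drains fast."
keywords = top_keywords(review, n=2)
print(keywords)
```

Counts like these can then feed the model building and predictive analytics mentioned above, for example as features in a classifier.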

5.) Natural Language Processing

Natural language processing is a data science research topic that analyzes human language data.

This area of research is important because it allows us to understand and make sense of the vast amount of text data available today.

Natural language processing techniques can build predictive and interactive models from any language data.

Natural Language processing is pretty broad, and recent advances like GPT-3 have pushed this topic to the forefront.


6.) Recommender Systems

Recommender systems are an exciting topic in data science because they allow us to make better products, services, and content recommendations.

Businesses can better understand their customers and their needs by using recommender systems.

This, in turn, allows them to develop better products and services that meet the needs of their customers.

Recommender systems are also used to recommend content to users.

This can be done on an individual level or at a group level.

Think about Netflix, for example, always knowing what you want to watch!

Recommender systems are a valuable tool for businesses and users alike.

7.) Deep Learning

Deep learning is a research topic in data science that deals with artificial neural networks.

These networks are composed of multiple layers, and each layer is formed from various nodes.

Deep learning networks can learn from data similarly to how humans learn, irrespective of the data distribution.

This makes them a valuable tool for data scientists looking to build models that can learn from data independently.

The deep learning network has become very popular in recent years because of its ability to achieve state-of-the-art results on various tasks.

There seems to be a new SOTA deep learning research paper published every single day!


8.) Reinforcement Learning

Reinforcement learning is a research topic in data science that deals with algorithms that can learn on multiple levels from interactions with their environment.

This area of research is essential because it allows us to develop algorithms that learn non-greedy approaches to decision-making, helping businesses favor long-term wins over short-term ones.
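One way to see why non-greedy behavior can pay off is the discounted return, the quantity many reinforcement learning algorithms try to maximize. The sketch below compares two made-up reward sequences; the reward values and the discount factor `gamma=0.9` are assumptions chosen for illustration, and no learning happens here, only the scoring.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward t steps away is scaled by gamma**t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A greedy strategy grabs a small reward right away ...
greedy = discounted_return([5, 0, 0, 0])

# ... while a patient strategy waits for a larger reward later.
patient = discounted_return([0, 0, 0, 10])

print(f"greedy: {greedy:.2f}, patient: {patient:.2f}")
```

With these numbers the patient strategy scores higher even after discounting, which is the kind of trade-off a reinforcement learning agent has to discover from experience.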

9.) Data Visualization

Data visualization is an excellent research topic in data science because it allows us to see our data in a way that is easy to understand.

Data visualization techniques can be used to create charts, graphs, and other visual representations of data.

This allows us to see the patterns and trends hidden in our data.

Data visualization is also used to communicate results to others.

This allows us to share our findings with others in a way that is easy to understand.

There are many ways to contribute to and learn about data visualization.

Some ways include attending conferences, reading papers, and contributing to open-source projects.


10.) Predictive Maintenance

Predictive maintenance is a hot topic in data science because it allows us to prevent failures before they happen.

This is done using data analytics to predict when a failure will occur.

This allows us to take corrective action before the failure actually happens.

While this sounds simple, avoiding false positives while keeping recall is challenging and an area wide open for advancement.
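The tension between false positives and recall is easier to see with numbers. Below is a small sketch that computes precision and recall for a hypothetical failure predictor; the labels and predictions are invented for the example (1 means "failure", 0 means "no failure").

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = failure)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up ground truth and model output for eight machines.
actual_failures = [1, 0, 0, 1, 0, 1, 0, 0]
predicted_failures = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall = precision_recall(actual_failures, predicted_failures)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here one false alarm lowers precision and one missed failure lowers recall; pushing one metric up usually costs you on the other.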

11.) Financial Analysis

Financial analysis is an older topic that has been around for a while but is still a great field where contributions can be felt.

Current researchers are focused on analyzing macroeconomic data to make better financial decisions.

This is done by analyzing the data to identify trends and patterns.

Financial analysts can use this information to make informed decisions about where to invest their money.

Financial analysis is also used to predict future economic trends.

This allows businesses and individuals to prepare for potential financial hardships and enable companies to be cash-heavy during good economic conditions.

Overall, financial analysis is a valuable tool for anyone looking to make better financial decisions.


12.) Image Recognition

Image recognition is one of the hottest topics in data science because it allows us to identify objects in images.

This is done using artificial intelligence algorithms that can learn from data and understand what objects you’re looking for.

This allows us to build models that can accurately recognize objects in images and video.

This is a valuable tool for businesses and individuals who want to be able to identify objects in images.

Think about security, identification, routing, traffic, etc.

Image Recognition has gained a ton of momentum recently – for a good reason.

13.) Fraud Detection

Fraud detection is a great topic in data science because it allows us to identify fraudulent activity before it happens.

This is done by analyzing data to look for patterns and trends that may be associated with the fraud.

Once our machine learning model recognizes some of these patterns in real time, it can immediately flag the activity as potential fraud.

This allows us to take corrective action before the fraud actually happens.

Fraud detection is a valuable tool for anyone who wants to protect themselves from potential fraudulent activity.
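As a very simple stand-in for pattern-based detection, the sketch below flags transactions that sit far from the average amount, using a z-score. This is a basic statistical rule rather than a trained machine learning model, and the transaction amounts and the threshold of 2 standard deviations are assumptions for the example.

```python
from statistics import mean, pstdev


def flag_outliers(amounts, threshold=2.0):
    """Return amounts more than `threshold` standard deviations from the mean."""
    avg = mean(amounts)
    spread = pstdev(amounts)
    if spread == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - avg) / spread > threshold]


# Made-up card transactions; one is much larger than the rest.
transactions = [20, 25, 22, 19, 24, 21, 500]
suspicious = flag_outliers(transactions)
print(suspicious)
```

A production system would look at many more signals (location, timing, merchant), but the core idea of scoring how unusual an event is stays the same.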


14.) Web Scraping

Web scraping is a controversial topic in data science because it allows us to collect data from the web, which is usually data you do not own.

This is done by extracting data from websites using scraping tools that are usually custom-programmed.

This allows us to collect data that would otherwise be inaccessible.

For obvious reasons, web scraping is a unique tool – giving you data your competitors would have no chance of getting.

I think there is an excellent opportunity to create new and innovative ways to make scraping accessible for everyone, not just those who understand Selenium and Beautiful Soup.
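For a feel of what a scraper does under the hood, here is a minimal sketch that pulls every link out of a page. To stay self-contained it uses Python's built-in `html.parser` rather than Selenium or Beautiful Soup, and it parses a hard-coded HTML snippet instead of fetching a live site; the `LinkCollector` class and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hard-coded sample page (no network access needed).
SAMPLE_HTML = '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>'

parser = LinkCollector()
parser.feed(SAMPLE_HTML)
links = parser.links
print(links)
```

Before pointing anything like this at a real website, check its terms of service and robots.txt, since the data is usually not yours.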

15.) Social Media Analysis

Social media analysis is not new; many people have already created exciting and innovative algorithms to study this.

However, it is still a great data science research topic because it allows us to understand how people interact on social media.

This is done by analyzing data from social media platforms to look for insights, bots, and recent societal trends.

Once we understand these practices, we can use this information to improve our marketing efforts.

For example, if we know that a particular demographic prefers a specific type of content, we can create more content that appeals to them.

Social media analysis is also used to understand how people interact with brands on social media.

This allows businesses to understand better what their customers want and need.

Overall, social media analysis is valuable for anyone who wants to improve their marketing efforts or understand how customers interact with brands.


16.) GPU Computing

GPU computing is a fun new research topic in data science because it allows us to process data much faster than traditional CPUs.

Due to how GPUs are made, they’re incredibly proficient at intense matrix operations, outperforming traditional CPUs by very high margins.

While the computation is fast, the coding is still tricky.

There is an excellent research opportunity to bring these innovations to non-traditional modules, allowing data science to take advantage of GPU computing outside of deep learning.

17.) Quantum Computing

Quantum computing is a new research topic in data science and physics because it allows us to process data much faster than traditional computers.

It also opens the door to new types of data.

There are just some problems that can’t be solved on a classical computer.

For example, if you wanted to understand how a single atom moved around, a classical computer couldn’t handle this problem.

You’ll need to utilize a quantum computer to handle quantum mechanics problems.

This may be the “hottest” research topic on the planet right now, with some of the top researchers in computer science and physics worldwide working on it.

You could be too.


18.) Genomics

Genomics may be the only research topic that can compete with quantum computing regarding the “number of top researchers working on it.”

Genomics is a fantastic intersection of data science and biology because it allows us to understand how genes work.

This is done by sequencing the DNA of different organisms to look for insights into our own and other species.

Once we understand these patterns, we can use this information to improve our understanding of diseases and create new and innovative treatments for them.

Genomics is also used to study the evolution of different species.

Genomics is the future and a field begging for new and exciting research professionals to take it to the next step.

19.) Location-based services

Location-based services are an old and time-tested research topic in data science.

Since GPS and 4G cell phone reception became a thing, we’ve been trying to stay informed about how humans interact with their environment.

This is done by analyzing data from GPS tracking devices, cell phone towers, and Wi-Fi routers to look for insights into how humans interact.

Once we understand these practices, we can use this information to improve our geotargeting efforts, improve maps, find faster routes, and improve cohesion throughout a community.

Location-based services are used to understand the user, something every business could always use a little bit more of.

While a seemingly “stale” field, location-based services have seen a revival period with self-driving cars.


20.) Smart City Applications

Smart city applications are all the rage in data science research right now.

By harnessing the power of data, cities can become more efficient and sustainable.

But what exactly are smart city applications?

In short, they are systems that use data to improve city infrastructure and services.

This can include anything from traffic management and energy use to waste management and public safety.

Data is collected from various sources, including sensors, cameras, and social media.

It is then analyzed to identify tendencies and habits.

This information can make predictions about future needs and optimize city resources.

As more and more cities strive to become “smart,” the demand for data scientists with expertise in smart city applications is only growing.

21.) Internet Of Things (IoT)

The Internet of Things, or IoT, is an exciting new data science and sustainability research topic.

IoT is a network of physical objects embedded with sensors and connected to the internet.

These objects can include everything from alarm clocks to refrigerators; they’re all connected to the internet.

That means that they can share data with computers.

And that’s where data science comes in.

Data scientists are using IoT data to learn everything from how people use energy to how traffic flows through a city.

They’re also using IoT data to predict when an appliance will break down or when a road will be congested.

Really, the possibilities are endless.

With such a wide-open field, it’s easy to see why IoT is being researched by some of the top professionals in the world.


22.) Cybersecurity

Cybersecurity is a relatively new research topic in data science and in general, but it’s already garnering a lot of attention from businesses and organizations.

After all, with the increasing number of cyber attacks in recent years, it’s clear that we need to find better ways to protect our data.

While most of cybersecurity focuses on infrastructure, data scientists can leverage historical events to find potential exploits to protect their companies.

Sometimes, looking at a problem from a different angle helps, and that’s what data science brings to cybersecurity.

Also, data science can help to develop new security technologies and protocols.

As a result, cybersecurity is a crucial data science research area and one that will only become more important in the years to come.

23.) Blockchain

Blockchain is an incredible new research topic in data science for several reasons.

First, it is a distributed database technology that enables secure, transparent, and tamper-proof transactions.

Did someone say transmitting data?

This makes it an ideal platform for tracking data and transactions in various industries.

Second, blockchain is powered by cryptography, which not only makes it highly secure – but is a familiar foe for data scientists.

Finally, blockchain is still in its early stages of development, so there is much room for research and innovation.

As a result, blockchain is a great new research topic in data science that vows to revolutionize how we store, transmit and manage data.


24.) Sustainability

Sustainability is a relatively new research topic in data science, but it is gaining traction quickly.

To keep up with this demand, The Wharton School of the University of Pennsylvania has started to offer an MBA in Sustainability.

This demand isn’t shocking, and some of the reasons include the following:

Sustainability is an important issue that is relevant to everyone.

Datasets on sustainability are constantly growing and changing, making it an exciting challenge for data scientists.

There hasn’t been a “set way” to approach sustainability from a data perspective, making it an excellent opportunity for interdisciplinary research.

As data science grows, sustainability will likely become an increasingly important research topic.

25.) Educational Data

Education has always been a great topic for research, and with the advent of big data, educational data has become an even richer source of information.

By studying educational data, researchers can gain insights into how students learn, what motivates them, and what barriers these students may face.

Besides, data science can be used to develop educational interventions tailored to individual students’ needs.

Imagine being the researcher that helps that high schooler pass mathematics; what an incredible feeling.

With the increasing availability of educational data, data science has enormous potential to improve the quality of education.


26.) Politics

As data science continues to evolve, so does the scope of its applications.

Originally used primarily for business intelligence and marketing, data science is now applied to various fields, including politics.

By analyzing large data sets, political scientists (data scientists with a cooler name) can gain valuable insights into voting patterns, campaign strategies, and more.

Further, data science can be used to forecast election results and understand the effects of political events on public opinion.

With the wealth of data available, there is no shortage of research opportunities in this field.

As data science evolves, so does our understanding of politics and its role in our world.

27.) Cloud Technologies

Cloud technologies are a great research topic.

They allow for the outsourcing and sharing of computer resources and applications all over the internet.

This lets organizations save money on hardware and maintenance costs while providing employees access to the latest and greatest software and applications.

I believe there is an argument that AWS could be the greatest and most technologically advanced business ever built (Yes, I know it’s only part of the company).

Besides, cloud technologies can help improve team members’ collaboration by allowing them to share files and work on projects together in real-time.

As more businesses adopt cloud technologies, data scientists must stay up-to-date on the latest trends in this area.

By researching cloud technologies, data scientists can help organizations to make the most of this new and exciting technology.


28.) Robotics

Robotics has recently become a household name, and it’s for a good reason.

First, robotics deals with controlling and planning physical systems, an inherently complex problem.

Second, robotics requires various sensors and actuators to interact with the world, making it an ideal application for machine learning techniques.

Finally, robotics is an interdisciplinary field that draws on various disciplines, such as computer science, mechanical engineering, and electrical engineering.

As a result, robotics is a rich source of research problems for data scientists.

29.) HealthCare

Healthcare is an industry that is ripe for data-driven innovation.

Hospitals, clinics, and health insurance companies generate a tremendous amount of data daily.

This data can be used to improve the quality of care and outcomes for patients.

This is perfect timing, as the healthcare industry is undergoing a significant shift towards value-based care, which means there is a greater need than ever for data-driven decision-making.

As a result, healthcare is an exciting new research topic for data scientists.

There are many different ways in which data can be used to improve healthcare, and there is a ton of room for newcomers to make discoveries.


30.) Remote Work

There’s no doubt that remote work is on the rise.

In today’s global economy, more and more businesses are allowing their employees to work from home or anywhere else they can get a stable internet connection.

But what does this mean for data science? Well, for one thing, it opens up a whole new field of research.

For example, how does remote work impact employee productivity?

What are the best ways to manage and collaborate on data science projects when team members are spread across the globe?

And what are the cybersecurity risks associated with working remotely?

These are just a few of the questions that data scientists will be able to answer with further research.

So if you’re looking for a new topic to sink your teeth into, remote work in data science is a great option.

31.) Data-Driven Journalism

Data-driven journalism is an exciting new field of research that combines the best of both worlds: the rigor of data science with the creativity of journalism.

By applying data analytics to large datasets, journalists can uncover stories that would otherwise be hidden.

And telling these stories compellingly can help people better understand the world around them.

Data-driven journalism is still in its infancy, but it has already had a major impact on how news is reported.

In the future, it will only become more important as journalists become increasingly fluent in data.

It is an exciting new topic and research field for data scientists to explore.


32.) Data Engineering

Data engineering is a staple in data science, focusing on efficiently managing data.

Data engineers are responsible for developing and maintaining the systems that collect, process, and store data.

In recent years, there has been an increasing demand for data engineers as the volume of data generated by businesses and organizations has grown exponentially.

Data engineers must be able to design and implement efficient data-processing pipelines and have the skills to optimize and troubleshoot existing systems.

If you are looking for a challenging research topic that could have an immediate worldwide impact, then improving existing approaches or inventing a new one in data engineering would be a good start.

33.) Data Curation

Data curation has been a hot topic in the data science community for some time now.

Curating data involves organizing, managing, and preserving data so researchers can use it.

Data curation can help to ensure that data is accurate, reliable, and accessible.

It can also help to prevent research duplication and to facilitate the sharing of data between researchers.

Data curation is a vital part of data science. In recent years, there has been an increasing focus on data curation, as it has become clear that it is essential for ensuring data quality.

As a result, data curation is now a major research topic in data science.

There are numerous books and articles on the subject, and many universities offer courses on data curation.

Data curation is an integral part of data science and will only become more important in the future.


34.) Meta-Learning

Meta-learning is gaining a ton of steam in data science. It’s learning how to learn.

So, if you can learn how to learn, you can learn anything much faster.

Meta-learning is mainly used in deep learning, as applications outside of this are generally pretty hard.

In deep learning, many parameters need to be tuned for a good model, and there’s usually a lot of data.

You can save time and effort if you can automatically and quickly do this tuning.

In machine learning, meta-learning can improve models’ performance by sharing knowledge between different models.

For example, if you have a bunch of different models that all solve the same problem, then you can use meta-learning to share the knowledge between them to improve the group’s overall performance.

I don’t know how anyone looking for a research topic could stay away from this field; it’s what the Terminator warned us about!

35.) Data Warehousing

A data warehouse is a system used for data analysis and reporting.

It is a central data repository created by combining data from multiple sources.

Data warehouses are often used to store historical data, such as sales data, financial data, and customer data.

This type of data can be used to create reports and perform statistical analysis.

Data warehouses also store data that the organization is not currently using.

This type of data can be used for future research projects.
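Here is a toy sketch of the core warehouse idea: load data from two separate sources into one place, then answer a reporting question that needs both. It uses an in-memory SQLite database from Python's standard library as a stand-in for a real warehouse; the table names, columns, and rows are all invented for the example.

```python
import sqlite3

# In-memory database standing in for a central repository.
conn = sqlite3.connect(":memory:")

# Source 1: sales records (hypothetical).
conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100.0), (2, 50.0), (1, 25.0)],
)

# Source 2: customer records (hypothetical).
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "East"), (2, "West")],
)

# Report: total sales per region, which needs both sources combined.
report = conn.execute(
    """
    SELECT c.region, SUM(s.amount)
    FROM sales AS s
    JOIN customers AS c ON s.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
conn.close()

print(report)
```

Real warehouses add scheduled loading, history tracking, and far larger volumes, but the join-then-aggregate pattern is what most reports boil down to.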

Data warehousing is an incredible research topic in data science because it offers a variety of benefits.

Data warehouses help organizations to save time and money by reducing the need for manual data entry.

They also help to improve the accuracy of reports and provide a complete picture of the organization’s performance.

Data warehousing feels like one of the weakest parts of the Data Science Technology Stack; if you want a research topic that could have a monumental impact – data warehousing is an excellent place to look.


36.) Business Intelligence

Business intelligence aims to collect, process, and analyze data to help businesses make better decisions.

Business intelligence can improve marketing, sales, customer service, and operations.

It can also be used to identify new business opportunities and track competition.

BI is another tool in your company’s toolbox to continue dominating your area.

Data science is the perfect tool for business intelligence because it combines statistics, computer science, and machine learning.

Data scientists can use business intelligence to answer questions like, “What are our customers buying?” or “What are our competitors doing?” or “How can we increase sales?”

Business intelligence is a great way to improve your business’s bottom line and an excellent opportunity to dive deep into a well-respected research topic.

37.) Crowdsourcing

One of the newest areas of research in data science is crowdsourcing.

Crowdsourcing is a process of sourcing tasks or projects to a large group of people, typically via the internet.

This can be done for various purposes, such as gathering data, developing new algorithms, or even just for fun (think: online quizzes and surveys).

But what makes crowdsourcing so powerful is that it allows businesses and organizations to tap into a vast pool of talent and resources they wouldn’t otherwise have access to.

And with the rise of social media, it’s easier than ever to connect with potential crowdsource workers worldwide.

Imagine if you could affect that, finding innovative ways to improve how people work together.

That would have a huge effect.


Final Thoughts, Are These Research Topics In Data Science For You?

Thirty-seven different research topics in data science are a lot to take in, but we hope you found a research topic that interests you.

If not, don’t worry – there are plenty of other great topics to explore.

The important thing is to get started with your research and find ways to apply what you learn to real-world problems.

We wish you the best of luck as you begin your data science journey!


Dylan Kaplan

  • 37 Research Topics In Data Science To Stay On Top Of - February 22, 2024



Data Science

Research Areas


The world is being transformed by data and data-driven analysis is rapidly becoming an integral part of science and society. Stanford Data Science is a collaborative effort across many departments in all seven schools. We strive to unite existing data science research initiatives and create interdisciplinary collaborations, connecting the data science and related methodologists with disciplines that are being transformed by data science and computation.

Our work supports research in a variety of fields where incredible advances are being made through the facilitation of meaningful collaborations between domain researchers, with deep expertise in societal and fundamental research challenges, and methods researchers that are developing next-generation computational tools and techniques, including:

Data Science for Wildland Fire Research

In recent years, wildfire has gone from an infrequent and distant news item to a center-stage issue spanning many consecutive weeks for urban and suburban communities. Frequent wildfires are changing everyday lives for California in numerous ways -- from public safety power shutoffs to hazardous air quality -- that seemed inconceivable as recently as 2015. Moreover, elevated wildfire risk in the western United States (and similar climates globally) is here to stay into the foreseeable future. There is a plethora of problems that need solutions in the wildland fire arena; many of them are well suited to a data-driven approach.

Seminar Series

Data Science for Physics

Astrophysicists and particle physicists at Stanford and at the SLAC National Accelerator Laboratory are deeply engaged in studying the Universe at both the largest and smallest scales, with state-of-the-art instrumentation at telescopes and accelerator facilities.

Data Science for Economics

Many of the most pressing questions in empirical economics concern causal questions, such as the impact, both short and long run, of educational choices on labor market outcomes, and of economic policies on distributions of outcomes. This makes them conceptually quite different from the predictive type of questions that many of the recently developed methods in machine learning are primarily designed for.

Data Science for Education

Educational data spans K-12 school and district records, digital archives of instructional materials and gradebooks, as well as student responses on course surveys. Data science of actual classroom interaction is also of increasing interest and reality.

Data Science for Human Health

It is clear that data science will be a driving force in transitioning the world’s healthcare systems from reactive “sick-based” care to proactive, preventive care.

Data Science for Humanity

Our modern era is characterized by massive amounts of data documenting the behaviors of individuals, groups, organizations, cultures, and indeed entire societies. This wealth of data on modern humanity is accompanied by massive digitization of historical data, both textual and numeric, in the form of historic newspapers, literary and linguistic corpora, economic data, censuses, and other government data, gathered and preserved over centuries, and newly digitized, acquired, and provisioned by libraries, scholars, and commercial entities.

Data Science for Linguistics

The impact of data science on linguistics has been profound. All areas of the field depend on having a rich picture of the true range of variation, within dialects, across dialects, and among different languages. The subfield of corpus linguistics is arguably as old as the field itself and, with the advent of computers, gave rise to many core techniques in data science.

Data Science for Nature and Sustainability

Many key sustainability issues translate into decision and optimization problems and could greatly benefit from data-driven decision making tools. In fact, the impact of modern information technology has been highly uneven, mainly benefiting large firms in profitable sectors, with little or no benefit in terms of the environment. Our vision is that data-driven methods can — and should — play a key role in increasing the efficiency and effectiveness of the way we manage and allocate our natural resources.

Ethics and Data Science

With the emergence of new techniques of machine learning, and the possibility of using algorithms to perform tasks previously done by human beings, as well as to generate new knowledge, we again face a set of new ethical questions.

The Science of Data Science

The practice of data analysis has changed enormously. Data science needs to find new inferential paradigms that allow data exploration prior to the formulation of hypotheses.

A Deep Dissertion of Data Science: Related Issues and its Applications



99 Data Science Dissertation Topics | Research Ideas

By Liam, Dec 5, 2023, in Data Science, Dissertation


Data Science Dissertation Topics

Are you a student embarking on the exhilarating journey of choosing Data Science dissertation topics for your undergraduate, master’s, or doctoral dissertation? The realm of Data Science offers a plethora of opportunities to explore and contribute to this dynamic field through your dissertation research. Selecting the right Data Science dissertation topic is crucial, as it will set the course for your academic exploration and potentially open doors to exciting career prospects. In this article, we will provide you with a comprehensive list of Data Science dissertation topics tailored to different degree levels, ensuring you find the perfect subject to delve into for your upcoming dissertation project.

In conclusion, the world of Data Science is brimming with possibilities, waiting for you to unlock its secrets through your dissertation research. Whether you are pursuing an undergraduate, master’s, or doctoral degree, there are numerous captivating Data Science dissertation topics to choose from. Remember to select a topic that aligns with your interests, skills, and career aspirations, as it will be the cornerstone of your academic journey and future success in the field of Data Science. So, dive into the realm of Data Science, embark on your dissertation adventure, and make a meaningful contribution to this ever-evolving discipline.

A list of Data Science Dissertation Topics:

Analyzing the influence of regional variations on data science adoption and applications in the UK.

Analyzing the ethical implications of using AI in healthcare decision-making.

Examining the use of deep reinforcement learning in robotics.

Analyzing the role of data science in enhancing customer experience in retail.

Analyzing the impact of data science on political campaign strategies.

Exploring the use of data science in addressing climate change challenges specific to the UK.

Examining the use of natural language processing for sentiment analysis in social media.

Investigating the potential of data science in predicting natural disasters.

Assessing the effectiveness of machine learning in credit risk assessment.

Assessing the impact of data analytics in optimizing energy consumption.

Investigating the challenges of data quality in healthcare analytics.

Analyzing the influence of data-driven decision-making in public policy.

Analyzing the impact of bias and fairness issues in machine learning algorithms.

Analyzing the role of data analytics in predicting and managing future pandemics.

Assessing the effectiveness of data-driven marketing in the tourism industry.

Examining the role of data science in improving healthcare diagnostics.

Assessing the ethical implications of AI and machine learning algorithms in the UK criminal justice system.

Evaluating the role of data analytics in enhancing urban planning and smart cities initiatives in the UK.

Assessing the impact of data analytics in optimizing online advertising campaigns.

Analyzing the role of data science in predicting customer behavior in e-commerce.

Investigating the effectiveness of data mining techniques in fraud detection.

Assessing the use of machine learning in personalized education recommendations.

Exploring the synergy of data science and artificial intelligence for predictive analytics.

Investigating the application of data science in predicting customer preferences in online retail.

Examining the use of natural language processing for healthcare chatbots.

Investigating the use of data science for personalized education and skill development in the UK.

Examining the role of data analytics in optimizing supply chain management.

Examining the ethical considerations in using AI for mental health support.

Analyzing the evolution of data science applications in the fields of healthcare and life sciences.

Analyzing the influence of data-driven decision-making in cybersecurity strategies.

Exploring the use of natural language processing for tracking and analyzing COVID-19 misinformation in online social networks.

Assessing the role of data science in predicting disease outbreaks.

Analyzing the role of data science in personalized healthcare treatments.

Assessing the implications of data science in improving environmental monitoring and conservation efforts.

Analyzing the role of data science in improving urban planning.

Assessing the impact of data analytics in improving student performance in education.

Assessing the effectiveness of machine learning in predictive maintenance.

Examining the ethical considerations of AI-driven decision-making in healthcare post-COVID-19.

Examining the application of data-driven methods in climate change modeling.

Investigating the impact of Brexit on data sharing and collaborations in the UK data science ecosystem.

Investigating the impact of remote work on data privacy and security in the post-COVID era.

Exploring the role of data analytics in optimizing energy consumption and sustainability in the UK.

Examining the ethical considerations in using AI for criminal justice decisions.

Examining the ethical issues in AI-driven personalized content recommendation.

Analyzing the role of data science in optimizing manufacturing processes.

Analyzing societal trends through data science: a sociological perspective.

Assessing the impact of data analytics on predicting customer churn.

Investigating the challenges of data quality in financial data analysis.

Exploring the implications of remote learning data for educational policy and student outcomes post-COVID.

Analyzing the impact of GDPR on data privacy and data science practices in the UK.

Examining the ethical considerations in using AI for environmental monitoring.

Investigating the challenges of data integration in multi-modal sensor networks.

Examining the ethical considerations in using AI for hiring and HR decisions.

Analyzing the influence of data preprocessing techniques on predictive modeling outcomes.

Examining the use of deep learning in image and video recognition.

Examining the use of deep reinforcement learning for autonomous driving.

Investigating the application of data science in predicting consumer trends in fashion.

Investigating the role of data science in improving cybersecurity and threat detection.

Reviewing recent advancements in data science techniques for anomaly detection.

Investigating the ethical issues in AI-driven autonomous vehicles.

Investigating the intersection of data science and artificial intelligence in autonomous systems.

Assessing the effectiveness of machine learning in speech recognition.

Investigating the challenges of data privacy in social media analytics.

Evaluating the role of data science in supporting mental health services and well-being during and after the pandemic.

Analyzing the influence of data visualization on data-driven decision-making.

Analyzing the role of data science in predicting employee turnover.

Evaluating the effectiveness of data-driven decision-making in business and industry.

Evaluating the effectiveness of data-driven approaches in addressing public health challenges in the UK.

Investigating the application of data science in predicting traffic patterns.

Examining the contribution of data science to financial decision-making and risk management in the UK.

Investigating the challenges of data integration in healthcare informatics.

Examining the ethical issues in AI-powered virtual assistants.

Analyzing the influence of data-driven recommendations in the entertainment industry.

Enhancing food safety and quality with data science applications in food science.

Investigating the impact of machine learning algorithms on stock market prediction accuracy.

Examining the use of natural language processing for language translation.

Investigating the impact of big data analytics on e-commerce recommendation systems.

Investigating the use of data science for personalized marketing strategies.

Assessing the fairness and bias in machine learning algorithms for loan approval.

Evaluating the effectiveness of anomaly detection techniques in cybersecurity.

Analyzing the challenges and opportunities of data privacy in the era of IoT.

Analyzing the influence of data-driven decision-making in sports analytics.

Assessing the effectiveness of machine learning in sentiment analysis of news articles.

Analyzing the role of data science in predicting food supply chain disruptions.

Assessing the effectiveness of machine learning models in predicting stock price movements.

Assessing the use of machine learning algorithms to enhance contact tracing efforts in the context of infectious disease outbreaks.

Analyzing the influence of data-driven decision-making in disaster response.

Examining the use of deep learning for medical image analysis.

Assessing the impact of data analytics in optimizing energy grid operations.

Investigating the challenges of data integration in large-scale data projects.

Investigating the evolution of consumer behavior and sentiment analysis during the pandemic.

Exploring the applications of data science in social sciences and public policy research.

Examining the challenges and opportunities in data science for sustainable development.

Evaluating the adoption and effectiveness of telehealth data analytics during and after the COVID-19 pandemic.

Investigating the challenges of data privacy in genetic research.

Investigating the application of data science in predicting real estate market trends.

Assessing the performance of deep learning models in natural language processing tasks.

Analyzing the impact of COVID-19 on data-driven supply chain management and optimization.

Assessing the impact of data analytics in optimizing customer service.

Examining the use of deep reinforcement learning in autonomous drones.

There you go. Use the list of Data Science dissertation topics well, and let us know if you have any comments, suggestions for future topics-related blog posts, or want help with dissertation writing.



Data Science Dissertation Topics

Data science is an interdisciplinary field that combines statistical and computational methods to extract insights and knowledge from data. Students in this field study statistics, programming, machine learning, data visualization, and data management using tools such as Python, R, Tableau, Power BI, AWS, and Azure. To help students, we offer a list of trending data science dissertation topics.


Premier Dissertations has produced a list of the latest dissertation topics in data science for 2024.

If you would like to choose any topic from the list below, simply drop us a WhatsApp message or an email.


3-Step Dissertation Process!

  • Get 3+ Topics
  • Dissertation Proposal
  • Get Final Dissertation

List of Latest Data Science Research Topics 2024

Quality Research Topics in Data Science

Trending Thesis Topics in Data Science for 2024

How Does It Work?

  • Fill the Form: Please fill in the free topic form and share your requirements.
  • Writer Starts Working: The writer starts to find a topic for you (based on your requirements).
  • 3+ Topics Emailed: The writer shares custom topics with you within 24 hours.

Get Expert Advice Before Deciding Topics in Data Science 

Getting expert advice before finalizing your data science project topics is crucial. Experts can guide you in choosing a relevant and feasible research area, ensuring your study aligns with current trends and challenges in data science. Their insights can help you refine your ideas and make informed decisions, setting the foundation for a successful and impactful dissertation. Seeking expert advice is like having a reliable map for your research journey, helping you navigate the complexities of data science.


For more data science thesis topics, please keep checking our website as we keep adding new topics to our existing list of titles. GOOD LUCK!


Alumni Perspective: Do I Need a Master’s Degree for Data Science? 

Alex Bass stands outside with mountain vista during fall

In March 2021, after researching for quite some time, I decided to do a master’s in data science. A little over two years later, I received my M.S. in data science from the University of Virginia. 

I hope to provide readers with some insight into the benefits (and drawbacks) of pursuing a master’s degree in the data science field. 

I’ll break up my thoughts into three main sections: my experience and tips in deciding to do a master’s degree; my experience and tips in choosing a program; and, other reflections and notes. 

My Experience Deciding to Go Back to School 

This path is unique to every person, of course. It’s worth noting that most data scientists have at least a master’s degree, but it is not necessarily required. This is at least according to a few links I found Googling around but also matches my personal experience as all my data scientist colleagues at my current workplace have a master’s or a Ph.D. Regardless, I’ll walk you through my logic in making my own decision.

Money: This depends on the program you choose. But still, any graduate program will generally require a monetary sacrifice (at least for my U.S. friends). I was fortunate enough to attend a low-cost private institution (about $5,000 a year in tuition) for my undergrad. Working through school and with help from my parents, I graduated without debt. Because I didn’t have debt, I was more willing to spend money on a master’s degree a year after my undergrad. This, however, won’t be the case with everyone, especially if you aren’t interested in more debt or want more of a break from school.

Time Commitment: This also depends on the program you choose. Because I knew that I wouldn’t receive financial help from my parents for my master’s degree, I wanted the option to work full-time while attending school part-time to offset the cost. This was a trade-off as I had to sacrifice many nights and weekends over two years. With that being said, I also graduated without any debt because I was able to work full-time. Regardless of your program and how flexible it is, getting a master’s degree is a time commitment. You need to set aside X time to take Y credits no matter how long you spread it out.

Field Value: How valuable will this be for my career if completed? I think this depends on each situation. For example, someone with an undergrad STEM degree may have an easier time breaking into data science than someone with a social science degree. Therefore, a master’s degree in data science would be more valuable for a person with a degree in a social science field than with a degree in a STEM field.

However, if you are already working as a data scientist, it probably doesn’t matter what degree you had in your undergrad at this point because experience generally is seen as more valuable than education when getting jobs. 

Educational Value: I believe the act of learning is intrinsically valuable, and I think degrees help you learn. Certainly, you don’t need a degree to learn about a subject. But if you are a person who benefits from professors with answers and a structured curriculum, I think the learning opportunity from a degree is intrinsically valuable. If you are not that person, there are so many online courses and online resources for learning everything about data science. It is a field that can be learned through self-study. For me, I wanted more exposure to the field of data science in a structured, comprehensive curriculum, which I received!

Decision: At the time, I decided to go forward with a master’s because I had the time, the means, and a gap in my knowledge. I also believed that this degree would help my professional career, which it did. From the start to the end of the program, I went from data analyst to data scientist, and my salary doubled.

Tip: A master’s degree in data science is especially valuable to those who have never worked in data science before and do not have a STEM degree in undergrad. I found myself in this position, which made a master’s degree very valuable for me in my opinion.

My Experience Choosing a Program 

I investigated and researched several different programs. Here are some of the things I looked for:

Focus: As you start looking into this, you’ll notice that there are broad programs and more focused ones. I would say that typically a master’s degree is considered broader than a Ph.D., which is designed to hone in on one very specific piece in a field. But even among master’s degrees you see more broad ones: M.S. in data science (this was my program) or analytics or information and then you have M.S. in data analytics and public policy or M.S. in computer vision or M.S. in quantitative social science.

Generally, you can expect the broad ones to cover a wider range of topics and the other ones to take a stronger focus on one part of data science. But, the reality of data science is… it’s a super broad field with applications in most other fields. There are natural language processing data scientists, biology data scientists, public health data scientists, political data scientists, robotics data scientists, marketing data scientists, sports data scientists, and on and on. 

While subject knowledge is important in each of these applications of data science, often the methods are more similar than different. If you are convinced that you will only be in one corner of data science, perhaps it would be better to do a focused one, but for me, I was interested in exploring the field more so than choosing a corner.

Cost: Price varies significantly from $10,000 total to more than $100,000. My program ended up costing $40,000, but my salary more than doubled from the start to the end of the program, and the salary differential fully covered the cost of the program and more!

Technical Prerequisites: Different programs are geared toward different crowds. Some will have no or very few prerequisites. Personally, these programs worried me because I assumed that programs with few prerequisites would not cover topics in the technical depth I desired. I didn’t want a bunch of classes just covering things I already learned in my undergrad. I wanted to learn after all!

My specific program had four prerequisites at the time: calculus I & II, linear algebra, and programming in Python. I had already taken calculus in school but ended up taking community classes for linear algebra and programming in Python.

Technical Depth of Program: For those committed to a degree, I highly encourage you to look through the required courses in the program. Often, you can also find summaries and even syllabi for the classes in the program. A lot of programs that did not have prerequisites also did not seem to have technical depth in classes. What drew me to my program was there were classes on all the topics I was interested in (deep learning, Bayesian machine learning, NLP, and also more engineering components such as Spark/big data and data structures and algorithms).

Looking back at my program, I am pleased with the courses I took. I learned so much in most of them and was exposed to so many interesting things from computer vision to NLP to CS search algorithms that I had no idea about with my political science background.

Decision: I decided on UVA because I was satisfied with the technical depth of the program, and the cost was not more than I could handle in my position at the time. It offered the flexibility I was looking for and opportunities for connections among classmates and professors. I also appreciated how broad the courses were, covering many facets of data science throughout the program.

Reflections and Notes 

Tip 1: Of everything, I wish I paid more attention to program outcomes. Answering questions like, where do the alumni work? Does my program feed into specific companies? This is a HUGE advantage of certain programs. For example, UVA feeds into Capital One and has strong contacts and connections there for graduates. If you are interested in working at Capital One, my program would be a great choice!

If I were to do everything over again, I’m not sure I would change anything. But, if I did, I would apply to certain programs that have strong big tech connections such as the University of California, Berkeley; the University of Washington; and several other California schools. I may have considered Georgia Tech’s program, which has a strong reputation, and the tuition is about $10,000.

Tip 2: My program was fully online. In truth, it was one of the best online programs I could hope for. My cohort was about 30 students that I stuck with throughout the program. There were numerous group projects and assignments, and I got to know the students in my cohort. With the relatively small class sizes, I also felt like I got to know several of my professors.

Having said all of this, I felt like I had a better experience and developed better connections during undergrad, which was in person. For these reasons, if I were to do it again, I would try to find an in-person program. This may not be possible for everyone (some may want to work or are unable to relocate), but I do think in-person provides a richer academic experience.

Tip 3: It can be helpful to just search around on LinkedIn and see the paths that current data scientists took to get where they are. Often, they do not have a master’s degree in data science but have a degree in physics, computer science, or some other STEM field. In my opinion, most STEM degrees are seen as roughly equally viable for data science job candidates. So, if you are more interested in studying another STEM topic but want to work in data science, go study that topic and you can still get a job in data science.

Learn more about Alex, his data science journey, and his projects on his blog.


Data Column | Institute for Advanced Analytics


The Collaborative Blog for Students in the Master of Science in Analytics


Philosophy and Data Science

Studying philosophy will make you a better data scientist.

Now, I might be biased because I did study both philosophy and statistics as an undergraduate. Still, the soft skills that are vital for the modern data scientist are just what philosophy teaches. Asking the right questions, thinking critically about ethics, clear communication, and problem-solving are all core competencies of both the philosopher and the data scientist, and I’d like to talk a bit more about each of them here. My hope is that by the end of this article, you’ll be inspired to pick up a little Plato!

Philosophy strengthens and enriches your domain knowledge by teaching you to ask better questions.

Data science is a very broadly applicable skill set across many different domains, which means that throughout your career, you might play in many different people’s backyards. This makes data science an exciting field to work in, but it also requires you to rapidly get situated and ask the right questions about topics that might be foreign to you. Philosophy is, in large part, the study of how to ask the right questions.

For example, I am very passionate about the environment, so in college, I took an elective course called “Environmental Ethics.” I spent a lot of time that semester reading, discussing, and rebutting or extending a diverse set of arguments (about the nature of nature, how we ought to relate to our environment, and what we should think about when discussing environmental policy). Grappling with that material helped me adapt quickly by asking better questions about my postgraduate work at the Environmental Protection Agency. Without philosophy, I would not have had as strong an understanding of the ways my work would impact the natural world. And if you need more convincing, the logic course I took freshman year also comes in handy when I’m writing SQL queries!
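To make the remark about logic and SQL a little more concrete, here is a minimal sketch of one rule from an introductory logic course, De Morgan's law, as it might apply to rewriting a query filter. The column names `is_active` and `is_flagged` are hypothetical, and the sketch deliberately ignores SQL's three-valued NULL logic.

```python
from itertools import product


def negated_conjunction(is_active: bool, is_flagged: bool) -> bool:
    # Mirrors a filter such as: WHERE NOT (is_active AND is_flagged)
    return not (is_active and is_flagged)


def disjunction_of_negations(is_active: bool, is_flagged: bool) -> bool:
    # Mirrors the rewritten filter: WHERE (NOT is_active) OR (NOT is_flagged)
    return (not is_active) or (not is_flagged)


def filters_agree() -> bool:
    # Check every combination of truth values (a four-row truth table).
    return all(
        negated_conjunction(a, f) == disjunction_of_negations(a, f)
        for a, f in product([True, False], repeat=2)
    )


print(filters_agree())
```

Checking the full truth table is the same move a logic student makes on paper; in a real query, nullable columns would need separate handling.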

Philosophy teaches you ethics.

Statistics is the science of generalizations – making broad inferences across demographics and identities is a core part of the discipline. As a data scientist, you put yourself at risk of over-generalizing or discriminating beyond what is appropriate by applying statistics without asking the right questions. Philosophers have spent millennia formulating, refining, and studying those questions, and by studying ethics, you can tap into that vast reservoir of knowledge. Once you’re comfortable grappling with questions like “What does it mean to blame somebody?” or “Does it make sense to ‘deserve’ punishment?” then questions like “Should our loan eligibility model be race-blind?” become much easier to answer.

Furthermore, as a data scientist, your work will often be extensive in scope and your results or models general in nature. It’s easy enough to treat an individual kindly, but designing an algorithm that doesn’t accidentally harm anyone is much more difficult. How do you ensure that your algorithm comes up with the best response for the average user without absolutely ruining the lives of a small group of users? Ethicists have been discussing this very issue for decades, and engaging with that literature will make you a stronger, safer, and more ethical data scientist.

Philosophy reinforces valuable data science skills like communication and problem-solving.

Clear, concise, and well-contextualized writing is one of the core values of contemporary analytic philosophy, and it should be one of the main goals of any good data scientist. The shorter and easier your paper is to read, the more effectively it brings a reader into the context of the broader discussion. Getting lots of practice writing clear and easy-to-follow papers on topics that are often hideously thorny, and receiving feedback on that writing, is a great way to become a better data scientist.

Problem-solving is a vital skill for the aspiring data scientist, but it is also one of the hardest skills to master. You have to swim through an ocean of problems until you start to get a sense of what works and what doesn’t. If the problems of philosophy were easy to solve, philosophy would cease to exist. Spending a few years wrestling with problems that are so hard they’ve stuck around for centuries, if not millennia, will make you a stronger problem-solver and help build your intuition for what may or may not work. As with math and coding, you’ll get really good at thinking about possible edge cases!

In general, while hard technical skills are certainly important for data scientists, our focus on those skills can lead us to overlook all the valuable soft skills we can gain from studying other subjects. Every day, I use the problem-solving, ethics, and writing skills that I worked so hard on in undergrad. If philosophy isn’t your cup of tea, I urge you to think about how the other paths you may have walked before coming to data science can inform your current work. That being said, I hope I have piqued your interest and maybe even persuaded you to check out some philosophy books, videos, or syllabi. If you’d like to talk philosophy (or climate, or math, or music!), please feel free to reach out!

Columnist: Henry Williams




  1. Computational and Data Sciences (PhD) Dissertations

    Optimal Analytical Methods for High Accuracy Cardiac Disease Classification and Treatment Based on ECG Data, Jianwei Zheng. Dissertations from 2020: Development of Integrated Machine Learning and Data Science Approaches for the Prediction of Cancer Mutation and Autonomous Drug Discovery of Anti-Cancer Therapeutic Agents, Steven Agajanian.

  2. 17 Compelling Machine Learning Ph.D. Dissertations

    This dissertation revisits and makes progress on some old but challenging problems concerning least squares estimation, the work-horse of supervised machine learning. Two major problems are addressed: (i) least squares estimation with heavy-tailed errors, and (ii) least squares estimation in non-Donsker classes.

  3. How to write a great data science thesis

    They will stress the importance of structure, substance and style. They will urge you to write down your methodology and results first, then progress to the literature review, introduction and conclusions and to write the summary or abstract last. To write clearly and directly with the reader's expectations always in mind.

  4. Top 10 Essential Data Science Topics to Real-World Application From the

    1. Introduction. Statistics and data science are more popular than ever in this era of data explosion and technological advances. Decades ago, John Tukey (Brillinger, 2014) said, "The best thing about being a statistician is that you get to play in everyone's backyard." More recently, Xiao-Li Meng (2009) said, "We no longer simply enjoy the privilege of playing in or cleaning up everyone ...

  5. Doctor of Data Science and Analytics Dissertations

    The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus of application and research - where students will engage in real world business problems, which will inform and guide their research interests. We launched the first formal PhD program in Data Science in 2015.

  6. Recent Dissertation Topics

    This list of recent dissertation topics shows the range of research areas that our students are working on.

  7. Getting a PhD in Data Science: What You Need to Know

    A PhD in Data Science is a research degree that typically takes four to five years to complete but can take longer depending on a range of personal factors. In addition to taking more advanced courses, PhD candidates devote a significant amount of time to teaching and conducting dissertation research with the intent of advancing the field.

  8. Five Tips For Writing A Great Data Science Thesis

    Although educational programs, conventions and thesis requirements vary wildly, I hope to offer some common guidelines for any student currently working on a Data Science thesis. The article offers five guidance points, but may effectively be summarized in a single line: "Write for your reader, not for yourself."

  9. Thesis/Capstone for Master's in Data Science

    Capstone and thesis are similar in that they both represent a culminating, scholarly effort of high quality. Both should clearly state a problem or issue to be addressed. Both will allow students to complete a larger project and produce a product or publication that can ...

  10. Thesis Option

    Data Science master's students can choose to satisfy the research experience requirement by selecting the thesis option. Students will spend the majority of their second year working on a substantial data science project that culminates in the submission and oral defense of a master's thesis. While all thesis projects must be related to data science, students are given leeway in finding a ...

  11. Ten Research Challenge Areas in Data Science

    Abstract. To drive progress in the field of data science, we propose 10 challenge areas for the research community to pursue. Since data science is broad, with methods drawing from computer science, statistics, and other disciplines, and with applications appearing in all sectors, these challenge areas speak to the breadth of issues spanning ...

  12. MIT Theses

    MIT's DSpace contains more than 58,000 theses completed at MIT dating as far back as the mid 1800's. Theses in this collection have been scanned by the MIT Libraries or submitted in electronic format by thesis authors. Since 2004 all new Masters and Ph.D. theses are scanned and added to this collection after degrees are awarded.

  13. PhD in Data Science

    PhD in Analytics and Data Science. Students pursuing a PhD in analytics and data science at Kennesaw State University must complete 78 credit hours: 48 course hours and 6 electives (spread over 4 years of study), a minimum 12 credit hours for dissertation research, and a minimum 12 credit-hour internship.

  14. 10 Compelling Machine Learning Ph.D. Dissertations for 2020

    This dissertation explores three topics related to random forests: tree aggregation, variable importance, and robustness. 10. Climate Data Computing: Optimal Interpolation, Averaging, Visualization and Delivery. This dissertation solves two important problems in the modern analysis of big climate data.

  15. 10 Best Research and Thesis Topic Ideas for Data Science in 2022

    The best course of action to amplify the robustness of a resume is to participate or take up different data science projects. In this article, we have listed 10 such research and thesis topic ideas to take up as data science projects in 2022. Handling practical video analytics in a distributed cloud: With increased dependency on the internet ...

  16. Top Data Science Ph.D. Dissertations (2019-2020)

    The American Mathematical Society (AMS) recently published in its Notices monthly journal a long list of all the doctoral degrees conferred from July 1, 2019 to June 30, 2020 for mathematics and statistics. The degrees come from 242 departments in 186 universities in the U.S.

  17. PhD Dissertations

    PhD Dissertations [All are .pdf files] Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There Benjamin Eysenbach, 2023. Data-driven Decisions - An Anomaly Detection Perspective Shubhranshu Shekhar, 2023. METHODS AND APPLICATIONS OF EXPLAINABLE MACHINE LEARNING Joon Sik Kim, 2023. Applied Mathematics of the Future Kin G. Olivares, 2023

  18. 37 Research Topics In Data Science To Stay On Top Of » EML

    37 Research Topics in Data Science. 1.) Predictive modeling. Predictive modeling is a significant portion of data science and a topic you must be aware of. Simply put, it is the process of using historical data to build models that can predict future outcomes.

  19. Research Areas

    The world is being transformed by data and data-driven analysis is rapidly becoming an integral part of science and society. Stanford Data Science is a collaborative effort across many departments in all seven schools. We strive to unite existing data science research initiatives and create interdisciplinary collaborations ...

  20. A Deep Dissertion of Data Science: Related Issues and its Applications

    Data Science refers to a study of extracting, collection, gathering data, representing and protecting data to be used for business purposes or in technical issues. Despite the fact that the name Data Science appears like something which meant, databases and software engineering, various types of quantitative and qualitative aptitudes including nonmathematical abilities are additionally ...

  21. 99 Data Science Dissertation Topics

    A list of Data Science Dissertation Topics: Analyzing the influence of regional variations on data science adoption and applications in the UK. Analyzing the ethical implications of using AI in healthcare decision-making. Examining the use of deep reinforcement learning in robotics.

  22. Data Science (with Dissertation)

    You'll learn new skills and ways of thinking as you explore data acquisition, preparation, transformation and modelling. For your dissertation, you'll work with your supervisor to decide on a topic and explore new approaches and the latest developments in data science. 3 reasons to study Science at Murdoch. Build your expertise in the ...

  23. What Is A Master's In Data Science?

    Master's in data science programs typically require a capstone or thesis project to graduate, allowing learners to demonstrate mastery of their data science knowledge under faculty guidance.

  24. Data Science Dissertation Topics

    Research Aim: This data science thesis topic aims to develop data science techniques that predict user behaviour on social media platforms. It identifies key predictors of user behaviour, such as user demographics, interests, and online activity. It develops machine learning models that accurately predict user behaviour based on these factors.

  25. Alumni Perspective: Do I Need a Master's Degree for Data Science?

    Alex Bass. February 15, 2024. In March 2021, after researching for quite some time, I decided to do a master's in data science. A little over two years later, I received my M.S. in data science from the University of Virginia. I hope to provide readers with some insight into the benefits (and drawbacks) of pursuing a master's degree in the ...
