What do senior theses in Statistics look like?

This is a brief overview of thesis writing; for more information, please see our complete guide here. Senior theses in Statistics cover a wide range of topics, across the spectrum from applied to theoretical. Typically, senior theses are expected to have one of the following three flavors:

1. Novel statistical theory or methodology, supported by extensive mathematical and/or simulation results, along with a clear account of how the research extends or relates to previous related work.

2. An analysis of a complex data set that advances understanding in a related field, such as public health, economics, government, or genetics. Such a thesis may rely entirely on existing methods, but should give useful results and insights into an interesting applied problem.                                                                                 

3. An analysis of a complex data set in which new methods or modifications of published methods are required. While the thesis does not necessarily contain an extensive mathematical study of the new methods, it should contain strong plausibility arguments or simulations supporting the use of the new methods.

A good thesis is clear, readable, and well-motivated, justifying the applicability of the methods used rather than, for example, mechanically running regressions without discussing the assumptions (and whether they are plausible), performing diagnostics, and checking whether the conclusions make sense. 


Master's Thesis

As an integral component of the Master of Science in Statistical Science program, you can submit and defend a Master's Thesis. Your Master's Committee administers this oral examination. If you choose to defend a thesis, it is advisable to commence your research early, ideally during your second semester or the summer following your first year in the program. It's essential to allocate sufficient time for the thesis writing process. Your thesis advisor, who also serves as the committee chair, must approve both your thesis title and proposal. The final thesis work necessitates approval from all committee members and must adhere to the  Master's thesis requirements  set forth by the Duke University Graduate School.

Master’s BEST Award 

Each second-year Duke Master of Statistical Science (MSS) student defending their MSS thesis may be eligible for the Master's BEST Award. The Statistical Science faculty BEST Award Committee selects the awardee based on the submitted theses, and the award is presented at the departmental graduation ceremony.

Thesis Proposal

All second-year students choosing to do a thesis must submit a proposal (no more than two pages), approved by their thesis advisor, to the Master's Director via Qualtrics by November 10th. The thesis proposal should include a title, the thesis advisor, committee members, and a description of the work. The description must introduce the research topic, outline its main objectives, and emphasize the significance of the research and its implications while identifying gaps in the existing statistical literature. It may also include some preliminary results.

Committee members

MSS students have a thesis committee of three faculty members: two must be departmental primary faculty, and the third may be from an external department in an applied area of the student's interest, provided they are Term Graduate Faculty through the Graduate School or hold a secondary appointment with the Department of Statistical Science. All committee members must be familiar with the student's work. The department coordinates committee approval; the thesis defense committee must be approved at least 30 days before the defense date.

Thesis Timeline and Departmental Process:

Before Defense:

Intent to Graduate: Students must file an Intent to Graduate in ACES, specifying "Thesis Defense" during the application. For graduation deadlines, please refer to https://gradschool.duke.edu/academics/preparing-graduate.

Scheduling Thesis Defense: The student collaborates with the committee to set the date and time for the defense and communicates this information to the department, along with the thesis title. The defense must be scheduled during regular class sessions. Be sure to review the thesis defense and submission deadlines at https://gradschool.duke.edu/academics/theses-and-dissertations/

Room Reservations: The department arranges room reservations and sends confirmation details to the student, who informs committee members of the location.

Defense Announcement: The department prepares a defense announcement, providing a copy to the student and chair. After approval, it is signed by the Master's Director and submitted to the Graduate School. Copies are also posted on department bulletin boards.

Initial Thesis Submission: Two weeks before the defense, the student submits the initial thesis to the committee and the Graduate School. Detailed thesis formatting guidelines can be found at https://gradschool.duke.edu/academics/theses-and-dissertations.

Advisor Notification: The student requests that the advisor email [email protected], confirming the candidate's readiness for defense. This step should be completed before the exam card appointment.

Format Check Appointment: One week before the defense, the Graduate School contacts the student to schedule a format check appointment. Upon approval, the Graduate School provides the Student Master’s Exam Card, which enables the student to send a revised thesis copy to committee members.

MSS Annual Report Form: The department provides the student with the MSS Annual Report Form to be presented at the defense.

Post Defense:

Communication of Defense Outcome: The committee chair conveys the defense results to the student, including any necessary follow-up actions in case of an unsuccessful defense.

In Case of Failure: If a student does not pass the thesis defense, the committee's decision to fail the student must be accompanied by explicit and clear comments from the chair, specifying deficiencies and areas that require attention for improvement.

Documentation: The student should ensure that the committee signs the Title Page, Abstract Page, and Exam Card.

Annual Report Form: The committee chair completes the Annual Report Form.

Master's Director Approval: The Master's director must provide their approval by signing the Exam Card.

Form Submission: Lastly, the committee chair is responsible for returning all completed and signed forms to the Department.

Final Thesis Submission: The student must meet the Graduate School requirement by submitting the final version of their thesis to the Graduate School via ProQuest before the specified deadline. For detailed information, visit https://gradschool.duke.edu/academics/preparinggraduate.

Past Theses

  • The Stochastic Proximal Distance Algorithm
  • Logistic-tree Normal Mixture for Clustering Microbiome Compositions
  • Inference for Dynamic Treatment Regimes using Overlapping Sampling Splitting
  • Bayesian Modeling for Identifying Selection in B Cell Maturation
  • Differentially Private Verification with Survey Weights
  • Stable Variable Selection for Sparse Linear Regression in a Non-uniqueness Regime  
  • A Cost-Sensitive, Semi-Supervised, and Active Learning Approach for Priority Outlier Investigation
  • Bayesian Decoupling: A Decision Theory-Based Approach to Bayesian Variable Selection
  • A Differentially Private Bayesian Approach to Replication Analysis
  • Numerical Approximation of Gaussian-Smoothed Optimal Transport
  • Computational Challenges to Bayesian Density Discontinuity Regression
  • Hierarchical Signal Propagation for Household Level Sales in Bayesian Dynamic Models
  • Logistic Tree Gaussian Processes (LoTgGaP) for Microbiome Dynamics and Treatment Effects
  • Bayesian Inference on Ratios Subject to Differentially Private Noise
  • Multiple Imputation Inferences for Count Data
  • An Euler Characteristic Curve Based Representation of 3D Shapes in Statistical Analysis
  • An Investigation Into the Bias & Variance of Almost Matching Exactly Methods
  • Comparison of Bayesian Inference Methods for Probit Network Models
  • Differentially Private Counts with Additive Constraints
  • Multi-Scale Graph Principal Component Analysis for Connectomics
  • MCMC Sampling Geospatial Partitions for Linear Models
  • Bayesian Dynamic Network Modeling with Censored Flow Data  
  • An Application of Graph Diffusion for Gesture Classification
  • Easy and Efficient Bayesian Infinite Factor Analysis
  • Analyzing Amazon CD Reviews with Bayesian Monitoring and Machine Learning Methods
  • Missing Data Imputation for Voter Turnout Using Auxiliary Margins
  • Generalized and Scalable Optimal Sparse Decision Trees
  • Construction of Objective Bayesian Prior from Bertrand’s Paradox and the Principle of Indifference
  • Rethinking Non-Linear Instrumental Variables
  • Clustering-Enhanced Stochastic Gradient MCMC for Hidden Markov Models
  • Optimal Sparse Decision Trees
  • Bayesian Density Regression with a Jump Discontinuity at a Given Threshold
  • Forecasting the Term Structure of Interest Rates: A Bayesian Dynamic Graphical Modeling Approach
  • Testing Between Different Types of Poisson Mixtures with Applications to Neuroscience
  • Multiple Imputation of Missing Covariates in Randomized Controlled Trials
  • A Bayesian Strategy to the 20 Question Game with Applications to Recommender Systems
  • Applied Factor Dynamic Analysis for Macroeconomic Forecasting
  • A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results
  • Bayesian Inference Via Partitioning Under Differential Privacy
  • A Bayesian Forward Simulation Approach to Establishing a Realistic Prior Model for Complex Geometrical Objects
  • Two Applications of Summary Statistics: Integrating Information Across Genes and Confidence Intervals with Missing Data

Statistical Analysis in Research: Meaning, Methods and Types


The scientific method is an empirical approach to acquiring new knowledge by making skeptical observations and analyses to develop a meaningful interpretation. It is the basis of research and the primary pillar of modern science. Researchers seek to understand the relationships between factors associated with the phenomena of interest. In some cases, research works with vast chunks of data, making it difficult to observe or manipulate each data point. As a result, statistical analysis in research becomes a means of evaluating relationships and interconnections between variables with tools and analytical techniques for working with large data sets. Since researchers use statistical power analysis to assess the probability of detecting an effect in such an investigation, the method is relatively accurate. Hence, statistical analysis in research eases analytical work by focusing on the quantifiable aspects of phenomena.

What is Statistical Analysis in Research? A Simplified Definition

Statistical analysis uses quantitative data to investigate patterns, relationships, and trends in order to understand real-life and simulated phenomena. The approach is a key analytical tool in various fields, including academia, business, government, and science in general. This statistical analysis in research definition implies that the primary focus of the scientific method is quantitative research. Notably, the investigator targets the constructs developed from general concepts, as researchers can quantify their hypotheses and present their findings in simple statistics.

When a business needs to learn how to improve its product, it collects statistical data about the production line and customer satisfaction. Qualitative data is valuable and often identifies the most common themes in the stakeholders' responses. Quantitative data, on the other hand, establishes a level of importance, comparing the themes based on their criticality to the affected persons. For instance, descriptive statistics highlight tendency, frequency, variation, and position information. While the mean shows the average number of respondents who value a certain aspect, the variance indicates how widely the responses are spread around that average. In any case, statistical analysis creates simplified concepts used to understand the phenomenon under investigation. It is also a key component in academia as the primary approach to data representation, especially in research projects, term papers and dissertations.

Most Useful Statistical Analysis Methods in Research

Using statistical analysis methods in research is inevitable, especially in academic assignments, projects, and term papers. It is always advisable to seek assistance from your professor, or you can try the research paper writing service CustomWritings, before you start your academic project or write the statistical analysis section of a research paper. Consulting an expert when developing a topic for your thesis or a short mid-term assignment increases your chances of getting a better grade. Most importantly, it improves your understanding of research methods, with insights on how to enhance the originality and quality of personalized essays. Professional writers can also help select the most suitable statistical analysis method for your thesis, which influences the choice of data and type of study.

Descriptive Statistics

Descriptive statistics is a statistical method that summarizes quantitative figures to understand critical details about the sample and population. A descriptive statistic is a figure that quantifies a specific aspect of the data. For instance, instead of analyzing the behavior of a thousand students, a researcher can identify the most common actions among them. In doing so, the researcher utilizes statistical analysis in research, particularly descriptive statistics.

  • Measures of central tendency. Measures of central tendency are the mean, median, and mode: averages that denote specific data points. They assess the centrality of the probability distribution, hence the name. These measures describe the data in relation to the center.
  • Measures of frequency. These statistics document the number of times an event happens. They include frequency, count, ratios, rates, and proportions. Measures of frequency can also show how often a score occurs.
  • Measures of dispersion/variation. These descriptive statistics assess the intervals between the data points. The objective is to view the spread or disparity between the specific inputs. Measures of variation include the standard deviation, variance, and range. They indicate how the spread may affect other statistics, such as the mean.
  • Measures of position. Sometimes researchers can investigate relationships between scores. Measures of position, such as percentiles, quartiles, and ranks, demonstrate this association. They are often useful when comparing the data to normalized information.
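To make these measures concrete, here is a minimal Python sketch (not part of the original article; the exam scores are invented for illustration) computing one statistic from each family:

import statistics
import numpy as np

scores = [55, 67, 67, 72, 78, 81, 85, 90, 94, 98]  # hypothetical exam scores

mean = statistics.mean(scores)            # central tendency: mean
median = statistics.median(scores)        # central tendency: median
mode = statistics.mode(scores)            # frequency: most common score (67)
stdev = statistics.stdev(scores)          # dispersion: sample standard deviation
data_range = max(scores) - min(scores)    # dispersion: range
q1, q3 = np.percentile(scores, [25, 75])  # position: first and third quartiles

print(mean, median, mode, stdev, data_range, q1, q3)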

Inferential Statistics

Inferential statistics is critical in statistical analysis in quantitative research. This approach uses statistical tests to draw conclusions about a population from sample data. Examples of inferential statistics include t-tests, F-tests, ANOVA, the Mann-Whitney U test, and the Wilcoxon W test, typically reported alongside p-values.
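For instance, a two-sample t-test takes only a few lines of Python; this is a sketch with simulated data, not taken from the article:

from scipy import stats
import numpy as np

rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)  # e.g., a control group
group_b = rng.normal(loc=5.8, scale=1.0, size=30)  # e.g., a treatment group

t_stat, p_value = stats.ttest_ind(group_a, group_b)  # two-sample t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # a small p suggests the means differ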

Common Statistical Analysis in Research Types

Although inferential and descriptive statistics can be classified as types of statistical analysis in research, they are mostly considered analytical methods. Types of research are distinguishable by the differences in the methodology employed in analyzing, assembling, classifying, manipulating, and interpreting data. The categories may also depend on the type of data used.

Predictive Analysis

Predictive research analyzes past and present data to assess trends and predict future events. An excellent example of predictive analysis is a market survey that seeks to understand customers’ spending habits to weigh the possibility of a repeat or future purchase. Such studies assess the likelihood of an action based on trends.

Prescriptive Analysis

On the other hand, a prescriptive analysis targets likely courses of action. It’s decision-making research designed to identify optimal solutions to a problem. Its primary objective is to test or assess alternative measures.

Causal Analysis

Causal research investigates the explanation behind the events. It explores the relationship between factors for causation. Thus, researchers use causal analyses to analyze root causes, possible problems, and unknown outcomes.

Mechanistic Analysis

This type of research investigates the mechanism of action. Instead of focusing only on the causes or possible outcomes, researchers may seek an understanding of the processes involved. In such cases, they use mechanistic analyses to document, observe, or learn the mechanisms involved.

Exploratory Data Analysis

Similarly, an exploratory study is extensive with a wider scope and minimal limitations. This type of research seeks insight into the topic of interest. An exploratory researcher does not try to generalize or predict relationships. Instead, they look for information about the subject before conducting an in-depth analysis.

The Importance of Statistical Analysis in Research

Statistical analysis provides critical information for decision-making. Decision-makers require past trends and predictive assumptions to inform their actions. In most cases, raw data are too complex to yield meaningful inferences on their own. Statistical tools for analyzing such details help save time and money by deriving only the information valuable for assessment. An excellent example of statistical analysis in research is a randomized controlled trial (RCT) for a Covid-19 vaccine. You can download a sample of such a document online to understand the significance such analyses have for the stakeholders. A vaccine RCT assesses the effectiveness, side effects, duration of protection, and other benefits. Hence, statistical analysis in research is a helpful tool for understanding data.



Top 3 Techniques in Thesis Statistical Analysis for PhD

Starting your journey into thesis statistical analysis for a PhD might feel a bit overwhelming, but fear not! We're here to guide you through the basics in simple terms. This blog is your go-to beginner's guide for understanding dissertation statistics and thesis statistics. We'll break down the top 3 techniques you need to know to ace your statistical analysis for that PhD journey. 

Thesis data analysis and interpretation for PhD involves applying statistical methods to interpret and draw meaningful conclusions from research data. It plays a vital role in validating hypotheses, making informed decisions, and ensuring the robustness of a doctoral thesis, contributing to the overall quality and credibility of the research.

No need for complicated jargon – we'll make it easy to grasp with thesis data analysis examples. Whether you're figuring out data interpretation, testing hypotheses, or selecting the right statistical tools, we've got your back. Join us as we unravel the mysteries, making the statistical side of your research more manageable and less intimidating. Let's dive in and make your thesis statistical analysis a breeze!

How to Conduct Thesis Data Analysis and Interpretation

1. Understand Your Data:

- Familiarize yourself with the dataset, its variables, and its structure.

- Identify outliers, missing values, and patterns.

2. Choose Appropriate Methods:

- Select statistical techniques aligned with your research questions.

- Utilize descriptive statistics, inferential statistics, or other relevant methods.

3. Conduct Thesis Data Analysis:

- Apply the chosen methods to the dataset.

- Present results through tables, graphs, or charts.

4. Interpret Findings:

- Analyze the implications of your results.

- Relate findings to your research questions and hypothesis.
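As a rough illustration of these four steps in Python (the file name and column names below are hypothetical, chosen only for the example):

import pandas as pd
from scipy import stats

# 1. Understand the data: structure, missing values, outliers
df = pd.read_csv("survey.csv")      # hypothetical dataset
print(df.describe())                # summary of each numeric variable
print(df.isna().sum())              # missing values per column

# 2-3. Choose and apply a method: compare a score across two groups
men = df.loc[df["gender"] == "M", "score"]
women = df.loc[df["gender"] == "F", "score"]
t_stat, p_value = stats.ttest_ind(men, women, nan_policy="omit")

# 4. Interpret the result relative to the research question
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")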

Now let us dive into the top 3 techniques in thesis statistical analysis for PhD which can jumpstart your research.

# Descriptive Statistics

i) Summarizes Data: Descriptive statistics, such as mean, median, and mode, offer a concise overview of central tendencies, providing a snapshot of the dataset's key features.

ii) Identifies Patterns: Through measures like standard deviation and range, it highlights the dispersion of data points, aiding in the recognition of patterns or variations within the dataset.

iii) Informs Initial Understanding: In a thesis data analysis example, descriptive statistics play a crucial role in the initial stages of thesis data analysis and interpretation. They help researchers grasp the fundamental characteristics of their data, setting the foundation for more in-depth analysis.

iv) Facilitates Data Presentation: Results from descriptive statistics can be effectively presented through tables, graphs, or charts, enhancing the visual representation of complex data sets in the context of thesis data analysis for PhD research.

v) Guides Further Analysis: Insights gained from descriptive statistics guide researchers in formulating hypotheses and determining the appropriate statistical techniques for subsequent stages of analysis.

# Hypothesis Testing

i) Validates Research Hypotheses:

- Hypothesis testing is instrumental in confirming or refuting research hypotheses established during the initial stages of the PhD research process.

ii) Ensures Statistical Significance:

- It provides a systematic approach to assess the statistical significance of relationships within the data, validating the reliability and credibility of research findings.

iii)  Guides Research Methodology Consultation:

- Hypothesis testing outcomes contribute to the refinement of the overall research methodology, ensuring that the chosen methods align with the study's objectives and provide meaningful insights.

iv) Informs Decision-Making:

- Results from hypothesis testing guide researchers in making informed decisions about the acceptance or rejection of specific hypotheses, influencing the overall direction of the research.

v) Enhances Survey Questionnaire Design Service:

- Findings from hypothesis testing contribute valuable insights for improving survey questionnaire services. They help identify key variables to measure and inform the development of survey questions that align with the study's objectives.
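In practice, the decision rule is mechanical once a significance level has been fixed. A minimal Python sketch follows; the response times and the 300 ms null value are invented for illustration:

import numpy as np
from scipy import stats

# H0: the mean response time is 300 ms (hypothetical hypothesis and data)
sample = np.array([312, 289, 305, 331, 297, 318, 322, 301, 295, 310])

t_stat, p_value = stats.ttest_1samp(sample, popmean=300)

alpha = 0.05  # significance level chosen before the analysis
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject H0")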

# Regression Analysis

i) Identifies Relationships Between Variables:

- Regression analysis allows researchers to explore and quantify the relationships between variables, providing insights into the nature and strength of these connections.

ii) Predicts Outcomes:

- It enables the prediction of outcomes based on the values of independent variables, adding a predictive dimension to the analysis in PhD thesis research.

iii)  Informs Qualitative Research Data Analysis Help for Thesis:

- Insights gained from regression analysis complement qualitative research data analysis help for the thesis. It assists in understanding the quantitative aspects of data, offering a comprehensive perspective for a more holistic analysis.

iv)   Enhances Results Chapter Writing Service for Dissertation:

- Results obtained through regression analysis contribute valuable information for the results chapter writing service for the dissertation. The interpretation of these results adds depth and context to the findings, making the results chapter more comprehensive.

v)   Supports Results with Interpretation Report:

- Regression analysis results are essential for constructing a results chapter with an interpretation report. The statistical findings help in drawing meaningful conclusions and presenting results in a format that is accessible to a broad audience.
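A small Python sketch of simple linear regression; the hours-versus-score data are invented for illustration:

from scipy import stats

hours = [1, 2, 3, 4, 5, 6, 7, 8]          # independent variable
score = [52, 55, 61, 64, 70, 74, 79, 83]  # dependent variable

result = stats.linregress(hours, score)
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}")
print(f"R^2 = {result.rvalue**2:.3f}, p = {result.pvalue:.4f}")

# prediction for a new value of the independent variable
predicted = result.intercept + result.slope * 9
print(f"predicted score after 9 hours: {predicted:.1f}")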

Final Thoughts

To sum it up, delving into the top 3 techniques for thesis statistical analysis for PhD is like having a powerful toolkit for your research adventure. We've learned that descriptive stats give us a quick peek at the data, hypothesis testing helps us decide what's truly important, and regression analysis lets us predict outcomes. 

Just like in our thesis data analysis example, these methods are the storytellers of your research. They aren't just about crunching numbers; they help you narrate a compelling tale with your data. So, whether you're just starting out or refining your skills, these techniques are your pals for a strong and credible PhD journey. Cheers to making sense of the numbers and telling your research story well!

Oliver Statistics is a research consultancy firm based in Malaysia that provides comprehensive research services to support doctoral candidates in their academic pursuits. Their team of expert statisticians and academic professionals assists in developing robust research methodologies and utilizes top-notch software for effective data analysis.

They offer a range of services, including research guidance, data analysis, and statistical consulting, to help PhD researchers in various subject domains. Oliver Statistics has created research tools and offers support services for data analytics to academics and PhD candidates worldwide. They have a team of elite academicians and expert technical researchers with subject matter experts from a variety of fields to assist PhD candidates. 

1. Why is statistical analysis important in research?

Ans. Statistical analysis is crucial in research to uncover patterns and relationships and to draw meaningful conclusions from data.

2. What is SPSS in research methodology?

Ans. SPSS (Statistical Package for the Social Sciences) is statistical software used in research methodology for data analysis and interpretation.

3. Is it necessary to use SPSS for data analysis?

Ans. The use of SPSS for data analysis is not mandatory, but it offers a powerful tool for researchers due to its versatility and statistical capabilities.

4. How do you present data analysis in a thesis?

Ans. Present data analysis in a thesis by utilizing clear visuals, such as tables and graphs, and providing a narrative that explains key findings and their significance.


Statistical Methods in Theses: Guidelines and Explanations

Signed August 2018: Naseem Al-Aidroos, PhD; Christopher Fiacconi, PhD; Deborah Powell, PhD; Harvey Marmurek, PhD; Ian Newby-Clark, PhD; Jeffrey Spence, PhD; David Stanley, PhD; Lana Trick, PhD

Version:  2.00

This document is an organizational aid and workbook for students. We encourage students to take this document to meetings with their advisor and committee. This guide should enhance a committee's ability to assess key areas of a student's work.

In recent years a number of well-known and apparently well-established findings have failed to replicate, resulting in what is commonly referred to as the replication crisis. The APA Publication Manual 6th Edition notes that “The essence of the scientific method involves observations that can be repeated and verified by others.” (p. 12). However, a systematic investigation of the replicability of psychology findings published in Science revealed that over half of psychology findings do not replicate (see a related commentary in Nature). Even more disturbing, a Bayesian reanalysis of the reproducibility project showed that 64% of studies had sample sizes so small that strong evidence for or against the null or alternative hypotheses did not exist. Indeed, Morey and Lakens (2016) concluded that most of psychology is statistically unfalsifiable due to small sample sizes and correspondingly low power (see article). Our discipline's reputation is suffering. News of the replication crisis has reached the popular press (e.g., The Atlantic, The Economist, Slate, Last Week Tonight).

An increasing number of psychologists have responded by promoting new research standards that involve open science and the elimination of Questionable Research Practices. The open science perspective is made manifest in the Transparency and Openness Promotion (TOP) guidelines for journal publications. These guidelines were adopted some time ago by the Association for Psychological Science. More recently, the guidelines were adopted by American Psychological Association journals (see details) and journals published by Elsevier (see details). It appears likely that, in the very near future, most journals in psychology will be using an open science approach. We strongly advise readers to take a moment to inspect the TOP Guidelines Summary Table.

A key aspect of open science and the TOP guidelines is the sharing of data associated with published research (with respect to medical research, see point #35 in the World Medical Association Declaration of Helsinki). This practice is viewed widely as highly important. Indeed, open science is recommended by all G7 science ministers. All Tri-Agency grants must include a data-management plan that includes plans for sharing: “research data resulting from agency funding should normally be preserved in a publicly accessible, secure and curated repository or other platform for discovery and reuse by others.” Moreover, a 2017 editorial published in the New England Journal of Medicine announced that the International Committee of Medical Journal Editors believes there is “an ethical obligation to responsibly share data.” As of this writing, 60% of highly ranked psychology journals require or encourage data sharing.

The increasing importance of demonstrating that findings are replicable is reflected in calls to make replication a requirement for the promotion of faculty (see details in Nature), and experts in open science are now refereeing applications for tenure and promotion (see details at the Center for Open Science and this article). Most dramatically, in one instance, a paper resulting from a dissertation was retracted due to misleading findings attributable to Questionable Research Practices. Subsequent to the retraction, the Ohio State University's Board of Trustees unanimously revoked the PhD of the graduate student who wrote the dissertation (see details). Thus, the academic environment is changing, and it is important to work toward using new best practices in lieu of older practices—many of which are synonymous with Questionable Research Practices. Doing so should help you avoid later career regrets and subsequent public mea culpas. One way to achieve your research objectives in this new academic environment is to incorporate replications into your research. Replications are becoming more common, and there are even websites dedicated to helping students conduct replications (e.g., Psychology Science Accelerator) and indexing the success of replications (e.g., Curate Science). You might even consider conducting a replication for your thesis (subject to committee approval).

As an early-career researcher, you should be aware of the changing academic environment. Senior principal investigators may be reluctant to engage in open science (see this student perspective in a blog post and podcast), and research on resistance to data sharing indicates that one of the barriers is that researchers do not feel they know how to share data online. This document is an educational aid and resource that provides students with introductory knowledge of how to participate in open science and online data sharing.

Guidelines and Explanations

In light of the changes in psychology, faculty members who teach statistics/methods have reviewed the literature and generated this guide for graduate students. The guide is intended to enhance the quality of student theses by facilitating their engagement in open and transparent research practices and by helping them avoid Questionable Research Practices, many of which are now deemed unethical and covered in the ethics section of textbooks.

This document is an informational tool.

How to Start

To follow best practices, a few first steps are needed. Here is a list of things to do:

  • Get an Open Science account. Registration at osf.io is easy!
  • If conducting confirmatory hypothesis testing for your thesis, pre-register your hypotheses (see Section 1-Hypothesizing). The Open Science Framework website has helpful tutorials and guides to get you going.
  • Also, pre-register your data analysis plan. Pre-registration typically includes how and when you will stop collecting data, how you will deal with violations of statistical assumptions and points of influence (“outliers”), the specific measures you will use, and the analyses you will use to test each hypothesis, possibly including the analysis script. Again, there is a lot of help available for this.

Exploratory and Confirmatory Research Are Both of Value, But Do Not Confuse the Two

We note that this document largely concerns confirmatory research (i.e., testing hypotheses). We by no means intend to devalue exploratory research. Indeed, it is one of the primary ways that hypotheses are generated for (possible) confirmation. Instead, we emphasize that it is important that you clearly indicate what of your research is exploratory and what is confirmatory. Be clear in your writing and in your preregistration plan. You should explicitly indicate which of your analyses are exploratory and which are confirmatory. Please note also that if you are engaged in exploratory research, then Null Hypothesis Significance Testing (NHST) should probably be avoided (see rationale in Gigerenzer (2004) and Wagenmakers et al. (2012)).

This document is structured around the stages of thesis work:  hypothesizing, design, data collection, analyses, and reporting – consistent with the headings used by Wicherts et al. (2016). We also list the Questionable Research Practices associated with each stage and provide suggestions for avoiding them. We strongly advise going through all of these sections during thesis/dissertation proposal meetings because a priori decisions need to be made prior to data collection (including analysis decisions). 

To help to ensure that the student has informed the committee about key decisions at each stage, there are check boxes at the end of each section.

How to Use This Document in a Proposal Meeting

  • Print off a copy of this document and take it to the proposal meeting.
  • During the meeting, use the document to seek assistance from faculty to address potential problems.
  • Revisit responses to issues raised by this document (especially the Analysis and Reporting Stages) when you are seeking approval to proceed to defense.

Consultation and Help Line

Note that the Center for Open Science now has a help line (for individual researchers and labs) you can call for help with open science issues. They also have training workshops. Please see their  website  for details.



Statistical Treatment of Data – Explained & Example

By DiscoverPhDs, September 8, 2020

Statistical Treatment of Data in Research

‘Statistical treatment’ is when you apply a statistical method to a data set to draw meaning from it. Statistical treatment can be either descriptive statistics, which describe the variables in a sample or population, or inferential statistics, which test a hypothesis by making inferences from the collected data.

Introduction to Statistical Treatment in Research

Every research student, regardless of whether they are a biologist, computer scientist or psychologist, must have a basic understanding of statistical treatment if their study is to be reliable.

This is because designing experiments and collecting data are only a small part of conducting research. The other components, which are often not so well understood by new researchers, are the analysis, interpretation and presentation of the data. This is just as important, if not more important, as this is where meaning is extracted from the study.

What is Statistical Treatment of Data?

Statistical treatment of data is when you apply some form of statistical method to a data set to transform it from a group of meaningless numbers into meaningful output.

Statistical treatment of data involves the use of statistical methods such as:

  • regression
  • conditional probability
  • standard deviation
  • distribution range

These statistical methods allow us to investigate the statistical relationships between the data and identify possible errors in the study.

In addition to being able to identify trends, statistical treatment also allows us to organise and process our data in the first place. This is because when carrying out statistical analysis of our data, it is generally more useful to draw several conclusions for each subgroup within our population than to draw a single, more general conclusion for the whole population. However, to do this, we need to be able to classify the population into different subgroups so that we can later break down our data in the same way before analysing it.

Statistical Treatment Example – Quantitative Research


For a statistical treatment of data example, consider a medical study that is investigating the effect of a drug on the human population. As the drug can affect different people in different ways based on parameters such as gender, age and race, the researchers would want to group the data into different subgroups based on these parameters to determine how each one affects the effectiveness of the drug. Categorising the data in this way is an example of performing basic statistical treatment.
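In Python, this kind of basic statistical treatment is a one-liner once the data are grouped. A sketch with invented trial data and illustrative column names:

import pandas as pd

# hypothetical trial data; the column names are illustrative only
df = pd.DataFrame({
    "gender":   ["F", "M", "F", "M", "F", "M"],
    "age_band": ["<40", "<40", "40+", "40+", "40+", "<40"],
    "response": [0.82, 0.74, 0.65, 0.58, 0.61, 0.79],
})

# break the population into subgroups before analysing the drug's effect
print(df.groupby(["gender", "age_band"])["response"].agg(["mean", "count"]))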

Types of Errors

A fundamental part of statistical treatment is using statistical methods to identify possible outliers and errors. No matter how careful we are, all experiments are subject to inaccuracies resulting from two types of errors: systematic errors and random errors.

Systematic errors are errors associated with either the equipment being used to collect the data or with the method in which they are used. Random errors are errors that occur unknowingly or unpredictably in the experimental configuration, such as internal deformations within specimens or small voltage fluctuations in measurement testing instruments.

These experimental errors, in turn, can lead to two types of conclusion errors: type I errors and type II errors. A type I error is a false positive, which occurs when a researcher rejects a true null hypothesis. On the other hand, a type II error is a false negative, which occurs when a researcher fails to reject a false null hypothesis.
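A quick simulation makes the type I error concrete. In the sketch below (not from the original article), both groups are drawn from the same distribution, so the null hypothesis is true and roughly 5% of tests should still reject it by chance:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    a = rng.normal(size=20)  # both samples share the same distribution,
    b = rng.normal(size=20)  # so every rejection is a false positive
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

print(f"type I error rate ~ {false_positives / n_experiments:.3f}")  # close to 0.05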


Thesis life: 7 ways to tackle statistics in your thesis.


By Pranav Kulkarni

A thesis is an integral part of your Master's study at Wageningen University and Research. It is the most exciting, independent and technical part of the study. More often than not, departments in WU expect students to complete a short-term independent project, or a part of a big ongoing project, for their thesis assignment.


This assignment involves proposing a research question, tackling it with the help of observations or experiments, analyzing those observations or results, and then presenting them by drawing conclusions.


The penultimate part of this process, the analysis of results, is crucial for the coherence of your thesis assignment. This analysis usually involves the use of statistical tools to help draw inferences. Most students who do not pursue statistics in their curriculum are scared by this prospect. But since it is an unavoidable part of your thesis, you can neither run from statistics nor cry for help. In order not to be intimidated by statistics and its “greco-latin” language, here are a few ways to make your journey through thesis life a pleasant experience.

Make statistics your friend

The best way to end your fear of statistics and all its paraphernalia is to befriend it. Try to learn all that you can about the techniques you will be using: why they were invented, how they were invented, and who did the deed. Personifying the story of a statistical technique makes it digestible and easy to use. Each new method in statistics comes with a unique story and loads of nerdy anecdotes.


If you cannot make friends with statistics, at least make a truce

If you still cannot bring yourself to be interested in the life and times of statistics, the best way not to hate statistics is to make an agreement with yourself. Realise that, although important, this is only part of your thesis. The better part of your thesis is something you trained for and learned. So don't let fussing over statistics make you nervous. Do your job, enjoy the thesis to the fullest, and complete the statistical section as soon as possible. At the end, you will have forgotten all about your worries and fears of statistics.

Visualize your data

The best way to understand the results and observations from your study or experiments is to visualize your data. Look for trends, patterns, or the lack thereof to understand what you are supposed to do. Moreover, graphics and illustrations can be used directly in your report. These techniques will also help you decide which statistical analyses you must perform to answer your research question. Blind decisions about statistics can influence your study and make it very confusing or, worse, completely wrong!
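For example, a couple of matplotlib plots are often enough to spot trends and outliers before any formal test; here is a sketch with simulated data:

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)          # e.g., a treatment dose
y = 2.0 * x + rng.normal(0, 2, 100)  # e.g., a measured response

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, y, s=10)              # reveals trends, patterns, outliers
ax1.set(xlabel="dose", ylabel="response", title="Scatter plot")
ax2.hist(y, bins=15)                 # reveals the shape of the distribution
ax2.set(xlabel="response", title="Histogram")
plt.tight_layout()
plt.show()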


Simplify with flowcharts and planning

Similar to graphical visualizations, making flowcharts and planning the various steps of your study can help you make statistical decisions. The human brain analyses pictorial information faster than text, so it is always easier to see your exact goal when decisions are based on a flowchart or another logical flow-plan.


Find examples on internet

Although statistics is a giant maze of complicated terminologies, the internet holds the key to this particular maze. You can find tons of examples on the web. These may be similar to what you intend to do, or they may be different applications of the same tools you wish to use. Especially for statistical programming languages like R, SAS, Python, Perl and VBA, there is a vast database of example code, clarifications and direct training examples available on the internet. Various forums are also available for specialized statistical methodologies, where experts and students discuss issues from their own projects.


Comparative studies

Much unlike blindly searching the internet for examples and taking the word of faceless people online, you can systematically learn which quantitative tests to perform by rigorously studying the literature of relevant research. Since you came up with a certain problem to tackle in your field of study, chances are someone else also came up with this issue or something quite similar. You can find solutions to many such problems by scouring the internet for research papers that address the issue. Nevertheless, you should be cautious: it is easy to get lost and disheartened when you find heavy statistical studies full of maths, derivations and cryptic symbolic text.

When all else fails, talk to an expert

All the steps above are meant to help you independently tackle whatever hurdles you encounter over the course of your thesis. But when you cannot tackle them yourself, it is always prudent, and most efficient, to ask for help. Talking to students from your thesis ring who have done something similar is one source of help. Another is to make an appointment with your supervisor and take specific questions to him or her. If that is not possible, you can contact other teaching staff or researchers from your research group. To avoid wasting their time as well as yours, make a list of the specific problems you would like to discuss. I think most are happy to help in any way possible.


Sometimes, with the help of your supervisor, you can make an appointment with someone from “Biometris”, which is WU’s statistics department. These people are the real deal; chances are they can solve all your problems without any difficulty. Always remember: you are in the process of learning, and nobody expects you to be an expert in everything. Ask for help when there seems to be no hope.

Apart from these seven ways to make your statistical journey pleasant, you should always engage in reading, watching and listening to material relevant to your thesis topic, and talk about it with those who are interested. Most questions have solutions in the ether realm of communication. So, best of luck, and break a leg!


What Is Statistical Analysis? Types, Methods and Examples

What Is Statistical Analysis?

Statistical analysis is the process of collecting and analyzing data in order to discern patterns and trends. It is a method for removing bias from evaluating data by employing numerical analysis. This technique is useful for collecting the interpretations of research, developing statistical models, and planning surveys and studies.

Statistical analysis is a scientific tool in AI and ML that helps collect and analyze large amounts of data to identify common patterns and trends to convert them into meaningful information. In simple words, statistical analysis is a data analysis tool that helps draw meaningful conclusions from raw and unstructured data. 

The conclusions are drawn using statistical analysis facilitating decision-making and helping businesses make future predictions on the basis of past trends. It can be defined as a science of collecting and analyzing data to identify trends and patterns and presenting them. Statistical analysis involves working with numbers and is used by businesses and other institutions to make use of data to derive meaningful information. 

Given below are the 6 types of statistical analysis:

Descriptive Analysis

Descriptive statistical analysis involves collecting, interpreting, analyzing, and summarizing data to present them in the form of charts, graphs, and tables. Rather than drawing conclusions, it simply makes the complex data easy to read and understand.

Inferential Analysis

Inferential statistical analysis focuses on drawing meaningful conclusions on the basis of the data analyzed. It studies the relationship between different variables or makes predictions for the whole population.

Predictive Analysis

Predictive statistical analysis analyzes data to derive past trends and predict future events on the basis of them. It uses machine learning algorithms, data mining, data modelling, and artificial intelligence to conduct the statistical analysis of data.

Prescriptive Analysis

Prescriptive analysis examines the data and prescribes the best course of action based on the results. It is a type of statistical analysis that helps you make informed decisions.

Exploratory Data Analysis

Exploratory analysis is similar to inferential analysis, but the difference is that it involves exploring the unknown data associations. It analyzes the potential relationships within the data. 

Causal Analysis

Causal statistical analysis focuses on determining the cause-and-effect relationship between different variables within the raw data. In simple words, it determines why something happens and its effect on other variables. Businesses can use this methodology to determine the reason for a failure.

Statistical analysis eliminates unnecessary information and catalogs important data in an uncomplicated manner, making the monumental work of organizing inputs manageable. Once the data has been collected, statistical analysis may be utilized for a variety of purposes. Some of them are listed below:

  • Statistical analysis helps summarize enormous amounts of data into clearly digestible chunks.
  • Statistical analysis aids the effective design of laboratory, field, and survey investigations.
  • Statistical analysis may help with solid and efficient planning in any subject of study.
  • Statistical analysis aids in establishing broad generalizations and forecasting how much of something will occur under particular conditions.
  • Statistical methods, which are effective tools for interpreting numerical data, are applied in practically every field of study. Statistical approaches have been created and are increasingly applied in the physical and biological sciences, such as genetics.
  • Statistical approaches are used in the work of a businessman, a manufacturer, and a researcher. Statistics departments can be found in banks, insurance businesses, and government agencies.
  • A modern administrator, whether in the public or commercial sector, relies on statistical data to make correct decisions.
  • Politicians can utilize statistics to support and validate their claims while also explaining the issues they address.


Statistical analysis can be called a boon to mankind and has many benefits for both individuals and organizations. Given below are some of the reasons why you should consider investing in statistical analysis:

  • It can help you determine monthly, quarterly, and yearly figures for sales, profits, and costs, making decisions easier.
  • It can help you make informed and correct decisions.
  • It can help you identify the problem or cause of a failure and make corrections. For example, it can identify the reason for an increase in total costs and help you cut wasteful expenses.
  • It can help you conduct market analysis and build an effective marketing and sales strategy.
  • It helps improve the efficiency of different processes.

Given below are the 5 steps to conduct a statistical analysis that you should follow:

  • Step 1: Identify and describe the nature of the data that you are supposed to analyze.
  • Step 2: Establish a relation between the analyzed data and the sample population from which the data was drawn.
  • Step 3: Create a model that clearly presents and summarizes the relationship between the population and the data.
  • Step 4: Validate the model.
  • Step 5: Use predictive analysis to anticipate future trends and events.

Although there are various methods used to perform data analysis, the five most widely used methods of statistical analysis are given below:

Mean

The mean, or average, is one of the most popular methods of statistical analysis. It captures the overall trend of the data and is very simple to calculate: sum the numbers in the data set and divide by the number of data points. Despite this ease of calculation, it is not advisable to rely on the mean as the only statistical indicator, as doing so can result in inaccurate decision making.

Standard Deviation

Standard deviation is another very widely used statistical tool. It measures how far individual data points deviate from the mean, that is, how spread out the data set is around its mean. You can use it to judge whether research outcomes can be generalized or not.

Regression

Regression is a statistical tool that helps model the relationship between a dependent variable and one or more independent variables. It is generally used to predict future trends and events, though on its own a fitted regression demonstrates association rather than proving cause and effect.
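As a quick illustration, here is a minimal sketch of a simple linear regression in Python using NumPy's polyfit; the spend and sales figures are made up for the example.

```python
import numpy as np

# Hypothetical data: advertising spend (independent variable)
# and sales (dependent variable), in arbitrary units
spend = np.array([10, 20, 30, 40, 50], dtype=float)
sales = np.array([25, 41, 58, 76, 95], dtype=float)

# Fit a simple linear regression (a degree-1 polynomial):
# sales ≈ slope * spend + intercept
slope, intercept = np.polyfit(spend, sales, 1)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")

# Use the fitted line to predict a future value
print(f"predicted sales at spend = 60: {slope * 60 + intercept:.1f}")
```

The fitted slope tells you how much the dependent variable changes, on average, for a one-unit change in the independent variable.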

Hypothesis Testing

Hypothesis testing can be used to test the validity of a conclusion or claim against a data set. The hypothesis is an assumption made at the beginning of the research, and the analysis results determine whether it is supported or rejected.
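For instance, a one-sample t test checks whether a sample mean is consistent with an assumed population mean. The sketch below uses SciPy with hypothetical response-time data.

```python
from scipy import stats

# Hypothetical sample: response times in milliseconds
sample = [310, 295, 325, 301, 289, 315, 330, 298, 307, 312]

# Null hypothesis: the population mean is 300 ms
t_stat, p_value = stats.ttest_1samp(sample, popmean=300)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Reject the null hypothesis at the usual 5% significance level?
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```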

Sample Size Determination

Sample size determination, or data sampling, is a technique used to derive from the entire population a sample that is representative of that population. It is used when the population is very large. You can choose from various sampling techniques, such as snowball sampling, convenience sampling, and random sampling.
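A simple random sample is straightforward to draw programmatically. The sketch below uses Python's standard library on a hypothetical population of customer IDs.

```python
import random

# Hypothetical population: 10,000 customer IDs
population = list(range(10_000))

random.seed(42)  # fixed seed so the draw is reproducible
# Draw a simple random sample of 200 IDs without replacement
sample = random.sample(population, k=200)
print(len(sample), sample[:5])
```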

Not everyone can perform complex statistical calculations accurately, which makes statistical analysis a time-consuming and costly process when done by hand. Statistical software has therefore become a very important tool for companies performing data analysis. Such software uses artificial intelligence and machine learning to perform complex calculations, identify trends and patterns, and create charts, graphs, and tables accurately within minutes.

Look at the standard deviation sample calculation given below to understand more about statistical analysis.

The weights of 5 pizza bases are as follows: 9, 2, 5, 4, and 12.

Calculation of mean = (9 + 2 + 5 + 4 + 12) / 5 = 32 / 5 = 6.4

Mean of squared deviations from the mean = (6.76 + 19.36 + 1.96 + 5.76 + 31.36) / 5 = 65.2 / 5 = 13.04

Variance = 13.04 (dividing by n gives the population variance; dividing by n − 1 instead would give the sample variance, 16.3)

Standard deviation = √13.04 ≈ 3.611
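The same calculation can be checked in a few lines of Python. Note that dividing by n, as in the example above, yields the population variance; dividing by n − 1 would give the sample variance instead.

```python
# Data from the worked example above
weights = [9, 2, 5, 4, 12]

n = len(weights)
mean = sum(weights) / n                       # 6.4
squared_deviations = [(x - mean) ** 2 for x in weights]
variance = sum(squared_deviations) / n        # 13.04 (population variance)
std_dev = variance ** 0.5                     # ≈ 3.611

print(mean, variance, round(std_dev, 3))
```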

A Statistical Analyst's career path is shaped by the industry in which they work. Many people qualify for entry-level Data Analyst positions straight out of a certificate program or a Bachelor's degree in statistics, computer science, or mathematics. Others come to data analysis from a related field such as business, economics, or the social sciences, usually by updating their skills mid-career with a statistical analytics course.

Working as a Statistical Analyst is also a good way into the normally more complex area of data science. A Data Scientist is generally a more senior role than a Data Analyst, since it is more strategic in nature and requires a more highly developed set of technical abilities, such as knowledge of multiple statistical tools, programming languages, and predictive analytics models.

Aspiring Data Scientists and Statistical Analysts generally begin their careers by learning a programming language such as R, along with SQL for querying data. Following that, they learn how to work with databases, do basic analysis, and make visuals using applications such as Tableau. Not every Statistical Analyst will need all of these skills, but if you want to advance in the profession, you should be able to do them all.

Based on your industry and the sort of work you do, you may opt to study Python or R, become an expert at data cleaning, or focus on developing complex statistical models.

You could also learn a little bit of everything, which might help you take on a leadership role and advance to Senior Data Analyst. A Senior Statistical Analyst with broad and deep knowledge might lead a team of other Statistical Analysts, and analysts with additional training may be able to advance to Data Scientist or other more senior data analytics positions.


We hope this article has helped you understand the importance of statistical analysis in every sphere of life. Artificial intelligence can help you perform statistical analysis and data analysis very effectively and efficiently.


Making Sense of Learning, pp. 385–410

Making Sense of Statistics

How Statistics Can Help Us Forward in Education

Norman Reid and Asma Amanat Ali

First Online: 26 August 2020

Part of the book series: Springer Texts in Education (SPTE)

Statistics can be seen as a tool that enables us to make sense of large amounts of data. It now underpins many areas of modern life and is widely used across many academic disciplines. It has a powerful role in helping us to interpret research data in education. In this chapter, we seek to unfold and explain some key statistical ideas. We then move on to show how these ideas can be very useful and insightful in interpreting different kinds of data. Statistics has a central place in educational research, but it also has an important role in making sense of all kinds of measurements made regularly in schools and universities today.

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write. (H. G. Wells)


Sometimes spreadsheets give the results of calculations to a huge number of decimal places, which generates considerable confusion. It is usually necessary to reduce the number of digits after the decimal point; how this is adjusted varies between software packages.



ANOVA (Analysis of variance) – Formulas, Types, and Examples


Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more means. It is similar to the t-test, but the t-test is generally used for comparing two means, while ANOVA is used when you have more than two means to compare.

ANOVA is based on comparing the variance (or variation) between the data samples to the variation within each particular sample. If the between-group variance is high and the within-group variance is low, this provides evidence that the means of the groups are significantly different.

ANOVA Terminology

When discussing ANOVA, there are several key terms to understand:

  • Factor: This is another term for the independent variable in your analysis. In a one-way ANOVA, there is one factor, while in a two-way ANOVA, there are two factors.
  • Levels: These are the different groups or categories within a factor. For example, if the factor is ‘diet’, the levels might be ‘low fat’, ‘medium fat’, and ‘high fat’.
  • Response Variable: This is the dependent variable or the outcome that you are measuring.
  • Within-group Variance: This is the variance or spread of scores within each level of your factor.
  • Between-group Variance: This is the variance or spread of scores between the different levels of your factor.
  • Grand Mean: This is the overall mean when you consider all the data together, regardless of the factor level.
  • Treatment Sums of Squares (SS): This represents the between-group variability. It is the sum of the squared differences between the group means and the grand mean.
  • Error Sums of Squares (SS): This represents the within-group variability. It’s the sum of the squared differences between each observation and its group mean.
  • Total Sums of Squares (SS): This is the sum of the Treatment SS and the Error SS. It represents the total variability in the data.
  • Degrees of Freedom (df): The degrees of freedom are the number of values that have the freedom to vary when computing a statistic. For example, if you have ‘n’ observations in one group, then the degrees of freedom for that group is ‘n − 1’.
  • Mean Square (MS): Mean Square is the average squared deviation and is calculated by dividing the sum of squares by the corresponding degrees of freedom.
  • F-Ratio: This is the test statistic for ANOVAs: the ratio of the between-group variance to the within-group variance. If the between-group variance is significantly larger than the within-group variance, the F-ratio will be large and likely significant.
  • Null Hypothesis (H0): This is the hypothesis that there is no difference between the group means.
  • Alternative Hypothesis (H1): This is the hypothesis that there is a difference between at least two of the group means.
  • p-value: This is the probability of obtaining a test statistic as extreme as the one actually observed, assuming that the null hypothesis is true. If the p-value is less than the significance level (usually 0.05), the null hypothesis is rejected in favor of the alternative hypothesis.
  • Post-hoc tests: These are follow-up tests conducted after an ANOVA when the null hypothesis is rejected, to determine which specific groups’ means (levels) differ from each other. Examples include Tukey’s HSD, Scheffé, and Bonferroni.

Types of ANOVA

Types of ANOVA are as follows:

One-way (or one-factor) ANOVA

This is the simplest type of ANOVA, which involves one independent variable. For example, comparing the effect of different types of diet (vegetarian, pescatarian, omnivore) on cholesterol level.

Two-way (or two-factor) ANOVA

This involves two independent variables. This allows for testing the effect of each independent variable on the dependent variable, as well as testing if there’s an interaction effect between the independent variables on the dependent variable.

Repeated Measures ANOVA

This is used when the same subjects are measured multiple times under different conditions, or at different points in time. This type of ANOVA is often used in longitudinal studies.

Mixed Design ANOVA

This combines features of both between-subjects (independent groups) and within-subjects (repeated measures) designs. In this model, one factor is a between-subjects variable and the other is a within-subjects variable.

Multivariate Analysis of Variance (MANOVA)

This is used when there are two or more dependent variables. It tests whether changes in the independent variable(s) correspond to changes in the dependent variables.

Analysis of Covariance (ANCOVA)

This combines ANOVA and regression. ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance for which quantitative covariates (interval variables) account. This allows the comparison of one variable outcome between groups, while statistically controlling for the effect of other continuous variables that are not of primary interest.

Nested ANOVA

This model is used when the groups can be clustered into categories. For example, if you were comparing students’ performance from different classrooms and different schools, “classroom” could be nested within “school.”

ANOVA Formulas

ANOVA Formulas are as follows:

Sum of Squares Total (SST)

This represents the total variability in the data. It is the sum of the squared differences between each observation and the overall mean:

SST = Σ (yi − y_mean)²

  • yi represents each individual data point
  • y_mean represents the grand mean (mean of all observations)

Sum of Squares Within (SSW)

This represents the variability within each group or factor level. It is the sum of the squared differences between each observation and its group mean:

SSW = Σi Σj (yij − y_meani)²

  • yij represents each individual data point within a group
  • y_meani represents the mean of the ith group

Sum of Squares Between (SSB)

This represents the variability between the groups. It is the sum of the squared differences between the group means and the grand mean, multiplied by the number of observations in each group:

SSB = Σi ni (y_meani − y_mean)²

  • ni represents the number of observations in each group
  • y_mean represents the grand mean

Degrees of Freedom

The degrees of freedom are the number of values that have the freedom to vary when calculating a statistic.

For within groups: dfW = N − k

For between groups: dfB = k − 1

For total: dfT = N − 1

  • N represents the total number of observations
  • k represents the number of groups

Mean Squares

Mean squares are the sum of squares divided by the respective degrees of freedom.

Mean Squares Between: MSB = SSB / dfB

Mean Squares Within: MSW = SSW / dfW

F-Statistic

The F-statistic is used to test whether the variability between the groups is significantly greater than the variability within the groups:

F = MSB / MSW

If the F-statistic is significantly higher than what would be expected by chance, we reject the null hypothesis that all group means are equal.

Examples of ANOVA

Example 1:

Suppose a psychologist wants to test the effect of three different types of exercise (yoga, aerobic exercise, and weight training) on stress reduction. The dependent variable is the stress level, which can be measured using a stress rating scale.

Here are hypothetical stress ratings for a group of participants after they followed each of the exercise regimes for a period:

  • Yoga: [3, 2, 2, 1, 2, 2, 3, 2, 1, 2]
  • Aerobic Exercise: [2, 3, 3, 2, 3, 2, 3, 3, 2, 2]
  • Weight Training: [4, 4, 5, 5, 4, 5, 4, 5, 4, 5]

The psychologist wants to determine if there is a statistically significant difference in stress levels between these different types of exercise.

To conduct the ANOVA:

1. State the hypotheses:

  • Null Hypothesis (H0): There is no difference in mean stress levels between the three types of exercise.
  • Alternative Hypothesis (H1): There is a difference in mean stress levels between at least two of the types of exercise.

2. Calculate the ANOVA statistics:

  • Compute the Sum of Squares Between (SSB), Sum of Squares Within (SSW), and Sum of Squares Total (SST).
  • Calculate the Degrees of Freedom (dfB, dfW, dfT).
  • Calculate the Mean Squares Between (MSB) and Mean Squares Within (MSW).
  • Compute the F-statistic (F = MSB / MSW).

3. Check the p-value associated with the calculated F-statistic.

  • If the p-value is less than the chosen significance level (often 0.05), then we reject the null hypothesis in favor of the alternative hypothesis. This suggests there is a statistically significant difference in mean stress levels between the three exercise types.

4. Post-hoc tests

  • If we reject the null hypothesis, we conduct a post-hoc test to determine which specific groups’ means (exercise types) are different from each other. (A code sketch of this whole ANOVA follows below.)
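Taking the stress ratings above as the full data set, the sketch below computes the one-way ANOVA by hand (SSB, SSW, and the F-ratio) and cross-checks the result with SciPy's f_oneway.

```python
from scipy import stats

# Stress ratings from the example above
yoga    = [3, 2, 2, 1, 2, 2, 3, 2, 1, 2]
aerobic = [2, 3, 3, 2, 3, 2, 3, 3, 2, 2]
weights = [4, 4, 5, 5, 4, 5, 4, 5, 4, 5]
groups = [yoga, aerobic, weights]

# One-way ANOVA by hand
all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / len(all_obs)

# Between-group sum of squares: each group's size times the squared
# distance of its mean from the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: squared distance of each observation
# from its own group mean
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1              # k - 1
df_within = len(all_obs) - len(groups)    # N - k
f_ratio = (ssb / df_between) / (ssw / df_within)
print(f"F = {f_ratio:.2f}")

# Cross-check with SciPy
f_stat, p_value = stats.f_oneway(yoga, aerobic, weights)
print(f"SciPy: F = {f_stat:.2f}, p = {p_value:.4g}")
```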

Example 2:

Suppose an agricultural scientist wants to compare the yield of three varieties of wheat. The scientist randomly selects four fields for each variety and plants them. After harvest, the yield from each field is measured in bushels. Here are the hypothetical yields:

The scientist wants to know if the differences in yields are due to the different varieties or just random variation.

Here’s how to apply the one-way ANOVA to this situation:

  • Null Hypothesis (H0): The means of the three populations are equal.
  • Alternative Hypothesis (H1): At least one population mean is different.
  • Calculate the Degrees of Freedom (dfB for between groups, dfW for within groups, dfT for total).
  • If the p-value is less than the chosen significance level (often 0.05), then we reject the null hypothesis in favor of the alternative hypothesis. This would suggest there is a statistically significant difference in mean yields among the three varieties.
  • If we reject the null hypothesis, we conduct a post-hoc test to determine which specific groups’ means (wheat varieties) are different from each other.

How to Conduct ANOVA

Conducting an Analysis of Variance (ANOVA) involves several steps. Here’s a general guideline on how to perform it:

  • Null Hypothesis (H0): The means of all groups are equal.
  • Alternative Hypothesis (H1): At least one group mean is different from the others.
  • The significance level (often denoted as α) is usually set at 0.05. This implies that you are willing to accept a 5% chance that you are wrong in rejecting the null hypothesis.
  • Data should be collected for each group under study. Make sure that the data meet the assumptions of an ANOVA: normality, independence, and homogeneity of variances.
  • Calculate the Degrees of Freedom (df) for each sum of squares (dfB, dfW, dfT).
  • Compute the Mean Squares Between (MSB) and Mean Squares Within (MSW) by dividing the sum of squares by the corresponding degrees of freedom.
  • Compute the F-statistic as the ratio of MSB to MSW.
  • Determine the critical F-value from the F-distribution table using dfB and dfW.
  • If the calculated F-statistic is greater than the critical F-value, reject the null hypothesis.
  • If the p-value associated with the calculated F-statistic is smaller than the significance level (0.05 typically), you reject the null hypothesis.
  • If you rejected the null hypothesis, you can conduct post-hoc tests (like Tukey’s HSD) to determine which specific groups’ means (if you have more than two groups) are different from each other.
  • Regardless of the result, report your findings in a clear, understandable manner. This typically includes reporting the test statistic, p-value, and whether the null hypothesis was rejected.

When to use ANOVA

ANOVA (Analysis of Variance) is used when you have three or more groups and you want to compare their means to see if they are significantly different from each other. It is a statistical method that is used in a variety of research scenarios. Here are some examples of when you might use ANOVA:

  • Comparing Groups : If you want to compare the performance of more than two groups, for example, testing the effectiveness of different teaching methods on student performance.
  • Evaluating Interactions : In a two-way or factorial ANOVA, you can test for an interaction effect. This means you are not only interested in the effect of each individual factor, but also whether the effect of one factor depends on the level of another factor.
  • Repeated Measures : If you have measured the same subjects under different conditions or at different time points, you can use repeated measures ANOVA to compare the means of these repeated measures while accounting for the correlation between measures from the same subject.
  • Experimental Designs : ANOVA is often used in experimental research designs when subjects are randomly assigned to different conditions and the goal is to compare the means of the conditions.

Here are the assumptions that must be met to use ANOVA:

  • Normality : The data should be approximately normally distributed.
  • Homogeneity of Variances : The variances of the groups you are comparing should be roughly equal. This assumption can be tested using Levene’s test or Bartlett’s test.
  • Independence : The observations should be independent of each other. This assumption is met if the data is collected appropriately with no related groups (e.g., twins, matched pairs, repeated measures).

Applications of ANOVA

The Analysis of Variance (ANOVA) is a powerful statistical technique that is used widely across various fields and industries. Here are some of its key applications:

Agriculture

ANOVA is commonly used in agricultural research to compare the effectiveness of different types of fertilizers, crop varieties, or farming methods. For example, an agricultural researcher could use ANOVA to determine if there are significant differences in the yields of several varieties of wheat under the same conditions.

Manufacturing and Quality Control

ANOVA is used to determine if different manufacturing processes or machines produce different levels of product quality. For instance, an engineer might use it to test whether there are differences in the strength of a product based on the machine that produced it.

Marketing Research

Marketers often use ANOVA to test the effectiveness of different advertising strategies. For example, a marketer could use ANOVA to determine whether different marketing messages have a significant impact on consumer purchase intentions.

Healthcare and Medicine

In medical research, ANOVA can be used to compare the effectiveness of different treatments or drugs. For example, a medical researcher could use ANOVA to test whether there are significant differences in recovery times for patients who receive different types of therapy.

Education

ANOVA is used in educational research to compare the effectiveness of different teaching methods or educational interventions. For example, an educator could use it to test whether students perform significantly differently when taught with different teaching methods.

Psychology and Social Sciences

Psychologists and social scientists use ANOVA to compare group means on various psychological and social variables. For example, a psychologist could use it to determine if there are significant differences in stress levels among individuals in different occupations.

Biology and Environmental Sciences

Biologists and environmental scientists use ANOVA to compare different biological and environmental conditions. For example, an environmental scientist could use it to determine if there are significant differences in the levels of a pollutant in different bodies of water.

Advantages of ANOVA

Here are some advantages of using ANOVA:

Comparing Multiple Groups: One of the key advantages of ANOVA is the ability to compare the means of three or more groups. This makes it more powerful and flexible than the t-test, which is limited to comparing only two groups.

Control of Type I Error: When comparing multiple groups, the chance of making a Type I error (false positive) increases. One of the strengths of ANOVA is that it controls the Type I error rate across all comparisons. This is in contrast to performing multiple pairwise t-tests, which can inflate the Type I error rate.

Testing Interactions: In factorial ANOVA, you can test not only the main effect of each factor, but also the interaction effect between factors. This can provide valuable insights into how different factors or variables interact with each other.

Handling Continuous and Categorical Variables: ANOVA can handle both continuous and categorical variables. The dependent variable is continuous and the independent variables are categorical.

Robustness: ANOVA is considered robust to violations of normality assumption when group sizes are equal. This means that even if your data do not perfectly meet the normality assumption, you might still get valid results.

Provides Detailed Analysis: ANOVA provides a detailed breakdown of variances and interactions between variables which can be useful in understanding the underlying factors affecting the outcome.

Capability to Handle Complex Experimental Designs: Advanced types of ANOVA (like repeated measures ANOVA, MANOVA, etc.) can handle more complex experimental designs, including those where measurements are taken on the same subjects over time, or when you want to analyze multiple dependent variables at once.

Disadvantages of ANOVA

Some limitations or disadvantages that are important to consider:

Assumptions: ANOVA relies on several assumptions including normality (the data follows a normal distribution), independence (the observations are independent of each other), and homogeneity of variances (the variances of the groups are roughly equal). If these assumptions are violated, the results of the ANOVA may not be valid.

Sensitivity to Outliers: ANOVA can be sensitive to outliers. A single extreme value in one group can affect the sum of squares and consequently influence the F-statistic and the overall result of the test.

Dichotomous Variables: ANOVA is not suitable for dichotomous variables (variables that can take only two values, like yes/no or male/female). It is used to compare the means of groups for a continuous dependent variable.

Lack of Specificity: Although ANOVA can tell you that there is a significant difference between groups, it doesn’t tell you which specific groups are significantly different from each other. You need to carry out further post-hoc tests (like Tukey’s HSD or Bonferroni) for these pairwise comparisons.
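As an illustration of such a pairwise follow-up, the sketch below runs Tukey's HSD via statsmodels on hypothetical scores for three groups labelled A, B, and C.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical scores for three groups, A, B, and C
scores = np.array([3, 2, 2, 1, 2,
                   2, 3, 3, 2, 3,
                   4, 4, 5, 5, 4], dtype=float)
labels = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

# Tukey's HSD: every pairwise comparison of group means,
# holding the family-wise error rate at alpha = 0.05
result = pairwise_tukeyhsd(endog=scores, groups=labels, alpha=0.05)
print(result)
```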

Complexity with Multiple Factors: When dealing with multiple factors and interactions in factorial ANOVA, interpretation can become complex. The presence of interaction effects can make main effects difficult to interpret.

Requires Larger Sample Sizes: To detect an effect of a certain size, ANOVA generally requires larger sample sizes than a t-test.

Equal Group Sizes: While not always a strict requirement, ANOVA is most powerful and its assumptions are most likely to be met when groups are of equal or similar sizes.


1.1: Basic Definitions and Concepts


Learning Objectives

  • To learn the basic definitions used in statistics and some of its key concepts.

We begin with a simple example. There are millions of passenger automobiles in the United States. What is their average value? It is obviously impractical to attempt to solve this problem directly by assessing the value of every single car in the country, adding up all those values, and then dividing by the number of values, one for each car. In practice the best we can do would be to estimate the average value. A natural way to do so would be to randomly select some of the cars, say \(200\) of them, ascertain the value of each of those cars, and find the average of those \(200\) values. The set of all those millions of vehicles is called the population of interest, and the number attached to each one, its value, is a measurement. The average value is a parameter: a number that describes a characteristic of the population, in this case monetary worth. The set of \(200\) cars selected from the population is called a sample, and the \(200\) numbers, the monetary values of the cars we selected, are the sample data. The average of the data is called a statistic: a number calculated from the sample data. This example illustrates the meaning of the following definitions.

Definitions: populations and samples

A population is any specific collection of objects of interest. A sample is any subset or subcollection of the population, including the case that the sample consists of the whole population, in which case it is termed a census.

Definitions: measurements and Sample Data

A measurement is a number or attribute computed for each member of a population or of a sample. The measurements of sample elements are collectively called the sample data.

Definition: parameters

A parameter is a number that summarizes some aspect of the population as a whole. A statistic is a number computed from the sample data.

Continuing with our example, if the average value of the cars in our sample was \($8,357\), then it seems reasonable to conclude that the average value of all cars is about \($8,357\). In reasoning this way we have drawn an inference about the population based on information obtained from the sample. In general, statistics is a study of data: describing properties of the data, which is called descriptive statistics, and drawing conclusions about a population of interest from information extracted from a sample, which is called inferential statistics. Computing the single number \($8,357\) to summarize the data was an operation of descriptive statistics; using it to make a statement about the population was an operation of inferential statistics.

Definition: Statistics

Statistics is a collection of methods for collecting, displaying, analyzing, and drawing conclusions from data.

Definition: Descriptive statistics

Descriptive statistics is the branch of statistics that involves organizing, displaying, and describing data.

Definition: Inferential statistics

Inferential statistics is the branch of statistics that involves drawing conclusions about a population based on information contained in a sample taken from that population.

Definition: Qualitative data

Qualitative data are measurements for which there is no natural numerical scale, but which consist of attributes, labels, or other non-numerical characteristics.

Definition: Quantitative data

Quantitative data are numerical measurements that arise from a natural numerical scale.

Qualitative data can generate numerical sample statistics. In the automobile example, for instance, we might be interested in the proportion of all cars that are less than six years old. In our same sample of \(200\) cars we could note for each car whether it is less than six years old or not, which is a qualitative measurement. If \(172\) cars in the sample are less than six years old, which is \(0.86\) or \(86\% \), then we would estimate the parameter of interest, the population proportion, to be about the same as the sample statistic, the sample proportion, that is, about \(0.86\).

The relationship between a population of interest and a sample drawn from that population is perhaps the most important concept in statistics, since everything else rests on it. This relationship is illustrated graphically in Figure \(\PageIndex{1}\). The circles in the large box represent elements of the population. In the figure there was room for only a small number of them but in actual situations, like our automobile example, they could very well number in the millions. The solid black circles represent the elements of the population that are selected at random and that together form the sample. For each element of the sample there is a measurement of interest, denoted by a lower case \(x\) (which we have indexed as \(x_1 , \ldots, x_n\) to tell them apart); these measurements collectively form the sample data set. From the data we may calculate various statistics. To anticipate the notation that will be used later, we might compute the sample mean \(\bar{x}\) and the sample proportion \(\hat{p}\), and take them as approximations to the population mean \(\mu\) (this is the lower case Greek letter mu, the traditional symbol for this parameter) and the population proportion \(p\), respectively. The other symbols in the figure stand for other parameters and statistics that we will encounter.

Figure \(\PageIndex{1}\): The relationship between a population and a sample drawn from it.

Key Takeaway

  • Statistics is a study of data: describing properties of data (descriptive statistics) and drawing conclusions about a population based on information in a sample (inferential statistics).
  • The distinction between a population together with its parameters and a sample together with its statistics is a fundamental concept in inferential statistics.
  • Information in a sample is used to make inferences about the population from which the sample was drawn.


The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organisations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organise and summarise the data using descriptive statistics. Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalise your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarise your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Frequently asked questions about statistics

Step 1: Write your hypotheses and plan your research design

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population. You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design, you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design, you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design, you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design, you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design, you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design, one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).

Example: Experimental research design. First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test. In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design. In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalise your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Step 2: Collect data from a sample

Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures. You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalisable findings, you should use a probability sampling method. Random selection reduces sampling bias and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be biased, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalising your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalise your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialised, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalised in your discussion section.

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study). Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study). Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or by using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units or more per subgroup is necessary.

To use these calculators, you have to understand and input these key components (a short sketch using these inputs follows the list):

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardised indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
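As one possible sketch, the statsmodels power module can solve for the per-group sample size of a two-sample t test; the effect size, alpha, and power below are assumed values, not prescriptions.

```python
from statsmodels.stats.power import TTestIndPower

# Assumed inputs: expected effect size (Cohen's d) of 0.5,
# significance level of 5%, desired power of 80%
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"required sample size per group: {n_per_group:.1f}")  # roughly 64
```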

Step 3: Summarise your data with descriptive statistics

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarise them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organising data from each variable in frequency distribution tables.
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualising the relationship between two variables using a scatter plot.

By visualising your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

Mean, median, mode, and standard deviation in a normal distribution

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode: the most popular response or value in the data set.
  • Median: the value in the exact middle of the data set when ordered from low to high.
  • Mean: the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range: the highest value minus the lowest value of the data set.
  • Interquartile range: the range of the middle half of the data set.
  • Standard deviation: the average distance between each value in your data set and the mean.
  • Variance: the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
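A minimal sketch of these descriptive statistics in Python, using the standard library's statistics module on hypothetical test scores:

```python
import statistics as st

# Hypothetical test scores for one group
scores = [72, 85, 78, 90, 85, 66, 81, 85, 74, 79]

print("mode:", st.mode(scores))                    # most frequent value
print("median:", st.median(scores))                # middle value when sorted
print("mean:", st.mean(scores))                    # sum divided by count
print("range:", max(scores) - min(scores))         # highest minus lowest
print("stdev:", round(st.stdev(scores), 2))        # sample standard deviation
print("variance:", round(st.variance(scores), 2))  # its square
```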

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

Example: Descriptive statistics (experimental study). From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study). After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

Step 4: Test hypotheses or make estimates with inferential statistics

A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate: a value that represents your best guess of the exact parameter.
  • An interval estimate: a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
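For example, a z-based 95% confidence interval for a mean can be sketched as follows, assuming a large enough sample and made-up summary numbers.

```python
import math

# Made-up sample summary: mean score, standard deviation, sample size
sample_mean, sample_sd, n = 100.0, 15.0, 50

standard_error = sample_sd / math.sqrt(n)
z = 1.96  # z score that covers 95% of the standard normal distribution

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```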

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in outcome variable(s).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or less).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .

The only parametric correlation test is Pearson's r. The correlation coefficient (r) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028
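A dependent-samples, one-tailed t test of this kind can be sketched with scipy; the pretest and posttest scores below are hypothetical, not the study's actual data:

```python
from scipy import stats

# Hypothetical paired scores for the same students (invented for illustration).
pretest  = [63, 58, 70, 66, 61, 72, 59, 65, 68, 60]
posttest = [68, 62, 75, 70, 64, 78, 63, 70, 74, 66]

# Paired (dependent) samples, one-tailed: is the posttest mean greater?
t_stat, p_value = stats.ttest_rel(posttest, pretest, alternative="greater")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```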

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001
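A minimal sketch of such a correlation significance test, using invented income and GPA values (scipy's pearsonr reports r together with a p value from the t-based test described above):

```python
from scipy import stats

# Hypothetical parental income (in $1,000s) and GPA pairs (invented for illustration).
income = [45, 60, 38, 90, 75, 52, 110, 68, 84, 57]
gpa    = [2.8, 3.1, 2.5, 3.6, 3.3, 2.9, 3.8, 3.2, 3.5, 3.0]

r, p_value = stats.pearsonr(income, gpa)
print(f"r = {r:.2f}, p = {p_value:.4f}")
```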

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

With a Cohen's d of 0.72, there's medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)
To determine the effect size of the correlation coefficient, you compare your Pearson's r value to Cohen's effect size criteria.
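For a two-group comparison like the meditation example, Cohen's d can be computed directly from the scores. A minimal sketch using one common pooled-standard-deviation formulation (the scores are invented):

```python
import statistics

def cohens_d(group1, group2):
    """Cohen's d using a pooled standard deviation (one common formulation)."""
    n1, n2 = len(group1), len(group2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return mean_diff / pooled_sd

# Hypothetical posttest vs. pretest scores (invented for illustration).
print(round(cohens_d([68, 62, 75, 70, 64], [63, 58, 70, 66, 61]), 2))  # about 0.86
```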

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimise the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasises null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

The Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis, rather than making a conclusion about rejecting the null hypothesis or not.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts, and meanings, use qualitative methods .
  • If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Statistical analysis is the main method for analyzing quantitative research data . It uses probabilities and models to test predictions about a population from sample data.


Indian J Anaesth. 2016 Sep; 60(9).

Basic statistical tools in research and data analysis

Zulfiqar Ali

Department of Anaesthesiology, Division of Neuroanaesthesiology, Sheri Kashmir Institute of Medical Sciences, Soura, Srinagar, Jammu and Kashmir, India

S Bala Bhaskar

Department of Anaesthesiology and Critical Care, Vijayanagar Institute of Medical Sciences, Bellary, Karnataka, India

Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretations and reporting the research findings. Statistical analysis gives meaning to meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

INTRODUCTION

Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population.[ 1 ] This requires a proper design of the study, an appropriate selection of the study sample and choice of a suitable statistical test. An adequate knowledge of statistics is necessary for proper designing of an epidemiological study or a clinical trial. Improper statistical methods may result in erroneous conclusions which may lead to unethical practice.[ 2 ]

A variable is a characteristic that varies from one individual member of a population to another.[ 3 ] Variables such as height and weight are measured by some type of scale, convey quantitative information and are called quantitative variables. Sex and eye colour give qualitative information and are called qualitative variables[ 3 ] [ Figure 1 ].

[Figure 1: Classification of variables]

Quantitative variables

Quantitative or numerical data are subdivided into discrete and continuous measurements. Discrete numerical data are recorded as a whole number such as 0, 1, 2, 3,… (integer), whereas continuous data can assume any value. Observations that can be counted constitute the discrete data and observations that can be measured constitute the continuous data. Examples of discrete data are number of episodes of respiratory arrests or the number of re-intubations in an intensive care unit. Similarly, examples of continuous data are the serial serum glucose levels, partial pressure of oxygen in arterial blood and the oesophageal temperature.

A hierarchical scale of increasing precision can be used for observing and recording the data which is based on categorical, ordinal, interval and ratio scales [ Figure 1 ].

Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist (as in gender: male and female), the data are called dichotomous (or binary). The various causes of re-intubation in an intensive care unit, such as upper airway obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary oedema and neurological impairment, are examples of categorical variables.

Ordinal variables have a clear ordering between the variables. However, the ordered data may not have equal intervals. Examples are the American Society of Anesthesiologists status or Richmond agitation-sedation scale.

Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. A good example of an interval scale is the Fahrenheit degree scale used to measure temperature. With the Fahrenheit scale, the difference between 70° and 75° is equal to the difference between 80° and 85°: The units of measurement are equal throughout the full range of the scale.

Ratio scales are similar to interval scales, in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property. The system of centimetres is an example of a ratio scale: there is a true zero point, and a value of 0 cm means a complete absence of length. The thyromental distance of 6 cm in an adult may be twice that of a child, in whom it may be 3 cm.

STATISTICS: DESCRIPTIVE AND INFERENTIAL STATISTICS

Descriptive statistics[ 4 ] try to describe the relationship between variables in a sample or population. Descriptive statistics provide a summary of data in the form of mean, median and mode. Inferential statistics[ 4 ] use a random sample of data taken from a population to describe and make inferences about the whole population. They are valuable when it is not possible to examine each member of an entire population. Examples of descriptive and inferential statistics are illustrated in Table 1.

[Table 1: Example of descriptive and inferential statistics]

Descriptive statistics

The extent to which the observations cluster around a central location is described by the central tendency and the spread towards the extremes is described by the degree of dispersion.

Measures of central tendency

The measures of central tendency are mean, median and mode.[ 6 ] The mean (or arithmetic average) is the sum of all the scores divided by the number of scores. The mean may be influenced profoundly by extreme values. For example, the average stay of organophosphorus poisoning patients in the ICU may be influenced by a single patient who stays in the ICU for around 5 months because of septicaemia. Such extreme values are called outliers. The formula for the mean is

$$\bar{x} = \frac{\sum x}{n}$$

where x = each observation and n = number of observations. The median[ 6 ] is defined as the middle of a distribution in ranked data (with half of the variables in the sample above and half below the median value), while the mode is the most frequently occurring variable in a distribution. The range defines the spread, or variability, of a sample.[ 7 ] It is described by the minimum and maximum values of the variables. If we rank the data and, after ranking, group the observations into percentiles, we can get better information about the pattern of spread of the variables. In percentiles, we rank the observations into 100 equal parts. We can then describe the 25th, 50th, 75th or any other percentile. The median is the 50th percentile. The interquartile range is the middle 50% of the observations about the median (25th-75th percentile). Variance[ 7 ] is a measure of how spread out the distribution is. It gives an indication of how closely an individual observation clusters about the mean value. The variance of a population is defined by the following formula:

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

where σ² is the population variance, X̄ is the population mean, Xi is the ith element from the population and N is the number of elements in the population. The variance of a sample is defined by a slightly different formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where s² is the sample variance, x̄ is the sample mean, xi is the ith element from the sample and n is the number of elements in the sample. The formula for the variance of a population has 'N' as the denominator, whereas the sample variance uses 'n − 1'. The expression 'n − 1' is known as the degrees of freedom and is one less than the number of observations: each observation is free to vary, except the last one, which must take a defined value. The variance is measured in squared units. To make the interpretation of the data simple and to retain the basic unit of observation, the square root of the variance is used. The square root of the variance is the standard deviation (SD).[ 8 ] The SD of a population is defined by the following formula:

$$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$

where σ is the population SD, X̄ is the population mean, Xi is the ith element from the population and N is the number of elements in the population. The SD of a sample is defined by a slightly different formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where s is the sample SD, x̄ is the sample mean, xi is the ith element from the sample and n is the number of elements in the sample. An example of the calculation of variance and SD is illustrated in Table 2.

[Table 2: Example of mean, variance and standard deviation]
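The two denominators map directly onto numpy's ddof argument; a minimal sketch with invented observations:

```python
import numpy as np

x = np.array([10, 12, 9, 14, 11, 13])   # hypothetical observations

# Population formulas divide by N (ddof=0); sample formulas divide by n - 1 (ddof=1).
print(np.mean(x))           # mean
print(np.var(x, ddof=0))    # population variance
print(np.var(x, ddof=1))    # sample variance
print(np.std(x, ddof=1))    # sample standard deviation
```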

Normal distribution or Gaussian distribution

Most biological variables usually cluster around a central value, with symmetrical positive and negative deviations about this point.[ 1 ] The standard normal distribution curve is symmetrical and bell-shaped. In a normal distribution curve, about 68% of the scores are within 1 SD of the mean, around 95% of the scores are within 2 SDs of the mean and 99% are within 3 SDs of the mean [ Figure 2 ].

[Figure 2: Normal distribution curve]
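These coverage figures can be checked numerically against the standard normal CDF; a short sketch:

```python
from scipy import stats

# Proportion of a normal distribution lying within k SDs of the mean.
for k in (1, 2, 3):
    p = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"within {k} SD: {p:.4f}")   # 0.6827, 0.9545, 0.9973
```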

Skewed distribution

It is a distribution with an asymmetry of the variables about its mean. In a negatively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the right, leading to a longer left tail. In a positively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the left, leading to a longer right tail.

[Figure 3: Curves showing negatively skewed and positively skewed distributions]

Inferential statistics

In inferential statistics, data are analysed from a sample to make inferences in the larger collection of the population. The purpose is to answer or test the hypotheses. A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. Hypothesis tests are thus procedures for making rational decisions about the reality of observed effects.

Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).

In inferential statistics, the term ‘null hypothesis’ ( H 0 ‘ H-naught ,’ ‘ H-null ’) denotes that there is no relationship (difference) between the population variables in question.[ 9 ]

The alternative hypothesis ( H 1 or H a ) denotes that a relationship (difference) between the population variables is expected to be true.[ 9 ]

The P value (or the calculated probability) is the probability of the event occurring by chance if the null hypothesis is true. The P value is a number between 0 and 1 and is interpreted by researchers in deciding whether to reject or retain the null hypothesis [ Table 3 ].

[Table 3: P values with interpretation]

If the P value is less than the arbitrarily chosen value (known as α or the significance level), the null hypothesis (H0) is rejected [ Table 4 ]. However, if the null hypothesis (H0) is incorrectly rejected, this is known as a Type I error.[ 11 ] Further details regarding alpha error, beta error and sample size calculation and the factors influencing them are dealt with in another section of this issue by Das S et al .[ 12 ]

[Table 4: Illustration of the null hypothesis]

PARAMETRIC AND NON-PARAMETRIC TESTS

Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.[ 13 ]

The two most basic prerequisites for parametric statistical analysis are:

  • The assumption of normality which specifies that the means of the sample group are normally distributed
  • The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal.

However, if the distribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric[ 14 ] statistical techniques are used. Non-parametric tests are used to analyse ordinal and categorical data.

Parametric tests

The parametric tests assume that the data are on a quantitative (numerical) scale, with a normal distribution of the underlying population. The samples have the same variance (homogeneity of variances). The samples are randomly drawn from the population, and the observations within a group are independent of each other. The commonly used parametric tests are the Student's t -test, analysis of variance (ANOVA) and repeated measures ANOVA.

Student's t -test

Student's t -test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three circumstances:

  • To test if a sample mean (as an estimate of a population mean) differs significantly from a given population mean (this is a one-sample t -test). The formula is:

$$t = \frac{\bar{X} - \mu}{SE}$$

where X̄ = sample mean, μ = population mean and SE = standard error of the mean.

  • To test if the population means estimated by two independent samples differ significantly (the unpaired t -test). The formula is:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{\bar{X}_1 - \bar{X}_2}}$$

where X̄1 − X̄2 is the difference between the means of the two groups and the denominator is the standard error of this difference.

  • To test if the population means estimated by two dependent samples differ significantly (the paired t -test). A usual setting for paired t -test is when measurements are made on the same subjects before and after a treatment.

The formula for paired t -test is:

$$t = \frac{\bar{d}}{SE_{\bar{d}}}$$

where d̄ is the mean difference and SE denotes the standard error of this difference.

The group variances can be compared using the F -test. The F -test is the ratio of variances (var 1/var 2). If F differs significantly from 1.0, then it is concluded that the group variances differ significantly.

Analysis of variance

The Student's t -test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.

In ANOVA, we study two variances – (a) between-group variability and (b) within-group variability. The within-group variability (error variance) is the variation that cannot be accounted for in the study design. It is based on random differences present in our samples.

However, the between-group variability (or effect variance) is the result of our treatment. These two estimates of variance are compared using the F-test.

A simplified formula for the F statistic is:

$$F = \frac{MS_b}{MS_w}$$

where MS b is the mean squares between the groups and MS w is the mean squares within groups.
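A one-way ANOVA can be sketched with scipy's f_oneway; the three groups of measurements below are invented:

```python
from scipy import stats

# Hypothetical outcomes for three independent groups (invented for illustration).
group_a = [5.1, 4.8, 5.6, 5.2, 4.9]
group_b = [5.9, 6.2, 5.7, 6.0, 6.3]
group_c = [5.0, 5.3, 4.7, 5.1, 5.4]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```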

Repeated measures analysis of variance

As with ANOVA, repeated measures ANOVA analyses the equality of means of three or more groups. However, a repeated measure ANOVA is used when all variables of a sample are measured under different conditions or at different points in time.

As the variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures: The data violate the ANOVA assumption of independence. Hence, in the measurement of repeated dependent variables, repeated measures ANOVA should be used.

Non-parametric tests

When the assumptions of normality are not met and the sample means are not normally distributed, parametric tests can lead to erroneous results. Non-parametric tests (distribution-free tests) are used in such situations as they do not require the normality assumption.[ 15 ] Non-parametric tests may fail to detect a significant difference when compared with a parametric test. That is, they usually have less power.

As is done for the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic and the null hypothesis is accepted or rejected. The types of non-parametric analysis techniques and the corresponding parametric analysis techniques are delineated in Table 5 .

[Table 5: Analogues of parametric and non-parametric tests]

Median test for one sample: The sign test and Wilcoxon's signed rank test

The sign test and Wilcoxon's signed rank test are used for median tests of one sample. These tests examine whether one instance of sample data is greater or smaller than the median reference value.

The sign test examines a hypothesis about the median θ0 of a population, testing the null hypothesis H0: θ = θ0. When the observed value (Xi) is greater than the reference value (θ0), it is marked with a + sign. If the observed value is smaller than the reference value, it is marked with a − sign. If the observed value is equal to the reference value (θ0), it is eliminated from the sample.

If the null hypothesis is true, there will be an equal number of + signs and − signs.

The sign test ignores the actual values of the data and only uses + or − signs. Therefore, it is useful when it is difficult to measure the values.
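Because the + signs follow a Binomial(n, 0.5) distribution under the null hypothesis, the sign test reduces to a binomial test. A minimal sketch with an invented data set and reference median:

```python
from scipy import stats

# Hypothetical observations compared against a reference median of 100.
data = [104, 98, 110, 101, 95, 107, 103, 99, 112, 106]
theta0 = 100

plus = sum(x > theta0 for x in data)
minus = sum(x < theta0 for x in data)   # ties (x == theta0) are dropped

result = stats.binomtest(plus, plus + minus, p=0.5)
print(f"+ signs: {plus}, - signs: {minus}, p = {result.pvalue:.4f}")
```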

Wilcoxon's signed rank test

There is a major limitation of sign test as we lose the quantitative information of the given data and merely use the + or – signs. Wilcoxon's signed rank test not only examines the observed values in comparison with θ0 but also takes into consideration the relative sizes, adding more statistical power to the test. As in the sign test, if there is an observed value that is equal to the reference value θ0, this observed value is eliminated from the sample.

Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.

Mann-Whitney test

It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.

The Mann–Whitney test compares all data (xi) belonging to the X group and all data (yi) belonging to the Y group and calculates the probability of xi being greater than yi: P(xi > yi). The null hypothesis states that P(xi > yi) = P(xi < yi) = 1/2, while the alternative hypothesis states that P(xi > yi) ≠ 1/2.
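A quick scipy sketch with two invented, unmatched groups:

```python
from scipy import stats

# Hypothetical measurements from two unmatched groups (invented for illustration).
x = [12, 15, 11, 18, 14, 16]
y = [10, 9, 13, 8, 12, 11]

u_stat, p_value = stats.mannwhitneyu(x, y, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```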

Kolmogorov-Smirnov test

The two-sample Kolmogorov-Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution. The null hypothesis of the KS test is that both distributions are identical. The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves.

Kruskal-Wallis test

The Kruskal–Wallis test is a non-parametric test to analyse the variance.[ 14 ] It analyses if there is any difference in the median values of three or more independent samples. The data values are ranked in an increasing order, and the rank sums calculated followed by calculation of the test statistic.

Jonckheere test

In contrast to the Kruskal–Wallis test, the Jonckheere test assumes an a priori ordering of the groups, which gives it more statistical power than the Kruskal–Wallis test.[ 14 ]

Friedman test

The Friedman test is a non-parametric test for testing the difference between several related samples. It is an alternative to repeated measures ANOVA, used when the same parameter has been measured under different conditions on the same subjects.[ 13 ]

Tests to analyse the categorical data

Chi-square test, Fisher's exact test and McNemar's test are used to analyse categorical or nominal variables. The Chi-square test compares the frequencies and tests whether the observed data differ significantly from the expected data if there were no differences between groups (i.e., under the null hypothesis). It is calculated as the sum of the squared difference between the observed ( O ) and the expected ( E ) data (or the deviation, d ), divided by the expected data, as given by the following formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

A Yates correction factor is used when the sample size is small. Fisher's exact test is used to determine if there are non-random associations between two categorical variables. It does not assume random sampling, and instead of referring a calculated statistic to a sampling distribution, it calculates an exact probability. McNemar's test is used for paired nominal data. It is applied to a 2 × 2 table with paired-dependent samples and is used to determine whether the row and column frequencies are equal (that is, whether there is 'marginal homogeneity'). The null hypothesis is that the paired proportions are equal. The Mantel-Haenszel Chi-square test is a multivariate test, as it analyses multiple grouping variables. It stratifies according to the nominated confounding variables and identifies any that affect the primary outcome variable. If the outcome variable is dichotomous, then logistic regression is used.
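A short sketch of both tests on an invented 2 × 2 contingency table (scipy's chi2_contingency applies the Yates correction to 2 × 2 tables by default):

```python
from scipy import stats

# Hypothetical 2 x 2 table: treatment group vs. outcome counts (invented).
observed = [[20, 30],
            [35, 15]]

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")

# For small samples, Fisher's exact test avoids the large-sample approximation.
odds_ratio, p_exact = stats.fisher_exact(observed)
print(f"Fisher's exact p = {p_exact:.4f}")
```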

SOFTWARES AVAILABLE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

Numerous statistical software systems are available currently. Commonly used systems include the Statistical Package for the Social Sciences (SPSS – manufactured by IBM Corporation), Statistical Analysis System (SAS – developed by SAS Institute, North Carolina, United States of America), R (designed by Ross Ihaka and Robert Gentleman of the R Core Team), Minitab (developed by Minitab Inc.), Stata (developed by StataCorp) and MS Excel (developed by Microsoft).

There are a number of web resources which are related to statistical power analyses. A few are:

  • StatPages.net – provides links to a number of online power calculators
  • G-Power – provides a downloadable power analysis program that runs under DOS
  • Power analysis for ANOVA designs – an interactive site that calculates power or the sample size needed to attain a given power for one effect in a factorial ANOVA design
  • SPSS makes a program called SamplePower. It gives an output of a complete report on the computer screen which can be cut and pasted into another document.

It is important that a researcher knows the concepts of the basic statistical methods used for conduct of a research study. This will help to conduct an appropriately well-designed study leading to valid and reliable results. Inappropriate use of statistical techniques may lead to faulty conclusions, inducing errors and undermining the significance of the article. Bad statistics may lead to bad research, and bad research may lead to unethical practice. Hence, an adequate knowledge of statistics and the appropriate use of statistical tests are important. An appropriate knowledge about the basic statistical methods will go a long way in improving the research designs and producing quality medical research which can be utilised for formulating the evidence-based guidelines.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

1.1 Definitions of Statistics, Probability, and Key Terms

The science of statistics deals with the collection, analysis, interpretation, and presentation of data . We see and use data in our everyday lives.

Collaborative Exercise

In your classroom, try this exercise. Have class members write down the average time—in hours, to the nearest half-hour—they sleep per night. Your instructor will record the data. Then create a simple graph, called a dot plot, of the data. A dot plot consists of a number line and dots, or points, positioned above the number line. For example, consider the following data:

5, 5.5, 6, 6, 6, 6.5, 6.5, 6.5, 6.5, 7, 7, 8, 8, 9.

The dot plot for this data would be as follows:

Does your dot plot look the same as or different from the example? Why? If you did the same example in an English class with the same number of students, do you think the results would be the same? Why or why not?

Where do your data appear to cluster? How might you interpret the clustering?

The questions above ask you to analyze and interpret your data. With this example, you have begun your study of statistics.

In this course, you will learn how to organize and summarize data. Organizing and summarizing data is called descriptive statistics . Two ways to summarize data are by graphing and by using numbers, for example, finding an average. After you have studied probability and probability distributions, you will use formal methods for drawing conclusions from good data. The formal methods are called inferential statistics . Statistical inference uses probability to determine how confident we can be that our conclusions are correct.

Effective interpretation of data, or inference, is based on good procedures for producing data and thoughtful examination of the data. You will encounter what will seem to be too many mathematical formulas for interpreting data. The goal of statistics is not to perform numerous calculations using the formulas, but to gain an understanding of your data. The calculations can be done using a calculator or a computer. The understanding must come from you. If you can thoroughly grasp the basics of statistics, you can be more confident in the decisions you make in life.

Statistical Models

Statistics, like all other branches of mathematics, uses mathematical models to describe phenomena that occur in the real world. Some mathematical models are deterministic. These models can be used when one value is precisely determined from another value. Examples of deterministic models are the quadratic equations that describe the acceleration of a car from rest or the differential equations that describe the transfer of heat from a stove to a pot. These models are quite accurate and can be used to answer questions and make predictions with a high degree of precision. Space agencies, for example, use deterministic models to predict the exact amount of thrust that a rocket needs to break away from Earth’s gravity and achieve orbit.

However, life is not always precise. While scientists can predict to the minute the time that the sun will rise, they cannot say precisely where a hurricane will make landfall. Statistical models can be used to predict life’s more uncertain situations. These special forms of mathematical models or functions are based on the idea that one value affects another value. Some statistical models are mathematical functions that are more precise—one set of values can predict or determine another set of values. Or some statistical models are mathematical functions in which a set of values do not precisely determine other values. Statistical models are very useful because they can describe the probability or likelihood of an event occurring and provide alternative outcomes if the event does not occur. For example, weather forecasts are examples of statistical models. Meteorologists cannot predict tomorrow’s weather with certainty. However, they often use statistical models to tell you how likely it is to rain at any given time, and you can prepare yourself based on this probability.

Probability

Probability is a mathematical tool used to study randomness. It deals with the chance of an event occurring. For example, if you toss a fair coin four times, the outcomes may not be two heads and two tails. However, if you toss the same coin 4,000 times, the outcomes will be close to half heads and half tails. The expected theoretical probability of heads in any one toss is 1/2 or .5. Even though the outcomes of a few repetitions are uncertain, there is a regular pattern of outcomes when there are many repetitions. After reading about the English statistician Karl Pearson who tossed a coin 24,000 times with a result of 12,012 heads, one of the authors tossed a coin 2,000 times. The results were 996 heads. The fraction 996/2,000 is equal to .498, which is very close to .5, the expected probability.
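A small simulation makes the point concrete; this sketch counts heads in a few and then many fair-coin tosses:

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

for n in (4, 4000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n} tosses: {heads} heads ({heads / n:.3f})")
```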

The theory of probability began with the study of games of chance such as poker. Predictions take the form of probabilities. To predict the likelihood of an earthquake, of rain, or whether you will get an A in this course, we use probabilities. Doctors use probability to determine the chance of a vaccination causing the disease the vaccination is supposed to prevent. A stockbroker uses probability to determine the rate of return on a client's investments.

In statistics, we generally want to study a population . You can think of a population as a collection of persons, things, or objects under study. To study the population, we select a sample . The idea of sampling is to select a portion, or subset, of the larger population and study that portion—the sample—to gain information about the population. Data are the result of sampling from a population.

Because it takes a lot of time and money to examine an entire population, sampling is a very practical technique. If you wished to compute the overall grade point average at your school, it would make sense to select a sample of students who attend the school. The data collected from the sample would be the students' grade point averages. In presidential elections, opinion poll samples of 1,000–2,000 people are taken. The opinion poll is supposed to represent the views of the people in the entire country. Manufacturers of canned carbonated drinks take samples to determine if a 16-ounce can contains 16 ounces of carbonated drink.

From the sample data, we can calculate a statistic. A statistic is a number that represents a property of the sample. For example, if we consider one math class as a sample of the population of all math classes, then the average number of points earned by students in that one math class at the end of the term is an example of a statistic. Since we do not have the data for all math classes, that statistic is our best estimate of the average for the entire population of math classes. If we happen to have data for all math classes, we can find the population parameter. A parameter is a numerical characteristic of the whole population that can be estimated by a statistic. Since we considered all math classes to be the population, then the average number of points earned per student over all the math classes is an example of a parameter.

One of the main concerns in the field of statistics is how accurately a statistic estimates a parameter. In order to have an accurate sample, it must contain the characteristics of the population in order to be a representative sample . We are interested in both the sample statistic and the population parameter in inferential statistics. In a later chapter, we will use the sample statistic to test the validity of the established population parameter.

A variable , usually notated by capital letters such as X and Y , is a characteristic or measurement that can be determined for each member of a population. Variables may describe values like weight in pounds or favorite subject in school. Numerical variables take on values with equal units such as weight in pounds and time in hours. Categorical variables place the person or thing into a category. If we let X equal the number of points earned by one math student at the end of a term, then X is a numerical variable. If we let Y be a person's party affiliation, then some examples of Y include Republican, Democrat, and Independent. Y is a categorical variable. We could do some math with values of X —calculate the average number of points earned, for example—but it makes no sense to do math with values of Y —calculating an average party affiliation makes no sense.

Data are the actual values of the variable. They may be numbers or they may be words. Datum is a single value.

Two words that come up often in statistics are mean and proportion . If you were to take three exams in your math classes and obtain scores of 86, 75, and 92, you would calculate your mean score by adding the three exam scores and dividing by three. Your mean score would be 84.3 to one decimal place. If, in your math class, there are 40 students and 22 are males and 18 females, then the proportion of men students is 22/40 and the proportion of women students is 18/40. Mean and proportion are discussed in more detail in later chapters.

The words mean and average are often used interchangeably. In this book, we use the term arithmetic mean for mean.

Example 1.1

Determine what the population, sample, parameter, statistic, variable, and data refer to in the following study.

We want to know the mean amount of extracurricular activities in which high school students participate. We randomly surveyed 100 high school students. Three of those students were in 2, 5, and 7 extracurricular activities, respectively.

The population is all high school students.

The sample is the 100 high school students interviewed.

The parameter is the mean amount of extracurricular activities in which all high school students participate.

The statistic is the mean amount of extracurricular activities in which the sample of high school students participate.

The variable could be the amount of extracurricular activities by one high school student. Let X = the amount of extracurricular activities by one high school student.

The data are the number of extracurricular activities in which the high school students participate. Examples of the data are 2, 5, 7.

Find an article online or in a newspaper or magazine that refers to a statistical study or poll. Identify what each of the key terms—population, sample, parameter, statistic, variable, and data—refers to in the study mentioned in the article. Does the article use the key terms correctly?

Example 1.2

Determine what the key terms refer to in the following study.

A study was conducted at a local high school to analyze the average cumulative GPAs of students who graduated last year. Fill in the letter of the phrase that best describes each of the items below.

1. Population ____ 2. Statistic ____ 3. Parameter ____ 4. Sample ____ 5. Variable ____ 6. Data ____

  • a) all students who attended the high school last year
  • b) the cumulative GPA of one student who graduated from the high school last year
  • c) 3.65, 2.80, 1.50, 3.90
  • d) a group of students who graduated from the high school last year, randomly selected
  • e) the average cumulative GPA of students who graduated from the high school last year
  • f) all students who graduated from the high school last year
  • g) the average cumulative GPA of students in the study who graduated from the high school last year

1. f ; 2. g ; 3. e ; 4. d ; 5. b ; 6. c

Example 1.3

As part of a study designed to test the safety of automobiles, the National Transportation Safety Board collected and reviewed data about the effects of an automobile crash on test dummies (The Data and Story Library, n.d.). Here is the criterion they used.

Cars with dummies in the front seats were crashed into a wall at a speed of 35 miles per hour. We want to know the proportion of dummies in the driver’s seat that would have had head injuries, if they had been actual drivers. We start with a simple random sample of 75 cars.

The population is all cars containing dummies in the front seat.

The sample is the 75 cars, selected by a simple random sample.

The parameter is the proportion of driver dummies—if they had been real people—who would have suffered head injuries in the population.

The statistic is the proportion of driver dummies—if they had been real people—who would have suffered head injuries in the sample.

The variable X = whether driver dummies—if they had been real people—would have suffered head injuries.

The data are either: yes, had head injury, or no, did not.

Example 1.4

An insurance company would like to determine the proportion of all medical doctors who have been involved in one or more malpractice lawsuits. The company selects 500 doctors at random from a professional directory and determines the number in the sample who have been involved in a malpractice lawsuit.

The population is all medical doctors listed in the professional directory.

The parameter is the proportion of medical doctors who have been involved in one or more malpractice suits in the population.

The sample is the 500 doctors selected at random from the professional directory.

The statistic is the proportion of medical doctors who have been involved in one or more malpractice suits in the sample.

The variable X records whether a doctor has or has not been involved in a malpractice suit.

The data are either: yes, was involved in one or more malpractice lawsuits; or no, was not.

Do the following exercise collaboratively with up to four people per group. Find a population, a sample, the parameter, the statistic, a variable, and data for the following study: You want to determine the average—mean—number of glasses of milk college students drink per day. Suppose yesterday, in your English class, you asked five students how many glasses of milk they drank the day before. The answers were 1, 0, 1, 3, and 4 glasses of milk.


Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Statistics
  • Publication date: Mar 27, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/statistics/pages/1-1-definitions-of-statistics-probability-and-key-terms

© Jan 23, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.


Descriptive Statistics | Definitions, Types, Examples

Published on July 9, 2020 by Pritha Bhandari . Revised on June 21, 2023.

Descriptive statistics summarize and organize characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population.

In quantitative research , after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).

The next step is inferential statistics , which help you decide whether your data confirms or refutes your hypothesis and whether it is generalizable to a larger population.

Table of contents

  • Types of descriptive statistics
  • Frequency distribution
  • Measures of central tendency
  • Measures of variability
  • Univariate descriptive statistics
  • Bivariate descriptive statistics
  • Frequently asked questions about descriptive statistics

There are 3 main types of descriptive statistics:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability or dispersion concerns how spread out the values are.

Types of descriptive statistics

You can apply these to assess only one variable at a time, in univariate analysis, or to compare two or more, in bivariate and multivariate analysis.



A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarize the frequency of every possible value of a variable in numbers or percentages. This is called a frequency distribution .

  • Simple frequency distribution table
  • Grouped frequency distribution table

From this table, you can see that more women than men or people with another gender identity took part in the study. In a grouped frequency distribution, you can group numerical response values and add up the number of responses for each group. You can also convert each of these numbers to percentages.

Measures of central tendency estimate the center, or average, of a data set. The mean, median and mode are 3 ways of finding the average.

Here we will demonstrate how to calculate the mean, median, and mode using the first 6 responses of our survey.

The mean , or M , is the most commonly used method for finding the average.

To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N .

The median is the value that’s exactly in the middle of a data set.

To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

The mode is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode.

To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.

Measures of variability give you a sense of how spread out the response values are. The range, standard deviation and variance each reflect different aspects of spread.

The range gives you an idea of how far apart the most extreme response scores are. To find the range , simply subtract the lowest value from the highest value.

Standard deviation

The standard deviation ( s or SD ) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:

  • List each score and find their mean.
  • Subtract the mean from each score to get the deviation from the mean.
  • Square each of these deviations.
  • Add up all of the squared deviations.
  • Divide the sum of the squared deviations by N – 1.
  • Find the square root of the number you found.

Step 5: 421.5/5 = 84.3

Step 6: √84.3 = 9.18
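The six steps can be followed in a few lines of Python. The data set below is invented to be consistent with the worked numbers above (a sum of squared deviations of 421.5 over N = 6 responses):

```python
import statistics

data = [15, 3, 12, 0, 24, 3]   # hypothetical responses consistent with the steps above

mean = statistics.mean(data)                  # step 1: mean = 9.5
sq_devs = [(x - mean) ** 2 for x in data]     # steps 2-3: squared deviations
variance = sum(sq_devs) / (len(data) - 1)     # steps 4-5: 421.5 / 5 = 84.3
sd = variance ** 0.5                          # step 6: about 9.18

print(mean, variance, round(sd, 2))
# statistics.stdev(data) returns the same sample standard deviation directly.
```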

The variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean.

To find the variance, simply square the standard deviation. The symbol for variance is s².


Univariate descriptive statistics focus on only one variable at a time. It’s important to examine data from each variable separately using multiple measures of distribution, central tendency and spread. Programs like SPSS and Excel can be used to easily calculate these.

If you were to only consider the mean as a measure of central tendency, your impression of the “middle” of the data set can be skewed by outliers, unlike the median or mode.

Likewise, while the range is sensitive to outliers , you should also consider the standard deviation and variance to get easily comparable measures of spread.

If you’ve collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests .

Multivariate analysis is the same as bivariate analysis but with more than two variables.

Contingency table

In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read “across” the table to see how the independent and dependent variables relate to each other.

Interpreting a contingency table is easier when the raw data is converted to percentages. Percentages make each row comparable to the other by making it seem as if each group had only 100 observations or participants. When creating a percentage-based contingency table, you add the N for each independent variable on the end.

From this table, it is clearer that similar proportions of children and adults go to the library over 17 times a year. Additionally, children most commonly went to the library between 5 and 8 times, while for adults, this number was between 13 and 16.

Scatter plots

A scatter plot is a chart that shows you the relationship between two or three variables . It’s a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.

From your scatter plot, you see that as the number of movies seen at movie theaters increases, the number of visits to the library decreases. Based on your visual assessment of a possible linear relationship, you perform further tests of correlation and regression.

Descriptive statistics: Scatter plot
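A scatter plot like the one described can be sketched with matplotlib; the visit counts below are invented:

```python
import matplotlib.pyplot as plt

# Hypothetical counts of theater visits vs. library visits (invented for illustration).
movies  = [1, 3, 5, 7, 9, 11, 13]
library = [22, 18, 15, 12, 9, 5, 2]

plt.scatter(movies, library)
plt.xlabel("Movies seen at a theater per year")
plt.ylabel("Library visits per year")
plt.title("Scatter plot of two variables")
plt.show()
```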

Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average for each response.
  • Measures of variability show you the spread or dispersion of your dataset.
  • Univariate statistics summarize only one variable at a time.
  • Bivariate statistics compare two variables.
  • Multivariate statistics compare more than two variables.


Source: Bhandari, P. (2023, June 21). Descriptive Statistics | Definitions, Types, Examples. Scribbr. Retrieved April 1, 2024, from https://www.scribbr.com/statistics/descriptive-statistics/


Digital Commons @ University of South Florida

Mathematics and Statistics Theses and Dissertations

Theses/Dissertations from 2023

Classification of Finite Topological Quandles and Shelves via Posets , Hitakshi Lahrani

Applied Analysis for Learning Architectures , Himanshu Singh

Rational Functions of Degree Five That Permute the Projective Line Over a Finite Field , Christopher Sze

Theses/Dissertations from 2022

New Developments in Statistical Optimal Designs for Physical and Computer Experiments , Damola M. Akinlana

Advances and Applications of Optimal Polynomial Approximants , Raymond Centner

Data-Driven Analytical Predictive Modeling for Pancreatic Cancer, Financial & Social Systems , Aditya Chakraborty

On Simultaneous Similarity of d-tuples of Commuting Square Matrices , Corey Connelly

Symbolic Computation of Lump Solutions to a Combined (2+1)-dimensional Nonlinear Evolution Equation , Jingwei He

Boundary behavior of analytic functions and Approximation Theory , Spyros Pasias

Stability Analysis of Delay-Driven Coupled Cantilevers Using the Lambert W-Function , Daniel Siebel-Cortopassi

A Functional Optimization Approach to Stochastic Process Sampling , Ryan Matthew Thurman

Theses/Dissertations from 2021

Riemann-Hilbert Problems for Nonlocal Reverse-Time Nonlinear Second-order and Fourth-order AKNS Systems of Multiple Components and Exact Soliton Solutions , Alle Adjiri

Zeros of Harmonic Polynomials and Related Applications , Azizah Alrajhi

Combination of Time Series Analysis and Sentiment Analysis for Stock Market Forecasting , Hsiao-Chuan Chou

Uncertainty Quantification in Deep and Statistical Learning with applications in Bio-Medical Image Analysis , K. Ruwani M. Fernando

Data-Driven Analytical Modeling of Multiple Myeloma Cancer, U.S. Crop Production and Monitoring Process , Lohuwa Mamudu

Long-time Asymptotics for mKdV Type Reduced Equations of the AKNS Hierarchy in Weighted L² Sobolev Spaces , Fudong Wang

Online and Adjusted Human Activities Recognition with Statistical Learning , Yanjia Zhang

Theses/Dissertations from 2020

Bayesian Reliability Analysis of The Power Law Process and Statistical Modeling of Computer and Network Vulnerabilities with Cybersecurity Application , Freeh N. Alenezi

Discrete Models and Algorithms for Analyzing DNA Rearrangements , Jasper Braun

Bayesian Reliability Analysis for Optical Media Using Accelerated Degradation Test Data , Kun Bu

On the p(x)-Laplace equation in Carnot groups , Robert D. Freeman

Clustering methods for gene expression data of Oxytricha trifallax , Kyle Houfek

Gradient Boosting for Survival Analysis with Applications in Oncology , Nam Phuong Nguyen

Global and Stochastic Dynamics of Diffusive Hindmarsh-Rose Equations in Neurodynamics , Chi Phan

Restricted Isometric Projections for Differentiable Manifolds and Applications , Vasile Pop

On Some Problems on Polynomial Interpolation in Several Variables , Brian Jon Tuesink

Numerical Study of Gap Distributions in Determinantal Point Process on Low Dimensional Spheres: L-Ensemble of O(n) Model Type for n = 2 and n = 3 , Xiankui Yang

Non-Associative Algebraic Structures in Knot Theory , Emanuele Zappala

Theses/Dissertations from 2019

Field Quantization for Radiative Decay of Plasmons in Finite and Infinite Geometries , Maryam Bagherian

Probabilistic Modeling of Democracy, Corruption, Hemophilia A and Prediabetes Data , A. K. M. Raquibul Bashar

Generalized Derivations of Ternary Lie Algebras and n-BiHom-Lie Algebras , Amine Ben Abdeljelil

Fractional Random Weighted Bootstrapping for Classification on Imbalanced Data with Ensemble Decision Tree Methods , Sean Charles Carter

Hierarchical Self-Assembly and Substitution Rules , Daniel Alejandro Cruz

Statistical Learning of Biomedical Non-Stationary Signals and Quality of Life Modeling , Mahdi Goudarzi

Probabilistic and Statistical Prediction Models for Alzheimer’s Disease and Statistical Analysis of Global Warming , Maryam Ibrahim Habadi

Essays on Time Series and Machine Learning Techniques for Risk Management , Michael Kotarinos

The Systems of Post and Post Algebras: A Demonstration of an Obvious Fact , Daviel Leyva

Reconstruction of Radar Images by Using Spherical Mean and Regular Radon Transforms , Ozan Pirbudak

Analyses of Unorthodox Overlapping Gene Segments in Oxytricha Trifallax , Shannon Stich

An Optimal Medium-Strength Regularity Algorithm for 3-uniform Hypergraphs , John Theado

Power Graphs of Quasigroups , DayVon L. Walker

Theses/Dissertations from 2018

Groups Generated by Automata Arising from Transformations of the Boundaries of Rooted Trees , Elsayed Ahmed

Non-equilibrium Phase Transitions in Interacting Diffusions , Wael Al-Sawai

A Hybrid Dynamic Modeling of Time-to-event Processes and Applications , Emmanuel A. Appiah

Lump Solutions and Riemann-Hilbert Approach to Soliton Equations , Sumayah A. Batwa

Developing a Model to Predict Prevalence of Compulsive Behavior in Individuals with OCD , Lindsay D. Fields

Generalizations of Quandles and their cohomologies , Matthew J. Green

Hamiltonian structures and Riemann-Hilbert problems of integrable systems , Xiang Gu

Optimal Latin Hypercube Designs for Computer Experiments Based on Multiple Objectives , Ruizhe Hou

Human Activity Recognition Based on Transfer Learning , Jinyong Pang

Signal Detection of Adverse Drug Reaction using the Adverse Event Reporting System: Literature Review and Novel Methods , Minh H. Pham

Statistical Analysis and Modeling of Cyber Security and Health Sciences , Nawa Raj Pokhrel

Machine Learning Methods for Network Intrusion Detection and Intrusion Prevention Systems , Zheni Svetoslavova Stefanova

Orthogonal Polynomials With Respect to the Measure Supported Over the Whole Complex Plane , Meng Yang

Theses/Dissertations from 2017

Modeling in Finance and Insurance With Lévy-Itô Driven Dynamic Processes under Semi Markov-type Switching Regimes and Time Domains , Patrick Armand Assonken Tonfack

Prevalence of Typical Images in High School Geometry Textbooks , Megan N. Cannon

On Extending Hansel's Theorem to Hypergraphs , Gregory Sutton Churchill

Contributions to Quandle Theory: A Study of f-Quandles, Extensions, and Cohomology , Indu Rasika U. Churchill

Linear Extremal Problems in the Hardy Space H^p for 0 < p < 1 , Robert Christopher Connelly

Statistical Analysis and Modeling of Ovarian and Breast Cancer , Muditha V. Devamitta Perera

Statistical Analysis and Modeling of Stomach Cancer Data , Chao Gao

Structural Analysis of Poloidal and Toroidal Plasmons and Fields of Multilayer Nanorings , Kumar Vijay Garapati

Dynamics of Multicultural Social Networks , Kristina B. Hilton

Cybersecurity: Stochastic Analysis and Modelling of Vulnerabilities to Determine the Network Security and Attackers Behavior , Pubudu Kalpani Kaluarachchi

Generalized D-Kaup-Newell integrable systems and their integrable couplings and Darboux transformations , Morgan Ashley McAnally

Patterns in Words Related to DNA Rearrangements , Lukas Nabergall

Time Series Online Empirical Bayesian Kernel Density Segmentation: Applications in Real Time Activity Recognition Using Smartphone Accelerometer , Shuang Na

Schreier Graphs of Thompson's Group T , Allen Pennington

Cybersecurity: Probabilistic Behavior of Vulnerability and Life Cycle , Sasith Maduranga Rajasooriya

Bayesian Artificial Neural Networks in Health and Cybersecurity , Hansapani Sarasepa Rodrigo

Real-time Classification of Biomedical Signals, Parkinson’s Analytical Model , Abolfazl Saghafi

Lump, complexiton and algebro-geometric solutions to soliton equations , Yuan Zhou

Theses/Dissertations from 2016

A Statistical Analysis of Hurricanes in the Atlantic Basin and Sinkholes in Florida , Joy Marie D'andrea

Statistical Analysis of a Risk Factor in Finance and Environmental Models for Belize , Sherlene Enriquez-Savery

Putnam's Inequality and Analytic Content in the Bergman Space , Matthew Fleeman

On the Number of Colors in Quandle Knot Colorings , Jeremy William Kerr

Statistical Modeling of Carbon Dioxide and Cluster Analysis of Time Dependent Information: Lag Target Time Series Clustering, Multi-Factor Time Series Clustering, and Multi-Level Time Series Clustering , Doo Young Kim

Some Results Concerning Permutation Polynomials over Finite Fields , Stephen Lappano

Hamiltonian Formulations and Symmetry Constraints of Soliton Hierarchies of (1+1)-Dimensional Nonlinear Evolution Equations , Solomon Manukure

Modeling and Survival Analysis of Breast Cancer: A Statistical, Artificial Neural Network, and Decision Tree Approach , Venkateswara Rao Mudunuru

Generalized Phase Retrieval: Isometries in Vector Spaces , Josiah Park

Leonard Systems and their Friends , Jonathan Spiewak

Resonant Solutions to (3+1)-dimensional Bilinear Differential Equations , Yue Sun

Statistical Analysis and Modeling Health Data: A Longitudinal Study , Bhikhari Prasad Tharu

Global Attractors and Random Attractors of Reaction-Diffusion Systems , Junyi Tu

Time Dependent Kernel Density Estimation: A New Parameter Estimation Algorithm, Applications in Time Series Classification and Clustering , Xing Wang

On Spectral Properties of Single Layer Potentials , Seyed Zoalroshd

Theses/Dissertations from 2015

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach , Wei Chen

Active Tile Self-assembly and Simulations of Computational Systems , Daria Karpenko

Nearest Neighbor Foreign Exchange Rate Forecasting with Mahalanobis Distance , Vindya Kumari Pathirana

Statistical Learning with Artificial Neural Network Applied to Health and Environmental Data , Taysseer Sharaf

Radial Versus Orthogonal and Minimal Projections onto Hyperplanes in l_4^3 , Richard Alan Warner

Ensemble Learning Method on Machine Maintenance Data , Xiaochuang Zhao

Theses/Dissertations from 2014

Properties of Graphs Used to Model DNA Recombination , Ryan Arredondo

Recursive Methods in Number Theory, Combinatorial Graph Theory, and Probability , Jonathan Burns

On the Classification of Groups Generated by Automata with 4 States over a 2-Letter Alphabet , Louis Caponi

Statistical Analysis, Modeling, and Algorithms for Pharmaceutical and Cancer Systems , Bong-Jin Choi

Topological Data Analysis of Properties of Four-Regular Rigid Vertex Graphs , Grant Mcneil Conine

Trend Analysis and Modeling of Health and Environmental Data: Joinpoint and Functional Approach , Ram C. Kafle



The Dali was just starting a 27-day voyage.

The ship had spent two days in Baltimore’s port before setting off.


[Image: The side of a large ship, painted blue, with the words “Dali” and “Singapore,” sitting at a port.]

By Claire Moses and Jenny Gross

Published March 26, 2024; updated March 27, 2024

The Dali was less than 30 minutes into its planned 27-day journey when the ship ran into the Francis Scott Key Bridge on Tuesday.

The ship, which was sailing under the Singaporean flag, was on its way to Sri Lanka and was supposed to arrive there on April 22, according to VesselFinder, a ship tracking website.

The Dali, which is nearly 1,000 feet long, left the Baltimore port around 1 a.m. Eastern on Tuesday. The ship had two pilots onboard, according to a statement by its owners, Grace Ocean Investment. There were 22 crew members on board, the Maritime & Port Authority of Singapore said in a statement. There were no reports of any injuries, Grace Ocean said.

Before heading off on its voyage, the Dali had returned to the United States from Panama on March 19, harboring in New York. It then arrived on Saturday in Baltimore, where it spent two days in the port.

Maersk, the shipping giant, said in a statement on Tuesday that it had chartered the vessel, which was carrying Maersk cargo. No Maersk crew and personnel were onboard, the statement said, adding that the company was monitoring the investigations being carried out by the authorities and by Synergy Group, the company that was operating the vessel.

“We are horrified by what has happened in Baltimore, and our thoughts are with all of those affected,” the Maersk statement said.

The Dali was built in 2015 by the South Korea-based Hyundai Heavy Industries. The following year, the ship was involved in a minor incident when it hit a stone wall at the port of Antwerp. The Dali sustained damage at the time, but no one was injured.

Claire Moses is a reporter for the Express desk in London.

Jenny Gross is a reporter for The Times in London covering breaking news and other topics.

VIDEO

  1. Definition of Statistics in detail|STATS|Statistics|Introduction|Statistics types|Purpose|Importance

  2. Introduction to Statistics

  3. Differences Between Thesis and Dissertation

  4. Types of Thesis

  5. Thesis

  6. Statistical Decision Theory Part

COMMENTS

  1. PDF Guideline to Writing a Master's Thesis in Statistics

    A master's thesis is an independent scientific work and is meant to prepare students for future professional or academic work. Largely, the thesis is expected to be similar to papers published in statistical journals. It is not set in stone exactly how the thesis should be organized. The following outline should, however, be followed. Title Page

  2. The Beginner's Guide to Statistical Analysis

    Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations. To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify ...

  3. What do senior theses in Statistics look like?

    Senior theses in Statistics cover a wide range of topics, across the spectrum from applied to theoretical. Typically, senior theses are expected to have one of the following three flavors: 1. Novel statistical theory or methodology, supported by extensive mathematical and/or simulation results, along with a clear account of how the research ...

  4. Master's Thesis

    Master's Thesis. As an integral component of the Master of Science in Statistical Science program, you can submit and defend a Master's Thesis. Your Master's Committee administers this oral examination. If you choose to defend a thesis, it is advisable to commence your research early, ideally during your second semester or the summer following ...

  5. Statistical Analysis in Research: Meaning, Methods and Types

    A Simplified Definition. Statistical analysis uses quantitative data to investigate patterns, relationships, and trends to understand real-life and simulated phenomena. The approach is a key analytical tool in various fields, including academia, business, government, and science in general. This statistical analysis in research definition ...

  6. PDF CHAPTER 3 COMMONLY USED STATISTICAL TERMS

    Mean, arithmetic mean (X or M): The sum of the scores in a distribution divided by the number of scores in the distribution. It is the most commonly used measure of central tendency. It is often reported with its companion statistic, the standard deviation, which shows how far things vary from the average.

  7. An Introduction to Statistics: Choosing the Correct Statistical Test

    The choice of the test differs depending on whether two or more than two measurements are being compared. This includes more than two groups (unmatched data) or more than two measurements in a group (matched data). (Table 1 lists the tests commonly used for comparing unpaired data, depending on the number of groups and type of data.

  8. Top 3 Techniques in Thesis Statistical Analysis for PhD

    Thesis data analysis and interpretation for PhD involves applying statistical methods to interpret and draw meaningful conclusions from research data. It plays a vital role in validating hypotheses, making informed decisions, and ensuring the robustness of a doctoral thesis, contributing to the overall quality and credibility of the research.

  9. Statistical Methods in Theses: Guidelines and Explanations

    Guidelines and Explanations. In light of the changes in psychology, faculty members who teach statistics/methods have reviewed the literature and generated this guide for graduate students. The guide is intended to enhance the quality of student theses by facilitating their engagement in open and transparent research practices and by helping ...

  10. Choosing the Right Statistical Test

    Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include: Ordinal: represent data with an order (e.g. rankings). Nominal: represent group names (e.g. brands or species names). Binary: represent data with a yes/no or 1/0 outcome (e.g. win or lose).

  11. Statistical Treatment of Data

    Statistical Treatment Example - Quantitative Research. For a statistical treatment of data example, consider a medical study that is investigating the effect of a drug on the human population. As the drug can affect different people in different ways based on parameters such as gender, age and race, the researchers would want to group the ...

  12. An Easy Introduction to Statistical Significance (With Examples)

    The p value determines statistical significance. An extremely low p value indicates high statistical significance, while a high p value means low or no statistical significance. Example: Hypothesis testing. To test your hypothesis, you first collect data from two groups. The experimental group actively smiles, while the control group does not.

  13. Thesis Life: 7 ways to tackle statistics in your thesis

    Apart from these seven ways to make your statistical journey pleasant, you should always engage in reading, watching, listening to stuff relevant to your thesis topic and talking about it to those who are interested. Most questions have solutions in the ether realm of communication. So, best of luck and break a leg!!!

  14. What is Statistical Analysis? Types, Methods and Examples

    Statistical analysis is the process of collecting and analyzing data in order to discern patterns and trends. It is a method for removing bias from evaluating data by employing numerical analysis. This technique is useful for collecting the interpretations of research, developing statistical models, and planning surveys and studies.

  15. Making Sense of Statistics

    Abstract. Statistics can be seen as a tool to enable us to make some sense of much data. Statistics now underpins many areas of modern life today and is widely used across many academic disciplines. It has a powerful role in helping us to interpret research data in education.

  16. ANOVA (Analysis of variance)

    Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more means. It is similar to the t-test, but the t-test is generally used for comparing two means, while ANOVA is used when you have more than two means to compare. ANOVA is based on comparing the variance (or variation) between the data samples to the ...

  17. 1.1: Basic Definitions and Concepts

    In general, statistics is a study of data: describing properties of the data, which is called descriptive statistics, and drawing conclusions about a population of interest from information extracted from a sample, which is called inferential statistics. Computing the single number $8,357 to summarize the data was an operation of ...

  18. Descriptive Statistics

    Descriptive statistics summarise and organise characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population. In quantitative research, after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (e ...

  19. The Beginner's Guide to Statistical Analysis

    Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organisations. To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify ...

  20. Basic statistical tools in research and data analysis

    The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into a lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a ...

  21. 1.1 Definitions of Statistics, Probability, and Key Terms

    Organizing and summarizing data is called descriptive statistics. Two ways to summarize data are by graphing and by using numbers, for example, finding an average. After you have studied probability and probability distributions, you will use formal methods for drawing conclusions from good data.

  22. Descriptive Statistics

    Types of descriptive statistics. There are 3 main types of descriptive statistics: The distribution concerns the frequency of each value. The central tendency concerns the averages of the values. The variability or dispersion concerns how spread out the values are. You can apply these to assess only one variable at a time, in univariate ...

  23. Mathematics and Statistics Theses and Dissertations

    Statistical Learning of Biomedical Non-Stationary Signals and Quality of Life Modeling, Mahdi Goudarzi. PDF. Probabilistic and Statistical Prediction Models for Alzheimer's Disease and Statistical Analysis of Global Warming, Maryam Ibrahim Habadi. PDF. Essays on Time Series and Machine Learning Techniques for Risk Management, Michael ...

  24. Regression to the Mean: Definition & Examples

    Regression to the mean is the statistical tendency for an extreme sample or observed value to be followed by a more average one. It is also known as reverting to the mean, highlighting the propensity for a later observation to move closer to the mean after an extreme value. The concept applies only to random variation in a process or system and ...

  25. Dali Ship That Hit Key Bridge Was Destined for Sri Lanka

    The Dali was less than 30 minutes into its planned 27-day journey when the ship ran into the Francis Scott Key Bridge on Tuesday.. The ship, which was sailing under the Singaporean flag, was on ...