Statistics Made Easy

The Importance of Statistics in Research (With Examples)

The field of statistics is concerned with collecting, analyzing, interpreting, and presenting data.

In the field of research, statistics is important for the following reasons:

Reason 1 : Statistics allows researchers to design studies such that the findings from the studies can be extrapolated to a larger population.

Reason 2 : Statistics allows researchers to perform hypothesis tests to determine if some claim about a new drug, new procedure, new manufacturing method, etc. is true.

Reason 3 : Statistics allows researchers to create confidence intervals to capture uncertainty around population estimates.

In the rest of this article, we elaborate on each of these reasons.

Reason 1: Statistics Allows Researchers to Design Studies

Researchers are often interested in answering questions about populations like:

  • What is the average weight of a certain species of bird?
  • What is the average height of a certain species of plant?
  • What percentage of citizens in a certain city support a certain law?

One way to answer these questions is to go around and collect data on every single individual in the population of interest.

However, this is typically too costly and time-consuming, which is why researchers instead take a sample of the population and use the data from the sample to draw conclusions about the population as a whole.

Example of taking a sample from a population

There are many different methods researchers can use to obtain individuals for a sample. These are known as sampling methods.

There are two classes of sampling methods:

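  • Probability sampling methods: Every member of a population has a known, nonzero probability of being selected for the sample. In the simplest case, simple random sampling, every member has an equal probability of being selected.
  • Non-probability sampling methods: Members are not selected at random, so some members may have no chance of being selected and the selection probabilities are generally unknown.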

By using probability sampling methods, researchers can maximize the chances that they obtain a sample that is representative of the overall population.

This allows researchers to extrapolate the findings from the sample to the overall population.
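As a rough illustration, simple random sampling (one probability sampling method) can be sketched with Python's standard library. The population of bird weights below is simulated and entirely hypothetical, and the names `population` and `sample` are illustrative only.

```python
import random
import statistics

# Hypothetical population: simulated weights (in grams) of 10,000 birds.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.gauss(500, 40) for _ in range(10_000)]

# Simple random sample: every bird has the same chance of being chosen,
# and sampling is done without replacement.
sample = rng.sample(population, k=100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.1f} g")
print(f"Sample mean:     {sample_mean:.1f} g")
```

With a random sample of this size, the sample mean will usually land close to the population mean, which is what makes extrapolation from the sample reasonable.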


Reason 2: Statistics Allows Researchers to Perform Hypothesis Tests

Another way that statistics is used in research is in the form of hypothesis tests.

These are tests that researchers can use to determine whether an observed difference, for example between medical procedures or treatments, is statistically significant.

For example, suppose a scientist believes that a new drug is able to reduce blood pressure in obese patients. To test this, he measures the blood pressure of 30 patients before and after using the new drug for one month.

He then performs a paired samples t-test using the following hypotheses:

  • H0: μ_after = μ_before (the mean blood pressure is the same before and after using the drug)
  • HA: μ_after < μ_before (the mean blood pressure is lower after using the drug)

If the p-value of the test is less than some significance level (e.g. α = .05), then he can reject the null hypothesis and conclude that the new drug leads to reduced blood pressure.
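The test above can be sketched by hand with Python's standard library. The readings below are hypothetical, and only 10 patients are used for brevity instead of the 30 in the example. The standard library has no t distribution, so this sketch compares the t statistic with a critical value from a t table instead of computing a p-value; the two decisions are equivalent. In practice a library routine such as SciPy's `scipy.stats.ttest_rel` would report the p-value directly.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for the differences after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical systolic blood pressure readings (mmHg) for 10 patients.
before = [150, 142, 160, 155, 148, 162, 158, 145, 152, 149]
after = [144, 140, 151, 150, 147, 155, 150, 143, 146, 148]

t, df = paired_t_statistic(before, after)

# One-sided critical value for alpha = .05 with df = 9, taken from a t table.
T_CRITICAL = 1.833

# Lower-tailed test, because HA is mu_after < mu_before.
reject_null = t < -T_CRITICAL

print(f"t = {t:.2f}, df = {df}, reject H0: {reject_null}")
```

With 30 patients the degrees of freedom would be 29 and the one-sided critical value would be about 1.699.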

Note: This is just one example of a hypothesis test that is used in research. Other common tests include the one sample t-test, two sample t-test, one-way ANOVA, and two-way ANOVA.

Reason 3: Statistics Allows Researchers to Create Confidence Intervals

Another way that statistics is used in research is in the form of confidence intervals.

A confidence interval is a range of values that is likely to contain a population parameter with a certain level of confidence.

For example, suppose researchers are interested in estimating the mean weight of a certain species of turtle.

Instead of going around and weighing every single turtle in the population, researchers may instead take a simple random sample of turtles with the following information:

  • Sample size n = 25
  • Sample mean weight x̄ = 300 pounds
  • Sample standard deviation s = 18.5 pounds

Using the formula for a confidence interval for a mean, researchers may then construct the following 95% confidence interval:

95% Confidence Interval: 300 ± 1.96 × (18.5 / √25) = [292.75, 307.25]

The researchers would then claim that they’re 95% confident that the true mean weight for this population of turtles is between 292.75 pounds and 307.25 pounds.
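The arithmetic behind this interval can be checked with a short sketch. The function name `mean_confidence_interval` is illustrative. The second call is an addition not found in the example above: it swaps in the t critical value for 24 degrees of freedom (2.064, from a t table), which is often preferred when the standard deviation is estimated from a small sample and gives a slightly wider interval.

```python
import math

n = 25         # sample size
x_bar = 300.0  # sample mean weight (pounds)
s = 18.5       # sample standard deviation (pounds)

def mean_confidence_interval(x_bar, s, n, critical_value):
    """Return (lower, upper) bounds: x_bar +/- critical_value * s / sqrt(n)."""
    margin = critical_value * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# z critical value for 95% confidence, as used in the text.
z_low, z_high = mean_confidence_interval(x_bar, s, n, 1.96)

# t critical value for 95% confidence with n - 1 = 24 degrees of freedom.
t_low, t_high = mean_confidence_interval(x_bar, s, n, 2.064)

print(f"z-based: [{z_low:.2f}, {z_high:.2f}]")  # [292.75, 307.25]
print(f"t-based: [{t_low:.2f}, {t_high:.2f}]")  # [292.36, 307.64]
```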

Additional Resources

The following articles explain the importance of statistics in other fields:

  • The Importance of Statistics in Healthcare
  • The Importance of Statistics in Nursing
  • The Importance of Statistics in Business
  • The Importance of Statistics in Economics
  • The Importance of Statistics in Education


Published by Zach


Enago Academy

Effective Use of Statistics in Research – Methods and Tools for Data Analysis


Remember that feeling of impending doom you get when you are asked to analyze your data? Now that you have all the required raw data, you need to test your hypothesis statistically. Presenting your numerical data well as part of statistics in research will also help break the stereotype of the biology student who can’t do math.

Statistical methods are essential for scientific research. In fact, they are involved in every stage of research: planning, designing, collecting data, analyzing, drawing meaningful interpretations, and reporting findings. Furthermore, the results acquired from a research project are meaningless raw data unless analyzed with statistical tools. Therefore, applying statistics in research is necessary to justify research findings. In this article, we discuss how using statistical methods in biology can help draw meaningful conclusions from biological studies.


Role of Statistics in Biological Research

Statistics is a branch of science that deals with the collection, organization, and analysis of data, and with drawing inferences from a sample about the whole population. Moreover, it aids in designing a study more meticulously and gives a logical basis for the conclusions drawn about a hypothesis. Biology focuses on living organisms and their complex pathways, which are highly dynamic and cannot be explained by logical reasoning alone. Statistics defines and explains the patterns in a study based on the sample sizes used. To be precise, statistics reveals the trends in the conducted study.

Biological researchers often disregard statistics when planning their research and mainly use statistical tools at the end of their experiment. This gives rise to a complicated set of results that are not easily analyzed with statistical tools. Statistics in research can help a researcher approach the study in a stepwise manner, in which the statistical analysis proceeds as follows:

1. Establishing a Sample Size

Usually, a biological experiment starts with choosing samples and selecting the right number of replicate experiments. Basic statistical principles such as randomness and the law of large numbers apply here. Statistics teaches how drawing a sample of adequate size at random from a large pool helps extrapolate statistical findings and reduce experimental bias and errors.

2. Testing of Hypothesis

When conducting a statistical study with a large sample pool, biological researchers must make sure that a conclusion is statistically significant. To achieve this, a researcher must formulate a hypothesis before examining the distribution of the data. Furthermore, statistics in research helps interpret whether the data cluster near the mean or are spread across the distribution. These trends help analyze the sample and assess the hypothesis.

3. Data Interpretation Through Analysis

When dealing with large data sets, statistics in research assists in data analysis. This helps researchers draw an effective conclusion from their experiments and observations. Drawing conclusions manually or from visual observation may give erroneous results; a thorough statistical analysis instead takes into account the relevant statistical measures and the variance in the sample to provide a detailed interpretation of the data. As a result, researchers produce detailed and meaningful data to support their conclusions.

Types of Statistical Research Methods That Aid in Data Analysis


Statistical analysis is the process of analyzing samples of data to find patterns or trends that help researchers anticipate situations and draw appropriate research conclusions. Based on the type of data, statistical analyses are of the following types:

1. Descriptive Analysis

Descriptive statistical analysis organizes and summarizes large data sets into graphs and tables. It involves various processes such as tabulation and measures of central tendency, dispersion or variance, and skewness.
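A minimal sketch of a descriptive summary, using Python's standard `statistics` module. The measurements are hypothetical and the variable names are illustrative only.

```python
import statistics

# Hypothetical measurements (for example, plant heights in cm).
heights = [12.1, 13.4, 11.8, 14.2, 12.9, 13.1, 15.0, 12.4, 13.7, 12.6]

summary = {
    "n": len(heights),
    "mean": statistics.mean(heights),          # central tendency
    "median": statistics.median(heights),      # central tendency
    "variance": statistics.variance(heights),  # dispersion (sample variance)
    "std_dev": statistics.stdev(heights),      # dispersion (sample std. dev.)
    "min": min(heights),
    "max": max(heights),
}

# Print the summary as a small table.
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```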

2. Inferential Analysis

Inferential statistical analysis extrapolates the data acquired from a small sample to the complete population. This analysis helps draw conclusions and make decisions about the whole population on the basis of sample data. It is a highly recommended statistical method for research projects that work with smaller sample sizes and aim to extrapolate conclusions to a larger population.

3. Predictive Analysis

Predictive analysis is used to predict future events. It is used by marketing companies, insurance organizations, online service providers, data-driven marketing firms, and financial corporations.

4. Prescriptive Analysis

Prescriptive analysis examines data to find out what can be done next. It is widely used in business analysis for finding the best possible outcome for a situation. It is closely related to descriptive and predictive analysis. However, prescriptive analysis goes further by recommending the most appropriate choice among the available options.

5. Exploratory Data Analysis

EDA is generally the first step of the data analysis process and is conducted before any other statistical analysis technique. It focuses entirely on analyzing patterns in the data to recognize potential relationships. EDA is used to discover unknown associations within data, inspect the collected data for missing values, and obtain maximum insights.

6. Causal Analysis

Causal analysis assists in understanding and determining the reasons why things happen the way they appear to. This analysis helps identify the root cause of failures or simply find the basic reason why something could happen. For example, causal analysis is used to understand what will happen to a given variable if another variable changes.

7. Mechanistic Analysis

This is the least common type of statistical analysis. Mechanistic analysis is used in big data analytics and the biological sciences. It aims to understand how individual changes in some variables cause corresponding changes in other variables, while excluding external influences.

Important Statistical Tools In Research

Researchers in the biological field often find statistical analysis the scariest aspect of completing research. However, statistical tools can help researchers understand what to do with data and how to interpret the results, making this process as easy as possible.

1. Statistical Package for the Social Sciences (SPSS)

It is a widely used software package for human behavior research. SPSS can compile descriptive statistics as well as graphical depictions of results. Moreover, it includes the option to create scripts that automate analysis or carry out more advanced statistical processing.

2. R Foundation for Statistical Computing

This software package is used in human behavior research and other fields. R is a powerful tool, but it has a steep learning curve because it requires a certain level of coding. It also comes with an active community that is engaged in building and enhancing the software and its associated packages.

3. MATLAB (The Mathworks)

It is an analytical platform and a programming language. Researchers and engineers use this software to write their own code to help answer their research questions. While MATLAB can be a difficult tool for novices, it offers flexibility in terms of what the researcher needs.

4. Microsoft Excel

Although not the best solution for statistical analysis in research, MS Excel offers a wide variety of tools for data visualization and simple statistics. It is easy to generate summaries and customizable graphs and figures. MS Excel is the most accessible option for those wanting to start with statistics.

5. Statistical Analysis System (SAS)

It is a statistical platform used in business, healthcare, and human behavior research alike. It can carry out advanced analyses and produce publication-worthy figures, tables, and charts.

6. GraphPad Prism

It is premium software that is primarily used by biology researchers, but it offers a range of features that can be used in various other fields. Similar to SPSS, GraphPad provides scripting options to automate analyses and carry out complex statistical calculations.

7. Minitab

This software offers basic as well as advanced statistical tools for data analysis. Similar to GraphPad and SPSS, Minitab can automate analyses, although doing so requires some command of coding.

Use of Statistical Tools In Research and Data Analysis

Statistical tools help manage large data sets. Many biological studies use large data sets to analyze trends and patterns. Using statistical tools therefore becomes essential, as they make data processing more convenient.

Following these steps will help biological researchers present the statistics in their research in detail, develop accurate hypotheses, and use the correct tools to test them.

There is a range of statistical tools that can help researchers manage their research data and improve the outcome of their research through better interpretation of data. Which tool you use depends on your research question, your knowledge of statistics, and your personal experience in coding.




The Writing Center • University of North Carolina at Chapel Hill

There are lies, damned lies, and statistics. —Mark Twain

What this handout is about

The purpose of this handout is to help you use statistics to make your argument as effectively as possible.


Numbers are power. Apparently freed of all the squishiness and ambiguity of words, numbers and statistics are powerful pieces of evidence that can effectively strengthen any argument. But statistics are not a panacea. As simple and straightforward as these little numbers promise to be, statistics, if not used carefully, can create more problems than they solve.

Many writers lack a firm grasp of the statistics they are using. The average reader does not know how to properly evaluate and interpret the statistics he or she reads. The main reason behind the poor use of statistics is a lack of understanding about what statistics can and cannot do. Many people think that statistics can speak for themselves. But numbers are as ambiguous as words and need just as much explanation.

In many ways, this problem is quite similar to that experienced with direct quotes. Too often, quotes are expected to do all the work and are treated as part of the argument, rather than a piece of evidence requiring interpretation (see our handout on how to quote). But if you leave the interpretation up to the reader, who knows what sort of off-the-wall interpretations may result? The only way to avoid this danger is to supply the interpretation yourself.

But before we start writing statistics, let’s actually read a few.

Reading statistics

As stated before, numbers are powerful. This is one of the reasons why statistics can be such persuasive pieces of evidence. However, this same power can also make numbers and statistics intimidating. That is, we too often accept them as gospel, without ever questioning their veracity or appropriateness. While this may seem like a positive trait when you plug them into your paper and pray for your reader to submit to their power, remember that before we are writers of statistics, we are readers. And to be effective readers means asking the hard questions. Below you will find a useful set of hard questions to ask of the numbers you find.

1. Does your evidence come from reliable sources?

This is an important question not only with statistics, but with any evidence you use in your papers. As we will see in this handout, there are many ways statistics can be played with and misrepresented in order to produce a desired outcome. Therefore, you want to take your statistics from reliable sources (for more information on finding reliable sources, please see our handout on evaluating print sources). This is not to say that reliable sources are infallible, but only that they are probably less likely to use deceptive practices. With a credible source, you may not need to worry as much about the questions that follow. Still, remember that reading statistics is a bit like being in the middle of a war: trust no one; suspect everyone.

2. What is the data’s background?

Data and statistics do not just fall from heaven fully formed. They are always the product of research. Therefore, to understand the statistics, you should also know where they come from. For example, if the statistics come from a survey or poll, some questions to ask include:

  • Who asked the questions in the survey/poll?
  • What, exactly, were the questions?
  • Who interpreted the data?
  • What issue prompted the survey/poll?
  • What (policy/procedure) potentially hinges on the results of the poll?
  • Who stands to gain from particular interpretations of the data?

All these questions help you orient yourself toward possible biases or weaknesses in the data you are reading. The goal of this exercise is not to find “pure, objective” data but to make any biases explicit, in order to more accurately interpret the evidence.

3. Are all data reported?

In most cases, the answer to this question is easy: no, they aren’t. Therefore, a better way to think about this issue is to ask whether all data have been presented in context. But it is much more complicated when you consider the bigger issue, which is whether the text or source presents enough evidence for you to draw your own conclusion. A reliable source should not exclude data that contradicts or weakens the information presented.

An example can be found on the evening news. If you think about ice storms, which make life so difficult in the winter, you will certainly remember the newscasters warning people to stay off the roads because they are so treacherous. To verify this point, they tell you that the Highway Patrol has already reported 25 accidents during the day. Their intention is to scare you into staying home with this number. While this number sounds high, some studies have found that the number of accidents actually goes down on days with severe weather. Why is that? One possible explanation is that with fewer people on the road, even with the dangerous conditions, the number of accidents will be less than on an “average” day. The critical lesson here is that even when the general interpretation is “accurate,” the data may not actually be evidence for the particular interpretation. This means you have no way to verify if the interpretation is in fact correct.

There is generally a comparison implied in the use of statistics. How can you make a valid comparison without having all the facts? Good question. You may have to look to another source or sources to find all the data you need.

4. Have the data been interpreted correctly?

If the author gives you her statistics, it is always wise to interpret them yourself. That is, while it is useful to read and understand the author’s interpretation, it is merely that—an interpretation. It is not the final word on the matter. Furthermore, sometimes authors (including you, so be careful) can use perfectly good statistics and come up with perfectly bad interpretations. Here are two common mistakes to watch out for:

  • Confusing correlation with causation. Just because two things vary together does not mean that one of them is causing the other. It could be nothing more than a coincidence, or both could be caused by a third factor. Such a relationship is called spurious. The classic example is a study that found that the more firefighters sent to put out a fire, the more damage the fire did. Yikes! I thought firefighters were supposed to make things better, not worse! But before we start shutting down fire stations, it might be useful to entertain alternative explanations. This seemingly contradictory finding can be easily explained by pointing to a third factor that causes both: the size of the fire. The lesson here? Correlation does not equal causation. So it is important not only to think about showing that two variables co-vary, but also about the causal mechanism.
  • Ignoring the margin of error. When survey results are reported, they frequently include a margin of error. You might see this written as “a margin of error of plus or minus 5 percentage points.” What does this mean? The simple story is that surveys are normally generated from samples of a larger population, and thus they are never exact. There is always a confidence interval within which the true value for the general population is expected to fall. Thus, if I say that the number of UNC students who find it difficult to use statistics in their writing is 60%, plus or minus 4 percentage points, that means, assuming the usual confidence level of 95%, that with 95% confidence we can say that the actual number is between 56% and 64%.

Why does this matter? Because if after introducing this handout to the students of UNC, a new poll finds that only 56%, plus or minus 3%, are having difficulty with statistics, I could go to the Writing Center director and ask for a raise, since I have made a significant contribution to the writing skills of the students on campus. However, she would no doubt point out that a) this may be a spurious relationship (see above) and b) the actual change is not significant because it falls within the margin of error for the original results. The lesson here? Margins of error matter, so you cannot just compare simple percentages.
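The comparison of the two polls can be sketched in a few lines. The helper names `interval` and `intervals_overlap` are illustrative, and checking whether two intervals overlap is only a rough screen, not a formal significance test.

```python
def interval(estimate, margin):
    """Return the (low, high) range implied by an estimate and its margin of error."""
    return estimate - margin, estimate + margin

def intervals_overlap(a, b):
    """Return True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

original = interval(60, 4)   # first poll: 60%, plus or minus 4 points
follow_up = interval(56, 3)  # second poll: 56%, plus or minus 3 points

print(original)                                # (56, 64)
print(follow_up)                               # (53, 59)
print(intervals_overlap(original, follow_up))  # True
```

Because the two ranges overlap, the apparent drop from 60% to 56% cannot be distinguished from sampling noise on this evidence alone.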

Finally, you should keep in mind that the source you are actually looking at may not be the original source of your data. That is, if you find an essay that quotes a number of statistics in support of its argument, often the author of the essay is using someone else’s data. Thus, you need to consider not only your source, but the author’s sources as well.

Writing statistics

As you write with statistics, remember your own experience as a reader of statistics. Don’t forget how frustrated you were when you came across unclear statistics and how thankful you were to read well-presented ones. It is a sign of respect to your reader to be as clear and straightforward as you can be with your numbers. Nobody likes to be played for a fool. Thus, even if you think that changing the numbers just a little bit will help your argument, do not give in to the temptation.

As you begin writing, keep the following in mind. First, your reader will want to know the answers to the same questions that we discussed above. Second, you want to present your statistics in a clear, unambiguous manner. Below you will find a list of some common pitfalls in the world of statistics, along with suggestions for avoiding them.

1. The mistake of the “average” writer

Nobody wants to be average. Moreover, nobody wants to just see the word “average” in a piece of writing. Why? Because nobody knows exactly what it means. There are not one, not two, but three different definitions of “average” in statistics, and when you use the word, your reader has only a 33.3% chance of guessing correctly which one you mean.

For the following definitions, please refer to this set of numbers: 5, 5, 5, 8, 12, 14, 21, 33, 38

  • Mean (arithmetic mean) This may be the most average definition of average (whatever that means). It is the sum of all the numbers divided by how many numbers there are. Thus the mean of the above set of numbers is 5+5+5+8+12+14+21+33+38 = 141, divided by 9, which equals 15.666666666667. (Wow! That is a lot of numbers after the decimal. What do we do about that? Precision is a good thing, but too much of it is over the top; it does not necessarily make your argument any stronger. Consider the reasonable amount of precision based on your input and round accordingly. In this case, 15.7 should do the trick.)
  • Median Depending on whether you have an odd or even set of numbers, the median is either a) the number midway through an odd set of numbers or b) a value halfway between the two middle numbers in an even set. For the above set (an odd set of 9 numbers), the median is 12. (5, 5, 5, 8 < 12 < 14, 21, 33, 38)
  • Mode The mode is the number or value that occurs most frequently in a series. If, by some cruel twist of fate, two or more values tie for the highest frequency, then the set has more than one mode, and you should report all of them. For our set, the mode would be 5, since it occurs 3 times, whereas all other numbers occur only once.
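The three averages for this set of numbers can be checked with Python's standard `statistics` module. This is only a quick sketch; `tied` is a hypothetical second data set added to show what happens when two values tie for most frequent.

```python
import statistics

numbers = [5, 5, 5, 8, 12, 14, 21, 33, 38]

mean = statistics.mean(numbers)      # sum (141) divided by count (9)
median = statistics.median(numbers)  # middle value of the sorted list
mode = statistics.mode(numbers)      # most frequent value

print(f"mean   = {mean:.1f}")  # mean   = 15.7
print(f"median = {median}")    # median = 12
print(f"mode   = {mode}")      # mode   = 5

# When values tie for most frequent, multimode() returns all of them.
tied = [1, 1, 2, 2, 3]
print(statistics.multimode(tied))  # [1, 2]
```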

As you can see, the numbers can vary considerably, as can their significance. Therefore, the writer should always inform the reader which average he or she is using. Otherwise, confusion will inevitably ensue.

2. Match your facts with your questions

Be sure that your statistics actually apply to the point/argument you are making. If we return to our discussion of averages, you should use the statistic that fits the question you are interested in answering.

Perhaps an example would help illustrate this point. Your professor hands back the midterm. The grades are distributed as follows:

The professor felt that the test must have been too easy, because the average (median) grade was a 95.

When a colleague asked her about how the midterm grades came out, she answered, knowing that her classes were gaining a reputation for being “too easy,” that the average (mean) grade was an 80.

When your parents ask you how you can justify doing so poorly on the midterm, you answer, “Don’t worry about my 63. It is not as bad as it sounds. The average (mode) grade was a 58.”

I will leave it up to you to decide whether these choices are appropriate. Selecting the appropriate facts or statistics will help your argument immensely. Not only will they actually support your point, but they will not undermine the legitimacy of your position. Think about how your parents will react when they learn from the professor that the average (median) grade was 95! The best way to maintain precision is to specify which of the three forms of “average” you are using.

3. Show the entire picture

Sometimes, you may misrepresent your evidence by accident and misunderstanding. Other times, however, misrepresentation may be slightly less innocent. This can be seen most readily in visual aids. Do not shape and “massage” the representation so that it “best supports” your argument. This can be achieved by presenting charts/graphs in numerous different ways. Either the range can be shortened (to cut out data points which do not fit, e.g., starting a time series too late or ending it too soon), or the scale can be manipulated so that small changes look big and vice versa. Furthermore, do not fiddle with the proportions, either vertically or horizontally. The fact that USA Today seems to get away with these techniques does not make them OK for an academic argument.

Charts A, B, and C all use the same data points, but the stories they seem to be telling are quite different. Chart A shows a mild increase, followed by a slow decline. Chart B, on the other hand, reveals a steep jump, with a sharp drop-off immediately following. Conversely, Chart C seems to demonstrate that there was virtually no change over time. These variations are a product of changing the scale of the chart. One way to alleviate this problem is to supplement the chart by using the actual numbers in your text, in the spirit of full disclosure.

Another point of concern can be seen in Charts D and E. Both use the same data as charts A, B, and C for the years 1985-2000, but additional time points, using two hypothetical sets of data, have been added back to 1965. Given the different trends leading up to 1985, consider how the significance of recent events can change. In Chart D, the downward trend from 1990 to 2000 is going against a long-term upward trend, whereas in Chart E, it is merely the continuation of a larger downward trend after a brief upward turn.

One of the difficulties with visual aids is that there is no hard and fast rule about how much to include and what to exclude. Judgment is always involved. In general, be sure to present your visual aids so that your readers can draw their own conclusions from the facts and verify your assertions. If what you have cut out could affect the reader’s interpretation of your data, then you might consider keeping it.

4. Give bases of all percentages

Because percentages are always derived from a specific base, they are meaningless until associated with a base. So even if I tell you that after reading this handout, you will be 23% more persuasive as a writer, that is not a very meaningful assertion because you have no idea what it is based on—23% more persuasive than what?

Let’s look at crime rates to see how this works. Suppose we have two cities, Springfield and Shelbyville. In Springfield, the murder rate has gone up 75%, while in Shelbyville, the rate has only increased by 10%. Which city is having a bigger murder problem? Well, that’s obvious, right? It has to be Springfield. After all, 75% is bigger than 10%.

Hold on a second, because this is actually much less clear than it looks. In order to really know which city has a worse problem, we have to look at the actual numbers. If I told you that Springfield had 4 murders last year and 7 this year, and Shelbyville had 30 murders last year and 33 murders this year, would you change your answer? Maybe, since 33 murders are significantly more than 7. One would certainly feel safer in Springfield, right?

Not so fast, because we still do not have all the facts. We have to make the comparison between the two based on equivalent standards. To do that, we have to look at the per capita rate (often given in rates per 100,000 people per year). If Springfield has 700 residents while Shelbyville has 3.3 million, then Springfield has a murder rate of 1,000 per 100,000 people, and Shelbyville’s rate is merely 1 per 100,000. Gadzooks! The residents of Springfield are dropping like flies. I think I’ll stick with nice, safe Shelbyville, thank you very much.
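The three comparisons in the Springfield and Shelbyville example can be reproduced with a few lines of arithmetic. This is a minimal sketch using the figures given above; the function names `percent_change` and `rate_per_100k` are illustrative choices, not a standard API.

```python
def percent_change(before, after):
    """Percentage change relative to the earlier value (the base)."""
    return (after - before) / before * 100


def rate_per_100k(count, population):
    """Events per 100,000 residents."""
    return count / population * 100_000


# Figures taken from the example in the text.
cities = {
    "Springfield": {"last_year": 4, "this_year": 7, "population": 700},
    "Shelbyville": {"last_year": 30, "this_year": 33, "population": 3_300_000},
}

for name, c in cities.items():
    change = percent_change(c["last_year"], c["this_year"])
    rate = rate_per_100k(c["this_year"], c["population"])
    print(f"{name}: {change:.0f}% increase, {c['this_year']} murders, "
          f"{rate:,.0f} per 100,000 residents")
```

Each number answers a different question: the percentage describes change against last year's base, the raw count describes volume, and the per capita rate is the only one of the three that compares the two cities on an equivalent standard.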

Percentages are really no different from any other form of statistics: they gain their meaning only through their context. Consequently, percentages should be presented in context so that readers can draw their own conclusions as you emphasize facts important to your argument. Remember, if your statistics really do support your point, then you should have no fear of revealing the larger context that frames them.

Important questions to ask (and answer) about statistics

  • Is the question being asked relevant?
  • Do the data come from reliable sources?
  • Margin of error/confidence interval—when is a change really a change?
  • Are all data reported, or just the best/worst?
  • Are the data presented in context?
  • Have the data been interpreted correctly?
  • Does the author confuse correlation with causation?

Now that you have learned the lessons of statistics, you have two options. Use this knowledge to manipulate your numbers to your advantage, or use this knowledge to better understand and use statistics to make accurate and fair arguments. The choice is yours. Nine out of ten writers, however, prefer the latter, and the other one later regrets his or her decision.

You may reproduce it for non-commercial use if you use the entire handout and attribute the source: The Writing Center, University of North Carolina at Chapel Hill


Purdue Online Writing Lab Purdue OWL® College of Liberal Arts

Descriptive Statistics


This page is brought to you by the OWL at Purdue University. When printing this page, you must include the entire legal notice.

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.

This handout explains how to write with statistics including quick tips, writing descriptive statistics, writing inferential statistics, and using visuals with statistics.

The mean, the mode, the median, the range, and the standard deviation are all examples of descriptive statistics. Descriptive statistics are used because in most cases, it isn't possible to present all of your data in any form that your reader will be able to quickly interpret.

Generally, when writing descriptive statistics, you want to present at least one form of central tendency (or average), that is, either the mean, median, or mode. In addition, you should present one form of variability, usually the standard deviation.

Measures of Central Tendency and Other Commonly Used Descriptive Statistics

The mean, median, and the mode are all measures of central tendency. They attempt to describe what the typical data point might look like. In essence, they are all different forms of 'the average.' When writing statistics, you never want to say 'average' because it is difficult, if not impossible, for your reader to understand if you are referring to the mean, the median, or the mode.

The mean is the most common form of central tendency, and is what most people usually are referring to when they say average. It is simply the total sum of all the numbers in a data set, divided by the total number of data points. For example, the following data set has a mean of 4: {-1, 0, 1, 16}. That is, the sum of the values (16) divided by the number of data points (4) is 4. If there isn't a good reason to use one of the other forms of central tendency, then you should use the mean to describe the central tendency.

The median is simply the middle value of a data set. In order to calculate the median, all values in the data set need to be ordered, from either highest to lowest, or vice versa. If there is an odd number of values in a data set, then the median is easy to calculate. If there is an even number of values in a data set, then the calculation becomes more difficult. Statisticians still debate how to properly calculate a median when there is an even number of values, but for most purposes, it is appropriate to simply take the mean of the two middle values. The median is useful when describing data sets that are skewed or have extreme values. Incomes of baseball players, for example, are commonly reported using a median because a small minority of baseball players makes a lot of money, while most players make more modest amounts. The median is less influenced by extreme scores than the mean.

The mode is the most commonly occurring number in the data set. The mode is best used when you want to indicate the most common response or item in a data set. For example, if you wanted to predict the score of the next football game, you may want to know what the most common score is for the visiting team, but having an average score of 15.3 won't help you if it is impossible to score 15.3 points. Likewise, a median score may not be very informative either, if you are interested in what score is most likely.
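The three measures can be checked with Python's standard `statistics` module. The first data set is the one used in the text; the list of game scores is hypothetical and was invented here only to illustrate why a mean such as 15.3 may not be a score any team can actually get.

```python
import statistics

# Data set from the text: the sum is 16 and there are 4 values.
text_example = [-1, 0, 1, 16]
print("mean:", statistics.mean(text_example))
# Even number of values: the median is the mean of the two middle values (0 and 1).
print("median:", statistics.median(text_example))

# Hypothetical scores for a visiting team (assumed data, for illustration only).
scores = [14, 17, 17, 21, 7, 17, 14]
print("mean score:", round(statistics.mean(scores), 1))
print("median score:", statistics.median(scores))
print("most common score (mode):", statistics.mode(scores))
```

Note how the extreme value 16 pulls the mean of the first data set well above its median, which is the behaviour described for skewed data such as player incomes.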

Standard Deviation

The standard deviation is a measure of variability (it is not a measure of central tendency). Conceptually it is best viewed as the 'average distance that individual data points are from the mean.' Data sets that are highly clustered around the mean have lower standard deviations than data sets that are spread out.

For example, the first data set would have a higher standard deviation than the second data set:

Notice that both groups have the same mean (5) and median (also 5), but the two groups contain different numbers and are organized much differently. This organization of a data set is often referred to as a distribution. Because the two data sets above have the same mean and median, but different standard deviation, we know that they also have different distributions. Understanding the distribution of a data set helps us understand how the data behave.
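The two data sets referred to above are not reproduced in this text, so the sketch below uses two hypothetical data sets chosen to have the same mean (5) and median (5) but different spreads. It uses `statistics.stdev` (the sample standard deviation); `statistics.pstdev` would be the population version. Strictly, the standard deviation is the square root of the average squared distance from the mean, so the 'average distance' description is a conceptual aid rather than the formula.

```python
import statistics

# Hypothetical data sets (assumed for illustration; not the handout's originals).
spread_out = [1, 3, 5, 7, 9]
clustered = [4, 5, 5, 5, 6]

for name, data in (("spread out", spread_out), ("clustered", clustered)):
    print(
        f"{name:>10}: mean={statistics.mean(data)}, "
        f"median={statistics.median(data)}, "
        f"sd={statistics.stdev(data):.2f}"
    )
```

Both lines report the same mean and median, and only the standard deviation reveals that the two distributions differ.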

Statistical Methods in Theses: Guidelines and Explanations

Signed August 2018: Naseem Al-Aidroos, PhD, Christopher Fiacconi, PhD, Deborah Powell, PhD, Harvey Marmurek, PhD, Ian Newby-Clark, PhD, Jeffrey Spence, PhD, David Stanley, PhD, Lana Trick, PhD

Version:  2.00

This document is an organizational aid, and workbook, for students. We encourage students to take this document to meetings with their advisor and committee. This guide should enhance a committee’s ability to assess key areas of a student’s work. 

In recent years a number of well-known and apparently well-established findings have failed to replicate, resulting in what is commonly referred to as the replication crisis. The APA Publication Manual 6th Edition notes that “The essence of the scientific method involves observations that can be repeated and verified by others” (p. 12). However, a systematic investigation of the replicability of psychology findings published in Science revealed that over half of psychology findings do not replicate (see a related commentary in Nature). Even more disturbing, a Bayesian reanalysis of the reproducibility project showed that 64% of studies had sample sizes so small that strong evidence for or against the null or alternative hypotheses did not exist. Indeed, Morey and Lakens (2016) concluded that most of psychology is statistically unfalsifiable due to small sample sizes and correspondingly low power (see article). Our discipline’s reputation is suffering. News of the replication crisis has reached the popular press (e.g., The Atlantic, The Economist, Slate, Last Week Tonight).

An increasing number of psychologists have responded by promoting new research standards that involve open science and the elimination of Questionable Research Practices. The open science perspective is made manifest in the Transparency and Openness Promotion (TOP) guidelines for journal publications. These guidelines were adopted some time ago by the Association for Psychological Science. More recently, the guidelines were adopted by American Psychological Association journals (see details) and journals published by Elsevier (see details). It appears likely that, in the very near future, most journals in psychology will be using an open science approach. We strongly advise readers to take a moment to inspect the TOP Guidelines Summary Table.

A key aspect of open science and the TOP guidelines is the sharing of data associated with published research (with respect to medical research, see point #35 in the World Medical Association Declaration of Helsinki). This practice is viewed widely as highly important. Indeed, open science is recommended by all G7 science ministers. All Tri-Agency grants must include a data-management plan that includes plans for sharing: “research data resulting from agency funding should normally be preserved in a publicly accessible, secure and curated repository or other platform for discovery and reuse by others.” Moreover, a 2017 editorial published in the New England Journal of Medicine announced that the International Committee of Medical Journal Editors believes there is “an ethical obligation to responsibly share data.” As of this writing, 60% of highly ranked psychology journals require or encourage data sharing.

The increasing importance of demonstrating that findings are replicable is reflected in calls to make replication a requirement for the promotion of faculty (see details in Nature), and experts in open science are now refereeing applications for tenure and promotion (see details at the Center for Open Science and this article). Most dramatically, in one instance, a paper resulting from a dissertation was retracted due to misleading findings attributable to Questionable Research Practices. Subsequent to the retraction, the Ohio State University’s Board of Trustees unanimously revoked the PhD of the graduate student who wrote the dissertation (see details). Thus, the academic environment is changing and it is important to work toward using new best practices in lieu of older practices, many of which are synonymous with Questionable Research Practices. Doing so should help you avoid later career regrets and subsequent public mea culpas. One way to achieve your research objectives in this new academic environment is to incorporate replications into your research. Replications are becoming more common and there are even websites dedicated to helping students conduct replications (e.g., Psychology Science Accelerator) and indexing the success of replications (e.g., Curate Science). You might even consider conducting a replication for your thesis (subject to committee approval).

As early-career researchers, it is important to be aware of the changing academic environment. Senior principal investigators may be reluctant to engage in open science (see this student perspective in a blog post and podcast), and research on resistance to data sharing indicates that one of the barriers to sharing data is that researchers do not feel that they have knowledge of how to share data online. This document is an educational aid and resource to provide students with introductory knowledge of how to participate in open science and online data sharing to start their education on these subjects.

Guidelines and Explanations

In light of the changes in psychology, faculty members who teach statistics/methods have reviewed the literature and generated this guide for graduate students. The guide is intended to enhance the quality of student theses by facilitating their engagement in open and transparent research practices and by helping them avoid Questionable Research Practices, many of which are now deemed unethical and covered in the ethics section of textbooks.

This document is an informational tool.

How to Start

In order to follow best practices, some first steps need to be followed. Here is a list of things to do:

  • Get an Open Science account. Registration is easy!
  • If conducting confirmatory hypothesis testing for your thesis, pre-register your hypotheses (see Section 1-Hypothesizing). The Open Science Foundation website has helpful  tutorials  and  guides  to get you going.
  • Also, pre-register your data analysis plan. Pre-registration typically includes how and when you will stop collecting data, how you will deal with violations of statistical assumptions and points of influence (“outliers”), the specific measures you will use, and the analyses you will use to test each hypothesis, possibly including the analysis script. Again, there is a lot of help available for this. 

Exploratory and Confirmatory Research Are Both of Value, But Do Not Confuse the Two

We note that this document largely concerns confirmatory research (i.e., testing hypotheses). We by no means intend to devalue exploratory research. Indeed, it is one of the primary ways that hypotheses are generated for (possible) confirmation. Instead, we emphasize that it is important that you clearly indicate what of your research is exploratory and what is confirmatory. Be clear in your writing and in your preregistration plan. You should explicitly indicate which of your analyses are exploratory and which are confirmatory. Please note also that if you are engaged in exploratory research, then Null Hypothesis Significance Testing (NHST) should probably be avoided (see rationale in Gigerenzer (2004) and Wagenmakers et al. (2012)).

This document is structured around the stages of thesis work:  hypothesizing, design, data collection, analyses, and reporting – consistent with the headings used by Wicherts et al. (2016). We also list the Questionable Research Practices associated with each stage and provide suggestions for avoiding them. We strongly advise going through all of these sections during thesis/dissertation proposal meetings because a priori decisions need to be made prior to data collection (including analysis decisions). 

To help to ensure that the student has informed the committee about key decisions at each stage, there are check boxes at the end of each section.

How to Use This Document in a Proposal Meeting

  • Print off a copy of this document and take it to the proposal meeting.
  • During the meeting, use the document to seek assistance from faculty to address potential problems.
  • Revisit responses to issues raised by this document (especially the Analysis and Reporting Stages) when you are seeking approval to proceed to defense.

Consultation and Help Line

Note that the Center for Open Science now has a help line (for individual researchers and labs) you can call for help with open science issues. They also have training workshops. Please see their  website  for details.


International Students Blog


Thesis life: 7 ways to tackle statistics in your thesis.


By Pranav Kulkarni

The thesis is an integral part of your Master’s study at Wageningen University and Research. It is the most exciting, independent and technical part of the study. More often than not, departments at WU expect students to complete a short-term independent project, or a part of a larger ongoing project, for their thesis assignment.


This assignment involves proposing a research question, tackling it with help of some observations or experiments, analyzing these observations or results and then stating them by drawing some conclusions.


The penultimate part of this process is the analysis of results, which is crucial for the coherence of your thesis assignment. This analysis usually involves the use of statistical tools to help draw inferences. Most students who don’t pursue statistics in their curriculum are scared by this prospect. Since it is an unavoidable part of your thesis, you can neither run from statistics nor cry for help. But in order not to be intimidated by statistics and its “greco-latin” language, there are a few ways in which you can make your journey through thesis life a pleasant experience.

Make statistics your friend

The best way to end your fear of statistics and all its paraphernalia is to befriend it. Try to learn all that you can about the techniques that you will be using, why they were invented, how they were invented and who did this deed. Personifying the story of statistical techniques makes them digestible and easy to use. Each new method in statistics comes with a unique story and loads of nerdy anecdotes.


If you cannot make friends with statistics, at least make a truce

If you still cannot bring yourself to be interested in the life and times of statistics, the best way to not hate statistics is to make an agreement with yourself. You must realise that although important, this is only part of your thesis. The better part of your thesis is something you trained for and learned. So don’t fuss about statistics and make yourself nervous. Do your job, enjoy your thesis to the fullest and complete the statistical section as soon as possible. By the end, you will have forgotten all about your worries and fears of statistics.

Visualize your data

The best way to understand the results and observations from your study/ experiments, is to visualize your data. See different trends, patterns, or lack thereof to understand what you are supposed to do. Moreover, graphics and illustrations can be used directly in your report. These techniques will also help you decide on which statistical analyses you must perform to answer your research question. Blind decisions about statistics can often influence your study and make it very confusing or worse, make it completely wrong!


Simplify with flowcharts and planning

Similar to graphical visualizations, making flowcharts and planning the various steps of your study can help you make statistical decisions. The human brain can analyse pictorial information faster than literal information. So, it is always easier to understand your exact goal when you can make decisions based on a flowchart or another logical flow-plan.


Find examples on internet

Although statistics is a giant maze of complicated terminologies, the internet holds the key to this particular maze. You can find tons of examples on the web. These may be similar to what you intend to do, or they may be different applications of the tools that you wish to use. Especially in the case of statistical programming languages like R, SAS, Python, Perl, VBA, etc., there is a vast database of example code, clarifications and direct training examples available on the internet. Various forums are also available for specialized statistical methodologies, where experts and students discuss issues regarding their own projects.


Comparative studies

Unlike blindly searching the internet for examples and taking advice from faceless people online, you can systematically learn which quantitative tests to perform by rigorously studying the literature of relevant research. Since you came up with a certain problem to tackle in your field of study, chances are that someone else has also come up with this issue or something quite similar. You can find solutions to many such problems by scouring the internet for research papers which address the issue. Nevertheless, you should be cautious. It is easy to get lost and disheartened when you find many heavy statistical studies with lots of maths, derivations and cryptic symbols.

When all else fails, talk to an expert

All the steps above are meant to help you independently tackle whatever hurdles you encounter over the course of your thesis. But when you cannot tackle them yourself, it is always prudent and most efficient to ask for help. Talking to students from your thesis ring who have done something similar is one way to get help. Another is to make an appointment with your supervisor and take specific questions to him/her. If that is not possible, you can contact other teaching staff or researchers from your research group. To avoid wasting their time as well as yours, make a list of the specific problems that you would like to discuss. I think most are happy to help in any way possible.


Sometimes, with the help of your supervisor, you can make an appointment with someone from the “Biometris” which is the WU’s statistics department. These people are the real deal; chances are, these people can solve all your problems without any difficulty. Always remember, you are in the process of learning, nobody expects you to be an expert in everything. Ask for help when there seems to be no hope.

Apart from these seven ways to make your statistical journey pleasant, you should always engage in reading, watching, listening to stuff relevant to your thesis topic and talking about it to those who are interested. Most questions have solutions in the ether realm of communication. So, best of luck and break a leg!!!


There are 4 comments.

A perfect approach in a very crisp and clear manner! The sequence suggested is absolutely perfect and will help the students very much. I particularly liked the idea of visualisation!

You are write! I get totally stuck with learning and understanding statistics for my Dissertation!

Statistics is a technical subject that requires extra effort. With the highlighted tips you already highlighted i expect it will offer the much needed help with statistics analysis in my course.

this is so much relevant to me! Don’t forget one more point: try to enrol specific online statistics course (in my case, I’m too late to join any statistic course). The hardest part for me actually to choose what type of statistical test to choose among many options


University of Cambridge


Data and your thesis


What is research data?

Research data are the evidence that underpins the answer to your research question and can support the findings or outputs of your research. Research data take many different forms. They may include, for example, statistics, digital images, sound recordings, films, transcripts of interviews, survey data, artworks, published texts or manuscripts, or fieldwork observations. The term 'data' is more familiar to researchers in Science, Technology, Engineering and Mathematics (STEM), but any outputs from research could be considered data. For example, Humanities, Arts and Social Sciences (HASS) researchers might create data in the form of presentations, spreadsheets, documents, images, works of art, or musical scores. The Research Data Management Team in the University Library aim to help you plan, create, organise, share, and look after your research materials, whatever form they take. For more information about the Research Data Management Team, visit their website.

Data Management Plans

Research Data Management is a complex issue, but if done correctly from the start, could save you a lot of time and hassle when you are writing up your thesis. We advise all students to consider data management as early as possible and create a Data Management Plan (DMP). The Research Data Management Team offer help in creating your DMP and can offer advice and training on how to do this. There are some departments that have joined a pilot project to include Data Management Plans in the registration reviews of PhD students. As part of the pilot, students are asked to complete a brief Data Management Plan (DMP) and supervisors and assessors ensure that the student has thought about all the issues and their responses are reasonable. If your department is taking part in the pilot or would like to, see the Data Management Plans for Pilot for Cambridge PhD Students page. The Research Data Management Team will provide support for any students, supervisors or assessors that are in need.

Submitting your digital thesis and depositing your data

If you have created data that is connected to your thesis and the data is in a format separate to the thesis file itself, we recommend that you deposit it in the data repository and make it open access to improve discoverability. We will accept data that either does not contain third party copyright, or contains third party copyright that has been cleared and is data of the following types:

  • computer code written by the researcher
  • software written by the researcher
  • statistical data
  • raw data from experiments

If you have created a research output which is not one of those listed above, please contact us on the [email protected] address and we will advise whether you should deposit this with your thesis, or separately in the data repository. If you are ready to deposit your data in the data repository, please do so via Symplectic Elements. More information on how to deposit can be found on the Research Data Management pages. If you wish to cite your data in your thesis, we can arrange for placeholder DOIs to be created in the data repository before your thesis is submitted. For further information, please email: [email protected]

Third party copyright in your data

For an explanation of what third party copyright is, please see the OSC third party copyright page. If your data is based on, or contains, third party copyright, you will need to obtain clearance to make your data open access in the data repository. It is possible to apply a 12 month embargo to datasets while clearance is obtained if you need extra time to do this. However, if it is not possible to clear the third party copyrighted material, it is not possible to deposit your data in the data repository. In these cases, it might be preferable to deposit your data with your thesis instead, under controlled access, but this can be complicated if you wish to deposit the thesis itself under a different access level. Please email [email protected] with any queries and we can advise on the best solution.


© 2024 University of Cambridge


Grad Coach

How To Write The Results/Findings Chapter

For quantitative studies (dissertations & theses).

By: Derek Jansen (MBA). Expert Reviewed By: Kerryn Warren (PhD) | July 2021

So, you’ve completed your quantitative data analysis and it’s time to report on your findings. But where do you start? In this post, we’ll walk you through the results chapter (also called the findings or analysis chapter), step by step, so that you can craft this section of your dissertation or thesis with confidence. If you’re looking for information regarding the results chapter for qualitative studies, you can find that here.


Overview: Quantitative Results Chapter

  • What exactly the results/findings/analysis chapter is
  • What you need to include in your results chapter
  • How to structure your results chapter
  • A few tips and tricks for writing a top-notch chapter

What exactly is the results chapter?

The results chapter (also referred to as the findings or analysis chapter) is one of the most important chapters of your dissertation or thesis because it shows the reader what you’ve found in terms of the quantitative data you’ve collected. It presents the data using a clear text narrative, supported by tables, graphs and charts. In doing so, it also highlights any potential issues (such as outliers or unusual findings) you’ve come across.

But how’s that different from the discussion chapter?

Well, in the results chapter, you only present your statistical findings. Only the numbers, so to speak – no more, no less. In contrast, in the discussion chapter, you interpret your findings and link them to prior research (i.e. your literature review), as well as your research objectives and research questions. In other words, the results chapter presents and describes the data, while the discussion chapter interprets the data.

Let’s look at an example.

In your results chapter, you may have a plot that shows how respondents to a survey responded: the numbers of respondents per category, for instance. You may also state whether this supports a hypothesis by using a p-value from a statistical test. But it is only in the discussion chapter that you will say why this is relevant or how it compares with the literature or the broader picture. So, in your results chapter, make sure that you don’t present anything other than the hard facts – this is not the place for subjectivity.

It’s worth mentioning that some universities prefer you to combine the results and discussion chapters. Even so, it is good practice to separate the results and discussion elements within the chapter, as this ensures your findings are fully described. Typically, though, the results and discussion chapters are split up in quantitative studies. If you’re unsure, chat with your research supervisor or chair to find out what their preference is.

What should you include in the results chapter?

Following your analysis, it’s likely you’ll have far more data than are necessary to include in your chapter. In all likelihood, you’ll have a mountain of SPSS or R output data, and it’s your job to decide what’s most relevant. You’ll need to cut through the noise and focus on the data that matters.

This doesn’t mean that those analyses were a waste of time – on the contrary, those analyses ensure that you have a good understanding of your dataset and how to interpret it. However, that doesn’t mean your reader or examiner needs to see the 165 histograms you created! Relevance is key.

How do I decide what’s relevant?

At this point, it can be difficult to strike a balance between what is and isn’t important. But the most important thing is to ensure your results reflect and align with the purpose of your study. So, you need to revisit your research aims, objectives and research questions and use these as a litmus test for relevance. Make sure that you refer back to these constantly when writing up your chapter so that you stay on track.

As a general guide, your results chapter will typically include the following:

  • Some demographic data about your sample
  • Reliability tests (if you used measurement scales)
  • Descriptive statistics
  • Inferential statistics (if your research objectives and questions require these)
  • Hypothesis tests (again, if your research objectives and questions require these)

We’ll discuss each of these points in more detail in the next section.

Importantly, your results chapter needs to lay the foundation for your discussion chapter. This means that, in your results chapter, you need to include all the data that you will use as the basis for your interpretation in the discussion chapter.

For example, if you plan to highlight the strong relationship between Variable X and Variable Y in your discussion chapter, you need to present the respective analysis in your results chapter – perhaps a correlation or regression analysis.

How do I write the results chapter?

There are multiple steps involved in writing up the results chapter for your quantitative research. The exact number of steps applicable to you will vary from study to study and will depend on the nature of the research aims, objectives and research questions. However, we’ll outline the generic steps below.

Step 1 – Revisit your research questions

The first step in writing your results chapter is to revisit your research objectives and research questions. These will be (or at least, should be!) the driving force behind your results and discussion chapters, so you need to review them and then ask yourself which statistical analyses and tests (from your mountain of data) would specifically help you address these. For each research objective and research question, list the specific piece (or pieces) of analysis that address it.

At this stage, it’s also useful to think about the key points that you want to raise in your discussion chapter and note these down so that you have a clear reminder of which data points and analyses you want to highlight in the results chapter. Again, list your points and then list the specific piece of analysis that addresses each point. 

Next, you should draw up a rough outline of how you plan to structure your chapter. Which analyses and statistical tests will you present and in what order? We’ll discuss the “standard structure” in more detail later, but it’s worth mentioning now that it’s always useful to draw up a rough outline before you start writing (this advice applies to any chapter).

Step 2 – Craft an overview introduction

As with all chapters in your dissertation or thesis, you should start your quantitative results chapter by providing a brief overview of what you’ll do in the chapter and why. For example, you’d explain that you will start by presenting demographic data to understand the representativeness of the sample, before moving onto X, Y and Z.

This section shouldn’t be lengthy – a paragraph or two maximum. Also, it’s a good idea to weave the research questions into this section so that there’s a golden thread that runs through the document.

Step 3 – Present the sample demographic data

The first set of data that you’ll present is an overview of the sample demographics – in other words, the demographics of your respondents.

For example:

  • What age range are they?
  • How is gender distributed?
  • How is ethnicity distributed?
  • What areas do the participants live in?

The purpose of this is to assess how representative the sample is of the broader population. This is important for the sake of the generalisability of the results. If your sample is not representative of the population, you will not be able to generalise your findings. This is not necessarily the end of the world, but it is a limitation you’ll need to acknowledge.

Of course, to make this representativeness assessment, you’ll need to have a clear view of the demographics of the population. So, make sure that you design your survey to capture the correct demographic information that you will compare your sample to.

But what if I’m not interested in generalisability?

Well, even if your purpose is not necessarily to extrapolate your findings to the broader population, understanding your sample will allow you to interpret your findings appropriately, considering who responded. In other words, it will help you contextualise your findings. For example, if 80% of your sample was aged over 65, this may be a significant contextual factor to consider when interpreting the data. Therefore, it’s important to understand and present the demographic data.

Step 4 – Review composite measures and the data “shape”

Before you undertake any statistical analysis, you’ll need to do some checks to ensure that your data are suitable for the analysis methods and techniques you plan to use. If you try to analyse data that doesn’t meet the assumptions of a specific statistical technique, your results will be largely meaningless. Therefore, you may need to show that the methods and techniques you’ll use are “allowed”.

Most commonly, there are two areas you need to pay attention to:

#1: Composite measures

The first is when you have multiple scale-based measures that combine to capture one construct – this is called a composite measure. For example, you may have four Likert scale-based measures that (should) all measure the same thing, but in different ways. In other words, in a survey, these four scales should all receive similar ratings. This is called “internal consistency”.

Internal consistency is not guaranteed though (especially if you developed the measures yourself), so you need to assess the reliability of each composite measure using a test. Cronbach’s Alpha is the test most commonly used to assess internal consistency – i.e., to show that the items you’re combining are more or less saying the same thing. A high alpha score (0.7 or above is a common rule of thumb) suggests that your measure is internally consistent. A low alpha score means you may need to consider scrapping one or more of the measures.
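
To make the idea concrete, here is a minimal sketch of how Cronbach’s Alpha can be calculated by hand, using only Python’s standard library. The function name `cronbach_alpha` and the Likert responses are hypothetical and purely illustrative; in practice you would normally take this figure from SPSS, R or a similar package.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a composite measure.

    items: one list of scores per item, with respondents in the same
    order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("A composite measure needs at least two items")
    # Each respondent's total score across all items
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the individual item variances (sample variance, N - 1)
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical data: four Likert items (1-5) answered by six respondents
likert_items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [3, 5, 2, 4, 1, 4],
    [5, 5, 3, 4, 2, 5],
]
alpha = cronbach_alpha(likert_items)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Because the four hypothetical items rise and fall together across respondents, the resulting alpha is high. If one item moved independently of the others, its variance would add to the numerator without adding much to the variance of the totals, and alpha would drop.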

#2: Data shape

The second matter that you should address early on in your results chapter is data shape. In other words, you need to assess whether the data in your set are approximately normally distributed (i.e. symmetrical and bell-shaped) or not, as this will directly impact what type of analyses you can use. Many common inferential tests such as t-tests and ANOVAs (we’ll discuss these a bit later) assume that your data are approximately normally distributed. If they’re not, you’ll need to adjust your strategy and use alternative tests.

To assess the shape of the data, you’ll usually assess a variety of descriptive statistics (such as the mean, median and skewness), which is what we’ll look at next.

Step 5 – Present the descriptive statistics

Now that you’ve laid the foundation by discussing the representativeness of your sample, as well as the reliability of your measures and the shape of your data, you can get started with the actual statistical analysis. The first step is to present the descriptive statistics for your variables.

For scaled data, this usually includes statistics such as:

  • The mean – this is simply the mathematical average of a range of numbers.
  • The median – this is the midpoint in a range of numbers when the numbers are arranged in order.
  • The mode – this is the most commonly repeated number in the data set.
  • Standard deviation – this metric indicates how dispersed a range of numbers is. In other words, how close all the numbers are to the mean (the average).
  • Skewness – this indicates how symmetrical a range of numbers is. In other words, do they tend to cluster into a smooth bell curve shape in the middle of the graph (as in a normal distribution, which is what parametric tests assume), or do they lean to the left or right (a skewed, non-normal distribution, which may call for non-parametric tests)?
  • Kurtosis – this metric indicates whether the data are heavily or lightly-tailed, relative to the normal distribution. In other words, how prone the distribution is to extreme values (this is often loosely described as how peaked or flat the distribution is).
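
As a rough illustration of the statistics listed above, the sketch below computes them for a single variable with Python’s standard library. The `describe` function and the scores are hypothetical. Skewness and kurtosis are calculated with the simple moment-based formulas; packages such as SPSS apply small-sample adjustments, so their figures may differ slightly.

```python
from statistics import mean, median, multimode, stdev


def describe(values):
    """Return the descriptive statistics for one scaled variable."""
    n = len(values)
    m = mean(values)
    # Central moments (average 2nd, 3rd and 4th powers of the deviations)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": m,
        "median": median(values),
        "mode": multimode(values),        # list, as there can be several modes
        "sd": stdev(values),              # sample standard deviation (N - 1)
        "skewness": m3 / m2 ** 1.5,       # 0 for a perfectly symmetrical set
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


# Hypothetical scores: one symmetrical set and one that leans to the right
symmetrical = describe([2, 3, 3, 4, 4, 4, 5, 5, 6])
right_skewed = describe([1, 1, 2, 2, 3, 10])

for name, value in symmetrical.items():
    print(name, value)
```

A skewness close to zero, together with a mean and median that sit close to each other, is a quick first signal that the data are roughly symmetrical. A clearly positive or negative skewness suggests you should look more closely before relying on tests that assume normality.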

A large table that indicates all the above for multiple variables can be a very effective way to present your data economically. You can also use colour coding to help make the data more easily digestible.

For categorical data, where you show the percentage of people who chose or fit into a category, for instance, you can either just plain describe the percentages or numbers of people who responded to something or use graphs and charts (such as bar graphs and pie charts) to present your data in this section of the chapter.

When using figures, make sure that you label them simply and clearly, so that your reader can easily understand them. There’s nothing more frustrating than a graph that’s missing axis labels! Keep in mind that although you’ll be presenting charts and graphs, your text content needs to present a clear narrative that can stand on its own. In other words, don’t rely purely on your figures and tables to convey your key points: highlight the crucial trends and values in the text. Figures and tables should complement the writing, not carry it.

Depending on your research aims, objectives and research questions, you may stop your analysis at this point (i.e. descriptive statistics). However, if your study requires inferential statistics, then it’s time to deep dive into those.

Step 6 – Present the inferential statistics

Inferential statistics are used to make generalisations about a population, whereas descriptive statistics focus purely on the sample. Inferential statistical techniques, broadly speaking, can be broken down into two groups.

First, there are those that compare measurements between groups, such as t-tests (which test for differences between the means of two groups) and ANOVAs (which test for differences between the means of three or more groups). Second, there are techniques that assess the relationships between variables, such as correlation analysis and regression analysis. Within each of these, some tests (parametric tests) assume that the data are normally distributed, while others (non-parametric tests) are designed specifically for data that don’t meet this assumption.
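
To show what a group comparison involves under the hood, here is a minimal sketch of the test statistic for Welch’s t-test (a variant of the t-test that does not assume equal variances in the two groups). The function name and the scores are hypothetical. Converting the statistic into a p-value requires the t-distribution, which Python’s standard library does not provide; a statistics package would handle that step (for example, `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    # Squared standard error of each group mean
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical test scores for two independent groups
group_a = [72, 85, 78, 90, 66, 81]
group_b = [64, 70, 75, 61, 68, 73]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The further the t statistic is from zero, the larger the difference between the group means relative to the variability within the groups.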

There are a seemingly endless number of tests that you can use to crunch your data, so it’s easy to run down a rabbit hole and end up with piles of test data. Ultimately, the most important thing is to make sure that you adopt the tests and techniques that allow you to achieve your research objectives and answer your research questions.

In this section of the results chapter, you should try to make use of figures and visual components as effectively as possible. For example, if you present a correlation table, use colour coding to highlight the significance of the correlation values, or scatterplots to visually demonstrate what the trend is. The easier you make it for your reader to digest your findings, the more effectively you’ll be able to make your arguments in the next chapter.

Step 7 – Test your hypotheses

If your study requires it, the next stage is hypothesis testing. A hypothesis is a statement, often indicating a difference between groups or relationship between variables, that can be supported or rejected by a statistical test. However, not all studies will involve hypotheses (again, it depends on the research objectives), so don’t feel like you “must” present and test hypotheses just because you’re undertaking quantitative research.

The basic process for hypothesis testing is as follows:

  • Specify your null hypothesis (for example, “The chemical psilocybin has no effect on time perception”)
  • Specify your alternative hypothesis (e.g., “The chemical psilocybin has an effect on time perception”)
  • Set your significance level (this is usually 0.05)
  • Calculate your statistics and find your p-value (e.g., p=0.01)
  • Draw your conclusions (e.g., because p=0.01 is below the 0.05 significance level, you reject the null hypothesis: “The chemical psilocybin does have an effect on time perception”)
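
The steps above can be sketched in a few lines of code. The example below uses a permutation test, which is swapped in here only because it needs nothing beyond Python’s standard library; your own study may well call for a t-test or another technique. The function name, the time-estimate data and the group labels are all hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are interchangeable
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical time estimates (in seconds) for a 10-second interval
placebo = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2]
treatment = [11.9, 12.3, 11.5, 12.8, 12.1, 11.7]

alpha_level = 0.05  # significance level, set before looking at the data
p_value = permutation_p_value(placebo, treatment)
decision = "reject" if p_value < alpha_level else "fail to reject"
print(f"p = {p_value:.4f}: {decision} the null hypothesis")
```

Note that the significance level is fixed before the p-value is calculated, and that the conclusion is phrased in terms of the null hypothesis: you either reject it or fail to reject it.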

Finally, if the aim of your study is to develop and test a conceptual framework, this is the time to present it, following the testing of your hypotheses. While you don’t need to develop or discuss these findings further in the results chapter, indicating whether the tests (and their p-values) support or reject the hypotheses is crucial.

Step 8 – Provide a chapter summary

To wrap up your results chapter and transition to the discussion chapter, you should provide a brief summary of the key findings. “Brief” is the keyword here – much like the chapter introduction, this shouldn’t be lengthy – a paragraph or two maximum. Highlight the findings most relevant to your research objectives and research questions, and wrap it up.

Some final thoughts, tips and tricks

Now that you’ve got the essentials down, here are a few tips and tricks to make your quantitative results chapter shine:

  • When writing your results chapter, report your findings in the past tense. You’re talking about what you’ve found in your data, not what you are currently looking for or trying to find.
  • Structure your results chapter systematically and sequentially. If you had two experiments where findings from the one generated inputs into the other, report on them in order.
  • Make your own tables and graphs rather than copying and pasting them from statistical analysis programmes like SPSS. Check out the DataIsBeautiful subreddit for some inspiration.
  • Once you’re done writing, review your work to make sure that you have provided enough information to answer your research questions, but also that you didn’t include superfluous information.

Descriptive Statistics | Definitions, Types, Examples

Published on July 9, 2020 by Pritha Bhandari. Revised on June 21, 2023.

Descriptive statistics summarize and organize characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population.

In quantitative research, after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).

The next step is inferential statistics, which help you decide whether your data confirms or refutes your hypothesis and whether it is generalizable to a larger population.

There are 3 main types of descriptive statistics:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability or dispersion concerns how spread out the values are.

Types of descriptive statistics

You can apply these to assess only one variable at a time, in univariate analysis, or to compare two or more, in bivariate and multivariate analysis.

  • Go to a library
  • Watch a movie at a theater
  • Visit a national park

A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarize the frequency of every possible value of a variable in numbers or percentages. This is called a frequency distribution.

  • Simple frequency distribution table
  • Grouped frequency distribution table

From this table, you can see that more women than men or people with another gender identity took part in the study. In a grouped frequency distribution, you can group numerical response values and add up the number of responses for each group. You can also convert each of these numbers to percentages.
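
As a minimal sketch, both kinds of frequency table can be built with Python’s `collections.Counter`. The function names and the survey responses below are hypothetical and only illustrate the idea; the grouped version assumes whole-number responses.

```python
from collections import Counter


def simple_frequency(responses):
    """Map each response value to (count, percentage of all responses)."""
    counts = Counter(responses)
    total = len(responses)
    return {
        value: (n, round(100 * n / total, 1))
        for value, n in counts.most_common()
    }


def grouped_frequency(values, width):
    """Count whole-number values in bins of equal width, e.g. 0-4, 5-9."""
    counts = Counter((v // width) * width for v in values)
    return {(low, low + width - 1): counts[low] for low in sorted(counts)}


# Hypothetical survey responses
gender = ["woman", "woman", "man", "woman", "other", "man", "woman", "man"]
library_visits = [0, 3, 5, 8, 12, 17, 3, 6]

print(simple_frequency(gender))
print(grouped_frequency(library_visits, width=5))
```

The simple table reports a count and a percentage for every distinct value, while the grouped table reports one count per interval, which is more readable when a numerical variable takes many different values.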

Measures of central tendency estimate the center, or average, of a data set. The mean, median and mode are 3 ways of finding the average.

Here we will demonstrate how to calculate the mean, median, and mode using the first 6 responses of our survey.

The mean, or M, is the most commonly used method for finding the average.

To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N.

The median is the value that’s exactly in the middle of a data set.

To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

The mode is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode.

To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.
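
The three measures can be sketched with Python’s built-in `statistics` module. The six response values below are hypothetical, and the `central_tendency` helper is only an illustration. It adopts one common convention: when every value occurs exactly once, the data set is treated as having no mode.

```python
from statistics import mean, median, multimode


def central_tendency(values):
    """Return (mean, median, list of modes) for a list of responses."""
    modes = multimode(values)  # every value tied for the highest frequency
    if len(modes) == len(values) and len(values) > 1:
        modes = []  # all values are unique, so there is no mode
    return mean(values), median(values), modes


# Hypothetical responses from six participants
responses = [15, 3, 12, 0, 24, 3]
m, med, modes = central_tendency(responses)
print(f"M = {m}, median = {med}, mode(s) = {modes}")
```

With six values there are two numbers in the middle of the ordered list (3 and 12), so the median is their mean. The value 3 is the only one that occurs more than once, so it is the single mode.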

Measures of variability give you a sense of how spread out the response values are. The range, standard deviation and variance each reflect different aspects of spread.

The range gives you an idea of how far apart the most extreme response scores are. To find the range, simply subtract the lowest value from the highest value.

Standard deviation

The standard deviation (s or SD) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:

  • List each score and find their mean.
  • Subtract the mean from each score to get the deviation from the mean.
  • Square each of these deviations.
  • Add up all of the squared deviations.
  • Divide the sum of the squared deviations by N – 1.
  • Find the square root of the number you found.

Step 5: 421.5/5 = 84.3

Step 6: √84.3 = 9.18

The variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean.

To find the variance, simply square the standard deviation. The symbol for variance is s².
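
The six steps for the standard deviation, and the variance as an intermediate result, can be sketched as follows. The six scores are an assumption: they are hypothetical values chosen so that the sum of squared deviations comes to 421.5, matching the figures in Steps 5 and 6 above. The function name is also illustrative.

```python
from math import sqrt


def sample_sd_steps(scores):
    """Walk through the six steps and return (sum of squares, variance, SD)."""
    n = len(scores)
    m = sum(scores) / n                           # Step 1: find the mean
    deviations = [x - m for x in scores]          # Step 2: deviation from the mean
    squared = [d ** 2 for d in deviations]        # Step 3: square each deviation
    sum_of_squares = sum(squared)                 # Step 4: add up the squares
    variance = sum_of_squares / (n - 1)           # Step 5: divide by N - 1
    sd = sqrt(variance)                           # Step 6: take the square root
    return sum_of_squares, variance, sd


# Hypothetical scores (assumed, not taken from the original data table)
scores = [15, 3, 12, 0, 24, 3]
ss, var, sd = sample_sd_steps(scores)
print(f"Sum of squares = {ss}, variance = {var:.1f}, SD = {sd:.2f}")
```

Step 5 already produces the variance, so squaring the standard deviation from Step 6 simply takes you back to that number. Dividing by N – 1 rather than N gives the sample variance, which is the usual choice when the data come from a sample rather than the whole population.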

Univariate descriptive statistics focus on only one variable at a time. It’s important to examine data from each variable separately using multiple measures of distribution, central tendency and spread. Programs like SPSS and Excel can be used to easily calculate these.

If you were to only consider the mean as a measure of central tendency, your impression of the “middle” of the data set can be skewed by outliers, unlike the median or mode.

Likewise, while the range is sensitive to outliers, you should also consider the standard deviation and variance to get easily comparable measures of spread.

If you’ve collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests.

Multivariate analysis is the same as bivariate analysis but with more than two variables.

Contingency table

In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read “across” the table to see how the independent and dependent variables relate to each other.

Interpreting a contingency table is easier when the raw data is converted to percentages. Percentages make each row comparable to the other by making it seem as if each group had only 100 observations or participants. When creating a percentage-based contingency table, you add the N for each independent variable on the end.
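
Here is a minimal sketch of a percentage-based contingency table built with Python’s standard library. The function name, the group labels and the category labels are hypothetical. Each row is converted to percentages of that row’s total, and the row’s N is added on the end.

```python
from collections import Counter


def row_percentages(pairs):
    """Build a contingency table of row percentages from (row, column) pairs."""
    cell_counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    columns = sorted({col for _, col in pairs})
    table = {}
    for row, n_row in row_totals.items():
        # Percentage of this row's observations that fall in each column
        table[row] = {
            col: round(100 * cell_counts[(row, col)] / n_row, 1)
            for col in columns
        }
        table[row]["N"] = n_row  # keep the raw group size alongside
    return table


# Hypothetical observations: (group, library visits per year)
observations = [
    ("child", "0-4"), ("child", "0-4"), ("child", "5-8"),
    ("adult", "5-8"), ("adult", "5-8"), ("adult", "0-4"), ("adult", "5-8"),
]
for group, row in row_percentages(observations).items():
    print(group, row)
```

Keeping N in each row matters because a percentage based on a handful of participants is far less stable than the same percentage based on hundreds.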

From this table, it is clearer that similar proportions of children and adults go to the library over 17 times a year. Additionally, children most commonly went to the library between 5 and 8 times, while for adults, this number was between 13 and 16.

Scatter plots

A scatter plot is a chart that shows you the relationship between two or three variables. It’s a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.

From your scatter plot, you see that as the number of movies seen at movie theaters increases, the number of visits to the library decreases. Based on your visual assessment of a possible linear relationship, you perform further tests of correlation and regression.
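
As a sketch of the kind of follow-up test mentioned above, the Pearson correlation coefficient can be calculated directly from the paired values. The function name and the data are hypothetical, invented only to mirror the pattern described: more movie visits alongside fewer library visits.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of the deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical yearly counts for six participants
movies_seen = [0, 2, 4, 6, 8, 10]
library_visits = [20, 17, 15, 10, 8, 3]
r = pearson_r(movies_seen, library_visits)
print(f"r = {r:.2f}")
```

A coefficient near -1 indicates a strong negative linear relationship, near +1 a strong positive one, and near 0 little or no linear relationship. The scatter plot remains useful alongside the coefficient, because r only captures linear patterns.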

Descriptive statistics: Scatter plot

Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average of the responses.
  • Measures of variability show you the spread or dispersion of your dataset.
  • Univariate statistics summarize only one variable at a time.
  • Bivariate statistics compare two variables.
  • Multivariate statistics compare more than two variables.

Bhandari, P. (2023, June 21). Descriptive Statistics | Definitions, Types, Examples. Scribbr. Retrieved February 12, 2024, from

International Handbook of Research in Statistics Education, pp 5–36

What Is Statistics?

  • Christopher J. Wild,
  • Jessica M. Utts &
  • Nicholas J. Horton
  • First Online: 10 December 2017

Part of the Springer International Handbooks of Education book series (SIHE)

What is statistics? We attempt to answer this question as it relates to grounding research in statistics education. We discuss the nature of statistics as the science of learning from data, its history and traditions, what characterizes statistical thinking and how it differs from mathematics, connections with computing and data science, why learning statistics is essential, and what is most important. Finally, we attempt to gaze into the future, drawing upon what is known about the fast-growing demand for statistical skills and the portents of where the discipline is heading, especially those arising from data science and the promises and problems of big data.

  • Discipline of statistics
  • Statistical thinking
  • Value of statistics
  • Statistical fundamentals
  • Decision-making
  • Trends in statistical practice
  • Data science
  • Computational thinking


Live Science. (2012, February 22). Citrus fruits lower women’s stroke risk . Retrieved from .

MacKay, R. J., & Oldford, R. W. (2000). Scientific method, statistical method and the speed of light. Statistical Science, 15 (3), 254–278.

Madigan, D., & Gelman, A. (2009). Comment. The American Statistician, 63 (2), 114–115.

Manyika, J., Chui, M., Brown B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. Retrieved from .

Marquardt, D. W. (1987). The importance of statisticians. Journal of the American Statistical Association, 82 (397), 1–7.

Moore, D. S. (1998). Statistics among the Liberal Arts. Journal of the American Statistical Association, 93 (444), 1253–1259.

Moore, D. S. (1999). Discussion: What shall we teach beginners? International Statistical Review, 67 (3), 250–252.

Moore, D. S., & Notz, W. I. (2016). Statistics: Concepts and controversies (9th ed.). New York, NY: Macmillan Learning.

NBC News. (2011, January 4). Walk faster and you just might live longer . Retrieved from .

NBC News. (2012, May 16). 6 cups a day? Coffee lovers less likely to die, study finds . Retrieved from .

Nolan, D., & Perrett, J. (2016). Teaching and learning data visualization: Ideas and assignments. The American Statistician 70(3):260–269. Retrieved from .

Nolan, D., & Temple Lang, D. (2010). Computing in the statistics curricula. The American Statistician, 64 (2), 97–107.

Nolan, D., & Temple Lang, D. (2014). XML and web technologies for data sciences with R . New York, NY: Springer.

Nuzzo, R. (2014). Scientific method: Statistical errors. Nature, 506 , 150–152. Retrieved from

Pfannkuch, M., Budget, S., Fewster, R., Fitch, M., Pattenwise, S., Wild, C., et al. (2016). Probability modeling and thinking: What can we learn from practice? Statistics Education Research Journal, 15 (2), 11–37. Retrieved from

Pfannkuch, M., & Wild, C. J. (2004). Towards an understanding of statistical thinking. In D. Ben-Zvi & J. Garfield (Eds.), The challenge of developing statistical literacy, reasoning, and thinking (pp. 17–46). Dordrecht, The Netherlands: Kluwer Academic Publishers.

Porter, T. M. (1986). The rise of statistical thinking 1820–1900 . Princeton, NJ: Princeton University Press.

Pullinger, J. (2014). Statistics making an impact. Journal of the Royal Statistical Society, A, 176 (4), 819–839.

Ridgway, J. (2015). Implications of the data revolution for statistics education. International Statistical Review, 84 (3), 528–549. Retrieved from

Rodriguez, R. N. (2013). The 2012 ASA Presidential Address: Building the big tent for statistics. Journal of the American Statistical Association, 108 (501), 1–6.

Scheaffer, R. L. (2001). Statistics education: Perusing the past, embracing the present, and charting the future. Newsletter for the Section on Statistical Education, 7 (1). Retrieved from .

Schoenfeld, A. H. (1985). Mathematical problem solving . Orlando, FL: Academic Press.

Silver, N. (2014, August 25). Is the polling industry in stasis or in crisis? FiveThirtyEight Politics. Retrieved August 15, 2015, from .

Snee, R. (1990). Statistical thinking and its contribution to quality. The American Statistician, 44 (2), 116–121.

Stigler, S. M. (1986). The history of statistics: The measurement of uncertainty before 1900 . Cambridge, MA: Harvard University Press.

Stigler, S. M. (2016). The seven pillars of statistical wisdom . Cambridge, MA: Harvard University Press.

Utts, J. (2003). What educated citizens should know about statistics and probability. The American Statistician, 57 (2), 74–79.

Utts, J. (2010). Unintentional lies in the media: Don’t blame journalists for what we don’t teach. In C. Reading (Ed.), Proceedings of the Eighth International Conference on Teaching Statistics. Data and Context in Statistics Education . Voorburg, The Netherlands: International Statistical Institute.

Utts, J. (2015a). Seeing through statistics (4th ed.). Stamford, CT: Cengage Learning.

Utts, J. (2015b). The many facets of statistics education: 175 years of common themes. The American Statistician, 69 (2), 100–107.

Utts, J., & Heckard, R. (2015). Mind on statistics (5th ed.). Stamford, CT: Cengage Learning.

Vere-Jones, D. (1995). The coming of age of statistical education. International Statistical Review, 63 (1), 3–23.

Wasserstein, R. (2015). Communicating the power and impact of our profession: A heads up for the next Executive Directors of the ASA. The American Statistician, 69 (2), 96–99.

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p -values: Context, process, and purpose. The American Statistician, 70 (2), 129–133.

Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59 (10). Retrieved from .

Wild, C. J. (1994). On embracing the ‘wider view’ of statistics. The American Statistician, 48 (2), 163–171.

Wild, C. J. (2015). Further, faster, wider. The American Statistician . Retrieved from

Wild, C. J. (2017). Statistical literacy as the earth moves. Statistics Education Research Journal, 16 (1), 31–37.

Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry (with discussion). International Statistical Review, 67 (3), 223–265.

Download references

Author information

Authors and affiliations.

Department of Statistics, The University of Auckland, Auckland, New Zealand

Christopher J. Wild

Department of Statistics, University of California—Irvine, Irvine, CA, USA

Jessica M. Utts

Department of Mathematics and Statistics, Amherst College, Amherst, MA, USA

Nicholas J. Horton

Corresponding author

Correspondence to Christopher J. Wild .

Editor information

Editors and affiliations.

Faculty of Education, The University of Haifa, Haifa, Israel

Dani Ben-Zvi

School of Education, University of Queensland, St Lucia, Queensland, Australia

Katie Makar

Department of Educational Psychology, The University of Minnesota, Minneapolis, Minnesota, USA

Joan Garfield

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter.

Wild, C.J., Utts, J.M., Horton, N.J. (2018). What Is Statistics?. In: Ben-Zvi, D., Makar, K., Garfield, J. (eds) International Handbook of Research in Statistics Education. Springer International Handbooks of Education. Springer, Cham.

Published : 10 December 2017

Publisher Name : Springer, Cham

Print ISBN : 978-3-319-66193-3

Online ISBN : 978-3-319-66195-7

eBook Packages : Education Education (R0)

Department of Statistics


What do senior theses in Statistics look like?

This is a brief overview of thesis writing; for more information, please see our complete guide here. Senior theses in Statistics cover a wide range of topics, across the spectrum from applied to theoretical. Typically, senior theses are expected to have one of the following three flavors:

1. Novel statistical theory or methodology, supported by extensive mathematical and/or simulation results, along with a clear account of how the research extends or relates to previous related work.

2. An analysis of a complex data set that advances understanding in a related field, such as public health, economics, government, or genetics. Such a thesis may rely entirely on existing methods, but should give useful results and insights into an interesting applied problem.

3. An analysis of a complex data set in which new methods or modifications of published methods are required. While the thesis does not necessarily contain an extensive mathematical study of the new methods, it should contain strong plausibility arguments or simulations supporting the use of the new methods.

A good thesis is clear, readable, and well-motivated, justifying the applicability of the methods used rather than, for example, mechanically running regressions without discussing the assumptions (and whether they are plausible), performing diagnostics, and checking whether the conclusions make sense.

Ann Card Anaesth, v.22(1); Jan-Mar 2019

Descriptive Statistics and Normality Tests for Statistical Data

Prabhaker Mishra

Department of Biostatistics and Health Informatics, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India

Chandra M Pandey

Uttam Singh, Anshul Gupta

1 Department of Haematology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India

Chinmoy Sahu

2 Department of Microbiology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India

Amit Keshri

3 Department of Neuro-Otology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India

Descriptive statistics are an important part of biomedical research and are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Measures of central tendency and dispersion are used to describe quantitative data. For continuous data, testing normality is an important step in deciding which measures of central tendency and which statistical methods to use for data analysis. When the data follow a normal distribution, parametric tests are used to compare the groups; otherwise, nonparametric methods are used. There are different methods used to test the normality of data, including numerical and visual methods, and each method has its own advantages and disadvantages. In the present study, we have discussed the summary measures and the methods used to test the normality of the data.


A data set is a collection of data on individual cases or subjects. Usually, it is not meaningful to present such data individually because doing so will not produce any important conclusions. Instead of presenting individual cases, we present summary statistics of the data set, with or without an analytical form, which the audience can absorb easily. Statistics, the science of collection, analysis, presentation, and interpretation of data, has two main branches: descriptive statistics and inferential statistics.[ 1 ]

Summary measures, also called summary statistics or descriptive statistics, are used to summarize a set of observations in order to communicate the largest amount of information as simply as possible. Descriptive statistics describe the basic features of the data in a study in just a few values, such as the mean and standard deviation (SD).[ 2 , 3 ] The other branch is inferential statistics, which draws conclusions from data that are subject to random variation (e.g., observational errors and sampling variation). In inferential statistics, most predictions are about the future, and generalizations about a population are made by studying a smaller sample.[ 2 , 4 ] Statistical methods are used to draw inferences from the study participants, for example by comparing different groups. These statistical methods rest on some assumptions, including normality of continuous data. There are different methods used to test the normality of data, including numerical and visual methods, and each method has its own advantages and disadvantages.[ 5 ] Descriptive and inferential statistics are both employed in the scientific analysis of data, and both are equally important. In the present study, we have discussed the summary measures used to describe the data and the methods used to test the normality of the data. To illustrate descriptive statistics and tests of normality, an example data set of 15 patients whose mean arterial pressure (MAP) was measured is used [ Table 1 ]. The examples of measures of central tendency, dispersion, and tests of normality that follow are based on these data.

Table 1: Distribution of mean arterial pressure (mmHg) as per sex

MAP: Mean arterial pressure, M: Male, F: Female

Descriptive Statistics

There are three major types of descriptive statistics: measures of frequency (frequency, percent), measures of central tendency (mean, median, and mode), and measures of dispersion or variation (variance, SD, standard error, quartile, interquartile range, percentile, range, and coefficient of variation [CV]). Together they provide simple summaries about the sample and the measures. Measures of frequency are usually used for categorical data, while the others are used for quantitative data.

Measures of Frequency

Frequency statistics simply count the number of times that each value of a variable occurs, such as the number of males and females within the sample or population. Frequency analysis is an important area of statistics that deals with the number of occurrences (frequency) and percentage. For example, according to Table 1 , out of the 15 patients, the frequencies of males and females were 8 (53.3%) and 7 (46.7%), respectively.
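As a minimal sketch of a frequency table in Python (standard library only), the counts and percentages reported above can be reproduced as follows. The `sex` list is a hypothetical encoding built only from the reported totals of 8 males and 7 females; it does not reflect the row order of Table 1.

```python
from collections import Counter

# Hypothetical encoding: only the totals (8 M, 7 F) come from the text.
sex = ["M"] * 8 + ["F"] * 7

counts = Counter(sex)
n = len(sex)

# Frequency and percentage for each category, rounded to one decimal place.
frequency_table = {
    category: (count, round(100 * count / n, 1))
    for category, count in counts.items()
}

for category, (count, percent) in frequency_table.items():
    print(f"{category}: {count} ({percent}%)")
```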

Measures of Central Tendency

Observations are commonly summarized by a measure of central tendency, also called a measure of central location, which is used to find the representative value of a data set. The mean, median, and mode are the three measures of central tendency. A measure of central tendency gives us one value (such as the mean or median) that represents the entire distribution. To make comparisons between two or more groups, the representative values of their distributions are compared. Measures of central tendency also help in further statistical analysis because many techniques, such as measures of dispersion, skewness, correlation, the t -test, and the ANOVA test, are calculated using them. That is why measures of central tendency are also called measures of the first order. A representative value is considered good when it is calculated using all observations and is not affected by extreme values, because it is used to calculate further measures.

Computation of Measures of Central Tendency

The mean is the arithmetic average of a set of data. It is calculated as the sum of the observations divided by the number of observations. It is the most popular measure and is very easy to calculate. It is unique for one group, that is, there is only one answer, which is useful when comparing groups. All the observations are used in the computation of the mean.[ 2 , 5 ] One disadvantage of the mean is that it is affected by extreme values (outliers). For example, according to Table 2 , the mean MAP of the patients was 97.47 mmHg.

Table 2: Descriptive statistics of the mean arterial pressure (mmHg)

SD: Standard deviation, SE: Standard error, Q1: First quartile, Q2: Second quartile, Q3: Third quartile

The median is defined as the middlemost observation when the data are arranged in either increasing or decreasing order of magnitude. Thus, it is the observation that occupies the central place in the distribution. It is also called the positional average. Extreme values (outliers) do not affect the median. It is unique, that is, there is only one median for a data set, which is useful when comparing groups. One disadvantage of the median compared with the mean is that it is not as popular.[ 6 ] For example, according to Table 2 , the median MAP of the patients was 95 mmHg, which indicates that 50% of the observations are less than or equal to 95 mmHg and the remaining 50% are greater than or equal to 95 mmHg.

The mode is the value that occurs most frequently in a set of observations, that is, the observation with the maximum frequency. A data set may have multiple modes or no mode at all. Because a data set can have multiple modes, the mode is not used to compare groups. For example, according to Table 2 , the most frequently repeated value is 116 mmHg (2 times), while every other value occurs only once; hence, the mode of the data is 116 mmHg.
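The three measures can be computed with Python's standard `statistics` module, as in the sketch below (assuming Python 3.8 or later for `multimode`). The `map_values` list is an assumption: the table of raw data is not reproduced here, so these are illustrative values chosen to agree with the mean, median, and mode quoted in the text.

```python
import statistics

# Assumed illustrative MAP values (mmHg), chosen to match the summary
# statistics quoted in the text; they are not copied from Table 1.
map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]

mean_map = statistics.mean(map_values)        # sum of observations / number of observations
median_map = statistics.median(map_values)    # middle value of the sorted data
modes_map = statistics.multimode(map_values)  # every value tied for the highest frequency

print(round(mean_map, 2), median_map, modes_map)
```

`multimode` returns a list because, as noted above, a data set may have more than one mode.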

Measures of Dispersion

Measures of dispersion, also called measures of variation, show how spread out the values in a data set are. They quantify the degree of variation or dispersion of values in a population or in a sample. More specifically, they show how poorly a measure of central tendency, usually the mean or median, represents the data. These indices give us an idea about the homogeneity or heterogeneity of the data.[ 2 , 6 ]

Common measures

Variance, SD, standard error, quartile, interquartile range, percentile, range, and CV.

Computation of Measures of Dispersion

Standard deviation and variance

The SD is a measure of how spread out the values are from their mean. Its symbol is σ (the Greek letter sigma) or s. It is called the SD because a standard value (the mean) is taken as the reference for measuring the dispersion:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

where $x_i$ is an individual value, $\bar{x}$ is the mean value, and n is the sample size. If the sample size is <30, we use “ n −1” in the denominator; for a sample size ≥30, we use “ n ” in the denominator (the two versions differ little when n is large). The variance (s²) is defined as the average of the squared differences from the mean. It is equal to the square of the SD (s).

For example, in the above data, the SD is 11.01 mmHg (computed with n −1 in the denominator because n <30), which shows that the approximate average deviation between the individual values and the mean is 11.01 mmHg. Similarly, the variance is 121.22 [i.e., (11.01)²], which shows that the average squared deviation between the individual values and the mean is 121.22 [ Table 2 ].
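A short sketch of the SD and variance calculation with Python's `statistics` module follows. The `map_values` list is again an assumed set of illustrative values consistent with the reported summary statistics, not the published table. `stdev` and `variance` divide by n − 1, while `pstdev` divides by n.

```python
import statistics

# Assumed illustrative MAP values (mmHg); not copied from Table 1.
map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]

sample_variance = statistics.variance(map_values)  # squared deviations / (n - 1)
sample_sd = statistics.stdev(map_values)           # square root of the sample variance
population_sd = statistics.pstdev(map_values)      # same idea, but dividing by n

print(round(sample_sd, 2), round(sample_variance, 2), round(population_sd, 2))
```

With these assumed values, the variance computed from the unrounded SD comes out slightly below the figure obtained by squaring the rounded SD of 11.01.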

Standard error

The standard error is the approximate difference between the sample mean and the population mean. When many samples of the same size are drawn from the same population by random sampling, the SD among the sample means is called the standard error. If the sample SD and the sample size are given, we can calculate the standard error for the sample using the following formula.

Standard error = sample SD/√sample size.

For example, according to Table 2 , the standard error is 2.84 mmHg, which shows that the average difference between the sample mean and the population mean is 2.84 mmHg.
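The formula above translates directly into code. This is a minimal sketch using the same assumed illustrative `map_values` (chosen to agree with the reported statistics, not taken from the published table).

```python
import math
import statistics

# Assumed illustrative MAP values (mmHg); not copied from Table 1.
map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]

sample_sd = statistics.stdev(map_values)

# Standard error = sample SD / sqrt(sample size)
standard_error = sample_sd / math.sqrt(len(map_values))

print(round(standard_error, 2))
```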

Quartiles and interquartile range

The quartiles are the three points that divide a data set, arranged in either ascending or descending order, into four equal groups, each comprising a quarter of the data. Q1, Q2, and Q3 represent the first, second, and third quartiles, respectively.[ 7 ]

ith quartile = [i × (n + 1)/4]th observation, where i = 1, 2, 3.

For example, in the above data, the first quartile (Q1) = (n + 1)/4 = (15 + 1)/4 = 4th observation from the start = 88 mmHg, that is, 25% of the observations are ≤88 and the remaining 75% are ≥88. Q2 (also called the median) = [2 × (n + 1)/4] = 8th observation from the start = 95 mmHg, that is, 50% of the observations are ≤95 and the remaining 50% are ≥95. Similarly, Q3 = [3 × (n + 1)/4] = 12th observation from the start = 107 mmHg, that is, 75% of the observations are ≤107 and the remaining 25% are ≥107. The interquartile range (IQR), also called the midspread or middle 50%, is a measure of statistical dispersion equal to the difference between the 75th percentile (Q3, the third quartile) and the 25th percentile (Q1, the first quartile). In the above example, Q1 and Q3 are 88 and 107, respectively. Hence, the IQR of the data is 19 mmHg (which can also be written as 88–107) [ Table 2 ].
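One way to reproduce these quartiles in Python is `statistics.quantiles` (Python 3.8 or later) with `method="exclusive"`, which places the i-th quartile at position i × (n + 1)/4 in the sorted data, the same rule as above. The `map_values` list is an assumed set of illustrative values consistent with the reported quartiles, not the published table.

```python
import statistics

# Assumed illustrative MAP values (mmHg), sorted; not copied from Table 1.
map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(map_values, n=4, method="exclusive")

iqr = q3 - q1  # interquartile range: third quartile minus first quartile

print(q1, q2, q3, iqr)
```

Other software may use a different quartile rule by default, so results can differ slightly between packages when the position is fractional.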


The percentiles are the 99 points that divide a data set, arranged in either ascending or descending order, into 100 equal groups, each comprising 1% of the data. The 25th percentile is the first quartile, the 50th percentile is the second quartile (also called the median), and the 75th percentile is the third quartile of the data.

ith percentile = [i × (n + 1)/100]th observation, where i = 1, 2, 3, …, 99.

Example: In the above data, the 10th percentile = [10 × (n + 1)/100] = 1.6th observation from the start, which falls between the first and second observations. It is calculated as 1st observation + 0.6 × (difference between the second and first observations) = 83.20 mmHg, which indicates that 10% of the observations are ≤83.20 and the remaining 90% are ≥83.20.
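The interpolation step in this example can be written out explicitly. The function below is a sketch of the [i × (n + 1)/100]th-observation rule; the name `percentile`, the clamping behaviour at the ends of the data, and the `map_values` list (illustrative values consistent with the reported statistics) are all assumptions made for illustration.

```python
import math

def percentile(sorted_values, i):
    """Return the i-th percentile using the [i * (n + 1) / 100]-th observation rule."""
    n = len(sorted_values)
    position = i * (n + 1) / 100   # 1-based position, possibly fractional
    lower = math.floor(position)   # index of the observation just below the position
    fraction = position - lower    # how far to move toward the next observation
    # Assumption: positions outside the data are clamped to the end values.
    if lower < 1:
        return float(sorted_values[0])
    if lower >= n:
        return float(sorted_values[-1])
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

# Assumed illustrative MAP values (mmHg), sorted; not copied from Table 1.
map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]

p10 = percentile(map_values, 10)  # 1.6th observation: 82 + 0.6 * (84 - 82)
print(round(p10, 2))
```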


Coefficient of Variation

Interpreting the SD without considering the magnitude of the mean of the sample or population may be misleading. The CV overcomes this problem by expressing the SD as a percentage of the mean: CV = 100 × (SD/mean). For example, in the above data, the CV is 11.3%, which indicates that the SD is 11.3% of the mean value [i.e., 100 × (11.01/97.47)] [ Table 2 ].
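As a brief sketch, the CV is a one-line calculation once the mean and SD are available. The `map_values` list is an assumed set of illustrative values consistent with the reported statistics, not the published table.

```python
import statistics

# Assumed illustrative MAP values (mmHg); not copied from Table 1.
map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]

mean_map = statistics.mean(map_values)
sample_sd = statistics.stdev(map_values)

# CV = 100 * (SD / mean), expressed as a percentage of the mean.
cv_percent = 100 * sample_sd / mean_map

print(round(cv_percent, 1))
```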

Range

The difference between the largest and smallest observations is called the range. If A and B are the smallest and largest observations in a data set, then the range (R) is R = B − A.

For example, in the above data, the minimum and maximum observations are 82 mmHg and 116 mmHg. Hence, the range of the data is 34 mmHg (which can also be written as 82–116) [ Table 2 ].

Descriptive statistics can be calculated in the statistical software “SPSS” (analyze → descriptive statistics → frequencies or descriptives).

Normality of data and testing

The normal distribution is the most important continuous probability distribution. It has a bell-shaped density curve described by its mean and SD, and for normally distributed data, extreme values have no significant impact on the mean. If continuous data follow a normal distribution, then about 68.3%, 95.4%, and 99.7% of the observations lie within mean ± 1 SD, mean ± 2 SD, and mean ± 3 SD, respectively.[ 2 , 4 ]
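These coverage percentages can be checked numerically. The sketch below uses `statistics.NormalDist` (Python 3.8 or later) with mean 0 and SD 1; because the coverage depends only on the number of SDs from the mean, the same percentages hold for any normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability of falling within k SDs of the mean: P(-k <= Z <= k).
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, probability in coverage.items():
    print(f"mean ± {k} SD: {100 * probability:.1f}%")
```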

Why test the normality of data

Various statistical methods used for data analysis, including correlation, regression, t -tests, and analysis of variance, make assumptions about normality. According to the central limit theorem, when the sample has 100 or more observations, violation of normality is not a major issue.[ 5 , 8 ] Nevertheless, for meaningful conclusions, the normality assumption should be respected irrespective of the sample size. If continuous data follow a normal distribution, we summarize them using the mean. This mean is then used to compare groups and to calculate the significance level ( P value). If the data are not normally distributed, the resulting mean is not a representative value of the data. A wrong choice of representative value, and a significance level calculated from it, might lead to a wrong interpretation.[ 9 ] That is why we first test the normality of the data and then decide whether the mean is applicable as the representative value. If it is, means are compared using parametric tests; otherwise, medians are used to compare the groups using nonparametric methods.

Methods used for test of normality of data

An assessment of the normality of data is a prerequisite for many statistical tests because normality is an underlying assumption of parametric testing. There are two main methods of assessing normality: graphical and numerical (including statistical tests).[ 3 , 4 ] Statistical tests have the advantage of making an objective judgment of normality but have the disadvantage of sometimes not being sensitive enough at small sample sizes or being overly sensitive at large sample sizes. Graphical interpretation has the advantage of allowing good judgment to assess normality in situations where numerical tests might be over- or undersensitive. However, assessing normality with graphical methods requires a great deal of experience to avoid wrong interpretations. Without such experience, it is best to rely on numerical methods.[ 10 ] Various methods are available to test the normality of continuous data; the most popular are the Shapiro–Wilk test, Kolmogorov–Smirnov test, skewness, kurtosis, histogram, box plot, P–P plot, Q–Q plot, and mean with SD. The Kolmogorov–Smirnov test and the Shapiro–Wilk test are the two most widely used tests of normality. Normality tests can be conducted in the statistical software “SPSS” (analyze → descriptive statistics → explore → plots → normality plots with tests).

The Shapiro–Wilk test is more appropriate method for small sample sizes (<50 samples) although it can also be handling on larger sample size while Kolmogorov–Smirnov test is used for n ≥50. For both of the above tests, null hypothesis states that data are taken from normal distributed population. When P > 0.05, null hypothesis accepted and data are called as normally distributed. Skewness is a measure of symmetry, or more precisely, the lack of symmetry of the normal distribution. Kurtosis is a measure of the peakedness of a distribution. The original kurtosis value is sometimes called kurtosis (proper). Most of the statistical packages such as SPSS provide “excess” kurtosis (also called kurtosis [excess]) obtained by subtracting 3 from the kurtosis (proper). A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. If mean, median, and mode of a distribution coincide, then it is called a symmetric distribution, that is, skewness = 0, kurtosis (excess) = 0. A distribution is called approximate normal if skewness or kurtosis (excess) of the data are between − 1 and + 1. Although this is a less reliable method in the small-to-moderate sample size (i.e., n <300) because it can not adjust the standard error (as the sample size increases, the standard error decreases). To overcome this problem, a z -test is applied for normality test using skewness and kurtosis. A Z score could be obtained by dividing the skewness values or excess kurtosis value by their standard errors. For small sample size ( n <50), z value ± 1.96 are sufficient to establish normality of the data.[ 8 ] However, medium-sized samples (50≤ n <300), at absolute z -value ± 3.29, conclude the distribution of the sample is normal.[ 11 ] For sample size >300, normality of the data is depend on the histograms and the absolute values of skewness and kurtosis. 
Either an absolute skewness value ≤2 or an absolute kurtosis (excess) ≤4 may be used as reference values for determining considerable normality.[ 11 ] A histogram is an estimate of the probability distribution of a continuous variable. If the graph is approximately bell-shaped and symmetric about the mean, we can assume normally distributed data[ 12 , 13 ] [ Figure 1 ]. In statistics, a Q–Q plot is a scatterplot created by plotting two sets of quantiles (observed and expected) against one another. For normally distributed data, observed data are approximate to the expected data, that is, they are statistically equal [ Figure 2 ]. A P–P plot (probability–probability plot or percent–percent plot) is a graphical technique for assessing how closely two data sets (observed and expected) agree. It forms an approximate straight line when data are normally distributed. Departures from this straight line indicate departures from normality [ Figure 3 ]. Box plot is another way to assess the normality of the data. It shows the median as a horizontal line inside the box and the IQR (range between the first and third quartile) as the length of the box. The whiskers (line extending from the top and bottom of the box) represent the minimum and maximum values when they are within 1.5 times the IQR from either end of the box (i.e., Q1 − 1.5* IQR and Q3 + 1.5* IQR). Scores >1.5 times and 3 times the IQR are out of the box plot and are considered as outliers and extreme outliers, respectively. A box plot that is symmetric with the median line at approximately the center of the box and with symmetric whiskers indicate that the data may have come from a normal distribution. In case many outliers are present in our data set, either outliers are need to remove or data should treat as nonnormally distributed[ 8 , 13 , 14 ] [ Figure 4 ]. Another method of normality of the data is relative value of the SD with respect to mean. 
If the SD is less than half the mean (i.e., CV <50%), the data are considered normal.[15] This is a quick method to test normality. However, this method should only be used when the sample size is at least 50.
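The box-plot rule described above (whiskers at 1.5 times the IQR, extreme outliers beyond 3 times the IQR) can be sketched in a few lines. This is only an illustration: the function name `boxplot_outliers` and the sample readings are hypothetical, and the quartiles use Python's "inclusive" convention, which may differ slightly from the convention a package such as SPSS applies.

```python
import statistics

def boxplot_outliers(values):
    """Classify values with the 1.5*IQR and 3*IQR box-plot rules."""
    # Quartiles via the 'inclusive' method; other quartile conventions
    # can give slightly different cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # whisker limits
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # extreme-outlier limits
    # Outliers lie beyond the whisker limits but within the outer limits.
    outliers = [v for v in values
                if (v < inner[0] or v > inner[1]) and outer[0] <= v <= outer[1]]
    # Extreme outliers lie beyond the outer limits.
    extremes = [v for v in values if v < outer[0] or v > outer[1]]
    return {"q1": q1, "q3": q3, "iqr": iqr,
            "outliers": outliers, "extremes": extremes}

# Hypothetical readings (mmHg), not the Table 1 data.
sample = [82, 84, 85, 88, 90, 91, 93, 95, 96, 98, 100, 120, 150]
result = boxplot_outliers(sample)
print(result)
```

With these made-up readings, one value falls between the inner and outer limits and one falls beyond the outer limits, so the two categories are reported separately.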

Figure 1: Histogram showing the distribution of the mean arterial pressure

Figure 2: Normal Q–Q plot showing correlation between observed and expected values of the mean arterial pressure

Figure 3: Normal P–P plot showing correlation between observed and expected cumulative probability of the mean arterial pressure

Figure 4: Boxplot showing distribution of the mean arterial pressure

For example, Table 1 gives the MAP data of 15 patients. The normality of these data was assessed. The results showed that the data were normally distributed, as skewness (0.398) and kurtosis (−0.825) were each within ±1. The critical ratios (Z values) of the skewness (0.686) and kurtosis (−0.737) were within ±1.96, which is also evidence of a normal distribution. Similarly, the Shapiro–Wilk test (P = 0.454) and the Kolmogorov–Smirnov test (P = 0.200) were statistically nonsignificant, that is, the data were considered normally distributed. As the sample size is <50, we should rely on the Shapiro–Wilk test result and avoid the Kolmogorov–Smirnov test result, although both methods indicated that the data were normally distributed. As the SD of the MAP was less than half the mean value (11.01 < 48.73), the data were considered normally distributed, although with a sample size <50 we should avoid this method, because it should be used only when the sample size is at least 50 [Tables 2 and 3].
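The Z values in this example can be approximately reproduced from the reported skewness and kurtosis. The sketch below assumes the standard-error formulas commonly used by statistical packages; the article does not state which formulas were used, so treat this as an assumption. The helper names (`se_skewness`, `se_kurtosis`, `z_cutoff`) are illustrative.

```python
import math

def se_skewness(n):
    """Standard error of sample skewness for sample size n (assumed formula)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of excess kurtosis for sample size n (assumed formula)."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

def z_cutoff(n):
    """Absolute z cutoff from the text: 1.96 for n < 50, 3.29 for 50 <= n < 300."""
    if n < 50:
        return 1.96
    if n < 300:
        return 3.29
    return None  # larger samples: the text relies on histograms and absolute values

n = 15
skewness, excess_kurtosis = 0.398, -0.825  # values reported in the example
z_skew = skewness / se_skewness(n)
z_kurt = excess_kurtosis / se_kurtosis(n)
cutoff = z_cutoff(n)
looks_normal = abs(z_skew) < cutoff and abs(z_kurt) < cutoff
print(f"z_skew={z_skew:.3f}, z_kurt={z_kurt:.3f}, within ±{cutoff}: {looks_normal}")
```

Small differences in the last decimal place relative to the reported Z values are expected, because the inputs here are already rounded to three decimals.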

Skewness, kurtosis, and normality tests for mean arterial pressure (mmHg)

K-S: Kolmogorov–Smirnov, SD: Standard deviation, SE: Standard error


Descriptive statistics are a statistical method for summarizing data in a valid and meaningful way. A good and appropriate measure is important not only for the data but also for the statistical methods used for hypothesis testing. For continuous data, testing normality is very important because the choice of measures of central tendency and dispersion, and of a parametric or nonparametric test, depends on the normality status. Although there are various methods for normality testing, for small sample sizes (n < 50) the Shapiro–Wilk test should be used, as it has more power to detect nonnormality and is the most popular and widely used method. When the sample size (n) is at least 50, any of the other methods (Kolmogorov–Smirnov test, skewness, kurtosis, z values of the skewness and kurtosis, histogram, box plot, P–P plot, Q–Q plot, and SD with respect to the mean) can be used to test the normality of continuous data.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.


The authors would like to express their deep and sincere gratitude to Dr. Prabhat Tiwari, Professor, Department of Anaesthesiology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, for his critical comments and useful suggestions, which greatly helped to improve the quality of this manuscript.

Descriptive Research

Descriptive statistics and their important role in research.

Data and statistics are both used in research, but the two terms have distinct meanings. Data are the pieces of information recorded and gathered for the purpose of analysis. Descriptive statistics, on the other hand, are the results of analyzing the gathered data.

To gather data for a statistical study, you first have to determine the population, which can be a group of people or things, depending on the subject of your descriptive research. A collection of data on one or more variables is called a data set. Variables are of two main types: categorical and numerical. Categorical variables represent qualitative data, while numerical variables represent quantitative data. A categorical variable can be either nominal or ordinal. Numerical variables, on the other hand, are either discrete or continuous.

Role of Statistics in Research

The role of statistics in research is to serve as a tool for analyzing and summarizing a large volume of raw data and for drawing conclusions from the tests performed. The study of statistics is classified into two main branches: descriptive statistics and inferential statistics. Inferential statistics are used for hypothesis testing and for estimating the parameters of a population, while descriptive statistics summarize and organize data sets so that the intended audience can understand them more easily. Descriptive statistics often present information through patterns and graphs.

Descriptive statistics are the first step in data analysis, as it is difficult to analyze raw data in large volumes. Before you can go further in your research, you first have to gather and simplify your data sets.

There are two methods in descriptive statistics: the numerical method and the graphical method. 

Descriptive Statistics and Numerical Methods

Descriptive statistics involve frequencies and percentages for categorical data, and averages and standard deviations for continuous data. The statistical measures used in descriptive statistics are the measures of central tendency, measures of spread, and measures of skewness.

A measure of central tendency represents the central point of a data set; the common measures are the mean, median, and mode. The mean, also called the average, is simply the sum of the values in a data set divided by the number of values. The median, on the other hand, is the value in the middle of an ordered data set, while the mode is the value that appears most often in a data set.

If the total number of terms in a data set is even, take the two middle values and average them to get the median. A data set can also have more than one mode, in which case it is called multimodal.
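As a brief illustration of the three measures, the sketch below uses Python's standard `statistics` module on two small made-up data sets, one with an odd and one with an even number of terms.

```python
import statistics

# Made-up values for illustration only.
odd_set = [3, 5, 5, 8, 9]
even_set = [3, 5, 5, 8, 9, 9]

mean_value = statistics.mean(odd_set)      # sum of the values divided by their count
median_odd = statistics.median(odd_set)    # the single middle value
median_even = statistics.median(even_set)  # average of the two middle values
modes = statistics.multimode(even_set)     # every value tied for most frequent

print(mean_value, median_odd, median_even, modes)
```

The even-sized set has two values that occur equally often, so `multimode` returns both, which is the multimodal case described above.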

A measure of spread, also called a measure of dispersion, describes the variability of the values in a data set. It involves the computation of the range, variance, standard deviation, and quartiles.

The range is the distance between the lowest and the highest value. The standard deviation measures how far the values typically lie from the mean, while the variance is the square of the standard deviation.

Quartiles are three values that divide a data set into four parts. The first quartile (Q1) is the middle number between the lowest value and the median of the data set. The second quartile (Q2) is the median of the data, while the third quartile (Q3) is the middle value between the median and the highest number in the data set.
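The measures of spread can be computed the same way. The values below are made up, the variance shown is the sample variance (dividing by n − 1), and the quartiles use the "inclusive" convention of Python's `statistics.quantiles`; other quartile conventions can give slightly different results.

```python
import statistics

# Made-up values for illustration only.
data = [4, 7, 8, 10, 12, 15, 21]

value_range = max(data) - min(data)   # highest value minus lowest value
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(value_range, variance, round(std_dev, 3), (q1, q2, q3))
```

Here Q2 equals the median of the data, and squaring the standard deviation recovers the variance.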

Skewness is the degree of asymmetry of a variable's distribution about its mean. The value of skewness can be positive, negative, or undefined. When the result is zero, no skew is present.
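A minimal sketch of these cases, assuming the simple moment-based definition of skewness (mean cubed deviation divided by the cubed population standard deviation). Statistical packages often apply a small-sample correction, so their values may differ. The function name `skewness` and the data are illustrative.

```python
import statistics

def skewness(values):
    """Moment-based skewness; returns None when it is undefined."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return None  # undefined when all values are identical
    n = len(values)
    return sum((v - mean) ** 3 for v in values) / (n * sd ** 3)

symmetric = skewness([1, 2, 3, 4, 5])       # evenly spread around the mean
right_tailed = skewness([1, 2, 3, 4, 20])   # one large value pulls the tail right
left_tailed = skewness([-20, 1, 2, 3, 4])   # one small value pulls the tail left
constant = skewness([7, 7, 7])              # no variation at all

print(symmetric, right_tailed, left_tailed, constant)
```

The four calls correspond to the zero, positive, negative, and undefined cases, respectively.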

Descriptive Statistics and Graphical Methods

In descriptive statistics, graphs are used to visualize data analysis and plot data sets. The most common graphs used are pie graphs, bar graphs, line charts, and histograms. Other types of graphs that are less common are the dot plots, and dots and whiskers charts. 

The pie chart is a circular chart that describes numerical proportions. It is divided into slices, each proportional to the quantity it represents. In the example below, assume that you surveyed teenagers to learn their top hobbies. Out of 100 teens, 40 often browse social media, which makes it the most common of the hobbies mentioned. Indoor games, at 5% in the pie chart, are the least favorite hobby of teenagers.

Line Charts

A line chart or line graph is a type of chart that plots data points connected by a line rather than showing proportions of a whole. It is best suited to, and most often used for, sequential data and trends over time. For example, suppose you have recently set up a business selling sweet treats, and each week you have used a different marketing approach. To determine which marketing strategy worked best, you can plot and analyze the sales you made each week.

Bar graphs illustrate categorical data with vertical or horizontal rectangular bars whose lengths represent the values. With a bar graph, the viewer can compare data easily at a glance. Pie charts represent proportions of a whole, while bar graphs give each category or value its own bar. For example, suppose you are selling different ice cream flavors. To know which are the best sellers, you can simply draw a graph showing each flavor and the number of ice creams sold. In such a graph, we can easily see that the dark chocolate flavor has sold the most and vanilla the least.
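A bar graph of this kind can be sketched even without a plotting library. The sales figures below are hypothetical, chosen only so that dark chocolate sells the most and vanilla the least, as in the example; `text_bar_chart` is an illustrative helper name.

```python
# Hypothetical sales per flavor (illustrative numbers only).
sales = {"dark chocolate": 48, "strawberry": 35, "mint": 27, "vanilla": 18}

def text_bar_chart(counts, width=30):
    """Return one horizontal text bar per category, scaled to the largest count."""
    largest = max(counts.values())
    lines = []
    # Sort from the largest to the smallest count so the ranking is visible.
    for label, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(count / largest * width)
        lines.append(f"{label:<15}{bar} {count}")
    return lines

chart = text_bar_chart(sales)
best_seller = max(sales, key=sales.get)
worst_seller = min(sales, key=sales.get)
print("\n".join(chart))
```

Sorting the bars by count is a design choice that makes the best and worst sellers stand out at a glance.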

Analyzing data can be both easy and complicated. With descriptive statistics, we can describe and simplify a large volume of raw data and convert it into something our viewers can easily understand. From illustrating the number of sales in a business to the analysis needed for machine learning, descriptive statistics help us take our research further and identify the other steps necessary for data analysis.


Current information

Master's thesis topics for the summer semester 2024

At the Institute of Management Control and Consulting, topics for master's theses will again be offered for the coming summer semester of 2024.


The available topics, including the application and supervision modalities, can be found in the Moodle course.

You can also contribute your own topics.

Johannes Kepler University Linz

Altenberger Straße 69

4040 Linz, Austria


Exercise, digital health and chronic disease: feasibility, effectiveness and utilisation (PhD Academy Award)

  • Riley C C Brown 1 , 2
  • 1 School of Human Movement and Nutrition Sciences , The University of Queensland , Brisbane , Queensland , Australia
  • 2 Centre for Research on Exercise, Physical Activity and Health (CRExPAH) , The University of Queensland , Brisbane , Queensland , Australia
  • Correspondence to Dr Riley C C Brown, School of Human Movement and Nutrition Sciences, The University of Queensland, Brisbane, QLD 4072, Australia; riley.brown{at}


  • Noncommunicable Diseases

What did I do?

My thesis investigated the feasibility, effectiveness and utilisation of digital health physical activity/exercise interventions in chronic disease cohorts. Contemporary literature was reviewed for applicable digital health interventions, 1 a feasibility randomised controlled trial was undertaken to evaluate the feasibility of a patient-centred digital health exercise and diet intervention for people living with kidney and liver disease in tertiary care, 2 3 and the utilisation of telehealth exercise physiology (EP) services in Australia was examined. 4

Why did I do it?

How did I do it?

I conducted a feasibility randomised controlled trial assessing a patient-centred digital health diet and exercise intervention for people with complex chronic disease 2 3 (U-DECIDE trial; n=67; figure 1). Participants in the intervention group chose their preferred level of engagement with digital health options. The primary outcome was feasibility of delivery, determined via a priori criteria (safety, recruitment, retention, exposure uptake, video consultation adherence). Clinical effectiveness, participant choices and engagement with digital health were also assessed.


Video-conferencing exercise session (U-DECIDE randomised controlled trial).

Second, I conducted a systematic review incorporating meta-analysis of the effectiveness of exercise interventions delivered via video-conferencing for people living with chronic disease. 1 Videoconferencing was defined as the use of synchronous telecommunication in the form of a video-linked appointment, and effectiveness was determined by changes in exercise capacity and quality of life.

A cross-sectional study was conducted using U-DECIDE baseline data (n=53). This evaluated the relationships between physical activity, neuromuscular fitness, exercise capacity and the Metabolic Syndrome Severity Score (MetSSS) 6 (unpublished).

Finally, I led an ecological study evaluating the quantity and costs of telehealth EP services reimbursed through the Medicare Benefits Schedule (MBS; part of Australia’s universal healthcare scheme) before and during the COVID-19 pandemic. 4

What did I find?

Findings from the feasibility randomised controlled trial highlighted that the digital health intervention was feasible in a tertiary hospital outpatient setting. No study-related serious adverse events were identified. Video consultation attendance was higher than comparator group review attendance, but did not meet the a priori target of 80% (42% of scheduled video consultations attended). Engagement with digital health options was inconsistent and often highest in the first month of use. Clinical effectiveness measures were underpowered due to recruitment challenges. However, there was no evidence to suggest any significant between-group differences for change in any clinical variable adjusted for baseline.

The systematic review identified that exercise interventions appeared to be effective for improving exercise capacity and quality of life for patients with chronic disease. Meta-analyses identified small-moderate effects favouring video-conferencing in trials using both an exercising and non-exercising comparator. 1 Significant methodological faults were identified, with resultant moderate overall risk of bias and low-moderate certainty in outcome ratings.

The cross-sectional study identified an independent and inverse association between exercise capacity and MetSSS. Surprisingly, no relationship was identified between physical activity or neuromuscular fitness measures and the MetSSS. Exercise interventions influence multiple cardiometabolic risk factors, and it may be more appropriate to evaluate risk factors together rather than independently. This study concluded that clinical trials with exercise interventions are warranted to assess whether changes in exercise capacity reflect significant changes in the MetSSS. This may provide evidence for the use of the MetSSS to monitor the cardiometabolic benefits of exercise training.

The Australian government has recently introduced telehealth compensable item numbers to the MBS, including EP services for patients with chronic disease. 7 The ecological study found that utilisation of telehealth services has been minimal for EP in Australia during the height of COVID-19 restrictions to in-person practice (≥6% of services conducted by telehealth through 2020–2021). Further research and advocacy are needed to ensure that telehealth is a viable option for patients to consider when accessing EP services.

What are the most important clinical impact/practical applications?

The review findings indicated that video-conferencing exercise interventions were effective, safe and feasible in chronic disease settings. The randomised controlled trial determined that digital health interventions are safe and feasible in a tertiary complex chronic disease setting. However, there was a lack of sustained engagement with digital health options despite a high desire at baseline. It is recommended that clinicians implementing a digital health approach to exercise service should co-design approaches to patient preferences to foster engagement. There was low utilisation of telehealth EP services through the COVID-19 pandemic in Australia, indicating a need for advocacy, promotion and telehealth-specific exercise training for professionals.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

The U-DECIDE RCT received ethical approval from the University of Queensland Human Research Ethics Committee (approval number: 2020000127) and the Metro South Human Research Ethics Committee (approval number: HREC/2019/QMS/58285). The ecological study received ethics exemption from the University of Queensland Human Research Ethics Committee (approval number: 2022/HE000136).


RCCB was supervised by Dr Shelley E Keating and Professor Jeff S Coombes from the School of Human Movement and Nutrition Sciences at the University of Queensland. RCCB would like to acknowledge all U-DECIDE investigators including Associate Professor Ingrid J Hickman and Mrs Lindsey Webb for their support.

  • Brown RCC ,
  • Coombes JS ,
  • Jungbluth Rodriguez K , et al
  • Keating SE ,
  • Jegatheesan DK , et al
  • Jegatheesan DK ,
  • Conley MM , et al
  • Snoswell CL , et al
  • Australian Institute of Health and Welfare
  • Carrington MJ
  • Australian Government - Department of Health

Twitter @Riley_Brown96

Contributors RCCB is the sole author.

Funding RCCB received an Australian government research training programme stipend from the University of Queensland from 2020 to 2023. The U-DECIDE Study was supported by the Queensland Health Practitioner Grant 2019–2020 (grant number: N/A) and Metro South Research Support Scheme 2020–2021 (grant number: N/A) and the Departments of Nutrition and Dietetics and Nephrology at the Princess Alexandra Hospital.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.




  1. The Importance of Statistics in Research (With Examples)

    Reason 1: Statistics allows researchers to design studies such that the findings from the studies can be extrapolated to a larger population. Reason 2: Statistics allows researchers to perform hypothesis tests to determine if some claim about a new drug, new procedure, new manufacturing method, etc. is true.

  2. PDF Why You Need to Use Statistics in Your Research

    Put simply, statistics is a range of procedures for gathering, organising, analysing and presenting quantitative data. 'Data' is the term for facts that have been obtained and subsequently recorded, and, for statisticians, 'data' usually refers to quantitative data that are numbers.

  3. Writing with Descriptive Statistics

    One of the reasons to use statistics is to condense large amounts of information into more manageable chunks; presenting your entire data set defeats this purpose. At the bare minimum, if you are presenting statistics on a data set, it should include the mean and probably the standard deviation.

  4. Role of Statistics in Research

    Statistics is a branch of science that deals with collection, organization and analysis of data from the sample to the whole population. Moreover, it aids in designing a study more meticulously and also give a logical reasoning in concluding the hypothesis.

  5. The Beginner's Guide to Statistical Analysis

    Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations.

  6. Statistics

    The purpose of this handout is to help you use statistics to make your argument as effectively as possible. Introduction Numbers are power. Apparently freed of all the squishiness and ambiguity of words, numbers and statistics are powerful pieces of evidence that can effectively strengthen any argument. But statistics are not a panacea.

  7. Descriptive Statistics

    The mean, the mode, the median, the range, and the standard deviation are all examples of descriptive statistics. Descriptive statistics are used because in most cases, it isn't possible to present all of your data in any form that your reader will be able to quickly interpret.

  8. The Importance of Statistics

    Statistics is a crucial process behind how we make discoveries in science, make decisions based on data, and make predictions. Statistics allows you to understand a subject much more deeply. In this post, I cover two main reasons why studying the field of statistics is crucial in modern society.

  9. Statistical Methods in Theses: Guidelines and Explanations

    Guidelines and Explanations. In light of the changes in psychology, faculty members who teach statistics/methods have reviewed the literature and generated this guide for graduate students. The guide is intended to enhance the quality of student theses by facilitating their engagement in open and transparent research practices and by helping ...

  10. Introduction: Statistics as a Research Tool

    The Purpose of Statistics Is to Clarify. It sometimes seems as if researchers use statistics as a kind of secret language. In this sense, statistics provide a way for the initiated to share ideas and concepts without including the rest of us. Of course, it is necessary to use a common language to report research results.

  11. Thesis Life: 7 ways to tackle statistics in your thesis

    This assignment involves proposing a research question, tackling it with help of some observations or experiments, analyzing these observations or results and then stating them by drawing some conclusions. Since it is an immitigable part of your thesis, you can neither run from statistics nor cry for help. The penultimate part of this process ...

  12. Data and your thesis

    The term 'data' is more familiar to researchers in Science, Technology, Engineering and Mathematics (STEM), but any outputs from research could be considered data. For example, Humanities, Arts and Social Sciences (HASS) researchers might create data in the form of presentations, spreadsheets, documents, images, works of art, or musical scores.

  13. Dissertation Results/Findings Chapter (Quantitative)

    The results chapter (also referred to as the findings or analysis chapter) is one of the most important chapters of your dissertation or thesis because it shows the reader what you've found in terms of the quantitative data you've collected. It presents the data using a clear text narrative, supported by tables, graphs and charts.

  14. Basic statistical tools in research and data analysis

    Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population. This requires a proper design of the study, an appropriate selection of the study sample and choice of a suitable statistical test. ... The purpose is to answer or test the ...

  15. (PDF) Use of Statistics in Research

    The function of statistics in research is to purpose as a tool in conniving research, analyzing its data and portrayal of conclusions there from. Most research studies result in a extensive...

  16. PDF Guideline to Writing a Master's Thesis in Statistics

    The aim of this document is to help master's students develop effective technical skills in writing a master's thesis in statistics. The contents are meant to reflect the System of Qualifications in the Higher Education Ordinance. Recommendations and guidelines regarding the structure and content of a master's thesis are given.

  17. Descriptive Statistics

    Types of descriptive statistics. There are 3 main types of descriptive statistics: The distribution concerns the frequency of each value. The central tendency concerns the averages of the values. The variability or dispersion concerns how spread out the values are. You can apply these to assess only one variable at a time, in univariate ...

  18. PDF Introduction to Statistics

    Statistics is a branch of mathematics used to summarize, analyze, and interpret a group of numbers or observations. ... graphs serve a similar purpose to summarize large and small sets of data. Most often, researchers collect data from a portion of individuals in a group of interest. For example, the 50, 100, or 1,000 students in the anxiety ...

  19. What Is Statistics?

    6) At their core, most disciplines think and learn about some particular aspects of life and the world, be it the physical nature of the universe, living organisms, or how economies or societies function. Statistics is a meta-discipline in that it thinks about how to think about turning data into real-world insights.

  20. What do senior theses in Statistics look like?

    What do senior theses in Statistics look like? This is a brief overview of thesis writing; for more information, please see our complete guide here. Senior theses in Statistics cover a wide range of topics, across the spectrum from applied to theoretical. Typically, senior theses are expected to have one of the following three flavors: 1.

  21. The Importance of Statistics in Dissertations

    While students are required to write a long thesis on any chosen topic, statistics is required to validate and certify the arguments made. Statistics, owing to its accuracy in facts and figures, go a long way in attesting to the truth. ... Academically, dissertation statistics is extremely relevant and serves the purpose of attaining success ...

  22. Descriptive Statistics and Normality Tests for Statistical Data

    Descriptive statistics are an important part of biomedical research which is used to describe the basic features of the data in the study. They provide simple summaries about the sample and the measures. Measures of the central tendency and dispersion are used to describe the quantitative data. For the continuous data, test of the normality is ...

  23. Descriptive Statistics and Their Important Role in Research

    The role of statistics in research is to be used as a tool in analyzing and summarizing a large volume of raw data and coming up with conclusions on tests being made. The study of statistics is classified into two main branches: descriptive statistics and inferential statistics.

  24. Master's thesis topics for the summer semester 2024

    At the Institute of Management Control and Consulting, topics for master's theses will again be offered for the coming summer semester of 2024. The available topics, including the application and supervision modalities, can be found in the following Moodle course. You can also contribute your own topics.

  25. Exercise, digital health and chronic disease: feasibility

    My thesis investigated the feasibility, effectiveness and utilisation of digital health physical activity/exercise interventions in chronic disease cohorts. Contemporary literature was reviewed for applicable digital health interventions,1 a feasibility randomised controlled trial was undertaken to evaluate the feasibility of a patient-centred digital health exercise and diet intervention for ...

  26. Tax Statistics and Trends 2024

    Furthermore, the share of income taxes paid by the top 1% increased from 33.2% in 2001 to 42.3% in 2020. In the same time frame, the share of income taxes paid by those in the bottom 50% decreased ...