What is Hypothesis Testing?

date posted: 2020-06-07




TMI of the day

  1. Ate Puradak, which is a thing in Korea; it is basically "high end" chicken. Waassss good. But this time I had it with bone, and I prefer boneless.


What is Hypothesis Testing?

For a while I had a hard time understanding what exactly hypothesis testing is, since there is so much confusing statistical jargon. Here I will go through an example question, point out the parts that confused me, and explain the differences between terms that are easy to mix up, as well as what hypothesis testing is and when and how to use it.

Here are some important terms for understanding hypothesis testing. You will probably have to come back to them multiple times:

  • Null hypothesis = Initial belief in your experiment.
  • Alternative hypothesis = New belief that you want to support by rejecting the null hypothesis.
  • Z-score = How many standard deviations away from the mean the value you want to test is.
  • P-value = Probability of seeing a result at least as extreme as the one you observed, given that the null hypothesis is true.
  • Type I error = Rejecting the null when it is true. Also known as a false positive.
  • alpha = significance level = probability of a Type I error occurring.
  • Type II error = Failing to reject the null hypothesis when it is false. Also known as a false negative.
  • Test statistic = Statistic computed from the sample and used to decide whether to reject or fail to reject the null.
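
To make the z-score and p-value definitions a bit more concrete, here is a minimal Python sketch. The function names and the numbers (a value of 130 tested against a mean of 100 with a standard deviation of 15) are made up for illustration, and the p-value is two-sided and assumes the data are normally distributed.

```python
import math

def z_score(value, mean, sd):
    """How many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, assuming normality."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical numbers, chosen only for illustration.
z = z_score(130, 100, 15)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p-value = {p:.4f}")  # z = 2.00, p-value = 0.0455
```

So a value 2 standard deviations from the mean would show up, in either direction, only about 4.6% of the time if the null hypothesis were true.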

Hypothesis testing is used to decide whether the result your experiment produced means something or could simply be due to chance. For example, if you invented a drug for COVID-19 and want to see whether it actually cures people with the virus, you would use hypothesis testing to check whether the recoveries can be credited to the drug rather than to random variation.

Before we move on, let's clarify four terms that were confusing to me: p-value, Type I error, alpha, and significance level. If you want to test the effectiveness of your drug, you wouldn't try it on the whole population since that would be too expensive, so you select a sample from the population and experiment on it. From the outcome of the sample we then have to decide whether to keep our initial belief (the drug has no effect) or reject it in favor of the new belief (the drug is effective).

The probability of getting an outcome at least as extreme as the one you got from the sample, assuming the initial belief is true, is called the p-value. A low p-value implies that, under the initial belief, such an outcome is very unlikely, so it is reasonable to reject the initial belief. A p-value of 0.05 means there is a 5% chance of seeing an outcome this extreme if the initial belief is true. Alpha and significance level mean the same thing: how low the p-value has to be before you will reject the initial belief. Setting alpha = 0.05 means that if the p-value is less than 0.05 we reject the initial belief in favor of the alternative hypothesis and call the result statistically significant, while accepting a 5% chance of rejecting an initial belief that is actually true. If alpha = 0.01, the p-value has to be less than 0.01 for us to reject the initial belief.
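
The comparison between the p-value and alpha boils down to a single check. Here is a small sketch of it; the function name `decide` and the example p-value of 0.03 are my own picks for illustration.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value against the significance level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# The same hypothetical p-value leads to different conclusions
# depending on how strict alpha is.
print(decide(0.03, alpha=0.05))  # reject the null hypothesis
print(decide(0.03, alpha=0.01))  # fail to reject the null hypothesis
```

Note that the second outcome is "fail to reject", not "accept": a p-value above alpha only means the sample did not give enough evidence against the initial belief.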

Alpha, or the significance level, is the threshold for deciding whether you can reject the initial belief. In other words, it is how much risk you are willing to take: how large a chance of a Type I error is acceptable? If you are building a recommendation system, you might set alpha to 0.05 or higher, since an incorrect recommendation is not critical to anyone's survival; in other words, a Type I error is acceptable.

But why use hypothesis testing when we could just look at the outcome of the experiment and draw a conclusion?
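
One way to convince yourself that alpha really is the Type I error rate is a quick simulation: generate many samples from a world where the null is true and count how often the test rejects anyway. This is only a sketch under assumptions I picked for illustration (normally distributed data with a known standard deviation of 1, a true mean of 5, samples of size 50, and a two-sided z-test); the function names are hypothetical too.

```python
import math
import random

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

def type_i_error_rate(alpha, trials=4000, n=50, seed=0):
    """Fraction of samples that wrongly reject a null that is true."""
    rng = random.Random(seed)
    null_mean, sd = 5.0, 1.0          # assumed values; the null is true here
    standard_error = sd / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(null_mean, sd) for _ in range(n)) / n
        z = (sample_mean - null_mean) / standard_error
        if two_sided_p_value(z) < alpha:
            rejections += 1           # a false positive
    return rejections / trials

print(type_i_error_rate(0.05))  # should land close to 0.05
print(type_i_error_rate(0.01))  # should land close to 0.01
```

Lowering alpha lowers the share of false positives, at the cost of making it harder to reject the null when it really is false.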


Why Hypothesis Testing?

It is a way of reducing subjectivity, which leads to less bias.

Let me show you why using an example.

Say you are running a candy bar factory where, on average, a candy bar weighs 5g. A new employee recruited today comes into your office and tells you that the average weight of the candy bars is not 5g.

Your initial belief is that candy bars weigh 5g on average; this is called the null hypothesis. We assume it is true until evidence contradicts it. The new employee's claim is called the alternative hypothesis. We assume our initial belief is true since we've been running the candy factory for a while, and if the new employee provides enough evidence for the claim, we reject our initial belief and accept the new belief that the average weight of the candy bars is not 5g.

So to test this we select 50 candy bars at random and calculate their average weight. Say we do this 3 times and get averages of 5.02g, 5.65g, and 7.22g. For the first outcome, most people will agree that our null is true, since 5.02g is pretty close to 5g. For the last outcome, 7.22g, most people will agree with the alternative hypothesis. But what about the sample with an average weight of 5.65g? Some might side with the null hypothesis and some might not, so the conclusion is subjective.

To reduce subjectivity in our conclusion we use hypothesis testing.

Now that we have a basic understanding of hypothesis testing, let's fully understand its usage with two examples: