What is Hypothesis Testing?
When we talk about statistical machine learning, the first thing that comes to mind is data. Data is of little use if we can't derive useful information from it. We use hypothesis testing to interpret data and to make statements about a population using sample data.
Let's understand this with an example. A company named ABC sells hair oil in India and wants to know how many Indians like the oil it sells. Surveying such a vast population is difficult, so we draw samples from the population, survey those samples, and conclude that about 30% of the population likes the hair oil produced by the company. In simple terms, we computed statistics on samples to draw a conclusion about the population.
Now suppose the company wants to launch a new hair oil. Based on the previous hair oil, the company makes an assumption (hypothesis) that at least 30% of people will like the new one.
To check whether the data support the company's hypothesis, we run statistical tests.
The assumption, in this case, is called the hypothesis, and the statistical test is called a hypothesis test.
Types of hypothesis:
Null hypothesis (H0): The default assumption, or the assumption that nothing has changed. To understand this, consider the above example. Under the null hypothesis, the company assumes that the new hair oil will be as popular as the old one (that is, 30% of people will like it), which means there is no change between the old and new values.
But what if this is not true, or the data contradict it?
Then there should be an alternate hypothesis (or first hypothesis) that opposes the null hypothesis.
Alternate (First) hypothesis (H1 / Ha): The assumption that opposes the null hypothesis. Simply put, if the null hypothesis does not hold, then the alternate hypothesis does. In the above example, the alternate hypothesis is that the popularity of the new hair oil differs from 30%.
Rejection of hypothesis
Now we know what the null hypothesis and the alternate hypothesis are.
We reject the null hypothesis when the sample data give strong enough evidence against it. In the above example, if the survey results for the new hair oil would be very unlikely in a population where 30% of people like it, then the null hypothesis is rejected.
If the evidence against the null hypothesis is not strong enough, we fail to reject it. This does not prove that the null hypothesis is true; it only means the data do not contradict it convincingly.
When the null hypothesis is rejected, we accept the alternate hypothesis in its place.
Type 1 error: If the null hypothesis is actually true but we reject it, it is called a type 1 error (a false positive).
Type 2 error: If the null hypothesis is actually false but we fail to reject it, for example because the sample did not give enough evidence, it is called a type 2 error (a false negative).
As you can see in the diagram, at any one time one hypothesis is selected and the other is rejected.
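To get a feel for the two kinds of error, here is a minimal simulation sketch for the hair oil example. Everything in it is an assumption made for illustration: we survey 100 people, we use a made-up rule that rejects the null hypothesis (30% like the oil) when 22 or fewer people like it, and we pick 20% as the "true" popularity for the case where the null hypothesis is false. The function names are hypothetical too.

```python
import random


def survey_likes(n, true_rate, rng):
    """Simulate one survey: how many of n people like the oil."""
    return sum(rng.random() < true_rate for _ in range(n))


def rejects_h0(likes, cutoff):
    """Illustrative decision rule: reject H0 when likes fall at or below the cutoff."""
    return likes <= cutoff


def rejection_rate(true_rate, n, cutoff, trials, rng):
    """Fraction of simulated surveys in which H0 gets rejected."""
    rejected = sum(
        rejects_h0(survey_likes(n, true_rate, rng), cutoff) for _ in range(trials)
    )
    return rejected / trials


rng = random.Random(0)            # fixed seed so the run is repeatable
N, CUTOFF, TRIALS = 100, 22, 5000  # assumed survey size, cutoff, and repetitions

# H0 is true (real popularity is 30%), yet we sometimes reject it: type 1 error.
type1_rate = rejection_rate(0.30, N, CUTOFF, TRIALS, rng)

# H0 is false (real popularity is 20%), yet we sometimes fail to reject it: type 2 error.
type2_rate = 1 - rejection_rate(0.20, N, CUTOFF, TRIALS, rng)

print(f"Type 1 error rate: {type1_rate:.3f}")
print(f"Type 2 error rate: {type2_rate:.3f}")
```

Moving the cutoff trades one error for the other: a lower cutoff rejects less often, so type 1 errors become rarer and type 2 errors become more common.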
What is the Level of Significance and P value?
The level of significance (or degree of significance) is the criterion by which we decide whether to reject the null hypothesis. We can never accept or reject a hypothesis with 100% certainty, so we have to decide how much risk of wrongly rejecting a true null hypothesis we are willing to tolerate. The level of significance is denoted by alpha (α), and its value is generally taken as 5%, or 0.05.
The P value, or probability value, is the probability of getting a result at least as extreme as the one you got from the experiment, assuming the null hypothesis is true.
We calculate the P value using different hypothesis tests. Significance of the P value:
If p value <= alpha - Null hypothesis rejected
If p value > alpha - Null hypothesis not rejected (we fail to reject it)
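The rule above can be sketched in a few lines of Python. This is only one possible test, an exact one-sided binomial test, and the survey numbers are made up: we assume 200 people were surveyed about the new hair oil and 45 of them liked it. The null hypothesis is that 30% of people like it, and the P value here is the chance of seeing 45 or fewer likes if that were true. The function names are hypothetical.

```python
from math import comb


def binomial_p_value_lower(successes, n, p0):
    """P(X <= successes) for X ~ Binomial(n, p0): a one-sided (lower-tail) p value."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(successes + 1)
    )


def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical survey of the new hair oil (numbers invented for illustration).
n_surveyed = 200
n_liked = 45
p0 = 0.30  # popularity claimed by the null hypothesis

p_value = binomial_p_value_lower(n_liked, n_surveyed, p0)
print(f"p value = {p_value:.4f} -> {decide(p_value)}")
```

With these assumed numbers the P value falls below 0.05, so the null hypothesis is rejected. A survey closer to 30% (say 58 likes out of 200) would give a large P value, and we would fail to reject it.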
Question: What are the different types of hypothesis tests?