In hypothesis testing about a population using data from a sample,
why (and how) is it assumed beforehand that the population variance / standard deviation is the same as the one observed in the sample?
When calculating the Z-score from the sample mean, we plug the sample variance / SD into the formula where the population variance / SD belongs. I am looking for help / clarification on the theoretical basis for this, please.
Thanks
Hello Saurav,
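The first thing to understand is that different hypothesis tests are designed for different situations. For example, when testing a mean with what is called the Z-test, we assume that we know the population variance but do not know the population mean. The Z statistic is:
$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$
where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized population mean, $\sigma$ is the known population standard deviation, and $n$ is the sample size.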
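As a rough illustration of the Z-test with a known population standard deviation, here is a minimal Python sketch. The numbers (sample mean 52, hypothesized mean 50, sigma 5, n = 25) and the function name `z_statistic` are made up for the example and are not from the question.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Z statistic when the population standard deviation sigma is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
z = z_statistic(sample_mean=52.0, mu0=50.0, sigma=5.0, n=25)

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

Note that sigma here is an input known in advance, not something computed from the sample.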
Now, the above case, in which we know the population variance but not the population mean, is very unrealistic, but it is often taught as an introductory hypothesis test because of its simplicity. A more realistic case is the t-test, which is used when we know neither the population mean nor the population variance. Hence you will find it applied far more often in research of all kinds, including regression analysis.
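The t-distribution has fatter tails than the standard normal distribution. When we use the sample standard deviation (s) in place of the population standard deviation (sigma), those fatter tails account for the added uncertainty of estimating sigma from the sample. So the population value is not assumed to equal the sample value; the sample value is used as an estimate, and the test statistic is compared against a t-distribution with n - 1 degrees of freedom instead of the standard normal. The t-statistic is given by:
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$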
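To see why the switch to the t-distribution matters, here is a small simulation sketch in Python. It draws many samples of size 5 from a normal population where the null hypothesis is true, computes the statistic with s in place of sigma, and counts how often a 5% two-sided test rejects. The sample size, trial count, seed, and function names are assumptions chosen for illustration; the value 2.776 is the standard table value for the two-sided 5% critical point of a t-distribution with 4 degrees of freedom.

```python
import random
import statistics
from math import sqrt

def t_statistic(sample, mu0):
    """(sample mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / sqrt(n))

def false_rejection_rate(n, critical, trials=20000, seed=0):
    """Share of samples rejected although the null hypothesis (mean 0) is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > critical:
            rejections += 1
    return rejections / trials

Z_CRIT = statistics.NormalDist().inv_cdf(0.975)  # normal critical value, about 1.96
T_CRIT_DF4 = 2.776  # t critical value for 4 degrees of freedom, from standard tables

rate_z = false_rejection_rate(5, Z_CRIT)
rate_t = false_rejection_rate(5, T_CRIT_DF4)

print(f"rejection rate with normal cutoff: {rate_z:.3f}")
print(f"rejection rate with t cutoff:      {rate_t:.3f}")
```

With the normal cutoff the test should reject noticeably more often than the nominal 5%, while the t cutoff should bring the rate back close to 5%. That gap is the extra uncertainty from estimating sigma with s.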
Do get back if further help is needed.