How to Solve Hypothesis Testing: Step-by-Step

By Jonas | 29 June 2026 | 12 min read
Key Takeaways
Hypothesis testing follows five steps: state hypotheses, choose alpha, calculate the test statistic, find the p-value, then decide and conclude in context.
The p-value is the probability of observing data at least as extreme as yours if the null hypothesis were true. It is not the probability the null is false.
Use a z-test when the population standard deviation is known; use a t-test when it is not, with degrees of freedom equal to n minus 1.
A common mistake is choosing the significance level after seeing the data. Set alpha before you calculate anything.
Rejecting the null does not prove the alternative. Statistical significance and practical significance are two separate questions.

Most students who struggle with hypothesis testing get the arithmetic right and the logic wrong. The five-step procedure covered in this post solves both: it shows every calculation and explains why each step exists, so you can carry it into any exam scenario, not just the one you practiced on.

Two fully worked examples follow the method section, one using a z-test and one using a t-test. Each shows every number from setup to conclusion. The section on the most common mistake covers the p-value misreading that costs marks even when the calculation is correct.

What Is Hypothesis Testing?

Hypothesis testing is a statistical procedure for deciding whether sample data provide enough evidence to reject a specific claim about a population. The procedure answers one question: could the observed result plausibly have occurred by random chance if the null hypothesis were true?

The method does not prove anything. It quantifies how surprised you should be by your data under a specific assumption. That surprise is measured by the p-value, which tells you how often random sampling alone would produce a result at least as extreme as yours if the null were correct.

The Null Hypothesis and the Alternative Hypothesis

Every hypothesis test starts with two competing statements. The null hypothesis (H0) is the default claim of no effect, no difference, or no relationship. The alternative hypothesis (H1 or Ha) is the specific claim you want to test, usually that something has changed, differs, or exceeds a threshold. The null is what you assume true until the data argue strongly otherwise.

A typical example: a pharmaceutical company claims a new drug reduces average recovery time below the population mean of 14 days. The null hypothesis states the mean recovery time with the drug equals 14 days. The alternative states the mean is less than 14 days. The test asks whether the sample data from the drug trial are consistent with H0 or whether they shift sufficiently toward H1 to justify rejecting the null.

One-Tailed Versus Two-Tailed Tests

A two-tailed test checks whether the population parameter differs from the null value in either direction. A one-tailed test checks whether it differs specifically upward or specifically downward. The choice must follow from the research question, not from inspection of the data.

| Test type | H1 format | Rejection region | When to use |
|---|---|---|---|
| Two-tailed | mu does not equal mu0 | Both tails (alpha/2 each) | Any difference in either direction is interesting |
| Left-tailed (lower) | mu is less than mu0 | Left tail only (full alpha) | You predict the parameter decreased |
| Right-tailed (upper) | mu is greater than mu0 | Right tail only (full alpha) | You predict the parameter increased |

Set the test direction before collecting data. Changing from two-tailed to one-tailed after seeing the results inflates the Type I error rate.
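The rejection regions in the table can be sketched numerically. The snippet below is a minimal illustration for a z-test, using only Python's standard library; the function name `critical_z` and the tail labels "two", "left" and "right" are choices made for this sketch, not part of any standard API.

```python
from statistics import NormalDist

def critical_z(alpha, tail):
    """Return the critical z-value(s) for a z-test at level alpha.

    tail is "two", "left" or "right" (labels invented for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # alpha is split evenly: alpha/2 in each tail
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    if tail == "left":
        # the full alpha sits in the lower tail
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        # the full alpha sits in the upper tail
        return (std_normal.inv_cdf(1 - alpha),)
    raise ValueError("tail must be 'two', 'left' or 'right'")

for tail in ("two", "left", "right"):
    print(tail, [round(z, 3) for z in critical_z(0.05, tail)])
```

At alpha = 0.05 the two-tailed cut-offs sit further from zero (about ±1.96) than the one-tailed cut-off (about 1.645 in absolute value), which is why switching to a one-tailed test after seeing the data makes rejection easier than the stated alpha allows.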

The Five-Step Method for Hypothesis Testing

The five-step structure below works for every standard parametric test: z-tests, t-tests, chi-square tests, and F-tests all follow the same logical chain. Master the chain with z and t, and the other tests slot in at Step 3.

Figure: The Five-Step Hypothesis Test
Step 1: State H0 and H1 in terms of the population parameter (e.g. H0: mu = 14, H1: mu < 14).
Step 2: Choose the significance level alpha before seeing results (typically alpha = 0.05, or 0.01 in medical/pharma research).
Step 3: Calculate the test statistic (z or t): z = (x-bar − mu0) ÷ (sigma ÷ √n) or t = (x-bar − mu0) ÷ (s ÷ √n).
Step 4: Find the p-value from the z or t table at the correct degrees of freedom (two-tailed: double the single-tail probability).
Step 5: Reject H0 if p ≤ alpha; state the conclusion in the problem context.
Steps 1 and 2 happen before you touch the data. Steps 3 and 4 use the data. Step 5 connects the arithmetic to the real-world question.

Step 1: State the Hypotheses

Write both hypotheses in terms of the population parameter, not the sample. For a test about a population mean, use mu. For a test about a proportion, use p. Always include an equals sign in H0 because the test statistic is computed under the assumption that H0 is exactly true.

Write H0 and H1 before you look at any data. If you choose a one-tailed direction after seeing that your sample mean went a particular way, your significance level is no longer what alpha claims. The test is only valid when the direction is set by the research question, not by the data.

Step 2: Choose the Significance Level

Alpha, the significance level, is the probability of a Type I error you are willing to accept. The standard choice across most university statistics courses is 0.05. Choosing 0.05 means that in a long series of experiments where H0 is true, you would incorrectly reject it 5% of the time. Fields with high consequences for false positives, such as clinical trials and drug approval, often require 0.01 or 0.001.
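The long-run reading of alpha can be checked with a small simulation. The sketch below assumes a hypothetical population (mean 0, standard deviation 1) in which H0 is true by construction, runs many two-tailed z-tests, and counts how often H0 is wrongly rejected. The sample size, number of trials, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulate_type_i_rate(alpha=0.05, n=20, trials=5000, seed=1):
    """Fraction of two-tailed z-tests that reject H0 when H0 is true."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0  # hypothetical population in which H0 holds
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) >= z_crit:
            rejections += 1  # a false positive, because H0 is true here
    return rejections / trials

rate = simulate_type_i_rate()
print(f"Observed false-rejection rate: {rate:.3f}")
```

The observed rate should land near 0.05, with some simulation noise. Re-running with alpha=0.01 should bring it down to roughly 1%.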

Never Choose Alpha After Seeing the p-Value

Setting alpha to 0.06 after you calculate a p-value of 0.055 is a form of p-hacking. It pushes your actual Type I error rate above your stated alpha and renders the test results invalid. Regulators, journals, and instructors treat post-hoc alpha adjustment as a methodological error. Set alpha once, before computing anything.

Step 3: Calculate the Test Statistic

The test statistic converts your sample result into a standardised number that measures how many standard errors the sample mean sits away from the null-hypothesis mean. Two formulas cover most introductory statistics tests.

For a z-test (population standard deviation sigma is known):
z = (x-bar − mu0) ÷ (sigma ÷ √n)

For a one-sample t-test (population standard deviation unknown, estimated by the sample standard deviation s):
t = (x-bar − mu0) ÷ (s ÷ √n)

The denominator in both formulas is the standard error, the standard deviation of the sampling distribution of the mean. A larger sample shrinks the standard error, making it easier to detect a real difference.
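As a rough sketch, both formulas reduce to the same arithmetic; only the source of the standard deviation differs. The function names and the numbers below (a sample mean of 101 against a null mean of 100, with a standard deviation of 6) are hypothetical and chosen purely to show how the standard error shrinks as n grows.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sd / sqrt(n)

def one_sample_statistic(sample_mean, mu0, sd, n):
    """z if sd is the known population sigma; t if sd is the sample s."""
    return (sample_mean - mu0) / standard_error(sd, n)

# Hypothetical numbers for illustration only.
for n in (9, 36, 144):
    se = standard_error(6.0, n)
    stat = one_sample_statistic(101.0, 100.0, 6.0, n)
    print(f"n={n:>3}  SE={se:.2f}  statistic={stat:.2f}")
```

The same one-unit difference between the sample mean and the null mean produces a larger statistic at each step, because quadrupling n halves the standard error.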

Step 4: Find the p-Value

Once you have the test statistic, locate the corresponding probability in the appropriate distribution. For a z-test, use the standard normal distribution. For a t-test, use the t-distribution with n − 1 degrees of freedom. Check what your table reports: some give the tail area beyond the test statistic, while others give the cumulative area to its left, which you subtract from 1 to get an upper-tail probability.

For a two-tailed test, double the single-tail probability. A z-statistic of 2.1 places 0.018 in the upper tail. For a two-tailed test, p equals 0.036.
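The table lookup for a z-test can be reproduced in code. This is a minimal sketch using the standard library's `NormalDist`; the function name `z_p_value` and its tail labels are assumptions made for the example.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value for a z statistic; tail is "two", "left" or "right"."""
    cdf = NormalDist().cdf  # cumulative area to the left of a z-value
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        # double the single-tail area beyond |z|
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'two', 'left' or 'right'")

print(round(z_p_value(2.1, "right"), 3))  # upper-tail area for z = 2.1
print(round(z_p_value(2.1, "two"), 3))    # two-tailed p-value for z = 2.1
```

For z = 2.1 this reproduces the figures quoted above: about 0.018 in the upper tail and about 0.036 for the two-tailed test.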

Step 5: Make the Decision and State the Conclusion

Compare p to alpha. If p ≤ alpha, reject H0. If p > alpha, fail to reject H0. Note the language: you never “accept” H0. Failing to reject means the data do not provide sufficient evidence against it, not that the null is confirmed true.

Always state your conclusion in the language of the original problem. “Reject H0” alone earns no credit on most university assessments. The conclusion should read: “At the 5% significance level, there is sufficient evidence to conclude that the population mean recovery time with the drug is less than 14 days.”
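One way to make the habit of concluding in context concrete is a small helper that refuses to stop at "reject H0". The function below is only a sketch; its name and the `claim` parameter (the alternative hypothesis written in the language of the problem) are invented for illustration.

```python
def decide(p_value, alpha, claim):
    """Return the decision plus a conclusion sentence in context.

    claim is H1 stated in words, e.g. "the mean recovery time is below 14 days".
    """
    level = f"{alpha * 100:g}%"
    if p_value <= alpha:
        return (f"Reject H0. At the {level} significance level, there is "
                f"sufficient evidence to conclude that {claim}.")
    # Never "accept H0": the data simply did not provide enough evidence.
    return (f"Fail to reject H0. At the {level} significance level, there is "
            f"insufficient evidence to conclude that {claim}.")

print(decide(0.02, 0.05,
             "the population mean recovery time with the drug is less than 14 days"))
```

The p-value of 0.02 passed in here is a made-up input, not a result from the article.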

Worked Example 1: One-Sample z-Test

A bottling plant fills cans with a target mean of 330 ml. The population standard deviation of fill volumes is known to be 4.2 ml from years of manufacturing data. A quality inspector draws a random sample of 36 cans and finds a sample mean of 328.3 ml. At the 5% significance level, is there evidence that the machine is underfilling?

Setting Up the Test

Step 1 (State the hypotheses): H0: mu = 330 ml; H1: mu < 330 ml. This is a one-tailed (left-tailed) test because the inspector is only concerned about underfilling.

Step 2 (Choose alpha): alpha = 0.05. The critical z-value for a one-tailed left test at 0.05 is −1.645.

Calculating the z-Score and p-Value

Step 3 (Test statistic):

Standard error = sigma ÷ √n = 4.2 ÷ √36 = 4.2 ÷ 6 = 0.70

z = (x-bar − mu0) ÷ SE = (328.3 − 330) ÷ 0.70 = −1.7 ÷ 0.70 = −2.43

Step 4 (p-value): The area to the left of z = −2.43 in a standard normal table is approximately 0.0075.

Step 5 (Decision): p = 0.0075 < alpha = 0.05, so reject H0. The test statistic −2.43 also lies beyond the critical value −1.645.

Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the true mean fill volume is less than 330 ml. The machine appears to be underfilling.
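The bottling example can be checked end to end with a few lines of Python. This is a sketch using only the standard library; the variable names are this example's own, and the numbers are the ones given in the problem.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 330.0, 4.2       # null mean and known population sd (ml)
n, sample_mean = 36, 328.3    # sample size and sample mean (ml)
alpha = 0.05

se = sigma / sqrt(n)                   # standard error of the mean
z = (sample_mean - mu0) / se           # test statistic
p_value = NormalDist().cdf(z)          # left-tailed test: area below z
z_crit = NormalDist().inv_cdf(alpha)   # left-tail critical value

print(f"SE = {se:.2f}, z = {z:.2f}, p = {p_value:.4f}, critical z = {z_crit:.3f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Because the code keeps z unrounded (about −2.4286), the computed p-value can differ slightly in the last decimal place from the table lookup at z = −2.43. The decision is the same either way.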

Figure: z-Test: Rejection Region and Test Statistic. A bell curve centred at zero, with the left tail beyond the critical value of −1.645 shaded as the rejection region (alpha = 0.05). A vertical line at z = −2.43 marks the test statistic, which falls inside the rejection region, so H0 is rejected.
The test statistic z = −2.43 falls beyond the critical value of −1.645, placing it inside the shaded rejection region. The corresponding p-value of 0.0075 sits below alpha = 0.05.
Key result: z = −2.43 is the test statistic for the bottling example. The p-value of 0.0075 is below alpha = 0.05, so we reject H0 at the 5% significance level.

Worked Example 2: One-Sample t-Test

A university administrator claims that the average study time per week for full-time students in a particular program is 25 hours. A researcher suspects the true average is higher. The researcher surveys a random sample of 16 students and finds a sample mean of 27.4 hours and a sample standard deviation of 5.1 hours. Test the administrator's claim at the 1% significance level.

Setting Up the t-Test

Step 1 (Hypotheses): H0: mu = 25 hours; H1: mu > 25 hours. One-tailed right test because the researcher predicts the mean is higher.

Step 2 (Alpha): alpha = 0.01. Degrees of freedom = n − 1 = 16 − 1 = 15. The critical t-value for a one-tailed right test at alpha = 0.01 with 15 df is approximately 2.602.

Calculating t and Comparing to the Critical Value

Step 3 (Test statistic):

Standard error = s ÷ √n = 5.1 ÷ √16 = 5.1 ÷ 4 = 1.275

t = (x-bar − mu0) ÷ SE = (27.4 − 25) ÷ 1.275 = 2.4 ÷ 1.275 = 1.882

Step 4 (p-value): For t = 1.882 with 15 degrees of freedom, the upper-tail probability falls between 0.025 and 0.05 (using t-tables, approximately 0.039).

Step 5 (Decision): p ≈ 0.039 > alpha = 0.01, so fail to reject H0. The test statistic 1.882 also lies below the critical value 2.602.

Conclusion: At the 1% significance level, there is insufficient evidence to conclude that the true mean weekly study time exceeds 25 hours. The administrator's claim is consistent with the data at this level.
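The study-time example can also be verified in code. Python's standard library has no t-distribution, so the sketch below integrates the t density numerically with Simpson's rule; the helper names `t_pdf` and `t_upper_tail` are invented for this example. If SciPy is installed, `scipy.stats.t.sf(t_stat, df)` would be the usual shortcut for the same upper-tail area.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3   # area between 0 and |t|
    tail = 0.5 - central_area      # by symmetry, half the area lies above 0
    return tail if t >= 0 else 1 - tail

mu0, alpha = 25.0, 0.01
n, sample_mean, s = 16, 27.4, 5.1

df = n - 1
se = s / sqrt(n)
t_stat = (sample_mean - mu0) / se
p_value = t_upper_tail(t_stat, df)   # right-tailed test

print(f"SE = {se:.3f}, t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

The computed upper-tail probability should land close to the table-based estimate and well above alpha = 0.01, so the decision matches the hand calculation.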

Same Data, Different Decision at Different Alpha

If the researcher had chosen alpha = 0.05 instead of 0.01, the p-value of 0.039 would fall below alpha and H0 would be rejected. This example shows precisely why alpha must be set before data collection. Choosing alpha after you see the p-value lets you reverse any decision by picking a conveniently larger or smaller significance threshold.

Figure: t-Test (df = 15): Test Statistic vs Critical Value. A bell-shaped t-distribution curve centred at zero, with heavier tails than the normal. The right tail beyond t = 2.602 (the critical value at alpha = 0.01 with 15 df) is shaded as the rejection region. The test statistic t = 1.882 is shown as a dashed line that does not reach the shaded region, so H0 is not rejected.
The test statistic of 1.882 does not reach the critical value of 2.602 at alpha = 0.01 with 15 df. Had alpha been 0.05 (critical value approximately 1.753), the same test statistic would have crossed into the rejection region.

The subject calculators hub includes statistical tools that can check your test-statistic and p-value calculations once you have worked through the steps by hand.

The Most Common Mistake: Misreading the p-Value

The p-value trips up more students than any step in the calculation. Here is the correct reading: the p-value is the probability of observing a test statistic at least as extreme as yours, assuming H0 is true. It is not the probability that H0 is true. It is not the probability that H1 is true. These statements sound close to correct, but they describe fundamentally different things.

A p-value of 0.04 means: if the null hypothesis were true and you ran this experiment thousands of times, 4% of those experiments would produce a test statistic as extreme or more extreme than yours, purely by random sampling variation. Nothing more. The p-value gives no information about the probability of any hypothesis being true. That inference requires prior probabilities, which classical hypothesis testing does not incorporate.
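That long-run reading can be illustrated by simulation. The sketch below assumes a z-test in which H0 is true, so each simulated test statistic is simply a draw from the standard normal distribution. It then counts how often a draw is at least as extreme as an observed statistic whose two-tailed p-value is 0.04. The number of trials and the seed are arbitrary.

```python
import random
from statistics import NormalDist

def fraction_as_extreme(z_observed, trials=20000, seed=7):
    """Share of null-hypothesis z-statistics at least as extreme (two-tailed)."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if abs(rng.gauss(0.0, 1.0)) >= abs(z_observed)
    )
    return hits / trials

# The z-value whose two-tailed p-value is exactly 0.04.
z_observed = NormalDist().inv_cdf(1 - 0.04 / 2)
analytic_p = 2 * (1 - NormalDist().cdf(z_observed))
simulated = fraction_as_extreme(z_observed)

print(f"z = {z_observed:.3f}, analytic p = {analytic_p:.3f}, simulated = {simulated:.3f}")
```

The simulated fraction should sit near 0.04. Note what the simulation never uses: any probability that H0 itself is true. It only describes how the data behave when H0 is assumed.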

Type I and Type II Errors

Every hypothesis test carries two error risks. A Type I error (false positive) rejects a true null hypothesis. Its probability equals alpha. A Type II error (false negative) fails to reject a false null hypothesis. Its probability is called beta, and power (1 − beta) measures the test's ability to detect a real effect.

Hypothesis Test Decision Matrix

| Your decision | H0 is TRUE (reality, unknown) | H0 is FALSE (reality, unknown) |
|---|---|---|
| Reject H0 | Type I error (false positive); probability = alpha | Correct rejection (true positive); probability = power (1 − beta) |
| Fail to reject H0 | Correct non-rejection (true negative); probability = 1 − alpha | Type II error (false negative); probability = beta |
With the sample size held fixed, lowering alpha reduces Type I errors but increases Type II errors. Increasing the sample size is the most direct way to reduce both at once.

The trade-off between error types is a constraint, not a failure. Choosing a stricter alpha (0.01 instead of 0.05) lowers the risk of false positives but raises the risk of missing real effects. Larger samples increase power and let you lower both simultaneously. The quantitative revision guide covers how to build comfort with this kind of numerical reasoning before your statistics assessments.
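The trade-off can be put into numbers for a left-tailed z-test. The sketch below reuses the bottling setup (null mean 330 ml, sigma 4.2 ml) and assumes, purely for illustration, that the true mean is 328.5 ml. That true mean is a hypothetical value, as is the function name `left_tailed_power`.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_power(mu0, mu_true, sigma, n, alpha):
    """Power (1 - beta) of a left-tailed z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(alpha)            # reject when z <= z_crit
    shift = (mu_true - mu0) / (sigma / sqrt(n))   # mean of z under mu_true
    return std_normal.cdf(z_crit - shift)

for n in (36, 72):
    for alpha in (0.05, 0.01):
        power = left_tailed_power(330.0, 328.5, 4.2, n, alpha)
        print(f"n={n}, alpha={alpha}: power={power:.3f}, beta={1 - power:.3f}")
```

At a fixed n, moving from alpha = 0.05 to 0.01 lowers power, so beta rises. Doubling the sample to 72 raises power at both alpha levels, which is how a larger sample lets you tighten alpha without giving up the ability to detect the effect.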

For connected topics in your statistics course, the worked-example guide on limits follows the same step-by-step format if your program covers calculus alongside statistics. The exam time management guide covers how to allocate your minutes across multi-part statistics questions under timed conditions.

Key Takeaways

  1. Hypothesis testing follows five steps in order: state H0 and H1, choose alpha, calculate the test statistic, find the p-value, and state the decision and conclusion in the language of the original problem.
  2. The null hypothesis always contains an equality, for example H0: mu = mu0. The alternative states the direction of the claim. Always set the direction before seeing the data.
  3. Use a z-test when the population standard deviation is known. Use a t-test when it is not, with degrees of freedom equal to n minus 1 for a one-sample test.
  4. The p-value is the probability of observing your test statistic or something more extreme, given that H0 is true. It is not the probability that H0 or H1 is correct.
  5. Set alpha before you calculate anything. Choosing or adjusting alpha after you see the p-value invalidates the test and inflates the real Type I error rate above the stated alpha.
  6. A Type I error rejects a true null, with probability equal to alpha. A Type II error fails to reject a false null, with probability called beta. Increasing sample size reduces both simultaneously; lowering alpha alone merely trades one for the other.
  7. Always state conclusions in the context of the original problem with explicit reference to the significance level, not just “reject H0.”

For further practice, the university resources hub links to subject-specific tools. The grade calculators hub can help you track where statistics sits in your overall module average so you know how much weight this topic carries. The matrix multiplication worked example follows the same detailed format if linear algebra appears alongside your statistics course.

OpenStax Introductory Statistics covers hypothesis testing in Chapters 9 and 10 with additional worked examples and free access. MIT OpenCourseWare's Statistics for Applications (18.650) provides lecture notes and problem sets at a higher level. Penn State's STAT 415 course notes include step-by-step hypothesis testing examples with t-tables.
