Test Statistic Calculator
Calculate z-score, t-score, chi-square, or F-statistic with confidence intervals
Comprehensive Guide: How to Calculate Test Statistics (With Examples)
Test statistics are fundamental tools in inferential statistics that help researchers determine whether to reject or fail to reject a null hypothesis. This guide explains the four primary test statistics—z-score, t-score, chi-square, and F-statistic—with step-by-step calculations, real-world applications, and interpretation guidelines.
1. Understanding Test Statistics
A test statistic measures how far your sample data diverges from what the null hypothesis predicts. The formula varies by test type; for tests of means and proportions (z and t) it follows this structure:
Test Statistic = (Observed Value – Expected Value) / Standard Error
Key Components:
- Observed Value: Your sample mean or proportion
- Expected Value: Population parameter under H₀
- Standard Error: Standard deviation of the sampling distribution
2. Z-Test (Normal Distribution)
Used when:
- Population standard deviation (σ) is known
- Either the sample is large (n ≥ 30, so the Central Limit Theorem applies) or the population is normally distributed
Formula:
z = (x̄ – μ) / (σ / √n)
Example Calculation:
A factory claims their lightbulbs last 1,000 hours (μ). A sample of 50 bulbs (n) lasts 990 hours (x̄) with σ = 25. Test at α = 0.05:
- State hypotheses:
- H₀: μ = 1000
- H₁: μ ≠ 1000 (two-tailed)
- Calculate z-score:
z = (990 – 1000) / (25 / √50) = -10 / 3.535 ≈ -2.83
- Critical z-value for α/2 = 0.025 is ±1.96
- Since |-2.83| > 1.96, reject H₀
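The lightbulb example can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_sample_z` is an illustrative choice, not part of any package.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-tailed p-value) for a one-sample z-test."""
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-tailed p-value: area in both tails beyond |z|
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Values from the lightbulb example: x̄ = 990, μ = 1000, σ = 25, n = 50
z, p = one_sample_z(990, 1000, 25, 50)
critical = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed critical value for α = 0.05
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {abs(z) > critical}")
```

Comparing |z| with the critical value and comparing the p-value with α lead to the same decision.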
3. T-Test (Student’s t-Distribution)
Used when:
- Population standard deviation is unknown
- Sample size is small (commonly n < 30), although the test remains valid for larger samples
- Data is approximately normal
Formula (One-Sample t-test):
t = (x̄ – μ) / (s / √n)
Degrees of freedom (df) = n – 1
Comparison: Z-Test vs. T-Test
| Feature | Z-Test | T-Test |
|---|---|---|
| Population σ known | ✅ Yes | ❌ No (uses sample s) |
| Sample size requirement | n ≥ 30, or any n if the population is normal | Any n (most useful when n < 30) |
| Reference distribution | Standard normal | t-distribution (heavier tails) |
| Critical values | Fixed for given α | Vary by degrees of freedom |
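The last row of the table can be made concrete by printing two-tailed critical values for several degrees of freedom. The sketch below assumes SciPy is installed; the chosen df values are arbitrary illustrations.

```python
from scipy import stats

alpha = 0.05
z_critical = stats.norm.ppf(1 - alpha / 2)  # fixed for a given alpha

# Two-tailed t critical values shrink toward the z value as df grows
t_criticals = {df: stats.t.ppf(1 - alpha / 2, df) for df in (5, 10, 29, 99, 1000)}

for df, t_critical in t_criticals.items():
    print(f"df = {df:>4}: t critical = {t_critical:.3f}")
print(f"normal   : z critical = {z_critical:.3f}")
```

With small df the heavier tails push the critical value well above the normal one, which is why the t-test is more conservative for small samples.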
4. Chi-Square Test (Goodness of Fit)
Tests whether observed frequencies match expected frequencies. Common applications:
- Genetics (Mendelian ratios)
- Market research (preference distributions)
- Quality control (defect categories)
Formula:
χ² = Σ [(Oᵢ – Eᵢ)² / Eᵢ]
df = number of categories – 1
Example:
A casino suspects a die is loaded. After 120 rolls:
| Face | Observed (O) | Expected (E) | (O-E)²/E |
|---|---|---|---|
| 1 | 15 | 20 | 1.25 |
| 2 | 25 | 20 | 1.25 |
| 3 | 18 | 20 | 0.20 |
| 4 | 22 | 20 | 0.20 |
| 5 | 19 | 20 | 0.05 |
| 6 | 21 | 20 | 0.05 |
| Total χ² | | | 3.00 |
Critical χ² (df=5, α=0.05) = 11.07. Since 3.00 < 11.07, fail to reject H₀: the data give no evidence that the die is loaded.
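The table arithmetic for the die example can be checked with plain Python. This sketch takes the critical value quoted in the text as given rather than computing it.

```python
observed = [15, 25, 18, 22, 19, 21]  # counts for faces 1-6 from the table
n_rolls = sum(observed)
expected = [n_rolls / len(observed)] * len(observed)  # 20 per face if the die is fair

# Per-category contributions (O - E)^2 / E, then their sum
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_stat = sum(contributions)
df = len(observed) - 1

critical_value = 11.07  # df = 5, alpha = 0.05, as quoted in the text
print(f"chi-square = {chi2_stat:.2f}, df = {df}")
print("reject H0" if chi2_stat > critical_value else "fail to reject H0")
```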
5. F-Test (Variance Comparison)
Compares variances from two populations. Used to:
- Test homogeneity of variance (ANOVA assumption)
- Compare precision between measurement methods
Formula:
F = s₁² / s₂² (where s₁² > s₂²)
df₁ = n₁ – 1, df₂ = n₂ – 1
Interpretation Rules:
- Always place larger variance in numerator
- F-distribution is right-skewed
- Critical values depend on both df₁ and df₂
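A variance-ratio test following these rules might look like the sketch below. It assumes NumPy and SciPy are available; the function name `variance_ratio_test` and the measurement values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import stats


def variance_ratio_test(sample_a, sample_b):
    """Return (F, df1, df2, two-tailed p) with the larger variance in the numerator."""
    var_a = np.var(sample_a, ddof=1)  # ddof=1 gives the sample variance
    var_b = np.var(sample_b, ddof=1)
    if var_a >= var_b:
        f_stat, df1, df2 = var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    else:
        f_stat, df1, df2 = var_b / var_a, len(sample_b) - 1, len(sample_a) - 1
    # Upper-tail area, doubled because the numerator was chosen after seeing the data
    p_value = min(1.0, 2 * stats.f.sf(f_stat, df1, df2))
    return f_stat, df1, df2, p_value


# Hypothetical measurements from two methods (illustrative numbers only)
method_a = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3]
method_b = [10.0, 10.1, 9.9, 10.0, 10.2, 9.9]
f_stat, df1, df2, p_value = variance_ratio_test(method_a, method_b)
print(f"F = {f_stat:.2f}, df = ({df1}, {df2}), p = {p_value:.3f}")
```

Because the larger variance always goes on top, F is never below 1 and only the upper tail of the F-distribution is consulted.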
6. Practical Considerations
Choosing the Right Test:
| Scenario | Recommended Test | Key Assumptions |
|---|---|---|
| Comparing single mean to population mean, σ known | Z-test | Normality, independence |
| Comparing single mean to population mean, σ unknown | One-sample t-test | Approximate normality |
| Comparing two independent means | Independent t-test | Equal variances (check with F-test) |
| Categorical data analysis | Chi-square | Expected frequencies ≥5 per cell |
| Testing variance equality | F-test | Normal population distributions |
Common Mistakes to Avoid:
- Ignoring assumptions: Always check normality (Shapiro-Wilk test) and variance equality (Levene’s test)
- Misinterpreting p-values: A p-value of 0.04 means “reject H₀ at α=0.05”, not “probability H₀ is true”
- Multiple testing: Running many tests on the same data inflates Type I error (use Bonferroni correction)
- Confusing statistical vs. practical significance: A tiny effect with large n may be “statistically significant” but meaningless
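The Bonferroni correction mentioned above divides α by the number of tests. A minimal sketch, with hypothetical p-values and an illustrative function name:

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each p-value as significant against the Bonferroni-adjusted threshold."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values from four tests run on the same data set
p_values = [0.004, 0.020, 0.035, 0.300]
threshold, significant = bonferroni(p_values)
for p, flag in zip(p_values, significant):
    print(f"p = {p:.3f} -> {'significant' if flag else 'not significant'} at {threshold:.4f}")
```

In this illustration three p-values fall below 0.05, but only one survives the adjusted threshold.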
7. Advanced Topics
Effect Size Measures:
Test statistics tell you if an effect exists, but not its magnitude. Always report effect sizes:
- Cohen’s d: (x̄₁ – x̄₂) / s_pooled (0.2=small, 0.5=medium, 0.8=large)
- η²: SS_between / SS_total (proportion of variance explained)
- Cramér’s V: √(χ² / (n × min(r–1, c–1))) for chi-square tests (0-1 scale)
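Two of these measures are easy to compute by hand. The sketch below uses only the standard library and takes Cramér's V as the square root of χ² / (n × min(r–1, c–1)). The function names, group scores, and chi-square inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


def cramers_v(chi2_stat, n, n_rows, n_cols):
    """Cramér's V for an r x c contingency table with n observations."""
    return sqrt(chi2_stat / (n * min(n_rows - 1, n_cols - 1)))


# Hypothetical scores for two groups (illustrative numbers only)
treatment = [23, 25, 28, 30, 26]
control = [20, 22, 25, 24, 21]
d = cohens_d(treatment, control)
v = cramers_v(chi2_stat=10.0, n=200, n_rows=2, n_cols=3)
print(f"Cohen's d = {d:.2f}, Cramér's V = {v:.3f}")
```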
Power Analysis:
Before collecting data, calculate required sample size to detect an effect:
- Specify α (typically 0.05)
- Choose desired power (1-β, typically 0.80)
- Estimate effect size (from pilot data or literature)
- Use power analysis software (G*Power, R pwr package)
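For a rough idea of what such software computes, the sketch below estimates the per-group sample size for a two-sided, two-sample comparison of means using the normal approximation n ≈ 2 × ((z₁₋α/₂ + z₁₋β) / d)². This is an approximation, not the exact t-based calculation that dedicated tools perform, so their answers can be slightly larger. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample test of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Cohen's conventional small, medium, and large effect sizes
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Smaller effects need far larger samples, because the required n scales with 1/d².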
Nonparametric Alternatives:
When normality assumptions are violated:
- Mann-Whitney U: Alternative to independent t-test
- Wilcoxon signed-rank: Alternative to paired t-test
- Kruskal-Wallis: Alternative to one-way ANOVA
- Friedman test: Alternative to repeated-measures ANOVA
8. Real-World Applications
Case Study 1: Pharmaceutical Drug Testing
A pharmaceutical company tests a new cholesterol drug on 100 patients (n=100). The sample mean LDL reduction is 32 mg/dL (x̄) with s=12 mg/dL. The existing drug reduces LDL by 28 mg/dL (μ).
Calculation:
t = (32 – 28) / (12/√100) = 4 / 1.2 = 3.33
df = 99, critical t (α=0.05, two-tailed) ≈ 1.984
Decision: Reject H₀ (3.33 > 1.984). The new drug shows significantly greater efficacy (p < 0.01).
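The same calculation can be run from the summary statistics alone. This sketch assumes SciPy is installed and uses it only for the critical value and p-value.

```python
from math import sqrt
from scipy import stats

# Summary statistics from the case study
n, sample_mean, sample_sd, reference_mean = 100, 32.0, 12.0, 28.0

t_stat = (sample_mean - reference_mean) / (sample_sd / sqrt(n))
df = n - 1
t_critical = stats.t.ppf(1 - 0.05 / 2, df)   # two-tailed, alpha = 0.05
p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-tailed p-value

print(f"t = {t_stat:.2f}, df = {df}, critical = {t_critical:.3f}, p = {p_value:.4f}")
```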
Case Study 2: Manufacturing Quality Control
A factory claims only 2% of their products are defective. A random sample of 400 items reveals 15 defects.
Chi-square goodness-of-fit test:
Expected defects = 400 * 0.02 = 8
χ² = (15-8)²/8 + (385-392)²/392 = 6.125 + 0.125 = 6.25
Critical χ² (df=1, α=0.05) = 3.841
Decision: Reject H₀ (6.25 > 3.841). The observed defect rate (15/400 = 3.75%) is significantly higher than the claimed 2% (p < 0.05).
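The goodness-of-fit statistic for this case study sums the contributions of both cells (defective and non-defective). It can be checked with `scipy.stats.chisquare`, assuming SciPy is installed:

```python
from scipy import stats

n_items, claimed_rate, defects = 400, 0.02, 15

observed = [defects, n_items - defects]                           # defective, non-defective
expected = [n_items * claimed_rate, n_items * (1 - claimed_rate)]  # counts under H0

# With two categories the test has df = 2 - 1 = 1
chi2_stat, p_value = stats.chisquare(observed, f_exp=expected)
print(f"chi-square = {chi2_stat:.3f}, p = {p_value:.4f}")
```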
9. Software Implementation
Calculating Test Statistics in Python:
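import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ztest

# Z-test (note: statsmodels estimates the standard deviation from the sample)
z_score, p_value = ztest(sample_data, value=population_mean)
# One-sample t-test
t_stat, p_value = stats.ttest_1samp(sample_data, population_mean)
# Chi-square goodness-of-fit test (observed vs. expected counts)
chi2_stat, p_value = stats.chisquare(observed_counts, f_exp=expected_counts)
# Chi-square test of independence on a contingency table
chi2_stat, p_value, df, expected = stats.chi2_contingency(observed_table)
# F-test for variances (put the larger variance in the numerator; one-tailed p-value)
f_stat = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
p_value = stats.f.sf(f_stat, dfn=len(sample1)-1, dfd=len(sample2)-1)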
Excel Functions:
- =Z.TEST(array, μ, [σ])
- =T.TEST(array1, array2, tails, type)
- =CHISQ.TEST(observed_range, expected_range)
- =F.TEST(array1, array2) (returns p-value directly)
10. Frequently Asked Questions
Q: Can I use a z-test with small samples?
A: Only if the population standard deviation is known and the data is normally distributed. Otherwise, use a t-test.
Q: What’s the difference between one-tailed and two-tailed tests?
A: A one-tailed test looks for an effect in one direction (e.g., “greater than”), while a two-tailed test looks for a difference in either direction. One-tailed tests have more power but should only be used when the direction is theoretically justified before seeing the data.
Q: How do I calculate degrees of freedom for a chi-square test?
A: For goodness-of-fit: df = number of categories – 1. For independence tests: df = (rows – 1) × (columns – 1).
Q: When should I use an F-test?
A: Primarily to compare variances (e.g., checking homogeneity of variance before ANOVA) or in regression analysis to test overall model fit.
Q: What’s the relationship between test statistics and p-values?
A: The p-value is the probability of observing a test statistic at least as extreme as yours if H₀ is true. It’s calculated from the test statistic’s distribution (normal, t, χ², or F).