Statistics Essentials Cheat Sheet
Visual Overview: Normal Distribution
The bell curve is the most important distribution in statistics; useful for understanding data spread and probability.
Descriptive Statistics
// Central Tendency
Mean: Average = Σx / n
Median: Middle value (50th percentile)
Mode: Most frequent value
Skewed right: Mean > Median (outliers on right)
Skewed left: Mean < Median (outliers on left)
Symmetric: Mean ≈ Median
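A quick numeric sketch of the mean/median/mode rules above, using Python's standard `statistics` module on a made-up right-skewed sample (the values are illustrative only):

```python
from statistics import mean, median, mode

# Hypothetical sample: one large outlier (30) pulls the mean above the median.
data = [2, 3, 3, 4, 5, 30]

print(mean(data))    # 7.83... (dragged up by the outlier)
print(median(data))  # 3.5
print(mode(data))    # 3, the most frequent value
# mean > median  ->  right skew, as the rule above predicts
```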
// Spread
Range: Max - Min
Variance (population): σ² = Σ(x - mean)² / n   (sample variance s² divides by n - 1)
Standard Deviation: σ = √variance
Coefficient of Variation: σ / mean × 100%
IQR = Q3 - Q1 (middle 50%)
Outliers: Beyond Q1 - 1.5×IQR or Q3 + 1.5×IQR
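The spread measures above can be computed with the standard `statistics` module; this sketch uses the population convention (divide by n) and made-up data:

```python
from math import sqrt
from statistics import pvariance, pstdev, quantiles

data = [4, 8, 15, 16, 23, 42]  # hypothetical values

rng = max(data) - min(data)          # Range
var = pvariance(data)                # sigma^2 = sum((x - mean)^2) / n
std = pstdev(data)                   # sigma = sqrt(variance)
cv  = std / (sum(data) / len(data))  # Coefficient of Variation

q1, q2, q3 = quantiles(data, n=4)    # quartiles (default 'exclusive' method)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # 1.5*IQR outlier fences

print(rng, round(var, 2), round(std, 2), iqr, (low_fence, high_fence))
```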
// Percentiles & Quartiles
Q1 (25th percentile): 25% below
Q2 (50th percentile) = Median
Q3 (75th percentile): 75% below
IQR = Q3 - Q1
// Z-score (standardization)
z = (x - mean) / std
z = -1: 1 std below mean
z = 0: At mean
z = 2: 2 std above mean
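The three z-score examples above, computed directly from the formula (the exam-score mean and std are assumed for illustration):

```python
def z_score(x, mean, std):
    """Standardize x: how many standard deviations it sits from the mean."""
    return (x - mean) / std

# Hypothetical exam scores with mean 70 and std 10:
print(z_score(60, 70, 10))  # -1.0 -> 1 std below the mean
print(z_score(70, 70, 10))  #  0.0 -> at the mean
print(z_score(90, 70, 10))  #  2.0 -> 2 std above the mean
```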
// Covariance & Correlation
Covariance: Cov(X,Y) = E[(X - mean_X)(Y - mean_Y)]
Correlation: ρ = Cov(X,Y) / (σ_X × σ_Y)
Range: -1 to 1
ρ = -1: Perfect negative
ρ = 0: No linear relationship
ρ = 1: Perfect positive
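Covariance and Pearson correlation written out term-by-term from the formulas above (population convention, dividing by n); the paired data are made up:

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Cov(X, Y) = E[(X - mean_X)(Y - mean_Y)]
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# rho = Cov(X, Y) / (sigma_X * sigma_Y)
sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
sy = sqrt(sum((b - my) ** 2 for b in y) / n)
rho = cov / (sx * sy)

print(round(cov, 3), round(rho, 3))
assert -1 <= rho <= 1  # correlation always lies in [-1, 1]
```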
Distributions
| Distribution | Use Case | PMF/PDF or Shape | Mean | Variance |
|---|---|---|---|---|
| Normal | Natural phenomena | Bell curve | μ | σ² |
| Binomial | Binary outcomes (n trials) | P(X=k) = C(n,k)p^k(1-p)^(n-k) | np | np(1-p) |
| Poisson | Count events (rare) | P(X=k) = (e^-λ × λ^k)/k! | λ | λ |
| Exponential | Waiting time | f(x) = λe^(-λx) | 1/λ | 1/λ² |
| Uniform | Equal probability | f(x) = 1/(b-a) | (a+b)/2 | (b-a)²/12 |
| Chi-square | Variance tests | Skewed right | k (df) | 2k |
| T-distribution | Small samples | Bell-like, thicker tails | 0 | df/(df-2) |
| F-distribution | Variance ratio (ANOVA) | Right-skewed | df2/(df2-2) | - |
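Two table rows can be checked numerically with nothing but `math`: the Binomial PMF via `math.comb` and the Poisson PMF via `math.exp`, with the mean recovered as Σ k·P(X=k). Parameters here are assumed for illustration:

```python
from math import comb, exp, factorial

# Binomial(n=10, p=0.3): mean should be n*p = 3.0
n, p = 10, 0.3
binom = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
binom_mean = sum(k * pk for k, pk in enumerate(binom))
print(binom_mean)  # ≈ 3.0

# Poisson(λ=4): mean should be λ = 4.0 (sum truncated at k=99, tail is negligible)
lam = 4.0
pois = [exp(-lam) * lam**k / factorial(k) for k in range(100)]
pois_mean = sum(k * pk for k, pk in enumerate(pois))
print(pois_mean)  # ≈ 4.0
```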
Hypothesis Testing
// Hypothesis test structure
H0 (Null): No effect, no difference (default assumption)
H1 (Alternative): There is an effect
// Type I & II errors
Type I (α): Reject H0 when true (false positive)
Type II (β): Fail to reject H0 when false (false negative)
Power = 1 - β
Typical: α = 0.05 (5% significance level)
// P-value
Probability of data at least as extreme as observed, assuming H0 is true
p < 0.05: Reject H0 (statistically significant)
p ≥ 0.05: Fail to reject H0
// T-test (compare means)
One sample: Does mean differ from value?
Two sample: Do two group means differ?
Paired: Do pre/post measurements differ?
H0: μ1 = μ2
Test statistic: t = (mean1 - mean2) / SE
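A hand-rolled two-sample t statistic following the formula above (Welch form, with the standard error built from each group's sample variance); the groups are made-up numbers:

```python
from math import sqrt
from statistics import mean, variance  # variance() divides by n - 1 (sample)

group1 = [85, 90, 88, 75, 95]
group2 = [70, 65, 80, 72, 68]

# SE of the difference in means (unequal variances assumed)
se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
t = (mean(group1) - mean(group2)) / se
print(round(t, 2))  # large |t| -> evidence against H0: mu1 = mu2
```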
// Chi-square test (categorical)
H0: Variables are independent
χ² = Σ (observed - expected)² / expected
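The χ² statistic computed directly from the formula above; observed and expected counts are hypothetical:

```python
observed = [50, 30, 20]
expected = [40, 40, 20]

# chi^2 = sum of (observed - expected)^2 / expected over all cells
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 5.0
```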
// ANOVA (compare 3+ means)
H0: All group means are equal
F = between-group variance / within-group variance
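A one-way ANOVA F statistic from scratch, matching the between/within decomposition above (the groups are made-up and deliberately well-separated):

```python
from statistics import mean

groups = [[3, 4, 5], [6, 7, 8], [9, 10, 11]]
all_x = [x for g in groups for x in g]
grand = mean(all_x)
k, n = len(groups), len(all_x)

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# F = mean square between / mean square within
f = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f)  # 27.0
</```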
// Confidence Interval
CI = point_estimate ± margin_error
95% CI: If we repeated the sampling many times, 95% of such intervals would contain the true value
Margin of error = critical_value × SE
For mean: CI = mean ± 1.96 × (σ / √n) [for n > 30]
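The large-sample CI formula above, with hypothetical summary statistics (mean 100, σ 15, n = 36):

```python
from math import sqrt

mean_, sigma, n = 100.0, 15.0, 36

se = sigma / sqrt(n)        # standard error = 15 / 6 = 2.5
margin = 1.96 * se          # margin of error = critical value * SE
ci = (mean_ - margin, mean_ + margin)
print(round(ci[0], 2), round(ci[1], 2))  # 95.1 104.9
```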
Effect Size & Practical Significance
// Cohen's d (difference in means)
d = (mean1 - mean2) / pooled_std
Interpretation:
d = 0.2: Small effect
d = 0.5: Medium effect
d = 0.8: Large effect
Example: Test prep improves score by 0.6σ → medium effect
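Cohen's d with a pooled standard deviation, using made-up summary statistics for two equal-sized groups:

```python
from math import sqrt

mean1, mean2 = 82.0, 76.0
s1, s2, n1, n2 = 10.0, 9.0, 30, 30

# Pooled std weights each group's variance by its degrees of freedom
pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (mean1 - mean2) / pooled
print(round(d, 2))  # ≈ 0.63 -> medium effect by the thresholds above
```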
// Correlation strength (Cohen)
r = 0.1: Small
r = 0.3: Medium
r = 0.5: Large
// Odds Ratio
Odds ratio = odds(A) / odds(B)
> 1: Higher odds in A
= 1: Equal odds
< 1: Lower odds in A
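An odds ratio from a hypothetical 2×2 table (exposed vs. unexposed, event vs. no event):

```python
# Exposed group: 30 events, 70 non-events  -> odds = 30/70
a, b = 30, 70
# Unexposed group: 10 events, 90 non-events -> odds = 10/90
c, d = 10, 90

odds_ratio = (a / b) / (c / d)
print(round(odds_ratio, 2))  # ≈ 3.86 -> higher odds in the exposed group
```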
// Statistical vs Practical Significance
Significant (p < 0.05) ≠ Important
Large sample: Even trivial differences become significant
Small sample: Even large differences may not reach significance
Always report effect size with p-value
Example: p < 0.001, but d = 0.1 (small effect, may not matter)
Bayesian Thinking
// Bayes Theorem
P(A|B) = P(B|A) × P(A) / P(B)
P(A|B): Posterior (what we want)
P(B|A): Likelihood (evidence strength)
P(A): Prior (previous belief)
P(B): Evidence (normalizing constant)
// Example: Disease testing
Prior: 1% of population has disease
Test accuracy: 95% sensitivity, 95% specificity
Question: If test positive, what's prob of disease?
P(Disease|Positive) = P(Positive|Disease) × P(Disease) / P(Positive)
= 0.95 × 0.01 / [0.95×0.01 + 0.05×0.99]
= 0.0095 / 0.059
≈ 16.1%
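Recomputing the disease-testing example with Bayes' theorem, using the stated prior, sensitivity, and specificity:

```python
prior = 0.01   # P(Disease)
sens = 0.95    # P(Positive | Disease) — sensitivity
spec = 0.95    # P(Negative | No Disease) — specificity

# Total probability of a positive test: true positives + false positives
p_positive = sens * prior + (1 - spec) * (1 - prior)
posterior = sens * prior / p_positive
print(round(posterior, 3))  # ≈ 0.161 -> only about a 16% chance, despite a positive test
```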
// Frequentist vs Bayesian
Frequentist: p-values, confidence intervals
- Probability is long-run frequency
- Fixed parameters
- Focus on p-values
Bayesian: Posterior distributions
- Probability is degree of belief
- Parameters are random
- Update beliefs with data
// Prior selection
Uninformative (flat): No strong prior belief
Informative: Incorporate domain knowledge
Conjugate: Mathematically convenient
// Posterior is new prior
Can update beliefs as new data arrives
Bayesian online learning
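A minimal sketch of "posterior is new prior" via a Beta-Binomial conjugate update: a Beta(a, b) prior on a success probability, updated one batch at a time (the batch counts are made up):

```python
a, b = 1, 1  # Beta(1, 1): flat (uninformative) prior

# Each batch of data: (successes, failures). Conjugacy makes the update additive.
for successes, failures in [(7, 3), (12, 8)]:
    a, b = a + successes, b + failures
    print(f"posterior mean = {a / (a + b):.3f}")

# After both batches: Beta(20, 12), posterior mean 20/32 = 0.625
```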
Regression & Correlation
// Linear Regression
y = a + bx + ε
a: Intercept (y when x=0)
b: Slope (change in y per unit x)
ε: Error term
// Least Squares Estimation
Minimize: Σ(y - ŷ)²
Solution: b = Cov(x,y) / Var(x)
a = mean_y - b × mean_x
// R-squared (coefficient of determination)
R² = 1 - (SS_res / SS_tot)
= 1 - Σ(y - ŷ)² / Σ(y - mean_y)²
Interpretation:
R² = 0: Model explains 0%
R² = 0.5: Model explains 50%
R² = 1: Perfect fit
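The least-squares and R² formulas above written out in full — b = Cov(x,y)/Var(x), a = mean_y − b·mean_x, R² = 1 − SS_res/SS_tot — on a small made-up dataset:

```python
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Slope and intercept from the closed-form least-squares solution
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx

# R^2 from residual and total sums of squares
y_hat = [a + b * xi for xi in x]
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_tot = sum((yi - my) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(round(b, 3), round(a, 3), round(r2, 4))
```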
// Standard Error
SE = √(Σ residuals² / (n - 2))
Used for confidence intervals of predictions
// Correlation vs Causation
Correlation ≠ Causation
Control for confounders
Consider reverse causation
Use experiments when possible
// Multiple Regression
y = a + b1×x1 + b2×x2 + ... + ε
Partial slopes: Effect of x1 controlling for others
// Assumptions
Linearity: Relationship is linear
Normality: Errors are normally distributed
Homoscedasticity: Constant variance
Independence: Observations are independent