Statistics Essentials Cheat Sheet
Visual Overview: Normal Distribution
The bell curve is the most important distribution in statistics; useful for understanding data spread and probability.
Descriptive Statistics
// Central Tendency
Mean: Average = Σx / n
Median: Middle value (50th percentile)
Mode: Most frequent value
Skewed right: Mean > Median (outliers on right)
Skewed left: Mean < Median (outliers on left)
Symmetric: Mean ≈ Median
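A quick numeric sketch of the mean/median/mode rules above, using Python's standard `statistics` module on a made-up right-skewed sample (the values are illustrative only):

```python
from statistics import mean, median, mode

# Hypothetical sample: one large outlier (30) pulls the mean above the median.
data = [2, 3, 3, 4, 5, 30]

print(mean(data))    # 7.83... (dragged up by the outlier)
print(median(data))  # 3.5
print(mode(data))    # 3, the most frequent value
# mean > median  ->  right skew, as the rule above predicts
```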
// Spread
Range: Max - Min
Variance (population): σ² = Σ(x - mean)² / n   (sample variance s² divides by n - 1)
Standard Deviation: σ = √variance
Coefficient of Variation: σ / mean × 100%
IQR = Q3 - Q1 (middle 50%)
Outliers: Beyond Q1 - 1.5×IQR or Q3 + 1.5×IQR
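The spread measures above can be computed with the standard `statistics` module; this sketch uses the population convention (divide by n) and made-up data:

```python
from math import sqrt
from statistics import pvariance, pstdev, quantiles

data = [4, 8, 15, 16, 23, 42]  # hypothetical values

rng = max(data) - min(data)          # Range
var = pvariance(data)                # sigma^2 = sum((x - mean)^2) / n
std = pstdev(data)                   # sigma = sqrt(variance)
cv  = std / (sum(data) / len(data))  # Coefficient of Variation

q1, q2, q3 = quantiles(data, n=4)    # quartiles (default 'exclusive' method)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # 1.5*IQR outlier fences

print(rng, round(var, 2), round(std, 2), iqr, (low_fence, high_fence))
```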
// Percentiles & Quartiles
Q1 (25th percentile): 25% below
Q2 (50th percentile) = Median
Q3 (75th percentile): 75% below
IQR = Q3 - Q1
// Z-score (standardization)
z = (x - mean) / std
z = -1: 1 std below mean
z = 0: At mean
z = 2: 2 std above mean
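The three z-score examples above, computed directly from the formula (the exam-score mean and std are assumed for illustration):

```python
def z_score(x, mean, std):
    """Standardize x: how many standard deviations it sits from the mean."""
    return (x - mean) / std

# Hypothetical exam scores with mean 70 and std 10:
print(z_score(60, 70, 10))  # -1.0 -> 1 std below the mean
print(z_score(70, 70, 10))  #  0.0 -> at the mean
print(z_score(90, 70, 10))  #  2.0 -> 2 std above the mean
```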
// Covariance & Correlation
Covariance: Cov(X,Y) = E[(X - mean_X)(Y - mean_Y)]
Correlation: ρ = Cov(X,Y) / (σ_X × σ_Y)
Range: -1 to 1
ρ = -1: Perfect negative
ρ = 0: No linear relationship
ρ = 1: Perfect positive
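Covariance and Pearson correlation written out term-by-term from the formulas above (population convention, dividing by n); the paired data are made up:

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Cov(X, Y) = E[(X - mean_X)(Y - mean_Y)]
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# rho = Cov(X, Y) / (sigma_X * sigma_Y)
sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
sy = sqrt(sum((b - my) ** 2 for b in y) / n)
rho = cov / (sx * sy)

print(round(cov, 3), round(rho, 3))
assert -1 <= rho <= 1  # correlation always lies in [-1, 1]
```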
Distributions
| Distribution | Use Case | PMF/PDF or Shape | Mean | Variance |
|---|---|---|---|---|
| Normal | Natural phenomena | Bell curve | μ | σ² |
| Binomial | Binary outcomes (n trials) | P(X=k) = C(n,k)p^k(1-p)^(n-k) | np | np(1-p) |
| Poisson | Count events (rare) | P(X=k) = (e^-λ × λ^k)/k! | λ | λ |
| Exponential | Waiting time | f(x) = λe^(-λx) | 1/λ | 1/λ² |
| Uniform | Equal probability | f(x) = 1/(b-a) | (a+b)/2 | (b-a)²/12 |
| Chi-square | Variance tests | Skewed right | k (df) | 2k |
| T-distribution | Small samples | Bell-like, thicker tails | 0 | df/(df-2) |
| F-distribution | Variance ratio (ANOVA) | Right-skewed | df2/(df2-2) | - |
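Two table rows can be checked numerically with nothing but `math`: the Binomial PMF via `math.comb` and the Poisson PMF via `math.exp`, with the mean recovered as Σ k·P(X=k). Parameters here are assumed for illustration:

```python
from math import comb, exp, factorial

# Binomial(n=10, p=0.3): mean should be n*p = 3.0
n, p = 10, 0.3
binom = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
binom_mean = sum(k * pk for k, pk in enumerate(binom))
print(binom_mean)  # ≈ 3.0

# Poisson(λ=4): mean should be λ = 4.0 (sum truncated at k=99, tail is negligible)
lam = 4.0
pois = [exp(-lam) * lam**k / factorial(k) for k in range(100)]
pois_mean = sum(k * pk for k, pk in enumerate(pois))
print(pois_mean)  # ≈ 4.0
```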
Hypothesis Testing
// Hypothesis test structure
H0 (Null): No effect, no difference (default assumption)
H1 (Alternative): There is an effect
// Type I & II errors
Type I (α): Reject H0 when true (false positive)
Type II (β): Fail to reject H0 when false (false negative)
Power = 1 - β
Typical: α = 0.05 (5% significance level)
// P-value
Probability of data at least as extreme as observed, assuming H0 is true
p < 0.05: Reject H0 (statistically significant)
p ≥ 0.05: Fail to reject H0
// T-test (compare means)
One sample: Does mean differ from value?
Two sample: Do two group means differ?
Paired: Do pre/post measurements differ?
H0: μ1 = μ2
Test statistic: t = (mean1 - mean2) / SE
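A hand-rolled two-sample t statistic following the formula above (Welch form, with the standard error built from each group's sample variance); the groups are made-up numbers:

```python
from math import sqrt
from statistics import mean, variance  # variance() divides by n - 1 (sample)

group1 = [85, 90, 88, 75, 95]
group2 = [70, 65, 80, 72, 68]

# SE of the difference in means (unequal variances assumed)
se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
t = (mean(group1) - mean(group2)) / se
print(round(t, 2))  # large |t| -> evidence against H0: mu1 = mu2
```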
// Chi-square test (categorical)
H0: Variables are independent
χ² = Σ (observed - expected)² / expected
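The χ² statistic computed directly from the formula above; observed and expected counts are hypothetical:

```python
observed = [50, 30, 20]
expected = [40, 40, 20]

# chi^2 = sum of (observed - expected)^2 / expected over all cells
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 5.0
```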
// ANOVA (compare 3+ means)
H0: All group means are equal
F = between-group variance / within-group variance
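A one-way ANOVA F statistic from scratch, matching the between/within decomposition above (the groups are made-up and deliberately well-separated):

```python
from statistics import mean

groups = [[3, 4, 5], [6, 7, 8], [9, 10, 11]]
all_x = [x for g in groups for x in g]
grand = mean(all_x)
k, n = len(groups), len(all_x)

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# F = mean square between / mean square within
f = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f)  # 27.0
</```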
// Confidence Interval
CI = point_estimate ± margin_error
95% CI: If we repeated the sampling many times, 95% of such intervals would contain the true value
Margin of error = critical_value × SE
For mean: CI = mean ± 1.96 × (σ / √n) [for n > 30]
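The large-sample CI formula above, with hypothetical summary statistics (mean 100, σ 15, n = 36):

```python
from math import sqrt

mean_, sigma, n = 100.0, 15.0, 36

se = sigma / sqrt(n)        # standard error = 15 / 6 = 2.5
margin = 1.96 * se          # margin of error = critical value * SE
ci = (mean_ - margin, mean_ + margin)
print(round(ci[0], 2), round(ci[1], 2))  # 95.1 104.9
```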
Effect Size & Practical Significance
// Cohen's d (difference in means)
d = (mean1 - mean2) / pooled_std
Interpretation:
d = 0.2: Small effect
d = 0.5: Medium effect
d = 0.8: Large effect
Example: Test prep improves score by 0.6σ → medium effect
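Cohen's d with a pooled standard deviation, using made-up summary statistics for two equal-sized groups:

```python
from math import sqrt

mean1, mean2 = 82.0, 76.0
s1, s2, n1, n2 = 10.0, 9.0, 30, 30

# Pooled std weights each group's variance by its degrees of freedom
pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (mean1 - mean2) / pooled
print(round(d, 2))  # ≈ 0.63 -> medium effect by the thresholds above
```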
// Correlation strength (Cohen)
r = 0.1: Small
r = 0.3: Medium
r = 0.5: Large
// Odds Ratio
Odds ratio = odds(A) / odds(B)
> 1: Higher odds in A
= 1: Equal odds
< 1: Lower odds in A
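An odds ratio from a hypothetical 2×2 table (exposed vs. unexposed, event vs. no event):

```python
# Exposed group: 30 events, 70 non-events  -> odds = 30/70
a, b = 30, 70
# Unexposed group: 10 events, 90 non-events -> odds = 10/90
c, d = 10, 90

odds_ratio = (a / b) / (c / d)
print(round(odds_ratio, 2))  # ≈ 3.86 -> higher odds in the exposed group
```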
// Statistical vs Practical Significance
Significant (p < 0.05) ≠ Important
Large sample: Even trivial differences become significant
Small sample: Even large differences may not reach significance
Always report effect size with p-value
Example: p < 0.001, but d = 0.1 (small effect, may not matter)
Bayesian Thinking
// Bayes Theorem
P(A|B) = P(B|A) × P(A) / P(B)
P(A|B): Posterior (what we want)
P(B|A): Likelihood (evidence strength)
P(A): Prior (previous belief)
P(B): Evidence (normalizing constant)
// Example: Disease testing
Prior: 1% of population has disease
Test accuracy: 95% sensitivity, 95% specificity
Question: If test positive, what's prob of disease?
P(Disease|Positive) = P(Positive|Disease) × P(Disease) / P(Positive)
= 0.95 × 0.01 / [0.95×0.01 + 0.05×0.99]
= 0.0095 / 0.059
≈ 16.1%
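Recomputing the disease-testing example with Bayes' theorem, using the stated prior, sensitivity, and specificity:

```python
prior = 0.01   # P(Disease)
sens = 0.95    # P(Positive | Disease) — sensitivity
spec = 0.95    # P(Negative | No Disease) — specificity

# Total probability of a positive test: true positives + false positives
p_positive = sens * prior + (1 - spec) * (1 - prior)
posterior = sens * prior / p_positive
print(round(posterior, 3))  # ≈ 0.161 -> only about a 16% chance, despite a positive test
```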
// Frequentist vs Bayesian
Frequentist: p-values, confidence intervals
- Probability is long-run frequency
- Fixed parameters
- Focus on p-values
Bayesian: Posterior distributions
- Probability is degree of belief
- Parameters are random
- Update beliefs with data
// Prior selection
Uninformative (flat): No strong prior belief
Informative: Incorporate domain knowledge
Conjugate: Mathematically convenient
// Posterior is new prior
Can update beliefs as new data arrives
Bayesian online learning
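A minimal sketch of "posterior is new prior" via a Beta-Binomial conjugate update: a Beta(a, b) prior on a success probability, updated one batch at a time (the batch counts are made up):

```python
a, b = 1, 1  # Beta(1, 1): flat (uninformative) prior

# Each batch of data: (successes, failures). Conjugacy makes the update additive.
for successes, failures in [(7, 3), (12, 8)]:
    a, b = a + successes, b + failures
    print(f"posterior mean = {a / (a + b):.3f}")

# After both batches: Beta(20, 12), posterior mean 20/32 = 0.625
```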
Regression & Correlation
// Linear Regression
y = a + bx + ε
a: Intercept (y when x=0)
b: Slope (change in y per unit x)
ε: Error term
// Least Squares Estimation
Minimize: Σ(y - ŷ)²
Solution: b = Cov(x,y) / Var(x)
a = mean_y - b × mean_x
// R-squared (coefficient of determination)
R² = 1 - (SS_res / SS_tot)
= 1 - Σ(y - ŷ)² / Σ(y - mean_y)²
Interpretation:
R² = 0: Model explains 0%
R² = 0.5: Model explains 50%
R² = 1: Perfect fit
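The least-squares and R² formulas above written out in full — b = Cov(x,y)/Var(x), a = mean_y − b·mean_x, R² = 1 − SS_res/SS_tot — on a small made-up dataset:

```python
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Slope and intercept from the closed-form least-squares solution
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx

# R^2 from residual and total sums of squares
y_hat = [a + b * xi for xi in x]
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_tot = sum((yi - my) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(round(b, 3), round(a, 3), round(r2, 4))
```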
// Standard Error
SE = √(Σ residuals² / (n - 2))
Used for confidence intervals of predictions
// Correlation vs Causation
Correlation ≠ Causation
Control for confounders
Consider reverse causation
Use experiments when possible
// Multiple Regression
y = a + b1×x1 + b2×x2 + ... + ε
Partial slopes: Effect of x1 controlling for others
// Assumptions
Linearity: Relationship is linear
Normality: Errors are normally distributed
Homoscedasticity: Constant variance
Independence: Observations are independent