PMF — Probability Mass Function
A function that gives the probability that a discrete random variable takes on exactly a specific value. Denoted $P(X = x)$ or $f_X(x)$.
Key properties: all probabilities are ≥ 0, sum to 1 over all possible values. Used for: Bernoulli, Binomial, Poisson, Geometric, Negative Binomial.
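A minimal sketch of those two properties, using a Binomial PMF as the example. The parameters `n = 10` and `p = 0.3` are illustrative choices, not values from the text.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative parameters (assumed for this sketch)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                      # should equal 1 up to rounding
all_nonnegative = all(v >= 0 for v in pmf)
print(all_nonnegative, round(total, 12))
```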
PDF — Probability Density Function
A function that describes the relative likelihood of a continuous random variable. The probability of falling in an interval is the area under the curve. Denoted $f_X(x)$.
$P(a \le X \le b) = \int_a^b f(x)\,dx$. Key: individual point probabilities are zero. Used for: Normal, Exponential, Gamma, Uniform, Beta, Pareto, Weibull, Log-Normal.
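To make "probability is area under the curve" concrete, the sketch below integrates an Exponential PDF numerically and compares it with the closed form $e^{-\lambda a} - e^{-\lambda b}$. The rate and interval are assumed values, and the midpoint rule is just one simple way to approximate the integral.

```python
from math import exp

def exponential_pdf(x: float, rate: float) -> float:
    """Density of Exponential(rate); zero for negative x."""
    return rate * exp(-rate * x) if x >= 0 else 0.0

def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

rate, a, b = 0.5, 1.0, 3.0  # illustrative values (assumed)
area = integrate(lambda x: exponential_pdf(x, rate), a, b)
exact = exp(-rate * a) - exp(-rate * b)   # closed-form P(a <= X <= b)
print(f"numeric {area:.6f}, exact {exact:.6f}")
```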
CDF — Cumulative Distribution Function
The probability that a random variable is less than or equal to a value $x$. Denoted $F_X(x) = P(X \le x)$. Works for both discrete and continuous.
Always non-decreasing, goes from 0 to 1. For continuous: $f(x) = F'(x)$. For discrete: $F(x) = \sum_{k \le x} P(X = k)$.
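The discrete formula can be sketched directly: sum the PMF over all values up to $x$. A fair six-sided die is used here as an assumed example; exact fractions avoid rounding noise.

```python
from fractions import Fraction

# PMF of a fair die (example distribution chosen for this sketch)
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x) -> Fraction:
    """F(x) = sum of P(X = k) over all k <= x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

# The CDF is a step function: flat between jumps, 0 below the support, 1 above it.
for x in (0, 3, 3.5, 6, 10):
    print(x, cdf(x))
```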
Expected Value (Mean)
The probability-weighted average of all possible values. Think: "what's the long-run average outcome?" Denoted $E[X]$ or $\mu$.
Linearity: $E[aX + bY] = aE[X] + bE[Y]$ whenever the expectations exist; no independence is needed. This is one of the most powerful tools on Exam P.
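A small check that linearity holds even for strongly dependent variables. Here $X$ is a fair die roll and $Y = X^2$ (an assumed example), so $Y$ is completely determined by $X$.

```python
from fractions import Fraction

outcomes = range(1, 7)        # fair die (assumed example)
prob = Fraction(1, 6)

def expect(g) -> Fraction:
    """E[g(X)] for X uniform on 1..6."""
    return sum(prob * g(x) for x in outcomes)

a, b = 2, 3                   # arbitrary constants for the illustration
lhs = expect(lambda x: a * x + b * x**2)                    # E[aX + bY] with Y = X^2
rhs = a * expect(lambda x: x) + b * expect(lambda x: x**2)  # aE[X] + bE[Y]
print(lhs, rhs)
```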
Variance
Measures the spread of a distribution — how far values typically sit from the mean. Denoted $\operatorname{Var}(X)$ or $\sigma^2$.
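Standard deviation $\sigma = \sqrt{\operatorname{Var}(X)}$. For independent variables: $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$. For the Poisson, the mean and variance are always equal: $E[X] = \operatorname{Var}(X) = \lambda$. This is a handy identifying feature, since other standard distributions match mean and variance only for particular parameter values.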
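The additivity rule for independent variables can be verified by brute force. The sketch enumerates two independent fair dice (an assumed example) and compares the variance of the sum with the sum of the variances.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def variance(values_with_probs) -> Fraction:
    """Var = E[(V - mean)^2] for a finite list of (value, probability) pairs."""
    pairs = list(values_with_probs)
    mean = sum(p * v for v, p in pairs)
    return sum(p * (v - mean) ** 2 for v, p in pairs)

var_one = variance((x, Fraction(1, 6)) for x in faces)
# Independence means every (x, y) pair has probability 1/6 * 1/6 = 1/36.
var_sum = variance((x + y, Fraction(1, 36)) for x, y in product(faces, faces))
print(var_one, var_sum)
```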
MGF — Moment Generating Function
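A function that, when it exists in a neighborhood of $t = 0$, encodes all moments (mean, variance, skewness, etc.) of a distribution. Denoted $M_X(t) = E[e^{tX}]$.
The $k$-th derivative at $t = 0$ gives $E[X^k]$. Key use: if two distributions have MGFs that agree on an open interval around 0, they are the same distribution. For independent variables, $M_{X+Y}(t) = M_X(t)\,M_Y(t)$, which is how distributions of sums are identified. The MGF is also used to prove the CLT.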
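As a worked illustration, take $X \sim \text{Poisson}(\lambda)$. The Poisson MGF is a standard result, not something stated above.

$$
\begin{aligned}
M_X(t) &= \sum_{k=0}^{\infty} e^{tk}\,\frac{\lambda^k e^{-\lambda}}{k!}
        = e^{-\lambda}\sum_{k=0}^{\infty}\frac{(\lambda e^{t})^k}{k!}
        = e^{\lambda(e^{t}-1)} \\
M_X'(t) &= \lambda e^{t}\,e^{\lambda(e^{t}-1)}
        &&\Rightarrow\; E[X] = M_X'(0) = \lambda \\
M_X''(t) &= (\lambda e^{t} + \lambda^2 e^{2t})\,e^{\lambda(e^{t}-1)}
        &&\Rightarrow\; E[X^2] = M_X''(0) = \lambda + \lambda^2 \\
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 = \lambda
\end{aligned}
$$

For independent $X \sim \text{Poisson}(\lambda_1)$ and $Y \sim \text{Poisson}(\lambda_2)$, the MGF of the sum is the product $M_X(t)\,M_Y(t) = e^{(\lambda_1+\lambda_2)(e^{t}-1)}$, which is the MGF of a Poisson($\lambda_1 + \lambda_2$), so $X + Y$ is Poisson($\lambda_1 + \lambda_2$).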
Memoryless Property
The past has no effect on the future. Having "waited" so far doesn't change the distribution of remaining time or trials.
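Discrete: Geometric. Continuous: Exponential. These are the only distributions with this property (the Geometric among distributions on the positive integers, the Exponential among continuous distributions on the positive reals). If a lightbulb's lifetime is Exponential and it has been burning for 100 hours and still works, its remaining life is Exponential with the same rate as when it was new.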
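The property can be checked numerically from the survival functions: $P(X > s + t \mid X > s) = P(X > t)$. The rate, success probability, and waiting times below are assumed values chosen to echo the lightbulb example.

```python
from math import exp, isclose

# Continuous case: Exponential(rate), survival S(x) = exp(-rate * x)
rate, s, t = 0.01, 100.0, 50.0          # illustrative values (assumed)

def exp_survival(x: float) -> float:
    return exp(-rate * x)

exp_conditional = exp_survival(s + t) / exp_survival(s)   # P(X > s+t | X > s)
exp_fresh = exp_survival(t)                               # P(X > t)

# Discrete case: Geometric(p) counting trials until first success,
# survival P(N > k) = (1 - p)^k
p, m, n = 0.2, 4, 3                     # illustrative values (assumed)

def geo_survival(k: int) -> float:
    return (1 - p) ** k

geo_conditional = geo_survival(m + n) / geo_survival(m)   # P(N > m+n | N > m)
geo_fresh = geo_survival(n)                               # P(N > n)

print(isclose(exp_conditional, exp_fresh), isclose(geo_conditional, geo_fresh))
```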
Bernoulli Trial
A single experiment with exactly two outcomes: success (1) with probability $p$, failure (0) with probability $1-p$. The building block of many standard discrete distributions.
Independence is essential: the outcome of one trial must not affect another. Several discrete distributions build from here: Binomial = sum of $n$ independent Bernoullis, Geometric = number of trials until the first success, Poisson = limit of the Binomial as $n \to \infty$ and $p \to 0$ with $np = \lambda$ held fixed.
Poisson Process
A model for events that occur independently at a constant average rate $\lambda$. The number of events in time $t$ is Poisson($\lambda t$), and the time between events is Exponential($\lambda$).
Three key properties: (1) independent increments — what happens in disjoint time intervals is independent; (2) stationary increments — the distribution depends only on interval length, not start time; (3) no simultaneous events. Used everywhere in insurance: claims arrivals, accident occurrences, customer calls.
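A simulation sketch of the link between the two descriptions: generate Exponential($\lambda$) gaps between events and count how many land in $[0, t]$. If the model holds, the counts should have mean and variance near $\lambda t$. The rate, horizon, number of runs, and seed are all assumed values.

```python
import random

def count_events(rate: float, horizon: float, rng: random.Random) -> int:
    """Count arrivals in [0, horizon] by accumulating Exponential(rate) gaps."""
    clock, events = 0.0, 0
    while True:
        clock += rng.expovariate(rate)   # waiting time until the next event
        if clock > horizon:
            return events
        events += 1

rng = random.Random(2024)                # fixed seed so runs are repeatable
rate, horizon, runs = 2.0, 5.0, 2000     # illustrative values (assumed)
counts = [count_events(rate, horizon, rng) for _ in range(runs)]

sample_mean = sum(counts) / runs
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (runs - 1)
print(f"mean {sample_mean:.2f}, variance {sample_var:.2f}, target {rate * horizon}")
```

Both statistics should land close to 10 here, reflecting the Poisson($\lambda t$) count distribution.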
Central Limit Theorem
The standardized sum (or average) of independent, identically distributed random variables with finite variance approaches a Normal distribution as the number of terms grows, regardless of the original distribution's shape.
This is why the Normal appears so widely: sums of Bernoulli trials (Binomial) and sums of Exponentials (Gamma) look Normal for large $n$, as do averages from any distribution with finite variance. The CLT justifies using Normal approximations for large samples; it is the foundation of much of statistical inference and one of the most heavily used tools on Exam P.
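A sketch of a Normal approximation in practice: compare an exact Binomial probability with its Normal estimate. The values of `n`, `p`, and the cutoff are assumed, and the 0.5 continuity correction is a standard refinement added for this example rather than something stated above.

```python
from math import comb, erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, cutoff = 100, 0.5, 55   # illustrative values (assumed)

# Exact P(X <= cutoff) for X ~ Binomial(n, p)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff + 1))

# Normal approximation with mean np, variance np(1-p), and continuity correction
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = normal_cdf((cutoff + 0.5 - mu) / sigma)

print(f"exact {exact:.4f}, normal approximation {approx:.4f}")
```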
Law of Large Numbers
The sample average converges to the expected value as the sample size grows. The more data you have, the closer your observed average gets to the true mean.
This is why insurance works: with enough independent policies, the average claim cost converges to the expected cost, allowing the insurer to set premiums with confidence.
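A simulation sketch of that convergence. Claim sizes are modeled as Exponential with a mean of 1000, which is purely an assumption for illustration; the seed and portfolio sizes are arbitrary too.

```python
import random

rng = random.Random(7)                   # fixed seed so runs are repeatable
expected_claim = 1000.0                  # hypothetical mean claim size
claims = [rng.expovariate(1 / expected_claim) for _ in range(100_000)]

# The running average settles toward the expected value as the portfolio grows.
for size in (100, 10_000, 100_000):
    average = sum(claims[:size]) / size
    print(f"{size:>7} policies: average claim {average:8.2f}")

final_average = sum(claims) / len(claims)
```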
Independence
Two events are independent if the occurrence of one does not affect the probability of the other. For random variables: $P(X \le x, Y \le y) = P(X \le x) P(Y \le y)$ for all $x$, $y$.
Independence implies $\operatorname{Cov}(X,Y) = 0$, but the reverse isn't always true: zero covariance does not guarantee independence. When summing independent variables, variances add and MGFs multiply, which is why independence is so powerful in derivations.
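A standard counterexample for the reverse direction, sketched with exact fractions: let $X$ be uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The covariance is zero even though $Y$ is a function of $X$. The example itself is an assumption chosen for illustration.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Joint distribution of (X, Y) with Y = X^2, so Y is fully determined by X
joint = {(x, x * x): third for x in (-1, 0, 1)}

def expect(g) -> Fraction:
    """E[g(X, Y)] under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

p_x0 = sum(p for (x, _), p in joint.items() if x == 0)   # P(X = 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)   # P(Y = 0)
p_both = joint.get((0, 0), Fraction(0))                  # P(X = 0, Y = 0)

print(cov, p_both, p_x0 * p_y0)   # covariance is 0, yet the joint does not factor
```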
Conditional Probability
The probability of an event given that another event has occurred. Denoted $P(A \mid B) = P(A \cap B) / P(B)$, defined when $P(B) > 0$.
If A and B are independent, $P(A \mid B) = P(A)$. The Law of Total Probability: $P(A) = P(A \mid B)P(B) + P(A \mid B^c)P(B^c)$.
Bayes' Theorem
Updates the probability of a hypothesis given observed evidence. Connects $P(H \mid E)$ to $P(E \mid H)$.
On Exam P, Bayes' Theorem is often tested with the Law of Total Probability as the denominator. The Beta distribution's role as a conjugate prior for the Binomial is a direct application: $P(p \mid \text{data}) \propto P(\text{data} \mid p) P(p)$ with a nice closed-form result.
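A sketch of the typical exam pattern, with the Law of Total Probability in the denominator. The scenario is hypothetical: 20% of drivers are high-risk, high-risk drivers file a claim with probability 0.3, and the rest with probability 0.1.

```python
from fractions import Fraction

# Hypothetical inputs (assumed for this sketch)
p_high = Fraction(1, 5)                 # P(H): driver is high-risk
p_claim_given_high = Fraction(3, 10)    # P(E | H)
p_claim_given_low = Fraction(1, 10)     # P(E | H^c)

# Law of Total Probability: P(E) = P(E|H)P(H) + P(E|H^c)P(H^c)
p_claim = p_claim_given_high * p_high + p_claim_given_low * (1 - p_high)

# Bayes' Theorem: P(H | E) = P(E | H) P(H) / P(E)
p_high_given_claim = p_claim_given_high * p_high / p_claim

print(p_claim, p_high_given_claim)
```

Observing a claim raises the probability of high risk from 1/5 to 3/7 in this example.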
Conjugate Prior
A prior distribution that, when combined with the likelihood through Bayes' Theorem, yields a posterior distribution of the same family. The math stays clean — no numerical integration needed.
Key pairs on Exam P: Beta is conjugate to Binomial (posterior = Beta($\alpha + k$, $\beta + n - k$)). Gamma is conjugate to Poisson. This is why the Beta appears — it's the natural way to model an unknown probability.
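The Beta-Binomial update is simple enough to sketch in a few lines. The prior Beta(2, 2) and the data (7 successes in 10 trials) are assumed values, and the function names are hypothetical helpers.

```python
from fractions import Fraction

def beta_binomial_update(alpha: int, beta: int, successes: int, trials: int):
    """Posterior Beta parameters after observing `successes` in `trials`."""
    return alpha + successes, beta + trials - successes

def beta_mean(alpha: int, beta: int) -> Fraction:
    """Mean of Beta(alpha, beta): alpha / (alpha + beta)."""
    return Fraction(alpha, alpha + beta)

prior = (2, 2)                                    # assumed prior
posterior = beta_binomial_update(*prior, successes=7, trials=10)

print(posterior, beta_mean(*prior), beta_mean(*posterior))
```

The posterior mean sits between the prior mean (1/2) and the observed success rate (7/10), pulled toward the data as the sample grows.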
Hazard Rate (Force of Mortality)
The instantaneous rate of failure at time $t$, given survival up to $t$. Denoted $h(t) = f(t) / S(t)$, where $S(t) = 1 - F(t) = P(X > t)$ is the survival function.
Constant hazard → Exponential (memoryless). Increasing hazard → Weibull $k > 1$, Normal (aging/wear-out). Decreasing hazard → Weibull $k < 1$, Pareto (infant mortality). The hazard rate is the link between reliability theory and insurance.
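The three regimes can be sketched by computing $h(t) = f(t)/S(t)$ for a Weibull at two time points. The shape/scale parametrization, with survival function $S(t) = e^{-(t/\text{scale})^k}$, is an assumption of this sketch, as are the chosen shapes and times.

```python
from math import exp, isclose

def weibull_hazard(t: float, shape: float, scale: float = 1.0) -> float:
    """h(t) = f(t) / S(t) for a Weibull with shape k and the given scale."""
    z = (t / scale) ** shape
    density = (shape / scale) * (t / scale) ** (shape - 1) * exp(-z)  # f(t)
    survival = exp(-z)                                                 # S(t) = 1 - F(t)
    return density / survival

for shape in (0.5, 1.0, 2.0):
    early, late = weibull_hazard(1.0, shape), weibull_hazard(2.0, shape)
    if isclose(early, late):
        trend = "constant"      # shape 1 reduces to the Exponential
    elif late > early:
        trend = "increasing"
    else:
        trend = "decreasing"
    print(f"k = {shape}: h(1) = {early:.3f}, h(2) = {late:.3f} -> {trend}")
```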
Gamma Function $\Gamma(z)$
Generalizes the factorial to real (and complex) numbers. For $z > 0$: $\Gamma(z) = \int_0^\infty x^{z-1} e^{-x}\,dx$. For positive integer $n$: $\Gamma(n) = (n-1)!$.
Used in the Gamma distribution PDF, Beta function, and Chi-square distribution. $\Gamma(1) = 1$, $\Gamma(1/2) = \sqrt{\pi}$. Key identity: $\Gamma(z+1) = z\Gamma(z)$.
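These identities are easy to spot-check with `math.gamma` from Python's standard library. The test point `z = 3.7` is arbitrary.

```python
from math import factorial, gamma, isclose, pi, sqrt

# Gamma(n) = (n - 1)! for positive integers n
factorial_matches = all(isclose(gamma(n), factorial(n - 1)) for n in range(1, 8))

# Gamma(1/2) = sqrt(pi)
half_matches = isclose(gamma(0.5), sqrt(pi))

# Recursion: Gamma(z + 1) = z * Gamma(z), checked at an arbitrary non-integer z
z = 3.7
recursion_matches = isclose(gamma(z + 1), z * gamma(z))

print(factorial_matches, half_matches, recursion_matches)
```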
Beta Function $B(\alpha, \beta)$
A normalization constant that ensures the Beta PDF integrates to 1. Relates to the Gamma function.
$B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta) / \Gamma(\alpha+\beta)$. The Beta distribution's mean $\alpha/(\alpha+\beta)$ is intuitive — think "success count over total count."
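A sketch tying the pieces together: compute $B(\alpha, \beta)$ from Gamma functions, then check numerically that dividing by it makes the Beta PDF integrate to 1 and gives the stated mean. The parameters $\alpha = 2$, $\beta = 3$ and the midpoint-rule integrator are assumptions of this example.

```python
from math import gamma

def beta_function(a: float, b: float) -> float:
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def integrate(f, lo: float, hi: float, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))

alpha, beta = 2.0, 3.0                  # illustrative parameters (assumed)
norm = beta_function(alpha, beta)

def beta_pdf(x: float) -> float:
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / norm

area = integrate(beta_pdf, 0.0, 1.0)                     # should be close to 1
mean = integrate(lambda x: x * beta_pdf(x), 0.0, 1.0)    # close to alpha / (alpha + beta)
print(f"B = {norm:.6f}, area = {area:.6f}, mean = {mean:.6f}")
```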