Notes for reviewing my probability and statistics course at Northeastern.

Sample Spaces

Experiment
Repeatable procedure with a set of possible results

Sample Outcome
One of many possible results of an experiment

Sample Space
$S$: The set of all possible outcomes of an experiment

Event
$A \subseteq S$: zero or more outcomes of an experiment

$P(A) = \frac{n(A)}{n(S)}$ when all outcomes in $S$ are equally likely

Set Theory

Discrete Set
A finite or countable set, e.g. tossing a coin until a head is received: $S = \{ H, TH, TTH, \dots \}$

Continuous Set
A range of values, e.g. getting a number smaller than 1 when picking a real number between 0 and $\sqrt{2}$: $S = [0, \sqrt{2}]$, $A = [0, 1)$

$A \cap B = \{x | x \in A\ and\ x \in B\}$ for some events A, B

$A \cap B = \emptyset$. Also known as "mutually exclusive"

$A \cup B = \{x | x \in A\ or\ x \in B\}$

$A^{c} = \{x \in S | x \notin A\}$

DeMorgan's Law(s)

  1. $(A \cup B)^{c} = A^{c} \cap B^{c}$
  2. $(A \cap B)^{c} = A^{c} \cup B^{c}$
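Both laws are easy to sanity-check on small concrete sets. A minimal Python sketch; the sample space `S` and the events `A` and `B` are made-up values for illustration, not from the course:

```python
# Hypothetical sample space and events, chosen only for illustration.
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(event, space=S):
    """Complement of `event` relative to the sample space."""
    return space - event

# Law 1: (A u B)^c == A^c n B^c
law_1 = complement(A | B) == (complement(A) & complement(B))
# Law 2: (A n B)^c == A^c u B^c
law_2 = complement(A & B) == (complement(A) | complement(B))

print(law_1, law_2)
```

A check on one pair of sets is not a proof, but it is a fast way to catch a misremembered law.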

Probability Function

$P$ assigns a real number to any event of a sample space, and follows the following axioms for a sample space that's finite:

  • $P(A) \geq 0$ for all $A$
  • $P(S) = 1$
  • $A \cap B = \emptyset \implies P(A \cup B) = P(A) + P(B)$, or $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ otherwise
  • $P(\cup_{i=1}^{\infty} A_{i}) = \sum_{i=1}^{\infty} P(A_{i})$ if $A_1, A_2, A_3, \dots$ are mutually exclusive in $S$

Conditional Probability

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

$P(A \cap B) = P(B|A)P(A) = P(A|B)P(B)$

Solving conditional probability: make a tree! Fork at each choice, recording the probability on each branch; the probability of each "leaf" is the product of the probabilities along its path. Then use the leaves to assess a series of events.


$A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$; put another way, $P(A|B) = P(A) \iff P(B|A) = P(B)$.

For more than two sets, every combination must factor:

  • $P(A \cap B \cap C) = P(A)P(B)P(C)$
  • $P(A \cap B) = P(A)P(B)$, $P(A \cap C) = P(A)P(C)$, $P(B \cap C) = P(B)P(C)$ and so on…
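The two-event test $P(A \cap B) = P(A)P(B)$ can be checked by brute-force counting. A sketch in Python, assuming a made-up experiment (two fair dice) and two hypothetical events that are not from the course:

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: roll two fair dice; all 36 outcomes equally likely.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = n(event) / n(S) for equally likely outcomes."""
    favorable = [s for s in S if event(s)]
    return Fraction(len(favorable), len(S))

def first_even(s):
    return s[0] % 2 == 0

def sum_is_7(s):
    return s[0] + s[1] == 7

p_a = prob(first_even)
p_b = prob(sum_is_7)
p_ab = prob(lambda s: first_even(s) and sum_is_7(s))

# Independent exactly when the joint probability factors.
independent = (p_ab == p_a * p_b)
print(p_a, p_b, p_ab, independent)
```

Exact fractions are used so the equality test is not thrown off by floating-point rounding.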



$S_{n} = \sum_{k=0}^{n}r^{k} = 1 + r + r^{2} + \dots + r^{n}$

For $-1 < r < 1$, the sum converges: $S \equiv S_{\infty} = \sum_{k=0}^{\infty}r^{k} = \frac{1}{1 - r}$

PDF (geometric distribution): $p_{Y}(k) = (1 - p)^{k - 1}p$ for $k = 1, 2, 3, \dots$
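A numeric check of the geometric series and of the pdf above summing to 1. This is just a sketch; the values of `r`, `n`, and `p` are arbitrary picks for illustration:

```python
def partial_sum(r, n):
    """S_n = 1 + r + r^2 + ... + r^n, summed term by term."""
    return sum(r**k for k in range(n + 1))

def geometric_pdf(k, p):
    """P(first success on trial k) = (1 - p)^(k - 1) * p."""
    return (1 - p) ** (k - 1) * p

r, n = 0.5, 50
closed_form = (1 - r ** (n + 1)) / (1 - r)   # finite-sum formula
limit = 1 / (1 - r)                          # infinite-sum limit for |r| < 1
print(partial_sum(r, n), closed_form, limit)

p = 0.3
total = sum(geometric_pdf(k, p) for k in range(1, 200))
print(total)  # very close to 1; the tail past k = 199 is negligible
```

The pdf sums to 1 precisely because it is $p$ times a geometric series with $r = 1 - p$.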


Probability Distributions

$\binom{n}{r} = {}_{n}C_{r} = \frac{n!}{(n - r)! \, r!}$

Binomial Distribution

Given $n$ independent trials with two outcomes and a constant P(success) for each trial, $P(k) = \binom{n}{k} p^{k}(1 - p)^{n - k}$ for $k = 0, 1, 2, \dots, n$.

PDF: $P(X = k) = \binom{n}{k} p^{k}(1 - p)^{n - k}$

  • calculate: use binompdf

CDF: $P(X \leq t) = \sum_{k=0}^{t}\binom{n}{k} p^{k}(1 - p)^{n - k}$

  • calculate: use binomcdf

Expected Value: $E(X) = np$
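Away from the calculator, the same pdf and cdf can be written directly from the formulas. A sketch in Python; `binom_pdf` and `binom_cdf` are names made up here to mirror binompdf and binomcdf, and the values of `n` and `p` are arbitrary:

```python
from math import comb

def binom_pdf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(n, p, t):
    """P(X <= t): add up the pdf for k = 0..t."""
    return sum(binom_pdf(n, p, k) for k in range(t + 1))

n, p = 10, 0.5
print(binom_pdf(n, p, 5))   # 252/1024
print(binom_cdf(n, p, 5))   # 638/1024

# The weighted sum of k over the pdf should match E(X) = np.
mean = sum(k * binom_pdf(n, p, k) for k in range(n + 1))
print(mean, n * p)
```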

Bernoulli Distribution

Essentially a Binomial Distribution where $n = 1$. $P(k) = \binom{1}{k} p^{k}(1 - p)^{1 - k}$ for $k = 0, 1$.

Hypergeometric Distribution

$P(x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$

  • Random selection of $n$ items without replacement from a set of $N$ items
  • Not guaranteed P(success) stays constant.
  • $k$ items are successes; $N - k$ items are failures.

This can be considered a generalization of the Binomial Distribution.
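The hypergeometric pdf translates directly into code. A sketch in Python, using the same symbols as the formula ($N$, $k$, $n$, $x$); the specific numbers are invented for illustration:

```python
from math import comb

def hypergeom_pdf(x, N, k, n):
    """P(exactly x successes) when drawing n items without replacement
    from N items, of which k are successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Hypothetical setup: 20 items, 5 of them successes, draw 4.
N, k, n = 20, 5, 4
probs = [hypergeom_pdf(x, N, k, n) for x in range(n + 1)]
print(probs)
print(sum(probs))  # the pdf over all possible x should total 1
```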

Uniform Distribution

All values in a range are equally likely. For some interval $[a, b]$:

PDF: $\frac{1}{b - a}$ for $a \leq x \leq b$

CDF: $\frac{x - a}{b - a}$ for $a \leq x \leq b$

Exponential Distribution

PDF: $f_{X}(x) = \lambda e^{-\lambda x}$ for $x \geq 0$

CDF: $F_{X}(x) = 1 - e^{-\lambda x}$ for $x \geq 0$

$E(X) = \frac{1}{\lambda}$

$Var(X) = \frac{1}{\lambda^{2}}$

Poisson Distribution

Counts the number of occurrences per unit of measurement - over a specific period of time, a specific area, or volume, etc…

The probability of an event occurring in a unit of measurement must be the same for all similar units; for example, if the unit of measurement is a month, then the probability must be the same for all months.

Poisson is often used to approximate the binomial distribution: given that $n$ is large ($n \geq 100$) and $p$ is small, we can let $\lambda = np$! In other words, $\lambda := np$ -> (average number of occurrences per unit) * (length of observation period)

Use the poissonpdf and poissoncdf calculator functions to approximate this.

PDF: $p_{X}(k) = P(X = k) := \frac{\lambda^{k}e^{-\lambda}}{k!}$

$E(X) = \lambda$

$Var(X) = \lambda$
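To see how close the Poisson approximation to the binomial gets, both pdfs can be compared side by side. A sketch in Python; `n` and `p` are arbitrary values picked to satisfy "large $n$, small $p$", and the function names are made up here:

```python
from math import comb, exp, factorial

def binom_pdf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pdf(lam, k):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

n, p = 1000, 0.003
lam = n * p  # lambda := np

for k in range(7):
    print(k, round(binom_pdf(n, p, k), 5), round(poisson_pdf(lam, k), 5))

# Largest pointwise gap between the two pdfs over k = 0..10.
max_gap = max(abs(binom_pdf(n, p, k) - poisson_pdf(lam, k)) for k in range(11))
print(max_gap)
```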

Normal Distribution

Density Functions

Discrete Random Variable
Some $X$ such that:

  • $X: S \rightarrow \mathbb{R}$
  • The range of $X$ is a countable subset of $\mathbb{R}$
  • Motivation: Constrain the sample space to a smaller sample space, using a single variable to represent each outcome we're investigating. If we're looking at pairs of numbers, for example, and we only care about the sum of the pair, then we consider (1, 2) and (2, 1) to be the same outcome.

Continuous random variable
Like discrete, but ranges over a continuous interval of $\mathbb{R}$ instead. Two continuous random variables are independent if some functions, $g(x)$ and $h(y)$, exist such that:

  • $f_{X,Y}(x,y) = g(x)h(y)$
  • $f_{X}(x) = g(x)$
  • $f_{Y}(y) = h(y)$

In other words, one should be able to multiply the marginal pdfs to produce the joint pdf for the two variables, and vice versa.


(Probability Density Function)


For every $X$, a probability density function (pdf) looks like: $p_{x}(k) = P(X = k) := P(\{s \in S | X(s) = k\})$, where $p_{x}(k): \mathbb{R} \rightarrow \mathbb{R}$. Here, $s$ and $S$ are from the original sample space we're sampling from.


Some $f_{x}(x)$ satisfying:

  1. $f_{x}(x) \geq 0$
  2. $\int_{-\infty}^{\infty}f_{x}(x) dx = 1$

$P(a \leq X \leq b) = \int_{a}^{b}f_{x}(x)dx$

Joint PDF

$p_{X,Y}(x,y) := P(X = x, Y = y)$, satisfying:

  1. $p_{X,Y}(x,y) \geq 0$
  2. $\sum_{all x} \sum_{all y} p_{X,Y}(x,y) = 1$
  1. Discrete

    Given the joint pdf of $X$ and $Y$, the marginal pdfs of $X$ and $Y$ are:

    • $p_{X}(x) = \sum_{all y}p_{X,Y}(x,y)$
    • $p_{Y}(y) = \sum_{all x}p_{X,Y}(x,y)$
  2. Continuous

    $P((X,Y) \in R) = \int \int_{R}f_{X,Y}(x,y) dx dy$

    When solving - identify the bound that's dependent on the other! It can really help to plot out some 2D plane, then graph the relationship between the two continuous random variables. From this graph it's often fairly easy to identify, and thus estimate, the area we're investigating; this guides how to set up the bounds of the integral. Absolutely worth working through some practice problems.
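For the discrete case, computing marginal pdfs amounts to summing rows and columns of the joint table. A sketch in Python; the joint pdf below is a made-up table, not one from the course:

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pdf p_{X,Y}(x, y), keyed by (x, y).
joint = {
    (0, 0): Fraction(1, 8),
    (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8),
    (1, 1): Fraction(2, 8),
}

p_x = defaultdict(Fraction)  # marginal of X: sum over all y
p_y = defaultdict(Fraction)  # marginal of Y: sum over all x
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

print(dict(p_x))
print(dict(p_y))
```

Each marginal should itself sum to 1, which is a quick check that the joint table was entered correctly.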


Cumulative Distribution Function


$F_{x}(t) = P(X \leq t) := P(\{s \in S | X(s) \leq t\})$, where $F_{x}(t): \mathbb{R} \rightarrow \mathbb{R}$. Generally, $F_{x}(t) = \int_{-\infty}^{t}p_{x}(u) du$; the pdf represents the probability of the discrete random variable being a specific value, while the cdf represents the probability of all outcomes at or below some upper bound $t$.


$F_{x}(x) = P(X \leq x) = \int_{-\infty}^{x}p_{x}(u)du$

$P(a \leq X \leq b) = F_{X}(b) - F_{X}(a)$ (for continuous $X$)

Expected Value

A generalization of the concept of "average". The name's on the tin - it's a value that represents the proportionally weighted, expected result.

For example, if I have a 5% chance at \$100 and a 95% chance at \$20, the expected value would be $100 * 0.05 + 0.95 * 20$, so \$24.


$E(X) = \sum_{all \ k} k * p_{X}(k)$


$E(X) = \int_{-\infty}^{\infty} x * p_{X}(x) dx$
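The \$100 / \$20 example is the discrete formula applied to a two-entry pdf. A small Python sketch; `expected_value` is a helper name made up for illustration:

```python
def expected_value(pdf):
    """E(X) = sum of k * p_X(k) over every value k in the pdf."""
    return sum(k * p for k, p in pdf.items())

# Payout -> probability, taken from the example: 5% at $100, 95% at $20.
payout_pdf = {100: 0.05, 20: 0.95}

print(expected_value(payout_pdf))  # about 24
```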

Other Properties

$E(aX + bY) = aE(X) + bE(Y)$ for any random variables $X, Y$ and numbers $a$ and $b$. Separately, if $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$.

Sample Mean

Denoted as $\bar{X}$: $\bar{X} = \frac{1}{n}(X_{1} + X_{2} + \dots + X_{n})$

Median

"Middle number" of the distribution, or the average of the two middle numbers if the cardinality is even; the standard definition.


$m$ such that $\int_{-\infty}^{m}f_{Y}(y) dy = 0.5$. Finding the median:

  1. Integrate and substitute.
  2. Factor in terms of $m$ and solve for $m$.
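As an illustration of the two steps, take an exponential density: integrating gives $1 - e^{-\lambda m} = 0.5$, so $m = \frac{\ln 2}{\lambda}$. The Python sketch below cross-checks that closed form against a numeric search on the cdf; the value of `lam` and the helper names are assumptions made for this example:

```python
from math import exp, log

def exp_cdf(m, lam):
    """Integral of lam * e^(-lam * y) from 0 to m."""
    return 1 - exp(-lam * m)

def median_by_bisection(cdf, lo, hi, tol=1e-10):
    """Find m with cdf(m) = 0.5, assuming cdf is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lam = 2.0
m_exact = log(2) / lam
m_numeric = median_by_bisection(lambda m: exp_cdf(m, lam), 0.0, 10.0)
print(m_exact, m_numeric)
```

The numeric search is also a fallback when the integral is easy but the resulting equation in $m$ is awkward to solve by hand.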


Variance
A measure of how far the distribution spreads from its mean. $Var(X) := E((X - \mu)^{2})$, where:

  • $\mu = E(X)$ is the mean of $X$
  • $\sigma := \sqrt{Var(X)}$ is the standard deviation of $X$

If $X$ and $Y$ are independent, then: $Var(aX + bY) = a^{2}Var(X) + b^{2}Var(Y)$

In general: $Var(aX + bY) = a^{2}Var(X) + b^{2}Var(Y) + 2abCov(X,Y)$


where $Cov(X, Y)$, the covariance of X and Y, is: $Cov(X,Y) := E(XY) - E(X)E(Y)$

If $X$ and $Y$ are independent, then $Cov(X,Y) = 0$.


$Corr(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}$

A measurement of correlation; if positive, then the two random variables increase together; if negative, one increases while the other decreases and vice versa.
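Covariance and correlation can be computed straight from a joint pdf using $Cov(X,Y) = E(XY) - E(X)E(Y)$. A sketch in Python; the joint table is invented for illustration and `expect` is a hypothetical helper:

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint pdf p_{X,Y}(x, y), keyed by (x, y).
joint = {
    (0, 0): Fraction(1, 8),
    (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8),
    (1, 1): Fraction(2, 8),
}

def expect(g):
    """E(g(X, Y)) = sum of g(x, y) * p(x, y) over the joint pdf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)

var_x = expect(lambda x, y: x * x) - e_x**2
var_y = expect(lambda x, y: y * y) - e_y**2

cov = e_xy - e_x * e_y
corr = float(cov) / sqrt(float(var_x * var_y))
print(cov, corr)
```

Here the covariance comes out negative, so in this made-up table larger $X$ tends to go with smaller $Y$.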

Double Integrals

Fubini's Theorem: $\int_{a}^{b} \int_{c}^{d} f(x,y) dy dx = \int_{c}^{d} \int_{a}^{b} f(x,y) dx dy$, given $a \leq x \leq b$ and $c \leq y \leq d$

To identify the region in $\mathbb{R}^{2}$ to integrate over, use the inside first. Treat the variable of the unevaluated (outer) integral as a constant while integrating the inner one. Often one variable is much easier to integrate than the other; pick the right one to use! This takes practice.

TODO Problems to Practice


Graphing PDF, CDF

Converting between the two, esp. with continuous piecewise

what's with that variance theorem and E(g(X))? practice those problems.

there are some good exercises for deriving expected value and variance available in the textbook - review these!

do the exam

3.7 3.9 4.x 5.x, especially the problem i missed on the last exam

exam problems and how exactly to approach them

the rest of the 6s and 7s, though i can probably wing those just

Types of Problems

Exam 1

Probabilities and Sets

e.g. $P(A), P(A \cup B), P(A \cap B^{c})$ -> find something

Draw things out as Venn diagrams to help visualize

Nice properties:

  • $P(A \cup B) = P(A \cap B^{c}) + P(A \cap B) + P(A^{c} \cap B)$
  • Bayes: $P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}$

Simple Probabilities

Generally, use some arrangement of $nCr$; for integer counts, P(at most x) = 1 - P(at least x + 1), and vice versa: P(at least x) = 1 - P(at most x - 1)

Conditional Probabilities with Scenarios

Draw a tree:

  • At the root, no branch is chosen
  • After the root, choose what info you have more about: e.g. if you're looking for P(Defective | built at plant X), draw a tree with the first set of descendant nodes representing the plant at which the thing was constructed, then from there the chances that the product was defective given the plant branching off of each.

    From here, can use Bayes rule to fill out the tree, then find the desired probability.
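The tree can be mirrored in a few lines of code: first-level branches are the plants, second-level branches are "defective given plant", and each leaf is the product along its path. A Python sketch; the plant names and every probability are invented for illustration:

```python
from fractions import Fraction

# Hypothetical first-level branches: P(built at plant).
p_plant = {"X": Fraction(3, 5), "Y": Fraction(2, 5)}
# Hypothetical second-level branches: P(defective | plant).
p_defective_given = {"X": Fraction(1, 50), "Y": Fraction(1, 20)}

# Leaves: P(plant and defective) = P(plant) * P(defective | plant).
leaves = {plant: p_plant[plant] * p_defective_given[plant] for plant in p_plant}

# Add the defective leaves to get P(defective).
p_defective = sum(leaves.values())

# Bayes: P(plant X | defective) = leaf for X / P(defective).
p_x_given_defective = leaves["X"] / p_defective
print(p_defective, p_x_given_defective)
```

Reading the tree "backwards" like this is the usual way to reverse the direction of the conditioning.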

Exam 2

distributions to know: binomial,

Finding Mean, stdev, sample from given problem

$\mu = np$, $\sigma = \sqrt{np(1-p)}$

$P(X=a) = \binom{n}{a} p^{a}(1-p)^{n-a}$ ~ binompdf w/ n, p, a

Finding P in range: $P(a < X \leq b) = binomcdf(n, p, b) - binomcdf(n, p, a)$ gives the probability for that range!

Find cdf of discrete set of scenarios

  1. Write out each possible scenario and associated value of random variable
  2. Use P(scenario) * (num occurrences of scenario) for each random variable value to compute some P(X=res) for each
  3. Assemble into table; P(X=k) for values x=1, x=2, x=3 for example.


Integrate when converting to cdf. Outside the range provided for the pdf, must state that the value of the cdf is 0 before and 1 following the range; otherwise the cdf isn't defined outside of it, but a cdf should support any input.



Find E, Var given density function

  1. $E(X) = \int_{a}^{b} x f_{X}(x) dx$ over provided range $a \leq x \leq b$
  2. $Var(X) = E(X^{2}) - (E(X))^{2}$

    1. where $E(X^{2}) = \int_{a}^{b} x^{2} f_{X}(x) dx$
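The two steps above can be checked numerically. A Python sketch using a plain midpoint rule; the density $f_{X}(x) = 2x$ on $[0, 1]$ is a made-up example and `integrate` is a hypothetical helper, not a library function:

```python
def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Hypothetical density: f(x) = 2x on [0, 1]."""
    return 2 * x

a, b = 0.0, 1.0
total = integrate(f, a, b)                     # should be 1 for a valid pdf
e_x = integrate(lambda x: x * f(x), a, b)      # E(X), exact value 2/3
e_x2 = integrate(lambda x: x**2 * f(x), a, b)  # E(X^2), exact value 1/2
var_x = e_x2 - e_x**2                          # exact value 1/18
print(total, e_x, var_x)
```

Checking that the density integrates to 1 first catches a wrong range or a missing constant before it contaminates $E$ and $Var$.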

Piecewise CDF -> PDF, PDF -> CDF

derivative, integral respectively of each function provided over each range

Joint PDF, CDF

  1. Find marginal pdfs for each; for x, integrate by dy, and for y integrate by dx
  2. Set up integral for $E(XY)$; this should be some $\int_{c}^{d} \int_{a}^{b} xyf_{XY}(x,y) dx dy$ for bounds $a \leq x \leq b$ and $c \leq y \leq d$. If bounds overlap, reference the relationship between them (e.g. $0 < y < x < 1 \implies a=y, b=1, c=0, d=1$), as the bounds of one are dependent on the bounds for the other.
  3. Find the covariance $Cov(X,Y) := E(XY) - E(X)E(Y)$. Use the marginal PDFs and integrate to find corresponding $E$, then follow the formula

finding c for some pdf with constant to solve for

TODO Combining Random Variables

i.e. test 2: 13, 14

Exam 3

use normal distribution with continuity correction to estimate probability in bounds

  1. find mean ($\mu$) and standard deviation ($\sigma$) of the provided scenario for the sample mean; note that $\sigma(\bar{x}) = \frac{\sigma}{\sqrt{n}}$
  2. state that the normal approximation can be used if given "normal approximation" or $n \geq 30$; "using Central Limit Theorem" if using this approximation, where $\bar{X} \approx N(\mu, \frac{\sigma}{\sqrt{n}})$

    1. TODO when to use $\frac{\sigma^{2}}{n}$?
  3. continuity correction: "round up or down" to the nearest 0.5 so that the interval encapsulates the intended population. e.g. if the interval is $\leq$, it's necessary to ensure the interval will encapsulate the upper and lower bound
  4. apply normalcdf: $normalcdf(lower, upper, \mu, \sigma)$; with the sample mean, use $\sigma(\bar{x}) = \frac{\sigma}{\sqrt{n}}$ instead. Use $10^{99}$ to replace upper and lower bounds (negative for low) as needed to fill in provided open P(..) intervals.
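A sketch of the continuity correction in Python, using `statistics.NormalDist` from the standard library in place of normalcdf. The scenario (a binomial count with $n = 100$, $p = 0.5$, and the interval $45 \leq X \leq 55$) is made up so the approximation can be compared against the exact answer:

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical scenario: X ~ Binomial(100, 0.5), estimate P(45 <= X <= 55).
n, p = 100, 0.5
mu = n * p
sigma = sqrt(n * p * (1 - p))

lower, upper = 45, 55
normal = NormalDist(mu, sigma)

# Continuity correction: widen each inclusive bound by 0.5.
approx = normal.cdf(upper + 0.5) - normal.cdf(lower - 0.5)

# Exact binomial probability, for comparison only.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lower, upper + 1))
print(approx, exact)
```

Dropping the two 0.5 adjustments gives a visibly worse estimate, which is the whole reason for the correction.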

solve for interval given resultant probability

  1. Investigate the interval; sketch it out relative to a normal distribution. If it's two tailed, mark that
  2. Finding the bound asked for: $invNorm(\text{area before interval}, \bar{x}, \sigma(\bar{x}))$ provides such a bound.
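In Python, `statistics.NormalDist.inv_cdf` plays the role of invNorm. A sketch under invented numbers ($\mu = 100$, $\sigma = 15$, $n = 36$), showing both a one-tailed bound and a two-tailed pair:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical scenario for the sample mean.
mu, sigma, n = 100, 15, 36
sigma_xbar = sigma / sqrt(n)          # sigma / sqrt(n) = 2.5
dist = NormalDist(mu, sigma_xbar)

# One-tailed: the bound with 95% of the area before it.
upper_bound = dist.inv_cdf(0.95)

# Two-tailed 95%: 2.5% of the area in each tail.
low, high = dist.inv_cdf(0.025), dist.inv_cdf(0.975)
print(upper_bound, low, high)
```

For the two-tailed case the area passed in is the area to the left of each bound, so the tails have to be split before calling it.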

probabilities with poisson dist.

finding confidence interval

maximum likelihood estimation

  1. Find $L(\theta) = \prod_{i=1}^{n} f_{X}(x_{i}; \theta)$

Exam 4

unbiased estimators, variance

Some $\hat{\theta}$ is an unbiased estimator for $\theta$ if $E(\hat{\theta}) = \theta$, so:

  1. Set up the expected value formula: pull out constants, and take the expected value of the random variables used to calculate the new random variable
  2. Substitute based on what's provided for these existing random variables; e.g. if $E(X) = E(Y) = \theta$, can substitute theta for the expected value of each when calculating
  3. if original value is reached after evaluating, then it's an unbiased estimator!


examine the calculation for the random variable, squaring the constants and taking the variance of the random variables used to calculate it

find confidence interval

  1. T test

    1. State facts: S, df, $\alpha$
    2. calculate $t_{\frac{\alpha}{2}, df}$ with $invT(1 - \frac{\alpha}{2}, df)$.
    3. Find interval: $\bar{X} \pm t_{\frac{\alpha}{2}, df} \cdot \frac{s}{\sqrt{n}}$; can use TInterval
  2. Z test

statistical test

critical values and errors

  1. find critical value in nonstandard form: typically $invNorm(\text{accept-area}, \text{average}, \frac{\sigma}{\sqrt{n}})$
  2. Find type 1 error: typically $\alpha$
  3. find type 2 error (given the real mean, $\mu(H_{a})$): $\beta = P(TypeII) = P(accept H_{0} | \mu = \mu(H_{a}))$ -> $normalcdf(b1, b2, \mu(H_{a}), \sigma(\bar{x}))$
  4. Find the power of the test: $Power = 1 - \beta$
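The four steps above can be sketched in Python with `statistics.NormalDist` standing in for invNorm and normalcdf. Everything numeric here is an assumption for illustration: a right-tailed test of $H_{0}: \mu = 100$ against a true mean of 105, with $\sigma = 15$, $n = 36$, $\alpha = 0.05$:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical right-tailed test: H0: mu = 100 vs Ha: mu > 100.
alpha = 0.05
mu_0, mu_a = 100, 105          # mu_a is the assumed "real" mean under Ha
sigma, n = 15, 36
sigma_xbar = sigma / sqrt(n)

# Step 1: critical value, with the accept area (1 - alpha) to its left under H0.
critical = NormalDist(mu_0, sigma_xbar).inv_cdf(1 - alpha)

# Step 2: Type I error is alpha by construction.
type_1 = alpha

# Step 3: Type II error = P(accept H0 | mu = mu_a) = P(sample mean < critical).
beta = NormalDist(mu_a, sigma_xbar).cdf(critical)

# Step 4: power of the test.
power = 1 - beta
print(critical, type_1, beta, power)
```

For a left-tailed or two-tailed test the accept region changes, so the bounds fed into the Type II calculation have to change with it.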

Additional Material