Chapter 2 Probability and Distribution
This chapter covers fundamental concepts in probability and statistical distributions, including sample space, events, probability rules, conditional probability, sensitivity and specificity, discrete distributions, mean and variance, binomial distribution, uniform distribution, and normal distribution.
2.1 Sample Space
The sample space (denoted \(S\)) is the set of all possible outcomes of a random experiment.
- Example 1: Rolling a six-sided die.
- Sample space: \(S = \{1, 2, 3, 4, 5, 6\}\).
- Example 2: Waiting time for the next message from my best friend.
- Sample space: \(S = [0, \infty)\), the set of all nonnegative real numbers.
- Types:
- Finite: Limited outcomes (e.g., die roll).
- Infinite: Countable (e.g., number of coin flips until heads) or uncountable (e.g., time to failure of a machine).
2.2 Event
An event is a subset of the sample space, representing a specific outcome or set of outcomes. We use upper-case letters to denote events.
- Example: For a die roll, the event “rolling an even number” is \(E = \{2, 4, 6\}\).
- Types:
- Simple event: Single outcome (e.g., rolling a 3).
- Compound event: Multiple outcomes (e.g., rolling an even number).
- Events can be combined using:
- Union (\(A \cup B\)): At least one of \(A\) or \(B\) occurs.
- Intersection (\(A \cap B\)): Both \(A\) and \(B\) occur.
- Complement (\(A^c\)): Event \(A\) does not occur.
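The event operations above map directly onto set operations. The following is a minimal sketch using Python's built-in `set` type; the event `B` ("rolling at most 3") is an assumption chosen only for illustration.

```python
# Sample space and events for one roll of a six-sided die, as Python sets.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event: rolling an even number
B = {1, 2, 3}   # event: rolling at most 3 (illustrative choice)

union = A | B          # at least one of A or B occurs
intersection = A & B   # both A and B occur
complement_A = S - A   # A does not occur

print(sorted(union))         # [1, 2, 3, 4, 6]
print(sorted(intersection))  # [2]
print(sorted(complement_A))  # [1, 3, 5]
```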
2.3 Probability
Probability measures the likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain).
- Definition: For a sample space \(S\) with equally likely outcomes, the probability of event \(A\) is: \[P(A) = \frac{\text{Number of outcomes in } A}{\text{Total number of outcomes in } S}\]
- Example: For a die, if \(A\) represents the event of rolling an even number, then \[P(A)=\frac{3}{6} = 0.5\]
2.4 Basic Probability Rules
- Rule 1: Non-negativity: \(P(A) \geq 0\).
- Rule 2: Normalization: \(P(S) = 1\).
- Rule 3: Addition Rule: For mutually exclusive events \(A\) and \(B\): \[ P(A \cup B) = P(A) + P(B) \]
- Rule 4: General Addition Rule: For any events: \[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \] which reduces to \[ P(A \cup B) = P(A) + P(B) \] when events \(A\) and \(B\) are disjoint (i.e., cannot happen simultaneously).
- Rule 5: Complement Rule: \(P(A^c) = 1 - P(A)\).
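As a quick check, the rules can be verified numerically for a die roll. This is a sketch under the equally-likely-outcomes definition from Section 2.3; the helper `prob` and the event `B` are illustrative names, not part of the chapter.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of one die roll

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # rolling an even number
B = {1, 2, 3}   # rolling at most 3 (illustrative choice)

# Rule 2: normalization
p_S = prob(S)

# Rule 4: general addition rule, both sides computed separately
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)

# Rule 5: complement rule
p_not_A = prob(S - A)

print(p_S)       # 1
print(lhs, rhs)  # 5/6 5/6
print(p_not_A)   # 1/2
```

Using `Fraction` keeps the arithmetic exact, so both sides of the addition rule can be compared with `==` rather than with a tolerance.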
2.5 Conditional Probability
Conditional probability is the probability of an event \(A\) occurring given that event \(B\) has occurred, denoted \(P(A|B)\).
Formula: \[ P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \]
Example: In a deck of 52 cards, find the probability of drawing a heart given that the card is red.
- Red cards: 26 (13 hearts, 13 diamonds).
- \(P(\text{heart}~ | ~\text{red}) = \frac{P(\text{heart} ~\cap~ \text{red})}{P(\text{red})} = \frac{13/52}{26/52} = \frac{1}{2}\).
Independence: Events \(A\) and \(B\) are independent if: \[P(A \cap B) = P(A) \cdot P(B) \quad \text{or} \quad P(A|B) = P(A)\] where the second form requires \(P(B) > 0\).
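The card example and the independence condition can be reproduced by enumerating a deck. This is a minimal sketch; the helpers `prob` and `cond_prob` and the extra event `aces` are assumptions introduced for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))   # 52 equally likely cards

def prob(event):
    """P(event) for a single card drawn at random from the deck."""
    return Fraction(len(event), len(deck))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); requires P(b) > 0."""
    return prob(a & b) / prob(b)

hearts = {card for card in deck if card[1] == "hearts"}
red = {card for card in deck if card[1] in ("hearts", "diamonds")}
aces = {card for card in deck if card[0] == "A"}

p_heart_given_red = cond_prob(hearts, red)
print(p_heart_given_red)   # 1/2

# Independence check: P(A and B) == P(A) * P(B)
aces_hearts_independent = prob(aces & hearts) == prob(aces) * prob(hearts)
hearts_red_independent = prob(hearts & red) == prob(hearts) * prob(red)
print(aces_hearts_independent)  # True
print(hearts_red_independent)   # False
```

Knowing a card is a heart says nothing about whether it is an ace, so those events are independent; knowing a card is red changes the chance that it is a heart, so those events are not.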
2.6 Application of Conditional Probability: Sensitivity and Specificity
In medical testing, sensitivity and specificity evaluate a test’s accuracy.
- Sensitivity: Probability of a positive test result given the person has the disease (true positive rate). \[\text{Sensitivity} = P(\text{Positive} ~|~ \text{Disease})\]
- Specificity: Probability of a negative test result given the person does not have the disease (true negative rate). \[\text{Specificity} = P(\text{Negative} ~|~ \text{No Disease})\]
- Example: A test for a disease has sensitivity 0.95 and specificity 0.90.
- 95% of diseased individuals test positive.
- 90% of healthy individuals test negative.
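To make the two rates concrete, the sketch below computes them from test counts. The counts are hypothetical, chosen only so that they reproduce the sensitivity and specificity in the example; they are not data from the chapter.

```python
# Hypothetical counts for 1000 people with the disease and 1000 without.
true_pos, false_neg = 950, 50    # diseased: test positive / test negative
true_neg, false_pos = 900, 100   # healthy: test negative / test positive

# Sensitivity = P(Positive | Disease)
sensitivity = true_pos / (true_pos + false_neg)

# Specificity = P(Negative | No Disease)
specificity = true_neg / (true_neg + false_pos)

print(sensitivity)  # 0.95
print(specificity)  # 0.9
```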
2.7 Discrete Distribution
A random variable is a variable whose value is determined by the outcome of a random experiment. A discrete distribution describes the probabilities of the values of a discrete random variable, that is, a random variable that takes countably many values.
- Probability Mass Function (PMF): Gives the probability \(P(X = x)\) for each possible value \(x\).
- Properties:
- Each probability is between 0 and 1, and all probabilities must sum to 1.
- Example:
| Number of Complaints ($X$) | Number of Days Observed | Probability $P(X = x)$ |
|---|---|---|
| 0 | 15 | 0.50 |
| 1 | 9 | 0.30 |
| 2 | 4 | 0.13 |
| 3 | 2 | 0.07 |
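The probabilities in the table are the observed day counts divided by the total number of days (30). A short sketch of that calculation, with `days_observed` and `pmf` as illustrative variable names:

```python
# Number of days on which each complaint count was observed (from the table).
days_observed = {0: 15, 1: 9, 2: 4, 3: 2}

total_days = sum(days_observed.values())

# Estimated PMF: relative frequency of each value of X.
pmf = {x: n / total_days for x, n in days_observed.items()}

for x, p in pmf.items():
    print(x, round(p, 2))

# A valid PMF has probabilities in [0, 1] that sum to 1 (up to rounding).
is_valid = all(0 <= p <= 1 for p in pmf.values()) and abs(sum(pmf.values()) - 1) < 1e-12
print(is_valid)
```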
2.8 Mean and Variance of a Random Variable
Mean (Expected Value): Measures the central tendency of a discrete random variable \(X\). It is defined as the sum, over all possible values, of each value multiplied by its probability. In math notation, \[E(X) = \sum_x x \cdot P(X = x)\] where \(\sum\) denotes summation. \(E(X)\) is often denoted by the Greek letter \(\mu\) (read as “mu”). The mean tells you the average outcome you’d expect over many repetitions of an experiment.
Variance: Measures the spread of \(X\). \[\text{Var}(X) = \sum_x (x - \mu)^2 \cdot P(X = x)\]
- Standard deviation: \(\sigma = \sqrt{\text{Var}(X)}\). The Greek letter \(\sigma\) is read as “sigma”.
Example: For \(X\) = outcome of a die roll:
- \(E(X) = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6} = 3.5\). This means on average, you would get 3.5 if you rolled the die many times.
- \(\text{Var}(X) = (1 - 3.5)^2 \cdot \frac{1}{6} + (2 - 3.5)^2 \cdot \frac{1}{6} +\cdots + (6 - 3.5)^2 \cdot \frac{1}{6} \approx 2.9167\), so \(\sigma = \sqrt{2.9167} \approx 1.71\). Roughly speaking, a rolled number typically differs from the mean by about 1.71.
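The two formulas translate almost line for line into code. The sketch below recomputes the die example; the function names `mean` and `variance` are illustrative, and the PMF is stored as a dictionary mapping each value to its probability.

```python
from fractions import Fraction
from math import sqrt

# PMF of a fair die: each face has probability 1/6.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = mean(die_pmf)
var = variance(die_pmf)
sigma = sqrt(var)

print(mu)                # 7/2
print(var)               # 35/12
print(round(sigma, 2))   # 1.71
```

The exact variance 35/12 is the source of the rounded value 2.9167 in the example.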
2.9 Binomial Distribution
The binomial distribution models the number of successes in \(n\) independent trials, each with success probability \(p\).
Binomial probability formula: \[P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \ldots, n\] where \(\binom{n}{k} = \frac{n!}{k!\cdot (n - k)!}\) and \(n!\) denotes the product of the integers from 1 to \(n\) (for example, \(5! = 1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 = 120\)), with \(0! = 1\).
Mean: \(E(X) = n p\).
Variance: \(\text{Var}(X) = n p (1 - p)\).
Example: Flipping a fair coin 10 times (\(p = 0.5\)).
- Probability of exactly 3 heads: \[P(X = 3) = \binom{10}{3} (0.5)^3 (0.5)^7 = \frac{10!}{3!\cdot 7!} (0.5)^3 (0.5)^7 \approx 0.1172\]
- Probability of exactly 6 heads: \[P(X = 6) = \binom{10}{6} (0.5)^6 (0.5)^4 = \frac{10!}{6!\cdot 4!} (0.5)^6 (0.5)^4 \approx 0.2051\]
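The formula can be evaluated with the standard library's `math.comb`. This is a minimal sketch; `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5

p3 = binomial_pmf(3, n, p)
p6 = binomial_pmf(6, n, p)
print(round(p3, 4))   # 0.1172
print(round(p6, 4))   # 0.2051

# The mean computed from the PMF agrees with the shortcut n * p.
mean_from_pmf = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
print(mean_from_pmf, n * p)
```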
2.10 Uniform Distribution over a Finite Interval
If a continuous random variable \(X\) is equally likely to fall anywhere in the range between \(a\) and \(b\), we say that \(X\) has a uniform distribution over the interval \([a, b]\). The mean of \(X\) is the average of \(a\) and \(b\), that is, \((a+b)/2\). The standard deviation of \(X\) is \((b-a)/\sqrt{12}\).
Example: Suppose a delivery app says food will arrive between 20 and 40 minutes after ordering, with all times equally likely.
Mean wait time: (20 + 40) / 2 = 30 minutes.
Probability of arriving between 25 and 30 minutes: since all intervals of the same length are equally likely, this is (30 - 25)/(40 - 20) = 5/20 = 0.25, or 25%.
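The delivery example can be checked with a few lines of code. In this sketch, `uniform_prob` is an illustrative helper that returns the length of the requested interval (clipped to \([a, b]\)) divided by the length of \([a, b]\).

```python
from math import sqrt

def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b]."""
    lo, hi = max(lo, a), min(hi, b)   # keep only the part inside [a, b]
    return max(hi - lo, 0) / (b - a)

a, b = 20, 40   # delivery window in minutes

mean_wait = (a + b) / 2
sd_wait = (b - a) / sqrt(12)
p_25_to_30 = uniform_prob(25, 30, a, b)

print(mean_wait)          # 30.0
print(round(sd_wait, 2))  # 5.77
print(p_25_to_30)         # 0.25
```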
2.11 Normal Distribution
The normal distribution is one of the most important probability distributions in statistics. It describes many real-world phenomena where data clusters around a central average, with symmetrical spread around that center.
The normal distribution is a continuous distribution characterized by a bell-shaped curve, defined by mean \(\mu\) and standard deviation \(\sigma\).
- The Empirical Rule:
- Approximately 68% of data falls within 1 standard deviation of the mean.
- Approximately 95% within 2 standard deviations of the mean.
- Approximately 99.7% within 3 standard deviations of the mean.
- Example: IQ scores are normally distributed with \(\mu = 100\) and \(\sigma = 15\).
- Probability of IQ between 85 and 115 is approximately 0.68, since this range is within one standard deviation of the mean (\(100 \pm 15\)).
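The Empirical Rule can be compared with exact normal probabilities using `statistics.NormalDist` from the Python standard library. This is a sketch for the IQ example; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 100, 15
iq = NormalDist(mu=mu, sigma=sigma)

# Probability of falling within k standard deviations of the mean, k = 1, 2, 3.
within = {}
for k in (1, 2, 3):
    lo, hi = mu - k * sigma, mu + k * sigma
    within[k] = iq.cdf(hi) - iq.cdf(lo)
    print(k, lo, hi, round(within[k], 4))
```

The three printed probabilities are about 0.6827, 0.9545, and 0.9973, which the Empirical Rule rounds to 68%, 95%, and 99.7%.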
2.12 Conclusion
This chapter introduced key probability concepts and distributions. Understanding sample spaces, events, and probability rules forms the foundation. Conditional probability, sensitivity, and specificity are critical in applications like medical testing. Discrete distributions such as the binomial model countable outcomes, while continuous distributions such as the uniform and normal model outcomes measured on a continuous scale. Practice these concepts with real-world examples to deepen your understanding.
2.13 Exercises
Section 2.1-2.3: Sample Space, Events, and Probability
Exercise 1: Define the sample space of a random experiment and provide two examples: one with a finite sample space and one with an infinite sample space.
Solution:
A sample space is the set of all possible outcomes of a random experiment.
- Finite example: Flipping a coin: \(S = \{H, T\}\)
- Infinite example: Waiting time for a bus: \(S = [0, \infty)\)
Exercise 2: For rolling a six-sided die:
- Define a simple event
- Define a compound event
- If \(E = \{2, 4, 6\}\) and \(F = \{1, 2, 3\}\), find \(E \cup F\) and \(E \cap F\)
Solution:
- Simple event: Any single outcome (e.g., rolling a 3: \(\{3\}\))
- Compound event: Multiple outcomes (e.g., even numbers: \(\{2, 4, 6\}\))
- \(E \cup F = \{1, 2, 3, 4, 6\}\), \(E \cap F = \{2\}\)
Section 2.4: Basic Probability Rules
Exercise 3: A fair coin is flipped twice.
- List the sample space
- Calculate:
- Probability of exactly one head
- Probability of at least one tail
Solution:
- \(S = \{\text{HH, HT, TH, TT}\}\)
- Each of the four outcomes has probability 1/4, so:
- Probability of exactly one head: \(P(\{\text{HT, TH}\}) = 2/4 = 0.5\)
- Probability of at least one tail: \(P(\{\text{HT, TH, TT}\}) = 3/4 = 0.75\)
Section 2.5: Conditional Probability
Exercise 4: A class has 60% girls and 40% boys. 30% of girls and 20% of boys wear glasses.
- Use probability notation to denote the numbers 60% and 40%.
- Use conditional probability notation to denote the numbers 30% and 20%.
Solution:
- \(P(\text{Girls}) = 0.6, ~P(\text{Boys}) = 0.4\)
- \(P(\text{Glasses|Girls}) = 0.3, ~P(\text{Glasses|Boys}) = 0.2\)
Section 2.6: Sensitivity and Specificity
Exercise 5: A test has sensitivity = 0.95, specificity = 0.90. Disease prevalence = 2%.
Use probability or conditional probability notation to denote the numbers.
Solution:
\(P(\text{Disease}) = 0.02\)
\(P(\text{Positive}|\text{Disease}) = 0.95\)
\(P(\text{Negative}|\text{No Disease}) = 0.90\)
Section 2.7-2.8: Discrete Distributions
Exercise 6: Given the discrete distribution:
| \(x\) | \(P(X = x)\) |
|---|---|
| 1 | 0.2 |
| 2 | 0.5 |
| 3 | 0.3 |
- Verify validity.
- Calculate \(E(X)\) and \(\text{Var}(X)\).
Solution:
- Valid because every probability is between 0 and 1 and \(0.2+0.5+0.3 = 1\)
- \(E(X) = 1(0.2) + 2(0.5) + 3(0.3) = 2.1\)
- \(\text{Var}(X) = (1-2.1)^2(0.2) + (2-2.1)^2(0.5) + (3-2.1)^2(0.3) = 0.49\)
Section 2.9: Binomial Distribution
Exercise 7: A fair coin is flipped 5 times. Let \(X\) be the number of heads. Find:
- Probability of exactly 3 heads
- Mean and variance
Solution:
- \(P(X=3) = \binom{5}{3}(0.5)^5 = \frac{5!}{3!\cdot 2!}(0.5)^5 = 0.3125\)
- \(\mu = 5 \times 0.5 = 2.5\), \(\sigma^2 = 5 \times 0.5 \times 0.5 = 1.25\)
Section 2.10-2.11: Uniform & Normal
Exercise 8: A bus arrives every 10 minutes, so your waiting time (in minutes) is uniformly distributed over \([0, 10]\). Find:
- \(P(\text{Wait} \leq 3)\)
- Mean and SD
Solution:
- \(3/10 = 0.3\)
- \(\mu = (0+10)/2 = 5\), \(\sigma = (10-0)/\sqrt{12} \approx 2.887\)