Lecture 04 - Conditional Independence and Directed GMs (BNs)
Review of conditional independence, and an introduction to Directed GMs (BNs)
- Logistics Review
- Homework info
- Conditional Independence
- Naïve Bayes Classifier
- Bayesian Networks (BNs)
- Hidden Markov Models (HMMs)
Logistics Review
- Class webpage: lengerichlab.github.io/pgm-spring-2025
- Lecture scribe sign-up sheet
- Readings, Class Announcements, Assignment Submissions: Canvas
- Instructor: Ben Lengerich
- Office Hours: Thursday 3:30-4:30pm, 7278 Medical Sciences Center
- Email: lengerich@wisc.edu
- TA: Chenyang Jiang
- Office Hours: Monday 11am-12pm, 1219 Medical Sciences Center
- Email: cjiang77@wisc.edu
Homework Info
- Released: due February 11 at midnight.
- PDF and LaTeX solution template (.tex) available on the website.
- Submit via: Canvas
- Preferred format:
- PDF with your solutions written in the provided solution boxes using LaTeX.
Conditional Independence
Definitions
- Independence:
Variables $X$ and $Y$ are independent ($X \perp Y$) if:
- $P(X, Y) = P(X)P(Y)$
- Conditional Independence:
$X$ and $Y$ are conditionally independent given $Z$ ($X \perp Y \mid Z$) if:
- $P(X, Y \mid Z) = P(X \mid Z)P(Y \mid Z)$

Example
- Medical Diagnosis:
Let $X = \text{Fever}, Y = \text{Rash}, Z = \text{Measles}$.
If a patient has measles ($Z$), knowing they have a fever ($X$) provides no additional information about whether they develop a rash ($Y$).
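To make the definition concrete, here is a minimal Python check of conditional independence on a toy version of the measles example; all probability values below are made-up assumptions for illustration.

```python
from itertools import product

# Toy distribution over (Z = Measles, X = Fever, Y = Rash); all numbers are illustrative assumptions.
# It is constructed so that X and Y are conditionally independent given Z.
P_Z = {1: 0.1, 0: 0.9}
P_X_given_Z = {1: 0.9, 0: 0.2}   # P(Fever = 1 | Z)
P_Y_given_Z = {1: 0.8, 0: 0.05}  # P(Rash = 1 | Z)

def joint(z, x, y):
    """P(Z, X, Y) = P(Z) P(X | Z) P(Y | Z)."""
    px = P_X_given_Z[z] if x else 1 - P_X_given_Z[z]
    py = P_Y_given_Z[z] if y else 1 - P_Y_given_Z[z]
    return P_Z[z] * px * py

# Check X ⊥ Y | Z: P(x, y | z) == P(x | z) P(y | z) for every assignment.
for z, x, y in product((0, 1), repeat=3):
    p_xy_z = joint(z, x, y) / P_Z[z]
    p_x_z = sum(joint(z, x, yy) for yy in (0, 1)) / P_Z[z]
    p_y_z = sum(joint(z, xx, y) for xx in (0, 1)) / P_Z[z]
    assert abs(p_xy_z - p_x_z * p_y_z) < 1e-12

# X and Y are still marginally dependent: measles induces a correlation between fever and rash.
p_x1 = sum(joint(z, 1, y) for z in (0, 1) for y in (0, 1))
p_y1 = sum(joint(z, x, 1) for z in (0, 1) for x in (0, 1))
p_x1y1 = sum(joint(z, 1, 1) for z in (0, 1))
print(p_x1y1, p_x1 * p_y1)  # 0.081 vs. 0.03375: not equal
```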

Relate to Naïve Bayes
- Conditional independence allows us to compute $P(X \mid Y)$ efficiently, since it factorizes as $P(X_1, \ldots, X_n \mid Y) = \prod_i P(X_i \mid Y)$.
- Switching the direction of one arrow does not change the represented distribution (the two graphs encode the same independencies).
- Switching the direction of two arrows does change it, because evidence gets double-counted: the two $X$'s contain overlapping information about $Y$.

Directed Graphical Models (causal relationships)
Two types of GMs:
- Directed edges encode causal relationships (e.g., Bayesian networks)
- Undirected edges encode correlations between variables (e.g., Markov random fields)
1. Markov Chains
- Markov Property:
The future state depends only on the current state:
- $P(Z_{t+1} \mid Z_t, Z_{t-1}, \ldots) = P(Z_{t+1} \mid Z_t)$
- Transition Matrix:
Defines the probabilities $P(Z_t \mid Z_{t-1})$.
- Application:
Modeling sequences like weather patterns or stock prices (a sampling sketch follows).
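A minimal sampling sketch of a two-state Markov chain, assuming a made-up transition matrix for a toy weather model:

```python
import numpy as np

# Hypothetical two-state weather chain; states and probabilities are made up for illustration.
states = ["sunny", "rainy"]
# T[i, j] = P(Z_{t+1} = j | Z_t = i); each row sums to 1.
T = np.array([[0.8, 0.2],
              [0.4, 0.6]])

rng = np.random.default_rng(0)

def sample_chain(T, z0, length):
    """Sample a state sequence using only the current state (Markov property)."""
    z, seq = z0, [z0]
    for _ in range(length - 1):
        z = rng.choice(len(T), p=T[z])
        seq.append(z)
    return seq

print([states[z] for z in sample_chain(T, z0=0, length=10)])
```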

2. Hidden Markov Models (HMMs)
We need HMMs when the underlying drivers of the observations are not directly observed.
- Components:
- Hidden States ($Z_t$): Latent variables (e.g., emotional states in speech).
- Observations ($X_t$): Observed data (e.g., audio signals).
- Transition Probability: $P(Z_t \mid Z_{t-1})$.
- Emission Probability: $P(X_t \mid Z_t)$.
- Example: Dishonest Casino
- Hidden states: Fair die ($Z=0$) vs. loaded die ($Z=1$).
- Observations: Dice rolls (e.g., $X=6$).
- Goal: Infer when the dealer switches dice based on observed rolls.
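A minimal generative sketch of the dishonest casino HMM; the transition and emission probabilities below are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden states: 0 = fair die, 1 = loaded die.
A = np.array([[0.95, 0.05],   # transition P(Z_t | Z_{t-1}): the dealer rarely switches dice
              [0.10, 0.90]])
# Emission P(X_t | Z_t) over faces 1..6.
B = np.array([[1/6] * 6,                        # fair die: uniform
              [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])  # loaded die: favors 6

def sample_hmm(n_steps, z0=0):
    """Sample hidden states and observed rolls from the HMM."""
    zs, xs = [], []
    z = z0
    for _ in range(n_steps):
        x = rng.choice(6, p=B[z]) + 1   # observed roll, faces 1..6
        zs.append(z)
        xs.append(x)
        z = rng.choice(2, p=A[z])       # hidden-state transition
    return zs, xs

hidden, rolls = sample_hmm(20)
print("hidden:", hidden)
print("rolls: ", rolls)
```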
3. Bayesian Networks (BNs)
- Structure:
A BN is a directed acyclic graph (DAG) whose nodes represent random variables and whose edges represent direct influence of one variable on another. It provides the skeleton for representing a joint distribution compactly in a factorized way, and it compactly encodes a set of conditional independence assumptions. We can view the graph as encoding a generative sampling process executed by nature.
- Factorization:
The joint distribution factorizes as:
- $P(X_1, \ldots, X_n) = \prod_{i=1}^n P(X_i \mid \text{Parents}(X_i))$
(A numerical sketch of this factorization appears after the key structures below.)
- Key Structures:
- Common Parent:
$A \leftarrow B \rightarrow C$ ⟹ $A \perp C \mid B$.
Example: $B = \text{Season}, A = \text{Rain}, C = \text{Sprinkler}$.

- Cascade:
$A \rightarrow B \rightarrow C$ ⟹ $A \perp C \mid B$.
Example: $A = \text{Smoking}, B = \text{Lung Damage}, C = \text{Cough}$.

- V-Structure (Collider):
$A \rightarrow B \leftarrow C$ ⟹ $A$ and $C$ are marginally independent, but become dependent once $B$ is observed.
Example: $A = \text{Burglary}, B = \text{Alarm}, C = \text{Earthquake}$.
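A minimal numerical sketch of the factorization and of the v-structure behavior, using the burglary/earthquake/alarm collider; all CPT values are made-up assumptions.

```python
# Collider: Burglary -> Alarm <- Earthquake (A = Burglary, B = Alarm, C = Earthquake above).
P_B = {1: 0.01, 0: 0.99}   # P(Burglary)
P_E = {1: 0.02, 0: 0.98}   # P(Earthquake)
P_A = {(1, 1): 0.95, (1, 0): 0.90,   # P(Alarm = 1 | Burglary, Earthquake)
       (0, 1): 0.30, (0, 0): 0.01}

def joint(b, e, a):
    """Factorization: P(B, E, A) = P(B) P(E) P(A | B, E)."""
    p_a = P_A[(b, e)] if a == 1 else 1 - P_A[(b, e)]
    return P_B[b] * P_E[e] * p_a

# Marginally, Burglary and Earthquake are independent: P(B=1, E=1) == P(B=1) P(E=1).
p_be = sum(joint(1, 1, a) for a in (0, 1))
print(p_be, P_B[1] * P_E[1])  # equal

# Conditioning on the collider couples them ("explaining away"):
p_a1 = sum(joint(b, e, 1) for b in (0, 1) for e in (0, 1))
p_b1_given_a1 = sum(joint(1, e, 1) for e in (0, 1)) / p_a1
p_b1_given_a1_e1 = joint(1, 1, 1) / sum(joint(b, 1, 1) for b in (0, 1))
print(p_b1_given_a1, p_b1_given_a1_e1)  # different: B and E are dependent given A
```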

- I-map
- Independence set: let $P$ be a distribution on $X$. Define $I(P)$ to be the set of independences $(X \perp Y \mid Z)$ that hold in $P$.
- I-Map: Let $G$ be any graph object with an associated independence set $I(G)$. We say that $G$ is an I-map for an independence set $I$ if $I(G) \subseteq I$.
- I-Map Distribution: We say $G$ is an I-map for $P$ if $G$ is an I-map for $I(P)$, when we use $I(G)$ as the associated independence set.

- Facts:
- Any independence that $G$ asserts must hold in $P$. Conversely, $P$ may contain additional independencies that are not asserted by $G$.
- This is what allows us to use $G$ to estimate $P$.
- For BNs, $I(G)$ can be taken to be the local Markov assumptions: $I_{\ell}(G) = \{X_i \perp \text{NonDescendants}_{X_i} \mid \text{Pa}_{X_i} : \forall i\}$.

- Example: $G_0$, $G_1$, $G_2$ can each be an I-map for $P_1$, while $G_0$ is not an I-map for $P_2$: $G_0$ asserts an independence that does not hold in $P_2$, so it cannot be used to estimate $P_2$.

- D-Separation:
- A path between $X$ and $Y$ is blocked by $Z$ if:
- Chain $\rightarrow \bullet \rightarrow$: Middle node is in $Z$.
- Fork $\leftarrow \bullet \rightarrow$: Middle node is in $Z$.
- Collider $\rightarrow \bullet \leftarrow$: Middle node and its descendants are not in $Z$.



- Definition: $X$ and $Y$ are d-separated given $Z$ if all paths are blocked.
- MAG definition of D-Separation: Variables $X$ and $Y$ are d-separated given $Z$ if they are separated in the moralized ancestral graph (form the ancestral graph over $X \cup Y \cup Z$, moralize it, then remove $Z$ and check whether $X$ and $Y$ are disconnected).

- Example: (figure of a moralized-ancestral-graph example, omitted)
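A small sketch of the path-blocking rules above (not a full d-separation algorithm): it checks whether one given path is blocked by a conditioning set $Z$. The DAG encoding and node names are hypothetical.

```python
def descendants(dag, node):
    """All descendants of `node`; `dag` maps each node to the set of its parents."""
    desc, frontier = set(), {node}
    while frontier:
        children = {c for c, parents in dag.items() if parents & frontier}
        frontier = children - desc
        desc |= children
    return desc

def path_blocked(dag, path, Z):
    """True if the path (a list of node names) is blocked by the conditioning set Z."""
    Z = set(Z)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = prev in dag.get(mid, set()) and nxt in dag.get(mid, set())
        if is_collider:
            # Collider: blocked unless the middle node or one of its descendants is in Z.
            if mid not in Z and not (descendants(dag, mid) & Z):
                return True
        else:
            # Chain or fork: blocked if the middle node is in Z.
            if mid in Z:
                return True
    return False

# Cascade A -> B -> C: observing B blocks the path.
chain = {"A": set(), "B": {"A"}, "C": {"B"}}
print(path_blocked(chain, ["A", "B", "C"], Z={"B"}))   # True
# Collider Burglary -> Alarm <- Earthquake: observing the alarm unblocks the path.
collider = {"Burglary": set(), "Earthquake": set(), "Alarm": {"Burglary", "Earthquake"}}
print(path_blocked(collider, ["Burglary", "Alarm", "Earthquake"], Z=set()))       # True (blocked)
print(path_blocked(collider, ["Burglary", "Alarm", "Earthquake"], Z={"Alarm"}))   # False (active)
```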
Learning & Inference
Learning in BNs
- Parameter Learning:
Estimate Conditional Probability Tables (CPTs) from data (a counting sketch appears after this list):
- $P(X_i \mid \text{Parents}(X_i)) = \frac{\text{Count}(X_i, \text{Parents}(X_i))}{\text{Count}(\text{Parents}(X_i))}$
- Structure Learning:
Use algorithms like K2 or PC to infer the DAG from data.
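A minimal counting sketch for the CPT formula above, using a made-up dataset where Season is the single parent of Rain:

```python
from collections import Counter

# Hypothetical observations of (season, rain); values are made up for illustration.
data = [("dry", 0), ("dry", 0), ("dry", 1),
        ("wet", 1), ("wet", 1), ("wet", 0), ("wet", 1)]

pair_counts = Counter(data)                             # Count(Rain, Season)
parent_counts = Counter(season for season, _ in data)   # Count(Season)

# CPT entry: P(Rain = r | Season = s) = Count(r, s) / Count(s)
cpt = {(s, r): pair_counts[(s, r)] / parent_counts[s] for (s, r) in pair_counts}
print(cpt)  # e.g., P(Rain = 1 | Season = wet) = 3/4
```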
Inference in BNs
- Exact Inference:
- Variable Elimination: Marginalize (sum out) variables step-by-step (see the sketch after this list).
- Junction Tree Algorithm: Transform the BN into a tree structure.
- Approximate Inference:
- Sampling: Markov Chain Monte Carlo (MCMC).
- Loopy Belief Propagation: Message-passing in cyclic graphs.
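A minimal variable-elimination sketch on the chain $A \rightarrow B \rightarrow C$, computing $P(C)$ by summing out $A$ and then $B$; the CPT numbers are illustrative assumptions.

```python
import numpy as np

# Binary chain A -> B -> C with made-up CPTs.
P_A = np.array([0.6, 0.4])               # P(A)
P_B_given_A = np.array([[0.7, 0.3],      # P(B | A = 0)
                        [0.2, 0.8]])     # P(B | A = 1)
P_C_given_B = np.array([[0.9, 0.1],      # P(C | B = 0)
                        [0.5, 0.5]])     # P(C | B = 1)

# Eliminate A: m_A(b) = sum_a P(a) P(b | a)   (this intermediate factor equals P(B))
m_A = P_A @ P_B_given_A

# Eliminate B: P(c) = sum_b m_A(b) P(c | b)
P_C = m_A @ P_C_given_B
print(P_C, P_C.sum())  # a valid distribution over C, without ever building the full joint
```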
I-Equivalence
Definition
- Two Bayesian Networks are I-equivalent if they encode the same set of conditional independence statements.
- Example:
Networks $A \rightarrow B \rightarrow C$ and $A \leftarrow B \leftarrow C$ are I-equivalent.

Implications
- Different graph structures can represent identical independence relationships.
- Critical for model selection and avoiding overfitting.
Notation: “Plate”
- Naïve Bayes with streamlined notation: a plate (a rectangle labeled with a count, e.g., $n$) stands for $n$ repeated nodes, so the feature nodes $X_1, \ldots, X_n$ of Naïve Bayes can be drawn once inside a single plate.

Applications
1. Naïve Bayes Classifier
- Assumption: Features are conditionally independent given the class.
- Formula:
- $P(Y \mid X_1, \ldots, X_n) \propto P(Y) \prod_{i=1}^n P(X_i \mid Y)$
- Use Case: Spam detection, sentiment analysis.
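A minimal Bernoulli Naïve Bayes sketch of the formula above; the tiny binary dataset and the smoothing constant are made-up assumptions.

```python
import numpy as np

# Rows are documents, columns are binary word features; y is the class (1 = spam). All values are toy data.
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([1, 1, 0, 0])

def fit_nb(X, y, alpha=1.0):
    """Estimate P(Y) and P(X_i = 1 | Y) with Laplace smoothing alpha."""
    classes = np.unique(y)
    prior = np.array([(y == c).mean() for c in classes])
    cond = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                     for c in classes])
    return classes, prior, cond

def predict_nb(x, classes, prior, cond):
    """Return the class maximizing log P(Y) + sum_i log P(x_i | Y)."""
    log_post = np.log(prior) + (x * np.log(cond) + (1 - x) * np.log(1 - cond)).sum(axis=1)
    return classes[np.argmax(log_post)]

classes, prior, cond = fit_nb(X, y)
print(predict_nb(np.array([1, 1, 0]), classes, prior, cond))  # predicts class 1 for this toy input
```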
2. Hidden Markov Models
- Applications:
- Speech recognition (mapping audio to words).
- Bioinformatics (gene prediction from DNA sequences).
3. Causal Inference
- Bayesian Networks for Causality:
- Identify causal effects using interventions.
- Example: Estimating the effect of a drug on recovery while controlling for confounders.