Lecture 16 - Deep Learning from a GM Perspective

Deep learning foundations viewed through the lens of graphical models, covering perceptrons, neural networks, backpropagation, and probabilistic interpretations.

Logistics Review

Graded material

About Research Papers

While research papers may appear rigorous and comprehensive, they often omit practical nuances. As learners, we should approach them optimistically — extracting value even from imperfect sources.


From Biological to Artificial Neurons

Early AI models were inspired by biological neurons. The McCulloch & Pitts neuron, proposed in 1943, used threshold logic to compute simple Boolean functions like AND and OR. However, it could not represent XOR.
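To make the threshold logic concrete, here is a small sketch (not from the lecture) of an MP neuron with unit weights; setting the threshold to 2 or 1 yields AND or OR, while no single threshold reproduces XOR:

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts neuron: fires (outputs 1) iff the sum of its
    binary inputs meets the threshold. All inputs count equally."""
    return int(sum(inputs) >= threshold)

# AND: fires only when both inputs are on (threshold = 2)
# OR:  fires when at least one input is on (threshold = 1)
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "AND:", mp_neuron(x, 2), "OR:", mp_neuron(x, 1))
```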


The Perceptron and Its Limits

The perceptron extended the MP neuron by introducing real-valued weights and a differentiable activation function (such as the sigmoid):

f(x) = \sigma(w^\top x + b)

This model supports gradient-based learning; for example, under a Gaussian observation model, maximum-likelihood estimation reduces to least squares:

Y \sim \mathcal{N}(f(x), \Sigma) \Rightarrow \arg\min_w \sum_i \left(y_i - f(x_i; w)\right)^2
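As a hedged illustration of this setup, the following NumPy sketch trains a single sigmoid unit by gradient descent on the squared loss; the learning rate, epoch count, and OR target are illustrative choices, not from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_perceptron(X, y, lr=1.0, epochs=5000, seed=0):
    """Minimize sum_i (y_i - sigma(w.x_i + b))^2 by gradient descent."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                    # f(x_i; w)
        grad_out = -2.0 * (y - p) * p * (1 - p)   # dL_i/dz_i via the chain rule
        w -= lr * X.T @ grad_out / len(y)
        b -= lr * grad_out.mean()
    return w, b

# OR is linearly separable, so a single sigmoid unit fits it well.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_or = np.array([0, 1, 1, 1], dtype=float)
w, b = train_perceptron(X, y_or)
print(np.round(sigmoid(X @ w + b), 2))  # approaches [0, 1, 1, 1]
```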

Why XOR Cannot Be Represented

Suppose XOR could be represented by a single-layer perceptron that outputs 1 whenever \sigma(w^\top x) \geq \theta. The truth table of XOR would then require:

for input (0,0):

\sigma(0) < \theta

for inputs (1,0) and (0,1):

\sigma(w_1) \geq \theta, \quad \sigma(w_2) \geq \theta

for input (1,1):

\sigma(w_1 + w_2) < \theta

Because \sigma is monotone, the first condition gives \sigma^{-1}(\theta) > 0 and the next two give w_1, w_2 \geq \sigma^{-1}(\theta). Adding them, w_1 + w_2 \geq 2\sigma^{-1}(\theta) > \sigma^{-1}(\theta), which contradicts the last condition. Hence no linear decision boundary separates XOR; a single-layer perceptron cannot represent it.


Multi-Layer Perceptrons (MLP)

To model non-linear functions like XOR, multi-layer perceptrons are introduced:

h = \sigma(Wx + b), \quad \hat{y} = W'h + b'

This architecture supports hierarchical feature extraction and, by the universal approximation theorem, can approximate any continuous function on a compact domain given sufficiently many hidden units.
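To see that one hidden layer suffices for XOR, here is a sketch with hand-set weights; it uses a hard threshold instead of the sigmoid purely for readability:

```python
import numpy as np

def step(z):
    return (z >= 0).astype(float)

# Hidden layer: h1 detects OR(x1, x2), h2 detects AND(x1, x2).
W = np.array([[1.0, 1.0],     # weights into h1
              [1.0, 1.0]])    # weights into h2
b = np.array([-0.5, -1.5])    # h1 fires if x1 + x2 >= 0.5, h2 if x1 + x2 >= 1.5

# Output: XOR = OR AND (NOT AND)  ->  fire when h1 - h2 >= 0.5
W_out = np.array([1.0, -1.0])
b_out = -0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
h = step(X @ W.T + b)
y_hat = step(h @ W_out + b_out)
print(y_hat)  # [0. 1. 1. 0.]
```

The hidden units compute OR and AND, and the output unit fires exactly when OR is on and AND is off, which is XOR.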


Backpropagation

Neural networks are compositions of differentiable functions:

L = \ell(f_3(f_2(f_1(x))))

The gradient is computed via reverse-mode autodiff:

\frac{dL}{dx} = \frac{dL}{df_3} \cdot \frac{df_3}{df_2} \cdot \frac{df_2}{df_1} \cdot \frac{df_1}{dx}

This process is the core of backpropagation, enabling scalable training of deep networks.
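The chain-rule product above can be traced by hand on a tiny composition; the particular functions f1, f2, f3 below are arbitrary choices for illustration:

```python
import math

# Forward pass through f1, f2, f3 and the loss, caching intermediate values.
x = 1.5
a = x * x            # f1(x) = x^2,      df1/dx = 2x
b = math.sin(a)      # f2(a) = sin(a),   df2/da = cos(a)
c = 3.0 * b          # f3(b) = 3b,       df3/db = 3
L = c * c            # loss l(c) = c^2,  dl/dc  = 2c

# Reverse pass: multiply local derivatives from the loss back to the input.
dL_dc = 2.0 * c
dL_db = dL_dc * 3.0
dL_da = dL_db * math.cos(a)
dL_dx = dL_da * 2.0 * x

# Check against a finite-difference approximation.
def f(x):
    return (3.0 * math.sin(x * x)) ** 2

eps = 1e-6
print(dL_dx, (f(x + eps) - f(x - eps)) / (2 * eps))
```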


Graphical Models vs. Deep Nets

| Graphical Models (GMs) | Deep Neural Networks (DNNs) |
| --- | --- |
| Probabilistic semantics | Function approximation |
| Explicit latent variables | Learned intermediate features |
| Inference via message passing | Learning via SGD |

While GMs offer interpretability, DNNs provide flexibility and scalability. Hybrid models aim to combine both strengths.


Probabilistic Neural Nets

Restricted Boltzmann Machines (RBM)

An RBM is an undirected graphical model with visible ( v ) and hidden ( h ) units:

P(v, h) \propto e^{-E(v, h)}, \quad E(v, h) = -v^\top W h - a^\top v - b^\top h

RBMs are trained using contrastive divergence, and serve as building blocks for deeper models.
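A minimal NumPy sketch of CD-1 for a binary RBM is given below; the network size, learning rate, and single training vector are illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, a, b, lr=0.05):
    """One contrastive-divergence (CD-1) update for a binary RBM with
    energy E(v, h) = -v^T W h - a^T v - b^T h."""
    # Positive phase: sample h ~ P(h | v0).
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to v, then to h probabilities.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Approximate gradient of the log-likelihood.
    W += lr * (v0[:, None] * ph0[None, :] - v1[:, None] * ph1[None, :])
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)
    return W, a, b

# Tiny example: 4 visible units, 2 hidden units, one training vector.
W = 0.01 * rng.standard_normal((4, 2))
a, b = np.zeros(4), np.zeros(2)
v = np.array([1.0, 0.0, 1.0, 0.0])
for _ in range(200):
    W, a, b = cd1_step(v, W, a, b)
print(sigmoid(sigmoid(v @ W + b) @ W.T + a))  # reconstruction probabilities
```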


Deep Belief Networks (DBN)

DBNs stack multiple RBMs and apply layer-wise pretraining, followed by supervised fine-tuning. They offer a probabilistic view of deep learning, with each layer capturing increasingly abstract representations.
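Assuming the cd1_step, sigmoid, rng, and v names from the RBM sketch above are in scope, greedy layer-wise pretraining can be sketched as follows; the layer sizes and epoch count are arbitrary:

```python
def pretrain_dbn(v, layer_sizes, epochs=200):
    """Train each RBM on the (mean-field) hidden activations of the one below."""
    rbms, data = [], v
    for n_hidden in layer_sizes:
        W = 0.01 * rng.standard_normal((data.shape[-1], n_hidden))
        a, b = np.zeros(data.shape[-1]), np.zeros(n_hidden)
        for _ in range(epochs):
            W, a, b = cd1_step(data, W, a, b)
        rbms.append((W, a, b))
        data = sigmoid(data @ W + b)   # propagate activations to the next layer
    return rbms

stack = pretrain_dbn(v, layer_sizes=[3, 2])
```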


NNs and GMs—Natural Complements

Neural networks and graphical models represent two major paradigms in probabilistic AI. While they originate from different modeling philosophies, they can be viewed as complementary in both function and design.

Graphical models (GMs) offer structured representations of joint distributions, with explicit semantics over variables and their dependencies. In contrast, neural networks (NNs) are powerful function approximators, trained end-to-end via gradient-based optimization. Despite this apparent contrast, many modern systems integrate the two.

Specific examples include the RBMs and DBNs discussed above, where network layers carry explicit probabilistic semantics, and hybrid architectures in which neural networks parameterize the conditional distributions of a graphical model.

Ultimately, the synergy between GMs and NNs enables systems that are both expressive and interpretable.