  Intro
|
  Admin
|
  Mid-Term Evaluation open
|
  remember to come by office hours once
|
  Recap
|
  From Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition"
|
  Defining an HMM
|
  HMM is characterized by
|
  N, number of states in the model
|
S = {S_1, S_2, ..., S_N}, the individual states
|
  state at time t is q_t
|
M, the number of distinct observation symbols per state (the discrete alphabet size)
|
V = {v_1, v_2, ..., v_M}
|
  State transition probability
|
  A={a_ij}
|
a_ij = P(q_{t+1} = S_j | q_t = S_i),  1 <= i, j <= N
|
if every state can be reached from every other state in a single step (an ergodic model), then a_ij > 0 for all i, j
|
  The Observation symbol probability distribution in state j
|
  B = {b_j(k)}
|
b_j(k) = P(v_k at t | q_t = S_j),  1 <= j <= N,  1 <= k <= M
|
  The initial state distribution pi = {pi_i}
|
pi_i = P(q_1 = S_i),  1 <= i <= N
|
  Example of how to use it generatively
|
The model is completely specified by N, M, A, B, and pi; compact notation lambda = (A, B, pi)
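|
A minimal sketch of using the model generatively, i.e. as a sequence generator (the specific numbers and the generate function below are made up for illustration, not from the lecture):
|
import numpy as np

# Hypothetical 3-state, 3-symbol model; all probabilities are illustrative.
A = np.array([[0.7, 0.2, 0.1],    # a_ij = P(q_{t+1} = S_j | q_t = S_i)
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.6, 0.3, 0.1],    # b_j(k) = P(v_k at t | q_t = S_j)
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
pi = np.array([0.5, 0.3, 0.2])    # pi_i = P(q_1 = S_i)

def generate(A, B, pi, T, seed=0):
    """Sample a state sequence q and an observation sequence O of length T."""
    rng = np.random.default_rng(seed)
    N, M = A.shape[0], B.shape[1]
    q, O = [], []
    state = rng.choice(N, p=pi)              # draw q_1 from pi
    for _ in range(T):
        q.append(state)
        O.append(rng.choice(M, p=B[state]))  # emit a symbol from b_state(.)
        state = rng.choice(N, p=A[state])    # move to the next state via a_ij
    return q, O

states, observations = generate(A, B, pi, T=10)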
|
  New Material
|
  Motivate with a house and motion sensor example
|
  3 problems from Rabiner
|
  The 3 basic HMM problems
|
|
  Problem 1 is the evaluation problem
|
It can also be viewed as a scoring problem: how well does a given model match a given observation sequence?
|
  Choosing among multiple models
|
Problem 2 is an attempt to uncover the hidden part of the model, i.e., to find the "correct" state sequence
|
"correct" is not quite accurate -> we have to pick some optimality criterion, and there are several possible ones
|
  Problem 3 is the learning problem
|
  "training" the HMM based on some observed data
|
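Stated a bit more formally (following Rabiner's notation, with O = O_1 O_2 ... O_T an observation sequence and lambda = (A, B, pi) the model):
|
Problem 1: given O and lambda, compute P(O | lambda) efficiently
Problem 2: given O and lambda, find a state sequence Q = q_1 q_2 ... q_T that is optimal in some meaningful sense
Problem 3: adjust the parameters lambda = (A, B, pi) to maximize P(O | lambda)
|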
  Solving Problem #1
|
|
|
|
|
|
The direct computation, summing over all N^T possible state sequences, is intractable; a more tractable version is called the Forward-Backward procedure
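|
The forward variable and its recursion, in the standard form (reconstructed from the usual presentation, not copied from the board):
|
alpha_t(i) = P(O_1 O_2 ... O_t, q_t = S_i | lambda)
Initialization:  alpha_1(i) = pi_i * b_i(O_1)
Induction:       alpha_{t+1}(j) = [ sum_i alpha_t(i) * a_ij ] * b_j(O_{t+1})
Termination:     P(O | lambda) = sum_i alpha_T(i)
This needs on the order of N^2 * T operations rather than a number exponential in T.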
|
|
|
|
|
|
|
  So this only describes the "forward variable"
|
  It is sufficient for problem 1
|
  but we will want to use a backward variable for the other problems
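|
Before moving on, a minimal sketch of the forward pass above in code (array conventions are assumptions for illustration: integer-coded observations, A is N x N, B is N x M, pi has length N):
|
import numpy as np

def forward(A, B, pi, O):
    """Forward pass: returns alpha (T x N) and P(O | lambda)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                     # alpha_1(i) = pi_i * b_i(O_1)
    for t in range(1, T):
        # alpha_{t+1}(j) = [ sum_i alpha_t(i) * a_ij ] * b_j(O_{t+1})
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha, alpha[-1].sum()                  # P(O | lambda) = sum_i alpha_T(i)
|
In practice the alphas are scaled (or kept in log space) to avoid underflow on long sequences.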
|
|
|
|
|
|
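For reference, the backward variable in its standard form (not needed for Problem 1, but used in the solutions to Problems 2 and 3 below):
|
beta_t(i) = P(O_{t+1} O_{t+2} ... O_T | q_t = S_i, lambda)
Initialization:  beta_T(i) = 1
Induction:       beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
|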
  Problem 2
|
Lots of possible solutions
|
it depends on what "optimal" means
|
  choose the states which are individually most likely
|
|
|
|
|
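The usual way to write this criterion, using the forward and backward variables from above (standard notation, not from the board):
|
gamma_t(i) = P(q_t = S_i | O, lambda) = alpha_t(i) * beta_t(i) / P(O | lambda)
choose q_t = S_i with the largest gamma_t(i), for each t separately
|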
  Problems with this formulation
|
The resulting state sequence may not even be possible (e.g., it can include transitions with a_ij = 0)
|
  Other ways of optimizing are possible
|
One well-known criterion is to find the single best state sequence as a whole (maximize P(Q | O, lambda))
|
  Based on dynamic programming
|
  "Viterbi Algorithm"
|
|
|
|
|
|
|
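The standard Viterbi recursion (reconstructed in this notation; delta_t(i) is the score of the single best path ending in state S_i at time t, psi is the backpointer):
|
Initialization:  delta_1(i) = pi_i * b_i(O_1)
Recursion:       delta_t(j) = max_i [ delta_{t-1}(i) * a_ij ] * b_j(O_t)
                 psi_t(j)   = argmax_i [ delta_{t-1}(i) * a_ij ]
Termination:     P* = max_i delta_T(i),  q_T* = argmax_i delta_T(i)
Backtracking:    q_t* = psi_{t+1}(q_{t+1}*) for t = T-1, ..., 1
|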
  Two points
|
  This is very similar to the solution to Problem 1 except that we are maximizing rather than summing
|
  and it should be clear that it can be done efficiently with a lattice structure
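|
A minimal sketch of the Viterbi lattice in code (same assumed array conventions as the forward-pass sketch; the forward pass's sum becomes a max, plus backpointers):
|
import numpy as np

def viterbi(A, B, pi, O):
    """Return the single best state sequence for O and its path probability."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))            # delta_t(j): best path score ending in j at time t
    psi = np.zeros((T, N), dtype=int)   # psi_t(j): best predecessor of state j at time t
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()
|
As with the forward pass, a real implementation works in log space to avoid underflow.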
|
  Problem 3
|
Adjusting the model parameters (A, B, pi) to maximize the probability of the observation sequence
|
  This is a global optimization problem
|
There is no known analytical way to solve for the model that maximizes it
|
Use a local optimization procedure called the Baum-Welch algorithm
|
  iterative
|
  Expectation-Maximization procedure
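|
The re-estimation step in its standard form, written with the forward and backward variables (sketched from the usual Baum-Welch presentation, not from the board):
|
xi_t(i,j)  = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
           = alpha_t(i) * a_ij * b_j(O_{t+1}) * beta_{t+1}(j) / P(O | lambda)
gamma_t(i) = sum_j xi_t(i,j)

new pi_i   = gamma_1(i)
new a_ij   = sum_{t=1..T-1} xi_t(i,j) / sum_{t=1..T-1} gamma_t(i)
new b_j(k) = sum over t with O_t = v_k of gamma_t(j) / sum_{t=1..T} gamma_t(j)
|
Each iteration is guaranteed not to decrease P(O | lambda), so repeating the two steps converges to a local maximum.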
|
|
|
|
|
|
|
|
|