  Intro
|
  Admin
|
  Mid-Term Evaluation open
|
  remember to come by office hours once
|
  Recap
|
  From Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition"
|
  Defining an HMM
|
  HMM is characterized by
|
  N, number of states in the model
|
S = {S_1, S_2, ..., S_N}, the individual states
|
  state at time t is q_t
|
M, the number of distinct observation symbols per state (the discrete alphabet size)
|
V = {v_1, v_2, ..., v_M}
|
  State transition probability
|
  A={a_ij}
|
a_ij = P(q_{t+1} = S_j | q_t = S_i),  1 <= i, j <= N
|
if every state can be reached from every other state in a single step (an ergodic model), then a_ij > 0 for all i, j
|
  The Observation symbol probability distribution in state j
|
  B = {b_j(k)}
|
b_j(k) = P(v_k at t | q_t = S_j),  1 <= j <= N,  1 <= k <= M
|
  The initial state distribution pi = {pi_i}
|
pi_i = P(q_1 = S_i),  1 <= i <= N
|
  Example of how to use it generatively
|
The model is completely specified by N, M, A, B, and pi; compact notation lambda = (A, B, pi)
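|
A minimal sketch of using the model generatively, i.e. as a sequence generator (the specific numbers and the generate function below are made up for illustration, not from the lecture):
|
import numpy as np

# Hypothetical 3-state, 3-symbol model; all probabilities are illustrative.
A = np.array([[0.7, 0.2, 0.1],    # a_ij = P(q_{t+1} = S_j | q_t = S_i)
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.6, 0.3, 0.1],    # b_j(k) = P(v_k at t | q_t = S_j)
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
pi = np.array([0.5, 0.3, 0.2])    # pi_i = P(q_1 = S_i)

def generate(A, B, pi, T, seed=0):
    """Sample a state sequence q and an observation sequence O of length T."""
    rng = np.random.default_rng(seed)
    N, M = A.shape[0], B.shape[1]
    q, O = [], []
    state = rng.choice(N, p=pi)              # draw q_1 from pi
    for _ in range(T):
        q.append(state)
        O.append(rng.choice(M, p=B[state]))  # emit a symbol from b_state(.)
        state = rng.choice(N, p=A[state])    # move to the next state via a_ij
    return q, O

states, observations = generate(A, B, pi, T=10)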
|
  New Material
|
  Motivate with a house and motion sensor example
|
  3 problems from Rabiner
|
  The 3 basic HMM problems
|
|
  Problem 1 is the evaluation problem
|
It can also be viewed as a scoring problem: how well does a given model match a given observation sequence?
|
  Choosing among multiple models
|
Problem 2 is an attempt to uncover the hidden part of the model, i.e., to find the "correct" state sequence
|
"correct" is not quite accurate -> we have to pick some optimality criterion, and there are several possible ones
|
  Problem 3 is the learning problem
|
  "training" the HMM based on some observed data
|
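Stated a bit more formally (following Rabiner's notation, with O = O_1 O_2 ... O_T an observation sequence and lambda = (A, B, pi) the model):
|
Problem 1: given O and lambda, compute P(O | lambda) efficiently
Problem 2: given O and lambda, find a state sequence Q = q_1 q_2 ... q_T that is optimal in some meaningful sense
Problem 3: adjust the parameters lambda = (A, B, pi) to maximize P(O | lambda)
|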
  Solving Problem #1
|
|
|
|
|
|
The direct computation, summing over all N^T possible state sequences, is intractable; a more tractable version is called the Forward-Backward procedure
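|
The forward variable and its recursion, in the standard form (reconstructed from the usual presentation, not copied from the board):
|
alpha_t(i) = P(O_1 O_2 ... O_t, q_t = S_i | lambda)
Initialization:  alpha_1(i) = pi_i * b_i(O_1)
Induction:       alpha_{t+1}(j) = [ sum_i alpha_t(i) * a_ij ] * b_j(O_{t+1})
Termination:     P(O | lambda) = sum_i alpha_T(i)
This needs on the order of N^2 * T operations rather than a number exponential in T.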
|
|
|
|
|
|
|
  So this only describes the "forward variable"
|
  It is sufficient for problem 1
|
  but we will want to use a backward variable for the other problems
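|
Before moving on, a minimal sketch of the forward pass above in code (array conventions are assumptions for illustration: integer-coded observations, A is N x N, B is N x M, pi has length N):
|
import numpy as np

def forward(A, B, pi, O):
    """Forward pass: returns alpha (T x N) and P(O | lambda)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                     # alpha_1(i) = pi_i * b_i(O_1)
    for t in range(1, T):
        # alpha_{t+1}(j) = [ sum_i alpha_t(i) * a_ij ] * b_j(O_{t+1})
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha, alpha[-1].sum()                  # P(O | lambda) = sum_i alpha_T(i)
|
In practice the alphas are scaled (or kept in log space) to avoid underflow on long sequences.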
|
|
|
|
|
|
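For reference, the backward variable in its standard form (not needed for Problem 1, but used in the solutions to Problems 2 and 3 below):
|
beta_t(i) = P(O_{t+1} O_{t+2} ... O_T | q_t = S_i, lambda)
Initialization:  beta_T(i) = 1
Induction:       beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
|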
  Problem 2
|
Lots of possible solutions
|
it depends on what "optimal" means
|
  choose the states which are individually most likely
|
|
|
|
|
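The usual way to write this criterion, using the forward and backward variables from above (standard notation, not from the board):
|
gamma_t(i) = P(q_t = S_i | O, lambda) = alpha_t(i) * beta_t(i) / P(O | lambda)
choose q_t = S_i with the largest gamma_t(i), for each t separately
|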
  Problems with this formulation
|
The resulting state sequence may not even be possible (e.g., it can include transitions with a_ij = 0)
|
  Other ways of optimizing are possible
|
One well-known criterion is to find the single best state sequence as a whole (maximize P(Q | O, lambda))
|
  Based on dynamic programming
|
  "Viterbi Algorithm"
|
|
|
|
|
|
|
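The standard Viterbi recursion (reconstructed in this notation; delta_t(i) is the score of the single best path ending in state S_i at time t, psi is the backpointer):
|
Initialization:  delta_1(i) = pi_i * b_i(O_1)
Recursion:       delta_t(j) = max_i [ delta_{t-1}(i) * a_ij ] * b_j(O_t)
                 psi_t(j)   = argmax_i [ delta_{t-1}(i) * a_ij ]
Termination:     P* = max_i delta_T(i),  q_T* = argmax_i delta_T(i)
Backtracking:    q_t* = psi_{t+1}(q_{t+1}*) for t = T-1, ..., 1
|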
  Two points
|
  This is very similar to the solution to Problem 1 except that we are maximizing rather than summing
|
  and it should be clear that it can be done efficiently with a lattice structure
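|
A minimal sketch of the Viterbi lattice in code (same assumed array conventions as the forward-pass sketch; the forward pass's sum becomes a max, plus backpointers):
|
import numpy as np

def viterbi(A, B, pi, O):
    """Return the single best state sequence for O and its path probability."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))            # delta_t(j): best path score ending in j at time t
    psi = np.zeros((T, N), dtype=int)   # psi_t(j): best predecessor of state j at time t
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()
|
As with the forward pass, a real implementation works in log space to avoid underflow.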
|
  Problem 3
|
Adjusting the model parameters (A, B, pi) to maximize the probability of the observation sequence
|
  This is a global optimization problem
|
There is no known analytical way to solve for the model that maximizes it
|
Use a local optimization procedure called the Baum-Welch algorithm
|
  iterative
|
  Expectation-Maximization procedure
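|
The re-estimation step in its standard form, written with the forward and backward variables (sketched from the usual Baum-Welch presentation, not from the board):
|
xi_t(i,j)  = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
           = alpha_t(i) * a_ij * b_j(O_{t+1}) * beta_{t+1}(j) / P(O | lambda)
gamma_t(i) = sum_j xi_t(i,j)

new pi_i   = gamma_1(i)
new a_ij   = sum_{t=1..T-1} xi_t(i,j) / sum_{t=1..T-1} gamma_t(i)
new b_j(k) = sum over t with O_t = v_k of gamma_t(j) / sum_{t=1..T} gamma_t(j)
|
Each iteration is guaranteed not to decrease P(O | lambda), so repeating the two steps converges to a local maximum.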
|
|
|
|
|
|
|
|
|