wedge Kick-off
wedge Sweet Home Country Grammar
* Mei-Lwun
* Announcements
* Go over names
wedge Go over assignment 01
* Survey
wedge New Material
* Talked about a single-state Markov model
* Talked about probabilities and how to calculate them
wedge Smoothing
wedge Training a model is based on generalizing over a set of training instances
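* memorizing the examples is consistent with the training data but doesn't generalize
* Occam's razor: "all other things being equal, the simplest consistent explanation is best"
* P(feature|class) = n_featureandclass / n_class, where n_featureandclass is the number of training instances of the class that have the feature and n_class is the number of training instances of the class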
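The unsmoothed estimate above is just a ratio of counts. A minimal sketch, assuming each training instance is a (set of features, class label) pair; the function names and the toy data are hypothetical, chosen only for illustration:

```python
from collections import Counter, defaultdict

def train_counts(instances):
    """Count n_class and n_featureandclass from (features, label) pairs."""
    n_class = Counter()
    n_feature_and_class = defaultdict(Counter)  # label -> feature -> count
    for features, label in instances:
        n_class[label] += 1
        for f in set(features):  # count each feature once per instance
            n_feature_and_class[label][f] += 1
    return n_class, n_feature_and_class

def p_feature_given_class(feature, label, n_class, n_feature_and_class):
    """Unsmoothed estimate: n_featureandclass / n_class."""
    return n_feature_and_class[label][feature] / n_class[label]

# Hypothetical toy training set
training = [
    ({"gun", "chase"}, "ACTION"),
    ({"gun", "explosion"}, "ACTION"),
    ({"wedding", "joke"}, "COMEDY"),
]
n_class, n_fc = train_counts(training)
print(p_feature_given_class("gun", "ACTION", n_class, n_fc))    # 1.0
print(p_feature_given_class("chase", "ACTION", n_class, n_fc))  # 0.5
print(p_feature_given_class("gun", "COMEDY", n_class, n_fc))    # 0.0
```

The last estimate is exactly zero because "gun" never appeared with COMEDY in this tiny training set, which is the problem smoothing addresses.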
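wedge Since you rarely have all possible training instances, you must account for those you do not have
* a rare feature may never show up with a given class in the training data
* avoid zeros in the probability distributions: a single zero makes the whole product zero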
* Smoothing is one way to do this.
wedge Laplace smoothing using an m-estimate assumes that each feature is given a prior probability, p, that is assumed to have been previously observed in a "virtual" sample of size m.
* P(feature|class) = (n_featureandclass + m*p) / (n_class + m)
* For binary features, p is assumed to be 0.5.
* With m = 2, m*p = 1, so this is equivalent to seeing each word in a category once.
* Do an example
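A worked example of the m-estimate, as a minimal sketch. The counts are hypothetical; p = 0.5 and m = 2 follow the binary-feature case above:

```python
def m_estimate(n_feature_and_class, n_class, p=0.5, m=2):
    """m-estimate of P(feature|class): (n_featureandclass + m*p) / (n_class + m).

    With p = 0.5 and m = 2 this is add-one (Laplace) smoothing for a
    binary feature: (n + 1) / (n_class + 2).
    """
    return (n_feature_and_class + m * p) / (n_class + m)

# Hypothetical counts: 10 ACTION training scripts; "gun" appears in 7 of
# them, "wedding" appears in none.
print(m_estimate(7, 10))  # (7 + 1) / 12, about 0.667
print(m_estimate(0, 10))  # (0 + 1) / 12, about 0.083, no longer zero
# Unsmoothed, the second estimate would have been 0 / 10 = 0.
```

Because the virtual sample adds one "present" and one "absent" observation, the smoothed probabilities of a binary feature being present or absent still sum to 1: (7 + 1)/12 + (3 + 1)/12 = 1.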
wedge Using a Markov model generatively
* Start with probability estimates
* and a good random number generator
* walk forward through the world, sampling each next state from the current state's probabilities
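The generative walk can be sketched as follows. This assumes a first-order Markov model whose states are words; the transition table and the "<s>"/"</s>" start and end states are hypothetical, not taken from the notes:

```python
import random

# Hypothetical transition probabilities: state -> {next_state: probability}
TRANSITIONS = {
    "<s>": {"i": 0.6, "you": 0.4},
    "i": {"love": 0.5, "hate": 0.5},
    "you": {"love": 1.0},
    "love": {"movies": 0.7, "</s>": 0.3},
    "hate": {"movies": 1.0},
    "movies": {"</s>": 1.0},
}

def generate(transitions, rng, start="<s>", end="</s>", max_len=50):
    """Walk forward from `start`, sampling each next state from the
    current state's distribution until `end` (or max_len) is reached."""
    state, output = start, []
    for _ in range(max_len):
        next_states = list(transitions[state])
        weights = [transitions[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights, k=1)[0]
        if state == end:
            break
        output.append(state)
    return output

rng = random.Random(42)  # seeded so runs are reproducible
for _ in range(3):
    print(" ".join(generate(TRANSITIONS, rng)))
```

The "good random number generator" is Python's `random.Random` here; a seeded instance makes the generated text repeatable while debugging.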
wedge Assignment 02
wedge Data set is now available
* only 2 overlaps (PulpFiction and DumbAndDumber were each submitted by two people)
* ./ACTION/Leon_JiShengyue.txt
./ACTION/PirateOfTheCaribbean_SinhaPinaki.txt
./ACTION/PulpFiction_NasrRamzi.txt
./ACTION/PulpFiction_SutterNathan.txt
./ACTION/Terminator_BichutskiyVadim.txt
./ACTION/TheBourneIdentity_JiShengyue.txt
./COMEDY/AmericanSplendor_VernicaRares.txt
./COMEDY/DogDayAfternoon_VernicaRares.txt
./COMEDY/DumbAndDumber_DesaiChaitanya.txt
./COMEDY/DumbAndDumber_PirzadehPouria.txt
./COMEDY/Election_TikuZubin.txt
./COMEDY/Friends1_PirzadehPouria.txt
./COMEDY/Friends2_PirzadehPouria.txt
./COMEDY/Friends3_PirzadehPouria.txt
./COMEDY/Friends4_PirzadehPouria.txt
./COMEDY/TheresSomethingAboutMary_SaprooSameer.txt
./DRAMA/Braveheart_PartidaAugusto.txt
./DRAMA/Brick_JavanmardiSara.txt
./DRAMA/FightClub_NasrRamzi.txt
./DRAMA/GodFather_HabibiAmir.txt
./DRAMA/GodFatherII_HabibiAmir.txt
./DRAMA/Rocky_LinsteadEric.txt
./DRAMA/SixthSense_AlmishariMishari.txt
./DRAMA/TheQueen_JavanmardiSara.txt
./DRAMA/TheSting_LinsteadEric.txt
./DRAMA/TrainingDay_AlmishariMishari.txt
./FAMILY/FatherOfTheBride_SinhaPinaki.txt
./FAMILY/Shrek_BauTien.txt
./FANTASY/AI_SaprooSameer.txt
./FANTASY/Dune_TikuZubin.txt
./FANTASY/LOTR_ROTK_PartidaAugusto.txt
./FANTASY/ReturnOfTheJedi_BauTien.txt
./MISC/Pi_SutterNathan.txt
./MISC/PrettyWoman_BichutskiyVadim.txt
./MISC/TheMummy_DesaiChaitanya.txt
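The listing follows a GENRE/Movie_Author.txt layout. A minimal sketch for loading it and finding overlaps, assuming that layout holds for every file, that author names contain no underscore (so the author is everything after the last "_"), and that an "overlap" means the same movie submitted by more than one person; the function names are hypothetical:

```python
from collections import defaultdict
from pathlib import Path

def parse_path(path):
    """Split './GENRE/Movie_Author.txt' into (genre, movie, author)."""
    p = Path(path)
    # rpartition keeps titles like 'LOTR_ROTK' intact
    movie, _, author = p.stem.rpartition("_")
    return p.parent.name, movie, author

def find_overlaps(paths):
    """Return {movie: [authors]} for movies submitted by more than one person."""
    authors_by_movie = defaultdict(list)
    for path in paths:
        _genre, movie, author = parse_path(path)
        authors_by_movie[movie].append(author)
    return {m: a for m, a in authors_by_movie.items() if len(a) > 1}

def load_paths(root="."):
    """List GENRE/Movie_Author.txt files one directory below `root`."""
    return sorted(str(p) for p in Path(root).glob("*/*.txt"))
```

Usage: `find_overlaps(load_paths("path/to/dataset"))`, where the dataset path is whatever directory the files were unpacked into.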

wedge Break
* Face transformer
* www.djp3.net—face_transformer_1.html
wedge More new material
wedge Classifying a sequence
wedge Probability calculation
* multiplying many small probabilities together underflows to zero in floating point
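A quick demonstration of the underflow, and of how summing logs avoids it. The 1,000 probabilities of 0.01 are hypothetical, standing in for per-word probabilities in a long script:

```python
import math

probs = [0.01] * 1000  # hypothetical per-word probabilities

product = 1.0
for p in probs:
    product *= p
# The true value is 1e-2000, far below the smallest positive double
# (about 5e-324), so the running product underflows to 0.0.
print(product)

log_score = sum(math.log(p) for p in probs)
print(log_score)  # roughly -4605.17, easily representable
```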
wedge Underflow prevention
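* log(P(a)*P(b)) = log(P(a)) + log(P(b)), so add log probabilities instead of multiplying probabilities
* the class with the highest final unnormalized log probability score is the most probable, because log is monotonically increasing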
* Bring this back to classifying a movie genre
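Putting the pieces together for genre classification, a minimal sketch of a log-space naive Bayes classifier. It assumes a bag-of-words (multinomial) model rather than binary features, and uses the m-estimate with p = 1/V and m = V (V = vocabulary size), which gives add-one smoothing (count + 1) / (n_words + V); the training snippets and function names are hypothetical:

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (list_of_words, genre). Returns (genre_counts, word_counts, vocab)."""
    genre_counts = Counter()
    word_counts = {}  # genre -> Counter of word occurrences
    vocab = set()
    for words, genre in docs:
        genre_counts[genre] += 1
        word_counts.setdefault(genre, Counter()).update(words)
        vocab.update(words)
    return genre_counts, word_counts, vocab

def log_score(words, genre, genre_counts, word_counts, vocab):
    """Unnormalized log P(genre) + sum of log P(word|genre)."""
    total_docs = sum(genre_counts.values())
    score = math.log(genre_counts[genre] / total_docs)
    counts = word_counts[genre]
    n_words = sum(counts.values())
    v = len(vocab)
    for w in words:
        # add-one smoothing; unseen words get (0 + 1) / (n_words + v)
        score += math.log((counts[w] + 1) / (n_words + v))
    return score

def classify(words, model):
    """Pick the genre with the highest unnormalized log score."""
    genre_counts, word_counts, vocab = model
    return max(genre_counts,
               key=lambda g: log_score(words, g, genre_counts, word_counts, vocab))

# Hypothetical training snippets
training = [
    ("gun chase explosion gun".split(), "ACTION"),
    ("car chase gun fight".split(), "ACTION"),
    ("wedding joke dance joke".split(), "COMEDY"),
    ("joke party wedding cake".split(), "COMEDY"),
]
model = train(training)
print(classify("gun fight at the wedding".split(), model))  # ACTION
```

Because every score is a sum of logs, a whole script of thousands of words can be scored without the product collapsing to zero.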