Computer Science 221: Assignment #02
Spring 2007
Department of Informatics
Donald Bren School of Information and Computer Sciences
University of California, Irvine
The goal of this project is to provide a warm-up for working with graphical models and the concepts involved in sequential data modelling. In this assignment you will create mathematical models of movie scripts and then use those models to generate new scripts that are mathematically similar. This assignment is supposed to be fun!
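To make the idea concrete, here is a minimal sketch of the simplest case: a first-order character model trained by counting which character follows which, then used to generate text by sampling. The function names are hypothetical, the sketch applies no smoothing, and it relies only on Python's standard `random` and `collections` modules (neither is a Markov-model library).

```python
import random
from collections import Counter, defaultdict

def train_first_order(text):
    """Count character-to-character transitions (one character per state)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, rng=random):
    """Walk the chain from `start`, sampling each next character in
    proportion to how often it followed the current one in training."""
    out = [start]
    state = start
    for _ in range(length - 1):
        followers = counts.get(state)
        if not followers:  # unseen state: no outgoing transitions without smoothing
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        state = rng.choices(chars, weights=weights)[0]
        out.append(state)
    return "".join(out)

if __name__ == "__main__":
    model = train_first_order("the cat sat on the mat")
    print(generate(model, "t", 20, random.Random(0)))
```

Note that generation stops early when it reaches a state that never had a successor in the training text; smoothing is one way to avoid that dead end.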
The data set is located here.
- This is an individual project. Your work should be done on your own.
- Write a computer program that trains a Markov model on movie scripts.
- Use any language/environment you would like.
- You may not use a library that implements Markov models. The point of this assignment is for you to implement such a program yourself.
- Develop the following models:
  - A one-state (first-order) Markov model in which each state corresponds to a single ASCII character.
  - A one-state model in which each state corresponds to a pair of ASCII characters, with no overlap between pairs.
    - The phrase "showtime" would therefore consist of 4 states ("sh", "ow", "ti", "me") and 3 state transitions.
  - A second-order Markov model in which each state corresponds to a single ASCII character.
  - A third-order Markov model in which each state corresponds to a single ASCII character.
  - A one-state model in which each state corresponds to a single word. Punctuation between words should be treated as a word of its own.
- Train each model on:
  - The test data
    - Implement no parameter smoothing for this training set.
  - A single script of your choice
    - Implement single-count parameter smoothing for this training set.
  - All the scripts of a single genre, concatenated together
    - Implement single-count parameter smoothing for this training set.
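- Use your model to generate 5 lines of dialog for each (model x data set) pair.
  - Initialize each model with:
    - the most frequent single letter (single-character model)
    - the most frequent letter pair (character-pair model)
    - the most frequent letter pair (second-order model)
    - a random choice (third-order model)
    - a random choice (word model)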
- You should therefore turn in 15 sets of 5 lines of dialog (5 models x 3 training sets).
- Put your output into a single 15-page Word or PDF document. On each page, put a header indicating which model and which training set the output corresponds to.
- Email the resulting document to me with the subject line "CS 221 Assignment 02" by 11:59pm on the due date (see calendar)
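As one possible way to organize the five models (not a required design), the sketch below reduces each model to a choice of state sequence plus a history length: character states with order 1, 2 or 3, non-overlapping pair states with order 1, and word states with order 1. It assumes that "single-count parameter smoothing" means add-one (Laplace) smoothing, i.e. every possible transition receives one extra count. All function names are hypothetical, and the word tokenizer's handling of whitespace and apostrophes is an assumption.

```python
import re
from collections import Counter, defaultdict

def char_states(text):
    """One ASCII character per state."""
    return list(text)

def pair_states(text):
    """Non-overlapping character pairs: 'showtime' -> ['sh', 'ow', 'ti', 'me'].
    Assumption: a trailing odd character becomes a one-character state."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]

def word_states(text):
    """Words, with each punctuation mark as its own token.
    Assumption: whitespace is dropped, and "don't" splits into 3 tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def count_transitions(states, order):
    """Map each history of `order` consecutive states to counts of the next state."""
    counts = defaultdict(Counter)
    for i in range(order, len(states)):
        history = tuple(states[i - order:i])
        counts[history][states[i]] += 1
    return counts

def add_one_prob(counts, vocab, history, nxt):
    """Add-one smoothed P(nxt | history): each of the len(vocab) possible
    next states gets one extra count, so unseen transitions keep nonzero mass."""
    followers = counts.get(history, Counter())
    return (followers[nxt] + 1) / (sum(followers.values()) + len(vocab))

if __name__ == "__main__":
    states = pair_states("showtime")
    print(states, len(states) - 1, "transitions")
```

For the unsmoothed test-data models, the same counts can be used directly, dividing each count by its row total instead of calling `add_one_prob`.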