The main goal of this class is to enable biologists and computer scientists to work together. This requires learning something of each other's concepts and methodologies.
Grading: Your grade is based on your homework (minor) and on your project presentation and write-up (major). If you believe your homework assignment has been misgraded (a distinct possibility), simply resubmit it with an explanation and I will look at it again, or see me during office hours. A project does not have to be successful. It need only be a reasonable effort. The project write-up should follow the standard paper format.
NOTE: The following is a tentative schedule. The general flow is correct, but some later topics may be replaced. The time spent on each topic may vary with what is required to understand the work. Questions in class or via email are welcome. Only next week's homework is guaranteed to be correct.
Default assignment: For each paper or chapter that is required, hand in a short typed observation, question, correction or criticism about the work. These are due at the first class meeting of the week, usually on Tuesdays. A paragraph is plenty. Default assignments may be overruled or supplanted.
The best type of comment takes the form of constructive criticism: how could the author make the paper more convincing or more significant? Pretend that a relative has written the paper and you are helping them get their ideas published.
Example Projects: Projects do not need to be successful, in the sense of discovering something new. WARNING: Use other people's data and other people's algorithms. If the data is not in hand, the project will not finish. If the algorithms are not known, the time to code them is too short. A project may reconfirm what is already known.
1. Working with a biologist. Understand the data and the question that the biologist has, then either apply some machine learning algorithm or download existing software and apply it to the problem. This is a 2-person project and the final report should reflect this. The write-up will be about 8 pages, including a description of the data, the problem, and the algorithm.
2. Working with another CS student. Using existing biocomputational programs. This is similar to 1, but with much less chance of success. An example would be to apply some gene-finding program to some genome. You would explain the algorithm, the results and alternative approaches.
3. Working with another CS student. Instead of using existing bioinformatics tools, use machine learning methods. Numerous algorithms are available over the web. A suite of such algorithms is available at the Weka site, a unified collection of about 30 machine learning algorithms. Write-up would be similar.
4. Working with another CS student or alone. Any idea you have for analyzing genomic information (genomes, gene expression data, protein data, metabolic data, etc.). You may implement your own algorithm, but the aim of the project, which may not be realized, should be the discovery of new biological knowledge.
I will suggest many projects in the lectures. Unless noted, readings are from the text "Bioinformatics" by Mount. There are many sites with lecture notes for Computational Biology, but I think the notes from Martin Tompa's class are particularly useful. Here's the url: http://www.cs.washington.edu/homes/tompa/. Tompa is a computer scientist and Mount is a biologist. The differences in the way they think should be apparent. Another good source of lecture notes is from Princeton at http://www.cs.princeton.edu/courses/archive/fall01/cs551/.
For notes on mathematical aspects of this course, such as probability, entropy and hidden Markov models, see: http://www-2.cs.cmu.edu/~awm/tutorials/.
Algorithms can and should be understood at multiple levels. At a minimum you should clearly understand the inputs, outputs, and assumptions, i.e. you should know what the algorithm computes. A different level of understanding is how the computation is carried out. That level is necessary if you want to code or improve the algorithm.
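As an illustration of these two levels, here is a minimal Python sketch. It is not taken from the text or the lectures, and the function name shannon_entropy is only an illustrative choice. The docstring states what is computed (inputs, outputs, assumptions), and the body shows how the computation is carried out, using entropy as the example.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbols in a sequence.

    Input: a non-empty string, for example a DNA sequence.
    Output: a float between 0 and log2(number of distinct symbols).
    Assumption: symbols are treated as independent draws from one
    distribution, so the order of the symbols is ignored.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    counts = Counter(sequence)   # how often each symbol occurs
    total = len(sequence)
    # Sum -p * log2(p) over the observed symbol frequencies p.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


print(shannon_entropy("ACGT"))  # 2.0, four equally frequent symbols
print(shannon_entropy("AACC"))  # 1.0, two equally frequent symbols
```

Someone who only wants to use the function needs just the docstring. Someone who wants to change it, say to use a different logarithm base, needs to read the body.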
Identify candidate response elements to Androgen Receptor
Chris Wasserman, Chin-Yi Chu, & Greg Kodama
Analyze life-cycle gene expression data for Chlamydia (1000 ORFs) to determine which genes are responsible for the transformation from RB (reticulate body, non-infectious) to EB (elementary body, infectious). Next, analyze upstream regions for regulatory binding sites.
Johnny Akers, Jianlin Cheng & Arlo Randall
Study of protein-protein interactions in yeast
Kevin Lin, Yimeng Dou & Haiying Deng
Correlate Gene Expression data with Protein-Protein Interaction Data
Lin Wu & Yu-Chyuan Su