Assignment 3. Similarity a la Needleman-Wunsch
As in Assignment 1, your deliverables for this assignment consist of two parts. You need to hand in a) the code for computing the global similarity of a family of DNA strings and b) in a separate document, the output of your program together with the answers to the questions given below.
- Write a Java or Python program to a) read in a FASTA file of strings, b) compute the global similarity between any two strings, and c) print out the similarity matrix for all pairs of strings. Deposit this program in your folder. I thought writing the program to read in the strings was harder than writing the program for global similarity. In the worst case, and with a loss of credit, you can simply create an array of strings in which you hard-code the strings.
- From the masterhit course site, get the file of 10 viruses. These are in FASTA format. Use the settings gap = -1, mismatch = -1, and match = +1. With these settings, compute the similarity matrix (10 by 10) and put the matrix in a *.doc file. Add to the Word file the answers to the following questions.
- Which two strings have the greatest similarity and what is their similarity?
- Which two strings have the least similarity and what is their similarity?
- Describe strings s1 and s2 of lengths n and m that would have maximum similarity. Use the scoring scheme where the gap cost is -1. What would their similarity score be?
- Describe strings s1 and s2 of lengths n and m that would have minimum similarity. Again, use the scoring scheme where the gap cost is -1. What would their similarity score be?
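
The three program steps listed above (read FASTA, compute global similarity, print the matrix) could be organized roughly as in the Python sketch below. It is only one possible layout, not a required design: the function names `parse_fasta`, `global_similarity`, and `similarity_matrix` are made up for illustration, the demo sequences are invented and are not the virus file, and the choice to keep only two rows of the dynamic-programming table is an assumption made to save memory on long strings.

```python
def parse_fasta(lines):
    """Return a list of (header, sequence) pairs from an iterable of FASTA lines.

    Assumes each record starts with a '>' header line and that the
    sequence may be wrapped over several lines.
    """
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def global_similarity(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global similarity score of s and t (score only)."""
    # prev[j] holds the best score for aligning s[:i-1] with t[:j].
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]  # aligning s[:i] against the empty prefix of t
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_matrix(seqs, **scores):
    """All-pairs similarity; the score is symmetric, so fill both halves at once."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            matrix[i][j] = matrix[j][i] = global_similarity(seqs[i], seqs[j], **scores)
    return matrix


# Tiny demo on invented sequences (hypothetical data, not the course file).
demo = [">s1 toy", "ACGT", ">s2 toy", "AGT", ">s3 toy", "TTTT"]
records = parse_fasta(demo)
matrix = similarity_matrix([seq for _, seq in records])
for (name, _), row in zip(records, matrix):
    print(name.split()[0], *(f"{v:4d}" for v in row))
```

To run this on a real file, the lines could be read with `open(path)` and passed straight to `parse_fasta`, since a file object iterates over its lines. The quadratic running time per pair is inherent to the method, so ten long sequences may take a while in pure Python.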