Wright State University

CS/BIO 271 -- Homework Assignment #3

15 Points. Due in class, Thursday, May 5

Validating the Jukes-Cantor and Kimura models

The objective of this lab is to design and execute a computational experiment that will validate both the Jukes-Cantor and Kimura methods for estimating substitution rates.

Validating the Jukes-Cantor model

The first part of your assignment is to write a Perl script (or two) that simulates substitutions over time. First, given an input sequence and a substitution rate, your script should make random substitutions according to the given rate and output a new sequence. For example, given the input sequence ACAGATTATA and a substitution rate of 0.25, your script should output a similar sequence, but with 25% of the nucleotides substituted with another, random nucleotide. It must be possible for a given nucleotide position to undergo multiple substitutions, so the number of differences observed in the output may be smaller than the number of substitutions your script actually made.
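As a starting point, the substitution step described above might be sketched as follows. This is only one possible reading of the specification, not a required design: the subroutine name simulate_substitutions is hypothetical, and the sketch assumes that the number of substitution events is the rate times the sequence length (rounded to the nearest whole number) and that each event replaces the nucleotide with a different one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name and design are assumptions, not part of the
# assignment): apply round($rate * length) substitution events to $seq.
# Sites are drawn with replacement, so one site can be hit more than once.
sub simulate_substitutions {
    my ($seq, $rate) = @_;
    my @bases  = qw(A C G T);
    my @sites  = split //, uc $seq;
    my $events = int($rate * scalar(@sites) + 0.5);   # round to nearest

    for (1 .. $events) {
        my $pos    = int(rand(scalar @sites));        # site 0 .. length-1
        my @others = grep { $_ ne $sites[$pos] } @bases;
        $sites[$pos] = $others[ int(rand(scalar @others)) ];
    }
    return join '', @sites;
}

print simulate_substitutions('ACAGATTATA', 0.25), "\n";
```

Because sites are chosen with replacement, a later event can land on a site that has already changed, and can even restore the original nucleotide. That is the effect the experiment is meant to measure.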

Once you have written this script, you should use it to simulate evolution on a set of input sequences, and then determine the relationship between the number of substitutions your script generates and the number observed in the output sequence. I will provide you with a script for generating random nucleotide sequences of a given length to serve as input for your script.
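For the comparison step, the Jukes-Cantor correction estimates the number of substitutions per site, K, from the observed proportion of differing sites, p, as K = -(3/4) ln(1 - (4/3) p). A minimal sketch of that calculation is shown here; the subroutine names are hypothetical, and the code assumes two sequences of equal length.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: proportion of sites that differ between two
# sequences of equal length.
sub observed_proportion {
    my ($orig, $evolved) = @_;
    die "sequences must have equal length\n"
        unless length($orig) == length($evolved);
    my $diffs = 0;
    for my $i (0 .. length($orig) - 1) {
        $diffs++ if substr($orig, $i, 1) ne substr($evolved, $i, 1);
    }
    return $diffs / length($orig);
}

# Jukes-Cantor estimate of substitutions per site from the observed
# proportion $p. Returns nothing when p >= 0.75, where the logarithm
# is undefined.
sub jukes_cantor_distance {
    my ($p) = @_;
    return if $p >= 0.75;
    return -0.75 * log(1 - 4 * $p / 3);
}

my $p = observed_proportion('ACAGATTATA', 'ACAGATTATT');
my $k = jukes_cantor_distance($p);
printf "p = %.3f, K = %.3f\n", $p, $k;
```

Plotting the estimate K against the substitution rate actually given to the simulation, over a range of rates and sequence lengths, is one way to see how well the correction recovers the true value.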

Validating the Kimura model

The second part of your assignment is similar to the first; however, your second Perl script should accept as input two substitution rates, one for transitions and one for transversions. You should again generate substitutions on the input sequence according to these rates, allowing for the possibility that multiple substitutions may occur at any given site. Using the output of this script, you should validate the Kimura two-parameter model.
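A rough sketch of the two-rate version is given here, together with the Kimura two-parameter estimate K = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of sites showing transitions and transversions. All subroutine names are hypothetical, and the sketch assumes uppercase A/C/G/T input, event counts equal to rate times length (rounded), and the two event types applied in random order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Transitions swap within purines (A,G) or within pyrimidines (C,T);
# transversions cross between the two groups.
my %transition   = (A => 'G', G => 'A', C => 'T', T => 'C');
my %transversion = (A => [qw(C T)], G => [qw(C T)],
                    C => [qw(A G)], T => [qw(A G)]);

# Hypothetical helper: apply transition and transversion events at
# randomly chosen sites (with replacement, so multiple hits can occur).
sub simulate_two_rates {
    my ($seq, $ts_rate, $tv_rate) = @_;
    my @sites  = split //, uc $seq;
    my $n      = scalar @sites;
    my @events = (('ts') x int($ts_rate * $n + 0.5),
                  ('tv') x int($tv_rate * $n + 0.5));
    for my $type (shuffle @events) {
        my $pos  = int(rand($n));
        my $base = $sites[$pos];
        $sites[$pos] = $type eq 'ts'
            ? $transition{$base}
            : $transversion{$base}[ int(rand(2)) ];
    }
    return join '', @sites;
}

# Observed proportions (P, Q) of transition and transversion differences.
sub observed_ts_tv {
    my ($orig, $evolved) = @_;
    my ($ts, $tv) = (0, 0);
    my $n = length $orig;
    for my $i (0 .. $n - 1) {
        my ($x, $y) = (substr($orig, $i, 1), substr($evolved, $i, 1));
        next if $x eq $y;
        if ($transition{$x} eq $y) { $ts++ } else { $tv++ }
    }
    return ($ts / $n, $tv / $n);
}

# Kimura two-parameter estimate; returns nothing if a logarithm
# argument is not positive.
sub kimura_distance {
    my ($p, $q) = @_;
    my $a1 = 1 - 2 * $p - $q;
    my $a2 = 1 - 2 * $q;
    return if $a1 <= 0 || $a2 <= 0;
    return -0.5 * log($a1) - 0.25 * log($a2);
}

my $start = 'ACAGATTATA' x 100;
my $end   = simulate_two_rates($start, 0.10, 0.05);
my ($p, $q) = observed_ts_tv($start, $end);
my $k = kimura_distance($p, $q);
printf "P = %.3f, Q = %.3f, K = %s\n", $p, $q,
    defined $k ? sprintf('%.3f', $k) : 'undefined';
```

In this sketch the simulated total rate is the sum of the two input rates, so that sum is the value to compare K against.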

Generating random numbers

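In order to generate random substitutions, you will need a method to generate random numbers. The Perl function rand() returns a random fractional number greater than or equal to 0 and less than 1. You can multiply this number by any positive number n to generate a random number between 0 and n (calling rand(n) does the same thing directly). Wrap the result in int() if you need a whole number.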

Example:

	# generate a random number between zero and 4
	my $randomnum = rand() * 4;
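The same idea can be used to pick a random nucleotide: truncating rand(4) with int() gives a whole number from 0 to 3, which can index a four-element array. The names @bases and random_base below are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = qw(A C G T);

# int(rand(4)) is one of 0, 1, 2, 3, each equally likely, so every
# base is chosen with probability 1/4.
sub random_base {
    return $bases[ int(rand(scalar @bases)) ];
}

print random_base(), "\n";
```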

Generating random nucleotide sequences for testing

The following Perl script will generate a random sequence of nucleotides that you can use as input for your scripts: randseq.pl.

Usage:

  perl randseq.pl 1000 > seq.txt
This will place a 1000-nucleotide sequence into the file seq.txt.

What to turn in

After running your experiments on sequences of various sizes, you should prepare for me a short (3-5 page) report describing exactly how you tested the two models (Jukes-Cantor and Kimura), showing the results of those tests (with appropriate graphs, tables, and/or other figures), and explaining those results. This report should include your Perl code as an appendix (not included in the 3-5 page count). The quality of your code, the completeness and design of your experiment, and the quality of your report will all contribute to your grade for this lab.