Phylogenetics


Tasks for the lab report
Analyse two of the alignments given above (your choice, but one should be DNA and one amino acid sequences) and use at least two of the three main methods of phylogenetic reconstruction: distance, parsimony and maximum likelihood. The trees should include some kind of statistical testing.

Hemoglobin B from various species (unaligned): Amino acid sequence

hemo-boot-njnj.plot

Figure 1. (above) A neighbor-joining tree with bootstrap test for the hemoglobin B amino acid sequences of 12 taxa. Substitution model: Poisson correction (it corrects the observed proportion of amino acid differences for multiple substitutions at the same site, assuming equal rates at all sites; preferred here over PAM). All analyses were done in MEGA 3. The figure below shows the neighbor-joining tree of the same sequences displayed in NJplot.

hemo-pars
Figure 2. The phylogenetic tree constructed with the maximum parsimony method. Bootstrap did not work with maximum parsimony in MEGA 3. A substitution model is not applicable. Why? Maximum parsimony is completely different from neighbor joining: it does not use distances or substitution models but works on the characters directly, and it is based on the principle of requiring the minimum number of evolutionary changes.


distance
Figure 3. Pairwise distance matrix for the amino acid sequences, using Poisson correction as the substitution model. These distance values are the basis for the neighbor-joining tree in figure 1.
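
As a rough illustration of how one entry of such a matrix arises: the Poisson-corrected distance is d = -ln(1 - p), where p is the proportion of aligned sites at which two sequences differ. The Python sketch below is not MEGA 3's own code. The function names and the two short peptide fragments are made up for illustration, and skipping every site with a gap in either sequence is an assumption.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') in either sequence are skipped (assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)

def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    return -math.log(1.0 - p)

# Hypothetical 10-residue fragments differing at 2 sites (p = 0.2).
print(round(poisson_distance("VHLTPEEKSA", "VHLSPEEKNA"), 4))  # 0.2231
```

The corrected distance (0.2231) is slightly larger than the raw proportion (0.2) because the correction allows for sites that have changed more than once.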




An 896 bp segment of mtDNA d-loop for five primates from Brown et al. (1982)

homi-boot-nj
Figure 4. The neighbor-joining tree, using Kimura 2-parameter as the substitution model. Bootstrap values are indicated in the figure. The tree places the gorilla as the closest relative (sister group) of the chimpanzee-human group, i.e. chimpanzee and human share a more recent common ancestor with each other than either does with the gorilla. This agrees well with the generally accepted model of hominoid evolution.

homi-pars
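Figure 5. The phylogenetic tree constructed with the maximum parsimony method. In contrast to the neighbor-joining tree (figure 4), this tree groups the gorilla and the chimpanzee together, with the human mtDNA d-loop sequence as their closest relative. Note that a living species is never the ancestor of another living species; the tree only shows the order in which the lineages split.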

homi-distance
Figure 6. The distance matrix that is the basis for the neighbor-joining tree (figure 4).



Conclusion

In the distance matrix, the closer a value is to zero, the more similar the two aligned sequences are. In figure 6 this means that chimpanzee and human (0.095) are more similar than human and gibbon (0.212). This is reflected in the neighbor-joining tree (figure 4).
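
The distances for the primate data were computed with the Kimura 2-parameter model (figure 4), which counts transitions (A<->G, C<->T) and transversions separately: d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of sites showing a transition and a transversion. A minimal Python sketch follows. It is not MEGA 3's code: the function name and the toy sequences are hypothetical, and skipping sites that are not A, C, G or T in both sequences is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumption)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the formula to be defined (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Toy example: one transition (A/G) and one transversion (A/C) in 10 sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
```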

Neighbor joining (NJ) is an algorithm for inferring a branching tree diagram from a distance matrix. It works by successively clustering pairs of taxa together. NJ does not assume a molecular clock, so the contemporary tips of the tree may lie at unequal distances from the root. This makes NJ suitable for datasets whose sequences have evolved at widely varying rates.
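
One clustering step of neighbor joining can be sketched in Python as follows. The pair to join is the one that minimises Q(i,j) = (n - 2) d(i,j) - r(i) - r(j), where r(i) is the sum of all distances from taxon i. This is only an illustration of the standard NJ formulas, not the MEGA 3 implementation. The function name, the data layout and the five-taxon matrix are made up for the example and are not the primate data.

```python
from itertools import combinations

def nj_step(names, d):
    """Perform one neighbor-joining step.

    names: list of taxon labels.
    d: dict mapping frozenset({x, y}) to the distance between x and y.
    Returns (joined pair, their branch lengths to the new node,
             distances from the new node to the remaining taxa).
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")

    def dist(x, y):
        return 0.0 if x == y else d[frozenset((x, y))]

    # r[x]: sum of distances from x to every other taxon
    r = {x: sum(dist(x, y) for y in names) for x in names}

    def q(pair):
        x, y = pair
        return (n - 2) * dist(x, y) - r[x] - r[y]

    a, b = min(combinations(names, 2), key=q)
    len_a = 0.5 * dist(a, b) + (r[a] - r[b]) / (2 * (n - 2))
    len_b = dist(a, b) - len_a
    new_dist = {k: 0.5 * (dist(a, k) + dist(b, k) - dist(a, b))
                for k in names if k not in (a, b)}
    return (a, b), (len_a, len_b), new_dist

names = ["a", "b", "c", "d", "e"]
pairwise = {"ab": 5, "ac": 9, "ad": 9, "ae": 8, "bc": 10,
            "bd": 10, "be": 9, "cd": 8, "ce": 7, "de": 3}
d = {frozenset(key): value for key, value in pairwise.items()}

print(nj_step(names, d))
# (('a', 'b'), (2.0, 3.0), {'c': 7.0, 'd': 7.0, 'e': 6.0})
```

Note that a and b are joined first although d and e have the smallest distance (3): the Q criterion takes the distances to all other taxa into account, and the two branch lengths (2 and 3) are allowed to differ, which is why NJ copes with unequal rates. Repeating the step on the reduced matrix until three taxa remain gives the full tree.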

"Bootstrap is a method that is analogous to cutting the data matrix into individual columns of data and throwing the characters into a hat. A character is then drawn at random from this hat and it becomes the first character of the new data matrix. The character is then replaced in the hat, the hat is shaken and again another character is drawn from the hat. This process is repeated until our new pseudoreplicate is the same size as the original. Some characters will be sampled more than once and some will not be sampled at all. This process is repeated many times (say, 100-1,000) and phylogenetic trees are reconstructed each time. After the bootstrap procedure is finished, a majority-rule consensus tree is constructed from the optimal tree from each bootstrap sample."

The bootstrap support for an internal branch is the number of times (usually given as a percentage of the pseudoreplicates) that the branch was recovered during the bootstrapping procedure. Bootstrap values over 50% are considered valid. In short, the bootstrap estimates how strongly the data support each grouping in the tree, and it proceeds by resampling the columns (characters) of the original data matrix with replacement.

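The resampling step described above can be sketched in Python like this. It covers only the drawing of columns "from the hat" and the counting of support; building a tree from each pseudoreplicate is left out. The function names, the way clades are represented (as frozensets of taxon names) and the toy alignment are assumptions made for illustration.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate of an alignment.

    alignment: dict mapping taxon name to aligned sequence.
    rng: a random.Random instance.
    Columns are drawn with replacement, and the same column indices are
    used for every taxon so that each column stays intact.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

def support(clade, replicate_clades):
    """Percentage of replicate trees that contain the given clade.

    replicate_clades: one set of frozensets (clades) per replicate tree.
    """
    hits = sum(1 for clades in replicate_clades if frozenset(clade) in clades)
    return 100.0 * hits / len(replicate_clades)

toy = {"human": "ACGTAC", "chimp": "ACGTTC", "gorilla": "ATGTTC"}
replicate = bootstrap_alignment(toy, random.Random(1))
print(replicate)  # same taxa and length, columns resampled
```

In a full analysis, a tree would be built from each pseudoreplicate with the same method as the original tree, and `support` would then be applied to the clades of those trees.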

Maximum parsimony (a character-based method) prefers the phylogenetic tree that explains a given data set (the aligned sequences) with the fewest evolutionary events over trees that require more evolutionary events. It follows the principle that simpler explanations are preferred over more complex ones.
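
For one given tree, the minimum number of changes can be counted site by site with Fitch's algorithm. The Python sketch below only shows how two candidate trees are scored and compared; it does not show how MEGA 3 searches for the best tree. The nested-tuple tree format, the function names and the four-taxon toy alignment are assumptions made for illustration.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one alignment column.

    tree: a taxon name (leaf) or a tuple (left_subtree, right_subtree).
    states: dict mapping taxon name to its character at this site.
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch_site(left, states)
    right_states, right_changes = fitch_site(right, states)
    shared = left_states & right_states
    if shared:
        # the children can agree on a state: no extra change needed here
        return shared, left_changes + right_changes
    # no shared state: one change is required on one of the two branches
    return left_states | right_states, left_changes + right_changes + 1

def parsimony_score(tree, alignment):
    """Sum of the minimum number of changes over all sites."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

toy = {"A": "AAG", "B": "AAG", "C": "CTG", "D": "CTA"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_score(tree_1, toy))  # 3
print(parsimony_score(tree_2, toy))  # 5
```

Here tree_1 needs 3 changes and tree_2 needs 5, so maximum parsimony prefers tree_1.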



Another note: the NJplot function "swap nodes" does not alter the data or the tree topology. It only rotates the branches around a node, so it changes the view and nothing else.


(Information from lectures, Google searches and http://www.bioinf.org/molsys/glossary.html)


Hints in MEGA 3

1. Align (export to Mega file)
2. Phylogeny
3. Distances: compute pairwise