Research

Understanding Protein Binding Interactions

Using the methods described above, I have performed several investigations into the determinants of protein-water binding. The Consolv algorithm is a k-nearest-neighbor classifier hybridized with a genetic algorithm for feature selection and extraction. This algorithm can predict conserved water binding between independently solved crystallographic protein structures with ~67% accuracy. Analysis of the features employed in this classification has provided some insight into the physical and chemical determinants of protein solvent binding and of solvation site conservation between ligand-bound and unbound structures. Hybrid classification techniques have also proven useful for identifying other types of protein binding sites, including metal binding sites in metalloenzymes and protein active sites.
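As an illustration of the general idea of pairing a k-nearest-neighbor classifier with a genetic algorithm, the sketch below evolves a binary feature mask and scores each mask by leave-one-out k-NN accuracy. It is not the Consolv implementation: the function names, the truncation selection, the one-point crossover, the mutation rate, and the use of a plain on/off mask are all assumptions chosen for brevity.

```python
import random

def knn_predict(train_X, train_y, x, mask, k=3):
    """Majority vote of the k nearest training points, using only
    the features switched on in mask (squared Euclidean distance)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi, m in zip(a, b, mask) if m)
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

def loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of the masked k-NN classifier."""
    if not any(mask):
        return 0.0  # a mask with no features cannot classify anything
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], mask, k) == y[i]
    return hits / len(X)

def ga_select(X, y, n_features, pop_size=20, generations=30, seed=0):
    """Evolve a 0/1 feature mask (hypothetical, simplified GA).
    Requires n_features >= 2 because of the one-point crossover."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def fitness(mask):
        return loo_accuracy(X, y, mask)

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:              # occasional bit-flip mutation
                j = rng.randrange(n_features)
                child[j] = 1 - child[j]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

Calling `ga_select(X, y, n_features)` with `X` as a list of feature lists and `y` as a list of class labels returns the best mask found. A real-valued weight per feature, rather than an on/off bit, would be a natural extension toward feature extraction.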

Data Mining of Three-Dimensional Protein Structure Information

Another area of current research interest is the use of hybrid EC/classifier techniques, as well as graph theory and exploratory statistical methods, to analyze and understand large databases of three-dimensional protein structure information such as the RCSB Protein Data Bank, and the smaller PDB-select database. Possible areas of exploration include domain identification, secondary structure prediction, and analysis of structure-function relationships.

Drug Lead Screening by Region-based Regression for QSAR Analysis

Recent advances in genomics present an opportunity for significant related developments in computer science research and education. Genomics and other branches of bioinformatics have evolved from the biological aspects of genetics and from the artificial intelligence, database, and algorithm disciplines of computer science. The bioinformatics research group is working to develop region-based regression techniques to improve QSAR analysis of potential drug leads. The eventual goal of this research is to develop empirical potential functions for computational drug screening and docking algorithms.

Molecular Visualization

Molecular visualization programs, which let a user view and manipulate molecular structures, are available on many platforms. PocketMol provides the same functionality on a Pocket PC handheld computer. Using standard Protein Data Bank (PDB) files, the user can move, rotate, and scale a protein to explore its structure and function. The user can choose between a standard backbone view and a simplified view that uses only alpha carbon atoms. Pocket-MolGX uses the Microsoft Game API to provide fast, smooth animation. PocketMol is designed as an aid for those wishing to explore or demonstrate protein structures without access to a full-size computer. It has the basic functionality of its desktop counterparts and, with some further work, will be a suitable alternative to them on a handheld device. PocketMol web page.
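To make the simplified alpha-carbon view concrete, here is a minimal Python sketch (not PocketMol's code) that pulls alpha carbon coordinates out of PDB-format ATOM records and rotates them about the y axis. The function names are hypothetical; the column slices follow the fixed-width PDB ATOM record layout (atom name in columns 13-16, coordinates in columns 31-54).

```python
import math

def alpha_carbons(pdb_lines):
    """Return (x, y, z) tuples for every alpha carbon in the ATOM records.
    HETATM records are skipped, so a calcium ion named CA is not picked up."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    return coords

def rotate_y(coords, degrees):
    """Rotate each point about the y axis by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in coords]
```

With a file on disk, something like `alpha_carbons(open(path))` would yield the trace to draw, and repeated calls to `rotate_y` with a small angle would give a simple spin animation.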

Comparative Protein Structure Modeling

The sequencing of the human genome was a great stride toward modeling our cellular complexes, massive systems whose key players are proteins and DNA. A major bottleneck limiting the modeling process is structure and function annotation for the new genes. Contemporary protein structure prediction algorithms represent the sequence of every protein of known structure with a profile, against which the profile of a protein sequence of unknown structure is compared for recognition. We propose a novel approach to increase the scope and resolution of protein structure profiles. Our technique locates equivalent regions among the members of a structurally similar fold family, and clusters these regions' linkers by structural similarity. Equivalent substructures can then be swapped on the common regions to generate an array of profiles that represent hypothetical structures, supplementing the profiles of known structures. Strategies for a specific implementation are discussed, including application to multiple-template comparative modeling. CMPare web page.

Forensic Analysis of DNA Evidence Data

Cases involving DNA usually involve heinous crimes, including rape, assault, and murder. Alarmingly, fewer than 1% of these cases are reviewed by the defense, meaning that there is a high risk of undiscovered mistakes and, ultimately, a wrongful conviction. Barriers to reviewing STR-DNA evidence are high (e.g., the software needed even to open the files provided by crime laboratories in discovery costs over $18,000), and there are not enough experts available to review every case. Those experts who are available simply do not have enough time to analyze as many cases as they would like. Running the DNA analysis software is an involved process that takes several hours, and only then can the expert interpret the evidence. The Genophiler software, developed by WSU's BiRG and commercialized by Forensic Bioinformatic Services, offers a solution by automatically running the analysis software with very little setup time. Experts can evaluate more cases and have more and better opportunities to find serious problems such as unreported secondary contributors, failed controls, and overstated interpretations of test results. Forensic DNA evidence is a new technology, yet it has already led to the exoneration of over 100 people after conviction. How many others are currently in prison due to inadequate review of DNA test results is still an open question. FBS web page.

Repetitive Elements as Time-Series Genomic Data

Short interspersed repeats (SINEs) account for a significant portion of the "junk DNA" within the mammalian genome. Thousands of copies of SINE subfamilies are scattered essentially randomly through the human genome. Although the copies of each repeat subfamily are identical at the time of their insertion, they become subject to individual substitutions after insertion. As the relative time of insertion is known for many of these repeats, they provide a sizeable number of time-series data points for studying substitution effects in a variety of genomic contexts. Herein, we summarize the factors relevant to the use of the Alu family of repeats. Furthermore, we demonstrate both the utility and some potential pitfalls associated with the use of repetitive elements in bioinformatic assays. Alu web page.

Designed by Paul Anderson and Matthew Gerald
Last modified: 01Jun2005