Genomic Data Mining Enhanced by Symbolic Manipulation of Boolean Functions

Dr. Sungroh Yoon

Research Fellow, Stanford University, USA

2005. 9. 8

Today, more and more large-scale genomic data sets are being produced by various high-throughput technologies, and genomic data mining has never been more important. Clustering is an unsupervised learning technique that has long been popular in data analysis. Although there is a mature statistical literature on clustering, new types of genomic data such as gene expression data have sparked the development of many new methods. In particular, biclustering clusters the rows and columns of a data matrix simultaneously, identifying patterns that appear as (possibly overlapping) submatrices. Although this approach has clear advantages over conventional clustering techniques, developing an efficient biclustering algorithm has been challenging, since the biclustering problem is inherently intractable and hard to approximate. In the first part of this dissertation, a novel biclustering algorithm based upon the symbolic manipulation of Boolean functions is presented. This algorithm exploits zero-suppressed binary decision diagrams (ZBDDs) to implicitly represent and manipulate the massive intermediate data that arise in the biclustering process. By leveraging ZBDDs, the proposed algorithm can find all the biclusters that satisfy specific input parameters.
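
To make the notion of a bicluster concrete, the following is a minimal Python sketch and not the algorithm presented in the talk: it uses no ZBDDs and enumerates row subsets by brute force, which is exactly the exponential cost that an implicit representation is meant to avoid. It assumes the data matrix has already been binarized and treats a bicluster as a maximal all-ones submatrix. The function name maximal_biclusters and the parameters min_rows and min_cols are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def maximal_biclusters(matrix, min_rows=2, min_cols=2):
    """Brute-force enumeration of maximal all-ones submatrices.

    Illustrative only: this is NOT the ZBDD-based method from the abstract.
    Assumes a binary matrix (list of equal-length 0/1 rows) and min_rows >= 1.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    # For each row, the set of columns that hold a 1.
    row_cols = [
        frozenset(c for c in range(n_cols) if matrix[r][c])
        for r in range(n_rows)
    ]
    found = set()
    for k in range(min_rows, n_rows + 1):
        for rows in combinations(range(n_rows), k):
            # Columns shared by every chosen row.
            cols = frozenset.intersection(*(row_cols[r] for r in rows))
            if len(cols) < min_cols:
                continue
            # Extend the row set with every row that covers these columns,
            # so the resulting (rows, cols) pair cannot be enlarged.
            closed_rows = frozenset(
                r for r in range(n_rows) if cols <= row_cols[r]
            )
            found.add((closed_rows, cols))
    return sorted((sorted(r), sorted(c)) for r, c in found)


# Toy binarized matrix: rows could be genes, columns could be conditions.
example = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]

if __name__ == "__main__":
    for rows, cols in maximal_biclusters(example):
        print("rows", rows, "cols", cols)
```

On this toy matrix the sketch reports four biclusters, several of which share rows and columns, illustrating the overlapping submatrices mentioned above. The loop over row subsets grows exponentially with the number of rows, so this form is usable only on very small inputs.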

The second part discusses the application of this algorithm to various genomic data mining tasks, such as analyzing gene expression data, linking clinical traits with related genes, and predicting microRNA regulatory modules. The experimental results demonstrate that the proposed method outperforms the alternative techniques tested in terms of response time, the number of biclusters found, and, more importantly, how closely the discovered biclusters agree with established biological knowledge.


Last update: September 8, 2005