A framework towards computational discovery of disease sub-types and associated (sub-)biomarkers.
Biomarker-related patient data are generally assessed in order to determine a relevant but generalized subset of the biomarkers. However, such an analysis fails to identify specific sub-groups of patients or their corresponding subsets of biomarkers. This paper therefore proposes a novel framework that is capable of discovering disease sub-groups (types) and their associated subsets of biomarkers, which is expected to enable the discovery of personalized biomarker sets. The framework is based on a histogram of the Euclidean distances between the samples in a given data set. The t-test is used to select subset(s) of the biomarkers, whereas classification is performed by means of k-nearest neighbor (k-NN), support vector machine (SVM) and naive Bayes (NBayes) classifiers. The methods are assessed using leave-one-out cross-validation. As a case study, the method is applied to the analysis of male hypertension microarray data consisting of 159 patients and 22184 gene expressions. The method has helped identify specific sub-groups of the patients and their corresponding biomarker subsets. The results suggest that generalized biomarker subsets are not representative of the disease, and that more focus should be placed on the sub-groups of patients and the biomarker subsets identified through the proposed approach. In particular, the threshold values applied over the histogram are observed to be crucial for discovering both the sample subsets and the biomarker subsets, and can therefore be used to determine the complexity level of the study.
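The pipeline summarized above (distance histogram, threshold-based sub-grouping, t-test feature ranking, classification under leave-one-out cross-validation) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. Several choices are assumptions made here for illustration: the function names are hypothetical; samples are grouped by linking any pair closer than a threshold read off the histogram and taking connected components (one plausible reading of "threshold values over the histogram"); features are ranked by the absolute Welch t statistic rather than by p-value; and only k-NN is shown, with SVM and naive Bayes omitted for brevity. The toy data are made up.

```python
import math
from statistics import mean, variance


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_histogram(samples, n_bins=10):
    """Histogram (bin edges, counts) of all pairwise Euclidean distances."""
    d = [euclidean(samples[i], samples[j])
         for i in range(len(samples)) for j in range(i + 1, len(samples))]
    lo, hi = min(d), max(d)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal distances
    counts = [0] * n_bins
    for v in d:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1  # clamp max value
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


def subgroups(samples, threshold):
    """HYPOTHETICAL grouping rule: link samples closer than `threshold`
    and return the connected components (single linkage cut at one level)."""
    n = len(samples)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if euclidean(samples[i], samples[j]) < threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def welch_t(a, b):
    """Welch's two-sample t statistic; each group needs >= 2 values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_features(X, y, k):
    """Indices of the k features with the largest |t| between labels 0 and 1."""
    scores = []
    for f in range(len(X[0])):
        a = [x[f] for x, lab in zip(X, y) if lab == 0]
        b = [x[f] for x, lab in zip(X, y) if lab == 1]
        scores.append((abs(welch_t(a, b)), f))
    return [f for _, f in sorted(scores, reverse=True)[:k]]


def loocv_knn_accuracy(X, y, k_features, k_neighbors=1):
    """Leave-one-out accuracy of k-NN. Feature selection is repeated inside
    each fold so the held-out sample never influences the t-test ranking."""
    correct = 0
    for i in range(len(X)):
        Xtr, ytr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(Xtr, ytr, k_features)
        query = [X[i][f] for f in feats]
        dists = sorted((euclidean([x[f] for f in feats], query), lab)
                       for x, lab in zip(Xtr, ytr))
        votes = [lab for _, lab in dists[:k_neighbors]]
        pred = max(set(votes), key=votes.count)  # majority vote
        correct += pred == y[i]
    return correct / len(X)


if __name__ == "__main__":
    # Made-up toy data: feature 0 separates the classes; features 1-2 are noise.
    X = [[0.0, 1.0, 0.2], [0.2, 0.9, 0.1], [0.1, 1.1, 0.3],
         [5.0, 1.0, 0.2], [5.2, 0.9, 0.1], [5.1, 1.1, 0.3]]
    y = [0, 0, 0, 1, 1, 1]
    edges, counts = distance_histogram(X, n_bins=5)
    print(counts)                                   # [6, 0, 0, 0, 9]
    print(subgroups(X, threshold=1.0))              # [[0, 1, 2], [3, 4, 5]]
    print(select_features(X, y, 1))                 # [0]
    print(loocv_knn_accuracy(X, y, k_features=1))   # 1.0
```

In the toy run, the empty middle bins of the histogram suggest a threshold around 1.0, which splits the samples into two sub-groups. Re-running the t-test within each fold is a deliberate choice: selecting features on the full data set before leave-one-out validation would let the held-out sample leak into the ranking and inflate the accuracy estimate.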
Citation: Kurnaz, M.N. & Seker, H. (2013) A framework towards computational discovery of disease sub-types and associated (sub-)biomarkers. Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE, pp. 4074-4077.