Discriminative Margin Clustering

Kamesh Munagala, Rob Tibshirani and Patrick O. Brown


Tumor and Normal Data Set

    The tumor and normal data set from the microarray experiments is provided in PCL format. We will post the raw microarray data from the experiments once the paper is accepted for publication.
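For readers who want to load the file themselves, the following is a minimal sketch of a PCL reader. It assumes the common PCL layout: tab-delimited text, a header row whose first three columns are the gene identifier, NAME and GWEIGHT, an optional EWEIGHT row, and empty cells for missing values. The function name `read_pcl` and the toy sample text are illustrative and are not part of the project's C code.

```python
import csv
import io


def read_pcl(handle):
    """Parse PCL-style text into (sample_names, gene_ids, matrix).

    Assumed layout: columns 0-2 are identifier, NAME and GWEIGHT,
    and expression values start at column 3. Empty cells become None.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    samples = header[3:]
    genes, matrix = [], []
    for fields in reader:
        # Skip blank lines and the per-experiment weight row.
        if not fields or fields[0] == "EWEIGHT":
            continue
        genes.append(fields[0])
        values = fields[3:]
        # Pad rows whose trailing empty cells were trimmed.
        values += [""] * (len(samples) - len(values))
        matrix.append([float(v) if v.strip() else None for v in values])
    return samples, genes, matrix


# Toy input with two samples and two genes (made-up values).
example = (
    "ID\tNAME\tGWEIGHT\tS0\tS1\n"
    "EWEIGHT\t\t\t1\t1\n"
    "G0\tgene zero\t1\t0.5\t\n"
    "G1\tgene one\t1\t-1.2\t2.0\n"
)
samples, genes, matrix = read_pcl(io.StringIO(example))
```

Keeping missing cells as `None` rather than a numeric placeholder leaves the choice of imputation method to a later step.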

    We impute missing values using the K-nearest-neighbor method. Any values still missing after imputation are set to very low expression values. Missing values must be imputed carefully, because different imputation methods can produce different results. We suggest setting any leftover missing values to the baseline value, zero. We also recommend repeating the analysis with all negative values set to zero, to reduce the impact of genes with large negative expression values in certain tissues.
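The preprocessing described above can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used for the paper: rows are treated as genes, neighbors are ranked by root-mean-square difference over the columns both rows have observed, and a missing entry is filled with the mean of the K nearest rows that have a value in that column. The names `knn_impute` and `fill_and_clamp`, the choice `k=2`, and the toy matrix are hypothetical.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries from the k nearest rows; unfillable entries stay None."""
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Rank the other rows by distance over jointly observed columns.
        dists = []
        for m, other in enumerate(rows):
            if m == i:
                continue
            shared = [(a, b) for a, b in zip(row, other)
                      if a is not None and b is not None]
            if not shared:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
            dists.append((d, m))
        dists.sort()
        for j in missing:
            # Use only neighbors that actually observed column j.
            donors = [rows[m][j] for _, m in dists if rows[m][j] is not None][:k]
            if donors:
                out[i][j] = sum(donors) / len(donors)
    return out


def fill_and_clamp(rows, baseline=0.0, clamp_negative=False):
    """Set leftover missing values to the baseline; optionally zero out negatives."""
    result = []
    for row in rows:
        new_row = []
        for v in row:
            if v is None:
                v = baseline
            elif clamp_negative and v < 0:
                v = 0.0
            new_row.append(v)
        result.append(new_row)
    return result


# Toy matrix (made-up values): the last column is missing everywhere,
# so it cannot be imputed and falls back to the baseline.
toy = [
    [1.0, 2.0, None, None],
    [1.0, 2.0, 3.0, None],
    [1.2, 2.2, 5.0, None],
    [9.0, 9.0, -4.0, None],
]
imputed = knn_impute(toy, k=2)
cleaned = fill_and_clamp(imputed, clamp_negative=True)
```

Running the analysis once with `clamp_negative=False` and once with `clamp_negative=True` corresponds to the recommended comparison between the original values and the version with negative values set to zero.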

        List of Tumor samples

        List of Normal samples

        List of genes

    For convenience, we refer to entries by their IDs (integers starting from 0) rather than by experiment ID or image ID. These IDs appear in the first column of each list above. For example, the normal samples have IDs in the range 0..103.

        List of tumor IDs classified according to known physiological type.