Tumor and Normal dataset from microarray experiments in PCL
format. We will post the raw microarray data from the
experiments once the paper is accepted for publication.
We impute missing values using the K-nearest-neighbor
method. Any values still missing after imputation are set to
very low expression values. Imputation must be done
carefully, because different imputation methods can produce
different results. We suggest setting any remaining missing
values to the baseline value, zero. We also recommend
repeating the analysis with all negative values set to zero,
to reduce the impact of genes with large negative expression
values in certain tissues.
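The imputation steps described above can be illustrated with
a minimal sketch. It is not the code used for this dataset:
the function names (knn_impute, clip_negatives), the choice
of k, the distance measure (root-mean-square difference over
shared columns), and the use of None for a missing value are
all assumptions made for illustration. Rows are genes and
columns are samples.

```python
import math


def knn_impute(rows, k=2, fallback=0.0):
    """Fill None entries in each gene row from its k nearest gene rows.

    Illustrative sketch only: k, the distance measure, and the
    fallback value are assumptions, not the settings used for
    the dataset.
    """
    def distance(a, b):
        # Root-mean-square difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Rank all other genes by their distance to this gene.
        neighbors = sorted(
            (distance(row, other), idx)
            for idx, other in enumerate(rows) if idx != i
        )
        for j in missing:
            # Use the k nearest genes that have an observed value in column j.
            donors = [rows[idx][j] for d, idx in neighbors
                      if d != math.inf and rows[idx][j] is not None][:k]
            # Values that cannot be imputed fall back to the baseline (zero).
            filled[i][j] = sum(donors) / len(donors) if donors else fallback
    return filled


def clip_negatives(rows):
    """Set all negative expression values to zero for the repeat analysis."""
    return [[max(v, 0.0) for v in r] for r in rows]


# Made-up toy matrix, not real expression data.
toy = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [-4.0, None, None],
]
imputed = knn_impute(toy, k=2)
clipped = clip_negatives(imputed)
```

Running the analysis once on imputed and once on clipped
gives a simple check of how much the results depend on
large negative values.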
List of Tumor samples
List of Normal samples
List of genes
For the sake of convenience, we will use the IDs (which are
integers starting from 0) instead of the experiment ID or
image ID. These IDs are indicated in the first column of the
lists given above. For example, the normal samples are given
IDs in the range 0..103.
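As a rough sketch of how these integer IDs could be used, the
snippet below builds a lookup from the ID in the first column
to the remaining columns of a list. The tab-separated layout,
the function name build_id_lookup, and the sample rows are
assumptions for illustration only; the actual list files may
be laid out differently.

```python
def build_id_lookup(lines):
    """Map the integer ID in the first column to the remaining columns.

    Assumes tab-separated rows; this layout is hypothetical.
    """
    lookup = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        lookup[int(fields[0])] = fields[1:]
    return lookup


# Made-up placeholder rows, not entries from the real lists.
sample_lines = ["0\tEXP_A\tIMG_A", "1\tEXP_B\tIMG_B"]
id_lookup = build_id_lookup(sample_lines)
```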
List of tumor IDs classified according to known
physiological type.