Code
C code for finding expanded feature sets. The program's
inputs and outputs are as follows:
- "tumor.txt" contains the gene expression data for the
tumor samples. The entry in row i, column j is the
expression value of the gene with ID j-1 in the tumor
with ID i-1. The file must contain no missing values.
- "normal.txt" contains the gene expression data for the
normal samples. The entry in row i, column j is the
expression value of the gene with ID j-1 in the normal
sample with ID i-1. The file must contain no missing values.
- "node.txt" contains the sets of tumor IDs for which
expanded feature sets are to be computed, one set per line.
For example, "1 2<newline>4 5<newline>9 15" specifies
3 sets of tumors: (1,2), (4,5) and (9,15).
- "val.txt" contains the combined margin for each of the
sets in "node.txt", one margin value per line. "node.txt"
and "val.txt" therefore have the same number of lines.
- Command line input: the number of tumors, the number of
normal samples, and the number of genes.
- Standard output: for each set of tumors, the expanded
feature set produced by the quadratic programming method.
Each gene ID is printed along with its weight. The output
can be saved to a text file.
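The input formats above can be illustrated with a minimal C sketch. It is not the distributed code: the function names read_matrix, expr, and parse_id_line are hypothetical, and the sketch assumes that values are separated by whitespace and that rows and columns are counted from 1, so that IDs start at 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a rows x cols matrix of whitespace-separated
   numbers (assumed separator) into a malloc'd row-major array.
   Returns NULL on bad sizes, allocation failure, or if fewer than
   rows * cols numbers can be read. A missing value is only detected
   as a shortfall in the total count, not at the row where it occurs. */
double *read_matrix(FILE *fp, int rows, int cols)
{
    if (fp == NULL || rows <= 0 || cols <= 0)
        return NULL;

    size_t total = (size_t)rows * (size_t)cols;
    double *m = malloc(total * sizeof *m);
    if (m == NULL)
        return NULL;

    for (size_t k = 0; k < total; k++) {
        if (fscanf(fp, "%lf", &m[k]) != 1) {
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Hypothetical accessor: expression value of the gene with ID gene_id
   in the sample with ID sample_id. Under the assumed 0-based IDs,
   sample_id is the row offset and gene_id the column offset. */
double expr(const double *m, int cols, int sample_id, int gene_id)
{
    assert(m != NULL && cols > 0 && sample_id >= 0);
    assert(gene_id >= 0 && gene_id < cols);
    return m[(size_t)sample_id * (size_t)cols + (size_t)gene_id];
}

/* Hypothetical helper: parse one line of "node.txt" (one set of tumor
   IDs) into ids[]. Returns the number of IDs stored, at most max_ids. */
int parse_id_line(const char *line, int *ids, int max_ids)
{
    int n = 0;
    while (n < max_ids) {
        char *end;
        long v = strtol(line, &end, 10);
        if (end == line)   /* no further number on this line */
            break;
        ids[n++] = (int)v;
        line = end;
    }
    return n;
}
```

With this layout, a line of "node.txt" read with fgets can be passed to parse_id_line, and each returned ID can be used directly as the sample_id argument of expr on the tumor matrix.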
This software also requires the CPLEX libraries. The code
contains a parameter EPS, which specifies how much the margin
is relaxed from the maximum margin. The default value is 0.4;
it can be changed by editing the line "#define EPS 0.4"
and recompiling.
The maximum allowed number of tissues is 300, and the
maximum number of genes is 15000.
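These limits can be checked before any data is read. The following is a sketch under stated assumptions rather than the program's actual argument handling: MAX_TISSUES, MAX_GENES, parse_count, and sizes_ok are hypothetical names, and the sketch assumes that the limit of 300 tissues applies to tumor and normal samples combined, which is the stricter reading of the text.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MAX_TISSUES 300    /* limit stated in the text; name is hypothetical */
#define MAX_GENES   15000  /* limit stated in the text; name is hypothetical */

/* Hypothetical helper: parse a strictly positive decimal count from a
   command-line argument. Returns -1 if the string is not a clean
   positive integer that fits in an int. */
int parse_count(const char *s)
{
    assert(s != NULL);
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v <= 0 || v > INT_MAX)
        return -1;
    return (int)v;
}

/* Hypothetical check: returns 1 if the problem size fits the stated
   limits, 0 otherwise. Assumption: "tissues" means tumor plus normal
   samples, so the sum is compared against MAX_TISSUES. */
int sizes_ok(int n_tumor, int n_normal, int n_genes)
{
    if (n_tumor <= 0 || n_normal <= 0 || n_genes <= 0)
        return 0;
    /* Check each count first so the sum below cannot overflow. */
    if (n_tumor > MAX_TISSUES || n_normal > MAX_TISSUES)
        return 0;
    if (n_tumor + n_normal > MAX_TISSUES)
        return 0;
    return n_genes <= MAX_GENES;
}
```

A caller would apply parse_count to each of the three command-line arguments (number of tumors, number of normal samples, number of genes) and stop with an error message if any result is -1 or if sizes_ok returns 0.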