Guide for Integrating Analysis Scripts into Chipster

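Integrating a script takes three steps: making the R-script Chipster-compatible, writing a VVSADL description for it, and adding the script to the analyser.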

Making R-scripts Chipster-compatible

Chipster uses regular R-scripts. The only restriction is that interactive functions cannot be used.

Before running the script, the system runs the following initialisation snippet:

setwd(".")

The script should write its results in table format to the file named in the description header, for example:

write.table(mytable, file="results.txt", quote=FALSE, col.names=FALSE, row.names=FALSE)
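
As a minimal sketch of a non-interactive script body, the following builds a small table and writes it with the same write.table call as above. The table contents and column names are made up for illustration; only the output file name results.txt comes from the example.

```r
# Hypothetical result table; a real script would compute this from its inputs.
mytable <- data.frame(
  probe = c("probe_a", "probe_b", "probe_c"),
  expr  = c(1.5, 2.25, 0.75),
  stringsAsFactors = FALSE
)

# No prompts or menus that wait for the user: the result goes straight to a file.
write.table(mytable, file = "results.txt", quote = FALSE,
            col.names = FALSE, row.names = FALSE)

# Read the file back to see what was written.
# write.table separates columns with a single space unless sep is given.
written <- readLines("results.txt")
print(written)
```

If the consumer of the file expects tab-separated values, sep = "\t" can be added to the write.table call.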

Writing VVSADL-description

R-scripts must be described using a Chipster-specific description format called VVSADL (Very Very Simple Analysis Description Language). A VVSADL snippet is added in comments as a header to an R-script file. The specification for VVSADL can be found in the Javadoc documentation of the class fi.csc.microarray.VVSADLSyntax. A reasonably up-to-date copy of the document should be available at http://ocicat.csc.fi/microarray/javadocs/fi/csc/microarray/VVSADLSyntax.html.

Here is an example of a VVSADL snippet to place before the actual R code:

# ANALYSIS Test/test (Just a test analysis for development)
# INPUT CDNA microarray[...].txt OUTPUT results.txt, messages.txt
# PARAMETER value1 INTEGER FROM 0 TO 200 DEFAULT 10 (the first value of the result set)
# PARAMETER value2 DECIMAL FROM 0 TO 200 DEFAULT 20 (the second value of the result set)
# PARAMETER value3 DECIMAL FROM 0 TO 200 DEFAULT 30.2 (the third value of the result set)
# PARAMETER percentage PERCENT DEFAULT 34 (how much we need)
# PARAMETER method [linear, logarithmic, exponential] DEFAULT logarithmic (which method to apply)
# PARAMETER genename STRING DEFAULT at_something (which gene we are interested in)
# PARAMETER key COLNAME (which column we use as a key)

The analysis named "test" is added to the category "Test". Multiword names must be put in quotation marks. The analysis has the description "Just a test analysis for development". It takes a set of input files called microarray0.txt, microarray1.txt, etc. in cDNA format (tab-separated values); the script can assume that these files exist in the working directory when the analyser calls it. It writes results to the file "results.txt" and extra information about the execution to the file "messages.txt". The analyser assumes that both files exist after the script has run.
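
A script body matching this header could look roughly like the sketch below. The demo setup at the top, the variable names and the per-file mean are illustrative assumptions; only the file names and the tab-separated, header-less input format come from the description above.

```r
# Demo setup only: create two tiny tab-separated input files in a scratch directory,
# standing in for the files the analyser would place in the working directory.
old_dir <- setwd(tempdir())
writeLines(c("10\t2\t20\t4", "30\t6\t40\t8"), "microarray0.txt")
writeLines(c("50\t1\t60\t3", "70\t5\t80\t7"), "microarray1.txt")

# Script body: find the numbered inputs and read them without a header.
input_files <- sort(list.files(pattern = "^microarray[0-9]+\\.txt$"))
arrays <- lapply(input_files, read.table, header = FALSE, sep = "\t")

# Illustrative result: mean of the first column of each input file.
results <- data.frame(
  file      = input_files,
  mean_col1 = sapply(arrays, function(a) mean(a[[1]])),
  stringsAsFactors = FALSE
)

# Both declared outputs must exist when the script finishes.
write.table(results, file = "results.txt", quote = FALSE, sep = "\t",
            col.names = FALSE, row.names = FALSE)
writeLines(paste("Read", length(input_files), "input file(s)"), "messages.txt")

outputs_exist <- file.exists(c("results.txt", "messages.txt"))
setwd(old_dir)
```

Note that sort() orders file names as text, so with ten or more inputs microarray10.txt would come before microarray2.txt.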

Then we have to define the parameters. The analyser makes them available to the script as variables. So, for example, the script can assume that it has an integer variable called value1 with a value between 0 and 200. Parameters must be given a name and a type, can be given a range and a default value, and must be given a description (in parentheses, as in the example).
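
As a rough sketch of how a script might use such variables, the following reads value1 and the enumerated method parameter from the example header. The fallback assignments, the name transform_fn and the mapping from method names to R functions are assumptions made for illustration.

```r
# Fallback values so the sketch also runs outside Chipster; inside Chipster
# the analyser is expected to have defined these variables already.
if (!exists("value1", inherits = FALSE)) value1 <- 10
if (!exists("method", inherits = FALSE)) method <- "logarithmic"

# Choose a transformation from the enumerated parameter (hypothetical mapping).
transform_fn <- switch(method,
  linear      = identity,
  logarithmic = log2,
  exponential = exp,
  stop("Unknown method: ", method)
)

# Hypothetical use of the integer parameter: transform the numbers 1..value1.
transformed <- transform_fn(seq_len(value1))
```

The fallbacks are only a convenience for trying the script by hand and could be dropped from the version that is deployed.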

Everything should be in the same order as in the example snippet. So, for example, parameters have to be described after input/output. Only the first line (ANALYSIS) is compulsory.
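
To make the ordering rule concrete, here is a small sketch that pulls the leading comment block out of a script and checks the keyword order. It is not the Chipster validator; the function name read_vvsadl_header and the checks are made up for illustration, and the sketch assumes the header is the unbroken run of comment lines at the top of the file.

```r
# Return the VVSADL lines found in the leading comment block of a script.
read_vvsadl_header <- function(script_lines) {
  is_comment <- grepl("^#", script_lines)
  # Number of comment lines before the first non-comment line.
  n_header <- if (all(is_comment)) length(script_lines) else which(!is_comment)[1] - 1
  sub("^#\\s*", "", script_lines[seq_len(n_header)])
}

# Hypothetical script content: a shortened header followed by R code.
script_lines <- c(
  "# ANALYSIS Test/test (Just a test analysis for development)",
  "# INPUT CDNA microarray[...].txt OUTPUT results.txt, messages.txt",
  "# PARAMETER value1 INTEGER FROM 0 TO 200 DEFAULT 10 (the first value of the result set)",
  "results <- value1 * 2"
)

header   <- read_vvsadl_header(script_lines)
keywords <- sub(" .*$", "", header)

# The ANALYSIS line is compulsory and comes first.
stopifnot(keywords[1] == "ANALYSIS")

# The remaining lines must follow the order ANALYSIS, INPUT, PARAMETER.
order_ok <- !is.unsorted(match(keywords, c("ANALYSIS", "INPUT", "PARAMETER")))
```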

VVSADL Details

Input/output types:

    * CDNA
          o result of a cDNA scan
          o must not have a header
          o columns CH1I_MEAN, CH1B_MEDIAN, CH2I_MEAN, CH2B_MEDIAN 
    * AFFY
          o result of an Affy scan
          o must have a header
          o columns X, Y, MEAN, STDV, NPIXELS 
    * EXPRS
          o expression values, calculated from scan results
          o must not have a header
          o column EXPR 
    * GENE_EXPRS
          o expression values with associated probeset names
          o must not have a header
          o intended for Affy data; could possibly be used for cDNA data as well
          o columns PROBESET_ID, EXPR 
    * GENENAMES
          o probeset names
          o column PROBESET_ID 

Column names can often be derived automatically (exactly when is still to be specified), in which case they do not need to be given. This is the case with the result files of analysis scripts.
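
For header-less types such as GENE_EXPRS, the script can attach the column names itself when reading the file. The sketch below assumes the file is tab-separated, which the text above states only for the cDNA format; the file contents are made-up example values.

```r
# Hypothetical GENE_EXPRS content: no header line, two tab-separated columns.
gene_exprs_file <- tempfile(fileext = ".txt")
writeLines(c("1007_s_at\t8.21", "1053_at\t5.47"), gene_exprs_file)

# Supply the column names listed for the type, since the file has no header.
gene_exprs <- read.table(gene_exprs_file, header = FALSE, sep = "\t",
                         col.names = c("PROBESET_ID", "EXPR"),
                         stringsAsFactors = FALSE)
```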

Parameter types:

    * INTEGER
          o integer value 
    * DECIMAL
          o decimal value 
    * PERCENT
          o integer between 0 and 100
          o might be removed in the future if there is no need for it
    * STRING
          o a free string value 
    * [val1, val2, val3]
          o enumeration type
          o possible values are given in square brackets
    * COLNAME
          o a column name present in the input matrix
                + in case of multiple inputs, present in all of them 
          o can also be empty 
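
The constraints of some of these types can be written down as simple checks. The sketch below is only an illustration of the rules listed above; the function names are made up, and whether Chipster itself checks values before running a script is not covered here.

```r
# PERCENT: a single whole number between 0 and 100.
check_percent <- function(x) {
  is.numeric(x) && length(x) == 1 && !is.na(x) &&
    x == round(x) && x >= 0 && x <= 100
}

# Enumeration: a single string that is one of the listed values.
check_enum <- function(x, allowed) {
  is.character(x) && length(x) == 1 && x %in% allowed
}

# COLNAME: may be empty; otherwise the column must be present in every input.
check_colname <- function(x, inputs) {
  if (identical(x, "")) return(TRUE)
  all(vapply(inputs, function(df) x %in% colnames(df), logical(1)))
}

# Hypothetical inputs: two tables that share only the column PROBESET_ID.
inputs <- list(
  data.frame(PROBESET_ID = "a", EXPR = 1),
  data.frame(PROBESET_ID = "b", OTHER = 2)
)

percent_ok   <- check_percent(34)
enum_ok      <- check_enum("logarithmic", c("linear", "logarithmic", "exponential"))
shared_ok    <- check_colname("PROBESET_ID", inputs)
nonshared_ok <- check_colname("EXPR", inputs)
```

Here nonshared_ok is FALSE because EXPR is missing from the second table.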

Validating Annotated R-Scripts

A validator is provided in the Chipster distribution. You can run it with the command line parameter rcheck, followed by the script name. So, for example:

java -jar microarray-0.0.2.jar rcheck my_scripts/script.R

Adding R-scripts into Analyser

When made publicly available, scripts should be added to scripts/R in the project directory (see ProjectDirectoryStructure).

Scripts are configured (added to the analyser) in a file called nami.properties on the analyser node. This file is created from the default settings template data/nami_default.properties, so update the template as well to make the addition permanent.