Phenodata is simply a description of your experimental setup. In other words, you specify the treatments, time points, replicates, etc. for your data set using the phenodata.
Unless you fill in at least the group column of the phenodata, you will not be able to perform most of the statistical analyses. If you are aiming at a more thorough analysis, you should fill in the rest of the phenodata as well and describe your experiment in detail.
The phenodata editor becomes available when a phenodata has been linked to or generated for a normalized data set. This typically happens when you normalize your raw data, but it can also happen when you import prenormalized data and generate an empty phenodata for it, or when you link the data set to an existing phenodata, for example one that was imported together with the prenormalized data.
The phenodata editor can be opened by first selecting the phenodata in the workflow view, and then selecting Phenodata editor from the visualization dropdown menu.
An empty phenodata is always marked with an exclamation mark.
Phenodata editor contains a few locked fields, typically sample (the chip name in Chipster), original name (name of the original data file) and chiptype (which chip was used for the experiment). Other fields are customizable. Phenodata contains one row per array (uploaded file) in the experiment.
The minimal requirement for a filled-in phenodata is that the group column is filled with numbers. Click a cell in the group column with the left mouse button, type in a number, and press Enter. This fills in the group description for one sample. Repeat the process for all arrays in the experiment. Only numbers are allowed in the cells, and every cell in the column must be filled; otherwise the analyses will stop with an error message.
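The two rules above (only numbers, no empty cells) can be checked before running an analysis. The following is a minimal sketch, not part of Chipster: the function name `validate_group_column`, the representation of phenodata rows as Python dictionaries, and the sample file names are all assumptions made for illustration. It also assumes that "number" means a non-negative whole number.

```python
def validate_group_column(rows):
    """Return a list of problems found in the 'group' column.

    `rows` is a list of dicts, one per array, mimicking phenodata rows
    (hypothetical representation). An empty list means every cell in
    the group column holds a whole number.
    """
    problems = []
    for index, row in enumerate(rows, start=1):
        sample = row.get("sample", "?")
        value = str(row.get("group", "")).strip()
        if value == "":
            problems.append(f"row {index} ({sample}): group is empty")
        elif not value.isdigit():
            problems.append(f"row {index} ({sample}): '{value}' is not a number")
    return problems


# Hypothetical phenodata with one empty and one non-numeric group cell.
phenodata = [
    {"sample": "microarray001.cel", "group": "1"},
    {"sample": "microarray002.cel", "group": "2"},
    {"sample": "microarray003.cel", "group": ""},
    {"sample": "microarray004.cel", "group": "ctrl"},
]

for problem in validate_group_column(phenodata):
    print(problem)
```

Here the first two rows pass, while the third and fourth rows are reported, which corresponds to the situations in which an analysis would stop with an error.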
Once you have filled in the group column, the exclamation mark will disappear.
If you have a simple setup, for example a comparison of two groups without any other information on the samples, it is sufficient to fill in the group column only. If you have more information on the samples, it can be taken into account by coding it into additional columns in the phenodata.
New columns can be added to the phenodata using the tool on the right side of the phenodata editor. Type in a name for the new column and click the Add button to create a column that can then be filled in just like the group column.
Columns can also be deleted from the phenodata. Select a column from the drop-down menu and press the Remove button to delete it permanently.
Every array in this experimental setup contains a reference and a treatment. The main interest is comparing the treatment to the reference (control). In such a setup, all arrays are coded with 1 in the group column. This tells Chipster to treat them as a single group. The statistical analysis could be, for example, a one-group t-test.
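To illustrate what a one-group t-test does with such data, here is a minimal sketch in plain Python. It is not Chipster's implementation. It assumes that each array yields one log ratio (treatment versus reference) per gene and that the test asks whether the mean log ratio differs from zero; the function name and the values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_group_t_statistic(log_ratios, expected=0.0):
    """t statistic for testing whether the mean of `log_ratios` equals `expected`.

    Uses the sample standard deviation (n - 1 in the denominator) and
    needs at least two values with some variation between them.
    """
    n = len(log_ratios)
    standard_error = stdev(log_ratios) / sqrt(n)
    return (mean(log_ratios) - expected) / standard_error


# Hypothetical log ratios for one gene on three arrays, all in group 1.
log_ratios = [1.0, 2.0, 3.0]
t_value = one_group_t_statistic(log_ratios)
print(f"t = {t_value:.3f} with {len(log_ratios) - 1} degrees of freedom")
```

A large absolute t value suggests that the treatment differs consistently from the reference across the arrays.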
Some of the samples belong to the control group, and some other samples to a treatment group. Control samples should be coded with 1 and treatment samples with 2 in the group column. Then Chipster knows to compare treatment to control (control is coded with a smaller number than the treatment). If the control is coded with a larger number than the treatment, then the comparison is control against treatment. Since this makes a difference when interpreting the results, it is worth paying attention to.
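The effect of the coding on the direction of the comparison can be sketched as follows. This is only an illustration of the sign flip, not the calculation Chipster performs: it assumes that the group with the larger number is compared against the group with the smaller number, as described above, and uses hypothetical expression values for a single gene.

```python
from statistics import mean


def group_difference(values, groups):
    """Mean of the higher-coded group minus mean of the lower-coded group.

    Expects exactly two distinct group numbers in `groups`.
    """
    low, high = sorted(set(groups))
    low_values = [v for v, g in zip(values, groups) if g == low]
    high_values = [v for v, g in zip(values, groups) if g == high]
    return mean(high_values) - mean(low_values)


# Hypothetical log2 expression values: two control arrays, then two treated arrays.
expression = [7.0, 7.5, 9.0, 9.5]

control_as_1 = [1, 1, 2, 2]  # control = 1, treatment = 2
control_as_2 = [2, 2, 1, 1]  # control = 2, treatment = 1

print(group_difference(expression, control_as_1))  # treatment minus control
print(group_difference(expression, control_as_2))  # control minus treatment
```

The two calls give the same magnitude with opposite signs, which is why an up-regulated gene can appear down-regulated if the groups are coded the other way round.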
More than two groups can be coded with a running number in Chipster. However, it is suggested that the control group is always coded with the smallest number. In other words, the control group is coded with 1, the first treatment group with 2, the second treatment group with 3, and so forth.
Pairing is introduced in the experiment if, for example, several measurements are taken from the same person. The samples could represent, e.g., before and after treatment measurements. In such cases it is better to take the pairing into account in the analysis. Pairing is coded in the phenodata by first creating a new column, and then coding all the paired samples with the same number. For example, if there were two persons, each with before and after treatment measurements, the phenodata could be filled in as:
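If the samples are first described with text labels, the running numbers can be derived from them. The sketch below is a hypothetical helper, not a Chipster function; the label names are made up. It gives the control group the number 1 and numbers the remaining groups in order of first appearance.

```python
def code_groups(labels, control):
    """Map text labels to running numbers, giving the control group number 1."""
    codes = {control: 1}
    for label in labels:
        if label not in codes:
            codes[label] = len(codes) + 1
    return [codes[label] for label in labels]


# Hypothetical sample labels in the order the arrays appear in the phenodata.
labels = ["drugA", "control", "drugB", "control", "drugA", "drugB"]
print(code_groups(labels, control="control"))  # [2, 1, 3, 1, 2, 3]
```

The resulting numbers are what would be typed into the group column, one per array.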
group   pair
1       1
2       1
1       2
2       2
The group column divides the samples into before treatment (1) and after treatment (2) groups, and the pair column specifies which samples are paired (person 1 and person 2).
Pairing can be taken into account if the analysis is done using the linear modeling tool in Chipster.
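What the group and pair columns encode together can be sketched with a few lines of Python. This is not how the linear modeling tool works internally; it only shows that the two columns are enough to match each after-treatment sample to the before-treatment sample of the same person. The function name and expression values are hypothetical.

```python
def paired_differences(values, group, pair):
    """Return {pair number: after - before} for samples coded before=1, after=2."""
    by_pair = {}
    for value, g, p in zip(values, group, pair):
        by_pair.setdefault(p, {})[g] = value
    return {p: m[2] - m[1] for p, m in sorted(by_pair.items())}


# Group and pair columns as in the two-person example above.
group = [1, 2, 1, 2]
pair = [1, 1, 2, 2]

# Hypothetical log2 expression values for one gene, one per array.
expression = [5.0, 6.5, 8.0, 9.0]

print(paired_differences(expression, group, pair))  # {1: 1.5, 2: 1.0}
```

Although the two persons start from very different baseline levels, the within-person changes are similar, which is the information a paired analysis makes use of.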
The easiest way to deal with time is to treat it as a categorical variable. This applies to experiments where only a few time points have been used. Code the treatment in the group column, and create a new column for time. If it is treated as a categorical variable, code it with a running number, the first time point being the smallest number. This experiment is best analyzed using the linear modeling tool.
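Coding time as a categorical variable amounts to replacing each time point by its rank. A minimal sketch, assuming the time points are available as numbers (hours here); the helper name is hypothetical and not part of Chipster.

```python
def code_time_points(times):
    """Map time points to running numbers; the earliest time point gets 1."""
    ordered = sorted(set(times))
    codes = {t: i for i, t in enumerate(ordered, start=1)}
    return [codes[t] for t in times]


# Hypothetical sampling times in hours, one per array.
hours = [0, 0, 6, 6, 24, 24]
print(code_time_points(hours))  # [1, 1, 2, 2, 3, 3]
```

The output is what would be entered in the new time column. Note that this coding keeps only the order of the time points, not the length of the intervals between them.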