Automated Robust MicroArray Data Analysis in MATLAB (Toolbox)
Microarray technology allows gene expression profiling at a global level by measuring mRNA abundance. ARMADA (Automated Robust MicroArray Data Analysis) is a program implemented in MATLAB, with a graphical user interface (GUI), that performs all steps of a typical microarray data analysis. It imports raw data from the outputs of several image analysis software packages and from tab-delimited text files, as well as already processed data that only need to undergo statistical testing. ARMADA then continues with noise filtering, spot background correction, data normalization, statistical selection of differentially expressed genes based on parametric or nonparametric statistics, cluster analysis based on several widely used clustering methods (hierarchical, k-means, fuzzy C-means) or classification based on statistical learning algorithms (Discriminant Analysis, k-Nearest Neighbors, Support Vector Machines), and annotation. The result is a set of detailed lists of differentially expressed genes and of the clusters formed.
Along with the user-friendly interface, ARMADA offers a variety of visualization options (MA plots, boxplots, array images, clustering heatmaps, etc.), a module that allows multiple analyses to be performed in batch mode under a specific analysis workflow, and an annotation tool. The optimal number of clusters for any of the supported clustering algorithms can be estimated using the Gap statistic, and Principal Component Analysis is also provided. Emphasis is given to the output data format, which is fully customizable and contains a substantial amount of useful information, such as normalized and unnormalized expression values for each gene on each slide replicate, along with several statistics on the expression values for each experimental condition. The ARMADA output files can be easily imported into spreadsheet software such as MS Excel or into a database for further processing and storage, and the analysis results can be saved as .mat files for further processing with MATLAB’s built-in algorithms.
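To make the background correction, normalization and MA plot steps mentioned above more concrete, the following is a minimal sketch of how M (log ratio) and A (mean log intensity) values are commonly derived for a two-channel array, followed by a simple global median centering of M. It is not ARMADA's code: ARMADA itself is written in MATLAB and its exact correction and normalization methods are not described here. Python is used only for illustration, and the function names (`ma_values`, `median_center`), the `floor` parameter and the toy intensities are all assumptions made for this example.

```python
import math
import statistics


def ma_values(red, green, red_bg, green_bg, floor=1.0):
    """Return (M, A) lists for the spots of one two-channel array.

    M = log2(R / G) and A = 0.5 * log2(R * G), where R and G are the
    background-subtracted intensities. The `floor` clamp is an assumption
    of this sketch; it only keeps log2 defined when background >= signal.
    """
    m_vals, a_vals = [], []
    for r, g, rb, gb in zip(red, green, red_bg, green_bg):
        # Subtract the local background from each channel.
        r_corr = max(r - rb, floor)
        g_corr = max(g - gb, floor)
        m_vals.append(math.log2(r_corr / g_corr))
        a_vals.append(0.5 * math.log2(r_corr * g_corr))
    return m_vals, a_vals


def median_center(m_vals):
    """Global median normalization: shift M so that its median is zero."""
    med = statistics.median(m_vals)
    return [m - med for m in m_vals]


# Hypothetical intensities for four spots (not taken from any test dataset).
red = [1100.0, 2100.0, 4100.0, 1700.0]
green = [600.0, 1100.0, 2100.0, 500.0]
red_bg = [100.0, 100.0, 100.0, 100.0]
green_bg = [100.0, 100.0, 100.0, 100.0]

m_vals, a_vals = ma_values(red, green, red_bg, green_bg)
m_norm = median_center(m_vals)
print(m_norm)
```

Plotting the normalized M values against A gives an MA plot. Global median centering is only one of the simplest normalization choices; intensity-dependent methods are often preferred when the M values show a trend across A.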
ARMADA v. 2.0 application
Test datasets
- QuantArray test dataset (Tzouvelekis et al., 2007)
- ImaGene test dataset (GEO accession GDS1928)
- GenePix test dataset (GEO accession GSE1275)
- Affymetrix test dataset (GEO accession GSE9311)
- Text tab delimited test dataset (ArrayExpress accession E-MEXP-817)
- Text tab delimited normalized dataset (ArrayExpress accession E-MEXP-109)
- Classification dataset (Armstrong et al., 2002)

