Friday, June 10, 2016
From gene expression modeling to gene network to investigate Arabidopsis thaliana genes involved in stress response
Institute of Plant Sciences Paris-Saclay
The gap between the structural annotation of a genome and its functional annotation remains wide. Recent studies estimate that 20% to 40% of the predicted genes have no assigned function in eukaryotic organisms whose genomes are completely sequenced. Transcriptome data make it possible to investigate gene behavior, and co-expression studies were quickly adopted as a way to identify modules of candidate genes. Co-expression is generally established by analyzing correlations between all gene pairs across multiple microarray experiments collected from public repositories. Such approaches may suffer both from the heterogeneity of the data and from the choice of clustering method, which is usually based on gene pairs. To address these limitations, we propose an analysis based on a large and homogeneous set of transcriptome data extracted from CATdb: 387 stress conditions organized into 9 biotic and 9 abiotic stress categories. Instead of correlation analysis, model-based clustering was applied to identify clusters of co-expressed genes within each stress category. Various resources were then analyzed and integrated to characterize the functions associated with the genes in these clusters. Protein–protein interactions and transcription factor–target interactions were used to display gene networks. All the results are stored and managed in GEM2Net, a new module of CATdb (Zaag et al., 2015). We are currently demonstrating that this resource provides a valuable starting point for studying stress responses and for proposing a high-throughput functional annotation of the Arabidopsis thaliana genome.
Contact: Antoine Martin