Doctoral thesis of the University of Montpellier

 

Monday, December 18, 2023, at 2 pm, Amphi Lamour

 

Artificial Intelligence to Predict Plant networks and Phenotypes

Doctoral school: GAIA – Biodiversité, Agriculture, Alimentation, Environnement, Terre, Eau
Specialty: BIDAP – Biologie, Interactions, Diversité Adaptative des Plantes
University: Université de Montpellier
Research unit: IPSiM – Institute for Plant Sciences of Montpellier

Team: SYSTEMS

Jury:

Véronique BRUNAUD, Research Scientist, INRAE (Reviewer)
Frédérick GARCIA, Research Director, INRAE (Reviewer)
Sophie LEBRE, Associate Professor, Université Paul Valéry (Examiner)
Andréas NIEBEL, Research Director, LIPME (Examiner)
Vincent SEGURA, Research Scientist, INRAE (Examiner)
André MAS, Professor, IMAG (Thesis supervisor)

Abstract:

This PhD thesis explores the transformative impact of machine learning, specifically supervised learning, on various fields of biology, with a focus on plant science. The first part presents a comprehensive overview of several supervised machine learning models, serving as an entry point to these methods. The second part applies these models to plant science. It first addresses the enigma of missing heritability. This phenomenon, brought to light by the first genome-wide association studies (GWAS), refers to phenotypic variation that simple genomic modifications fail to explain. Genetic interactions between different loci have emerged as a partial explanation. However, current GWAS statistical models suffer from scalability issues and are highly sensitive to the false discovery rate (FDR). To address these challenges, the thesis introduces Next-Generation GWAS (NGG), a novel modeling approach capable of evaluating over 60 billion single nucleotide polymorphisms within hours. The method is benchmarked against state-of-the-art GWAS models and applied to Arabidopsis thaliana, yielding 2D epistatic maps at gene resolution. Results demonstrate NGG’s efficacy in retrieving missing heritability through epistatic interactions, thereby enhancing phenotype prediction capabilities. Additionally, the thesis investigates the regulatory mechanisms that govern gene expression, with a focus on transcription factor (TF) interactions. TFs are known to play an important role in gene expression regulation, and their interactions are known to shape genomic transcriptional responses. The thesis proposes a machine learning approach using classification and regression trees (CART) to predict influential TFs in a single-cell RNA sequencing (scRNA-seq) dataset from Arabidopsis thaliana roots. This new methodology offers a robust and interpretable means of predicting TFs but is currently severely limited by the available validation data.
The main goal of this thesis is to underscore the profound influence of supervised machine learning on experimental science, showcasing its contributions to deciphering complex phenomena such as missing heritability and intricate gene regulatory mechanisms.

Keywords: Machine Learning, Predictions, Modeling, Gene Networks