WP2: Molecular genetic diversity and genotype x environment analyses

Leaders: Igepp and Agap

Objectives: Use sequencing data to explore genetic diversity and to identify genomic regions involved in the response to climatic and soil constraints.

Task 2.1: Plant genome sequencing

Collected seeds will be grown under greenhouse conditions for DNA extraction. For each population, 30 plants will be used to build a DNA bulk, with a standardized sampling per plant. Libraries will be constructed from the DNA pools following Illumina procedures and sequenced with Illumina HiSeq technology at approximately 10X coverage per genome at the GetPlage INRA facility. For each library, Illumina reads will be mapped to the corresponding reference genome (530 Mb for B. rapa and 630 Mb for B. oleracea), and the SNP allele polymorphism matrix will be inferred for each species using SNP-calling software such as VarScan.
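As a rough illustration of the sequencing effort implied by the figures above, the sketch below computes the number of reads needed to reach a mean depth of 10X on each reference genome, and a naive allele frequency estimate from pooled read counts. The 150 bp read length, the function names and the read-count example are assumptions made for illustration; they are not taken from the sequencing protocol, and real pool frequency estimates would also need to account for sampling noise.

```python
import math


def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Number of reads required to reach a target mean coverage.

    Uses the mean-depth relation: coverage = n_reads * read_length / genome_size.
    """
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def pooled_allele_frequency(alt_reads, ref_reads):
    """Naive allele frequency in a DNA pool: share of reads carrying the alternate allele.

    Returns None when no read covers the site.
    """
    depth = alt_reads + ref_reads
    if depth == 0:
        return None
    return alt_reads / depth


# Genome sizes from the text; the 150 bp read length is an assumed value.
GENOME_SIZES_BP = {"B. rapa": 530_000_000, "B. oleracea": 630_000_000}

if __name__ == "__main__":
    for species, size in GENOME_SIZES_BP.items():
        print(species, reads_needed(size, coverage=10, read_length_bp=150))
    # Hypothetical site covered by 10 reads, 3 of which carry the alternate allele.
    print(pooled_allele_frequency(alt_reads=3, ref_reads=7))
```

With paired-end sequencing, the number of read pairs would be half the number of reads returned here.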

Task 2.2: Plant diversity analyses

Diversity and population genetic structure analyses will be carried out using classical population genetic and genomic tools (hierfstat R package, EggLib, etc.). The software STRUCTURE will also be used to reveal the occurrence of large ancestral populations in each species.
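To make the notion of genetic differentiation concrete, the sketch below computes a simple Nei-type fixation index for one biallelic SNP from per-population allele frequencies. It is a simplified stand-in with equal population weights, not the estimators implemented in hierfstat or EggLib, and the function name is hypothetical.

```python
def nei_fst(freqs):
    """Nei-type FST for one biallelic SNP.

    freqs: per-population frequencies of one of the two alleles.
    FST = (HT - HS) / HT, where HT is the expected heterozygosity of the
    pooled populations and HS the mean within-population heterozygosity.
    Returns None when the SNP is monomorphic overall (HT = 0).
    """
    n_pops = len(freqs)
    p_bar = sum(freqs) / n_pops
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return None
    h_s = sum(2 * p * (1 - p) for p in freqs) / n_pops
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(nei_fst([0.5, 0.5]))  # identical populations: no differentiation
    print(nei_fst([0.0, 1.0]))  # populations fixed for different alleles
```

Averaging such per-SNP values across the genome, or over sliding windows, gives a first picture of how strongly populations are structured.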

Task 2.3: Design of core collections

Genome-wide patterns of diversity and genetic structure, together with passport data, will be used to design core collections, i.e. limited sets of accessions representing, with a minimum of repetitiveness, the collected genetic diversity (Frankel 1984). Core collections will be built per morphotype and designed to maximize diversity along the climatic gradient for each species.
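One possible way to select a core collection with little redundancy is a greedy heuristic that repeatedly picks the accession contributing the most alleles not yet captured. The sketch below illustrates that idea only; it is not the procedure planned in this task, it ignores morphotype and climatic gradient, and the accession names and allele labels are hypothetical.

```python
from __future__ import annotations


def greedy_core(alleles_by_accession: dict[str, set[str]], size: int) -> list[str]:
    """Pick up to `size` accessions, each time adding the one that brings
    the largest number of alleles not yet present in the core.

    Ties are broken by accession name so the result is deterministic.
    """
    captured: set[str] = set()
    core: list[str] = []
    remaining = dict(alleles_by_accession)
    while remaining and len(core) < size:
        # max() returns the first maximal item, so sorting the names fixes tie order.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - captured))
        core.append(best)
        captured |= remaining.pop(best)
    return core


if __name__ == "__main__":
    # Hypothetical accessions described by the alleles they carry ("snp:allele").
    accessions = {
        "A": {"s1:A", "s2:C"},
        "B": {"s1:A", "s2:C"},          # redundant with A
        "C": {"s1:G", "s2:C"},
        "D": {"s1:A", "s2:T", "s3:G"},
    }
    print(greedy_core(accessions, 2))
```

In this toy example the redundant accessions A and B are left out, since neither adds more new alleles than C once D has been selected.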

Task 2.4: Genome-environment association

Genome-environment association (GEA) analyses will be performed genome-wide to identify significant associations between genetic polymorphisms and environmental variables. Four sets of environmental variables will be considered: climate variables, flora composition, soil biochemical properties and mineral composition, and soil microbiota diversity descriptors. To this end, we will use the Bayesian hierarchical model proposed by Gautier (2015), which controls for confounding by population structure. SNPs exhibiting the strongest associations with environmental variation (also called “top SNPs”) will be analysed, and the scientific literature on the underlying genes will be thoroughly reviewed. For the identification of SNPs associated with soil microbiota descriptors, the confounding effects of abiotic factors will be removed as described in Frachon et al. 2018, in order to identify the plant loci truly associated with microbiota diversity.
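The idea of ranking SNPs by the strength of their association with an environmental variable can be sketched with a plain Pearson correlation between population allele frequencies and that variable. This is a deliberately naive illustration: unlike the Bayesian hierarchical model of Gautier (2015), it does not correct for population structure, and the function names, SNP identifiers and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        return 0.0
    return s_xy / math.sqrt(s_xx * s_yy)


def top_snps(freqs_by_snp, env, n_top):
    """Return the n_top SNP identifiers with the largest absolute correlation
    between per-population allele frequency and the environmental variable."""
    scores = {snp: abs(pearson(freqs, env)) for snp, freqs in freqs_by_snp.items()}
    return sorted(scores, key=lambda snp: (-scores[snp], snp))[:n_top]


if __name__ == "__main__":
    # Hypothetical data: one environmental value per population (e.g. mean temperature).
    env = [5.0, 10.0, 15.0, 20.0]
    freqs_by_snp = {
        "snp1": [0.1, 0.3, 0.5, 0.7],   # frequency increases with the variable
        "snp2": [0.5, 0.5, 0.5, 0.5],   # no variation across populations
        "snp3": [0.4, 0.2, 0.5, 0.3],
    }
    print(top_snps(freqs_by_snp, env, n_top=1))
```

Because neutral population structure can itself correlate with environmental gradients, a ranking of this kind would produce many false positives on real data, which is the motivation for using a model that accounts for the covariance of allele frequencies among populations.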