Research Project 8

Exploring the transcriptional landscape of environmental samples using metatranscriptomics long reads

About the Project

CEA – Genoscope

Évry, France

Carmen Lafuente

Early Stage Researcher

France Denoeud

Main Supervisor

Jean-Marc Aury


Université Paris-Saclay

PhD enrolment

Research Objectives

Marine environments remain largely unexplored and unobserved. Planktonic eukaryotes have been shown to harbor non-canonical splice sites and original splicing mechanisms, but many of them cannot be cultured. We aim to explore the transcriptional landscape (intron structure, organization and evolution) in marine metatranscriptomic samples. Our objectives are to:

  • Evaluate and develop a method to predict complete protein sequences from long-read RNA sequencing (LrRNA-seq) data.
  • Reconstruct full exon/intron structures by combining metatranscriptomic and metagenomic long reads from the same sample.
  • Characterize new introns and splicing mechanisms, and better understand the evolution of introns.
  • Use this resource for the annotation of MAGs (metagenome-assembled genomes).

Envisioned Secondments

  • CRG (Guigo): assess and compare gene annotation methods using LrRNA-seq data.
  • Stockholm University (Sahlin): assess novel mapping algorithms for long transcripts.

Early Stage Researcher

Carmen Lafuente


I am originally from Spain. I completed my Bachelor’s Degree in Biotechnology at the Universidad de Zaragoza in 2021. In 2020, I undertook an internship at the Blood and Tissue Bank of Aragon, and, in the following year, I had the opportunity to participate in an Erasmus program in Milan, where I conducted my Bachelor’s thesis studying atherosclerosis using single-cell RNA sequencing data.

In 2021, I enrolled in the Master’s Degree program in Molecular Biotechnology and Bioinformatics at Università degli Studi di Milano, which I completed in June 2023. Throughout this period, I expanded my knowledge of computational biology and genome analysis, and my interest in next-generation sequencing grew, particularly in long-read sequencing technology. Additionally, I undertook a hybrid internship at Universidad de Zaragoza, where I analysed protein sequences that lead to translocation to mitochondria. As part of my Master’s studies, I dedicated a year to my thesis at the Consiglio Nazionale delle Ricerche in Milano, where I designed an algorithm for predicting patient responsiveness to treatment based on specific genetic mutations. This experience allowed me to gain expertise in machine learning concepts and programming.

My fascination with long-read sequencing technology and its advantages drove me to join LongTREC. I am grateful for the opportunity to pursue my PhD within this network, and I aspire to learn as much as possible during this period. My goal is to carry out an outstanding PhD project that will contribute significantly to the scientific community and, hopefully, to society.

About the Main Supervisor and Host Group

France Denoeud


Our lab (LBGB: Bioinformatics for Genomics and Biodiversity Lab) has been a pioneer in the use of RNA sequencing and has developed tools dedicated to the annotation of eukaryotic genomes. Since 2017, we have been part of the ASTER project (Algorithms and software for third generation RNA sequencing). Through this project and its strong network of experts in algorithmics and genomics, we have participated in several projects based on long-read RNA sequencing. Clustering long reads that originate from the same gene is an important step that enables correction at the gene or transcript level; this was one of the motivations for developing the CARNAC-LR tool. We have also developed NaS, an error-correction method based on the combination of short and long reads. Generally, error-correction methods are dedicated to DNA reads and rely on heuristics that preclude their use on transcriptomic reads. We have summarized these results in a complete benchmark. The ASTER project was also a good opportunity to sequence RNA samples and monitor the evolution of the long-read technology provided by Oxford Nanopore. Our results suggest potential biases in the long-read approach, as well as the great potential of direct RNA sequencing. At Genoscope, we also have strong expertise in metatranscriptomics and metagenomics, and we have generated and analyzed long-read data from marine samples of the TARA expeditions. In addition, we have extensively studied intron structure and evolution in several marine organisms.