LongTREC

Research Project 3

Algorithm development for splice alignment of long-read sequence data

About the Project

Stockholm University

Stockholm, Sweden

Sami Aydin

Early Stage Researcher

Kristoffer Sahlin

Main Supervisor

Adam Ameur

Co-Supervisor

Stockholm University

PhD enrolment

Research Objectives

Mapping long reads (LRs) is challenging in the presence of small exons or low-complexity regions. The uLTRA mapper addresses this problem but cannot handle novel transcripts. Our objectives are:
  • Integrate a novel seeding technique (strobemers) into the alignment of transcriptomic long reads to the genome.
  • Adapt uLTRA's exon-to-read chaining step so that aligned exons can be chained without a reference annotation.
  • Implement a post-alignment step that re-evaluates and corrects alignments using the other reads aligned to the same region.
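To make the first objective more concrete, the following is a minimal sketch of how order-2 strobemer seeds of the randstrobe type might be sampled from a sequence. It is an illustration only, not the implementation used in uLTRA or in the strobemers library: the function names `stable_hash` and `randstrobes`, the parameter values, and the link function (hashing the concatenation of the two strobes) are all assumptions chosen for brevity.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(text.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def randstrobes(seq: str, k: int = 4, w_min: int = 5, w_max: int = 9):
    """Return a list of (pos1, pos2, seed) tuples of order-2 randstrobe-like seeds.

    The first strobe is the k-mer starting at pos1. The second strobe is the
    k-mer whose start lies w_min..w_max positions downstream of pos1 and that
    minimises a hash of the strobe pair (a simplified, assumed link function).
    """
    if not (k <= w_min <= w_max):
        raise ValueError("require k <= w_min <= w_max so the strobes do not overlap")
    seeds = []
    last_start = len(seq) - k  # last position at which a full k-mer starts
    for i in range(last_start + 1):
        lo = i + w_min
        hi = min(i + w_max, last_start)  # shrink the window near the sequence end
        if lo > hi:
            break  # no room left for a second strobe
        first = seq[i:i + k]
        # Choose the second strobe by minimising the hash of the pair.
        j = min(range(lo, hi + 1), key=lambda p: stable_hash(first + seq[p:p + k]))
        seeds.append((i, j, first + seq[j:j + k]))
    return seeds


if __name__ == "__main__":
    for pos1, pos2, seed in randstrobes("ACGTTGCATGCCGATAGCTAGGCTA"):
        print(pos1, pos2, seed)
```

Because the two strobes are separated by a variable gap, a seed of this kind can still match when an insertion or deletion falls between them, which is the property that makes such seeds attractive for error-prone long reads.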

Envisioned Secondments

  • Wobble Genomics (Richard Kuo): Evaluate uLTRA to improve TAMA collapse.
  • CSIC (Ana Conesa): Evaluate the new mapper for quantification and integrate it into SQANTI.

Early Stage Researcher

Sami Aydin

Stockholm University,
Sweden

I completed a Bachelor’s degree in Computer Engineering and a second major in Mathematics, both in 2020 at Bilkent University. I then completed my Master’s degree in Computer Engineering at Bilkent University in 2023, with a primary focus on bioinformatics. My Master’s thesis, titled “Whole Genome Alignment via Alternating Lyndon Factorization Tree Traversal,” introduced a novel data structure derived from mathematical principles to address the challenge of whole genome alignment. In side research projects, I also analysed memory access patterns within the suffix array, a data structure widely used in sequence mapping, treating it as a social network in the context of temporal and spatial relationships.

I am more inclined towards theoretical work than practical work, but I understand the significance of practical applications in research. Joining the LongTREC network gives me the opportunity to collaborate with people who have strong practical perspectives. As a LongTREC doctoral candidate, my aim is to combine my theoretical orientation with the practical insights of my colleagues and to make influential contributions to the field.

About the Main Supervisor and Host Group

Kristoffer Sahlin

Stockholm University,
Sweden

Supervisor

I am an Assistant Professor in the Department of Mathematics at Stockholm University and a SciLifeLab Fellow at Science for Life Laboratory, the national center for molecular biosciences. I obtained my PhD in Computer Science from KTH Royal Institute of Technology and have worked as a postdoctoral researcher at Pennsylvania State University and at the University of Helsinki. Before that, I obtained a bachelor’s degree in mathematics and a master’s degree in mathematical statistics from Stockholm University, Sweden.

Host Group

Our group develops computational methods to analyse large biological datasets. In particular, we develop scalable algorithms for high-throughput genomic and transcriptomic sequencing data to study problems related to genome assembly, structural variation detection, and transcriptome analysis. We emphasise the applicability of our methods and models to relevant biological and biomedical questions. For example, we have developed methods to cluster, error-correct and align long transcriptomic reads (isONclust, isONcorrect, IsoCon, uLTRA), indexing constructs for large-scale sequence matching (strobemers), and algorithms for short-read alignment (strobealign). A detailed list can be found on our GitHub.