Analysis of data from high-throughput molecular biology experiments (Ioannis): Kallisto explained

Kallisto belongs to a new generation of RNA-seq quantification programs (along with Salmon) that circumvent the time-consuming step of aligning reads to the genome. The main premise is a reversal of the quantification question: instead of asking where exactly each read maps, we ask which transcripts each read could have originated from. This is done by introducing the term "pseudoalignment", which stands for assigning a read to a set of compatible transcripts without specifying coordinates. It is implemented by hashing the k-mers of the transcriptome de Bruijn graph, i.e. a de Bruijn graph built from the k-mers of the transcriptome. The key point is that by following a path through the graph according to the read sequence, we highlight the transcript (or transcripts) this read could have originated from. This concept allows skipping redundant nodes of the graph, which makes quantification faster, and limits the influence of reads with sequencing errors, which makes it more accurate.
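To make the idea of pseudoalignment more concrete, here is a minimal Python sketch. It is not how Kallisto is implemented: it uses a plain dictionary instead of a de Bruijn graph, skips no nodes, ignores the reverse strand, and simply ignores k-mers that are missing from the index. The toy transcripts `t1`, `t2`, `t3`, the value `k = 5` and the function names `build_kmer_index` and `pseudoalign` are all made up for illustration.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcripts containing it.

    That set plays the role of the k-mer's k-compatibility class.
    """
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Intersect the k-compatibility classes of all k-mers of a read."""
    compatible = None
    for i in range(len(read) - k + 1):
        kmer_class = index.get(read[i:i + k])
        if kmer_class is None:
            # Simplifying assumption: a k-mer absent from the
            # transcriptome (e.g. a sequencing error) is ignored.
            continue
        if compatible is None:
            compatible = set(kmer_class)
        else:
            compatible &= kmer_class
    return compatible or set()


# Hypothetical toy transcriptome: three short, partly overlapping transcripts.
transcripts = {
    "t1": "ACGTACGGTCA",
    "t2": "ACGTACGCTTA",
    "t3": "TTGACGGTCAG",
}
k = 5
index = build_kmer_index(transcripts, k)

print(pseudoalign("TACGGTC", index, k))         # {'t1'}
print(sorted(pseudoalign("ACGTAC", index, k)))  # ['t1', 't2']
```

The first read shares k-mers with both `t1` and `t3`, but only `t1` contains all of them, so the intersection narrows the origin down to `t1`. The second read lies in a region common to `t1` and `t2`, so it stays ambiguous, and no coordinates are ever computed.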


The basic algorithmic steps of Kallisto. (a) A read (black) and three overlapping transcripts (colours). (b) Transcriptome de Bruijn graph; each node represents a k-mer. According to the transcript paths it lies on, every k-mer is assigned a k-compatibility class. (c) The k-mers of a read are hashed ("pseudoaligned") to find the k-compatibility class of the read. (d) Nodes belonging to the same k-compatibility class can be skipped. (e) The key point is that the intersection of the k-compatibility classes of a read's k-mers corresponds to the transcripts the read could have originated from.

[Taken and adapted from: Nicolas L Bray, Harold Pimentel, Páll Melsted and Lior Pachter, Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology 34, 525–527 (2016), doi:10.1038/nbt.3519]


Tuesday, 10 January 2017
