RPubs

Basecaller accuracy comparison

Uses PAF alignments of cDNA reads mapped splice-aware to hg38 to estimate the improvement in accuracy from the flip_flop model, available since the release of guppy_v2.3.1. This R Markdown makes heavy use of scripts by rrwick, who did something similar to assess both single-read and assembly accuracy for various bacterial gDNA reads. While this is perhaps not the most accurate way of doing it, it provides practice in using PAF and R plotting. Note: the markdown styling is thanks to the CSS template generated for an ONT tutorial, so many thanks. And of course the tufte markdown package.

almost 7 years ago

Mapped Identities of guppy called cDNA reads + bbplot()

1. A collection of 4000 reads generated with the SQK-LSK109 kit was converted to a multi-fast5 file suitable for guppy and then run through either the production model basecaller or the experimental flip_flop model basecaller. Run on Ubuntu 18 using Docker for Ubuntu 16 so that guppy_gpu could be installed and use the on-board Nvidia GeForce GTX 1060 GPU, which is particularly useful for the slow flip_flop model.
2. Pass-read fastq files were concatenated into a single pass fastq file.
3. Reads were then mapped to hg38 using minimap2, with output to PAF. Simplified minimap2 cmd: minimap2 -x splice --secondary=no hg38.mmi /path/to/ > output.PAF
4. The PAF was imported into R as a data.frame.
   Column 10 gives the number of matching bases between query and target.
   Column 11 gives the alignment block length, which counts matches, mismatches and gaps.
   Column 10 divided by column 11 gives a blastn-like identity (multiply by 100 for a percentage). This may provide a proxy for basecall accuracy.
   tidyr is used to form the factors "guppy" (production model) and "guppy_floppie" (experimental model).
   ggplot draws the density of identities for the two factors, shown as an overlay.
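The identity calculation in step 4 can be sketched as follows. This is a minimal Python illustration rather than the author's R code: the function name `paf_identities` is hypothetical, and the two PAF records are invented purely to show the arithmetic (column 10 divided by column 11).

```python
import io


def paf_identities(handle):
    """Yield (read_name, identity) for each PAF record in a text handle.

    identity = column 10 (matching bases) / column 11 (alignment block length).
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines (PAF has 12 mandatory columns)
        matches = int(fields[9])     # column 10, zero-based index 9
        block_len = int(fields[10])  # column 11, zero-based index 10
        if block_len == 0:
            continue  # avoid division by zero on degenerate records
        yield fields[0], matches / block_len


# Invented two-record example, NOT data from the original analysis.
example = (
    "read1\t1000\t0\t1000\t+\tchr1\t5000\t100\t1100\t900\t1000\t60\n"
    "read2\t500\t10\t490\t-\tchr2\t8000\t200\t700\t425\t500\t60\n"
)
identities = dict(paf_identities(io.StringIO(example)))
for name, identity in identities.items():
    print(f"{name}\t{identity * 100:.1f}%")
```

For a real file, the same function could be given an open file handle instead of the `io.StringIO` wrapper.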

almost 7 years ago

Mapped Identities of guppy called cDNA reads

1. A collection of 4000 reads generated with the SQK-LSK109 kit was converted to a multi-fast5 file suitable for guppy and then run through either the production model basecaller or the experimental flip_flop model basecaller. Run on Ubuntu 18 using Docker for Ubuntu 16 so that guppy_gpu could be installed and use the on-board Nvidia GeForce GTX 1060 GPU, which is particularly useful for the slow flip_flop model.
2. Pass-read fastq files were concatenated into a single pass fastq file.
3. Reads were then mapped to hg38 using minimap2, with output to PAF. Simplified minimap2 cmd: minimap2 -x splice --secondary=no hg38.mmi /path/to/ > output.PAF
4. The PAF was imported into R as a data.frame.
   Column 10 gives the number of matching bases between query and target.
   Column 11 gives the alignment block length, which counts matches, mismatches and gaps.
   Column 10 divided by column 11 gives a blastn-like identity (multiply by 100 for a percentage). This may provide a proxy for basecall accuracy.
   tidyr is used to form the factors "guppy" (production model) and "guppy_floppie" (experimental model).
   ggplot draws the density of identities for the two factors, shown as an overlay.
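The final comparison of the two factors can be approximated without a plotting library. The sketch below is a Python stand-in for the tidyr/ggplot step, not the author's code: `density_histogram` is a hypothetical helper, the bin count is an arbitrary choice, and the identity values are invented for illustration only and say nothing about the real results.

```python
from statistics import median


def density_histogram(values, bins=10, lo=0.0, hi=1.0):
    """Return per-bin densities over [lo, hi]; densities times bin width sum to 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi (or slightly out-of-range input) lands in an edge bin.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    n = len(values)
    return [c / (n * width) for c in counts]


# Invented identities per basecaller label, NOT measurements from the source.
identities = {
    "guppy": [0.82, 0.85, 0.88, 0.86],
    "guppy_floppie": [0.88, 0.90, 0.91, 0.93],
}

medians = {label: median(vals) for label, vals in identities.items()}
densities = {label: density_histogram(vals) for label, vals in identities.items()}

for label in identities:
    print(label, f"median identity = {medians[label]:.3f}")
```

Overlaying the two density lists on a shared axis gives a rough text-free equivalent of the overlaid ggplot density curves; a kernel density estimate, as ggplot uses, would be smoother than this fixed-bin histogram.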

almost 7 years ago

neuron_diff_heatmap

almost 7 years ago

My_first_PCA

Mommy wow, I am an R statistician now.

almost 7 years ago

callumjcparr

Callum Parr
