all Ts in trace listed as Ns?

Ask questions about sequence improvement / finishing D. mojavensis projects here.
cjones
Posts: 99
Joined: Sun Feb 04, 2007 10:19 pm
Location: Moravian College, Bethlehem PA


Post by cjones » Thu Feb 10, 2011 5:00 pm

Hello, and welcome back to another exciting season of Consed WTF? here at Moravian College!

One of my students has a short (3-read) contig in which every T in all 3 reads is listed in the traces as an N. (Well, the reads are all of low quality, so they're lower-case ns.) I'm not too concerned, because the low quality suggests they won't be very valuable in finishing the contig, but I'm curious about the cause of and solution to this. My guess is that it's a mysterious phred error and that the only way to correct it (if we wanted to) would be to go in and manually edit each n to a t....

I'm fairly certain this question has come up before, and I've asked about it, but the forum search function is telling me either that it is pretty inadequate at finding things or that I'm hallucinating. Either is a distinct possibility.
Chris Jones
Assoc. Prof. of Biology
Moravian College
Bethlehem PA

cshaffer
Posts: 211
Joined: Sun Feb 04, 2007 10:29 pm
Location: Washington University in St Louis

Re: all Ts in trace listed as Ns?

Post by cshaffer » Fri Feb 11, 2011 2:55 pm

You are correct on all counts, except maybe the hallucination part.

This is indeed an issue with the phred basecaller, and much less so with more modern basecallers. It is almost always seen when there is a large dye blob of T signal (caused by a large amount of unincorporated dye-labeled ddTTP left in the sample). The blob corrupts the mathematical processing of the trace, and as a result all the T's end up being called N's. So your guess is correct: a finisher may use manual editing to override the basecaller when it is clearly wrong. However, the only reason to do this is if you need the data to cover a low-quality region; if these problematic reads fall in a region that is already high quality, just forget about them.
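As a rough screening aid, the symptom described above (every T in a read called as an N) can be checked for in a read's base-call string. This is a hypothetical sketch: the function name `looks_like_t_dye_blob` and the 15% N threshold are illustrative assumptions, not part of phred or Consed, and it looks only at the called bases (e.g. a sequence exported to FASTA), not at the trace itself.

```python
from collections import Counter


def looks_like_t_dye_blob(read: str, min_n_fraction: float = 0.15) -> bool:
    """Hypothetical heuristic (not a phred/Consed feature): flag a read whose
    calls contain no T at all but a substantial fraction of N's, the pattern
    left when a T dye blob disrupts basecalling.

    min_n_fraction is an arbitrary illustrative threshold; adjust to taste.
    """
    # Upper-case first, since low-quality calls may be shown in lower case.
    counts = Counter(read.upper())
    total = sum(counts.values())
    if total == 0:
        return False
    n_fraction = counts["N"] / total
    # A read of reasonable length with zero T calls is already unusual;
    # requiring many N's as well targets the "T called as N" failure.
    return counts["T"] == 0 and n_fraction >= min_n_fraction


if __name__ == "__main__":
    example_reads = {
        "suspect_read": "gancnaagcnnac",  # no t calls, several n calls
        "normal_read": "gatcttagctaac",   # ordinary mix of bases
    }
    for name, seq in example_reads.items():
        print(name, looks_like_t_dye_blob(seq))
```

A flagged read is only a candidate for the manual review described above; blindly converting every n to a t would be wrong wherever an N reflects a genuinely ambiguous call.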

This issue was covered in the old mouse-based Consed exercise (http://gep.wustl.edu/repository/course_ ... Contig.pdf), but because it happens much less often with the newer basecallers, it was dropped from the newer Drosophila training material.
