Different chemistries, different results?

Ask questions about sequence improvement / finishing D. mojavensis projects here.
jstamm
Posts: 30
Joined: Mon Aug 06, 2007 8:28 pm

Different chemistries, different results?

Post by jstamm » Thu Feb 28, 2008 7:06 pm

In the student's words:

I just received the reads back from my last round of calls for my fosmid. Two of the reads look all right, but the third read (all of which come from the same primers) has more than half of its sequence X'ed out as vector sequence. I am fairly close to the end of my fosmid, but the end is in the opposite direction from where the X's are located. My main question is: how is it possible for a single read to have part of it masked as vector sequence and the other part aligned correctly as high-quality sequence?

[Image]

jstamm
Posts: 30
Joined: Mon Aug 06, 2007 8:28 pm

And a further comment from me

Post by jstamm » Thu Feb 28, 2008 7:09 pm

I posted a few weeks ago about a sequence that a student obtained, which ended up being vector sequence. We realized today that we had assumed the incorrect identity of the primer that gave this sequence. Based on the assembly, this region should not be anywhere close to the primer. How is this possible? The vector sequence we got from that read was very good, although the first 50-100 bases or so of the sequence were of relatively low quality.

cshaffer
Posts: 211
Joined: Sun Feb 04, 2007 10:29 pm
Location: Washington University in St Louis

Consed's vector identification

Post by cshaffer » Thu Feb 28, 2008 7:43 pm

Consed uses simple string matching to try to find vector sequence. To simplify installation, consed actually checks each read against all of the vectors used in genomic sequencing. You can look at the file (it's on the wiki); I think it has about 75 sequences in it.

Although this is convenient, in that you don't have to update consed every time you add data with a different vector, it does mean that consed will sometimes X out sequence that is not vector. This is yet another place where a human brain is way better than a stupid computer program. You know it is not vector, and you are correct. Since you have two good reads, it really does not matter that consed X'ed these low-quality bases out; they don't contribute to the consensus anyway.
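To make the false-positive problem concrete, here is a toy sketch in Python. It is not consed's actual code or algorithm; the function name `mask_vector`, the exact k-mer matching, and the tiny made-up sequences are all assumptions for illustration. It only shows why screening against a large list of vectors can X out bases that have nothing to do with the vector used in your own project.

```python
def mask_vector(read, vectors, k=12):
    """Toy vector screen (hypothetical, not consed's implementation).

    Every base of `read` that lies inside a k-mer also found in ANY of the
    `vectors` is replaced with 'X'.
    """
    # Collect every k-mer from every vector in the screening list.
    vector_kmers = set()
    for vec in vectors:
        vec = vec.upper()
        for i in range(len(vec) - k + 1):
            vector_kmers.add(vec[i:i + k])

    read = read.upper()
    masked = list(read)
    # Slide a window along the read and mask any window that hits a vector k-mer.
    for i in range(len(read) - k + 1):
        if read[i:i + k] in vector_kmers:
            for j in range(i, i + k):
                masked[j] = "X"
    return "".join(masked)


# Made-up sequences: pretend the first one is the vector actually used.
project_vector = "GATTACAGATTACA"
other_vector = "CCCGGGAAATTTCCCGGG"
read = "TTTTTAAATTTCCCAAAAA"

# Screening against the project vector alone leaves the read untouched.
print(mask_vector(read, [project_vector], k=8))
# Adding an unrelated vector to the list masks part of the same read.
print(mask_vector(read, [project_vector, other_vector], k=8))
```

The longer the list of vectors, the more chances a genomic read (especially a noisy one) has to hit something by accident.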

As long as the consensus is correct you are good to go.

The trick comes when there is only a single read and it gets X'ed out or all the reads get X'ed out. In these cases the finisher would have to go in and manually edit all the bases back to the correct basecalls and then "change consensus" to get the consensus correct; but in this case you have two good reads so you don't need to do any editing.

jstamm
Posts: 30
Joined: Mon Aug 06, 2007 8:28 pm

Post by jstamm » Thu Feb 28, 2008 8:11 pm

But what about the second case that I posted about? In that instance, we just had one read, and it was indeed vector sequence. But the primer that was called was in the middle of the fosmid.

cshaffer
Posts: 211
Joined: Sun Feb 04, 2007 10:29 pm
Location: Washington University in St Louis

reads and masking

Post by cshaffer » Sat Mar 01, 2008 3:35 am

There are two possibilities. You should be able to figure out which one applies, but I cannot tell from what you are showing.

The first possibility is that the read really is from the end and should have vector, but was misassembled. You can check this because a read should be placed just downstream of the primer that created it.

The other possibility is that the read is properly placed downstream of its primer, but the sequence happens to match one of the 70-odd vectors found in the vector file. This means it will be X'ed out even though it is not really vector sequence.
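The placement check for the first possibility can be written down as a small Python sketch. Everything here is an assumption for illustration: the function name, the use of consensus coordinates, and the `max_gap` tolerance of 100 bases are made up, and in practice you would simply look at the read and the primer in the assembly view.

```python
def read_is_downstream_of_primer(primer_end, primer_strand,
                                 read_left, read_right, max_gap=100):
    """Hypothetical sanity check on read placement in an assembly.

    primer_end    -- consensus position of the primer's 3' end
    primer_strand -- '+' if the primer extends toward higher positions,
                     '-' if it extends toward lower positions
    read_left, read_right -- consensus positions spanned by the aligned read
    max_gap       -- assumed tolerance between primer and start of the read
    """
    if primer_strand == "+":
        # The read should start just to the right of the primer.
        gap = read_left - primer_end
    elif primer_strand == "-":
        # The read should start just to the left of the primer.
        gap = primer_end - read_right
    else:
        raise ValueError("primer_strand must be '+' or '-'")
    return 0 <= gap <= max_gap


# A read placed right after its primer looks fine.
print(read_is_downstream_of_primer(1000, "+", 1040, 1700))
# A read placed far away from its primer suggests a misassembly
# (or a misidentified primer).
print(read_is_downstream_of_primer(1000, "+", 30000, 30600))
```

If the check fails, the read is either misplaced in the assembly or did not come from the primer you think it did.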

Also, if you are talking about the bottom read in the figure above, I would not worry about it; there is clearly a lot of noise in that trace and quite a few miscalls. It could be that the miscalls make the sequence similar enough to a vector sequence that it gets masked, but the quality of that read is so low that it just does not matter whether it is X'ed out or not. It's not going to contribute any quality data to help you get to the correct consensus.

The other two reads you show are pretty good, so I am guessing the consensus is correct in this region. It doesn't really matter whether you keep the read with all the X's or take it out; just make sure the consensus is correct based on the two reads above it.
