Infamous Red Triangles

Ask questions about sequence improvement / finishing D. mojavensis projects here.
dbeach
Posts: 1
Joined: Tue Jun 19, 2012 9:23 pm

Infamous Red Triangles

Post by dbeach » Thu Mar 07, 2013 3:35 pm

We are having trouble with red triangles, which flag problems with the separation of paired-end reads in the assembly. Several students have been removing the affected reads to try to reassemble them correctly. The alignments look reasonable for both ends of each pair, but we still get the errors. Where there is sufficient sequence data, we have been removing the reads, but when they are re-inserted using a miniassembly, the red triangles return. RE digests in the area look fine, so the assembly appears to be correct. No other repeat regions have been identified that might indicate other problems.

First - is there another way to resolve these angry read demons? Any hints would be appreciated.

Second, in several cases, when the paired reads are removed from the assembly, the "Red Triangle" disappears, only for a new one to appear for different reads. Again, the RE digests do not throw any errors and there are multiple reads in these areas, so I wouldn't expect the assembly to change significantly. I have not looked specifically at RE band sizes to determine whether the contig lengths have changed. Can anyone help me figure out what is happening?

wleung
Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Infamous Red Triangles

Post by wleung » Fri Mar 08, 2013 2:03 am

When you click on the inconsistent mate pairs in Assembly View, Consed should explain the reason why the mate pairs are inconsistent. If the mate pairs are inconsistent because they are oriented away from each other, then it indicates that there is likely a misassembly in the region. In contrast, if the mate pairs are just slightly further away than the maximum insert size, then the inconsistent mate pairs are likely to be spurious.
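As a rough illustration of the distinction above, here is a minimal Python sketch that sorts a mate pair into "pointing away", "too far apart", or "consistent". This is not Consed's actual code; the `Read` class, the `classify_mate_pair` function, and the `max_insert` parameter are hypothetical names used only for this example, and the span is assumed to be measured from the left end of the left read to the right end of the right read.

```python
from dataclasses import dataclass


@dataclass
class Read:
    start: int     # leftmost consensus position of the aligned read
    end: int       # rightmost consensus position of the aligned read
    forward: bool  # True if the read runs left-to-right along the contig


def classify_mate_pair(a: Read, b: Read, max_insert: int) -> str:
    """Hypothetical classifier, not Consed's real logic."""
    # Order the two reads by their position in the contig.
    left, right = (a, b) if a.start <= b.start else (b, a)

    # Mates on the same strand cannot come from one well-placed clone.
    if left.forward == right.forward:
        return "same_orientation"

    # Left read runs leftward and right read runs rightward:
    # the mates point away from each other (suggests a misassembly).
    if not left.forward and right.forward:
        return "pointing_away"

    # Mates point toward each other; check the implied insert size.
    span = right.end - left.start + 1
    if span > max_insert:
        return "too_far_apart"
    return "consistent"
```

For example, with an assumed maximum insert size of 5000 bases, a forward read at 100-600 paired with a reverse read at 5000-5500 implies a span of 5401 and is classified as "too_far_apart", which is the case that is more likely to be spurious when the excess is small.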

I would leave the inconsistent mate pairs in the assembly if all four restriction digests match and there are no unaligned high-quality regions or multiple high-quality discrepancies in the region. Please add a comment tag on the inconsistent mate pair reads that explains why they do not indicate a problem in the assembly (e.g. in-silico and real digests match, no high-quality discrepancies, etc.).

In some cases, the inconsistent mate pairs might have been misplaced and are derived from a different part of the genome. Please see the topic "good digest, (some) bad read pairs" on the GEP forum for the strategies you can use to determine if the inconsistent mate pairs should be placed elsewhere in the whole genome assembly.


> We have been removing the reads where there is sufficient sequence data, and when they are re-inserted using a miniassembly

If you were to join the Miniassembled contig back to the same location in the main contig, then the mate pairs in the Miniassembled contig will still be inconsistent. This is because the main contig sequence (and the distance between the mate pairs) after the join likely did not change when compared to the sequence in the original assembly.


> Second, in several cases, when the paired reads are removed from the assembly, the "Red Triangle" disappears only to create a new one for different reads

When you pull out reads from the main contig, Consed might pull out additional reads with similar sequences, and it will update the consensus. Consequently, the underlying assembly might have changed when you pulled out the reads. By default, Consed shows an inconsistent mate pair in Assembly View only if at least one other inconsistent mate pair lies close to it (i.e. within the average insert size). I would suggest clicking on these inconsistent mate pairs to see why they are discrepant. If the reads are discrepant because they are just slightly further apart than the maximum insert size, then these new inconsistent mate pairs are likely to be spurious. However, if there are a large number of inconsistent mate pairs, then it would indicate that the region is likely to be misassembled.
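To show why a triangle can disappear and another appear after reads are pulled out, here is a small Python sketch of the default display rule described above. It is only an approximation under stated assumptions: each inconsistent pair is reduced to a single contig position, "close" is taken to mean a position difference no larger than the average insert size, and the function name `pairs_to_display` is hypothetical. Consed's real implementation may measure proximity differently.

```python
def pairs_to_display(positions, avg_insert):
    """Return the positions of inconsistent pairs that would be shown.

    positions  : one representative contig position per inconsistent
                 mate pair (an assumption made for this sketch)
    avg_insert : average insert size, used as the proximity cutoff
    """
    shown = []
    for i, p in enumerate(positions):
        # A pair is shown only if some OTHER inconsistent pair is nearby.
        has_neighbor = any(
            j != i and abs(q - p) <= avg_insert
            for j, q in enumerate(positions)
        )
        if has_neighbor:
            shown.append(p)
    return shown
```

With an assumed average insert size of 3000, the positions `[1000, 1500, 20000]` display the pairs at 1000 and 1500 but hide the isolated pair at 20000. If pulling out reads removes the pair at 1500 and the updated consensus makes a pair at 21000 inconsistent, the input becomes `[1000, 20000, 21000]`: the triangle at 1000 vanishes and new ones appear at 20000 and 21000, even though the pair at 20000 was inconsistent all along.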
