Inconsistent Read Pairs

Ask questions about sequence improvement / finishing D. mojavensis projects here.
dpaetkau
Posts: 29
Joined: Fri Jun 05, 2009 6:18 pm

Inconsistent Read Pairs

Post by dpaetkau » Thu Oct 15, 2015 1:33 am

Hello All Knowing Forum Participants,

I have an interesting case for you finishers. Actually two similar cases.
The first is project DBIA1495015. As you can see in the first picture, there is a whole set of inconsistent read pairs because the distance between the end reads exceeds the expected distance. These could be mismapped; however, when we look at the reads, they show perfect agreement with the consensus.
So, could these really be mismapped, and is there an overlap that needs to be fixed?
Screen Shot 2015-10-01 at 12.20.43 PM.png
When we use "Search for String" and compare the contigs, we get a pretty good match. There are a few mismatched bases, as you can see in picture 2, but generally it is a very good match. If we decided to "Join Contigs", however, it would seriously shorten the entire project.
Screen Shot 2015-10-01 at 12.21.19 PM.png
So the question: is there a justification for joining the contigs? The reads seem to map correctly, and they seem to be inconsistent for the right reasons. Without gels, the inconsistent read pairs seem to be the only indication that a rearrangement might be needed, so do we believe them and make the contig shorter?
Screen Shot 2015-10-01 at 4.21.48 PM.png
The second question is similar.
A student working on project DELE8314004 has regions that are marked "too far apart", but the sequences of those reads seem to match the consensus. The third picture shows this issue. Can we do anything to resolve these inconsistencies?

Thanks in advance for your help,
Don

wleung
Posts: 182
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Inconsistent Read Pairs

Post by wleung » Thu Oct 15, 2015 2:40 am

> So, could these really be mismapped, and is there an overlap that needs to be fixed?

The high percent identity match could be due to multiple copies of a transposon (e.g. terminal repeats of an LTR retrotransposon). In this case, the region contains multiple helitron transposon fragments based on the repeat tags in the assembly and based on examination of the "RepeatMasker" track on the genome browser (D. biarmipes April 2013 (BCM-HGSC/Dbia_2.0) Assembly, the end of scaffold KB462401).
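To make the "high percent identity" point concrete, here is a minimal Python sketch that compares two already-aligned repeat copies and reports the discrepant positions. It is only an illustration: the function name and the example sequences are hypothetical, and it assumes the two sequences have already been aligned to equal length (it does not perform the alignment that the contig comparison window does).

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, list of 0-based discrepant positions).

    Assumes both sequences are already aligned and have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    # Collect every column where the two copies disagree (case-insensitive).
    discrepancies = [
        i for i, (a, b) in enumerate(zip(aligned_a, aligned_b))
        if a.upper() != b.upper()
    ]
    matches = len(aligned_a) - len(discrepancies)
    return 100.0 * matches / len(aligned_a), discrepancies


# Hypothetical example: two repeat copies that differ at one base.
identity, positions = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(identity, positions)
```

A high identity with a handful of discrepant positions is what one would expect from two distinct copies of the same transposon, so the discrepant positions matter more than the overall percentage.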

The inconsistent mate pairs suggest that at least one read in each pair was mismapped. However, the alignment in the "Compared contigs" window suggests that there might be a collapsed repeat instead of an overlap. The question marks at the beginning of the alignment with white bases indicate that there are high quality unique bases that are discrepant between the two repeat copies. If one were to force join the region together, then all of the consistent mate pairs that are in this high quality region would become inconsistent. (You can see the list of consistent mate pairs by selecting "What to Show" -> "Fwd/Rev Pairs" -> "show each consistent fwd/rev pair within contigs" and "show legs on squares for consistent fwd/rev pairs" in Assembly View.)
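The trade-off of a forced join can be sketched in Python. This is a toy model under stated assumptions, not how the assembly viewer computes consistency: the insert-size range, the coordinates, the read names, and the labels other than "too far apart" are all hypothetical. It classifies a pair by the span between its two reads, then recomputes the span after a join that removes the bases between two repeat copies.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    name: str
    fwd_start: int  # leftmost consensus position of the forward read
    rev_end: int    # rightmost consensus position of the reverse read


def classify_pair(pair, min_insert, max_insert):
    """Label a pair by the consensus span it covers (hypothetical labels)."""
    span = pair.rev_end - pair.fwd_start + 1
    if span <= 0:
        return "wrong orientation"
    if span > max_insert:
        return "too far apart"
    if span < min_insert:
        return "too close"
    return "consistent"


def shift_after_join(pos, join_left, join_right):
    """Map a position after removing bases join_left+1 .. join_right."""
    if pos <= join_left:
        return pos
    if pos > join_right:
        return pos - (join_right - join_left)
    return join_left  # positions inside the removed span collapse onto the join


def apply_join(pair, join_left, join_right):
    return ReadPair(
        pair.name,
        shift_after_join(pair.fwd_start, join_left, join_right),
        shift_after_join(pair.rev_end, join_left, join_right),
    )


# Hypothetical library with inserts expected between 2000 and 5000 bases.
MIN_INSERT, MAX_INSERT = 2000, 5000
pair_a = ReadPair("pair_a", 1000, 8500)  # flagged before the join
pair_b = ReadPair("pair_b", 3000, 6500)  # fine before the join

for pair in (pair_a, pair_b):
    before = classify_pair(pair, MIN_INSERT, MAX_INSERT)
    after = classify_pair(apply_join(pair, 4000, 7000), MIN_INSERT, MAX_INSERT)
    print(pair.name, before, "->", after)
```

In this toy example the join rescues the pair that was too far apart but breaks the pair that sat inside the removed region, which is the reason to check the consistent pairs before forcing a join.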

In order to resolve the region, one would need to pull the inconsistent mate pairs out of the main contig, run miniassembly, tear the main contig near the inconsistent mate pairs, and then try to join the new contig back into the main assembly.


> A student working on project DELE8314004 has regions that are marked "too far apart", but the sequences of those reads seem to match the consensus.

One could pull the inconsistent mate pairs out of the main contig from Assembly View, perform a Miniassembly, and then use "Search for String" to see if the new contigs match other copies of the repeats in the main contig. If there are no alternate matches in the main contig, then either there is a collapsed repeat or the reads are derived from another part of the D. biarmipes genome.
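The idea of checking for alternate matches can be illustrated with a simplified Python sketch. It only finds exact matches on both strands, whereas a real search would normally tolerate mismatches; the function names and the sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def find_all_matches(consensus, query):
    """Return sorted (0-based start, strand) tuples for every exact match."""
    hits = []
    for strand, probe in (("+", query), ("-", reverse_complement(query))):
        start = consensus.find(probe)
        while start != -1:
            hits.append((start, strand))
            # Step one base forward so overlapping matches are also found.
            start = consensus.find(probe, start + 1)
    return sorted(hits)


# Hypothetical consensus with two forward copies and one reverse copy.
consensus = "CC" + "GATTACA" + "CCTT" + "GATTACA" + "GG" + "TGTAATC" + "AA"
print(find_all_matches(consensus, "GATTACA"))
```

More than one hit means the sequence from the miniassembly could belong to a different repeat copy than the one where its reads were originally placed.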
