Misassembled reads

Ask questions about sequence improvement / finishing D. mojavensis projects here.
drevie
Posts: 67
Joined: Sun Feb 04, 2007 10:23 pm
Location: California Lutheran University, Thousand Oaks, CA

Misassembled reads

Post by drevie » Mon Feb 13, 2012 7:41 pm

In 1774P08, a number of mate-pair reads are flagged as too far apart. On closer examination, there appears to be a 1666 bp indel that some of the reads contain and others do not, and the indel sequence looks like a transposon! How do we report this? We have two contigs: some of the reads span the indel, while other reads are "too far apart". Do we mark each as inconsistent due to the indel?

wleung
Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Misassembled reads

Post by wleung » Mon Feb 13, 2012 9:51 pm

While one possible explanation for the presence of the inconsistent reverse mate pairs is an insert polymorphism, an alternative explanation is that we may have incorporated reads from another copy of the transposon that is located elsewhere in the genome. Because the restriction digests are consistent with the presence of the 1.6 kb insertion, the fosmid most likely contains this region. If we pull out the inconsistent reads (those that are too far apart) and use them to generate a miniassembly, we find that they assemble into a separate contig. Note that there are no forward/reverse mate pairs that would link this smaller contig (22) with our main contig (2).
[Figure: Assembly view for 1774P08 that shows the extra contig (1774P08_assembly_extra_contig.png)]
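The step of pulling out the reads that are too far apart can be sketched in a few lines of Python. This is only an illustration of the idea, not what the assembly software does internally: the `MatePair` record, the template names, the coordinates, and the 5000 bp `max_insert` cutoff are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    template: str   # hypothetical template name shared by both reads
    fwd_start: int  # leftmost contig position of the forward read
    rev_end: int    # rightmost contig position of the reverse read


def split_by_insert_size(pairs, max_insert):
    """Separate mate pairs into (consistent, too_far_apart) lists.

    A pair counts as too far apart when the distance spanned by its two
    reads exceeds max_insert (an assumed cutoff, not a value from this
    project).
    """
    consistent, too_far = [], []
    for pair in pairs:
        span = pair.rev_end - pair.fwd_start
        if span > max_insert:
            too_far.append(pair)
        else:
            consistent.append(pair)
    return consistent, too_far


# Invented coordinates, for illustration only.
pairs = [
    MatePair("tplA", 1000, 4200),
    MatePair("tplB", 1500, 6900),
    MatePair("tplC", 2000, 5100),
]
consistent, too_far = split_by_insert_size(pairs, max_insert=5000)
print([p.template for p in too_far])
```

The reads belonging to the pairs in `too_far` are the ones you would pull out and assemble separately to see whether they form their own contig.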
Comparing the repeat copies found in contig22 and contig2 reveals numerous high-quality discrepancies, consistent with the hypothesis that these are different copies of the transposon:
[Figure: Discrepancies seen in the two different copies of the repeat in 1774P08 (1774P08_repeat_align_discrepancies.png)]
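To make "high-quality discrepancy" concrete: it is a position where the two aligned copies disagree and both base calls have high quality scores, so the difference is unlikely to be a sequencing error. Here is a rough Python sketch of that check. It assumes the two copies are already aligned without gaps, and the Phred cutoff of 30, the sequences, and the quality values are placeholders I picked for the example, not values from 1774P08.

```python
def high_quality_discrepancies(seq_a, qual_a, seq_b, qual_b, min_qual=30):
    """List positions where two aligned sequences differ at high quality.

    Assumes an ungapped alignment (all four inputs have the same length).
    min_qual is an assumed Phred cutoff, not a project-specific value.
    Returns a list of (position, base_a, base_b) tuples.
    """
    if not (len(seq_a) == len(qual_a) == len(seq_b) == len(qual_b)):
        raise ValueError("sequences and quality lists must have equal length")
    hits = []
    for pos, (a, qa, b, qb) in enumerate(zip(seq_a, qual_a, seq_b, qual_b)):
        # Count the mismatch only if both base calls are confident.
        if a != b and qa >= min_qual and qb >= min_qual:
            hits.append((pos, a, b))
    return hits


# Invented data: two mismatches, but one sits on a low-quality base.
copy_main = "ACGTACGTAC"
copy_extra = "ACGAACGTTC"
qual_main = [40, 40, 40, 40, 40, 40, 40, 40, 12, 40]
qual_extra = [40] * 10

hits = high_quality_discrepancies(copy_main, qual_main, copy_extra, qual_extra)
print(hits)
```

In this toy input only the mismatch at position 3 is reported. The one at position 8 is skipped because the base in `copy_main` has a quality of 12. Many discrepancies of the first kind across a repeat suggest two distinct copies rather than sequencing noise.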
Consequently, I would not try to incorporate these discrepant reads into the main contig. Instead, I would add a comment tag to the assembled contig indicating that the discrepant reads may come from another copy of the transposon that is located elsewhere in the whole genome assembly.

drevie

Re: Misassembled reads

Post by drevie » Wed Feb 15, 2012 8:33 pm

We looked at the fosmids again, and I agree that these are likely reads from another fosmid. Most or all of our fosmids have extraneous reads, including one that likely has a 4 kb contig (27 reads) from the next fosmid down the genome. Because we haven't seen extraneous reads in previous years, I wasn't expecting them.
