Polymorphism frequency in ananassae?

Ask questions about sequence improvement / finishing D. mojavensis projects here.
cjones
Posts: 99
Joined: Sun Feb 04, 2007 10:19 pm
Location: Moravian College, Bethlehem PA

Polymorphism frequency in ananassae?

Post by cjones » Thu Feb 07, 2013 1:44 am

Most of my students have one or more sites in their projects in which one or more reads have a high-quality discrepancy from the consensus. Given the size of the homologous flanking regions, I'm inclined to call these polymorphisms, but there are a lot more of them than seems reasonable. Has this been a common observation in these ananassae finishing projects?
Chris Jones
Assoc. Prof. of Biology
Moravian College
Bethlehem PA

wleung
Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Polymorphism frequency in ananassae?

Post by wleung » Thu Feb 07, 2013 2:08 am

Many of the genuine high-quality discrepancies I have seen in the D. ananassae projects are located in regions that match known transposons in D. ananassae. The consensus sequences in these repetitive regions are marked with blue "repeat" tags. Consequently, some of the high-quality discrepancies could be attributed to reads that have been placed into the wrong copy of the transposon. While there might only be a single copy of the transposon in your project, the read might have been derived from a different copy of the transposon elsewhere in the genome. Note that if the read is part of a transposon, we would expect it to have a high degree of similarity with other copies of that transposon in the genome.

Hence I would suggest checking both the read and its mate pair to verify that you are confident in the placement of both reads in your assembly. In addition, you can search the read (and its mate) against the whole genome assembly (e.g. using FlyBase BLAST or BLAT in the official UCSC Genome Browser) to see if there are other locations in the genome where the reads would match better.
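If you save the search results as text, the "does this read match better somewhere else?" check can be roughly sketched in Python. This is only an illustration under several assumptions: it assumes the hits are in the standard 12-column tab-delimited BLAST format (what NCBI BLAST+ produces with -outfmt 6), which may not be what your search tool gives you by default, and the function names and the idea of an "expected" scaffold name are hypothetical, not part of any GEP tool.

```python
def best_hits(tabular_text):
    """Return the highest bit-score hit for each query read.

    Assumes 12-column tab-delimited BLAST output:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        hit = {
            "subject": subject,
            "pident": float(fields[2]),
            "sstart": int(fields[8]),
            "send": int(fields[9]),
            "bitscore": float(fields[11]),
        }
        # Keep only the strongest hit seen so far for this read
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def reads_matching_better_elsewhere(tabular_text, expected_subject):
    """List reads whose best hit is NOT on the scaffold we expected.

    `expected_subject` is a hypothetical name for the scaffold that
    contains your project.
    """
    hits = best_hits(tabular_text)
    return sorted(q for q, h in hits.items() if h["subject"] != expected_subject)
```

A read that shows up in the second list is only a candidate for a misplaced read; you would still want to look at its mate and at the alignment itself before moving it.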

In addition, for projects with major misassemblies, high-quality discrepancies are a key tool for distinguishing different repeat copies from each other. In particular, regions where there are multiple reads with multiple high-quality discrepancies would likely indicate a collapsed repeat. Please refer to Chris' talk on polymorphisms for more information on how you can use the restriction digests to distinguish polymorphisms from misassemblies.
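As a rough illustration of the "multiple reads with multiple high-quality discrepancies" rule of thumb, here is a minimal Python sketch. Everything about it is an assumption made for the example: the input is a plain list of (read name, consensus position) pairs that you would have to collect yourself, and the function name, window size, and thresholds are hypothetical and are not taken from Consed or any GEP tool.

```python
from collections import defaultdict


def flag_candidate_collapsed_repeats(discrepancies, window=1000,
                                     min_reads=2, min_per_read=2):
    """Return start positions of windows that look like collapsed repeats.

    `discrepancies` is a list of (read_name, consensus_position) pairs,
    one pair per high-quality discrepancy. A window is flagged when at
    least `min_reads` reads each carry at least `min_per_read`
    discrepancies inside it. All thresholds are illustrative guesses.
    """
    # counts[window_start][read_name] = discrepancies for that read
    counts = defaultdict(lambda: defaultdict(int))
    for read_name, position in discrepancies:
        window_start = (position // window) * window
        counts[window_start][read_name] += 1

    flagged = []
    for window_start, per_read in counts.items():
        reads_with_many = sum(1 for n in per_read.values() if n >= min_per_read)
        if reads_with_many >= min_reads:
            flagged.append(window_start)
    return sorted(flagged)
```

Windows flagged this way are only places to look more closely; a single read with a single discrepancy is left alone because it is more consistent with a polymorphism or a misplaced read, and the restriction digests remain the real test.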
