What to do with a contig that doesn't fit

Ask questions about sequence improvement / finishing D. mojavensis projects here.
dpaetkau
Posts: 29
Joined: Fri Jun 05, 2009 6:18 pm

What to do with a contig that doesn't fit

Post by dpaetkau » Thu Feb 21, 2013 4:42 am

I have a couple of students whose projects seem to contain reads (a small contig) that don't belong to the project. The evidence that the reads are in the wrong project is very good restriction digest support for the main contig without the extra reads, plus either (in one project) a large increase in high quality discrepancies when the extra contig is joined to the large contig or (in the other project) the lack of any sequence overlap of more than about 20 bases.

My question is: what should they do with these reads? Should they rip them out, delete them, or leave them in a separate contig and label the small extra contig?

wleung
Posts: 182
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: What to do with a contig that doesn't fit

Post by wleung » Thu Feb 21, 2013 6:14 am

In many cases, the extra reads are placed in a repetitive region with a blue transposon tag. This suggests that these reads might have been derived from a different copy of a transposon in the genome and that they were misplaced in the published draft assembly.

I would recommend pulling these reads (and their corresponding mate pairs) out of the main contig and then running Miniassembly on the reads you have pulled out. You can then add a comment tag to the extra contigs that explains why the reads were pulled out of the main assembly (e.g. incorporating the extra reads would introduce discrepancies in the restriction digests or inconsistent mate pairs).

Please also include an explanation of the extra contigs in the finishing checklist under the item:
Comment tags on any contigs over 2 kb that are not in the assembly.

cjones
Posts: 99
Joined: Sun Feb 04, 2007 10:19 pm
Location: Moravian College, Bethlehem PA

Re: What to do with a contig that doesn't fit

Post by cjones » Thu Feb 28, 2013 12:50 am

What about contigs that are less than 2 kb long? One of my students has a single contig plus a single read that differs in multiple positions from the contig, suggesting that it belongs elsewhere in the genome. Technically she can't say "Project is in a single contig" as there is that single outlying read, but it's less than 2 kb so it doesn't require a comment...
Chris Jones
Assoc. Prof. of Biology
Moravian College
Bethlehem PA

wleung
Posts: 182
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: What to do with a contig that doesn't fit

Post by wleung » Thu Feb 28, 2013 4:10 pm

Many projects have extra reads left over besides the main contig. These are often low quality reads or possible contamination. Generally you can ignore these extra reads if the main contig passes the finishing standard (e.g. digests are consistent and there are no inconsistent mate pairs). In our case, some of these reads could also have been derived from other locations in the genome.

We would consider a project to be in a single contig if there are no gaps in the main contig (even though there might be extra singlets or contigs that are less than 2 kb). You do not need to comment on these smaller contigs in the finishing checklist.

However, please add a comment tag to the beginning of the contig / read that justifies why the reads were pulled out of the main assembly (e.g. they introduce multiple high quality discrepancies, unaligned high quality regions, etc.). Note that if you decide to pull a read out of the assembly, then you also need to pull out its mate pair.
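
For anyone who wants the rules in this thread in one place, here is a rough Python sketch. It is only an illustration and has nothing to do with Consed itself: the names ExtraContig, required_actions and reads_to_pull, the read names, and the mate-pair dictionary are all made up for the example. The only things taken from the thread are the 2 kb checklist threshold, the comment tag on anything pulled out of the main contig, and the rule that a read's mate pair goes with it.

```python
from dataclasses import dataclass

# "Over 2 kb" threshold from the finishing checklist item quoted in this thread.
CHECKLIST_THRESHOLD_BP = 2000

COMMENT_TAG = "add comment tag explaining why the reads were pulled out"
CHECKLIST = "explain the extra contig in the finishing checklist"


@dataclass
class ExtraContig:
    """Hypothetical record for a contig or singlet outside the main contig."""
    name: str
    length_bp: int
    pulled_from_main: bool  # True if its reads were removed from the main contig


def required_actions(contig):
    """List the follow-up steps this thread asks for on one extra contig."""
    actions = []
    if contig.pulled_from_main:
        actions.append(COMMENT_TAG)
    if contig.length_bp > CHECKLIST_THRESHOLD_BP:
        actions.append(CHECKLIST)
    return actions  # an empty list means the contig can be ignored


def reads_to_pull(selected, mates):
    """Return the selected reads plus their mates.

    `mates` is an assumed dict mapping each read name to its mate's name.
    """
    result = set(selected)
    for read in selected:
        mate = mates.get(read)
        if mate is not None:
            result.add(mate)
    return result


if __name__ == "__main__":
    # Hypothetical read names, for illustration only.
    mates = {"read1_fwd": "read1_rev", "read1_rev": "read1_fwd"}
    print(sorted(reads_to_pull(["read1_fwd"], mates)))
    print(required_actions(ExtraContig("Contig2", 2500, True)))
```

A leftover singlet that was never part of the main contig and is under 2 kb comes back with an empty list, which matches the "generally you can ignore these" advice above.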
