closing gaps without PCR

Ask questions about sequence improvement / finishing D. mojavensis projects here.
dpaetkau
Posts: 29
Joined: Fri Jun 05, 2009 6:18 pm

closing gaps without PCR

Post by dpaetkau » Tue Sep 15, 2015 1:58 am

I am wondering how to close gaps without PCR. I assume you have to remove the DBIA read into its own contig to be able to close any gaps. Is this correct?

In particular, I am working on DBIA2377007. The red triangles seem to indicate that there is a problem in the area of the gap. To be able to tear the contig, I need to remove the DBIA2377007 read, right?

Once I have done that, I get the attached picture. Any suggestions? Is this a case where we do PCR and leave everything else as it is, or is it possible to close this gap without PCR? Thanks in advance for the help.
Attachments
Screen Shot 2015-09-14 at 10.10.52 PM.png

wleung
Posts: 182
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: closing gaps without PCR

Post by wleung » Tue Sep 15, 2015 4:31 am

> I assume you have to remove the DBIA read into its own contig to be able to close any gaps. Is this correct?

Yes. Once you have removed the assembly piece (i.e., the DBIA read) from the assembly, I would recommend deleting the read from the assembly by selecting "Remove Reads" from the Consed Main Window and then selecting the "Delete Reads From Assembly" option. Otherwise, the assembly piece would cause crossmatch to run very slowly and would also interfere with picking PCR primers.

Please see pages 17-21 of the GEP Hybrid Assembly Walkthrough for additional information on how to remove the assembly piece and then use "Search for String" and "Compare Contigs" to close a gap without PCR.


> So, once I have done that, I get the attached picture? Any suggestions?

The next step depends on the reason that the paired-end reads are inconsistent. You can click on the red lines to determine why the paired-end reads are inconsistent.

If the reads are inconsistent because they are too far apart, then performing a force join might resolve the inconsistent paired-end reads. However, the black boxes in Assembly View suggest that the two regions with the inconsistent paired-end reads contain an inverted repeat. Hence, if the reads are inconsistent because they are in the wrong orientation (e.g., <- ->), then resolving the gap would not affect the inconsistent paired-end reads. In that case, the most likely explanation for the inconsistent paired-end reads is that the region contains additional copies of the repeat that were merged together by the assembler. In order to resolve the misassembly, you will need to pull out one set of the inconsistent reads, perform a miniassembly, run crossmatch to identify the collapsed repeat, tear the main contig, and then incorporate the new repeat copy into the main contig.
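To make the two kinds of inconsistency concrete, here is a minimal Python sketch of how a read pair could be classified. It is not Consed's own logic: the `Read` class, the `classify_pair` function, and the `max_insert` threshold are hypothetical names introduced for illustration, and the sketch assumes that a consistent pair points inward (-> <-) and spans no more than the expected insert size.

```python
from dataclasses import dataclass


@dataclass
class Read:
    start: int   # leftmost aligned position on the contig
    end: int     # rightmost aligned position on the contig
    strand: str  # '+' for forward (->), '-' for reverse (<-)


def classify_pair(a: Read, b: Read, max_insert: int) -> str:
    """Classify a read pair placed on the same contig (illustrative only)."""
    # Order the two reads by their position on the contig.
    left, right = (a, b) if a.start <= b.start else (b, a)

    if left.strand == right.strand:
        return "same_orientation"    # -> -> or <- <-
    if left.strand == "-" and right.strand == "+":
        return "wrong_orientation"   # <- ->, reads point away from each other

    # Reads point toward each other; check the distance they span.
    span = right.end - left.start
    if span > max_insert:
        return "too_far_apart"
    return "consistent"
```

Under these assumptions, a "too_far_apart" pair is the kind that a force join might fix, whereas a "wrong_orientation" pair would stay inconsistent after the gap is closed and points toward a collapsed repeat.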

Please note that the crossmatch results suggest that the inverted repeat is ~3 kb. Because of the short read lengths and the small number of mate pair reads from the 454 library, the region might require additional long-read data (e.g., Sanger) in order to sort out this inverted repeat. In that case, you can add a "dataNeeded" tag to the region prior to submitting the project.

dpaetkau
Posts: 29
Joined: Fri Jun 05, 2009 6:18 pm

Re: closing gaps without PCR

Post by dpaetkau » Tue Sep 15, 2015 1:22 pm

Thank you for the clear answer.
As a follow up and to complete the discussion for others that are attempting PCR: Is this a reasonable project to undertake?
In other words, would it be a good idea to try longer PCR/Sanger reads to fix this misassembly/inversion, or is the chance of success so remote, and the data so insufficient, that attempting to fix this problem is not really part of the current project paradigm? I am happy to work on problems with students if it presents a challenge that they can work through. However, if we are just going to end up with an "I don't know", or an "I can't really get any evidence to figure out the correct assembly (i.e., my work is not evidence-based)" - then I don't want students to spend time on this. Better to spend more time on "Beyond Annotation".

Thanks again for your help.

wleung
Posts: 182
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: closing gaps without PCR

Post by wleung » Tue Sep 15, 2015 2:30 pm

> Is this a reasonable project to undertake?

The primary sequence improvement goal for the hybrid assembly projects is to resolve consensus errors within mononucleotide runs. Resolving gaps and low-quality regions are secondary goals for the project. It would be useful to have a more accurate estimate of the gap size even if we cannot fill in the gap with additional Sanger data. Hence I would recommend that your students attempt to resolve the gap by performing at most two rounds of PCR reactions. If these reactions fail, then I would tag the region as "dataNeeded" and focus on the coding region and transcription start site annotations.

Note that because the problem region is very repetitive, Consed might not be able to find unique PCR primer pairs in this region. In that case, you might need to design PCR primers that flank one copy of the repeat in order to generate the PCR amplicon. You can then use this amplicon as a template for additional sequencing reactions.
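As a rough illustration of why primers inside a repeat fail the uniqueness requirement, the Python sketch below counts exact matches of a candidate primer on both strands of a template sequence. This is not how Consed picks primers; `reverse_complement`, `count_sites`, and `is_unique` are hypothetical helpers, and exact string matching is a simplifying assumption (real primers can also anneal to near-matches).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_sites(template: str, primer: str) -> int:
    """Count exact (possibly overlapping) primer matches on either strand."""
    template = template.upper()
    primer = primer.upper()

    def count(pattern: str) -> int:
        n = 0
        i = template.find(pattern)
        while i != -1:
            n += 1
            i = template.find(pattern, i + 1)
        return n

    total = count(primer)
    rc = reverse_complement(primer)
    # A palindromic primer equals its reverse complement; avoid double counting.
    if rc != primer:
        total += count(rc)
    return total


def is_unique(template: str, primer: str) -> bool:
    """True if the primer has exactly one exact binding site in the template."""
    return count_sites(template, primer) == 1
```

A primer that sits inside the repeat matches once per repeat copy and so is not unique, while a primer placed in the single-copy sequence flanking one copy of the repeat can pass this check.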
