Hybrid Assemblies: criteria for HQDs in MNRs

Ask questions about sequence improvement / finishing D. mojavensis projects here.
chauser
Posts: 47
Joined: Wed Jan 31, 2007 3:34 pm

Hybrid Assemblies: criteria for HQDs in MNRs

Post by chauser » Fri Jan 24, 2014 1:17 am

We are looking through the search features of consed to find regions that match all three criteria:
HQ >40
>= 3 HQDs
MNR >=5

Under "Navigate by HD regions" we can specify
# discrepant read =3
ignore bases below qual=40

but this returns MANY reads that are not in MNRs

We need a custom sort/search. Any suggestions?

Chuck

wleung
Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Hybrid Assemblies: criteria for HQDs in MNRs

Post by wleung » Fri Jan 24, 2014 1:31 am

Unfortunately, the current recommendation is to manually identify the mononucleotide run (MNR) regions that overlap with the highly discrepant regions. Basically, you can perform a "Search for String" for "AAAAA" and "GGGGG" to generate the list of MNR regions. Then you can compare the list of MNR regions against the list of highly discrepant regions and look for intersections between the two lists.
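If you can export the consensus sequence as plain text, the string search for MNRs can also be scripted. The following is only a minimal sketch, not part of the GEP protocol: the function name `find_mnrs` is hypothetical, it assumes an unpadded sequence containing only A/C/G/T, and it uses the run length of 5 from the criteria in the original post.

```python
import re


def find_mnrs(sequence, min_length=5):
    """Return (start, end, base) for every mononucleotide run of at least
    min_length bases. Positions are 1-based and inclusive.

    Assumption: `sequence` is an unpadded consensus string (no '*' pads).
    """
    # ([ACGT]) captures one base; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (match.start() + 1, match.end(), match.group(1))
        for match in pattern.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    for start, end, base in find_mnrs("ACGTAAAAAGTCCCCCCGT"):
        print(start, end, base)
```

Unlike two separate string searches, this reports each run once with its full extent, which makes the later comparison against the highly discrepant positions easier.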

To automate this process, you can save the navigator lists from your MNR and HDR searches and then import them into MS Excel or Galaxy to look for intersections between the two lists.
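For those who would rather script the intersection than use Excel or Galaxy, here is a rough sketch. It assumes you have already parsed the two saved lists yourself: MNR regions as (start, end) pairs and highly discrepant positions as integers, all in the same consensus coordinates. The function name `hdr_positions_in_mnrs` is hypothetical, and reading the saved navigator files is deliberately left out.

```python
def hdr_positions_in_mnrs(mnr_regions, hdr_positions):
    """Return (position, (start, end)) for each highly discrepant position
    that falls inside a mononucleotide run.

    mnr_regions   -- list of (start, end) pairs, inclusive coordinates
    hdr_positions -- list of integer consensus positions
    """
    hits = []
    for pos in sorted(set(hdr_positions)):
        for start, end in mnr_regions:
            if start <= pos <= end:
                hits.append((pos, (start, end)))
                break  # one matching MNR per position is enough
    return hits


if __name__ == "__main__":
    mnrs = [(5, 9), (12, 17)]
    hdrs = [3, 7, 17, 20]
    for pos, region in hdr_positions_in_mnrs(mnrs, hdrs):
        print(pos, region)
```

The nested loop is fine for the list sizes a single project produces; for much larger lists, sorting both inputs and sweeping through them once would scale better.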

cshaffer
Posts: 211
Joined: Sun Feb 04, 2007 10:29 pm
Location: Washington University in St Louis

Re: Hybrid Assemblies: criteria for HQDs in MNRs

Post by cshaffer » Fri Jan 24, 2014 5:18 pm

Yes indeed, it is a pretty horrible list with many, many false positives. Because this is a type of assembly we have never seen before, many of the computational tools that would make finishing it more efficient have not yet been created.

There is a bit of work on this from last fall; see the private wiki, where we were working with pivot tables in Excel to count and sort each location based on the number of HQDs. That protocol created a list with the most discrepant positions at the top.
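The same count-and-sort idea can be sketched in a few lines of Python. This is not the wiki protocol itself, just an illustration of the approach: it assumes you have one consensus position per high-quality discrepant read (so a position appears once for each discrepant read covering it), and the function name `rank_positions_by_hqd` is hypothetical. The default threshold of 3 comes from the criteria in the original post.

```python
from collections import Counter


def rank_positions_by_hqd(discrepancy_positions, min_hqd=3):
    """Count HQDs per position and return (position, count) pairs,
    most discrepant first, keeping only positions with >= min_hqd.

    discrepancy_positions -- one integer position per discrepant read
    """
    counts = Counter(discrepancy_positions)
    ranked = [(pos, n) for pos, n in counts.items() if n >= min_hqd]
    # Sort by descending count; break ties by ascending position.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    positions = [100, 100, 100, 250, 250, 250, 250, 300]
    for pos, n in rank_positions_by_hqd(positions):
        print(pos, n)
```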

The protocol I have written up is the most basic one, which I expect will work for anyone regardless of their computer skills. Schools wishing to teach more advanced computer skills have a great opportunity here: a student, group, or class could create tools, protocols, or write-ups that could be posted so that we could all use them to make the whole process more efficient. The difficulty, of course, is matching up didactic goals with research goals.

cbazinet
Posts: 2
Joined: Tue Jun 19, 2012 9:22 pm

Re: Hybrid Assemblies

Post by cbazinet » Sun Jan 26, 2014 9:49 pm

As we move into the hybrid assemblies, some aspects of the tutorials (e.g., comparing actual restriction digest patterns with those predicted by an in silico assembly) become less relevant or irrelevant. I am having my students go through these various tutorials before they start on a project, noting places where things work significantly differently from what the guides describe. Having them sit in the lab and hack through the exercise(s) together is finding the issues a lot faster than I could on my own. I would appreciate any new info regarding other changes in approach/protocol for adjusting to the new sequencing technologies.

wleung
Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Hybrid Assemblies: criteria for HQDs in MNRs

Post by wleung » Mon Jan 27, 2014 5:02 pm

Sorry about the delay in posting the revised curriculum materials for sequence improvement of the hybrid assemblies. You can find the updated and new curriculum materials for improving the modENCODE hybrid assemblies using the "Hybrid assembly" tag on the GEP web site.

Most of the new curriculum materials for hybrid assemblies are available in the GEP Specific Issues section. In particular, the Sequence Improvement Protocol for GEP Hybrid Assembly Projects document provides a detailed description of the GEP sequence improvement criteria for the hybrid assembly projects. The GEP Hybrid Assembly Walkthrough document shows how you can apply this protocol to tackle problems in a specific D. biarmipes project.
