checklist issue: BLAST check

Ask questions about sequence improvement / finishing D. mojavensis projects here.
cjones
Posts: 99
Joined: Sun Feb 04, 2007 10:19 pm
Location: Moravian College, Bethlehem PA

checklist issue: BLAST check

Post by cjones » Thu Mar 22, 2007 7:31 pm

The finishing checklist includes an item instructing us to "Run BLAST (check for contamination from vector, host)" -- are we actually supposed to use the entire 40kb fosmid consensus sequence? It takes forever and the proverbial day, and not surprisingly there are lots of hits to various Drosophilae. Are we missing something here in running this search?
Chris Jones
Assoc. Prof. of Biology
Moravian College
Bethlehem PA

cshaffer
Posts: 211
Joined: Sun Feb 04, 2007 10:29 pm
Location: Washington University in St Louis

BLAST

Post by cshaffer » Fri Mar 23, 2007 8:13 pm

I am not sure why your BLAST searches are taking so long. They should only take about 3 or 4 minutes unless NCBI is really backed up.

When screening for contamination you are looking for exact hits to vector sequence, not merely highly similar hits, so these are pretty easy to screen for. As for host contamination, there are times when an E. coli transposon will jump into a fosmid during propagation. You really do need to make sure you are screening for these things, but they are rare, and the length and level of similarity need to be quite high, so again it's pretty easy to screen the hit list by eye to see if you have any contamination.

If NCBI is really going slow you can install a local copy of BLAST on your Macs. The searches should take anywhere from 3 to 10 minutes depending on the database and how old your Macs are. I do not have an easy-to-install package, but it is an option if anyone is interested.

mshaw
Posts: 11
Joined: Sun Feb 04, 2007 10:26 pm

Blast

Post by mshaw » Fri Mar 30, 2007 8:35 pm

I presume that this is a blastn search. How do we get the sequence out of Consed to use for it?

cshaffer
Posts: 211
Joined: Sun Feb 04, 2007 10:29 pm
Location: Washington University in St Louis

getting consensus sequences out of Consed

Post by cshaffer » Fri Mar 30, 2007 10:25 pm

One way is to go to the Aligned Reads window and select "Export Consensus sequence" from the File menu. This will give you a save dialog box where you can save the consensus sequence of the contig you are viewing.

You can use the saved file to search the non-redundant database. Remember you are looking for possible contamination. Real contamination will be long stretches of EXACT matches.

The likelihood of any kind of contamination is extremely low given the history of these sequences. The length of the match would have to be quite long (say 500-1000 bp) before I started to worry about contamination.
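
If the hit list is too long to scan by eye, the same rule of thumb can be sketched as a small filter. This is only an illustration, not part of the checklist: it assumes the hits were saved in BLAST's 12-column tab-separated tabular format (query, subject, percent identity, alignment length, ...), and the function name, the subject names in the sample, and the default cutoffs (100% identity, 500 bp, taken from the low end of the range above) are my own choices.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str    # name of the database sequence that was hit
    pident: float   # percent identity of the alignment
    length: int     # alignment length in bp


def suspicious_hits(tabular_text: str,
                    min_length: int = 500,
                    min_identity: float = 100.0) -> List[Hit]:
    """Return hits that are long AND exact enough to look like contamination.

    Assumes tab-separated BLAST tabular output where column 2 is the
    subject id, column 3 the percent identity and column 4 the
    alignment length. Cutoffs are illustrative defaults.
    """
    hits = []
    for line in tabular_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not a hit line in the assumed format
        subject = fields[1]
        pident = float(fields[2])
        length = int(fields[3])
        if pident >= min_identity and length >= min_length:
            hits.append(Hit(subject, pident, length))
    return hits


if __name__ == "__main__":
    # Made-up example lines; the subject names are hypothetical.
    sample = (
        "fosmid\tvectorX\t100.00\t812\t0\t0\t1\t812\t1\t812\t0.0\t1500\n"
        "fosmid\tDmel_gene\t91.50\t2300\t150\t10\t1\t2300\t1\t2300\t0.0\t2000\n"
        "fosmid\tshort_hit\t100.00\t40\t0\t0\t1\t40\t1\t40\t1e-10\t80\n"
    )
    for hit in suspicious_hits(sample):
        print(hit.subject, hit.pident, hit.length)
```

Long but inexact hits to other Drosophila species and short exact hits both drop out, which leaves only the long exact matches worth a second look.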
