understanding fosmid end reads in the ananassae projects

Ask questions about sequence improvement / finishing D. mojavensis projects here.
Posts: 99
Joined: Sun Feb 04, 2007 10:19 pm
Location: Moravian College, Bethlehem PA

understanding fosmid end reads in the ananassae projects

Post by cjones » Tue Feb 05, 2013 1:52 am

As a first step in tackling their projects, I've recommended that students identify the ends of their fosmids. However, danafosXXX and fosmidendXXX reads are scattered all over the place. Some of them have a number of bases x'd out but have non-x bases on either side of this region. Shouldn't the x's (assuming they represent vector sequence) extend all the way to one end of the read?

Similarly, I read Don Petkau's thread last year ("Where the Xs are placed") and wonder: if you include end-spanning reads (shown in blue in Wilson's diagram) which don't contain vector, are they still designated as "danafos" or "fosmidend" reads, even though they wouldn't contain x's? (Assuming I'm right that x's correspond to vector sequence.)

Is the vector sequence for this project available for us to blast against, or is that unnecessary?
Chris Jones
Assoc. Prof. of Biology
Moravian College
Bethlehem PA

Posts: 211
Joined: Sun Feb 04, 2007 10:29 pm
Location: Washington University in St Louis

Re: understanding fosmid end reads in the ananassae projects

Post by cshaffer » Tue Feb 05, 2013 7:05 pm

Yes, the X's correspond to vector and should extend all the way to the end of the read. However, the sequence quality can be so low that it no longer aligns with the vector, so the computer stops changing the bases to X's.
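If it helps students spot these cases, here is a minimal Python sketch that lists the X'd-out runs in a read and flags the ones that stop short of both ends. It is only an illustration, not part of any project tool: the function names and the example read are made up, and it assumes the read is available as a plain string of bases.

```python
import re


def x_runs(read):
    """Return (start, end) half-open spans of consecutive X/x bases."""
    return [m.span() for m in re.finditer(r"[Xx]+", read)]


def classify_x_runs(read):
    """Label each X run by whether it touches an end of the read."""
    labels = []
    for start, end in x_runs(read):
        if start == 0 and end == len(read):
            kind = "whole read"
        elif start == 0:
            kind = "reaches start"
        elif end == len(read):
            kind = "reaches end"
        else:
            # Non-X bases on both sides: masking stopped before the read end,
            # which is what low-quality sequence past the vector would produce.
            kind = "interior"
        labels.append((start, end, kind))
    return labels


# Hypothetical read: 8 good bases, 12 masked bases, 8 poor-quality bases.
example = "ACGTTGCA" + "X" * 12 + "NNACGNTA"
print(classify_x_runs(example))
```

An "interior" run is the situation described in the question: the bases after the X's are most likely still vector, just too poor to align.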

The reads that span the fosmid end come from the whole genome reads and retain the naming convention for genomic reads. You can X out the sequences if you wish. Remember that each fosmid has at least a 2 kb overlap with its neighbor, so creating the final published sequence does not require the ends to be high quality: the same sequence is also found internally in the adjacent clone, where it is probably of better quality.

The vector sequence is in the edit_dir; I believe it is called pcc01.fasta.
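For a quick look without setting up BLAST, something like the rough Python sketch below could check whether a stretch of a read shares exact 12-mers with the vector on either strand. This is a crude stand-in for a real alignment, not how the X's were actually assigned. The function names and the choice of k are my own assumptions, and the file name should be checked against what is actually in your edit_dir. Exact matching will miss low-quality vector sequence, so a low score does not prove the bases are genomic.

```python
def read_fasta(path):
    """Read a FASTA file into a {name: sequence} dict."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else "unnamed"
                records[name] = []
            elif name is not None and line:
                records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def revcomp(seq):
    """Reverse complement; characters other than A, C, G, T are left as-is."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def vector_kmer_fraction(segment, vector, k=12):
    """Fraction of k-mers in `segment` found exactly in the vector (either strand)."""
    segment, vector = segment.upper(), vector.upper()
    strands = (vector, revcomp(vector))
    index = {s[i:i + k] for s in strands for i in range(len(s) - k + 1)}
    kmers = [segment[i:i + k] for i in range(len(segment) - k + 1)]
    if not kmers:
        return 0.0  # segment shorter than k: nothing to compare
    return sum(kmer in index for kmer in kmers) / len(kmers)


# Example use (file name as recalled above; adjust to your edit_dir):
# vector = "".join(read_fasta("pcc01.fasta").values())
# print(vector_kmer_fraction(read_tail, vector))
```

A fraction near 1 for the bases just past the X's would suggest they are still vector, while a fraction near 0 is inconclusive for the reason given above.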
