As a first step to tackling their projects, I've recommended students identify the ends of their fosmids. However, danafosXXX and fosmidendXXX reads are scattered all over the place. Some of them have a number of bases x'd out, but have non-x bases on either side of this region: shouldn't the x's -- assuming they represent vector sequence -- extend all the way to one end of the read?
Similarly, I read Don Petkau's thread last year ("Where the Xs are placed") and wonder: if you include end-spanning reads (shown in blue in Wilson's diagram) which don't contain vector, are they still designated as "danafos" or "fosmidend" reads, even though they wouldn't contain x's? (Assuming I'm right that x's correspond to vector sequence.)
Is the vector sequence for this project available for us to BLAST against, or is that unnecessary?
understanding fosmid end reads in the ananassae projects
Chris Jones
Assoc. Prof. of Biology
Moravian College
Bethlehem PA
- Location: Washington University in St Louis
Re: understanding fosmid end reads in the ananassae projects
Yes, the X's correspond to vector and should extend all the way to the end of the read. However, the sequence quality can be so low that the read no longer aligns with the vector, so the computer stops changing the bases to X's.
The reads that span the fosmid end come from whole-genome reads and retain the naming convention for genomic reads. You can X out the sequences if you wish. Remember that each fosmid has at least 2 kb of overlap with its neighbor, so creating the final published sequence does not require the ends to be high quality: the same sequence is also found internally in the adjacent clone, where it is probably of better quality.
The vector sequence is in the edit_dir; I believe it is called pcc01.fasta.
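If it helps to spot the reads where the X'd-out region stops short of a read end (the low-quality case described above), a small script can list the X runs in a read and flag the ones that touch neither end. This is only a minimal sketch in Python and not part of the project's software; the function names `x_runs` and `internal_x_runs` and the example read are made up for illustration, and it assumes masked bases appear as `X` or `x` in the read sequence.

```python
import re

def x_runs(seq):
    """Return (start, end) pairs for each run of X/x in seq.

    Coordinates are 0-based and the end is exclusive.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[Xx]+", seq)]

def internal_x_runs(seq):
    """Return only the X runs that touch neither end of the read.

    These are the runs where masking stopped before reaching a read end,
    for example because the remaining bases no longer aligned with vector.
    """
    n = len(seq)
    return [(s, e) for s, e in x_runs(seq) if s > 0 and e < n]

# Hypothetical read: masked bases in the middle, unmasked bases on both sides
read = "ACGTTGCAxxxxxxxxGGCATA"
print(x_runs(read))
print(internal_x_runs(read))
```

A read whose only X run reaches one end (for example `xxxxACGT`) gives an empty list from `internal_x_runs`, so anything it does report is a candidate for comparing by hand against the vector sequence.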