Welcome to GSP HomePage

(An Efficient l-mer frequency genome size predictor)

Workflow Chart:
Usage: fqKmerFreq_v1.06 <inputList> <formatFlag: 0/1 for FASTQ or FASTA> <K-mer length>
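
To illustrate what a K-mer counting step produces, here is a minimal Python sketch that counts K-mers in a few in-memory reads and then tabulates how many distinct K-mers occur at each frequency. It is only an illustration under simple assumptions (no reverse complements, no quality filtering, K-mers containing N skipped). The function names are hypothetical, and this is neither the fqKmerFreq implementation nor its *.countMH output format.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every K-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # assumption: skip K-mers with ambiguous bases
                counts[kmer] += 1
    return counts


def frequency_histogram(kmer_counts):
    """Map each frequency f to the number of distinct K-mers seen f times."""
    return Counter(kmer_counts.values())


# Two toy reads that overlap; real input would come from FASTQ or FASTA files.
reads = ["ACGTAC", "CGTACG"]
kmer_counts = count_kmers(reads, 3)
histogram = frequency_histogram(kmer_counts)
```

The histogram (frequency versus number of distinct K-mers) is the kind of summary that a genome size predictor works from, rather than the individual K-mer sequences.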

Usage: gncov_v1.06 [-options] -t <tuple count> -o <output prefix>
   -t                <tuple count file>                          #the tuple frequency table file, *.countMH
   -o                <output file prefix>                        #the output prefix
   -l (8-15)         <M length>            Default:[8]           #upper boundary frequency k
   -k (17-25)        <K-mer length>        Default:[25]          #K-mer width
   -i (1000-3000)    <iterate times>       Default:[3000]        #number of iterations
   -m (0-1000)       <mutation ratio>      Default:[200]         #ratio to disturb during the iteration
   -r (25-100)       <reads length>        Default:[70]          #average read length
   -c (0.01-5000)    <initial coverage>    Default:[30]          #initial coverage input
   -e (0-10)         <error cut-off>       Default:[5]           #the K-mer frequency cut-off
   -g                Small Optimal Flag: On [Optional] (2-3 fold)   Default:[OFF]   #optimize for small data sets
   -s                Stable Optimal Flag: On [Optional]             Default:[OFF]   #turn on for a more stable result
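
The option ranges above can be checked before launching a run. The following Python sketch builds an argument list for gncov_v1.06 and rejects numeric values outside the documented ranges. The helper name build_gncov_args and the keyword-by-flag-letter convention are assumptions made for illustration; the -g and -s flags are left out because the usage text does not say whether they take a value.

```python
# Allowed ranges copied from the gncov_v1.06 usage text above.
OPTION_RANGES = {
    "-l": (8, 15),
    "-k": (17, 25),
    "-i": (1000, 3000),
    "-m": (0, 1000),
    "-r": (25, 100),
    "-c": (0.01, 5000),
    "-e": (0, 10),
}


def build_gncov_args(count_file, out_prefix, program="gncov_v1.06", **options):
    """Return an argv list for gncov, validating numeric options.

    Options are passed by flag letter, for example k=25, r=70.
    (Hypothetical helper, not part of GSP.)
    """
    args = [program, "-t", count_file, "-o", out_prefix]
    for name, value in options.items():
        flag = "-" + name
        if flag not in OPTION_RANGES:
            raise ValueError(f"unknown option {flag}")
        low, high = OPTION_RANGES[flag]
        if not low <= value <= high:
            raise ValueError(f"{flag} must be between {low} and {high}, got {value}")
        args += [flag, str(value)]
    return args


args = build_gncov_args(
    "AE005174v2_fq.list.25mer.countMH",
    "AE005174v2_fq.list.25mer",
    k=25,
    r=70,
)
```

The resulting list can be handed to subprocess.run(args, check=True) once the path to the installed binary is known.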
./make install

/bin/fqKmerFreq_v1.06  #count the K-mer frequency
/bin/gncov_v1.06  #predict the genome size based on the K-mer frequency
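
For intuition about how a K-mer frequency table relates to genome size, here is a rough Python sketch of the common peak-based estimate: total K-mer occurrences divided by the K-mer depth at the main peak. This is a simplified stand-in, not the method gncov uses. The function names, the toy histogram, and the treatment of the error cut-off (frequencies at or below it are ignored) are all assumptions for illustration.

```python
def estimate_genome_size(histogram, error_cutoff=5):
    """Peak-based estimate: total K-mer occurrences / K-mer depth at the main peak.

    histogram maps frequency f -> number of distinct K-mers seen f times.
    Frequencies at or below error_cutoff are ignored as likely sequencing
    errors (an assumed interpretation of a cut-off, for this sketch only).
    """
    usable = {f: n for f, n in histogram.items() if f > error_cutoff}
    if not usable:
        raise ValueError("no K-mer frequencies above the error cut-off")
    peak_depth = max(usable, key=usable.get)  # most common frequency
    total_kmers = sum(f * n for f, n in usable.items())
    return total_kmers / peak_depth


def base_coverage(kmer_depth, read_length, k):
    """Convert K-mer depth to base coverage.

    A read of length L contains L - k + 1 K-mers, so base coverage is
    kmer_depth * L / (L - k + 1).
    """
    return kmer_depth * read_length / (read_length - k + 1)


# Made-up histogram: a low-frequency error tail and a main peak at depth 10.
toy_histogram = {1: 5000, 2: 300, 9: 100, 10: 800, 11: 100, 20: 50}
size = estimate_genome_size(toy_histogram)
coverage = base_coverage(10, read_length=70, k=25)
```

The second function shows why the read length and the K-mer length both matter: with 70 bp reads and K = 25, each read contributes only 46 K-mers, so K-mer depth is lower than base coverage.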

Sample input list file (open)
/bin/fqKmerFreq_v1.06 AE005174v2_fq.list 0 25
/bin/gncov_v1.06 -t AE005174v2_fq.list.25mer.countMH -o AE005174v2_fq.list.25mer -k 25 -r 70

Sample output files
AE005174v2_fq.list.25mer.countMH (open)
AE005174v2_fq.list.25mer.report (open)

Additional document 1  (pdf)
Additional document 2  (pdf)

E.coli 10-fold data set (download)

Staphylococcus aureus strain MW2 data set (download)
Staphylococcus aureus strain MW2 data Result (download)

Complete Staphylococcus aureus strain MW2 data set (link)

1. J. Shendure and H. Ji. Next-generation DNA sequencing. Nature Biotechnology, 26(10):1135–1145, 2008.
2. T.D. Harris, P.R. Buzby, H. Babcock, E. Beer, J. Bowers, I. Braslavsky, M. Causey, J. Colonell, J. DiMeo, J.W. Efcavitch, et al. Single-molecule DNA sequencing of a viral genome. Science, 320(5872):106, 2008.
3. R.D. Fleischmann, M.D. Adams, O. White, R.A. Clayton, E.F. Kirkness, A.R. Kerlavage, C.J. Bult, J.F. Tomb, B.A. Dougherty, J.M. Merrick, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269(5223):496, 1995.
4. J. Gao and J.G. Scott. Use of quantitative real-time polymerase chain reaction to estimate the size of the house-fly Musca domestica genome. Insect Molecular Biology, 15(6):835–837, 2006.
5. J. Raes, J. Korbel, M. Lercher, C. von Mering, and P. Bork. Prediction of effective genome size in metagenomic samples. Genome Biology, 8(1):R10, 2007.
6. X. Li and M.S. Waterman. Estimating the repeat structure and length of DNA sequences using L-tuples, 2003.
7. A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1–38, 1977.

Copyright, 2009 - 2010, The Zhejiang University, China.  All Rights reserved.

Permission is granted to download and use GSP freely for academic purposes. Use by non-academics requires a license. Contact: Ph.D., Zhejiang University. Email: shangood@zju.edu.cn.

If you want to know the genome size before de novo assembly, GSP is a must-have. It is simple to use!

What's NEW?

2010-4-7 GSP 1.06 released
2010-3-15 GSP 1.05 released
  GSP 1.04 released
  GSP 1.03 released
  GSP 1.0 released
  GSP Registered at SourceForge.net