Compile:
make
make install
Run:
/bin/fqKmerFreq_v1.06  # Count the k-mer frequency
/bin/gncov_v1.06  # Predict the genome size based on the k-mer frequency
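To illustrate the idea behind these two steps, here is a minimal Python sketch that counts k-mers in a set of reads, builds the depth histogram, and derives a rough genome size as (total k-mers) / (peak depth). This is a simplified peak-based estimate for illustration only, not the method implemented in gncov. The function names and the min_depth cutoff are hypothetical and do not come from the package.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(kmer_counts):
    """Map each depth d to the number of distinct k-mers seen exactly d times."""
    return Counter(kmer_counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Rough estimate: total k-mer occurrences divided by the peak depth.

    K-mers below min_depth are ignored entirely, on the assumption
    (made here for illustration) that they are mostly sequencing errors.
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    total_kmers = sum(d * n for d, n in kept.items())
    peak_depth = max(kept, key=kept.get)  # depth shared by the most k-mers
    return total_kmers / peak_depth


# Toy histogram: 1000 unique k-mers at depth 20, 10 two-copy repeats at
# depth 40, and 5 likely-erroneous k-mers at depth 1.
print(estimate_genome_size({1: 5, 20: 1000, 40: 10}))  # 1020.0
```

In the toy histogram the 10 k-mers at depth 40 count as two genomic positions each, so the estimate is 1000 + 2 * 10 = 1020 k-mer positions.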
Sample:
Sample input list file (open)
/bin/fqKmerFreq_v1.06 AE005174v2_fq.list 0 25
/bin/gncov_v1.06 -t AE005174v2_fq.list.25mer.countMH -o AE005174v2_fq.list.25mer -k 25 -r 70
Sample output files:
AE005174v2_fq.list.25mer.countMH (open)
AE005174v2_fq.list.25mer.report (open)
Additional document 1 (pdf)
Additional document 2 (pdf)
TEST DATA SET
Staphylococcus aureus strain MW2 data set (download)
Staphylococcus aureus strain MW2 data result (download)
Complete Staphylococcus aureus strain MW2 data set (link)
References:
1. J. Shendure and H. Ji. Next-generation DNA sequencing. Nature Biotechnology, 26(10):1135–1145, 2008.
2. T.D. Harris, P.R. Buzby, H. Babcock, E. Beer, J. Bowers, I. Braslavsky, M. Causey, J. Colonell, J. DiMeo, J.W. Efcavitch, et al. Single-molecule DNA sequencing of a viral genome. Science, 320(5872):106, 2008.
3. R.D. Fleischmann, M.D. Adams, O. White, R.A. Clayton, E.F. Kirkness, A.R. Kerlavage, C.J. Bult, J.F. Tomb, B.A. Dougherty, J.M. Merrick, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269(5223):496, 1995.
4. J. Gao and J.G. Scott. Use of quantitative real-time polymerase chain reaction to estimate the size of the house-fly Musca domestica genome. Insect Molecular Biology, 15(6):835–837, 2006.
5. J. Raes, J. Korbel, M. Lercher, C. von Mering, and P. Bork. Prediction of effective genome size in metagenomic samples. Genome Biology, 8(1):R10, 2007.
6. X. Li and M.S. Waterman. Estimating the repeat structure and length of DNA sequences using L-tuples, 2003.
7. A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1–38, 1977.