Sunday, December 27, 2009

Natural Selection Simulator

Catalogued on GSR
All these programs can be compiled and run on any computer with the GCC compiler suite (freely available at: http://gcc.gnu.org) installed; the compile commands below use its C++ front end, g++. Please report any questions or concerns to: pkbadri at yahoo.com.

The program given below is based on the algorithm described in Padhukasahasram et al. 2008.




Download the 3 program files in the links above. newsel.c simulates a Wright-Fisher population of constant size undergoing mutation, recombination and natural selection at multiple sites. First create a file named newselinput.txt in the directory where you will run the program. This file fixes the number and positions of the selected sites and their selection coefficients, and should contain the following:


1st line: Number of sites under selection.
2nd line: Positions of selected sites in ascending order separated by spaces. Positions vary from 0 to len - 1, where len is the length of the DNA sequence in base pairs.
3rd line: Respective selection coefficients for the above positions for the heterozygous genotype (10 or 01), separated by spaces.
4th line: Respective selection coefficients for the above positions for the genotype homozygous for one allele (11), separated by spaces.
5th line: Respective selection coefficients for the above positions for the genotype homozygous for the other allele (00), separated by spaces.

For example:
5
0 100 200 300 499
0.5 0.5 0.5 0.5 0.5
0 0 0 0 0
0 0 0 0 0
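The following is a minimal C++ sketch of reading newselinput.txt in the five-line layout described above. The struct SelectionInput and the function read_selection_input are hypothetical names chosen for illustration; they are not code taken from newsel.c. Values are read as whitespace-separated tokens, so the sketch does not check that line breaks fall exactly where the format puts them.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical in-memory form of newselinput.txt (not taken from newsel.c).
struct SelectionInput {
    std::vector<long> positions;  // line 2: selected sites, ascending
    std::vector<double> s_het;    // line 3: coefficients for 10 / 01
    std::vector<double> s_11;     // line 4: coefficients for 11
    std::vector<double> s_00;     // line 5: coefficients for 00
};

// Reads the five lines as whitespace-separated tokens.
// Returns false if a value is missing or positions are not strictly ascending.
bool read_selection_input(std::istream& in, SelectionInput& out) {
    std::size_t n = 0;
    if (!(in >> n)) return false;  // line 1: number of selected sites
    out.positions.assign(n, 0);
    out.s_het.assign(n, 0.0);
    out.s_11.assign(n, 0.0);
    out.s_00.assign(n, 0.0);
    for (auto& p : out.positions) if (!(in >> p)) return false;
    for (auto& s : out.s_het) if (!(in >> s)) return false;
    for (auto& s : out.s_11) if (!(in >> s)) return false;
    for (auto& s : out.s_00) if (!(in >> s)) return false;
    for (std::size_t i = 1; i < n; ++i)
        if (out.positions[i] <= out.positions[i - 1]) return false;
    return true;
}
```

For example, wrapping the example file contents above in a std::istringstream and passing it to read_selection_input fills five positions (0 to 499) with heterozygote coefficients of 0.5.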

The fitness of each genotype at a selected site is 1 + its selection coefficient. Depending on the -fitness option, fitness effects are either added or multiplied across sites. Compile using: g++ -O3 -o sel newsel.c mtrand.cpp -Wno-deprecated. There are 10 command line parameters:

-samples (Number of samples to output from the final population. Choose fewer than pop/2.)
-pop (Total number of chromosomes in the diploid population. Should be an even number.)
-len (Length of the DNA sequence in base pairs. Choose a value larger than del*u*pop.)
-r (Per-generation per-sequence rate of recombination)
-u (Per-generation per-sequence rate of mutation)
-s (Probability of self-fertilization)
-gen (Total number of generations)
-del (Generations after which fixed mutations are removed)
-reps (Number of replicates)
-fitness (1 to add fitness effects across sites, 2 to multiply them)
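The two -fitness options can be illustrated with a small sketch that computes a diploid individual's fitness from its genotype at each selected site. It assumes that additive combination (-fitness 1) means 1 plus the sum of the site coefficients and that multiplicative combination (-fitness 2) means the product of (1 + s) over sites. newsel.c may define these rules differently. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coefficient for one selected site, given how many "1" alleles the
// individual carries there (0 -> genotype 00, 1 -> 10/01, 2 -> 11).
double site_coefficient(int ones, double s_het, double s_11, double s_00) {
    assert(ones >= 0 && ones <= 2);
    if (ones == 1) return s_het;
    return ones == 2 ? s_11 : s_00;
}

// Hypothetical helper. mode 1 (assumed additive): 1 + sum of coefficients.
// mode 2 (multiplicative): product over sites of (1 + coefficient).
double genotype_fitness(const std::vector<int>& ones_per_site,
                        const std::vector<double>& s_het,
                        const std::vector<double>& s_11,
                        const std::vector<double>& s_00,
                        int mode) {
    assert(mode == 1 || mode == 2);
    double sum = 0.0;
    double product = 1.0;
    for (std::size_t i = 0; i < ones_per_site.size(); ++i) {
        const double s = site_coefficient(ones_per_site[i], s_het[i], s_11[i], s_00[i]);
        sum += s;
        product *= 1.0 + s;
    }
    return mode == 1 ? 1.0 + sum : product;
}
```

Using the example newselinput.txt, an individual heterozygous at all five sites has fitness 3.5 under this additive reading and 1.5^5 = 7.59375 under the multiplicative rule. An individual homozygous at every selected site has fitness 1.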

To run, type ./sel followed by all the command line parameters in the order listed above. For example: ./sel -samples 100 -pop 1000 -len 1000000 -r 0.50 -u 0.50 -s 0.0 -gen 10000 -del 500 -reps 10 -fitness 2. The program outputs the positions of biallelic polymorphisms and the haplotypes, in 0-1 format, of the samples collected from the final population.
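Because the parameters must appear in a fixed order and obey the constraints listed above, a wrapper script or driver can check them before starting a long run. The following is a minimal sketch of that check. SelParams, parse_params and check_params are hypothetical helpers and are not part of newsel.c.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical container for the ten newsel.c options.
struct SelParams {
    long samples = 0, pop = 0, len = 0;
    double r = 0.0, u = 0.0, s = 0.0;
    long gen = 0, del = 0, reps = 0;
    int fitness = 0;
};

// Parses "-samples V -pop V ... -fitness V" in the documented order.
bool parse_params(int argc, const char* const argv[], SelParams& p) {
    static const char* const names[] = {"-samples", "-pop", "-len", "-r", "-u",
                                        "-s", "-gen", "-del", "-reps", "-fitness"};
    if (argc != 21) return false;  // program name + 10 flag/value pairs
    double v[10];
    for (int i = 0; i < 10; ++i) {
        if (std::string(argv[1 + 2 * i]) != names[i]) return false;
        char* end = nullptr;
        v[i] = std::strtod(argv[2 + 2 * i], &end);
        if (end == argv[2 + 2 * i] || *end != '\0') return false;  // not a number
    }
    p.samples = static_cast<long>(v[0]);
    p.pop = static_cast<long>(v[1]);
    p.len = static_cast<long>(v[2]);
    p.r = v[3];
    p.u = v[4];
    p.s = v[5];
    p.gen = static_cast<long>(v[6]);
    p.del = static_cast<long>(v[7]);
    p.reps = static_cast<long>(v[8]);
    p.fitness = static_cast<int>(v[9]);
    return true;
}

// Applies the constraints from the parameter list; empty string means OK.
std::string check_params(const SelParams& p) {
    if (p.pop <= 0 || p.pop % 2 != 0) return "-pop must be a positive even number";
    if (p.samples <= 0 || p.samples >= p.pop / 2) return "-samples should be below pop/2";
    if (static_cast<double>(p.len) <= p.del * p.u * p.pop) return "-len should exceed del*u*pop";
    if (p.s < 0.0 || p.s > 1.0) return "-s is a probability";
    if (p.fitness != 1 && p.fitness != 2) return "-fitness must be 1 or 2";
    return "";
}
```

The example command above passes these checks, because 1000000 exceeds del*u*pop = 500 * 0.5 * 1000 = 250000.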

Forwsim

Catalogued on GSR

The program given below is based on the algorithm described in Padhukasahasram et al. 2008.

Forward Wright-Fisher Simulator

Download the 3 files in the links above. forwsim.c efficiently simulates, forward in time, the evolution of a diploid Wright-Fisher population of constant size with a uniform crossing-over rate, a uniform mutation rate, and optional self-fertilization. To compile, type: g++ -O3 -o forwsim forwsim.c mtrand.cpp -Wno-deprecated. There are 8 command line parameters:

-samples (Number of samples to output from the final population.)

-pop (Total number of chromosomes in the diploid population. Should be an even number.)

-len (Length of the DNA sequence in base pairs. Choose a value larger than del*u*pop.)

-r (Per-generation per-sequence rate of recombination)

-u (Per-generation per-sequence rate of mutation)

-s (Probability of self-fertilization)

-gen (Total number of generations)

-del (Generations after which fixed mutations are removed)
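Both -r and -u are per-sequence rates. As a rough illustration, the sketch below shows how a forward simulator can form a gamete when each chromosome is stored as the sorted list of its mutation positions. It draws crossover points, then copies alternating segments from the two parental chromosomes.

The sketch rests on these assumptions:
- The number of crossovers is Poisson-distributed with mean r, at uniform positions.
- std::mt19937 stands in for the mtrand.cpp generator.

This is not the algorithm of Padhukasahasram et al. 2008, and draw_breakpoints and make_gamete are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// A chromosome is the sorted list of positions carrying a mutation.
using Chromosome = std::vector<long>;

// Hypothetical helper: crossover count ~ Poisson(r) (an assumption),
// each at a uniform position in [1, len - 1]; returned sorted.
std::vector<long> draw_breakpoints(double r, long len, std::mt19937& rng) {
    assert(len >= 2);
    if (r <= 0.0) return {};  // poisson_distribution needs a positive mean
    std::poisson_distribution<int> count(r);
    std::uniform_int_distribution<long> where(1, len - 1);
    std::vector<long> bp(static_cast<std::size_t>(count(rng)));
    for (auto& x : bp) x = where(rng);
    std::sort(bp.begin(), bp.end());
    return bp;
}

// Hypothetical helper: copy sites from parent a before the first breakpoint,
// then from parent b until the next one, and so on. A breakpoint at k means
// sites at positions >= k come from the other parent.
Chromosome make_gamete(const Chromosome& a, const Chromosome& b,
                       const std::vector<long>& breakpoints) {
    Chromosome gamete;
    long lo = std::numeric_limits<long>::min();
    for (std::size_t seg = 0; seg <= breakpoints.size(); ++seg) {
        const long hi = seg < breakpoints.size() ? breakpoints[seg]
                                                 : std::numeric_limits<long>::max();
        const Chromosome& src = (seg % 2 == 0) ? a : b;
        // Sites of src inside [lo, hi); segments are disjoint and ascending,
        // so the gamete stays sorted.
        auto first = std::lower_bound(src.begin(), src.end(), lo);
        auto last = std::lower_bound(src.begin(), src.end(), hi);
        gamete.insert(gamete.end(), first, last);
        lo = hi;
    }
    return gamete;
}
```

New mutations could be added to a gamete in the same spirit, by drawing a Poisson(u) number of positions uniformly in [0, len).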


To run, type ./forwsim followed by all the command line parameters in the order listed above. For example: ./forwsim -samples 100 -pop 1000 -len 1000000 -r 0.50 -u 0.50 -s 0.0 -gen 10000 -del 500. The output file is called finalpopulation.txt. Each line in this file corresponds to a chromosome in the population, and the numbers on it are the positions of the mutations it carries. Consecutive pairs of lines are homologous chromosomes in the diploid population.
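The following short sketch loads finalpopulation.txt back into memory. It pairs consecutive lines into diploid individuals and counts how many chromosomes carry each mutation. It assumes that positions are whitespace-separated and that an empty line is a chromosome without mutations. read_final_population and mutation_counts are hypothetical helpers, not part of forwsim.c.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Chromosome = std::vector<long>;
using Individual = std::pair<Chromosome, Chromosome>;  // homologous pair

// One chromosome per line; consecutive lines form one diploid individual.
std::vector<Individual> read_final_population(std::istream& in) {
    std::vector<Chromosome> chroms;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Chromosome c;
        long pos;
        while (fields >> pos) c.push_back(pos);
        chroms.push_back(c);
    }
    assert(chroms.size() % 2 == 0 && "expected an even number of chromosomes");
    std::vector<Individual> pop;
    for (std::size_t i = 0; i + 1 < chroms.size(); i += 2)
        pop.emplace_back(chroms[i], chroms[i + 1]);
    return pop;
}

// Number of chromosomes carrying the mutation at each position.
std::map<long, int> mutation_counts(const std::vector<Individual>& pop) {
    std::map<long, int> counts;
    for (const auto& ind : pop) {
        for (long pos : ind.first) ++counts[pos];
        for (long pos : ind.second) ++counts[pos];
    }
    return counts;
}
```

Dividing each count by the value passed to -pop gives the population frequency of that mutation.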

Saturday, July 11, 2009

Standard Coalescent Simulators

Standard Coalescent Simulator

Download the 3 files in the links above. arg.c is a C++ program that efficiently simulates datasets under the standard coalescent with a uniform mutation rate, constant population size, and either non-uniform or uniform crossing-over and gene-conversion rates. Parameters are set at the top of the source file. It can be compiled using: g++ -O3 -o arg arg.c mtrand.cpp -Wno-deprecated and run by typing ./arg. It outputs the SNP count and the SNP locations, together with the alleles at those locations.
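A useful sanity check on the SNP counts from a coalescent simulator is a standard result. Under infinite-sites mutation, the expected number of segregating sites in a sample of n sequences is

E[S] = θ(1 + 1/2 + ... + 1/(n-1)), where θ = 4Nμ per sequence.

This expectation does not depend on the recombination rate. The sketch below computes E[S] and also simulates S for a single non-recombining genealogy, so the two can be compared. This is not the algorithm used in arg.c. Time is measured in units of 2N generations, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// E[S] = theta * (1 + 1/2 + ... + 1/(n-1)) under infinite-sites mutation.
double expected_segregating_sites(int n, double theta) {
    assert(n >= 2 && theta >= 0.0);
    double harmonic = 0.0;
    for (int i = 1; i < n; ++i) harmonic += 1.0 / i;
    return theta * harmonic;
}

// One replicate without recombination: while k lineages remain, the waiting
// time to the next coalescence is exponential with rate k(k-1)/2 (time in
// units of 2N generations). Mutations fall on the total branch length L as
// a Poisson process with rate theta/2 per unit length.
long simulate_segregating_sites(int n, double theta, std::mt19937& rng) {
    assert(n >= 2 && theta > 0.0);
    double total_length = 0.0;
    for (int k = n; k >= 2; --k) {
        std::exponential_distribution<double> waiting_time(k * (k - 1) / 2.0);
        total_length += k * waiting_time(rng);  // k branches share this interval
    }
    std::poisson_distribution<long> mutations(theta * total_length / 2.0);
    return mutations(rng);
}
```

Averaging the SNP counts reported over many replicates should come close to expected_segregating_sites for the same n and θ.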


Fixed Segregating Sites Simulator

Download the 3 files in the links above. fixedseg.c lets the user fix the positions of the segregating sites in the sequence. Parameters are set at the top of the source file. It can be compiled by typing: g++ -O3 -o fixedseg fixedseg.c mtrand.cpp -Wno-deprecated and run by typing ./fixedseg. It outputs the SNP count, the SNP positions, and the haplotypes in 0-1 format, where 0 denotes the ancestral allele and 1 the derived allele.
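Because 0 marks the ancestral allele and 1 the derived allele, the haplotypes can be summarized directly by their unfolded site frequency spectrum. The sketch below assumes that each haplotype is a string of '0' and '1' characters with one character per segregating site. The exact layout of the program's output may differ, and unfolded_sfs is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// sfs[k] = number of sites at which exactly k of the n haplotypes carry
// the derived allele ('1'), for k = 0..n.
std::vector<int> unfolded_sfs(const std::vector<std::string>& haplotypes) {
    assert(!haplotypes.empty());
    const std::size_t n = haplotypes.size();
    const std::size_t sites = haplotypes[0].size();
    std::vector<int> sfs(n + 1, 0);
    for (std::size_t j = 0; j < sites; ++j) {
        std::size_t derived = 0;
        for (const auto& h : haplotypes) {
            assert(h.size() == sites);  // all haplotypes cover the same sites
            if (h[j] == '1') ++derived;
        }
        ++sfs[derived];
    }
    return sfs;
}
```

For the four haplotypes 1100, 1000, 0110 and 0000, the spectrum is {1, 1, 2, 0, 0}: one site has no derived copies, one is a singleton and two are doubletons.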

Friday, July 10, 2009

Recombination estimation program

Crossing over and Gene conversion Estimation
http://www.sourceforge.net/projects/forwsim/files/fixeds.c/download
http://www.sourceforge.net/projects/forwsim/files/mtrand.h/download
http://www.sourceforge.net/projects/forwsim/files/mtrand.cpp/download
The C++ program fixeds.c outputs 20 different summary statistics (described in Padhukasahasram et al. 2006 and references therein) under models with constant population size and uniform or nonuniform crossing-over and gene-conversion rates. The positions of the segregating sites and the population crossing-over and gene-conversion rates can be specified by the user at the top of the file fixeds.c. The program can be compiled using: g++ -O3 -o fixeds fixeds.c mtrand.cpp -Wno-deprecated and run by typing ./fixeds. The summaries are output in the following order:
1. Frequency of D' < 1.0 (30% acceptance error)
2. Frequency of D' < 0.75
3. Frequency of D' < 0.50
4. Average number of distinct haplotypes for 5 kb sliding windows with 4 kb overlap. (15% acceptance error)
5. Average number of distinct haplotypes for 10 kb sliding windows with 9 kb overlap. (15% acceptance error)
6. Total number of distinct haplotypes H (15% acceptance error)
7. Frequency of D'(AB) < D'(AC) and D'(BC) < D'(AC) for 5 kb range. (30% acceptance error)
8. Frequency of D'(AB) < D'(AC) or D'(BC) < D'(AC) for 5 kb range. (30% acceptance error)
9. Frequency of D'(AB) < D'(AC) and D'(BC) < D'(AC) for 10 kb range. (30% acceptance error)
10. Frequency of D'(AB) < D'(AC) or D'(BC) < D'(AC) for 10 kb range. (30% acceptance error)
11. Frequency of D'(AB) < 1.00 and D'(BC) < 1.00 for 50 kb range.
12. Frequency of D'(AB) < 0.75 and D'(BC) < 0.75 for 50 kb range.
13. Frequency of D'(AB) < 0.50 and D'(BC) < 0.50 for 50 kb range. (30% acceptance error)
14. Frequency of D'(AB) < 0.25 and D'(BC) < 0.25 for 50 kb range.
15. Frequency of D'(AB) < 0.10 and D'(BC) < 0.10 for 50 kb range.
16. Frequency of D'(AB) < 0.50 and D'(BC) < 0.50 for 5 kb range.
17. Frequency of D'(AB) < 0.25 and D'(BC) < 0.25 for 5 kb range.
18. Frequency of D'(AB) < 0.50 and D'(BC) < 0.50 for 10 kb range.
19. Frequency of D'(AB) < 0.25 and D'(BC) < 0.25 for 10 kb range.
20. Frequency of D'(AB) < 0.10 and D'(BC) < 0.10 for 10 kb range.
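Most of these summaries are built from Lewontin's normalized linkage disequilibrium |D'| between pairs of biallelic sites. When both sites are polymorphic, |D'| = 1 exactly when at most three of the four possible two-site haplotypes are present, so D' < 1 means all four gametes are observed.

The sketch below computes |D'| from 0-1 haplotypes. It also computes the fraction of site pairs with |D'| below a threshold, in the spirit of summaries 1 to 3. It assumes the fraction is taken over all pairs and ignores the distance cutoffs and the three-site comparisons. d_prime and fraction_dprime_below are hypothetical names, not functions from fixeds.c.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// |D'| between biallelic sites i and j ('1' = derived allele).
double d_prime(const std::vector<std::string>& haps, std::size_t i, std::size_t j) {
    assert(!haps.empty());
    double pA = 0.0, pB = 0.0, pAB = 0.0;
    for (const auto& h : haps) {
        const bool a = h[i] == '1';
        const bool b = h[j] == '1';
        pA += a;
        pB += b;
        pAB += (a && b);
    }
    const double n = static_cast<double>(haps.size());
    pA /= n;
    pB /= n;
    pAB /= n;
    const double D = pAB - pA * pB;
    // D is normalized by its maximum possible magnitude given pA and pB.
    const double dmax = D > 0 ? std::min(pA * (1 - pB), (1 - pA) * pB)
                              : std::min(pA * pB, (1 - pA) * (1 - pB));
    assert(dmax > 0 && "both sites must be polymorphic");
    return std::fabs(D) / dmax;
}

// Fraction of site pairs with |D'| below threshold (small tolerance so that
// values equal to 1 up to rounding are not counted as below 1).
double fraction_dprime_below(const std::vector<std::string>& haps, double threshold) {
    const std::size_t sites = haps[0].size();
    int below = 0, pairs = 0;
    for (std::size_t i = 0; i < sites; ++i)
        for (std::size_t j = i + 1; j < sites; ++j) {
            ++pairs;
            if (d_prime(haps, i, j) < threshold - 1e-12) ++below;
        }
    return pairs > 0 ? static_cast<double>(below) / pairs : 0.0;
}
```

With the haplotypes 11, 10, 00, 00 (three gametes), |D'| is 1. With 11, 10, 01, 00 (all four gametes, equal frequencies), D and therefore |D'| are 0.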

The summaries most accurate for joint crossing-over and gene-conversion estimation are 1, 6 and 13 for long-range data, and 4, 5, 7, 8, 9 and 10 for short-range data. Suggested acceptance errors are given in parentheses. These choices differ slightly from those used in Padhukasahasram et al. 2006 and work better under the fixed segregating sites model. Including bounds on the minimum number of recombination events (Myers and Griffiths 2003, Song, Wu and Gusfield 2005 or Bafna and Bansal 2005) in the rejection scheme and smoothing the likelihood surfaces can further improve accuracy. For sequence lengths other than 50 kb, it may be necessary to change the distance cutoffs, i.e. to choose values other than 5 kb, 10 kb and 50 kb. The distance cutoffs can be changed at the end of the code.
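In a rejection scheme, a dataset simulated under candidate crossing-over and gene-conversion rates is kept only if its chosen summaries are close enough to the observed ones. The sketch below assumes that an acceptance error of, say, 30% means the simulated summary must lie within 30% of the observed value. That reading, the function name accept and the numbers in the usage example are all assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Accepts a simulated dataset when every chosen summary lies within its
// relative acceptance error of the observed value (assumed interpretation).
bool accept(const std::vector<double>& observed,
            const std::vector<double>& simulated,
            const std::vector<double>& rel_error) {
    assert(observed.size() == simulated.size());
    assert(observed.size() == rel_error.size());
    for (std::size_t k = 0; k < observed.size(); ++k)
        if (std::fabs(simulated[k] - observed[k]) > rel_error[k] * std::fabs(observed[k]))
            return false;
    return true;
}
```

With illustrative observed values {0.40, 12.0, 0.20} for summaries 1, 6 and 13 and errors {0.30, 0.15, 0.30}:
- A simulation giving {0.45, 13.0, 0.22} is accepted.
- One giving {0.45, 14.0, 0.22} is rejected, since 14 is more than 15% above 12.

Counting acceptances for each candidate pair of rates gives an approximate likelihood surface, which can then be smoothed.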