Overview
ShapeIT uses the Li & Stephens hidden Markov model [], which requires estimates of the recombination rate between successive SNPs. ShapeIT offers two ways to obtain them:
- You can compute them from a genetic map (the position of each SNP in cM) and an effective population size, as done in [].
- You can infer them from the data as done in [].
Case 1: I have a genetic map
Use the --input-map option to specify the genetic map of the chromosome you want to phase, and the --effective-size option to specify the effective population size. To download and correctly format a genetic map for human data, see section 2.5. The default effective population size is 14000. The following command line uses example.gmap as the genetic map and 11418 as the effective population size:
shapeit --phase --input-ped example.ped example.map --input-map example.gmap --output-max example.phased --effective-size 11418
Effective population sizes for different populations:
- European (CEU) : 11418
- African (YRI) : 17469
- Asian (CHB+JPT) : 14269
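The population-specific values listed above can be selected in a wrapper script rather than typed by hand. The following is a minimal sketch, not part of ShapeIT: the function name effective_size and the population labels used as keys are assumptions made for illustration, and the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (not a ShapeIT feature): map a population label
# to the effective population size listed in this section.
effective_size() {
  case "$1" in
    CEU)       echo 11418 ;;  # European
    YRI)       echo 17469 ;;  # African
    "CHB+JPT") echo 14269 ;;  # Asian
    *)         echo 14000 ;;  # ShapeIT default when no label matches
  esac
}

# Assemble the phasing command for a European sample and print it.
NE=$(effective_size CEU)
echo "shapeit --phase --input-ped example.ped example.map --input-map example.gmap --output-max example.phased --effective-size $NE"
```

Unknown labels fall back to 14000, the default stated above, so the printed command always carries an explicit --effective-size value.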
Case 2: I don't have a genetic map
If you don't have a genetic map for your data, you can estimate the recombination rates with the Metropolis-Hastings (MH) algorithm, as done in []. However, this adds substantial computational cost to phasing the data. The MH algorithm has the following parameters:
- --init-rho X : X is the initial recombination rate between SNPs. The default value is 0.0004.
- --states-rho Y : Y is the number of haplotypes used to infer recombination rates between SNPs. A larger value of Y increases both the accuracy of the estimates and the running time (the MH algorithm is quadratic in Y). The default value is 50, which allows large datasets to be phased with tractable running times.
- --infer-rho : This flag tells ShapeIT to use the MH algorithm to estimate recombination rates. If this flag is not provided, ShapeIT does not use the MH algorithm and assumes a constant recombination rate across the genome, whose value is set by the --init-rho option.
The following command line infers recombination rates from the data:
shapeit --phase --input-ped example.ped example.map --output-max example.phased --infer-rho
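To set the MH parameters explicitly rather than rely on their defaults, the three options described above can be combined on one command line. The sketch below assumes a hypothetical wrapper function, build_infer_rho_cmd, which is not part of ShapeIT. It only assembles and prints the command, and the values shown are the defaults stated in this section.

```shell
#!/bin/sh
# Hypothetical wrapper (not a ShapeIT feature): build a command line for
# recombination rate inference with explicit MH parameters.
INIT_RHO=0.0004   # initial recombination rate between SNPs (documented default)
STATES_RHO=50     # haplotypes used for inference (documented default)

build_infer_rho_cmd() {
  # $1 = input prefix (expects $1.ped and $1.map), $2 = output prefix
  echo "shapeit --phase --input-ped $1.ped $1.map --output-max $2 --infer-rho --init-rho $INIT_RHO --states-rho $STATES_RHO"
}

# Print the command for the example dataset; run it by hand once checked.
build_infer_rho_cmd example example.phased
```

Because running time grows quadratically with the --states-rho value, doubling STATES_RHO can be expected to roughly quadruple the cost of the MH step.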