Impute2 VCF conversion

From CKDGen wiki

The workflow and scripts below convert IMPUTE2 *.gen files into *.vcf files that contain the dosage of the alternate allele in a single field (needed by EPACTS to use dosages for association). The steps are:

  • QCtool converts Impute2 genotypes (*.impute2/*.gen) to VCF.
  • It might be necessary to set the correct chromosome number in the VCF file.
  • The VCF file must be sorted by variant position (script provided).
  • To allow EPACTS to make use of the genotype dosages, a dosage field (default DS) needs to be added to the VCF file based on the genotype call probabilities (default GP). The AddVcfDosage script does this final step.
  • You will need to specify the dosage field with the "--field" option when running EPACTS.
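
The relationship between GP and DS in the last conversion step can be sketched as follows. This is a minimal illustration, not the AddVcfDosage script itself. It assumes that GP holds three comma-separated probabilities in the order hom-ref, het, hom-alt, so that the expected alternate-allele dosage is P(het) + 2 × P(hom-alt). The function name gp_to_ds is hypothetical.

```shell
# Hypothetical helper (not part of AddVcfDosage): expected alternate-allele
# dosage from a single GP value.
# Assumption: GP = P(ref/ref),P(ref/alt),P(alt/alt)
gp_to_ds() {
    echo "$1" | LC_ALL=C awk -F',' '{ printf "%.3f\n", $2 + 2 * $3 }'
}

# Example: a genotype that is most likely homozygous for the alternate allele
ds_example=$(gp_to_ds "0.1,0.3,0.6")
echo "GP=0.1,0.3,0.6 -> DS=$ds_example"
```

A dosage computed this way ranges from 0 (certainly hom-ref) to 2 (certainly hom-alt).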

CAVEAT: Processing huge files can consume considerable amounts of disk space (temporary files!) and memory. Alternative directories for temporary files created by the sort command can be specified by its -T option.
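
As a small sketch of the -T option mentioned above: the directory used here is created with mktemp purely for demonstration and is an assumption. In practice you would point -T at a directory on a volume with enough free space. The variable names are hypothetical.

```shell
# Assumed scratch directory for demonstration; replace with a directory
# on a large volume when sorting real data.
SORT_TMP=$(mktemp -d)

# Sort three tab-separated records numerically by the second column
# (position), letting sort place its temporary files in $SORT_TMP.
sorted_positions=$(printf '1\t300\n1\t100\n1\t200\n' \
    | sort -T "$SORT_TMP" -k2,2n \
    | cut -f2 \
    | paste -sd' ' -)
echo "$sorted_positions"

rm -rf "$SORT_TMP"
```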

Example script (excerpts):

   # GEN -> VCF
   qctool \
       -g ../GCKD_Common_Clean-chr${CHR}.gen.gz \
       -s ../GCKD_Common_Clean-chr${CHR}.sample \
       -og GCKD_Common_Clean-chr${CHR}.vcf
   bgzip GCKD_Common_Clean-chr${CHR}.vcf
   # set chromosome number (replace NA with the chromosome number in case it is missing)
   zcat GCKD_Common_Clean-chr${CHR}.vcf.gz | \
       awk -v chr=$CHR 'BEGIN { FS = "\t"; OFS = "\t" } ; { if ($1 == "NA") { $1 = chr } ; print }' | \
       bgzip > GCKD_Common_Clean-chr${CHR}.renumber.vcf.gz
   # sort VCF file by variant position
   . GCKD_Common_Clean-chr${CHR}.renumber.vcf.gz \
   # add dosage field to VCF GCKD_Common_Clean-chr${CHR}.sorted.vcf.gz \
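
The effect of the chromosome renumbering step above can be checked on a few lines of made-up input. This is only a sketch: the records and the chromosome number 7 are invented for illustration, and the variable name renumbered is hypothetical. Header lines start with "#" and therefore never match "NA", so they pass through unchanged.

```shell
CHR=7   # example value, assumed for this demonstration

# Two data records: one with a missing chromosome (NA), one already set.
renumbered=$(printf '#CHROM\tPOS\nNA\t100\n7\t200\n' \
    | awk -v chr="$CHR" 'BEGIN { FS = "\t"; OFS = "\t" } { if ($1 == "NA") { $1 = chr } print }')
echo "$renumbered"
```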

This is the "" script (

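   # INFN and OUTFN (input and output VCF paths) must be set before this point
   if [ ! -f "$INFN" ]; then
           echo "Input file does not exist: $INFN"
           exit 1
   fi
   if [ -f "$OUTFN" ]; then
           echo "Output file must not exist: $OUTFN"
           exit 1
   fi
   echo "Sort VCF: $INFN"
   # keep header lines first, then sort the records numerically by position (column 2)
   (zgrep "^#" "$INFN" ; zgrep -v "^#" "$INFN" | sort -k2,2n) | bgzip > "$OUTFN"
   echo "Index VCF: $OUTFN"
   tabix -p vcf "$OUTFN"
   echo "Done"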
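
The header-preserving sort used in the script can be tried on a small uncompressed file. The sketch below uses grep instead of zgrep and skips bgzip and tabix so that it runs without those tools. The file contents and variable names are invented for illustration.

```shell
# Build a tiny, unsorted VCF-like file (made-up records).
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\n1\t300\trs3\n1\t100\trs1\n1\t200\trs2\n' > "$demo_vcf"

# Emit the header lines unchanged, then the records sorted by position.
sorted_demo=$( (grep '^#' "$demo_vcf" ; grep -v '^#' "$demo_vcf" | sort -k2,2n) )
echo "$sorted_demo"

rm -f "$demo_vcf"
```

Sorting only the non-header lines keeps the "#" lines at the top of the output, which tabix requires for indexing.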

The "" script is part of AddVcfDosage and available at