Cgatools commands

From BITS wiki

For analysis of Complete Genomics data.

Full listing of the commands used during the CGI training.

Make sure you copy the full text of each command; some are spread over several lines or run outside the dotted box.

UCSC browser direct link [1]

back to [training page]

set path and other inits

People using a BITS computer do not need to do this!
Typing these bash commands in the cgatools folder lets you shorten the commands in the following exercises by replacing the path with its variable name, ${path_to_data}.

command:

cd <<the full path to the 'training_data' folder>>
path_to_data=`pwd`
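
A minimal sketch of a sanity check after setting the variable, assuming a bash shell. The temporary directory only stands in for the real 'training_data' folder so that the snippet runs anywhere, and `check_data_dir` is a hypothetical helper name, not part of cgatools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if its argument is a non-empty
# string that names an existing directory.
check_data_dir() {
    [ -n "$1" ] && [ -d "$1" ]
}

# Stand-in for the real 'training_data' folder (assumption for this demo).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
path_to_data=$(pwd)    # same effect as the backtick form used above

if check_data_dir "$path_to_data"; then
    status="ok"
else
    status="missing"
fi
echo "path_to_data=${path_to_data} (${status})"
```

If the check reports "missing", the later commands would fail with "No such file or directory", so it is worth running once before the exercises.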

Reference Coverage File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/REF/coverageRefScore-chr21-GS19240-180-36-ASM.tsv.bz2\
| head -20

result:


… in a region

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/REF/coverageRefScore-chr21-GS19240-180-36-ASM.tsv.bz2\
 | perl -F'\t' -ane 'print if ($F[0]>= 40647496 && $F[0]<= 40647501);'

result:
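
The region filter above can also be sketched with awk. The snippet below runs on a made-up miniature of a coverage file (the column names and values are illustrative assumptions, not real data) and skips header lines explicitly instead of relying on how non-numeric text compares with a number.

```shell
#!/usr/bin/env bash
# Made-up miniature of a coverage file: one '#' comment line,
# one '>' column header line, four data rows.
sample=$(mktemp)
printf '#DEMO\tnot real data\n>offset\trefScore\tcoverage\n' > "$sample"
printf '40647495\t10\t30\n40647496\t12\t31\n' >> "$sample"
printf '40647501\t15\t29\n40647502\t11\t28\n' >> "$sample"

# Keep rows whose first column is a plain number inside [lo, hi].
# The regex test drops the header lines before any comparison happens.
region=$(awk -F'\t' -v lo=40647496 -v hi=40647501 \
    '$1 ~ /^[0-9]+$/ && $1 >= lo && $1 <= hi' "$sample")
echo "$region"

n_rows=$(printf '%s\n' "$region" | wc -l | tr -d ' ')
rm -f "$sample"
```

On the real file the same awk program would read from `bzcat ... |` instead of the temporary sample file.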


Variant File

command:

To see more than the first 20 lines, increase the number at the end of the command accordingly.

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2\
 | head -20

result:


… in a region

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2\
 | grep -w chr21 | perl -F'\t' -ane 'print if ($F[4]>=3946599 && $F[5]<=9719804);'

result:
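
The perl test above keeps only loci that lie entirely inside the region. As a hedged sketch, the snippet below contrasts that with an overlap test on made-up rows; only the column order (chromosome, begin, end in columns 4 to 6) mirrors the command above, and the overlap condition assumes zero-based, half-open begin/end coordinates.

```shell
#!/usr/bin/env bash
# Made-up miniature of a variant file (values are illustrative only).
sample=$(mktemp)
printf '>locus\tploidy\tallele\tchromosome\tbegin\tend\tvarType\n' > "$sample"
printf '1\t2\tall\tchr21\t3946000\t3946700\tno-call\n' >> "$sample"  # straddles the region start
printf '2\t2\tall\tchr21\t5000000\t5000001\tsnp\n'     >> "$sample"  # inside the region
printf '3\t2\tall\tchr2\t5000000\t5000001\tsnp\n'      >> "$sample"  # other chromosome
printf '4\t2\tall\tchr21\t9719900\t9719901\tref\n'     >> "$sample"  # past the region end

lo=3946599
hi=9719804

# Loci fully contained in the region (same condition as the perl one-liner).
contained=$(awk -F'\t' -v lo="$lo" -v hi="$hi" \
    '$4 == "chr21" && $5 >= lo && $6 <= hi { print $1 }' "$sample" | xargs)

# Loci that overlap the region, even partially.
overlapping=$(awk -F'\t' -v lo="$lo" -v hi="$hi" \
    '$4 == "chr21" && $6 > lo && $5 < hi { print $1 }' "$sample" | xargs)

echo "contained loci:   ${contained}"
echo "overlapping loci: ${overlapping}"
rm -f "$sample"
```

Which of the two conditions is appropriate depends on whether a locus that straddles a region boundary should be reported.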


Evidence Interval File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceIntervals-chr21-GS19240-180-36-ASM.tsv.bz2\
 | head -20

result:


… in a locus

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceIntervals-chr21-GS19240-180-36-ASM.tsv.bz2\
 | perl -F'\t' -ane 'print if ($F[0]==19995);'

result:


… in a region

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceIntervals-chr21-GS19240-180-36-ASM.tsv.bz2\
 | perl -F'\t' -ane 'print if ($F[2]<=9719803 && ($F[2]+$F[3])>=9719804);'

result:


Evidence DNB File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceDnbs-chr21-GS19240-180-36-ASM.tsv.bz2\
 | head -20

result:


… in a locus

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceDnbs-chr21-GS19240-180-36-ASM.tsv.bz2\
 | perl -F'\t' -ane 'print if ($F[0]==19995 );'

result:


Evidence Correlation File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/correlation-GS19240-180-36-ASM.tsv.bz2\
 | head -20

result:


[long running]

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/correlation-GS19240-180-36-ASM.tsv.bz2\
 | perl -F'\t' -ane 'print if (($F[0] eq "chr21" && $F[1] == 9720357) or ($F[3] eq "chr21" && $F[4] == 9720357));'

result:
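
A minimal awk sketch of the same two-sided test. Passing the chromosome name and the offset with `-v` keeps both values outside the quoted program, so no quotes have to be nested inside quotes. The rows are made up and only follow the column order the command above relies on (columns 1-2 for the first interval, columns 4-5 for the second).

```shell
#!/usr/bin/env bash
# Made-up rows: chromosome, offset, length for two intervals per line.
sample=$(mktemp)
printf 'chr21\t9720357\t50\tchr21\t9800000\t40\n'  > "$sample"  # first interval matches
printf 'chr21\t9000000\t50\tchr21\t9720357\t40\n' >> "$sample"  # second interval matches
printf 'chr2\t9720357\t50\tchr2\t9800000\t40\n'   >> "$sample"  # right offset, wrong chromosome

# chr is compared as a string, pos as a number.
hits=$(awk -F'\t' -v chr="chr21" -v pos=9720357 \
    '($1 == chr && $2 == pos) || ($4 == chr && $5 == pos)' "$sample")
echo "$hits"

n_hits=$(printf '%s\n' "$hits" | wc -l | tr -d ' ')
rm -f "$sample"
```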


Reference Genome File

command:

cgatools decodecrr --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr --range chr21,45724800,45724870

result:



… chromosome mode

command:

cgatools listcrr --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr --mode chromosome

result:


… contig mode

command:

cgatools listcrr --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr --mode contig

result:


Gene Variation Summary File

command:

cat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/geneVarSummary-GS19240-180-36-ASM.tsv\
 | head -20

result:


… summarized

command:

grep '^[0-9]' ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/geneVarSummary-GS19240-180-36-ASM.tsv\
 | sort -k18nr | head

result:


Gene File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2\
 | head -20

result:


… for a given snpID

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2\
 | grep 'rs1042522'

result:


[long running, generates count of components]

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2\
 | grep '^[0-9]' | cut -f16 | sort | uniq -c | sort -k1nr

result:
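
The counting pipeline above (`cut | sort | uniq -c | sort -k1nr`) can be tried on a tiny, made-up input first. In this sketch the component sits in column 2 of an illustrative two-column table rather than in column 16 of the real gene file.

```shell
#!/usr/bin/env bash
# Illustrative rows (index, component); not real data.
counts=$(printf '1\tCDS\n2\tINTRON\n3\tCDS\n4\tUTR5\n5\tCDS\n6\tINTRON\n' |
    cut -f2 |                  # keep only the component column
    sort |                     # group identical values next to each other
    uniq -c |                  # collapse each group into "count value"
    sort -k1,1nr |             # highest count first
    awk '{ print $1, $2 }')    # drop the padding that uniq -c adds
echo "$counts"

top=$(printf '%s\n' "$counts" | head -n 1)
```

The first `sort` is required because `uniq -c` only merges adjacent identical lines.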



[long running, generates count of impacts]

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2\
 | grep '^[0-9]' | cut -f19 | sort | uniq -c | sort -k1nr

result:



… FRAMESHIFT in TEKT4 only

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2\
 | grep -w 'TEKT4' | perl -F'\t' -ane 'print if ($F[18] =~ /FRAMESHIFT/);'

result:


… DELETE or INSERT in TEKT4 only

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2\
 | grep -w 'TEKT4' | perl -F'\t' -ane 'print if ($F[18] =~ /DELETE/ || $F[18] =~ /INSERT/);'

result:
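
The two regex tests joined with `||` can be written as a single alternation, `/DELETE|INSERT/` (in perl: `$F[18] =~ /DELETE|INSERT/`). The awk sketch below shows the idea on a made-up two-column table (gene symbol, impact); it also ties the gene test to one column, whereas `grep -w` matches the word anywhere on the line. The symbol `TEKT4P2` is used here purely as a near-miss example.

```shell
#!/usr/bin/env bash
# Made-up rows (gene symbol, impact); not real data.
sample=$(mktemp)
printf 'TEKT4\tMISSENSE\n'    > "$sample"
printf 'TEKT4\tDELETE\n'     >> "$sample"
printf 'TEKT4\tINSERT\n'     >> "$sample"
printf 'TEKT4P2\tDELETE\n'   >> "$sample"  # different symbol, must not match
printf 'OTHER\tFRAMESHIFT\n' >> "$sample"

# Exact match on the gene column, alternation on the impact column.
indels=$(awk -F'\t' '$1 == "TEKT4" && $2 ~ /DELETE|INSERT/' "$sample")
echo "$indels"

n_indels=$(printf '%s\n' "$indels" | wc -l | tr -d ' ')
rm -f "$sample"
```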


… MISSENSE in gene TEKT4 summarized

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2\
 | grep -w 'TEKT4' | grep -w 'MISSENSE' | cut -f1 | sort | uniq -c | sort -k1nr | head -5

result:

 

chr21 Varfile Subset

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2\
 | head -200 | grep -v '^[0-9]' > ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv

result:


… subset chr21 from all

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2\
 | grep 'chr21' >> ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv

result:

# see the results with
head -20 ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv
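
The two steps above (copy the header lines, then append the chr21 rows) can be sketched as a single awk pass. This is an alternative under stated assumptions, not the method used in the training: it keeps every line that does not start with a digit, not only those among the first 200 lines, and it compares the chromosome column exactly instead of searching the whole line for 'chr21'. The input is a made-up miniature of a variant file.

```shell
#!/usr/bin/env bash
# Made-up miniature: comment line, blank line, column header, four data rows.
src=$(mktemp)
out=$(mktemp)
printf '#DEMO\tnot real data\n\n' > "$src"
printf '>locus\tploidy\tallele\tchromosome\tbegin\tend\tvarType\n' >> "$src"
printf '1\t2\tall\tchr2\t100\t200\tref\n'  >> "$src"
printf '2\t2\tall\tchr21\t100\t200\tref\n' >> "$src"
printf '3\t2\t1\tchr21\t200\t201\tsnp\n'   >> "$src"
printf '4\t2\tall\tchr22\t100\t200\tref\n' >> "$src"

# Print header lines (anything not starting with a digit) and
# data rows whose 4th column is exactly chr21.
awk -F'\t' '!/^[0-9]/ || $4 == "chr21"' "$src" > "$out"

total_lines=$(wc -l < "$out" | tr -d ' ')
data_rows=$(grep -c '^[0-9]' "$out")
cat "$out"
rm -f "$src" "$out"
```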

cgatools calldiff

command:

cgatools calldiff --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr \
 --variantsA ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv \
 --variantsB ${path_to_data}/cgatools/cgatools_input/var-NA19238-chr21.tsv \
 --superlocus-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-output.tsv \
 --superlocus-stats ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-stats.csv \
 --locus-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-output.tsv \
 --locus-stats ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-stats.csv \
 --debug-call-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-debug-call-output.tsv \
 --debug-superlocus-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-debug-superlocus-output.tsv

result:

# preview results with

# superlocus-output
head -20 ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-output.tsv

# superlocus-stats
cat ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-stats.csv

# locus-output
head -20 ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-output.tsv

# locus-stats
cat ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-stats.csv
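
Because every output file of the calldiff call shares one prefix, the long command can be assembled in a bash array. The sketch below only prints the command (a dry run) and uses nothing but the options and file names from the call above; the fallback value of `path_to_data` is a placeholder so the snippet runs without the training data.

```shell
#!/usr/bin/env bash
# Placeholder default (assumption) if path_to_data is not set yet.
path_to_data=${path_to_data:-/path/to/training_data}
in_dir="${path_to_data}/cgatools/cgatools_input"
out_prefix="${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238"

cmd=(cgatools calldiff
     --reference "${in_dir}/hg18.crr"
     --variantsA "${in_dir}/var-NA19240-chr21.tsv"
     --variantsB "${in_dir}/var-NA19238-chr21.tsv")

# Each output option takes a file named <prefix>-<option>.<extension>.
for name in superlocus-output locus-output debug-call-output debug-superlocus-output; do
    cmd+=("--${name}" "${out_prefix}-${name}.tsv")
done
for name in superlocus-stats locus-stats; do
    cmd+=("--${name}" "${out_prefix}-${name}.csv")
done

# Dry run: print the assembled command instead of executing it.
printf '%s\n' "${cmd[*]}"
```

To execute the command instead of printing it, the last line would become `"${cmd[@]}"`. The option order differs from the call above, which should not matter for named options.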

cgatools snpdiff

command:

cgatools snpdiff --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr \
 --variants ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv \
 --genotypes ${path_to_data}/cgatools/cgatools_input/NA19238_Infinium_Genotypes.tsv \
 --output ${path_to_data}/cgatools/cgatools_output/NA19238_out.tsv \
 --verbose ${path_to_data}/cgatools/cgatools_output/NA19238_verb.tsv \
 --stats ${path_to_data}/cgatools/cgatools_output/NA19238_stats.csv

result:

# preview output with

# output
head -20  ${path_to_data}/cgatools/cgatools_output/NA19238_out.tsv

# stats
cat ${path_to_data}/cgatools/cgatools_output/NA19238_stats.csv

End of the 'September-9th' exercises.

New commands for version 1.1 will be added here when they become available.