Cgatools commands with results
Full listing of the commands used during the CGI training, with their solutions.
Make sure you copy the full text of each command: some are spread over several lines or run outside the dotted box.
UCSC browser direct link [1]
back to [training page]
Contents
- 1 set path and other inits
- 2 Reference Coverage File
- 3 …
- 4 Variant File
- 5 ...
- 6 Evidence Interval File
- 7 ...
- 8 ...
- 9 Evidence DNB File
- 10 ...
- 11 Evidence Correlation File
- 12 [long running]
- 13 Reference Genome File
- 14 ...
- 15 ...
- 16 Gene Variation Summary File
- 17 ...
- 18 Gene File
- 19 ...
- 20 [long running, generates count of components]
- 21 [long running, generates count of impacts]
- 22 ...
- 23 ...
- 24 ...
- 25 chr21 Varfile Subset
- 26 ...
- 27 see result
- 28 calldiff
- 29 snpdiff
- 30 end of 'September-9th' exercises
set path and other inits
Run this in bash to move into the cgatools folder and store its path in a variable; the commands in the following exercises are then shortened by writing ${path_to_data} in place of the full path.
command:
cd <<the full path to the cgatools folder>>
path_to_data=`pwd`
Reference Coverage File
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/REF/coverageRefScore-chr21-GS19240-180-36-ASM.tsv.bz2 | head -20
result:
#ASSEMBLY_ID GS19240-180-36-ASM
#CHROMOSOME chr21
#FORMAT_VERSION 1.3
#GENERATED_AT 2010-Jul-14 14:21:34.927322
#GENERATED_BY ExportReferenceSupport
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION 1.8.0.23
#TYPE REFMETRICS
>offset refScore uniqueSequenceCoverage weightSumSequenceCoverage
9719767 -75 6 7
9719768 -46 6 8
9719769 -46 10 13
9719770 -13 16 20
9719771 -13 19 23
9719772 3 18 23
9719773 3 26 31
9719774 6 28 32
9719775 10 31 36
9719776 12 36 42
…
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/REF/coverageRefScore-chr21-GS19240-180-36-ASM.tsv.bz2 | perl -F'\t' -ane 'print if ($F[0]>= 40647496 && $F[0]<= 40647501);'
result:
40647496 55 24 24
40647497 61 24 24
40647498 52 24 24
40647499 -94 22 22
40647500 22 18 18
40647501 43 19 19
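The Perl filter above works because the header lines evaluate to 0 in the numeric comparison and so fall outside the range. As a minimal sketch, assuming a tab-separated stream with the offset in the first column, the same selection can be written with awk while skipping header lines explicitly. The function name offset_range is hypothetical and is not part of cgatools.

```shell
# offset_range LO HI: hypothetical helper, not part of cgatools.
# Reads a tab-separated coverage stream on stdin and prints the rows
# whose first column (offset) lies between LO and HI, inclusive.
offset_range() {
    awk -F'\t' -v lo="$1" -v hi="$2" '
        /^[#>]/ || NF == 0 { next }   # skip header, column-title and blank lines
        $1 >= lo && $1 <= hi          # default action: print the matching row
    '
}
```

It would be used in place of the perl step, for example: bzcat <<coverage file>> | offset_range 40647496 40647501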
Variant File
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2 | head -20
result:
#ASSEMBLY_ID GS19240-180-36-ASM
#DBSNP_BUILD dbSNP build 130
#FORMAT_VERSION 1.3
#GENERATED_AT 2010-Jul-14 04:33:11.149915
#GENERATED_BY dbsnptool
#GENOME_REFERENCE NCBI build 36
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION 1.8.0.23
#TYPE VAR-ANNOTATION
>locus ploidy allele chromosome begin end varType reference alleleSeq totalScore hapLink xRef
1 2 all chr1 0 901 no-call = ?
2 2 all chr1 901 910 ref = =
3 2 all chr1 910 956 no-call = ?
4 2 all chr1 956 972 ref = =
5 2 all chr1 972 979 no-call = ?
6 2 all chr1 979 993 ref = =
7 2 all chr1 993 1033 no-call = ?
8 2 all chr1 1033 1046 ref = =
9 2 all chr1 1046 1053 no-call = ?
...
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2 | grep -w chr21 | perl -F'\t' -ane 'print if ($F[4]>=3946599 && $F[5]<=9719804);'
result:
21030571 2 1 chr21 9719767 9719767 no-call ?
21030571 2 1 chr21 9719767 9719773 ref AATTCT AATTCT 112
21030571 2 2 chr21 9719767 9719773 no-call AATTCT ?
21030572 2 all chr21 9719773 9719792 ref = =
21030573 2 1 chr21 9719792 9719793 ref G G 112 3931115
21030573 2 2 chr21 9719792 9719793 snp G T 112 3931116
21030574 2 all chr21 9719793 9719796 ref = =
21030575 2 1 chr21 9719796 9719797 ref G G 112 3931115
21030575 2 2 chr21 9719796 9719797 snp G C 112 3931116
21030576 2 all chr21 9719797 9719803 ref = =
21030577 2 1 chr21 9719803 9719804 snp T C 112 dbsnp.129:rs55981545
21030577 2 2 chr21 9719803 9719804 snp T C 112 dbsnp.129:rs55981545
Evidence Interval File
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceIntervals-chr21-GS19240-180-36-ASM.tsv.bz2 | head -20
result:
#ASSEMBLY_ID GS19240-180-36-ASM
#CHROMOSOME chr21
#FORMAT_VERSION 1.3
#GENERATED_AT 2010-Jul-14 14:21:04.402357
#GENERATED_BY ExportEvidence
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION 1.8.0.23
#TYPE EVIDENCE-INTERVALS
>IntervalId Chromosome OffsetInChromosome Length Ploidy AlleleIndexes Score Allele0 Allele1 Allele2 Allele1Alignment Allele2Alignment
0 chr21 9719767 49 2 1;2 1134 AATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATTTCACACATTTGA GAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATCTCACACATTTGA GAATTCTGAGAAACTTCTTTGTGAGGTTTGCATTCATCTCACACATTTGA 1I49M 1I49M
1 chr21 9719830 37 2 0;1 1143 GAAGATTTGGAAACAGTCTTTTTGTAAAATCTATAAA GAAGATTTGGAAACAGTCTTTTTGTAAAATCTACAAA 37M
2 chr21 9719892 14 2 0;1 373 CTAGGGTGAAGTAG CTATGGTGAAGTAG 14M
3 chr21 9719933 34 2 0;1 198 GAAATTTTCTGAGAAACGTTTTAGTGATGCGTGC GAAATTTTCTGAGAAACCTTTTAGTGATGCGTGC 34M
4 chr21 9720003 46 2 1;2 1035 GCACTTTGGAAACAGTCCTATTGTAGAATCCCCAAAGGAATACTTC GCACTTTGGAAACAGTCCTATTGTAGAATCCCCAAAGGGATACTTC GCAGTTTGGAAACAGTCCTATTGTAGAATCCCCAAAGGGGTACTTC 46M 46M
5 chr21 9720073 40 2 1;1 645 TATTGGAAATATCTTCACATAAAAGCTAGACAGAAACTTT TATTGGAAATATCTTCACATAAAAGCTAGACAGAAGCTTT 40M
6 chr21 9720137 7 2 0;1 675 GCTCTCA GCTTTCA 7M
7 chr21 9720156 35 2 0;1 351 AAGTGTTTCTTTTGAATGAGCAGTTTGGAAACACT AAGCATTTCTTTTGACTGAGCAGTTTGGAAACACT 35M
8 chr21 9720221 47 2 1;2 1259 GAGCGTTTTGAGGCCTATGGTGAAAAAGGAAATATCTTCACATAAAA GAGCGTTTCGAGGCCTATGGTGAAAAAGCAAATATCTTCACATAAAA GAGTGTTTTGAGGCCTATGGTGAAAAAGGAAATATCTTCACATAAAA 47M 47M
9 chr21 9720276 33 2 0;1 925 GAAGCTTTCTGAGAAACTACTTTGTAATGTGTG GAAGTTTTCTGGGAAACTAATTTGTAATGTGTG 33M
...
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceIntervals-chr21-GS19240-180-36-ASM.tsv.bz2 | perl -F'\t' -ane 'print if ($F[0]==19995);'
result:
19995 chr21 18571475 34 2 1;2 509 AAGTCACATTTACTCAGCTTTTAAAAAAAATCCA AAGCCACATTTACTCAGCTTTTAAAAAAAAATCCA AAGCCACATTTACTCAGCTTTTAAAAAAAATCCA 24M1I10M 34M
...
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceIntervals-chr21-GS19240-180-36-ASM.tsv.bz2 | perl -F'\t' -ane 'print if ($F[2]<=9719803 && ($F[2]+$F[3])>=9719804);'
result:
0 chr21 9719767 49 2 1;2 1134 AATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATTTCACACATTTGA GAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATCTCACACATTTGA GAATTCTGAGAAACTTCTTTGTGAGGTTTGCATTCATCTCACACATTTGA 1I49M 1I49M
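The last filter selects the interval that covers a given base: its start (OffsetInChromosome) is at or before the position, and its start plus Length reaches past it. As a rough sketch of the same condition, assuming tab-separated input with the offset in column 3 and the length in column 4 as in the header shown earlier, it can be wrapped in a small function. The name interval_at is hypothetical and is not part of cgatools.

```shell
# interval_at POS: hypothetical helper, not part of cgatools.
# Prints the evidence intervals on stdin that contain position POS,
# i.e. OffsetInChromosome <= POS and OffsetInChromosome + Length > POS.
# This is the perl condition above with 9719803 replaced by POS.
interval_at() {
    awk -F'\t' -v pos="$1" '
        /^[#>]/ || NF == 0 { next }   # skip header, column-title and blank lines
        $3 <= pos && $3 + $4 > pos    # default action: print the matching row
    '
}
```

For example: bzcat <<evidenceIntervals file>> | interval_at 9719803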
Evidence DNB File
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceDnbs-chr21-GS19240-180-36-ASM.tsv.bz2 | head -20
result:
#ASSEMBLY_ID GS19240-180-36-ASM #CHROMOSOME chr21 #FORMAT_VERSION 1.3 #GENERATED_AT 2010-Jul-14 14:21:04.402357 #GENERATED_BY ExportEvidence #SAMPLE GS00028-DNA_C01 #SOFTWARE_VERSION 1.8.0.23 #TYPE EVIDENCE-DNBS >IntervalId Chromosome Slide Lane FileNumInLane DnbOffsetInLaneFile AlleleIndex Side Strand OffsetInAllele AlleleAlignment OffsetInReference ReferenceAlignment MateOffsetInReference MateReferenceAlignment MappingQuality ScoreAllele0 ScoreAllele1 ScoreAllele2 Sequence Scores 0 chr21 GS10364-FS3 L04 5 2091773 0 R - 12 5M2B10M0N10M7N10M 9719779 5M2B10M0N10M7N10M9720139 10M6N10M0N10M2B5M H 39 39 0 TGCTCTCATTCAAAAGAAACACTTATGAGATGAGAAGTTCAAATGTGAATCCAACCCTCACAAAGAGAAG 3831567766677569;;<;;;6787/93148-0/%4(78)25%2:9899;;;:8886881874694599 0 chr21 GS10364-FS3 L04 5 16662456 0 L + 13 5M2B10M0N10M6N10M 9719780 5M2B10M0N10M6N10M 9720077 10M5N10M0N10M1B5M : 25 25 0 TTCTTTTTGTGAGGGTTGGATTCATCATTTGAACAGGAAATATCTTAAAAGCTAGACAGAAGCTTTTCTG 585386'78*647778:<<+;:;67:99:8::91/28/781.0.3::;:::;;:88868855656:9697 0 chr21 GS10364-FS3 L05 11 13924291 0 L + 11 5M2B10M0N10M6N10M 9719778 5M2B10M0N10M6N10M 9720110 10M6N10M0N10M2B5M 8 23 23 0 ACTTCTCTTTGTGAGGGTTGGATTCCACATTTGAATGCCAGAGAATTTTAATGAGTGCTCTCATCTCTCA 7887978676574668;;,::;:0::99989991/''+)&02532:7::8;;::877888574454:9:9 0 chr21 GS10364-FS3 L06 4 18290323 0 L + 12 5M2B10M0N10M6N10M 9719779 5M2B10M0N10M6N10M 9720173 10M6N10M0N10M2B5M A 32 32 0 CTTCTCTTTGTGAGGGTTGGATTCAACATTTGAACGTGCTGTTTGCTCTTTTTGCATAATCTGCACAAAT 7679060433056668:2;4::::7::99:8:+1/1$+,$733,0:9:):::::86879264275686$1 0 chr21 GS10364-FS3 L08 2 24245523 0 L + 6 5M2B10M0N10M5N10M 9719773 5M2B10M0N10M5N10M 9720151 10M6N10M0N10M2B5M Y 56 27 1 GAGAAAAACTTCTTTGTGAGGGTTGATTTCACACAGAGTTAAGTGTTGAATGAGCAGTTTGGAAAAACAC 78798865756773&7:;:<9::::'791:::91/77858103(0:9:5:;:::8(9888074168:695 0 chr21 GS10364-FS3 L08 5 29566964 0 L + 7 5M2B10M0N10M6N10M 9719774 5M2B10M0N10M6N10M 9720015 10M6N10M0N10M2B5M 5 20 0 0 
AGAAAAACTTCTTTGTGAGGGTTGGTTCACACATTCAGTCCTATTTCCCCAAAGGGATACTTCTCTCAGC 778885056467447/;3:<::9:59::89::610'85+/8,&1,%&/9%5488888968648058:749 0 chr21 GS10364-FS3 L08 10 10818535 0 R - 12 5M2B10M0N10M4N10M 9719779 5M2B10M0N10M4N10M 9720139 10M6N10M0N10M2B5M S 50 50 0 TGCTCTCATTCAAAAGAAACACTTATGAGATGAGATCAAATGTGTTGAATCCAACCCTCACAAAGAGAAG 84799'8755667367;<<;6;:::799595-8*.18674223'26:5:9;;;;88291964730::899 0 chr21 GS10367-FS3 L01 9 2114032 0 L + 18 5M2B10M0N10M6N10M 9719785 5M2B10M0N10M6N10M9720095 10M7N10M0N10M2B5M - 12 7 0 TGTGGGAGGGTTGGATTCATTTCACGAACATTTCTAANCTAGACATCTGAGAAACTTATTTTTAAAATGA *(&9/8679666*178;;;;$8.69::::;:981/89!*05650/,'&&&/;;:8(8*68623.)9:899 0 chr21 GS10367-FS3 L02 4 6345383 0 L + 13 5M2B10M0N10M6N10M 9719780 5M2B10M0N10M6N10M9720162 10M6N10M0N10M2B5M , 11 11 0 TTCTTTTTGTGAGGGTTGGATTCATCATTTGAACATTCTTTTGAAGTTTGGAAACACTCTTTTTTTTNAT 686/01#442667779:;;.;;;9::998:8991/&6625%/4-.9::::;;;:8675872456-/,!*2 0 chr21 GS10367-FS3 L05 2 27188331 0 L + 36 5M2B10M0N10M6N10M 9719803 5M2B10M0N10M6N10M 9720163 10M5N10M1N10M2B5M K 42 15 15 TTCACACACATTTGAACATTTCTTTAGATTTGGAATCTTTTGAATGTTNGGAAACCTCTTTTTGCGCATA 7882976440677669;9<;9;28997989:9:1/5738842-3,889!::;::3*7898(22,69:797
...
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceDnbs-chr21-GS19240-180-36-ASM.tsv.bz2 | perl -F'\t' -ane 'print if ($F[0]==19995 );'
result:
19995 chr21 GS10367-FS3 L07 10 11349238 0 L + 21 5M2B10M0N10M5N10M 18571496 5M2B10M0N10M5N10M 18571824 10M4N10M0N10M2B5M > 30 19 30 TAAAAAAAAAATCCAAANACAAAAATCCTATNAAAGANTTAGAACTGCTTTCTCTTATAATAGGTGTAAC -869999886675769;!<;:;9:877889:!81/77!57-44(27:89:78::7$758865556::899 19995 chr21 GS10364-FS3 L02 6 877188 1 R + -20 10M6N10M0N10M2B5M 18571455 10M6N10M0N10M2B5M 18571136 5M2B10M0N10M6N10M K 11 42 42 AGAGAGACGGGGTTTCACCATGTTGCTGGTCTCGATAGTAACAATACGTAAGCCACATTTACTCACAGCT 6879(&7868677777;;9;9;:8:7,8999&9(%488673535/::99:;;:988999366653:8.37 19995 chr21 GS10364-FS3 L02 7 22800208 1 R - -3 5M2B10M0N10M6N10M 18571472 5M2B10M0N10M4N1P1N10M 18571883 10M6N10M0N10M2B5M F 19 37 34 GCAGAAAAAATCTTGTGGAAAGGTACTCAGAATCATTGGATTTTTAAGCTGAGTAAATGTGGCTTTTACG 67739)2&7506276895:89999:97993877//(5786)/0+0982:9;:4:83578*)+441:88'8 19995 chr21 GS10364-FS3 L02 9 5502806 1 R - -11 5M1B10M0N10M5N10M 18571464 5M1B10M0N10M5N6M1I3M 18571802 10M6N10M0N10M2B5M ( 0 7 8 GCACACAAAGTTCTAAGTCAAATGCGATAACTCCTGTTATTAAAAGTAAATGTGGCTTACGTTTAACTTA +/2238/65,(06568;;;:79:989991:8:4,/$/$'$%22*.8'9:9:;;98878840552+/:5*5 19995 chr21 GS10364-FS3 L03 4 469726 1 L - -19 10M6N10M0N10M2B5M 18571456 10M6N10M0N10M2B5M 18571098 5M2B10M0N10M6N10M U 21 52 52 AAGCTCTGAGTAAATGTGGCTTACGTATTGTTACTTACTTAAAATAAAAAATTAGCCAGGCGTGGGGTGG 78889165666777689<<,6:3::4)5:9:971/'460.242)/:::::;;:98.878535%6499::9 19995 chr21 GS10364-FS3 L05 7 1904718 1 L - -14 10M6N10M0N10M2B5M 18571461 10M6N10M0N10M2B4M1I 18570973 5M2B10M0N10M4N10M 3 0 18 18 TTTAAAAAAGCTGAGTAAATGTGGCTTACTTATGGGCGCCATTGCCAGCCTGGAGGACAAGAGCTCTAGG 47+88-60665707638;;4::97/8,498:990/77068-#211'1/-/;;;98898881567/9::99 19995 chr21 GS10364-FS3 L07 2 4371163 1 R - -3 5M2B10M0N10M6N10M 18571472 5M2B10M0N10M4N1P1N10M 18571883 10M6N10M0N10M2B5M N 19 45 43 GCAGAGAAAATCTTGTGGAAAGGTACTCAGAATCATTGGATTTTTAAGCTGAGTAAATGTGGCTTTTACG 7778966&4255.0));:;8;;94959::899#100678%,242-::%:9:;(9884,88344/56:786 19995 chr21 GS10364-FS3 L08 11 7769695 1 L - -17 10M6N10M0N10M2B5M 18571458 
10M6N10M0N10M2B5M 18571205 5M2B10M0N10M6N10M 6 11 21 21 AAAAGAGNTGAGTAAATGTGGCTTACTTATTGTTACGGTGGCTCTTATACCCAACACTTTGGGAGAGGCT 7889968!6766555.&8:;'/::.:793:::91.5,,7&8&3$#/7%+/,);:8%889911.&44:999 19995 chr21 GS10367-FS3 L01 4 26409639 1 L + -18 5M1B10M0N10M6N10M 18571457 5M1B10M0N10M6N10M 18571770 10M6N10M0N10M2B5M U 22 52 52 GTAACCAATAAGTAAACGTAAGCCACTCAGCTTTTAATACAGTGTTGTGAAGAACTTTTGTAGGAGAGTT 7889972777676768::9:;;:::899:::681/71627,-33)999:96;;:8-67584576,:9:79 19995 chr21 GS10367-FS3 L01 6 8229516 1 L + 14 5M2B10M0N10M5N10M 18571489 5M2B7M1I2M0N10M5N10M 18571784 10M5N10M1N10M2B5M : 0 25 0 CAGCTCATTTAAAAAAAAATCCAAAAAAAAACTTCGCTGTGAAGATGTAGGAGTTTCAGAATAGCGCATT +(889'&&3,0546-9;<;:9;::9:897::991/7866582431::::9;:;85/8885/%-219:873 19995 chr21 GS10367-FS3 L02 4 28884636 1 R - -32 5M2B10M0N10M7N10M 18571443 5M2B10M0N10M7N10M 18571695 10M6N10M0N10M2B5M I 12 40 40 ATTGAGAAGTCAAAGACTGTATATTTTTCTCCCTCTGTGGCTTACTATTGTTACTAAAGATATATATNTC 58698576567767(8;4.:0:8,9885:99991067708545)/92989;;,:8788836665295!97 19995 chr21 GS10367-FS3 L02 12 8349863 1 L - -24 10M6N10M0N10M2B5M 18571451 10M6N10M0N10M2B5M 18571062 5M2B10M0N10M6N10M K 17 42 42 GAGTATAANTGTGGCTTACGTTTACTTACTAAAGATGGTACACACTCCCAGGCTACTCGGGAGGCGCNGA 76688&53!3333548:8;;48:::9989:9:,1/648,2413,$:::::;;;:6677865265347!88 19995 chr21 GS10367-FS3 L03 7 19754890 1 L - 12 10M5N10M0N10M2B5M 18571487 10M2N1P2N10M0N10M2B5M 18571049 5M2B10M0N10M5N10M = 8 28 8 AGTTTTTTTATTGTTTTTGGATTTTAAAAGCTGAGAATCCCAGGCGGGAGGCTGAGGCAGGAGAAAATCA 08899302./657658;<<;/75:9789997:91/6/8707141.1::;:;8;:8898876157577949 19995 chr21 GS10367-FS3 L03 10 9478592 1 R - -16 5M2B10M0N10M6N10M 18571459 5M2B10M0N10M6N10M 18571814 10M6N10M0N10M2B5M 0 0 15 15 TTATATAAGAGAAAGCACAAAGTTCAAATGCTTTTTAAAAGTTGAGTGGCTTACGTTTACTTAGTTTGTT ((1423737,770569;<<9;7;::3998::971./6476/24./:6::9;;7:6*8868$'-$59$55& 19995 chr21 GS10367-FS3 L04 5 18961206 1 L + 14 5M2B10M0N10M6N10M 18571489 5M2B7M1I2M0N10M6N10M 18571784 10M5N10M1N10M2B5M 6 0 21 0 
CAGCTCTTGTAAAAAAAAATCCAAAAAAAACTTCCGCTGTGAAGATGTAGGAGTTTCAGAAAAGCGCATT 8688970%.(3777*9;<<;99::::::9:9:81/563.14+,+1::9::;6:/8&88885'20'::965 19995 chr21 GS10367-FS3 L05 5 7047390 1 R - -31 5M2B10M0N10M6N10M 18571444 5M2B10M0N10M6N10M 18571810 10M5N10M0N10M2B5M 2 0 17 17 AGAGAGAAAGCACAAAGTTCTAAGTGCTTTTCTGATGTGGCTTACTTANTGTTACCAAAGATATATATGT 66999776676677699;7;%)78/9687978*-%,6/78+#'00%$9!%884-,&9998813$,*97:6 19995 chr21 GS10367-FS3 L05 7 22469414 1 L - -17 10M5N10M0N10M2B5M 18571458 10M5N10M0N10M2B5M 18571095 5M2B10M0N10M5N10M 8 2 23 23 AAAGCGCTGAGTAAATGTGGCTTACCTTATTGTTATAAAATACCAAAANTAGCCAGGCGTGGTGGGGTAC 888993.54656766958,4:;:94%-(363971.$2.8711%+)987!8::(988968656&048:998 19995 chr21 GS10367-FS3 L07 8 29170387 1 L - -22 10M6N10M0N10M2B5M 18571453 10M6N10M0N10M2B5M 18571059 5M2B10M0N10M5N10M I 12 40 40 CTGAGAGTAAATGTGGCTTACGTTTTGTTACTAAAACACACCAGTCAGGCTACTCGGGAGGCTGAGAGGC 8689767'78667678:;949;::98:6:::691/77888(0(&%69:9:3;:88.28175362385958 19995 chr21 GS10367-FS3 L07 9 7259350 1 L + -34 5M2B10M0N10M7N10M 18571441 5M2B10M0N10M7N10M 18571698 10M5N10M0N10M2B5M T 12 51 50 CAGACACATATATCTTTAGTAACAAACGTAAGCCAGGAGAAATATTATACAGTCTTTGACTTCAAAATAT 78899663'4676368;;9;6:;:::9:9:9391/47988923,-:996:6;::4888896474599:93 19995 chr21 GS10367-FS3 L08 1 11188879 1 R + -17 10M6N10M0N10M2B5M 18571458 10M6N10M0N10M2B5M 18571128 5M2B10M0N10M6N10M Q 24 55 55 TTTTATAAGTAGAGACGGGGTTTCATGGCCAGGCTTAACAATAAGTAAGCCACATTTACTCAGCTCTTTT '8625-7675657668;;<::;2:69:89:9081/'6888810647869:89;:74888861454:92'' 19995 chr21 GS10367-FS3 L08 11 2896905 1 L + 22 5M2B10M0N10M6N10M 18571497 2M1I2M2B10M0N10M6N10M 18571788 10M6N10M0N10M2B5M ; 33 53 33 AAAAAAAAAAATCCAAAAACAAAAACCTATGAAATTGAAGAACTTGAGTTATCAGAAAAGCATTTTTGAC 2789978692534668;;;;8:;::6898:9:90,/,237441204::45:7:28888987543341.78 19995 chr21 GS10364-FS3 L02 6 877188 2 R + -20 10M6N10M0N10M2B5M 18571455 10M6N10M0N10M2B5M 18571136 5M2B10M0N10M6N10M K 11 42 42 AGAGAGACGGGGTTTCACCATGTTGCTGGTCTCGATAGTAACAATACGTAAGCCACATTTACTCACAGCT 
6879(&7868677777;;9;9;:8:7,8999&9(%488673535/::99:;;:988999366653:8.37 19995 chr21 GS10364-FS3 L02 7 22800208 2 R - -3 5M2B10M0N10M5N10M 18571472 5M2B10M0N10M5N10M 18571883 10M6N10M0N10M2B5M C 19 37 34 GCAGAAAAAATCTTGTGGAAAGGTACTCAGAATCATTGGATTTTTAAGCTGAGTAAATGTGGCTTTTACG 67739)2&7506276895:89999:97993877//(5786)/0+0982:9;:4:83578*)+441:88'8 19995 chr21 GS10364-FS3 L02 9 5502806 2 R - -11 5M1B10M0N10M5N10M 18571464 5M1B10M0N10M5N10M 18571802 10M6N10M0N10M2B5M ( 0 7 8 GCACACAAAGTTCTAAGTCAAATGCGATAACTCCTGTTATTAAAAGTAAATGTGGCTTACGTTTAACTTA +/2238/65,(06568;;;:79:989991:8:4,/$/$'$%22*.8'9:9:;;98878840552+/:5*5 19995 chr21 GS10364-FS3 L03 4 469726 2 L - -19 10M6N10M0N10M2B5M 18571456 10M6N10M0N10M2B5M 18571098 5M2B10M0N10M6N10M U 21 52 52 AAGCTCTGAGTAAATGTGGCTTACGTATTGTTACTTACTTAAAATAAAAAATTAGCCAGGCGTGGGGTGG 78889165666777689<<,6:3::4)5:9:971/'460.242)/:::::;;:98.878535%6499::9 19995 chr21 GS10364-FS3 L03 8 10073663 2 R - 1 5M2B10M0N10M5N10M 18571476 5M2B10M0N10M5N10M 18571896 10M5N10M0N10M2B5M 5 0 16 20 TAGGAGATGAAAATGCAGAAAATCTAAAGGTAGAGGTTTTTGGATTTANAAGCTGAGTAAATGTGTGGCT 7789964676665369;;;;:::6:979:977:1074366.4323879!8:;9:8795787763699999 19995 chr21 GS10364-FS3 L05 7 1904718 2 L - -14 10M6N10M0N10M2B5M 18571461 10M6N10M0N10M2B5M 18570973 5M2B10M0N10M4N10M 3 0 18 18 TTTAAAAAAGCTGAGTAAATGTGGCTTACTTATGGGCGCCATTGCCAGCCTGGAGGACAAGAGCTCTAGG 47+88-60665707638;;4::97/8,498:990/77068-#211'1/-/;;;98898881567/9::99 19995 chr21 GS10364-FS3 L07 2 4371163 2 R - -3 5M2B10M0N10M5N10M 18571472 5M2B10M0N10M5N10M 18571883 10M6N10M0N10M2B5M L 19 45 43 GCAGAGAAAATCTTGTGGAAAGGTACTCAGAATCATTGGATTTTTAAGCTGAGTAAATGTGGCTTTTACG 7778966&4255.0));:;8;;94959::899#100678%,242-::%:9:;(9884,88344/56:786 19995 chr21 GS10364-FS3 L08 11 7769695 2 L - -17 10M6N10M0N10M2B5M 18571458 10M6N10M0N10M2B5M 18571205 5M2B10M0N10M6N10M 6 11 21 21 AAAAGAGNTGAGTAAATGTGGCTTACTTATTGTTACGGTGGCTCTTATACCCAACACTTTGGGAGAGGCT 7889968!6766555.&8:;'/::.:793:::91.5,,7&8&3$#/7%+/,);:8%889911.&44:999 19995 chr21 GS10367-FS3 
L01 4 26409639 2 L + -18 5M1B10M0N10M6N10M 18571457 5M1B10M0N10M6N10M 18571770 10M6N10M0N10M2B5M U 22 52 52 GTAACCAATAAGTAAACGTAAGCCACTCAGCTTTTAATACAGTGTTGTGAAGAACTTTTGTAGGAGAGTT 7889972777676768::9:;;:::899:::681/71627,-33)999:96;;:8-67584576,:9:79 19995 chr21 GS10367-FS3 L02 3 3044773 2 L + -7 5M2B10M0N10M6N10M 18571468 5M2B10M0N10M6N10M 18571821 10M6N10M2N10M3B5M 7 1 14 22 TAAACACGTAAGCCACATTTACTCAAAAAAAAATCTTTGGCTTAGGTGCTTTCTCATAATAGGTAGTNAC &889887867645779;;;;,/865::9:9/,,$.+786)7022/7:::::;;:867887641%7::!78 19995 chr21 GS10367-FS3 L02 4 28884636 2 R - -32 5M2B10M0N10M7N10M 18571443 5M2B10M0N10M7N10M 18571695 10M6N10M0N10M2B5M I 12 40 40 ATTGAGAAGTCAAAGACTGTATATTTTTCTCCCTCTGTGGCTTACTATTGTTACTAAAGATATATATNTC 58698576567767(8;4.:0:8,9885:99991067708545)/92989;;,:8788836665295!97 19995 chr21 GS10367-FS3 L02 12 8349863 2 L - -24 10M6N10M0N10M2B5M 18571451 10M6N10M0N10M2B5M 18571062 5M2B10M0N10M6N10M K 17 42 42 GAGTATAANTGTGGCTTACGTTTACTTACTAAAGATGGTACACACTCCCAGGCTACTCGGGAGGCGCNGA 76688&53!3333548:8;;48:::9989:9:,1/648,2413,$:::::;;;:6677865265347!88 19995 chr21 GS10367-FS3 L03 10 9478592 2 R - -16 5M2B10M0N10M6N10M 18571459 5M2B10M0N10M6N10M 18571814 10M6N10M0N10M2B5M 0 0 15 15 TTATATAAGAGAAAGCACAAAGTTCAAATGCTTTTTAAAAGTTGAGTGGCTTACGTTTACTTAGTTTGTT ((1423737,770569;<<9;7;::3998::971./6476/24./:6::9;;7:6*8868$'-$59$55& 19995 chr21 GS10367-FS3 L05 5 7047390 2 R - -31 5M2B10M0N10M6N10M 18571444 5M2B10M0N10M6N10M 18571810 10M5N10M0N10M2B5M 2 0 17 17 AGAGAGAAAGCACAAAGTTCTAAGTGCTTTTCTGATGTGGCTTACTTANTGTTACCAAAGATATATATGT 66999776676677699;7;%)78/9687978*-%,6/78+#'00%$9!%884-,&9998813$,*97:6 19995 chr21 GS10367-FS3 L05 7 22469414 2 L - -17 10M5N10M0N10M2B5M 18571458 10M5N10M0N10M2B5M 18571095 5M2B10M0N10M5N10M 8 2 23 23 AAAGCGCTGAGTAAATGTGGCTTACCTTATTGTTATAAAATACCAAAANTAGCCAGGCGTGGTGGGGTAC 888993.54656766958,4:;:94%-(363971.$2.8711%+)987!8::(988968656&048:998 19995 chr21 GS10367-FS3 L06 1 18489507 2 R + 0 10M5N10M0N10M2B5M 18571475 10M5N10M0N10M2B5M 18571166 
5M2B10M0N10M6N10M ' 0 1 6 TGGTCTCCCGAACTCCTGACCTCAGTGCCCGGCTCAAGCCACATTAGCTTTTACAAAAAATCCAAAAAAA 1'1+(2&$6846*%'7;;7:99:6958/:79:81/68638554/447(2+:1;&8888,9,7.66+9999 19995 chr21 GS10367-FS3 L07 8 9323455 2 R + 0 10M4N10M0N10M2B5M 18571475 10M4N10M0N10M2B5M 18571205 5M2B10M0N10M6N10M K 21 39 49 AGCCTCTCCCAANGTGTTGGGTTTATGAGCCACCGAAGCCACATTCAGCTTTTAAAAAAAATCCACAAAA 67)396646/66!6388;;;:99)*8:4898990/37787822)30+5::9:7:87888862466::9:9 19995 chr21 GS10367-FS3 L07 8 29170387 2 L - -22 10M6N10M0N10M2B5M 18571453 10M6N10M0N10M2B5M 18571059 5M2B10M0N10M5N10M I 12 40 40 CTGAGAGTAAATGTGGCTTACGTTTTGTTACTAAAACACACCAGTCAGGCTACTCGGGAGGCTGAGAGGC 8689767'78667678:;949;::98:6:::691/77888(0(&%69:9:3;:88.28175362385958 19995 chr21 GS10367-FS3 L07 9 7259350 2 L + -34 5M2B10M0N10M7N10M 18571441 5M2B10M0N10M7N10M 18571698 10M5N10M0N10M2B5M S 12 51 50 CAGACACATATATCTTTAGTAACAAACGTAAGCCAGGAGAAATATTATACAGTCTTTGACTTCAAAATAT 78899663'4676368;;9;6:;:::9:9:9391/47988923,-:996:6;::4888896474599:93 19995 chr21 GS10367-FS3 L07 10 11349238 2 L + 21 5M2B10M0N10M5N10M 18571496 5M2B10M0N10M5N10M 18571824 10M4N10M0N10M2B5M > 30 19 30 TAAAAAAAAAATCCAAANACAAAAATCCTATNAAAGANTTAGAACTGCTTTCTCTTATAATAGGTGTAAC -869999886675769;!<;:;9:877889:!81/77!57-44(27:89:78::7$758865556::899 19995 chr21 GS10367-FS3 L08 1 11188879 2 R + -17 10M6N10M0N10M2B5M 18571458 10M6N10M0N10M2B5M 18571128 5M2B10M0N10M6N10M Q 24 55 55 TTTTATAAGTAGAGACGGGGTTTCATGGCCAGGCTTAACAATAAGTAAGCCACATTTACTCAGCTCTTTT '8625-7675657668;;<::;2:69:89:9081/'6888810647869:89;:74888861454:92''
Evidence Correlation File
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/correlation-GS19240-180-36-ASM.tsv.bz2 | head -20
result:
#ASSEMBLY_ID GS19240-180-36-ASM
#FORMAT_VERSION 1.3
#GENERATED_AT 2010-Jul-14 14:19:09.429488
#GENERATED_BY ExportCorrelation
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION 1.8.0.23
#TYPE EVIDENCE-CORRELATION
>Chromosome1 OffsetInChromosome1 Length1 Chromosome2 OffsetInChromosome2 Length2 P1 P2 P12
chr1 910 46 chr9 1452 26 314 86 399
chr1 910 46 chr9 1496 54 314 122 426
chr1 910 46 chr9 1589 7 314 79 384
chr1 910 46 chr9 1617 46 314 1198 1484
chr1 910 46 chr9 1697 7 314 48 353
chr1 910 46 chr9 1714 43 314 643 938
chr1 910 46 chr9 1776 17 314 788 1093
chr1 910 46 chr9 1800 46 314 418 723
chr1 910 46 chr15 100337124 41 314 1930 2233
chr1 910 46 chr15 100337181 7 314 1123 1426
chr1 910 46 chr15 100337203 36 314 220 535
[long running]
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/correlation-GS19240-180-36-ASM.tsv.bz2 | perl -F'\t' -ane 'print if (($F[0] eq "chr21" && $F[1] == 9720357) or ($F[3] eq "chr21" && $F[4] == 9720357));'
result:
chr6 61947637 50 chr21 9720357 59 262 1818 2072
chr16 33880293 42 chr21 9720357 59 116 1818 1911
chr16 33880349 9 chr21 9720357 59 52 1818 1873
chr16 33880382 54 chr21 9720357 59 330 1818 2100
chr16 33880453 16 chr21 9720357 59 170 1818 1953
chr16 33880483 22 chr21 9720357 59 141 1818 1953
chr16 33880517 83 chr21 9720357 59 729 1818 2531
chr16 33880655 49 chr21 9720357 59 769 1818 2563
chr16 33880737 49 chr21 9720357 59 86 1818 1898
chr16 33880822 69 chr21 9720357 59 325 1818 2119
chr16 33880963 46 chr21 9720357 59 232 1818 2047
chr16 33881047 38 chr21 9720357 59 147 1818 1957
chr16 33881122 39 chr21 9720357 59 132 1818 1939
chr16 33881172 50 chr21 9720357 59 68 1818 1876
chr21 9719830 37 chr21 9720357 59 1143 1818 2985
chr21 9719892 14 chr21 9720357 59 373 1818 2188
chr21 9719933 34 chr21 9720357 59 198 1818 2018
chr21 9720003 46 chr21 9720357 59 1035 1818 2923
chr21 9720073 40 chr21 9720357 59 645 1818 2497
chr21 9720137 7 chr21 9720357 59 675 1818 2541
chr21 9720156 35 chr21 9720357 59 351 1818 2127
chr21 9720221 47 chr21 9720357 59 1259 1818 2926
chr21 9720320 31 chr21 9720357 59 469 1818 2285
chr21 9720357 59 chr21 9720428 10 1818 43 1652
chr21 9720357 59 chr21 9720564 38 1818 793 2565
chr21 9720357 59 chr21 9720619 43 1818 795 2562
chr21 9720357 59 chr21 9720671 25 1818 429 2220
chr21 9720357 59 chr21 9720723 21 1818 530 2332
chr21 9720357 59 chr21 9720751 48 1818 292 2107
chr21 9720357 59 chr21 9720819 20 1818 694 2511
chr21 9720357 59 chr21 9720853 31 1818 98 1918
Reference Genome File
command:
cgatools decodecrr --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr --range chr21,45724800,45724870
result:
TTCCTCAGACCTTCATTGACATGGAGGGATCTGGCTTCGGGGGCGATCTGGAGGCCCTGCGGGTGAGTGG
...
command:
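cgatools listcrr --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr --mode chromosome
result:
# the run recorded during the training had a stray word 'reference' after --reference and only printed the error below; no listing was captured
cgatools listcrr: multiple_occurrences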
...
command:
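cgatools listcrr --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr --mode contig
result:
# the run recorded during the training had a stray word 'reference' after --reference and only printed the error below; no listing was captured
cgatools listcrr: multiple_occurrences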
Gene Variation Summary File
command:
cat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/geneVarSummary-GS19240-180-36-ASM.tsv | head -20
result:
#ASSEMBLY_ID GS19240-180-36-ASM
#DBSNP_BUILD dbSNP build 130
#FORMAT_VERSION 1.3
#GENERATED_AT 2010-Jul-14 05:39:36.249717
#GENERATED_BY callannotate
#GENE_ANNOTATIONS NCBI build 36.3
#GENOME_REFERENCE NCBI build 36
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION 1.8.0.23
#TYPE GENE-VAR-SUMMARY-REPORT
>geneId mrnaAcc symbol chromosome begin end missense nonsense nonStop frameshift inframe total missenseNovel nonsenseNovel nonStopNovel frameshiftNovel inframeNovel totalNovel
653635 XR_017611.2 LOC653635 chr1 814 19919 0 0 0 0 0 0 0 0 0 0 0 0
79501 NM_001005484.1 OR4F5 chr1 58953 59871 4 0 0 0 0 4 0 0 0 0 0 0
100132632 XM_001724183.1 LOC100132632 chr1 77384 80096 0 0 0 0 0 0 0 0 0 0 0 0
643670 XR_039242.1 LOC643670 chr1 110380 130714 0 0 0 0 0 0 0 0 0 0 0 0
729737 XM_001133863.1 LOC729737 chr1 114642 134341 0 0 0 0 0 0 0 0 0 0 0 0
653340 XR_039243.1 LOC653340 chr1 123323 130714 0 0 0 0 0 0 0 0 0 0 0 0
728481 XR_015292.2 LOC728481 chr1 217632 218641 0 0 0 0 0 0 0 0 0 0 0 0
100132287 XR_039254.1 LOC100132287 chr1 313132 319752 0 0 0 0 0 0 0 0 0 0 0 0
...
command:
grep '[^0-9]' ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/geneVarSummary-GS19240-180-36-ASM.tsv | sort -k18nr | head
result:
#ASSEMBLY_ID GS19240-180-36-ASM #DBSNP_BUILD dbSNP build 130 #FORMAT_VERSION 1.3 #GENERATED_AT 2010-Jul-14 05:39:36.249717 #GENERATED_BY callannotate #GENE_ANNOTATIONS NCBI build 36.3 #GENOME_REFERENCE NCBI build 36 #SAMPLE GS00028-DNA_C01 #SOFTWARE_VERSION 1.8.0.23150483 NM_144705.2 TEKT4 chr2 94900958 94906295 22 0 0 1 2 25 19 0 0 0 221 100132267 XM_001718197.1 LOC100132267 chr16 33843834 33844897 20 0 0 1 0 21 12 0 01 0 13 4583 NM_002457.2 MUC2 chr11 1064901 1094419 14 0 0 8 0 22 6 0 0 6 0 12 5558 NM_000947.2 PRIM2 chr6 57290380 57621335 13 2 0 1 0 16 8 1 0 1 010 100132821 XM_001714759.1 LOC100132821 chr2 240501949 240507031 5 1 0 7 0 13 1 0 07 0 8 219417 NM_001005204.1 OR8U1 chr11 55899675 55900635 13 0 0 0 0 13 7 0 0 0 07 140453 NM_001040105.1 MUC17 chr7 100450083 100488860 23 0 0 0 0 23 6 0 0 0 06 3115 NM_002121.4 HLA-DPB1 chr6 33151737 33162954 13 0 0 4 0 17 2 0 0 40 6 3768 NM_021012.4 KCNJ12 chr17 21220291 21263772 12 0 0 0 0 12 6 0 0 0 06 440563 XM_001720694.1 LOC440563 chr1 13105374 13106872 8 0 0 0 0 8 6 0 0 00 6 0
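Note that the pattern '[^0-9]' matches any line containing a non-digit character, so the header lines pass through the grep and show up in the result. The later exercises use '^[0-9]' to keep only data rows. A minimal sketch of a header-safe variant follows, assuming tab-separated input whose data rows start with a digit; the function name top_by_column is hypothetical.

```shell
# top_by_column COL N: hypothetical helper, not part of cgatools.
# Keeps only data rows (lines starting with a digit), sorts them by
# column COL in descending numeric order, and prints the first N rows.
top_by_column() {
    tab=$(printf '\t')                      # literal tab used as the sort field separator
    grep '^[0-9]' | sort -t "$tab" -k "$1,$1nr" | head -n "$2"
}
```

For the gene variation summary, sorting on totalNovel (column 18) would read: cat <<geneVarSummary file>> | top_by_column 18 10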
Gene File
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | head -20
result:
#ASSEMBLY_ID GS19240-180-36-ASM #DBSNP_BUILD dbSNP build 130 #FORMAT_VERSION 1.3 #GENERATED_AT 2010-Jul-14 05:39:36.249717 #GENERATED_BY callannotate #GENE_ANNOTATIONS NCBI build 36.3 #GENOME_REFERENCE NCBI build 36 #SAMPLE GS00028-DNA_C01 #SOFTWARE_VERSION 1.8.0.23 #TYPE GENE-ANNOTATION >index locus allele chromosome begin end varType reference call xRef geneId mrnaAcc proteinAcc symbol orientation component componentIndex codingRegionKnown impact nucleotidePos proteinPos annotationRefSequence sampleSequence genomeRefSequence 0 5 1 chr1 972 979 no-call ? 653635 XR_017611.2 LOC653635 - EXON 11 N UNDEFINED 0 0 5 2 chr1 972 979 no-call ? 653635 XR_017611.2 LOC653635 - EXON 11 N UNDEFINED 5867 1 9 1 chr1 1046 1053 no-call ? 653635 XR_017611.2 LOC653635 - EXON 11 N UNDEFINED 5867 1 9 2 chr1 1046 1053 no-call ? 653635 XR_017611.2 LOC653635 - EXON 11 N UNDEFINED 5793 2 21 1 chr1 1367 1374 no-call ? 653635 XR_017611.2 LOC653635 - EXON 11 N UNDEFINED 5793 2 21 2 chr1 1367 1374 no-call ? 653635 XR_017611.2 LOC653635 - EXON 11 N UNDEFINED 5472 3 33 1 chr1 1722 1729 no-call ? 653635 XR_017611.2 LOC653635 - EXON 11 N UNDEFINED 5472 3 33 2 chr1 1722 1729 no-call ? 653635 XR_017611.2 LOC653635 - EXON 11 N UNDEFINED 5117
...
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep 'rs1042522'
result:
4223284 18936084 1 chr17 7520196 7520197 snp G C dbsnp.86:rs1042522 7157 NM_000546.3 NP_000537.3 TP53 - EXON 3 Y MISSENSE 465 71 P R P
4550307 20202772 1 chr19 13748223 13748224 snp G C dbsnp.119:rs10425229 28974 NM_014047.2 NP_054766.1 C19orf53 + INTRON 1 Y
[long running, generates count of components]
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep '^[0-9]' | cut -f16 | sort | uniq -c | sort -k1nr
result:
9058542 INTRON
1651007 UTR
283921 EXON
22565 ACCEPTOR
7752 DONOR
[long running, generates count of impacts]
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep '^[0-9]' | cut -f19 | sort | uniq -c | sort -k1nr
result:
10723425
119677 NO-CHANGE
102256 UNKNOWN
38143 UNDEFINED
20675 COMPATIBLE
18170 MISSENSE
545 FRAMESHIFT
190 MISSTART
182 NONSENSE
129 DISRUPT
97 DELETE+
93 INSERT+
88 NONSTOP
70 DELETE
47 INSERT
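Both counting commands are slow largely because sort has to order about 11 million rows (the counts above add up to 11023787) before uniq -c can group them. As a sketch of an alternative, assuming tab-separated data rows that start with a digit, awk can tally the values in one pass so that only the handful of distinct values needs sorting. The function name count_column is hypothetical.

```shell
# count_column COL: hypothetical helper, not part of cgatools.
# Counts how often each distinct value occurs in column COL of the data
# rows on stdin and prints "count<TAB>value", largest count first.
count_column() {
    awk -F'\t' -v col="$1" '
        /^[0-9]/ { n[$col]++ }                              # tally data rows only
        END { for (k in n) printf "%d\t%s\n", n[k], k }     # one line per distinct value
    ' | sort -k1,1nr
}
```

For example, the component counts would come from: bzcat <<gene file>> | count_column 16. An empty field is counted under an empty value, like the first line of the impact result.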
...
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep -w 'TEKT4' | perl -F'\t' -ane 'print if ($F[18] =~ /FRAMESHIFT/);'
result:
558976 2440057 1 chr2 94901151 94901173 del TCCGCACCTCCAAGTACCTGCT dbsnp.126:rs34195690 150483 NM_144705.2 NP_653306.1 TEKT4 + EXON 0 Y FRAMESHIFT 193 33 FRTSKYLL W FRTSKYLL
...
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep -w 'TEKT4' | perl -F'\t' -ane 'print if ($F[18] =~ /DELETE/ || $F[18] =~ /INSERT/);'
result:
558970 2440045 1 chr2 94901055 94901055 ins TGCAGACGGATGTGCTCCTACCAGAGCCGGCAC 150483 NM_144705.2 NP_653306.1 TEKT4 + EXON 0 Y INSERT+ 97 1 A VQTDVLLPEPAP A
558982 2440069 2 chr2 94901247 94901247 ins TGCAGG 150483 NM_144705.2 NP_653306.1 TEKT4 + EXON 0 Y INSERT+ 289 65 R LQG R
...
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep -w 'TEKT4' | grep -w 'MISSENSE' | cut -f1 | sort | uniq -c | sort -k1nr | head -5
result:
1 558977
1 558978
1 558979
1 558981
1 558986
chr21 Varfile Subset
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2 | head -200 | grep -v '^[0-9]' > ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv
result:
# result saved to file, no terminal output
...
command:
bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2 | grep 'chr21' >> ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv
result:
# result added to file, no terminal output
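The two commands above decompress the variant file twice, and grep 'chr21' keeps any row where that string appears anywhere on the line. As a minimal one-pass sketch, assuming tab-separated rows with the chromosome in column 4 (as in the varfile column header shown earlier) and data rows that start with a digit, the header and the chosen chromosome can be extracted together. The function name subset_by_chromosome is hypothetical.

```shell
# subset_by_chromosome CHR: hypothetical helper, not part of cgatools.
# Passes through every non-data line (header, column titles, blank lines)
# and keeps only the data rows whose 4th column equals CHR exactly.
subset_by_chromosome() {
    awk -F'\t' -v chr="$1" '!/^[0-9]/ || $4 == chr'
}
```

For example: bzcat <<var file>> | subset_by_chromosome chr21 > ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv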
see result
command:
head -20 ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv
result:
#ASSEMBLY_ID GS19240-180-36-ASM
#DBSNP_BUILD dbSNP build 130
#FORMAT_VERSION 1.3
#GENERATED_AT 2010-Jul-14 04:33:11.149915
#GENERATED_BY dbsnptool
#GENOME_REFERENCE NCBI build 36
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION 1.8.0.23
#TYPE VAR-ANNOTATION
>locus ploidy allele chromosome begin end varType reference alleleSeq totalScore hapLink xRef
21030570 ? all chr21 0 9719767 no-ref = ?
21030571 2 1 chr21 9719767 9719767 no-call ?
21030571 2 1 chr21 9719767 9719773 ref AATTCT AATTCT 112
21030571 2 2 chr21 9719767 9719773 no-call AATTCT ?
21030572 2 all chr21 9719773 9719792 ref = =
21030573 2 1 chr21 9719792 9719793 ref G G 112 3931115
21030573 2 2 chr21 9719792 9719793 snp G T 112 3931116
21030574 2 all chr21 9719793 9719796 ref = =
21030575 2 1 chr21 9719796 9719797 ref G G 112 3931115
calldiff
command:
cgatools calldiff --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr \
--variantsA ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv \
--variantsB ${path_to_data}/cgatools/cgatools_input/var-NA19238-chr21.tsv \
--superlocus-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-output.tsv \
--superlocus-stats ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-stats.csv \
--locus-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-output.tsv \
--locus-stats ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-stats.csv \
--debug-call-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-debug-call-output.tsv \
--debug-superlocus-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-debug-superlocus-output.tsv
result:
# result saved to file, no terminal output
# superlocus-output
head -20 ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-output.tsv
# superlocus-stats
cat ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-stats.csv
# locus-output
head -20 ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-output.tsv
# locus-stats
cat ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-stats.csv
snpdiff
command:
cgatools snpdiff --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr \
--variants ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv \
--genotypes ${path_to_data}/cgatools/cgatools_input/NA19238_Infinium_Genotypes.tsv \
--output ${path_to_data}/cgatools/cgatools_output/NA19238_out.tsv \
--verbose ${path_to_data}/cgatools/cgatools_output/NA19238_verb.tsv \
--stats ${path_to_data}/cgatools/cgatools_output/NA19238_stats.csv
result:
# result saved to file, no terminal output
# preview output with
head -20 ${path_to_data}/cgatools/cgatools_output/NA19238_out.tsv
cat ${path_to_data}/cgatools/cgatools_output/NA19238_stats.csv
end of 'September-9th' exercises
When we get new commands for version 1.1, they will be added here.