Cgatools commands with results

From BITS wiki

Full listing of the commands used during the CGI training, together with their results

Make sure you copy the full text of each command; some are spread over several lines or run past the edge of the dotted box.

UCSC browser direct link [1]


set path and other inits

Run these bash commands from the cgatools folder. They store the full path of that folder in the variable path_to_data, so the commands in the following exercises can use ${path_to_data} instead of spelling out the path each time.

command:

cd <<the full path to the cgatools folder>>
path_to_data=$(pwd)

Reference Coverage File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/REF/coverageRefScore-chr21-GS19240-180-36-ASM.tsv.bz2 | head -20

result:

#ASSEMBLY_ID    GS19240-180-36-ASM
#CHROMOSOME     chr21
#FORMAT_VERSION 1.3
#GENERATED_AT   2010-Jul-14 14:21:34.927322
#GENERATED_BY   ExportReferenceSupport
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION       1.8.0.23
#TYPE   REFMETRICS

>offset refScore        uniqueSequenceCoverage  weightSumSequenceCoverage
9719767 -75     6       7
9719768 -46     6       8
9719769 -46     10      13
9719770 -13     16      20
9719771 -13     19      23
9719772 3       18      23
9719773 3       26      31
9719774 6       28      32
9719775 10      31      36
9719776 12      36      42

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/REF/coverageRefScore-chr21-GS19240-180-36-ASM.tsv.bz2 | perl -F'\t' -ane 'print if ($F[0]>= 40647496 && $F[0]<= 40647501);'

result:

40647496        55      24      24
40647497        61      24      24
40647498        52      24      24
40647499        -94     22      22
40647500        22      18      18
40647501        43      19      19
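The perl filter above relies on header lines evaluating to 0 in a numeric comparison, so they never fall inside the range. As a minimal sketch, the same range selection can be wrapped in a reusable shell function that skips header lines explicitly. It uses awk rather than perl, and the function name filter_offsets is an illustrative choice, not part of cgatools.

```shell
# filter_offsets MIN MAX
# Reads a tab-separated coverage file on stdin and prints the data rows whose
# first column (offset) lies in the closed range [MIN, MAX].
# Hypothetical helper for illustration; not a cgatools command.
filter_offsets() {
    awk -F'\t' -v min="$1" -v max="$2" '
        /^[#>]/ || NF == 0 { next }              # skip metadata, column header, blank lines
        $1 + 0 >= min + 0 && $1 + 0 <= max + 0   # no action given, so matching rows are printed
    '
}

# Usage (same query as the perl command above):
# bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/REF/coverageRefScore-chr21-GS19240-180-36-ASM.tsv.bz2 | filter_offsets 40647496 40647501
```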

Variant File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2 | head -20

result:

#ASSEMBLY_ID    GS19240-180-36-ASM
#DBSNP_BUILD    dbSNP build 130
#FORMAT_VERSION 1.3
#GENERATED_AT   2010-Jul-14 04:33:11.149915
#GENERATED_BY   dbsnptool
#GENOME_REFERENCE       NCBI build 36
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION       1.8.0.23
#TYPE   VAR-ANNOTATION

>locus  ploidy  allele  chromosome      begin   end     varType reference       alleleSeq       totalScore       hapLink xRef
1       2       all     chr1    0       901     no-call =       ?
2       2       all     chr1    901     910     ref     =       =
3       2       all     chr1    910     956     no-call =       ?
4       2       all     chr1    956     972     ref     =       =
5       2       all     chr1    972     979     no-call =       ?
6       2       all     chr1    979     993     ref     =       =
7       2       all     chr1    993     1033    no-call =       ?
8       2       all     chr1    1033    1046    ref     =       =
9       2       all     chr1    1046    1053    no-call =       ?


...

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2 | grep -w chr21 | perl -F'\t' -ane 'print if ($F[4]>=3946599 && $F[5]<=9719804);'

result:

21030571        2       1       chr21   9719767 9719767 no-call         ?
21030571        2       1       chr21   9719767 9719773 ref     AATTCT  AATTCT  112
21030571        2       2       chr21   9719767 9719773 no-call AATTCT  ?
21030572        2       all     chr21   9719773 9719792 ref     =       =
21030573        2       1       chr21   9719792 9719793 ref     G       G       112     3931115
21030573        2       2       chr21   9719792 9719793 snp     G       T       112     3931116
21030574        2       all     chr21   9719793 9719796 ref     =       =
21030575        2       1       chr21   9719796 9719797 ref     G       G       112     3931115
21030575        2       2       chr21   9719796 9719797 snp     G       C       112     3931116
21030576        2       all     chr21   9719797 9719803 ref     =       =
21030577        2       1       chr21   9719803 9719804 snp     T       C       112             dbsnp.129:rs55981545
21030577        2       2       chr21   9719803 9719804 snp     T       C       112             dbsnp.129:rs55981545
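The command above combines grep -w on the chromosome name with a numeric test on the begin and end columns. A minimal sketch of the same selection in one awk pass is given below; it compares the chromosome column exactly instead of matching the word anywhere on the line. The function name var_in_range is an assumption made for this example.

```shell
# var_in_range CHR LO HI
# Reads a tab-separated variant file on stdin and prints the rows on chromosome
# CHR whose begin (column 5) is >= LO and whose end (column 6) is <= HI.
# Hypothetical helper for illustration; not a cgatools command.
var_in_range() {
    awk -F'\t' -v chr="$1" -v lo="$2" -v hi="$3" '
        /^[#>]/ || NF == 0 { next }                       # skip headers and blank lines
        $4 == chr && $5 + 0 >= lo + 0 && $6 + 0 <= hi + 0 # print rows fully inside the range
    '
}

# Usage (same query as the command above):
# bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2 | var_in_range chr21 3946599 9719804
```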

Evidence Interval File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceIntervals-chr21-GS19240-180-36-ASM.tsv.bz2 | head -20

result:

#ASSEMBLY_ID    GS19240-180-36-ASM
#CHROMOSOME     chr21
#FORMAT_VERSION 1.3
#GENERATED_AT   2010-Jul-14 14:21:04.402357
#GENERATED_BY   ExportEvidence
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION       1.8.0.23
#TYPE   EVIDENCE-INTERVALS

>IntervalId     Chromosome      OffsetInChromosome      Length      Ploidy  AlleleIndexes   Score   Allele0 Allele1 Allele2     Allele1Alignment        Allele2Alignment
0       chr21   9719767 49      2       1;2     1134    AATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATTTCACACATTTGA   GAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATCTCACACATTTGA  GAATTCTGAGAAACTTCTTTGTGAGGTTTGCATTCATCTCACACATTTGA  1I49M   1I49M
1       chr21   9719830 37      2       0;1     1143    GAAGATTTGGAAACAGTCTTTTTGTAAAATCTATAAA       GAAGATTTGGAAACAGTCTTTTTGTAAAATCTACAAA               37M
2       chr21   9719892 14      2       0;1     373     CTAGGGTGAAGTAG      CTATGGTGAAGTAG          14M
3       chr21   9719933 34      2       0;1     198     GAAATTTTCTGAGAAACGTTTTAGTGATGCGTGC  GAAATTTTCTGAGAAACCTTTTAGTGATGCGTGC          34M
4       chr21   9720003 46      2       1;2     1035    GCACTTTGGAAACAGTCCTATTGTAGAATCCCCAAAGGAATACTTC      GCACTTTGGAAACAGTCCTATTGTAGAATCCCCAAAGGGATACTTC      GCAGTTTGGAAACAGTCCTATTGTAGAATCCCCAAAGGGGTACTTC      46M     46M
5       chr21   9720073 40      2       1;1     645     TATTGGAAATATCTTCACATAAAAGCTAGACAGAAACTTT    TATTGGAAATATCTTCACATAAAAGCTAGACAGAAGCTTT            40M
6       chr21   9720137 7       2       0;1     675     GCTCTCA     GCTTTCA         7M
7       chr21   9720156 35      2       0;1     351     AAGTGTTTCTTTTGAATGAGCAGTTTGGAAACACT AAGCATTTCTTTTGACTGAGCAGTTTGGAAACACT         35M
8       chr21   9720221 47      2       1;2     1259    GAGCGTTTTGAGGCCTATGGTGAAAAAGGAAATATCTTCACATAAAA     GAGCGTTTCGAGGCCTATGGTGAAAAAGCAAATATCTTCACATAAAA     GAGTGTTTTGAGGCCTATGGTGAAAAAGGAAATATCTTCACATAAAA     47M     47M
9       chr21   9720276 33      2       0;1     925     GAAGCTTTCTGAGAAACTACTTTGTAATGTGTG   GAAGTTTTCTGGGAAACTAATTTGTAATGTGTG           33M


...

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceIntervals-chr21-GS19240-180-36-ASM.tsv.bz2 | perl -F'\t' -ane 'print if ($F[0]==19995);'

result:

19995   chr21   18571475        34      2       1;2     509     AAGTCACATTTACTCAGCTTTTAAAAAAAATCCA      AAGCCACATTTACTCAGCTTTTAAAAAAAAATCCA       AAGCCACATTTACTCAGCTTTTAAAAAAAATCCA      24M1I10M        34M

...

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceIntervals-chr21-GS19240-180-36-ASM.tsv.bz2 | perl -F'\t' -ane 'print if ($F[2]<=9719803 && ($F[2]+$F[3])>=9719804);'

result:

0       chr21   9719767 49      2       1;2     1134    AATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATTTCACACATTTGA       GAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATCTCACACATTTGA        GAATTCTGAGAAACTTCTTTGTGAGGTTTGCATTCATCTCACACATTTGA      1I49M   1I49M
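The filter above selects the evidence interval that covers a given position: the interval starts at or before the position and ends after it. A minimal sketch of that test as a shell function follows. It treats the interval as half-open, from OffsetInChromosome up to but not including OffsetInChromosome + Length, which is how the perl condition above behaves for position 9719803. The function name intervals_covering is hypothetical.

```shell
# intervals_covering POS
# Reads a tab-separated evidence interval file on stdin and prints the rows
# whose interval [offset, offset + length) contains the zero-based position POS.
# Column 3 is OffsetInChromosome, column 4 is Length.
# Hypothetical helper for illustration; not a cgatools command.
intervals_covering() {
    awk -F'\t' -v pos="$1" '
        /^[#>]/ || NF == 0 { next }                # skip headers and blank lines
        $3 + 0 <= pos + 0 && $3 + $4 > pos + 0     # start at or before POS, end after POS
    '
}

# Usage (same query as the command above):
# bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceIntervals-chr21-GS19240-180-36-ASM.tsv.bz2 | intervals_covering 9719803
```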

Evidence DNB File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceDnbs-chr21-GS19240-180-36-ASM.tsv.bz2 | head -20

result:

#ASSEMBLY_ID    GS19240-180-36-ASM
#CHROMOSOME     chr21
#FORMAT_VERSION 1.3
#GENERATED_AT   2010-Jul-14 14:21:04.402357
#GENERATED_BY   ExportEvidence
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION       1.8.0.23
#TYPE   EVIDENCE-DNBS

>IntervalId     Chromosome      Slide   Lane    FileNumInLane   DnbOffsetInLaneFile     AlleleIndex     Side    Strand  OffsetInAllele  AlleleAlignment   OffsetInReference       ReferenceAlignment      MateOffsetInReference   MateReferenceAlignment  MappingQuality  ScoreAllele0      ScoreAllele1    ScoreAllele2    Sequence        Scores
0       chr21   GS10364-FS3     L04     5       2091773 0       R       -       12      5M2B10M0N10M7N10M       9719779 5M2B10M0N10M7N10M       9720139 10M6N10M0N10M2B5M       H       39      39      0       TGCTCTCATTCAAAAGAAACACTTATGAGATGAGAAGTTCAAATGTGAATCCAACCCTCACAAAGAGAAG  3831567766677569;;<;;;6787/93148-0/%4(78)25%2:9899;;;:8886881874694599
0       chr21   GS10364-FS3     L04     5       16662456        0       L       +       13      5M2B10M0N10M6N10M       9719780 5M2B10M0N10M6N10M 9720077 10M5N10M0N10M1B5M       :       25      25      0       TTCTTTTTGTGAGGGTTGGATTCATCATTTGAACAGGAAATATCTTAAAAGCTAGACAGAAGCTTTTCTG    585386'78*647778:<<+;:;67:99:8::91/28/781.0.3::;:::;;:88868855656:9697
0       chr21   GS10364-FS3     L05     11      13924291        0       L       +       11      5M2B10M0N10M6N10M       9719778 5M2B10M0N10M6N10M 9720110 10M6N10M0N10M2B5M       8       23      23      0       ACTTCTCTTTGTGAGGGTTGGATTCCACATTTGAATGCCAGAGAATTTTAATGAGTGCTCTCATCTCTCA    7887978676574668;;,::;:0::99989991/''+)&02532:7::8;;::877888574454:9:9
0       chr21   GS10364-FS3     L06     4       18290323        0       L       +       12      5M2B10M0N10M6N10M       9719779 5M2B10M0N10M6N10M 9720173 10M6N10M0N10M2B5M       A       32      32      0       CTTCTCTTTGTGAGGGTTGGATTCAACATTTGAACGTGCTGTTTGCTCTTTTTGCATAATCTGCACAAAT    7679060433056668:2;4::::7::99:8:+1/1$+,$733,0:9:):::::86879264275686$1
0       chr21   GS10364-FS3     L08     2       24245523        0       L       +       6       5M2B10M0N10M5N10M       9719773 5M2B10M0N10M5N10M 9720151 10M6N10M0N10M2B5M       Y       56      27      1       GAGAAAAACTTCTTTGTGAGGGTTGATTTCACACAGAGTTAAGTGTTGAATGAGCAGTTTGGAAAAACAC    78798865756773&7:;:<9::::'791:::91/77858103(0:9:5:;:::8(9888074168:695
0       chr21   GS10364-FS3     L08     5       29566964        0       L       +       7       5M2B10M0N10M6N10M       9719774 5M2B10M0N10M6N10M 9720015 10M6N10M0N10M2B5M       5       20      0       0       AGAAAAACTTCTTTGTGAGGGTTGGTTCACACATTCAGTCCTATTTCCCCAAAGGGATACTTCTCTCAGC    778885056467447/;3:<::9:59::89::610'85+/8,&1,%&/9%5488888968648058:749
0       chr21   GS10364-FS3     L08     10      10818535        0       R       -       12      5M2B10M0N10M4N10M       9719779 5M2B10M0N10M4N10M 9720139 10M6N10M0N10M2B5M       S       50      50      0       TGCTCTCATTCAAAAGAAACACTTATGAGATGAGATCAAATGTGTTGAATCCAACCCTCACAAAGAGAAG    84799'8755667367;<<;6;:::799595-8*.18674223'26:5:9;;;;88291964730::899
0       chr21   GS10367-FS3     L01     9       2114032 0       L       +       18      5M2B10M0N10M6N10M       9719785 5M2B10M0N10M6N10M       9720095 10M7N10M0N10M2B5M       -       12      7       0       TGTGGGAGGGTTGGATTCATTTCACGAACATTTCTAANCTAGACATCTGAGAAACTTATTTTTAAAATGA  *(&9/8679666*178;;;;$8.69::::;:981/89!*05650/,'&&&/;;:8(8*68623.)9:899
0       chr21   GS10367-FS3     L02     4       6345383 0       L       +       13      5M2B10M0N10M6N10M       9719780 5M2B10M0N10M6N10M       9720162 10M6N10M0N10M2B5M       ,       11      11      0       TTCTTTTTGTGAGGGTTGGATTCATCATTTGAACATTCTTTTGAAGTTTGGAAACACTCTTTTTTTTNAT  686/01#442667779:;;.;;;9::998:8991/&6625%/4-.9::::;;;:8675872456-/,!*2
0       chr21   GS10367-FS3     L05     2       27188331        0       L       +       36      5M2B10M0N10M6N10M       9719803 5M2B10M0N10M6N10M 9720163 10M5N10M1N10M2B5M       K       42      15      15      TTCACACACATTTGAACATTTCTTTAGATTTGGAATCTTTTGAATGTTNGGAAACCTCTTTTTGCGCATA    7882976440677669;9<;9;28997989:9:1/5738842-3,889!::;::3*7898(22,69:797

...

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceDnbs-chr21-GS19240-180-36-ASM.tsv.bz2 | perl -F'\t' -ane 'print if ($F[0]==19995 );'

result:

19995   chr21   GS10367-FS3     L07     10      11349238        0       L       +       21      5M2B10M0N10M5N10M       18571496        5M2B10M0N10M5N10M 18571824        10M4N10M0N10M2B5M       >       30      19      30      TAAAAAAAAAATCCAAANACAAAAATCCTATNAAAGANTTAGAACTGCTTTCTCTTATAATAGGTGTAAC    -869999886675769;!<;:;9:877889:!81/77!57-44(27:89:78::7$758865556::899
19995   chr21   GS10364-FS3     L02     6       877188  1       R       +       -20     10M6N10M0N10M2B5M       18571455        10M6N10M0N10M2B5M 18571136        5M2B10M0N10M6N10M       K       11      42      42      AGAGAGACGGGGTTTCACCATGTTGCTGGTCTCGATAGTAACAATACGTAAGCCACATTTACTCACAGCT    6879(&7868677777;;9;9;:8:7,8999&9(%488673535/::99:;;:988999366653:8.37
19995   chr21   GS10364-FS3     L02     7       22800208        1       R       -       -3      5M2B10M0N10M6N10M       18571472        5M2B10M0N10M4N1P1N10M     18571883        10M6N10M0N10M2B5M       F       19      37      34      GCAGAAAAAATCTTGTGGAAAGGTACTCAGAATCATTGGATTTTTAAGCTGAGTAAATGTGGCTTTTACG    67739)2&7506276895:89999:97993877//(5786)/0+0982:9;:4:83578*)+441:88'8
19995   chr21   GS10364-FS3     L02     9       5502806 1       R       -       -11     5M1B10M0N10M5N10M       18571464        5M1B10M0N10M5N6M1I3M      18571802        10M6N10M0N10M2B5M       (       0       7       8       GCACACAAAGTTCTAAGTCAAATGCGATAACTCCTGTTATTAAAAGTAAATGTGGCTTACGTTTAACTTA    +/2238/65,(06568;;;:79:989991:8:4,/$/$'$%22*.8'9:9:;;98878840552+/:5*5
19995   chr21   GS10364-FS3     L03     4       469726  1       L       -       -19     10M6N10M0N10M2B5M       18571456        10M6N10M0N10M2B5M 18571098        5M2B10M0N10M6N10M       U       21      52      52      AAGCTCTGAGTAAATGTGGCTTACGTATTGTTACTTACTTAAAATAAAAAATTAGCCAGGCGTGGGGTGG    78889165666777689<<,6:3::4)5:9:971/'460.242)/:::::;;:98.878535%6499::9
19995   chr21   GS10364-FS3     L05     7       1904718 1       L       -       -14     10M6N10M0N10M2B5M       18571461        10M6N10M0N10M2B4M1I       18570973        5M2B10M0N10M4N10M       3       0       18      18      TTTAAAAAAGCTGAGTAAATGTGGCTTACTTATGGGCGCCATTGCCAGCCTGGAGGACAAGAGCTCTAGG    47+88-60665707638;;4::97/8,498:990/77068-#211'1/-/;;;98898881567/9::99
19995   chr21   GS10364-FS3     L07     2       4371163 1       R       -       -3      5M2B10M0N10M6N10M       18571472        5M2B10M0N10M4N1P1N10M     18571883        10M6N10M0N10M2B5M       N       19      45      43      GCAGAGAAAATCTTGTGGAAAGGTACTCAGAATCATTGGATTTTTAAGCTGAGTAAATGTGGCTTTTACG    7778966&4255.0));:;8;;94959::899#100678%,242-::%:9:;(9884,88344/56:786
19995   chr21   GS10364-FS3     L08     11      7769695 1       L       -       -17     10M6N10M0N10M2B5M       18571458        10M6N10M0N10M2B5M 18571205        5M2B10M0N10M6N10M       6       11      21      21      AAAAGAGNTGAGTAAATGTGGCTTACTTATTGTTACGGTGGCTCTTATACCCAACACTTTGGGAGAGGCT    7889968!6766555.&8:;'/::.:793:::91.5,,7&8&3$#/7%+/,);:8%889911.&44:999
19995   chr21   GS10367-FS3     L01     4       26409639        1       L       +       -18     5M1B10M0N10M6N10M       18571457        5M1B10M0N10M6N10M 18571770        10M6N10M0N10M2B5M       U       22      52      52      GTAACCAATAAGTAAACGTAAGCCACTCAGCTTTTAATACAGTGTTGTGAAGAACTTTTGTAGGAGAGTT    7889972777676768::9:;;:::899:::681/71627,-33)999:96;;:8-67584576,:9:79
19995   chr21   GS10367-FS3     L01     6       8229516 1       L       +       14      5M2B10M0N10M5N10M       18571489        5M2B7M1I2M0N10M5N10M      18571784        10M5N10M1N10M2B5M       :       0       25      0       CAGCTCATTTAAAAAAAAATCCAAAAAAAAACTTCGCTGTGAAGATGTAGGAGTTTCAGAATAGCGCATT    +(889'&&3,0546-9;<;:9;::9:897::991/7866582431::::9;:;85/8885/%-219:873
19995   chr21   GS10367-FS3     L02     4       28884636        1       R       -       -32     5M2B10M0N10M7N10M       18571443        5M2B10M0N10M7N10M 18571695        10M6N10M0N10M2B5M       I       12      40      40      ATTGAGAAGTCAAAGACTGTATATTTTTCTCCCTCTGTGGCTTACTATTGTTACTAAAGATATATATNTC    58698576567767(8;4.:0:8,9885:99991067708545)/92989;;,:8788836665295!97
19995   chr21   GS10367-FS3     L02     12      8349863 1       L       -       -24     10M6N10M0N10M2B5M       18571451        10M6N10M0N10M2B5M 18571062        5M2B10M0N10M6N10M       K       17      42      42      GAGTATAANTGTGGCTTACGTTTACTTACTAAAGATGGTACACACTCCCAGGCTACTCGGGAGGCGCNGA    76688&53!3333548:8;;48:::9989:9:,1/648,2413,$:::::;;;:6677865265347!88
19995   chr21   GS10367-FS3     L03     7       19754890        1       L       -       12      10M5N10M0N10M2B5M       18571487        10M2N1P2N10M0N10M2B5M     18571049        5M2B10M0N10M5N10M       =       8       28      8       AGTTTTTTTATTGTTTTTGGATTTTAAAAGCTGAGAATCCCAGGCGGGAGGCTGAGGCAGGAGAAAATCA    08899302./657658;<<;/75:9789997:91/6/8707141.1::;:;8;:8898876157577949
19995   chr21   GS10367-FS3     L03     10      9478592 1       R       -       -16     5M2B10M0N10M6N10M       18571459        5M2B10M0N10M6N10M 18571814        10M6N10M0N10M2B5M       0       0       15      15      TTATATAAGAGAAAGCACAAAGTTCAAATGCTTTTTAAAAGTTGAGTGGCTTACGTTTACTTAGTTTGTT    ((1423737,770569;<<9;7;::3998::971./6476/24./:6::9;;7:6*8868$'-$59$55&
19995   chr21   GS10367-FS3     L04     5       18961206        1       L       +       14      5M2B10M0N10M6N10M       18571489        5M2B7M1I2M0N10M6N10M      18571784        10M5N10M1N10M2B5M       6       0       21      0       CAGCTCTTGTAAAAAAAAATCCAAAAAAAACTTCCGCTGTGAAGATGTAGGAGTTTCAGAAAAGCGCATT    8688970%.(3777*9;<<;99::::::9:9:81/563.14+,+1::9::;6:/8&88885'20'::965
19995   chr21   GS10367-FS3     L05     5       7047390 1       R       -       -31     5M2B10M0N10M6N10M       18571444        5M2B10M0N10M6N10M 18571810        10M5N10M0N10M2B5M       2       0       17      17      AGAGAGAAAGCACAAAGTTCTAAGTGCTTTTCTGATGTGGCTTACTTANTGTTACCAAAGATATATATGT    66999776676677699;7;%)78/9687978*-%,6/78+#'00%$9!%884-,&9998813$,*97:6
19995   chr21   GS10367-FS3     L05     7       22469414        1       L       -       -17     10M5N10M0N10M2B5M       18571458        10M5N10M0N10M2B5M 18571095        5M2B10M0N10M5N10M       8       2       23      23      AAAGCGCTGAGTAAATGTGGCTTACCTTATTGTTATAAAATACCAAAANTAGCCAGGCGTGGTGGGGTAC    888993.54656766958,4:;:94%-(363971.$2.8711%+)987!8::(988968656&048:998
19995   chr21   GS10367-FS3     L07     8       29170387        1       L       -       -22     10M6N10M0N10M2B5M       18571453        10M6N10M0N10M2B5M 18571059        5M2B10M0N10M5N10M       I       12      40      40      CTGAGAGTAAATGTGGCTTACGTTTTGTTACTAAAACACACCAGTCAGGCTACTCGGGAGGCTGAGAGGC    8689767'78667678:;949;::98:6:::691/77888(0(&%69:9:3;:88.28175362385958
19995   chr21   GS10367-FS3     L07     9       7259350 1       L       +       -34     5M2B10M0N10M7N10M       18571441        5M2B10M0N10M7N10M 18571698        10M5N10M0N10M2B5M       T       12      51      50      CAGACACATATATCTTTAGTAACAAACGTAAGCCAGGAGAAATATTATACAGTCTTTGACTTCAAAATAT    78899663'4676368;;9;6:;:::9:9:9391/47988923,-:996:6;::4888896474599:93
19995   chr21   GS10367-FS3     L08     1       11188879        1       R       +       -17     10M6N10M0N10M2B5M       18571458        10M6N10M0N10M2B5M 18571128        5M2B10M0N10M6N10M       Q       24      55      55      TTTTATAAGTAGAGACGGGGTTTCATGGCCAGGCTTAACAATAAGTAAGCCACATTTACTCAGCTCTTTT    '8625-7675657668;;<::;2:69:89:9081/'6888810647869:89;:74888861454:92''
19995   chr21   GS10367-FS3     L08     11      2896905 1       L       +       22      5M2B10M0N10M6N10M       18571497        2M1I2M2B10M0N10M6N10M     18571788        10M6N10M0N10M2B5M       ;       33      53      33      AAAAAAAAAAATCCAAAAACAAAAACCTATGAAATTGAAGAACTTGAGTTATCAGAAAAGCATTTTTGAC    2789978692534668;;;;8:;::6898:9:90,/,237441204::45:7:28888987543341.78
19995   chr21   GS10364-FS3     L02     6       877188  2       R       +       -20     10M6N10M0N10M2B5M       18571455        10M6N10M0N10M2B5M 18571136        5M2B10M0N10M6N10M       K       11      42      42      AGAGAGACGGGGTTTCACCATGTTGCTGGTCTCGATAGTAACAATACGTAAGCCACATTTACTCACAGCT    6879(&7868677777;;9;9;:8:7,8999&9(%488673535/::99:;;:988999366653:8.37
19995   chr21   GS10364-FS3     L02     7       22800208        2       R       -       -3      5M2B10M0N10M5N10M       18571472        5M2B10M0N10M5N10M 18571883        10M6N10M0N10M2B5M       C       19      37      34      GCAGAAAAAATCTTGTGGAAAGGTACTCAGAATCATTGGATTTTTAAGCTGAGTAAATGTGGCTTTTACG    67739)2&7506276895:89999:97993877//(5786)/0+0982:9;:4:83578*)+441:88'8
19995   chr21   GS10364-FS3     L02     9       5502806 2       R       -       -11     5M1B10M0N10M5N10M       18571464        5M1B10M0N10M5N10M 18571802        10M6N10M0N10M2B5M       (       0       7       8       GCACACAAAGTTCTAAGTCAAATGCGATAACTCCTGTTATTAAAAGTAAATGTGGCTTACGTTTAACTTA    +/2238/65,(06568;;;:79:989991:8:4,/$/$'$%22*.8'9:9:;;98878840552+/:5*5
19995   chr21   GS10364-FS3     L03     4       469726  2       L       -       -19     10M6N10M0N10M2B5M       18571456        10M6N10M0N10M2B5M 18571098        5M2B10M0N10M6N10M       U       21      52      52      AAGCTCTGAGTAAATGTGGCTTACGTATTGTTACTTACTTAAAATAAAAAATTAGCCAGGCGTGGGGTGG    78889165666777689<<,6:3::4)5:9:971/'460.242)/:::::;;:98.878535%6499::9
19995   chr21   GS10364-FS3     L03     8       10073663        2       R       -       1       5M2B10M0N10M5N10M       18571476        5M2B10M0N10M5N10M 18571896        10M5N10M0N10M2B5M       5       0       16      20      TAGGAGATGAAAATGCAGAAAATCTAAAGGTAGAGGTTTTTGGATTTANAAGCTGAGTAAATGTGTGGCT    7789964676665369;;;;:::6:979:977:1074366.4323879!8:;9:8795787763699999
19995   chr21   GS10364-FS3     L05     7       1904718 2       L       -       -14     10M6N10M0N10M2B5M       18571461        10M6N10M0N10M2B5M 18570973        5M2B10M0N10M4N10M       3       0       18      18      TTTAAAAAAGCTGAGTAAATGTGGCTTACTTATGGGCGCCATTGCCAGCCTGGAGGACAAGAGCTCTAGG    47+88-60665707638;;4::97/8,498:990/77068-#211'1/-/;;;98898881567/9::99
19995   chr21   GS10364-FS3     L07     2       4371163 2       R       -       -3      5M2B10M0N10M5N10M       18571472        5M2B10M0N10M5N10M 18571883        10M6N10M0N10M2B5M       L       19      45      43      GCAGAGAAAATCTTGTGGAAAGGTACTCAGAATCATTGGATTTTTAAGCTGAGTAAATGTGGCTTTTACG    7778966&4255.0));:;8;;94959::899#100678%,242-::%:9:;(9884,88344/56:786
19995   chr21   GS10364-FS3     L08     11      7769695 2       L       -       -17     10M6N10M0N10M2B5M       18571458        10M6N10M0N10M2B5M 18571205        5M2B10M0N10M6N10M       6       11      21      21      AAAAGAGNTGAGTAAATGTGGCTTACTTATTGTTACGGTGGCTCTTATACCCAACACTTTGGGAGAGGCT    7889968!6766555.&8:;'/::.:793:::91.5,,7&8&3$#/7%+/,);:8%889911.&44:999
19995   chr21   GS10367-FS3     L01     4       26409639        2       L       +       -18     5M1B10M0N10M6N10M       18571457        5M1B10M0N10M6N10M 18571770        10M6N10M0N10M2B5M       U       22      52      52      GTAACCAATAAGTAAACGTAAGCCACTCAGCTTTTAATACAGTGTTGTGAAGAACTTTTGTAGGAGAGTT    7889972777676768::9:;;:::899:::681/71627,-33)999:96;;:8-67584576,:9:79
19995   chr21   GS10367-FS3     L02     3       3044773 2       L       +       -7      5M2B10M0N10M6N10M       18571468        5M2B10M0N10M6N10M 18571821        10M6N10M2N10M3B5M       7       1       14      22      TAAACACGTAAGCCACATTTACTCAAAAAAAAATCTTTGGCTTAGGTGCTTTCTCATAATAGGTAGTNAC    &889887867645779;;;;,/865::9:9/,,$.+786)7022/7:::::;;:867887641%7::!78
19995   chr21   GS10367-FS3     L02     4       28884636        2       R       -       -32     5M2B10M0N10M7N10M       18571443        5M2B10M0N10M7N10M 18571695        10M6N10M0N10M2B5M       I       12      40      40      ATTGAGAAGTCAAAGACTGTATATTTTTCTCCCTCTGTGGCTTACTATTGTTACTAAAGATATATATNTC    58698576567767(8;4.:0:8,9885:99991067708545)/92989;;,:8788836665295!97
19995   chr21   GS10367-FS3     L02     12      8349863 2       L       -       -24     10M6N10M0N10M2B5M       18571451        10M6N10M0N10M2B5M 18571062        5M2B10M0N10M6N10M       K       17      42      42      GAGTATAANTGTGGCTTACGTTTACTTACTAAAGATGGTACACACTCCCAGGCTACTCGGGAGGCGCNGA    76688&53!3333548:8;;48:::9989:9:,1/648,2413,$:::::;;;:6677865265347!88
19995   chr21   GS10367-FS3     L03     10      9478592 2       R       -       -16     5M2B10M0N10M6N10M       18571459        5M2B10M0N10M6N10M 18571814        10M6N10M0N10M2B5M       0       0       15      15      TTATATAAGAGAAAGCACAAAGTTCAAATGCTTTTTAAAAGTTGAGTGGCTTACGTTTACTTAGTTTGTT    ((1423737,770569;<<9;7;::3998::971./6476/24./:6::9;;7:6*8868$'-$59$55&
19995   chr21   GS10367-FS3     L05     5       7047390 2       R       -       -31     5M2B10M0N10M6N10M       18571444        5M2B10M0N10M6N10M 18571810        10M5N10M0N10M2B5M       2       0       17      17      AGAGAGAAAGCACAAAGTTCTAAGTGCTTTTCTGATGTGGCTTACTTANTGTTACCAAAGATATATATGT    66999776676677699;7;%)78/9687978*-%,6/78+#'00%$9!%884-,&9998813$,*97:6
19995   chr21   GS10367-FS3     L05     7       22469414        2       L       -       -17     10M5N10M0N10M2B5M       18571458        10M5N10M0N10M2B5M 18571095        5M2B10M0N10M5N10M       8       2       23      23      AAAGCGCTGAGTAAATGTGGCTTACCTTATTGTTATAAAATACCAAAANTAGCCAGGCGTGGTGGGGTAC    888993.54656766958,4:;:94%-(363971.$2.8711%+)987!8::(988968656&048:998
19995   chr21   GS10367-FS3     L06     1       18489507        2       R       +       0       10M5N10M0N10M2B5M       18571475        10M5N10M0N10M2B5M 18571166        5M2B10M0N10M6N10M       '       0       1       6       TGGTCTCCCGAACTCCTGACCTCAGTGCCCGGCTCAAGCCACATTAGCTTTTACAAAAAATCCAAAAAAA    1'1+(2&$6846*%'7;;7:99:6958/:79:81/68638554/447(2+:1;&8888,9,7.66+9999
19995   chr21   GS10367-FS3     L07     8       9323455 2       R       +       0       10M4N10M0N10M2B5M       18571475        10M4N10M0N10M2B5M 18571205        5M2B10M0N10M6N10M       K       21      39      49      AGCCTCTCCCAANGTGTTGGGTTTATGAGCCACCGAAGCCACATTCAGCTTTTAAAAAAAATCCACAAAA    67)396646/66!6388;;;:99)*8:4898990/37787822)30+5::9:7:87888862466::9:9
19995   chr21   GS10367-FS3     L07     8       29170387        2       L       -       -22     10M6N10M0N10M2B5M       18571453        10M6N10M0N10M2B5M 18571059        5M2B10M0N10M5N10M       I       12      40      40      CTGAGAGTAAATGTGGCTTACGTTTTGTTACTAAAACACACCAGTCAGGCTACTCGGGAGGCTGAGAGGC    8689767'78667678:;949;::98:6:::691/77888(0(&%69:9:3;:88.28175362385958
19995   chr21   GS10367-FS3     L07     9       7259350 2       L       +       -34     5M2B10M0N10M7N10M       18571441        5M2B10M0N10M7N10M 18571698        10M5N10M0N10M2B5M       S       12      51      50      CAGACACATATATCTTTAGTAACAAACGTAAGCCAGGAGAAATATTATACAGTCTTTGACTTCAAAATAT    78899663'4676368;;9;6:;:::9:9:9391/47988923,-:996:6;::4888896474599:93
19995   chr21   GS10367-FS3     L07     10      11349238        2       L       +       21      5M2B10M0N10M5N10M       18571496        5M2B10M0N10M5N10M 18571824        10M4N10M0N10M2B5M       >       30      19      30      TAAAAAAAAAATCCAAANACAAAAATCCTATNAAAGANTTAGAACTGCTTTCTCTTATAATAGGTGTAAC    -869999886675769;!<;:;9:877889:!81/77!57-44(27:89:78::7$758865556::899
19995   chr21   GS10367-FS3     L08     1       11188879        2       R       +       -17     10M6N10M0N10M2B5M       18571458        10M6N10M0N10M2B5M 18571128        5M2B10M0N10M6N10M       Q       24      55      55      TTTTATAAGTAGAGACGGGGTTTCATGGCCAGGCTTAACAATAAGTAAGCCACATTTACTCAGCTCTTTT    '8625-7675657668;;<::;2:69:89:9081/'6888810647869:89;:74888861454:92''
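The listing for interval 19995 is long because each DNB appears once per allele it supports. As a rough sketch, the rows can be summarised by counting DNBs per AlleleIndex (column 7) for one interval. The function name dnbs_per_allele is an illustrative assumption.

```shell
# dnbs_per_allele INTERVAL_ID
# Reads a tab-separated evidence DNB file on stdin and prints, for the given
# IntervalId (column 1), one line per AlleleIndex (column 7) with the number of
# DNB rows, as "alleleIndex<TAB>count", sorted by allele index.
# Hypothetical helper for illustration; not a cgatools command.
dnbs_per_allele() {
    awk -F'\t' -v id="$1" '
        /^[#>]/ || NF == 0 { next }     # skip headers and blank lines
        $1 == id { count[$7]++ }        # tally rows per allele index
        END { for (a in count) print a "\t" count[a] }
    ' | sort -n
}

# Usage:
# bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/evidenceDnbs-chr21-GS19240-180-36-ASM.tsv.bz2 | dnbs_per_allele 19995
```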

Evidence Correlation File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/correlation-GS19240-180-36-ASM.tsv.bz2 | head -20

result:

#ASSEMBLY_ID    GS19240-180-36-ASM
#FORMAT_VERSION 1.3
#GENERATED_AT   2010-Jul-14 14:19:09.429488
#GENERATED_BY   ExportCorrelation
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION       1.8.0.23
#TYPE   EVIDENCE-CORRELATION

>Chromosome1    OffsetInChromosome1     Length1 Chromosome2     OffsetInChromosome2     Length2 P1      P2      P12
chr1    910     46      chr9    1452    26      314     86      399
chr1    910     46      chr9    1496    54      314     122     426
chr1    910     46      chr9    1589    7       314     79      384
chr1    910     46      chr9    1617    46      314     1198    1484
chr1    910     46      chr9    1697    7       314     48      353
chr1    910     46      chr9    1714    43      314     643     938
chr1    910     46      chr9    1776    17      314     788     1093
chr1    910     46      chr9    1800    46      314     418     723
chr1    910     46      chr15   100337124       41      314     1930    2233
chr1    910     46      chr15   100337181       7       314     1123    1426
chr1    910     46      chr15   100337203       36      314     220     535

[long running]

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/correlation-GS19240-180-36-ASM.tsv.bz2 | perl -F'\t' -ane 'print if (($F[0] eq "chr21" && $F[1] == 9720357) or ($F[3] eq "chr21" && $F[4] == 9720357));'

result:

chr6    61947637        50      chr21   9720357 59      262     1818    2072
chr16   33880293        42      chr21   9720357 59      116     1818    1911
chr16   33880349        9       chr21   9720357 59      52      1818    1873
chr16   33880382        54      chr21   9720357 59      330     1818    2100
chr16   33880453        16      chr21   9720357 59      170     1818    1953
chr16   33880483        22      chr21   9720357 59      141     1818    1953
chr16   33880517        83      chr21   9720357 59      729     1818    2531
chr16   33880655        49      chr21   9720357 59      769     1818    2563
chr16   33880737        49      chr21   9720357 59      86      1818    1898
chr16   33880822        69      chr21   9720357 59      325     1818    2119
chr16   33880963        46      chr21   9720357 59      232     1818    2047
chr16   33881047        38      chr21   9720357 59      147     1818    1957
chr16   33881122        39      chr21   9720357 59      132     1818    1939
chr16   33881172        50      chr21   9720357 59      68      1818    1876
chr21   9719830 37      chr21   9720357 59      1143    1818    2985
chr21   9719892 14      chr21   9720357 59      373     1818    2188
chr21   9719933 34      chr21   9720357 59      198     1818    2018
chr21   9720003 46      chr21   9720357 59      1035    1818    2923
chr21   9720073 40      chr21   9720357 59      645     1818    2497
chr21   9720137 7       chr21   9720357 59      675     1818    2541
chr21   9720156 35      chr21   9720357 59      351     1818    2127
chr21   9720221 47      chr21   9720357 59      1259    1818    2926
chr21   9720320 31      chr21   9720357 59      469     1818    2285
chr21   9720357 59      chr21   9720428 10      1818    43      1652
chr21   9720357 59      chr21   9720564 38      1818    793     2565
chr21   9720357 59      chr21   9720619 43      1818    795     2562
chr21   9720357 59      chr21   9720671 25      1818    429     2220
chr21   9720357 59      chr21   9720723 21      1818    530     2332
chr21   9720357 59      chr21   9720751 48      1818    292     2107
chr21   9720357 59      chr21   9720819 20      1818    694     2511
chr21   9720357 59      chr21   9720853 31      1818    98      1918
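Each correlation row links two intervals, so the interval of interest can appear on either side of the row. A minimal sketch of the same two-sided test as a shell function is shown below, with the chromosome and offset passed as arguments so that no nested quoting is needed. The function name correlations_for is hypothetical.

```shell
# correlations_for CHR OFFSET
# Reads a tab-separated correlation file on stdin and prints the rows in which
# the interval (CHR, OFFSET) appears as the first interval (columns 1 and 2)
# or as the second interval (columns 4 and 5).
# Hypothetical helper for illustration; not a cgatools command.
correlations_for() {
    awk -F'\t' -v chr="$1" -v off="$2" '
        /^[#>]/ || NF == 0 { next }     # skip headers and blank lines
        ($1 == chr && $2 + 0 == off + 0) || ($4 == chr && $5 + 0 == off + 0)
    '
}

# Usage (same query as the command above; still a long-running scan of the whole file):
# bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/EVIDENCE/correlation-GS19240-180-36-ASM.tsv.bz2 | correlations_for chr21 9720357
```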

Reference Genome File

command:

cgatools decodecrr --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr --range chr21,45724800,45724870

result:

TTCCTCAGACCTTCATTGACATGGAGGGATCTGGCTTCGGGGGCGATCTGGAGGCCCTGCGGGTGAGTGG


...

command:

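cgatools listcrr --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr --mode chromosome

result:

cgatools listcrr: multiple_occurrences

(this error was recorded with the command as originally run, which had a stray word "reference" after --reference; that stray word is the likely cause, and the command above omits it)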

...

command:

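cgatools listcrr --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr --mode contig

result:

cgatools listcrr: multiple_occurrences

(this error was recorded with the command as originally run, which had a stray word "reference" after --reference; that stray word is the likely cause, and the command above omits it)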

Gene Variation Summary File

command:

cat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/geneVarSummary-GS19240-180-36-ASM.tsv | head -20

result:

#ASSEMBLY_ID    GS19240-180-36-ASM
#DBSNP_BUILD    dbSNP build 130
#FORMAT_VERSION 1.3
#GENERATED_AT   2010-Jul-14 05:39:36.249717
#GENERATED_BY   callannotate
#GENE_ANNOTATIONS       NCBI build 36.3
#GENOME_REFERENCE       NCBI build 36
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION       1.8.0.23
#TYPE   GENE-VAR-SUMMARY-REPORT

>geneId mrnaAcc symbol  chromosome      begin   end     missense        nonsense        nonStop frameshift      inframe total   missenseNovel     nonsenseNovel   nonStopNovel    frameshiftNovel inframeNovel    totalNovel
653635  XR_017611.2     LOC653635       chr1    814     19919   0       0       0       0       0       0       0       0       0       0       0       0
79501   NM_001005484.1  OR4F5   chr1    58953   59871   4       0       0       0       0       4       0       0       0       0       0       0
100132632       XM_001724183.1  LOC100132632    chr1    77384   80096   0       0       0       0       0       0       0       0       0       0       0       0
643670  XR_039242.1     LOC643670       chr1    110380  130714  0       0       0       0       0       0       0       0       0       0       0       0
729737  XM_001133863.1  LOC729737       chr1    114642  134341  0       0       0       0       0       0       0       0       0       0       0       0
653340  XR_039243.1     LOC653340       chr1    123323  130714  0       0       0       0       0       0       0       0       0       0       0       0
728481  XR_015292.2     LOC728481       chr1    217632  218641  0       0       0       0       0       0       0       0       0       0       0       0
100132287       XR_039254.1     LOC100132287    chr1    313132  319752  0       0       0       0       0       0       0       0       0       0       0       0

...

command:

grep '^[0-9]' ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/geneVarSummary-GS19240-180-36-ASM.tsv | sort -k18nr | head

result:

150483  NM_144705.2     TEKT4   chr2    94900958        94906295        22      0       0       1       2       25      19      0       0       0       2       21
100132267       XM_001718197.1  LOC100132267    chr16   33843834        33844897        20      0       0       1       0       21      12      0       0       1       0       13
4583    NM_002457.2     MUC2    chr11   1064901 1094419 14      0       0       8       0       22      6       0       0       6       0       12
5558    NM_000947.2     PRIM2   chr6    57290380        57621335        13      2       0       1       0       16      8       1       0       1       0       10
100132821       XM_001714759.1  LOC100132821    chr2    240501949       240507031       5       1       0       7       0       13      1       0       0       7       0       8
219417  NM_001005204.1  OR8U1   chr11   55899675        55900635        13      0       0       0       0       13      7       0       0       0       0       7
140453  NM_001040105.1  MUC17   chr7    100450083       100488860       23      0       0       0       0       23      6       0       0       0       0       6
3115    NM_002121.4     HLA-DPB1        chr6    33151737        33162954        13      0       0       4       0       17      2       0       0       4       0       6
3768    NM_021012.4     KCNJ12  chr17   21220291        21263772        12      0       0       0       0       12      6       0       0       0       0       6
440563  XM_001720694.1  LOC440563       chr1    13105374        13106872        8       0       0       0       0       8       6       0       0       0       0       6
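
The sort above splits on any whitespace and uses a key that starts at field 18 and runs to the end of the line. A slightly stricter variant, sketched here on a hypothetical four-line sample (three rows copied from this page plus one header line), splits on tabs only and limits the key to column 18 (totalNovel):

```shell
# Hypothetical sample file: one header line and three data rows taken from the results above.
tmp=$(mktemp)
{
  printf '#TYPE\tGENE-VAR-SUMMARY-REPORT\n'
  printf '79501\tNM_001005484.1\tOR4F5\tchr1\t58953\t59871\t4\t0\t0\t0\t0\t4\t0\t0\t0\t0\t0\t0\n'
  printf '4583\tNM_002457.2\tMUC2\tchr11\t1064901\t1094419\t14\t0\t0\t8\t0\t22\t6\t0\t0\t6\t0\t12\n'
  printf '150483\tNM_144705.2\tTEKT4\tchr2\t94900958\t94906295\t22\t0\t0\t1\t2\t25\t19\t0\t0\t0\t2\t21\n'
} > "$tmp"

tab=$(printf '\t')

# Keep data rows only (first field is all digits), sort numerically and
# descending on column 18 alone, then show symbol and totalNovel.
top=$(awk -F'\t' '$1 ~ /^[0-9]+$/' "$tmp" | sort -t "$tab" -k18,18nr | cut -f3,18)

printf '%s\n' "$top"
rm -f "$tmp"
```

On the real file, replace "$tmp" with the geneVarSummary path used in the command above.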


Gene File

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | head -20

result:

#ASSEMBLY_ID    GS19240-180-36-ASM
#DBSNP_BUILD    dbSNP build 130
#FORMAT_VERSION 1.3
#GENERATED_AT   2010-Jul-14 05:39:36.249717
#GENERATED_BY   callannotate
#GENE_ANNOTATIONS       NCBI build 36.3
#GENOME_REFERENCE       NCBI build 36
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION       1.8.0.23
#TYPE   GENE-ANNOTATION

>index  locus   allele  chromosome      begin   end     varType reference       call    xRef    geneId  mrnaAcc proteinAcc      symbol  orientation       component       componentIndex  codingRegionKnown       impact  nucleotidePos   proteinPos      annotationRefSequence   sampleSequence    genomeRefSequence
0       5       1       chr1    972     979     no-call         ?               653635  XR_017611.2             LOC653635       -       EXON      11      N       UNDEFINED       0
0       5       2       chr1    972     979     no-call         ?               653635  XR_017611.2             LOC653635       -       EXON      11      N       UNDEFINED       5867
1       9       1       chr1    1046    1053    no-call         ?               653635  XR_017611.2             LOC653635       -       EXON      11      N       UNDEFINED       5867
1       9       2       chr1    1046    1053    no-call         ?               653635  XR_017611.2             LOC653635       -       EXON      11      N       UNDEFINED       5793
2       21      1       chr1    1367    1374    no-call         ?               653635  XR_017611.2             LOC653635       -       EXON      11      N       UNDEFINED       5793
2       21      2       chr1    1367    1374    no-call         ?               653635  XR_017611.2             LOC653635       -       EXON      11      N       UNDEFINED       5472
3       33      1       chr1    1722    1729    no-call         ?               653635  XR_017611.2             LOC653635       -       EXON      11      N       UNDEFINED       5472
3       33      2       chr1    1722    1729    no-call         ?               653635  XR_017611.2             LOC653635       -       EXON      11      N       UNDEFINED       5117

...

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep 'rs1042522'

result:

4223284 18936084        1       chr17   7520196 7520197 snp     G       C       dbsnp.86:rs1042522      7157    NM_000546.3     NP_000537.3       TP53    -       EXON    3       Y       MISSENSE        465     71      P       R       P
4550307 20202772        1       chr19   13748223        13748224        snp     G       C       dbsnp.119:rs10425229    28974   NM_014047.2       NP_054766.1     C19orf53        +       INTRON  1       Y
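
The second hit above (rs10425229) is returned only because rs1042522 is a prefix of it. A minimal sketch, using the first 14 columns of the two rows above as sample data, shows how grep -w restricts the match to the whole identifier:

```shell
# Sample data: the first 14 columns of the two result rows above.
tmp=$(mktemp)
{
  printf '4223284\t18936084\t1\tchr17\t7520196\t7520197\tsnp\tG\tC\tdbsnp.86:rs1042522\t7157\tNM_000546.3\tNP_000537.3\tTP53\n'
  printf '4550307\t20202772\t1\tchr19\t13748223\t13748224\tsnp\tG\tC\tdbsnp.119:rs10425229\t28974\tNM_014047.2\tNP_054766.1\tC19orf53\n'
} > "$tmp"

# Plain substring match: both rows are counted.
substring_hits=$(grep -c 'rs1042522' "$tmp")

# Whole-word match: the trailing 9 of rs10425229 prevents a match.
word_hits=$(grep -c -w 'rs1042522' "$tmp")
word_gene=$(grep -w 'rs1042522' "$tmp" | cut -f14)

printf 'substring: %s, whole word: %s (%s)\n' "$substring_hits" "$word_hits" "$word_gene"
rm -f "$tmp"
```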

[long running, generates count of components]

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep '^[0-9]' | cut -f16 | sort | uniq -c | sort -k1nr

result:

9058542 INTRON
1651007 UTR
 283921 EXON
  22565 ACCEPTOR
   7752 DONOR
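
The cut | sort | uniq -c chain has to sort every row of the file before counting. As a possible alternative (a sketch on hypothetical sample rows, not timed on the real data), awk can tally column 16 in a single pass so that only the short summary is sorted:

```shell
# Hypothetical sample: 15 placeholder fields, then the component in column 16.
tmp=$(mktemp)
for component in INTRON EXON INTRON UTR INTRON EXON; do
  printf 'x\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\t%s\n' "$component"
done > "$tmp"

# Count each distinct value of column 16, then sort the summary by count.
counts=$(awk -F'\t' '{ n[$16]++ } END { for (c in n) print n[c] "\t" c }' "$tmp" | sort -k1,1nr)

printf '%s\n' "$counts"
rm -f "$tmp"
```

On the real file, feed awk from the bzcat | grep '^[0-9]' part of the command above instead of "$tmp".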


[long running, generates count of impacts]

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep '^[0-9]' | cut -f19 | sort | uniq -c | sort -k1nr

result:

10723425 
 119677 NO-CHANGE
 102256 UNKNOWN
  38143 UNDEFINED
  20675 COMPATIBLE
  18170 MISSENSE
    545 FRAMESHIFT
    190 MISSTART
    182 NONSENSE
    129 DISRUPT
     97 DELETE+
     93 INSERT+
     88 NONSTOP
     70 DELETE
     47 INSERT
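
Both counts are taken over the same data rows, so their totals should agree. A quick check on the numbers listed above:

```shell
# Counts copied from the component listing above.
component_total=$(printf '%s\n' 9058542 1651007 283921 22565 7752 |
  awk '{ s += $1 } END { printf "%d\n", s }')

# Counts copied from the impact listing above (the first one is the empty impact).
impact_total=$(printf '%s\n' 10723425 119677 102256 38143 20675 18170 545 190 182 129 97 93 88 70 47 |
  awk '{ s += $1 } END { printf "%d\n", s }')

printf 'components: %s\nimpacts:    %s\n' "$component_total" "$impact_total"
```

Both sums come out at 11023787, which suggests neither listing lost rows.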


...

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2  | grep -w 'TEKT4' | perl -F'\t' -ane 'print if ($F[18] =~ /FRAMESHIFT/);'

result:

558976  2440057 1       chr2    94901151        94901173        del     TCCGCACCTCCAAGTACCTGCT          dbsnp.126:rs34195690    150483  NM_144705.2       NP_653306.1     TEKT4   +       EXON    0       Y       FRAMESHIFT      193     33      FRTSKYLL        W       FRTSKYLL


...

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep -w 'TEKT4' | perl -F'\t' -ane 'print if ($F[18] =~ /DELETE/ || $F[18] =~ /INSERT/);'

result:

558970  2440045 1       chr2    94901055        94901055        ins             TGCAGACGGATGTGCTCCTACCAGAGCCGGCAC               150483  NM_144705.2       NP_653306.1     TEKT4   +       EXON    0       Y       INSERT+ 97      1       A       VQTDVLLPEPAP    A
558982  2440069 2       chr2    94901247        94901247        ins             TGCAGG          150483  NM_144705.2     NP_653306.1     TEKT4     +       EXON    0       Y       INSERT+ 289     65      R       LQG     R
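
In the perl one-liners, -a fills a zero-based array, so $F[18] is column 19 (impact), and the unanchored patterns also match INSERT+ and DELETE+. An awk sketch of the same filter with an anchored pattern, run on hypothetical placeholder rows:

```shell
# Emit one hypothetical row: 18 placeholder fields, then the impact in column 19.
row() {
  printf 'x\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\tx\t%s\n' "$1"
}

# Keep rows whose impact is exactly DELETE, DELETE+, INSERT or INSERT+.
matches=$(
  { row FRAMESHIFT; row 'INSERT+'; row MISSENSE; row DELETE; } |
    awk -F'\t' '$19 ~ /^(DELETE|INSERT)[+]?$/ { print $19 }'
)

printf '%s\n' "$matches"
```

awk numbers fields from 1, so $19 here corresponds to $F[18] in the perl version.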


...

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/gene-GS19240-180-36-ASM.tsv.bz2 | grep -w 'TEKT4' | grep -w 'MISSENSE' | cut -f1 | sort | uniq -c | sort -k1nr | head -5

result:

      1 558977
      1 558978
      1 558979
      1 558981
      1 558986


chr21 Varfile Subset

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2 | head -200 | grep -v '^[0-9]' > ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv

result:

# result saved to file, no terminal output

...

command:

bzcat ${path_to_data}/NA19240/GS00028-DNA_C01/ASM/var-GS19240-180-36-ASM.tsv.bz2 | grep 'chr21' >> ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv

result:

# result added to file, no terminal output

see result

command:

head -20 ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv

result:

#ASSEMBLY_ID    GS19240-180-36-ASM
#DBSNP_BUILD    dbSNP build 130
#FORMAT_VERSION 1.3
#GENERATED_AT   2010-Jul-14 04:33:11.149915
#GENERATED_BY   dbsnptool
#GENOME_REFERENCE       NCBI build 36
#SAMPLE GS00028-DNA_C01
#SOFTWARE_VERSION       1.8.0.23
#TYPE   VAR-ANNOTATION

>locus  ploidy  allele  chromosome      begin   end     varType reference       alleleSeq       totalScore      hapLink xRef
21030570        ?       all     chr21   0       9719767 no-ref  =       ?
21030571        2       1       chr21   9719767 9719767 no-call         ?
21030571        2       1       chr21   9719767 9719773 ref     AATTCT  AATTCT  112
21030571        2       2       chr21   9719767 9719773 no-call AATTCT  ?
21030572        2       all     chr21   9719773 9719792 ref     =       =
21030573        2       1       chr21   9719792 9719793 ref     G       G       112     3931115
21030573        2       2       chr21   9719792 9719793 snp     G       T       112     3931116
21030574        2       all     chr21   9719793 9719796 ref     =       =
21030575        2       1       chr21   9719796 9719797 ref     G       G       112     3931115
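
The two-step extraction above matches 'chr21' anywhere in a line. A possible single-pass variant, sketched on a small sample (the chr20 row is hypothetical, the other rows follow the listing above), keeps the header lines and only those rows whose chromosome column is exactly chr21:

```shell
# Small sample var file; the chr20 row is made up for the illustration.
tmp=$(mktemp)
{
  printf '#TYPE\tVAR-ANNOTATION\n'
  printf '\n'
  printf '>locus\tploidy\tallele\tchromosome\tbegin\tend\tvarType\treference\talleleSeq\ttotalScore\thapLink\txRef\n'
  printf '1\t2\tall\tchr20\t100\t200\tref\t=\t=\n'
  printf '21030572\t2\tall\tchr21\t9719773\t9719792\tref\t=\t=\n'
  printf '21030573\t2\t2\tchr21\t9719792\t9719793\tsnp\tG\tT\t112\t3931116\n'
} > "$tmp"

# Keep '#' and '>' header lines plus rows where column 4 equals chr21.
subset=$(awk -F'\t' '/^[#>]/ || $4 == "chr21"' "$tmp")

printf '%s\n' "$subset"
rm -f "$tmp"
```

Unlike the head -200 | grep -v step, this variant drops the blank separator lines, so check that downstream tools accept the file before relying on it.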

calldiff

command:

 cgatools calldiff --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr \
 --variantsA ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv \
 --variantsB ${path_to_data}/cgatools/cgatools_input/var-NA19238-chr21.tsv \
 --superlocus-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-output.tsv \
 --superlocus-stats ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-stats.csv \
 --locus-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-output.tsv \
 --locus-stats ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-stats.csv \
 --debug-call-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-debug-call-output.tsv \
 --debug-superlocus-output ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-debug-superlocus-output.tsv

result:

# result saved to file, no terminal output

# superlocus-output
head -20 ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-output.tsv
# superlocus-stats
cat ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-superlocus-stats.csv
# locus-output
head -20 ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-output.tsv
# locus-stats
cat ${path_to_data}/cgatools/cgatools_output/NA19240vsNA19238-locus-stats.csv

snpdiff

command:

cgatools snpdiff --reference ${path_to_data}/cgatools/cgatools_input/hg18.crr \
 --variants ${path_to_data}/cgatools/cgatools_input/var-NA19240-chr21.tsv \
 --genotypes ${path_to_data}/cgatools/cgatools_input/NA19238_Infinium_Genotypes.tsv \
 --output ${path_to_data}/cgatools/cgatools_output/NA19238_out.tsv \
 --verbose ${path_to_data}/cgatools/cgatools_output/NA19238_verb.tsv \
 --stats ${path_to_data}/cgatools/cgatools_output/NA19238_stats.csv

result:

# result saved to file, no terminal output

# preview output with
head -20  ${path_to_data}/cgatools/cgatools_output/NA19238_out.tsv
cat ${path_to_data}/cgatools/cgatools_output/NA19238_stats.csv

end of 'September-9th' exercises

when we get new commands for version 1.1, they will be added here
back to [training page]