.msf

From BITS wiki

GCG/MSF Format

  • The file may begin with as many lines of comment or description as required.
  • The comments end at the header line, which is the first mandatory line recognised as part of the MSF file and contains the text "MSF:".
  • The header line also includes the sequence length, the type and the date (omitted in the example below), plus an internal checksum value, and it ends with two periods ("..").
  • The next line is a mandatory blank line inserted before the sequence names.
  • There then follows one line per sequence giving the sequence name, length, checksum and a weight value. Only one name per line is allowed; the qualifier "Name: " is followed by the sequence name. Names are restricted to 10 characters or fewer. Extra characters between the sequence name and "Len: " are acceptable if they contain no blank characters.
  • Another blank line is added, followed by a line starting with two slashes ("//"), which marks the end of the name list.
  • There then follows another blank line.
  • Sequences are interleaved on separate lines with gaps represented by periods. Each sequence line starts with the sequence name which is separated from the aligned sequence residues by white space.
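
The rules above can be turned into a small reader. The following Python sketch is only an illustration, not a reference implementation: the function name parse_msf, the two regular expressions and the returned dictionaries are assumptions made for this example, and no checksum is verified.

```python
import re

# Assumed patterns for the "MSF:" header line and the "Name:" lines.
# ".*?" skips optional text such as a date or the extra "oo" characters.
HEADER_RE = re.compile(r"MSF:\s*(\d+).*?Type:\s*(\S+).*?Check:\s*(\d+)")
NAME_RE = re.compile(
    r"Name:\s*(\S+).*?Len:\s*(\d+)\s+Check:\s*(\d+)\s+Weight:\s*([\d.]+)"
)


def parse_msf(text):
    """Return (header, entries, sequences) parsed from MSF-formatted text."""
    header = None
    entries = {}     # name -> length, checksum and weight from the name list
    sequences = {}   # name -> aligned residues, gaps kept as "."
    in_alignment = False

    for line in text.splitlines():
        if header is None:
            # Everything before the "MSF:" line is free-form comment.
            m = HEADER_RE.search(line)
            if m:
                header = {
                    "length": int(m.group(1)),
                    "type": m.group(2),
                    "check": int(m.group(3)),
                }
            continue
        if not in_alignment:
            # The name list runs until the line starting with "//".
            if line.strip().startswith("//"):
                in_alignment = True
                continue
            m = NAME_RE.search(line)
            if m:
                entries[m.group(1)] = {
                    "len": int(m.group(2)),
                    "check": int(m.group(3)),
                    "weight": float(m.group(4)),
                }
                sequences[m.group(1)] = ""
            continue
        # Interleaved block: name, white space, then groups of residues.
        fields = line.split()
        if fields and fields[0] in sequences:
            sequences[fields[0]] += "".join(fields[1:])

    if header is None:
        raise ValueError("no 'MSF:' header line found")
    return header, entries, sequences
```

Lines in the alignment section whose first field is not a declared name (blank lines, for instance) are skipped, so the number of blank lines between blocks does not matter to this reader.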

Example

      MSF:  510  Type: P    Check:  7736   ..
    
    Name: ACHE_BOVIN oo  Len:  510  Check:  7842  Weight:  16.0
    Name: ACHE_HUMAN oo  Len:  510  Check:  8553  Weight:  17.8
    Name: ACHE_MOUSE oo  Len:  510  Check:   229  Weight:  12.5
    Name: ACHE_RAT oo  Len:  510  Check:  8410  Weight:  14.2
    Name: ACHE_XENLA oo  Len:  510  Check:  2702  Weight:  39.2
   
   //
   
   
   
   ACHE_BOVIN      MAGALLCALL LLQLLGRGEG KNEELRLYHY LFDTYDPGRR PVQEPEDTVT
   ACHE_HUMAN      MARAPLGVLL LLGLLGRGVG KNEELRLYHH LFNNYDPGSR PVREPEDTVT
   ACHE_MOUSE      MAGALLGALL LLTLFGRSQG KNEELSLYHH LFDNYDPECR PVRRPEDTVT
   ACHE_RAT        MTMALLGTLL LLALFGRSQG KNEELSLYHH LFDNYDPECR PVRRPEDTVT
   ACHE_XENLA      MESGVRILSL LILLHNSLAS ESEESRLIKH LFTSYDQKAR PSKGLDDVVP
   
   
   ACHE_BOVIN      ISLKVTLTNL ISLNEKEETL TTSVWIGIDW QDYRLNYSKG DFGGVETLRV
   ACHE_HUMAN      ISLKVTLTNL ISLNEKEETL TTSVWIGIDW QDYRLNYSKD DFGGIETLRV
   ACHE_MOUSE      ITLKVTLTNL ISLNEKEETL TTSVWIGIDW HDYRLNYSKD DFAGVGILRV
   ACHE_RAT        ITLKVTLTNL ISLNEKEETL TTSVWIGIEW QDYRLNFSKD DFAGVEILRV
   ACHE_XENLA      VTLKLTLTNL IDLNEKEETL TTNVWVQIAW NDDRLVWNVT DYGGIGFVPV
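
The layout of the alignment section in the example (50 residues per line, in groups of 10, with the name padded to a fixed width) can be reproduced with a short routine. This is a sketch only: the function name format_msf_blocks and its parameters are hypothetical, and the defaults are read off the example above rather than taken from a specification.

```python
def format_msf_blocks(sequences, per_line=50, group=10, name_width=16):
    """Format a dict of name -> aligned residues as interleaved MSF blocks."""
    length = max((len(seq) for seq in sequences.values()), default=0)
    lines = []
    for start in range(0, length, per_line):
        for name, seq in sequences.items():
            chunk = seq[start:start + per_line]
            # Split the chunk into space-separated groups of residues.
            groups = [chunk[i:i + group] for i in range(0, len(chunk), group)]
            lines.append(f"{name:<{name_width}}{' '.join(groups)}")
        lines.append("")  # blank line between blocks
    return "\n".join(lines)
```

Only the interleaved sequence lines are produced; the "MSF:" header, the name list and the "//" line would have to be written separately.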