Python training exercise 9

Introduction

Now that we know how to make functions, how can we use them in another program if they are not inside the same file? In Python you can import a function into the file containing your code, even if the code for the function is in another file.

With imports you can reuse your own functions, but also draw on the very extensive library of functions that comes with Python. We will first look at the syntax for doing imports, then explore some of the most commonly used Python libraries.

Exercises

Import syntax

From within the same directory, you can import functions from other files. Make sure the first file we created that contains the getMeanValue() function is called 'functions1.py', then create a new file in the same directory with the following code:

from functions1 import getMeanValue
 
print(getMeanValue([1,4,5,4,3,45,5]))

So here we import the function getMeanValue (without brackets!) from the file functions1 (without .py extension!), then call it on a new list. However, when you run this program you will get output like this:

27.75
4916309.28571
9.57142857143

The last line is the one we were after, but where are the first lines coming from? When Python imports something from a file, it will also execute all other code that is in there (and not in a function). You can avoid this behaviour by always making sure that code is only executed if you directly call that file:

def getMeanValue(valueList):
 
  valueTotal = 0.0
 
  for value in valueList:
    valueTotal += value
 
  numberValues = len(valueList)
 
  return (valueTotal/numberValues)
 
if __name__ == '__main__':
 
  print(getMeanValue([4,6,77,3,67,54,6,5]))
  print(getMeanValue([3443,434,34343456,32434,34,34341,23]))

The line

if __name__ == '__main__':

will make sure that everything underneath it is only executed when you directly call the file, not when you import something from it.
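
To see this guard at work without creating files by hand, the sketch below writes a throwaway module to a temporary directory and imports it. The module name guardDemo, the list executedOnImport and the function importGuardDemo are made up for this illustration and are not part of the exercise files.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Source code of a throwaway module; 'guardDemo' and 'executedOnImport'
# are hypothetical names used only for this illustration.
moduleSource = """
executedOnImport = ['unguarded']     # Module-level code: runs on import too

def getMeanValue(valueList):
  return sum(valueList) / float(len(valueList))

if __name__ == '__main__':
  executedOnImport.append('guarded') # Runs only when the file is called directly
"""

def importGuardDemo():

  # Write the module to a temporary directory and import it from there
  tempDir = tempfile.mkdtemp()
  modulePath = os.path.join(tempDir, 'guardDemo.py')

  fileHandle = open(modulePath, 'w')
  fileHandle.write(moduleSource)
  fileHandle.close()

  sys.path.insert(0, tempDir)
  try:
    return importlib.import_module('guardDemo')
  finally:
    # Tidy up: the imported module stays available in memory
    sys.path.remove(tempDir)
    shutil.rmtree(tempDir, ignore_errors=True)

if __name__ == '__main__':

  guardDemo = importGuardDemo()

  print(guardDemo.__name__)           # The module's name, not '__main__'
  print(guardDemo.executedOnImport)   # Only the unguarded code has run
  print(guardDemo.getMeanValue([1,4,5,4,3,45,5]))
```

Inside an imported file, __name__ holds the name of the module, so the comparison with '__main__' fails and the guarded code is skipped.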

Another way to use imports is not to import a specific function, but to import the whole 'file'. In this case you can call the function as a method, similar to the methods for lists and strings that we saw earlier:

import functions1
 
if __name__ == '__main__':
 
  print(functions1.getMeanValue([1,4,5,4,3,45,5]))
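
The same import forms work for the modules that come with Python. As a small sketch, the lines below use the standard math module instead of functions1, so that the example runs on its own; the short names m and squareRoot are arbitrary choices made for this illustration.

```python
import math                            # Import the whole module
from math import sqrt                  # Import one function from it
import math as m                       # Import the module under a shorter name
from math import sqrt as squareRoot    # Import one function under another name

if __name__ == '__main__':

  # All four calls use the same underlying function
  print(math.sqrt(16.0))
  print(sqrt(16.0))
  print(m.sqrt(16.0))
  print(squareRoot(16.0))
```

Importing the whole module keeps it clear where a function comes from, which helps once a program uses functions from several files.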

Modify the last version of the script to read the TestFile.pdb file so that the code to be executed is separate from the functions.

def getTitle(line):

  # Gets the title

  # Split the line here, so the function does not rely on a variable from the main code
  cols = line.split()
  title = line.replace(cols[0],'')
  title = title.strip()

  return ("The title is '{}'".format(title))

def getAtomInfo(line):

  # Get relevant information from an ATOM line and convert to the right type

  cols = line.split()
  atomSerial = int(cols[1])
  atomName = cols[2]
  residueNumber = int(cols[5])
  x = float(cols[6])
  y = float(cols[7])
  z = float(cols[8])

  return (atomSerial,atomName,residueNumber,x,y,z)
 
def calculateDistance(coordinate1,coordinate2):
 
  # Calculate the distance between two 3 dimensional coordinates
 
  return ((coordinate1[0] - coordinate2[0]) ** 2 + (coordinate1[1] - coordinate2[1]) ** 2 + (coordinate1[2] - coordinate2[2]) ** 2 ) ** 0.5
 
 
if __name__ == '__main__':
 
  # Open the file
  fileHandle = open("TestFile.pdb")
 
  # Read all the lines in the file (as separated by a newline character), and store them in the lines list
  # Each element in this list corresponds to one line of the file!
  lines = fileHandle.readlines()
 
  # Close the file
  fileHandle.close()
 
  # Initialise some information
  searchCoordinate = (-8.7,-7.7,4.7)
  modelNumber = None
 
  # Loop over the lines, and do some basic string manipulations
  for line in lines:
 
    line = line.strip()  # Remove starting and trailing spaces/tabs/newlines
 
    # Only do something if it's not an empty line
    if line:
      cols = line.split()   # Split the line by white spaces; depending on the format this could be commas, ...
 
      # Print off the title
      if cols[0] == 'TITLE':
        print(getTitle(line))
 
      # Track the model number
      elif cols[0] == 'MODEL':
        modelNumber = int(cols[1])
 
      # For atom lines, calculate the distance
      elif cols[0] == 'ATOM':
 
        (atomSerial,atomName,residueNumber,x,y,z) = getAtomInfo(line)
 
        # Calculate the distance
        distance = calculateDistance((x,y,z),searchCoordinate)
 
        if distance < 2.0:
          print("Model {}, residue {}, atom {} (serial {}) is {:.2f} away from reference.".format(modelNumber,residueNumber,atomName,atomSerial,distance))

Python libraries import

Python comes with many ready-to-use modules that can save you a lot of time when writing code. Some of the most commonly used ones are time, sys, os/os.path and re.

With time you can get the current time and date, or make a program wait:

import time
 
if __name__ == '__main__':
 
  print(time.ctime())  # Print current day and time
  print(time.time())   # Print the number of seconds since the epoch (1 January 1970)
  time.sleep(5)       # Sleep for 5 seconds - the program will wait here

See the Python documentation for a full description of time. Also see datetime, which is a module to deal with date/time manipulations.
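
As a minimal sketch of the kind of manipulation datetime is meant for, the example below adds a number of days to a date and formats the result. The function name addDays and the example date are assumptions made for this illustration.

```python
import datetime

def addDays(startDate, numberDays):

  # Return the date that lies numberDays after startDate
  return startDate + datetime.timedelta(days=numberDays)

if __name__ == '__main__':

  # A fixed example date, so the result is the same on every run
  startDate = datetime.date(2020, 2, 27)

  endDate = addDays(startDate, 3)

  print(endDate.isoformat())              # 2020 was a leap year: 2020-03-01
  print(endDate.strftime("%d/%m/%Y"))     # Same date, formatted as day/month/year
  print((endDate - startDate).days)       # Difference between the two dates in days
```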

sys gives you system-specific parameters and functions:

import sys
 
if __name__ == '__main__':
 
  print(sys.argv)        # A list of parameters that are given when calling this script
                          # from the command line (e.g. 'python myScript.py a b c')
  print(sys.platform)  # The platform the code is currently running on
  print(sys.path)      # The directories where Python will look for things to import
 
  sys.exit()          # Exit the code immediately

See the Python documentation for a full description.
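
The parameters in sys.argv are always strings, and the first element is the name of the script itself. The sketch below shows one way to turn them into numbers; the function name getMeanFromArguments and the fallback values are assumptions made for this illustration.

```python
import sys

def getMeanFromArguments(argumentList):

  # argumentList has the same layout as sys.argv: the script name comes
  # first, followed by the parameters, which are always strings
  values = []
  for argument in argumentList[1:]:
    values.append(float(argument))

  if not values:
    return None

  return sum(values) / len(values)

if __name__ == '__main__':

  if len(sys.argv) > 1:
    argumentList = sys.argv
  else:
    # No parameters given: fall back to made-up example values
    argumentList = ['myScript.py', '1', '4', '5']

  print(getMeanFromArguments(argumentList))
```

Calling the script as 'python myScript.py 2 4' would then print the mean of 2 and 4.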

os and os.path are very useful when dealing with files and directories:

import os
 
if __name__ == '__main__':
 
  # Get the current working directory (cwd)
  currentDir = os.getcwd()
  print(currentDir)
 
  # Get a list of the files in the current working directory    
  myFiles = os.listdir(currentDir)
  print(myFiles)
 
  # Create a directory, rename it, and remove it
  os.mkdir("myTempDir")
  os.rename("myTempDir","myNewTempDir")
  os.rmdir("myNewTempDir")   # Removes this one (empty) directory only
 
  # Create the full path name to the first file of myFiles
  myFileFullPath = os.path.join(currentDir,myFiles[0])
  print(myFileFullPath)
 
  # Does this file exist?
  print(os.path.exists(myFileFullPath))
 
  # How big is the file?
  print(os.path.getsize(myFileFullPath))
 
  # Split the directory path from the file name
  (myDir,myFileName) = os.path.split(myFileFullPath)
  print(myDir, myFileName)

See the Python documentation for os and os.path for a full description.
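
These functions are often combined. As a sketch, the function below lists only the files in a directory that have a given extension, using os.path.splitext and os.path.isfile in addition to the functions shown above. The function name listFilesWithExtension and the file names are made up for this illustration.

```python
import os
import shutil
import tempfile

def listFilesWithExtension(dirPath, extension):

  # Return the sorted names of the files in dirPath that end in extension
  fileNames = []

  for name in os.listdir(dirPath):
    fullPath = os.path.join(dirPath, name)
    (baseName, nameExtension) = os.path.splitext(name)

    # Skip subdirectories and files with a different extension
    if os.path.isfile(fullPath) and nameExtension == extension:
      fileNames.append(name)

  return sorted(fileNames)

if __name__ == '__main__':

  # Work in a throwaway directory so that no existing files are touched
  tempDir = tempfile.mkdtemp()

  for name in ("a.txt", "b.pdb", "c.txt"):
    fileHandle = open(os.path.join(tempDir, name), 'w')
    fileHandle.close()

  os.mkdir(os.path.join(tempDir, "subDir.txt"))   # A directory, not a file

  print(listFilesWithExtension(tempDir, ".txt"))

  shutil.rmtree(tempDir)   # Remove the directory and everything in it
```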

A library that is very powerful for dealing with strings is re. It allows you to use regular expressions to examine text - using these is a course in itself, so just consider this simple example:

import re
 
if __name__ == '__main__':
 
  myText = """Call me Ishmael. Some years ago - never mind how long precisely -
having little or no money in my purse, and nothing particular to interest me on 
shore, I thought I would sail about a little and see the watery part of the 
world."""
 
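  # Compile a regular expression. The r in front of the string makes it a raw
  # string, so the backslash is passed on to re unchanged.
  myPattern = re.compile(r"(w\w+d)") # Look for the first piece of text that starts with a w,
                                     # is followed by 1 or more word characters (\w+)
                                     # and ends in a d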
 
  mySearch = myPattern.search(myText)
 
  # mySearch will be None if nothing was found
  if mySearch:
    print(mySearch.groups())

See the full Python reference on regular expressions for more information.
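
One detail worth knowing is that a pattern like the one above can also match in the middle of a longer word. As a sketch, the example below adds \b (a word boundary) on both sides so that only complete words are found, and uses findall to get every match instead of just the first. The function name findWordsFromWToD and the example sentences are assumptions made for this illustration.

```python
import re

def findWordsFromWToD(text):

  # \b marks a word boundary, so each match has to be a complete word
  # that starts with a w and ends in a d
  wordPattern = re.compile(r"\bw\w+d\b")

  return wordPattern.findall(text)

if __name__ == '__main__':

  # Without \b the pattern can match inside a longer word
  print(re.search(r"w\w+d", "a sandwiched word").group())

  # With \b only complete words are found
  print(findWordsFromWToD("a sandwiched word"))
  print(findWordsFromWToD("I would see the watery part of the world."))
```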

Make a new directory in which you write out 5 files, with a 2 second delay between them. Each file should contain the date and time when it was written out.

import time, os
 
if __name__ == '__main__':
 
  # Create a variable for the directory name
  myDir = "timeTest"
 
  # Check whether the directory exists, if not create it
  if not os.path.exists(myDir):
    os.mkdir(myDir)
 
 
  # Loop from 1 to 5
  for i in range(1,6):
 
    # Get the current time
    currentTime = time.ctime()
 
    # Write out the file - use i to give a different name to each
    filePath = os.path.join(myDir,"myFile{}.txt".format(i))
 
    outFileHandle = open(filePath,'w')    
    outFileHandle.write("{}\n".format(currentTime))
    outFileHandle.close()
 
    print("Written file {}...".format(filePath))
 
    # Sleep for 2 seconds
    time.sleep(2)

Write a function to read in a FASTA file with an RNA sequence and return the RNA sequence (in 3 base unit chunks).

import os
 
def readRnaFastaFile(fileName):
 
  if not os.path.exists(fileName):
    print("Error: File {} not available!".format(fileName))
    return (None,None,None)
 
  fconnect = open(fileName)
  lines = fconnect.readlines()
  fconnect.close()
 
  sequenceInfo = []
  moleculeName = None
  description = None
 
  # Get information from the first line - ignore the >
  firstLine = lines[0]
  firstLineCols = firstLine[1:].split()
  moleculeName = firstLineCols[0]
  description = firstLine[1:].replace(moleculeName,'').strip()
 
  # Now get the full sequence out
  fullSequence = ""
  for line in lines[1:]:
 
    line = line.strip()
    fullSequence += line
 
  # Divide up the sequence into chunks of 3 bases (codons)
  for seqIndex in range(0,len(fullSequence),3):
    sequenceInfo.append(fullSequence[seqIndex:seqIndex+3])
 
  return (moleculeName,description,sequenceInfo)
 
if __name__ == '__main__':
 
  print(readRnaFastaFile("rnaSeq.txt"))
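
Files can also be opened in a with block, which closes the file automatically, even when an error occurs while reading. The sketch below is a shorter variant of the chunking step that uses this approach; it simply skips the header line. The function name readCodons and the made-up sequence in the temporary file are assumptions for this illustration.

```python
import os
import tempfile

def readCodons(fileName):

  # The with block closes the file as soon as the indented code is done
  with open(fileName) as fileHandle:
    lines = fileHandle.readlines()

  # Skip the header line (starting with >) and join up the sequence lines
  fullSequence = ""
  for line in lines[1:]:
    fullSequence += line.strip()

  # Cut the sequence into chunks of 3 bases; the last chunk is shorter
  # when the sequence length is not a multiple of 3
  codons = []
  for seqIndex in range(0, len(fullSequence), 3):
    codons.append(fullSequence[seqIndex:seqIndex+3])

  return codons

if __name__ == '__main__':

  # Write a small made-up FASTA file to a temporary location
  (fileDescriptor, fileName) = tempfile.mkstemp(suffix=".fasta")
  with os.fdopen(fileDescriptor, 'w') as fileHandle:
    fileHandle.write(">exampleRna made-up sequence\nAUGUUU\nGGCUA\n")

  print(readCodons(fileName))

  os.remove(fileName)
```

Note that the last chunk here has only two bases, which is something the code using the result has to be prepared for.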

More exercises combining dictionaries, files and imports

Write a program where you ask the user for a one-letter amino acid sequence, and print out the three-letter amino acid codes. Download this file and save it as SequenceDicts.py first; you can get the reference information from there using an import.

# Note how you can import a function (or variable) with a different name for your program!
from SequenceDicts import proteinOneToThree as oneToThreeLetterCodes
 
oneLetterSeq = input('Give one letter sequence:')
 
if oneLetterSeq:
  for oneLetterCode in oneLetterSeq:
    if oneLetterCode in oneToThreeLetterCodes.keys():
      print(oneToThreeLetterCodes[oneLetterCode])
    else:
      print("One letter code '{}' is not a valid amino acid code!".format(oneLetterCode))
else:
  print("You didn't give me any information!")

Write a program where you translate the RNA sequence from the exercise in the previous section into three-letter amino acid codes. Again use this file, saved as SequenceDicts.py; you can get the reference information from there using an import.

from SequenceDicts import standardRnaToProtein, proteinOneToThree

# This assumes the answer to the previous exercise was saved as readFasta.py
from readFasta import readRnaFastaFile
 
if __name__ == '__main__':
 
  (molName,description,sequenceInfo) = readRnaFastaFile("rnaSeq.fasta")
 
  proteinThreeLetterSeq = []
 
  for rnaCodon in sequenceInfo:
 
    aaOneLetterCode = standardRnaToProtein[rnaCodon]
 
    aaThreeLetterCode = proteinOneToThree[aaOneLetterCode]
 
    proteinThreeLetterSeq.append(aaThreeLetterCode)
 
 
  print(proteinThreeLetterSeq)
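
Looking up a key with square brackets raises a KeyError when the key is not in the dictionary, which can happen here if the sequence ends in an incomplete chunk. The sketch below shows one way to deal with this using the get() method of dictionaries. The two small dictionaries are stand-ins so the example runs on its own (the full tables come from SequenceDicts.py); using '*' for a stop codon, '???' for an unknown codon, and the function name translateCodons are all assumptions made for this illustration.

```python
# A deliberately tiny codon table for illustration only; the full table
# comes from SequenceDicts.py. Using '*' for a stop codon is an assumption.
exampleRnaToProtein = {'AUG': 'M', 'UUU': 'F', 'GGC': 'G', 'UAA': '*'}
exampleOneToThree = {'M': 'Met', 'F': 'Phe', 'G': 'Gly'}

def translateCodons(codonList):

  proteinThreeLetterSeq = []

  for rnaCodon in codonList:

    # get() returns None instead of raising a KeyError for unknown keys
    aaOneLetterCode = exampleRnaToProtein.get(rnaCodon)

    if aaOneLetterCode == '*':
      break   # Stop codon: translation ends here

    # Unknown or incomplete codons are reported as '???'
    proteinThreeLetterSeq.append(exampleOneToThree.get(aaOneLetterCode, '???'))

  return proteinThreeLetterSeq

if __name__ == '__main__':

  print(translateCodons(['AUG', 'UUU', 'GGC', 'UAA', 'AUG']))
  print(translateCodons(['AUG', 'UU']))
```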

Write a program that:

Has a function readSampleInformationFile() to read the information from this sample data file into a dictionary. Also check whether the file exists.
Has a function getSampleIdsForValueRange() that can extract sample IDs from this dictionary. Print the sample IDs for pH 6.0-7.0, temperature 280-290 and volume 200-220 using this function.


import os
 
def readSampleInformationFile(fileName):
 
  # Read in the sample information file in .csv (comma-delimited) format
 
  # Doublecheck if file exists
  if not os.path.exists(fileName):
    print("File {} does not exist!".format(fileName))
    return None
 
  # Open the file and read the information
  fileHandle = open(fileName)
  lines = fileHandle.readlines()
  fileHandle.close()
 
  # Now read the information. The first line has the header information which
  # we are going to use to create the dictionary!
 
  fileInfoDict = {}
 
  headerCols = lines[0].strip().split(',')
 
  # Now read in the information, use the first column as the key for the dictionary
  # Note that you could organise this differently by creating a dictionary with
  # the header names as keys, then a list of the values for each of the columns.
 
  for line in lines[1:]:
 
    line = line.strip()  # Remove newline characters
    cols = line.split(',')
 
    sampleId = int(cols[0])
 
    fileInfoDict[sampleId] = {}
 
    # Don't use the first column, is already the key!
    for i in range(1,len(headerCols)):
      valueName = headerCols[i]
 
      value = cols[i]
      if valueName in ('pH','temperature','volume'):
        value = float(value)
 
      fileInfoDict[sampleId][valueName] = value
 
  # Return the dictionary with the file information
  return fileInfoDict
 
def getSampleIdsForValueRange(fileInfoDict,valueName,lowValue,highValue):
 
  # Return the sample IDs that fit within the given value range for a kind of value
 
  # sorted() returns a new sorted list of the keys
  sampleIdList = sorted(fileInfoDict.keys())
 
  sampleIdsFound = []
 
  for sampleId in sampleIdList:
 
    currentValue = fileInfoDict[sampleId][valueName]
 
    if lowValue <= currentValue <= highValue:
      sampleIdsFound.append(sampleId)
 
  return sampleIdsFound
 
if __name__ == '__main__':
 
  fileInfoDict = readSampleInformationFile("sampleInfo.txt")

  # Only continue if the file could be read
  if fileInfoDict is not None:
    print(getSampleIdsForValueRange(fileInfoDict,'pH',6.0,7.0))
    print(getSampleIdsForValueRange(fileInfoDict,'temperature',280,290))
    print(getSampleIdsForValueRange(fileInfoDict,'volume',200,220))
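
The comments in the answer mention that the information could also be organised with the header names as keys and a list of values for each column. A minimal sketch of that layout is given below, using the csv module that comes with Python to do the splitting. The example data, the column name sampleId and the function names readColumns and getSampleIdsFromColumns are assumptions made for this illustration; the real sample data file may contain other columns.

```python
import csv
import io

# Made-up sample data; the column name 'sampleId' is an assumption
exampleText = """sampleId,pH,temperature,volume
1,6.5,285.0,210.0
2,7.4,301.5,195.0
3,6.9,279.0,205.0
"""

def readColumns(fileHandle):

  # Build a dictionary with the header names as keys and, for each of them,
  # the list of values found in that column
  columnDict = {}

  for row in csv.DictReader(fileHandle):
    for (valueName, value) in row.items():
      if valueName == 'sampleId':
        value = int(value)
      else:
        value = float(value)
      columnDict.setdefault(valueName, []).append(value)

  return columnDict

def getSampleIdsFromColumns(columnDict, valueName, lowValue, highValue):

  sampleIdsFound = []

  # zip() pairs up each sample ID with the value on the same row
  for (sampleId, value) in zip(columnDict['sampleId'], columnDict[valueName]):
    if lowValue <= value <= highValue:
      sampleIdsFound.append(sampleId)

  return sampleIdsFound

if __name__ == '__main__':

  # io.StringIO makes a string behave like an open file
  columnDict = readColumns(io.StringIO(exampleText))

  print(getSampleIdsFromColumns(columnDict, 'pH', 6.0, 7.0))
  print(getSampleIdsFromColumns(columnDict, 'temperature', 280, 290))
```

Which layout is more convenient depends on how the data is used: a dictionary per sample makes it easy to look up everything about one sample, while a list per column makes it easy to work on all values of one kind.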
