Python training exercise 9
Introduction
Now that we know how to write functions, how can we use them in another program if they are not in the same file? In Python you can import a function into the file containing your code, even if the function itself is defined in another file.
This is done with imports. You can import your own functions, but you can also draw on the very extensive library of modules that comes with Python to help you when you are writing a program. We will first look at the syntax for imports, then explore some of the most commonly used Python libraries.
Exercises
Import syntax
From within the same directory, you can import functions from other files. Make sure the first file we created that contains the getMeanValue() function is called 'functions1.py', then create a new file in the same directory with the following code:
from functions1 import getMeanValue

print(getMeanValue([1,4,5,4,3,45,5]))
So here we import the function getMeanValue (without brackets!) from the file functions1 (without .py extension!), then call it on a new list. However, when you run this program you will get output like this:
27.75
4916309.28571
9.57142857143
The last line is the one we were after, but where are the first lines coming from? When Python imports something from a file, it will also execute all other code that is in there (and not in a function). You can avoid this behaviour by always making sure that code is only executed if you directly call that file:
def getMeanValue(valueList):

    valueTotal = 0.0

    for value in valueList:
        valueTotal += value

    numberValues = len(valueList)

    return (valueTotal/numberValues)

if __name__ == '__main__':

    print(getMeanValue([4,6,77,3,67,54,6,5]))
    print(getMeanValue([3443,434,34343456,32434,34,34341,23]))
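The code under if __name__ == '__main__': only runs when the file is executed directly, not when it is imported from another file.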
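To see the effect of the guard without touching functions1.py, the sketch below writes a small throwaway module to a temporary directory and imports it. The module name guardDemo, the RUN_LOG list and the helper importGuardDemo are made up for this illustration; they are not part of the course files.

```python
import importlib
import os
import sys
import tempfile

# Source of a throwaway module (hypothetical name: guardDemo)
MODULE_SOURCE = """
RUN_LOG = []

def getMeanValue(valueList):
    return sum(valueList) / float(len(valueList))

# Top-level code: executed on import AND when the file is run directly
RUN_LOG.append("top level, __name__ is " + __name__)

if __name__ == '__main__':
    # Guarded code: executed only when the file is run directly
    RUN_LOG.append("guarded block")
"""


def importGuardDemo():
    # Write the module to a temporary directory and import it from there
    with tempfile.TemporaryDirectory() as tempDir:
        with open(os.path.join(tempDir, "guardDemo.py"), "w") as fileHandle:
            fileHandle.write(MODULE_SOURCE)
        sys.path.insert(0, tempDir)
        try:
            return importlib.import_module("guardDemo")
        finally:
            sys.path.remove(tempDir)


if __name__ == '__main__':
    guardDemo = importGuardDemo()
    print(guardDemo.RUN_LOG)    # Contains the top-level entry only
    print(guardDemo.getMeanValue([1, 4, 5, 4, 3, 45, 5]))
```

Inside the imported module, __name__ holds the module name rather than '__main__', which is why the guarded block is skipped.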
Another way to use imports is not to import a specific function, but to import the whole file (a module). In this case you call the function through the module name, with a syntax similar to the methods for lists and strings that we saw earlier:
import functions1

if __name__ == '__main__':

    print(functions1.getMeanValue([1,4,5,4,3,45,5]))
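The same import forms work for the modules that come with Python. A small sketch using math, chosen here only as a convenient example (the alias m is arbitrary):

```python
import math              # Import the whole module
from math import sqrt    # Import one function from the module
import math as m         # Import the module under a different name

if __name__ == '__main__':
    print(math.sqrt(16))    # Called through the module name
    print(sqrt(16))         # Called directly
    print(m.sqrt(16))       # Called through the alias
```

All three calls reach the same function; which form to use is mostly a matter of readability.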
Modify the last version of the script to read the TestFile.pdb file so that the code to be executed is separate from the functions.
def getTitle(line):

    # Gets the title: remove the first column (the record name) from the line
    cols = line.split()
    title = line.replace(cols[0],'')
    title = title.strip()

    return ("The title is '{}'".format(title))

def getAtomInfo(line):

    # Get relevant information from an ATOM line and convert to the right type
    cols = line.split()

    atomSerial = int(cols[1])
    atomName = cols[2]
    residueNumber = int(cols[5])
    x = float(cols[6])
    y = float(cols[7])
    z = float(cols[8])

    return (atomSerial,atomName,residueNumber,x,y,z)

def calculateDistance(coordinate1,coordinate2):

    # Calculate the distance between two 3 dimensional coordinates
    return ((coordinate1[0] - coordinate2[0]) ** 2 + (coordinate1[1] - coordinate2[1]) ** 2 + (coordinate1[2] - coordinate2[2]) ** 2 ) ** 0.5

if __name__ == '__main__':

    # Open the file
    fileHandle = open("TestFile.pdb")

    # Read all the lines in the file (as separated by a newline character), and store them in the lines list
    # Each element in this list corresponds to one line of the file!
    lines = fileHandle.readlines()

    # Close the file
    fileHandle.close()

    # Initialise some information
    searchCoordinate = (-8.7,-7.7,4.7)
    modelNumber = None

    # Loop over the lines, and do some basic string manipulations
    for line in lines:

        line = line.strip()  # Remove starting and trailing spaces/tabs/newlines

        # Only do something if it's not an empty line
        if line:
            cols = line.split()  # Split the line by white spaces; depending on the format this could be commas, ...

            # Print off the title
            if cols[0] == 'TITLE':
                print(getTitle(line))

            # Track the model number
            elif cols[0] == 'MODEL':
                modelNumber = int(cols[1])

            # For atom lines, calculate the distance
            elif cols[0] == 'ATOM':
                (atomSerial,atomName,residueNumber,x,y,z) = getAtomInfo(line)

                # Calculate the distance
                distance = calculateDistance((x,y,z),searchCoordinate)
                if distance < 2.0:
                    print("Model {}, residue {}, atom {} (serial {}) is {:.2f} away from reference.".format(modelNumber,residueNumber,atomName,atomSerial,distance))
Python libraries import
Python comes with many ready-to-use modules (the standard library) that can save you a lot of time when writing code. Among the most commonly used are time, sys, os/os.path and re.
With time you can get information on the current time and date, or pause your program:
import time

if __name__ == '__main__':

    print(time.ctime())  # Print current day and time
    print(time.time())   # Print the system clock time, in seconds since the epoch
    time.sleep(5)        # Sleep for 5 seconds - the program will wait here
See the Python documentation for a full description of time. Also see datetime, which is a module to deal with date/time manipulations.
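As a small sketch of how time and datetime can be used together, the example below times a function call with time.time() and formats a date with datetime. The helper names timeCall and formatTimestamp are invented for this illustration.

```python
import datetime
import time


def timeCall(function, *args):
    # Return the result of function(*args) and the elapsed wall-clock seconds
    startTime = time.time()
    result = function(*args)
    elapsed = time.time() - startTime
    return (result, elapsed)


def formatTimestamp(moment):
    # Format a datetime object as 'YYYY-MM-DD HH:MM:SS'
    return moment.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == '__main__':
    (total, elapsed) = timeCall(sum, range(1000000))
    print("Sum {} took {:.4f} seconds".format(total, elapsed))
    print(formatTimestamp(datetime.datetime.now()))
```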
sys gives you system-specific parameters and functions:
import sys

if __name__ == '__main__':

    print(sys.argv)      # A list of parameters that are given when calling this script
                         # from the command line (e.g. 'python myScript a b c')
    print(sys.platform)  # The platform the code is currently running on
    print(sys.path)      # The directories where Python will look for things to import

    sys.exit()           # Exit the code immediately
See the Python documentation for a full description.
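A minimal sketch of using sys.argv: the first element is the script name and the remaining elements are the arguments, always as strings. The function name getNumbersFromArguments and the script name myScript.py are placeholders for this example.

```python
import sys


def getNumbersFromArguments(argv):
    # argv[0] is the script name; everything after it is converted to float
    numbers = []
    for argument in argv[1:]:
        try:
            numbers.append(float(argument))
        except ValueError:
            print("Ignoring '{}': not a number".format(argument))
    return numbers


if __name__ == '__main__':
    # Example call: python myScript.py 1 4 5
    values = getNumbersFromArguments(sys.argv)
    if values:
        print(sum(values) / len(values))
    else:
        print("Usage: python myScript.py number [number ...]")
```

Passing the list in as a parameter, rather than reading sys.argv inside the function, makes the function easy to try out from another file.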
os and os.path are very useful when dealing with files and directories:
import os

if __name__ == '__main__':

    # Get the current working directory (cwd)
    currentDir = os.getcwd()
    print(currentDir)

    # Get a list of the files in the current working directory
    myFiles = os.listdir(currentDir)
    print(myFiles)

    # Create a directory, rename it, and remove it
    os.mkdir("myTempDir")
    os.rename("myTempDir","myNewTempDir")
    os.removedirs("myNewTempDir")

    # Create the full path name to the first file of myFiles
    myFileFullPath = os.path.join(currentDir,myFiles[0])
    print(myFileFullPath)

    # Does this file exist?
    print(os.path.exists(myFileFullPath))

    # How big is the file?
    print(os.path.getsize(myFileFullPath))

    # Split the directory path from the file name
    (myDir,myFileName) = os.path.split(myFileFullPath)
    print(myDir, myFileName)
See the Python documentation for os and os.path for a full description.
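Combining a few of these calls, the sketch below lists the files in a directory that have a given extension. The function name listFilesWithExtension is hypothetical; os.path.splitext and os.path.isfile are standard os.path functions.

```python
import os


def listFilesWithExtension(directory, extension):
    # Return the sorted names of the files in directory that end in extension
    matches = []
    for name in os.listdir(directory):
        fullPath = os.path.join(directory, name)
        (baseName, fileExtension) = os.path.splitext(name)
        # isfile() skips subdirectories that happen to have a matching name
        if os.path.isfile(fullPath) and fileExtension == extension:
            matches.append(name)
    return sorted(matches)


if __name__ == '__main__':
    print(listFilesWithExtension(os.getcwd(), ".py"))
```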
A library that is very powerful for dealing with strings is re. It allows you to use regular expressions to examine text - using these is a course in itself, so just consider this simple example:
import re

if __name__ == '__main__':

    myText = """Call me Ishmael. Some years ago - never mind how long precisely -
having little or no money in my purse, and nothing particular to interest me on shore,
I thought I would sail about a little and see the watery part of the world."""

    # Compile a regular expression (the r prefix keeps the backslash as typed)
    myPattern = re.compile(r"(w\w+d)")  # Look for the first piece of text that starts with a w,
                                        # is followed by 1 or more word characters (\w+)
                                        # and ends in a d

    mySearch = myPattern.search(myText)

    # mySearch will be None if nothing was found
    if mySearch:
        print(mySearch.groups())
See the full Python reference on regular expressions for more information.
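The search method stops at the first match. As a hedged sketch, re.findall returns every match instead, and adding \b (a word boundary) restricts the pattern to whole words. The function name findWords is made up for this example.

```python
import re


def findWords(text, firstLetter, lastLetter):
    # \b marks a word boundary, so only whole words are matched
    pattern = re.compile(r"\b{}\w*{}\b".format(re.escape(firstLetter),
                                               re.escape(lastLetter)))
    return pattern.findall(text)


if __name__ == '__main__':
    myText = "I thought I would sail about a little and see the watery part of the world."
    print(findWords(myText, "w", "d"))  # ['would', 'world']
```

Here re.escape makes sure the two letters are treated as literal characters even if they have a special meaning in regular expressions.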
Make a new directory and write out 5 files into it, with a 2 second delay between files. Each file should contain the date and time at which it was written.
import time, os

if __name__ == '__main__':

    # Create a variable for the directory name
    myDir = "timeTest"

    # Check whether the directory exists, if not create it
    if not os.path.exists(myDir):
        os.mkdir(myDir)

    # Loop from 1 to 5
    for i in range(1,6):

        # Get the current time
        currentTime = time.ctime()

        # Write out the file - use i to give a different name to each
        filePath = os.path.join(myDir,"myFile{}.txt".format(i))

        outFileHandle = open(filePath,'w')
        outFileHandle.write("{}\n".format(currentTime))
        outFileHandle.close()

        print("Written file {}...".format(filePath))

        # Sleep for 2 seconds
        time.sleep(2)
Write a function to read in a FASTA file with an RNA sequence and return the RNA sequence (in 3 base unit chunks).
import os

def readRnaFastaFile(fileName):

    if not os.path.exists(fileName):
        print("Error: File {} not available!".format(fileName))
        return (None,None,None)

    fconnect = open(fileName)
    lines = fconnect.readlines()
    fconnect.close()

    sequenceInfo = []
    moleculeName = None
    description = None

    # Get information from the first line - ignore the >
    firstLine = lines[0]
    firstLineCols = firstLine[1:].split()
    moleculeName = firstLineCols[0]
    description = firstLine[1:].replace(moleculeName,'').strip()

    # Now get the full sequence out
    fullSequence = ""
    for line in lines[1:]:
        line = line.strip()
        fullSequence += line

    # Divide up the sequence into chunks of 3 bases (codons)
    for seqIndex in range(0,len(fullSequence),3):
        sequenceInfo.append(fullSequence[seqIndex:seqIndex+3])

    return (moleculeName,description,sequenceInfo)

if __name__ == '__main__':

    print(readRnaFastaFile("rnaSeq.txt"))
More exercises combining dictionaries, files and imports
Write a program where you ask the user for a one-letter amino acid sequence, and print out the three-letter amino acid codes. Download this file and save it as SequenceDicts.py first; you can get the reference information from there using an import.
# Note how you can import a function (or variable) with a different name for your program!
from SequenceDicts import proteinOneToThree as oneToThreeLetterCodes

oneLetterSeq = input('Give one letter sequence:')

if oneLetterSeq:
    for oneLetterCode in oneLetterSeq:
        if oneLetterCode in oneToThreeLetterCodes:
            print(oneToThreeLetterCodes[oneLetterCode])
        else:
            print("One letter code '{}' is not a valid amino acid code!".format(oneLetterCode))
else:
    print("You didn't give me any information!")
Write a program where you translate the RNA sequence from the exercise in the previous section into 3 letter amino acid codes. Again, download this file and save it as SequenceDicts.py first; you can get the reference information from there using an import.
# The module name must match the file name exactly (SequenceDicts.py), including upper/lower case
from SequenceDicts import standardRnaToProtein, proteinOneToThree

# This assumes the readRnaFastaFile function from the previous exercise is saved in readFasta.py
from readFasta import readRnaFastaFile

if __name__ == '__main__':

    (molName,description,sequenceInfo) = readRnaFastaFile("rnaSeq.fasta")

    proteinThreeLetterSeq = []

    for rnaCodon in sequenceInfo:
        aaOneLetterCode = standardRnaToProtein[rnaCodon]
        aaThreeLetterCode = proteinOneToThree[aaOneLetterCode]
        proteinThreeLetterSeq.append(aaThreeLetterCode)

    print(proteinThreeLetterSeq)
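The lookup standardRnaToProtein[rnaCodon] raises a KeyError if the last chunk is shorter than three bases or contains an unexpected character. One possible way to guard against that is sketched below. Since the contents of SequenceDicts.py are not shown here, the two small dictionaries are stand-ins with only a few entries, and the 'Xaa' placeholder for unknown codons is a choice made for this example.

```python
# Stand-in excerpts (assumption: the real dictionaries in SequenceDicts.py
# have the same codon -> one-letter and one-letter -> three-letter layout)
exampleRnaToProtein = {'AUG': 'M', 'UUU': 'F', 'GGC': 'G', 'AAA': 'K'}
exampleOneToThree = {'M': 'Met', 'F': 'Phe', 'G': 'Gly', 'K': 'Lys'}


def translateCodons(codons, rnaToProtein, oneToThree, unknown='Xaa'):
    # Translate a list of codons; anything not in the tables becomes `unknown`
    threeLetterSeq = []
    for codon in codons:
        # .get() returns None instead of raising KeyError for a missing key
        oneLetterCode = rnaToProtein.get(codon.upper())
        threeLetterSeq.append(oneToThree.get(oneLetterCode, unknown))
    return threeLetterSeq


if __name__ == '__main__':
    print(translateCodons(['AUG', 'UUU', 'ggc', 'AA'],
                          exampleRnaToProtein, exampleOneToThree))
    # ['Met', 'Phe', 'Gly', 'Xaa']
```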
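Write a program that reads a comma-delimited file with sample information into a dictionary, and then reports the sample IDs whose value for a given property (for example pH, temperature or volume) lies within a given range.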
import os

def readSampleInformationFile(fileName):

    # Read in the sample information file in .csv (comma-delimited) format

    # Doublecheck if file exists
    if not os.path.exists(fileName):
        print("File {} does not exist!".format(fileName))
        return None

    # Open the file and read the information
    fileHandle = open(fileName)
    lines = fileHandle.readlines()
    fileHandle.close()

    # Now read the information. The first line has the header information which
    # we are going to use to create the dictionary!
    fileInfoDict = {}
    headerCols = lines[0].strip().split(',')

    # Now read in the information, use the first column as the key for the dictionary
    # Note that you could organise this differently by creating a dictionary with
    # the header names as keys, then a list of the values for each of the columns.
    for line in lines[1:]:

        line = line.strip()  # Remove newline characters
        cols = line.split(',')

        sampleId = int(cols[0])
        fileInfoDict[sampleId] = {}

        # Don't use the first column, is already the key!
        for i in range(1,len(headerCols)):
            valueName = headerCols[i]

            value = cols[i]
            if valueName in ('pH','temperature','volume'):
                value = float(value)

            fileInfoDict[sampleId][valueName] = value

    # Return the dictionary with the file information
    return fileInfoDict

def getSampleIdsForValueRange(fileInfoDict,valueName,lowValue,highValue):

    # Return the sample IDs that fit within the given value range for a kind of value

    # sorted() returns a new sorted list; in Python 3 the keys() view has no sort() method
    sampleIdList = sorted(fileInfoDict.keys())
    sampleIdsFound = []

    for sampleId in sampleIdList:
        currentValue = fileInfoDict[sampleId][valueName]

        if lowValue <= currentValue <= highValue:
            sampleIdsFound.append(sampleId)

    return sampleIdsFound

if __name__ == '__main__':

    fileInfoDict = readSampleInformationFile("sampleInfo.txt")

    print(getSampleIdsForValueRange(fileInfoDict,'pH',6.0,7.0))
    print(getSampleIdsForValueRange(fileInfoDict,'temperature',280,290))
    print(getSampleIdsForValueRange(fileInfoDict,'volume',200,220))
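The csv module from the standard library can take care of the splitting and the header handling. The sketch below builds the same kind of nested dictionary with csv.DictReader. The column name sampleId and the two data rows are invented for the example, since the layout of sampleInfo.txt is not shown here.

```python
import csv
import io


def readSampleInformation(fileHandle, keyColumn, floatColumns):
    # Build {key: {columnName: value}} from comma-delimited text with a header line
    fileInfoDict = {}
    for row in csv.DictReader(fileHandle):
        sampleId = int(row.pop(keyColumn))  # Remove the key column from the row
        for columnName in floatColumns:
            row[columnName] = float(row[columnName])
        fileInfoDict[sampleId] = dict(row)
    return fileInfoDict


if __name__ == '__main__':
    # Made-up data standing in for the real file
    exampleText = "sampleId,pH,temperature,volume\n1,6.5,285,210\n2,7.4,300,150\n"
    info = readSampleInformation(io.StringIO(exampleText), 'sampleId',
                                 ('pH', 'temperature', 'volume'))
    print(sorted(sampleId for sampleId in info if 6.0 <= info[sampleId]['pH'] <= 7.0))
```

For a real file you would pass an open file handle instead of the io.StringIO object.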