Python training exercise 7

From BITS wiki

Introduction

More often than not the data you need for your program will come from somewhere else - either from user input or a file. Especially for more complex data it becomes essential to be able to read in data files, do something with the data, and write out a new file with modified information or a set of analysis results.

Exercises

Reading a file

To read in a file you have to create a file handle. This is a sort of live connection to the file on disk that you can use to pull data from it. You create this connection by calling the built-in open() function.

Before we do this, download this fake PDB coordinate file for a 5-residue peptide and save it in the directory you are working in. Then create the new program below in the same directory - Python has to know where the file is in order to access it.

# Open the file
fileHandle = open("TestFile.pdb")
 
# Read all the lines in the file (as separated by a newline character), and store them in the lines list
# Each element in this list corresponds to one line of the file!
lines = fileHandle.readlines()
 
# Close the file
fileHandle.close()
 
# Print number of lines in the file
print(len(lines))
 
# Loop over the lines, and do some basic string manipulations
for line in lines:
 
  line = line.strip()  # Remove leading and trailing spaces/tabs/newlines
 
  # Only do something if it's not an empty line
  if line:
    cols = line.split()   # Split the line on whitespace; depending on the format the separator could be commas, ...
 
    # Now you can do many other things with the data in the file...

If all is well, the program prints 263, the number of lines in the file.
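The same read-strip-split pattern can also be written with a with statement, which closes the file handle automatically. The sketch below is only one possible variant; the function name read_columns is made up for this illustration and is not part of the exercise.

```python
def read_columns(path):
    """Return one list of columns for every non-empty line in the file.

    read_columns is a hypothetical helper name used for illustration only.
    """
    rows = []
    # The with statement closes the file for us, even if an error occurs
    with open(path) as fileHandle:
        # A file handle can be looped over directly, one line at a time
        for line in fileHandle:
            line = line.strip()  # Remove leading and trailing whitespace
            if line:             # Skip empty lines
                rows.append(line.split())
    return rows
```

Calling read_columns("TestFile.pdb") should then give a list in which each element holds the columns of one non-empty line of the file.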

Writing a file

Writing a file is very similar, except that you have to let Python know you are writing this time. Try this:

# Open the file for writing - by using the extra 'w' argument.
# Be careful - if the file exists already it will be overwritten without warning!
fileHandle = open("testFile.txt",'w') 
 
# Write a header line for the data we will be writing. Don't forget the newline at the end!!!
fileHandle.write("LineNumber Value Divided_five  Rest_divided_five\n")
 
# Create some data to write out
myData = list(range(50,100))
myDivider = 5
 
for dataIndex in range(len(myData)):
  myNumber = myData[dataIndex]
 
  divided = myNumber // myDivider  # Integer division: a plain / gives a float, which the {:5d} format rejects
  restDivided = myNumber % myDivider
 
  fileHandle.write("{:6d}    {:5d}    {:5d}        {:5d}\n".format(dataIndex+1,myNumber,divided,restDivided))
 
# Close the file
fileHandle.close()
 
# Print the number of data lines written to the file (the header line is not counted)
print(len(myData))

The file is written to the directory you're executing the program in - have a look!
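As a variation on the program above, the writing step can be wrapped in a function that uses a with statement, so the file is closed automatically. This is a sketch, not the reference solution: the function name write_division_table and its parameters are assumptions made for this example, while enumerate() and divmod() are standard Python built-ins.

```python
def write_division_table(path, data, divider):
    """Write a header line plus one formatted line per value in data.

    write_division_table is a hypothetical helper name used for illustration only.
    """
    # Opening with 'w' overwrites an existing file without warning
    with open(path, 'w') as fileHandle:
        fileHandle.write("LineNumber Value Divided_five  Rest_divided_five\n")
        # enumerate() gives the index and the value at the same time
        for dataIndex, myNumber in enumerate(data):
            # divmod() returns the integer quotient and the remainder in one call
            divided, restDivided = divmod(myNumber, divider)
            fileHandle.write("{:6d}    {:5d}    {:5d}        {:5d}\n".format(
                dataIndex + 1, myNumber, divided, restDivided))
```

For example, write_division_table("testFile.txt", list(range(50, 100)), 5) should produce a file with one header line followed by 50 data lines.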

Advanced file reading and interpretation exercise

Read in the TestFile.pdb atom coordinate file, print out the title of the file, and find all atoms that have coordinates closer than 2 angstrom to the (x,y,z) coordinate (-8.7,-7.7,4.7). Print out the model number, residue number, atom name and atom serial for each; the model is indicated by:

MODEL        1

lines, the atom coordinate information is in:

ATOM      1  N   ASP A   1     -10.341  -9.922   9.398  1.00  0.00           N

lines, where column 1 is always ATOM, column 2 is the atom serial, column 3 the atom name, column 4 the residue name, column 5 the chain code and column 6 the residue number, followed by the x, y and z coordinates in angstrom in columns 7, 8 and 9.
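To make the column layout concrete, here is a minimal sketch that splits an ATOM line on whitespace and converts the fields to the right types. The function name parse_atom_line and the dictionary keys are made up for this illustration. It assumes the columns are always separated by whitespace, as in this test file; real PDB files use fixed-width columns that can run together.

```python
def parse_atom_line(line):
    """Turn one ATOM line into a dictionary of typed fields.

    parse_atom_line and the key names are hypothetical, for illustration only.
    """
    cols = line.split()
    return {
        "serial": int(cols[1]),        # column 2: atom serial
        "name": cols[2],               # column 3: atom name
        "resName": cols[3],            # column 4: residue name
        "chain": cols[4],              # column 5: chain code
        "resNumber": int(cols[5]),     # column 6: residue number
        # columns 7, 8 and 9: x, y and z coordinates in angstrom
        "coord": (float(cols[6]), float(cols[7]), float(cols[8])),
    }


exampleLine = "ATOM      1  N   ASP A   1     -10.341  -9.922   9.398  1.00  0.00           N"
exampleAtom = parse_atom_line(exampleLine)
```

Note that list indices start at 0, so column 2 of the description is cols[1] in the code.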

Note that the distance between two coordinates is calculated as the square root of (x1-x2)²+(y1-y2)²+(z1-z2)².
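The distance formula translates almost directly into Python. The sketch below uses math.sqrt from the standard library; the function name distance and the variable names are chosen for this example only. It checks the atom from the example ATOM line against the reference coordinate given in the exercise.

```python
import math


def distance(coordA, coordB):
    """Return the distance between two (x, y, z) coordinates."""
    (x1, y1, z1), (x2, y2, z2) = coordA, coordB
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)


reference = (-8.7, -7.7, 4.7)        # Reference coordinate from the exercise
firstAtom = (-10.341, -9.922, 9.398)  # Coordinates from the example ATOM line

# True only if the atom lies closer than 2 angstrom to the reference
isClose = distance(firstAtom, reference) < 2.0
```

The atom in the example line is roughly 5.4 angstrom from the reference coordinate, so it would not be reported.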
