Manipulating Data and Files¶

About files¶

In this Section we will learn how to write and read data to and from files using Python.

We can then use this data to do useful things like plotting or calculations.

To use a file it has to be opened, and when finished it has to be closed.
While the file is open, it can either be read from or written to.

To open a file, we specify its name and indicate whether we want to read or write.

Files for Examples:¶

These examples and the later ones need some files to process. The files are in the Files folder, so make sure you download its contents if you are working on your own machine.

Make sure they are inside correctly named sub-folders on your hard disk.
Use the same capitalisation as the original for cross-platform compatibility (many operating systems are case-sensitive).

To find out where a Jupyter Notebook is running run the following command:¶

import os

here = os.getcwd() #get location of Current Working Directory
print(here)

You can then navigate to this folder using the file-browser on your operating system.

Organising Folders¶

When we create a new file by opening it and writing, the new file goes in the current working folder (also called a “directory”).
When we open a file for reading, Python also looks for it in the same place.

If we want to open a file somewhere else, we have to specify the path to the file, which is the name of the directory (or folder) where the file is located.

Usually we use relative paths, when we have a subfolder in the current working directory e.g.: Files/Work/mynotes.txt.
The directory above is ../ etc.

Use the same case (uppercase and lowercase letters) as the original names. This may not matter on Windows but will affect the script on other operating systems, and when the work is marked.

A full (“absolute”) Windows path might be "c:/Users/Nick/words.txt" or "c:\\Users\\Nick\\words.txt". Do NOT use absolute paths in shared or submitted files; use them only if the script will only ever run on a single machine.
Because backslashes are used to escape things like newlines and tabs, we need to write two backslashes in a string to get one real backslash \. Alternatively, we can use a forward slash instead (as in web addresses).

We cannot use / or \ as part of a filename; they are reserved as a delimiter between directory and filenames.
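As a minimal sketch of the path rules above, the standard os.path functions can build and inspect paths without typing the separators by hand. The path Files/Work/mynotes.txt is the illustrative one from the text; the file does not need to exist for this to run.

```python
import os

# Build the relative path from the text; os.path.join inserts the separator
# that the current operating system expects.
relative_path = os.path.join("Files", "Work", "mynotes.txt")
print(relative_path)

# A forward-slash path names the same location once it has been normalised.
forward_path = "Files/Work/mynotes.txt"
same_location = os.path.normpath(forward_path) == os.path.normpath(relative_path)
print(same_location)

# Convert the relative path to an absolute one, starting from the working folder.
absolute_path = os.path.abspath(relative_path)
print(absolute_path)
```

The absolute path printed here differs from machine to machine, which is why relative paths are the safer choice in shared scripts.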

Writing our First File¶

To open a file we use the open(NAME, MODE) function, which takes two arguments.
The first is the name of the file, and the second is called the mode.
Mode "w" means that we are opening the file for writing.

With mode "w", if there is no file named workfile.txt on the disk, it will be created.
If there already is one, it will be replaced by the file we are writing (be careful to not overwrite important files!).

myfile = open("workfile.txt", "w")
print(myfile)

The variable named myfile acts like a container holding all the contents copied to memory from the file on disk.
You can move information from the container a piece at a time, or all at once, until it is empty.

You can then use different methods on the file, using a dot as in: FILE.METHOD, and this makes changes to the myfile container.

The following line writes text into a file, using the write method. The file is written to the current working folder where the programme script is run from (see the section below on Using the os module for more on working folders).

myfile.write("This is a test...")

Try opening the new file from the main JupyterHub file menu or on your disk using the file browser.
- notice that it will be empty.
- This is because you haven’t yet saved the file to disk (like editing but not pressing save in a word processor).

To send any data to the file before we are finished we can use the flush method.

myfile.flush()

Now try reopening the file to see if the contents have appeared.

Using the close method will both save the data and close the file.

To write something other than a string, it needs to be converted to a string first:

value = 42
s = str(value)
myfile.write(s)
myfile.close()

Reopen the file and see the new contents.

Note: if we try to write something other than a string, or a variable that does not exist, we will get an error.
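The error mentioned in the note can be seen in a small sketch like the one below. The file name demo_write.txt is a made-up example, and the try/except block is only there so that the cell keeps running after the error.

```python
demo_name = "demo_write.txt"  # hypothetical file name, used only for this sketch

demofile = open(demo_name, "w")
try:
    demofile.write(42)  # an integer is not a string, so this raises a TypeError
except TypeError as err:
    print("Could not write:", err)

demofile.write(str(42))  # converting to a string first works
demofile.close()  # saves the data and closes the file

# Read the file back to check what was actually written
check = open(demo_name, "r")
written = check.read()
check.close()
print(written)
```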

The `with` statement¶

A safe way of writing to files is using the with statement and an indented block of code that does something to the file. This is safe because you do not need to specify close() at the end. The contents are written to the file whether the code exited normally or not. This means you do not lose any work in progress.

The general syntax for opening the file FILENAME and assigning it to a FILEHANDLE (like FILEHANDLE = open(FILENAME, MODE)) is:

with open(FILENAME, MODE) as FILEHANDLE:
    DO FUN STUFF
    FILEHANDLE.write()
    OTHER STUFF
    ...

Note in the example below the “triple quotes” allow us to use a multi-line string.

textlines="""A bit of text I want to write to a file.
1. The first point I want to make is this.
2. Next I want to tell you that..."""

fname="outfile.txt"

with open(fname,"w") as f:
    f.write(textlines)

Just to prove that the file has been written and closed properly, we read its contents back (reading is covered in the next subsection):

print(open(fname,"r").read())
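Another way to check that the with statement really closes the file is to look at the closed attribute of the file handle. This is a small sketch using a made-up file name, with_demo.txt.

```python
with open("with_demo.txt", "w") as f:  # hypothetical file name
    f.write("Some text\n")
    open_inside = not f.closed  # inside the block the file is still open

print(open_inside)
print(f.closed)  # after the block the file has been closed for us
```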

The write method can be used repeatedly to print more output to the file, as shown in lines 2, 3 and 4 below.
In bigger programs, lines 2-4 will usually be replaced by a loop that writes many more lines into the file.

with open("test.txt", "w") as mynewfile:
    mynewfile.write("# A bit of text I want to write to a file.")
    mynewfile.write("1. The first point I want to make is this.")
    mynewfile.write("2. Next I want to tell you that...")

Now look in your working folder and you will be able to open the new file test.txt.
- You will notice that the text is all on one line. This is because we didn’t tell Python to start a new line.
- We do this using the special character "\n" (backslash n), which stands for “newline”.
- This is like pressing enter or return on the keyboard. Other special characters exist, such as "\t" for tab.
Exercise: Try again, but put a \n at the end of each string.

#with opener
# YOUR CODE HERE
    # YOUR CODE HERE
    # YOUR CODE HERE
    # YOUR CODE HERE

Now reload the file in a browser to see it.

It should look like this:

# A bit of text I want to write to a file.
1. The first point I want to make is this.
2. Next I want to tell you that...


You will need the correct code run above for the following exercises to work!
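The idea of replacing repeated write calls with a loop, mentioned earlier, could look something like the sketch below. The list of strings and the file name loop_demo.txt are made up for illustration.

```python
# Hypothetical list of lines to write; in a real program these might be results
points = ["# Notes from a loop.", "1. First point.", "2. Second point."]

with open("loop_demo.txt", "w") as loopfile:
    for point in points:
        loopfile.write(point + "\n")  # add the newline character to each line

# Read the file back to count the lines that were written
with open("loop_demo.txt", "r") as loopfile:
    lines_written = loopfile.readlines()

print(len(lines_written))
```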

Reading from Files¶

If a file exists on our disk, we can open it for reading.
This time, the mode argument is "r" for read:

thefile = open("test.txt", "r")
print(thefile)

This returns the contents of the file in a “container” object.

However, if we try to open a file for reading that doesn’t exist, we get an error:

anotherfile = open("iamnotafile.txt", "r")

We can write some code to catch errors and prevent them from raising an error message (a so called “exception”).

# Try to open the file; if that fails, catch the exception and print a message instead
fname = "iamnotafile.txt"

try:
    open(fname, "r")
except OSError:
    print(f'File "{fname}" could not be opened.')
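The same pattern can be wrapped in a small function so that it can be reused. This is only a sketch: the function name read_text_or_none is invented here, and it catches the specific FileNotFoundError rather than every possible error.

```python
def read_text_or_none(filename):
    """Return the contents of filename, or None if the file does not exist."""
    try:
        with open(filename, "r") as handle:
            return handle.read()
    except FileNotFoundError:
        return None


result = read_text_or_none("iamnotafile.txt")  # assumes no such file exists
if result is None:
    print("The file could not be found.")
```

Catching a named exception means that other, unexpected problems still show up as errors rather than being silently hidden.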

File Methods¶

There are a variety of methods for reading data and text from files, depending on the format of the data and how we want to use it. We can either read the contents of the whole file at once, or scan it in line-by-line.

Reading the Whole File¶

The read method returns the entire contents of the file, emptying the whole “container” into the string called this at the same time:

thefile = open("test.txt", "r")
this = thefile.read()
print(this)

If we try to read it a second time, the file container called thefile is now empty, so will return an empty string:

that = thefile.read()
print(that)
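If the contents are needed a second time, the file does not have to be reopened. File objects have a seek method, and seek(0) moves the reading position back to the start. The sketch below uses a made-up file, seek_demo.txt, that it creates first.

```python
# Create a small file to read (hypothetical name and contents)
with open("seek_demo.txt", "w") as f:
    f.write("first line\nsecond line\n")

thefile = open("seek_demo.txt", "r")
first_read = thefile.read()   # the whole file
second_read = thefile.read()  # nothing left, so this is an empty string
thefile.seek(0)               # go back to the start of the file
third_read = thefile.read()   # the whole file again
thefile.close()

print(repr(second_read))
print(third_read == first_read)
```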

Reading Files a Line at a Time¶

Another file method is readline, which scans the contents line-by-line:

f = open("test.txt", "r")
print(0, f.readline())  # This will read the first line of the file.
print(1, f.readline())  # and next the second line
print(2, f.readline())  # and the last line
print(3, f.readline())  # the handle (container) is now empty...

The end="" argument prevents an extra newline being added:

f = open("test.txt", "r")
print(0, f.readline(), end="")  # This will read the first line of the file.
print(1, f.readline(), end="")  # and next the second line
print(2, f.readline(), end="")  # and the last line
print(3, f.readline(), end="")  # the handle (container) is now empty...

Iterative Methods for Scanning Files¶

A file can also be iterated over in the same way as a list.

Exercise: Put the previous code in a loop using for i in range(3): and changing the numbers for the counter i

# open the file
# YOUR CODE HERE
#FOR loop condition:
# YOUR CODE HERE
    #single print function
    # YOUR CODE HERE
#end of loop


Alternatively you can just iterate over the file contents directly:

the_file = open("test.txt", "r")

for each_line in the_file:
    print(each_line, end="")
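When a counter is needed while iterating directly over a file, the built-in enumerate function can supply it. A minimal sketch, using a made-up file enumerate_demo.txt that is created first:

```python
# Create a small file to scan (hypothetical name and contents)
with open("enumerate_demo.txt", "w") as f:
    f.write("first\nsecond\nthird\n")

numbered = []
with open("enumerate_demo.txt", "r") as the_file:
    # enumerate pairs each line with a counter that starts at 0
    for number, each_line in enumerate(the_file):
        print(number, each_line, end="")
        numbered.append((number, each_line))
```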

Reading a File in to a List of Lines¶

It is often useful to fetch data from a disk file and turn it into a list of lines.
We can then perform useful tasks on this list.
The readlines method in line 2 reads all the lines and returns a list of the strings.

f = open("test.txt", "r")
list_of_lines = f.readlines()
print(list_of_lines)

print("The last line is:\n", list_of_lines[-1])
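Each string returned by readlines still ends with its newline character. If that is unwanted, one possible approach is to strip it with a list comprehension, as in this sketch (the file lines_demo.txt is made up and created first):

```python
# Create a small file to read (hypothetical name and contents)
with open("lines_demo.txt", "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open("lines_demo.txt", "r") as f:
    raw_lines = f.readlines()  # each item still ends in "\n"

# Remove the trailing newline from every line
clean_lines = [line.rstrip("\n") for line in raw_lines]

print(raw_lines)
print(clean_lines)
```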

Sorting a File¶

This example sorts the lines of a file alphabetically (with capitalised words first).
We will use the friends.txt file we downloaded, which has a name per line.

Take a look at this file using a normal text editor.

First we read everything into a list of lines, then sort the list, and then write the sorted list back to another file:

thefile = open("Files/friends.txt", "r")
contentlist = thefile.readlines()
thefile.close()

print(contentlist)

Now we can sort the list alphanumerically:

contentlist.sort()  # sort is a list method, guess what it does before running this cell
list(contentlist) # spot the difference!

Now write back to another file:

with open("sortedfriends.txt", "w") as outfile:
    for next_entry in contentlist:
        outfile.write(next_entry)

Open the file using the browser to see its contents.
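The default sort puts capitalised words first because uppercase letters come before lowercase ones in the character ordering. If that is not wanted, a key function can be passed to the sort. The names below are made up for illustration and are not the contents of friends.txt.

```python
# Hypothetical list of lines, as readlines might return them
names = ["bob\n", "Alice\n", "carol\n", "Dave\n"]

default_order = sorted(names)                    # capitalised names first
ignoring_case = sorted(names, key=str.lower)     # compare lowercase versions

print(default_order)
print(ignoring_case)
```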

Exercise: File Reversing¶

Write a program that reads a file and writes out a new file with the lines in reversed order (i.e. the first line in the old file becomes the last one in the new file.)

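Hint: use reverse indexing with a range that counts down from -1 to -N. The stop value of a range is not included, so it must be one step past -N:

range(-1, -N-1, -1)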
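To see which indices a backwards range produces, it can help to try it on a short list first. Note that the stop value of range is excluded, so reaching index -N requires a stop of -N-1. The list of letters is a made-up example.

```python
letters = ["a", "b", "c", "d"]  # hypothetical stand-in for a list of lines
N = len(letters)

indices = list(range(-1, -N - 1, -1))  # counts -1, -2, ..., -N
print(indices)

backwards = [letters[i] for i in indices]
print(backwards)
```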

#open the input file
# YOUR CODE HERE

# read into a list (called contents) using readlines
# YOUR CODE HERE

N = len(contents) # may be needed later...

# open your new file to write to, using a WITH statement:
# YOUR CODE HERE
    ## EVERYTHING INDENTED IN THE WITH BLOCK ##
    # loop backwards FOR the LENgth of the contents list obtained above
    # YOUR CODE HERE
        # write lines back to your file 
        # YOUR CODE HERE
    ## END OF THE WITH BLOCK ##


Filtering a File¶

Many useful line-processing programs will read a text file line-at-a-time and do some minor processing as they write the lines to an output file. They might number the lines in the output file, or insert extra blank lines after every 60 lines to make it convenient for printing on sheets of paper, or extract some specific columns only from each line in the source file, or only print lines that contain a specific substring.
We call this kind of program a filter.

Here is a filter that copies one file to another, omitting any lines that begin with #.

Have a look at the contents of the file Files/intext.txt to see what it looks like before running the next cell.

oldfilename = "Files/intext.txt"
newfilename = "outtext.txt"

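# Open both files in one with statement so that both are closed automatically
with open(oldfilename, "r") as infile, open(newfilename, "w") as outfile:
    for text in infile:
        if text[0] != "#":  # keep only lines that do not start with #
            outfile.write(text)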

Look at outtext.txt to see what this did to the data.
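Another filter mentioned above is one that numbers the lines in the output file. A possible sketch is shown below; the file names and contents are invented, and the input file is created first so that the example runs on its own.

```python
# Create a small input file (hypothetical name and contents)
with open("filter_in_demo.txt", "w") as f:
    f.write("apples\npears\nplums\n")

with open("filter_in_demo.txt", "r") as infile, open("filter_out_demo.txt", "w") as outfile:
    # enumerate(..., start=1) counts the lines from 1 instead of 0
    for number, text in enumerate(infile, start=1):
        outfile.write(f"{number}: {text}")

with open("filter_out_demo.txt", "r") as f:
    numbered_text = f.read()

print(numbered_text)
```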

Methods such as sorting and filtering are also very useful on numerical data.

Manipulating Numerical Data¶

Numerical data can be read in, processed and written to files in the same way as text.
Instead of using text methods we simply perform mathematical operations on the numbers.

Import of data using NumPy¶

Numerical data manipulation is made easier by importing and using the numerical module NumPy.
NumPy also has special methods for saving and loading purely numerical array data to and from files.

The following programme loads a column of numbers from the file "Files/temps.txt" into an array using the ARRAY=np.loadtxt(FILENAME) function.

import numpy as np

data = np.loadtxt("Files/temps.txt")

print(data)

We can then take the mean (average) and standard deviation of the data using the .mean() and .std() array methods:

help(np.mean)

m = data.mean()
s = data.std()

print(f"mean = {m:.1f}, standard deviation = {s :.1f}")
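To check what .mean() and .std() calculate, the same steps can be tried on a tiny data set where the answer is easy to work out by hand. The three values and the file name temps_demo.txt are made up. By default NumPy's std divides by the number of values (the population standard deviation).

```python
import numpy as np

values = np.array([10.0, 12.0, 14.0])  # hypothetical temperatures
np.savetxt("temps_demo.txt", values)   # write them to a file, one per line

data = np.loadtxt("temps_demo.txt")    # read them back into an array
m = data.mean()
s = data.std()

# The same results worked out step by step
mean_by_hand = sum(data) / len(data)
std_by_hand = (sum((data - mean_by_hand) ** 2) / len(data)) ** 0.5

print(f"mean = {m:.1f}, standard deviation = {s:.3f}")
```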

Export of Data using Numpy¶

Continuing the last example we will now:

Calculate the difference of the individual temperature values from the mean.
Save these to a new file using the np.savetxt(FILENAME, DATA) method.

devs = data-m # a new array with the differences
print(devs) #print the array
np.savetxt("deviations.txt", devs)

Look at the contents of the file from the file browser.

Manipulating CSV files¶

Numerical columns of data are commonly stored as Comma Separated Value files with a .csv extension.

The file Files/weatherdata.csv contains hourly weather data in plain text, with values separated by commas (or other so called “delimiters”).

The file looks like this in a plain text editor:

Dry Bulb Temperature {C}, Dew Point Temperature {C}, Relative Humidity {%}, ...
2.50E+0, 1.2, 91, ...
5.70E+0, 3.5, 86, ...
7.90E+0, 4.7, 80, ...
... , ... , ...,

But when opened in Excel it appears as a table like this:

Dry Bulb Temperature {C}	Dew Point Temperature {C}	Relative Humidity {%}	…
2.5	1.2	91	…
5.7	3.5	86	…
7.9	4.7	80	…
8.7	5.6	81	…
8.9	5.8	81	…
…	…	…	…

The np.loadtxt() function can read the values into a data array if we tell it how the data is laid out in the .csv file.

The keyword argument delimiter=<SOME_STRING> tells NumPy how the data is separated (without this option it assumes whitespace).
The keyword argument skiprows=1 is used to ignore the first row, which is the non-numerical header.

import numpy as np

# delimiter=',' tells .loadtxt that the values are separated with commas (rather than spaces)
filedata = np.loadtxt("Files/weatherdata.csv", delimiter=",", skiprows=1)

print(filedata) #will show only the head (top rows) and tail (bottom rows) of a long array

Each row is an hour’s weather data, with the temperatures in the first column.
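The effect of delimiter and skiprows can be tried out on a very small file. The sketch below writes a made-up two-row CSV file, csv_demo.csv, with a header line, and then loads it.

```python
import numpy as np

# Hypothetical file contents: one header row, then two rows of three numbers
csv_text = "temp {C},dew point {C},humidity {%}\n2.5,1.2,91\n5.7,3.5,86\n"
with open("csv_demo.csv", "w") as f:
    f.write(csv_text)

# skiprows=1 ignores the header; delimiter="," splits each row at the commas
table = np.loadtxt("csv_demo.csv", delimiter=",", skiprows=1)

print(table)
print(table.shape)  # (number of rows, number of columns)
```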

Slicing data from an array¶

Numpy arrays can be sliced using ARRAYNAME[<STARTROW>:<ENDROW>, <STARTCOL>:<ENDCOL>] for example:

Taking the value in the 1st row (i=0) and third column (j=2):

filedata[0, 2]

91.0

Taking the value in the second row (i=1), from the second (j=1) to third (j=2) column:

filedata[1, 1:3] #note the end position is not included

array([ 3.5, 86. ])

Values before the fourth column (j=3) in the last row:

filedata[-1, :3]

array([ 4.3, 2. , 85. ])

The entire second column (j=1):

filedata[:, 1]

array([1.2, 3.5, 4.7, ..., 3.4, 2.8, 2. ])

Note that an empty value in an A:B specifier means “from the start” (if A is missing) or “to the end” (if B is missing) of the row or column.
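The slicing rules can be practised on a small array typed in by hand. The array below is a sketch that reuses the first three rows and columns shown in the table above; the name grid is made up.

```python
import numpy as np

grid = np.array([[2.5, 1.2, 91.0],
                 [5.7, 3.5, 86.0],
                 [7.9, 4.7, 80.0]])

single_value = grid[0, 2]    # first row, third column
part_of_row = grid[1, 1:3]   # second row, columns j=1 and j=2
last_row = grid[-1, :3]      # last row, columns before j=3
second_column = grid[:, 1]   # every row, column j=1
from_second_row = grid[1:, 0]  # first column, from row i=1 to the end

print(single_value)
print(part_of_row)
print(last_row)
print(second_column)
print(from_second_row)
```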

Example: Calculating the average temperature:¶

import numpy as np

filedata = np.loadtxt("Files/weatherdata.csv", delimiter=",", skiprows=1)
temperatures = filedata[:, 0] # take the first (temperature) column

average = temperatures.mean()
print(f"Average temperature: {average:.2f} degrees C")

Saving Numerical Only Data¶

We can instead write the output to a file using np.savetxt(FILENAME, DATA, <keyword=options>) using the keyword options:

delimiter = "," to use commas in the .csv file (if there are multiple columns)
fmt="%.1f" to format the numbers as floats with 1 decimal place (similar to the f-string formatting used above).

np.savetxt("temperatures.csv", temperatures, delimiter = ",", fmt="%.1f")

The file contents look like this:

2.5
5.7
7.9
.
.
.

Download it and open it in Excel
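The delimiter option only makes a visible difference when there is more than one column. A small sketch with a made-up two-column array and file name shows the commas appearing in the output:

```python
import numpy as np

# Hypothetical array with two rows and two columns
two_columns = np.array([[2.5, 1.2],
                        [5.7, 3.5]])

np.savetxt("two_columns_demo.csv", two_columns, delimiter=",", fmt="%.1f")

# Read the file back as plain text to see exactly what was written
with open("two_columns_demo.csv", "r") as f:
    saved_text = f.read()

print(saved_text)
```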

Exercise: Deviations from the average.¶

Load the weather data from the file.
Slice out the first column of temperatures.
Take the average value of the temperatures.
Subtract the average value from the temperatures array to give the deviations (\(d = T - \mu(T)\))
Save this back to a file called "deviations.csv" with 1dp floating point precision

import numpy as np

# load the weather file
# YOUR CODE HERE

# slice column 0
# YOUR CODE HERE

# obtain the mean
# YOUR CODE HERE

# calculate the deviations
# YOUR CODE HERE

# save back to a CSV text file
# YOUR CODE HERE

Opening the file should have the contents:

-7.7
-4.5
-2.3
...


Processing Multiple Files¶

Scripts can allow you to process many files in one go. You can split up a single file into many, join data from lots of files into one place or plot data to a range of figures in one go.

Using the `os` module.¶

A nice module for working with our Operating System is the os module.
This allows us to see/change our current working location as well as make new folders.
To view your current working directory use the os.getcwd() function:

import os
myWD = os.getcwd()

print(myWD)

The method .listdir(FOLDER) allows us to list the contents of a directory (FOLDER).

Note that the string "." refers to the current folder and ".." refers to the folder above it.

Try the following command:

contents = os.listdir(myWD)  # lists the contents of the working directory
print(contents)

os.listdir("Files") # this will only work if the `Files` folder exists
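The list returned by os.listdir contains plain strings, so it can be filtered like any other list. As a sketch, the code below picks out only the names ending in .txt from the current folder. The file listdir_demo.txt is made up and created first so that there is at least one match.

```python
import os

# Make sure at least one text file exists (hypothetical name)
with open("listdir_demo.txt", "w") as f:
    f.write("demo\n")

all_names = os.listdir(".")  # "." means the current working folder

# Keep only the names that end in .txt, in alphabetical order
text_files = sorted(name for name in all_names if name.endswith(".txt"))

print(text_files)
```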

The os.mkdir(<FOLDERNAME>) function tries to create a new directory, named with whatever string you replace <FOLDERNAME> with.
We use try: and except: to catch and ignore the error raised if the folder already exists.
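An alternative to the try and except approach is the os.makedirs function, which accepts an exist_ok=True option so that an existing folder is not treated as an error. This is a sketch with a made-up folder name, MkdirDemo.

```python
import os

folder = "MkdirDemo"  # hypothetical folder name

os.makedirs(folder, exist_ok=True)  # creates the folder
os.makedirs(folder, exist_ok=True)  # second call does nothing and raises no error

print(os.path.isdir(folder))
```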

Weekly Weather files:¶

import numpy as np
import os

filedata = np.loadtxt("Files/weatherdata.csv", delimiter=",", skiprows=1)
temperatures = filedata[:, 0]

folder = "Weather/"
try:
    os.mkdir(folder)  # try to make the new folder
except FileExistsError:
    pass  # the folder already exists, so move on

hours_per_week = 24*7

# count for 52 weeks:
for i in range(52):
    start_hour = i*hours_per_week #takes values: 0, hours_per_week, 2*hours_per_week, ...
    end_hour = start_hour + hours_per_week
    weekly_temperatures = temperatures[start_hour:end_hour] #slice out the hours for that week
    
    # make a two digit week number 01, 02, 03, ..., 50, 51, 52
    week_number = i+1
    week_string = str(week_number).zfill(2) # fill with leading zeros to make all two digits
    
    # create a new file
    newfilename = f"temp_week{week_string}.txt"
    filepath = folder+newfilename
    np.savetxt(filepath, weekly_temperatures, fmt="%.1f")

The Weather folder should now contain 52 files, each with a week’s worth of hourly temperature data as a single column.

wfiles = os.listdir("Weather")
print(wfiles)
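The reverse process, reading a set of files back in a loop, follows the same pattern. The sketch below is self-contained: it first writes three small files of made-up placeholder values into a hypothetical WeekDemo folder, then loads each one and stores its mean in a dictionary.

```python
import os
import numpy as np

folder = "WeekDemo"  # hypothetical folder name
os.makedirs(folder, exist_ok=True)

# Write three small files of placeholder values (not real weather data)
for week_number in range(1, 4):
    placeholder = np.full(3, float(week_number))
    filename = f"temp_week{week_number:02d}.txt"  # :02d pads with a leading zero
    np.savetxt(os.path.join(folder, filename), placeholder, fmt="%.1f")

# Loop over the files in the folder and take the mean of each one
weekly_means = {}
for name in sorted(os.listdir(folder)):
    if name.startswith("temp_week") and name.endswith(".txt"):
        values = np.loadtxt(os.path.join(folder, name))
        weekly_means[name] = values.mean()

for name, mean_value in weekly_means.items():
    print(f"{name}: mean = {mean_value:.1f}")
```

The format code :02d inside the f-string gives the same two-digit result as str(week_number).zfill(2).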

month	day	hour	gas
1	1	1	0.746
1	1	2	0.672
1	1	3	0.075

AR10366 (Computer Applications): Programming in Python
