Complex Data Structures in Python

Most programming languages offer the facility for making large, compound data structures. For example C, Pascal, Perl and Python. A few simple data types are provided, out of which larger structures can be built. A programmer can store data in a whatever way is most suitable for the application.

Often, a simple list or dictionary will be enough. Read the data in, process it, and print the results out. Perfect. But for a larger or more useful application, more data, and more kinds of data, will need to be stored and processed at the same time.

This article demonstrates the building of a complex data structure in Python. Note: it is not about classes, or object oriented programming, just the syntax for handling complex data structures, made up of lists, dictionaries and simple strings and integers.

Starting with a Simple Data Structure

I’m running a second hand car business. There are some cars on the lot and I want to store information about them. Let’s start by creating a simple, empty list:

cars = []

Great. Okay, let’s add a car. We will add it as a dictionary.

cars.append ({"make": "alfa", "colour": "red"})
print cars

Output:

[{'make': 'alfa', 'colour': 'red'}]

We used the append function to add an element to the list. That element was a dictionary. The list is represented by the outer square brackets, and curly braces enclose the inner dictionary.

Extending the List

What we have is a list of dictionaries. (Well, only one dictionary so far). Now let’s add some more dictionaries to describe more cars:

cars.append ({"make": "land rover", "colour": "green"})
cars.append ({"make": "aston", "colour": "gold"})
print cars

Output:

[{'make': 'alfa', 'colour': 'red'}, {'make': 'land rover', 'colour': 'green'}, {'make': 'aston', 'colour': 'gold'}]

Pretty Printing

That’s a list containing 3 dictionaries. To make it look better, the Python pprint module will be used. Adding a few lines, our whole program now appears like this:

#!/usr/bin/python

import pprint

cars = []

cars.append ({"make": "alfa", "colour": "red"})
cars.append ({"make": "land rover", "colour": "green"})
cars.append ({"make": "aston", "colour": "gold"})

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(cars)

and the output is:

[   {   'colour': 'red', 'make': 'alfa'},
    {   'colour': 'green', 'make': 'land rover'},
    {   'colour': 'gold', 'make': 'aston'}]

That looks better. Out list of dictionaries is clearly illustrated.

Add More Data to Existing Dictionaries

Now these are used cars. Each car has a number of previous owners. A number field could be added for each car, showing how many people have owned it. We will use a straightforward assignment to do it. Our first car was an Alfa and being first, it is element number zero in the main list.

cars[0]["ownercount"] = 2

To explain. cars[0] is the Alfa’s dictionary. Using the syntax above, we have added a new key and value to the dictionary. Add the line and re-running the program, we get:

[   {   'colour': 'red', 'make': 'alfa', 'ownercount': 2},
    {   'colour': 'green', 'make': 'land rover'},
    {   'colour': 'gold', 'make': 'aston'}]

To continue, the Land Rover, cars[1], has had one owner, while the Aston has had 3. They can be added as follows:

cars[1]["ownercount"] = 1
cars[2]["ownercount"] = 3

our the output becomes:

[   {   'colour': 'red', 'make': 'alfa', 'ownercount': 2},
    {   'colour': 'green', 'make': 'land rover', 'ownercount': 1},
    {   'colour': 'gold', 'make': 'aston', 'ownercount': 3}]

Add a sub List

Let’s add some information about those previous owners of our cars. For each car, we can have a list containing all owners. The Alfa had two previous owners. We can add the list like this:

cars[0]["owners"] = ["tom", "james"]

which makes the structure look like this:

[   {   'colour': 'red',
        'make': 'alfa',
        'ownercount': 2,
        'owners': ['tom', 'james']},
    {   'colour': 'green', 'make': 'land rover', 'ownercount': 1},
    {   'colour': 'gold', 'make': 'aston', 'ownercount': 3}]

A new field has been added to the dictionary for the Alfa, with the key “owners”. The value is not a simple variable but a list, containing the names of tom and james. Now do the same for our other two cars. Alison bought the Land Rover new, while the Aston (car 3) is a bit older, and has had 3 lucky owners:

cars[1]["owners"] = ["alison"]
cars[2]["owners"] = ["tom", "james", "jake"]

The effect of these assignments is reflected in the program output:

[   {   'colour': 'red',
        'make': 'alfa',
        'ownercount': 2,
        'owners': ['tom', 'james']},
    {   'colour': 'green',
        'make': 'land rover',
        'ownercount': 1,
        'owners': ['alison']},
    {   'colour': 'gold',
        'make': 'aston',
        'ownercount': 3,
        'owners': ['tom', 'james', 'jake']}]

The list of previous owners is the deepest bit of the structure so far, having the form list -> dictionary -> list. Or as C/Perl programmers might have it, an array of hashes of an array.

The Program So Far

The code is now:

#!/usr/bin/python

import pprint

cars = []

cars.append ({"make": "alfa", "colour": "red"})
cars.append ({"make": "land rover", "colour": "green"})
cars.append ({"make": "aston", "colour": "gold"})

cars[0]["ownercount"] = 2
cars[1]["ownercount"] = 1
cars[2]["ownercount"] = 3

cars[0]["owners"] = ["tom", "james"]
cars[1]["owners"] = ["alison"]
cars[2]["owners"] = ["tom", "james", "jake"]

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(cars)

A List of Dictionaries of a List of Dictionaries

Nice. But we can go on building. Each car has a maintenance record, a very important thing for a second hand car. Our data structure can sprout a new limb to deal with it. We will have a list of maintenance tasks for each car, and each task will be a dictionary describing the properties of the maintenance event. We are going to add another list of dictionaries.

The following assignment for car[0], the Alfa, will instantiate the list and populate its first element with a dictionary:

cars[0]["history"] = [ { "desc": "new clutch", "cost": "2314", "date": "4/3/2012" } ]

Note that cars[0][“history”] is a new node in the cars[0] dictionary. On the right hand side of the assignment, the square brackets bring about a new list and the inner braces define its first element as a new record, or dictionary.

Now use the same technique with car 1 (the Land Rover), but indent the code this time, for added readability:

cars[1]["history"] = [ { "desc": "cambelt replacement",
                         "cost": "1688",
                         "date": "5/5/2014" } ]

For car 2 (the Aston), we will use a slightly alternative method, just to show it that it all turns out the same:

cars[2]["history"] = []
cars[2]["history"].append ( { "desc": "new engine",
                              "cost": "9599",
                              "date": "30/8/2010" } )

Here, for the Aston, the (empty) maintenance list was instantiated with the first statement. An element was added to it with the append function, being a dictionary containing the details of that first maintenance event. £9,599 seems dear for a new engine, but it is
a V12.

There was some minor trouble with the Aston’s steering a year later. We can use append again:

cars[2]["history"].append ( { "desc": "wheel alignment",
                              "cost": "125",
                              "date": "4/9/2011" } )

Addressing the Structure

Cool. We have a list of dictionaries of lists of dictionaries. And we can address it as such. Check the date of the Aston’s wheel alignment:

print "Car 2 had a wheel alignment on", cars[2]["history"][1]["date"]

Output:

Car 2 had a wheel alignment on 4/9/2011

We could also have used the “-1” index on the history array to get the last maintenance event, which in this case amounts to the same thing:

print "Car 2 last maintenance was a wheel alignment on", cars[2]["history"][-1]["date"]

Car 2 last maintenance was a wheel alignment on 4/9/2011

Local References

Any node in the structure can be abstracted into another variable, making it less onerous to work with the data. For example:

last_alfa_maint = cars[0]["history"][-1]
print "Last Alfa maintenance was a", last_alfa_maint["desc"], "on", last_alfa_maint["date"]

Output:

Last Alfa maintenance was a new clutch on 4/3/2012

last_alfa_maint acts like a reference in Perl. It can of course be used to write data into the structure too. Here, the date is altered from 4/3/2012 to 5/3/2012:

last_alfa_maint["date"] = "5/3/2012"
print "Last Alfa maintenance was a", last_alfa_maint["desc"], "on", last_alfa_maint["date"]

Output:

Last Alfa maintenance was a new clutch on 4/3/2012
Last Alfa maintenance was a new clutch on 5/3/2012

Code Review

Our finished example program:

#!/usr/bin/python

import sys
import pprint


cars = []

cars.append ({"make": "alfa", "colour": "red"})
print cars

cars.append ({"make": "land rover", "colour": "green"})
cars.append ({"make": "aston", "colour": "gold"})

cars[0]["ownercount"] = 2
cars[1]["ownercount"] = 1
cars[2]["ownercount"] = 3

cars[0]["owners"] = ["tom", "james"]
cars[1]["owners"] = ["alison"]
cars[2]["owners"] = ["tom", "james", "jake"]

cars[0]["history"] = [ { "desc": "new clutch", "cost": "2314", "date": "4/3/2012" } ]
cars[1]["history"] = [ { "desc": "cambelt replacement",
                         "cost": "1688",
                         "date": "5/5/2014" } ]

cars[2]["history"] = []
cars[2]["history"].append ( { "desc": "new engine",
                              "cost": "9599",
                              "date": "30/8/2010" } )

cars[2]["history"].append ( { "desc": "wheel alignment",
                              "cost": "125",
                              "date": "4/9/2011" } )


pp = pprint.PrettyPrinter(indent=4)
pp.pprint(cars)

print "Car 2 had a wheel alignment on", cars[2]["history"][1]["date"]
print "Car 2 last maintenance was a wheel alignment on", cars[2]["history"][-1]["date"]
last_alfa_maint = cars[0]["history"][-1]
print "Last Alfa maintenance was a", last_alfa_maint["desc"], "on", last_alfa_maint["date"]
last_alfa_maint["date"] = "5/3/2012"
print "Last Alfa maintenance was a", last_alfa_maint["desc"], "on", last_alfa_maint["date"]

And the output:

[   {   'colour': 'red',
        'history': [   {   'cost': '2314',
                           'date': '4/3/2012',
                           'desc': 'new clutch'}],
        'make': 'alfa',
        'ownercount': 2,
        'owners': ['tom', 'james']},
    {   'colour': 'green',
        'history': [   {   'cost': '1688',
                           'date': '5/5/2014',
                           'desc': 'cambelt replacement'}],
        'make': 'land rover',
        'ownercount': 1,
        'owners': ['alison']},
    {   'colour': 'gold',
        'history': [   {   'cost': '9599',
                           'date': '30/8/2010',
                           'desc': 'new engine'},
                       {   'cost': '125',
                           'date': '4/9/2011',
                           'desc': 'wheel alignment'}],
        'make': 'aston',
        'ownercount': 3,
        'owners': ['tom', 'james', 'jake']}]
Car 2 had a wheel alignment on 4/9/2011
Car 2 last maintenance was a wheel alignment on 4/9/2011
Last Alfa maintenance was a new clutch on 4/3/2012
Last Alfa maintenance was a new clutch on 5/3/2012

Conclusion

Python supports custom data structures just as do other languages like Perl, C and Pascal. The syntax is a bit unexpected for Perl programmers, and perhaps strange for people new to programming.

Note that there is no “autovivification” like there is in Perl. In Python, Dictionaries (hashes) and lists (arrays) must be explicitly instantiated before they are populated. On the upside, this prevents errors arising with structure nodes being unexpectedly created. Eg. just testing a non-existant structure node in Perl can “accidentally” create the node, and extensive “exists()” tests are required to avoid doing just that while walking a structure.

2 thoughts on “Complex Data Structures in Python

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.