ICS 32 Winter 2022
Python Background Notes: Sets and Dictionaries
Why tuples and lists aren't enough
Thus far, we've seen two different data structures that are built into Python: tuples and lists. At a quick glance, they seem to be essentially the same thing, because they each store a collection of elements where the order of those elements is considered relevant. The key difference, though, lies in how flexible their structure is:
Why tuples and lists are useful for different things has mostly to do with a difference in their mutability. Tuples are set in stone once you create them; they are what they are. Lists can be mutated during their lifetime.
So, ultimately, there's value in having both tuples and lists available to us, because they each fit a different kind of problem. But they aren't enough, because not all problems require storing an ordered sequence of elements. For differently-shaped problems, you'd need different structures. Fortunately, Python provides some additional choices, fitting additional commonly-occurring problems very nicely.
Sets
You've likely been introduced to the mathematical concept of sets at some point. In mathematics, a set is a collection of zero or more unique values. What's important about a set is what's in it and what's not in it; there is no meaningful notion of ordering involved. For example, the set {1, 2, 3} is considered to be equivalent to the set {3, 1, 2}, because they both contain the same collection of values.
As it turns out, sets aren't just a mathematical curiosity. Sometimes, you'll want your program to manage a collection of objects with the same characteristics:
When you have a problem that has those characteristics, a set provides a nice way to solve it. Obviously, not all problems have those characteristics, but some certainly do. If, for example, you were implementing a program with a graphical user interface, which contained a collection of buttons that could each be toggled on or off, and you wanted to know which buttons had been toggled on, you might store them in a set. Why? Because this scenario fits those two characteristics:
How to create and use sets
Sets are actually pretty simple to create and use in Python, though there is a bit of additional syntax we'll need to learn. We've seen previously that lists are denoted, literally, by being surrounded by the bracket characters '[' and ']', with their elements separated inside by commas. Sets are written similarly, except they're surrounded instead by the curly-brace characters '{' and '}'.
>>> s = {'a', 'c', 'e'} >>> type(s) <class 'set'> >>> s {'e', 'a', 'c'}
One thing to note, right away, is that s was displayed to us differently from the way we wrote it. When we wrote it, we listed the strings alphabetically; when it was displayed to us, it was displayed in a different order. There are a couple of things to know about sets:
Just because sets aren't stored in any kind of order doesn't make them useless; it just means that you can't count on anything having to do with that ordering. But you can nonetheless ask the key questions you'd like to ask. For example, checking whether an element is in a set can be done using the in operator.
>>> 'e' in s True >>> 'f' in s False
(It should be noted that you can use this same in operator on lists, as well, though you might expect it to run a fair bit faster on a large set than on a large list. In ICS 46, we'll talk in more detail about why that is, too.)
Note, too, that the syntax for creating an empty set isn't what you might think. The syntax {} is legal in Python, but it builds something else (which we'll see a little later) called a dictionary.
>>> empty_set = {} >>> type(empty_set) <class 'dict'>
All is not lost, however; you can create an empty set, by using the set constructor.
>>> empty_set = set() >>> type(empty_set) <class 'set'>
Sets can be constructed from sequences, as well, which means all of the following examples are ways to construct sets. Notice, in every case, that duplicates are (intentionally) dropped, because sets cannot contain duplicate elements.
>>> set([1, 2, 1, 1, 2, 1, 2, 3]) {1, 2, 3} >>> set((1, 2, 3, 1, 2, 3)) {1, 2, 3} >>> set(range(10)) {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Sets can also be combined with other sets in the ways you've seen in mathematics before: There is a notion of union, intersection, set difference, and symmetric difference. The only tricky part is recognizing the operators.
Putting all of these together leads to the following examples.
>>> s1 = {1, 3, 5, 7, 9, 11} >>> s2 = {2, 5, 8, 11, 14} >>> s1 | s2 {1, 2, 3, 5, 7, 8, 9, 11, 14} >>> s1 & s2 {11, 5} >>> s1 - s2 {1, 3, 9, 7} >>> s1 ^ s2 {1, 2, 3, 7, 8, 9, 14}
(Note, again, that the order in which you see the elements in each resulting set may vary if you run these same expressions in the Python shell. There are essentially no guarantees about the visible ordering of elements in a set when it gets printed in the shell.)
Like lists, sets are mutable, though the only operators we've seen so far have the characteristic that they build new sets. But not all of a set's operators do. For example, there are "assigning" versions of the operators above, which modify the set on the left, similar to how the += opearator mutates a list (but the + operator does not).
>>> s = set() >>> s |= {3, 4} >>> s {3, 4} >>> s |= {5} >>> s {3, 4, 5}
Still, that's a pretty clunky syntax for adding an element to an existing set, so there is also an add method that does that job more readably, as well as an update method, which adds a sequence of elements to a set.
>>> s.add(6) >>> s {3, 4, 5, 6} >>> s.update([7, 8, 9]) >>> s {3, 4, 5, 6, 7, 8, 9}
Whether or not other operations that you've seen on lists are supported on sets is mainly a matter of whether they're sensible when used on a set. Indexing, for example, is not supported.
>>> s = {1, 2, 3} >>> s[0] Traceback (most recent call last): File "<pyshell#1>", line 1, in <module> s[0] TypeError: 'set' object does not support indexing
This is a sensible restriction, because indexing this way implies that there's a definitive order (i.e., one of them is definitively the first one in the set).
On the other hand, you can iterate over a set using a for loop.
>>> for x in s: print(x) 1 2 3
Though it should be noted that you won't be guaranteed that the iteration will happen in any particular order. Note, too, that this means anything else that uses iteration will work as you would expect, with the same caveat about ordering.
>>> list(s) [1, 2, 3]
Dictionaries
Sets can be handy, but they are somewhat limited in their usefulness. There are problems that have set's "shape," but, in practice, there are a lot of data-storage problems that don't. However, we can extend the idea of a set a little bit and end up with something that has even broader usefulness.
In a dictionary, we store a collection of unique keys, each one associated with a value. The uniqueness of the keys in a dictionary means that the collection of keys, conceptually, forms a set. (This also means that the keys are not ordered, similar to how they aren't ordered in a set.) But the important distiction is that you can know something about each key besides just the key; by being able to associate a value with a key, you have a tool that you can use in different circumstances for which a set would be a poor fit.
How to create and use dictionaries
Creating a dictionary is generally done using the built-in function dict() and passing it no arguments. (Alternatively, the syntax {} is technically legal, though a little bit confusing, because you have to remember that the result is an empty dictionary, rather than an empty set. Clearer is usually better than shorter.)
>>> d = dict() >>> type(d) <class 'dict'> >>> d {}
Dictionaries are mutable, the way that lists and sets are; the expectation is that you should be able to use them to manipulate a collection of data that changes while your program runs. Mutating them is a matter of changing the value that is associated with some key, which you can do using an operation that looks very similar to indexing a list. The difference is that the index you pass it isn't a numeric index, necessarily; it's the key whose value you want to update.
>>> d['b'] = 3 >>> d['b'] 3 >>> d['b'] += 1 >>> d['b'] 4
There is wide latitude in which is allowed as a key in a dictionary; more or less anything that is immutable will do. (The reason why immutability of keys is such a big deal is that you don't want changes outside of the dictionary to suddenly have an impact of the set of keys within that dictionary.)
You can have as many different keys in the dictionary as you want, but each key can only appear once; any subsequent attempt to set its value replaces the value that's already there.
>>> d['b'] = 9 >>> d['q'] = -5 >>> d {'b': 9, 'q': -5}
You can see, also, that a dictionary's representation is similar to the representation of a set, except that each key is followed by a colon and a value. Like sets, the order of the keys is considered irrelevant, and there's no guarantees about what order you'll see them in. Note, too, that this is also legal syntax for writing dictionary literals.
>>> d2 = {'a': 1, 'b': 2, 'c': 3} >>> d2 {'a': 1, 'b': 2, 'c': 3}
Keys can be removed from dictionaries similar to how you remove values from other collections: by using the del operator.
>>> del d2['b'] >>> d2 {'a': 1, 'c': 3}
You can check for the presence of a key similar to the way you check for the presence of an element in a set, by using the in operator.
>>> 'c' in d2 True >>> 'b' in d2 False
Dictionaries can also be iterated, though you do need to be aware of what you get when you do so. When you iterate a dictionary, all you get back are the keys.
>>> for x in d2: print(x) a c
However, you can also ask a dictionary for all of its keys, all of its values, or all of its items.
>>> for x in d2.keys(): print(x) a c >>> for x in d2.values(): print(x) 1 3 >>> for x in d2.items(): print(x) ('a', 1) ('c', 3)
Notice that the latter of these, the items() method, returns both each key and its associated value. The natural way to return two values is a tuple, so that's what Python returns.
Example: Calculating the occurrences of characters in a string
Suppose that we wanted to write a function that takes a string as a parameter and can tell you how many time each different, unique character occurs in that string. So, for example, if you called this function with the argument 'BOO BEAR', then you'd expect to see something that told you the following:
You might call this kind of information a mapping, similar to a mathematical function. The order of the characters in our answer is irrelevant — except maybe from a standpoint of how we might display it, but that's a user interface problem, rather than a problem that this function should solve. All we want to know is what characters occur and, for each of those, how many times they occur.
Dictionaries are a perfect way to communicate this kind of information. The keys would be the characters; the values would be the number of times each character appears. There would never be more than one count for a particular character, so the fact that keys are unique is a good thing, rather than a limitation. We would only bother to store information for the characters that appeared at all; if a character doesn't appear, it will not be part of our output.
The following is one way you could write that function.
def count_chars(s: str) -> dict: # Start with an empty dictionary counts = dict() # Iterate through the characters of the string. For characters that # we've seen before, increment the count we already have. For # newly-seen characters, add them to the dictionary with a count of 1. for c in s: if c in counts: counts[c] += 1 else: counts[c] = 1 return counts