Dictionaries

Here are the bare bones. I will demonstrate dictionaries in class. The next lecture will include many scripts and functions that we can write to process dictionaries.

As with lists, the reality of understanding dictionaries is understanding the basic operations that we can perform on them. There are fewer operations that we can perform on dictionaries than on lists, but their meanings are a bit more subtle.

Dictionaries are like generalized lists: lists associate indexes with values, with the indexes ranging from 0 to len(...)-1; dictionaries (the dict type) associate arbitrary keys (not just integers) with values. Often the keys are strings and often the values are ints, strings, or lists/tuples.

The three major categories of operations on dictionaries are
  (1) setting the value associated with a key
  (2) examining the value associated with a key (and sometimes mutating it)
  (3) iterating through a dictionary (there are three ways: by keys, by values, and by items; items are 2-tuples consisting of a key and its value)

Dictionaries are not indexed by position (like strings/lists/tuples), so there is no meaningful order in which they are iterated over. (Since Python 3.7, iteration follows the order in which keys were first inserted, but that order is rarely the one we want.) We will learn about the sorted function, which we can use to iterate through dictionaries in a special order.

Like lists, dictionaries are mutable data structures. Sometimes dictionaries are called maps, because they embody how to map a key to its associated value.

Dictionaries have literals: any number of key:value pairs (0 or more, with the key and its associated value separated by a colon), with different key:value pairs separated by commas, all in braces; the empty dictionary is written {}. Note that a key/value pair, stored as a tuple, is called an "item" (see the iteration over .items() in a dictionary). Here are examples of dictionary literals.
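Here is a small sketch contrasting the two kinds of association described above; the names coins_list and coins_dict are hypothetical, chosen only for illustration. A list maps positions to values, a dict maps keys to values, and both can be mutated.

```python
# A list associates the indexes 0..len-1 with its values
coins_list = [1, 5, 10, 25]
# A dict associates arbitrary (immutable) keys with its values
coins_dict = {'penny': 1, 'nickel': 5, 'dime': 10, 'quarter': 25}

print(coins_list[3])            # index 3 -> 25
print(coins_dict['quarter'])    # key 'quarter' -> 25

# Both are mutable: we can change the value associated with an index/key
coins_list[0] = 100
coins_dict['penny'] = 100
print(coins_list)   # [100, 5, 10, 25]
print(coins_dict)   # {'penny': 100, 'nickel': 5, 'dime': 10, 'quarter': 25}

# An item is a (key, value) 2-tuple
print(list(coins_dict.items())[0])  # ('penny', 100)
```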
  a = {'a':1, 'b':2, 'c':3}
      is a dictionary of str:int associations
  b = {1:'a', 2:'b', 3:'c'}
      is a dictionary of int:str associations
  c = {'penny':1, 'nickel':5, 'dime':10, 'quarter':25}
      like a, but useful information
  d = {'bob':['ICS-31','MATH-2A','ICS-6B'], 'mary':['ICS-31','BIO-9','ICS-6D']}
      is a dictionary of str:list[str]
  e = {'even': {0:'even',1:'odd'}, 'odd': {0:'odd',1:'even'}}
      is a dictionary whose values are sub-dictionaries (just as we can have lists whose values are sub-lists). We describe this annotation as {str:{int:str}}.
  f = {'a':'str', 2:'int', (0,0):'tuple'}
      is a dictionary whose keys are different types; we cannot use lists as KEYS in dictionaries, but we can use tuples. The difference is mutability: all keys must be immutable (recall strings and ints are immutable, so we can use them as keys); more on this later.

Dictionary operations:

(1) len: we can compute the length of a dictionary (the number of key:value pairs at the top level)
    len(a) is 3; len(b) is 3; len(c) is 4; len(d) is 2; len(e) is 2

(2) Indexing: we can refer to each value in a dictionary by its key: we index a dictionary by a key, computing its value:
    a['a'] is 1; b[3] is 'c'; c['quarter'] is 25;
    d['bob'] is ['ICS-31','MATH-2A','ICS-6B'], and note that d['bob'][0] is 'ICS-31';
    e['even'][1] is 'odd'; f[(0,0)] is 'tuple'

Note the asymmetry: given a key we can easily find its value, but given a value we cannot easily find its key: in fact, although keys are unique, the same value may be associated with multiple different keys: {'alex':24, 'jessie':24}.

Note that c['peso'] raises an exception, KeyError: 'peso', because there is no value in the dictionary c associated with 'peso'. Another way to say this is that there is no key 'peso' in this dictionary.
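The following sketch exercises the literals and operations above: len, indexing (including nested indexing), the KeyError for a missing key, and the rule that a list cannot be a key. The last lines show one way to handle the key/value asymmetry: to find every key with a given value, we must scan all the items (ages is the two-entry example from the text).

```python
c = {'penny': 1, 'nickel': 5, 'dime': 10, 'quarter': 25}
d = {'bob': ['ICS-31', 'MATH-2A', 'ICS-6B'], 'mary': ['ICS-31', 'BIO-9', 'ICS-6D']}
e = {'even': {0: 'even', 1: 'odd'}, 'odd': {0: 'odd', 1: 'even'}}
f = {'a': 'str', 2: 'int', (0, 0): 'tuple'}

print(len(c), len(d), len(e), len(f))   # 4 2 2 3
print(d['bob'][0])                      # ICS-31
print(e['even'][1])                     # odd
print(f[(0, 0)])                        # tuple

# Looking up a missing key raises KeyError
try:
    c['peso']
    peso_found = True
except KeyError as exc:
    peso_found = False
    print('KeyError:', exc)             # KeyError: 'peso'

# A list is mutable, so it cannot be a key (Python raises TypeError)
try:
    bad = {[0, 0]: 'list'}
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)                      # False

# Reverse lookup: scan every item to find all keys with a given value
ages = {'alex': 24, 'jessie': 24}
same_age = [k for k, v in ages.items() if v == 24]
print(same_age)                         # ['alex', 'jessie']
```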
(3) No Slicing

(4) Checking containment: the in/not in operators
These operators work on the KEYS in a dictionary:
    'a' in a is True; 'a' in b is False, but 3 in b is True; 'peso' in c is False;
    'bob' in d is True; 0 in e['even'] is True; 'int' in f is False

Sometimes we will see code like the following: it checks whether a key (x) is in a dictionary (d) before using the key as an index, to ensure that no exception is raised.
    if x in d:
        print(d[x])

(5) No Catenation

(6) No Multiplication

(7) Iterability: there are three ways to iterate through a dictionary: by its keys, by its values, and by its items ((key,value) 2-tuples). Each produces len(...) values, but we should not depend on their order.
    for k in d:  or  for k in d.keys():   produce all top-level keys in d
    for v in d.values():                  produce all top-level values in d
    for kv in d.items():                  produce all top-level (key,value) pairs in d

We can write for k in sorted(...) if the keys/values/items can be compared for order (they are produced in sorted order; as with the sort method, we can optionally supply key/reverse arguments to control the order). Note that we cannot sort a dictionary (it has no sort method), but we can iterate over its keys in sorted order; this is an important distinction, and is unlike lists.
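A short sketch of containment and the three iteration forms. It highlights the common surprise that in checks keys, so testing for a value needs the values() view explicitly.

```python
c = {'penny': 1, 'nickel': 5, 'dime': 10, 'quarter': 25}

# in/not in test KEYS, not values
print('dime' in c)          # True
print(10 in c)              # False: 10 is a value, not a key
print(10 in c.values())     # True: check the values view explicitly

# Guarding an index with in avoids KeyError
for x in ['dime', 'peso']:
    if x in c:
        print(x, c[x])      # dime 10
    else:
        print(x, 'is not a key')   # peso is not a key

# The three ways to iterate; each produces len(c) values
keys = [k for k in c]
values = [v for v in c.values()]
items = [kv for kv in c.items()]
print(len(keys) == len(values) == len(items) == len(c))  # True
```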
    for k in c:
        print(k,end=';')
prints: penny;nickel;dime;quarter;  (the order the keys were inserted, in Python 3.7+)

    for k in sorted(c):
        print(k,end=';')
prints: dime;nickel;penny;quarter;

This is similar to
    keys = list(c)      # create a list by iterating over c's keys
    keys.sort()         # sort the list (lists are sortable)
    for k in keys:      # iterate through the sorted keys list
        print(k,end=';')

    for k in sorted(c.keys()):      # a common idiom: keys and values
        print(k,c[k],sep=':',end=';')
prints: dime:10;nickel:5;penny:1;quarter:25;

    for kv in sorted(c.items()):
        print(kv,end=';')
prints: ('dime', 10);('nickel', 5);('penny', 1);('quarter', 25);

    for kv in sorted(c.items()):
        print(kv[0],kv[1],sep=':',end=';')   # note how indexes 0 and 1 of each tuple kv are accessed
prints: dime:10;nickel:5;penny:1;quarter:25;

    for k,v in sorted(c.items()):   # a common idiom: keys and values
        print(k,v,sep=':',end=';')  # note how each tuple produced is unpacked into the names k and v
prints: dime:10;nickel:5;penny:1;quarter:25;

    for v in sorted(c.values()):    # this form isn't common but can be used
        print(v,end=';')
prints: 1;5;10;25;

In fact, we can write the sorted function as
    def sorted(iterable,key=None,reverse=False):
        alist = list(iterable)
        alist.sort(key=key,reverse=reverse)
        return alist

which creates a list with every value produced by iterating over the iterable parameter. Then it sorts that list using the key/reverse parameters. Finally it returns that list. So when writing for i in sorted(...): i takes on every value (one after another) in the list returned by calling the sorted function: a list whose values are sorted.
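The key/reverse arguments mentioned above are often used to order a dictionary by its values rather than its keys. This sketch shows a few variations on the coin dictionary; the lambda functions are just one way to express the ordering (c.get would also work as a key for the last one).

```python
c = {'penny': 1, 'nickel': 5, 'dime': 10, 'quarter': 25}

# Keys in reverse alphabetical order
rev_keys = sorted(c, reverse=True)
print(rev_keys)     # ['quarter', 'penny', 'nickel', 'dime']

# Items ordered by value, largest first, using a key function on each 2-tuple
by_value = sorted(c.items(), key=lambda kv: kv[1], reverse=True)
print(by_value)     # [('quarter', 25), ('dime', 10), ('nickel', 5), ('penny', 1)]

# Keys ordered by their associated values
keys_by_value = sorted(c, key=lambda k: c[k])
print(keys_by_value)    # ['penny', 'nickel', 'dime', 'quarter']

# sorted never mutates c; it builds and returns a new list
print(c)    # {'penny': 1, 'nickel': 5, 'dime': 10, 'quarter': 25}
```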
Dictionary (mutation) operations

(a) Assignment
Suppose x = {'a':1, 'b':2, 'c':3}; then x['a'] is 1.
    x['a'] = 'z'
now x is {'a':'z', 'b':2, 'c':3}; the value associated with key 'a' has been changed to 'z'
    x['b'] = ['b1','b2']
now x is {'a':'z', 'b':['b1','b2'], 'c':3}; the value associated with key 'b' has been changed to ['b1','b2']

If we assign to a key not already in the dictionary, Python just adds it (very different from lists, which require a call to append).
    x['d'] = 'new'
now x is {'a':'z', 'b':['b1','b2'], 'c':3, 'd':'new'}; key 'd' is added to the dictionary, with an associated value of 'new'. len(x) goes from 3 to 4.

So, looking up a nonexistent key raises KeyError, but assigning to a nonexistent key is fine. This process is like binding names in general: to look up the value of a name, the name must be bound to some value; but to bind a value to a name, the name can already exist (its value changes) or not exist (the name is created and bound to the value).

(b) del form: del adict[key]
    del x['b']
now x is {'a':'z', 'c':3, 'd':'new'}
If we try to delete a key not in adict, Python raises KeyError; we can always write
    if key in adict:
        del adict[key]
to ensure this exception is not raised.

(c) The get method is convenient to explain here, but it does not mutate the dictionary
    adict.get(key,default)
Same as adict[key] except if key is not in adict, it returns the value of default (and if default is not specified, returns None); unlike [] indexing, it never raises KeyError. It is similar to the following function definition, with adict passed as a parameter (we'll learn the truth soon)
    def get(adict,key,default=None):
        if key in adict:
            return adict[key]
        else:
            return default

(d) adict.setdefault(key,default)
Same as adict.get(key,default) except if key is not in adict, it first adds the key:default pair to the dict and then returns the value of adict[key], which is now guaranteed to exist.
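A sketch that runs the assignment, del, and get operations above on the same x, including a guarded delete and a check that get does not add a missing key.

```python
x = {'a': 1, 'b': 2, 'c': 3}

x['a'] = 'z'          # rebind the value of an existing key
x['d'] = 'new'        # assigning to a missing key adds it
print(len(x))         # 4

del x['b']            # remove a key (KeyError if it is missing)
print(x)              # {'a': 'z', 'c': 3, 'd': 'new'}

# Guarded delete: no exception even though 'b' is already gone
if 'b' in x:
    del x['b']

# get never raises KeyError and never mutates x
print(x.get('c'))     # 3
print(x.get('q'))     # None
print(x.get('q', 0))  # 0
print('q' in x)       # False: get did not add 'q'
```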
It is similar to the following function definition, with adict passed as a parameter (we'll learn the truth soon)
    def setdefault(adict,key,default=None):
        if key not in adict:
            adict[key] = default
        return adict[key]    # which is default, if key was not in adict

So setdefault is like get, but if the key is not found, it not only returns default, but first mutates the dictionary, putting the key:default pair in it.

(e) adict.pop(key) or adict.pop(key,default)
First form: removes the key (and its associated value) from the dict and returns the value associated with the key; it raises KeyError if key is not in adict.
Second form: the same, but returns default if key is not in adict, instead of raising an exception.

adict.popitem()
Removes a key:value pair from the dictionary and returns it as a 2-tuple (since Python 3.7, the most recently inserted pair; earlier versions made no guarantee about which pair); it raises KeyError if the dictionary is empty: len(adict) == 0

(f) adict.clear()
Deletes all key:value pairs from adict, mutating the dictionary object itself. This is not quite the same as adict = {}, which rebinds the name adict to a new empty dictionary and leaves unchanged any other names that refer to the original dictionary.

(g) keys(), values(), items() are technically called views of the dictionary. Besides iterating over these views, we can check whether values are in/not in them.

(h) d = dict(...)
Recall that we can write list(...) and tuple(...) to construct a list/tuple that contains all the values specified by ... For dict, there are two forms of ...
  (1) anything we can iterate over that produces a sequence of 2-lists/2-tuples, each treated as a key/value pair. For example, we write a list of 2-tuples:
        d = dict([('a',1),('b',2),('c',3)]) is the same as d = {'a':1, 'b':2, 'c':3}
  (2) keyword arguments of the form p=v, where p (as a string) becomes a key and v becomes its associated value. This works only for keys that are strings (and legal Python names), but this is very common.
        d = dict(a=1,b=2,c=3) is the same as d = {'a':1, 'b':2, 'c':3}

(i) adict.update(...)
Update adict with the information in ..., with three forms of ...:
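A sketch of setdefault, pop, popitem, clear, and dict(...). The grouping example (collecting words by first letter) is a hypothetical illustration of why setdefault is handy: it returns the stored list, so we can append to it in one step. The popitem result assumes Python 3.7+ ordering.

```python
# setdefault both returns the value and (if needed) first stores the default
groups = {}
for word in ['ant', 'bee', 'ape', 'bat', 'cow']:
    groups.setdefault(word[0], []).append(word)
print(groups)   # {'a': ['ant', 'ape'], 'b': ['bee', 'bat'], 'c': ['cow']}

# pop with and without a default
print(groups.pop('c'))          # ['cow']
print(groups.pop('z', 'none'))  # none  (no KeyError)

# popitem removes the most recently inserted pair (Python 3.7+)
print(groups.popitem())         # ('b', ['bee', 'bat'])

# clear mutates the shared object; rebinding with = {} does not
a = {'k': 1}
alias = a
a = {}              # rebinds only the name a
print(alias)        # {'k': 1}
a = alias
a.clear()           # empties the one dictionary both names refer to
print(alias)        # {}

# dict(...) from 2-tuples or from keyword arguments
print(dict([('a', 1), ('b', 2)]) == dict(a=1, b=2) == {'a': 1, 'b': 2})  # True
```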
  (0) another dict
  (1) see (h) above
  (2) see (h) above
So, if adict is {'a':100, 'b':200, 'c':300}, each of the following
    adict.update([('a',1),('x',2),('c',3)])   # note the 'x'
    adict.update({'a':1, 'x':2, 'c':3})       # note the 'x'
    adict.update(a=1,x=2,c=3)                 # note the 'x'
is like executing
    adict['a'] = 1
    adict['x'] = 2
    adict['c'] = 3
which results in adict being {'a':1, 'b':200, 'c':3, 'x':2}

Students use update too much. Try to use [] and mutation instead. For example, if classes is a dictionary whose keys are UCInetIDs and whose associated values are lists of the courses each student is taking, writing
    classes['pattis'].append('ICS-31')
would mutate the list associated with UCInetID 'pattis' to also include 'ICS-31'.

------------------------------------------------------------------------------

Comprehensions

As with lists, we can build dictionaries via comprehensions as
    d = {dict-comprehension}
The form of a dict-comprehension is as follows (bool-expression-i is a boolean expression that can refer to i). Note that the [] mean an EBNF option.
    key-expression-i : value-expression-i for i in iterable [if bool-expression-i]

So, to create a dictionary whose keys are strings and whose values are their lengths, we could write
    d = {s : len(s) for s in 'Four score and seven years ago'.split(' ')}
Here d is {'Four':4, 'score':5, 'and':3, 'seven':5, 'years':5, 'ago':3}
If we wanted only the words longer than 3 characters, we could include the option and write:
    {s : len(s) for s in 'Four score and seven years ago'.split(' ') if len(s)>3}

Generally, we can translate a dictionary comprehension as follows.
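A sketch of update, in-place mutation through [], and a filtered dict comprehension checked against its equivalent loop. The starting value ['ICS-33'] for classes['pattis'] is hypothetical, added only so the append has a list to mutate.

```python
adict = {'a': 100, 'b': 200, 'c': 300}
adict.update([('a', 1), ('x', 2), ('c', 3)])
print(adict)    # {'a': 1, 'b': 200, 'c': 3, 'x': 2}

# Mutating a value in place with [] (no update needed); the data is hypothetical
classes = {'pattis': ['ICS-33']}
classes['pattis'].append('ICS-31')
print(classes)  # {'pattis': ['ICS-33', 'ICS-31']}

# Dict comprehension with the optional if clause
words = 'Four score and seven years ago'.split(' ')
long_lengths = {s: len(s) for s in words if len(s) > 3}
print(long_lengths)  # {'Four': 4, 'score': 5, 'seven': 5, 'years': 5}

# The same dictionary built by the equivalent loop
loop_version = {}
for s in words:
    if len(s) > 3:
        loop_version[s] = len(s)
print(loop_version == long_lengths)  # True
```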
    comprehension = {}
    for i in iterable:
        if bool-expression-i:
            comprehension[key-expression-i] = value-expression-i

As with lists, comprehensions are good for creating new dictionaries, but not for mutating existing dictionaries.

------------------------------------------------------------------------------

Putting it all Together: Some Real Dictionary Code (The Power of Dictionaries)

Suppose that we have a list of words and we want a count of how often each word appears. Here is a function that takes such a list as an argument and returns a dictionary of how often each word occurs.

    def count_words(alist):
        answer = {}              # create an empty dictionary
        for w in alist:          # iterate over all words in alist
            if w in answer:      # check if that word is already a key
                answer[w] += 1   # update the count for an existing key
            else:
                answer[w] = 1    # create a count for a new key
        return answer

We can simplify this function by using get (review its meaning). It is useful precisely because this update idiom is so frequent. In fact, we will learn about defaultdict, which makes this idiom even easier to express.

    def count_words(alist):
        answer = {}
        for w in alist:
            answer[w] = answer.get(w,0) + 1
        return answer

In both cases, calling
    count_words('how much wood could a woodchuck chuck if a woodchuck could chuck wood'.split(' '))
returns
    {'how': 1, 'much': 1, 'wood': 2, 'could': 2, 'a': 2, 'woodchuck': 2, 'chuck': 2, 'if': 1}
Since Python 3.7 the key:value pairs appear in the order in which each word was first seen; in earlier versions the order was indeterminate. Either way, to print them in a predictable order we sort. If we wrote
    answer = count_words(...)
    for k in sorted(answer):
        print(k,'->',answer[k])
the printed result would be
    a -> 2
    chuck -> 2
    could -> 2
    how -> 1
    if -> 1
    much -> 1
    wood -> 2
    woodchuck -> 2

If you understand all aspects of count_words, you have a good starting understanding of dictionaries.
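As a preview of the defaultdict mentioned above, here is a sketch of the same counting idiom; count_words_dd is a hypothetical name. A defaultdict(int) supplies int() == 0 for any missing key, so += 1 works without a membership test. The result is compared against collections.Counter, which performs the same counting, and then printed with the most frequent words first (ties broken alphabetically).

```python
from collections import Counter, defaultdict

def count_words_dd(alist):
    """Count words with defaultdict(int): a missing key starts at int() == 0."""
    answer = defaultdict(int)
    for w in alist:
        answer[w] += 1
    return dict(answer)     # convert back to a plain dict

text = 'how much wood could a woodchuck chuck if a woodchuck could chuck wood'
words = text.split(' ')

counts = count_words_dd(words)
print(counts == dict(Counter(words)))   # True

# Most frequent first; ties broken alphabetically by the second tuple element
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for w, n in ranked:
    print(w, '->', n)
```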