Dictionary Review: Building/Iterating

Now that we have had two lectures on dictionaries, I would like this one to
review the fundamentals of dictionaries, concentrating on the most frequently
used operators/iteration on dictionaries.

------------------------------------------------------------------------------

3 Important operations (and one less important one)

1) The "in" operator determines whether a key k is associated with some value in
a dictionary d. We write it as k in d; it can also be written k in d.keys().

The in operator is often important when building dictionaries, to determine
whether a key is already in the dictionary, associated with some value: often
we perform one operation to update the values associated with a key in a
dictionary, but perform another operation to create a new key/value association
in a dictionary. The in operator helps us determine which operation to do (see
Building Dictionaries below).

2) Indexing a dictionary with a key retrieves the value associated with a key.
Often we either examine this value or update/mutate it (when building a
dictionary). We write d[k] to perform this retrieval. Note that if k is not
a key in the dictionary, then indexing d[k] raises a KeyError exception. Of
course, we can always check whether k is in d by using the "in" operator.

3) There are three ways to iterate through a dictionary. By far the two most
common iteration idioms are iterating through the keys and iterating through
the items (key/value pairs, each appearing as a 2-tuple). The first form is written
for k in d, and can also be written for k in d.keys(); the second form is
typically written for k,v in d.items().

We can also iterate over all the values associated with keys in a dictionary,
without referring to keys at all, although this is uncommon: it is written
for v in d.values().

Note that if nothing is stored in a dictionary, all these loops execute 0
iterations (never execute the code in their bodies).

4) Less frequently used is the del statement, which deletes a key (and its
associated value) from a dictionary. It is written del d[k], which raises an
exception if the key is not in the dictionary; a variant is d.pop(k), which not
only removes the key from the dictionary, but also returns its associated value.
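
As a small illustration of these four operations, here is a sketch that uses a
made-up dictionary of counts (the name counts and its contents are assumptions
chosen only for this example).

```python
# Hypothetical dictionary used only for illustration
counts = {'a': 2, 'b': 3, 'c': 1}

# 1) The in operator: is this key in the dictionary?
has_a = 'a' in counts              # True
has_z = 'z' in counts              # False

# 2) Indexing: retrieve the value associated with a key
value_of_b = counts['b']           # 3
try:
    counts['z']                    # 'z' is not a key
    missing_raised = False
except KeyError:
    missing_raised = True          # indexing a missing key raises KeyError

# 3) Iteration: over keys, over items (unpacked), over values
keys_seen = []
for k in counts:
    keys_seen.append(k)
items_seen = []
for k, v in counts.items():
    items_seen.append((k, v))
values_seen = []
for v in counts.values():
    values_seen.append(v)

# 4) Deletion: del removes a key; pop removes it and returns its value
del counts['a']
popped = counts.pop('b')           # 3
print(counts)                      # only the key 'c' remains
```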

------------------------------------------------------------------------------

Building Dictionaries

For building dictionaries, there are two cases we must code

  (1) The key is not already in the dictionary (initialize an associated value)
  (2) The key is already in the dictionary     (update its associated value)

Typically a dictionary is built by iterating through a string, range, lines in
a file, list, or even another dictionary.

Let's look at two specific common tasks: on the left, counting how often a key
occurs; on the right, adding a value to a list associated with a key. We will
assume that we want to initialize/update the dictionary for the name "key"
(and in the list problem, add "value" to the list associated with "key").

(a) Here is an example of these two cases for a standard dictionary, using only
the simplest dictionary operations. Assume we originally define/initialize d as
either d = {} or d = dict(): both are empty dictionaries.

if key not in d:                if key not in d:
    d[key] = 1                      d[key] = [value]
else:                           else:
    d[key] += 1                     d[key].append(value)

(b) Here is an example which simplifies the if/else to just an if statement
followed by a second statement. The if statement initializes the dictionary for
that key (if it is not there) and then the second statement updates the value
associated with the key. In both cases, the access d[key] in the second
statement will never raise an exception, because by the time it is executed
key is guaranteed to be in the dictionary: if it was not originally in the
dictionary, the if statement will put it there.

if key not in d:                if key not in d:
    d[key] = 0                      d[key] = []
d[key] += 1                     d[key].append(value)
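
To see both tasks inside complete loops, here is a minimal sketch; the input
string, the word list, and the names letter_count and by_first are all
assumptions made up for this example. It uses the "if, then update" form, and
list.append to add a value to a list.

```python
# Counting: how often does each letter occur in a (hypothetical) string?
letter_count = {}
for letter in 'banana':
    if letter not in letter_count:
        letter_count[letter] = 0       # initialize the count for a new key
    letter_count[letter] += 1          # safe: the key is now guaranteed present

# Grouping: collect (hypothetical) words in a list keyed by first letter
by_first = {}
for word in ['apple', 'avocado', 'banana', 'blueberry', 'cherry']:
    key = word[0]
    if key not in by_first:
        by_first[key] = []             # initialize an empty list for a new key
    by_first[key].append(word)         # lists grow with append

print(letter_count)
print(by_first)
```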


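(c) These forms also lead to the study/use of setdefault, to simplify this
code into a single line. On the left, the new count must be assigned back to
d[key], because Python does not allow += to be applied to the result of a
method call.

d[key] = d.setdefault(key,0) + 1    d.setdefault(key,[]).append(value)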

Recall that setdefault acts as follows: it does the if test itself, so we
don't have to write it in our code above. It always returns a reference to
the value associated with key (even if this method itself creates the
association). Here adict names the dictionary on which setdefault is called.

def setdefault(adict,key,default=None):
    if key not in adict:
        adict[key] = default
    return adict[key]
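
The sketch below exercises the real dict.setdefault method on made-up data (the
names d, counts and the inputs are assumptions for illustration). It shows that
the returned list is the very object stored in the dictionary, so mutating it
updates the dictionary, while an int count has to be stored back by assignment
because ints are immutable.

```python
d = {}
first = d.setdefault('x', [])      # 'x' is absent: stores [] and returns it
first.append(1)                    # mutates the list stored inside d
second = d.setdefault('x', [])     # 'x' is present: returns the existing list
second.append(2)                   # the new [] argument was simply ignored

counts = {}
for letter in 'banana':
    # setdefault returns the current count (0 the first time);
    # the incremented value must be assigned back to counts[letter]
    counts[letter] = counts.setdefault(letter, 0) + 1

print(d)
print(counts)
```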

(d) In the next lecture we will learn about a more advanced kind of dictionary
called defaultdict (which we must import from the collections module). With
this data structure we define the dictionary as follows.

d = defaultdict(int)		d = defaultdict(list)

If we index a defaultdict with a key that is not in the dictionary, unlike for
dict it does NOT raise an exception. Instead, in the first case it will put an
int() associated with that key (int() is 0; just something you need to know); in
the second case it will put a list() associated with that key (list() is the
same as the literal []: it just constructs an empty list). By using a
defaultdict, we need only the following

d[key] += 1                     d[key].append(value)

In both cases, by using a defaultdict we do not have to call setdefault
explicitly: setdefault's actions are automatically done by the defaultdict.

On the left, if key exists its associated value will be incremented; if it
doesn't exist, it will be automatically associated with the value 0 and then
incremented.

On the right, if key exists it will have value appended to its list; if it
doesn't exist, it will be automatically associated with an empty list, and then
will have value appended to its list.
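
Here is a short runnable sketch of both uses of defaultdict; the input data and
the names letter_count and by_first are assumptions for illustration. Note one
consequence of the behavior described above: merely indexing a missing key is
enough to add it to the dictionary.

```python
from collections import defaultdict

letter_count = defaultdict(int)        # missing keys start at int(), which is 0
for letter in 'banana':
    letter_count[letter] += 1

by_first = defaultdict(list)           # missing keys start at list(), which is []
for word in ['apple', 'avocado', 'banana']:
    by_first[word[0]].append(word)

probe = letter_count['z']              # no exception: 'z' is added with value 0
z_present = 'z' in letter_count        # True, as a side effect of the access

print(dict(letter_count))
print(dict(by_first))
```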

(e) If we wanted no duplicate values in the list, the best code would be

                                if key not in d:
                                    d[key] = []
                                if value not in d[key]:
                                    d[key].append(value)

If we don't want duplicates, instead of using a list associated with a key we
should use a set (which is covered in the next lecture).
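
As a brief preview only (sets are not explained in this lecture), the following
sketch contrasts the list-with-membership-test approach against a set; the
pairs data and the names with_list and with_set are assumptions for
illustration.

```python
# Hypothetical (key, value) pairs, including a repeated pair
pairs = [('a', 1), ('b', 2), ('a', 1), ('a', 3)]

# List version: test membership before appending
with_list = {}
for key, value in pairs:
    if key not in with_list:
        with_list[key] = []
    if value not in with_list[key]:
        with_list[key].append(value)

# Set version: a set ignores a value it already contains
with_set = {}
for key, value in pairs:
    with_set.setdefault(key, set()).add(value)

print(with_list)
print(with_set)
```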

------------------------------------------------------------------------------

Iterating Through Dictionaries (where order is unimportant)

Once we build a dictionary, we often examine it by iterating over its keys or
its items (and, less often, its values). Thus, there are 3 standard ways to
iterate over dictionaries. Here are three simple code fragments to print every
key/value pair in the dictionary.

(1) Iterate over keys: use the dictionary to get the value associated with each
key. Often when iterating over just keys, the name k is used; if we know
something more explicit about the keys (say they are all words) we can write
for word in d:

for k in d:  # or more explicitly for k in d.keys()
    print(k,'->',d[k])
    
(2) Iterate over items (a 2-tuple): print the key (index 0) and its associated
value (index 1); often when iterating over items, the name kv is used; if we
know something more explicit about the items (say the keys are words and the
values are counts) we can write for word_count in d.items():

for kv in d.items():
    print(kv[0],'->',kv[1])


(3) Iterate over items, but "unpacking" the 2-tuple into two names

for k,v in d.items():
    print(k,'->',v)
    
If we wanted to print just the values (not the keys at all) we could use any of
the three previous forms and just not print k (or kv[0]) but better is the
following form

for v in d.values():
    print(v)

Typically we iterate over keys; if we aren't concerned with their values at all,
we use the first form of iteration. But, if we are going to use the value
associated with each key, it is most convenient to use the third form of
iteration.
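
The following sketch runs all of these forms on one made-up dictionary (d and
the result names are assumptions for illustration) and collects what each loop
visits, so we can check that the three key/value forms see exactly the same
pairs.

```python
d = {'a': 2, 'b': 3, 'c': 1}       # hypothetical dictionary

form1 = []
for k in d:                        # keys, then index to get each value
    form1.append((k, d[k]))

form2 = []
for kv in d.items():               # items as 2-tuples, indexed by 0 and 1
    form2.append((kv[0], kv[1]))

form3 = []
for k, v in d.items():             # items unpacked into two names
    form3.append((k, v))

values_only = []
for v in d.values():               # values, with no reference to keys
    values_only.append(v)

print(form1 == form2 == form3)     # the three forms visit the same pairs
print(values_only)
```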

------------------------------------------------------------------------------

Iterating through Dictionaries in Special Orders

Generally, lists/tuples are ordered by index (0, 1, ...), but dictionaries are
not: a dict iterates in the order its keys were inserted (Python 3.7 and
later), which is rarely the order we want. If we want to iterate through a
dictionary in a special order, we typically create a list of the dictionary's
keys (or a list of 2-tuples of its keys and their associated values), sort that
list, and then iterate over the list. For example, for a dictionary d:

keys = list(d)         # or keys = list(d.keys()): put all the keys in a list
keys.sort()	       # sort the list (defaulting the key/reverse parameters)
for k in keys:	       # iterate in sorted order, getting each key
    print(k,'->',d[k])

Likewise, we could call keys.sort(reverse=True): specifying reverse=True sorts
in the opposite order (high to low, not low to high).

If we supply a function to the "key" parameter of the sort method, Python sorts
not by the standard ordering of the keys but by the standard ordering of the
results of calling the "key" function on each key. If keys are strings, we could
call keys.sort(key=lambda x : x.upper()) in which case 'rich' and 'Rich' compare
the same. Or, we could call keys.sort(key=lambda x : x[-1]) so that the strings
are sorted, but by their last letter only.
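
Here is a small sketch of these sorting options on a made-up list of strings
(the list names and its contents are assumptions for illustration). Each sort
is done on a fresh copy so the results can be compared side by side.

```python
names = ['rich', 'Zoe', 'alex', 'Bob']      # hypothetical keys

default_order = list(names)
default_order.sort()                        # uppercase letters sort before lowercase
# ['Bob', 'Zoe', 'alex', 'rich']

reverse_order = list(names)
reverse_order.sort(reverse=True)            # the opposite of the default order
# ['rich', 'alex', 'Zoe', 'Bob']

ignore_case = list(names)
ignore_case.sort(key=lambda x: x.upper())   # compare the uppercased strings
# ['alex', 'Bob', 'rich', 'Zoe']

by_last_letter = list(names)
by_last_letter.sort(key=lambda x: x[-1])    # compare only the last letters: b, e, h, x
# ['Bob', 'Zoe', 'rich', 'alex']
```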

But what if we want an even more complicated order, based on items (both keys
and values)? We can use a similar technique, storing items (keys and values) as
2-tuples in a list, sorting that list (in even more interesting ways), and then
iterating over the list of keys and their values.

items = list(d.items())# put all the items, 2-tuples, in a list
items.sort()	       # sort the list (defaulting the key/reverse parameters)
for k,v in items:      # iterate in sorted order, getting each item
    print(k,'->',v)

Note that when sorting 2-tuples, the tuples are ordered by index 0; if two
tuples have the same value at index 0, they are ordered by index 1.
So ('a', 10) appears before ('z', 1) but after ('a', 2).
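
This ordering rule can be checked directly; the list pairs below is just the
three tuples from the sentence above, in a scrambled order.

```python
pairs = [('z', 1), ('a', 10), ('a', 2)]

# Tuples compare by index 0 first, then by index 1 to break ties
in_order = ('a', 2) < ('a', 10) < ('z', 1)   # True

pairs.sort()
print(pairs)    # [('a', 2), ('a', 10), ('z', 1)]
```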

So, let's ignore dictionaries (which cause confusion because dictionaries
have keys and there is a key parameter in sorting) and focus just on lists:
even if we want to sort items in a dictionary, once we put the items in a list,
we are just sorting lists.

So, if we have a dictionary d with keys that are strings and values that are
integers, we might write l = list(d.items()). Suppose this gives us the list
l = [('b',3), ('a', 2), ('c',1)]. Here are some sorting examples.

l.sort() changes l to be [('a', 2), ('b', 3), ('c',1)]

because sorting tuples first orders them by index 0, then by index 1 (for all
tuples with the same index 0)

l.sort(key=lambda x : x[1]) changes l to be [('c',1), ('a', 2), ('b', 3)]

because the key function used on each tuple in the list selects the second
value in the tuple to sort on.

l.sort(key=lambda x : x[1], reverse=True)

  changes l to be [('b', 3), ('a', 2), ('c',1)]

because it uses the same key function as before (using the second tuple value),
but reverse=True reverses the order.
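
The three calls above can be run as follows; each sort works on its own copy of
l (the copy names are assumptions for illustration) so that the starting list
is the same every time.

```python
l = [('b', 3), ('a', 2), ('c', 1)]

by_tuple = list(l)
by_tuple.sort()                                   # index 0, then index 1

by_value = list(l)
by_value.sort(key=lambda x: x[1])                 # second tuple value, low to high

by_value_desc = list(l)
by_value_desc.sort(key=lambda x: x[1], reverse=True)   # second value, high to low

print(by_tuple)        # [('a', 2), ('b', 3), ('c', 1)]
print(by_value)        # [('c', 1), ('a', 2), ('b', 3)]
print(by_value_desc)   # [('b', 3), ('a', 2), ('c', 1)]
```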

In fact, if we had a list of dates, where index 0 is the month, index 1 is the
day, and index 2 is the year, so that (5, 19, 2014) represents May 19, 2014,
then we could sort the list of dates as follows.

l.sort(key=lambda x : (x[2],x[0],x[1])) sorts the list so that each
tuple is compared to another according to its year (first), month (second), and
day (third). So (5, 19, 2014) would appear before (5, 20, 2014) because the
tuples produced by the lambda satisfy (2014, 5, 19) < (2014, 5, 20). The general
way to compare tuples is according to index 0; but if index 0 is the same,
according to index 1; but if index 1 is the same, according to index 2; etc.
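
Here is a runnable sketch of the date example with a few made-up dates (the
list contents are assumptions for illustration). The default sort is shown
first for contrast: it orders by month, which is not chronological.

```python
# (month, day, year) tuples; hypothetical dates
dates = [(5, 20, 2014), (12, 1, 2013), (5, 19, 2014), (1, 15, 2015)]

by_month_first = list(dates)
by_month_first.sort()                               # default: month, day, year
# [(1, 15, 2015), (5, 19, 2014), (5, 20, 2014), (12, 1, 2013)]

chronological = list(dates)
chronological.sort(key=lambda x: (x[2], x[0], x[1]))   # year, month, day
# [(12, 1, 2013), (5, 19, 2014), (5, 20, 2014), (1, 15, 2015)]

print(chronological)
```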

Finally, Python allows us to simplify this process by using a function named
"sorted". Unlike "sort", which is a method that mutates a list, "sorted" is a
function that produces a list from anything it can iterate over. It has the
same key and reverse parameters as "sort".

This means we can call "sorted" directly in a for loop. Instead of writing

kv = list(d.items())
kv.sort(key=lambda x : x[1], reverse=True)
for k,v in kv:
    print(k,'->',v)

we can write the more compact

for k,v in sorted(d.items(), key=lambda x : x[1], reverse=True):
    print(k,'->',v)

For the dictionary d = {'a':2, 'b':3, 'c':1}, both would print

b -> 3
a -> 2
c -> 1

In fact, we can write the sorted function as

def sorted(iterable,key=None,reverse=False):
   alist = list(iterable)
   alist.sort(key=key,reverse=reverse)
   return alist

Recall that sort has parameters key and reverse; we match those with the
arguments passed to key and reverse in the sorted function.
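
To check this description, the sketch below defines the same logic under the
hypothetical name my_sorted (chosen so that it does not hide Python's built-in
sorted) and compares its result with the built-in function. It is only a model
of the behavior described here, not Python's actual implementation.

```python
def my_sorted(iterable, key=None, reverse=False):
    # Copy the iterable into a new list, sort the copy, and return it
    alist = list(iterable)
    alist.sort(key=key, reverse=reverse)
    return alist

d = {'a': 2, 'b': 3, 'c': 1}
mine = my_sorted(d.items(), key=lambda x: x[1], reverse=True)
builtin = sorted(d.items(), key=lambda x: x[1], reverse=True)

print(mine)             # [('b', 3), ('a', 2), ('c', 1)]
print(mine == builtin)  # True
```

Because the list is built from a copy, the dictionary d itself is left
unchanged by either call.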