ICS 32 Winter 2022
Python Background Notes: Functions


Functions and abstraction

We've seen already that Python includes a collection of functions built into the language, such as type(), len(), and constructors such as int() and str(). The ones we've seen so far have been pretty fundamental — in the sense that they mainly don't do jobs that we could have done in some other way in Python, but instead provide baseline functionality that we need in order to do bigger things. Fundamental functions like these are a good place to start, but we can achieve more with functions than just fundamental things; certainly, not all functions are fundamental.

The key benefit a function provides in a design is abstraction. Abstraction is hiding complexity beneath a veil of simplicity. It takes something that might be quite complex and makes it easy to use, so that you don't have to know every detail of how it works in order to use it. To use an abstraction, you need to understand how to interact with it — what you have to do, and what result or effect you expect to get back in return — but not how it works. For example, imagine you do something like this in the Python shell.

>>> name = 'Alex'
>>> len(name)
4

When you do this, it's not important that you understand how the len() function is able to determine the string's length. Maybe there's a while loop in the len() function that counts characters. Maybe the length is just an integer stored in another variable, hidden behind the scenes somewhere. Maybe the operating system or the underlying hardware has some mechanism for tracking it. Whatever the implementation details are, all you need to know is that some objects in Python have a length, and if you call the len() function, you obtain that length. That's it.

So, if functions are such a powerful device for hiding complexity, it stands to reason that we should want to write our own functions, too. If Python can provide us built-in functions that make certain commonly-done tasks easier, that's nice in itself. But if we can write our own functions, we can take the complexity that arises in our own designs (things that aren't so common that everyone who writes Python programs would need them, but that are instead specific to the program we're writing) and neatly hide it away. In small, simple programs, this probably isn't that big of a deal. But most software isn't just written once and thrown away; it is instead built and maintained over a long period of time, often by many different people. So we need a way to allow someone to work on one part of the program without accidentally introducing problems into other parts of it; if someone has to remember every detail of a 50,000-line program in order to successfully change any detail about it, no one will ever be able to change it successfully. Isolating individual portions of our program from one another as much as we can is the only way we can build large programs that stand the test of time. Functions are the most fundamental tool for achieving that isolation.


Writing functions

We can introduce a new function into a Python program by using a statement called def. (The word def is short for definition or function definition.) Like everything else in Python, you can do this in the Python shell, though we'll much more often do it in scripts.

A def statement is a compound statement, the way that if statements and loops are compound; defs have other statements inside of them. The syntactic mechanism for expressing this is the same: We write a colon at the end of the first line of the def, then use indentation to indicate what other statements are meant to be inside of it.

So what might you need to say in order to define a function? Let's consider how you call a function. (Calling a function is what you do when you want to use it. When you define it, you want it to be ready to be used, so there's a correspondence between how you use it and what you might say in order to define it.)

As a first example, we could define a function gimme_five in the Python shell. Suppose that our intent is to be able to call it by passing it no arguments and, no matter what, it will always return the integer 5.

>>> gimme_five()
5

How do we achieve that intent? Here's the definition of that function, followed by a call to it.

>>> def gimme_five():
        return 5

>>> gimme_five()
5

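Let's unpack this syntax a bit. The first line begins with the word def, followed by the function's name (gimme_five), a pair of parentheses (empty here, because the function has no parameters), and a colon. The indented statement beneath it is the body of the function, which is what gets executed each time the function is called. The return statement ends the function's execution and specifies the value that the call evaluates to, which is always 5 in this case.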

Calling a function that returns an integer, like this one, is a lot like any other expression that returns an integer. This means you can take that integer and do whatever you'd like with it: print it, store it in a variable, use it in an arithmetic expression, and so on. Every time the function is called within an expression, its body is executed, and its return value becomes the value of that call within the expression.

>>> print(gimme_five())
5
>>> x = gimme_five()
>>> x + 10
15
>>> gimme_five() + (gimme_five() * 3)
20

The difference between a function and a function call

I should point out, too, that calling a function requires the parentheses, even if you're not passing any arguments to it; it's the parentheses that establish that you want to call it. That doesn't mean you can't evaluate gimme_five without the parentheses in the Python shell, but you should be aware that you would be doing something very different: Its result would be the function itself, as opposed to the result you'd get from calling it. Functions, as it turns out, are objects, just like strings, ints, bools, and so on; their type is function.

>>> gimme_five
<function gimme_five at 0x000001E5497A16A8>
>>> type(gimme_five)
<class 'function'>

(The funny-looking value 0x000001E5497A16A8 is an address in memory; you might find, if you try the same thing, that you get a different address than I did. For the most part, we're unconcerned about addresses in memory when we write Python code, and we have little or no control over where things will be stored. We care about two addresses being equal or different sometimes, but not so much about specifically where things are.)

The implications of functions themselves being objects are more powerful than you might at first realize, but that's a conversation we'll return to later. For now, you should be aware that you need parentheses both around the arguments when you call a function and around the parameters when you define it, even when there aren't any arguments or parameters.
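
As a small sketch of what it means for a function to be an object, the function itself (written without parentheses) can be stored in a variable and then called through that variable. The name also_five is hypothetical, chosen only for this illustration.

```python
def gimme_five():
    return 5


# No parentheses: this stores the function object itself, without calling it.
also_five = gimme_five

# Parentheses: this calls the function through its second name.
result = also_five()

print(also_five is gimme_five)   # True: both names refer to the same function object
print(result)                    # 5
```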

Parameters and arguments

More often than not, the functions you write will need parameters. Functions are each intended to do a job; quite frequently, it takes some kind of input for the function to know what job you want done, so you'll need to pass arguments to it (and the function will need parameters to accept those arguments). The print() function needs arguments because it needs to know what you want printed; the int() function needs an argument because it needs to know what value you want to attempt to convert to an integer; and so on.

If you want to write a function that accepts arguments, you'll need to define the corresponding parameters. You'll also need them to have names, because you'll need to refer to them within the body of the function. Parameters are a lot like variables, in the sense that their job is to store an object and allow it to be used later. The difference is that they are more temporary than the variables we've seen so far; they live only as long as the function is executing, then they're destroyed.

Suppose you wanted to write a function that takes a number and tells you its square (i.e., the result of multiplying that number by itself). After you've written the function, you might expect to be able to do this.

>>> square(3)
9

How you would write that function is similar to how we wrote gimme_five() above, with the main difference being that we'll need to define one parameter (to accept the number that we want squared), and then we'll need to use that parameter within the body of the function (so we can square it).

>>> def square(n):
        return n * n

>>> square(3)
9
>>> square(5.5)
30.25

Defining a parameter in a function is as simple as listing the parameter's desired name within the parentheses after the function's name; this is enough to establish its existence. You can define multiple parameters by simply separating their names with commas. (If you've previously written programs in a language like Java or C++, you might wonder why you don't need to specify parameters' types. There is no explicit restriction on what type of value can be passed into a parameter, similar to how there is no restriction on what type of value can be stored in a variable.) Once you've defined a parameter, you're free to use it within the body of the function by simply specifying its name.
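
To make the comma-separated form concrete, here is a minimal sketch of a function with two parameters. The function rectangle_area and its parameter names are hypothetical, invented for this example; note that nothing in the definition says what types width and height must be.

```python
def rectangle_area(width, height):
    # Both parameters are available by name within the body.
    return width * height


print(rectangle_area(3, 4))      # 12
print(rectangle_area(2.5, 4))    # 10.0
```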

When you call a function, the arguments are matched to the parameters in the order specified, with the first argument passed into the first parameter, the second argument passed into the second parameter, and so on. (If the number of arguments doesn't match the number of parameters, an error will occur.) The body of the function is then executed, with the values of the parameters being whatever was passed into them. So, in the case of the call square(3) above, the following things happen: the argument 3 is passed into the parameter n; the body of square is executed with n having the value 3; the expression n * n is evaluated, yielding 9; and 9 is returned, becoming the value of the call square(3).
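
The matching of arguments to parameters by position can be sketched with a function where the order visibly matters. The function subtract is hypothetical; the last part shows, under the assumption that we want to keep running after the failure, what happens when the number of arguments doesn't match the number of parameters.

```python
def subtract(a, b):
    return a - b


print(subtract(10, 3))    # 7: a is 10, b is 3
print(subtract(3, 10))    # -7: same arguments, opposite order

try:
    subtract(10)          # only one argument for two parameters
except TypeError as error:
    mismatch_message = str(error)
    print('Call failed:', mismatch_message)
```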

Type errors

Even though the types of the parameters are not explicitly specified, there is nonetheless an assumption being made within the body of the function about their types. By virtue of what we do with those parameters within the function, a particular type of argument might be compatible or incompatible with it.

For example, the square() function we wrote above is making an implicit assumption, even if we didn't say it directly in the code we wrote. The value of n has to be something that can be multiplied by itself. This means that n could certainly accept an integer or a float — because you can certainly multiply numbers by numbers — but could not accept a string.

However, even though our square() function can't successfully process a string argument that you pass to it, it is still possible for a Python program to run with this line of code in it.

square('Boo')

The program would still be syntactically legal Python, so it would still be possible for it to run. However, it wouldn't necessarily run successfully; the function square will fail, at run time, when it's called with an argument that can't be multiplied by itself.

We can see this in the Python shell by trying to call it that way.

>>> square('Boo')
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    square('Boo')
  File "<pyshell#2>", line 2, in square
    return n * n
TypeError: can't multiply sequence by non-int of type 'str'

A couple of interesting facts emerge from this example.
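
Two facts that can be checked directly are sketched below, as an illustration rather than a complete account: the failure arises only at the moment the call is actually executed, and the error message mentions a "non-int" because a string can be multiplied by an integer, just not by another string. The variable failed is hypothetical, used only to record the outcome.

```python
def square(n):
    return n * n


print('Boo' * 3)    # BooBooBoo: a string can be multiplied by an int

try:
    square('Boo')   # fails only now, when the call is actually executed
except TypeError as error:
    print('square failed:', error)
    failed = True
else:
    failed = False
```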

Docstrings

The body of a Python function can begin with a docstring, which is a string literal that describes — to a human reader — how a function works. The best docstrings briefly describe what the function's job is, along with anything that one would need to know about its parameters, its return value, and the ways in which it might fail. What you'll find, generally, is that the need to write a long, complex docstring is actually indicative of a function that is solving too many problems; functions that have a single responsibility will tend to have short docstrings, for the simple reason that there won't be that much to say about them.

Writing a function with a docstring is as simple as beginning its body with a string literal.

def square(n):
    'Computes the square of a numeric argument'
    return n * n

When you write functions in this course, you'll generally want to write a docstring, both to communicate your design goals to us, but also to ensure that you're thinking about them yourself. If you can't figure out what to write in a docstring, how can you understand what function you're trying to write? How will you know when you're done?

It's worth noting, too, that Python provides multi-line string literals, which are denoted syntactically by being surrounded by three single-quote characters on either side. That provides a nice way of writing a docstring that is long enough not to fit readably on a single line.

def square(n):
    '''
    Computes the square of a numeric argument, while failing when
    given an argument that is not numeric.
    '''
    return n * n
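
A docstring is more than a comment, because Python keeps it attached to the function. As a brief sketch, the text can be read back from the function's __doc__ attribute, or in a tidied form using the standard library's inspect.getdoc(); the variable names raw_text and clean_text are hypothetical.

```python
import inspect


def square(n):
    '''
    Computes the square of a numeric argument, while failing when
    given an argument that is not numeric.
    '''
    return n * n


# The raw docstring, exactly as written (including its indentation).
raw_text = square.__doc__

# inspect.getdoc() returns the same text with the indentation cleaned up.
clean_text = inspect.getdoc(square)
print(clean_text)
```

In the Python shell, help(square) displays the same docstring alongside the function's name and parameters.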

Writing functions in Python scripts

Aside from when we're experimenting, most of the functions we write in Python will be written in Python scripts. Functions allow us to take complexity in our program and "hide" it — not in the sense that the complexity can't then be seen, but in the sense that we can then call the function without considering every small detail of how it works. Where this kind of thing pays off is when we're writing programs that we can use again and again, so it stands to reason we would tend to benefit from this when we're writing scripts.

We write functions in Python scripts the same way we do in the Python shell, by writing def statements. For example, we could create a new Python script and write these statements in it.

def square(n):
    return n * n


def cube(n):
    return n * n * n

Suppose we then ran that script in IDLE. Here's what we would see, after the shell restarted.

>>>

Why didn't we see any numbers get squared or cubed? Remember that there is a difference between defining a function and calling it. Here, we have a script that defines two functions, square and cube, but doesn't call either of them. Defining a function makes it available to be called, but doesn't actually call it. Of course, having executed the script, the functions will have been defined, so we could then call them within the Python shell.

>>> square(4)
16
>>> cube(5)
125

Many Python scripts we write will only contain definitions. In other words, their role will be to make things available to other scripts, but not to do anything on their own. But when we want to write Python scripts that are stand-alone programs — ones that should do something when we run them — then they'll need not only to define functions, but also to call them somewhere. The simplest way to do that is to include the calls directly within the script.

def square(n):
    return n * n


def cube(n):
    return n * n * n


def read_number():
    return int(input('Enter a number: '))


num = read_number()
print('The square of', num, 'is', square(num))
print('The cube of', num, 'is', cube(num))

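The order in which we say all of this matters. Python will execute this script in the order that it's written. First, the three def statements are executed in turn, which defines square, cube, and read_number without calling any of them. Next, read_number is called, which asks the user for a number, and the integer it returns is stored in num. Finally, the two calls to print are executed, calling square and then cube along the way.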

So, if we executed this script in IDLE, we would be able to have the following interaction with it (including evaluating some additional expressions in the shell after it finishes running).

Enter a number: 5
The square of 5 is 25
The cube of 5 is 125
>>> num
5
>>> cube(7)
343

The order in which the script is written matters because, broadly, things need to be defined before they're used in Python. The functions square, cube, and read_number can only be called once they've already been defined. So, the last three statements in our script, which between them call all three functions, must appear at the bottom of our script. If they appeared before the definitions of the functions, the script would terminate with an error, because of an attempt to call a function that didn't yet exist.
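
The define-before-use rule can be seen in a short sketch. The function triple and the variable called_too_early are hypothetical; the try statement is there only so that the script can continue after the failed call, where a real script would simply terminate with the error.

```python
try:
    print(triple(4))        # triple hasn't been defined yet at this point
    called_too_early = False
except NameError:
    called_too_early = True
    print('triple is not defined yet')


def triple(n):
    return n * 3


print(triple(4))            # 12: the definition has now been executed
```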


Scope and scoping rules

So far, we've only seen Python functions whose bodies each contain a single return statement. While we will write one-line functions like these sometimes, many of the functions we write will be longer than that. All of the statements that we've seen so far in Python can be used within the body of a function, and functions can legally contain as many statements in their bodies as you'd like, which are subject to the same rules of control flow — if statements for conditionality, loops for repetition, and so on — that we've seen already. (The only "special" statement we've seen so far is return, which can only appear in the body of a function.)

Suppose that we wanted to write a function that asks a user to specify a person's first name and last name separately, then return that name formatted with the last name specified first, the first name specified afterward, and a comma separating them. And, for the sake of argument, let's assume that both the first and last name have to be non-empty — though, of course, there are people who don't have both. All of this actually entails a fair bit of complexity, so it would be worth writing a function to encapsulate it.

def read_name():
    while True:
        first_name = input('What is the first name? ').strip()

        if first_name == '':
            print("You'll need to enter a first name")
        else:
            last_name = input('What is the last name? ').strip()

            if last_name == '':
                print("You'll need to enter a last name after the first")
            else:
                return last_name + ', ' + first_name

If you wrote this function by itself in a Python script and then executed that script in IDLE, the function would be available to call, which might lead to an interaction like this in the Python shell.

>>> name = read_name()
What is the first name? Boo
What is the last name? 
You'll need to enter a last name after the first
What is the first name? Boo
What is the last name? Thornton
>>> name
'Thornton, Boo'

Notice that the body of the function read_name contained assignments to two variables: first_name and last_name. Based on what we've seen so far, it stands to reason that we should be able to obtain their values in the Python shell — whenever we've defined something in a Python script, we've been able to get to it in the Python shell after the script finishes executing. So, let's try it and see what happens.

>>> first_name
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    first_name
NameError: name 'first_name' is not defined

As usual, this error message turns out not to be an accident; it's indicative of some additional rules in Python that we've not seen yet. When we define a name in Python (such as when we assign a value to a variable or define a function), it is not necessarily available throughout the entire program. Not all names are defined globally; many of them are (purposefully) defined more restrictively than that. (This is an example of a broader principle we'll see throughout this course: Perhaps paradoxically, our programs become more flexible as we restrict the ways that each part of them can be used.)

Global and local scopes

In Python, each definition exists within a scope, which is the portion of the program in which that definition is available. Anything that is named (variables, functions, parameters, and so on) is subject to this rule.

A global definition is one that is made within a Python script (or within the Python shell), but not inside of a function. For example, consider the following Python script.

x = 10
y = 20
z = 30

def foo(a, b, c):
    parameter_sum = a + b + c
    return x + y + z + parameter_sum

In this script, the variables x, y, and z are part of the global scope, as is the function foo(). For that reason, we would expect that we could execute the script and then access x, y, z, and foo() from within the Python shell.

>>> x
10
>>> y + z
50
>>> foo(1, 2, 3)
66

This is because our shell interactions, too, are being made from within the global scope; we can access anything in the shell that is globally accessible.

By way of contrast, the function foo() contains some definitions that are in its local scope, which is to say that they are accessible only from within the function. The parameters a, b, and c, and the variable parameter_sum, are all local to foo(). Let's think about why that is.
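
A minimal sketch of what this means in practice, using a cut-down version of the script above: the local variable parameter_sum can be used inside foo, but it is gone once the call finishes. The variable leaked is hypothetical, used only to record whether the name was visible globally.

```python
x = 10


def foo(a, b, c):
    parameter_sum = a + b + c    # local: exists only while foo is running
    return x + parameter_sum     # x is global, so it can be read here


print(foo(1, 2, 3))              # 16

try:
    print(parameter_sum)         # not accessible outside of foo
    leaked = True
except NameError:
    leaked = False
    print('parameter_sum is not defined in the global scope')
```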

So, when we assign to a variable from within a function, we're assigning to a local variable, which is to say a variable that is defined in the function's local scope. Note that this is true even when we're talking about variables with the same name within and outside of a function.

q = 5

def example(n):
    q = n * n
    return q

print(example(q))
print(q)

If we executed the Python script above, the output would look like this:

25
5

This is because the assignment to q in the example() function does not affect the global variable q; it instead creates a new local variable q, which is local to the function example() and is separate from the global one defined previously. When the same name is defined in more than one scope, there are rules about which one Python will "prefer," which generally boil down to "Prefer things that are defined more closely to where you are."

There is one additional rule to be aware of, which might come as a surprise if you haven't thought it through carefully. Consider this Python script, which is similar to the previous one, but is not quite the same.

q = 5

def example(n):
    m = q * n
    q = m * n
    return q

print(example(q))
print(q)

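Consider what might happen if we executed this script. Knowing what we know about Python already, we might expect the first statement in example() to read the global q, so that the script would print 125 and then 5. Instead, here is what actually happens.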

Traceback (most recent call last):
  File "D:/Examples/32/scopes3.py", line 8, in <module>
    print(example(q))
  File "D:/Examples/32/scopes3.py", line 4, in example
    m = q * n
UnboundLocalError: local variable 'q' referenced before assignment

This outcome may seem a little bit perplexing, because the local variable q doesn't seem to exist yet. Its value will not be assigned until the second statement within the example() function. But Python first scans a function's body and determines all of the local variables it will need. They're all created at the beginning of the function's execution — albeit without values, which is what it means for them to be "unbound" — but they can't be used until a value has been assigned to them. So, in this case, we see an error message. You can't use a local variable until after you've assigned a value to it.
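
One way around the problem, sketched here as an illustration, is to avoid reusing the global name for a local variable. The function names broken and fixed, and the local name result, are hypothetical; broken repeats the function from the script above so the two can be compared side by side.

```python
q = 5


def broken(n):
    m = q * n       # q is treated as local here, because it is assigned below
    q = m * n
    return q


def fixed(n):
    m = q * n       # q is never assigned in this function, so this is the global q
    result = m * n
    return result


try:
    broken(q)
except UnboundLocalError:
    print('broken() failed before q was assigned')

print(fixed(q))     # 125
print(q)            # 5: the global q is unchanged
```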

Functions defined inside of functions

Functions don't have to be defined only in the global scope, though that's most often where you'll define them. It turns out that you can define them locally (within other functions) as well. Like local variables, this will render them usable only within the function in which they're defined. (In truth, it's not all that often that I use this technique; it's comparatively rare that I want to write a function this way. But, as we'll see later in the quarter, isolating functions can have its uses. For now, though, we'll focus on the effect this technique has on the scoping rules of Python.)

Suppose you wrote the following Python function.

def read_and_sum_numbers():
    def read_number():
        return int(input('Enter a number, or 0 to stop: '))

    total = 0

    while True:
        number = read_number()

        if number == 0:
            return total

        total += number

Here, we've written a function read_and_sum_numbers(), which reads a sequence of numbers from the user as input, then returns the sum of the numbers it read. Part of what it does is encapsulated in a smaller function read_number() inside of it, which is used to read a single number. Like anything else defined locally within a function, read_number() can only be called from within read_and_sum_numbers(), which we've done here.
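
The same idea can be sketched without any user input, which makes it easier to experiment with. The functions sum_of_squares and square_one are hypothetical, as is the variable visible_outside; the point is only that the inner function can be called inside the outer one, but not from the global scope.

```python
def sum_of_squares(numbers):
    def square_one(n):           # local to sum_of_squares
        return n * n

    total = 0

    for number in numbers:
        total += square_one(number)

    return total


print(sum_of_squares([1, 2, 3]))     # 14

try:
    square_one(4)                    # not defined in the global scope
    visible_outside = True
except NameError:
    visible_outside = False
```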

We've seen previously that names defined globally can be accessed from within a function, unless they're "hidden" by a more locally defined version of the same name. But what if there are multiple nested scopes, like we have here? Now we're ready to take a more complete look at how names are looked up in Python when we use them.

Python's name lookup rule: Local, Enclosing, Global, Built-in (LEGB)

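At any given time, there may be multiple scopes in play. For example, in the previous example, when we're within the function read_number(), there are three scopes whose names are potentially accessible: the local scope of read_number(), the scope of read_and_sum_numbers() that encloses it, and the global scope.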

Note that what makes a scope accessible is structural — it's a matter of which functions are defined within which other functions, not which functions have called which other functions.

When you use a name in a statement or expression, Python uses a simple rule that is sometimes abbreviated as LEGB to determine which name you've used. LEGB stands for Local, Enclosing, Global, Built-In and is the order in which Python looks for that name.
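
The lookup order can be sketched with one name, label, defined in several scopes at once. All of the names in this example are hypothetical; each function returns whichever definition Python finds first when searching Local, then Enclosing, then Global, then Built-in.

```python
label = 'global'


def outer():
    label = 'enclosing'

    def inner_local():
        label = 'local'
        return label            # found in the Local scope

    def inner_enclosing():
        return label            # not local, so found in the Enclosing scope

    return inner_local(), inner_enclosing()


def reads_global():
    return label                # not local or enclosing, so found in the Global scope


def reads_builtin():
    return len('Boo')           # len is not defined above, so found in the Built-in scope


print(outer())                  # ('local', 'enclosing')
print(reads_global())           # global
print(reads_builtin())          # 3
```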

For the most part, we will avoid assigning values to variables defined in scopes other than the local one, though we will feel free to read values from outer scopes (as we will feel free to have global constants — variables whose values we never intend to change — even if we won't use global variables whose values vary). Particularly as our programs grow larger, it is paramount that we be able to understand portions of them without having to consider the fine-grained details from other portions. Global variables are problematic because they essentially tie an entire program together; they're a detail that transcends our entire program, making it more difficult to keep different parts of the program isolated from the others. Since one of our key goals in this course is "leveling up" our skills, so that we can write dramatically larger programs than we could before, we'll stick to techniques that scale up properly.
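
As a rough sketch of that style, assuming a hypothetical pricing example: a global constant is read from within a function but never reassigned, while a value that changes is passed in as an argument and handed back as a return value, rather than being modified globally from inside the function. The names SALES_TAX_RATE, price_with_tax, and add_to_total are all invented for this illustration.

```python
SALES_TAX_RATE = 0.08            # a global constant: read in functions, never reassigned


def price_with_tax(price):
    return price * (1 + SALES_TAX_RATE)


def add_to_total(total, amount):
    # Rather than modifying a global total, accept it and return the new value.
    return total + amount


running_total = 0
running_total = add_to_total(running_total, price_with_tax(100))
print(round(running_total, 2))   # 108.0
```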