ICS 33 Fall 2024
Notes and Examples: Abstract Base Classes


Background

Earlier this quarter, we re-acquainted ourselves with the quirky term duck typing that you might have seen in previous coursework. Duck typing is used to describe the way that Python allows you to deduce what an object can (and can't) do based on the presence (or absence) of the attributes that conform to a protocol. What makes it possible for an object to participate in its own initialization is the presence of an __init__ method; what makes an object be sized is the presence of a __len__ method; and so on. "If an object walks like a duck and quacks like a duck, it's a duck." So, if we need an object to do a job for us, we call the method we need, and, assuming it's an object that knows how to do that job, everything happens the way we expect. Any sized object, for example, can tell us its length and will automatically do so in a way that's appropriate to its type.

We've looked in some detail, too, at the underlying mechanisms in Python that make this work, by discussing and experimenting with how attribute lookup is done, as well as how inheritance influences its outcome. So, by now, we have a pretty good handle on the flexibility that duck typing offers us in Python, and we understand why and how it works.

But this particular coin has a flip side, because we have more flexibility than we might first consider. Not only do we have the flexibility to do things correctly; we also have the flexibility to make mistakes, yet there may be some distance between the point at which the mistake is made and the point at which it surfaces. When we ask for the length of a sized object, the right thing happens automatically; that's unequivocally good. However, there are two ways for things to go wrong, too.

The distance between the point where our code is incorrect and the point where the error surfaces is caused by the fact that our usage of protocols in Python has been implicit, which is to say that a class is designed to conform to a protocol only by defining the necessary attributes. We don't say "the objects of this class are sized" in some way; in fact, the word "sized" never appears in our code at all. Instead, we just play by the right set of rules — making sure we write a __len__ method that behaves as it should — and that makes our objects sized automatically. Automation has its upsides, but it can also have downsides; in this case, there can be a quietly unacknowledged disconnect between what we wanted and what we actually said.

An alternative way to implement a protocol would be an explicit one, in which we would specifically say, in the code that defines our class, something to the effect of "the objects of this class are intended to be sized." In other words, "sized" becomes an idea that's named in our program, with the relationships between classes and the protocols they implement also becoming a part of our program, rather than a looser notion that we keep in our heads. When an intention is stated explicitly in a program, that provides the human benefit of someone being able to read it and infer the author's intent. Additionally, though, it can provide the ability for these intentions to be validated at or near the point where we specified them, so that a mismatch between what we intended and what we wrote becomes an error that we can find and fix more easily.

In some programming languages, such as Java or C#, an explicit approach is the only way to implement the equivalent of a class that conforms to a protocol. Some programming languages, with JavaScript being a popularly used example, offer an implicit technique without an explicit one. Python, interestingly, offers a choice. In Python, we can continue doing what we've been doing and implement protocols in an implicit way, relishing the flexibility and terseness that duck typing offers while tolerating the way it postpones failure handling. But we also have the option of implementing protocols more explicitly, by specifying them directly within our classes and checking for their presence in an object before relying on them. Let's see how to be more explicit about our intentions, and take a look at the design tradeoffs between explicit and implicit approaches.


How abstract base classes establish and enforce intent

An abstract base class in Python is one that describes the requirement that some other class has a particular collection of methods without specifying precisely what those methods do. In other words, they offer us a way to describe a protocol in Python programmatically, by specifying exactly which methods must be present in order to conform to that protocol. As we'll see, by describing these things programmatically, they also provide the ability for them to be checked programmatically.

The collections.abc module in Python's standard library provides a number of abstract base classes that describe the various protocols that are common to the ways that we interact with objects in Python. For example, collections.abc.Sized is an abstract base class that describes the notion of a class whose objects are sized, by specifying that the class must have a __len__ method. You can find a list of the abstract base classes provided by the collections.abc module, many of which you'll recognize from our previous conversations, below.

Similarly, the numbers module in the standard library provides abstract base classes that describe the features of numeric types, to give us a way to refer to them generally rather than specifically.

But what do we do with an abstract base class? How do we associate our class with it, and what changes about our class as a result? The answer lies in a Python feature we've already seen: inheritance. If we want to specify explicitly that objects of our class are sized, we inherit our class from collections.abc.Sized.

person_sized.py (click here for a commented version)

import collections.abc

_DAYS_PER_YEAR = 365

class Person(collections.abc.Sized):
    def __init__(self, name, age):
        self._name = name
        self._age = age

    @property
    def name(self):
        return self._name

    @property
    def age(self):
        return self._age

    def __len__(self):
        return self._age * _DAYS_PER_YEAR

If we execute the preceding module, we get exactly what we might expect.


>>> p = Person('Boo', 13)
>>> len(p)
    4745             # Person objects are sized, with their length determined by
                     # their __len__ method.

But, strangely, we would have seen the same result even if we hadn't derived our Person class from collections.abc.Sized. What makes objects sized is the presence of the __len__ method, not the fact that we've inherited from collections.abc.Sized. So, other than the readability benefit of having explicitly stated "a Person is sized," what good is collections.abc.Sized?

One immediate benefit is that collections.abc.Sized knows the basic requirement that a __len__ method is necessary. Consequently, it can enforce this requirement for us automatically. In other words, if we say "objects of this class are sized," the absence of a __len__ method can be an error at the point where the object is first created, rather than seeing a downstream failure when we try to obtain its length later.


>>> class NotReallySized(collections.abc.Sized):
...     pass
...
>>> NotReallySized()
    Traceback (most recent call last):
      ...
    TypeError: Can't instantiate abstract class NotReallySized with abstract method __len__
                   # Creating an object of our NotReallySized class fails
                   # with an error message telling us what's missing.

Making it impossible to have invalid objects beats coping with the impact of having invalid ones bouncing around within a running program. Either way, there is an exception to be handled, but this exception, at least, is happening at the point where something has actually gone wrong: creating an object that isn't able to do what it claims. (There is also the potential to turn this into an error that can be caught before the program runs, though we'll leave that topic for another day.)

The other clear benefit of an abstract base class is that it establishes a name that can be used to describe all objects that have its characteristics, which gives us a straightforward way to ask a question like "Is this object sized?", even if the object's class doesn't inherit from collections.abc.Sized.


>>> p = Person('Boo', 13)
>>> isinstance(p, collections.abc.Sized)
    True            # Objects that inherit from collections.abc.Sized are sized.
>>> class NotSized:
...     pass
...
>>> isinstance(NotSized(), collections.abc.Sized)
    False           # Objects without __len__ methods are not sized.
>>> class ImplicitlySized:
...     def __len__(self):
...         return 0
...
>>> isinstance(ImplicitlySized(), collections.abc.Sized)
    True            # Objects with __len__ methods are sized, even if they don't
                    # derive from collections.abc.Sized.

The latter of these benefits leads to a slightly different way to interact with Python programs than we're used to seeing, so let's look at it in more detail before proceeding.


Goose typing

Thus far, when we've interacted with Python objects, we've done so by relying on the technique known as duck typing. In other words, we've called a method on an object to ask it do some job for us. If it knows how to do that job, that's great! In that case, the right thing happens. If it doesn't, an exception is raised, which we can then handle in whatever way (and in whatever place) we think is appropriate.

Another way to think about duck typing is in terms of its relationship to another common phrase that you sometimes hear people say, both within and outside of programming circles: "It's better to ask for forgiveness than permission." So, rather than checking whether it's possible to call some method on some object, we just try it and deal with the fallout afterward if it turned out to be problematic.

Of course, there's an opposite technique that we could employ. Rather than the "trying and hoping" of duck typing, we could instead verify in some way that an object was capable of performing a job, then only if it was verified affirmatively would we actually do it. This technique has come to be known in the Python community as goose typing. (The choice of the name "goose typing" is one for which I've yet to find a rationale.) Like duck typing, goose typing is done while the program runs, but allows us to be explicit about what our needs are, as well as giving us the ability to react to an object's lack of a feature in some way other than an exception of a fairly broad type like AttributeError being raised.

Aside from the fact that we'd never considered this before, what stood in our way of employing a technique like goose typing before was the difficulty of doing it properly. It's not a matter of asking whether an object has a particular type; it's a matter of asking what an object can do. Before we had abstract base classes, how would we have verified that our Person objects were sized? Our only option would be to check for the presence of a __len__ attribute directly.


>>> p = Person('Boo', 13)
>>> hasattr(p, '__len__')
    True        # We could verify that the __len__ attribute is present this way.

For a protocol as simple as the one used to allow an object to be sized, maybe that's fine, though it's certainly possible to mistype '__len__'. To be complete about it, we might also want to be sure that __len__'s value is callable, which adds some more complexity to the problem.


>>> hasattr(p, '__len__') and callable(p.__len__)
    True

And, of course, for protocols made up of multiple methods, this is a clunky technique indeed — and surely not one that we'd want to repeat in many places throughout a large program.

Now that we have a definitive name for the concept of objects being sized, though, we have a short and recognizable way to ask the question, which may hide a fair amount of complexity from us, and which looks a lot like other type-related checking we might do in Python.


>>> isinstance(p, collections.abc.Sized)
    True
>>> isinstance(13, numbers.Number)
    True            # Integers are numbers.
>>> isinstance(13, numbers.Real)
    True            # Integers are real numbers, since they can be converted to floats
                    # (more or less) without information loss.
>>> isinstance(13.015625, numbers.Integral)
    False           # Floats are not integral, though.

How an abstract base class like collections.abc.Sized or numbers.Integral answers these questions can be simple or complex, but, either way, it's no longer our problem to implement those details by hand. And, should those details ever change in the future, the abstract base classes will change accordingly, with our calls to isinstance picking up those changes automatically. We might reasonably expect no future changes to long-standing protocols in Python like the one used to determine an object's length, but if we apply these same concepts to our own protocols that are part of our custom designs — where we expect more evolution over time, especially early in a program's lifetime — it would be very useful indeed to nail these details down in one place, so they could be modified in one place when we encounter the need to change them.


Writing our own abstract base classes

Like almost every Python feature we've seen in this course, abstract base classes aren't magical devices available only to Python's designers. We can also write our own abstract base classes in Python, which will give us a way to explicitly describe our own protocols and allow for their automatic enforcement, just the way that the abstract base classes we saw in Python's standard library do for Python's built-in protocols. In order to do it, there are surprisingly few things we'll need to learn, because most of the machinery is implemented in easy-to-use tools like base classes and decorators that can be found in the Python standard library.

Let's suppose that we've decided to write several classes that solve variations of a problem that we'll loosely call cleaning, which we'll say is the transformation of one string into another for the purposes of making it safer to use for some purpose. There are many reasons you might want to clean a string — to remove problematic characters or words from it, to prevent it from exceeding a length limit, to prevent it from being empty — but, abstractly, these are all shades of the same color. If we implemented many different cleaners in a similar way, we could use them interchangeably, combine them into new aggregate cleaners, and so on. We could describe their similarity using an abstract base class, with the same benefits we saw previously.

An abstract base class is written just like any other class, except that there are two additional things we need to do.

In the file linked below, you'll find an abstract base class StringCleaner that describes our protocol, along with a handful of implementations of it. Read through that code — including the comments — before proceeding, and note that it's all been built with a goose typing mindset, which is to say that the cleaners verify properties of their inputs, so they can fail immediately upon misuse. (Now that we've accumulated the abilities to enforce our requirements, we should be applying them wherever applicable.)

Now, let's take our new set of tools for a test drive, both to see that they work as we expected, and also to see that they provide the safety nets we aimed for in their design.


>>> NonLetterRemover().clean('ABCDEF123456GHIJKL7890')
    'ABCDEFGHIJKL'          # NonLetterRemovers remove characters that aren't letters.
>>> TruncatingCleaner(3).clean('Boo is happy today')
    'Boo'                   # TruncatingCleaners make strings shorter if they're too long.
>>> TruncatingCleaner(5).clean('Buddy Boy')
    'Buddy'                 # Each TruncatingCleaner has a separate limit.
>>> TruncatingCleaner(10).clean('Boo')
    'Boo'                   # TruncatingCleaners have no effect on strings shorter
                            # than the limit.
>>> TruncatingCleaner('Boo')
    Traceback (most recent call last):
      ...
    ValueError: length_limit must be an integer, but was Boo
>>> StringCleanerSequence((NonLetterRemover(), TruncatingCleaner(6))).clean('ABCDE12345FGHIJ67890')
    'ABCDEF'                # StringCleanerSequences apply a sequence of other cleaners.
>>> StringCleanerSequence((13, 'Boo'))
    Traceback (most recent call last):
      ...
    ValueError: All of the cleaners must implement StringCleaner
                            # StringCleanerSequences cannot be created with non-cleaners.

All in all, we should be pretty pleased with the outcome here. Our StringCleaner abstract base class and the concrete cleaners we derived from it have one of my favorite basic characteristics a software tool can have: They're easy to use correctly and difficult to use incorrectly. (It's certainly true that Python won't stop us from misusing them until the program runs — that's inherent in Python's dynamically typed nature — but at least the misuse is called out immediately when it happens, so that the location of the error will be more closely related to the location of the underlying problem.)

Those are the basics of abstract base classes, but we'll close with a few additional odds and ends that are worth being aware of.

How goose typing helps with naming collisions

When we use our protocols implicitly via duck typing, one problem we face is that two different protocols may include a method with the same name. Our StringCleaner protocol, for example, is centered around a single method with the fairly generic name of clean, which means there's a reasonable chance — especially in a large-sized program — that there will be another protocol that includes a method named clean, but that intends for it to be used for a completely different purpose. Our design would benefit from some way to differentiate between an object that has a clean method because it's a StringCleaner and one that just happens to have a clean method serving some other purpose. Abstract base classes provide that mechanism, and using goose typing allows us to verify this before we depend on it.


>>> class HasClean:
...     def clean(self):
...         return 'Boo!'
...
>>> hasattr(HasClean(), 'clean')
    True          # A HasClean object has a clean attribute.
>>> isinstance(HasClean(), StringCleaner)
    False         # But that doesn't mean that a HasClean object is a StringCleaner.
>>> StringCleanerSequence((HasClean(),))
    Traceback (most recent call last):
      ...
    ValueError: All of the cleaners must implement StringCleaner
                  # This means our StringCleanerSequence won't be fooled, which is good.

Abstract properties in an abstract base class

Methods aren't the only thing that can be specified as abstract in an abstract base class. We can also do the same with properties, though the syntax requires some attention to detail, since we need to use two decorators and their order turns out to be important.


class HasName(abc.ABC):
    @property
    @abc.abstractmethod
    def name(self):
        raise NotImplementedError


class Person(HasName):
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        return self._name

Given those two classes, we can use them in the way you'd expect.


>>> p = Person('Boo')
>>> isinstance(p, HasName)
    True
>>> p.name
    'Boo'

Registration of virtual subclasses

When all of the classes in a program are written by us, we have the ability to define abstract base classes that specify our protocols, derive other classes from them when those protocols are applicable to them, and consequently have full control over which classes implement which protocols. But what do we do when we want an existing class — one for which we don't have the ability to update its code, such as a class in the Python standard library or a third-party library — to be compatible with one of our protocols? The answer to that question lies in a feature called the registration of virtual subclasses.

An abstract base class has a class method — a job that we ask a class to do, instead of an individual object of that class to do — named register. We pass it another class X as an argument, and it instructs the abstract base class that X should be considered to be a virtual subclass of the abstract base class. All that means is that we'll pretend like X implements the necessary protocol, whether it does or not.


>>> class Xyz:
...     pass
...
>>> HasName.register(Xyz)
    <class '__main__.Xyz'>
                   # A little weirdly, register returns the class we passed it,
                   # but that trick has a purpose.  It allows us to use the
                   # HasName.register method as a class decorator.
>>> isinstance(Xyz(), HasName)
    True           # Now Xyz objects are considered to implement HasName.

It's rarely the case that you'll want to use register to establish things that aren't actually true — you'd only want to do this for a class that really does have the necessary methods implemented, or you're setting yourself up for difficult-to-debug problems down the road — but being able to register pre-existing classes with new protocols can be a powerful way to stitch libraries together that weren't designed to be used alongside one another. These kinds of techniques are in the category of things you only use if you need them, but this kind of flexibility, when used judiciously, can sometimes make all the difference.