ICS 33 Spring 2025, Notes and Examples: Class Design

Background

So far in this course, we've spent a fair amount of our time exploring what makes a lot of Python's features tick, introducing ourselves to both the mechanisms that make them work, as well as the ways that we might customize those mechanisms to alter their normal behavior. As is so often the case, when we're concentrating so heavily on the details, it's difficult sometimes to see the big picture. (Even highly experienced engineers can fall victim to this; it can be difficult to think about the same problem at multiple levels of granularity at the same time.) You may have found the journey to be overwhelming at times, or you may have been excited about it every step of the way — some people take more of a natural interest in the small-picture details than others — but, either way, now is a wonderful time to ascend and look from above at what we've seen so far, because the investment we've made in plumbing Python's depths has reached the point where it'll begin to pay off. The power in what we've learned isn't in each individual detail; it's in the way their advantages compound when we combine them together.

So, we should spend some time considering bigger-picture problems that may have been out of our reach before, where we can combine the things we've learned to allow us to solve them. Now that we know so many individual techniques that are thought to be Pythonic, how do we design a Pythonic class? Once we understand that, then we'll recognize that many of the tools built by others — both the ones in the Python standard library and the ones you can obtain from third parties — use the same combinations of techniques. We'll begin to be able to recognize the qualities that "best of breed" tools have, so that we can select them when they exist instead of building our own, while still understanding enough about those techniques that the tools will feel familiar to us. (This is, in large part, what makes learning the details of your preferred programming language so important. It's not because we need every detail for every problem, but because it enables us to recognize patterns in existing designs, so that we can find our way around unfamiliar work much more quickly than we could before.)

To build that understanding, let's explore what we might call Pythonic class design. What should we be aiming for when we design a class? What should we be avoiding? How can we tell the difference between higher- and lower-quality solutions to the same problem? (In all fairness, the questions in this paragraph have answers that are at least partly a matter of taste, but there are some universal ideas worth exploring, even if reasonable people differ in some of their preferences. At least some of what we'll discuss here is my own opinion, but I'll attempt to offer alternatives where appropriate.)

Gradually refining the design of a Python class

When we engage in designing an individual software component, such as a class in Python, there are two simple but fundamental questions whose answers we can use to steer our decision-making.

What do we want to be able to say? When a Python class has been written, it gives us the ability to create objects that can solve a particular kind of problem. How do we want to be able to interact with those objects? We can call methods on them, store values in their attributes, use them in expressions with operators, pass them as arguments to functions or methods, and so on. Which of those things do we want to enable? What names should they have? What arguments or operands should they require? In short, what will it look like to use the tool correctly?
What do we want to prevent ourselves from saying? We've seen that a running Python program runs in a highly dynamic environment, in the sense that all kinds of things about a program — the values of attributes (and, therefore, their types), the methods of a class, the meaning of a function, and so on — can be changed while a program runs. Some of these changes are both necessary and productive; part of why we have variables, for example, is so their values can vary when we need them to. Other changes would certainly be program bugs, though; if our design depends on a value being immutable, then any attempt to change it would surely be a mistake. Preventing ourselves from writing things that are obviously incorrect, or at least making sure that programs fail at the point where things have begun to go wrong — rather than muddling through and hoping for the best — is an essential part of writing large programs successfully. In a highly dynamic environment where we can seemingly assign any kind of value anywhere we want, how do we ensure that we don't do things we shouldn't?

Relatively novice programmers rightly struggle with both of these, for the simple reason that they both rely on our sense of program smell, which is to say that the way we recognize a less-than-ideal solution is less a matter of noticing an obvious symptom like a syntax error or an immediate exception being raised, and more a matter of recognizing that the choices we've made now will lead to problems down the road. A missing feature won't be obviously missing until we come to the point where we need it. The presence of a misfeature — something that our designs should disallow but didn't, or that fits poorly alongside the other features of the tool we're building — won't be obvious until we've accidentally made the mistake we could have prevented, or try to use ill-fitting features in concert and discover them to be dissonant. These situations might occur much later in the process than when we wrote the original code; in a project involving multiple people, they're likely to be encountered first by someone other than the original author. Experience is the best teacher when it comes to recognizing mistakes like these, whose only impact is in the future; if you've been down more roads, you're likelier to make an educated guess about where the roads you're on will lead.

Still, all hope is not lost, even for novices. As long as we make ourselves aware of these questions and make sure we're asking them repeatedly along the way, we'll make better decisions than we would have made if we'd ignored them. If we have more experienced people to guide us, we can ask their opinion, rather than making a guess. As our experience grows and the snippets of advice from our mentors converge into the guiding principles we follow, the quality of our decisions will naturally improve. It's been more than forty years since I first typed in a program and saw it run, and I still feel like my skills improve every year, and that there's more to learn than I already know (or will ever know); that's a large part of why I enjoy this stuff so much.

Perhaps paradoxically, there's no better way to learn about software design than to design software, so we'll start with a very simple Python class and gradually refine its design, each time with our two guiding questions in mind: What do we want to be able to say, and what do we want to prevent ourselves from saying?

The problem

We'll begin with a small design problem, in which we have three requirements to follow.

We want to be able to store two pieces of information about people: their names and their birth dates.
There are many people whose information we want to store, and we'd like to be able to distinguish between them.
We don't want the information about a person to be changed once it's been specified.

The simplest thing that could possibly work

Software design luminary Ward Cunningham once described that a good design process begins with a question: What's the simplest thing that could possibly work? (The word "possibly" isn't just fluff; the idea is not to aim for a solution that you're sure is correct, just the simplest thing that might be.) If you can implement that, then you have something tangible to look at, think carefully about, test manually in a Python shell, and write unit tests to cover. One's reactions to those things become catalysts for future action. Maybe you keep what you wrote; maybe you throw it away. Maybe you keep some parts of it and replace others. But, at the very least, you're moving, even if you're not sure whether you're moving in the right direction yet, and sometimes getting oneself moving is half the battle.

So, what's the simplest thing that could possibly work for solving our problem?

If we need to store separate information for each person, then we need a class, a good name for which might be Person; that way, we can have many objects of that class, each object representing one person.
If we need to know two separate pieces of information about each person, then Person objects need two attributes, which might reasonably be called name and birthdate. These need to be attributes of each object rather than attributes of the class, because each person can have a different name and birthdate.
It needs to be possible to read the value of the name and birthdate attributes, but not to write to them.

What's the simplest thing that can do those things? It turns out that the Python standard library offers a solution you've probably seen before: a namedtuple, which you'll find in the module linked below.

person_step01.py

If we execute that module in a Python shell, we can create and interact with Person objects, and they're surprisingly capable, given that we only wrote two lines of code ourselves.

>>> from datetime import date
>>> Person
    <class '__main__.Person'>
                   # We didn't write a class ourselves, but Person is a class nonetheless.
>>> p = Person('Boo', date(2005, 11, 1))
                   # We can construct Person objects.
>>> p.name
    'Boo'          # We can obtain their attributes.
>>> p.birthdate
    datetime.date(2005, 11, 1)
>>> p.name = 'Alex'
    Traceback (most recent call last):
      ...
    AttributeError: can't set attribute
                   # We're disallowed from changing their attributes.
>>> p2 = Person('Boo', date(2005, 11, 1))
>>> p3 = Person('NotBoo', datetime(2020, 1, 1))
>>> p == p2
    True
>>> p == p3
    False          # We can compare Person objects for equivalence, not just identity.
>>> hash(p), hash(p2), hash(p3)
    (1431444366259735973, 1431444366259735973, 3919173570769142541)
                   # We can hash Person objects.

Given what we've learned this quarter, we have a better idea of the various mechanisms that might make namedtuples work the way they do, but rather than diving into those details, let's ask higher-level questions. Did we get what we wanted here?

We certainly got a lot of what we wanted. We now have a Person class, with each Person object storing a name and a birthdate. Its objects are hashable and can be compared for equivalence. From a standpoint of "What do we want to be able to say?", we've done pretty well, though one faint smell in the distance is the question of what we do if Person objects ever need to be more than just "dumb holders of data." What if they end up needing additional methods?
We did less well on the other measure: "What do we want to prevent ourselves from saying?" We did get the immutability we wanted, which prevents us from changing a Person's name or birthdate once it's created. But Person objects can be created with names that are empty, or that aren't strings. They can be created with birthdates that aren't dates at all. So, this side of the ledger is not looking as good.

Adding a requirement

Let's suppose that we've just been asked to add one feature: the ability to tell a person's age as of a particular date. Could we adapt our namedtuple to solve this problem? And, if we did, would we be satisfied with the outcome?

We can indeed add methods to a namedtuple, though we have to do so carefully.

person_step02.py

>>> from datetime import date
>>> p = Person('Boo', date(2005, 11, 1))
>>> p.age(date(2018, 11, 7))
    13
>>> p.age(date(2006, 1, 17))
    0

How satisfied we are with this outcome depends on whether we think this is going to be an out-of-the-ordinary request, or whether we think this Person class may grow a bit further. At this point, we've already got code that looks like something it isn't — we've built a class with a method in it, but you have to read our module fairly carefully to notice; the code's shape is no longer congruent with its meaning.

And, besides, we're still missing something else we want: the ability for the construction of a Person to be validated (i.e., an exception should be raised if we specify a name or birthdate that doesn't meet its requirements). We could try to contort our namedtuple further to try to solve this, but we're probably better off recognizing that its limitations are standing in our way. If the tools we're using don't solve our problems, it's time that we either find or build better tools.

Turning our implementation into its own class

Since we've decided that our design has outgrown the ability of namedtuples, we'll take a step back and re-implement our Person class by hand.

person_step03.py

>>> from datetime import date
>>> p = Person('Boo', date(2005, 11, 1))
>>> p.name
    'Boo'
>>> p.age(date(2018, 11, 7))
    13
>>> p.name = 'Alex'
        # Oops!  This wasn't supposed to be allowed!

Unfortunately, we've lost one thing that we decided was important to us, which is the immutability of Person objects. The immutability of namedtuples isn't free; namedtuples have to enforce it, since Python's default is that we can assign values into attributes of any object at any time.

Re-establishing immutability

One technique we know to re-establish immutability is to store the name and birthdate in protected attributes, then to implement name and birthdate methods that return their values. It'll still be possible to assign to the protected attributes, since adding an underscore to an attribute's name doesn't imply any kind of enforcement, but at least programs that respect the Pythonic convention of protected attributes will benefit.

person_step04.py

>>> from datetime import date
>>> p = Person('Boo', date(2005, 11, 1))
>>> p.name(), p.birthdate()
    'Boo', datetime.date(2005, 11, 1)
>>> p.age(date(2018, 11, 7))
    13

So, we could stop here, but there's something worth considering. The Python standard library is filled with classes whose attributes can be accessed but not modified, such as the date class we've been using here. In fact, the usual way objects expose their values in Python is via attributes, whether they're mutable or not.

>>> d = date(2005, 11, 1)
>>> d.year
    2005       # year is an attribute, not a method
>>> d.year = 2018
    Traceback (most recent call last)
      ...
    AttributeError: attribute 'year' of 'datetime.date' objects is not writable

If we're aiming to write a class that offers the same feel as the ones that are typically written in Python, we'll want name and birthdate to be accessed as attributes, but, because of our requirements, we'll want them to be immutable. To do that, we'll need to understand a little bit more about how to control the way that attributes are accessed.

Taking more control over attribute access

Ordinarily, when we ask for the value of an object's attribute, it's (potentially) a three-step process.

First, check the object's dictionary. If there's a key matching the attribute's name, return the corresponding value.
If the object's dictionary has no such key, try looking in the dictionary belonging to the class instead. If there's a key matching the attribute's name, return the corresponding value.
Otherwise, raise an AttributeError.

Between the second and third steps, though, one more thing happens: If the class has a __getattr__ method, it's called instead, and whatever it returns is our result. In other words, a class with a __getattr__ method has customized what happens when we access an attribute that's missing in the dictionary belonging to an object and the various class dictionaries that are usually checked.

>>> class Thing:
...     def __getattr__(self, name):
...         return name[::-1]
...
>>> t = Thing()
>>> t.abc = 'Boo!'
>>> t.abc
    'Boo!'     # Accessing an attribute that's stored in t's dictionary gives us that value.
>>> t.fgh
    'hgf'      # Otherwise, we get the value returned by __getattr__ instead.
>>> t.aabbcc
    'ccbbaa'   # That means there are no attributes this object doesn't have, since
               # Thing's __getattr__ method returns a value no matter what the name
               # of the attribute is.

It's rarely the case that you'd ever want to build a class that exposed an effectively infinite set of pseudo-attributes like this, but now that we understand the mechanism, we can use it to solve the problem at hand. What if we wrote a __getattr__ method in our Person class capable of giving us a customized value for name and birthdate?

person_step05.py

>>> from datetime import date
>>> p = Person('Boo', date(2005, 11, 1))
>>> p.name
    'Boo'
>>> p.birthdate
    datetime.date(2005, 11, 1)
>>> p.missing
    Traceback (most recent call last):
      ...
    AttributeError: Person object has no attribute 'missing'
>>> p.name = 'Alex'
               # This is where our solution still needs work.  This should be disallowed.
>>> p.__dict__
    {'_values': ('Boo', datetime.date(2005, 11, 1)), 'name': 'Alex'}
>>> p.name
    'Alex'     # Once name is an attribute of the object, it wins over what our
               # __getattr__ method would have returned.

This is a step in the right direction, but now we need to take control over the other side of the equation, too. If you can customize what happens when you get the value of an attribute, presumably you can customize what happens when you change it, too.

Re-establishing immutability (again)

Just as there's a __getattr__ dunder method that allows us to customize the behavior of getting an attribute's value, there are also __setattr__ and __delattr__ methods that provide similar hooks for setting the value of an attribute or deleting an attribute, respectively. Unlike __getattr__, __setattr__ and __delattr__ are called (when present) whether an attribute exists in the object's dictionary or not. Otherwise, the idea is pretty similar; we're just tweaking a slightly different part of a similar process.

By adding __setattr__ and __delattr__ methods to our Person class, we can re-establish the immutability of the name and birthdate attributes again.

person_step06.py

>>> from datetime import date
>>> p = Person('Boo', date(2005, 11, 1))
>>> p.name
    'Boo'
>>> p.birthdate
    datetime.date(2005, 11, 1)
>>> p.name = 'Alex'
    Traceback (most recent call last):
      ...
    AttributeError: Attribute 'name' of Person object cannot be assigned
>>> del p.name
    Traceback (most recent call last):
      ...
    AttributeError: Attribute 'name' of Person object cannot be deleted

This is all looking pretty good; it looks like we've got immutability nailed down again. But, thinking ahead a bit, if we needed to write many classes like this, it would be a shame to have to implement this rote pattern — including three carefully-implemented dunder methods — every time. It would be better if we could generalize the idea of the immutability of an attribute, so we could reuse it anywhere we need it.

Using descriptors to generalize immutability

Amidst our discussion of Decorators, we saw that descriptors can be used to influence what happens when an attribute is accessed. That's similar to what we did with __getattr__, __setattr__, and __delattr__, but there's a key difference. When an attribute's value is itself a descriptor, it customizes its own value, which means the attribute's class won't need to handle it; it'll be automatic.

Where that technique is useful is when the same kind of customization needs to be done across many classes, so that we don't have to reimplement that same customization repeatedly. We've seen that functions automatically turn themselves into methods in every class, which happens because functions are descriptors; that way, methods in every class can have that special ability, without every class having to provide it. The idea that we'd want an attribute's value to be immutable is similarly a common one, so it would make sense for us to implement in a way that it could be reused across classes.

What if we implement a class named ImmutableValue, which is a descriptor that enforces the immutability of an attribute's value? If we then store ImmutableValue objects into the name and birthdate attributes of our Person class, their __get__, __set__, and __delete__ methods will be called automatically whenever we attempt to access, modify, or delete them. The resulting design would look like this.

person_step07.py

>>> from datetime import date
>>> p = Person('Boo', date(2005, 11, 1))
>>> p.name, p.birthdate
    ('Boo', datetime.date(2005, 11, 1))
                 # We can obtain the values from the attributes.
>>> Person.name
    <__main__.ImmutableValue object at 0x000002313DF2DFF0>
                 # The class attributes are actually ImmutableValue objects.
>>> p.__dict__
    {'_name': 'Boo', '_birthdate': datetime.date(2005, 11, 1)}
                 # The actual values can be found in protected attributes, but the
                 # ImmutableValue.__get__ method will find them for us.
>>> p.name = 'Alex'
    Traceback (most recent call last):
      ...
    AttributeError: Cannot assign to an immutable attribute
                 # Meanwhile, the attributes are immutable, as we'd like them to be.

Automatically determining the underlying attribute names

One small point of friction in our Person class was the need to tell each ImmutableValue the name of the underlying attribute, which led to this unfortunate-looking code.

class Person:
    name = ImmutableValue('_name')
    age = ImmutableValue('_age')

When things are formulaic, it's better for them to be automated, so what we'd like to be able to say is this instead.

class Person:
    name = ImmutableValue()
    age = ImmutableValue()

The trick is finding a way for the ImmutableValues to know the names of the attributes into which they've been stored. As it turns out, this can be automatic; when a descriptor is stored in a class attribute, its __set_name__ method is called (if it has one), which tells it both the class and the name of the attribute into which it's being stored. That leads to a small but beautiful simplification.

person_step08.py

Our design is coming together nicely, but it's requiring us to venture further and further into the weeds, embracing things like overriding attribute accesses, descriptors, and other low-level techniques that, while they are powerful, can be difficult to get right.

Perhaps now would be a good time to stop and think about whether the problem we're solving is actually a novel one. There's a lot of value in solving problems from first principles when we're learning a programming language, but once our focus is on getting things done, we should evaluate whether we're re-inventing the wheel. Immutable attributes seem like the kind of problem for which Python or its standard library might already offer a solution.

Using properties instead of manually enforcing immutability

A property of a class is an automatically-managed value associated with each object of that class. When accessed, properties look and feel like attributes, but they can do whatever you'd like when their values are accessed, modified, or deleted. In that sense, properties are a lot like the ImmutableValue descriptor we just wrote, except they provide us these abilities without the need for us to write a descriptor ourselves.

The built-in Python function property is a decorator that transforms a no-argument method (i.e., one that has a self parameter but no others) into a property, where its value is determined by calling the method behind the scenes.

>>> class Thing:
...     @property
...     def name(self):
...         return 'Boo!'
...
>>> t = Thing()
>>> t.name
    'Boo!'
>>> t.name = 'Alex'
    Traceback (most recent call last):
      ...
    AttributeError: can't set attribute 'name'

Rather than seeing something like this as unexplainable magic, let's take a look at the mechanism that makes this happen, because it's based entirely around things we've seen already.

>>> Thing.name
    <property object at 0x000002599A1A8EF0>
                 # Our name method was entirely replaced by a property object.
                 # That's not as weird as it sounds: Decorators replace what they
                 # decorate with a new value, and there's no requirement that it
                 # be a value of the same type.
>>> dir(Thing.name)
    ['__class__', '__delattr__', '__delete__', '__dir__', '__doc__', '__eq__',
     '__format__', '__ge__', '__get__', '__getattribute__', '__gt__',
     '__hash__', '__init__', '__init_subclass__', '__isabstractmethod__',
     '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
     '__repr__', '__set__', '__set_name__', '__setattr__', '__sizeof__',
     '__str__', '__subclasshook__', 'deleter', 'fdel', 'fget', 'fset',
     'getter', 'setter']
                 # We don't need to scrutinize every detail here, but the important
                 # thing to notice is that properties have __get__, __set__,
                 # and __delete__ methods.  Properties are descriptors, just like
                 # our ImmutableValue is.
>>> Thing.name.__get__(t)
    'Boo!'
>>> Thing.name.fget
    <function Thing.name at 0x00002599A1DD7E0>
                 # This is our original name method, which was decorated and became
                 # a property.  That's how the property's __get__ method knows how
                 # to call our method.

Having looked at those details, we can be fairly satisfied that properties are a built-in Python solution to the problem that we were already trying to solve. When we can rely on built-in solutions, we win in two ways.

We don't have to spend time building, testing, and maintaining things that are already built, tested, and maintained.
Other people who already know Python will recognize built-in techniques more quickly than they can read our bespoke solutions and figure out what they do.

So, it seems wise for us to continue in this direction. Our previous attempt, using our own ImmutableValue class, taught us some things that are important, but we'll leave it behind now. Let's use properties to re-write our Person class.

person_step09.py

>>> from datetime import date
>>> p = Person('Boo', date(2005, 11, 1))
>>> p.name
    'Boo'
>>> p.birthdate
    datetime.date(2005, 11, 1)
>>> p.name = 'Alex'
    Traceback (most recent call last):
      ...
    AttributeError: can't set attribute 'name'

It looks like we're right back where we were, but we've removed most of the complexity, allowing Python's built-in property feature to do the heavy lifting for us.

Clarifying our design a bit

As our Person class evolves, we can imagine that we might later want to add more attributes to it. If so, the syntax we're using to construct a Person will start to become cumbersome. What we're saying to construct a Person presently is this.

Person('Boo', date(2005, 11, 1))

Requiring the arguments to be specified with keywords would be a good way to make this clearer.

Person(name = 'Boo', birthdate = date(2005, 11, 1))

One could reasonably argue that keyword arguments were already possible — name and birthdate aren't positional-only parameters, after all — but there's something to be said for enforcing that the code that uses a tool be written hygenically.

Speaking of hygiene, we might also consider adding type annotations where appropriate, so let's take care of that, too.

Solving these problems turns out to be straightforward, because we only need to change the signatures of our methods accordingly; nothing else needs to change.

person_step10.py

>>> from datetime import date
>>> p = Person('Boo', date(2005, 11, 1))
    Traceback (most recent call last):
      ...
    TypeError: Person.__init__() takes 1 positional argument but 3 were given
>>> p = Person(name = 'Boo', birthdate = date(2005, 11, 1))
>>> p.name, p.birthdate
    ('Boo', datetime.date(2005, 11, 1))

Our Person class has gotten a lot simpler over the last couple of iterations, but it's hard to escape the idea that there may be other aspects of this design for which built-in automation exists. What could be more common than a class whose objects have a few immutable attributes, can be compared for equivalence, hashed, and provide additional methods with some problem-specific logic in them? Aside from that one age method, everything else in our class is purely formulaic.

Dataclasses

A dataclass in Python is one that is primarily focused on storing values in a collection of fields. Each field has a name and a type annotation specifying its expected type, with objects of the resulting class appearing to have an attribute corresponding to each field.

>>> from dataclasses import dataclass
             # Dataclasses are implemented in Python's standard library, so we  need to
             # import them if we want to use them.
>>> @dataclass
... class Thing:
...     a: int
...     b: int
...          # Decorating a class with the @dataclass decorator causes it to be
             # turned into a dataclass automatically.
>>> t = Thing(11, 1)
             # We can construct a Thing by specifying values for its fields.
>>> t.a, t.b
    (11, 1)  # We can access attributes to obtain the values of the fields.
>>> t.a = 17
>>> t.a
    17       # Fields are mutable by default, though that default can be changed.
>>> Thing(11, 1) == Thing(11, 1)
    True     # Objects of dataclasses can be compared for equality, which means that
             # all of their fields are equal.
>>> Thing(11, 1) == Thing(1, 17)
    False
>>> hash(t)
    Traceback (most recent call last):
      ...
    TypeError: unhashable type: 'Thing'
             # Objects of dataclasses aren't hashable unless they're immutable (as they
             # shouldn't be).  But if we mark them as immutable and they can be compared
             # for equality, they will also be hashable automatically.

We've seen many of the mechanisms on which dataclasses are built: class decorators, descriptors, and a variety of dunder methods come together to make them possible. Many aspects of dataclasses can be configured, the details of which are described in PEP 557 (which introduced this Python feature) and the Python standard library documentation for the dataclasses module, both of which are linked below.

But let's use dataclasses to rewrite our Person class again. Given how many things dataclasses can do automatically, we may be able to boil it all the way down to its essence.

person_step11.py

>>> p = Person(name = 'Boo', birthdate = date(2005, 11, 1))
>>> p.name, p.birthdate
    ('Boo', datetime.date(2005, 11, 1))
>>> p.name = 'Alex'
    Traceback (most recent call last):
      ...
    dataclasses.FrozenInstanceError: cannot assign to field 'name'
>>> p.age(date(2018, 11, 7))
    13

So, have we reached nirvana? Not quite. For example, we can still create a Person object with values in its fields that aren't valid with respect to our assumptions about their types.

>>> bogus = Person(name = 13, birthdate = 'Hello!')
>>> bogus.name, bogus.birthdate
    (13, 'Hello!')

Adding validation

Fortunately, if what we want is validation in an immutable dataclass, there's one more knob we can turn to good effect. We don't write an __init__ method in a dataclass, since one is provided automatically, but the provided one does play a handy trick: If there's a __post_init__ method in the class, the provided __init__ method calls it after it initializes all of the fields. We can do anything we'd like in __post_init__, which means we could validate those field values and, if they're invalid in some way, we could raise an exception.

person_step12.py

>>> p = Person(name = '', birthdate = date(2005, 11, 1))
    Traceback (most recent call last):
      ...
    ValueError: name cannot be an empty string
>>> p = Person(name = 13, birthdate = date(2005, 11, 1))
    Traceback (most recent call last):
      ...
    ValueError: name must be a string, but was int
>>> p = Person(name = 'Boo', birthdate = 13)
    Traceback (most recent call last):
      ...
    ValueError: birthdate must be a date, but was int
                  # Invalid inputs are now rejected, as we'd like them to be.
>>> p = Person(name = 'Boo', birthdate = date(2005, 11, 1))
>>> p.name, p.birthdate
    ('Boo', datetime.date(2005, 11, 1))
                  # Valid ones work just like they did before.

At this point, we can be pretty pleased with where we are. Here's what we have:

Person objects store a name and a birthdate.
The name of a Person must be a non-empty string.
The birthdate of a Person must be a date.
A Person is immutable and hashable.
Two Person objects can be compared for equality.
A Person can calculate its age as of a particular date.
Constructing a Person object requires passing keyword arguments to its constructor, to make clear which arguments correspond to which fields.
Accessing a Person's attributes can be used with a Pythonic attribute-access syntax, rather than calling methods.

All of that fit into a grand total of 20 lines of code, when we count neither blank lines nor comments. To be fair, there's a lot of code at work that we didn't have to write, but that's neither here nor there. In terms of the effort required to build and maintain our code — which is most of what we care about, ultimately — we ended up with 20 lines. The rest is a matter of us standing on the shoulders of giants.

Contrasting properties and dataclasses

Having seen both properties and dataclasses in the previous example, it may appear that using dataclasses is an unvarnished win, as they seem to provide a simpler way of doing the same thing that properties can do. This is certainly true for the class that we built in that example, which is why we landed on dataclasses as our ultimate choice. But let's be sure we understand that while these two features of Python overlap slightly, there are differences.

Property setters and deleters

Properties can do more than just replace the name of an attribute with a method that can return its value; such a method is sometimes called a getter or a property getter. Importantly, it's also possible to add both setter and deleter methods to a property, which allows us to control what happens when someone assigns a value into the attribute or attempts to delete it.

>>> class Thing:
...     def __init__(self, value):
...         self._value = value
...     @property
...     def value(self):
...         return self._value
...
>>> t = Thing(13)
>>> t.value
    13
>>> t.value = 17
    Traceback (most recent call last):
      ...
    AttributeError: can't set attribute 'value'
               # This exception is caused by there not being a setter method for the
               # value property.  Whether this is a good or bad thing depends
               # on the design of one's class.
>>> class MutablePositiveThing:
...     def __init__(self, value):
...         self.value = value
...     @property
...     def value(self):
...         return self._value
...     @value.setter
...     def value(self, new_value):
...         if new_value <= 0:
...             raise ValueError('cannot be non-positive')
...         self._value = new_value
...
>>> m = MutablePositiveThing(13)
>>> m.value
    13
>>> m.value = 17
>>> m.value
    17
>>> m.value = -3
    Traceback (most recent call last):
      ...
    ValueError: cannot be non-positive
               # Our validation has protected us.
>>> m2 = MutablePositiveThing(0)
    Traceback (most recent call last):
      ...
    ValueError: cannot be non-positive
               # Because we assigned to self.value in the __init__ method,
               # it protected us here, too.

That @value.setter syntax is a little strange, but here's what's happening there.

After the first def value statement has been executed, its @property decorator has turned MutablePositiveThing.value into a property object rather than a method.
A property object has a setter attribute that is itself a decorator that can turn another method into a setter method and associate it with the property. That's what we did with the second value method.

Given a setter method with validation in it, such as the way that our MutablePositiveThing.value setter is requiring its argument to be positive, we can prevent objects with these properties from ever having invalid values assigned into their attributes. If we use those same property setters in the __init__ method, we can also prevent them from being initialized in an invalid way.

A @value.deleter decorator could be added to what appears to be a third value method, which would add a deleter method into the mix (i.e., a method that would be called when an attempt is made to delete the value attribute from a MutablePositiveThing object), though this technique is less common than getters or setters; it's much more often the case that we don't want to be able to delete an attribute from an object, which is the default when we don't provide a deleter.

The limitations of dataclasses

Dataclasses automatically turn the type-annotated attributes of a class into a somewhat more feature-rich version of a namedtuple. But, by way of contrast with using properties, dataclasses provide no mechanism for customizing what happens when the values of its fields are read, modified, or deleted, which offers no ability to validate the values of its fields except for a __post_init__ method that does one-time validation of the object just after it's been initialized. For a frozen dataclass consisting only of immutable fields, as in our previous example, that limitation is something we can live with. But if any field's value can change afterward, there's no built-in way to prevent it from becoming invalid later.

As usual, when there are two ways to solve a similar problem, one way is not definitively better than the other. They offer a tradeoff — dataclasses give us simplicity, at the cost of being more limited. When we can live within those limits, the simplicity is great; when we need features beyond those limits, we need something less simple.