ICS H32 Fall 2025, Exercise Set 5 Solutions

Problem 1

The crux of this question is to understand the basic mechanism used to communicate over a computer network. We don't need to be experts, but we do need to understand a couple of things. In particular, we need to understand that information sent over a computer network passes through many "hops" between the initial sender and the ultimate receiver: it's sent from one machine to another, which forwards it to another, and then another, and so on, until it finally reaches its destination. And, of course, any of those machines along the way could potentially store that information, introduce any changes they want before forwarding it, forward it to someone other than who it was intended for, or even refuse to forward it altogether. We simply can't be sure what happens between our machine and the destination machine. (Think of it like a courier service moving packages and letters from place to place, but independent operators are handling each leg of the journey separately, so every 20 or 30 miles, the package is given to someone new to carry for a while; all it takes is one of those operators being untrustworthy to cause all kinds of problems.)

Encryption is usually a pretty simple idea for people to understand; if you encrypt information, it means that it can be read and understood only by someone who holds the "keys" with which it was encrypted, meaning that any intermediary who intercepts the information, but who doesn't have the keys, won't be able to make any useful sense out of it. That's useful protection, indeed; virtually everything sent over the Internet passes through some intermediaries before reaching its final destination.

Authentication, which is what this question is about, is a little different. Even if we believe we've connected to a particular host or IP address, we can't be sure that we actually have. Just because we connect to www.bankofamerica.com doesn't mean that that's actually who is sending back the response, because we could instead be receiving a response from one of the intermediaries. Authentication is how we can be relatively certain that what we've received is actually coming from Bank of America and not someone else. But, of course, authentication takes time at the outset of each connection; a certificate has to be sent, its chain of trust has to be verified, and so on. So, it's not without cost.
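To make the distinction between encryption and authentication a little more concrete, here's a minimal sketch using Python's standard ssl module. It isn't part of the exercise, and the variable names are mine; it only shows that the two protections are configured separately, so it's possible to have an encrypted connection that does no authentication at all.

```python
import ssl

# The default client-side context both encrypts the connection and
# authenticates the server: it requires a certificate and checks that
# the certificate matches the hostname we asked for.
authenticated = ssl.create_default_context()

# A context that still encrypts, but skips authentication entirely.
# Hostname checking has to be turned off before certificate
# verification can be disabled, or Python raises a ValueError.
unauthenticated = ssl.create_default_context()
unauthenticated.check_hostname = False
unauthenticated.verify_mode = ssl.CERT_NONE

print(authenticated.verify_mode, authenticated.check_hostname)
print(unauthenticated.verify_mode, unauthenticated.check_hostname)
```

With the second context, an eavesdropper still couldn't read the traffic, but we'd have no assurance about who is on the other end of the connection.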

With that backstory in place, here are the answers to the two questions.

We might prefer to use HTTPE in a scenario where we're in complete control of all of the network infrastructure between us and the "other end" of the connection (e.g., all within the same physical building, which we control), so that we know that there are no nefarious intermediaries in between. We might also prefer it in a scenario where we simply don't care who the other end of the connection is (e.g., when we're passive consumers of information), though that's less often than you might think. If you're trying to find out the news of the day, don't you want to know that you're receiving it from the source you think you are? If you're obtaining data about air quality, so you can decide whether to go outside and take a walk, don't you want to know that it's not doctored data?
We might prefer to use HTTPS in almost any other scenario, where the cost of the authentication is outweighed by the benefits. If we're going to use the open Internet at all, then we have little control over precisely how information will be sent or received, and even if we know the usual route it might take, that route can change quietly and without warning. In that case, it would quite rarely be the case that we'd not like to be sure that we're talking to who we think we are, even in what might seem like low-stakes circumstances like passive web browsing.

Problem 2

Below is the link to a solution, which is a little more full-featured than what I asked you to write, because I'm attempting to handle invalid data files.

problem2solution.py

Problem 3

The choice of a JSON-based format for the data files would almost certainly have made the problem dramatically easier to solve, because so much of the effort was spent in parsing the bespoke format that I designed specifically for the problem. That's not to say that specially designed formats are necessarily a bad thing, but they're best suited to situations where the kinds of information we're describing have some kind of unusual complexity in them. But what was our format in Problem 2, ultimately?

A string describing a location's title.
A multi-line string describing a location's description.
A mapping between commands and the resulting locations they lead to.
An optional tag specifying that the game is over when that location is reached.

So, how might we design a format for that same information using JSON? How about this?


{
    "title": "Hallway",
    "description": [
        "You are in an empty hallway, stretching in both directions, with white",
        "walls, white tile floors, and white ceiling tiles.  It feels vaguely",
        "like a hospital here.",
        "",
        "You can go north or south from here."
    ],
    "commands": [
        {
            "destination": "outside_office",
            "phrases": ["N", "NORTH"]
        },
        {
            "destination": "elevator",
            "phrases": ["S", "SOUTH"]
        }
    ],
    "isGameOver": false
}

Now, this is a little bit clunkier for the author of the data files to type, but think about what it would do to our program. In fact, instead of thinking about it, let's take a look. Here's an alternative solution to the same problem, but where that JSON format is used instead of the custom format that I designed before.

problem2solution_json.py

The game is identical, except a lot of fiddly code that dealt with parsing strings is gone. The nice thing about standard formats is that a function like json.loads handles an awful lot of the heavy lifting for us. And if I weren't trying to handle invalid inputs delicately, a lot of what's left would be gone, too. (That, too, could have been dealt with using libraries, albeit not a module in Python's standard library. There are ways to specify a schema that describes the structure we expect in JSON text: "I expect an object with an isGameOver field that is a boolean value," for example. Feed that schema and JSON text to an appropriate library and it'll produce a sensible error message whenever the JSON text doesn't match the schema.)
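To give a rough sense of how little parsing is left once json.loads does its work, here's a minimal sketch. It is not the linked solution; the function name parse_location and the shape of the dictionary it returns are my own choices for illustration, the description is shortened, and the sketch assumes the JSON text is valid and matches the format shown above.

```python
import json

# A shortened location in the JSON format described above.
LOCATION_TEXT = '''
{
    "title": "Hallway",
    "description": [
        "You are in an empty hallway.",
        "",
        "You can go north or south from here."
    ],
    "commands": [
        {"destination": "outside_office", "phrases": ["N", "NORTH"]},
        {"destination": "elevator", "phrases": ["S", "SOUTH"]}
    ],
    "isGameOver": false
}
'''


def parse_location(text: str) -> dict:
    """Turn the JSON text for one location into a simpler dictionary.

    Assumes the text is valid JSON in the expected format; no error
    handling is attempted here.
    """
    data = json.loads(text)

    # Flatten the list of commands into one phrase -> destination lookup.
    destinations = {
        phrase: command['destination']
        for command in data['commands']
        for phrase in command['phrases']
    }

    return {
        'title': data['title'],
        'description': '\n'.join(data['description']),
        'destinations': destinations,
        'is_game_over': data['isGameOver'],
    }


location = parse_location(LOCATION_TEXT)
print(location['title'])
print(location['destinations']['NORTH'])
```

Everything that used to be string-splitting and line-counting has been reduced to dictionary lookups and one comprehension.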

Give this version of the game a try. There's a JSON-based version of Ants with this base URL:

https://www.ics.uci.edu/~thornton/icsh32/Exercises/Set5/AntsJSON/

Problem 4

Text has many different encodings in practice, for lots of reasons. Some of those reasons are historical: Older systems might use encodings that were more prominent in times gone by. Others are based on need: Some encodings handle certain kinds of characters (e.g., certain printed languages) better than others. While UTF-8 is a pretty common encoding these days, it's certainly not the only encoding in use around the world — and it doesn't necessarily turn out to be the default encoding that Python uses for text, since Python often uses operating system defaults to decide what its default is — so it's best for us to specify an encoding.
As a practical matter, it would be better for us not to assume a particular encoding, as we've done, unless we're interacting with a web API whose documentation makes absolutely clear that we can assume a particular one. Text itself can't tell us how it's encoded, but HTTP responses have headers and those headers can tell us more about the data we're getting back. One commonly occurring header is called Content-Type, whose job is to tell us roughly what kind of content to expect; it's not uncommon for that header to tell us what the encoding is when we're receiving text.
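Here's a minimal sketch of reading the encoding out of a Content-Type header and using it to decode a response body. The function names and the UTF-8 fallback are my own assumptions, not something the exercise requires. It relies on the standard library's email.message.Message class, which knows how to pick apart headers of this kind.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = 'utf-8') -> str:
    """Return the charset named in a Content-Type header value,
    or the given default if the header doesn't name one."""
    message = Message()
    message['Content-Type'] = content_type
    return message.get_content_charset(default)


def decode_body(body: bytes, content_type: str) -> str:
    """Decode response bytes using the encoding the header specifies."""
    return body.decode(charset_from_content_type(content_type))


print(charset_from_content_type('application/json; charset=UTF-8'))
print(charset_from_content_type('text/plain'))
print(decode_body('café'.encode('latin-1'), 'text/plain; charset=ISO-8859-1'))
```

When a response comes from urllib.request.urlopen, its headers attribute is already an object of a related type, so response.headers.get_content_charset() does the same job without any manual parsing.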