ICS 45C Spring 2022
Notes and Examples: C++ Standard Library
Includes a code example with the moniker StandardLibrary
Background
Like most industrial-strength programming languages, C++ provides a standard library of tools that augment what is built directly into the language. We've already used some parts of the C++ Standard Library in our work this quarter, such as I/O tools like those declared in the <iostream> header, the std::string type, and a small handful of others. But the library is more extensive than the parts we've seen, so this example begins exploring parts of the library we've yet to see.
One of the core parts of the C++ Standard Library is a set of commonly-useful data structures and algorithms, which are some of the most immediately valuable pieces of the library when you want to write programs in C++ more quickly; broadly, these are sometimes called the Standard Template Library (STL), though this name is mostly a historical curiosity nowadays. The need for common data structures like array-based sequences (known in C++ as vectors), linked lists (known in C++ as lists or forward_lists), or for common algorithms such as finding the first object in a collection that satisfies some requirement or summing the numbers in a collection of numbers, is pervasive; it's hard to imagine very many interesting programs that don't require some kind of data structure or a common algorithm. So C++ provides a collection of these right out of the box — and that collection has gotten larger in recent years, as C++ has undergone its C++11, C++14, and C++17 standardization efforts, and it will continue to grow as new standards like C++20 are completed and implemented.
In this example, we focus our efforts on the data structures and algorithms in the C++ Standard Library. Rather than exhaustively demonstrating all of them, we aim to show the general design that they have in common with one another. The C++ Standard Library provides its data structures and algorithms in a portion of the library that is split, roughly, into three interlocking parts: containers, generic algorithms, and iterators. Once you understand how each of these parts work, and once you understand how they fit together, you'll find it much easier to learn the individual details when you need them.
Containers
The C++ Standard Library provides a set of what are called containers (or data structures), some examples of which include:
While each of these containers supports a slightly different set of operations, there are similarities wherever such similarities make sense, so you'll find that learning about one of them will make it easier to learn about the others.
Interestingly, inheritance is not used to enforce the commonalities between the various containers; in other words, not all container classes inherit from a common base class the way they do in some other object-oriented implementations (such as the Java Collections Framework that's included in Java's standard library). Instead, the containers are built substantially out of C++ templates, which are able to relate types that have syntactically similar features (e.g., public member functions with the same name or similarly overloaded operators) even if they don't inherit from the same base class.
The containers are type-generic, meaning that different vectors can be specified to hold different kinds of elements, though each one will be specified to hold a particular kind and, like everything else in C++, this constraint will be enforced by the compiler. The syntax for this is generally driven by type parameters — which become part of the container's type — listed within angle brackets. So, for example, a std::vector<int> is a vector in which each element is an integer. We say, then, that std::vector is a class template — a blueprint from which many classes can be created — that takes type parameters. These type parameters are not optional; each time you use the std::vector template (e.g., by declaring a variable that stores a vector), it will be necessary for the compiler to know specifically what type of element will be stored in that vector. In other words, there's no such thing as a std::vector; there are only std::vector<int>, std::vector<std::string>, and so on. Each of these separate instantiations of the std::vector template are separate, incompatible types; they're similar but not the same, so you can't assign, say, a std::vector<int> into a std::vector<std::string>.
All of the containers are "well-behaved" in the ways that we've talked about before: They manage their own memory, clean that memory up when they die, can be copied (e.g., passed by value) and subsequent changes to the copy will not affect the original, can be const and still allow you to use them in a read-only fashion, and so on. That the containers are well-behaved means that you can use the containers as member variables in your classes and not worry about writing a "Big Three"; when these member variables are copied or destroyed, the right thing will happen behind the scenes automatically.
More details about std::vector
To start to get a feel for the design of the containers in the C++ Standard Library, let's take a quick look at the std::vector container.
A std::vector stores a sequence of objects. By sequence, I mean that the objects appear, conceptually, to be stored one after another, and that the order of these objects is considered important. If we iterate over the objects in the container, then, we'll get them back in that same order. Like any container, using it effectively requires knowing enough about its underlying implementation that we understand its costs and benefits. In the case of a vector, the important thing to know is that it stores its elements in an array, just like our ArrayList implementation in a previous example. While that array isn't directly accessible to us — one of the key jobs of a vector is to manage that array automatically — it's the reality, and our understanding of a vector's performance is built on our understanding of how arrays work.
Since a vector's elements are stored, behind the scenes, in a dynamically-allocated array, some things become immediately apparent:
Another thing to understand is that in a vector with capacity c that's currently storing n elements, we would say that the size of the vector is n. In other words, the size of a vector is determined by how many things we're currently storing in it; its capacity is determined by how many things we can store in it without running out of space in the array. Note, too, that vectors behave the way our ArrayList class did when you try to add an element and they're out of space: A new array is allocated and the elements are copied from the original array into the new one, then the original array is deleted. This doesn't happen often and, in fact, one of the guarantees that vector makes is that this cost will be spread evenly across a long sequence of adds (i.e., adding to the end of a vector n times will take a total of O(n) time, even though a few of the operations along the way will be much more expensive than the others because of the reallocations).
Creating a vector is as simple as obtaining its declaration — by including a standard header called <vector> — and declaring a variable of the appropriate type.
std::vector<int> v;
Adding a new element to the end of the vector, in the cell immediately after the last one that contains an element, can be done by calling the member function push_back.
std::vector<int> v; for (unsigned int i = 0; i < 10; i++) { v.push_back(i); }
You can also initialize a vector to contain a predefined sequence of elements, using the same initializer list syntax we saw when we learned about structs previously. The objects in the initializer list become the vector's elements, while the size of the vector is the number of objects in the initializer list.
std::vector<int> v{1, 2, 3, 4, 5};
An initializer list used to construct a vector will always be treated this way, but it's worth knowing that you can also call other vector constructors that take other parameters, but you'll need to use an alternate syntax (parentheses instead of curly braces) to differentiate it from the "initializer list" syntax. (It should be noted, too, that using parentheses to call constructors is always legal, though the curly-brace-based syntax has emerged in recent C++ standards as the "standard" way to initialize something, with the parentheses-based notation used as a fallback to differentiate it when initializer lists have a special meaning, like in this case.) For example, this variant creates a vector containing 10 elements that all have the value 100.
std::vector<int> v(10, 100);
Individual elements are accessible using the same [ ] (indexing) operator that you've seen previously when accessing elements of an array. This operator has been overloaded to do the appropriate thing on a vector.
for (unsigned int i = 0; i < 10; i++) { v[i] += 10; std::cout << v[i] << std::endl; }
Just like arrays, the [ ] operator is not bounds-checked, which means that it misbehaves in the same way that arrays do if you access a cell that doesn't exist (i.e., you get undefined behavior, which might include reading from or writing to memory that's not part of the underlying array). However, there is a bounds-checked alternative: a member function called at(), which throws an exception when an attempt is made to access an element outside of the boundaries of the vector's size (i.e., outside of the index range [0, size-1]). Other than the bounds checking, at() is just like the [ ] operator, including the ability to use it to write to a cell.
for (unsigned int i = 0; i < 10; i++) { v.at(i) += 10; std::cout << v.at(i) << std::endl; }
If it looks strange to you that you can assign to the result of calling a function, this isn't as crazy as it looks. The at() member function returns a reference to the cell in question, as opposed to a copy of it, so that assigning to the result of calling at() is actually assigning directly into a cell of the array.
Of course, if the vector is const, you won't be able to assign into its cells, since that would constitute a change to the vector's publicly observable state; in fact, anything that changes any element of the vector will be disallowed.
void foo(const std::vector<int>& v) { v.push_back(15); // illegal v[3] = 50; // illegal v.at(3) = 50; // illegal for (unsigned int i = 0; i < v.size(); i++) // legal { std::cout << v.at(i) << std::endl; // legal } }
The loop at the end of the foo() function above is entirely legal, because it only uses the vector in ways that don't change it: asking it for its size and reading the value of one of its elements. (Notice that the at() member function can be either legal or illegal in the case of a const vector, depending on how you use the result. This is because the const version of this member function returns a reference to a cell of the array that is itself const-protected.)
Vectors are "well-behaved" in the way that we've understood the term previously. That's a good thing, in general, because we can assume that it'll clean up its own memory and so on. However, it's also important to realize that this is not without cost sometimes. Any time you copy a vector, you're copying the entire vector, including all of its elements; if it's a large vector (imagine, for example, a vector of 1000 strings), this can be quite an expensive operation. It's wise to recognize, and quite often avoid, operations like this that have such high cost; it's an easy performance-related mistake to make before you realize how important it is.
// This function takes a vector by value, which means it will be copied -- along // with all of its elements -- on the way into the function. That can make a // call to this function quite expensive! void blah(std::vector<std::string> v) { // ... } std::vector<int> v; // ... std::vector<int> w = v; // copies v into w v = w; // copies w back into v
What happens when you don't specify a vector's element type
As we've seen, whenever you declare a variable or a parameter that is a vector, it's necessary to specify what type of element will be stored in that vector. There's no such thing as a std::vector; there are only std::vector<int>, std::vector<std::string>, and so on. The compiler needs to know the element type of every vector and the best way to make sure of that is to specify it directly.
That said, in the most recent C++ standard (C++17), a feature was introduced that allows you to leave that detail out of a declaration like this (i.e., to use a class template without specifying its arguments), but only if the compiler can unambiguously deduce what you must have wanted. For example, the following variable declaration turns out to be legal, but it doesn't mean what you might think.
std::vector v1{1, 2, 3, 4, 5};
What is the type of v1? It turns out to be std::vector<int>, even though we didn't say so explicitly, because the compiler was able to deduce from our initializer list that we wanted it to contain integers.
So, how about this one?
std::vector v2{"hello", "there"};
You might expect that one to be deduced as std::vector<std::string>, but it's actually something else: std::vector<const char*>, because string literals in C++ are implemented as pointers to arrays of constant characters. (The reason you can initialize a std::string with a string literal is because std::string has a constructor capable of doing the appropriate conversion.)
And, of course, if the compiler has no way to deduce what we want, it won't allow us to leave out the element type of a std::vector at all.
std::vector v3; // illegal void foo(std::vector v4) // also illegal { // ... }
The moral of this story is that C++ compilers have ways to deduce our types for us, but these features weren't added to the language to save us from having to type as much. They were actually added to provide flexibility in cases where it's difficult to say what we want, such as when we're writing our own templates and want something more flexible than a hard-coded single type. It's important to remember that your programs aren't intended only to be consumed by a compiler; they're also intended to be read by human beings. std::vector<int> simply says more than std::vector does. In short, we don't want to leave things out of a program that would make it easier for a person to read and understand.
Generic algorithms
In addition to containers, the C++ Standard Library provides a set of generic algorithms, which generalize commonly-occurring operations that you might like to perform, such as these:
Algorithms such as these would typically manifest themselves in your programs as loops or more complex functions that you might find yourself having to write. The generic algorithms in the C++ Standard Library remove the need to write a lot of these kinds of things, shortening and clarifying the programs you write, so that you can not only write them more quickly, but also understand them better when you read them. The name of an algorithm reads more clearly than its implementation details, and this is especially true when you combine more than one of them.
The algorithms generally are not aware of what kind of container they're operating on, which means they generally work on all of the containers in the standard library (except for those for which they don't make sense, such as sorting the values in a std::unordered_set). What's more, if you write a container implementation yourself, you can apply the generic algorithms in the standard library to your container, provided that you follow the typical C++ design for a container. Similarly, if you write a generic algorithm yourself that's designed like the ones in the standard library, you can apply it to any of the library's containers.
So how does an algorithm operate on the values in a container without depending on — or even being aware of — what kind of container it is? The answer lies in the third of the interlocking parts: iterators.
Iterators
In the C++ Standard Library, iterators are the glue between the containers and the generic algorithms. An iterator is an abstraction for a position in a container, whose main role is the provide access to the value stored at that position (e.g., in a particular cell of a std::vector), while hiding all of the details about the container's underlying implementation.
Given an iterator, a generic algorithm can access (and potentially modify) the values stored in a container without having to know what kind of container it is, meaning that existing generic algorithms will work with new containers you build, and that new generic algorithms can easily work with the existing containers that have the right characteristics. This idea makes for quite a flexible arrangement, which depends only on agreeing on how to manipulate iterators.
Syntactically, iterators behave a lot like pointers. In some cases, that's what they actually are; in others, they're more complex. But they support the same basic operators either way.
Iterators support whichever operations are sensible for the kind of container they're iterating, and they are categorized on the basis of which operations they support. A few examples of those categorizations are:
Depending on what kind of container you're iterating, you'll get a different kind of iterator back. For example, std::vector provides random-access iterators, because the array that underlies a vector provides constant-time access to any cell given its index. On the other hand, std::list (a doubly-linked list) would provide only a bidirectional iterator, and std::forward_list (a singly-linked list) would provide only a forward iterator, because other operations would be too expensive for an algorithm to reasonably rely on.
Using iterators to traverse a container
Recall a loop that we wrote previously to iterate over an array using a technique called pointer arithmetic.
int a[10]; for (int* p = a; p != a + 10; p++) { *p = 0; }
Recall what's being done here.
While one could argue that this is a fairly obtuse piece of syntax, it turns out to be important to understand it, because iterators in the C++ Standard Library work the same way. Iterators behave a lot like pointers do. We use * to dereference them, ++ to advance them, and calculate "begin" and "end" positions as a way to traverse the elements. The only difference is the types at work.
std::vector<std::string> s; // ... for (std::vector<std::string>::iterator i = s.begin(); i != s.end(); i++) { std::cout << *i << std::endl; }
And since iterators behave like pointers, we can also use the -> operator to access members of the values they point to, just like pointers.
for (std::vector<std::string>::iterator i = s.begin(); i != s.end(); i++) { std::cout << i->length() << std::endl; }
The "auto" keyword and type inference
The technique above is pleasant enough, once you get used to it. But there's one thing about it that's really an eyesore: std::vector<std::string>::iterator is one heck of a type name! And, as you might imagine, this is the tip of the iceberg; as the container types get more complex, so do the iterator types. This, for example, is a legal iterator type for a more complex kind of container: std::map<std::string, std::vector<unsigned int>>::iterator.
The problem, in general, is that the type information itself is somewhat complex. Because different kinds of containers have different, incompatible types of iterators, the types have to communicate enough information to differentiate one from another. While the name of the type is long, none of it is noise; it's all meaningful! But, of course, this leads us to have to say these things in all their complexity, which makes our programs harder to write and harder to read. Statically-typed languages like C++ generally require every declaration to be given a type at compile time, but what do we do when the types themselves have names that are long and complex?
There are a lot of scenarios where a compiler, given contextual information surrounding a declaration, could unequivocally determine exactly what the right type for a declaration would be, without you having to say so. If, for example, you call begin() on a std::vector<std::string>, the compiler is well aware — because of the declaration of the begin() member function that was included already from the <vector> header — that the result will be a std::vector<std::string>::iterator. Beginning in C++11, C++ compilers include a new feature called type inference, which allows the compiler to deduce the proper type for you automatically, instead of you having to say it. The way you ask for this behavior is simple: specify the type as auto.
for (auto i = s.begin(); i != s.end(); i++) { std::cout << i->length() << std::endl; }
Note that this doesn't mean that i has no type, nor does it mean that i can store anything you want, nor that its type will be determined at run time. Just as it did in our original example, i has a compile-time type of std::vector<std::string>::iterator, and we still won't be allowed to assign a value of any other type into i. All of the same type checking will be done the same way. The only thing auto means here is that we'd like the compiler to deduce the type of i for us automatically, so we don't have to write it.
There's at least some controversy in the broader C++ community about when you should and shouldn't use the auto keyword, and a lot of the benefit doesn't arise until we learn more C++ features (most notably, templates, where we want to write one chunk of code that is capable of working with wildly different types), but it's a good time to discuss this feature now, since it can, at the very least, simplify your uses of the C++ Standard Library.
Iterators over "const" containers
We've seen previously that you can declare a container to be const, meaning that it's possible to read from it, but not to alter it. But how does const interact with iterators, which we've seen are capable of both reading from and writing to the values that they point to.
Let's consider a function that prints all of the elements in a vector of strings.
void printAll(const std::vector<std::string>& v) { // What would be the appropriate type for i? for (??? i = v.begin(); i != v.end(); i++) { std::cout << *i << std::endl; } }
As we've seen, the word const can be used in different places and mean different things. Let's think about some possible ways to write the correct type for i.
Unfortunately, there isn't anywhere else we could legally add the keyword const to this type declaration to mean something different. But there is a way to say what we want to say here: use the type const_iterator instead of the type iterator. A const_iterator is one that can be used to read values from a container but not change them. So, the proper type for i would be std::vector<std::string>::const_iterator.
void printAll(const std::vector<std::string>& v) { for (std::vector<std::string>::const_iterator i = v.begin(); i != v.end(); i++) { std::cout << *i << std::endl; } }
Note, too, that the auto keyword would deduce this automatically, because of the vector's constness, making this code equivalent. (It's in matters like this where auto really shines.)
void printAll(const std::vector<std::string>& v) { for (auto i = v.begin(); i != v.end(); i++) { std::cout << *i << std::endl; } }
Range-based "for" loops
You may have noticed that there is a recurring pattern in these examples:
In this pattern, the only reason we need the iterator is to track our position as we iterate; it's an implementation detail of our pattern, but is otherwise boilerplate. For this reason, another kind of for loop — called a range-based for loop — was introduced into C++. The range-based for loop automates precisely this pattern.
std::vector<int> v{1, 2, 3, 4, 5}; for (int i : v) { std::cout << i << std::endl; }
Behind the scenes, the range-based for loop is turned into this equivalent:
std::vector<int> v{1, 2, 3, 4, 5}; for (std::vector<int>::iterator v_iterator = v.begin(); v_iterator != v.end(); v_iterator++) { int i = *v_iterator; std::cout << i << std::endl; }
In other words, the iterator will still be there, but we won't be the ones to have to manage it. All of the boilerplate — the call to begin(), the comparison to end(), the incrementing, and obtaining the value where the iterator points — is automated. It should be noted, too, that the range-based for loop doesn't work only on vectors; it works on anything that looks like a C++ Standard Library container (i.e., it has begin() and end() that return iterators).
As with the use of auto that we saw previously, constness is handled properly and automatically. And we can even avoid copying the vector's elements into our loop variable by using techniques like const references.
void printAll(const std::vector<std::string>& v) { for (const std::string& s : v) { std::cout << s << std::endl; } }
Of course, if we need the iterator for some reason, then we'll have to do things the old-fashioned way. But, in my experience, I don't end up needing the iterators very often. And, in fact, I don't even write all that many of the range-based for loops for common algorithms, because of what we'll see next.
Using iterators with generic algorithms
Most of the generic algorithms in the C++ Standard Library are meant to operate on a range of values, so there needs to be a way for them to access the various values in that range. However, the goal is for the algorithms to be implemented once and work on many different kinds of containers. It shouldn't matter whether you're using a std::vector, a std::list, or a std::map; if it's possible to iterate through it, many of the generic algorithms should be legal to use on it. Examples of generic algorithms include:
There are many others; the above list is just a small sampling of what's available. But the important thing is that none of these algorithms needs to be specific to a particular container; they should work on lots of different kinds of containers. But, to do that, they need a way to access the values in that container without knowing how it's implemented. That's no problem; that's what iterators do!
The generic algorithms generally take a range of values by accepting two parameters: a begin iterator and an end iterator, with the begin iterator specifying the first input value and the end iterator specifying the value just after the last input value. Among other things, this means that the begin() and end() member functions on the various containers provide a nice way to pass the entire container into a generic algorithm.
For example, suppose we wanted to print all of the values in a std::vector<std::string> to the standard output, one per line. The std::for_each algorithm will make that very simple, indeed.
#include <algorithm> // This is where the generic algorithms are declared #include <string> #include <vector> void printStringLine(const std::string& s) { std::cout << s << std::endl; } // ... std::vector<std::string> v; // ... std::for_each(v.begin(), v.end(), printStringLine);
The first two arguments to std::for_each specify the range of values that we want to operate on; in this case, we're passing the entire contents of the vector v by using the begin() and end() member functions. The third argument specifies the function that we want to apply to every value; in this case, we've passed our own printStringLine function, which prints a single string value on a line by itself.
Note, too, that the generic algorithms that accept functions as arguments actually accept function objects, which means that we can pass anything that can be called as a function, including not only functions that we've written, but also arbitrary objects with overloaded function call operators, including the ones built by lambdas.
std::vector<std::string> v; // ... std::for_each( v.begin(), v.end(), [](const std::string& s) { std::cout << s << std::endl; });
Finding out more
After working through this example, it's not a bad idea to spend a little bit of time experimenting with a few of the containers and generic algorithms not discussed here, such as the std::map and std::list containers and the std::find and std::sort algorithms, so you can get a sense of how they work.
A lot of good information can be found online but, as usual, the trick is to separate what's useful from what's worthless. I've found that a good place to start is cppreference.com, which I find to be the most comprehensive and well-written reference documentation for the C++ Standard Library online. (You may disagree and, in general, that's fine. Use what you'd like. But if you're not sure where to start, use cppreference.com.) This will give you an idea of what's available, and will show you the details of how it works underneath, though you might also want to look in other places to see concrete examples of these details in action.
The code
The official moniker for this code example is StandardLibrary, so your best bet is to do this:
Alternatively, you can click the link to the tarball below: