                              for Statements

Python has a general for loop statement that we use in a variety of
programming contexts. In this lecture, we will learn and explore how to use a
for loop to iterate over characters in a string, integers in a range, and
lines in a file (each line is a string, and we will learn some string methods
that are useful in this context). As we learn about new types in Python (e.g.,
tuples, lists, sets, and dicts) we will find that all of these types allow us
to use a for loop to iterate over their values as well.

For loops are called definite loops, because when they execute the number of
iterations is known and there is a guarantee that the loop will terminate:
i.e., they know the amount of data that they will process, so they know the
number of iterations they will perform: all the characters in a string; all
the numbers in a range; all the lines in a file. In the next lecture we will
study while loops, which are called indefinite loops: at the time they start
they do not know how many iterations they will perform (and if written
incorrectly, they might never terminate: what we call an infinite loop). While
loops use a boolean condition to determine when to terminate.

Loops are powerful constructs. There is little that we've programmed up to
this point in the course that we couldn't do more easily by hand or with a
pocket calculator. But once we know about loops, it is easy to solve problems
on the computer that involve lots of data, which we could solve only clumsily
with simpler tools.

The EBNF for a for_statement is quite simple to state (compared to the
repetition and options in the EBNF for if_statement). Note that like an if
statement, its first line ends with a : indicating that a block (indented)
follows. Here is the simplest version of the EBNF for a for statement.

  for_statement <= for identifier in iterable: block

Note the block in a for statement is typically called the body of the for loop.
Typically the body will involve some statement(s) that refer to the
identifier, which is just a name bound to different values in the iterable
(see below).

The power of this language feature comes from the many objects in Python that
are classified as iterable objects. For now we will identify three categories
of iterable objects, with many more to come. Much of what we will learn (and
use) in Python relates to executing for loops on iterable objects.

  iterable <= str | range | open | ...more later

Semantically, Python executes a for_statement as follows

  (1) Evaluate the iterable
  (2) Repeatedly execute block with identifier bound to every successive
      value in the iterable

We can describe the semantics of this loop in a bit more detail as follows

  (1) Evaluate the iterable
  (2) If there is a first value in the iterable, bind identifier to the first
      value and execute block
  (3) If there is a second value in the iterable, bind identifier to the
      second value and execute block
  ....
  (?) When there is a last value in the iterable, bind identifier to the last
      value and execute block, then the loop terminates because there are no
      more values in the iterable

Finally, we can describe these semantics in a more compact/loop-like way.

  (1) Evaluate the iterable
  (2) Is there a(nother) value in the iterable?
        True : bind identifier to the value and execute block
               redo (loop back) to step (2)
        False: terminate the loop: if the for loop is in a block itself,
               execute the statement in the block that comes after the
               for loop

We can think of an iterable object as producing a sequence of values that are
consumed, one at a time, by the for loop, by binding each value in the
sequence to identifier and then executing block with that value for
identifier. So for loops execute 0 or more times: 0 times if the iterable has
no values (for example the empty string, '', which contains 0 characters).
Once again 0 takes a prominent place in programming.
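
The compact description above can be sketched directly in Python. This is only
an illustration of the semantics, not how we would normally write a loop: it
uses the built-in functions iter and next (and a while loop and try/except,
which this lecture has not covered), and the name run_for_loop is made up for
this example.

```python
def run_for_loop(iterable, body):
    """Hand-written equivalent of:  for identifier in iterable: body(identifier)

    Returns the number of times the body was executed.
    """
    iterator = iter(iterable)            # step (1): evaluate the iterable
    iterations = 0
    while True:
        try:
            identifier = next(iterator)  # step (2): is there a(nother) value?
        except StopIteration:
            break                        # False: terminate the loop
        body(identifier)                 # True: execute block, then redo step (2)
        iterations += 1
    return iterations


seen = []
count = run_for_loop('abc', seen.append)         # the body appends each character
empty_count = run_for_loop('', seen.append)      # empty string: body runs 0 times
print(seen, count, empty_count)
```

Running it shows the body executing once per character of 'abc' and zero times
for the empty string.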
------------------------------------------------------------------------------

Iterating over Strings

Let us start with an example that we know: string objects. When we iterate
over a string object, the identifier in the for loop repeatedly takes on the
values of successive/sequential characters in the string: first the character
at index 0 (recall indexes for strings start at 0), then the character at
index 1, ..., finally the character at the last index. The simple loop

  for c in 'abc':
      print(c,end='->')   # note argument matching print's end parameter

produces the result

  a->b->c->

Copy, paste, and run this code in Eclipse. Experiment with all the code in
this lecture.

We will now explore most of the interesting aspects of the for loop in this
section (using strings) so it is quite long. The range and open sections that
follow are much shorter, not because they are simpler or less powerful, but
because we will have learned most of what we need to know about for loops by
seeing how they process strings.

We can adapt our trace tables to include for loops too.

Statement           |  c  |  Console  | Explanation
--------------------+-----+-----------+---------------------------------------
Initial state       |     |           | Nothing interesting in the initial state
for c in 'abc':     | 'a' |           | Start loop; bind c to 'a'(1st in iterable)
  print(c,end='->') |     | a->       | print c's value with -> at end
...for c in 'abc':  | 'b' |           | Continue loop; bind c to 'b'(next in ...)
  print(c,end='->') |     | a->b->    | print c's value with -> at end
...for c in 'abc':  | 'c' |           | Continue loop; bind c to 'c'(next in ...)
  print(c,end='->') |     | a->b->c-> | print c's value with -> at end
...for c in 'abc':  |     |           | Terminate loop; no next value in iterable

Here we preface all loop iterations after the start by ...

We can also use a variable to refer to a string object. The script below

  s = 'abc'
  for c in s:
      print(c,end='->')

produces exactly the same output.
Think about expressions and what they evaluate to: the value of a string
literal and the value of a name bound to the same string literal both evaluate
to the same string object.

Note that there is nothing special about the name c. The great programmer
Shakespeare (who wrote some of the greatest scripts) had it right when he
said, "What's in a name? That which we call a rose by any other name would
smell as sweet." Examine the following for loop that uses i, not c, for the
identifier (and also uses i in the print statement in its body).

Also notice that in this script, the for loop is the first statement in a
block (of 3 statements), so when the for loop terminates, Python will continue
executing code in the block that is the script, and print more information.
Here it prints the value of i AFTER THE LOOP TERMINATES (still on the same
line as the other values printed, because of end='->'); but print(i) specifies
no end= so after printing this value it advances to the next line where it
prints 'done'. Running this script

  for i in 'abc':
      print(i,end='->')
  print(i)
  print('done')

produces the result

  a->b->c->c
  done

Examine the trace table written above to see that the for loop identifier
stores 'c' when the loop terminates, so c is what is (re)printed.

Although it is generally a VERY BAD IDEA to refer to the identifier used in
the for loop AFTER THE FOR LOOP TERMINATES, Python does allow it. But
pragmatic rules for programming dictate that we should refer to that
identifier ONLY INSIDE THE block THAT IS THE BODY OF ITS for LOOP.

Here is one more interesting example that illustrates why we should not refer
to the for loop identifier after the loop. Here is the same code as above, but
with the loop iterating over an empty (0-character) string.

  for i in '':
      print(i,end='->')
  print(i)
  print('done')

What happens? Python executes the loop body zero times and then raises an
exception: NameError: name 'i' is not defined. Why?
The iterable contains no characters, so the for loop terminates without i ever
being bound to a value, so when Python tries to print i's value after the
loop, Python must raise an exception. The body of the for loop is executed 0
times (another way to say "is not executed"), because the string is empty:
there is no first character in an empty string.

Let's go back to a slightly extended version of the original example, and fix
it, illustrating looping over string slices.

  s = 'abcde'
  for c in s:
      print(c,end='->')

produces the result

  a->b->c->d->e->

It is a bit strange to have that extra -> at the end (leading to nothing).
Really it would make more sense to print '->' only between characters in the
string, with the last character on the line not followed by '->'.

So let's suppose that we specify that we want to write a script that prints
every character in a string, with the characters SEPARATED by -> (and no -> at
the end). For this example, we want to print: a->b->c->d->e. This is actually
a hard problem to solve perfectly.

You might think we could solve it by using sep instead of end in the code
above; a good guess, and something to try, but

  s = 'abcde'
  for c in s:
      print(c,sep='->')

produces the result

  a
  b
  c
  d
  e

Do you know why? You should be able to predict this result with your knowledge
of how the print function works, the difference between what the sep and end
parameters control, and what value end defaults to when not specified in a
call to the print function.

Instead, we will attempt to solve this problem first by writing the following
code.

  s = 'abcde'
  print(s[0],end='')
  for c in s[1:]:
      print('->'+c,end='')  # could also write print('->',c,sep='',end='')

In fact, it produces the result we want

  a->b->c->d->e

Python prints the first letter; then to execute the for loop it must compute
the iterable object by the expression s[1:]; recall that this specifies a
slice of string s that contains all characters from index 1 to the end; so
here it evaluates to the string 'bcde'. So Python binds c to each of these
values, printing -> prefacing each c. Therefore it produces the correct
result.

Note that we need to print five letters and four arrows; if each loop
iteration prints a letter and an arrow, then there will be the same number of
each. But we need to print one more letter than arrow, so we need a print
statement outside the loop (either before, as we showed it, or after).

There is a famous problem in computer science called the fence-post problem,
which relates to this issue. If we want to build a 30 foot fence with
horizontal rails that are 3 feet wide, how many rails and fenceposts do we
need? Here is a picture

  +---+---+---+---+---+---+---+---+---+---+
  |   |   |   |   |   |   |   |   |   |   |

Most students just divide 30 by 3 and get 10. And indeed we need 10 rails to
span 30 feet, but the number of fenceposts we need is 11; that might be hard
to see in the picture above, but it is much simpler to see below, in a 3 foot
wide fence.

  +---+
  |   |

Obviously we need 2 fenceposts and 1 rail. In fact we always need one more
fencepost than rail, just as we printed one more letter than arrow separator.

So, we are almost done, but not quite. We should think about/test "strange"
cases. What would happen if we set s = 'a' (just one character) and ran this
script? Our script should work correctly in all cases, no matter how many
characters are in the string s. Here, because there is one character, the
script should print that one character and be done: print no -> because with
only one character, there is nothing to separate.
Can you predict what will happen? What happens is that we print the a, the
character at index 0, then we evaluate s[1:] but for a one character string s,
the result here is the empty string, so the for loop executes its body 0
times. So the result is this script prints just a, which is correct.

Now let's look at an even "stranger" case. What would happen if we set s = ''
(the empty string) and ran this code? Can you predict what will happen? Python
will raise an exception when it tries to index s[0] in the first print
function. Because s is the empty string, it contains no characters (there is
no character at any index, not even at index 0). So using 0 as an index forces
Python to raise an exception: IndexError: string index out of range. It raises
this same exception in the Python interpreter if we write ''[0].

What do we want the script to print in this case? We want it to print nothing,
because there are no characters in the string: it should print neither a
character nor a separator. Here is one script to solve the problem

  s = ''
  if s != '':
      print(s[0],end='')
      for c in s[1:]:
          print('->'+c,end='')  # could also write print('->',c,sep='',end='')

By using an if statement, we execute the code we wrote before, but only when
we know s is not the empty string (we could have also written this boolean
expression as len(s) > 0 and gotten the same behavior). Now the script works
correctly for the empty string and strings with 1 or more characters.
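
One way to check the three cases discussed so far (5 characters, 1 character,
and 0 characters) is to wrap the same logic in a function that returns the
text instead of printing it. This is a sketch only: the name arrow_separated
is made up for this example, and the comparison uses the built-in string
method join, which this lecture does not cover.

```python
def arrow_separated(s: str) -> str:
    """Return the characters of s separated by '->' (no arrow at the end)."""
    result = ''
    if s != '':
        result += s[0]            # the one extra "fencepost"
        for c in s[1:]:
            result += '->' + c    # every later character is prefaced by an arrow
    return result


for test in ['abcde', 'a', '']:
    # '->'.join(test) places '->' between the characters of test, so it should
    # agree with our hand-written loop in every case, including the empty string
    assert arrow_separated(test) == '->'.join(test)
    print(repr(arrow_separated(test)))
```

Returning the string (rather than printing inside the function) makes the
strange cases easy to test.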

Actually, another version of this script that is also correct/equivalent in
execution is

  s = ''
  if s != '':
      print(s[0],end='')
  for c in s[1:]:
      print('->'+c,end='')  # could also write print('->',c,sep='',end='')

Instead of a script with an if statement controlling a print function and a
for loop statement, this code has an if statement controlling only a print
function; the for loop statement is always executed after the if is finished:
but for any string with 0 or 1 characters, the for loop's body is executed 0
times (because s[1:] is the empty string).

So which script is simpler/easier to understand? Some would say the first
because the if controls/(groups together) the two statements; some would say
the second because the if controls only what it must control. Both
perspectives have merit. What is most important is that we can prove the two
are equivalent. Note the first is like [print loop] and the second like
[print]loop. We will discuss programming pragmatics throughout the quarter.

Of course, we should generalize this script (to make testing easier) to prompt
the user for a string to test, and then do the computation on it.

  import prompt
  s = prompt.for_string('Enter string to test')
  if s != '':
      print(s[0],end='')
  for c in s[1:]:
      print('->'+c,end='')  # could also write print('->',c,sep='',end='')

------------------------------

Let's use a for loop (and an if) to solve another problem: counting how many
vowels are in a string. This script will also include the in operator, which
here determines whether or not a character is in (one of the characters in) a
string. Finally, it shows a common idiom for counting something conditionally
in Python.

Here is the script to count and print the number of vowels in any string input
by the user.

  import prompt
  s = prompt.for_string('Enter string to test')
  count = 0
  for c in s:
      if c in 'aeiou':
          count += 1
  print('There were', count, 'vowels in:', s)

Let's write a trace table for hand-simulation of this code, using what we know
about hand-simulating both for loops and if statements. To save space we will
omit the input/output (so no Console column).

Statement          |   s    | c |count | Explanation
-------------------+--------+---+------+--------------------------------------
Initial state      | 'amen' |   |      | Initialized from prompt
count = 0          |        |   |  0   | Create and initialize name
for c in s:        |        |'a'|      | Start loop; bind c to 'a' (1st in s)
  if c in 'aeiou': |        |   |      | True: execute next block
    count += 1     |        |   |  1   | increment count; block/if finished
...for c in s:     |        |'m'|      | Continue loop; bind c to 'm' (2nd in s)
  if c in 'aeiou': |        |   |      | False: skip next block
...for c in s:     |        |'e'|      | Continue loop; bind c to 'e' (3rd in s)
  if c in 'aeiou': |        |   |      | True: execute next block
    count += 1     |        |   |  2   | increment count; block/if finished
...for c in s:     |        |'n'|      | Continue loop; bind c to 'n' (4th in s)
  if c in 'aeiou': |        |   |      | False: skip next block
...for c in s:     |        |   |      | Terminate loop; no next value in s

At this point, the script would print the information that it accumulated in
the variable count:

  print('There were', count, 'vowels in:', s)

prints

  There were 2 vowels in: amen

This computation really belongs in a function, which we could write as follows
(which I slightly generalized to count vowels in both upper- and lower-case).
Notice that the body of the function contains most of the code from the
script, embedded in a function definition (which we have seen, but not
discussed formally yet).

  def vowel_count(s : str) -> int:
      count = 0
      for c in s:
          if c in 'aeiouAEIOU':
              count += 1
      return count

After defining this function, we might call it in the following script

  import prompt
  s = prompt.for_string('Enter string to test')
  print('There were', vowel_count(s), 'vowels in:', s)

We might also call it in the following context

  import prompt
  s = prompt.for_string('Enter string to test')
  if vowel_count(s) == len(s):
      print('All vowels!')

IMPORTANT: Note that most functions (whether we write them or find them in a
library) do not perform input/prompting or output/printing, unless that is
their primary purpose: e.g., all the functions in the prompt module and the
print function itself. Typically it is the script that performs these
operations, in conjunction with calling the function to compute its value. A
good way to think about a function is that its "inputs" come from the
arguments matching its parameters and its "output" is the result that it
returns.

Note that in both scripts calling the vowel_count function, we prompt the user
for a string (s) and then pass it (use it as an argument) when calling the
vowel_count function. In the first script we call vowel_count in a print
function to print its value; in the second script we call vowel_count to
control whether or not another message is printed. But vowel_count itself does
no prompting or printing.

If we prompt or print a value in the function, we lose versatility. Let the
code calling the function determine what information to give it and what to do
with the resulting value (printing it or using it in some other context). Let
the function be simple and useful in its form.

Here is one final similar function for computing whether or not all the
characters in a string are upper-case letters.

  def all_caps(s : str) -> bool:
      # look for a counterexample: a non-upper-case character
      for c in s:
          if c not in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
              return False   # found a counterexample
      # found no counterexamples
      return True

------------------------------------------------------------------------------

Iterating over ranges of integers

In this section we will see how to iterate over ranges of integers, using
objects constructed from range and irange. Note that range is automatically
imported from Python's builtins module; irange is defined in my goody module,
and is normally imported as from goody import irange (so we can use its name
by itself, just like range).

First let's look at the syntax for constructing range and irange objects, then
we will discuss their semantics, see examples of their use, and explore the
exact rules Python applies when these iterables are used in for loops.

  start  <= expression
  stop   <= expression
  step   <= expression
  range  <= range (stop | start,stop | start,stop,step)
  irange <= irange(stop | start,stop | start,stop,step)

We could describe range as

  range <= range (expression[,expression][,expression])

but it is useful to name the three expressions in our discussion of the
semantics of range/irange objects. Note that range/irange always require one
argument: stop. They also accept two or three arguments, whose differences we
will explore below.

Semantically, Python computes a range/irange iterable object as follows:

  (1) If start is omitted, its default value is 0
  (2) If step is omitted, its default value is 1

  range : produces values from start (inclusive) to stop (NOT INCLUSIVE)
          using increments of step

  irange: produces values from start (inclusive) to stop (INCLUSIVE) using
          increments of step; in fact, the "i" in "irange" means "i"nclusive
          of the stop value.
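
To experiment with inclusive ranges without the goody module, here is a
minimal stand-in built on the standard range. It is an assumption-laden
sketch, not the actual code of goody's irange: the name inclusive_range and
its argument handling are invented for this example, and it simply moves the
stop value one step further in the direction of travel.

```python
def inclusive_range(*args):
    """Hypothetical stand-in for irange: like range, but stop is INCLUSIVE."""
    if len(args) == 1:
        start, stop, step = 0, args[0], 1          # start defaults to 0
    elif len(args) == 2:
        start, stop, step = args[0], args[1], 1    # step defaults to 1
    elif len(args) == 3:
        start, stop, step = args
    else:
        raise TypeError('inclusive_range expects 1, 2, or 3 arguments')
    if step == 0:
        raise ValueError('inclusive_range step must not be zero')
    # range excludes its stop, so push stop one past the value we want included
    return range(start, stop + 1 if step > 0 else stop - 1, step)


for i in inclusive_range(1, 5, 2):
    print(i, end=' ')      # the stop value 5 is produced, unlike range(1,5,2)
print()
```

Because it returns an ordinary range object, the result can be used in a for
loop exactly like range.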

Here are some examples of ranges/iranges; we can explore ranges/iranges in
Eclipse by simply running the following script

  for i in range(fill in details):
      print(i)

  Range           | Values bound to identifier in the for loop
  ----------------+-----------------------------------------------
  range (5)       | 0, 1, 2, 3, 4
  irange(5)       | 0, 1, 2, 3, 4, 5
  range (1,5)     | 1, 2, 3, 4
  irange(1,5)     | 1, 2, 3, 4, 5
  range (1,5,2)   | 1, 3
  irange(1,5,2)   | 1, 3, 5
  irange(5,1,1)   | empty range, no values bound to identifier
  irange(1,5,-1)  | empty range, no values bound to identifier
  irange(5,1,-1)  | 5, 4, 3, 2, 1
  irange(5,1,-2)  | 5, 3, 1

Here are the details of how ranges/iranges work. When a range/irange object is
constructed,

  (0) there is a secret name called next that is initialized to start.

When an irange (easier to describe than range) has to produce a value

  (1) if step > 0 and next > stop, produce no more values; terminate the loop
      if step < 0 and next < stop, produce no more values; terminate the loop

  (2) produce the value next, but internally update next += step (next goes
      up or down depending on whether step is positive or negative) for use
      the next time the irange must produce a value.

So here is how irange(3,5) produces values: next is initialized to 3 (start is
3, stop is 5 and step is 1).

  When asked to produce a new value, rule (1) doesn't apply; rule (2) produces
  3 and updates next to 4.

  When asked to produce a new value, rule (1) doesn't apply; rule (2) produces
  4 and updates next to 5.

  When asked to produce a new value, rule (1) doesn't apply; rule (2) produces
  5 and updates next to 6.

  When asked to produce a new value, rule (1) applies and the irange produces
  no more values.

Use this detailed description to better understand how all the values are
produced in strange iranges like irange(5,1,-2).

Note that we could be in big trouble if we specified a step that was 0; can
you explain why? So Python prohibits this value from being used as the third
argument in a range/irange.
If we write range(x,y,0) Python raises an exception:
ValueError: range() arg 3 must not be zero

If you think back to the indexes used in string slices, ranges are similar.
The biggest difference is that a string s knows how long it is, so it can use
len(s) as a default stop value, but there is no such upper bound in an integer
range.

We can easily write a simple script to print all the prime values in any
range.

------------------------------
import prompt
from goody     import irange
from predicate import is_prime

min = prompt.for_int('Enter minimum value to check')
max = prompt.for_int('Enter maximum value to check')
for i in irange(min,max):
    if is_prime(i):
        print(i)
------------------------------

Here is a version that counts and prints the number of primes in a range. It
just combines the counting idiom for loops that we saw above with this
particular loop.

------------------------------
import prompt
from goody     import irange
from predicate import is_prime

min = prompt.for_int('Enter minimum value to check')
max = prompt.for_int('Enter maximum value to check')
count = 0
for i in irange(min,max):
    if is_prime(i):
        count += 1
print(count)
------------------------------

Here is a function that computes the number of primes in a range. Note we
DIDN'T prompt for min/max in the function, but instead listed those as
parameters to the function, following the rules explained above. The prompts
occur outside the function, and the returned result is printed outside the
function as well.

  from goody     import irange
  from predicate import is_prime

  def primes_between(min,max):
      count = 0
      for i in irange(min,max):
          if is_prime(i):
              count += 1
      return count

We would call it as follows

  print('The number of primes between 1000 and 2000 is',primes_between(1000,2000))

which prints

  The number of primes between 1000 and 2000 is 135

I want to show you something now, but cautiously, because what I am going to
show you is illustrative, but typically the WRONG thing to do in most
circumstances.
We can print all the characters in a string (we've already seen how to do it
one way) by using a range of integers to index each character. The code would
be

  s = 'abcde'
  for i in range(len(s)):
      print(s[i],end='')

which prints: abcde

Note for the string s,

   0 1 2 3 4
  +-+-+-+-+-+
  |a|b|c|d|e|
  +-+-+-+-+-+

Notice that range produces values from 0 (by default; and the start of the
string) up to but not including len(s), here 5. So it produces the values
0, 1, 2, 3, 4 which are all legal indexes of the string. So range does exactly
the thing we want in this example, by not including the last value (5) among
the values it produces.

Why is what I showed typically not the best way to solve this problem?
Because we have a simpler way to solve it without resorting to the indexes of
the characters in the string. Recall we can write the simpler code

  s = 'abcde'
  for c in s:
      print(c,end='')

which produces the same result, without referring to integers and indexes,
just relying on the properties of strings as iterables. It prints: abcde

So, we should avoid writing for loops that produce indexes if we don't need
them. When might we need them? Well, suppose that we wanted to print all the
characters in a string in reverse order: we could write the script

  s = 'abcde'
  for i in range(len(s)-1,-1,-1):  # or, for i in irange(len(s)-1,0,-1)
      print(s[i],end='')

which prints: edcba

Maybe here it would be a little more intuitive to write the for loop using an
irange, as

  for i in irange(len(s)-1,0,-1):

although really the best for loop uses a Python feature we will learn later:

  for c in reversed(s):

in which the reversed object iterates backward through whatever object is its
argument. We'll save that for another day.


On Python and Language Extensions

The standard Python includes a range object but not an irange object. Yet I
found that more often than not, it is easier to write irange: I would have to
write things like range(1,11) to get the values 1-10, which is confusing.
So, I wrote the irange definition and put it in my goody module so I could
easily use it whenever I wanted it. Programmers are constantly creating better
tools for themselves. Python is popular because it is an easy language in
which to make and use such tools. If we don't like something about the Python
language, we might be able to "fix it", when we've learned enough about
Python.

------------------------------------------------------------------------------

Iterating over lines in a file (and some useful string methods)

In this section we will see how to iterate over lines (each represented by a
string) in a file using objects constructed from open. Note that open, like
range, is automatically imported from Python's builtins module.

First let's look at the syntax for constructing open objects, then we will
discuss their semantics and see examples of their use.

  open <= open(file-name) | ...other options for writing files later

For file-name, we must specify a string representing the name of an existing
file: e.g., 'letter.txt'. Note that to read information from files easily
using a script running in Eclipse, the file we are opening must appear in the
project folder, at the same level as the .py file of the script we are
running. We should see it in the same location as the script file in the PyDev
Package Explorer. In fact, we can open files that reside anywhere on our
computer, but it is harder (and irrelevant to what we are learning now), so we
will assume the files we want to open are in the correct location in the
project folder.

Semantically, Python computes an open iterable as follows:

  (1) Find the file (raise the FileNotFoundError exception if it is not found;
      other problems, such as a file we lack permission to read, raise
      related exceptions)
  (2) Produce values for each line in the file (each a string representing
      the line)

So, one simple script would echo a file to the console: binding the identifier
to each line and printing it.
Install the project folder accompanying this lecture, which contains the
following script (test.py) and two text files: letter.txt (a dozen lines) and
dictionary.txt (25,000 lines).

  for l in open('letter.txt'):
      print(l)

Here, the for loop binds l to each string/line in the file, one after another,
and prints l for each binding. If we run this script successfully, it will
print every line from the file (which we can examine by double-clicking its
name in Eclipse) in the console, but with a mysterious blank line between
every line of text. Now, we could fix this problem by changing the print to
print(l,end='') but we need to learn what the real problem is and how to fix
it in many other contexts.

Here is what the start of the output looks like

------------------------------
Dear Jack:

I want a man who knows what love is all about.

*Good direct start

You are generous, kind, and thoughtful.

...more lines
------------------------------

If we change the print to have the following "magic" (__repr__ is a special
method we will learn more about later), we will see the lines printed, but
with no blank lines between them.

  for l in open('letter.txt'):
      print(l.__repr__())

Each line looks like

------------------------------
'Dear Jack:\n'
'I want a man who knows what love is all about.\n'
'*Good direct start\n'
'You are generous, kind, and thoughtful.\n'
------------------------------

Using __repr__ shows the string literal equivalent of each line, including its
opening and closing quotes, and each line is seen to have the \n escape
character at the end (because each line in a file ends with a newline). When
we use print(l), the '\n' in the string itself forces Python to go to the next
line, and when the print has finished printing l it goes to the next line too,
which is why a line is skipped.

Typically when we read a line from a file, we want to strip any special
white-space characters at its end (but not always, which is why Python doesn't
automatically do it).
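
The trailing newline described above can be seen without the letter.txt file.
The sketch below is an assumption-based stand-in: it uses io.StringIO from the
standard library, which can be iterated over line by line in the same way as
an open file, and the built-in repr function, which calls the __repr__ method
for us. The name fake_file and the two sample lines are made up for this
example.

```python
import io

# Stand-in for open('letter.txt'): two lines, each ending with a newline
fake_file = io.StringIO('Dear Jack:\n*Good direct start\n')

shown = []
for l in fake_file:
    shown.append(repr(l))   # repr(l) is another way to write l.__repr__()
    print(repr(l))          # the quotes and the \n at the end are now visible
```

Each string produced by the loop ends with '\n', which is the character that
print(l) writes in addition to its own newline.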
There is a right strip method (like a function, but remember methods are
called with the syntax object.method(...)) that strips white-space, including
newlines, off the right end of a string. Here is an example that illustrates
(with .__repr__()) the meaning of rstrip

  l  = 'Hello\n  \n'
  ls = l.rstrip()
  print(l.__repr__())
  print(ls.__repr__())

Note that rstrip does not CHANGE the string it is called on (strings are
immutable) but it returns a new string that has all the characters of the old
string, but not the white-space at the end. These statements print the
following (producing a new string stripped of all of '\n  \n' off the right
end of this string).

  'Hello\n  \n'
  'Hello'

FYI, there is an lstrip method (for stripping white-space off the left end)
and a strip method (for stripping white-space off both ends). But rstrip is
the most useful method.

So if we go back to our original code and call the rstrip method on each line,
to strip the white-space on the right, our code would be

  for l in open('letter.txt'):
      print(l.rstrip())

Again, l does not change when its rstrip method is called, but calling this
method produces a new string without the white-space at the end; this new
string is what print prints. Running the script above would print as follows
with no extra lines. This is called echoing a file (printing its contents on
the console, one line at a time).

------------------------------
Dear Jack:
I want a man who knows what love is all about.
*Good direct start
You are generous, kind, and thoughtful.
...more lines
------------------------------

The actual letter.txt file represents a letter, in which each line can be
annotated below by lines that start with a *. Let's examine a program that
uses an if statement to print only the lines in the letter, not printing any
annotation lines (which all start with a *).

  for l in open('letter.txt'):
      ls = l.rstrip()
      if ls.find('*') != 0:   # or test len(ls) == 0 or ls[0] != '*'
          print(ls)

Here I needed to use the stripped line twice: once in the if statement and
again in the print function call. So I have decided to define an extra name,
ls, to store a reference to the stripped line, and then use that name ls where
necessary.

Note the boolean expression contains a call to the find method, which returns
the index (remember they start at 0) of the first '*' in the string (or -1 if
* does not occur in the string). So ls.find('*') returns 0 exactly for strings
that have a * in their first position (at index 0), and the boolean expression
ls.find('*') != 0 evaluates to True for every other string. Note that for an
empty string, the find method always returns -1, and hence the boolean
expression is always True for empty strings: blank lines in the letter are
printed.

So, this script would print as follows (missing from above is the line:
*Good direct start)

------------------------------
Dear Jack:
I want a man who knows what love is all about.
You are generous, kind, and thoughtful.
...more lines
------------------------------

The following script reads a file that has one word on each line, and computes
the average word length for all the words in that file. The file
dictionary.txt has this correct format.

  word_count = 0   # accumulate count of all words/lines read
  length_sum = 0   # accumulate # of all characters in these words
  for l in open('dictionary.txt'):
      word_count += 1
      length_sum += len(l.rstrip())

  print('Word count     =',word_count)
  print('Length sum     =',length_sum)
  print('Average length =',length_sum/word_count)

Notice that since I used l.rstrip() just once in this script, I did not define
a new name to refer to its value. This script would run almost instantaneously
(on this file of about 25,000 words) and print the following.

  Word count     = 25094
  Length sum     = 181268
  Average length = 7.223559416593608

Here is a similar version that reads a file of numbers (one per line) and
computes the sum and the average of the numbers read from the file. Remember
that open produces a sequence of string values: we use the int(...)
conversion function on each line: e.g., int('123') is the int 123.

  count = 0
  sum   = 0
  for l in open('numbers.txt'):
      count += 1
      sum   += int(l.rstrip())  # strip the '\n' and determine int equivalent of str

  print('Count   =',count)
  print('Sum     =',sum)
  print('Average =',sum/count)

As a final example, here is a function that determines whether a word is legal
according to a specified dictionary file.

  def is_legal(word : str, dict_file : str) -> bool:
      # look for a word in the dictionary file
      for l in open(dict_file):
          if word == l.rstrip():   # found it; legal
              return True

      return False                 # couldn't find it in the dictionary: illegal

  print('Is immature a legal word?',is_legal('immature','dictionary.txt'))
  print('Is immatur  a legal word?',is_legal('immatur' ,'dictionary.txt'))

Using this function, you could write a script that reads a file of words
(1 word per line) and prints all the illegal/misspelled words.

------------------------------------------------------------------------------

Closing:

We should now have a good understanding and a working knowledge of for loops,
and of how to write code using for loops iterating over strings, integer
ranges, and files. Because it is so easy to repeat code over and over again in
loops, even though the work done in each block that is the body of the loop
may be small, we can specify loops that do large amounts of work by repeating
that simple calculation many times.

------------------------------------------------------------------------------

Problems

1) What happens:

  s = 'abc'
  for c in s:
      print(c)
      s = 'xyz'

2) Rewrite code to print a->b->c->d->e with the extra print statement after
   the loop, not before it; make it go to the next line after printing this
   last value as well (not stay on the current line as needed in the loop).

3) Write a script that prompts the user for a word and file name (both are
   strings) and reads the file, printing every line that contains the word;
   enhance this code to print the number of the line in the file: so if only
   the 100th line in a file contained the word, it would print as line 100.