ICS 180A, Spring 1997:
Strategy and board game programming

Lecture notes for April 10, 1997
Evaluation Functions

General considerations

The evaluation function is where most of the game-specific knowledge goes into your program. We start off with two basic assumptions:

  1. We can represent the quality of a position as a number. For instance, this number might be our estimate of the probability that we can win the game; but most programs don't try to make the number mean anything so specific, it's just a number.

  2. The quality we measure is or should be the same as the quality our opponent measures (so if we think we're in a good position, our opponent thinks he's in a bad position and vice versa). This is unlikely to be really true, but it's needed to make our search algorithms work well, and in practice it comes pretty close to the truth.

The evaluation can be more or less complicated, depending on how much knowledge you build in to it. The more complicated it is, and the more knowledge it encodes, the slower it is likely to be. Typically, the performance of a program (how well it plays) has been estimated as behaving like the product of the knowledge and speed:

So, if you have a fast dumb program you can often make it better by adding more knowledge and slowing it down a little. But that same additional knowledge and slowdown might actually make a smart slow program worse; there is a diminishing rate of return of performance to knowledge. Similarly once you speed your program up past a certain point, there is a diminishing improvement for adding more speed, and you would be better off balancing speed and knowledge somewhere closer to the middle of the chart. This balance point varies somewhat depending on what kind of opponent you expect to face; speed works better for defeating other computers, while human opponents are very good at exploiting holes in your knowledge and are more easily defeated by knowledge-based programs.

Implementation methods

There are two major types of evaluation function method. The first, "end-point evaluation", is simply to evaluate each position independently of each other position, using your favorite evaluation algorithm. This can give good results, but is slow, so some programmers have resorted to the following trick, known as pre-computation, first order evaluation, or piece-square tables.

Before we begin a search for the best move from a position, we examine carefully the position itself, and compute values to store in an array T[square,piece type]. The evaluation of any position found in the search will then be simply the sum of the array values for the pieces in the position. We don't have to compute the sum from scratch at each step; instead when moving a piece from one square to another update the score using the formula

score += T[new square,piece] - T[old square,piece]

Examples of piece-square table values in chess: when a king is castled into the corner of the board, the pawns in front of it are very useful in defending against attacks. Their ability to defend becomes less as they move forward. So, if the king is in the corner in the starting position of the search, we might build piece-square tables for the pawns having the values

    ... 1   1   1   1
    ... 1   1.1 1.1 1.1
    ... 1   1.2 1.2 1.2
on the three rows in front of the king, to encourage the pawns to stay close to the king by giving them a greater value than their usual one point when they are nearby.

Unfortunately while piece-square tables are blindingly fast, and you can incorporate some interesting kinds of knowledge this way, piece-square tables are pretty stupid in general. They can't take into account interactions between several moving pieces; those interactions have to be approximated by looking at where the pieces were when the piece-square table was computed. So, for instance, if we search through a long sequence of moves, in which the king goes to a different part of the board, the piece-square table values above would be inaccurate because they would be making the pawns defend the place the king used to be, rather than defending the king itself.

Programs that use piece-square tables often combine them with some amount of end-point evaluation. Another strategy for making piece-square table methods more accurate is to delay building the tables until later in the search; e.g. if you are searching 9-move sequences, build the tables after sequences of 5 moves and use them for the remaining 4-move search. If you do that, though, you should be careful to make the tables resulting from one 5-move sequence be consistent with those from other sequences, so that the overall evaluation scores can be compared meaningfully. In class Dave O. suggested another possible improvement: make incremental modifications to the piece-square tables, e.g. move the bonuses for pawns in front of kings when the kings move; this seems like a good idea but I don't know whether it's been implemented or if so how well it worked.

How to combine evaluation terms

Typically, like the first-order evaluations above, an evaluation function is a sum of several terms, where each term is the result of a function that looks for certain specific information in a position. Why sums? It's a relatively simple way of combining information that works ok in practice.

My own feeling is that game programmers really should try more carefully to model their evaluation functions on probabilities: combine terms to determine probabilities of winning soon (by carrying out some kind of attack), in a moderate number of moves, or in an endgame (say by taking advantage of a passed pawn in chess), and combine the probabilities appropriately. If the probability of winning soon for black is bs and for white is ws, if the probability of winning in a moderate number of moves (assuming no sooner win) is bm or wm, and if the probability of winning in an endgame is be or we, then the overall probability of winning is

bs + (1 - bs - ws) bm + (1 - bs - ws - bm - wm) be
ws + (1 - bs - ws) wm + (1 - bs - ws - bm - wm) we.
I think it might be a useful idea for an evaluation function to compute terms estimating these individual probabilities, and combine them with formulas like the ones above. How well each probability is estimated could be tested by comparing the program's estimates against the actual results in databases of games, and this would give a program the ability to do some rudimentary planning (judging whether to go for a certain attack based on how likely it is to work). But this is purely speculation, it hasn't been tested in a real program, and you won't go far wrong just using sums.

What kinds of information go into evaluation functions?

Evaluation functions typically combine terms encoding knowledge of different types:
David Eppstein, Dept. Information & Computer Science, UC Irvine, .