
ICS 180, Winter 1999:
Strategy and board game programming
First, let's quickly review the original alpha-beta search, omitting details like hashing or adjusting winning scores for the current ply.
    // basic alpha-beta search
    int alphabeta(int depth, int alpha, int beta)
    {
        move bestmove;
        int score;
        if (game over or depth <= 0) return winning score or eval();
        for (each possible move m) {
            make move m;
            score = -alphabeta(depth - 1, -beta, -alpha);
            if (score >= alpha) { alpha = score; bestmove = m; }
            unmake move m;
            if (alpha >= beta) break;  // cutoff: the opponent won't allow this position
        }
        return alpha;
    }
    // fail-soft alpha-beta search
    int alphabeta(int depth, int alpha, int beta)
    {
        move bestmove;
        int score;
        int current = -WIN;  // best score found so far; may fall outside (alpha, beta)
        if (game over or depth <= 0) return winning score or eval();
        for (each possible move m) {
            make move m;
            score = -alphabeta(depth - 1, -beta, -alpha);
            unmake move m;
            if (score >= current) {
                current = score;
                bestmove = m;
                if (score >= alpha) alpha = score;
                if (score >= beta) break;  // cutoff
            }
        }
        return current;
    }
With this change, one can extract a little more information than before
about a position.  If the returned value x is less than or equal to alpha, 
then we still don't know the true value of the position (because we may 
have pruned away some important lines of the search), but we do know that 
the true value is at most x.  Similarly, if x is greater than or equal to 
beta, we know that the true search value is at least x.
These slightly tighter upper and lower bounds don't improve the search 
itself, but they could lead to a greater number of successful hash probes.
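For example, the tighter bounds fit naturally into a transposition table that distinguishes exact scores from bounds. A minimal sketch, assuming a hypothetical hash_store(flag, depth, value) routine and the usual three entry types (none of these names appear in the notes above):
    // record a fail-soft result x, searched with window (alpha, beta)
    // (hash_store, UPPER, LOWER, EXACT are assumed names, for illustration only)
    void record_result(int x, int alpha, int beta, int depth)
    {
        if (x <= alpha)     hash_store(UPPER, depth, x);  // true value is at most x
        else if (x >= beta) hash_store(LOWER, depth, x);  // true value is at least x
        else                hash_store(EXACT, depth, x);  // x is the true value
    }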
The use of fail-soft alpha-beta is also essential in the MTD(f) algorithm 
described below.
The most straightforward way to use alpha-beta is to call it from the top level of the program with the widest possible window:
    alphabeta(depth, -WIN, WIN)
where the huge range between -WIN and WIN indicates that we don't know what the true search value will be, so all possible scores should be considered interesting.
The move to play is then the one stored in the variable bestmove at the outermost level of the search.
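Note that bestmove in the pseudocode above is a local variable that never escapes the search routine. A minimal sketch of one way to expose it, using a separate root routine and a global rootmove (both names introduced here, not from the notes):
    // root search: one level of alpha-beta that also records the move to play
    // (root_search and the global rootmove are illustrative names)
    move rootmove;
    int root_search(int depth)
    {
        int alpha = -WIN;
        int score;
        for (each possible move m) {
            make move m;
            score = -alphabeta(depth - 1, -WIN, -alpha);
            unmake move m;
            if (score >= alpha) { alpha = score; rootmove = m; }
        }
        return alpha;
    }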
Instead, it is often helpful to call alpha-beta with an artificially narrow window centered around the previous search value. If the result is a score within that window, you've saved time and found the correct search value. But if the search fails, you must widen the window and search again:
    // aspiration search
    int alpha = previous - WINDOW;
    int beta = previous + WINDOW;
    int score;
    for (;;) {
        score = alphabeta(depth, alpha, beta);
        if (score <= alpha) alpha = -WIN;    // failed low: widen downward and re-search
        else if (score >= beta) beta = WIN;  // failed high: widen upward and re-search
        else break;                          // score is inside the window: done
    }
The constant WINDOW should be set in a way that balances the time savings 
from a narrower search with the time lost from repeating an unsuccessful 
search.  A typical value in chess might be around half a pawn.
Variants of aspiration search include widening the window more gradually 
in the event of an unsuccessful search, or using an initial search window 
that is not necessarily centered around the previous search result.
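For instance, a gradual-widening variant might double the margin on the failing side instead of jumping straight to the full range. The following is only a sketch of one plausible scheme (clamping alpha and beta to the range [-WIN, WIN] is omitted for brevity):
    // aspiration search with gradual widening (one possible scheme)
    int delta = WINDOW;
    int alpha = previous - delta;
    int beta = previous + delta;
    int score;
    for (;;) {
        score = alphabeta(depth, alpha, beta);
        if (score <= alpha) { delta *= 2; alpha = score - delta; }    // failed low: widen downward
        else if (score >= beta) { delta *= 2; beta = score + delta; } // failed high: widen upward
        else break;
    }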
One could use this idea to perform a binary search for the true score:
    int alpha = -WIN;
    int beta = +WIN;
    while (beta > alpha) {
        int test = alpha + (beta - alpha) / 2;  // midpoint, with alpha <= test < beta
        if (alphabeta(depth, test, test + 1) <= test) beta = test;  // value <= test
        else alpha = test + 1;                                      // value >= test + 1
    }
    // now alpha == beta == the true score
However, this will lead to a large number of searches: on the order of the logarithm of the difference between WIN and -WIN. The MTD(f) idea is instead to use fail-soft alpha-beta to control the search: each call to fail-soft alpha-beta returns a search value that is closer to the final score, so if we use that value as the next test point, we should eventually converge.
    // MTD(f)
    int test = 0;
    int score;
    for (;;) {
        score = alphabeta(depth, test, test + 1);
        if (test == score) break;  // the test value stopped moving: converged
        test = score;
    }
Unfortunately, complicated interactions with the hash table can cause this 
routine to get into an infinite loop, so one needs additional code to halt 
the search if too many iterations have been made without any convergence.
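A simple guard is to cap the number of passes; MAX_MTD_ITERATIONS below is an arbitrary bound introduced here for illustration:
    // MTD(f) with a guard against non-convergence
    int test = 0;
    int score;
    int iterations = 0;
    for (;;) {
        score = alphabeta(depth, test, test + 1);
        if (test == score) break;  // converged
        if (++iterations >= MAX_MTD_ITERATIONS) break;  // give up; score is still a bound
        test = score;
    }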
One big advantage of MTD(f) is that it lets us simplify the alpha-beta code itself: since every call uses the zero-width window (test, test+1), the search really has only two parameters (depth and alpha) rather than three.
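Concretely, the zero-width call alphabeta(depth, test, test + 1) could be replaced by a two-parameter routine along the following lines (zwsearch is a name introduced here, not from the notes):
    // zero-width search: equivalent to fail-soft alphabeta(depth, alpha, alpha + 1)
    int zwsearch(int depth, int alpha)
    {
        int score;
        int current = -WIN;
        if (game over or depth <= 0) return winning score or eval();
        for (each possible move m) {
            make move m;
            score = -zwsearch(depth - 1, -alpha - 1);
            unmake move m;
            if (score > current) current = score;
            if (score > alpha) break;  // fail high: score >= alpha + 1 == beta
        }
        return current;
    }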
The last refinement, principal variation search (PVS), rests on the assumption that with good move ordering the first move examined at each node is usually the best one. So PVS searches that first move with a normal window, but tests each successive move against it with a zero-width window. Only if the zero-width search fails high does it re-search the move with a normal window:
    // principal variation search (fail-soft version)
    int alphabeta(int depth, int alpha, int beta)
    {
        move bestmove;
        int current, score;
        if (game over or depth <= 0) return winning score or eval();
        move m = first move;
        make move m;
        current = -alphabeta(depth - 1, -beta, -alpha);  // full window for the first move
        unmake move m;
        bestmove = m;
        if (current >= beta) return current;   // the first move already causes a cutoff
        if (current > alpha) alpha = current;
        for (each remaining move m) {
            make move m;
            // zero-width probe: can this move beat the best score so far?
            score = -alphabeta(depth - 1, -alpha - 1, -alpha);
            if (score > alpha && score < beta)
                score = -alphabeta(depth - 1, -beta, -alpha);  // probe failed high: re-search
            unmake move m;
            if (score >= current) {
                current = score;
                bestmove = m;
                if (score >= alpha) alpha = score;
                if (score >= beta) break;  // cutoff
            }
        }
        return current;
    }
This shares with MTD(f) the advantage that most nodes in the search tree
are searched with zero-width windows, and so can use the simpler
two-parameter form of alpha-beta.  Since there are very few calls with
beta > alpha + 1, one can do extra work in those calls (such as saving
the best move for later use) without worrying much about the extra time
it takes.
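As one illustration of that extra work, a PVS node could record the principal variation only when its window is wider than zero; save_pv_move here is a hypothetical routine, not something defined in these notes:
    // inside the move loop of a PVS node, after a new best move is found
    if (beta > alpha + 1) {
        // this is one of the rare full-window (PV) nodes, so the extra
        // bookkeeping is cheap relative to the whole search
        save_pv_move(depth, bestmove);  // hypothetical PV-recording routine
    }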