------------------------ Simulation and modelling! --------------------------

- The two topics go hand-in-hand: to simulate a system, we need to model its
  components. We need to model "entities" in the system, relationships
  between those entities, and "events" (changes in the state of the system).

- Simulations come in many different flavours. The common ones are:
  discrete (entities can only have one of finitely many values) vs.
  continuous (entities can have one of infinitely many values, conceptually);
  static (time is not a part of the system) vs. dynamic (time *is* a part of
  the system); and deterministic (no event occurs at random) vs.
  probabilistic (or "stochastic") (some events occur at random).

  Here are some examples of simulations in these different categories.

  - Performing "randomized integration" (details): continuous, static,
    probabilistic. (Maybe faster, or more accurate, than other methods.)
  - Simulating the trajectory of a projectile: continuous, dynamic,
    deterministic. (Easier than solving all equations analytically.)
  - Random walks on a grid: discrete, dynamic, probabilistic.
  - Etc. (see readings)

- For the rest of the course, we're going to concentrate on dynamic,
  probabilistic, discrete simulations. For such a simulation, we must have
  some way of keeping track of the simulation time, and of generating and
  executing various events as time progresses. There are two basic ways of
  doing this.

TIME-DRIVEN SIMULATION

- In a "time-driven" simulation, we have a variable that holds the current
  time, and we increment that time by some fixed amount at every step of the
  simulation. After each time step, we check all possible event types to
  determine whether an event of that type happens at the current time, and
  handle it if it does.

- Example: random walk on a two-dimensional grid. Starting at (0,0), we move
  in a random direction (up, down, right, or left) by 1 unit at every time
  step.
  A time-driven simulation is ideal in this case because we know that an
  event (a movement) happens at every step.

- Stopping condition? Either stop when time reaches a specific value
  (allowing us to answer questions like "how far can we get in that much
  time?", for example) or when a specific state is reached in the system
  (allowing us to answer questions like "how long would it take to get as
  far as this?", for example).

- What would code for a time-driven simulation generally look like
  (supposing we're given startTime, endTime, timeStep)?

      initialize( state );
      time := startTime;
      while ( time < endTime )   [ or: while ( state != finalState ) ]
          collect statistics from the current state;
          handle all events that occurred in the interval [time, time+timeStep];
          time := time + timeStep;
      end while

EVENT-DRIVEN SIMULATION

- When the events are not guaranteed to occur at regular intervals in our
  simulation, and we do not have a good bound on an appropriate time
  increment (i.e., one that is not so small that the simulation takes a long
  time, and not so big that we have lots of events to deal with at each time
  step), then it is more appropriate to use an "event-driven" simulation.
  (A typical example of this is simulating a line of customers at a bank.)

- In an event-driven simulation, we have a list of all events that occur at
  various times. The main loop simply "jumps" to the time of the first event
  and handles it, then jumps to the second one, and so on.

- There are a few issues to discuss here. First of all, how do we keep track
  of this list of events? Do we simply generate *all* the events for the
  entire simulation at the beginning and then start processing? Generally,
  this is not what is done. Instead, we use a "trick": certain events are
  responsible for scheduling future events when they happen.
  This means that we have to keep track of the events in an ordered list,
  ordered by increasing time (so that when new events are scheduled during
  the simulation, they can be inserted at the right point in the list). At
  the start, we will simply generate a fixed number of starting events
  randomly, and from this point on, the rest of the events will be scheduled
  during the simulation.

- Now, how does such a simulation know when to stop? Again, we can stop once
  time reaches a specific point, or once the system reaches a specific
  state. But sometimes, we might want the stopping condition itself to be
  random. Here again, there is a standard "trick" that is used: we will
  schedule a "pseudo-event" for termination. It's called a pseudo-event
  because it's not a change in the state of the system being simulated: it's
  only something used by the simulation algorithm itself, in this case to
  indicate that the simulation should stop.

- In the same way, when we want to collect statistics about the state of the
  system, we can collect them after every single event, or we can schedule
  pseudo-events that tell the simulation to collect statistics.

- So, what does the algorithm for an event-driven simulation look like,
  conceptually?

      initialize( system state );
      initialize( event-list );
      while ( simulation not finished )
          remove earliest event from event-list;
          set time = time of this event;
          handle event (including possibly scheduling new events);
      end while

  In fact, if the initialization of the event-list consists only of
  scheduling the pseudo-event "beginning of simulation", then we don't even
  need to initialize the system state explicitly, because this can be taken
  care of by the "beginning of simulation" event. Also, if we don't need to
  keep track of the global time explicitly, we can simplify the loop even
  further, conceptually.
RANDOM NUMBER GENERATION (BASIC PROBABILITY THEORY)

- Have a look at the readings (the section on simulation): they have more
  details on all of this!

- A "random variable" is simply a value associated with some random event.
  For example, if we are throwing a die, we could define a random variable
  to be equal to the value showing on the die (1-6), and if we were throwing
  two dice, we could define a random variable to be equal to the sum of the
  values (2-12).

- Given a "discrete" random variable X (one that can only have one of
  finitely many possible values), there is a fixed finite probability that
  the variable will have a particular value x, for each value x. Obviously,
  the probabilities do *not* have to be the same for different values of the
  random variable. As an illustration, in our example of throwing two dice,
  we don't expect that the sum will be 2 as often as it will be 6. This
  gives rise to the idea of a "probability distribution": given a random
  variable X, a probability distribution is simply a function f_X that
  assigns a probability f_X(x) to each possible value x of the variable,
  respecting the condition that the sum of the probabilities over all values
  is equal to 1 (i.e., sum_{possible x} f_X(x) = 1).

- Some common distributions: uniform (e.g., when throwing one die),
  approximately normal (e.g., the sum when throwing many dice: with 2 dice
  the distribution of the sum is "triangular", and as the number of dice
  grows it looks more and more like the normal "bell curve"), etc. (We'll
  look at more later.)

- Given a "continuous" random variable X (one that can have one of
  infinitely many possible values), the probability that it is exactly equal
  to one single value x is always 0, so we need to talk of the probability
  of the value lying in some interval. We still want to define something
  similar to the probability distribution: this will be called a
  "probability density function" f_X. f_X represents the probability that
  the value of X lies within a certain interval [a,b], as follows:

                            /b
      Pr[ a <= X <= b ]  =  | f_X(x) dx.
                            /a

  By analogy with the discrete case, we want f_X to "sum up to 1" over all
  possible values of the random variable X, which we can express as follows:

      /oo
      |   f_X(x) dx  =  1.
      /-oo

- Again, there are various common distributions. The "uniform" distribution
  over an interval [a,b] is given by:

                { 1/(b-a)  if a <= x <= b,
      f_X(x) =  {
                { 0        otherwise.

  It represents a distribution where X has equal probability of having any
  single value in [a,b].

- The "exponential" distribution is given by f_X(t) = r e^{-rt}, for some
  real number r > 0, and represents the times between consecutive events
  when they occur at a uniform rate r over time. (More on this, and on its
  relationship to "Poisson" distributions, next time.)

USING `RANDOM()' IN YOUR PROGRAMS

- The function call `random()' returns a pseudo-random long integer in the
  range 0 .. (RANDOM_MAX - 1). Why "pseudo-random"? Well, computers behave
  in a totally predictable manner, so they don't really have any natural
  randomness to use. They generate these pseudo-random numbers according to
  a well-defined deterministic computation. What this means is that
  `random' has access to a certain amount of data that it uses to determine
  the next number to return, and every time that this data has a particular
  value, random will return the same number.

- `srandom( unsigned )' is used to initialize the state of the data that
  `random' uses to produce these pseudo-random values. This function should
  be called ONLY ONCE in the entire program, at the very beginning of the
  program. Whenever srandom is called with a particular "seed" as argument,
  `random' will generate a particular fixed sequence of numbers. This means
  that if srandom is called with a fixed value, say `srandom( 25 )', then
  the program will behave in exactly the same way each time that it is run.
  To get a program to have more "random" behaviour, we need to use a seed
  that's a little bit different every time we run the program.
  A standard trick is to use the value of the "system clock" for this:
  `srandom( time(0) )' will do the trick (the `time(...)' function is
  declared in the header <time.h>, which is *already* included in
  "random.h").

MORE PROBABILITY THEORY

- Recall the continuous probability density functions for the "uniform" and
  "exponential" distributions:

  The uniform distribution over an interval [a,b] is given by:

                { 1/(b-a)  if a <= x <= b,
      f_X(x) =  {
                { 0        otherwise.

  It represents a distribution where X has equal probability of having any
  single value in [a,b].

  The exponential distribution is given by:

                { r e^{-rt}  if t >= 0,
      f_X(t) =  {
                { 0          otherwise,

  for some real number r > 0, and represents the times between consecutive
  events when they occur at a uniform rate r over time, i.e., r is the
  average number of events that occur per unit time.

- Now, we want to be able to generate random numbers with these
  distributions, and in order to do this, we must talk about one more
  concept: the "cumulative distribution function" F_X(x), which represents
  the probability that the value of X is *less than or equal to* x, i.e.,

                 /x
      F_X(x) =   |   f_X(z) dz.
                 /-oo