------------------------ Simulation and modelling! --------------------------

- The two topics go hand-in-hand: to simulate a system, we need to model its
  components. We need to model "entities" in the system, relationships
  between those entities, and "events" (changes in the state of the system).

- Simulations come in many different flavours. The common ones are:
  discrete (entities can only have one of finitely many values) vs.
  continuous (entities can have one of infinitely many values, conceptually);
  static (time is not a part of the system) vs. dynamic (time *is* a part of
  the system); and deterministic (no event occurs at random) vs.
  probabilistic (or "stochastic") (some events occur at random).

  Here are some examples of simulations in these different categories.

  - Performing "randomized integration" (details): continuous, static,
    probabilistic. (Maybe faster, or more accurate, than other methods.)
  - Simulating the trajectory of a projectile: continuous, dynamic,
    deterministic. (Easier than solving all equations analytically.)
  - Random walks on a grid: discrete, dynamic, probabilistic.
  - Etc. (see readings)

- For the rest of the course, we're going to concentrate on dynamic,
  probabilistic, discrete simulations. For such a simulation, we must have
  some way of keeping track of the simulation time, and of generating and
  executing various events as time progresses. There are two basic ways of
  doing this.

TIME-DRIVEN SIMULATION

- In a "time-driven" simulation, we have a variable that holds the current
  time, and we increment that time by some fixed amount at every step of the
  simulation. After each time step, we check all possible event types to
  determine whether an event of that type happens at the current time, and
  handle it if it does.

- Example: random walk on a two-dimensional grid. Starting at (0,0), we move
  in a random direction (up, down, right, or left) by 1 unit at every time
  step.
  A time-driven simulation is ideal in this case because we know that an
  event (a movement) happens at every step.

- Stopping condition? Either stop when time reaches a specific value
  (allowing us to answer questions like "how far can we get in that much
  time?", for example) or when a specific state is reached in the system
  (allowing us to answer questions like "how long would it take to get as
  far as this?", for example).

- What would code for a time-driven simulation generally look like
  (supposing we're given startTime, endTime, timeStep)?

      initialize( state );
      time := startTime;
      while ( time < endTime )   [ or: while ( state != finalState ) ]
          collect statistics from the current state;
          handle all events that occurred in the interval [time, time+timeStep];
          time := time + timeStep;
      end while

EVENT-DRIVEN SIMULATION

- When the events are not guaranteed to occur at regular intervals in our
  simulation, and we do not have a good bound on an appropriate time
  increment (i.e., one that is not so small that the simulation takes a long
  time, and not so big that we have lots of events to deal with at each time
  step), then it is more appropriate to use an "event-driven" simulation.
  (A typical example of this is simulating a line of customers at a bank.)

- In an event-driven simulation, we have a list of all events that occur at
  various times. The main loop simply "jumps" to the time of the first event
  and handles it, then jumps to the second one, and so on.

- There are a few issues to discuss here. First of all, how do we keep track
  of this list of events? Do we simply generate *all* the events for the
  entire simulation at the beginning and then start processing? Generally,
  this is not what is done. Instead, we use a "trick": certain events are
  responsible for scheduling future events when they happen.
  This means that we have to keep track of the events in an ordered list,
  ordered by increasing time (so that when new events are scheduled during
  the simulation, they can be inserted at the right point in the list). At
  the start, we will simply generate a fixed number of starting events
  randomly, and from this point on, the rest of the events will be scheduled
  during the simulation.

- Now, how does such a simulation know when to stop? Again, we can stop once
  time reaches a specific point, or once the system reaches a specific
  state. But sometimes, we might want the stopping condition itself to be
  random. Here again, there is a standard "trick" that is used: we will
  schedule a "pseudo-event" for termination. It's called a pseudo-event
  because it's not a change in the state of the system being simulated: it's
  only something used by the simulation algorithm itself, in this case to
  indicate that the simulation should stop.

- In the same way, when we want to collect statistics about the state of the
  system, we can collect them after every single event, or we can schedule
  pseudo-events that tell the simulation to collect statistics.

- So, what does the algorithm for an event-driven simulation look like,
  conceptually?

      initialize( system state );
      initialize( event-list );
      while ( simulation not finished )
          remove earliest event from event-list;
          set time = time of this event;
          handle event (including possibly scheduling new events);
      end while

  In fact, if the initialization of the event-list consists only of
  scheduling the pseudo-event "beginning of simulation", then we don't even
  need to initialize the system state explicitly, because this can be taken
  care of by the "beginning of simulation" event. Also, if we don't need to
  keep track of the global time explicitly, we can simplify the loop even
  further, conceptually.
RANDOM NUMBER GENERATION (BASIC PROBABILITY THEORY)

- Have a look at the readings (the section on simulation): they have more
  details on all of this!

- A "random variable" is simply a value associated with some random event.
  For example, if we are throwing a die, we could define a random variable
  to be equal to the value showing on the die (1-6), and if we were throwing
  two dice, we could define a random variable to be equal to the sum of the
  values (2-12).

- Given a "discrete" random variable X (one that can only have one of
  finitely many possible values), there is a fixed finite probability that
  the variable will have a particular value x, for each value x. Obviously,
  the probabilities do *not* have to be the same for different values of the
  random variable. As an illustration, in our example of throwing two dice,
  we don't expect that the sum will be 2 as often as it will be 6. This
  gives rise to the idea of a "probability distribution": given a random
  variable X, a probability distribution is simply a function f_X that
  assigns a probability f_X(x) to each possible value x of the variable,
  respecting the condition that the sum of the probabilities over all values
  is equal to 1 (i.e., sum_{possible x} f_X(x) = 1).

- Some common distributions: uniform (e.g., when throwing one die),
  approximately normal (e.g., the sum when throwing many dice: with 2 dice
  the distribution of the sum is "triangular", and as the number of dice
  grows it looks more and more like the normal "bell curve"), etc. (We'll
  look at more later.)

- Given a "continuous" random variable X (one that can have one of
  infinitely many possible values), the probability that it is exactly equal
  to one single value x is always 0, so we need to talk of the probability
  of the value lying in some interval. We still want to define something
  similar to the probability distribution: this will be called a
  "probability density function" f_X. f_X represents the probability that
  the value of X lies within a certain interval [a,b], as follows:

                            /b
      Pr[ a <= X <= b ]  =  | f_X(x) dx.
                            /a

  By analogy with the discrete case, we want f_X to "sum up to 1" over all
  possible values of the random variable X, which we can express as follows:

      /oo
      |   f_X(x) dx  =  1.
      /-oo

- Again, there are various common distributions. The "uniform" distribution
  over an interval [a,b] is given by:

                { 1/(b-a)  if a <= x <= b,
      f_X(x) =  {
                { 0        otherwise.

  It represents a distribution where X has equal probability of having any
  single value in [a,b].

- The "exponential" distribution is given by f_X(t) = r e^{-rt}, for some
  real number r > 0, and represents the times between consecutive events
  when they occur at a uniform rate r over time. (More on this, and on its
  relationship to "Poisson" distributions, next time.)

USING `RANDOM()' IN YOUR PROGRAMS

- The function call `random()' returns a pseudo-random long integer in the
  range 0 .. (RANDOM_MAX - 1). Why "pseudo-random"? Well, computers behave
  in a totally predictable manner, so they don't really have any natural
  randomness to use. They generate these pseudo-random numbers according to
  a well-defined deterministic computation. What this means is that
  `random' has access to a certain amount of data that it uses to determine
  the next number to return, and every time that this data has a particular
  value, random will return the same number.

- `srandom( unsigned )' is used to initialize the state of the data that
  `random' uses to produce these pseudo-random values. This function should
  be called ONLY ONCE in the entire program, at the very beginning of the
  program. Whenever srandom is called with a particular "seed" as argument,
  `random' will generate a particular fixed sequence of numbers. This means
  that if srandom is called with a fixed value, say `srandom( 25 )', then
  the program will behave in exactly the same way each time that it is run.
  To get a program to have more "random" behaviour, we need to use a seed
  that's a little bit different every time we run the program.
  A standard trick is to use the value of the "system clock" for this:
  `srandom( time(0) )' will do the trick (the `time(...)' function is
  declared in the header <time.h>, which is *already* included in
  "random.h").

MORE PROBABILITY THEORY

- Recall the continuous probability density functions for the "uniform" and
  "exponential" distributions:

  The uniform distribution over an interval [a,b] is given by:

                { 1/(b-a)  if a <= x <= b,
      f_X(x) =  {
                { 0        otherwise.

  It represents a distribution where X has equal probability of having any
  single value in [a,b].

  The exponential distribution is given by:

                { r e^{-rt}  if t >= 0,
      f_X(t) =  {
                { 0          otherwise,

  for some real number r > 0, and represents the times between consecutive
  events when they occur at a uniform rate r over time, i.e., r is the
  average number of events that occur per unit time.

- Now, we want to be able to generate random numbers with these
  distributions, and in order to do this, we must talk about one more
  concept: the "cumulative distribution function" F_X(x), which represents
  the probability that the value of X is *less than or equal to* x, i.e.,

                 /x
      F_X(x) =   |   f_X(z) dz.
                 /-oo