CS 146
                                ======
                              Assignment #4
			 Due before the last lecture of Week 7

1) [10 marks] Say you are really low on disk space on openlab, and you have some
   utility programs in C that you like to use often, but you don't have enough
   disk space to keep the compiled executables around all the time. Each program
   consists of just one .c file.  However, you don't want to manually re-compile
   each C program each time you want to use it.

   Write a short shell script called "C-interp" that is meant to be the target
   of symbolic (soft) links and that pretends to be a C language interpreter.
   That is, if you have a C file "foo.c", you would make a link with
   "ln -s C-interp foo".
   Then, what C-interp does, when called as "foo", is compile "foo.c" and run
   the resulting executable on the arguments given to "foo".  (You can test it
   on your solutions to the other questions in this assignment.)

   Some caveats:
   - delete ALL temporary files generated, including the executable after it's
     been executed.
   - To ensure you don't delete any files that exist before you start whose
     names may conflict with the temporary filenames you choose, put *all*
     temporary files (including the executable) in /tmp/DDD where DDD is a
     random directory name (doesn't need to be 3 characters).  Be sure to
     remove the directory after C-interp finishes.
   - the executable should be called with argv[0] equal to the basename of the
     .c file, without the '.c', eg "foo.c" gets called as "foo" (but the path
     can [and should!] be different, so the compiled executable should be in a
     temporary directory).
   - to ensure that your executable name doesn't conflict with other users,
     you should put the executable in a uniquely-named subdirectory of /tmp.
     In fact, it would be best if all your temp files went into this directory,
     as long as the entire directory is removed when the executable is finished.
   - ensure that the temporary files are deleted even if the program is
     interrupted; i.e., use the "trap" command in the Bourne shell to trap
     signals 0 (Exit), 1 (Hangup), 2 (Interrupt), 3 (Quit), and 15 (Terminate).
     See signal(7) on Linux (signal(5) on some older systems) for a list and
     more details about signals.
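   C-interp itself is a shell script. There, "mktemp -d" (also used in question
   3 below) is the usual way to get a uniquely named directory. For comparison,
   here is a minimal C sketch of the same idea using POSIX mkdtemp(3). The helper
   name make_private_tmpdir and the "c-interp." name prefix are illustrative
   assumptions, not part of the assignment.

```c
/* Sketch only: make_private_tmpdir and the "c-interp." prefix are
   hypothetical names chosen for illustration. */
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>

/* Create a fresh directory under /tmp with an unpredictable name,
   much as "mktemp -d" does in the shell.  On success the directory
   name is stored in buf and buf is returned; on failure NULL is
   returned. */
char *make_private_tmpdir(char *buf, size_t size)
{
    const char *pattern = "/tmp/c-interp.XXXXXX";

    /* Refuse to truncate the pattern: mkdtemp needs all six X's. */
    if (snprintf(buf, size, "%s", pattern) >= (int)size)
        return NULL;

    /* mkdtemp(3) replaces the trailing XXXXXX with random characters
       and creates a brand-new directory with mode 0700, so no file
       that existed beforehand is ever reused. */
    return mkdtemp(buf);
}
```

   The caller must still remove the directory and everything in it when
   finished. In C-interp, that cleanup is the job of the trap.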

2) [10 marks] Write a filter in C that prints M lines out of every N.  It can
   be done using the shell and awk (see ~wayne/pub/cs146), but it's simpler
   and more efficient in C (and can be used as a test case for C-interp above).
   The program's name is "every".  Its SYNOPSIS is:

        $ every [-N,M] [list-of-files]

   where N, M are both integers, N > 0, M >= 0, and M <= N. (Anything in square
   brackets '[]' is optional, and doesn't need to appear on the command line.
   This is standard for Unix manual pages.)  The option argument, if present,
   must come before any filenames.  If no "-N,M" option is on the command line,
   then "every" should look for an environment variable called EVERY and take
   its options from there, in the same format as the command line.  If "every"
   can't find options either on the command line or in the environment variable
   EVERY, then the default is "-1,1".  That is, with no options, "every"
   acts just like cat(1).  For example, if we number lines starting at 0, then

        $ every -10,2 foo.c

   prints out the following lines of foo.c: 0,1, 10,11, 20,21, 30,31,
   etc.  If M is omitted, eg

        $ every -10 foo.c

   then it defaults to 1.  (If either N or M is specified on the command line,
   the environment variable EVERY should be ignored.) If multiple files are
   given on the command line, each one should be handled INDEPENDENTLY, so
   "-10,2" means lines 0,1,10,11, etc. of each file.  Like all Unix filters,
   if no files are on the command line, "every" processes its standard input.
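   One possible way to organize the core of "every" is sketched below in C. The
   sketch makes three assumptions:
   - a hypothetical helper parse_every_option() accepts "-N,M" or "-N", with M
     defaulting to 1;
   - line i (counting from 0) is printed when i mod N < M, which gives lines
     0,1,10,11,... for "-10,2" and every line for the default "-1,1";
   - the line counter restarts for each stream, so each file is handled
     independently.
   Reading the EVERY variable (e.g. via getenv) and opening the files are left
   out. This is a sketch, not a reference solution.

```c
/* Sketch only: parse_every_option, every_selects and every_filter are
   hypothetical names, not part of any required interface. */
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse an option of the form "-N,M" or "-N" (M then defaults to 1).
   Returns 0 and stores N and M on success; returns -1 if the text is
   malformed or violates N > 0, 0 <= M <= N. */
int parse_every_option(const char *arg, long *n, long *m)
{
    const char *p;
    char *end;
    long nn, mm = 1;

    if (arg == NULL || arg[0] != '-' || !isdigit((unsigned char)arg[1]))
        return -1;
    errno = 0;
    nn = strtol(arg + 1, &end, 10);
    if (*end == ',') {
        p = end + 1;
        if (!isdigit((unsigned char)*p))
            return -1;                    /* "-N," with no M */
        mm = strtol(p, &end, 10);
    }
    if (errno != 0 || *end != '\0')
        return -1;                        /* overflow or trailing junk */
    if (nn <= 0 || mm < 0 || mm > nn)
        return -1;
    *n = nn;
    *m = mm;
    return 0;
}

/* Line numbers start at 0: line i is selected when i mod N < M. */
int every_selects(long lineno, long n, long m)
{
    return lineno % n < m;
}

/* Copy the selected lines of one stream to out.  The counter restarts
   at 0 on each call, so each file is handled independently.  Working
   a character at a time avoids any limit on line length. */
void every_filter(FILE *in, FILE *out, long n, long m)
{
    long lineno = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (every_selects(lineno, n, m))
            putc(c, out);
        if (c == '\n')
            lineno++;
    }
}
```

   A main() built on these would call every_filter() once per file named on
   the command line, or once on stdin if there are none.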


3) [20 marks] Below are two different versions of a script called "rename", whose
   purpose is to use sed(1) to programmatically rename any list of files. Both
   have the same SYNOPSIS:

       $ rename '/find/replace/' {list of files}

   where "find" is a regular expression, "replace" is the replacement text, and
   we'll prepend the 's' so that internally sed(1) will see "s/find/replace/".
   The first uses a Bourne shell for loop. Let's call it "rename.loop":

    SED_EXPR="s$1"; shift
    for i in "$@"; do
	new=`echo "$i" | sed "$SED_EXPR"`
	mv "$i" "$new"
    done

   The second one uses awk(1) to create the required sequence of "mv" commands,
   without a shell loop. Note that to actually have the mv's performed you'd need
   to pipe the output of this script to "sh". Let's call this one "rename.awk":

    SED_EXPR="s$1"; shift
    TMPDIR=`mktemp -d "${MYTMP:-/tmp}/rename.XXXX"`
    trap "/bin/rm -rf $TMPDIR" 0 1 2 3 15
    /bin/ls "$@" | tee $TMPDIR/old | sed "$SED_EXPR" > $TMPDIR/new
    paste $TMPDIR/old $TMPDIR/new | awk -F'\t' '{printf "mv \"%s\" \"%s\"\n", $1,$2}'

   From openlab, get the file ~wayne/pub/cs146/big-dir.7z. It's a 7zip archive
   of a directory containing 10,000 files (named 0000 through 9999 inclusive).
   Unpack the archive, preferably to a local disk rather than a networked one.
   Then use each of the "rename" versions above to perform some renaming of
   your choice on all 10,000 files (e.g., use one of them to append ".txt" to
   the name of each file, and then the other to rename them all back).

   YOUR TASK: determine which script is faster... which is trivial.

   More important is the question of WHY??? To answer why, you will perform a
   detailed comparative analysis of the resources required for these two versions
   of "rename". This analysis should include a breakdown of total user (CPU),
   system, and real (wall-clock) time for each of the two versions of "rename".
   Use the time(1) command; there's one built into Bash, and also a standalone
   version, typically /usr/bin/time, depending on your system. Check out the
   standalone version's "--verbose" option.

   Then, break down the timings by running time(1) on each individual component
   of the respective scripts (ls, echo, sed, mv, the for loop, etc.). Specific
   questions you might ask are: how does the sum of the timings of the
   sub-components compare to the total time of the top-level script? Can we use
   that difference to estimate how much time is spent by the top-level process
   vs. the child processes? And if the answer is "yes", provide some details:
   how much time is spent by the top-level Bourne shell that performs the
   looping and piping? (Remember that back-quotes are internally implemented
   using a pipe.)
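   To see where the user and system figures reported by time(1) come from, here
   is a minimal C sketch. It is not how any particular time(1) is implemented.
   It runs one command and reports:
   - real time, from clock_gettime(2);
   - CPU time, from getrusage(2) with RUSAGE_CHILDREN.
   As documented in getrusage(2), RUSAGE_CHILDREN covers only children that have
   terminated and been waited for. That includes grandchildren, provided every
   intermediate process waited for its own children. CPU time spent by the
   calling process itself is reported separately, under RUSAGE_SELF. The
   function name time_command is hypothetical.

```c
/* Sketch only: time_command is a hypothetical name. */
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Run argv[0] (searched for in PATH) with arguments argv, wait for it,
   and print real, user and system time to stderr.  Returns the exit
   status (127 if exec failed, as shells do), or -1 if fork or wait
   failed or the command did not exit normally. */
int time_command(char *const argv[])
{
    struct rusage before, after;
    struct timespec t0, t1;
    pid_t pid;
    int status;

    /* Snapshot CPU time already charged for previously reaped children,
       so only this command's usage is reported. */
    getrusage(RUSAGE_CHILDREN, &before);
    clock_gettime(CLOCK_MONOTONIC, &t0);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                       /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_CHILDREN, &after);   /* now includes the reaped child */

    fprintf(stderr, "real %.3fs  user %.3fs  sys %.3fs\n",
            (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9,
            tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime),
            tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime));

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

   A caller passes an argv-style array terminated by NULL, for example
   {"sed", "s/a/b/", "somefile", NULL}, where "somefile" is a placeholder.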

   Your write-up should probably be in a PDF, written nicely as a document with
   tables, possibly even figures if you think that'll help. The quality of your
   write-up (including level of detail, presentation, and quality of discussion)
   will count just as much as correctness of your analysis.