CS 146
======

Assignment #4
Due before the last lecture of Week 7

1) [10 marks]
Say you are really low on disk space on openlab, and you have some
utility programs in C that you like to use often, but you don't have
enough disk space to keep the compiled executables around all the time.
Each program consists of just one .c file. However, you don't want to
manually re-compile each C program each time you want to use it. Write a
short shell script called "C-interp" which is intended to have soft links
point at it, and pretends to be a C language interpreter. That is, if you
have C file "foo.c", then you would make a link "ln -s C-interp foo".
Then, what C-interp does, when called as "foo", is compile "foo.c" and
run the resulting executable on the arguments given to "foo". (You can
test it on your solutions to the other questions in this assignment.)

Some caveats:

- Delete ALL temporary files generated, including the executable after
  it's been executed.

- To ensure you don't delete any files that exist before you start whose
  names may conflict with the temporary filenames you choose, put *all*
  temporary files (including the executable) in /tmp/DDD where DDD is a
  random directory name (doesn't need to be 3 characters). Be sure to
  remove the directory after C-interp finishes.

- The executable should be called with argv[0] equal to the basename of
  the .c file, without the '.c', e.g., "foo.c" gets called as "foo" (but
  the path can [and should!] be different, so the compiled executable
  should be in a temporary directory).

- To ensure that your executable name doesn't conflict with other users,
  you should put the executable in a uniquely-named subdirectory of /tmp.
  In fact, it would be best if all your temp files went into this
  directory, as long as the entire directory is removed when the
  executable is finished.

- Ensure that the temporary files are deleted even if the program is
  interrupted, i.e., use the "trap" command in the Bourne shell to trap
  signals 0 (Exit), 1 (Hangup), 2 (Interrupt), 3 (Quit), and
  15 (Terminate). See signal(5) for a list and more details about
  signals.

2) [10 marks]
Write a filter in C that prints M lines out of every N. It can be done
using the shell and awk (see ~wayne/pub/cs146). It is simpler and more
efficient in C (and can be used as a test case for C-interp above). The
program's name is "every". Its SYNOPSIS is:

    $ every [-N,M] [list-of-files]

where N and M are both integers, N > 0, M >= 0, and M <= N. (Anything in
square brackets '[]' is optional, and doesn't need to appear on the
command line. This is standard for Unix manual pages.) The option
argument, if present, must come before any filenames. If no "-N,M" option
is on the command line, then "every" should look for an environment
variable called EVERY and take its options from there, in the same format
as the command line. If "every" can't find options either on the command
line or in the environment variable EVERY, then the default is "-1,1".
That is, with no options, "every" acts just like cat(1). For example, if
we number lines starting at 0, then

    $ every -10,2 foo.c

prints out the following lines of foo.c: 0,1, 10,11, 20,21, 30,31, etc.
If M is omitted, e.g.,

    $ every -10 foo.c

then it defaults to 1. (If either N or M is specified on the command
line, the environment variable EVERY should be ignored.) If multiple
files are given on the command line, each one should be handled
INDEPENDENTLY, so "-10,2" means lines 0,1,10,11, etc. of each file. Like
all Unix filters, if no files are on the command line, "every" processes
its standard input.

3) [20 marks]
Below are two different versions of a script called "rename", whose
purpose is to use sed(1) to programmatically rename any list of files.
Both have the same SYNOPSIS:

    $ rename '/find/replace/' {list of files}

where "find" is a regular expression, "replace" is the replacement text,
and we'll prepend the 's' so that internally sed(1) will see
"s/find/replace/". The first uses a Bourne shell for loop. Let's call it
"rename.loop":

    SED_EXPR="s$1"; shift
    for i in "$@"; do
        new=`echo "$i" | sed "$SED_EXPR"`
        mv "$i" "$new"
    done

The second one uses awk(1) to create the required sequence of "mv"
commands, without a Bash loop. Note that to actually have the mv's
performed you'd need to pipe the output of this script to "sh". Let's
call this one "rename.awk":

    SED_EXPR="s$1"; shift
    TMPDIR=`mktemp -d $MYTMP/rename.XXXX`
    trap "/bin/rm -rf $TMPDIR" 0 1 2 3 15
    /bin/ls "$@" | tee $TMPDIR/old | sed "$SED_EXPR" > $TMPDIR/new
    paste $TMPDIR/old $TMPDIR/new | awk -F'\t' '{printf "mv \"%s\" \"%s\"\n", $1,$2}'

From openlab, get the file ~wayne/pub/cs146/big-dir.7z. It's a 7zip
archive of a directory containing 10,000 files (named 0000 through 9999
inclusive). Unpack the archive--preferably to a local disk rather than a
networked disk. Then use each of the "rename" versions above to perform
some renaming of your choice of all 10,000 files (e.g., use one of them
to append ".txt" to the name of each file, and then the other to rename
them all back).

YOUR TASK: determine which script is faster... which is trivial. More
important is the question of WHY??? To answer why, you will perform a
detailed comparative analysis of the resources required for these two
versions of "rename". This analysis should include a breakdown of total
user (CPU), system, and real (wall-clock) time for each of the two
versions of "rename". Use the time(1) command; there's one built in to
Bash, but also /usr/time or /usr/bin/time, depending on your system.
Check out the "--verbose" option. Then, break down the timings by running
time(1) on each individual component of the respective scripts (ls, echo,
sed, mv, the for loop, etc.).
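
As a starting point for the per-component breakdown described above, here
is a minimal sketch of timing individual pieces with the Bash "time"
keyword. It is not a solution or a prescribed method. It assumes Bash,
works on a small scratch directory of its own rather than on big-dir, and
the names N and SCRATCH and the expression 's/$/.txt/' are illustrative
choices, not part of the assignment.

```shell
#!/bin/bash
# Sketch only: time the components of a rename separately on scratch files.
# N, SCRATCH and the sed expression are assumptions made for illustration.
N=${N:-200}
SCRATCH=$(mktemp -d "${TMPDIR:-/tmp}/rename-timing.XXXXXX") || exit 1
trap 'rm -rf "$SCRATCH"' 0      # clean up on any exit
trap 'exit 1' 1 2 3 15          # turn signals into a normal exit

cd "$SCRATCH" || exit 1

# Create N empty files named 0000, 0001, ...
i=0
while [ "$i" -lt "$N" ]; do
    : > "$(printf '%04d' "$i")"
    i=$((i + 1))
done

# Bash-specific: controls what the "time" keyword prints (on stderr).
TIMEFORMAT='  real %R  user %U  sys %S'

echo "ls alone (one process for all files):"
time ls > /dev/null

echo "echo | sed, one pipeline per file:"
time for f in *; do echo "$f" | sed 's/$/.txt/' > /dev/null; done

echo "ls | sed, one pipeline for all names:"
time ls | sed 's/$/.txt/' > /dev/null

echo "mv, one process per file:"
time for f in *; do mv "$f" "$f.txt"; done

# Sanity check that the last step really renamed every file.
renamed=$(ls | grep -c '\.txt$')
echo "renamed: $renamed of $N"
```

Run it with a larger count (for example N=2000) to see how each component
scales; the absolute numbers will depend on the machine and filesystem.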

Specific questions you might ask are: how does the sum of the timings of
the sub-components compare to the total time of the top-level script? Can
we use that difference to estimate how much time is spent by the
top-level process vs. the child processes? And if the answer is "yes",
provide some details: how much time is spent by the top-level Bourne
shell that performs the looping and piping (remember that back-quotes are
internally implemented using a pipe)?

Your write-up should probably be in a PDF, written nicely as a document
with tables, possibly even figures if you think that'll help. The quality
of your write-up (including level of detail, presentation, and quality of
discussion) will count just as much as correctness of your analysis.
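
For the question of top-level vs. child time raised above, one possible
aid (an assumption on my part, not something the assignment requires) is
the shell's "times" builtin, which reports CPU time charged to the shell
itself separately from CPU time charged to children it has waited for.
The sketch below assumes Bash; N and OUT are hypothetical names. Note
that "times" must run in the shell being measured, so its output is
redirected to a file rather than captured with back-quotes, which would
fork a new process with fresh counters.

```shell
#!/bin/bash
# Sketch only: separate the shell's own CPU time from its children's.
# N and OUT are illustrative names, not part of the assignment.
N=${N:-200}
OUT=$(mktemp "${TMPDIR:-/tmp}/times.XXXXXX") || exit 1
trap 'rm -f "$OUT"' 0

# The same per-item pattern as a shell rename loop: each command
# substitution forks a subshell and sets up a pipe to sed.
i=0
while [ "$i" -lt "$N" ]; do
    new=$(echo "$i" | sed 's/$/.txt/')
    i=$((i + 1))
done

# "times" prints two lines:
#   line 1: user and system time of the shell itself
#   line 2: user and system time of its waited-for children
# Redirecting a builtin does not fork, so the counters are the real ones.
times > "$OUT"

shell_line=$(sed -n 1p "$OUT")
child_line=$(sed -n 2p "$OUT")
echo "shell itself: $shell_line"
echo "children:     $child_line"
```

The cost of creating each child is paid by the parent, so it would be
expected to appear mostly as system time on the first line; treat that
as a hypothesis to check against your own measurements.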