UC Irvine, Information and Computer Science Department Winter 2000

ICS 54: awk: Brief notes for Chapter 17


awk Selected References

http://docs.sun.com/ab2/coll.40.5/REFMAN1/@Ab2PageView/idmatch(AWK-1)
Sun's Solaris 7 man page
http://www.opengroup.org/onlinepubs/7908799/xcu/awk.html
The Open Group's Single UNIX Specification, Version 2.
http://www.faqs.org/faqs/computer-lang/awk/faq/
awk FAQ (Frequently Asked Questions)
http://www.ora.com/catalog/unixnut3/chapter/ch11.html
"The awk Programming Language," chapter 11 of UNIX in a Nutshell: System V Edition, 3rd Edition by Arnold Robbins.
http://www.novia.net/~phridge/programming/awk/
Worked examples from The Awk Programming Language, by Aho, Kernighan, and Weinberger, Addison-Wesley, 1988, ISBN 020107981X.

awk is a pattern scanning and processing language, named after its inventors: Alfred Aho, Peter Weinberger, and Brian Kernighan.

Please note that our coverage of awk is very much simplified and shows only a small part of its full range of capabilities.


awk

awk [ -f progfile ] [ -Fc ] [ 'prog' ] [ parameters ] [ filename ] ...
An awk program is a sequence of pattern-action commands, each of the form
       pattern{action}
awk scans each input filename (several may be given, and a hyphen, -, stands for standard input) for lines that match the patterns of a program given either by the prog string (enclosed in single quotes) or in the file progfile.
If no filename is given, the program operates on standard input.
For each pattern in the program, the corresponding action is done on all lines of the input which the pattern matches.
-f progfile
Take the pattern-action statements/commands from progfile rather than from prog.
-Fc
Use character c, rather than space/blank, as the field separator on each line of the input.
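
As a brief sketch of the invocation forms above, assuming a POSIX-style sh and awk: the colon-separated records and the shell variable names are made up for illustration. It shows -F with a one-character separator, the hyphen as the name for standard input, and the fact that a command may consist of a pattern alone (matching lines are printed) or an action alone (it runs on every line).

```shell
# Hypothetical colon-separated records, fed to awk on standard input.
records='Alice:20:4
Bob:12:3
Carol:18:3'

# -F: makes ":" the field separator; "-" names standard input explicitly.
# The pattern $2>15 has no action, so matching lines are printed unchanged.
big=$(printf '%s\n' "$records" | awk -F: '$2>15' -)

# An action with no pattern runs on every line.
names=$(printf '%s\n' "$records" | awk -F: '{print $1}' -)

printf '%s\n' "$big" "$names"
```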


awk Commands

Examples

% cat in
Alice   20   4
Bob     12   3
Carol   18   3
Dave    24   4
Ed      30   5
%
% awk '/0/{print}' in
Alice   20   4
Ed      30   5
%
% awk '{print NR":"$0" ="$3"-"$2"-"$1}' < in
1:Alice   20   4 =4-20-Alice
2:Bob     12   3 =3-12-Bob
3:Carol   18   3 =3-18-Carol
4:Dave    24   4 =4-24-Dave
5:Ed      30   5 =5-30-Ed
%
% awk -F2 '{print "="$2"="$1"="}' in
=0   4=Alice   =
=   3=Bob     1=
==Carol   18   3=
=4   4=Dave    =
==Ed      30   5=
%
% awk 'BEGIN{print "Hi"}{s+=$2}END{print "Sum2=",s}' in
Hi
Sum2= 104
%
% awk '$3==4{s+=$2}END{print s}' in
44
%
% ls -l | awk '/^d/{d++}/^-/{f++}END{print d" dirs + "f" files"}'
3 dirs + 6 files
%
% awk '$2~/[3-8]/{print;s+=$3}END{print s}' in
Carol   18   3
Dave    24   4
Ed      30   5
12
%
% awk '{printf "%s = %4.1f\n",$0,$2/$3}' in
Alice   20   4 =  5.0
Bob     12   3 =  4.0
Carol   18   3 =  6.0
Dave    24   4 =  6.0
Ed      30   5 =  6.0
%
% awk '{s+=$2;t+=$3}END{printf "%4d/%d=%4.2f\n",s,t,s/t}' in
 104/19=5.47
%
% awk 'BEGIN{step=2}{n++; if(n>=step){print NR": "$0;n=0}}' in

or
% awk '{n++; if(n>=step){print NR": "$0;n=0}}' step=2 in
2: Bob     12   3
4: Dave    24   4
%

Explanations

awk '/0/{print}' in
Print all lines of the file named in that contain a "0".
awk '{print NR":"$0" ="$3"-"$2"-"$1}' < in
Print each line preceded by its number (and a colon) and followed by the 3 fields of the line in reverse order, using "=" and "-" as separators.
Note: NR = "Number of this Record," a built-in variable.
awk -F2 '{print "="$2"="$1"="}' in
Using "2" as a field separator, print fields 2 and 1 in that order, using "=" to delimit them.
awk 'BEGIN{print "Hi"}{s+=$2}END{print "Sum2=",s}' in
Print "Hi" at the start, sum up field 2 of each line, and print the sum at the end.
Note the use of BEGIN and END as special patterns.
Note also that "+=" means that s is to be incremented by the value of the 2nd field and that there is no "$" before the s.
awk '$3==4{s+=$2}END{print s}' in
Print the sum of the 2nd field of each line whose 3rd field is 4.
Note the use of "==" to test for equality.
ls -l | awk '/^d/{d++}/^-/{f++}END{print d" dirs + "f" files"}'
Count separately the lines beginning with "d" (directories) and those beginning with "-" (regular files), then print both counts.
Note that ++ means that the variable's value is incremented by 1.
awk '$2~/[3-8]/{print;s+=$3}END{print s}' in
Print each line whose 2nd field matches the pattern "[3-8]" (that is, contains a digit from 3 through 8) and then print the sum of the 3rd field of each of these lines.
awk '{printf "%s = %4.1f\n",$0,$2/$3}' in
Append to each line field 2 divided by field 3.
awk '{s+=$2;t+=$3}END{printf "%4d/%d=%4.2f\n",s,t,s/t}' in
Total fields 2 and 3 and print the two totals followed by their quotient (the sum of field 2 divided by the sum of field 3).
awk 'BEGIN{step=2}{n++; if(n>=step){print NR": "$0;n=0}}' in
awk '{n++; if(n>=step){print NR": "$0;n=0}}' step=2 in
Print, preceded by its line number, every line whose number is a multiple of step.
One can also do this as follows:
      awk '{if(step<=++n){print NR": "$0;n=0}}' step=2 in
or even
      awk '0==NR%step{print NR": "$0}' step=2 in
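
One detail of the step=2 form deserves a hedged sketch. Assuming a POSIX-style awk, an assignment given among the filenames takes effect only when awk reaches that argument, so the main rules see it but a BEGIN action does not. The four input lines and the shell variable names are made up for illustration.

```shell
# Four made-up input lines.
lines='one
two
three
four'

# step=2 is given as an operand, so it is assigned just before the input
# that follows it ("-", standard input) is read; the main rule sees step=2.
every2=$(printf '%s\n' "$lines" | awk '0==NR%step{print NR": "$0}' step=2 -)

# BEGIN runs before any operand is processed, so step is still unset there.
inbegin=$(awk 'BEGIN{print "step=[" step "]"}' step=2 < /dev/null)

printf '%s\n' "$every2" "$inbegin"
```

If a value is needed inside BEGIN, set it in the BEGIN action itself, as in the first form above.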


awk programs

An awk program consists of one or more commands of the form:
       pattern{action}

The action is performed on all lines of the input which match the pattern.

Every input line matches an empty pattern.

awk patterns

empty
action is done for each line.
BEGIN
action is done before first line.
END
action is done after last line.
/RegularExpression/
action is done if the line matches RegularExpression.
PatternMatchingExpression
action is done if the PatternMatchingExpression is true for this line.
Such expressions are built with the ~ (match) and !~ (no match) operators, as in $2~/[3-8]/.
RelationalExpression
action is done if the RelationalExpression is true for this line.
The relational operations are:   <   <=   ==   !=   >=   >
BooleanExpression
action is done if the BooleanExpression is true for this line.
Such expressions are the combination of pattern matching and relational expressions using boolean operations (&&, ||, and !) and (if/as needed) parentheses.
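
The pattern kinds listed above can be sketched as follows, assuming a POSIX-style sh and awk (under tcsh the ! would need a backslash). The data repeats the five-line file in used throughout these notes; the shell variable names are made up for illustration.

```shell
data='Alice   20   4
Bob     12   3
Carol   18   3
Dave    24   4
Ed      30   5'

# Relational expression: field 2 is at least 20.
rel=$(printf '%s\n' "$data" | awk '$2>=20{print $1}')

# Pattern-matching expression: field 1 does not contain a lowercase "a".
nomatch=$(printf '%s\n' "$data" | awk '$1!~/a/{print $1}')

# Boolean expression combining both kinds with && and !.
both=$(printf '%s\n' "$data" | awk '($3==3)&&!($1~/^B/){print $1}')

printf '%s\n' "$rel" "$nomatch" "$both"
```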


awk Built-in Variables

FILENAME
Name of the current input file
FS
input Field Separator regular expression (default blank and tab)
NF
Number of Fields in the current record
NR
Number of the current Record
OFMT
Output ForMaT for numbers (default %.6g)
OFS
Output Field Separator (default blank)
ORS
Output Record Separator (default newline)
RS
input Record Separator (default newline)
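
A small sketch of several of these variables at work, assuming a POSIX-style sh and awk. The records (with differing numbers of fields) and the shell variable names are made up for illustration. Setting FS in a BEGIN action has the same effect as -F, and OFS is what print places between its comma-separated arguments.

```shell
# Hypothetical records with 3, 2, and 4 colon-separated fields.
records='Alice:20:4
Bob:12
Carol:18:3:x'

# For each record print its number (NR), its field count (NF),
# and its last field ($NF), joined by the output separator OFS.
report=$(printf '%s\n' "$records" |
  awk 'BEGIN{FS=":"; OFS=" | "}{print NR, NF, $NF}')

printf '%s\n' "$report"
```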


Special Characters and Regular Expressions in awk

\        Combines with following character to give it special meaning (see below) or, if it would have had a special meaning without the \, to make it revert to its literal meaning.
\a       Alert/Bell (CTRL/G = ASCII 7)
\b       Backspace (CTRL/H = ASCII 8)
\f       Form feed (CTRL/L = ASCII 12)
\n       Newline (CTRL/J = ASCII 10)
\r       Carriage return (CTRL/M = ASCII 13)
\t       Tab (CTRL/I = ASCII 9)
\/       Literal slash (in regular expressions)
\nnn     Octal value nnn
.        Match any character
^        Match start of line
$        Match end of line
[...]    Match any character in brackets
         Example: [abcA-Z7]
[^...]   Match any character except those in brackets
         Example: [^abcA-Z7]
*        Match 0 or more repetitions of previous item
+        Match 1 or more repetitions of previous item
?        Match 0 or 1 repetitions of previous item
(...)    Treat enclosed text as a group/item
|        Separator for items which are considered alternatives.
         Example: (NY|LA|SF)
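
A few of the operators above, tried on a made-up word list; this is a sketch assuming a POSIX-style sh and awk, and the shell variable names are hypothetical. Each command is a pattern with no action, so matching lines are simply printed.

```shell
words='color
colour
colouur
NY-1
LAX
SF
a/b'

# ? makes the preceding "u" optional; ^ and $ anchor the whole line.
opt=$(printf '%s\n' "$words" | awk '/^colou?r$/')

# + requires one or more "u".
plus=$(printf '%s\n' "$words" | awk '/^colou+r$/')

# (..|..) groups alternatives; here any of them must start the line.
alt=$(printf '%s\n' "$words" | awk '/^(NY|LA|SF)/')

# \/ is a literal slash inside /.../.
slash=$(printf '%s\n' "$words" | awk '/\//')

printf '%s\n' "$opt" "$plus" "$alt" "$slash"
```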

Examples

% cat in
Alice   20   4
Bob     12   3
Carol   18   3
Dave    24   4
Ed      30   5
% awk '/^[^A-CE]/{print}' in
Dave    24   4
% awk '($1~/e/)||(NR>4){print}' in
Alice   20   4
Dave    24   4
Ed      30   5
% awk '($1~/e/)&&($2\!~/0/){print}' in
Dave    24   4
% sh
$ awk '($1~/e/)&&($2!~/0/){print}' in
Dave    24   4
$ echo $0
sh
$ exit
% echo $0
tcsh
% awk '{s+=$2*$3}END{print s}' in
416
% ls -la ~ | awk '/^-/{s+=$5;n++}END{print n" files, Avg="s/n" bytes"}'
32 files, Avg=1376.21 bytes


awk Arithmetic

+   -   *   /   %   ^
Addition, subtraction, multiplication, division, modulus(remainder), exponentiation.
++   --  
Increment and decrement the value of a variable: the prefix form (++x) changes it before the value is used; the postfix form (x++) changes it after the value is used.
=   +=   -=   *=   /=   %=   ^=  
Assignment: x ?= y    is the same as x = x ? y, where ? stands for any one of + - * / % ^
String concatenation is indicated by a blank or simple juxtaposition when token boundaries are clear.
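
The operators above can be tried without any input file by putting everything in a BEGIN action. This is a minimal sketch assuming a POSIX-style sh and awk; the variable names are made up for illustration.

```shell
arith=$(awk 'BEGIN{
  x = 7
  print x % 3, 2 ^ 10        # remainder and exponentiation
  a = x++                    # postfix: a gets the old value 7, then x is 8
  b = ++x                    # prefix: x becomes 9 first, then b gets 9
  print a, b, x
  x += 1; x *= 2             # same as x = x + 1; x = x * 2
  print x
  print "n=" x " items"      # juxtaposition concatenates strings
}')

printf '%s\n' "$arith"
```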


awk Statements

An awk action is a sequence of statements, each terminated by a semicolon, newline, or right brace.

if ( expression ) statement [else statement ]
while ( expression ) statement
do statement while ( expression ) # In nawk
for ( expression ; expression ; expression ) statement
for ( var in array ) statement
break
continue
{ [ statement ] . . . }
expression # commonly variable = expression
print [ expression-list ] [ > expression ]
printf format [ ,expression-list ] [ > expression ]
next  # skip remaining patterns on this input line
exit [expr] # skip the rest of the input; exit status is expr
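
A hedged sketch of three of these statements, assuming a POSIX-style sh and awk. The data repeats the five-line file in used in these notes; the shell variable names are made up. Note the semicolon that terminates the print statement before else: a simple statement in that position needs a terminator or enclosing braces.

```shell
data='Alice   20   4
Bob     12   3
Carol   18   3
Dave    24   4
Ed      30   5'

# if/else: the statement before else is terminated by ";".
sizes=$(printf '%s\n' "$data" |
  awk '{if($2<20) print $1" is small"; else print $1" is big"}')

# next abandons the current line, so the second rule never sees "A" lines;
# the for loop then builds one "*" per unit of field 3.
stars=$(printf '%s\n' "$data" |
  awk '/^A/{next}{s=""; for(i=0;i<$3;i++) s=s"*"; print $1, s}')

printf '%s\n' "$sizes" "$stars"
```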

Examples

% cat in
Alice   20   4
Bob     12   3
Carol   18   3
Dave    24   4
Ed      30   5
%
% awk '$2<20{print $1" is small"}' in
Bob is small
Carol is small
%
% awk '{if($2<20)print $1" is small"}' in
Bob is small
Carol is small
%
% awk '{if($2<20)print $1" is small " else print $1 " is big"}' in
awk: syntax error near line 1
awk: illegal statement near line 1
%
% awk '{if($2<20){print $1" is small "} else print $1 " is big"}' in
Alice is big
Bob is small
Carol is small
Dave is big
Ed is big
%
% awk '{if ($2<20) print $1" is small " else {print $1 " is big"}}' in
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: bailing out near line 1
%
% awk 'END{while(i++<3)print i}' < /dev/null
1
2
3
%
% awk 'END{while(++i<3)print i}' < /dev/null
1
2
%
% awk 'END{do{print i}while (++i<3)}' < /dev/null
awk: syntax error near line 1
awk: illegal statement near line 1
awk: bailing out near line 1
%
% nawk 'END{do{print i}while (++i<3)}' < /dev/null

1
2
%
% nawk 'END{while(++i<3)print i}' < /dev/null
1
2
%
% awk 'END{for(i=0;i<3;i++)print i}' < /dev/null
0
1
2
%
% awk 'END{for(;i<3;i++)print i}' < /dev/null

1
2
%
%
% cat in2
Alice:20:4
Bob:12:3
Carol:18:3
Dave:24:4
Ed:30:5
%
% cat 2.awk
/^[A-C]/{print "A-C: "$0}
/e/{print "--e: "$0}
%
% awk -F':' -f 2.awk in2
A-C: Alice:20:4
--e: Alice:20:4
A-C: Bob:12:3
A-C: Carol:18:3
--e: Dave:24:4
%
% cat 2n.awk
/^[A-C]/{print "A-C: "$0; next}
/e/{print "--e: "$0}
%
% awk -F: -f 2n.awk in2
A-C: Alice:20:4
A-C: Bob:12:3
A-C: Carol:18:3
--e: Dave:24:4
%
% awk -F: '/o/{exit}END{print "Exit at\n"NR"="$0}' in2
Exit at
2=Bob:12:3
%


awk Arrays

awk's array variables are associative arrays.

Each array variable is, in fact, a collection of variables written in the form avar[index], where avar is the name they share and index (also called a "subscript") can be an integer or string value.

Like all other user-defined variables, avar[index] is automatically created when it is first used. Thus the array variable avar is created and extended simply by being used.
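
A small counting sketch of this behaviour, assuming a POSIX-style sh and awk (the in membership test is not available in the oldest awk). The input words and all variable names are made up for illustration. Because for ( var in array ) visits subscripts in no particular order, the output is passed through sort to make it predictable.

```shell
votes='red
blue
red
green
blue
red'

# count[$1] comes into existence the first time it is incremented.
tally=$(printf '%s\n' "$votes" |
  awk '{count[$1]++}END{for(c in count)print c"="count[c]}' | sort)

# (subscript in array) tests membership without creating an element.
seen=$(printf '%s\n' "$votes" |
  awk '{count[$1]++}
       END{r=("red" in count); p=("pink" in count); print r, p}')

printf '%s\n' "$tally" "$seen"
```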

Examples

% cat in
Alice   20   4
Bob     12   3
Carol   18   3
Dave    24   4
Ed      30   5
%
%
% awk '{myline[$1]=NR}END{for(n in myline)print n"="myline[n]}' in
Ed=5
Bob=2
Alice=1
Dave=4
Carol=3
%
% awk '{name[NR]=$1}END{for(n in name)print n"="name[n]}' in
2=Bob
3=Carol
4=Dave
5=Ed
1=Alice
%
% awk '{name[NR]=$1}END{for(n=1;n<=NR;n++)print n"="name[n]}' in
1=Alice
2=Bob
3=Carol
4=Dave
5=Ed
%
% cat in3
Alice   20
Bob     12
Carol   18
Dave    24
Ed      30
Alice   16
Bob     12
Carol   12
Dave    14
Ed      20
%
% awk '{sum[$1]+=$2}END{for(n in sum)print n"="sum[n]}' in3
Ed=50
Bob=24
Alice=36
Dave=38
Carol=30
%


awk functions

awk has a generous set of built-in functions.

The on-line form of Chapter 11: The awk Programming Language of UNIX in a Nutshell: System V Edition, 3rd Edition by Arnold Robbins, has an excellent Group Listing and Alphabetic Summary of awk Functions and Commands.

Standard/minimal awk doesn't include user-defined functions.

nawk, gawk, and other extended versions of awk do provide user-defined functions.
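
As a rough sketch of both kinds of function, assuming a POSIX-style sh and an awk that accepts user-defined functions (nawk, gawk, or similar): the sample strings, the function name sum, and the shell variable names are made up for illustration. Extra parameters that the caller does not supply act as local variables, which keeps loop counters from leaking into the rest of the program.

```shell
# Built-in string functions, run from BEGIN so no input is needed.
strfun=$(awk 'BEGIN{
  s = "Alice:20:4"
  n = split(s, part, ":")          # part[1]="Alice", part[2]="20", ...
  print n, length(part[1]), substr(part[1], 1, 3), index(s, ":")
}')

# A user-defined function: add up fields first..NF of the current line.
# i and t are extra parameters used only as local variables.
total=$(printf '%s\n' 'Alice 1 2 3 4' 'Bob 2 5 4 1' |
  awk 'function sum(first,   i, t) {
         for (i = first; i <= NF; ++i) t += $i
         return t
       }
       {print $1, sum(2)}')

printf '%s\n' "$strfun" "$total"
```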

Examples

% cat in.val
Alice   1 2 3 4
Bob     2 5 4 1
Carol   3 2 4 2
Dave    4 3 2 1
Ed      5 3 4 2
%
% awk -f valsum.awk in.val
awk: syntax error near line 4
awk: bailing out near line 4
%
% cat valsum.awk
# valsum.awk -- sum values in each record
# input: name followed by a series of values

function sum(s) {
  for (i=2; i<=NF; ++i) s+=$i
  return s
}

{print($0" = " sum())}
%
% nawk -f valsum.awk in.val
Alice   1 2 3 4 = 10
Bob     2 5 4 1 = 12
Carol   3 2 4 2 = 11
Dave    4 3 2 1 = 10
Ed      5 3 4 2 = 14
%
% nawk -f valsort.awk in.val
  Alice:  1 2 3 4
    Bob:  1 2 4 5
  Carol:  2 2 3 4
   Dave:  1 2 3 4
     Ed:  2 3 4 5
%
% cat valsort.awk
# valsort.awk -- sort values in each record
# input: name followed by a series of values
# based on grade.sort.awk script from chapter 9 of
# "sed & awk, 2nd ed," Dougherty & Robbins, O'Reilly, 1997

# sort function -- sort numbers in ascending order
function sort(A,n,   i,j,tmp) {  # i, j, tmp are local variables
  for (i=2; i<=n; ++i)
    for (j=i; (j-1) in A && A[j-1]>A[j]; --j) {
      tmp=A[j]; A[j]=A[j-1]; A[j-1]=tmp
    }
  return
}

# main routine
{
for (i=2; i<=NF; ++i) val[i-1]=$i
sort(val, NF-1)
printf("%7s:  ", $1)
for (j=1; j<=NF-1; ++j) printf("%d ", val[j])
printf("\n")
}

Comments are welcome.
Current as of 14 February 2000