Scripting in AWK

AWK is a text-processing program and a mainstay of the UNIX/Linux programmer's toolbox.

AWK is a standard tool on every POSIX-compliant UNIX system. It is ideal for text processing and other small scripting tasks. It has a C-like syntax, but without mandatory semicolons, manual memory management, or static typing. You should still use semicolons, though, because they are required to separate statements written on the same line, as in one-liners. You can call AWK from a shell script, or you can use it as a stand-alone scripting language. Many programmers also find AWK easier to read than Perl.
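As a small sketch of the semicolon rule, here is a one-line AWK program whose action holds two statements. The semicolon between them is what lets them share a line. The function name count_and_sum is hypothetical, used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the input lines and total the first column.
# "n++" and "total += $1" share one line, so a semicolon must separate them.
count_and_sum() {
    awk '{ n++; total += $1 } END { print n, total }'
}
```

For example, printf '3\n4\n5\n' | count_and_sum prints 3 12.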

AWK is an excellent filter and report writer. Many UNIX utilities generate rows and columns of information, and AWK is an excellent tool for processing them; for this job it is easier to use than most conventional programming languages. AWK understands most of the same arithmetic operators as C (it adds ^ for exponentiation but has no bitwise operators). It also has string-manipulation functions, so it can search for particular strings and modify the output. Finally, AWK has associative arrays (arrays indexed by strings), a feature that older languages such as C do not provide built in. Associative arrays can turn a complex problem into a trivial exercise.
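A minimal sketch of associative arrays, C-style arithmetic, and a string function working together. It assumes ls -l output in the Linux layout (owner in field 3, size in field 5), and both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: total the size column ($5) of "ls -l" output per owner ($3).
# total[] is an associative array indexed by owner name; += works as in C.
bytes_per_owner() {
    awk '
        NF >= 9 { total[$3] += $5 }    # skips the "total N" line, which has 2 fields
        END     { for (owner in total) print owner, total[owner] }
    '
}

# Hypothetical helper: keep only lines containing "error", upper-casing each match.
# gsub() replaces every match of the regular expression in the current line ($0).
flag_errors() {
    awk '/error/ { gsub(/error/, "ERROR"); print }'
}
```

A "for (key in array)" loop visits keys in an unspecified order, so pipe the result through sort when order matters: ls -l | bytes_per_owner | sort.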

Basic structure

The essential organization of an AWK program follows the form: pattern { action }. The pattern specifies when the action is performed. Like most UNIX utilities, AWK is line oriented. The pattern specifies a test that is performed with each line read as input. If the condition is true, then the action is taken. The default pattern is something that matches every line. This is the blank or null pattern. Two other important patterns are specified by the keywords BEGIN and END. As you might expect, these two words specify actions to be taken before any lines are read, and after the last line is read.

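#!/bin/sh
# Usage: ls -l | ./this-script
# Where ls -l prints both owner and group (e.g. Linux), the file name is field 9: change $8 to $9

awk '
        BEGIN { print "File\tOwner" }
        { print $8, "\t", $3 }
        END { print " - DONE -" }
    '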
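The script above uses only BEGIN, END, and the blank pattern. As a sketch of a pattern that actually tests each line, the hypothetical function below prints only files larger than a limit passed in with awk's -v option. It assumes the Linux ls -l layout, with the size in field 5 and the file name in field 9.

```shell
#!/bin/sh
# Hypothetical helper: list files larger than $1 bytes from "ls -l" output.
# The pattern in front of each { action } is tested on every input line.
big_files() {
    awk -v limit="$1" '
        BEGIN                       { print "File\tSize" }
        NF >= 9 && $5 > limit + 0   { print $9 "\t" $5 }
        END                         { print " - DONE -" }
    '
}
```

Usage: ls -l | big_files 1000. Adding 0 to limit forces a numeric rather than a string comparison, and the NF >= 9 test skips the "total N" line that ls -l prints first.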


Different shell scripts

AWK can be called from various shells, or it can run a script file on its own. In the C shell, every line of a multi-line quoted string except the last must end with a backslash, because the C shell does not let a quoted string continue onto the next line unless the newline is escaped. In Bash (or any Bourne-compatible shell) this is not necessary, since a quoted string can span several lines. A third option is to run the script with AWK itself, because AWK is also an interpreter.

C Shell

#!/bin/csh -f
# Linux users have to change $8 to $9
# Inside the quotes, every line except the last must end with a backslash
awk '\
       BEGIN    { print "File\tOwner" } \
                { print $8, "\t", $3 }  \
       END      { print " - DONE -" }   \
'



Bash
#!/bin/sh

awk '
        BEGIN { print "File\tOwner" }
        { print $8, "\t", $3 }
        END { print " - DONE -" } 
    '



AWK
#!/bin/awk -f
# The path to awk varies between systems (e.g. /usr/bin/awk); adjust this line to match
BEGIN { print "File\tOwner" }
      { print $8, "\t", $3 }
END { print " - DONE -" }



Notice that the first line of each script (the "#!" line) specifies which program, a shell or AWK itself, will run the script.
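On a system where awk is not installed at /bin/awk, the #!/bin/awk -f line in the AWK example fails. A sketch of a portable alternative, assuming mktemp is available: save the program to a file and run it with awk -f. That is what the kernel does with a "#!/bin/awk -f" line anyway, so the result is the same.

```shell
#!/bin/sh
# Save the stand-alone AWK program to a temporary file (mktemp assumed available).
prog=$(mktemp) || exit 1
trap 'rm -f "$prog"' EXIT
# Linux users have to change $8 to $9
cat > "$prog" <<'EOF'
BEGIN { print "File\tOwner" }
      { print $8, "\t", $3 }
END   { print " - DONE -" }
EOF

# Same as executing the file through its "#!/bin/awk -f" line,
# but it works wherever awk is installed.
ls -l | awk -f "$prog"
```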

Which shell to use

The AWK format is not free-form: you can't put line breaks just anywhere. In the original AWK, a line break could be placed only after a curly brace or at the end of a statement (POSIX awk also allows one after a comma, &&, ||, do, or else). To break a long line anywhere else, you have to end it with a backslash. Because the C shell needs backslashes of its own, this makes AWK scripts written in the C shell hard to follow, as the example below shows.

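#!/bin/csh -f
# csh needs a backslash before every newline inside the quotes; where AWK
# also needs its own continuation backslash, the line ends in two (\\)
awk '\
    BEGIN { print "File\tOwner" }\
          { print $8, "\t", \\
            $3 }\
    END { print "done" }\
'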
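For comparison, here is a sketch of the same program under sh or Bash, wrapped in a hypothetical function list_owners so it can be reused. A single-quoted Bourne-shell string may span lines, so only AWK's own continuation backslash remains.

```shell
#!/bin/sh
# Hypothetical helper: the long-line example without the C shell's extra backslashes.
# Usage: ls -l | list_owners   (Linux users have to change $8 to $9)
list_owners() {
    awk '
        BEGIN { print "File\tOwner" }
              { print $8, "\t", \
                $3 }
        END   { print "done" }
    '
}
```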