Searching using regular expressions in AWK

Awk is a text-processing program that is a mainstay of the UNIX/Linux programmer's toolbox.

A regular expression, or regexp, is a way of describing a set of strings. A regular expression enclosed in slashes (/) is an awk pattern that matches every input record whose text belongs to that set. AWK handles regular expressions efficiently, and many complex tasks can be solved with simple regular expressions. Regular expressions make AWK an almost ideal language for text manipulation.
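
As a minimal sketch of how a slash-enclosed pattern selects records, the snippet below applies a regexp in two ways: as a bare pattern, where awk's default action prints the whole matching record, and with the ~ operator, which tests a single field instead of the whole record. The sample input and the variable names (input, whole, first) are made up for illustration, and printf is used instead of echo -e only because it behaves the same across shells.

```shell
# Sample input is made up for illustration: one record per line.
input=$(printf 'apple pie\nbanana split\ncherry pie\n')

# A bare /regexp/ pattern prints every record that matches it.
whole=$(printf '%s\n' "$input" | awk '/pie/')

# The ~ operator tests one field ($1) rather than the whole record.
first=$(printf '%s\n' "$input" | awk '$1 ~ /an/ { print $1 }')

printf '%s\n' "$whole"
printf '%s\n' "$first"
```

The first command should print apple pie and cherry pie, and the second should print banana.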

Searching

AWK gives us several ways to search a string. We can match any single character using a dot (.). We can match text at the beginning of a line using ^. We can match text at the end of a line using $. We can match any one of a set of characters by enclosing the characters in []. We can also match an exclusive set, meaning any character except those enclosed, using [^].

Character match:
$ echo -e "cat
car
fun
den
fan
foo" | awk '/f.n/'
Will output fun, fan (foo is not matched because it does not end in n)

Match beginning of line:
$ echo -e "Who
What
There
Their
these" | awk '/^The/'
Will output There, Their

Match end of line:
$ echo -e "foo
where
Den
fan
boon
boot" | awk '/n$/'
Will output Den, fan, boon

Match set:
$ echo -e "Coo
Tall
Ball
Mall" | awk '/[BM]all/'
Will output Ball, Mall

Exclusive set:
$ echo -e "Coo
Tall
Ball
Mall" | awk '/[^BM]all/'
Will output Tall (Coo is not matched because it does not contain all)
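
The operators above can also be combined in one pattern. The sketch below anchors a set match with ^ and $ so that the whole line has to match, not just part of it. The word list and the variable names (words, exact, others) are made up for illustration.

```shell
# Sample words are made up for illustration.
words=$(printf 'Ball\nMall\nBallroom\nSmall\nTall\n')

# ^ and $ pin the set match to the whole line, so "Ballroom" is rejected.
exact=$(printf '%s\n' "$words" | awk '/^[BM]all$/')

# An exclusive set still consumes exactly one character (anything but B or M),
# so only four-letter lines ending in "all" can match here.
others=$(printf '%s\n' "$words" | awk '/^[^BM]all$/')

printf '%s\n' "$exact"
printf '%s\n' "$others"
```

The first pattern should print Ball and Mall, and the second should print only Tall. Small is rejected by both because it has five characters.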


Match occurrences

AWK allows us to match repeated occurrences of the preceding character: ? matches zero or one occurrence, * matches zero or more occurrences, and + matches one or more occurrences. We can also group parts of a pattern using () and choose between alternatives using |. Below are some examples:

Searching for zero or one occurrence
$ echo -e "colour
color" | awk '/colou?r/'
Will output colour, color

Searching for zero or more occurrences
$ echo -e "ca
mat
matt" | awk '/mat*/'
Will output mat, matt
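
Because * also accepts zero occurrences, /mat*/ matches any line that contains ma, even with no t after it. A small sketch comparing * with + makes the difference visible. The word list and the variable names (words, star, plus) are made up for illustration.

```shell
# Sample words are made up for illustration.
words=$(printf 'ma\nman\nmat\nmatt\nmet\n')

# t* allows zero t's, so any line containing "ma" matches.
star=$(printf '%s\n' "$words" | awk '/mat*/')

# t+ requires at least one t after "ma".
plus=$(printf '%s\n' "$words" | awk '/mat+/')

printf '%s\n' "$star"
printf '%s\n' "$plus"
```

With * the output should be ma, man, mat, matt. With + it should be only mat, matt.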

Searching for one or more occurrences
$ echo -e "abcccc
zyz
cdef
ncaa
lmnop" | awk '/c+/'
Will output abcccc, cdef, ncaa

Search for groups
$ echo -e "candy apple
candy corn
candy cane
candy" | awk '/candy (apple|cane)/'
Will output candy apple, candy cane
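
Groups, alternatives, occurrence operators and anchors can be mixed in a single pattern. As a rough sketch, the pattern below accepts candy apple or candy cane with an optional trailing s, and nothing else on the line. The input list and the variable names (sweets, picked) are made up for illustration.

```shell
# Sample input is made up for illustration.
sweets=$(printf 'candy apple\ncandy apples\ncandy canes\ncandy corn\ncandy\n')

# (apple|cane) picks one alternative, s? makes the plural optional,
# and ^ and $ require the whole line to match.
picked=$(printf '%s\n' "$sweets" | awk '/^candy (apple|cane)s?$/')

printf '%s\n' "$picked"
```

This should print candy apple, candy apples and candy canes.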