Searching using regular expressions in AWK
A regular expression, or regexp, is a way of describing a set of strings. A regular expression enclosed in slashes (/regexp/) is an AWK pattern that matches every input record whose text belongs to that set. AWK handles regular expressions efficiently, and many tasks that look complex can be solved with a short pattern. Regular expressions make AWK well suited to text manipulation.
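As a minimal sketch of what "a regexp in slashes is a pattern" means, the commands below run the same search three ways: as a bare pattern, with an explicit print action, and with the ~ match operator applied to the whole record ($0). The helper name words and the sample word list are invented for this illustration, and printf is used in place of echo -e only because it emits newlines the same way in every shell.

```shell
# Hypothetical sample input: one word per line, so each word is one record.
words() { printf '%s\n' apple banana cherry; }

# A bare /regexp/ pattern prints every record that matches it.
bare=$(words | awk '/an/')

# The same search with the default action written out.
explicit=$(words | awk '/an/ { print }')

# The same search as an explicit test of the whole record.
matched=$(words | awk '$0 ~ /an/ { print $0 }')

echo "$bare"    # prints: banana
```

All three forms select the same records; the bare form is simply the shortest way to write it.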
AWK gives us several ways to search a string. A dot (.) matches any single character. A caret (^) anchors the match to the beginning of a line, and a dollar sign ($) anchors it to the end of a line. Characters enclosed in square brackets ([...]) form a set that matches any one of the listed characters. A caret placed just inside the brackets ([^...]) makes the set exclusive, so it matches any single character except those listed.
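In each example below the input is one word per line, so every word is a separate record.

Character match:
$ echo -e "cat\ncar\nfun\nden\nfan\nfoo" | awk '/f.n/'
Will output fun and fan (foo does not match, because it has no n).

Match beginning of line:
$ echo -e "Who\nWhat\nThere\nTheir\nthese" | awk '/^The/'
Will output There and Their (these does not match, because the search is case-sensitive).

Match end of line:
$ echo -e "foo\nwhere\nDen\nfan\nboon\nboot" | awk '/n$/'
Will output Den, fan and boon.

Match set:
$ echo -e "Coo\nTall\nBall\nMall" | awk '/[BM]all/'
Will output Ball and Mall.

Exclusive set:
$ echo -e "Coo\nTall\nBall\nMall" | awk '/[^BM]all/'
Will output Tall (Coo does not match, because it does not contain all).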
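The anchors and sets described above can also be combined in one pattern. The following is a small sketch, not part of the original examples: the helper name names and its word list are made up, and printf again stands in for echo -e. It shows how anchoring changes what a set matches, and that an exclusive set still has to match one character.

```shell
# Hypothetical sample records, one per line.
names() { printf '%s\n' Ball Mall Tall Balloon "Small Ball"; }

# ^ and $ together force the whole record to be B or M followed by "all".
whole=$(names | awk '/^[BM]all$/')

# Without anchors, the same set may match anywhere inside the record.
anywhere=$(names | awk '/[BM]all/')

# An exclusive set needs some character other than B or M before "all".
# "Small Ball" qualifies because of the lowercase m in "mall".
negated=$(names | awk '/[^BM]all/')

printf 'whole:    %s\n' $whole       # Ball, Mall
printf 'negated:  %s\n' "$negated"   # Tall, Small Ball
```

Here the unanchored pattern also selects Balloon and Small Ball, which the anchored one rejects.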
AWK also lets us say how many times the preceding character may occur: ? matches zero or one occurrence, * matches zero or more occurrences, and + matches one or more occurrences. Parentheses () group part of a pattern, and | separates alternatives. Below are some examples:
Searching for zero or one occurrence:
$ echo -e "colour\ncolor" | awk '/colou?r/'
Will output colour and color.

Searching for zero or more occurrences:
$ echo -e "ca\nmat\nmatt" | awk '/mat*/'
Will output mat and matt (ca does not match, because it does not contain ma).

Searching for one or more occurrences:
$ echo -e "abcccc\nzyz\ncdef\nncaa\nlmnop" | awk '/c+/'
Will output abcccc, cdef and ncaa.

Search for groups:
$ echo -e "candy apple\ncandy corn\ncandy cane\ncandy" | awk '/candy (apple|cane)/'
Will output candy apple and candy cane.
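A quantifier applies to whatever comes immediately before it, which can be a single character or a whole parenthesized group. The sketch below illustrates the difference with an invented word list (the helper name laughs is hypothetical); the patterns are anchored with ^ and $ so that only complete records count.

```shell
# Hypothetical sample records, one per line.
laughs() { printf '%s\n' ha haha hahaha haa "ha ha" h; }

# + after a group repeats the whole group: one or more "ha".
grouped=$(laughs | awk '/^(ha)+$/')

# Without parentheses, + repeats only the "a".
single=$(laughs | awk '/^ha+$/')

# * allows zero occurrences, so a lone "h" matches too.
star=$(laughs | awk '/^ha*$/')

# ? makes the second "ha" optional.
optional=$(laughs | awk '/^ha(ha)?$/')

echo "$grouped"    # prints ha, haha and hahaha, one per line
```

With the same input, the ungrouped + selects ha and haa, the * form selects ha, haa and h, and the optional group selects ha and haha.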