Bash tools: Awk

Jean-Michel Gigault edited this page Aug 10, 2015 · 15 revisions

An awk script can be applied either to a file or to the standard input:

cat "path/to/the/file" | awk '<AWK SCRIPT>'  # Using 'cat' and a pipe
awk '<AWK SCRIPT>' "path/to/the/file"        # Using the file path as an
                                             # argument after the script
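As a minimal sketch, the script below runs the same awk program on a file in both ways and checks that the outputs match. It assumes a throwaway sample file created with mktemp, and it also tries a third form, a plain `<` redirection, which sends the file on the standard input without `cat`. The variable names are only illustrative.

```shell
# Throwaway sample file (an assumption: any readable file works)
file=$(mktemp)
printf 'first line\nsecond line\n' > "$file"

# 1. Send the file to awk on the standard input, through a pipe
via_pipe=$(cat "$file" | awk '{print $1}')

# 2. Pass the file path as an argument after the script
via_arg=$(awk '{print $1}' "$file")

# 3. Redirect the file to the standard input, without 'cat'
via_redirect=$(awk '{print $1}' < "$file")

echo "$via_pipe"    # first
                    # second
[ "$via_pipe" = "$via_arg" ] && [ "$via_arg" = "$via_redirect" ] && echo "same output"

rm -f "$file"
```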

Awk reads its input (a file or the standard input) one line at a time. Each line is called a record. For each record, awk executes one or more blocks of instructions surrounded by braces {...}. The most basic instruction is print:

{print}               # Print each line on the standard output

Awk splits each record into fields, or columns, according to a field separator. By default, any run of spaces or tabs separates two fields, and leading and trailing blanks are ignored. Each field is stored in a field variable $1, $2, $3, ..., where $1 is the first field:

{print $1, $2, $3}    # Print the first three fields of each line
                      # If a line is:    "This is a line from a file"
                      # Output would be: "This is a"
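The short sketch below, on made-up input lines, shows the default splitting (several blanks act as a single separator) and the -F option, which sets a different field separator, here a colon. The sample data and variable names are illustrative assumptions.

```shell
# Default separator: runs of spaces and tabs split fields,
# and leading blanks are ignored
default_split=$(printf '   This   is\ta line\n' | awk '{print $1, $2, $3}')
echo "$default_split"   # This is a

# -F sets another separator, here ':' (made-up sample line)
colon_split=$(printf 'alice:x:1000:staff\n' | awk -F ':' '{print $1, $4}')
echo "$colon_split"     # alice staff
```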

The field $0 contains the entire line, so the following two instructions are equivalent:

{print}               # Print each line on the standard output
{print $0}

Awk may execute several blocks of instructions for each line:

{print $0} {print $0} # Print each line twice on the standard output
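To see the order in which several blocks run, the sketch below feeds two made-up lines to two blocks: awk applies every block to the first line before it reads the second one. The variable name is only illustrative.

```shell
# Two blocks, two input lines
out=$(printf 'one\ntwo\n' | awk '{print "A: " $0} {print "B: " $0}')
echo "$out"
# A: one
# B: one
# A: two
# B: two
```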

A block of instructions may be prefixed with a condition; the block is then executed only for the lines that satisfy it:

$0 == "this is a string" {print $0}
# Print each line that is strictly equal to "this is a string"

$0 ~  /this is a string/ {print $0}
# Print each line that contains the string "this is a string"
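The following sketch runs both conditions on the same three made-up lines, to show that == requires the whole line to be equal to the string while ~ accepts any line containing the pattern.

```shell
# Made-up input: only the first line is exactly "this is a string"
input=$(printf 'this is a string\nand this is a string too\nsomething else\n')

exact=$(echo "$input" | awk '$0 == "this is a string" {print $0}')
contains=$(echo "$input" | awk '$0 ~ /this is a string/ {print $0}')

echo "$exact"      # this is a string
echo "$contains"   # this is a string
                   # and this is a string too
```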

An awk script may have a BEGIN block and an END block. The BEGIN block is executed before the first line is read, and the END block after the last line has been processed. Use them to initialize variables or to print headers and footers:

BEGIN {print "START"}  {print $0}  END {print "END OF FILE"}
# Will output:  START
#               [Lines of the file]
#               END OF FILE
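As one way to use BEGIN and END together, the sketch below initializes a counter in BEGIN, increments it for each matching line, and prints the total in END. The input lines are made up. Awk variables start as 0 or the empty string anyway, so the BEGIN block here only makes the initialization explicit.

```shell
# Made-up input: two of the three lines contain 'word'
out=$(printf 'a word\nno match\nword again\n' |
  awk 'BEGIN {count = 0} $0 ~ /word/ {count++} END {print count " matching line(s)"}')
echo "$out"   # 2 matching line(s)
```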

For readability, the following examples first read the file into the variable TEXT, then send its content to awk on the standard input with echo:

declare TEXT=$(cat "./path/to/the/file")

echo "$TEXT" | awk '$0 ~ /word/ {print $0}'
# Print every line that contains the term 'word'

echo "$TEXT" | awk '$0 ~ /^word/ {print $0}'
# Print every line that begins with the term 'word'

echo "$TEXT" | awk '$0 ~ /word$/ {print $0}'
# Print every line that ends with the term 'word'

echo "$TEXT" | awk '$0 ~ /^word$/ {print $0}'
# Print every line that consists exactly of the term 'word'
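On a few made-up lines, the sketch below compares the four patterns above. Note that /word/ also matches "swordfish": a regular expression matches a substring, not a whole word. The sample lines and variable names are assumptions for illustration.

```shell
# Made-up sample lines
TEXT=$(printf 'word\nword at the start\nat the end: word\nswordfish\n')

contains=$(echo "$TEXT" | awk '$0 ~ /word/ {print $0}')     # all four lines
starts=$(echo "$TEXT" | awk '$0 ~ /^word/ {print $0}')      # lines 1 and 2
ends=$(echo "$TEXT" | awk '$0 ~ /word$/ {print $0}')        # lines 1 and 3
exact=$(echo "$TEXT" | awk '$0 ~ /^word$/ {print $0}')      # line 1 only

echo "$exact"   # word
```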

echo "$TEXT" | awk '$0 ~ /word\t/ {print $0}'
# Print every line that contains the term 'word' followed by a tab

echo "$TEXT" | awk '$0 ~ /[\t]word/ {print $0}'
# Print every line that contains the term 'word' preceded by a tab

echo "$TEXT" | awk '$0 ~ /word[\t]?/ {print $0}'
# Print every line that contains the term 'word' followed by zero or one tab
# (since zero tabs is allowed, this selects the same lines as /word/)

echo "$TEXT" | awk '$0 ~ /word[\t]+/ {print $0}'
# Print every line that contains the term 'word' followed by one or more tabs

echo "$TEXT" | awk '$0 ~ /word[\t]*/ {print $0}'
# Print every line that contains the term 'word' followed by zero or more tabs
# (this also selects the same lines as /word/)
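Because ? and * also accept zero tabs, the patterns /word[\t]?/ and /word[\t]*/ select the same lines as /word/, while + requires at least one tab. The sketch below checks this on two made-up lines; the variable names are only illustrative.

```shell
# Made-up sample: only the second line has a tab after 'word'
TEXT=$(printf 'word alone\nword\tthen a tab\n')

optional=$(echo "$TEXT" | awk '$0 ~ /word[\t]?/ {print $0}')     # both lines
any=$(echo "$TEXT" | awk '$0 ~ /word[\t]*/ {print $0}')          # both lines
one_or_more=$(echo "$TEXT" | awk '$0 ~ /word[\t]+/ {print $0}')  # second line only

echo "$one_or_more"
```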

echo "$TEXT" | awk '$0 ~ /word[\t ]/ {print $0}'
# Print every line that contains the term 'word' followed by either a space or a tab
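A bracket expression such as [\t ] must match exactly one character, so a line where 'word' is the last thing on the line is not selected. The sketch below shows this on three made-up lines.

```shell
# Made-up sample: 'word' followed by a space, by a tab, and at the end of a line
TEXT=$(printf 'word then space\nword\tthen tab\nends with word\n')

blank_after=$(echo "$TEXT" | awk '$0 ~ /word[\t ]/ {print $0}')
echo "$blank_after"
# word then space
# word<TAB>then tab
# ('ends with word' is skipped: no character follows 'word' on that line)
```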

echo "$TEXT" | awk 'BEGIN {print "I found this:"} $0 ~ /word/ {print $0}'
# Before any line is read, print the sentence "I found this:"
# Then find and print every line that contains the term 'word'

echo "$TEXT" | awk '$0 ~ /word/ {print $0} END {print "I have read all."}'
# Print every line that contains the term 'word'
# After the last line has been read, print the sentence "I have read all."
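The END block can also report how much input was read: the built-in variable NR holds the number of the current record, and in END it keeps the number of the last line read. A sketch on three made-up lines:

```shell
# Made-up sample: three lines, two of which contain 'word'
TEXT=$(printf 'a word\nnothing here\nanother word\n')

out=$(echo "$TEXT" | awk '$0 ~ /word/ {print $0} END {print "I have read " NR " lines."}')
echo "$out"
# a word
# another word
# I have read 3 lines.
```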

echo "$TEXT" | awk '$0 ~ /word/ {found = 1; exit} END {print (found ? "true" : "false")}'
# Find a line that contains the term 'word', set a flag and stop reading by exiting the script
# 'exit' still runs the 'END' block, which prints "true" if the flag is set and "false" otherwise
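In awk, exit inside a regular block stops reading the input but still runs the END block, so END cannot tell on its own whether a match was found. One way around it, sketched below on made-up input, is to set a flag before exiting and test it in END. The variant at the end returns the answer through awk's exit status instead, which suits a shell if statement; the variable names are only illustrative.

```shell
# Made-up inputs: one with a matching line, one without
with_match=$(printf 'no\nsome word\nmore\n' |
  awk '$0 ~ /word/ {found = 1; exit} END {print (found ? "true" : "false")}')
without_match=$(printf 'no\nnothing\n' |
  awk '$0 ~ /word/ {found = 1; exit} END {print (found ? "true" : "false")}')
echo "$with_match"      # true
echo "$without_match"   # false

# Variant: 'exit !found' in END exits with status 0 when a match was found, 1 otherwise
if printf 'some word\n' | awk '$0 ~ /word/ {found = 1; exit} END {exit !found}'; then
  status=found
else
  status=missing
fi
echo "$status"          # found
```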