First steps with Awk

Awk is a Unix programming language specifically dedicated to the processing of text files.

While it’s been around for ages (it’s almost as old as Unix), it remains fairly unknown and underused compared to other utilities such as grep, vi or find. Strange really, as it’s powerful and quite easy to use.

Example case: looking up and summarizing data in a log file.

Imagine the “usual” application log file of the form: [Date] [Thread #] [Log level] [log message]

Say some of the lines logged in there account for the time spent on one given algorithm:
13:42:07,019 [Thread-1] DEBUG Calculation: time spent on algo #12 is 831 ms

It would be interesting to parse all the lines containing the word algo to extract 1) the total time spent on algo calculation 2) the average time spent across all calculations.

In Java, coding such a parser from scratch could take most developers about a full day, if not more:
– locate the file, open and read it, handle I/O exceptions
– parse the lines (handle parsing exceptions)
– calculate the results and print them
– create a build script

By comparison, the equivalent script can be written with Awk in minutes.

Proceeding step by step

Step 1.
To print to the screen all lines containing the term algo in the file myfile.log:

awk '/algo/ {print $0}' myfile.log

Note: Awk divides each line into columns (fields), where a column is a block of text separated by whitespace.
E.g. in the format above, [Date] [Thread] [Debug]…, $0 is the whole line, $1 is the date, $2 the thread number, etc.
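
To see exactly how a line is split, it can help to print every field next to its index. The sketch below does this for the sample log line shown earlier; the shell variable names line and fields are only illustrative.

```shell
# Sample line taken from the log format described above.
line='13:42:07,019 [Thread-1] DEBUG Calculation: time spent on algo #12 is 831 ms'

# NF is the number of fields on the current line; $i is the i-th field.
fields=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }')
printf '%s\n' "$fields"
```

For this line the listing has 12 entries: $9 is "#12" (the algorithm number) and $11 is "831" (the time in ms).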

Step 2.
To count the number of lines containing the term algo:

awk '/algo/ {nb++} END {printf "%d\n",nb}' myfile.log

[here nb is a variable created on first use, starting at zero, and used as a line counter]
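
One thing to keep in mind is that /algo/ matches the text anywhere on the line, so a line mentioning "algorithm" would be counted as well. A possible way to tighten the match is to compare a single field exactly. The sketch below assumes the log layout of the sample line (where the word algo is the 8th field); the second log line is made up for illustration.

```shell
# Hypothetical log excerpt; the INFO line is invented to show a false match.
log='13:42:07,019 [Thread-1] DEBUG Calculation: time spent on algo #12 is 831 ms
13:42:07,950 [Thread-2] INFO Startup: loading algorithm table
13:42:08,120 [Thread-1] DEBUG Calculation: time spent on algo #13 is 1200 ms'

# /algo/ matches the substring anywhere, so the "algorithm" line is counted too.
loose=$(printf '%s\n' "$log" | awk '/algo/ { nb++ } END { printf "%d\n", nb }')

# An exact comparison only counts lines whose 8th field is the word "algo".
strict=$(printf '%s\n' "$log" | awk '$8 == "algo" { nb++ } END { printf "%d\n", nb }')

echo "loose=$loose strict=$strict"
```

With this excerpt the loose pattern counts 3 lines and the exact field comparison counts 2.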

Step 3.
To keep a running total of the time spent on algo calculations:

awk '/algo/ {total+=$11} END {printf "%d\n",total}' myfile.log

[in the sample line above, the time spent is the 11th whitespace-separated column; adjust the column number to match your own log format]
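
Column numbers are easy to miscount, so it may be safer to address the value relative to the end of the line. The sketch below assumes every matching line ends with "<number> ms", as the sample line does; the second log line is made up for illustration.

```shell
# Hypothetical sample lines in the article's log format.
log='13:42:07,019 [Thread-1] DEBUG Calculation: time spent on algo #12 is 831 ms
13:42:08,120 [Thread-1] DEBUG Calculation: time spent on algo #13 is 1200 ms'

# $(NF-1) is the next-to-last field, i.e. the number just before "ms".
total=$(printf '%s\n' "$log" | awk '/algo/ { total += $(NF-1) } END { printf "%d\n", total }')
echo "$total"
```

Here the total is 831 + 1200 = 2031. This form keeps working if the number of words before the duration changes, as long as the line still ends the same way.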

Step 4.
Putting it all together: print the total time spent and the average time per calculation:

awk '/algo/ {nb++;total+=$11} END {printf "%d %d\n", total, total/nb}' myfile.log
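
Two details are worth handling in a combined script like this: dividing by the line counter fails when no line matches, and %d drops the fractional part of the average. The sketch below is one possible variant, assuming the duration is the 11th field as in the sample line; the function name summarize and the extra log lines are hypothetical.

```shell
# Reads log lines on standard input and prints total and average time.
summarize() {
    awk '
        /algo/ { nb++; total += $11 }
        END {
            if (nb > 0) {
                # %.1f keeps one decimal of the average instead of truncating it.
                printf "total=%d ms, average=%.1f ms over %d calculations\n", total, total / nb, nb
            } else {
                # Avoid a division by zero when nothing matched.
                print "no matching lines"
            }
        }'
}

# Hypothetical sample lines in the article's log format.
log='13:42:07,019 [Thread-1] DEBUG Calculation: time spent on algo #12 is 831 ms
13:42:08,120 [Thread-1] DEBUG Calculation: time spent on algo #13 is 1200 ms
13:42:09,300 [Thread-2] DEBUG Calculation: time spent on algo #12 is 400 ms'

summary=$(printf '%s\n' "$log" | summarize)
echo "$summary"
```

On a real file the function would be called as summarize < myfile.log.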



The best way to learn more about Awk is to try it.

– Awk should come as standard with all Linux distributions.

– On Windows it’s available via Cygwin. An alternative implementation, gawk (GNU Awk), is also available.
