
Introduction to AWK: Syntax, Script Structure, and Practical Use Cases

This article introduces the AWK scripting language. It covers the basic syntax, the command-line options, and script components such as the BEGIN and END blocks, and it demonstrates common text-filtering, data-analysis, and formatting tasks with concrete examples.

360 Quality & Efficiency

Today we recommend a very simple scripting language, AWK, which excels at processing formatted text and is often combined with the shell for log handling and statistical work. Its syntax is concise, it runs fast, and it offers built-in features such as arrays and C-like functions, which makes it easy for beginners to pick up.

Syntax

i. awk [options] 'script' var=value file(s)

ii. awk [options] -f scriptfile var=value file(s)

Common options include:

-F fs Specify input field separator (string or regex, e.g., -F "\t").

-v var=value Pass external variables to AWK.

-f scriptfile Read AWK commands from a script file.
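
As a rough illustration of the three options working together, the sketch below builds a small sample file and a script file in a temporary directory. The file names (visits.csv, by_day.awk), the variable name day, and the sample rows are assumptions made for this example only.

```shell
#!/bin/sh
# Hypothetical sample data in an area,IP,date layout.
tmpdir=$(mktemp -d)
cat > "$tmpdir/visits.csv" <<'EOF'
北京,10.0.0.1,2016-11-11
上海,10.0.0.2,2016-11-10
北京,10.0.0.3,2016-11-11
EOF

# The AWK program lives in its own file and is loaded with -f.
cat > "$tmpdir/by_day.awk" <<'EOF'
$3 == day { print $2 }
EOF

# -F sets the field separator; -v passes the value of day into the program.
ips=$(awk -F',' -v day=2016-11-11 -f "$tmpdir/by_day.awk" "$tmpdir/visits.csv")
echo "$ips"
rm -rf "$tmpdir"
```

With this sample, only the IPs of the two rows dated 2016-11-11 are printed.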

Script Structure and Working Principle

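An AWK program typically consists of an optional BEGIN block, one or more pattern { action } rules, and an optional END block. Each action is enclosed in braces, and on the command line the whole program is usually wrapped in single quotes so that the shell does not expand $ or other special characters.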

awk 'BEGIN{ print "start" } pattern{ commands } END{ print "end" }' file

BEGIN runs before any input is read (e.g., variable initialization, header printing).

END runs after all input has been processed (e.g., summary output).

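A pattern { action } rule runs its action for every input line that matches the pattern. If the pattern is omitted, the action runs for every line; if the action is omitted, the default is { print } , which prints the matching line.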
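
A minimal sketch of the three parts running in order, using three made-up input lines fed on standard input (the words and the counter name n are illustrative, not from the original article):

```shell
#!/bin/sh
# Hypothetical three-line input, piped to awk on stdin.
report=$(printf 'apple\nbanana\navocado\n' | awk '
BEGIN   { print "start" }          # runs once, before any input is read
/^a/    { n++; print NR ": " $0 }  # runs for each line that starts with "a"
END     { print "matched " n }     # runs once, after the last line
')
echo "$report"
```

The BEGIN line appears first, then one line per match (prefixed with its line number from NR), then the END summary.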

Application Scenarios

Assume IP_file contains visitor data in the format "area,IP,date".

Text Filtering

Goal: select lines where the area starts with "北京" (Beijing).

awk '/^北京/{print $0}' IP_file

Here ^ matches the start of the line and $0 prints the entire line.
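
If the area name may also appear in the middle of the first field, matching on the field instead of the whole line is one option. The sketch below compares the two approaches on hypothetical sample rows; the third row's area value is invented to show the difference.

```shell
#!/bin/sh
# Hypothetical sample in the area,IP,date format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
北京,10.0.0.1,2016-11-11
上海,10.0.0.2,2016-11-11
华北-北京,10.0.0.3,2016-11-12
EOF

# Anchored regex: only lines that begin with the area name.
anchored=$(awk '/^北京/' "$sample")

# Field match: any line whose first comma-separated field contains it.
by_field=$(awk -F',' '$1 ~ /北京/' "$sample")

printf 'anchored:\n%s\nby field:\n%s\n' "$anchored" "$by_field"
rm -f "$sample"
```

The anchored form returns only the first row, while the field match also returns the third.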

Data Statistics

Goal: find the top 100 IPs by access count on "2016-11-11".

awk -F"," '$3 == "2016-11-11" { sum[$2]++ } END { for (i in sum) print i "\t" sum[i] }' IP_file | sort -nrk2,2 | head -n 100
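
To make the counting step concrete, here is a small worked run of the same idea on six hypothetical rows, keeping the top 2 instead of the top 100. The sample data and the array name hits are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sample in the area,IP,date format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
北京,10.0.0.1,2016-11-11
上海,10.0.0.2,2016-11-11
北京,10.0.0.1,2016-11-11
广州,10.0.0.3,2016-11-10
上海,10.0.0.2,2016-11-11
北京,10.0.0.1,2016-11-11
EOF

# Count hits per IP for one day, then keep the two busiest IPs.
top=$(awk -F',' '$3 == "2016-11-11" { hits[$2]++ }
          END { for (ip in hits) print ip "\t" hits[ip] }' "$sample" |
      sort -k2,2nr | head -n 2)
echo "$top"
rm -f "$sample"
```

The order in which for (ip in hits) visits keys is unspecified, which is why the result is passed through sort before head.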

Formatted Output

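Using printf (similar to C's) for aligned results. The command below reads tab-separated lines (IP, then count) from standard input, for example the output of the previous pipeline: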

awk -F"\t" '{printf("The number of IP %-15s occurrences is %d times\n", $1, $2)}'
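
A short sketch of how the width specifiers line columns up, assuming tab-separated IP and count pairs on standard input (the two sample rows and the "|" separator are made up for this example):

```shell
#!/bin/sh
# Hypothetical "IP<TAB>count" lines, as a counting step might emit them.
formatted=$(printf '10.0.0.1\t3\n192.168.100.25\t12\n' |
  awk -F'\t' '{ printf("%-15s|%5d\n", $1, $2) }')
echo "$formatted"
```

Here %-15s left-aligns the IP in a 15-character column and %5d right-aligns the count in 5 characters, so every output line has the same width regardless of the IP's length.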

Finding Duplicate Records

awk 'NR==FNR { a[$0]++; next } a[$0] > 1' IP_file IP_file | sort -u

Alternative solution:

awk '{ a[$0]++ } END { for (i in a) if (a[i] > 1) print i }' IP_file
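
Another common idiom, not taken from the original article, prints each duplicated record once in a single pass and keeps the input order. The sketch below runs it on hypothetical sample rows; the array name seen is arbitrary.

```shell
#!/bin/sh
# Hypothetical sample: one record appears three times, another twice.
sample=$(mktemp)
cat > "$sample" <<'EOF'
北京,10.0.0.1,2016-11-11
上海,10.0.0.2,2016-11-11
北京,10.0.0.1,2016-11-11
上海,10.0.0.2,2016-11-11
广州,10.0.0.3,2016-11-10
北京,10.0.0.1,2016-11-11
EOF

# seen[$0]++ yields the count before the current line, so the test is
# true exactly once per record: on its second occurrence.
dups=$(awk 'seen[$0]++ == 1' "$sample")
echo "$dups"
rm -f "$sample"
```

Because the whole line ($0) is the array key, two records count as duplicates only when every field matches.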

Integration with Shell

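AWK works well alongside the shell. It can read from and write to shell commands through pipes. Shell variables can be passed in by closing the single-quoted program around the expansion ( '"$var"' ), with the -v option, or by exporting them and reading them through ENVIRON . Shell functions exported with bash's export -f can be called from commands that AWK hands to bash, and AWK can run shell commands with system() or print cmd | "/bin/bash" .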
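
The sketch below shows four of these mechanisms side by side. The variable names (day, greeting) and the echoed strings are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical shell variables used only for this illustration.
day="2016-11-11"
greeting="hello world"
export greeting

# 1. -v copies a shell value into an AWK variable.
via_v=$(awk -v d="$day" 'BEGIN { print "day=" d }')

# 2. Exported variables are visible through the ENVIRON array.
via_env=$(awk 'BEGIN { print ENVIRON["greeting"] }')

# 3. system() runs a shell command; its output goes straight to stdout.
via_system=$(awk 'BEGIN { system("echo from-system") }')

# 4. "cmd" | getline reads one line of a command's output into a variable.
via_getline=$(awk 'BEGIN { "echo from-getline" | getline line; print line }')

printf '%s\n' "$via_v" "$via_env" "$via_system" "$via_getline"
```

The -v and ENVIRON routes avoid splicing shell text into the AWK program itself, which tends to be easier to quote correctly than the '"$var"' form.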

Important Tips

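Quote strings correctly: AWK string literals use double quotes, and a single quote would end the shell-quoted program. Write gsub(/A/,"a") , not gsub(/A/,'a') .

print formats non-integer numbers with OFMT (default "%.6g"), so large floating-point values may appear in scientific notation; use printf with an explicit format such as "%.2f" when that matters.

When passing a shell variable whose value contains spaces, keep the expansion double-quoted: '"$var"' .

Make sure both operands of a comparison have the intended type: add 0 to force a numeric comparison, or append "" to force a string comparison.

When using FILENAME and ARGIND (a gawk extension), keep file order consistent between the command line and the script.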
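
Two of these tips, comparison types and number formatting, can be checked with a quick experiment. The sketch below uses made-up values; the variable names are arbitrary.

```shell
#!/bin/sh
# String vs numeric comparison: appending "" forces a string compare,
# adding 0 forces a numeric one.
as_string=$(echo '10 9' | awk '{ r = (($1 "") < ($2 "")) ? "less" : "not less"; print r }')
as_number=$(echo '10 9' | awk '{ r = (($1 + 0) < ($2 + 0)) ? "less" : "not less"; print r }')

# print formats non-integer values with OFMT (default %.6g);
# printf gives explicit control over the output.
default_fmt=$(awk 'BEGIN { x = 1234567.891; print x }')
fixed_fmt=$(awk 'BEGIN { x = 1234567.891; printf "%.2f\n", x }')

printf '%s\n' "$as_string" "$as_number" "$default_fmt" "$fixed_fmt"
```

As strings, "10" sorts before "9" because "1" precedes "9", whereas as numbers 10 is larger; and the same value prints as 1.23457e+06 with plain print but as 1234567.89 with the explicit printf format.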

Conclusion

AWK provides a rich set of built‑in variables, functions, flow control statements, standard output formatting, arrays, arithmetic and date functions, meeting many text‑processing needs. This article covered the basics to help you get started quickly; further exploration will reveal AWK as a powerful, handy assistant.

Written by

360 Quality & Efficiency

360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.
