
AWK programming language



AWK is a general-purpose programming language designed for processing text-based data, either in files or in data streams. The name AWK is derived from the initials of the surnames of its authors: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan.

Awk is an example of a programming language that makes extensive use of the string data type, associative arrays (arrays indexed by key strings), and regular expressions.
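As a rough illustration of these three features working together, the following sketch counts word frequencies. It assumes a POSIX shell and any POSIX-compatible awk; the sample text and the array name count are made up for the example.

```shell
#!/bin/sh
# Hypothetical sample text, used only for illustration.
text='the cat and the hat
the end'

counts=$(printf '%s\n' "$text" | awk '
{
    # Each whitespace-separated field is one word. count[] is an
    # associative array indexed by the word string itself.
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[a-z]+$/)        # regular-expression match on the field
            count[$i]++
}
END {
    # for (w in count) visits the keys in an unspecified order,
    # so the shell sorts the result afterwards.
    for (w in count)
        print w, count[w]
}' | sort)
printf '%s\n' "$counts"
```

No array has to be declared or sized in advance: referring to count[$i] creates the element on first use.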

Awk was one of the early tools to appear in Version 7 Unix, and it gained popularity as a way to add computational features to a Unix pipeline. A version of awk is a standard feature of nearly every modern Unix-like operating system, and implementations are also available for almost all other operating systems.
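A typical pipeline use is totalling a column that another command produced. The sketch below assumes a POSIX shell; the file names and sizes are invented sample data standing in for the output of some earlier command.

```shell
#!/bin/sh
# Hypothetical data: file names and sizes in bytes, one per line.
sizes='a.txt 120
b.txt 300
c.txt 80'

# awk sits in the middle of the pipeline and sums the second column,
# printing the total once all input has been read.
total=$(printf '%s\n' "$sizes" | awk '{ sum += $2 } END { print sum }')
printf 'total: %s\n' "$total"
```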

1 Structure of awk programs

Generally speaking, awk is given two pieces of data: a command file and a primary input file. The command file (which can be an actual file, passed with the -f option, or can be written directly in the command-line invocation of awk) contains a series of commands that tell awk how to process the input. The primary input file is typically text that is formatted in some way; it can be an actual file, or awk can read it from standard input. A typical awk program consists of a series of lines, each of the form

/pattern/ { action }

where pattern is a regular expression and action is one or more commands. Awk reads the input one line at a time; whenever a line matches pattern, it executes the command(s) specified in action. Alternative forms include:

BEGIN { action }
Executes action commands at the beginning of the script execution, i.e., before any of the lines are processed.
END { action }
Similar to the previous form, but executes action after all input lines have been processed.
/pattern/
Prints any lines matching pattern.
{ action }
Executes action for each line in the input.
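The four forms can be seen side by side in a minimal sketch, assuming a POSIX shell and any POSIX awk; the log lines are made-up sample input. Note that an action must begin on the same line as its pattern, so /ERROR/ on a line by itself is a pattern-only rule.

```shell
#!/bin/sh
# Hypothetical input: a tiny log with one entry per line.
log='INFO start
ERROR disk full
INFO running
ERROR timeout'

out=$(printf '%s\n' "$log" | awk '
BEGIN   { print "report:" }     # runs once, before any input is read
/ERROR/                         # pattern only: prints matching lines
{ n++ }                         # action only: runs for every line
END     { print n " lines" }    # runs once, after the last line
')
printf '%s\n' "$out"
```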

Each of these forms can appear multiple times in the command file. Rules are applied in the order in which they appear: if there are two BEGIN rules, the first is executed, then the second, and only then is the input read and the remaining rules applied to each line. Likewise, multiple END rules run in order after the last input line. BEGIN and END rules do not have to be located before and after (respectively) the other lines in the command file.
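The ordering rules can be checked with a small command file. This sketch assumes a POSIX shell with mktemp available; the file name prog.awk is hypothetical. The END rule is deliberately written first, and the file is passed to awk with -f.

```shell
#!/bin/sh
# Write a hypothetical command file, prog.awk, into a temporary directory.
dir=$(mktemp -d)
cat > "$dir/prog.awk" <<'EOF'
END   { print "end" }         # END need not be last in the file
BEGIN { print "begin 1" }
BEGIN { print "begin 2" }     # runs after the first BEGIN
{ print "line " NR }          # NR is the number of the current input line
EOF

# Run the command file against two lines read from standard input.
out=$(printf 'a\nb\n' | awk -f "$dir/prog.awk")
rm -rf "$dir"
printf '%s\n' "$out"
```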

2 Awk commands

Awk commands are the statements that take the place of action in the examples above. They can include function calls, variable assignments, calculations, or any combination of these. Awk has many built-in functions, and the various flavors of awk provide many more. Some flavors also support loading dynamically linked libraries, which can provide further functions.

For brevity, the enclosing curly braces ({ }) are omitted from the following examples.

2.1 The print command

The print command is used to display text on the output device (usually a monitor, though often a file or output stream as well). The simplest form of this command is

print

This displays the contents of the current line. In awk, each line is split into fields (by default at runs of spaces and tabs), and these can be displayed separately:

print $1
Displays the first field of the current line
print $1, $3
Displays the first and third fields of the current line, separated by the output field separator OFS (a single space by default)

Although field references such as $1 may resemble variables (the $ symbol marks variables in Perl), they refer to the fields of the current line. A special case, $0, refers to the entire line; in fact, the commands "print" and "print $0" are identical in functionality.
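The field references above can be tried directly. This sketch assumes a POSIX shell and any POSIX awk; the input line is invented. It also uses the built-in variables OFS (output field separator) and NF (number of fields on the current line).

```shell
#!/bin/sh
line='alpha beta gamma'

# $1 and $3 are fields; the comma in print inserts OFS (a space by default).
fields=$(printf '%s\n' "$line" | awk '{ print $1, $3 }')

# Setting OFS changes the separator that the comma produces.
csv=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "," } { print $1, $3 }')

# $0 is the whole line, and NF holds the number of fields.
whole=$(printf '%s\n' "$line" | awk '{ print NF ": " $0 }')

printf '%s\n%s\n%s\n' "$fields" "$csv" "$whole"
```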

The print command can also display the results of calculations and/or function calls:

print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)
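Since foobar above stands for an arbitrary function, the sketch below sticks to POSIX built-ins (sin, length, toupper, substr). It assumes a POSIX shell and any POSIX awk; a program made only of a BEGIN rule reads no input, which makes it handy for quick calculations.

```shell
#!/bin/sh
# A program with only a BEGIN rule reads no input.
out=$(awk 'BEGIN {
    print 3+2                         # arithmetic
    print sin(3-2)                    # sine of 1 radian, printed via OFMT
    print length("awk")               # string length
    print toupper("awk")              # upper-case copy
    print substr("Kernighan", 1, 4)   # first four characters
}')
printf '%s\n' "$out"
```

Non-integer results such as the sine are printed using the OFMT format, which is "%.6g" by default.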

2.2 Variables, et cetera

Variable names can use any of the characters [A-Za-z0-9_], but cannot begin with a digit or be a language keyword. The operators + - * / perform addition, subtraction, multiplication, and division, respectively. For string concatenation, simply place two variables (or string constants) next to each other, optionally with a space in between. String constants are delimited by double quotes. Statements need not end with semicolons, although statements written on the same line must be separated by them. Finally, # starts a comment that continues to the end of the line.
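These rules fit in a few lines. The sketch assumes a POSIX shell and any POSIX awk; the variable names are arbitrary choices for the example.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    first = "Brian"             # string constants use double quotes
    last = "Kernighan"
    full = first " " last       # concatenation by juxtaposition
    total = 7 * 6 - 2 / 1       # 42 - 2
    x = 1; y = 2                # same line, so a semicolon is needed
    print full
    print total
    print x + y
}')
printf '%s\n' "$out"
```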

2.3 User-defined functions

In a format similar to C, function declarations consist of the keyword function, the function name, and the arguments to the function. Here is an example function:

function add_three(number) { temp = number + 3; return temp }

This function can be called as follows:

print add_three(36) # prints 39
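Put together as a complete program, the definition and call might look like the following sketch, assuming a POSIX shell and any POSIX awk. When calling a user-defined function, the opening parenthesis must immediately follow the name.

```shell
#!/bin/sh
result=$(awk '
function add_three(number) {
    temp = number + 3       # temp is a global variable in this version
    return temp
}
BEGIN { print add_three(36) }
')
printf '%s\n' "$result"
```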

Functions can have variables that are in the local scope. The names of these are added to the end of the argument list, though values for them should be omitted when calling the function. By convention, some extra whitespace is added in the argument list before the local variables, to indicate where the parameters end and the local variables begin.
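A sketch of the convention, assuming a POSIX shell and any POSIX awk: temp is listed after extra spaces as a local variable, so assigning to it inside the function leaves the global temp untouched.

```shell
#!/bin/sh
out=$(awk '
# "number" is the only real parameter; the extra spaces before "temp"
# mark it as a local variable, which callers never pass.
function add_three(number,    temp) {
    temp = number + 3
    return temp
}
BEGIN {
    temp = "global"
    print add_three(36)
    print temp              # still "global": the function used its own temp
}
')
printf '%s\n' "$out"
```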

A free GNU version of awk is named gawk. Documentation and downloads are available at [1].

comp.lang.awk is a Usenet newsgroup dedicated to awk.

The power, terseness, and limitations of awk programs and sed scripts inspired Larry Wall to write Perl.




