Using Lex to Build a Simple cat‑Like Utility and an SQL Syntax Highlighter
This article demonstrates how to use the Lex lexical analyzer generator to build a minimal cat-like utility from a rule file that contains no rules. It then extends the technique to a simple SQL syntax highlighter with colorized output, explaining the role of the %% separators, the default rule, and the compilation steps along the way.
The tutorial starts with the smallest possible Lex specification, which yields a tiny program that mimics the behavior of the Unix cat command.
First, create a Lex rule file that contains only the two %% separator lines (four percent signs in total) and no definitions, rules, or user code. Save it under any name (e.g., %%%%), then run the following commands:
$ lex %%%%
$ gcc lex.yy.c -ll
The resulting executable (a.out) reads its standard input and writes it unchanged, effectively acting like cat. For example:
$ ./a.out < /etc/hosts
Lex works by converting a rule file into C source code (lex.yy.c). The rule file is divided into three sections separated by %% lines: a definitions section, a rules section, and a user-code section. A file with no rules therefore produces a lexer whose only behavior is the default rule, which echoes every unmatched character to standard output.
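To make the default rule concrete, here is a minimal hand-written C sketch of the effect it has when no other rule matches: each input character is copied to the output. This is not the code Lex emits (the real lex.yy.c is table-driven and buffered), and the function name copy_stream is hypothetical, chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: mimics the effect of Lex's default rule by
 * echoing every character from `in` to `out` unchanged.
 * Returns the number of characters copied, or -1 on a write error. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF)
            return -1;
        copied++;
    }
    return copied;
}
```

Calling copy_stream(stdin, stdout) from a main() function should give the same observable behavior as the a.out built from the rule-less file.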
The default rule can be written explicitly as:
%%
.|\n { printf("%s", yytext); }
%%
Building on this, the article presents a more practical example: a Lex program that highlights SQL statements. The Lex file begins with a definitions block (%{ … %}) that includes #include <stdio.h> and defines ANSI colour macros (BOLD, GREEN, BLUE), a reset code (FORMAT_RESET), and a helper macro _format to wrap strings with colour codes.
%{
#include <stdio.h>
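/* ANSI escape sequences; \e is a GCC/Clang extension for ESC (\033). */
#define BOLD "\e[1;30m"
#define GREEN "\e[32m"
#define BLUE "\e[34m"
#define FORMAT_RESET "\e[0m"
/* Expands to a printf argument list: colour code, text, reset code. */
#define _format(format, str) "%s%s%s", format, str, FORMAT_RESET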
%}
%%
SELECT|FROM|WHERE { printf(_format(BOLD, yytext)); }
[0-9]+ { printf(_format(GREEN, yytext)); }
\"[^\"]*\" { printf(_format(BLUE, yytext)); }
%%
int main(void) {
    printf("%s\n", "Input SQL:");
    yylex();
    return 0;
}
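After saving this file (e.g., sql.l), compile it with:
$ lex sql.l
$ gcc lex.yy.c -ll
The resulting program reads an SQL query from standard input and prints the keywords SELECT, FROM, and WHERE in bold, numbers in green, and double-quoted string literals in blue. All other input falls through to the default rule and is echoed unchanged. Note that the keyword patterns are case-sensitive, so a lowercase select is not highlighted.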
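The _format macro can look odd at first because it expands to a comma-separated argument list rather than a single expression. The sketch below reuses the same macro with snprintf instead of printf so that the result can be inspected in a buffer. The helper name colorize is hypothetical, and \033 is used in place of \e because \e is a compiler extension rather than standard C.

```c
#include <stdio.h>

#define GREEN        "\033[32m"
#define FORMAT_RESET "\033[0m"
/* Same shape as the macro in the Lex file: it expands to a format
 * string followed by three arguments. */
#define _format(format, str) "%s%s%s", format, str, FORMAT_RESET

/* Hypothetical helper: writes `token` wrapped in green colour codes
 * into `dst` and returns the length that snprintf reports. */
int colorize(char *dst, size_t cap, const char *token)
{
    /* Expands to:
     * snprintf(dst, cap, "%s%s%s", GREEN, token, FORMAT_RESET) */
    return snprintf(dst, cap, _format(GREEN, token));
}
```

Because the macro supplies both the format string and its arguments, it only makes sense as the trailing arguments of a printf-family call, which is how the rules in the Lex file use it.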
Lex provides a default main() function in its static library (libl.a), which is linked when the -ll option is used. The same library also supplies a default yywrap(). The symbol table can be inspected with:
$ nm /usr/lib/x86_64-linux-gnu/libl.a
libmain.o:
U _GLOBAL_OFFSET_TABLE_
U exit
0000000000000000 T main
U yylex
libyywrap.o:
0000000000000000 T yywrap
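If the user supplies their own main(), the library's default is not used: the linker pulls libmain.o out of the archive only when main is still undefined. The SQL example defines main() itself, so it needs -ll only for yywrap(). This flexibility, together with Lex's ability to generate scanners from concise specifications, makes it a handy tool for building command-line utilities, compilers, configuration parsers, and syntax-highlighting programs.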
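As a rough sketch of what the two library objects listed by nm (libmain.o and libyywrap.o) provide, assuming flex-style behavior: the default main() keeps calling yylex() until it returns 0, and the default yywrap() returns 1 to signal that there is no further input. The code below uses a stand-in scanner instead of a real yylex so that it compiles on its own. The names run_scanner, default_yywrap, and fake_yylex are hypothetical, not library symbols.

```c
/* Sketch of the library's default yywrap(): returning 1 tells the
 * scanner to stop at end of input. (Assumed flex-style behavior.) */
int default_yywrap(void)
{
    return 1;
}

/* Stand-in for yylex, for demonstration only: returns three
 * non-zero "token codes" and then 0 to signal end of input. */
static int remaining = 3;

int fake_yylex(void)
{
    return remaining > 0 ? remaining-- : 0;
}

/* Sketch of the loop in the library's default main(): call the
 * scanner until it returns 0. `scan` stands in for yylex; the
 * return value counts the non-zero results seen. */
int run_scanner(int (*scan)(void))
{
    int calls = 0;

    while (scan() != 0)
        calls++;
    return calls;
}
```

In a real Lex file, the equivalent would be to define yywrap() and main() in the user-code section, after which the -ll option should no longer be required.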
Overall, the article illustrates how a few percent signs and a simple rule file can be leveraged to produce functional tools, and encourages readers to explore more sophisticated Lex specifications for real‑world applications.
IT Services Circle
