Design and Implementation of TinyLanguage Using Flex and Bison
The article walks readers through designing and implementing TinyLanguage—a minimal interpreted language with 4‑byte integers, constants, if/else, for loops, and print—by using Flex for lexical analysis, Bison for parsing into an AST, and an execution engine, even showing a Fibonacci example and future LLVM compilation possibilities.
This article introduces the concept of computer programming languages and focuses on designing and implementing a minimal language called TinyLanguage . It discusses why one might create a new language, the required features (4‑byte integer variables, constants, if/else, for loops, and a built‑in print function), and provides a high‑level overview of the language’s syntax.
1. Example C program
#include
#include
int main(int argc, char *argv[])
{
int i, n;
int t1 = 0, t2 = 1;
int nextTerm = t1 + t2;
if (argc != 2) {
fprintf(stderr, "please input one number\n");
return -1;
}
n = atoi(argv[1]);
for (i = 3; i <= n; ++i) {
t1 = t2;
t2 = nextTerm;
nextTerm = t1 + t2;
}
printf("%dth value is %d\n", n, nextTerm);
return 0;
}The program computes the N‑th Fibonacci number and prints the result.
2. Lexical analysis with Flex
Flex is used to generate a C lexer that recognizes digits, identifiers, keywords, and operators. The lexer definition is split into three sections (definitions, rules, and user code). Example Flex rules:
/* definitions */
DIGIT [0-9]
ID [a-zA-Z][a-zA-Z0-9]*
%%
{DIGIT}+ { printf("An integer %s(%d)\n", yytext, atoi(yytext)); yylval.integer = atoi(yytext); return INTEGER; }
{ID} { printf("An identifier(%s)\n", yytext); strcpy(yylval.identifier, yytext); return VARIABLE; }
[\t\n]+ { /* eat up whitespace */ }
. { printf("Unrecognized character: %s\n", yytext); }
%%
int yywrap() { return 1; }
int main(int argc, char *argv[]) {
yyin = stdin;
yylex();
return 0;
}Running the lexer on the input 123 abc abc123 produces:
An integer 123(123)
An identifier(abc)
An identifier(abc123)3. Grammar with Bison
Bison generates a LR parser from a context‑free grammar. The grammar defines non‑terminals such as program , function , stmt , and expr . Tokens and their precedence are declared, and each rule can contain C actions that build an abstract syntax tree (AST). Example Bison fragment:
%union {
int integer;
char identifier[256];
struct listnode *node_ptr;
}
%token
INTEGER
%token
VARIABLE
%token IF ELSE FOR
%nonassoc IFX FOR
%nonassoc ELSE
%nonassoc '='
%left AND OR
%left GE LE EQ NE '>' '<'
%left '+' '-'
%left '*' '/'
%precedence NEG
%%
program: function { DEBUG("program done!\n"); } ;
function: function stmt { list_add_tail(&g_stmt_list, $2); } | /* empty */ ;
stmt: ';' { $$ = parse_operator(';', 2, NULL, NULL); }
| expr ';' { $$ = $1; }
| '{' stmt_list '}' { $$ = $2; }
| IF '(' expr ')' stmt %prec IFX { $$ = parse_operator(IF, 2, $3, $5); }
| IF '(' expr ')' stmt ELSE stmt { $$ = parse_operator(IF, 3, $3, $5, $7); }
| FOR '(' expr ';' expr ';' expr ')' stmt { $$ = parse_operator(FOR, 4, $3, $5, $7, $9); }
;
expr: INTEGER { $$ = parse_constant($1); }
| VARIABLE { $$ = parse_variable($1); }
| VARIABLE '=' expr { $$ = parse_operator('=', 2, parse_variable($1), $3); }
| expr '+' expr { $$ = parse_operator('+', 2, $1, $3); }
| expr '-' expr { $$ = parse_operator('-', 2, $1, $3); }
| expr '*' expr { $$ = parse_operator('*', 2, $1, $3); }
| expr '/' expr { $$ = parse_operator('/', 2, $1, $3); }
| '(' expr ')' { $$ = $2; }
| '-' expr %prec NEG { $$ = parse_operator(NEG, 1, $2); }
;
%%The actions use helper functions such as parse_constant , parse_variable , and parse_operator to construct nodes of types nbr_node , var_node , and opr_node . The opr_node stores the operator code, the number of operands, and a linked list of operand nodes.
4. AST node definitions
enum NodeType { NT_OPR = 0, NT_NBR, NT_VAR, NT_NON };
struct listnode { struct listnode *next; struct listnode *prev; };
struct ast_node { enum NodeType type; struct listnode node; };
struct nbr_node { struct ast_node ast_node; int value; };
struct var_node { struct ast_node ast_node; char name[256]; int value; };
struct opr_node { struct ast_node ast_node; int opr; int nops; struct listnode expr_list; };Helper functions allocate these structures and link them into a tree. For example, parse_operator uses va_list to accept a variable number of operand nodes and stores them in expr_list .
5. Execution engine
The interpreter walks the AST. run_stmt dispatches on node type and calls run_operator for operators. Sample execution of the + operator:
case '>': {
struct listnode *left_node = list_head(&opr_node->expr_list);
struct listnode *right_node = left_node->next;
return run_stmt(left_node) > run_stmt(right_node);
}Variable handling uses a global symbol list ( g_sym_list ) to store name/value pairs. If a variable is not found, it is created with an initial value of 0.
6. Full program flow
The driver reads a source file into memory, calls yyparse() (generated by Bison) to build the AST, then iterates over the global statement list ( g_stmt_list ) and executes each statement with run_stmt . The result of the program is the result of the last executed statement.
7. Fibonacci example
The article shows a TinyLanguage program that computes Fibonacci numbers using the language features described above. When compiled and run as ./tiny_lang fibonacci.t 10 , it prints the sequence up to the 10th term:
=====PRINT:1=====
=====PRINT:1=====
=====PRINT:2=====
=====PRINT:3=====
=====PRINT:5=====
=====PRINT:8=====
=====PRINT:13=====
=====PRINT:21=====
=====PRINT:34=====
=====PRINT:55=====
program result:08. Extensions
The author notes that TinyLanguage is currently an interpreted language but could be transformed into a compiled language using LLVM, referencing the LLVM tutorial for building a compiler front‑end.
Overall, the article provides a step‑by‑step guide to building a tiny interpreted language, covering lexical analysis with Flex, grammar definition with Bison, AST construction, and execution, making it a valuable resource for developers interested in compiler construction and language design.
OPPO Kernel Craftsman
Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.