Features

The UniCC LALR(1) Parser Generator offers the following distinctive tools and capabilities.

Target language independence

UniCC is not bound to a specific programming language. It features two kinds of code generators: a program-module generator, which constructs program code for a particular target programming language from a target language template, and a parser description generator. The latter emits an XML-based output file that can be processed by any subsequent task or module, such as specialized code generators, analyzers, or direct interpreters.

A chart explaining the code generation possibilities of UniCC.

Parser templates for new code-generation targets can be implemented easily via a tagged file that UniCC reads. No rebuild of UniCC itself is required to support new or modified targets.

Parser construction modes

UniCC is a flexible parser generator that supports two different methods for constructing its parsers and their lexical analyzers.

The first and default method is called the sensitive parser construction mode. This mode is a specialty of UniCC and gives maximum flexibility for implementing parsers for nearly any type of context-free language. UniCC analyzes and rewrites the grammar according to several rules that influence whitespace and lexeme detection and separation. In this way, the lexical analysis, including whitespace handling, can be broken down to single input characters, which enables full context-free grammars at the lexeme level. Lexical analysis still happens behind the scenes, but no strict separation between lexer and parser is required.
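To illustrate what "broken down to single input characters" can mean, here is a minimal Python sketch of a hand-written recursive-descent reader in which whitespace and integers are ordinary rules over raw characters, with no separate tokenizer. It is not how UniCC builds its LALR(1) parsers; the function names and the reduced `integer ( '*' integer )*` rule are assumptions made only for this illustration.

```python
WHITESPACE = set(" \t\n")    # plays the role of: whitespace -> ' \t\n'
DIGITS = set("0123456789")   # plays the role of the character class '0-9'


def skip_whitespace(text, pos):
    """Consume whitespace one character at a time and return the new offset."""
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos


def parse_integer(text, pos):
    """integer -> '0-9'+ ; built directly from single input characters."""
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    if end == pos:
        raise SyntaxError(f"digit expected at offset {pos}")
    # Trailing whitespace is absorbed right after the lexeme.
    return ("integer", text[pos:end]), skip_whitespace(text, end)


def parse_product(text, pos=0):
    """Hypothetical mini-rule: integer ( '*' integer )*, read from raw characters."""
    pos = skip_whitespace(text, pos)
    node, pos = parse_integer(text, pos)
    while pos < len(text) and text[pos] == "*":
        right, pos = parse_integer(text, skip_whitespace(text, pos + 1))
        node = ("expr", node, "*", right)
    return node, pos


tree, end = parse_product("12 * 3")
```

Because every rule works on characters, there is no fixed boundary where "lexing" stops and "parsing" begins, which is the property the sensitive mode is described as offering.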

As an example, this four-function calculator, expressed for sensitive mode,

#!mode sensitive;

#left '+' '-';
#left '*' '/';

#whitespaces whitespace;
whitespace -> ' \t\n';

#lexeme integer;
integer -> '0-9'+ ;

start$ -> expr* ;

expr -> expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '(' expr ')'
     | integer
     ;

produces the following syntax tree for the input 12 * 3

Visualized image of a parse tree example in sensitive mode.

The second method, called the insensitive parser construction mode, always uses a single lexical analyzer that identifies terminal symbols. In contrast to the sensitive mode, fewer states are produced, because the grammar is not rewritten and whitespace is absorbed directly during lexical analysis. Overlapping character classes cannot be used in this mode. This construction mode is comparable to most other parser generators, such as the combination of lex and yacc.
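The idea of one lexical analyzer that identifies terminals and absorbs whitespace before the parser sees anything can be sketched in Python with the standard `re` module. This is only an illustration of the lex-style approach, not UniCC's generated code; the token names `integer` and `operator` and the function `tokenize` are assumptions chosen for this example.

```python
import re

# One table of non-overlapping terminal definitions, tried in a single pass.
TOKEN_SPEC = [
    ("whitespace", r"[ \t\n]+"),
    ("integer", r"[0-9]+"),
    ("operator", r"[-+*/()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (terminal, lexeme) pairs; whitespace is dropped inside the lexer."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("12 * 3")
```

The parser then only ever sees the terminal stream, here an integer, an operator, and another integer, which is why whitespace never appears in the grammar rules themselves.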

The same input as above, parsed with the insensitively configured grammar

#!mode insensitive;

#left '+' '-';
#left '*' '/';

#whitespaces ' \t\n';

@integer '0-9'+ ;

start$ -> expr* ;

expr -> expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '(' expr ')'
     | @integer
     ;

yields the following syntax tree

A visualized parse tree example using insensitive mode.
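Both grammars rely on the two `#left` lines to resolve the ambiguity of `expr -> expr op expr`. The Python sketch below shows the intended effect using precedence climbing, a different technique from the LALR(1) tables UniCC generates. It assumes, as in yacc, that the later `#left` line binds tighter; the names `PRECEDENCE`, `parse_expr`, and `parse_atom` are hypothetical.

```python
import re

# Mirrors "#left '+' '-';" followed by "#left '*' '/';" (assumed: later binds tighter).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(text):
    """Split into integers and operator characters; anything else is skipped."""
    return re.findall(r"[0-9]+|[-+*/()]", text)


def parse_atom(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse_expr(tokens, pos=0, min_prec=1):
    left, pos = parse_atom(tokens, pos)
    while (pos < len(tokens) and tokens[pos] in PRECEDENCE
           and PRECEDENCE[tokens[pos]] >= min_prec):
        op = tokens[pos]
        # Left associativity: the right operand may only contain tighter operators.
        right, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos


tree, _ = parse_expr(tokenize("12 * 3"))
```

With these assumptions, `8 - 2 - 1` groups as `(8 - 2) - 1` and `1 + 2 * 3` groups as `1 + (2 * 3)`, matching the usual reading of the two precedence lines.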