General

Introduction

phorward is a free toolkit for parser development, lexical analysis, regular expressions and more.

The toolkit is primarily a library, written in C, that provides a consistent and easy-to-use interface for defining, running and processing parsers, lexical analyzers and regular expressions. The following example program defines a simple expression language, runs a parser on it and prints the generated abstract syntax tree.

#include <phorward.h>

int main()
{
    pparse* parser;
    ppast*  ast;
    char*   input = "1+2*(3+4)+5";
    char*   end;

    parser = pp_create( 0,  "@int /[0-9]+/ ;"
                            "fact : int | '(' expr ')' ;"
                            "term : @mul( term '*' fact ) | fact ;"
                            "expr : @add( expr '+' term ) | term ;" );

    if( !pp_parse_to_ast( &ast, parser, input, &end ) )
        return 1; /* parse error */

    pp_ast_dump_short( stdout, ast );
    return 0;
}

It can easily be compiled with: cc -o example example.c -lphorward

Furthermore, the toolkit comes with a command-line tool for testing and prototyping. The following command yields an equivalent parser and its abstract syntax tree, although some symbol names are shortened.

$ pparse -e "1+2*(3+4)+5" -g "[int #fn_int] /[0-9]+/; f: int | '(' e ')'; t: [mul #fn_mul]( t '*' f ) | f; e: [add #fn_add]( e '+' t ) | t;"

phorward also provides useful general-purpose extensions for C programming. This includes dynamic data structures (e.g. linked lists, hash-tables, stacks and arrays), extended string management functions and platform-independent, system-specific helper functions.

Features

phorward provides the following features:

Please check http://phorward.phorward-software.com/ regularly for the latest news, documentation, updates and support on the Phorward Toolkit.

Building from sources

Building the Phorward Toolkit is as simple as building any GNU-style open source program. Extract the release tarball or clone the Mercurial repository into a directory of your choice.

Then, run

$ ./configure

to configure the build-system and generate the Makefiles for your current platform. After successful configuration, run

$ make

and

$ make install

(probably as root) to install the toolkit on your system.

On Windows systems, the use of http://cygwin.org/ or another Unix shell environment is required. The Phorward Toolkit also cross-compiles on Linux using the MinGW and MinGW_x86-64 compilers.

To compile into 32-bit Windows executables, configure with

$ ./configure --host=i486-mingw32 --prefix=/usr/i486-mingw32

To compile into 64-bit Windows executables, configure with

$ ./configure --host=x86_64-w64-mingw32 --prefix=/usr/x86_64-w64-mingw32

Alternative local development system

As an alternative to the autotools build system used for installation, there is also a simpler way to set up a local build system for development and testing purposes.

First, run the following command once:

$ make -f Makefile.gnu make_install

After that, a simple run of

$ make

builds the entire library or parts of it.

Note that changes to the build system must then be made in the local Makefile, the local Makefile.gnu and the Makefile.am for the autotools-based build system.

Who develops libphorward?

The Phorward Toolkit is developed and maintained by Jan Max Meyer, Phorward Software Technologies.

This work is the result of several years of experience with parser development systems, and has been preceded by the open source parser generators UniCC and JS/CC. It is intended as the final step towards a powerful compiler toolchain, mainly focusing on compiler frontends. A sister project is the pynetree parsing library, which is written in and for the Python programming language. It shares the same BNF syntax for expressing grammars.

Help of any kind to extend and improve this software is always appreciated.

Copyright

Copyright © 2006-2017 by Phorward Software Technologies, Jan Max Meyer.

You may use, modify and distribute this software under the terms and conditions of the 3-clause BSD license. The full license terms can be obtained from the file LICENSE.

THIS SOFTWARE IS PROVIDED BY JAN MAX MEYER (PHORWARD SOFTWARE TECHNOLOGIES) AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL JAN MAX MEYER (PHORWARD SOFTWARE TECHNOLOGIES) BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Guidelines

The Phorward Toolkit, libphorward for short, is a well-established C programming library which has its origins in several existing projects, whose shared features were modularized and unified for general-purpose use.

The roots of this library go back to 2006, when Phorward Software began developing a universal programming library shared by several C projects that existed at the time. Since then, the library has grown considerably and has gone through many redesign stages.

Although time has moved on, libphorward is still a state-of-the-art software toolchain for many everyday tasks of a C programmer.

libphorward has its own function style paradigm and coding standard. Therefore, the following rules and conditions are characteristic when working with libphorward's tools.

As a general C coding style, we use neither the K&R style nor any other common coding standard. The libphorward style (which only applies within the library itself) is:

/* Clear use of spaces behind and before brackets;
	curly brackets are _ALWAYS_ written on lines of their own: */
if( a && !b( c ) )
{
	/* newlines on every block */
	c = 0;
	we_like_this_style();
}

/* Block only where more than one statement is executed */
if( !c )
	fuck_yeah();
else
{
	c = 0;
	this_is_better();
}

/* Cascading view on multiple lines, never go over 80 chars
	per line */
if( this_would_be_a_long_condition()
		&& i_would_prefer_a_cascading_code_style()
			&& i < 300
				&& that_looks_cooler_eh() ? FALSE : TRUE )
	that_s_libphorward_style( TRUE );

If you don't like our coding style, you can still code in your own style, but please not within libphorward's own sources, which are for the general public and shall be kept clean. Thank you! :)

General tools

The Phorward Toolkit provides some additional general-purpose functions that make C programming a little easier. It is not required to use these functions when creating software that uses features of the Phorward Toolkit, but they may fit some developers' needs.

Booleans

libphorward makes heavy use of its own data type pboolean, which is also available through the define BOOLEAN, and of the constants TRUE and FALSE.

Several header files and projects are known to define these constants on their own. Therefore the data type itself carries the p-prefix (pboolean), while each of the defines is conditionally guarded with

#ifndef TRUE
#define TRUE					1
#endif

within phorward.h.

Replacement for memory functions

The standard memory allocation functions can be replaced by these counterparts:

These functions are used throughout libphorward's internal object mapping functions.

Debug and trace facilities

Although this approach is not widely used in modern C/C++ projects, libphorward offers its own debug and trace facility, which can be turned on per module to detect bugs or to view the program trace.

For this, the library provides the following macros:

A function written in libphorward's way looks like this:

int faculty( int x )
{
	int ret;

	PROC( "faculty" );
	PARMS( "x", "%d", x );

	if( x < 0 )
	{
		WRONGPARAM;
		RETURN( -1 );
	}
	else if( x == 0 )
	{
		MSG( "x is 0, so faculty is 1" );
		RETURN( 1 );
	}
	else
	{
		MSG( "Calling faculty recursively with:" );
		VARS( "x - 1", "%d", x - 1 );
		ret = x * faculty( x - 1 );
	}

	VARS( "ret", "%d", ret );
	RETURN( ret );
}

The trace is written to stderr, and is only compiled into the executable code if the DEBUG preprocessor flag is defined with a value > 0.

Calling this function with

faculty( 3 );

yields the debug log

20772 (demo.c:  380) {
20772 (demo.c:  380) .ENTRY : faculty
20772 (demo.c:  381) .PARMS : x = >3<
20772 (demo.c:  395) .MSG   : Calling faculty recursively with:
20772 (demo.c:  396) .VARS  : x - 1 = >2<
20772 (demo.c:  380) .{
20772 (demo.c:  380) ..ENTRY : faculty
20772 (demo.c:  381) ..PARMS : x = >2<
20772 (demo.c:  395) ..MSG   : Calling faculty recursively with:
20772 (demo.c:  396) ..VARS  : x - 1 = >1<
20772 (demo.c:  380) ..{
20772 (demo.c:  380) ...ENTRY : faculty
20772 (demo.c:  381) ...PARMS : x = >1<
20772 (demo.c:  395) ...MSG   : Calling faculty recursively with:
20772 (demo.c:  396) ...VARS  : x - 1 = >0<
20772 (demo.c:  380) ...{
20772 (demo.c:  380) ....ENTRY : faculty
20772 (demo.c:  381) ....PARMS : x = >0<
20772 (demo.c:  390) ....MSG   : x is 0, so faculty is 1
20772 (demo.c:  391) ....RETURN: faculty
20772 (demo.c:  391) ...}
20772 (demo.c:  400) ...VARS  : ret = >1<
20772 (demo.c:  401) ...RETURN: faculty
20772 (demo.c:  401) ..}
20772 (demo.c:  400) ..VARS  : ret = >2<
20772 (demo.c:  401) ..RETURN: faculty
20772 (demo.c:  401) .}
20772 (demo.c:  400) .VARS  : ret = >6<
20772 (demo.c:  401) .RETURN: faculty
20772 (demo.c:  401) }

Command-line tools

libphorward also provides some general-purpose command-line tools which are installed and made available. These tools are heavily used by libphorward's own build process.

The general command-line tools are written as shell scripts in combination with standard Unix command-line utilities like awk, grep and sed.

String tools

libphorward provides a set of functions for extended, dynamic string memory handling. These functions are named after their standard C library counterparts, with a preceding "p".

These functions are

Additionally, the following counterparts for wide-character strings (wchar_t) are available when libphorward is compiled with the UNICODE flag enabled.

Although the wide-character counterparts of the extended string functions are not complete right now, they may be extended in the future when such functions are needed, or when another brave programmer has the fun of implementing them.

Dynamic data management tools

parray: Arrays and stacks

Overview

The parray object is a general-purpose data structure which can be used for several operations.

parray forms a data management container for handling homogeneous elements of the same size in a dynamic way. These elements can be atomic data types, pointers or structures. Elements are automatically allocated with a specified chunk size, and can be appended or prepended to the given parray object.

The parray object brings the following advantages and disadvantages:

Construction and destruction

parray objects are created using parray_create() or initialized with parray_init(). Both functions require the byte size of a single element and a chunk size. The latter can be omitted by specifying a chunk size of 0, so that the default of 128 elements per chunk is used. Objects can be cleared with parray_erase(), or cleared and freed with parray_free().

parray* a;

a = parray_create( sizeof( usertype ), 0 );

/* Do something... */

parray_free( a );

Inserting elements

Elements can be inserted with

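parray* a;		/* assumed to be created as shown above */
usertype t;
usertype* tp;

/* Insert at offset 2 */
fill_usertype( &t );
parray_insert( a, 2, &t );

/* Insert at end and at beginning */
fill_usertype( &t );
parray_push( a, &t );
parray_shift( a, &t );

/* Retrieve fresh element memory at end and at beginning */
tp = (usertype*)parray_malloc( a );
fill_usertype( tp );
tp = (usertype*)parray_rmalloc( a );
fill_usertype( tp );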

Accessing and iterating elements

Direct access to specific elements is done using

Iterating over the elements:

parray* a;
usertype* tp;
int i;

/* Iterate from first to last. */
for( i = 0; ( tp = (usertype*)parray_get( a, i ) ); i++ )
	;

/* Iterate from last to first. */
for( i = 0; ( tp = (usertype*)parray_rget( a, i ) ); i++ )
	;

Elements can be replaced with

Quick access to the first or last elements is gained by parray_last() and parray_first().

Removing elements

Elements can be removed with

parray* a;
usertype t;
usertype* tp;

parray_remove( a, 2, &t );
tp = (usertype*)parray_pop( a );
tp = (usertype*)parray_unshift( a );

Additional functions

Other, useful functions are

plist: Linked lists, hash-tables and queues

Overview

Next to the parray object, the plist object is a powerful C implementation of a doubly linked list with some extra features. It is also used for handling homogeneous elements of the same size in a dynamic way, and can be used for many tasks.

plist can be seen as a superset of the parray object, because it features nearly the same operations but with other underlying data management methods.

The plist object implements:

plist can be used as a generic data structure for

The plist object brings the following advantages and disadvantages:

Construction and destruction

plist objects are created using plist_create() or initialized with plist_init(). Both functions require the byte size of a single element and a flag configuration that sets the plist object to a specific behavior.

Possible flags are listed below, and can be combined using bitwise or (|).

The mode PLIST_MOD_PTR is automatically set if the element size is specified as 0.

A plist object must be freed using plist_free() or cleared with plist_clear().

plist* l;

l = plist_create( sizeof( usertype ), PLIST_MOD_RECYCLE );

/* Do something... */

plist_free( l );

Inserting elements

Elements can be inserted with

plist* l;
usertype t;
usertype* tp;

/* Insert to position */
fill_usertype( &t );
plist_insert( l, plist_get( l, 2 ), (char*)NULL, &t );

/* Insert to end, with key value */
plist_insert( l, (plistel*)NULL, "hello", &t );

/* Insert at end and at beginning */
fill_usertype( &t );
plist_push( l, &t );
plist_shift( l, &t );

/* Retrieve fresh element memory at end and at beginning */
tp = (usertype*)plist_malloc( l );
fill_usertype( tp );
tp = (usertype*)plist_rmalloc( l );
fill_usertype( tp );

Accessing and iterating elements

Elements within a plist object are referenced by plistel items. To access the data element behind a plistel item, the function plist_access() is used in combination with element retrieval functions, like

plist* l;
plistel* e;
usertype* tp;
int i;

/* Get 6th data element */
tp = (usertype*)plist_access( plist_get( l, 5 ) );

/* Get data element with key "hello" */
tp = (usertype*)plist_access( plist_get_by_key( l, "hello" ) );

/* Iterate from begin to end */
for( e = plist_first( l ); e; e = plist_next( e ) )
	tp = (usertype*)plist_access( e );

/* Alternative: Using the plist_for()-macro */
plist_for( l, e )
	tp = (usertype*)plist_access( e );

/* Alternative: Using offset */
for( i = 0; ( tp = (usertype*)plist_access( plist_get( l, i ) ) ); i++ )
	;

/* Reversely iterate from end to begin */
for( e = plist_last( l ); e; e = plist_prev( e ) )
	tp = (usertype*)plist_access( e );

/* Reverse alternative: Using offset */
for( i = 0; ( tp = (usertype*)plist_access( plist_rget( l, i ) ) ); i++ )
	;

Removing elements

To remove elements from a plist object, the following functions can be used.

plist* l;
usertype t;

plist_remove( l, plist_get( l, 7 ) );
plist_pop( l, &t );
plist_unshift( l, &t );

Sorting elements

plist objects provide automatic sorting, so that elements can be sorted on demand or on the fly at each element insertion.

The sorting order is defined by an element comparison callback function, which reports lower, greater or equal like strcmp() does. This function can be set individually using plist_set_sortfn(), and by default points to a callback function that uses memcmp() as the element comparison function.

As a prototype and example, consider the following comparison function:

int my_compare( plist* list, plistel* l, plistel* r )
{
	usertype*	tr;
	usertype*	tl;

	tl = (usertype*)plist_access( l );
	tr = (usertype*)plist_access( r );

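	/* usertype is only a placeholder here, so the element addresses are
		compared; a real callback compares members of *tl and *tr. */
	if( tl < tr )
		return -1;
	else if( tl > tr )
		return 1;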

	return 0;
}

This can then be attached to the plist object with

plist_set_sortfn( l, my_compare );

To configure a plist object to be auto-sorted, the flag PLIST_MOD_AUTOSORT should be used at construction. The sorting can also be performed by invoking the functions

Interchanging functions

plist objects also support data collections and sets through functions that operate on two plist objects. Both lists must be configured with the same element memory size, else all functions will fail.

To implement these functions, every plist object also refers to a comparison-callback function. This is, by default, the same function as used for the sorting, and has also the same signature. This function can be implemented to check for element equality within set handling functions.

Additional functions

plist provides these additional functions:

pccl: Character-classes

The pccl object is established on top of the plist object and encapsulates easy-to-handle low-level functions for character-class handling.

These functions are heavily used by the library's regular expressions and parser implementations, but may also be helpful for other related projects. A pccl handles character classes by chaining ranges.

It supports full set operations, including the construction of intersections and unions and the removal or appending of ranges. pccl objects are designed to work on huge alphabets with low memory consumption. By default, characters are specified as wchar_t (wide-character Unicode) values.

pccl* ccl;

/* Construct a character-class within a universe of 8-bit characters (0-255):
	"-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
*/
ccl = p_ccl_create( 0, 255, "A-Za-z0-9_-" );

/* Invert character class */
p_ccl_negate( ccl );

/* Remove system chars */
p_ccl_delrange( ccl, 0, 31 );

/* Oh, and delete the "z" */
p_ccl_del( ccl, 'z' );

/* But add tab again! */
p_ccl_add( ccl, '\t' );

/* Enable all from 32 to 126  */
p_ccl_addrange( ccl, ' ', '~' );

/* Alternative way: */
p_ccl_parse( ccl, " -~", TRUE );

/* Test for characters */
printf( "A is in? %s\n", BOOLEAN_STR( p_ccl_test( ccl, 'A' ) ) );
printf( "a-z are in? %s\n", BOOLEAN_STR( p_ccl_testrange( ccl, 'a', 'z' ) ) );
printf( "\\n is in? %s\n", BOOLEAN_STR( p_ccl_test( ccl, '\n' ) ) );

/* Generate string */
printf( "My ccl is: %s\n", p_ccl_to_str( ccl, TRUE ) );

/* Drop it! */
p_ccl_free( ccl );

Useful, additional functions when working with pccl are:

pregex, plex: Regular expression tools

Overview

libphorward provides a powerful set of functions relating to general string pattern matching and lexical analysis using regular expressions.

Meta constructs

In general, regular expressions are made up of the following elements:

Construct          Usage
[...] or [^...]    Specifies a character, character-class or negated character-class (^).
.                  Specifies a character-class standing for "any character". Using this construct causes the terminal to be configured as "non-greedy".
( and )            Parentheses to build sub-expressions.
|                  The alternative operator to define multiple expressions at one expression level.
*                  Kleene closure (none or several of previous expression) modifier.
+                  Positive closure (one or several of previous expression) modifier.
?                  Optional closure (none or one of previous expression) modifier.

All meta-characters can be escaped by a backslash, so that they are interpreted as ordinary characters.

Characters and escape sequences

Any other character is consumed as one expression to be matched in the input. By default, all strings are interpreted as UTF-8 encoded Unicode, so Unicode is fully supported.

Escape sequences (C-style) are supported according to the following table:

Escape sequence    Description
\a                 Bell (alert)
\b                 Backspace
\f                 Formfeed
\n                 New line
\r                 Carriage return
\t                 Horizontal tab
\v                 Vertical tab
\'                 Single quotation mark
\"                 Double quotation mark
\\                 Backslash
\OOO               ASCII character in octal notation (O = octal digit)
\xHH               ASCII character in hexadecimal notation (H = hexadecimal digit)
\uHHHH             16-bit Unicode character in hexadecimal notation (H = hexadecimal digit)
\UHHHHHHHH         32-bit Unicode character in hexadecimal notation (H = hexadecimal digit)

Note: When specifying escape sequences in static strings within C code, they must be double-escaped, because they are first analyzed by the C compiler and then by libphorward's regex parser. Therefore, a backslash specified as \\ must become \\\\ in the C code, which then yields \\ after compiling the C program.

Shorthand character classes

Pre-defined shorthand character-classes are also supported by libphorward's pregex and plex tools.

Shorthand    Complies with      Explanation
\w           [A-Za-z0-9_]       All alphanumeric characters (ASCII only)
\W           [^A-Za-z0-9_]      Any other than (ASCII only) alphanumeric characters
\d           [0-9]              All digit characters (ASCII only)
\D           [^0-9]             Any other than (ASCII only) digit characters
\s           [ \f\n\r\t\v]      All whitespace characters (ASCII only)
\S           [^ \f\n\r\t\v]     Any other than (ASCII only) whitespace characters

Anchoring

The following anchors are supported, when specified at beginning or ending of an expression:

Anchor    Usage
^         Anchor at begin of pattern, matching begin-of-line.
$         Anchor at end of pattern, matching end-of-line.
<         Anchor at begin of pattern, matching begin-of-word.
>         Anchor at end of pattern, matching end-of-word.

Anchors can be switched off entirely by setting the flag PREGEX_COMP_NOANCHORS at compile time, or PREGEX_RUN_NOANCHORS at run time.

Examples

Some examples:

(TODO: more examples)

pregex: Operating on regular expressions

The pregex object is the object-oriented interface for string operations based on regular expressions.

Patterns are compiled into a DFA and associated with the pregex object as a reusable state machine that can be executed several times.

Generally, the actions matching, finding, splitting and replacing are supported by the compiled pattern.

Construction and destruction

pregex objects are constructed by pregex_create(). The first parameter is the regular expression pattern string that is compiled into a DFA. The second parameter specifies flags that influence the compile and execution process. All flags can be combined using the bitwise or-operator (|).

Flag                       Usage
PREGEX_COMP_WCHAR          The regular expression provided to pregex_create() shall be cast to wchar_t.
PREGEX_COMP_NOANCHORS      Ignore anchor tokens, handle them as normal characters.
PREGEX_COMP_NOREF          Don't compile references.
PREGEX_COMP_NONGREEDY      Compile regex to be forced nongreedy.
PREGEX_COMP_NOERRORS       Don't report errors, and try to compile as much as possible.
PREGEX_COMP_INSENSITIVE    Parse regular expression as case insensitive.
PREGEX_COMP_STATIC         The regular expression passed should be converted 1:1 as if it were a string constant. Any regex-specific symbols will be ignored and taken as if they were escaped.
PREGEX_RUN_WCHAR           Run regular expression with wchar_t as input.
PREGEX_RUN_NOANCHORS       Ignore anchors while processing the regex.
PREGEX_RUN_NOREF           Don't create references.
PREGEX_RUN_NONGREEDY       Force run regular expression nongreedy.
PREGEX_RUN_DEBUG           Debug mode; output some debug to stderr.

pregex_free() destructs and releases a pregex object after its use.

pregex* r;

r = pregex_create( "[_A-Za-z]+", 0 );

/* do something with r */

pregex_free( r );

Matching

To immediately test whether a pregex object matches a string, the function pregex_match() shall be invoked.

pregex* r;
char* s = "a1337b";
char* e;

r = pregex_create( "[0-9]+", 0 );
pregex_match( r, s, &e ); /* returns FALSE */
pregex_match( r, s + 1, &e ); /* returns TRUE, e receives s+5. */

pregex_match() only tests whether the string matches the pattern right at its beginning. To find a match anywhere within a string, the function pregex_find() shall be invoked. It is called with the same parameters, but returns the position of the match instead of a boolean state.

pregex_find( r, s, &e ); /* returns s + 1, e receives s+5. */

To find all matching patterns, pregex_find() must be called in a loop.

while( ( s = pregex_find( r, s, &e ) ) )
{
	printf( ">%.*s<\n", (int)( e - s ), s );	/* %.*s expects an int */
	s = e;
}

The function pregex_findall() can do this with one call, and fills a parray object with prange structures.

parray* a;
prange* rg;

pregex_findall( r, s, &a );

/* parray_unshift() removes from the beginning, as described for parray */
while( ( rg = (prange*)parray_unshift( a ) ) )
	printf( ">%.*s<\n", (int)( rg->end - rg->begin ), rg->begin );

parray_free( a );

Splitting

Splitting a string by a regular expression can be done with pregex_split(). This function takes several parameters, and is designed to be called in a loop.

pregex* r;
char* s = "5 and 6 are the cross sums of 23 and 42.";
char* e;
char* n;

r = pregex_create( "[0-9]+", 0 );

while( s )
{
	if( ( s = pregex_split( r, s, &e, &n ) ) )
		printf( ">%.*s<\n", (int)( e - s ), s );

	s = n;
}

There is also a one-call shortcut, pregex_splitall(), which fills an array:

parray* a;
prange* rg;

pregex_splitall( r, s, &a );

/* parray_unshift() removes from the beginning, as described for parray */
while( ( rg = (prange*)parray_unshift( a ) ) )
	printf( ">%.*s<\n", (int)( rg->end - rg->begin ), rg->begin );

parray_free( a );

Replacing

The function pregex_replace() allows for replacing parts of strings by regular expressions.

char* ns;

ns = pregex_replace( r, s, "int" );

The result of this function is always a dynamically allocated string that contains the replaced version of the input string, even if there were no matches. The above example returns "int and int are the cross sums of int and int." when executed on the string from the previous section. The returned string must always be released with pfree() after use.

The replacement string may also contain backreference placeholders, written as $<backreference>, to take parts of the matched text into the replacement string. A backreference is created for every opening parenthesis that exists in the regular expression, and counting begins at 1.

This modified version

r = pregex_create( "([0-9]+)", 0 );
ns = pregex_replace( r, s, "int[$1]" );

will return "int[5] and int[6] are the cross sums of int[23] and int[42]." when executed on the example string.

To disable the backreference replacement feature, the regular expression flag PREGEX_RUN_NOREF must be set.

Quick-access functions

Based on the pregex object, libphorward provides the shortcut functions

for immediate use, without creating and destroying a pregex object. Because this is done within each function call, calling these functions repeatedly results in higher runtime latency and more overhead.

plex: Lexical analysis using regular expressions

The second part of libphorward's regular expression tools is the plex object, which encapsulates all features required for creating lexical analyzers (also called "scanners" or "lexers" by compiler writers) in one handy object.

A plex object can be seen as a container which merges multiple regular expressions into one state machine to recognize tokens. A token is then identified by a unique number that is associated with the matching regular expression.

To tokenize a C-styled variable assignment, one could write a simple lexical analyzer, like this:

enum
{
	IDENT = 1, INTEGER, EQUALS, PLUS_OP, SEMICOLON
};

char* nid[] = { "ident", "integer", "equals", "plus_op", "semicolon" };

int m;
char* s = "sum = 5 + 23 + x;";
char* e;
plex* l;

l = plex_create( 0 );

plex_define( l, "[A-Za-z_][A-Za-z0-9_]*", IDENT, 0 );
plex_define( l, "[0-9]+", INTEGER, 0 );
plex_define( l, "=", EQUALS, 0 );
plex_define( l, "\\+", PLUS_OP, 0 ); /* + is a meta-character, so it is escaped */
plex_define( l, ";", SEMICOLON, 0 );

while( *s && ( s = plex_next( l, s, &m, &e ) ) )
{
	printf( "%s >%.*s<\n", nid[m - 1], (int)( e - s ), s );
	s = e;
}

plex_free( l );

When run, this yields the output:

ident >sum<
equals >=<
integer >5<
plus_op >+<
integer >23<
plus_op >+<
ident >x<
semicolon >;<

pparse: Parser development tools

Overview

libphorward serves as a parser generator and language processing toolchain. It provides a flexible, integrated and consistent solution for parsing problems.

Grammars

Grammars in libphorward's pparse module are expressed using the Phorward Grammar Definition Language (GDL), which was itself implemented using the pparse toolchain.

The GDL is a simple but effective implementation of the Backus-Naur-Form (BNF) for expressing grammatical rules, but it does also provide extensions and attributions that influence the construction of the resulting abstract syntax tree. Therefore, it can be seen as a tree-augmented BNF, sometimes called TBNF.

This language is made up of the fundamental elements: terminals, nonterminals and productions, and implements some special symbol attribution that influences the construction of the resulting abstract syntax tree, which is the result of a successful parse.

Terminals

Terminals, also called terminal symbols, are the atoms of the implemented language. They are read directly from the input stream. A terminal can be a single character, a string or a regular expression that matches a pattern. It is the language designer's choice how terminal symbols are made up in a particular implementation. Some examples of widely used terminals in programming languages are identifiers for variables and functions, operators, brackets, keywords like if or while, floating point or integer numbers, and so on. The parser expects these terminals in a valid order according to its current position in the parse, which is in turn defined by the underlying grammatical rules it follows.

In GDL, terminals can be expressed inline or as named terminals.

A named terminal is defined in its own definition block like this:

integer /\d+/;
while_keyword "while";
hexchar '0-9A-F';

Each block is always closed with a semicolon. These blocks associate a named terminal identifier (integer, while_keyword and hexchar) with a particular terminal definition.

Nonterminals

Nonterminals, also called nonterminal symbols, can be seen as a sort of variable or "function call" within a grammar, although they aren't. They refer to one or more so-called productions, which means that each production is always associated with at least one nonterminal. A nonterminal, however, may consist of several productions. These form valid sequences of terminals and other nonterminals which are allowed in the current context in their particular order. In other words: Nonterminals can be expanded into their productions.

Productions

Productions, also called grammar rules or just rules, finally describe the syntax. Its better to say, that productions define a syntactical part of the grammar - which can be replaced by the specific nonterminal each production is associated with. This syntactical description is done by defining a sequence in which terminals and nonterminals may occur to form a valid sentence. This includes, that a nonterminal can reference itself recursively in its own productions, which is a very important aspect in non-regular languages. In other words: Productions can be substituted by their associated nonterminals.

Productions are always defined in combination with a nonterminal definition in a block. The name before the colon (:) is the nonterminal that is defined (its also called left-hand side of a production), the sequences behind the colon define the productions (so they are also called right-hand sides). Several productions can be defined by separating them with a pipe (|) symbol. The block must be closed with a semicolon (;).

The full grammar of a four-function calculator (yes, the simplest example of a programming language in nearly any compiler construction textbook!) in libphorward's grammar definition language would be

factor : /\d+/ | '(' expr ')' ;
term : term '*' factor | term '/' factor | factor ;
expr : expr '+' term | expr '-' term | term ;
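
To make the roles of nonterminals and productions more concrete, here is a minimal hand-written sketch of an evaluator for this grammar. It does not use libphorward and is not how the library works internally; all names (calc_eval, calc_failed and the static helpers) are hypothetical. Each nonterminal becomes one C function, and the left-recursive productions of term and expr are rewritten as loops, because a hand-written recursive function cannot call itself as its very first step. Whitespace is not skipped and integer overflow is not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch, independent of libphorward. */
static const char* calc_pos;    /* current read position in the input */
static int         calc_error;  /* set to 1 on syntax error or division by zero */

static long calc_expr( void );

/* factor : /\d+/ | '(' expr ')' ; */
static long calc_factor( void )
{
    long value = 0;

    if( *calc_pos == '(' )
    {
        calc_pos++;
        value = calc_expr();

        if( *calc_pos == ')' )
            calc_pos++;
        else
            calc_error = 1;

        return value;
    }

    if( !isdigit( (unsigned char)*calc_pos ) )
    {
        calc_error = 1;
        return 0;
    }

    while( isdigit( (unsigned char)*calc_pos ) )
        value = value * 10 + ( *calc_pos++ - '0' );

    return value;
}

/* term : term '*' factor | term '/' factor | factor ; */
static long calc_term( void )
{
    long value = calc_factor();

    while( !calc_error && ( *calc_pos == '*' || *calc_pos == '/' ) )
    {
        char op  = *calc_pos++;
        long rhs = calc_factor();

        if( op == '*' )
            value *= rhs;
        else if( rhs != 0 )
            value /= rhs;
        else
            calc_error = 1;
    }

    return value;
}

/* expr : expr '+' term | expr '-' term | term ; */
static long calc_expr( void )
{
    long value = calc_term();

    while( !calc_error && ( *calc_pos == '+' || *calc_pos == '-' ) )
    {
        char op  = *calc_pos++;
        long rhs = calc_term();

        value = ( op == '+' ) ? value + rhs : value - rhs;
    }

    return value;
}

/* Evaluates input; the result is only meaningful if calc_failed() is 0. */
long calc_eval( const char* input )
{
    long value;

    assert( input != NULL );

    calc_pos = input;
    calc_error = 0;
    value = calc_expr();

    if( *calc_pos != '\0' )
        calc_error = 1; /* trailing input that no production accepts */

    return value;
}

/* Returns 1 if the last call to calc_eval() hit an error. */
int calc_failed( void )
{
    return calc_error;
}
```

With this sketch, calc_eval( "1+2*(3+4)+5" ) yields 20, because term binds tighter than expr, exactly as the nesting of the productions prescribes.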

Utility functions

Coming soon.

Function reference

Macros

PARMS

Definition:

PARMS( char* param_name, char* format, param_type parameter ) - Macro

Usage:

Write parameter content to trace.

The PARMS-macro is used to write parameter names and values to the program trace. PARMS() should - by definition - only be used right behind PROC(). If variable values should be logged to the trace during function execution, the VARS()-macro shall be used.

param_name is the name of the parameter. format is a printf-styled format placeholder. parameter is the parameter itself.

PROC

Definition:

PROC( char* func_name ) - Macro

Usage:

Write function entry to trace.

The PROC-macro introduces a new function level, if compiled with trace.

The PROC-macro must be put between the last local variable declaration and the first code line, else it won't compile. A PROC-macro must exist within a function to allow for other trace-macro usages. If PROC() is used within a function, the macros RETURN() or VOIDRET, according to the function return value, must be used. If PROC is used without RETURN, the trace will show a wrong call level depth.

The parameter func_name is a static string for the function name.

RETURN

Definition:

RETURN( function_type return_value ) - Macro

Usage:

Write function return to trace. RETURN() can only be used if PROC() is used at the beginning of the function. For void-functions, use the macro VOIDRET.

return_value is the return value of the function.

VARS

Definition:

VARS( char* var_name, char* format, var_type variable ) - Macro

Usage:

Write variable content to trace.

The VARS-macro is used to write variable names and values to the program trace. For parameters taken to functions, the PARMS()-macro shall be used.

var_name is the name of the variable. format is a printf-styled format placeholder. variable is the variable itself.

VOIDRET

Definition:

VOIDRET - Macro

Usage:

Write void function return to trace.

VOIDRET can only be used if PROC() is used at the beginning of the function. For typed functions, use the macro RETURN().
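
The interplay of the trace macros can be illustrated with a small stand-alone sketch. It does not reproduce the library's actual macro definitions; SK_PROC, SK_RETURN, SK_VOIDRET and all sk_-prefixed names are hypothetical stand-ins that only model the call level bookkeeping. The entry macro declares a local variable, which is one plausible reason why such a macro has to follow the last local variable declaration, and every exit macro has to undo the level increase of the entry macro.

```c
#include <assert.h>
#include <stdio.h>

static int sk_depth = 0; /* current call level of the sketch's trace */

/* Entry: remembers the function name and raises the call level.
   Not wrapped in do/while, because sk_func must stay in scope. */
#define SK_PROC( name ) \
    const char* sk_func = ( name ); \
    fprintf( stderr, "%*sENTRY %s\n", 2 * sk_depth++, "", sk_func )

/* Exit for typed functions: lowers the call level, then returns. */
#define SK_RETURN( value ) \
    do \
    { \
        assert( sk_depth > 0 ); \
        fprintf( stderr, "%*sEXIT  %s\n", 2 * --sk_depth, "", sk_func ); \
        return ( value ); \
    } \
    while( 0 )

/* Exit for void functions. */
#define SK_VOIDRET \
    do \
    { \
        assert( sk_depth > 0 ); \
        fprintf( stderr, "%*sEXIT  %s\n", 2 * --sk_depth, "", sk_func ); \
        return; \
    } \
    while( 0 )

static void sk_note( int value )
{
    SK_PROC( "sk_note" );
    fprintf( stderr, "%*svalue = %d\n", 2 * sk_depth, "", value );
    SK_VOIDRET;
}

static int sk_square( int x )
{
    int result;

    SK_PROC( "sk_square" );
    result = x * x;
    sk_note( result );
    SK_RETURN( result );
}

int sk_sum_of_squares( int a, int b )
{
    int sum;

    SK_PROC( "sk_sum_of_squares" );
    sum = sk_square( a ) + sk_square( b );
    SK_RETURN( sum );
}

/* Call level after all traced functions have returned; 0 if balanced. */
int sk_trace_depth( void )
{
    return sk_depth;
}
```

If one of the functions left through a plain return instead of SK_RETURN or SK_VOIDRET, sk_depth would stay raised and every later trace line would be indented one level too deep.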

Functions

p_ccl_add

Definition:

pboolean p_ccl_add( pccl* ccl, wchar_t ch )

Usage:

Integrates a single character into a character-class.

ccl is the pointer to the character-class to be affected. ch is the character to be integrated.

The function is a shortcut for p_ccl_addrange().

p_ccl_addrange

Definition:

pboolean p_ccl_addrange( pccl* ccl, wchar_t begin, wchar_t end )

Usage:

Integrates a character range into a character-class.

ccl is the pointer to the character-class to be affected. If ccl is provided as (pccl*)NULL, it will be created by the function.

begin is the begin of character range to be integrated. end is the end of character range to be integrated.

If begin is greater than end, the values will be swapped.
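
As an illustration of range-based character-classes, the following sketch stores a class as a sorted list of range pairs. It is not the library's implementation: sk_ccl, sk_range and the sk_ccl_ functions are hypothetical, the fixed capacity is an assumption made to keep the example short, and merging overlapping or adjacent ranges on insertion is a design choice of this sketch. It shows the swap of begin and end described above.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_RANGES 16 /* fixed capacity, an assumption of this sketch */

typedef struct
{
    wchar_t begin;
    wchar_t end;
} sk_range;

typedef struct
{
    sk_range ranges[ SK_MAX_RANGES ]; /* sorted, non-overlapping pairs */
    int      size;                    /* number of pairs in use */
} sk_ccl;

void sk_ccl_init( sk_ccl* ccl )
{
    assert( ccl != NULL );
    ccl->size = 0;
}

/* Adds begin..end to ccl; returns 1 on success, 0 if ccl is full. */
int sk_ccl_addrange( sk_ccl* ccl, wchar_t begin, wchar_t end )
{
    sk_range out[ SK_MAX_RANGES + 1 ];
    int      n = 0;
    int      placed = 0;
    int      i;

    assert( ccl != NULL );

    if( begin > end ) /* swap, so that begin <= end always holds */
    {
        wchar_t tmp = begin;
        begin = end;
        end = tmp;
    }

    for( i = 0; i < ccl->size; i++ )
    {
        sk_range r = ccl->ranges[ i ];

        if( (long long)r.end + 1 < (long long)begin )
            out[ n++ ] = r; /* r lies entirely before the new range */
        else if( (long long)end + 1 < (long long)r.begin )
        {
            if( !placed ) /* new range fits in front of r */
            {
                out[ n ].begin = begin;
                out[ n ].end = end;
                n++;
                placed = 1;
            }

            out[ n++ ] = r;
        }
        else /* r overlaps or touches the new range: absorb it */
        {
            if( r.begin < begin )
                begin = r.begin;
            if( r.end > end )
                end = r.end;
        }
    }

    if( !placed )
    {
        out[ n ].begin = begin;
        out[ n ].end = end;
        n++;
    }

    if( n > SK_MAX_RANGES )
        return 0;

    for( i = 0; i < n; i++ )
        ccl->ranges[ i ] = out[ i ];

    ccl->size = n;
    return 1;
}

/* Returns 1 if ch is covered by any range of ccl, else 0. */
int sk_ccl_test( const sk_ccl* ccl, wchar_t ch )
{
    int i;

    assert( ccl != NULL );

    for( i = 0; i < ccl->size; i++ )
        if( ch >= ccl->ranges[ i ].begin && ch <= ccl->ranges[ i ].end )
            return 1;

    return 0;
}
```

In this sketch, adding '0' to '4' and then '5' to '9' leaves a single pair '0' to '9', so the number of pairs and the number of characters in a class are two different quantities.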

p_ccl_compare

Definition:

int p_ccl_compare( pccl* left, pccl* right )

Usage:

Checks for differences in two character-classes.

left is the pointer to the first character-class. right is the pointer to the second character-class.

Returns a value < 0 if left is lower than right, 0 if left is equal to right or a value > 0 if left is greater than right.

p_ccl_compat

Definition:

pboolean p_ccl_compat( pccl* l, pccl* r )

Usage:

Checks if the character-classes l and r are in the same character universe and compatible for operations.

p_ccl_count

Definition:

int p_ccl_count( pccl* ccl )

Usage:

Returns the number of characters within a character-class.

ccl is a pointer to the character-class to be processed.

Returns the total number of characters the class is holding.

p_ccl_create

Definition:

pccl* p_ccl_create( int min, int max, char* ccldef )

Usage:

Constructor function to create a new character-class.

Returns a pointer to the newly created character-class. This pointer should be released with p_ccl_free() when its existence is no longer required.

p_ccl_del

Definition:

pboolean p_ccl_del( pccl* ccl, wchar_t ch )

Usage:

Removes a character from a character-class.

ccl is the pointer to the character-class to be affected. ch is the character to be removed from ccl.

The function is a shortcut for p_ccl_delrange().

p_ccl_delrange

Definition:

pboolean p_ccl_delrange( pccl* ccl, wchar_t begin, wchar_t end )

Usage:

Removes a character range from a character-class.

ccl is the pointer to the character-class to be affected. begin is the begin of character range to be removed. end is the end of character range to be removed.

p_ccl_diff

Definition:

pccl* p_ccl_diff( pccl* ccl, pccl* rem )

Usage:

Returns the difference of two character-classes. All elements contained in rem are removed from a copy of ccl, which is returned as a new character-class.

ccl is the pointer to the first character-class. rem is the pointer to the second character-class.

Returns a new pointer to a copy of ccl, without the ranges contained in rem. Returns (pccl*)NULL in case of memory allocation or parameter error.

p_ccl_dup

Definition:

pccl* p_ccl_dup( pccl* ccl )

Usage:

Duplicates a character-class into a new one.

ccl is the pointer to the character-class to be duplicated.

Returns a pointer to the duplicate of ccl, or (pccl*)NULL in error case.

p_ccl_erase

Definition:

pboolean p_ccl_erase( pccl* ccl )

Usage:

Erases a character-class ccl.

The function sets a character-class to zero, so that it contains no character range definitions. The object ccl will still be alive. To delete the entire object, use p_ccl_free().

p_ccl_free

Definition:

pccl* p_ccl_free( pccl* ccl )

Usage:

Frees a character-class ccl and all its used memory.

The function always returns (pccl*)NULL.

p_ccl_get

Definition:

pboolean p_ccl_get( wchar_t* from, wchar_t* to, pccl* ccl, int offset )

Usage:

Return a character or a character-range by its offset.

If the function is called only with pointer from provided, and to as (wchar_t*)NULL, it writes the character at position offset of the character-class into from.

If the function is called with both pointers from and to provided, it writes the begin and end character of the character-range at position offset of the character-class into from and to.

If no character or range with the given offset was found, the function returns FALSE, meaning that the end of the characters is reached. On success, the function will always return TRUE.

p_ccl_instest

Definition:

pboolean p_ccl_instest( pccl* ccl, wchar_t ch )

Usage:

Tests in case-insensitive mode if a character matches a character-class.

ccl is the pointer to character-class to be tested. ch is the character to be tested.

The function is a shortcut for p_ccl_testrange().

It returns TRUE, if the character matches the class, and FALSE if not.

p_ccl_intersect

Definition:

pccl* p_ccl_intersect( pccl* ccl, pccl* within )

Usage:

Returns a new character-class with all characters that exist in both provided character-classes.

ccl is the pointer to the first character-class. within is the pointer to the second character-class.

Returns a new character-class containing the intersections of ccl and within. If there is no intersection between both character-classes, the function returns (pccl*)NULL.

p_ccl_negate

Definition:

pccl* p_ccl_negate( pccl* ccl )

Usage:

Negates all ranges in a character-class.

ccl is the pointer to the character-class to be negated.

Returns a pointer to ccl.

p_ccl_parse

Definition:

pboolean p_ccl_parse( pccl* ccl, char* ccldef, pboolean extend )

Usage:

Parses the character-class definition provided in ccldef and assigns this definition to the character-class ccl. ccldef may contain UTF-8 formatted input. Escape-sequences will be interpreted to their correct character representations.

A typical character-class definition simply consists of single characters and range definitions. For example, "$A-Z#0-9" defines a character-class that consists of the characters "$#0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ".

The parameter extend specifies if the provided definition overwrites (extend = FALSE) or extends (extend = TRUE) the provided character-class. This means that definitions that already exist in the character-class are either erased first or kept.

The function returns TRUE on success, and FALSE on an error.
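
How a definition such as "$A-Z#0-9" expands into its member characters can be sketched as follows. This is a simplified stand-alone illustration, not the library's parser: sk_ccl_expand is a hypothetical function that handles 7-bit ASCII only, knows no escape sequences or UTF-8 input, and treats a '-' between two characters as a range.

```c
#include <assert.h>
#include <stddef.h>

/* Expands def into out as a string of member characters in ASCII order.
   Returns the number of characters written (without the terminator). */
size_t sk_ccl_expand( const char* def, char* out, size_t outsize )
{
    unsigned char member[ 128 ] = { 0 }; /* one flag per ASCII character */
    size_t        n = 0;
    int           c;

    assert( def != NULL && out != NULL );

    while( *def )
    {
        unsigned char from = (unsigned char)def[ 0 ];
        unsigned char to   = from;

        if( def[ 1 ] == '-' && def[ 2 ] != '\0' )
        {
            to = (unsigned char)def[ 2 ]; /* range definition "x-y" */
            def += 3;
        }
        else
            def++;                        /* single character */

        if( from > to )
        {
            unsigned char tmp = from;
            from = to;
            to = tmp;
        }

        for( c = from; c <= to; c++ )
            if( c < 128 )
                member[ c ] = 1;
    }

    for( c = 0; c < 128; c++ )
        if( member[ c ] && n + 1 < outsize )
            out[ n++ ] = (char)c;

    if( outsize > 0 )
        out[ n ] = '\0';

    return n;
}
```

For "$A-Z#0-9" the sketch collects 38 characters: '#' and '$', the ten digits and the 26 upper-case letters. They are emitted in ASCII order, so '#' comes before '$'.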

p_ccl_parsechar

Definition:

size_t p_ccl_parsechar( wchar_t* retc, char *str, pboolean escapeseq )

Usage:

Reads a character from a string. The character may consist of one single character or it may be made up of an escape sequence or UTF-8 character. The function returns the number of bytes read.

retc is the return pointer for the character code of the escaped string. str is the begin pointer of the string at which character parsing begins. If escapeseq is TRUE, the function regards escape sequences, else it ignores them.

Returns the number of bytes that had been read for the character.
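
The idea of reading one character and reporting the number of consumed bytes can be sketched for the UTF-8 case. The function sk_parsechar below is hypothetical and stands on its own; it handles no escape sequences, does not reject overlong encodings, and passes an invalid lead byte through as a single character, which is a choice of this sketch. On platforms with a 16-bit wchar_t, four-byte sequences do not fit into the result.

```c
#include <assert.h>
#include <stddef.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
#define SK_IS_CONT( b ) ( ( ( b ) & 0xC0 ) == 0x80 )

/* Decodes one UTF-8 character from str into retc.
   Returns the number of bytes read, or 0 at the end of the string. */
size_t sk_parsechar( wchar_t* retc, const char* str )
{
    const unsigned char* s = (const unsigned char*)str;

    assert( retc != NULL && str != NULL );

    if( s[ 0 ] == '\0' )
    {
        *retc = 0;
        return 0;
    }

    if( s[ 0 ] < 0x80 ) /* plain ASCII */
    {
        *retc = (wchar_t)s[ 0 ];
        return 1;
    }

    if( ( s[ 0 ] & 0xE0 ) == 0xC0 && SK_IS_CONT( s[ 1 ] ) )
    {
        *retc = (wchar_t)( ( ( s[ 0 ] & 0x1F ) << 6 )
                            | ( s[ 1 ] & 0x3F ) );
        return 2;
    }

    if( ( s[ 0 ] & 0xF0 ) == 0xE0
        && SK_IS_CONT( s[ 1 ] ) && SK_IS_CONT( s[ 2 ] ) )
    {
        *retc = (wchar_t)( ( ( s[ 0 ] & 0x0F ) << 12 )
                            | ( ( s[ 1 ] & 0x3F ) << 6 )
                            | ( s[ 2 ] & 0x3F ) );
        return 3;
    }

    if( ( s[ 0 ] & 0xF8 ) == 0xF0
        && SK_IS_CONT( s[ 1 ] ) && SK_IS_CONT( s[ 2 ] )
        && SK_IS_CONT( s[ 3 ] ) )
    {
        *retc = (wchar_t)( ( (long)( s[ 0 ] & 0x07 ) << 18 )
                            | ( (long)( s[ 1 ] & 0x3F ) << 12 )
                            | ( ( s[ 2 ] & 0x3F ) << 6 )
                            | ( s[ 3 ] & 0x3F ) );
        return 4;
    }

    *retc = (wchar_t)s[ 0 ]; /* invalid sequence: pass the byte through */
    return 1;
}
```

A caller can walk through a string by adding the returned byte count to its read pointer until the function returns 0.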

p_ccl_parseshorthand

Definition:

size_t p_ccl_parseshorthand( pccl* ccl, char *str )

Usage:

Tries to parse a shorthand sequence from a string. This matches the shorthands \w, \W, \d, \D, \s and \S. If it matches, all characters are added to ccl.

The function returns the number of bytes that had been read for the character. If no shorthand sequence could be found, it returns 0, and leaves ccl untouched.

p_ccl_print

Definition:

void p_ccl_print( FILE* stream, pccl* ccl, int break_after )

Usage:

Print character-class to output stream. This function is provided for debug-purposes only.

stream is the output stream to dump the character-class to. This can be left (FILE*)NULL, so stderr will be used. ccl is the pointer to the character-class.

break_after defines:

p_ccl_size

Definition:

int p_ccl_size( pccl* ccl )

Usage:

Returns the number of range pairs within a character-class.

ccl is a pointer to the character-class to be processed.

To retrieve the number of characters in a character-class, use p_ccl_count() instead.

Returns the number of pairs the charclass holds.

p_ccl_test

Definition:

pboolean p_ccl_test( pccl* ccl, wchar_t ch )

Usage:

Tests if a character-class contains a character.

ccl is the pointer to character-class to be tested. ch is the character to be tested.

The function is a shortcut for p_ccl_testrange().

It returns TRUE, if the character matches the class, and FALSE if not.

p_ccl_testrange

Definition:

pboolean p_ccl_testrange( pccl* ccl, wchar_t begin, wchar_t end )

Usage:

Tests a character-class to match a character range.

ccl is a pointer to the character-class to be tested. begin is the begin of character-range to be tested. end is the end of character-range to be tested.

Returns TRUE if the entire character range matches the class, and FALSE if not.

p_ccl_to_str

Definition:

char* p_ccl_to_str( pccl* ccl, pboolean escape )

Usage:

Converts a character-class back to a string representation of the character-class definition, which in turn can be converted back into a character-class using p_ccl_create().

ccl is the pointer to the character-class to be converted. escape, if TRUE, escapes "unprintable" characters in their hexadecimal representation. If FALSE, it prints all characters, except the zero, which will be returned as "\0".

Returns a pointer to the generated string that represents the charclass. The returned pointer belongs to the ccl and is managed by the character-class handling functions, so it should not be freed manually.

p_ccl_union

Definition:

pccl* p_ccl_union( pccl* ccl, pccl* add )

Usage:

Unions two character-classes into a new, normalized one.

ccl is the pointer to the character-class that will be extended to all ranges contained in add. add is character-class that will be unioned with ccl.

The function creates and returns a new character-class that is the union of ccl and add.

pany_convert

Definition:

pboolean pany_convert( pany* val, panytype type )

Usage:

Converts a pany-structure to any supported type.

val is the pany-object to be converted. type is the type definition to which val should be converted.

The function returns TRUE on success, FALSE else.

pany_copy

Definition:

pboolean pany_copy( pany* dest, pany* src )

Usage:

Copy any value from src into dest.

dest will be reset and stand on its own after copying.

pany_create

Definition:

pany* pany_create( char* str )

Usage:

Creates a new pany-object.

It allows for parsing a value from str.

This object must be released after usage using pany_free().

pany_dup

Definition:

pany* pany_dup( pany* src )

Usage:

Duplicate the object src into a new object that stands on its own.

pany_fprint

Definition:

void pany_fprint( FILE* stream, pany* val )

Usage:

Print the type and value of val to stream without any conversion. This function shall be used for debug only.

stream is the stream to write to. val is the pany-object to be printed.

pany_free

Definition:

pany* pany_free( pany* val )

Usage:

Frees an allocated pany object and all its used memory.

The function always returns (pany*)NULL.

pany_get_bool

Definition:

pboolean pany_get_bool( pany* val )

Usage:

Returns the pboolean-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as pboolean. This value could be converted from the original value.
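
The principle of a getter that converts on demand can be sketched with a small tagged union. The types and functions below (sk_any, sk_anytype, sk_any_get_long, sk_any_get_double) are hypothetical and much simpler than the pany object; in particular, truncating a double towards zero and reading strings with strtol() and strtod() are assumptions of this sketch, not documented behavior of the library.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum
{
    SK_ANY_LONG,
    SK_ANY_DOUBLE,
    SK_ANY_STR
} sk_anytype;

typedef struct
{
    sk_anytype type; /* tells which union member is valid */

    union
    {
        long        l;
        double      d;
        const char* s;
    } val;
} sk_any;

/* Returns the value of any as long, converting it if necessary. */
long sk_any_get_long( const sk_any* any )
{
    assert( any != NULL );

    switch( any->type )
    {
        case SK_ANY_LONG:
            return any->val.l;

        case SK_ANY_DOUBLE:
            return (long)any->val.d; /* truncates towards zero */

        case SK_ANY_STR:
            return any->val.s ? strtol( any->val.s, NULL, 10 ) : 0;
    }

    return 0;
}

/* Returns the value of any as double, converting it if necessary. */
double sk_any_get_double( const sk_any* any )
{
    assert( any != NULL );

    switch( any->type )
    {
        case SK_ANY_LONG:
            return (double)any->val.l;

        case SK_ANY_DOUBLE:
            return any->val.d;

        case SK_ANY_STR:
            return any->val.s ? strtod( any->val.s, NULL ) : 0.0;
    }

    return 0.0;
}
```

The caller never has to check the stored type first; the getter for the wanted type hides the conversion.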

pany_get_char

Definition:

char pany_get_char( pany* val )

Usage:

Returns the char-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as char. This value could be converted from the original value.

pany_get_cstr

Definition:

char* pany_get_cstr( pany* val )

Usage:

Returns the char*-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as char*. This value could be converted from the original value.

pany_get_cwcs

Definition:

wchar_t* pany_get_cwcs( pany* val )

Usage:

Returns the wchar_t*-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as wchar_t*. This value could be converted from the original value.

pany_get_double

Definition:

double pany_get_double( pany* val )

Usage:

Returns the double-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as double. This value could be converted from the original value.

pany_get_float

Definition:

float pany_get_float( pany* val )

Usage:

Returns the float-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as float. This value could be converted from the original value.

pany_get_int

Definition:

int pany_get_int( pany* val )

Usage:

Returns the int-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as int. This value could be converted from the original value.

pany_get_long

Definition:

long pany_get_long( pany* val )

Usage:

Returns the long-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as long. This value could be converted from the original value.

pany_get_ptr

Definition:

void* pany_get_ptr( pany* val )

Usage:

Returns the void*-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as void*. This value could be converted from the original value.

pany_get_str

Definition:

char* pany_get_str( pany* val )

Usage:

Returns the char*-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as char*. This value could be converted from the original value.

pany_get_ulong

Definition:

unsigned long pany_get_ulong( pany* val )

Usage:

Returns the unsigned long-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as unsigned long. This value could be converted from the original value.

pany_get_wcs

Definition:

wchar_t* pany_get_wcs( pany* val )

Usage:

Returns the wchar_t*-value of val.

val is the pointer to the pany-object.

If the pany-object exists in another data type, it will be converted. The function returns the value assigned to val as wchar_t*. This value could be converted from the original value.

pany_init

Definition:

pboolean pany_init( pany* val )

Usage:

Initializes a pany-element.

val is the pointer to the pany-structure to be initialized.

pany_parse

Definition:

pboolean pany_parse( pany* val, char* str, panytype enforce )

Usage:

Parse any value from a string.

The function ignores leading and trailing whitespace, and matches long integer values, double values and strings.

If a string is enclosed in C-styled string or character delimiters (", '), the content between the delimiters will be taken as a string and run through an escaping function.

Any other content is taken as string. If the parameter enforce is set to a desired PANY_-type, this type will be enforced, and no special recognition is done.

This function tries to detect

pany_reset

Definition:

pboolean pany_reset( pany* val )

Usage:

Frees all memory used by a pany-element.

All memory used by the element is freed, and the union's structure is reset to be of type PANYTYPE_NULL.

val is the pointer to pany structure.

pany_set_bool

Definition:

pboolean pany_set_bool( pany* val, pboolean b )

Usage:

Sets the pboolean-value and type of val.

val is the pany-object to be set. b is the pboolean-value to be assigned to val.

The function always returns the value b.

pany_set_char

Definition:

char pany_set_char( pany* val, char c )

Usage:

Sets the char-value and type of val.

val is the pany-object to be set. c is the char-value to be assigned to val.

The function always returns the value c.

pany_set_cstr

Definition:

char* pany_set_cstr( pany* val, char* s )

Usage:

Sets the char*-value and type of val.

val is the pany-object to be set. s is the char*-value to be assigned to val.

The function always returns the value s.

pany_set_cwcs

Definition:

wchar_t* pany_set_cwcs( pany* val, wchar_t* ws )

Usage:

Sets the wchar_t*-value and type of val.

val is the pany-object to be set. ws is the wchar_t*-value to be assigned to val.

The function always returns the value ws.

pany_set_double

Definition:

double pany_set_double( pany* val, double d )

Usage:

Sets the double-value and type of val.

val is the pany-object to be set. d is the double-value to be assigned to val.

The function always returns the value d.

pany_set_float

Definition:

float pany_set_float( pany* val, float f )

Usage:

Sets the float-value and type of val.

val is the pany-object to be set. f is the float-value to be assigned to val.

The function always returns the value f.

pany_set_int

Definition:

int pany_set_int( pany* val, int i )

Usage:

Sets the int-value and type of val.

val is the pany-object to be set. i is the int-value to be assigned to val.

The function always returns the value i.

pany_set_long

Definition:

long pany_set_long( pany* val, long l )

Usage:

Sets the long-value and type of val.

val is the pany-object to be set. l is the long-value to be assigned to val.

The function always returns the value l.

pany_set_ptr

Definition:

void* pany_set_ptr( pany* val, void* ptr )

Usage:

Sets the void*-value and type of val.

val is the pany-object to be set. ptr is the void*-value to be assigned to val.

The function always returns the value ptr.

pany_set_str

Definition:

char* pany_set_str( pany* val, char* s )

Usage:

Sets the char*-value and type of val.

val is the pany-object to be set. s is the char*-value to be assigned to val.

The function always returns the value s.

pany_set_ulong

Definition:

unsigned long pany_set_ulong( pany* val, unsigned long ul )

Usage:

Sets the unsigned long-value and type of val.

val is the pany-object to be set. ul is the unsigned long-value to be assigned to val.

The function always returns the value ul.

pany_set_wcs

Definition:

wchar_t* pany_set_wcs( pany* val, wchar_t* ws )

Usage:

Sets the wchar_t*-value and type of val.

val is the pany-object to be set. ws is the wchar_t*-value to be assigned to val.

The function always returns the value ws.

pany_to_bool

Definition:

pboolean pany_to_bool( pany* val )

Usage:

Converts the current value of val into a pboolean value.

val is the pany-object to convert from.

The function returns the pboolean-value of val.

pany_to_char

Definition:

char pany_to_char( pany* val )

Usage:

Converts the current value of val into a char value.

val is the pany-object to convert from.

The function returns the char-value of val.

pany_to_double

Definition:

double pany_to_double( pany* val )

Usage:

Converts the current value of val into a double value.

val is the pany-object to convert from.

The function returns the double-value of val.

pany_to_float

Definition:

float pany_to_float( pany* val )

Usage:

Converts the current value of val into a float value.

val is the pany-object to convert from.

The function returns the float-value of val.

pany_to_int

Definition:

int pany_to_int( pany* val )

Usage:

Converts the current value of val into an int value.

val is the pany-object to convert from.

The function returns the int-value of val.

pany_to_long

Definition:

long pany_to_long( pany* val )

Usage:

Converts the current value of val into a long value.

val is the pany-object to convert from.

The function returns the long-value of val.

pany_to_ptr

Definition:

void* pany_to_ptr( pany* val )

Usage:

Converts the current value of val into a void* value.

val is the pany-object to convert from.

The function returns the void*-value of val.

pany_to_str

Definition:

char* pany_to_str( pany* val )

Usage:

Converts the current value of val into a char* value.

val is the pany-object to convert from.

The function returns the char*-value of val.

pany_to_ulong

Definition:

unsigned long pany_to_ulong( pany* val )

Usage:

Converts the current value of val into an unsigned long value.

val is the pany-object to convert from.

The function returns the unsigned long-value of val.

pany_to_wcs

Definition:

wchar_t* pany_to_wcs( pany* val )

Usage:

Converts the current value of val into a wchar_t* value.

val is the pany-object to convert from.

The function returns the wchar_t*-value of val.

parray_count

Definition:

size_t parray_count( parray* array )

Usage:

Returns the number of elements in an array.

parray_create

Definition:

parray* parray_create( size_t size, size_t chunk )

Usage:

Create a new parray as an object with an element allocation size size and a reallocation-chunk-size of chunk.

The returned memory must be released with parray_free().

parray_erase

Definition:

pboolean parray_erase( parray* array )

Usage:

Erase a dynamic array.

The array does not need to be reinitialized with parray_init() after it has been erased.

array is the pointer to the array to be erased.

parray_first

Definition:

void* parray_first( parray* array )

Usage:

Access first element of the array.

Returns the address of the accessed item, and (void*)NULL if nothing is on the array.

parray_free

Definition:

parray* parray_free( parray* array )

Usage:

Releases all the memory array uses and destroys the array object.

The function always returns (parray*)NULL.

parray_get

Definition:

void* parray_get( parray* array, size_t offset )

Usage:

Access an element from the array by its offset position from the left.

array is the pointer to array where to access the element from. offset is the offset of the element to be accessed from the array's base address.

Returns the address of the accessed item, and (void*)NULL if the item could not be accessed (e.g. if the array is empty or offset is beyond the last element of the array).

Use parray_rget() to access items from the end.

parray_init

Definition:

pboolean parray_init( parray* array, size_t size, size_t chunk )

Usage:

Performs an array initialization.

array is the pointer to the array to be initialized.

size defines the size of one array element, in bytes. This should be evaluated using the sizeof()-macro.

chunk defines the chunk size in which array (re)allocations are performed. If, e.g., this is set to 128, a reallocation is done when the 128th item is created within the array. Memory that has been allocated once remains allocated until the array is freed again.
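
The effect of the element size and the chunk size can be illustrated with a minimal stand-alone sketch of a chunk-wise growing array. It is not the library's implementation: sk_array and its functions are hypothetical, and the exact moment at which the sketch reallocates (whenever all allocated slots are in use) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char*  data;  /* element storage */
    size_t size;  /* size of one element, in bytes */
    size_t chunk; /* number of elements added per (re)allocation */
    size_t count; /* number of elements in use */
    size_t alloc; /* number of elements allocated */
} sk_array;

void sk_array_init( sk_array* a, size_t size, size_t chunk )
{
    assert( a != NULL && size > 0 );

    memset( a, 0, sizeof( sk_array ) );
    a->size = size;
    a->chunk = chunk ? chunk : 1;
}

/* Appends a copy of item (or a zeroed element if item is NULL).
   Returns the address of the new element, or NULL if out of memory. */
void* sk_array_push( sk_array* a, const void* item )
{
    void* slot;

    assert( a != NULL );

    if( a->count == a->alloc ) /* all slots used: grow by one chunk */
    {
        char* grown = realloc( a->data,
                               ( a->alloc + a->chunk ) * a->size );
        if( !grown )
            return NULL;

        a->data = grown;
        a->alloc += a->chunk;
    }

    slot = a->data + a->count++ * a->size;

    if( item )
        memcpy( slot, item, a->size );
    else
        memset( slot, 0, a->size );

    return slot;
}

/* Returns the element at offset, or NULL if offset is out of bounds. */
void* sk_array_get( sk_array* a, size_t offset )
{
    assert( a != NULL );
    return offset < a->count ? a->data + offset * a->size : NULL;
}

void sk_array_free( sk_array* a )
{
    assert( a != NULL );

    free( a->data );
    memset( a, 0, sizeof( sk_array ) );
}
```

With a chunk size of 4, pushing ten int values triggers three allocations and leaves room for twelve elements. A larger chunk size trades unused memory for fewer reallocations.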

parray_insert

Definition:

void* parray_insert( parray* array, size_t offset, void* item )

Usage:

Insert item item at offset into array array. Items to the right of offset will move up.

If offset lies beyond the current end of the array, the gap is filled with zero elements. Handle with care!

parray_last

Definition:

void* parray_last( parray* array )

Usage:

Access last element of the array.

Returns the address of the accessed item, and (void*)NULL if nothing is on the array.

parray_malloc

Definition:

void* parray_malloc( parray* array )

Usage:

Pushes and "allocates" an empty element on the array.

This function is just a shortcut to `parray_push( array, (void*)NULL )`, and the memory of the pushed element is initialized to zero.

parray_offset

Definition:

size_t parray_offset( parray* array, void* ptr )

Usage:

Returns the offset of element ptr in array array. The function returns the size of the array (which is an invalid offset) if ptr is not part of array.

To check if a pointer belongs to an array, call parray_partof().

parray_partof

Definition:

pboolean parray_partof( parray* array, void* ptr )

Usage:

Returns TRUE, if ptr is an element of array array.

parray_pop

Definition:

void* parray_pop( parray* array )

Usage:

Removes an element from the end of an array.

The function returns the pointer of the popped item. Because dynamic arrays only grow and no memory is freed, the returned data pointer is still valid, and will only be overwritten by the next push operation.

array is the pointer to array where to pop an item off.

The function returns the address of the popped item, and (void*)NULL if the item could not be popped (e.g. array is empty).

parray_push

Definition:

void* parray_push( parray* array, void* item )

Usage:

Appends an element to the end of the array.

The element's memory is copied during the push. The item must be of the same memory size as used at array initialization.

array is the pointer to array where to push an item on.

item is the pointer to the memory of the item that should be pushed onto the array. The caller should cast the item's pointer to void*, or wrap the push-operation with a macro. It can be left (void*)NULL, so no memory will be copied.

The function returns the address of the newly pushed item, and (void*)NULL if the item could not be pushed.

parray_put

Definition:

void* parray_put( parray* array, size_t offset, void* item )

Usage:

Put an element item at position offset of array array.

array is the pointer to array where to put the element to. offset is the offset of the element to be set. item is a pointer to the memory that will be copied into the position at offset. If this is NULL, the position at offset will be set to zero.

Returns the address of the item in the array, or NULL if the desired offset is out of the array bounds.

parray_remove

Definition:

void* parray_remove( parray* array, size_t offset, void** item )

Usage:

Remove the item at offset from array array.

The removed item will be copied into item, if item is not NULL. The function returns the memory of the removed item (it will contain the moved up data part or invalid memory, if on the end).

parray_reserve

Definition:

pboolean parray_reserve( parray* array, size_t n )

Usage:

Reserves memory for n items in array.

This function is only used to ensure that no memory reallocation is done when the next n items are inserted/malloced.

parray_rget

Definition:

void* parray_rget( parray* array, size_t offset )

Usage:

Access an element from the array by its offset position from the right.

array is the pointer to array where to access the element from. offset is the offset of the element to be accessed from the array's base address.

Returns the address of the accessed item, and (void*)NULL if the item could not be accessed (e.g. if the array is empty or offset is beyond the bottom of the array).

Use parray_get() to access items from the beginning.

parray_rmalloc

Definition:

void* parray_rmalloc( parray* array )

Usage:

Unshifts and "allocates" an empty element on the array.

This function is just a shortcut to `parray_unshift( array, (void*)NULL )`, and the memory of the unshifted element is initialized to zero.

parray_rput

Definition:

void* parray_rput( parray* array, size_t offset, void* item )

Usage:

Put an element item at position offset from the right of array array.

array is the pointer to array where to put the element to. offset is the offset of the element to be set. item is a pointer to the memory that will be copied into the position at offset. If this is NULL, the position at offset will be set to zero.

Returns the address of the item in the array, or NULL if the desired offset is out of the array bounds.

parray_shift

Definition:

void* parray_shift( parray* array )

Usage:

Removes an element from the beginning of an array.

The function returns the pointer of the shifted item. Because dynamic arrays only grow and no memory is freed, the returned data pointer is still valid, and will only be overwritten by the next unshift operation.

array is the pointer to array where to shift an item off.

The function returns the address of the shifted item, and (void*)NULL if the item could not be popped (e.g. array is empty).

parray_swap

Definition:

void* parray_swap( parray* array, size_t pos1, size_t pos2 )

Usage:

Swap two elements of an array.

parray_unshift

Definition:

void* parray_unshift( parray* array, void* item )

Usage:

Inserts an element at the beginning of the array.

The element's memory is copied during the unshift. The item must be of the same memory size as used at array initialization.

array is the pointer to array where to push an item to the beginning.

item is the pointer to the memory of the item that should be pushed onto the array. The caller should cast the item's pointer to void*, or wrap the push-operation with a macro. It can be left (void*)NULL, so no memory will be copied.

The function returns the address of the newly unshifted item, and (void*)NULL if the item could not be unshifted.

pasprintf

Definition:

char* pasprintf( char* fmt, ... )

Usage:

Implementation and replacement for asprintf. pasprintf() takes only the format-string and various arguments. It outputs an allocated string to be freed later on.

fmt is the format string. ... are the parameters according to the placeholders set in fmt.

Returns the allocated string, which contains the format string with inserted values.
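
A common way to build such a function on top of standard C99 is to call vsnprintf() twice: once to measure the required length, once to write into a buffer of exactly that size. The following sketch shows this technique under the hypothetical name sk_asprintf; it is not taken from the library's sources.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats into a newly allocated string; the caller must free() it.
   Returns NULL on a formatting or allocation error. */
char* sk_asprintf( const char* fmt, ... )
{
    va_list args;
    int     len;
    char*   str;

    assert( fmt != NULL );

    /* First pass: a NULL buffer of size 0 only yields the length. */
    va_start( args, fmt );
    len = vsnprintf( NULL, 0, fmt, args );
    va_end( args );

    if( len < 0 )
        return NULL;

    if( !( str = malloc( (size_t)len + 1 ) ) )
        return NULL;

    /* Second pass: write the text, including the terminating zero. */
    va_start( args, fmt );
    vsnprintf( str, (size_t)len + 1, fmt, args );
    va_end( args );

    return str;
}
```

The argument list is started twice because a va_list must not be reused after it has been consumed by the first vsnprintf() call.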

pbasename

Definition:

char* pbasename( char* path )

Usage:

Returns the basename of a file.

path is the file path pointer.

Returns a pointer to the basename, which is a part of path.
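Because the result points into path itself, nothing is allocated and nothing needs to be freed. A minimal sketch of that idea follows; sk_basename() is a hypothetical stand-in, and it assumes that '/' is the only path separator, which may differ from what pbasename() does on each platform.

```c
#include <string.h>

/* Hypothetical stand-in for pbasename(); assumes '/' as separator. */
char* sk_basename( char* path )
{
    char* sep = strrchr( path, '/' );

    /* Without any separator, the whole path is the basename. */
    return sep ? sep + 1 : path;
}
```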

pdbl_to_str

Definition:

char* pdbl_to_str( double d )

Usage:

Converts a double-value into an allocated string buffer.

d is the double value to be converted. Trailing zeros behind the decimal point are removed after conversion, so 1.65000 becomes "1.65" in its string representation.

Returns a pointer to the newly allocated string, which contains the string-representation of the double value. This pointer must be released by the caller.
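The trimming of trailing zeros can be sketched in standard C as shown below. sk_dbl_to_str() is a hypothetical stand-in, not the library's code. Whether pdbl_to_str() also removes a dangling decimal point is not stated here; this sketch assumes it does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pdbl_to_str(); not the library's code. */
char* sk_dbl_to_str( double d )
{
    int   len = snprintf( NULL, 0, "%f", d ); /* measure first */
    char* str;
    char* tail;

    if( len < 0 || !( str = (char*)malloc( (size_t)len + 1 ) ) )
        return NULL;

    snprintf( str, (size_t)len + 1, "%f", d );

    /* Cut trailing zeros. For finite values "%f" prints a decimal
       point, so the loop stops there at the latest. */
    for( tail = str + strlen( str ) - 1; *tail == '0'; tail-- )
        *tail = '\0';

    /* Assumption of this sketch: drop a dangling decimal point too. */
    if( *tail == '.' )
        *tail = '\0';

    return str; /* the caller releases this buffer */
}
```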

pdbl_to_wcs

Definition:

wchar_t* pdbl_to_wcs( double d )

Usage:

Converts a double-value into an allocated wide-character string buffer.

d is the double value to be converted. Trailing zeros behind the decimal point are removed after conversion, so 1.65000 becomes L"1.65" in its wide-character string representation.

Returns a pointer to the newly allocated wide-character string, which contains the string-representation of the double value. This pointer must be released by the caller.

pfileexists

Definition:

pboolean pfileexists( char* filename )

Usage:

Checks for file existence.

filename is the path to a file that will be checked.

Returns TRUE if the file exists, FALSE if not.

pfiletostr

Definition:

pboolean pfiletostr( char** cont, char* filename )

Usage:

Maps the content of an entire file into memory.

cont is the pointer that receives the file content. filename is the path to the file to be mapped.

The function returns TRUE on success.

pfree

Definition:

void* pfree( void* ptr )

Usage:

Free allocated memory.

The function is a wrapper for the system-function free(), but accepts NULL-pointers and returns a (void*)NULL pointer for direct pointer memory reset.

It can be used this way to immediately reset a pointer to NULL:

ptr = pfree( ptr );

ptr is the pointer to be freed.

Always returns (void*)NULL.

pgetopt

Definition:

int pgetopt( char* opt, char** param, int* next, int argc, char** argv, char* optstr, char* loptstr, int idx )

Usage:

Implementation of a command-line option interpreter.

This function works similarly to the getopt() functions of the GNU C Library, but uses a different style of parameter passing.

It supports both short and long option-style parameters. This function is still under development, driven by the needs of the programs it is used in. It should not be regarded as compatible or feature-complete, and does not follow a clear concept right now.

The function returns 0, if the parameter with the given index was successfully evaluated. It returns 1, if there are still command-line parameters, but not as part of options. The parameter param will receive the given pointer. It returns -1 if no more options could be read, or if an option could not be evaluated (unknown option). In such case, param will hold a string to the option that is unknown to pgetopt().

plex_create

Definition:

plex* plex_create( int flags )

Usage:

Constructor function to create a new plex object.

flags can be a combination of compile- and runtime-flags and are merged with special compile-time flags provided for each pattern.

Flag Usage
PREGEX_COMP_WCHAR The regular expressions are provided as wchar_t.
PREGEX_COMP_NOANCHORS Ignore anchor tokens, handle them as normal characters.
PREGEX_COMP_NOREF Don't compile references.
PREGEX_COMP_NONGREEDY Compile all patterns to be forced nongreedy.
PREGEX_COMP_NOERRORS Don't report errors, and try to compile as much as possible.
PREGEX_COMP_INSENSITIVE Parse regular expressions as case insensitive.
PREGEX_COMP_STATIC The regular expressions passed should be converted 1:1 as if they were string constants. Any regex-specific symbols will be ignored and taken as if they were escaped.
PREGEX_RUN_WCHAR Run regular expressions with wchar_t as input.
PREGEX_RUN_NOANCHORS Ignore anchors while processing the lexer.
PREGEX_RUN_NOREF Don't create references.
PREGEX_RUN_NONGREEDY Force run lexer nongreedy.
PREGEX_RUN_DEBUG Debug mode; output some debug to stderr.

On success, the function returns the allocated pointer to a plex-object. This must be freed later using plex_free().

plex_define

Definition:

pboolean plex_define( plex* lex, char* pat, int match_id, int flags )

Usage:

Defines and parses a regular expression pattern into the plex-object.

pat is the regular expression string.

match_id must be a token match ID, a value > 0. The lower the match ID, the higher the precedence of the appended expression when there are multiple matches.

flags may ONLY contain compile-time flags, and is combined with the compile-time flags of the plex-object provided at plex_create().

Flag Usage
PREGEX_COMP_WCHAR The regular expressions are provided as wchar_t.
PREGEX_COMP_NOANCHORS Ignore anchor tokens, handle them as normal characters.
PREGEX_COMP_NOREF Don't compile references.
PREGEX_COMP_NONGREEDY Compile all patterns to be forced nongreedy.
PREGEX_COMP_NOERRORS Don't report errors, and try to compile as much as possible.
PREGEX_COMP_INSENSITIVE Parse regular expressions as case insensitive.
PREGEX_COMP_STATIC The regular expressions passed should be converted 1:1 as if they were string constants. Any regex-specific symbols will be ignored and taken as if they were escaped.

plex_free

Definition:

plex* plex_free( plex* lex )

Usage:

Destructor function for a plex-object.

lex is the pointer to a plex-structure that will be released.

Always returns (plex*)NULL.

plex_lex

Definition:

int plex_lex( plex* lex, char* start, char** end )

Usage:

Performs a lexical analysis using the object lex on pointer start.

If a token can be matched, the function returns the related id of the matching pattern, and end receives the pointer to the last matched character.

The function returns 0 if there is no match directly at start. In contrast, the function plex_next() ignores unrecognized symbols and moves directly to the next matching pattern.

plex_next

Definition:

char* plex_next( plex* lex, char* start, int* id, char** end )

Usage:

Performs a lexical analysis using lex, beginning at pointer start, up to the next matching token.

start has to be a zero-terminated string or wide-character string (according to the configuration of the plex-object).

If a token can be matched, the function returns the pointer to the position where the match starts. id receives the id of the matching pattern, end receives the end pointer of the match, when provided. id and end can be omitted by providing NULL-pointers.

The function returns (char*)NULL in case that there is no match.

plex_prepare

Definition:

pboolean plex_prepare( plex* lex )

Usage:

Prepares the DFA state machine of a plex-object lex for execution.

plex_reset

Definition:

pboolean plex_reset( plex* lex )

Usage:

Resets the DFA state machine of a plex-object lex.

plex_tokenize

Definition:

int plex_tokenize( plex* lex, char* start, parray** matches )

Usage:

Tokenizes the string beginning at start using the lexical analyzer lex.

start has to be a zero-terminated string or wide-character string (according to the configuration of the plex-object).

The function initializes and fills the array matches, if provided, with items of size prange. It returns the total number of matches.

plist_access

Definition:

void* plist_access( plistel* e )

Usage:

Access data-content of the current element e.

plist_clear

Definition:

pboolean plist_clear( plist* list )

Usage:

Clear content of the list list.

The function has nearly the same purpose as plist_erase(), except that the list is only cleared: if the list was initialized with PLIST_MOD_RECYCLE, existing pointers are held for later reuse.

plist_count

Definition:

int plist_count( plist* l )

Usage:

Return element count of list l.

plist_create

Definition:

plist* plist_create( size_t size, int flags )

Usage:

Create a new plist as an object with an element allocation size size. Providing a size of 0 causes automatic configuration of PLIST_MOD_PTR.

flags defines an optional flag configuration that modifies the behavior of the linked list and hash table usage. The flags can be merged together using bitwise or (|).

Possible flags are:

Use plist_free() to erase and release the returned list object.

plist_diff

Definition:

int plist_diff( plist* left, plist* right )

Usage:

Tests the contents (data parts) of the list left and the list right for equal elements.

The function returns a value < 0 if left is lower than right, a value > 0 if left is greater than right and a value == 0 if left is equal to right.

plist_dup

Definition:

plist* plist_dup( plist* list )

Usage:

Creates an independent copy of list and returns it.

All elements of list are duplicated and stand-alone.

plist_erase

Definition:

pboolean plist_erase( plist* list )

Usage:

Erase all allocated content of the list list.

The object list remains valid, but must be re-configured using plist_init().

plist_first

Definition:

plistel* plist_first( plist* l )

Usage:

Return first element of list l.

plist_free

Definition:

plist* plist_free( plist* list )

Usage:

Releases all the memory list uses and destroys the list object.

The function always returns (plist*)NULL.

plist_get

Definition:

plistel* plist_get( plist* list, size_t n )

Usage:

Retrieve list element by its index from the beginning.

The function returns the nth element of the list list.

plist_get_by_key

Definition:

plistel* plist_get_by_key( plist* list, char* key )

Usage:

Retrieve list element by hash-table key.

This function tries to fetch a list entry plistel from list list with the key key.

plist_get_by_ptr

Definition:

plistel* plist_get_by_ptr( plist* list, void* ptr )

Usage:

Retrieve list element by pointer.

This function returns the list element within the list list whose data pointer is ptr.

plist_hashnext

Definition:

plistel* plist_hashnext( plistel* u )

Usage:

Access next element with same hash value of current unit u.

plist_hashprev

Definition:

plistel* plist_hashprev( plistel* u )

Usage:

Access previous element with same hash value of a current unit u.

plist_init

Definition:

pboolean plist_init( plist* list, size_t size, int flags )

Usage:

Initialize the list list with an element allocation size size. flags defines an optional flag configuration that modifies the behavior of the linked list and hash table usage.

plist_insert

Definition:

plistel* plist_insert( plist* list, plistel* pos, char* key, void* src )

Usage:

Insert src as element into the list list before position pos.

If pos is NULL, the new element will be attached to the end of the list. If key is not NULL, the element will additionally be entered into the list's hash table.

plist_key

Definition:

char* plist_key( plistel* e )

Usage:

Access key-content of the current element e.

plist_last

Definition:

plistel* plist_last( plist* l )

Usage:

Return last element of list l.

plist_malloc

Definition:

void* plist_malloc( plist* list )

Usage:

Allocates memory for a new element in list list, pushes it to the end and returns the pointer to it.

The function works as a shortcut for plist_access() in combination with plist_push().

plist_next

Definition:

plistel* plist_next( plistel* u )

Usage:

Access next element of current unit u.

plist_offset

Definition:

int plist_offset( plistel* u )

Usage:

Return the offset of the unit u within the list it belongs to.

plist_pop

Definition:

pboolean plist_pop( plist* list, void* dest )

Usage:

Pop last element to dest off the list list.

As if list were a stack, the last element of the list is popped and its content is written to dest, if provided.

dest can be omitted and given as (void*)NULL, so the last element will be popped off the list and discarded.

plist_prev

Definition:

plistel* plist_prev( plistel* u )

Usage:

Access previous element of a current unit u.

plist_push

Definition:

plistel* plist_push( plist* list, void* src )

Usage:

Push src to end of list.

As if list were a stack, src is pushed at the end of the list. This function can only be used for linked lists without the hash-table feature in use.

plist_remove

Definition:

pboolean plist_remove( plist* list, plistel* e )

Usage:

Removes the element e from the list and frees it, or puts it into the unused element chain if PLIST_MOD_RECYCLE is flagged.

plist_rget

Definition:

plistel* plist_rget( plist* list, size_t n )

Usage:

Retrieve list element by its index from the end.

The function returns the nth element of the list list from the right.

plist_rmalloc

Definition:

void* plist_rmalloc( plist* list )

Usage:

Allocates memory for a new element in list list, shifts it at the beginning and returns the pointer to it.

The function works as a shortcut for plist_access() in combination with plist_shift().

plist_set_comparefn

Definition:

pboolean plist_set_comparefn( plist* list, int (*comparefn)( plist*, plistel*, plistel* ) )

Usage:

Set the compare function.

plist_set_printfn

Definition:

pboolean plist_set_printfn( plist* list, void (*printfn)( plist* ) )

Usage:

Set an element dump function.

plist_set_sortfn

Definition:

pboolean plist_set_sortfn( plist* list, int (*sortfn)( plist*, plistel*, plistel* ) )

Usage:

Set the sort function.

plist_shift

Definition:

plistel* plist_shift( plist* list, void* src )

Usage:

Shift src at the beginning of list.

As if list were a queue, src is shifted at the beginning of the list. This function can only be used for linked lists without the hash-table feature in use.

plist_size

Definition:

int plist_size( plist* l )

Usage:

Return element size of list l.

plist_sort

Definition:

pboolean plist_sort( plist* list )

Usage:

Sorts list according to the sort-function that was set for the list.

To sort only parts of a list, use plist_subsort().

The sort-function can be modified by using plist_set_sortfn(). The default sort function sorts the list by its contents, internally using the memcmp() standard function.

plist_subsort

Definition:

pboolean plist_subsort( plist* list, plistel* from, plistel* to )

Usage:

Sorts list between the elements from and to according to the sort-function that was set for the list.

To sort the entire list, use plist_sort().

The sort-function can be modified by using plist_set_sortfn(). The default sort function sorts the list by its contents, internally using the memcmp() standard function.

plist_swap

Definition:

pboolean plist_swap( plistel* a, plistel* b )

Usage:

Swaps the positions of the list elements a and b with each other. The elements must be in the same plist object, else the function returns FALSE.

plist_union

Definition:

int plist_union( plist* all, plist* from )

Usage:

Unions elements from list from into list all.

An element is only added to all, if there exists no other element with the same size and content.

The function will not run if both lists have different element size settings.

The function returns the number of elements added to all.

plist_unshift

Definition:

pboolean plist_unshift( plist* list, void* dest )

Usage:

Take first element to dest from the list list.

As if list were a queue, the first element of the list is taken and its content is written to dest.

dest can be omitted and given as (void*)NULL, so the first element will be taken from the list and discarded.

pmalloc

Definition:

void* pmalloc( size_t size )

Usage:

Dynamically allocate heap memory.

The function is a wrapper for the system function malloc(), but initializes the memory to zero and immediately stops the program if no more memory can be allocated.

size is the size of memory to be allocated, in bytes.

The function returns the allocated heap memory pointer. The returned memory address should be freed using pfree() after it is not required anymore.
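The two guarantees named above (zeroed memory, no NULL result) can be sketched with calloc() and an exit on failure. sk_malloc() is a hypothetical stand-in; how pmalloc() actually reports the failure and terminates is an assumption of this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for pmalloc(); not the library's code. */
void* sk_malloc( size_t size )
{
    /* calloc() hands out zero-initialized memory. A request of 0 bytes
       is rounded up to 1 so that NULL always means "out of memory". */
    void* ptr = calloc( 1, size ? size : 1 );

    if( !ptr )
    {
        /* Assumption: print a message and stop the program. */
        fprintf( stderr, "sk_malloc: out of memory\n" );
        exit( EXIT_FAILURE );
    }

    return ptr;
}
```

With such a wrapper, callers can use the returned pointer without checking it for NULL first.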

pmemdup

Definition:

void* pmemdup( void* ptr, size_t size )

Usage:

Duplicates a memory entry onto the heap.

ptr is the pointer to the memory to be duplicated. size is the size of pointer's data storage.

Returns the new pointer to the memory copy. This should be cast back to the type of ptr again.
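Duplicating a block of memory amounts to an allocation followed by a copy. The sketch below shows this with the standard library only; sk_memdup() is a hypothetical stand-in, and returning NULL for a NULL pointer or a size of 0 is an assumption that pmemdup() may handle differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pmemdup(); not the library's code. */
void* sk_memdup( const void* ptr, size_t size )
{
    void* copy;

    /* Assumption: nothing to duplicate yields NULL. */
    if( !ptr || !size )
        return NULL;

    if( ( copy = malloc( size ) ) )
        memcpy( copy, ptr, size );

    return copy; /* cast back to the type of ptr at the call site */
}
```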

pp_ast_create

Definition:

ppast* pp_ast_create( char* emit, ppsym* sym, ppprod* prod, char* start, char* end, int row, int col, ppast* child )

Usage:

Creates new abstract syntax tree node.

pp_ast_dump

Definition:

void pp_ast_dump( FILE* stream, ppast* ast )

Usage:

Dump detailed ast to stream.

pp_ast_dump_json

Definition:

void pp_ast_dump_json( FILE* stream, ppast* ast )

Usage:

Dump ast to stream as JSON-formatted string.

Only opening matches are printed.

pp_ast_dump_short

Definition:

void pp_ast_dump_short( FILE* stream, ppast* ast )

Usage:

Dump simplified ast to stream.

Only opening matches are printed.

pp_ast_dump_tree2svg

Definition:

void pp_ast_dump_tree2svg( FILE* stream, ppast* ast )

Usage:

Dump ast in notation for the tree2svg tool that generates a graphical view of the parse tree.

pp_ast_free

Definition:

ppast* pp_ast_free( ppast* node )

Usage:

Frees entire ast structure and subsequent links.

Always returns (ppast*)NULL.

pp_ast_get

Definition:

ppast* pp_ast_get( ppast* node, int n )

Usage:

Returns the nth element of node.

pp_ast_len

Definition:

int pp_ast_len( ppast* node )

Usage:

Returns length of node chain.

pp_ast_select

Definition:

ppast* pp_ast_select( ppast* node, char* emit, int n )

Usage:

Returns the nth element matching emit emit starting at node.

pp_create

Definition:

pparse* pp_create( int flags, char* bnf )

Usage:

Creates a new parser object with flags flags and the grammar bnf.

pp_free

Definition:

pparse* pp_free( pparse* par )

Usage:

Free parser par.

pp_gram_create

Definition:

ppgram* pp_gram_create( void )

Usage:

Creates a new ppgram-object.

pp_gram_dump

Definition:

void pp_gram_dump( FILE* stream, ppgram* g )

Usage:

Dumps the grammar g to stream.

pp_gram_free

Definition:

ppgram* pp_gram_free( ppgram* g )

Usage:

Frees grammar g and all its related memory.

pp_gram_from_bnf

Definition:

pboolean pp_gram_from_bnf( ppgram* g, char* bnf )

Usage:

Compiles a grammar definition into a grammar.

g is the grammar that receives the result of the parse. bnf is the BNF definition string that defines the grammar.

pp_gram_prepare

Definition:

pboolean pp_gram_prepare( ppgram* g )

Usage:

Prepares the grammar g by computing all necessary stuff required for runtime and parser generator.

The preparation process includes:

This function is only run internally. Don't call it if you're unsure ;)...

pp_ll_parse

Definition:

pboolean pp_ll_parse( parray** ast, ppgram* grm, char* start, char** end )

Usage:

Parses the string start with the grammar grm using an LL(1) parser. Parsing stops at the latest when the zero terminator of start is read.

ast receives an allocated parray-object with items of ppmatch that describe the produced abstract syntax tree.

end receives the position of the last character matched. The function returns TRUE if no parse error occurred.

pp_lr_parse

Definition:

pboolean pp_lr_parse( ppast** root, ppgram* grm, char* start, char** end )

Usage:

Parses the string start using the grammar grm with a LALR(1) parser.

Parsing stops at the latest when the zero terminator of start is read.

root receives the produced abstract syntax tree.

end receives the position of the last character matched. The function returns TRUE if no parse error occurred.

pp_parse_to_ast

Definition:

pboolean pp_parse_to_ast( ppast** root, pparse* par, char* start, char** end )

Usage:

Run parser par with input start.

The function uses the parsing method defined when the parser was created. end receives a pointer to the position where the parser stopped.

It returns a parse-tree to root on success.

pp_prod_append

Definition:

pboolean pp_prod_append( ppprod* p, ppsym* sym )

Usage:

Appends the symbol sym to the right-hand-side of production p.

pp_prod_create

Definition:

ppprod* pp_prod_create( ppgram* g, ppsym* lhs, ... )

Usage:

Creates a new production on left-hand-side lhs within the grammar g.

pp_prod_drop

Definition:

ppprod* pp_prod_drop( ppprod* p )

Usage:

Frees the production object p and releases any used memory.

pp_prod_get

Definition:

ppprod* pp_prod_get( ppgram* g, int n )

Usage:

Get the nth production from grammar g. Returns (ppprod*)NULL if no production was found.

pp_prod_getfromrhs

Definition:

ppsym* pp_prod_getfromrhs( ppprod* p, int off )

Usage:

Returns the element at offset off from the right-hand-side of production p. Returns (ppsym*)NULL if the requested element does not exist.

pp_prod_remove

Definition:

int pp_prod_remove( ppprod* p, ppsym* sym )

Usage:

Removes all occurrences of symbol sym from the right-hand-side of production p.

pp_prod_to_str

Definition:

char* pp_prod_to_str( ppprod* p )

Usage:

Returns the string representation of production p.

The returned pointer is part of p and can be referenced multiple times. It must not be freed by the caller.

pp_sym_create

Definition:

ppsym* pp_sym_create( ppgram* g, ppsymtype type, char* name, char* def )

Usage:

Creates a new symbol of the type type in the grammar g.

name is the name for nonterminal symbols; for terminal symbols it can be left empty. def contains the definition of the symbol in case of a terminal type. Otherwise it is ignored.

pp_sym_drop

Definition:

ppsym* pp_sym_drop( ppsym* sym )

Usage:

Frees a symbol.

pp_sym_get

Definition:

ppsym* pp_sym_get( ppgram* g, int n )

Usage:

Get the nth symbol from grammar g. Returns (ppsym*)NULL if no symbol was found.

pp_sym_get_by_name

Definition:

ppsym* pp_sym_get_by_name( ppgram* g, char* name )

Usage:

Get a symbol from grammar g by its name.

pp_sym_get_nameless_term_by_def

Definition:

ppsym* pp_sym_get_nameless_term_by_def( ppgram* g, char* name )

Usage:

Find a nameless terminal symbol by its pattern.

pp_sym_getprod

Definition:

ppprod* pp_sym_getprod( ppsym* sym, int n )

Usage:

Get the nth production from symbol sym. sym must be of type nonterminal.

Returns (ppprod*)NULL if the production is not found or the symbol is configured differently.

pp_sym_to_str

Definition:

char* pp_sym_to_str( ppsym* sym )

Usage:

Returns the string representation of symbol sym.

Nonterminals are not expanded, they are just returned as their name. The returned pointer is part of sym and can be referenced multiple times. It must not be freed by the caller.

prealloc

Definition:

void* prealloc( void* oldptr, size_t size )

Usage:

Dynamically (re)allocate memory on the heap.

The function is a wrapper for the system-function realloc(), but always accepts a NULL-pointer and immediately stops the program if no more memory can be allocated.

oldptr is the pointer to be reallocated. If this is (void*)NULL, prealloc() works like a normal call to pmalloc().

size is the size of memory to be reallocated, in bytes.

The function returns the allocated heap memory pointer. The returned memory address should be freed using pfree() after it is not required anymore.

pregex_create

Definition:

pregex* pregex_create( char* pat, int flags )

Usage:

Constructor function to create a new pregex object.

pat is a string providing a regular expression pattern. flags can be a combination of compile- and runtime-flags.

Flag Usage
PREGEX_COMP_WCHAR The regular expression pat is provided as wchar_t.
PREGEX_COMP_NOANCHORS Ignore anchor tokens, handle them as normal characters.
PREGEX_COMP_NOREF Don't compile references.
PREGEX_COMP_NONGREEDY Compile regex to be forced nongreedy.
PREGEX_COMP_NOERRORS Don't report errors, and try to compile as much as possible.
PREGEX_COMP_INSENSITIVE Parse regular expression as case insensitive.
PREGEX_COMP_STATIC The regular expression passed should be converted 1:1 as if it were a string constant. Any regex-specific symbols will be ignored and taken as if they were escaped.
PREGEX_RUN_WCHAR Run regular expression with wchar_t as input.
PREGEX_RUN_NOANCHORS Ignore anchors while processing the regex.
PREGEX_RUN_NOREF Don't create references.
PREGEX_RUN_NONGREEDY Force run regular expression nongreedy.
PREGEX_RUN_DEBUG Debug mode; output some debug to stderr.

On success, the function returns the allocated pointer to a pregex-object. This must be freed later using pregex_free().

pregex_find

Definition:

char* pregex_find( pregex* regex, char* start, char** end )

Usage:

Find a match for the regular expression regex, beginning at pointer start.

start has to be a zero-terminated string or wide-character string (according to the configuration of the pregex-object).

If the expression can be matched, the function returns the pointer to the position where the match begins. end receives the end pointer of the match, when provided.

The function returns (char*)NULL in case that there is no match.

pregex_findall

Definition:

int pregex_findall( pregex* regex, char* start, parray** matches )

Usage:

Find all matches for the regular expression regex, beginning at pointer start, and optionally return matches as an array.

start has to be a zero-terminated string or wide-character string (according to the configuration of the pregex-object).

The function fills the array matches, if provided, with items of size prange. It returns the total number of matches.

pregex_free

Definition:

pregex* pregex_free( pregex* regex )

Usage:

Destructor function for a pregex-object.

regex is the pointer to a pregex-structure that will be released.

Always returns (pregex*)NULL.

pregex_match

Definition:

pboolean pregex_match( pregex* regex, char* start, char** end )

Usage:

Tries to match the regular expression regex at pointer start.

If the expression can be matched, the function returns TRUE and end receives the pointer to the last matched character.

pregex_qmatch

Definition:

int pregex_qmatch( char* regex, char* str, int flags, parray** matches )

Usage:

Performs a regular expression match on a string, and returns an array of matches via prange-structures, which hold pointers to the begin- and end-addresses of all matches.

regex is the regular expression pattern to be processed.

str is the string on which the pattern will be executed.

flags are for regular expression compile- and runtime-mode switching. Several of them can be used with the bitwise or-operator (|).

matches is the array of results to the matched substrings within str, provided as parray-object consisting of one prange-object for every match. It is optional. matches must be released with parray_free() after its usage.

Returns the number of matches, which is the amount of result entries in the returned array matches. If the value is negative, an error occurred.

pregex_qreplace

Definition:

char* pregex_qreplace( char* regex, char* str, char* replace, int flags )

Usage:

Replaces all matches of a regular expression pattern within a string with the replacement. Backreferences can be used with $x for each opening bracket within the regular expression.

regex is the regular expression pattern to be processed.

str is the string on which the pattern will be executed.

replace is the string that will be inserted as replacement for each pattern match. $x back-references can be used.

flags are for regular expression compile- and runtime-mode switching. Several of them can be used with the bitwise or-operator (|).

Returns an allocated pointer to the generated string with the replacements. This string must be released after its existence is no longer required by the caller using pfree().

pregex_qsplit

Definition:

int pregex_qsplit( char* regex, char* str, int flags, parray** matches )

Usage:

Performs a regular expression search on a string and uses the expression as separator. All strings that were split are returned as matches-array.

regex is the regular expression pattern to be processed.

str is the string on which the pattern will be executed.

flags are for regular expression compile- and runtime-mode switching. Several of them can be used with the bitwise or-operator (|).

matches is the array of results to the matched substrings within str, provided as parray-object consisting of one prange-object for every match. It is optional. matches must be released with parray_free() after its usage.

Returns the number of split substrings, which is the amount of result entries in the returned array matches. If the value is negative, an error occurred.

pregex_replace

Definition:

char* pregex_replace( pregex* regex, char* str, char* replacement )

Usage:

Replaces all matches of a regular expression object within a string str with replacement. Backreferences in replacement can be used with $x for each opening bracket within the regular expression.

regex is the pregex-object used for pattern matching. str is the string on which regex will be executed. replacement is the string that will be inserted as the replacement for each match of a pattern described in regex. The notation $x can be used for backreferences, where x is the offset of opening brackets in the pattern, beginning at 1.

The function returns the string with the replaced elements, or (char*)NULL in error case.

pregex_split

Definition:

char* pregex_split( pregex* regex, char* start, char** end, char** next )

Usage:

Returns the range between string start and the next match of regex.

This function can be seen as a "negative match", so the substrings that are not part of the match will be returned.

start has to be a zero-terminated string or wide-character string (according to the configuration of the pregex-object). end receives the last position of the string before the regex. next receives the pointer of the next split element behind the matched substring, so next should become the next start when pregex_split() is called in a loop.

The function returns (char*)NULL in case that there is no more string to split, else it returns start.
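The calling convention of feeding next back in as start can be sketched without the regular expression engine. In the following, sk_split() is a hypothetical stand-in that splits at a single separator character instead of a pattern. Two details are assumptions of the sketch and not taken from the library: end points just behind the last character of the piece, and next is set to NULL after the last piece.

```c
#include <string.h>

/* Hypothetical stand-in that only mirrors the start/end/next protocol
   of pregex_split(); sep must not be the zero terminator. */
char* sk_split( char sep, char* start, char** end, char** next )
{
    char* hit;

    if( !start )
        return NULL;                    /* no more string to split */

    if( ( hit = strchr( start, sep ) ) )
    {
        *end = hit;                     /* piece stops before the separator */
        *next = hit + 1;                /* continue behind the separator */
    }
    else
    {
        *end = start + strlen( start ); /* last piece runs to the terminator */
        *next = NULL;                   /* assumption: NULL marks the end */
    }

    return start;
}

/* Counts the pieces of str by passing next on as the new start. */
int sk_count_pieces( char* str, char sep )
{
    char* start = str;
    char* end;
    char* next;
    int   count = 0;

    while( sk_split( sep, start, &end, &next ) )
    {
        count++;        /* the piece is the range from start up to end */
        start = next;
    }

    return count;
}
```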

pregex_splitall

Definition:

int pregex_splitall( pregex* regex, char* start, parray** matches )

Usage:

Split a string at all matches of the regular expression regex, beginning at pointer start, and optionally return the split substrings as an array.

start has to be a zero-terminated string or wide-character string (according to the configuration of the pregex-object).

The function fills the array matches, if provided, with items of size prange. It returns the total number of matches.

pstr_to_wcs

Definition:

wchar_t* pstr_to_wcs( char* str, pboolean freestr )

Usage:

This function converts a UTF-8 multi-byte string into a Unicode wide-character string.

The string conversion is performed into dynamically allocated memory. The function wraps mbstowcs(), so setlocale() must be called before this function works properly.

str is the zero-terminated multi-byte-character string to be converted into a wide-character string. freestr defines if the input-string shall be freed after successful conversion, if set to TRUE.

Returns the wide-character equivalent of str as pointer to dynamically allocated memory.
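A conversion that wraps mbstowcs() into dynamically allocated memory can be sketched as follows. sk_str_to_wcs() is a hypothetical stand-in, not the library's code, and it leaves out the freestr parameter. The buffer is sized by the byte length of the input, which is an upper bound because every wide character consumes at least one byte.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for pstr_to_wcs(); not the library's code. */
wchar_t* sk_str_to_wcs( const char* str )
{
    size_t   max = strlen( str ) + 1;   /* upper bound incl. terminator */
    wchar_t* wstr = (wchar_t*)malloc( max * sizeof( wchar_t ) );

    if( !wstr )
        return NULL;

    /* mbstowcs() interprets str according to the current locale. */
    if( mbstowcs( wstr, str, max ) == (size_t)-1 )
    {
        free( wstr );   /* invalid multi-byte sequence */
        return NULL;
    }

    return wstr; /* the caller releases this buffer */
}
```

For UTF-8 input, the program would call setlocale( LC_ALL, "" ) (with a UTF-8 locale configured) before the first conversion; plain ASCII text converts in the default "C" locale as well.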

pstrcasecmp

Definition:

int pstrcasecmp( char* s1, char* s2 )

Usage:

Compares two strings, ignoring case.

s1 is the string to compare with s2. s2 is the string to compare with s1.

Returns 0 if both strings are equal. Returns a value <0 if s1 is lower than s2 or a value >0 if s1 is greater than s2.
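The return convention above matches that of strcmp(), applied to case-folded characters. A minimal sketch, assuming non-NULL arguments and single-byte characters; sk_strcasecmp() is a hypothetical stand-in and not the library's code.

```c
#include <ctype.h>

/* Hypothetical stand-in for pstrcasecmp(); not the library's code. */
int sk_strcasecmp( const char* s1, const char* s2 )
{
    unsigned char c1;
    unsigned char c2;

    /* Walk both strings until they differ or s1 ends. */
    do
    {
        c1 = (unsigned char)tolower( (unsigned char)*s1++ );
        c2 = (unsigned char)tolower( (unsigned char)*s2++ );
    }
    while( c1 && c1 == c2 );

    /* <0: s1 is lower, 0: equal, >0: s1 is greater. */
    return (int)c1 - (int)c2;
}
```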

pstrcatchar

Definition:

char* pstrcatchar( char* str, char chr )

Usage:

Dynamically appends a character to a string.

str is the pointer to a string to be appended to. If this is (char*)NULL, the string will be newly allocated. chr is the character to be appended to str.

Returns a char*-pointer to the (possibly re-)allocated and appended string. (char*)NULL is returned if no memory could be (re)allocated. This pointer must be released with pfree() when its existence is no longer required.
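Growing a heap string by one character can be sketched with realloc(), which also covers the case of a NULL input. sk_strcatchar() is a hypothetical stand-in and not the library's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pstrcatchar(); not the library's code. */
char* sk_strcatchar( char* str, char chr )
{
    size_t len = str ? strlen( str ) : 0;

    /* realloc( NULL, n ) behaves like malloc( n ), so a NULL input
       simply starts a new string. Room for chr and the terminator. */
    char* out = (char*)realloc( str, len + 2 );

    if( !out )
        return NULL;

    out[ len ] = chr;
    out[ len + 1 ] = '\0';

    return out;
}
```

Since the buffer may move, the caller assigns the result back to its own pointer, as in s = sk_strcatchar( s, 'x' ).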

pstrcatstr

Definition:

char* pstrcatstr( char* dest, char* src, pboolean freesrc )

Usage:

Dynamically appends a zero-terminated string to a dynamic string.

dest is the pointer to a zero-terminated string to be appended. If this is (char*)NULL, the string is newly allocated.

src is the string to be appended at the end of dest.

freesrc, if set to TRUE, makes the function free the pointer provided as src automatically. This parameter is provided for convenience only.

Returns a char*-pointer to the (possibly re-)allocated and appended string. (char*)NULL is returned if no memory could be (re)allocated, or both strings were NULL. If dest is NULL and freesrc is FALSE, the function automatically returns the pointer src. This pointer must be released with pfree() when its existence is no longer required.

pstrdup

Definition:

char* pstrdup( char* str )

Usage:

Duplicate a string in memory.

str is the string to be copied in memory. If str is provided as NULL, the function will also return NULL.

Returns a char*-pointer to the newly allocated copy of str. This pointer must be released with pfree() when its existence is no longer required.

pstrget

Definition:

char* pstrget( char* str )

Usage:

Safely reads a string.

str is the string pointer to be safely read. If str is NULL, the function returns a pointer to a static address holding an empty string.

pstrlen

Definition:

size_t pstrlen( char* str )

Usage:

Return length of a string.

str is the parameter string to be evaluated. If (char*)NULL, the function returns 0. pstrlen() is safer than strlen() because it returns 0 when a NULL-pointer is provided.

Returns the length of the string str.

pstrltrim

Definition:

char* pstrltrim( char* s )

Usage:

Removes whitespace on the left of a string.

s is the string to be left-trimmed.

Returns s.
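Since the function returns s itself, the trimming has to happen in place. A minimal sketch of one way to do this, assuming isspace() defines what counts as whitespace; ltrim_sketch() is a hypothetical stand-in:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in: removes leading whitespace in place. */
char* ltrim_sketch( char* s )
{
    char* p = s;

    assert( s ); /* this sketch does not handle NULL */

    while( isspace( (unsigned char)*p ) )
        p++;

    if( p != s )
        memmove( s, p, strlen( p ) + 1 ); /* + 1 moves the terminator too */

    return s;
}
```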

pstrlwr

Definition:

char* pstrlwr( char* s )

Usage:

Converts a string to lower case.

s acts as both input and output string.

Returns s.

pstrncasecmp

Definition:

int pstrncasecmp( char* s1, char* s2, size_t n )

Usage:

Compares two strings, ignoring case, up to a maximum of n bytes.

s1 is the string to compare with s2. s2 is the string to compare with s1. n is the number of bytes to compare.

Returns 0 if both strings are equal. Returns a value <0 if s1 is lower than s2 or a value >0 if s1 is greater than s2.

pstrncatstr

Definition:

char* pstrncatstr( char* str, char* append, size_t n )

Usage:

Dynamically appends up to n characters from one string to another string.

The function works similarly to pstrcatstr(), but copies only a maximum of n characters from append.

str is the pointer to a string to be appended. If this is (char*)NULL, the string is newly allocated. append is the beginning of the character sequence to be appended. n is the number of characters to be appended to str.

Returns a char*-pointer to the (possibly re-)allocated and appended string. (char*)NULL is returned if no memory could be (re)allocated, or both strings were NULL. This pointer must be released with pfree() when its existence is no longer required.

pstrndup

Definition:

char* pstrndup( char* str, size_t len )

Usage:

Duplicates len characters from a string in memory.

The function mixes the functionalities of strdup() and strncpy(). The resulting string will be zero-terminated.

str is the parameter string to be duplicated. If this is provided as (char*)NULL, the function will also return (char*)NULL. len is the number of characters to be copied and duplicated from str. If len is greater than the length of str, copying will stop at the zero terminator.

Returns a char*-pointer to the allocated memory holding the zero-terminated string duplicate. This pointer must be released with pfree() when its existence is no longer required.
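The bounded duplication described above can be sketched as follows. ndup_sketch() is a hypothetical stand-in that uses malloc(), so its result is released with free() rather than pfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: duplicates at most len characters of str. */
char* ndup_sketch( const char* str, size_t len )
{
    size_t n;
    char*  out;

    if( !str )
        return NULL;

    /* Stop at the zero terminator if str is shorter than len. */
    for( n = 0; n < len && str[ n ]; n++ )
        ;

    assert( n <= len );

    if( !( out = malloc( n + 1 ) ) )
        return NULL;

    memcpy( out, str, n );
    out[ n ] = '\0'; /* unlike strncpy(), the result is always terminated */
    return out;
}
```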

pstrput

Definition:

char* pstrput( char** str, char* val )

Usage:

Assigns a string to a dynamically allocated pointer. pstrput() manages the assignment of a dynamically allocated string.

str is a pointer receiving the target pointer to be (re)allocated. If str already references a string, this pointer will be freed and reassigned to a copy of val.

val is the string to be assigned to str (as an independent copy).

Returns a pointer to the allocated heap memory on success, (char*)NULL otherwise. This is the same pointer as obtained by reading *str. The returned pointer must be released with pfree() or another call of pstrput(). Calling pstrput() as pstrput( &p, (char*)NULL ); is equivalent to p = pfree( p ).
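The free-and-reassign behaviour described above might look like the following sketch. It is an assumption-laden illustration only: put_sketch() is a hypothetical stand-in built on malloc() and free(), and its behaviour on allocation failure (leaving *str untouched) is a choice made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: replaces *str by an independent copy of val. */
char* put_sketch( char** str, const char* val )
{
    char* copy = NULL;

    assert( str ); /* the target pointer itself is required */

    if( val )
    {
        if( !( copy = malloc( strlen( val ) + 1 ) ) )
            return NULL; /* on failure, *str is left untouched here */

        strcpy( copy, val );
    }

    /* Copy first, free afterwards, so val may even alias *str. */
    free( *str ); /* free( NULL ) is a no-op */
    *str = copy;
    return copy;
}
```

Initialising the target pointer to NULL before the first call matters, because the old value is always passed to free().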

pstrrender

Definition:

char* pstrrender( char* tpl, ... )

Usage:

String rendering function.

Inserts multiple values dynamically into the corresponding wildcard positions of a template string. The function is comparable to pstrreplace(), but replaces multiple substrings by multiple replacement strings.

tpl is the template string to be rendered with values. ... is the set of values to be inserted at the desired positions.

These consist of three values each:

Returns an allocated string which is the resulting source. This string must be released by pfree() or another function releasing heap memory when its existence is no longer required.

pstrreplace

Definition:

char* pstrreplace( char* str, char* find, char* replace )

Usage:

Replace a substring sequence within a string.

str is the string to be replaced in. find is the substring to be matched. replace is the string to be inserted for each match of the substring find.

Returns a pointer to char* containing the allocated string which is the resulting source. This pointer must be released with pfree() when its existence is no longer required.
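A substring replacement that returns a newly allocated result is commonly done in two passes: one to count the matches and size the buffer, one to copy. The sketch below illustrates this approach; replace_sketch() is a hypothetical stand-in, and whether pstrreplace() works this way internally is not stated here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: returns a new string with every occurrence
   of find in str replaced by replace. */
char* replace_sketch( const char* str, const char* find, const char* replace )
{
    size_t      flen, rlen, count = 0;
    const char* p;
    const char* hit;
    char*       out;
    char*       w;

    assert( str && find && replace );

    flen = strlen( find );
    rlen = strlen( replace );

    /* First pass: count the matches (an empty find matches nothing). */
    if( flen )
        for( p = str; ( hit = strstr( p, find ) ); p = hit + flen )
            count++;

    if( !( out = malloc( ( strlen( str ) - count * flen ) + count * rlen + 1 ) ) )
        return NULL;

    /* Second pass: copy the text between matches and the replacements. */
    w = out;
    p = str;

    while( flen && ( hit = strstr( p, find ) ) )
    {
        memcpy( w, p, (size_t)( hit - p ) );
        w += hit - p;
        memcpy( w, replace, rlen );
        w += rlen;
        p = hit + flen;
    }

    strcpy( w, p ); /* remaining tail, including the terminator */
    return out;
}
```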

pstrrtrim

Definition:

char* pstrrtrim( char* s )

Usage:

Removes trailing whitespace on the right of a string.

s is the string to be right-trimmed.

Returns s.

pstrsplit

Definition:

int pstrsplit( char*** tokens, char* str, char* sep, int limit )

Usage:

Splits a string at a delimiting token and returns an allocated array of token reference pointers.

tokens receives the allocated array of tokens; this requires a pointer to char**. str is the input string to be tokenized. sep is the token separation substring. limit is the token limit; if set to 0, there is no token limit, so as many tokens as possible are read.

Returns the number of separated tokens, or -1 on error.

pstrtrim

Definition:

char* pstrtrim( char* s )

Usage:

Removes leading and trailing whitespace from a string.

s is the string to be trimmed.

Returns s.

pstrunescape

Definition:

char* pstrunescape( char* str )

Usage:

Converts a string with included escape-sequences back into its natural form.

The following table shows escape sequences which are converted.

Sequence    is replaced by
\n          newline
\t          tabulator
\r          carriage-return
\b          backspace
\f          form feed
\a          bell / alert
\'          single-quote
\"          double-quote

The replacement is done within the memory bounds of str itself, because the unescaped version of a character requires less space than its escape sequence.

The function always returns its input pointer.

Example:

char* s = (char*)NULL;

/* Doubled backslashes keep the escape sequences literal in the C source. */
pstrput( &s, "\\tHello\\nWorld!" );
printf( ">%s<\n", pstrunescape( s ) );

s = pfree( s );
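The in-place conversion can be illustrated with a read and a write position that walk the same buffer. The sketch below is a hypothetical stand-in, unescape_sketch(), that handles only the sequences from the table above and leaves anything else unchanged; the toolkit's function may recognise further sequences.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in: unescapes the sequences from the table in place. */
char* unescape_sketch( char* str )
{
    char* r; /* read position */
    char* w; /* write position, never ahead of r */

    assert( str );

    for( r = w = str; *r; r++ )
    {
        if( *r == '\\' && r[ 1 ] )
        {
            switch( *++r )
            {
                case 'n':  *w++ = '\n'; break;
                case 't':  *w++ = '\t'; break;
                case 'r':  *w++ = '\r'; break;
                case 'b':  *w++ = '\b'; break;
                case 'f':  *w++ = '\f'; break;
                case 'a':  *w++ = '\a'; break;
                case '\'': *w++ = '\''; break;
                case '"':  *w++ = '"';  break;
                default:   /* unknown sequence: keep both characters */
                    *w++ = '\\';
                    *w++ = *r;
                    break;
            }
        }
        else
            *w++ = *r;
    }

    *w = '\0';
    return str;
}
```

Because every recognised sequence shrinks from two characters to one, the write position can never overtake the read position, which is what makes the in-place rewrite safe.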

pstrupr

Definition:

char* pstrupr( char* s )

Usage:

Converts a string to upper case.

s acts as both input and output string.

Returns s.

pvasprintf

Definition:

int pvasprintf( char** str, char* fmt, va_list ap )

Usage:

Implementation of and replacement for vasprintf.

str is the pointer receiving the resulting, allocated string pointer. fmt is the format string. ap is the argument list holding the parameters according to the placeholders set in fmt.

Returns the number of characters written, or -1 in case of an error.
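A portable way to obtain vasprintf-like behaviour is to call vsnprintf() twice: once to measure, once to render. The sketch below shows this technique under assumptions; vasprintf_sketch() and the variadic wrapper asprintf_sketch() are hypothetical names, C99 is required for va_copy(), and the result is released with free().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in: formats into a newly allocated string. */
int vasprintf_sketch( char** str, const char* fmt, va_list ap )
{
    va_list probe;
    int     len;

    assert( str && fmt );

    /* First pass on a copy of ap: measure the required length. */
    va_copy( probe, ap );
    len = vsnprintf( NULL, 0, fmt, probe );
    va_end( probe );

    if( len < 0 || !( *str = malloc( (size_t)len + 1 ) ) )
        return -1;

    /* Second pass: render into the allocated buffer. */
    return vsnprintf( *str, (size_t)len + 1, fmt, ap );
}

/* Variadic convenience wrapper around the function above. */
int asprintf_sketch( char** str, const char* fmt, ... )
{
    va_list ap;
    int     len;

    va_start( ap, fmt );
    len = vasprintf_sketch( str, fmt, ap );
    va_end( ap );

    return len;
}
```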

pvm_create

Definition:

pvm* pvm_create( void )

Usage:

Creates a new virtual machine.

pvm_define

Definition:

int pvm_define( pvm* vm, char* mn, pvmop op )

Usage:

Implements mnemonic mn with operational function op into vm.

Returns the operation's opcode, or a value < 0 on error.

pvm_free

Definition:

pvm* pvm_free( pvm* vm )

Usage:

Frees a virtual machine.

pvm_init

Definition:

pvm* pvm_init( pvm* vm )

Usage:

Initializes the virtual machine vm.

pvm_prog_run

Definition:

void pvm_prog_run( pvmprog* prog )

Usage:

Runs the virtual machine program prog.

pvm_reset

Definition:

pvm* pvm_reset( pvm* vm )

Usage:

Resets a virtual machine vm.

pwcs_to_str

Definition:

char* pwcs_to_str( wchar_t* str, pboolean freestr )

Usage:

This function converts a wide-character string into a UTF-8 string.

The string conversion is performed into dynamically allocated memory. The function wraps the system function wcstombs(), so setlocale() must be called before this function works properly.

str is the zero-terminated string to be converted to UTF-8. freestr defines whether the input string shall be freed after successful conversion, if set to TRUE.

Returns the UTF-8 counterpart of str as a pointer to dynamically allocated memory.

pwcscatchar

Definition:

wchar_t* pwcscatchar( wchar_t* str, wchar_t chr )

Usage:

Appends a character to a dynamic wide-character string.

str is the pointer to a wchar_t-string to be appended. If this is (wchar_t*)NULL, the string is newly allocated. chr is the character to be appended to str.

Returns a wchar_t*-pointer to the (possibly re-)allocated and appended string. (wchar_t*)NULL is returned if no memory could be (re)allocated.

pwcscatstr

Definition:

wchar_t* pwcscatstr( wchar_t* dest, wchar_t* src, pboolean freesrc )

Usage:

Appends a (possibly dynamic) wide-character string to a dynamic wide-character string.

dest is the pointer to a wchar_t-string to be appended. If this is (wchar_t*)NULL, the string is newly allocated. src is the string to be appended. freesrc, if TRUE, makes the function free src automatically.

Returns a wchar_t*-pointer to the (possibly re-)allocated and appended string. (wchar_t*)NULL is returned if no memory could be (re)allocated, or both strings were NULL.

pwcsdup

Definition:

wchar_t* pwcsdup( wchar_t* str )

Usage:

Duplicate a wide-character string in memory.

str is the string to be copied in memory. If str is provided as NULL, the function will also return NULL.

Returns a wchar_t*-pointer to the newly allocated copy of str. This pointer must be released with pfree() when its existence is no longer required.

pwcsget

Definition:

wchar_t* pwcsget( wchar_t* str )

Usage:

Safely reads a wide-character string.

str is the string pointer to be safely read. If str is NULL, the function returns a pointer to a static address holding an empty string.

pwcslen

Definition:

size_t pwcslen( wchar_t* str )

Usage:

Safer strlen replacement for wide-character strings.

str is the parameter string to be evaluated. If (wchar_t*)NULL, the function returns 0.

pwcsncatstr

Definition:

wchar_t* pwcsncatstr( wchar_t* str, wchar_t* append, size_t n )

Usage:

Appends up to n characters from one wide-character string to a dynamic string.

str is the pointer to a wchar_t-string to be appended. If this is (wchar_t*)NULL, the string is newly allocated. append is the beginning of the character sequence to be appended. n is the number of characters to be appended to str.

Returns a wchar_t*-pointer to the (possibly re-)allocated and appended string. (wchar_t*)NULL is returned if no memory could be (re)allocated, or both strings were NULL.

pwcsndup

Definition:

wchar_t* pwcsndup( wchar_t* str, size_t len )

Usage:

Duplicates len characters from a wide-character string in memory.

The function mixes the functionalities of wcsdup() and wcsncpy(). The resulting wide-character string will be zero-terminated.

str is the parameter wide-character string to be duplicated. If this is provided as (wchar_t*)NULL, the function will also return (wchar_t*)NULL.

len is the number of characters to be copied and duplicated from str. If len is greater than the length of str, copying will stop at the zero terminator.

Returns a wchar_t*-pointer to the allocated memory holding the zero-terminated wide-character string duplicate. This pointer must be released with pfree() when its existence is no longer required.

pwcsput

Definition:

wchar_t* pwcsput( wchar_t** str, wchar_t* val )

Usage:

Assigns a wide-character string to a dynamically allocated pointer. pwcsput() manages the assignment of a dynamically allocated wide-character string.

str is a pointer receiving the target pointer to be (re)allocated. If str already references a wide-character string, this pointer will be freed and reassigned to a copy of val.

val is the wide-character string to be assigned to str (as an independent copy).

Returns a pointer to the allocated heap memory on success, (wchar_t*)NULL otherwise. This is the same pointer as obtained by reading *str. The returned pointer must be released with pfree() or another call of pwcsput(). Calling pwcsput() as pwcsput( &p, (wchar_t*)NULL ); is equivalent to p = pfree( p ).

pwhich

Definition:

char* pwhich( char* filename, char* directories )

Usage:

Determines a file path by searching in a PATH definition.

filename is the filename to be searched for.

directories is a string specifying the directories to search in. If this is (char*)NULL, the environment variable PATH will be used and evaluated by using getenv(). Multiple paths are separated by a character that depends on the current platform (Unix: ":", Windows: ";").

Returns a static pointer to the absolute path that contains the file specified as filename, else it will return (char*)NULL.

u8_char

Definition:

wchar_t u8_char( char* str )

Usage:

Returns a single character (as wide-character value) from a UTF-8 multi-byte character string.

str is the pointer to the beginning of the character sequence.
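Decoding one character from a UTF-8 sequence means reading the payload bits of the lead byte and of each continuation byte. The sketch below illustrates the standard UTF-8 layout; decode_sketch() is a hypothetical stand-in that returns an unsigned long instead of wchar_t (so the result fits on every platform) and yields U+FFFD for malformed input, which is a choice made for this sketch.

```c
#include <assert.h>

/* Hypothetical stand-in: decodes the first UTF-8 character of str. */
unsigned long decode_sketch( const char* str )
{
    const unsigned char* s = (const unsigned char*)str;
    unsigned long        cp;
    int                  extra;
    int                  i;

    assert( str );

    if( s[0] < 0x80 )
        return s[0];                                        /* 0xxxxxxx */
    else if( ( s[0] & 0xE0 ) == 0xC0 )
        { cp = s[0] & 0x1F; extra = 1; }                    /* 110xxxxx */
    else if( ( s[0] & 0xF0 ) == 0xE0 )
        { cp = s[0] & 0x0F; extra = 2; }                    /* 1110xxxx */
    else if( ( s[0] & 0xF8 ) == 0xF0 )
        { cp = s[0] & 0x07; extra = 3; }                    /* 11110xxx */
    else
        return 0xFFFD; /* not a valid lead byte */

    for( i = 1; i <= extra; i++ )
    {
        if( ( s[i] & 0xC0 ) != 0x80 )
            return 0xFFFD; /* truncated or malformed sequence */

        cp = ( cp << 6 ) | ( s[i] & 0x3F );                 /* 10xxxxxx */
    }

    return cp;
}
```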

u8_isutf

Definition:

pboolean u8_isutf( unsigned char c )

Usage:

Check for UTF-8 character sequence signature.

The function returns TRUE, if the character c is the beginning of a UTF-8 character signature, else FALSE.

u8_move

Definition:

char* u8_move( char* str, int count )

Usage:

Moves count characters ahead in a UTF-8 multi-byte character sequence.

str is the pointer to the UTF-8 string where moving begins. count is the number of characters to move.

The function returns the address of the next UTF-8 character sequence after count characters. If the string's end is reached, it will return a pointer to the zero-terminator.
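Moving by characters rather than bytes relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx. A minimal sketch of this idea, using the hypothetical stand-in move_sketch() and assuming well-formed input:

```c
#include <assert.h>

/* Hypothetical stand-in: advances count UTF-8 characters in str. */
char* move_sketch( char* str, int count )
{
    assert( str && count >= 0 );

    while( count-- > 0 && *str )
    {
        str++; /* step over the lead byte ... */

        /* ... and over its continuation bytes (10xxxxxx). */
        while( ( (unsigned char)*str & 0xC0 ) == 0x80 )
            str++;
    }

    return str; /* points to the zero terminator if the end was reached */
}
```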

u8_parse_char

Definition:

wchar_t u8_parse_char( char** ch )

Usage:

Reads one character from a UTF-8 input sequence. This character can be escaped, a UTF-8 character or an ordinary ASCII character.

ch is the input and output pointer (the pointer is replaced by the pointer to the next character or escape sequence within the string).

The function returns the character code of the parsed character.

u8_seqlen

Definition:

int u8_seqlen(char *s)

Usage:

Returns the length of the next UTF-8 sequence in a multi-byte character string.

s is the pointer to the beginning of the UTF-8 sequence.

Returns the number of bytes used for the next character.