Pure Library Manual

Author: Albert Gräf <Dr.Graef@t-online.de>
Date: 2009-10-06

Copyright (c) 2009 by Albert Gräf. This document is available under the GNU Free Documentation License.

This manual describes the operations in the standard Pure library, including the prelude and the other library modules which come bundled with the interpreter.

There is a companion to this manual, the Pure Manual which describes the Pure language and the operation of the Pure interpreter.

Contents

1   Prelude

The prelude defines the basic operations of the Pure language. This includes the basic arithmetic and logical operations, string, list and matrix functions, as well as the support operations required to implement list and matrix comprehensions. (The string and matrix operations are in separate modules string.pure and matrix.pure, the primitive arithmetic and logical operations can be found in primitives.pure.)

The prelude also declares a signature of commonly used constant and operator symbols.

  • The built-in exception values failed_cond (failed conditional in guard or if-then-else), failed_match (failed pattern match in lambda, case expression, etc.), stack_fault (not enough stack space, PURE_STACK limit exceeded) and bad_matrix_value x (error in matrix construction).
  • Other predefined exceptions: bad_list_value x, bad_tuple_value x (which are thrown by some list and tuple operations when they fail to find an expected list or tuple value), out_of_bounds (which is thrown by the index operator ! if a list, tuple or matrix index is out of bounds), and malloc_error which indicates a memory allocation error.
  • The truth values true and false. These are actually just integers in Pure, but sometimes it's convenient to refer to them using these symbolic constants. Note that if you also want to use these on the left-hand side of equations, you still have to declare them as nonfix symbols yourself, using a declaration like: nonfix false true;

Here's the list of predefined operator symbols. Note that the parser will automagically give unary minus the same precedence level as the corresponding binary operator. Also note that the "mapsto" operator a.k.a. "hash rocket" => doesn't have a predefined meaning in Pure; the prelude only implements the relations == and ~= to check such "hash pairs" x=>y for equality and inequality. The hash rocket is used in several libraries, however, which usually employ it to denote some kind of key-value associations. See, e.g., Dictionaries for an example.

infixl  1000   $$ ;                // sequence operator
infixr  1000   $ ;                 // right-associative application
infixr  1100   , ;                 // pair (tuple)
infix   1200   => ;                // mapsto constructor
infix   1300   .. ;                // arithmetic sequences
infixr  1400   || ;                // logical or (short-circuit)
infixr  1500   && ;                // logical and (short-circuit)
prefix  1600   ~ ;                 // logical negation
infix   1700   < > <= >= == ~= ;   // relations
infix   1700   === ~== ;           // syntactic equality
infixr  1800   : ;                 // list cons
infix   1900   +: <: ;             // complex numbers (cf. math.pure)
infixl  2000   << >> ;             // bit shifts
infixl  2100   + - or ;            // addition, bitwise or
infixl  2200   * / div mod and ;   // multiplication, bitwise and
infixl  2200   % ;                 // exact division (cf. math.pure)
prefix  2300   not ;               // bitwise not
infixr  2400   ^ ;                 // exponentiation
prefix  2500   # ;                 // size operator
infixl  2600   ! !! ;              // indexing, slicing
infixr  2700   . ;                 // function composition
prefix  2800   ' ;                 // quote
postfix 2900   & ;                 // thunk

1.1   Basic Combinators

The most important function combinators are $ (right-associative application) and . (function composition), which are also defined as macros so that saturated calls of these are eliminated automatically. Examples:

> foo $ bar 99;
foo (bar 99)
> (foo.bar) 99;
foo (bar 99)

The customary identity and constant combinators from the combinatorial calculus are also available, in Pure these are named id and cst, respectively:

> map id (1..5);
[1,2,3,4,5]
> map (cst 0) (1..5);
[0,0,0,0,0]

There's also a combinator void which is basically equivalent to cst (), but with the special twist that it is also defined as a macro optimizing the case of "throwaway" list and matrix comprehensions. This is useful if a comprehension is evaluated solely for its side effects. E.g.:

> using system;
> extern int rand();
> foo = void [printf "%d\n" rand | _ = 1..3];
> show foo
foo = do (\_ -> printf "%d\n" rand) (1..3);
> foo;
1714636915
1957747793
424238335
()

Note that the above list comprehension is actually implemented using do (instead of map, which would normally be the case), so that the intermediate list value of the comprehension is never constructed. This is described in more detail in section Optimization Rules of the Pure Manual.

In addition, Pure also provides the following combinators adopted from Haskell:

  • flip f swaps arguments of a binary function f, e.g.:

    > map (flip (/) 2) (1..3);
    [0.5,1.0,1.5]
    

    This combinator is also used by the compiler to implement right operator sections, which allows you to write the above simply as:

    > map (/2) (1..3);
    [0.5,1.0,1.5]
    
  • curry f turns a function f expecting a pair of values into a curried function of two arguments:

    > using system;
    > dowith (curry (printf "%d: %g\n")) (0..2) [0.0,2.718,3.14];
    0: 0
    1: 2.718
    2: 3.14
    ()
    
  • Conversely, uncurry f turns a curried function f expecting two arguments into a function processing a single pair argument:

    > map (uncurry (*)) [(2,3),(4,5),(6,7)];
    [6,20,42]
    
  • curry3 and uncurry3 are defined analogously, but are used to work with ternary functions.

1.2   Lists and Tuples

The prelude defines the list and tuple constructors (x:y, x,y), as well as equality (==) and inequality (~=) on these structures. (The latter compare two lists or tuples by recursively comparing their members, so == must be defined on the list or tuple members if you want to use these operations.) It also provides the predicate null x which tests whether x is the empty list or tuple, the function reverse x which reverses a list or tuple, and the operators #x (size of a list or tuple), x!i (indexing), x!!is (slicing) and x+y (list concatenation). Note that there isn't a separate operation for concatenating tuples, since the pairing operator already does this:

> (1,2,3),(10,9,8);
1,2,3,10,9,8

This works because the (,) constructor is associative in Pure and will always produce right-recursive pairs. This also implies that tuples are always flat in Pure and can't be nested; if you need this, you should use lists instead. Also note that the empty tuple () acts as a neutral element with respect to (,):

> (),(a,b,c);
a,b,c
> (a,b,c),();
a,b,c

Lists are the usual right-recursive aggregates, pretty much the same as in Lisp or Prolog except that they use a Haskell-like syntax. In difference to Haskell, list concatenation is denoted +, and lists may contain an arbitrary mixture of arguments, i.e., they are fully polymorphic:

> 1:2:3:[];
[1,2,3]
> [1,2,3]+[u,v,w]+[3.14];
[1,2,3,u,v,w,3.14]

Lists are eager in Pure by default, but they can also be made lazy, see section Lazy Evaluation and Streams in the Pure Manual.

Arithmetic sequences can be constructed with the infix .. operator:

> 1..5;
[1,2,3,4,5]
> 1:3..11;
[1,3,5,7,9,11]

Note that the Pure syntax differs from Haskell in that there are no brackets around the construct and a step width is indicated by specifying the first two elements as x:y instead of x,y. Also, to specify infinite sequences you have to use an infinite upper bound (inf or -inf):

> 1:3..inf;
1:#<thunk 0x7f696cd2dbd8>
> -1:-3..-inf;
-1:#<thunk 0x7f696cd2fde8>

The lower bounds of an arithmetic sequence must always be finite.

You can convert between (finite) lists and tuples using the list and tuple operations:

> tuple (1..5);
1,2,3,4,5
> list (a,b,c);
[a,b,c]

The list function can also be used to turn a finite lazy list into an eager one:

> list $ take 10 (-1:-3..-inf);
[-1,-3,-5,-7,-9,-11,-13,-15,-17,-19]

You can also achieve the same effect somewhat more conveniently by slicing a finite part from a stream (see below):

> (-1:-3..-inf)!!(0..9);
[-1,-3,-5,-7,-9,-11,-13,-15,-17,-19]

Conversely, it is also possible to convert a list to a stream:

> stream (1..10);
1:#<thunk 0x7fe537fe2b58>

This might appear a bit useless at first sight, since all elements of the stream are in fact already known. However, this operation then allows you to apply other functions to the list and have them evaluated in a lazy fashion.

Indexing of lists and tuples is always zero-based (i.e., indices run from 0 to #x-1), and an exception will be raised if the index is out of bounds:

> [1,2,3]!2;
3
> [1,2,3]!4;
<stdin>, line 34: unhandled exception 'out_of_bounds' while evaluating
'[1,2,3]!4'

The slicing operator !! takes a list or tuple and a list of indices and returns the list or tuple of the corresponding elements, respectively. Indices which are out of the valid range are silently ignored:

> (1..5)!!(3..10);
[4,5]
> (1,2,3,4,5)!!(3..10);
4,5

Indices can actually be specified in any order, so that you can retrieve any permutation of the members, also with duplicates. E.g.:

> (1..5)!![2,4,4,1];
[3,5,5,2]

This is less efficient than the case of contiguous index ranges (which is optimized so that it always works in linear time), because it requires repeated traversals of the list for each index. For larger lists you should hence use vectors or matrices instead, to avoid the quadratic complexity.

(The prelude actually implements the slicing operation in a fairly generic way, so that it works with any kind of container data structure which defines ! in such a manner that it throws an exception when the index is out of bounds. It also works with any kind of index container that implements the catmap operation.)

1.3   List Functions

This mostly comes straight from the Q prelude which in turn was based on the first edition of the Bird/Wadler book, and is very similar to what you can find in the Haskell prelude. Some functions have slightly different names, though, and of course everything is typed dynamically.

1.3.1   Common List Functions

any p xs
tests whether the predicate p holds for any of the members of xs
all p xs
tests whether the predicate p holds for all of the members of xs
cat xs
concatenate a list of lists
catmap f xs
convenience function which combines cat and map; this is also used to implement list comprehensions
do f xs
apply f to all members of xs, like map, but throw away all intermediate results and return ()
drop n xs
remove n elements from the front of xs
dropwhile p xs
remove elements from the front of xs while the predicate p is satisfied
filter p xs
return the list of all members of xs satisfying the predicate p
foldl f a xs
accumulate the binary function f over all members of xs, starting from the initial value a and working from the front of the list towards its end
foldl1 f xs
accumulate the binary function f over all members of xs, starting from the value head xs and working from the front of the list towards its end; xs must be nonempty
foldr f a xs
accumulate the binary function f over all members of xs, starting from the initial value a and working from the end of the list towards its front
foldr1 f xs
accumulate the binary function f over all members of xs, starting from the value last xs and working from the end of the list towards its front; xs must be nonempty
index xs x

search for an occurrence of x in xs and return the index of the first occurrence, if any, -1 otherwise

Note: This uses equality (==) to decide whether a member of xs is an occurrence of x, so == must have an appropriate definition on the list members.

init xs
return all but the last element of xs; xs must be nonempty
last xs
return the last element of xs; xs must be nonempty
listmap f xs
convenience function which works like map, but also deals with matrix and string arguments while ensuring that the result is always a list; this is primarily used to implement list comprehensions
map f xs
apply f to each member of xs
scanl f a xs
accumulate the binary function f over all members of xs, as with foldl, but return all intermediate results as a list
scanl1 f xs
accumulate the binary function f over all members of xs, as with foldl1, but return all intermediate results as a list
scanr f a xs
accumulate the binary function f over all members of xs, as with foldr, but return all intermediate results as a list
scanr1 f xs
accumulate the binary function f over all members of xs, as with foldr1, but return all intermediate results as a list
tail xs
return all but the first element of xs; xs must be nonempty
take n xs
take n elements from the front of xs
takewhile p xs
take elements from the front of xs while the predicate p is satisfied

1.4   String Functions

Pure strings are null-terminated character strings encoded in UTF-8, see the Pure Manual for details. The prelude provides various operations on strings, including a complete set of list-like operations, so that strings can be used mostly as if they were lists, although they are really implemented as C character arrays for reasons of efficiency. Pure also has some powerful operations to convert between Pure expressions and their string representation, see Eval and Friends for those.

1.4.1   Basic String Functions

Concatenation, indexing and slicing works just like with lists:

> "abc"+"xyz";
"abcxyz"
> let s = "The quick brown fox jumps over the lazy dog.";
> s!5;
"u"
> s!!(20..24);
"jumps"

Checking for empty strings and determining the size of a string also works as expected:

> null "";
1
> null s;
0
> #s;
44

You can search for the location of a substring in a string, and extract a substring of a given length:

index s u
Returns the (zero-based) index of the first occurrence of the substring u in s, or -1 if u is not found in s.
substr s i n
Extracts a substring of (at most) n characters at position i in s. This takes care of all corner cases, adjusting index and number of characters so that the index range stays confined to the source string.

Example:

> index s "jumps";
20
> substr s 20 10;
"jumps over"

Note that Pure doesn't have a separate type for individual characters. Instead, these are represented as strings c containing exactly one (UTF-8) character (i.e., #c==1). It is possible to convert such single character strings to the corresponding integer character codes, and vice versa:

ord c
Ordinal number of a single character string c. This is the character's code point in the Unicode character set.
chr n
Converts an integer back to the character with the corresponding code point.

In addition, the usual character arithmetic works, including arithmetic sequences of characters, so that you can write stuff like the following:

> "a"-"A";
32
> "u"-32;
"U"
> "a".."k";
["a","b","c","d","e","f","g","h","i","j","k"]

Strings are also ordered lexicographically based on their character codes:

> "awe">"awesome";
0
> "foo">="bar";
1

For convenience, the prelude provides the following functions to convert between strings and lists (or other aggregates) of characters.

chars s, list s
Convert a string s to a list of characters.
tuple s, matrix s
Convert a string s to a tuple or (symbolic) matrix of characters, respectively.
strcat xs
Concatenate a list xs of strings (in particular, this converts a list of characters back to a string).
string xs
Convert a list, tuple or (symbolic) matrix of strings to a string. In the case of a list, this is synonymous with strcat, but it also works with the other types of aggregates.

For instance:

> list "abc";
["a","b","c"]
> string ("a".."z");
"abcdefghijklmnopqrstuvwxyz"

The following functions are provided to deal with strings of "tokens" separated by a given delimiter string.

split delim s
Splits s into a list of substrings delimited by delim.
join delim xs
Joins the list of strings xs to a single string, interpolating the given delim string.

Example:

> let xs = split " " s; xs;
["The","quick","brown","fox","jumps","over","the","lazy","dog."]
> join ":" xs;
"The:quick:brown:fox:jumps:over:the:lazy:dog."

We mention in passing here that more elaborate string matching, splitting and replacement operations based on regular expressions are provided by the system module, see System Interface.

If that isn't enough already, most generic list operations carry over to strings in the obvious way, treating the string like a list of characters. (Some operations will actually return character lists in that case, so you might have to apply string explicitly to convert these back to a string.) For instance:

> filter (>="k") s;
"qukrownoxumpsovrtlzyo"
> string $ map pred "ibm";
"hal"

List comprehensions can draw values from strings, too:

> string [x+1 | x="HAL"];
"IBM"

1.4.2   Low-Level Operations

The following routines are provided by the runtime to turn raw C char* pointers (also called byte strings in Pure parlance, to distinguish them from Pure's "cooked" UTF-8 string values) into corresponding Pure strings. Normally you don't have to worry about this, because the C interface already takes care of the necessary marshalling, but in some low-level code these operations are useful. Also note that here and in the following, the cstring routines also convert the string between the system encoding and Pure's internal UTF-8 representation.

string s, cstring s
Convert a pointer s to a Pure string. s must point to a null-terminated C string. These routines take ownership of the original string value, assuming it to be malloced, so you should only use these for C strings which are specifically intended to be freed by the user.
string_dup s, cstring_dup s
Convert a pointer s to a Pure string. Like above, but these functions take a copy of the string, leaving the original C string untouched.

The reverse transformations are also provided. These take a Pure string to a byte string (raw char*).

byte_string s, byte_cstring s
Construct a byte string from a Pure string s. The result is a raw pointer object pointing to the converted string. The original Pure string is always copied (and, in the case of byte_cstring, converted to the system encoding). The resulting byte string is a malloced pointer which can be used like a C char*, and has to be freed explicitly by the caller when no longer needed.

Finally, it is also possible to convert Pure string lists to byte string vectors and vice versa. These are useful if you need to pass an argv-like string vector (i.e., a char** or char*[]) to C routines. The computed C vectors are malloced pointers which have an extra NULL pointer as the last entry, and should thus be usable for almost any purpose which requires such a string vector in C. They also take care of garbage-collecting themselves. The original string data is always copied. As usual, the cstring variants do automatic conversions to the system encoding.

byte_string_pointer xs, byte_cstring_pointer xs
Convert a list of Pure strings to a C char**.
string_list n p, cstring_list n p
Convert a C char** to a list of Pure strings.

Note that the back conversions take an additional first argument which denotes the number of strings to retrieve. If you know that the vector is NULL-terminated then this can also be an infinite value (inf) in which case the number of elements will be figured out automatically. Processing always stops at the first NULL pointer encountered.

1.5   Matrix Functions

Most of the generic list operations are implemented on matrices as well, see Common List Functions. Hence stuff like map and zipwith works as expected:

> map succ {1,2,3;4,5,6};
{2,3,4;5,6,7}
> zipwith (+) {1,2,3;4,5,6} {1,0,1;0,2,0};
{2,2,4;4,7,6}

The matrix module also provides a bunch of other specialized matrix operations, including all the necessary operations for matrix comprehensions. We briefly summarize the most important operations below; please refer to matrix.pure for all the gory details.

Also make sure you check Matrix Computations in the Pure Manual for some more examples.

1.5.2   Matrix Inspection and Manipulation

dmatrixp x, cmatrixp x, imatrixp x, smatrixp x, nmatrixp x
Check for different kinds of matrices (double, complex, int, symbolic and numeric, i.e., non-symbolic).
vectorp x, rowvectorp x, colvectorp x
Check for different kinds of vectors (these are just matrices with one row or column).
stride x
The stride of a matrix denotes the real row size of the underlying C array, see the description of the pack function below for further details. There's little use for this value in Pure, but it may be needed when interfacing to C.
row x i, col x i
Extract the ith row or column of a matrix.
rows x, cols x
Return the list of all rows or columns of a matrix.
list x, list2 x, tuple x, matrix xs

Convert a matrix to a list or tuple, and vice versa. list x converts a matrix x to a flat list of its elements, while list2 converts it to a list of lists. Likewise, tuple x converts the matrix x to a tuple. Conversely, matrix xs converts a list or tuple to a corresponding matrix.

Note that matrix turns a list of lists or matrices specifying the rows of the matrix to the corresponding rectangular matrix; otherwise, the result is a row vector. Also note that the matrix function may throw a bad_matrix_value x in case of dimension mismatch, where x denotes the offending submatrix.

diag x, subdiag x k, supdiag x k:
Extract (sub-,super-) diagonals from a matrix. Sub- and super-diagonals for k=0 return the main diagonal. Indices for sub- and super-diagonals can also be negative, in which case the corresponding super- or sub-diagonal is returned instead. In each case the result is a row vector.
submat x (i,j) (n,m)
Extract a submatrix of a given size at a given offset. The result shares the underlying storage with the input matrix (i.e., matrix elements are not copied) and so this is a comparatively cheap operation.
rowcat xs, colcat xs
Construct matrices from lists of rows and columns. These take either scalars or submatrices as inputs; corresponding dimensions must match. rowcat combines submatrices vertically, like {x;y}; colcat combines them horizontally, like {x,y}. Note: Like the built-in matrix constructs, these operations may throw a bad_matrix_value x exception in case of dimension mismatch, where x denotes the offending submatrix.
matcat xs
Construct a matrix from a (symbolic) matrix of other matrices and/or scalars. This works like a combination of rowcat and colcat, but draws its input from a matrix instead of a list of matrices, and preserves the overall layout of the "host" matrix. The net effect is that the host matrix is flattened out. If all elements of the input matrix are scalars already, the input matrix is returned unchanged.
rowcatmap f xs, colcatmap f xs
Combinations of rowcat, colcat and map. These are used, in particular, for implementing matrix comprehensions.
dmatrix (n,m), cmatrix (n,m), imatrix (n,m)
Convenience functions to create zero matrices with the given dimensions (either a pair denoting the number of rows and columns, or just the row size in order to create a row vector).
diagmat x, subdiagmat x k, supdiagmat x k
Create a (sub-,super-) diagonal matrix from a row vector x of size n. The result is always a square matrix with dimension (n+k,n+k), which is of the same matrix type (double, complex, int, symbolic) as the input and has the elements of the vector on its kth sub- or super-diagonal, with all other elements zero. A negative value for k turns a sub- into a super-diagonal matrix and vice versa.
dmatrix x, cmatrix x, imatrix x, smatrix x
Matrix conversions. These convert between different types of numeric and symbolic matrices. The same operations also convert a list directly to a row vector; this is usually much faster than the generic matrix operation, but requires that the elements already are of the appropriate type.
re x, im x, conj x
Extract the real and imaginary parts and compute the conjugate of a numeric matrix.
pack x, packed x

Pack a matrix. This creates a copy of the matrix which has the data in contiguous storage. It also frees up extra memory if the matrix was created as a slice from a bigger matrix (see submat above) which has since gone the way of the dodo.

The packed predicate can be used to verify whether a matrix is already packed. Note that even if a matrix is already packed, pack will make a copy of it anyway, so pack also provides a quick way to copy a matrix, e.g., if you want to pass it as an input/output parameter to a GSL routine.

redim (n,m) x, redim n x

Change the dimensions of a matrix without changing its size. The total number of elements must match that of the input matrix. Reuses the underlying storage of the input matrix if possible (i.e., if the matrix is packed).

You can also redim a matrix to a given row size n. In this case the row size must divide the total size of the matrix.

rowvector x, colvector x
Convenience functions to convert a matrix to a row or column vector.
transpose x

Transpose a matrix. Example:

> transpose {1,2,3;4,5,6};
{1,4;2,5;3,6}
rowrev x, colrev x, reverse x
Reverse a matrix. rowrev reverses the rows, colrev the columns, reverse both dimensions.

1.5.3   Pointers and Matrices

The matrix module offers a bunch of low-level operations for converting between matrices and raw pointers. These are typically used to shovel around massive amounts of numeric data between Pure and external C routines, when performance and throughput is an important consideration (e.g., graphics, video and audio applications). The usual caveats apply.

pointer x
Get a pointer to the underlying C array of a matrix. The data is not copied. Hence you have to be careful when passing such a pointer to C functions if the underlying data is non-contiguous; when in doubt, first use the pack function to place the data in contiguous storage, or use one of the matrix-pointer conversion routines below.

The following operations copy the contents of a matrix to a given pointer and return that pointer, converting to the target data type on the fly if necessary. The given pointer may also be NULL, in which case suitable memory is malloced and returned; otherwise the caller must ensure that the memory pointed to by p is big enough for the contents of the given matrix.

  • double_pointer p x
  • float_pointer p x
  • complex_pointer p x
  • complex_float_pointer p x
  • int_pointer p x
  • short_pointer p x
  • byte_pointer p x

Conversely, the following functions allow you to create a numeric matrix from a pointer, copying the data and converting it from the source type on the fly if necessary. The source pointer p may also be NULL, in which case the new matrix is filled with zeros instead. Otherwise the caller must ensure that the pointer points to properly initialized memory big enough for the requested dimensions. The given dimension may also be just an integer n if a row vector is to be created.

  • double_matrix (n,m) p
  • float_matrix (n,m) p
  • complex_matrix (n,m) p
  • complex_float_matrix (n,m) p
  • int_matrix (n,m) p
  • short_matrix (n,m) p
  • byte_matrix (n,m) p

Finally, you can use the following operations to create a numeric matrix view of existing data, without copying the data. The data must be double, complex or int, the pointer must not be NULL and the caller must also ensure that the memory persists for the entire lifetime of the matrix object. The given dimension may also be just an integer n if a row vector view is to be created.

  • double_matrix_view (n,m) p
  • complex_matrix_view (n,m) p
  • int_matrix_view (n,m) p

1.6   Primitives

This prelude module is a collection of various lowlevel operations, which are implemented either directly by machine instructions or by C functions provided in the runtime. In particular, this module defines the basic arithmetic and logic operations on machine integers, bigints and floating point numbers, as well as various type checking predicates and conversions between different types. Some low-level pointer operations are also provided, as well as "sentries" (Pure's flavour of object finalizers) and "references" (mutable expression pointers).

1.6.1   Arithmetic

The basic arithmetic and logic operations provided by this module are summarized in the following table:

Kind Operator Meaning
Arithmetic + - * / div mod ^ addition, subtraction (also unary minus), multiplication, division (inexact), exact int/bigint division/modulus, exponentiation (inexact)
Comparisons == ~= < > <= >= equality, inequality, less than, greater than, less than or equal, greater than or equal
Logic ~ && || logical not, and, or (short-circuit)
Bitwise not and or << >> bitwise not, and, or, bit shifts

Precedence and and associativity of the operators can be found in the operators table at the beginning of this section.

The names of some operations are at odds with C. Note, in particular, that logical negation is denoted ~ instead of ! (and, consequently, ~= denotes inequality, rather than !=), and the bitwise operations are named differently. This is necessary because Pure uses !, & and | for other purposes. Also, / always denotes inexact (double) division in Pure, whereas the integer division operators are called div and mod. (%, which is not defined by this module, also has a different meaning in Pure; it's the exact division operator, see Rational Numbers.)

The above operations are implemented for int, bigint and, where appropriate, double and pointer operands. (Pointer arithmetic comprises + and - and works in the usual way, i.e., p-q returns the byte offset between two pointers p and q, and p+n or p-n offsets a pointer p by the given integer n denoting the amount of bytes.) The math module (see Mathematical Functions) also provides implementations of the arithmetic and comparison operators for rational, complex and complex rational numbers.

Note that the logical operations are actually implemented as special forms in order to provide for short-circuit evaluation. This needs special support from the compiler to work. The primitives module still provides definitions for these, as well as other special forms like quote and the thunking operator & so that they may be used as function values and in partial applications, but when used in this manner they lose all their special call-by-name properties; see Special Forms in the Pure Manual for details.

The constants inf and nan are defined as the usual IEEE floating point infinities and NaNs, and the predicates infp and nanp are provided to check for these kinds of values.

In addition, the following arithmetic and numeric functions are provided:

abs x, sgn x
Absolute value and sign of a number.
min x y, max x y
Minimum and maximum of two values. This works with any kind of values which have the ordering relations defined on them.
succ x, pred x
Successor (+1) and predecessor (-1) functions.
gcd x y, lcd x y
The greatest common divisor and least common multiple functions from the GMP library. These return a bigint if at least one of the arguments is a bigint, a machine int otherwise.
pow x y
Computes exact powers of ints and bigints. The result is always a bigint. Note that y must always be nonnegative here, but see the math module (Mathematical Functions) which deals with the case y<0 using rational numbers.

1.6.4   Inspection

The following operations let you peek at various internal information that the interpreter provides to Pure programs either for convenience or for metaprogramming purposes. They are complemented by the evaluation primitives discussed below, see Eval and Friends.

ans

Retrieve the most recently printed result of a toplevel expression evaluated in the read-eval-print loop. This is just a convenience for interactive usage. Note that the ans value will stick around until a new expression is computed. (It is possible to clear the ans value with the interactive command clear ans, however.) Example:

> 1/3;
0.333333333333333
> ans/2;
0.166666666666667
__locals__

Return a list with the local function bindings (with clauses) visible at this point in the program. This is actually implemented as a built-in. The return value is a list of "hash" pairs x=>f where x is the (quoted) symbol denoting the function and f is the function itself (or its value, if f is a parameterless function). The __locals__ function is useful for debugging purposes, as well as to implement dynamic environments. It is also used internally to implement the reduce macro, see Eval and Friends. Example:

> __locals__ with foo x = x+1; x = a+b end;
[x=>a+b,foo=>foo]
> f 99 when _=>f = ans!1 end;
100

The prelude also provides some functions to retrieve various attributes of a function symbol which determine how the operation is applied to its operands or arguments. These functions all take a single argument, the symbol or function object to be inspected, and return an integer value.

nargs x
Get the argument count of a function object, i.e., the number of arguments it expects. Returns 0 for thunks and saturated applications, -1 for over-saturated applications and non-functions.
arity x
Determine the arity of an operator symbol. The returned value is 0, 1 or 2 for nullary, unary and binary symbols, respectively, -1 for symbols without a fixity declaration or other kinds of objects.
fixity f
Determine the fixity of an operator symbol. The fixity is encoded as a 2-digit number 10*n+m where n is the precedence level (ranging from 0 to PREC_MAX, where PREC_MAX denotes the precedence of primary expressions, 16777216 in the current implementation) and m indicates the actual fixity at each level, in the order of increasing precedence (0 = infix, 1 = infixl, 2 = infixr, 3 = prefix, 4 = postfix). The fixity value of nonfix and outfix symbols, as well as symbols without a fixity declaration, is always given as 10*PREC_MAX, and the same value is also reported for non-symbol objects. (PREC_MAX isn't actually defined as a constant anywhere, but you can easily do that yourself by setting PREC_MAX to the fixity value of any nonfix symbol or non-symbol value, e.g.: const PREC_MAX = fixity [];)

Note that only closures (i.e., named and anonymous functions and thunks) have a defined argument count in Pure, otherwise nargs returns -1 indicating an unknown argument count. Partial applications of closures return the number of remaining arguments, which may be zero to indicate a saturated (but unevaluated) application, or -1 for over-saturated and constructor applications. (Note that in Pure a saturated application may also remain unevaluated because there is no definition for the given combination of arguments and thus the expression is in normal form, or because the application was quoted. If such a normal form application is then applied to some "extra" arguments it becomes over-saturated.)

The value returned by nargs always denotes the actual argument count of the given function, regardless of the declared arity if the function also happens to be an operator symbol. Often these will coincide (as, e.g., in the case of + which is a binary operator and also expects two arguments). But this is not necessarily the case, as shown in the following example of a binary operator which actually takes three arguments:

> infix 0 oops;
> (oops) x y z = x*z+y;
> arity (oops);
2
> nargs (oops);
3
> nargs (5 oops 8);
1
> map (5 oops 8) (1..5);
[13,18,23,28,33]

Also note that the argument count of a named function will be the actual number of arguments, as given in the function definition:

> foo x y = x*y;
> nargs foo;
2
> nargs (foo x);
1

In contrast, anonymous functions (lambdas) always yield an argument count of 1, because a multi-argument lambda like \x y -> x*y is in fact just a shorthand for several nested 1-argument lambdas, \x -> \y -> x*y in this example. Hence:

> bar = \x y -> x*y;
> nargs bar;
1
> nargs (bar x);
1

1.6.5   Eval and Friends

Pure provides some rather powerful operations to convert between Pure expressions and their string representation, and to evaluate quoted expressions ('x). The string conversions str, val and eval also provide a convenient means to serialize Pure expressions, e.g., when terms are to be transferred to/from persistent storage. (Note, however, that this has its limitations. Specifically, some objects like pointers and anonymous functions do not have a parsable string representation. Also see the Expression Serialization section for some dedicated serialization operations which provide a more compact binary serialization format.)

str x
Yields the print representation of an expression in Pure syntax, as a string.
val s
Parses a single simple expression, specified as a string in Pure syntax, and returns the result as is, without evaluating it. Note that this is much more limited than the eval operation below, as the expression must not contain any of the special constructs (conditional expressions, when, with, etc.).
eval x
Parses any expression, specified as a string in Pure syntax, and returns its value. In fact, eval can also parse and execute arbitrary Pure code. In that case it will return the last computed expression, if any. Alternatively, eval can also be invoked on a (quoted) Pure expression, which is recompiled and then evaluated. Exceptions during evaluation are reported back to the caller.
evalcmd x
Like eval, but allows execution of interactive commands and returns their captured output as a string. No other results are returned, so this operation is most useful for executing Pure definitions and interactive commands for their side-effects. (At this time, only the regular output of a few commands can be captured, most notably clear, save and show; otherwise the result string will be empty.)
lasterr
Reports errors in eval and evalcmd. This string value will be nonempty iff a compilation or execution error was encountered during the most recent invokation of eval and evalcmd. In that case each reported error message is terminated with a newline character.
reduce x
Reevaluates an expression in a local environment. This dynamically rebinds function symbols in the given expression to whatever local definitions are in effect at the point of the reduce call. Note that reduce is actually implemented as a macro which uses the __locals__ builtin to enumerate the bindings which are in effect at the call site.

Examples:

> str (1/3);
"0.333333333333333"
> val "1/3";
1/3
> eval "1/3";
0.333333333333333
> eval ('(1/3));
0.333333333333333
> evalcmd "show evalcmd";
"extern expr* evalcmd(expr*);\n"

> eval "1/3)";
eval "1/3)"
> lasterr;
"<stdin>, line 1: syntax error, unexpected ')', expecting '=' or '|'\n"

The reduce macro provides a restricted form of dynamic binding which is useful to implement local rewriting rules. It is invoked without parameters and expands to a curried primitive which takes one additional argument, the expression to be rewritten. The following example shows how to expand or factorize an expression using local rules for the corresponding laws of distributivity:

expand = reduce with
  (a+b)*c = a*c+b*c;
  a*(b+c) = a*b+a*c;
end;

factor = reduce with
  a*c+b*c1 = (a+b)*c if c===c1;
  a*b+a1*c = a*(b+c) if a===a1;
end;

expand ((a+b)*2); // yields a*2+b*2
factor (a*2+b*2); // yields (a+b)*2

Note that instances of locally bound functions are substituted back in the computed result, thus the instances of * and + in the results a*2+b*2 and (a+b)*2 shown above denote the corresponding globals, not the local incarnations of * and + defined in expand and factor, respectively.

reduce also adjusts to quoted arguments. In this case, the local rules are applied as usual, but back-substituted globals are not evaluated in the result:

> expand ((a+1)*2);
a*2+2
> expand ('((a+1)*2));
a*2+1*2

Note that reduce only takes into account local function bindings from with clauses, local variable bindings do not affect its operation in any way:

> let y = [x,x^2,x^3];
> reduce y when x = u+v end;
[x,x^2,x^3]

However, in such cases you can perform the desired substitution by just turning the when into a with clause:

> reduce y with x = u+v end;
[u+v,(u+v)^2,(u+v)^3]

1.6.6   Expression Serialization

Like str and eval, the following blob and val operations can be used to safely transfer expression data to/from persistent storage and between different processes (using, e.g., POSIX shared memory, pipes or sockets). However, blob and val use a binary format which is usually much more compact and gets processed much faster than the string representations used by str and eval. Also, val offers some additional protection against transmission errors through a crc check. (The advantage of the string representation, however, is that it's readable plain text in Pure syntax.)

blob x
Stores the contents of the given expression as a binary object. The return value is a cooked pointer which frees itself when garbage-collected.
val p
Reconstructs a serialized expression from the result of a previous invocation of the blob function.
blobp p
Checks for a valid blob object. (Note that val may fail even if blobp returns true, because for performance reasons blobp only does a quick plausibility check on the header information of the blob, whereas val also performs a crc check and verifies data integrity.)
blob_size p, blob_crc p
Determines the size (in bytes) and crc checksum of a blob, respectively. blob_size always returns a bigint, blob_crc a machine int (use uint on the latter to get a proper unsigned 32 bit value). For convenience, #p is defined as an alias for blob_size p on blob pointers.

Example:

> let b = blob {"Hello, world!", 1/3, 4711, NULL};
> b; #b; uint $ blob_crc b;
#<pointer 0x141dca0>
148L
3249898239L
> val b;
{"Hello, world!",0.333333333333333,4711,#<pointer 0>}

Please note that the current implementation has some limitations:

  • Just as with str and eval, runtime data (local closures and pointers other than the NULL pointer) can't be serialized, causing blob to fail. However, it is possible to transfer a global function, provided that the function exists (and is the same) in both the sending and the receiving process. (This condition can't be verified by val and thus is at the programmer's responsibilty.)
  • Sharing of subexpressions will in general be preserved, but sharing of list and tuple tails will be lost (unless the entire list or tuple is shared).
  • The val function may fail to reconstruct the serialized expression even for valid blobs, if there is a conflict in symbol fixities between the symbol tables of the sending and the receiving process. To avoid this, make sure that symbol declarations in the sending and the receiving script match up.

1.6.7   Other Special Primitives

throw x
Throw an exception, cf. Exception Handling.
force x
Force a thunk (x&), cf. Special Forms. This usually happens automagically when the value of a thunk is needed.

1.6.9   Sentries

Sentries are Pure's flavour of object finalizers. A sentry is simply an object (usually a function) which gets applied to the target expression when it is garbage-collected. This is useful to perform automatic cleanup actions on objects with internal state, such as files. Pure's sentries are much more useful than finalizers in other garbage-collected languages, since it is guaranteed that they are called as soon as an object "goes out of scope", i.e., becomes inaccessible.

sentry f x
Places a sentry f at an expression x and returns the modified expression.
clear_sentry x
Removes the sentry from an expression x.
get_sentry x
Returns the sentry of an expression x (if any, fails otherwise).

Note that in the current implementation sentries can only be placed at symbols, applications, as well as function and pointer objects, but the department of fake statistics has assured us that this covers 99.7% of all practical uses. The sentry itself can be any type of object (but usually it's a function). There can be only one sentry per expression but, building on the operations provided here, it's easy to design a scheme where sentries are chained.

Example:

> using system;
> sentry (\_->puts "Hello, world!") (1..3);
[1,2,3]
> clear ans
Hello, world!

Note that setting a finalizer on a global symbol won't usually be of much use since such values are cached by the interpreter. (However, the sentry will be invoked if the symbol gets recompiled because its definition has changed. This may be useful for some purposes.)

One common case where sentries are particularly useful are pointers. The library provides some special support for these by means of the following operations:

cooked ptr
Create a cooked pointer which disposes itself after use. This is just a shorthand for sentry free. The given pointer ptr must be malloced to make this work.
cookedp ptr
Check whether a given pointer is cooked (this predicate actually assumes any pointer to be cooked which has a sentry set on it).

Example:

> using system;
> let p = sentry (\p->puts "I'm done for!" $$ free p) (malloc 1024);
> cookedp p;
1
> clear p
I'm done for!

Besides their use as finalizers, sentries can also be handy in other circumstances, when you need to associate an expression with another, "invisible" value. In this case the sentry is usually some kind of data structure instead of a function to be executed at finalization time. For instance, here's how we can employ sentries to implement hashing of function values:

using dict;
hashed f x = case get_sentry f of
               h::Hdict = h!x if member h x;
               _ = y when y = f x; sentry (update h x y) f
                       when h = case get_sentry f of
                                  h::Hdict = h; _ = emptyhdict
                                end;
                       end;
                     end;
             end;

E.g., consider the naive recursive definition of the Fibonacci function:

fib n = if n<=1 then 1 else fib (n-1)+fib (n-2);

A hashed version of the Fibonacci function can be defined as follows:

let hfib = hashed f with
  f n = if n<=1 then 1 else hfib (n-1)+hfib (n-2)
end;

This turns the naive definition of the Fibonacci function (which has exponential time complexity) into a linear time operation:

> stats
> fib 35;
14930352
12.78s
> hfib 35;
14930352
0.25s

1.6.10   Expression References

Expression references provide a kind of mutable data cells which can hold any Pure expression. If you need these, then you're doomed. ;-) However, they can be useful as a last resort when you need to keep track of some local state or interface to the messy imperative world. Pure's references are actually implemented as expression pointers so that you can readily pass them as pointers to a C function which expects a pure_expr** parameter. This may even be useful at times.

ref x
Create a reference pointing to x initially.
put r x
Set a new value x, and return that value.
get r
Retrieve the current value r points to.
unref r
Purge the referenced object and turn the reference into a dangling pointer. (This is used as a sentry on reference objects and shouldn't normally be called directly.)
refp x
Predicate to check for reference values.

Note that manually removing the unref sentry of a reference turns the reference into just a normal pointer object and renders it unusable as a reference. Doing this will also leak memory, so don't!

2   Mathematical Functions

The math.pure module provides Pure's basic math routines. It also defines complex and rational numbers.

2.1   Imports

To use the operations of this module, add the following import declaration to your program:

using math;

2.3   Complex Numbers

We provide both rectangular (x+:y) and polar (r<:a) representations, where (x,y) are the Cartesian coordinates and (r,t) the radius (absolute value) and angle (in radians) of a complex number, respectively. The +: and <: constructors (declared in the prelude) bind weaker than all other arithmetic operators and are non-associative.

The polar representation r<:t is normalized so that r is always nonnegative and t falls in the range -pi<t<=pi.

The constant i is provided to denote the imaginary unit 0+:1.

The arithmetic operations +, * etc. and the equality relations == and ~= work as expected, and the square root, exponential, logarithms, trigonometric and hyperbolic trigonometric functions (see Basic Math Functions) are extended to complex numbers accordingly. These do not rely on complex number support in the C library, but should still conform to IEEE 754 and POSIX, provided that the C library provides a standards-compliant implementation of the basic math functions.

The following operations all work with both the rectangular and the polar representation, promoting real (double, int/bigint) inputs to complex where appropriate. When the result of an operation is again a complex number, it generally uses the same representation as the input (except for explicit conversions). Similarly, mixed rect/polar and polar/rect arithmetic always return a rect result, and mixed complex/real and real/complex arithmetic yields a rect or polar result, depending on what the complex input was.

complex x
Convert any kind of number to a complex value.
polar z, rect z
Convert between polar and rectangular representations.
cis t
Create complex values on the unit circle. Note: To quickly compute exp (x+:y) in polar form, use exp x <: y.
abs z, arg z
Modulus (absolute value) and argument (angle, a.k.a. phase). Note that you can also find both of these in one go by converting to polar form.
re z, im z
Real and imaginary part.
conj z
Complex conjugate.

Examples:

> using math;
> let z = 2^(1/i); z;
0.769238901363972+:-0.638961276313635
> let z = ln z/ln 2; z;
0.0+:-1.0
> abs z, arg z;
1.0,-1.5707963267949
> polar z;
1.0<:-1.5707963267949

Please note that, as the +: and <: constructors bind weaker than the other arithmetic operators, complex numbers must be parenthesized accordingly, e.g.:

> (1+:2)*(3+:4);
-5+:10

2.4   Rational Numbers

Pure's rational numbers are constructed with the exact division operator % (declared in the prelude) which has the same precedence and fixity as the other division operators.

The % operator returns a rational or complex rational for any combination of integer, rational and complex integer/rational arguments, provided that the denominator is nonzero (otherwise it behaves like x div 0, which will raise an exception). Machine int operands are always promoted to bigints, thus normalized rationals always take the form x%y where both the numerator x and the denominator y are bigints. For other numeric operands % works just like /. Rational results are normalized so that the sign is always in the numerator and numerator and denominator are relatively prime. In particular, a rational zero is always represented as 0L%1L.

The usual arithmetic operations and equality/order relations are extended accordingly, as well as the basic math functions and the rounding functions, and will return exact (rational or complex rational) results where appropriate. Rational operations are implemented using the GMP bigint functions where possible, and thus are reasonably fast.

In addition, the module also provides following operations:

rational x

Converts a real or complex value x to a rational or complex rational. Note that the conversion from double values doesn't do any rounding, so it is guaranteed that converting the resulting rational back to a double reconstructs the original value.

Conversely, the int, bigint, double, complex, rect, polar and cis conversion functions are overloaded so that they convert a rational to one of the other number types.

num x, den x
Numerator and denominator of a rational x.

Examples:

> using math;
> 5%7 + 2%3;
29L%21L
> 3%8 - 1%3;
1L%24L
> pow (11%10) 3;
1331L%1000L
> let x = pow 3 (-3); x;
1L%27L
> num x, den x;
1L,27L
> rational (3/4);
3L%4L

Note that doubles can't represent most rationals exactly, so conversion from double to rational will yield funny results in many cases (which are still accurate up to rounding errors). For instance:

> let x = rational (1/17); x;
4238682002231055L%72057594037927936L
> num x/den x;
0.0588235294117647
> double (1%17);
0.0588235294117647

2.5   Semantic Number Predicates

In difference to the syntactic predicates in Primitives, these check whether the given value can be represented as an object of the given target type (up to rounding errors). Note that if x is of syntactic type X, then it is also of semantic type X. Moreover, intvalp x => bigintvalp x => ratvalp x => realvalp x => compvalp x <=> numberp x.

compvalp x
Check for complex values (this is the same as numberp).
realvalp x
Check for real values (im x==0).
ratvalp x
Check for rational values (same as realvalp, except that IEEE 754 infinities and NaNs are excluded).
bigintvalp x
Check for "big" integer values which can be represented as a bigint.
intvalp x
Check for "small" integer values which can be represented as a machine int.

3   Container Types

The standard library provides a variety of efficient container data structures for different purposes. Note that these are all purely functional, i.e., immutable data structures implemented using different flavours of binary trees. Nevertheless operations are performed efficiently, in logarithmic time where possible.

The container types are all implemented as abstract data structures, so client modules shouldn't rely on the internal representation. Each type provides a corresponding type tag (cf. Type Tags in the Pure Manual), as given in the "Data Structure" section of each type, which can be used to match values of the type, e.g.:

shift a::Array = rmfirst a;

All container types implement the equality predicates == and ~= by recursively comparing their members. In addition, the set and bag data structures also implement the other comparison predicates (<, <= etc.) by checking whether one set/bag is a subset/subbag of another.

3.1   Arrays

The array.pure module implements an efficient functional array data structure which allows to access and update individual array members, as well as to add and remove elements at the beginning and end of an array. All these operations are carried out in logarithmic time.

3.1.1   Imports

To use the operations of this module, add the following import declaration to your program:

using array;

3.1.2   Data Structure

Arrays are represented as balanced tree structures of the form Array T, thus Array may be used as a type tag to restrict variables in pattern matching.

The internal data structure T is a binary tree built with the following constructors:

nil
the empty tree.
tip value
a leaf of the tree, holding the given value.
bin balance left right
a nonempty tree with the given balance flag (0 or 1) and left and right subtrees.

(This is for informational purposes only. The tree constructors are private, and client modules must not rely on the internal representation.)

3.1.4   Examples

Import the module:

> using array;

A one-dimensional array:

> let a::Array = array (0.0:0.1..1.0);
> #a; members a;
11
[0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]

Indexing an array works in the usual way, using Pure's ! operator. By virtue of the prelude, slicing an array with !! also works as expected:

> a!5;
0.5
> a!!(3..7);
[0.3,0.4,0.5,0.6,0.7]

Updating a member of an array produces a new array:

> let b::Array = update a 1 2.0;
> members b;
[0.0,2.0,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]

Two-dimensional arrays can be created with array2 from a list of lists:

> let a2::Array = array2 [[i,x | x = [u,v,w]] | i = 1..2];
> members2 a2;
[[(1,u),(1,v),(1,w)],[(2,u),(2,v),(2,w)]]
> a2!(1,2);
2,w
> a2!![(0,1),(1,2)];
[(1,v),(2,w)]
> a2!!(0..1,1..2);
[[(1,v),(1,w)],[(2,v),(2,w)]]

Here's how to convert an array to a Pure matrix:

> matrix $ members a;
{0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
> matrix $ members2 a2;
{(1,u),(1,v),(1,w);(2,u),(2,v),(2,w)}

Converting back from a matrix to an array:

> let b2::Array = array2 $ list2 {(1,u),(1,v),(1,w);(2,u),(2,v),(2,w)};
> members2 b2;
[[(1,u),(1,v),(1,w)],[(2,u),(2,v),(2,w)]]

3.2   Heaps

Heaps allow quick (constant time) access to the smallest member, and to remove the smallest nember and insert new elements in logarithmic time. This implementation does not allow quick update of heap members; if such functionality is required, bags should be used instead (see Sets and Bags).

Heap members must be ordered by the <= predicate. Multiple instances of the same element may be stored in a heap; however, the order in which equal elements are retrieved is not specified.

3.2.1   Imports

To use the operations of this module, add the following import declaration to your program:

using heap;

3.2.2   Data Structure

Heaps are represented as balanced tree structures of the form Heap T, thus Heap may be used as a type tag to restrict variables in pattern matching.

The internal data structure T is a binary tree built with the following constructors:

nil
the empty tree.
bin balance value left right
a nonempty tree holding the given value at its root node, with the given balance flag (0 or 1) and left and right subtrees.

(This is for informational purposes only. The tree constructors are private, and client modules must not rely on the internal representation.)

3.2.4   Examples

> let h::Heap = heap [5,1,3,11,3];
> members h;
[1,3,3,5,11]
> first h;
1
> members $ rmfirst h;
[3,3,5,11]

3.3   Dictionaries

The dict.pure module provides Pure's Dict and Hdict data types based on AVL trees. Dict is an ordered dictionary (assuming an ordered key type), Hdict a hashed dictionary which works with any (mixture of) key types but stores members in an apparently random order.

The used AVL tree algorithm has its origin in the SWI-Prolog implementation of association lists. The original implementation was created by R. A. O'Keefe and updated for SWI-Prolog by Jan Wielemaker. For the original source see http://www.swi-prolog.org.

The port from SWI-Prolog and the deletion stuff (rmfirst, rmlast, delete) missing in the Prolog implementation was provided by Jiri Spitz.

3.3.1   Imports

To use the operations of this module, add the following import declaration to your program:

using dict;

3.3.2   Data Structure

Dictionaries are represented as balanced tree structures of the form Dict T or Hdict T, respectively, thus Dict and Hdict may be used as type tags to restrict variables in pattern matching.

The internal data structure T is an AVL tree built with the following constructors:

nil
the empty tree.
bin key value balance left right
a nonempty tree with given key and value in the root node, where left and right are the left and right subtree, and balance is either 1, 0 or -1, denoting |left|-|right| = 1, 0, or -1, respectively.

(This is for informational purposes only. The tree constructors are private, and client modules must not rely on the internal representation.)

3.3.4   Examples

A normal (ordered) dictionary:

> using dict;
> let d::Dict = dict ["foo"=>77,"bar"=>99.1];
> keys d; vals d; members d;
["bar","foo"]
[99.1,77]
["bar"=>99.1,"foo"=>77]

Indexing a dictionary works in the usual way, using Pure's ! operator. By virtue of the prelude, slicing a dictionary with !! also works as expected:

> d!"foo";
77
> d!!["foo","bar"];
[77,99.1]

A hashed dictionary can be used with any key values, which are stored in a seemingly random order:

> let h::Hdict = hdict [foo=>77,42=>99.1];
> keys h; vals h; members h;
[42,foo]
[99.1,77]
[42=>99.1,foo=>77]
> h!foo;
77
> h!!keys h;
[99.1,77]

3.4   Sets and Bags

The set.pure module implements Pure's Set and Bag (multiset) data types based on AVL trees. Set and bag elements must be ordered, i.e., the predicates ==, < and > must be defined on them. The used AVL tree algorithm has its origin in the SWI-Prolog implementation of association lists and was ported to Pure by Jiri Spitz, see Dictionaries for details.

3.4.1   Imports

To use the operations of this module, add the following import declaration to your program:

using set;

3.4.2   Data Structure

Sets and bags are represented as balanced tree structures of the form Set T or Bag T, respectively, thus Set and Bag may be used as type tags to restrict variables in pattern matching.

The internal data structure T is an AVL tree built with the following constructors:

nil
the empty tree.
bin key balance left right
a nonempty tree with given key (set element) in the root node, where left and right are the left and right subtree, and balance is either 1, 0 or -1, denoting |left|-|right| = 1, 0, or -1, respectively.

(This is for informational purposes only. The tree constructors are private, and client modules must not rely on the internal representation.)

3.4.4   Examples

Some basic set operations:

> let m::Set = set [5,1,3,11,3];
> members m;
[1,3,5,11]
> map (member m) (1..5);
[1,0,1,0,1]
> members $ m+set (3..6);
[1,3,4,5,6,11]
> members $ m-set (3..6);
[1,11]
> members $ m*set (3..6);
[3,5]

The bag operations work in a similar fashion, but note that multiple instances are permitted in this case, and each instance counts as a separate member:

> let m::Bag = bag [5,1,3,11,3];
> members m;
[1,3,3,5,11]
> members $ delete m 3;
[1,3,5,11]
> members $ insert m 1;
[1,1,3,3,5,11]
> members $ m+bag (3..6);
[1,3,3,3,4,5,5,6,11]
> members $ m-bag (3..6);
[1,3,11]
> members $ m*bag (3..6);
[3,5]

4   System Interface

This module offers some useful system routines, straight from the C library, as well as some convenience functions for wrapping these up in Pure. Even the "purest" program needs to do some basic I/O every once in a while, and this module provides the necessary stuff to do just that. The interface is rather minimalistic and preliminary right now, but will probably grow over time, and it already offers many common system functions that Pure programmers might want to use in their scripts.

Most of the following functions are extensively documented in the C library manual pages, so we concentrate on the Pure-specific aspects here.

4.1   Imports

To use the operations of this module, add the following import declaration to your program:

using system;

4.5   Time Functions

time
Reports the current time in seconds since the epoch, 00:00:00 UTC, Jan 1 1970. The result is always a bigint (in fact, the time value is already 64 bit on many OSes nowadays).

The following functions are provided to convert a time value to "broken-down" time or a string. See the ctime(3), gmtime(3), localtime(3), asctime(3) and strftime(3) manual pages for details.

ctime t
Convert a time value as returned by the time function to a string in local time.
gmtime t, localtime t
Convert a time value to UTC or local time in "broken-down" form (a static pointer to a tm struct containing a bunch of int fields) which can then be passed to the asctime and strftime functions, or to int_matrix if you want to convert the data to a matrix; see the example below.
asctime tm, strftime format tm
Format broken-down time as a string. strftime also uses a format string supplied by the user.

For instance:

> let tm = localtime time; tm;
#<pointer 0x7f1292131de0>
> asctime tm;
"Sat Mar 14 01:07:18 2009\n"
> int_matrix 9 tm;
{18,7,1,14,2,109,6,72,0}

We also provide some functions to retrieve wallclock and cpu time which usually offer much better resolution than time.

gettimeofday
Returns wallclock time as seconds since the epoch, like time, but theoretically offers resolutions in the microsec range (actual resolutions vary, but are usually in the msec range for contemporary systems). The result is returned as a double value (which also limits precision). This function may actually be implemented through different system calls, depending on what's available on the host OS.
clock
Returns the current CPU (not wallclock) time since an arbitrary point in the past, as a machine int. The number of "ticks" per second is given by the CLOCKS_PER_SEC constant. Note that this value will wrap around approximately every 72 minutes.
sleep t, nanosleep t
Suspend execution for a given time interval in seconds. sleep takes integer (int/bigint) arguments only and uses the sleep() system function. nanosleep also accepts double arguments and theoretically supports resolutions down to 1 nanosecond (again, actual resolutions vary). This function may actually be implemented through different system calls, depending on what's available on the host OS. Both functions usually return zero, unless the sleep was interrupted by a signal, in which case the time remaining to be slept is returned.

4.6   Process Functions

The following process functions are available on all systems. Please note that, as of Pure 0.28, some additional process-related functions such as fork, kill, wait and waitpid are available in the posix module, which can be used together with the system module if the additional functionality is required.

exit status
Terminate the program with the given status code.
system cmd
Execute a shell command.
execv prog argv, execvp prog argv, execve prog argv envp
Execute a new process. prog denotes the name of the executable to be run, argv the argument vector (which repeats the program name in the first component), and envp a vector of environment strings of the form "var=value". The execv function executes the program prog exactly as given, while execvp also performs a path search. The execve function is like execv, but also specifies an environment to be passed to the process. In either case, the new process replaces the current process. For convenience, both argv and envp can be specified as Pure string lists, which are automatically translated to the raw, NULL-terminated C string vectors (i.e., char**) required by the underlying C functions, using the byte_cstring_pointer function (see Low-Level Operations in the String Functions section).
spawnv mode prog argv, spawnvp mode prog argv, spawnve mode prog argv envp
Spawn a new child process. These work like the corresponding MS Windows functions; on Un*x systems this functionality is implemented using a combination of fork and execv. The arguments are the same as for the execv functions, except that there's an additional mode argument which specifies how the process is to be executed: P_WAIT waits for the process to finish, after which spawnv returns with the exit status of the terminated child process; P_NOWAIT makes spawnv return immediately, returning the process id; and P_OVERLAY causes the child process to replace its parent, just like with execv. (On Windows, there's an additional P_DETACH flag which works like P_NOWAIT but also turns the child process into a background task.)

Examples:

> system "pwd";
/home/ag/svn/pure-lang/trunk/pure/lib
0

> spawnvp P_WAIT "pwd" ["pwd"];
/home/ag/svn/pure-lang/trunk/pure/lib
0
> spawnv P_WAIT "/bin/sh" ["/bin/sh","-c","pwd"];
/home/ag/svn/pure-lang/trunk/pure/lib
0

> exit 0; // This exits from the interpreter with the given status code.

4.7   Basic I/O Interface

Note that this module also defines the standard I/O streams stdin, stderr and stdout as variables on startup. These are ready to be used with the operations described below. Also note that for convenience some of the following routines are actually Pure wrappers, rather than just providing the raw C library routines.

fopen name mode, popen cmd mode
Open a file or a pipe. These take care of closing a file object automagically when it's garbage-collected, and fail (instead of returning a null pointer) in case of error, so that you can provide any desired error handling simply by adding suitable equations.
fclose fp, pclose fp
Close a file or a pipe.
feof fp, ferror fp, clearerr fp
Check the end-of-file and error bits. clearerr clears the error bit.
fflush fp
Flushes the given file (or all open files if fp is NULL).
fgets fp, gets
Pure wrappers for the C fgets and gets functions which handle the necessary buffering automatically.
fget fp
A variation of fgets which slurps in an entire text file at once.
fputs s fp, puts s
Output a string to the given file or stdout, respectively. These are just the plain C functions. Note that puts automatically adds a newline, while fputs doesn't. Hmm.
fread ptr size nmemb fp, fwrite ptr size nmemb fp
Binary read/writes. Here you'll have to manage the buffers yourself. See the corresponding manual pages for details.
fseek fp offset whence, ftell fp, rewind fp
Reposition the file pointer and retrieve its current value. The constants SEEK_SET, SEEK_CUR and SEEK_END can be used for the whence argument of fseek. The call rewind fp is equivalent to fseek fp 0 SEEK_SET (except that the latter also returns a result code). See the corresponding manual pages for details.

Examples:

> puts "Hello, world!";
Hello, world!
14

> let fp = fopen "/etc/passwd" "r";
> fgets fp;
"at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash\n"
> fgets fp;
"avahi:x:103:104:User for Avahi:/var/run/avahi-daemon:/bin/false\n"
> ftell fp;
121L
> rewind fp;
()
> fgets fp;
"at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash\n"

> split "\n" $ fget $ popen "ls *.pure" "r";
["array.pure","dict.pure","getopt.pure","heap.pure","math.pure",
"matrices.pure","prelude.pure","primitives.pure","quasiquote2.pure",
"quasiquote.pure","set.pure","strings.pure","system.pure",""]

C-style formatted I/O is provided through the following wrappers for the C printf and scanf functions. Our wrapper functions take or return a tuple of values, and check these against the format specifiers, so they shouldn't segfault. However, only simple formats derived from %cdioux, %efg, %s and %p are supported right now.

printf format args, fprintf fp format args
Print a formatted string to stdout or the given file, respectively. Normally, these functions return the result of the underlying C routines (number of characters written, or negative on error). However, in case of an abnormal condition in the wrapper function (error in format string, argument mismatch), they will throw an exception.
sprintf format args
Print a formatted string to a buffer and return the result as a string. Unlike the C routine, this wrapper just returns the string result, or a null pointer in case of an error; otherwise, the error handling is the same as with printf and fprintf. The implementation actually uses the C routine snprintf for safety, and a suitable output buffer is provided automatically.
scanf format, fscanf fp format
Read formatted input from stdin or the given file, respectively. These normally return a tuple (or singleton) with the converted values. An exception of the form scanf_error ret, where ret is the tuple of successfully converted values (which may be less than the number of requested input items), is thrown if end-of-file was met or another error occurred while still reading. The handling of other abnormal conditions (e.g., error in format string) is analogous to printf et al. Also note that our implementation here doesn't accept any of the length modifiers used by the C routines. Floating point values will always be read in double precision, so you just specify "e", "g" etc. for these. However, the "assignment suppression" flag "*" is understood; the corresponding items will not be returned.
sscanf s format
This works exactly like fscanf, but input comes from a string (first argument) rather than a file.

Examples:

> do (printf "%s%d\n") [("foo",5),("catch",22)];
foo5
catch22
()
> sscanf "foo 5 22" "%s %d %g";
"foo",5,22.0

4.8   Stat and Friends

stat path
Return information about the given file. This is a simple wrapper around the corresponding system call, see the stat(2) manual page for details. The function returns a tuple with the most important fields from the stat structure, in this order: st_dev, st_ino, st_mode, st_nlink, st_uid, st_gid, st_rdev, st_size, st_atime, st_mtime, st_ctime. Among these, st_mode, st_nlink, st_uid and st_gid are simple machine integers, the rest is encoded as bigints (even on 32 bit platforms).
lstat path
Return information about the given symbolic link (rather than the file it points to). On systems where this function isn't supported (e.g., Windows), lstat is identical to stat.
fstat fp
Return information about the given file object. Same as stat, but here the file is given as a file pointer created with fopen (see Basic I/O Interface above). Note that the corresponding system function actually takes a file descriptor, so the Pure implementation is equivalent to the C call fstat(fileno(fp)). This function might not be supported on all platforms.

For average applications, the most interesting fields are st_mode and st_size, which can be retrieved with stat filename!![2,7]. Note that to facilitate access to the st_mode field, the usual masks and bits for file types (S_IFMT, S_IFREG, etc.) and permissions (S_ISUID, S_ISGID, S_IRWXU, etc.) are defined as constants by this module. Use the command show -g S_* in the interpreter to get a full list of these. Other interesting fields are st_atime, st_mtime and st_ctime, which can be accessed using stat filename!!(8..10). The values of these fields are the times of last access, last modification and creation, respectively, which can be decoded using the appropriate time functions like ctime or strftime, see Time Functions.

Examples:

> stat "/etc/passwd";
64773L,9726294L,33188,1,0,0,0L,1623L,1250373163L,1242692339L,1242692339L
> stat "/etc/passwd"!7;                                // file size
1623L
> strftime "%c" $ localtime $ stat "/etc/passwd"!10;   // creation time
"Tue 19 May 2009 02:18:59 AM CEST"
> sprintf "0%o" $ stat "/etc/passwd"!2 and not S_IFMT; // permissions
"0644"
> stat "/etc/passwd"!2 and S_IFMT == S_IFREG; // this is a regular file
1
> stat "/etc"!2 and S_IFMT == S_IFDIR;        // this is a directory
1

4.12   Regex Matching

The POSIX regex functions (regcomp and regexec) have a somewhat difficult calling sequence, hence we provide a couple of rather elaborate high-level wrapper functions for use in Pure programs.

regex pat cflags s eflags
Compiles and matches a regex in one go, and returns the list of submatches (if any).

The arguments are:

  • pat::string, the regular expression pattern;
  • cflags::int, the compilation flags (bitwise or of any of the flags accepted by regcomp(3));
  • s::string, the subject string to be matched;
  • eflags::int, the matching execution flags (bitwise or of any of the flags accepted by regexec(3)).

Symbolic REG_* constants are provided for the different flag values, see the regcomp(3) manpage for an explanation of these. (In extension to POSIX, Pure also provides the REG_SIZE constant which indicates the size needed for the compiled regex buffer, but this is only for internal use.)

Two particularly important compilation flags (to be included in the cflags argument) are REG_NOSUB, which prevents submatches to be computed, and REG_EXTENDED, which switches regex from "basic" to "extended" regular expressions so that it understands all the regular expression elements of egrep(1) in the pattern argument.

Depending on the flags and the outcome of the operation, the result of this function can take one of the following forms:

  • regerr code msg: This indicates an error during compilation of the pattern (e.g., if there was a syntax error in the pattern). code is the nonzero integer code returned by regcomp, and msg is the corresponding error message string, as returned by regerror. You can redefine the regerr function as appropriate for your application (e.g., if you'd like to print an error message or throw an exception).
  • 0 or 1: Just a truth value indicates whether the pattern matched or not. This will be the form of the result if the REG_NOSUB flag was specified for compilation, indicating that no submatch information is to be computed.
  • 0 (indicating no match), or 1 (indicating a successful match), where the latter value is followed by a tuple of (pos,substr) pairs for each submatch. This will be the form of the result only if the REG_NOSUB flag was not specified for compilation, so that submatch information is available.

Note that, according to POSIX semantics, a return value of 1 does not generally mean that the entire subject string was matched, unless you explicitly tie the pattern to the beginning (^) and end ($) of the string.

If the result takes the latter form, each (pos,substr) pair indicates a portion of the subject string which was matched; pos is the position at which the match starts, and substr is the substring (starting at position pos) which was matched. The first (pos,substr) pair always indicates which portion of the string was matched by the entire pattern, the remaining pairs represent submatches for the parenthesized subpatterns of the pattern, as described on the regcomp(3) manual page. Note that some submatches may be empty (if they matched the empty string), in which case a pair (pos,"") indicates the (nonnegative) position pos where the subpattern matched the empty string. Other submatches may not participate in the match at all, in which case the pair (-1,"") is returned.

The following helper functions are provided to analyze the result returned by regex.

reg_result res
Returns the result of a regex call, i.e., a regerr term if compilation failed, and a flag indicating whether the match was successful otherwise.
reg_info res
Returns the submatch info if any, otherwise it returns ().
reg n info
Returns the nth submatch of the given submatch info, where info is the result of a reg_info call.
regs info
Returns all valid submatches, i.e., the list of all triples (n,p,s) for which reg n == (p,s) with p>=0.

In addition, the following convenience functions are provided to perform global regex searches, to perform substitutions, and to tokenize a string according to a given delimiter regex.

regexg f pat cflags s eflags
Perform a global regular expression search. This routine will scan the entire string for (non-overlapping) instances of the pattern, applies the given function f to the reg_info for each match, and collects all results in a list. Note: Never specify the REG_NOSUB flag with this function, it needs the submatch info.
regexgg f pat cflags s eflags
This works like regexg, but allows overlapping matches.
regsub f pat cflags s eflags
Replaces all non-overlapping instances of a pattern with a computed substitution string. To these ends, the given function f is applied to the reg_info for each match; it should return a string value. The result string is then obtained by concatenating f info for all matches, with the unmatched portions of the string in between.
regsplit pat cflags s eflags
Splits a string into constituents delimited by substrings matching the given pattern.

4.12.1   Basic Examples

Let's have a look at some simple examples:

> let pat = "[[:alpha:]][[:alnum:]]*";
> let s = "1var foo 99 BAR $%&";

Simple match:

> regex pat 0 s 0;
1,1,"var"

Same without match info:

> regex pat REG_NOSUB s 0;
1

Global match, return the list of all matches:

> regexg id pat 0 s 0;
[(1,"var"),(5,"foo"),(12,"BAR")]

Same with overlapping matches:

> regexgg id pat 0 s 0;
[(1,"var"),(2,"ar"),(3,"r"),(5,"foo"),(6,"oo"),(7,"o"),(12,"BAR"),
(13,"AR"),(14,"R")]

Note that id (the identity function) in the examples above can be replaced with an arbitrary function which processes the matches. For instance, if we only want the matched strings instead of the full match info:

> regexg (!1) pat 0 s 0;
["var","foo","BAR"]

4.12.2   Regex Substitutions and Splitting

We can also perform substitutions on matches:

> regsub (sprintf "<%d:%s>") pat 0 s 0;
"1<1:var> <5:foo> 99 <12:BAR> $%&"

Or split a string using a delimiter pattern (this uses an egrep pattern):

> let delim = "[[:space:]]+";
> regsplit delim REG_EXTENDED s 0;
["1var","foo","99","BAR","$%&"]
> regsplit delim REG_EXTENDED "The   quick brown    fox" 0;
["The","quick","brown","fox"]

4.12.3   Empty Matches

Empty matches are permitted, too, subject to the constraint that at most one match is reported for each position (which also prevents looping). And of course an empty match will only be reported if nothing else matches. For instance:

> regexg id "" REG_EXTENDED "foo" 0;
[(0,""),(1,""),(2,""),(3,"")]
> regexg id "o*" REG_EXTENDED "foo" 0;
[(0,""),(1,"oo"),(3,"")]
> regexgg id "o*" REG_EXTENDED "foo" 0;
[(0,""),(1,"oo"),(2,"o"),(3,"")]

This also works when substituting or splitting:

> regsub (cst " ") "" REG_EXTENDED "some text" 0;
" s o m e   t e x t "
> regsub (cst " ") " ?" REG_EXTENDED "some text" 0;
" s o m e  t e x t "
> regsplit "" REG_EXTENDED "some text" 0;
["s","o","m","e"," ","t","e","x","t",""]
> regsplit " ?" REG_EXTENDED "some text" 0;
["s","o","m","e","","t","e","x","t",""]

4.12.4   Submatches

Parenthesized subexpressions in a pattern yield corresponding submatch information, which is useful if we need to retrieve the text matched by a given subexpression. For instance, suppose we want to parse environment lines, such as those returned by the shell's set command. These can be dissected using the following regex:

> const env_pat = "^([^=]+)=(.*)$";
> const env_flags = REG_EXTENDED or REG_NEWLINE;
> regex env_pat env_flags "SHELL=/bin/sh" 0;
1,0,"SHELL=/bin/sh",0,"SHELL",6,"/bin/sh"

Note that we again used an extended regex here, and we also added the REG_NEWLINE flag so that we properly deal with multiline input. The desired information is in the 4th and 6th element of the submatch info, we can retrieve that as follows:

> parse_env s = regexg (\info -> info!3 => info!5) env_pat env_flags s 0;
> parse_env "SHELL=/bin/sh\nHOME=/home/bar\n";
["SHELL"=>"/bin/sh","HOME"=>"/home/bar"]

We can get hold of the real process environment as follows:

> let env = parse_env $ fget $ popen "set" "r";
> #env;
109
> head env;
"BASH"=>"/usr/bin/sh"

Just for the fun of it, let's employ dict to convert this to a dictionary, providing easy random access to the environment variables:

> using dict;
> let env_dict = dict env;
> env_dict!!["SHELL","HOME"];
["/bin/bash","/home/ag"]

5   Getopt

This is a quick-and-dirty replacement for the GNU getopt functions, ported from the Q library.

5.1   Imports

To use the operations of this module, add the following import declaration to your program:

using getopt;

5.2   Operations

This module provides one operation: getopt opts args

The getopt function takes two arguments: opts, a list of option descriptions in the format described below, and args, a list of strings containing the command line parameters to be parsed for options. The result is a pair (opts_return,args_return) where opts_return is a list of options and their values, and args_return is the list of remaining (non-option) arguments. Options are parsed using the rules of GNU getopt(1). If an invalid option is encountered (unrecognized option, missing or extra argument, etc.), getopt throws the offending option string as an exception.

The opts_return value is a list of "hash pairs" opt=>val where opt is the (long) option name (as given by the long_opt field given in the opts argument, see below) and val is the corresponding value (() if none). Note that this format is ready to be passed to the dict or hdict function, cf. Dictionaries, which makes it easy to retrieve option values or check for the presence of options.

The opts argument of getopt must be a list of triples (long_opt, short_opt, flag), where long_opt denotes the long option, short_opt the equivalent short option, and flag is one of the symbolic integer values NOARG, OPTARG and REQARG which specifies whether the option has no argument, an optional argument or a required argument, respectively. Either long_opt or short_opt should be a string value of the form "--abc" or "-x", respectively. Note that since the long_opt value is always used to denote the corresponding option in the opts_return list, you always have to specify a sensible value for that field. If no separate long option name is needed, you can specify the same value as in the short_opt field, or some other convenient value (e.g., an integer) which designates the option. Conversely, to indicate that an option has no short option equivalent, simply specify an empty option string for the short_opt field.

5.3   Examples

> let opts = [("--help", "-h", NOARG),       // no argument
>             ("--version", "", NOARG),      // no short option
>             ("--filename", "-f", REQARG),  // required argument
>             ("--count", "-n", OPTARG)];    // optional argument
> getopt opts ["foo", "-h", "--filename", "bar", "-n0", "baz"];
["--help"=>(),"--filename"=>"bar","--count"=>"0"],["foo","baz"]
> catch invalid_option $ getopt opts ["-h","-v"];
invalid_option "-v"
> getopt opts [foo, "-h", bar];
["--help"=>()],[foo,bar]

As the last example shows, non-option arguments (as well as option values specified as separate arguments) can actually be any values which are just copied to the result lists as is.

6   Index