Author: Albert Gräf <Dr.Graef@t-online.de>
Date: 2009-09-03
Copyright (c) 2009 by Albert Gräf. This document is available under the GNU Free Documentation License. Also see the Copying section for licensing information of the software.
This manual describes the Pure programming language and how to invoke the Pure interpreter program. To read the manual inside the interpreter, just type help at the command prompt. See the Online Help section for details.
There is a companion to this manual, the Pure Library Manual which contains the description of the standard library operations. More information about Pure can be found under the following URLs:
Pure is a modern-style functional programming language based on term rewriting. Pure programs are basically collections of equational rules used to evaluate expressions in a symbolic fashion by reducing them to normal form. An overview of the language can be found in the Pure Overview section below, and subsequent sections discuss most language features in detail.
The Pure interpreter has an LLVM backend which JIT-compiles Pure programs to machine code, hence programs run blazingly fast and interfacing to C modules is easy, while the interpreter still provides a convenient, fully interactive environment for running Pure scripts and evaluating expressions. You can also compile your scripts to standalone executables if you prefer that.
Pure programs (a.k.a. scripts) are just ordinary text files containing Pure code. They must be encoded in UTF-8 (which subsumes 7 bit ASCII), other encodings such as Latin-1 are not supported. A bunch of syntax highlighting files and programming modes for various popular text editors are included in the Pure sources. There's no difference between the Pure programming language and the input language accepted by the interpreter, except that the interpreter also understands some special commands when running in interactive mode; see the Interactive Usage section for details.
(In case you're wondering, the name "Pure" actually refers to the adjective. But you can also write it as "PURE" and take this as a recursive acronym for the "Pure Universal Rewriting Engine".)
The Pure interpreter is invoked as follows:
    pure [options ...] [script ...] [-- args ...]
    pure [options ...] -x script [args ...]
The interpreter accepts various options which are described in more detail below:
If any source scripts are specified on the command line, they are loaded and executed, after which the interpreter exits. Otherwise the interpreter enters the interactive read-eval-print loop, see Running Interactively below. You can also use the -i option to enter the interactive loop (continue reading from stdin) even after processing some source scripts.
Options and source files are processed in the order in which they are given on the command line. Processing of options and source files ends when either the -- or the -x option is encountered. The -x option must be followed by the name of a script to be executed, which becomes the "main script" of the application. In either case, any remaining parameters are passed to the executing script by means of the global argc and argv variables, denoting the number of arguments and the list of the actual parameter strings, respectively. In the case of -x this also includes the script name as argv!0. The -x option is useful, in particular, to turn Pure scripts into executable programs by including a "shebang" like the following as the first line in your main script. (This trick only works with Unix shells, though.)
#!/usr/local/bin/pure -x
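For instance, a hypothetical main script (the filename and its output are illustrative) might print its arguments as follows, using `str` and the `do` and `puts` functions from the prelude and system module:

```pure
#!/usr/local/bin/pure -x
// echo.pure: print the number of arguments, then each argument on its own line
using system;
puts (str argc) $$ do puts argv;
```

Invoking this as `./echo.pure foo bar` would report an argument count of 3, since with -x the script name itself is included as argv!0.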
On startup, the interpreter also defines the version variable, which is set to the version string of the Pure interpreter, and the sysinfo variable, which provides a string identifying the host system. These are useful if parts of your script depend on the particular version of the interpreter and the system it runs on. (Moreover, Pure 0.21 and later also define the variable compiling which indicates whether the program is executed in a batch compilation, see below.)
If available, the prelude script prelude.pure is loaded by the interpreter prior to any other definitions, unless the -n or --noprelude option is specified. The prelude is searched for in the directory specified with the PURELIB environment variable. If the PURELIB variable is not set, a system-specific default is used. Relative pathnames of other source scripts specified on the command line are interpreted relative to the current working directory. In addition, the executed program may load other scripts and libraries via a using declaration in the source, which are searched for in a number of locations, including the directories named with the -I and -L options; see the Declarations and C Interface sections for details.
If the -c option is specified, it forces the interpreter to non-interactive mode (unless -i is specified as well, which overrides -c). Any scripts specified on the command line are then executed as usual, but after execution the interpreter takes a snapshot of the program and compiles it to one of several supported output formats, LLVM assembler (.ll) or bitcode (.bc), native assembler (.s) or object (.o), or a native executable, depending on the output filename specified with -o. If the output filename ends in the .ll extension, an LLVM assembler file is created which can then be processed with the LLVM toolchain. If the output filename is just '-', the assembler file is written to standard output, which is useful if you want to pass the generated code to the LLVM tools in a pipeline. If the output filename ends in the .bc extension, an LLVM bitcode file is created instead.
The .ll and .bc formats are supported natively by the Pure interpreter, no external tools are required to generate these. If the target is an .s, .o or executable file, the Pure interpreter creates a temporary bitcode file on which it invokes the LLVM tools opt and llc to create a native assembler file, and then uses gcc to assemble and link the resulting program (if requested). You can also specify additional libraries to be linked into the executable with the -l option. If the output filename is omitted, it defaults to a.out (a.exe on Windows).
The -c option provides a convenient way to quickly turn a Pure script into a standalone executable which can be invoked directly from the shell. One advantage of compiling your script is that this eliminates the JIT compilation time and thus considerably reduces the startup time of the program. Another reason to prefer a standalone executable is that it lets you deploy the program on systems without a full Pure installation (usually only the runtime library is required on the target system). On the other hand, compiled scripts also have some limitations, mostly concerning the use of the built-in eval function. Please see Batch Compilation in the Caveats and Notes section for details.
The -v64 (or -v0100) verbosity option can be used to have the interpreter print the commands it executes during compilation, see Verbosity and Debugging Options below. When creating an object file, this also prints the suggested linker command (including all the dynamic modules loaded by the script, which also have to be linked in to create a working executable), to which you only have to add the options describing the desired output file.
If the interpreter runs in interactive mode, it repeatedly prompts you for input (which may be any legal Pure code or some special interpreter commands provided for interactive usage), and prints computed results. This is also known as the read-eval-print loop and is described in much more detail in the Interactive Usage section. To exit the interpreter, just type the quit command or the end-of-file character (^D on Unix) at the beginning of the command line.
The interpreter may also source a few additional interactive startup files immediately before entering the interactive loop, unless the --norc option is specified. First .purerc in the user's home directory is read, then .purerc in the current working directory. These are ordinary Pure scripts which can be used to provide additional definitions for interactive usage. Finally, a .pure file in the current directory (containing a dump from a previous interactive session) is loaded if it is present.
When the interpreter is in interactive mode and reads from a tty, unless the --noediting option is specified, commands are read using readline(3) (providing completion for all commands listed under Interactive Usage, as well as for symbols defined in the running program). When exiting the interpreter, the command history is stored in ~/.pure_history, from where it is restored the next time you run the interpreter.
As of Pure 0.22, the interpreter also provides a simple source level debugger when run in interactive mode, see Debugging for details. To enable the interactive debugger, you need to specify the -g option when invoking the interpreter. This option causes your script to run much slower and also disables tail call optimization, so you should only use this option if you want to run the debugger.
The -v option is most useful for debugging the interpreter, or if you are interested in the code your program gets compiled to. The level argument is optional; it defaults to 1. Seven different levels are implemented at this time (one more bit is reserved for future extensions). For most purposes, only the first two levels will be useful for the average Pure programmer; the remaining levels are most likely to be used by the Pure interpreter developers.
These values can be or'ed together, and, for convenience, can be specified in either decimal, hexadecimal or octal. Thus 0xff or 0777 always gives you full debugging output (which isn't likely to be used by anyone but the Pure developers). Some useful flag combinations for experts are (in octal) 007 (echo definitions along with de Bruijn indices and matching automata), 011 (definitions and assembler code) and 021 (parser debugging output along with parsed definitions).
Note that the -v option is only applied after the prelude has been loaded. If you want to debug the prelude, use the -n option and specify the prelude.pure file explicitly on the command line. Verbose output is also suppressed for modules imported through a using clause. As a remedy, you can use the interactive show command (see the Interactive Usage section) to list definitions along with additional debugging information.
The interpreter may source various files during its startup. These are:
Various aspects of the interpreter can be configured through the following shell environment variables:
Pure is a fairly simple yet powerful language. Programs are basically collections of rewriting rules and expressions to be evaluated. For convenience, it is also possible to define global variables and constants, and for advanced uses Pure offers macro functions as a kind of preprocessing facility. These are all described below and in the following sections.
Here's a first example which demonstrates how to define a simple recursive function in Pure, entered interactively in the interpreter (note that the '>' symbol at the beginning of each input line is the interpreter's default command prompt):
    > // my first Pure example
    > fact 0 = 1;
    > fact n::int = n*fact (n-1) if n>0;
    > let x = fact 10; x;
    3628800
Pure is a free-format language; whitespace is insignificant, except if it serves to delimit other symbols. Hence, as shown above, definitions and expressions at the toplevel have to be terminated with a semicolon, even in interactive mode.
Comments have the same syntax as in C++ (using // for line-oriented and /* ... */ for multiline comments; the latter may not be nested). Lines beginning with #! are treated as comments, too; as already discussed above, on Unix-like systems this allows you to add a "shebang" to your main script in order to turn it into an executable program.
There are a few reserved keywords which cannot be used as identifiers:
case const def else end extern if infix infixl infixr let namespace nonfix of otherwise outfix postfix prefix private public then using when with
The customary notations for identifiers, numbers and strings are all provided. In addition, Pure also allows you to define your own operator symbols. Pure fully supports Unicode, so that you can write your programs in almost any language and make good use of the special symbols in the Unicode character set, provided that you encode your scripts in UTF-8. To keep this simple, besides the ASCII punctuation characters, Pure also considers the following code points in the Unicode repertoire as punctuation: U+00A1 through U+00BF, U+00D7, U+00F7, and U+20D0 through U+2BFF. This comprises the special symbols in the Latin-1 repertoire, as well as the Combining Diacritical Marks for Symbols, Letterlike Symbols, Number Forms, Arrows, Mathematical Symbols, Miscellaneous Technical Symbols, Control Pictures, OCR, Enclosed Alphanumerics, Box Drawing, Blocks, Geometric Shapes, Miscellaneous Symbols, Dingbats, Miscellaneous Mathematical Symbols A, Supplemental Arrows A, Supplemental Arrows B, Miscellaneous Mathematical Symbols B, Supplemental Mathematical Operators, and Miscellaneous Symbols and Arrows. This should cover almost everything you'd ever want to use in an operator symbol. All other extended Unicode characters are effectively treated as "letters" which can be used as identifier constituents.
On the surface, Pure is quite similar to other modern functional languages like Haskell and ML. But under the hood it is a much more dynamic language, more akin to Lisp. In particular, Pure is dynamically typed, so functions can be fully polymorphic and you can add to the definition of an existing function at any time. For instance, we can extend the example above to make the fact function work with floating point numbers, too:
    > fact 0.0 = 1.0;
    > fact n::double = n*fact (n-1) if n>0;
    > fact 10.0;
    3628800.0
    > fact 10;
    3628800
Note the n::double construct on the left-hand side of the second equation, which means that the equation is only to be applied for (double precision) floating point values n. This construct is also called a "type tag" in Pure parlance, which is actually a simple form of pattern matching (see below). Similarly, our previous definition at the beginning of this section employed the int tag to indicate that the n parameter is an integer value. The int and double types are built into the Pure language, but it is also possible to introduce your own type tags for user-defined data structures. This will be explained in more detail under Type Tags in the Rule Syntax section below.
Expressions are generally evaluated from left to right, innermost expressions first, i.e., using call by value semantics. Pure also has a few built-in special forms (most notably, conditional expressions, the short-circuit logical connectives && and ||, the sequencing operator $$, the lazy evaluation operator &, and the quote) which take some or all of their arguments unevaluated, using call by name.
Like in Haskell and ML, functions are often defined by pattern matching, i.e., the left-hand side of a definition is compared to the target expression, binding the variables in the pattern to their actual values accordingly:
    > foo (bar x) = x-1;
    > foo (bar 99);
    98
Due to its term rewriting semantics, Pure goes beyond most other functional languages in that it can do symbolic evaluations just as well as "normal" computations:
    > square x = x*x;
    > square 4;
    16
    > square (a+b);
    (a+b)*(a+b)
Leaving aside the built-in support for some common data structures such as numbers and strings, all the Pure interpreter really does is evaluating expressions in a symbolic fashion, rewriting expressions using the equations supplied by the programmer, until no more equations are applicable. The result of this process is called a normal form which represents the "value" of the original expression. Keeping with the tradition of term rewriting, there's no distinction between "defined" and "constructor" function symbols in Pure. Consequently, any function symbol or operator can be used anywhere on the left-hand side of an equation, and may act as a constructor symbol if it happens to occur in a normal form term. This enables you to work with algebraic rules like associativity and distributivity in a direct fashion:
    > (x+y)*z = x*z+y*z; x*(y+z) = x*y+x*z;
    > x*(y*z) = (x*y)*z; x+(y+z) = (x+y)+z;
    > square (a+b);
    a*a+a*b+b*a+b*b
Note that, in contrast, languages like Haskell and ML always enforce the so-called "constructor discipline", which stipulates that only pure constructor symbols (without any defining equations) may occur as a subterm on the left-hand side of a definition. Thus equational definitions like the above are forbidden in these languages. In Pure such definitions are just normal business, which makes the language directly usable as a tool for symbolic algebra.
Taking a look at the above examples, you might have been wondering how the Pure interpreter figures out what the parameters (a.k.a. "variables") in an equation are. This is quite obvious in rules involving just variables and special operator symbols, such as (x+y)*z = x*z+y*z. However, what about an equation like foo (foo bar) = bar? Since most of the time we don't declare any symbols in Pure, how does the interpreter know that foo is a literal function symbol here, while bar is a variable?
The answer is that the interpreter considers the different positions in the left-hand side expression of an equation. Basically, a Pure expression is just a tree formed by applying expressions to other expressions, with the atomic subexpressions like numbers and symbols at the leaves of the tree. (This is even true for infix expressions like x+y, since in Pure these are always equivalent to a function application of the form (+) x y which has the atomic subterms (+), x and y at its leaves.)
Now the interpreter divides the leaves of the expression tree into "head" (or "function") and "parameter" (or "variable") positions based on which leaves are leftmost in a function application or not. Thus, in an expression like f x y z, f is in the head or function position, while x, y and z are in parameter or variable positions. (Note that in an infix expression like x+y, (+) is the head symbol, not x, as the expression is really parsed as (+) x y, see above.)
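This equivalence can be seen directly in the interpreter, since a parenthesized operator can be applied like any other function:

```pure
> (+) 2 3;   // the infix expression 2+3 is just sugar for this curried application
5
```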
Identifiers in head positions are taken as literal function symbols by the interpreter, while identifiers in variable positions denote, well, variables. We also refer to this convention as the head = function rule. It is quite intuitive and lets us get away without declaring the variables in equations. (There are some corner cases not covered here, however. In particular, Pure allows you to declare special constant symbols, if you need a symbol to be recognized as a literal even if it occurs in a variable position. This is done by means of a nonfix declaration, see Symbol Declarations for details.)
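To illustrate the head = function rule and the effect of a nonfix declaration (foo, bar and baz are just symbols made up for this example):

```pure
> foo (foo bar) = bar;  // foo is in head position; bar is in a variable position
> foo (foo 99);         // hence bar gets bound to 99 here
99
> nonfix baz;           // declare baz as a literal constant symbol
> foo baz = 0;          // now baz on the left-hand side matches only baz itself
> foo baz; foo 99;
0
foo 99
```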
The Pure language provides built-in support for machine integers (32 bit), bigints (implemented using GMP), floating point values (double precision IEEE 754), character strings (UTF-8 encoded) and generic C pointers (these don't have a syntactic representation in Pure, though, so they need to be created with external C functions). Truth values are encoded as machine integers (as you might expect, zero denotes false and any non-zero value true). Pure also provides some built-in support for lists and matrices, although most of the corresponding operations are actually defined in the prelude.
Expressions consist of the following elements:
The usual C'ish notations for integers (decimal: 1000, hexadecimal: 0x3e8, octal: 01750), floating point values and double-quoted strings are all provided, although the Pure syntax differs in some minor ways, as discussed in the following. First, there is a special notation for denoting bigints. Integer constants that are too large to fit into machine integers are promoted to bigints automatically. Moreover, integer literals immediately followed by the uppercase letter L are always interpreted as bigint constants, even if they fit into machine integers. This notation is also used when printing bigint constants.
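For instance, entered at the interpreter prompt:

```pure
> 1000, 0x3e8, 01750;  // the same machine int in decimal, hexadecimal and octal
1000,1000,1000
> 1000L;               // a bigint, forced with the L suffix
1000L
> 0xffffffffff;        // too large for a machine int, promoted to a bigint
1099511627775L
```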
Second, character escapes in Pure strings have a more flexible syntax borrowed from the author's Q language, which provides notations to specify any Unicode character. In particular, the notation \n, where n is an integer literal written in decimal (no prefix), hexadecimal (0x prefix) or octal (0 prefix) notation, denotes the Unicode character (code point) #n. Since these escapes may consist of a varying number of digits, parentheses may be used for disambiguation purposes; thus, e.g. "\(123)4" denotes character #123 followed by the character 4. The usual C-like escapes for special non-printable characters such as \n are also supported. Moreover, you can use symbolic character escapes of the form \&name;, where name is any of the XML single character entity names specified in the XML Entity definitions for Characters. Thus, e.g., "\&copy;" denotes the copyright character (code point 0x000A9).
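A few examples (the results follow directly from the escape notations described above):

```pure
> "\65\66\67";      // decimal character escapes for code points 65, 66, 67
"ABC"
> "\(123)4";        // parentheses disambiguate: character #123 ('{'), then '4'
"{4"
> "\&copy; 2009";   // XML entity name for the copyright sign
"© 2009"
```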
For convenience, Pure also provides you with a limited means to extend the syntax of the language with special operator and constant symbols by means of a corresponding fixity declaration, as discussed in section Symbol Declarations. Besides the usual infix, prefix and postfix operators, Pure also provides outfix (bracket) and nonfix (constant) symbols. (Nonfix symbols actually work more or less like ordinary identifiers, but the nonfix attribute tells the compiler that when such a symbol occurs on the left-hand side of an equation, it is always to be interpreted as a literal constant, cf. Parameters in Equations.)
Operator and constant symbols may take the form of an identifier or a sequence of punctuation characters. They must always be declared before use. Once declared, they are always special, and can't be used as ordinary identifiers any more. However, like in Haskell, by enclosing an operator in parentheses, such as (+) or (not), you can turn it into an ordinary function symbol. Also, operators and constant symbols can be qualified with a namespace just like normal identifiers.
Pure's basic list syntax is the same as in Haskell, thus [] is the empty list and x:xs denotes a list with head element x and tail list xs. (The infix constructor symbol ':' is declared in the prelude.) The usual syntactic sugar for list values in brackets is provided, thus [x,y,z] is exactly the same as x:y:z:[].
There's also a way to denote arithmetic sequences such as 1..5, which denotes the list [1,2,3,4,5]. (Haskell users should note the missing brackets. In difference to Haskell, Pure doesn't use any special syntax for arithmetic sequences, the '..' symbol is just an ordinary infix operator declared and defined in the prelude.) Sequences with arbitrary stepsizes can be written by denoting the first two sequence elements using the ':' operator, as in 1.0:1.2..3.0. (To prevent unwanted artifacts due to rounding errors, the upper bound in a floating point sequence is always rounded to the nearest grid point. Thus, e.g., 0.0:0.1..0.29 actually yields [0.0,0.1,0.2,0.3], as does 0.0:0.1..0.31.)
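For instance:

```pure
> 1..5;
[1,2,3,4,5]
> 1:3..11;          // stepsize 2, inferred from the first two elements
[1,3,5,7,9,11]
> 0.0:0.1..0.29;    // the float upper bound is rounded to the nearest grid point
[0.0,0.1,0.2,0.3]
```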
Pure also offers matrices, a kind of array, as a built-in data structure which provides efficient storage and element access. These work more or less like their Octave/MATLAB equivalents, but are written using curly braces instead of brackets. In this notation, commas separate the columns of a matrix and semicolons its rows. In fact, the {...} construct is rather general, allowing you to construct new matrices from individual elements and/or submatrices, provided that all dimensions match up. E.g., {{1;3},{2;4}} is another way to write a 2x2 matrix in "column-major" form (however, internally all matrices are stored in C's row-major format).
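As a brief sketch (element access with `!` and zero-based (row,column) index pairs, as defined in the prelude):

```pure
> let m = {1,2,3;4,5,6};   // a 2x3 matrix: 2 rows, 3 columns
> m!(1,0);                 // zero-based (row,column) indexing
4
> {m;m};                   // submatrices splice together, here stacked as rows
{1,2,3;4,5,6;1,2,3;4,5,6}
```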
Note that, while the [...] and {...} constructs look superficially similar, they work in very different ways. The former is just syntactic sugar for a corresponding constructor term and can thus be used as a pattern on the left-hand side of an equation. In contrast, the latter is a built-in operation which creates objects of a special matrix type. These can not be used as patterns (instead, matrix values can be matched using the special matrix type tag, see the Rule Syntax section for details).
If the interpreter was built with support for the GNU Scientific Library (GSL) then both numeric and symbolic matrices are available. The former are thin wrappers around GSL's homogeneous arrays of double, complex double or (machine) int matrices, while the latter can contain any mixture of Pure expressions. Pure will pick the appropriate type for the data at hand. If a matrix contains values of different types, or Pure values which cannot be stored in a numeric matrix, then a symbolic matrix is created instead (this also includes the case of bigints, which are considered as symbolic values as far as matrix construction is concerned). If the interpreter was built without GSL support then symbolic matrices are the only kind of matrices supported by the interpreter.
More information about matrices and corresponding examples can be found in the Examples section below.
Pure provides both list and matrix comprehensions as a convenient means to construct list and matrix values from a "template" expression and one or more "generator" and "filter" clauses. The former bind a pattern to values drawn from a list or matrix, the latter are just predicates determining which generated elements should actually be added to the result. Both list and matrix comprehensions are in fact syntactic sugar for a combination of nested lambdas, conditional expressions and "catmaps" (a collection of operations which combine list or matrix construction and mapping a function over a list or matrix, defined in the prelude), but they are often much easier to write.
Matrix comprehensions work pretty much like list comprehensions, but produce matrices instead of lists. List generators in matrix comprehensions alternate between row and column generation so that most common mathematical abbreviations carry over quite easily. Examples of both kinds of comprehensions can be found in the Examples section below.
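For instance:

```pure
> [2*x | x = 1..5];                 // list comprehension with one generator
[2,4,6,8,10]
> [x,y | x = 1..3; y = 1..3; x<y];  // two generators and a filter clause
[(1,2),(1,3),(2,3)]
> {i*j | i = 1..3; j = 1..3};       // matrix comprehension: i yields rows, j columns
{1,2,3;2,4,6;3,6,9}
```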
As in other modern FPLs, function applications are written simply as juxtaposition (i.e., in "curried" form) and associate to the left. This means that in fact all functions only take a single argument. Multi-argument functions are represented as chains of single-argument functions. For instance, in f x y = (f x) y first the function f is applied to the first argument x, yielding the function f x which in turn gets applied to the second argument y. This makes it possible to derive new functions from existing ones using partial applications which only specify some but not all arguments of a function. For instance, taking the max function from the prelude as an example, max 0 is the function which, for a given x, returns x itself if it is nonnegative and zero otherwise. This works because (max 0) x = max 0 x is the maximum of 0 and x.
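For example (clamp is just a name chosen here for the partial application):

```pure
> max 0 3, max 0 (-5);    // the binary max function from the prelude
3,0
> let clamp = max 0;      // partial application: a new unary function
> map clamp [-2,5,-1,7];  // cuts off all negative values
[0,5,0,7]
```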
Operator applications are written using prefix, postfix, outfix or infix notation, as the declaration of the operator demands, but are just ordinary function applications in disguise. As already mentioned, enclosing an operator in parentheses turns it into an ordinary function symbol, thus x+y is exactly the same as (+) x y. For convenience, partial applications of infix operators can also be written using so-called operator sections. A left section takes the form (x+) which is equivalent to the partial application (+) x. A right section takes the form (+x) and is equivalent to the term flip (+) x. (This uses the flip combinator from the prelude which is defined as flip f x y = f y x.) Thus (x+) y is equivalent to x+y, while (+x) y reduces to y+x. For instance, (1/) denotes the reciprocal and (+1) the successor function. (Note that, in contrast, (-x) always denotes an application of unary minus; the section (+-x) can be used to indicate a function which subtracts x from its argument.)
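For instance:

```pure
> map (+1) (1..5);    // right section: the successor function
[2,3,4,5,6]
> map (1/) [1,2,4];   // left section: the reciprocal
[1.0,0.5,0.25]
> (+-2) 10;           // subtracts 2 from its argument; (-2) would be unary minus
8
```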
Expressions are parsed according to the following precedence rules: Lambda binds most weakly, followed by when, with and case, followed by conditional expressions (if-then-else), followed by the simple expressions, i.e., all other kinds of expressions involving operators, function applications, constants, symbols and other primary expressions. Precedence and associativity of operator symbols are given by their declarations (cf. Symbol Declarations), and function application binds stronger than all operators. Parentheses can be used to override default precedences and associativities as usual.
The common operator symbols like +, -, *, / etc. are all declared at the beginning of the prelude, see the Pure Library Manual for a list of these. Arithmetic and relational operators mostly follow C conventions. However, out of necessity (!, & and | are used for other purposes in Pure) the logical and bitwise operations, as well as the negated equality predicates are named a bit differently: ~, && and || denote logical negation, conjunction and disjunction, while the corresponding bitwise operations are named not, and and or. Moreover, following these conventions, inequality is denoted ~=. Also note that && and || are special forms which are evaluated in short-circuit mode like in C (see below), whereas the bitwise connectives receive their arguments using call-by-value, just like the other arithmetic operations.
As already mentioned, some operations are actually implemented as special forms. In particular, the conditional expression if x then y else z is a special form with call-by-name arguments y and z; only one of the branches is actually evaluated, depending on the value of x. Similarly, the logical connectives && and || evaluate their operands in short-circuit mode just like in C. Thus, e.g., x&&y immediately becomes false if x evaluates to false, without ever evaluating y. Also note that && and || only work with machine int arguments and always raise an exception in case of argument mismatch, so they cannot be used for symbolic evaluations. In contrast, the bitwise connectives (not, and, or) use the standard call-by-value argument passing, and the prelude defines these operations only for machine int and bigint operands, so that they can be used in symbolic expressions like a and b or c.
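To summarize the logical and bitwise operations (all entered at the interpreter prompt):

```pure
> 1 && 0, 1 || 0, ~1;      // short-circuit logical connectives and logical negation
0,1,0
> 5 and 3, 5 or 3, not 5;  // bitwise: 101 and 011 = 001, 101 or 011 = 111, complement
1,7,-6
> 1~=2;                    // inequality
1
> a and b or c;            // the bitwise connectives also work symbolically
a and b or c
```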
The sequencing operator $$ evaluates its left operand, immediately throws the result away and then goes on to evaluate the right operand which gives the result of the entire expression. This operator is useful to write imperative-style code such as the following prompt-input interaction:
    > using system;
    > puts "Enter a number:" $$ scanf "%g";
    Enter a number:
    21
    21.0
We mention in passing here that the same effect can be achieved with a when clause, which also allows you to execute a function solely for its side-effects and just ignore the return value:
    > scanf "%g" when puts "Enter a number:" end;
    Enter a number:
    21
    21.0
The & operator does lazy evaluation. This is the only postfix operator defined in the standard prelude, written as x&, where x is an arbitrary Pure expression. It turns its operand into a kind of parameterless anonymous closure, deferring its evaluation. These kinds of objects are also commonly known as thunks or futures. When the value of a future is actually needed (during pattern-matching, or when the value becomes an argument of a C call), it is evaluated automatically and gets memoized, i.e., the computed result replaces the thunk so that it only has to be computed once. Futures are useful to implement all kinds of lazy data structures in Pure, in particular: lazy lists a.k.a. streams. A stream is simply a list with a thunked tail, which allows it to be infinite. The Pure prelude defines many functions for creating and manipulating these kinds of objects; further details and examples can be found in the Examples section below.
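As a brief sketch of the idiom (nats is just a name chosen here; take and list are prelude functions):

```pure
> nats n = n : nats (n+1) &;  // a stream: a list with a thunked tail
> let ns = nats 1;            // the infinite stream 1, 2, 3, ...
> ns!99;                      // indexing forces only as much of the stream as needed
100
> list (take 5 ns);           // take yields a stream; list forces it into a list
[1,2,3,4,5]
```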
Last but not least, the special form quote quotes an expression, i.e., quote x (or, equivalently, 'x) returns just x itself without evaluating it. The prelude also provides a function eval which can be used to evaluate a quoted expression at a later time. For instance:
    > let x = '(2*42+2^12); x;
    2*42+2^12
    > eval x;
    4180.0
The quote also inhibits evaluation inside matrix values, including the "splicing" of embedded submatrices:
    > '{1,2+3,2*3};
    {1,2+3,2*3}
    > '{1,{2,3},4};
    {1,{2,3},4}
The quote should be well familiar to Lisp programmers. However, there are some notable differences, please see The Quote in the Caveats and Notes section for details and more examples.
At the toplevel, a Pure program basically consists of rewriting rules (which are used to define functions and macros), constant and variable definitions, and expressions to be evaluated.
A few remarks about the scope of identifiers and other symbols are in order here. Like most modern functional languages, Pure uses lexical or static binding for local functions and variables. What this means is that the binding of a local name is completely determined at compile time by the surrounding program text, and does not change as the program is being executed. In particular, if a function returns another (anonymous or local) function, the returned function captures the environment it was created in, i.e., it becomes a (lexical) closure. For instance, the following function, when invoked with a single argument x, returns another function which adds x to its argument:
    > foo x = bar with bar y = x+y end;
    > let f = foo 99; f;
    bar
    > f 10, f 20;
    109,119
This works the same no matter what other bindings of x may be in effect when the closure is invoked:
    > let x = 77; f 10, (f 20 when x = 88 end);
    109,119
Global bindings of variable and function symbols work a bit differently, though. Like many languages which are to be used interactively, Pure binds global symbols dynamically, so that the bindings can be changed easily at any time during an interactive session. This is mainly a convenience for interactive usage, but works the same no matter whether the source code is entered interactively or being read from a script, in order to ensure consistent behaviour between interactive and batch mode operation.
So, for instance, you can easily bind a global variable to a new value by just entering a corresponding let command:
    > foo x = c*x;
    > foo 99;
    c*99
    > let c = 2; foo 99;
    198
    > let c = 3; foo 99;
    297
This works pretty much like global variables in imperative languages, but note that in Pure the value of a global variable can only be changed with a let command at the toplevel. Thus referential transparency is unimpaired; while the value of a global variable may change between different toplevel expressions, it will always take the same value in a single evaluation.
Similarly, you can also add new equations to an existing function at any time:
    > fact 0 = 1;
    > fact n::int = n*fact (n-1) if n>0;
    > fact 10;
    3628800
    > fact 10.0;
    fact 10.0
    > fact 1.0 = 1.0;
    > fact n::double = n*fact (n-1) if n>1;
    > fact 10.0;
    3628800.0
    > fact 10;
    3628800
(In interactive mode, it is even possible to completely erase a definition, see section Interactive Usage for details.)
So, while the meaning of a local symbol never changes once its definition has been processed, toplevel definitions may well evolve while the program is being processed, and the interpreter will always use the latest definitions at a given point in the source when an expression is evaluated. This means that, even in a script file, you have to define all symbols needed in an evaluation before entering the expression to be evaluated.
Basically, the same rule syntax is used in all kinds of global and local definitions. However, some constructs (specifically, when, let, const and def) use a restricted rule syntax where no guards or multiple left-hand and right-hand sides are permitted. When matching against a function or macro call, or the subject term in a case expression, the rules are always considered in the order in which they are written, and the first matching rule (whose guard evaluates to a nonzero value, if applicable) is picked. (Again, the when construct is treated differently, because each rule is actually a separate definition.)
The left-hand side of a rule is a special kind of simple expression, called a pattern. Patterns consist of function and operator applications as well as any of the "atomic" expression types (symbols, numbers, strings and list values). Not permitted are any of the special expression types (lambda, case, when, with, conditional expressions, as well as list and matrix comprehensions). For technical reasons, the current implementation also forbids matrix values in patterns, but it is possible to match a matrix value as a whole using the matrix type tag, see below.
Patterns must not contain repeated variables, i.e., rules must be "left-linear". The only exception to this rule is the anonymous variable '_' which matches an arbitrary value (independently for all occurrences) without binding a variable symbol.
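So a definition like foo x x = x is rejected; to check that two arguments are equal, you use distinct variables and a guard instead. The anonymous variable, in contrast, may occur any number of times. A little sketch (the function names are made up for illustration):

```pure
// foo x x = x;        // error: patterns must be left-linear
foo x y = x if x===y;  // use a guard to check for syntactic equality

// '_' may be repeated; each occurrence matches independently
first (x,_) = x; second (_,y) = y;
```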
Patterns may contain the following special elements which are not permitted in right-hand side expressions: "as" patterns of the form variable @ pattern, which bind the given variable to the value matched by the subpattern, and type tags of the form variable :: tag, which restrict the kind of value the variable can match.
Syntactically, both "as" patterns and type tags are primary expressions. If the subpattern of an "as" pattern is a compound expression, it must be parenthesized. For instance, the following function duplicates the head element of a list:
foo xs @ (x:_) = x:xs;
Note that if you accidentally forget the parentheses around the subpattern x:_, you still get a syntactically correct definition:
foo xs @ x:_ = x:xs;
But this gets parsed as (foo xs@x):_ = x:xs (despite the spacing which is ignored by the parser), which is most certainly not what you want. It is thus a good idea to just always enclose the subpattern with parentheses in order to prevent such glitches.
Another potential pitfall is that the notation foo::bar is also used to denote "qualified symbols" in Pure, cf. Namespaces. Usually this will be resolved correctly, but if foo happens to also be a valid namespace then most likely you'll get an error message about an undeclared symbol. You can always work around this by adding spaces around the '::' symbol, as in foo :: bar. Spaces are never permitted in qualified symbols, so this makes it clear that the construct denotes a type tag.
Type tags are really nothing but a special form of "as" patterns which restrict variables to given data "domains". Like Lisp, Pure is essentially a typeless language and doesn't really have a notion of "data types"; all data belongs to the same universe of terms. Thus the only way to restrict the type of values which can be matched by a variable is to provide an "as" pattern which matches objects of the desired domain and nothing else. However, in the special case of the built-in types (machine and big integers, double values, strings, matrices and pointers) there is no way to spell out all "constructors", as there are infinitely many (or none, as in the case of matrix and pointer which are constructed and inspected using special primitives, but are otherwise "opaque" at the Pure level). As a remedy, an appropriate type tag makes it possible to match these values.
In order to generalize this to user-defined domains of data, Pure adopts the convention that any other tagged variable x::bar is just a shorthand for the "as" pattern x@(bar _). Thus a custom data type can be represented by designating a special constructor symbol (bar in the above example) which takes the actual data as its single argument. This works the same no matter what the internal structure of the data is (which in many cases you wouldn't want to depend on anyway, for the sake of data abstraction).
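For instance, under this convention the following two rules are exactly equivalent (bar being the constructor symbol of some hypothetical data type):

```pure
foo x::bar    = x;
foo x@(bar _) = x;   // the "as" pattern form of the rule above
```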
Note that this is merely a convention, but it works reasonably well and makes up for the fact that Pure doesn't support data types at the language level. For instance, we might represent points in the plane using a constructor symbol Point which gets applied to pairs of coordinates. We equip this data type with an operation point to construct a point from its coordinates, and two operations xcoord and ycoord to retrieve the coordinates:
    point x y = Point (x,y);
    xcoord (Point (x,y)) = x;
    ycoord (Point (x,y)) = y;
Now we might define a function translate which shifts the coordinates of a point by a given amount in the x and y directions as follows:
translate (x,y) p::Point = point (xcoord p+x) (ycoord p+y);
Note the use of Point as a type tag on the p variable. By these means, we can ensure that the argument is actually an instance of the point data type, without knowing anything about the internal representation. We can use these definitions as follows:
    > let p::Point = point 3 3;
    > p; translate (1,2) p;
    Point (3,3)
    Point (4,5)
Some data types in Pure's standard library (specifically, the container data types) are actually represented in this fashion, see the Pure Library Manual for details.
The most general type of rule, used in function definitions and case expressions, consists of a left-hand side pattern, a right-hand side expression and an optional guard. The left-hand side of a rule can be omitted if it is the same as for the previous rule. This provides a convenient means to write out a collection of equations for the same left-hand side which discriminates over different conditions:
    lhs = rhs if guard;
        = rhs if guard;
        ...
        = rhs otherwise;
For instance:
    fact n = n*fact (n-1) if n>0;
           = 1 otherwise;
Pure also allows a collection of rules with different left-hand sides but the same right-hand side(s) to be abbreviated as follows:
    lhs |
    ...
    lhs = rhs;
This is useful if you need different specializations of the same rule which use different type tags on the left-hand side variables. For instance:
    fact n::int    |
    fact n::double |
    fact n = n*fact(n-1) if n>0;
           = 1 otherwise;
In fact, the left-hand sides don't have to be related at all, so that you can also write something like:
foo x | bar y = x*y;
However, this construct is most useful when using an "as" pattern to bind a common variable to a parameter value after checking that it matches one of several possible argument patterns (which is slightly more efficient than using an equivalent type-checking guard). E.g., the following definition binds the xs variable to the parameter of foo, if it is either the empty list or a list starting with an integer:
foo xs@[] | foo xs@(_::int:_) = ... xs ...;
The same construct also works in case expressions, which is convenient if different cases should be mapped to the same value, e.g.:
    case ans of
      "y" | "Y" = 1;
      _ = 0;
    end;
Sometimes it is useful if local definitions (when and with) can be shared by the right-hand side and the guard of a rule. This can be done by placing the local definitions behind the guard, as follows (we only show the case of a single when clause here, but of course there may be any number of when and with clauses behind the guard):
lhs = rhs if guard when defns end;
Note that this is different from the following, which indicates that the definitions only apply to the guard but not the right-hand side of the rule:
lhs = rhs if (guard when defns end);
Conversely, definitions placed before the guard only apply to the right-hand side but not the guard (no parentheses are required in this case):
lhs = rhs when defns end if guard;
An example showing the use of a local variable binding spanning both the right-hand side and the guard of a rule is the following quadratic equation solver, which returns the (real) solutions of the equation x^2+p*x+q = 0 if the discriminant d = p^2/4-q is nonnegative:
    > using math;
    > solve p q = -p/2+sqrt d,-p/2-sqrt d if d>=0 when d = p^2/4-q end;
    > solve 4 2; solve 2 4;
    -0.585786437626905,-3.41421356237309
    solve 2 4
Note that the above definition leaves the case of a negative discriminant undefined.
As already mentioned, when, let and const use a simplified kind of rule syntax which just consists of a left-hand and a right-hand side separated by the equals sign. In this case the meaning of the rule is to bind the variables in the left-hand side of the rule to the corresponding subterms of the value of the right-hand side. This is also called a pattern binding.
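For instance, the following (hypothetical) session binds x to the head and xs to the tail of a list value in one go:

```pure
> let x:xs = 1..5;   // pattern binding against the list [1,2,3,4,5]
> x, xs;
1,[2,3,4,5]
```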
Guards or multiple left-hand or right-hand sides are not permitted in these rules. However, it is possible to omit the left-hand side if it is just the anonymous variable '_' by itself, indicating that you don't care about the result. The right-hand side is still evaluated, if only for its side-effects, which is handy, e.g., for adding debugging statements to your code. For instance, here is a variation of the quadratic equation solver which also prints the discriminant after it has been computed:
    > using math, system;
    > solve p q = -p/2+sqrt d,-p/2-sqrt d if d>=0
    >   when d = p^2/4-q; printf "The discriminant is: %g\n" d; end;
    > solve 4 2;
    The discriminant is: 2
    -0.585786437626905,-3.41421356237309
    > solve 2 4;
    The discriminant is: -3
    solve 2 4
Note that simple rules of the same form lhs = rhs are also used in macro definitions (def), to be discussed in the Macros section. In this case, however, the rule denotes a real rewriting rule, not a pattern binding, hence the left-hand side is mandatory in these rules.
Here are a few examples of simple Pure programs.
The factorial:
    fact n = n*fact (n-1) if n>0;
           = 1 otherwise;

    let facts = map fact (1..10); facts;
The Fibonacci numbers:
    fib n = a when a,b = fibs n end
            with fibs n = 0,1 if n<=0;
                        = case fibs (n-1) of
                            a,b = b,a+b;
                          end;
            end;

    let fibs = map fib (1..30); fibs;
It is worth noting here that in most cases Pure performs tail call optimization so that tail-recursive definitions like the following will be executed in constant stack space (see Stack Size and Tail Recursion in the Caveats and Notes section for more details on this):
    // tail-recursive factorial using an "accumulating parameter"
    fact n = loop 1 n with
      loop p n = if n>0 then loop (p*n) (n-1) else p;
    end;
Here is an example showing how constants are defined and used. Constant definitions take pretty much the same form as variable definitions with let (see above), but work more like the definition of a parameterless function whose value is precomputed at compile time:
    > extern double atan(double);
    > const pi = 4*atan 1.0;
    > pi;
    3.14159265358979
    > foo x = 2*pi*x;
    > show foo
    foo x = 2*3.14159265358979*x;
    > foo 1;
    6.28318530717958
List comprehensions are Pure's main workhorse for generating and processing all kinds of list values. Here's a well-known example, a variation of Eratosthenes' classical prime sieve:
    primes n = sieve (2..n) with
      sieve [] = [];
      sieve (p:qs) = p : sieve [q | q = qs; q mod p];
    end;
(This definition is actually rather inefficient, there are much better albeit more complicated implementations of this sieve.)
For instance:
    > primes 100;
    [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
If you dare, you can actually have a look at the catmap-lambda-if-then-else expression the comprehension expanded to:
    > show primes
    primes n = sieve (2..n) with
      sieve [] = [];
      sieve (p:qs) = p:sieve (catmap (\q -> if q mod p then [q] else []) qs)
    end;
List comprehensions are also a useful device to organize backtracking searches. For instance, here's an algorithm for the n queens problem, which returns the list of all placements of n queens on an n x n board (encoded as lists of n pairs (i,j) with i = 1..n), so that no two queens hold each other in check:
    queens n = search n 1 [] with
      search n i p = [reverse p] if i>n;
                   = cat [search n (i+1) ((i,j):p) | j = 1..n; safe (i,j) p];
      safe (i,j) p = ~any (check (i,j)) p;
      check (i1,j1) (i2,j2)
                   = i1==i2 || j1==j2 || i1+j1==i2+j2 || i1-j1==i2-j2;
    end;
(Again, this algorithm is rather inefficient, see the examples included in the Pure distribution for a much better algorithm by Libor Spacek.)
As already mentioned, lists can also be evaluated in a "lazy" fashion, by just turning the tail of a list into a future. This special kind of list is also called a stream. Streams enable you to work with infinite lists (or finite lists which are so huge that you would never want to keep them in memory in their entirety). E.g., here's one way to define the infinite stream of all Fibonacci numbers:
    > let fibs = fibs 0L 1L with fibs a b = a : fibs b (a+b) & end;
    > fibs;
    0L:#<thunk 0xb5d54320>
Note the & on the tail of the list in the definition of the local fibs function. This turns the result of fibs into a stream, which is required to prevent the function from recursing into samadhi. Also note that we work with bigints in this example because the Fibonacci numbers grow quite rapidly, so with machine integers the values would soon start wrapping around to negative integers.
Streams like these can be worked with in pretty much the same way as with lists. Of course, care must be taken not to invoke "eager" operations such as # (which computes the size of a list) on infinite streams, to prevent infinite recursion. However, many list operations work with infinite streams just fine, and return the appropriate stream results. E.g., the take function (which retrieves a given number of elements from the front of a list) works with streams just as well as with "eager" lists:
    > take 10 fibs;
    0L:#<thunk 0xb5d54350>
Hmm, not much progress there, but that's just how streams work (or rather they don't, they're lazy bums indeed!). Nevertheless, the stream computed with take is in fact finite and we can readily convert it to an ordinary list, forcing its evaluation:
    > list (take 10 fibs);
    [0L,1L,1L,2L,3L,5L,8L,13L,21L,34L]
An easier way to achieve this is to cut a "slice" from the stream:
    > fibs!!(0..10);
    [0L,1L,1L,2L,3L,5L,8L,13L,21L,34L,55L]
Also note that since we bound the stream to a variable, the already computed prefix of the stream has been memoized, so that this portion of the stream is now readily available in case we need to have another look at it later. By these means, possibly costly reevaluations are avoided, trading memory for execution speed:
    > fibs;
    0L:1L:1L:2L:3L:5L:8L:13L:21L:34L:55L:#<thunk 0xb5d54590>
Let's take a look at some of the other convenience operations for generating stream values. The prelude defines infinite arithmetic sequences, using inf or -inf to denote an upper (or lower) infinite bound for the sequence, e.g.:
    > let u = 1..inf; let v = -1.0:-1.2..-inf;
    > u!!(0..10); v!!(0..10);
    [1,2,3,4,5,6,7,8,9,10,11]
    [-1.0,-1.2,-1.4,-1.6,-1.8,-2.0,-2.2,-2.4,-2.6,-2.8,-3.0]
Other useful stream generator functions are iterate, which keeps applying the same function over and over again, repeat, which just repeats its argument forever, and cycle, which cycles through the elements of the given list:
    > iterate (*2) 1!!(0..10);
    [1,2,4,8,16,32,64,128,256,512,1024]
    > repeat 1!!(0..10);
    [1,1,1,1,1,1,1,1,1,1,1]
    > cycle [0,1]!!(0..10);
    [0,1,0,1,0,1,0,1,0,1,0]
Moreover, list comprehensions can draw values from streams and return the appropriate stream result:
    > let rats = [m,n-m | n=2..inf; m=1..n-1; gcd m (n-m) == 1]; rats;
    (1,1):#<thunk 0xb5d54950>
    > rats!!(0..10);
    [(1,1),(1,2),(2,1),(1,3),(3,1),(1,4),(2,3),(3,2),(4,1),(1,5),(5,1)]
Finally, let's rewrite our prime sieve so that it generates the infinite stream of all prime numbers:
    all_primes = sieve (2..inf) with
      sieve (p:qs) = p : sieve [q | q = qs; q mod p] &;
    end;
Note that we can omit the empty list case of sieve here, since the sieve now never becomes empty. Example:
    > let P = all_primes;
    > P!!(0..20);
    [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73]
    > P!299;
    1987
You can also just print the entire stream. This will run forever, so hit Ctrl-C when you get bored:
    > using system;
    > do (printf "%d\n") all_primes;
    2
    3
    5
    ...
(Make sure that you really use the all_primes function instead of the P variable to print the stream. Otherwise, because of memoization the stream stored in P will grow with the number of elements printed until memory is exhausted. Calling do on a fresh instance of the stream of primes allows do to get rid of each "cons" cell after having printed the corresponding stream element.)
Pure offers a number of basic matrix operations, such as matrix construction, indexing, slicing, as well as getting the size and dimensions of a matrix (these are briefly described in the Standard Library section below). However, it does not supply built-in support for matrix arithmetic and other linear algebra algorithms. The idea is that these can and should be provided through separate libraries (please check the Pure website for the pure-gsl module which is an ongoing project to provide a full GSL interface for the Pure language).
But Pure's facilities for matrix and list processing also make it easy to roll your own, if desired. First, the prelude provides matrix versions of the common list operations like map, fold, zip etc., which provide a way to implement common matrix operations. E.g., multiplying a matrix x with a scalar a amounts to mapping the function (a*) to x, which can be done as follows:
    > a * x::matrix = map (a*) x if ~matrixp a;
    > 2*{1,2,3;4,5,6};
    {2,4,6;8,10,12}
Likewise, matrix addition and other element-wise operations can be realized using zipwith, which combines corresponding elements of two matrices using a given binary function:
    > x::matrix + y::matrix = zipwith (+) x y;
    > {1,2,3;4,5,6}+{1,2,1;3,2,3};
    {2,4,4;7,7,9}
Second, matrix comprehensions make it easy to express a variety of algorithms which would typically be implemented using for loops in conventional programming languages. To illustrate the use of matrix comprehensions, here is how we can define an operation to create a square identity matrix of a given dimension:
    > eye n = {i==j | i = 1..n; j = 1..n};
    > eye 3;
    {1,0,0;0,1,0;0,0,1}
Note that the i==j term is just a Pure idiom for the Kronecker symbol. Another point worth mentioning here is that the generator clauses of matrix comprehensions alternate between row and column generation automatically, if values are drawn from lists as in the example above. (More precisely, the last generator, which varies most quickly, yields a row, the next-to-last one a column of these row vectors, and so on.) This makes matrix comprehensions resemble customary mathematical notation very closely.
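To see the alternation at work, here is a small sketch (not from the original manual) in which the outer generator yields the rows and the inner one the elements of each row:

```pure
> {10*i+j | i = 1..2; j = 1..3};   // i varies slowly: one row per i
{11,12,13;21,22,23}
```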
Of course, matrix comprehensions can also draw values from other matrices instead of lists. In this case the block layout of the component matrices is preserved. For instance:
    > {x,y|x={1,2};y={a,b;c,d}};
    {(1,a),(1,b),(2,a),(2,b);(1,c),(1,d),(2,c),(2,d)}
Note that a matrix comprehension involving filters may fail because the filtered result isn't a rectangular matrix any more. E.g., {2*x|x={1,2,3,-4};x>0} works, as does {2*x|x={-1,2;3,-4};x>0}, but {2*x|x={1,2;3,-4};x>0} doesn't because the rows of the result matrix have different lengths.
As a slightly more comprehensive example (no pun intended!), here is a definition of matrix multiplication in Pure. The building block here is the "dot" product of two vectors which can be defined as follows:
    > sum = foldl (+) 0;
    > dot x::matrix y::matrix = sum $ zipwith (*) (rowvector x) (rowvector y);
    > dot {1,2,3} {1,0,1};
    4
The general matrix product now boils down to a simple matrix comprehension which just computes the dot product of all rows of x with all columns of y (the rows and cols functions are prelude operations found in matrices.pure):
    > x::matrix * y::matrix = {dot u v | u = rows x; v = cols y};
    > {0,1;1,0;1,1}*{1,2,3;4,5,6};
    {4,5,6;1,2,3;5,7,9}
(For the sake of simplicity, this doesn't do much error checking. In production code you'd check at least the conformance of matrix dimensions, of course.)
Well, that was easy. So let's take a look at a more challenging example, Gaussian elimination, which can be used to solve systems of linear equations. The algorithm brings a matrix into "row echelon" form, a generalization of triangular matrices. The resulting system can then be solved quite easily using back substitution.
Here is a Pure implementation of the algorithm. Note that the real meat is in the pivoting and elimination step (step function) which is iterated over all columns of the input matrix. In each step, x is the current matrix, i the current row index, j the current column index, and p keeps track of the current permutation of the row indices performed during pivoting. The algorithm returns the updated matrix x, row index i and row permutation p.
    gauss_elimination x::matrix = p,x
    when n,m = dim x; p,_,x = foldl step (0..n-1,0,x) (0..m-1) end;

    // One pivoting and elimination step in column j of the matrix:
    step (p,i,x) j
    = if max_x==0 then p,i,x
      else
        // updated row permutation and index:
        transp i max_i p, i+1,
        {// the top rows of the matrix remain unchanged:
         x!!(0..i-1,0..m-1);
         // the pivot row, divided by the pivot element:
         {x!(i,l)/x!(i,j)                 | l=0..m-1};
         // subtract suitable multiples of the pivot row:
         {x!(k,l)-x!(k,j)*x!(i,l)/x!(i,j) | k=i+1..n-1; l=0..m-1}}
    when
      n,m = dim x; max_i, max_x = pivot i (col x j);
      x = if max_x>0 then swap x i max_i else x;
    end with
      pivot i x       = foldl max (0,0) [j,abs (x!j)|j=i..#x-1];
      max (i,x) (j,y) = if x<y then j,y else i,x;
    end;
Please refer to any good textbook on numerical mathematics for a closer description of the algorithm. But here is a brief rundown of what happens in each elimination step: First we find the pivot element in column j of the matrix. (We're doing partial pivoting here, i.e., we only look for the element with the largest absolute value in column j, starting at row i. That's usually good enough to achieve numerical stability.) If the pivot is zero then we're done (the rest of the pivot column is already zeroed out). Otherwise, we bring it into the pivot position (swapping row i and the pivot row), divide the pivot row by the pivot, and subtract suitable multiples of the pivot row to eliminate the elements of the pivot column in all subsequent rows. Finally we update i and p accordingly and return the result.
In order to complete the implementation, we still need the following little helper functions to swap two rows of a matrix (this is used in the pivoting step) and to apply a transposition to a permutation (represented as a list):
    swap x i j = x!!(transp i j (0..n-1),0..m-1) when n,m = dim x end;
    transp i j p = [p!tr k | k=0..#p-1]
    with tr k = if k==i then j else if k==j then i else k end;
Finally, let us define a convenient print representation of double matrices a la Octave (the meaning of the __show__ function is explained in the Caveats and Notes section):
    using system;

    __show__ x::matrix
    = strcat [printd j (x!(i,j))|i=0..n-1; j=0..m-1] + "\n"
    with printd 0 = sprintf "\n%10.5f";
         printd _ = sprintf "%10.5f"
    end when n,m = dim x end if dmatrixp x;
Example:
    > let x = dmatrix {2,1,-1,8; -3,-1,2,-11; -2,1,2,-3};
    > x; gauss_elimination x;
       2.00000   1.00000  -1.00000   8.00000
      -3.00000  -1.00000   2.00000 -11.00000
      -2.00000   1.00000   2.00000  -3.00000
    [1,2,0],
       1.00000   0.33333  -0.66667   3.66667
       0.00000   1.00000   0.40000   2.60000
       0.00000   0.00000   1.00000  -1.00000
Macros are a special type of functions to be executed as a kind of "preprocessing stage" at compile time. In Pure these are typically used to define custom special forms and to perform inlining of function calls and other simple kinds of source-level optimizations.
Whereas the macro facilities of most programming languages simply provide a kind of textual substitution mechanism, Pure macros operate on symbolic expressions and are implemented by the same kind of rewriting rules that are also used to define ordinary functions in Pure. In contrast to ordinary definitions, macro rules start out with the keyword def, and only simple kinds of rules without any guards or multiple left-hand and right-hand sides are permitted.
Syntactically, a macro definition looks just like a variable or constant definition, using def in lieu of let or const, but they are processed in a different way. Macros are substituted into the right-hand sides of function, constant and variable definitions. All macro substitution happens before constant substitutions and the actual compilation step. Macros can be defined in terms of other macros (also recursively), and are evaluated using call by value (i.e., macro calls in macro arguments are expanded before the macro gets applied to its parameters).
Here is a simple example, showing a rule which expands saturated calls of the succ function (defined in the prelude) at compile time:
    > def succ x = x+1;
    > foo x::int = succ (succ x);
    > show foo
    foo x::int = x+1+1;
Rules like these can be useful to help the compiler generate better code. Note that a macro may have the same name as an ordinary Pure function, which is essential if you want to optimize calls to an existing function, as in the previous example. (Just like ordinary functions, the number of parameters in each rule for a given macro must be the same, but a macro may have a different number of arguments than the corresponding function.)
A somewhat more practical example is the following rule from the prelude, which eliminates saturated instances of the right-associative function application operator:
def f $ x = f x;
Like in Haskell, this low-priority operator is handy to write cascading function calls. With the above macro rule, these will be "inlined" as ordinary function applications automatically. Example:
    > foo x = bar $ bar $ 2*x;
    > show foo
    foo x = bar (bar (2*x));
Here are two slightly more tricky rules from the prelude, which optimize the case of "throwaway" list comprehensions. This is useful if a list comprehension is evaluated solely for its side effects:
    def void (catmap f x) = do f x;
    def void (listmap f x) = do f x;
Note that the void function simply throws away its argument and returns () instead. The do function applies a function to every member of a list (like map), but throws away all intermediate results and just returns (), which is much more efficient if you don't need those results anyway. These are both defined in the prelude.
Before we delve into this example, a few remarks are in order about the way list comprehensions are implemented in Pure. As already mentioned, list comprehensions are just syntactic sugar; the compiler immediately transforms them to an equivalent expression involving only lambdas and a few other list operations. Note that list comprehensions are essentially equivalent to piles of nested lambdas, filters and maps, but for various reasons they are actually implemented using two special helper operations, catmap and listmap.
The catmap operation combines map and cat; this is needed, in particular, to accumulate the results of nested generators, such as [i,j | i = 1..n; j = 1..m]. The same operation is also used to implement filter clauses, you can see this below in the examples. However, for efficiency simple generators like [2*i | i = 1..n] are translated to a listmap instead (which is basically just map, but works with different aggregate types, so that list comprehensions can draw values from aggregates other than lists, such as matrices).
Now let's see how the rules above transform a list comprehension if we "voidify" it:
    > using system;
    > f = [printf "%g\n" (2^x+1) | x=1..5; x mod 2];
    > g = void [printf "%g\n" (2^x+1) | x=1..5; x mod 2];
    > show f g
    f = catmap (\x -> if x mod 2 then [printf "%g\n" (2^x+1)] else []) (1..5);
    g = do (\x -> if x mod 2 then [printf "%g\n" (2^x+1)] else []) (1..5);
Ok, so the catmap got replaced with a do which is just what we need to make this code go essentially as fast as a for loop in conventional programming languages (up to constant factors, of course). Here's how it looks when we run the g function:
    > g;
    3
    9
    33
    ()
It's not all roses, however, since the above macro rules will only get rid of the outermost catmap if the list comprehension binds multiple variables:
    > u = void [puts $ str (x,y) | x=1..2; y=1..3];
    > show u
    u = do (\x -> listmap (\y -> puts (str (x,y))) (1..3)) (1..2);
If you're bothered by this, you'll have to apply void recursively, creating a nested list comprehension which expands to a nested do:
    > v = void [void [puts $ str (x,y) | y=1..3] | x=1..2];
    > show v
    v = do (\x -> do (\y -> puts (str (x,y))) (1..3)) (1..2);
(It would be nice to have this handled automatically, but the left-hand side of a macro definition must be a simple expression, and thus it's not possible to write a macro which descends recursively into the lambda argument of catmap.)
Macros can also be recursive, in which case they usually consist of multiple rules and make use of pattern-matching like ordinary function definitions. As a simple example, let's implement a Pure version of Lisp's quasiquote which allows you to create a quoted expression from a "template" while substituting variable parts of the template. (For the sake of brevity, our definition is somewhat simplified and does not cover some corner cases. See the Pure distribution for a full version of this example.)
    def quasiquote (unquote x) = x;
    def quasiquote (f@_ (splice x)) = foldl ($) (quasiquote f) x;
    def quasiquote (f@_ x) = quasiquote f (quasiquote x);
    def quasiquote x = quote x;
(Note the f@_, which is an anonymous "as" pattern forcing the compiler to recognize f as a function variable, rather than a literal function symbol. See Head = Function in the Caveats and Notes section for an explanation of this trick.)
The first rule above takes care of "unquoting" embedded subterms. The second rule "splices" an argument list into an enclosing function application. The third rule recurses into subterms of a function application, and the fourth and last rule takes care of quoting the "atomic" subterms. Note that unquote and splice themselves are just passive constructor symbols, the real work is done by quasiquote, using foldl at runtime to actually perform the splicing. (Putting off the splicing until runtime makes it possible to splice argument lists computed at runtime.)
If we want, we can also add some syntactic sugar for Lisp weenies. (Note that we cannot have ',' for unquoting, so we use ',$' instead.)
    prefix 9 ` ,$ ,@ ;
    def `x = quasiquote x;
    def ,$x = unquote x;
    def ,@x = splice x;
Examples:
    > `(2*42+2^12);
    2*42+2^12
    > `(2*42+,$(2^12));
    2*42+4096.0
    > `foo 1 2 (,@'[2/3,3/4]) (5/6);
    foo 1 2 (2/3) (3/4) (5/6)
    > `foo 1 2 (,@'args) (5/6) when args = '[2/3,3/4] end;
    foo 1 2 (2/3) (3/4) (5/6)
We mention in passing here that, technically, Pure macros are just as powerful as (unconditional) term rewriting systems and thus they are Turing-complete. This implies that a badly written macro may well send the Pure compiler into an infinite recursion, which results in a stack overflow at compile time. See the Caveats and Notes section for information on how to deal with these by setting the PURE_STACK environment variable.
The quasiquote macro in the preceding subsection also provides an example of how you can use macros to define your own special forms. This works because the actual evaluation of macro arguments is put off until runtime, and thus we can safely pass them to built-in special forms and other constructs which defer their evaluation until runtime. In fact, the right-hand side of a macro rule may be an arbitrary Pure expression involving conditional expressions, lambdas, binding clauses, etc. These are never evaluated during macro substitution, they just become part of the macro expansion (after substituting the macro parameters).
Here is another useful example of a user-defined special form, the macro timex which employs the system function clock to report the cpu time in seconds needed to evaluate a given expression, along with the computed result:
    > using system;
    > def timex x = (clock-t0)/CLOCKS_PER_SEC,y when t0 = clock; y = x end;
    > sum = foldl (+) 0L;
    > timex $ sum (1L..100000L);
    0.43,5000050000L
Note that the above definition of timex wouldn't work as an ordinary function definition, since by virtue of Pure's eager evaluation strategy the x parameter would already have been evaluated before being passed to timex, making timex always return a zero time value. Try it!
Here's yet another example, which is handy if you need to trace function calls. (As of Pure 0.22, the interpreter now has its own built-in debugging facility, see Debugging. However, the following macro allows you to trace functions using your own custom output format, and may thus be useful in situations where the built-in debugger is not appropriate.)
    using system;
    def trace f x y = printf "** exit %s: %s -> %s\n" (str f,str x,str y) $$ y
      when y = printf "** call %s: %s\n: " (str f,str x) $$ gets $$ y end;
This macro is invoked with the function to be traced, the arguments (or whatever you want to be printed as additional debugging information) and the actual function call as parameters. (This is a rather simplistic version, which just prints a prompt on function entry and the final reduction after the call. You can easily make this as elaborate as you like. E.g., you might want to keep track of recursive levels and profiling information, add various interactive commands to selectively enable and disable tracing during the evaluation, etc.)
We can still make this a bit more convenient by introducing the following ordinary function definition:
trace f x = trace f x (f x);
This lets us patch up a call to trace a given function, as shown below, without having to change the definition of the function at all. This trick only works with global functions; for local functions you'll have to add an explicit call of the trace macro to the local definition yourself. Also note that the definition above only works with functions taking a single parameter; see the trace.pure example in the distribution for the full version which can deal with any number of arguments.
    // Uncomment this line to trace calls to the 'fact' function.
    def fact n = trace fact n;
    // Sample function to be traced.
    fact n = if n>0 then n*fact(n-1) else 1;
Here's a trace of the fact function obtained in this fashion (hit carriage return after each ':' prompt to proceed with the computation):
    > fact 2;
    ** call fact: 2
    :
    ** call fact: 1
    :
    ** call fact: 0
    :
    ** exit fact: 0 -> 1
    ** exit fact: 1 -> 1
    ** exit fact: 2 -> 2
    2
Note that by just removing the macro definition for fact above, you can make the function run untraced as usual again. This scheme is quite flexible, the only real drawback is that you have to explicitly add some code for each function you want to trace.
Pure macros are lexically scoped, i.e., the binding of symbols in the right-hand side of a macro definition is determined statically by the text of the definition, and macro parameter substitution also takes into account binding constructs, such as with and when clauses, in the right-hand side of the definition. Macro facilities with these pleasant properties are also known as hygienic macros. They are not susceptible to so-called "name capture," which makes macros in less sophisticated languages bug-ridden and hard to use. (This is explained in more detail in the Hygienic Macros section.)
Pure macros also have their limitations. Specifically, the left-hand side of a macro rule must be a simple expression, just like in ordinary function definitions. This restricts the kinds of expressions which can be rewritten by a macro. But Pure macros are certainly powerful enough for most common preprocessing purposes, while still being robust and easy to use.
Pure is a very terse language by design. Usually you don't declare much stuff, you just define it and be done with it. However, there are a few toplevel constructs which let you declare symbols with special attributes and manage programs consisting of several source modules:
Declares the listed symbols as public or private, respectively. All identifiers without an explicit prior declaration default to public scope, which means that they are visible everywhere in a program. An explicit public declaration of ordinary identifiers is thus rarely needed, unless you want to declare symbols as members of a specific namespace. Symbols can also be declared private, meaning that the symbol is visible only in the namespace it belongs to. This is explained in more detail under Namespaces below.
The public and private keywords can also be used as a prefix in any of the special symbol declarations discussed below, to specify the scope of the declared symbols (if the scope prefix is omitted, it defaults to public). Also note that to declare multiple symbols in a single declaration, you just list them all with whitespace in between. The same applies to the other types of symbol declarations discussed below.
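For instance, the scope prefixes and the multi-symbol form can be combined as follows (the symbols here are just made-up examples):

    public foo bar;            // two public symbols
    private infixl 2100 +: -:; // two private left-associative operators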
The following "fixity" declarations are available for introducing special operator and constant symbols. This changes the way that these symbols are parsed and thus provides you with a limited means to extend the Pure language at the lexical and syntactical level.
Pure provides you with a theoretically unlimited number of different precedence levels for user-defined infix, prefix and postfix operators. Precedence levels are numbered starting at 0; larger numbers indicate higher precedence. (For practical reasons, the current implementation does require that precedence numbers can be encoded as 24 bit unsigned machine integers, giving you a range from 0 to 16777215, but this should be large enough to incur no real limitations on applications. Also, the operator declarations in the prelude have been set up to leave enough "space" between the "standard" levels so that you can easily sneak in new operator symbols at low, high or intermediate precedences.)
On each precedence level, you can declare (in order of increasing precedence) infix (binary non-associative), infixl (binary left-associative), infixr (binary right-associative), prefix (unary prefix) and postfix (unary postfix) operators. For instance, here is a typical excerpt from the prelude (the full table can be found in the Prelude section of the Pure Library Manual):
    infix  1700 < > <= >= == ~= ;
    infixl 2100 + - ;
    infixl 2200 * / div mod ;
    infixr 2400 ^ ;
    prefix 2500 # ;
One thing worth noting here is that unary minus plays a special role in the syntax. Like in Haskell and following mathematical tradition, unary minus is the only prefix operator symbol which is also used as an infix operator, and is always on the same precedence level as binary minus (whose precedence may be chosen freely in the prelude). Thus, with the standard prelude, -x+y will be parsed as (-x)+y, whereas -x*y is the same as -(x*y). Also note that the notation (-) always denotes the binary minus operator; the unary minus operation can be denoted using the built-in neg function.
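The following interactions illustrate these rules (assuming the operator declarations of the standard prelude):

    > -5+3;    // parses as (-5)+3
    -2
    > -5*3;    // parses as -(5*3)
    -15
    > map neg [1,2,3];
    [-1,-2,-3]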
Pure also provides unary outfix operators, which work like in Wm Leler's constraint programming language Bertrand. Outfix operators let you define your own bracket structures. The operators must be given as pairs of matching left and right symbols. For instance:
outfix |: :| BEGIN END;
After this declaration you can write bracketed expressions like |:x:| or BEGIN foo, bar END. These are always at the highest precedence level (i.e., syntactically they work like parenthesized expressions). Just like other operators, you can turn outfix symbols into ordinary functions by enclosing them in parentheses, but you have to specify the symbols in matching pairs, such as (BEGIN END).
Pure also has a notation for "nullary" operators, i.e., "operators without operands", which are used to denote special constants. These are introduced using a nonfix declaration, e.g.:
nonfix red green blue;
Syntactically, these work just like ordinary identifiers, so they may stand wherever an identifier is allowed (no parentheses are required to "escape" them). The difference to ordinary identifiers is that nonfix symbols are always interpreted as literals, even if they occur in a variable position on the left-hand side of a rule. So, with the above declaration, you can write something like:
    > foo x = case x of red = green; green = blue; blue = red end;
    > map foo [red,green,blue];
    [green,blue,red]
Thus nonfix symbols are pretty much like nullary constructor symbols in languages like Haskell. Non-fixity is just a syntactic attribute, however. Pure doesn't enforce that such values are really "constant", so you can still write a "constructor equation" like the following:
    > red = blue;
    > map foo [red,green,blue];
    [blue,blue,blue]
Examples for all types of symbol declarations can be found in the prelude which declares a bunch of standard (arithmetic, relational, logical) operator symbols as well as the list and pair constructors ':' and ',', and a few nonfix symbols (mostly for denoting different kinds of exceptions).
While Pure doesn't offer separate compilation, the using declaration provides a simple but effective way to assemble a Pure program from several source modules. A Pure program is basically the concatenation of all the source modules listed as command line arguments. A using clause instructs the compiler to include the corresponding source script at this point (if it wasn't already included before), which makes available all the definitions in the included script in your program.
For instance, the following declaration causes the math.pure script from the standard library to be included in your program:
using math;
You can also import multiple scripts in one go:
using array, dict, set;
Moreover, Pure provides a notation for qualified module names which can be used to denote scripts located in specific package directories, e.g.:
using examples::libor::bits;
In fact this is equivalent to the following using clause which spells out the real filename of the script:
using "examples/libor/bits.pure";
Both notations can be used interchangeably; the former is usually more convenient, but the latter allows you to denote scripts whose names aren't valid Pure identifiers.
Script identifiers are translated to the corresponding filenames by replacing the '::' symbol with the pathname separator '/' and tacking on the '.pure' suffix. The following table illustrates this with a few examples.
Script identifier | Filename |
---|---|
math | "math.pure" |
examples::libor::bits | "examples/libor/bits.pure" |
::pure::examples::hello | "/pure/examples/hello.pure" |
Note the last example, which shows how an absolute pathname can be denoted using a qualifier starting with '::'.
Unless an absolute pathname is given, the interpreter performs a search to locate the script. It first searches the directory of the script containing the using clause (or the current working directory if the clause was read from standard input, as is the case, e.g., in an interactive session), then the directories named in -I options on the command line (in the given order), then the colon-separated list of directories in the PURE_INCLUDE environment variable, and finally the directory named by the PURELIB environment variable. Note that the current working directory is not searched by default (unless the using clause is read from standard input), but of course you can force this by adding the option -I. to the command line, or by including '.' in the PURE_INCLUDE variable.
For the purpose of comparing script names (to determine whether two scripts are actually the same), the interpreter always uses the canonicalized full pathname of the script, following symbolic links to the destination file (albeit only one level). Thus different scripts with the same basename, such as foo/utils.pure and bar/utils.pure can both be included in the same program (unless they link to the same file).
The directory of the canonicalized pathname is also used when searching other scripts included in a script. This makes it possible to have an executable script with a shebang line in its own directory, which is then executed via a symbolic link placed on the system PATH. In this case the script search performed in using clauses will use the real script directory and thus other required scripts can be located there. This is the recommended practice for installing standalone Pure applications in source form which are to be run directly from the shell.
To facilitate modular development, Pure also provides namespaces as a means to avoid name clashes between symbols, and to keep the global namespace tidy and clean. Namespaces serve as containers holding logical groups of identifiers and other symbols. Inside each namespace, symbols must be unique, but the same symbol may be used to denote different objects (variables, functions, etc.) in different namespaces.
The global namespace is always available. By default, new symbols are created in this namespace, which is also called the default namespace. Additional namespaces can be created with the namespace declaration, which also switches to the given namespace (makes it the current namespace), so that subsequent symbol declarations create symbols in that namespace rather than the default one. The current namespace applies to all kinds of symbol declarations, including operator and constant symbol declarations, as well as extern declarations (the latter are described in the C Interface section).
For instance, in order to create two symbols with the same print name foo in two different namespaces foo and bar, you can write:
    namespace foo;
    public foo;
    foo x = x+1;
    namespace bar;
    public foo;
    foo x = x-1;
    namespace;
Note that just the namespace keyword by itself in the last line switches back to the default namespace. We can now refer to the symbols we just defined using qualified symbols of the form namespace::symbol:
    > foo::foo 99;
    100
    > bar::foo 99;
    98
This avoids any potential name clashes, since the qualified identifier notation always makes it clear which namespace the given identifier belongs to.
Similar to the using declaration, a namespace declaration accepts either identifiers or double-quoted strings as namespace names. E.g., the following two declarations are equivalent:
    namespace foo;
    namespace "foo";
The latter form also allows more descriptive labels which aren't identifiers, e.g.:
namespace "Private stuff, keep out!";
(Note that the namespace prefix in a qualified identifier must be a legal identifier, so it isn't possible to access symbols in namespaces with such descriptive labels in a direct fashion; the only way to get at the symbols in this case is to use a namespace or using namespace declaration, as described below.)
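For instance, here is a sketch of how such a namespace can be used (the secret symbol is just a hypothetical example):

    namespace "Private stuff, keep out!";
    public secret;     // declare the symbol in this namespace
    secret x = x+1;
    namespace;
    // the label isn't a legal identifier, so 'using namespace' is the
    // only way to bring 'secret' back into scope:
    using namespace "Private stuff, keep out!";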
Since it is rather inconvenient if you always have to write identifiers in their qualified form, Pure allows you to specify a list of search namespaces which are used to look up symbols not in the default or the current namespace. This is done with the using namespace declaration, as follows:
    > using namespace foo;
    > foo 99;
    100
    > using namespace bar;
    > foo 99;
    98
The using namespace declaration also lets you search multiple namespaces simultaneously:
using namespace foo, bar;
However, this requires that a symbol exists in at most one of the listed namespaces, otherwise you get an error message:
    > using namespace foo, bar;
    > foo 99;
    <stdin>, line 15: symbol 'foo' is ambiguous here
In such a case you have to use the appropriate namespace qualifier to resolve the name clash:
    > foo::foo 99;
    100
A using namespace declaration without any namespace arguments gets you back to the default empty list of search namespaces:
using namespace;
The precise rules for looking up symbols are as follows. The compiler searches for symbols first in the current namespace (if any), then in the currently active search namespaces (if any), and finally in the default (i.e., the global) namespace, in that order. This automatic lookup can be bypassed by using an absolute namespace qualifier of the form ::foo::bar. In particular, ::bar always denotes the symbol bar in the default namespace, while ::foo::bar denotes the symbol bar in the foo namespace. (Normally, the latter kind of notation is only needed if you have to deal with nested namespaces, see Hierarchical Namespaces below.)
If no existing symbol is found, a new symbol is created, implicitly declaring the identifier as a public symbol with default attributes. New unqualified symbols are always created in the default namespace, unless you explicitly declare them (in which case they become members of the current namespace, as explained above). New qualified symbols are created in the given namespace, which must be the current namespace. This makes it possible to avoid explicit symbol declarations in the most common case of ordinary, public identifiers. E.g., we could have written the above example simply as follows:
    namespace foo;
    foo::foo x = x+1;
    namespace bar;
    bar::foo x = x-1;
    namespace;
Note that, as a little safety measure against silly typos, the compiler insists that new qualified symbols must be introduced in their "home" namespace, otherwise it complains about an undeclared symbol:
    > namespace;
    > foo::bar x = 1/x;
    <stdin>, line 7: undeclared symbol 'foo::bar'
To avoid such errors, you either have to use an explicit declaration or make sure that the right namespace is current when introducing the symbol.
Explicit declarations must be used if you want to introduce special operator and constant symbols. This works just like in the default namespace, except that you add the appropriate namespace declaration before declaring the symbols. For instance, here is how you can create a new + operation which multiplies its operands rather than adding them:
    > namespace my;
    > infixl 6 +;
    > x+y = x*y;
    > 5+7;
    35
Note that the new + operation really belongs to the namespace we created. The + operation in the default namespace works as before, and in fact you can use qualified symbols to pick the version that you need:
    > namespace;
    > 5+7;
    12
    > 5 ::+ 7;
    12
    > 5 my::+ 7;
    35
Pure also allows you to have private symbols, as a means to hide away internal operations which shouldn't be accessed directly outside the namespace in which they are declared. The scope of a private symbol is confined to its namespace, i.e., the symbol is only visible when its "home" namespace is current. Symbols are declared private by using the private keyword in the symbol declaration:
    > namespace secret;
    > private baz;
    > // 'baz' is a private symbol in namespace 'secret' here
    > baz x = 2*x;
    > // you can use 'baz' just like any other symbol here
    > baz 99;
    198
    > namespace;
Note that, at this point, secret::baz is now invisible, even if you have secret in the search namespace list:
    > using namespace secret;
    > // this actually creates a 'baz' symbol in the default namespace:
    > baz 99;
    baz 99
    > secret::baz 99;
    <stdin>, line 27: symbol 'secret::baz' is private here
The only way to bring the symbol back into scope is to make the secret namespace current again:
    > namespace secret;
    > baz 99;
    198
    > secret::baz 99;
    198
Namespace identifiers can themselves be qualified identifiers in Pure, which enables you to introduce a hierarchy of namespaces. This is useful, e.g., to group related namespaces together under a common "umbrella" namespace:
    namespace my;
    namespace my::old;
    my::old::foo x = x+1;
    namespace my::new;
    my::new::foo x = x-1;
Note that the namespace my, which serves as the parent namespace, must be created before the my::old and my::new namespaces, even if it does not contain any symbols of its own. After these declarations, the my::old and my::new namespaces are part of the my namespace and will be considered in name lookup accordingly, so that you can write:
    > using namespace my;
    > old::foo 99;
    100
    > new::foo 99;
    98
This works pretty much like a hierarchy of directories and files, where the namespaces play the role of the directories (with the default namespace as the root directory), the symbols in each namespace correspond to the files in a directory, and the using namespace declaration functions similar to the shell's PATH variable.
Sometimes it is necessary to tell the compiler to use a symbol in a specific namespace, bypassing the usual symbol lookup mechanism. For instance, suppose that we introduce another global old namespace and define yet another version of foo in that namespace:
    namespace old;
    public foo;
    foo x = 2*x;
    namespace;
Now, if we want to access that function, with my still active as the search namespace, we cannot simply refer to the new function as old::foo, since this name will resolve to my::old::foo instead. As a remedy, the compiler accepts an absolute qualified identifier of the form ::old::foo. This bypasses name lookup and thus always yields exactly the symbol in the given namespace (if it exists; as mentioned previously, the compiler will complain about an undeclared symbol otherwise):
    > old::foo 99;
    100
    > ::old::foo 99;
    198
Also note that, as a special case of the absolute qualifier notation, ::foo always denotes the symbol foo in the default namespace.
Pure also offers a useful exception handling facility. To raise an exception, you just invoke the built-in function throw with the value to be thrown as the argument. To catch an exception, you use the built-in special form catch with the exception handler (a function to be applied to the exception value) as the first and the expression to be evaluated as the second (call-by-name) argument. For instance:
    > catch error (throw hello_world);
    error hello_world
Exceptions are also generated by the runtime system when the program runs out of stack space, when a guard does not evaluate to a truth value, or when the subject term fails to match the pattern in a pattern-matching lambda abstraction or a let, case or when construct. These types of exceptions are reported using the symbols stack_fault, failed_cond and failed_match, respectively, which are declared as constant symbols in the standard prelude. You can use catch to handle these kinds of exceptions just like any other. For instance:
    > fact n = if n>0 then n*fact(n-1) else 1;
    > catch error (fact foo);
    error failed_cond
    > catch error (fact 100000);
    error stack_fault
(You'll only get the latter kind of exception if the interpreter does stack checks, see the discussion of the PURE_STACK environment variable in the Caveats and Notes section.)
Note that unhandled exceptions are reported by the interpreter with a corresponding error message:
    > fact foo;
    <stdin>, line 2: unhandled exception 'failed_cond' while evaluating 'fact foo'
Exceptions also provide a way to handle asynchronous signals. Pure's system module provides symbolic constants for common POSIX signals and also defines the operation trap which lets you rebind any signal to a signal exception. For instance, the following lets you handle the SIGQUIT signal:
    > using system;
    > trap SIG_TRAP SIGQUIT;
You can also use trap to just ignore a signal or revert to the system's default handler (which might take different actions depending on the type of signal, see signal(7) for details):
    > trap SIG_IGN SIGQUIT; // signal is ignored
    > trap SIG_DFL SIGQUIT; // reinstalls the default signal handler
Note that when the interpreter runs interactively, for convenience most standard termination signals (SIGINT, SIGTERM, etc.) are already set up to produce corresponding Pure exceptions of the form signal SIG where SIG is the signal number.
Last but not least, exceptions can also be used to implement non-local value returns. For instance, here's a variation of our n queens algorithm which only returns the first solution. Note the use of throw in the recursive search routine to bail out with a solution as soon as we found one. The value thrown there is caught in the main routine. Also note the use of void in the second equation of search. This effectively turns the list comprehension into a simple loop which suppresses the normal list result and just returns () instead. Thus, if no value gets thrown then the function regularly returns with () to indicate that there is no solution.
    queens n = catch reverse (search n 1 []) with
      search n i p = throw p if i>n;
                   = void [search n (i+1) ((i,j):p) | j = 1..n; safe (i,j) p];
      safe (i,j) p = ~any (check (i,j)) p;
      check (i1,j1) (i2,j2) = i1==i2 || j1==j2 || i1+j1==i2+j2 || i1-j1==i2-j2;
    end;
E.g., let's compute a solution for a standard 8x8 board:
    > queens 8;
    [(1,1),(2,5),(3,8),(4,6),(5,3),(6,7),(7,2),(8,4)]
Accessing C functions from Pure programs is dead simple. You just need an extern declaration of the function, which is a simplified kind of C prototype. The function can then be called in Pure just like any other. For instance, the following commands, entered interactively in the interpreter, let you use the sin function from the C library (of course you could just as well put the extern declaration into a script):
    > extern double sin(double);
    > sin 0.3;
    0.29552020666134
An extern declaration can also be prefixed with a public/private scope specifier:
private extern double sin(double);
Multiple prototypes can be given in one extern declaration, separating them with commas:
extern double sin(double), double cos(double), double tan(double);
For clarity, the parameter types can also be annotated with parameter names, e.g.:
extern double sin(double x);
Parameter names in prototypes serve purely informational purposes for the human reader; the compiler treats them like comments.
The interpreter makes sure that the parameters in a call match the declared types; if they don't, the call is treated as a normal form expression by default, which gives you the opportunity to extend the external function with your own Pure equations (see below). The range of supported C types is a bit limited right now (void, bool, char, short, int, long, float, double, as well as arbitrary pointer types, i.e., void*, char*, etc.), but in practice these should cover most kinds of calls that need to be done when interfacing to C libraries.
Single precision float arguments and return values are converted from/to Pure's double precision floating point numbers automatically.
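For instance, the single precision sine function from the C math library can be imported as follows; Pure's double values are converted to float on the way in and back to double on the way out:

    > extern float sinf(float);
    > sinf 0.3;   // argument and result are converted automatically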
A variety of C integer types (char, short, int, long) are provided which are converted from/to the available Pure integer types in a straightforward way. In addition, the synonyms int8, int16 and int32 are provided for char, short and int, respectively, and int64 denotes 64 bit integers (a.k.a. ISO C99 long long). Note that long is equivalent to int32 on 32 bit systems, whereas it is the same as int64 on most 64 bit systems. All integer parameters take both Pure ints and bigints as actual arguments; truncation or sign extension is performed as needed, so that the C interface behaves as if the argument was "cast" to the C target type. Returned integers use the smallest Pure type capable of holding the result, i.e., int for the C char, short and int types, bigint for int64.
To make it easier to interface to various system routines, there's also a special size_t integer type which usually is 4 bytes on 32 bit and 8 bytes on 64 bit systems.
Pure considers all integers as signed quantities, but it is possible to pass unsigned integers as well (if necessary, you can use a bigint to pass positive values which are too big to fit into a machine int). Also note that when an unsigned integer is returned by a C routine, which is too big to fit into the corresponding signed integer type, it will "wrap around" and become negative. In this case, depending on the target type, you can use the ubyte, ushort, uint, ulong and uint64 functions provided by the prelude to convert the result back to an unsigned quantity.
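For instance, if a C function declared with an int result actually returns the unsigned value 0xffffffff, Pure sees -1; the prelude's uint function then recovers the intended unsigned interpretation:

    uint (-1);   // yields 4294967295 (as a bigint, since it exceeds the int range)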
Concerning the pointer types, char* is for string arguments and return values which need translation between Pure's internal utf-8 representation and the system encoding, while void* is for any generic kind of pointer (including strings, which are not translated when passed/returned as void*). Any other kind of pointer (except expr* and the GSL matrix pointer types, which are discussed below) is effectively treated as void* right now, although in a future version the interpreter may keep track of the type names for the purpose of checking parameter types.
The expr* pointer type is special; it indicates a Pure expression parameter or return value which is just passed through unchanged. All other types of values have to be "unboxed" when they are passed as arguments (i.e., from Pure to C) and "boxed" again when they are returned as function results (from C to Pure). All of this is handled by the runtime system in a transparent way, of course.
The matrix pointer types dmatrix*, cmatrix* and imatrix* can be used to pass double, complex double and int matrices to GSL functions taking pointers to the corresponding GSL types (gsl_matrix, gsl_matrix_complex and gsl_matrix_int) as arguments or returning them as results. Note that there is no marshalling of Pure's symbolic matrix type, as symbolic matrices aren't supported by GSL anyway. Also note that matrices are always passed by reference. If you need to pass a matrix as an output parameter of a GSL matrix routine, you can either create a zero matrix or a copy of an existing matrix. The prelude provides various operations for that purpose (in particular, see the dmatrix, cmatrix, imatrix and pack functions in matrices.pure). For instance, here is how you can quickly wrap up GSL's double matrix addition function in a way that preserves value semantics:
> extern int gsl_matrix_add(dmatrix*, dmatrix*);
> x::matrix + y::matrix = gsl_matrix_add x y $$ x when x = pack x end;
> let x = dmatrix {1,2,3}; let y = dmatrix {2,3,2}; x; y; x+y;
{1.0,2.0,3.0}
{2.0,3.0,2.0}
{3.0,5.0,5.0}
Most GSL matrix routines can be wrapped in this fashion quite easily. A ready-made GSL interface providing access to all of GSL's numeric functions is in the works; please check the Pure website for details.
For convenience, it is also possible to pass a numeric matrix for a short*, int*, float* or double* parameter. The required conversions are done automatically, on the fly, and the matrix data is copied to temporary storage in order to preserve value semantics.
In addition, any kind of matrix (including symbolic matrices) can also be passed for a generic void* pointer. In this case no conversions are done and a pointer to the raw matrix data is passed, which allows the matrix to be modified in-place. Similarly, void* also allows you to pass a bigint argument as a raw mpz_t value. This makes it possible to call most GMP integer routines directly from Pure.
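For instance, here is how you can call GMP's integer comparison routine directly on Pure bigints. (Note that __gmpz_cmp is the actual symbol under which the GMP library exports the mpz_cmp documented in the GMP manual; we give it the friendlier alias mpz_cmp on the Pure side.)

> extern int __gmpz_cmp(void*, void*) = mpz_cmp;
> mpz_cmp 2L 2L;
0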
As already mentioned, it is possible to augment an external C function with ordinary Pure equations, but in this case you have to make sure that the extern declaration of the function comes first. For instance, we might want to extend our imported sin function with a rule to handle integers:
> extern double sin(double);
> sin 0.3;
0.29552020666134
> sin 0;
sin 0
> sin x::int = sin (double x);
> sin 0;
0.0
Sometimes it is preferable to replace a C function with a wrapper function written in Pure. In such a case you can specify an alias under which the original C function is known to the Pure program, so that you can still call the C function from the wrapper. An alias is introduced by terminating the extern declaration with a clause of the form = alias. For instance:
> extern double sin(double) = c_sin;
> sin x::double = c_sin x;
> sin x::int = c_sin (double x);
> sin 0.3; sin 0;
0.29552020666134
0.0
As an alternative, you can also declare the C function in a special namespace (cf. Namespaces in the Declarations section):
> namespace c;
> extern double sin(double);
> c::sin 0.3;
0.29552020666134
Note that the namespace qualification only affects the Pure side; the underlying C function is still called under the unqualified name as usual. The way in which such qualified externs are accessed is the same as for ordinary qualified symbols. In particular, the using namespace declaration applies as usual, and you can declare such symbols as private if needed. It is also possible to combine a namespace qualifier with an alias:
> namespace c;
> extern double sin(double) = mysin;
> c::mysin 0.3;
0.29552020666134
External C functions are resolved by the LLVM runtime, which first looks for the symbol in the C library and Pure's runtime library (or the interpreter executable, if the interpreter was linked statically). Thus all C library and Pure runtime functions are readily available in Pure programs. Other functions can be provided by adding them to the runtime, or by linking them into the runtime or the interpreter executable. Better yet, you can just "dlopen" shared libraries at runtime with a special form of the using clause:
using "lib:libname[.ext]";
For instance, if you want to call the functions from library libxyz directly from Pure:
using "lib:libxyz";
After this declaration the functions from the given library will be ready to be imported into your Pure program by means of corresponding extern declarations.
Shared libraries opened with using clauses are searched for in the same way as source scripts (see section Modules and Imports above), using the -L option and the PURE_LIBRARY environment variable in place of -I and PURE_INCLUDE. If the library isn't found by these means, the interpreter will also consider other platform-specific locations searched by the dynamic linker, such as the system library directories and LD_LIBRARY_PATH on Linux. The necessary filename suffix (e.g., .so on Linux or .dll on Windows) will be supplied automatically when needed. Of course you can also specify a full pathname for the library if you prefer that. If a library file cannot be found, or if an extern declaration names a function symbol which cannot be resolved, an appropriate error message is printed.
Pure comes with a collection of Pure library modules, which includes the standard prelude (loaded automatically at startup time) and some other modules which can be loaded explicitly with a using clause. The prelude offers the necessary functions to work with the built-in types (including arithmetic and logical operations) and to do most kinds of list processing you can find in ML- and Haskell-like languages. It also provides a collection of basic string and matrix operations. Please refer to the Pure Library Manual for details on the provided operations. Here is a very brief summary of some of the prelude operations which, besides the usual arithmetic and logical operators, are probably used most frequently:
The prelude also offers support operations for the implementation of list and matrix comprehensions, as well as the customary list operations like head, tail, drop, take, filter, map, foldl, foldr, scanl, scanr, zip, unzip, etc., which make list programming so much fun in modern FPLs. In Pure, these also work on strings as well as matrices, although, for reasons of efficiency, these data structures are internally represented as arrays.
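For instance, map works on a matrix (yielding a matrix) just as it does on a list, and sections of the comparison operators make handy filter predicates:

> map (*2) {1,2,3};
{2,4,6}
> filter (>2) (1..5);
[3,4,5]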
Besides the prelude, Pure's standard library also comprises a growing number of additional library modules which we can only mention in passing here. In particular, the math.pure module provides additional mathematical functions as well as Pure's complex and rational number data types. Common container data structures like sets and dictionaries are implemented in the set.pure and dict.pure modules, among others. Moreover, the (beginnings of a) system interface can be found in the system.pure module. In particular, this module also provides operations to do basic C-style I/O, including printf and scanf. These are all described in much more detail in the Pure Library Manual.
In interactive mode, the interpreter reads definitions and expressions and processes them as usual. You can use the -i option to force interactive mode when invoking the interpreter with some script files. Additional scripts can be loaded interactively using either a using declaration or the interactive run command (see the description of the run command below for the differences between these). Or you can just start typing away, entering your own definitions and expressions to be evaluated.
The input language is just the same as for source scripts, and hence individual definitions and expressions must be terminated with a semicolon before they are processed. For instance, here is a simple interaction which defines the factorial and then uses that definition in some evaluations. Input lines begin with '>', which is the interpreter's default command prompt:
> fact 1 = 1;
> fact n = n*fact (n-1) if n>1;
> let x = fact 10; x;
3628800
> map fact (1..10);
[1,2,6,24,120,720,5040,40320,362880,3628800]
As indicated, in interactive mode the normal forms of toplevel expressions are printed after each expression is entered. We also call this the read-eval-print loop. Normal form expressions are usually printed in the same form as you'd enter them. However, there are a few special kinds of objects like anonymous closures, thunks ("lazy" values to be evaluated when needed) and pointers which don't have a textual representation in the Pure syntax and will be printed in the format #<object description> by default. It is also possible to override the print representation of any kind of expression by means of the __show__ function, see the Caveats and Notes section for details.
Online help is available in the interpreter with the interactive help command, see Interactive Commands below. You need to have an HTML browser installed for that. By default, the help command uses w3m(1), but you can change this by setting either the PURE_HELP or the BROWSER environment variable accordingly.
When invoked without arguments, the help command displays this manual:
> help
The help command also accepts a parameter which lets you specify a topic to show in the accompanying Pure Library manual, e.g.:
> help foldl
The help files distributed with the Pure interpreter are located in the Pure library directory (/usr/local/lib/pure by default). You can install additional documentation in html format in this directory, and look up topics in those help files with a command like the following:
> help mydoc#foo
Here mydoc is the basename of your help file (library path and .html suffix are supplied automatically), and foo can be any link target in the document (as specified with a <a name=...> tag or an id attribute in the html source). To just read the mydoc.html file without specifying a target, type the following:
> help mydoc#
Note that just help mydoc wouldn't work, since it would look for an entry mydoc in the standard library documentation.
If the basename of the help file is missing, it defaults to the library manual, so help #foldl does the same as help foldl, and just help # is a convenient shorthand to bring up the library manual without a specific topic. Of course, this syntax also works for looking up sections in the Pure manual:
> help pure#declarations
Note that the docutils tools used to generate the html source of the Pure manual mangle the section titles so that they are in lowercase and blanks are replaced with hyphens. So to look up the present section in this manual you'd have to type:
> help pure#online-help
You can also point the help browser to a proper URL, either a local file or some website (provided that your browser program can handle these). For instance:
> help file:mydoc.html#foo
> help http://pure-lang.googlecode.com
When running interactively, the interpreter accepts a number of special commands useful for interactive purposes. Here is a quick rundown of the currently supported operations:
Dump a snapshot of the current function, macro, constant and variable definitions in Pure syntax to a text file. This works similar to the show command (see below), but writes the definitions to a file. The default output file is .pure in the current directory, which is then reloaded automatically the next time the interpreter starts up in interactive mode in the same directory. This provides a quick-and-dirty way to save an interactive session and have it restored later, but note that this isn't perfect yet. In particular, declarations of extern symbols won't be saved unless they're specified explicitly, and some objects like closures, thunks and pointers don't have a textual representation from which they could be reconstructed. To handle these, you'll probably have to prepare a corresponding .purerc file yourself, see Interactive Startup below.
A different filename can be specified with the -n option, which expects the name of the script to be written in the next argument, e.g.: dump -n myscript.pure. You can then edit that file and use it as a starting point for an ordinary script or a .purerc file, or you can just run the file with the run command (see below) to restore the definitions in a subsequent interpreter session.
Note that these special commands are only recognized at the beginning of the interactive command line (they are not reserved keywords of the Pure language). Thus it's possible to "escape" identifiers looking like commands by entering a space at the beginning of the line.
Another convenience for interactive usage is the ans function, which retrieves the most recent result printed in interactive mode. For instance:
> fact n = if n<=1 then 1 else n*fact (n-1);
> map fact (1..10);
[1,2,6,24,120,720,5040,40320,362880,3628800]
> scanl (+) 0 ans;
[0,1,3,9,33,153,873,5913,46233,409113,4037913]
Note that ans is just an ordinary function, defined in the prelude, not a special command. However, there is a special clear ans command which purges the ans value. This is useful, e.g., if you got a huge result which you want to erase from memory before starting the next computation.
> clear ans
> ans;
ans
The clear, dump and show commands all accept the following options for specifying a subset of symbols and definitions on which to operate. Options may be combined, thus, e.g., show -mft is the same as show -m -f -t. Some options specify optional numeric parameters; these must follow immediately behind the option character if present, as in -t0.
-c | Selects defined constants. |
-f | Selects defined functions. |
-g | Indicates that the following symbols are actually shell glob patterns and that all matching symbols should be selected. |
-m | Select defined macros. |
-pflag | Select only private symbols if flag is nonzero (the default), otherwise (flag is zero) select only public symbols. If this option is omitted then both private and public symbols are selected. |
-tlevel | Select symbols and definitions at the given "level" of definitions and above. This is described in more detail below. Briefly, the executing program and all imported modules (including the prelude) are at level 0, while "temporary" definitions made interactively in the interpreter are at level 1 and above. Thus a level of 1 restricts the selection to all temporary definitions, whereas 0 indicates all definitions (i.e., everything, including the prelude). If level is omitted, it defaults to the current definitions level. |
-v | Select defined variables. |
In addition, the -h option prints a short help message describing all available options of the command at hand.
If none of the -c, -f, -m and -v options are specified, then all kinds of symbols (constants, functions, macros and variables) are selected, otherwise only the specified categories will be considered.
A reasonable default is used if the -t option is omitted. By default, if no symbols are specified, only temporary definitions are considered, which corresponds to -t1. Otherwise the command applies to all corresponding definitions, no matter whether they belong to the executing program, the prelude, or some temporary level, which has the same effect as -t0. This default choice can be overridden by specifying the desired level explicitly.
As a special case, just clear (without any other options or symbol arguments) always backs out to the previous definitions level (instead of level #1). This is inconsistent with the rules set out above, but is implemented this way for convenience and backward compatibility. Thus, if you really want to delete all your temporary definitions, use clear -t1 instead. When used in this way, the clear command will only remove temporary definitions; if you need to remove definitions at level #0, you must specify those symbols explicitly.
Note that clear -g * will have pretty much the same disastrous consequences as the Unix command rm -rf *, so don't do that. Also note that a macro or function symbol may well have defining equations at different levels, in which case a command like clear -tn foo might only affect some part of foo's definition. The dump and show commands work analogously (albeit less destructively). See Definition Levels below for some examples.
The show command can be used to obtain information about defined symbols in various formats. Besides the common selection options discussed above, this command recognizes the following additional options for specifying the content to be listed and the format to use.
-a | Disassembles pattern matching automata. Works like the -v4 option of the interpreter. |
-d | Disassembles LLVM IR, showing the generated LLVM assembler code of a function. Works like the -v8 option of the interpreter. |
-e | Annotate printed definitions with lexical environment information (de Bruijn indices, subterm paths). Works like the -v2 option of the interpreter. |
-l | Long format, prints definitions along with the summary symbol information. This implies -s. |
-s | Summary format, print just summary information about listed symbols. |
Symbols are always listed in lexicographic order. Note that some of the options (in particular, -a and -d) may produce excessive amounts of information. By setting the PURE_MORE environment variable accordingly, you can specify a shell command to be used for paging, usually more(1) or less(1).
For instance, to list all temporary definitions made in an interactive session, simply say:
> show
You can also list a specific symbol, no matter whether it comes from the interactive command line, the executing script or the prelude:
> show foldl
foldl f a x::matrix = foldl f a (list x);
foldl f a s::string = foldl f a (chars s);
foldl f a [] = a;
foldl f a (x:xs) = foldl f (f a x) xs;
Wildcards can be used with the -g option, which is useful if you want to print an entire family of related functions, e.g.:
> show -g foldl*
foldl f a x::matrix = foldl f a (list x);
foldl f a s::string = foldl f a (chars s);
foldl f a [] = a;
foldl f a (x:xs) = foldl f (f a x) xs;
foldl1 f x::matrix = foldl1 f (list x);
foldl1 f s::string = foldl1 f (chars s);
foldl1 f (x:xs) = foldl f x xs;
Or you can just specify multiple symbols as follows (this also works with multiple glob patterns when you add the -g option):
> show min max
max x y = if x>=y then x else y;
min x y = if x<=y then x else y;
You can also select symbols by category. E.g., the following command shows summary information about all the variable symbols along with their current values (using the "long" format):
> show -lvg *
argc          var  argc = 0;
argv          var  argv = [];
gsl_version   var  gsl_version = "1.9";
sysinfo       var  sysinfo = "i686-pc-linux-gnu";
version       var  version = "0.17";
5 variables
Or you can list just private symbols of the namespace foo, as follows:
> show -pg foo::*
The following command will list each and every symbol that's currently defined (instead of -g * you can also use the -t0 option):
> show -g *
This usually produces a lot of output and is rarely needed, unless you'd like to browse through an entire program including all library imports. (In that case you might consider using the dump command instead, which writes the definitions to a file that can then be loaded into a text editor for easier viewing. This may occasionally be useful for debugging purposes.)
Finally, there are two alternate forms of the show command: show namespace which lists the current and search namespaces, and show namespaces which lists all declared namespaces. These come in handy if you have forgotten what namespaces are currently active and which other namespaces are available in your program. For instance:
> show namespace
> show namespaces
namespace C;
namespace matrix;
> using namespace C;
> namespace my;
> show namespace
namespace my;
using namespace C;
To help with incremental development, the interpreter offers some facilities to manipulate the current set of definitions interactively. To these ends, definitions are organized into different subsets called levels. As already mentioned, the prelude, as well as other source programs specified when invoking the interpreter, are always at level 0, while the interactive environment starts at level 1.
Each save command introduces a new temporary level, and each subsequent clear command (without any arguments) "pops" the definitions on the current level and returns you to the previous one (if any). This gives you a "stack" of temporary environments which enables you to "plug and play" in a (more or less) safe fashion, without affecting the rest of your program. For all practical purposes, this stack is unlimited, so that you can create as many levels as you like. Example:
> foo (x:xs) = x+foo xs;
> foo [] = 0;
> show
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
55
> clear
This will clear all temporary definitions at level #1. Continue (y/n)? y
> show
> foo (1..10);
foo [1,2,3,4,5,6,7,8,9,10]
We've seen already that normally, if you enter a sequence of equations, they will be recorded in the order in which they were written. However, it is also possible to override definitions in lower levels with the override command:
> foo (x:xs) = x+foo xs;
> foo [] = 0;
> show
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
55
> save
save: now at temporary definitions level #2
> override
> foo (x:xs) = x*foo xs;
> show
foo (x:xs) = x*foo xs;
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
warning: rule never reduced: foo (x:xs) = x+foo xs;
0
Note that the equation foo (x:xs) = x*foo xs; was inserted before the previous foo (x:xs) = x+foo xs; rule, which is at level #1. (The latter equation is now "shadowed" by the rule we just entered, hence the compiler warns us that this rule can't be reduced any more.)
Even in override mode, new definitions will be added after other definitions at the current level. This allows us to just continue adding more high-priority definitions overriding lower-priority ones:
> foo [] = 1;
> show
foo (x:xs) = x*foo xs;
foo [] = 1;
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
warning: rule never reduced: foo (x:xs) = x+foo xs;
warning: rule never reduced: foo [] = 0;
3628800
Again, the new equation was inserted above the existing lower-priority rules, but below our previous foo (x:xs) = x*foo xs; equation entered at the same level. As you can see, we have now effectively replaced our original definition of foo with a version that calculates list products instead of sums, but of course we can easily go back one level to restore the previous definition:
> clear
This will clear all temporary definitions at level #2. Continue (y/n)? y
clear: now at temporary definitions level #1
clear: override mode is on
> show
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
55
Note that clear reminded us that override mode is still enabled (save will do the same if override mode is on while pushing a new definitions level). To turn it off again, use the underride command. This will revert to the normal behaviour of adding new equations below existing ones:
> underride
Finally, it's also possible to use clear to back out multiple levels at once, if you specify the target level to be cleared with the -t option. For instance:
> save
save: now at temporary definitions level #2
> let bar = 99;
> show
let bar = 99;
foo (x:xs) = x+foo xs;
foo [] = 0;
> clear -t1 // this scraps all our scribblings!
This will clear all temporary definitions at level #1 and above. Continue (y/n)? y
clear: now at temporary definitions level #1
> show
>
The interpreter provides a simple but reasonably convenient symbolic debugging facility when running interactively. To make this work, you have to specify the -g option when invoking the interpreter:
$ pure -g
This option disables tail call optimization (see Stack Size and Tail Recursion) to make it easier to debug programs. It also causes special debugging code to be generated which will make your program run much slower. Therefore the -g option should only be used if you actually need the debugger.
To debug a given function, just set a breakpoint with the break command. For instance:
> fact n::int = if n>0 then n*fact (n-1) else 1;
> break fact
> fact 1;
** [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
(Type 'h' for help.)
:
** [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 0
:
++ [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 0
     --> 1
** [2] (*): x::int*y::int = x*y;
     x = 1; y = 1
:
++ [2] (*): x::int*y::int = x*y;
     x = 1; y = 1
     --> 1
++ [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
     --> 1
1
Lines beginning with ** indicate that the evaluation was interrupted to show the rule (or external) which is currently being considered, along with the current depth of the call stack, the invoked function and the values of parameters and other local variables in the current lexical environment. In contrast, the prefix ++ denotes reductions which were actually performed during the evaluation and the results that were returned by the function call (printed as --> return value).
Sometimes you might also see funny symbols like #<case>, #<when> or #<closure> instead of the function name. These indicate lambdas and the special variable-binding environments, which are all implemented as anonymous closures in Pure. Also note that the debugger doesn't know about the argument names of external functions (which are optional in Pure and not recorded anywhere), so it will display the generic names x1, x2 etc. instead.
At the debugger prompt ':' you can enter various special debugger commands, or just keep on hitting the carriage return key to walk through an evaluation step by step, as we did in the example above. (Command line editing using readline works as usual at the debugger prompt, if it is enabled.) The usual commands are provided to walk through an evaluation, print and navigate the call stack, step over the current call, or continue the evaluation unattended until you hit another breakpoint. If you know other source level debuggers like gdb then you should feel right at home. You can type h at the debugger prompt to print the following list:
: h
Debugger commands:
a       auto: step through the entire program, run unattended
c [f]   continue until next breakpoint, or given function f
h       help: print this list
n       next step: step over reduction
p [n]   print rule stack (n = number of frames)
r       run: finish evaluation without debugger
s       single step: step into reduction
t, b    move to the top or bottom of the rule stack
u, d    move up or down one level in the rule stack
x       exit the interpreter (after confirmation)
.       reprint current rule
! cmd   shell escape
? expr  evaluate expression
<cr>    single step (same as 's')
<eof>   step through program, run unattended (same as 'a')
The command syntax is very simple. Besides the commands listed above you can also enter comment lines (// comment text) which will just be ignored. Extra arguments on commands which don't expect any will generally be ignored as well. The single letter commands all have to be separated from any additional parameters with whitespace, whereas the '!', '?' and '.' commands count as word delimiters and can thus be followed immediately by an argument. For convenience, the '?' command can also be omitted if the expression to be evaluated doesn't start with a single letter or one of the special punctuation commands.
The debugger can be exited or suspended in the following ways:
At the debugger prompt, you can use the u ("up"), d ("down"), t ("top") and b ("bottom") commands to move around on the current call stack. The p command prints a range of the call stack centered around the currently selected stack frame, which is indicated with the >> tag, whereas ** denotes the current bottom of the stack (which is the rule to be executed with the single step command). The p command can also be followed by a numeric argument which indicates the number of stack frames to be printed (this will then become the default for subsequent invocations of p). The n command steps over the call selected with the stack navigation commands. For instance:
> fact 3;
** [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 3
: c *
** [4] (*): x::int*y::int = x*y;
     x = 1; y = 1
: p
   [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 3
   [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 2
   [3] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
** [4] (*): x::int*y::int = x*y;
     x = 1; y = 1
: u
>> [3] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
: u
>> [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 2
: p
   [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 3
>> [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 2
   [3] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
** [4] (*): x::int*y::int = x*y;
     x = 1; y = 1
: n
++ [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 2
     --> 2
** [2] (*): x::int*y::int = x*y;
     x = 3; y = 2
:
If you ever get lost, you can reprint the current rule with the '.' command:
: .
** [2] (*): x::int*y::int = x*y;
     x = 3; y = 2
Another useful feature is the ? command which lets you evaluate any Pure expression, with the local variables of the current rule bound to their corresponding values. Like the n command, ? applies to the current stack frame as selected with the stack navigation commands. The expression must be entered on a single line, and the trailing semicolon is optional. For instance:
> fact 3;
** [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 3
: c *
** [4] (*): x::int*y::int = x*y;
     x = 1; y = 1
: ?x+y
2
: u
>> [3] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
: n>0, fact n
1,1
Note that the current set of breakpoints can only be changed with the break and del commands of the interpreter, see Interactive Commands above. Use break without arguments to list the currently defined breakpoints. Breakpoints can be deleted with the del command, which is followed by the function and operator symbols to be removed from the breakpoint list. If del is invoked without arguments, it clears all breakpoints (after confirmation).
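For instance, the following commands set breakpoints on two functions, list them, remove one of them, and finally clear the rest (the breakpoint listing printed by break without arguments, and the confirmation prompt of the final del, are left out here):

> break fact
> break foldl
> break
> del fact
> del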
In interactive mode, the interpreter also runs some additional scripts at startup, after loading the prelude and the scripts specified on the command line. This lets you tailor the interactive environment to your liking.
The interpreter first looks for a .purerc file in the user's home directory (as given by the HOME environment variable) and then for a .purerc file in the current working directory. These are just ordinary Pure scripts which may contain any additional definitions that you need. The .purerc file in the home directory is for global definitions which should always be available when running interactively, while the .purerc file in the current directory can be used for project-specific definitions.
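For instance, a global .purerc in your home directory might just import some modules and add a few helper definitions that you want to have available in every interactive session (the definitions shown here are only an example, of course):

// ~/.purerc: loaded at the start of every interactive session
using system;
square x = x*x;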
Finally, you can also have a .pure initialization file in the current directory, which is created by the dump command (see above) and is loaded after the .purerc files if it is present.
The interpreter processes all these files in the same way as with the run command (see above). When invoking the interpreter, you can specify the --norc option on the command line if you wish to skip these initializations.
This section is a grab bag of casual remarks, useful tips and tricks, and information on common pitfalls, quirks and limitations of the current implementation and how to deal with them.
People keep asking me what's so "pure" about Pure. The long and apologetic answer is that at its core, Pure is in fact purely algebraic and purely functional. However, Pure doesn't get in your way if you want to call external operations with side effects; it does allow you to call any C function after all. But with a few exceptions the standard library operations are free of those. Just stay away from operations marked "IMPURE" in the library sources (most notably, eval, catch/throw, references, sentries and direct pointer manipulations) and avoid the system module, then your program will behave according to the semantics of term rewriting.
The short answer is that I simply liked the name, and there wasn't any programming language named "Pure" yet (quite a feat nowadays), so there's one now. :)
Pure 0.7 introduced built-in matrix structures, which called for some minor changes in the syntax of comprehensions and arithmetic sequences. Specifically, the template expression and generator/filter clauses of a comprehension are now separated with | instead of ;. Moreover, arithmetic sequences with arbitrary stepsize are now written x:y..z instead of x,y..z, and the '..' operator now has a higher precedence than the ',' operator. This makes writing matrix slices like x!!(i..j,k..l) much more convenient.
In Pure 0.13 the naming of the logical and bitwise operations was changed, so that these are now called ~, &&, || and not/and/or, respectively. (Previously, ~ was used for bitwise, not for logical negation, which was rather inconsistent, albeit compatible with the naming of the not operation in Haskell and ML.) Also, to stay in line with this naming scheme, inequality was renamed to ~= (previously !=).
Pure 0.14 introduced the namespaces feature. Consequently, the scope of private symbols is now confined to a namespace rather than a source module; scripts making use of private symbols need to be adapted accordingly. Also note that syntax like foo::int may now also denote a qualified symbol rather than a tagged variable, if foo has been declared as a namespace. You can work around such ambiguities by renaming the variable, or by placing spaces around the '::' delimiter (these aren't permitted in a qualified symbol, so the construct foo :: int is always interpreted as a tagged variable, no matter whether foo is also a valid namespace).
Pure 0.26 extended the namespaces feature to add support for hierarchical namespaces. This means that name lookup works in a slightly different fashion now (see Hierarchical Namespaces for details), but old code which doesn't use the new feature should continue to work unchanged.
Pure 0.26 also changed the nullary keyword to nonfix, which is more consistent with the other kinds of fixity declarations. Moreover, the parser was enhanced so that it can cope with a theoretically unbounded number of precedence levels, and the system of standard operators in the prelude was modified so that it becomes possible to sneak in new operator symbols with ease; details can be found in the Symbol Declarations section.
The parser uses a fairly simplistic panic mode error recovery which tries to catch syntax errors at the toplevel only. This seems to work reasonably well, but might catch some errors much too late. Unfortunately, Pure's terseness makes it rather difficult to design a better scheme. As a remedy, the parser accepts an empty definition (just ; by itself) at the toplevel only. Thus, in interactive usage, if the parser seems to eat away your input without doing anything, entering an extra semicolon or two should break the spell, putting you back at the toplevel where you can start typing the definition again.
As of Pure 0.21, the interpreter has a new -c option which provides a convenient means to turn Pure scripts into standalone executables. This feature is still a bit experimental right now. In particular, note that the compiled executable is essentially a static snapshot of your program which is executed on the "bare metal", without a hosting interpreter. Only a minimal runtime system is provided. This considerably reduces startup times, but also implies the following quirks and limitations:
What all this boils down to is that anything which requires the compile-time or interactive facilities of the interpreter is unavailable. These restrictions only apply at run time, of course. At compile time the program is being executed by the interpreter so you can use eval and evalcmd in any desired way. See the description of the compiling variable below for how to distinguish these cases in your script.
For most kinds of scripts, the above restrictions aren't really that much of an obstacle, or can easily be worked around. For the few scripts which actually need the full dynamic capabilities of Pure you'll just have to run the script with the interpreter. This isn't a big deal either, only the startup will be somewhat slower because the script is compiled on the fly. Once the JIT has done its thing the "interpreted" script will run every bit as fast as the "compiled" one, since in fact both are compiled (only at different times) to exactly the same code!
Also note that during a batch compilation, the compiled program is actually executed as usual, i.e., the script is also run at compile time. This might first seem to be a big annoyance, but it actually opens the door for some powerful programming techniques like partial evaluation. It is also a necessity because of Pure's highly dynamic nature. For instance, Pure allows you to define constants by evaluating an arbitrary expression (see Constant Definitions below), and using eval a program can easily modify itself in even more unforeseeable ways. Therefore pretty much anything in your program can actually depend on previous computations performed while the program is being executed.
For the sake of a concrete example, consider the following little script:
using system;

fact n = if n>0 then n*fact (n-1) else 1;

main n = do puts ["Hello, world!", str (map fact (1..n))];

if argc<=1 then () else main (sscanf (argv!1) "%d");
When invoked from the command line, with the number n as the first parameter, this program will print the string "Hello, world!" and the list of the first n factorials:
$ pure -x hello.pure 10
Hello, world!
[1,2,6,24,120,720,5040,40320,362880,3628800]
Note the condition on argc in the last line of the script. This prevents the program from producing an exception if no command line parameters are specified, so that the program can also be run interactively:
$ pure -i -q hello.pure
> main 10;
Hello, world!
[1,2,6,24,120,720,5040,40320,362880,3628800]
()
> quit
Turning the script into an executable works as follows:
$ pure -c hello.pure -o hello
$ ./hello 10
Hello, world!
[1,2,6,24,120,720,5040,40320,362880,3628800]
Next suppose that we'd like to supply the value n at compile rather than run time. To these ends we want to turn the value passed to the main function into a compile time constant, which can be done as follows:
const n = if argc>1 then sscanf (argv!1) "%d" else 10;
(Note that we provide 10 as a default if n isn't specified on the command line.)
Moreover, we want to skip the execution of main at compile time. The Pure runtime provides a special system variable compiling which holds a truth value indicating whether the program is actually running under the auspices of the batch compiler, so that it can adjust accordingly. In our example, the evaluation of main becomes:
if compiling then () else main n;
Our program now looks as follows:
using system;

fact n = if n>0 then n*fact (n-1) else 1;

main n = do puts ["Hello, world!", str (map fact (1..n))];

const n = if argc>1 then sscanf (argv!1) "%d" else 10;
if compiling then () else main n;
This script "specializes" n to the first (compile time) parameter when being batch-compiled, and it still works as before when we run it through the interpreter in both batch and interactive mode, too:
$ pure -i -q hello.pure
Hello, world!
[1,2,6,24,120,720,5040,40320,362880,3628800]
> main 5;
Hello, world!
[1,2,6,24,120]
()
> quit
$ pure -x hello.pure 7
Hello, world!
[1,2,6,24,120,720,5040]
$ pure -o hello -c -x hello.pure 7
$ ./hello
Hello, world!
[1,2,6,24,120,720,5040]
You'll rarely need an elaborate setup like this, most of the time something like our simple first example will do the trick. But, as you've seen, Pure can easily do it.
A note on code sizes: Batch-compiled Pure programs tend to be large and take a while to compile. This is because, to be on the safe side, the output code includes all functions defined in the compiled program or imported through a using clause, even if they don't seem to be "used" anywhere. As a remedy, the batch compiler has an additional -s option to strip unused functions from the output code. For instance, on a 64 bit Linux system with ELF binaries, the -s option shaves off some two thirds of the code size of our hello.pure example. (Note that even then, the resulting code is fairly large when compared to compiled C code, as it still contains the symbol table of the entire program, which is needed by the runtime environment.)
$ pure -o hello -c -x hello.pure 7 && ls -l hello
-rwxr-xr-x 1 ag users 533334 2009-09-03 01:26 hello
$ pure -o hello -c -s -x hello.pure 7 && ls -l hello
-rwxr-xr-x 1 ag users 178491 2009-09-03 01:26 hello
$ ./hello
Hello, world!
[1,2,6,24,120,720,5040]
Be careful when using this option. The -s option only does a static analysis of which functions might be reached from the initialization code (i.e., toplevel expressions and let bindings). It does not take into account code run via the eval routine. Thus, functions used only in evaled code will be stripped from the executable, as if they were never defined at all. If such a function is then being called using eval at runtime, it will evaluate to a plain constructor symbol. However, it is possible to work around this limitation by just adding a dummy "call" to functions which are to be used with eval, e.g.:
let foo, bar; eval "foo (bar 99)";
Note that while the batch compiler generates native executables by default, it can just as well create object files which can be linked into other C/C++ programs and libraries:
$ pure -o hello.o -c -x hello.pure 7
The .o extension tells the compiler that you want an object file. When linking the object module, you also need to supply an initialization routine which calls the __pure_main__ function in hello.o to initialize the compiled module. This routine is declared in C/C++ code as follows:
extern "C" void __pure_main__(int argc, char** argv);
As indicated, __pure_main__ is to be invoked with two parameters, the argument count and a NULL-terminated argument vector, which become the argc and the argv of the Pure program, respectively. (You can also just pass 0 for both arguments if you don't need to supply command line parameters.) The purpose of __pure_main__ is to initialize a shell instance of the Pure interpreter which provides the minimal runtime support necessary to execute the Pure program, and to invoke all "initialization code" (variable definitions and toplevel expressions) of the program itself.
A minimal C main function which does the job of initializing the Pure module looks as follows:
extern void __pure_main__(int argc, char** argv);

int main(int argc, char** argv)
{
  __pure_main__(argc, argv);
  return 0;
}
If you link the main routine with the Pure module, don't forget to also pull in the Pure runtime library. Assuming that the above C code is in pure_main.c:
$ gcc -c pure_main.c -o pure_main.o
$ g++ -o hello hello.o pure_main.o -lpure
$ ./hello
Hello, world!
[1,2,6,24,120,720,5040]
(The C++ compiler is used as the linker here so that the standard C++ library gets linked in, too. This is necessary because Pure's runtime library is actually written in C++.)
In fact, this is pretty much what pure -c actually does for you when creating an executable.
If your script loads dynamic libraries (using "lib:...";) then you'll also have to link with those; all external references have to be resolved at compile time. This is taken care of automatically when creating executables. Otherwise it is a good idea to run pure -c with the -v0100 verbosity option so that it prints the libraries to be linked (in addition to the commands which are invoked in the compilation process):
$ pure -v0100 -c hello.pure -o hello.o
opt -f -std-compile-opts hello.o.bc | llc -f -o hello.o.s
gcc -c hello.o.s -o hello.o
Link with: g++ hello.o -lpure
Well, we already knew that, so let's consider a slightly more interesting example from Pure's ODBC module:
$ pure -v0100 -c pure-odbc/examples/menagerie.pure -o menagerie.o
opt -f -std-compile-opts menagerie.o.bc | llc -f -o menagerie.o.s
gcc -c menagerie.o.s -o menagerie.o
Link with: g++ menagerie.o /usr/local/lib/pure/odbc.so -lpure
$ g++ -shared -o menagerie.so menagerie.o /usr/local/lib/pure/odbc.so -lpure
Note that the listed link options are necessary but might not be sufficient; pure -c just makes a best guess based on the Pure source. On most systems this will be good enough, but if it isn't, you can just add options to the linker command as needed to pull in additional required libraries.
As this last example shows, you can also create shared libraries from Pure modules. However, on some systems (most notably x86_64), this requires that you pass the -fPIC option when batch-compiling the module, so that position-independent code is generated:
$ pure -c -fPIC pure-odbc/examples/menagerie.pure -o menagerie.o
Also note that even when building a shared module, you'll have to supply an initialization routine which calls __pure_main__ somewhere.
Another point worth mentioning here is that you can't just call Pure functions in a batch-compiled module directly. That's because in order to call a Pure function, at least in the current implementation, you have to set up a stack frame for the function. However, there's a convenience function called pure_funcall in the runtime API to handle this. This function takes a pointer to the Pure function, the argument count and the arguments themselves (as pure_expr* objects) as parameters. For instance, here is a pure_main.c module which can be linked against the hello.pure program from above, which calls the fact function from the Pure program:
#include <stdio.h>
#include <pure/runtime.h>

extern void __pure_main__(int argc, char** argv);
extern pure_expr *fact(pure_expr *x);

int main()
{
  int n = 10, m;
  __pure_main__(0, NULL);
  if (pure_is_int(pure_funcall(fact, 1, pure_int(n)), &m))
    printf("fact %d = %d\n", n, m);
  return 0;
}
And here's how you can compile, link and run this program:
$ pure -o hello.o -c -s -x hello.pure 7
$ gcc -o pure_main.o -c pure_main.c
$ g++ -o myhello hello.o pure_main.o -lpure
$ ./myhello
Hello, world!
[1,2,6,24,120,720,5040]
fact 10 = 3628800
Note that the first two lines are output from the Pure program; the last line is what gets printed by the main routine in pure_main.c.
Last but not least, pure -c can also generate just plain LLVM assembler code:
pure -c hello.pure -o hello.ll
Note the .ll extension; this tells the compiler that you want an LLVM assembler file. An LLVM bitcode file can be created just as easily:
pure -c hello.pure -o hello.bc
In these cases you'll have to handle the rest of the compilation yourself. This gives you the opportunity, e.g., to play with special optimization and code generation options provided by the LLVM toolchain. Please refer to the LLVM documentation (in particular, the description of the opt and llc programs) for details.
As of Pure 0.6, the interpreter provides a "hook" to override the print representations of expressions at runtime by means of the __show__ function, which works in a fashion similar to Haskell's show function. This feature is still a bit experimental, but seems to work reasonably well for the purposes for which it is intended.
__show__ is just an ordinary Pure function expected to return a string with the desired custom representation of a normal form value given as the function's single argument. This function is not defined by default, so you are free to add any rules that you want. The interpreter prints the strings returned by __show__ just as they are. It will not check whether they conform to Pure syntax and/or semantics, or modify them in any way.
Custom print representations are most useful for interactive purposes, if you're not happy with the default print syntax of some kinds of objects. One particularly useful application of __show__ is to change the format of numeric values. Here are some examples:
> using system;
> __show__ x::double = sprintf "%0.6f" x;
> 1/7;
0.142857
> __show__ x::int = sprintf "0x%0x" x;
> 1786;
0x6fa
> using math;
> __show__ (x::double +: y::double) = sprintf "%0.6f+%0.6fi" (x,y);
> cis (-pi/2);
0.000000+-1.000000i
The prelude function str, which returns the print representation of any Pure expression, calls __show__ as well:
> str (1/7);
"0.142857"
Conversely, you can call the str function from __show__, but in this case it always returns the default representation of an expression. This prevents the expression printer from going recursive, and allows you to define your custom representation in terms of the default one. E.g., the following rule removes the L suffixes from bigint values:
> __show__ x::bigint = init (str x);
> fact n = foldl (*) 1L (1..n);
> fact 30;
265252859812191058636308480000000
Of course, your definition of __show__ can also call __show__ itself recursively to determine the custom representation of an object.
One case which needs special consideration is thunks (futures). The printer will never use __show__ for those, to prevent them from being forced inadvertently. In fact, you can use __show__ to define custom representations for thunks, but only in the context of a rule for other kinds of objects, such as lists. For instance:
> nonfix ...;
> __show__ (x:xs) = str (x:...) if thunkp xs;
> 1:2:(3..inf);
1:2:3:...
Another case which needs special consideration is numeric matrices. For efficiency, the expression printer will always use the default representation for these, unless you override the representation of the matrix as a whole. E.g., the following rule for double matrices mimics Octave's default output format (for the sake of simplicity, this isn't perfect, but you get the idea):
> __show__ x::matrix =
>   strcat [printd j (x!(i,j))|i=0..n-1; j=0..m-1] + "\n"
>   with printd 0 = sprintf "\n%10.5f"; printd _ = sprintf "%10.5f" end
>   when n,m = dim x end if dmatrixp x;
> {1.0,1/2;1/3,4.0};
   1.00000   0.50000
   0.33333   4.00000
Finally, by just purging the definition of the __show__ function you can easily go back to the standard print syntax:
> clear __show__
> 1/7; 1786; cis (-pi/2);
0.142857142857143
1786
6.12303176911189e-17+:-1.0
Note that if you have a set of definitions for the __show__ function which should always be loaded at startup, you can put them into the interpreter's interactive startup files, see Interactive Usage.
In the current implementation, "as" patterns cannot be placed on the "spine" of a function definition. Thus rules like the following, which have the pattern somewhere in the head of the left-hand side, will all provoke an error message from the compiler:
a@foo x y = a,x,y;
a@(foo x) y = a,x,y;
a@(foo x y) = a,x,y;
This is because the spine of a function application is not available when the function is called at runtime. "As" patterns in pattern bindings (case, when) are not affected by this restriction since the entire value to be matched is available at runtime. For instance:
> case bar 99 of y@(bar x) = y,x+1; end;
bar 99,100
"As" patterns are also a useful device if you need to manipulate function applications in a generic way. Note that the "head = function" rule means that the head symbol f of an application f x1 ... xn occurring on (or inside) the left-hand side of an equation, variable binding, or pattern-matching lambda expression, is always interpreted as a literal function symbol (not a variable). This implies that you cannot match the "function" component of an application against a variable, at least not directly. An anonymous "as" pattern like f@_ does the trick, however, since the anonymous variable is always recognized, even if it occurs as the head symbol of a function application. Here's a little example which demonstrates how you can convert a function application to a list containing the function and all arguments:
> foo x = a [] x with a xs (x@_ y) = a (y:xs) x; a xs x = x:xs end;
> foo (a b c d);
[a,b,c,d]
This may seem a little awkward, but as a matter of fact the "head = function" rule is quite useful since it covers the common cases without forcing the programmer to declare "constructor" symbols (except nonfix symbols). On the other hand, generic rules operating on arbitrary function applications are not all that common, so having to "escape" a variable using the anonymous "as" pattern trick is a small price to pay for that convenience.
Sometimes you may also run into the complementary problem, i.e., to match a function argument against a given function. Consider this code fragment:
foo x = x+1;
foop f = case f of foo = 1; _ = 0 end;
You might expect foop to return true for foo, and false on all other values. Better think again, because in reality foop will always return true! In fact, the Pure compiler will warn you about the second rule of the case expression not being used at all:
> foop 99;
warning: rule never reduced: _ = 0;
1
This happens because an identifier on the left-hand side of a rule, which is neither the head symbol of a function application nor a nonfix symbol, is always considered to be a variable (cf. Parameters in Equations), even if that symbol is defined as a global function elsewhere. So foo isn't a literal name in the above case expression, it's a variable! (As a matter of fact, this is rather useful, since otherwise a rule like f g = g+1 would suddenly change meaning if you happen to add a definition like g x = x-1 somewhere else in your program, which certainly isn't desirable.)
A possible workaround is to "escape" the function symbol using an empty namespace qualifier:
foop f = case f of ::foo = 1; _ = 0 end;
This trick works in case expressions and function definitions, but fails in circumstances in which qualified variable symbols are permitted (i.e., in variable and constant definitions). A better solution is to employ the syntactic equality operator === defined in the prelude to match the target value against the function symbol. This allows you to define the foop predicate as follows:
> foop f = f===foo;
> foop foo, foop 99;
1,0
Another way to deal with the situation would be to just declare foo as a nonfix symbol. However, this makes the foo symbol "precious", i.e., after such a declaration it cannot be used as a local variable anymore. It's usually a good idea to avoid that kind of thing, at least for generic symbols, so the above solution is preferred in this case.
A common pitfall is the implicit declaration of unqualified symbols in conjunction with the namespace declaration. For instance, it is tempting to write the following, and expect foo to become a member of the bar namespace:
namespace bar;
foo x = x+1;
But, assuming that foo is not already declared or defined elsewhere, it will actually become a member of the default namespace instead. This is necessary because in Pure it is impossible to distinguish local variables from unqualified function symbols at the lexical level, and thus all undeclared and unqualified identifiers are implicitly declared in the default namespace. Thus, if you really want the foo symbol to become a member of the bar namespace, you must either declare it explicitly or use a qualified identifier. That is, you have to write either:
namespace bar;
bar::foo x = x+1;
Or:
namespace bar;
public foo;
foo x = x+1;
This will tell the compiler about the desired namespace of the symbol.
Another common source of confusion is that Pure provides two different constructs to bind local function and variable symbols, respectively. This distinction is necessary because Pure does not segregate defined functions and constructors, and thus there is no magic to figure out whether an equation like foo x = y by itself is meant as a definition of a function foo with formal parameter x and return value y, or a definition binding the local variable x by matching the constructor pattern foo x against the value y. The with construct does the former, while when does the latter.
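To make the distinction concrete, here's a small sketch (the function and variable names are made up for illustration, and the second definition assumes sqrt from the math module):

```pure
using math;

// 'with' binds local *functions*: 'loop' below is defined by
// equations and can take arguments and recurse.
fact n = loop 1 n with
  loop p n = if n>0 then loop (p*n) (n-1) else p;
end;

// 'when' binds local *variables* by matching patterns against
// values: dx and dy are plain variables here, not functions.
dist (x1,y1) (x2,y2) = sqrt (dx*dx+dy*dy) when
  dx = x2-x1; dy = y2-y1;
end;
```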
Another speciality is that with and when clauses are tacked on to the end of the expression they belong to, which mimics mathematical language but may be unfamiliar if you're more accustomed to programming languages from the Algol/Pascal/C family. If you want to figure out what is actually going on there, it is helpful to read nested scopes "in reverse" (proceeding from the rightmost/outermost to the leftmost/innermost clause).
If possible, you should decorate numeric variables on the left-hand sides of function definitions with the appropriate type tags, like int or double. This often helps the compiler to generate better code and makes your programs run faster. The | syntax makes it easy to add the necessary specializations of existing rules to your program. E.g., taking the polymorphic implementation of the factorial as an example, you only have to add a left-hand side with the appropriate type tag to make that definition go as fast as possible for the special case of machine integers:
fact n::int |
fact n = n*fact(n-1) if n>0;
       = 1 otherwise;
(This obviously becomes unwieldy if you have to deal with several numeric arguments of different types, however, so in this case it is usually better to just use a polymorphic rule.)
Also note that int (the machine integers), bigint (the GMP "big" integers) and double (floating point numbers) are all different kinds of objects. While they can be used in mixed operations (such as multiplying an int with a bigint which produces a bigint, or a bigint with a double which produces a double), the int tag will only ever match a machine int, not a bigint or a double. Likewise, bigint only matches bigints (never int or double values), and double only doubles. Thus, if you want to define a function operating on different kinds of numbers, you'll also have to provide equations for all the types that you need (or a polymorphic rule which catches them all). This also applies to equations matching against constant values of these types. In particular, a small integer constant like 0 only matches machine integers, not bigints; for the latter you'll have to use the "big L" notation 0L. Similarly, the constant 0.0 only matches doubles, but not ints or bigints.
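For instance, here is a sketch of a sign function covering all three numeric types (the name is made up for the example); since the constant 0 in the first rule only matches machine ints, the bigint and double zero cases each need their own equation:

```pure
sign 0   = 0;
sign 0L  = 0;
sign 0.0 = 0;
sign x::int | sign x::bigint | sign x::double
         = if x>0 then 1 else -1;
```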
When defining a function in terms of constant values which have to be computed beforehand, it's usually better to use a const definition (rather than defining a variable or a parameterless function or macro) for that purpose, since this will often allow the compiler to generate better code using constant folding and similar techniques. Example:
> extern double atan(double);
> const pi = 4*atan 1.0;
> foo x = 2*pi*x;
> show foo
foo x = 2*3.14159265358979*x;
(If you take a look at the disassembled code for this function, you will find that the value 2*3.14159265358979 = 6.28318530717959 has actually been computed at compile time.)
Note that constant definitions differ from parameterless macros in that the right-hand side of the definition is in fact evaluated at definition time. E.g., compare the above with the following macro definition:
> clear pi foo
> def pi = 4*atan 1.0;
> foo x = 2*pi*x;
> show foo
foo x = 2*(4*atan 1.0)*x;
The LLVM backend also eliminates dead code automatically, which enables you to employ a constant computed at runtime to configure your code for different environments, without any runtime penalties:
const win = index sysinfo "mingw32" >= 0;
check boy = bad boy if win;
          = good boy otherwise;
In this case the code for one of the branches of check will be completely eliminated, depending on the outcome of the configuration check.
Constants also differ from variables in that they cannot be redefined (that's their purpose after all) and will only take effect on subsequent definitions. E.g.:
> const c = 2;
> foo x = c*x;
> show foo
foo x = 2*x;
> foo 99;
198
> const c = 3;
<stdin>, line 5: symbol 'c' is already defined as a constant
Well, in fact this is not the full truth, because in interactive mode it is possible to redefine constants after all, if the old definition is first purged with the clear command. However, this won't affect any other existing definitions:
> clear c
> const c = 3;
> bar x = c*x;
> show foo bar
foo x = 2*x;
bar x = 3*x;
(You'll also have to purge any existing definition of a variable if you want to redefine it as a constant, or vice versa, since Pure won't let you redefine an existing constant or variable as a different kind of symbol. The same also holds if a symbol is currently defined as a function or a macro.)
Finally, note that in contrast to nonfix symbols, by default const symbols cannot be used on the left-hand side of a rule. E.g., the following will not work as expected:
> const zero = 0; const one = 1;
> foo x = case x of one = "one"; zero = "zero"; _ = "???" end;
> show foo
foo x = case x of one = "one"; zero = "zero"; _ = "???" end;
> foo zero, foo one, foo 99;
warning: rule never reduced: zero = "zero";
warning: rule never reduced: _ = "???";
"one","one","one"
This is the same kind of pitfall as discussed in the context of the foop predicate in Head = Function above. As a remedy, Pure allows you to declare an existing const symbol as nonfix and have it behave as a nonfix symbol also on the left-hand side of a rule. To these ends, the value of the symbol will then be substituted for its occurrences in a pattern:
> nonfix zero one;
> bar x = case x of one = "one"; zero = "zero"; _ = "???" end;
> show bar
bar x = case x of 1 = "one"; 0 = "zero"; _ = "???" end;
> bar zero, bar one, bar 99;
"zero","one","???"
The interpreter always takes your extern declarations of C routines at face value. It will not go and read any C header files to determine whether you actually declared the function correctly! So you have to be careful to give the proper declarations, otherwise your program will probably segfault calling the function.
You also have to be careful when passing generic pointer values to external C routines, since currently there is no type checking for these; any pointer type other than char*, expr* and the matrix pointer types is effectively treated as void*. This considerably simplifies lowlevel programming and interfacing to C libraries, but also makes it very easy to have your program segfault all over the place. Therefore it is highly recommended that you wrap your lowlevel code in Pure routines and data structures which do all the checks necessary to ensure that only the right kind of data is passed to C routines.
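For instance, a wrapper for a C routine returning a pointer can validate its arguments through type tags and guard against NULL results before they can leak into the rest of the program. The following is only a sketch: open_error is a made-up symbol, and it assumes the prelude's null predicate for pointers:

```pure
extern void *fopen(char *name, char *mode);

// Reject anything but strings via the tags, and turn a NULL result
// into a Pure exception instead of a latent segfault.
open_file name::string mode::string
  = if null fp then throw (open_error name) else fp
    when fp = fopen name mode end;
```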
Another limitation of the C interface is that it does not offer any special support for C structs and C function parameters. However, an optional addon module is available which interfaces to the libffi library to provide that kind of functionality, please see the description of the pure-ffi module on the Pure website for details.
Last but not least, to make it easier to create Pure interfaces to large C libraries, there's a separate pure-gen program available at the Pure website. This program takes a C header (.h) file and creates a corresponding Pure module with definitions and extern declarations for the constants and functions declared in the header. Please refer to the pure-gen(1) manual page for details.
Special forms are recognized at compile time only. Thus the catch function, as well as quote and the operators &&, ||, $$ and &, are only treated as special forms in direct (saturated) calls. They can still be used if you pass them around as function values or in partial applications, but in this case they lose all their special call-by-name argument processing.
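The following sketch illustrates the difference (bad is just an unbound symbol here, so its "evaluation" is harmless, but the distinction matters as soon as the operands have side effects or fail to terminate):

```pure
// Direct (saturated) call: && is a special form and short-circuits,
// so the right operand is never evaluated.
test1 = 0 && bad;

// Function value: && now behaves like an ordinary (eager) function,
// so both operands are evaluated before the call.
test2 = f 0 bad when f = (&&) end;
```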
Like in Lisp, the quote special form can be used to construct literal expressions which can then be manipulated before actually evaluating them using the built-in eval function. However, there are some notable differences to the Lisp version of quote:
While it's possible to define a Lisp-like quasiquote in Pure (see the Recursive Macros section for a simplified version, a full implementation can be found in the Pure library sources), it's usually not needed. As discussed above, substitution of local variables is performed even in quoted expressions, so that it is quite easy to fill in variable parts in a quoted "template" expression. For instance:
> (\x -> '(2*x+1)) 99;
2*99+1
> foo x = '(2*x+1);
> foo 99; foo $ '(7/y);
2*99+1
2*(7/y)+1
In fact, it is possible to perform arbitrary computations right in the middle of a quoted expression, using one of the special if-then-else, case, when and with expressions, since these are never quoted either. Example:
> '(x+1 when x = '(2*3) end);
2*3+1
Another useful feature of Lisp's quasiquote is the capability to splice arguments into a function application. It is possible to achieve pretty much the same in Pure with the following variation of the $ operator which "curries" its second (tuple) operand:
infixr 0 $@ ;
f $@ () = f;
f $@ (x,xs) = f x $@ xs;
f $@ x = f x;
Now you can write, e.g.:
> '(foo 1 2) $@ '(2/3,3/4);
foo 1 2 (2/3) (3/4)
One shortcoming of Pure's quote is that there is no way to quote special constructs such as lambdas. Macro expansion is inhibited in quoted expressions, however, so it is possible to work around this limitation by defining a custom special form to be used as a symbolic representation for, say, a lambda expression, which reduces to a real lambda when evaluated. To these ends, the eval function can be invoked with a string argument as follows:
def lambda x y = eval $ "\\ "+str ('x)+" -> "+str ('y);
Example:
> let l = 'lambda x (x+1); l;
lambda x (x+1)
> let f = eval l; f; f 9;
#<closure 0x7fdc3ca45be8>
10
Other special constructs, such as case, when and with can be handled in a similar fashion.
Pure does lazy evaluation in the same way as Alice ML, providing an explicit operation (&) to defer evaluation and create a "future" which is called by need. However, note that like any language with a basically eager evaluation strategy, Pure cannot really support lazy evaluation in a fully automatic way. That is, coding an operation so that it works with infinite data structures usually requires additional thought, and sometimes special code will be needed to recognize futures in the input and handle them accordingly. This can be hard, but of course in the case of the prelude operations this work has already been done for you, so as long as you stick to these, you'll never have to think about these issues. (It should be noted here that lazy evaluation has its pitfalls even in fully lazy FPLs, such as hidden memory leaks and other kinds of subtle inefficiencies or non-termination issues resulting from definitions being too lazy or not lazy enough. You can read about that in any good textbook on Haskell.)
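For instance, here is the classic infinite stream of ones, with & turning the tail of the list into a future that is only forced on demand (take and list are standard prelude operations):

```pure
// An infinite stream: the tail is a future, evaluated only when needed.
ones = 1 : ones&;

// Only the first five elements are ever forced; 'list' converts the
// resulting (finite) stream back to an ordinary list.
first5 = list $ take 5 ones;   // [1,1,1,1,1]
```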
The prelude goes to great lengths to implement all standard list operations in a way that properly deals with streams (a.k.a. lazy lists). What this all boils down to is that all list operations which can reasonably be expected to operate in a lazy way on streams, will do so. (Exceptions are inherently eager operations such as #, reverse and foldl.) Only those portions of an input stream will be traversed which are strictly required to produce the result. For most purposes, this works just like in fully lazy FPLs such as Haskell. However, there are some notable differences:
Pure versions since 0.12 offer some basic reflection capabilities via the evalcmd primitive. This function provides access to interactive commands like clear, save and show, which enable you to inspect and modify the running program. The only "canonical" way to represent an entire Pure program in Pure itself is the program text, hence evalcmd only provides a textual interface at this time. But of course custom higher-level representations can be built on top of that, similar to those discussed in section The Quote.
Here's an example showing what can be done using the show command and a little bit of trivial text processing. The following sym_info function retrieves information about a given collection of global symbols in a way which can be processed in a Pure program. The cat argument can be any combination of the letters "c", "v", "f" and "m" denoting the categories of constants, variables, functions and macros, respectively. (You can also just leave this empty if you don't care about the type of symbol.) The pat argument is a shell-like glob pattern for the name of symbols which should be listed (just "*" matches all symbols). The result is a list of tuples (name, value, cat, descr) with the name of the symbol and its value, as well as the category and description of the symbol, as provided by show -s.
using system;

sym_info cat::string pat::string =
  [name,eval ("("+name+")"),descr | name,descr = info] when
    // Get the info about matching symbols from the 'show' command.
    info = evalcmd $ sprintf "show -sg%s %s" (cat,pat);
    // Split into lines.
    info = if null info then [""] else split "\n" $ init info;
    // Get rid of the last line with the summary information.
    info = init info;
    // Retrieve the information that we need.
    info = [x | x@(s,_) = map fields info;
            // Get rid of extra lines containing extern and fixity
            // declarations.
            s ~= "extern" && s ~= "nonfix" && s ~= "outfix" &&
            s ~= "prefix" && s ~= "postfix" && ~fnmatch "infix*" s 0];
  end with
    // Regex call to split the summary information about one symbol, as
    // returned by 'show -s', into the name and description parts.
    fields s::string = tuple $
      [info!2 | info = tail $ regs $ reg_info $
                regex "([^ ]+)[ ]+([a-z]*)[ ]*(.*)" REG_EXTENDED s 0];
  end;
E.g., this call retrieves information about all defined macros:
> sym_info "m" "*";
[("$",($),"mac","2 args, 1 rules"),(".",(.),"mac","3 args, 1 rules"),
 ("void",void,"mac","1 args, 6 rules")]
Macro hygiene is a somewhat esoteric topic for most programmers, so let us take a brief look at what it's all about. The problem avoided by hygienic macros is that of name capture. There are actually two kinds of name capture which may occur in unhygienic macro systems:
Pure's hygienic macros avoid both pitfalls. Here is an example for the first form of name capture:
> def G x = x+y;
> G 10 when y = 99 end;
10+y
Note that the expansion of the G macro correctly uses the global instance of y, even though y is locally defined in the context of the macro call. (Sometimes this form of name capture is actually used deliberately in order to make the macro use the binding of the symbol which is active at the point of the macro call. This never works in Pure, hence in such cases you will have to explicitly pass such symbols to the macro.)
In contrast, the second form of name capture is usually not intended, and is therefore more dangerous. Consider the following example:
> def F x = x+y when y = x+1 end;
> F y;
y+(y+1)
Pure again gives the correct result here. You'd have to be worried if you got (y+1)+(y+1) instead, which would result from the literal expansion y+y when y = y+1 end, where the (free) variable y passed to F gets captured by the local binding of y. In fact, that's exactly what you get with C macros:
#define F(x) { int y = x+1; return x+y; }
Here F(y) expands to { int y = y+1; return y+y; } which is usually not what you want.
There is also one Pure-related caveat here. The expression printer currently doesn't check for different bindings of the same variable identifier when it prints a (compile time) expression. E.g.:
> foo y = F y;
> show foo
foo y = y+y when y = y+1 end;
This looks as if y got captured, but in fact it isn't; it's just the show command displaying the definition in a misleading way. You can add the -e option to show, which prints the de Bruijn indices of locally bound symbols; then you can see that the actual bindings are all right after all (note that the number before the colon is the de Bruijn index, and the sequence of bits behind it is the subterm path):
> show -e foo
foo y/*0:1*/ = y/*1:1*/+y/*0:*/ when y/*0:*/ = y/*0:1*/+1 end;
Alas, this means that if you use dump to write such a definition to a text file and read it back with run later, then you'll get the wrong definition. Currently you will have to correct this manually.
Pure programs may need a considerable amount of stack space to handle recursive function and macro calls, and the interpreter itself also takes its toll. So you should configure your system accordingly (8 MB of stack space is recommended for 32 bit systems, systems with 64 bit pointers probably need more). If the PURE_STACK environment variable is defined, the interpreter performs advisory stack checks and raises a Pure exception if the current stack size exceeds the given limit. The value of PURE_STACK should be the maximum stack size in kilobytes. Please note that this is only an advisory limit which does not change the program's physical stack size. Your operating system should supply you with a command such as ulimit(1) to set the real process stack size. Also note that this feature isn't 100% foolproof yet, since for performance reasons the stack will be checked only on certain occasions, such as entry into a global function.
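For instance, on a Unix-like system you might combine the real and the advisory limit as follows (the concrete sizes and the script name are just examples):

```shell
# Real process stack limit: 8 MB, set with the shell's ulimit builtin.
ulimit -s 8192
# Advisory limit checked by the interpreter: raise a Pure exception
# once roughly 6 MB of stack space have been used up.
export PURE_STACK=6144
pure myscript.pure
```

Keeping the advisory limit somewhat below the real limit gives the interpreter a chance to unwind and report a catchable exception before the operating system kills the process.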
Fortunately, Pure normally does proper tail calls (if LLVM provides that feature on the platform at hand), so most tail-recursive definitions should work fine in limited stack space. For instance, the following little program will loop forever if your platform supports the required optimizations:
loop = loop;
loop;
This also works if your definition involves function parameters, guards and multiple equations, of course. Moreover, conditional expressions (if-then-else) are tail-recursive in both branches, and the sequence operator $$ is tail-recursive in its second operand. Note, however, that the logical operators && and || are not tail-recursive in Pure, because they are required to always yield a proper truth value (0 or 1), which wouldn't be possible with tail call semantics. (The rationale behind this design decision is that it allows the compiler to generate much better code for logical expressions.)
There is one additional restriction in the current implementation, namely that a tail call will be eliminated only if the call is done directly, i.e., through an explicit call, not through a (global or local) function variable. Otherwise the call will be handled by the runtime system which is written in C and can't do proper tail calls because C can't (at least not in a portable way). This also affects mutually recursive global function calls, since there the calls are handled in an indirect way, too, through an anonymous global variable. (This is done so that a global function definition can be changed at any time during an interactive session, without having to recompile the entire program.) However, mutual tail recursion does work with local functions, so it's easy to work around this limitation.
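The workaround can be sketched as follows (hypothetical function names): mutual tail calls between local functions are direct calls and thus get eliminated, so the following runs in constant stack space.

```pure
// Mutually tail-recursive *local* functions; the calls between
// 'even_loop' and 'odd_loop' are direct, so they count as tail calls.
count n = even_loop n with
  even_loop n = if n<=0 then 0 else odd_loop (n-1);
  odd_loop  n = if n<=0 then 1 else even_loop (n-1);
end;
```

The same two equations as global definitions would call each other through the runtime and could eventually run out of stack space for large n.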
Also note that tail call optimization is always disabled if the debugger is enabled (-g). This makes it much easier to debug programs, but of course this means that you should always run your program without -g when the debugger is not needed.
As described in section Exception Handling, signals delivered to the process can be caught and handled with Pure's exception handling facilities. Like stack checks, checks for pending signals are only performed at certain places, such as entry into a global function. This doesn't include tail calls, however, so a busy loop like loop = loop; will never be interrupted. To work around this, just add a call to another global function to your loop to make it interruptible. For instance:
loop = check $$ loop;
check = ();
To handle signals while the above loop is executing, you can add an exception handler like the following:
loop = catch handle check $$ loop with
  handle (signal k) = catch handle (...)
end;
(Note the catch handle around the signal processing code which is needed for safety because another signal may arrive while the signal handler is being executed.)
Of course, in a real application the check function would most likely have to do some actual processing, too. In that case you'd probably want the loop function to carry around some "state" argument to be processed by the check routine, which then returns an updated state value for the next iteration. This can be implemented as follows:
loop x = loop (catch handle (check x)) with
  handle (signal k) = catch handle (...)
end;
check x = ...;
Pure is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Pure is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License V3 for details. (You can also find the GPL in the COPYING file accompanying the software.)
Albert Gräf <Dr.Graef@t-online.de>, Dept. of Computer Music, Johannes Gutenberg University of Mainz, Germany.
The author gratefully acknowledges the contributions by Scott E. Dillard, Rooslan S. Khayrov, Eddie Rucker, Libor Spacek and Jiri Spitz, as well as Toni Graffy, Michel Salim and Ryan Schmidt who maintain the SUSE Linux, Fedora Core and OSX packages, respectively. Thanks are also due to Vili Aapro, Alvaro Castro Castilla, John Cowan, Chris Double, Tim Haynes, Roman Neuhauser, Wm Leler, John Lunney and Max Wolf.