SPL Language Reference Manual
=============================

The SPL sources include many example programs. E.g.:

	example/example*.spl
	hanoi.spl

SPL scripts can be executed with the "splrun" command. WebSPL scripts (*.webspl) can be executed using "webspl.cgi" or "webspld" (see README file or the 1st chapter in the big SPL manual).


Basics
------

SPL is a C-like language (such as Java or PHP). Constructs such as "if" statements and "while" loops work as in those other languages. I am not explaining the basic language constructs derived from C in this manual. Please have a look at http://en.wikipedia.org/wiki/C_syntax if you are not familiar with what the basic syntax of C (and other C-like languages) looks like.

SPL is a dynamically typed language. So don't wonder how you can define a variable as "int", "float" or "array of objects". Just use the variables as "int", "float" or "array of objects" and they will be.

However, the operators can have types:

	+  -  *  /  %  **
		Dynamically typed addition, subtraction, multiplication, division, modulo and power operators.

	#+  #-  #*  #/  #%  #**
		Integer addition, subtraction, multiplication, division, modulo and power operators.

	.+  .-  .*  ./  .%  .**
		Floating point addition, subtraction, multiplication, division, modulo and power operators.

	==  !=  <=  <  >  >=
		Dynamically typed comparison operators

	#==  #!=  #<=  #<  #>  #>=
		Integer comparison

	.==  .!=  .<=  .<  .>  .>=
		Floating point comparison

	~==  ~!=  ~<=  ~<  ~>  ~>=
		String comparison

	!  ||  &&
		Logical NOT, OR and AND (the keywords 'not', 'or' and 'and' are also available and do exactly the same thing)

	~
		String concatenation

	.( ... )
		Cast the expression in the parentheses to a floating point value.

	#( ... )
		Cast the expression in the parentheses to an integer value.

The dynamically typed operators are dynamically mapped to the integer, floating point or object operators (the object operators are described in a separate section below). But they are never mapped to string operators. So if you want to e.g.
compare two strings you must always use the '~==' operator.

SPL also has support for the usual combined operation-assign operators such as '+=' (or '~=' for appending to a string). The famous '++' and '--' operators (pre- and postfix) also exist in SPL.

The operator '<=>' can be used to swap the values of two variables.

Have a look at the 'hanoi.spl' program in this directory. It is a nice example of a simple SPL program.


Generating Debug Output
-----------------------

The keyword 'debug' may be used to create debug output. The debug message itself is simply passed as argument. Since this is a language keyword (and not a function), no parentheses are required for the argument:

	debug "Hello World!";

The way the debug messages are displayed depends on the host application embedding the SPL runtime. They may be printed to the console, displayed in popup windows, written to logfiles or simply be ignored.

The keyword 'warning' works the same way as 'debug', but also includes a stack backtrace in the message. The keyword 'panic' does the same and additionally terminates the program execution.


Variables
---------

As already mentioned, SPL is entirely typeless. There is just the abstract construct of variables. A variable can represent anything: a simple scalar value, an array or hash, a function or an object.

Variables are declared using the "var" keyword:

	var foobar;

Values can be assigned using the '=' operator:

	foobar = 42;

You can also use the variable as array or hash by appending '[ .. ]' to the variable name (as you might already be used to from other languages):

	foobar[42] = 23;

But more about that in the section about arrays and hashes.

The keyword 'declared' can be used to check if a variable has been declared. The keyword 'defined' can be used to check if a declared variable has a value assigned. The keyword 'undef' represents an undefined value and can e.g.
be used to remove the value of a variable:

	var x;

	if (declared x)
		debug "Variable x has been declared.";

	if (defined x)
		debug "This is never reached.";

	x = 42;

	if (defined x)
		debug "Now x is declared and defined.";

	x = undef;

	if (not defined x)
		debug "Now x is not defined again.";

The keyword 'lengthof' can be used to get the length (in characters) of the string representation of a variable:

	debug lengthof "Hello World";

Using an undeclared variable (outside a 'declared' test) causes a runtime error.


Functions
---------

Functions can be declared with the "function" keyword, such as:

	function hypot(x, y)
	{
		return ( x.**2 .+ y.**2 ) .** (1./2);
	}

They may be called by simply using the function name and passing the arguments in parentheses:

	var h = hypot(5, 12);

Note that the parentheses are required when calling functions in SPL. However, there are some language keywords with function-like behavior. They don't have parentheses around their arguments because they are not functions but compiler keywords:

	debug, warning, panic, delete, import, load, new, return, defined, declared, pop, shift, push, unshift, throw

Note that functions can be copied like any other variable. So it is e.g. possible to simply pass a callback function as parameter to another function without any syntactical black magic.

It is also possible to define anonymous functions. In this case the function definition evaluates to the function pointer and can e.g. be assigned to a variable.
The syntax is the same as for normal function declarations, but the function name is skipped: var hypot = function(x, y) { return ( x.**2 .+ y.**2 ) .** (1./2); }; This is especially useful when functions need to be passed as arguments to other functions: array_sort_by_values(myarray, function(a,b) { return a < b; }); The keyword declared can be used to check if a function is actually available: if (declared hypot) hypot(3,4); All functions that are built into the SPL language are C functions which are usually registered at startup. Such functions are also called builtins. To see if a registered C function is available, declared_builtin can be used: if (declared_builtin("write")) debug "the function write is available"; Using the keyword declared for C functions does not work because declared only checks for SPL variables. Likewise, declared_builtin does not work for SPL functions or variables. SPL can easily be extended with additional C functions. For this purpose, modules can be created and loaded in SPL. It is also possible to use the SPL virtual machine in a C program where C functions can be registered to the VM (virtual machine). For details on modules, see the section on Loadable Modules. To learn how to use the SPL VM in C programs, see the C API Documentation in README.API or at the end of the manual. Array Operations ---------------- In SPL, there is no difference between arrays, hashes and other complex data structures like objects. A variable simply may have child variables. The children of a variable may be accessed using the dot-operator or brackets: foo.bar = 42; foo["bar"] = 42; the only difference between the two methods is that the name of the child variables is limited to the lexical rules for identifiers (the regular expression /[a-zA-Z_][a-zA-Z0-9_]*/) and that child variables must be explicitly defined with the 'var' keyword when accessed using the dot operator. 
It is possible to iterate over all children of a variable with the 'foreach' statement: foreach index (array) debug array[index]; where the 'index' variable is automatically defined by the foreach statement and only valid for the loop body. A foreach loop always returns the elements in the same order in which they have been created (except for elements added using the 'unshift' statement). The foreach loop usually iterates over the indexes of an array. It is also possible to iterate over the values using the foreach[] loop. So this is identical to the previous example: foreach[] value (array) debug value; There are four special variables which only exist within foreach loops: $[ .. is set to '1' when this is the first element and to '0' otherwise $] .. is set to '1' when this is the last element and to '0' otherwise $# .. the current index (not the value pointed to by the index) $@ .. the array which is processed by the foreach loop Arrays may be defined in-place using square brackets. There are two methods of doing so: With automatically assigned and with explicitly declared keys. When declaring the keys, the '=>' operator can be used when the key is specified as value and the ':' operator when the key is specified like a variable name. It is also possible to mix those two methods: var a1 = [ "a", "b", "c" ]; var a2 = [ "x" => "a", "y" => "b", z: "c" ]; var a3 = [ x: "a", 100 => "b", "c" ]; has the same effect as: var a1, a2, a3; a1[0] = "a"; a1[1] = "b"; a1[2] = "c"; a2["x"] = "a"; a2["y"] = "b"; a2["z"] = "c"; a3["x"] = "a"; a3[100] = "b"; a3[101] = "c"; So, when adding elements to an array without explicitly defining a key, the highest numeric key defined in the array will be incremented by one to build the key for the new element. There are two other instructions which can be used to add elements to an array: push array, 23; unshift array, 42; The push instruction adds an element to the end of the array, unshift to the beginning. 
The difference between these instructions is in the position the new element will have in the array, not in the key which is assigned to it. The key is the next integer value in both cases. There are also instructions for removing the first and last entry from an array: var last = pop array; var first = shift array; But it is also possible to simply remove elements by using the key: delete array[42]; Besides the 'foreach' loop, it is also possible to manually 'walk through' an array using the 'next' and 'prev' instructions: var a = [ 3 => 'x', 5 => 'y', 7 => 'z' ]; var three = next a, undef; var five = prev a, 7; var seven = next a, 5; They return the next or previous index value in the array passed as 1st parameter, relative to the index passed as 2nd parameter. If the 2nd parameter is undef, 'next' returns the first and 'prev' the last index. If there is no next or previous element, undef is returned. The keyword 'elementsof' can be used to get the number of elements in an array: debug elementsof [ 1, 2, 3, 4, 5, 6 ]; Additional functions for working with arrays (such as sorting) are provided by the "array" module which is described in SPLDOC (there are separate sections about SPL Modules and SPLDOC later in this document). Vaargs like arguments --------------------- SPL also supports 'vaargs' like argument passing. The last argument in a function declaration may be prefixed with an '@' character. In that case the variable representing this argument will be an array of all remaining arguments. If an '@' character is put in front of a function argument when calling a function, that argument is interpreted as an array and an argument for each array element will be inserted. E.g.: function foo(@args) { foreach i (args) debug args[i]; } function bar(@args) { foo("one", @args, "four"); } bar("two", "three"); This is useful for writing wrapper functions and functions with a variable number of arguments. 
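As an illustration of the wrapper use case, here is a minimal sketch that uses only constructs described in this manual. The function names 'sum_all' and 'logged_sum' are made up for this example and are not part of SPL:

```spl
// Collects all arguments in the array 'values' and adds them up.
function sum_all(@values)
{
	var total = 0;
	foreach[] v (values)
		total += v;
	return total;
}

// Wrapper: reports how many arguments it received, then expands
// the array again into individual arguments for sum_all().
function logged_sum(@values)
{
	debug "summing ${elementsof values} values";
	return sum_all(@values);
}

debug logged_sum(1, 2, 3);
```

The wrapper does not need to know how many arguments the wrapped function takes, which is the main reason to use the '@' prefix on both the declaration and the call.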
Named Function Arguments
------------------------

SPL also supports named arguments (aka options). With these arguments, it is not the position in the function call but the assigned name that identifies an argument. All named parameters are assigned to a hash which is declared with a '%' prefix in the function prototype. The '%' prefix may also be used when calling a function to pass all elements of a hash as named arguments. E.g.:

	function foo(%args)
	{
		foreach i (args)
			debug "$i -> ${args[i]}";
	}

	function bar(%args)
	{
		foo(one: 1, %args, four: 4);
	}

	bar(two: 2, three: 3);

This is e.g. useful for function arguments which are rarely used.

When a named option is specified multiple times, the leftmost specification is used. That way it is possible to pass a hash with default values as last parameter using the '%' prefix.


Objects
-------

If you have no idea what object oriented programming is, read http://en.wikipedia.org/wiki/Object_Oriented_Programming first.

SPL does not really know about the difference between classes and objects. If only one instance of a class/object is needed, it is possible to simply use the name defined with the 'object' keyword. (But if you do so, no constructor will be executed.) Because of this, we don't speak about classes and objects in SPL, but of objects and instances of objects.

Objects can be defined in SPL like this:

	object Foo
	{
		var counter;

		method increment_counter()
		{
			return counter++;
		}

		method init(start_value)
		{
			counter = start_value;
			return this;
		}
	}

The method 'init' has a special meaning: It is the constructor. There are no destructors in SPL, but it is common to create a method named 'destroy' for the job and call it manually when the object isn't needed anymore.

If there is no init, a default init method is provided that returns this. If a custom init method is used, special care must be taken to return this at the end, as shown in the example above.
If this is not returned, creating a new object with the new operator will return undef. The constructor must return a self pointer. The keyword 'this' can be used to create such a pointer. Thus the "return this;" in the example above. If the new object is derived from another one, the parent object is specified right after the object name in the object definition: object Bar Foo { method decrement_counter() { return counter--; } } Objects may be instantiated using the 'new' operator. If any arguments are specified when instantiating an object, they are passed through to the object constructor as they are: var mycounter = new Foo(42); SPL does support nested functions as well as nested objects. So it is possible to define 'local objects' in your functions. It is also possible to derive one object from more than one parent. This can be done using the 'import' statement: object Foobar Foo { import Bar; [...] } But the parent objects which are 'imported' neither show up in the object derivation path of the object, nor in its reflection string. When an object or one of its instances is used as scalar variable, they evaluate to their reflection strings. E.g.: object A { } object B A { } var x = new B(); debug A; // prints: SPL Debug: A debug B; // prints: SPL Debug: A | B debug x; // prints: SPL Debug: [ A | B ] Objects and instances are automatically created as references. So they can be passed as arguments and being returned from functions without implicitly being copied by those operations. If a method is overloaded in a child object and the implementation from the parent object needs to be called in the child object, this can be done using the context dispatching operator '*': object A { method foo() { debug "Now in foo from A."; } } object B A { method foo() { debug "Now in foo from B."; *A.foo(); } } Without the '*', A.foo() would be called in the context of A and not in the context of B (or the currently active instance of B). 
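The pieces of this section can be combined as in the following minimal sketch. The object names 'Counter' and 'LoudCounter' and the method 'step' are invented for this example. It only relies on the constructor rule, derivation, the 'new' operator and the context dispatching operator as described above:

```spl
object Counter
{
	var value;

	method step()
	{
		return ++value;
	}

	method init(start_value)
	{
		value = start_value;
		return this;	// without this, 'new' would return undef
	}
}

// LoudCounter is derived from Counter and overloads step().
object LoudCounter Counter
{
	method step()
	{
		debug "stepping";
		// Run the parent implementation in the context of the
		// current instance, not in the context of Counter itself.
		return *Counter.step();
	}
}

var c = new LoudCounter(41);
debug c.step();
```

LoudCounter defines no constructor of its own here; the sketch assumes that the 'init' method is inherited like any other method.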
The keyword 'static' can be used to declare static object variables. A static object variable is shared between all objects in the derivation path and all instances of these objects. The declaration works exactly the same way as for declaring normal variables in objects, just that the keyword 'static' is used instead of the keyword 'var'.

There is nothing like 'private functions' in SPL. All methods (and variables) are public. If some are intended for internal use only, they must be protected by choosing a name which avoids collisions. There is a separate section on the recommended naming conventions later in this document.


Object Operators
----------------

There is no operator overloading support in SPL. Instead, there are object operators (just as there are separate integer and floating point operators). Whenever such an object operator is used, an "operator_*" method is called on the left operand and both operands are passed as parameters. The method name is different for each object operator:

	(+)  and (+)=   are calling the method operator_add(a, b)
	(-)  and (-)=   are calling the method operator_sub(a, b)
	(*)  and (*)=   are calling the method operator_mul(a, b)
	(/)  and (/)=   are calling the method operator_div(a, b)
	(%)  and (%)=   are calling the method operator_mod(a, b)
	(**) and (**)=  are calling the method operator_pow(a, b)

	(<)   is calling the method operator_lt(a, b)
	(>)   is calling the method operator_gt(a, b)
	(<=)  is calling the method operator_le(a, b)
	(>=)  is calling the method operator_ge(a, b)
	(==)  is calling the method operator_eq(a, b)
	(!=)  is calling the method operator_ne(a, b)

The operator precedence is the same as for the normal integer and floating point operators.

The generic dynamically typed operators (+, -, *, etc.) are automatically mapped to the object operators when used with objects.


Exceptions
----------

SPL has an exception handling mechanism for error reporting and error handling. The exceptions themselves are objects.
In fact, every object could be used as an exception. But it is recommended to only use objects as exceptions which have been designed for this purpose. Usually the names of such exception objects end with the "Ex" postfix. E.g. all exceptions thrown by the SQL modules are of the type "SqlEx".

Exceptions may be thrown using the keyword 'throw' and are caught in 'try' blocks. Here is a small example:

	object MyEx { }
	object MySpecialEx MyEx { }

	try (x) {
		throw new MySpecialEx();
		debug "** this line is never reached **";

	catch MyEx:
		debug "Got a 'MyEx' exception. The backtrace is:\n" ~ x.backtrace;
	}

If no 'catch' rule matches the exception object or any of its parent objects, a runtime error is printed and the execution of the program is terminated.

The 'throw' instruction automatically adds the 'backtrace' variable to the exception object. This variable contains a human-readable backtrace of the current task.

If the exception object has a 'description' variable, its text value will be printed as part of the runtime error for uncaught exceptions.

It is considered good coding style to only use exceptions for error handling.


Strings and Here-Documents
--------------------------

Strings can be quoted using double (") or single (') quotes. The only difference between the two quoting styles is that within double quotes it is possible to use single quotes unescaped and vice versa.

Strings can be concatenated using the ~ operator. But a list of string constants is concatenated automatically. So these three code snippets are identical:

	debug "hello world";
	debug "hello" ~ " " ~ "world";
	debug "hello" " " "world";

The usual backslash escape sequences (such as '\n' for newlines) can be used in both quoting styles.

Additionally, there are some special substitution sequences which all start with a dollar sign ($). These sequences are described in the next section.
The section after it describes so-called templates, an even more advanced way of handling strings in SPL programs. Another way of quoting strings in SPL are so-called Here-Documents. They work a little bit differently in SPL compared to other languages such as Perl. An example: debug <>' instead of '<<' is used, the Here-Document is literal and no substitutions are performed. The '<>Token') may be followed by any special character, which is then used as so-called indenting character. All characters from the beginning of a line in the Here-Document until the first occurrence of that indenting character are ignored: debug <>>' does the same, but without processing any special substitutions. Special Substitutions in Strings -------------------------------- SPL has support for some special substitution operators in strings and templates (the latter are described in the next section). All these substitution operations start with the dollar sign ($): $variable Will be substituted by the current value of the variable. ${expression} Will be substituted by the value of the expression. This can be everything - including a really complex program code snippet. $(function) This is an embedded function. It will be executed and the return value substituted. E.g.: $( if (should_insert_foo()) return "Foo"; return "Bar"; ) $ Insert results from a regular expression (by name or numeric ID). The brackets can be left away for single digit IDs, $- and $+. $[ comment ] This is a comment. It is always substituted with an empty string. $$, $:, $? Just a "$", ":" or "?" respectively. (for escaping $.., and are terminated by . It is possible to use an empty string as token too, so <> ... also works fine. Within templates, all the $ substitutions are supported. In addition to them, there are also some special XML-like tags allowed in templates: ... A comment. Everything between the tags will be ignored. ... 
The SPL expression passed using the "code" attribute will be evaluated and the content of the tag will only be embedded if the result is true. ... This tag must directly follow a tag. Its content will only be embedded if the content of the tag has not been embedded. ... The content of this tag will be evaluated as embedded function. The return value (if any) is substituted. (like "$( ... )") ... The same as above, but the SPL program code is passed using the "code" attribute. The content of the tag is available through a variable with the name "_data_" when executing the code. This is e.g. useful if you want to pass a part of your document through an encoding function: This is xml encoded: < & > The encoding operator (::) is described later. ... The tag is evaluated like an embedded function. So e.g. variables declared in an tag is local to that tag. The tag is evaluated in the same context as the template text itself. So it can be used to e.g. declare functions or variables used in the rest of the template. Do not use the 'return' keyword in an tag unless you really want to return from the the current function context. A tag is always substituted with an empty string. ... This tag is always substituted with an empty string, but the variable named in the "var" attribute will be declared and set to the content of the tag. ... As above, but the value will again be passed through the specified SPL expression and is available as "_data_" in the expression. ... Builds a foreach instruction using the variable name passed in the "var" attribute as iterator and the list passed in the "list" attribute as list to iterate over. If the behavior of a foreach[] loop is wanted, the variable name in the "var" attribute must be prefixed with "[]". ... Localize the tag body (see "Internationalization and Localization" below). Localization domains can be passed by using the syntax .... Note that no other special tags are allowed inside of this tag. ... 
The tag body is parsed using the specified indenting character. Here is a small example of what can be done using SPL Templates. It selects messages from a database and creates HTML code for showing them in java script popup windows. SELECT message, timestamp FROM message_table WHERE user_id = ${sql::userid} AND urgent = 1 SPL also has support for PHP-like code snippets in templates using '' tags. The main difference to the and tags is that it is possible to let SPL code blocks and template blocks overlap with this syntax. E.g.: First Line word #$i is '$word' Last Line The and tags would require the '{' .. '}' block to be closed in the same code block in which it has been opened. Templates can also be started with <:Token>, in which case the ':' is used as indenting character. This does not change the termination string for the token (it is still ). Other indenting characters than ':' are not supported for inline templates (except they are introduced using the tag). As with the substitutions, you should consider wisely how to use SPL Templates in your programs because it is also possible to use them to produce obfuscated code. The and Template Tags ------------------------------------------- In addition to the template tags described above there is also support for and template tags. The tags are transformed to calls to splcall_*() functions. The tag body is transformed to a function which evaluates it and this function is passed as argument to the splcall_*() function. The return value of the splcall_*() is substituted. Example given: function splcall_toupper(textfunc) { var text = textfunc(); text =~ e/[a-z]/g chr(ord($0)+ord('A')-ord('a')); return text; } debug <>This is very important!; The tag attributes are passed as named parameters to the splcall_*() functions. When the attributes are quoted with ' or " they are interpreted as strings and when they are quoted with ( .. ) they are interpreted as SPL expressions. 
It is also possible to ommit the tag body by terminating the tag with '/>'. Example given: function splcall_getuser(%args) { return sql_value(db, "SELECT username " "FROM user WHERE id = $args.id"); } debug <>User $userid: ; The tags work exactly like the tags, but instead of passing the tag body as argument and substituting the return value the tag body is only evaluated and included in the template output when the return value of the splif_*() function returns true. It is not possible to ommit the tag body of tag. The tag can also be used with an tag. Example given: function splif_adminuser(%args) { return sql_value(db, "SELECT isadmin " "FROM user WHERE id = $args.id"); } debug <:> : Data: $data_string : Secret Data: $secret_data_string : Secret Data: ** admins only ** ; The and tags don't really add any functionality which are not already provided by the tags. But they can be used to simplify writing SPL template files a lot, especially if the person writing the template files is not a programmer. Loadable Modules ---------------- There are two different types of loadable modules: SPL byte code modules and machine byte code modules. Machine byte code modules are *.so files (or *.dll on Win32 host). The README.API file (and the last chapter in the manual) describes how to write such modules in C. SPL byte code modules are written in SPL and compiled using the "splrun" command line tool. Both types of modules are loaded using the 'load' instruction. E.g.: load "sql"; When executing such an instruction, the SPL virtual machine is first looking for a machine byte code module with such a name in the module search path and then for SPL byte code modules. The module files are named "mod_.splb" (or "mod_.so" or "mod_.dll" respectively). Each module can only be loaded once. Additional 'load' instructions for an already loaded module are ignored. 
A module is written like any other SPL program, but should only declare variables, functions and objects and should not execute any real code (or at least limit that to some simple initializations). SPL source files can be compiled to byte code files like this: ./splrun -N -e -x mod_foobar.splb mod_foobar.spl The -e is optional and instructs the compiler to include debug symbols in the byte code file. Running 'splrun' without any parameters prints out the full list of available command line options. SPLDOC Comments --------------- Modules should have SPLDOC Comments in their source, to make it possible to create API references for them directly from the source code with the 'spldoc.spl' tool. Running 'make spldoc' in the SPL source dir creates the documentation to all modules which are included in the SPL distribution. The generated API references are written to the "spldoc/" directory. They can also be found on the SPL web page. SPLDOC comments look like this: /** * This is an SPLDOC comment for the foobar function */ function foobar(); So they start with "/**" on a line on its own. Then comes the description text. If the first character in the line is an asterisk (*), it will be ignored. The SPLDOC comment ends with '*/' and right in the next line (a blank line is not allowed here) comes the commented function prototype (or variable declaration, etc). The first SPLDOC comment in a module source file is special: it describes the module itself (so the first line after the comment has no special meaning in this case). In order to be parse-able by SPLDOC, a source file must be "well formatted". That means, the functions, variables, etc. which are not an object member must be declared (and commented) first, then the objects. It is important that not only the object members, but also the object itself is commented. Otherwise SPLDOC wouldn't 'see' the object and the documentation wouldn't match the module. 
In an object, you should always define static variables first, then dynamic variables, then functions and then methods. Everything which is not an object member must be defined before the first object and should be defined in the same order (first variables, then functions). Have a look at "spl_modules/mod_wsf.spl" and "spldoc/wsf.html" (which is created by 'make spldoc') for a good example. Naming Conventions ------------------ SPL has a flat root name space. So it is important to have some naming conventions to avoid collisions in this flat name space. The module names themselves are using a kind of hierarchy. E.g. there is a module called "wsf" and a module "wsf_dialog" based on it. In the module "wsf" all object names start with the prefix "Wsf" and all functions and global variables start with "wsf_". In the "wsf_dialog" module, the object prefix is "WsfDialog" and the function/variable prefix is "wsf_dialog_". A few more words about object names. Each part of the object which start a new 'logical block' (and not each word!) should start with a capital letter. E.g.: object Myobject { } object MyobjectFoobar Myobject { } or: object MyobjectBase { } object MyobjectFoobar MyobjectBase { } But _not_: object MyObject { } object MyObjectFooBar MyObject { } It's up to you to decide what a "logical block" is in your case. There are some additional naming conventions for exception objects: They all end with "Ex", but if e.g. exceptions are derived from each other (and some text is appended to the object name), it is appended before the "Ex" postfix. Private variables and functions of modules (which are not documented using SPLDOC and are for internal use in the module only) must begin with "__". Private methods of an object (which are not documented and should not be called from methods in derived objects) must begin with "__". 
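To make the conventions above concrete, here is a short sketch for a hypothetical module named "shop_cart" (assumed to be based on an equally hypothetical module "shop"). None of these names exist in the SPL distribution; they only illustrate the prefixes:

```spl
// Global variables and functions use the "shop_cart_" prefix.
var shop_cart_default_currency = "EUR";

// Private to the module: starts with "__" and has no SPLDOC comment.
var __shop_cart_cache;

function shop_cart_total(prices)
{
	var total = 0;
	foreach[] p (prices)
		total += p;
	return total;
}

// Objects use the "ShopCart" prefix. Exception objects end with "Ex";
// text added in a derived exception goes before the "Ex" postfix.
object ShopCartEx { }
object ShopCartEmptyEx ShopCartEx { }

object ShopCartItem
{
	// Private method: not to be called from derived objects.
	method __recalculate() { }
}
```

Whether "Cart" counts as its own logical block (and therefore gets a capital letter) is exactly the kind of decision the text above leaves to the module author.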
Regular Expressions
-------------------

If you have no idea what regular expressions are, read http://en.wikipedia.org/wiki/Regular_expressions first.

SPL is using the PCRE library for regular expression matching. So it is largely compatible with Perl regular expressions. If the PCRE library cannot be found by the SPL GNUmakefile, SPL is compiled without regular expression support and a runtime error is produced whenever regular expression instructions are executed.

The syntax for regular expression matching is similar to the Perl syntax:

	x =~ /foobar/;
	x =~ s/foo/bar/g;

Perl-like modifiers supported by SPL:

	i .. ignore case in pattern matching
	s .. dot metacharacters also match the newline character
	x .. ignore unescaped white spaces and allow comments using '#'
	m .. multi line matching, ^ and $ also match newline characters
	g .. match (and substitute) globally, not only the first match

Modifiers new in SPL:

	N .. include captured strings as child nodes in result, using numbers
	P .. include named captured strings (?P<name>...) in result, using names
	A .. add an array with an element per match (with 'g' modifier)
	R .. when substituting, return new text and keep original unmodified
	I .. declare named captured strings (?P<name>...) as local variables
	E .. store the text between the matches in $- (before) and $+ (after)
	L .. the $-variables have the data from the last match (with 'g' mod.)
	S .. return the text fragments between the matches (split mode)

The return value of '=~' is the number of matches found (except when the 'R' modifier is used). If the 'g' modifier isn't used, the return value may only be 0 or 1. With the modifiers 'N', 'P' and 'A', the result will also have child variables with additional data about the matches.

When the 'A' modifier is used without 'N' or 'P' the result is an array with the entire matches. With the 'N' or 'P' modifier the result is an array with the captures as child variables.
So splitting up an input file into lines can be done like this (the
file_read() function is provided by the "file" module):

    var lines = file_read("demo.txt") =~ /[^\n]*/Ag;

It is possible to declare names for capturing parentheses using the Python
syntax (?P<name>...). This is a great help when dealing with complex
regular expressions with many capturing parentheses.

Referring to the strings matched by a regular expression can be done by
using $N or the angle-bracket forms such as $<name> (in addition to
including them in the result value using the 'N', 'P' and 'A' modifiers).
The special variable $0 represents the whole matched text and is also
available if no capturing parentheses were present in the regular
expression.

These special variables are declared locally: they do not invalidate
regular expression results in any higher context. So it is safe to e.g. run
a regular expression, then call a function which also uses regular
expressions, and after that refer to the matches of the first regular
expression using these special variables.

The special modifier 'E' stores the text which has not matched in $-
(before the match) and $+ (after the match). In combination with 'N' or 'I'
there will also be ["-"] and ["+"] elements in the returned object. When
the 'E' modifier is used together with 'g', $- and $+ contain the text
snippets between the matches.

A very special variable is the variable $$. It is the local variable in
which the regular expression results are stored. So instead of writing $0
you could write $$[0], or $$.foobar instead of $<foobar>. This variable is
function local, but it is possible to write 'var $$;' in a local block to
create a separate set of regex result variables. It is also possible to
back up the value of $$ in another variable and restore it later on.
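
A minimal sketch of the special variables described above. The sample text
and the patterns are made up for illustration, and the sketch assumes that
a bare '{' ... '}' block is enough to hold its own 'var $$;' declaration.

```spl
var text = "Hello World";

if (text =~ /(\S+)\s+(\S+)/) {
    /* $0 is the whole match, $1 and $2 are the captures */
    debug "Whole match: $0";
    debug "Captures: $1 and $2";

    /* $$ holds the same results, so $$[0] is another way to write $0 */
    debug $$[0];

    {
        /* a separate set of result variables for this block only */
        var $$;
        text =~ /W(\S+)/;
        debug "Inner match: $0";
    }

    /* the results of the first match are still available here */
    debug "Outer match: $0";
}
```

Declaring 'var $$;' in the inner block is what keeps the second match from
overwriting the results of the first one.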
Here is a nice example for using complex regular expressions:

    var x = "foolish bigfoot";
    var r = x =~ /(?P<word>(?P<firstchar>\S)\S*)\s*/ALPg;

    foreach i (r) {
        var $$;
        r[i].word =~ s/foo(.*)/bar$1/;
        debug "Match #$i: [${r[i].firstchar}] ${r[i].word} ($0)";
    }

    var text1 = "Ever seen a ${r[0].word} $1?";
    var text2 = text1 =~ s/seen/being eaten by/R;

    debug text1;
    debug text2;

This script creates the following output:

    SPL Debug: Match #0: [f] barlish (foolish)
    SPL Debug: Match #1: [b] bigbart (foot)
    SPL Debug: Ever seen a barlish bigfoot?
    SPL Debug: Ever being eaten by a barlish bigfoot?

It is possible to use the import statement with $$. That has almost the
same effect as using the 'I' modifier, but gives you better control over
where the variables for the regular expression matches are declared:

    var text = "Hello World";

    if (text =~ /(?P<foo>\S+)\s+(?P<bar>\S+)/) {
        import $$;
        debug "$foo $bar";
    }

    if (declared foo)
        panic "This is never reached";

    if (text =~ /(?P<foo>\S+)\s+(?P<bar>\S+)/I) {
        debug "$foo $bar";
    }

    if (declared foo)
        debug "Foo is now defined here too.";

A full description of the regular expression syntax supported by PCRE (and
SPL) can be found in the "pcrepattern" man page.

Instead of the slash (/) as quoting character for regular expressions, it
is also possible to use colons (:), commas (,), exclamation marks (!),
percent signs (%) and the at-character (@).

It is also possible to use regular expression substitutions in which the
substitution text is re-evaluated for every match. But the syntax for this
is slightly different from Perl:

    var text = "Some ASCII Codes: A = #A, B = #B, C = #C, D = #D";
    debug text =~ e/#(.)/Rg ord($1);

For complex to-be-evaluated expressions it is required to put the
expression in parentheses. Note that not all of the modifiers listed above
are allowed for e// expressions and that the $-variables are not set by
e// expressions.


References
----------

Handling references (aka pointers) is easy in SPL.
It is done more or less automatically by the virtual machine: Simple
variables (such as scalars or functions) are always passed by value and
complex variables (arrays, objects and everything else with child
variables) are passed by reference.

At least that's what it looks like. In fact the SPL virtual machine
implements a complex copy-on-write behavior, but this is hidden in the
machine internals. The basic idea behind copy-on-write systems is discussed
at http://en.wikipedia.org/wiki/Copy_on_write

Sometimes it is necessary to create recursive copies of complex data
structures. This can be done by assigning the variables using the ':='
operator (instead of using the normal '=' operator).

It is possible to test if two variable names point to the same object using
the '^==' operator. Testing for not-equal is performed with the '^!='
operator.


The Quoting/Encoding Operator (::)
----------------------------------

There is a special operator in SPL for quoting and encoding text (::). In
fact it is nothing more than a simple function call:

    foo::bar

is identical to

    encode_foo(bar)

but in some cases easier to write and read. There are various modules which
provide encode_* functions. E.g. the "sql" module provides an
"encode_sql()".


Embedded Functions
------------------

Embedded functions are functions which are simply "inlined" in an
expression using the special parentheses "({" ... "})". E.g. this code
fragment:

    var x = 42;

    function foobar() {
        if (x == 42) return 23;
        if (x == 23) return 42;
        return 0;
    }

    debug foobar() + 23;

does the same thing as this code fragment:

    var x = 42;

    debug ({
        if (x == 42) return 23;
        if (x == 23) return 42;
        return 0;
    }) + 23;

The mechanism used here is the same as the one used for the $( ... )
substitution described above.


Gotos, break and continue
-------------------------

SPL has support for gotos.
The syntax is pretty much the same as in C:

    var i;

    for (i=0; i<42; i++) {
        if (i == 23)
            goto break_out;
    jump_back:;
    }

    if (i > 42)
        return;

    break_out:
    write("demo2: Now i is $i.\n");
    goto jump_back;

A goto label must always point to an instruction (as in ANSI C). That's why
there is a ';' after the "jump_back" label: The ';' adds an empty
instruction.

It is not possible to use gotos to jump from one function into another (or
from a function to the main program). Also note that statements such as
'var' are compiled to real virtual machine statements: if a goto is used to
jump over a 'var' statement, the variable won't be declared and any use of
that variable will result in a runtime error. So you should only use gotos
for things such as jumping out of loops.

SPL also has support for the 'break' and 'continue' statements. They are
internally implemented as gotos.


Switch Statements
-----------------

The switch statements are different in SPL compared to languages such as C.
An SPL switch statement actually is nothing else than a series of
"if .. else if" statements with a different syntax:

    var list = [ 1, 2, 3, 4 ];

    while (1)
    switch {
        var x = shift list;
        case x == 1:
            debug "x is 1";
        case x == 2:
            debug "x is 2";
        case x == 3:
            debug "x is 3";
        default:
            debug "whatever!";
            exit;
    }

The code before the first 'case' statement is always executed. This code
block can be used to declare variables which are only used in the switch
block.

Each case block comes with a condition which defines if that block should
be executed. The first case block with a true condition will be executed.

Note that the SPL runtime does not optimize switch statements (e.g. using
lookup tables). So a huge list of cases is better implemented using a hash
with function references.


Compiler Pragmas and Preprocessor Statements
--------------------------------------------

There are some compiler pragmas and preprocessor statements for the SPL
compiler.
First of all there are four statements for including external files at
compile time. The compiler must be able (allowed) to read files when the
pragmas for processing files are executed. This is for example not the case
for the small code snippets which are compiled by the various eval
implementations provided by some modules.

The file include statements are:

    #file-as-const Filename
    #file-as-code Filename
    #file-as-template Filename
    #file-as-bytecode Filename

All four statements include an external file. The first includes a file as
a string constant with no additional processing. The second just continues
compilation in the specified file (and returns when the end-of-file is
reached). The third includes the file as a template. This is pretty similar
to #file-as-const, but $-substitutions and tags are evaluated (see
"Templates" above). The fourth includes (i.e. 'links in') an already
precompiled SPL bytecode file.

If the filename is prefixed with an asterisk character (*), the file is
interpreted as a so-called embedded file. Embedded files must be declared
in the same SPL program file in which they are referred to and are similar
to the Perl __DATA__ construct. Embedded files are declared (a little bit
like Here-Documents) using:

    #embedded-file Filename Token
    ....
    Token

The declaration can be anywhere in the file, but usually it is done after
the actual program code.

A very different kind of compiler pragma is '#encoding'. SPL usually
expects all input in UTF-8. But if your files are not UTF-8 encoded, the
'#encoding' pragma can be used to specify the encoding:

    #encoding iso8859_1

At the moment only the encodings "ascii", "iso8859_1" and "latin_1" (these
are three names for the same character set) and "utf_8" are known to the
SPL compiler.

SPL also has statements for defining and deleting macros:

    #define pi 3.14159265
    #define mysqrt(x) ((x)*(x))

    debug mysqrt(pi);

    #undef pi
    #undef mysqrt

The macro value is terminated by the end of the line.
Multi-line values are also possible by beginning all additional lines with
a backslash character.


Hexadecimal, Octal and Binary Numbers
-------------------------------------

Hexadecimal, octal and binary numbers can be used in SPL with the prefixes
'0x', '0o' and '0b'. So the following four lines of code are equivalent:

    debug 255;
    debug 0xff;
    debug 0o377;
    debug 0b11111111;

The C-like prefix '0' for octal numbers does not work here. Numbers with
leading zeros are interpreted as decimal numbers in SPL.


The 'eval' statement
--------------------

The 'eval' statement can be used to execute dynamically created SPL code.
For example:

    eval "debug 'Hello World';";

An 'eval' returns -1 on compiler errors and zero otherwise. It is strongly
recommended to check the return code of an eval statement.


Hosted Variables
----------------

Hosted variables (hnodes) are variables which are managed by SPL modules.
Usually they are handlers such as open database connections. The module
documentation describes the behavior of the hosted variables provided by
each module.

Such hosted variables usually behave more or less like normal SPL
variables. But some of them are very different. Some operators, especially
those used for array and object operations, may show a very different
behavior. So read the module documentation carefully and don't be surprised
if e.g. such a variable looks like an associative array but the foreach
loop doesn't work with it.


Internationalization and Localization
-------------------------------------

The SPL builtins library provides some bindings for the standard gettext
setlocale(), bindtextdomain(), textdomain(), gettext() and dgettext()
functions.

In addition to that there also exists a special localization operator (the
underscore) which can be used as prefix for string constants in SPL:

    debug _"Hello World!";

    debug _<:>
        : The underscore can also be used as prefix for inline templates.
        : But that disables the support for the tags.
    </>;

The special thing about that operator is that the dollar substitutions are
handled in a different way in such strings. E.g. the statement

    debug _"The sum of $a and $b is ${a+b}.";

is automatically transformed by the SPL compiler to a call to a special
translation function:

    debug _("The sum of {0} and {1} is {2}.", undef, a, b, a+b);

(The 2nd parameter is the text domain to be used. Passing undef means that
the text domain set using textdomain() should be used.)

That way it is possible to translate even messages that contain
substitutions by creating a .po file containing something like:

    msgid "The sum of {0} and {1} is {2}."
    msgstr "Die Summe von {0} und {1} ist {2}."

When a different text domain than the one set by the last call to
textdomain() should be used, the prefix '_DOMAINNAME_' can be used instead
of a simple '_'. For example when the above message should be translated
using the 'foobar' domain, the following code could be used:

    debug _foobar_"The sum of $a and $b is ${a+b}.";

A dummy C file with all the translatable strings in an SPL source file can
be generated using 'splrun -NX'. This dummy C file can then be used as
input for the xgettext program for creating or updating .po files:

    splrun -NX demo.spl | xgettext -C -

A more detailed description of the generic gettext API and the tools can be
found in the 'gettext' info page.


Walking through Contexts
------------------------

First of all: This section is meant more as information about what is
possible than as a recommendation to actually use the methods described
here.

SPL variables (aka SPL "nodes") may have child nodes. Those child nodes
have unique names and a defined order. The arrays described above are in
fact just nodes with such child nodes.

When addressing nodes in that tree, i.e. when creating a path in that node
tree, a dot is used to tell the virtual machine which part of the path
specifies the parent, which the child, grandchild, etc.
If parts of the path need to be created dynamically, the [ .. ] operator is
used. E.g.:

    foo.bar[1234]

will instruct the SPL virtual machine to look up the variable "foo", then
look up its child "bar" and then its child "1234".

But where to look for "foo" in the first place? Each task has a so-called
context node. That is the node in which all the local variables of the
current command block are stored (as children). If the variable can't be
found in the current context, it will be looked for in the context node of
that context node, and so on.

That means, whenever a new command block is opened (e.g. with '{'), a new
context node is created and the old task context node becomes the context
node of the new task context node. The '}' destroys the new context node
and the old context node becomes the new task context node again. (The
foreach and for loops create a local context even if the loop body is not a
'{' ... '}' block.)

All this is done automatically by SPL and the resulting behavior is exactly
as one would expect it from a C-like programming language.

But here comes the interesting part: If a variable name starts with '[*]',
it means that one context is skipped in the lookup path. So it is possible
to address the upper contexts directly. It is for example possible to
declare global variables from a function context:

    function create_foobar(value) {
        var [*].foobar = value;
    }

    create_foobar(42);
    debug foobar;

It is even possible to write to the context pointer directly and so change
the context pointer. E.g.:

    object A {
        var foobar = "I am 'foobar' from object A.";
    }

    object B {
        method print_foobar() {
            if (declared foobar)
                debug foobar;
            else
                debug "Variable 'foobar' not found.";
        }
    }

    B.print_foobar();
    B.[*] = A;
    B.print_foobar();

The '[*]' operator skips over local command blocks and only stops at
function (or other non-local) contexts.

One possible use case of all this is inserting a lookup context in
splcall_*() functions.
That way it is possible to implement loops with tags which declare their
own local variables. But other than with the context dispatching operator
(the '*' prefix for function calls) the body can still access the local
variables from the context in which the body has been declared. For
example:

    function splcall_myloop(textfunc, %args)
    {
        var mytextfunc, myctx, text;

        mytextfunc := textfunc;
        myctx.[*] = mytextfunc.[*];
        mytextfunc.[*] = myctx;

        for (var i = args.from; i<=args.to; i++) {
            myctx["counter"] = i;
            text ~= mytextfunc();
        }

        return text;
    }

    write(<:> : $i: Current counter: $counter );

In addition to the '[*]' operator there is also the '[+]' operator for
following the class pointer (pointing to the parent of an object). The
'[/]' operator always points to the root node and the '[.]' operator always
points to the current object (aka 'this' pointer).


Command Blocks without local context
------------------------------------

Command blocks declared with '{' ... '}' have their own local context.
Thus, variables or functions declared in that block are only local to this
context. It is also possible to declare command blocks without a local
context using the '{[' ... ']}' brackets. For example:

    var msg = "";

    function x() { msg ~= "World"; }

    /* append "Hello " */
    {
        function x() { msg ~= "Hello "; }
        x();
    }

    /* append "World" */
    x();

    /* append "!" */
    {[
        function x() { msg ~= "!"; }
    ]}
    x();

    debug msg;

This program prints "Hello World!". It wouldn't if '{' ... '}' did not have
a local context or if '{[' ... ']}' had one.


Inline Assembly
---------------

This is another thing which is more of academic interest. Since the SPL
compiler creates byte code for the SPL virtual machine, it is possible to
create code for that machine more directly too. One way of doing that is by
using SPL assembler code.

The compiler allows inlining assembler code. The keyword "asm" expects a
list of assembler statements.
Each statement is written as a separate string constant (no substitutions
etc. are allowed in these strings). There is no delimiter between these
string constants. E.g.:

    asm 'pushc "Hello World"' 'debug';

There isn't anything which can be done using the assembler but can't be
done in the high-level language. So the inline assembler is mostly relevant
for academic purposes or obscure optimizations.

The command "../splrun -AN" can be used to dump the assembler code
generated by the SPL compiler, in case you are interested.


SPL Performance
---------------

Variable values in SPL are always stored in so-called SPL nodes ('struct
spl_node' in spl.h). Such an SPL node is a heavy data structure, approx.
100 bytes large. It has fields for a wide range of value types and some
additional fields for internal purposes (e.g. garbage collection).

The size, complexity and capabilities of the spl_node struct are the reason
why SPL is very good at handling big and complex data very fast on the one
hand, but handles small data with small low-level operations very slowly on
the other hand.

For example: Most scripts do a lot of string concatenations. SPL internally
stores all strings as binary trees, so all the malloc-copy-free cycles
found in other scripting languages are not required and string
concatenation is a pretty fast operation in SPL. But small operations such
as incrementing an integer are extremely slow in SPL.

Two extreme examples:

Damn slow in SPL compared to other scripting languages:

    for (var i=0; i < 100000; i++) { /* do nothing */ }

Damn fast in SPL compared to other scripting languages:

    var text = "foobar";
    for (var i=0; i < 1000; i++) text = "$text$text";

The 1st example does nothing else than incrementing a counter up to 100000.
This is pretty slow in SPL because the increment operator creates a new
value (i.e. a new spl_node struct) and frees the old value for every
increment operation.
Other scripting languages would simply overwrite the integer pointed to by
the variable 'i'. This is not possible in SPL because 'i' points to an
spl_node and this node is strictly read-only: other variables may also
point to it, so changing it would also change these other variables.

The 2nd example doubles a 6 byte string 1000 times, so in other languages
it would try to allocate about 6 * 2^1000 bytes of memory (roughly 6e+292
Gigabytes). But SPL never tries to actually allocate the memory because in
this example program the string is only used in other string
concatenations.

Both examples are extreme and not representative of real world
applications. In general one should try to do the "big logic" in SPL but
implement the inner loops of performance critical algorithms somewhere else
(e.g. in SPL functions written in C).

The last chapter of the SPL manual (i.e. the file README.API) describes how
to use the SPL C-API, e.g. for writing SPL modules in C.


Command Line Debugger
---------------------

The command line debugger is still under construction.