Preprocessed CamphorScript
A human-readable programming language that can be translated into Brainf*ck
1. Introduction
1-1. Introduction
Preprocessed CamphorScript is an imperative language designed to facilitate programming in Brainf*ck, an esoteric programming language made up of only eight instructions.
Programming in Brainf*ck can be difficult and tedious. Preprocessed CamphorScript provides many syntactic extensions that make writing such programs much easier.
One of the most important properties of Preprocessed CamphorScript is its extensibility. It offers many ways to extend the language, including user-defined operators and customizable syntax.
2. Lexical Structure
2-1. Metasyntax Notations
The syntax of this language is represented by these notations:
[ pattern ] | optional |
{ pattern } | zero or more repetitions |
( pattern ) | grouping |
pat1 | pat2 | choice |
pat1 = pat2; | definition |
"string" | string |
'string' | string |
pat1, pat2 | concatenation |
pat1 - pat2 | exception |
? 0x2A ? | character (hexadecimal) |
Note that exception binds more strongly than concatenation and concatenation binds more strongly than choice.
2-2. Program Structure
A Preprocessed CamphorScript program consists of zero or more sentences.
The individual kinds of sentences are described mainly in Chapter 4 and Chapter 6; the complete definition of sentence is given in 8-1.
3. Tokens
3-1. Characters
anychar = alphanumbar | opsymbol | bracs | future | white | scolon;
allchar = anychar | ? 0x5C ?;
opsymbol = '!' | '%' | '&' | '*' | '+' | ',' | '-' | '/' | ':' | '<' | '=' | '>' | '?' | '@' | '^' | '|' | '~';
bracs = '(' | ')' | '{' | '}' | '"' | "'";
scolon = ';';
white = ? 0x20 ? | nswhite;
nswhite = newline | ? 0x09 ?;
newline = ? 0x0A ? | ? 0x0D ?;
future = '#' | '$' | '.' | '`' | '[' | ']';
digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';
alphanumbar = alphabar | digit;
alphabar = upper | lower;
upper = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z';
lower = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | '_';
Characters not in anychar cannot be used in a Preprocessed CamphorScript program except inside comments, pragmas or character literals.
3-2. Spaces and Comments
As in languages such as C, Preprocessed CamphorScript ignores most whitespace.
The exceptions are:
- between two alphanumbars
- inside character literals
Comments are also ignored and parsed like spaces, except when they follow an opsymbol, possibly with spaces in between (both '/' and '*' are themselves opsymbols).
__ = { none };
none = spaces | comment;
spaces = white, { white };
comment = ( '/*', { allchar }, '*/' ) - pragma;
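The rule comment = ( ... ) - pragma means that a /* ... */ span is a comment only when it is not a pragma. The hypothetical Python helper below sketches that distinction. It assumes a span ends at the first */ (shortest match), and it ignores both the allchar restriction and the special case of comments following an opsymbol.

```python
def classify_block_comment(src: str, start: int) -> tuple:
    """Classify the /* ... */ span that begins at src[start].

    Returns (kind, end): kind is "pragma" or "comment", and end is the index
    just past the closing "*/". Assumes the span stops at the first "*/".
    """
    if not src.startswith("/*", start):
        raise ValueError("no comment starts here")
    close = src.find("*/", start + 2)
    if close == -1:
        raise ValueError("unterminated comment")
    end = close + 2
    span = src[start:end]
    # pragma = '/*#', { allchar }, '#*/' needs two distinct '#', so at least 6 chars
    if len(span) >= 6 and span.startswith("/*#") and span.endswith("#*/"):
        return "pragma", end
    return "comment", end
```

Under this reading, /*#*/ is an ordinary comment, because the opening and closing '#' of a pragma cannot share one character.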
3-3. Identifiers
An identifier can be used as the name of a variable, a function, a block syntax or a parameter.
ident = ( alphabar, { alphanumbar } ) - reserved;
reserved = 'char' | 'delete' | 'infixl' | 'infixr' | 'void' | 'constant' | 'const' | 'syntax' | 'block';
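As a sketch of the ident rule, the hypothetical Python check below accepts an alphabar followed by alphanumbars and then rejects exact matches of reserved words. Note that '_' counts as an alphabar because it is listed under lower.

```python
import string

# alphabar = upper | lower, and lower includes '_'
ALPHABAR = set(string.ascii_letters + "_")
ALPHANUMBAR = ALPHABAR | set(string.digits)
RESERVED = {"char", "delete", "infixl", "infixr", "void",
            "constant", "const", "syntax", "block"}

def is_ident(word: str) -> bool:
    """True if word matches ident = ( alphabar, { alphanumbar } ) - reserved."""
    return (
        len(word) > 0
        and word[0] in ALPHABAR
        and all(c in ALPHANUMBAR for c in word[1:])
        and word not in RESERVED
    )
```

Since the exception removes only whole reserved words, a name such as constants remains a valid identifier in this sketch.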
3-4. Operator Identifiers
An operator identifier can be used as the name of an operator.
op = op2 | opsymbol, { opsymbol | white };
op2 = op3 | '<', spaces, op2, spaces, '>';
op3 = '<', spaces, '{', spaces, alphabar, { alphanumbar }, spaces, '}', spaces, '>';
3-5. Literals
In Preprocessed CamphorScript, there are two kinds of literals: numeric literals and character literals. They only differ in the way they are expressed in the source code.
A numeric literal is a sequence of one or more decimal digits (0-9). Unlike in some languages, numeric literals starting with zero are still decimal, not octal.
A character literal is a single character, other than a single quote or an nswhite character, enclosed in single quotes.
num = uint1 | uint2;
uint1 = digit, { digit };
uint2 = "'", ( allchar - ( "'" | nswhite ) ), "'";
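The hypothetical Python function below sketches how a num token could be evaluated. The specification only says that the two kinds of literal differ in notation; the mapping of a character literal to its character code (so 'A' is 65) is an assumption made for this sketch.

```python
DIGITS = "0123456789"
NSWHITE = "\n\r\t"  # nswhite = newline (0x0A, 0x0D) | tab (0x09)

def literal_value(token: str) -> int:
    """Value of a num token: uint1 (decimal digits) or uint2 (character literal).

    Assumption: a character literal denotes the character's code.
    """
    if token and all(c in DIGITS for c in token):
        return int(token, 10)  # always decimal: "010" is ten, not eight
    if len(token) == 3 and token[0] == "'" and token[2] == "'":
        ch = token[1]
        if ch == "'" or ch in NSWHITE:
            raise ValueError("single quotes and nswhite are not allowed here")
        return ord(ch)
    raise ValueError(f"not a literal: {token!r}")
```

A space is accepted inside a character literal, because only nswhite characters are excluded.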
3-6. Type Bases
A type base represents a set of data. Types, which are properties of r-values, are derived from type bases.
The following type base is built-in.
char | an integer between 0 and 255 |
typebase = 'char';
Currently, there is no way to create any other type bases.
4. Variables
A variable is defined with a type base and stores a single datum belonging to that type base. Referring to an undefined variable must result in an error.
4-1. Variable Definition
Variables are defined by declaration statements.
Variables can be given initializers (e.g. char a = 5;). A variable without an initializer is automatically initialized to 0.
Note that = is not a built-in operator: char a = 5; is syntactic sugar for char a; a += 5;, and char a = b = 5; is syntactic sugar for char a; a += 5; char b; b += 5;, where += is a built-in operator.
vardef = typebase, none, __, singledef, __, { ',', __, singledef, __ }, __, scolon;
singledef = ident, __, [ '=', __, { ident, __, '=', __ }, num ];
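The desugaring of initializers described above can be sketched as a small rewrite. In the hypothetical Python function below, each singledef is represented as the '=' chain of names plus the initializer's literal text; this representation is invented for illustration.

```python
from typing import List, Optional, Tuple

def desugar_vardef(typebase: str,
                   defs: List[Tuple[List[str], Optional[str]]]) -> List[str]:
    """Expand one vardef into declarations and += statements.

    Each entry of defs is one singledef: the '=' chain of names and the
    literal text of the initializer (None if there is none). For example,
    'char a = b = 5;' is [(["a", "b"], "5")].
    """
    out: List[str] = []
    for names, literal in defs:
        for name in names:
            out.append(f"{typebase} {name};")        # starts out as 0
            if literal is not None:
                out.append(f"{name} += {literal};")  # += is built in
    return out
```

For char x, y = 3; the input would be [(["x"], None), (["y"], "3")], giving char x;, char y; and y += 3;.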
4-2. Variable Deletion
Variables are deleted by deletion statements.
Variables defined in a block must be deleted in the same block.
The effect of deleting a variable that holds a non-zero value is undefined.
vardel = 'delete', none, __, ident, __, scolon;
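The scoping rules in this chapter (undefined names are errors, and a variable defined in a block must be deleted in that same block) can be sketched as a checker that a compiler might run while walking the program. The class below is hypothetical. It treats the top level as a scope that is never closed, so top-level variables need not be deleted, and it does not address shadowing, which the specification leaves open.

```python
class ScopeChecker:
    """Tracks live variables per block; the innermost block is last."""

    def __init__(self) -> None:
        self.scopes = [set()]  # top level, never closed

    def enter_block(self) -> None:
        self.scopes.append(set())

    def exit_block(self) -> None:
        leftover = self.scopes.pop()
        if leftover:
            raise ValueError(f"not deleted before end of block: {sorted(leftover)}")

    def define(self, name: str) -> None:
        self.scopes[-1].add(name)

    def use(self, name: str) -> None:
        if not any(name in scope for scope in self.scopes):
            raise NameError(f"undefined variable: {name}")

    def delete(self, name: str) -> None:
        # deletion must happen in the block that defined the variable
        if name not in self.scopes[-1]:
            raise ValueError(f"{name} was not defined in this block")
        self.scopes[-1].remove(name)
```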
4-3. R-value
An r-value is either a literal or a variable.
value = ident | num;
4-4. Block
A block wraps zero or more sentences and creates a scope. A semicolon is implicitly inserted before a '}'.
block2 = '{', { sentence }, '}';
5. Types
5-1. Types
In Preprocessed CamphorScript, every r-value has a type, and so does every parameter in the definition of a function, an operator or a block syntax.
Each type base creates three types. In the following table, T represents a type base.
type | can be given to: | modifiable |
T & | variables | Yes |
const T | r-values | No |
constant T | literals | No |
type = 'constant', none, __, typebase, none | 'const', none, __, typebase, none | typebase, __, '&';
A variable defined with type base T has type T &; numeric literals and character literals have type constant char.
5-2. TypeLists and ValueLists
A typelist is a list of parameters, each written as a type followed by a parameter name, separated by operators.
A valuelist is a list of values separated by operators.
A tailtypelist is like a typelist, except that every parameter is preceded by an operator. It can be empty.
A tailvaluelist is like a valuelist, except that every value is preceded by an operator. It can be empty.
typelist = type, __, ident, __, { op, __, type, __, ident, __ };
valuelist = value, __, { op, __, value, __ };
tailtypelist = { op, __, type, __, ident, __ };
tailvaluelist = { op, __, value, __ };
The following table shows whether an r-value of a given type (rows) can be passed to a parameter of a given type (columns).
r-value \ parameter | T & | const T | constant T |
T & | Yes | Yes | No |
const T | No | Yes | No |
constant T | No | Yes | Yes |
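The table above can be encoded directly. The Python sketch below uses hypothetical names (Type, ALLOWED, can_pass, type_of_value); requiring the argument and parameter to share a type base is an assumption, since only char exists today.

```python
from typing import Dict, NamedTuple

class Type(NamedTuple):
    kind: str  # "ref" (T &), "const" (const T) or "constant" (constant T)
    base: str  # type base, currently always "char"

# (r-value kind, parameter kind) pairs marked "Yes" in the table
ALLOWED = {
    ("ref", "ref"), ("ref", "const"),
    ("const", "const"),
    ("constant", "const"), ("constant", "constant"),
}

def can_pass(arg: Type, param: Type) -> bool:
    # Assumption: the type bases must match.
    return arg.base == param.base and (arg.kind, param.kind) in ALLOWED

def type_of_value(token: str, variables: Dict[str, str]) -> Type:
    """Type of an r-value: variables are T &, literals are constant char."""
    if token[0].isdigit() or token[0] == "'":
        return Type("constant", "char")   # numeric or character literal
    return Type("ref", variables[token])  # KeyError for an undefined variable
```

Under this table, the built-in (+=)(char& a; constant char N) accepts a += 5 but not a += b when b is a variable, because a T & r-value cannot be passed to a constant T parameter.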
6. Functions, Operators and Block Syntaxes
6-1. Details about Inlining
In Preprocessed CamphorScript, "functions", "operators" and "block syntaxes" are essentially an advanced form of C-style macros; consequently, all of them except the built-in ones are inlined.
Because macros in languages such as C have no notion of type, they cannot prevent wrong arguments from being passed to them. The type system of Preprocessed CamphorScript prevents such cases, making it easier to write structured programs in an otherwise function-less language.
A function, an operator or a block syntax can be defined more than once with different typelists. This is known as overloading.
Note that, since functions, operators and block syntaxes are inlined, they cannot call themselves recursively. Mutual recursion is also forbidden.
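Because every call is inlined, a compiler has to reject any cycle in the call graph, whether a direct self-call or mutual recursion. The hypothetical Python function below sketches such a check with a depth-first search; it assumes the compiler has already collected, for each user-defined function, operator or block syntax, the set of names called in its body. Names without an entry (such as built-ins) are treated as leaves.

```python
from typing import Dict, List, Optional, Set

def find_recursion(calls: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one call cycle (e.g. ["f", "g", "f"]) or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {name: WHITE for name in calls}
    stack: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        color[name] = GREY
        stack.append(name)
        for callee in sorted(calls.get(name, ())):
            if color.get(callee, BLACK) == GREY:  # back edge: cycle found
                return stack[stack.index(callee):] + [callee]
            if color.get(callee) == WHITE:
                found = visit(callee)
                if found:
                    return found
        stack.pop()
        color[name] = BLACK
        return None

    for name in sorted(calls):
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None
```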
6-2. Function Definition
Functions are defined by function definition statements.
funcdef = 'void', none, __, ident, __, '(', __, typelist, __, ')', __, blockornull;
blockornull = block2 | '=', __, '0', __, scolon;
Functions defined with =0; are called null functions; compilers are required to report an error when a null function is called.
The following functions are built-in; they cannot be deleted or redefined, but they can be overloaded.
void read(char& a) | gets one byte of input and stores it in a |
void write(char& a) | outputs the content of a |
6-3. Fixity Definition
The fixity of an operator must be defined before it can be used.
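There are two directions of fixity, namely left fixity (infixl) and right fixity (infixr). A fixity definition also gives a numeric level; an operator with a larger level binds more strongly.
An operator's fixity can be defined more than once, but all of the definitions must be identical.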
The fixities of built-in operators += and -= are implicitly defined as infixr 5 (+=); and infixr 5 (-=);
infixl = 'infixl', none, __, num, __, '(', __, op, __, ')', __, scolon;
infixr = 'infixr', none, __, num, __, '(', __, op, __, ')', __, scolon;
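A compiler needs a table of fixity declarations that tolerates repeated but identical declarations. The hypothetical Python class below sketches one, pre-filled with the implicit declarations of the built-in += and -=. Operator names are compared as plain strings here; how whitespace inside an operator identifier is normalized is not covered.

```python
from typing import Dict, Tuple

class FixityTable:
    """Records infixl/infixr declarations as (direction, level) pairs."""

    def __init__(self) -> None:
        # built-in operators are implicitly infixr 5
        self.table: Dict[str, Tuple[str, int]] = {
            "+=": ("infixr", 5),
            "-=": ("infixr", 5),
        }

    def declare(self, direction: str, level: int, op: str) -> None:
        if direction not in ("infixl", "infixr"):
            raise ValueError(f"unknown fixity direction: {direction}")
        new = (direction, level)
        old = self.table.setdefault(op, new)
        if old != new:  # redefinition must be identical
            raise ValueError(f"conflicting fixity for {op}: {old} vs {new}")

    def lookup(self, op: str) -> Tuple[str, int]:
        # an operator's fixity must be defined before the operator is used
        if op not in self.table:
            raise KeyError(f"fixity of {op} is not defined")
        return self.table[op]
```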
6-4. Operator Definition
Operators are defined by operator definition statements.
operdef = 'void', __, '(', __, op, __, ')', __, '(', __, typelist, __, scolon, __, typelist, __, ')', __, blockornull;
The following operators are built-in.
void (+=)(char& a; constant char N) | adds N to a and stores the result in a |
void (-=)(char& a; constant char N) | subtracts N from a and stores the result in a |
6-5. BLOCK Statement
A BLOCK statement, not to be confused with a block, consists of the reserved word block followed by a semicolon.
It can only be used inside the second block of a block syntax definition, that is, the block2 that follows the {block;} part.
resblc = 'block', __, scolon;
6-6. Block Syntax Definition
Block syntaxes are defined by syntax definition statements.
syndef = 'syntax', none, __, ident, __, '(', __, (typelist | tailtypelist), __, ')', __, '{', __, 'block', __, { scolon | __ }, '}', __, block2;
The following block syntax is built-in.
syntax while(char& a){block;} | repeats executing block while a is non-zero |
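To connect the built-ins to Brainf*ck, the hypothetical Python function below sketches one way they could be lowered. The specification does not say how memory is laid out; this sketch assumes each variable owns one fixed tape cell and the data pointer starts at cell 0, and it uses a small tuple-based statement form invented for illustration. read and write map to , and ., += N and -= N to N copies of + or -, and while to [ ... ] with the pointer moved back to the tested cell before the closing ].

```python
from typing import Dict, List

def lower(stmts: list, cells: Dict[str, int]) -> str:
    """Translate built-in calls into Brainf*ck.

    Hypothetical statement forms: ("add", v, n) for v += n, ("sub", v, n)
    for v -= n, ("read", v), ("write", v) and ("while", v, body).
    """
    out: List[str] = []
    pos = 0  # where the data pointer currently is

    def goto(cell: int) -> None:
        nonlocal pos
        out.append(">" * (cell - pos) if cell >= pos else "<" * (pos - cell))
        pos = cell

    def emit(block: list) -> None:
        for stmt in block:
            kind, var = stmt[0], stmt[1]
            goto(cells[var])
            if kind == "add":
                out.append("+" * stmt[2])
            elif kind == "sub":
                out.append("-" * stmt[2])
            elif kind == "read":
                out.append(",")
            elif kind == "write":
                out.append(".")
            elif kind == "while":
                out.append("[")
                emit(stmt[2])
                goto(cells[var])  # ']' must test the same cell as '['
                out.append("]")
            else:
                raise ValueError(f"unsupported statement: {kind}")

    emit(stmts)
    return "".join(out)
```

For example, read(a); while(a){ b += 1; a -= 1; } write(b); with a in cell 0 and b in cell 1 becomes ,[>+<-]>. in this sketch.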
6-7. Function Call
Functions are called by function call statements.
funccall = ident, __, '(', __, valuelist, __, ')', __, scolon;
Further details of function calls are explained in 6-1.
6-8. Block Syntax Call
Block syntaxes are called by block syntax call statements.
syncall = ident, __, '(', __, ( valuelist | tailvaluelist ), __, ')', __, block2;
6-9. Operator Call
An operator is called by an operator call statement, and there are five ways to write one.
The first way is prefix notation, where the operator being called comes first, surrounded by parentheses, followed by the two operands, which are separated by a semicolon and surrounded by parentheses.
opcall1 = '(', __, op, __, ')', __, '(', __, valuelist, __, scolon, __, valuelist, __, ')', __, scolon;
The second way is fully-parenthesized infix notation, where both operands are surrounded by parentheses and the operator comes between them.
opcall2 = '(', __, valuelist, __, ')', __, op, __, '(', __, valuelist, __, ')', __, scolon;
The third way is left-parenthesized infix notation, where only the left operand is parenthesized.
This notation also requires that all the operators (if any) in the right valuelist satisfy at least one of the following properties:
- has stronger fixity than the "central" operator
- has the same fixity as the "central" operator and both are infixr
opcall3 = '(', __, valuelist, __, ')', __, op, __, valuelist, __, scolon;
The fourth way is right-parenthesized infix notation, where only the right operand is parenthesized.
This notation also requires that all the operators (if any) in the left valuelist satisfy at least one of the following properties:
- has stronger fixity than the "central" operator
- has the same fixity as the "central" operator and both are infixl
opcall4 = opcall4a | opcall4b;
opcall4a = value, __, op, __, { value, __, op, __ }, __, '(', __, valuelist, __, ')', __, scolon;
opcall4b = '(', __, valuelist, __, ')', __, scolon;
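The extra conditions on the unparenthesized operand of the third and fourth ways can be sketched as one check. In the hypothetical Python function below, fixities maps each operator to (direction, level), and "stronger" is read as a larger level, which is consistent with the non-parenthesized rule calling the operator with the smallest fixity first.

```python
from typing import Dict, List, Tuple

def operand_ops_ok(central: str, ops: List[str],
                   fixities: Dict[str, Tuple[str, int]], side: str) -> bool:
    """Check the operators in the unparenthesized operand.

    side="right": opcall3, the right valuelist is unparenthesized (ties need infixr).
    side="left":  opcall4a, the left valuelist is unparenthesized (ties need infixl).
    """
    direction, level = fixities[central]
    tie_direction = "infixr" if side == "right" else "infixl"
    for op in ops:
        d, lv = fixities[op]
        stronger = lv > level
        tie_ok = lv == level and d == tie_direction and direction == tie_direction
        if not (stronger or tie_ok):
            return False
    return True
```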
The final way is non-parenthesized infix notation, where neither operand is parenthesized. This is essentially a valuelist followed by a semicolon.
The actual operator to be called is determined by the following steps:
- If there is no operator, nothing is called.
- Look for the operator(s) with the smallest fixity.
- If all the operator(s) with the smallest fixity are infixl, the leftmost one is called.
- If all the operator(s) with the smallest fixity are infixr, the rightmost one is called.
- If they are mixed (some infixl, some infixr), the statement is invalid.
opcall5 = valuelist, __, scolon;
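The selection steps above translate almost line for line into code. The hypothetical Python function below takes the operators of a valuelist from left to right and returns the index of the one to call, with fixities given as (direction, level) pairs.

```python
from typing import Dict, List, Optional, Tuple

def central_operator(ops: List[str],
                     fixities: Dict[str, Tuple[str, int]]) -> Optional[int]:
    """Index of the operator called by a non-parenthesized statement,
    or None if there is no operator."""
    if not ops:
        return None
    lowest = min(fixities[op][1] for op in ops)
    candidates = [i for i, op in enumerate(ops) if fixities[op][1] == lowest]
    directions = {fixities[ops[i]][0] for i in candidates}
    if directions == {"infixl"}:
        return candidates[0]   # leftmost
    if directions == {"infixr"}:
        return candidates[-1]  # rightmost
    raise ValueError("mixed infixl and infixr at the smallest fixity: invalid statement")
```

For a += b + c; with += at infixr 5 and a hypothetical + declared infixl 6, this picks +=; presumably the valuelists on either side (a and b + c) are then matched against the two typelists of +=.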
7. Pragmas
7-1. Pragmas
Pragmas are special statements used to give additional instructions to the compiler.
A compiler is not required to implement any pragma, but it should ignore pragmas it does not implement.
pragma = '/*#', { allchar }, '#*/';
7-2. Line Start Pragmas
The LINE start pragma tells the compiler that the following lines come from another file; a compiler implementing this pragma should report errors in terms of that file.
Note that LINE start pragmas can be nested.
Example: /*# LINE start "stdcalc" #*/
7-3. Line End Pragmas
The LINE end pragma tells the compiler that the following lines no longer come from that file; a compiler implementing this pragma should report errors accordingly.
Example: /*# LINE end "stdcalc" #*/
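Since LINE start pragmas can be nested, a compiler can track them with a stack. The hypothetical Python function below maps each physical line to a (file, line) pair for error messages. It assumes that each pragma sits on a line of its own, that pragma lines belong to no file, that numbering inside an included file starts at 1, and that a LINE end must name the innermost open file; none of these details are fixed by the specification.

```python
import re
from typing import List, Optional, Tuple

LINE_PRAGMA = re.compile(r'/\*#\s*LINE\s+(start|end)\s+"([^"]*)"\s*#\*/')

def source_of_lines(text: str, main: str = "<main>") -> List[Optional[Tuple[str, int]]]:
    """Map each physical line to (file name, line number in that file)."""
    stack = [[main, 0]]  # open files with the number of lines seen so far
    result: List[Optional[Tuple[str, int]]] = []
    for line in text.splitlines():
        m = LINE_PRAGMA.fullmatch(line.strip())
        if m is None:
            stack[-1][1] += 1
            result.append((stack[-1][0], stack[-1][1]))
            continue
        kind, name = m.groups()
        if kind == "start":
            stack.append([name, 0])  # LINE start may be nested
        elif len(stack) > 1 and stack[-1][0] == name:
            stack.pop()
        else:
            raise ValueError(f"unmatched LINE end for {name!r}")
        result.append(None)  # pragma lines belong to no file
    return result
```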
7-4. Memory Using Pragmas
When a MEMORY using pragma is followed by a call whose definition defines local variables, it tells the compiler to use the specified variables, all of which must be zero, instead of allocating new memory for those local variables.
Example:
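void (+=)(char& a; char& b){
    char c2;                              /* local variable */
    while(b){ a += 1; c2 += 1; b -= 1; }  /* add b to both a and c2, leaving b at zero */
    while(c2){ b += 1; c2 -= 1; }         /* restore b from c2 */
    delete c2;                            /* c2 is zero again */
}
char c, d, e;
read(c);
/*# MEMORY using e #*/
d += c;                                   /* e is used in place of c2 */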
8. Full Lexical Structure
8-1. Program and Sentence
program = { sentence };
sentence = vardef | syndef | vardel | scolon | resblc | infixl | infixr | spaces | block2 | comment | pragma | funcdef | operdef | funccall | syncall | opcall1 | opcall2 | opcall3 | opcall4 | opcall5;
8-2. Characters
anychar = alphanumbar | opsymbol | bracs | future | white | scolon;
allchar = anychar | ? 0x5C ?;
opsymbol = '!' | '%' | '&' | '*' | '+' | ',' | '-' | '/' | ':' | '<' | '=' | '>' | '?' | '@' | '^' | '|' | '~';
bracs = '(' | ')' | '{' | '}' | '"' | "'";
scolon = ';';
white = ? 0x20 ? | nswhite;
nswhite = newline | ? 0x09 ?;
newline = ? 0x0A ? | ? 0x0D ?;
future = '#' | '$' | '.' | '`' | '[' | ']';
digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';
alphanumbar = alphabar | digit;
alphabar = upper | lower;
upper = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z';
lower = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | '_';
8-3. Tokens
__ = { none };
none = spaces | comment;
spaces = white, { white };
comment = ( '/*', { allchar }, '*/' ) - pragma;
ident = ( alphabar, { alphanumbar } ) - reserved;
reserved = 'char' | 'delete' | 'infixl' | 'infixr' | 'void' | 'constant' | 'const' | 'syntax' | 'block';
op = op2 | opsymbol, { opsymbol | white };
op2 = op3 | '<', spaces, op2, spaces, '>';
op3 = '<', spaces, '{', spaces, alphabar, { alphanumbar }, spaces, '}', spaces, '>';
num = uint1 | uint2;
uint1 = digit, { digit };
uint2 = "'", ( allchar - ( "'" | nswhite ) ), "'";
typebase = 'char';
8-4. Variables
vardef = typebase, none, __, singledef, __, { ',', __, singledef, __ }, __, scolon;
singledef = ident, __, [ '=', __, { ident, __, '=', __ }, num ];
vardel = 'delete', none, __, ident, __, scolon;
value = ident | num;
block2 = '{', { sentence }, '}';
8-5. Types
type = 'constant', none, __, typebase, none | 'const', none, __, typebase, none | typebase, __, '&';
typelist = type, __, ident, __, { op, __, type, __, ident, __ };
valuelist = value, __, { op, __, value, __ };
tailtypelist = { op, __, type, __, ident, __ };
tailvaluelist = { op, __, value, __ };
8-6. Functions, Operators and Block Syntaxes
funcdef = 'void', none, __, ident, __, '(', __, typelist, __, ')', __, blockornull;
blockornull = block2 | '=', __, '0', __, scolon;
infixl = 'infixl', none, __, num, __, '(', __, op, __, ')', __, scolon;
infixr = 'infixr', none, __, num, __, '(', __, op, __, ')', __, scolon;
operdef = 'void', __, '(', __, op, __, ')', __, '(', __, typelist, __, scolon, __, typelist, __, ')', __, blockornull;
resblc = 'block', __, scolon;
syndef = 'syntax', none, __, ident, __, '(', __, (typelist | tailtypelist), __, ')', __, '{', __, 'block', __, { scolon | __ }, '}', __, block2;
funccall = ident, __, '(', __, valuelist, __, ')', __, scolon;
syncall = ident, __, '(', __, ( valuelist | tailvaluelist ), __, ')', __, block2;
opcall1 = '(', __, op, __, ')', __, '(', __, valuelist, __, scolon, __, valuelist, __, ')', __, scolon;
opcall2 = '(', __, valuelist, __, ')', __, op, __, '(', __, valuelist, __, ')', __, scolon;
opcall3 = '(', __, valuelist, __, ')', __, op, __, valuelist, __, scolon;
opcall4 = opcall4a | opcall4b;
opcall4a = value, __, op, __, { value, __, op, __ }, __, '(', __, valuelist, __, ')', __, scolon;
opcall4b = '(', __, valuelist, __, ')', __, scolon;
opcall5 = valuelist, __, scolon;
8-7. Pragmas
pragma = '/*#', { allchar }, '#*/';