Preprocessed CamphorScript

A human-readable programming language that can be translated into Brainf*ck

1. Introduction

1-1. Introduction

Preprocessed CamphorScript is an imperative language designed to facilitate programming in Brainf*ck, an esoteric programming language made up of only eight instructions.

Programming directly in Brainf*ck is difficult and tedious. Through its many syntactic extensions, Preprocessed CamphorScript makes such programs much easier to write.

One of the most important properties of Preprocessed CamphorScript is that it is extensible. It provides several ways to extend the language, including user-defined operators and customizable syntaxes.

2. Lexical Structure

2-1. Metasyntax Notations

The syntax of this language is represented by these notations:

[ pattern ]     optional
{ pattern }     zero or more repetitions
( pattern )     grouping
pat1 | pat2     choice
pat1 = pat2     definition
"string"        string
'string'        string
pat1, pat2      concatenation
pat1 - pat2     exception
? 0x2A ?        character (hexadecimal)

Note that exception binds more strongly than concatenation and concatenation binds more strongly than choice.


2-2. Program Structure

A Preprocessed CamphorScript program consists of zero or more sentences.

Definitions of sentences are described in Chapter 4 and Chapter 6.

3. Tokens

3-1. Characters

anychar = alphanumbar | opsymbol | bracs | future | white | scolon;

allchar = anychar | ? 0x5C ?;

opsymbol = '!' | '%' | '&' | '*' | '+' | ',' | '-' | '/' | ':' | '<' | '=' | '>' | '?' | '@' | '^' | '|' | '~';

bracs = '(' | ')' | '{' | '}' | '"' | "'";

scolon = ';';

white = ? 0x20 ? | nswhite;

nswhite = newline | ? 0x09 ?;

newline = ? 0x0A ? | ? 0x0D ?;

future = '#' | '$' | '.' | '`' | '[' | ']';

digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';

alphanumbar = alphabar | digit;

alphabar = upper | lower;

upper = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z';

lower = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | '_';

Characters not in anychar cannot be used in a Preprocessed CamphorScript program except inside comments or pragmas.


3-2. Spaces and Comments

As in languages such as C, Preprocessed CamphorScript ignores most whitespace.

The exceptions are:

  • between two alphanumbars
  • inside character literals

Comments are also ignored and are parsed in the same way as spaces, except when they follow an opsymbol, with zero or more spaces in between.

__ = { none };

none = spaces | comment;

spaces = white, { white };

comment = ( '/*', { allchar }, '*/' ) - pragma;
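
The "- pragma" exception in the comment rule means that a span opening with /*# and closing with #*/ is a pragma rather than a comment. The following Python sketch illustrates that distinction only. The function name classify is hypothetical, and the sketch does not check that the enclosed characters belong to allchar.

```python
def classify(span: str) -> str:
    """Classify a delimited span as 'pragma', 'comment' or 'neither'."""
    # pragma = '/*#', { allchar }, '#*/'  (the two delimiters alone take 6 characters)
    if len(span) >= 6 and span.startswith("/*#") and span.endswith("#*/"):
        return "pragma"
    # comment = ( '/*', { allchar }, '*/' ) - pragma
    if len(span) >= 4 and span.startswith("/*") and span.endswith("*/"):
        return "comment"
    return "neither"
```

Under this reading, /*#*/ is a comment, because it is too short to contain both pragma delimiters.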


3-3. Identifiers

An identifier can be used for the name of a variable or a function.

ident = ( alphabar, { alphanumbar } ) - reserved;

reserved = 'char' | 'delete' | 'infixl' | 'infixr' | 'void' | 'constant' | 'const' | 'syntax' | 'block';
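
As an illustration, the ident rule can be checked with a regular expression plus a reserved-word test. This is a Python sketch, not part of the specification; the names RESERVED, IDENT_RE and is_ident are hypothetical.

```python
import re

# reserved words, copied from the grammar rule above
RESERVED = {"char", "delete", "infixl", "infixr", "void",
            "constant", "const", "syntax", "block"}

# alphabar = upper | lower (lower includes '_'); alphanumbar = alphabar | digit
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

def is_ident(text: str) -> bool:
    """True if text matches ( alphabar, { alphanumbar } ) - reserved."""
    return IDENT_RE.fullmatch(text) is not None and text not in RESERVED
```

For instance, c2 and _tmp are accepted, while 2c (starts with a digit) and char (reserved) are rejected.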


3-4. Operator Identifiers

An operator identifier can be used for the name of an operator.

op = op2 | opsymbol, { opsymbol | white };

op2 = op3 | '<', spaces, op2, spaces, '>';

op3 = '<', spaces, '{', spaces, alphabar, { alphanumbar }, spaces, '}', spaces, '>';


3-5. Literals

In Preprocessed CamphorScript, there are two kinds of literals: numeric literals and character literals. They differ only in the way they are written in the source code.

A numeric literal is a sequence of the digits 0-9. Unlike in some languages, a numeric literal starting with zero is still decimal, not octal.

A character literal is a single character, other than a single quote or an nswhite, surrounded by single quotes.

num = uint1 | uint2;

uint1 = digit, { digit };

uint2 = "'", ( allchar - ( "'" | nswhite ) ), "'";


3-6. Type Bases

A type base represents a set of data. Types, which are properties of r-values, are derived from type bases.

The following type base is built-in.

char    an integer between 0 and 255

typebase = 'char';

Currently, there is no way to create any other type bases.

4. Variables

A variable is defined using (that is, linked to) a type base, and stores a datum that belongs to that type base. Referring to an undefined variable must result in an error.


4-1. Variable Definition

Variables are defined by declaration statements.

A variable can be given an initializer (e.g. char a = 5;). A variable without an initializer is automatically initialized to 0.

Note that = is not a built-in operator: char a = 5; is syntactic sugar for char a; a += 5; and char a = b = 5; is syntactic sugar for char a; a += 5; char b; b += 5;, where += is a built-in operator.

vardef = typebase, none, __, singledef, __, { ',', __, singledef, __ }, __, scolon;

singledef = ident, __, [ '=', __, { ident, __, '=', __ }, num ];
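
The expansion of an initializer can be sketched as follows in Python. Only the two expansions given in the text above come from the specification; the function name desugar_singledef and its argument layout are hypothetical, and the sketch assumes the singledef has already been split into its identifiers and its optional literal.

```python
from typing import List, Optional

def desugar_singledef(names: List[str], init: Optional[str]) -> List[str]:
    """Expand one singledef into plain declarations and += statements.

    names: identifiers of the singledef, left to right (["a", "b"] for a = b = 5)
    init:  the literal as written in the source, or None without an initializer
    """
    out = []
    for name in names:
        out.append(f"char {name};")
        if init is not None:
            # '=' is not an operator; the initial value is added with the built-in +=
            out.append(f"{name} += {init};")
    return out
```

Called with ["a", "b"] and "5", the sketch produces the four statements char a; a += 5; char b; b += 5; in that order.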


4-2. Variable Deletion

Variables are deleted by deletion statements.

Variables defined in a block must be deleted in the same block.

The effect of attempting to delete a variable which contains non-zero data is undefined.

vardel = 'delete', none, __, ident, __, scolon;


4-3. R-value

An r-value is either a literal or a variable.

value = ident | num;


4-4. Block

A block wraps zero or more sentences and creates a scope. A semicolon is implicitly inserted before a '}'.

block2 = '{', { sentence }, '}';

5. Types

5-1. Types

In Preprocessed CamphorScript, every r-value has a type, including the parameters used in the definitions of functions, operators and block syntaxes.

Each type base creates three types. In the following table, T represents a type base.

type          can be given to:   modifiable
T &           variables          Yes
const T       r-values           No
constant T    literals           No

type = 'constant', none, __, typebase, none | 'const', none, __, typebase, none | typebase, __, '&';

A variable defined using a type base T has the type T &; numeric literals and character literals have the type constant char.


5-2. TypeLists and ValueLists

A typelist is a list of types and parameters, separated by operators.

A valuelist is a list of values separated by operators.

A tailtypelist is a list of types and parameters, separated and started by operators. It can be empty.

A tailvaluelist is a list of values, separated and started by operators. It can be empty.

typelist = type, __, ident, __, { op, __, type, __, ident, __ };

valuelist = value, __, { op, __, value, __ };

tailtypelist = { op, __, type, __, ident, __ };

tailvaluelist = { op, __, value, __ };

The following table shows whether or not an r-value can be passed to a parameter.

r-value \ parameter   T &   const T   constant T
T &                   Yes   Yes       No
const T               No    Yes       No
constant T            No    Yes       Yes
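
The table can be read as a lookup from the type of the r-value to the type of the parameter. The following Python sketch encodes it literally; the names PASSABLE and can_pass are hypothetical and not part of the language.

```python
# Outer key: type of the r-value. Inner key: type of the parameter.
PASSABLE = {
    "T &":        {"T &": True,  "const T": True, "constant T": False},
    "const T":    {"T &": False, "const T": True, "constant T": False},
    "constant T": {"T &": False, "const T": True, "constant T": True},
}

def can_pass(rvalue_type: str, param_type: str) -> bool:
    """True if an r-value of rvalue_type may be passed to a parameter of param_type."""
    return PASSABLE[rvalue_type][param_type]
```

In particular, every r-value can be passed to a const T parameter, while a T & parameter accepts only variables and a constant T parameter accepts only literals.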

6. Functions, Operators and Block Syntaxes

6-1. Details about Inlining

In Preprocessed CamphorScript, "functions", "operators" and "block syntaxes" are merely an advanced version of C-style macros; thus, all of them, excluding those that are built-in, are inlined.

Because macros in languages such as C do not have the concept of type, they cannot prevent wrong arguments from being passed to them. The type system of Preprocessed CamphorScript prevents such cases, making it easier to write structured programs in an otherwise function-less language.

A function, an operator or a block syntax has one or more typelists. Giving it more than one typelist is known as overloading.

Note that, since functions, operators and block syntaxes are all inlined, they cannot call themselves recursively. Mutual recursion is also forbidden.
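
Because every call is expanded in place, a definition that can reach itself through any chain of calls could never be fully expanded. One way a compiler might detect this is a depth-first search for a cycle, sketched below in Python. The sketch assumes the compiler has already collected, for each user-defined name, the names that its body calls; the dictionary calls and the function has_recursion are hypothetical.

```python
from typing import Dict, List

def has_recursion(calls: Dict[str, List[str]]) -> bool:
    """Return True if any definition can reach itself through calls."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, finished
    color = {name: WHITE for name in calls}

    def visit(name: str) -> bool:
        color[name] = GREY
        for callee in calls.get(name, []):
            # names without a body here (e.g. built-ins) cannot lead to a cycle
            state = color.get(callee, BLACK)
            if state == GREY:
                return True               # callee is already on the current path
            if state == WHITE and visit(callee):
                return True
        color[name] = BLACK
        return False

    return any(color[name] == WHITE and visit(name) for name in calls)
```

Both direct recursion (f calls f) and mutual recursion (f calls g, g calls f) are reported by the same search.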


6-2. Function Definition

Functions are defined by function definition statements.

funcdef = 'void', none, __, ident, __, '(', __, typelist, __, ')', __, blockornull;

blockornull = block2 | '=', __, '0', __, scolon;

Functions defined by using =0; are called null functions and compilers are required to report errors when those functions are called.

The following functions are built-in; they cannot be deleted or redefined, but they can be overloaded.

void read(char& a)     gets one byte of input and stores it in a
void write(char& a)    outputs the content of a

6-3. Fixity Definition

The fixity of an operator must be defined before it can be used.

There are two types of fixity, namely left fixity and right fixity.

An operator's fixity can be defined more than once, but all of the definitions must be the same.

The fixities of the built-in operators += and -= are implicitly defined as infixr 5 (+=); and infixr 5 (-=);.

infixl = 'infixl', none, __, num, __, '(', __, op, __, ')', __, scolon;

infixr = 'infixr', none, __, num, __, '(', __, op, __, ')', __, scolon;
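
The rules of this section can be modelled by a small table, sketched here in Python. The sketch assumes that "the same" means both the direction (infixl or infixr) and the number must match; the class names FixityTable and FixityError are hypothetical.

```python
class FixityError(Exception):
    """Raised for a conflicting or missing fixity definition."""

class FixityTable:
    def __init__(self):
        # implicit definitions: infixr 5 (+=); and infixr 5 (-=);
        self._table = {"+=": ("infixr", 5), "-=": ("infixr", 5)}

    def declare(self, assoc: str, level: int, op: str) -> None:
        """Record 'assoc level (op);', rejecting a definition that differs."""
        previous = self._table.get(op)
        if previous is not None and previous != (assoc, level):
            raise FixityError(f"conflicting fixity for ({op})")
        self._table[op] = (assoc, level)

    def lookup(self, op: str):
        """Return (assoc, level); the fixity must be defined before use."""
        if op not in self._table:
            raise FixityError(f"fixity of ({op}) is not defined")
        return self._table[op]
```

Declaring the same fixity twice is accepted, while a second declaration with a different direction or number raises FixityError.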


6-4. Operator Definition

Operators are defined by operator definition statements.

operdef = 'void', __, '(', __, op, __, ')', __, '(', __, typelist, __, scolon, __, typelist, __, ')', __, blockornull;

The following operators are built-in.

void (+=)(char& a; constant char N)    adds N to a and stores the result in a
void (-=)(char& a; constant char N)    subtracts N from a and stores the result in a

6-5. BLOCK Statement

A BLOCK statement, not to be confused with a block, is a statement made by appending a semicolon after the reserved word block.

It can only be used inside the latter block of a block syntax definition.

resblc = 'block', __, scolon;


6-6. Block Syntax Definition

Block syntaxes are defined by syntax definition statements.

syndef = 'syntax', none, __, ident, __, '(', __, (typelist | tailtypelist), __, ')', __, '{', __, 'block', __, { scolon | __ }, '}', __, block2;

The following block syntax is built-in.

syntax while(char& a){block;}    repeats executing block while a is non-zero

6-7. Function Call

Functions are called by function call statements.

funccall = ident, __, '(', __, valuelist, __, ')', __, scolon;

Further details of function calls are explained in 6-1.


6-8. Block Syntax Call

Block syntaxes are called by block syntax call statements.

syncall = ident, __, '(', __, ( valuelist | tailvaluelist ), __, ')', __, block2;


6-9. Operator Call

An operator is called by an operator call statement. There are five ways to write such a statement.

The first way is prefix notation, where an operator being called comes first, surrounded by parentheses.

Two operands are separated by a semicolon and then surrounded by parentheses.

opcall1 = '(', __, op, __, ')', __, '(', __, valuelist, __, scolon, __, valuelist, __, ')', __, scolon;

The second way is fully-parenthesized infix notation, where two operands are both surrounded by parentheses and the operator comes between the two.

opcall2 = '(', __, valuelist, __, ')', __, op, __, '(', __, valuelist, __, ')', __, scolon;

The third way is left-parenthesized infix notation, where only the left operand is parenthesized.

This notation also requires that every operator (if any) in the right valuelist satisfy at least one of the following properties:

  • has stronger fixity than the "central" operator
  • has the same fixity as the "central" operator and both are infixr

opcall3 = '(', __, valuelist, __, ')', __, op, __, valuelist, __, scolon;
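
The condition on the right valuelist can be sketched in Python as below. The sketch assumes that "stronger fixity" means a larger fixity number, which is an interpretation rather than something stated above. The function name right_operand_ok and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

Fixity = Tuple[str, int]   # ("infixl" or "infixr", fixity number)

def right_operand_ok(central: str, right_ops: List[str],
                     fixity: Dict[str, Fixity]) -> bool:
    """Check every operator of the right valuelist against the central operator."""
    c_assoc, c_level = fixity[central]
    for op in right_ops:
        assoc, level = fixity[op]
        stronger = level > c_level        # assumption: larger number = stronger
        same_and_infixr = (level == c_level
                           and assoc == "infixr" and c_assoc == "infixr")
        if not (stronger or same_and_infixr):
            return False
    return True
```

With += and -= both defined as infixr 5, a right valuelist containing -= is acceptable after a central +=. An operator of the same number that is infixl would not be.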

The fourth way is right-parenthesized infix notation, where only the right operand is parenthesized.

This notation also requires that every operator (if any) in the left valuelist satisfy at least one of the following properties:

  • has stronger fixity than the "central" operator
  • has the same fixity as the "central" operator and both are infixl

opcall4 = opcall4a | opcall4b;

opcall4a = value, __, op, __, { value, __, op, __ }, __, '(', __, valuelist, __, ')', __, scolon;

opcall4b = '(', __, valuelist, __, ')', __, scolon;

The final way is non-parenthesized infix notation, where operands are not parenthesized. This is essentially a valuelist followed by a semicolon.

The actual operator to be called is determined by the following steps:

  1. If there is no operator, nothing is called.
  2. Look for the operator(s) with the smallest fixity.
  3. If all the operator(s) with the smallest fixity are infixl, the leftmost one is called.
  4. If all the operator(s) with the smallest fixity are infixr, the rightmost one is called.
  5. If mixed, the statement is invalid.

opcall5 = valuelist, __, scolon;
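
The selection steps for non-parenthesized infix notation can be sketched as follows in Python. The sketch follows the five steps literally and assumes that the operators of the valuelist have already been extracted in source order; the function name select_operator and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Fixity = Tuple[str, int]   # ("infixl" or "infixr", fixity number)

def select_operator(ops: List[str], fixity: Dict[str, Fixity]) -> Optional[int]:
    """Return the position within ops of the operator to call, or None."""
    if not ops:                                            # step 1
        return None
    lowest = min(fixity[op][1] for op in ops)              # step 2
    candidates = [i for i, op in enumerate(ops) if fixity[op][1] == lowest]
    assocs = {fixity[ops[i]][0] for i in candidates}
    if assocs == {"infixl"}:                               # step 3: leftmost
        return candidates[0]
    if assocs == {"infixr"}:                               # step 4: rightmost
        return candidates[-1]
    raise ValueError("infixl and infixr mixed at the smallest fixity")  # step 5
```

The result is a position rather than an operator name, so that two occurrences of the same operator in one statement can be told apart.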

7. Pragmas

7-1. Pragmas

Pragmas are special statements used to give additional instructions to the compiler.

A compiler does not need to implement any pragmas, though unimplemented pragmas should be ignored.

pragma = '/*#', { allchar }, '#*/';


7-2. Line Start Pragmas

The LINE start pragma tells the compiler that the following lines come from another file. A compiler that implements this pragma should produce error messages according to that information.

Note that LINE start pragmas can be used recursively, that is, nested inside one another.

Example: /*# LINE start "stdcalc" #*/


7-3. Line End Pragmas

The LINE end pragma tells the compiler that the following lines no longer come from another file. A compiler that implements this pragma should produce error messages according to that information.

Example: /*# LINE end "stdcalc" #*/
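
Since LINE start pragmas can be nested, a compiler that implements them might keep a stack of file names, as in the Python sketch below. The sketch assumes the pragma text has exactly the shape of the two examples, and it treats a LINE end without a matching LINE start as an error, which this specification does not state. The names LINE_PRAGMA and open_files are hypothetical.

```python
import re
from typing import List

# matches /*# LINE start "name" #*/ and /*# LINE end "name" #*/
LINE_PRAGMA = re.compile(r'/\*#\s*LINE\s+(start|end)\s+"([^"]*)"\s*#\*/')

def open_files(source: str) -> List[str]:
    """Replay the LINE pragmas of source; return the files still open at its end."""
    stack: List[str] = []
    for kind, name in LINE_PRAGMA.findall(source):
        if kind == "start":
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            # assumption: an unmatched LINE end is reported as an error
            raise ValueError(f'LINE end "{name}" has no matching LINE start')
    return stack
```

The top of the stack at any point is the file to which an error message would be attributed.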


7-4. Memory Using Pragmas

When a MEMORY using pragma is followed by a call whose definition defines "local" variable(s), it tells the compiler to use the specified variable(s), all of which must be zero, instead of allocating new memory for the "local" variable(s).

Example:

void (+=)(char& a; char& b){
	char c2;
	while(b){
		a += 1; c2 += 1; b -= 1;
	}
	while(c2){
		b += 1; c2 -= 1;
	}
	delete c2;
}
char c,d,e; read(c);
/*# MEMORY using e #*/   d += c ; 

8. Full Lexical Structure

8-1. Program and Sentence

program = { sentence };

sentence = vardef | syndef | vardel | scolon | resblc | infixl | infixr | spaces | block2 | comment | pragma | funcdef | operdef | funccall | syncall | opcall1 | opcall2 | opcall3 | opcall4 | opcall5;


8-2. Characters

anychar = alphanumbar | opsymbol | bracs | future | white | scolon;

allchar = anychar | ? 0x5C ?;

opsymbol = '!' | '%' | '&' | '*' | '+' | ',' | '-' | '/' | ':' | '<' | '=' | '>' | '?' | '@' | '^' | '|' | '~';

bracs = '(' | ')' | '{' | '}' | '"' | "'";

scolon = ';';

white = ? 0x20 ? | nswhite;

nswhite = newline | ? 0x09 ?;

newline = ? 0x0A ? | ? 0x0D ?;

future = '#' | '$' | '.' | '`' | '[' | ']';

digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';

alphanumbar = alphabar | digit;

alphabar = upper | lower;

upper = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z';

lower = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | '_';


8-3. Tokens

__ = { none };

none = spaces | comment;

spaces = white, { white };

comment = ( '/*', { allchar }, '*/' ) - pragma;

ident = ( alphabar, { alphanumbar } ) - reserved;

reserved = 'char' | 'delete' | 'infixl' | 'infixr' | 'void' | 'constant' | 'const' | 'syntax' | 'block';

op = op2 | opsymbol, { opsymbol | white };

op2 = op3 | '<', spaces, op2, spaces, '>';

op3 = '<', spaces, '{', spaces, alphabar, { alphanumbar }, spaces, '}', spaces, '>';

num = uint1 | uint2;

uint1 = digit, { digit };

uint2 = "'", ( allchar - ( "'" | nswhite ) ), "'";

typebase = 'char';


8-4. Variables

vardef = typebase, none, __, singledef, __, { ',', __, singledef, __ }, __, scolon;

singledef = ident, __, [ '=', __, { ident, __, '=', __ }, num ];

vardel = 'delete', none, __, ident, __, scolon;

value = ident | num;

block2 = '{', { sentence }, '}';


8-5. Types

type = 'constant', none, __, typebase, none | 'const', none, __, typebase, none | typebase, __, '&';

typelist = type, __, ident, __, { op, __, type, __, ident, __ };

valuelist = value, __, { op, __, value, __ };

tailtypelist = { op, __, type, __, ident, __ };

tailvaluelist = { op, __, value, __ };


8-6. Functions, Operators and Block Syntaxes

funcdef = 'void', none, __, ident, __, '(', __, typelist, __, ')', __, blockornull;

blockornull = block2 | '=', __, '0', __, scolon;

infixl = 'infixl', none, __, num, __, '(', __, op, __, ')', __, scolon;

infixr = 'infixr', none, __, num, __, '(', __, op, __, ')', __, scolon;

operdef = 'void', __, '(', __, op, __, ')', __, '(', __, typelist, __, scolon, __, typelist, __, ')', __, blockornull;

resblc = 'block', __, scolon;

syndef = 'syntax', none, __, ident, __, '(', __, (typelist | tailtypelist), __, ')', __, '{', __, 'block', __, { scolon | __ }, '}', __, block2;

funccall = ident, __, '(', __, valuelist, __, ')', __, scolon;

syncall = ident, __, '(', __, ( valuelist | tailvaluelist ), __, ')', __, block2;

opcall1 = '(', __, op, __, ')', __, '(', __, valuelist, __, scolon, __, valuelist, __, ')', __, scolon;

opcall2 = '(', __, valuelist, __, ')', __, op, __, '(', __, valuelist, __, ')', __, scolon;

opcall3 = '(', __, valuelist, __, ')', __, op, __, valuelist, __, scolon;

opcall4 = opcall4a | opcall4b;

opcall4a = value, __, op, __, { value, __, op, __ }, __, '(', __, valuelist, __, ')', __, scolon;

opcall4b = '(', __, valuelist, __, ')', __, scolon;

opcall5 = valuelist, __, scolon;


8-7. Pragmas

pragma = '/*#', { allchar }, '#*/';

© 2014 hsjoihs