العربية

Design of Alusus Language

Open All   Close All

Introduction

Alusus Language is designed to be a comprehensive language that can be used to write any program regardless of the field, environment, or execution mechanism. This comprehensiveness required designing the language from a philosophical perspective rather than practical perspective that is often linked to a certain field or environment. This comprehensiveness also requires making the language expandable by the user or the community instead of limiting the development of the language to a central team. The design also needs to allow the programmer (the user) to reach and control the compiler itself. Also, expanding or modifying the language should be doable dynamically, i.e. without needing to rebuild the compiler.

Alusus' Definition of a Computer Program

Alusus Language's definition of a computer program is irrelevant of the field of this program or its execution environment. Alusus defines a computer program to be a set of statements; each statement consists of one or more subjects; each subject can be a literal, an identifier, an expression, a command, another statement, or a set of other statements. Based on this high level definition, a base grammar is created and is made dynamic and an ability was added to Alusus to create new grammar rules that inherits from the base grammar. This allows the language to be expandable without breaking the general guidelines and consistency of the grammar and without causing parsing ambiguity.

An Open, Flexible, and Decentralied Compilation System

Instead of creating a closed monolithic compiler that understands a predefined set of programming paradigms and a predefined execution mechanism, Alusus adopted a different design that allows the compilation to be done by a system that is open and modularized with a central component that manages the compilation process and provides the foundation on which the different components of the system is based on. This allows modifying or expanding the language and the compilation proces by replacing certain parts or adding new parts. The system also allows any programmer to reach its internal components and data entities in order to develop new compilation modules and that makes it possible for the community to develop the language rather than being limited to a small central group. It also allows developing different aspects of the language simultaneously by different teams. The following graph compares a traditional compilation method to the one adopted by Alusus:

Design Overview

Compilation System

Instead of depending on a monolithic compiler, Alusus uses a compilation system that is decentralized and modularized. The compilation system consists of:
  • Core: The central component of the system. The Core defines the base grammar, parses the provided source code, and manages the build libraries.
  • Build Libraries: Defines a specialized grammar using the base grammar and converts the parsed data into executable code. These libraries are linked dynamically to the Core which can load unspecified number of those libraries simultaneously. Loading the libraries is done through commands in the source code being compiled./li>
The following graph shows the flow of data from source code to executable code:

Build libraries are simply dynamic libraries that contain data types related to the grammar and the compilation process and they are loaded the same way regular libraries are loaded, i.e. with the `import` command inside the source code being compiled. This way each project can decide the language feature it needs without needing to configure the compiler in any way.
The Core contains a dynamic repository for grammar definitions that can be accessed by build libraries to add their own specialized grammar or build handlers. The Core also contains a generic definitions repository that can be used by build libraries to add their build results to make them available publicly to other build libraries or to the program being built.
The following graph shows the relations between the different components of the build system:

It's also possible to define additional grammar or build handlers inside the source code being built itself. In other words, the program being built can define its own grammar given that those definitions preceed their use in the source code.

Grammar of Alusus Language

Alusus' grammar has the following features:
  • Data Driven Grammar: It's possible to build grammar definitions that are controllable by variables at run time.
  • Dynamic Grammar: It's possible to add or modify new grammar during compilation.
  • Grammar Inheritance: It's possible to derive new grammar definitions from other definitions using grammar inheritance which allows the new grammar to inherit and override the properties of the parent grammar. This feature also allows building grammar templates.
  • Grammar Templates: Alusus' grammar contains predefined grammar templates that can be used by programmers (whether build library writers or not) to add their own grammar definitions that are consistent with the rest of the grammar.
  • Modular Grammar: Alusus allows the creation of grammar modules to group related grammar together and simplify grammar inheritance. For example, all grammar definitions related to expressions are grouped in one module which makes it easy to create specialized expressions by inheriting the expression module.
These grammar features enables the creation of generic base grammar definitions upon which the rest of the grammar is built, which guarantees the consistency of the grammar built by different independent teams. The Base Grammar in Alusus is simplified and generic and it directly matches Alusus' definition of a computer program:
  • Program: A set of statements.
  • Statement: Consists of one subject or a series of subjects.
  • Subject: Can either be a literal, an identifier, an expression, a command, a statement, or a set of statements.
  • Command: Consists of a keyword, followed optionally by a subject or a series of subjects.
  • Expression: Consists of a subject, or a hierarchy of subjects linked with operators.
In addition to the hierarchical structure above, Alusus' grammar contains Modifiers which are attachments that can be applied on any of the elements mentioned in the above list. Modifiers are used to add metadata to any part of the program.

Notice from the definitions above that the base grammar is not related in any way to the nature of the program or the environment of execution. It does not associate the language with a certain field, instead it leaves the language open to all fields of programming. The Core only understands a small set of specialized commands, among those is a command to load other libraries or source files (import command). When a build library is loaded, it feeds the Core with its own specialized grammar which is derived from the base grammar and it remains responsible for handling the data parsed with those specialized grammar definitions. The Core links the new grammar to those libraries and it calls them during parsing whenever it encounters that grammar. The Core can load an unspecified number of libraries and it remains responsible for coordinating between them.

Grammar and Parsing Techniques

In addition to the common techniques in writig grammars and parsers, Alusus uses the following techniques:

Using Data in Grammar Definitions

Grammar definitions can use data passed as parameters to the definition, or stored globally. The following example shows a command definition that keeps the keyword data driven:
SubCmd (kwd:string) : kwd Expression.
IfCommand : SubCmd("if") Statement.
WhileCommand : SubCmd("while") Statement.
In the upper example, the definition of `SubCmd` receives a string as a parameter and uses it as a literal in the definition. This deinition is then used to define two commands: If and While. This technique is not limited to using data as literals; it's also possible to use arrays and apply grammar operations on them. For example:
BinaryOperation (kwds:list[string]) : Operand (kwds[0] | kwds[1] | ...) Operand.
LogicalOperation : BinaryOperation(["and", "or", "xor"]).
MathOperation : BinaryOperation(["+", "-", "*", "/"]).
Usage of data in grammar definitions is open to all possibilities in a way similar to how data is used in programming languages. For example, it's possible to apply the elements of the array on a template and apply grammar operations on the result as in the following example:
BinaryOperation (kwds:list[string]) : Operand (Command(kwds[0]) Command(kwds[1]) ...).

Modularized Grammar

Grammar definitions can be grouped into modules in a way similar to object oriented programming. In the following example, definitions related to expressions are grouped into one module:
Expression : {
  Add (kwds=["+","-"]) : Multiply [(kwds[0] | kwds[1] | ...) Add].
  Multiply (kwds=["*","/"]) : Operand [(kwds[0] | kwds[1] | ...) Multiply].
  Operand : Identifier | Literal.
}.
It's also possible to define a module inside another module, and it's possible for definitions inside a module to refer to or be referenced by definitions outside the module.

Grammar Inheritance

In Alusus, grammar definitions can inherit from other definitions. As in object oriented programming, inheritance in the grammar copies the properties of a definition into the inheriting definition which can in turn override some of those properties. For example, if we have the following definition:
LogicalOperation (kwds=["and", "or"]) : Operand (kwds[0] | kwds[1] | ...) Operand.
then we can derive a new definition from it and add more keywords to the child (inheriting) definition:
MyLogicalOperation -> LogicalOperation (
  kwds = ["and", "or", "&&", "||"]
).
Inheritance is also possible with modules, so you can have one module inherits from another. In the case of modules, inheritance copies all elements of the parent module to the child module which in turn can replace some of those elements or add new elements. In the following example we define a module that inherits another module and replace one of its definitions:
MyExpression -> Expression {
  Operand : Identifier | Literal | "(" Add ")".
}.

Multi-Dimensional Parsing

Multi-dimensional parsing allows marking certain grammar productions to be parsed in parallel to the main parsing thread. On each step of the main parsing thread, the parser can jump into the parallel parsing thread and once it's done parsing the parallel thread it goes back to the same point where it left in the main parsing thread. The following figure shows how the operation works:

This technique is used to simplify the defintion of productions that can appear in many places across the grammar, instead of having to manually reference that production everywhere. The following example clarifies the benefit of this technique:
DefStatement : "def" Identifier ":" Identifier.
ParallelStatement : "@" Identifier.
With the definition of ParallelStatement as a parallel grammar, the following statements become all valid:
@myattribute def myvar : mytype;
def @myattribute myvar : mytype;
def myvar : @myattribute mytype;
Without multi-dimensional parsing, the defintion of DefStatement will have to be like this:
DefStatement : [ParallelStatement] "def" [ParallelStatement] Identifier ":" [ParallelStatement] Identifier.

Design Principles

There are some principles that were adopted during the design and implementation of Alusus, and it's required from Alusus developers and contiributors to adopt these principles while working on the Core or the libraries. In this list `programmer` refers to Alusus users, not developers:
  • Independence of Grammar from Context: Alusus grammar should remain independent from the context of the program. In other words, the parser should be able to parse the source code without needing to know what that code or its elements actually mean.
  • Direct Expression of Intent: The grammar should enable the programmer to follow a direct path towards his goal while writing his program. In other words, the programmer should be able to write and design his program according to its goals or features rather than the way it's going to be compiled.
  • Avoid Unneeded Syntax: For example, there is no need to force the use of brackets if the code can be parsed without them.
  • Consistency of Grammar and Design: We should keep consistency in the grammar and the libraries.
  • Rationality for Grammar Rather Than Habits: We don't necessarily need to follow what's common in progamming languages because the logical reasoning is more important than the beautiy of the code or the habits of the programmers.
  • There are no standards in syntax designs, but there is a standard for the syntax of math formulas. Therefore, mimicing math standards should be higher priority under the condition that it doesn't contradict with the rationality of the grammar. For example, functions in math are written using regular brackets therefore functions in Alusus should also be written using regular brackets.
  • Minimize Dependence on New Grammar: The more generic the grammar is, the less is the need for new grammar.
  • Orthogonality and Modular Design: Orthogonality and modular design should be targetted as much as possible.
  • Enabling the programmer to work on all levels starting from direct control of the hardware all the way to the highest programming level.
  • Limiting a Single Library to the Same Programming Level: When designing the standard libraries, mixing different programming levels inside the same library should be avoided as much as possible.
  • Support the Features at the Lowest Possible Level: The lower the level at which a feature is supported, the wider is its availability.
  • Enabling Centralized Control of Features: The openness of the language necessitates the ability to centrally control the availability of features to team members. For example, a team lead can prevent the use of pointers in a certain namespace, or he can limit the accessable libraries inside a certain namespace.
  • Avoid Making Decisions on Behalf of the Programmar: A programmer should know how the compilation system will treat his program. For example, it's not appropriate for a build library to decide the memory management model without allowing the programmer to control that decision.
  • Avoid Artificial Boundaries: For example, we should not prevent the programmer from using direct pointers in a certain context if such usage is possible. Depriving the programmer from a feature just because it can be misused is not acceptable.

Standard Libraries

Standard libraries are categorized into:
  • Build Libraries: A group of build libraries to support popular programming paradigms.
  • Runtime Libraries: A group of libraries containing functions and classes to support basic runtime functionalities like math or string operations.
Standard libraries focus on supporting language features and functionalities that are most common in various applications, and it leaves it to the community to implement the exceptional or less common features. Features that are to be supported by standard libraries are:
  • Programming Paradigms:
    • Object-Oriented Programming.
    • Procedural Programming.
    • Functional Programming.
    • Aspect-Oriented Programming.
    • Supporting multiple paradigms inside a single application.
  • Programming levels. The standard libraries will support three programming levels:
    • Lowest Level: This level supports basic language features like loops, conditional statements, variable definitions, pointers, etc.
    • Mid Level: This level provides low level libraries like math libraries, string libraries, system interaction libraries, etc.
    • High Level: This level supports various memory management models, parallel programming, database programming, etc.
  • All data types are treated like objects. For example, you can create a class that inherits from Integer.
  • Functions supports multiple inputs (arguments) as well as multiple outputs (multiple return values).
  • Spreading Operations: Write an operation once and apply it on multiple objects.
  • Class and function templates.
  • Parallel programming. The standard libraries are to support features to ease parallel programming:
    • Commands to simplify forking inside a single function, or call other functions parallely.
    • Automatic forking on loops marked for parallel execution.
    • Automatic forking on functions marked for parallel execution.
    • Automatic control of execution threads to guarantee maximum performance.
    • Automatic synchronization of resources marked for synchronous access.
    • Commands for communication between parallel threads.
  • Single Instruction Multiple Data.
  • Exception handling.
  • Modifiers: The possibility to define code units that can be applied on other code units to modify their features or functionalities. This is used in Aspect-Oriented Programming as well as other areas.
  • Dynamic objects. The libraries will also support using dynamic objects side-by-side with static objects.
  • Client-server development.
  • Compile-Time Execution: This allows executing code during compilation to control the compilation process and its results.
  • Optionally provide run time type information.
  • Ability to choose between interpretation, compilation to intermediate language, or compilation to machine code, with the ability to mix between the different methods and the ability to choose the target platform of the compilation.

Overview of the Syntax

Following are samples of the grammar defined in standard libraries. This is only an overview; it doesn't contain all the details of the grammar.

Expressions

Expressions consists of subjects linked with operators in a way similar to popular programming languages. The following is a list of the important operators: NOT operator: !
OR operator: |
XOR operator: $
AND operator: &
Math operators: +، -، *، /
Bitwise operators: &، |، $، !
Logical operators: &&، ||، $$، !!
Comparison operators: <، >، =>، =<، =
Assignment operator: =
Other Assignment operators: +=، -=، *=، /=، |=، &=، $=
Lists are separated by commas. For example: a,b,c
Grouping subjects is done using regular brackets: ()

Loops

 For: "for" Initial_Expression "," Condition_Expression "," Update_Expression (Statement|Block).
 While: "while" Expression (Statement|Block).
 Do-While: "do" (Statement|Block) "while" Expression.

Conditional Statements

 "if" Expression (Statement|Block) ["else" (Statement|Block)].

Definitions

All definitions in the language is done using the `def` command including variable definitions, constant definitions, function definitions, class definitions, etc. The general syntax is as follows:
"def" name ":" body.
`body` can be a function, a class, a namespace, a datatype, etc., as in the following:
Variable Definition:
"def" name ":" type.
Constant Definition:
"def" name ":" @const type.
Function Definition:
"def" name ":" "function" "(" Input_List ")" "=>" "(" Output_List ")" Block.
Class Definition:
"def" name ":" "type" [Inheritance_Specifier] Block.
Module Definition:
"def" name ":" "module" Block.
The `def` command is also used in other definitions like arrays and pointers as explained below.

Arrays

Arrays are defined using the `def` command as follows:
"def" name ":" "array" "[" type, number "]".
And to dynamically specify the array size the following syntax is used:
 "def" name ":" "array" "[" type "]" "(" number ")".
 "def" name "=" "array" "[" type "]" "~new" "(" number ")".
Array Usage:
name "(" number ")".

Pointers

Pointers are defined using `def` as follows:
"def" name ":" "ptr" "[" type "]".
To access the location pointed by a pointer the `~cnt` operator is used:
name "~cnt".
To get the location of a variable the `~ptr` operator is used after the variable's name:
name "~ptr".

Mixing Definitions

It's possible to mix between definition types using `def`. For example, you can define a pointer to an array, or a pointer to a function, or an array of pointers, etc. The following example shows how to define an array of pointers to functions:
"def" name ":" "array" "[" "ptr" "[" "function" "(" Params ")" "]" "]".

Merging Definitions

Definitions can be merged with an existing definition using the def command by adding the @merge modifier, as follows:
"@merge" "def" name ":" "module" "{" Definitions "}".
"@merge" "def" name ":" "{" Definitions "}".

Modifiers

Modifiers can appear almost anywhere in the program and not necessarily at the beginning of a statement. Modifiers have the following syntax:
"@" name [ Expression ].

Regular Brackets and Square Brackets

Regular brackets are used for runtime operations like grouping subjects in an expression or passing arguments in function calls. On the other hand square brackets are used for compile-time operations like defining the type of a pointer or an array. In other words, if the info is to be sent to the compiler itself the square brackets are used, otherwise regular brackets are used.

Curly Brackets {}

Curly brackets are used to group multiple statements into a block. These blocks are used in conditional statements for example or in bodies of functions, classes, or namespaces.
 Block: "{" [ Statement_List ] "}".
 Statement_List: Statement { ";" [Statement] }.

Seperating Statements

Semicolons are used to separate statements in a way similar to the usage of comma to separate elements of a list. In other words, the semicolon itself is not part of the statement and it can be ignored if no other statement follows it.

Rationale Behind Some Syntax Decisions

  • Function Brackets: Regular brackets have always been used for functions in math, so Alusus chose to follow suit and use them for functions.
  • Command Arguments: Command arguments (like expressions of conditional statements) do not include brackets because parsing can be done without them, therefore adding them is meaningless.
  • Some programmers prefer not to use statement separators (;) but if this is correct then why do we have statement separators in human languages?
  • Access control keywords like `public` and `private` are treated as modifiers (starting with @) because they only carry metadata that are used by the compilation system and they do not affect the way the program is executed.
  • The keyword `def` is used to define functions and classes in order to put the entire definition on the side that follows the colon instead of having the name of the function or the class in the middle of the definition. This makes it easier to understand the definition especially with complex definitions like pointers to functions or arrays of pointers to functions.
  • Regular brackets are used to access array members instead of square brackets because the latter is used for compile time arguments. For the same reason regular brackets are used to dynamically specify array sizes while square brackets are used to define array types which is a compile time info.

Examples

Some of these examples are implemented already like the Hello World and the factorial examples, while other examples will be implemented in upcoming releases like class definition examples.
import "Srl/Console.alusus";

def HelloWorld : module
{
  def main : function
  {
    Srl.Console.print("Hello World!");
  }
}

import "Srl/Console.alusus";

def Factorial : module
{
  def factorial : function (i:int)=>(int)
  {
    if i == 0 return 1 else return i * factorial(i-1)
  };

  def main : function
  {
    Srl.Console.print("Factorial of %d is %d", 5, factorial(5));
  }
}

import "Srl/Console.alusus";

def While : module
{
  def main : function
  {
    def i : int = 1;
    while i < 10 {
      Srl.Coneole.print("%d\n", i);
      ++i;
    };
  }
}

import "Srl/Console.alusus";

def Arrays : module
{
  def main : function
  {
    def a : array[int](10);
    for i:int=0, i<10, ++i a(i) = i;
    for i:int=0, i<10, ++i Srl.Console.print("%d\n", a(i));
  }
}

import "Srl/Console.alusus";

def Pointers : module
{
  def main : function
  {
    def p : ptr[int], i : int;
    i = 10;
    p = i~ptr;
    print("%d\n", p);     // prints the address of i.
    print("%d\n", p~cnt); // prints the value of i.
  }
}

import "Srl/Console.alusus";

def Casting : module
{
  def main : function
  {
    def i : int, f : float;
    i = 10;
    // Treat the bits of i as a floating point number by doing this:
    // Get the pointer of i,
    // then cast the pointer to a pointer of float,
    // then take the contents of that float pointer.
    f = i~ptr~cast[ptr[float]]~cnt;
    Srl.Console.print("%f\n", f);
  }
}

import "Srl.alusus";

def ClassDefinition : module
{
  def BaseClass : type
  {
    @private def i : int;
    @private def j : int;
    @public def constructor : function (a:int, b:int)
    {
      this.i = a;
      this.j = b;
    };
    @public def getI : function ()=>(int) { return this.i };
    @public def getJ : function ()=>(int) { return this.j }
  };

  def ChildClass : type inherits BaseClass
  {
    @private def k : int;
    @public def constructor : function (a:int, b:int, c:int)
    {
      parent.constructor(a, b);
      this.k = c
    };
    @public def getK : function ()=>(int) { return this.j }
  }
}

Copyright (C) 2018 Sarmad Khalid Abdullah.
This file is released under Alusus Public License, Version 1.0. For details on usage and copying conditions read the full license in the accompanying license file or at https://alusus.org/alusus_license_1_0.