This is the main repository/project folder of Jux.
Jux is a simple stack-based concatenative functional programming language. Jux has two intertwined goals:
- Be a very simple basic language that is as easy as possible to implement on any target platform.
- Be an easy target for other languages to compile to, not sacrificing features along the way.
To be able to live up to both of these goals, Jux defines a very small set of literals and functions that have to be defined by an implementation, but has a much larger standard library with 'fallback' implementations that are used by default. Target implementations can be made more efficient by building native versions of more and more of these standard library functions.
The same holds true for data types: The literal data types a target has to implement are int
, string
, quotation
(linked lists), symbol
(a name with a constant integer value). The other supported data types (tuple
(pairs, triples, etc; fixed-size arrays of values), dict
(an associative key->value store ) and tree
are by default implemented on top of lists.)
Jux is a functional, minimalistic, concatenative, homoiconic, nominally-typed programming language:
- functional: Everything in Jux is a function, taking the current environment as input, and returning a changed environment as output.
- minimalistic: Jux designed to be as simple as possible to implement. There are only +- 18 primitive instructions(most of which are trivial to implement), two literal types(integers and linked lists called 'quotations') and the rest of the interpreter/compiler is completely straightforward. Jux works without needing any behind-the-scenes data manipulation (i.e. no garbage collection necessary).
- concatenative: To combine multiple functions, the result is passed left-to-right.
- homoiconic: Jux's source maps neatly unto what happens on the stack. There also is no difference between a snippet of souce code and a linked list containing some other type of data.
- nominally typed: All values in Jux have a type, which basically is a 'label' which can be queried and certain low-level operations might alter. The advantage is that a certain implementation might decide to alter one of Jux's types from a quotation-based implementation to a lower-language literal (which is more efficient) without breaking anything.
Jux is inspired by the concatenative languages:
- Joy(WebArchive link; original website is down)
- Cat(WebArchive link; original website is down).
- PostScript
Furthermore, Jux is influenced by:
- LISPs like Common LISP, Scheme and Clojure.
- Elixir.
- The minimalistic Nock language that is part of Urbit.
-
Old Ruby implementation prototype: 100%; (not in this repository)
-
Flesh out this Readme: 85%;
-
Elixir implementation version: 0.5.85;
-
New Ruby implementation: 65%;
-
Think about custom function definitions: Syntax:
"name" "documentation" [implementation] def
-
Think about fallback rewrite function implementations:
- -> A Fallback rewrite can contain its own name, as long as this happens inside a quotation, so we can expand a rewrite rule all at once without creating an infinite loop. These kinds of recursive rewrites with the same name used inside a quotation are however very useful sometimes.
- All of these functions need documentation just like normal functions.
- Have definitions of all functions, including primitive ones.
- primitive function definitions contain a quotation with
__PRIMITIVE__
inside, which itself is a defined no-op, but handled by the evaluator to raise an error when encountered during execution.
-
Think about efficiency:
- Fully expand fallback rewrites, so when a fallback definition is found, only a single rewrite step is necessary.
- Custom rewrite rules to be applied before and after expanding that increase efficiency; Things like:
- How do we make them? When/how do we run them?
- examples:
reverse length ==> length
dup pop ==>
dup swap ==> dup
[] i ==>
0 + ==>
- see the Mathematical foundations of Joy for more examples of rewrite rules in concatenative languages
-
Think about core syntax:
- Allow improper lists or not?
- Allow anything (not starting with
/
, not containing[
or]
or whitespace) as an identifier. Allow escaped string-identifiers with custom characters in there as/"something with spaces"
- good idea or not?
- How does a nominal type system work in practice? What are its problems?
Minimize core language:
Literals:
- Integers (both positive and negative (?))
- Quotations
- Strings (Default implementation maps the string format to a quotation with the String type filled with integers )
Currently 18 primitive operations: (These are very trivial to implement)
dup, pop, swap
Stack manipulation.dip
Combinators.cons, uncons
Quotation manipulation.add
Integer arithmetic.ifte
Conditionals.eq?
Equality.compare
Comparisons of ordering.nand
Boolean operations.bnand
Bitwise operations.identifier_to_string
Convert an identifier to string form.string_to_identifier
Convert a string to identifier form.callable?
def
,redef
: Define a new identifier-implementation. TODO: Maybe a way to emulate one of them?print
Output to STDOUT. TODO: Maybe combine with other STDIO-stuff?
Probably going to add:
- Some way to listen to STDIN.
- Some way to access the file system.
- Some way to talk to external programs?
- Some way to crash with an error message.
Here is in greater detail what certain functions are used for:
- core stack manipulation:
dup pop swap
. - core stack manipulation combined with
dip
is enough to define the following combinators:i
(interpret)b
(interpret lower, then upper)k
(interpret, then put source of interpretation on top)m
w
- comparisons done by primitive functions
compare
andeq?
, which are enough foreq?, neq? lt?, lte?, gt?, gte?, zero?, one?, empty?
. Together with math used foreven?, odd?
. ifte
to allow conditionals.- together with above enough for many recursive definitions, as seen below.
- boolean operations: based on NAND-logic, so with only
nand
we can definenand, not, and, or, nor, xor
.- because of the truthiness nature of these operations, defining
true
andfalse
is not necessary, as they are trivially[] not not
and[] not
, respectively.
- because of the truthiness nature of these operations, defining
- bitwise operations: In a similar sense, with only
bnand
we can definebnand, bnot, band, bor, bnor, bxor
. sub
can be implemented using two's complement (built onbnot
) withadd
.- O(n) integer multiplication/division/modulo built on
add
/sub
with recursion.- TODO:
pow, isqrt, gcd, lcm
. - maybe something similar can be done with bitwise operators?
- TODO:
cons
anduncons
, together with comparisons,ifte
anddip
is enough for recursion through lists, which allows:foldl, foldr, append, reverse_append, backwards_append, map, length, any?, all?, filter, reject, sum, product
- recursion also allows nice definitions for functions like
factorial
,triangular
, etc. - Because the default strings are linked lists of integers, we can easily create a
to_string
for integers, lists and strings.identifier_to_string
is used to mix and match identifiers in there as well.
- Create a way to use rewrite rules to enhance efficiency; things like
reverse length === length
- Self-hosting.
- Multiple implementations.
- Explore data types, and how to 'fake' more advanced data types when you only have integers and quotations.
- Explore differences between string-as-list and string-as-byte_arr for implementation/efficiency.
- Explore the advantages/drawbacks of a bytecode variant of Jux.
- Explore (dependently?) statical typing and if/how it might work with a concatenative language.
A bare-bones (also called a level-0) implementation has to contain at least:
- The literals:
Integer
([+-]?\d+
), an integer of at least 32-bits precision. If you can use Bignums, that is a plus, as Jux does not check for overflow. If your target platform cannot use at least 32-bits precision, it will be a reduced level -1 implementation.- TODO (Possibly going to be removed)
Float
([+-]?\d+.\d+
), a float of at least 24-bits precision, preferably following the IEEE 754 standard. If your target does not support floating-point arithmetic, it will be a reduced level -1 implementation. String
(".*"
, escaping is possible with\"
.). Contains all characters between the quotes as string. Is converted to a Quotation.Identifier
([a-z][\w]*[?!]?
). These are the names of operations we want to perform. An identifier either refers to one of the built-in primitive functions, or a custom function implementation that was defined earlier in the program or in the standard library. There are many cases in which we pass the identifiers around, rather than calling them as functions right away.
In actuality, Integers and Strings can also be considered functions, that take zero parameters as input and push their inner value as output. In most implementations, it makes most sense to just take the literal and copy it to the stack when applicable, though.
Quotation
. (\[.*\]
, can be nested; contents have to be a proper Jux expression. ) Quotations are linked lists of arbitrary items. In a program, they can be manipulated by adding more elements to it or removing elements. Quotations are also used for metaprogramming and control flow: By putting functions (remember, literals are in fact also functions) inside them, one can pass one or multiple functions around without evaluating them right away.
A compiler/interpreter needs to:
- Take a program as string as input.
- Parsing the above literals to an internal in-memory representation called the function queue, filtering any whitespace, newlines and comments.
- To now evaluate the program:
- Initialize the empty memory stack.
- while the function queue is not empty:
- Take the function at the beginning of the function queue, and call it with the stack as input.
- (in practice it makes most sense to simply 'copy over' Integers, Strings and Quotations from the beginning of the function queue to the top of the stack, only really calling Functions)
- If it is a function, try to use custom implementation, if it exists. If it doesn't, fall back to one of the fallback rewrite rules that alter the function queue.
- Replace the current stack with the stack that the function returned as output.
- Take the function at the beginning of the function queue, and call it with the stack as input.
pop
: Removes the top value of the stack. Throws an error if there is nothing in the stack.dup
: Duplicates the top value on the stack. Throws an error if there is nothing in the stack.swap
: Swaps the top item with the item below it. Throws an error if there are less than two items on the stack.
The standard library extends on this with multiple variations of swapping, duplicating etc.
dip
: Take the quotation at the top of the stack. Temporarily store and remove the value right below it elsewhere. Execute the quotation on the rest of the stack. Afterwards, put the stored value back on top.
The standard library extends on this with:
i
: Interpret/evaluate the quotation on top of the stack.b
: Take the two quotations on top of the stack, and interpret/evaluate them in reverse order.- TODO
w
- TODO
k
- TODO
c
- TODO
y
: Useful to make recursive definitions.
compare
: Pops the top two items,a
andb
(a
being the item originally on top). Puts-1
on top of the stack ifb
is smaller thana
,0
if they are equal, and1
ifa
is larger thanb
.
Note that all Jux values are comparable with each other; They form a monoid. This is useful to easily create sorting functions and associative lookups without extra overhead or edge cases.
Comparisons have the following order:
Integer/Float < Boolean < Function < String < Quotation
- An Integer/Float is compared by its numerical value.
- Comparison between Functions and comparisons between Strings is done by using 'dictionary order'.
- Quotations are compared element-wise; if the first elements are the same, the second elements are considered, etc. A shorter quotation that has the same prefix as a longer quotation is smaller.
The standard library extends on this with:
eq
: Pops the top two items. Putstrue
if top two items are equal, otherwisefalse
.neq
: Pops the top two items. Putsfalse
if top two items are equal, otherwisetrue
.lt
,gt
,lte
,gte
,zero?
,one?
empty?
, which is true if the element on top is an empty list.
ifte
: Takes three quotatios from the top of the stack: The bottommost is the condition that is checked. After checking the condition, the stack is returned to its original state. If the result of this condition is anything other thanfalse
, the middle quotation (the then-part) is interpreted/evaluated. Otherwise, the topmost quotation (the else-part) is interpreted/evaluated.
add
: Pops the top two elementsa
andb
(a
being the item originally on top) and pushes the result of performing integer additionb + a
.sub
: Pops the top two elementsa
andb
(a
being the item originally on top) and pushes the result of performing integer additionb - a
.
The standard library extends on this with:
inc
dec
- TODO
mul
- TODO
pow
- TODO
div
- TODO
gcd
- TODO
lcm
- TODO
fact
- TODO
fib
- TODO
isqrt
Maybe more?
true
: Should push a value representing booleantrue
to the stack.false
: Should push a value representing booleanfalse
to the stack.
not
: Pops top of the stack. Pushestrue
if the top of the stack wasfalse
,false
otherwise.or
: Pops the top two elementsa
andb
(a
being the item originally on top) Pushesfalse
if botha
andb
arefalse
,true
otherwise.and
: Pops the top two elementsa
andb
(a
being the item originally on top) Pushesfalse
if at least one ofa
andb
arefalse
,true
otherwise.
As can be seen, anything except false
is considered truthy in Jux.
The standard library extends on this with:
xor
: Pops the top two elementsa
andb
(a
being the item originally on top) Pushesfalse
if exactly one ofa
andb
isfalse
,true
otherwise.
bnand
The standard library extends on this with:
bnot
: Pops top of the stack. Pushes the bitwise complement of that Integer.bor
: Pops the top two elementsa
andb
(a
being the item originally on top). Pushes the bitwise or of these two Integers.band
: Pops the top two elementsa
andb
(a
being the item originally on top). Pushes the bitwise and of these two Integers.bxor
: Pops the top two elementsa
andb
(a
being the item originally on top). Pushes the bitwise xor of these two Integers.
cons
: Pops the top of the stacka
and the quotationq
just below it, and returns a new quotationq2
wherea
is the final item inq
.uncons
: Pops the top of the stackq2
, and extracts the final itema
. Pushesq
which isq2
without this item. Then pushesa
.
The standard library extends on this with:
reverse
: Reverses a listfoldr
: Folds a list, starting at the rightmost end.append
: Concatenates two quotations into one.length
: Returns the number of items in a quotation.map
: Maps a quotationq
over each of the elements in the listl
, returning a new list.sum
: Calculates the arithmetic sum of a list of integers.- TODO
product
: Calculates the arithmetic product of a list of integers. foldl
: Given a quotationq
, a starting accumulatoracc
, and a quotation-listl
to perform on:- Pushes
acc
. - Pushes the first (leftmost) value in
l
- Evaluates
q
- Pushes the next value in
l
. - Evaluates
q
- etc, until the list is empty.
- Pushes
identifier_to_string
The standard library extends on this with:
integer_to_string
quotation_to_string
- the more generic
to_string
that dispatches to one of the above.
print
: Prints the string on top of the stack to STDOUT.
The standard library extends on this with:
puts
: Prints the string on top of the stack to STDOUT, followed by a newline.
TODO
TODO
Whitespace (including tabs and newlines) is ignored, but used to separate below expression types, as well as to improve human readability.
Single-line comments start with #
and continue until the end of the line.
The following literal values are accepted anywhere in the program:
- integers
- floats
- strings, delimited with
"
. These should escape\n
,\t
, etc. Quotes themselves can be escaped with\"
. - quotations. These are lists, delimited with
[
and]
. The contents of a quotation should be parsed, but immediately evaluated or expanded. Quotations can be nested.[]
is an empty quotation.
Identifiers, which are names of functions. The first character of an identifier has to be alphabetic or an underscore. All other characters can be alphanumeric, underscores or .
. Identifiers are allowed to end on ?
or !
.
An example of above rules in practice:
# This is a comment
1 2 3.0 "foo" dup swap [4 5 pop "bar"] # This is another comment
i pop [42]
To define a function, the built-in def
function can be used: "functionname" "documentation" [function implementation here] def
. For clarity, this is usually written as:
"xor"
"
Calculates the boolean XOR of the top two arguments.
"
[
dup2
or # get 'or' of top two arguments
[and not] dip # get 'nand' of bottom two arguments'
and # only true if both of the above are true
]
def
As can be seen, no special syntax is necessary to parse this.
- Heredoc multiline strings?
- Do/don't have multiline comments?
Idea: Data Types built on Records:
-
New syntax:
/foo
===[foo] uncons popd
. Puts a single identifier on the stack without evaluating it. Not required, but a lot nicer for below explanations. Could also be used for function names in normaldef
statements. -
Namespaces are used to keep the language explicit and to prevent name clashes. To keep the language simple, only conventions enforce the use of namespaces.
redef
ining a function in a namespace you're not working on is possible (and required for some inner workings of below things), but greatly frowned upon.- If there is a way to enforce this on language-level, or have certain implementations enforce this once they are mature enough, I would love to know.
-
CDTs are 'just' quotations, where the header is an identifier; the Record's Type.
-
This identifier is implemented as a function which constructs the CDT itself. (so
2 0 Point
==>[2 0 Point]
, and[2 0 Point] i
===[2 0 Point]
) -
A function (
Typename.__ancestors__
e.g.Point.__ancestors__
) that returns a list of all the (super)types a CDT implements.- implemented using
to_string
,string_append
andto_identifier
. - This function's definition is updated every time this type is listed as being part of a supertype.
- take old function result (list of supertypes)
- add new supertype to end
- remove duplicates from result
- use this new list in new definition. (
redef
)
- implemented using
-
Allow the definition of dispatching multimethods, which take the type of the input value, and then attempt to dispatch to an implementation of the desired function name.
- Check for implementations using
callable?
on each of the supertype function implementations, e.g.:Circle.Protocol.radius callable?
, then call the first one that exists.
- Check for implementations using
-
Allow similar multimethods for functions with multiple inputs:
- Protocol for custom Comparisons of CDTs with each other; Only needs a single implementation for two-way comparisons.
New required functions for this stuff:
callable?
-> True if identifier is implemented as function.to_string
/to_identifier
-> turns anything to a string, turns a string to an identifier (as long as it is a valid identifier!)redef
-> defines a function, possibly overriding an already-existing definition.- maybe
def
could be defined in terms ofredef
andcallable
?
- maybe
System overview:
A Jux implementation environment consists of two parts:
This is a linked-list esque structure (the head of the stack should be available in O(1), the other end of the stack might be available in O(n)). All data that is being calculated arrives on the stack.
This hash table stores the implementations that certain identifiers have. In other words, it gives certain identifiers a 'meaning'.
This HashTable is altered by the primitive def
, redef
or undef
operations. Furthermore, the callable?
primitive checks if a certain identifier is a key in the hash table.
HashTable Implementations are used both to define ordinary functions, as well as auxillary information about e.g. data types;
Data stored here is not garbage collected. It stays in here until explicitly undef
ined.
This is a queue filled with the not-yet consumed operations that will be executed on the stack. An interpreter usually first parses a source code file, and reads it into this in-memory list.
For some implementations, it might make most sense to first read the source file in a linked list (where the head is the lastly-read operation), and then reverse this list before starting execution.
The head of the Function Queue is changed when:
- An identifier with a non-native implementation is encountered. At this time, the identifiers making up its implementation are pushed to the front of the function queue (also known as 'unshifting').
- A quotation is passed in a Combinator: This unshifts the contents of this quotation to the front of the Function Queue (so that the tail-most part of the quotation is the new front of the queue, and the element originally at the head of the quotation is the element just on top of the original next function).
The Function Queue is an implementation 'artefact' and not considered to really be part of the current environment. Rather, it is a list of future steps to change the current environment by. There is no way for any Jux function to directly access any part of the Function Queue. The only thing that might happen, is for a quotation
Most Jux functions/operations are purely functional, as they take the current environment (Stack + HashTable), and return a new environment (Stack+HashTable).
The only exceptions to this are functions that deal with input/output:
print
- STDIO-stuff that hasn't been created yet.
-
fopen
opens a stream to the file at the given filename. -
fclose
closes an open stream. (Is this necessary? Should we maybe close as soon as stream handle is popped from stack?) -
fgetc
gets a single character from the stream. This is either an integer withe labeltypeInteger
or it is a0
with labeltypeEOF
if the end of the file was reached. -
fputc
puts a single character (byte) to the stream. -
fseek
moves the file position indicator to a specific point (in bytes) in the file. -
ftell
gets the current file position (in bytes). -
stdin
puts the STDIN stream handle on top of the stack. -
stdout
puts the STDOUT stream handle on top of the stack. -
stderr
puts the STDERR stream handle on top of the stack. (is this one absolutely necessary?)
TODO: File system names (ls, cd, etc).
A Jux Token is either a single identifier, or a pair of two jux tokens.
a 'simple' identifier:
- (identifier, type) where integers and [],(the empty quotation) are also identifiers.
or:
((a, b), type), where a and b are both a Jux Token. (a, b)
might also be written as [a | b]
.
This second definition lets us create more advanced data structures, such as quotations and trees. Many definitions abstract the second definition, to make it more fast, saying 'a token is either a single token with a type or a (possibly improper) linked list of tokens with a type'.
Strings are a list of Integers. They are only treated differently from normal quotations during output. (As well as there existing a parser shorthand to create a string by delimiting it with "..."
)
Depending on the type
label a structure has, values might be treated differently. This mostly is important when supplying custom implementations for certain kinds of structures. If, for instance, you would like to use native strings
A Jux Token is either an Atom, or a list of Jux Tokens.
A Jux Atom is either an Integer (positive or negative whole number), a String (any characters delimited with "
), an Identifier (a name that might or might not have an implemetation) or an Escaped Identifier (starts with a /
, is an identifier that is to be unescaped on copying over from function queue to stack.)