docs.html

<!DOCTYPE html>
<html>
<head>
    <title>Lonang 8086 Compiler</title>
    <style>
    body {
        background: #eee;
    }
    #main {
        background: #fff;
        width: 800px;
        margin: 40px auto;
        padding: 20px;
        border-radius: 2px;
    }
    pre {
        background: #222;
        color: #fff;
        font-family: monospace;
        padding: 8px 24px;
    }
    code {
        padding: 1px 4px;
        margin: 1px 0;
        font-size: 90%;
        color: #b32;
        background-color: #fee;
        border-radius: 2px;
    }
    #goup {
        position: fixed;
        right: 40px;
        bottom: 40px;
        text-align: center;
        opacity: 0.4;
        transition: opacity 400ms;
    }
    #goup:hover {
        opacity: 1;
    }
    .arrow-up {
        width: 0; 
        height: 0; 
        border-left: 18px solid transparent;
        border-right: 18px solid transparent;
        border-bottom: 24px solid #448;
    }
    table {
        border-collapse: collapse;
    }
    table, td, th {
        border: 1px solid #ddd;
    }
    td {
        padding: 4px 4px;
    }
    </style>
    <meta charset="utf-8" />
</head>
<body>
<a id="goup" href="#index">
    <div class="arrow-up" title="Go up"></div>
</a>
<div id="main">
    <h1>Lonang 8086 Compiler</h1>
    <p>This document describes the basic usage of the <b>Lonang 8086</b>.
    This language is a C++ like language which is used to generate
    <i>8086 assembly</i> executable code.</p>

    <h2 id="index">Index</h2>
    <ul>
        <li><a href="#general">General</a></li>
        <li>
            <a href="#variables">Variables</a>
            <ul>
                <li><a href="#registers">Registers</li>
                <li><a href="#vectors">Vectors</li>
            </ul>
        </li>
        <li><a href="#comments">Comments</a></li>
        <li>
            <a href="#control-flow">Control flow</a>
            <ul>
                <li><a href="#if-else-statements"><code>if-else</code> statements</a></li>
                <li><a href="#repeat-loop"><code>repeat</code> loop</a></li>
                <li><a href="#while-loop"><code>while</code> loop</a></li>
                <li><a href="#break-statement"><code>break</code> statement</a></li>
            </ul>
        </li>
        <li><a href="#functions">Functions</a></li>
        <li>
            <a href="#builtin-functions">Built-in functions</a>
            <ul>
                <li><a href="#divmod"><code>divmod</code> function</a></li>
            </ul>
        </li>
        <li><a href="#basic-io">Basic IO</a></li>
            <ul>
                <li><a href="#printf"><code>printf</code> function</a></li>
                <li><a href="#putchar"><code>putchar</code> function</a></li>
                <li><a href="#getchar"><code>getchar</code> function</a></li>
                <li><a href="#setcursor"><code>setcursor</code> function</a></li>
                <li><a href="#cls"><code>cls</code> function</a></li>
            </ul>
        </li>
        <li><a href="#compiling">Compiling</a></li>
        <li><a href="#debugging">Debugging</a></li>
    </ul>

    <h2 id="general">General</h2>
    <p>Only <b>one</b> statement per line is allowed, since the statement
    separator used is the new line character.</p>

    <p>Once you have one or more <i>variables</i> at your disposal, you can
    perform basic operations such as:</p>
<pre>
; Add 'cx' to 'dx'
dx += cx

; Subtract 'bx' from 'ax'
ax -= bx

; Bitwise operations (they work on the bits level)
ax |= bx  ; OR
dx ^= cx  ; XOR
ax &= bx  ; AND
</pre>

    <p>Internally, numbers are represented in base 2, for instance, <i>5</i> is
    <i>101</i> (<i>= 1×2<sup>2</sup> + 0×2<sup>1</sup> + 1×2<sup>0</sup></i>).
    The bitwise operations allow to work with them on this <i>1</i>'s and
    <i>0</i>'s level. The following table represent what each bitwise operation
    would do (first two are inputs, third one is output):</p>

    <table><tr>
        <td><table>
            <tr><td colspan=3>OR</td></tr>
            <tr><td>0</td><td>0</td><td><b>= 0</b></td></tr>
            <tr><td>0</td><td>1</td><td><b>= 1</b></td></tr>
            <tr><td>1</td><td>0</td><td><b>= 1</b></td></tr>
            <tr><td>1</td><td>1</td><td><b>= 1</b></td></tr>
        </table></td>
        <td><table>
            <tr><td colspan=3>XOR</td></tr>
            <tr><td>0</td><td>0</td><td><b>= 0</b></td></tr>
            <tr><td>0</td><td>1</td><td><b>= 1</b></td></tr>
            <tr><td>1</td><td>0</td><td><b>= 1</b></td></tr>
            <tr><td>1</td><td>1</td><td><b>= 0</b></td></tr>
        </table></td>
        <td><table>
            <tr><td colspan=3>AND</td></tr>
            <tr><td>0</td><td>0</td><td><b>= 0</b></td></tr>
            <tr><td>0</td><td>1</td><td><b>= 0</b></td></tr>
            <tr><td>1</td><td>0</td><td><b>= 0</b></td></tr>
            <tr><td>1</td><td>1</td><td><b>= 1</b></td></tr>
        </table></td>
    </tr></table>

    <p><b>Multiplication</b> is slightly special, and may be more expensive in
    certain cases. As a general rule of thumb, the destination should be either
    <i>ax</i> or <i>al</i> and the second product either <i>dx</i> or <i>ah</i>.
    The type of multiplication used, either 8 or 16 bits, will be determined by
    the <b>largest</b> size of the operands, either the destination or the
    source. For instance, <code>al *= 7</code> will use 8 bits mode since 7 can
    be represented with 8 bits. But <code>al *= 260</code> will perform 16 bits
    multiplication, and discard the bits that don't fit on <i>al</i>.</p>

    <p>In the same way, both <b>division</b> <i>and</i> the modulo operator
    are specal too. As a general rule of thumb, the destination should be either
    <i>ax</i> or <i>al</i> for division, and <i>dx</i> or <i>ah</i> for modulo.s
    The type of division used, either 8 or 16 bits, will be determined by
    the <b>largest</b> size of the operands, either the destination or the
    source. For instance, <code>al /= 3</code> will use 8 bits mode since 3 can
    be represented with 8 bits. But <code>al /= 277</code> will perform 16 bits
    division, and discard the bits that don't fit on <i>al</i>.</p>

    <p>When possible, you should <b>always</b> use an register instead of an
    inmediate value, specially important on loops, <i>unless</i> using a
    <b>power of two</b>. In this case, an inmediate value is better, and
    even preferred.</p>

    <h2 id="variables">Variables</h2>
    <p>Variables can be defined anywhere in the code, although it is
    recommended to define them at the <i>top</i> of the file. Currently
    the types supported are:</p>
    <ul>
        <li><code>byte</code>: Defines a 8-bit integer value.</li>
        <li><code>short</code>: Defines a 16-bit integer value.</li>
        <li><code>string</code>: Defines a string variable (sequence of bytes).</li>
        <li><code>const</code>: Defines a constant value, replaced on compile time.</li>
    </ul>
    <p>The <code>string</code> type <b>must</b> be ASCII values, and escape
    sequences are supported, such as <i>\r\n</i> for new line, <i>\t</i> for
    TAB, etc.</p>

    <p>Some examples are shown below:</p>
<pre>
; Declaration
byte mybyte = 0x7F
short myint = 0x7FFF
string str = "Some string...\r\n"

const VALUE = 0x7FFF

; Assignment
ax, mybyte = 42, 13

; Swapping values.
; The order in which they get assigned is <b>not</b> guaranteed to be kept!
ax, bx = bx, ax
</pre>

    <p>All variables all <b>global</b>, which means that when a function
    modifies a variable, the changes are reflected upon the whole file.
    One should be careful with this. The names must always start with a
    letter (upper or lower case), and may be followed by any number of
    alphanumeric characters.</p>

    <h3 id="registers">Registers</h3>
    <p>There exist certain "special" variables which directly correspond
    with the <b>registers</b> of the underlying machine. These are also global.
    One should always use registers when possible, since they're faster to
    access than any other variable. The available registers are,
    for <code>short</code> integers:</p>
    <pre>ax, cx, dx, bx, si, di</pre>

    <p>The <i>first</i> three are recommended for general use, and the
    <i>last</i> three are recommended when indexing a vector, since this
    allows certain optimizations.</p>

    <p>One can access to the <code>byte</code> version of these by specifying:</p>
    <pre>ah, al, bh, bl, ch, cl, dh, dl</pre>

    <p>Note, however, that <b>these point to the same registers</b> as the
    first letter indicates. However, one can select to use the <b>h</b>igh
    or the <b>l</b>ow part of the register, but modifying either will result
    in the modification of the <b>x</b> version.</p>

    <p>Available short registers, although <b>not recommended</b> to use:</p>
    <pre>cs, ds es, ss, sp, bp</pre>

    <p>The first four point to the application segments, such as data and
    code, and the last two point to the stack. Modifying any of these will
    result in unpredictable results.</p>

    <h3 id="vectors">Vectors</h3>
    <p>It's possible to define a <i>vector</i> of values by appending
    <code>[]</code> to the variable type. Vectors, also known as <i>arrays</i>
    in other languages, support containing one or more values under the same
    variable, which can be later be <i>indexed</i> if one desires to access
    its contents. The <code>length</code> of the vector will be infered by
    the amount of comma separated values specified. It's also possible to
    explicitly define its length inside the square braces. If only one
    value is supplied, and the given size is different, this value will be
    duplicated accross the vector.</p>

    <p>Some <b>valid</b> examples for defining vectors are:</p>
<pre>byte[] v1 = 14
short[] = 10, 20, 30

short[1] v2 = 6
byte[5] v3 = 2, 3, 5, 7, 11

byte[42] v4
byte[6] v5 = ?  ; The '?' denotes we don't care about its actual value
byte[9] v6 = 7
</pre>

    <p>Some <b>invalid</b> examples for defining vectors are:</p>
<pre>short[1] v1 = 6, 5
byte[5] v2 = 2, 3, 5
</pre>

    <p>In order to access a vector, one can make use of either an inmediate
    value, such as <code>[2]</code> to access the <b>third</b> item of the
    vector (since <code>[0]</code> denotes the <b>first</b> element), any
    variable, or some added together. It's important to note, however,
    that in order to be as efficient as possible, one should use the
    <b>i</b>ndex registers (si and di), possibly adding to them the <b>b</b>x
    as the <b>b</b>ase register. The following combinations would cause
    no trouble, assuming <code>vector</code> is a vector variable:</p>

<pre>vector[bx] = 1
vector[si] = 2
vector[di] = 3

vector[bx+4] = 4
vector[si-5] = 5
vector[di+6] = 6

vector[bx+si] = 7
vector[bx-di] = 8

vector[bx-si+9] = 9
vector[bx+di-10] = 10
</pre>

    <p><b>Any other combination</b> will need to be translated to some of these,
    probably causing a <b>lot of overhead</b>. It is strongly encouraged to
    use the mentioned registers whenever possible.</p>

    <p>The length of the vector cannot be changed on runtime, and can be
    queried on compile time by accessing <code>vector.length</code> which
    will be replaced with its actual length.</p>

    <p><b>Note</b> that the index used to access the vector elements is
    the <i>offset in bytes</i>, not the <i>n</i>'th element. One should
    be careful with this.</p>
    <!-- TODO This should probably be fixed -->

    <h2 id="comments">Comments</h2>
    <p>Inline comments can be specified after the statement, separated by
    the semicolon character <code>;</code>. Multiline comments can also be made,
    these being composed of a starting and an ending semicolon alone:</p>
<pre>
;
; This comment is multiline
; The starting semicolon here is optional
  It can be omitted too
  As shown here
;

byte value = 42 ; Inlined comment
</pre>

    <h2 id="control-flow">Control flow</h2>
    <h3 id="if-else-statements"><code>if-else</code> statements</h3>
    <p>Conditional blocks allow to execute a piece of code if a criteria is met.
    It is also possible to execute a different piece of code if that initial
    condition is not met, although this is optional. The braces <b>must</b>
    be on the same line as the statements</p>

    <p>It is possible to optionally label <i>if</i> and <i>else</i> statements
    by appending <code>@labelName</code> to the end, before any comments. This
    will result in different label names on the generated assembly code. When
    omitted, a unique ID will be used.</p>

    <p>The supported operators for use on conditional expressions are:</p>
    <ul>
        <li><code>==</code>. Met when both sides are equal.</li>
        <li><code>!=</code>. Met when both sides are different.</li>
        <li><code>&lt;</code>. Met when the left side is less than the right side.</li>
        <li><code>&gt;</code>. Met when the left side is greater than the right side.</li>
        <li><code>&lt;=</code>. Met when the left side is less or equal to the right side.</li>
        <li><code>&gt;=</code>. Met when the left side is greater or equal to the right side.</li>
    </ul>

    <p>As an example:</p>
<pre>
if ax &lt; bx { @myFirstIf
    ;
    ; code to execute when 'ax &lt; bx'
    ;
} else { @myFirstElse
    ;
    ; code to execute when 'ax &gt;= bx'
    ;
}
</pre>

    <p>Since the case where checking whether a number is even or odd, a
    different form for the if is available, <code>if var is even</code> and
    <code>if var is odd</code> will enter the if block if the specified
    <b>var</b>iable is even or odd respectively.</p>

    <h3 id="repeat-loop"><code>repeat</code> loop</h3>
    <p>The <code>repeat</code> loop can be used to repeat a specific part of
    the code a <i>n</i> amount of times. Its syntax is defined as follows:</p>

<pre>
repeat <i>n</i> with <i>var</i> {
    ;
    ; code to repeat <i>n</i> times
    ;
}
</pre>

    <p>This statement can also be labeled. The reason why the <code>with</code>
    part is explicit is because it implies that <code>var</code> will be used
    to count down <i>n</i> times until <i>0</i> is reached.</p>

    <p>The possible values for <b>n</b> are either a decimal number (can also
    be expressed in either hexadecimal or binary, with the <i>h</i> or <i>b</i>
    suffix, defaults to decimal), a register or a variable.</p>

    <p>The possible values for <code>var</code> are either a register or a
    variable. Certain optimizations will be made if the register used is <b>cx</b>:</p>

<pre>
; Multiplication with repeated sums
ax = 0
repeat 3 with cx { @myLoop
    ax += 2
}
</pre>

    <h3 id="while-loop"><code>while</code> loop</h3>
    <p>The <code>while</code> loop can be used to repeat a specific part an
    indefined amount of times, depending on an arbitrary condition. Once
    this condition is not met, the loop will end:</p>
<pre>
while ax &lt; 10 {
    ;
    ; code to repeat <i>while</i> ax &lt; 10
    ;
}
</pre>

    <p>The code will <b>not</b> be executed even once if the condition is
    not met in the first place (<i>e.g.</i> ax turns out to be &gt;= 10).
    To solve this, it is possible to specify the loop to execute <b>at least
    once</b> with the <code>or once</code> modifier:</p>
<pre>
ax = 9
while ax == 10 <b>or once</b> {
    ax += 1
}
</pre>

    <p>The first time the loop will enter thanks to the <code>or once</code>
    part. In the loop, <i>ax</i> will become 10. Then the loop will repeat
    because the condition is met, and exit once <i>ax</i> becomes 11.</p>

    <h3 id="forever-loop"><code>forever</code> loop</h3>
    <p>This loop can be used when one desires to execute a block of code
    indefinitely. The <code>forever</code> loop proves extremely useful with
    the <code>break</code> keyword, since it allows arbitrarily complex
    conditions to be met before the loop is exited:</p>
<pre>
; Collatz conjecture starting at 7
ax = 7
forever {
    printf "%d -> ", ax
    if ax is even {
        ax, dx = divmod ax, 2
    } else {
        ax *= 3
        ax += 1
    }
    if ax == 1 {
        break 2
    }
}
put digit 1
</pre>

    <h3 id="break-statement"><code>break</code> statement</h3>
    <p>If one desires to <code>break</code> free from a block, it's possible
    to do so with the <code>break</code> keyword, used to early terminate
    a block:</p>
<pre>while ax &lt; 10 {
    ax += 1
    <b>break</b>
    ax -= 1  ; This line will never be executed
}
; Execution will continue here</pre>

    <p>However, this is not very useful per se. For this reason, it's possible
    to specify the amount of blocks one desires to break, by specifying the
    number after the keyword:</p>
<pre>
<pre>while ax &lt; 10 {
    ax += 1
    if ax > 7 {
        break 2  ; This will break two levels
    }
    ax -= 1
}
; Execution will continue here on break</pre>
</pre>

    <h2 id="functions">Functions</h2>
    <p>Functions <b>must</b> be declared at the top of the file, since
    their signature has to be known beforehand to determine whether a call
    to them is legal or not.</p>

    <p>Functions are defined by the <code>function</code> keyword, followed by
    the function name to be used. After this, a list of comma separated
    registers or variables must be specified insides parenthesis. Optionally,
    one can use the <code>returns</code> keyword to determine where the
    returned value will be passed. If omitted, the function will be considered
    to return nothing. These can also be labeled.</p>

<pre>
function multiply(ax, bx) returns dx {
    dx = 0
    if ax &lt; bx {
        repeat ax with cx {
            dx += bx
        }
    } else {
        repeat bx with cx {
            dx += ax
        }
    }
}
</pre>

    <p>To call this function, one can first specify to which variable
    or register the result will be assigned. This will be ilegal when the
    function returns no values:</p>
<pre>
dx = multiply(7, 5)
</pre>

    <p>Optimization will be made if the arguments passed match the
    original parameters list, and so will the resulting assignment.
    In the example above, no copying will be made from <i>dx</i> to <i>dx</i>
    since they're the same. If one were to call:
<pre>
ax = 7
bx = 5
dx = multiply(bx, ax)
</pre>

    <p>The value of <i>ax</i> would have to first get saved, then updated
    with the right value, and then restoring the saved value into <i>bx</i>,
    this is, <b>three</b> operations in contrast to no copying at all
    (assuming that the right registers contain the right values, which
    may not always be the case).</p>

    <p>It's possible to pass a vector variable <i>by reference</i>,
    by prepending <code>&</code> to the variable name. This also works
    on assignments and any other kind of operations, since its address
    is known at compile time.</p>

    <h2 id="builtin-functions">Built-in functions</h2>
    <p>Built-in functions are called in a similar to how user-defined functions
    are, although they do <b>not</b> use parenthesis to separate the argument
    list. Every of these are treated as a different statement.</p>

    <h3 id="divmod"><code>divmod</code> function</h3>
    <p>The <code>divmod</code> function can be used to perform both
    <i>division</i> and the <i>modulo</i> operator (to obtain the remainder).
    It will calculate both values on dividing the first argument by the second
    one:</p>
<pre>
; Store (ax divided by bx) in ax
; Store (ax modulo bx) in dx
ax, dx = divmod ax, bx
</pre>

    <p>Note that due how this function works and to guarantee that no data is
    loss, certain parameters may perform worse than others. For instance,
    calling the function with <i>dx, ax</i> and storing the result in
    <i>dx, ax</i> too will cause <b>nine</b> generated instructions, while
    invoking the function with <i>ax, bx</i> and storing the result in
    <i>ax, dx</i> will cause <b>two</b>, due to how it works.
    <br />
    Some of these special worse cases are those which have the divisor in
    either <i>ax</i> or <i>dx</i> and the result isn't stored on any register
    or variable different to these, or when the result is neither <i>ax</i> nor
    <i>dx</i>.</p>

    <h2 id="basic-io">Basic IO</h2>
    <p>It's possible to perform basic input/output operations using the console
    as an interface with the user.</p>

    <h3 id="printf"><code>printf</code> function</h3>
    <p>This function allows to <b>print</b> a possibly <b>f</b>ormatted string
    to the console. No formatting can be made when printing an already-defined
    string variable, since special handling is performed to generate a
    string which allows being formatted.</p>

    <p>The parameters accepted by this function are either a single variable,
    or an inmediate string, in which case it may contain <b>%X</b> where
    <b>X</b> indicates the type of the variable to be formatted (either a
    <b>s</b>tring or a <b>d</b>ecimal value), and after this, the arguments
    to be formatted in a comma-separated fashion:</p>
<pre>string hello = "Hello, world!\r\n"
string person = "User"
short favourite = 42

printf hello
printf "Hello %s! I see you like the number %d", person, favourite
printf "\r\nExiting...\r\n"</pre>

    <p>If two subsequent calls were to be made to <code>printf</code>,
    it's recommended to use <code>printf("%s%s", str1, str2)</code> since
    furhter optimization will be made.</p>

    <h3 id="putchar"><code>putchar</code> function</h3>
    <p>Used to <b>put</b> a single <b>char</b>acter (or digit) to the screen,
    advancing the cursor position.</p>

    <p>Either a variable or an inmediate value can be passed to the function:</p>
<pre>put char 'H'
put char 'I'
put char ' '
put digit cx</pre>

    <h3 id="getchar"><code>getchar</code> function</h3>
    <p>Used to <b>get</b> a single <b>char</b>acter (or digit) from the
    keyboard input. In addition, the input can both be echoed (by default)
    or hidden by appending <code>show</code> or <code>hide</code>:</p>

<pre>dl = get digit hide
dh = get digit hide
dl += dh
printf dl</pre>

    <h3 id="setcursor"><code>setcursor</code> function</h3>
    <p>This function allows to freely <b>set the cursor</b> on the screen.
    The list of parameters to be supplied are, in this order, the <i>row</i>
    and then the <i>column</i> to which the cursor should move:</p>
<pre>setcursor 12, 37
printf "Middle"</pre>

    <p>Since the screen defaults to 80 columns and 25 rows, in this case,
    <code>12, 37</code> would be able to print <code>"Middle"</code> on
    the middle of the screen.</p>

    <p>Since the screen defaults to 80 columns and 25 rows, in this case,
    <code>12, 37</code> would be able to print <code>"Middle"</code> on
    the middle of the screen.</p>

    <h3 id="cls"><code>cls</code> function</h3>
    <p>Used to <b>cl</b>ear the <b>s</b>creen. It takes no parameters.</p>

    <h2 id="compiling">Compiling</h2>
    <p>To compile <code>.lnn</code> files (containing the source code of your
    program in Lonang), simply use the <code>./lnn</code> tool. As a basic
    example, to compile the code in <code>main.lnn</code>:</p>
    <pre>./lnn main.lnn</pre>

    <h2 id="debugging">Debugging</h2>
    <p>If one desires to debug the generated code, one can use the
    <code>tag</code> statement, which will add a comment along with a
    <code>nop</code> instructions so that the execution can be ran until it.
    A message must always be specified, for instance:</p>
    <pre>tag Starting the part which needs debugging</pre>
</div>
</body>
</html>