Bottom is a lightweight encoding format used by Discord and Tumblr users from all around the world. This document aims to detail the Bottom specification officially, so that implementing it correctly is as easy as possible.
Each character in Bottom holds a purpose of some sort. These are detailed here for your convenience, and will be referred to in depth below.
Unicode escape(s) | Character | Value |
---|---|---|
U+1FAC2 | 🫂 | Integer 200 |
U+1F496 | 💖 | Integer 50 |
U+2728 | ✨ | Integer 10 |
U+1F97A | 🥺 | Integer 5 |
U+002C | , | Integer 1 |
U+2764, U+FE0F | ❤️ | Integer 0 |

Unicode escape(s) | Character | Purpose |
---|---|---|
U+1F449, U+1F448 | 👉👈 | Byte terminator |

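The two tables above can be written down directly as constants. The following is a minimal Python sketch, not part of the specification; the names `VALUE_CHARACTERS`, `NULL_VALUE`, `BYTE_TERMINATOR` and `group_value` are illustrative assumptions. It also shows that the null value and the byte terminator each consist of two Unicode scalar values.

```python
# Value characters from the table above, written as explicit escapes.
# All names in this sketch are illustrative, not defined by the spec.
VALUE_CHARACTERS = {
    "\U0001FAC2": 200,  # 🫂
    "\U0001F496": 50,   # 💖
    "\u2728": 10,       # ✨
    "\U0001F97A": 5,    # 🥺
    "\u002C": 1,        # ,
}
NULL_VALUE = "\u2764\uFE0F"               # ❤️ (U+2764 followed by U+FE0F)
BYTE_TERMINATOR = "\U0001F449\U0001F448"  # 👉👈


def group_value(group: str) -> int:
    """Sum the values of one group, given without its byte terminator."""
    if group == NULL_VALUE:
        return 0
    return sum(VALUE_CHARACTERS[ch] for ch in group)


print(len(NULL_VALUE), len(BYTE_TERMINATOR))  # 2 2
print(group_value("\U0001F496\U0001F496,,,,"))  # 104
```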
- The input stream must be valid UTF-8 encoded text. Encoding invalid UTF-8 is illegal.
- The output stream will be a sequence of groups of value characters (see table above), with each group terminated by the byte terminator character, e.g.
  `💖✨✨✨👉👈💖💖🥺,,,👉👈💖💖,👉👈💖✨✨✨✨🥺,,👉👈💖💖✨🥺👉👈💖💖,👉👈💖✨,,,👉👈`
- The total numerical value of each group must equal the decimal value of the corresponding input byte.
  - For example, the numerical value of `💖💖,,,,`, according to the character table above, is `50 + 50 + 1 + 1 + 1 + 1`, or 104. This sequence would thus represent `U+0068` or `h`, which has a decimal value of `104`.
  - Note the ordering of characters within groups. Groups of value characters must be in descending order. While character order (within groups) technically does not affect the output in any way, arbitrary ordering can significantly reduce decoding speed and is considered both illegal and bad form.
- The encoding can be represented succinctly in EBNF:

      bottom -> values (BYTE_TERMINATOR values)* BYTE_TERMINATOR
      values -> value_character+ | null_value
      value_character -> 🫂 | 💖 | ✨ | 🥺 | ,
      null_value -> ❤️
      BYTE_TERMINATOR -> 👉👈

  Note that EBNF fails to capture any notion of semantic validity, such as character ordering. It's technically possible to encode character ordering rules into the grammar, but that is not shown here for the sake of brevity and simplicity.
- Byte terminators that do not follow a group of value characters are illegal, e.g. `💖💖,,,,👉👈👉👈` or `👉👈💖💖,,,,👉👈`. As such, `👉👈` alone is illegal.
- Groups of value characters must be followed by a byte terminator. `💖💖,,,,` alone is illegal, but `💖💖,,,,👉👈` is valid.
- The null value must be followed by a byte terminator. `💖💖,,,,👉👈❤️👉👈💖💖,,,,👉👈` and `💖💖,,,,👉👈❤️👉👈` are valid, but `💖💖,,,,👉👈❤️` alone is illegal.
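The rules above can be checked mechanically. The following Python sketch is one possible validator, written under stated assumptions: the function name `is_valid_bottom` is hypothetical, empty input is rejected because the grammar requires at least one group, and a group total above 255 is rejected because it cannot correspond to a byte. It does not check whether a group uses the largest possible characters.

```python
# Illustrative constants; the names are assumptions, not part of the spec.
VALUE_CHARACTERS = {
    "\U0001FAC2": 200,  # 🫂
    "\U0001F496": 50,   # 💖
    "\u2728": 10,       # ✨
    "\U0001F97A": 5,    # 🥺
    ",": 1,
}
NULL_VALUE = "\u2764\uFE0F"               # ❤️
BYTE_TERMINATOR = "\U0001F449\U0001F448"  # 👉👈


def is_valid_bottom(text: str) -> bool:
    """Return True if text follows the grammar and the ordering rule."""
    # Every group, including the last one, must be followed by a terminator.
    if not text.endswith(BYTE_TERMINATOR):
        return False
    # The final split piece is the empty string after the last terminator.
    groups = text.split(BYTE_TERMINATOR)[:-1]
    for group in groups:
        if group == NULL_VALUE:
            continue
        if not group:
            return False  # terminator that does not follow a group
        values = [VALUE_CHARACTERS.get(ch) for ch in group]
        if None in values:
            return False  # character that is not a value character
        if values != sorted(values, reverse=True):
            return False  # group is not in descending order
        if sum(values) > 255:
            return False  # total cannot be the value of a byte
    return True


print(is_valid_bottom("💖💖,,,,👉👈"))  # True
print(is_valid_bottom(",,,,💖💖👉👈"))  # False
```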
- Decoding is quite simple and there aren't many special considerations to be made.
If you find it difficult, consider reading the source of one of the existing Bottom decoders.
- If speed is a priority, you may want to generate a hashmap (or similar) mapping each possible encoded byte to its decoded form. This drastically improves the decode speed of correctly encoded text.
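As a rough illustration of the lookup-table idea, the Python sketch below precomputes the encoded group for every byte value and decodes by dictionary lookup. It is only a sketch under assumptions: the names `encode_byte`, `DECODE_TABLE` and `decode` are hypothetical, empty input is assumed to decode to no bytes, and any group that is not in the table (including incorrectly ordered groups) is rejected rather than decoded by a slower fallback.

```python
# Illustrative constants; the names are assumptions, not part of the spec.
VALUE_CHARACTERS = [
    ("\U0001FAC2", 200),  # 🫂
    ("\U0001F496", 50),   # 💖
    ("\u2728", 10),       # ✨
    ("\U0001F97A", 5),    # 🥺
    (",", 1),
]
NULL_VALUE = "\u2764\uFE0F"               # ❤️
BYTE_TERMINATOR = "\U0001F449\U0001F448"  # 👉👈


def encode_byte(value: int) -> str:
    """Encode one byte value as a group, without the byte terminator."""
    if value == 0:
        return NULL_VALUE
    parts = []
    for ch, amount in VALUE_CHARACTERS:
        count, value = divmod(value, amount)  # largest characters first
        parts.append(ch * count)
    return "".join(parts)


# Map each correctly encoded group to the byte value it represents.
DECODE_TABLE = {encode_byte(b): b for b in range(256)}


def decode(text: str) -> bytes:
    """Decode Bottom text into bytes using the precomputed table."""
    if not text:
        return b""  # assumption: empty input decodes to no bytes
    if not text.endswith(BYTE_TERMINATOR):
        raise ValueError("input must end with a byte terminator")
    groups = text[: -len(BYTE_TERMINATOR)].split(BYTE_TERMINATOR)
    try:
        return bytes(DECODE_TABLE[group] for group in groups)
    except KeyError as err:
        raise ValueError(f"invalid group: {err.args[0]!r}") from None


print(decode("💖💖,,,,👉👈"))  # b'h'
```

The decoded bytes can then be turned back into text with `decode(...).decode("utf-8")`, since the input stream is required to be valid UTF-8.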
For each byte `b` of the input stream:

- Let `v` be the decimal value of `b`.
- Let `o` be a buffer of Unicode scalar values.
- If `v` is zero, encode this byte as ❤️ (`U+2764`, `U+FE0F`).
- If `v` is non-zero, repeat the below until `v` is zero:
  - Find the largest value character (see table above) where the relationship `v >= character_value` is satisfied. Let this be `character_value`.
  - Push the Unicode scalar values corresponding to `character_value` to `o`.
  - Subtract `character_value` from `v`.
- Push the Unicode scalar values representing the byte terminator to `o`.

An implementation can thus be expressed as the following pseudo-code:

    let o = new string
    for b in input_stream:
        let v = b as number
        if v is 0:
            o.append("❤️")
        else:
            loop:
                if v >= 200:
                    o.append("🫂")
                    v = v - 200
                else if v >= 50:
                    o.append("💖")
                    v = v - 50
                else if v >= 10:
                    o.append("✨")
                    v = v - 10
                else if v >= 5:
                    o.append("🥺")
                    v = v - 5
                else if v >= 1:
                    o.append(",")
                    v = v - 1
                else:
                    break
        o.append("👉👈")
    return o
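
For readers who prefer something runnable, the pseudo-code might translate to Python roughly as follows. This is a sketch rather than a reference implementation; the function name `encode` and the constant names are assumptions made for illustration.

```python
# Illustrative constants; the names are assumptions, not part of the spec.
VALUE_CHARACTERS = [
    ("\U0001FAC2", 200),  # 🫂
    ("\U0001F496", 50),   # 💖
    ("\u2728", 10),       # ✨
    ("\U0001F97A", 5),    # 🥺
    (",", 1),
]
NULL_VALUE = "\u2764\uFE0F"               # ❤️
BYTE_TERMINATOR = "\U0001F449\U0001F448"  # 👉👈


def encode(data: bytes) -> str:
    """Encode bytes as Bottom text, one terminated group per byte."""
    out = []
    for value in data:
        if value == 0:
            out.append(NULL_VALUE)
        else:
            # Walk the table from largest to smallest value so that each
            # group comes out in descending order.
            for ch, amount in VALUE_CHARACTERS:
                while value >= amount:
                    out.append(ch)
                    value -= amount
        out.append(BYTE_TERMINATOR)
    return "".join(out)


print(encode("h".encode("utf-8")))  # 💖💖,,,,👉👈
```

Taking `bytes` as input and leaving the UTF-8 step to the caller keeps the per-byte loop identical to the pseudo-code; text is encoded by calling `encode(text.encode("utf-8"))`.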