Base256Emoji

This base is a benchmark and torture test for implementations that want to support Unicode.

Encoding

Since a byte and a base256 digit each have 256 possible values, encoding is trivial: there is a one-to-one correspondence between one byte value and one UTF-32 character (codepoint), so you don't need to deal with any overflow or padding.

First, allocate a UTF-32 output string whose length in codepoints equals the length in bytes of your input buffer.

Then, for each index of the input buffer, look up the current byte value in the correspondence table and write the codepoint you find to the output string at the same index.

You can find out the correspondence using this table:

Emoji Unicode codepoint Byte Value
🚀 U+1F680 0
🪐 U+1FA90 1
☄ U+2604 2
🛰 U+1F6F0 3
🌌 U+1F30C 4
🌑 U+1F311 5
🌒 U+1F312 6
🌓 U+1F313 7
🌔 U+1F314 8
🌕 U+1F315 9
🌖 U+1F316 10
🌗 U+1F317 11
🌘 U+1F318 12
🌍 U+1F30D 13
🌏 U+1F30F 14
🌎 U+1F30E 15
🐉 U+1F409 16
☀ U+2600 17
💻 U+1F4BB 18
🖥 U+1F5A5 19
💾 U+1F4BE 20
💿 U+1F4BF 21
😂 U+1F602 22
❤ U+2764 23
😍 U+1F60D 24
🤣 U+1F923 25
😊 U+1F60A 26
🙏 U+1F64F 27
💕 U+1F495 28
😭 U+1F62D 29
😘 U+1F618 30
👍 U+1F44D 31
😅 U+1F605 32
👏 U+1F44F 33
😁 U+1F601 34
🔥 U+1F525 35
🥰 U+1F970 36
💔 U+1F494 37
💖 U+1F496 38
💙 U+1F499 39
😢 U+1F622 40
🤔 U+1F914 41
😆 U+1F606 42
🙄 U+1F644 43
💪 U+1F4AA 44
😉 U+1F609 45
☺ U+263A 46
👌 U+1F44C 47
🤗 U+1F917 48
💜 U+1F49C 49
😔 U+1F614 50
😎 U+1F60E 51
😇 U+1F607 52
🌹 U+1F339 53
🤦 U+1F926 54
🎉 U+1F389 55
💞 U+1F49E 56
✌ U+270C 57
✨ U+2728 58
🤷 U+1F937 59
😱 U+1F631 60
😌 U+1F60C 61
🌸 U+1F338 62
🙌 U+1F64C 63
😋 U+1F60B 64
💗 U+1F497 65
💚 U+1F49A 66
😏 U+1F60F 67
💛 U+1F49B 68
🙂 U+1F642 69
💓 U+1F493 70
🤩 U+1F929 71
😄 U+1F604 72
😀 U+1F600 73
🖤 U+1F5A4 74
😃 U+1F603 75
💯 U+1F4AF 76
🙈 U+1F648 77
👇 U+1F447 78
🎶 U+1F3B6 79
😒 U+1F612 80
🤭 U+1F92D 81
❣ U+2763 82
😜 U+1F61C 83
💋 U+1F48B 84
👀 U+1F440 85
😪 U+1F62A 86
😑 U+1F611 87
💥 U+1F4A5 88
🙋 U+1F64B 89
😞 U+1F61E 90
😩 U+1F629 91
😡 U+1F621 92
🤪 U+1F92A 93
👊 U+1F44A 94
🥳 U+1F973 95
😥 U+1F625 96
🤤 U+1F924 97
👉 U+1F449 98
💃 U+1F483 99
😳 U+1F633 100
✋ U+270B 101
😚 U+1F61A 102
😝 U+1F61D 103
😴 U+1F634 104
🌟 U+1F31F 105
😬 U+1F62C 106
🙃 U+1F643 107
🍀 U+1F340 108
🌷 U+1F337 109
😻 U+1F63B 110
😓 U+1F613 111
⭐ U+2B50 112
✅ U+2705 113
🥺 U+1F97A 114
🌈 U+1F308 115
😈 U+1F608 116
🤘 U+1F918 117
💦 U+1F4A6 118
✔ U+2714 119
😣 U+1F623 120
🏃 U+1F3C3 121
💐 U+1F490 122
☹ U+2639 123
🎊 U+1F38A 124
💘 U+1F498 125
😠 U+1F620 126
☝ U+261D 127
😕 U+1F615 128
🌺 U+1F33A 129
🎂 U+1F382 130
🌻 U+1F33B 131
😐 U+1F610 132
🖕 U+1F595 133
💝 U+1F49D 134
🙊 U+1F64A 135
😹 U+1F639 136
🗣 U+1F5E3 137
💫 U+1F4AB 138
💀 U+1F480 139
👑 U+1F451 140
🎵 U+1F3B5 141
🤞 U+1F91E 142
😛 U+1F61B 143
🔴 U+1F534 144
😤 U+1F624 145
🌼 U+1F33C 146
😫 U+1F62B 147
⚽ U+26BD 148
🤙 U+1F919 149
☕ U+2615 150
🏆 U+1F3C6 151
🤫 U+1F92B 152
👈 U+1F448 153
😮 U+1F62E 154
🙆 U+1F646 155
🍻 U+1F37B 156
🍃 U+1F343 157
🐶 U+1F436 158
💁 U+1F481 159
😲 U+1F632 160
🌿 U+1F33F 161
🧡 U+1F9E1 162
🎁 U+1F381 163
⚡ U+26A1 164
🌞 U+1F31E 165
🎈 U+1F388 166
❌ U+274C 167
✊ U+270A 168
👋 U+1F44B 169
😰 U+1F630 170
🤨 U+1F928 171
😶 U+1F636 172
🤝 U+1F91D 173
🚶 U+1F6B6 174
💰 U+1F4B0 175
🍓 U+1F353 176
💢 U+1F4A2 177
🤟 U+1F91F 178
🙁 U+1F641 179
🚨 U+1F6A8 180
💨 U+1F4A8 181
🤬 U+1F92C 182
✈ U+2708 183
🎀 U+1F380 184
🍺 U+1F37A 185
🤓 U+1F913 186
😙 U+1F619 187
💟 U+1F49F 188
🌱 U+1F331 189
😖 U+1F616 190
👶 U+1F476 191
🥴 U+1F974 192
▶ U+25B6 193
➡ U+27A1 194
❓ U+2753 195
💎 U+1F48E 196
💸 U+1F4B8 197
⬇ U+2B07 198
😨 U+1F628 199
🌚 U+1F31A 200
🦋 U+1F98B 201
😷 U+1F637 202
🕺 U+1F57A 203
⚠ U+26A0 204
🙅 U+1F645 205
😟 U+1F61F 206
😵 U+1F635 207
👎 U+1F44E 208
🤲 U+1F932 209
🤠 U+1F920 210
🤧 U+1F927 211
📌 U+1F4CC 212
🔵 U+1F535 213
💅 U+1F485 214
🧐 U+1F9D0 215
🐾 U+1F43E 216
🍒 U+1F352 217
😗 U+1F617 218
🤑 U+1F911 219
🌊 U+1F30A 220
🤯 U+1F92F 221
🐷 U+1F437 222
☎ U+260E 223
💧 U+1F4A7 224
😯 U+1F62F 225
💆 U+1F486 226
👆 U+1F446 227
🎤 U+1F3A4 228
🙇 U+1F647 229
🍑 U+1F351 230
❄ U+2744 231
🌴 U+1F334 232
💣 U+1F4A3 233
🐸 U+1F438 234
💌 U+1F48C 235
📍 U+1F4CD 236
🥀 U+1F940 237
🤢 U+1F922 238
👅 U+1F445 239
💡 U+1F4A1 240
💩 U+1F4A9 241
👐 U+1F450 242
📸 U+1F4F8 243
👻 U+1F47B 244
🤐 U+1F910 245
🤮 U+1F92E 246
🎼 U+1F3BC 247
🥵 U+1F975 248
🚩 U+1F6A9 249
🍎 U+1F34E 250
🍊 U+1F34A 251
👼 U+1F47C 252
💍 U+1F48D 253
📣 U+1F4E3 254
🥂 U+1F942 255
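
The encoding steps above can be sketched in Python, whose `str` type is indexed by codepoint and so behaves like the UTF-32 string described here. This is a minimal illustration, not a reference implementation: the names `CODEPOINTS`, `ALPHABET`, and `encode` are hypothetical, and the codepoints are transcribed from the table above, with the position in the tuple standing for the byte value.

```python
# Codepoints transcribed from the table above; index in the tuple == byte value.
CODEPOINTS = (
    0x1F680, 0x1FA90, 0x2604, 0x1F6F0, 0x1F30C, 0x1F311, 0x1F312, 0x1F313,
    0x1F314, 0x1F315, 0x1F316, 0x1F317, 0x1F318, 0x1F30D, 0x1F30F, 0x1F30E,
    0x1F409, 0x2600, 0x1F4BB, 0x1F5A5, 0x1F4BE, 0x1F4BF, 0x1F602, 0x2764,
    0x1F60D, 0x1F923, 0x1F60A, 0x1F64F, 0x1F495, 0x1F62D, 0x1F618, 0x1F44D,
    0x1F605, 0x1F44F, 0x1F601, 0x1F525, 0x1F970, 0x1F494, 0x1F496, 0x1F499,
    0x1F622, 0x1F914, 0x1F606, 0x1F644, 0x1F4AA, 0x1F609, 0x263A, 0x1F44C,
    0x1F917, 0x1F49C, 0x1F614, 0x1F60E, 0x1F607, 0x1F339, 0x1F926, 0x1F389,
    0x1F49E, 0x270C, 0x2728, 0x1F937, 0x1F631, 0x1F60C, 0x1F338, 0x1F64C,
    0x1F60B, 0x1F497, 0x1F49A, 0x1F60F, 0x1F49B, 0x1F642, 0x1F493, 0x1F929,
    0x1F604, 0x1F600, 0x1F5A4, 0x1F603, 0x1F4AF, 0x1F648, 0x1F447, 0x1F3B6,
    0x1F612, 0x1F92D, 0x2763, 0x1F61C, 0x1F48B, 0x1F440, 0x1F62A, 0x1F611,
    0x1F4A5, 0x1F64B, 0x1F61E, 0x1F629, 0x1F621, 0x1F92A, 0x1F44A, 0x1F973,
    0x1F625, 0x1F924, 0x1F449, 0x1F483, 0x1F633, 0x270B, 0x1F61A, 0x1F61D,
    0x1F634, 0x1F31F, 0x1F62C, 0x1F643, 0x1F340, 0x1F337, 0x1F63B, 0x1F613,
    0x2B50, 0x2705, 0x1F97A, 0x1F308, 0x1F608, 0x1F918, 0x1F4A6, 0x2714,
    0x1F623, 0x1F3C3, 0x1F490, 0x2639, 0x1F38A, 0x1F498, 0x1F620, 0x261D,
    0x1F615, 0x1F33A, 0x1F382, 0x1F33B, 0x1F610, 0x1F595, 0x1F49D, 0x1F64A,
    0x1F639, 0x1F5E3, 0x1F4AB, 0x1F480, 0x1F451, 0x1F3B5, 0x1F91E, 0x1F61B,
    0x1F534, 0x1F624, 0x1F33C, 0x1F62B, 0x26BD, 0x1F919, 0x2615, 0x1F3C6,
    0x1F92B, 0x1F448, 0x1F62E, 0x1F646, 0x1F37B, 0x1F343, 0x1F436, 0x1F481,
    0x1F632, 0x1F33F, 0x1F9E1, 0x1F381, 0x26A1, 0x1F31E, 0x1F388, 0x274C,
    0x270A, 0x1F44B, 0x1F630, 0x1F928, 0x1F636, 0x1F91D, 0x1F6B6, 0x1F4B0,
    0x1F353, 0x1F4A2, 0x1F91F, 0x1F641, 0x1F6A8, 0x1F4A8, 0x1F92C, 0x2708,
    0x1F380, 0x1F37A, 0x1F913, 0x1F619, 0x1F49F, 0x1F331, 0x1F616, 0x1F476,
    0x1F974, 0x25B6, 0x27A1, 0x2753, 0x1F48E, 0x1F4B8, 0x2B07, 0x1F628,
    0x1F31A, 0x1F98B, 0x1F637, 0x1F57A, 0x26A0, 0x1F645, 0x1F61F, 0x1F635,
    0x1F44E, 0x1F932, 0x1F920, 0x1F927, 0x1F4CC, 0x1F535, 0x1F485, 0x1F9D0,
    0x1F43E, 0x1F352, 0x1F617, 0x1F911, 0x1F30A, 0x1F92F, 0x1F437, 0x260E,
    0x1F4A7, 0x1F62F, 0x1F486, 0x1F446, 0x1F3A4, 0x1F647, 0x1F351, 0x2744,
    0x1F334, 0x1F4A3, 0x1F438, 0x1F48C, 0x1F4CD, 0x1F940, 0x1F922, 0x1F445,
    0x1F4A1, 0x1F4A9, 0x1F450, 0x1F4F8, 0x1F47B, 0x1F910, 0x1F92E, 0x1F3BC,
    0x1F975, 0x1F6A9, 0x1F34E, 0x1F34A, 0x1F47C, 0x1F48D, 0x1F4E3, 0x1F942,
)

# One character per byte value, so ALPHABET[b] is the emoji for byte b.
ALPHABET = "".join(chr(cp) for cp in CODEPOINTS)


def encode(data: bytes) -> str:
    """Map each input byte to the codepoint at the same index of the output."""
    return "".join(ALPHABET[b] for b in data)
```

The output always has exactly as many codepoints as the input has bytes, so no padding or carry handling is involved; for example, `encode(b"\x00\x01")` yields the two-codepoint string U+1F680 U+1FA90.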

Decoding

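Decoding is the inverse of encoding: for each codepoint in the input string, look up its byte value in the table and write that byte to the output buffer at the same index.

Note: a direct lookup array indexed by every possible UTF-32 codepoint and holding a struct {bool, byte} per entry would take about 8 gigabytes, so it is not recommended; it is wiser to use a hash map instead.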
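
A hash-map-based decoder could look like the following Python sketch, where a `dict` plays the role of the hash map. The names `build_decode_map`, `decode`, `SAMPLE_ALPHABET`, and `SAMPLE_MAP` are hypothetical, and raising `ValueError` for a codepoint outside the table is an assumption about error handling, not something this document specifies. To keep the example short, the sample alphabet contains only the first four rows of the table; a real decoder would be built from all 256.

```python
def build_decode_map(alphabet: str) -> dict[str, int]:
    """Invert the alphabet: character -> byte value (its position in the alphabet)."""
    return {ch: value for value, ch in enumerate(alphabet)}


def decode(text: str, decode_map: dict[str, int]) -> bytes:
    """Look up each codepoint and write its byte value at the same index."""
    out = bytearray()
    for index, ch in enumerate(text):
        if ch not in decode_map:
            # Assumption: input outside the table is treated as an error.
            raise ValueError(
                f"codepoint U+{ord(ch):04X} at index {index} is not in the table"
            )
        out.append(decode_map[ch])
    return bytes(out)


# Illustration only: byte values 0..3 from the table (U+1F680, U+1FA90, U+2604, U+1F6F0).
SAMPLE_ALPHABET = "\U0001F680\U0001FA90\u2604\U0001F6F0"
SAMPLE_MAP = build_decode_map(SAMPLE_ALPHABET)

print(decode("\U0001F6F0\U0001F680", SAMPLE_MAP))  # b'\x03\x00'
```

The map holds only 256 entries, so its memory use is negligible compared with a table covering the whole codepoint range, and a missing key takes over the role of the `bool` flag in the struct.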