Provide way to allow for new buffer encodings. #1772
Extending a built-in object isn't a good solution.
bobrik, Buffer is not a built-in for JavaScript; it is created by Node's core.
It's built-in for Node. My own opinion: create a helper object to do that task. Otherwise, how could I know in any module whether an encoding is registered?
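A minimal sketch of the kind of helper registry that suggests (every name here is hypothetical, not an existing Node API):

// Hypothetical userland registry module; none of these names are real Node APIs.
var encodings = {};

exports.register = function (name, codec) {
  // codec is assumed to be { encode: fn, decode: fn }
  encodings[name.toLowerCase()] = codec;
};

exports.isRegistered = function (name) {
  return Object.prototype.hasOwnProperty.call(encodings, name.toLowerCase());
};

exports.get = function (name) {
  return encodings[name.toLowerCase()];
};

Any module could then ask the registry whether an encoding exists before using it, instead of guessing.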
This is a feature request to allow extension to the "built-in" Buffer. In order to detect if an encoding is present, one would need to … Simply requiring 'encodingType.js' seems simplistic, and being at setup …
Very good point. Node.js probably needs a boolean check for whether an encoding is registered.
However, there are only a few pre-defined encodings, and they appear in many places as hard-coded comparisons. It won't be easy to write a function that adds yet another encoding to that list, because there was never a list as a real data structure to begin with, and you cannot alter a pile of hard-coded branches at runtime.
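Schematically, the difference is between dispatch through hard-coded branches, which cannot be extended at runtime, and dispatch through a real table, which can (the decoder names below are illustrative stubs, not Node internals):

// Stubs standing in for real decoders.
function decodeUtf8(buf)  { return buf.toString('utf8'); }
function decodeAscii(buf) { return buf.toString('ascii'); }

// The hard-coded style: adding 'cp437' means editing this switch.
function decodeHardcoded(buf, encoding) {
  switch (encoding) {
    case 'utf8':  return decodeUtf8(buf);
    case 'ascii': return decodeAscii(buf);
    default: throw new Error('Unknown encoding: ' + encoding);
  }
}

// With the list as a real data structure, a new encoding is one more entry.
var decoders = { utf8: decodeUtf8, ascii: decodeAscii };
function decodeFromTable(buf, encoding) {
  var decode = decoders[encoding];
  if (!decode) throw new Error('Unknown encoding: ' + encoding);
  return decode(buf);
}
decoders.cp437 = function (buf) { /* single-byte table lookup, sketched below */ };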
I also think that JavaScript implementations of codecs would be slower than native code. For many use cases that is inevitable; however, for simple encodings (such as CP 866, KOI8-R, Windows-1251) the whole conversion is a single-byte lookup table. The module node-iconv (by @bnoordhuis) already does something similar, but it supports only some predefined encodings, is non-Windows, and requires compilation (while Node.js core binaries for Windows are pre-built and available for download).
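For such encodings the entire decoder is a 256-entry lookup table. A minimal sketch of that lookup in JavaScript (decodeSingleByte is an illustrative helper, not part of any module mentioned here):

// Decode a buffer using a 256-entry byte -> Unicode code point table:
// table[i] is the code point for byte value i.
function decodeSingleByte(buf, table) {
  var chars = new Array(buf.length);
  for (var i = 0; i < buf.length; i++) {
    chars[i] = String.fromCharCode(table[buf[i]]);
  }
  return chars.join('');
}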
node-iconv should now compile on Windows: bnoordhuis/node-iconv#27. As to 'only some predefined encodings', we're talking about 120 supported encodings. :-)
Closing, no activity in over a year and node-iconv and iconv-lite have the market pretty much cornered. We may someday allow pluggable encodings, but that's ultra-low priority.
iconv-lite seems to have the same limitation: its set of encodings is fixed. I worked around it by creating my own node-singlebyte module. Of course, I still look forward to the day when pluggable encodings become natively supported (just because the native code would likely run faster than JavaScript), but for now I can live without it.
I'm not sure what you mean by "pluggable encodings" or by "single byte".
Pluggable decoders for StringDecoder.
Hmm. I don't really get why there'd be a request for this in core. We … But, I do forget people want core to be more than just a "core".
The idea is not about adding more encoding definitions to the core. The idea is about adding an API that would allow userland modules to define more encodings for their own purposes, in a way that would let the core run the actual conversion. The typical userland code would then look like the following:

Buffer.defineEncoding('cp437', [
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F,
0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
0x48, 0x49, 0x4A, 0x4B, 0x4C, 0x4D, 0x4E, 0x4F,
0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
0x58, 0x59, 0x5A, 0x5B, 0x5C, 0x5D, 0x5E, 0x5F,
0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67,
0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E, 0x6F,
0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77,
0x78, 0x79, 0x7A, 0x7B, 0x7C, 0x7D, 0x7E, 0x7F,
0xC7, 0xFC, 0xE9, 0xE2, 0xE4, 0xE0, 0xE5, 0xE7,
0xEA, 0xEB, 0xE8, 0xEF, 0xEE, 0xEC, 0xC4, 0xC5,
0xC9, 0xE6, 0xC6, 0xF4, 0xF6, 0xF2, 0xFB, 0xF9,
0xFF, 0xD6, 0xDC, 0xA2, 0xA3, 0xA5, 0x20A7, 0x192,
0xE1, 0xED, 0xF3, 0xFA, 0xF1, 0xD1, 0xAA, 0xBA,
0xBF, 0x2310, 0xAC, 0xBD, 0xBC, 0xA1, 0xAB, 0xBB,
0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
0x2555, 0x2563, 0x2551, 0x2557, 0x255D, 0x255C, 0x255B, 0x2510,
0x2514, 0x2534, 0x252C, 0x251C, 0x2500, 0x253C, 0x255E, 0x255F,
0x255A, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256C, 0x2567,
0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256B,
0x256A, 0x2518, 0x250C, 0x2588, 0x2584, 0x258C, 0x2590, 0x2580,
0x3B1, 0x3B2, 0x393, 0x3C0, 0x3A3, 0x3C3, 0x3BC, 0x3C4,
0x3A6, 0x398, 0x3A9, 0x3B4, 0x221E, 0x3C6, 0x3B5, 0x2229,
0x2261, 0xB1, 0x2265, 0x2264, 0x2320, 0x2321, 0xF7, 0x2248,
0xB0, 0x2219, 0xB7, 0x221A, 0x207F, 0xB2, 0x25A0, 0xA0
]);
var buf = new Buffer([0xE6, 0x54, 0x6F, 0x72, 0x72, 0x65, 0x6E, 0x74, 0x73]);
console.log( buf.toString('cp437', 0, 8) ); // 'μTorrent'
If done in JavaScript, it would run slower than native code. If done in some C/C++ addon, it would face the general unavailability of build tools on end-user Windows systems. There's no good way to solve this problem; there's only a (not obvious) choice of the lesser of three evils:

- a slow pure-JavaScript implementation,
- a compiled addon that many Windows users cannot build,
- or piling every encoding definition into the core.
The proposed solution is trying to “split the baby”: put the encoder and decoder in the core, but let the encoding definitions come from userland. The userland code may then remain in JavaScript (definitions are data structures, such as the array in the above example, so even an 80% speed drop won't affect them), and the addition to the core is minimal.
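As an illustration of that split, a rough sketch of what the core-side half could look like, assuming the hypothetical Buffer.defineEncoding from the example above (nothing here is a real Node API; in the actual proposal the per-byte loop would live in native code, which is where the speed benefit comes from):

// Hypothetical core-side half of the proposal.
var customTables = {};

Buffer.defineEncoding = function (name, table) {
  if (table.length !== 256) throw new Error('table must have 256 entries');
  customTables[name.toLowerCase()] = table;
};

// What the core decode path would run after the built-in encodings
// fail to match the requested name:
function decodeCustom(buf, name, start, end) {
  var table = customTables[name.toLowerCase()];
  if (!table) throw new Error('Unknown encoding: ' + name);
  var out = '';
  for (var i = start; i < end; i++) {
    out += String.fromCharCode(table[buf[i]]);
  }
  return out;
}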
That sounds like an extremely bad idea, especially if two or more modules decide to implement different versions of the same encoding with the same name.
Right now buffers use hard-coded values for encodings. When implementing certain protocols (e.g. base64url, http://self-issued.info/docs/draft-jones-json-web-signature.html#base64urllogic), this requires creating entirely new wrappers for toString, with no way to support the encoding via the Buffer constructor.
Having something similar to an encoding-registration hook would solve this.
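One possible shape for such a hook, purely as an illustrative sketch (Buffer.registerEncoding, the codec interface, and the constructor/toString hooks are assumptions, not an existing Node API):

// Hypothetical API sketch: registerEncoding does not exist in Node.
var codecs = {};
Buffer.registerEncoding = function (name, codec) { codecs[name] = codec; };

Buffer.registerEncoding('base64url', {
  // bytes -> string (what buf.toString('base64url') would call)
  toString: function (buf) {
    return buf.toString('base64')
              .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
  },
  // string -> bytes (what new Buffer(str, 'base64url') would call)
  toBuffer: function (str) {
    return new Buffer(str.replace(/-/g, '+').replace(/_/g, '/'), 'base64');
  }
});

// With core support, buf.toString('base64url') would just work:
var sig = new Buffer([0xfb, 0x03, 0x7e]);
console.log(codecs['base64url'].toString(sig)); // '-wN-' rather than '+wN+'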
The performance issue could be worked around by never encoding to a buffer until toString or a property related to the buffer value is requested (and not even doing that if the encodings match on toString), and then caching the result.
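A rough sketch of that lazy-plus-cache idea (LazyDecoded is a hypothetical wrapper, not a real API): conversion is deferred until the string form is actually requested, then memoized.

// Hypothetical lazy wrapper: decode only on demand, cache the result.
function LazyDecoded(buf, codec) {
  this.buf = buf;       // raw bytes, kept as-is
  this.codec = codec;   // { decode: function (buf) { ... } }
  this.cached = null;   // memoized string form
}

LazyDecoded.prototype.toString = function () {
  // Decode on first request only; reuse the cached result afterwards.
  if (this.cached === null) this.cached = this.codec.decode(this.buf);
  return this.cached;
};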