add some EBCDIC encodings #112

Mithgol · 2015-11-06T08:54:13Z

Fixes #111 partially.

EBCDIC 037 mapping has been taken from http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT and automatically converted from 0xXXXX to \uXXXX format for JavaScript.
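For illustration, the conversion described above could be scripted roughly as follows. This is a minimal sketch, not the script actually used for this pull request: `parseMappingTable` is a hypothetical name, the sample rows only imitate the two-column `0xXX 0xXXXX` layout of the unicode.org files, and filling unmapped bytes with U+FFFD is an assumption.

```typescript
// Hypothetical helper: turns unicode.org mapping text into a 256-character
// string in which the character at index N is the decoding of EBCDIC byte N.
function parseMappingTable(text: string): string {
    // Assumption: bytes that have no row in the table decode to U+FFFD.
    const chars: string[] = new Array<string>(256).fill("\uFFFD");
    for (const rawLine of text.split(/\r?\n/)) {
        const line = rawLine.replace(/#.*$/, "").trim(); // drop trailing comments
        if (line === "") continue;
        const columns = line.split(/\s+/);
        if (columns.length < 2) continue; // row without a Unicode column
        const byte = parseInt(columns[0], 16); // parseInt accepts the "0x" prefix
        const codePoint = parseInt(columns[1], 16);
        if (isNaN(byte) || isNaN(codePoint) || byte < 0 || byte > 0xff) continue;
        chars[byte] = String.fromCharCode(codePoint);
    }
    return chars.join("");
}

// Illustrative rows in the same column layout as the mapping files.
const sample = [
    "0x15\t0x0085\t#CONTROL",
    "0x25\t0x000A\t#LINE FEED",
    "0xC1\t0x0041\t#LATIN CAPITAL LETTER A",
].join("\n");
const table = parseMappingTable(sample);
```

The resulting string can be written out with `\uXXXX` escapes for any character outside printable ASCII.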

EBCDIC 1140 is said to be different only at code point 9F (I have manually retyped that difference).

Note: this pull request does not contain tests because I am not sure what they should look like.

Mithgol · 2015-11-06T09:06:47Z

EBCDIC 500 mapping has been taken from http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP500.TXT and automatically converted from 0xXXXX to \uXXXX format for JavaScript.

EBCDIC 1148 is said to be different only at code point 9F (I have manually retyped that difference).
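Instead of retyping a whole table, the single-position difference could be expressed as an override applied to a copy of the base table. The sketch below assumes a 256-character table string indexed by EBCDIC byte; `withOverrides` is a hypothetical helper, and `baseDemo` is only a placeholder, not the real 037 or 500 table. In the Microsoft tables cited above, 0x9F maps to U+00A4 (currency sign), and the Euro variants put U+20AC (euro sign) there.

```typescript
// Hypothetical helper: copy a table and replace selected byte positions.
function withOverrides(base: string, overrides: Array<[number, string]>): string {
    const chars = base.split("");
    for (const [index, value] of overrides) {
        if (index < 0 || index >= chars.length || Math.floor(index) !== index) {
            throw new RangeError("byte position out of range: " + index);
        }
        chars[index] = value;
    }
    return chars.join("");
}

// Placeholder base table: only position 0x9F carries a meaningful value here.
const baseDemo = "\uFFFD".repeat(0x9f) + "\u00A4" + "\uFFFD".repeat(0xff - 0x9f);

// Euro variant: identical to the base except at 0x9F.
const euroDemo = withOverrides(baseDemo, [[0x9f, "\u20AC"]]);
```

Deriving the variant this way keeps the base and Euro tables from drifting apart if the base is regenerated later.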

devin122 · 2017-04-13T16:07:15Z

There are some problems here with some of the control mappings. The problem arises because EBCDIC has a Carriage Return, a New Line, and a Line Feed. Control characters in EBCDIC that do not translate have been given arbitrary Unicode values starting at U+0080. This includes the NL character (0x20 in EBCDIC), which is assigned U+0080. On the systems I've touched, the EBCDIC NL character is used in place of the LF character for marking EOL.

Mithgol · 2017-04-16T18:09:32Z

Currently Wikipedia says that EBCDIC NL is 0x15 in EBCDIC 500 (and in its variation EBCDIC 1148) and in EBCDIC 037 (and in its variation EBCDIC 1140).

These four are mapped (by the Microsoft mappings, mentioned above) to U+0085 (officially said to be “NEXT LINE” or “NEL”) which seems correct to me.
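To make the byte values under discussion concrete, the three line-control characters can be listed side by side as they appear in the Microsoft CP037 and CP500 tables referenced above. This is just a reference sketch; `lineControls` and `formatEntry` are hypothetical names.

```typescript
interface LineControl {
    name: string;
    ebcdic: number;  // byte value in EBCDIC 037 / 500
    unicode: number; // code point assigned by the Microsoft mapping tables
}

const lineControls: ReadonlyArray<LineControl> = [
    { name: "CR", ebcdic: 0x0d, unicode: 0x000d },
    { name: "NL", ebcdic: 0x15, unicode: 0x0085 }, // NEXT LINE (NEL)
    { name: "LF", ebcdic: 0x25, unicode: 0x000a },
];

// Zero-padded uppercase hex without relying on String.prototype.padStart.
function hex(value: number, width: number): string {
    return ("0000" + value.toString(16).toUpperCase()).slice(-width);
}

function formatEntry(entry: LineControl): string {
    return entry.name + ": EBCDIC 0x" + hex(entry.ebcdic, 2) + " -> U+" + hex(entry.unicode, 4);
}

const lines = lineControls.map(formatEntry);
```

In the same tables, byte 0x20 maps to U+0080, which is a different control character from NL.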

devin122 · 2017-04-16T21:54:30Z

I'm not really sure how many programs handle U+0085 properly. The other side is the reverse conversion: with LF being the usual line terminator, it gets converted to EBCDIC LF (0x25). I need to double-check, but the EBCDIC machines I've had access to do not like this at all. They want NL line endings.

RovoMe · 2020-06-30T15:40:54Z

As I also have to add support for various EBCDIC encodings, I did a little research on this matter and found an implementation by IBM, which they open-sourced.

Here ConversionMaps is used to map between encodings and code pages (or more formally CCSIDs). In ConvTable this mapping is then used to load the respective converter (e.g. ConvTable1140 to map between Unicode and EBCDIC; CCSID 037 = Euro update 1140 according to "Code pages with Latin-1 character sets" in the Wikipedia entry). Skimming through their codebase, a good number of such mappings are available, which might be helpful in adding support for those encodings to iconv-lite.

Using a somewhat more complex EBCDIC sample taken from this page, I was able, after some back-and-forth conversions and modifying my local sbcs-data.js file, to validate the ebcdic.txt sample file against the ascii.txt file with a test like this:

    import * as assert from "assert";
    import * as fs from "fs";
    import * as iconv from "iconv-lite";

    it("Read EBCDIC from stream", (done) => {
        // Strip every CR and LF from the reference text before comparing.
        const expected: string = fs.readFileSync("./test/ascii.txt", "latin1").replace(/[\r\n]/g, "");

        // https://querysurge.zendesk.com/hc/en-us/articles/215029906-QuerySurge-and-Mainframe-Data-EBCDIC-Files
        // the EBCDIC file is UTF-8 encoded, so we'll need to specify this in the call. For the output
        // ASCII file, we'll use the ISO-8859-1 encoding. The record length for the sample file is 67
        // bytes
        const stream =
            fs.createReadStream("./test/ebcdic.txt")
                .pipe(iconv.decodeStream("utf8"))
                .pipe(iconv.encodeStream("iso88591"))
                .pipe(iconv.decodeStream("ebcdic037"))
                // .pipe(iconv.decodeStream("ebcdic1140"))
                // .pipe(iconv.decodeStream("ebcdic500"))
                // .pipe(iconv.decodeStream("ebcdic1148"))
        ;

        // decodeStream emits strings; join them directly (Array#toString would insert commas).
        const chunks: string[] = [];
        stream.on("data", (chunk: string) => chunks.push(chunk));
        stream.on("error", done);
        // Signal completion via done so the assertion runs before the test finishes.
        stream.on("end", () => {
            assert.strictEqual(chunks.join(""), expected);
            done();
        });
    });

This sample test works with CCSIDs 037, 277, 280, 284, 285, 297, 500, and 1047, but fails for others, e.g. 273.

BTW, one can check EBCDIC files in IntelliJ quite easily just by changing the file encoding from the default UTF-8 to e.g. IBM01140 or similar ones. Unfortunately, I need such support in Visual Studio Code, which seems to rely on jschardet and iconv-lite to probe and convert between encodings.

HTH

ashtuchkin · 2020-07-01T19:31:35Z

Thanks for the research @RovoMe! Any specific action items you would like to add here, or is it mostly additional info?

I always try to generate the encodings directly from authoritative sources, e.g. in https://github.com/ashtuchkin/iconv-lite/blob/master/generation/gen-dbcs.js we download the corresponding tables from unicode.org or encoding.spec.whatwg.org.

To support EBCDIC, ideally I'd want something like gen-ebcdic.js that downloads the tables from unicode.org and transforms them to iconv-lite format. Java sources do not work well for that purpose, unfortunately.

Also I think the NL concern by @devin122 is valid (see https://en.wikipedia.org/wiki/Newline#Representation). We might want to address it by 1) encoding/decoding without changes by default, this would keep 1:1 representation of all latin1 characters, but then 2) add a codec option like EBCDICNLConversion: '\n', which would enable conversion of NL char to corresponding char(s). This conversion can probably be a separate PR.
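A rough sketch of what step 2 might look like as a pair of string transforms applied around an unchanged 1:1 codec. Nothing here is an existing iconv-lite API: `nelToNewline`, `newlineToNel`, and the `newline` parameter are hypothetical stand-ins for whatever the codec option would eventually do.

```typescript
const NEL = "\u0085"; // what EBCDIC NL (0x15) decodes to under the 1:1 tables

// Hypothetical post-decode step: replace NEL with the configured newline.
function nelToNewline(decoded: string, newline: string = "\n"): string {
    return decoded.split(NEL).join(newline);
}

// Hypothetical pre-encode step: replace the configured newline with NEL,
// so the encoder emits EBCDIC NL (0x15) rather than EBCDIC LF (0x25).
function newlineToNel(text: string, newline: string = "\n"): string {
    return text.split(newline).join(NEL);
}

const decoded = "LINE 1" + NEL + "LINE 2" + NEL;
const forEditing = nelToNewline(decoded); // "LINE 1\nLINE 2\n"
const forEncoding = newlineToNel(forEditing); // identical to decoded
```

Note that the round trip is only lossless when the text does not already mix NEL with the configured newline, which is one reason to keep the conversion opt-in.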

Finally, FYI, we do work on integrating iconv-lite into VS Code, but it hasn't happened yet.

Fish1 · 2021-08-11T19:14:33Z

I would like this please. Thanks!

Dman247 · 2021-08-11T19:40:01Z

Agreed, it would be very helpful to have the capability of opening encodings like CP037.

GitMensch · 2022-08-23T16:48:46Z

Is there any chance this PR is going forward?

GitMensch · 2023-11-06T17:53:08Z

vscode depends on this issue - microsoft/vscode#49891 is "the big and old" one, duplicates are at least microsoft/vscode#147064 microsoft/vscode#179693.

@ashtuchkin Can you take a look at integrating this and publish a new version?

Mithgol added 2 commits November 6, 2015 11:45

Added EBCDIC 037 and EBCDIC 1140 encodings.

b56ec86

Added EBCDIC 500 and EBCDIC 1148 encodings.

4001361

Mithgol changed the title ~~add EBCDIC 037 and EBCDIC 1140 encodings~~ add some EBCDIC encodings Nov 6, 2015

Mithgol mentioned this pull request Nov 6, 2015

EBCDIC #111

Open

ashtuchkin force-pushed the master branch from 978c58b to 5148f43 Compare June 8, 2020 08:19

ashtuchkin force-pushed the master branch 4 times, most recently from 84ee650 to 9aa082f Compare July 16, 2020 08:07

ashtuchkin force-pushed the master branch from 5d99a92 to ed88711 Compare May 23, 2021 22:34

lukebrowell mentioned this pull request Jul 3, 2021

Support EBCDIC encodings microsoft/vscode#49891

Closed

ap891843 mentioned this pull request Feb 7, 2022

fix: Provide a setting to define encoding for USS files. eclipse-che4z/che-che4z-lsp-for-cobol#1237

Merged
