onDataString can mangle multi-byte characters  #37

Closed
@nwolverson

onDataString is easy to use but is unfortunately a subtle foot-gun.

This function calls the underlying Node API, which returns a Buffer, and then calls Buffer.toString on the result.

But if the encoding of a character spans two data events, each Buffer.toString call will replace the incomplete initial/trailing code units with the replacement character (U+FFFD).
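To illustrate, here is a minimal sketch in TypeScript against Node's Buffer API directly (not this library's bindings). The chunk boundary is simulated by hand, and the names `whole`, `first`, `second`, `perChunk` and `joined` are made up for the example:

```typescript
import { Buffer } from "node:buffer";

// "é" is encoded in UTF-8 as the two bytes 0xC3 0xA9.
const whole = Buffer.from("é", "utf8");

// Simulate a stream that delivers the character split across two data events.
const first = whole.subarray(0, 1);  // <c3>
const second = whole.subarray(1);    // <a9>

// Decoding each chunk on its own: neither half is a valid sequence,
// so each becomes U+FFFD.
const perChunk = first.toString("utf8") + second.toString("utf8");

// Decoding only after the bytes have been joined preserves the character.
const joined = Buffer.concat([first, second]).toString("utf8");

console.log(JSON.stringify(perChunk)); // two U+FFFD replacement characters
console.log(joined);                   // é
```

Where a real stream places its chunk boundaries depends on the source and buffer sizes, so the corruption tends to show up only intermittently.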

As it stands, onDataString should be documented with a clear warning: it is not suitable for general-purpose use, only for streams that are guaranteed either to contain single-byte encoded characters or to have known short lengths.

The comment on setEncoding unfortunately recommends this:

"Where possible, you should try to use onDataString instead of this function."

I think if this recommended onData instead, it would be clearer to the user that something fishy is going on.
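For comparison, a sketch of decoding that carries incomplete sequences over between chunks, using Node's `string_decoder` module. As far as I know this is the mechanism Node's own `readable.setEncoding` relies on, but treat that as an assumption rather than a statement about this library's bindings. The chunk split and the names `chunks` and `decoded` are again made up:

```typescript
import { Buffer } from "node:buffer";
import { StringDecoder } from "node:string_decoder";

// "€" is encoded in UTF-8 as the three bytes 0xE2 0x82 0xAC.
// Pretend the stream delivered them in two separate data events.
const chunks: Buffer[] = [Buffer.from([0xe2, 0x82]), Buffer.from([0xac])];

// The decoder holds back incomplete trailing bytes until the rest arrives.
const decoder = new StringDecoder("utf8");

let decoded = "";
for (const chunk of chunks) {
  decoded += decoder.write(chunk);
}
// Flush anything still buffered once the stream has ended.
decoded += decoder.end();

console.log(decoded); // €
```

With raw onData, the equivalent is to collect the Buffers and call Buffer.concat before decoding, or to feed each Buffer through a single decoder as above.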

Labels: type: documentation (Improvements or additions to documentation.)
