adding wcwidth for char in libcore #15224

kwantam · 2014-06-27T20:33:58Z

It would be nice to have a wcwidth-alike, presumably living in core::unicode and exposed as a char method. I've got a working local implementation of this that automatically generates the search tables for zero-width and double-width characters from the latest Unicode data (this does not need to be done at build time, only when the Unicode charsets are updated).

If this is a desirable feature, can you give me some guidance as to:

- What the function should be called (wcwidth is pretty C-ish and maybe not so Rustic).
- Where the autogen stuff should live (maybe in src/etc/unicode_width?)
- Whether you'd like me to add a wcwidth_cjk-equivalent function, whose behavior is slightly different on certain spacing characters, as described in http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c (see comment above the mk_wcwidth_cjk function).

Any other thoughts are, of course, appreciated.
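The table-driven lookup described in this post can be sketched roughly as follows. This is only an illustration, not kwantam's implementation: the names `ZERO_WIDTH`, `DOUBLE_WIDTH`, `in_table`, and `width_sketch` are hypothetical, and the two tables are tiny hand-picked excerpts rather than data generated from the Unicode files. Control characters are ignored here to keep the sketch short.

```rust
use std::cmp::Ordering;

// Hypothetical, hand-picked excerpts. A real table would be generated
// from the Unicode data files and would contain many more ranges.
// Each entry is an inclusive (low, high) range, sorted by code point.
const ZERO_WIDTH: &[(char, char)] = &[
    ('\u{0300}', '\u{036F}'), // combining diacritical marks
    ('\u{200B}', '\u{200F}'), // zero-width space, joiners, direction marks
];

const DOUBLE_WIDTH: &[(char, char)] = &[
    ('\u{1100}', '\u{115F}'), // Hangul Jamo initial consonants
    ('\u{4E00}', '\u{9FFF}'), // CJK unified ideographs
    ('\u{AC00}', '\u{D7A3}'), // Hangul syllables
];

/// Binary search over sorted, non-overlapping inclusive ranges.
fn in_table(c: char, table: &[(char, char)]) -> bool {
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < c {
                Ordering::Less // this range lies entirely below `c`
            } else if lo > c {
                Ordering::Greater // this range lies entirely above `c`
            } else {
                Ordering::Equal // `c` falls inside this range
            }
        })
        .is_ok()
}

/// Column width of `c` under the simplified tables above:
/// 0 for zero-width ranges, 2 for double-width ranges, 1 otherwise.
fn width_sketch(c: char) -> usize {
    if in_table(c, ZERO_WIDTH) {
        0
    } else if in_table(c, DOUBLE_WIDTH) {
        2
    } else {
        1
    }
}
```

Because the tables are plain sorted constants, they only need to be regenerated when the Unicode data changes, which matches the point above about not doing this at build time.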


huonw · 2014-06-28T00:11:16Z

> What the function should be called (wcwidth is pretty C-ish and maybe not so Rustic).

It can be a function/method on char like 'a'.width() or something.

> Where the autogen stuff should live (maybe in src/etc/unicode_width?)

There's a unicode.py script already.

> Whether you'd like me to add a wcwidth_cjk-equivalent function, whose behavior is slightly different on certain spacing characters, as described in http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c (see comment above the mk_wcwidth_cjk function).

The comment suggests that this is mainly a legacy thing, and so (in my mind) isn't a high priority.

Add libunicode; move unicode functions from core

- created new crate, libunicode, below libstd
- split `Char` trait into `Char` (libcore) and `UnicodeChar` (libunicode)
  - Unicode-aware functions now live in libunicode
    - `is_alphabetic`, `is_XID_start`, `is_XID_continue`, `is_lowercase`, `is_uppercase`, `is_whitespace`, `is_alphanumeric`, `is_control`, `is_digit`, `to_uppercase`, `to_lowercase`
  - added `width` method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is a non-NULL control character
    - takes a boolean argument indicating whether the present context is CJK or not (characters with 'A'mbiguous widths are double-wide in CJK contexts, single-wide otherwise)
- split `StrSlice` into `StrSlice` (libcore) and `UnicodeStrSlice` (libunicode)
  - functionality formerly in `StrSlice` that relied upon Unicode functionality from `Char` is now in `UnicodeStrSlice`
    - `words`, `is_whitespace`, `is_alphanumeric`, `trim`, `trim_left`, `trim_right`
  - also moved `Words` type alias into libunicode because `words` method is in `UnicodeStrSlice`
- unified Unicode tables from libcollections, libcore, and libregex into libunicode
- updated `unicode.py` in `src/etc` to generate aforementioned tables
- generated new tables based on latest Unicode data
- added `UnicodeChar` and `UnicodeStrSlice` traits to prelude
- libunicode is now the collection point for the `std::char` module, combining the libunicode functionality with the `Char` functionality from libcore
  - thus, moved doc comment for `char` from `core::char` to `unicode::char`
- libcollections remains the collection point for `std::str`

The Unicode-aware functions that previously lived in the `Char` and `StrSlice` traits are no longer available to programs that only use libcore. To regain use of these methods, include the libunicode crate and `use` the `UnicodeChar` and/or `UnicodeStrSlice` traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar and UnicodeStrSlice have been added to the prelude.

closes #15224
[breaking-change]
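The semantics the commit message gives for `width` (an optional column count, plus a flag for CJK contexts) could look something like the sketch below in present-day Rust. It is not the libunicode code: the trait name `WidthSketch` is hypothetical, the ranges are a few hand-picked examples rather than generated tables, and returning `Some(0)` for NUL is an assumption based on the "non-NULL control character" wording and on C's `wcwidth`.

```rust
// Hypothetical extension trait, loosely modeled on the `width` method
// described in the commit message. Not the actual libunicode API.
trait WidthSketch {
    /// Printed width in columns, or `None` for non-NUL control characters.
    /// `is_cjk` selects how ambiguous-width characters are treated.
    fn width(self, is_cjk: bool) -> Option<usize>;
}

impl WidthSketch for char {
    fn width(self, is_cjk: bool) -> Option<usize> {
        match self {
            // Assumption: NUL is zero columns wide, as with C's wcwidth.
            '\0' => Some(0),
            // Every other control character has no defined width.
            c if c.is_control() => None,
            // A few East Asian "Ambiguous" characters (hand-picked examples):
            // double-wide in CJK contexts, single-wide otherwise.
            '\u{00A1}' | '\u{00A7}' | '\u{2460}'..='\u{24E9}' => {
                Some(if is_cjk { 2 } else { 1 })
            }
            // Combining diacritical marks occupy no column of their own.
            '\u{0300}'..='\u{036F}' => Some(0),
            // CJK unified ideographs are always double-wide.
            '\u{4E00}'..='\u{9FFF}' => Some(2),
            // Everything else is treated as single-wide in this sketch.
            _ => Some(1),
        }
    }
}
```

With the trait in scope, `'a'.width(false)` yields `Some(1)` and `'\n'.width(false)` yields `None`, while `'\u{00A7}'` is reported as one column or two depending on the `is_cjk` flag.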

Replace `x` with `it`

I kept some usages of `x`:
* `x`s that are used together with `y`, `z`, ...
* `x`s that shadow `it`. I use `it` for iterators out of r-a, so there were some cases where I used `it` and `x` together.
* `x` in test fixtures. Many of those `x` usages were not mine, so I thought it better to keep them as is.

I tried to remove the rest, but since there were too many `x`s, I might have missed some of them or changed some that I didn't want to change.

huonw added the A-libs label Jun 28, 2014

kwantam mentioned this issue Jun 30, 2014

move Unicode functionality from libcore/collections/regex into libunicode ; add width() method for char ; update libunicode tables for Unicode 7.0 #15283

Merged

bors closed this as completed in #15283 Jul 9, 2014
