adding wcwidth for char in libcore #15224

Closed
kwantam opened this issue Jun 27, 2014 · 1 comment · Fixed by #15283
Comments

@kwantam
Contributor

kwantam commented Jun 27, 2014

It would be nice to have a wcwidth-alike, presumably living in core::unicode and exposed as a char method. I've got a working local implementation of this that automatically generates the search tables for zero-width and double-width characters from the latest Unicode data (this does not need to be done at build time, only when the Unicode data files are updated).

If this is a desirable feature, can you give me some guidance as to

  1. What the function should be called (wcwidth is pretty C-ish and maybe not so Rustic).
  2. Where the autogen stuff should live (maybe in src/etc/unicode_width?)
  3. Whether you'd like me to add a wcwidth_cjk-equivalent function, whose behavior is slightly different on certain spacing characters, as described in http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c (see comment above the mk_wcwidth_cjk function).

Any other thoughts are, of course, appreciated.
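
For readers unfamiliar with the table-driven approach mentioned above, here is a minimal sketch in present-day Rust of how such a lookup might work. It is not the implementation from this issue: the names `ZERO_WIDTH`, `DOUBLE_WIDTH`, `in_table`, and `sketch_width` are hypothetical, and the tables are tiny hand-picked excerpts rather than output generated from the Unicode data files.

```rust
use std::cmp::Ordering;

// Hypothetical, heavily truncated excerpts. A real table would be
// generated from the Unicode data files and cover many more ranges.
// Each entry is an inclusive (low, high) code point range, and the
// ranges are sorted so that binary search works.
const ZERO_WIDTH: &[(u32, u32)] = &[
    (0x0300, 0x036F), // combining diacritical marks
    (0x200B, 0x200F), // zero width space .. right-to-left mark
];

const DOUBLE_WIDTH: &[(u32, u32)] = &[
    (0x1100, 0x115F), // Hangul Jamo initial consonants
    (0x4E00, 0x9FFF), // CJK unified ideographs
    (0xFF01, 0xFF60), // fullwidth forms
];

// Binary search over sorted, non-overlapping ranges.
fn in_table(c: char, table: &[(u32, u32)]) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

// Column width under the excerpt tables: 0, 2, or 1 by default.
// Control characters are deliberately not handled in this sketch.
fn sketch_width(c: char) -> usize {
    if in_table(c, ZERO_WIDTH) {
        0
    } else if in_table(c, DOUBLE_WIDTH) {
        2
    } else {
        1
    }
}
```

With these excerpts, `sketch_width('a')` falls through to the default of one column, while a combining accent such as U+0301 hits the zero-width table and a CJK ideograph hits the double-width table.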

@huonw
Member

huonw commented Jun 28, 2014

> What the function should be called (wcwidth is pretty C-ish and maybe not so Rustic).

It can be a function/method on char like 'a'.width() or something.

> Where the autogen stuff should live (maybe in src/etc/unicode_width?)

There's a unicode.py script already.

> Whether you'd like me to add a wcwidth_cjk-equivalent function, whose behavior is slightly different on certain spacing characters, as described in http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c (see comment above the mk_wcwidth_cjk function).

The comment suggests that this is mainly a legacy thing, and so (in my mind) isn't a high priority.

@huonw huonw added the A-libs label Jun 28, 2014
bors added a commit that referenced this issue Jul 9, 2014
Add libunicode; move unicode functions from core

- created new crate, libunicode, below libstd
- split `Char` trait into `Char` (libcore) and `UnicodeChar` (libunicode)
  - Unicode-aware functions now live in libunicode
    - `is_alphabetic`, `is_XID_start`, `is_XID_continue`, `is_lowercase`,
      `is_uppercase`, `is_whitespace`, `is_alphanumeric`, `is_control`, `is_digit`,
      `to_uppercase`, `to_lowercase`
  - added `width` method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is a non-NULL control character
    - takes a boolean argument indicating whether the present context is CJK or not (characters with 'A'mbiguous widths are double-wide in CJK contexts, single-wide otherwise)
- split `StrSlice` into `StrSlice` (libcore) and `UnicodeStrSlice` (libunicode)
  - functionality formerly in `StrSlice` that relied upon Unicode functionality from `Char` is now in `UnicodeStrSlice`
    - `words`, `is_whitespace`, `is_alphanumeric`, `trim`, `trim_left`, `trim_right`
  - also moved `Words` type alias into libunicode because `words` method is in `UnicodeStrSlice`
- unified Unicode tables from libcollections, libcore, and libregex into libunicode
- updated `unicode.py` in `src/etc` to generate aforementioned tables
- generated new tables based on latest Unicode data
- added `UnicodeChar` and `UnicodeStrSlice` traits to prelude
- libunicode is now the collection point for the `std::char` module, combining the libunicode functionality with the `Char` functionality from libcore
  - thus, moved doc comment for `char` from `core::char` to `unicode::char`
- libcollections remains the collection point for `std::str`

The Unicode-aware functions that previously lived in the `Char` and `StrSlice` traits are no longer available to programs that only use libcore. To regain use of these methods, include the libunicode crate and `use` the `UnicodeChar` and/or `UnicodeStrSlice` traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar and UnicodeStrSlice have been added to the prelude.

closes #15224
[breaking-change]
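
The `width` behaviour described in the commit message above (an optional column count, plus a flag for CJK context) can be illustrated with a small sketch in present-day Rust. This is not the libunicode code: the trait `WidthSketch`, the method `width_sketch`, and the three range tables are hypothetical, the tables are short hand-picked excerpts, and returning `Some(0)` for NULL is an assumption borrowed from the wcwidth.c convention linked earlier in the thread.

```rust
// Hypothetical stand-in for the `width` method described above.
trait WidthSketch {
    fn width_sketch(self, is_cjk: bool) -> Option<usize>;
}

// Truncated, illustrative excerpts of inclusive code point ranges.
const ZERO_RANGES: &[(u32, u32)] = &[(0x0300, 0x036F)]; // combining marks
const WIDE_RANGES: &[(u32, u32)] = &[
    (0x1100, 0x115F), // Hangul Jamo initial consonants
    (0x4E00, 0x9FFF), // CJK unified ideographs
];
const AMBIGUOUS_RANGES: &[(u32, u32)] = &[
    (0x00B0, 0x00B4), // degree sign .. acute accent
    (0x0391, 0x03A1), // Greek capital alpha .. rho
    (0x03B1, 0x03C1), // Greek small alpha .. rho
];

// Linear scan; fine for a sketch, a real table would use binary search.
fn in_ranges(cp: u32, ranges: &[(u32, u32)]) -> bool {
    ranges.iter().any(|&(lo, hi)| lo <= cp && cp <= hi)
}

impl WidthSketch for char {
    fn width_sketch(self, is_cjk: bool) -> Option<usize> {
        let cp = self as u32;
        if cp == 0 {
            return Some(0); // assumption: NULL occupies no columns
        }
        if self.is_control() {
            return None; // non-NULL control characters have no width
        }
        if in_ranges(cp, ZERO_RANGES) {
            return Some(0);
        }
        if in_ranges(cp, WIDE_RANGES) {
            return Some(2);
        }
        if is_cjk && in_ranges(cp, AMBIGUOUS_RANGES) {
            return Some(2); // ambiguous width: double-wide only in CJK context
        }
        Some(1)
    }
}
```

Under these excerpts, a Greek small alpha reports two columns when `is_cjk` is `true` and one column otherwise, and a newline reports `None`. Returning `Option` rather than wcwidth's `-1` sentinel forces callers to decide how to treat control characters.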
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 17, 2023
Replace `x` with `it`

I kept some usages of `x`:
* `x`s that are used together with `y`, `z`, ...
* `x` that shadow `it`. I use `it` for iterators out of r-a, so there were some cases that I used `it` and `x` together.
* `x` in test fixtures. Many of those `x` usages were not mine, so I thought it better to keep them as is.

I tried to remove the rest, but since there were too many `x`s I might have missed some of them or changed some that I didn't want to change.
2 participants