Skip to content

Treat args/env as lossy UTF-8 #12283

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
wants to merge 3 commits into from
Closed

Conversation

lilyball
Copy link
Contributor

Change os::args() and os::env() to use str::from_utf8_lossy().
Add new functions os::args_as_bytes() and os::env_as_bytes() to retrieve the args/env as byte vectors instead.

The existing methods were left returning strings because I expect that the common use-case is to want string handling.

Fixes #7188.

pub fn as_bytes_no_nul<'a>(&'a self) -> &'a [u8] {
if self.buf.is_null() { fail!("CString is null!"); }
unsafe {
cast::transmute((self.buf, self.len()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transmute a raw::Slice to avoid accidentally getting the order mixed up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I was copying as_bytes() but this is a good excuse to fix that.

os::args() was using str::raw::from_c_str(), which would assert if the
C-string wasn't valid UTF-8. Switch to using from_utf8_lossy() instead,
and add a separate function os::args_as_bytes() that returns the ~[u8]
byte-vectors instead.
Parse the environment by default with from_utf8_lossy. Also provide
byte-vector equivalents (e.g. os::env_as_bytes()).

Unfortunately, setenv() can't have a byte-vector equivalent because of
Windows support, unless we want to define a setenv_bytes() that fails
under Windows for non-UTF8 (or non-UTF16).
@lilyball
Copy link
Contributor Author

New version pushed

@erickt
Copy link
Contributor

erickt commented Feb 15, 2014

Generally I like this approach. Why .as_bytes_no_nul instead of .as_bytes_no_null or .as_bytes_without_null? We use null far more often than nul. The only other nul I could find was some internal functions in path, which should probably also be renamed too.

@lilyball
Copy link
Contributor Author

The character '\0' is correctly termed NUL. "null" is a generic word that I've seen some people use in Rust to mean '\0' but more commonly is used to me the zero pointer. And .as_bytes_wihout_nul just seems unnecessarily long.

bors added a commit that referenced this pull request Feb 15, 2014
Change `os::args()` and `os::env()` to use `str::from_utf8_lossy()`.
Add new functions `os::args_as_bytes()` and `os::env_as_bytes()` to retrieve the args/env as byte vectors instead.

The existing methods were left returning strings because I expect that the common use-case is to want string handling.

Fixes #7188.
@bors bors closed this Feb 15, 2014
@lilyball lilyball deleted the env-args-bytes branch February 15, 2014 20:24
flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 26, 2024
Minor refactor format-args

* Move all linting logic into a single format implementations struct

This should help with the future format-args improvements.

**NOTE TO REVIEWERS**:  use "hide whitespace" in the github diff -- most of the code has shifted, but relatively low number of lines actually modified.

Followig up from rust-lang#12274

r? `@xFrednet`

---

changelog: none
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Current ~str UTF-8 behavior allows for denial-of-service attack with args, environ
4 participants