Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

files of infinite length cause bat to crash #636

Open
rgreenblatt opened this issue Aug 23, 2019 · 19 comments · May be fixed by #2902
Open

files of infinite length cause bat to crash #636

rgreenblatt opened this issue Aug 23, 2019 · 19 comments · May be fixed by #2902
Labels
bug Something isn't working

Comments

@rgreenblatt
Copy link

bat /dev/zero
memory allocation of 34359738368 bytes failed
@sharkdp
Copy link
Owner

sharkdp commented Aug 23, 2019

Thank you for reporting this. I can reproduce this.

I guess this is caused by the fact that we read every single line into a Vec<u8> buffer without any size restriction:

bat/src/inputfile.rs

Lines 38 to 40 in 187a3b9

pub fn read_line(&mut self, buf: &mut Vec<u8>) -> io::Result<bool> {
if self.first_line.is_empty() {
let res = self.inner.read_until(b'\n', buf).map(|size| size > 0)?;

@sharkdp sharkdp added the bug Something isn't working label Aug 23, 2019
@treere
Copy link

treere commented Oct 26, 2019

Hi,
I'm working on a fix.
The possible solutions are:

  • stop reading a line longer than X and report an error
  • split a line longer then X into small pieces

Which solution is better?
What I should do with the highlight?

@sharkdp
Copy link
Owner

sharkdp commented Oct 28, 2019

I'm working on a fix.

Sounds great.

Which solution is better?
What I should do with the highlight?

I would have to experiment with this myself to answer your questions. I could imagine that this is not easy to fix. For example, we need to make sure to still support the non-buffering/"immediate-response" behavior of bat when reading from STDIN, etc.

split a line longer then X into small pieces

Without having thought too much about it, I believe that would be my first try. Not sure if we can feed a non-finished line to the highlighter.

@Enselic
Copy link
Collaborator

Enselic commented Aug 18, 2022

Anyone looking to fix this issue should take a look at this PR which was an attempt at doing just that: #712

I am explicitly mentioning that so we can close the above PR to keep the PR inbox clean.

@aarondill
Copy link

I'd like to point out in case it has not yet been realized, this also affect when bat is piped into another program (ie bat /dev/zero | tr '\0' '0'). This is not the effect that the standard cat has when reading a file, breaking the drop-in replaceability of cat.

"""
Whenever the output of bat goes to a non-interactive terminal, i.e. when the output is piped into another process or into a file, bat will act as a drop-in replacement for cat(1) and fall back to printing the plain file contents.
"""

@deboard
Copy link

deboard commented Jan 3, 2024

Please assign this issue to me.

@deboard
Copy link

deboard commented Jan 3, 2024

Interesting...
With the machine I am on right now I get "memory allocation of 68719476736 bytes failed" then a core dump.
I see this type in the rust docs: https://docs.rs/typenum/latest/i686-unknown-linux-gnu/typenum/consts/type.N68719476736.html
pub type N68719476736 = NInt
The original number was 34359738368, this is also a type: https://docs.rs/typenum/latest/i686-unknown-linux-gnu/typenum/consts/type.U34359738368.html

It looks like rust has expanded the max amount of memory that can fit inside a Vec since this was first reported (isn't this a Vec memory problem?).

@einfachIrgendwer0815
Copy link
Contributor

It looks like rust has expanded the max amount of memory that can fit inside a Vec since this was first reported (isn't this a Vec memory problem?).

I don't think it's a Vec memory problem because a line in an infinitely long file might as well be infinitely long (which is definitely true for /dev/zero). An infinitely long line would require an infinite amount of memory, no matter how much actually fits in a Vec (which likely depends on how much memory the system has?).

@deboard
Copy link

deboard commented Jan 4, 2024

True.

It looks like if you call std::fs::metadata("/dev/zero").unwrap().len(), the len value is zero. It might be a good fix to check the length of a (any) file and if it is 0 don't try to read it in.

@deboard
Copy link

deboard commented Jan 11, 2024

Hmm, well i have tried several different things. The latest being

                    if !path.is_file() {
                        // File does not exist or is a dir?
                        let fname = String::from(path.to_string_lossy());
                        return Err(format!("'{fname}' does not exist.").into());
                    }

                    let fname = String::from(path.to_string_lossy());
                    let meta = std::fs::metadata(&fname).unwrap();
                    if !meta.is_symlink() {
                        if  meta.len() == 0 {
                            // If the file is zero length do not attempt to open it.
                            // Files like /dev/zero will cause bat to crash or be killed.
                            return Err(format!("'{fname}' is a zero length file.").into());
                        }
                    }

But that breaks the test below bacause a zero length error is returned. But I test is the file exists and if it is a symlink

assets::tests::syntax_detection_for_symlinked_file' panicked at src/assets.rs

What I don't get is the file passed into Input::ordinary_file in that test seems to be a symlink... Any ideas?

@einfachIrgendwer0815
Copy link
Contributor

std::fs::metadata follows symlinks. So, when the file is a real file, std::fs::metadata returns that file's metadata. But when the file is a symlink, std::fs::metadata returns the metadata of the file (or directory) the symlink points to. std::fs::symlink_metadata should do the trick.

However, I'm not certain a zero-size check is sufficient since files under /proc report to have a size of 0 without being an infinite file. Additionally, real empty files will also have a size of 0 and bat shouldn't mind reading those. Though, apparently /dev/zero doesn't claim to be a real file, but a character device instead.

@einfachIrgendwer0815
Copy link
Contributor

A different idea could be the following:
When the length of a line exceeds a certain limit, all it's content gets discarded, but reading continues until the end of line. On the next line, reading continues as usual. The line being too long could be replaced by a short note.

Imagine line 52 being too long to read:

 51   │ ...
 52 ! │ <line too long>
 53   │ ...

(the '!' would have the purpose of making clear that <line too long> is not actual content)

Pros:

  • following lines that are reasonably long can be displayed anyway
  • the user is told exactly which lines are too long

Cons:

  • when reading an infinite line, bat will still hang forever

To not make bat hang forever, a second limit could be added. When that one is reached, bat would abort immediately.

What is your opinion on that approach?

@deboard
Copy link

deboard commented Jan 14, 2024

Interseting. Right now I am looking at doing this, but that may be a better way, I'll see how your suggestion works. Because, just dont crash:

                    #[cfg(unix)]
                    {
                        let fname = String::from(path.to_string_lossy());
                        if fname == "/dev/zero" {
                            // /dev/zero seems to be a special case file for bat on unix.
                            // It is an empty infinate length char file.
                            // This makes it imposible to read in the first line.
                            // Reading in the first line is the first thing
                            // InputReader::new does.
                            // I hate hard coding things like this, hack...
                            // Files like /dev/tty cause bat to hang but
                            // not crash like /dev/zero.
                            return Err(format!("'{fname}' <EMPTY>.").into());
                        }
                    }

@aarondill
Copy link

if you choose to go the hard coding route, would it be possible to use std::os::unix::fs::MetadataExt.rdev to get the device number and check if it's 1:5 instead?

this would protect against files created using mknod, as well as permitting the (bad idea, but allowed) use case of /dev/zero not being the zero character file.

@einfachIrgendwer0815
Copy link
Contributor

if you choose to go the hard coding route, would it be possible to use std::os::unix::fs::MetadataExt.rdev to get the device number and check if it's 1:5 instead?

That should be reliable (https://www.kernel.org/doc/html/latest/admin-guide/devices.html), but this is a very specialized solution. There are other such files, /dev/random (1:8), /dev/urandom (1:9), or any symbolic link to one of these.

To my understanding, the title of this issue and the author's initial comment just point out a very extreme case of the actual problem. In fact, you don't need an infinite file to let bat run out of memory, it's enough to use a regular file that is bigger than the available memory. Therefore, this problem is not limited to Linux and not to just a handful of special files. That's why I think this issue should be tackled with a more general and platform independent solution.

@deboard
Copy link

deboard commented Jan 16, 2024

It would be better to have the fix based on the actual memory problem rather than special cases. I'll keep poking at it.

@deboard
Copy link

deboard commented Jan 27, 2024

Is there a requirement or is it the desired behaviour to page binary data? If not this is a possible fix:

        // try to read in 100 bytes, see what the content type is
        let mut hundred_bytes = vec![0; 100 as usize];
        reader.read(&mut hundred_bytes).ok();
        let content_type = if hundred_bytes.is_empty() {
            None
        } else {
            Some(content_inspector::inspect(&hundred_bytes))
        };

        let mut first_line = vec![];

        match content_type {
            Some(ContentType::BINARY) => {
                // NO-OP on binary data
                // /dev/zero falls under this category
                // So does a .jpeg file...:w
                // 
            },
            Some(ContentType::UTF_8) |
            Some(ContentType::UTF_8_BOM) => {
                reader.read_until(b'\n', &mut first_line).ok();
            },
            Some(ContentType::UTF_16LE) |
            Some(ContentType::UTF_16BE)  |
            Some(ContentType::UTF_32LE) |
            Some(ContentType::UTF_32BE) |
            None => /* Not sure about reading on None... */ {
                reader.read_until(0x00, &mut first_line).ok();
            },
        };

@deboard
Copy link

deboard commented Jan 27, 2024

Actually, I think I can make an improvement on the code above.

@deboard
Copy link

deboard commented Jan 28, 2024

I am seeing an issue with with the idea of creating the InputReader from any "OrdinaryFile", just read in a "first line" and expect everything will be OK. More analysis on the files attributes and its content type need to happen before reading in a first line. For example, some files do not have a "line" in the text file sort of context.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
7 participants