Reject reads that exceed our buffer length in png_push_fill_buffer. #552
Conversation
Reproducer:
Without the patch, MSAN reports a failure in the CRC reading code, which calls png_push_fill_buffer.
@jbowler When you have a moment, PTAL? The tests won't run until a maintainer initiates them.
I have the same access as you. I can't believe this is a bug in libpng; please provide a repro, i.e. a test file and minimal test case.
#552 (comment) is a minimal repro case.
My apologies - my bad (using GitHub on a cellphone; there is a law about that somewhere, don't GitHub and cellphone?). This is weird; careful analysis will be required, because your code change is clearly correct if we believe in "self-documenting" code (i.e. buffer_size means what it suggests, not "last valid buffer index"). The problem is that the bug implies that any user of the progressive reader that reads to end-of-stream would detect an error, and any user of the progressive reader that used libpng to read a PNG embedded in a stream would have seen detectably weird behavior. This suggests a regression. The bug also confirms the lack of a working unit test for the progressive reader. Nasty. Can't see a CVE but there might be one.
Also, there have to be two bugs, don't there? The code change in question causes libpng to read an extra (initialized) byte from the buffer; one it should have read in the first place. But why, without the fix, is there an "uninitialized" byte available? That doesn't make any sense to me.
@johnstiles-google: please explain the change to line 462; it seems to me that it destroys the progressive read code.
Nope: this code change does not make any difference if the png_error is removed.
Please explain @johnstiles-google
@ctruta: please look at the code change VERY carefully. It is an obfuscated no-op apart from the addition of the png_error, which is certainly not a no-op.
Bear in mind that this is C, not python: indentation does not matter.
If somehow the powers define that this whole thing should be deleted I would be grateful if you would inform me directly via my email.
Read callbacks are given a buffer and a number of bytes to read. There is no mechanism for partially successful reads—either the read is fully satisfied, or it is an error. If you run the reproducer in MSAN you will see that the truncated read leaves part of the destination buffer uninitialized. My proposed change should cause a png_error instead of silently truncating the read. As you deduced, in cases where the read can be fully satisfied, the logic is unchanged.
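A standalone sketch of the all-or-nothing behavior being proposed (illustrative names like demo_fill_buffer; this is not the actual patch to libpng):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Model of a push-style fill: the caller asks for 'length' bytes and the
 * request is either fully satisfied from the buffered data or rejected.
 * We never copy fewer bytes and leave the tail of 'out' uninitialized,
 * which is the behavior MSAN flagged. */
static void demo_fill_buffer(const unsigned char *buffered, size_t avail,
                             unsigned char *out, size_t length)
{
   if (length > avail)
   {
      /* In libpng this would be a png_error() call. */
      fprintf(stderr, "read of %zu bytes exceeds %zu buffered bytes\n",
              length, avail);
      exit(EXIT_FAILURE);
   }

   memcpy(out, buffered, length);
}

int main(void)
{
   unsigned char data[4] = {1, 2, 3, 4};
   unsigned char out[8];

   demo_fill_buffer(data, sizeof data, out, 4);  /* fully satisfied */
   demo_fill_buffer(data, sizeof data, out, 8);  /* rejected, not truncated */
   return 0;
}
```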
Caveat: I don't know how to run the libpng unit tests locally, and don't have the permissions to run them on GitHub. As far as I can see, Chrome and Skia don't show any negative repercussions from this change from a unit-test perspective. I cannot reason through a way that the previous code could be correct—leaving the buffer partially unwritten seems like it could not ever be correct. However, I am not an expert in libpng, so there may be nuances that I am unfamiliar with.
I am not sure what you mean by this—the indentation isn't changed from before, nor is it abnormal.
The comment about indentation was misleading, but look at Guy's original code (which is substantially unaltered in this interface); it hasn't changed since Guy wrote it. This may help you understand why it works and why it doesn't read uninitialized data.
I believe this bug is in the original code, which only implemented progressive reading for the IDAT chunks. If I am correct, the bug arises because of a call to png_push_fill_buffer which is not preceded by the required check for available bytes. This is certainly true of the initial signature check (in 1.6.44), but that's not where this file seems to fail. There is another possibility which is much worse and which I won't mention here. In either case it looks like an internal libpng error, and it looks like it might be fairly serious (possible CVE level). This is an update; I will investigate more.
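The "required check" mentioned above is the progressive reader's convention of testing how many bytes are currently buffered before consuming them; here is a standalone model of that convention (illustrative names, not the pngpread.c source):

```c
#include <stdio.h>
#include <string.h>

/* Minimal model of the progressive reader's convention: a handler must
 * confirm that enough bytes are buffered before consuming them; if not,
 * it returns and waits for the application to push more data. */
struct demo_stream {
   const unsigned char *buf;   /* data pushed so far, not yet consumed */
   size_t buffer_size;         /* bytes available in buf */
};

/* Returns 1 and consumes n bytes if they are available, 0 if the caller
 * must defer.  Calling a fill routine without this check is the kind of
 * mistake hypothesized above. */
static int demo_try_fill(struct demo_stream *s, unsigned char *out, size_t n)
{
   if (s->buffer_size < n)
      return 0;                /* not enough data yet: wait for more */

   memcpy(out, s->buf, n);
   s->buf += n;
   s->buffer_size -= n;
   return 1;
}

int main(void)
{
   unsigned char pushed[13] = {0};
   unsigned char sig[8], hdr[13];
   struct demo_stream s = { pushed, sizeof pushed };

   printf("8-byte read:  %s\n", demo_try_fill(&s, sig, 8) ? "ok" : "deferred");
   printf("13-byte read: %s\n", demo_try_fill(&s, hdr, 13) ? "ok" : "deferred");
   return 0;
}
```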
Just to follow up, I have now run the libpng unit tests locally. (All you need to do is run configure and then make check.) That being said, I'm admittedly not an expert in libpng design philosophy—if there is a better way to fix the issue, that's fine too. I'm highly interested in solving the MSAN issue, but much less concerned about how we fix it. This seemed to be the simplest approach to me, and seemed to get to the root problem in a way that didn't add complexity, but maybe there is a better way.
@johnstiles-google I am completely unable to repro using your "reproducer". I attach a version which compiles against .libs/libpng16.a (GCC 13.2.1, with -g and -O2 for libpng). I pulled your patch for libpng, but when I run the program it exits normally, not via png_error/setjmp as it would if the repro worked. Please try this yourself. I examined the data; it contains two complete chunks and nothing else, so the save buffer never gets used (the pointer remains NULL) and there is always sufficient data in current_buffer_ptr. The read loop just exits because it runs out of data. I've attached the data as a PNG file, but it's not much use because it doesn't seem to correspond to the bug you are reporting; libpng doesn't care about the bad CRC or the actual eXIf contents, it just warns that they are wrong.
I just reconfirmed that the error case is reached. Instead of png_error, I currently have an abort() at the failure point, and the stack trace ends in it (#7 0x7fef207104b2 abort). There is a chance that the disconnect comes from our config settings. Here is the pnglibconf.h that we use in Chrome: https://source.chromium.org/chromium/chromium/src/+/main:third_party/libpng/pnglibconf.h?q=pnglibconf.h&ss=chromium My initial hunch was that the reason for the disparity was PNG_USER_CHUNK_CACHE_MAX. However, I've tried changing that back to 1000 and it doesn't affect the outcome. I did find that if you disable PNG_READ_eXIf_SUPPORTED, then no error occurs. It may be possible that you haven't enabled this flag.
Your version of png_handle_eXIf is borked somehow. It's reading too many bytes. First thing to do is to document the 'length' parameters to each of the calls to png_push_fill_buffer; either note them down under the debugger or just add a printf (well, fprintf(stderr, ...) if you still have the abort in). Then we can compare with mine. If the break is a compiler issue then an fprintf might hide it, but then the bug would not repro, so that would prove something too. So just post a list of the arguments in turn.
I've made this change and kept the abort in place.
Note also, we are on libpng 1.6.37. There is a small chance that this bug has already been discovered and fixed, and we just need to update. |
This is the bug:
It's from png_handle_eXIf, as are the two previous 1-byte reads, but there are only 149 bytes in the whole of the eXIf data. So png_handle_eXIf reads two extra bytes, and after that point the whole stream is out of whack. This is the "[other] possibility which is much worse and which I won't mention here", because it affects everything. A completely correct PNG with an eXIf chunk which triggers that case will be reported as corrupted by every one of the libpng read APIs as soon as the next chunk is read. There's no point adding an extra check in png_push_fill_buffer; it's a big bad obvious bug that must be fixed.
Here's my sequence which, as you can see, is identical apart from the 149-byte read and adds up to the actual length: 8+4+4+13+4+4+4+1+1+147+4 (that's just fprintf(stderr, "%lu+", length); easy to pass to ...)
Ok, so it isn't in my 1.6.44-git (i.e. origin/libpng16 HEAD). So next is for you to check out the origin/libpng16 branch and see if the problem goes away. I recommend using the ...
It gives you what would actually be checked in. It is very important to test any pull request against a clean, completely up-to-date clone (local branch) of the origin. For libpng that's configure/make check or cmake/make test, or both for build-system changes.
OK, I will try updating our version to the latest code and see if the issue goes away. That would be a nice resolution, and if so I'm happy to drop this PR. That being said, as a counterpoint to this comment:
Detecting internal errors as soon as possible is generally worth doing. If the caller is requesting more bytes than can be satisfied, and it's minimal cost to detect the issue and stop processing immediately, it seems like a good idea to do so. In a library that's used everywhere, a simple bounds check that can protect us from a CVE down the road seems valuable to me—especially since we already had a bounds check here, just an ineffective one.
(FYI: it's a weekend here, and I expect updating libpng will take some time; I don't expect to get to it until Monday.)
The bug is in 1.6.37: png_handle_eXIf is calling png_crc_finish with an incorrect "length" parameter.
CAUSE: No unit test! So no one actually tested that code path.
DATES (I'm doing this quickly, I may be wrong): Bug created by Glenn in 2017:
Bug fixed by Cosmin in 2022 (I think):
So "not repro" because it is fixed and no change necessary because the current code is adequate and consistent for any error of this type (chunk handler reading too many bytes). There's a separate bug in the signature handling but I have to get a repro before suggesting a fix. It's not particularly important. We have missing unit file tests; otherwise valid PNG files with an invalid eXIf chunk (invalid byte order specifier with valid CRC). If I can generate those I think I can add them easily to the exist pngtest tests. |
Agreed—I just did the same analysis as you. The bug fix actually happened a little sooner, in 2021:
Cosmin's 2022 change just simplifies the logic. Thank you for adding the unit tests. I will get Chrome updated.
(Instead of
@johnstiles-google can you test the attached pr552-fixed.png against your current build (1.6.37, with or without your change)? I changed the repro prog to be in C and accept a file argument. It should fail with 1.6.37 and complain about a bad chunk name; the 'fix' to the original test is just an IEND chunk at the end (well, there's a spurious \n too, but that doesn't matter).
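The attached program is not reproduced here; a minimal progressive-read driver of the kind described might look roughly like this (a sketch assuming the standard libpng progressive API, with stub callbacks):

```c
/* Hypothetical stand-in for the attached repro program: feed a file to
 * libpng's progressive reader in small pieces.  Build against the local
 * library, e.g.:  cc repro.c .libs/libpng16.a -lz -lm */
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include <png.h>

static void info_cb(png_structp png_ptr, png_infop info)
{ (void)png_ptr; (void)info; }

static void row_cb(png_structp png_ptr, png_bytep row, png_uint_32 num, int pass)
{ (void)png_ptr; (void)row; (void)num; (void)pass; }

static void end_cb(png_structp png_ptr, png_infop info)
{ (void)png_ptr; (void)info; }

int main(int argc, char **argv)
{
   FILE *fp;
   png_structp png_ptr;
   png_infop info_ptr;
   unsigned char buf[64];   /* deliberately small to exercise the buffering */
   size_t n;

   if (argc != 2 || (fp = fopen(argv[1], "rb")) == NULL)
   {
      fprintf(stderr, "usage: %s file.png\n", argv[0]);
      return 1;
   }

   png_ptr = png_create_read_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL);
   info_ptr = png_create_info_struct(png_ptr);
   if (png_ptr == NULL || info_ptr == NULL)
      return 1;

   png_set_progressive_read_fn(png_ptr, NULL, info_cb, row_cb, end_cb);

   if (setjmp(png_jmpbuf(png_ptr)))
   {
      /* png_error lands here; this is the path a repro would exercise. */
      fprintf(stderr, "libpng signalled an error\n");
      png_destroy_read_struct(&png_ptr, &info_ptr, NULL);
      fclose(fp);
      return 1;
   }

   while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
      png_process_data(png_ptr, info_ptr, buf, n);

   png_destroy_read_struct(&png_ptr, &info_ptr, NULL);
   fclose(fp);
   return 0;
}
```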
No, not in this version of libpng (IIRC I added png_assert to 1.7). I don't want to do that, because it produces a horrible situation where exactly the same libpng build with exactly the same file produces two different error messages depending on just the buffering of the input (with the progressive reader)! The check also hardly ever works; your test file had both a broken eXIf chunk and truncation exactly at the end of that chunk. The same check can be performed more generally in png_read_data, but why, what does that achieve? Just a marginally better error message which is normally not displayed (well, png_assert, IIRC, was disabled in release builds, but still). The whole architecture of relying on a complex chunk handler to get the read of the chunk data exactly right is broken; the fix is to validate individual reads against the actual data length and make png_crc_finish not take a length parameter at all. Probably a fail-safe internal change that could be done quite quickly.
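A rough standalone model of that suggestion (illustrative names such as demo_crc_finish; not a libpng patch): track the remaining data length of the current chunk inside the reader, validate every chunk-data read against it, and let the CRC-finish step consume whatever remains without taking a length argument.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative reader state: the number of data bytes of the current
 * chunk that have not yet been consumed. */
struct demo_png {
   unsigned long chunk_remaining;
};

/* Every chunk-data read is validated against the remaining count, so a
 * chunk handler can never walk past the end of its own chunk. */
static void demo_chunk_read(struct demo_png *p, unsigned long n)
{
   if (n > p->chunk_remaining)
   {
      fprintf(stderr, "chunk handler over-read by %lu bytes\n",
              n - p->chunk_remaining);
      exit(EXIT_FAILURE);
   }
   p->chunk_remaining -= n;
   /* ...copy n bytes and update the running CRC here... */
}

/* No length parameter: consume whatever the handler left behind, then
 * check the CRC.  A wrong length can no longer be passed in. */
static void demo_crc_finish(struct demo_png *p)
{
   /* ...skip p->chunk_remaining bytes and verify the 4-byte CRC... */
   p->chunk_remaining = 0;
}

int main(void)
{
   struct demo_png p = { 149 };   /* the eXIf chunk from the repro */

   demo_chunk_read(&p, 1);        /* byte-order bytes, read one at a time */
   demo_chunk_read(&p, 1);
   demo_crc_finish(&p);           /* skips the remaining 147 bytes */
   return 0;
}
```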
I ran your "pr552-fixed.png.txt" and got this output. It did not hit the abort.
Good, thanks. That matches what happens when I recreate the bug in 1.6.44. The error message is "ND[00][00]: invalid chunk type", as I expected.
I've updated Chromium to libpng 1.6.43 and confirmed that the issue no longer reproduces. Thanks again for your debugging assistance.
Previously, png_push_fill_buffer would simply truncate the read, leaving the remainder of the output buffer in an uninitialized state. This can lead to undefined behavior when handling truncated or malformed inputs.