Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124)

This should reduce the possibility of accidentally truncating some inline images, while *not* causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch *every* single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.
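
To make the failure mode concrete, the following is a simplified, hypothetical model of an "EI" scanner. It loosely follows the `state` variable (0, 1, 2) and the SPACE/LF/CR constants visible in the `findDefaultInlineStreamEnd` diff, but it is not pdf.js code: the helper names (`naiveFindEI`, `toBytes`) are illustrative, and the extra lookahead checks that pdf.js performs before accepting a candidate are deliberately left out. It shows how an "EI" sequence inside binary image data is taken for the end marker, which truncates the image.

```javascript
// Hypothetical, simplified model of an "EI" (end of inline image) scan.
// state 0: looking for "E"; state 1: saw "E"; state 2: saw "EI".
// In state 2, a whitespace byte (SPACE, LF or CR) confirms the match.
const E = 0x45,
  I = 0x49,
  SPACE = 0x20,
  LF = 0xa,
  CR = 0xd;

function isWhite(ch) {
  return ch === SPACE || ch === LF || ch === CR;
}

// Returns the offset of the "E" in the first "EI<whitespace>", or -1.
function naiveFindEI(bytes) {
  let state = 0;
  for (let pos = 0; pos < bytes.length; pos++) {
    const ch = bytes[pos];
    if (state === 0) {
      state = ch === E ? 1 : 0;
    } else if (state === 1) {
      state = ch === I ? 2 : ch === E ? 1 : 0;
    } else if (isWhite(ch)) {
      return pos - 2;
    } else {
      state = ch === E ? 1 : 0;
    }
  }
  return -1;
}

// Maps each character (all below 0x100 here) to one byte.
const toBytes = (str) => Uint8Array.from(str, (c) => c.charCodeAt(0));

// Image data "\x80\x81EI ab\x82" followed by the real end marker "EI Q".
const sample = toBytes("\x80\x81EI ab\x82EI Q");
console.log(naiveFindEI(sample)); // 2 (inside the image data); the real "EI" is at 8
```
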

*Please note:* The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and are already tracked elsewhere.

---
[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause *unacceptable* performance regressions in e.g. issue 2618.
Snuffleupagus committed Jun 26, 2020
1 parent 276d917 commit 28d2ada
Showing 4 changed files with 52 additions and 3 deletions.
48 changes: 45 additions & 3 deletions src/core/parser.js
```diff
@@ -203,10 +203,11 @@ class Parser {
       I = 0x49,
       SPACE = 0x20,
       LF = 0xa,
-      CR = 0xd;
-    const n = 10,
+      CR = 0xd,
       NUL = 0x0;
-    const startPos = stream.pos;
+    const lexer = this.lexer,
+      startPos = stream.pos,
+      n = 10;
     let state = 0,
       ch,
       maybeEIPos;
```
```diff
@@ -243,6 +244,25 @@ class Parser {
               break;
             }
           }
+
+          if (state !== 2) {
+            continue;
+          }
+          // Check that the "EI" sequence isn't part of the image data, since
+          // that would cause the image to be truncated (fixes issue11124.pdf).
+          if (lexer.knownCommands) {
+            const nextObj = lexer.peekObj();
+            if (nextObj instanceof Cmd && !lexer.knownCommands[nextObj.cmd]) {
+              // Not a valid command, i.e. the inline image data *itself*
+              // contains an "EI" sequence. Resetting the state.
+              state = 0;
+            }
+          } else {
+            warn(
+              "findDefaultInlineStreamEnd - `lexer.knownCommands` is undefined."
+            );
+          }
+
           if (state === 2) {
             break; // Finished!
           }
```
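
The check added in this hunk can be illustrated in isolation. The following is a minimal sketch, not pdf.js code: `peekToken`, `isUnknownCommand` and `findEIChecked` are hypothetical helpers, tokens are split on whitespace only (real PDF lexing also handles delimiters), and `KNOWN_OPS` is a small hard-coded stand-in for `lexer.knownCommands`. As in the diff, a candidate "EI" is rejected only when the next token is a command that isn't known; operands and the end of the data are accepted.

```javascript
// Hypothetical subset of content-stream operators; a stand-in for
// pdf.js's `lexer.knownCommands` table.
const KNOWN_OPS = new Set([
  "q", "Q", "cm", "Do", "re", "f", "BT", "ET", "Tf", "Tj", "BI", "ID", "EI",
]);

const E = 0x45,
  I = 0x49,
  SPACE = 0x20,
  LF = 0xa,
  CR = 0xd;

function isWhite(ch) {
  return ch === SPACE || ch === LF || ch === CR;
}

// Returns the whitespace-delimited token after `pos` (or null at the end of
// the data) without moving any shared position. Simplification: PDF
// delimiters such as "[" or "(" are not handled.
function peekToken(bytes, pos) {
  while (pos < bytes.length && isWhite(bytes[pos])) {
    pos++;
  }
  let token = "";
  while (pos < bytes.length && !isWhite(bytes[pos])) {
    token += String.fromCharCode(bytes[pos++]);
  }
  return token === "" ? null : token;
}

// Only commands can be "unknown"; operands (numbers, /Names) and the end of
// the data never cause a reset, mirroring `nextObj instanceof Cmd`.
function isUnknownCommand(token) {
  if (token === null || token.startsWith("/")) {
    return false;
  }
  if (/^[+-]?(\d+\.?\d*|\.\d+)$/.test(token)) {
    return false;
  }
  return !KNOWN_OPS.has(token);
}

// "EI<whitespace>" scan that rejects a candidate when the following token
// is an unknown command, i.e. when the "EI" was part of the image data.
function findEIChecked(bytes) {
  let state = 0;
  for (let pos = 0; pos < bytes.length; pos++) {
    const ch = bytes[pos];
    if (state === 0) {
      state = ch === E ? 1 : 0;
    } else if (state === 1) {
      state = ch === I ? 2 : ch === E ? 1 : 0;
    } else if (isWhite(ch)) {
      if (!isUnknownCommand(peekToken(bytes, pos))) {
        return pos - 2; // offset of the "E" of the accepted "EI"
      }
      state = 0; // false positive: keep scanning the image data
    } else {
      state = ch === E ? 1 : 0;
    }
  }
  return -1;
}

const toBytes = (str) => Uint8Array.from(str, (c) => c.charCodeAt(0));
console.log(findEIChecked(toBytes("\x80\x81EI ab\x82EI Q"))); // 8
```

Since the diff only peeks when `state` is still 2 after the preceding byte checks, the extra tokenizing cost is paid only for surviving candidates, which fits the commit message's concern about not slowing down the "EI" detection.
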
```diff
@@ -1276,6 +1296,28 @@ class Lexer {
     return Cmd.get(str);
   }
 
+  peekObj() {
+    const streamPos = this.stream.pos,
+      currentChar = this.currentChar,
+      beginInlineImagePos = this.beginInlineImagePos;
+
+    let nextObj;
+    try {
+      nextObj = this.getObj();
+    } catch (ex) {
+      if (ex instanceof MissingDataException) {
+        throw ex;
+      }
+      warn(`peekObj: ${ex}`);
+    }
+    // Ensure that we reset *all* relevant `Lexer`-instance state.
+    this.stream.pos = streamPos;
+    this.currentChar = currentChar;
+    this.beginInlineImagePos = beginInlineImagePos;
+
+    return nextObj;
+  }
+
   skipToNextLine() {
     let ch = this.currentChar;
     while (ch >= 0) {
```
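
`peekObj` follows the usual save, read, restore pattern for lookahead in a lexer. A minimal sketch under stated assumptions: `TinyLexer` is a hypothetical lexer that models only a byte array, a position and `currentChar` (the diff also restores `beginInlineImagePos`), and `MissingDataError` is a hypothetical stand-in for pdf.js's `MissingDataException`.

```javascript
// Hypothetical stand-in for pdf.js's MissingDataException: the requested
// bytes haven't been loaded yet, so the caller must retry later.
class MissingDataError extends Error {}

const isWhite = (ch) => ch === 0x20 || ch === 0x0a || ch === 0x0d;

// Minimal lexer sketch; only `pos` and `currentChar` are modeled.
class TinyLexer {
  constructor(bytes, loadedLength = bytes.length) {
    this.bytes = bytes;
    this.loadedLength = loadedLength; // bytes at or past this are "missing"
    this.pos = 0;
    this.currentChar = -1;
    this.nextChar();
  }

  nextChar() {
    if (this.pos >= this.bytes.length) {
      return (this.currentChar = -1);
    }
    if (this.pos >= this.loadedLength) {
      throw new MissingDataError(`byte ${this.pos} is not loaded yet`);
    }
    return (this.currentChar = this.bytes[this.pos++]);
  }

  // Returns the next whitespace-delimited word, or null at the end of data.
  getObj() {
    while (isWhite(this.currentChar)) {
      this.nextChar();
    }
    if (this.currentChar < 0) {
      return null;
    }
    if (this.currentChar === 0x00) {
      // Any other error (here: a word starting with NUL) is only logged.
      throw new Error("unexpected NUL byte");
    }
    let word = "";
    while (this.currentChar >= 0 && !isWhite(this.currentChar)) {
      word += String.fromCharCode(this.currentChar);
      this.nextChar();
    }
    return word;
  }

  // Reads the next object, then restores the lexer state so that the
  // following getObj() call returns the same object again.
  peekObj() {
    const pos = this.pos,
      currentChar = this.currentChar;
    let obj = null;
    try {
      obj = this.getObj();
    } catch (ex) {
      if (ex instanceof MissingDataError) {
        throw ex; // let the caller load more data and retry
      }
      console.warn(`peekObj: ${ex}`);
    } finally {
      this.pos = pos;
      this.currentChar = currentChar;
    }
    return obj;
  }
}
```

One deliberate difference: the sketch restores state in a `finally` block, so the lexer is reset even when the missing-data error is re-thrown. In the diff, the restore statements follow the `try`/`catch`, so they run only when `getObj()` succeeded or failed with some other error. With the bytes of `BT ET`, two consecutive `peekObj()` calls both return `"BT"`, and a subsequent `getObj()` still returns `"BT"`.
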
1 change: 1 addition & 0 deletions test/pdfs/.gitignore
```diff
@@ -254,6 +254,7 @@
 !issue6336.pdf
 !issue6387.pdf
 !issue6410.pdf
+!issue11124.pdf
 !issue8586.pdf
 !jbig2_symbol_offset.pdf
 !gradientfill.pdf
```
Binary file added test/pdfs/issue11124.pdf
6 changes: 6 additions & 0 deletions test/test_manifest.json
```diff
@@ -3147,6 +3147,12 @@
        "type": "text",
        "about": "Invisible (and broken) TrueType font used for text-selection."
     },
+    {  "id": "issue11124",
+       "file": "pdfs/issue11124.pdf",
+       "md5": "9bde831515dc6b8bb2c7c00c8189aca9",
+       "rounds": 1,
+       "type": "eq"
+    },
     {  "id": "issue11768",
        "file": "pdfs/issue11768_reduced.pdf",
        "md5": "0cafde97d78bb6883531a325a996a5ef",
```
