Avoid keeping hold of partial bytes forever. #984

Lukasa · 2019-04-29T16:43:31Z

Motivation:

The HTTPDecoder is a complex object that has very careful state management goals. One source of this
complexity is that it is fed a stream of bytes with arbitrary chunk sizes, but needs to produce a
collection of objects that are contiguous in memory. For example, each header field name and value
must be turned into a String, which requires a contiguous sequence of bytes.

As a result, it is quite common to have a situation where the HTTPDecoder has only part of an
object that must be emitted atomically. In this situation, the HTTPDecoder would like to instruct
its ByteToMessageHandler to keep hold of the bytes that form the beginning of that object. To avoid
asking http_parser to parse those bytes twice, the HTTPDecoder uses a value called httpParserOffset
to keep track of how many of the retained bytes have already been parsed.

As an example, consider what would happen if the "Connection: keep-alive\r\n" header field was delivered
in two chunks: first "Connection: keep-al", and then "ive\r\n". The header field name can be emitted in
its entirety, but the partial field value must be preserved. To achieve this, the HTTPDecoder will store
an offset internally to keep track of which bytes have been parsed. In this case, the offset will be set
to 7: the number of bytes in "keep-al". It will then tell the rest of the code that only 12 bytes of the
original 19 byte message were consumed, causing the ByteToMessageHandler to preserve those 7 bytes.
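
The arithmetic in this example can be sketched as follows. This is an illustration only: `splitChunk` and its parameter names are hypothetical and are not part of HTTPDecoder, and the sketch assumes the caller already knows how many trailing bytes belong to the incomplete field.

```swift
// Hypothetical helper, not NIO API: given a chunk and the number of trailing
// bytes that belong to an incomplete field, compute how many bytes are
// reported as consumed and how many must be retained for replay.
func splitChunk(_ chunk: String, partialTrailingBytes: Int) -> (consumed: Int, retained: Int) {
    let total = chunk.utf8.count
    precondition(partialTrailingBytes >= 0 && partialTrailingBytes <= total)
    return (consumed: total - partialTrailingBytes, retained: partialTrailingBytes)
}

let firstChunk = "Connection: keep-al"   // 19 bytes
let partialValue = "keep-al"             // 7 bytes of an incomplete field value
let result = splitChunk(firstChunk, partialTrailingBytes: partialValue.utf8.count)
print(result.consumed, result.retained)  // prints "12 7"
```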

However, when the next chunk is received, the ByteToMessageHandler will replay those bytes to
HTTPDecoder. To avoid parsing them a second time, HTTPDecoder keeps track of how many bytes it is
expecting to see replayed. This is the value in httpParserOffset.
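
A minimal sketch of that skipping step, assuming the replayed buffer starts with the retained bytes followed by the new ones. `ReplaySkipper` and `unparsedBytes(of:)` are hypothetical names for illustration; the real HTTPDecoder is structured differently.

```swift
// Toy model, not the real HTTPDecoder: shows how an offset can be used to
// avoid handing already-parsed bytes to the parser a second time.
struct ReplaySkipper {
    var httpParserOffset = 0   // number of bytes we expect to see replayed

    // Returns only the bytes that have not been parsed before.
    func unparsedBytes(of buffer: [UInt8]) -> ArraySlice<UInt8> {
        return buffer.dropFirst(httpParserOffset)
    }
}

let skipper = ReplaySkipper(httpParserOffset: 7)
let replayed = Array("keep-alive\r\n".utf8)   // 7 retained bytes + 5 new bytes
let fresh = skipper.unparsedBytes(of: replayed)
let freshText = String(decoding: fresh, as: UTF8.self)
print(freshText == "ive\r\n")   // prints "true"
```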

Due to a logic error in the HTTPDecoder, the httpParserOffset field was never returned to zero.
This field would be modified whenever a partial field was received, but would never be returned
to zero when a complete message was parsed. This would cause the HTTPDecoder to unnecessarily keep
hold of extra bytes in the ByteToMessageHandler even when they were no longer needed. In some cases
the number could get smaller, such as when a new partial field was received, but it could never drop
to zero even when a complete HTTP message was received.

Happily, due to the rest of the HTTPDecoder logic this never produced an invalid message: while
ByteToMessageHandler was repeatedly producing extra bytes, it never actually passed them to http_parser
again, or caused any other issue. The only situation in which a problem would occur is if the HTTPDecoder
had a RemoveAfterUpgradeStrategy other than .dropBytes. In that circumstance, decodeLast would not
consume any extra bytes, but those bytes would have remained in the buffer passed to decodeLast, which
would then incorrectly forward them on. This is the only circumstance in which this error manifested,
and in most affected applications it led to surprising and irregular crashes on connection teardown. In all
other applications the only effect was unnecessarily preserving a few tens of extra bytes on
some connections, until receiving EOF caused us to drop all that memory anyway.

Modifications:

Return httpParserOffset to 0 when a full message has been delivered.
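
The shape of the change can be sketched with a toy model of the bookkeeping. `OffsetTracker` and its method names are hypothetical and do not mirror the actual HTTPDecoder code; only the httpParserOffset name comes from the description above.

```swift
// Toy model of the offset bookkeeping only; names are illustrative.
struct OffsetTracker {
    var httpParserOffset = 0

    // A partial field was seen: remember how many bytes will be replayed.
    mutating func didReceivePartialField(byteCount: Int) {
        httpParserOffset = byteCount
    }

    // A full message was delivered: nothing needs to be replayed any more.
    mutating func didCompleteMessage() {
        httpParserOffset = 0   // the reset that was previously missing
    }
}

var tracker = OffsetTracker()
tracker.didReceivePartialField(byteCount: 7)
let offsetWhilePartial = tracker.httpParserOffset
tracker.didCompleteMessage()
print(offsetWhilePartial, tracker.httpParserOffset)   // prints "7 0"
```

Without the reset in `didCompleteMessage()`, the offset in this model would stay at 7 after the message completed, which corresponds to the retained bytes described in the motivation.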

Result:

Fewer weird crashes.

weissi

Thank you! And sorry for the complexity in HTTPDecoder :'(

(cherry picked from commit ae3d298)

Lukasa added the 🔨 semver/patch No public API change. label Apr 29, 2019

Lukasa added this to the 2.1.0 milestone Apr 29, 2019

Lukasa requested a review from weissi April 29, 2019 16:43

weissi approved these changes Apr 29, 2019

Merge branch 'master' into cb-no-leftovers-after-drip-feeding

371696b

Lukasa merged commit ae3d298 into apple:master Apr 29, 2019

Lukasa deleted the cb-no-leftovers-after-drip-feeding branch April 29, 2019 17:00

weissi modified the milestones: 2.1.0, 2.0.2 Apr 30, 2019

Lukasa modified the milestones: 2.0.2, 2.1.0 May 9, 2019
