fix(spool): Revert to page counts for size estimate #3379
Conversation
@@ -1569,26 +1569,25 @@ mod tests {
     "buffer.writes:1|c",
     "buffer.envelopes_written:3|c",
     "buffer.envelopes_disk_count:3|g",
-    "buffer.disk_size:1031|h",
+    "buffer.disk_size:24576|h",
This is pretty far off. But the relative error should be lower for larger disk buffers.
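(Note: assuming SQLite's default page size of 4096 bytes, which the diff does not show, 24576 is exactly 6 pages. The page-granular estimate can only move in 4 KiB steps, so it will always look far off next to the previous figure of 1031 bytes on a tiny test database.)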
@@ -74,7 +74,9 @@ pub fn delete<'a>(key: QueueKey) -> Query<'a, Sqlite, SqliteArguments<'a>> {
    ///
    /// This info used to calculate the current allocated database size.
    pub fn current_size<'a>() -> Query<'a, Sqlite, SqliteArguments<'a>> {
Can you update the docs for this function and the calling function?
👍 fun fact: The doc on the calling function was already correct because we never updated it.
Not quite, it didn't subtract the free pages.
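For illustration, a minimal sketch of a free-page-aware estimate, assuming sqlx and SQLite's pragma table-valued functions; this is not necessarily the exact query used in the PR:

```rust
use sqlx::query::Query;
use sqlx::sqlite::{Sqlite, SqliteArguments};

/// Sketch only (not the PR's actual implementation): estimate the allocated
/// database size as (total pages - freelist pages) * page size, using cheap
/// PRAGMA counters instead of scanning the `dbstat` virtual table.
pub fn current_size<'a>() -> Query<'a, Sqlite, SqliteArguments<'a>> {
    sqlx::query(
        "SELECT (pc.page_count - fc.freelist_count) * ps.page_size AS size
         FROM pragma_page_count() AS pc,
              pragma_freelist_count() AS fc,
              pragma_page_size() AS ps",
    )
}
```

Because these counters are read from the database header rather than by walking pages, the cost of the query does not depend on how many envelopes are spooled.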
// Result:
// pgsize      = 2'408'464'463  t.elapsed() = 7.007533833s
// pgsize_agg  = 2'408'464'463  t.elapsed() = 5.010104791s
// brute_force = 1'750'000'000  t.elapsed() = 7.893590875s
// pragma      = 3'036'307'456  t.elapsed() = 213.417µs
Nice! But it overestimates by quite a bit (> 1 GiB)?
Edit: To clarify, I think that's okay, just surprising.
Yep, the estimates are pretty far off; this is something we will have to keep in mind.
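For context, a guess at the query shape behind the `pgsize` measurement above (the benchmark code itself is not part of this diff): summing `pgsize` over the `dbstat` virtual table has to visit every page of the database, which is why its runtime grows with the file size, while the PRAGMA-based estimate sketched earlier only reads header counters.

```rust
// Assumed shape of the dbstat-based estimate (the `pgsize` row above); the
// actual benchmark code is not shown in this diff. `dbstat` is only available
// when SQLite is compiled with SQLITE_ENABLE_DBSTAT_VTAB, and scanning it
// touches every page, so its cost scales with the database size.
const DBSTAT_SIZE_QUERY: &str = "SELECT SUM(pgsize) FROM dbstat";
```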
I think this is changelog-worthy; in my PR I had:
**Bug Fixes**:
- Fix performance regression in the spooler. ([#3378](https://github.com/getsentry/relay/pull/3378))
As discussed with @olksdr: Because the spool size estimation is also used to decide whether to transition back to memory mode, and the new computation overestimates the spool size, this PR might require a follow-up if we notice that Relay gets stuck in disk mode. With the performance regression fixed, getting stuck in disk mode should not be a big performance problem. We could even consider always operating in disk mode to simplify the state machine.
INC-703 showed that `estimate_spool_size` via `dbstat` is too slow. This PR replaces the estimate with the number of non-free pages multiplied by the page size. Though inaccurate, benchmarks show that this is fast regardless of the db size.

Fixes https://github.com/getsentry/team-ingest/issues/307.
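In other words, using the quantities already discussed in the review thread above, the new estimate is roughly `(page_count - freelist_count) * page_size`: it trades accuracy for a lookup that stays cheap at any database size.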