forked from chilts/mongodb-queue
-
Notifications
You must be signed in to change notification settings - Fork 0
🗃 Remove visible
when acking jobs
#16
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is a non-breaking change that forms part of the migration path to improving our performance in #15 When acking a job, we now unset the `visible` property, which makes `visible` and `deleted` mutually exclusive properties, so that the presence of one implies the absence of the other. We can do this, because the only time we query on `visible` is when we *also* query for `deleted: {$exists: false}`. Similarly, the only times we query for `deleted: {$exists: true}` are times when we don't query `visible`.
alecgibson
added a commit
that referenced
this pull request
Feb 6, 2025
This is a **BREAKING CHANGE** that aims to improve database performance by changing the queries and indexes used to perform operations. There will be no changes visible to consumers of the JavaScript API. The break will: - require consumers to run a migration query before upgrading - add new indexes - allow dropping of an old index More details on migration are at the bottom of this commit message. Motivation ---------- This library seems to have been written with the assumption that its collections are small. However, we have hundreds of thousands of jobs in various queues on Production, which causes some slow queries because of the design choices made in the schema of this library. In particular, we aim to address two issues: 1. `{deleted: {$exists: boolean}}` calls are inefficient, but used by practically every query in this library 2. counting inflight jobs has awful performance when there are many available jobs The result is that no further filtering is needed beyond the index on any of the issued queries. `$exists` --------- MongoDB's `$exists` operator has [tricky index performance][1]. In the best cases, `$exists: true` can use the index, but only if it's sparse, and `$exists: false` can **never** just use the index: it always needs to fetch documents. In order to avoid the constant use of `$exists` in this library, we rely on a logical paradigm shift: we `$unset` `visible` when acking the job, so that `visible` and `deleted` are mutually exclusive fields. Therefore we can: - add a sparse index for both of these fields - query on the field we care about, and **know** that it implies the absence of the other field, allowing the removal of `$exists` assertions It should be noted that in local testing, I observed [strange behaviour][2] when trying to use this partial index: we have to use `ack: {$gt: ''}` instead of `ack: {$exists: true}` to get MongoDB to leverage this index for some reason. `inFlight()` ------------ The existing `inFlight()` query has particularly bad performance in cases where there are many (hundreds of thousands) of jobs available to pick up. This is because the existing query uses the `deleted_1_visible_1` index, but even after filtering by `deleted` and `visible` with the index, the database will need to fetch every single job that could be picked up, and check for `ack`, which is very slow. We improve the performance here by: - the removal of the `$exists` query (see above) - the addition of a partial index that only contains unacked jobs that have been retrieved at some point by `get()`. We can then filter these by the current time to find in-flight jobs Migration path -------------- This performance improvement is built upon a shift in the assumptions made about underlying job structure: namely, that `deleted` and `visible` are now mutually exclusive properties (which was not true before). 1. Bump patch version to [`7.1.1`][3]: this will start removing the `visible` property from acked jobs in a non-breaking way 2. Deploy the patch to Production 3. Update any existing documents to match this new schema: ```js db.collection.updateMany( {deleted: {$exists: true}}, {$unset: {visible: 1}}, ) ``` 4. Bump major version to `8.0.0` and deploy 5. Drop old index `delted_1_visible_1`, which is no longer used [1]: https://www.mongodb.com/docs/manual/reference/operator/query/exists/#use-a-sparse-index-to-improve--exists-performance [2]: https://www.mongodb.com/community/forums/t/partial-index-is-not-used-during-search/290507/2 [3]: #16
dawidreedsy
approved these changes
Feb 6, 2025
alecgibson
added a commit
that referenced
this pull request
Feb 10, 2025
This is a **BREAKING CHANGE** that aims to improve database performance by changing the queries and indexes used to perform operations. There will be no changes visible to consumers of the JavaScript API. The break will: - require consumers to run a migration query before upgrading - add new indexes - allow dropping of an old index More details on migration are at the bottom of this commit message. Motivation ---------- This library seems to have been written with the assumption that its collections are small. However, we have hundreds of thousands of jobs in various queues on Production, which causes some slow queries because of the design choices made in the schema of this library. In particular, we aim to address two issues: 1. `{deleted: {$exists: boolean}}` calls are inefficient, but used by practically every query in this library 2. counting inflight jobs has awful performance when there are many available jobs The result is that no further filtering is needed beyond the index on any of the issued queries. `$exists` --------- MongoDB's `$exists` operator has [tricky index performance][1]. In the best cases, `$exists: true` can use the index, but only if it's sparse, and `$exists: false` can **never** just use the index: it always needs to fetch documents. In order to avoid the constant use of `$exists` in this library, we rely on a logical paradigm shift: we `$unset` `visible` when acking the job, so that `visible` and `deleted` are mutually exclusive fields. Therefore we can: - add a sparse index for both of these fields - query on the field we care about, and **know** that it implies the absence of the other field, allowing the removal of `$exists` assertions It should be noted that in local testing, I observed [strange behaviour][2] when trying to use this partial index: we have to use `ack: {$gt: ''}` instead of `ack: {$exists: true}` to get MongoDB to leverage this index for some reason. `inFlight()` ------------ The existing `inFlight()` query has particularly bad performance in cases where there are many (hundreds of thousands) of jobs available to pick up. This is because the existing query uses the `deleted_1_visible_1` index, but even after filtering by `deleted` and `visible` with the index, the database will need to fetch every single job that could be picked up, and check for `ack`, which is very slow. We improve the performance here by: - the removal of the `$exists` query (see above) - the addition of a partial index that only contains unacked jobs that have been retrieved at some point by `get()`. We can then filter these by the current time to find in-flight jobs Migration path -------------- This performance improvement is built upon a shift in the assumptions made about underlying job structure: namely, that `deleted` and `visible` are now mutually exclusive properties (which was not true before). 1. Bump patch version to [`7.1.1`][3]: this will start removing the `visible` property from acked jobs in a non-breaking way 2. Deploy the patch to Production 3. Update any existing documents to match this new schema: ```js db.collection.updateMany( {deleted: {$exists: true}}, {$unset: {visible: 1}}, ) ``` 4. Bump major version to `8.0.0` and deploy 5. Drop old index `delted_1_visible_1`, which is no longer used [1]: https://www.mongodb.com/docs/manual/reference/operator/query/exists/#use-a-sparse-index-to-improve--exists-performance [2]: https://www.mongodb.com/community/forums/t/partial-index-is-not-used-during-search/290507/2 [3]: #16
# for free
to join this conversation on GitHub.
Already have an account?
# to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a non-breaking change that forms part of the migration path to improving our performance in #15
When acking a job, we now unset the
visible
property, which makesvisible
anddeleted
mutually exclusive properties, so that the presence of one implies the absence of the other.We can do this, because the only time we query on
visible
is when we also query fordeleted: {$exists: false}
. Similarly, the only times we query fordeleted: {$exists: true}
are times when we don't queryvisible
.