Simplified DiskWriterQueue with blocking concurrency #2411

Merged
merged 2 commits into from
Feb 13, 2024

Contributor

@ltetak ltetak commented Jan 25, 2024

It is relatively easy to put the DiskWriterQueue into a state where it stops doing any work. The cause is that the logic does not properly track which _task is the current one, which leads to several problems:

  • Wait() waits for the wrong _task
  • _task is not started at all

See, for example, #2307.

To reproduce, I ran a lot of Inserts and Deletes in parallel (to fill up the disk queue) and, every couple of seconds, called _db.Checkpoint() to force a full db lock and a Wait() invocation.

The fix is to use a much simpler blocking approach (one thread is dedicated to this). IMO it is a good tradeoff for now. It can later be replaced with an awaitable mutex version.
Edit: I added an async version of the semaphore which does not block the thread.
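
For readers who want to see the shape of the blocking approach described above, here is a minimal sketch. It is not the code from this PR (LiteDB is written in C#); it is a Python illustration under stated assumptions. The class name BlockingDiskWriterQueue, the write_page callback, and the errors list are all hypothetical. The point it shows is that with a single dedicated worker there is no "current task" to track, so wait() cannot end up waiting on the wrong one.

```python
import queue
import threading


class BlockingDiskWriterQueue:
    """Hypothetical stand-in: one dedicated thread drains queued pages."""

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, write_page):
        self._write_page = write_page  # assumed callable that persists one page
        self._pages = queue.Queue()
        self.errors = []               # write failures are recorded here
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, page):
        """Hand a page to the worker; returns immediately."""
        self._pages.put(page)

    def wait(self):
        """Block until every page enqueued so far has been processed."""
        self._pages.join()

    def close(self):
        """Ask the worker to stop and wait for it to finish."""
        self._pages.put(self._STOP)
        self._worker.join()

    def _run(self):
        while True:
            page = self._pages.get()
            try:
                if page is self._STOP:
                    return
                self._write_page(page)
            except Exception as exc:
                self.errors.append(exc)
            finally:
                # Always mark the item as done so wait() cannot hang.
                self._pages.task_done()


written = []
writer = BlockingDiskWriterQueue(written.append)
for page_id in range(5):
    writer.enqueue(page_id)
writer.wait()   # returns only after all five pages were handed to write_page
writer.close()
```

The tradeoff mentioned in the description is visible here: the worker thread stays parked in get() while the queue is empty, which costs a thread but keeps the state machine trivial.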

@ltetak ltetak mentioned this pull request Jan 25, 2024
@mbdavid
Collaborator

mbdavid commented Feb 13, 2024

Thanks! This is old code that needed to be updated.

@mbdavid mbdavid merged commit 6d2a165 into litedb-org:master Feb 13, 2024
1 check failed
@jdtkw

jdtkw commented Mar 6, 2024

Thanks @ltetak - this indeed resolved our issue (#2307 - I work with @dgodwin1175), but v5.0.18 and v5.0.19 cause us to hit #2435 before we can validate this with an official build. A custom build of #2436 on top of v5.0.19 (which includes #2411) seems to indicate that we can have a stable solution.

@ltetak
Contributor Author

ltetak commented Mar 6, 2024

hi @jdtkw, transaction (and especially AutoTransaction class) was the next thing I wanted to take a look at. I know about a couple of problems there.

  1. AutoTransaction can fail when reverting the transaction - this is bad by itself but it's double-bad because it hides the original exception.
  2. Error handling in transactions is wrong, which causes wrong counts. PR #2436 ("Fix #2435 Transactions are not removed in LiteDB 5.0.18") may be a fix for it, but we need to be sure the DB is in a good state. There are a lot of "ENSURE" errors. My guess is that some transaction does not return the DB to a valid state and that breaks it.
    We run the database in single-threaded mode (we serialize every access to the db with locks), so it must be either a problem somewhere in the algorithm or some external exception. I have some evidence that external exceptions make this problem much worse, so I would start there. It means that if you have an unstable storage medium causing random exceptions, it may lead to a corrupted database (which should not happen thanks to the journal approach).
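
To illustrate point 1 (a failing rollback hiding the original exception), here is a small sketch. It is a Python illustration, not LiteDB's AutoTransaction code, and the names FakeTransaction, run_naive and run_preserving are hypothetical. One caveat: Python implicitly keeps the original error on the new exception's `__context__`, so the loss is less complete than it can be in other runtimes, but the caller still catches the rollback error instead of the real cause.

```python
class FakeTransaction:
    """Hypothetical transaction whose rollback can be made to fail."""

    def __init__(self, rollback_fails):
        self.rollback_fails = rollback_fails

    def rollback(self):
        if self.rollback_fails:
            raise OSError("rollback failed")


def run_naive(tx, operation):
    try:
        operation()
    except Exception:
        # If rollback raises, the caller sees the rollback error,
        # not the error that triggered the rollback.
        tx.rollback()
        raise


def run_preserving(tx, operation):
    try:
        operation()
    except Exception as original:
        try:
            tx.rollback()
        except Exception as rollback_error:
            # Keep the original as the primary error and attach the
            # rollback failure as extra information.
            original.rollback_error = rollback_error
        raise  # re-raises the original exception


def failing_operation():
    raise ValueError("original failure")
```

With a transaction whose rollback fails, run_naive surfaces the OSError, while run_preserving surfaces the ValueError and carries the rollback failure along with it.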
