Busy hang #104
Just to check, are you waiting for each result to return before running the next insert?
Hmm. The real answer is "I don't know" :-) I just run a loop that builds the insert queries and calls r/run on each.
r/run is blocking, so that should be waiting. I'll take a look tomorrow.
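For reference, the pattern under discussion looks roughly like this in clj-rethinkdb. This is a sketch, not the reporter's actual code; the table name, connection details, and sample records are hypothetical:

```clojure
(require '[rethinkdb.query :as r])

;; Hypothetical sample data standing in for the real records.
(def records (repeat 4096 {:name "example"}))

(with-open [conn (r/connect :host "127.0.0.1" :port 28015 :db "test")]
  (doseq [batch (partition-all 1024 records)]
    ;; r/run blocks until the server acknowledges the insert,
    ;; so batches are sent strictly one at a time.
    (-> (r/table "records")
        (r/insert (vec batch))
        (r/run conn))))
```

Because each r/run call waits for the server's reply, this loop should never have more than one insert in flight at a time.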
Also, this is a fairly large table. Before that I was able to insert 1,878,859 records and 3,699,819 records into two other tables, so this is not something that happened immediately.
Hey, sorry I haven't had a chance to look into this just yet. As a temporary workaround, have you considered using multiple connections? e.g. closing the old connection and creating a new one every 100,000 inserts? Obviously not a great long-term solution, but it might help in the short term.
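The suggested workaround could be sketched like this. The table name, connection details, and sample records are hypothetical, and it assumes the connection object is java.io.Closeable, as its use with with-open elsewhere suggests:

```clojure
(require '[rethinkdb.query :as r])

;; Hypothetical sample data standing in for the real records.
(def records (repeat 300000 {:name "example"}))

(defn fresh-conn []
  (r/connect :host "127.0.0.1" :port 28015 :db "test"))

;; Close the connection and open a fresh one every 100,000 inserts.
(loop [conn    (fresh-conn)
       since   0
       batches (partition-all 1024 records)]
  (if-let [batch (first batches)]
    (let [recycle? (>= since 100000)
          conn     (if recycle? (do (.close conn) (fresh-conn)) conn)
          since    (if recycle? 0 since)]
      (-> (r/table "records") (r/insert (vec batch)) (r/run conn))
      (recur conn (+ since (count batch)) (rest batches)))
    (.close conn)))
```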
Thanks, I don't really need a workaround, because this isn't anything urgent — I'm just trying to use clj-rethinkdb for larger things. It works for my smaller application (https://partsbox.io/), but so far seems to break for larger amounts of data. I'm increasingly worried about the complexity in the driver and the (inevitable) resulting problems.
I'm looking to move all of the connection and results handling to the official Java RethinkDB driver as soon as there is a release of it, which offloads the hard part to them and lets us just focus on the query language.
FWIW, I think this is a very good idea. We are a small community for now, so we should keep our code as simple as possible.
Can you try this again? We ran into a similar-sounding issue (nothing was running, but no exceptions were being thrown). The root cause turned out to be that all of the go threads were blocked waiting on reads from the db. The connection code has been rewritten with manifold, which shouldn't suffer from this.
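The failure mode described can be illustrated in isolation (this is not the driver's actual code). core.async go blocks multiplex over a small fixed thread pool, eight threads by default, so if that many go blocks each sit in a blocking call, the pool is exhausted and everything else stalls silently:

```clojure
(require '[clojure.core.async :as a])

;; Tie up all eight default go-pool threads with blocking calls.
;; Thread/sleep stands in for a blocking socket read from the db.
(dotimes [_ 8]
  (a/go (Thread/sleep Long/MAX_VALUE)))

;; This go block may never get a thread to run on -- and no
;; exception is thrown anywhere, matching the observed symptoms.
(a/go (println "I may never print"))
```

Moving the blocking I/O off the go pool (as the manifold rewrite does) avoids this class of hang.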
Of course. I will have to dig up that project and reproduce the problem again, which might take a while, though.
I retried this with the rewritten connection code. I find it slightly alarming that performance drops over time when inserting larger numbers of records — from about 8-10k inserts/s at the beginning down to around 2k inserts/s after 23M records have been inserted (this is for a table with no indexes, and the data is fairly homogeneous). But I strongly suspect that's a RethinkDB issue; it seems to be reading about twice as much data as is being written to disk. As for clj-rethinkdb, I noticed that its performance improved since the rewrite. This is with a smaller task that doesn't hit RethinkDB's limitations.
That's a 30% improvement! Nice!
Great to hear! The degraded performance after lots of inserts is interesting, although there's not really enough info to say whether that's RethinkDB, the driver, or something else entirely. If you felt like it, taking a VisualVM profile over time would be helpful to check whether there's a memory leak causing excessive GC. Thanks for following up!
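A lighter-weight first check than a full VisualVM session, sketched here as a suggestion rather than anything from the thread, is to log used heap from inside the process while the insert loop runs; usage that keeps climbing even across GCs hints at a leak:

```clojure
;; Uses only java.lang.Runtime, no profiling agents required.
(defn used-heap-mb []
  (let [rt (Runtime/getRuntime)]
    (quot (- (.totalMemory rt) (.freeMemory rt)) (* 1024 1024))))

;; Sample used heap every 5 seconds in the background.
(future
  (dotimes [_ 12]
    (println "used heap (MB):" (used-heap-mb))
    (Thread/sleep 5000)))
```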
I confirmed that the degraded performance is due to RethinkDB and is to be expected: rethinkdb/rethinkdb#5805 |
Thanks for that, I'll watch that issue with interest.
I hit another problem while inserting large amounts of data: the main process eventually hangs at close to 400% CPU usage, while RethinkDB stops receiving new data.
This happened after a loop inserted 14,432,000 records. The total number to be inserted was 24,795,802.
Records were inserted in batches (vectors) of 1024 at a time.
Unfortunately, I have little to go on, as the main process just continued spinning, there were no exceptions to be seen. I captured a CPU sample using YourKit, which might point someone to the problem.
This is using [com.apa512/rethinkdb "0.11.0"], so it's not the memory leak I reported before.