Large MaxTableSize and AndreasBriese/bbloom #745
@zorino: When setting a large MaxTableSize of 1024 << 20, I'm getting the following error:
@zorino What does the full error / stack trace look like? It looks like the allocation is failing the Go runtime's size check in `makeslice`:

```go
func makeslice(et *_type, len, cap int) unsafe.Pointer {
	mem, overflow := math.MulUintptr(et.size, uintptr(cap))
	if overflow || mem > maxAlloc || len < 0 || len > cap {
		// NOTE: Produce a 'len out of range' error instead of a
		// 'cap out of range' error when someone does make([]T, bignumber).
		// 'cap out of range' is true too, but since the cap is only being
		// supplied implicitly, saying len is clearer.
		// See golang.org/issue/4085.
		mem, overflow := math.MulUintptr(et.size, uintptr(len))
		if overflow || mem > maxAlloc || len < 0 {
			panicmakeslicelen()
		}
		panicmakeslicecap()
	}
	return mallocgc(mem, et, true)
}
```

@zorino You might want to try with a smaller `MaxTableSize`.
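For reference, a minimal sketch of configuring a smaller table size, assuming the badger v1.6/v2 options API (`DefaultOptions(dir)` and `WithMaxTableSize`; later versions renamed this setting), with a hypothetical directory path:

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// Hypothetical path; replace with your own data directory.
	opts := badger.DefaultOptions("/tmp/badger").
		// Keep SSTables around the 64MB default instead of 1024 << 20,
		// so table and bloom-filter allocations stay well within limits.
		WithMaxTableSize(64 << 20)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```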
Yes, that works, but I was trying to overcome the max-open-files limit of 1024 (`ulimit -n`), which I have no control over. Thank you anyway. I was able to run it with 768 << 20, and I will then try doing backup/restore/flatten + aggressive GC (as proposed in #718) to limit the number of files. I already set ValueLogMaxEntries to 100000000, giving vlog files of ~1.1GB. Any other recommendations would be welcome.
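A rough sketch of that flatten + aggressive value-log GC step might look like the following, using badger's `Flatten` and `RunValueLogGC` APIs; the worker count and discard ratio are arbitrary choices for illustration:

```go
package main

import badger "github.com/dgraph-io/badger/v2"

// compact flattens the LSM tree and then runs value-log GC repeatedly
// until badger reports there is nothing left to rewrite.
func compact(db *badger.DB) error {
	// Collapse the levels of the LSM tree, reducing the number of SSTables.
	if err := db.Flatten(4); err != nil {
		return err
	}
	// Keep rewriting value-log files while at least ~50% of a file is
	// discardable; ErrNoRewrite means GC found nothing more to reclaim.
	for {
		if err := db.RunValueLogGC(0.5); err != nil {
			if err == badger.ErrNoRewrite {
				return nil
			}
			return err
		}
	}
}
```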
You can increase your ulimit. 1024 files are not that many. Also, if you have larger SSTables, you'd have issues later during compactions. Each compaction could iterate over 10+1 SSTables (assuming each level is 10x the size of the previous level), which would take a long time if your SSTables are 768MB (instead of 64MB).
Yeah, but that's the problem: I cannot set ulimit to more than 2048, since I'm running the creation of the KV stores on grid computers. If I decrease LevelSizeMultiplier to 3, would that mean fewer iterations per compaction?

I'm trying to build a rather big KV store, and my bulk creation is time-consuming. I had to split my data entries (TSV files here) into 10 files of ~10M lines each. Each line creates on average ~300 KVs for my main KV store; there are a lot of duplicates in there, but that is still ~3 billion keys to process. In the first iteration of my program I was updating the values in place, but that was too slow... now I insert everything in one run and then iterate over all KVs (Stream API), merging the keys that have several versions (discarding the earlier versions). Also, to improve throughput, I have several badger KV stores which store the features as well as the combinations of features (sha1sum) as KVs. E.g. K_store: k_key -> kk_store_hash_key

Right now it takes me ~400 minutes to build a store and almost 3x that time to merge all the KVs. I would also like to improve the load: for the first batch insert it averages around 15/32 (except for the first few minutes, where the load is great), but for the stream/merge afterwards it is not much more than 3-4/32. My merging routines are inherently slow, though, since I need to query several other KV stores, so I can live with that. Anyway, that was just to give you some context, and thank you again for your help.
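For readers following along, a minimal sketch of what such a Stream-based merge pass could look like is below. It assumes the badger v1.6/v2 Stream API (`KeyToList`/`Send` signatures changed in later versions), hypothetical source/destination paths, and a hypothetical `mergeValues` helper; it is an illustration, not the code used in this thread.

```go
package main

import (
	"bytes"
	"context"
	"log"

	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/pb"
)

// mergeValues is a hypothetical helper that combines all versions of a key
// into a single value; the real merge logic would live in the user's code.
func mergeValues(versions [][]byte) []byte {
	return bytes.Join(versions, []byte(","))
}

// mergeAllVersions streams every key out of src, merges its versions,
// and writes one consolidated KV per key into dst.
func mergeAllVersions(src, dst *badger.DB) error {
	stream := src.NewStream()
	stream.NumGo = 8           // goroutines iterating key ranges in parallel
	stream.LogPrefix = "merge" // prefix for the stream's progress logs

	// The Stream framework hands KeyToList an iterator positioned on the
	// versions of a key; collect them and emit one merged KV per key.
	stream.KeyToList = func(key []byte, itr *badger.Iterator) (*pb.KVList, error) {
		var versions [][]byte
		for ; itr.Valid(); itr.Next() {
			item := itr.Item()
			if !bytes.Equal(item.Key(), key) {
				break
			}
			val, err := item.ValueCopy(nil)
			if err != nil {
				return nil, err
			}
			versions = append(versions, val)
		}
		return &pb.KVList{Kv: []*pb.KV{{
			Key:   append([]byte{}, key...),
			Value: mergeValues(versions),
		}}}, nil
	}

	// Write each merged batch into the destination store.
	// (A WriteBatch may be preferable if batches grow very large.)
	stream.Send = func(list *pb.KVList) error {
		return dst.Update(func(txn *badger.Txn) error {
			for _, kv := range list.Kv {
				if err := txn.Set(kv.Key, kv.Value); err != nil {
					return err
				}
			}
			return nil
		})
	}

	return stream.Orchestrate(context.Background())
}

func main() {
	// Hypothetical paths for the source and merged stores.
	src, err := badger.Open(badger.DefaultOptions("/tmp/src"))
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	dst, err := badger.Open(badger.DefaultOptions("/tmp/dst"))
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	if err := mergeAllVersions(src, dst); err != nil {
		log.Fatal(err)
	}
}
```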