Ability to truncate raft.db on nomad server nodes #4477
Can you please put some actual numbers to these definitions? :)
@jippi In our case the Nomad servers were on t2.micro instances, and we launched Spark jobs with 2000 executors (i.e. 2000 allocations). After the cluster hung due to insufficient hardware we upgraded the servers to t2.xlarge, but for now …
@tantra35 That seems quite underpowered; see the recommended production requirements, especially given the burstable capacity of the t2 family. Personally we use m4.xlarge and m4.2xlarge in prod.
@jippi I understand this (and we have already built a new, more powerful cluster), but it looks strange that after we dramatically reduced the allocation count (i.e. stopped all Spark jobs), Nomad doesn't return the occupied memory, and doesn't do so even after a server restart. It's not a problem to replace all server nodes (in that case, as I mentioned above, a new server takes the expected amount of memory), but sometimes this is not so easy (for example on static dedicated hardware).
Did you try to force a GC?
@jippi That was the first thing we did.
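(For context: a forced GC is triggered through Nomad's `/v1/system/gc` endpoint, e.g. with `nomad system gc`. Below is a minimal sketch of the same call using the official Go API client, assuming a reachable local agent:)

```go
package main

import (
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	// DefaultConfig honors NOMAD_ADDR and falls back to http://127.0.0.1:4646.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatalf("failed to create Nomad client: %v", err)
	}
	// PUT /v1/system/gc: asks the servers to garbage-collect stopped jobs,
	// evaluations, allocations, and dead nodes.
	if err := client.System().GarbageCollect(); err != nil {
		log.Fatalf("forced GC failed: %v", err)
	}
	log.Println("forced GC triggered")
}
```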
@burdandrei Can you share how you solved this in your case?
@tantra35 Until Nomad takes a snapshot, the Raft DB is not compacted. Currently the threshold for when we take a snapshot is not tunable: it snapshots after 8192 Raft writes. What's the output of …? We could make this value tunable, but that's a tradeoff between disk I/O (since compacting the Raft DB via snapshots is expensive) and disk utilization. I would recommend increasing disk capacity on your servers.
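(As a side note, the 8192-write threshold mentioned above appears to come from the `hashicorp/raft` library defaults that Nomad inherits; a small sketch printing them, assuming the library's `DefaultConfig`:)

```go
package main

import (
	"fmt"

	"github.com/hashicorp/raft"
)

func main() {
	// Nomad uses hashicorp/raft's defaults here; the snapshot threshold is
	// not exposed through Nomad's own server configuration.
	cfg := raft.DefaultConfig()
	fmt.Printf("SnapshotThreshold: %d log entries\n", cfg.SnapshotThreshold) // 8192
	fmt.Printf("SnapshotInterval:  %s\n", cfg.SnapshotInterval)              // 120s
}
```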
Well @tantra35, in the meantime I changed the servers to i3.xlarge to have 32 GB RAM and NVMe for the data dir. But to be honest, the raft.db file hardly reaches 200 MB. @preetapan It shows that deep inside, Nomad is not foolproof. =(
Here are our values:
Since the cluster is now in a static state (no new allocations are added or removed), we have no chance to move the case off the dead point. But it would help if we could send some signal to Nomad, for example USR1, or if Nomad did some cleanup at start (that seems reasonable), or if there were a standalone tool to maintain …
@preetapan A few seconds ago on one of our clusters the diff between …
but the snapshot (…
In the logs we see the following: …
It's very strange and looks like a bug that from (…
in Nomad's case …
I think in Nomad … After writing a little program that emulates …:

```go
package main

import (
	"fmt"

	raftboltdb "github.com/hashicorp/raft-boltdb"
)

func main() {
	// Open the Raft log store (the BoltDB file used by Nomad servers).
	db, err := raftboltdb.NewBoltStore("./raft.db")
	if err != nil {
		fmt.Printf("[ERR] failed to open raft.db: %v\n", err)
		return
	}
	defer db.Close()

	// Find the range of log entries currently held in the store.
	first, _ := db.FirstIndex()
	last, _ := db.LastIndex()
	fmt.Printf("[INFO] raft: Compacting logs from %d to %d\n", first, last)

	// Delete the whole range, as the raft library does after a snapshot.
	if err := db.DeleteRange(first, last); err != nil {
		fmt.Printf("[ERR] log compaction failed: %v\n", err)
	} else {
		fmt.Printf("[INFO] log compaction ok\n")
	}
}
```

We found that raft.db was not being truncated, so we performed the compaction manually through …; after that the Nomad servers stopped eating too much memory.
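(This matches BoltDB's behavior: `DeleteRange` frees pages inside the file, but Bolt never shrinks the file on disk. To actually reclaim the space, the database has to be rewritten into a fresh file, for example with `bbolt`'s `Compact` helper. A hedged sketch, assuming the server is stopped and that `go.etcd.io/bbolt` can read the file, which is format-compatible with the `boltdb/bolt` files `raft-boltdb` writes; the destination file name is illustrative:)

```go
package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Open the bloated raft.db read-only; the Nomad server must be stopped.
	src, err := bolt.Open("./raft.db", 0600, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatalf("open raft.db: %v", err)
	}
	defer src.Close()

	// Fresh destination file (hypothetical name) for the live pages only.
	dst, err := bolt.Open("./raft-compacted.db", 0600, nil)
	if err != nil {
		log.Fatalf("open destination: %v", err)
	}
	defer dst.Close()

	// Copy all live key/value pairs; with txMaxSize 0 this runs in a single
	// transaction, and the new file shrinks to the size of the live data.
	if err := bolt.Compact(dst, src, 0); err != nil {
		log.Fatalf("compact failed: %v", err)
	}
	log.Println("wrote compacted copy to raft-compacted.db; swap it in for raft.db")
}
```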
Nomad version
0.8.4
After some launches of huge jobs we have very big `raft.db` files on the server nodes, so a restart of a Nomad server takes a huge amount of time and the Nomad server eats too much memory. When we replace a server that has this huge `raft.db` with a new one, on that node `raft.db` is not so big and Nomad eats the expected amount of memory. So it would be very useful to be able to manually truncate `raft.db`.
I found some mention of the same problem in the Consul issue tracker:
hashicorp/consul#866
But it is not clear to me whether Nomad has the same behavior.