Complete snapshot implementation #275
TheR1sing3un added a commit to TheR1sing3un/dledger that referenced this issue on Feb 19, 2023: "1. support protocol about install snapshot Closes openmessaging#275"
@TheR1sing3un Great idea. Please add further info on how you would implement the snapshot installation based on the architecture of DLedger if you have time.
TheR1sing3un moved this from 📋 Backlog to 🏗 In progress in @TheR1sing3un's opensource works on Apr 7, 2023
TheR1sing3un added a commit to TheR1sing3un/dledger that referenced this issue on Jun 4, 2023: "1. support protocol about install snapshot Closes openmessaging#275"
TheR1sing3un added a commit to TheR1sing3un/dledger that referenced this issue on Jul 9, 2023: "1. support protocol about install snapshot Closes openmessaging#275"
RongtongJin pushed a commit that referenced this issue on Jul 15, 2023, with the squashed commit message:
* feat(core): support protocol about install snapshot. Closes #275
* feat(core): refactor basic dledger overall structure to make it more "raft" like
* feat(core): pass all original test
* feat(core): support batch append
* fix(example): resolve conflicts after rebasing master
* fix(jepsen): resolve conflicts about jepsen after rebasing master
* fix(jepsen): fix type error
* feat(core): support installing snapshot
* feat(jepsen): test snapshot in jepsen
* test(core): polish flaky test
* rerun
* feat(core): commit entry which is in current term
* fix(core): make the field: position in DLedgerEntry meaningless
* test(core): use different store base path for each ut
* fix(core): update peer watermark when compare success
* fix(core): fix
* test(proxy): remove proxy test
* feat(example): add batch append cmd
* fix(core): reuse forks
* chore(global): add more git ignore file
* build(global): set reuseForks to false
* feat(core): clear pending map and set writeIndex when role dispatcher role change from append to compare
github-project-automation bot moved this from 🏗 In progress to ✅ Done in @TheR1sing3un's opensource works on Jul 15, 2023
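Current state
A preliminary snapshot implementation already exists, but snapshots are only used to replay state into the state machine quickly at startup. This causes the following problem:
When the leader appends log entries to a follower and the target entry has already been deleted because a snapshot was generated, the entry cannot be found. The leader keeps reporting errors and the follower can never synchronize that entry. If more than half of the followers in the current Raft peer group hit this problem, the whole cluster becomes unavailable.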
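To see why the cluster stalls once more than half of the followers are stuck, recall that a Raft leader may only advance its commit index to an index stored on a majority of nodes, itself included. The following Python sketch is illustrative, not DLedger's code: `quorum_commit_index` is a hypothetical helper, and it ignores Raft's extra rule that only entries from the current term are committed by counting replicas.

```python
from __future__ import annotations


def quorum_commit_index(leader_last_index: int, follower_match: list[int]) -> int:
    """Return the highest log index stored on a majority of the cluster.

    follower_match[i] is the highest index known to be replicated on follower i;
    the leader itself always holds every entry up to leader_last_index.
    """
    indexes = sorted([leader_last_index, *follower_match], reverse=True)
    majority = len(indexes) // 2 + 1
    # The majority-th largest index is present on at least `majority` nodes.
    return indexes[majority - 1]


# Five nodes: three of the four followers are stuck at index 40 because the
# entry they need next was deleted by a snapshot.
print(quorum_commit_index(100, [100, 40, 40, 40]))  # 40: commit cannot move past 40
print(quorum_commit_index(100, [100, 100, 40, 40]))  # 100: a majority still has 100
```

With only two of four followers stuck, the leader plus the two healthy followers still form a majority of five, so commits continue; with three stuck they do not.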
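Solution
We need to implement the complete Raft snapshot protocol: when a follower needs log entries that the leader has already deleted through snapshotting, the leader must send its latest snapshot directly to the follower so that the follower can catch up quickly.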
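The leader-side decision can be sketched as follows, assuming the leader tracks a per-follower `nextIndex` and the metadata of its latest snapshot. `SnapshotMeta` and `choose_rpc` are hypothetical names, not DLedger APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnapshotMeta:
    """Hypothetical snapshot metadata, named after the Raft paper's fields."""
    last_included_index: int
    last_included_term: int


def choose_rpc(next_index: int, snapshot: Optional[SnapshotMeta]) -> str:
    """Pick the RPC to send to a follower whose nextIndex is next_index.

    Entries up to and including snapshot.last_included_index were deleted by
    compaction, so a follower that still needs one of them can only catch up
    from the snapshot itself.
    """
    if snapshot is not None and next_index <= snapshot.last_included_index:
        return "InstallSnapshot"
    return "AppendEntries"
```

After a successful install, the leader would set that follower's `nextIndex` to `last_included_index + 1` and resume ordinary appends; the preceding entry is then described by `last_included_term`.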
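Paper analysis
The log cannot keep growing forever. Once it becomes very large, say tens of millions of entries, bringing a follower that has little data up to date would require sending every one of those entries, which wastes a great deal of resources and time.
Instead we can use a snapshot: save the data of the leader's state machine at some point in time, and send that snapshot to nodes that are far behind so they can catch up quickly. Because the snapshot already records all the necessary data as of that point, the log entries it covers can be deleted, which keeps the log from growing without bound.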
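Once a snapshot covers a log prefix, log indexes no longer map directly to array positions. Below is a minimal in-memory sketch of a compacted log, assuming the Raft paper's 1-based indexes with 0 meaning "no snapshot"; `CompactedLog` and its methods are hypothetical, not DLedger's storage API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CompactedLog:
    """Log whose prefix up to snapshot_index has been replaced by a snapshot."""
    snapshot_index: int = 0  # last index covered by the snapshot (0 = none)
    snapshot_term: int = 0
    terms: list[int] = field(default_factory=list)  # terms[i] is the term of index snapshot_index + 1 + i

    def last_index(self) -> int:
        return self.snapshot_index + len(self.terms)

    def append(self, term: int) -> int:
        self.terms.append(term)
        return self.last_index()

    def term_at(self, index: int) -> int:
        if index == self.snapshot_index:
            return self.snapshot_term  # still known from the snapshot metadata
        if index < self.snapshot_index or index > self.last_index():
            raise KeyError(f"index {index} is compacted or not yet written")
        return self.terms[index - self.snapshot_index - 1]

    def compact(self, upto: int) -> None:
        """Delete entries up to and including `upto` once a snapshot covers them."""
        if upto <= self.snapshot_index:
            return
        if upto > self.last_index():
            raise ValueError("cannot compact past the end of the log")
        self.snapshot_term = self.term_at(upto)
        del self.terms[: upto - self.snapshot_index]
        self.snapshot_index = upto
```

Looking up a compacted index raises `KeyError`, which is exactly the leader-side failure described under "Current state": the entry is gone and only the snapshot can stand in for it.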
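Figure 13 of the paper gives the arguments and the implementation of the InstallSnapshot RPC. Arguments:
- `term`: leader's term
- `leaderId`: so the follower can redirect clients
- `lastIncludedIndex`: the snapshot replaces all entries up through and including this index
- `lastIncludedTerm`: term of `lastIncludedIndex`
- `offset`: byte offset where the chunk is positioned in the snapshot file
- `data[]`: raw bytes of the snapshot chunk, starting at `offset`
- `done`: true if this is the last chunk
Results:
- `term`: `currentTerm`, for the leader to update itself
Receiver implementation:
1. Reply immediately if `term` < `currentTerm`.
2. Create a new snapshot file if this is the first chunk (`offset` is 0).
3. Write the data into the snapshot file, starting at `offset`.
4. If `done` is not true, reply and wait for more data chunks.
5. Save the snapshot file and discard any existing or partial snapshot whose index is smaller than `lastIncludedIndex`.
6. If an existing log entry has the same index and term as the snapshot's last included entry, retain the log entries following it and reply.
7. Discard the entire log.
8. Reset the state machine using the snapshot contents (and load the snapshot's cluster configuration).
Implementing snapshots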
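As a reference point for the implementation, the Figure 13 receiver rules can be modeled in a few lines. This is a minimal in-memory sketch, not DLedger's code: the snapshot "file" is a bytearray, the log is a list of `(index, term)` pairs, the state machine is just the installed bytes, out-of-order chunks are simply ignored, and a new snapshot always replaces the previous one without comparing indexes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class InstallSnapshotArgs:
    term: int
    leader_id: str
    last_included_index: int
    last_included_term: int
    offset: int
    data: bytes
    done: bool


class SnapshotReceiver:
    """Follower-side InstallSnapshot handling, one branch per Figure 13 step."""

    def __init__(self, current_term: int) -> None:
        self.current_term = current_term
        self.partial: Optional[bytearray] = None                 # snapshot "file" being received
        self.snapshot: Optional[tuple[int, int, bytes]] = None  # (index, term, data)
        self.log: list[tuple[int, int]] = []                     # (index, term) of each entry
        self.state = b""                                          # toy state machine

    def handle(self, args: InstallSnapshotArgs) -> int:
        # 1. Reply immediately if term < currentTerm.
        if args.term < self.current_term:
            return self.current_term
        self.current_term = args.term
        # 2. Create a new snapshot file if this is the first chunk.
        if args.offset == 0:
            self.partial = bytearray()
        # Simplification: only accept chunks that arrive in order.
        if self.partial is None or args.offset != len(self.partial):
            return self.current_term
        # 3. Write the data at the given offset (here always the end of the buffer).
        self.partial.extend(args.data)
        # 4. Not the last chunk: reply and wait for more.
        if not args.done:
            return self.current_term
        # 5. Save the snapshot, replacing the previous one.
        self.snapshot = (args.last_included_index, args.last_included_term, bytes(self.partial))
        self.partial = None
        # 6. Matching entry in the log: keep the entries after it and reply.
        last = (args.last_included_index, args.last_included_term)
        if last in self.log:
            self.log = self.log[self.log.index(last) + 1:]
            return self.current_term
        # 7. Otherwise discard the entire log.
        self.log = []
        # 8. Reset the state machine from the snapshot contents.
        self.state = self.snapshot[2]
        return self.current_term
```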
Snapshot generation
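The generation steps listed next can be condensed into a small single-threaded sketch. Method names mirror those steps in snake_case, but the toy state machine, the snapshot interval, the in-memory log and snapshot list, and the use of `io.StringIO` as the snapshot writer are all assumptions; this is not DLedger's code. One design point the sketch makes explicit: the save task is pushed to the front of the queue so that no later commit is applied between choosing `last_included_index` and writing the state.

```python
from __future__ import annotations

import io
from collections import deque
from dataclasses import dataclass
from typing import Callable


class SumStateMachine:
    """Toy state machine: keeps the sum of all applied integer payloads."""

    def __init__(self) -> None:
        self.total = 0

    def on_apply(self, payload: int) -> None:
        self.total += payload

    def on_snapshot_save(self, writer: io.StringIO) -> None:
        writer.write(str(self.total))  # serialize the whole state


@dataclass
class SnapshotSaveHook:
    """Snapshot metadata plus writer, and the callback to run once saving is done."""
    last_included_index: int
    last_included_term: int
    writer: io.StringIO
    callback: Callable[[SnapshotSaveHook], None]


class StateMachineCaller:
    def __init__(self, sm: SumStateMachine, log: dict[int, tuple[int, int]], interval: int) -> None:
        self.sm = sm
        self.log = log            # index -> (term, payload)
        self.interval = interval  # snapshot after this many newly applied entries
        self.applied_index = 0
        self.snapshots: list[tuple[int, int, str]] = []  # (index, term, data)
        self.tasks: deque[Callable[[], None]] = deque()  # single-threaded task queue

    def on_commit(self, commit_index: int) -> None:
        self.tasks.append(lambda: self.do_commit(commit_index))

    def do_commit(self, commit_index: int) -> None:
        # Apply entries that are committed but not yet applied, then maybe snapshot.
        while self.applied_index < commit_index:
            self.applied_index += 1
            self.sm.on_apply(self.log[self.applied_index][1])
        self.save_snapshot()

    def save_snapshot(self) -> None:
        last = self.snapshots[-1][0] if self.snapshots else 0
        if self.applied_index - last < self.interval:
            return  # not enough new entries since the last snapshot
        term = self.log[self.applied_index][0]
        hook = SnapshotSaveHook(self.applied_index, term, io.StringIO(), self.after_save)
        self.tasks.appendleft(lambda: self.do_snapshot_save(hook))

    def do_snapshot_save(self, hook: SnapshotSaveHook) -> None:
        self.sm.on_snapshot_save(hook.writer)
        hook.callback(hook)

    def after_save(self, hook: SnapshotSaveHook) -> None:
        # Keep the snapshot, then delete the log prefix it covers.
        self.snapshots.append((hook.last_included_index, hook.last_included_term, hook.writer.getvalue()))
        for index in [i for i in self.log if i <= hook.last_included_index]:
            del self.log[index]

    def run_pending(self) -> None:
        while self.tasks:
            self.tasks.popleft()()
```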
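1. `onCommit` is called to perform the commit.
2. The `doCommit` method is then invoked.
3. `onCommit` applies, in the state machine, the entries that have been committed but not yet applied.
4. The `saveSnapshot` method decides whether a snapshot should be taken now and drives the subsequent snapshot operations.
5. `createSnapshotWritter` creates a writer for a snapshot file.
6. `SnapshotSaveHook` holds the basic snapshot metadata and the writer object, and is used for the subsequent callback.
7. `onSnapshotSave` puts this snapshot-save task into the task queue.
8. The `doSnapshotSave` method is then invoked.
9. `onSnapshotSave` lets the state machine write its own state into a snapshot.
Snapshot loading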
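The loading steps listed next boil down to: read the metadata, check that the snapshot is valid, load the data into the state machine, and replay the log from `lastIncludedIndex + 1`. The sketch below omits the task queue and the `snapshotLoadHook` indirection. The CRC-based validity check, the dict standing in for SnapshotStore, and all class and function names are hypothetical; falling back to a full replay only works if the log prefix has not been compacted.

```python
from __future__ import annotations

import json
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnapshotMeta:
    last_included_index: int
    last_included_term: int
    crc: int  # hypothetical checksum used for the validity check


class SnapshotReader:
    """Reads one snapshot's metadata and data from an in-memory SnapshotStore stand-in."""

    def __init__(self, store: dict[str, bytes]) -> None:
        self.store = store

    def read_meta(self) -> Optional[SnapshotMeta]:
        raw = self.store.get("meta")
        return None if raw is None else SnapshotMeta(**json.loads(raw))

    def read_data(self) -> bytes:
        return self.store["data"]


class ListStateMachine:
    """Toy state machine whose state is the list of applied payloads."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def on_apply(self, payload: str) -> None:
        self.items.append(payload)

    def on_snapshot_load(self, data: bytes) -> None:
        self.items = json.loads(data)


def load_snapshot(reader: SnapshotReader, sm: ListStateMachine) -> int:
    """Load a valid snapshot into sm and return the first log index left to replay."""
    meta = reader.read_meta()
    if meta is None:
        return 1  # no snapshot: replay the whole log
    data = reader.read_data()
    if zlib.crc32(data) != meta.crc:
        return 1  # invalid snapshot: ignore it and replay the whole log
    sm.on_snapshot_load(data)
    return meta.last_included_index + 1


def replay(log: dict[int, str], sm: ListStateMachine, start: int) -> None:
    """Apply log entries from `start` up to the last index in the log."""
    for index in range(start, max(log, default=start - 1) + 1):
        sm.on_apply(log[index])
```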
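1. The `loadSnapshot` method is called.
2. `snapshotReader` reads the snapshot metadata and the actual data from the snapshot storage.
3. The `snapshotLoadHook` hook drives the actual snapshot-read task and the callback that runs after reading.
4. The `onSnapshotLoad` method creates a snapshot-read task and puts it into the task queue.
5. The `doSnapshotLoad` method performs the actual snapshot read.
6. The snapshot's metadata is read from SnapshotStore through `snapshotReader` to check whether the snapshot is still valid.
7. `onSnapshotLoad` is called.
8. The actual snapshot data is read from SnapshotStore through `snapshotReader`, and the state machine updates itself from it.
9. The `snapshotLoadHook` callback is invoked.
10. Subsequent log entries are applied starting from `lastIncludedIndex+1`.
Snapshot installation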
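The installation steps listed next reuse the loading path and differ only in the hook type and its callback. A minimal sketch of that split, assuming a dict-based log and a bytes "state machine"; `HookType`, `Follower`, and the validity rule (reject a stale term, or a snapshot that does not go beyond the follower's `commitIndex`) are hypothetical. Following Figure 13, the install callback keeps the log suffix only when the entry at `lastIncludedIndex` matches the snapshot's term.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class HookType(Enum):
    LOAD = "load"        # snapshot loaded from local storage at startup
    INSTALL = "install"  # snapshot received from the leader


@dataclass
class Follower:
    current_term: int
    commit_index: int = 0
    applied_index: int = 0
    log: dict[int, int] = field(default_factory=dict)  # index -> term
    state: bytes = b""                                 # toy state machine

    def handle_install_snapshot(self, term: int, last_index: int, last_term: int, data: bytes) -> bool:
        # Assumed validity check: the sender's term is current and the snapshot
        # is ahead of what this follower has already committed.
        if term < self.current_term or last_index <= self.commit_index:
            return False
        self.current_term = term
        self.load(HookType.INSTALL, last_index, last_term, data)
        return True

    def load(self, hook: HookType, last_index: int, last_term: int, data: bytes) -> None:
        # Shared loading path: the state machine replaces its state with the snapshot.
        self.state = data
        self.applied_index = last_index
        if hook is HookType.INSTALL:
            self.after_install(last_index, last_term)
        # A LOAD hook would instead continue by replaying the log from last_index + 1.

    def after_install(self, last_index: int, last_term: int) -> None:
        # Install-only callback: drop the log up to lastIncludedIndex (all of it if
        # the entry there conflicts with the snapshot) and advance commitIndex.
        if self.log.get(last_index) == last_term:
            self.log = {i: t for i, t in self.log.items() if i > last_index}
        else:
            self.log = {}
        self.commit_index = max(self.commit_index, last_index)
```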
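1. The leader sends an `InstallSnapshot` RPC request: it obtains an available snapshot from the local SnapshotManager and carries it in that request.
2. On receiving the `InstallSnapshot` request, the follower first performs a validity check, i.e. whether the sender is still the leader and whether the snapshot is still valid.
3. The `installSnapshot` method is called to start a snapshot installation.
4. A `snapshotReader` is obtained.
5. A `snapshotLoadHook` of type `Install` is created. It is kept separate from the hook used for normal snapshot loading because the callback logic after reading is different.
6. The `onSnapshotLoad` method enqueues this task.
7. The `doSnapshotLoad` method is then invoked.
8. `snapshotReader` reads the metadata from SnapshotStore to check whether the snapshot is still valid.
9. The `onSnapshotLoad` method is called.
10. The `Install` `snapshotLoadHook` callback is invoked.
11. All log entries before `lastIncludedIndex` are cleared, and Raft's `commitIndex` is updated.
Optimization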
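Snapshot sending
For now, we first implement sending all of the snapshot data in a single request. In real production environments, however, snapshots are rarely small, and sending all of the data in one request is impractical, so the snapshot can be sent in chunks instead.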
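Chunked sending maps directly onto the `offset`, `data[]` and `done` arguments of Figure 13. A minimal leader-side sketch follows; the chunk size is a tunable the text does not specify, and `SnapshotChunk` and `split_snapshot` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class SnapshotChunk:
    offset: int  # byte offset of this chunk within the snapshot
    data: bytes
    done: bool   # True only for the last chunk


def split_snapshot(snapshot: bytes, chunk_size: int) -> Iterator[SnapshotChunk]:
    """Yield the snapshot as consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not snapshot:
        yield SnapshotChunk(0, b"", True)  # an empty snapshot still needs one request
        return
    for offset in range(0, len(snapshot), chunk_size):
        data = snapshot[offset:offset + chunk_size]
        yield SnapshotChunk(offset, data, offset + len(data) == len(snapshot))
```

Each chunk would travel in its own InstallSnapshot request; the follower appends chunks in offset order and only installs the snapshot once it sees `done`.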