recover readers exactly from checkpoint #1620

Merged
merged 10 commits into from
Jul 23, 2024

Abingcbc
Collaborator

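Problem

Previously, when a reader was restored from a checkpoint, it was always placed into the readerArray regardless of where it had been before, and was only moved to its correct position later.
If the number of rotated log files exceeds the maximum size of the readerArray and an inode is reused, the readerArray ends up in the wrong order (the readerArray strictly requires readers to be sorted in descending rotation order, i.e. log.2 log.1 log).
Problems known to be caused by this so far:

  1. Incorrect recovery closes the reader, so logs are truncated
  2. Incorrect recovery resets the file read position to the beginning, so logs are collected twice

Solution

Add a new field to the checkpoint that stores the reader's position in the readerArray.
-2: not in the queue
-1: default value; a newly created reader that must be placed at the end of the readerArray
>0: the reader's actual position in the readerArray
When restoring from a checkpoint, the reader is put back into its exact previous position according to this value.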
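
The restore rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `RestoredReader`, `RestoreResult`, `RestoreReaders`, and the constant names are all hypothetical, and the sketch assumes that index 0 is also a valid saved position (the description only says ">0").

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Sentinel values from the PR description; the constant names are hypothetical.
constexpr int kNotInReaderArray = -2;
constexpr int kNewReader = -1;

// Hypothetical, simplified stand-in for a reader loaded from a checkpoint.
struct RestoredReader {
    std::string path;
    int idxInReaderArray;
};

struct RestoreResult {
    std::deque<RestoredReader> readerArray;  // ordered like log.2 log.1 log
    std::vector<RestoredReader> notInArray;  // readers that were saved with -2
};

// Rebuilds the reader array from checkpointed readers:
// - readers with a saved position keep their relative order by that position,
// - new readers (-1) are appended at the end,
// - readers marked -2 are kept out of the array.
RestoreResult RestoreReaders(std::vector<RestoredReader> readers) {
    RestoreResult result;
    std::vector<RestoredReader> positioned;
    std::vector<RestoredReader> fresh;
    for (auto& r : readers) {
        // Anything below -2 is not a value the description defines.
        assert(r.idxInReaderArray >= kNotInReaderArray);
        if (r.idxInReaderArray == kNotInReaderArray) {
            result.notInArray.push_back(std::move(r));
        } else if (r.idxInReaderArray == kNewReader) {
            fresh.push_back(std::move(r));
        } else {
            positioned.push_back(std::move(r));
        }
    }
    // Sort by saved index so the array order no longer depends on load order.
    std::stable_sort(positioned.begin(), positioned.end(),
                     [](const RestoredReader& a, const RestoredReader& b) {
                         return a.idxInReaderArray < b.idxInReaderArray;
                     });
    for (auto& r : positioned) result.readerArray.push_back(std::move(r));
    for (auto& r : fresh) result.readerArray.push_back(std::move(r));
    return result;
}
```

With this approach the order of the restored array depends only on the saved indexes, not on the order in which checkpoint entries happen to be loaded, which is the property the description says was missing.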

Collaborator

henryzhx8 left a comment

  1. Checkpoint recovery should not be subject to the queue length limit of 20;
  2. Please verify the upgrade case, where the existing checkpoint has no idx field
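
One plausible way to handle both review points is sketched below. It is an assumption-laden illustration rather than the PR's actual code: the flat `CheckpointEntry` map, the key name `idx_in_reader_array`, and the functions `ReadIdxInReaderArray` and `ShouldAcceptReader` are all hypothetical. The sketch falls back to the documented default of -1 when an older checkpoint lacks the field, and skips the size limit only while restoring.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

constexpr int kNewReader = -1;                   // default value from the PR description
constexpr std::size_t kMaxReaderArraySize = 20;  // limit mentioned in the review; name is hypothetical

// Hypothetical flat key/value view of one checkpoint entry.
using CheckpointEntry = std::map<std::string, std::string>;

// Returns the saved position, or the default (-1) when the field is missing
// or unparsable, as would be the case for a checkpoint written before upgrade.
int ReadIdxInReaderArray(const CheckpointEntry& entry) {
    auto it = entry.find("idx_in_reader_array");  // hypothetical key name
    if (it == entry.end()) {
        return kNewReader;
    }
    try {
        return std::stoi(it->second);
    } catch (const std::exception&) {
        return kNewReader;
    }
}

// During checkpoint recovery the size limit is not applied; otherwise a reader
// is accepted only while the array is below the limit.
bool ShouldAcceptReader(std::size_t currentSize, bool restoringFromCheckpoint) {
    return restoringFromCheckpoint || currentSize < kMaxReaderArraySize;
}
```

Treating a missing field as -1 means readers from a pre-upgrade checkpoint are handled like newly created readers and appended at the end of the array.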

core/event_handler/EventHandler.cpp (resolved)
commit c8385ca
Author: henryzhx8 <zhhxjack8@aliyun.com>
Date:   Fri Jul 19 09:42:31 2024 +0800

    fix core caused by concurrent use of non-thread-safe gethostbyname (alibaba#1611)

    * fix core caused by concurrent use of non-thread-safe gethostbyname

commit 8fc252e
Author: Qiu Fengshuo <alph000@163.com>
Date:   Thu Jul 18 09:42:06 2024 +0800

    speedup CI UT job (alibaba#1606)

    * Split the original UT CI job into two separate jobs: one with SPL and one without SPL

    * fix: change design. Build .a and .so at the same CI UT job

    * fix

    * fix

    * fix

    * fix

    * fix

    * fix

    * fix
@messixukejia
Collaborator

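We need e2e test scenarios built specifically for this, so that each of these cases has a chance to be triggered. Let's track that as a separate task.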

core/event_handler/EventHandler.cpp (outdated, resolved)
core/event_handler/EventHandler.cpp (outdated, resolved)
henryzhx8 added the bug (Something isn't working) label Jul 23, 2024
henryzhx8 added this to the v2.0 milestone Jul 23, 2024
henryzhx8 merged commit 939937a into alibaba:main Jul 23, 2024
15 checks passed
Abingcbc added a commit to Abingcbc/ilogtail that referenced this pull request Jul 23, 2024
Abingcbc added a commit to Abingcbc/ilogtail that referenced this pull request Jul 23, 2024
Abingcbc added a commit to Abingcbc/ilogtail that referenced this pull request Jul 23, 2024
Labels
bug Something isn't working
4 participants