
Remote output health improvements #4185

Open
michel-laterman opened this issue Dec 6, 2024 · 1 comment
Labels
enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@michel-laterman
Contributor

Describe the enhancement:

Currently, remote output health is reported in the policy self-monitor (when `updateState` is called):

```go
func reportOutputHealth(ctx context.Context, bulker bulk.Bulk, zlog zerolog.Logger) {
	// pinging logic
	bulkerMap := bulker.GetBulkerMap()
	// ...
}
```

This writes a document containing the output health status to the primary Elasticsearch (ES) instance:

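```go
// CreateOutputHealth writes an output health document to the index/data
// stream named by FleetOutputHealth, using the given (primary) bulker.
func CreateOutputHealth(ctx context.Context, bulker bulk.Bulk, doc model.OutputHealth) error {
	return createOutputHealth(ctx, bulker, FleetOutputHealth, doc)
}

func createOutputHealth(ctx context.Context, bulker bulk.Bulk, index string, doc model.OutputHealth) error {
	// Default the timestamp to now (UTC, RFC 3339) if the caller left it empty.
	if doc.Timestamp == "" {
		doc.Timestamp = time.Now().UTC().Format(time.RFC3339)
	}
	// Data stream fields, i.e. logs-fleet_server.output_health-default.
	doc.DataStream = &model.DataStream{
		Dataset:   "fleet_server.output_health",
		Type:      "logs",
		Namespace: "default",
	}
	body, err := json.Marshal(doc)
	if err != nil {
		return err
	}
	// Every report is a new document with a random UUID, so the index holds
	// an append-only history of health states rather than one updated doc.
	id, err := uuid.NewV4()
	if err != nil {
		return err
	}
	// WithRefresh requests a refresh so the document is visible to searches right away.
	_, err = bulker.Create(ctx, index, id.String(), body, bulk.WithRefresh())
	return err
}
```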

However, the policy self-monitor may not be the right place for these updates, because the monitor does not actually use the output bulker health signal.
Additionally, gathering references to all bulkers may cause concurrency issues, as seen in #4170.
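
The concurrency concern can be illustrated generically. This sketch is not fleet-server code and does not claim to reproduce #4170; `Bulker`, `BulkerRegistry`, and `Snapshot` are hypothetical names. It shows that a map of bulkers shared between the code that adds/removes outputs and a monitor that iterates over it needs a lock, and that even a locked copy can hand the monitor a bulker that has since been removed.

```go
package main

import (
	"fmt"
	"sync"
)

// Bulker is a hypothetical stand-in for a per-output bulk client
// (not the real bulk.Bulk interface).
type Bulker interface {
	Name() string
}

type stubBulker struct{ name string }

func (s stubBulker) Name() string { return s.name }

// BulkerRegistry is a hypothetical registry of remote-output bulkers keyed
// by output name. The RWMutex guards the map: reading a plain Go map while
// another goroutine writes to it is a data race.
type BulkerRegistry struct {
	mu      sync.RWMutex
	bulkers map[string]Bulker
}

func NewBulkerRegistry() *BulkerRegistry {
	return &BulkerRegistry{bulkers: make(map[string]Bulker)}
}

// Set adds or replaces the bulker for an output (e.g. after a policy change).
func (r *BulkerRegistry) Set(name string, b Bulker) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.bulkers[name] = b
}

// Delete removes the bulker for an output.
func (r *BulkerRegistry) Delete(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.bulkers, name)
}

// Snapshot returns a copy of the map that callers can range over without
// holding the lock. The copy can still go stale: a bulker in it may be
// removed (and shut down) right after Snapshot returns.
func (r *BulkerRegistry) Snapshot() map[string]Bulker {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make(map[string]Bulker, len(r.bulkers))
	for k, v := range r.bulkers {
		out[k] = v
	}
	return out
}

func main() {
	reg := NewBulkerRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			name := fmt.Sprintf("remote-%d", i)
			reg.Set(name, stubBulker{name: name})
		}(i)
	}
	wg.Wait()
	fmt.Println(len(reg.Snapshot())) // 10
}
```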

We may want each remote bulker to start a heartbeat goroutine that uses the primary bulker to write its status directly. This would address both issues.

michel-laterman added the enhancement and Team:Elastic-Agent-Control-Plane labels Dec 6, 2024
@cmacknz
Member

cmacknz commented Dec 6, 2024

> We may want to have remote bulkers start a heartbeat goroutine that would use the primary bulker to write their status directly; this would address both issues.

This was also the first alternative I thought of when I saw what the code was doing. I don't think we'd have to worry about the number of goroutines, because there won't be thousands of remote outputs unless there is some serious bug somewhere.
