Help trace worker crash in Kusto. #450
src/Runner.Listener/JobDispatcher.cs
Outdated
@@ -952,8 +951,10 @@ private async Task LogWorkerProcessUnhandledException(Pipelines.AgentJobRequestM
ArgUtil.NotNull(timeline, nameof(timeline));
TimelineRecord jobRecord = timeline.Records.FirstOrDefault(x => x.Id == message.JobId && x.RecordType == "Job");
ArgUtil.NotNull(jobRecord, nameof(jobRecord));
var unhandledExceptionIssue = new Issue() { Type = IssueType.Error, Message = errorMessage };
unhandledExceptionIssue.Data[Constants.Runner.WorkerCrashIssueDataKey] = string.Empty;
The service will check the issue data to decide whether to fire a product trace event.
Think about:
Should we make this more generic? Do we have any other cases that warrant a more generic telemetry mechanism, e.g. [...].Data["_telemetry"] = "WORKER_CRASH"
@@ -486,7 +486,10 @@ public void ProcessCommand(IExecutionContext context, string inputLine, ActionCo
foreach (var property in command.Properties)
{
issue.Data[property.Key] = property.Value;
if (!string.Equals(property.Key, Constants.Runner.WorkerCrashIssueDataKey, StringComparison.OrdinalIgnoreCase))
Don't we already filter out everything other than specific properties?
* Help trace worker crash in Kusto.
* more
* feedback.
A bug in the Actions hosted runner virtual environment reduced the available disk space: http://github.com/actions/virtual-environments/issues/709
Some customers' workflows used all of the disk space, which also caused the runner to crash.
We want to get an alert when the number of worker crashes increases, so we can proactively investigate these issues instead of waiting for customer reports.
Each worker crash means a failed workflow run and an unhappy customer.