Incomplete events schema #1388
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
{ | ||
"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", | ||
"self": { | ||
"vendor": "com.snowplowanalytics.snowplow", | ||
"name": "failure", | ||
"format": "jsonschema", | ||
"version": "1-0-0" | ||
}, | ||
"type": "object", | ||
"properties": { | ||
"failureType": { | ||
"type": "string", | ||
"description": "Classification of the failure. For example, validation error." | ||
}, | ||
"errors": { | ||
"type": "array", | ||
"items": { | ||
"type": "object" | ||
Initially the idea was to have defined fields, but we changed it to an empty object because of how BQ loads data: any defined fields create a static definition of the object, which is a pain to change, and additional properties would be ignored. An empty object, however, just loads whatever it gets. This isn't necessarily the ideal scenario. We should consider separate fields for the error message (array of strings) and the error details (flexible object), but that design has downsides too. |
||
}, | ||
"description": "A list of errors encountered, and supporting information about the error. Each entry should always have at least a message field and a source field." | ||
}, | ||
"schema": { | ||
"type": ["string", "null"], | ||
"description": "The schema for the object which caused the failure, if it was self-descrbing." | ||
}, | ||
"data": { | ||
"type": ["object", "null"], | ||
"additionalProperties": true, | ||
"description": "The original data object which caused the failure." | ||
}, | ||
"timestamp": { | ||
"type": "string", | ||
"description": "Timestamp at which the failure occurred", | ||
"format": "date-time" | ||
}, | ||
"componentName": { | ||
"type": "string", | ||
"description": "Name of the component which produced the failure" | ||
}, | ||
"componentVersion": { | ||
"type": "string", | ||
"description": "Version of the component which produced the failure" | ||
} | ||
}, | ||
"required": ["failureType", "errors", "timestamp", "componentName", "componentVersion"] | ||
} |
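For illustration, here is a hypothetical instance that should validate against the schema above. Every value is invented (the `failureType` string, the Iglu URI, the component name and version), and the `message` and `source` keys in the error entry simply follow the wording of the `errors` description; the schema itself does not enforce them.

```json
{
  "failureType": "ValidationError",
  "errors": [
    {
      "message": "Hypothetical message: field 'price' must be a number",
      "source": "price"
    }
  ],
  "schema": "iglu:com.example/hypothetical_event/jsonschema/1-0-0",
  "data": { "price": "not-a-number" },
  "timestamp": "2024-01-01T00:00:00Z",
  "componentName": "example-component",
  "componentVersion": "0.0.0"
}
```

Because `schema` and `data` are nullable and not listed in `required`, an instance that omits them or sets them to `null` should also be accepted.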
I would add something like an `error_code`. At the moment we put a lot of description elements into the failed event schemas, but it isn't apparent what a failure relates to unless you actually add that into the schema. The way I've been thinking about this is to provide a high-level classification, e.g.:
D - the failure is related to data that has been sent with the event
S - the failure is related to an issue with a schema sent with the event
I - the failure is related to an internal processing error (e.g., pipeline issue)
E - the failure is related to an external processing issue (e.g., failed to get a response from an API enrichment)
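As a rough sketch only, a classification like this could be expressed as an extra property in the schema's `properties` block. The property name `errorCode` (camelCased to match the other fields) and the use of `enum` are assumptions made for illustration; nothing like this is part of the PR.

```json
{
  "errorCode": {
    "type": "string",
    "enum": ["D", "S", "I", "E"],
    "description": "Hypothetical high-level classification: D = data sent with the event, S = schema sent with the event, I = internal processing error, E = external processing issue."
  }
}
```

A closed `enum` keeps the codes easy to filter on downstream, at the cost of needing a schema version bump whenever a new class is added.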
Thanks Mike. Interesting suggestion. @stanch would love to hear what you think.
For incomplete events I'd be inclined to treat it as post-MVP, but depending on how difficult it is to implement, it might be something we could do before rolling out the end product.
@miike now that we have some firmer examples of what the actual data looks like, we've tweaked some things. I think the requirement you describe will be satisfied by the failureType field.
This will essentially indicate the classification of the error, e.g. validation error or schema violation.
(Additionally, we're releasing 4.2.0 before this one, which fixes things so that schema violations at enrichment level don't get reported as enrichment errors.)
I'm not sure this would give you everything you're asking for here, so we can still consider it post-MVP.