Incomplete events schema #1388

Closed
46 changes: 46 additions & 0 deletions schemas/com.snowplowanalytics.snowplow/failure/jsonschema/1-0-0
@@ -0,0 +1,46 @@
{
"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
"self": {
"vendor": "com.snowplowanalytics.snowplow",
"name": "failure",
"format": "jsonschema",
"version": "1-0-0"
},
"type": "object",
"properties": {
Contributor

I would add something like an error_code. At the moment we put a lot of descriptive elements into the failed event schemas, but it isn't apparent what a failure relates to unless you actually add that into the schema. The way I've been thinking about this is to provide a high-level classification, e.g.:

D - the failure is related to data that has been sent with the event
S - the failure is related to an issue with a schema sent with the event
I - the failure is related to an internal processing error (e.g., pipeline issue)
E - the failure is related to an external processing issue (e.g., failed to get a response from an API enrichment).
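
A minimal sketch of how this classification could be modelled, assuming the class is carried as the first letter of a hypothetical `error_code` string. The names `FailureClass` and `classify` are illustrative only and are not part of the schema or of any Snowplow component.

```python
from enum import Enum


class FailureClass(Enum):
    """Hypothetical high-level failure classes, following the D/S/I/E proposal."""

    DATA = "D"      # failure related to data sent with the event
    SCHEMA = "S"    # failure related to a schema sent with the event
    INTERNAL = "I"  # internal processing error (e.g., pipeline issue)
    EXTERNAL = "E"  # external processing issue (e.g., no response from an API enrichment)


def classify(error_code: str) -> FailureClass:
    """Map an error_code to its class.

    Assumption: the first letter of the code carries the class, so both
    "D" and a longer code such as "D001" resolve to FailureClass.DATA.
    """
    if not error_code:
        raise ValueError("error_code must be a non-empty string")
    # Enum lookup by value raises ValueError for an unknown letter.
    return FailureClass(error_code[0].upper())
```

Keeping the class as a closed set means an unknown letter fails loudly rather than being silently accepted.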

Contributor Author

Thanks Mike. Interesting suggestion. @stanch would love to hear what you think.

For incomplete events I'd be inclined to treat it as post-MVP, but depending on how difficult it is to implement, it might be something we could do before rolling out the end product.

Contributor Author

@colmsnowplow colmsnowplow Mar 11, 2024

@miike now that we have some firmer examples of what the actual data looks like, we tweaked some things. I think something like the requirement you describe will be satisfied by the failureType field.

This will essentially indicate the classification of the error, e.g. validation error or schema violation.

(Additionally, we're releasing 4.2.0 before this one, which fixes things so that schema violations at enrichment level don't get reported as enrichment errors.)

I'm not sure if this would give you everything you're asking for here, so we can still consider it post-MVP.

"failureType": {
"type": "string",
"description": "Classification of the failure. For example, validation error."
},
"errors": {
"type": "array",
"items": {
"type": "object"
Contributor Author

@colmsnowplow colmsnowplow Feb 14, 2024

Initially the idea was to have defined fields, but we changed it to just an empty object because of how BQ will load data - any defined fields will create a static definition of the object, which is a pain to change, and additional properties would be ignored. An empty object, however, will just load what it gets.

This isn't necessarily the ideal scenario. We should consider separate fields for error message (array of strings), and error details (flexible object), but that design has downsides too.
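
To make the alternative concrete, here is a rough sketch of splitting loosely structured `errors` entries into a list of message strings and a list of free-form detail objects. The function name `split_errors` and the output keys `errorMessages` and `errorDetails` are hypothetical; nothing in this PR defines them.

```python
from typing import Any, Dict, List


def split_errors(errors: List[Dict[str, Any]]) -> Dict[str, list]:
    """Split error entries into messages (strings) and details (objects).

    Assumes an entry may carry a "message" key; every other key in the
    entry is treated as free-form detail. Output key names are hypothetical.
    """
    messages: List[str] = []
    details: List[Dict[str, Any]] = []
    for entry in errors:
        rest = dict(entry)  # copy so the caller's entry is not mutated
        message = rest.pop("message", None)
        if message is not None:
            messages.append(str(message))
        if rest:
            details.append(rest)
    return {"errorMessages": messages, "errorDetails": details}
```

One downside visible even in this sketch: when an entry lacks a message or has no other keys, the two lists are no longer aligned by index, so the pairing between a message and its details can be lost.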

},
"description": "A list of errors encountered, and supporting information about the error. Each entry should always have at least a message field and a source field."
},
"schema": {
"type": ["string", "null"],
"description": "The schema for the object which caused the failure, if it was self-describing."
},
"data": {
"type": ["object", "null"],
"additionalProperties": true,
"description": "The original data object which caused the failure."
},
"timestamp": {
"type": "string",
"description": "Timestamp at which the failure occurred",
"format": "date-time"
},
"componentName": {
"type": "string",
"description": "Name of the component which produced the failure"
},
"componentVersion": {
"type": "string",
"description": "Version of the component which produced the failure"
}
},
"required": ["failureType", "errors", "timestamp", "componentName", "componentVersion"]
}
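
As an illustration of what a payload matching this schema might look like, the sketch below builds an example and runs a few basic checks using only the Python standard library. It is not a full JSON Schema validator, and the values in `example` (including the `failureType` string and the component name) are made up for the purpose of the example.

```python
from datetime import datetime

# Mirrors the "required" list in the schema above.
REQUIRED = ["failureType", "errors", "timestamp", "componentName", "componentVersion"]


def check_failure(payload: dict) -> list:
    """Return a list of problems found; an empty list means the basic checks pass."""
    problems = [f"missing required field: {name}" for name in REQUIRED if name not in payload]

    errors = payload.get("errors", [])
    if not isinstance(errors, list):
        problems.append("errors must be an array")
    elif any(not isinstance(entry, dict) for entry in errors):
        problems.append("each errors entry must be an object")

    if "timestamp" in payload:
        ts = payload["timestamp"]
        if not isinstance(ts, str):
            problems.append("timestamp must be a string")
        else:
            try:
                # fromisoformat rejects a trailing "Z" before Python 3.11.
                datetime.fromisoformat(ts.replace("Z", "+00:00"))
            except ValueError:
                problems.append("timestamp is not a valid date-time")
    return problems


example = {
    "failureType": "ValidationError",  # hypothetical value
    "errors": [{"message": "field is too long", "source": "example"}],
    "schema": None,
    "data": {"some": "value"},
    "timestamp": "2024-03-11T12:00:00Z",
    "componentName": "example-component",  # hypothetical
    "componentVersion": "0.0.0",
}
```

Calling `check_failure(example)` returns an empty list, while `check_failure({})` reports each of the five required fields as missing. The schema itself does not enforce the `message` and `source` keys mentioned in the `errors` description, so this sketch does not check them either.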