Add JsonL-Format as possible logformat #47
Please also add a
to test out your logic for detecting the timestamp key, even when
EDIT: also please add lines that are problematic:
Overall, this looks very good. I like your adaptive timestamp detection and removal from the logged content.
Also, I'll add support for detecting the presence of the EDIT: well, I found a few more things to discuss before merging. See other comments.
See previous comments.
This file format poses some interesting issues compared to the other types. Most of the others process lines of text, so all the fields are strings by default. Now that we are getting data that has been through the JSON parser, a value can be any of a number of types (int, float, bool, None, list, or even dict if the JSON is nested), so we can't assume strings for everything.
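To make the type concern concrete, here is a minimal sketch of one way to turn parsed JSON values into display text. The helper name value_to_text and the sample record are hypothetical, not part of this PR; the only library calls used are json.loads and json.dumps.

```python
import json


def value_to_text(value):
    """Render a parsed JSON value as display text (hypothetical helper)."""
    if isinstance(value, str):
        # Strings are shown as-is, without surrounding quotes.
        return value
    # int, float, bool, None, list and dict are serialized back to JSON text.
    return json.dumps(value, ensure_ascii=False)


# Made-up sample line with a mix of value types.
record = json.loads(
    '{"level": "INFO", "code": 200, "ok": true, "extra": null, "ctx": {"user": "a"}}'
)
fields = {key: value_to_text(value) for key, value in record.items()}
```

With this approach every field ends up as a string, so the rest of a line-oriented reader would not need to know which type the JSON parser produced.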
What do you think should happen if no timestamp key is available? I don't think it makes sense to add this row to the previous rows.
I think this may bring up a difference in approach from the log reader that reads loosely formatted text files. In the case of a text file, lines with no timestamp are presumed to be continuations of the last timestamped line (if a traceback gets logged, for instance). With parsed JSON, it is kind of a puzzle: why would we get a line with no timestamp? And what should we do with such a line? Dropping it on the floor seems the easiest, but also the least friendly to the user - that line might have important stuff in it. I guess we could just log a warning in that case so it doesn't get lost, without making any extra assumptions about it.

Also, with regard to the code that looks for a new timestamp key if the old one changes to a new one - I'm a little wary of being too helpful there. In the past I have written APIs with similar helpfulness in mind, and I ended up getting tied up in knots because an API was too flexible, and this feels similar. Do you think this is going to be a common occurrence? Have you seen this in the log files you work with?

To begin, I feel we should start strict and require the key to be the same throughout a given jsonl file. If changing keys turn out to be more common, we'll address it in a future version.
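A minimal sketch of the strict approach described above, assuming a single timestamp key per file. The function name read_jsonl_strict, the logger name, and the choice to skip a line after warning about it are all assumptions made for illustration, not the code in this PR.

```python
import json
import logging

logger = logging.getLogger("jsonl_reader_sketch")  # hypothetical logger name


def read_jsonl_strict(lines, timestamp_key):
    """Keep only JSON objects that contain timestamp_key; warn about the rest."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines carry no information
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            logger.warning("line %d: not valid JSON: %s", lineno, line)
            continue
        if not isinstance(record, dict) or timestamp_key not in record:
            # No guessing at another key and no merging into the previous row:
            # the line is reported so it is not silently lost.
            logger.warning("line %d: no %r key: %s", lineno, timestamp_key, line)
            continue
        rows.append(record)
    return rows


sample = [
    '{"timestamp": "2024-01-01T00:00:00", "msg": "a"}',
    '{"msg": "no timestamp here"}',
    '{"timestamp": "2024-01-01T00:00:01", "msg": "b"}',
]
rows = read_jsonl_strict(sample, "timestamp")
```

Here the second sample line is reported through the logger and left out of rows; whether to skip it or surface it some other way is exactly the open question in this thread.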
Without the dynamic timestamp column search, a row without a timestamp will be added to the previous line. The only times I found changing keys were when I changed the logging mechanism. But to be honest, it was just bad practice to have both versions in the same file.
This PR adds support for JSONL (also known as NDJSON) logs.
Files are picked up if their name ends in .jsonl.
Currently, the timestamp is searched for among all top-level keys of each JSON object.
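As an illustration of searching the top-level keys, the sketch below returns the first key whose value is a string that datetime.fromisoformat accepts. The function name find_timestamp_key and the assumption of ISO 8601 timestamps are hypothetical; the detection logic in this PR may differ.

```python
from datetime import datetime


def find_timestamp_key(record):
    """Return the first top-level key holding an ISO 8601 string, else None."""
    for key, value in record.items():
        if not isinstance(value, str):
            continue  # numbers, bools, None and nested objects are not candidates
        try:
            datetime.fromisoformat(value)
        except ValueError:
            continue  # an ordinary string, not a timestamp
        return key
    return None


# Made-up record for illustration.
record = {"level": "INFO", "time": "2024-01-01T12:00:00", "msg": "started"}
key = find_timestamp_key(record)
```

Under the strict approach discussed in the review, a check like this would run once on the first object, and the resulting key would then be required for the rest of the file.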
Each key is printed on its own line (\n characters are not escaped). That is the easiest way to display JSON objects. More complex options could be added in the future.
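A minimal sketch of the one-key-per-line rendering, assuming a "key: value" layout. The function name format_record and the exact layout are assumptions for illustration, not the PR's output format.

```python
def format_record(record):
    """Render each top-level key on its own line; newlines in values are kept."""
    return "\n".join(f"{key}: {value}" for key, value in record.items())


# Made-up record whose message contains an embedded newline.
record = {"level": "ERROR", "msg": "boom\nTraceback (most recent call last):"}
text = format_record(record)
```

Because the newline inside msg is not escaped, the rendered text spans three lines for two keys, which keeps logged tracebacks readable.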
Additionally, there is a small example file that can be read with the new Reader.