Cut operation #34

Merged
merged 2 commits into from
Dec 22, 2023

bee-san (Owner) commented Dec 22, 2023

I still find myself reaching for awk and cut for certain kinds of data analysis, so I added a new operation that covers about 80% of their utility without leaving CyberChef.

Basically, it's a small DSL (similar to the awk print statement) for extracting fields from tabular data such as CSVs. It lets you select the fields you want, reorder them, merge existing fields into new ones, and so on. It also provides a nice pathway for carving CSVs out of fixed-width data, and for converting between different kinds of tabular data (e.g. CSV to TSV).

The following expression extracts the 0th field, joins the 1st and 2nd fields (with a "T" between them), extracts the 4th and 5th fields, and finally extracts the last field.

0, 1"T"2, 4-5, -1
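To make the semantics of that expression concrete, here is a minimal Python sketch of how such an expression could be evaluated against one record that has already been split into fields. It is not the operation's actual implementation (which lives in CyberChef). The grammar is an assumption inferred from the example above: comma-separated items, where an item is either a range such as `4-5` or a run of indices and quoted literals that are concatenated. The names `TOKEN`, `split_items`, and `cut_fields` are hypothetical.

```python
import re

# Assumed token forms: a quoted literal, a range like 4-5, or a (possibly negative) index.
TOKEN = re.compile(r'\s*(?:"([^"]*)"|(\d+)-(\d+)|(-?\d+))')


def split_items(expr):
    """Split an expression on commas that are not inside double quotes."""
    items, current, in_quotes = [], [], False
    for ch in expr:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return items


def cut_fields(fields, expr):
    """Evaluate a cut-style expression against a list of field strings."""
    out = []
    for item in split_items(expr):
        item = item.strip()
        tokens, pos = [], 0
        while pos < len(item):
            m = TOKEN.match(item, pos)
            if m is None:
                raise ValueError(f"cannot parse {item!r} at offset {pos}")
            tokens.append(m)
            pos = m.end()
        # A lone range expands into several separate output fields (inclusive).
        if len(tokens) == 1 and tokens[0].group(2) is not None:
            start, end = int(tokens[0].group(2)), int(tokens[0].group(3))
            out.extend(fields[start:end + 1])
            continue
        # Otherwise indices and literals are concatenated into one output field.
        parts = []
        for m in tokens:
            literal, _start, _end, index = m.groups()
            if literal is not None:
                parts.append(literal)
            elif index is not None:
                parts.append(fields[int(index)])
            else:
                raise ValueError("this sketch does not join ranges with other tokens")
        out.append("".join(parts))
    return out


record = ["host1", "2023-12-22", "10:15:00", "info", "GET", "/index", "200"]
print(cut_fields(record, '0, 1"T"2, 4-5, -1'))
```

With the made-up record above, the date and time fields are merged into a single ISO-style timestamp, while the range and the trailing `-1` pass fields through unchanged.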

So far I've found it useful for stuff like:

Parsing the output of common utilities like ls -la and zfs list.
Filtering noisy JSON logs, first by turning them into CSVs and then extracting the interesting columns (JSONPath currently only supports extracting one field at a time).
Sorting CSVs. Basically, performing a Schwartzian transform by placing the sorting column first, then applying 'Sort'.
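The sorting trick in the last item can be sketched outside CyberChef as a decorate, sort, undecorate pass. This is only an illustration of the idea: `sort_by_column` is a hypothetical helper, the naive `split` ignores quoted CSV fields, and the comparison is plain lexical ordering on the decorated line (the same kind of ordering a text sort would apply).

```python
def sort_by_column(lines, key_index, delimiter=","):
    """Sort delimited lines by one column using decorate-sort-undecorate."""
    # Decorate: copy the sort column to the front of each line.
    decorated = [
        delimiter.join([line.split(delimiter)[key_index], line]) for line in lines
    ]
    # Sort: ordinary lexical sort on the decorated lines.
    decorated.sort()
    # Undecorate: drop the copied column again.
    return [d.split(delimiter, 1)[1] for d in decorated]


rows = ["b,2,x", "a,3,y", "c,1,z"]
print(sort_by_column(rows, 1))
```

In CyberChef terms, the decorate and undecorate steps would each be a field-selection step, with 'Sort' in between.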

Considerations

I'm not sure if 'Cut' is the best name for this operation, as it doesn't mirror the original UNIX behavior exactly and also incorporates concepts from awk. Perhaps Carve, Slice, or even Chisel might be good names.

The input/output record delimiters could be removed and replaced with Fork. I've included them because I suspect most users of the operation will want to apply it to several records at once, and to mirror how awk works.
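As a rough picture of what the record and field delimiters do together, the following Python sketch splits the input into records, splits each record into fields, selects fields by index, and joins everything back with separate output delimiters. The function name `recut` and its parameter names are hypothetical and do not reflect the operation's actual arguments; field selection is reduced to a plain list of indices to keep the sketch short.

```python
def recut(text, indices, in_record="\n", in_field=",", out_record="\n", out_field="\t"):
    """Select fields from every record, converting delimiters along the way."""
    # Split into records, skipping empty ones (e.g. after a trailing newline).
    records = [r for r in text.split(in_record) if r]
    out = []
    for record in records:
        fields = record.split(in_field)
        out.append(out_field.join(fields[i] for i in indices))
    return out_record.join(out)


# CSV in, TSV out, keeping the last and first columns of each record.
print(recut("a,b,c\nd,e,f\n", [2, 0]))
```

Without built-in record delimiters, the same effect would need the per-record loop to be supplied externally, which is the role Fork would play.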

Obviously nobody likes embedding new DSLs either, but since this one is mostly based on the existing syntax of awk and cut, I would argue it's similar enough to them to be easy to pick up.

@bee-san bee-san merged commit 73716a2 into bee-san:master Dec 22, 2023
6 checks passed