Other sizes of data (group size and Endianness) #104

Closed
ACleverDisguise opened this issue Oct 23, 2020 · 8 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

ACleverDisguise commented Oct 23, 2020

I frequently have to dump data files (ADC output, for example) that don't just have byte-oriented data. It would be nice to be able to specify data width in the dump so I get the hex data grouped in the natural data size instead of having to do the little-endian two-step and mentally group indistinguishable bytes by 2 or 4 or whatever. Something like:

--word-size=1 (uint8_t, default)
--word-size=2 (uint16_t)
--word-size=4 (uint32_t)
--word-size=8 (uint64_t)
--word-size=16 (uint128_t)

That covers the common-ish types. If you want to be really brave you could do weird crap like 3-byte or 17-byte, but that is likely low return on investment.

Not all such data is little-endian, so an extra flag for those cases where word-size > 1 would be:

--little-endian (default)
--big-endian

Also, interpretation could be signed or unsigned:

--signed
--unsigned (default)

Of course with this you'd drop the byte-oriented colouration (but maybe with --signed you'd highlight negative numbers in red or something).
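
As a rough illustration of the interpretation being requested, the following Python sketch decodes a byte buffer into fixed-width words with a selectable byte order and signedness. The function name `decode_words` and its parameters are hypothetical; they only mirror the proposed `--word-size`, `--little-endian`/`--big-endian` and `--signed`/`--unsigned` flags and are not hexyl code.

```python
def decode_words(data: bytes, word_size: int = 1,
                 byteorder: str = "little", signed: bool = False) -> list[int]:
    """Split data into word_size-byte words and decode each as an integer.

    Hypothetical helper mirroring the proposed flags. Trailing bytes that do
    not fill a whole word are ignored in this sketch.
    """
    if word_size < 1:
        raise ValueError("word_size must be at least 1")
    usable = len(data) - len(data) % word_size
    return [
        int.from_bytes(data[i:i + word_size], byteorder, signed=signed)
        for i in range(0, usable, word_size)
    ]


sample = bytes([0xFE, 0xFF, 0x34, 0x12])
print(decode_words(sample, word_size=2))                   # [65534, 4660]
print(decode_words(sample, word_size=2, signed=True))      # [-2, 4660]
print(decode_words(sample, word_size=2, byteorder="big"))  # [65279, 13330]
```

Because `int.from_bytes` accepts any length, the same sketch also handles the unusual widths mentioned above, such as 3-byte words.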

Owner

sharkdp commented Oct 24, 2020

Thank you for the feedback.

It's not entirely clear to me what the output would look like.

Say I choose --word-size=2 (uint16_t) and the input contains 0xAB 0xCD 0x12 0x34. Would you like to see

CDAB 3412

for --little-endian and

ABCD 1234

for --big-endian?

@ACleverDisguise
Author

That's pretty much exactly what I was picturing, yes.
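
The behaviour agreed on in this exchange can be sketched in a few lines of Python. The helper name `format_groups` is hypothetical: it reverses the bytes of each group for the little-endian display and leaves them in file order for the big-endian display.

```python
def format_groups(data: bytes, word_size: int, little_endian: bool) -> str:
    """Render data as space-separated hex groups of word_size bytes each."""
    groups = []
    for i in range(0, len(data), word_size):
        chunk = data[i:i + word_size]
        if little_endian:
            # In little-endian data the last byte of a word is the most
            # significant one, so it is printed first.
            chunk = chunk[::-1]
        groups.append(chunk.hex().upper())
    return " ".join(groups)


data = bytes([0xAB, 0xCD, 0x12, 0x34])
print(format_groups(data, 2, little_endian=True))   # CDAB 3412
print(format_groups(data, 2, little_endian=False))  # ABCD 1234
```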

Owner

sharkdp commented Oct 31, 2020

This looks similar to xxd's -groupsize option, if I am not mistaken:

       -g bytes | -groupsize bytes
              Separate the output of every <bytes> bytes (two hex characters or  eight
              bit-digits  each)  by  a whitespace.  Specify -g 0 to suppress grouping.
              <Bytes> defaults to 2 in normal mode, 4 in little-endian mode and  1  in
              bits mode.  Grouping does not apply to postscript or include style.

I recently came across this when reading this blog post which makes use of -g to inspect ELF64 executables.

sharkdp added the enhancement and help wanted labels on Oct 31, 2020
Author

ACleverDisguise commented Nov 2, 2020

It is similar to -g and -e in xxd, yes, but I'm not a huge fan of their nomenclature and their rather bizarre default assumptions. (Like the bizarre assumption that "normal" is big-endian, which hasn't been "normal" for decades now.) I can understand, perhaps, that you might want to keep it compatible for easier transition for users, though, so I'm only going to express a mild preference for breaking free from it.

Owner

sharkdp commented Dec 5, 2022

@RinHizakura If you find the time, could you maybe summarize what is and what is not possible with your new option in #170? (released today)

@RinHizakura
Contributor

The new option --group-bytes will provide the functionality to group multiple octets as a unit, which means that several bytes will be shown together without whitespace. It is quite similar to the -groupsize option in xxd; however, the group size can currently only be 1, 2, 4, or 8.

On the other hand, the groups can currently only be shown in big-endian format. A little-endian dump is not supported yet.
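
A minimal Python sketch of the grouping behaviour described here, assuming a 16-byte row. The function name `hex_row` and the `ALLOWED_GROUP_SIZES` constant are hypothetical and not taken from hexyl's source. Bytes stay in file order within each group, which corresponds to the big-endian display.

```python
ALLOWED_GROUP_SIZES = (1, 2, 4, 8)  # the sizes listed in the comment above


def hex_row(row: bytes, group_bytes: int = 1) -> str:
    """Format one row of bytes as hex, with groups separated by one space."""
    if group_bytes not in ALLOWED_GROUP_SIZES:
        raise ValueError(f"group size must be one of {ALLOWED_GROUP_SIZES}")
    return " ".join(
        row[i:i + group_bytes].hex() for i in range(0, len(row), group_bytes)
    )


row = bytes(range(16))
print(hex_row(row, 4))  # 00010203 04050607 08090a0b 0c0d0e0f
print(hex_row(row, 8))  # 0001020304050607 08090a0b0c0d0e0f
```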

Owner

sharkdp commented Dec 7, 2022

> The new option --group-bytes will provide the functionality to group multiple octets as a unit, which means that several bytes will be shown together without whitespace. It is quite similar to the -groupsize option in xxd; however, the group size can currently only be 1, 2, 4, or 8.

I think this limitation is fine for now. 16 would probably be nice, but I understand that it probably interferes with --panels.

> On the other hand, the groups can currently only be shown in big-endian format. A little-endian dump is not supported yet.

Right. I agree with @ACleverDisguise that this would be a really nice feature to have. So let's keep this ticket open for now.

sharkdp changed the title from "Other sizes of data." to "Other sizes of data (group size and Endianness)" on Dec 7, 2022
Owner

sharkdp commented Apr 25, 2023

I think the main functionality requested in this ticket is now supported, with #189 by @RinHizakura now also merged.

sharkdp closed this as completed on Apr 25, 2023