Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add an --x86 flag #20

Open
rjoleary opened this issue Aug 24, 2018 · 7 comments
Open

Add an --x86 flag #20

rjoleary opened this issue Aug 24, 2018 · 7 comments
Assignees

Comments

@rjoleary
Copy link

The GNU xz command has a boolean --x86 flag which allegedly gives 0-15% extra compression when applied to files containing x86 machine code. The Linux kernel uses this flag when compressing its bzImage. It is also popular among UEFI implementations when compressing firmware.

The explanation from the xz man page is:

          A  BCJ filter converts relative addresses in the machine code to
          their absolute counterparts.  This doesn't change  the  size  of
          the  data,  but it increases redundancy, which can help LZMA2 to
          produce 0-15 % smaller .xz file.  The  BCJ  filters  are  always
          reversible, so using a BCJ filter for wrong type of data doesn't
          cause any data loss, although it may make the compression  ratio
          slightly worse.

We have already adapted this feature to Go. Would it be suitable to upstream it with an --x86 flag?

@ulikunitz
Copy link
Owner

I know about the feature, it is an additional filter in the xz format. It increases redundancy in the x86 code. I haven't really looked into it and nowadays it would be more important to implement the feature for AMD64 and ARM.

@ulikunitz ulikunitz self-assigned this Aug 25, 2018
@rjoleary
Copy link
Author

The --x86 option is also effective for amd64 because their instruction sets are similar. A --arm option would be really cool too!

@mpictor
Copy link

mpictor commented Jan 12, 2021

I'd really like to see --x86 added. As @rjoleary noted, this works on both x86 and amd64.

@ulikunitz what would be needed to get this support added?

ARM (and nowadays RISC-V) would also be nice, but I don't think there's an implementation in any language for RISC-V, and I don't think there's one in go for ARM.

@ulikunitz
Copy link
Owner

The infrastructure for filters is in the code. The filter interface in format.go must be implemented. The only implementation is currently lzmaFilter in lzmafilter.go.

What would help would be a write-up, how the filter works with examples. The xz spec says only "These filters convert relative branch, call, and jump instructions to their absolute counterparts in executable files. This conversion increases redundancy and thus compression ratio." Note that we have to maintain full compatibility to the xz tools.

When I read the spec, I wondered whether I would really need to identify executable segments in the COFF and ELF formats, which have 32 and 64 bit variants and how I would track the program counters. It is probably much simpler, but at the time it appeared quite complex to me and I would need to read source code to understand what the semantics are. It takes much less time to implement something from a good specification than to understand semantics from source code, which will be required here to maintain compatibility.

BTW RISC-V is not in the spec, so we would need to do something incompatible that would not be supported by the xz tools.

@mpictor
Copy link

mpictor commented Jan 13, 2021

It takes much less time to implement something from a good specification than to understand semantics from source code, which will be required here to maintain compatibility.

I assume the code linked by @rjoleary is already tested and encodes/decodes identically to the filter in liblzma, but I could be wrong. @rjoleary do you happen to remember what your implementation came from? A translation of liblzma code, a spec, or ???

@rjoleary
Copy link
Author

Fiano's Go implementation is a port of https://github.com/tianocore/edk2/blob/00f5e11913a8706a1733da2b591502d59f848a99/BaseTools/Source/C/LzmaCompress/Sdk/C/Bra86.c which came from liblzma which is in the public domain.

The filter is reversible even when run on arbitrary non-x86 data. The compression benefits are small (<10%), but cheap.

I'm not too sure exactly how it works. You might want to test it more than we did with different inputs and code coverage. It seems to have gone out of fashion in newer UEFI Firmware Volumes I've seen.

@ulikunitz
Copy link
Owner

Thanks for the link. The modification is indeed simpler than I expected, but still will require me to re-engineer semantics.

I'm currently working on a complete rewrite of the Lempel-Ziv compression and use a much simpler/faster LZ sequence generator, so don't expect the feature to be supported soon.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

3 participants