Plan for rewrite branch #50

Open
gingerwizard opened this issue Sep 27, 2022 · 6 comments

Comments

@gingerwizard

Is https://github.com/ulikunitz/xz/tree/rewrite production ready? When do you anticipate this being promoted to main?

Thanks

@ulikunitz
Owner

ulikunitz commented Sep 28, 2022

No, it is not production ready. Here is a list of tasks that still need to be done:

  • write parallel LZMA2 reader
  • make xz use the new lzma package
  • run tests; fix bugs
  • run benchmarks
  • write sequencer that uses a tree based match finder
  • run benchmarks again
  • run fuzzers; fix bugs
  • publish lz module (new code is dependent on this module)
  • publish release candidate; fix bugs
  • publish release

Please note that the new release will not be backward compatible, but it should be faster and will support parallel encoding and decoding. Since I work full time, I cannot provide a timeline, but I will post updates under this issue.

@ulikunitz
Owner

Update: The rewrite branch is now working. Using multiple threads, I have achieved write rates of over 150 MByte/s, but the compression ratio is worse (39% vs. 33%). I have not done any work on the defaults. Streams encoded in parallel this way can also be read with multiple threads, and there I achieve read rates of over 190 MByte/s.
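For readers unfamiliar with parallel encoding, the following is a minimal sketch of one common approach: split the input into chunks, compress each chunk independently in its own goroutine, and concatenate the results in order. It uses the standard library's `compress/gzip` purely as a stand-in, since concatenated gzip members decode as a single stream. The names `compressParallel` and `decompress` are hypothetical, and nothing here is taken from the rewrite branch, which may work differently.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
	"sync"
)

// compressParallel is a hypothetical helper, not part of the xz package.
// It compresses fixed-size chunks of data independently and concatenates
// the resulting gzip members in input order.
func compressParallel(data []byte, chunkSize int) ([]byte, error) {
	if chunkSize <= 0 {
		chunkSize = 1 << 20
	}
	n := (len(data) + chunkSize - 1) / chunkSize
	parts := make([]bytes.Buffer, n)
	errs := make([]error, n)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		start := i * chunkSize
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		wg.Add(1)
		// One goroutine per chunk keeps the sketch short; a real
		// encoder would bound the number of workers.
		go func(i int, chunk []byte) {
			defer wg.Done()
			zw := gzip.NewWriter(&parts[i])
			if _, err := zw.Write(chunk); err != nil {
				errs[i] = err
				return
			}
			errs[i] = zw.Close()
		}(i, data[start:end])
	}
	wg.Wait()

	var out bytes.Buffer
	for i := range parts {
		if errs[i] != nil {
			return nil, errs[i]
		}
		out.Write(parts[i].Bytes())
	}
	return out.Bytes(), nil
}

// decompress reads all concatenated gzip members back into one slice.
func decompress(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}
```

Because each chunk is compressed without access to the data that precedes it, schemes like this generally give up some compression ratio in exchange for throughput. That may be part of the difference reported above, but this is an assumption and not something stated in this thread.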

There are still some bug fixes required. I need to make the xz Reader a ReadCloser so that the threads can be stopped if the stream is not read to the end, but so far it looks promising.
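To illustrate why a multi-threaded reader needs a `Close` method, here is a minimal sketch of an `io.ReadCloser` backed by a background goroutine. The type `parallelReader` and its constructor are hypothetical stand-ins and not the xz package's implementation; the worker simply hands out pre-made chunks instead of decoding anything.

```go
package main

import (
	"context"
	"io"
	"sync"
)

// parallelReader is a hypothetical stand-in for a reader that decodes
// in the background. It is NOT the xz package's implementation.
type parallelReader struct {
	chunks <-chan []byte      // chunks produced by the worker
	cancel context.CancelFunc // tells the worker to stop
	wg     sync.WaitGroup     // tracks the worker goroutine
	buf    []byte             // unread remainder of the current chunk
}

// newParallelReader starts one worker that delivers the given chunks.
func newParallelReader(data [][]byte) *parallelReader {
	ctx, cancel := context.WithCancel(context.Background())
	ch := make(chan []byte)
	r := &parallelReader{chunks: ch, cancel: cancel}
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		defer close(ch)
		for _, c := range data {
			select {
			case ch <- c:
			case <-ctx.Done():
				return // reader was closed early
			}
		}
	}()
	return r
}

// Read returns buffered data first and then waits for the next chunk.
func (r *parallelReader) Read(p []byte) (int, error) {
	for len(r.buf) == 0 {
		c, ok := <-r.chunks
		if !ok {
			return 0, io.EOF
		}
		r.buf = c
	}
	n := copy(p, r.buf)
	r.buf = r.buf[n:]
	return n, nil
}

// Close cancels the worker and waits until it has exited, so no
// goroutine is left blocked when the caller stops reading early.
func (r *parallelReader) Close() error {
	r.cancel()
	r.wg.Wait()
	return nil
}

// Compile-time check that the sketch satisfies io.ReadCloser.
var _ io.ReadCloser = (*parallelReader)(nil)
```

Without `Close`, a caller that abandons the stream after a partial read would leave the worker blocked on its channel send forever. A caller would typically write `defer r.Close()` right after constructing the reader.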

@ulikunitz
Owner

Just an update.

I have done optimization work and found that I have very fast compressors, but they cannot bring the compression rate below 29%, measured on the Silesia corpus. The bt4 match finder mode in xz can achieve compression rates of 23% on the same corpus. So I am currently writing a tree-based match finder to achieve the same results. I have updated the task list above to reflect this activity.
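The percentages in this thread appear to be compressed size divided by original size, so lower is better; that reading is an assumption based on how the numbers are compared. A minimal sketch of measuring such a figure follows. It uses the standard library's `compress/gzip` as a stand-in compressor, and the helpers `ratioPercent` and `gzipRatio` are hypothetical names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// countingWriter discards its input and only counts the bytes written.
type countingWriter struct{ n int64 }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += int64(len(p))
	return len(p), nil
}

// ratioPercent returns the compressed size as a percentage of the
// original size (assumed meaning of "compression rate" in this thread).
func ratioPercent(compressed, original int64) float64 {
	if original == 0 {
		return 0
	}
	return 100 * float64(compressed) / float64(original)
}

// gzipRatio compresses data with gzip (a stand-in, not LZMA) and
// reports the resulting ratio in percent.
func gzipRatio(data []byte) (float64, error) {
	var cw countingWriter
	zw := gzip.NewWriter(&cw)
	if _, err := io.Copy(zw, bytes.NewReader(data)); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return ratioPercent(cw.n, int64(len(data))), nil
}
```

The same counting writer could wrap any compressor that writes to an `io.Writer`, so two encoders can be compared on identical input.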

@ulikunitz
Owner

ulikunitz commented Jun 12, 2023

I now have a very slow parser (ca. 1 MiB/s) that reaches 26% on the Silesia corpus, but the code now supports multithreaded compression and decompression. I have published an alpha release, v0.6.0-alpha.3. The new lz module with the Lempel-Ziv parsers is published as well, so you can actually test it.

@wagoodman

wagoodman commented Sep 16, 2024

It looks like your list is a little outdated -- it appears that you're further along than the list suggests (🎉 ). Which tasks are actually still left? Would you like help with any of them?

@ulikunitz
Owner

Sorry, there has been a lot of work in my day job. There is a v0.6.0-alpha.3 release you can experiment with; it supports the parallel modes. I would be interested in feedback on it. Compression rates are still 2% behind the original xz, but encoding is much faster, especially when using the parallel modes.
