Import/loading takes a long time. How to speed up loading? #49

Open
Wikinaut opened this issue Mar 21, 2023 · 8 comments

Comments

@Wikinaut

Wikinaut commented Mar 21, 2023

I use pyphen for my Raspberry Pi Zero-powered internet radio (https://github.com/Wikinaut/pinetradio).

Import of pyphen started.
pyphen imported, loading of de_DE took 43.77 seconds on Raspberry Pi Zero

Loading always takes a very long time. Is there a way to decrease the loading time?
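The figure above appears to cover both the import of pyphen and the construction of the de_DE dictionary. A minimal standard-library sketch for timing the two phases separately; `time_import_and_load` is a hypothetical helper, not part of pyphen:

```python
import importlib
import time


def time_import_and_load(module_name, load):
    """Import `module_name`, then call `load(module)`, timing each phase.

    Returns (module, loaded_object, import_seconds, load_seconds).
    Run it in a fresh interpreter: a module already in sys.modules
    is returned immediately, so its import time would read as ~0.
    """
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    imported = time.perf_counter()
    obj = load(module)  # e.g. build the hyphenation dictionary
    loaded = time.perf_counter()
    return module, obj, imported - start, loaded - imported
```

For this issue it would be called as `time_import_and_load("pyphen", lambda m: m.Pyphen(lang="de_DE"))`, which reports the import time and the de_DE loading time as two numbers instead of one.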

@liZe
Member

liZe commented Mar 22, 2023

I use pyphen for my Raspberry Pi Zero-powered internet radio

Cool!

pyphen imported, loading of de_DE took 43.77 seconds on Raspberry Pi Zero

That’s a lot. Even if the Raspberry Pi’s CPU is slow, it shouldn’t take so much time. Profiling on my computer doesn’t give interesting results; could you please provide profiling information from your Raspberry Pi? You can get profiling information by launching python -c "import pyphen; pyphen.Pyphen(lang='de_DE')" -m cProfile -o /tmp/cprofile, and you can send the /tmp/cprofile file here.

(I hope I’ll be able to read it even if it’s not the same platform; otherwise I’ll ask you to launch an additional command!)
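The one-liner above does not profile anything as written: with `python -c`, every argument after the command string is passed to that code in `sys.argv`, so `-m cProfile -o /tmp/cprofile` is never seen by Python itself. Putting the two statements in a file and running `python -m cProfile -o /tmp/cprofile script.py` works (the reporter did something similar). The same file can also be produced from inside Python with the standard `cProfile` and `pstats` modules; `profile_to_file` below is a hypothetical helper sketching that:

```python
import cProfile
import io
import pstats


def profile_to_file(statement, output="/tmp/cprofile", top=15):
    """Run `statement` under cProfile, save the raw stats to `output`
    (the binary format pstats reads), and return a text report of the
    `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    # Execute with fresh namespaces so an `import` inside the statement is
    # profiled too (as long as the module is not already in sys.modules).
    profiler.runctx(statement, {}, {})
    profiler.dump_stats(output)

    report = io.StringIO()
    pstats.Stats(output, stream=report).sort_stats("cumulative").print_stats(top)
    return report.getvalue()
```

Calling `print(profile_to_file("import pyphen; pyphen.Pyphen(lang='de_DE')"))` prints a short summary and leaves the full profile in /tmp/cprofile, which is the file requested above.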

@Wikinaut
Author

Done. The command did not work, but I put the commands into a file and ran that. Here is the full output:

(available until March 2024)
https://dpaste.com/7RA2RN2ES.txt

@Wikinaut
Author

Wikinaut commented Mar 22, 2023

Here is one example of the usage (the purpose of hyphenation is to allow the maximum font size on the tiny display of https://github.com/Wikinaut/pinetradio). The first and third hyphens came from the hyphenation. Currently, I use at most one automatic hyphenation per word.
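For the at-most-one-hyphen case described here, pyphen's `Pyphen.wrap(word, width)` returns the longest first part (hyphen included) that fits in `width` characters together with the remainder, or `None` when there is no suitable break point. A minimal sketch built on that; `split_for_display` is a hypothetical helper, and counting characters is only a stand-in for measuring pixel width on the real display:

```python
def split_for_display(word, max_chars, dic):
    """Split `word` with at most one hyphen so that the first line fits
    in `max_chars` characters.

    `dic` is expected to be a pyphen.Pyphen instance, or any object with
    a compatible wrap(word, width) method returning
    (first_part_with_hyphen, rest) or None.
    """
    if len(word) <= max_chars:
        return [word]  # fits as-is, no hyphen needed
    parts = dic.wrap(word, max_chars)
    if parts is None:
        # No usable break point: return the word whole and let the
        # caller fall back to a smaller font.
        return [word]
    first, rest = parts
    return [first, rest]
```

With `dic = pyphen.Pyphen(lang="de_DE")`, a call such as `split_for_display("Internetradio", 9, dic)` yields either the whole word or a hyphenated first line plus the rest, depending on the break points the dictionary allows.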
[image]

@liZe
Member

liZe commented Mar 22, 2023

You may get slightly better results with version 0.14.0, as it may be a bit faster if your storage is slow (and it probably is). That could help with the 17 seconds spent mainly listing dictionaries, and the 30 seconds spent in the __init__ function when a dictionary is parsed.

But apart from this change, your time distribution is almost the same as mine. There may be optimizations to find, but nothing is obvious to me right now.

@Wikinaut
Author

Wikinaut commented Mar 22, 2023

0.14.0 is not much better:

Import of pyphen started.
pyphen imported, loading of de_DE took 39.62 seconds on Raspberry Pi Zero

new profile:

https://dpaste.com/CG5F52TKZ.txt

@liZe
Member

liZe commented Mar 23, 2023

0.14.0 is not much better

A 10% improvement is good news; that’s what I was hoping for, but it’s not enough.

new profile:

It looks like we saved some time listing dictionaries: importing the module seems to be much faster.

For ~50s, there’s:

It should be possible to save some time, but there’s nothing obvious from what I see there :/.

@Wikinaut
Author

Perhaps offline preprocessing of the dictionary in use? Could this help?

@liZe
Member

liZe commented Mar 25, 2023

Perhaps offline preprocessing of the dictionary in use? Could this help?

I’ve tried loading a JSON file generated from the dictionary on my laptop, and it’s ~5 times faster than loading the Hunspell dictionary. With Pickle, it’s ~10 times faster. The benefits would probably be higher on slower systems.

We could consider including these pre-processed dictionaries. Pickle and JSON are probably not the best solutions (for different reasons), good ideas are welcome 😁.
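A minimal sketch of the offline-preprocessing idea, under stated assumptions: `parse_pattern`, `parse_patterns` and `load_cached` are hypothetical helpers, not pyphen API, and the pattern dict they build is not pyphen’s internal representation. The parser handles only the basic Liang pattern syntax of Hunspell/Hyphen dictionaries (a charset line, then one pattern such as `.ab1a` or `1ba` per line) and simply skips comments, keyword lines and non-standard entries containing `/`. The parsed result is pickled and reused as long as the cache is newer than the source file:

```python
import os
import pickle


def parse_pattern(pattern):
    """Split a Liang pattern such as '.ab1a' into its letters and the
    digit around each letter: ('.aba', (0, 0, 0, 1, 0))."""
    letters, points = [], [0]
    for ch in pattern:
        if ch.isdigit():
            points[-1] = int(ch)
        else:
            letters.append(ch)
            points.append(0)
    return "".join(letters), tuple(points)


def parse_patterns(path):
    """Simplified parser: the first line names the charset, then one
    pattern per line. Comments ('%'), uppercase keyword lines and
    non-standard entries containing '/' are skipped, not handled."""
    with open(path, "rb") as f:
        raw = f.read()
    first, _, body = raw.partition(b"\n")
    encoding = first.decode("ascii").strip() or "utf-8"
    patterns = {}
    for line in body.decode(encoding).splitlines():
        line = line.strip()
        if not line or line.startswith("%") or "/" in line:
            continue
        token = line.split()[0]
        if token.isupper():
            continue  # keyword line such as "LEFTHYPHENMIN 2"
        letters, points = parse_pattern(token)
        patterns[letters] = points
    return patterns


def load_cached(source, parse, cache_path):
    """Return parse(source), reusing the pickle at `cache_path` when it
    is at least as new as `source`; otherwise parse once and cache."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = parse(source)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

The expensive text parsing then happens once (for example at install time or on first run), and every later start only unpickles the ready-made dict. Pickle was chosen here only because it was the fastest option measured in the comment above; it shares the drawbacks mentioned there, such as being Python-specific and unsafe to load from untrusted sources.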
