Really bad performance #63
Comments
Hi, if you are concerned about performance, you may want to consider the LibXML binding: https://modules.raku.org/dist/LibXML:cpan:WARRINGD
I understand that, but taking 3m30s and 5 GiB just to load a 50 MByte file is not OK.
Sure. Another line of attack would be to profile the performance of the Grammar on its own (that is, without the actions that build the object tree) and see if any improvements can be made there. There are probably some micro-optimisations that could be made within the code, but you probably wouldn't want to start on those until profiling has revealed where they would be of benefit.
It's good to see it's not just me. I wanted to use XML::Entity::HTML, which depends on this module. It turns out that of the 160 seconds spent rendering a 1 MB HTML file on 4 cores, 140 went straight to escaping tags, even though the HTML in question barely had any tags to escape in the first place!
This happens because of the way this module was structured. On the one hand, it's a great example of some very cool Raku features but… they're also ones that haven't been very well optimized. Lots of Won't help with memory, but should help with speed: most classes will use method-call syntax for attributes ( This can almost certainly be optimized by a LOT, but I'm not sure if it can be done while maintaining 100% backwards compatibility. Guess I'll give it a try.
It's certainly mostly my fault the XML module is slow as molasses in January. When I first worked on this in 2010, I wanted to try using all of the cool language features that had drawn me to Raku in the first place, and put more emphasis on that than on performance. Subsequent updates only focused on trying even more cool new features. I'd planned on eventually writing an add-on extension using LibXML2 bindings (I see there is at least one module doing that now) but using the simpler API this module provides. All of the amazing developers who have worked on this since I abandoned my Raku libraries a decade ago have improved it substantially, and they are all saints for working on the convoluted codebase I left behind.
@supernovus don't worry - it's really much better that you have left a lot of stuff to work with/on than just silently abandoning it. Also, the code isn't all that bad really... when I started looking into it, what struck me was the "builder pattern" everywhere. I thought that would be an immediate and straightforward place for improvements - but then @alabamenhu started actually making changes and reported that there aren't really easy gains with an eager system - in which case I also wouldn't say it's really your fault. @alabamenhu are you planning to adopt this module, by the way? Not pressuring you in either direction, just curious. And if you don't, maybe your changes could be merged back into this repo, perhaps with a new version published.
Not sure if I'll adopt it per se, but I'll see what I can do to work with it. One important thing to consider here: this is a pure Raku module, and that has major value even if a wrapper for LibXML would be faster. There's no guarantee that LibXML will be available on any given system, so a fully vanilla Raku module is a good thing. One thing that MIGHT be faster potentially is what I did for parsing number format strings in
Hi!
I'm trying to parse a 52 MByte XML file and the performance is really bad.
I'm trying to follow the instructions and just doing:
This code uses more than 5 GBytes of memory [1], only one core is used [2], and it takes more than 3m30s (in comparison, a Perl version takes around 15 seconds to parse the file).
[1] - Reported by:
cat /proc/$PID/smaps | grep -i pss | awk '{Total+=$2} END {print Total/1024" MB"}'
[2] - ![htop image](https://camo.githubusercontent.com/e89f6c5e4e0090ff574d63883a1b896d7d943a45fb6d9589f1bc1bbefd86bf6f/68747470733a2f2f692e696d6775722e636f6d2f775736475452642e706e67)
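For anyone wanting to reproduce the memory figure in [1], here is a minimal sketch of the same measurement as a reusable shell function. The helper name `pss_total_mb` is hypothetical and not part of any tool mentioned in this thread. It assumes the usual Linux `smaps` layout, where each `Pss:` line carries a value in kB. Matching only lines that begin with `Pss:` is a deliberate difference from the one-liner above: a case-insensitive grep for `pss` also picks up related fields such as `SwapPss:`, which may inflate the total.

```shell
# Sum the proportional set size (Pss) from smaps-formatted input on stdin.
# Assumption: values are reported in kB, as in /proc/<pid>/smaps on Linux.
# Only lines starting with "Pss:" are counted, so "SwapPss:" and "Rss:"
# lines are ignored.
pss_total_mb() {
    awk '/^Pss:/ { total += $2 } END { printf "%.1f MB\n", total / 1024 }'
}

# Usage (PID is a placeholder for the process being measured):
# pss_total_mb < /proc/$PID/smaps
```

Reading from stdin keeps the function usable on a saved copy of `smaps` as well as on a live process.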