Performance improvements when encoding very large hashes with symbol keys #163
I'm currently working on a project which involves processing a huge hash (a couple of gigabytes in memory) and occasionally dumping it as JSON (about 1 GB of output). (The project is MatmaRex/commons-media-views; please don't ask why I didn't do something saner, it seemed like a good idea at the time and now it's an interesting mental exercise.) I switched from the built-in JSON parser/encoder to YAJL for streamed encoding, but the performance wasn't quite as good as I expected. I did some digging, and here are the results.
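For context, the streamed encoding I switched to looks roughly like this (a minimal sketch with placeholder file and key names; the real data is of course far larger):

```ruby
require 'yajl'

# Stand-in for the multi-gigabyte hash with symbol keys.
data = { title: 'example', views: 12_345 }

File.open('dump.json', 'w') do |file|
  # Yajl::Encoder writes straight into the IO object, so the full JSON
  # string never has to be materialized in memory at once.
  Yajl::Encoder.encode(data, file)
end
```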
This set of patches should improve encoding performance across the board, particularly when encoding hashes, especially ones with symbol keys, and especially when they're really large. Most of the gain appears to come from a reduced number of object allocations, and thus fewer GC pauses while encoding. The only potential drawback is that some monkey-patched methods on built-in classes that were previously respected will no longer be (see the sketch below); I don't think that's something you're aiming to support.
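To illustrate what I mean by that drawback (a hypothetical example; exactly which methods stop being called depends on the individual patches): a global override like the one below could previously leak into the output when key conversion went through a Ruby-level `to_s` call, but not once the symbol's name is read directly in C:

```ruby
require 'yajl'

class Symbol
  # Hypothetical monkey-patch: upcase every symbol when converted to a string.
  alias_method :plain_to_s, :to_s
  def to_s
    plain_to_s.upcase
  end
end

# An encoder that converts keys via a Ruby-level to_s call would emit
# {"FOO":1}; one that reads the symbol name directly in C emits {"foo":1}.
puts Yajl::Encoder.encode(foo: 1)
```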
Testing with this large file: https://dl.dropboxusercontent.com/u/10983006/tmp/big.json (~110 MB), parsed with `Yajl::Parser.new(symbolize_keys: true)`, I get a 2x performance improvement when encoding the parsed data back into JSON.
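Roughly how such a measurement can be reproduced (a sketch, assuming big.json has been downloaded to the working directory; the exact harness I used may differ):

```ruby
require 'yajl'
require 'benchmark'

# Parse the ~110 MB test file with symbol keys, mirroring the setup above.
data = File.open('big.json', 'r') do |file|
  Yajl::Parser.new(symbolize_keys: true).parse(file)
end

# Time only the encoding step; output is discarded so disk speed doesn't
# dominate the measurement.
elapsed = Benchmark.realtime do
  File.open(File::NULL, 'w') { |out| Yajl::Encoder.encode(data, out) }
end
puts "encoded in #{elapsed.round(2)}s"
```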