Use Int256 to reduce BigInts in FD operations. #93

Merged (5 commits) · Aug 12, 2024
Conversation

@NHDaly (Member) commented Jun 12, 2024

We do not explicitly introduce support for FD{BitIntegers.Int256} here, though that should work out of the box both before and after this PR.

Rather, this PR uses a (U)Int256 under the hood to prevent allocations from Int128 widening to BigInt in FD operations.
Unfortunately, rem and mod on BitIntegers.Int256 still fall back to BigInt (see the note here), so this doesn't completely eliminate the BigInt allocations, but it does reduce them.
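The allocations come from Base's widening rules. A minimal check (plain standard-library Julia, nothing package-specific) shows that Int128 widens to BigInt, so every widemul in the Int128 multiply path heap-allocates:

```julia
# Base widens Int64 to Int128 (still a bits type, allocation-free),
# but widens Int128 to BigInt (heap-allocated):
widen(Int64)    # Int128
widen(Int128)   # BigInt

# widemul(x, y) is widen(x) * widen(y), so multiplying two Int128s
# produces a BigInt, and each such call allocates:
typeof(widemul(Int128(2), Int128(3)))   # BigInt
```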


This is a pretty small PR, but it should have a big impact on users of FD{Int128}.

Before:

```julia
julia> @btime fd * fd setup = (fd = FixedDecimal{Int128,3}(1.234))
  392.413 ns (24 allocations: 464 bytes)
FixedDecimal{Int128,3}(1.523)
```

After:

```julia
julia> @btime fd * fd setup = (fd = FixedDecimal{Int128,3}(1.234))
  213.039 ns (12 allocations: 240 bytes)
FixedDecimal{Int128,3}(1.523)
```
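To illustrate the multiply path being optimized, here is a hedged sketch one word-size down (Int64 operands, Int128 intermediate) so it runs with only the standard library; the PR does the analogous thing for Int128 operands with a BitIntegers.(U)Int256 intermediate. `mul_fd` is an illustrative name, not the FixedPointDecimals API, and the rounding here is round-half-up rather than the package's round-half-even:

```julia
# Multiply two fixed-decimal values stored as scaled integers with
# `f` decimal places. The key point: the intermediate product lives
# in a wider *bits* type (Int128 here; (U)Int256 in the PR), so no
# BigInt is ever allocated.
function mul_fd(x::Int64, y::Int64, f::Int)
    c = Int128(10)^f             # scale coefficient 10^f
    w = widemul(x, y)            # Int64 * Int64 -> Int128, allocation-free
    q, r = fldmod(w, c)          # divide out one factor of the scale
    return Int64(q + (2r >= c))  # round to nearest (ties toward +Inf)
end

# 1.234 * 1.234 = 1.522756, which rounds to 1.523 at 3 places:
mul_fd(1234, 1234, 3)  # -> 1523
```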

@NHDaly NHDaly requested a review from Drvi June 12, 2024 19:27
@NHDaly (Member, Author) commented Jun 12, 2024

I just realized this reimplements RelationalAI-oss#7, from 6 years ago (😳), which @TotalVerb had already reviewed. @TotalVerb, you may want to do one more pass over this.

NHDaly added a commit that referenced this pull request Jun 13, 2024
Finally implements the fast-multiplication optimization from
#45, but this
time for 128-bit FixedDecimals! :)

This is a follow-up to
#93, which
introduces an Int256 type for widemul. However, the fldmod still
required 2 BigInt allocations.

Now, this PR uses a custom implementation of the LLVM div-by-const
optimization for (U)Int256, which briefly widens to Int512 (😅) to
perform the fldmod by the constant 10^f coefficient.

This brings 128-bit FD multiply to the same performance as 64-bit. :)
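The div-by-const trick in that follow-up commit can be sketched as follows, again one word-size down (UInt64 dividend, UInt128 intermediate) so it runs with the standard library alone; the actual commit applies the idea to (U)Int256 with an Int512 intermediate, and `div_by_const` is an illustrative name, not the package API:

```julia
# Replace a slow division by a known constant d >= 2 with a widening
# multiply by a precomputed reciprocal, plus at most one correction.
# With m = floor(2^64 / d), the estimate (x * m) >> 64 is either exact
# or low by exactly 1, so one conditional fix-up recovers the full
# quotient and remainder.
function div_by_const(x::UInt64, d::UInt64)
    m = UInt64(div(UInt128(2)^64, UInt128(d)))  # precompute once per d
    q = UInt64(widemul(x, m) >> 64)             # estimate of x ÷ d
    r = x - q * d
    if r >= d                                   # estimate was low by 1
        q += one(UInt64)
        r -= d
    end
    return q, r
end

# Dividing the widened product 1_522_756 by the scale 10^3:
div_by_const(UInt64(1522756), UInt64(1000))  # (1522, 756)
```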
@Drvi (Collaborator) left a comment:

LGTM! Thanks

@NHDaly NHDaly merged commit e0c1932 into master Aug 12, 2024
12 checks passed
@NHDaly NHDaly deleted the nhd-overflow-Int128 branch August 12, 2024 17:49