Future-proofing 24/53bit precision for f32/f64 generation #416
@sbarral thank you for opening the issue. It is a bit embarrassing to say, but I/we never noticed that a multiply-based approach can produce floating-point values with one more bit of precision than the transmute-based approach. And that with all the discussion there has been around the conversion step... As you may know, we had a problem of an exploding number of options for converting to a 0..1 range. With the transmute approach, [0, 1), (0, 1) and (0, 1] have the same performance. With all things being equal, picking an open range as default made the most sense. Options we have/had until now:
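A minimal sketch of the two u32 -> f32 approaches being compared (the names and exact formulations are mine, not rand's actual code):

```rust
// Transmute-based: fix the exponent bits for the range [1, 2), fill the
// 23-bit mantissa with random bits, then subtract 1.0.
// Result: [0, 1) with 23 bits of precision.
fn transmute23(bits: u32) -> f32 {
    f32::from_bits(0x3F80_0000 | (bits >> 9)) - 1.0
}

// Multiply-based: scale a 24-bit integer by 2^-24.
// Result: [0, 1) with 24 bits of precision — one bit more.
fn multiply24(bits: u32) -> f32 {
    (bits >> 8) as f32 * (1.0 / (1u32 << 24) as f32)
}
```

The smallest nonzero output of `transmute23` is 2^-23, while `multiply24` can produce 2^-24: that is the extra bit of precision in question.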
I suppose the extra bit of precision that the multiply method offers is an extra argument for bringing something like that back. I remember from my own benchmarks that the transmute method (with bitmasks and floating-point subtraction) was faster than the multiplication method (integer-to-float conversion and float multiplication). But I don't have things around to confirm at the moment; maybe it is also just CPU-dependent. @dhardy, @tspiteri, what do you think of eventually having this selection:
Just to be clear though, I didn't really mean to push for an immediate move away from the current implementation. So the question for me is: assuming that the computational costs are comparable, should we prefer 52-bit (0, 1) sampling or 53-bit [0, 1) sampling? To me the latter seems like "the right thing to do", because 53 bits is what a user familiar with IEEE (but unfamiliar with our implementation details) would expect. Regarding naming, I think the naming should suggest the accuracy/complexity trade-off.
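To illustrate why the open interval is the odd one out at full precision, here is a sketch of mine (not rand's API) of a 53-bit (0, 1) sampler; unlike [0, 1), it needs a rejection step:

```rust
// Sampling (0, 1) at full 53-bit precision: the zero outcome must be
// rejected, since [0, 1) at 53-bit resolution contains 0.0 but (0, 1)
// must not.
fn open01_53bit(mut next_u64: impl FnMut() -> u64) -> f64 {
    loop {
        let bits = next_u64() >> 11; // keep 53 random bits
        if bits != 0 {
            return bits as f64 * (1.0 / (1u64 << 53) as f64); // in (0, 1)
        }
        // bits == 0 would map to 0.0, outside (0, 1): reject and retry.
    }
}
```

The rejection itself fires only with probability 2^-53, but the extra branch sits in the hot path of every sample, which is presumably the cost being weighed here.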
@vks already advocated not supporting any FP sampling in `Standard`.
I suggest
Seconded, this looks unambiguous to me.
We are making quite a lot of breaking changes these past couple of weeks. I hope that with the 0.5 release we can make a couple of things stable, and say that they will need very good reasons that have not been brought up before to be up for discussion again.
Yes, definitely the choice should not be based solely on performance. But it is not something to ignore either. For example, we chose not to go with the high-precision conversion because of performance.
I think only the high-precision variant makes sense to someone who doesn't know the details. If you know some of the details and are familiar with IEEE, both 52-bit and 53-bit seem reasonable; it is the difference the implicit bit makes. So I still think the practical concerns matter most: the guarantee that some value will not be returned is more interesting than the possibility that it might be generated.
If we have several distributions like these, we could try adding several convenience functions to `Rng`.
Or we could potentially add a function like this (which would normally be inlined), though it would still be cumbersome to specify all parameters:

```rust
fn gen01<T>(&mut self, ty: RangeType, precision: Precision) -> T {
    match (ty, precision) {
        (RangeType::ClosedOpen, Precision::Standard) => self.sample(Range::new(0.0, 1.0)),
        (RangeType::ClosedOpen, Precision::Full) => self.sample(HighPrecision01),
        // ...
    }
}
```

@pitdicker Whether or not
Why not make an enum for Open/Closed, one for the various precision types, and use those to generate all the options?
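A hypothetical sketch of that two-enum idea (all names here are mine); the product of the two enums spans every 0..1 variant under discussion:

```rust
#[derive(Clone, Copy, Debug)]
enum RangeType { ClosedOpen, OpenClosed, Open, Closed }

#[derive(Clone, Copy, Debug)]
enum Precision { Standard, High }

// Enumerate the full option matrix: 4 range types x 2 precisions = 8.
fn all_variants() -> Vec<(RangeType, Precision)> {
    let mut v = Vec::new();
    for &r in &[RangeType::ClosedOpen, RangeType::OpenClosed, RangeType::Open, RangeType::Closed] {
        for &p in &[Precision::Standard, Precision::High] {
            v.push((r, p));
        }
    }
    v
}
```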
Hmm, do we want to implement all 8 variants though? I suppose we can try. There's also the question of how this relates to the `Range` distribution.
Since there's some pressure to get a 0.5 release out soon, we could also revert to the old behaviour here (sampling from [0, 1)).
I can imagine quite a few more variants than 8 😄. But we really should make choices. Offering 8 ways to do basically the same thing is going to make Rand hard to use, without bringing many advantages.

We also make choices when it comes to which RNGs to pick, with which constants, sizes, etc. And we make choices about which algorithms to use for certain distributions. We just have to make a choice that is somewhat sane... I think I can agree to make [0, 1) the default again. In a way it is the most 'natural': the RNG produces values in powers of two, and an open distribution needs an extra offset to exclude the bound.

What do you think of this selection:
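For reference, a sketch (my own formulation, not rand's exact code) of how the three ranges fall out of the same transmute trick at essentially identical cost, differing only in the constant subtracted:

```rust
const ONE_BITS: u64 = 0x3FF0_0000_0000_0000; // bit pattern of 1.0f64

// All three build a float in [1, 2) from 52 random mantissa bits, then
// shift it into the target range with a single subtraction.
fn closed_open01(bits: u64) -> f64 { // [0, 1), steps of 2^-52
    f64::from_bits(ONE_BITS | (bits >> 12)) - 1.0
}

fn open_closed01(bits: u64) -> f64 { // (0, 1]
    2.0 - f64::from_bits(ONE_BITS | (bits >> 12))
}

fn open01(bits: u64) -> f64 { // (0, 1), via a half-ulp offset
    f64::from_bits(ONE_BITS | (bits >> 12)) - (1.0 - f64::EPSILON / 2.0)
}
```

All three subtractions are exact (Sterbenz's lemma applies), so the only per-sample cost difference is the constant used.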
Just wondering: wouldn't it be possible, whichever the choice of default, to keep the other variants available as separate distributions? My thinking is that it would somewhat release the pressure to keep behavioural backward compatibility for the default.
Totally possible; I did something similar before.
Regarding the eight combinations, it'd only be six because
The main issue (change back to
I know this is a contentious issue, sorry for bringing it up again... Like others, I feel a bit uneasy with the choice to promote a `Standard` distribution that samples within the open (0, 1) interval. As a matter of fact, I am a bit wary of promoting any "one true way" to produce FP values.

Now I do agree that a fully open interval is adequate in most applications. OTOH, I have never seen a practical situation where a half-open interval would not be adequate too: it is often important to avoid a singularity at one of the bounds, typically with `log(u)`-type transforms, but I suspect that the need to avoid both 0 and 1 is very rare.

The main reason I dislike the (0, 1) open interval as the default choice, though, is that it implicitly bakes into the API a deficiency of the current `transmute`-based FP generation algorithm, namely the truncation to 23 bits (resp. 52 bits for f64) of precision, instead of the 24/53 bits that one may expect based on the FP significand precision.

The problem is that, unlike with the half-open [0, 1) or (0, 1] intervals, the generation of FP values in the open (0, 1) interval becomes AFAIK prohibitively expensive at 24/53 bits due to the need for a rejection step. So in a way, the current `Standard` distribution would become a commitment to an implementation detail of the current FP generation method.

For this reason, if it is really deemed important to promote a default FP generation distribution rather than just define e.g. `Open01`, `OpenClosed01` and `ClosedOpen01`, I would then favor the widely adopted convention of sampling within [0, 1) because (i) it is also adequate for most situations and (ii) it leaves open the possibility (today, or in the future) to efficiently generate FP values with 24/53-bit precision.

Regarding point (ii) above, I made some preliminary investigations to assess the computational advantage of the current 52-bit `transmute`-based method over two 53-bit FP generation methods. The following `u64 -> f64` conversion methods were benchmarked: `transmute52`, `direct53` and `transmute53`. The benchmark performs a simple sum of a large number of FP values produced by one of these conversion functions, fed by the 64-bit output of `XorShiftRng`.

As always, the benchmark needs to be taken with a big grain of salt, especially since the methods are normally inlined, so a lot depends on the actual code context. In order to assess robustness towards inlining, the benchmark was run a second time with the functions marked `#[inline(never)]`. Also, I did not try any CPU other than mine (i5-7200). With this caveat, here are the computation times:

| Method | `#[inline(always)]` | `#[inline(never)]` |
|---|---|---|
| `XorShiftRng` + `transmute52` | | |
| `XorShiftRng` + `direct53` | | |
| `XorShiftRng` + `transmute53` | | |
Two surprises: `transmute53` was very slightly but consistently faster, which I surmise is because `c` has a 50% chance of being 0.0, so `u - c` can evaluate fast.

In any case, these results seem to strongly question the purported advantage of the 52-bit `transmute`-based method, at least for modern CPUs. And even for old CPUs, I would expect the `transmute53` version to be reasonably close to `transmute52`.