-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Improve calc_fclass() with binary decision tree #207
Conversation
src/softfloat.h
Outdated
out |= (expn == FMASK_EXPN && (frac & FMASK_QNAN)) ? 0x200 : 0; | ||
|
||
// Check if it is special value -INF | ||
out |= (f == 0xff800000) ? 0x001 : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-INF
could be integrated into the if-else statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote the decision tree as follow, and gained more speedup:
/* Check the exponent bits */
if (expn) {
if (expn != FMASK_EXPN) {
/* Check if it is negative normal or positive normal */
out = sign ? 0x002 : 0x040;
} else {
/* Check if it is NaN */
if (frac) {
out = frac & FMASK_QNAN ? 0x200 : 0x100;
} else if (!sign) {
/* Check if it is +INF */
out = 0x080;
} else {
/* Check if it is -INF */
out = 0x001;
}
}
} else if (frac) {
/* Check if it is negative or positive subnormal */
out = sign ? 0x004 : 0x020;
} else{
/* Check if it is +0 or -0 */
out = sign ? 0x008 : 0x010;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@2011eric
Thanks for your review and suggestions.
In my tests, the current version's execution efficiency is now approximately 69% faster than the original version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you able to run arch-test with DEVICE=F?
Just to make sure this implementation is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I attempted to execute the command make arch-test RISCV_DEVICE=FZicsr
as your request, I consistently encountered the following error message, regardless of using the new function or the original one:
inferior exit code 0
inferior exit code 0
ERROR | Segmentation fault (core dumped)
I can successfully execute other tests, such as RISCV_DEVICE=IMC
, without encountering any errors. I'm not sure about the cause of the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I used a loop to iterate through the entire range of uint32_t values as input and compared the output differences between the original function and the new function. The experimental results showed that for all inputs, both functions would return the same value.
Here is the code used for testing:
uint32_t i = 0;
while(1) {
if(calc_fclass(i) != calc_fclass_new(i)) {
fprintf(stderr, " Error: %x %x %x", i, calc_fclass(i), calc_fclass_new(i));
return 1;
}
if(i == 0xffffffff) {
break;
}
i++;
}
printf(" All tests passed.\n");
return 0;
} else { | ||
/* Check if it is NaN */ | ||
if (frac) { | ||
out = frac & FMASK_QNAN ? 0x200 : 0x100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to change this line:
out = FMASK_QNAN ? 0x200 : 0x100;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the current implementation is correct. Changing line 61 to out = FMASK_QNAN ? 0x200 : 0x100;
would cause different behavior for certain inputs compared to the original function. For example, when the input is 0x7f800001, the original function would return 0x100, but applying this modification would make it return 0x200.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right.
Seems that the current implementation is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Squash git commits and refine commit messages.
Improved the calc_fclass function by changing it into a binary decision tree. This optimization significantly improves the performance of the function, resulting in a speedup of approximately 69% in my test. You can review the test and its results in the following link: https://github.com/visitorckw/rv32emu-calc_fclass-optimization/
Thank @visitorckw for contributing! |
Improve calc_fclass() with binary decision tree
I have optimized the calc_fclass function by changing it into a binary decision tree. This optimization significantly improves the performance of the function, resulting in a speedup of approximately 69% in my test. The test I wrote to evaluate the performance: https://github.com/visitorckw/rv32emu-calc_fclass-optimization/