Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Improve calc_fclass() with binary decision tree #207

Merged
merged 1 commit into from
Sep 5, 2023

Conversation

visitorckw
Copy link
Collaborator

@visitorckw visitorckw commented Sep 4, 2023

I have optimized the calc_fclass function by changing it into a binary decision tree. This optimization significantly improves the performance of the function, resulting in a speedup of approximately 69% in my test. The test I wrote to evaluate the performance: https://github.com/visitorckw/rv32emu-calc_fclass-optimization/

@jserv jserv requested a review from 2011eric September 4, 2023 02:02
src/softfloat.h Outdated Show resolved Hide resolved
@jserv jserv changed the title Optimize calc_fclass() for improved performance Improve calc_fclass() with binary decision tree Sep 4, 2023
src/softfloat.h Outdated
out |= (expn == FMASK_EXPN && (frac & FMASK_QNAN)) ? 0x200 : 0;

// Check if it is special value -INF
out |= (f == 0xff800000) ? 0x001 : 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-INF could be integrated into the if-else statement

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote the decision tree as follow, and gained more speedup:

/* Check the exponent bits */
    if (expn) {
        if (expn != FMASK_EXPN) {
            /* Check if it is negative normal or positive normal */
            out = sign ? 0x002 : 0x040;
        } else {
            /* Check if it is NaN */
            if (frac) {
                out = frac & FMASK_QNAN ? 0x200 : 0x100;
            } else if (!sign) {
                /* Check if it is +INF */
                out = 0x080;
            } else {
                /* Check if it is -INF */
                out = 0x001;
            }
        }
    } else if (frac) {
        /* Check if it is negative or positive subnormal */
        out = sign ? 0x004 : 0x020;
    } else{
        /* Check if it is +0 or -0 */
        out = sign ? 0x008 : 0x010;
    }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@2011eric
Thanks for your review and suggestions.
In my tests, the current version's execution efficiency is now approximately 69% faster than the original version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to run arch-test with DEVICE=F?
Just to make sure this implementation is correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I attempted to execute the command make arch-test RISCV_DEVICE=FZicsr as your request, I consistently encountered the following error message, regardless of using the new function or the original one:

inferior exit code 0
inferior exit code 0  
ERROR | Segmentation fault (core dumped)

I can successfully execute other tests, such as RISCV_DEVICE=IMC, without encountering any errors. I'm not sure about the cause of the error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I used a loop to iterate through the entire range of uint32_t values as input and compared the output differences between the original function and the new function. The experimental results showed that for all inputs, both functions would return the same value.

Here is the code used for testing:

uint32_t i = 0;
while(1) {
    if(calc_fclass(i) != calc_fclass_new(i)) {
        fprintf(stderr, "  Error:    %x %x %x", i, calc_fclass(i), calc_fclass_new(i));
        return 1;
    }
    if(i == 0xffffffff) {
        break;
    }
    i++;
}
printf("  All tests passed.\n");
return 0;

} else {
/* Check if it is NaN */
if (frac) {
out = frac & FMASK_QNAN ? 0x200 : 0x100;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to change this line:

out = FMASK_QNAN ? 0x200 : 0x100;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the current implementation is correct. Changing line 61 to out = FMASK_QNAN ? 0x200 : 0x100; would cause different behavior for certain inputs compared to the original function. For example, when the input is 0x7f800001, the original function would return 0x100, but applying this modification would make it return 0x200.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right.
Seems that the current implementation is correct.

@jserv jserv requested a review from 2011eric September 5, 2023 05:49
Copy link
Contributor

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squash git commits and refine commit messages.

Improved the calc_fclass function by changing it into a binary decision
tree. This optimization significantly improves the performance of the
function, resulting in a speedup of approximately 69% in my test. You
can review the test and its results in the following link:
https://github.com/visitorckw/rv32emu-calc_fclass-optimization/
@jserv jserv merged commit 1275468 into sysprog21:master Sep 5, 2023
@jserv
Copy link
Contributor

jserv commented Sep 5, 2023

Thank @visitorckw for contributing!

vestata pushed a commit to vestata/rv32emu that referenced this pull request Jan 24, 2025
Improve calc_fclass() with binary decision tree
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants