Improve calc_fclass() with binary decision tree #207

visitorckw · 2023-09-04T01:51:35Z

I have optimized the calc_fclass function by changing it into a binary decision tree. This optimization significantly improves the performance of the function, resulting in a speedup of approximately 69% in my test. The test I wrote to evaluate the performance: https://github.com/visitorckw/rv32emu-calc_fclass-optimization/

2011eric · 2023-09-04T14:06:10Z

src/softfloat.h

-    out |= (expn == FMASK_EXPN && (frac & FMASK_QNAN)) ? 0x200 : 0;
+
+    // Check if it is special value -INF
+    out |= (f == 0xff800000) ? 0x001 : 0;


-INF could be integrated into the if-else statement

I rewrote the decision tree as follows and gained a further speedup:

    /* Check the exponent bits */
    if (expn) {
        if (expn != FMASK_EXPN) {
            /* Check if it is negative normal or positive normal */
            out = sign ? 0x002 : 0x040;
        } else {
            /* Check if it is NaN */
            if (frac) {
                out = frac & FMASK_QNAN ? 0x200 : 0x100;
            } else if (!sign) {
                /* Check if it is +INF */
                out = 0x080;
            } else {
                /* Check if it is -INF */
                out = 0x001;
            }
        }
    } else if (frac) {
        /* Check if it is negative or positive subnormal */
        out = sign ? 0x004 : 0x020;
    } else {
        /* Check if it is +0 or -0 */
        out = sign ? 0x008 : 0x010;
    }
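For readers who want to try the tree outside the emulator, here is a minimal standalone sketch. The mask names follow the ones used in this thread, but their values are an assumption based on the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits). The function name `fclass_tree` is hypothetical, and this is not claimed to be the exact code that was merged.

```c
#include <stdint.h>

/* Assumed binary32 masks: names as in the thread, values are this sketch's assumption. */
#define FMASK_SIGN 0x80000000u
#define FMASK_EXPN 0x7F800000u
#define FMASK_FRAC 0x007FFFFFu
#define FMASK_QNAN 0x00400000u

/* Hypothetical standalone decision tree: returns exactly one class bit. */
static uint32_t fclass_tree(uint32_t f)
{
    const uint32_t sign = f & FMASK_SIGN;
    const uint32_t expn = f & FMASK_EXPN; /* left unshifted, compared against the mask */
    const uint32_t frac = f & FMASK_FRAC;

    if (expn) {
        if (expn != FMASK_EXPN)
            return sign ? 0x002 : 0x040; /* negative / positive normal */
        if (frac)
            return (frac & FMASK_QNAN) ? 0x200 : 0x100; /* quiet / signaling NaN */
        return sign ? 0x001 : 0x080; /* -INF / +INF */
    }
    if (frac)
        return sign ? 0x004 : 0x020; /* negative / positive subnormal */
    return sign ? 0x008 : 0x010; /* -0 / +0 */
}
```

Under these assumptions, `fclass_tree(0xff800000u)` yields `0x001` (-INF) and `fclass_tree(0x3f800000u)` yields `0x040` (positive normal, the encoding of 1.0f). Every input takes at most three branches before returning.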

@2011eric
Thanks for your review and suggestions.
In my tests, the current version now runs approximately 69% faster than the original.

Are you able to run arch-test with DEVICE=F?
Just to make sure this implementation is correct.

When I ran make arch-test RISCV_DEVICE=FZicsr as you requested, I consistently got the following error, with both the new function and the original one:

inferior exit code 0
inferior exit code 0
ERROR | Segmentation fault (core dumped)

I can successfully execute other tests, such as RISCV_DEVICE=IMC, without encountering any errors. I'm not sure about the cause of the error.

However, I used a loop to iterate over the entire range of uint32_t values and compared the outputs of the original function and the new function. For every input, both functions returned the same value.

Here is the code used for testing:

    uint32_t i = 0;
    while (1) {
        if (calc_fclass(i) != calc_fclass_new(i)) {
            fprintf(stderr, " Error: %x %x %x", i, calc_fclass(i), calc_fclass_new(i));
            return 1;
        }
        if (i == 0xffffffff) {
            break;
        }
        i++;
    }
    printf(" All tests passed.\n");
    return 0;
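The same differential check can be sketched as a self-contained unit. Since the full original function is not shown in this thread, `fclass_flat` below is a hypothetical stand-in that evaluates one independent condition per class; it is not the original rv32emu code. The mask values and the names `fclass_tree` and `count_mismatches` are likewise assumptions of this sketch. A 64-bit loop counter removes the need for the explicit `0xffffffff` break.

```c
#include <stdint.h>

/* Assumed binary32 masks: names as in the thread, values are this sketch's assumption. */
#define FMASK_SIGN 0x80000000u
#define FMASK_EXPN 0x7F800000u
#define FMASK_FRAC 0x007FFFFFu
#define FMASK_QNAN 0x00400000u

/* Hypothetical flat reference: ORs together one independent test per class. */
static uint32_t fclass_flat(uint32_t f)
{
    const uint32_t sign = f & FMASK_SIGN;
    const uint32_t expn = f & FMASK_EXPN;
    const uint32_t frac = f & FMASK_FRAC;
    const int special = (expn == FMASK_EXPN);
    const int normal = (expn != 0 && !special);
    uint32_t out = 0;

    out |= (special && !frac && sign) ? 0x001 : 0;  /* -INF */
    out |= (normal && sign) ? 0x002 : 0;            /* negative normal */
    out |= (!expn && frac && sign) ? 0x004 : 0;     /* negative subnormal */
    out |= (!expn && !frac && sign) ? 0x008 : 0;    /* -0 */
    out |= (!expn && !frac && !sign) ? 0x010 : 0;   /* +0 */
    out |= (!expn && frac && !sign) ? 0x020 : 0;    /* positive subnormal */
    out |= (normal && !sign) ? 0x040 : 0;           /* positive normal */
    out |= (special && !frac && !sign) ? 0x080 : 0; /* +INF */
    out |= (special && frac && !(frac & FMASK_QNAN)) ? 0x100 : 0; /* signaling NaN */
    out |= (special && (frac & FMASK_QNAN)) ? 0x200 : 0;          /* quiet NaN */
    return out;
}

/* Decision-tree version, as proposed in the review. */
static uint32_t fclass_tree(uint32_t f)
{
    const uint32_t sign = f & FMASK_SIGN;
    const uint32_t expn = f & FMASK_EXPN;
    const uint32_t frac = f & FMASK_FRAC;

    if (expn) {
        if (expn != FMASK_EXPN)
            return sign ? 0x002 : 0x040;
        if (frac)
            return (frac & FMASK_QNAN) ? 0x200 : 0x100;
        return sign ? 0x001 : 0x080;
    }
    if (frac)
        return sign ? 0x004 : 0x020;
    return sign ? 0x008 : 0x010;
}

/* Count disagreements, visiting every stride-th input; stride 1 is the full sweep. */
static uint64_t count_mismatches(uint64_t stride)
{
    uint64_t bad = 0;
    if (stride == 0)
        stride = 1; /* guard against a loop that never advances */
    for (uint64_t i = 0; i <= UINT32_MAX; i += stride)
        bad += (fclass_flat((uint32_t) i) != fclass_tree((uint32_t) i));
    return bad;
}
```

Calling `count_mismatches(1)` reproduces the exhaustive sweep described above. A larger stride gives a quick smoke test, at the cost of skipping most inputs.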

2011eric · 2023-09-04T15:39:32Z

src/softfloat.h

+        } else {
+            /* Check if it is NaN */
+            if (frac) {
+                out = frac & FMASK_QNAN ? 0x200 : 0x100;


I forgot to change this line:

out = FMASK_QNAN ? 0x200 : 0x100;

I believe the current implementation is correct. Changing line 61 to out = FMASK_QNAN ? 0x200 : 0x100; would cause different behavior for certain inputs compared to the original function. For example, when the input is 0x7f800001, the original function would return 0x100, but applying this modification would make it return 0x200.

Yes, you're right.
Seems that the current implementation is correct.
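The difference between the two expressions can be shown in isolation. In the sketch below, the mask values are assumed from the IEEE 754 binary32 layout, and the helper names `nan_class_masked` and `nan_class_constant` are hypothetical. Because `FMASK_QNAN` is a nonzero constant, using it alone as the condition always selects `0x200`, whatever the input.

```c
#include <stdint.h>

/* Assumed binary32 masks: names as in the thread, values are this sketch's assumption. */
#define FMASK_FRAC 0x007FFFFFu
#define FMASK_QNAN 0x00400000u

/* Current form: tests the quiet bit of this input's fraction. */
static uint32_t nan_class_masked(uint32_t f)
{
    const uint32_t frac = f & FMASK_FRAC;
    return (frac & FMASK_QNAN) ? 0x200 : 0x100;
}

/* Suggested form: the condition is a nonzero constant, so the result is always 0x200. */
static uint32_t nan_class_constant(uint32_t f)
{
    (void) f; /* the input never influences the result */
    return FMASK_QNAN ? 0x200 : 0x100;
}
```

For the input `0x7f800001` (a NaN whose quiet bit is clear), `nan_class_masked` returns `0x100` while `nan_class_constant` returns `0x200`, which matches the discrepancy described above.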

jserv

Squash git commits and refine commit messages.

Improved the calc_fclass function by changing it into a binary decision tree. This optimization significantly improves the performance of the function, resulting in a speedup of approximately 69% in my test. You can review the test and its results in the following link: https://github.com/visitorckw/rv32emu-calc_fclass-optimization/

jserv · 2023-09-05T13:27:40Z

Thanks, @visitorckw, for contributing!

jserv requested a review from 2011eric September 4, 2023 02:02

jserv reviewed Sep 4, 2023


jserv changed the title ~~Optimize calc_fclass() for improved performance~~ Improve calc_fclass() with binary decision tree Sep 4, 2023

2011eric reviewed Sep 4, 2023


jserv requested a review from 2011eric September 5, 2023 05:49

jserv requested changes Sep 5, 2023


visitorckw force-pushed the master branch from 583a7c3 to e9507cc Compare September 5, 2023 12:56

visitorckw force-pushed the master branch from e9507cc to e7e4077 Compare September 5, 2023 13:04

jserv merged commit 1275468 into sysprog21:master Sep 5, 2023

vestata pushed a commit to vestata/rv32emu that referenced this pull request Jan 24, 2025

Merge pull request sysprog21#207 from visitorckw/master

a7e7373

Improve calc_fclass() with binary decision tree
