-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8360116: Add support for AVX10 floating point minmax instruction #25914
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@jatin-bhateja The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
/label add hotspot-compiler-dev |
@jatin-bhateja |
61646a5
to
ecb2294
Compare
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for implementing these new instructions! I had a look at your changes and have a few minor suggestions and questions. I am quite new to this part of the codebase, so feel free to disagree if I am way off base.
How did you test these changes?
Also, if you merge the current master branch, the Windows build failures in the Github Actions will be fixed.
assert(opc == Op_MinV || opc == Op_MinReductionV || | ||
opc == Op_MaxV || opc == Op_MaxReductionV, "sanity"); | ||
if (elem_bt == T_FLOAT) { | ||
evminmaxps(dst, mask, src1, src2, true, opc == Op_MinV || opc == Op_MinReductionV ? 0x4 : 0x5, vlen_enc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps 0x4
and 0x5
should be factored into named constants since they are used in multiple places and it would also help readability if one does not have the documentation handy when reading the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mhaessig ,
Command bits are in accordance with Tables 11.1 and 11.2 of section 11.2. First 2 bits [1:0] signify the operation kind, 00 for min and 01 for max. Next two bits [3:2] signify the sign selection logic and 4th bit 0 for both min/max, with this command word we can emulate the semantics of Math.max/min using a single AVX10 instruciton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got that from the documentation you kindly linked in the description. My question was rather to define a constant like AVX10_MINMAX_MAX_COMPARE_SIGN = 0x5
that can be used instead of the plain magic numbers. Because people looking at the code later will not have the "luxury" of being provided a link to the relevant documentation right when puzzling about what 0x4
means in this case.
@@ -1238,6 +1238,18 @@ void C2_MacroAssembler::evminmax_fp(int opcode, BasicType elem_bt, | |||
} | |||
} | |||
|
|||
void C2_MacroAssembler::vminmax_fp(int opc, BasicType elem_bt, XMMRegister dst, KRegister mask, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 1122 mentions the differences between vminps/vmaxps
and Java semantics. Perhaps a mention of the new instructions introduced in this PR might help people who are confused about the fact that vminmax_fp
is overloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Details on insturction semantics can be found in section 11.2 of AVX10 manual https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10
void eminmaxsh(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8); | ||
void eminmaxss(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8); | ||
void eminmaxsd(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8); | ||
void evminmaxph(XMMRegister dst, KRegister mask, XMMRegister nds, XMMRegister src, bool merge, int imm8, int vector_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason evminmaxph
does not have a version where src
has type Address
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we do not have a matcher pattern to consume it, as the MIN/MAX sequence was anyway, a bulky one. I have added a new pattern for memory operand flavor of the pattern specifically for AVX-10, along with this patch.
Patch has been regressed over the following tests using Intel SDE https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html (Version 9.53).
- test/jdk/jdk/incubator/vector/Double*VectorTests:: (min/max all variants including reduction)
- test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java
- test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
e.g. command line /home/jatinbha/softwares/sde-external-9.53.0-2025-03-16-lin/sde64 -future -ptr_raise -icount -- java
Intel@ AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection.
The following pseudo-code describes the existing algorithm for min/max[FD]:
Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand.
btmp = (b < +0.0) ? a : b
atmp = (b < +0.0) ? b : a
Tmp = Max_Float(atmp , btmp)
Res = (atmp == NaN) ? atmp : Tmp
For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0.
btmp = (b < +0.0) ? b : a
atmp = (b < +0.0) ? a : b
Tmp = Max_Float(atmp , btmp)
Res = (atmp == NaN) ? atmp : Tmp
Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VPMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling.
Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type.
Kindly review and share your feedback.
Best Regards,
Jatin
[1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25914/head:pull/25914
$ git checkout pull/25914
Update a local copy of the PR:
$ git checkout pull/25914
$ git pull https://git.openjdk.org/jdk.git pull/25914/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25914
View PR using the GUI difftool:
$ git pr show -t 25914
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25914.diff
Using Webrev
Link to Webrev Comment