Skip to content

8360116: Add support for AVX10 floating point minmax instruction #25914

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

jatin-bhateja
Copy link
Member

@jatin-bhateja jatin-bhateja commented Jun 20, 2025

Intel@ AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection.

The following pseudo-code describes the existing algorithm for min/max[FD]:

Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand.

btmp = (b < +0.0) ? a : b
atmp = (b < +0.0) ? b : a
Tmp = Max_Float(atmp , btmp)
Res = (atmp == NaN) ? atmp : Tmp

For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0.

btmp = (b < +0.0) ? b : a
atmp = (b < +0.0) ? a : b
Tmp = Max_Float(atmp , btmp)
Res = (atmp == NaN) ? atmp : Tmp

Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VPMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling.

Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type.

Kindly review and share your feedback.

Best Regards,
Jatin

[1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8360116: Add support for AVX10 floating point minmax instruction (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25914/head:pull/25914
$ git checkout pull/25914

Update a local copy of the PR:
$ git checkout pull/25914
$ git pull https://git.openjdk.org/jdk.git pull/25914/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25914

View PR using the GUI difftool:
$ git pr show -t 25914

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25914.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 20, 2025

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 20, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Jun 20, 2025

@jatin-bhateja The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jun 20, 2025
@jatin-bhateja
Copy link
Member Author

/label add hotspot-compiler-dev

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jun 20, 2025
@openjdk
Copy link

openjdk bot commented Jun 20, 2025

@jatin-bhateja
The hotspot-compiler label was successfully added.

@jatin-bhateja jatin-bhateja marked this pull request as ready for review June 25, 2025 10:30
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 25, 2025
@mlbridge
Copy link

mlbridge bot commented Jun 25, 2025

Webrevs

Copy link
Contributor

@mhaessig mhaessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for implementing these new instructions! I had a look at your changes and have a few minor suggestions and questions. I am quite new to this part of the codebase, so feel free to disagree if I am way off base.

How did you test these changes?

Also, if you merge the current master branch, the Windows build failures in the Github Actions will be fixed.

assert(opc == Op_MinV || opc == Op_MinReductionV ||
opc == Op_MaxV || opc == Op_MaxReductionV, "sanity");
if (elem_bt == T_FLOAT) {
evminmaxps(dst, mask, src1, src2, true, opc == Op_MinV || opc == Op_MinReductionV ? 0x4 : 0x5, vlen_enc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps 0x4 and 0x5 should be factored into named constants since they are used in multiple places and it would also help readability if one does not have the documentation handy when reading the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mhaessig ,
Command bits are in accordance with Tables 11.1 and 11.2 of section 11.2. First 2 bits [1:0] signify the operation kind, 00 for min and 01 for max. Next two bits [3:2] signify the sign selection logic and 4th bit 0 for both min/max, with this command word we can emulate the semantics of Math.max/min using a single AVX10 instruciton.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got that from the documentation you kindly linked in the description. My question was rather to define a constant like AVX10_MINMAX_MAX_COMPARE_SIGN = 0x5 that can be used instead of the plain magic numbers. Because people looking at the code later will not have the "luxury" of being provided a link to the relevant documentation right when puzzling about what 0x4 means in this case.

@@ -1238,6 +1238,18 @@ void C2_MacroAssembler::evminmax_fp(int opcode, BasicType elem_bt,
}
}

void C2_MacroAssembler::vminmax_fp(int opc, BasicType elem_bt, XMMRegister dst, KRegister mask,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 1122 mentions the differences between vminps/vmaxps and Java semantics. Perhaps a mention of the new instructions introduced in this PR might help people who are confused about the fact that vminmax_fp is overloaded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void eminmaxsh(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8);
void eminmaxss(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8);
void eminmaxsd(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8);
void evminmaxph(XMMRegister dst, KRegister mask, XMMRegister nds, XMMRegister src, bool merge, int imm8, int vector_len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason evminmaxph does not have a version where src has type Address?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we do not have a matcher pattern to consume it, as the MIN/MAX sequence was anyway, a bulky one. I have added a new pattern for memory operand flavor of the pattern specifically for AVX-10, along with this patch.

Patch has been regressed over the following tests using Intel SDE https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html (Version 9.53).

  • test/jdk/jdk/incubator/vector/Double*VectorTests:: (min/max all variants including reduction)
  • test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java
  • test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java

e.g. command line /home/jatinbha/softwares/sde-external-9.53.0-2025-03-16-lin/sde64 -future -ptr_raise -icount -- java

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

2 participants