[AArch64] does not use rev32/rev64 instructions, resulting in redundant shift operations #130469

Closed
k-arrows opened this issue Mar 9, 2025 · 2 comments · Fixed by #136707
k-arrows commented Mar 9, 2025

Here is some code from the GCC testsuite:
https://godbolt.org/z/jzdcsfxx4

typedef char __attribute__ ((vector_size (16))) v16qi;
typedef unsigned short __attribute__ ((vector_size (16))) v8hi;
typedef unsigned int __attribute__ ((vector_size (16))) v4si;
typedef unsigned long long __attribute__ ((vector_size (16))) v2di;
typedef unsigned short __attribute__ ((vector_size (8))) v4hi;
typedef unsigned int __attribute__ ((vector_size (8))) v2si;
 
v2di
G1 (v2di r)
{
  return (r >> 32) | (r << 32);
}
 
v4si
G2 (v4si r)
{
  return (r >> 16) | (r << 16);
}
 
v8hi
G3 (v8hi r)
{
  return (r >> 8) | (r << 8);
}
 
v2si
G4 (v2si r)
{
  return (r >> 16) | (r << 16);
}
 
v4hi
G5 (v4hi r)
{
  return (r >> 8) | (r << 8);
}

GCC uses rev32 or rev64 to complete each operation in a single instruction, whereas LLVM falls back to redundant shift operations.
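For readers wondering why a single reverse instruction is enough: rotating a lane by half its width just swaps the lane's two halves, which matches the element-reversal semantics of rev32 on 16-bit elements (and of rev64 on 32-bit elements). Below is a minimal, portable sketch that checks this for the G2 case using only GCC/Clang vector extensions. The helper names `rot_half`, `swap_halves`, and `same_result` are hypothetical and are not part of the original report or of any compiler API.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned int __attribute__ ((vector_size (16))) v4si;

/* Same expression as G2 in the report: rotate each 32-bit lane by 16 bits. */
static v4si
rot_half (v4si r)
{
  return (r >> 16) | (r << 16);
}

/* Hypothetical reference model: swap each adjacent pair of 16-bit elements,
   i.e. reverse the 16-bit elements inside every 32-bit word.  */
static v4si
swap_halves (v4si r)
{
  uint16_t h[8];
  v4si out;

  memcpy (h, &r, sizeof h);
  for (int i = 0; i < 8; i += 2)
    {
      uint16_t t = h[i];
      h[i] = h[i + 1];
      h[i + 1] = t;
    }
  memcpy (&out, h, sizeof out);
  return out;
}

/* Returns 1 when the rotate and the element swap agree bit for bit. */
static int
same_result (v4si r)
{
  v4si a = rot_half (r);
  v4si b = swap_halves (r);
  return memcmp (&a, &b, sizeof a) == 0;
}
```

Swapping the two halves of a word in memory gives the same value on little- and big-endian targets, so the check does not depend on byte order. The same argument applies to G1 with 32-bit halves of 64-bit lanes.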

Member

llvmbot commented Mar 9, 2025

@llvm/issue-subscribers-backend-aarch64


@jyli0116
Contributor

Hi, I'm looking into this right now. Could I please be assigned to the issue?
