luajit: bit ops works differently with -march=native on Tiger Lake #6787

Open
Totktonada opened this issue Jan 15, 2022 · 3 comments
Labels
bug Something isn't working luajit

Comments

@Totktonada
Member

Tarantool version:

2.10.0-beta2-5-gdc19be406

Successful build:

$ cmake . -DCMAKE_BUILD_TYPE=Debug -DENABLE_BACKTRACE=ON -DENABLE_DIST=ON -DENABLE_FEEDBACK_DAEMON=OFF -DENABLE_BUNDLED_LIBCURL=OFF && make -j

Failed build:

$ CFLAGS="-march=native -O2" cmake . -DCMAKE_BUILD_TYPE=Debug -DENABLE_BACKTRACE=ON -DENABLE_DIST=ON -DENABLE_FEEDBACK_DAEMON=OFF -DENABLE_BUNDLED_LIBCURL=OFF && make -j
# or
$ CFLAGS="-march=native" cmake . -DCMAKE_BUILD_TYPE=Debug -DENABLE_BACKTRACE=ON -DENABLE_DIST=ON -DENABLE_FEEDBACK_DAEMON=OFF -DENABLE_BUNDLED_LIBCURL=OFF && make -j

How it fails:

./src/tarantool third_party/luajit/test/LuaJIT-tests/lib/ffi/bit64.lua
LuajitError: third_party/luajit/test/LuaJIT-tests/lib/ffi/bit64.lua:49: assertion failed!
fatal error, exiting the event loop

The test case itself:

  9 ffi.cdef[[
 10 typedef enum { ZZI = -1 } ienum_t;
 11 typedef enum { ZZU } uenum_t;
 12 ]]
<...>
 44 do --- tobit/band negative unsigned enum
 45   local x = ffi.new("uenum_t", -10)
 46   local y = tobit(x)
 47   local z = band(x)
 48   assert(type(y) == "number")
 49   assert(y == -10)
 50   assert(type(z) == "cdata")
 51   assert(z == 2^32-10)
 52 end
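
The values this test expects follow from 32-bit wrap-around arithmetic. The Python sketch below models only that arithmetic, not LuaJIT internals. It assumes `uenum_t` is stored as an unsigned 32-bit integer (consistent with the `2^32-10` check in the test), and `to_uint32` / `to_int32` are hypothetical helper names introduced for illustration.

```python
def to_uint32(n: int) -> int:
    """Wrap an integer into the unsigned 32-bit range [0, 2**32)."""
    return n & 0xFFFFFFFF


def to_int32(n: int) -> int:
    """Wrap an integer into the signed 32-bit range [-2**31, 2**31)."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


stored = to_uint32(-10)      # assumed content of an unsigned 32-bit enum
print(stored)                # 4294967286, i.e. 2**32 - 10
print(to_int32(stored))      # -10, the value the test expects from tobit()
```

Under this model, the assertion on line 49 can only fail if the value was already wrong before `tobit()` normalized it.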

Hardware

Dell Latitude 5420. Tiger Lake CPU.

$ cat /proc/cpuinfo | head -n 27
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 140
model name	: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
stepping	: 1
microcode	: 0x86
cpu MHz		: 1799.998
cache size	: 12288 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 27
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2intersect md_clear flush_l1d arch_capabilities
vmx flags	: vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec tsc_scaling
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs
bogomips	: 3609.60
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

Environment

$ head -n 1 /etc/os-release
NAME=Gentoo
$ uname -a
Linux rade 5.14.0-gentoo #20 SMP Tue Nov 30 01:58:29 MSK 2021 x86_64 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz GenuineIntel GNU/Linux
$ gcc --version | head -n 1
gcc (Gentoo 11.2.0 p1) 11.2.0
  • glibc-2.34-r2
  • binutils-2.37_p1-r1

The problem was first found when I built tarantool using emerge; I then narrowed it down to the reproducer above. I'm sharing the configuration applied to the tarantool package just in case.

emerge configuration

Ebuild is here.

$ grep -R dev-db/tarantool /etc/portage/
/etc/portage/package.accept_keywords/tarantool:dev-db/tarantool **
/etc/portage/package.use/tarantool:dev-db/tarantool debug
/etc/portage/package.env/debug.env.list:dev-db/tarantool debug.conf
$ cat /etc/portage/env/debug.conf 
# > If nostrip is in your default FEATURES, splitdebug won't do anything! 
# https://wiki.gentoo.org/wiki/Debugging
CFLAGS="${CFLAGS} -ggdb"
CXXFLAGS="${CXXFLAGS} -ggdb"
FEATURES="${FEATURES} splitdebug compressdebug installsources -nostrip"
USE="debug"

USE flags: backtrace debug system-libcurl system-libyaml system-zstd -feedback-daemon -gcov -gprof -systemd -test CPU_FLAGS_X86="avx sse2".

$ cat /etc/portage/make.conf
CFLAGS="-march=native -O2"
CXXFLAGS="${CFLAGS}"
CHOST="x86_64-pc-linux-gnu"
MAKEOPTS="-j9 -l8"
CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3"
USE="${USE} ${CPU_FLAGS_X86}"
PORTDIR="/usr/portage"
DISTDIR="${PORTDIR}/distfiles"
PKGDIR="${PORTDIR}/packages"
ACCEPT_KEYWORDS="~amd64"
EMERGE_DEFAULT_OPTS="--with-bdeps=y --quiet-build=y --jobs=4 --load-average=4"
PORTAGE_RSYNC_EXTRA_OPTS="--quiet"
INPUT_DEVICES="keyboard synaptics mouse evdev wacom"
VIDEO_CARDS="fbdev vesa intel i965 i915 iris"
# HDA Intel doesn't require anything special,
# disable other (default enabled) cards
ALSA_CARDS=""

# Moved to /etc/portage/package.use/python as suggested by
# eselect news read 35.
# 'Python 3.7 to become the default target'.
# PYTHON_TARGETS="python3_6 python3_7 python3_8 python3_9"
# PYTHON_SINGLE_TARGET="python3_7"

# I don't really care about versions here.
#RUBY_TARGETS="ruby26"

# Enable X-related features in apps
USE="${USE} X"

# Fonts supporting antialiasing and so on
USE="${USE} xft"
@Totktonada
Member Author

If we dig down to the root of this issue, I would also ask to re-verify whether it is actually the cause of the difference in vinyl behaviour. That problem is described in tarantool/expirationd#104, and I see a correlation.

How to check: check out expirationd 1.1.1-44-g838c2d1, run luatest -v, and see whether the following error is shown:

Timed out waiting for Vinyl memory quota

I would also highlight that we see those errors (about the vinyl quota) in the integration testing on Ubuntu 20.04 (Focal), where the expirationd tests are run using the usual package for Ubuntu Focal. So, if the correlation is actually causation, this problem is possibly not limited to some very unusual build.

@kyukhin kyukhin added bug Something isn't working teamL labels Jan 21, 2022
@Totktonada
Member Author

Reproduced on Intel Xeon: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz.

/proc/cpuinfo
processor	: 79
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
stepping	: 7
microcode	: 0x5003006
cpu MHz		: 800.597
cache size	: 28160 KB
physical id	: 1
siblings	: 40
core id		: 28
cpu cores	: 20
apicid		: 121
initial apicid	: 121
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass
bogomips	: 4206.99
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

Minimal test case:

local ffi = require("ffi")
local bit = require("bit")

ffi.cdef[[
typedef enum { ZZI = -1 } ienum_t;
typedef enum { ZZU } uenum_t;
]]

-- tobit/band negative unsigned enum
local x = ffi.new("uenum_t", -10)
local y = bit.tobit(x)
assert(type(y) == "number")
assert(y == -10, tostring(y))

Related to AVX512?

@Buristan
Collaborator

When built with -march=native on Tiger Lake (or any other CPU with AVX512), the following script

print(ffi.new('uint64_t', -10))

prints 18446744073709551615ULL, i.e. -1ULL.
The reason is the function lj_num2u64(), more specifically its use of the vcvttsd2usi instruction.

This instruction is available on CPUs with the AVX512F feature flag.
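
A quick way to see whether a machine advertises this feature is to look for `avx512f` in the `flags` line of `/proc/cpuinfo`, as in the dumps posted in this thread. The sketch below is a minimal, hypothetical helper (`has_cpu_flag` is not part of any tool mentioned here). It only reports what the kernel advertises and does not prove which instructions the compiler actually emitted.

```python
def has_cpu_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if `flag` is listed on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the plain 'flags' key only, not e.g. 'vmx flags'.
        if sep and key.strip() == "flags":
            return flag in value.split()
    return False


# Made-up sample in the same layout as /proc/cpuinfo.
sample = "model name\t: Example CPU\nflags\t\t: fpu sse2 avx2 avx512f avx512dq\n"
print(has_cpu_flag(sample, "avx512f"))   # True
print(has_cpu_flag(sample, "avx512er"))  # False
```

On Linux, the real input would be the contents of `/proc/cpuinfo`, for example `has_cpu_flag(open("/proc/cpuinfo").read(), "avx512f")`.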

#(gdb) x /16i lj_num2u64
   0x445c48 <lj_cconv_ct_ct+2376>:      vcvttsd2usi rax,xmm0
=> 0x445c4e <lj_cconv_ct_ct+2382>:      mov    QWORD PTR [r14],rax

xmm0 is the argument:

(gdb) p /x $xmm0
$23 = {v4_float = {0x0, 0xfffffffe, 0x0, 0x0}, v2_double = {0xfffffffffffffff6, 0x0}, v16_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x24, 0xc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x0, 0x0, 0x0, 0xc024, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0xc0240000, 0x0, 0x0}, v2_int64 = {0xc024000000000000, 0x0}, uint128 = 0x0000000000000000c024000000000000}

and the result is in rax:

(gdb) p $rax
$22 = -1
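
To double-check the reading of the two register dumps, the bit pattern shown in `v2_int64` can be decoded as an IEEE 754 double, and the signed `rax` value can be reinterpreted as unsigned. This is only a verification sketch using Python's standard `struct` module; the constants are copied from the gdb output above.

```python
import struct

raw = 0xC024000000000000                       # v2_int64[0] from the xmm0 dump
(value,) = struct.unpack("<d", struct.pack("<Q", raw))
print(value)                                   # -10.0

rax_signed = -1                                # gdb prints rax as a signed value
rax_unsigned = rax_signed & 0xFFFFFFFFFFFFFFFF
print(rax_unsigned)                            # 18446744073709551615
print(rax_unsigned == 2**64 - 1)               # True
```

So the argument really is -10.0, and the result is the all-ones 64-bit pattern, matching the 18446744073709551615ULL printed by the script.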

As stated in the instruction description:

When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w – 1 is returned, where w represents the number of bits in the destination format.

Since -10 is outside the unsigned 64-bit integer range, the conversion yields 2^64 - 1, i.e. -1ULL.
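
The out-of-range rule from the quoted description can be modelled in a few lines. This is a simplified sketch of the masked-exception case only, assuming truncation toward zero (the instruction is a truncating conversion) and ignoring MXCSR state. `model_cvtt_f64_to_u64` is a hypothetical name, not a LuaJIT or compiler function.

```python
import math

U64_MAX = 2**64 - 1


def model_cvtt_f64_to_u64(x: float) -> int:
    """Model a truncating double -> unsigned 64-bit conversion.

    Inputs whose truncated value does not fit (including NaN and
    infinities) yield 2**64 - 1, as in the quoted description.
    """
    if not math.isfinite(x):
        return U64_MAX
    t = math.trunc(x)                  # truncate toward zero
    return t if 0 <= t <= U64_MAX else U64_MAX


print(model_cvtt_f64_to_u64(10.0))     # 10
print(model_cvtt_f64_to_u64(-10.0))    # 18446744073709551615
```

In this model every negative input of magnitude 1 or more collapses to the same value, which is why -10 comes back as -1ULL rather than wrapping to 2^64 - 10.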

In upstream this function is modified by the following patch. After the patch, the generated assembly for this function contains the vcvttsd2si instruction, which yields the indefinite integer value (0x8000000000000000) when the result of the conversion cannot be represented.

See also LuaJIT/LuaJIT#415.
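
For comparison, here is a sketch of how a signed truncating conversion treats the same input. It models the instruction semantics only and is not the upstream patch. `model_cvtt_f64_to_i64_bits` is a hypothetical name, and the function returns the raw 64-bit pattern so it can be read as an unsigned value.

```python
import math

INT64_INDEFINITE = 0x8000000000000000    # pattern returned when the result does not fit


def model_cvtt_f64_to_i64_bits(x: float) -> int:
    """Model a truncating double -> signed 64-bit conversion, returning raw bits."""
    if not math.isfinite(x):
        return INT64_INDEFINITE
    t = math.trunc(x)                    # truncate toward zero
    if not -2**63 <= t < 2**63:
        return INT64_INDEFINITE
    return t & 0xFFFFFFFFFFFFFFFF        # two's-complement bit pattern


print(model_cvtt_f64_to_i64_bits(-10.0) == 2**64 - 10)   # True
print(hex(model_cvtt_f64_to_i64_bits(1e30)))             # 0x8000000000000000
```

With the signed conversion, -10 fits in the destination, so its bit pattern reads as 2^64 - 10 when reinterpreted as unsigned; only values that do not fit in a signed 64-bit integer produce the indefinite value.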

@igormunkin igormunkin removed the teamL label Sep 15, 2022
ligurio added a commit to tarantool/luajit that referenced this issue Apr 11, 2024
This commit adds the build with aforementioned option to exotic
builds matrix, so it is tested on both x86_64 and ARM64
architectures now.

Needed for tarantool/tarantool#9595
Related to tarantool/tarantool#6787
ligurio added a commit to tarantool/luajit that referenced this issue Apr 11, 2024
This commit adds a workflow for building and testing with enabled
AVX512.

Needed for tarantool/tarantool#9595
Related to tarantool/tarantool#6787
ligurio added a commit to tarantool/luajit that referenced this issue Apr 15, 2024
This commit adds a workflow for building and testing with AVX512
enabled.

Needed for tarantool/tarantool#9595
Relates to tarantool/tarantool#6787
ligurio added a commit to tarantool/luajit that referenced this issue Apr 16, 2024
This commit adds a workflow for building and testing with AVX512
enabled.

Needed for tarantool/tarantool#9595
Relates to tarantool/tarantool#6787
ligurio added a commit to tarantool/luajit that referenced this issue Jun 13, 2024
This commit adds a workflow for building and testing with AVX512
enabled.

Needed for tarantool/tarantool#9595
Relates to tarantool/tarantool#6787
ligurio added a commit to tarantool/luajit that referenced this issue Jun 14, 2024
This commit adds a workflow for building and testing with AVX512
enabled.

Needed for tarantool/tarantool#9595
Relates to tarantool/tarantool#6787
Buristan pushed a commit to tarantool/luajit that referenced this issue Jun 20, 2024
This commit adds a workflow for building and testing with AVX512
enabled.

Needed for tarantool/tarantool#9595
Relates to tarantool/tarantool#6787
Buristan pushed a commit to tarantool/luajit that referenced this issue Jun 20, 2024
This commit adds a workflow for building and testing with AVX512
enabled.

Needed for tarantool/tarantool#9924
Relates to tarantool/tarantool#6787

Reviewed-by: Maxim Kokryashkin <m.kokryashkin@tarantool.org>
Reviewed-by: Sergey Kaplun <skaplun@tarantool.org>
Signed-off-by: Sergey Kaplun <skaplun@tarantool.org>
(cherry picked from commit bb08425)
Buristan pushed a commit to tarantool/luajit that referenced this issue Jun 20, 2024
This commit adds a workflow for building and testing with AVX512
enabled.

Needed for tarantool/tarantool#9924
Relates to tarantool/tarantool#6787

Reviewed-by: Maxim Kokryashkin <m.kokryashkin@tarantool.org>
Reviewed-by: Sergey Kaplun <skaplun@tarantool.org>
Signed-off-by: Sergey Kaplun <skaplun@tarantool.org>