This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
float16 argmax breaks on negative inputs #9007
Comments
Looks like a bug in `mx.sym.maximum` as well:

```python
import mxnet as mx
import numpy as np

data1 = mx.sym.Variable('data1')
data2 = mx.sym.Variable('data2')
sym = mx.sym.maximum(data1, data2)
exe = sym.simple_bind(ctx=mx.cpu(), data1=(1,),
                      type_dict={'data1': np.float16, 'data2': np.float16})
exe.forward(is_train=True,
            data1=np.array([-3], dtype=np.float16),
            data2=np.array([-4], dtype=np.float16))
print(exe.arg_dict['data1'].dtype)
print(exe.arg_dict['data2'].dtype)
print(exe.outputs[0])
```
Unary reduce ops have the same problem handling float16 correctly. For example:

```python
import mxnet as mx
import numpy as np

a = mx.nd.array([-2, 0], dtype=np.float16)
print(mx.nd.max(a))
```

Output:

```
[ 6.10351562e-05]
<NDArray 1 @cpu(0)>
```

I'm guessing it might be related to the initial minimum value used for the float16 reduction.
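The printed value 6.10351562e-05 is exactly 2^-14, the smallest positive normal float16, which supports the guess above: the running maximum appears to be seeded with a wrong sentinel instead of the lowest representable float16. A minimal NumPy sketch of that failure mode (hypothetical helper names; this is not MXNet's actual code):

```python
import numpy as np

def bad_max(values):
    """Max reduction seeded with the smallest positive normal float16.
    Any input below the sentinel never updates it, so arrays of
    non-positive values return the sentinel itself."""
    m = np.finfo(np.float16).tiny  # ~6.104e-05, a wrong initial value
    for v in values.astype(np.float16):
        if v > m:
            m = v
    return m

def good_max(values):
    """Same reduction seeded with the lowest finite float16 (-65504)."""
    m = np.finfo(np.float16).min
    for v in values.astype(np.float16):
        if v > m:
            m = v
    return m

a = np.array([-2, 0], dtype=np.float16)
print(bad_max(a))   # the sentinel leaks out, mirroring the buggy output
print(good_max(a))  # 0.0, the expected maximum
```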
The predefined max/min values for the float16 format are wrong. The largest float16 is 65504, while the following code gives 57312:

```python
>>> import mxnet as mx
>>> a = mx.nd.array([65504, 65504], dtype='float16')
>>> mx.nd.min(a)

[ 57312.]
<NDArray 1 @cpu(0)>
```
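The correct float16 limits can be checked with NumPy's `finfo` (used here for illustration; the comment above only quotes `mx.nd` output). The largest finite float16 is indeed 65504, so a min-reduction seeded with 57312 clips any larger input:

```python
import numpy as np

info = np.finfo(np.float16)
print(info.max)   # 65504.0, the largest finite float16
print(info.min)   # -65504.0, the lowest finite float16
print(info.tiny)  # ~6.104e-05, the smallest positive normal float16
```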
This was referenced Dec 19, 2017
Description
The float16 implementation of argmax seems to treat negative numbers as if they were zero.
Environment info (Required)
Package used (Python/R/Scala/Julia):
I'm using Python.
Minimum reproducible example
This code calculates the argmax on two three-vectors, returning two integers. The first integer is correct; the second, which corresponds to the negative inputs, is incorrect. This seems to happen on both CPU and GPU.
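The described symptom is consistent with an argmax whose running best value starts at zero rather than negative infinity. A hypothetical NumPy illustration of that failure mode (not the original repro, which did not survive extraction, and not MXNet's actual code):

```python
import numpy as np

def zero_seeded_argmax(row):
    """Hypothetical buggy argmax: the running best value starts at 0,
    so an all-negative row never updates the index and 0 is returned."""
    best_val, best_idx = np.float16(0.0), 0
    for i, v in enumerate(row.astype(np.float16)):
        if v > best_val:
            best_val, best_idx = np.float16(v), i
    return best_idx

pos = np.array([0.5, 1.0, 0.25], dtype=np.float16)
neg = np.array([-3.0, -4.0, -2.0], dtype=np.float16)
print(np.argmax(pos), zero_seeded_argmax(pos))  # both report index 1
print(np.argmax(neg), zero_seeded_argmax(neg))  # 2 vs. 0: the buggy path
```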
This messes up accuracy metrics, for example, when doing half-precision training. You can work around it by adding a large constant offset before taking the argmax, but it's obviously a horrible hack that isn't always going to work.
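The offset workaround mentioned above can be sketched as follows (NumPy is used here only to illustrate the arithmetic; in MXNet the same shift would be applied to the NDArray before calling argmax). Note the constraint that makes it fragile: the offset must exceed the magnitude of the most negative input yet keep every shifted value inside float16 range:

```python
import numpy as np

x = np.array([[0.5, 1.0, 0.25],
              [-3.0, -4.0, -2.0]], dtype=np.float16)

# Shift every value positive so a zero-seeded reduction cannot
# discard them, then take the per-row argmax. The offset must be
# larger than |min(x)| but small enough that x + offset stays
# finite in float16 (max 65504) without losing too much precision.
offset = np.float16(100.0)
idx = np.argmax(x + offset, axis=1)
print(idx)  # [1 2]
```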