This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
float16 argmax breaks on negative inputs #9007
Comments
Looks like a bug in `mx.sym.maximum` as well:

```python
import mxnet as mx
import numpy as np

data1 = mx.sym.Variable('data1')
data2 = mx.sym.Variable('data2')
sym = mx.sym.maximum(data1, data2)
exe = sym.simple_bind(ctx=mx.cpu(), data1=(1,),
                      type_dict={'data1': np.float16, 'data2': np.float16})
exe.forward(is_train=True,
            data1=np.array([-3], dtype=np.float16),
            data2=np.array([-4], dtype=np.float16))
print(exe.arg_dict['data1'].dtype)
print(exe.arg_dict['data2'].dtype)
print(exe.outputs[0])
```
Unary reduce ops have the same problem handling float16 correctly. For example:

```python
import mxnet as mx
import numpy as np

a = mx.nd.array([-2, 0], dtype=np.float16)
print(mx.nd.max(a))
```

Output:

```
[ 6.10351562e-05]
<NDArray 1 @cpu(0)>
```

I'm guessing it might be related to the initial minimum value used for the float16 reduction.
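The printed value 6.10351562e-05 is exactly 2^-14, the smallest positive normal float16, which supports the guess above: the running maximum appears to be seeded with a wrong sentinel instead of the lowest representable float16. A minimal NumPy sketch of that failure mode (hypothetical helper names; this is not MXNet's actual code):

```python
import numpy as np

def bad_max(values):
    """Max reduction seeded with the smallest positive normal float16.
    Any input below the sentinel never updates it, so arrays of
    non-positive values return the sentinel itself."""
    m = np.finfo(np.float16).tiny  # ~6.104e-05, a wrong initial value
    for v in values.astype(np.float16):
        if v > m:
            m = v
    return m

def good_max(values):
    """Same reduction seeded with the lowest finite float16 (-65504)."""
    m = np.finfo(np.float16).min
    for v in values.astype(np.float16):
        if v > m:
            m = v
    return m

a = np.array([-2, 0], dtype=np.float16)
print(bad_max(a))   # the sentinel leaks out, mirroring the buggy output
print(good_max(a))  # 0.0, the expected maximum
```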
The predefined max/min values for the float16 format are wrong. The largest float16 is 65504, while the following code gives 57312:

```python
>>> import mxnet as mx
>>> a = mx.nd.array([65504, 65504], dtype='float16')
>>> mx.nd.min(a)

[ 57312.]
<NDArray 1 @cpu(0)>
```
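The correct float16 limits can be checked with NumPy's `finfo` (used here for illustration; the comment above only quotes `mx.nd` output). The largest finite float16 is indeed 65504, so a min-reduction seeded with 57312 clips any larger input:

```python
import numpy as np

info = np.finfo(np.float16)
print(info.max)   # 65504.0, the largest finite float16
print(info.min)   # -65504.0, the lowest finite float16
print(info.tiny)  # ~6.104e-05, the smallest positive normal float16
```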
This was referenced Dec 19, 2017
Description
The float16 implementation of argmax seems to treat negative numbers as if they were zero.
Environment info (Required)
Package used (Python/R/Scala/Julia):
I'm using Python.
Minimum reproducible example
This code calculates the argmax on two three-vectors, returning two integers. The first integer is correct; the second, which corresponds to the negative inputs, is incorrect. This seems to happen on both CPU and GPU.
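The described symptom is consistent with an argmax whose running best value starts at zero rather than negative infinity. A hypothetical NumPy illustration of that failure mode (not the original repro, which did not survive extraction, and not MXNet's actual code):

```python
import numpy as np

def zero_seeded_argmax(row):
    """Hypothetical buggy argmax: the running best value starts at 0,
    so an all-negative row never updates the index and 0 is returned."""
    best_val, best_idx = np.float16(0.0), 0
    for i, v in enumerate(row.astype(np.float16)):
        if v > best_val:
            best_val, best_idx = np.float16(v), i
    return best_idx

pos = np.array([0.5, 1.0, 0.25], dtype=np.float16)
neg = np.array([-3.0, -4.0, -2.0], dtype=np.float16)
print(np.argmax(pos), zero_seeded_argmax(pos))  # both report index 1
print(np.argmax(neg), zero_seeded_argmax(neg))  # 2 vs. 0: the buggy path
```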
This messes up accuracy metrics, for example, when doing half-precision training. You can work around it by adding a large constant offset before taking the argmax, but it's obviously a horrible hack that isn't always going to work.
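The offset workaround mentioned above can be sketched as follows (NumPy is used here only to illustrate the arithmetic; in MXNet the same shift would be applied to the NDArray before calling argmax). Note the constraint that makes it fragile: the offset must exceed the magnitude of the most negative input yet keep every shifted value inside float16 range:

```python
import numpy as np

x = np.array([[0.5, 1.0, 0.25],
              [-3.0, -4.0, -2.0]], dtype=np.float16)

# Shift every value positive so a zero-seeded reduction cannot
# discard them, then take the per-row argmax. The offset must be
# larger than |min(x)| but small enough that x + offset stays
# finite in float16 (max 65504) without losing too much precision.
offset = np.float16(100.0)
idx = np.argmax(x + offset, axis=1)
print(idx)  # [1 2]
```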