This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

float16 argmax breaks on negative inputs #9007

Closed
taliesinb opened this issue Dec 9, 2017 · 4 comments

Comments

@taliesinb
Contributor

Description

The float16 implementation of argmax seems to treat negative numbers as if they were zero.

Environment info (Required)

----------Python Info----------
Version      : 3.6.1
Compiler     : GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)
Build        : ('default', 'May 11 2017 13:04:09')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 9.0.1
Directory    : /Users/taliesinb/.anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 0.12.1
Directory    : /Users/taliesinb/.anaconda3/lib/python3.6/site-packages/mxnet-0.12.1-py3.6.egg/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Darwin-17.3.0-x86_64-i386-64bit
system       : Darwin
node         : T-Book.local
release      : 17.3.0
version      : Darwin Kernel Version 17.3.0: Thu Nov  9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.2516 sec, LOAD: 1.4235 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.3097 sec, LOAD: 0.5009 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.4425 sec, LOAD: 1.4291 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.2759 sec, LOAD: 1.3082 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.2479 sec, LOAD: 1.0078 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.2734 sec, LOAD: 0.7072 sec.

Package used (Python/R/Scala/Julia):
I'm using Python.

Minimum reproducible example

This code calculates the argmax of two three-vectors, returning two integers. The first integer is correct; the second, which corresponds to the negative inputs, is incorrect. This seems to happen on both CPU and GPU.

import mxnet as mx
import numpy as np

data = mx.symbol.Variable('data')
argmax = mx.symbol.argmax(data, axis=-1)
# Bind with a float16 input of shape (2, 3): one argmax per row.
exe = argmax.simple_bind(ctx=mx.cpu(), data=(2, 3), type_dict={'data': np.float16})
exe.forward(is_train=True, data=np.asarray([[1, 2, 3], [-4, -3, -2]], dtype=np.float16))
print(exe.outputs)  # the correct index is 2 for both rows

This messes up accuracy metrics, for example, when doing half-precision training. You can work around it by adding a large constant offset before taking the argmax, but it's obviously a horrible hack that isn't always going to work.
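For comparison, the expected indices can be checked against NumPy, which handles negative float16 values correctly. This is a minimal reference sketch that uses only NumPy, not MXNet; the variable names are illustrative.

```python
import numpy as np

# Same input as the reproduction above, stored as float16.
data = np.asarray([[1, 2, 3], [-4, -3, -2]], dtype=np.float16)

# Argmax along the last axis: one index per row.
expected = np.argmax(data, axis=-1)
print(expected)
```

Both rows increase from left to right, so the last index (2) is the correct answer for each of them.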

@piiswrong
Contributor

@reminisce

@reminisce
Contributor

This looks like a bug in the argmax operator itself rather than in simple_bind, because the following test of the maximum op produces the correct result. Will dig deeper.

import mxnet as mx
import numpy as np

data1 = mx.sym.Variable('data1')
data2 = mx.sym.Variable('data2')
sym = mx.sym.maximum(data1, data2)
exe = sym.simple_bind(ctx=mx.cpu(), data1=(1,), type_dict={'data1': np.float16, 'data2': np.float16})
exe.forward(is_train=True, data1=np.array([-3], dtype=np.float16), data2=np.array([-4], dtype=np.float16))
print(exe.arg_dict['data1'].dtype)
print(exe.arg_dict['data2'].dtype)
print(exe.outputs[0])

@reminisce
Contributor

Unary reduce ops do not handle float16 correctly. For example,

>>> a = mx.nd.array([-2, 0], dtype=np.float16)
>>> print(mx.nd.max(a))
[  6.10351562e-05]
<NDArray 1 @cpu(0)>

I'm guessing it is related to how the initial minimum value is set for the float16 type:
https://github.com/dmlc/mshadow/blob/2d7780c3f2eefe4453fa419862d1b2089bedb8d5/mshadow/extension/reduce_with_axis.h#L121
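The value printed above, 6.10351562e-05, equals 2**-14, the smallest positive normal float16, which is consistent with that guess. The sketch below simulates a max reduction in plain NumPy under the assumption that the accumulator is seeded with that value instead of the most negative float16; `reduce_max`, `suspect_init` and `correct_init` are hypothetical names, and this is not the mshadow implementation.

```python
import numpy as np

def reduce_max(values, init):
    """Fold max over float16 values, starting from the given initial value."""
    acc = np.float16(init)
    for v in np.asarray(values, dtype=np.float16):
        acc = max(acc, v)
    return acc

f16 = np.finfo(np.float16)
suspect_init = f16.tiny  # smallest positive normal float16 (2**-14); assumed buggy seed
correct_init = f16.min   # most negative finite float16 (-65504)

buggy = reduce_max([-2, 0], suspect_init)
fixed = reduce_max([-2, 0], correct_init)
print(buggy, fixed)
```

With a positive seed, no input that is zero or negative can ever replace the accumulator, which would also explain why negative inputs to argmax appear to be ignored.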

@reminisce
Contributor

The predefined max/min values for float16 are wrong. The largest float16 is 65504, but the following code returns 57312.

>>> a = mx.nd.array([65504, 65504], dtype='float16')
>>> mx.nd.min(a)
[ 57312.]
<NDArray 1 @cpu(0)>
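The two numbers can be compared at the bit level with NumPy. This is a small sketch with hypothetical helper names (`f16_to_bits`, `f16_from_bits`); it only decodes the values quoted above and does not inspect MXNet's constants, so any link between the bit patterns and the actual bug is an assumption.

```python
import numpy as np

def f16_to_bits(value):
    """Return the 16-bit pattern that encodes value as a float16."""
    return int(np.array([value], dtype=np.float16).view(np.uint16)[0])

def f16_from_bits(bits):
    """Decode a 16-bit pattern as a float16 and return it as a Python float."""
    return float(np.array([bits], dtype=np.uint16).view(np.float16)[0])

largest = float(np.finfo(np.float16).max)
print(largest, hex(f16_to_bits(65504)))  # true float16 maximum and its bits
print(hex(f16_to_bits(57312)))           # bits of the value MXNet returned
```

The patterns differ only in one mantissa bit (0x7BFF versus 0x7AFF), which would fit a mistyped constant, though the thread does not confirm this.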
