
[Potential NAN bug] Loss may become NAN during training #383

Open
Justobe opened this issue Jul 27, 2020 · 1 comment

Comments


Justobe commented Jul 27, 2020

Hello~

Thank you very much for sharing the code!

I tried to use my own dataset (with the same shape as MNIST) with the code. After some iterations, the training loss becomes NaN. After carefully checking the code, I found that the following line may produce NaN in the loss:

In TensorFlow-Examples/examples/2_BasicModels/logistic_regression.py:

cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))

If pred contains an exact 0 (softmax can underflow to 0), tf.log(pred) evaluates to -inf, since log(0) is undefined. Wherever the corresponding entry of y is 0, the product y*tf.log(pred) is then 0 * -inf = NaN, which propagates through the sum and turns the loss into NaN.
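For concreteness, here is a minimal NumPy sketch (not the repo's code) of how an exact zero in pred turns this loss into NaN:

import numpy as np

y    = np.array([[1.0, 0.0]])   # one-hot label for class 0
pred = np.array([[1.0, 0.0]])   # softmax output with an exact zero

log_pred = np.log(pred)         # log(0) -> -inf (NumPy emits a divide warning)
print(y * log_pred)             # [[ 0. nan]]  because 0 * -inf = nan
cost = np.mean(-np.sum(y * log_pred, axis=1))
print(cost)                     # nan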

It could be fixed with either of the following changes:

cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred + 1e-10), reduction_indices=1))

or

cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred,1e-10,1.0)), reduction_indices=1))
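A third option, at the cost of restructuring the example slightly, is to let TensorFlow compute the cross-entropy directly from the pre-softmax activations, which sidesteps the explicit log entirely. A sketch against the TF 1.x API, assuming logits names the activations that logistic_regression.py feeds into tf.nn.softmax:

logits = tf.matmul(x, W) + b    # pre-softmax activations
pred = tf.nn.softmax(logits)    # kept only for prediction/accuracy
# The fused op computes log-softmax in a numerically stable way,
# so it never materializes log(0).
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))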

Hope to hear from you ~

Thanks in advance! : )


Justobe commented Jan 13, 2021

@aymericdamien
