
not-edge loss value is too small, edge loss is nan/inf #14

Open
zhengtianyu1996 opened this issue Jan 7, 2019 · 4 comments

@zhengtianyu1996

Hi, I tried your affinity loss (the non-adaptive version) as my loss function. My network is DeepLabV3+ with a MobileNet backbone, trained on my own dataset, and I set margin=3.0, lambda1=1.0, lambda2=1.0.
But something is wrong with the loss: the not-edge loss is really small and does not converge.

Here is part of the not-edge loss log during training:

Mean Aff Loss is:[6.15826357e-05]
Mean Aff Loss is:[7.15486458e-05]
Mean Aff Loss is:[4.56848611e-05]
Mean Aff Loss is:[5.51421945e-05]
Mean Aff Loss is:[7.94407606e-05]
Mean Aff Loss is:[0.000143873782]
Mean Aff Loss is:[6.04316447e-05]
Mean Aff Loss is:[9.94381699e-05]
Mean Aff Loss is:[0.000107184518]
Mean Aff Loss is:[6.87552383e-05]
Mean Aff Loss is:[7.98113e-05]
Mean Aff Loss is:[0.000122067388]
Mean Aff Loss is:[5.42108719e-05]

As for the edge loss, it reports NaN or Inf right at the beginning of training. It troubles me so much :(

Could anyone give some advice?
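For reference, here is a minimal sketch of how I understand the two terms, with margin used as a hinge for edge pairs. The function, tensor names, and distance are my own placeholders (in the actual repo the distance would be a KL-style divergence between neighbouring pixels' predictions), not the repo's code:

```python
import tensorflow as tf

def affinity_terms_sketch(dist, edge_mask, margin=3.0):
    """Hedged sketch of the two affinity-loss terms.

    dist:      distance between a pixel's prediction and a neighbour's.
    edge_mask: boolean map, True where the pixel pair crosses a label edge.
    """
    zeros = tf.zeros_like(dist)
    # Not-edge pairs (same label): pull neighbouring predictions together.
    not_edge_loss = tf.where(edge_mask, zeros, dist)
    # Edge pairs (different labels): hinge loss, push apart up to the margin.
    edge_loss = tf.where(edge_mask, tf.maximum(0.0, margin - dist), zeros)
    return not_edge_loss, edge_loss
```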

@zhengtianyu1996
Author

Okay, here is an update on the NaN/Inf problem:

In the losses.affinity_loss function, edge looks fine and not_ignore looks fine. But after it runs tf.logical_and:

edge = tf.logical_and(edge, not_ignore)

the output edge is an all-zero matrix, meaning no valid values are left, so the final edge_loss runs into problems.

I will keep debugging; I hope this helps someone.
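A quick way to see this is to count how many valid edge pixels survive the AND (a debugging sketch in TF 1.x graph mode; the constants stand in for the real edge and not_ignore maps):

```python
import tensorflow as tf

# Stand-ins for the boolean maps inside losses.affinity_loss.
edge = tf.constant([[True, False], [False, False]])
not_ignore = tf.constant([[False, True], [True, True]])

edge = tf.logical_and(edge, not_ignore)            # all False in this example
num_edges = tf.reduce_sum(tf.cast(edge, tf.int32))

with tf.Session() as sess:
    # Prints 0: the downstream gather/reduction over edge pixels is empty,
    # which is where the NaN/Inf comes from.
    print(sess.run(num_edges))
```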

@zhengtianyu1996
Author

I think the problem is caused by:
edge_indices = tf.where(tf.reshape(edge, [-1]))
Because edge is sometimes an all-zero matrix, edge_indices sometimes has shape (0, 1). Then
edge_loss = tf.gather(edge_loss, edge_indices)
gathers an empty tensor, and the subsequent reduction over it produces the Inf values.

So edge and not_ignore should be checked carefully. However, I still don't know whether this is a common problem; maybe it is related to the dataset itself. What do you think? @twke18
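Until the masks are fixed, one possible guard (just a sketch, not the repo's code) is to average over the edge pixels only when there are any, via tf.cond:

```python
import tensorflow as tf

# Stand-ins for the tensors in losses.affinity_loss (TF 1.x style).
edge_loss = tf.random_uniform([16])
edge = tf.zeros([16], dtype=tf.bool)   # worst case: no edge pixels at all

edge_indices = tf.where(tf.reshape(edge, [-1]))   # shape (0, 1) here
num_edges = tf.shape(edge_indices)[0]

# Only reduce over the gathered values when the index list is non-empty;
# otherwise skip the term instead of averaging an empty tensor.
safe_edge_loss = tf.cond(
    num_edges > 0,
    lambda: tf.reduce_mean(tf.gather(edge_loss, edge_indices)),
    lambda: tf.constant(0.0))
```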

@arc144

arc144 commented Jan 17, 2019

> Okay, here is an update on the NaN/Inf problem: in the losses.affinity_loss function, edge and not_ignore both look fine, but after edge = tf.logical_and(edge, not_ignore) the output edge is an all-zero matrix, so the final edge_loss runs into problems.

I was looking at how ignores_from_label and edges_from_label compute the edge map, and it seems they compute it differently: ignores_from_label iterates backwards, i.e. for st_y in range(2*size, -1, -size):, whereas edges_from_label iterates forward, i.e. for st_y in range(0, 2*size+1, size):.

Is this intentional? Could it be the source of the zero-matrix issue when edge = tf.logical_and(edge, not_ignore) is computed?
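For what it's worth, the two loops do visit the same offsets, just in opposite order (quick check with size=1):

```python
size = 1
print(list(range(2 * size, -1, -size)))    # [2, 1, 0]  -> ignores_from_label
print(list(range(0, 2 * size + 1, size)))  # [0, 1, 2]  -> edges_from_label
```

So the neighbourhood coverage is the same, but if each function stacks its per-neighbour outputs in loop order, the two maps could pair different neighbours at the same index, which would make the AND come out mostly false.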

@xychenunc

Have you guys obtained improved results using the affinity field loss? I have tried many times, but I can hardly improve on my baseline. I also hit the same issue as you; I simply leave the NaN term out when computing the total loss.
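Concretely, what I do looks roughly like this (a sketch with placeholder names and values, TF 1.x style):

```python
import tensorflow as tf

def finite_or_zero(term):
    # Replace NaN/Inf entries with 0 so they cannot poison the total loss.
    return tf.where(tf.is_finite(term), term, tf.zeros_like(term))

# Placeholder values: edge_loss hits Inf when no edge pixels survive.
seg_loss = tf.constant(0.7)                          # usual cross-entropy term
edge_loss = tf.constant([0.3, float('inf'), 0.5])    # per-pixel edge term
not_edge_loss = tf.constant([6.2e-05, 4.5e-05, 7.9e-05])
lambda1 = lambda2 = 1.0

total_loss = (seg_loss
              + lambda1 * tf.reduce_mean(finite_or_zero(edge_loss))
              + lambda2 * tf.reduce_mean(finite_or_zero(not_edge_loss)))
```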
