YonghaoHe edited this page Mar 15, 2021 · 9 revisions

LFD

This page introduces the main features of LFD. Since LFD is an updated version of LFFD, we highly recommend reading the LFFD paper first; the rest of this page assumes you are familiar with its core ideas.

What is behind the Receptive Field (RF) and Effective Receptive Field (ERF)

RF and ERF are important concepts in CNNs. They are controlled by the depth and width of the network. We do not discuss the details of RF and ERF here; plenty of material on them is easy to find online.
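As a quick refresher, the theoretical RF of a stack of convolutions can be computed with the standard recursion. The sketch below is generic textbook arithmetic, not LFD's code:

```python
# Standard recursion for the theoretical receptive field of stacked convs:
#   rf_l = rf_{l-1} + (kernel - 1) * jump_{l-1},  jump_l = jump_{l-1} * stride
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, input-to-output order."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump  # jump equals the overall RF stride

# Three 3x3 convs with strides 2, 1, 2 give RF size 11 and RF stride 4.
rf, rf_stride = receptive_field([(3, 2), (3, 1), (3, 2)])
```

Deeper stacks enlarge the RF size, while the product of strides fixes the RF stride, which is exactly what the two design rules below manipulate.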

We want to deliver the following main idea:

  • The main responsibility of backbones is to provide multiple feature maps with different RF sizes and RF strides
  • Backbones should be carefully designed to balance the trade-off between accuracy and speed

If you want to design your own network, you can remember the following two simple rules:

  1. higher accuracy ---- let the first feature map feeding a head have an RF stride of 4. This results in better feature representations, especially for small objects.
  2. faster speed ---- let the first feature map feeding a head have an RF stride of 8. This rapidly decreases the feature map resolution and thus greatly reduces the amount of computation.
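The speed difference between the two rules is easy to quantify: the first head processes one location per stride-sized cell, so doubling the stride quarters the work. A small sketch with a hypothetical 640x640 input:

```python
def head_locations(image_h, image_w, rf_stride):
    """Number of spatial locations the first head must process."""
    return (image_h // rf_stride) * (image_w // rf_stride)

# Hypothetical 640x640 input (illustrative size, not an LFD default):
loc4 = head_locations(640, 640, 4)  # stride 4: 160x160 grid, better for small objects
loc8 = head_locations(640, 640, 8)  # stride 8: 80x80 grid, 4x fewer locations
```

The stride-8 choice gives the first head 4x fewer locations to evaluate, which is where the speedup comes from.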

FCN-style vs FPN-style

We briefly discuss the overall network structures for object detection. In our view, they can be divided into two categories: FCN-style and FPN-style. The following figure shows typical structures of the two kinds:

[Figure: typical FCN-style and FPN-style structures]

FCN-style: heads directly take feature maps from the backbone as inputs, as in SSD and LFFD.

FPN-style: feature maps from the backbone are first fused, and heads take the fused feature maps as inputs, as in RetinaNet and FCOS.
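The contrast can be made concrete at the level of which maps reach the heads. The sketch below is schematic only; the C/P names follow common FPN notation, not LFD's code:

```python
# Schematic contrast: which feature maps the heads consume in each style.
backbone_maps = ["C2", "C3", "C4"]  # backbone outputs at increasing stride

# FCN-style (SSD, LFFD): heads take backbone maps directly.
fcn_head_inputs = backbone_maps

# FPN-style (RetinaNet, FCOS): a top-down pass first fuses each map with the
# upsampled deeper map; heads then take the fused maps P2..P4.
fpn_head_inputs = ["P" + name[1:] for name in backbone_maps]
```

The extra fusion maps are what cost FPN-style its additional memory and time.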

Which style to choose?

FCN-style

  1. Use FCN-style when the mean ratio of the longer side to the shorter side of objects is small, say less than 3. This means the objects are nearly 'square'. For WIDERFACE and TT100K, we use FCN-style structures.
  2. FCN-style is memory-friendly and runs faster.

FPN-style

  1. Use FPN-style when the mean ratio of the longer side to the shorter side of objects is relatively large, say greater than 3. This means the objects may be 'narrow'.
  2. FPN-style may yield better feature representations. However, it costs more memory and runs slower than FCN-style.

Generally speaking, try FCN-style first, so that you can enjoy the advantages of LFD, especially on edge devices. In real-world applications, designing an appropriate network involves more considerations and is largely empirical. We will add more real cases to help you master the essentials.

Head Sharing

Head sharing means that all heads share the same set of weights. It is used in RetinaNet and FCOS and seems to have become a standard choice; however, the authors did not give convincing explanations for it.

In the previous work LFFD, we did not share weights among heads. In LFD, we strongly recommend using head sharing in all structures, for the reasons below:

  • why is head sharing possible?

Take a look at the feature maps that serve as inputs to the heads. These feature maps are semantically 'isomorphic' (a term we coined ourselves): although they sit at different semantic levels, their inner semantic structures are analogous. Put another way, objects belonging to the same category share similar structures regardless of scale. We may give more detailed explanations in the future.

  • what are the benefits of head sharing?
  1. It solves the sample imbalance across heads. In LFFD, we had to design proper samplers to feed enough objects of varying scales into each head; otherwise its weights were not well updated. With head sharing, this problem completely disappears: no matter which head an object is assigned to, the shared weights are effectively updated. In our view, this is the most important benefit. In our practice, models with head sharing perform better than those without.
  2. fewer weights (everyone knows this), although the amount of computation remains the same
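The parameter saving in point 2 is simple arithmetic. A sketch with illustrative channel counts and head structure (not LFD's actual configuration):

```python
def conv_params(c_in, c_out, k=3):
    """Parameter count of a single conv layer: weights plus biases."""
    return c_in * c_out * k * k + c_out

# Hypothetical head: one 3x3 conv, then classification (2 ch) and
# regression (4 ch) branches; 5 heads over 5 feature maps.
num_heads = 5
params_per_head = conv_params(64, 64) + conv_params(64, 2) + conv_params(64, 4)

unshared = num_heads * params_per_head  # each head has its own weights
shared = params_per_head                # one weight set reused by all heads
```

With sharing, the weight count drops by a factor of `num_heads`, but every head still runs its convolutions on its own feature map, so the computation is unchanged.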

BBoxes Assignment across Heads

For one-stage anchor-free detectors, how bboxes (objects) are assigned to each head matters. In FCOS, the maximum distance between a point (RF center) and the four sides of the bbox determines the target head. In LFD, we provide two simple choices: the longer side or the shorter side of the bbox.

  • longer side

In the longer-side strategy, each head clearly knows the upper bound of the bboxes assigned to it. When you choose an FCN-style structure, we recommend adopting this strategy.

  • shorter side

In the shorter-side strategy, each head only knows the lower bound of the bboxes assigned to it; the upper bound is uncertain. The head therefore needs higher-level features to enhance its representation, so this strategy is usually combined with FPN-style structures.
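Both strategies reduce to matching one side length against per-head size ranges. In the sketch below, the ranges and function name are hypothetical, not LFD's actual settings:

```python
# Hypothetical per-head size ranges in pixels (lower inclusive, upper exclusive).
HEAD_RANGES = [(0, 32), (32, 64), (64, 128), (128, 256), (256, 512)]

def assign_head(w, h, strategy="longer"):
    """Return the index of the head a (w, h) bbox is assigned to."""
    side = max(w, h) if strategy == "longer" else min(w, h)
    for i, (lo, hi) in enumerate(HEAD_RANGES):
        if lo <= side < hi:
            return i
    return None  # outside all ranges; such boxes are typically ignored

# A narrow 20x100 box: 'longer' keys on 100, 'shorter' keys on 20.
```

Note how the two rules diverge for narrow objects: the same 20x100 box goes to a mid-level head under the longer-side rule but to the finest head under the shorter-side rule, which is why the choice interacts with the FCN/FPN decision above.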

Which Losses to Choose

For the regression loss, we tried MSE, Smooth L1, and IoU loss. We found that IoU loss performs better, so you can use it as your default regression loss.
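For reference, a minimal pure-Python IoU loss on axis-aligned `(x1, y1, x2, y2)` boxes; the `-ln(IoU)` form below is one common formulation, used here as an illustration rather than LFD's exact implementation:

```python
import math

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, target):
    """-ln(IoU); the epsilon guards against log(0) for disjoint boxes."""
    return -math.log(max(iou(pred, target), 1e-8))
```

Unlike MSE or Smooth L1 on individual coordinates, this loss directly optimizes the overlap metric used at evaluation time, which is the usual explanation for its better performance.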

For the classification loss, we tried cross entropy (CE) and Focal Loss. On WIDERFACE, Focal Loss is slightly better than CE, but on TT100K, CE is better. If you have enough machines, try both and pick the better one.
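The two losses differ only by a modulating factor on well-classified samples. A sketch for the binary case on a positive sample; the gamma and alpha values are the common Focal Loss defaults, not necessarily LFD's settings:

```python
import math

def bce(p):
    """Binary cross entropy for a positive sample predicted with prob p."""
    return -math.log(p)

def focal(p, gamma=2.0, alpha=0.25):
    """Focal loss: (1 - p)^gamma down-weights easy, well-classified samples."""
    return -alpha * (1.0 - p) ** gamma * math.log(p)

# For a well-classified positive (p = 0.9), focal loss is far smaller than
# BCE, so training focuses on hard examples instead.
```

Whether that re-weighting helps depends on how imbalanced the dataset is, which is consistent with the mixed results on WIDERFACE versus TT100K above.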