Guest blog, compiled by Flin
Source: Analytics Vidhya
Introduction
Object detection is one of the most widely studied topics in the computer vision community. It has made its way into a wide range of industries, with use cases ranging from image security, surveillance, and automated vehicle systems to machine inspection.
At present, deep-learning-based object detectors can be roughly divided into two categories:

Two-stage detectors, such as region-based CNNs (R-CNN) and its successors.

One-stage detectors, such as the YOLO family of detectors and SSD.
One-stage detectors, which are applied over a regular, dense sampling of anchor boxes (possible object locations), can be faster and simpler, but their accuracy has lagged behind that of two-stage detectors, largely because of the extreme class imbalance encountered during training.
FAIR (Facebook AI Research) published a paper in 2018 in which they introduced the concept of focal loss and used a one-stage detector they called RetinaNet to deal with this imbalance.
Before we get to the essence of focal loss, let's first understand what this imbalance problem is and what problems it may cause.
Table of contents

Why focal loss?

What is focal loss?

Cross-entropy loss
 The problem with cross-entropy
 Example

Balanced cross-entropy loss
 The problem with balanced cross-entropy
 Example

Focal loss explained
 Example

Cross-entropy loss vs focal loss
 Easy correctly classified records
 Misclassified records
 Very easily classified records

Final thoughts
Why focal loss?
Classical one-stage detection methods, such as boosted detectors and DPMs, and more recent methods such as SSD, evaluate roughly 10^4 to 10^5 candidate locations per image, but only a few of those locations contain objects (i.e. foreground) while the rest are simply background. This leads to class imbalance.
This imbalance causes two problems:

Training is inefficient, because most locations are easy negatives (meaning the detector can easily classify them as background), which contribute no useful learning signal.

The easy negatives (detected with high probability) make up a large part of the input. Although their individually computed losses and gradients are small, in aggregate they can overwhelm the overall loss and gradient and lead to degenerate models.
What is focal loss?
In short, focal loss (FL) is an improved version of cross-entropy loss (CE). It deals with class imbalance by assigning more weight to hard or easily misclassified examples (i.e. background with noisy texture, partial objects, or the objects we are interested in) and by down-weighting easy examples (i.e. background objects).
As a result, focal loss reduces the loss contribution from easy examples and increases the importance of correcting misclassified examples.
So let's first understand the cross-entropy loss for binary classification.
Cross-entropy loss
The idea behind cross-entropy loss is to penalize wrong predictions rather than reward correct ones.
The cross-entropy loss for binary classification is:

CE(p, y) = -ln(p), when y = 1; -ln(1 - p), when y = 0

where:

y ∈ {0, 1} is the ground-truth label (y_act), and
p ∈ [0, 1] is the model's estimated probability for the class y = 1 (y_pred).

For notational convenience, we write y_act = y and y_pred = p, and define

p_t = p, when y = 1; 1 - p, when y = 0

so that the loss above can be rewritten as:

CE(p, y) = CE(p_t) = -ln(p_t)
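As a quick sanity check, the definition above can be sketched in a few lines of Python (a minimal illustration, not code from the original post; the function name is our own):

```python
import math

def binary_cross_entropy(p: float, y: int) -> float:
    """CE(p, y) = -ln(p_t), where p_t = p if y == 1, else 1 - p."""
    p_t = p if y == 1 else 1 - p
    return -math.log(p_t)

# A confident correct prediction incurs a small loss,
# a confident wrong prediction a large one:
print(binary_cross_entropy(0.95, 1))  # small, ~0.0513
print(binary_cross_entropy(0.05, 1))  # large, ~2.9957
```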
The problem with cross-entropy
As the blue curve in the figure below shows, even easily classified examples (those with p_t > 0.5, i.e. p very close to 1 when y = 1 or very close to 0 when y = 0) can incur a loss of non-trivial magnitude.
Let’s use the following example to understand it.
Example

Suppose the foreground (call it class 1) is correctly classified with p = 0.95:

CE(FG) = -ln(0.95) ≈ 0.05

and the background (call it class 0) is correctly classified with p = 0.05:

CE(BG) = -ln(1 - 0.05) ≈ 0.05
The problem is that with an imbalanced dataset, when these small losses are summed over the whole image, they can overwhelm the overall (total) loss and lead to degenerate models.
Balanced cross-entropy loss

A common way to address class imbalance is to introduce a weighting factor α ∈ [0, 1] for class 1 and 1 - α for class 0.

For notational convenience, we can define α_t analogously to p_t (α_t = α when y = 1, and 1 - α when y = 0) and write the loss function as:

CE(p_t) = -α_t * ln(p_t)
As you can see, this is just an extension of cross entropy.
The problem with balanced cross-entropy
As the authors' experiments show, the large class imbalance encountered during the training of dense detectors overwhelms the cross-entropy loss.
Easily classified negatives make up the majority of the loss and dominate the gradient. While α balances the importance of positive and negative examples, it does not differentiate between easy and hard examples.
Let's understand this through an example.

Example

Suppose the foreground (class 1) is correctly classified with p = 0.95:

CE(FG) = -0.25 * ln(0.95) ≈ 0.0128

and the background (class 0) is correctly classified with p = 0.05:

CE(BG) = -(1 - 0.25) * ln(1 - 0.05) ≈ 0.038

Although the positive and negative classes are now weighted correctly, easy and hard examples still cannot be distinguished.
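The α-balanced version can be sketched by extending the plain cross-entropy function (an illustrative snippet; the function name is ours, α = 0.25 as in the example):

```python
import math

def balanced_cross_entropy(p: float, y: int, alpha: float = 0.25) -> float:
    """Alpha-weighted CE: -alpha_t * ln(p_t)."""
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * math.log(p_t)

# Reproduces the example: both records are easy and correctly classified,
# yet the loss value says nothing about how easy they were.
print(round(balanced_cross_entropy(0.95, 1), 4))  # 0.0128 (foreground)
print(round(balanced_cross_entropy(0.05, 0), 4))  # 0.0385 (background)
```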
This is where focal loss (an extension of cross-entropy) comes into play.
Focal loss explained

Focal loss is simply an extension of the cross-entropy loss function that down-weights easy examples and focuses training on hard negatives.

To achieve this, the researchers propose adding a modulating factor (1 - p_t)^γ to the cross-entropy loss, with a tunable focusing parameter γ ≥ 0.

In RetinaNet, the α-balanced variant of focal loss is used, with α = 0.25 and γ = 2 giving the best results.

Therefore, focal loss can be defined as:

FL(p_t) = -α_t * (1 - p_t)^γ * ln(p_t)
The focal loss is plotted for several values of γ ∈ [0, 5] in Figure 1.
We note the following properties of focal loss:
 When an example is misclassified and p_t is small, the modulating factor is near 1 and the loss is unaffected.
 As p_t → 1, the factor goes to 0 and the loss for well-classified examples is down-weighted.
 The focusing parameter γ smoothly adjusts the rate at which easy examples are down-weighted.
The effect of the modulating factor increases as γ increases (after extensive experiments, the researchers found that γ = 2 works best).
Note: when γ = 0, FL is equivalent to CE (see the blue curve in the figure).
Intuitively, the modulating factor reduces the loss contribution from easy examples and extends the range of probabilities in which an example receives low loss.
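Putting the pieces together, the α-balanced focal loss can be sketched as follows (a minimal illustration with our own function name; α = 0.25 and γ = 2 as the paper recommends):

```python
import math

def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * ln(p_t)."""
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

# gamma = 0 recovers the alpha-balanced cross-entropy; larger gamma
# shrinks the loss of this easy example (p_t = 0.95) more and more:
for gamma in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(gamma, focal_loss(0.95, 1, gamma=gamma))
```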
Let's understand these properties of focal loss through an example.
Example

 When a record (foreground or background) is correctly classified:

Suppose the foreground is correctly classified with predicted probability p = 0.99 and the background is correctly classified with predicted probability p = 0.01. Then:

p_t = 0.99 when y_act = 1, and p_t = 1 - 0.01 = 0.99 when y_act = 0

Modulating factor (FG) = (1 - 0.99)^2 = 0.0001
Modulating factor (BG) = (1 - (1 - 0.01))^2 = 0.0001

As you can see, the modulating factor is close to 0, so these losses are down-weighted.

 When a record is misclassified:

Suppose the foreground is misclassified with predicted probability p = 0.01 and the background is misclassified with predicted probability p = 0.99. Then:

p_t = 0.01 when y_act = 1, and p_t = 1 - 0.99 = 0.01 when y_act = 0

Modulating factor (FG) = (1 - 0.01)^2 = 0.9801
Modulating factor (BG) = (1 - (1 - 0.99))^2 = 0.9801

As you can see, the modulating factor is close to 1, so the loss is unaffected.
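The two cases above can be checked directly (an illustrative snippet; the helper name is ours):

```python
def modulating_factor(p: float, y: int, gamma: float = 2.0) -> float:
    """(1 - p_t)**gamma, the term focal loss multiplies into cross-entropy."""
    p_t = p if y == 1 else 1 - p
    return (1 - p_t) ** gamma

# Correctly classified (p_t = 0.99): factor near 0, loss down-weighted.
print(round(modulating_factor(0.99, 1), 4))  # 0.0001 (foreground)
print(round(modulating_factor(0.01, 0), 4))  # 0.0001 (background)

# Misclassified (p_t = 0.01): factor near 1, loss unaffected.
print(round(modulating_factor(0.01, 1), 4))  # 0.9801 (foreground)
print(round(modulating_factor(0.99, 0), 4))  # 0.9801 (background)
```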
Now let's compare cross-entropy and focal loss on a few examples and look at the impact of focal loss during training.
Cross-entropy loss vs focal loss
Let's compare them by considering the following scenarios.
Easy correctly classified records

Suppose the foreground is correctly classified with predicted probability p = 0.95 and the background is correctly classified with predicted probability p = 0.05.

p_t = 0.95 when y_act = 1, and p_t = 1 - 0.05 = 0.95 when y_act = 0

CE(FG) = -ln(0.95) = 0.0512932943875505
CE(BG) = -ln(1 - 0.05) = 0.0512932943875505

Now consider α = 0.25 and γ = 2:

FL(FG) = -0.25 * (1 - 0.95)^2 * ln(0.95) = 3.2058 × 10^-5
FL(BG) = -0.75 * (1 - (1 - 0.05))^2 * ln(1 - 0.05) = 9.61 × 10^-5
Misclassified records

Suppose the foreground is misclassified with predicted probability p = 0.05 and the background is misclassified with predicted probability p = 0.95.

p_t = 0.05 when y_act = 1, and p_t = 1 - 0.95 = 0.05 when y_act = 0

CE(FG) = -ln(0.05) = 2.995732273553991
CE(BG) = -ln(1 - 0.95) = 2.995732273553991

Again with α = 0.25 and γ = 2:

FL(FG) = -0.25 * (1 - 0.05)^2 * ln(0.05) = 0.675912094220619
FL(BG) = -0.75 * (1 - (1 - 0.95))^2 * ln(1 - 0.95) = 2.027736282661858
Very easily classified records

Suppose the foreground is correctly classified with predicted probability p = 0.99 and the background is correctly classified with predicted probability p = 0.01.

p_t = 0.99 when y_act = 1, and p_t = 1 - 0.01 = 0.99 when y_act = 0

CE(FG) = -ln(0.99) = 0.0100503358535014
CE(BG) = -ln(1 - 0.01) = 0.0100503358535014

Again with α = 0.25 and γ = 2:

FL(FG) = -0.25 * (1 - 0.99)^2 * ln(0.99) = 2.51 × 10^-7
FL(BG) = -0.75 * (1 - (1 - 0.01))^2 * ln(1 - 0.01) = 7.54 × 10^-7
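The three scenarios can be reproduced with one short script (an illustrative sketch, not code from the original post; α = 0.25 and γ = 2):

```python
import math

def ce(p, y):
    """Plain binary cross-entropy, -ln(p_t)."""
    p_t = p if y == 1 else 1 - p
    return -math.log(p_t)

def fl(p, y, alpha=0.25, gamma=2.0):
    """Alpha-balanced focal loss, -alpha_t * (1 - p_t)**gamma * ln(p_t)."""
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

# (scenario name, foreground probability); the background case behaves
# symmetrically, so the foreground loss is enough to get the CE/FL ratio.
scenarios = [("easy, correct", 0.95), ("misclassified", 0.05), ("very easy", 0.99)]
for name, p in scenarios:
    ratio = ce(p, 1) / fl(p, 1)
    print(f"{name}: CE={ce(p, 1):.5f}, FL={fl(p, 1):.2e}, CE/FL={ratio:.1f}x")
```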
Final thoughts
Scenario 1: 0.05129 / (3.2058 × 10^-5): the focal loss is about 1600 times smaller than the cross-entropy loss.
Scenario 2: 2.99573 / 0.67591: about 4.4 times smaller.
Scenario 3: 0.01005 / (2.51 × 10^-7): about 40,000 times smaller.
These three scenarios clearly show how focal loss down-weights well-classified records while assigning greater weight to misclassified or hard records.
After extensive tests and experiments, the researchers found that α = 0.25 and γ = 2 work best.
End notes
We have walked through the evolution from cross-entropy loss to focal loss in object detection, and I have tried to explain how focal loss is used in object detection.
Thank you for reading!
Link to the original article: https://www.analyticsvidhya.com/blog/2020/08/a-beginners-guide-to-focal-loss-in-object-detection/