The loss function is one of the vital parts of a deep learning model. It is what lets the model learn from its mistakes: a loss function measures how well the model predicts the expected output label. But a loss function alone cannot make the model learn from its mistakes (i.e. the difference between the actual output and the predicted output). It needs the help of an optimization function to correct the mechanism that led to the incorrect prediction.
But there are many types of loss functions. You might be wondering why we need so many of them, and this post is here to clear that up.
We can basically divide loss functions into 2 parts:
1. Regression Loss Functions.
2. Classification Loss Functions.
In this post we will discuss classification loss functions. So let's embark on this journey of understanding loss functions for deep learning models.
1. Log Loss
Log loss is also known as the cross entropy loss function. It comes into play when you train a binary classifier. With today's easy access to libraries and frameworks, it is easy to overlook how this loss function actually works.
Now the question is: your training labels are 0 and 1, but your training predictions are 0.4, 0.6, 0.89, 0.1122, etc. So how do we calculate a measure of the error of our model? If we directly classify every observation with a value > 0.5 as 1, we run a high risk of increasing misclassification, because many observations with predicted probabilities such as 0.4, 0.45 or 0.49 may well have a true value of 1. Log loss avoids the hard threshold: it scores the predicted probability itself, so a confident wrong prediction is penalized much more heavily than a borderline one.
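To make this concrete, here is a minimal sketch of binary log loss in plain Python. The function name `binary_log_loss`, the clipping constant `eps`, and the sample numbers are illustrative assumptions, not something taken from a particular library.

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross entropy.

    y_true: labels, each 0 or 1.
    y_prob: predicted probabilities that the label is 1.
    eps:    assumed clipping constant that keeps log() away from 0.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Two observations whose true label is 1. A 0.5 threshold calls both
# of them "wrong", but log loss penalizes the confident miss far more.
borderline = binary_log_loss([1], [0.49])
confident_miss = binary_log_loss([1], [0.10])
print(borderline, confident_miss)
```

The borderline prediction (0.49) gets a much smaller loss than the confident miss (0.10), which is exactly the information a hard 0.5 threshold throws away.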
2. Focal Loss
Here the loss function is reshaped to down-weight easy examples and thus focus training on hard negatives. A modulating factor (1 - pt)^γ is added to the cross entropy loss, where pt is the predicted probability of the true class and γ is tested over the range [0, 5] in the experiments.
Focal loss (FL) has two notable properties:
- When an example is misclassified and pt is small, the modulating factor is near 1 and the loss is unaffected. As pt → 1, the factor goes to 0 and the loss for well-classified examples is down-weighted.
- The focusing parameter γ smoothly adjusts the rate at which easy examples are down-weighted. When γ = 0, FL is equivalent to cross entropy (CE). As γ is increased, the effect of the modulating factor is likewise increased. (γ = 2 works best in the experiments.)
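The two properties above can be checked with a small sketch. It implements only the modulating factor (1 - pt)^γ times the cross entropy term for binary labels; any extra class-balancing weight is deliberately left out, and the names `focal_loss` and `eps` are assumptions made for illustration.

```python
import math

def focal_loss(y_true, y_prob, gamma=2.0, eps=1e-12):
    """Mean binary focal loss: -(1 - pt)^gamma * log(pt).

    y_true: labels, each 0 or 1.
    y_prob: predicted probabilities that the label is 1.
    pt is the probability the model assigns to the true class.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        pt = p if y == 1 else 1.0 - p     # probability of the true class
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(y_true)

# An easy example (pt = 0.9) and a hard one (pt = 0.1), both with label 1.
for gamma in (0.0, 1.0, 2.0, 5.0):
    easy = focal_loss([1], [0.9], gamma=gamma)
    hard = focal_loss([1], [0.1], gamma=gamma)
    print(gamma, easy, hard)
```

With γ = 0 the values are plain cross entropy. As γ grows, the loss of the easy example shrinks much faster than that of the hard one.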
3. Relative Entropy
The relative entropy, also known as the Kullback-Leibler divergence, between two probability distributions on a random variable is a measure of how different they are. It is not a true distance, because D(p||q) and D(q||p) generally differ. Formally, given two probability distributions p(x) and q(x) over a discrete random variable X, the relative entropy D(p||q) is defined as follows:
D(p||q) = Σ_x p(x) log( p(x) / q(x) )
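As a minimal sketch, the standard definition of relative entropy for discrete distributions, the sum over x of p(x) log(p(x) / q(x)), can be computed as below. The function name `kl_divergence`, the use of the natural logarithm, and the example distributions are assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """D(p||q) for two discrete distributions given as equal-length lists.

    Uses the natural log, so the result is in nats.
    Terms with p(x) = 0 contribute 0 by convention; if q(x) = 0 where
    p(x) > 0, the divergence is infinite.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # D(p||q)
print(kl_divergence(q, p))  # D(q||p), generally a different value
print(kl_divergence(p, p))  # identical distributions give 0
```

Swapping the arguments changes the result, which is why relative entropy is usually described as a divergence rather than a distance.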
4. Hinge Loss
This loss function is notably used for SVMs (Support Vector Machines).
An SVM uses hinge loss, whereas logistic regression uses logistic loss, to optimize the cost function and arrive at the weights. How hinge loss differs from logistic loss can be understood from the plot below.
Note that the yellow line (logistic loss) curves downwards gradually, unlike the purple line (hinge loss), where the loss becomes exactly 0 for values of 'predicted y' ≥ 1. The shape of these curves brings out a few major differences between logistic loss and hinge loss.
We can see that logistic loss diverges faster than hinge loss, so in general it is also more sensitive to outliers.
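The comparison can also be reproduced numerically. The sketch below assumes labels y in {-1, +1} and a raw model score (the 'predicted y' above), and evaluates both losses on the margin y * score. The function names are hypothetical.

```python
import math

def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y * score), with y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

def logistic_loss(y, score):
    """Logistic loss log(1 + exp(-y * score)), with y in {-1, +1}.

    Written in a numerically stable form for large |y * score|.
    """
    z = y * score
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Sweep the score for a positive example (y = +1).
for score in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0):
    print(score, hinge_loss(1, score), logistic_loss(1, score))
```

For scores of 1 or more the hinge loss is exactly 0, while the logistic loss keeps shrinking towards 0 without ever reaching it.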
In the next part we will look at different types of regression loss functions. You can also check out our previous post on loss and optimization functions: Loss Functions and Optimization functions
You can also check out this amazing post on loss functions: