To understand the concept of regularization in machine learning, we first need to understand underfitting, overfitting, bias, and variance.
To understand these concepts, consider the diagram below:
We know that in a regression problem we try to find a best-fit line for the distributed data points.
In the diagram above, the first panel (fig1) shows a linear model fitted to data whose distribution is polynomial. As you can see, if we calculate the prediction error for this fit (fig1), it comes out high.
This is the situation of underfitting, where we get a high prediction error on the distributed data. The error is high for both the training data and the test data (unseen data).
The reason for this error is that a straight line is not the best fit for these data points. To correct the problem, we need a polynomial regression model, since the data distribution itself is polynomial.
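A minimal sketch of this underfitting situation (using made-up quadratic data rather than the data in the figure) could look like this in scikit-learn; no matter how the straight-line model is trained, its error stays high:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data with a quadratic (polynomial) trend -- purely illustrative.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=0.3, size=100)

# A plain linear model cannot bend to follow the quadratic pattern,
# so its error stays high on training data (and would on test data too).
linear_model = LinearRegression().fit(X, y)
print("Training MSE (linear model):", mean_squared_error(y, linear_model.predict(X)))
```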
In fig2, we try a polynomial regression model to get a best-fit line for the distributed data points. Suppose the polynomial regression used here has degree 2 (p = 2) and gives the best-fit curve shown in the diagram (fig2).
For this fit (fig2) we get a low prediction error, as the data mostly lies close to the best-fit curve. We could try to improve the model further by increasing the degree of the polynomial.
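Staying with the same kind of made-up quadratic data, a degree-2 fit (one possible way to express this in scikit-learn) brings the training error down compared with the straight line:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same illustrative quadratic data as before.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=0.3, size=100)

# Expanding the input to degree-2 polynomial features lets the linear model
# fit a curve, so the training error drops sharply compared with fig1.
poly2 = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("Training MSE (degree 2):", mean_squared_error(y, poly2.predict(X)))
```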
In fig3 we increase the polynomial degree to get a curve that passes almost exactly through our data, and the prediction error for this regression model becomes very small because the data points lie on the fitted curve.
But the problem with this model is that the prediction error is small for the training data and not for the test data. In other words, the model performs very well on the training data but very badly on unseen data (test or random data). This is called overfitting, and the model is called an overfitted model.
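The same effect can be sketched numerically: on hypothetical quadratic data, a very high-degree polynomial (degree 15 here, chosen only for illustration) gives a tiny training error but a much larger error on held-out test data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(60, 1)), axis=0)
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A very high-degree polynomial chases the noise in the training points:
# training error is tiny, but error on unseen test data blows up.
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(X_train, y_train)
print("Train MSE:", mean_squared_error(y_train, overfit.predict(X_train)))
print("Test MSE :", mean_squared_error(y_test, overfit.predict(X_test)))
```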
To deal with this kind of problem we have the concept of regularization in machine learning.
Bias and Variance: Bias and variance are both forms of prediction error in machine learning.
Bias: The difference between the predicted values and the actual values. A model with high bias pays very little attention to the training data and oversimplifies the problem. It always leads to high prediction error on both training and test data.
Variance: Variance measures how much the model's predictions change when it is trained on different data. A model with high variance pays too much attention to the training data and does not generalize to data it hasn't seen, such as test data or random data.
Underfitting is a case of high bias and low variance.
Overfitting is a case of low bias and high variance.
Appropriate fitting is a case of low bias and low variance, which is what a good regression model achieves.
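One hedged way to see these two components numerically is to refit the same model on many freshly sampled training sets and look at its average prediction versus the true value (bias) and the spread of its predictions across those fits (variance). The sketch below assumes a made-up ground-truth function and a single probe point; it is a simplified illustration, not a formal decomposition:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
true_f = lambda x: 0.5 * x ** 2          # assumed ground-truth function
x0 = np.array([[1.5]])                   # a single test point to probe

def predictions_at_x0(degree, n_repeats=200, n_samples=30):
    """Fit a degree-`degree` model on many freshly sampled training sets
    and record its prediction at x0 each time."""
    preds = []
    for _ in range(n_repeats):
        X = rng.uniform(-3, 3, size=(n_samples, 1))
        y = true_f(X.ravel()) + rng.normal(scale=0.3, size=n_samples)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
        preds.append(model.predict(x0)[0])
    return np.array(preds)

for degree in (1, 2, 15):
    p = predictions_at_x0(degree)
    bias = p.mean() - true_f(x0.ravel()[0])   # average prediction vs truth
    variance = p.var()                        # spread of predictions across datasets
    print(f"degree {degree:2d}: bias ~ {bias:+.3f}, variance ~ {variance:.3f}")
```

The degree-1 model shows a large bias with a small variance (underfitting), while the degree-15 model shows a small bias with a large variance (overfitting).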
Bias and Variance Trade-off:
The bias-variance trade-off can be described in terms of model complexity. For example, low-variance, high-bias algorithms are less complex.
Examples: linear regression, Naive Bayes, etc.
High-variance, low-bias algorithms are more complex and tend to be more flexible.
Examples: non-linear regression, decision trees, nearest neighbors, etc.
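To make the contrast concrete, the rough comparison below (on the same kind of synthetic quadratic data as earlier) fits a linear regression and an unconstrained decision tree, two of the algorithm families named above, and compares their training and test errors:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The linear model has similar (and high) error everywhere: high bias.
# The fully grown tree has near-zero training error but a larger test error: high variance.
for name, model in [("Linear regression (high bias)", LinearRegression()),
                    ("Decision tree (high variance)", DecisionTreeRegressor(random_state=0))]:
    model.fit(X_train, y_train)
    print(name,
          "| train MSE:", round(mean_squared_error(y_train, model.predict(X_train)), 3),
          "| test MSE:", round(mean_squared_error(y_test, model.predict(X_test)), 3))
```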
To build a good model we need a balance between bias and variance, meaning the model should neither underfit nor overfit and the total error should be at its minimum.
You can see in the diagram that as model complexity increases, the model tends toward overfitting, while with low complexity it tends toward underfitting. Bias and variance therefore change with model complexity. The optimal model is marked with a red circle.
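A rough stand-in for that diagram is to sweep the polynomial degree (one simple notion of model complexity) on synthetic data: training error keeps falling as the degree grows, while test error first falls and then rises again, and the degree where test error bottoms out plays the role of the circled optimal model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=0.3, size=80)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in range(1, 13):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}  train MSE {train_err:.3f}  test MSE {test_err:.3f}")
# Training error decreases steadily with degree, while test error reaches a
# minimum around the true complexity (degree 2 here) and then grows again.
```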
Concept of Regularization in machine learning