Sometimes the model can overfit the training data and hence underperforms with real data. To reduce complexity, we use regularization.
- Add a penalty factor on complexity i.e. number of features and values of weights.
- Lasso: Force coefficients to zero if not relevant. Ends up performing feature selection.
- Ridge: Reduces coefficients of irrelevant features but does not drop them to zero.
Links to this Evergreen Note