Different Methods of Feature Selection in Machine Learning

How to automate the process of selecting features

In data science, there are many different approaches to building features to model complicated relationships, although a large set of features can sometimes be troublesome. In this blog post, you will learn about the various strategies you can use to keep only the features that are most important to your model!

Feature selection is the process of choosing a subset of features to use when building a model. This process comes with many advantages, the most noticeable being improved performance of a machine learning algorithm.

A second advantage is lower computational complexity. The fewer features a model has, the simpler it is to compute the model's parameters, and the less storage is needed to hold the data for those features.

A third advantage is better comprehension of your results: in the course of feature selection, you gain more insight into how features relate to each other.

Unfortunately, there is no simple, one-size-fits-all solution for choosing which features to include in a model. However, there are numerous techniques that you can use effectively. There are four broad approaches to feature selection: domain knowledge, filter methods, wrapper methods, and embedded methods. I will go through their respective benefits and disadvantages.

Understanding the particular context of the dataset is one of the most critical factors when identifying significant features. This is called domain knowledge. Examples of using domain knowledge include reading previous research papers on related subjects, or asking key stakeholders which variables they consider most important for predicting the target variable.

Filter methods are methods of feature selection carried out as a pre-processing stage, before ever running a model. They operate by analyzing how variables are related to each other. Various metrics are used to decide which features will be dropped and which will remain, and the appropriate metric depends on the kind of model being used. Filter methods usually return a feature ranking that tells you how the features compare to each other, so that the variables found to be uninformative can be removed.

A typical filter approach in linear regression is to remove features that are strongly correlated with each other. Using a variance threshold is another filter approach. This approach sets a minimum variance that a feature must have in order to be included in a model. The logic behind this is that a feature with low variance barely changes from one observation to the next, and will thus not have much effect on the dependent variable.
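
To make the two filters concrete, here is a minimal sketch in plain Python. The function names (`variance_filter`, `correlation_filter`), the toy housing columns, and the thresholds of 0.0 and 0.95 are all assumptions chosen for illustration, not values from any particular project.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def variance_filter(columns, threshold):
    """Keep the names of features whose variance is above `threshold`."""
    return [name for name, values in columns.items()
            if pvariance(values) > threshold]


def correlation_filter(columns, names, max_abs_corr):
    """Greedily drop any feature that is too correlated with one already kept."""
    kept = []
    for name in names:
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_corr
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: one list of values per feature.
columns = {
    "sqft": [850, 900, 1200, 1500, 1800, 2100],
    "sqm": [79, 84, 111, 139, 167, 195],   # almost a rescaled copy of sqft
    "bedrooms": [2, 1, 3, 2, 4, 3],
    "has_roof": [1, 1, 1, 1, 1, 1],        # constant, so zero variance
}

# Run the variance filter first so constant columns never reach pearson().
after_variance = variance_filter(columns, threshold=0.0)
selected = correlation_filter(columns, after_variance, max_abs_corr=0.95)
print(after_variance)
print(selected)
```

Keep in mind that variance depends on a feature's units, so a single threshold only makes sense when the features are on comparable scales. In practice you would more likely reach for a library helper such as scikit-learn's `VarianceThreshold` than write this by hand.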

The data scientist is responsible for choosing the cut-off point that determines how many of the top-ranked features are retained, and this is usually chosen by cross-validation.

Wrapper methods search for the optimal subset of features by training models on various combinations of features and then measuring their performance. Each candidate subset is used to train a model, which is then evaluated on a test set. As one would expect, this can end up being very computationally expensive, so wrapper methods become difficult to use on large feature sets, but they are highly effective at finding a good subset.

Recursive feature elimination, which begins with all features present in a model and eliminates them one by one, is an example of a wrapper method in linear regression. At each step, every remaining feature is left out in turn, and the feature whose removal causes the smallest degradation of the model fit is judged to be the least useful for prediction and is dropped. Forward selection is the same process in reverse: it starts from the single best feature and then adds, one at a time, whichever feature most improves the performance of the model.
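
Forward selection can be sketched in a few lines of plain Python. Everything here is an illustrative assumption: the helper names (`fit_ols`, `forward_selection`), the synthetic data in which only columns 0 and 1 drive the target, the 100/43 train/test split, and the `min_gain` stopping rule.

```python
def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def mse(coef, rows, y):
    """Mean squared error of a fitted linear model on (rows, y)."""
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    return sum((pred - t) ** 2 for pred, t in zip(preds, y)) / len(y)


def take(rows, idx):
    """Select the columns listed in `idx` from every row."""
    return [[r[i] for i in idx] for r in rows]


def forward_selection(X_train, y_train, X_test, y_test, min_gain=1e-3):
    """Add one feature at a time while the test error drops by at least min_gain."""
    selected, best_err = [], float("inf")
    remaining = list(range(len(X_train[0])))
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            coef = fit_ols(take(X_train, cols), y_train)
            scores.append((mse(coef, take(X_test, cols), y_test), j))
        err, j = min(scores)
        if best_err - err < min_gain:
            break  # no candidate improves the model enough
        selected.append(j)
        remaining.remove(j)
        best_err = err
    return selected, best_err


# Synthetic data: the target depends on columns 0 and 1; column 2 is irrelevant.
X, y = [], []
for i in range(143):
    x0, x1, x2 = (7 * i) % 11, (5 * i) % 13, (3 * i) % 7
    X.append([x0, x1, x2])
    y.append(3 * x0 - 2 * x1 + 0.01 * ((i % 5) - 2))

selected, best_err = forward_selection(X[:100], y[:100], X[100:], y[100:])
print(selected, best_err)
```

Notice that every candidate subset requires fitting a new model, which is exactly why wrapper methods get expensive as the number of features grows. Also, scoring candidates on the same test set you report results on gives an optimistic picture, so in practice a separate validation set or cross-validation is a safer choice. scikit-learn offers `RFE` and `SequentialFeatureSelector` for this kind of search.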

Embedded methods perform feature selection as part of the machine learning algorithm's own training process. Regularization, particularly Lasso regularization, is the most popular form of embedded method, since it can shrink the coefficients of some features to exactly zero and thereby automatically reduce the set of features. Lasso regression is often referred to as L1 norm regularization.
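
To see how Lasso can drop a feature on its own, here is a small from-scratch sketch using coordinate descent with soft-thresholding. It minimizes (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 on centered data. The function names, the synthetic data (only columns 0 and 1 affect the target), and `alpha=0.5` are assumptions made for illustration; in practice you would normally use a library implementation such as scikit-learn's `Lasso`.

```python
def soft_threshold(value, alpha):
    """Shrink `value` toward zero by `alpha`, returning exactly 0.0 inside the band."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Lasso coefficients for centered data, found by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    # Center features and target so no intercept term is needed.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [t - y_mean for t in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for row, t in zip(Xc, yc):
                pred_without_j = sum(w[k] * row[k] for k in range(p) if k != j)
                rho += row[j] * (t - pred_without_j)
            rho /= n
            z = sum(row[j] ** 2 for row in Xc) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Synthetic data: the target depends on columns 0 and 1; column 2 is irrelevant.
X, y = [], []
for i in range(143):
    x0, x1, x2 = (7 * i) % 11, (5 * i) % 13, (3 * i) % 7
    X.append([x0, x1, x2])
    y.append(3 * x0 - 2 * x1 + 0.01 * ((i % 5) - 2))

w = lasso_coordinate_descent(X, y, alpha=0.5)
kept = [j for j, coef in enumerate(w) if coef != 0.0]
print([round(coef, 3) for coef in w])
print(kept)
```

The soft-thresholding step is what makes this an embedded method: any feature whose correlation with the residual stays below `alpha` gets a coefficient of exactly zero, so selection happens during fitting. The surviving coefficients are also shrunk slightly toward zero, and a larger `alpha` removes more features.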

I hope this helped you better understand some of the benefits and disadvantages of the more popular methods of feature selection in machine learning.

Thank you for reading!


Aspiring Data Scientist — Recent Graduate of Flatiron School’s Online Data Science Bootcamp
