Dimensionality Reduction in Machine Learning

Board Infinity

Dimensionality Reduction in Machine Learning

Overview

In classification problems of machine learning sometimes there exist several factors known as features that affect the final classification. The more features, the more difficulty occurs in visualizing and operating upon the training dataset.

Seldom, there is a correlation between many of the features and therefore they become redundant. Then comes in the scene the concept of dimensionality reduction. It is the method of decreasing the count of considered random variables by finding a set of principal variables.

Scope

This article deals with:

What is dimensionality reduction?
Components of dimensionality reduction.
Methods of dimensionality reduction.

This article does not concern itself with:

ML algorithms.
Other unrelated ML concepts.

What is dimensionality reduction?

Dimensionality is the number of columns, variables, or input features in a given dataset. Dimensionality reduction is the technique of decreasing these features. A dataset may contain a large count of input features in several cases, complicating the task of predictive modeling. Since it is hard predicting or visualizing the training dataset with a huge count of features, dimensionality reduction is required.

Dimensionality reduction techniques are a method of transforming a dataset with high dimensions into one with fewer dimensions, while also making sure that it still provides similar information. Dimensionality reduction is widely used in machine learning for finding a better-fit predictive model during solving regression and classification problems.

Components of dimensionality reduction

Dimensionality reduction has the following two components:

Feature selection

In feature selection, we attempt to hunt a subset of the original features (variables) to obtain a tinier subset that can be used in modeling the problem. This can be achieved in one of the following ways:
1. Filter
2. Wrapper
3. Embedded

Feature extraction

Feature extraction decreases the amount of data in a space with a high number of dimensions to a lower dimension space. Following are a few common techniques:

Principal component analysis.
Linear discriminant analysis.
Kernel PCA.
Quadratic discriminant analysis.

Dimensionality reduction methods

The following are some of the prominent dimensionality reduction techniques:

Auto-Encoder.
Factor Analysis.
Score comparison.
Random Forest.
Missing Value Ratio.
Low Variance Filter.
High Correlation Filter.
Forward Selection.
Backward Elimination.
Principal Component Analysis.

Conclusion

The more features, the more difficulty occurs in visualizing and operating upon the training dataset.
Dimensionality reduction is the method of decreasing the count of considered random variables by finding a set of principal variables.
Dimensionality is the number of columns, variables, or input features in a given dataset.
Dimensionality reduction has the following two components: feature selection and feature extraction.

Dimensionality Reduction in Machine Learning

Board Infinity

Types of Data in Statistics

5 Most-Common Machine Learning Mistakes to Avoid as a Beginner

A-List of the Top Machine Learning Algorithms in Python

Top 5 Machine Learning Algorithms every Data Scientist should Know

The Most Important Types of Machine Learning Models

Complete Guide To Decision Tree Algorithms for Beginners /w Examples

Logistic Regression: Detailed Introduction

Naive Bayes Classifiers - Overview

Linear Discriminant Analysis in Machine Learning

Underfitting and Overfitting in Machine Learning

Decision Tree Algorithm: Explantation and Implementation

Cross Validation In Machine Learning

Dimensionality Reduction in Machine Learning

Top 5 Types of Clustering Algorithms Every Data Scientist Should Know!

Genetic Algorithm in Machine Learning

Understanding Activation Function in Neural Network

Clustered and Non-Clustered Index

DBMS vs RDMS: Key Differences

Overview

Scope

What is dimensionality reduction?

Components of dimensionality reduction

Feature selection

Feature extraction

Dimensionality reduction methods

Conclusion