Principal Component Analysis (PCA) is a linear dimensionality reduction technique that extracts information from a high-dimensional space by projecting it into a lower-dimensional subspace. It tries to preserve the essential parts of the data, which have more variation, and remove the non-essential parts, which have less.

Dimensions are nothing but the features that represent the data. For example, a 28 x 28 image has 784 picture elements (pixels), and these are the dimensions or features that together represent the image.

One important thing to note about PCA is that it is an unsupervised dimensionality reduction technique: it needs no supervision (or labels) and works only from the correlations between features, so similar data points tend to group together in the reduced space. You will learn how to achieve this practically using Python in later sections of this tutorial!

According to Wikipedia, PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.

Note: Features, dimensions, and variables all refer to the same thing. You will find them used interchangeably.

Where can you apply PCA?

Data Visualization: When working on any data-related problem, the challenge today is the sheer volume of data and the number of variables/features that define it. To solve a problem where data is the key, you need extensive data exploration, such as finding out how the variables are correlated or understanding the distribution of a few variables. When the data is distributed along a large number of variables or dimensions, visualization becomes a challenge and can be almost impossible. PCA can help here because it projects the data into a lower dimension, allowing you to visualize it in a 2D or 3D space with the naked eye.

Speeding Up a Machine Learning (ML) Algorithm: Since PCA's main idea is dimensionality reduction, you can use it to cut your machine learning algorithm's training and testing time when your data has a lot of features and the algorithm learns too slowly.

At an abstract level, you take a dataset with many features and simplify it by deriving a few principal components from the original features.

Principal components are the key to PCA; they represent what's under the hood of your data.
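To make the idea of projecting data onto principal components more concrete, here is a minimal sketch using NumPy. It is only an illustration under a few assumptions: the data is synthetic (random numbers standing in for 200 flattened 28 x 28 images), the helper name pca_project is hypothetical, and PCA is computed through a singular value decomposition of the centered data, which is one common way to do it and not necessarily the approach used elsewhere in this tutorial.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top n_components principal directions.

    Hypothetical helper for illustration only.
    """
    # PCA works on centered data, so subtract each feature's mean.
    X_centered = X - X.mean(axis=0)

    # SVD of the centered data: the rows of Vt are orthonormal principal
    # directions, ordered from most to least variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]

    # Coordinates of every sample in the lower-dimensional subspace.
    projected = X_centered @ components.T

    # Share of the total variance captured by each kept component.
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return projected, components, ratio


rng = np.random.default_rng(0)

# Assumed stand-in data: 200 samples with 28 * 28 = 784 features each,
# generated from 3 hidden factors plus noise so the features are correlated.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 784))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 784))

projected, components, ratio = pca_project(X, n_components=2)

print(projected.shape)  # (200, 2): 784 features reduced to 2
print(ratio)            # fraction of variance kept by each component
```

The two columns of projected can be passed straight to a scatter plot for a 2D view of the data, and their covariance is zero up to rounding error, which is what "linearly uncorrelated" means in the Wikipedia definition.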