Principal Component Analysis for Visualization
Last Updated on October 27, 2023
Principal component analysis (PCA) is an unsupervised machine learning technique. Perhaps the most popular use of principal component analysis is dimensionality reduction. Besides using PCA as a data preparation technique, we can also use it to help visualize data. A picture is worth a thousand words. With the data visualized, it is easier for us to get some insights and decide on the next step in our machine learning models.
In this tutorial, you will discover how to visualize data using PCA, as well as how to use the visualization to help determine the parameter for dimensionality reduction.
After completing this tutorial, you will know:
- How to visualize high-dimensional data
- What explained variance is in PCA
- How to visually observe the explained variance from the result of PCA on high-dimensional data
Let’s get started.

Principal Component Analysis for Visualization
Photo by Levan Gokadze, some rights reserved.
Tutorial Overview
This tutorial is divided into two parts; they are:
- Scatter plot of high-dimensional data
- Visualizing the explained variance
Prerequisites
For this tutorial, we assume that you are already familiar with:
- How to Calculate Principal Component Analysis (PCA) from Scratch in Python
- Principal Component Analysis for Dimensionality Reduction in Python
Scatter plot of high-dimensional data
Visualization is a crucial step to get insights from data. We can learn from the visualization whether a pattern can be spotted, and hence estimate which machine learning model is suitable.
It is easy to depict things in two dimensions. Normally a scatter plot with an x- and y-axis is two-dimensional. Depicting things in three dimensions is a bit trickier but not impossible. In matplotlib, for example, we can plot in 3D. The only problem is that, on paper or on screen, we can only look at a 3D plot from one viewport or projection at a time. In matplotlib, this is controlled by the degree of elevation and azimuth. Depicting things in four or five dimensions is impossible, because we live in a three-dimensional world and have no idea of what things in such a high dimension would look like.
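As an aside, the elevation and azimuth of a 3D plot can be set explicitly. The following is a minimal sketch, not part of the wine example below; the data and the angles are arbitrary and only for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# Some random 3D points, purely for illustration
rng = np.random.default_rng(42)
points = rng.normal(size=(100, 3))

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection='3d')
ax.scatter(points[:, 0], points[:, 1], points[:, 2])
ax.view_init(elev=30, azim=45)   # choose one viewport: elevation and azimuth in degrees
plt.show()
```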
This is where a dimensionality reduction technique such as PCA comes into play. We can reduce the dimension to two or three so we can visualize it. Let's start with an example.
We start with the wine dataset, which is a classification dataset with 13 features (i.e., the dataset is 13-dimensional) and three classes. There are 178 samples:
```python
from sklearn.datasets import load_wine

winedata = load_wine()
X, y = winedata['data'], winedata['target']
print(X.shape)
print(y.shape)
```
```
(178, 13)
(178,)
```
Among the 13 features, we can pick any two and plot them with matplotlib (we color-coded the different classes using the `c` argument):
```python
...
import matplotlib.pyplot as plt

plt.scatter(X[:,1], X[:,2], c=y)
plt.show()
```
or we can also pick any three and show them in 3D:
```python
...
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(projection='3d')
ax.scatter(X[:,1], X[:,2], X[:,3], c=y)
plt.show()
```
But this doesn't reveal much of what the data looks like, because the majority of the features are not shown. We now resort to principal component analysis:
```python
...
from sklearn.decomposition import PCA

pca = PCA()
Xt = pca.fit_transform(X)
plot = plt.scatter(Xt[:,0], Xt[:,1], c=y)
plt.legend(handles=plot.legend_elements()[0], labels=list(winedata['target_names']))
plt.show()
```
Here we transform the input data `X` by PCA into `Xt`. We consider only the first two columns, which contain the most information, and plot them in two dimensions. We can see that the purple class is quite distinctive, but there is still some overlap. If we scale the data before PCA, the result will be different:
```python
...
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

pca = PCA()
pipe = Pipeline([('scaler', StandardScaler()), ('pca', pca)])
Xt = pipe.fit_transform(X)
plot = plt.scatter(Xt[:,0], Xt[:,1], c=y)
plt.legend(handles=plot.legend_elements()[0], labels=list(winedata['target_names']))
plt.show()
```
Because PCA is sensitive to scale, if we normalize each feature with `StandardScaler`, we get a better result. Here the different classes are more distinctive. From this plot, we are confident that a simple model such as SVM can classify this dataset with high accuracy.
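As a quick check of that claim (not part of the original tutorial), we can cross-validate a linear SVM on the standardized wine features; the exact score depends on the cross-validation splits, but it is typically very high:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
# Standardize the features, then fit a linear SVM; cross-validation estimates the accuracy
model = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='linear'))])
scores = cross_val_score(model, X, y, cv=5)
print("Mean CV accuracy:", scores.mean())
```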
Putting these together, the following is the complete code to generate the visualizations:
```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt

# Load dataset
winedata = load_wine()
X, y = winedata['data'], winedata['target']
print("X shape:", X.shape)
print("y shape:", y.shape)

# Show any two features
plt.figure(figsize=(8,6))
plt.scatter(X[:,1], X[:,2], c=y)
plt.xlabel(winedata["feature_names"][1])
plt.ylabel(winedata["feature_names"][2])
plt.title("Two particular features of the wine dataset")
plt.show()

# Show any three features
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(projection='3d')
ax.scatter(X[:,1], X[:,2], X[:,3], c=y)
ax.set_xlabel(winedata["feature_names"][1])
ax.set_ylabel(winedata["feature_names"][2])
ax.set_zlabel(winedata["feature_names"][3])
ax.set_title("Three particular features of the wine dataset")
plt.show()

# Show first two principal components without scaler
pca = PCA()
plt.figure(figsize=(8,6))
Xt = pca.fit_transform(X)
plot = plt.scatter(Xt[:,0], Xt[:,1], c=y)
plt.legend(handles=plot.legend_elements()[0], labels=list(winedata['target_names']))
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("First two principal components")
plt.show()

# Show first two principal components with scaler
pca = PCA()
pipe = Pipeline([('scaler', StandardScaler()), ('pca', pca)])
plt.figure(figsize=(8,6))
Xt = pipe.fit_transform(X)
plot = plt.scatter(Xt[:,0], Xt[:,1], c=y)
plt.legend(handles=plot.legend_elements()[0], labels=list(winedata['target_names']))
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("First two principal components after scaling")
plt.show()
```
If we apply the same method to a different dataset, such as the MNIST handwritten digits, the scatter plot does not show a distinctive boundary, and therefore a more complicated model, such as a neural network, is needed to classify it:
```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt

digitsdata = load_digits()
X, y = digitsdata['data'], digitsdata['target']
pca = PCA()
pipe = Pipeline([('scaler', StandardScaler()), ('pca', pca)])
plt.figure(figsize=(8,6))
Xt = pipe.fit_transform(X)
plot = plt.scatter(Xt[:,0], Xt[:,1], c=y)
plt.legend(handles=plot.legend_elements()[0], labels=list(digitsdata['target_names']))
plt.show()
```
Visualizing the explained variance
PCA in essence rearranges the features by their linear combinations. Hence it is called a feature extraction technique. One characteristic of PCA is that the first principal component holds the most information about the dataset. The second principal component is more informative than the third, and so on.
To illustrate this idea, we can remove the principal components from the original dataset in steps and see what the dataset looks like afterwards. Let's consider a dataset with fewer features, and show two features in a plot:
```python
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

irisdata = load_iris()
X, y = irisdata['data'], irisdata['target']
plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y)
plt.show()
```
This is the iris dataset, which has only four features. The features are on comparable scales, and hence we can skip the scaler. With 4-feature data, PCA can produce at most four principal components:
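The principal axes below are obtained by fitting a PCA on the iris features and printing `pca.components_` (the same step appears in the complete code at the end of this section):

```python
...
from sklearn.decomposition import PCA

pca = PCA().fit(X)
print(pca.components_)
```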
```
[[ 0.36138659 -0.08452251  0.85667061  0.3582892 ]
 [ 0.65658877  0.73016143 -0.17337266 -0.07548102]
 [-0.58202985  0.59791083  0.07623608  0.54583143]
 [-0.31548719  0.3197231   0.47983899 -0.75365743]]
```
For example, the first row is the first principal axis on which the first principal component is created. For any data point $p$ with features $p=(a,b,c,d)$, since the principal axis is denoted by the vector $v=(0.36,-0.08,0.86,0.36)$, the first principal component of this data point has the value $0.36\times a - 0.08\times b + 0.86\times c + 0.36\times d$ on the principal axis. Using the vector dot product, this value can be written as
$$
p \cdot v
$$
Therefore, with the dataset $X$ as a $150\times 4$ matrix (150 data points, each with four features), we can map each data point onto its value on this principal axis by matrix-vector multiplication:
$$
X \cdot v
$$
and the result is a vector of length 150. Now if we remove from each data point the corresponding value along the principal axis vector, that would be
$$
X - (X \cdot v) \cdot v^T
$$
where the transposed vector $v^T$ is a row and $X\cdot v$ is a column. The product $(X \cdot v) \cdot v^T$ follows matrix-matrix multiplication, and the result is a $150\times 4$ matrix, the same dimension as $X$.
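As a quick sanity check (not in the original tutorial), we can confirm these shapes with numpy, and also verify that $X\cdot v$ matches the first column produced by scikit-learn's `pca.transform()`:

```python
...
import numpy as np

v = pca.components_[0]              # first principal axis, shape (4,)
Xmean = X - X.mean(axis=0)          # center the data, as PCA does internally
proj = Xmean @ v                    # X.v, one value per data point
print(proj.shape)                   # (150,)
print(np.allclose(proj, pca.transform(X)[:, 0]))   # True
rank1 = proj.reshape(-1, 1) @ v.reshape(1, -1)     # (X.v) v^T
print(rank1.shape)                  # (150, 4), same shape as X
```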
If we plot the first two features of $X - (X \cdot v) \cdot v^T$, it looks like this:
```python
...
# Remove PC1 from the data
Xmean = X - X.mean(axis=0)                  # center the data, as PCA does internally
value = Xmean @ pca.components_[0]          # projection of each point onto the first principal axis
pc1 = value.reshape(-1,1) @ pca.components_[0].reshape(1,-1)
Xremove = X - pc1
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.show()
```
The numpy array `Xmean` shifts the features of `X` so that they are centered at zero. This is required by PCA. The array `value` is then computed by matrix-vector multiplication; it is the magnitude of each data point mapped onto the principal axis. So if we multiply this value by the principal axis vector, we get back an array `pc1`. Removing this from the original dataset `X`, we get a new array `Xremove`. In the plot we observe that the points on the scatter plot are crumbled together, and the cluster of each class is much less distinctive than before. This means we removed a lot of information by removing the first principal component. If we repeat the same process again, the points are crumbled further:
```python
...
# Remove PC2 from the data
value = Xmean @ pca.components_[1]
pc2 = value.reshape(-1,1) @ pca.components_[1].reshape(1,-1)
Xremove = Xremove - pc2
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.show()
```
This looks like a straight line, but actually it is not. If we repeat once more, all points collapse into a straight line:
```python
...
# Remove PC3 from the data
value = Xmean @ pca.components_[2]
pc3 = value.reshape(-1,1) @ pca.components_[2].reshape(1,-1)
Xremove = Xremove - pc3
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.show()
```
The points all fall on a straight line because we removed three principal components from data that has only four features. Hence our data matrix becomes rank 1. You can try repeating this process once more; the result would be all points collapsing into a single point. The amount of information removed in each step as we removed the principal components can be found from the corresponding explained variance ratio reported by the PCA:
```python
...
print(pca.explained_variance_ratio_)
```
```
[0.92461872 0.05306648 0.01710261 0.00521218]
```
Here we can see that the first component explains 92.5% of the variance and the second component explains 5.3%. If we removed the first two principal components, the remaining variance is only 2.2%; hence, visually, the plot after removing two components looks like a straight line. In fact, when we check the plots above, not only do we see that the points are crumbled, but also that the ranges of the x- and y-axes become smaller as we remove the components.
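One common use of these numbers, not shown in the original listing, is to pick the number of components for dimensionality reduction: look at the cumulative explained variance ratio and keep just enough components to cross a threshold such as 95%. A short sketch:

```python
...
import numpy as np

cumulative = np.cumsum(pca.explained_variance_ratio_)
print(cumulative)                                     # roughly [0.925 0.978 0.995 1.0] for iris
n_components = int(np.argmax(cumulative >= 0.95)) + 1 # first index crossing the threshold
print("Components needed for 95% of the variance:", n_components)   # 2 on this dataset
```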
In terms of machine learning, we can consider using only one single feature for classification on this dataset, namely the first principal component. We should expect to achieve at least 90% of the original accuracy compared with using the full set of features:
```python
...
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# Classifier on all four features
clf = SVC(kernel="linear", gamma='auto').fit(X_train, y_train)
print("Using all features, accuracy: ", clf.score(X_test, y_test))
print("Using all features, F1: ", f1_score(y_test, clf.predict(X_test), average="macro"))

# Classifier on the first principal component only
mean = X_train.mean(axis=0)
X_train2 = X_train - mean
X_train2 = (X_train2 @ pca.components_[0]).reshape(-1,1)
clf = SVC(kernel="linear", gamma='auto').fit(X_train2, y_train)
X_test2 = X_test - mean
X_test2 = (X_test2 @ pca.components_[0]).reshape(-1,1)
print("Using PC1, accuracy: ", clf.score(X_test2, y_test))
print("Using PC1, F1: ", f1_score(y_test, clf.predict(X_test2), average="macro"))
```
```
Using all features, accuracy:  1.0
Using all features, F1:  1.0
Using PC1, accuracy:  0.96
Using PC1, F1:  0.9645191409897292
```
Another use of the explained variance is in compression. Given that the explained variance of the first principal component is large, if we need to store the dataset, we can store only the projected values on the first principal axis ($X\cdot v$), together with the vector $v$ of the principal axis. Then we can approximately reproduce the original dataset by multiplying them:
$$
X \approx (X\cdot v) \cdot v^T
$$
In this way, we need storage for only one value per data point instead of four values for four features. The approximation is more accurate if we store the projected values on more principal axes and add up multiple principal components.
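A minimal sketch of this compression idea, assuming the `Xmean` array and the fitted `pca` from the snippets above: store one score per data point plus the axis vector and the feature means, then rebuild a rank-1 approximation of the dataset.

```python
...
import numpy as np

v = pca.components_[0]                     # the principal axis, shape (4,)
scores = Xmean @ v                         # one stored value per data point, shape (150,)
X_approx = scores.reshape(-1, 1) @ v.reshape(1, -1) + X.mean(axis=0)   # rank-1 reconstruction
print(np.abs(X - X_approx).mean())         # mean absolute error; modest, since PC1 explains ~92% of the variance
```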
Putting these together, the following is the complete code to generate the visualizations:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.svm import SVC
import matplotlib.pyplot as plt

# Load iris dataset
irisdata = load_iris()
X, y = irisdata['data'], irisdata['target']
plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y)
plt.xlabel(irisdata["feature_names"][0])
plt.ylabel(irisdata["feature_names"][1])
plt.title("Two features from the iris dataset")
plt.show()

# Show the principal components
pca = PCA().fit(X)
print("Principal components:")
print(pca.components_)

# Remove PC1
Xmean = X - X.mean(axis=0)
value = Xmean @ pca.components_[0]
pc1 = value.reshape(-1,1) @ pca.components_[0].reshape(1,-1)
Xremove = X - pc1
plt.figure(figsize=(8,6))
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.xlabel(irisdata["feature_names"][0])
plt.ylabel(irisdata["feature_names"][1])
plt.title("Two features from the iris dataset after removing PC1")
plt.show()

# Remove PC2
Xmean = X - X.mean(axis=0)
value = Xmean @ pca.components_[1]
pc2 = value.reshape(-1,1) @ pca.components_[1].reshape(1,-1)
Xremove = Xremove - pc2
plt.figure(figsize=(8,6))
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.xlabel(irisdata["feature_names"][0])
plt.ylabel(irisdata["feature_names"][1])
plt.title("Two features from the iris dataset after removing PC1 and PC2")
plt.show()

# Remove PC3
Xmean = X - X.mean(axis=0)
value = Xmean @ pca.components_[2]
pc3 = value.reshape(-1,1) @ pca.components_[2].reshape(1,-1)
Xremove = Xremove - pc3
plt.figure(figsize=(8,6))
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.xlabel(irisdata["feature_names"][0])
plt.ylabel(irisdata["feature_names"][1])
plt.title("Two features from the iris dataset after removing PC1 to PC3")
plt.show()

# Print the explained variance ratio
print("Explained variance ratios:")
print(pca.explained_variance_ratio_)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# Run classifier on all features
clf = SVC(kernel="linear", gamma='auto').fit(X_train, y_train)
print("Using all features, accuracy: ", clf.score(X_test, y_test))
print("Using all features, F1: ", f1_score(y_test, clf.predict(X_test), average="macro"))

# Run classifier on PC1
mean = X_train.mean(axis=0)
X_train2 = X_train - mean
X_train2 = (X_train2 @ pca.components_[0]).reshape(-1,1)
clf = SVC(kernel="linear", gamma='auto').fit(X_train2, y_train)
X_test2 = X_test - mean
X_test2 = (X_test2 @ pca.components_[0]).reshape(-1,1)
print("Using PC1, accuracy: ", clf.score(X_test2, y_test))
print("Using PC1, F1: ", f1_score(y_test, clf.predict(X_test2), average="macro"))
```
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Tutorials
- How to Calculate Principal Component Analysis (PCA) from Scratch in Python
- Principal Component Analysis for Dimensionality Reduction in Python
APIs
- scikit-learn toy datasets
- scikit-learn iris dataset
- scikit-learn wine dataset
- matplotlib scatter API
- The mplot3d toolkit
Summary
In this tutorial, you discovered how to visualize data using principal component analysis.
Specifically, you learned:
- How to visualize a high-dimensional dataset in 2D using PCA
- How to use the plot in PCA dimensions to help choose an appropriate machine learning model
- How to observe the explained variance ratio of PCA
- What the explained variance ratio means for machine learning