Principal Component Analysis for Visualization
Last Updated on October 27, 2023
Principal component analysis (PCA) is an unsupervised machine learning technique. Perhaps the most popular use of principal component analysis is dimensionality reduction. Besides using PCA as a data preparation technique, we can also use it to help visualize data. A picture is worth a thousand words. With the data visualized, it is easier for us to get some insights and decide on the next step in our machine learning models.
In this tutorial, you will discover how to visualize data using PCA, as well as how to use the visualization to help determine the parameter for dimensionality reduction.
After completing this tutorial, you will know:
- How to visualize high-dimensional data
- What explained variance is in PCA
- How to visually observe the explained variance from the result of PCA on high-dimensional data
Let’s get started.

Principal Component Analysis for Visualization
Photo by Levan Gokadze, some rights reserved.
Tutorial Overview
This tutorial is divided into two parts; they are:
- Scatter plot of high-dimensional data
- Visualizing the explained variance
Prerequisites
For this tutorial, we assume that you are already familiar with:
- How to Calculate Principal Component Analysis (PCA) from Scratch in Python
- Principal Component Analysis for Dimensionality Reduction in Python
Scatter plot of high-dimensional data
Visualization is a crucial step to get insights from data. We can learn from the visualization whether a pattern can be spotted, and hence estimate which machine learning model is suitable.
It is easy to depict things in two dimensions. Normally a scatter plot with an x- and y-axis is two-dimensional. Depicting things in three dimensions is a bit trickier but not impossible. In matplotlib, for example, we can plot in 3D. The only problem is that, on paper or on screen, we can only look at a 3D plot from one viewport or projection at a time. In matplotlib, this is controlled by the degree of elevation and azimuth. Depicting things in four or five dimensions is impossible, because we live in a three-dimensional world and have no idea of what things in such a high dimension would look like.
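As an aside, the elevation and azimuth of a 3D plot can be set explicitly. The following is a minimal sketch, not part of the wine example below; the data and the angles are arbitrary and only for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# Some random 3D points, purely for illustration
rng = np.random.default_rng(42)
points = rng.normal(size=(100, 3))

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection='3d')
ax.scatter(points[:, 0], points[:, 1], points[:, 2])
ax.view_init(elev=30, azim=45)   # choose one viewport: elevation and azimuth in degrees
plt.show()
```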
This is where a dimensionality reduction technique such as PCA comes into play. We can reduce the dimension to two or three so we can visualize it. Let's start with an example.
We start with the wine dataset, which is a classification dataset with 13 features (i.e., the dataset is 13-dimensional) and three classes. There are 178 samples:
```python
from sklearn.datasets import load_wine

winedata = load_wine()
X, y = winedata['data'], winedata['target']
print(X.shape)
print(y.shape)
```
```
(178, 13)
(178,)
```
Among the 13 features, we can pick any two and plot them with matplotlib (we color-coded the different classes using the `c` argument):
```python
...
import matplotlib.pyplot as plt

plt.scatter(X[:,1], X[:,2], c=y)
plt.show()
```
or we can also pick any three and show them in 3D:
```python
...
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(projection='3d')
ax.scatter(X[:,1], X[:,2], X[:,3], c=y)
plt.show()
```
But this doesn't reveal much of what the data looks like, because the majority of the features are not shown. We now resort to principal component analysis:
```python
...
from sklearn.decomposition import PCA

pca = PCA()
Xt = pca.fit_transform(X)
plot = plt.scatter(Xt[:,0], Xt[:,1], c=y)
plt.legend(handles=plot.legend_elements()[0], labels=list(winedata['target_names']))
plt.show()
```
Here we transform the input data `X` by PCA into `Xt`. We consider only the first two columns, which contain the most information, and plot them in two dimensions. We can see that the purple class is quite distinctive, but there is still some overlap. If we scale the data before PCA, the result will be different:
```python
...
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

pca = PCA()
pipe = Pipeline([('scaler', StandardScaler()), ('pca', pca)])
Xt = pipe.fit_transform(X)
plot = plt.scatter(Xt[:,0], Xt[:,1], c=y)
plt.legend(handles=plot.legend_elements()[0], labels=list(winedata['target_names']))
plt.show()
```
Because PCA is sensitive to scale, if we normalize each feature with `StandardScaler`, we get a better result. Here the different classes are more distinctive. From this plot, we are confident that a simple model such as SVM can classify this dataset with high accuracy.
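As a quick check of that claim (not part of the original tutorial), we can cross-validate a linear SVM on the standardized wine features; the exact score depends on the cross-validation splits, but it is typically very high:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
# Standardize the features, then fit a linear SVM; cross-validation estimates the accuracy
model = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='linear'))])
scores = cross_val_score(model, X, y, cv=5)
print("Mean CV accuracy:", scores.mean())
```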
Putting these together, the following is the complete code to generate the visualizations:
```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt

# Load dataset
winedata = load_wine()
X, y = winedata['data'], winedata['target']
print("X shape:", X.shape)
print("y shape:", y.shape)

# Show any two features
plt.figure(figsize=(8,6))
plt.scatter(X[:,1], X[:,2], c=y)
plt.xlabel(winedata["feature_names"][1])
plt.ylabel(winedata["feature_names"][2])
plt.title("Two particular features of the wine dataset")
plt.show()

# Show any three features
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(projection='3d')
ax.scatter(X[:,1], X[:,2], X[:,3], c=y)
ax.set_xlabel(winedata["feature_names"][1])
ax.set_ylabel(winedata["feature_names"][2])
ax.set_zlabel(winedata["feature_names"][3])
ax.set_title("Three particular features of the wine dataset")
plt.show()

# Show first two principal components without scaler
pca = PCA()
plt.figure(figsize=(8,6))
Xt = pca.fit_transform(X)
plot = plt.scatter(Xt[:,0], Xt[:,1], c=y)
plt.legend(handles=plot.legend_elements()[0], labels=list(winedata['target_names']))
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("First two principal components")
plt.show()

# Show first two principal components with scaler
pca = PCA()
pipe = Pipeline([('scaler', StandardScaler()), ('pca', pca)])
plt.figure(figsize=(8,6))
Xt = pipe.fit_transform(X)
plot = plt.scatter(Xt[:,0], Xt[:,1], c=y)
plt.legend(handles=plot.legend_elements()[0], labels=list(winedata['target_names']))
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("First two principal components after scaling")
plt.show()
```
If we apply the same method to a different dataset, such as the MNIST handwritten digits, the scatter plot does not show a distinctive boundary, and therefore a more complicated model, such as a neural network, is needed to classify it:
```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt

digitsdata = load_digits()
X, y = digitsdata['data'], digitsdata['target']
pca = PCA()
pipe = Pipeline([('scaler', StandardScaler()), ('pca', pca)])
plt.figure(figsize=(8,6))
Xt = pipe.fit_transform(X)
plot = plt.scatter(Xt[:,0], Xt[:,1], c=y)
plt.legend(handles=plot.legend_elements()[0], labels=list(digitsdata['target_names']))
plt.show()
```
Visualizing the explained variance
PCA in essence rearranges the features by their linear combinations. Hence it is called a feature extraction technique. One characteristic of PCA is that the first principal component holds the most information about the dataset. The second principal component is more informative than the third, and so on.
To illustrate this idea, we can remove the principal components from the original dataset in steps and see what the dataset looks like afterwards. Let's consider a dataset with fewer features, and show two features in a plot:
```python
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

irisdata = load_iris()
X, y = irisdata['data'], irisdata['target']
plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y)
plt.show()
```
This is the iris dataset, which has only four features. The features are on comparable scales, and hence we can skip the scaler. With 4-feature data, PCA can produce at most four principal components:
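The principal axes below are obtained by fitting a PCA on the iris features and printing `pca.components_` (the same step appears in the complete code at the end of this section):

```python
...
from sklearn.decomposition import PCA

pca = PCA().fit(X)
print(pca.components_)
```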
```
[[ 0.36138659 -0.08452251  0.85667061  0.3582892 ]
 [ 0.65658877  0.73016143 -0.17337266 -0.07548102]
 [-0.58202985  0.59791083  0.07623608  0.54583143]
 [-0.31548719  0.3197231   0.47983899 -0.75365743]]
```
For example, the first row is the first principal axis on which the first principal component is created. For any data point $p$ with features $p=(a,b,c,d)$, since the principal axis is denoted by the vector $v=(0.36,-0.08,0.86,0.36)$, the first principal component of this data point has the value $0.36\times a - 0.08\times b + 0.86\times c + 0.36\times d$ on the principal axis. Using the vector dot product, this value can be written as
$$
p \cdot v
$$
Therefore, with the dataset $X$ as a $150\times 4$ matrix (150 data points, each with four features), we can map each data point onto its value on this principal axis by matrix-vector multiplication:
$$
X \cdot v
$$
and the result is a vector of length 150. Now if we remove from each data point the corresponding value along the principal axis vector, that would be
$$
X - (X \cdot v) \cdot v^T
$$
where the transposed vector $v^T$ is a row and $X\cdot v$ is a column. The product $(X \cdot v) \cdot v^T$ follows matrix-matrix multiplication, and the result is a $150\times 4$ matrix, the same dimension as $X$.
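As a quick sanity check (not in the original tutorial), we can confirm these shapes with numpy, and also verify that $X\cdot v$ matches the first column produced by scikit-learn's `pca.transform()`:

```python
...
import numpy as np

v = pca.components_[0]              # first principal axis, shape (4,)
Xmean = X - X.mean(axis=0)          # center the data, as PCA does internally
proj = Xmean @ v                    # X.v, one value per data point
print(proj.shape)                   # (150,)
print(np.allclose(proj, pca.transform(X)[:, 0]))   # True
rank1 = proj.reshape(-1, 1) @ v.reshape(1, -1)     # (X.v) v^T
print(rank1.shape)                  # (150, 4), same shape as X
```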
If we plot the first two features of $X - (X \cdot v) \cdot v^T$, it looks like this:
```python
...
# Remove PC1 from the data
Xmean = X - X.mean(axis=0)                  # center the data, as PCA does internally
value = Xmean @ pca.components_[0]          # projection of each point onto the first principal axis
pc1 = value.reshape(-1,1) @ pca.components_[0].reshape(1,-1)
Xremove = X - pc1
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.show()
```
The numpy array `Xmean` shifts the features of `X` so that they are centered at zero. This is required by PCA. The array `value` is then computed by matrix-vector multiplication; it is the magnitude of each data point mapped onto the principal axis. So if we multiply this value by the principal axis vector, we get back an array `pc1`. Removing this from the original dataset `X`, we get a new array `Xremove`. In the plot we observe that the points on the scatter plot are crumbled together, and the cluster of each class is much less distinctive than before. This means we removed a lot of information by removing the first principal component. If we repeat the same process again, the points are crumbled further:
```python
...
# Remove PC2 from the data
value = Xmean @ pca.components_[1]
pc2 = value.reshape(-1,1) @ pca.components_[1].reshape(1,-1)
Xremove = Xremove - pc2
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.show()
```
This looks like a straight line, but actually it is not. If we repeat once more, all points collapse into a straight line:
```python
...
# Remove PC3 from the data
value = Xmean @ pca.components_[2]
pc3 = value.reshape(-1,1) @ pca.components_[2].reshape(1,-1)
Xremove = Xremove - pc3
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.show()
```
The points all fall on a straight line because we removed three principal components from data that has only four features. Hence our data matrix becomes rank 1. You can try repeating this process once more; the result would be all points collapsing into a single point. The amount of information removed in each step as we removed the principal components can be found from the corresponding explained variance ratio reported by the PCA:
```python
...
print(pca.explained_variance_ratio_)
```
```
[0.92461872 0.05306648 0.01710261 0.00521218]
```
Here we can see that the first component explains 92.5% of the variance and the second component explains 5.3%. If we removed the first two principal components, the remaining variance is only 2.2%; hence, visually, the plot after removing two components looks like a straight line. In fact, when we check the plots above, not only do we see that the points are crumbled, but also that the ranges of the x- and y-axes become smaller as we remove the components.
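One common use of these numbers, not shown in the original listing, is to pick the number of components for dimensionality reduction: look at the cumulative explained variance ratio and keep just enough components to cross a threshold such as 95%. A short sketch:

```python
...
import numpy as np

cumulative = np.cumsum(pca.explained_variance_ratio_)
print(cumulative)                                     # roughly [0.925 0.978 0.995 1.0] for iris
n_components = int(np.argmax(cumulative >= 0.95)) + 1 # first index crossing the threshold
print("Components needed for 95% of the variance:", n_components)   # 2 on this dataset
```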
In terms of machine learning, we can consider using only one single feature for classification on this dataset, namely the first principal component. We should expect to achieve at least 90% of the original accuracy compared with using the full set of features:
```python
...
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# Classifier on all four features
clf = SVC(kernel="linear", gamma='auto').fit(X_train, y_train)
print("Using all features, accuracy: ", clf.score(X_test, y_test))
print("Using all features, F1: ", f1_score(y_test, clf.predict(X_test), average="macro"))

# Classifier on the first principal component only
mean = X_train.mean(axis=0)
X_train2 = X_train - mean
X_train2 = (X_train2 @ pca.components_[0]).reshape(-1,1)
clf = SVC(kernel="linear", gamma='auto').fit(X_train2, y_train)
X_test2 = X_test - mean
X_test2 = (X_test2 @ pca.components_[0]).reshape(-1,1)
print("Using PC1, accuracy: ", clf.score(X_test2, y_test))
print("Using PC1, F1: ", f1_score(y_test, clf.predict(X_test2), average="macro"))
```
```
Using all features, accuracy:  1.0
Using all features, F1:  1.0
Using PC1, accuracy:  0.96
Using PC1, F1:  0.9645191409897292
```
Another use of the explained variance is in compression. Given that the explained variance of the first principal component is large, if we need to store the dataset, we can store only the projected values on the first principal axis ($X\cdot v$), together with the vector $v$ of the principal axis. Then we can approximately reproduce the original dataset by multiplying them:
$$
X \approx (X\cdot v) \cdot v^T
$$
In this way, we need storage for only one value per data point instead of four values for four features. The approximation is more accurate if we store the projected values on more principal axes and add up multiple principal components.
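A minimal sketch of this compression idea, assuming the `Xmean` array and the fitted `pca` from the snippets above: store one score per data point plus the axis vector and the feature means, then rebuild a rank-1 approximation of the dataset.

```python
...
import numpy as np

v = pca.components_[0]                     # the principal axis, shape (4,)
scores = Xmean @ v                         # one stored value per data point, shape (150,)
X_approx = scores.reshape(-1, 1) @ v.reshape(1, -1) + X.mean(axis=0)   # rank-1 reconstruction
print(np.abs(X - X_approx).mean())         # mean absolute error; modest, since PC1 explains ~92% of the variance
```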
Putting these together, the following is the complete code to generate the visualizations:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.svm import SVC
import matplotlib.pyplot as plt

# Load iris dataset
irisdata = load_iris()
X, y = irisdata['data'], irisdata['target']
plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y)
plt.xlabel(irisdata["feature_names"][0])
plt.ylabel(irisdata["feature_names"][1])
plt.title("Two features from the iris dataset")
plt.show()

# Show the principal components
pca = PCA().fit(X)
print("Principal components:")
print(pca.components_)

# Remove PC1
Xmean = X - X.mean(axis=0)
value = Xmean @ pca.components_[0]
pc1 = value.reshape(-1,1) @ pca.components_[0].reshape(1,-1)
Xremove = X - pc1
plt.figure(figsize=(8,6))
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.xlabel(irisdata["feature_names"][0])
plt.ylabel(irisdata["feature_names"][1])
plt.title("Two features from the iris dataset after removing PC1")
plt.show()

# Remove PC2
Xmean = X - X.mean(axis=0)
value = Xmean @ pca.components_[1]
pc2 = value.reshape(-1,1) @ pca.components_[1].reshape(1,-1)
Xremove = Xremove - pc2
plt.figure(figsize=(8,6))
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.xlabel(irisdata["feature_names"][0])
plt.ylabel(irisdata["feature_names"][1])
plt.title("Two features from the iris dataset after removing PC1 and PC2")
plt.show()

# Remove PC3
Xmean = X - X.mean(axis=0)
value = Xmean @ pca.components_[2]
pc3 = value.reshape(-1,1) @ pca.components_[2].reshape(1,-1)
Xremove = Xremove - pc3
plt.figure(figsize=(8,6))
plt.scatter(Xremove[:,0], Xremove[:,1], c=y)
plt.xlabel(irisdata["feature_names"][0])
plt.ylabel(irisdata["feature_names"][1])
plt.title("Two features from the iris dataset after removing PC1 to PC3")
plt.show()

# Print the explained variance ratio
print("Explained variance ratios:")
print(pca.explained_variance_ratio_)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# Run classifier on all features
clf = SVC(kernel="linear", gamma='auto').fit(X_train, y_train)
print("Using all features, accuracy: ", clf.score(X_test, y_test))
print("Using all features, F1: ", f1_score(y_test, clf.predict(X_test), average="macro"))

# Run classifier on PC1
mean = X_train.mean(axis=0)
X_train2 = X_train - mean
X_train2 = (X_train2 @ pca.components_[0]).reshape(-1,1)
clf = SVC(kernel="linear", gamma='auto').fit(X_train2, y_train)
X_test2 = X_test - mean
X_test2 = (X_test2 @ pca.components_[0]).reshape(-1,1)
print("Using PC1, accuracy: ", clf.score(X_test2, y_test))
print("Using PC1, F1: ", f1_score(y_test, clf.predict(X_test2), average="macro"))
```
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Tutorials
- How to Calculate Principal Component Analysis (PCA) from Scratch in Python
- Principal Component Analysis for Dimensionality Reduction in Python
APIs
- scikit-learn toy datasets
- scikit-learn iris dataset
- scikit-learn wine dataset
- matplotlib scatter API
- The mplot3d toolkit
Summary
In this tutorial, you discovered how to visualize data using principal component analysis.
Specifically, you learned:
- How to visualize a high-dimensional dataset in 2D using PCA
- How to use the plot in PCA dimensions to help choose an appropriate machine learning model
- How to observe the explained variance ratio of PCA
- What the explained variance ratio means for machine learning