UMAP for Dimensionality Reduction: Visualization of Features to Understand Your Data Better
UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction method. This post shows how to use the Python package umap-learn in practice and how to visualize the results.
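First, install UMAP together with its plotting extras (the quotes keep shells such as zsh from interpreting the brackets):
pip install "umap-learn[plot]"
If you do not wish to use its built-in plotting methods, you can install the base package instead:
pip install umap-learn
Note that the import umap.plot line used in this post requires the plotting extras.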
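A quick way to confirm which variant ended up in your environment is to try the imports. This is only a minimal sketch: the helper name can_import is made up for illustration and is not part of umap-learn.

```python
import importlib


def can_import(name):
    """Return True if the module `name` imports without an ImportError."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        # Covers a missing package as well as missing optional dependencies
        return False


# `umap` is the base package; `umap.plot` additionally needs the plot extras
print("umap available:", can_import("umap"))
print("umap.plot available:", can_import("umap.plot"))
```

If the second line prints False while the first prints True, the base package is installed without the plotting extras.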
Import the necessary libraries:
import pandas as pd
import umap
import umap.plot
from sklearn.datasets import fetch_california_housing
from sklearn import preprocessing
import plotly.express as px
from pickle import dump
We will use the California Housing dataset from sklearn.
Its target, the median house value, is continuous. Since we want discrete class labels to work with, we use pandas.qcut to divide the target into 10 quantile bins (deciles).
The most expensive decile is labeled 9 and the cheapest is labeled 0. The features are then standardized with StandardScaler from sklearn, which subtracts each feature's mean and divides by its standard deviation.
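To make these two preprocessing steps concrete before applying them to the housing data, here is a small sketch on toy values. The numbers and variable names (prices, X_toy) are invented for illustration and do not come from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy "prices" (hypothetical values, not from the housing dataset)
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])

# Split into 5 equal-sized quantile bins; labels=False returns integer codes,
# with 0 for the cheapest bin and 4 for the most expensive one
bins = pd.qcut(prices, 5, labels=False)
print(bins.tolist())

# Toy feature matrix with two columns on very different scales
X_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# StandardScaler subtracts the column mean and divides by the column
# standard deviation (population standard deviation, ddof=0)
X_toy_scaled = StandardScaler().fit_transform(X_toy)
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(X_toy_scaled, manual))
```

After scaling, every column has mean 0 and standard deviation 1, so no single feature dominates the distance computations just because of its units.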
housing = fetch_california_housing()
target = pd.DataFrame(housing.target, columns=['target'])
target['target'] = pd.qcut(target['target'], 10, labels=False)
# normalize the data
scaler = preprocessing.StandardScaler().fit(housing.data)
X_scaled =…