
UMAP for Dimensionality Reduction: Visualization of Features to Understand Your Data Better

Zex
3 min read · Jul 8, 2022

UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction method. This post shows how to use the Python package umap-learn in practice and how to visualize the results.

First, install UMAP together with its plotting extras:

pip install umap-learn[plot]

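If you do not need the built-in plotting module (umap.plot), pip install umap-learn is enough. In shells such as zsh, quote the package name ('umap-learn[plot]') so the square brackets are not treated as a glob pattern.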

Next, import the necessary libraries:

import pandas as pd
import umap
import umap.plot
from sklearn.datasets import fetch_california_housing
from sklearn import preprocessing
import plotly.express as px
from pickle import dump

We will use the California Housing dataset from sklearn.

UMAP itself is unsupervised, but discrete labels make the embedding easier to color and interpret, so we use pandas.qcut to split the continuous target into 10 quantile bins (deciles).

Rows in the highest-value decile are labeled 9 and rows in the lowest are labeled 0 (each row is a census block group, and the target is its median house value). The features are then standardized with StandardScaler from sklearn, which rescales each feature to zero mean and unit variance.
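To see what these two preprocessing steps produce, here is a minimal sketch on made-up data. The toy values and the variable names (prices, deciles, features, scaled) are assumptions for illustration only and do not come from the housing dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy "house values": 100 distinct numbers, so the deciles split evenly.
prices = pd.Series(np.linspace(0.5, 5.0, 100))

# labels=False returns the integer bin index
# (0 = lowest decile, 9 = highest decile).
deciles = pd.qcut(prices, 10, labels=False)
bin_counts = deciles.value_counts().sort_index()

# Two toy features on very different scales.
rng = np.random.default_rng(0)
features = np.column_stack([
    rng.normal(loc=1000.0, scale=250.0, size=100),
    rng.normal(loc=0.5, scale=0.1, size=100),
])

# StandardScaler subtracts each column's mean and divides by its
# (population) standard deviation.
scaled = StandardScaler().fit_transform(features)
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)

print(bin_counts.tolist())   # number of rows in each decile
print(col_means, col_stds)   # close to 0 and 1 for every column
```

Because qcut cuts at quantiles rather than at fixed value intervals, every bin holds roughly the same number of rows, which gives balanced color groups when the labels are used in a plot.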

housing = fetch_california_housing()
target = pd.DataFrame(housing.target, columns=['target'])
target['target'] = pd.qcut(target['target'], 10, labels=False)

# normalize the data
scaler = preprocessing.StandardScaler().fit(housing.data)
X_scaled =…

