Long workshop: Dimension Reduction for Beginners

Dimension Reduction for Beginners

Aedin Culhane

Dana-Farber Cancer Institute / Harvard University

Abstract

This workshop will provide a beginner’s guide to dimension reduction and principal component analysis (PCA). It will cover the difference between singular value decomposition (SVD) and PCA, different forms of PCA, and fast PCA implementations for single-cell data analysis. We will describe how to detect artifacts and how to select the optimal number of components. The workshop will focus on SVD, PCA, correspondence analysis (CA), TSNE and UMAP applied to single-cell data.
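
As a concrete illustration of the SVD/PCA relationship, here is a minimal sketch in Python with NumPy. This is an illustrative choice, not the workshop's own code, and the helper names `pca_via_svd` and `pca_via_eigen` are hypothetical. It assumes rows are samples (e.g. cells) and columns are features (e.g. genes). It shows that PCA computed from the SVD of the column-centered matrix gives the same component variances and loadings (up to sign) as an eigendecomposition of the covariance matrix.

```python
import numpy as np

def pca_via_svd(X):
    """PCA of an (n_samples x n_features) matrix via SVD of the column-centered data."""
    Xc = X - X.mean(axis=0)                    # center each feature (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                             # sample coordinates on the PCs
    loadings = Vt.T                            # feature weights (right singular vectors)
    var = s ** 2 / (X.shape[0] - 1)            # variance captured by each PC
    return scores, loadings, var

def pca_via_eigen(X):
    """PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]            # reorder to descending, like the SVD
    return evals[order], evecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                   # toy data: 50 samples, 5 features
scores, loadings, var = pca_via_svd(X)
evals, evecs = pca_via_eigen(X)
print(np.allclose(var, evals))                         # True
print(np.allclose(np.abs(loadings), np.abs(evecs)))    # True (signs are arbitrary)
```

The SVD route never forms the covariance matrix, which is one reason SVD-based (and truncated or randomized SVD) implementations are preferred for large single-cell matrices.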

Principal component analysis (PCA) is a key step in many bioinformatics pipelines. In this interactive session we will dive into the various implementations of singular value decomposition (SVD) and principal component analysis (PCA) to clarify how these methods relate to each other, and to demonstrate where they are equivalent and where they differ. We will also describe correspondence analysis, a decomposition of the Pearson residuals, and demonstrate how it differs from PCA. Finally, we will look at TSNE and UMAP, discuss how to interpret their outputs, and cover some common pitfalls and sources of confusion in using these methods.
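
To make "decomposition of the Pearson residuals" concrete, the following is a minimal sketch of correspondence analysis in Python with NumPy. It is an illustration under stated assumptions, not the workshop's implementation; `correspondence_analysis` is a hypothetical helper, and it assumes a nonnegative count matrix in which every row and column has a positive total. Unlike PCA, which only centers columns, CA weights rows and columns by their masses and applies the SVD to the standardized residuals from the independence model. The sum of squared singular values (the total inertia) equals the chi-squared statistic divided by the grand total.

```python
import numpy as np

def correspondence_analysis(N):
    """CA of a count matrix via SVD of its standardized Pearson residuals."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n                                   # correspondence matrix
    r = P.sum(axis=1)                           # row masses
    c = P.sum(axis=0)                           # column masses
    expected = np.outer(r, c)                   # P expected under independence
    S = (P - expected) / np.sqrt(expected)      # Pearson residuals divided by sqrt(n)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * s) / np.sqrt(r)[:, None]      # principal coordinates of rows
    col_coords = (Vt.T * s) / np.sqrt(c)[:, None]   # principal coordinates of columns
    return s, row_coords, col_coords, S

rng = np.random.default_rng(1)
N = rng.poisson(5, size=(20, 6)) + 1            # toy counts; +1 keeps all margins positive
s, F, G, S = correspondence_analysis(N)

# Classical chi-squared statistic from the same table
E = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()
chi2 = ((N - E) ** 2 / E).sum()
print(np.isclose((s ** 2).sum() * N.sum(), chi2))   # True: total inertia = chi2 / n
```

Because the residual matrix is centered with respect to both row and column masses, its last singular value is numerically zero, so CA yields at most min(rows, columns) - 1 informative dimensions.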

(This will be a much improved version of the workshop presented at BioC2020.)

Keywords: Dimension reduction, PCA, TSNE, UMAP
