
Data mining is the collection of processes by which we can extract useful insights from data. Inherent in this definition is the idea of data reduction: useful insights (whether in the form of summaries, sentiment analyses, etc.) ought to be “smaller” and “more organized” than the original raw data. The challenges presented by high data dimensionality (the so-called curse of dimensionality) must be addressed in order to achieve insightful and interpretable analytical results. In this report, we introduce the basic principles of dimensionality reduction and a number of feature selection methods (filter, wrapper, regularization), discuss some current advanced topics (SVD, spectral feature selection, UMAP), and provide examples (with code).

Data Science Report Series #8: Feature Selection and Dimension Reduction, by Patrick Boily, Olivier Leduc, Andrew Macfie, Aditya Maheshwari, and Maia Pelletier.

Post Author: Patrick Boily

Patrick is interested in the applications of mathematics and statistics to evidence-based decision support. He has worked on 25+ such projects since 2008. He has extensive experience in data science, machine learning, A.I. and predictive analytics, data cleaning, and data visualization.