PyConf Hyderabad

# Dimensionality Reduction and Principal Component Analysis

Submitted by joydeep bhattacharjee (@infinite-joy) on Wednesday, 23 August 2017

Technical level: Intermediate

Status: Rejected

### Abstract

Principal Component Analysis (PCA) is mathematically defined as an orthogonal linear transformation that maps the data to a new coordinate system such that the direction of greatest variance in the data lies along the first coordinate (called the first principal component), the direction of second greatest variance, orthogonal to the first, lies along the second coordinate, and so on. It is one of the most popular methods for dimensionality reduction.
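The definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the talk's own code: the `pca` helper and the synthetic data are assumptions made for the example, and it computes the components from the eigenvectors of the covariance matrix, which is one of several equivalent ways to do it.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components.

    Hypothetical helper written for this example.
    """
    # Center each feature so variance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features, shape (n_features, n_features).
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the direction of greatest variance comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Keep the leading eigenvectors as the new coordinate axes.
    components = eigenvectors[:, :n_components]
    return X_centered @ components, eigenvalues


# Synthetic data (an assumption): the second feature is almost a multiple of the first.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(scale=0.5, size=200),
])

projected, variances = pca(X, n_components=2)
print(projected.shape)  # (200, 2)
print(variances)        # variance along each principal direction, largest first
```

The variance of each projected column equals the corresponding eigenvalue, which is what "the greatest variance lies on the first coordinate" means in practice.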

Key takeaways: In this session we will cover the theoretical concepts behind PCA, when and how to use it, how it achieves dimensionality reduction, and how that benefits us on real-world data.

### Outline

Normally, when we apply machine learning techniques, we deal with a lot of matrices. Each matrix may have many features (dimensions), which means a lot of computation. Running all of that computation in a production environment may be prohibitive, and a large number of features also adds the problem of overfitting. On many occasions it is also very useful to visualise the data, but as human beings we cannot visualise higher dimensions. For these reasons we turn to Principal Component Analysis (PCA) to reduce the number of dimensions in our data set. In this talk you will learn:

• What Principal Component Analysis is and why you should be interested in it
• The math behind principal component analysis and why it works the way it is supposed to
• How to select principal components
• How to implement this in production using sklearn
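As a rough sketch of how selecting principal components might look with sklearn, the example below keeps the smallest number of components whose cumulative explained variance reaches a threshold. The wine dataset bundled with sklearn and the 95% threshold are assumptions chosen for illustration; they are not taken from the talk.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Bundled example dataset (an assumption): 178 samples, 13 features.
X, _ = load_wine(return_X_y=True)

# The features are on very different scales, so standardize them first;
# otherwise the largest-scale feature would dominate the variance.
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components to inspect how much variance each one explains.
full_pca = PCA().fit(X_scaled)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches the threshold.
threshold = 0.95  # illustrative choice
n_components = int(np.argmax(cumulative >= threshold) + 1)

# Refit with that many components and reduce the data.
reduced = PCA(n_components=n_components).fit_transform(X_scaled)
print(n_components, reduced.shape)
```

sklearn can also make this choice itself: passing a float between 0 and 1 as `n_components` (with `svd_solver="full"`) keeps enough components to explain that fraction of the variance.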

Additional and optional (if time permits)

• How to plug PCA into an existing production application
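One possible way to plug PCA into an existing application is to wrap it in an sklearn `Pipeline`, so the reduction is fitted and applied together with the model. The sketch below is an assumption about how that could look, not the approach from the talk; the diabetes dataset, the choice of 5 components, and linear regression as the downstream model are all illustrative.

```python
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Bundled example dataset (an assumption): 442 samples, 10 features.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scaling, PCA and the model are fitted as one unit on the training data,
# so the same transformation is applied automatically at prediction time.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),  # 5 is an illustrative choice
    ("model", LinearRegression()),
])
pipeline.fit(X_train, y_train)

predictions = pipeline.predict(X_test)
print(pipeline.score(X_test, y_test))  # R^2 on the held-out data
```

Because the pipeline exposes the usual `fit` and `predict` methods, code that already calls those on a plain estimator does not need to change.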

### Requirements

Knowledge of

• Matrices and Matrix Multiplication
• Pandas
• Numpy
• Sklearn
• Bokeh
• Simple Prediction Algorithms like linear regression

### Speaker bio

Hello, I am a software engineer/data scientist working for a consulting firm called Nineleaps. Currently I am working on a project where we are trying to apply machine learning algorithms to various medical problems and the pharmaceutical industry at large. I also have a podcast on various developer topics called Flawcode. I love talking about machine learning and software engineering and you can send me a hi at @alt227Joydeep.

### Slides

https://docs.google.com/presentation/d/129s88g8tynuzN-IqNxBeKEsqie-7rn15FlYHbMZ4aSo/edit?usp=sharing