# Machine Learning | Feature Engineering

Posted by Derek on July 26, 2019

# 1. Feature Selection

We can remove some existing columns, e.g. by ranking them with a model's feature importances and dropping the columns that contribute little.
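A minimal sketch of this idea with scikit-learn, ranking columns by a random forest's impurity-based importances and keeping only those above the mean (the dataset and threshold are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Toy data: 10 features, only a few of them informative.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

# Fit a forest and keep only the columns whose impurity-based
# importance is above the mean importance.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
selector = SelectFromModel(forest, threshold="mean", prefit=True)
X_selected = selector.transform(X)

print(X.shape, "->", X_selected.shape)
print("kept columns:", np.flatnonzero(selector.get_support()))
```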

# 2. Feature Generation/Extraction

We can add new columns. For example, we can generate a new attribute by combining existing attributes within each row, or generate a new attribute by aggregating values across multiple rows.
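For instance, on a hypothetical transactions table (the column names are made up for illustration), both kinds of generation in pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 2],
    "price":    [10.0, 20.0, 5.0, 8.0, 12.0],
    "quantity": [2, 1, 4, 3, 1],
})

# Per-row generation: combine existing columns within each row.
df["revenue"] = df["price"] * df["quantity"]

# Cross-row generation: aggregate over groups of rows, then
# broadcast the aggregate back to every row in the group.
df["user_avg_price"] = df.groupby("user_id")["price"].transform("mean")

print(df)
```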

This creates a lot of possible interactions, so we need to reduce the number of features: either group existing features before generation, or apply feature selection after generation.
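A sketch of the second strategy in scikit-learn, generating all pairwise interaction terms and then keeping only the best-scoring columns (the scorer and `k` are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Generate all pairwise interaction terms (8 original columns
# plus 28 interactions = 36), then keep the 10 columns that
# score best against the target.
pipe = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    SelectKBest(f_classif, k=10),
)
X_new = pipe.fit_transform(X, y)
print(X.shape, "->", X_new.shape)  # (300, 8) -> (300, 10)
```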

## 2.1 High-Order Linear Interactions

We can use principal component analysis (PCA) to find uncorrelated components or canonical correlation analysis (CCA) to find correlated components across multiple sources.

### 2.1.1 PCA

We have already seen PCA in the data quality section; here is an example of PCA in Python.
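A minimal sketch using scikit-learn's `PCA` on the Iris data (the dataset is just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the 4 original features onto 2 uncorrelated components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # variance captured per component
```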

### 2.1.2 CCA

In statistics, CCA is a way of inferring information from cross-covariance matrices. If we have two vectors $X=(x_1, ..., x_n)$ and $Y=(y_1, ..., y_n)$ of random variables, and there are correlations among the variables, then CCA will find linear combinations of $X$ and $Y$ which have maximum correlation with each other. In other words, CCA seeks linear transforms such that correlation is maximized in the common subspace: $$(a^*, b^*)=\arg\max_{a, b}\mathrm{corr}(a^TX, b^TY).$$
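In Python, a minimal sketch with scikit-learn's `CCA` on synthetic two-view data sharing a latent signal (the data is made up for illustration):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Two views of the same 100 samples that share one latent signal.
latent = rng.normal(size=(100, 1))
X = np.hstack([latent + 0.1 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 2))])
Y = np.hstack([latent + 0.1 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 2))])

# Find one pair of linear combinations with maximal correlation.
cca = CCA(n_components=1)
X_c, Y_c = cca.fit_transform(X, Y)
print(np.corrcoef(X_c.ravel(), Y_c.ravel())[0, 1])  # close to 1
```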

#### 2.1.2.1 Computation

PCA is computed by an ordinary eigendecomposition of the covariance matrix, $$\Sigma w=\lambda w,$$ while CCA is computed by the generalized eigendecomposition $$\left(\begin{matrix}0 & \Sigma_{XY} \\ \Sigma_{YX} & 0\end{matrix}\right)\left(\begin{matrix}w_X \\ w_Y\end{matrix}\right)=\lambda\left(\begin{matrix}\Sigma_{XX} & 0 \\ 0 & \Sigma_{YY}\end{matrix}\right)\left(\begin{matrix}w_X \\ w_Y\end{matrix}\right).$$
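As a sanity check, the generalized eigenproblem above can be solved directly with `scipy.linalg.eigh(A, B)`, which handles $Aw=\lambda Bw$ for symmetric $A$ and positive-definite $B$ (the synthetic data below is made up for illustration):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Y = X @ rng.normal(size=(3, 2)) + 0.5 * rng.normal(size=(500, 2))

# Centered data and (cross-)covariance blocks.
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
n = len(X)
Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n

# Assemble the two block matrices of the generalized eigenproblem.
p, q = Sxx.shape[0], Syy.shape[0]
A = np.block([[np.zeros((p, p)), Sxy],
              [Sxy.T, np.zeros((q, q))]])
B = np.block([[Sxx, np.zeros((p, q))],
              [np.zeros((q, p)), Syy]])

# A w = lambda B w; the largest eigenvalue equals the first
# canonical correlation between X and Y.
eigvals, eigvecs = eigh(A, B)
print(eigvals[-1])
```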

## 2.2 Fisher's Linear Discriminant Analysis (FLDA)

Suppose the global mean is $\mu$ and the class means are $\mu_k$, $k=1, ..., K$, where class $k$ has $n_k$ samples.

Let the within-class covariance be $$\Sigma_w=\frac{1}{K}\sum_k\frac{1}{n_k}\sum_{i=1}^{n_k}(x_i-\mu_k)(x_i-\mu_k)^T,$$ and the between-class covariance be $$\Sigma_b=\frac{1}{K}\sum_k(\mu_k-\mu)(\mu_k-\mu)^T.$$ We need to find a direction $w$ which maximizes the Rayleigh quotient $\frac{w^T\Sigma_bw}{w^T\Sigma_ww}.$ Maximizing it leads to the generalized eigenproblem $\Sigma_bw=\lambda\Sigma_ww$; for multiple directions $w$, FLDA takes the eigenvectors with the largest eigenvalues, which can again be found by generalized eigendecomposition.

In Python, FLDA is available as `LinearDiscriminantAnalysis` in scikit-learn: create an estimator with `clf = LinearDiscriminantAnalysis()`, then call `clf.fit(...)` and `clf.predict(...)`.
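A minimal runnable sketch on the Iris data (the dataset choice is just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Fit FLDA as a classifier and also use it as a supervised
# dimensionality reducer (at most K - 1 = 2 components here).
clf = LinearDiscriminantAnalysis(n_components=2)
clf.fit(X, y)

print(clf.predict(X[:5]))      # class predictions
print(clf.transform(X).shape)  # (150, 2) projected features
```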