1. Ways to Handle Imbalanced Data
- Collect more data.
- Use appropriate evaluation metrics: (1) `scale_pos_weight` in XGBoost, set to $\frac{\mathrm{Number\ of\ negative\ instances}}{\mathrm{Number\ of\ positive\ instances}}$ (see the sketch after this list); (2) recall and precision; (3) ROC and AUC.
- Resample the training set.
- Ensemble different resampled datasets.
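As a minimal sketch of item (1), `scale_pos_weight` can be computed directly from the class counts. The dataset below is synthetic and purely illustrative; it assumes xgboost and scikit-learn are installed.

```python
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Synthetic, heavily imbalanced binary dataset (~95% negatives).
X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05],
                           random_state=42)

# scale_pos_weight = (number of negative instances) / (number of positive instances)
neg, pos = np.bincount(y)
clf = XGBClassifier(scale_pos_weight=neg / pos, eval_metric="auc")
clf.fit(X, y)
```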
2. Confusion matrix, Recall and Precision
A confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix).
|  | Predicted label 1 | Predicted label 2 |
| --- | --- | --- |
| True label 1 | correct: true positive for class 1 ($A$) | wrong: false positive for class 2 ($B$) |
| True label 2 | wrong: false positive for class 1 ($C$) | correct: true positive for class 2 ($D$) |
Thus, $\mathrm{Precision\ 1}=\frac{A}{A+C}, \mathrm{Precision\ 2}=\frac{D}{B+D}, \mathrm{Recall\ 1}=\frac{A}{A+B}, \mathrm{Recall\ 2}=\frac{D}{C+D}.$
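As a quick check of these formulas, here is a sketch using scikit-learn; the toy labels are made up, with 0/1 playing the roles of class 1/class 2.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 0, 1, 1, 1]

# Rows are true labels, columns are predicted labels: [[A, B], [C, D]].
(A, B), (C, D) = confusion_matrix(y_true, y_pred)
print(A / (A + C), precision_score(y_true, y_pred, pos_label=0))  # Precision 1
print(A / (A + B), recall_score(y_true, y_pred, pos_label=0))     # Recall 1
```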
We may have:
- High recall and high precision: the class is perfectly handled by the model.
- Low recall and high precision: the model cannot detect the class well, but when it does, it is highly trustworthy.
- High recall and low precision: the class is well detected, but the model also includes points of other classes in it.
- Low recall and low precision: the class is poorly handled by the model.
3. ROC, PR and AUC
A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The $x$-axis is the false positive rate and the $y$-axis is the true positive rate. Because both axes are rates, ROC places no more emphasis on one class than the other, so it can look overly optimistic when the dataset is highly imbalanced.
Hence, for imbalanced data we can instead use precision-recall (PR) curves; the area under the curve (AUC) summarizes performance, and higher is better.
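A minimal sketch comparing the two with scikit-learn; the scores here are made up, and in practice `y_scores` would come from a fitted classifier's `predict_proba`.

```python
from sklearn.metrics import auc, precision_recall_curve, roc_auc_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_scores = [0.1, 0.2, 0.2, 0.3, 0.4, 0.7, 0.6, 0.8]

print("ROC AUC:", roc_auc_score(y_true, y_scores))

# PR AUC: integrate precision over recall.
precision, recall, _ = precision_recall_curve(y_true, y_scores)
print("PR AUC:", auc(recall, precision))
```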
4. Resampling
A widely adopted technique for dealing with highly unbalanced datasets is called resampling. It consists of removing samples from the majority class (undersampling) and/or adding more examples from the minority class (oversampling).
4.1 Undersampling
Some methods: RandomUnderSampler, ClusterCentroids, and NearMiss. Cleaning undersampling techniques include TomekLinks, EditedNearestNeighbours, and RepeatedEditedNearestNeighbours.
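A minimal sketch of the simplest of these, imbalanced-learn's `RandomUnderSampler`; the dataset is again synthetic and only for illustration.

```python
from collections import Counter

from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05],
                           random_state=42)
print("before:", Counter(y))

# Randomly drop majority-class samples until the classes are balanced.
rus = RandomUnderSampler(random_state=42)
X_res, y_res = rus.fit_resample(X, y)
print("after:", Counter(y_res))
```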
4.2 Oversampling
Oversampling is more popular, and the synthetic minority oversampling technique (SMOTE) is commonly used. In `imblearn.over_sampling.SMOTE`, the default is `k_neighbors=5`.
Note: always split into train and test sets before applying oversampling, and resample only the training set; otherwise synthetic minority samples leak into the test set and inflate evaluation scores.
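A minimal sketch of that workflow, splitting first and applying SMOTE to the training set only; the dataset is synthetic and illustrative.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05],
                           random_state=42)

# Split first so the test set stays free of synthetic samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

smote = SMOTE(k_neighbors=5, random_state=42)  # k_neighbors=5 is the default
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
# Train on (X_train_res, y_train_res); evaluate on the untouched (X_test, y_test).
```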