By Emil Hvitfeldt

January 13, 2020

A dataset with an uneven number of cases in each class is said to be unbalanced. Many models perform poorly on unbalanced datasets. A dataset can be balanced either by increasing the number of minority cases, using SMOTE, BorderlineSMOTE, or ADASYN, or by decreasing the number of majority cases, using NearMiss or Tomek link removal.
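
To make the oversampling idea concrete, here is a minimal sketch of the core SMOTE step in plain Python: pick a minority case, pick one of its nearest minority neighbours, and create a new case somewhere on the line between them. This is an illustration only, not the implementation used by any particular package; the function name `smote_sketch`, its parameters, and the toy data are all assumptions made for the example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; real implementations handle
    mixed data types, ties, and edge cases more carefully.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority cases to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data (assumed): 10 majority cases and 4 minority cases.
majority = [(float(i), float(i % 3)) for i in range(10)]
minority = [(0.0, 5.0), (1.0, 6.0), (2.0, 5.5), (1.5, 6.5)]

# Add enough synthetic minority cases to match the majority count.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic case is an interpolation between two existing minority cases, the new points stay inside the region the minority class already occupies. BorderlineSMOTE and ADASYN vary which minority cases are chosen as the starting point, while NearMiss and Tomek link removal work in the opposite direction by dropping majority cases.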
