themis: dealing with imbalanced data by using synthetic oversampling

July 9, 2020

Date

July 9, 2020

Time

7:00 PM

Location

St. Louis, MO

Many classification tasks come with an imbalanced dataset. Examples range from disease prediction to fraud detection. Naively fitting a model to such data can produce an ineffective predictor that only predicts the majority class. The themis package implements various established algorithms that correct this imbalance by either removing cases from the majority classes or synthetically adding cases to the minority classes until the desired ratio is met. The talk walks through the heart of the synthetic oversampling algorithms in code and visualization, along with a discussion of performance. themis was created because of a lack of unity and speed in existing R packages.
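To make the idea of synthetic oversampling concrete, here is a minimal sketch of SMOTE-style interpolation, one well-known way of creating new minority cases. It is written in plain Python for illustration and is not themis code (themis is an R package). The function names `smote_like_oversample` and `oversample_to_ratio`, and the parameters `over_ratio`, `k`, and `seed`, are hypothetical names chosen for this example. The sketch assumes numeric features and uses a brute-force neighbor search, so it says nothing about the performance of any real implementation.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority point as the base.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance (brute force).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbors at random.
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def oversample_to_ratio(majority, minority, over_ratio=1.0, k=5, seed=0):
    """Add synthetic minority points until minority/majority reaches over_ratio."""
    target = math.ceil(over_ratio * len(majority))
    n_new = max(0, target - len(minority))
    return list(minority) + smote_like_oversample(minority, n_new, k=k, seed=seed)


# Toy data: 20 majority points and 4 minority points in two dimensions.
majority = [(float(i), float(i % 3)) for i in range(20)]
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
balanced_minority = oversample_to_ratio(majority, minority, over_ratio=1.0, k=3)
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a line segment between two existing minority cases, the new cases stay inside the region the minority class already occupies rather than being exact duplicates of existing rows.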

Posted on:
July 9, 2020
Categories:
talk useR2020