Talks & workshops

Events I have been invited to present at, shared along with slides, videos, and other linkable resources.

2020

Predictive modeling with text using tidy data principles

Invited workshop for R/Pharma Conference. Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be used for prediction within the tidyverse and tidymodels ecosystems? Do you need a flexible framework for handling text data that allows you to engage in tasks from exploratory data analysis to supervised predictive modeling?

palette2vec: A new way to explore color palettes

There are many palettes available in various R packages. Ways to explore all of these palettes already exist in the https://github.com/EmilHvitfeldt/r-color-palettes repository and the {paletteer} package. This talk shows what happens when we take one step further into explorability. Using handcrafted color features, dimensionality reduction, and interactive tools, we will create and explore a color palette embedding. In this embedded space we will be able to interactively cluster palettes, find neighboring palettes, and even generate new palettes in a whole new way.

Looking at Stop Words: Why You Shouldn’t Blindly Trust Model Defaults

Removing stop words is a fairly common step in natural language processing, and NLP packages often supply a default list. However, most documentation and tutorials don’t explore the nuances of selecting an appropriate list. Defaults for machine learning and modeling can be helpful but may be misleading or wrong. This talk will focus on the importance of checking assumptions and defaults in the software you use.

September 9, 2020

7:00 PM

Salt Lake City R Users Group


materials

themis: dealing with imbalanced data by using synthetic oversampling

Many classification tasks come with an unbalanced dataset. Examples range from disease prediction to fraud detection. Naively applying your model will lead to an ineffective predictor that only predicts the majority class. The themis package implements various established algorithms that adjust this imbalance in the data, either by removing cases from the majority classes or by synthetically adding cases to the minority classes until the desired ratio is met. A walkthrough of the heart of the synthetic oversampling algorithms will be given in code and visualization, along with a discussion of performance.
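
As a rough illustration of the two strategies described in this abstract, here is a minimal, language-neutral sketch in Python. It is not the themis API or the talk's own code: the function names `smote_like_oversample` and `random_undersample`, the toy data, and the parameter `k` are all hypothetical, and the interpolation step is a simplified SMOTE-style approach chosen for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbors.
    Simplified SMOTE-style sketch; assumes at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # Place the new point somewhere on the segment from base to neighbor.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority class (the 'removing cases' route)."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


# Hypothetical toy data: 10 majority cases and 3 minority cases.
majority = [(float(i), float(i % 3)) for i in range(10)]
minority = [(0.0, 5.0), (1.0, 6.0), (2.0, 5.5)]

# Add synthetic minority cases until both classes have the same size.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point lies on a line segment between two existing minority cases, the new cases stay inside the region the minority class already occupies rather than being exact duplicates.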

Predictive modeling with text using tidy data principles

J. Silge and E. Hvitfeldt. Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be used for prediction within the tidyverse and tidymodels ecosystems? Do you need a flexible framework for handling text data that allows you to engage in tasks from exploratory data analysis to supervised predictive modeling?

Reproducible preprocessing with recipes

Working alone or with other people becomes increasingly difficult as the number of files and collaborators grows. This seminar goes into detail on why and how to use git in collaborative research. Material in this talk is heavily inspired by "Excuse me, do you have a moment to talk about version control?" by Jenny Bryan.