Talks & workshops

Events I have been invited to present at, shared along with slides, videos, and other linkable resources.

2020

Looking at stop words: why you shouldn’t blindly trust model defaults

Invited talk at Salt Lake City R Users Group

Removing stop words is a fairly common step in natural language processing, and NLP packages often supply a default list. However, most documentation and tutorials don’t explore the nuances of selecting an appropriate list. Defaults for machine learning and modeling can be helpful but may be misleading or wrong. This talk will focus on the importance of checking assumptions and defaults in the software you use.

September 9, 2020

7:00 PM

Salt Lake City R Users Group


materials

themis: dealing with imbalanced data by using synthetic oversampling

Many classification tasks come with an imbalanced dataset. Examples range from disease prediction to fraud detection. Naively fitting a model to such data will often lead to an ineffective predictor that only predicts the majority class. The themis package implements various established algorithms that adjust this imbalance, either by removing cases from the majority classes or by synthetically adding cases to the minority classes until the desired ratio is met. The talk walks through the core of the synthetic oversampling algorithms in code and visualizations, along with a discussion of performance.
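To make the idea of synthetic oversampling concrete, here is a minimal sketch in Python of a SMOTE-style step: new minority cases are interpolated between an existing minority case and one of its nearest minority neighbors until the desired ratio is reached. This is not the themis API (themis is an R package), and the function name `oversample_minority` and its parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def oversample_minority(minority, n_majority, ratio=1.0, k=3, seed=0):
    """Create synthetic minority points (hypothetical helper, not themis).

    Points are generated until the minority class size reaches
    ratio * n_majority. Each synthetic point lies on the line segment
    between a random minority point and one of its k nearest minority
    neighbors (the SMOTE-style interpolation idea).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    target = math.ceil(ratio * n_majority)
    n_new = max(0, target - len(minority))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbor = rng.choice(neighbors)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy data: 4 minority cases against an assumed 10 majority cases.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = oversample_minority(minority, n_majority=10)
print(len(minority) + len(new_points))  # 10
```

Because every synthetic point is a convex combination of two real minority cases, the new cases stay inside the region the minority class already occupies rather than being exact copies.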

Predictive modeling with text using tidy data principles

J. Silge and E. Hvitfeldt

Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be used for prediction within the tidyverse and tidymodels ecosystems? Do you need a flexible framework for handling text data that allows you to engage in tasks from exploratory data analysis to supervised predictive modeling?

Reproducible preprocessing with recipes

Git & GitHub

Working alone or with other people becomes increasingly difficult as the number of files and collaborators grows. This seminar goes into detail on why and how to use git in collaborative research. Material in this talk is heavily inspired by "Excuse me, do you have a moment to talk about version control?" by Jenny Bryan.