talk

palette2vec: A new way to explore color palettes

Many R packages ship color palettes. Tools for exploring them already exist in the https://github.com/EmilHvitfeldt/r-color-palettes repository and the {paletteer} package. This talk shows what happens when we take explorability one step further. Using handcrafted color features, dimensionality reduction, and interactive tools, we will create and explore a color palette embedding. In this embedded space, we will be able to interactively cluster palettes, find neighboring palettes, and even generate new palettes in a whole new way.
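As a rough illustration of what "handcrafted color features" and "neighboring palettes" could mean, the sketch below turns each palette into a small feature vector and looks up its nearest neighbor. It is written in plain Python rather than R, skips the dimensionality reduction step, and is not taken from the talk: the three features, the helper names, and the palettes are all assumptions made up for this example.

```python
import colorsys
import math

def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of floats in [0, 1]."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))

def palette_features(palette):
    """Assumed features: mean saturation, mean lightness, lightness range."""
    hls = [colorsys.rgb_to_hls(*hex_to_rgb(c)) for c in palette]
    lightness = [l for _, l, _ in hls]
    saturation = [s for _, _, s in hls]
    return (
        sum(saturation) / len(saturation),
        sum(lightness) / len(lightness),
        max(lightness) - min(lightness),
    )

def nearest_palette(name, palettes):
    """Return the name of the palette closest to `name` in feature space."""
    target = palette_features(palettes[name])
    others = (n for n in palettes if n != name)
    return min(others, key=lambda n: math.dist(target, palette_features(palettes[n])))

# Hypothetical palettes, made up for this illustration.
palettes = {
    "greys": ["#000000", "#808080", "#FFFFFF"],
    "soft_greys": ["#202020", "#808080", "#E0E0E0"],
    "primaries": ["#FF0000", "#00FF00", "#0000FF"],
}

closest_to_greys = nearest_palette("greys", palettes)
```

With these toy palettes, the two grey ramps end up close together because they share zero saturation and a similar mean lightness, while the fully saturated primaries sit far away.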

Looking at Stop Words: Why You Shouldn’t Blindly Trust Model Defaults

Removing stop words is a fairly common step in natural language processing, and NLP packages often supply a default list. However, most documentation and tutorials don’t explore the nuances of selecting an appropriate list. Defaults for machine learning and modeling can be helpful but may be misleading or wrong. This talk will focus on the importance of checking assumptions and defaults in the software you use.
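To make the point about defaults concrete, here is a minimal sketch in plain Python of how the choice of stop word list can change what survives preprocessing. Both lists and the example sentence are hypothetical and made up for this illustration; they do not correspond to the default list of any particular NLP package.

```python
def remove_stop_words(text, stop_words):
    """Lowercase, split on whitespace, and drop tokens found in stop_words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]

# Two hypothetical stop word lists, made up for this illustration.
short_list = {"the", "a", "is", "of"}
long_list = short_list | {"not", "no", "very"}

sentence = "The service is not good"

kept_short = remove_stop_words(sentence, short_list)
kept_long = remove_stop_words(sentence, long_list)
```

Here `kept_short` retains the negation "not", while `kept_long` drops it, so a downstream sentiment model would see only "service" and "good". Which list is appropriate depends on the task, which is why it is worth inspecting the list before relying on it.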

themis: dealing with imbalanced data by using synthetic oversampling

Many classification tasks come with an unbalanced dataset. Examples range from disease prediction to fraud detection. Naively applying your model can lead to an ineffective predictor that only predicts the majority class. The themis package implements various established algorithms that adjust this imbalance in the data, either by removing cases from the majority classes or by synthetically adding cases to the minority classes until the desired ratio is met. A walkthrough of the heart of the synthetic oversampling algorithms will be given in code and visualization, along with a discussion of performance.
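The idea of synthetically adding minority cases until a desired ratio is met can be sketched as follows. This is a simplified, SMOTE-style interpolation written in plain Python for illustration only; it does not reproduce themis's implementation or API, and the function name, parameters, and toy data are assumptions.

```python
import math
import random

def synthetic_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a sampled
    minority point and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbors of `base` among the other minority points
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        t = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Hypothetical toy data: 6 majority points and 3 minority points.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.0), (0.0, 0.3)]
minority = [(1.0, 1.0), (1.2, 1.0), (1.0, 1.3)]

# Add synthetic minority points until both classes have the same size.
new_points = synthetic_oversample(minority, len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point lies on a line segment between two existing minority points, the new cases stay inside the region the minority class already occupies instead of being exact duplicates.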

Reproducible preprocessing with recipes

Git & GitHub

Working alone or with other people becomes increasingly difficult as the number of files and collaborators grows. This seminar goes into detail on why and how to use Git in collaborative research. Material in this talk is heavily inspired by “Excuse me, do you have a moment to talk about version control?” by Jenny Bryan.

Building a package that fits into an evolving ecosystem

With an ever-increasing amount of textual data available to us, having a well-thought-out toolchain for modelling is crucial. tidymodels is a recent effort to create a modelling framework that shares the underlying design philosophy, grammar, and data structures of the tidyverse. textrecipes joined the roster a year ago and provided an exciting bridge to text preprocessing. This talk tells the tale of the package textrecipes: starting with the GitHub issue that sparked the idea for the package, going over the trials and challenges associated with building a package that heavily integrates with other packages, and ending with the release on CRAN.

Building R Packages

Building an R package can seem daunting with its many files and its structure. This seminar will go through the different use cases for an R package, dos and don’ts, and best practices. Finally, there will be a live demonstration, starting with the creation of an R package and ending with its release on CRAN.

Debugging and Profiling in R

Hitting an error or a speed bump while working in R can be frustrating. This seminar will cover strategies and techniques for debugging and code profiling in R. We will look at some different ways to identify bugs, how to fix them, and how to prevent them from coming back. We will also look at a couple of typical patterns seen in slow code and at what can be done to fix them.

Working with tidymodels

Tidymodels is a “meta-package” in the same way as tidyverse, but with a focus on modeling and statistical analysis. This talk will go through how to use tidymodels to do modeling in a tidy fashion.

Text Analysis in R

An ever-increasing amount of textual data is available to us. I’ll walk you through a structured way to do exploratory data analysis (also called text mining) using tidytext to gain insight into plain unstructured text. This will be followed by a demonstration of how modeling strategies can facilitate decision making when you have text as part of your data.