Chapter 6 Genre Classification — Ensemble Approach

Here we create a stacked model that will hopefully outperform both of the individual models.

In brief, a stacked ensemble model takes the outputs of multiple models and feeds them to another layer of models. In this case we will use the prediction probabilities of the Spotify audio feature model (79% accuracy) and the topic probability model (56% accuracy).
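As a quick illustration of the idea, here is a toy sketch in base R with simulated data (none of the variable names below refer to our actual models): the base models' prediction probabilities become the only features seen by a second-stage model.

```r
# Toy sketch of stacking: base-model probabilities as meta-learner features.
set.seed(1)
n <- 200
y <- factor(rep(c("country", "rock"), each = n / 2))

# Pretend these are prediction probabilities from two base models,
# e.g. an audio-feature model and a lyric-topic model.
p_audio <- ifelse(y == "country", rbeta(n, 4, 2), rbeta(n, 2, 4))
p_lyric <- ifelse(y == "country", rbeta(n, 3, 2), rbeta(n, 2, 3))

stack_df <- data.frame(p_audio, p_lyric, y)

# The meta-learner sees only the base models' probabilities as features.
meta <- glm(y ~ p_audio + p_lyric, data = stack_df, family = binomial)
summary(meta)
```

The meta-learner here is a logistic regression for simplicity; below we use tree-based models instead, but the structure is the same.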

Below we join the features of the topic model to the spotifyr features. The resulting data frame will be used to create a data frame of predictions.

chart_df <- chart_analysis %>% 
  mutate(id = glue::glue("{artist}....{title}")) %>% 
  inner_join(chart_topics) %>% 
  distinct(chart, id, .keep_all = TRUE) %>% 
  select(-c("duration_ms", "time_signature", "type", "mode",
            "rank", "year", "artist", "featured_artist", "title")) %>%
  mutate(chart = as.factor(chart),
         key = as.factor(key))
## Joining, by = c("chart", "id")
chart_df
## # A tibble: 510 x 17
##    chart danceability energy key   loudness speechiness acousticness
##    <fct>        <dbl>  <dbl> <fct>    <dbl>       <dbl>        <dbl>
##  1 Hot …        0.66   0.383 2        -9.38      0.0304     0.381   
##  2 Hot …        0.355  0.48  11       -7.31      0.0282     0.679   
##  3 Hot …        0.573  0.635 7        -6.62      0.0275     0.000598
##  4 Hot …        0.557  0.676 7        -4.46      0.027      0.0661  
##  5 Hot …        0.531  0.746 11       -4.23      0.0329     0.0972  
##  6 Hot …        0.579  0.776 8        -5.36      0.173      0.0749  
##  7 Hot …        0.517  0.743 9        -4.34      0.0363     0.243   
##  8 Hot …        0.613  0.667 1        -6.87      0.029      0.356   
##  9 Hot …        0.514  0.659 1        -6.14      0.0309     0.0284  
## 10 Hot …        0.546  0.805 0        -4.25      0.0348     0.125   
## # … with 500 more rows, and 10 more variables: instrumentalness <dbl>,
## #   liveness <dbl>, valence <dbl>, tempo <dbl>, id <glue>, x1 <dbl>,
## #   x2 <dbl>, x3 <dbl>, x4 <dbl>, x5 <dbl>

Now we need to create training and testing data for this stacked model. We generate predictions from each base model on the training set and on the test set; these respective predictions become the new training and testing sets. Feels clunky, I know.

set.seed(1)
init_split <- initial_split(chart_df, strata = "chart")
train_df <- training(init_split)
test_df <- testing(init_split)

We will now bind the predictions together into a new data frame. Since the country and rock probabilities sum to one, keeping both would be redundant, so we select only the country columns.

train_preds <- bind_cols(predict(audio_classifier, train_df, type = "prob"),
                         predict(lyric_classifier, train_df, type = "prob")) %>% 
  select(contains("country")) %>% 
  bind_cols(select(train_df, chart)) %>% 
  janitor::clean_names()

test_preds <- bind_cols(predict(audio_classifier, test_df, type = "prob"),
                         predict(lyric_classifier, test_df, type = "prob")) %>% 
  select(contains("country")) %>% 
  bind_cols(select(test_df, chart)) %>% 
  janitor::clean_names()

Now we can go ahead and create the ensemble model.

chart_rec <- recipe(chart ~ ., data = train_preds)  %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep()


baked_train <- bake(chart_rec, train_preds)
baked_test <- bake(chart_rec, test_preds)

With our newly preprocessed data we can try a few different classification models.

ranger_fit <- rand_forest(mode = "classification") %>%
  set_engine("ranger") %>%
  fit(chart ~ ., data = baked_train)

rf_fit <- rand_forest(mode = "classification") %>%
  set_engine("randomForest") %>%
  fit(chart ~ ., data = baked_train)

c50_fit <- decision_tree(mode = "classification") %>%
  set_engine("C5.0") %>%
  fit(chart ~ ., data = baked_train)

How did these models perform?

rf_estimates <- predict(rf_fit, baked_test) %>%
  bind_cols(baked_test) %>%
  yardstick::metrics(truth = chart, estimate = .pred_class)

ranger_estimates <- predict(ranger_fit, baked_test) %>%
  bind_cols(baked_test) %>%
  yardstick::metrics(truth = chart, estimate = .pred_class)

c50_estimates <- predict(c50_fit, baked_test) %>%
  bind_cols(baked_test) %>%
  yardstick::metrics(truth = chart, estimate = .pred_class)

bind_rows(
  rf_estimates,
  ranger_estimates,
  c50_estimates
) %>% 
  filter(.metric == "accuracy") %>% 
  mutate(model = c("rf", "ranger rf", "c50"))
## # A tibble: 3 x 4
##   .metric  .estimator .estimate model    
##   <chr>    <chr>          <dbl> <chr>    
## 1 accuracy binary         0.890 rf       
## 2 accuracy binary         0.921 ranger rf
## 3 accuracy binary         0.937 c50

It seems that our C5.0 model performs the best with an accuracy of nearly 94%. That is nearly 15 percentage points better than our audio-feature-only model. But how does it perform on new data?
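The gain can be checked directly from the accuracy figures quoted above:

```r
# Accuracy gain of the stacked C5.0 model over the audio-only base model,
# in percentage points (79% accuracy, from the earlier chapter).
round((0.937 - 0.79) * 100, 1)
## 14.7
```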

6.1 Case Study

Americana is an extraordinary music genre. It is the confluence of folk, blues, country, and rock & roll. It is a uniquely American genre. We will use our new models to conduct a case study on the song September Doves by Lost Dog Street Band.

Before we embark, familiarize yourself with the music. Listen to the song in its entirety. Think about how you would classify the song yourself. Are there differences between the lyrics and the audio? Perhaps listen to the song while we fetch and preprocess the song features and lyrics.

# extract audio features
doves_feats <- track_audio_features("Lost Dog Street Band", "September Doves")

#------------------------------- lyric features -------------------------------#
# retrieve and tokenize song lyrics 
doves_tokens <- genius_lyrics("Lost Dog Street Band", "September Doves") %>% 
  unnest_tokens(word, lyric) %>% 
  mutate(word = SnowballC::wordStem(word))

doves_dtm <- doves_tokens %>% 
  count(track_title, word, sort = TRUE) %>% 
  cast_dtm(track_title, word, n)


# calculate topic probabilities
doves_lda <- posterior(lda_5, doves_dtm) 

# extract probs for modeling
doves_lda_df <- doves_lda[[2]] %>% 
  as_tibble() %>% 
  janitor::clean_names()

Below we score the features with each base model and bind the country probabilities together, since the stacked model takes these as its input.

#----------------------------- model predictions ------------------------------#
doves_stack_test <- bind_cols(
  predict(audio_classifier, 
          mutate(doves_feats, key = as.factor(key)),
          type = "prob"),
  predict(lyric_classifier, 
          doves_lda_df, type = "prob")
) %>% 
  select(contains("country")) %>% 
  janitor::clean_names()

doves_stack_test
## # A tibble: 1 x 2
##   pred_hot_country_songs pred_hot_country_songs1
##                    <dbl>                   <dbl>
## 1                  0.272                     0.6

We can see that our model doesn't classify September Doves as country based on the audio features, but it does based on the song lyrics.
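To make the disagreement concrete, here is the same comparison as a quick sketch (probabilities copied from the output above; the 0.5 cutoff and the shortened labels are just for illustration):

```r
# Country probability from each base model, from the output above.
probs <- c(audio = 0.272, lyric = 0.6)

# Class call at a 0.5 threshold: the two base models disagree.
ifelse(probs > 0.5, "country", "rock")
##  audio     lyric
## "rock" "country"
```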

When we feed this into our ensemble model, what is the result?

predict(c50_fit, doves_stack_test, type = "prob")
## # A tibble: 1 x 2
##   `.pred_Hot Country Songs` `.pred_Rock Songs`
##                       <dbl>              <dbl>
## 1                     0.931             0.0694