Chapter 2 Setting Context - Billboard top charts
Billboard magazine first started publishing music popularity charts in 1936. Billboard measures multiple segments of listenership, such as radio plays, online streams, and digital and physical purchases. Through these efforts it has established itself as an industry standard, and its charts have been used in numerous research projects and analyses. Until recently, however, there has been no effective way to access this data in a format that is ready for analysis.
bbcharts was created to simplify the process of retrieving chart rankings. Through bbcharts, users have access to 321 different charts hosted on Billboard.com. These include a number of weekly and annual charts, which bbcharts lists in bbcharts::chart_table and bbcharts::ye_chart_table respectively (ye stands for “year end”). For an overview of the package's functionality, visit its GitHub repo.
For this case study we will compare rock and country songs. To build a model that classifies songs as rock or country, we need a training dataset of rock and country songs. To get one, we will use the Billboard year-end charts to collect the top rock and country songs of three recent years (2016 to 2018).
To create the dataset we will use two charts: "hot-country-songs" and "hot-rock-songs". One approach would be to fetch each chart and year manually with year_end_bb("chart-name", year) and bind the results together. However, we have the power of the tidyverse at our fingertips, and it would be a shame not to use it!
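For comparison, the manual approach might look roughly like the sketch below. It assumes bbcharts is installed and that Billboard.com still serves the year-end pages in the format the package expects; the only bbcharts function used is year_end_bb(), called as year_end_bb(chart, year). Every extra chart or year means another hand-written line, which is exactly the repetition the tidyverse approach avoids.

```r
library(dplyr)
library(bbcharts)

# One hand-written call per chart/year combination, stacked with bind_rows()
manual_charts <- bind_rows(
  year_end_bb("hot-country-songs", 2016),
  year_end_bb("hot-country-songs", 2017),
  year_end_bb("hot-country-songs", 2018),
  year_end_bb("hot-rock-songs", 2016),
  year_end_bb("hot-rock-songs", 2017),
  year_end_bb("hot-rock-songs", 2018)
)
```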
The process we will use is to:
- Create a tibble of chart and year pairs
- Create a new list column that holds each pair’s chart rankings
- Unnest the list column into a single tibble
To create a tibble with all unique pairs, we will use tidyr::crossing(). We provide two vectors: one containing the names of the charts to use and a second containing the years of interest. crossing() returns every combination of the two, and these values will later be fed to year_end_bb().
library(tidyverse)
chart_grid <- crossing(
chart = c("hot-country-songs", "hot-rock-songs"),
year = 2016:2018
)
chart_grid
## # A tibble: 6 x 2
## chart year
## <chr> <int>
## 1 hot-country-songs 2016
## 2 hot-country-songs 2017
## 3 hot-country-songs 2018
## 4 hot-rock-songs 2016
## 5 hot-rock-songs 2017
## 6 hot-rock-songs 2018
Next, we will create a new column called chart_ranks using a combination of mutate() and map2(). mutate() lets you create and modify columns in a data frame. map2() iterates over two vectors in parallel, passing the first element of each to a function, then the second element of each, and so on, and returns the results as a list.
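To see what mutate() and map2() do together in isolation, here is a toy example on made-up data. The words and counts are purely illustrative, and base R's strrep() stands in for year_end_bb(): map2() calls it once per row with that row's pair of values.

```r
library(tibble)
library(dplyr)
library(purrr)

# A toy tibble: each row holds two values that map2() will pair up
toy <- tibble(
  word  = c("rock", "country", "pop"),
  times = c(1L, 2L, 3L)
)

# map2() walks down `word` and `times` together, calling strrep(word, times)
# for each row. It returns a list, so mutate() stores it as a list column.
toy_list <- toy %>%
  mutate(repeated = map2(word, times, strrep))

# map2_chr() does the same but simplifies the result to a character vector
toy_chr <- toy %>%
  mutate(repeated = map2_chr(word, times, strrep))
```

In the chapter's pipeline each call returns a whole tibble rather than a single string, which is why plain map2() (producing a list column) is the right choice there.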
The code below iterates over these pairs and stores the output of year_end_bb() in the new column chart_ranks. Because each call returns a tibble, chart_ranks is a list column of tibbles. As we only need what is contained in chart_ranks, we select it and unnest() the tibbles into one.
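library(bbcharts)
# For each chart/year pair, call year_end_bb(chart, year) and store the
# resulting tibble in the list column chart_ranks
charts <- chart_grid %>%
mutate(chart_ranks = map2(.x = chart, .y = year, year_end_bb)) %>%
# keep only the nested chart data
select(chart_ranks) %>%
# expand the nested tibbles into one long tibble
unnest(cols = chart_ranks)
charts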
## # A tibble: 600 x 6
## rank year chart artist featured_artist title
## <int> <int> <chr> <chr> <chr> <chr>
## 1 1 2016 Hot Country… Florida Georgi… <NA> H.O.L.Y.
## 2 2 2016 Hot Country… Thomas Rhett <NA> Die A Happy Man
## 3 3 2016 Hot Country… Tim McGraw <NA> Humble And Kind
## 4 4 2016 Hot Country… Dierks Bentley <NA> Somewhere On A…
## 5 5 2016 Hot Country… Jon Pardi <NA> Head Over Boots
## 6 6 2016 Hot Country… Cole Swindell <NA> You Should Be …
## 7 7 2016 Hot Country… Sam Hunt <NA> Break Up In A …
## 8 8 2016 Hot Country… Maren Morris <NA> My Church
## 9 9 2016 Hot Country… Blake Shelton <NA> Came Here To F…
## 10 10 2016 Hot Country… Kelsea Balleri… <NA> Peter Pan
## # … with 590 more rows
Here we have a table of 600 chart entries. This tibble will serve as the source of our data. In later steps we will collect audio information for each song from Spotify, along with its lyrics.
Before continuing, I recommend exploring the charts data frame and trying to understand how it was created. Run the code above function by function and line by line, and see what changes after each step.
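If you want to practice this without depending on Billboard.com, the same pipeline can be replayed step by step on made-up data. The sketch below uses a hypothetical stand-in, fake_year_end(), that returns a tiny invented tibble instead of real chart data; it is not part of bbcharts. Each intermediate result is saved so you can print and compare them.

```r
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-in for year_end_bb(): returns a tiny, made-up tibble
# instead of scraping Billboard.com, so every step runs offline.
fake_year_end <- function(chart, year) {
  tibble(
    rank  = 1:2,
    year  = year,
    chart = chart,
    title = paste(chart, year, c("song A", "song B"))
  )
}

# Step 1: every chart/year combination (2 charts x 2 years = 4 rows)
grid <- crossing(
  chart = c("hot-country-songs", "hot-rock-songs"),
  year  = 2017:2018
)

# Step 2: one small tibble per row, stored in the list column chart_ranks
step2 <- grid %>%
  mutate(chart_ranks = map2(chart, year, fake_year_end))

# Step 3: keep only the list column
step3 <- step2 %>%
  select(chart_ranks)

# Step 4: expand the nested tibbles into ordinary rows (4 x 2 = 8 rows)
step4 <- step3 %>%
  unnest(cols = chart_ranks)
```

Selecting chart_ranks before unnesting also matters here: the nested tibbles carry their own chart and year columns, so unnesting without dropping the outer ones first would produce duplicate column names.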