Curating a Spotify Discover Playlist for Two People with K-Means Clustering

Choose a friend to discover brand new music with. We create an adventurous playlist curated to both of your tastes with machine learning.

Trustin Yoon
Towards Data Science


Photo by Morning Brew on Unsplash

iTunes revolutionized music. Spotify did it again.

Spotify is a necessity in my life. Not only because I have instant access to 50 million songs and over 1 million podcasts (patiently waiting for Rogan in September), but also because of the built-in recommendation features that magically tell me what my newest favorite songs are before I’ve even listened to them. Spotify has studied my listening history for the past 7 years and played a massive role in evolving my music tastes. My love for 90s underground neo-soul-boombap did not come from my Korean household or from the Top-50-Charts-loving Irvine suburbs that I grew up in. It came from falling into Spotify’s recommendation algorithm rabbit holes time and time again for hours on end. I have been able to discover favorite artists and diversify my genre palette far beyond what it would’ve been without Spotify.

And the single most crucial aid to my music-exploring journey has been Spotify’s Discover Weekly playlist. With access to the listening histories of over 110 million subscribers, Spotify data scientists flex their full ML power by curating 30 newly recommended songs for users every Monday.

Screenshot of my Discover Weekly playlist

Finding a hidden gem while scavenging through Spotify’s recommendations is such a pleasurable experience that I wondered if it was possible to make this into a collaborative experience. I wanted to create a Discover Weekly playlist that learns two people’s music tastes and recommends songs that are curated for both of them. With quarantine still being a thing, I thought that this “Discover Together” playlist could serve as a novel socializing activity. So I partnered up with my friend Arjun Reddy, and we were ready to build.

Check out the full GitHub repo here: https://github.com/trustinyoon/Spotify-Discover-2gether

Data Collection

As a test, Arjun and I decided to automate a Discover Together playlist curated for our combined music tastes. We used Spotipy, a lightweight Python library for the Spotify Web API, and created a Spotify Developers app to gain authorized access to Spotify profile data. Here is the official Spotipy documentation: https://spotipy.readthedocs.io/en/2.13.0/

We also forked the authorization flow code from Plegas Gerasimos’s repo, which can be found here: https://github.com/makispl/Spotify-Data-Analysis

Then we wrote code that authorizes access to a user’s Spotify profile and extracts their top 50 medium-term songs using Spotipy’s current_user_top_tracks() function. medium_term was the best time frame for our purposes: short_term covers too little history (roughly the last 4 weeks of listening) and long_term covers too much (basically a user’s all-time song history). Each user’s top 50 medium-term songs were written to a .csv file, once for each of us.
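To make the extraction step concrete, here is a minimal sketch of how it could look. It is not the code from our repo: the helper names fetch_top_tracks and save_tracks_csv and the three CSV columns are assumptions chosen for illustration, and sp is assumed to be an already authorized spotipy.Spotify client with the user-top-read scope.

```python
import csv


def fetch_top_tracks(sp, limit=50, time_range="medium_term"):
    """Return a user's top tracks as a list of flat dicts.

    `sp` is assumed to be an authorized spotipy.Spotify client whose
    token includes the "user-top-read" scope.
    """
    response = sp.current_user_top_tracks(limit=limit, time_range=time_range)
    rows = []
    for item in response["items"]:
        rows.append({
            "id": item["id"],
            "name": item["name"],
            # A track can have several artists; join them into one string.
            "artists": ", ".join(artist["name"] for artist in item["artists"]),
        })
    return rows


def save_tracks_csv(rows, path):
    """Write the rows from fetch_top_tracks() to a .csv file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name", "artists"])
        writer.writeheader()
        writer.writerows(rows)
```

With Spotipy installed and the SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET, and SPOTIPY_REDIRECT_URI environment variables set, a client can be created with spotipy.Spotify(auth_manager=SpotifyOAuth(scope="user-top-read")) and passed in as sp. Each user runs this once under their own login.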

We combined both of our top 50 songs into one dataframe of 100 songs, which we used for analysis and as the basis for our song recommender. Each song has official data provided by Spotify regarding its different sonic features: danceability, energy, key (musical key), loudness, mode (minor or major), speechiness, acousticness, instrumentalness, liveness, valence (how positive/happy/cheerful it sounds), and tempo. These features are used to create clusters from our dataframe. We chose to focus on danceability, energy, speechiness, acousticness, valence, and tempo after concluding that the other features were noise for our analysis. Here is a distribution plot grid of the features of our favorite recent songs.
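As a rough illustration of the combining step, the sketch below pulls the six chosen features for both users' tracks into one list of rows. The function name build_feature_rows, the user labels, and the row layout are assumptions for this example; sp is again assumed to be an authorized spotipy.Spotify client.

```python
# The six features we kept for clustering.
FEATURES = ["danceability", "energy", "speechiness",
            "acousticness", "valence", "tempo"]


def build_feature_rows(sp, track_ids_by_user):
    """Combine both users' tracks into one list of feature rows.

    `track_ids_by_user` maps a user label to that user's list of track IDs.
    """
    rows = []
    for user, track_ids in track_ids_by_user.items():
        # audio_features() accepts at most 100 track IDs per call,
        # so a single call covers one user's top 50.
        for features in sp.audio_features(tracks=track_ids):
            if features is None:  # returned when a track has no feature data
                continue
            row = {"user": user, "id": features["id"]}
            row.update({name: features[name] for name in FEATURES})
            rows.append(row)
    return rows
```

The resulting list can be turned into a dataframe with pandas.DataFrame(rows), which gives one row per song and one column per feature.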

K-Means Clustering

Now that we had both users’ top 50 songs, we could use those tracks to recommend new songs curated for both people’s recent music tastes. We chose K-means clustering so that each song in our combined set is grouped into a cluster (a sub-genre or type of song) with similar sonic features. We chose k=20 even though the elbow method suggested an optimal k=6, since we wanted more specific clusters and wanted to base our recommendations on less generalized groups. Songs are clustered together if their positions in the six-dimensional feature space are close in Euclidean distance. Below is the 3D visualization of the 100 songs labeled by their respective clusters.
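To show what the clustering does under the hood, here is a dependency-free sketch of K-means (Lloyd's algorithm) on lists of feature values. It is an illustration, not our implementation: in practice a library such as scikit-learn's KMeans is the usual choice. The min-max scaling step is also an assumption added here, because tempo is measured in BPM while the other five features lie between 0 and 1, so unscaled tempo would dominate the Euclidean distance.

```python
import math
import random


def min_max_scale(points):
    """Rescale every column to [0, 1] so tempo (BPM) does not dominate."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(p, lows, highs)]
        for p in points
    ]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    # Start from k randomly chosen songs as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each song joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    # Inertia: total squared distance from each song to its centroid.
    inertia = sum(math.dist(p, centroids[lab]) ** 2
                  for p, lab in zip(points, labels))
    return labels, centroids, inertia
```

The elbow method amounts to calling kmeans for a range of k values and plotting the returned inertia: the "elbow" is the k after which the inertia stops dropping sharply. Picking a larger k than the elbow, as we did, trades some statistical tidiness for smaller, more specific clusters.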

When visualizing, some information is lost since we are flattening 6 features onto 3 dimensions. We can still see the 20 clusters, with the songs in a given cluster lying close to each other in Euclidean distance.

We can further explore the summary of each cluster below.

Histogram of number of songs per cluster (left); distribution of each cluster’s mean song features (right)

Building the Discover Together Playlist

Once we had a list of songs for each cluster, we fed each list into Spotipy’s recommendations() function, which accepts up to 5 seed tracks to base its recommendations on. For clusters that contained more than 5 songs, we randomly chose 5 from the list to represent the cluster. Clusters that contained fewer than 5 songs were left as is. We ran the recommendations function for each cluster, requesting 1 recommended track for clusters with fewer than 5 songs and 2 recommended tracks for clusters with 5 or more songs. We wanted to place more weight on the clusters with more songs, since that meant we generally enjoyed songs in those clusters more than songs in the smaller clusters. After fetching all the recommended tracks, we passed them as a list to the Spotipy function that automatically creates a playlist with the given songs and places it straight into the user’s Spotify library.
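The seed selection and weighting described above can be sketched as follows. The helper names pick_seeds, recommend_for_clusters, and create_playlist are hypothetical, the playlist is created as private purely as an example, and sp is assumed to be an authorized spotipy.Spotify client with a playlist-modifying scope such as playlist-modify-private.

```python
import random


def pick_seeds(track_ids, rng, max_seeds=5):
    """Use every track as a seed, or a random 5 if the cluster is larger."""
    if len(track_ids) > max_seeds:
        return rng.sample(track_ids, max_seeds)
    return list(track_ids)


def recommend_for_clusters(sp, clusters, seed=None):
    """Collect recommended track IDs, weighting larger clusters more.

    `clusters` maps a cluster label to the list of track IDs in it.
    """
    rng = random.Random(seed)
    recommended = []
    for track_ids in clusters.values():
        if not track_ids:
            continue  # nothing to seed from
        seeds = pick_seeds(track_ids, rng)
        # Clusters with a full set of 5 seeds get 2 recommendations,
        # smaller clusters get 1.
        limit = 2 if len(seeds) == 5 else 1
        response = sp.recommendations(seed_tracks=seeds, limit=limit)
        recommended.extend(track["id"] for track in response["tracks"])
    return recommended


def create_playlist(sp, name, track_ids):
    """Create a playlist in the current user's library and fill it."""
    user_id = sp.current_user()["id"]
    playlist = sp.user_playlist_create(user_id, name, public=False)
    sp.playlist_add_items(playlist["id"], track_ids)
    return playlist["id"]
```

With 20 clusters this yields between 20 and 40 tracks, which fits in a single playlist_add_items call. The same song can be recommended for two different clusters, so removing duplicates before creating the playlist may be worthwhile.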

Conclusion

After a week of coding, we finally automated the Discover Together Playlist! Here’s what it looks like!

Although there were a few songs that we had already listened to, we were generally very pleased with the quality of the recommendations, and each of us added a couple of them to our personal playlists!

“Discover Together” hasn’t been developed into a functioning public web app yet. We’re looking for any app developer who is comfortable working with JSON, web APIs, or UI/UX and who would be interested in helping us turn this project into a usable app for any two people who want to explore new music together! Please contact us at trustinyoon@gmail.com or arjunravulareddy@gmail.com if you’re interested!

___________________________________________________________________

9/14/20 UPDATE: We have built a functioning web app and are looking for awesome UI/UX and graphic designers to help with our logo, landing page aesthetic, and marketing materials! Email trustinyoon@gmail.com with your portfolio link if interested :)
