Spotify Recently Played Songs

2022-04-20

GitHub

banner

* ETL made with draw.io

Summary

The Spotify API documentation is very well written and can be used as a walkthrough for a wide variety of project applications. In this project I set up a daily extraction of my recently played tracks using Python for the data processing and Apache Airflow as the scheduler. The resulting data is appended to a database on a local Microsoft SQL Server.

Here is the resulting table:

table

Data Extraction

headers = {  
    'Accept' : 'application/json',  
    'Content-Type' : 'application/json',  
    'Authorization' : 'Bearer {token}'.format(token=token)  
}  
  
today = datetime.now() + timedelta(hours=4)  
yesterday = today - timedelta(days=1)  
yesterday_unix_timestamp = int(yesterday.timestamp()) * 1000  
r = requests.get('https://api.spotify.com/v1/me/player/recently-played?after={time}'.format(time=yesterday_unix_timestamp), headers=headers)  
  
data = r.json()

Data Transformation and Validation

for song in data['items']:  
    primary_key.append(song['played_at'])  
    song_names.append(song['track']['name'])  
    artist_names.append(song['track']['artists'][0]['name'])  
    timestamps.append(song['played_at'][0:10])  
  
song_dict = {  
    'primary_key' : primary_key,  
    'song_name' : song_names,  
    'artist_name' : artist_names,  
    'timestamp' : timestamps  
}  
  
song_df = pd.DataFrame(song_dict, columns = ['primary_key', 'song_name', 'artist_name', 'timestamp'])
def check_if_valid_data(df) -> bool:  
    if df.empty:  
        print('No songs downloaded')  
        quit()  
  
    if pd.Series(df['primary_key']).is_unique:  
        pass  
    else:  
        raise Exception('Primary Key Check Violated')  
  
    if df.isnull().values.any():  
        raise Exception('Null value found')  
  
    yesterday = datetime.now() - timedelta(hours=20)  
    timestamps = df['primary_key']  
    for timestamp in timestamps:  
        played_time = datetime.strptime(timestamp[:19], '%Y-%m-%dT%H:%M:%S')  
        if played_time < yesterday:  
            raise Exception('At least one song is not from the past 24 hrs')  
  
    return True

Loading Data into MS SQL Server

Initiating table:

driver = '{ODBC DRIVER 17 for SQL Server}'  
server = 'localhost\server_name'  
database = 'db_name'  
  
connection_string = '''  
Driver={driver};  
Server={server};  
Database={database};  
Trusted_Connection=yes;  
'''.format(  
    driver=driver,  
    server=server,  
    database=database  
)  
  
conn = pyodbc.connect(connection_string)  
cursor = conn.cursor()  

create_table = '''  
IF (  
    NOT EXISTS (
		SELECT *
		FROM INFORMATION_SCHEMA.TABLES  
		WHERE TABLE_NAME = 'my_played_tracks'        
		)    
)BEGIN  
    CREATE TABLE my_played_tracks (
		primary_key VARCHAR(200) NOT NULL PRIMARY KEY,
		song_name VARCHAR(200),
		artist_name VARCHAR(200),
		timestamp VARCHAR(200)
	)  
END;  
'''  
  
cursor.execute(create_table)  
conn.commit()

Loading Data:

quoted = urllib.parse.quote_plus(connection_string)  
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))  
song_df.to_sql('my_played_tracks', engine, index=False, if_exists='append')

conn.close()

Scheduling with Apache Airflow

Airflow was installed locally for this project by following the Apache documentation.

airflow webserver -p 8080
airflow scheduler

Airflow Windows: https://towardsdatascience.com/run-apache-airflow-on-windows-10-without-docker-3c5754bb98b4

OAuth 2.0 Authorization

In order to extract the song information from Spotify's API we need an 'Access Token'. This can be acquired manually from Spotify. If you only require temporary access, you can skip this next part, however a code would have to be manually produced and copied into the program every time it runs.

In order to programatically request an 'Access Token', authorization must be granted through the OAuth 2.0 framework. This program follows the Authorization Code Flow. 3 pieces of information are required for the end product:

  • client_id
  • client_secret
  • refresh_token

Getting these variables involves creating a Spotify Dev App and manually authorizing the app in a browser.

Spotify Dev App

Spotify has an App Guide that describes how to create an App. The 'client_id' and 'client_secret' are both at the top of the app page.

Remember to add a 'Redirect URI' to the app (I used https://localhost:8000/callback). It doesn't have to be a real website, it is just required to have a trusted destination for Spotify to send the 'authorization code' required for the next step.

One-Time Manual User Authorization

The rest of this setup will follow the Authorization Code Flow document.

The goal is to acquire a 'Refresh Token' which will allow us to request and receive a new 'Access Token' programatically. In order to get this token a 'GET request' must be sent to the '/authorize' endpoint. This GET request can be as simple as a browser url:

https://accounts.spotify.com/authorize?client_id=<client_id_from_app>&scope=user-read-recently-played&response_type=code&redirect_uri=https%3A%2F%2Flocalhost%3A8888%2Fcallback

Parameters:

  • client_id: ID from your newly created app
  • response_type: code
  • redirect_uri: The url added to the app settings, encoded here
  • scope: user-read-recently-played (or any [other scopes])
    • Use ''%20' as a space to separate multiple scopes

After building this URL, authorization is requested:

permission

After the redirect, the 'authorization code' is appended to the url in the browser:

url

Run a curl command with the code to get an Access Token and a Refresh Token

curl -H "Authorization: Basic <base64 encoded client_id:secret_id>" \
	-H "Content-Type: application/x-www-form-urlencoded" \
	-d grant_type=authorization_code \
	-d code=AQBicXEN4K...qU_2-Pr0Q \
	-d redirect_uri=https%3A%2F%2Flocalhost%3A8888%2Fcallback \
	https://accounts.spotify.com/api/token

Parameters:

  • Authorization: Basic <base64 encoded client_id:secret_id>
    • Ex. paste your 'client_id:secret_id' string into here
  • Content-Type: application/x-www-form-urlencoded
  • grant_type=authorization_code
  • code=AQBicXEN4Ks...WYAqU_2-Pr0Q
    • Given by Spotify in the redirect URL
  • redirect_uri=https%3A%2F%2Flocalhost%3A8000%2Fcallback
    • Only used for verification

The response should look like this:

{
   "access_token": "NgCXRK...MzYjw",
   "token_type": "Bearer",
   "scope": "user-read-private user-read-email",
   "expires_in": 3600,
   "refresh_token": "NgAagA...Um_SHo"
}

The 'access_token' can be used to authenticate connection the API, however it expires in 1 hour. The 'refresh_token', however, can be used along with the base64 encoded app credentials to get another 1 hour 'access_token'--giving unlimited access.

The code below is used to acquire a new 'access_token' at each run:

def refresh(self):  
  
    query = 'https://accounts.spotify.com/api/token'  
  
    response = requests.post(query,  
                             data={'grant_type': 'refresh_token',  
                                   'refresh_token': refresh_token},  
                             headers={'Authorization': 'Basic ' + base_64})  
    return response.json()['access_token']

For setting the scope of the returned data:

Others Implementations: