If a malicious URI is passed to the library, the library can be tricked into performing an operation on a different API endpoint than intended.
Before the security advisory feature was enabled on GitHub, I was already in contact with Stéphane Bruckert via e-mail, and he asked me to look into a potential fix.
My recommendation is to perform stricter parsing of URLs and URIs, which I implemented in the patch included at the end of the report. If you prefer, I can also invite you to a private fork of the repository.
The impact of this vulnerability depends heavily on what operations a client application performs when it handles a URI from a user and how it uses the responses it receives from the API.
From 30cf29b16e893dcac974dbd7481fb58a073b853c Mon Sep 17 00:00:00 2001
From: Shaderbug <119610832+Shaderbug@users.noreply.github.com>
Date: Tue, 10 Jan 2023 19:26:18 +0100
Subject: [PATCH] Improve URL and URI handling
---
spotipy/client.py | 61 +++++++++++++++-----
tests/integration/non_user_endpoints/test.py | 6 +-
2 files changed, 49 insertions(+), 18 deletions(-)
diff --git a/spotipy/client.py b/spotipy/client.py
index d7025a9..b094947 100644
--- a/spotipy/client.py
+++ b/spotipy/client.py
@@ -6,6 +6,7 @@ __all__ = ["Spotify", "SpotifyException"]
import json
import logging
+import re
import warnings
import requests
@@ -96,6 +97,29 @@ class Spotify(object):
"US",
"UY"]
+ # Spotify URI scheme defined in [1], and the ID format as base-62 in [2].
+ #
+ # Unfortunately the IANA specification is out of date and doesn't include the new types
+ # show and episode. Additionally, for the user URI, it does not specify which characters
+ # are valid for usernames, so the assumption is alphanumeric which coincidentially are also
+ # the same ones base-62 uses.
+ # In limited manual exploration this seems to hold true, as newly accounts are assigned an
+ # identifier that looks like the base-62 of all other IDs, but some older accounts only have
+ # numbers and even older ones seemed to have been allowed to freely pick this name.
+ #
+ # [1] https://www.iana.org/assignments/uri-schemes/prov/spotify
+ # [2] https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids
+ _regex_spotify_uri = r'^spotify:(?P<type>track|artist|album|playlist|show|episode|user):(?P<id>[0-9A-Za-z]+)$'
+
+ # Spotify URLs are defined at [1]. The assumption is made that they are all
+ # pointing to open.spotify.com, so a regex is used to parse them as well,
+ # instead of a more complex URL parsing function.
+ #
+ # [1] https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids
+ _regex_spotify_url = r'^(http[s]?:\/\/)?open.spotify.com\/(?P<type>track|artist|album|playlist|show|episode|user)\/(?P<id>[0-9A-Za-z]+)(\?.*)?$'
+
+ _regex_base62 = r'^[0-9A-Za-z]+$'
+
def __init__(
self,
auth=None,
@@ -1940,20 +1964,27 @@ class Spotify(object):
return path
def _get_id(self, type, id):
- fields = id.split(":")
- if len(fields) >= 3:
- if type != fields[-2]:
- logger.warning('Expected id of type %s but found type %s %s',
- type, fields[-2], id)
- return fields[-1].split("?")[0]
- fields = id.split("/")
- if len(fields) >= 3:
- itype = fields[-2]
- if type != itype:
- logger.warning('Expected id of type %s but found type %s %s',
- type, itype, id)
- return fields[-1].split("?")[0]
- return id
+ uri_match = re.search(Spotify._regex_spotify_uri, id)
+ if uri_match is not None:
+ uri_match_groups = uri_match.groupdict()
+ if uri_match_groups['type'] != type:
+ raise ValueError("Unexpected Spotify URI type.")
+ else:
+ return uri_match_groups['id']
+
+ url_match = re.search(Spotify._regex_spotify_url, id)
+ if url_match is not None:
+ url_match_groups = url_match.groupdict()
+ if url_match_groups['type'] != type:
+ raise ValueError("Unexpected Spotify URL type.")
+ else:
+ return url_match_groups['id']
+
+ # Raw identifiers might be passed, ensure they are also base-62
+ if re.search(Spotify._regex_base62, id) is not None:
+ return id
+
+ raise ValueError("Unsupported URL / URI")
def _get_uri(self, type, id):
if self._is_uri(id):
@@ -1962,7 +1993,7 @@ class Spotify(object):
return "spotify:" + type + ":" + self._get_id(type, id)
def _is_uri(self, uri):
- return uri.startswith("spotify:") and len(uri.split(':')) == 3
+ return re.search(Spotify._regex_spotify_uri, uri) is not None
def _search_multiple_markets(self, q, limit, offset, type, markets, total):
if total and limit > total:
diff --git a/tests/integration/non_user_endpoints/test.py b/tests/integration/non_user_endpoints/test.py
index 96ee4da..116e1d9 100644
--- a/tests/integration/non_user_endpoints/test.py
+++ b/tests/integration/non_user_endpoints/test.py
@@ -280,7 +280,7 @@ class AuthTestSpotipy(unittest.TestCase):
try:
self.spotify.track(self.bad_id)
self.assertTrue(False)
- except SpotifyException:
+ except ValueError:
self.assertTrue(True)
def test_show_urn(self):
@@ -296,7 +296,7 @@ class AuthTestSpotipy(unittest.TestCase):
self.assertTrue(show['name'] == 'Heavyweight')
def test_show_bad_urn(self):
- with self.assertRaises(SpotifyException):
+ with self.assertRaises(ValueError):
self.spotify.show("bogus_urn", market="US")
def test_shows(self):
@@ -333,7 +333,7 @@ class AuthTestSpotipy(unittest.TestCase):
self.assertTrue(episode['name'] == '#1 Buzz')
def test_episode_bad_urn(self):
- with self.assertRaises(SpotifyException):
+ with self.assertRaises(ValueError):
self.spotify.episode("bogus_urn", market="US")
def test_episodes(self):
--
2.34.1
Summary
If a malicious URI is passed to the library, the library can be tricked into performing an operation on a different API endpoint than intended.
Details
The code Spotipy uses to parse URIs and URLs accepts user data too liberally which allows a malicious user to insert arbitrary characters into the path that is used for API requests. Because it is possible to include
..
, an attacker can redirect for example a track lookup viaspotifyApi.track()
to an arbitrary API endpoint like playlists, but this is possible for other endpoints as well.Before the security advisory feature was enabled on GitHub, I was already in contact with Stéphane Bruckert via e-mail, and he asked me to look into a potential fix.
My recommendation is to perform stricter parsing of URLs and URIs, which I implemented in the patch included at the end of the report. If you prefer, I can also invite you to a private fork of the repository.
PoC
The POC expects
SPOTIFY_CLIENT_ID
andSPOTIFY_CLIENT_SECRET
environment variables to be set to authenticate against the API.Impact
The impact of this vulnerability depends heavily on what operations a client application performs when it handles a URI from a user and how it uses the responses it receives from the API.
Possible Patch
Caviats of this patch
ValueError
if it cannot parse an ID, instead of logging a warning or silently passing back whatever it received as input.ValueError
that didn't require a valid user session, other tests may also need adjustment