object_store: Support container@account.dfs.core.windows.net/path URL style for az protocol #7046

Open
daviewales opened this issue Jan 30, 2025 · 4 comments
Labels: enhancement, good first issue, help wanted

Comments

@daviewales

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

In adlfs (fsspec), there are two valid conventions for specifying a canonical path to Azure Data Lake Gen2 storage, including the account name:

az://container@account.dfs.core.windows.net/path-part/file
abfs://container@account.dfs.core.windows.net/path-part/file

While object_store supports both the az and abfs protocols, it does not support specifying the account in the URL for the az protocol.
However, it does support this for the abfs protocol.

Describe the solution you'd like
I would like to be able to use the following URL convention with object_store, which improves compatibility with adlfs:

az://container@account.dfs.core.windows.net/path-part/file

Describe alternatives you've considered
The workaround is to remember to use the abfs protocol in object_store-based tools such as polars.
However, it would be nice to be able to use the az protocol for everything.
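
Until the az form is accepted, the workaround above can be automated. This is a minimal sketch, assuming the only difference between the two conventions is the scheme; `az_to_abfs` is a hypothetical helper name, not part of object_store, adlfs, or polars.

```python
from urllib.parse import urlsplit, urlunsplit


def az_to_abfs(url: str) -> str:
    """Rewrite az://container@account.../path to the abfs scheme.

    URLs with a different scheme, or az URLs that carry no account
    (no '@' in the authority), are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "az" or "@" not in parts.netloc:
        return url
    # Only the scheme changes; authority, path, query and fragment are kept.
    return urlunsplit(("abfs", parts.netloc, parts.path, parts.query, parts.fragment))


print(az_to_abfs("az://container@account.dfs.core.windows.net/path-part/file"))
```

The rewritten string can then be handed to an object_store-based tool in place of the original az URL.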

daviewales added the enhancement label on Jan 30, 2025
daviewales changed the title from "[object_store]: Support container@account.dfs.core.windows.net/path URL style for az protocol" to "object_store: Support container@account.dfs.core.windows.net/path URL style for az protocol" on Jan 30, 2025
@tustvold
Contributor

tustvold commented Jan 30, 2025

This seems fine to me and should be a relatively straightforward change to MicrosoftAzureBuilder::parse_url.
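
To illustrate the parsing rule being requested, here is a rough Python sketch. It is not the Rust code in MicrosoftAzureBuilder::parse_url; `AzureLocation` and `parse_az_url` are hypothetical names, and the handling of an account-less `az://container/path` form is an assumption.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class AzureLocation(NamedTuple):
    container: str
    account: Optional[str]  # None when the URL does not name an account
    path: str


def parse_az_url(url: str) -> AzureLocation:
    """Treat az:// and abfs:// identically when splitting the authority."""
    parts = urlsplit(url)
    if parts.scheme not in ("az", "abfs"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if "@" in parts.netloc:
        # container@account.dfs.core.windows.net
        container, host = parts.netloc.split("@", 1)
        account = host.split(".", 1)[0]
    else:
        # Assumed account-less form: az://container/path
        container, account = parts.netloc, None
    return AzureLocation(container, account, parts.path.lstrip("/"))


print(parse_az_url("az://container@account.dfs.core.windows.net/path-part/file"))
```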

@daviewales
Author

As an extension to the above feature request, I can see that object_store supports the following URL schemes:

  • https://<account>.dfs.core.windows.net
  • https://<account>.blob.core.windows.net
  • https://<account>.blob.core.windows.net/<container>
  • https://<account>.dfs.fabric.microsoft.com
  • https://<account>.dfs.fabric.microsoft.com/<container>
  • https://<account>.blob.fabric.microsoft.com
  • https://<account>.blob.fabric.microsoft.com/<container>

However, it would be excellent if these could be extended to support the following generic structure:

  • https://<account>.<type>.<service>/<container>/<path>

where:

  • <account>: the name of the storage account
  • <type>: dfs or blob
  • <service>: core.windows.net or fabric.microsoft.com
  • <container>: the name of the container
  • <path>: the directory path within the container

This matches the URL structure used by Microsoft's AzCopy tool.
It also matches a supported URL structure for Azure Synapse's OPENROWSET function.
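
As a sketch of how the generic structure above decomposes, the following Python splits such a URL into its five parts. The names `AzureHttpsUrl` and `parse_azure_https_url` are hypothetical, and the allowed values for `<type>` and `<service>` are taken only from the list in this comment.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

STORAGE_TYPES = ("dfs", "blob")
SERVICES = ("core.windows.net", "fabric.microsoft.com")


class AzureHttpsUrl(NamedTuple):
    account: str
    storage_type: str  # the <type> placeholder: dfs or blob
    service: str
    container: str
    path: str


def parse_azure_https_url(url: str) -> AzureHttpsUrl:
    """Split https://<account>.<type>.<service>/<container>/<path>."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"expected https, got {parts.scheme!r}")
    pieces = (parts.hostname or "").split(".", 2)
    if len(pieces) != 3:
        raise ValueError(f"unexpected host: {parts.hostname!r}")
    account, storage_type, service = pieces
    if storage_type not in STORAGE_TYPES or service not in SERVICES:
        raise ValueError(f"unexpected host: {parts.hostname!r}")
    # First path segment is the container; the rest is the path inside it.
    container, _, path = parts.path.lstrip("/").partition("/")
    return AzureHttpsUrl(account, storage_type, service, container, path)


print(parse_azure_https_url("https://myaccount.dfs.core.windows.net/mycontainer/dir/file.csv"))
```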

@kylebarron
Contributor

The only addition you're suggesting is the <path>? You can use the PrefixStore to handle that
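
PrefixStore itself is a Rust wrapper in object_store, so the following Python only sketches the split it implies: a container-level URL for building the store, plus the remaining path to use as the prefix. `split_store_and_prefix` is a hypothetical helper name.

```python
from typing import Tuple
from urllib.parse import urlsplit, urlunsplit


def split_store_and_prefix(url: str) -> Tuple[str, str]:
    """Return (container-level URL, remaining path used as a prefix)."""
    parts = urlsplit(url)
    container, _, prefix = parts.path.lstrip("/").partition("/")
    store_path = "/" + container if container else ""
    # Query and fragment are dropped; only scheme, host and container remain.
    store_url = urlunsplit((parts.scheme, parts.netloc, store_path, "", ""))
    return store_url, prefix


print(split_store_and_prefix("https://account.blob.core.windows.net/container/some/dir"))
```

The first element is one of the already supported URL forms; the second is what would be passed as the prefix.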

@daviewales
Author

daviewales commented Feb 7, 2025

Also, the documentation lists

https://<account>.blob.core.windows.net/<container>

But not

https://<account>.dfs.core.windows.net/<container>

My goal is to be able to use the following in Polars:

pl.scan_csv('https://<account>.dfs.core.windows.net/<container>/path.csv')

But I've tried, and it didn't seem to work.

When I checked the object_store documentation, I saw that it supported https://<account>.dfs.core.windows.net, but not https://<account>.dfs.core.windows.net/<container>, so I thought that might be the issue.

I also saw in the documentation that other URL schemes were specified as supporting the <path>, but not the https://<account>.dfs.core.windows.net one.
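
In the meantime, a possible stopgap is to convert the https URL into the abfs form that object_store already accepts. This is a minimal sketch, assuming the host is the account's dfs endpoint and the first path segment is the container; `https_to_abfs` is a hypothetical helper, and the result has not been verified against Polars here.

```python
from urllib.parse import urlsplit


def https_to_abfs(url: str) -> str:
    """Convert https://<host>/<container>/<path> to abfs://<container>@<host>/<path>."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"expected an https URL, got {url!r}")
    container, _, path = parts.path.lstrip("/").partition("/")
    if not container:
        raise ValueError("URL has no container segment")
    return f"abfs://{container}@{parts.hostname}/{path}"


print(https_to_abfs("https://account.dfs.core.windows.net/container/path.csv"))
```

The returned string could then be passed to pl.scan_csv in place of the https URL, with credentials supplied however they are normally configured.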

3 participants