Enable more authentication options for Databricks data source #2087

Open
ghjklw opened this issue May 22, 2024 · 2 comments
ghjklw commented May 22, 2024

Soda Core uses databricks.sql.connect for authentication, which offers many options, as documented.

Unfortunately, the way this is implemented by soda.data_sources.spark_data_source.databricks_connection_function limits it to personal access tokens:

def databricks_connection_function(host: str, http_path: str, token: str, database: str, schema: str, **kwargs):
    from databricks import sql

    user_agent_entry = f"soda-core-spark/{SODA_CORE_VERSION} (Databricks)"
    logging.getLogger("databricks.sql").setLevel(logging.INFO)
    connection = sql.connect(
        server_hostname=host,
        catalog=database,
        schema=schema,
        http_path=http_path,
        access_token=token,
        _user_agent_entry=user_agent_entry,
    )
    return connection

Likewise in SparkDataSource:

connection = connection_function(
    username=self.username,
    password=self.password,
    host=self.host,
    port=self.port,
    database=self.database,
    auth_method=self.auth_method,
    kerberos_service_name=self.kerberos_service_name,
    driver=self.driver,
    token=self.token,
    schema=self.schema,
    http_path=self.http_path,
    organization=self.organization,
    cluster=self.cluster,
    server_side_parameters=self.server_side_parameters,
    configuration=self.configuration,
    scheme=self.scheme,
)

A solution could be to extend the signature of databricks_connection_function to match databricks.sql.connect, for example:

def databricks_connection_function(
    host: str,
    http_path: str,
    database: str,
    schema: str,
    auth_type: Literal["databricks-oauth"] | None = None,
    token: str | None = None,
    username: str | None = None,
    password: str | None = None,
    client_id: str | None = None,
    client_secret: str | None = None,
):
  ...

These could then be passed through to databricks.sql.connect (with the exception of client_id and client_secret, which, if defined, require the creation of a credentials provider).
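To make the pass-through idea concrete, here is a minimal sketch of how the arguments might be mapped onto keyword arguments for databricks.sql.connect. The helper name build_connect_kwargs, the credentials_provider_factory parameter, and the precedence between options are assumptions made for illustration; none of them exist in soda-core. The factory stands in for whatever builds the credentials provider from client_id and client_secret (for example, something based on the OAuth service principal helper in the Databricks SDK).

```python
from typing import Any, Callable, Dict, Optional


def build_connect_kwargs(
    host: str,
    http_path: str,
    database: str,
    schema: str,
    auth_type: Optional[str] = None,
    token: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    client_id: Optional[str] = None,
    client_secret: Optional[str] = None,
    # Hypothetical hook: given (host, client_id, client_secret), returns the
    # object to hand to sql.connect as credentials_provider.
    credentials_provider_factory: Optional[Callable[[str, str, str], Any]] = None,
) -> Dict[str, Any]:
    """Translate the proposed arguments into kwargs for databricks.sql.connect."""
    kwargs: Dict[str, Any] = {
        "server_hostname": host,
        "http_path": http_path,
        "catalog": database,
        "schema": schema,
    }

    # The precedence below is an assumption, not documented behaviour.
    if client_id and client_secret:
        # These two cannot be forwarded directly; they need a credentials provider.
        if credentials_provider_factory is None:
            raise ValueError("client_id/client_secret require a credentials provider factory")
        kwargs["credentials_provider"] = credentials_provider_factory(host, client_id, client_secret)
    elif token:
        # Current behaviour: personal access token.
        kwargs["access_token"] = token
    elif username and password:
        kwargs["username"] = username
        kwargs["password"] = password
    elif auth_type:
        # For example "databricks-oauth", as in the proposed signature.
        kwargs["auth_type"] = auth_type
    else:
        raise ValueError("no authentication option was provided")

    return kwargs
```

With something like this in place, databricks_connection_function could call sql.connect(**build_connect_kwargs(...), _user_agent_entry=user_agent_entry), and SparkDataSource would also need to forward the new fields.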

Adding these options (in particular OAuth) would allow much more secure and robust connection alternatives!

@tools-soda

SAS-3512

benjamin-pirotte commented May 24, 2024

Hi, thank you for creating the ticket! I will add the request to our backlog and prioritize accordingly.
If you have time, feel free to contribute; it would be greatly appreciated! See https://github.com/sodadata/soda-core/blob/main/CONTRIBUTING.md
