Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Enable delta.feature.allowColumnDefaults for all tables #343

Merged
merged 3 commits into from
May 30, 2024

Conversation

dhirschfeld
Copy link
Contributor

@dhirschfeld dhirschfeld commented Feb 2, 2024

Resolves #324

MCVE
from datetime import datetime as DateTime
from typing import TypeAlias

import sqlalchemy as sa
from databricks.sqlalchemy import TIMESTAMP
from sqlalchemy.orm import (
    DeclarativeBase,
    declared_attr,
    Mapped,
    mapped_column,
    MappedAsDataclass,
)
from sqlalchemy.types import (
    Double,
    String,
)
from typing_extensions import Annotated


float64: TypeAlias = Double
uint64: TypeAlias = sa.BigInteger
str32 = Annotated[str, 32]
str64 = Annotated[str, 64]

TABLE_NAME_MAPPING = {
    'Model': 'model_metadata',
}


class Base(MappedAsDataclass, DeclarativeBase):

    @declared_attr.directive
    def __tablename__(cls) -> str:
        return TABLE_NAME_MAPPING[cls.__name__]

    type_annotation_map = {
        float64: Double,
        uint64: sa.BigInteger,
        str32: String(32),
        str64: String(64),
    }

    insert_timestamp: Mapped[DateTime] = mapped_column(
        TIMESTAMP,
        init=False,
        nullable=False,
        server_default=sa.func.current_timestamp(),
    )


class Model(Base):
    __table_args__ = (
        sa.UniqueConstraint("name", "version", name='unique_key'),
    )
    pkid: Mapped[uint64] = mapped_column(
        sa.BigInteger,
        sa.Identity(always=True),
        init=False,
        primary_key=True,
    )
    name: Mapped[str64]
    version: Mapped[str32]

@dhirschfeld
Copy link
Contributor Author

I thought I'd get the ball rolling. Feel free to pick this up / push any changes you want.

@dhirschfeld
Copy link
Contributor Author

This seems to work - the tables have the correct TBLPROPERTIES DDL emitted :

2024-02-02 15:00:09,243 INFO sqlalchemy.engine.Engine CREATE SCHEMA IF NOT EXISTS metadata
2024-02-02 15:00:09,244 INFO sqlalchemy.engine.Engine [no key 0.00171s] {}
Databricks dialect ignores SQLAlchemy's autoincrement semantics. Use explicit Identity() instead.
Databricks does not support unique constraints
2024-02-02 15:00:09,391 INFO sqlalchemy.engine.Engine 
CREATE TABLE metadata.model_metadata (
	pkid BIGINT GENERATED ALWAYS AS IDENTITY, 
	name STRING NOT NULL, 
	version STRING NOT NULL, 
	insert_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL, 
	PRIMARY KEY (pkid)
) USING DELTA
TBLPROPERTIES('delta.feature.allowColumnDefaults' = 'enabled')


2024-02-02 15:00:09,393 INFO sqlalchemy.engine.Engine [no key 0.00164s] {}

...and are able to be created with the correct properties:

SHOW CREATE TABLE model_metadata;
CREATE TABLE `catalog`.metadata.model_metadata (
  pkid BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  name STRING NOT NULL,
  version STRING NOT NULL,
  insert_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT `model_metadata_pk` PRIMARY KEY (`pkid`))
USING delta
TBLPROPERTIES (
  'delta.checkpoint.writeStatsAsJson' = 'false',
  'delta.checkpoint.writeStatsAsStruct' = 'true',
  'delta.enableDeletionVectors' = 'true',
  'delta.feature.allowColumnDefaults' = 'supported',
  'delta.feature.deletionVectors' = 'supported',
  'delta.feature.identityColumns' = 'supported',
  'delta.feature.invariants' = 'supported',
  'delta.minReaderVersion' = '3',
  'delta.minWriterVersion' = '7')

Signed-off-by: Levko Kravets <levko.ne@gmail.com>
@kravets-levko kravets-levko merged commit c4e8103 into databricks:main May 30, 2024
10 of 11 checks passed
@dhirschfeld dhirschfeld deleted the column-defaults branch May 31, 2024 01:05
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[sqlalchemy] [Feature Request] Support default values
2 participants