Skip to content

Add the ability to create tables with deeply nested schemas in SQL #11746

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
Tracked by #2326
adragomir opened this issue Jul 31, 2024 · 2 comments
Closed
Tracked by #2326

Add the ability to create tables with deeply nested schemas in SQL #11746

adragomir opened this issue Jul 31, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@adragomir
Copy link

Is your feature request related to a problem or challenge?

At the moment, we cannot create tables with deeply nested columns, for example a column that has a map of string to list of structs etc.
We should have the ability to create deeply nested schemas in SQL

Describe the solution you'd like

DuckDB has in its SQL DML dialect some markers / functions that allow this:

CREATE OR REPLACE TABLE test1 (
	struct1 STRUCT(
	  name VARCHAR,
	  bools BOOLEAN,
	  uint32 UINTEGER,
          -- a list of strings
	  tags VARCHAR[]
	),
        -- a list of structs
	list_struct STRUCT(
	  bools BOOLEAN,
	  uint32 UINTEGER,
          -- a list of ints
	  ints32 INTEGER[]
	)[],
	struct_list STRUCT(
	  bools BOOLEAN,
	  uint32 UINTEGER,
          -- a list of structs
	  products STRUCT(
	    qty INT4,
	    name VARCHAR
	  )[]
	),
        -- map of string to map
	map_map_struct MAP(
	  VARCHAR,
          -- map of string to struct
	  MAP(
	    VARCHAR,
	    STRUCT(
	      status VARCHAR,
	      changed INTEGER
	    )
	  )
	)
);

We should have either syntax or functions that we could use to have the same result

Describe alternatives you've considered

No response

Additional context

No response

@jayzhan211
Copy link
Contributor

jayzhan211 commented Aug 9, 2024

statement error DataFusion error: This feature is not implemented: Unsupported CAST from List\(Field \{ name: "item", data_type: Struct\(\[Field \{ name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}, Field \{ name: "info", data_type: Struct\(\[Field \{ name: "color", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}, Field \{ name: "size", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\]\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\]\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\) to Struct\(\[Field \{ name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}, Field \{ name: "info", data_type: List\(Field \{ name: "item", data_type: Struct\(\[Field \{ name: "color", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}, Field \{ name: "size", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\]\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\]\)
create table t (s struct<name varchar, info struct<color varchar, size int>>[]) as values 
(
    [
        {'name': 'n1', 'info': {'color': 'r', 'size': 1}}, 
        {'name': 'n2', 'info': {'color': 'b', 'size': 2}}
    ]
);

We need to support such syntax in sqlparser.
List of struct is tracked in apache/datafusion-sqlparser-rs#1372

Since we mirror the behavior from duckdb for struct and map mostly, I prefer we switch to duckdb syntax for them in datafusion, so we can check against the result with copy&paste easily

TODO:

  • Support syntax for struct in map
    CREATE TABLE tb2 (col MAP(INTEGER,STRUCT(a INTEGER, b VARCHAR)))

@alamb
Copy link
Contributor

alamb commented Feb 13, 2025

I think this is woring wekk now

> create table t as values
(
    [
        {'name': 'n1', 'info': {'color': 'r', 'size': 1}},
        {'name': 'n2', 'info': {'color': 'b', 'size': 2}}
    ]
);
0 row(s) fetched.
Elapsed 0.002 seconds.

> select * from t;
+--------------------------------------------------------------------------------+
| column1                                                                        |
+--------------------------------------------------------------------------------+
| [{name: n1, info: {color: r, size: 1}}, {name: n2, info: {color: b, size: 2}}] |
+--------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
> create table t (s struct<name varchar, info struct<color varchar, size int>>[]);
0 row(s) fetched.
Elapsed 0.001 seconds.

> select * from t;
+---+
| s |
+---+
+---+
0 row(s) fetched.
Elapsed 0.001 seconds.

@alamb alamb closed this as completed Feb 13, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants