Release proposal: Nightly v1.0 #9604

BohuTANG · 2023-01-15T01:25:35Z

Summary

Release name: v1.0-nightly, get on the train now ✋
Let's make Databend more Lakehouse!

v1.0 (Prepare for release on March 5th)

Task	Status	Comments
(Query) Support Decimal data type #2931	DONE	high-priority (release in v1.0)
(Query) Query external stage file (parquet) #9847	DONE	high-priority (release in v1.0)
(Query) Array functions #7931	DONE	high-priority (release in v1.0)
(Query) Query Result Cache #10010	DONE	high-priority (release in v1.0)
(Planner) CBO #9597	DONE	high-priority (release in v1.0)
(Processor) Aggregation spilling #10273	DONE	high-priority (release in v1.0)
(Storage) Alter table #9441	DONE	high-priority (release in v1.0)
(Storage) Block data cache #9772	DONE	high-priority (release in v1.0)

Archive releases

Reference

What are Databend release channels?
Nightly v1.0 is part of our Roadmap 2023
Community website: https://databend.rs

xudong963 · 2023-01-15T08:11:37Z

Is there an expected time to release v1.0?

BohuTANG · 2023-01-15T09:12:32Z

> Is there an expected time to release v1.0?

The preliminary plan is to release in March, mainly focusing on alter table, update, and group by spill.

tangguoqiang172528725 · 2023-02-13T13:05:00Z

I hope you can simplify the way to insert data; it would help attract more users.

BohuTANG · 2023-02-24T00:09:27Z

Add Query Result Cache #10010

haydenflinner · 2023-02-24T16:12:59Z

> I hope you can simplify the way to insert data; it would help attract more users.

It's already the easiest to insert into of all the similar products I've tried. How would you like to insert?

@BohuTANG Are there any plans for higher-performance client reads, like maybe streaming Arrow/Parquet/some other high-perf format? I'm not familiar with other read protocols like for example ClickHouse's, I've just been using the mysql connector. But it would be neat to be able to have databend in the middle while paying little overhead vs reading the raw parquet files from S3.

BohuTANG · 2023-02-25T00:23:34Z

@haydenflinner

> But it would be neat to be able to have databend in the middle while paying little overhead vs reading the raw parquet files from S3.

Databend supports an ignore_result suffix that tells the server not to send the result to the client over the MySQL wire protocol.

For example:

mysql> select * from hits limit 20000;

20000 rows in set (0.53 sec)
Read 146370 rows, 101.91 MiB in 0.507 sec., 288.51 thousand rows/sec., 200.88 MiB/sec.

With ignore_result (the result is not sent to the client):

mysql> select * from hits limit 20000 ignore_result;
Empty set (0.26 sec)
Read 146370 rows, 101.91 MiB in 0.236 sec., 619.37 thousand rows/sec., 431.24 MiB/sec.
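To put the two runs side by side, here is a small arithmetic sketch in Python. It only reuses the figures printed above; the variable and function names are made up for illustration. Because it recomputes throughput from the rounded timings, its numbers differ slightly from the ones the server printed.

```python
# Figures copied from the two runs above.
ROWS = 146_370
SIZE_MIB = 101.91

runs = {
    "plain": {"client_sec": 0.53, "server_sec": 0.507},
    "ignore_result": {"client_sec": 0.26, "server_sec": 0.236},
}


def throughput(server_sec):
    """Return (thousand rows/sec, MiB/sec) for one run."""
    return ROWS / server_sec / 1000, SIZE_MIB / server_sec


for name, run in runs.items():
    krows, mib = throughput(run["server_sec"])
    print(f"{name}: {krows:.1f} thousand rows/sec, {mib:.1f} MiB/sec")

# Client-side wall-clock difference between the two runs.
saved = runs["plain"]["client_sec"] - runs["ignore_result"]["client_sec"]
share = saved / runs["plain"]["client_sec"]
print(f"difference: {saved:.2f} s ({share:.0%} of the plain run)")
```

In other words, about half of the wall-clock time of the plain query disappears when the result is not sent back to the client.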

haydenflinner · 2023-02-25T01:40:13Z

@BohuTANG That is neat and confirms my suspicion that the MySQL protocol is a bottleneck in some use cases. Parquet read speeds are in the GB/s range, but even when telling the MySQL client not to handle the result, we get only MB/s. This confirms the results in the paper I linked, see "Postgres++" in the final table of results vs "Postgres".

If one wanted to use databend as a simple intermediary between dataframes and S3 (more lakehouse style), databend still provides a lot of value in interactive query handling, file size and metadata management, a far simpler interface, etc. But it presents a bottleneck when it comes to raw read speed. If I wanted to do this, for example: df = pd.read_sql("select * from hits limit 1000000"), that would, I think, be 10x slower than df = pd.read_parquet("local-download-of-hits.parquet"). But I suspect that is primarily due to MySQL protocol overhead; the rest of databend is so fast I wouldn't expect it to get in the way much. I can file a ticket for this, don't let me derail the 1.0 thread, sorry 😄
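One way to check the "10x" guess rather than assume it would be a small timing harness like the sketch below. Everything in it is hypothetical: best_of and compare are illustrative helpers, and the two fake_* functions are sleep-based stand-ins so the snippet runs without a server or a parquet file. The real pd.read_sql and pd.read_parquet calls would be swapped in as noted in the comments.

```python
import time


def best_of(fn, repeats=3):
    """Run fn `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(loaders, repeats=3):
    """Time each named loader; return {name: slowdown relative to the fastest}."""
    timings = {name: best_of(fn, repeats) for name, fn in loaders.items()}
    fastest = min(timings.values())
    return {name: t / fastest for name, t in timings.items()}


# Stand-ins so the sketch runs anywhere; swap in the real calls, e.g.
#   lambda: pd.read_sql("select * from hits limit 1000000", conn)
#   lambda: pd.read_parquet("local-download-of-hits.parquet")
def fake_sql_read():
    time.sleep(0.02)


def fake_parquet_read():
    time.sleep(0.002)


print(compare({"read_sql": fake_sql_read, "read_parquet": fake_parquet_read}))
```

Taking the best of a few repeats reduces noise from caching and connection setup, which matters when the two paths differ by a small factor.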

haydenflinner · 2023-02-25T14:02:44Z

I believe the modern open source protocol most similar to what that paper describes is "Apache Arrow Flight"

sundy-li · 2023-02-25T14:11:28Z

> I believe the modern open source protocol most similar to what that paper describes is "Apache Arrow Flight"

Yes, we plan to do this in #9832.

If the query result is small, the MySQL client works fine, since OLTP result sets are usually small.

Otherwise, we should use other formats or protocols to handle large outputs (the MySQL client is really bad in this case).

You can use:

The unload command, to export the data to storage in Parquet/CSV format: https://databend.rs/doc/unload-data/
The HTTP/ClickHouse handler, to export the data:

curl 'http://default@localhost:8124/' --data-binary "select 1,2,3 from numbers(3) format TSV"

Wait for the Flight SQL feature, which we call the native client!

This paper did not cover clickhouse-client. But AFAIK, clickhouse-client is the best client/protocol I have ever seen.
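For anyone who wants to call the HTTP/ClickHouse handler from Python instead of curl, here is a minimal standard-library sketch. It assumes the same endpoint and user as the curl line above (localhost:8124, user default, empty password). The function names build_request, parse_tsv, and run_query are illustrative only and not part of any Databend client.

```python
import base64
import urllib.request


def build_request(query, url="http://localhost:8124/", user="default", password=""):
    """Build a POST request that carries the SQL text as the body, like curl --data-binary."""
    # curl turns `default@` in the URL into HTTP Basic auth; urllib needs the header set by hand.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=query.encode("utf-8"),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def parse_tsv(body):
    """Split a TSV response body into a list of rows, each a list of string fields."""
    return [line.split("\t") for line in body.splitlines() if line]


def run_query(query):
    """Send the query to the handler and return the parsed rows (needs a running server)."""
    with urllib.request.urlopen(build_request(query)) as resp:
        return parse_tsv(resp.read().decode("utf-8"))
```

With a server listening on that port, run_query("select 1,2,3 from numbers(3) format TSV") would issue the same request as the curl example. Fields come back as strings, so any type conversion is left to the caller.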

BohuTANG added roadmap-track Roadmap track issues Tracking labels Jan 15, 2023

BohuTANG pinned this issue Jan 15, 2023

BohuTANG mentioned this issue Jan 17, 2023

Roadmap 2023 #9448

Open

9 tasks

BohuTANG mentioned this issue Feb 6, 2023

Tracking: Large dataset insert and read #7823

Closed

50 tasks

BohuTANG mentioned this issue Mar 3, 2023

Release proposal: Nightly v1.1 #10334

Closed

6 tasks

BohuTANG closed this as completed Mar 5, 2023

BohuTANG unpinned this issue Mar 5, 2023

BohuTANG mentioned this issue Apr 14, 2023

Release proposal: Nightly v1.2 #11073

Closed

7 tasks

BohuTANG mentioned this issue Jun 27, 2023

Release proposal: Nightly v1.3 #11868

Open

8 tasks
