
feat: complete after error syntax #334


Merged · 4 commits merged on May 8, 2025


liuxy0551 (Collaborator) commented Jul 25, 2024

Auto-complete after SQL that contains syntax errors

Current behavior (examples; | marks the cursor position)

  1. A syntax error in the preceding SQL prevents keywords such as INSERT from being suggested accurately at the cursor position:
SELECT FROM tb1;
I|


  2. The SELECT * FROM that follows the erroneous statement is parsed as multiple statements, so it cannot be auto-completed accurately:
SELECT FROM tb1;
SELECT * FROM |


Expected behavior (examples)

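  1. When a semicolon separates the statements, use the position right after the semicolon as the left boundary, keep the right boundary unchanged, and pass the content in that range to antlr4-c3 for parsing. Keywords such as INSERT should then be suggested:
SELECT FROM tb1;
I|
  2. Without a semicolon separator, there is no way to tell that the SQL statement on the first line has ended, so accurate auto-completion is not possible in this case:
SELECT FROM tb1
I|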

Approach

For the left and right boundaries mentioned here, see the description in dt-sql-parser #231.

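Split on the delimiter (usually ;). The existing strategy of finding the smallest suitable range is kept and optimized further, narrowing the parse range in two additional ways: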

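  1. Within the suitable range already obtained, search leftward from the cursor for the tokenIndex of ; and use it as the left boundary;
  2. Within the suitable range already obtained, search rightward from the cursor for the tokenIndex of ; and use it as the right boundary.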

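When writing SQL, people generally do not type the current statement's ; first, so the right boundary usually does not change again. If no ; is found on the left or the right, the corresponding boundary is left unchanged.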
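A minimal sketch of the two narrowing steps, assuming a hypothetical `SimpleToken` shape (only `text` and `tokenIndex`) in place of real ANTLR tokens and a hypothetical `narrowBySemicolon` helper; it is an illustration, not the dt-sql-parser implementation. Following the expected behavior described above, the left edge moves to the token right after the nearest `;`, the right edge moves to the nearest `;` itself, and a side with no `;` keeps its original boundary.

```typescript
// Hypothetical token shape; the real parser works with ANTLR Token objects.
interface SimpleToken {
  text: string;
  tokenIndex: number;
}

// Inclusive token-index range that would be handed to the completion engine.
interface TokenRange {
  startTokenIndex: number;
  stopTokenIndex: number;
}

/**
 * Narrow an already-computed minimum range around the caret using the nearest ';'
 * on each side. caretTokenIndex is assumed to be the index of the token the caret is on.
 */
function narrowBySemicolon(
  tokens: SimpleToken[],
  range: TokenRange,
  caretTokenIndex: number
): TokenRange {
  let startTokenIndex = range.startTokenIndex;
  let stopTokenIndex = range.stopTokenIndex;

  // Left: scan backwards from the token before the caret; start right after the first ';'.
  for (let i = caretTokenIndex - 1; i >= range.startTokenIndex; i--) {
    if (tokens[i].text === ";") {
      startTokenIndex = i + 1;
      break;
    }
  }

  // Right: scan forwards from the caret token; stop at the first ';' (inclusive).
  for (let i = caretTokenIndex; i <= range.stopTokenIndex; i++) {
    if (tokens[i].text === ";") {
      stopTokenIndex = i;
      break;
    }
  }

  // A side with no ';' keeps its original boundary.
  return { startTokenIndex, stopTokenIndex };
}
```

For `SELECT FROM tb1;` followed by `I|`, the tokens (whitespace omitted) are SELECT, FROM, tb1, ;, I with the caret on `I` (index 4), so the range narrows to `{ startTokenIndex: 4, stopTokenIndex: 4 }` and only `I` reaches the completion engine.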

Result

[Screen recording: 2024-09-27 15 40 13]

liuxy0551 force-pushed the feat_complete branch 2 times, most recently from f6b1a3d to 56e4d0d on July 30, 2024 16:41
liuxy0551 (Collaborator, Author) commented Jul 31, 2024

Problems encountered along the way

I tried splitting on keywords that begin standalone statements (e.g. SELECT, INSERT) and ran into some problems:

  1. Certain PostgreSQL syntax
REVOKE SELECT (co_name) ON table_name |FROM PUBLIC;

GRANT SELECT (column_name) ON table_name TO |role_specification;

MERGE INTO wines w USING wine_stock_changes s ON s.winename = w.winename 
WHEN NOT MATCHED AND stock_delta > 0 
THEN INSERT (col_name) |VALUES(s.winename, s.stock_delta);

WITH with_query_name (col_name) AS (SELECT id FROM table_expression) SEARCH DEPTH 
FIRST BY column_name SET column_name 
CYCLE col_name SET col_name 
USING col_name SELECT|;

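In the syntax above, keywords such as SELECT do not begin a standalone statement the way they usually do; they belong to the enclosing statement, so the pieces produced by splitting still cannot be auto-completed correctly.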

  2. Subqueries (complex)
SELECT c.customer_id, c.customer_name, c.email, total_orders.total_amount, total_orders.order_count
FROM customers c
JOIN (
  SELECT o.customer_id, SUM(o.total_amount) AS total_amount, COUNT(o.order_id) AS order_count
  FROM orders o
  WHERE o.order_date BETWEEN '2024-08-01' AND '2024-08-31'
  GROUP BY o.customer_id
  HAVING COUNT(o.order_id) > 5
) AS total_orders
ON c.customer_id = total_orders.customer_id
WHERE| c.status = 'active'
ORDER BY total_orders.total_amount DESC;

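The statement above contains nested subqueries. If the cursor appears after a subquery and is not at the same nesting level as that subquery, the split goes visibly wrong, as shown below: even the subquery's parentheses are incomplete, let alone correct auto-completion.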

  SELECT o.customer_id, SUM(o.total_amount) AS total_amount, COUNT(o.order_id) AS order_count
  FROM orders o
  WHERE o.order_date BETWEEN '2024-08-01' AND '2024-08-31'
  GROUP BY o.customer_id
  HAVING COUNT(o.order_id) > 5
) AS total_orders
ON c.customer_id = total_orders.customer_id
WHERE|
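The first problem can be reproduced with a deliberately naive splitter. The sketch below (a hypothetical `splitOnStatementKeywords` with an illustrative keyword list, not code from this PR) starts a new piece at every keyword that usually begins a statement, and therefore cuts the PostgreSQL GRANT example in two.

```typescript
// Illustrative list of keywords that usually begin a standalone statement.
const STATEMENT_KEYWORDS: string[] = ["SELECT", "INSERT", "UPDATE", "DELETE", "GRANT", "REVOKE", "MERGE", "WITH"];

/** Naive splitter: starts a new piece whenever a whitespace-separated word is a statement keyword. */
function splitOnStatementKeywords(sql: string): string[] {
  const words = sql.split(/\s+/).filter((w) => w.length > 0);
  const pieces: string[][] = [];
  for (const word of words) {
    // Case-insensitive match; strip a trailing ';' or '(' glued to the word.
    const bare = word.replace(/[;(]+$/, "").toUpperCase();
    if (pieces.length === 0 || STATEMENT_KEYWORDS.indexOf(bare) !== -1) {
      pieces.push([word]);
    } else {
      pieces[pieces.length - 1].push(word);
    }
  }
  return pieces.map((p) => p.join(" "));
}
```

`splitOnStatementKeywords("GRANT SELECT (column_name) ON table_name TO role_specification;")` returns `["GRANT", "SELECT (column_name) ON table_name TO role_specification;"]`, so a cursor before `role_specification` lands in a fragment that is not a valid statement on its own. The same kind of cut at an inner SELECT is what breaks the nested subquery in the second example.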

Therefore, splitting on keywords that begin standalone statements was abandoned; the input is split only on the delimiter (usually ;).
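For contrast, a sketch of splitting on `;` only, assuming a simplified scanner that recognizes nothing but single-quoted string literals (no comments, dollar-quoting, or other dialect rules); `splitOnSemicolons` and `parensBalanced` are hypothetical helpers. Because the delimiter never falls inside a subquery, the statement around the cursor keeps its balanced parentheses, unlike the keyword-split fragment shown above.

```typescript
/**
 * Split on ';' only, keeping the ';' with its statement.
 * Simplified: only single-quoted string literals are recognized.
 */
function splitOnSemicolons(sql: string): string[] {
  const statements: string[] = [];
  let current = "";
  let inString = false;
  for (let i = 0; i < sql.length; i++) {
    const ch = sql.charAt(i);
    // A doubled '' escape toggles twice, so the state stays correct.
    if (ch === "'") inString = !inString;
    current += ch;
    if (ch === ";" && !inString) {
      statements.push(current);
      current = "";
    }
  }
  if (current.trim().length > 0) statements.push(current);
  return statements;
}

/** True if every ')' closes an earlier '(' and none are left open. */
function parensBalanced(text: string): boolean {
  let depth = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch === "(") depth++;
    if (ch === ")") depth--;
    if (depth < 0) return false;
  }
  return depth === 0;
}
```

Applied to the customers/orders query followed by another statement, the first piece is the whole query, subquery included, and `parensBalanced` holds for it; the keyword-split fragment above, which starts at the inner SELECT, fails the same check.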

liuxy0551 force-pushed the feat_complete branch 2 times, most recently from b610cb3 to d841e3c on August 26, 2024 08:54
liuxy0551 force-pushed the feat_complete branch 8 times, most recently from 944a97f to 417c063 on September 27, 2024 07:57
liuxy0551 marked this pull request as ready for review on September 27, 2024 07:58
liuxy0551 changed the title from "test: complete after error syntax" to "feat: complete after error syntax" on Sep 27, 2024
liuxy0551 (Collaborator, Author)

Published beta packages and verified in the offline environment that the behavior is as expected: dt-sql-parser@4.1.0-beta.2, monaco-sql-languages@0.12.3-beta.3

liuxy0551 force-pushed the feat_complete branch 2 times, most recently from 9ce9722 to 6cebe8e on October 15, 2024 14:14
openai0229

So are both the left and right boundaries determined by semicolons?

mumiao (Collaborator) commented Mar 28, 2025

There are conflicts.

liuxy0551 (Collaborator, Author)

Quoting: "There are conflicts."

#378 partially overlaps with this design; the feature needs to be re-verified.


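@JackWang032 I split the body of getMinimumParserInfo into getMinimumInputInfo and parserWithNewInput, which respectively obtain the minimum parse boundaries and re-parse the new inputSlice. This makes it easier to split the input again by ; and parse it afterwards.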

See bd83147 for the specific changes.
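A minimal sketch of why the split helps, using hypothetical helpers `computeSliceBounds` and `parseInputSlice` that play the roles of getMinimumInputInfo (boundaries only) and the re-parse step (slice plus a fresh parser). They work on character offsets and ignore string literals, unlike the real token-based code. Because the boundary step no longer parses anything, a later pass can narrow the bounds again by `;` and only then create a parser for the final slice.

```typescript
// Character-offset slice boundaries: start inclusive, stop exclusive.
interface SliceBounds {
  startIndex: number;
  stopIndex: number;
}

// Step 1: only compute boundaries around the caret (no parsing here).
function computeSliceBounds(input: string, caretOffset: number): SliceBounds {
  const before = caretOffset > 0 ? input.lastIndexOf(";", caretOffset - 1) : -1;
  const after = input.indexOf(";", caretOffset);
  return {
    startIndex: before === -1 ? 0 : before + 1,
    stopIndex: after === -1 ? input.length : after + 1,
  };
}

// Step 2: cut the slice and hand it to whatever parser factory the caller supplies.
function parseInputSlice<P>(
  input: string,
  bounds: SliceBounds,
  createParser: (inputSlice: string) => P
): { inputSlice: string; parser: P } {
  const inputSlice = input.slice(bounds.startIndex, bounds.stopIndex);
  return { inputSlice, parser: createParser(inputSlice) };
}
```

With `"SELECT FROM tb1;\nI"` and the caret at the end, the bounds start right after the `;` and the slice passed to the factory is `"\nI"`.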

liuxy0551 requested a review from JackWang032 on April 3, 2025 03:37
/**
 * @param inputSlice source string
 * @returns parser and parserTree
 */
private parserWithNewInput(inputSlice: string) {
Review comment (Collaborator):

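This method is the counterpart of createParserWithCache. I think the name could be changed. Also, wouldn't it be better to return only parserIns and let the specific methods decide when to generate the parse tree?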

Reply (Collaborator, Author):

Any suggestions for the name? This method is really a combination of createParserWithCache and parseWithCache, which is why it returns both parserIns and parserTree. parserWithInput or parserWithInputSlice are other options.

Reply (Collaborator):

I'd prefer to use the existing createParser directly; the two overlap in functionality.

Reply (Collaborator, Author), quoting "createParser":

Pushed a new commit that simply uses the existing createParser method to get parserIns.
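A sketch of the design the reviewers settled on: a factory that only constructs the parser, leaving each caller to decide whether and when to build a parse tree. `SketchParser`, its `program()` entry rule, and `createSketchParser` are hypothetical stand-ins for an ANTLR parser and dt-sql-parser's `createParser`; the static counter exists only to make tree building observable.

```typescript
// Hypothetical stand-in for an ANTLR parser; building the tree is the expensive step.
class SketchParser {
  static treesBuilt = 0;
  private readonly input: string;

  constructor(input: string) {
    this.input = input;
  }

  // Plays the role of a grammar entry rule: each call builds a new "tree".
  program(): string[] {
    SketchParser.treesBuilt++;
    return this.input
      .split(";")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
  }
}

// Factory: constructs the parser only; no tree is built here.
function createSketchParser(input: string): SketchParser {
  return new SketchParser(input);
}

// A caller that needs the tree builds it explicitly, exactly once.
function getStatements(input: string): string[] {
  return createSketchParser(input).program();
}
```

A completion path that only needs the parser instance can call the factory without paying for a tree it does not use; a factory that also returned a tree would force that cost on every caller.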

JackWang032 (Collaborator)

+1

mumiao merged commit 99b01e5 into DTStack:next on May 8, 2025
6 checks passed
mumiao added a commit that referenced this pull request May 8, 2025
* feat: improve errorListener msg (#281)

* feat: add mysql errorListener and commonErrorListener

* feat: improve other sql error msg

* feat: support i18n for error msg

* feat: add all sql errorMsg unit test

* feat: update locale file and change i18n function name

* test: update error unit test

* feat(flinksql): collect comment, type attribute for entity (#319)

* feat(flinksql): collect comment, type attribute for entity

* feat(flinksql): delete console log

* fix(#305): delete function ctxToWord,using ctxToText instead of ctxToWord

* feat: update attribute's type

* feat(flinksql): update flinksql's entitycollect unit test

* feat: optimize interface and update unit test

* feat: update collect attr detail

* feat: optimize interface and some function's arguments

* feat: add comment and update params' name

* feat: collect alias in select statement

* feat: update collect attribute function and update unit test

---------

Co-authored-by: zhaoge <>

* fix: spell check (#337)

Co-authored-by: liuyi <liuyi@dtstack.com>

* ci: check-types and test unit update

* feat: collect entity's attribute(#333)

* feat(trinosql): collect trino sql's attribute(comment,alias,colType)

* feat(hivesql): collect hive sql's attribute(comment,alias,colType)

* feat(impalasql): collect attribute(comment, colType, alias)

* feat(sparksql): collect entity's attribute (comment,alias, colType)

* feat: update endContextList of collect attribute

* feat(postgresql): collect hive sql's attribute(alias,colType)

* feat: update interface of attrInfo and alter entitycollect ts file

* feat(mysql): collect entity's attribute(comment,colType,alias)

* ci: fix check-types problem

---------

Co-authored-by: zhaoge <>

* chore(release): 4.1.0-beta.0

* fix: #362 set hiveVar value (#369)

* fix: #371 export EntityContext types (#372)

* fix: minimum collect candidates boundary to fix parse performance (#378)

* fix: minimum collect candidates boundary to fix parse performance

* fix: fix check-types

* fix: remove debugger code

* fix(flink): fix flinksql syntax error about ROW and function using (#383)

Co-authored-by: zhaoge <>

* build: pnpm antlr4 --lang all

* Feat/follow keywords (#407)

* feat: provide follow keywords when get suggestions

* chore: add watch script

* refactor: optimize spark grammar (#360)

* feat: support semantic context of isNewStatement (#361)

* feat: support semantic context of isStatamentBeginning

* docs: add docs for semantic context

* feat: unify variables in lexer (#366)

* feat: unify variables in lexer

* fix: all sql use WHITE_SPACE

* feat: complete after error syntax (#334)

* refactor: split getMinimumParserInfo to slice input and parser again

* test: complete after error syntax

* feat: complete after error syntax

* feat: use createParser to get parserIns and remove parserWithNewInput

* feat(all sql): add all sql expression column (#358)

* feat(impala): add impala expression column

* feat(trino): add expression column

* feat(hive): add hive expression column

* feat(spark): add spark expression column

* feat(mysql): add mysql expression column unit test

* feat(flink): add flink expression column

* feat(postgresql): add pg expression column

* feat: #410 optimize processCandidates tokenIndexOffset (#411)

* test: test suggestion wordRanges with range when processCandidates without tokenIndexOffset

* feat: #410 optimize processCandidates tokenIndexOffset

---------

Co-authored-by: 霜序 <976060700@qq.com>
Co-authored-by: XCynthia <942884029@qq.com>
Co-authored-by: 琉易 <liuxy0551@qq.com>
Co-authored-by: liuyi <liuyi@dtstack.com>
Co-authored-by: zhaoge <>
Co-authored-by: Hayden <hayden9653@gmail.com>
Co-authored-by: JackWang032 <64318393+JackWang032@users.noreply.github.com>
Co-authored-by: JackWang032 <2522134117@qq.com>

6 participants