NL2LF

(Continuously updated...)
Recent update log:

0. UnifiedSKG, UniSAr
1. GNN works: LGESQL, ShadowGNN, SADGA, S²SQL (SOTA)
2. RatSQL + Pretraining (STRUG, GraPPa, GAP, GP) + NatSQL
3. PICARD, DT-Fixup, RaSaP
4. WikiSQL: SeaD, SeqGenSQL, BRIDGE^

Resources for research on natural language to logical form, focusing on NL2SQL first.
A collection of "natural language to logical form" research materials: this stage centers on NL2SQL, covering public benchmark datasets, related papers with selected code implementations, and relevant blog or WeChat-official-account articles.

NL2SQL
I. Benchmark Datasets dataset
II. Papers & Code papers&code
    1. WikiSQL
    2. Spider
    3. UnifiedSKG
III. Extended Resources extend-resources
    1. Related Works
        1.1. Pre-training
        1.2. Systems
        1.3. Surveys
        1.4. Blogs
        1.5. Other Papers
        1.6. Tools
    2. SQL2Seq
    3. Graph Neural Networks GNN

NL2SQL & Text2SQL

I. Benchmark Datasets (DataSet)

  • CCKS2022: Financial NL2SQL Benchmark

    Existing NL2SQL datasets and methods mostly assume a closed setting with a designated database/table, which hardly meets the needs of business scopes that evolve dynamically. As for domain characteristics, financial data are largely time series, including daily market quotes, quarterly financial reports, annual GDP, and irregular stock pledge/unpledge events, all of which makes translating questions into SQL considerably harder.

II. Papers & Code (Papers&Code)

Papers are mainly evaluated on WikiSQL and Spider; see the task homepages for the corresponding leaderboards.
Representative methods are organized below and will be updated continuously...
Note: Exe_score denotes execution accuracy (Execution accuracy), listed as | model | Dev accuracy | Test accuracy |;
Log_score denotes logical-form accuracy (Logical accuracy), which on Spider does not include value prediction.

1. WikiSQL:

  • Schema-aware Denoising (SeaD) 🔥🔥

    In text-to-SQL, seq2seq models tend to settle into local optima because of constraints imposed by architecture design. The authors take a simple yet effective approach: a transformer-based seq2seq model is strengthened for text-to-SQL generation by training it with schema-aware denoising (SeaD). Rather than imposing constraints on the encoder or reformatting the task as slot filling, SeaD trains the model with two denoising objectives that recover the input, or predict the output autoregressively, from erosion and random noise. These denoising objectives serve as auxiliary tasks for better modeling of structured data in seq2seq generation. The authors also propose an improved, clause-sensitive execution-guided (EG) decoding strategy that overcomes the limitations of EG decoding for generative models. A sketch of the denoising objectives follows the scores below.

    Paper

    Exe_score

    | model | Dev accuracy | Test accuracy |
    | --- | --- | --- |
    | SeaD + EG | 92.9 | 93.0 |
    | SeaD | 90.2 | 90.1 |
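
    A minimal sketch of building SeaD-style denoising training pairs, assuming a generic seq2seq model consumes them. The erosion rate, the adjacent-swap form of the random noise, and the helper names are illustrative assumptions, not the paper's exact recipe:

```python
import random

def erosion_noise(tokens, schema_tokens, p=0.15):
    """Erode the serialized input: randomly drop or corrupt schema
    tokens. The model must reconstruct the clean sequence (denoising
    task #1). p=0.15 is an illustrative rate, not the paper's value."""
    noisy = []
    for tok in tokens:
        if tok in schema_tokens and random.random() < p:
            if random.random() < 0.5:
                continue                                 # drop the token
            noisy.append(random.choice(schema_tokens))   # or corrupt it
        else:
            noisy.append(tok)
    return noisy

def random_noise(tokens, p=0.10):
    """Randomly swap adjacent tokens; the model must restore the
    original order (denoising task #2, an adjacent-swap variant)."""
    noisy = list(tokens)
    for i in range(len(noisy) - 1):
        if random.random() < p:
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy

# Denoising pairs are trained alongside the main text-to-SQL pair.
question = "how many heads of the departments are older than 56".split()
schema = ["department", "head", "age", "born_state"]
src = question + ["<sep>"] + schema

pairs = [
    (erosion_noise(src, schema), src),   # recover input from erosion
    (random_noise(src), src),            # recover input from noise
]
for noisy, clean in pairs:
    print(" ".join(noisy), "=>", " ".join(clean))
```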


  • Schema Dependency Guided 🔥🔥

    Multi-task learning that incorporates the dependency structure between the question and the schema. A sketch of the joint objective follows the scores below.

    Paper

    Exe_score

    | model | Dev accuracy | Test accuracy |
    | --- | --- | --- |
    | SDSQL + EG | 92.5 | 92.4 |
    | SDSQL | 88.7 | 88.8 |
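
    A minimal PyTorch sketch of this kind of multi-task setup: a shared encoder feeds a slot-tagging head and a question-schema dependency head, and the two losses are summed. The architecture, dimensions, and the 0.5 loss weight are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SchemaDependencyMTL(nn.Module):
    """Shared encoder with two jointly trained heads: per-token SQL
    slot tagging, and scoring question-schema dependency links."""

    def __init__(self, vocab=30000, hidden=256, n_slots=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.slot_head = nn.Linear(hidden, n_slots)      # slot tagging
        self.dep_head = nn.Bilinear(hidden, hidden, 1)   # link scoring

    def forward(self, question_ids, schema_ids):
        q, _ = self.encoder(self.embed(question_ids))    # (B, Lq, H)
        s, _ = self.encoder(self.embed(schema_ids))      # (B, Ls, H)
        slot_logits = self.slot_head(q)                  # (B, Lq, n_slots)
        B, Lq, H = q.shape
        Ls = s.shape[1]
        # score every (question token, schema item) pair
        qe = q.unsqueeze(2).expand(B, Lq, Ls, H).reshape(-1, H)
        se = s.unsqueeze(1).expand(B, Lq, Ls, H).reshape(-1, H)
        dep_logits = self.dep_head(qe, se).view(B, Lq, Ls)
        return slot_logits, dep_logits

model = SchemaDependencyMTL()
q = torch.randint(0, 30000, (2, 12))        # toy question token ids
s = torch.randint(0, 30000, (2, 5))         # toy schema token ids
slot_logits, dep_logits = model(q, s)
slot_gold = torch.randint(0, 8, (2, 12))    # toy slot labels
dep_gold = torch.rand(2, 12, 5)             # toy soft alignments
loss = F.cross_entropy(slot_logits.transpose(1, 2), slot_gold) \
     + 0.5 * F.binary_cross_entropy_with_logits(dep_logits, dep_gold)
loss.backward()
```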

  • Information Extraction Approach

    An information-extraction approach: a unified BERT-based extraction model identifies the slot types mentioned in the query, combining sequence labeling, relation extraction, and text-matching-based linking. A sketch of the tagging view follows the scores below.

    Paper

    Exe_score

    | model | Dev accuracy | Test accuracy |
    | --- | --- | --- |
    | BERT-IE-SQL + EG | 92.6 | 92.5 |
    | BERT-IE-SQL | 88.7 | 88.8 |
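
    A minimal sketch of the sequence-labeling part of this view: question tokens are tagged with SQL slot types in BIO style, and the tagged spans are collected for query assembly. The tag inventory and the example are illustrative:

```python
def collect_spans(tokens, tags):
    """Collect (slot_type, text) spans from a BIO tag sequence."""
    spans, cur_type, cur_toks = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur_type:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = tag[2:], [tok]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            cur_toks.append(tok)
        else:
            if cur_type:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = None, []
    if cur_type:
        spans.append((cur_type, " ".join(cur_toks)))
    return spans

# In the extraction view, a tagger (e.g., BERT + softmax) would emit
# these tags; here they are written by hand for illustration.
question = ["show", "the", "name", "of", "employees", "older", "than", "30"]
tags     = ["O",    "O",   "B-SEL", "O", "B-TAB",     "B-OP",  "I-OP", "B-VAL"]
print(collect_spans(question, tags))
# [('SEL', 'name'), ('TAB', 'employees'), ('OP', 'older than'), ('VAL', '30')]
```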


  • MRC Approach 🔥

    A machine reading comprehension approach: unlike conventional slot-filling methods, it casts NL2SQL as a QA problem and predicts the different slots with a unified MRC framework. A sketch of the framing follows the scores below.

    Paper

    Code

    Exe_score

    | model | Dev accuracy | Test accuracy |
    | --- | --- | --- |
    | BERT-MRC-SQL + STILTs training + AGG enhancement | 87.8 | 87.4 |
    | BERT-MRC-SQL + STILTs training | 86.2 | 86.0 |
    | BERT-MRC-SQL | 85.9 | 85.9 |
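
    A minimal sketch of the QA framing: each SQL slot becomes a natural-language query over the same question-plus-schema context, so a single span-extraction MRC model can answer them all. The slot templates and serialization are illustrative, not the paper's wording:

```python
# One QA query per SQL slot, all answered over the same context.
SLOT_QUERIES = {
    "select_column": "Which column should be returned?",
    "aggregation":   "Which aggregation function applies to it?",
    "where_column":  "Which column appears in the condition?",
    "where_op":      "Which comparison operator is used?",
    "where_value":   "Which value is the column compared against?",
}

def build_mrc_inputs(question, columns):
    """Yield (slot, query, context) triples for a span-extraction
    MRC model; the [SEP]-joined serialization is an assumption."""
    context = question + " [SEP] " + " | ".join(columns)
    for slot, query in SLOT_QUERIES.items():
        yield slot, query, context

for slot, query, context in build_mrc_inputs(
        "how many employees are older than 30",
        ["employees.name", "employees.age", "employees.dept"]):
    print(f"{slot:14s} Q: {query}")
    # an MRC model would extract the answer span from `context` here
```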

2. Spider:

  • SmBoP

    Compared with top-down autoregressive parsing, a semi-autoregressive bottom-up parser has several advantages. First, because the subtrees in each decoding step are generated in parallel, the theoretical runtime is logarithmic rather than linear. Second, the bottom-up approach learns representations of meaningful semantic sub-programs at each step rather than semantically vague partial trees. Finally, SmBoP's Transformer-based layers contextualize subtrees with one another, so that, unlike conventional beam search, a tree is scored conditioned on the other trees that have been explored. A sketch of one bottom-up step follows the scores below.

    Paper

    Code
    https://github.com/OhadRubin/SmBop

    Log_score

    | model | Dev accuracy | Test accuracy |
    | --- | --- | --- |
    | SmBoP + GraPPa (DB content used) | 74.7 | 69.5 |
    | SmBoP + BART | 66.0 | 60.5 |

    Exe_score

    | model | Dev accuracy | Test accuracy |
    | --- | --- | --- |
    | SmBoP + GraPPa (DB content used) | - | 71.1 |
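
    A minimal sketch of one semi-autoregressive bottom-up step: the beam holds scored subtrees, every pair is combined in parallel, and only the top-k trees survive, so the number of steps grows with tree depth rather than sequence length. The additive scorer and the single binary rule are illustrative stand-ins for SmBoP's Transformer-based tree scoring:

```python
import itertools

def bottom_up_step(beam, k=4):
    """One decoding step. `beam` is a list of (score, subtree) pairs;
    all ordered pairs are combined "in parallel" and the top-k kept.
    Real SmBoP scores trees with cross-attention layers; the additive
    score and the SELECT-FROM rule here are illustrative."""
    candidates = list(beam)  # unary keep-rule: subtrees survive as-is
    for (s1, t1), (s2, t2) in itertools.permutations(beam, 2):
        candidates.append((s1 + s2, f"SELECT {t1} FROM {t2}"))
    return sorted(candidates, key=lambda c: -c[0])[:k]

# Step 0: the beam holds scored leaves (schema items, DB values).
beam = [(0.9, "name"), (0.7, "employees"), (0.2, "age")]
print(bottom_up_step(beam)[0])  # (1.6, 'SELECT name FROM employees')
```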

  • GAZP

    GAZP combines a forward semantic parser with a backward utterance generator to synthesize data (e.g., utterances and SQL queries) in the new environment, then selects cycle-consistent examples to adapt the parser. Unlike data augmentation, which typically synthesizes unverified examples in the training environment, GAZP synthesizes examples in the new environment whose input-output consistency is verified. A sketch of the consistency check follows the scores below.

    Paper

    Exe_score

    | model | Dev accuracy | Test accuracy |
    | --- | --- | --- |
    | GAZP + BERT | - | 53.5 |
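
    A minimal sketch of the cycle-consistency selection, assuming synthesized (utterance, SQL) pairs from the backward generator and a callable forward parser; the SQLite executor and all function names are illustrative:

```python
import sqlite3

def execute(db_path, sql):
    """Run a query against the new environment's database and return
    its rows, or None if it fails to execute."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    finally:
        conn.close()

def select_cycle_consistent(pairs, parser, db_path):
    """Keep a synthesized (utterance, sql) pair only if the forward
    parser maps the utterance to a query with the same execution
    result as the generator's query; kept pairs are then used to
    adapt the parser to the new environment."""
    kept = []
    for utterance, generated_sql in pairs:
        gold = execute(db_path, generated_sql)
        pred = execute(db_path, parser(utterance))
        if gold is not None and gold == pred:
            kept.append((utterance, generated_sql))  # verified example
    return kept
```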

3. UnifiedSKG: 🔥🔥

Blog

Code

Paper

III. Extended Resources (extend resources)

1. Related Works
1.1 Pre-training 🔥🔥🔥

Jointly learns representations of natural-language utterances and table schemas by leveraging generation models to generate pre-training data.

A novel weakly supervised Structure-Grounded pretraining framework (STRUG) for text-to-SQL that can effectively learn to capture text-table alignment from a parallel text-table corpus.

A new method for text-to-SQL parsing, Grammar Pre-training (GP), proposed to decode deep relations between question and database.

An effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data.

Designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.

Table pre-training realized by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.

A pretrained language model that jointly learns representations for NL sentences and (semi-)structured tables.

Adapting a semantic parser trained on a single language.
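
A minimal sketch of the flavor of objective several of these works build on: masked recovery over a serialized utterance-plus-schema sequence, here masking column names so a BERT-style encoder must infer them from the utterance. The serialization format and masking rate are illustrative, not any single paper's recipe:

```python
import random

MASK = "[MASK]"

def serialize(utterance, columns):
    """Flatten an (utterance, schema) pair into one token sequence,
    the way table pre-training methods feed a BERT-style encoder."""
    return utterance.split() + ["[SEP]"] + columns

def mask_columns(tokens, columns, p=0.3):
    """Masked column recovery: hide schema tokens so the encoder must
    infer them from the utterance (an illustrative objective)."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if tok in columns and random.random() < p:
            targets[i] = tok       # label for the MLM head
            masked.append(MASK)
        else:
            masked.append(tok)
    return masked, targets

tokens = serialize("show employee names older than 30",
                   ["name", "age", "dept"])
masked, targets = mask_columns(tokens, {"name", "age", "dept"})
print(" ".join(masked))  # e.g. ... [SEP] [MASK] age dept
print(targets)           # positions the encoder must recover
```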

1.2 Systems
1.3 Surveys
1.4 Blogs
1.5 Other Papers
1.6 Tools
2. SQL2Seq

Paper

Code

3. Graph Neural Networks (GNN)

Paper

Code
