[new] Add a trick for StaticEmbedding #317

ghost · 2020-08-13T02:20:17Z

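Description: Modify the _load_with_vocab method of the StaticEmbedding class so that it first reads all pretrained word vectors and then iterates over the words in vocab, checking in turn whether the original word, the all-lowercase word, the all-uppercase word, and the word with its first letter capitalized exist in the pretrained word vectors. In other words, if the original word fails to match, the word is assigned a pretrained vector that is as semantically similar as possible, which raises the probability that a word in vocab matches a pretrained word vector.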
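The multi-round matching described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name find_pretrained_key and the toy pretrained dictionary are hypothetical, and it assumes the pretrained vectors have already been read into a plain dict keyed by word.

```python
from typing import Dict, List, Optional


def find_pretrained_key(word: str, pretrained: Dict[str, List[float]]) -> Optional[str]:
    """Return the first case variant of `word` found in `pretrained`, else None.

    Hypothetical helper: the variants are tried in the order listed in the
    description (original, all lowercase, all uppercase, first letter capitalized).
    """
    # Note: str.capitalize() also lowercases every character after the first.
    for candidate in (word, word.lower(), word.upper(), word.capitalize()):
        if candidate in pretrained:
            return candidate
    return None


# Toy stand-in for a pretrained embedding file (made-up values).
pretrained = {"apple": [0.1, 0.2], "NASA": [0.3, 0.4], "Paris": [0.5, 0.6]}
vocab_words = ["Apple", "nasa", "paris", "zzz"]

# Map each vocab word to the pretrained key it would reuse (None means no match).
matches = {w: find_pretrained_key(w, pretrained) for w in vocab_words}
```

With exact matching alone, none of these four vocab words would find a vector; with the fallback rounds, only "zzz" stays unmatched and would still need whatever initialization is used for unknown words.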

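Main reason: The original _load_with_vocab method only performs a hard (exact) match between the words in the pretrained word vectors and the words in vocab while reading the pretrained vectors. As a result, the match rate is very low, which has a large impact on the final experimental results.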

Checklist: check whether each of the following items is complete

Please feel free to remove inapplicable items for your PR.

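The PR title starts with [$CATEGORY] (e.g. [bugfix] for fixing a bug, [new] for adding a feature, [test] for modifying tests, [rm] for removing old code)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage, and the modified parts pass the tests. For changes under fastnlp/fastnlp/, test code must be provided in fastnlp/test/.
Code is well-documented (write good comments; the API documentation is extracted from them)
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change. If the change affects examples or tutorials, please contact the core developers.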

Changes: describe each change item by item

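Modified _load_with_vocab in the StaticEmbedding class: matching against the pretrained word vectors now runs in multiple rounds, which raises the probability that a word in vocab matches a pretrained word vector.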

Mention: ask someone to review your PR

@people who have modified this file
@core developers

Add a trick for StaticEmbedding

379ea69
