This repository has been archived by the owner on Mar 1, 2022. It is now read-only.

Support multitoken masked segments #19

Open
sam-writer opened this issue May 22, 2020 · 1 comment
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

Contributor

sam-writer commented May 22, 2020

This is definitely doable; I have a notebook that I can share with anyone interested (LINK). It is unclear, however, whether it is doable within a reasonable performance budget.

What I haven't tried yet, but would like to: use BART for this. It should be a natural fit because of its training procedure.
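For context on why the fit is natural: BART's text-infilling pretraining replaces each sampled span of tokens with a single mask token, so the model learns to generate a variable number of tokens per mask. A toy sketch of that noising step (`text_infill` is my own illustration, not fairseq code; BART samples span lengths from Poisson(3), here a simple `randint` stands in):

```python
import random

def text_infill(tokens, mask_token='<mask>', mask_prob=0.2, max_span=3, seed=0):
    """Toy version of BART's text-infilling noise: each sampled span of
    1..max_span tokens is replaced by a SINGLE mask token, so the model
    must learn to predict multiple tokens for one mask position."""
    rng = random.Random(seed)
    out, i = [], 0
    while i < len(tokens):
        if rng.random() < mask_prob:
            span = rng.randint(1, max_span)
            out.append(mask_token)  # one <mask> covers the whole span
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = 'the cat sat on the mat in front of the door'.split()
print(text_infill(tokens))
```

Because the denoising objective already asks the model to expand one mask into several tokens, multitoken masked segments at inference time are in-distribution for BART.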

@sam-writer added the enhancement and help wanted labels on May 22, 2020

Anwarvic commented Sep 6, 2021

I agree; BART is a better fit here when you set match_source_len=False, which lets the generated output differ in length from the masked input.

Load the BART base model:

```python
import torch

bart = torch.hub.load('pytorch/fairseq', 'bart.base')  # download takes around two minutes
bart.eval()  # switch to evaluation mode (disables dropout)
bart.cuda()  # move the model to the GPU
```

Use it:

```python
sentences = ['The <mask> is on the <mask> in front of <mask>.']
bart.fill_mask(sentences, topk=3, beam=10, match_source_len=False)
```

This gives the following results:

```python
[[('', tensor(-1.5974e-05, device='cuda:0')),
  ('The photo is on the right in front of the building.',
   tensor(-0.6064, device='cuda:0')),
  ('The photo is on the right in front of the house.',
   tensor(-0.6113, device='cuda:0'))]]
```
