

answer contains <pad> leads to error in the example #90

Open
deangeckt opened this issue Nov 1, 2021 · 3 comments

Comments


deangeckt commented Nov 1, 2021

When running the example:

nlp("42 is the answer to life, universe and everything.")


ValueError Traceback (most recent call last)
in
----> 1 nlp("42 is the answer to life, universe and everything.")

~\question_generation\pipelines.py in call(self, inputs)
58 qg_examples = self._prepare_inputs_for_qg_from_answers_prepend(inputs, answers)
59 else:
---> 60 qg_examples = self._prepare_inputs_for_qg_from_answers_hl(sents, answers)
61
62 qg_inputs = [example['source_text'] for example in qg_examples]

~\question_generation\pipelines.py in _prepare_inputs_for_qg_from_answers_hl(self, sents, answers)
140 answer_text = answer_text.strip()
141
--> 142 ans_start_idx = sent.index(answer_text)
143
144 sent = f"{sent[:ans_start_idx]} {answer_text} {sent[ans_start_idx + len(answer_text): ]}"

ValueError: substring not found

In _extract_answers(), while debugging I saw the "<pad>" token in the decoded answer:

dec = [self.ans_tokenizer.decode(ids, skip_special_tokens=False) for ids in outs]

Could this be fixed by setting skip_special_tokens to True?
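To illustrate the failure without pulling in the actual pipeline, here is a minimal sketch (no transformers dependency; the sentence and decoded string are taken from this thread). It shows why a decoded answer that still carries "<pad>" makes str.index raise ValueError, and why stripping special tokens (the effect of skip_special_tokens=True) avoids it:

```python
sent = "42 is the answer to life, the universe and everything."

# What decode(..., skip_special_tokens=False) yields, per the comments below.
decoded_answer = "<pad> 42"

try:
    sent.index(decoded_answer)
except ValueError as e:
    print(e)  # substring not found -- the exception seen in the traceback

# Removing the special token first, as skip_special_tokens=True would,
# makes the substring lookup succeed.
clean_answer = decoded_answer.replace("<pad>", "").strip()
print(sent.index(clean_answer))  # 0
```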

@deangeckt
Author

Also, in _prepare_inputs_for_qg_from_answers_hl() I'd add this check to avoid further exceptions:

if answer_text not in sent:
    continue
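For context, here is a hedged sketch of that guard inside a loop shaped like _prepare_inputs_for_qg_from_answers_hl (the surrounding loop structure is an assumption reconstructed from the traceback, not copied from the repo). Answers the sentence does not actually contain are skipped instead of letting sent.index raise ValueError:

```python
sents = ["42 is the answer to life, the universe and everything."]
answers = [["<pad> 42", "42"]]  # one answer still carries the pad token

examples = []
for sent, answer_list in zip(sents, answers):
    for answer_text in answer_list:
        answer_text = answer_text.strip()
        if answer_text not in sent:  # the proposed guard
            continue
        ans_start_idx = sent.index(answer_text)
        # Mirrors line 144 from the traceback above.
        highlighted = (
            f"{sent[:ans_start_idx]} {answer_text} "
            f"{sent[ans_start_idx + len(answer_text):]}"
        )
        examples.append(highlighted)

# Only the clean "42" answer survives; "<pad> 42" is skipped, not fatal.
print(len(examples))
```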

@cdhx

cdhx commented Dec 20, 2021

Same issue. This is an intermediate result during execution:

input: 42 is the answer to life, the universe and everything.
sents, answers: ['42 is the answer to life, the universe and everything.'] [['<pad> 42']]
answer: [['<pad> 42']]

@YiLing28

In pipelines.py, line 90, set skip_special_tokens=True:

dec = [self.ans_tokenizer.decode(ids, skip_special_tokens=True) for ids in outs]

This may solve the problem.
