Unable to reproduce the PhoMT results with the HuggingFace Model #2
Comments
Are you using sacreBLEU?
I'm using the sacreBLEU metric from Hugging Face. Is this different from the sacreBLEU you used in the paper? If so, can you share the command line that you used with sacreBLEU?
Our training and inference stages (an example below) were originally performed using fairseq. We then computed the detokenized and case-sensitive BLEU score using SacreBLEU (with the signature “BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1”).
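For reference, that signature corresponds to SacreBLEU's defaults on detokenized output: mixed case, exponential smoothing, and 13a tokenization. A minimal sketch using the sacrebleu package directly (the `hyps`/`refs` contents here are placeholders, not the authors' data):

```python
# Minimal sketch, not the authors' exact evaluation script: scores
# detokenized, case-sensitive output with sacrebleu's defaults
# (mixed case, exp smoothing, 13a tokenization), matching the
# signature quoted above.
import sacrebleu

hyps = ["Xin chào thế giới."]  # system outputs, one sentence per entry (placeholder)
refs = ["Xin chào thế giới."]  # references, aligned with hyps (placeholder)

# corpus_bleu takes the hypothesis stream and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(bleu.score)
```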
@justinphan3110 I just had a bit of time to redo the evaluation. Using the simple script below, you'd obtain a sacreBLEU score of 44.2.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer_en2vi = AutoTokenizer.from_pretrained(
    "vinai/vinai-translate-en2vi", src_lang="en_XX"
)
model_en2vi = AutoModelForSeq2SeqLM.from_pretrained("vinai/vinai-translate-en2vi")
device_en2vi = torch.device("cuda")
model_en2vi.to(device_en2vi)


def translate_en2vi(en_texts: list) -> list:
    # Tokenize a batch of English sentences and move the tensors to the GPU.
    input_ids = tokenizer_en2vi(en_texts, padding=True, return_tensors="pt").to(
        device_en2vi
    )
    # Force Vietnamese as the target language via the decoder start token.
    output_ids = model_en2vi.generate(
        **input_ids,
        decoder_start_token_id=tokenizer_en2vi.lang_code_to_id["vi_VN"],
        num_return_sequences=1,
        num_beams=5,
        early_stopping=True,
    )
    vi_texts = tokenizer_en2vi.batch_decode(output_ids, skip_special_tokens=True)
    return vi_texts


# Translate the PhoMT test set in batches of 8 sentences.
with open("PhoMT-detokenization-test/test.en", "r") as input_file:
    lines = [line.strip() for line in input_file.readlines()]

index = 0
writer = open("PhoMT-detokenization-test/test.vi_generated.v1", "w")
while index < len(lines):
    texts = lines[index : index + 8]
    outputs = translate_en2vi(texts)
    print(outputs)
    for output in outputs:
        writer.write(output.strip() + "\n")
    index = index + 8
writer.close()

# Score the generated translations against the references.
import evaluate

references = [
    [line.strip()]
    for line in open("PhoMT-detokenization-test/test.vi", "r").readlines()
]
predictions = [
    line.strip()
    for line in open("PhoMT-detokenization-test/test.vi_generated.v1", "r").readlines()
]
sacrebleu = evaluate.load("sacrebleu")
results = sacrebleu.compute(predictions=predictions, references=references)
print(results)
```
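Note that the Hugging Face `evaluate` sacrebleu metric wraps the same sacrebleu library, so its defaults should already match the signature above. If in doubt, the options can be pinned explicitly; a sketch reusing `sacrebleu`, `predictions`, and `references` from the script above, with the parameter values assumed from the paper's signature:

```python
# Sketch: pass the paper's settings explicitly rather than relying on defaults.
results = sacrebleu.compute(
    predictions=predictions,
    references=references,
    smooth_method="exp",  # smooth.exp in the signature
    tokenize="13a",       # tok.13a in the signature
    lowercase=False,      # case.mixed, i.e. case-sensitive
)
print(results["score"])
```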
Evaluation for VietAI/envit5-translation:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import evaluate

tokenizer = AutoTokenizer.from_pretrained("VietAI/envit5-translation")
model = AutoModelForSeq2SeqLM.from_pretrained("VietAI/envit5-translation")
device = torch.device("cuda")
model.to(device)


def translate(texts: list) -> list:
    # envit5 expects inputs prefixed with "en: " or "vi: " to select the direction.
    input_ids = tokenizer(texts, padding=True, return_tensors="pt").to(device)
    output_ids = model.generate(
        **input_ids,
        num_return_sequences=1,
        num_beams=5,
        early_stopping=True,
        max_length=512,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


# Vi -> En: translate the test set in batches of 8.
with open("PhoMT-detokenization-test/test.vi", "r") as input_file:
    lines = ["vi: " + line.strip() for line in input_file.readlines()]

index = 0
writer = open("PhoMT-detokenization-test/test.en_generated.vietai", "w")
while index < len(lines):
    texts = lines[index : index + 8]
    outputs = translate(texts)
    print(outputs)
    for output in outputs:
        # Strip the leading "en: " language tag from the decoded output.
        writer.write(output[4:].strip() + "\n")
    index = index + 8
writer.close()

# En -> Vi: the same loop in the other direction.
with open("PhoMT-detokenization-test/test.en", "r") as input_file:
    lines = ["en: " + line.strip() for line in input_file.readlines()]

index = 0
writer = open("PhoMT-detokenization-test/test.vi_generated.vietai", "w")
while index < len(lines):
    texts = lines[index : index + 8]
    outputs = translate(texts)
    print(outputs)
    for output in outputs:
        # Strip the leading "vi: " language tag from the decoded output.
        writer.write(output[4:].strip() + "\n")
    index = index + 8
writer.close()

# Score Vi -> En.
references = [
    [line.strip()]
    for line in open("PhoMT-detokenization-test/test.en", "r").readlines()
]
predictions = [
    line.strip()
    for line in open(
        "PhoMT-detokenization-test/test.en_generated.vietai", "r"
    ).readlines()
]
sacrebleu = evaluate.load("sacrebleu")
results = sacrebleu.compute(predictions=predictions, references=references)
print(results)

# Score En -> Vi.
references = [
    [line.strip()]
    for line in open("PhoMT-detokenization-test/test.vi", "r").readlines()
]
predictions = [
    line.strip()
    for line in open(
        "PhoMT-detokenization-test/test.vi_generated.vietai", "r"
    ).readlines()
]
results = sacrebleu.compute(predictions=predictions, references=references)
print(results)
```
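One way to double-check that the `evaluate` path agrees with the paper's reported setup is to score the same files with the sacrebleu package directly and print its signature. A sketch, assuming the generated file from the script above exists and sacrebleu >= 2.0 is installed:

```python
# Sketch: cross-check the evaluate result against the sacrebleu package itself;
# the printed signature should show tok:13a and case:mixed.
from sacrebleu.metrics import BLEU

refs = [line.strip() for line in open("PhoMT-detokenization-test/test.en")]
hyps = [line.strip() for line in open("PhoMT-detokenization-test/test.en_generated.vietai")]

bleu = BLEU()  # defaults: mixed case, exp smoothing, 13a tokenizer
print(bleu.corpus_score(hyps, [refs]))
print(bleu.get_signature())
```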
Hi, I'm trying to reproduce the En2Vi result described in the paper on the PhoMT test set.
I used the generation settings shown in the example, yet the test result I got from the HuggingFace model was around 42.2 (the result reported in the paper is 44.29).
Do you plan to release the eval code/pipeline to reproduce the result discussed in the paper?