
Unable to reproduce the PhoMT results with the HuggingFace Model #2

Closed

justinphan3110 opened this issue Oct 10, 2022 · 5 comments

@justinphan3110 commented Oct 10, 2022

Hi, I'm trying to reproduce the En2Vi result reported in the paper on the PhoMT test set.
I used the generation setup shown in the example:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = 'vinai/vinai-translate-en2vi'

tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="en_XX")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
.....
outputs = model.generate(
    input_ids=batch['input_ids'].to('cuda'),
    max_length=max_target_length,
    do_sample=True,
    top_k=100,
    top_p=0.8,
    decoder_start_token_id=tokenizer.lang_code_to_id["vi_VN"],
    num_return_sequences=1,
)

Yet, the test-set result I got from the HuggingFace model was around 42.2 (the result reported in the paper is 44.29).

Do you plan to release the eval code/pipeline to reproduce the result discussed in the paper?

@datquocnguyen (Collaborator)

Are you using sacreBLEU?

@justinphan3110 (Author)

I'm using the sacreBLEU metric from HuggingFace. Is this different from the sacreBLEU you used in the paper? If so, can you share the command line you used with sacreBLEU?

@datquocnguyen (Collaborator) commented Oct 10, 2022

Our training and inference stages (an example below) were originally performed using fairseq. We then computed the detokenized, case-sensitive BLEU score using SacreBLEU (with the signature “BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1”).
The HuggingFace versions are variants converted from our original fairseq models, so I am not sure what causes the differences in scores between the two libraries at the moment.

SOURCE_LANG=vi_VN
TARGET_LANG=en_XX
LANGS=ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN

fairseq-generate $DATA_DIR \
    --path $MODEL_DIR/checkpoint_best.pt \
    --task translation_from_pretrained_bart \
    --gen-subset valid \
    -t $TARGET_LANG -s $SOURCE_LANG \
    --bpe 'sentencepiece' --sentencepiece-model $MODEL_DIR/sentence.bpe.model \
    --sacrebleu --remove-bpe 'sentencepiece' \
    --batch-size 32 --langs $LANGS > vi_en

cp $SOURCE_DATA_DIR/val_tourism_finance.en_XX vi_en.ref
#cp $SOURCE_DATA_DIR/test_tourism.en_XX vi_en.ref

cat vi_en | grep -P "^H" | sort -V | cut -f 3- | sed 's/\[en_XX\]//g' > vi_en.hyp
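
For completeness, a minimal sketch of computing that score with the sacrebleu Python package, assuming vi_en.hyp and vi_en.ref from the commands above (the package defaults of mixed case, exp smoothing, and the 13a tokenizer correspond to the signature quoted earlier):

# Minimal scoring sketch, assuming vi_en.hyp and vi_en.ref each hold one
# detokenized sentence per line, as produced by the commands above.
import sacrebleu

with open("vi_en.hyp") as f:
    hyps = [line.strip() for line in f]
with open("vi_en.ref") as f:
    refs = [line.strip() for line in f]

# corpus_bleu takes the hypotheses plus a list of reference streams;
# its defaults correspond to case.mixed, smooth.exp, and tok.13a.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(bleu.score)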

@datquocnguyen (Collaborator) commented Oct 30, 2023

@justinphan3110 I just had a bit of time to redo the evaluation. Using the simple script below, you'd obtain a sacreBLEU score of 44.2.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer_en2vi = AutoTokenizer.from_pretrained(
    "vinai/vinai-translate-en2vi", src_lang="en_XX"
)
model_en2vi = AutoModelForSeq2SeqLM.from_pretrained("vinai/vinai-translate-en2vi")
device_en2vi = torch.device("cuda")
model_en2vi.to(device_en2vi)

def translate_en2vi(en_texts: list[str]) -> list[str]:
    # Tokenize a batch of English sentences (input_ids plus attention_mask).
    inputs = tokenizer_en2vi(en_texts, padding=True, return_tensors="pt").to(
        device_en2vi
    )
    # Deterministic beam-search decoding into Vietnamese.
    output_ids = model_en2vi.generate(
        **inputs,
        decoder_start_token_id=tokenizer_en2vi.lang_code_to_id["vi_VN"],
        num_return_sequences=1,
        num_beams=5,
        early_stopping=True,
    )
    vi_texts = tokenizer_en2vi.batch_decode(output_ids, skip_special_tokens=True)
    return vi_texts

with open("PhoMT-detokenization-test/test.en", "r") as input_file:
    lines = [line.strip() for line in input_file.readlines()]
    index = 0
    writer = open("PhoMT-detokenization-test/test.vi_generated.v1", "w")
    while index < len(lines):
        texts = lines[index : index + 8]
        outputs = translate_en2vi(texts)
        print(outputs)
        for output in outputs:
            writer.write(output.strip() + "\n")
        index = index + 8
    writer.close()
    
import evaluate
references = [[line.strip()] for line in open("PhoMT-detokenization-test/test.vi", "r").readlines()]
predictions = [
    line.strip() for line in open("PhoMT-detokenization-test/test.vi_generated.v1", "r").readlines()
]
sacrebleu = evaluate.load("sacrebleu")
results = sacrebleu.compute(predictions=predictions, references=references)
print(results)
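
Note that this script decodes deterministically with beam search (num_beams=5), whereas the generation call in the first post uses sampling (do_sample=True, top_k=100, top_p=0.8); sampling is stochastic and typically scores lower on BLEU, which likely explains much of the 42.2 vs. 44.2 gap.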

@datquocnguyen (Collaborator)

Evaluation for VietAI/envit5-translation:

import torch
import evaluate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("VietAI/envit5-translation")
model = AutoModelForSeq2SeqLM.from_pretrained("VietAI/envit5-translation")
device = torch.device("cuda")
model.to(device)


def translate(texts: list[str]) -> list[str]:
    # envit5 expects each input prefixed with "en: " or "vi: ";
    # the callers below prepend the prefixes.
    inputs = tokenizer(texts, padding=True, return_tensors="pt").to(device)
    output_ids = model.generate(
        **inputs,
        num_return_sequences=1,
        num_beams=5,
        early_stopping=True,
        max_length=512,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


with open("PhoMT-detokenization-test/test.vi", "r") as input_file:
    lines = ["vi: " + line.strip() for line in input_file.readlines()]
    index = 0
    writer = open("PhoMT-detokenization-test/test.en_generated.vietai", "w")
    while index < len(lines):
        texts = lines[index : index + 8]
        outputs = translate(texts)
        print(outputs)
        for output in outputs:
            writer.write(output[4:].strip() + "\n")

        index = index + 8

    writer.close()

with open("PhoMT-detokenization-test/test.en", "r") as input_file:
    lines = ["en: " + line.strip() for line in input_file.readlines()]
    index = 0
    writer = open("PhoMT-detokenization-test/test.vi_generated.vietai", "w")
    while index < len(lines):
        texts = lines[index : index + 8]
        outputs = translate(texts)
        print(outputs)
        for output in outputs:
            writer.write(output[4:].strip() + "\n")

        index = index + 8

    writer.close()
    
# Score the vi -> en translations.
references = [
    [line.strip()]
    for line in open("PhoMT-detokenization-test/test.en", "r").readlines()
]
predictions = [
    line.strip()
    for line in open(
        "PhoMT-detokenization-test/test.en_generated.vietai", "r"
    ).readlines()
]
sacrebleu = evaluate.load("sacrebleu")
results = sacrebleu.compute(predictions=predictions, references=references)
print(results)

# Score the en -> vi translations.
references = [
    [line.strip()]
    for line in open("PhoMT-detokenization-test/test.vi", "r").readlines()
]
predictions = [
    line.strip()
    for line in open(
        "PhoMT-detokenization-test/test.vi_generated.vietai", "r"
    ).readlines()
]
sacrebleu = evaluate.load("sacrebleu")
results = sacrebleu.compute(predictions=predictions, references=references)
print(results)
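
Regarding the earlier question about metric differences: the evaluate "sacrebleu" metric wraps the sacrebleu package, and to the best of my knowledge its defaults match the signature quoted above. A minimal sketch that spells those options out explicitly (the sentences are placeholders):

import evaluate

sacrebleu = evaluate.load("sacrebleu")
results = sacrebleu.compute(
    predictions=["Xin chào thế giới ."],   # placeholder hypothesis
    references=[["Xin chào thế giới ."]],  # one list of references per hypothesis
    tokenize="13a",        # default tokenizer, i.e. tok.13a in the signature
    lowercase=False,       # case-sensitive, i.e. case.mixed
    smooth_method="exp",   # i.e. smooth.exp
)
print(results["score"])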
