Data augmentation/generation utilities for Iara.
The scripts can be run standalone (see the examples below) or used as part of other programs.
help: ./run_txt_aug.py -h
example usage:
./run_txt_aug.py corpus_1br_10pt_15sept.tok --aug backtranslate random_del --maxs 5 --lang en --translate_mode google --append --output out.txt

help: ./run_txt_gen.py -h
example usage:
./run_txt_gen.py --input_file palavras.txt --context "radiologia médica" --num 2 --return_type "frases" --api_key "YOUR_OPENAI_API_KEY" --output query.txt

help: ./run_audio_aug.py -h
example usage:
./run_audio_aug.py test.ogg --augmentations PitchShift GainTransition --output_format ogg

help: ./run_create_corpus.py -h
example usage:
./run_create_corpus.py dataset_es.txt --lang es --output corpus_es.tok
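
Calling a script from another program: the sketch below is one possible way to drive run_txt_aug.py from Python through a subprocess. It is not part of Iara. The helper names build_txt_aug_command and run_txt_aug are hypothetical, and the only flags used are the ones that appear in the text augmentation example above. What each flag means, and which ones are optional, is assumed here; check ./run_txt_aug.py -h before relying on it.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_txt_aug_command(
    corpus: str,
    augmentations: Sequence[str],
    output: str,
    maxs: Optional[int] = None,
    lang: Optional[str] = None,
    translate_mode: Optional[str] = None,
    append: bool = False,
    script: str = "./run_txt_aug.py",
) -> List[str]:
    """Build an argument list for run_txt_aug.py (hypothetical helper).

    Only flags taken from the example usage are emitted; flags left at
    None/False are omitted, on the assumption that the script treats
    them as optional.
    """
    cmd = [script, corpus, "--aug", *augmentations]
    if maxs is not None:
        cmd += ["--maxs", str(maxs)]
    if lang is not None:
        cmd += ["--lang", lang]
    if translate_mode is not None:
        cmd += ["--translate_mode", translate_mode]
    if append:
        cmd.append("--append")
    cmd += ["--output", output]
    return cmd


def run_txt_aug(*args, **kwargs) -> subprocess.CompletedProcess:
    """Run the script and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_txt_aug_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where the Iara scripts are not installed.
    command = build_txt_aug_command(
        "corpus_1br_10pt_15sept.tok",
        ["backtranslate", "random_del"],
        "out.txt",
        maxs=5,
        lang="en",
        translate_mode="google",
        append=True,
    )
    print(shlex.join(command))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with file names that contain spaces. The same pattern should carry over to the other scripts by swapping in their flags.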