Deep neural approach to Boundary and Disfluency Detection
This is part of my MSc project. More info:
- My dissertation (pt-BR)
- EACL paper
- STIL paper
- PROPOR paper
- LREC paper
First, clone this repository using git:
git clone https://github.com/mtreviso/deepbond.git
Then, cd to the DeepBond folder:
cd deepbond
Create a Python virtualenv and install all dependencies using:
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
Run the install command:
python3 setup.py install
Please note that DeepBond requires Python 3, so all the commands above (pip/python) must be bound to a Python 3 installation.
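To check which Python version a command is bound to, a small script along these lines may help. It is only a sketch: the helper name `major_version` is made up for this example and is not part of deepbond.

```python
import shutil
import subprocess


def major_version(command):
    """Return the major Python version `command` runs, or None if it is not found.

    `major_version` is a hypothetical helper for illustration only.
    """
    path = shutil.which(command)
    if path is None:
        return None
    # Ask the interpreter itself which major version it is.
    result = subprocess.run(
        [path, "-c", "import sys; print(sys.version_info[0])"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


for name in ("python3", "python"):
    print(name, "->", major_version(name))
```

For pip, running `pip --version` prints the Python version that pip is bound to.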
The data should be put in a folder called data in the root directory. Here are the basic ingredients that you might need:
- Corpus (see license): https://github.com/nilc-nlp/DNLT-BP
- Word embeddings (word2vec skipgram): https://www.dropbox.com/s/rw3ti4ebctufp4j/embeddings.zip?dl=1
- Prosodic information (only for Control and MCI): https://www.dropbox.com/s/0gmt2o2xeah13xk/prosodic.zip?dl=1
You can also send me an e-mail if you have any questions!
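The two zip archives listed above could be fetched and unpacked with a short script like the following. This is a minimal sketch, not part of deepbond: the helper `fetch_and_unzip` and the destination subfolders are assumptions, so adjust them to the layout your experiments expect.

```python
import urllib.request
import zipfile
from pathlib import Path

# URLs copied from the list above.
RESOURCES = {
    "embeddings": "https://www.dropbox.com/s/rw3ti4ebctufp4j/embeddings.zip?dl=1",
    "prosodic": "https://www.dropbox.com/s/0gmt2o2xeah13xk/prosodic.zip?dl=1",
}


def fetch_and_unzip(url, dest_dir):
    """Download the zip archive at `url` and extract it into `dest_dir`.

    Hypothetical helper for illustration; returns the names of the
    entries found directly inside `dest_dir` after extraction.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # With no filename given, remote files are saved to a temporary location.
    archive_path, _headers = urllib.request.urlretrieve(url)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)
    return sorted(p.name for p in dest_dir.iterdir())
```

For example, `fetch_and_unzip(RESOURCES["embeddings"], "data/embeddings")` would unpack the embeddings under data/embeddings (an assumed location). The corpus lives in its own git repository with its own license, so it is not handled here.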
You can use deepbond in two ways: through the command-line interface (CLI) or as a Python library (lib).
The full list of arguments (CLI) and options (lib) can be seen via:
python3 -m deepbond --help
Take a look at the experiments folder for more examples.
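If you want the help text available inside a script, for instance to search it for a particular argument, the help command above can be wrapped as follows. This is a sketch under assumptions: `cli_help` is a hypothetical helper, and it only relies on `python3 -m deepbond --help` being runnable in the active environment.

```python
import subprocess
import sys


def cli_help(module="deepbond"):
    """Run `python -m <module> --help` with the current interpreter.

    Hypothetical helper for illustration; returns (exit code, help text).
    """
    result = subprocess.run(
        [sys.executable, "-m", module, "--help"],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

Calling `cli_help()` inside the virtualenv where deepbond is installed returns the exit code and the same text the command prints to the terminal.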
DeepBond is released under the MIT license.
If you use deepbond, you can cite this paper:
@inproceedings{treviso2018sentence,
author = "Marcos Vinícius Treviso and Sandra Maria Aluísio",
title = "Sentence Segmentation and Disfluency Detection in Narrative Transcripts from Neuropsychological Tests",
booktitle = "Computational Processing of the Portuguese Language (PROPOR)",
year = "2018",
publisher = "Springer International Publishing",
pages = "409--418",
}
Or the more recent publication (results without prosodic information + CRF):
@inproceedings{casanova-etal-2020-evaluating,
title = "Evaluating Sentence Segmentation in Different Datasets of Neuropsychological Language Tests in {B}razilian {P}ortuguese",
author = {Casanova, Edresson and
Treviso, Marcos and
H{\"u}bner, Lilian and
Alu{\'\i}sio, Sandra},
booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference (LREC)",
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association",
pages = "2605--2614",
ISBN = "979-10-95546-34-4",
}