Enhancing text to graph representation by edge weighting and filtering through word association measures

Datasets

Polarity
WEBKB
R8
20 Newsgroups

Reproducing the experiments

We implemented auxiliary scripts in the scripts/ directory. To use these scripts, copy them to the project root directory (this directory). Example:

cp scripts/feature_runner.sh .

Requirements

The implementation was developed with Python 3.6.13.

Optional step

Optionally, create a virtual environment before installing the dependencies:

Install the virtualenv package:

pip install virtualenv

Create a new environment:

virtualenv venv

Activate the new environment:

source venv/bin/activate

Install the dependencies:

The dependencies are listed in the requirements.txt file. To install, run:

pip install -r requirements.txt

First step: Graph generation and representation learning

Running all experiments

To run the proposed approach:

./feature_runner.sh

To run the baseline:

./baseline_feature_runner.sh

Running a single estimation

mprof run --output <time_mem_output_file>.dat --interval 60 feature_generator.py --dataset <dataset> --strategy <weight_strategy> --window 12 --emb_dim 100 --cut_percent <cut_p>

Where:

<time_mem_output_file> : Output file in which mprof records memory usage over time.

<dataset> : Input dataset: polarity, r8 or webkb

<weight_strategy> : Weight strategy: pmi, pmi_all, llr, llr_all, chi_square, chi_square_all (for the proposed weighting approaches) or no_weight (for the baseline)

<cut_p> : Cut p: 5, 10, 20, 30, 50, 70, 90 (for the proposed approach) or 0 (for baseline)
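The parameter values listed above define a grid of runs. As a rough illustration (this is not the content of feature_runner.sh, which may be organized differently), the following Python sketch enumerates that grid and prints one command line per combination. The helper names and the naming scheme of the time/memory output file are assumptions made for this example only.

```python
import itertools
import shlex

# Values taken from the parameter list in this README.
DATASETS = ["polarity", "r8", "webkb"]
STRATEGIES = ["pmi", "pmi_all", "llr", "llr_all", "chi_square", "chi_square_all"]
CUT_PERCENTS = [5, 10, 20, 30, 50, 70, 90]


def build_command(dataset, strategy, cut_percent, window=12, emb_dim=100):
    """Compose one feature_generator.py invocation wrapped in mprof."""
    # Hypothetical naming scheme for the time/memory output file.
    output = f"time_mem_{dataset}_{strategy}_{cut_percent}.dat"
    return [
        "mprof", "run", "--output", output, "--interval", "60",
        "feature_generator.py",
        "--dataset", dataset,
        "--strategy", strategy,
        "--window", str(window),
        "--emb_dim", str(emb_dim),
        "--cut_percent", str(cut_percent),
    ]


def all_commands():
    """Return the proposed-approach grid followed by one baseline run per dataset."""
    proposed = [
        build_command(d, s, c)
        for d, s, c in itertools.product(DATASETS, STRATEGIES, CUT_PERCENTS)
    ]
    baseline = [build_command(d, "no_weight", 0) for d in DATASETS]
    return proposed + baseline


if __name__ == "__main__":
    for cmd in all_commands():
        # Quote each argument so the line can be pasted into a shell.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each command is kept as a list of arguments, so it could also be passed directly to subprocess.run instead of being printed.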

Second step: Classification

Running all experiments

To run the proposed approach:

./cnn_runner.sh

To run the baseline:

./baseline_cnn_runner.sh

Running a single estimation

python3 cnn_main.py --dataset <dataset> --cut_percent <cut_p> --strategy <weight_strategy> --window 12 --emb_dim 100

Where:

<dataset> : Input dataset: polarity, r8 or webkb

<weight_strategy> : Weight strategy: pmi, pmi_all, llr, llr_all, chi_square, chi_square_all (for the proposed weighting approaches) or no_weight (for the baseline)

<cut_p> : Cut p: 5, 10, 20, 30, 50, 70, 90 (for the proposed approach) or 0 (for baseline)

Evaluation

The results_main.py script implements the evaluation component. For a given dataset and cut p, it calculates the 10-fold mean F1 score for every weight strategy and for the baseline, and runs the Wilcoxon test comparing each weight strategy with the baseline. The results are printed to the console (for example, the Linux terminal) and are also written to text files in the plots/next_level/<cut p>/<dataset> directory, with names such as f1_polarity_12.txt.
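To make the comparison step concrete, here is a minimal standard-library sketch of a two-sided exact Wilcoxon signed-rank test on paired per-fold scores. It is not taken from results_main.py, which may rely on a library routine and different conventions. The function names are hypothetical, and dropping zero differences is an assumption (one common convention).

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank p-value for paired samples."""
    # Assumption: pairs with a zero difference are discarded.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    if not diffs:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = min(w_plus, w_minus)
    # Enumerate every sign assignment (2**n cases, cheap for 10 folds).
    hits = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for r, s in zip(ranks, signs) if s) <= observed:
            hits += 1
    return min(1.0, 2 * hits / total)
```

With 10 folds there are only 1024 sign assignments, so full enumeration is cheap. For larger samples a normal approximation or a library implementation would be the usual choice.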

For example, running the results_main.py script as:

python results_main.py --dataset polarity --emb_dim 100 --window 12 --cut_percent 5

will generate the file: plots/next_level/0.05/polarity/f1_polarity_12.txt, containing:

no_weight,77.12544679853906,0.050221219618395326
chi_square,78.55941729972031,0.050606510004414684,p=0.275390625
chi_square_all,79.69882465466792,0.028751551738617296,p=0.10546875
llr,79.39449196564627,0.024537426341991925,p=0.193359375
llr_all,79.65264759486207,0.04178546879605646,p=0.16015625
pmi,79.42541259746658,0.03398688281606517,p=0.16015625
pmi_all,78.89614764450108,0.046800758344722936,p=0.275390625

Where:

The first line contains the mean f1 score and the standard deviation for the baseline (no_weight). The other lines contain for each weight strategy the mean f1 score, the standard deviation, and the p-value for the Wilcoxon test compared to the baseline.
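For downstream analysis, the file format described above can be read back with a few lines of Python. The sketch below is only an illustration based on the example output shown in this README. The class and function names are hypothetical and are not part of the repository.

```python
from typing import NamedTuple, Optional


class ResultRow(NamedTuple):
    strategy: str
    mean_f1: float
    std: float
    p_value: Optional[float]  # None for the baseline (no_weight) row


def parse_result_line(line):
    """Parse one comma-separated line of a results file such as f1_polarity_12.txt."""
    fields = line.strip().split(",")
    strategy, mean_f1, std = fields[0], float(fields[1]), float(fields[2])
    p_value = None
    if len(fields) > 3:
        # Weighted strategies carry a trailing "p=<value>" field.
        p_value = float(fields[3].split("=", 1)[1])
    return ResultRow(strategy, mean_f1, std, p_value)


def parse_results(text):
    """Parse every non-empty line of a results file."""
    return [parse_result_line(line) for line in text.splitlines() if line.strip()]


# Two lines copied from the example output in this README.
sample = """no_weight,77.12544679853906,0.050221219618395326
chi_square,78.55941729972031,0.050606510004414684,p=0.275390625
"""
rows = parse_results(sample)
best = max(rows, key=lambda row: row.mean_f1)
print(best.strategy)  # prints: chi_square
```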

The results_main.py script evaluates a single combination of dataset and cut p. To run the evaluation for all experiments, run:

./plot_f1.sh

mro15/my-graph-library
