In this repository, we provide the code and Distributional Semantic Models (DSMs) for a large-scale evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
These scripts provide a useful tool for investigating the performance of embeddings on several semantic tasks and for carrying out an in-depth statistical analysis that identifies the major factors influencing the behavior of DSMs.
The analysis comprises an intrinsic evaluation, an extrinsic evaluation, and Representational Similarity Analysis (RSA), a methodology borrowed from cognitive science to inspect the semantic spaces generated by distributional models.
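For the BERT-based spaces mentioned above, a type vector is obtained by averaging the contextualized vectors of a word across its occurrences. The snippet below is only a minimal sketch of this idea, not the repository's actual pipeline: the model name, the example sentences, and the simple token-matching logic (which assumes the target word is a single WordPiece) are illustrative assumptions.

```python
# Minimal sketch: average BERT's contextualized vectors of a word over its
# occurrences to obtain a single static ("type") vector.
# NOTE: illustrative only; not the pipeline used in this repository.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def type_vector(word, sentences):
    """Average the contextualized vectors of `word` over its occurrences."""
    vectors = []
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_size)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
        for i, tok in enumerate(tokens):
            if tok == word:  # assumes the word is a single WordPiece
                vectors.append(hidden[i])
    return torch.stack(vectors).mean(dim=0)

# Toy usage: a type vector for "bank" averaged over two contexts
vec = type_vector("bank", ["I sat on the bank of the river.",
                           "The bank raised interest rates."])
```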
This repository contains the material to reproduce the extrinsic evaluation and RSA. The code for the intrinsic evaluation is available in a separate repository: https://github.com/patrickjeuniaux/word-embeddings-benchmarks
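As a rough illustration of what RSA involves (not the exact procedure implemented here), the sketch below builds a pairwise cosine-similarity matrix for each of two semantic spaces over the same word list and correlates their upper triangles with Spearman's rho; the random toy spaces are just placeholders.

```python
# Minimal RSA sketch: correlate the pairwise-similarity structure of two
# semantic spaces defined over the same vocabulary.
# NOTE: illustrative only; not the exact procedure used in this repository.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def similarity_matrix(vectors):
    """Pairwise cosine similarities between the row vectors of `vectors`."""
    return 1.0 - squareform(pdist(vectors, metric="cosine"))

def rsa(space_a, space_b):
    """Spearman correlation between the upper triangles of the two similarity matrices."""
    sim_a = similarity_matrix(space_a)
    sim_b = similarity_matrix(space_b)
    iu = np.triu_indices_from(sim_a, k=1)
    return spearmanr(sim_a[iu], sim_b[iu]).correlation

# Toy usage: two random spaces with vectors for the same 100 words
rho = rsa(np.random.rand(100, 300), np.random.rand(100, 768))
```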
To download a model, run the following on the command line:
wget http://coling-lab.humnet.unipi.it:8080/<model_url>
You can find the URLs of all the models in space_paths.txt.
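If you prefer to fetch everything at once, the sketch below downloads all models; it assumes that space_paths.txt contains one entry per line, either a full URL or a path to be appended to the base URL shown in the wget example above.

```python
# Illustrative helper to download every model listed in space_paths.txt.
# Assumption: one entry per line, either a full URL or a path relative to
# the base URL used in the wget example.
import os
import urllib.request

BASE_URL = "http://coling-lab.humnet.unipi.it:8080/"

with open("space_paths.txt") as f:
    for line in f:
        entry = line.strip()
        if not entry:
            continue
        url = entry if entry.startswith("http") else BASE_URL + entry.lstrip("/")
        filename = os.path.basename(url)
        print("Downloading", url)
        urllib.request.urlretrieve(url, filename)
```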
If you use any material from this repository, please cite the following paper:
@article{lenci2022comparative,
title={A comparative evaluation and analysis of three generations of Distributional Semantic Models},
author={Lenci, Alessandro and Sahlgren, Magnus and Jeuniaux, Patrick and Cuba Gyllensten, Amaru and Miliani, Martina},
journal={Language Resources and Evaluation},
pages={1--45},
year={2022},
publisher={Springer}
}