ssun32/CLIRMatrix

CLIRMatrix

CLIRMatrix is available at:

http://www.cs.jhu.edu/~shuosun/clirmatrix/

Alternatively, CLIRMatrix is available in the following Google Drive folder:

https://drive.google.com/drive/folders/1V-DcBwvAnlVAYJw_gsx0zXV5VXJcRGGc?usp=sharing

Script to extract untruncated documents from Wikipedia dumps:

Usage:
    ./extract.sh [Wikipedia language code]

Example:
    ./extract.sh en
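To process several languages in one go, the script can be wrapped in a small loop. The following is a minimal sketch, not part of the repository: the `extract_all` function and the `EXTRACT` variable are hypothetical names introduced here for illustration, and the sketch assumes only what the usage above states, namely that `extract.sh` takes a single Wikipedia language code as its argument.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository): run the extraction
# script once per Wikipedia language code.

# Path to the extraction script; assumed to be in the current directory.
EXTRACT="${EXTRACT:-./extract.sh}"

# extract_all CODE [CODE ...]
# Calls the extraction script once for each language code and stops at
# the first failure.
extract_all() {
    local code
    for code in "$@"; do
        echo "Extracting: $code"
        "$EXTRACT" "$code" || {
            echo "Extraction failed for: $code" >&2
            return 1
        }
    done
}
```

After sourcing the file, `extract_all en de ja` would run the script for English, German, and Japanese in turn. Stopping at the first failure is a design choice of this sketch; dropping the `return 1` would let the loop continue with the remaining languages.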

Reference

[1] Shuo Sun and Kevin Duh. CLIRMatrix: A massively large collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).
