name | email | institution |
---|---|---|
Jiarui Luo | 11911419@mail.sustech.edu.cn | Southern University of Science and Technology |
Xinying Zheng | 11912039@mail.sustech.edu.cn | Southern University of Science and Technology |
Renjie Liu | 11911808@mail.sustech.edu.cn | Southern University of Science and Technology |
- install Anaconda3 and open its bash shell
- create a new environment for testing:
conda create --name dbgroup python=3.9
- activate the new environment:
conda activate dbgroup
- install basic packages:
pip install numpy pandas scipy tqdm
- install the CPU-only build of PyTorch:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
- install sentence-transformers:
pip install -U sentence-transformers
- install faiss:
conda install faiss -c pytorch
- run the program:
python Main.py
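If `Main.py` fails on an import, a quick check like the one below can show which packages are missing from the active environment. This is only a sketch: the helper name `missing_packages` and the list of import names are assumptions based on the install commands above, not something taken from the project itself.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the packages installed above.
REQUIRED = ["numpy", "pandas", "scipy", "tqdm", "torch",
            "sentence_transformers", "faiss"]


def missing_packages(modules=REQUIRED):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the `dbgroup` environment; any name it prints points to the install step that needs repeating.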
- Use regular expressions to extract features from sentences
- Find entity pairs whose features match closely and place them at the beginning of the result set
- Encode each sentence with a neural network (bert-tiny)
- Build an HNSW index with faiss
- Search the index to find the top-K nearest neighbors of each encoded sentence and generate (sentence, neighbor) pairs
- Sort the pairs by cosine distance
- Use the extracted features to filter out pairs that are unlikely to match
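The flow described in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `Main.py`: the feature rule (tokens that mix letters and digits, such as model numbers), the function names, and the toy vectors are all hypothetical. The embeddings are hand-written stand-ins for bert-tiny output, and a brute-force neighbor search replaces the faiss HNSW index so the example runs without extra dependencies.

```python
import math
import re
from itertools import combinations


def extract_features(sentence):
    """Hypothetical feature rule: tokens containing both a letter and a digit."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {t for t in tokens if re.search(r"\d", t) and re.search(r"[a-z]", t)}


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def candidate_pairs(sentences, embeddings, top_k=2):
    features = [extract_features(s) for s in sentences]
    n = len(sentences)

    # Pairs with identical, non-empty feature sets go to the front of the result.
    strong = [(i, j) for i, j in combinations(range(n), 2)
              if features[i] and features[i] == features[j]]

    # Brute-force top-k neighbor search (stand-in for querying an HNSW index).
    pairs = {}
    for i in range(n):
        dists = sorted((cosine_distance(embeddings[i], embeddings[j]), j)
                       for j in range(n) if j != i)
        for dist, j in dists[:top_k]:
            pairs[(min(i, j), max(i, j))] = dist

    # Sort the candidate pairs by cosine distance, closest first.
    ranked = sorted(pairs, key=pairs.get)

    # Drop pairs already emitted, and pairs whose feature sets are both
    # non-empty but share nothing (unlikely to be the same entity).
    seen = set(strong)
    filtered = [(i, j) for i, j in ranked
                if (i, j) not in seen
                and not (features[i] and features[j]
                         and features[i].isdisjoint(features[j]))]
    return strong + filtered


SENTENCES = [
    "lenovo thinkpad x1 carbon 20kh",
    "thinkpad x1 carbon (20kh) laptop",
    "dell xps 13 9370",
    "lenovo thinkpad t480 laptop",
]
# Toy vectors standing in for sentence embeddings.
EMBEDDINGS = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.2],
    [0.8, 0.0, 0.3],
]

if __name__ == "__main__":
    for i, j in candidate_pairs(SENTENCES, EMBEDDINGS, top_k=1):
        print(SENTENCES[i], "<->", SENTENCES[j])
```

With real data, the neighbor loop would be replaced by a faiss index such as `faiss.IndexHNSWFlat`, populated with `add` and queried with `search`. Since that index uses L2 distance by default, normalizing the embeddings first keeps its ranking consistent with cosine distance.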