vespa 🛵

Document relevancy ranking and similarity scoring using the Vector Space Model.

Supports all modes listed under Modes.

Installation

To install directly from GitHub, run:

pip install git+ssh://git@github.com/mauricesvp/vespa.git
# or
pip install git+https://github.com/mauricesvp/vespa.git

To install from source:

git clone git@github.com:mauricesvp/vespa.git
# or
git clone https://github.com/mauricesvp/vespa.git

cd vespa
pip install .

Usage

from vespa import Vespa

corpus = ["Example document."]  # corpus: list of documents (strings)
vsm = Vespa(corpus)

results = vsm.score("Example query")
# > (0.7071067811865475, 'Example document.')

results = vsm.k_score("Example query", k=1)
# > [(0.7071067811865475, 'Example document.')]

The default mode is lnc.ltc: lnc weighting is applied to each corpus document and ltc weighting to the query. You can supply a different mode when initializing Vespa, or pass one directly to score or k_score. Passing a mode to either method also changes the mode for subsequent calls.
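
To make the lnc.ltc notation concrete, here is a minimal pure-Python sketch of that weighting scheme. It is not Vespa's implementation: the helper names (tokenize, log_tf, cosine_normalize, rank_lnc_ltc), the regex tokenizer, and the idf definition ln(N / df) are all assumptions made for illustration, so scores from this sketch are not guaranteed to match Vespa's output.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on non-alphanumeric characters.
    # Vespa's own tokenizer may behave differently.
    return re.findall(r"[a-z0-9]+", text.lower())


def log_tf(tokens):
    # "l": logarithmic term frequency, 1 + ln(tf) for each occurring term.
    return {term: 1.0 + math.log(count) for term, count in Counter(tokens).items()}


def cosine_normalize(weights):
    # "c": divide every weight by the Euclidean length of the vector.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else dict(weights)


def rank_lnc_ltc(corpus, query):
    docs = [tokenize(doc) for doc in corpus]
    n_docs = len(docs)
    # Document frequency: number of corpus documents containing each term.
    df = Counter(term for tokens in docs for term in set(tokens))

    # Query side (ltc): log tf, times idf = ln(N / df), cosine-normalized.
    # Query terms that never occur in the corpus are dropped.
    query_weights = cosine_normalize({
        term: w * math.log(n_docs / df[term])
        for term, w in log_tf(tokenize(query)).items()
        if term in df
    })

    scored = []
    for doc, tokens in zip(corpus, docs):
        # Document side (lnc): log tf, no idf, cosine-normalized.
        doc_weights = cosine_normalize(log_tf(tokens))
        score = sum(w * doc_weights.get(t, 0.0) for t, w in query_weights.items())
        scored.append((score, doc))
    # Highest score first, as (score, document) tuples.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]
for score, doc in rank_lnc_ltc(corpus, "cat mat"):
    print(f"{score:.3f}  {doc}")
```

Because both vectors are cosine-normalized, each score is the cosine similarity between the query and the document and lies between 0 and 1.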

To get the score of a specific document, pass the additional document argument to score:

results = vsm.score(query="Your query", document="Some document in corpus")

Documents can be added to the corpus:

vsm.add("some new document")  # str or list of str

or the corpus can be rebuilt, removing all previous entries:

vsm.corpus(new_corpus)  # str or list of str

Modes

All available modes are listed below.

| | Term frequency | | Document frequency | | Document length normalization |
|---|---|---|---|---|---|
| b | Binary weight | n | Disregards the collection frequency | n | No document length normalization |
| n | Raw term frequency | f | Inverse collection frequency | c | Cosine normalization |
| a | Augmented normalized frequency | t | Inverse collection frequency | u | Pivoted unique normalization |
| l | Logarithm | p | Probabilistic inverse collection frequency | b | Pivoted character length normalization |
| L | Average-term-frequency-based normalization | | | | |
| d | Double logarithm | | | | |
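
As an illustration of how a mode string such as lnc.ltc maps onto the table, the sketch below splits a mode into its document and query schemes and evaluates the term frequency column. The functions parse_mode and term_frequency_weight are hypothetical and not part of Vespa's API, and the formulas are the commonly cited SMART definitions, which Vespa is only assumed to follow.

```python
import math


def parse_mode(mode):
    # "lnc.ltc" -> document scheme ("l", "n", "c"), query scheme ("l", "t", "c").
    # Each scheme is (term frequency, document frequency, normalization).
    doc_scheme, query_scheme = mode.split(".")
    for scheme in (doc_scheme, query_scheme):
        if len(scheme) != 3:
            raise ValueError(f"expected three letters per scheme, got {scheme!r}")
    return tuple(doc_scheme), tuple(query_scheme)


def term_frequency_weight(letter, tf, max_tf=None, avg_tf=None):
    # tf: raw count of the term in the document.
    # max_tf / avg_tf: largest / average raw count in the same document,
    # needed only for the "a" and "L" variants respectively.
    if tf <= 0:
        return 0.0
    if letter == "b":  # binary weight
        return 1.0
    if letter == "n":  # raw term frequency
        return float(tf)
    if letter == "a":  # augmented normalized frequency
        if not max_tf:
            raise ValueError("'a' requires max_tf")
        return 0.5 + 0.5 * tf / max_tf
    if letter == "l":  # logarithm
        return 1.0 + math.log(tf)
    if letter == "L":  # average-term-frequency-based normalization
        if not avg_tf:
            raise ValueError("'L' requires avg_tf")
        return (1.0 + math.log(tf)) / (1.0 + math.log(avg_tf))
    if letter == "d":  # double logarithm
        return 1.0 + math.log(1.0 + math.log(tf))
    raise ValueError(f"unknown term frequency letter: {letter!r}")


doc_scheme, query_scheme = parse_mode("lnc.ltc")
print(doc_scheme, query_scheme)
print(term_frequency_weight(doc_scheme[0], tf=3))
```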

Limitations

Vespa does not feature:

Lemmatization and Stemming
Stopword filtering
Spelling correction
Any kind of machine learning
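
Since these steps are not built in, any preprocessing has to happen before documents reach Vespa. The following is a minimal sketch of stopword filtering; the strip_stopwords helper and the tiny STOPWORDS set are hypothetical and only illustrate the idea.

```python
import re

# A tiny illustrative stopword list; real applications usually need a larger one.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "on", "in", "to", "is"})


def strip_stopwords(text, stopwords=STOPWORDS):
    # Lowercase, split on non-alphanumeric characters, drop stopwords,
    # and join the remaining tokens back into a single string.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in stopwords)


corpus = ["The cat sat on the mat.", "A dog is in the garden."]
cleaned = [strip_stopwords(doc) for doc in corpus]
print(cleaned)
```

The cleaned strings can then be used as the corpus. The same function should be applied to queries so that both sides are tokenized consistently. Note that results would then contain the cleaned strings rather than the original documents.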

Background

For further reading, please reference:
