@MinishLab

The Minish Lab

Solving big problems with small models

Hello, we're minish!

We're a two-person (@pringled and @stephantul) open-source company, with a focus on Natural Language Processing.

We believe that if you make models fast enough, you unlock new possibilities.

Using our software, you can:

  • Embed the entire English Wikipedia in 5 minutes
  • Classify tens of thousands of documents per second on CPU
  • Approximately deduplicate extremely large datasets in minutes
  • Build the fastest RAG application in the world
  • Easily evaluate which ANN algorithm works best for your data

Our projects:

  • model2vec: make tiny models that are still really really good.
  • potion: the best small model in the world. 100-500x faster than a sentence-transformer, and almost as good.
  • vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
  • semhash: lightning-fast, super accurate, approximate deduplication for your text datasets.
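
To give a feel for why static embeddings can be so fast, here is a minimal toy sketch in plain Python. It is not model2vec's API or implementation: the vocabulary, the vectors, and the function names `embed` and `cosine` are hypothetical and exist only for illustration. The assumption is that a static model maps each token to a fixed vector and averages those vectors, so embedding a sentence is a table lookup plus a mean rather than a transformer forward pass.

```python
import math

# Hypothetical toy vocabulary: each token maps to a fixed (static) vector.
# Real static models have far larger vocabularies and dimensions.
TOY_VECTORS = {
    "small": [1.0, 0.0, 0.0],
    "tiny": [0.9, 0.1, 0.0],
    "model": [0.0, 1.0, 0.0],
    "banana": [0.0, 0.0, 1.0],
}
DIM = 3


def embed(text):
    """Average the static vectors of known tokens; unknown tokens are skipped."""
    tokens = [t for t in text.lower().split() if t in TOY_VECTORS]
    if not tokens:
        return [0.0] * DIM
    return [sum(TOY_VECTORS[t][i] for t in tokens) / len(tokens) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


if __name__ == "__main__":
    # Related phrases should score higher than unrelated ones.
    print(cosine(embed("small model"), embed("tiny model")))
    print(cosine(embed("small model"), embed("banana")))
```

The same similarity score is the kind of signal an approximate deduplication or nearest-neighbor tool could threshold on, although the actual libraries listed above may work differently.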

Pinned

  1. model2vec Public

    Fast State-of-the-Art Static Embeddings

    Python 1.1k 49

  2. semhash Public

    Fast Semantic Text Deduplication

    Python 511 22

  3. vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python 240 7

  4. tokenlearn Public

    Pre-train Static Word Embeddings

    Python 47 3

Repositories

Showing 9 of 9 repositories
  • model2vec Public

    Fast State-of-the-Art Static Embeddings

    Python 1,052 MIT 49 4 3 Updated Feb 15, 2025
  • semhash Public

    Fast Semantic Text Deduplication

    Python 511 MIT 22 0 3 Updated Feb 15, 2025
  • vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python 240 MIT 7 1 2 Updated Feb 15, 2025
  • .github Public

    Readme

    0 0 0 0 Updated Feb 15, 2025
  • minishlab.github.io
    SCSS 0 MIT 0 0 0 Updated Feb 6, 2025
  • tokenlearn Public

    Pre-train Static Word Embeddings

    Python 47 MIT 3 1 1 Updated Jan 29, 2025
  • korok Public

    Lightweight Hybrid Search and Reranking

    Python 7 MIT 1 0 0 Updated Dec 26, 2024
  • watertemplate Public template

    Template

    Makefile 1 MIT 1 0 0 Updated Dec 9, 2024
  • evaluation Public

    Code to evaluate performance for embeddings

    Python 10 MIT 0 0 0 Updated Sep 25, 2024
