fix(deps): Update getmeili/meilisearch Docker tag to v1.12.1 #20069

Merged
merged 2 commits into from
Jan 7, 2025


@cq-bot cq-bot commented Jan 7, 2025

This PR contains the following updates:

Package                 Update   Change
getmeili/meilisearch    minor    v1.1.0 -> v1.12.1

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

meilisearch/meilisearch (getmeili/meilisearch)

v1.12.1


Fixes

Fixed a bug in the engine where adding an empty payload made the batch fail.
Fixed by @​irevoire in https://github.com/meilisearch/meilisearch/pull/5192

Full Changelog: meilisearch/meilisearch@v1.12.0...v1.12.1

v1.12.0: 🦗


Meilisearch v1.12 introduces significant indexing speed improvements, almost halving the time required to index large datasets. This release also introduces new settings to customize and potentially further increase indexing speed.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

New features and updates 🔥

Improve indexing speed

Indexing time is improved across the board!

  • Performance is maintained or better on smaller machines
  • On bigger machines with multiple cores and good IO, Meilisearch v1.12 is much faster than Meilisearch v1.11
    • More than twice as fast for raw document insertion tasks.
    • More than 4x as fast for incrementally updating documents in a large database.
    • Embedding generation is also up to 1.5x faster for some workloads.

The new indexer also makes task cancellation faster.

Done by @​dureuill, @​ManyTheFish, and @​Kerollmops in #​4900.

New index settings: use facetSearch and prefixSearch to improve indexing speed

v1.12 introduces two new index settings: facetSearch and prefixSearch.

Both settings allow you to skip parts of the indexing process. This leads to significant improvements to indexing speed, but may negatively impact search experience in some use cases.

Done by @​ManyTheFish in #​5091

facetSearch

Use this setting to toggle facet search:

curl \
  -X PUT 'http://localhost:7700/indexes/books/settings/facet-search' \
  -H 'Content-Type: application/json' \
  --data-binary 'true'

The default value for facetSearch is true. When set to false, this setting disables facet search for all filterable attributes in an index.

prefixSearch

Use this setting to configure whether words can be matched by prefix in an index:

curl \
  -X PUT 'http://localhost:7700/indexes/books/settings/prefix-search' \
  -H 'Content-Type: application/json' \
  --data-binary 'disabled'

prefixSearch accepts one of the following values:

  • "indexingTime": enables prefix processing during indexing. This is the default Meilisearch behavior
  • "disabled": deactivates prefix search completely

Disabling prefix search means the query "he" will no longer match the word "hello". This may significantly impact search result relevancy, but speeds up the indexing process.
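To make the trade-off concrete, here is a toy Python model of prefix matching. It is a sketch, not Meilisearch's matching code: it assumes that only the last word of a query is matched as a prefix (as the Meilisearch documentation describes), requires every query word to match, and ignores typo tolerance and matching strategies. The function names are hypothetical.

```python
def word_matches(query_word: str, doc_word: str, is_last: bool, prefix_search: str) -> bool:
    """Return True if one query word matches one document word."""
    if query_word == doc_word:
        return True  # exact matches work with either setting
    # Prefix matches need the prefix data built at indexing time and,
    # by assumption, only apply to the last word of the query.
    return is_last and prefix_search == "indexingTime" and doc_word.startswith(query_word)


def query_matches(query: str, doc_words: list, prefix_search: str = "indexingTime") -> bool:
    """Return True if every query word matches at least one document word."""
    words = query.split()
    last = len(words) - 1
    return all(
        any(word_matches(word, doc_word, i == last, prefix_search) for doc_word in doc_words)
        for i, word in enumerate(words)
    )


doc = ["hello", "world"]
print(query_matches("he", doc))                            # True
print(query_matches("he", doc, prefix_search="disabled"))  # False
```

With "disabled", only exact word matches remain, which is why no prefix data needs to be computed during indexing.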

New API route: /batches

The new /batches endpoint allows you to query information about task batches.

GET /batches returns a list of batch objects:

curl  -X GET 'http://localhost:7700/batches'

This endpoint accepts the same parameters as the GET /tasks route, allowing you to narrow down which batches you want to see. Parameters used with GET /batches apply to the tasks, not the batches themselves. For example, GET /batches?uid=0 returns batches containing tasks with a taskUid of 0, not batches with a batchUid of 0.

You may also query GET /batches/:uid to retrieve information about a single batch object:

curl  -X GET 'http://localhost:7700/batches/BATCH_UID'

/batches/:uid does not accept any parameters.

Batch objects contain the following fields:

{
  "uid": 160,
  "progress": {
    "steps": [
      {
        "currentStep": "processing tasks",
        "finished": 0,
        "total": 2
      },
      {
        "currentStep": "indexing",
        "finished": 2,
        "total": 3
      },
      {
        "currentStep": "extracting words",
        "finished": 3,
        "total": 13
      },
      {
        "currentStep": "document",
        "finished": 12300,
        "total": 19546
      }
    ],
    "percentage": 37.986263
  },
  "details": {
    "receivedDocuments": 19547,
    "indexedDocuments": null
  },
  "stats": {
    "totalNbTasks": 1,
    "status": {
      "processing": 1
    },
    "types": {
      "documentAdditionOrUpdate": 1
    },
    "indexUids": {
      "mieli": 1
    }
  },
  "duration": null,
  "startedAt": "2024-12-12T09:44:34.124726733Z",
  "finishedAt": null
}
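The percentage field appears to combine the nested steps: each step's progress counts as a fraction of one unit of the step above it. The Python sketch below reproduces the 37.986263 value from the example this way. It is a reconstruction from the sample numbers, not code taken from Meilisearch, and progress_percentage is a hypothetical name.

```python
def progress_percentage(steps: list) -> float:
    """Combine nested progress steps into an overall percentage (assumed formula)."""
    fraction = 0.0
    # Walk from the innermost step outwards: the inner step's progress is
    # added as a partial unit to the finished count of its parent step.
    for step in reversed(steps):
        fraction = (step["finished"] + fraction) / step["total"]
    return fraction * 100


steps = [
    {"currentStep": "processing tasks", "finished": 0, "total": 2},
    {"currentStep": "indexing", "finished": 2, "total": 3},
    {"currentStep": "extracting words", "finished": 3, "total": 13},
    {"currentStep": "document", "finished": 12300, "total": 19546},
]
print(round(progress_percentage(steps), 6))  # 37.986263
```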

Additionally, task objects now include a new field, batchUid. Use this field together with /batches/:uid to retrieve data on a specific batch.

{
  "uid": 154,
  "batchUid": 142,
  "indexUid": "movies_test2",
  "status": "succeeded",
  "type": "documentAdditionOrUpdate",
  "canceledBy": null,
  "details": {
    "receivedDocuments": 1,
    "indexedDocuments": 1
  },
  "error": null,
  "duration": "PT0.027766819S",
  "enqueuedAt": "2024-12-02T14:07:34.974430765Z",
  "startedAt": "2024-12-02T14:07:34.99021667Z",
  "finishedAt": "2024-12-02T14:07:35.017983489Z"
}

Done by @​irevoire in #​5060, #​5070, #​5080

Other improvements

  • New query parameter for GET /tasks: reverse. If reverse is set to true, tasks are returned in reverse order, from oldest to newest. Done by @​irevoire in #​5048
  • Phrase searches with showMatchesPosition set to true give a single location for the whole phrase, by @​flevi29 in #​4928
  • New Prometheus metrics by @​PedroTurik in #​5044
  • When a query finds matching terms in document fields with array values, Meilisearch now includes an indices field to _matchesPosition specifying which array elements contain the matches by @​LukasKalbertodt in #​5005
  • ⚠️ Breaking vectorStore change: field distribution no longer contains _vectors. Its value used to be incorrect, and there is no current use case for the fixed, most likely empty, value. Done as part of #​4900
  • Improve error message by adding index name in #​5056 by @​airycanon

Fixes 🐞

Misc

❤️ Thanks again to our external contributors:

v1.11.3: 🐿️


What's Changed

Full Changelog: meilisearch/meilisearch@v1.11.2...v1.11.3

v1.11.2: 🐿️


What's Changed

Full Changelog: meilisearch/meilisearch@v1.11.1...v1.11.2

v1.11.1: 🐿️


What's Changed

Full Changelog: meilisearch/meilisearch@v1.11.0...v1.11.1

v1.11.0: 🐿️


Meilisearch v1.11 introduces AI-powered search performance improvements thanks to binary quantization and various usage changes, all of which are steps towards a future stabilization of the feature. We have also improved federated search usage following user feedback.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

New features and updates 🔥

Experimental - AI-powered search improvements

This release is Meilisearch's first step towards stabilizing AI-powered search and introduces a few breaking changes to its API. Consult the PRD for full usage details.

Done by @​dureuill in #​4906, #​4920, #​4892, and #​4938.

⚠️ Breaking changes
  • When performing AI-powered searches, hybrid.embedder is now a mandatory parameter in GET and POST /indexes/{:indexUid}/search
  • As a consequence, it is now mandatory to pass hybrid even for pure semantic searches
  • embedder is now a mandatory parameter in GET and POST /indexes/{:indexUid}/similar
  • Meilisearch now ignores semanticRatio and performs a pure semantic search for queries that include vector but not q

Additions & improvements
  • The default model for OpenAI is now text-embedding-3-small instead of text-embedding-ada-002
  • This release introduces a new embedder option: documentTemplateMaxBytes. Meilisearch will truncate a document's template text when it goes over the specified limit
  • Fields in documentTemplate include a new field.is_searchable property. The default document template now filters out both empty fields and fields not in the searchable attributes list:

v1.11:

{% for field in fields %}
  {% if field.is_searchable and not field.value == nil %}
    {{ field.name }}: {{ field.value }}\n
  {% endif %}
{% endfor %}

v1.10:

{% for field in fields %}
  {{ field.name }}: {{ field.value }}\n
{% endfor %}

Embedders using the v1.10 document template will continue working as before. The new default document template will only work with newly created embedders.
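For readers less familiar with Liquid, the Python function below approximates what the v1.11 default template does with a document's fields. It is an illustration only: the dictionary keys mirror field.name, field.value, and field.is_searchable, None stands in for nil, and the real template's whitespace handling may differ. render_default_template is a hypothetical name.

```python
def render_default_template(fields: list) -> str:
    """Approximate the v1.11 default documentTemplate for a list of fields."""
    lines = []
    for field in fields:
        # Keep only searchable fields that have a value (None plays the role of nil).
        if field["is_searchable"] and field["value"] is not None:
            lines.append(f'{field["name"]}: {field["value"]}\n')
    return "".join(lines)


fields = [  # hypothetical document fields
    {"name": "title", "value": "Batman returns", "is_searchable": True},
    {"name": "poster", "value": "https://example.com/p.jpg", "is_searchable": False},
    {"name": "overview", "value": None, "is_searchable": True},
]
print(repr(render_default_template(fields)))  # 'title: Batman returns\n'
```

Under the v1.10 template, all three fields would have been rendered, including the poster URL and the empty overview.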

Vector database indexing performance improvements

v1.11 introduces a new embedder option, binaryQuantized:

curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "embedders": {
      "image2text": {
        "binaryQuantized": true
      }
    }
  }'

Enable binary quantization to convert embeddings of floating point numbers into embeddings of boolean values. This will negatively impact the relevancy of AI-powered searches, but significantly improves performance in large collections whose embeddings have more than 100 dimensions.

In our benchmarks, this reduced the size of the database by a factor of 10 and divided the indexing time by a factor of 6 with little impact on search times.

[!WARNING]
Enabling this feature will update all of your vectors to contain only 1s or -1s, significantly impacting relevancy.

You cannot revert this option once you enable it. Before setting binaryQuantized to true, Meilisearch recommends testing it in a smaller or duplicate index in a development environment.
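The idea behind binaryQuantized can be sketched in a few lines of Python: only the sign of each component is kept, which is why the operation cannot be reverted. This is a simplified illustration, not Meilisearch's implementation; mapping 0.0 to 1 is an assumption, and the size calculation assumes 32-bit floats and counts only the raw vector payload, not the rest of the database.

```python
def binary_quantize(embedding: list) -> list:
    """Keep only the sign of each component: 1 for >= 0 (assumed for 0.0), -1 otherwise."""
    return [1 if x >= 0 else -1 for x in embedding]


def packed_size_bytes(dimensions: int) -> int:
    """Bytes needed to store one bit per dimension, rounded up to whole bytes."""
    return (dimensions + 7) // 8


print(binary_quantize([0.12, -0.5, 0.0, -0.01]))  # [1, -1, 1, -1]
# A hypothetical 768-dimension vector: 4 bytes per float vs. one bit per dimension.
print(768 * 4, packed_size_bytes(768))  # 3072 96
```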

Done by @​irevoire in #​4941.

Federated search improvements

Facet distribution and stats for federated searches

This release adds two new federated search options, facetsByIndex and mergeFacets. These allow you to request a federated search for facet distributions and stats data.

Facet information by index

To obtain facet distribution and stats for each separate index, use facetsByIndex when querying the POST /multi-search endpoint:

POST /multi-search
{
  "federation": {
    "limit": 20,
    "offset": 0,
	"facetsByIndex": {
	  "movies": ["title", "id"],
	  "comics": ["title"],
	}
  },
  "queries": [
    {
      "q": "Batman",
      "indexUid": "movies"
    },
    {
      "q": "Batman",
      "indexUid": "comics"
    }
  ]
}

The multi-search response will include a new field, facetsByIndex, with facet data separated per index:

{
  "hits": [],

  "facetsByIndex": {
    "movies": {
      "distribution": {
        "title": {
          "Batman returns": 1
        },
        "id": {
          "42": 1
        }
      },
      "stats": {
        "id": {
          "min": 42,
          "max": 42
        }
      }
    }
  }
}

Merged facet information

To obtain facet distribution and stats for all indexes merged into a single result, use both facetsByIndex and mergeFacets when querying the POST /multi-search endpoint:

POST /multi-search
{
  "federation": {
    "limit": 20,
    "offset": 0,
    "facetsByIndex": {
      "movies": ["title", "id"],
      "comics": ["title"]
    },
    "mergeFacets": {
      "maxValuesPerFacet": 10
    }
  },
  "queries": [
    {
      "q": "Batman",
      "indexUid": "movies"
    },
    {
      "q": "Batman",
      "indexUid": "comics"
    }
  ]
}

The response includes two new fields, facetDistribution and facetStats:

{
  "hits": [],

  "facetDistribution": {
    "title": {
      "Batman returns": 1,
      "Batman: the killing joke":
    },
    "id": {
      "42": 1
    }
  },
  "facetStats": {
    "id": {
      "min": 42,
      "max": 42
    }
  }
}
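The release notes do not spell out how per-index facet data is combined, so the Python sketch below is only one plausible reading. It assumes that counts for the same facet value are summed across indexes, that stats are combined with min/max, and that truncation keeps values in alphabetical order. merge_facets and the input shape are hypothetical.

```python
def merge_facets(facets_by_index: dict, max_values_per_facet: int) -> dict:
    """Merge per-index facet data into a single facetDistribution and facetStats."""
    distribution = {}
    stats = {}
    for index_facets in facets_by_index.values():
        # Sum the counts of identical facet values across indexes.
        for facet, values in index_facets.get("distribution", {}).items():
            counts = distribution.setdefault(facet, {})
            for value, count in values.items():
                counts[value] = counts.get(value, 0) + count
        # Combine numeric stats by keeping the overall min and max.
        for facet, s in index_facets.get("stats", {}).items():
            if facet in stats:
                stats[facet] = {
                    "min": min(stats[facet]["min"], s["min"]),
                    "max": max(stats[facet]["max"], s["max"]),
                }
            else:
                stats[facet] = {"min": s["min"], "max": s["max"]}
    # Keep at most max_values_per_facet values per facet (alphabetical order is an assumption).
    truncated = {
        facet: dict(sorted(values.items())[:max_values_per_facet])
        for facet, values in distribution.items()
    }
    return {"facetDistribution": truncated, "facetStats": stats}


by_index = {  # hypothetical per-index facet data
    "movies": {
        "distribution": {"title": {"Batman returns": 1}, "id": {"42": 1}},
        "stats": {"id": {"min": 42, "max": 42}},
    },
    "comics": {"distribution": {"title": {"Batman: the killing joke": 1}}},
}
merged = merge_facets(by_index, max_values_per_facet=10)
print(merged["facetDistribution"]["title"])
# {'Batman returns': 1, 'Batman: the killing joke': 1}
```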

Done by @​dureuill in #​4929.

Experimental — New STARTS WITH filter operator

Enable the containsFilter experimental feature to use the STARTS WITH filter operator:

curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "containsFilter": true
  }'

Use the STARTS WITH operator when filtering:

curl \
  -X POST http://localhost:7700/indexes/movies/search \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "filter": "hero STARTS WITH spider"
  }'

🗣️ This is an experimental feature, and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

Done by @​Kerollmops in #​4939.

Other improvements

Fixes 🐞

  • ⚠️ When using federated search, query.facets was silently ignored at the query level. It now returns the appropriate error. Use federation.facetsByIndex instead if you want facets to be applied during federated search.
  • Prometheus /metrics now returns the route pattern instead of the actual route when reporting the total number of HTTP requests, by @​irevoire in #​4839
  • Facet values are now truncated at the end of the list when the number of facet values is larger than maxValuesPerFacet. Previously, setting maxValuesPerFacet to 2 could truncate ["blue", "red", "yellow"] to ["blue", "yellow"] instead of ["blue", "red"]. By @​dureuill in #​4929
  • Improve the task cancellation when vectors are used, by @​irevoire in #​4971
  • Swedish support: the characters å, ä, ö are no longer normalized to a and o. By @​ManyTheFish in #​4945
  • Update rhai to fix an internal error when updating documents with a function (experimental) by @​irevoire in #​4960
  • Fix the incorrect experimental search queue size by @​irevoire in #​4992
  • Do not send empty edit document by function by @​irevoire in #​5001
  • Display vectors when no custom vectors were ever provided by @​dureuill in #​5008

Misc

❤️ Thanks again to our external contributors:

v1.10.3: 🦩


Search improvements

This PR lets you configure two behaviors of the engine through experimental CLI flags:

Done by @​irevoire in https://github.com/meilisearch/meilisearch/pull/5000

Full Changelog: meilisearch/meilisearch@v1.10.2...v1.10.3

v1.10.2: 🦩


Fixes 🦋

Activate the Swedish tokenization pipeline

The Swedish tokenization pipeline was deactivated in previous versions; it is now activated when you specify the index language in the settings:

PATCH /indexes/:index-name/settings
{
  "localizedAttributes": [ { "locales": ["swe"], "attributePatterns": ["*"] } ]
}

Related PR: #​4949

v1.10.1: 🦩


Fixes 🦋

Better search handling under heavy loads

The following PRs should make Meilisearch behave better under heavy load:

Speed improvement 🐎

Document deletions and document deletions by filter can now be autobatched together, which should unclog the task queue for people who use these two operations heavily.
Meilisearch still cannot autobatch document deletions by filter with document additions, though.

Full Changelog: meilisearch/meilisearch@v1.10.0...v1.10.1

v1.10.0: 🦩


Meilisearch v1.10 introduces federated search. This innovative feature allows you to receive a single list of results for multi-search requests. v1.10 also includes a setting to manually define which language or languages are present in your documents, and two new experimental features: the CONTAINS filter operator and the ability to update a subset of your dataset with a function.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

New features and updates 🔥

Federated search

Use the new federation setting of the /multi-search route to return a single search result object:

curl \
  -X POST 'http://localhost:7700/multi-search' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "federation": {
      "offset": 5,
      "limit": 10
    },
    "queries": [
      {
        "q": "Batman",
        "indexUid": "movies"
      },
      {
        "q": "Batman",
        "indexUid": "comics"
      }
    ]
  }'

Response:

{
  "hits": [
    {
      "id": 42,
      "title": "Batman returns",
      "overview": "..",
      "_federation": {
        "indexUid": "movies",
        "queriesPosition": 0
      }
    },
    {
      "comicsId": "batman-killing-joke",
      "description": "..",
      "title": "Batman: the killing joke",
      "_federation": {
        "indexUid": "comics",
        "queriesPosition": 1
      }
    }
  ],
  "processingTimeMs": 0,
  "limit": 20,
  "offset": 0,
  "estimatedTotalHits": 2,
  "semanticHitCount": 0
}

When performing a federated search, Meilisearch merges the results coming from different sources in descending ranking score order.

If federation is empty ({}), Meilisearch sets offset and limit to 0 and 20 respectively.

If federation is null or missing, multi-search returns one list of search result objects for each index.

Federated results relevancy

When performing federated searches, use federationOptions in the request's queries array to configure the relevancy and the weight of each index:

curl \
  -X POST 'http://localhost:7700/multi-search' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "federation": {},
    "queries": [
      {
        "q": "apple red",
        "indexUid": "fruits",
        "filter": "BOOSTED = true",
        "showRankingScore": true,
        "federationOptions": {
          "weight": 3.0
        }
      },
      {
        "q": "apple red",
        "indexUid": "fruits",
        "showRankingScore": true
      }
    ]
  }'

federationOptions must be an object. It supports a single field, weight, which must be a positive floating-point number:

  • if weight < 1.0, results from this index are less likely to appear in the results
  • if weight > 1.0, results from this index are more likely to appear in the results
  • if not specified, weight defaults to 1.0
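The merge described above can be sketched in Python as follows. The sketch assumes that weight multiplies each hit's ranking score, that ties keep the order of the queries, and that offset and limit default to 0 and 20 as stated for an empty federation object. The actual merge algorithm (see the usage page) may differ in its details; federate and the ranking scores are hypothetical.

```python
def federate(queries: list, offset: int = 0, limit: int = 20) -> list:
    """Merge hits from several queries into one list ordered by weighted ranking score.

    Each query is a dict with a "hits" list of (document_id, ranking_score) pairs
    and an optional "weight" (defaults to 1.0).
    """
    merged = []
    for position, query in enumerate(queries):
        weight = query.get("weight", 1.0)
        if weight <= 0:
            raise ValueError("weight must be a positive number")
        for doc_id, score in query["hits"]:
            # Assumption: the weight multiplies the hit's ranking score.
            merged.append((doc_id, score * weight, position))
    # Descending score; Python's sort is stable, so ties keep query order.
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[offset:offset + limit]


queries = [  # hypothetical ranking scores
    {"hits": [("boosted-apple", 0.4)], "weight": 3.0},
    {"hits": [("plain-apple", 0.9), ("apple-pie", 0.7)]},
]
print([doc_id for doc_id, _, _ in federate(queries)])
# ['boosted-apple', 'plain-apple', 'apple-pie']
```

Even though "boosted-apple" has the lowest raw score, the 3.0 weight lifts it to the top, which is how the BOOSTED = true query in the example is promoted.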

📖 Consult the usage page for more information about the merge algorithm.

Done by @​dureuill in #​4769.

Experimental: CONTAINS filter operator

Enable the containsFilter experimental feature to use the CONTAINS filter operator:

curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "containsFilter": true
  }'

CONTAINS filters results containing partial matches to the specified string, similar to a SQL LIKE:

curl \
  -X POST http://localhost:7700/indexes/movies/search \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "q": "super hero",
    "filter": "synopsis CONTAINS spider"
  }'

🗣️ This is an experimental feature, and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

Done by @​irevoire in #​4804.

Language settings

Use the new localizedAttributes index setting and the locales search parameter to explicitly set the languages used in document fields and the search query itself. This is particularly useful for <=v1.9 users who have to occasionally resort to alternative Meilisearch images due to language auto-detect issues in Swedish and Japanese datasets.

Done by @​ManyTheFish in #​4819.

Set language during indexing with localizedAttributes

Use the newly introduced localizedAttributes setting to explicitly declare which languages correspond to which document fields:

curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "localizedAttributes": [
      {"locales": ["jpn"], "attributePatterns": ["*_ja"]},
      {"locales": ["eng"], "attributePatterns": ["*_en"]},
      {"locales": ["cmn"], "attributePatterns": ["*_zh"]},
      {"locales": ["fra", "ita"], "attributePatterns": ["latin.*"]},
      {"locales": [], "attributePatterns": ["*"]}
    ]
  }'

locales is a list of ISO-639-3 language codes to assign to a pattern. The currently supported languages are: epo, eng, rus, cmn, spa, por, ita, ben, fra, deu, ukr, kat, ara, hin, jpn, heb, yid, pol, amh, jav, kor, nob, dan, swe, fin, tur, nld, hun, ces, ell, bul, bel, mar, kan, ron, slv, hrv, srp, mkd, lit, lav, est, tam, vie, urd, tha, guj, uzb, pan, aze, ind, tel, pes, mal, ori, mya, nep, sin, khm, tuk, aka, zul, sna, afr, lat, slk, cat, tgl, hye.

attributePatterns is a list of patterns. Each pattern can start or end with a * to match one or several attributes.

If an attribute matches several rules, only the first rule in the list will be applied. If the locales list is empty, then Meilisearch is allowed to auto-detect any language in the matching attributes.

These rules are applied to the searchableAttributes, the filterableAttributes, and the sortableAttributes.
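The first-match rule can be sketched in Python using the settings from the example above. This is not Meilisearch's implementation: the sketch assumes a pattern is "*", "*suffix", "prefix*", "*infix*", or an exact attribute name, and it returns None when no rule matches. pattern_matches and locales_for are hypothetical names.

```python
from typing import List, Optional


def pattern_matches(pattern: str, attribute: str) -> bool:
    """Match an attribute name against a pattern that may start and/or end with '*'."""
    if pattern == "*":
        return True
    if pattern.startswith("*") and pattern.endswith("*"):
        return pattern[1:-1] in attribute
    if pattern.startswith("*"):
        return attribute.endswith(pattern[1:])
    if pattern.endswith("*"):
        return attribute.startswith(pattern[:-1])
    return attribute == pattern


def locales_for(attribute: str, rules: List[dict]) -> Optional[List[str]]:
    """Return the locales of the first matching rule, or None if no rule matches."""
    for rule in rules:
        if any(pattern_matches(p, attribute) for p in rule["attributePatterns"]):
            return rule["locales"]  # an empty list means "auto-detect any language"
    return None


rules = [
    {"locales": ["jpn"], "attributePatterns": ["*_ja"]},
    {"locales": ["eng"], "attributePatterns": ["*_en"]},
    {"locales": ["cmn"], "attributePatterns": ["*_zh"]},
    {"locales": ["fra", "ita"], "attributePatterns": ["latin.*"]},
    {"locales": [], "attributePatterns": ["*"]},
]
print(locales_for("title_ja", rules))     # ['jpn']
print(locales_for("latin.motto", rules))  # ['fra', 'ita']
print(locales_for("genre", rules))        # []
```

Because the catch-all "*" rule comes last, it only applies to attributes that no earlier rule claimed.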

Set language at search time with locales

The /search route accepts a new parameter, locales. Use it to define the language used in the current query:

curl \
  -X POST http://localhost:7700/indexes/movies/search \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "q": "進撃の巨人",
    "locales": ["jpn"]
  }'

The locales parameter overrides any locales set in the index settings.

Experimental: Edit documents with a Rhai function

Use a Rhai function to edit documents in your database directly from Meilisearch:

First, activate the experimental feature:

curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "editDocumentsByFunction": true
  }'

Then query the /documents/edit route with the editing function:

curl http://localhost:7700/indexes/movies/documents/edit \
  -H 'content-type: application/json' \
  -d '{
   "function": "doc.title = `✨ ${doc.title.to_upper()} ✨`",
   "filter": "id > 3000"
  }'

/documents/edit accepts three parameters in its payload: function, filter, and context.

function must be a string containing a Rhai function. filter must be a filter expression. context must be an object with data you want to make available to the editing function.

📖 More information here.

🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

Done by @​Kerollmops in #​4626.

Experimental AI-powered search: quality of life improvements

For the purpose of future stabilization of the feature, we are applying changes and quality-of-life improvements.

Done by @​dureuill in #​4801, #​4815, #​4818, #​4822.

⚠️ Breaking changes: Changing the parameters of the REST API

The old parameters of the REST API are too numerous and confusing.

Removed parameters: query, inputField, inputType, pathToEmbeddings and embeddingObject.
Replaced by:

  • request: A JSON value that represents the request made by Meilisearch to the remote embedder. The text to embed must be replaced by the placeholder value "{{text}}".
  • response: A JSON value that represents a fragment of the response sent by the remote embedder to Meilisearch. The embedding must be replaced by the placeholder value "{{embedding}}".

Comparison of the v1.10 (new) and v1.9 (old) configurations:

// v1.10 version ✅
{
  "source": "rest",
  "url": "https://localhost:10006",
  "request": {
    "model": "minillm",
    "prompt": "{{text}}"
  },
  "response": {
    "embedding": "{{embedding}}"
  }
}
// v1.9 version ❌
{
  "source": "rest",
  "url": "https://localhost:10006",
  "query": {
    "model": "minillm"
  },
  "inputField": ["prompt"],
  "inputType": "text",
  "embeddingObject": ["embedding"]
}
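The Python sketch below illustrates how the two placeholders in the v1.10 configuration work: "{{text}}" in the request template is replaced by the text to embed, and the position of "{{embedding}}" in the response template tells where to read the vector in the embedder's reply. It assumes a single text per request and ignores any other template features Meilisearch may support; fill_request and extract_embedding are hypothetical names, not Meilisearch functions.

```python
import json
from typing import Any


def fill_request(template: Any, text: str) -> Any:
    """Copy the request template, replacing every "{{text}}" string with the text to embed."""
    if template == "{{text}}":
        return text
    if isinstance(template, dict):
        return {key: fill_request(value, text) for key, value in template.items()}
    if isinstance(template, list):
        return [fill_request(value, text) for value in template]
    return template


def extract_embedding(template: Any, response: Any) -> Any:
    """Walk the response template and the actual response together until "{{embedding}}" is found."""
    if template == "{{embedding}}":
        return response
    if isinstance(template, dict) and isinstance(response, dict):
        for key, sub_template in template.items():
            if key in response:
                found = extract_embedding(sub_template, response[key])
                if found is not None:
                    return found
    if isinstance(template, list) and isinstance(response, list):
        for sub_template, item in zip(template, response):
            found = extract_embedding(sub_template, item)
            if found is not None:
                return found
    return None


request_template = {"model": "minillm", "prompt": "{{text}}"}
response_template = {"embedding": "{{embedding}}"}
print(json.dumps(fill_request(request_template, "hello")))
# {"model": "minillm", "prompt": "hello"}
print(extract_embedding(response_template, {"embedding": [0.1, 0.2]}))
# [0.1, 0.2]
```

This single pair of templates replaces the v1.9 combination of query, inputField, inputType, and embeddingObject.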

[!CAUTION]
This is a breaking change to the configuration of REST embedders.
Importing a dump containing a REST embedder configuration will fail in v1.10 with an error: "Error: unknown field query, expected one of source, model, revision, apiKey, dimensions, documentTemplate, url, request, response, distribution at line 1 column 752".

Upgrade procedure:

  1. Remove embedders with source "rest"
  2. Update your Meilisearch Cloud project or self-hosted Meilisearch instance as usual

Add custom headers to REST embedders

When the source of an embedder is set to rest, you may include an optional headers parameter. Use this to configure custom headers you want Meilisearch to include in the requests it sends the embedder.

Embedding requests sent from Meilisearch to a remote REST embedder always contain two headers:

  • Authorization: Bearer <apiKey> (only if apiKey was provided)
  • Content-Type: application/json

When provided, headers should be a JSON object whose keys represent the name of additional headers to send in requests, and the values represent the value of these additional headers.

If headers is missing or null for a rest embedder, only Authorization and Content-Type are sent, as described above.

If headers contains Authorization or Content-Type, the declared values override the ones that are sent by default.

Using the headers parameter for any other source besides rest results in an invalid_settings_embedder error.
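The header rules above can be summarized in a small Python sketch. It is illustrative only: embedder_headers and check_headers_allowed are hypothetical helpers, a null headers value is assumed to be accepted for every source, and header names are compared exactly as given even though HTTP header names are case-insensitive.

```python
from typing import Dict, Optional


def embedder_headers(api_key: Optional[str] = None,
                     headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build the headers sent to a REST embedder."""
    result = {"Content-Type": "application/json"}
    if api_key is not None:
        result["Authorization"] = f"Bearer {api_key}"
    # Custom headers are applied last, so they override the defaults.
    result.update(headers or {})
    return result


def check_headers_allowed(source: str, headers: Optional[Dict[str, str]]) -> None:
    """Reject headers for any source other than rest (null is treated as absent)."""
    if headers is not None and source != "rest":
        raise ValueError("invalid_settings_embedder: headers is only supported for rest embedders")


print(embedder_headers("KEY", {"X-Team": "search"}))
# {'Content-Type': 'application/json', 'Authorization': 'Bearer KEY', 'X-Team': 'search'}
```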

Other quality-of-life improvements

📖 More details here

  • Add url parameter to the OpenAI embedder. url should be a URL to the OpenAI embedding endpoint (including the v1/embeddings part). If url is missing or null for an openAi embedder, the default OpenAI embedding route is used (https://api.openai.com/v1/embeddings).
  • dimensions is now available as an optional parameter for ollama embedders. Previously it was only available for rest, openAi and userProvided embedders.
  • Previously _vectors.embedder was omitted for documents without at least one embedding for embedder. This was inconsistent and prevented the user from checking the value of regenerate.
  • When a request to a REST embedder fails, the duration of the exponential backoff is now randomized up to twice its base duration
  • Truncate rather than embed by chunk when OpenAI embeddings are bigger than the max number of tokens
  • Improve error message when indexing documents and embeddings are missing for a user-provided embedder
  • Improve error message when a model configuration cannot be loaded and its "architectures" field does not contain "BertModel"
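The randomized backoff mentioned in the list above can be sketched in Python. The exact formula is not given here, so this sketch assumes the base delay doubles on each attempt and that the jitter is drawn uniformly between the base delay and twice the base delay; backoff_delay and its defaults are hypothetical.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base_seconds: float = 1.0,
                  rng: Optional[random.Random] = None) -> float:
    """Delay before retry number `attempt` (0-based), in seconds."""
    rng = rng or random.Random()
    # Exponential part: base_seconds, 2 * base_seconds, 4 * base_seconds, ...
    base = base_seconds * (2 ** attempt)
    # Jitter: a random duration between the base delay and twice the base delay.
    return base * rng.uniform(1.0, 2.0)


rng = random.Random(42)
for attempt in range(4):
    print(attempt, round(backoff_delay(attempt, 0.5, rng), 3))
```

Randomizing the delay keeps many failed requests from being retried against the embedder at exactly the same moment.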

⚠️ Important change regarding the minimal Ubuntu version compatible with Meilisearch

Because the GitHub Actions runner now enforces a Node version that is no longer compatible with Ubuntu 18.04, we had to raise the minimum Ubuntu version compatible with Meilisearch: we use these GitHub Actions to build and provide our binaries.

Meilisearch is now only compatible with Ubuntu 20.04 and later, and no longer with Ubuntu 18.04.

Done by @​curquiza in #​4783.

Other improvements

Fixes 🐞

Misc

❤️ Thanks again to our external contributors:

v1.9.1: 🦎


Fixes 🪲

This fixes an issue where dumps created for indexes with:

  1. A user-provided embedder
  2. At least one document that opts out of vectors for that user-provided embedder

would fail to import correctly.

Upgrade path to v1.10.0 🚀

If you are a Cloud user affected by the above issue, please contact customer support so we can perform the upgrade for you.

If you are an OSS user affected by the above, perform the following operations:

  1. Upgrade from v1.9.0 to v1.9.1 without using a dump
  2. Upgrade to v1.10.0 using a dump created from v1.9.1

Full Changelog

v1.9.0: 🦎


Meilisearch v1.9 includes performance improvements for hybrid search and the addition/updating of settings. This version benefits from multiple requested features, such as the new frequency matching strategy and the ability to retrieve similar documents.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

New features and updates 🔥

Hybrid search updates

This release introduces multiple hybrid search updates.

Done by @​dureuill and @​irevoire in #​4633 and #​4649

⚠️ Breaking change: Empty _vectors.embedder arrays

Empty _vectors.embedder arrays are now interpreted as having no vector embedding.

Before v1.9, Meilisearch interpreted these as a single embedding of dimension 0. This change follows user feedback that the previous behavior was unexpected and unhelpful.

⚠️ Breaking change: _vectors field no longer present in search results

When the experimental vectorStore feature is enabled, Meilisearch no longer includes _vectors in returned search results by default. This will considerably improve performance.

Use the new retrieveVectors search parameter to display the _vectors field:

curl \
  -X POST 'http://localhost:7700/indexes/INDEX_NAME/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "q": "SEARCH QUERY",
    "retrieveVectors": true
  }'

⚠️ Breaking change: Meilisearch no longer preserves the exact representation of embeddings appearing in _vectors

To save storage and run faster, Meilisearch no longer stores your vectors "as-is". Meilisearch now returns floats in a canonicalized representation rather than the user-provided representation.

For example, 3 may be represented as 3.0.

Document _vectors accepts object values

The document _vectors field now accepts objects in addition to embedding arrays:

{
  "id": 42,
  "_vectors": {
    "default": [0.1, 0.2 ],
    "text": {
      "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
      "regenerate": false
    },
    "translation": {
      "embeddings": [0.1, 0.2, 0.3, 0.4],
      "regenerate": true
    }
  }
}

The _vectors object may contain two fields: embeddings and regenerate.

If present, embeddings will replace this document's embeddings.

regenerate must be either true or false. If regenerate: true, Meilisearch will overwrite the document embeddings each time the document is updated in the future. If regenerate: false, Meilisearch will keep the last provided or generated embeddings even if the document is updated in the future.

This change allows importing embeddings to autoembedders as a one-shot process, by setting them as regenerate: true. This change also ensures embeddings are not regenerated when importing a dump created with Meilisearch v1.9.

Meilisearch v1.9.0 also improves performance when indexing and using hybrid search, avoiding useless operations and optimizing the important ones.

New feature: Ranking score threshold

Use rankingScoreThreshold to exclude search results with low ranking scores:

curl \
 -X POST 'http://localhost:7700/indexes/movies/search' \
 -H 'Content-Type: application/json' \
 --data-binary '{
    "q": "Badman dark returns 1",
    "showRankingScore": true,
    "limit": 5,
    "rankingScoreThreshold": 0.2
 }'

Meilisearch does not return any documents below the configured threshold. Excluded results do not count towards estimatedTotalHits, totalHits, and facet distribution.

⚠️ For performance reasons, if the number of documents above rankingScoreThreshold is higher than limit, Meilisearch does not evaluate the ranking score of the remaining documents. Results ranking below the threshold are not immediately removed from the set of candidates. In this case, Meilisearch may overestimate the count of estimatedTotalHits, totalHits and facet distribution.

Done by @​dureuill in #​4666

New feature: Get similar documents endpoint

This release introduces a new AI-powered search feature allowing you to send a document to Meilisearch and receive a list of similar documents in return.

Use the /indexes/{indexUid}/similar endpoint to query Meilisearch for related documents:

curl \
  -X POST 'http://localhost:7700/indexes/INDEX_NAME/similar' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "id": "23",
    "offset": 0,
    "limit": 2,
    "filter": "release_date > 1521763199",
    "embedder": "default",
    "attributesToRetrieve": [],
    "showRankingScore": false,
    "showRankingScoreDetails": false
  }'
  • id: string indicating the document needing similar results, required
  • offset: number of results to skip when paginating, optional, defaults to 0
  • limit: number of results to display, optional, defaults to 20
  • filter: string with a filter expression Meilisearch should apply to the results, optional, defaults to null
  • embedder: string indicating the embedder Meilisearch should use to retrieve similar documents, optional, defaults to "default"
  • attributesToRetrieve: array of strings ind

Configuration

📅 Schedule: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot.

@cq-bot cq-bot requested review from a team and murarustefaan January 7, 2025 13:46
@cq-bot cq-bot added automerge Automatically merge once required checks pass area/plugin/destination/meilisearch labels Jan 7, 2025
@cq-bot cq-bot force-pushed the renovate/getmeili-meilisearch-1.x branch from d8ad73d to 27d7c13 Compare January 7, 2025 14:06
@erezrokah erezrokah changed the title fix(deps): Update getmeili/meilisearch Docker tag to v1.12.1 chore(deps): Update getmeili/meilisearch Docker tag to v1.12.1 Jan 7, 2025
@kodiakhq kodiakhq bot merged commit 0205de5 into main Jan 7, 2025
12 checks passed
@kodiakhq kodiakhq bot deleted the renovate/getmeili-meilisearch-1.x branch January 7, 2025 17:27
@cq-bot cq-bot changed the title chore(deps): Update getmeili/meilisearch Docker tag to v1.12.1 fix(deps): Update getmeili/meilisearch Docker tag to v1.12.1 Jan 7, 2025