From a35e814abdfe433099716e8f6bbb73599c3c973a Mon Sep 17 00:00:00 2001
From: Kai Welke
Date: Tue, 30 Jul 2024 09:01:28 +0200
Subject: [PATCH] fix(specs): clarify decompounding limitations (#3227)

---
 specs/common/schemas/IndexSettings.yml | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/specs/common/schemas/IndexSettings.yml b/specs/common/schemas/IndexSettings.yml
index a95f3b37ee..7d7a018d8e 100644
--- a/specs/common/schemas/IndexSettings.yml
+++ b/specs/common/schemas/IndexSettings.yml
@@ -144,6 +144,8 @@ baseIndexSettings:
         You can specify different lists for different languages.
         Decompounding is supported for these languages:
         Dutch (`nl`), German (`de`), Finnish (`fi`), Danish (`da`), Swedish (`sv`), and Norwegian (`no`).
+        Decompounding doesn't work for words with [non-spacing mark Unicode characters](https://www.charactercodes.net/category/non-spacing_mark).
+        For example, `Gartenstühle` won't be decompounded if the `ü` consists of `u` (U+0075) and `◌̈` (U+0308).
       default: {}
       x-categories:
         - Languages
@@ -527,10 +529,12 @@ indexSettingsAsSearchParams:
     decompoundQuery:
       type: boolean
       description: |
-        Whether to split compound words into their building blocks.
+        Whether to split compound words in the query into their building blocks.
         For more information, see [Word segmentation](https://www.algolia.com/doc/guides/managing-results/optimize-search-results/handling-natural-languages-nlp/in-depth/language-specific-configurations/#splitting-compound-words).
         Word segmentation is supported for these languages:
         German, Dutch, Finnish, Swedish, and Norwegian.
+        Decompounding doesn't work for words with [non-spacing mark Unicode characters](https://www.charactercodes.net/category/non-spacing_mark).
+        For example, `Gartenstühle` won't be decompounded if the `ü` consists of `u` (U+0075) and `◌̈` (U+0308).
       default: true
       x-categories:
         - Languages
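
Note (not part of the patch): the limitation described in the added lines can be illustrated with Python's standard `unicodedata` module. This is a minimal sketch that detects non-spacing marks (Unicode category `Mn`) and composes them with NFC normalization. The helper names `has_nonspacing_marks` and `to_composed` are hypothetical, and whether normalizing text before indexing or querying restores decompounding in every case is an assumption, not something the patch states.

```python
import unicodedata


def has_nonspacing_marks(text: str) -> bool:
    """Return True if any character is in Unicode category Mn (non-spacing mark)."""
    return any(unicodedata.category(ch) == "Mn" for ch in text)


def to_composed(text: str) -> str:
    """Normalize to NFC so base letter + combining mark pairs compose where possible."""
    return unicodedata.normalize("NFC", text)


# `u` (U+0075) followed by the combining diaeresis (U+0308), as in the patch's example.
decomposed = "Gartenstu\u0308hle"

# After NFC normalization the pair becomes the single code point U+00FC.
composed = to_composed(decomposed)
```

Characters that have no precomposed form keep their combining marks after NFC, so the check with `has_nonspacing_marks` is still worth running on the normalized text.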