Support ICU tokenizer #307

Closed
mausch opened this issue Mar 3, 2022 · 1 comment · Fixed by #309

Comments

@mausch
Contributor

mausch commented Mar 3, 2022

Hi and thank you for creating and maintaining these docker images!

On to the issue 🙂

Setting NOMINATIM_TOKENIZER=icu on image tag 4.0-d880386e3e7833363dab5b5b37fa72f6d65c9766 crashes the import with:

2022-03-03 15:39:05: Setting up tokenizer
.........................
Traceback (most recent call last):
  File "/usr/local/bin/nominatim", line 11, in <module>
    exit(cli.nominatim(module_dir='/usr/local/lib/nominatim/module',
  File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 235, in nominatim
    return parser.run(**kwargs)
  File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 96, in run
    return args.command.run(args)
  File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/setup.py", line 101, in run
    tokenizer = SetupAll._get_tokenizer(args.continue_at, args.config)
  File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/setup.py", line 171, in _get_tokenizer
    return tokenizer_factory.create_tokenizer(config)
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/factory.py", line 59, in create_tokenizer
    tokenizer.init_new_db(config, init_db=init_db)
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/icu_tokenizer.py", line 46, in init_new_db
    self.loader = ICURuleLoader(config)
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/icu_rule_loader.py", line 48, in __init__
    self._setup_analysis()
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/icu_rule_loader.py", line 128, in _setup_analysis
    self.analysis[name] = TokenAnalyzerRule(section, self.normalization_rules)
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/icu_rule_loader.py", line 156, in __init__
    analysis_mod = importlib.import_module(module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/token_analysis/generic.py", line 9, in <module>
    import datrie
ModuleNotFoundError: No module named 'datrie'

The ICU tokenizer was introduced in Nominatim 4.0.0: https://nominatim.org/2021/11/03/release-40.html

As far as I understand, this new tokenizer makes the custom PostgreSQL module obsolete, which means Nominatim can be deployed on managed PostgreSQL instances such as AWS RDS.

Maybe the python3-datrie package just needs to be added to the Dockerfile around here?

python3-icu git \
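To confirm that a rebuilt image actually fixes this, one option is to check that the tokenizer's Python imports resolve before running the import. The script below is a minimal sketch, not part of the image or of Nominatim. The module list is an assumption: `datrie` comes from the traceback (`token_analysis/generic.py`, line 9), and `icu` is the module provided by the python3-icu (PyICU) package. As noted below, other packages may also be missing, so extend the list as needed. It only uses the standard library's `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found.

```python
import importlib.util

# Assumed list of modules the ICU tokenizer needs at import time:
# "datrie" is taken from the traceback, "icu" is provided by python3-icu (PyICU).
# This list may be incomplete.
REQUIRED_MODULES = ["datrie", "icu"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names):
    """Build a one-line summary of which modules are missing, if any."""
    missing = missing_modules(names)
    if missing:
        return "Missing Python modules: " + ", ".join(missing)
    return "All required Python modules are importable."


if __name__ == "__main__":
    print(report(REQUIRED_MODULES))
```

Run it with the container's `python3` (the traceback shows Python 3.8). On the current image it should report `datrie` as missing; after adding python3-datrie it should not. To fail a Docker build step instead of just printing, the script could exit non-zero when `missing_modules` returns a non-empty list.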

@leonardehrenfried
Collaborator

Yes, I think you need to add python3-datrie and perhaps a few other packages.

PRs on this are very welcome!
