Doc: Use the Snowball project's PyStemmer library #125264

Closed
AA-Turner wants to merge 4 commits


AA-Turner
Member

@AA-Turner AA-Turner commented Oct 10, 2024

PyStemmer exposes bindings to libstemmer_c,
the core Snowball library written in C.
This can improve performance of word stemming.

Sphinx will use PyStemmer if installed, but it isn't a requirement -- downstream redistributors can remove the package, for example.

A

xref:


📚 Documentation preview 📚: https://cpython-previews--125264.org.readthedocs.build/
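
The optional-dependency behaviour described above (use PyStemmer when it is installed, carry on without it otherwise) can be sketched roughly as follows. This is an illustration only, not Sphinx's actual code: `make_stemmer` is a hypothetical helper, and the identity fallback is an assumption made so the sketch runs with neither package installed.

```python
import importlib


def make_stemmer(language="english"):
    """Return a word -> stem callable, preferring the C-backed PyStemmer.

    Hypothetical helper: tries PyStemmer (module ``Stemmer``) first, then
    the pure-Python ``snowballstemmer`` package, then gives up and returns
    an identity function.
    """
    candidates = (
        ("Stemmer", "Stemmer"),          # PyStemmer: Stemmer.Stemmer(lang)
        ("snowballstemmer", "stemmer"),  # snowballstemmer.stemmer(lang)
    )
    for module_name, factory_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not installed; try the next option
        return getattr(module, factory_name)(language).stemWord
    # Assumption for this sketch: with no stemmer available, leave words as-is.
    return lambda word: word


stem = make_stemmer("english")
print(stem("running"))
```

Because the import is attempted at call time, a redistributor who removes PyStemmer only loses the speed-up, not the functionality.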

@hugovk
Member

hugovk commented Oct 10, 2024

What was the doctest failure? The CI logs don't give much detail:

make: Entering directory '/home/runner/work/cpython/cpython/Doc'
make[1]: Entering directory '/home/runner/work/cpython/cpython/Doc'
mkdir -p build

Missing the required blurb or sphinx-build tools.
Please run 'make venv' to install local copies.

make[1]: *** [Makefile:55: build] Error 1
make[1]: Leaving directory '/home/runner/work/cpython/cpython/Doc'
Testing of doctests in the sources finished, look at the results in build/doctest/output.txt
make: *** [Makefile:134: doctest] Error 1
make: Leaving directory '/home/runner/work/cpython/cpython/Doc'
Error: Process completed with exit code 2.

@AA-Turner
Member Author

AA-Turner commented Oct 10, 2024

It's the previous step:

Run make -C Doc/ PYTHON=../python venv
make: Entering directory '/home/runner/work/cpython/cpython/Doc'
Creating venv in ./venv
[... snip ...]
Building wheels for collected packages: PyStemmer
  Building wheel for PyStemmer (pyproject.toml): started
  Building wheel for PyStemmer (pyproject.toml): finished with status 'error'
  error: subprocess-exited-with-error
  
  × Building wheel for PyStemmer (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [180 lines of output]
      running bdist_wheel
      running build
      running build_ext
      /tmp/pip-build-env-n1d6aaaa/normal/lib/python3.14/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-a6agkwmr/pystemmer_2ef001f4a4fc43f28c3ad72b5862f741/src/Stemmer.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
                      self.__purgeCache()
              def __get__(self):
                  return self.max_cache_size
      
          def __purgeCache (self):
              if len(self.cache) < self.max_cache_size:
                         ^
      ------------------------------------------------------------
      
      src/Stemmer.pyx:152:19: Compiler crash in AnalyseExpressionsTransform
      
      ModuleNode.body = StatListNode(Stemmer.pyx:25:0)
      StatListNode.stats[8] = CClassDefNode(Stemmer.pyx:81:5,
          as_name = 'Stemmer',
          class_name = 'Stemmer',
          doc = 'An instance of a stemming algorithm.\n\n    The algorithm has internal state, so must not be called concurrently.\n    ie, only a single thread should access the instance at any given time.\n\n    When creating a `Stemmer` object, there is one required argument: the\n    name of the algorithm to use in the new stemmer.  A list of the valid\n    algorithm names may be obtained by calling the `algorithms()` function\n    in this module.  In addition, the appropriate stemming algorithm for a\n    given language may be obtained by using the 2 or 3 letter ISO 639\n    language codes.\n\n    A second optional argument to the constructor for `Stemmer` is the size\n    of cache to use.  The cache implemented in this module is not terribly\n    efficient, but benchmarks show that it approximately doubles\n    performance for typical text processing operations, without too much\n    memory overhead.  The cache may be disabled by passing a size of 0.\n    The default size (10000 words) is probably appropriate in most\n    situations.  In pathological cases (for example, when no word is\n    presented to the stemming algorithm more than once, so the cache is\n    useless), the cache can severely damage performance.\n\n    The "benchmark.py" script supplied with the PyStemmer distribution can\n    be used to test the performance of the stemming algorithms with various\n    cache sizes.\n\n    ',
          module_name = '',
          punycode_class_name = 'Stemmer',
          visibility = 'private')
      CClassDefNode.body = StatListNode(Stemmer.pyx:82:4)
      StatListNode.stats[3] = DefNode(Stemmer.pyx:151:4,
          is_cyfunction = True,
          modifiers = [...]/0,
          name = '__purgeCache',
          np_args_idx = [...]/0,
          num_required_args = 1,
          outer_attrs = [...]/2,
          py_wrapper_required = True,
          reqd_kw_flags_cname = '0',
          used = True)
      File 'Nodes.py', line 397, in analyse_expressions: StatListNode(Stemmer.pyx:152:8)
      File 'Nodes.py', line 7176, in analyse_expressions: IfStatNode(Stemmer.pyx:152:8)
      File 'Nodes.py', line 7222, in analyse_expressions: IfClauseNode(Stemmer.pyx:152:11,
          is_terminator = True)
      File 'ExprNodes.py', line 663, in analyse_temp_boolean_expression: PrimaryCmpNode(Stemmer.pyx:152:27,
          operator = '<',
          result_is_used = True,
          use_managed_ref = True)
      File 'ExprNodes.py', line 13407, in analyse_types: PrimaryCmpNode(Stemmer.pyx:152:27,
          operator = '<',
          result_is_used = True,
          use_managed_ref = True)
      File 'ExprNodes.py', line 6139, in analyse_types: SimpleCallNode(Stemmer.pyx:152:14,
          analysed = True,
          result_is_used = True,
          use_managed_ref = True)
      File 'ExprNodes.py', line 6257, in analyse_c_function_call: SimpleCallNode(Stemmer.pyx:152:14,
          analysed = True,
          result_is_used = True,
          use_managed_ref = True)
      File 'ExprNodes.py', line 7338, in coerce_to: AttributeNode(Stemmer.pyx:152:19,
          attribute = 'cache',
          is_attribute = 1,
          needs_none_check = True,
          result_is_used = True,
          use_managed_ref = True)
      File 'ExprNodes.py', line 983, in coerce_to: AttributeNode(Stemmer.pyx:152:19,
          attribute = 'cache',
          is_attribute = 1,
          needs_none_check = True,
          result_is_used = True,
          use_managed_ref = True)
      
      Compiler crash traceback from this point on:
        File "/tmp/pip-build-env-n1d6aaaa/normal/lib/python3.14/site-packages/Cython/Compiler/ExprNodes.py", line 983, in coerce_to
          if src_type.is_cv_qualified:
             ^^^^^^^^^^^^^^^^^^^^^^^^
      AttributeError: 'NoneType' object has no attribute 'is_cv_qualified'
      Compiling src/Stemmer.pyx because it changed.
      [1/1] Cythonizing src/Stemmer.pyx
      Traceback (most recent call last):
        File "/home/runner/work/cpython/cpython/Doc/venv/lib/python3.14/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
          ~~~~^^
        File "/home/runner/work/cpython/cpython/Doc/venv/lib/python3.14/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ~~~~^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/runner/work/cpython/cpython/Doc/venv/lib/python3.14/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                              metadata_directory)
                                              ^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-n1d6aaaa/overlay/lib/python3.14/site-packages/setuptools/build_meta.py", line 421, in build_wheel
          return self._build_with_temp_dir(
                 ~~~~~~~~~~~~~~~~~~~~~~~~~^
              ['bdist_wheel'],
              ^^^^^^^^^^^^^^^^
          ...<3 lines>...
          super().run_command(command)
          ~~~~~~~~~~~~~~~~~~~^^^^^^^^^
        File "/tmp/pip-build-env-n1d6aaaa/overlay/lib/python3.14/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
          ~~~~~~~~~~~^^
        File "/tmp/pip-build-env-n1d6aaaa/overlay/lib/python3.14/site-packages/setuptools/command/build_ext.py", line 98, in run
          _build_ext.run(self)
          ~~~~~~~~~~~~~~^^^^^^
        File "/tmp/pip-build-env-n1d6aaaa/overlay/lib/python3.14/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
          self.build_extensions()
          ~~~~~~~~~~~~~~~~~~~~~^^
        File "/tmp/pip-build-env-n1d6aaaa/overlay/lib/python3.14/site-packages/setuptools/_distutils/command/build_ext.py", line 476, in build_extensions
          self._build_extensions_serial()
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
        File "/tmp/pip-build-env-n1d6aaaa/overlay/lib/python3.14/site-packages/setuptools/_distutils/command/build_ext.py", line 502, in _build_extensions_serial
          self.build_extension(ext)
          ~~~~~~~~~~~~~~~~~~~~^^^^^
        File "/tmp/pip-build-env-n1d6aaaa/overlay/lib/python3.14/site-packages/setuptools/command/build_ext.py", line 263, in build_extension
          _build_ext.build_extension(self, ext)
          ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
        File "/tmp/pip-build-env-n1d6aaaa/normal/lib/python3.14/site-packages/Cython/Distutils/build_ext.py", line 130, in build_extension
          new_ext = cythonize(
                    ~~~~~~~~~^
              ext,force=self.force, quiet=self.verbose == 0, **options
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          )[0]
          ^
        File "/tmp/pip-build-env-n1d6aaaa/normal/lib/python3.14/site-packages/Cython/Build/Dependencies.py", line 1154, in cythonize
          cythonize_one(*args)
          ~~~~~~~~~~~~~^^^^^^^
        File "/tmp/pip-build-env-n1d6aaaa/normal/lib/python3.14/site-packages/Cython/Build/Dependencies.py", line 1321, in cythonize_one
          raise CompileError(None, pyx_file)
      Cython.Compiler.Errors.CompileError: src/Stemmer.pyx
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for PyStemmer
Failed to build PyStemmer
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (PyStemmer)
The venv has been created in the ./venv directory
make: Leaving directory '/home/runner/work/cpython/cpython/Doc'

Odd it doesn't report a failure, but I'm no expert on Make.

P.S. the error makes sense as we're building a pre-release Python with a Cython package, we can't expect support yet.

A

@hugovk
Member

hugovk commented Oct 10, 2024

> Odd it doesn't report a failure, but I'm no expert on Make.

I'm guessing it's because the venv target is really one big if statement: the if itself succeeds, so the failing subcommand's exit status isn't propagated up.
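
A minimal sketch of that guess, assuming a POSIX `sh` is on the PATH. The two one-line scripts are hypothetical stand-ins for the recipe, not the actual contents of Doc/Makefile: `false` plays the part of the failing pip install, and `exit_status` is a helper made up for the illustration.

```python
import subprocess


def exit_status(script: str) -> int:
    """Run a script with POSIX sh and return its exit status."""
    return subprocess.run(["sh", "-c", script], capture_output=True).returncode


# Stand-in for the recipe: `false` plays the failing pip install.
# The trailing echo succeeds, so the whole `if` reports success.
swallowed = 'if true; then false; echo "venv created"; fi'

# Chaining with && makes the failure the last status the `if` sees.
propagated = 'if true; then false && echo "venv created"; fi'

print(exit_status(swallowed))   # 0: the failure is hidden
print(exit_status(propagated))  # 1: the failure reaches the caller
```

An `if` compound command returns the status of the last command it ran, which is why a later successful command can mask an earlier failure.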

> P.S. the error makes sense as we're building a pre-release Python with a Cython package, we can't expect support yet.

Yeah, fair enough. Might we run into this locally as well? Can we mark PyStemmer as --only-binary in requirements.txt instead of only removing it on CI?

@AA-Turner
Member Author

> Can we mark PyStemmer as --only-binary in requirements.txt instead of only removing it on CI?

From what I can find, --only-binary is only applicable to the whole build, rather than per-package in requirements.txt.

I'll try adjusting the makefile to use --only-binary PyStemmer and see what happens.

A

@AA-Turner
Member Author

make: Entering directory '/home/runner/work/cpython/cpython/Doc'
Creating venv in ./venv
[...snip....]
ERROR: Could not find a version that satisfies the requirement PyStemmer~=2.2.0 (from versions: none)
ERROR: No matching distribution found for PyStemmer~=2.2.0

This reverts commit 551edb0.
@hugovk
Member

hugovk commented Oct 10, 2024

Ah right, it mentions the failure in the help. That's a shame.

  --only-binary <format_control>
                              Do not use source packages. Can be supplied
                              multiple times, and each time adds to the
                              existing value. Accepts either ":all:" to
                              disable all source packages, ":none:" to
                              empty the set, or one or more package names
                              with commas between them. Packages without
                              binary distributions will fail to install
                              when this option is used on them.
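
For reference, the per-package form described in that help text looks like this when assembled as a command line. `pip_install_command` is a hypothetical helper and the requirements path is a placeholder; the command is only built here, not run. As the output earlier in the thread shows, pip stops with "No matching distribution found" when the named package has no compatible wheel.

```python
import sys


def pip_install_command(requirements, only_binary=()):
    """Build (but do not run) a pip install command line.

    Hypothetical helper: ``only_binary`` is a sequence of package names
    that pip must install from wheels, never from source distributions.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if only_binary:
        # pip accepts one or more package names separated by commas.
        cmd += ["--only-binary", ",".join(only_binary)]
    cmd += ["-r", requirements]
    return cmd


print(pip_install_command("requirements.txt", ["PyStemmer"]))
```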

I think python/docsbuild-scripts#217 might be the better approach?

@hugovk
Member

hugovk commented Oct 24, 2024

python/docsbuild-scripts#217 has been merged, shall we close this?

@AA-Turner
Member Author

I'd like to return to this in the future to have it upstream (especially if we move to RtD), but for now yes it can be closed.

A

Labels
awaiting core review, docs (Documentation in the Doc dir), needs backport to 3.12 (bug and security fixes), needs backport to 3.13 (bugs and security fixes), skip issue, skip news