gh-130587: Add hand-written docs for non-OP tokens #130588

Open
wants to merge 13 commits into base: main

Conversation

@encukou (Member) commented Feb 26, 2025

  • Add hand-written docs for non-OP tokens

  • Make the automation (generate_token.py) check that the hand-written docs are present, and only generate docs for the OP tokens

  • Switch to list-table for the OP tokens, to make their docs more compact

  • Add ENDMARKER to the grammar docs where it appears (top-level components)

  • Add forgotten versionchanged entry for EXCLAMATION

  • Remove docs for NT_OFFSET


📚 Documentation preview 📚: https://cpython-previews--130588.org.readthedocs.build/

@encukou (Member, Author) commented Feb 26, 2025

@lysnikolaou, does this look reasonable to you?

Comment on lines +66 to +67
A generic token value returned by the :mod:`tokenize` module for
:ref:`operators <operators>` and :ref:`delimiters <delimiters>`.
Member

Suggested change:

    - A generic token value returned by the :mod:`tokenize` module for
    - :ref:`operators <operators>` and :ref:`delimiters <delimiters>`.
    + A generic token value that indicates an
    + :ref:`operator <operators>` or :ref:`delimiter <delimiters>`.

Member

Let's include the information that this is only returned by the tokenize module. It's not used internally in the tokenizer at all. Maybe add it as an implementation detail?

Member Author

The tokenize docs have some more information; how much should be repeated here?

Member

Maybe add an .. impl-detail:: like you did for ENCODING?
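
(For illustration only, not part of the PR: a quick sketch of how the tokenize module reports operators and delimiters with the generic OP type, with the specific token available separately via exact_type.)

    import io
    import tokenize

    # Operators and delimiters come back with the generic OP type;
    # the specific token is exposed separately as exact_type.
    for tok in tokenize.generate_tokens(io.StringIO("1 + 2\n").readline):
        if tok.type == tokenize.OP:
            print(tok.string, tokenize.tok_name[tok.exact_type])  # "+ PLUS"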

Comment on lines 149 to 150
Such tokens are produced instead of regular :data:`COMMENT` tokens only when
:func:`ast.parse` is invoked with ``type_comments=True``.
Member

Does compile() with the AST flag not produce them?

Member

I think it does.
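
(An aside, not from the PR diff: a minimal sketch showing that type comments are retained both by ast.parse() with type_comments=True and by compile() with the type-comments flag; the docs wording is what is under discussion above.)

    import ast

    source = "x = []  # type: list[int]\n"

    # ast.parse() with type_comments=True keeps the comment on the AST node
    tree = ast.parse(source, type_comments=True)
    print(tree.body[0].type_comment)  # list[int]

    # compile() with the type-comments flag (plus PyCF_ONLY_AST) behaves the same
    flags = ast.PyCF_ONLY_AST | ast.PyCF_TYPE_COMMENTS
    tree2 = compile(source, "<string>", "exec", flags=flags)
    print(tree2.body[0].type_comment)  # list[int]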

@@ -69,7 +69,7 @@ All input read from non-interactive files has the same form:
.. grammar-snippet::
:group: python-grammar

-   file_input: (NEWLINE | `statement`)*
+   file_input: (NEWLINE | `statement`)* ENDMARKER
Member

I'm not sure how these changes are related to this PR?

Member Author

The ENDMARKER docs now refer here, and it would be confusing if the marker were missing.

The actual grammar does use ENDMARKER.
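
(For reference, not part of the diff: the token stream itself always ends with ENDMARKER, which is what the grammar rule above now shows. A minimal check using the standard tokenize module:)

    import io
    import tokenize

    tokens = list(tokenize.generate_tokens(io.StringIO("pass\n").readline))
    # The stream for any complete input ends with an ENDMARKER token
    print(tokenize.tok_name[tokens[-1].type])  # ENDMARKER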

Comment on lines +236 to +240
tokendef_re = re.compile(r'.. data:: (\w+)')
for line in fileobj:
    if match := tokendef_re.fullmatch(line.strip()):
        if match[1].isupper():
            has_handwritten_doc.add(match[1])
@AA-Turner (Member) commented Feb 26, 2025

I think we can avoid re:

Suggested change:

    - tokendef_re = re.compile(r'.. data:: (\w+)')
    - for line in fileobj:
    -     if match := tokendef_re.fullmatch(line.strip()):
    -         if match[1].isupper():
    -             has_handwritten_doc.add(match[1])
    + for line in fileobj:
    +     if not line.startswith('.. data:: '):
    +         continue
    +     tok_name = line.removeprefix('.. data:: ').rstrip()
    +     if tok_name.isidentifier() and tok_name.isupper():
    +         has_handwritten_doc.add(tok_name)

Member Author

Yes, but at the cost of duplicating the prefix. Is that a good tradeoff?

Member

Perhaps, though clarity may be lost in removing duplication:

Suggested change:

    - tokendef_re = re.compile(r'.. data:: (\w+)')
    - for line in fileobj:
    -     if match := tokendef_re.fullmatch(line.strip()):
    -         if match[1].isupper():
    -             has_handwritten_doc.add(match[1])
    + for line in map(str.rstrip, fileobj):
    +     name = line.removeprefix('.. data:: ')
    +     # Token names must be uppercase and made up of alphanumerics or '_'
    +     if line != name and name.isidentifier() and name.isupper():
    +         has_handwritten_doc.add(name)
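
(A small side-by-side sketch, not part of the PR, using a hypothetical input line to show that the regex variant and the string-method variant pick out the same token name:)

    import re

    line = ".. data:: NEWLINE\n"  # hypothetical doc line

    # Regex variant from the PR
    tokendef_re = re.compile(r'.. data:: (\w+)')
    match = tokendef_re.fullmatch(line.strip())
    print(match[1])  # NEWLINE

    # String-method variant from the review suggestion
    name = line.rstrip().removeprefix('.. data:: ')
    print(name)  # NEWLINE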

@lysnikolaou (Member) left a comment

This looks like a good improvement to me! Thanks @encukou!

I've left some inline comments regarding some specifics in the docs.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
@encukou (Member, Author) commented Feb 27, 2025

Thank you for the reviews! I addressed some; I'll continue next week.

Labels: awaiting core review, docs (Documentation in the Doc dir), skip news
Projects: Status: Todo
4 participants