Include Haddocks for exported (low-level) bindings #26

Open
3 tasks
edsko opened this issue Aug 1, 2024 · 20 comments
Labels
confirmed-use-case Features for which there are client-confirmed use cases

Comments

@edsko
Collaborator

edsko commented Aug 1, 2024

When we generate foreign imports from C headers, if the declarations in those headers have documentation attached, then we should include that documentation in the generated bindings also. Ideally we would be using Haddock formatting here, but given the absence of a general agreed standard for the documentation of C headers, that might be hard to do in general. Nonetheless, we could at least support common formats.

@edsko edsko added this to the 2: Low-level API milestone Aug 1, 2024
@edsko edsko added the confirmed-use-case Features for which there are client-confirmed use cases label Aug 3, 2024
TravisCardwell added a commit that referenced this issue Oct 7, 2024
TravisCardwell added a commit that referenced this issue Oct 8, 2024
TravisCardwell added a commit that referenced this issue Oct 8, 2024
TravisCardwell added a commit that referenced this issue Oct 8, 2024
TravisCardwell added a commit that referenced this issue Oct 8, 2024
All `libclang` parameter names are capitalized.

The header documentation is updated.
@TravisCardwell
Collaborator

I defined the wrappers in doxygen_wrappers.h as well as the corresponding Haskell API in HsBindgen.Clang.Doxygen and the necessary enumerations in HsBindgen.Clang.Doxygen.Enums, in the doxygen branch.

Of the content in clang-c/Documentation.h, the following is not (yet) implemented:

  • Struct CXComment attributes ASTNode and TranslationUnit are not implemented. I think that we will traverse the AST top-down, so I doubt we will need these.
  • Typedef CXAPISet, as well as related functions clang_createAPISet and clang_disposeAPISet, are not implemented. I do not think that we will need these for our purposes.
  • Functions clang_getSymbolGraphForUSR and clang_getSymbolGraphForCursor are not implemented. I do not think that we will need these for our purposes.

Please feel free to let me know if we will need something that I have not implemented, though I will no doubt find out when I proceed with implementation.

@TravisCardwell
Collaborator

I am currently implementing an ast-dump utility that walks the AST and dumps information so that I can see exactly how the AST is structured. The LLVM tool works fine, of course; I want to see details via our library.

To get going quickly, I started implementing it as an hs-bindgen-libclang test suite, like the tutorial. It could be useful in the long run, though. Perhaps it should be put in a separate package, so that more dependencies can be used. For example, it would probably be worthwhile to implement a CLI parser using optparse-applicative to provide options that select which information is displayed. I am being very verbose with documentation output since that is what I am working on, but such information would get in the way when working on something else.

BTW, output is currently formatted in Markdown.

I have not committed the code yet, but I should be able to do so tomorrow.

@TravisCardwell
Collaborator

I wonder if it is worth adding HsBindgen.Clang.Util.Classification predicates for the documentation API.

@edsko
Collaborator Author

edsko commented Oct 8, 2024

Would the existing show-clang-ast command help with this? Or do you need more info than it provides?

@TravisCardwell
Collaborator

> Would the existing show-clang-ast command help with this? Or do you need more info than it provides?

I hadn't seen that yet! Nice!

Currently, I am inspecting the parsed comments in detail.

(WIP) Example
* "S4"
    * cursor type kind: Right CXType_Record
        * spelling: "Record"
    * record: "struct S4"
    * extent start: (4,1)
    * extent end: (18,2)
    * comment: "/**\n * A struct with a Doxygen comment\n */"
        * brief: "A struct with a Doxygen comment"
        * kind: Right CXComment_FullComment
            * children: 1
                * kind: Right CXComment_Paragraph
                    * children: 1
                        * kind: Right CXComment_Text
                            * text: " A struct with a Doxygen comment"
* "a"
    * cursor type kind: Right CXType_Char_S
        * spelling: "Char_S"
    * extent start: (8,5)
    * extent end: (8,11)
    * comment: "/**\n     * A field preceded by a Doxygen comment\n     */"
        * brief: "A field preceded by a Doxygen comment"
        * kind: Right CXComment_FullComment
            * children: 1
                * kind: Right CXComment_Paragraph
                    * children: 1
                        * kind: Right CXComment_Text
                            * text: " A field preceded by a Doxygen comment"
* "b"
    * cursor type kind: Right CXType_Int
        * spelling: "Int"
    * extent start: (10,5)
    * extent end: (10,10)
    * comment: "/**< A field followed by a Doxygen comment */"
        * brief: "A field followed by a Doxygen comment"
        * kind: Right CXComment_FullComment
            * children: 1
                * kind: Right CXComment_Paragraph
                    * children: 1
                        * kind: Right CXComment_Text
                            * text: " A field followed by a Doxygen comment "
* "c"
    * cursor type kind: Right CXType_Float
        * spelling: "Float"
    * extent start: (17,5)
    * extent end: (17,12)
    * comment: "/**\n     * A field that refers to another field\n     *\n     * See also @ref S4::a\n     */"
        * brief: "A field that refers to another field"
        * kind: Right CXComment_FullComment
            * children: 2
                * kind: Right CXComment_Paragraph
                    * children: 1
                        * kind: Right CXComment_Text
                            * text: " A field that refers to another field"
                * kind: Right CXComment_Paragraph
                    * children: 3
                        * kind: Right CXComment_Text
                            * text: " See also "
                        * kind: Right CXComment_InlineCommand
                            * name: "ref"
                            * render kind: Right CXCommentInlineCommandRenderKind_Normal
                            * args: 1
                                * 0: "S4::a"
                        * kind: Right CXComment_Text
                            * whitespace: True
                            * text: "     "

TravisCardwell added a commit that referenced this issue Oct 11, 2024
TravisCardwell added a commit that referenced this issue Oct 11, 2024
This commit is a squashed version of a number of previous commits,
rebased to work with a recent redesign of `Fold`.

`FoldM` uses `MonadIO`, and I refactored the code to call `liftIO`
minimizing changes.  Now that we can use a `Reader`, it is possible to
track indentation within the `Reader` environment, using `local` for
indentation, but I have not implemented that.

The new version of `Fold` no longer provides `parent`, as the parent
cursor can be queried using the API.  I rewrote the code to query both
the semantic and lexical parents.  When they are equal and are not the
target file, just "parent" is displayed.  Otherwise, "semantic parent"
and "lexical parent" are displayed separately.

This program uses features added in the `doxygen` branch (#26), but I am
referencing the new `clang-ast-dump` issue (#212).
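
The commit message above mentions tracking indentation in a `Reader` environment with `local`, which was not implemented. A minimal pure sketch of that idea follows. It assumes the `transformers` package (`Control.Monad.Trans.Reader`); the `Tree` type and `render` function are hypothetical stand-ins and not part of `clang-ast-dump`, whose `FoldM` runs in `MonadIO` rather than producing a list of lines.

```haskell
import Control.Monad.Trans.Reader (Reader, ask, local, runReader)

-- | Hypothetical stand-in for an AST node: a label plus its children.
data Tree = Node String [Tree]

-- | Render a tree as a nested Markdown list. The current depth lives in
-- the Reader environment, and 'local' increments it for the children only.
render :: Tree -> Reader Int [String]
render (Node label children) = do
  depth <- ask
  let line = replicate (4 * depth) ' ' ++ "* " ++ label
  rest <- local (+ 1) (mapM render children)
  pure (line : concat rest)
```

For example, `runReader (render (Node "S4" [Node "a" []])) 0` yields the lines `* S4` and `    * a`. Because `local` scopes the change, the depth is restored automatically once the children have been rendered.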
@edsko
Collaborator Author

edsko commented Oct 11, 2024

Keeping a small dependency footprint is not unimportant, especially since this may run as a TH splice. However, I also don't think we should be overly worried about it. Most of the dependencies of commonmark we already have anyway.

TravisCardwell added a commit that referenced this issue Oct 11, 2024
TravisCardwell added a commit that referenced this issue Oct 11, 2024
TravisCardwell added a commit that referenced this issue Oct 11, 2024
TravisCardwell added a commit that referenced this issue Oct 11, 2024
TravisCardwell added a commit that referenced this issue Oct 11, 2024
@TravisCardwell
Collaborator

I investigated how we might output Haddock comments in the preprocessor case. We generate the Haskell AST using haskell-src-exts, and we currently generate the Haskell source code using Pretty, which does not support comments.

Package haskell-src-exts-sc provides a way to include comments. It works as follows:

  1. An AST annotated with Maybe CodeComment is pretty-printed using Pretty, which ignores the annotations.
  2. The pretty-printed code is then parsed with Parser to get an AST that is annotated with source locations.
  3. The two ASTs are combined to produce an AST that is annotated with both comments and source locations, using generics.
  4. That AST is traversed, and source locations for comments are determined while the source locations for the AST elements are updated to make room for the inserted comments.
  5. The resulting AST annotated with the updated source locations and list of Comments (which includes source locations) can then be printed using ExactPrint.

The package is not maintained. The version on Hackage fails to build (with GHC 9.6.6) because it does not enable the UndecidableInstances extension, but that issue was fixed in the repository.

My test produced broken output, as a comment was inserted immediately after a brace, starting a multi-line comment.

-- | This is some module documentation.
module Demo where

-- | Foo is a data type
data Foo = Foo{-- | It has a bar!
               bar :: Int, -- | It has a baz!
                           baz :: Int}

The algorithm used for inserting comments is simple, and I think it would work pretty well if the non-commented code put each field (etc.) on a separate line (as they should be with comments). If we want to fix this, I think it would involve forking the Pretty implementation, not making any major changes to haskell-src-exts-sc. (The only change would be to use the forked pretty-printing implementation.) Both packages use the BSD-3 license.
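
To illustrate the layout that would avoid the broken output above, here is a minimal sketch of a printer that puts each record field on its own line, with its Haddock comment placed before the field. It does not use haskell-src-exts or haskell-src-exts-sc; the `Field` type and the `haddock` and `renderRecord` functions are hypothetical and work on plain strings only.

```haskell
-- | A record field with optional documentation (hypothetical type).
data Field = Field String String (Maybe String)  -- name, type, documentation

-- | Turn documentation text into Haddock line comments.
haddock :: String -> [String]
haddock doc = case lines doc of
  []       -> []
  (l : ls) -> ("-- | " ++ l) : map ("-- " ++) ls

-- | Render a single-constructor record, one field per line, so that a
-- line comment can never swallow the code that follows it.
renderRecord :: Maybe String -> String -> [Field] -> String
renderRecord doc name fields =
  unlines (maybe [] haddock doc ++ [header] ++ body)
  where
    header = "data " ++ name ++ " = " ++ name
    body
      | null fields = []
      | otherwise =
          concat (zipWith renderField ("{" : repeat ",") fields) ++ ["  }"]
    renderField sep (Field n t d) =
      case maybe [] haddock d of
        []       -> ["  " ++ sep ++ " " ++ n ++ " :: " ++ t]
        (h : hs) -> ["  " ++ sep ++ " " ++ h]
                      ++ map ("    " ++) hs
                      ++ ["    " ++ n ++ " :: " ++ t]
```

Applied to the `Foo` example, this sketch emits `{ -- | It has a bar!` on one line and `bar :: Int` on the next, so the comment no longer runs into the field declaration.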

Alternatives include using ghc-exactprint, but that would likely result in a higher maintenance cost.

At any rate, I now have a much better idea of how the translated documentation will be consumed.

@TravisCardwell
Collaborator

Doxygen documentation may contain references to identifiers.

We translate from C names to Haskell names using a local context. For example, record field names translate to names that include the data type name or constructor name, according to configured options. When we run across an identifier in the documentation, we are not able to perform the same translation without context.

If we want to translate references, perhaps we can accumulate a reference map (from C names to Haskell names) during code translation and then pass that map to a subsequent step that translates the documentation. Any identifier that is in the reference map would then be able to be translated.

An easy yet unhelpful alternative is to not translate references. All references could be rendered as code (@reference@), resulting in Haskell documentation that still "references" the C identifiers.
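
The two options above can be combined in a small sketch: look the C identifier up in a reference map and fall back to rendering it as code when it is unknown. This assumes the `containers` package; the type aliases, `renderRef`, and the example mapping (including the Haskell name `s4_a`) are hypothetical and do not reflect the actual name translation in hs-bindgen.

```haskell
import qualified Data.Map.Strict as Map

type CName  = String
type HsName = String

-- | Reference map accumulated during code translation (hypothetical).
type RefMap = Map.Map CName HsName

-- | Render a documentation reference as Haddock markup.
-- Known identifiers become Haddock identifier links ('name');
-- unknown ones are rendered as code (@name@), keeping the C spelling.
renderRef :: RefMap -> CName -> String
renderRef refs cName = case Map.lookup cName refs of
  Just hsName -> "'" ++ hsName ++ "'"
  Nothing     -> "@" ++ cName ++ "@"

-- | Example map; the Haskell names are made up for illustration.
exampleRefs :: RefMap
exampleRefs = Map.fromList [("S4", "S4"), ("S4::a", "s4_a")]
```

With this map, `renderRef exampleRefs "S4::a"` gives `'s4_a'`, while an identifier that was never translated, such as `"S9::z"`, gives `@S9::z@`.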

@TravisCardwell
Collaborator

TravisCardwell commented Oct 17, 2024

I ran into what seemed to be strange behavior, and I finally realized that it is not strange but rather a mistake in the documentation in clang-c/Documentation.h (for libclang 18)!

/**
 * Convert a given full parsed comment to an HTML fragment.
 *
...
 * \li "para-brief" for \paragraph and equivalent commands;
...
 */
CINDEX_LINKAGE CXString clang_FullComment_getAsHTML(CXComment Comment);

The mistake is in the middle line, which should have \\paragraph with an escaped backslash to reference the \paragraph command instead of invoking it. The \paragraph documentation includes a warning:

This command only works inside a subsubsection of a related page documentation block and not in other documentation blocks!

It is in one of the "other documentation blocks" here. Not only does it serve as a no-op, it breaks the list item so that the following text is parsed as a verbatim line in between list items.

In documentation where an escaped backslash is used to reference a command, the escaped backslash and following text are represented in separate (adjacent) text elements.

EDIT

In clang-c/Index.h:

/**
 * @}
 */

/**
 * \defgroup CINDEX_MODULE Module introspection
 *
 * The functions in this group provide access to information about modules.
 *
 * @{
 */

The \defgroup command is used for organization, but it looks like libclang does not support it. As with the above \paragraph mistake, the command is not recognized, and the libclang AST contains no reference to it. We cannot know which command is there, though we can tell that something did not work: the flow of the documentation is broken, and the text after the problematic command is (incorrectly) parsed as a verbatim line.

Such issues are the only places where I see verbatim lines in the parsed ASTs. I am not even sure how to create a verbatim line aside from this, and the documentation does not provide any clues.

@TravisCardwell
Collaborator

I am switching back to other priorities, but I was really close to finishing a first implementation that reifies the libclang documentation AST, so I went ahead and finished it this morning. Perhaps it is a good idea to create a PR and merge it so that the branch does not get lost, forgotten, or stale.

The "kind" of a CXComment determines what it contains, but the C type system of course cannot constrain return values by kind. For example, as far as the type system is concerned, a function may return a CXComment of any kind even where a kind that represents block content is expected. I do not expect there to be issues with this, but I went ahead and implemented defensively just in case.

Haskell type Comment is a top-level comment, which corresponds to the CXComment_FullComment kind. If any other kind is found, it is put into an appropriate type so that we do not lose documentation. Note that Comment also includes the display name of the CXCursor, which we may use to document the original C name.

Kind CXComment_Null is not represented with a type in Haskell. The (high-level) clang_getComment function returns a Maybe Comment, where Nothing corresponds to a CXComment_Null CXComment.

The rest of the CXComment kinds are organized into block (CommentBlockContent) and inline (CommentInlineContent) content. When block content is expected but inline content is found, that inline content is wrapped in an appropriate block container so that it is not lost. When inline content is expected but a paragraph is found, the paragraph content is returned so that it is not lost. When inline content is expected but other block content is found, it is ignored. There is no elegant way to handle this case, which should never happen anyway.
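
The lenient handling described above can be sketched as follows. Only the type names CommentBlockContent and CommentInlineContent come from this comment; the constructors, the `Node` wrapper, and the `expectBlock` and `expectInline` functions are hypothetical simplifications, not the definitions in the doxygen branch.

```haskell
-- | Inline content (hypothetical, heavily simplified constructors).
data CommentInlineContent
  = InlineText String
  | InlineCommand String [String]  -- command name and arguments
  deriving (Eq, Show)

-- | Block content (hypothetical, heavily simplified constructors).
data CommentBlockContent
  = Paragraph [CommentInlineContent]
  | VerbatimLine String
  deriving (Eq, Show)

-- | A reified node before we know whether it fits the expected position.
data Node
  = Block CommentBlockContent
  | Inline CommentInlineContent

-- | Block content expected: stray inline content is wrapped in a
-- paragraph so that it is not lost.
expectBlock :: Node -> CommentBlockContent
expectBlock (Block b)  = b
expectBlock (Inline i) = Paragraph [i]

-- | Inline content expected: a paragraph is unwrapped so that its
-- content is not lost; any other block content is ignored.
expectInline :: Node -> [CommentInlineContent]
expectInline (Inline i)             = [i]
expectInline (Block (Paragraph is)) = is
expectInline (Block _)              = []
```

The last equation of `expectInline` is the inelegant case: it silently drops content. A stricter variant could return an `Either` with an error there instead.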

TravisCardwell added a commit that referenced this issue Oct 24, 2024
We decided against lenient parsing of `libclang` comments and now throw
an error upon encountering an unexpected comment kind, such as block
content within inline content.

We will of course need to test this against various real-world code.
Error messages include the cursor display name, the file name, and the
extent.  We can investigate the `libclang` comment AST for any errors
using the `clang-ast-dump` utility.

Note that this internal module is not used anywhere yet, so it is
therefore untested.  I will test it when returning to the documentation
task.
TravisCardwell added a commit that referenced this issue Oct 24, 2024
Reify libclang documentation AST (#26)

2 participants