Implement AST annotations #256

edsko · 2024-11-06T06:46:07Z

We should be able to handle annotations in the AST, such as source spans, additional information for structs (such as offsets and field widths provided by Clang), or type information (for type inference of macros). Perhaps the following rather simple approach would be sufficient:

data Pass = ...

type Annot :: Symbol -> Pass -> Type
type family Annot con p -- open type family

type Expr :: Pass -> Type
data Expr p
  = Con1 ( Annot "Con1" p ) A B
  | Con2 ( Annot "Con2" p ) C
  ... -- NB: no extension constructor
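
To make the proposal concrete, here is a minimal compilable sketch of the same idea. The passes `Parsed` and `Located`, the `Span` type, and the constructors `Lit` and `Add` are all hypothetical and chosen only for illustration; the issue does not fix any of them.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Kind (Type)
import GHC.TypeLits (Symbol)

-- Hypothetical passes; the actual set of passes is still undecided.
data Pass = Parsed | Located

-- Open type family indexed by constructor name and pass.
type family Annot (con :: Symbol) (p :: Pass) :: Type

-- Hypothetical source span, used as the annotation in the Located pass.
data Span = Span { spanStart :: Int, spanEnd :: Int }
  deriving (Eq, Show)

type instance Annot "Lit" 'Parsed  = ()
type instance Annot "Add" 'Parsed  = ()
type instance Annot "Lit" 'Located = Span
type instance Annot "Add" 'Located = Span

-- Every constructor carries an annotation; there is no extension constructor.
data Expr (p :: Pass)
  = Lit (Annot "Lit" p) Int
  | Add (Annot "Add" p) (Expr p) (Expr p)

-- The Show context mentions type family applications, hence the
-- FlexibleContexts and UndecidableInstances extensions.
deriving instance (Show (Annot "Lit" p), Show (Annot "Add" p)) => Show (Expr p)

-- Pass-polymorphic function that ignores the annotations.
eval :: Expr p -> Int
eval (Lit _ n)   = n
eval (Add _ a b) = eval a + eval b

parsedExample :: Expr 'Parsed
parsedExample = Add () (Lit () 1) (Lit () 2)

locatedExample :: Expr 'Located
locatedExample = Add (Span 0 5) (Lit (Span 0 1) 1) (Lit (Span 4 5) 2)
```

Loaded in GHCi, `eval parsedExample` and `eval locatedExample` both evaluate to 3, since `eval` works for any pass. Code that needs the annotations fixes the pass in its type instead.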

edsko · 2024-11-06T06:46:37Z

Alternatives would be to have a different type family for each constructor, or to use an open sum type approach e.g.

data ExprCon = Con1 | Con2

type Content :: Pass -> k -> Type
data family Content p con
data    instance Content P Con1 = MkCon1 A B
newtype instance Content P Con2 = MkCon2 C

data WithAnnot p con = Annot { annotation :: !( Annot p con ), content :: !( Content p con ) }

type Expr p = VariantF ( WithAnnot p ) '[ Con1, Con2 ]

but that seems a bit too heavy-weight.
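
For comparison, the first alternative (a different type family for each constructor) might look roughly like the sketch below. The family names `AnnotLit` and `AnnotNeg`, the constructors, and the single `Parsed` pass are hypothetical.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies #-}

import Data.Kind (Type)

-- Hypothetical single pass.
data Pass = Parsed

-- One open type family per constructor, instead of a single family
-- indexed by a Symbol.
type family AnnotLit (p :: Pass) :: Type
type family AnnotNeg (p :: Pass) :: Type

type instance AnnotLit 'Parsed = ()
type instance AnnotNeg 'Parsed = ()

data Expr (p :: Pass)
  = Lit (AnnotLit p) Int
  | Neg (AnnotNeg p) (Expr p)

-- Counts the nodes of an expression, ignoring annotations.
size :: Expr p -> Int
size (Lit _ _) = 1
size (Neg _ e) = 1 + size e
```

This avoids `Symbol` indices, so a misspelled family name is a compile error rather than a stuck type family application, at the cost of one extra declaration per constructor.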

TravisCardwell · 2024-11-08T07:34:54Z

I am working on implementing the design in the first comment, using a symbol-indexed open type family with passes defined using a sum type.

I do not yet have a clear view of what passes we might have. To start with, I just defined a single pass, tentatively called Parsed.

We need to decide what parts of the AST we would like to annotate. Here is what I have so far:

Newtype
NewtypeField
Struct
StructField

Should other parts of the AST be annotated?

In this initial implementation, I just set all annotations to (). The code is changed with minimal modifications to get it to compile, and we can format it nicely when adding actual annotations.

Regarding module organization, the annotation types (just type family instances since everything is currently ()) are currently all defined in HsBindgen.Hs.AST. I imagine that we may want to use multiple modules when the implementation increases in size.

I have not updated the tests yet. A number of them fail because the AST is now pretty-printed with annotations.
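
As a rough illustration of why the output changes, here is a small sketch with every annotation in the `Parsed` pass set to `()`. The `Struct` and `StructField` records are heavily simplified, hypothetical stand-ins and not the actual definitions in HsBindgen.Hs.AST.

```haskell
{-# LANGUAGE DataKinds, FlexibleContexts, KindSignatures #-}
{-# LANGUAGE StandaloneDeriving, TypeFamilies, UndecidableInstances #-}

import Data.Kind (Type)
import GHC.TypeLits (Symbol)

data Pass = Parsed

type family Annot (con :: Symbol) (p :: Pass) :: Type

-- All annotations are () for now.
type instance Annot "Struct"      'Parsed = ()
type instance Annot "StructField" 'Parsed = ()

-- Hypothetical, simplified AST types.
data StructField (p :: Pass) = StructField
  { fieldAnn  :: Annot "StructField" p
  , fieldName :: String
  }

data Struct (p :: Pass) = Struct
  { structAnn    :: Annot "Struct" p
  , structName   :: String
  , structFields :: [StructField p]
  }

deriving instance Show (Annot "StructField" p) => Show (StructField p)
deriving instance
  (Show (Annot "Struct" p), Show (Annot "StructField" p)) => Show (Struct p)

example :: Struct 'Parsed
example = Struct () "Point" [StructField () "x", StructField () "y"]
```

With derived `Show` instances, `show example` now contains `structAnn = ()` and `fieldAnn = ()`, so any expected output recorded before the annotation fields existed no longer matches.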

I am pushing the current state to the ast-annotations branch in case anybody wants to look at it. Please do not hesitate to let me know of any corrections or suggestions.

TravisCardwell · 2024-11-11T02:20:52Z

Here is an overview of the data flow:

graph TD
  SRC@{ shape: doc, label: "C Source"}
  LL("Low-level libclang types")
  C("C AST types")
  CIR("C IR types")
  HS("Haskell AST types")
  BC("Backend common types")
  TH("Template Haskell types")
  PP("Preprocessor types")
  DST@{ shape: doc, label: "Haskell Source"}

  SRC-- parsed to     -->LL
  LL--  translated to -->C
  C--   translated to -->CIR
  CIR-- translated to -->HS
  HS--  translated to -->BC
  BC--  translated to -->TH
  BC--  translated to -->PP
  PP--  rendered to   -->DST

  click LL "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen-libclang/src/HsBindgen/Clang/LowLevel/Core.hs" "HsBindgen.Clang.LowLevel.Core"
  click C "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen/src/HsBindgen/C/AST.hs" "HsBindgen.C.AST"
  click HS "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen/src/HsBindgen/Hs/AST.hs" "HsBindgen.Hs.AST"
  click BC "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen/src/HsBindgen/Backend/Common.hs" "HsBindgen.Backend.Common"
  click TH "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen/src/HsBindgen/Backend/TH.hs" "HsBindgen.Backend.TH"
  click PP "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen/src/HsBindgen/Backend/PP.hs" "HsBindgen.Backend.PP"

Use of a simplified C IR is discussed in #253. From this discussion, my understanding is that the C AST will be transformed to separate C IR types. Perhaps neither C AST nor C IR types need annotations. I imagine that multi-pass processing will all occur only with the Haskell AST. I do not yet have a clear view of what passes we might have, though.

The term "annotations" has a connotation of "extra" information, but I think we should be clear that it is used for including information whose type varies depending on the pass, regardless of whether it is "extra" or not.

LINE pragmas may be generated (Include LINE pragmas in generated output? #74). (I imagine that this may be enabled/disabled via configuration.) The required source location information is retrieved from libclang extents, and it will need to be passed from the C AST to the backend types. The type does not vary, so this is probably best included directly in the types, not in annotations.
Tool decisions may be output in comments (Explain tool decisions in generated output #23). (I imagine that this may be enabled/disabled via configuration.) I imagine that this may be implemented using a type like [ToolDecision] throughout all the types. This type does not vary, so this is probably best included directly in the types, not in annotations.
Documentation must be translated from C/Doxygen to Haskell/Haddock syntax (Include Haddocks for exported (low-level) bindings #26). If this translation is context-free, perhaps the high-level documentation types (defined in HsBindgen.Clang.HighLevel.Documentation) will be passed from the C AST to the backend types. In the preprocessor backend, they can be translated to Haskell documentation strings that include Haddock documentation syntax and are formatted with appropriate indentation and line length. In the Template Haskell backend, they can be translated to Haskell documentation strings that do not include Haddock documentation syntax, to be added in specific locations by module finalizers. With this design, the type does not vary, so it is probably best included directly in the types, not in annotations.
Import resolution is done differently in the different backends. The Template Haskell backend references names directly, imported in our backend implementation. The preprocessor backend resolves names using our own code, specifying the module a name is imported from and whether the import should be qualified. Imports can optionally specify aliases, and the type of name (identifier or operator) determines how names are pretty-printed. All of this is specific to the preprocessor backend. I do not think that import resolution should be done earlier, as it would complicate the design and implementation.
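
The distinction drawn above between pass-dependent annotations and information whose type does not vary can be sketched as follows. The types `SrcLoc` and `ToolDecision`, the field names, and the `linePragma` helper are hypothetical and only illustrate the shape of the design.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies #-}

import Data.Kind (Type)
import GHC.TypeLits (Symbol)

data Pass = Parsed

type family Annot (con :: Symbol) (p :: Pass) :: Type
type instance Annot "Struct" 'Parsed = ()

-- Hypothetical source location, as might be derived from a libclang extent.
data SrcLoc = SrcLoc { srcFile :: FilePath, srcLine :: Int }
  deriving (Eq, Show)

-- Hypothetical record of a decision made by the tool.
newtype ToolDecision = ToolDecision String
  deriving (Eq, Show)

data Struct (p :: Pass) = Struct
  { structAnn       :: Annot "Struct" p  -- type varies with the pass
  , structLoc       :: SrcLoc            -- same type in every pass
  , structDecisions :: [ToolDecision]    -- same type in every pass
  , structName      :: String
  }

-- Render a LINE pragma from the location stored directly in the type.
linePragma :: Struct p -> String
linePragma s =
  "{-# LINE " ++ show (srcLine loc) ++ " " ++ show (srcFile loc) ++ " #-}"
  where
    loc = structLoc s
```

Because `structLoc` is an ordinary field, `linePragma` works in any pass without a constraint on `Annot`. Only information whose type genuinely depends on the pass goes through the type family.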

phadej · 2024-11-11T15:20:12Z

I agree with @TravisCardwell if the TL;DR is that for every need we identified so far the cleaner and simpler solution is "This [extra info] type does not vary, so this is probably best included directly in the [structure] types".

And YAGNI for any not-yet-identified needs.

edsko · 2024-11-12T11:42:38Z

I don't mind delaying until we have a concrete use case, but one such use case is the results of type inference from @sheaf's type checker.
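
A sketch of how that use case could fit the annotation design, assuming a hypothetical `Typechecked` pass whose macro annotation is the inferred type. `MacroType`, `MacroBody`, and the toy `typecheck` function are placeholders and do not reflect the actual type checker.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies #-}

import Data.Kind (Type)
import GHC.TypeLits (Symbol)

-- Hypothetical passes.
data Pass = Parsed | Typechecked

type family Annot (con :: Symbol) (p :: Pass) :: Type

-- Hypothetical result of type inference.
data MacroType = TyInt | TyDouble
  deriving (Eq, Show)

-- No information before type checking, an inferred type afterwards.
type instance Annot "Macro" 'Parsed      = ()
type instance Annot "Macro" 'Typechecked = MacroType

-- Hypothetical, drastically simplified macro body.
data MacroBody = IntLit Integer | DoubleLit Double

data Macro (p :: Pass) = Macro
  { macroAnn  :: Annot "Macro" p
  , macroName :: String
  , macroBody :: MacroBody
  }

-- Toy stand-in for a type checker: moves a macro from one pass to the
-- next by filling in the annotation.
typecheck :: Macro 'Parsed -> Macro 'Typechecked
typecheck (Macro () name body) = Macro (infer body) name body
  where
    infer (IntLit _)    = TyInt
    infer (DoubleLit _) = TyDouble
```

Here the annotation type really does vary between passes, which is the situation the type family is meant for. Later stages can require `Macro 'Typechecked` and rely on the inferred type being present.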

edsko added this to the 1: `Storable` instances milestone Nov 6, 2024

edsko mentioned this issue Nov 6, 2024

Introduce a simplified C IR. #253

Open

edsko assigned TravisCardwell Nov 6, 2024

TravisCardwell mentioned this issue Nov 13, 2024

Add AST annotations #276

Draft
