Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Generated structs exposed in header files, reactions linkable from separate source files, and updated C target preamble visibility #1599

Merged
merged 98 commits into from
Apr 12, 2023

Conversation

petervdonovan
Copy link
Collaborator

@petervdonovan petervdonovan commented Feb 22, 2023

This is opened in response to feedback from Magnition. It allows LF files to be written without requiring target-specific code to be mixed into them.

UPDATE: This PR has come to encompass substantially more than just the small changes that I intended. I explain:

Definitions

Declarations are user-visible if they are intended to be #included in the LF programmer's C code. User-visible generated code can therefore be expected to be viewed frequently as a reference for how C code has to interact with LF code.

Declarations are mangled if they are not intended to be user-visible.

The main .c file is the one that has initializeTriggerObjects.

Change Summary

This PR:

  • Factors mangled reactor-class-specific declarations out of the main .c file and into mangled .h files. This includes mangled self structs, mangled auxiliary structs such as port and action structs, and mangled reaction function declarations.
  • Factors mangled reactor-class-specific definitions out of the main .c file and into mangled .c files. This includes mangled reaction function definitions.
  • Exposes user-visible reactor-class-specific declarations in user-visible .h files. This includes mangled self structs, mangled port structs of the current reactor, mangled port structs of the contained reactors (non-recursively, down to just one level of hierarchy), and mangled auxiliary structs such as action structs.
    • This results in a substantial amount of duplication. Mangled declarations are duplicated in large part. Declarations for port structs of contained reactors appear both in the contained reactors' user-visible header files and in the user-visible header files for their container. I think this is bad, but justified. It allows us to minimize namespace pollution by preventing certain files from having to #include others in their entirety. It also allows us to present simplified versions of the self structs, without all of the implementation details which are not intended to be user-visible.
  • Implements a new semantics for preambles in the C target.
    • File-level preambles are copied into user-visible header files and mangled header files.
    • Reactor-level preambles are copied into mangled source files.
  • Allows the use of non-inlined reactions together with contained banks, contained multiports, contained banks with multiports, and non-contained ports.

Rationale

  • Although this PR was originally intended to implement non-inlined reactions (which it, to some extent, does), it does not claim to implement this feature in its entirety. For example, it only includes a few smoke tests for non-inlined reactions. It also does not allow aliasing of ports, which is necessary in order to use non-inlined reactions universally without naming conflicts. I would also like to emphasize that non-inlined reactions are still an experimental, undocumented feature.
  • Instead, this PR is primarily focused on reducing the potential for name collisions in the generated code. It does this using hashcodes. In cases where hashcodes cannot be used, I attempt to reduce naming conflicts by moving code into separate files.
  • Christian has pointed out problems with the preamble semantics that is implemented in this PR. I think that his concerns make sense, and I think he might be right. However, I would like to have some experience with the preamble semantics implemented in this PR before I take the trouble to add public and private preambles to the C target. The semantics implemented here is sufficient to implement all of the functionality that is used in our integration tests and benchmarks.
  • I discussed the handling of file-level preambles with @mkhubaibumer today. One potential problem with my approach is that it copies the file-level preamble to many different header files, which results in a lot of duplication. I think that this is fine as-is because I think that it is an antipattern to have a long file-level preamble. If it is long, then the LF programmer should factor it out into a separate .h file. However, I do not feel strongly about this and am willing to change it if other people object.

Work that will be left for future PRs

  • It is very unlikely that hashcodes will collide, but I have not analyzed the hashcodes used to verify any probabilistic guarantees nor to verify that the hashcodes will be the same for repeated executions of the same compilation process.
  • Port aliasing, as we discussed in the comments below, still needs to be implemented. I do not expect this to be very difficult. Like non-inlined reactions, this will also be an undocumented feature that we will quietly introduce in a backwards-compatible way to allow experimentation.
  • More tests are probably needed for non-inlined reactions.
  • Non-inlined reactions are not currently implemented in a particularly efficient way in the cases of banks and multiports. In the current implementation the banks and multiports are copied into data structures (on the stack) pointers to which passed to the programmer's reaction function. This is especially problematic when activity is sparse. However, I think that master also has this problem, since it copies data into local structs, and the changes are backwards-compatible, so this is not really a regression.

Companion PR in reactor-c: lf-lang/reactor-c#189

@petervdonovan
Copy link
Collaborator Author

petervdonovan commented Feb 23, 2023

This syntax was discussed in our meeting with Magnition this morning:

reaction(count.out) {=
    lf_set(out, ...);
=}

reaction(count.out as countout) {=
    lf_set(countout, ...);
=}

I think that the reason for wanting to do this syntax change can be articulated more precisely. I claim that port access using dot notation (e.g., count.out in a reaction body) is an illness in the sense that it reduces modularity, and that it has two symptoms:

Symptom 1

The code generator contains "scary code" that generates structs solely for the purpose of allowing dot notation access in C. This is accidental complexity; it is not essential to any interesting problem we are trying to solve, and instead is merely an issue of how we interact with C syntax. And our code quality problem is not merely that we have to maintain more code; it is also that the code we maintain has to pass around information about the structure of the program -- in particular, the dot notation expression that is being used to access a port. This information has to be propagated through every level of the code generator; it cannot be abstracted in any intermediate IR because the final code that falls out has to reflect the form of the expression.

This issue will not be completely hidden from end users. In order to understand what prototypes they have to implement, end users will have to read the generated header files, including the types which are referenced in the prototypes of the functions they have to implement. Do we really want to clutter these generated header files with meaningless, "do nothing" structs that are just there to allow this dot access notation?

Symptom 2

Dot notation syntax stands in the way of refactoring. Suppose we have the following reactor:

reactor r {
    input in: int
    output out: int
    reaction(in) {=
        lf_set(out, in->value);
    =}
}

And it gets refactored to this reactor:

reactor r {
    input in: long
    output out: int
    transducer = new LongToIntConverter
    in -> transducer.in
    reaction(transducer.out as in) {=
        lf_set(out, in->value);
    =}
}

Without aliasing using as, we would need to change the reaction body to reflect that we are now accessing transducer.out instead of in. This is an especially serious problem when the reactions are not inlined because it means that a refactoring in the LF file requires the user to make changes in a separate file.

Furthermore, the name transducer.out may not be related to what is actually being computed by the reaction. Disallowing aliasing makes it unnecessarily difficult to write self-documenting code.

The core problem

Mainstream languages allow most expressions to be treated as reducible in the sense that they can be replaced with their value (in this case, with the port that is being referenced). When expressions are not treated as reducible, they cannot scale without becoming hard to understand. These structs that we are having to generate, for example, reflect the fact that whenever we have a port that is accessed using dot notation, it becomes "its own concept," requiring its own representation in a header file; and furthermore, these extra concepts grow as the number of expressions that you compose grows because you cannot replace the dot-notation expressions with something (a port) that they evaluate to. Symptom 2 is also a consequence of this fundamental error in the sense that because the hierarchical reference is "its own concept," it cannot be used in the place of a regular port. There is no fundamental reason for this incompatibility.

@petervdonovan petervdonovan force-pushed the bodyless-reactions branch 3 times, most recently from 331e02a to b210197 Compare February 24, 2023 07:35
@lhstrh
Copy link
Member

lhstrh commented Feb 25, 2023

In response to #1599 (comment):

How about we:

  • for now, keep inline bodies as they are, including the helper structs for dot notation;
  • do not create helper structs for externally implemented bodies; and
  • for bodiless reactions have a.x translate to x in reaction scope, unless there already is an x, in which case the user has to disambiguate with an as clause.

This is backward compatible and should not get into anyone's way.

@petervdonovan petervdonovan changed the title Bodyless reactions Extract header files and C code from main file Feb 25, 2023
@edwardalee
Copy link
Collaborator

This refactoring example is another instance of the round tripping problem. I wonder how many such problems still lurk, waiting to be discovered as we gain experience...

@petervdonovan
Copy link
Collaborator Author

This PR now involves generating code that is specific to a given reactor constructor into a separate file. This change is motivated by the need to reduce name conflicts, and is part of the effort to expose generated types in generated header files. It is also a long-standing to-do item that I think has the potential to make the generated code easier to work with. If my implementation works, this PR will also allow us to avoid generating code multiple times for reactors that are instantiated multiple times with different names, which can let us modestly reduce code size.

This also gives us the opportunity to re-assess how we deal with file-level preambles. The semantics of preambles have been discussed in #1372 and #32, and probably in a few other issues. If we take the stance that they should be for #includeing header files (or more generally, for imports, in the case of Python, TS, etc.), then we should duplicate the contents of a file-level preamble at the top of every .c file that comes from the .lf file. This has the added benefit that if you try to declare global variables this way, your program will not compile. Reactor-level preambles would remain restricted to only the file that is specific to the associated reactor, which again can help us with information hiding/reduction of name conflicts.

Is everyone OK with this?

@edwardalee
Copy link
Collaborator

Unfortunately, global variables are sometimes needed. For example, when targeting the nRF52, initialization of hardware interfaces has to happen only once, regardless of how many instances of a reactors use that hardware. See for example Display.lf.

@petervdonovan
Copy link
Collaborator Author

petervdonovan commented Feb 25, 2023

In that case, we can define buckler_spi_initialized and the other global variables in a reactor-specific preamble without using the static keyword, and we can declare them in the file-level preamble using the extern keyword. Then it (the definition) will be code-generated only into the reactor-specific .c file, but it will be declared for all reactors in the LF file.

@edwardalee
Copy link
Collaborator

OK, it would be good to test that idea. It should be tested with inheritance also. I.e., all reactor classes that use some shared hardware should subclass a common base reactor, and the global variables and initialization code should be declared there.

@cmnrd
Copy link
Collaborator

cmnrd commented Feb 27, 2023

In response to #1599 (comment):

How about we:

* for now, keep inline bodies as they are, including the helper structs for dot notation;

* do not create helper structs for externally implemented bodies; and

* for bodiless reactions have `a.x` translate to `x` in reaction scope, unless there already is an `x`, in which case the user has to disambiguate with an `as` clause.

This is backward compatible and should not get into anyone's way.

This sounds good to me! Removing the dot notation syntax would be a great relief and simplify the code generation also in other targets.

In that case, we can define buckler_spi_initialized and the other global variables in a reactor-specific preamble without using the static keyword, and we can declare them in the file-level preamble using the extern keyword. Then it (the definition) will be code-generated only into the reactor-specific .c file, but it will be declared for all reactors in the LF file.

I am not sure if I understand how this would work. Are you saying that the file level preamble would be code generated into a header file, while the reactor-level preamble is generated into a .c file? I am afraid that this wouldn't be general enough, but I am not sure. I believe you will need the public and private preamble mechanism as implemented in the C++ target to separate declaration and definition. Also, be prepared to get all kind of weird issues once imports are involved

@edwardalee
Copy link
Collaborator

I am opposed to changing the syntax of LF to avoid a perceived awkwardness under the hood in the C target implementation. Specifically, I am not in favor either of referring to a port x of a contained reactor y as just x, nor am I in favor of renaming it using "as". The cost of this change is high (it will have to be supported in all targets and will complicate the documentation and tutorials). Moreover, it creates yet another round tripping problem. What if I later add a port named x to y? What if I renamed it z and later add a port named z?

@petervdonovan
Copy link
Collaborator Author

petervdonovan commented Feb 28, 2023

The cost of this change is high (it will have to be supported in all targets and will complicate the documentation and tutorials).

I agree that we would have to change the documentation and tutorials and it could make them more complicated.

to avoid a perceived awkwardness under the hood in the C target implementation.

For the record, there is a bit more to it than that... For example, Christian seems to have suggested that other targets may have similar problems. There is also a bit more explanation in the comment where I introduced the idea.

Moreover, it creates yet another round tripping problem.

Yes, I agree that the proposal could be involved in round tripping problems and could in some cases make them worse. I actually argued in my earlier comment that that it would alleviate round tripping problems rather than making them worse; however, I agree that there are lots of ways to make any two files incompatible with each other, and that port aliases can add more such ways.


Overall, I think I understand the concerns and I like Marten's idea to avoid making any backward-incompatible changes. This issue with hierarchical references can be treated as an experimental, undocumented feature that we can implement (for just the C target, and only for the non-inlined reactions), try out, and maybe discuss again at another time. If we do it the way Marten suggested, then the current coding experience in Lingua Franca will continue unchanged, and nearly all the effort involved in this PR will be in non-disruptive refactorings that we need to do anyway.

@cmnrd
Copy link
Collaborator

cmnrd commented Feb 28, 2023

to avoid a perceived awkwardness under the hood in the C target implementation.

For the record, there is a bit more to it than that... For example, Christian seems to have suggested that other targets may have similar problems. There is also a bit more explanation in the comment where I introduced the idea.

The dot notation creates awkwardness in all target generators, precisely because it requires to create a new object just to give the reaction something that it can use to reference the port. This has been a challenge in C++ and was even more difficult to implement in Rust. The code in the generators is complex, difficult to maintain, and after all only there for cosmetics. I would be very happy if we could just get rid of this "feature".

Moreover, it creates yet another round tripping problem. What if I later add a port named x to y? What if I renamed it z and later add a port named z?

I don't understand why this would be problematic. Let's try to make this more concrete:

reactor Foo {
  output x: int
}
reactor Bar {
  y = new Foo()
  reaction(y.x) {=
    // code can use x to refer to y.x
  =}
}

In this example, I cannot add a port x to y, because it already has one by this name. But I could add another instance of Foo, and then I need to disambiguate:

reactor Foo {
  output x: int
}
reactor Bar {
  y = new Foo()
  a = new Foo()
  reaction(y.x, a.x as z) {=
    // code can use x to refer to y.x
    // code can use z to refer to a.x
  =}
}

This is not a round-tripping problem, as we add something to the reaction signature and this something just needs to fulfil certain constraints.

Also, if we now modify Foo and add a port z like so

reactor Foo {
  output x: int
  output z: int
}

we can do this without problems. We can leave Bar unmodified and it would work as before. So also here, there is no round-tripping problem.


Maybe a less disruptive solution would be to make y.x available as y_x instead of x. This would be sufficient to disambiguate, but avoid the dot notation. In consequence, we would not need the new syntax for renaming.

I must say, though, that I would love to have the renaming option available for all ports and triggers. In programming, names matter and good names are part of the documentation. Maybe a port has a more specific meaning in the context of a specific reaction than it has on the outside interface of a reactor. Maybe also the port of a contained library reactor has an ambiguous name and we want to clarify. So why not have this feature? It should be quite easy to implement (in the C++ target it is trivial), and I also don't see why it would be difficult to document. That said, I agree that we are currently lacking the man power and this is not high priority, so this idea should be added to the backlog.

petervdonovan added a commit that referenced this pull request Mar 3, 2023
@petervdonovan
Copy link
Collaborator Author

As a (late) response to earlier comments from @cmnrd:

Are you saying that the file level preamble would be code generated into a header file, while the reactor-level preamble is generated into a .c file?

Yes.

I am afraid that this wouldn't be general enough, but I am not sure. I believe you will need the public and private preamble mechanism as implemented in the C++ target to separate declaration and definition.

I am in the process of implementing this idea and getting tests/benchmarks to compile with this change. So far I have not run into problems. The way I am doing it right now separates declaration and definition by putting declarations in file-level preambles and definitions in reactor-level preambles. AFAICT this seems to be working OK.

A potential downside is that users are forced to put declarations in file-level preambles even when only one reactor needs them. I do not regard this as a problem, however, since in all the languages I know imports are commonly done at file-level. Indeed, even though I think they can be scoped in Python and TypeScript, it is unusual and even confusing to do so.

Also, be prepared to get all kind of weird issues once imports are involved

There may well be weird issues, but I am not yet aware of any in particular. Superusers who want to do creative tricks with their #includes can do that in their C code, but they should not have to deal with anything out of the ordinary in their LF files.

@cmnrd
Copy link
Collaborator

cmnrd commented Mar 7, 2023

I find the approach interesting. At the same time, I find the constraint to place definitions only in reactor level preambles a bit strange. Consider a case where a couple of reactors use a specific data type. We could place the declaration of the data struct as well as the declaration of some API functions for manipulating the struct in the file level preamble. But where should the definitions go? As I understand, we would need to put them in the local preamble of one reactor, but which? Perhaps the only real answer to this question is that preambles shouldn't be used for this and the type and functions should be declared and defined in external header and .c files.

@petervdonovan
Copy link
Collaborator Author

As I understand, we would need to put them in the local preamble of one reactor, but which? Perhaps the only real answer to this question is that preambles shouldn't be used for this and the type and functions should be declared and defined in external header and .c files.

OK, I understand your concern now. It would be easy to use the private/public keyword for this, and then there would be no problem. However, it also is often possible to determine which reactor is most closely related to the functions, and to put the functions there. Furthermore, I agree with your point that the user can always just use an external .c file -- and indeed they probably will in a nontrivial project; that is AFAICT the whole point of our big emphasis on target language interoperability.

This issue is not too complicated and is very easy to change as we find out more about what LF programs will look like in practice. I propose to table the discussion for now.

petervdonovan added a commit that referenced this pull request Mar 8, 2023
petervdonovan added a commit that referenced this pull request Mar 8, 2023
Copy link
Member

@lhstrh lhstrh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks very nice, but I left some questions. I'm a little worried about the change that now requires a bunch of includes in the preamble that didn't used to be necessary.

org.lflang/src/org/lflang/ASTUtils.java Show resolved Hide resolved
org.lflang/src/org/lflang/ASTUtils.java Show resolved Hide resolved
test/C/c/bank_multiport_to_reaction_no_inlining.c Outdated Show resolved Hide resolved
test/C/src/AfterZero.lf Outdated Show resolved Hide resolved
test/C/src/DeadlineHandledAbove.lf Show resolved Hide resolved
@petervdonovan petervdonovan force-pushed the bodyless-reactions branch 2 times, most recently from 057fa36 to dc4177c Compare March 31, 2023 20:29
I think that in the long run this might not be what we want, but it
reduces the impact of this and the accompanying reactor-c PR on
existing LF programs.
The error gets reported with an unexpected line number (not even in the
right file) when a mistake is inserted into the type for the list of
time values. I do not want to fix this right now (honestly it is the C++
compiler's fault, I do not even know how we would fix this), so instead
I will try to cover it up.
I do not know how I accidentally put these changes here but anyway they
were wrong and are the reason why the unit tests have been failing.
The unit tests were failing for a while because a few typos were
corrected in the validator but not in the validation tests.
Copy link
Member

@lhstrh lhstrh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Only nitpicks.

test/C/src/multiport/FullyConnectedAddressable.lf Outdated Show resolved Hide resolved
test/C/src/modal_models/ConvertCaseTest.lf Outdated Show resolved Hide resolved
test/C/src/federated/DistributedMultiportToken.lf Outdated Show resolved Hide resolved
@lhstrh
Copy link
Member

lhstrh commented Apr 12, 2023

Waiting to pull the trigger on this one until lf-lang/reactor-c#189 has been merged...

@petervdonovan
Copy link
Collaborator Author

I will change the test so that it passes. This is a slight regression in the quality of the error message due to the change in the grammar that is introduced by this PR. It is a warning sign, but I think we can tolerate it.

@petervdonovan petervdonovan merged commit acd39bc into master Apr 12, 2023
@cmnrd cmnrd deleted the bodyless-reactions branch June 8, 2023 07:45
@petervdonovan petervdonovan added the feature New feature label Aug 25, 2023
@petervdonovan petervdonovan changed the title Extract header files and C code from main file Expose header files for generated structs and allow linking reactions from separate source files Aug 25, 2023
@petervdonovan petervdonovan changed the title Expose header files for generated structs and allow linking reactions from separate source files Expose header files for generated structs, allow linking reactions from separate source files, and update C target preamble visibility Aug 25, 2023
@lhstrh lhstrh changed the title Expose header files for generated structs, allow linking reactions from separate source files, and update C target preamble visibility Generated structs exposed in header files, reactions linkable from separate source files, and updated C target preamble visibility Aug 28, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
feature New feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants