JIT Code Generation #70

Open
auto-differentiation-dev opened this issue Jun 13, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@auto-differentiation-dev
Collaborator

In some applications, exactly the same calculations need to be carried out repeatedly for different inputs. A typical example is Monte-Carlo simulation, where the execution path (branches, etc.) is independent of the input data point.

Re-recording the tape on every path is redundant in this case. It may be beneficial to record only one path on tape and then generate compiled code from this recording just in time. This code should produce both the value and the derivatives, and optimisations can be applied to increase performance. The overhead of JIT compilation should be amortised over a large number of repeated executions.

It would be good to add such a feature to XAD as an option.
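
To make the record-once, replay-many idea concrete, here is a minimal sketch. All names (`RecordedFunction`, `Node`, `recordExample`) are hypothetical and are not XAD's API. The replay below still interprets the recorded program; a JIT would compile the same forward and reverse sweeps into native code instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not XAD's API.
enum class Op { Input, Add, Mul, Exp };

struct Node {
    Op op;
    std::size_t a;  // slot of the first operand (unused for Input)
    std::size_t b;  // slot of the second operand (unused for Input and Exp)
};

struct RecordedFunction {
    std::vector<Node> nodes;          // straight-line program, one slot per node
    std::vector<std::size_t> inputs;  // slots of the registered inputs, in order
    std::size_t output = 0;           // slot holding the result

    std::size_t push(Op op, std::size_t a, std::size_t b) {
        nodes.push_back({op, a, b});
        return nodes.size() - 1;
    }
    std::size_t input() {
        inputs.push_back(push(Op::Input, 0, 0));
        return inputs.back();
    }
    std::size_t add(std::size_t a, std::size_t b) { return push(Op::Add, a, b); }
    std::size_t mul(std::size_t a, std::size_t b) { return push(Op::Mul, a, b); }
    std::size_t exp(std::size_t a) { return push(Op::Exp, a, a); }

    // Replays the recorded path for new inputs: a forward sweep for the
    // values, then a reverse sweep for the adjoints of the inputs.
    double evaluate(const std::vector<double>& x, std::vector<double>& grad,
                    double seed = 1.0) const {
        assert(x.size() == inputs.size());
        std::vector<double> v(nodes.size()), adj(nodes.size(), 0.0);
        std::size_t next = 0;
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            const Node& n = nodes[i];
            switch (n.op) {
                case Op::Input: v[i] = x[next++]; break;
                case Op::Add:   v[i] = v[n.a] + v[n.b]; break;
                case Op::Mul:   v[i] = v[n.a] * v[n.b]; break;
                case Op::Exp:   v[i] = std::exp(v[n.a]); break;
            }
        }
        adj[output] = seed;
        for (std::size_t i = nodes.size(); i-- > 0;) {
            const Node& n = nodes[i];
            switch (n.op) {
                case Op::Input: break;
                case Op::Add:   adj[n.a] += adj[i]; adj[n.b] += adj[i]; break;
                case Op::Mul:   adj[n.a] += adj[i] * v[n.b];
                                adj[n.b] += adj[i] * v[n.a]; break;
                case Op::Exp:   adj[n.a] += adj[i] * v[i]; break;
            }
        }
        grad.clear();
        for (std::size_t s : inputs) grad.push_back(adj[s]);
        return v[output];
    }
};

// Records f(x0, x1) = x0 * x1 + exp(x0) exactly once.
RecordedFunction recordExample() {
    RecordedFunction f;
    const std::size_t x0 = f.input();
    const std::size_t x1 = f.input();
    const std::size_t prod = f.mul(x0, x1);
    const std::size_t e = f.exp(x0);
    f.output = f.add(prod, e);
    return f;
}
```

After a single call to `recordExample()`, `evaluate` can be called for each Monte-Carlo path with fresh inputs, without running the original calculation again.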

@auto-differentiation-dev auto-differentiation-dev added the enhancement New feature or request label Jun 13, 2023
@stebos100

Hi all!

I would just like to enquire whether this enhancement has been implemented in XAD. If so, is there an example of this being done in code?

Looking forward to hearing from you

Kind regards
Stephan

@auto-differentiation-dev
Collaborator Author

Hi Stephan,

This feature has not yet been implemented in XAD; it is at the planning stage at the moment.

At this point we envisage the following steps, and are interested in feedback and ideas:

  • First, we should implement a mode that does not use expression templates, as this would greatly simplify JIT compilation. Expression templates are not required for this mode of operation: the adjoints will be computed by compiled code, so the performance improvements from expression templates are not needed. This has been captured in issue #34, "Add a mode without expression templates for debugging".
  • Then, every operation should generate code for both the forward operation and the reverse run. We think that the LLVM framework would be a good choice for this, generating LLVM IR instead of C++ source code. The operations would be simple arithmetic, so this should be relatively straightforward. The result of this step would be a complete LLVM IR representation of the code-path taken, including the reverse path.
  • Once recording is done, the LLVM IR should be passed to the LLVM compiler to generate optimised binary code. This binary code should be dynamically loaded into the application at runtime, so that it can be executed as part of the same program execution.
  • This function can then be executed with arbitrary values for the registered input variables, as well as the seed(s) for the output adjoints. It would produce the value and the first order derivatives.
  • As the JIT compilation step adds a time overhead, this mode will almost certainly be slower than direct interpretation of the tape when the function is evaluated only once or a few times. Hence this mode only makes sense for Monte-Carlo type setups, where the same recorded code-path is to be executed many times with different inputs. XAD will therefore need an API for users to capture the function and re-run it for different input values.
  • Depending on the performance observed in the first implementation, it is likely that some optimisations should be applied to the code generation, e.g.:
    • generate vector code, e.g. AVX512, for the native machine the code runs on, so that multiple Monte-Carlo paths can be executed simultaneously.
    • eliminate dead code, i.e. code the outputs don't depend on
    • detect loops and repetitions and emit them as loops in the generated code, to reduce the size of the code to be JIT compiled
    • cache compilation results, so that when the same program is run again with different input, the compilation step can be skipped
    • etc.
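
The code-generation and dead-code-elimination steps above can be sketched as follows. This is a rough illustration under stated assumptions: the tape layout (`Node`, `Op`) and the functions `liveNodes` and `generate` are hypothetical, and the sketch emits C-like text purely for readability, whereas the plan above proposes emitting LLVM IR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tape entry; not XAD's data structure.
enum class Op { Input, Add, Mul };
struct Node { Op op; std::size_t a; std::size_t b; };

// Marks every node the output depends on; all other nodes are dead code.
std::vector<bool> liveNodes(const std::vector<Node>& tape, std::size_t output) {
    assert(output < tape.size());
    std::vector<bool> live(tape.size(), false);
    live[output] = true;
    // Operands always precede their users, so one backward pass suffices.
    for (std::size_t i = tape.size(); i-- > 0;) {
        if (!live[i] || tape[i].op == Op::Input) continue;
        live[tape[i].a] = true;
        live[tape[i].b] = true;
    }
    return live;
}

// Emits straight-line statements for the forward and the reverse run.
// The emitted text assumes the adjoint variables d<i> start at zero.
std::string generate(const std::vector<Node>& tape, std::size_t output) {
    const std::vector<bool> live = liveNodes(tape, output);
    auto v = [](std::size_t i) { return "v" + std::to_string(i); };
    auto d = [](std::size_t i) { return "d" + std::to_string(i); };

    std::string fwd;
    std::size_t nextInput = 0;
    for (std::size_t i = 0; i < tape.size(); ++i) {
        const Node& n = tape[i];
        if (n.op == Op::Input) {
            const std::size_t k = nextInput++;  // inputs keep their position
            if (live[i]) fwd += v(i) + " = x[" + std::to_string(k) + "];\n";
            continue;
        }
        if (!live[i]) continue;  // dead code: emit nothing
        const char* sym = (n.op == Op::Add) ? " + " : " * ";
        fwd += v(i) + " = " + v(n.a) + sym + v(n.b) + ";\n";
    }

    std::string rev = d(output) + " = seed;\n";
    for (std::size_t i = tape.size(); i-- > 0;) {
        const Node& n = tape[i];
        if (!live[i] || n.op == Op::Input) continue;
        if (n.op == Op::Add) {
            rev += d(n.a) + " += " + d(i) + ";\n";
            rev += d(n.b) + " += " + d(i) + ";\n";
        } else {  // Op::Mul
            rev += d(n.a) + " += " + d(i) + " * " + v(n.b) + ";\n";
            rev += d(n.b) + " += " + d(i) + " * " + v(n.a) + ";\n";
        }
    }
    return fwd + rev;
}
```

For a tape holding `v2 = x0 * x1`, an unused `v3 = x0 + x1`, and the output `v4 = v2 + x0`, the generated text contains no statement for `v3`, and the reverse part accumulates into `d0` twice because `x0` is used by two operations.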

NOTE: It is very important that the code-path stays exactly the same for any other inputs, as it is recorded only once. It is essential that no branches, polymorphic function calls, loop iteration counts, etc. depend on the independent input data. Ideas on how to enforce this and how to detect data-dependent branching are welcome.
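
As one possible idea for detection (a sketch only, not something XAD provides): an active type could report every comparison that involves a value derived from a registered input. Any reported comparison is a potential data-dependent branch, and comparing the reported outcomes across a few sample inputs shows whether the path actually diverges. The names `Active`, `BranchLog`, `payoff` and `samePath` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical recording state; not part of XAD.
struct BranchLog {
    std::vector<bool> outcomes;  // result of each comparison on an active value
};

struct Active {
    double value;
    bool dependsOnInput;  // true if derived from a registered input
    BranchLog* log;       // where comparisons are reported (may be null)
};

Active operator*(const Active& a, const Active& b) {
    return {a.value * b.value, a.dependsOnInput || b.dependsOnInput,
            a.log ? a.log : b.log};
}

// Comparisons on input-dependent values are logged, since the recorded
// code-path may depend on their outcome.
bool operator>(const Active& a, double c) {
    const bool result = a.value > c;
    if (a.dependsOnInput && a.log) a.log->outcomes.push_back(result);
    return result;
}

// Example calculation with a branch that depends on the input.
Active payoff(const Active& s) {
    const Active two{2.0, false, nullptr};
    if (s > 100.0) return s * two;
    return s * s;
}

// Runs the calculation for two sample inputs and reports whether both
// runs saw the same sequence of comparison outcomes.
bool samePath(double x1, double x2) {
    BranchLog log1, log2;
    payoff(Active{x1, true, &log1});
    payoff(Active{x2, true, &log2});
    return log1.outcomes == log2.outcomes;
}
```

A check like `samePath` can only show that two particular inputs follow different paths; it cannot prove that all inputs follow the same one. A stricter policy would reject the recording as soon as any comparison on an active value is logged.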
