CodeGenComposability_Scipy2013

Abstract

The current state of language implementation and code generation in the Python ecosystem is a rapidly growing untended garden. The vast number of projects at SciPy 2013 that have implemented some form of code generation technology is amazing, but many projects are stepping on each other's toes, and the path for a user of the ecosystem is not clear. At SciPy 2013 many of us got together to discuss this issue and come up with a path forward. While no clear statement emerged, there was consensus that the issues need more discussion and study.

Biased Summary

Andy: I called this meeting at SciPy 2013 because I have been independently contacted by four individuals about the state of affairs of speeding up Python through code generation.

The development activity in the field and the support from the broader community are encouraging. Unfortunately, while there have been many successful projects in this area, development is highly fragmented. This raises two issues among smaller projects:

Uncertainty about the future makes potential developers hesitant
Independent researchers repeatedly create complete compiler infrastructures to test novel components

As a result, a large amount of exciting work is either unimplemented or implemented within siloed, uncomposable projects.

Currently there is very little sharing among projects, which makes collaboration and reuse difficult. This is not a new phenomenon in the compiler community. Lars Bergstrom's master's thesis lists over 95 papers that never pushed their work to a common compiler infrastructure. I often refer to this as the composability problem.

The composability problem is that there is no effective way to compose (very) high-level languages together. One can break execution from one to another, but this incurs a high-cost context switch. It often requires code duplication to pass all the necessary details from one system to the other. It also makes any sort of static analysis quite complex, because the different languages may employ different semantics. As a result it is difficult to link high- and low-level code. The scientific Python community is feeling this problem in spades because it wants both to work together naturally.

Our discussion came to two basic goals:

A shared interface for the representation of high-level algorithmic and array-based computations (how can we share optimizations?)
A shared intermediate-level implementation from abstract (usually numeric) algorithms to better code generators (how can we generate efficient low-level code for common algorithms on accelerators?)

While coming up with two Intermediate Representations (IRs) to meet these needs might be the most effective path for sharing, the vast number of use cases (see below) means that any choice will leave someone unhappy. In addition, it will be difficult to convince development communities to change existing codebases. The community has come together in the past to define a single array protocol in NumPy. Does an analogous representation exist here? How do we work together towards interoperation?

One way forward is to build pairwise translations from one project's graph to another's. This solution seems poorer in quality but has a clear path to completion. Our current plan is to engage the community a bit and then find funding for a small summit where we can discuss the issue face to face.
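To make the pairwise-translation idea concrete, here is a minimal sketch. Both formats are hypothetical and do not correspond to any project named on this page: "format A" is a nested-tuple expression tree, and "format B" is a flat list of (result, op, args) instructions. The function name tree_to_instructions is likewise an assumption made for illustration.

```python
# Hypothetical format A: nested tuples such as ("add", ("mul", "x", "y"), "x").
# Hypothetical format B: a flat list of (result_name, op, args) instructions.

def tree_to_instructions(expr):
    """Translate a format-A tree into a format-B instruction list."""
    instructions = []
    cache = {}  # maps an already-translated subtree to its result name

    def visit(node):
        if isinstance(node, str):   # leaf: the name of an input variable
            return node
        if node in cache:           # repeated subexpression: reuse its result
            return cache[node]
        op, *children = node
        args = tuple(visit(child) for child in children)
        name = f"t{len(instructions)}"
        instructions.append((name, op, args))
        cache[node] = name
        return name

    visit(expr)
    return instructions


program = tree_to_instructions(("add", ("mul", "x", "y"), ("mul", "x", "y")))
for instruction in program:
    print(instruction)
```

Each such translator is small, but N projects need on the order of N*N of them, which is the quality trade-off noted above compared with agreeing on a shared IR.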

Discussion Notes

Below is a characterization of that discussion. The original Google Doc is at https://t.co/WlCUfRzrfX

potential project name: scikit-air (Array Intermediate Representation)

Questions

What do we gain by sharing?
What transformations can we share?
What interface is necessary?

Structures

DAG vs. tree
Type systems: robust (C++?), user-defined, concrete types
language vs. library
data format vs. interface

Use Cases

Share common functionality
speed: fast vectorization of user code
data flow
Ufuncs from many libs
Define Intermediate Representations
Routines to transform between Intermediate Representations
Invariant of the IR
Bindings (connecting low level code to Python)
Optimizations of graphs
Algorithmic Differentiation
Operator fusion
Loop fusion
graph transformations
Mathematical / numerical optimizations
Applications
Bayesian probabilistic programming
Numerical optimization
Write OpenCL kernels and ufuncify them over numpy arrays
treat control flow and data flow together
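As an illustration of the operator fusion and loop fusion items in the list above, the following sketch compares applying a chain of elementwise operations one pass at a time with applying them in a single fused pass. It uses plain Python lists rather than any project's array type, and the names compose, apply_unfused, and apply_fused are hypothetical.

```python
def compose(*funcs):
    """Fuse unary elementwise operations into one function (left to right)."""
    def fused(value):
        for func in funcs:
            value = func(value)
        return value
    return fused


def apply_unfused(data, funcs):
    # One full pass over the data, and one temporary list, per operation.
    for func in funcs:
        data = [func(value) for value in data]
    return data


def apply_fused(data, funcs):
    # A single pass over the data with no intermediate lists.
    kernel = compose(*funcs)
    return [kernel(value) for value in data]


operations = [lambda v: v * 2, lambda v: v + 1]
values = [1, 2, 3]
unfused_result = apply_unfused(values, operations)
fused_result = apply_fused(values, operations)
print(unfused_result, fused_result)
```

Both paths compute the same values; the fused version is the kind of rewrite a shared graph representation would let several projects reuse instead of each implementing it separately.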

Scope

Want to be architecture independent
Want vector instructions
Don’t duplicate what LLVM does well
track data flow
Source and Target
Lower than SymPy
Should Target GPUs
Shouldn’t go lower than LLVM
don’t depend on LLVM
No ISA, concrete vector intrinsics
GPU generation is hard enough so that we only want to do it once
Do we want to include execution model in this representation?
higher level accelerator target: heterogeneous IR == track execution
Missing Features of LLVM
- Too Low Level
- Weird Arrays / Vectors
- Generic symbols

Common Algorithms

Search
Disassemble and Assemble
Visitor pattern (and generalizations)
abstract / dynamic properties
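The visitor pattern listed above can be sketched with Python's standard ast module, which is used here only as a stand-in for a project-specific IR. The OpCounter class is a hypothetical example that walks an expression tree and tallies the binary operators it contains.

```python
import ast
from collections import Counter


class OpCounter(ast.NodeVisitor):
    """Count binary operators in an expression tree, keyed by operator name."""

    def __init__(self):
        self.counts = Counter()

    def visit_BinOp(self, node):
        self.counts[type(node.op).__name__] += 1
        self.generic_visit(node)  # keep walking into the operands


tree = ast.parse("a * b + a * b", mode="eval")
counter = OpCounter()
counter.visit(tree)
op_counts = dict(counter.counts)
print(op_counts)
```

The traversal logic lives in the visitor rather than in the node classes, so the same pass could in principle be reused across any IR that exposes a comparable node interface.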

Concrete Tasks

Document language implementation (Alex)
Higher-level abstraction above CUDA: Load/store vector machine (Alex)
Define data/control-flow concept (Siu)
Thread on NumFOCUS mailing list (Matt)
NumFOCUS SEP (Anthony)
Send details about PyKit (Siu)
Experiment to test IRs (Andy)
Secure funding (Andy)
Turning explicit loops into the polyhedral model (Serge)

Large Conceptual Pieces (this list should not expand beyond 4-5)

Efficient low-level code generation

Problems

not concrete (enough)
problems not isolated -> piecewise unification of projects
backing / blessing / funding?

People

Projects Represented

dynd
falcon
fwrap
Ignition
Lair
Numba Pro
NumPy
parakeet
pydy
PyOP2/FEniCS
PyTrilinos
Seamless / ODIN
sparrow
SymPy
Theano, HyperOpt
xdress

Other Projects generating code in Python

blaze
Copperhead (?)
Cython (?)
FEniCS
Loo.py
PyCUDA / PyOpenCL
PyKit (Mark Florisson)
Pythran
CorePy (defunct?)
