This is the top-level issue tracking all the work we plan to do to make the glow runtime support concurrent execution, pipelining, batching, and so on.
At a high level, the idea is for the runtime to be able to:
Enqueue inputs: Run input0, then run input1 as soon as the previous run is done, and so on (a rough sketch of this queueing appears after this list).
Slice the inputs into batches and transparently run them: Take N inputs and run them sequentially in batches of M (where M is the batch size the model was compiled for and N the actual number of inputs).
Pipeline work across models: Run input1 on model M1, then run the result of M1 on M2 while running input2 on M1, etc.
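As a rough illustration of the enqueueing behavior in the first point, here is a minimal, hypothetical C++ sketch (none of these names come from Glow's actual runtime API): a single worker thread drains a queue of runs, so each input starts executing as soon as the previous one finishes, while callers immediately get a `std::future` for their result.

```cpp
#include <condition_variable>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Placeholder for one run's input/output tensors.
struct Tensors { std::vector<float> data; };

// A single worker drains the queue, so run k starts as soon as run k-1 is done,
// while enqueue() returns a future the caller can wait on.
class RunQueue {
public:
  RunQueue() : worker([this] { loop(); }) {}
  ~RunQueue() {
    { std::lock_guard<std::mutex> g(m); done = true; }
    cv.notify_all();
    worker.join();
  }

  // Enqueue one input; runModel() stands in for executing the compiled model.
  std::future<Tensors> enqueue(Tensors input) {
    std::packaged_task<Tensors()> task(
        [in = std::move(input)]() mutable { return runModel(in); });
    auto fut = task.get_future();
    { std::lock_guard<std::mutex> g(m); tasks.push(std::move(task)); }
    cv.notify_one();
    return fut;
  }

private:
  static Tensors runModel(const Tensors &in) { return in; } // stub

  void loop() {
    for (;;) {
      std::packaged_task<Tensors()> task;
      {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return done || !tasks.empty(); });
        if (done && tasks.empty())
          return;
        task = std::move(tasks.front());
        tasks.pop();
      }
      task(); // run input k only after input k-1 has completed
    }
  }

  bool done = false;
  std::mutex m;
  std::condition_variable cv;
  std::queue<std::packaged_task<Tensors()>> tasks;
  std::thread worker;
};
```

Batch slicing and cross-model pipelining would layer on top of such a queue: the former by splitting each enqueued input before running it, the latter by feeding one model's outputs into the next model's queue while the first model starts on the next input.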
Among other things, the glow runtime will have to:
Manage input/output queues for each model (and communication with the devices)
Manage incoming models
Keep track of data dependencies and schedule the next tasks to run
Split inputs into chunks of the compiled batch size
Pad partial batches so every chunk matches that size (a sketch of these two steps follows this list)
Dispatch workloads to devices
Keep track of the status of devices
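The "split inputs" and "pad inputs" items could look something like the following self-contained C++ sketch. The names (`Sample`, `BatchedInputs`, `splitAndPad`) are hypothetical, not Glow's API: it slices N inputs into chunks of the compiled batch size M, pads the last partial chunk, and remembers how many of its entries are real so the padded outputs can be discarded after the run.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sample type; in Glow this would be a tensor with a batch dimension.
struct Sample { std::vector<float> features; };

// Result of splitting: chunks of exactly `batchSize` entries, plus how many
// entries of the last chunk are real inputs (the rest is padding to drop later).
struct BatchedInputs {
  std::vector<std::vector<Sample>> chunks;
  std::size_t validInLastChunk = 0;
};

// Split N inputs into chunks of the compiled batch size M, padding the last
// chunk by repeating its final sample so every chunk has exactly M entries.
BatchedInputs splitAndPad(const std::vector<Sample> &inputs, std::size_t batchSize) {
  BatchedInputs out;
  for (std::size_t i = 0; i < inputs.size(); i += batchSize) {
    std::size_t end = std::min(inputs.size(), i + batchSize);
    std::vector<Sample> chunk(inputs.begin() + i, inputs.begin() + end);
    out.validInLastChunk = chunk.size();
    if (chunk.size() < batchSize)
      chunk.resize(batchSize, chunk.back()); // pad with a copy of the last sample
    out.chunks.push_back(std::move(chunk));
  }
  return out;
}
```

Each chunk can then be handed to the dispatch step, and the padded outputs of the last chunk dropped when results are gathered back.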
Also, somewhat orthogonal to the runtime, but related, glow will need to:
Determine what and where to run things (graph partitioning)
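Graph partitioning itself is a larger topic, but just to make the idea concrete, here is a deliberately naive, hypothetical sketch (not Glow's partitioner): walk the nodes in topological order and greedily assign them to a device until its memory budget is used up. A real partitioner would also weigh data dependencies, transfer costs, and device capabilities.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node descriptor: nodes are assumed to already be in topological
// order, with a rough memory cost per node.
struct NodeInfo {
  std::string name;
  std::size_t memBytes;
};

// Naive greedy partitioner: assign nodes to the current device until its
// memory budget is exhausted, then move on. Returns a device index per node.
std::vector<std::size_t> greedyPartition(const std::vector<NodeInfo> &nodes,
                                         std::size_t numDevices,
                                         std::size_t budgetPerDevice) {
  std::vector<std::size_t> assignment(nodes.size(), 0);
  std::size_t device = 0, used = 0;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (used + nodes[i].memBytes > budgetPerDevice && device + 1 < numDevices) {
      ++device;
      used = 0;
    }
    assignment[i] = device;
    used += nodes[i].memBytes;
  }
  return assignment;
}
```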
As a first step, we have started by properly splitting the compilation and runtime stages.
This work is tracked in: #2040, #1967, #1953, #1951