wasi-parallel

A proposed WebAssembly System Interface API for parallel computation.

Current Phase

wasi-parallel is currently in Phase 1.

Champions

Phase 4 Advancement Criteria

wasi-parallel must have at least two complete independent implementations.

Table of Contents

Introduction

wasi-parallel addresses the need for parallel execution. By treating parallelism as a system capability, this API allows parallel workloads to be offloaded to a variety of devices, from CPUs to GPUs to FPGAs. The current specification is a subset of the features provided by other parallel programming frameworks (e.g., OpenMP, OpenCL).

WebAssembly generally lacks support for parallel execution, and this can be a significant performance limitation in several domains (e.g., ML, HPC). SIMD (128-bit or larger) does not fully address the issue: many programs benefit from parallel execution, yet standalone WebAssembly engines have no standard way to access this system capability (unlike browsers, which have Web Workers).

wasi-parallel was introduced in 2021 (see the slides and meeting notes), prior to the wasi-threads proposal. If you are solely interested in spawning CPU threads, wasi-threads is the right API (see the considered alternatives section below for more details).

Goals

  • improve performance: this API should make it possible to improve the performance of certain parallel applications, especially those designed around a "parallel for" construct.
  • any kind of parallel device: this proposal aims for parallel execution on heterogeneous devices (e.g., CPU, GPU, FPGA?).
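To make the "parallel for" construct mentioned above concrete, here is a minimal sketch (in host-side Python, not part of the proposal) of the kind of data-parallel loop this API targets; the names `kernel` and `parallel_for` are illustrative, not part of wasi-parallel:

```python
def kernel(i, a, b, out):
    """One iteration of the 'parallel for': out[i] = a[i] + b[i]."""
    out[i] = a[i] + b[i]

def parallel_for(num_iterations, kernel, *buffers):
    # Sequential stand-in for the parallel construct: because iterations
    # are independent, a host could run them concurrently on a CPU, GPU,
    # or other parallel device, in any order.
    for i in range(num_iterations):
        kernel(i, *buffers)

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
out = [0] * 4
parallel_for(4, kernel, a, b, out)
# out is now [11, 22, 33, 44]
```

The key property is that no iteration depends on another, which is what lets the host choose the device and the execution order.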

Non-goals

  • modify core WebAssembly: the current proposal does not propose changes to the WebAssembly instruction set.
  • replicate a parallel programming model: many parallel programming frameworks already exist (e.g., OpenMP, OpenCL, pthreads); this API does not intend to match all features of any existing framework. Here, we explore the possibility of compiling programs written under those frameworks but do not guarantee compilation of existing parallel programs (i.e., due to "missing" wasi-parallel functionality).

API walk-through

TODO

Detailed design discussion

The design of wasi-parallel is still in an experimental phase. Suggestions are welcome as an issue!

Device selection

First, the user must be able to pick a parallel device to execute on. Early feedback on the design (from the browser ecosystem) indicated that, since not all hosts would support all kinds of parallel devices, the command to retrieve a device should always succeed. This means that the kind of device the user selects is only a hint and can be overridden by the host.

get-device: func(hint: device-kind) -> expected<device, error>
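The hint semantics can be modeled with a small host-side sketch (Python used for illustration; `DeviceKind`, `AVAILABLE`, and the fallback policy are assumptions, not part of the specification):

```python
from enum import Enum

class DeviceKind(Enum):
    CPU = "cpu"
    GPU = "gpu"

# Assumed host configuration for this sketch: no GPU is present.
AVAILABLE = {DeviceKind.CPU}

def get_device(hint: DeviceKind) -> DeviceKind:
    """Model of get-device: the hint is honored when possible, but the
    call always succeeds -- the host overrides an unsupported hint
    rather than returning an error."""
    if hint in AVAILABLE:
        return hint
    return DeviceKind.CPU  # host-chosen fallback

device = get_device(DeviceKind.GPU)
# device == DeviceKind.CPU: the GPU hint was overridden by the host.
```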

Buffer management

Since the parallel device could be something other than the CPU, there must be some way to indicate what regions of WebAssembly memory will be used by the device. The host is then responsible for transferring the memory to and from the device. The host, however, is not required to copy the memory; implementations of a parallel CPU device could simply pass around pointers to shared memory.

create-buffer: func(device: device, size: u32, kind: buffer-access-kind) -> expected<buffer, error>
write-buffer: func(data: list<u8>, buffer: buffer) -> expected<unit, error>
read-buffer: func(buffer: buffer) -> expected<list<u8>, error>
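The buffer lifecycle implied by these three functions can be sketched as follows (a host-side Python model; the `Buffer` class and the copy-based implementation are assumptions — as noted above, a CPU device could avoid the copies entirely):

```python
class Buffer:
    """Host-owned device buffer of a fixed size and access kind."""
    def __init__(self, size: int, kind: str):
        self.kind = kind            # e.g., "read", "write", "read-write"
        self.data = bytearray(size) # device-side storage in this model

def create_buffer(device, size: int, kind: str) -> Buffer:
    # Allocate storage the device can access; 'device' selects where.
    return Buffer(size, kind)

def write_buffer(data: bytes, buffer: Buffer) -> None:
    # Host copies guest memory into the device buffer.
    buffer.data[:len(data)] = data

def read_buffer(buffer: Buffer) -> bytes:
    # Host copies device memory back to the guest.
    return bytes(buffer.data)

buf = create_buffer("cpu", 4, "read-write")
write_buffer(b"\x01\x02\x03\x04", buf)
result = read_buffer(buf)
# result == b"\x01\x02\x03\x04"
```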

Kernel definition

There must be a way to indicate what code should be run in parallel. Several other designs were considered and discarded before arriving at the current mechanism: the kernel is a binary-encoded WebAssembly module that exports a kernel function and imports a shared memory. When invoked by parallel-exec, this kernel is instantiated by the host and scheduled on the parallel device; the call returns once the parallel execution is complete.

parallel-exec: func(device: device, kernel: list<u8>, num-iterations: u32, block-size: u32, buffers: list<buffer>) -> expected<unit, error>
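The scheduling behavior of parallel-exec can be modeled with a host-side sketch (Python threads stand in for whatever concurrency the device provides; the block-dispatch strategy shown here is an assumption about how a host might use block-size, not a normative description):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_exec(kernel, num_iterations, block_size, buffers):
    """Model of parallel-exec: run the kernel num_iterations times in
    blocks of block_size; iterations within a block may run concurrently,
    so the kernel must not assume any ordering. Returns when all
    iterations are complete, as the real call does."""
    with ThreadPoolExecutor() as pool:
        for start in range(0, num_iterations, block_size):
            block = range(start, min(start + block_size, num_iterations))
            # Dispatch one block and wait for it to finish.
            list(pool.map(lambda i: kernel(i, buffers), block))

# Example kernel: double each element of the single buffer in place.
def double(i, buffers):
    buffers[0][i] *= 2

data = bytearray([1, 2, 3, 4, 5])
parallel_exec(double, num_iterations=5, block_size=2, buffers=[data])
# data is now bytearray([2, 4, 6, 8, 10])
```

Because each iteration writes a distinct index, the result is deterministic even though iterations within a block may interleave.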

Considered alternatives

Other approaches are possible:

wasi-threads

The wasi-threads proposal aims to expose host thread creation to a WebAssembly program. With some caveats, wasi-threads can be used to implement pthreads. The primary use case is exposing CPU-executed, OS-managed threads; users who simply need threads should look there first.

Because wasi-parallel aims to allow parallel execution on more than just CPUs, the API is quite different. Memory may need to be synchronized between devices, so the API includes buffer-management functions, and the number of iterations to execute must be known up front so that the device driver (e.g., a GPU driver) can schedule the iterations optimally.

Host APIs

This API standardizes a subset of the functionality available for parallel execution on a host system. Users of WebAssembly programs could instead use this parallelism natively on the host side and expose custom APIs for their workload — in other words, skip standardization. This is a legitimate approach that users should consider, especially if the host hardware is known and fixed. wasi-parallel is targeted at situations in which parallelism is needed but the exact host environment is unknown.

Stakeholder Interest & Feedback

TODO before entering Phase 3.

References & acknowledgements

Many thanks for valuable feedback and advice from:
