[js/api] introducing IO binding for tensor #16452


Merged: 22 commits merged into main from fs-eire/js-api-tensor-gpu on Aug 29, 2023

Conversation

@fs-eire (Contributor) commented on Jun 22, 2023

Description

This PR adds a few properties, methods, and factories to the `Tensor` type to support the IO-binding feature. This allows users to create tensors from GPU- or CPU-bound data without forcing a data transfer between CPU and GPU.

This change is a way to resolve #15312

Change Summary

  1. Add properties to the `Tensor` type:
    a. `location`: indicates where the data resides. Valid values are `cpu`, `cpu-pinned`, `texture`, and `gpu-buffer`.
    b. `texture`: sits alongside `data`; a readonly property of type `WebGLTexture`, available only when `location === 'texture'`.
    c. `gpuBuffer`: sits alongside `data`; a readonly property of type `GPUBuffer`, available only when `location === 'gpu-buffer'`.

  2. Add methods to the `Tensor` type (usually for dealing with inference outputs):

    • async function `getData()` lets the user download data from GPU to CPU manually.
    • function `dispose()` lets the user release GPU resources manually.
  3. Add factories for creating `Tensor` instances:
    a. `fromTexture()` creates a tensor bound to a WebGL texture.
    b. `fromGpuBuffer()` creates a tensor bound to a WebGPU `GPUBuffer`.
    c. `fromPinnedBuffer()` creates a tensor backed by a CPU-pinned buffer.

Examples:

Create tensors from a texture and pass them to an inference session as inputs:

```js
// when creating the session, specify that we prefer 'image_output:0' to be stored on GPU as a texture
const session = await InferenceSession.create('./my_model.onnx', {
  executionProviders: [ 'webgl' ],
  preferredOutputLocation: { 'image_output:0': 'texture' }
});

// ...

const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
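
A similar flow should work for WebGPU buffers with the `webgpu` execution provider. The sketch below is illustrative only: it combines the `fromGpuBuffer()` factory, the `preferredOutputLocation` option, and the `getData()`/`dispose()` methods from the change summary; the model path, the tensor names, the `getGpuBuffer()` helper, and the `dataType`/`dims` option names are assumptions, not taken from this PR.

```js
// hypothetical model and tensor names; assumes a WebGPU-capable browser
const session = await InferenceSession.create('./my_model.onnx', {
  executionProviders: [ 'webgpu' ],
  // ask the runtime to keep 'output0' on the GPU instead of downloading it to CPU
  preferredOutputLocation: { output0: 'gpu-buffer' }
});

const myGpuBuffer = getGpuBuffer(); // user's (hypothetical) function to get a GPUBuffer
const myFeeds = {
  // 'dataType' and 'dims' are assumed option names describing the buffer's contents
  input0: Tensor.fromGpuBuffer(myGpuBuffer, { dataType: 'float32', dims: [1, 3, 224, 224] })
};
const results = await session.run(myFeeds);

const output = results.output0;
console.log(output.location);           // expected: 'gpu-buffer'
const buffer = output.gpuBuffer;        // use the data directly on the GPU, or ...
const cpuData = await output.getData(); // ... download it to CPU manually
output.dispose();                       // release the GPU resource when done
```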

@fs-eire (Contributor, Author) commented on Jun 22, 2023

A few questions still need to be figured out:

  • What is a reasonable definition for the options (second parameter) of `Tensor.fromTexture()`? Currently the definition only includes `width` and `height`; it may need more (layout/format/...?).
  • ~~Add `{ preserveGpuData: true }` to session options so that it can produce texture-bound tensors as output instead of the previous behavior (always download to CPU).~~ Now using the new `preferredOutputLocation` property in session options.
  • Add functions from wasm to allocate/free memory for CPU-pinned buffers.

@fs-eire requested a review from guschmue on June 23, 2023 at 00:35

guschmue previously approved these changes on Jul 11, 2023

@qjia7 (Contributor) left a comment:


I suppose this PR is only the tensor part. You will still need another PR to make sure the tensorFromTextureXXX's GPU resource is in the same context as the backend, so that the external resource can be recognized by the backend and used for copy/draw/dispatch.

For reference, in tfjs, creating a tensor from cpu/buffer/texture looks like this:

```ts
export function tensor<R extends Rank>(values: TensorLike|WebGLData|WebGPUData, shape?: ShapeMap[R], dtype?: DataType): Tensor<R>
```

And there are three methods for getting data from a tensor:

```js
tensor.data()      // asynchronously downloads the values
tensor.dataSync()  // synchronously downloads the values
tensor.dataToGPU() // copies the tensor's data to a new GPU resource; unlike
                   // data() and dataSync(), this prevents the data from being downloaded to CPU
```
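
For comparison, here is a minimal tfjs sketch of those three methods (the values and shape are made up, and the fields of the object returned by `dataToGPU()` depend on the active backend):

```js
import * as tf from '@tensorflow/tfjs';

// create a tensor from CPU values
const t = tf.tensor([1, 2, 3, 4], [2, 2], 'float32');

const values = await t.data();   // asynchronous download to CPU
const valuesSync = t.dataSync(); // synchronous download to CPU

// copy to a new GPU resource without downloading to CPU; on the webgl
// backend the result holds a texture, on webgpu a buffer
const gpuData = t.dataToGPU();
// ... use gpuData.texture (webgl) or gpuData.buffer (webgpu) ...
gpuData.tensorRef.dispose(); // release the GPU-resident copy when done
t.dispose();                 // release the original tensor
```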

@fs-eire (Contributor, Author) commented on Jul 24, 2023

> I suppose this PR is only the tensor part. You still need another PR to make sure the tensorFromTextureXXX's GPU resource is in the same context as the backend, so that the external resource can be recognized by the backend and used for copy/draw/dispatch.
>
> For reference, in tfjs, creating a tensor from cpu/buffer/texture looks like this: `export function tensor<R extends Rank>(values: TensorLike|WebGLData|WebGPUData, shape?: ShapeMap[R], dtype?: DataType): Tensor<R>`
>
> And there are three methods for getting data from a tensor: `tensor.data()` (asynchronously downloads the values), `tensor.dataSync()` (synchronously downloads the values), and `tensor.dataToGPU()` (copies the tensor's data to a new GPU resource without downloading it to CPU).

Unlike tfjs, ort-web always runs a whole model. ort-web users cannot run a single kernel, pause at a point in the middle of a graph, or use any graph API to construct a model graph. This offers less flexibility in return for a much simpler usage scenario. This is almost all that users do:

  • create input tensor(s)
  • call session.run()
  • get output tensor(s)

So in ort-web (and all the other ORT JavaScript libraries) there are two kinds of tensors: those created by users and those created by the runtime. Tensors created by users are used as input tensors or as pre-allocated bound outputs; tensors created by the runtime start out as a model's output tensors, although they can also be used as inputs to another model.

Tensors created by users do not "own" the underlying resources. Users are expected to use the non-internal APIs to create CPU tensors via the following constructors:

```js
new Tensor(type, data, dims?);
new Tensor(typedArrayData, dims?);
```

or to create location-specific tensors using:

```js
Tensor.fromTexture(texture, options);     // with no 'download' and 'dispose' in 'options'
Tensor.fromGpuBuffer(gpuBuffer, options); // with no 'download' and 'dispose' in 'options'
Tensor.fromPinnedBuffer(type, buffer, dims?);
```

On the other hand, tensors created by ORT as outputs should be created with 'download' and 'dispose' so that users can manually release the data.

Given the explanation above, we can identify the following scenarios:

  • Downloading GPU data to CPU for user-created tensors: NOT ALLOWED. We don't expect users to use the Tensor class to download GPU data from their own resources. If such a tensor is a model input, ORT handles the data transfer internally when a copy is required.
  • Uploading CPU data to GPU for user-created tensors: NOT ALLOWED. If such a tensor is a model input, ORT handles the data transfer internally.
  • Downloading GPU data to CPU for ORT-created tensors: via tensor.getData().
  • Uploading CPU data to GPU for ORT-created tensors: NOT ALLOWED. I assume this scenario is out of scope, as users can use the raw data to work with their canvas/image elements.

Overall, for several reasons, onnxruntime-web chose to design its Tensor class differently from the way tfjs does. This brings both pros and cons.
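
To make the two kinds of tensor concrete, here is a minimal sketch of the rules above (the session, the texture, and the output name are hypothetical):

```js
// user-created tensor: bound to the user's own texture; per the rules above,
// it cannot be downloaded or disposed through the Tensor API
const input = Tensor.fromTexture(myTexture, { width: 224, height: 224 });

// runtime-created tensor: produced by session.run() with a GPU-preferred output,
// so it carries the download/dispose capabilities
const results = await session.run({ input0: input });
const output = results.output0;      // hypothetical output name
const data = await output.getData(); // allowed: download ORT-created GPU data to CPU
output.dispose();                    // allowed: release the ORT-owned GPU resource

// NOT ALLOWED (per the scenarios above): input.getData() -- ORT handles any
// required transfer internally when the tensor is fed to session.run()
```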

guschmue previously approved these changes on Jul 25, 2023

@qjia7 (Contributor) left a comment:

LGTM with one nit. Thanks.

@fs-eire merged commit e5ca3f3 into main on Aug 29, 2023
@fs-eire deleted the fs-eire/js-api-tensor-gpu branch on August 29, 2023 at 19:58
@langhuihui commented:

Is there an ONNX example?

@langhuihui commented:

The results still return CPU data.

@langhuihui commented:

I found that preferredOutputLocation is only used in wasm/wasm-core-impl.ts, but the webgl backend only uses backend-onnxjs.ts:

```js
export * from 'onnxruntime-common';
import * as ort from 'onnxruntime-common';
export default ort;

import {registerBackend, env} from 'onnxruntime-common';
import {version} from './version';

if (!BUILD_DEFS.DISABLE_WEBGL) {
  const onnxjsBackend = require('./backend-onnxjs').onnxjsBackend;
  registerBackend('webgl', onnxjsBackend, -10);
}

if (!BUILD_DEFS.DISABLE_WASM) {
  const wasmBackend = BUILD_DEFS.DISABLE_TRAINING ? require('./backend-wasm-inference').wasmBackend :
                                                    require('./backend-wasm-training').wasmBackend;
  if (!BUILD_DEFS.DISABLE_WEBGPU) {
    registerBackend('webgpu', wasmBackend, 5);
    registerBackend('webnn', wasmBackend, 5);
  }
  registerBackend('cpu', wasmBackend, 10);
  registerBackend('wasm', wasmBackend, 10);
}

Object.defineProperty(env.versions, 'web', {value: version, enumerable: true});
```
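
Given the registration above, one quick way to see which path a session actually takes is to inspect the location of its outputs at run time. This is a sketch only; the model path, the feeds, and the output name are hypothetical:

```js
const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: [ 'webgpu' ], // handled by the wasm backend registered above
  preferredOutputLocation: { output0: 'gpu-buffer' }
});
const results = await session.run(feeds);
// on the wasm/webgpu path (wasm-core-impl.ts) this should report 'gpu-buffer';
// a 'cpu' result would indicate the preference was not honored
console.log(results.output0.location);
```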

kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request on Mar 22, 2024
siweic0 pushed a commit to siweic0/onnxruntime-web that referenced this pull request on May 9, 2024

Successfully merging this pull request may close the following issue:

  • [Feature Request] ORT web API to use WebGL texture as model input
5 participants