[js/api] introducing IO binding for tensor #16452
Conversation
A few questions need to be figured out:
I suppose this PR covers only the tensor part. You will still need another PR to make sure the GPU resource passed to tensorFromTextureXXX lives in the same context as the backend, so that the backend can recognize the external resource and use it for copy/draw/dispatch.
For reference, in tfjs, creating a tensor from CPU data, a buffer, or a texture looks like this:
export function tensor<R extends Rank>(values: TensorLike|WebGLData|WebGPUData, shape?: ShapeMap[R], dtype?: DataType): Tensor<R>
And for getting data back from a tensor, there are three methods:

    tensor.data()      // Asynchronously downloads the values.
    tensor.dataSync()  // Synchronously downloads the values.
    tensor.dataToGPU() // Copies the tensor's data to a new GPU resource.
                       // Compared to data() and dataSync(), this method avoids
                       // downloading the data to the CPU.
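For concreteness, a minimal sketch of that tfjs flow (the texture and its WebGL context are assumed to be set up elsewhere and shared with the tfjs backend; exact option shapes may differ across tfjs versions):

```js
import * as tf from '@tensorflow/tfjs';

// Create a tensor from an existing WebGL texture (the WebGLData path).
// 'myTexture' must live in the same WebGL context the tfjs backend uses.
const t = tf.tensor(
  { texture: myTexture, height: 224, width: 224, channels: 'RGBA' },
  [1, 224, 224, 4]);

// Three ways to read values back:
const a = await t.data(); // asynchronous download to CPU
const b = t.dataSync();   // synchronous download to CPU (blocks)
const g = t.dataToGPU();  // copies to a new GPU resource, no CPU download
g.tensorRef.dispose();    // the caller releases the GPU copy when done
```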
Unlike tfjs, ort-web always runs a whole model. ort-web users cannot run a single kernel, pause at a middle point of a graph, or use any graph API to construct a model graph. This offers less flexibility in return for a much simpler usage scenario: almost all users just create a session, feed input tensors, and call run().

So this means that in ort-web (and all the other ORT JavaScript libraries) tensors come in two kinds: created by users, and created by the runtime. Tensors created by users are used as input tensors or as pre-allocated bound outputs; tensors created by the runtime are originally a model's output tensors, but they can be used as inputs to another model.

Tensors created by users do not "own" the underlying resources. Users are expected to use the non-internal APIs to create CPU tensors via the following constructors:

    new Tensor(type, data, dims?);
    new Tensor(typedArrayData, dims?);

or to create location-specific tensors using:

    Tensor.fromTexture(texture, options);         // with no 'download' and 'dispose' in 'options'
    Tensor.fromGpuBuffer(gpuBuffer, options);     // with no 'download' and 'dispose' in 'options'
    Tensor.fromPinnedBuffer(type, buffer, dims?);

On the other hand, tensors created by ORT as outputs should be created with 'download' and 'dispose' so that users can manually release the data. Considering the explanation above, we can distinguish the scenarios sketched below.
Overall, for multiple reasons, onnxruntime-web chose to design its Tensor class differently from what tfjs does. This brings both pros and cons.
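To make the two kinds of tensors concrete, here is a minimal sketch using the API proposed in this PR (the model path, input/output names, and the texture source are placeholders):

```js
import { InferenceSession, Tensor } from 'onnxruntime-web';

// Ask the runtime to keep 'output0' on the GPU as a texture.
const session = await InferenceSession.create('./model.onnx', {
  executionProviders: ['webgl'],
  preferredOutputLocation: { output0: 'texture' },
});

// User-created tensor: it does not own the texture, so no 'download' or
// 'dispose' behavior is attached; the caller keeps managing the resource.
const input = Tensor.fromTexture(myTexture, { width: 224, height: 224 });

// Runtime-created tensor: ORT attaches 'download' and 'dispose' internally.
const results = await session.run({ input0: input });
const output = results.output0;

const cpuData = await output.getData(); // manual GPU -> CPU download
output.dispose();                       // manual release of the GPU resource
```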
LGTM with one nit. Thanks.
Is there an ONNX model for this example?
results still return CPU data
I found that
### Description

This PR adds a few properties, methods, and factories to the `Tensor` type to support the IO binding feature. It allows users to create tensors from GPU- or CPU-bound data without forcing a data transfer between CPU and GPU. This change is one way to resolve microsoft#15312.

### Change Summary

1. Add properties to the `Tensor` type:
   a. `location`: indicates where the data resides. Valid values are `cpu`, `cpu-pinned`, `texture`, and `gpu-buffer`.
   b. `texture`: sits alongside `data`; a readonly property of type `WebGLTexture`, available only when `location === 'texture'`.
   c. `gpuBuffer`: sits alongside `data`; a readonly property of type `GPUBuffer`, available only when `location === 'gpu-buffer'`.
2. Add methods to the `Tensor` type (usually for dealing with inference outputs):
   - async function `getData()`: allows users to download data from GPU to CPU manually.
   - function `dispose()`: allows users to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
   a. `fromTexture()`: creates a tensor bound to a WebGL texture.
   b. `fromGpuBuffer()`: creates a tensor bound to a WebGPU `GPUBuffer`.
   c. `fromPinnedBuffer()`: creates a tensor backed by a CPU pinned buffer.

### Examples

Create tensors from a texture and pass them to an inference session as inputs:

```js
// When creating the session, specify that we prefer 'image_output:0' to be stored on GPU as a texture.
const session = await InferenceSession.create('./my_model.onnx', {
  executionProviders: ['webgl'],
  preferredOutputLocation: { 'image_output:0': 'texture' }
});

// ...

const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
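For the WebGPU path, a parallel sketch under the same API (`myGpuBuffer` is assumed to be a `GPUBuffer` created on the device ort-web uses; the exact options accepted by `fromGpuBuffer()` are an assumption based on the summary above):

```js
// When creating the session, prefer 'image_output:0' to stay on GPU as a buffer.
const session = await InferenceSession.create('./my_model.onnx', {
  executionProviders: ['webgpu'],
  preferredOutputLocation: { 'image_output:0': 'gpu-buffer' }
});

const myFeeds = {
  input0: Tensor.fromGpuBuffer(myGpuBuffer, { dataType: 'float32', dims: [1, 3, 224, 224] })
};
const results = await session.run(myFeeds);

const outTensor = results['image_output:0'];
console.log(outTensor.location);       // 'gpu-buffer'
const buf = outTensor.gpuBuffer;       // use the GPUBuffer directly, or...
const cpu = await outTensor.getData(); // ...download to CPU when needed
outTensor.dispose();                   // release the GPU buffer explicitly
```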