Skip to content

String tensor support #832

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 43 commits into from
Sep 10, 2021
Merged

String tensor support #832

merged 43 commits into from
Sep 10, 2021

Conversation

alonre24
Copy link
Collaborator

@alonre24 alonre24 commented Aug 22, 2021

We introduce string tensor for RedisAI, which are tensors in which every element is a null-terminated string (we support utf-8 format).
This PR adds the following capabilities:

  • store and retrive string tensors in RedisAI by using the known API: AI_TENSORSET <KEY> string <shape_1> [<shape_2> ...] [BLOB <blob> | VALUES <str_1> ...]
  • Run ONNX models whose inputs/outputs are string tensors

todo:

  • Enable running TF models whose inputs/outputs are string tensors
  • Add string tensor support to LLAPI (?)

@alonre24 alonre24 requested a review from DvirDukhan August 23, 2021 06:12
@codecov
Copy link

codecov bot commented Aug 23, 2021

Codecov Report

Merging #832 (2b3f391) into master (de0f302) will increase coverage by 1.24%.
The diff coverage is 91.37%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #832      +/-   ##
==========================================
+ Coverage   79.97%   81.22%   +1.24%     
==========================================
  Files          53       55       +2     
  Lines        8009     8126     +117     
==========================================
+ Hits         6405     6600     +195     
+ Misses       1604     1526      -78     
Impacted Files Coverage Δ
src/serialization/AOF/rai_aof_rewrite.c 0.00% <0.00%> (ø)
.../serialization/RDB/decoder/previous/v2/decode_v2.c 46.56% <0.00%> (+9.81%) ⬆️
src/execution/parsing/deprecated.c 81.18% <66.66%> (+1.48%) ⬆️
...c/serialization/RDB/decoder/current/v4/decode_v4.c 71.42% <71.42%> (ø)
.../serialization/RDB/decoder/previous/v0/decode_v0.c 63.33% <72.72%> (+1.69%) ⬆️
tests/module/LLAPI.c 75.83% <79.01%> (+1.36%) ⬆️
src/serialization/RDB/encoder/v4/encode_v4.c 85.96% <81.81%> (ø)
src/backends/tensorflow.c 72.16% <91.00%> (+3.12%) ⬆️
src/backends/onnxruntime.c 78.80% <91.86%> (+1.42%) ⬆️
src/redis_ai_objects/tensor.c 91.72% <94.52%> (+5.03%) ⬆️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 53f16b6...2b3f391. Read the comment docs.

Copy link
Collaborator

@DvirDukhan DvirDukhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see tensor.h/tensor.c comments

please add:

  1. script to create onnx model
  2. DAG tests
  3. v4 encoding

size_t RAI_TensorByteSize(RAI_Tensor *t) {
// TODO: as per dlpack it should be
// size *= (t->dtype.bits * t->dtype.lanes + 7) / 8;
// todo: change after verify that blob size is initialized properly
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This upper TODO is old (Luca?), and I don't really understand what it means. Should I remove it?
About the other todo - will do.

@chayim chayim added ci-test and removed ci-test labels Aug 24, 2021
@@ -1,29 +1,6 @@
import onnx
from onnx import helper, TensorProto, ModelProto, defs


def make_model(graph, **kwargs): # type: (GraphProto, **Any) -> ModelProto
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's validate ONNX versions vs ONNXRuntime versions here please

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validated - using ONNX 1.8.0 when running this script will create a model that runs properly on ONNXRuntime 1.8.1.
Note that if we use ONNX latest (1.10.1), Opset version is 15, while ONNXRuntime 1.8.1 supports up to 14. I assume that ONNXRumtime will support Opset version 15...

@alonre24 alonre24 added ci-test and removed ci-test labels Sep 5, 2021
@alonre24 alonre24 added ci-test and removed ci-test labels Sep 5, 2021
@alonre24 alonre24 removed the ci-test label Sep 6, 2021
Comment on lines +48 to +49
redis> AI.TENSORSET mytensor STRING 1 2 VALUES first second
OK
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please have a dedicated section regarding strings
how they should be provided by blob and by values.
consider doing so in a different PR involving documentation for BOOL

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will do in a new PR

@alonre24 alonre24 merged commit d2565ac into master Sep 10, 2021
@alonre24 alonre24 deleted the string_tensor_support branch September 10, 2021 11:40
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants