Reuse memory in TENSORSET #540


Merged

4 commits merged into master on Dec 23, 2020
Conversation

@lantiga (Contributor) commented Dec 20, 2020

Addresses #515 by reusing memory allocated in argv in TENSORSET.

@lantiga changed the title from "Tensorset memreuse" to "Reuse memory in TENSORSET" on Dec 20, 2020
@codecov bot commented Dec 20, 2020

Codecov Report

Merging #540 (71e5acb) into master (3f9d4f0) will increase coverage by 0.02%.
The diff coverage is 97.05%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #540      +/-   ##
==========================================
+ Coverage   75.74%   75.76%   +0.02%     
==========================================
  Files          25       25              
  Lines        5384     5401      +17     
==========================================
+ Hits         4078     4092      +14     
- Misses       1306     1309       +3     
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/tensor.c | 82.75% <97.05%> (-0.02%) | ⬇️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3f9d4f0...71e5acb. Read the comment docs.

@DvirDukhan (Collaborator) left a comment


looks good
small comment

```diff
@@ -814,7 +868,14 @@ int RAI_parseTensorSetArgs(RedisModuleCtx *ctx, RedisModuleString **argv, int ar
 size_t datalen;
 const char *data;
 DLDataType datatype = RAI_TensorDataTypeFromString(typestr);
 *t = RAI_TensorCreateWithDLDataType(datatype, dims, ndims, tensorAllocMode);
 if (datafmt == REDISAI_DATA_BLOB) {
```
@DvirDukhan (Collaborator):

Since you are checking it here, the switch at line 889 is redundant, as it checks only a single case. I think you can move its content into the else block at line 875.

@lantiga (Contributor, Author):

Done

@filipecosta90 (Collaborator) commented Dec 23, 2020

I've quickly run the benchmark that produced the best result for the autobatching variation (60 clients, autobatching 30):

```
OUTPUT_NAME_SUFIX=tensorset_PR_ MIN_CLIENTS=60 MAX_CLIENTS=60 MIN_TENSOR_BATCHSIZE=0 MAX_TENSOR_BATCHSIZE=0 MAX_BATCHSIZE=30 MIN_BATCHSIZE=30 SLEEP_BETWEEN_RUNS=0 DATABASE_HOST=10.3.0.207 DATABASE_PORT=6380 NUM_VISION_INFERENCES=25000 ./scripts/run_inference_redisai_vision.sh
```

and we see a reduction in memory BW of around 300 MB/sec.

Notice that at ~400 inferences/sec we have an expected BW of 224 * 224 * 3 * 4 bytes * 400 / (1024^3) ≈ 0.23 GB/sec, as shown in the table below.

| metric | value |
|---|---|
| tensor size (bytes) | 602112 |
| tensorset expected BW (GB/sec) @400 inferences/sec | 0.2335455322 |

Looking at the numbers, we notice that the memory BW reduction was around 0.3 GB/sec (matching the expected value), along with an improvement of roughly 5% in both overall ops/sec and inference latency.

| notes | commit | overall inferences/sec | p50 (ms) | p75 (ms) | p99 (ms) | page faults/sec | memory BW GB/sec |
|---|---|---|---|---|---|---|---|
| master | e06c663 | 397.85 | 151.42 | 154.37 | 162.56 | 1400000 | 5.340576172 |
| PR | 52c4235 | 416.48 | 143.87 | 147.07 | 156.16 | 1313000 | 5.00869751 |
| %improvement | -- | 4.68% | 4.99% | 4.73% | 3.94% | 6.21% | 6.21% |

Detail of memory BW, using the minor page faults/sec counter, for e06c663 (the drops mark a new test iteration; 3 iterations):

[chart: memory BW over time for e06c663]

Detail of memory BW, using the minor page faults/sec counter, for 52c4235 (the drops mark a new test iteration; 3 iterations):

[chart: memory BW over time for 52c4235]


Bottom line: we see that this generally improves inference performance (even when it is partly masked by modelrun+tensorget), so I would merge this ASAP and work on further reducing the overall memory bandwidth, for example by pushing forward the investigation of reusing tensor-allocated memory on the backends (example: TensorFlow with DLPack inputs).

@lantiga lantiga merged commit 65787a1 into master Dec 23, 2020
@lantiga lantiga deleted the tensorset-memreuse branch December 23, 2020 14:20
Development

Successfully merging this pull request may close these issues.

Improve RAI_TensorSet_RedisCommand memory growth/page_fault code path by reusing already allocated redis BLOB
3 participants