Reuse memory in TENSORSET #540
Conversation
Codecov Report
@@ Coverage Diff @@
## master #540 +/- ##
==========================================
+ Coverage 75.74% 75.76% +0.02%
==========================================
Files 25 25
Lines 5384 5401 +17
==========================================
+ Hits 4078 4092 +14
- Misses 1306 1309 +3
Continue to review full report at Codecov.
Looks good; one small comment:
@@ -814,7 +868,14 @@ int RAI_parseTensorSetArgs(RedisModuleCtx *ctx, RedisModuleString **argv, int ar
size_t datalen;
const char *data;
DLDataType datatype = RAI_TensorDataTypeFromString(typestr);
*t = RAI_TensorCreateWithDLDataType(datatype, dims, ndims, tensorAllocMode);
if (datafmt == REDISAI_DATA_BLOB) {
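The change threads an allocation-mode argument into tensor creation so the parsed tensor can borrow the blob bytes already held in `argv` instead of copying them into a fresh allocation. A minimal, self-contained sketch of that borrow-vs-copy idea (the names `alloc_mode_t`, `tensor_t`, and `tensor_set_blob` are illustrative, not the actual RedisAI API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Two ways to attach blob data to a tensor: copy it into a new
 * allocation, or borrow the caller's buffer (e.g. the argv string)
 * so no memcpy and no new page faults are incurred. */
typedef enum { ALLOC_COPY, ALLOC_BORROW } alloc_mode_t;

typedef struct {
    char *data;
    size_t len;
    int owns_data; /* free data on destroy only if we copied it */
} tensor_t;

static void tensor_set_blob(tensor_t *t, char *blob, size_t len, alloc_mode_t mode) {
    t->len = len;
    if (mode == ALLOC_COPY) {
        t->data = malloc(len);
        memcpy(t->data, blob, len);
        t->owns_data = 1;
    } else {
        /* Reuse the memory that already holds the command argument. */
        t->data = blob;
        t->owns_data = 0;
    }
}
```

The borrowed buffer must of course outlive the tensor, which is why this only works when the tensor takes over (or pins) the argv string.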
Since you are already checking it here, the switch in line 889 is redundant: it checks only a single case. I think you can move its content into the else block in line 875.
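The suggested refactor, sketched with illustrative names (not the exact RedisAI code): once `datafmt` has been tested in the `if`, a `switch` in the `else` branch that handles only one case adds nothing and can be inlined away:

```c
#include <string.h>

/* Illustrative enum; the real code tests REDISAI_DATA_BLOB vs values. */
typedef enum { DATA_BLOB, DATA_VALUES } data_fmt_t;

static const char *parse_data(data_fmt_t datafmt) {
    if (datafmt == DATA_BLOB) {
        return "parsed-blob";
    } else {
        /* Previously: switch (datafmt) { case DATA_VALUES: ... } —
         * redundant, since datafmt can only be DATA_VALUES here. */
        return "parsed-values";
    }
}
```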
Done
Force-pushed e37f506 to 52c4235 (compare)
I quickly re-ran the benchmark that produced the best result for the autobatching variation (60 clients, autobatching 30). Notice that at ~400 inferences/sec we expect a bandwidth of 224 * 224 * 3 * 4 bytes * 400 / (1024^3) ≈ 0.22 GB/sec, as shown below by the table.

Looking at the numbers, the memory bandwidth reduction was around 0.3 GB/sec (roughly matching the expected value), with an improvement of about 5% in both overall ops/sec and inference latency.

Detail of memory bandwidth via the minor page faults/sec counter of e06c663 (the drops mark a new test iteration; 3 iterations). Detail of memory bandwidth via the minor page faults/sec counter of 52c4235 (the drops mark a new test iteration; 3 iterations).

Bottom line: this generally improves inference performance (even when it is masked by MODELRUN + TENSORGET), so I would merge this ASAP and then work on further reducing overall memory bandwidth, for example by pushing forward the investigation of reusing tensor-allocated memory on backends (e.g. TensorFlow with DLPack inputs).
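As a sanity check on the arithmetic above, a tiny helper (hypothetical, for illustration only) computing the expected blob bandwidth for 224x224x3 float32 tensors at a given inference rate:

```c
#include <assert.h>

/* Expected memory bandwidth of copying one h*w*c tensor of
 * elem_bytes-byte elements, `rate` times per second, in GiB/s.
 * For 224x224x3 float32 at 400 inferences/sec this lands at ~0.22. */
static double blob_bandwidth_gib_per_sec(int h, int w, int c,
                                         int elem_bytes, double rate) {
    double bytes_per_sec = (double)h * w * c * elem_bytes * rate;
    return bytes_per_sec / (1024.0 * 1024.0 * 1024.0);
}
```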
Addresses #515 by reusing memory allocated in `argv` in `TENSORSET`.