Onnx kill switch #779


Merged: 35 commits, Jun 16, 2021
Commits
040043d
Introduce kill switch mechanism for onnxruntime sessions (not ready yet)
alonre24 May 26, 2021
0961e66
Putting login in onnx backend (not ready yet)
alonre24 May 27, 2021
9d5fbf8
WIP
alonre24 May 27, 2021
22662bd
Refactor background workers + add support to kill switch in onnx (for…
alonre24 May 29, 2021
c3a45e9
Refactor - do not use rax, extend onnxRunSessions array whenever a ne…
alonre24 May 30, 2021
2684852
Refactor backends loading
alonre24 May 30, 2021
cd9baa1
Start testing - not finished
alonre24 May 31, 2021
5c09106
Support bool type tensor
alonre24 May 31, 2021
5d3dd2c
Support tensors of type bool. Add validation that a input value doesn…
alonre24 May 31, 2021
fa14217
Merge branch 'Support_BOOL_type_for_tensors' into ONNX_kill_switch
alonre24 May 31, 2021
04dac08
Support tensor of type bool in ONNX, Add tests for kill switch
alonre24 Jun 1, 2021
1d6b3ed
Add load time config for ONNX_TIMEOUT. Parallel tests seems not to work.
alonre24 Jun 1, 2021
ea3c174
Some fixes
alonre24 Jun 2, 2021
05c2a39
Merge master (resolve conflicts in backends.c)
alonre24 Jun 6, 2021
4bbfbcd
Remove debug print
alonre24 Jun 6, 2021
cd2936c
Merge master with updated changes of supporting tensor of type bool
alonre24 Jun 6, 2021
4aed8ca
Some fixes and documentation complement.
alonre24 Jun 6, 2021
6cd9652
Refactor load time config
alonre24 Jun 6, 2021
42059b8
Remove redundant include
alonre24 Jun 6, 2021
6c906aa
Merge branch 'master' into ONNX_kill_switch
alonre24 Jun 7, 2021
23749c4
PR fixes part 1: refactor config and run queue info files (and all pl…
alonre24 Jun 10, 2021
342afbb
Merge branch 'ONNX_kill_switch' of https://github.com/RedisAI/RedisAI…
alonre24 Jun 10, 2021
697faf9
linter...
alonre24 Jun 10, 2021
ee02cc0
Merge branch 'master' into ONNX_kill_switch
alonre24 Jun 10, 2021
1201cb2
linter...
alonre24 Jun 10, 2021
4360679
Merge branch 'master' into ONNX_kill_switch
alonre24 Jun 10, 2021
21737e6
More PR fixes, add the option to get the global run sessions array fr…
alonre24 Jun 10, 2021
73f2a91
Minor fixes
alonre24 Jun 10, 2021
3942d23
More PR fixes, among that:
alonre24 Jun 13, 2021
e9fed4f
Merge branch 'master' into ONNX_kill_switch
alonre24 Jun 13, 2021
349653c
Fix tests for the case that we run on GPU - since CPU queue always cr…
alonre24 Jun 13, 2021
af423a7
Update readies
alonre24 Jun 13, 2021
fd8c672
PR fixes
alonre24 Jun 14, 2021
78da23e
Return error if onnx is executed in a non async manner (via gears for…
alonre24 Jun 14, 2021
285b5be
Small refactor in get_thread_id function.
alonre24 Jun 15, 2021
16 changes: 4 additions & 12 deletions docs/commands.md
@@ -934,14 +934,14 @@ Because `AI.DAGRUN` provides the `PERSIST` option it is flagged as a 'write' comm
Refer to the Redis [`READONLY` command](https://redis.io/commands/readonly) for further information about read-only cluster replicas.

## AI.INFO
- The **`AI.INFO`** command returns general module information or information about the execution a model or a script.
+ The **`AI.INFO`** command returns information about the execution of a model or a script.

- Runtime information is collected each time that [`AI.MODELRUN`](#aimodelrun) or [`AI.SCRIPTRUN`](#aiscriptrun) is called. The information is stored locally by the executing RedisAI engine, so when deployed in a cluster each shard stores its own runtime information.
+ Runtime information is collected each time that [`AI.MODELEXECUTE`](#aimodelrun) or [`AI.SCRIPTEXECUTE`](#aiscriptrun) is called. The information is stored locally by the executing RedisAI engine, so when deployed in a cluster each shard stores its own runtime information.

**Redis API**

```
- AI.INFO [<key>] [RESETSTAT]
+ AI.INFO <key> [RESETSTAT]
```

_Arguments_
@@ -951,15 +951,7 @@ _Arguments_

_Return_

- For a module genernal information: An array with alternating entries that represent the following key-value pairs:
-
- * **Version**: a string showing the current module version.
- * **Low level API Version**: a string showing the current module's low level api version.
- * **RDB Encoding version**: a string showing the current module's RDB encoding version.
- * **TensorFlow version**: a string showing the current loaded TesnorFlow backend version.
- * **ONNX version**: a string showing the current loaded ONNX Runtime backend version.
-
- For model or script runtime information: An array with alternating entries that represent the following key-value pairs:
+ An array with alternating entries that represent the following key-value pairs:

* **KEY**: a String of the name of the key storing the model or script value
* **TYPE**: a String of the type of value (i.e. 'MODEL' or 'SCRIPT')
2 changes: 2 additions & 0 deletions src/CMakeLists.txt
@@ -39,6 +39,7 @@ ADD_LIBRARY(redisai_obj OBJECT
execution/parsing/parse_utils.c
execution/run_info.c
execution/background_workers.c
+ execution/run_queue_info.c
execution/utils.c
config/config.c
execution/DAG/dag.c
@@ -88,6 +89,7 @@ ENDIF()
IF(BUILD_ORT)
ADD_LIBRARY(redisai_onnxruntime_obj OBJECT
backends/onnxruntime.c
+ backends/onnx_timeout.c
${BACKEND_COMMON_SRC}
)
ENDIF()
30 changes: 30 additions & 0 deletions src/backends/backedns_api.h
@@ -0,0 +1,30 @@
#pragma once

#include <stdint.h>

/**
* @return The internal id of the current RedisAI working thread.
* The id range is {0, ..., <threads_count>-1}. If this is called from a
* non-RedisAI BG thread, return -1.
*/
long (*RedisAI_GetThreadId)(void);

/**
* @return The number of working threads in RedisAI. This number should be
* equal to the number of threads per queue (load time config) * the number
* of devices registered in RedisAI (a new device is registered if a model
* is set to run on this device in the AI.MODELSTORE command).
*/
uintptr_t (*RedisAI_GetThreadsCount)(void);

/**
* @return The number of working threads per device queue (load time config).
*/
long long (*RedisAI_GetNumThreadsPerQueue)(void);

/**
* @return The maximal number of milliseconds that a model run session should run
* before it is terminated forcefully (load time config).
* Currently supported only for the onnxruntime backend.
*/
long long (*RedisAI_GetModelExecutionTimeout)(void);