Crash on TF.GRAPH foo graph.pb #9

Closed
gkorland opened this issue Sep 7, 2018 · 7 comments
@gkorland
Contributor

gkorland commented Sep 7, 2018

Using graph sample - https://github.com/tensorflow/models/blob/master/samples/languages/java/training/model/graph.pb

redis-cli -x TF.GRAPH foo < graph.pb

causes

22990:M 07 Sep 2018 03:36:17.123 # Redis 999.999.999 crashed by signal: 11
22990:M 07 Sep 2018 03:36:17.123 # Crashed running the instruction at: 0x7f9b2ab313a5
22990:M 07 Sep 2018 03:36:17.123 # Accessing address: 0x68
22990:M 07 Sep 2018 03:36:17.123 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5Graph12AllocateNodeESt10shared_ptrINS_14NodePropertiesEEPKNS_4NodeE+0x55)[0x7f9b2ab313a5]

Backtrace:
/home/guy/redislabsmodules/redis/src/redis-server *:6379(logStackTrace+0x5a)[0x5591347721da]
/home/guy/redislabsmodules/redis/src/redis-server *:6379(sigsegvHandler+0xb1)[0x559134772991]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f9b3153a890]
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5Graph12AllocateNodeESt10shared_ptrINS_14NodePropertiesEEPKNS_4NodeE+0x55)[0x7f9b2ab313a5]
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5Graph7AddNodeERKNS_7NodeDefEPNS_6StatusE+0x365)[0x7f9b2ab318d5]
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5GraphC1EPKNS_19OpRegistryInterfaceE+0x323)[0x7f9b2ab33a93]
/usr/local/lib/libtensorflow.so(_ZN8TF_GraphC2Ev+0x23)[0x7f9b2b8914c3]
/usr/local/lib/libtensorflow.so(TF_NewGraph+0x1e)[0x7f9b2b8915ee]
/home/guy/redislabsmodules/RedisTF/src/redistf.so(RedisTF_Graph_RedisCommand+0x119)[0x7f9b2edf970f]

@lantiga
Contributor

lantiga commented Sep 7, 2018

Does the model created by running tf-minimal.py crash as well?

@lantiga
Contributor

lantiga commented Sep 7, 2018

So, in order to be served, the variables must be initialized and the graph must be frozen before exporting. Here's how I modified the script you pointed to (note how the saving part changes):

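# Uses the TensorFlow 1.x API
import tensorflow as tf
from tensorflow.python.framework.graph_util import convert_variables_to_constants

# Placeholders for the model input and the training target
x = tf.placeholder(tf.float32, name='input')
y_ = tf.placeholder(tf.float32, name='target')

# Trainable parameters; freezing turns these into constants
W = tf.Variable(5., name='W')
b = tf.Variable(3., name='b')

y = x * W + b
# Name the output so it can be referenced when freezing
y = tf.identity(y, name='output')

# Training ops from the original sample; not needed for serving
loss = tf.reduce_mean(tf.square(y - y_))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss, name='train')

init = tf.global_variables_initializer()

# Initialize the variables so they have values to freeze
sess = tf.Session()
sess.run(init)

# Replace variables with constants, keeping only what 'output' depends on
frozen_graph = convert_variables_to_constants(sess, sess.graph_def, ['output'])
# Write the frozen graph in binary protobuf format
tf.train.write_graph(frozen_graph, './', 'graph.pb', as_text=False)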
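
As a rough way to check that an export is actually frozen, the sketch below looks for variable ops among a graph's nodes. It is only an illustration: `find_unfrozen_nodes` and `VARIABLE_OPS` are hypothetical names, not part of TensorFlow or of this module, and the set of op types is an assumption that may not cover every variable-like op. The node lists are simplified and written by hand, not taken from a real export.

```python
# Hypothetical helper: op types assumed to indicate a non-frozen variable.
VARIABLE_OPS = frozenset({"Variable", "VariableV2", "VarHandleOp"})


def find_unfrozen_nodes(nodes):
    """Return the names of nodes whose op type is a variable op.

    `nodes` is an iterable of (name, op) pairs describing a graph.
    An empty result suggests the graph has been frozen.
    """
    return [name for name, op in nodes if op in VARIABLE_OPS]


if __name__ == "__main__":
    # Simplified, hand-written node lists for illustration only.
    before = [("input", "Placeholder"), ("W", "VariableV2"),
              ("b", "VariableV2"), ("output", "Identity")]
    after = [("input", "Placeholder"), ("W", "Const"),
             ("b", "Const"), ("output", "Identity")]
    print(find_unfrozen_nodes(before))
    print(find_unfrozen_nodes(after))
```

With TensorFlow 1.x available, the pairs could presumably be built from a loaded `GraphDef` with `[(n.name, n.op) for n in graph_def.node]`.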

@lantiga
Contributor

lantiga commented Sep 17, 2018

@gkorland may I close this one?

@gkorland
Contributor Author

Thanks, it seems to work, but why did my original graph crash Redis?

@lantiga
Contributor

lantiga commented Sep 26, 2018

Running a graph that is not frozen also needs a checkpoint file. If it’s not there, TF raises an exception, which we’re not handling yet.
We should definitely fail gracefully, but I’d still require frozen graphs at least for now.
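
One way to picture "failing gracefully" is to turn a loader failure into an error reply instead of letting it propagate. The sketch below is in Python only for consistency with the script earlier in this thread; the module itself is native code, where the equivalent would be checking status or return values, and a segfault cannot be caught like this. `set_graph`, `strict_loader`, and the reply strings are hypothetical and are not the module's actual API.

```python
def set_graph(store, key, blob, loader):
    """Try to load `blob` with `loader` and store the result under `key`.

    Returns "OK" on success. On failure, leaves `store` untouched and
    returns an error string instead of raising.
    """
    try:
        graph = loader(blob)
    except Exception as exc:
        return "ERR could not load graph: {}".format(exc)
    store[key] = graph
    return "OK"


def strict_loader(blob):
    """Stand-in loader (assumption): rejects empty input, else returns it."""
    if not blob:
        raise ValueError("empty graph definition")
    return blob


if __name__ == "__main__":
    store = {}
    print(set_graph(store, "foo", b"", strict_loader))
    print(set_graph(store, "foo", b"\x0a\x00", strict_loader))
```

The design choice here is that a bad graph leaves any previously stored value untouched, so a failed upload does not replace a working model.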

@gkorland
Contributor Author

OK, so let's keep this issue open for now, so we don't forget to handle the failure gracefully.

@lantiga lantiga added this to the 0.1.0 milestone Feb 8, 2019
@K-Jo K-Jo added the bug label Feb 13, 2019
@lantiga
Contributor

lantiga commented Mar 5, 2019

Should be solved by #72, feel free to reopen

@K-Jo K-Jo closed this as completed Mar 7, 2019
rafie added a commit that referenced this issue Jun 6, 2019
rafie added a commit that referenced this issue Jun 12, 2019
rafie added a commit that referenced this issue Aug 27, 2019
lantiga pushed a commit that referenced this issue Sep 1, 2019
* ARM support and bin/os-arch-variant scheme

* Build: fixes #1

* Build: fixes #2

* Build: fixes #3

* Build: fixes #4

* Build fixes #5

* CircleCI config.yml refactoring

* Build fixes #6

* Build fixes #7

* Build fixes #8

* Build fixes #9

* Build fixes #10

* Build fixes #11

* Build fixes #12

* Build fixes #13

* Build fixes #14

* Build fixes #15

* Build fixes #16

* Build fixes #17

* Build fixes #18

* Build fixes #19

* Build fixes #20

* Build fixes #21

* Build fixes #22

* Build fixes #23

* Build fixes #24

* Build fixes #25

* Build fixes #26

* Filesystem restructuring

* Pack fixes + docker goal in makefile
rafie added a commit that referenced this issue Sep 8, 2019
lantiga pushed a commit that referenced this issue Sep 19, 2019
* Readies sync

* CircleCI: multiarch docker build

* Readies sync

* CircleCI: multiarch docker build #2

* CircleCI: multiarch docker build #3

* Readies sync

* CircleCI: multiarch docker build

* Readies sync

* CircleCI: multiarch docker build #2

* CircleCI: multiarch docker build #3

* Support selective build (i.e. excluding engines)

* CircleCI: multiarch docker build #4

* CircleCI: multiarch docker build #5

* CircleCI: multiarch docker build #6

* CircleCI: multiarch docker build #7

* CircleCI: multiarch docker build #8

* CircleCI: multiarch docker build #9

* Disabled CircleCI restore from cache

* Reverted python3 dependency installation

* CircleCI: multiarch docker build #10

* CircleCI: multiarch docker build #11

* system-setup: fixed Python libs installations

* Fixed tensorflow collect script

* Enabled macOS in CircleCI + Fixed basic_tests.py for decoding

* CircleCI: moved to rmbuilder:x64-build

* CircleCI fixes #2

* CircleCI fixes #3

* CircleCI fixes #4

* Reverted RLTest decoding-related change

* CircleCI fixes #5

* Tests: double-panda.py to diagnose macOS issue

* get_deps: download libtorch from original url

- download libtorch from original url via rapack.sh
- paella/platform: fixed problem with RHEL identification

* paella: fixed urllib3 issue
lantiga pushed a commit that referenced this issue May 6, 2020
lantiga pushed a commit that referenced this issue May 6, 2020