Check RDB serialization #302

Closed
lantiga opened this issue Mar 5, 2020 · 11 comments

lantiga (Contributor) commented Mar 5, 2020

There have been reports of dump.rdb getting corrupted. To the best of my knowledge this happened with PyTorch models running on GPU, but I don't have a direct repro. The reports describe it happening after Redis is terminated abruptly, although that clashes with how the RDB dump works (writing to a temp file and then renaming it).

This will need to be verified and eventually fixed for GA.
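
A minimal round-trip check could look like the sketch below. The key and file names are placeholders, and the exact AI.MODELSET argument order is an assumption since it varies between RedisAI versions: set a Torch model, force a SAVE, restart the server, and confirm the model loads back.

```python
# Sketch of an RDB round-trip check (hypothetical key and file names). Assumes
# redis-py, a RedisAI build with the TORCH backend, and "model.pt" produced by
# torch.jit.save; the AI.MODELSET argument order may differ between versions.
import redis

r = redis.Redis(host="localhost", port=6379)

with open("model.pt", "rb") as f:
    blob = f.read()

# Pre-1.0 style AI.MODELSET: key, backend, device, then the serialized blob.
r.execute_command("AI.MODELSET", "repro:model", "TORCH", "CPU", blob)

# Force a synchronous dump so the model goes through RDB serialization.
r.save()

# After restarting redis-server by hand, verify the key survived the reload:
#   redis-cli AI.MODELGET repro:model
```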

lantiga self-assigned this Mar 5, 2020

irthomasthomas commented:

Hi, I think I experienced this today. Things were working OK, then my server rebooted, and now I can't load the dump.rdb:

1178:M 12 Apr 2020 12:15:35.049 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1178:M 12 Apr 2020 12:15:35.049 # Server initialized
1178:M 12 Apr 2020 12:15:35.049 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1178:M 12 Apr 2020 12:15:35.049 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1178:M 12 Apr 2020 12:15:35.060 * Module 'ai' loaded from /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/redisai.so
1178:M 12 Apr 2020 12:15:35.570 * RedisGears version 0.9.0, git_sha=e57aa980a1f203b23bd50ca77da4dd71139b952d
1178:M 12 Apr 2020 12:15:35.570 * MaxExecutions:1000000
1178:M 12 Apr 2020 12:15:35.570 * MaxExecutionsPerRegistration:100
1178:M 12 Apr 2020 12:15:35.570 * ProfileExecutions:0
1178:M 12 Apr 2020 12:15:35.570 * PythonAttemptTraceback:1
1178:M 12 Apr 2020 12:15:35.570 * DependenciesUrl:http://redismodules.s3.amazonaws.com/redisgears/redisgears-dependencies.linux-bionic-x64.0.9.0.tgz
1178:M 12 Apr 2020 12:15:35.570 * DependenciesSha256:0216195ea9c2d8cec43f8731432ba86b7a742e5c6668cce8bfaac263ea21e312
1178:M 12 Apr 2020 12:15:35.570 * CreateVenv:0
1178:M 12 Apr 2020 12:15:35.570 * RedisAI api loaded successfully.
1178:M 12 Apr 2020 12:15:35.574 * Found python installation under: /var/opt/redislabs/lib/modules/python3
1178:M 12 Apr 2020 12:15:35.574 * Found venv installation under: /var/opt/redislabs/lib/modules/python3/.venv
1178:M 12 Apr 2020 12:15:38.603 * Initializing Python environment with: exec(open('/var/opt/redislabs/lib/modules/python3/.venv/bin/activate_this.py').read(), {'file': '/var/opt/redislabs/lib/modules/python3/.venv/bin/activate_this.py'})
1178:M 12 Apr 2020 12:15:43.000 * Module 'rg' loaded from /root/redis-5.0.8/redisgears/redisgears.so
1178:M 12 Apr 2020 12:15:43.056 * Module 'bf' loaded from /root/redis-5.0.8/RedisBloom-2.2.2/redisbloom.so
1178:M 12 Apr 2020 12:15:43.133 # JSON data type for Redis v99.99.99 [encver 0]
1178:M 12 Apr 2020 12:15:43.133 * Module 'ReJSON' loaded from /root/RedisJSON/src/rejson.so
1178:M 12 Apr 2020 12:16:02.494 * TORCH backend loaded from /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/backends/redisai_torch/redisai_torch.so
ERR: [enforce fail at inline_container.cc:137] . PytorchStreamReader failed reading zip archive: failed finding central directory
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x76 (0x7faa1e5f98c6 in /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/backends/redisai_torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::valid(char const*) + 0x8d (0x7faa205ebfad in /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/backends/redisai_torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::init() + 0xa9 (0x7faa205ef0f9 in /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/backends/redisai_torch/lib/libtorch.so)
frame #3: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >) + 0x53 (0x7faa205f1ae3 in /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/backends/redisai_torch/lib/libtorch.so)
frame #4: + 0x300b91f (0x7faa2181f91f in /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/backends/redisai_torch/lib/libtorch.so)
frame #5: torch::jit::load(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x3c (0x7faa2181ef8c in /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/backends/redisai_torch/lib/libtorch.so)
frame #6: torch::jit::load(std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x75 (0x7faa2181f5a5 in /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/backends/redisai_torch/lib/libtorch.so)
frame #7: torchLoadModel + 0x22e (0x7faa2b99bede in /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/backends/redisai_torch/redisai_torch.so)
frame #8: RAI_ModelCreateTorch + 0x87 (0x7faa2b997887 in /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/backends/redisai_torch/redisai_torch.so)
frame #9: + 0x10d13 (0x7faa367ecd13 in /root/redis-5.0.8/redisai/RedisAI-0.9.0/install-cpu/redisai.so)
frame #10: + 0x541b1 (0x55a7077c51b1 in /root/redis-5.0.8/src/redis-server 127.0.0.1:6379)
frame #11: rdbLoadRio + 0x254 (0x55a7077c66e4 in /root/redis-5.0.8/src/redis-server 127.0.0.1:6379)
frame #12: rdbLoad + 0x52 (0x55a7077c6ee2 in /root/redis-5.0.8/src/redis-server 127.0.0.1:6379)
frame #13: loadDataFromDisk + 0x8c (0x55a7077a68ec in /root/redis-5.0.8/src/redis-server 127.0.0.1:6379)
frame #14: main + 0x486 (0x55a707798e86 in /root/redis-5.0.8/src/redis-server 127.0.0.1:6379)
frame #15: __libc_start_main + 0xe7 (0x7faa36cfbb97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #16: _start + 0x2a (0x55a70779914a in /root/redis-5.0.8/src/redis-server 127.0.0.1:6379)

1178:M 12 Apr 2020 12:16:02.779 # The RDB file contains module data for the module type 'AI__MODEL', that the responsible module is not able to load. Check for modules log above for additional clues.

lantiga (Contributor, Author) commented Apr 12, 2020

@irthomasthomas thank you for reporting this issue.

If you can't share the RDB (which is understandable), a few pieces of information would be useful:

  • are you running on CPU? (from the logs it looks like it)
  • I'm assuming you didn't do any rebuilds or upgrades around the reboot, right?
  • how is your RDB save configured? (how often does it save?)

Thanks
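
In case it helps, here is a quick sketch (assuming redis-py; connection parameters are examples) to pull the save policy and the loaded module versions from the running instance:

```python
# Read the RDB save policy, the RDB location, and the loaded module versions
# from a running instance. Assumes redis-py; connection parameters are examples.
import redis

r = redis.Redis(host="localhost", port=6379)

print(r.config_get("save"))        # e.g. {'save': '900 1 300 10 60 10000'}
print(r.config_get("dir"))         # directory holding dump.rdb
print(r.config_get("dbfilename"))  # usually 'dump.rdb'
print(r.execute_command("MODULE", "LIST"))  # loaded modules and their versions
```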

irthomasthomas commented Apr 12, 2020 via email

irthomasthomas commented:

@lantiga
I just tested that the dump.rdb works on my laptop with redis-5.0.7 and gears/ai v0.4.0.
So I uploaded that RDB to the server and tried running it again, but with redis-5.0.7 instead of 5.0.8 and with gears/ai v0.9.0, and it still produced that error.

lantiga (Contributor, Author) commented Apr 13, 2020

@irthomasthomas I don't seem to have access to that repo

irthomasthomas commented:

@lantiga Sorry I forgot to make it public. Try now.

irthomasthomas commented:

@lantiga I don't think the dump.rdb is corrupted. I switched back to RedisAI 0.4.0 and it works again.
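
To double-check that the blob itself is fine, a sketch like the one below (assuming redis-py and torch are installed, and that in 0.x the AI.MODELGET reply carries the raw blob as its last element) could pull the model back out and hand it straight to torch.jit.load; if that works, the archive inside the RDB is intact and the problem is in the newer backend's load path.

```python
# Check whether the stored model blob itself is a valid TorchScript archive,
# independently of RedisAI. Assumes redis-py and torch are installed; run it
# against the 0.4.0 instance that still loads the RDB. The AI.MODELGET reply
# layout differs across RedisAI versions; in 0.x the raw blob is typically
# the last element of the reply.
import io

import redis
import torch

r = redis.Redis(host="localhost", port=6379)

reply = r.execute_command("AI.MODELGET", "auction:model")
blob = reply[-1]

# If this load succeeds, the zip archive is intact and the "failed finding
# central directory" error points at the newer backend's load path rather
# than at RDB corruption.
model = torch.jit.load(io.BytesIO(blob))
print(type(model))
```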

lantiga (Contributor, Author) commented Apr 13, 2020

Got it. I'll take a look in a few hours. Thanks!

lantiga (Contributor, Author) commented Apr 18, 2020

Hi @irthomasthomas, sadly it took me longer than expected. Do you remember the name of the key for the model?

irthomasthomas commented:

@lantiga No worries, it's auction:model.

irthomasthomas commented:

Hi @lantiga, I don't know if this is related, but now I keep having to restore the RDB from backup or I get this error:

redis-server: src/streams_reader.c:375: StreamReader_ReadRecords: Assertion `RedisModule_CallReplyType(values) == REDISMODULE_REPLY_ARRAY' failed.
[1] 879 abort redis-server ~/redis-5.0.8/redis.conf
