
ETIMEDOUT error in Redis Sentinel, when Master loses network connectivity #1021

Closed

yyosifov opened this issue Dec 10, 2019 · 4 comments

@yyosifov

Hi guys,

We are observing a very strange reconnect issue. Our setup has 3 Sentinels with a quorum of 2, and 1 master with 2 slaves.
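For context, the client is created roughly like the sketch below (the sentinel hostnames and the `mymaster` group name are the ones visible in the debug logs further down; everything else is left at defaults):

```ts
import Redis from "ioredis";

// Connect through the three sentinels; they resolve the current master
// for the "mymaster" group and ioredis follows failovers from there.
const redis = new Redis({
  sentinels: [
    { host: "test-redis1.domain.org", port: 26379 },
    { host: "test-redis2.domain.org", port: 26379 },
    { host: "test-redis3.domain.org", port: 26379 },
  ],
  name: "mymaster",
});

redis.on("connect", () => console.info("redis connected"));
redis.on("ready", () => console.info("redis ready"));
redis.on("close", () => console.info("redis close"));
redis.on("reconnecting", () => console.info("redis reconnecting"));
redis.on("error", (err) => console.error("Redis error:", err));
```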

While the application is running, we SSH to the master node and break its networking, e.g. by executing something like `sudo ifconfig ens5 down` (where ens5 is the Ethernet interface), which cuts off all networking for that machine. The Redis failover runs successfully and a new master is elected.

What we observe is that ioredis does not reconnect immediately, but stalls for around 3+ minutes. We've run our app with `DEBUG=ioredis:` and here is part of the log output:

[12-10-2019 05:33:38] - info: GET /i/localizations/dictionary 200 419.645 ms - 2
  ioredis:redis write command[10.4.2.14:6379]: 0 -> get([ 'sess:hELS8RnwnoJfwpup7WeKaQVitGVNuajs' ]) +22s
[12-10-2019 05:36:00] - info: GET /i/localizations/dictionary - - ms - -
  ioredis:connection error: Error: read ETIMEDOUT +3m
[12-10-2019 05:36:07] - error: Redis error: Error: read ETIMEDOUT name=Error, stack=Error: read ETIMEDOUT
    at TCP.onStreamRead (internal/stream_base_commons.js:111:27)
  ioredis:redis status[10.4.2.14:6379]: ready -> close +2m
[12-10-2019 05:36:07] - warn: Redis disconnected. Retry in 100ms
  ioredis:connection reconnect in 100ms +3ms
  ioredis:redis status[10.4.2.14:6379]: close -> reconnecting +1ms
[12-10-2019 05:36:07] - info: redis close
[12-10-2019 05:36:07] - info: redis reconnecting
  ioredis:redis status[10.4.2.14:6379]: reconnecting -> connecting +103ms
  ioredis:redis status[test-redis1.domain.org:26379]: [empty] -> connecting +0ms
  ioredis:redis queue command[test-redis1.domain.org:26379]: 0 -> sentinel([ 'get-master-addr-by-name', 'mymaster' ]) +1ms
  ioredis:redis status[10.4.1.164:26379]: connecting -> connect +270ms
  ioredis:redis status[10.4.1.164:26379]: connect -> ready +0ms
  ioredis:connection send 1 commands in offline queue +375ms
  ioredis:redis write command[10.4.1.164:26379]: 0 -> sentinel([ 'get-master-addr-by-name', 'mymaster' ]) +1ms
  ioredis:redis write command[10.4.1.164:26379]: 0 -> sentinel([ 'sentinels', 'mymaster' ]) +193ms
  ioredis:SentinelConnector Updated internal sentinels: {"cursor":1,"sentinels":[{"host":"test-redis1.domain.org","port":26379},{"host":"test-redis2.domain.org","port":26379},{"host":"test-redis3.domain.org","port":26379},{"host":"10.4 ... <REDACTED full-length="257"> +3m
  ioredis:SentinelConnector resolved: 10.4.1.164:6379 +0ms
  ioredis:redis status[10.4.1.164:6379]: connecting -> connect +390ms
  ioredis:redis write command[10.4.1.164:6379]: 0 -> info([]) +0ms
[12-10-2019 05:36:08] - info: redis connected
  ioredis:redis status[10.4.1.164:26379]: ready -> close +1ms
  ioredis:connection skip reconnecting since the connection is manually closed. +584ms
  ioredis:redis status[10.4.1.164:26379]: close -> end +0ms
  ioredis:redis status[10.4.1.164:6379]: connect -> ready +193ms
  ioredis:connection resend 1 unfulfilled commands +193ms
  ioredis:redis write command[10.4.1.164:6379]: 0 -> get([ 'sess:hELS8RnwnoJfwpup7WeKaQVitGVNuajs' ]) +0ms
[12-10-2019 05:36:08] - info: redis ready
  ioredis:redis write command[10.4.1.164:6379]: 0 -> expire([ 'sess:hELS8RnwnoJfwpup7WeKaQVitGVNuajs', '604799' ]) +392ms
  ioredis:redis write command[10.4.1.164:6379]: 0 -> get([ 'sess:hELS8RnwnoJfwpup7WeKaQVitGVNuajs' ]) +3s
  ioredis:redis write command[10.4.1.164:6379]: 0 -> expire([ 'sess:hELS8RnwnoJfwpup7WeKaQVitGVNuajs', '604800' ]) +397ms
[12-10-2019 05:36:12] - info: GET /i/localizations/dictionary 200 397.483 ms - 2

You can see from the logs that there is a gap between 12-10-2019 05:33:38 and 12-10-2019 05:36:00, and that ioredis logs `ioredis:connection error: Error: read ETIMEDOUT +3m`, i.e. the connection only timed out after about 3 minutes.

What we think is happening is that ioredis still hangs on the socket connection to the old master and waits for that timeout before trying to reconnect.
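The only workaround we can think of so far is to arm an application-level idle timeout on the underlying socket, so that a dead peer is detected well before the TCP-level timeout. This is just an untested sketch: it relies on `redis.stream`, which as far as we can tell is the internal `net.Socket` ioredis uses (not a documented API), and the 30s value is arbitrary:

```ts
import Redis from "ioredis";
import type { Socket } from "net";

const redis = new Redis({
  sentinels: [{ host: "test-redis1.domain.org", port: 26379 }],
  name: "mymaster",
});

// Re-arm the timeout on every (re)connect, since each reconnect
// creates a fresh socket.
redis.on("connect", () => {
  // redis.stream is ioredis's internal net.Socket; not a public API,
  // so this may break between versions.
  const socket = (redis as any).stream as Socket;

  // If nothing is read or written for 30s, destroy the socket so that
  // ioredis's normal close/reconnect handling (and the Sentinel lookup)
  // runs immediately instead of waiting for the ~3 minute ETIMEDOUT.
  socket.setTimeout(30_000);
  socket.once("timeout", () => {
    socket.destroy(new Error("Redis socket idle timeout"));
  });
});
```

The obvious downside is that the timeout also fires on a connection that is merely idle, so a periodic PING would be needed to keep healthy but quiet connections from being recycled.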

  • Have you observed similar behavior?
  • Do you happen to know a fix that would cause an immediate reconnect in such cases?
  • Let me know if you need me to elaborate further on the issue.

Best,
Yosif

@stale

stale bot commented Jan 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot added the wontfix label Jan 9, 2020
@stale stale bot closed this as completed Jan 16, 2020
@nedyalkov

nedyalkov commented Jan 19, 2020

I think this issue is rather important and should be reopened. It is related to #1014 and #1007.

@sangelov

It looks like the same issue as #139.

@kapalkat

kapalkat commented Jun 2, 2020

Hey, the same happens to me! ioredis keeps trying to reconnect to the old master regardless of whether the sentinels have elected a new one. I think this should be reopened. If there is a need for any info/logs, please let me know and I can provide them.
