Skip to content

Race condition in JedisClusterTopologyProvider #2986

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
velna opened this issue Sep 9, 2024 · 0 comments
Closed

Race condition in JedisClusterTopologyProvider #2986

velna opened this issue Sep 9, 2024 · 0 comments
Assignees
Labels
type: bug A general bug

Comments

@velna
Copy link

velna commented Sep 9, 2024

JedisClusterTopologyProvider use a cache object to optimize performace on frequent cluster requests, but the time value is updated before the cache object, there is a race condition that the old cache including the invalid cluster topology might get returned, which will result in a ClusterCommandExecutionFailureException: Could not get a resource from th pool.

	public ClusterTopology getTopology() {

		if (cached != null && shouldUseCachedValue()) {
			return cached;
		}

		Map<String, Exception> errors = new LinkedHashMap<>();

		List<Entry<String, ConnectionPool>> list = new ArrayList<>(cluster.getClusterNodes().entrySet());

		Collections.shuffle(list);

		for (Entry<String, ConnectionPool> entry : list) {

			try (Connection connection = entry.getValue().getResource()) {

				time = System.currentTimeMillis();  // time value is updated before cached object

				Set<RedisClusterNode> nodes = Converters.toSetOfRedisClusterNodes(new Jedis(connection).clusterNodes());

				cached = new ClusterTopology(nodes);

				return cached;

			} catch (Exception ex) {
				errors.put(entry.getKey(), ex);
			}
		}

		StringBuilder stringBuilder = new StringBuilder();

		for (Entry<String, Exception> entry : errors.entrySet()) {
			stringBuilder.append(String.format("\r\n\t- %s failed: %s", entry.getKey(), entry.getValue().getMessage()));
		}

		throw new ClusterStateFailureException(
				"Could not retrieve cluster information; CLUSTER NODES returned with error" + stringBuilder);
	}
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Sep 9, 2024
@mp911de mp911de self-assigned this Sep 9, 2024
@mp911de mp911de added type: bug A general bug and removed status: waiting-for-triage An issue we've not yet triaged labels Sep 9, 2024
@mp911de mp911de changed the title Race condition in org.springframework.data.redis.connection.jedis.JedisClusterConnection.JedisClusterTopologyProvider Race condition in JedisClusterTopologyProvider Sep 11, 2024
mp911de added a commit that referenced this issue Sep 11, 2024
We now use a value object for caching the topology to avoid races in updating the cache timestamp.

Also, we set the cache timestamp after obtaining the topology to avoid that I/O latency expires the topology cache.

Closes #2986
@mp911de mp911de added this to the 3.3.4 (2024.0.4) milestone Sep 11, 2024
christophstrobl pushed a commit that referenced this issue Sep 11, 2024
We now use a value object for caching the topology to avoid races in updating the cache timestamp.

Also, we set the cache timestamp after obtaining the topology to avoid that I/O latency expires the topology cache.

Closes: #2986
Original Pull Request: #2989
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
type: bug A general bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants