Race condition in `JedisClusterTopologyProvider` #2986

velna · 2024-09-09T02:00:29Z

`JedisClusterTopologyProvider` uses a cached object to optimize performance for frequent cluster requests, but the `time` value is updated before the `cached` object. This creates a race condition: while one thread is still fetching the new topology, another thread sees the fresh timestamp and is handed the old cache, which may contain an invalid cluster topology. The result is a `ClusterCommandExecutionFailureException: Could not get a resource from the pool`.

	public ClusterTopology getTopology() {

		if (cached != null && shouldUseCachedValue()) {
			return cached;
		}

		Map<String, Exception> errors = new LinkedHashMap<>();

		List<Entry<String, ConnectionPool>> list = new ArrayList<>(cluster.getClusterNodes().entrySet());

		Collections.shuffle(list);

		for (Entry<String, ConnectionPool> entry : list) {

			try (Connection connection = entry.getValue().getResource()) {

				time = System.currentTimeMillis();  // time value is updated before cached object

				Set<RedisClusterNode> nodes = Converters.toSetOfRedisClusterNodes(new Jedis(connection).clusterNodes());

				cached = new ClusterTopology(nodes);

				return cached;

			} catch (Exception ex) {
				errors.put(entry.getKey(), ex);
			}
		}

		StringBuilder stringBuilder = new StringBuilder();

		for (Entry<String, Exception> entry : errors.entrySet()) {
			stringBuilder.append(String.format("\r\n\t- %s failed: %s", entry.getKey(), entry.getValue().getMessage()));
		}

		throw new ClusterStateFailureException(
				"Could not retrieve cluster information; CLUSTER NODES returned with error" + stringBuilder);
	}
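The interleaving can be reproduced deterministically with a simplified stand-in. This is only a sketch, not the Spring Data Redis code: `RacyCacheSketch`, `peek`, and the explicit `now` parameter are hypothetical names introduced for illustration, and the "concurrent reader" is simulated by calling `peek` from inside the loader while the slow load is in progress.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the provider: a timestamp and a value stored in separate fields.
class RacyCacheSketch {

	private final long maxAgeMillis;
	private final Supplier<String> loader;

	private long time;     // written first, before the load
	private String cached; // written only after the load returns

	RacyCacheSketch(long maxAgeMillis, Supplier<String> loader) {
		this.maxAgeMillis = maxAgeMillis;
		this.loader = loader;
	}

	// Same ordering as the reported method: check, update time, load, update cached.
	String get(long now) {
		if (cached != null && now - time < maxAgeMillis) {
			return cached;
		}
		time = now;            // timestamp already looks fresh ...
		cached = loader.get(); // ... while the old value is still in place during this call
		return cached;
	}

	// What a second reader would be handed at time `now`; null means it would reload.
	String peek(long now) {
		return (cached != null && now - time < maxAgeMillis) ? cached : null;
	}
}
```

If `peek` is called while the second load is running, it returns the value from the first load even though that value is older than `maxAgeMillis`, because only the timestamp has been refreshed at that point.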

We now use a value object for caching the topology to avoid races in updating the cache timestamp. Also, we set the cache timestamp after obtaining the topology so that I/O latency does not prematurely expire the topology cache. Closes: #2986 Original Pull Request: #2989
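A minimal sketch of the value-object idea described in the commit message, not the code from the actual commit: `TopologyCacheSketch`, `Cached`, and the `Supplier`-based loader are hypothetical names, and a `Set<String>` stands in for the real `ClusterTopology`. The point is that the topology and its timestamp are published together through a single reference, and the timestamp is taken only after the load has finished.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch: the value and its timestamp travel together in one immutable object.
class TopologyCacheSketch {

	// Immutable holder; a reader sees either the old pair or the new pair, never a mix.
	static final class Cached {

		final Set<String> nodes;
		final long obtainedAt;

		Cached(Set<String> nodes, long obtainedAt) {
			this.nodes = nodes;
			this.obtainedAt = obtainedAt;
		}

		boolean isFresh(long now, long maxAgeMillis) {
			return now - obtainedAt < maxAgeMillis;
		}
	}

	private final Supplier<Set<String>> loader; // stands in for the CLUSTER NODES call
	private final long maxAgeMillis;
	private volatile Cached cached;

	TopologyCacheSketch(Supplier<Set<String>> loader, long maxAgeMillis) {
		this.loader = loader;
		this.maxAgeMillis = maxAgeMillis;
	}

	Set<String> getTopology() {

		Cached current = cached; // read the reference once
		if (current != null && current.isFresh(System.currentTimeMillis(), maxAgeMillis)) {
			return current.nodes;
		}

		Set<String> nodes = loader.get(); // slow I/O happens first
		cached = new Cached(nodes, System.currentTimeMillis()); // timestamp taken after the load
		return nodes;
	}
}
```

Two threads that both find the cache expired may still each reload the topology in this sketch. That costs an extra request, but neither thread can be handed a stale value paired with a fresh timestamp.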

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Sep 9, 2024

mp911de self-assigned this Sep 9, 2024

mp911de added type: bug A general bug and removed status: waiting-for-triage An issue we've not yet triaged labels Sep 9, 2024

mp911de changed the title ~~Race condition in org.springframework.data.redis.connection.jedis.JedisClusterConnection.JedisClusterTopologyProvider~~ Race condition in JedisClusterTopologyProvider Sep 11, 2024

mp911de added this to the 3.3.4 (2024.0.4) milestone Sep 11, 2024

mp911de mentioned this issue Sep 11, 2024

Use value object for topology caching #2989

Closed

christophstrobl closed this as completed in 1514461 Sep 11, 2024
