gh-131466: `concurrent.futures.Executor.map`: avoid temporarily exceeding `buffersize` while collecting the next result #131467

ebonnal · 2025-03-19T15:57:43Z

Context recap:

If we have:

results: Iterator = executor.map(fn, iterable, buffersize=buffersize)

What happens when calling next(results):

1. fetch the next arg from iterable and put a task for fn(arg) in the buffer
2. wait for the next result to be available
3. yield the collected result

-> During step 2, there are buffersize + 1 buffered tasks.

This PR swaps steps 1. and 2. so that buffersize is never exceeded, even during next.
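The effect of the ordering can be sketched without calling Executor.map itself. The snippet below is a simplified stand-in, not the code in _base.py: peak_buffered and its parameters are hypothetical names, and the buffer is a plain deque filled through executor.submit. It only records how many futures sit in the buffer at the moment we start waiting for a result.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def peak_buffered(submit_before_wait, buffersize=3, n_items=10):
    """Hypothetical helper: return (results, peak).

    peak is the largest number of futures held in the buffer at the
    moment we start waiting for the next result.
    """
    args = iter(range(n_items))
    results, peak = [], 0
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Pre-fill the buffer with the first `buffersize` tasks.
        fs = deque(executor.submit(pow, arg, 2) for arg in islice(args, buffersize))
        while fs:
            if submit_before_wait:
                # Old order: submit one more task, then wait.
                for arg in islice(args, 1):
                    fs.append(executor.submit(pow, arg, 2))
            peak = max(peak, len(fs))
            # Wait on the oldest future, then remove it from the buffer.
            results.append(fs[0].result())
            fs.popleft()
            if not submit_before_wait:
                # New order: refill only after the result was collected.
                for arg in islice(args, 1):
                    fs.append(executor.submit(pow, arg, 2))
    return results, peak


old_results, old_peak = peak_buffered(submit_before_wait=True)
new_results, new_peak = peak_buffered(submit_before_wait=False)
```

Both orderings yield the same results in the same order; only the peak buffer occupancy differs (buffersize + 1 versus buffersize in this sketch).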

Issue: concurrent.futures.Executor.map temporarily exceeds its buffersize while collecting the next result #131466


ebonnal · 2025-03-22T19:07:41Z

Lib/concurrent/futures/_base.py

-                        yield _result_or_cancel(fs.pop(), end_time - time.monotonic())
+
+                    # Yield the awaited result
+                    yield fs.pop().result()


To be discussed: this could be replaced by a lighter yield fs.pop()._result, because the prior call to _result_or_cancel guarantees that the result is available at this point.

Lib/test/test_concurrent_futures/executor.py

picnixz

While I understand that we could possibly exceed buffersize while collecting the next result, is there a real-world use case where it would really cause an issue? The reason I ask is that we now access fs[-1] and then call fs.pop().

I see that we have a del fut in _result_or_cancel(), but can you confirm that it's sufficient to avoid holding any reference to the yet-to-be-popped future?

Lib/concurrent/futures/_base.py

picnixz · 2025-03-23T13:25:47Z

Asking Gregory as well since he's the mp expert c:

ebonnal · 2025-03-23T13:33:19Z

@picnixz sorry I re-asked your review because you made me realize that we actually don't need _result_or_cancel anymore:

test_executor_map_current_future_cancel introduced in #95169 does not break anymore because now if the fs[-1].result() access fails, the future is still in fs (not popped out like before) and it will be properly cancelled as part of the result_iterator's finally block.
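A minimal sketch of that argument, under stated assumptions: collect_next and cancelled_after_timeout are hypothetical names, the except clause stands in for the finally block of result_iterator, and a bare Future() that is never completed is used to force a timeout.

```python
from collections import deque
from concurrent.futures import Future, TimeoutError


def collect_next(fs, timeout, pop_before_wait):
    """Hypothetical helper: return the result of the future at fs[-1].

    On failure, cancel whatever is still in fs, mimicking the cleanup
    done by the finally block of result_iterator.
    """
    try:
        if pop_before_wait:
            # If result() raises, the future has already left fs.
            return fs.pop().result(timeout)
        # Wait first, pop only once the result is known to be available.
        result = fs[-1].result(timeout)
        fs.pop()
        return result
    except BaseException:
        for future in fs:
            future.cancel()
        raise


def cancelled_after_timeout(pop_before_wait):
    pending = Future()  # never completed, so result() times out
    try:
        collect_next(deque([pending]), 0.01, pop_before_wait)
    except TimeoutError:
        pass
    return pending.cancelled()
```

When the future is popped before waiting, the cleanup loop never sees it, which is the gap _result_or_cancel was covering. When the wait happens on fs[-1], the future is still in the buffer and gets cancelled by the generic cleanup.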

I'm digging deeper into #95169's context to check whether I'm missing any untested scenario, especially regarding this:

    finally:
        # Break a reference cycle with the exception in self._exception
        del fut
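For context, the cycle that this del is meant to break can be illustrated as follows. This is a sketch that relies on CPython reference counting and on Future.result() clearing its own self reference; wait_for and future_freed_without_gc are hypothetical names. A raised exception stored in the future keeps a traceback, the traceback keeps the frame of the caller, and that frame keeps its local fut, which closes the loop back to the future.

```python
import gc
import weakref
from concurrent.futures import Future


def wait_for(fut, break_cycle):
    try:
        return fut.result()
    finally:
        if break_cycle:
            # Drop the local so that the frame referenced by the
            # exception's traceback no longer points at the future.
            del fut


def future_freed_without_gc(break_cycle):
    """Return True if the future is freed by reference counting alone."""
    fs = [Future()]
    fs[0].set_exception(ValueError("boom"))
    ref = weakref.ref(fs[0])
    gc.disable()  # make sure the cyclic collector cannot hide a cycle
    try:
        try:
            wait_for(fs.pop(), break_cycle)
        except ValueError:
            pass
        return ref() is None
    finally:
        gc.enable()
```

With break_cycle=False the future should only be reclaimed once the cyclic garbage collector runs; with break_cycle=True it should be released as soon as the exception has been handled.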

picnixz · 2025-03-23T13:52:11Z

> especially regarding this:

Yes, that's what I wanted to ask, but I'm not an expert here so I'll let you investigate first c:

Lib/concurrent/futures/_base.py


picnixz · 2025-04-19T09:00:25Z

Is this PR blocked by the other one or should I do something in particular?

ebonnal · 2025-04-19T20:25:49Z

> Is this PR blocked by the other one or should I do something in particular?

Yes it's fair to consider this one blocked by #131701 @picnixz 👍 (definitely cleaner to merge the test it introduces before this)

dalcinl · 2025-04-29T14:13:09Z

@ebonnal Sorry for the late reply. What about this simpler and IMHO cleaner way below? The second to last line may be a bit controversial (it changes the type of a variable), but I've used that list-pop trick in my mpi4py.futures module to avoid keeping references to objects.

diff --git a/Lib/concurrent/futures/_base.py b/Lib/concurrent/futures/_base.py
index d98b1ebdd58..de34b86d1ee 100644
--- a/Lib/concurrent/futures/_base.py
+++ b/Lib/concurrent/futures/_base.py
@@ -625,21 +625,26 @@ def map(self, fn, *iterables, timeout=None, chunksize=1, buffersize=None):
         # before the first iterator value is required.
         def result_iterator():
             try:
+                result = None
                 # reverse to keep finishing order
                 fs.reverse()
                 while fs:
+                    # Careful not to keep a reference to the popped future
+                    if timeout is None:
+                        result = _result_or_cancel(fs.pop())
+                    else:
+                        result = _result_or_cancel(fs.pop(), end_time - time.monotonic())
                     if (
                         buffersize
                         and (executor := executor_weakref())
                         and (args := next(zipped_iterables, None))
                     ):
                         fs.appendleft(executor.submit(fn, *args))
-                    # Careful not to keep a reference to the popped future
-                    if timeout is None:
-                        yield _result_or_cancel(fs.pop())
-                    else:
-                        yield _result_or_cancel(fs.pop(), end_time - time.monotonic())
+                    # Careful not to keep a reference to the result
+                    result = [result]
+                    yield result.pop()
             finally:
+                del result
                 for future in fs:
                     future.cancel()
         return result_iterator()
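The list-pop trick in the second-to-last added lines can be illustrated in isolation. This is a sketch that relies on CPython reference counting; Payload, keeps_reference and drops_reference are hypothetical names. A suspended generator keeps its local variables alive, so yielding a local directly leaves a reference behind, whereas yielding the value popped from a container does not.

```python
import weakref


class Payload:
    """Stand-in for a result object; plain instances support weakrefs."""


def keeps_reference():
    result = Payload()
    # While suspended here, the local `result` still references the payload.
    yield result


def drops_reference():
    result = [Payload()]
    # The popped payload is handed to the caller; the list is left empty.
    yield result.pop()


def freed_after_consumer_drops_it(make_generator):
    gen = make_generator()
    obj = next(gen)  # the generator is now suspended at its yield
    ref = weakref.ref(obj)
    del obj  # the consumer drops its only reference
    freed = ref() is None
    gen.close()
    return freed
```

This is the property that test_free_reference checks on the executor side: once the consumer releases a yielded result, nothing inside the iterator should keep it alive.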

ebonnal · 2025-04-29T16:03:24Z

Thank you for taking a look @dalcinl !

> What about this simpler and IMHO cleaner way below?

What exactly do you find unclean in my proposal that would justify creating a list for each yielded element? (fun trick though!)

dalcinl · 2025-04-29T20:25:54Z

> What exactly do you find unclean in my proposal that would justify creating a list for each yielded element?

I guess it is just a matter of subjective taste. My patch looks slightly shorter, but I should say that the primary motivation was avoiding the use of the (conventionally) private _result attribute. The creation of a list with one element is as fast as an attribute lookup, so you can hardly notice any overhead because of it.

I'm biased, as I maintain a custom implementation of this routine, and I prefer to avoid the use of private APIs and attributes. Standard library modules may not be bound to such constraints.

Long story short, I believe both your proposal and mine are functionally equivalent, so FWIW, this PR has my +1.

ebonnal · 2025-04-29T21:56:21Z

> the primary motivation was avoiding the use of the (conventionally) private _result attribute

I thought that was acceptable because Future and Executor are defined in the same module (_base.py), and I found a lot of other examples in the std lib where private attributes are treated as "module private" rather than "class private" 🤔.

> The creation of a list with one element is as fast as an attribute lookup, so you can hardly notice any overhead because of it

Fair, actually I remember now that another alternative I had considered in the early days of this PR was:

result = deque()
while fs:
   ...
   result.append(fs.pop().result())
   ...
   yield result.pop()

Which is similar to your approach but reuses the same container (a deque for the append/pop performance).
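A runnable version of that outline might look like the sketch below. It is an illustration only: iter_results is a hypothetical name, the buffer refill step is left out, and the finally block follows the cleanup shape of the existing result_iterator with the container cleared as well.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def iter_results(fs):
    """Yield results from a deque of futures, oldest future on the right."""
    result = deque()  # one container, reused for every yielded element
    try:
        while fs:
            result.append(fs.pop().result())
            # (the real loop would refill the buffer here)
            yield result.pop()  # the container is empty again while suspended
    finally:
        result.clear()
        for future in fs:
            future.cancel()


with ThreadPoolExecutor(max_workers=2) as executor:
    fs = deque(executor.submit(str.upper, s) for s in "abc")
    fs.reverse()  # keep submission order when popping from the right
    collected = list(iter_results(fs))
```

Compared with building a fresh one-element list per iteration, the container is allocated once, and the generator frame still never holds a direct reference to a yielded result.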

> I believe both your proposal and mine are functionally equivalent, so FWIW, this PR has my +1.

Thanks again for your review, I appreciate it, let's wait and gather more feedback 👀 !

dalcinl · 2025-04-30T06:56:19Z

> Fair, actually I remember now that another alternative I had considered in the early days of this PR was:

It looks even better!! I'll borrow your approach for my own code. If you ever update this PR, please do not forget the del result or result.clear() in the finally block.

Executor.map: avoid temporarily exceeding the buffersize while co…

233ccc1

…llecting the next result

bedevere-app bot added the awaiting review label Mar 19, 2025

bedevere-app bot mentioned this pull request Mar 19, 2025

concurrent.futures.Executor.map temporarily exceeds its buffersize while collecting the next result #131466

Open

ebonnal added 9 commits March 19, 2025 16:07

Merge remote-tracking branch 'cpython/main' into feat/executor-map-bu…

b864ef9

…ffer-after-yield

avoid keeping a ref to result for test_thread_pool.ThreadPoolExecutor…

72d7028

…Test.test_free_reference

update comment about not keeping references to popped future/result

2a30697

introduce current_timeout variable

ab4182b

comment on the necessity of the result container

7a1ae46

avoid container

268927d

remove current_timeout usage

de09aff

fix comments format

1814bfe

rephrase comments

7206321

ebonnal changed the title ~~gh-131466: concurrent.futures.Executor.map: avoid temporarily exceeding buffersize while collecting the next result~~ gh-131466: concurrent.futures.Executor.map: avoid temporarily exceeding buffersize while collecting the next result Mar 20, 2025

picnixz self-requested a review March 22, 2025 15:59

picnixz added the skip news label Mar 22, 2025

ebonnal commented Mar 22, 2025

View reviewed changes

picnixz reviewed Mar 22, 2025

View reviewed changes

Lib/test/test_concurrent_futures/executor.py Outdated Show resolved Hide resolved

order imports

162add1

ebonnal requested a review from picnixz March 23, 2025 01:02

picnixz reviewed Mar 23, 2025

View reviewed changes

Lib/concurrent/futures/_base.py Outdated Show resolved Hide resolved

format comments

f2c5fd0

picnixz approved these changes Mar 23, 2025

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting review labels Mar 23, 2025

picnixz requested a review from gpshead March 23, 2025 13:25

remove _result_or_cancel

9474769

ebonnal requested a review from picnixz March 23, 2025 13:30

access awaited result via _result attribute

2a2119e

ebonnal commented Mar 23, 2025

View reviewed changes

Lib/concurrent/futures/_base.py Show resolved Hide resolved

ebonnal added 2 commits March 24, 2025 13:23

break a reference cycle with fs[-1]._exception

3be6956

break other potential reference cycles with fs, not only the one caus…

0d70be9

…ed by fs[-1]

ebonnal mentioned this pull request Mar 24, 2025

gh-74028: concurrent.futures.Executor.map: avoid reference cycles when an exception is raised #131701

Open

ebonnal added 2 commits March 25, 2025 00:34

lighter ref cycle break

f509097

move the ref cycle break into the finally block

d50dabd

picnixz requested review from graingert and removed request for picnixz March 29, 2025 23:50

ebonnal mentioned this pull request Apr 28, 2025

gh-74028: concurrent.futures.Executor.map: introduce buffersize param for lazier behavior #125663

Merged

dalcinl mentioned this pull request Apr 30, 2025

futures: avoid exceeding buffersize tasks in map mpi4py/mpi4py#642

Merged
