Read & write all frames in one pass #4506

jakirkham · 2021-02-12T21:09:03Z

Closes #xxxx
Tests added / passed
Passes black distributed / flake8 distributed

Instead of doing multiple reads with async, just allocate one big chunk of memory for all of the frames and read into it. This should cut down on the number of passes through Tornado needed to fill these frames.
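As a rough illustration of the single-buffer read, here is a minimal sketch using only the standard library. It uses a plain `socket.socketpair()` rather than Tornado, and the helper name `read_exactly_into` and the frame sizes are assumptions made for the example; with Tornado the equivalent step would be a single `IOStream.read_into` call on the preallocated buffer.

```python
import socket


def read_exactly_into(sock, buf):
    """Fill ``buf`` completely from ``sock`` without intermediate copies."""
    view = memoryview(buf)
    received = 0
    while received < len(view):
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError("socket closed before buffer was filled")
        received += n
    return buf


# Frame sizes are assumed to be known before the payload is read.
frame_sizes = [3, 5, 4]

left, right = socket.socketpair()
left.sendall(b"abc" + b"hello" + b"data")

# One allocation for every frame, filled in one logical read.
buf = read_exactly_into(right, bytearray(sum(frame_sizes)))

# Slice zero-copy views for each frame out of the single buffer.
view = memoryview(buf)
frames, start = [], 0
for size in frame_sizes:
    frames.append(view[start:start + size])
    start += size

left.close()
right.close()
```

The individual frames are views into one allocation, so no per-frame buffers or per-frame reads are needed.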

Also knowing that IOStream has an internal queue of buffers for writing, we are able to push all of the frames into that queue beforehand. Then ask Tornado to write after they are in the queue. This also cuts down on the number of passes through Tornado by simply entering the write handling code once and writing all the buffers.

mrocklin · 2021-02-12T21:54:58Z

Hrm, neat. :)

jakirkham · 2021-02-12T22:02:23Z

Yeah we might be able to do better with sendmsg and recvmsg. Am still wrapping my head around how we can use those

mrocklin · 2021-02-12T22:28:08Z

My guess is that going beyond this will have diminishing returns, at least on the scheduler side where we generally have small messages. I could easily be wrong though.

jakirkham · 2021-02-13T03:39:35Z

Made a few more changes. I think this captures the same idea as what sendmsg/recvmsg would give us (except for the ability to send multiple buffers at once). By doing this we are able to go from 2 read_bytes calls down to 1 read_bytes call. I should add that this is in addition to having one big buffer receiving all frames.

Ensures that the separate `frames` are freed from memory before proceeding to the next step of sending them out.

Instead of doing multiple reads with `async`, just allocate one big chunk of memory for all of the frames and read into it. This should cut down on the number of passes through Tornado needed to fill these frames.

This should allow us to allocate space for the entirety of the rest of the message, including the size of each frame and all following frames. We can then unpack all of this information once received. By doing this we are able to cut down on additional send/recv calls that would otherwise occur and spend less time in Tornado's IO handling.

Simplifies the code in the TCP path by leveraging existing utility functions.

Make it a little easier to follow how the variables relate to each other.

These are just binary serialization steps that do not really depend on communication or on issues that may come from sockets. So go ahead and move them out of the `try` block.

jakirkham · 2021-02-16T04:04:47Z

Simplified this a bit more using pack_frames_prelude to build the header for us. Then we just tack on the message size before that so the receiving end gets that first.

On the receiving end, we just get the message size first. Then use that to preallocate a buffer to hand off to Tornado to fill. The remaining unpacking is just handled by unpack_frames. Deserializing steps can then be handed off after that per usual.
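The layout described above can be sketched with `struct` alone. This is an illustration under stated assumptions, not the code in `distributed`: the function names `pack_message` and `unpack_message_body` are hypothetical, and the exact header format (frame count followed by one length per frame, all as native `"Q"` values) is assumed from the description.

```python
import struct


def pack_message(frames):
    """Build [total size][nframes][len_0 .. len_n-1][frame bytes ...]."""
    lengths = [len(f) for f in frames]
    # Header: number of frames, then the length of each frame.
    prelude = struct.pack(f"Q{len(frames)}Q", len(frames), *lengths)
    # Size prefix covers everything after it, so the receiver can preallocate.
    size = struct.pack("Q", len(prelude) + sum(lengths))
    return b"".join([size, prelude, *frames])


def unpack_message_body(body):
    """Split the body (everything after the size prefix) into frame views."""
    body = memoryview(body)
    word = struct.calcsize("Q")
    (nframes,) = struct.unpack_from("Q", body)
    lengths = struct.unpack_from(f"{nframes}Q", body, word)
    frames, start = [], word * (1 + nframes)
    for length in lengths:
        frames.append(body[start:start + length])
        start += length
    return frames


message = pack_message([b"abc", b"", b"hello"])

# Receiver side: read the size first, then preallocate one buffer for the rest.
word = struct.calcsize("Q")
(body_size,) = struct.unpack_from("Q", message)
body = bytearray(body_size)
body[:] = message[word:]  # stand-in for one network read into the buffer
frames = unpack_message_body(body)
```

Because the size prefix covers the header and all frames, the receiver needs only two reads: one for the prefix and one for everything else.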

Also group `frames` and `frames_nbytes` steps together. Finally, rewrite the code to avoid using a constant for the size of `"Q"`, which should make it invariant to changes in that size.

Should avoid issues on platforms where this may not be the exact size.

To simplify the logic, just concatenate small frames before doing any sends. This way we can use the same code path for all sends.
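A minimal sketch of that idea follows, assuming that "small" means the total message size falls under some cutoff. The threshold value and the names `SMALL_MESSAGE_LIMIT`, `coalesce_small`, and `send_all` are hypothetical and chosen only for illustration.

```python
SMALL_MESSAGE_LIMIT = 2 ** 17  # hypothetical cutoff, in bytes


def coalesce_small(frames, limit=SMALL_MESSAGE_LIMIT):
    """Join all frames into one when the whole message is small."""
    total = sum(len(f) for f in frames)
    if total < limit:
        return [b"".join(frames)]
    return list(frames)


def send_all(frames, write):
    """Single send path: the same loop handles one frame or many."""
    for frame in frames:
        write(frame)


# A small message collapses into a single write.
sent_small = []
send_all(coalesce_small([b"hdr", b"a", b"bc"]), sent_small.append)

# A large message keeps its frames separate, avoiding a big copy.
sent_big = []
send_all(coalesce_small([b"hdr", bytes(2 ** 17)]), sent_big.append)
```

Since concatenation happens before any send, the sending code does not need a separate branch for small messages.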

jakirkham · 2021-02-16T10:40:36Z

Also figured out how to offload all frames to Tornado, so it now only uses one write call there.

Tornado has an internal queue that it uses to hold frames before writing them. To avoid needing to track and wait on various `Future`s and the amount of data sent, we can just enqueue all of the frames we want to send before a send even happens and then start the write. This way Tornado already has all of the data we plan to send once it starts working. In the meantime, we are able to carry on with other tasks while this gets handled in the background. https://github.com/tornadoweb/tornado/blob/6cdf82e927d962290165ba7c4cccb3e974b541c3/tornado/iostream.py#L537-L538
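The enqueue-everything-then-flush-once pattern can be sketched with the standard library. Note that this swaps in asyncio's `StreamWriter` as a stand-in for Tornado's `IOStream`; it does not touch Tornado's internal queue, and the helper name `send_frames` and the socket-pair setup are assumptions made so the example is self-contained.

```python
import asyncio
import socket


async def send_frames(writer, frames):
    # Queue every frame in the transport's write buffer first ...
    writer.writelines(frames)
    # ... then wait once for the whole buffer to flush.
    await writer.drain()


async def main():
    left, right = socket.socketpair()
    reader, right_writer = await asyncio.open_connection(sock=right)
    _, writer = await asyncio.open_connection(sock=left)

    frames = [b"header", b"frame-1", b"frame-2"]
    await send_frames(writer, frames)
    received = await reader.readexactly(sum(map(len, frames)))

    writer.close()
    right_writer.close()
    await writer.wait_closed()
    await right_writer.wait_closed()
    return received


received = asyncio.run(main())
```

There is only one await on the send side regardless of how many frames are queued, which mirrors the single write call described above.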

jakirkham · 2021-02-16T21:14:04Z

Some profiling details in issue ( quasiben/dask-scheduler-performance#108 ). Most notably this is giving us a 7% improvement in transfer time for the shuffle benchmark ( quasiben/dask-scheduler-performance#108 (comment) )

jakirkham · 2021-02-17T00:54:45Z

Please let me know if anything else is needed here 🙂

jakirkham · 2021-02-17T17:52:38Z

Thanks Matt! 😄

mrocklin · 2021-02-17T18:00:28Z

All I did was press the green button :)


jakirkham force-pushed the read_all_frames branch from 02fd115 to c9b9e54 Compare February 12, 2021 21:26

jakirkham force-pushed the read_all_frames branch 3 times, most recently from 1b7d376 to 7021240 Compare February 13, 2021 03:32

jakirkham force-pushed the read_all_frames branch 10 times, most recently from 83f3e8b to ff4047f Compare February 16, 2021 03:29

jakirkham added 7 commits February 15, 2021 19:34

Overwrite frames variable w/joined result

a8d32f3

Ensures that the separate `frames` are freed from memory before proceeding to the next step of sending them out.

Read all frames in one pass

2e2fe13

Instead of doing multiple reads with `async`, just allocate one big chunk of memory for all of the frames and read into it. This should cut down on the number of passes through Tornado needed to fill these frames.

Handle nframes with rest of message

bccd0e6

Drop unneeded variable assignment

489c20b

Use packing and unpacking utility functions

77971c1

Simplifies the code in the TCP path by leveraging existing utility functions.

Fix-up variable names

e27dce0

Make it a little easier to follow how the variables relate to each other.

jakirkham force-pushed the read_all_frames branch from ff4047f to e27dce0 Compare February 16, 2021 03:34

Move packing/unpacking steps out of try

25e1672

These are just binary serialization steps that do not really depend on communication or on issues that may come from sockets. So go ahead and move them out of the `try` block.

jakirkham added 2 commits February 15, 2021 20:25

Group header building steps together

2e140b9

Also group `frames` and `frames_nbytes` steps together. Finally, rewrite the code to avoid using a constant for the size of `"Q"`, which should make it invariant to changes in that size.

Use struct.calcsize to get size

d12d0d9

Should avoid issues on platforms where this may not be the exact size.

jakirkham force-pushed the read_all_frames branch from 08bddb1 to d12d0d9 Compare February 16, 2021 05:03

Concatenate small frames beforehand

39f4f71

To simplify the logic, just concatenate small frames before doing any sends. This way we can use the same code path for all sends.

jakirkham force-pushed the read_all_frames branch from 72e9dde to f9f8602 Compare February 16, 2021 10:21

jakirkham changed the title ~~Read all frames in one pass~~ Read & write all frames in one pass Feb 16, 2021

jakirkham force-pushed the read_all_frames branch from f9f8602 to 5860ccd Compare February 16, 2021 10:29

jakirkham force-pushed the read_all_frames branch from 5860ccd to 67be160 Compare February 16, 2021 11:03

jakirkham force-pushed the read_all_frames branch from 67be160 to 3e37aa9 Compare February 16, 2021 11:07

jakirkham mentioned this pull request Feb 16, 2021

DGX 4506 - 0 - Nightly Benchmark run 4506-0-20210216 quasiben/dask-scheduler-performance#108

Open

jakirkham mentioned this pull request Feb 17, 2021

Profiling Scheduler Performance #4443

Open

mrocklin merged commit 383ea03 into dask:master Feb 17, 2021

jakirkham deleted the read_all_frames branch February 17, 2021 16:41

jakirkham mentioned this pull request Feb 17, 2021

Using asyncio directly in TCP #4513

Closed

fjetter mentioned this pull request Feb 23, 2021

Integer overflow in TCP comm #4538

Closed

gjoseph92 mentioned this pull request Jul 23, 2021

🛑 DNM Deserialization: zero-copy merge subframes when possible #5112

Closed

3 tasks

jakirkham mentioned this pull request Jul 24, 2021

Unnecessary deep copy causes memory flare on network comms #5107

Closed
