
[manifold] make partitioner faster. #3

Merged · 19 commits into master · Feb 4, 2018

Conversation

@jhgg jhgg commented Feb 4, 2018

Partitioner Optimizations

  • Instead of Utils.group_by, we now use a new function, Utils.partition_pids, which uses a tuple instead of a map internally. This ends up roughly 2.2x faster than Utils.group_by in the attached benchmarks (a sketch of the idea follows the benchmark output below).
## GroupByBench
benchmark name     iterations   average time
partition_pids 8         5000   340.75 µs/op
partition_pids 24        5000   382.38 µs/op
partition_pids 48        5000   432.45 µs/op
group by 8               5000   600.24 µs/op
group by 24              2000   730.41 µs/op
group by 48              1000   1154.78 µs/op
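
For illustration, here is a minimal sketch of the tuple-based idea. The function and module names match the PR description, but the body is a reconstruction, and the use of :erlang.phash2/2 as the partition hash is an assumption:

```elixir
# Sketch of partition_pids: accumulate into a pre-sized tuple of lists,
# so each insert is a put_elem/3 on a fixed-size tuple rather than a map
# update. Partition choice via :erlang.phash2/2 is assumed here.
defmodule Manifold.Utils do
  @spec partition_pids([pid], pos_integer) :: tuple
  def partition_pids(pids, partitions) do
    do_partition_pids(pids, partitions, Tuple.duplicate([], partitions))
  end

  defp do_partition_pids([pid | rest], partitions, acc) do
    index = :erlang.phash2(pid, partitions)
    do_partition_pids(rest, partitions, put_elem(acc, index, [pid | elem(acc, index)]))
  end

  defp do_partition_pids([], _partitions, acc), do: acc
end
```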
  • Make the Partitioner's state immutable during iteration by pre-spawning the workers. This means we don't have to use Enum.reduce and can instead use a more specialized code path, do_send, which operates on a tuple of lists and sends each list to its respective worker (sketched after the benchmark output below). It turns out this isn't really any faster:
## WorkerSendBenches
benchmark name    iterations   average time
enum reduce send       50000   65.56 µs/op
do_send send           50000   68.85 µs/op
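
A rough sketch of such a do_send loop, assuming workers is a same-sized tuple of pre-spawned worker pids and Worker.send/3 is a helper that hands a pid list and a message to a worker (both assumptions):

```elixir
# Sketch of do_send: walk the tuple returned by partition_pids/2 from the
# last partition down to 0, sending each non-empty list to its worker.
# `workers` (tuple of pre-spawned pids) and Worker.send/3 are assumed.
defp do_send(_message, _pids_by_partition, _workers, -1), do: :ok

defp do_send(message, pids_by_partition, workers, partition) do
  case elem(pids_by_partition, partition) do
    [] -> :ok
    pids -> Worker.send(elem(workers, partition), pids, message)
  end

  do_send(message, pids_by_partition, workers, partition - 1)
end
```

It would be kicked off as do_send(message, pids_by_partition, workers, tuple_size(workers) - 1).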
  • Add a specific case for sending to a single pid, which we do quite often with Manifold sends. There is no practical speedup in avoiding the group-by operation itself, but the resulting send operation ends up faster: it doesn't have to start an Enum.reduce or iterate over the result of Utils.partition_pids, which avoids roughly 0.15 µs/op + 0.70-1.5 µs/op. Additionally, this path consumes fewer reductions because it does less work (which we want!). A sketch follows the benchmark output below.
## GroupByOneBench
benchmark name     iterations   average time
partition_pids 8     10000000   0.11 µs/op
partition_pids 24    10000000   0.14 µs/op
group by 48          10000000   0.14 µs/op
group by 8           10000000   0.14 µs/op
group by 24          10000000   0.16 µs/op
partition_pids 48    10000000   0.20 µs/op

## WorkerSendOneBenches (worker count=48)
benchmark name    iterations   average time
enum reduce send    10000000   0.70 µs/op
do_send send         1000000   1.47 µs/op
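
The single-pid clause in the Partitioner could look roughly like this, assuming the GenServer state is the worker tuple and reusing the assumed Worker.send/3 helper:

```elixir
# Sketch of the single-pid fast path, matched ahead of the general clause:
# hash the one pid, pick its worker, and send without partitioning a list.
def handle_cast({:send, [pid], message}, workers) do
  partition = :erlang.phash2(pid, tuple_size(workers))
  Worker.send(elem(workers, partition), [pid], message)
  {:noreply, workers}
end
```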

The Worker gained some optimizations too:

  • Use a list comprehension instead of Enum.each (it's a little bit faster; a sketch follows the benchmark output). Here's sending to 200 pids:
## SendBench
benchmark name     iterations   average time
send list comp           2000   961.87 µs/op
send fast reducer        2000   982.81 µs/op
send enum each           1000   1153.00 µs/op
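
The comprehension variant is roughly the following (the handle_cast shape is an assumption):

```elixir
# Sketch: fan the message out with a list comprehension instead of
# Enum.each/2, avoiding the anonymous-function call per element.
def handle_cast({:send, pids, message}, state) do
  for pid <- pids, do: send(pid, message)
  {:noreply, state}
end
```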
  • Add a special case for sending to a single pid (sketched after the benchmark output):
## SendBenchOne
benchmark name     iterations   average time
send one             10000000   0.22 µs/op
send fast reducer    10000000   0.27 µs/op
send enum each       10000000   0.34 µs/op
send list comp       10000000   0.34 µs/op
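
And the single-pid clause, placed before the general one so it matches first (again, the clause shape is assumed):

```elixir
# Sketch of the worker's single-pid case: no iteration at all, one send.
def handle_cast({:send, [pid], message}, state) do
  send(pid, message)
  {:noreply, state}
end
```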

Manifold.send

  • Make a specialized code path for sending to a single pid.
  • Use a list comprehension instead of Enum.each (both changes are sketched below).
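
Combining both, Manifold.send could look roughly like this; the registered partitioner name, the Partitioner.send/3 signature, and the grouping of pids by node are all assumptions for illustration:

```elixir
# Rough sketch of Manifold.send. The {Manifold.Partitioner, node} registered
# name, Partitioner.send/3, and per-node grouping are assumed here.
def send(pid, message) when is_pid(pid) do
  # Specialized path: a single pid goes straight to its node's partitioner.
  Manifold.Partitioner.send({Manifold.Partitioner, node(pid)}, [pid], message)
end

def send(pids, message) when is_list(pids) do
  # General path: group by node, then fan out with a list comprehension.
  for {node, node_pids} <- Enum.group_by(pids, &node/1) do
    Manifold.Partitioner.send({Manifold.Partitioner, node}, node_pids, message)
  end

  :ok
end
```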

@jhgg jhgg requested a review from vishnevskiy February 4, 2018 04:32
@@ -31,7 +31,11 @@ defmodule Manifold.Partitioner do
     # Set optimal process flags
     Process.flag(:trap_exit, true)
     Process.flag(:message_queue_data, :off_heap)
-    {:ok, Tuple.duplicate(nil, partitions)}
+    workers = for _ <- 0..partitions do
+      {:ok, pid} = Worker.start_link()
A reviewer (Contributor) commented on the diff:

It is probably a fine change, but it no longer supports respawning a worker if one somehow crashes. If you want to make this change, you should remove :trap_exit and the :EXIT match.

@jhgg (Contributor, Author) replied:

fixed.

@jhgg (Contributor, Author) commented Feb 4, 2018:

This is clearly faster. I'm wanting to merge this and then work on sharding partitions in #4. wdyt?

@jhgg jhgg changed the title from "[manifold] make partitioner's partition operation 2.2x faster and other" to "[manifold] make partitioner faster." Feb 4, 2018
@jhgg jhgg requested a review from ihumanable February 4, 2018 06:17
@vishnevskiy (Contributor) left a review:

Only change that seems worth it is the group by optimization.

@jhgg jhgg merged commit 05c01cd into master Feb 4, 2018
jhgg added a commit that referenced this pull request Feb 4, 2018