Performance considerations

c80k edited this page May 29, 2020 · 10 revisions

Benchmarking results suggest that Capnp.Net.Runtime provides reasonable performance. The .NET Standard 2.1 variant outperforms the .NET Standard 2.0 variant because it exploits the Span-based APIs. If you would like to run your own experiments, the two VS solutions in the Benchmarking folder may serve as a good starting point:

  • CapnpBench.sln contains BenchmarkDotNet-based benchmarks, with gRPC reference measurements (needless to say that capnproto-dotnetcore is the winner).
  • CapnpProfile.sln contains a simple setup for performance profiling with Visual Studio's built-in features. For example, it answers the question of how much time is spent in which method.

The benchmarks measure a very simple "ping/pong" scenario: Caller sends an RPC with N bytes payload, which the callee sends back. Caller waits for response and sends the next message. The measured times are roundtrip times: from caller back to caller. The benchmarks also vary the midlayer's buffer size, which will be discussed later on. Running the benchmarks on the author's machine, loopback device, yields these results:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
Intel Core i7-9700K CPU 3.60GHz (Coffee Lake), 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

| Method | PayloadBytes | BufferSize | Mean | Error | StdDev |
|--------|-------------:|-----------:|------------:|-----------:|-----------:|
| Echo | 20 | 0 | 117.21 us | 0.167 us | 0.156 us |
| Echo | 20 | 256 | 67.94 us | 0.213 us | 0.189 us |
| Echo | 20 | 1024 | 68.58 us | 0.134 us | 0.126 us |
| Echo | 20 | 4096 | 67.23 us | 0.215 us | 0.201 us |
| Echo | 200 | 0 | 117.19 us | 1.177 us | 1.043 us |
| Echo | 200 | 256 | 80.69 us | 0.279 us | 0.261 us |
| Echo | 200 | 1024 | 67.35 us | 0.181 us | 0.169 us |
| Echo | 200 | 4096 | 68.45 us | 0.363 us | 0.340 us |
| Echo | 2000 | 0 | 156.74 us | 0.479 us | 0.425 us |
| Echo | 2000 | 256 | 90.23 us | 0.332 us | 0.295 us |
| Echo | 2000 | 1024 | 94.89 us | 0.221 us | 0.196 us |
| Echo | 2000 | 4096 | 83.83 us | 0.304 us | 0.284 us |
| Echo | 20000 | 0 | 165.04 us | 0.404 us | 0.378 us |
| Echo | 20000 | 256 | 103.76 us | 0.193 us | 0.171 us |
| Echo | 20000 | 1024 | 101.38 us | 0.331 us | 0.310 us |
| Echo | 20000 | 4096 | 103.33 us | 0.228 us | 0.191 us |
| Echo | 200000 | 0 | 428.80 us | 8.441 us | 13.388 us |
| Echo | 200000 | 256 | 274.07 us | 0.917 us | 0.766 us |
| Echo | 200000 | 1024 | 272.49 us | 2.109 us | 1.869 us |
| Echo | 200000 | 4096 | 276.94 us | 1.511 us | 1.262 us |
| Echo | 2000000 | 0 | 2,516.97 us | 38.379 us | 32.048 us |
| Echo | 2000000 | 256 | 3,379.62 us | 214.919 us | 633.692 us |
| Echo | 2000000 | 1024 | 3,510.61 us | 125.403 us | 369.753 us |
| Echo | 2000000 | 4096 | 2,825.07 us | 46.446 us | 41.173 us |

Linear regression shows that each remote procedure call incurs a fixed cost of somewhat less than 70 µs, whereas each payload byte contributes roughly 1.3 to 1.7 ns. Profiling where those 70 µs are spent reveals that a whopping 97% belongs to "external code", including I/O (the socket send/receive functions).

Even over the loopback interface, the vast majority of the time is spent in I/O, namely in the low-level socket send/receive functions. Hence, I/O is the first place to look when optimizing performance.
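The regression mentioned above can be reproduced approximately with an ordinary least-squares fit. The following is a minimal sketch, not the author's actual calculation: it assumes that only the rows with BufferSize = 4096 are used, and the names RoundtripFit, Fit, PayloadBytes and MeanMicroseconds are made up for illustration.

```csharp
using System;

static class RoundtripFit
{
    // Rows with BufferSize = 4096 from the benchmark table (assumed selection).
    public static readonly double[] PayloadBytes = { 20, 200, 2000, 20000, 200000, 2000000 };
    public static readonly double[] MeanMicroseconds = { 67.23, 68.45, 83.83, 103.33, 276.94, 2825.07 };

    // Ordinary least squares for the model y = intercept + slope * x.
    public static (double Intercept, double Slope) Fit(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Need at least two paired samples.");

        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.Length; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= x.Length;
        meanY /= y.Length;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.Length; i++)
        {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }

        double slope = sxy / sxx;
        return (meanY - slope * meanX, slope);
    }

    static void Main()
    {
        var (intercept, slope) = Fit(PayloadBytes, MeanMicroseconds);
        // Intercept: fixed cost per call in microseconds.
        // Slope: microseconds per payload byte, converted to nanoseconds below.
        Console.WriteLine($"fixed cost: {intercept:F1} us per call");
        Console.WriteLine($"per byte:   {slope * 1000:F2} ns");
    }
}
```

Because the payload sizes span five orders of magnitude, an unweighted fit like this one is dominated by the largest payload. Choosing a different buffer size or weighting scheme will shift the numbers somewhat.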

Buffering

Buffering is a very simple, yet highly effective measure for achieving high performance. By gathering data before handing it to the socket, it reduces the number of send operations. You have to configure buffering manually for both TcpRpcClient and TcpRpcServer:

using System.Net;   // IPAddress
using Capnp.Rpc;    // TcpRpcClient, TcpRpcServer

// Configuring a TCP client with buffering
var client = new TcpRpcClient();
client.AddBuffering(1024);
client.Connect("localhost", 1234);

// Configuring a TCP server with buffering
var server = new TcpRpcServer();
server.AddBuffering(1024);
server.StartAccepting(IPAddress.Any, 1234);

Without a call to AddBuffering, buffering is not enabled. Although buffering seems to be almost always a good idea, it is not enabled by default for several reasons:

  • Benchmark results suggest that buffering is counterproductive for very large message payloads.
  • Choosing the buffer size according to application characteristics may save some additional microseconds.
  • You might want to implement your own buffering strategy, maybe in conjunction with a custom flow control strategy.
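To illustrate why buffering reduces the number of send operations, here is a minimal sketch that does not use Capnp.Net.Runtime at all. It uses the standard System.IO.BufferedStream as a stand-in for whatever AddBuffering installs internally, and a hypothetical CountingStream in place of the socket. The chunk count and sizes are arbitrary.

```csharp
using System;
using System.IO;

// Write-only stream that counts Write calls; a stand-in for the socket stream.
class CountingStream : Stream
{
    public int WriteCalls { get; private set; }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => WriteCalls++;
}

static class BufferingDemo
{
    // Writes `chunks` chunks of `chunkSize` bytes, flushes once, and returns
    // how many Write calls reached the underlying stream.
    // bufferSize == 0 means "no buffering".
    public static int CountUnderlyingWrites(int chunks, int chunkSize, int bufferSize)
    {
        var sink = new CountingStream();
        Stream target = bufferSize > 0 ? new BufferedStream(sink, bufferSize) : (Stream)sink;
        var chunk = new byte[chunkSize];

        for (int i = 0; i < chunks; i++)
            target.Write(chunk, 0, chunk.Length);
        target.Flush();

        return sink.WriteCalls;
    }

    static void Main()
    {
        Console.WriteLine(CountUnderlyingWrites(100, 8, 0));    // unbuffered
        Console.WriteLine(CountUnderlyingWrites(100, 8, 1024)); // buffered
    }
}
```

With 100 writes of 8 bytes each, the unbuffered variant forwards every write to the underlying stream, whereas the 1024-byte buffer holds all 800 bytes until the final Flush. For a real socket, each forwarded write corresponds to a comparatively expensive send operation.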

Flow control

Flow control is about picking the right time for pressing the "flush button": when should buffered data be transmitted to the send socket? The more data we accumulate, the better it is for throughput. However, continuing to accumulate when no more data will follow would stall communication. The most defensive option would be to flush the buffer after each Cap'n Proto RPC message. But Capnp.Net.Runtime does better than that. If you consider the benchmark's ping-pong pattern described above (which is hopefully representative of the behavior of many applications), you will find that each RPC Return message actually causes two messages to be sent in turn:

  • A Finish message for finalizing the current question
  • A Call message for asking the new question

The RpcEngine implementation is able to buffer both messages before sending them in a single transmission. To achieve that, each possible RpcEngine entry method installs a "flush guard" on the current stack frame. The flush guard takes responsibility for calling Flush before RpcEngine code is finally left. Hence, any "defensive flush" can be deferred to the guard.
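The flush guard idea can be sketched in isolation. The code below is not the RpcEngine implementation; OutboundEndpoint, EnterGuard and HandleReturn are hypothetical names, the sketch is single-threaded, and "transmission" simply means one flush of the queued messages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical outbound endpoint: queues messages and counts transmissions.
class OutboundEndpoint
{
    readonly List<string> _queued = new List<string>();
    int _guardDepth;

    public int Transmissions { get; private set; }

    public void Send(string message)
    {
        _queued.Add(message);
        // Defensive flush, unless a guard further up the stack has taken over.
        if (_guardDepth == 0) Flush();
    }

    public IDisposable EnterGuard()
    {
        _guardDepth++;
        return new Guard(this);
    }

    void Flush()
    {
        if (_queued.Count == 0) return;
        Transmissions++;   // everything queued so far goes out in one transmission
        _queued.Clear();
    }

    sealed class Guard : IDisposable
    {
        readonly OutboundEndpoint _owner;
        bool _disposed;

        public Guard(OutboundEndpoint owner) { _owner = owner; }

        public void Dispose()
        {
            if (_disposed) return;
            _disposed = true;
            // Only the outermost guard flushes, so nested entry methods are safe.
            if (--_owner._guardDepth == 0) _owner.Flush();
        }
    }
}

static class FlushGuardDemo
{
    // Mimics the reaction to a Return message: Finish plus the next Call.
    public static int HandleReturn(OutboundEndpoint endpoint)
    {
        using (endpoint.EnterGuard())
        {
            endpoint.Send("Finish");
            endpoint.Send("Call");
        }
        return endpoint.Transmissions;
    }

    static void Main() => Console.WriteLine(HandleReturn(new OutboundEndpoint()));
}
```

Inside the guard, both messages are queued and leave in a single transmission when the guard is disposed. Without a guard, each Send would flush on its own.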

By exploiting application-specific communication patterns, it may be possible to improve performance with a custom flow control strategy. Let's assume that your application always makes n calls at once:

IMyInterface i = ...;

var t1 = i.Foo("hello");
var t2 = i.Bar("world", new MyInterface());
var t3 = i.Baz();
await Task.WhenAll(t1, t2, t3);

Then you might apply an explicit flushing strategy. You could do so by injecting a custom midlayer. There is currently no off-the-shelf class doing exactly this. But here is an untested sketch of what this might look like:

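using System;
using System.Collections.Generic;
using System.IO;

// Collects flush requests while flushing is suspended and replays them on resume.
class FlushController
{
    // Flush requests may arrive on a different thread than the one that
    // suspends/resumes, so the shared state is guarded by a lock.
    readonly object _lock = new object();
    // A set, so that a stream requesting several flushes is flushed only once.
    readonly HashSet<FlushControlledStream> _pending = new HashSet<FlushControlledStream>();
    bool _flushSuspended;

    public void RequestFlush(FlushControlledStream client)
    {
        lock (_lock)
        {
            if (_flushSuspended)
            {
                _pending.Add(client);
                return;
            }
        }
        client.WrappedStream.Flush();
    }

    public void SuspendFlush()
    {
        lock (_lock) { _flushSuspended = true; }
    }

    public void ResumeFlush()
    {
        FlushControlledStream[] pending;
        lock (_lock)
        {
            _flushSuspended = false;
            pending = new FlushControlledStream[_pending.Count];
            _pending.CopyTo(pending);
            _pending.Clear();
        }
        // Flush outside the lock to keep the critical section short.
        foreach (var client in pending)
        {
            client.WrappedStream.Flush();
        }
    }
}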

class FlushControlledStream : Stream
{
    readonly FlushController _flushController;

    public FlushControlledStream(Stream wrappedStream, FlushController flushController)
    {
        WrappedStream = wrappedStream;
        _flushController = flushController;
    }

    public Stream WrappedStream { get; }

    public override bool CanRead => true;

    public override bool CanSeek => false;

    public override bool CanWrite => true;

    public override long Length => 0;

    public override long Position { get => 0; set => throw new NotSupportedException(); }

    public override void Flush()
    {
        _flushController.RequestFlush(this);
    }

    public override int Read(byte[] buffer, int offset, int count) => WrappedStream.Read(buffer, offset, count);

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();

    public override void SetLength(long value) => throw new NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count) => WrappedStream.Write(buffer, offset, count);
}

struct FlushGuard: IDisposable
{
    readonly FlushController _ctl;

    public FlushGuard(FlushController ctl)
    {
        _ctl = ctl;
        _ctl.SuspendFlush();
    }

    public void Dispose()
    {
        _ctl.ResumeFlush();
    }
}

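// Needs "using System.Threading.Tasks;" for Task.
static async Task Main(string[] args)
{
    var controller = new FlushController();

    var client = new TcpRpcClient();
    client.AddBuffering(256);
    client.InjectMidlayer(s => new FlushControlledStream(s, controller));
    client.Connect("localhost", 1234);

    var i = client.GetMain<IMyInterface>();
    // Declared as plain Task so that the return types of Foo, Bar and Baz do not matter here.
    Task t1, t2, t3;
    using (new FlushGuard(controller))
    {
        // Flush requests issued while the guard is alive are deferred ...
        t1 = i.Foo("hello");
        t2 = i.Bar("world", new MyInterface());
        t3 = i.Baz();
    }
    // ... and replayed when the guard is disposed, i.e. before this point.
    await Task.WhenAll(t1, t2, t3);
}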