Performance considerations
Benchmarking results suggest that `Capnp.Net.Runtime` provides reasonable performance. The .NET Standard 2.1 variant outperforms the .NET Standard 2.0 variant, thanks to exploiting the Span-based APIs. If you would like to run your own experiments, the two VS solutions in the Benchmarking folder may serve as a good starting point:
- `CapnpBench.sln` contains BenchmarkDotNet-based benchmarks, with gRPC reference measurements (needless to say that capnproto-dotnetcore is the winner).
- `CapnpProfile.sln` contains a simple setup for performance profiling with Visual Studio's built-in features, e.g. it answers the question of how much time is spent in which method.
The benchmarks measure a very simple "ping/pong" scenario: the caller sends an RPC with N bytes of payload, which the callee sends back. The caller waits for the response and then sends the next message. The measured times are roundtrip times: from caller back to caller. The benchmarks also vary the midlayer's buffer size, which will be discussed later on. Running the benchmarks on the author's machine over the loopback device yields these results:
```
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
Intel Core i7-9700K CPU 3.60GHz (Coffee Lake), 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
```
Method | PayloadBytes | BufferSize | Mean | Error | StdDev |
---|---|---|---|---|---|
Echo | 20 | 0 | 117.21 us | 0.167 us | 0.156 us |
Echo | 20 | 256 | 67.94 us | 0.213 us | 0.189 us |
Echo | 20 | 1024 | 68.58 us | 0.134 us | 0.126 us |
Echo | 20 | 4096 | 67.23 us | 0.215 us | 0.201 us |
Echo | 200 | 0 | 117.19 us | 1.177 us | 1.043 us |
Echo | 200 | 256 | 80.69 us | 0.279 us | 0.261 us |
Echo | 200 | 1024 | 67.35 us | 0.181 us | 0.169 us |
Echo | 200 | 4096 | 68.45 us | 0.363 us | 0.340 us |
Echo | 2000 | 0 | 156.74 us | 0.479 us | 0.425 us |
Echo | 2000 | 256 | 90.23 us | 0.332 us | 0.295 us |
Echo | 2000 | 1024 | 94.89 us | 0.221 us | 0.196 us |
Echo | 2000 | 4096 | 83.83 us | 0.304 us | 0.284 us |
Echo | 20000 | 0 | 165.04 us | 0.404 us | 0.378 us |
Echo | 20000 | 256 | 103.76 us | 0.193 us | 0.171 us |
Echo | 20000 | 1024 | 101.38 us | 0.331 us | 0.310 us |
Echo | 20000 | 4096 | 103.33 us | 0.228 us | 0.191 us |
Echo | 200000 | 0 | 428.80 us | 8.441 us | 13.388 us |
Echo | 200000 | 256 | 274.07 us | 0.917 us | 0.766 us |
Echo | 200000 | 1024 | 272.49 us | 2.109 us | 1.869 us |
Echo | 200000 | 4096 | 276.94 us | 1.511 us | 1.262 us |
Echo | 2000000 | 0 | 2,516.97 us | 38.379 us | 32.048 us |
Echo | 2000000 | 256 | 3,379.62 us | 214.919 us | 633.692 us |
Echo | 2000000 | 1024 | 3,510.61 us | 125.403 us | 369.753 us |
Echo | 2000000 | 4096 | 2,825.07 us | 46.446 us | 41.173 us |
Linear regression suggests that each remote procedure call incurs fixed costs of slightly less than 70 µs, whereas each payload byte contributes roughly 1.3–1.7 ns. Profiling where those 70 µs are spent reveals that a whopping 97% slice belongs to "external code", including I/O (the socket send/receive functions).
Even for loopback IP, the vast majority of time is spent during I/O, namely in the low-level socket send/receive functions. Hence, optimizing I/O is the first choice for improving performance in general.
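As a quick sanity check of that cost model, the sketch below predicts the roundtrip time for the largest unbuffered payload; the 1.5 ns/byte figure is an assumed midpoint of the 1.3–1.7 ns range quoted above, not a measured value:

```csharp
using System;

class CostModel
{
    static void Main()
    {
        // Fitted model: fixed ~70 µs per RPC plus ~1.3–1.7 ns per payload byte.
        double fixedCostUs = 70.0;     // fixed cost in microseconds
        double perByteNs = 1.5;        // assumed midpoint of the per-byte range
        long payloadBytes = 2_000_000;

        double predictedUs = fixedCostUs + payloadBytes * perByteNs / 1000.0;
        Console.WriteLine($"{predictedUs} us");  // 3070 us
    }
}
```

The prediction of roughly 3070 µs lands in the same ballpark as the ~2500–3500 µs measured for the 2,000,000-byte rows of the table.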
Buffering is a very simple yet highly effective measure for achieving high performance. By gathering data before handing them over to the socket, it minimizes the number of send operations. You have to configure buffering manually for both `TcpRpcClient` and `TcpRpcServer`:
```csharp
// Configuring a TCP client with buffering
var client = new TcpRpcClient();
client.AddBuffering(1024);
client.Connect("localhost", 1234);

// Configuring a TCP server with buffering
var server = new TcpRpcServer();
server.AddBuffering(1024);
server.StartAccepting(IPAddress.Any, 1234);
```
Without calling `AddBuffering`, buffering won't be enabled. Although it almost always seems a good idea to use buffering, it is not enabled by default for several reasons:
- Benchmark results suggest that buffering is counterproductive for very large message payloads.
- Choosing the buffer size according to application characteristics may save some additional microseconds.
- You might want to implement your own buffering strategy, maybe in conjunction with a custom flow control strategy.
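As a sketch of the last point: a custom buffering strategy can be injected as a midlayer via `InjectMidlayer` (the same mechanism used for flow control below). The example simply wraps the transport stream in a standard .NET `BufferedStream`; the 4096-byte size is an arbitrary assumption and should be tuned to the application's message sizes:

```csharp
// Hypothetical alternative to AddBuffering: a hand-rolled buffering midlayer.
// BufferedStream is a standard .NET class; the buffer size of 4096 bytes is
// an arbitrary choice for illustration.
var client = new TcpRpcClient();
client.InjectMidlayer(stream => new BufferedStream(stream, 4096));
client.Connect("localhost", 1234);
```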
Flow control is about picking the right time for pressing the "flush button": when should buffered data be transmitted to the send socket? The more data we accumulate, the better for throughput. However, attempting to accumulate beyond the point when no more data will follow would disrupt communication. The most defensive option would be to flush the buffer after each Cap'n Proto RPC message. But `Capnp.Net.Runtime` does better than that. If you consider the benchmark's ping-pong pattern described above (which is hopefully representative of the behavior of many applications), you will find that each RPC `Return` message actually causes two messages to be sent in turn:
- A `Finish` message for finalizing the current question
- A `Call` message for asking the new question
The `RpcEngine` implementation is able to buffer both messages and send them in a single transmission. To achieve that, each possible `RpcEngine` entry method installs a "flush guard" on the current stack frame. The flush guard is responsible for calling `Flush` before `RpcEngine` code is finally left. Hence, any "defensive flush" can be deferred to the guard.
By exploiting application-specific communication patterns, it may be possible to improve performance with a custom flow control strategy. Let's assume that your application always makes n calls at once:
```csharp
IMyInterface i = ...;
var t1 = i.Foo("hello");
var t2 = i.Bar("world", new MyInterface());
var t3 = i.Baz();
await Task.WhenAll(t1, t2, t3);
```
Then you might apply an explicit flushing strategy by injecting a custom midlayer. There is currently no off-the-shelf class doing exactly this, but here is an untested sketch of how it might look:
```csharp
class FlushController
{
    bool _flushSuspended;
    readonly List<FlushControlledStream> _clients = new List<FlushControlledStream>();

    public void RequestFlush(FlushControlledStream client)
    {
        if (_flushSuspended)
            _clients.Add(client);  // defer until ResumeFlush
        else
            client.WrappedStream.Flush();
    }

    public void SuspendFlush()
    {
        _flushSuspended = true;
    }

    public void ResumeFlush()
    {
        _flushSuspended = false;
        foreach (var client in _clients)
        {
            client.WrappedStream.Flush();
        }
        _clients.Clear();
    }
}

class FlushControlledStream : Stream
{
    readonly FlushController _flushController;

    public FlushControlledStream(Stream wrappedStream, FlushController flushController)
    {
        WrappedStream = wrappedStream;
        _flushController = flushController;
    }

    public Stream WrappedStream { get; }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => 0;
    public override long Position { get => 0; set => throw new NotSupportedException(); }

    public override void Flush()
    {
        // Defer the decision to the controller instead of flushing immediately.
        _flushController.RequestFlush(this);
    }

    public override int Read(byte[] buffer, int offset, int count) => WrappedStream.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => WrappedStream.Write(buffer, offset, count);
}

struct FlushGuard : IDisposable
{
    readonly FlushController _ctl;

    public FlushGuard(FlushController ctl)
    {
        _ctl = ctl;
        _ctl.SuspendFlush();
    }

    public void Dispose()
    {
        _ctl.ResumeFlush();
    }
}

static async Task Main(string[] args)
{
    var controller = new FlushController();
    var client = new TcpRpcClient();
    client.AddBuffering(256);
    client.InjectMidlayer(s => new FlushControlledStream(s, controller));
    client.Connect("localhost", 1234);
    var main = client.GetMain<IMyInterface>();

    Task t1, t2, t3;

    using (var guard = new FlushGuard(controller))
    {
        // All three calls are made while flushing is suspended...
        t1 = main.Foo("hello");
        t2 = main.Bar("world", new MyInterface());
        t3 = main.Baz();
    }   // ...and the guard's Dispose sends them in a single flush.

    await Task.WhenAll(t1, t2, t3);
}
```