
Jaeger/Zipkin Exporter Performance #1274

Merged: CodeBlanch merged 12 commits into open-telemetry:master from the zipkin-perf branch on Sep 22, 2020

Conversation

@CodeBlanch CodeBlanch (Member) commented Sep 16, 2020

Changes

A round of performance tweaks for Zipkin. It now uses the Batch&lt;Activity&gt; throughout (instead of making a copy) and flushes small (4 KB) chunks to the HTTP stream as they are ready.

The new version is slightly slower. If I make the chunk size bigger, it can be faster, but I think that is cheating for the sake of benchmarking. In a real-world scenario we want to start sending bytes as soon as we can. Today we buffer everything into memory and then send it all at the end; that's why the memory usage is so high. I think smaller chunks will actually make the export finish faster in real-world usage. I did the same thing (a smaller packet size) for JaegerExporter: the size went from 65000 down to 4096. The socket's default send buffer is 8192. For Jaeger we want to get the packets to the socket and let the networking stack handle the transmission; it has enough buffering that we don't need our own 😄
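Rough sketch of the idea (hedged, with hypothetical names; `ChunkedJsonContent` and `WriteZipkinSpan` below are illustrative, not the code in this PR): spans are serialized straight onto the HTTP request stream through a custom `HttpContent`, and the JSON writer is flushed whenever roughly 4096 bytes are pending, so bytes start moving before the whole batch has been serialized.

```csharp
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;
using OpenTelemetry;

// Hypothetical sketch of the chunked-flush approach, not the actual exporter code.
internal sealed class ChunkedJsonContent : HttpContent
{
    private const int FlushThresholdBytes = 4096; // chunk size discussed in this PR

    private readonly Batch<Activity> batch;

    public ChunkedJsonContent(Batch<Activity> batch)
    {
        this.batch = batch;
        this.Headers.ContentType = new MediaTypeHeaderValue("application/json");
    }

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        using var writer = new Utf8JsonWriter(stream);

        writer.WriteStartArray();

        foreach (var activity in this.batch)
        {
            WriteZipkinSpan(writer, activity);

            if (writer.BytesPending >= FlushThresholdBytes)
            {
                // Push the pending bytes toward the socket instead of buffering the whole batch.
                await writer.FlushAsync().ConfigureAwait(false);
            }
        }

        writer.WriteEndArray();
        await writer.FlushAsync().ConfigureAwait(false);
    }

    protected override bool TryComputeLength(out long length)
    {
        // Length isn't known up front, so the request is sent chunked.
        length = -1;
        return false;
    }

    private static void WriteZipkinSpan(Utf8JsonWriter writer, Activity activity)
    {
        // Placeholder: the real exporter writes the full Zipkin span shape here.
        writer.WriteStartObject();
        writer.WriteString("traceId", activity.TraceId.ToHexString());
        writer.WriteString("id", activity.SpanId.ToHexString());
        writer.WriteString("name", activity.DisplayName);
        writer.WriteEndObject();
    }
}
```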

/cc @reyang Curious what your thoughts are on this 4096 size change?

Benchmarks

Before:

| Method | NumberOfBatches | NumberOfSpans | Mean | Error | StdDev | Completed Work Items | Lock Contentions | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ZipkinExporter_Batching | 1 | 10000 | 49.77 ms | 0.539 ms | 0.478 ms | 5.1818 | - | 1181.8182 | 1090.9091 | 1000.0000 | 18.7 MB |
| ZipkinExporter_Batching | 10 | 10000 | 458.13 ms | 5.493 ms | 4.869 ms | 34.0000 | - | 11000.0000 | 9000.0000 | 9000.0000 | 186.79 MB |
| ZipkinExporter_Batching | 100 | 10000 | 4,625.93 ms | 72.792 ms | 68.090 ms | 305.0000 | - | 124000.0000 | 100000.0000 | 100000.0000 | 1867.75 MB |

After:

| Method | NumberOfBatches | NumberOfSpans | Mean | Error | StdDev | Completed Work Items | Lock Contentions | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ZipkinExporter_Batching | 1 | 10000 | 56.84 ms | 1.087 ms | 1.209 ms | 4.2000 | - | 200.0000 | - | - | 1.71 MB |
| ZipkinExporter_Batching | 10 | 10000 | 511.93 ms | 5.001 ms | 4.678 ms | 25.0000 | - | 2000.0000 | - | - | 16.95 MB |
| ZipkinExporter_Batching | 100 | 10000 | 5,151.66 ms | 66.415 ms | 58.875 ms | 204.0000 | - | 21000.0000 | 1000.0000 | - | 169.3 MB |

TODOs

  • CHANGELOG.md updated for non-trivial changes
  • Changes in public API reviewed

@CodeBlanch CodeBlanch requested a review from a team September 16, 2020 06:47
```csharp
/// </summary>
public TimeSpan TimeoutSeconds { get; set; } = TimeSpan.FromSeconds(10);
```
@CodeBlanch CodeBlanch (Member, Author) commented:

TimeoutSeconds wasn't actually being used. The ActivityProcessor now owns the timeout, so I just removed it.


codecov bot commented Sep 16, 2020

Codecov Report

Merging #1274 into master will decrease coverage by 0.00%.
The diff coverage is 93.47%.


```diff
@@            Coverage Diff             @@
##           master    #1274      +/-   ##
==========================================
- Coverage   79.12%   79.12%   -0.01%
==========================================
  Files         215      215
  Lines        6176     6175       -1
==========================================
- Hits         4887     4886       -1
  Misses       1289     1289
```
| Impacted Files | Coverage Δ |
|---|---|
| ...plementation/ZipkinActivityConversionExtensions.cs | 92.07% <91.66%> (-0.08%) ⬇️ |
| ...rc/OpenTelemetry.Exporter.Zipkin/ZipkinExporter.cs | 85.89% <93.75%> (-1.29%) ⬇️ |
| ...rc/OpenTelemetry.Exporter.Jaeger/JaegerExporter.cs | 84.04% <100.00%> (ø) |
| ...Telemetry.Exporter.Jaeger/JaegerExporterOptions.cs | 100.00% <100.00%> (ø) |
| ...Telemetry.Exporter.Zipkin/ZipkinExporterOptions.cs | 100.00% <100.00%> (ø) |
| ...ter.ZPages/Implementation/ZPagesActivityTracker.cs | 100.00% <0.00%> (+2.85%) ⬆️ |


reyang commented Sep 17, 2020

> /cc @reyang Curious what your thoughts are on this 4096 size change?

It makes total sense to leverage the HTTP/TCP layer to do the buffering. For smaller events (e.g. metrics) we might consider turning off Nagle's algorithm.
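A minimal sketch of what turning off Nagle's algorithm looks like in .NET (illustrative only; this PR does not make that change, and the endpoint below is made up):

```csharp
using System.Net.Sockets;

// TCP_NODELAY disables Nagle's algorithm so small writes go out immediately
// instead of being coalesced by the OS.
using var client = new TcpClient
{
    NoDelay = true,
};

client.Connect("zipkin.example.com", 9411); // hypothetical collector endpoint
```

Whether that is worth wiring into the exporter's HTTP transport is a separate design question from the chunk-size change in this PR.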

@reyang reyang (Member) left a comment


LGTM.

@eddynaka eddynaka (Contributor) left a comment


LGTM!
nit: I would change the title to Jaeger/Zipkin Exporter Performance

@CodeBlanch CodeBlanch changed the title Zipkin Exporter Performance Jaeger/Zipkin Exporter Performance Sep 18, 2020
@CodeBlanch CodeBlanch merged commit 401c111 into open-telemetry:master Sep 22, 2020
@CodeBlanch CodeBlanch deleted the zipkin-perf branch September 22, 2020 19:41