Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closes #664 - Added data structure to store sliding windows of metric data for percentile computation #667

Merged
merged 4 commits into from
Apr 14, 2020

Conversation

JonasKunz
Copy link
Member

@JonasKunz JonasKunz commented Apr 8, 2020

This change is Reviewable

…of metric data for percentile computation
Copy link
Member

@ivansenic ivansenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @JonasKunz)


inspectit-ocelot-core/src/jmh/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueuePerfTest.java, line 13 at r1 (raw file):

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class WindowedDoubleQueuePerfTest {

So this test is single threaded perf check and does not realistically show the avg time as the data structure is designed to be used in a multi-thread way.. How about adding 2 or 4 threads and diving the work the the best/worst case scenarios..


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 73 at r1 (raw file):

     * <p>
     * The queue expects that all inserts happen ordered in time!
     * You should never insert data which is older than the latest element in the queue.

should we protect against this, assert that the last value in the timeStamps array is less or equal the given timestamp?


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 80 at r1 (raw file):

     * @param timeStamp the timestamp of the point to insert
     */
    public synchronized void insert(double value, long timeStamp) {

Have you though about having this data structure as the non-thread safe and then have synchronisation on the usage level? It's also not clear to me what's the API here, I see that the removeStaleValues is public method, should this really the case.. From what I see you can insert elements and then get a copy of the values or current size and that's it..

You could easily create a non-thread-safe version and then wrap it in the thread safe version that uses re-entrant lock to separate write read operations (meaningful if there is more reads then writes only imo)..

Maybe something else is also more meaningful than a hard synchronized on the method level.. How often with this be used? Where and when will this be used?


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 131 at r1 (raw file):

     * @return
     */
    public synchronized ValueCopy copy(double[] resultBuffer) {

split this copy into two methods one without the param and one with the param, call internal with null or concrete, clearer api


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 157 at r1 (raw file):

    private void trimToSize() {
        int desiredCapacity = Math.max(MIN_CAPACITY, roundUpToPowerOfTwo(size * CAPACITY_SCALING_FACTOR));

wouldn't roundUpToPowerOfTwo(size) be enough here.. you anyway return the `highestOneBit * 2;

If I want to trim to 9, highest one bit is 8, you would return 16, which is good imo? In your impl you would have 9 * 2 = 18, highest one bit is 16, you would return 32? Even your tests prove this?


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 174 at r1 (raw file):

    private void copyValues(double[] destination) {
        int capacity = capacity();

shouldn't the capacity be parameter to the method? actually does no matter.. Maybe actually having both copy method as part of the resize makes more sense here..


inspectit-ocelot-core/src/test/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueueTest.java, line 73 at r1 (raw file):

            WindowedDoubleQueue.ValueCopy result = queue.copy(null);
            assertThat(result.getData()).isEqualTo(expectedResult);

should you assert the size of the queue here?


inspectit-ocelot-core/src/test/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueueTest.java, line 119 at r1 (raw file):

        }

        @Test

can you also add a case with the overflow? so start index + size > capacity?

Copy link
Member Author

@JonasKunz JonasKunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ivansenic)


inspectit-ocelot-core/src/jmh/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueuePerfTest.java, line 13 at r1 (raw file):

Previously, ivansenic (Ivan Senic) wrote…

So this test is single threaded perf check and does not realistically show the avg time as the data structure is designed to be used in a multi-thread way.. How about adding 2 or 4 threads and diving the work the the best/worst case scenarios..

The data structure will rarely experience multi threaded contention.
Ingestion will happen through a single thread, which receives the data from the application threads using a ArrayBlockingQueue.
The reason for this buffering is that while this data structure has an amortized O(1) runtime, it can spike to O(n) in case of resizes.
We do not want this to happen on application threads.

The synchorinization / thread safety is only here because occasional concurrent reads may happen.
E.g. whenever an exporter runs, which is usually every 15 seconds, it will copy the data in order to compute the percentiles.
During this phase we want to avoid concurrency issues, otherwise the datastructure will only be used by a single thread.


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 73 at r1 (raw file):

Previously, ivansenic (Ivan Senic) wrote…

should we protect against this, assert that the last value in the timeStamps array is less or equal the given timestamp?

This doesn't kill the datastrcuture, it can just lead to old data stying for too long i nthe queue.
Nevertheless I agree with you to add a check as it should really not happen.


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 80 at r1 (raw file):

Previously, ivansenic (Ivan Senic) wrote…

Have you though about having this data structure as the non-thread safe and then have synchronisation on the usage level? It's also not clear to me what's the API here, I see that the removeStaleValues is public method, should this really the case.. From what I see you can insert elements and then get a copy of the values or current size and that's it..

You could easily create a non-thread-safe version and then wrap it in the thread safe version that uses re-entrant lock to separate write read operations (meaningful if there is more reads then writes only imo)..

Maybe something else is also more meaningful than a hard synchronized on the method level.. How often with this be used? Where and when will this be used?

removeStaleValues is there so you can remove stale values even when no data has been inserted.
This will happen for example before querying percentiles or at regular intervals to free up memory.

I agree that it is simpler to make this datastructure non-thread safe. Regarding the usage see my other comment above.


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 131 at r1 (raw file):

Previously, ivansenic (Ivan Senic) wrote…

split this copy into two methods one without the param and one with the param, call internal with null or concrete, clearer api

Done.


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 157 at r1 (raw file):

Previously, ivansenic (Ivan Senic) wrote…

wouldn't roundUpToPowerOfTwo(size) be enough here.. you anyway return the `highestOneBit * 2;

If I want to trim to 9, highest one bit is 8, you would return 16, which is good imo? In your impl you would have 9 * 2 = 18, highest one bit is 16, you would return 32? Even your tests prove this?

Shrinking too fast can turn the amortized O(1) performance to O(n).
Imagine having exactly 16 values in the queue, adding one and removing it again:

When the element is added, the capacity will grow to 32. Now when the element is removed again, which with your strategy would cause the queue to shrink to 16 again. Repeat this and you are copying the data whith each insert/removals.

With the * CAPACITY_SCALING_FACTOR we guarantee that shrinking only happens if the queue is less than 25% filled,
leading to a provable amortized O(1) performance as it avoids the worst case illustrated above.


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 174 at r1 (raw file):

Previously, ivansenic (Ivan Senic) wrote…

shouldn't the capacity be parameter to the method? actually does no matter.. Maybe actually having both copy method as part of the resize makes more sense here..

capacity is the capacity of the queue, not the target array.

Regarding your second point I'm not quite sure if I get what you mean.
copyValues is implemented as a separate method because it is used for resize as well as the public copy method.


inspectit-ocelot-core/src/test/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueueTest.java, line 73 at r1 (raw file):

Previously, ivansenic (Ivan Senic) wrote…

should you assert the size of the queue here?

Done.


inspectit-ocelot-core/src/test/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueueTest.java, line 119 at r1 (raw file):

Previously, ivansenic (Ivan Senic) wrote…

can you also add a case with the overflow? so start index + size > capacity?

Done.

ivansenic
ivansenic previously approved these changes Apr 14, 2020
Copy link
Member

@ivansenic ivansenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @JonasKunz)


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 157 at r1 (raw file):

Previously, JonasKunz (Jonas Kunz) wrote…

Shrinking too fast can turn the amortized O(1) performance to O(n).
Imagine having exactly 16 values in the queue, adding one and removing it again:

When the element is added, the capacity will grow to 32. Now when the element is removed again, which with your strategy would cause the queue to shrink to 16 again. Repeat this and you are copying the data whith each insert/removals.

With the * CAPACITY_SCALING_FACTOR we guarantee that shrinking only happens if the queue is less than 25% filled,
leading to a provable amortized O(1) performance as it avoids the worst case illustrated above.

Got it, thanks..


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 174 at r1 (raw file):

Previously, JonasKunz (Jonas Kunz) wrote…

capacity is the capacity of the queue, not the target array.

Regarding your second point I'm not quite sure if I get what you mean.
copyValues is implemented as a separate method because it is used for resize as well as the public copy method.

Forget it, I missed the usage in the copy


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 83 at r2 (raw file):

     */
    public void insert(double value, long timeStamp) {
        if (size > 0 && timeStamps[normalizeIndex(startIndex + size - 1)] > timeStamp) {

maybe it makes sence to extract caculations like normalizeIndex(startIndex + size - 1) to lastindex(), firstIndex(), nextIndex() helpers.. just to increase the readability?


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 128 at r2 (raw file):

     */
    public ValueCopy copy() {
        return copy(null);

how you kept it, should you not pass the new double[size] instead of null? The in the other copy method you can have an assertion that the result buffer size is enough, thus remove most of the stuff and simply do the copying?

Copy link
Member Author

@JonasKunz JonasKunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1, 2 of 2 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @JonasKunz)

Copy link
Member Author

@JonasKunz JonasKunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ivansenic)


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 83 at r2 (raw file):

Previously, ivansenic (Ivan Senic) wrote…

maybe it makes sence to extract caculations like normalizeIndex(startIndex + size - 1) to lastindex(), firstIndex(), nextIndex() helpers.. just to increase the readability?

Done.


inspectit-ocelot-core/src/main/java/rocks/inspectit/ocelot/core/metrics/percentiles/WindowedDoubleQueue.java, line 128 at r2 (raw file):

Previously, ivansenic (Ivan Senic) wrote…

how you kept it, should you not pass the new double[size] instead of null? The in the other copy method you can have an assertion that the result buffer size is enough, thus remove most of the stuff and simply do the copying?

Done.

Copy link
Member Author

@JonasKunz JonasKunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @ivansenic from 2 discussions.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@JonasKunz JonasKunz merged commit 0446eb7 into inspectIT:master Apr 14, 2020
@JonasKunz JonasKunz deleted the percentiles-queue branch April 14, 2020 09:01
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a time-based FIFO queue for storing sliding windows of metric data.
2 participants