Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Implement Iterators on Storage, arithmetic and comparisons use the StorageIterators #12274

Merged
merged 16 commits into from
Feb 17, 2025

Conversation

jdunkerley
Copy link
Member

Pull Request Description

The first step to get back some of the performance from the storage refactor.

  • New ColumnLongStorageWithArray and ColumnDoubleStorageWithArray interfaces allowing low-level iterators to iterate over the array directly.
  • Removed the DoubleArrayAdapter, BigIntegerArrayAdapter and BigDecimalArrayAdapter in favour of having ColumnStorage based facades allowing for some storage to present as others.
  • Moved the iterations in NumericBinaryOpImplementation into StorageIterators. Currently a lot of duplication to the NumericComparison. Will look to make more common later.

The intention is to keep working on the common iteration layer as build up the operator space.

Various tests on the performance impact of various parts (tested on my PC, over 1,000,000 additions, only provided for reference):

  • Using a builder versus populating array and bitset adds about 0.1ms.
  • Using the iterator over a tight inline loop appeared to be about 0.2ms
  • Using the array loops returns about 0.4ms over the getItemAsLong calls.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.
  • If meaningful changes were made to logic or tests affecting Enso Cloud integration in the libraries,
    or the Snowflake database integration, a run of the Extra Tests has been scheduled.
    • If applicable, it is suggested to paste a link to a successful run of the Extra Tests.

@jdunkerley jdunkerley added the CI: No changelog needed Do not require a changelog entry for this PR. label Feb 13, 2025
package org.enso.table.data.column.storage;

public interface ColumnDoubleStorageWithArray extends ColumnDoubleStorage {
/** Gets the array of long values for faster iteration. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where the faster iteration is needed? Is there a benchmark to execute and compare the slow/faster iteration? I'd like to see what Iterator isn't enough!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So have been testing on Column_Arithmetic_1000000.Plus_Fitting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
image

These were benchmark regressions that we've been discussing on Friday and later were also reported by @Akirathan in #12257. I think the goal of this specialization is to go back to the pre-27th of January performance

Copy link
Member

@JaroslavTulach JaroslavTulach Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With develop - e.g. commit 33a9ba9 I can run:

sbt:std-benchmarks> benchOnly Column_Arithmetic_1000000.Plus_Fitting
[info] Benchmark                               Mode  Cnt  Score    Error  Units
[info] Column_Arithmetic_1000000.Plus_Fitting  avgt    3  3.414 ± 16.068  ms/op

when running on this branch - e.g. 0c097d8 I get:

sbt:std-benchmarks> benchOnly Column_Arithmetic_1000000.Plus_Fitting
[info] Benchmark                               Mode  Cnt  Score    Error  Units
[info] Column_Arithmetic_1000000.Plus_Fitting  avgt    3  3.156 ± 15.599  ms/op

that seems fast enough.

I'd like to see what Iterator isn't enough!

OK, so develop branch is 10% slower than this PR. Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best way to generate relevant IGV graphs is to:

 enso$ JAVA_OPTS='-Dgraal.Dump=:1 -Dgraal.PrintGraph=Network' ./built-distribution/enso-engine-*/enso-*/bin/enso --run test/Benchmarks Plus_Fitting

while IGV is running. Then search for zipOverLongArrayStorages:

obrazek

and open "Before phase HighTierLowering" that one shall be relatively understandable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When comparing the XyzWithArray approach with the previous one, we are basically talking about following change:

iff --git std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
index f0939dec81..e922ab64d2 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
@@ -724,8 +724,8 @@ public class StorageIterators {
       if (skipNothing && (isNothing1 || isNothing2)) {
         builder.appendNulls(1);
       } else {
-        long value1 = isNothing1 ? 0 : data1[index];
-        long value2 = isNothing2 ? 0 : data2[index];
+        long value1 = isNothing1 ? 0 : (Long) source1.getItemBoxed(index);
+        long value2 = isNothing2 ? 0 : (Long) source2.getItemBoxed(index);
         var result = operation.apply(index, value1, isNothing1, value2, isNothing2);
         builder.append(result);
       }

the difference in the compiler graphs is visible on these two pictures. This is the (currently) faster one:

obrazek

This is the slower one:

obrazek

The difference isn't in boxing! The difference is in reading additional LongStorage.data field!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following archive zipOverLongArrayStorage.bgv.gz shows the differences.

public static <R, S, T> ColumnStorage<T> zipOverStorages(
ColumnStorage<R> source1,
ColumnStorage<S> source2,
LongFunction<BuilderForType<T>> builderConstructor,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, feel free to ignore

but seems like it may be easier to read if we have

@FunctionalInterface
public interface BuilderFactory<T> {
    BuilderForType<T> build(long initialCapacity);
}

and then

Suggested change
LongFunction<BuilderForType<T>> builderConstructor,
BuilderFactory<T> builderFactory,

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great, just some minor style suggestions.

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the XyzWithArray approach is the right one. It breaks the encapsulation - walking this path we'll soon end up with spaghetti OOP code as we had at the end of later year.

As this analysis explains the problem isn't in boxing values - box more! You just need to eliminate the additional read of LongStorage.data field.

Take a look at #10150 for one possible way to do it.

jdunkerley and others added 7 commits February 14, 2025 09:12
…type.

Added Zip method to StorageIterators.
Added ColumnDoubleStorageWithArray.
Added specialisations to StorageIterators.
…e/ColumnStorageFacade.java

Co-authored-by: Jaroslav Tulach <jaroslav.tulach@enso.org>
@JaroslavTulach
Copy link
Member

JaroslavTulach commented Feb 14, 2025

Take a look at #10150 for one possible way to do it.

Another way is presented in the following diff. Let's add List<T> asList() to ColumnStorage! There can even be a default implementation, but LongStorage provides its own, hopefully faster:

diff --git std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
index f0939dec81..2a2d321903 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
@@ -707,9 +707,9 @@ public class StorageIterators {
       LongFunction<BuilderForType<T>> builderConstructor,
       boolean skipNothing,
       LongZipOperation<T> operation) {
-    var data1 = source1.getArray();
+    var data1 = source1.asList();
     long size1 = source1.getSize();
-    var data2 = source2.getArray();
+    var data2 = source2.asList();
     long size2 = source2.getSize();
 
     long size = Math.max(size1, size2);
@@ -719,13 +719,15 @@ public class StorageIterators {
     Context context = Context.getCurrent();
 
     for (int index = 0; index < size; index++) {
-      boolean isNothing1 = index >= size1 || source1.isNothing(index);
-      boolean isNothing2 = index >= size2 || source2.isNothing(index);
+      Long valueOrNull1 = data1.get(index);
+      Long valueOrNull2 = data2.get(index);
+      var isNothing1 = valueOrNull1 == null;
+      var isNothing2 = valueOrNull2 == null;
       if (skipNothing && (isNothing1 || isNothing2)) {
         builder.appendNulls(1);
       } else {
-        long value1 = isNothing1 ? 0 : data1[index];
-        long value2 = isNothing2 ? 0 : data2[index];
+        var value1 = isNothing1 ? 0 : valueOrNull1;
+        var value2 = isNothing2 ? 0 : valueOrNull2;
         var result = operation.apply(index, value1, isNothing1, value2, isNothing2);
         builder.append(result);
       }
diff --git std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningBigDecimal.java std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningBigDecimal.java
index 29cb63ea58..93a434a432 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningBigDecimal.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningBigDecimal.java
@@ -1,5 +1,7 @@
 package org.enso.table.data.column.operation.map.numeric.arithmetic;
 
+import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigDecimal;
+
 import java.math.BigDecimal;
 import org.enso.base.polyglot.NumericConverter;
 import org.enso.table.data.column.builder.Builder;
@@ -10,8 +12,6 @@ import org.enso.table.data.column.storage.*;
 import org.enso.table.data.column.storage.numeric.BigDecimalStorage;
 import org.enso.table.data.column.storage.numeric.BigIntegerStorage;
 
-import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigDecimal;
-
 public abstract class NumericBinaryOpReturningBigDecimal<
         T extends Number, I extends Storage<? super T>>
     extends BinaryMapOperation<T, I> {
diff --git std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningDouble.java std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningDouble.java
index 4cb8acc709..74158cbf5f 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningDouble.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningDouble.java
@@ -1,6 +1,5 @@
 package org.enso.table.data.column.operation.map.numeric.arithmetic;
 
-import java.math.BigDecimal;
 import java.math.BigInteger;
 import org.enso.base.polyglot.NumericConverter;
 import org.enso.table.data.column.builder.Builder;
diff --git std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/comparisons/NumericComparison.java std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/comparisons/NumericComparison.java
index 9f8b1c2b7b..6ede9c46f3 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/comparisons/NumericComparison.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/comparisons/NumericComparison.java
@@ -1,5 +1,8 @@
 package org.enso.table.data.column.operation.map.numeric.comparisons;
 
+import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigDecimal;
+import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigInteger;
+
 import java.math.BigDecimal;
 import java.math.BigInteger;
 import org.enso.base.CompareException;
@@ -12,15 +15,11 @@ import org.enso.table.data.column.storage.BoolStorage;
 import org.enso.table.data.column.storage.ColumnDoubleStorage;
 import org.enso.table.data.column.storage.ColumnLongStorage;
 import org.enso.table.data.column.storage.ColumnStorage;
-import org.enso.table.data.column.storage.ColumnStorageFacade;
 import org.enso.table.data.column.storage.Storage;
 import org.enso.table.data.column.storage.numeric.BigDecimalStorage;
 import org.enso.table.data.column.storage.numeric.BigIntegerStorage;
 import org.enso.table.data.column.storage.numeric.DoubleStorageFacade;
 
-import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigDecimal;
-import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigInteger;
-
 public abstract class NumericComparison<T extends Number, I extends Storage<? super T>>
     extends BinaryMapOperation<T, I> {
 
@@ -63,9 +62,12 @@ public abstract class NumericComparison<T extends Number, I extends Storage<? su
     } else if (arg instanceof BigDecimal bigDecimal) {
       return switch (storage) {
         case BigDecimalStorage s -> runBigDecimalMap(s, bigDecimal, problemAggregator);
-        case BigIntegerStorage s -> runBigDecimalMap(asBigDecimal(s), bigDecimal, problemAggregator);
-        case ColumnDoubleStorage s -> runBigDecimalMap(asBigDecimal(s), bigDecimal, problemAggregator);
-        case ColumnLongStorage s -> runBigDecimalMap(asBigDecimal(s), bigDecimal, problemAggregator);
+        case BigIntegerStorage s -> runBigDecimalMap(
+            asBigDecimal(s), bigDecimal, problemAggregator);
+        case ColumnDoubleStorage s -> runBigDecimalMap(
+            asBigDecimal(s), bigDecimal, problemAggregator);
+        case ColumnLongStorage s -> runBigDecimalMap(
+            asBigDecimal(s), bigDecimal, problemAggregator);
         default -> throw newUnsupported(storage);
       };
     } else if (NumericConverter.isCoercibleToLong(arg)) {
diff --git std-bits/table/src/main/java/org/enso/table/data/column/storage/ColumnStorage.java std-bits/table/src/main/java/org/enso/table/data/column/storage/ColumnStorage.java
index 09cc34737e..aa2b33a23e 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/storage/ColumnStorage.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/storage/ColumnStorage.java
@@ -1,5 +1,7 @@
 package org.enso.table.data.column.storage;
 
+import java.util.AbstractList;
+import java.util.List;
 import org.enso.table.data.column.storage.type.StorageType;
 
 /** Basic interface of a column storage. */
@@ -20,4 +22,18 @@ public interface ColumnStorage<T> {
 
   /* Gets the value at a given index. */
   T getItemBoxed(long index);
+
+  default List<T> asList() {
+    return new AbstractList<T>() {
+      @Override
+      public T get(int index) {
+        return getItemBoxed(index);
+      }
+
+      @Override
+      public int size() {
+        return Math.toIntExact(getSize());
+      }
+    };
+  }
 }
diff --git std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorageFacade.java std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorageFacade.java
index f4e906633f..55c6f19e0a 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorageFacade.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorageFacade.java
@@ -24,7 +24,7 @@ public class DoubleStorageFacade<T> implements ColumnDoubleStorage {
     return new DoubleStorageFacade<>(parent, Long::doubleValue) {
       @Override
       public double getItemAsDouble(long index) throws ValueIsNothingException {
-        return (double)parent.getItemAsLong(index);
+        return (double) parent.getItemAsLong(index);
       }
 
       @Override
diff --git std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/LongStorage.java std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/LongStorage.java
index 857b9c5b79..f9ec616563 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/LongStorage.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/LongStorage.java
@@ -1,6 +1,7 @@
 package org.enso.table.data.column.storage.numeric;
 
 import java.math.BigInteger;
+import java.util.AbstractList;
 import java.util.BitSet;
 import java.util.List;
 import org.enso.base.polyglot.NumericConverter;
@@ -194,4 +195,33 @@ public final class LongStorage extends AbstractLongStorage
   public long[] getArray() {
     return data;
   }
+
+  @Override
+  public List<Long> asList() {
+    return new LongList(data, isNothing);
+  }
+
+  private static final class LongList extends AbstractList<Long> {
+    private final long[] data;
+    private final BitSet isNothing;
+
+    private LongList(long[] data, BitSet isNothing) {
+      this.data = data;
+      this.isNothing = isNothing;
+    }
+
+    @Override
+    public Long get(int index) {
+      if (isNothing.get(index)) {
+        return null;
+      } else {
+        return data[index];
+      }
+    }
+
+    @Override
+    public int size() {
+      return data.length;
+    }
+  }
 }

I haven't benchmarked it deeply (I am still driving), but I think it is fast. The idea is to create a local copy of storage internals that can remain virtual - e.g. both its fields (data and isNothing) stay in registers. Just for fun I decided to implement java.util.List as that's quite useful interface.

If this really turns to be fast, then there is no need for ColumnLongStorage as boxing clearly isn't slowing us down.

Comment on lines +50 to 60
var iterator = source.iterator();
while (iterator.moveNext()) {
if (iterator.isNothing()) {
if (preserveNothing) {
builder.appendNulls(1);
} else {
operation.apply(builder, iterator.getIndex(), Double.NaN, true);
}
} else {
operation.apply(builder, index, source.getItemAsDouble(index), source.isNothing(index));
operation.apply(builder, iterator.getIndex(), iterator.getItemAsDouble(), false);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way of operating with index seems just nicer.

But again - if we are not using it perhaps it is not worth keeping it here - seems like too early 'future-proofing' (I know I'm guilty of this as well from time to time...)

That is just an opinion though so feel free to ignore if you think it is better to have it already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree will remove in a later PR - index used for some error reporting.

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unblocking.

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • description of the PR is outdated, it should be fixed before another round of review
  • from an API perspective:
    • hard to separate API from implementation
    • too many types are public - needlessly increases conceptual surface
    • too little finalmodifiers - hurts clarity
  • I am not sure why we have own iterator interface. java.util.Iterator shall be good enough!
  • Any reason to keep the duality of boxed vs. unboxed methods?

Btw. there are only three combinations of access modifies that make sense in an API:

  • public final - anybody "call me"!
  • protected final - subclasses can "call me"
  • protected abstract - subclasses "must implement me"

anything else does not have enough clarity.

@@ -198,4 +187,70 @@ public LongStorage widen(IntegerType widerType) {
assert widerType.fits(getType());
return new LongStorage(data, (int) getSize(), getIsNothingMap(), widerType);
}

/** Allow access to the underlying data array for copying. */
public long[] getArray() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an internal methods, not something exposed to everyone. This way we are just asking clueless developers to call this method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently public as needed by the Union operation. Intention is to hide as the refactor continues.

}

@Override
public long getItemAsLong() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we continue to have this "boxed" vs. "unboxed" duality? Boxing doesn't influence the peak performance!

import org.enso.table.data.column.storage.type.StorageType;

/** A facade for a column storage that converts the stored type to a long. */
public class LongStorageFacade<T> implements ColumnLongStorage {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you want clueless developers to subclass, then:

Suggested change
public class LongStorageFacade<T> implements ColumnLongStorage {
public final class LongStorageFacade<T> implements ColumnLongStorage {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this package supposed to provide an API for Storage? Then why do we have a BoolStorage in the API package? Why there is a separate numeric implementation package, but no bool implementation package?

@jdunkerley jdunkerley changed the title Update NumericBinaryOp to use StorageIterators Implement Iterators on Storage, arithmetic and comparisons use the StorageIterators Feb 17, 2025
@jdunkerley jdunkerley added the CI: Ready to merge This PR is eligible for automatic merge label Feb 17, 2025
@jdunkerley
Copy link
Member Author

Going to merge but needs more work. Will pick up again on Wednesday.

@mergify mergify bot merged commit b786a33 into develop Feb 17, 2025
48 checks passed
@mergify mergify bot deleted the wip/jd/table-op-perf branch February 17, 2025 21:35
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants