-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Implement Iterators on Storage, arithmetic and comparisons use the StorageIterators #12274
Conversation
std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
Outdated
Show resolved
Hide resolved
std-bits/table/src/main/java/org/enso/table/data/column/storage/ColumnStorageFacade.java
Outdated
Show resolved
Hide resolved
package org.enso.table.data.column.storage; | ||
|
||
public interface ColumnDoubleStorageWithArray extends ColumnDoubleStorage { | ||
/** Gets the array of long values for faster iteration. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where the faster iteration is needed? Is there a benchmark to execute and compare the slow/faster iteration? I'd like to see what Iterator
isn't enough!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So have been testing on Column_Arithmetic_1000000.Plus_Fitting
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were benchmark regressions that we've been discussing on Friday and later were also reported by @Akirathan in #12257. I think the goal of this specialization is to go back to the pre-27th of January performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With develop - e.g. commit 33a9ba9 I can run:
sbt:std-benchmarks> benchOnly Column_Arithmetic_1000000.Plus_Fitting
[info] Benchmark Mode Cnt Score Error Units
[info] Column_Arithmetic_1000000.Plus_Fitting avgt 3 3.414 ± 16.068 ms/op
when running on this branch - e.g. 0c097d8 I get:
sbt:std-benchmarks> benchOnly Column_Arithmetic_1000000.Plus_Fitting
[info] Benchmark Mode Cnt Score Error Units
[info] Column_Arithmetic_1000000.Plus_Fitting avgt 3 3.156 ± 15.599 ms/op
that seems fast enough.
I'd like to see what Iterator isn't enough!
OK, so develop
branch is 10% slower than this PR. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best way to generate relevant IGV graphs is to:
enso$ JAVA_OPTS='-Dgraal.Dump=:1 -Dgraal.PrintGraph=Network' ./built-distribution/enso-engine-*/enso-*/bin/enso --run test/Benchmarks Plus_Fitting
while IGV is running. Then search for zipOverLongArrayStorages
:
and open "Before phase HighTierLowering" that one shall be relatively understandable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When comparing the XyzWithArray
approach with the previous one, we are basically talking about following change:
iff --git std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
index f0939dec81..e922ab64d2 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
@@ -724,8 +724,8 @@ public class StorageIterators {
if (skipNothing && (isNothing1 || isNothing2)) {
builder.appendNulls(1);
} else {
- long value1 = isNothing1 ? 0 : data1[index];
- long value2 = isNothing2 ? 0 : data2[index];
+ long value1 = isNothing1 ? 0 : (Long) source1.getItemBoxed(index);
+ long value2 = isNothing2 ? 0 : (Long) source2.getItemBoxed(index);
var result = operation.apply(index, value1, isNothing1, value2, isNothing2);
builder.append(result);
}
the difference in the compiler graphs is visible on these two pictures. This is the (currently) faster one:
This is the slower one:
The difference isn't in boxing! The difference is in reading additional LongStorage.data
field!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following archive zipOverLongArrayStorage.bgv.gz shows the differences.
public static <R, S, T> ColumnStorage<T> zipOverStorages( | ||
ColumnStorage<R> source1, | ||
ColumnStorage<S> source2, | ||
LongFunction<BuilderForType<T>> builderConstructor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, feel free to ignore
but seems like it may be easier to read if we have
@FunctionalInterface
public interface BuilderFactory<T> {
BuilderForType<T> build(long initialCapacity);
}
and then
LongFunction<BuilderForType<T>> builderConstructor, | |
BuilderFactory<T> builderFactory, |
std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
Outdated
Show resolved
Hide resolved
std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
Outdated
Show resolved
Hide resolved
...bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorageFacade.java
Show resolved
Hide resolved
...g/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpImplementation.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really great, just some minor style suggestions.
std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe the XyzWithArray
approach is the right one. It breaks the encapsulation - walking this path we'll soon end up with spaghetti OOP code as we had at the end of later year.
As this analysis explains the problem isn't in boxing values - box more! You just need to eliminate the additional read of LongStorage.data
field.
Take a look at #10150 for one possible way to do it.
…type. Added Zip method to StorageIterators. Added ColumnDoubleStorageWithArray. Added specialisations to StorageIterators.
…e/ColumnStorageFacade.java Co-authored-by: Jaroslav Tulach <jaroslav.tulach@enso.org>
Another way is presented in the following diff. Let's add diff --git std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
index f0939dec81..2a2d321903 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
@@ -707,9 +707,9 @@ public class StorageIterators {
LongFunction<BuilderForType<T>> builderConstructor,
boolean skipNothing,
LongZipOperation<T> operation) {
- var data1 = source1.getArray();
+ var data1 = source1.asList();
long size1 = source1.getSize();
- var data2 = source2.getArray();
+ var data2 = source2.asList();
long size2 = source2.getSize();
long size = Math.max(size1, size2);
@@ -719,13 +719,15 @@ public class StorageIterators {
Context context = Context.getCurrent();
for (int index = 0; index < size; index++) {
- boolean isNothing1 = index >= size1 || source1.isNothing(index);
- boolean isNothing2 = index >= size2 || source2.isNothing(index);
+ Long valueOrNull1 = data1.get(index);
+ Long valueOrNull2 = data2.get(index);
+ var isNothing1 = valueOrNull1 == null;
+ var isNothing2 = valueOrNull2 == null;
if (skipNothing && (isNothing1 || isNothing2)) {
builder.appendNulls(1);
} else {
- long value1 = isNothing1 ? 0 : data1[index];
- long value2 = isNothing2 ? 0 : data2[index];
+ var value1 = isNothing1 ? 0 : valueOrNull1;
+ var value2 = isNothing2 ? 0 : valueOrNull2;
var result = operation.apply(index, value1, isNothing1, value2, isNothing2);
builder.append(result);
}
diff --git std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningBigDecimal.java std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningBigDecimal.java
index 29cb63ea58..93a434a432 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningBigDecimal.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningBigDecimal.java
@@ -1,5 +1,7 @@
package org.enso.table.data.column.operation.map.numeric.arithmetic;
+import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigDecimal;
+
import java.math.BigDecimal;
import org.enso.base.polyglot.NumericConverter;
import org.enso.table.data.column.builder.Builder;
@@ -10,8 +12,6 @@ import org.enso.table.data.column.storage.*;
import org.enso.table.data.column.storage.numeric.BigDecimalStorage;
import org.enso.table.data.column.storage.numeric.BigIntegerStorage;
-import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigDecimal;
-
public abstract class NumericBinaryOpReturningBigDecimal<
T extends Number, I extends Storage<? super T>>
extends BinaryMapOperation<T, I> {
diff --git std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningDouble.java std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningDouble.java
index 4cb8acc709..74158cbf5f 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningDouble.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/arithmetic/NumericBinaryOpReturningDouble.java
@@ -1,6 +1,5 @@
package org.enso.table.data.column.operation.map.numeric.arithmetic;
-import java.math.BigDecimal;
import java.math.BigInteger;
import org.enso.base.polyglot.NumericConverter;
import org.enso.table.data.column.builder.Builder;
diff --git std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/comparisons/NumericComparison.java std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/comparisons/NumericComparison.java
index 9f8b1c2b7b..6ede9c46f3 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/comparisons/NumericComparison.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/operation/map/numeric/comparisons/NumericComparison.java
@@ -1,5 +1,8 @@
package org.enso.table.data.column.operation.map.numeric.comparisons;
+import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigDecimal;
+import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigInteger;
+
import java.math.BigDecimal;
import java.math.BigInteger;
import org.enso.base.CompareException;
@@ -12,15 +15,11 @@ import org.enso.table.data.column.storage.BoolStorage;
import org.enso.table.data.column.storage.ColumnDoubleStorage;
import org.enso.table.data.column.storage.ColumnLongStorage;
import org.enso.table.data.column.storage.ColumnStorage;
-import org.enso.table.data.column.storage.ColumnStorageFacade;
import org.enso.table.data.column.storage.Storage;
import org.enso.table.data.column.storage.numeric.BigDecimalStorage;
import org.enso.table.data.column.storage.numeric.BigIntegerStorage;
import org.enso.table.data.column.storage.numeric.DoubleStorageFacade;
-import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigDecimal;
-import static org.enso.table.data.column.operation.map.numeric.arithmetic.NumericBinaryOpImplementation.asBigInteger;
-
public abstract class NumericComparison<T extends Number, I extends Storage<? super T>>
extends BinaryMapOperation<T, I> {
@@ -63,9 +62,12 @@ public abstract class NumericComparison<T extends Number, I extends Storage<? su
} else if (arg instanceof BigDecimal bigDecimal) {
return switch (storage) {
case BigDecimalStorage s -> runBigDecimalMap(s, bigDecimal, problemAggregator);
- case BigIntegerStorage s -> runBigDecimalMap(asBigDecimal(s), bigDecimal, problemAggregator);
- case ColumnDoubleStorage s -> runBigDecimalMap(asBigDecimal(s), bigDecimal, problemAggregator);
- case ColumnLongStorage s -> runBigDecimalMap(asBigDecimal(s), bigDecimal, problemAggregator);
+ case BigIntegerStorage s -> runBigDecimalMap(
+ asBigDecimal(s), bigDecimal, problemAggregator);
+ case ColumnDoubleStorage s -> runBigDecimalMap(
+ asBigDecimal(s), bigDecimal, problemAggregator);
+ case ColumnLongStorage s -> runBigDecimalMap(
+ asBigDecimal(s), bigDecimal, problemAggregator);
default -> throw newUnsupported(storage);
};
} else if (NumericConverter.isCoercibleToLong(arg)) {
diff --git std-bits/table/src/main/java/org/enso/table/data/column/storage/ColumnStorage.java std-bits/table/src/main/java/org/enso/table/data/column/storage/ColumnStorage.java
index 09cc34737e..aa2b33a23e 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/storage/ColumnStorage.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/storage/ColumnStorage.java
@@ -1,5 +1,7 @@
package org.enso.table.data.column.storage;
+import java.util.AbstractList;
+import java.util.List;
import org.enso.table.data.column.storage.type.StorageType;
/** Basic interface of a column storage. */
@@ -20,4 +22,18 @@ public interface ColumnStorage<T> {
/* Gets the value at a given index. */
T getItemBoxed(long index);
+
+ default List<T> asList() {
+ return new AbstractList<T>() {
+ @Override
+ public T get(int index) {
+ return getItemBoxed(index);
+ }
+
+ @Override
+ public int size() {
+ return Math.toIntExact(getSize());
+ }
+ };
+ }
}
diff --git std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorageFacade.java std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorageFacade.java
index f4e906633f..55c6f19e0a 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorageFacade.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorageFacade.java
@@ -24,7 +24,7 @@ public class DoubleStorageFacade<T> implements ColumnDoubleStorage {
return new DoubleStorageFacade<>(parent, Long::doubleValue) {
@Override
public double getItemAsDouble(long index) throws ValueIsNothingException {
- return (double)parent.getItemAsLong(index);
+ return (double) parent.getItemAsLong(index);
}
@Override
diff --git std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/LongStorage.java std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/LongStorage.java
index 857b9c5b79..f9ec616563 100644
--- std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/LongStorage.java
+++ std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/LongStorage.java
@@ -1,6 +1,7 @@
package org.enso.table.data.column.storage.numeric;
import java.math.BigInteger;
+import java.util.AbstractList;
import java.util.BitSet;
import java.util.List;
import org.enso.base.polyglot.NumericConverter;
@@ -194,4 +195,33 @@ public final class LongStorage extends AbstractLongStorage
public long[] getArray() {
return data;
}
+
+ @Override
+ public List<Long> asList() {
+ return new LongList(data, isNothing);
+ }
+
+ private static final class LongList extends AbstractList<Long> {
+ private final long[] data;
+ private final BitSet isNothing;
+
+ private LongList(long[] data, BitSet isNothing) {
+ this.data = data;
+ this.isNothing = isNothing;
+ }
+
+ @Override
+ public Long get(int index) {
+ if (isNothing.get(index)) {
+ return null;
+ } else {
+ return data[index];
+ }
+ }
+
+ @Override
+ public int size() {
+ return data.length;
+ }
+ }
} I haven't benchmarked it deeply (I am still driving), but I think it is fast. The idea is to create a local copy of storage internals that can remain virtual - e.g. both its fields ( If this really turns to be fast, then there is no need for |
0c097d8
to
dd9c6ba
Compare
std-bits/table/src/main/java/org/enso/table/data/column/operation/StorageIterators.java
Outdated
Show resolved
Hide resolved
var iterator = source.iterator(); | ||
while (iterator.moveNext()) { | ||
if (iterator.isNothing()) { | ||
if (preserveNothing) { | ||
builder.appendNulls(1); | ||
} else { | ||
operation.apply(builder, iterator.getIndex(), Double.NaN, true); | ||
} | ||
} else { | ||
operation.apply(builder, index, source.getItemAsDouble(index), source.isNothing(index)); | ||
operation.apply(builder, iterator.getIndex(), iterator.getItemAsDouble(), false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way of operating with index seems just nicer.
But again - if we are not using it perhaps it is not worth keeping it here - seems like too early 'future-proofing' (I know I'm guilty of this as well from time to time...)
That is just an opinion though so feel free to ignore if you think it is better to have it already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree will remove in a later PR - index used for some error reporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unblocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- description of the PR is outdated, it should be fixed before another round of review
- from an API perspective:
- hard to separate API from implementation
- too many types are
public
- needlessly increases conceptual surface - too little
final
modifiers - hurts clarity
- I am not sure why we have own iterator interface.
java.util.Iterator
shall be good enough! - Any reason to keep the duality of boxed vs. unboxed methods?
Btw. there are only three combinations of access modifies that make sense in an API:
public final
- anybody "call me"!protected final
- subclasses can "call me"protected abstract
- subclasses "must implement me"
anything else does not have enough clarity.
std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/LongStorage.java
Show resolved
Hide resolved
@@ -198,4 +187,70 @@ public LongStorage widen(IntegerType widerType) { | |||
assert widerType.fits(getType()); | |||
return new LongStorage(data, (int) getSize(), getIsNothingMap(), widerType); | |||
} | |||
|
|||
/** Allow access to the underlying data array for copying. */ | |||
public long[] getArray() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an internal methods, not something exposed to everyone. This way we are just asking clueless developers to call this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently public as needed by the Union operation. Intention is to hide as the refactor continues.
} | ||
|
||
@Override | ||
public long getItemAsLong() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we continue to have this "boxed" vs. "unboxed" duality? Boxing doesn't influence the peak performance!
std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/LongStorage.java
Show resolved
Hide resolved
import org.enso.table.data.column.storage.type.StorageType; | ||
|
||
/** A facade for a column storage that converts the stored type to a long. */ | ||
public class LongStorageFacade<T> implements ColumnLongStorage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you want clueless developers to subclass, then:
public class LongStorageFacade<T> implements ColumnLongStorage { | |
public final class LongStorageFacade<T> implements ColumnLongStorage { |
std-bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorage.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this package supposed to provide an API for Storage
? Then why do we have a BoolStorage
in the API package? Why there is a separate numeric
implementation package, but no bool
implementation package?
...bits/table/src/main/java/org/enso/table/data/column/storage/numeric/DoubleStorageFacade.java
Outdated
Show resolved
Hide resolved
Going to merge but needs more work. Will pick up again on Wednesday. |
Pull Request Description
The first step to get back some of the performance from the storage refactor.
ColumnLongStorageWithArray
andColumnDoubleStorageWithArray
interfaces allowing low-level iterators to iterate over the array directly.DoubleArrayAdapter
,BigIntegerArrayAdapter
andBigDecimalArrayAdapter
in favour of havingColumnStorage
based facades allowing for some storage to present as others.NumericBinaryOpImplementation
intoStorageIterators
. Currently a lot of duplication to theNumericComparison
. Will look to make more common later.The intention is to keep working on the common iteration layer as build up the operator space.
Various tests on the performance impact of various parts (tested on my PC, over 1,000,000 additions, only provided for reference):
Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
TypeScript,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
or the Snowflake database integration, a run of the Extra Tests has been scheduled.