
FEA: engine accepts dpnp.ndarray and dpt.usm_ndarray objects as input data. #62

Merged: @fcharras merged 9 commits into main from accept_dpnp_dpt_inputs on Nov 29, 2022

Conversation

@fcharras (Collaborator) commented Nov 23, 2022

I think I'm happy with the result. The code is a little messy, but no more than sklearn's, and the PR basically succeeds in mimicking almost all aspects of sklearn's UX: what inputs are accepted, how they are cast under the hood, and the cautiousness about memory copies that matters for performance. On top of that, it adds support for dpnp.ndarray and dpt.usm_ndarray inputs while reusing as much of sklearn's existing code as possible.

The PR still needs some polishing, and a few tests with dpnp/dpt tensors.

I think one (minor) difference is that, while sklearn tries to convert numpy arrays with object dtype to float64, our engine will error out in this case because dpt.asarray refuses object dtype as input. These kinds of edge cases are complicated to unify because different array libraries can make different choices, and I think it's fine to let it fail in this case.
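A minimal sketch of that difference, assuming dpctl is installed (the exact exception type raised by dpt.asarray may vary across versions):

    import numpy as np
    import dpctl.tensor as dpt

    X = np.asarray([["1", "2"], ["3", "4"]], dtype=object)

    # sklearn's validation coerces object dtype to float64 when possible
    X_np = np.asarray(X, dtype=np.float64)

    # dpt.asarray refuses object dtype, so the engine errors out instead
    try:
        dpt.asarray(X)
    except Exception as e:
        print(type(e).__name__, e)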

Commit: …o pass dpnp.ndarray and dpt.usm_ndarray objects as input data.
benchmark/ext_helpers/daal4py.py (outdated comment thread, resolved)

@lru_cache
def make_sum_reduction_2d_axis1_kernel(
    size0, size1, work_group_size, device, dtype, fused_unary_func=None
@fcharras (Collaborator, author) commented Nov 23, 2022

The changes to this kernel enable the addition of the new argument fused_unary_func, which allows fusing a unary op on the elements of the input before summing. It's used in this PR to compute the variance for scaling the tolerance.
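For context, a NumPy-level sketch of what fusing means here (reference semantics only, not the numba_dpex kernel: the real fused kernel applies the op element by element inside the reduction instead of materializing the intermediate array):

    import numpy as np

    def sum_reduction_2d_axis1(x, fused_unary_func=None):
        # reference semantics only: the fused kernel applies fused_unary_func
        # inside the reduction, avoiding the extra pass over the data (and the
        # temporary array) that this NumPy version creates
        if fused_unary_func is not None:
            x = fused_unary_func(x)
        return x.sum(axis=1)

    X = np.arange(12, dtype=np.float64).reshape(3, 4)
    print(sum_reduction_2d_axis1(X, fused_unary_func=np.square))  # row-wise sum of squares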

@@ -1,14 +1,29 @@
import warnings
@fcharras (Collaborator, author) commented Nov 23, 2022

With these changes (which amount to almost rewriting the whole file, but that was expected: the previous sequence of refactorings was building toward this), sklearn's behavior regarding input validation is implemented almost identically.

The old behavior is removed (in particular, the warning that was printed when avoidable copies were implicitly triggered).

@@ -364,7 +366,7 @@ def _get_score_with_centers(centers):
[
-1827.22702,
-1027.674243,
-865.257397,
-865.257501,
@fcharras (Collaborator, author) commented:

The difference in those values comes from numerical instability after including the X_mean removal step.
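A small NumPy illustration of this kind of round-off (assumed float32 data; pairwise squared distances are translation-invariant algebraically, but not bitwise in floating point):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 10)).astype(np.float32)
    centers = X[:10]

    # the same quantity computed directly and on mean-centered data
    d_raw = ((X[:, None, :] - centers[None, :, :]) ** 2).sum()
    X_mean = X.mean(axis=0)
    Xc = X - X_mean
    cc = centers - X_mean
    d_centered = ((Xc[:, None, :] - cc[None, :, :]) ** 2).sum()

    print(d_raw, d_centered)  # equal up to float32 round-off, not bitwise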

), override_attr_context(
    sklearn_validation,
    get_namespace=_get_namespace,
    _asarray_with_order=_asarray_with_order,
@fcharras (Collaborator, author) commented Nov 23, 2022

Monkey-patching here enables an efficient workflow overall. I wonder whether, rather than keeping those two functions limited to the Array API spec, we should also consider making them pluggable.
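A hedged usage sketch of the pattern (the two replacement helpers below are stand-ins, and patching these attribute names on sklearn.utils.validation assumes a sklearn version that defines them there, as this PR does):

    import dpctl.tensor as dpt
    import sklearn.utils.validation as sklearn_validation
    from sklearn_numba_dpex.testing.config import override_attr_context

    def _get_namespace(*arrays):
        # stand-in: report dpt as the array namespace for usm_ndarray inputs
        return dpt, True

    def _asarray_with_order(array, dtype=None, order=None, copy=None, xp=None):
        # stand-in: route conversions through dpt.asarray
        return dpt.asarray(array, dtype=dtype, order=order)

    with override_attr_context(
        sklearn_validation,
        get_namespace=_get_namespace,
        _asarray_with_order=_asarray_with_order,
    ):
        pass  # sklearn validation code called here sees the patched helpers
    # the original attributes are restored on exit, even if an error was raised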

@fcharras (Collaborator, author) commented:

Out of WIP

@fcharras fcharras requested review from ogrisel and jjerphan November 24, 2022 14:18
@fcharras changed the title from "[Draft] FEA: engine accepts dpnp.ndarray and dpt.usm_ndarray objects as input data." to "FEA: engine accepts dpnp.ndarray and dpt.usm_ndarray objects as input data." on Nov 24, 2022
@jjerphan (Member) left a comment

Thank you, @fcharras. Here is a first pass.

Comment on lines +34 to +45
try:
    attrs_before = dict()
    for attr_name, attr_value in attrs.items():
        # raise AttributeError if obj does not have the attribute attr_name
        attrs_before[attr_name] = getattr(obj, attr_name)
        setattr(obj, attr_name, attr_value)

    yield

finally:
    for attr_name, attr_value in attrs_before.items():
        setattr(obj, attr_name, attr_value)
@jjerphan (Member) commented:

How about this?

Suggested change (replacing the try/finally above):

    attrs_before = dict()
    for attr_name, attr_value in attrs.items():
        attribute = getattr(obj, attr_name, None)
        if attribute is not None:
            # Only replace the value if the attribute `attr_name` does exist.
            attrs_before[attr_name] = attribute
            setattr(obj, attr_name, attr_value)

    yield

    for attr_name, attr_value in attrs_before.items():
        setattr(obj, attr_name, attr_value)

@fcharras (Collaborator, author) commented Nov 25, 2022

The try/finally is not only about the AttributeError that was documented, but about any error that can be raised after yield and before exiting the context. I initially thought that the code after yield was always executed before exiting, the way __exit__ is for explicit context managers, but when such errors occur that turns out not to be true, and it caused failing tests (in the sklearn pipeline). Adding the try/finally fixes that.

Regarding only replacing the value if attr_name does exist: I'd prefer not to. Raising an AttributeError is a safeguard against typos or unseen changes in the input objects. The context manager is only intended to replace attributes that exist, and I like it to fail otherwise.
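A minimal, self-contained sketch of the failure mode described here: without the try/finally, an exception raised in the with body is thrown into the generator at the yield point, so the restore loop after yield never runs.

    import types
    from contextlib import contextmanager

    @contextmanager
    def broken_override(obj, **attrs):
        # no try/finally: cleanup below is skipped if the with body raises
        before = {name: getattr(obj, name) for name in attrs}
        for name, value in attrs.items():
            setattr(obj, name, value)
        yield
        for name, value in before.items():  # skipped on exception!
            setattr(obj, name, value)

    cfg = types.SimpleNamespace(flag=False)
    try:
        with broken_override(cfg, flag=True):
            raise RuntimeError("error after yield")
    except RuntimeError:
        pass
    print(cfg.flag)  # True: the original value was never restored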

Comment on lines 346 to 347
engine_kmeans_plusplus_centers = engine.init_centroids(X_prepared)
engine_kmeans_plusplus_centers = dpt.asnumpy(engine_kmeans_plusplus_centers.T)
@jjerphan (Member) commented:

Do we need to _t-suffix one of them?

Comment on lines 401 to 402
centers, indices = engine._kmeans_plusplus(X_prepared)
centers = dpt.asnumpy(centers.T)
@jjerphan (Member) commented:

Similarly, do we need to _t-suffix one of them?

@@ -172,12 +171,16 @@ def make_sum_reduction_2d_axis1_kernel(size0, size1, work_group_size, device, dt
minus_one_idx = np.int64(-1)
two_as_a_long = np.int64(2)

is_1d = size1 is None
if fused_unary_func is None:
@jjerphan (Member) commented:

Can you document fused_unary_func, please?

@fcharras (Collaborator, author) commented Nov 25, 2022

It's documented on the public function that follows. For readability it would be better, I think, to swap those two definitions, but that would have increased the diff and masked the true changes.

Comment on lines 422 to 423
if (X_mean == 0).astype(int).sum() == len(X_mean):
    X_mean = None
@jjerphan (Member) commented:

Can you document why None is conventionally used in this case?
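For context, a hedged illustration of the None-as-sentinel convention presumably at play here (maybe_center is a hypothetical helper, mirroring how sklearn's KMeans skips centering when there is nothing to subtract):

    import numpy as np

    def maybe_center(X_t, X_mean):
        if X_mean is None:
            return X_t  # nothing to do: no extra kernel launch, no copy
        return X_t - X_mean[:, None]

    X_t = np.zeros((3, 5))
    X_mean = X_t.mean(axis=1)
    if (X_mean == 0).astype(int).sum() == len(X_mean):
        X_mean = None  # signals downstream code to skip centering entirely
    assert maybe_center(X_t, X_mean) is X_t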

Comment on lines 446 to 448
n_features * n_samples,
None,
max_work_group_size,
@jjerphan (Member) commented:

Suggested change:

    size0=n_features * n_samples,
    size1=None,
    work_group_size=max_work_group_size,

Comment on lines 479 to 484
def _minus(x, y):
    return x - y


def _plus(x, y):
    return x + y
@jjerphan (Member) commented:

Can you group and document these helper functions for ops?
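A hedged sketch of what such a grouping could look like (the docstring wording is illustrative, not what was merged):

    # Elementwise binary operators passed to kernel factories. Kept as plain
    # functions so that numba_dpex can compile them inside generated kernels.

    def _plus(x, y):
        """Return x + y; used to build kernels that add a broadcast operand."""
        return x + y

    def _minus(x, y):
        """Return x - y; used to build kernels that subtract a broadcast operand."""
        return x - y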

sklearn_numba_dpex/kmeans/drivers.py (comment thread resolved)
Comment on lines 114 to 117
# NB: inplace. Optimized for C-contiguous array and for
# size1 >> preferred_work_group_size_multiple
@dpex.kernel
def broadcast_ops(left_operand_array, right_operand_vector):
@jjerphan (Member) commented:

I think it's worth indicating that the left operand is modified in place and that the right one isn't modified at all.

fcharras and others added 2 commits November 25, 2022 15:04
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@fcharras (Collaborator, author) commented:

Pushed your suggestions, and also amended some changes to enable compute follows data (the preferred device for compute is the device that stores the data, when the data is already on-device).

@jjerphan (Member) left a comment

LGTM up to a few suggestions and questions.

Thank you, @fcharras!

# future instances. It is only used for testing purposes, using
# `sklearn_numba_dpex.testing.config.override_attr_context` context, for instance
# in the benchmark script.
# For normal usage, the compute will follow the __compute_follow_data__ principle.
@jjerphan (Member) commented:

Is "__compute_follow_data__" an emphasis, here?
Do you have a reference for this concept?

@fcharras (Collaborator, author) commented Nov 29, 2022

Yes, let me replace it with *compute follows data*, which would be more correct Markdown.

The concept is anchored in the design of both dpctl and numba_dpex.
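A minimal dpctl sketch of the principle, assuming a CPU SYCL device is available:

    import dpctl.tensor as dpt

    # Compute follows data: the queue the operands were allocated on
    # determines where the computation runs; there is no per-call device
    # argument.
    x = dpt.ones(1024, dtype=dpt.float32, device="cpu")
    y = x + 1  # executes on the same device/queue as x
    print(y.device)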

Comment on lines +75 to +84
# NB: numba_dpex kernels currently only support working with a C memory layout
# (see https://github.com/IntelPython/numba-dpex/issues/767), but our KMeans
# implementation is hypothesized to be more efficient with an F memory layout.
# As a workaround, the kernels work with the transpose of X, X_t, where X_t
# is created with a C layout, which results in memory access patterns
# equivalent to those of an F layout for X.
# TODO: when numba_dpex supports inputs with F-layout:
# - use X rather than X_t and adapt the codebase (better for readability and
#   more consistent with sklearn notations)
# - test the performances with both layouts and use the best performing one.
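A NumPy sketch of the layout workaround this comment describes (NumPy stands in for dpt here just to show the layout equivalence):

    import numpy as np

    X = np.random.rand(1000, 4)        # n_samples x n_features, C layout
    X_t = np.ascontiguousarray(X.T)    # n_features x n_samples, C layout
    assert X_t.flags.c_contiguous
    # X_t[j] is the j-th feature across all samples, stored contiguously:
    # the same access pattern as the column X[:, j] of an F-ordered X.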
@jjerphan (Member) commented:

Thanks!

)
# subsequent kernel calls only sum the data.
nofunc_kernel = _make_partial_sum_reduction_2d_axis1_kernel(
    n_rows, work_group_size, None, dtype
@jjerphan (Member) commented:

Suggested change:

    n_rows, work_group_size, fused_unary_func=None, dtype=dtype

Comment on lines 423 to 424

if (X_mean == 0).astype(int).sum() == len(X_mean):
@jjerphan (Member) commented:

Suggested change:

    X_mean_is_zeroed = (X_mean == 0).astype(int).sum() == len(X_mean)
    if X_mean_is_zeroed:

@@ -401,6 +404,80 @@ def _relocate_empty_clusters(
)


def prepare_data_for_lloyd(X_t, init, tol, copy_x):
@jjerphan (Member) commented:

Can you document prepare_data_for_lloyd indicating that this is centering X for numerical stability?

@fcharras (Collaborator, author) commented:

✔️ mostly carried over from the sklearn docstring
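For reference, a hedged sketch of what such a docstring covers, based on sklearn's KMeans preprocessing rather than the exact text merged here:

    def prepare_data_for_lloyd(X_t, init, tol, copy_x):
        """Prepare input data for the Lloyd iterations.

        Mirrors sklearn's KMeans preprocessing:
        - center the data on its feature-wise mean to improve the numerical
          stability of the distance computations (X_mean is set to None when
          the mean is already zero, so downstream code can skip un-centering);
        - scale `tol` by the mean variance of the features, so the convergence
          criterion does not depend on the scale of the data;
        - honor `copy_x`: only center in place when a copy is allowed or
          has already been made.
        """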

Comment on lines +160 to +161
# NB: sampling without replacement must be executed sequentially so
# it's better done on CPU
@jjerphan (Member) commented:

👍

Comment on lines 165 to 170
# Poor man's fancy indexing
# TODO: write a kernel ? or replace with better equivalent when available ?
centers_t = dpt.concat(
    [dpt.expand_dims(X[center_idx], axes=1) for center_idx in centers_idx],
    axis=1,
)
@jjerphan (Member) commented:

Should we open an issue on dpctl? What do you think?

@fcharras (Collaborator, author) commented:

dpctl implements the Array API, and the Array API doesn't list such tools. The best thing to have would be take; here is the list of supported functions.

I'm not aware enough of where the Array API is going to say what the best action would be here 🤔

@fcharras (Collaborator, author) commented:

Actually, take has been added recently to the Array API; there is a dpctl issue tracking it.
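Assuming a dpctl version that ships dpt.take (it landed after this discussion), the concat-based workaround above could reduce to a single call:

    import dpctl.tensor as dpt

    X = dpt.reshape(dpt.arange(20, dtype=dpt.float32), (5, 4))
    centers_idx = dpt.asarray([3, 0, 4])
    # select the sampled rows in one call, then transpose as before
    centers_t = dpt.take(X, centers_idx, axis=0).T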

Comment on lines 186 to 188
use_uniform_weights = (sample_weight == sample_weight[0]).astype(
    int
).sum() == len(sample_weight)
@jjerphan (Member) commented:

Suggested change:

    use_uniform_weights = (
        (sample_weight == sample_weight[0]).astype(int).sum()
        == len(sample_weight)
    )

@fcharras (Collaborator, author) commented Nov 29, 2022

black doesn't like this, but it does accept

        use_uniform_weights = (
            (sample_weight == sample_weight[0]).astype(int).sum()
        ) == len(sample_weight)

I'll go for it.

def get_labels(self, X, sample_weight):
    labels, _ = self._get_labels_inertia(X, with_inertia=False)
    # TODO: sample_weight actually not used for get_labels. Fix in sklearn ?
@jjerphan (Member) commented:

Is there an issue already open for it?


Comment on lines 304 to 306
    sample_weight = dpt.ones(n_samples, dtype=dtype, device=device)
elif isinstance(sample_weight, numbers.Number):
    sample_weight = dpt.full(n_samples, 1, dtype=dtype, device=device)
@jjerphan (Member) commented:

What are the reasons not to have only one branch here?

@fcharras (Collaborator, author) commented:

A mistake! The latter line should read:

Suggested change:

        sample_weight = dpt.ones(n_samples, dtype=dtype, device=device)
    elif isinstance(sample_weight, numbers.Number):
        sample_weight = dpt.full(n_samples, sample_weight, dtype=dtype, device=device)

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@fcharras (Collaborator, author) commented:

Thanks for the review, @jjerphan. I've pushed your suggestions and answered the other comments. I'll go ahead and merge when the pipeline is green.

@fcharras fcharras merged commit 4489f5a into main Nov 29, 2022
@fcharras fcharras deleted the accept_dpnp_dpt_inputs branch November 29, 2022 12:54