Incorrect indices error when updating cube #413

Closed
lr4d opened this issue Feb 16, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@lr4d
Collaborator

lr4d commented Feb 16, 2021

Problem description

It looks like #398 has broken backwards compatibility for users of the kartothek.io.**_cube functions.

Unit test stacktrace:

/tmp/venv/raq/raqbundle/core_data_snapshotter/build_datasets.py:686: in update_cube
    metadata=ktk_cube_metadata,
/tmp/venv/lib/python3.6/site-packages/kartothek/io/dask/bag_cube.py:489: in update_cube_from_bag
    df_serializer=df_serializer,
/tmp/venv/lib/python3.6/site-packages/kartothek/io/dask/common_cube.py:418: in append_to_cube_from_bag_internal
    cube = ensure_valid_cube_indices(existing_datasets, cube)
/tmp/venv/lib/python3.6/site-packages/kartothek/io/dask/common_cube.py:75: in ensure_valid_cube_indices
    compatible_indices = _ensure_compatible_indices(ds, table_indices)
/tmp/venv/lib/python3.6/site-packages/kartothek/io_components/utils.py:125: in _ensure_compatible_indices
    f"Incorrect indices provided for dataset.\n"
E   ValueError: Incorrect indices provided for dataset.
E   Expected: []
E   But got: {'XXX', 'YYY'}

lr4d added the bug label Feb 16, 2021
lr4d changed the title from "Incorrect indices error for when updating cube" to "Incorrect indices error when updating cube" Feb 16, 2021
@fjetter
Collaborator

fjetter commented Feb 16, 2021

Do you have a minimal example? Otherwise this is impossible to reproduce.

@lr4d
Collaborator Author

lr4d commented Feb 17, 2021

Of course, I'll post a minimal example today. I just wanted to create the issue right away so that I could link to it.

@stephan-hesselmann-by
Collaborator

The issue occurs when updating a cube that has been created with multiple datasets. The problem is in ensure_valid_cube_indices: it mutates the required indices while the loop over the datasets still depends on the unmutated value. It should be relatively straightforward to fix. I'll check on Monday whether there are further issues and provide a fix for this one.
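
To illustrate the failure mode described above, here is a minimal sketch. It is not kartothek's actual code: the dataset layout, `check_compatible`, `validate_buggy` and `validate_fixed` are hypothetical stand-ins, and the real `ensure_valid_cube_indices` may differ in detail. The sketch assumes that a per-dataset check rejects any requested index that the dataset does not already have, and that the loop accumulates indices found in earlier datasets.

```python
from typing import Dict, Iterable, Set

# Hypothetical stand-in for a dataset: its columns and which of them are indexed.
Dataset = Dict[str, Set[str]]

DATASETS: Dict[str, Dataset] = {
    "seed": {"columns": {"x", "y", "v"}, "indices": {"x", "y"}},
    "enrich": {"columns": {"x", "y", "w"}, "indices": set()},
}


def check_compatible(existing: Set[str], requested: Set[str]) -> Set[str]:
    """Toy per-dataset check: every requested index must already exist."""
    if requested and not requested <= existing:
        raise ValueError(
            "Incorrect indices provided for dataset.\n"
            f"Expected: {sorted(existing)}\nBut got: {sorted(requested)}"
        )
    return existing


def validate_buggy(datasets: Dict[str, Dataset], index_columns: Iterable[str]) -> Set[str]:
    required = set(index_columns)
    for ds in datasets.values():
        # Reads `required`, which earlier iterations have already grown,
        # so indices of one dataset leak into the check of the next one.
        table_indices = required & ds["columns"]
        required |= check_compatible(ds["indices"], table_indices)
    return required


def validate_fixed(datasets: Dict[str, Dataset], index_columns: Iterable[str]) -> Set[str]:
    requested = frozenset(index_columns)  # never mutated inside the loop
    found = []
    for ds in datasets.values():
        table_indices = requested & ds["columns"]
        found.append(check_compatible(ds["indices"], set(table_indices)))
    # Merge the per-dataset results only after every dataset was checked.
    return set(requested).union(*found)
```

With `DATASETS` as defined and no user-requested index columns, `validate_buggy(DATASETS, [])` raises the `ValueError` when it reaches the second dataset, because `x` and `y` were picked up from the first one. `validate_fixed(DATASETS, [])` checks each dataset against the original request only and returns `{"x", "y"}`.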

stephan-hesselmann-by added a commit to stephan-hesselmann-by/kartothek that referenced this issue Feb 22, 2021
The issue occurred when updating a cube with multiple datasets, which also have different dimension columns.

This fixes issue JDASoftwareGroup#413.
stephan-hesselmann-by added a commit to stephan-hesselmann-by/kartothek that referenced this issue Feb 22, 2021
The issue occurred when updating a cube with multiple datasets, which also have different dimension columns.

This fixes issue JDASoftwareGroup#413.

The bug is caused by access of a mutated variable - namely `required_indices` via `table_indices` - in the loop.
I rewrote the loop to circumvent this problem and added a unit test which verifies that the index validation is working as expected.
fjetter pushed a commit that referenced this issue Feb 23, 2021
* Fix: Cube index validation (#413)

The issue occurred when updating a cube with multiple datasets, which also have different dimension columns.

This fixes issue #413.

The bug is caused by access of a mutated variable - namely `required_indices` via `table_indices` - in the loop.
I rewrote the loop to circumvent this problem and added a unit test which verifies that the index validation is working as expected.

* Expand unit tests

* Add unit test for index suppression