Refactor ragged tensor processing for readability and improved performance #104
Conversation
Remove `test_nested_list` from tensorflow dataloader tests: This functionality is currently unsupported, with undefined behaviour. This test was returning different shapes between CPU/GPU due to the difference in the implementation of `pull_apart_list`.
Add squeeze to batch before calling loss: Otherwise this results in implicit broadcasting that consumes a large amount of memory.
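To illustrate the broadcasting pitfall that commit message refers to, here is a minimal standalone sketch (illustrative tensors, not the dataloader's code): when predictions have shape `(n,)` and targets have shape `(n, 1)`, an elementwise op broadcasts to an `(n, n)` matrix.

```python
import torch

n = 4
pred = torch.randn(n)       # shape (n,)
target = torch.randn(n, 1)  # shape (n, 1)

# The subtraction broadcasts (n,) against (n, 1) into an (n, n) matrix,
# silently consuming O(n^2) memory for large batches.
assert (pred - target).shape == (n, n)

# Squeezing the trailing dim keeps the op elementwise.
assert (pred - target.squeeze(-1)).shape == (n,)
```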
```diff
@@ -98,63 +98,6 @@ def test_simple_model():
     _ = model.evaluate(loader)


-def test_nested_list():
```
Just want to make sure I understand this correctly: we remove this test because the ragged representation is no longer supported and the dataloader now always outputs a tuple of values and offsets, correct?
This PR doesn't change the output representation; the output types and shapes should be the same as before, apart from the torch version, which now outputs the offsets with the last value correctly and consistently returns scalar columns as 2-d arrays with a last dim of size 1.
The reason for the removal of the test is mostly that it is now broken due to our implementation of `pull_apart_list` behaving in a different way between cudf and pandas. I'll try restoring these two lines https://github.com/NVIDIA-Merlin/dataloader/blob/v0.0.4/merlin/dataloader/loader_base.py#L577-L579 and see if I can restore this test.
The current implementation, and what this test is checking, isn't really nested list support. It currently loses the information about the nesting during the transformation, and what you get out of the dataloader cannot be turned back into the nested lists in the original data.
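As a standalone illustration of that information loss (a sketch, not the dataloader's actual transformation): flattening a doubly-nested column to leaf values plus a single level of row offsets discards the inner boundaries.

```python
# Two rows of lists-of-lists.
nested = [[[1, 2], [3]], [[4, 5]]]

# Flatten all leaves and keep only row-level boundaries.
values = [v for row in nested for inner in row for v in inner]  # [1, 2, 3, 4, 5]
offsets = [0]
for row in nested:
    offsets.append(offsets[-1] + sum(len(inner) for inner in row))  # [0, 3, 5]

# The inner split points ([1, 2] vs [3]) are gone: values/offsets alone
# cannot reconstruct the original doubly-nested structure.
```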
Updated to restore the test `test_nested_list`, and tests are all passing. We can leave this to another PR if we want to change the behaviour. Thanks for checking on this.
Thanks for the explanation.
This is awesome! I have one more comment.
```python
tensors_by_name[column_name] = self._to_tensor(leaves), self._to_tensor(
    col_offsets
)
```
Super nit: I think the black formatting makes this harder to read (at least for me). I wonder if using parens or `tuple()`, e.g. `tensors_by_name[column_name] = (self._to_tensor(leaves), self._to_tensor(col_offsets))`, improves readability with black.
```python
if HAS_GPU:
    offsets = torch.cat([offsets, torch.cuda.LongTensor([len(values)], device=self.device)])
else:
    offsets = torch.cat([offsets, torch.LongTensor([len(values)])])
```
I remember there was some issue with NVTabular and/or T4Rec in the multi-GPU case, and `device=self.device` was added to line 149. What do you think about still keeping something like `if HAS_GPU: offsets = offsets.to(device=self.device)`?
On second thought, it was probably an issue because of `torch.cat` with `offsets` and the other tensor being on different devices, so it should be okay with the refactor.
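For context, a standalone sketch (not code from this repo): `torch.cat` does require all inputs on the same device, which is presumably why the old GPU branch constructed the appended element with `device=self.device`.

```python
import torch

if torch.cuda.is_available():
    gpu_offsets = torch.tensor([0, 2, 5], device="cuda")
    cpu_tail = torch.tensor([7])  # CPU tensor
    try:
        torch.cat([gpu_offsets, cpu_tail])  # mixing devices raises an error
    except RuntimeError as err:
        print(err)  # torch.cat expects all tensors on the same device
    # Moving the tail to the offsets' device fixes it.
    torch.cat([gpu_offsets, cpu_tail.to(gpu_offsets.device)])
```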
The reason for the removal of the lines doing `torch.cat` is that the offsets previously were missing the last value. After this change, the offsets are computed in the `_row_lengths_to_offests` method, which doesn't require this logic. There's still a `torch.cat` in use in that method, which concatenates the cumulative sum of the row lengths with a leading zero.
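As a minimal sketch of that computation (a generic illustration with an assumed 1-d row-lengths tensor, not the method's actual source):

```python
import torch

def row_lengths_to_offsets(row_lengths: torch.Tensor) -> torch.Tensor:
    # Concatenate a leading zero with the cumulative sum of the row lengths.
    zero = torch.zeros(1, dtype=row_lengths.dtype, device=row_lengths.device)
    return torch.cat([zero, row_lengths.cumsum(0)])

# For rows of lengths [2, 3, 1] this yields [0, 2, 5, 6]; the final entry
# equals the total number of values, so the last offset no longer needs
# to be appended separately.
print(row_lengths_to_offsets(torch.tensor([2, 3, 1])))  # tensor([0, 2, 5, 6])
```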
Refactor ragged tensor processing for readability and improved performance (#104)

* Update ragged tensor handling to improve load performance
* Store values/row_lengths in dict in `make_tensors`
* Implement `_sum` method for jax
* Rename `_handle_tensors` to `_process_batch`
* Rename `_create_tensors` to `_process_dataframe`
* Update docstring of `make_tensors`
* Handle case where `HAS_GPU=True` and `CUDA_VISIBLE_DEVICES=""`
* Enable offsets row-partition output for torch ragged tensors
* Return 2-d scalars from torch dataloader consistently
* Remove `test_nested_list` from tensorflow dataloader tests. This functionality is currently unsupported, with undefined behaviour. This test was returning different shapes between CPU/GPU due to the difference in the implementation of `pull_apart_list`.
* Add squeeze to torch embeddings lookup to handle 2-d keys
* Add squeeze to batch before calling loss. Otherwise this results in implicit broadcasting that consumes a large amount of memory.
* Revert "Remove `test_nested_list` from tensorflow dataloader tests". This reverts commit 5aa7104.
* Restore nested list handling for pandas
Refactor ragged tensor processing for readability and improved performance.
Performance
Loading of list columns is now roughly 10x faster.
Torch
This change fixes a couple of things that were inconsistent about the Torch loader.
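As a hedged sketch of what those fixes mean for consumers (hypothetical column names and values, mimicking the output contract described above rather than the loader's actual API):

```python
import torch

# Hypothetical batch shaped like the torch loader's output after this PR.
features = {
    "item_ids": (torch.tensor([10, 11, 12, 13, 14]),  # ragged column: values
                 torch.tensor([0, 2, 5])),            # ragged column: offsets
    "user_age": torch.tensor([[25], [31]]),           # scalar column, 2-d
}

values, offsets = features["item_ids"]
assert offsets[-1] == len(values)  # offsets now include the final boundary

scalar = features["user_age"]
assert scalar.dim() == 2 and scalar.shape[1] == 1  # consistently 2-d, last dim 1
```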