-
Notifications
You must be signed in to change notification settings - Fork 49
fix: read_csv with both index_col and use_cols inconsistent with pandas #1785
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
032f193
to
8d6d9ee
Compare
0785cf8
to
344d6c9
Compare
bigframes/session/__init__.py
Outdated
index_col=index_col, | ||
columns=columns, | ||
names=names, | ||
is_index_in_columns=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by this parameter name. Wouldn't the read_gbq_table
function be able to figure out that the index columns are present already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to index_col_in_columns
and added docstring.
bigframes/session/loader.py
Outdated
@@ -96,7 +96,9 @@ def _to_index_cols( | |||
return index_cols | |||
|
|||
|
|||
def _check_column_duplicates(index_cols: Iterable[str], columns: Iterable[str]): | |||
def _check_column_duplicates( | |||
index_cols: Iterable[str], columns: Iterable[str], is_index_in_columns: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at the logic, I still don't understand the is_index_in_columns
name. If there isn't a better name, could we at least add some docstrings with more information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to index_col_in_columns
and added docstring.
tests/system/small/test_session.py
Outdated
|
||
# BigFrames requires `sort_index()` because BigQuery doesn't preserve row IDs | ||
# (b/280889935) or guarantee row ordering. | ||
bf_df = bf_df.sort_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we sort by the index already if we determine it's unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catches! Removed it from all similar tests.
344d6c9
to
7e59b20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Fixes internal issue 408499371 🦕