fix: read_csv with both index_col and use_cols inconsistent with pandas #1785

chelsea-lin · 2025-05-30T22:00:19Z

Fixes internal issue 408499371 🦕
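
For context, here is a minimal sketch of the pandas behavior the title refers to. The CSV contents and column names are made up for illustration, and the snippet uses only pandas, not BigFrames. In pandas, `usecols` selects columns, `index_col` promotes one of the selected columns to the index, and the order of the names in `usecols` is ignored in favor of file order.

```python
import io

import pandas as pd

# Hypothetical data; the column names are assumptions for this example.
CSV_TEXT = "id,a,b,c\n1,10,x,0.5\n2,20,y,1.5\n"

# The index column is listed in usecols as well, which pandas accepts.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    index_col="id",
    usecols=["c", "id", "a"],
)

print(df.index.name)     # id
print(list(df.columns))  # ['a', 'c'] (file order, not usecols order)
```

The index column does not appear among the data columns of the result, even though it was named in `usecols`.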

tswast · 2025-06-04T17:20:24Z

bigframes/session/__init__.py

+            index_col=index_col,
+            columns=columns,
+            names=names,
+            is_index_in_columns=True,


I'm a bit confused by this parameter name. Wouldn't the read_gbq_table function be able to figure out that the index columns are present already?

chelsea-lin · 2025-06-05

Renamed to index_col_in_columns and added a docstring.

tswast · 2025-06-04T17:21:42Z

bigframes/session/loader.py

@@ -96,7 +96,9 @@ def _to_index_cols(
    return index_cols


-def _check_column_duplicates(index_cols: Iterable[str], columns: Iterable[str]):
+def _check_column_duplicates(
+    index_cols: Iterable[str], columns: Iterable[str], is_index_in_columns: bool


After looking at the logic, I still don't understand the is_index_in_columns name. If there isn't a better name, could we at least add some docstrings with more information?

chelsea-lin · 2025-06-05

Renamed to index_col_in_columns and added a docstring.
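
To make the naming question concrete, here is a hypothetical sketch of what a flag like this could control. It is not the code from this PR: the function name `check_column_duplicates_sketch`, the error messages, and the handling of an empty `columns` list are all assumptions. The idea is that when the flag is true, the caller (for example a `read_csv`-style path with `usecols`) passes index columns as part of `columns`, so overlap is expected; otherwise overlap is an error.

```python
from collections import Counter
from typing import Iterable, List


def check_column_duplicates_sketch(
    index_cols: Iterable[str],
    columns: Iterable[str],
    index_col_in_columns: bool,
) -> List[str]:
    """Hypothetical illustration only; not the bigframes implementation.

    Returns the data columns that remain after validation.
    """
    index_cols = list(index_cols)
    columns = list(columns)

    # Neither list may contain the same name twice.
    for label, names in (("index_cols", index_cols), ("columns", columns)):
        dupes = sorted(n for n, count in Counter(names).items() if count > 1)
        if dupes:
            raise ValueError(f"Duplicate names in {label}: {dupes}")

    # Assumption: an empty `columns` means "no explicit selection".
    if not columns:
        return columns

    overlap = set(index_cols) & set(columns)
    if index_col_in_columns:
        # Index columns are expected to be listed in `columns`;
        # they are removed so they are not returned as data columns.
        missing = [c for c in index_cols if c not in overlap]
        if missing:
            raise ValueError(f"Index columns not found in columns: {missing}")
        return [c for c in columns if c not in overlap]

    # Otherwise the two lists must be disjoint.
    if overlap:
        raise ValueError(f"Columns used as both index and data: {sorted(overlap)}")
    return columns


print(check_column_duplicates_sketch(["id"], ["id", "a", "b"], True))
print(check_column_duplicates_sketch(["id"], ["a", "b"], False))
```

Under these assumptions the flag cannot be inferred from the inputs alone, because the same overlapping input is valid in one mode and an error in the other.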

tswast · 2025-06-04T17:22:48Z

tests/system/small/test_session.py

+
+    # BigFrames requires `sort_index()` because BigQuery doesn't preserve row IDs
+    # (b/280889935) or guarantee row ordering.
+    bf_df = bf_df.sort_index()


Don't we sort by the index already if we determine it's unique?

chelsea-lin · 2025-06-05

Good catch! Removed it from all similar tests.

tswast

Thanks!

product-auto-label bot added the size: m and api: bigquery labels May 30, 2025

chelsea-lin force-pushed the main_chelsealin_readcsv branch from 032f193 to 8d6d9ee Compare May 30, 2025 23:46

chelsea-lin marked this pull request as ready for review May 30, 2025 23:46

chelsea-lin requested review from a team as code owners May 30, 2025 23:46

chelsea-lin requested a review from tswast May 30, 2025 23:46

blunderbuss-gcf bot assigned Genesis929 May 30, 2025

chelsea-lin force-pushed the main_chelsealin_readcsv branch 2 times, most recently from 0785cf8 to 344d6c9 Compare June 3, 2025 22:52

tswast reviewed Jun 4, 2025

chelsea-lin added 3 commits June 5, 2025 17:43

fix: read_csv with both index_col and use_cols inconsistent with pandas

0b7a3a9

ensure columns is not list type and avoid flaky ordering of columns

645316d

add docstring for index_col_in_columns and fix tests

7e59b20

chelsea-lin force-pushed the main_chelsealin_readcsv branch from 344d6c9 to 7e59b20 Compare June 5, 2025 17:46

chelsea-lin requested a review from tswast June 5, 2025 17:47

chelsea-lin added the kokoro:run label Jun 5, 2025

Merge branch 'main' into main_chelsealin_readcsv

6ec7f1b

bigframes-bot removed the kokoro:run label Jun 6, 2025

tswast approved these changes Jun 6, 2025

tswast enabled auto-merge (squash) June 6, 2025 20:19

tswast merged commit ba7c313 into main Jun 6, 2025
17 of 24 checks passed

tswast deleted the main_chelsealin_readcsv branch June 6, 2025 20:24

release-please bot mentioned this pull request Jun 6, 2025

chore(main): release 2.6.0 #1787

Open
