fix ci
lhoestq committed Sep 29, 2022
1 parent 82554fb commit 07d0b0a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -68,7 +68,7 @@ jobs:
run: pip install --upgrade pyarrow huggingface-hub
- name: Install depencencies (minimum versions)
if: ${{ matrix.deps_versions != 'latest' }}
- run: pip install pyarrow==6.0.1 huggingface-hub==0.2.0
+ run: pip install pyarrow==6.0.1 huggingface-hub==0.2.0 transformers
- name: Test with pytest
run: |
python -m pytest -rfExX -m ${{ matrix.test }} -n 2 --dist loadfile -sv ./tests/

1 comment on commit 07d0b0a

@github-actions

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.007817 / 0.011353 (-0.003535) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003681 / 0.011008 (-0.007328) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.028686 / 0.038508 (-0.009822) |
| read_batch_unformated after write_array2d | 0.031155 / 0.023109 (0.008045) |
| read_batch_unformated after write_flattened_sequence | 0.303259 / 0.275898 (0.027361) |
| read_batch_unformated after write_nested_sequence | 0.371870 / 0.323480 (0.048390) |
| read_col_formatted_as_numpy after write_array2d | 0.005639 / 0.007986 (-0.002347) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003094 / 0.004328 (-0.001234) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006638 / 0.004250 (0.002387) |
| read_col_unformated after write_array2d | 0.040084 / 0.037052 (0.003032) |
| read_col_unformated after write_flattened_sequence | 0.314773 / 0.258489 (0.056284) |
| read_col_unformated after write_nested_sequence | 0.351208 / 0.293841 (0.057367) |
| read_formatted_as_numpy after write_array2d | 0.028788 / 0.128546 (-0.099758) |
| read_formatted_as_numpy after write_flattened_sequence | 0.009298 / 0.075646 (-0.066349) |
| read_formatted_as_numpy after write_nested_sequence | 0.247763 / 0.419271 (-0.171508) |
| read_unformated after write_array2d | 0.047249 / 0.043533 (0.003716) |
| read_unformated after write_flattened_sequence | 0.305085 / 0.255139 (0.049946) |
| read_unformated after write_nested_sequence | 0.328236 / 0.283200 (0.045036) |
| write_array2d | 0.099364 / 0.141683 (-0.042319) |
| write_flattened_sequence | 1.493229 / 1.452155 (0.041074) |
| write_nested_sequence | 1.535724 / 1.492716 (0.043008) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.232808 / 0.018006 (0.214802) |
| get_batch_of_1024_rows | 0.424662 / 0.000490 (0.424172) |
| get_first_row | 0.004649 / 0.000200 (0.004449) |
| get_last_row | 0.000300 / 0.000054 (0.000246) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.020980 / 0.037411 (-0.016432) |
| shard | 0.094866 / 0.014526 (0.080340) |
| shuffle | 0.105244 / 0.176557 (-0.071313) |
| sort | 0.149568 / 0.737135 (-0.587567) |
| train_test_split | 0.106798 / 0.296338 (-0.189541) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.412583 / 0.215209 (0.197373) |
| read 50000 | 4.121236 / 2.077655 (2.043582) |
| read_batch 50000 10 | 1.861092 / 1.504120 (0.356972) |
| read_batch 50000 100 | 1.667824 / 1.541195 (0.126629) |
| read_batch 50000 1000 | 1.713061 / 1.468490 (0.244571) |
| read_formatted numpy 5000 | 0.441138 / 4.584777 (-4.143639) |
| read_formatted pandas 5000 | 3.327226 / 3.745712 (-0.418486) |
| read_formatted tensorflow 5000 | 1.863401 / 5.269862 (-3.406461) |
| read_formatted torch 5000 | 1.227066 / 4.565676 (-3.338611) |
| read_formatted_batch numpy 5000 10 | 0.051761 / 0.424275 (-0.372514) |
| read_formatted_batch numpy 5000 1000 | 0.010719 / 0.007607 (0.003112) |
| shuffled read 5000 | 0.516276 / 0.226044 (0.290232) |
| shuffled read 50000 | 5.182242 / 2.268929 (2.913313) |
| shuffled read_batch 50000 10 | 2.287232 / 55.444624 (-53.157392) |
| shuffled read_batch 50000 100 | 1.959852 / 6.876477 (-4.916625) |
| shuffled read_batch 50000 1000 | 2.069001 / 2.142072 (-0.073071) |
| shuffled read_formatted numpy 5000 | 0.557652 / 4.805227 (-4.247575) |
| shuffled read_formatted_batch numpy 5000 10 | 0.117436 / 6.500664 (-6.383228) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.063016 / 0.075469 (-0.012453) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.540319 / 1.841788 (-0.301469) |
| map fast-tokenizer batched | 12.972923 / 8.074308 (4.898615) |
| map identity | 25.923726 / 10.191392 (15.732334) |
| map identity batched | 0.885195 / 0.680424 (0.204771) |
| map no-op batched | 0.604232 / 0.534201 (0.070031) |
| map no-op batched numpy | 0.345561 / 0.579283 (-0.233722) |
| map no-op batched pandas | 0.387011 / 0.434364 (-0.047353) |
| map no-op batched pytorch | 0.228678 / 0.540337 (-0.311659) |
| map no-op batched tensorflow | 0.236434 / 1.386936 (-1.150502) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.005806 / 0.011353 (-0.005546) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003655 / 0.011008 (-0.007353) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.026776 / 0.038508 (-0.011732) |
| read_batch_unformated after write_array2d | 0.028262 / 0.023109 (0.005153) |
| read_batch_unformated after write_flattened_sequence | 0.415050 / 0.275898 (0.139152) |
| read_batch_unformated after write_nested_sequence | 0.482372 / 0.323480 (0.158892) |
| read_col_formatted_as_numpy after write_array2d | 0.003495 / 0.007986 (-0.004490) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.002981 / 0.004328 (-0.001348) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.004589 / 0.004250 (0.000338) |
| read_col_unformated after write_array2d | 0.034680 / 0.037052 (-0.002372) |
| read_col_unformated after write_flattened_sequence | 0.420201 / 0.258489 (0.161712) |
| read_col_unformated after write_nested_sequence | 0.465816 / 0.293841 (0.171975) |
| read_formatted_as_numpy after write_array2d | 0.027426 / 0.128546 (-0.101120) |
| read_formatted_as_numpy after write_flattened_sequence | 0.009451 / 0.075646 (-0.066195) |
| read_formatted_as_numpy after write_nested_sequence | 0.246202 / 0.419271 (-0.173070) |
| read_unformated after write_array2d | 0.046717 / 0.043533 (0.003185) |
| read_unformated after write_flattened_sequence | 0.422071 / 0.255139 (0.166932) |
| read_unformated after write_nested_sequence | 0.443141 / 0.283200 (0.159942) |
| write_array2d | 0.089308 / 0.141683 (-0.052375) |
| write_flattened_sequence | 1.625945 / 1.452155 (0.173790) |
| write_nested_sequence | 1.598227 / 1.492716 (0.105511) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.230508 / 0.018006 (0.212502) |
| get_batch_of_1024_rows | 0.410766 / 0.000490 (0.410276) |
| get_first_row | 0.004113 / 0.000200 (0.003913) |
| get_last_row | 0.000086 / 0.000054 (0.000032) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.021070 / 0.037411 (-0.016341) |
| shard | 0.095788 / 0.014526 (0.081263) |
| shuffle | 0.107848 / 0.176557 (-0.068708) |
| sort | 0.143844 / 0.737135 (-0.593291) |
| train_test_split | 0.106057 / 0.296338 (-0.190281) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.438956 / 0.215209 (0.223747) |
| read 50000 | 4.357397 / 2.077655 (2.279742) |
| read_batch 50000 10 | 2.128646 / 1.504120 (0.624526) |
| read_batch 50000 100 | 1.951197 / 1.541195 (0.410003) |
| read_batch 50000 1000 | 2.005766 / 1.468490 (0.537276) |
| read_formatted numpy 5000 | 0.443226 / 4.584777 (-4.141551) |
| read_formatted pandas 5000 | 3.367314 / 3.745712 (-0.378398) |
| read_formatted tensorflow 5000 | 1.826530 / 5.269862 (-3.443331) |
| read_formatted torch 5000 | 1.094225 / 4.565676 (-3.471452) |
| read_formatted_batch numpy 5000 10 | 0.052394 / 0.424275 (-0.371881) |
| read_formatted_batch numpy 5000 1000 | 0.010764 / 0.007607 (0.003156) |
| shuffled read 5000 | 0.539118 / 0.226044 (0.313073) |
| shuffled read 50000 | 5.405828 / 2.268929 (3.136899) |
| shuffled read_batch 50000 10 | 2.534709 / 55.444624 (-52.909915) |
| shuffled read_batch 50000 100 | 2.203938 / 6.876477 (-4.672539) |
| shuffled read_batch 50000 1000 | 2.320279 / 2.142072 (0.178206) |
| shuffled read_formatted numpy 5000 | 0.555994 / 4.805227 (-4.249234) |
| shuffled read_formatted_batch numpy 5000 10 | 0.118189 / 6.500664 (-6.382475) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.063331 / 0.075469 (-0.012139) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.601234 / 1.841788 (-0.240554) |
| map fast-tokenizer batched | 13.410652 / 8.074308 (5.336344) |
| map identity | 26.396248 / 10.191392 (16.204856) |
| map identity batched | 0.951435 / 0.680424 (0.271011) |
| map no-op batched | 0.666184 / 0.534201 (0.131983) |
| map no-op batched numpy | 0.346560 / 0.579283 (-0.232723) |
| map no-op batched pandas | 0.398592 / 0.434364 (-0.035772) |
| map no-op batched pytorch | 0.238750 / 0.540337 (-0.301587) |
| map no-op batched tensorflow | 0.244879 / 1.386936 (-1.142057) |
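
Every benchmark cell above has the shape `new / old (diff)`. The following is a minimal sketch, not part of the benchmark tooling, for reading one such cell in Python. It assumes that `diff` is `new - old`, which matches the reported values up to rounding in the last digit. The names `parse_cell` and `relative_change` are hypothetical helpers introduced only for this illustration.

```python
import re
from typing import Tuple

# One signed decimal number, e.g. "0.007817" or "-0.003535".
_NUM = r"[-+]?\d*\.?\d+"
# Matches a cell of the form "new / old (diff)".
_CELL = re.compile(rf"^\s*({_NUM})\s*/\s*({_NUM})\s*\(({_NUM})\)\s*$")


def parse_cell(cell: str) -> Tuple[float, float, float]:
    """Split a 'new / old (diff)' cell into three floats."""
    match = _CELL.match(cell)
    if match is None:
        raise ValueError(f"not a 'new / old (diff)' cell: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def relative_change(new: float, old: float) -> float:
    """Fractional change of new against old (negative means smaller)."""
    return (new - old) / old


if __name__ == "__main__":
    # First cell of the PyArrow==6.0.0 benchmark_array_xd.json results.
    new, old, diff = parse_cell("0.007817 / 0.011353 (-0.003535)")
    # The printed diff is rounded, so compare with a small tolerance.
    assert abs((new - old) - diff) < 2e-6
    print(f"relative change: {relative_change(new, old):+.1%}")
```

A relative change is often easier to compare across metrics than the absolute diff, because the measured values span several orders of magnitude.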
