Improved error message for gated/private repos #5497

Merged 2 commits on Feb 2, 2023

osanseviero
Contributor

Passing use_auth_token=True is no longer needed: if the user is logged in, the token is retrieved automatically. The error message now also mentions gated repos.

See huggingface/huggingface_hub#1064
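
The behavior described above (an explicit token is optional, a stored login token is picked up automatically, and the error mentions gated repos) can be sketched roughly as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the actual code from datasets or huggingface_hub: the names resolve_token and repo_access_error, the token file location, and the message wording are all hypothetical.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical location of a token saved at login time (an assumption for
# this sketch; the real library may store its token somewhere else).
DEFAULT_TOKEN_PATH = Path.home() / ".cache" / "example_hub" / "token"


def resolve_token(
    use_auth_token: Optional[Union[bool, str]] = None,
    token_path: Path = DEFAULT_TOKEN_PATH,
) -> Optional[str]:
    """Return the token to send with a request, or None for anonymous access."""
    if isinstance(use_auth_token, str):
        return use_auth_token  # an explicit token string always wins
    if use_auth_token is False:
        return None  # the caller explicitly opted out of authentication
    # None or True: fall back to the token stored at login, if there is one.
    if token_path.is_file():
        stored = token_path.read_text().strip()
        return stored or None
    return None


def repo_access_error(repo_id: str, token: Optional[str]) -> str:
    """Build an illustrative error message that covers private and gated repos."""
    message = f"Dataset '{repo_id}' doesn't exist on the Hub or cannot be accessed."
    if token is None:
        message += " If the repo is private or gated, make sure to log in first."
    else:
        message += " If the repo is gated, make sure you have been granted access."
    return message
```

With this shape, a logged-in user calls resolve_token() with no arguments and still gets the stored token, which is why an explicit use_auth_token=True no longer needs to appear in the error message.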

HuggingFaceDocBuilderDev commented Feb 2, 2023

The documentation is not available anymore as the PR was closed or merged.

github-actions bot commented Feb 2, 2023

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.009491 / 0.011353 (-0.001862)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.004690 / 0.011008 (-0.006319)
read_batch_formatted_as_numpy after write_nested_sequence | 0.111904 / 0.038508 (0.073396)
read_batch_unformated after write_array2d | 0.030781 / 0.023109 (0.007671)
read_batch_unformated after write_flattened_sequence | 0.309442 / 0.275898 (0.033544)
read_batch_unformated after write_nested_sequence | 0.389511 / 0.323480 (0.066031)
read_col_formatted_as_numpy after write_array2d | 0.007277 / 0.007986 (-0.000709)
read_col_formatted_as_numpy after write_flattened_sequence | 0.004364 / 0.004328 (0.000036)
read_col_formatted_as_numpy after write_nested_sequence | 0.074501 / 0.004250 (0.070250)
read_col_unformated after write_array2d | 0.036799 / 0.037052 (-0.000254)
read_col_unformated after write_flattened_sequence | 0.320279 / 0.258489 (0.061790)
read_col_unformated after write_nested_sequence | 0.353887 / 0.293841 (0.060046)
read_formatted_as_numpy after write_array2d | 0.047969 / 0.128546 (-0.080577)
read_formatted_as_numpy after write_flattened_sequence | 0.017281 / 0.075646 (-0.058366)
read_formatted_as_numpy after write_nested_sequence | 0.339655 / 0.419271 (-0.079617)
read_unformated after write_array2d | 0.049317 / 0.043533 (0.005784)
read_unformated after write_flattened_sequence | 0.321221 / 0.255139 (0.066082)
read_unformated after write_nested_sequence | 0.354743 / 0.283200 (0.071544)
write_array2d | 0.098634 / 0.141683 (-0.043049)
write_flattened_sequence | 1.408640 / 1.452155 (-0.043515)
write_nested_sequence | 1.488361 / 1.492716 (-0.004356)

Benchmark: benchmark_getitem_100B.json

metric | new / old (diff)
get_batch_of_1024_random_rows | 0.233677 / 0.018006 (0.215671)
get_batch_of_1024_rows | 0.604424 / 0.000490 (0.603934)
get_first_row | 0.003834 / 0.000200 (0.003634)
get_last_row | 0.000103 / 0.000054 (0.000049)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.022682 / 0.037411 (-0.014729)
shard | 0.103800 / 0.014526 (0.089274)
shuffle | 0.113868 / 0.176557 (-0.062689)
sort | 0.155111 / 0.737135 (-0.582025)
train_test_split | 0.111862 / 0.296338 (-0.184476)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.474992 / 0.215209 (0.259783)
read 50000 | 4.755325 / 2.077655 (2.677670)
read_batch 50000 10 | 1.889754 / 1.504120 (0.385634)
read_batch 50000 100 | 1.597009 / 1.541195 (0.055814)
read_batch 50000 1000 | 1.639570 / 1.468490 (0.171080)
read_formatted numpy 5000 | 0.970681 / 4.584777 (-3.614096)
read_formatted pandas 5000 | 4.782567 / 3.745712 (1.036855)
read_formatted tensorflow 5000 | 4.350465 / 5.269862 (-0.919397)
read_formatted torch 5000 | 2.413533 / 4.565676 (-2.152144)
read_formatted_batch numpy 5000 10 | 0.115510 / 0.424275 (-0.308765)
read_formatted_batch numpy 5000 1000 | 0.011663 / 0.007607 (0.004055)
shuffled read 5000 | 0.626450 / 0.226044 (0.400406)
shuffled read 50000 | 6.238147 / 2.268929 (3.969218)
shuffled read_batch 50000 10 | 2.603070 / 55.444624 (-52.841555)
shuffled read_batch 50000 100 | 2.030378 / 6.876477 (-4.846099)
shuffled read_batch 50000 1000 | 1.996883 / 2.142072 (-0.145190)
shuffled read_formatted numpy 5000 | 1.206436 / 4.805227 (-3.598792)
shuffled read_formatted_batch numpy 5000 10 | 0.203018 / 6.500664 (-6.297646)
shuffled read_formatted_batch numpy 5000 1000 | 0.060550 / 0.075469 (-0.014919)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 1.259850 / 1.841788 (-0.581937)
map fast-tokenizer batched | 14.079936 / 8.074308 (6.005628)
map identity | 16.036329 / 10.191392 (5.844937)
map identity batched | 0.221546 / 0.680424 (-0.458878)
map no-op batched | 0.042416 / 0.534201 (-0.491785)
map no-op batched numpy | 0.438851 / 0.579283 (-0.140432)
map no-op batched pandas | 0.507053 / 0.434364 (0.072689)
map no-op batched pytorch | 0.518672 / 0.540337 (-0.021665)
map no-op batched tensorflow | 0.585278 / 1.386936 (-0.801659)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.010718 / 0.011353 (-0.000635)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.005469 / 0.011008 (-0.005539)
read_batch_formatted_as_numpy after write_nested_sequence | 0.075624 / 0.038508 (0.037116)
read_batch_unformated after write_array2d | 0.029103 / 0.023109 (0.005994)
read_batch_unformated after write_flattened_sequence | 0.353294 / 0.275898 (0.077395)
read_batch_unformated after write_nested_sequence | 0.353674 / 0.323480 (0.030194)
read_col_formatted_as_numpy after write_array2d | 0.005678 / 0.007986 (-0.002308)
read_col_formatted_as_numpy after write_flattened_sequence | 0.004610 / 0.004328 (0.000282)
read_col_formatted_as_numpy after write_nested_sequence | 0.075213 / 0.004250 (0.070963)
read_col_unformated after write_array2d | 0.040032 / 0.037052 (0.002980)
read_col_unformated after write_flattened_sequence | 0.344363 / 0.258489 (0.085874)
read_col_unformated after write_nested_sequence | 0.376861 / 0.293841 (0.083020)
read_formatted_as_numpy after write_array2d | 0.043718 / 0.128546 (-0.084828)
read_formatted_as_numpy after write_flattened_sequence | 0.016057 / 0.075646 (-0.059589)
read_formatted_as_numpy after write_nested_sequence | 0.087746 / 0.419271 (-0.331526)
read_unformated after write_array2d | 0.051380 / 0.043533 (0.007848)
read_unformated after write_flattened_sequence | 0.336904 / 0.255139 (0.081765)
read_unformated after write_nested_sequence | 0.357636 / 0.283200 (0.074436)
write_array2d | 0.089425 / 0.141683 (-0.052258)
write_flattened_sequence | 1.377462 / 1.452155 (-0.074692)
write_nested_sequence | 1.448844 / 1.492716 (-0.043872)

Benchmark: benchmark_getitem_100B.json

metric | new / old (diff)
get_batch_of_1024_random_rows | 0.259038 / 0.018006 (0.241031)
get_batch_of_1024_rows | 0.512284 / 0.000490 (0.511794)
get_first_row | 0.005666 / 0.000200 (0.005466)
get_last_row | 0.000123 / 0.000054 (0.000068)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.023669 / 0.037411 (-0.013742)
shard | 0.097979 / 0.014526 (0.083453)
shuffle | 0.117947 / 0.176557 (-0.058610)
sort | 0.140764 / 0.737135 (-0.596372)
train_test_split | 0.114700 / 0.296338 (-0.181638)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.528844 / 0.215209 (0.313635)
read 50000 | 5.073828 / 2.077655 (2.996173)
read_batch 50000 10 | 2.088738 / 1.504120 (0.584618)
read_batch 50000 100 | 1.855820 / 1.541195 (0.314626)
read_batch 50000 1000 | 1.838639 / 1.468490 (0.370149)
read_formatted numpy 5000 | 0.968228 / 4.584777 (-3.616549)
read_formatted pandas 5000 | 4.589792 / 3.745712 (0.844079)
read_formatted tensorflow 5000 | 2.586149 / 5.269862 (-2.683712)
read_formatted torch 5000 | 1.714241 / 4.565676 (-2.851435)
read_formatted_batch numpy 5000 10 | 0.124502 / 0.424275 (-0.299774)
read_formatted_batch numpy 5000 1000 | 0.012115 / 0.007607 (0.004507)
shuffled read 5000 | 0.679539 / 0.226044 (0.453494)
shuffled read 50000 | 6.541335 / 2.268929 (4.272407)
shuffled read_batch 50000 10 | 2.749153 / 55.444624 (-52.695471)
shuffled read_batch 50000 100 | 2.124164 / 6.876477 (-4.752313)
shuffled read_batch 50000 1000 | 2.181249 / 2.142072 (0.039177)
shuffled read_formatted numpy 5000 | 1.196846 / 4.805227 (-3.608381)
shuffled read_formatted_batch numpy 5000 10 | 0.213352 / 6.500664 (-6.287312)
shuffled read_formatted_batch numpy 5000 1000 | 0.075021 / 0.075469 (-0.000448)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 1.254301 / 1.841788 (-0.587487)
map fast-tokenizer batched | 14.494254 / 8.074308 (6.419946)
map identity | 16.619679 / 10.191392 (6.428287)
map identity batched | 0.205158 / 0.680424 (-0.475266)
map no-op batched | 0.022181 / 0.534201 (-0.512019)
map no-op batched numpy | 0.422928 / 0.579283 (-0.156355)
map no-op batched pandas | 0.539825 / 0.434364 (0.105461)
map no-op batched pytorch | 0.523165 / 0.540337 (-0.017173)
map no-op batched tensorflow | 0.615014 / 1.386936 (-0.771922)

Member

lhoestq left a comment

Thanks!

lhoestq merged commit 7cfac43 into main on Feb 2, 2023
lhoestq deleted the osanseviero-patch-1 branch on February 2, 2023 11:17


github-actions bot commented Feb 2, 2023

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.011522 / 0.011353 (0.000169)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.006906 / 0.011008 (-0.004102)
read_batch_formatted_as_numpy after write_nested_sequence | 0.114692 / 0.038508 (0.076184)
read_batch_unformated after write_array2d | 0.037686 / 0.023109 (0.014577)
read_batch_unformated after write_flattened_sequence | 0.393662 / 0.275898 (0.117764)
read_batch_unformated after write_nested_sequence | 0.377730 / 0.323480 (0.054250)
read_col_formatted_as_numpy after write_array2d | 0.008212 / 0.007986 (0.000226)
read_col_formatted_as_numpy after write_flattened_sequence | 0.005470 / 0.004328 (0.001142)
read_col_formatted_as_numpy after write_nested_sequence | 0.086962 / 0.004250 (0.082712)
read_col_unformated after write_array2d | 0.039085 / 0.037052 (0.002033)
read_col_unformated after write_flattened_sequence | 0.357565 / 0.258489 (0.099076)
read_col_unformated after write_nested_sequence | 0.404384 / 0.293841 (0.110543)
read_formatted_as_numpy after write_array2d | 0.055523 / 0.128546 (-0.073023)
read_formatted_as_numpy after write_flattened_sequence | 0.018277 / 0.075646 (-0.057369)
read_formatted_as_numpy after write_nested_sequence | 0.389812 / 0.419271 (-0.029459)
read_unformated after write_array2d | 0.058706 / 0.043533 (0.015173)
read_unformated after write_flattened_sequence | 0.344735 / 0.255139 (0.089597)
read_unformated after write_nested_sequence | 0.395734 / 0.283200 (0.112535)
write_array2d | 0.096098 / 0.141683 (-0.045584)
write_flattened_sequence | 1.546654 / 1.452155 (0.094499)
write_nested_sequence | 1.665314 / 1.492716 (0.172597)

Benchmark: benchmark_getitem_100B.json

metric | new / old (diff)
get_batch_of_1024_random_rows | 0.255893 / 0.018006 (0.237887)
get_batch_of_1024_rows | 0.589563 / 0.000490 (0.589074)
get_first_row | 0.005890 / 0.000200 (0.005690)
get_last_row | 0.000123 / 0.000054 (0.000069)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.029167 / 0.037411 (-0.008245)
shard | 0.113561 / 0.014526 (0.099036)
shuffle | 0.125361 / 0.176557 (-0.051195)
sort | 0.182225 / 0.737135 (-0.554910)
train_test_split | 0.125147 / 0.296338 (-0.171192)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.596859 / 0.215209 (0.381650)
read 50000 | 5.797725 / 2.077655 (3.720071)
read_batch 50000 10 | 2.238420 / 1.504120 (0.734300)
read_batch 50000 100 | 1.933177 / 1.541195 (0.391982)
read_batch 50000 1000 | 2.030750 / 1.468490 (0.562260)
read_formatted numpy 5000 | 1.122655 / 4.584777 (-3.462122)
read_formatted pandas 5000 | 5.247913 / 3.745712 (1.502201)
read_formatted tensorflow 5000 | 2.792742 / 5.269862 (-2.477120)
read_formatted torch 5000 | 1.861487 / 4.565676 (-2.704190)
read_formatted_batch numpy 5000 10 | 0.133009 / 0.424275 (-0.291266)
read_formatted_batch numpy 5000 1000 | 0.013219 / 0.007607 (0.005612)
shuffled read 5000 | 0.696905 / 0.226044 (0.470861)
shuffled read 50000 | 6.961298 / 2.268929 (4.692369)
shuffled read_batch 50000 10 | 2.895352 / 55.444624 (-52.549273)
shuffled read_batch 50000 100 | 2.353677 / 6.876477 (-4.522799)
shuffled read_batch 50000 1000 | 2.458804 / 2.142072 (0.316731)
shuffled read_formatted numpy 5000 | 1.271905 / 4.805227 (-3.533322)
shuffled read_formatted_batch numpy 5000 10 | 0.224850 / 6.500664 (-6.275814)
shuffled read_formatted_batch numpy 5000 1000 | 0.083773 / 0.075469 (0.008304)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 1.502425 / 1.841788 (-0.339363)
map fast-tokenizer batched | 16.959241 / 8.074308 (8.884933)
map identity | 19.865569 / 10.191392 (9.674177)
map identity batched | 0.228608 / 0.680424 (-0.451816)
map no-op batched | 0.044035 / 0.534201 (-0.490166)
map no-op batched numpy | 0.545172 / 0.579283 (-0.034112)
map no-op batched pandas | 0.677193 / 0.434364 (0.242829)
map no-op batched pytorch | 0.608988 / 0.540337 (0.068650)
map no-op batched tensorflow | 0.719210 / 1.386936 (-0.667726)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.008297 / 0.011353 (-0.003056)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.005729 / 0.011008 (-0.005280)
read_batch_formatted_as_numpy after write_nested_sequence | 0.084762 / 0.038508 (0.046254)
read_batch_unformated after write_array2d | 0.030622 / 0.023109 (0.007512)
read_batch_unformated after write_flattened_sequence | 0.408017 / 0.275898 (0.132119)
read_batch_unformated after write_nested_sequence | 0.432114 / 0.323480 (0.108634)
read_col_formatted_as_numpy after write_array2d | 0.006965 / 0.007986 (-0.001021)
read_col_formatted_as_numpy after write_flattened_sequence | 0.004830 / 0.004328 (0.000502)
read_col_formatted_as_numpy after write_nested_sequence | 0.087375 / 0.004250 (0.083124)
read_col_unformated after write_array2d | 0.048110 / 0.037052 (0.011058)
read_col_unformated after write_flattened_sequence | 0.414978 / 0.258489 (0.156489)
read_col_unformated after write_nested_sequence | 0.446136 / 0.293841 (0.152295)
read_formatted_as_numpy after write_array2d | 0.064351 / 0.128546 (-0.064195)
read_formatted_as_numpy after write_flattened_sequence | 0.018273 / 0.075646 (-0.057374)
read_formatted_as_numpy after write_nested_sequence | 0.114853 / 0.419271 (-0.304418)
read_unformated after write_array2d | 0.056962 / 0.043533 (0.013429)
read_unformated after write_flattened_sequence | 0.427791 / 0.255139 (0.172652)
read_unformated after write_nested_sequence | 0.428829 / 0.283200 (0.145629)
write_array2d | 0.108004 / 0.141683 (-0.033679)
write_flattened_sequence | 1.639285 / 1.452155 (0.187130)
write_nested_sequence | 1.652106 / 1.492716 (0.159390)

Benchmark: benchmark_getitem_100B.json

metric | new / old (diff)
get_batch_of_1024_random_rows | 0.359744 / 0.018006 (0.341738)
get_batch_of_1024_rows | 0.596060 / 0.000490 (0.595570)
get_first_row | 0.025448 / 0.000200 (0.025248)
get_last_row | 0.000158 / 0.000054 (0.000104)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.026348 / 0.037411 (-0.011064)
shard | 0.119153 / 0.014526 (0.104628)
shuffle | 0.129304 / 0.176557 (-0.047253)
sort | 0.195670 / 0.737135 (-0.541465)
train_test_split | 0.135559 / 0.296338 (-0.160780)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.588963 / 0.215209 (0.373754)
read 50000 | 5.682957 / 2.077655 (3.605302)
read_batch 50000 10 | 2.380178 / 1.504120 (0.876059)
read_batch 50000 100 | 2.131299 / 1.541195 (0.590104)
read_batch 50000 1000 | 2.167839 / 1.468490 (0.699349)
read_formatted numpy 5000 | 1.126418 / 4.584777 (-3.458359)
read_formatted pandas 5000 | 5.289104 / 3.745712 (1.543392)
read_formatted tensorflow 5000 | 2.952128 / 5.269862 (-2.317734)
read_formatted torch 5000 | 1.922974 / 4.565676 (-2.642702)
read_formatted_batch numpy 5000 10 | 0.143874 / 0.424275 (-0.280401)
read_formatted_batch numpy 5000 1000 | 0.015399 / 0.007607 (0.007792)
shuffled read 5000 | 0.815675 / 0.226044 (0.589631)
shuffled read 50000 | 7.320146 / 2.268929 (5.051217)
shuffled read_batch 50000 10 | 3.453670 / 55.444624 (-51.990954)
shuffled read_batch 50000 100 | 2.579133 / 6.876477 (-4.297344)
shuffled read_batch 50000 1000 | 2.532331 / 2.142072 (0.390258)
shuffled read_formatted numpy 5000 | 1.345881 / 4.805227 (-3.459347)
shuffled read_formatted_batch numpy 5000 10 | 0.242448 / 6.500664 (-6.258216)
shuffled read_formatted_batch numpy 5000 1000 | 0.070007 / 0.075469 (-0.005462)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 1.433173 / 1.841788 (-0.408614)
map fast-tokenizer batched | 17.127287 / 8.074308 (9.052979)
map identity | 17.953878 / 10.191392 (7.762486)
map identity batched | 0.220035 / 0.680424 (-0.460389)
map no-op batched | 0.028660 / 0.534201 (-0.505541)
map no-op batched numpy | 0.496233 / 0.579283 (-0.083050)
map no-op batched pandas | 0.591587 / 0.434364 (0.157223)
map no-op batched pytorch | 0.635204 / 0.540337 (0.094867)
map no-op batched tensorflow | 0.702143 / 1.386936 (-0.684793)
