Describe the bug
cuDF regex does not match any characters that appear after \u0000 in the input string, which is different from the behavior in Python and Java.
Steps/Code to reproduce bug
Python
>>> print(re.compile('A').search("A\u0000B"))
<re.Match object; span=(0, 1), match='A'>
>>> print(re.compile('B').search("A\u0000B"))
<re.Match object; span=(2, 3), match='B'>
cuDF
>>> print(cudf.Series(['A\u0000B']).str.contains('A'))
0    True
dtype: bool
>>> print(cudf.Series(['A\u0000B']).str.contains('B'))
0    False
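The two sessions above can be combined into one script that runs the same patterns through Python's `re` and, when it is available, through cuDF. This is a minimal sketch rather than an official test: the helper names `python_contains`, `cudf_contains`, and `compare` are made up for illustration, and the cuDF path assumes a working GPU environment, returning `None` otherwise.

```python
import re

SAMPLE = "A\u0000B"      # 'A', a NUL character, then 'B'
PATTERNS = ["A", "B"]    # the two patterns from the report


def python_contains(pattern, text):
    """Baseline: True if Python's re finds the pattern anywhere in text."""
    return re.search(pattern, text) is not None


def cudf_contains(pattern, text):
    """Same check through cuDF (hypothetical helper).

    Returns None when cuDF cannot be imported, so the script still
    runs on machines without a GPU stack.
    """
    try:
        import cudf
    except Exception:
        return None
    result = cudf.Series([text]).str.contains(pattern)
    return bool(result.to_pandas().iloc[0])


def compare(text=SAMPLE, patterns=PATTERNS):
    """Return (pattern, python_result, cudf_result) for each pattern."""
    return [(p, python_contains(p, text), cudf_contains(p, text)) for p in patterns]


if __name__ == "__main__":
    for pattern, py_result, gpu_result in compare():
        print(f"pattern={pattern!r} python={py_result} cudf={gpu_result}")
```

On the setup described in this report, the row for `'B'` would be expected to show `python=True` and `cudf=False`; any row where the two values differ is a mismatch.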
Expected behavior
I would expect the behavior to be consistent between Python and cuDF.
Environment overview (please complete the following information)
Environment details
***git***
commit 12b2a62bb64255028d2eb3b9d3046f5eb43b5779 (HEAD, dave/percentile_approx_followup)
Author: Dave Baranec <dbaranec@nvidia.com>
Date: Thu Oct 7 17:03:25 2021 -0500

Cleanup.

***git submodules***

***OS Information***
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Linux ripper 5.11.0-36-generic #40~20.04.1-Ubuntu SMP Sat Sep 18 02:14:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

***GPU Information***
Thu Oct 14 12:15:13 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    Off  | 00000000:42:00.0  On |                  N/A |
| 30%   40C    P8    22W / 320W |   8934MiB / 10015MiB |     15%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1685      G   /usr/lib/xorg/Xorg                102MiB |
|    0   N/A  N/A      3167      G   /usr/lib/xorg/Xorg                949MiB |
|    0   N/A  N/A      3364      G   /usr/bin/gnome-shell              101MiB |
|    0   N/A  N/A      3815      G   ...AAAAAAAAA= --shared-files        9MiB |
|    0   N/A  N/A      4853      G   ./jetbrains-toolbox                18MiB |
|    0   N/A  N/A    225427      G   ...AAAAAAAAA= --shared-files       41MiB |
|    0   N/A  N/A    394811      G   gnome-control-center                3MiB |
|    0   N/A  N/A    425545      G   /usr/lib/firefox/firefox          175MiB |
|    0   N/A  N/A    965183      G   ...964638.log --shared-files       48MiB |
|    0   N/A  N/A    972975      G   /usr/lib/firefox/firefox            3MiB |
|    0   N/A  N/A    973555      G   /usr/lib/firefox/firefox            3MiB |
|    0   N/A  N/A   1930063      G   /usr/lib/firefox/firefox            3MiB |
|    0   N/A  N/A   3445651      C   ...-8-openjdk-amd64/bin/java     7447MiB |
+-----------------------------------------------------------------------------+

***CPU***
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
NUMA node(s): 4
Vendor ID: AuthenticAMD
CPU family: 23
Model: 8
Model name: AMD Ryzen Threadripper 2970WX 24-Core Processor
Stepping: 2
Frequency boost: enabled
CPU MHz: 2200.000
CPU max MHz: 3000.0000
CPU min MHz: 2200.0000
BogoMIPS: 5988.99
Virtualization: AMD-V
L1d cache: 768 KiB
L1i cache: 1.5 MiB
L2 cache: 12 MiB
L3 cache: 64 MiB
NUMA node0 CPU(s): 0-5,24-29
NUMA node1 CPU(s): 12-17,36-41
NUMA node2 CPU(s): 6-11,30-35
NUMA node3 CPU(s): 18-23,42-47
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall sev_es fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca

***CMake***
/usr/bin/cmake
cmake version 3.16.3

CMake suite maintained and supported by Kitware (kitware.com/cmake).

***g++***
/usr/bin/g++
g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

***nvcc***
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

***Python***
/home/andy/miniconda3/bin/python
Python 3.9.5

***Environment Variables***
PATH : /home/andy/miniconda3/bin:/home/andy/miniconda3/condabin:/home/andy/.cargo/bin:/home/andy/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/apache-maven-3.8.2/bin:mvnd-0.5.2-linux-amd64/bin:/usr/local/cuda/bin
LD_LIBRARY_PATH : :/usr/local/cuda/targets/x86_64-linux/lib/:/usr/local/cuda/lib64
NUMBAPRO_NVVM :
NUMBAPRO_LIBDEVICE :
CONDA_PREFIX : /home/andy/miniconda3
PYTHON_PATH :

***conda packages***
/home/andy/miniconda3/bin/conda
# packages in environment at /home/andy/miniconda3:
#
# Name                    Version    Build              Channel
_libgcc_mutex             0.1        main
_openmp_mutex             4.5        1_gnu
attrs                     21.2.0     pypi_0             pypi
awscli                    1.20.44    pypi_0             pypi
awscli-plugin-endpoint    0.4        pypi_0             pypi
botocore                  1.21.44    pypi_0             pypi
brotlipy                  0.7.0      py39h27cfd23_1003
ca-certificates           2021.7.5   h06a4308_1
certifi                   2021.5.30  py39h06a4308_0
cffi                      1.14.6     py39h400218f_0
chardet                   4.0.0      py39h06a4308_1003
colorama                  0.4.3      pypi_0             pypi
conda                     4.10.3     py39h06a4308_0
conda-package-handling    1.7.3      py39h27cfd23_1
conda-standalone          4.9.0      h718eed5_1
constructor               3.2.1      py39h06a4308_0
cryptography              3.4.7      py39hd23ed53_0
datafusion                0.2.0      pypi_0             pypi
docutils                  0.15.2     pypi_0             pypi
idna                      2.10       pyhd3eb1b0_0
iniconfig                 1.1.1      pypi_0             pypi
jmespath                  0.10.0     pypi_0             pypi
ld_impl_linux-64          2.35.1     h7274673_9
libffi                    3.3        he6710b0_2
libgcc-ng                 9.3.0      h5101ec6_17
libgomp                   9.3.0      h5101ec6_17
libstdcxx-ng              9.3.0      hd4cf53a_17
ncurses                   6.2        he6710b0_1
numpy                     1.21.2     pypi_0             pypi
openssl                   1.1.1l     h7f8727e_0
packaging                 21.0       pypi_0             pypi
pip                       21.1.3     py39h06a4308_0
pluggy                    1.0.0      pypi_0             pypi
py                        1.10.0     pypi_0             pypi
pyarrow                   5.0.0      pypi_0             pypi
pyasn1                    0.4.8      pypi_0             pypi
pycosat                   0.6.3      py39h27cfd23_0
pycparser                 2.20       py_2
pyopenssl                 20.0.1     pyhd3eb1b0_1
pyparsing                 2.4.7      pypi_0             pypi
pysocks                   1.7.1      py39h06a4308_0
pytest                    6.2.5      pypi_0             pypi
python                    3.9.5      h12debd9_4
python-dateutil           2.8.2      pypi_0             pypi
pyyaml                    5.4.1      pypi_0             pypi
readline                  8.1        h27cfd23_0
requests                  2.25.1     pyhd3eb1b0_0
rsa                       4.7.2      pypi_0             pypi
ruamel_yaml               0.15.100   py39h27cfd23_0
s3transfer                0.5.0      pypi_0             pypi
setuptools                52.0.0     py39h06a4308_0
six                       1.16.0     pyhd3eb1b0_0
sqlite                    3.36.0     hc218d9a_0
tk                        8.6.10     hbc83047_0
toml                      0.10.2     pypi_0             pypi
tqdm                      4.61.2     pyhd3eb1b0_1
tzdata                    2021a      h52ac0ba_0
urllib3                   1.26.6     pyhd3eb1b0_1
wheel                     0.36.2     pyhd3eb1b0_0
xz                        5.2.5      h7b6447c_0
yaml                      0.2.5      h7b6447c_0
zlib                      1.2.11     h7b6447c_3
Additional context
This requirement is being driven by NVIDIA/spark-rapids#3797
Perhaps a duplicate of #6196
Yes, this is a duplicate. Closing this one.
davidwendt