[BUG] Regex stops parsing input when a null character is encountered #9440

Closed
andygrove opened this issue Oct 14, 2021 · 2 comments
Labels
bug Something isn't working

@andygrove

Describe the bug

cuDF regex does not match any characters that appear after \u0000 in the input string, which is different from the behavior in Python and Java.

Steps/Code to reproduce bug

Python

>>> import re
>>> print(re.compile('A').search("A\u0000B"))
<re.Match object; span=(0, 1), match='A'>
>>> print(re.compile('B').search("A\u0000B"))
<re.Match object; span=(2, 3), match='B'>

cuDF

>>> import cudf
>>> print(cudf.Series(['A\u0000B']).str.contains('A'))
0    True
dtype: bool
>>> print(cudf.Series(['A\u0000B']).str.contains('B'))
0    False
dtype: bool

Expected behavior
I would expect the behavior to be consistent between Python and cuDF.
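
To make the expectation concrete, here is a minimal sketch in plain Python that contrasts the expected result with the one reported above. It uses only the standard `re` module. The helper `contains_truncated_at_nul` is hypothetical: it emulates "stop at the first null character" to reproduce the observed output, and it is not a claim about how cuDF is implemented internally.

```python
import re

NUL = "\u0000"


def contains_full(text: str, pattern: str) -> bool:
    """Search the whole string; an embedded NUL is an ordinary character."""
    return re.search(pattern, text) is not None


def contains_truncated_at_nul(text: str, pattern: str) -> bool:
    """Hypothetical emulation of the reported behavior.

    Only the part of the string before the first NUL is searched.
    """
    prefix = text.split(NUL, 1)[0]
    return re.search(pattern, prefix) is not None


sample = "A" + NUL + "B"

# Expected (matches the Python output in the report): both patterns are found.
print(contains_full(sample, "A"), contains_full(sample, "B"))

# Emulated reported behavior: 'B' after the NUL is never seen.
print(contains_truncated_at_nul(sample, "A"), contains_truncated_at_nul(sample, "B"))
```

A check like `contains_full` could serve as the reference result when comparing against `cudf.Series.str.contains` on the same input.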

Environment overview (please complete the following information)

  • Local desktop (Ubuntu 20.04)
  • cuDF 21.10 installed via conda
  • Python 3.7.10

Environment details

 ***git***
 commit 12b2a62bb64255028d2eb3b9d3046f5eb43b5779 (HEAD, dave/percentile_approx_followup)
 Author: Dave Baranec <dbaranec@nvidia.com>
 Date:   Thu Oct 7 17:03:25 2021 -0500
 
 Cleanup.
 ***git submodules***
 
 ***OS Information***
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=20.04
 DISTRIB_CODENAME=focal
 DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
 NAME="Ubuntu"
 VERSION="20.04.3 LTS (Focal Fossa)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 20.04.3 LTS"
 VERSION_ID="20.04"
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 VERSION_CODENAME=focal
 UBUNTU_CODENAME=focal
 Linux ripper 5.11.0-36-generic #40~20.04.1-Ubuntu SMP Sat Sep 18 02:14:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
 
 ***GPU Information***
 Thu Oct 14 12:15:13 2021
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  GeForce RTX 3080    Off  | 00000000:42:00.0  On |                  N/A |
 | 30%   40C    P8    22W / 320W |   8934MiB / 10015MiB |     15%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 
 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 |    0   N/A  N/A      1685      G   /usr/lib/xorg/Xorg                102MiB |
 |    0   N/A  N/A      3167      G   /usr/lib/xorg/Xorg                949MiB |
 |    0   N/A  N/A      3364      G   /usr/bin/gnome-shell              101MiB |
 |    0   N/A  N/A      3815      G   ...AAAAAAAAA= --shared-files        9MiB |
 |    0   N/A  N/A      4853      G   ./jetbrains-toolbox                18MiB |
 |    0   N/A  N/A    225427      G   ...AAAAAAAAA= --shared-files       41MiB |
 |    0   N/A  N/A    394811      G   gnome-control-center                3MiB |
 |    0   N/A  N/A    425545      G   /usr/lib/firefox/firefox          175MiB |
 |    0   N/A  N/A    965183      G   ...964638.log --shared-files       48MiB |
 |    0   N/A  N/A    972975      G   /usr/lib/firefox/firefox            3MiB |
 |    0   N/A  N/A    973555      G   /usr/lib/firefox/firefox            3MiB |
 |    0   N/A  N/A   1930063      G   /usr/lib/firefox/firefox            3MiB |
 |    0   N/A  N/A   3445651      C   ...-8-openjdk-amd64/bin/java     7447MiB |
 +-----------------------------------------------------------------------------+
 
 ***CPU***
 Architecture:                    x86_64
 CPU op-mode(s):                  32-bit, 64-bit
 Byte Order:                      Little Endian
 Address sizes:                   43 bits physical, 48 bits virtual
 CPU(s):                          48
 On-line CPU(s) list:             0-47
 Thread(s) per core:              2
 Core(s) per socket:              24
 Socket(s):                       1
 NUMA node(s):                    4
 Vendor ID:                       AuthenticAMD
 CPU family:                      23
 Model:                           8
 Model name:                      AMD Ryzen Threadripper 2970WX 24-Core Processor
 Stepping:                        2
 Frequency boost:                 enabled
 CPU MHz:                         2200.000
 CPU max MHz:                     3000.0000
 CPU min MHz:                     2200.0000
 BogoMIPS:                        5988.99
 Virtualization:                  AMD-V
 L1d cache:                       768 KiB
 L1i cache:                       1.5 MiB
 L2 cache:                        12 MiB
 L3 cache:                        64 MiB
 NUMA node0 CPU(s):               0-5,24-29
 NUMA node1 CPU(s):               12-17,36-41
 NUMA node2 CPU(s):               6-11,30-35
 NUMA node3 CPU(s):               18-23,42-47
 Vulnerability Itlb multihit:     Not affected
 Vulnerability L1tf:              Not affected
 Vulnerability Mds:               Not affected
 Vulnerability Meltdown:          Not affected
 Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
 Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
 Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
 Vulnerability Srbds:             Not affected
 Vulnerability Tsx async abort:   Not affected
 Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall sev_es fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
 
 ***CMake***
 /usr/bin/cmake
 cmake version 3.16.3
 
 CMake suite maintained and supported by Kitware (kitware.com/cmake).
 
 ***g++***
 /usr/bin/g++
 g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
 Copyright (C) 2019 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 
 ***nvcc***
 /usr/local/cuda/bin/nvcc
 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2021 NVIDIA Corporation
 Built on Sun_Feb_14_21:12:58_PST_2021
 Cuda compilation tools, release 11.2, V11.2.152
 Build cuda_11.2.r11.2/compiler.29618528_0
 
 ***Python***
 /home/andy/miniconda3/bin/python
 Python 3.9.5
 
 ***Environment Variables***
 PATH                            : /home/andy/miniconda3/bin:/home/andy/miniconda3/condabin:/home/andy/.cargo/bin:/home/andy/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/apache-maven-3.8.2/bin:mvnd-0.5.2-linux-amd64/bin:/usr/local/cuda/bin
 LD_LIBRARY_PATH                 : :/usr/local/cuda/targets/x86_64-linux/lib/:/usr/local/cuda/lib64
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    : /home/andy/miniconda3
 PYTHON_PATH                     :
 
 ***conda packages***
 /home/andy/miniconda3/bin/conda
 # packages in environment at /home/andy/miniconda3:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                        main
 _openmp_mutex             4.5                       1_gnu
 attrs                     21.2.0                   pypi_0    pypi
 awscli                    1.20.44                  pypi_0    pypi
 awscli-plugin-endpoint    0.4                      pypi_0    pypi
 botocore                  1.21.44                  pypi_0    pypi
 brotlipy                  0.7.0           py39h27cfd23_1003
 ca-certificates           2021.7.5             h06a4308_1
 certifi                   2021.5.30        py39h06a4308_0
 cffi                      1.14.6           py39h400218f_0
 chardet                   4.0.0           py39h06a4308_1003
 colorama                  0.4.3                    pypi_0    pypi
 conda                     4.10.3           py39h06a4308_0
 conda-package-handling    1.7.3            py39h27cfd23_1
 conda-standalone          4.9.0                h718eed5_1
 constructor               3.2.1            py39h06a4308_0
 cryptography              3.4.7            py39hd23ed53_0
 datafusion                0.2.0                    pypi_0    pypi
 docutils                  0.15.2                   pypi_0    pypi
 idna                      2.10               pyhd3eb1b0_0
 iniconfig                 1.1.1                    pypi_0    pypi
 jmespath                  0.10.0                   pypi_0    pypi
 ld_impl_linux-64          2.35.1               h7274673_9
 libffi                    3.3                  he6710b0_2
 libgcc-ng                 9.3.0               h5101ec6_17
 libgomp                   9.3.0               h5101ec6_17
 libstdcxx-ng              9.3.0               hd4cf53a_17
 ncurses                   6.2                  he6710b0_1
 numpy                     1.21.2                   pypi_0    pypi
 openssl                   1.1.1l               h7f8727e_0
 packaging                 21.0                     pypi_0    pypi
 pip                       21.1.3           py39h06a4308_0
 pluggy                    1.0.0                    pypi_0    pypi
 py                        1.10.0                   pypi_0    pypi
 pyarrow                   5.0.0                    pypi_0    pypi
 pyasn1                    0.4.8                    pypi_0    pypi
 pycosat                   0.6.3            py39h27cfd23_0
 pycparser                 2.20                       py_2
 pyopenssl                 20.0.1             pyhd3eb1b0_1
 pyparsing                 2.4.7                    pypi_0    pypi
 pysocks                   1.7.1            py39h06a4308_0
 pytest                    6.2.5                    pypi_0    pypi
 python                    3.9.5                h12debd9_4
 python-dateutil           2.8.2                    pypi_0    pypi
 pyyaml                    5.4.1                    pypi_0    pypi
 readline                  8.1                  h27cfd23_0
 requests                  2.25.1             pyhd3eb1b0_0
 rsa                       4.7.2                    pypi_0    pypi
 ruamel_yaml               0.15.100         py39h27cfd23_0
 s3transfer                0.5.0                    pypi_0    pypi
 setuptools                52.0.0           py39h06a4308_0
 six                       1.16.0             pyhd3eb1b0_0
 sqlite                    3.36.0               hc218d9a_0
 tk                        8.6.10               hbc83047_0
 toml                      0.10.2                   pypi_0    pypi
 tqdm                      4.61.2             pyhd3eb1b0_1
 tzdata                    2021a                h52ac0ba_0
 urllib3                   1.26.6             pyhd3eb1b0_1
 wheel                     0.36.2             pyhd3eb1b0_0
 xz                        5.2.5                h7b6447c_0
 yaml                      0.2.5                h7b6447c_0
 zlib                      1.2.11               h7b6447c_3

Additional context
This requirement is being driven by NVIDIA/spark-rapids#3797

@andygrove added the "bug" (Something isn't working) and "Needs Triage" (Need team to review and classify) labels on Oct 14, 2021
@davidwendt

Perhaps a duplicate of #6196

@andygrove

Yes, this is a duplicate. Closing this one.
