[BUG] read_csv fails to correctly handle misplaced quotes #2398
Comments
This is essentially the same problem as described in #873. There isn't really a "correct" way of handling misplaced quotes. For every example where pandas seemingly returns something more useful, you can construct an example where pandas returns the "wrong" data. We could decide for cuIO to match pandas' behavior, with all the good and bad side effects. Please take a look at the following to learn more about how pandas behaves:
@kkraus14 @ayushdg @OlivierNV what do you think we should do with this issue? Should we aim to match Pandas, or not?
The above situation seems like something that would be nice to handle, but in general we shouldn't try to handle every edge case / error case of Pandas here. We should do what's logical / what's the best end user experience. Alternatively, instead of returning the correct results as above, if we were able to clearly and loudly error saying "hey there was an unclosed quotation on line 123 at character 87, try setting |
IMO, we should try to match pandas if we can. The new csv implementation should be able to handle the case above when looking for |
As discussed in #873, these are edge cases where fixing them will break other (valid) use cases. The |
Describe the bug
Often CSV files have misplaced quotes, and sometimes a quotation mark is part of one of the string fields. This should not be interpreted as a quotation mark indicating that a field has delimiters in it and therefore uses ".
Steps/Code to reproduce bug
Expected behavior
Workaround: Use cudf.read_csv with quoting=3. Pandas gives correct output for all quotation modes.
Environment overview (please complete the following information)
docker pull rapidsai/rapidsai-nightly:0.9-cuda10.0-runtime-ubuntu16.04-gcc5-py3.7
Additional context
I might be wrong, but maybe the way to handle this is to treat a quote as an opening quote only if it appears just after a delimiter, and as a closing quote only if it appears just before another delimiter? (Just a guess)
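To make the guess above concrete, here is a rough sketch in plain Python. It is not how cudf or pandas parse CSV; `split_line` and `find_closing_quote` are hypothetical helpers written only for illustration. A quote opens a field only at the start of the field, and only if a later quote sits directly before a delimiter or the end of the line; otherwise it is kept as a literal character.

```python
def find_closing_quote(line, start, delimiter, quotechar):
    """Return the index of the first quote at or after `start` that is
    directly followed by a delimiter or the end of the line, else -1."""
    pos = line.find(quotechar, start)
    while pos != -1:
        nxt = pos + 1
        if nxt == len(line) or line.startswith(delimiter, nxt):
            return pos
        pos = line.find(quotechar, nxt)
    return -1


def split_line(line, delimiter=",", quotechar='"'):
    """Split a single line (no trailing newline) into fields.

    Hypothetical heuristic: a quote only opens a quoted field when it is
    the first character of the field AND a matching closing quote exists.
    """
    fields = []
    i = 0
    while True:
        if line.startswith(quotechar, i):
            close = find_closing_quote(line, i + 1, delimiter, quotechar)
            if close != -1:
                # Properly enclosed field: keep delimiters inside it.
                fields.append(line[i + 1:close])
                i = close + 1
                if i >= len(line):
                    break
                i += len(delimiter)
                continue
        # Unquoted field, or a stray quote treated as a literal character.
        end = line.find(delimiter, i)
        if end == -1:
            fields.append(line[i:])
            break
        fields.append(line[i:end])
        i = end + len(delimiter)
    return fields


print(split_line('1,"a,b",x'))       # properly quoted field
print(split_line('1,5" pipe,Oslo'))  # quote in the middle of a field
print(split_line('1,"pipe 5,Oslo'))  # unclosed quote at field start
```

This sketch ignores doubled-quote escapes and fields that span several lines, and it still has to guess on ambiguous input such as 1,"a",b",c, so it illustrates the idea rather than a complete fix.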
quoting.csv.zip
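For reference, the quoting=3 workaround mentioned under Expected behavior uses the same numbering as Python's csv module, where 3 is csv.QUOTE_NONE (quote characters get no special treatment). The sketch below uses only the standard library rather than cudf, and the sample data is made up for illustration; it is not the content of the attached file.

```python
import csv
import io

# Made-up sample: the second data field starts with a quote that is never closed.
SAMPLE = 'id,name,city\n1,"pipe 5,Oslo\n'


def parse(text, quoting):
    """Parse CSV text with the standard-library reader and the given quoting mode."""
    return list(csv.reader(io.StringIO(text), quoting=quoting))


# csv.QUOTE_NONE has the integer value 3, the value used in the workaround.
print(csv.QUOTE_NONE)

# With QUOTE_NONE the stray quote stays in the data and the row keeps three fields.
rows_none = parse(SAMPLE, csv.QUOTE_NONE)
print(rows_none[1])

# With the default mode the unclosed quote swallows the following delimiter.
rows_default = parse(SAMPLE, csv.QUOTE_MINIMAL)
print(rows_default[1])
```

The trade-off is that with quoting disabled, properly quoted fields that contain delimiters are split as well, so this only helps when the file does not rely on quoting.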