BUG: read_csv with date_parser lock file open on failure #15302
Comments
We had a very similar issue, though I can't seem to find it now. On Windows, I think this is fixed in 0.19.2, or maybe in 0.20.0 (still in dev). Can you try?
I tried with 0.19.2 and the problem remains. When 0.20.0 becomes available on one of my systems I will give that a try.
I thought this was solved, but I was able to repro on Windows. So it seems that we are not closing the file handle when the read fails. @rsheftel would you like to do a pull-request to fix?
I don't think I am enough of an expert in the pandas code to submit a change that fixes this. Do you want me to just create a blank pull-request? (Sorry, I'm new to the GitHub / collaborative world and how exactly it works.)
No, this would be a regular pull-request that has tests and makes the change. See http://pandas.pydata.org/pandas-docs/stable/contributing.html
If I gain expertise in the code base and feel I can confidently contribute, I will. Thanks.
I looked into this with the help of @faizanv and finally got to the bottom of what's happening in pandas, but I'm still trying to figure out what's happening in the C API.

The proximate cause of this issue is that when we're reading a standard CSV file from a user-provided path, we open a file handle in C. If there's no exception (or we catch the exception), the file handle is closed by a cleanup call. It's not clear to me why that cleanup gets called when we catch the exception but not when we don't.
@jreback The issue is that Python knows nothing about any calls the C code makes to open the file.
We do something like this:

    try:
        rows = parser.read()
    finally:
        parser.close()

but we don't track the file handle of files opened by the C code.
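A user-level workaround consistent with this analysis is to open the file in Python and pass the open file object to read_csv, so the handle is owned by code that can always close it. A minimal sketch, assuming any extra keyword arguments are simply forwarded:

    import pandas as pd


    def read_csv_owning_handle(path, **kwargs):
        # Open (and therefore own) the handle, so it is released even if the
        # parser or a date_parser callback raises partway through the read.
        with open(path, "r") as f:
            return pd.read_csv(f, **kwargs)

Because the C reader never opens the path itself in this case, nothing is left locked when an exception propagates.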
You could try explicitly closing in the actual parser when the .close method is called (i.e., do the cleanup there), but you would probably have to set a flag so that you don't do it again in dealloc.
:) Funny you mention that. That is my current working solution.
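That approach looks roughly like the following sketch (illustrative names only, not the actual pandas internals): the parser closes the handle it opened in its own close() and records that it did, so deallocation does not close it a second time.

    class ParserSketch:
        """Illustrative stand-in for the parser wrapper, not pandas' real class."""

        def __init__(self, path):
            self._handle = open(path, "rb")  # handle opened by the parser itself
            self._closed = False

        def read(self):
            # ... parse rows; a failing date_parser callback would raise here ...
            raise ValueError("simulated date_parser failure")

        def close(self):
            # Explicit cleanup: close the handle and remember that we did.
            if not self._closed:
                self._handle.close()
                self._closed = True

        def __del__(self):
            # Dealloc-time cleanup, guarded by the flag so we never close twice.
            self.close()

With that in place, the try/finally shown earlier releases the handle even when read() raises.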
I'm quite sure this should work with 1.3.2, but I'm not sure whether the following is a sufficient test for this:

    PYTHONWARNINGS="default" python test.py

test.py:

    from io import StringIO

    import pandas as pd


    def strict_parser(dates):
        assert False


    with StringIO("a,b,c,datetime") as file:
        pd.read_csv(
            file,
            parse_dates=["datetime"],
            index_col=["datetime"],
            date_parser=strict_parser,
        )

This should print a ResourceWarning if the file handle is not closed.
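Another way to exercise this, assuming a pandas version that still accepts date_parser and using a made-up failing_parser, is to read a real temporary file and then assert it can be deleted, which is exactly what fails on Windows when the handle is leaked:

    import os
    import tempfile

    import pandas as pd


    def failing_parser(dates):
        raise ValueError("reject everything")  # force the date parsing to fail


    def test_handle_released_on_date_parser_error():
        fd, path = tempfile.mkstemp(suffix=".csv")
        with os.fdopen(fd, "w") as f:
            f.write("datetime,data\n2010-05-05 09:30:00-0500,10\n")

        try:
            pd.read_csv(path, parse_dates=["datetime"],
                        index_col=["datetime"], date_parser=failing_parser)
        except Exception:
            pass  # the exact exception type depends on the pandas version

        # On Windows this raises PermissionError if read_csv leaked the handle.
        os.remove(path)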
Cannot reproduce anymore with v1.5.0.dev0 on Linux.
take |
take |
Issue Description: What I observed is that when we are using the date_parser and it raises, the file stays open.

Proposed Solution: To handle this issue gracefully, we can modify strict_parser so that it falls back when the timezone offset is missing:

    import datetime

    import pandas as pd


    def strict_parser(dates):
        try:
            # Attempt to parse with a timezone offset, then convert to UTC.
            datetimes = [pd.Timestamp(datetime.datetime.strptime(date, '%Y-%m-%d %H:%M:%S%z')).tz_convert('UTC') for date in dates]
            return pd.DatetimeIndex(datetimes)
        except ValueError:
            # If parsing with a timezone offset fails, parse without it and localize to UTC.
            datetimes = [pd.Timestamp(datetime.datetime.strptime(date, '%Y-%m-%d %H:%M:%S'), tz='UTC') for date in dates]
            return pd.DatetimeIndex(datetimes)
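Assuming the sample values from data.csv shown further down, the fallback behaviour of this parser can be sanity-checked directly; the offset-bearing rows are converted to UTC, while the offset-less row is parsed naively and localized to UTC:

    print(strict_parser(["2010-05-05 09:30:00-0500", "2010-05-05 09:35:00-0500"]))
    print(strict_parser(["2010-05-05 09:40:00"]))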
Problem description
When using the date_parser functionality of read_csv(), if the read fails then the file is left open and locked. My use case is that I am trying to enforce a strict datetime format that must include the time zone offset. Another thread stated that the way to accomplish this is with date_parser. My issue is that I would like to move a file that fails to load to another directory, but I cannot, because the failure in the date_parser keeps the file open until the Python session is terminated.
Code Sample, a copy-pastable example if possible
The file data.csv is:
datetime,data
2010-05-05 09:30:00-0500,10
2010-05-05 09:35:00-0500,20
2010-05-05 09:40:00,30
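Below is a sketch of the kind of script described above; strict_parser, the data.csv path, and the rename target are assumptions based on the problem description:

    import datetime
    import os

    import pandas as pd


    def strict_parser(dates):
        # Reject any value that does not carry an explicit UTC offset.
        return pd.DatetimeIndex(
            [pd.Timestamp(datetime.datetime.strptime(d, "%Y-%m-%d %H:%M:%S%z")) for d in dates]
        )


    try:
        df = pd.read_csv(
            "data.csv",
            parse_dates=["datetime"],
            index_col=["datetime"],
            date_parser=strict_parser,
        )
    except Exception:
        pass  # the last row has no offset, so strict_parser raises and the read fails

    # On affected versions (reported on Windows with pandas 0.19.x) this fails with
    # PermissionError, because read_csv still holds data.csv open.
    os.rename("data.csv", "data_failed.csv")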
Output of pd.show_versions():
pandas: 0.19.1