
BUG: error in handling a sqlalchemy type with arguments (instantiated type, not class) #9083


Closed
pilotstew opened this issue Dec 15, 2014 · 13 comments · Fixed by #9138
Labels
Bug IO SQL to_sql, read_sql, read_sql_query
Milestone

Comments

@pilotstew

I'm trying to use to_sql to write a timezone-aware datetime Series to the database. Here is an example of my code:

times = ['201412120154', '201412110254']

df = pd.DataFrame()
df['time'] = pd.to_datetime(times, utc=True)

df.time.to_sql('test', engine, dtype={'time': sqlalchemy.TIMESTAMP(timezone=True)})

The error I receive is:

TypeError: issubclass() arg 1 must be a class

The following code works, but obviously results in a PostgreSQL column that is not timezone aware.

times = ['201412120154', '201412110254']

df = pd.DataFrame()
df['time'] = pd.to_datetime(times, utc=True)

df.time.to_sql('test', engine, dtype={'time': sqlalchemy.TIMESTAMP})

I'm using Python 2.7, pandas 0.15.2, PostgreSQL 9.3 and SQLAlchemy 0.9.7. The same issue occurs with sqlalchemy.DATETIME(timezone=True) vs. sqlalchemy.DATETIME.

Full Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-161-ec79a553e6d0> in <module>()
----> 1 df.time.to_sql('test', w.engine, dtype={'time': sqlalchemy.TIMESTAMP(timezone=True)})

/home/stew/.pyenv/versions/p276/lib/python2.7/site-packages/pandas/core/generic.pyc in to_sql(self, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
    964             self, name, con, flavor=flavor, schema=schema, if_exists=if_exists,   
    965             index=index, index_label=index_label, chunksize=chunksize,
--> 966             dtype=dtype)
    967 
    968     def to_pickle(self, path):

/home/stew/.pyenv/versions/p276/lib/python2.7/site-packages/pandas/io/sql.pyc in to_sql(frame, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
    536     pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
    537                       index_label=index_label, schema=schema,
--> 538                       chunksize=chunksize, dtype=dtype)
    539 
    540 

/home/stew/.pyenv/versions/p276/lib/python2.7/site-packages/pandas/io/sql.pyc in to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype)
   1162             import sqlalchemy.sql.type_api as type_api
   1163             for col, my_type in dtype.items():
-> 1164                 if not issubclass(my_type, type_api.TypeEngine):
   1165                     raise ValueError('The type of %s is not a SQLAlchemy '
   1166                                      'type ' % col)

TypeError: issubclass() arg 1 must be a class
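
The traceback boils down to a plain Python semantics issue: issubclass() rejects instances outright, while the fix discussed below normalizes the value and uses isinstance() instead. A minimal, stdlib-only sketch (the TypeEngine/TIMESTAMP classes here are stand-ins for the SQLAlchemy ones, not the real library):

```python
class TypeEngine:                 # stand-in for sqlalchemy.types.TypeEngine
    pass

class TIMESTAMP(TypeEngine):      # stand-in for sqlalchemy.TIMESTAMP
    def __init__(self, timezone=False):
        self.timezone = timezone

# Passing the class works:
assert issubclass(TIMESTAMP, TypeEngine)

# Passing an instantiated type raises, as in the traceback above:
try:
    issubclass(TIMESTAMP(timezone=True), TypeEngine)
except TypeError as exc:
    print(exc)  # issubclass() arg 1 must be a class
```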
@jorisvandenbossche
Member

@pilotstew Thanks for the report!

There appear to be a few problems:

  • first, the fact that the dtype argument does not work here is an oversight. This feature was recently introduced (ENH: dtype costumization on to_sql (GH8778) #8926) and was only tested with SQLAlchemy type classes (such as sqlalchemy.TIMESTAMP), not with instantiated types carrying arguments (such as sqlalchemy.TIMESTAMP(timezone=True)). So this should be fixed.
  • second, this should work out of the box, I think. There is even code for this (https://github.com/pydata/pandas/blob/v0.15.2/pandas/io/sql.py#L915), but it is not working because, when you have a timezone, pandas currently treats the column as an object column of Timestamp objects (converted to TEXT), since the datetime64 dtype does not support timezones.

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Dec 15, 2014
@jorisvandenbossche jorisvandenbossche added this to the 0.16.0 milestone Dec 15, 2014
@jorisvandenbossche
Member

@pilotstew You can solve it for now using this monkey-patch:

import warnings

import pandas as pd
from pandas.io.sql import SQLTable

def to_sql(self, frame, name, if_exists='fail', index=True,
           index_label=None, schema=None, chunksize=None, dtype=None):
    """
    patched version of https://github.com/pydata/pandas/blob/v0.15.2/pandas/io/sql.py#L1129
    """
    if dtype is not None:
        from sqlalchemy.types import to_instance, TypeEngine
        for col, my_type in dtype.items():
            if not isinstance(to_instance(my_type), TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type ' % col)

    table = SQLTable(name, self, frame=frame, index=index,
                     if_exists=if_exists, index_label=index_label,
                     schema=schema, dtype=dtype)
    table.create()
    table.insert(chunksize)
    # check for potentially case sensitivity issues (GH7815)
    if name not in self.engine.table_names(schema=schema or self.meta.schema):
        warnings.warn("The provided table name '{0}' is not found exactly "
                      "as such in the database after writing the table, "
                      "possibly due to case sensitivity issues. Consider "
                      "using lower case table names.".format(name), UserWarning)

pd.io.sql.SQLDatabase.to_sql = to_sql

Can you try that? This should let you do df.time.to_sql('test', engine, dtype={'time': sqlalchemy.TIMESTAMP(timezone=True)})
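
The core of the patch is the switch from issubclass() to isinstance() on a normalized value. A stdlib-only sketch of the idea behind sqlalchemy.types.to_instance (the classes here are stand-ins, not the real SQLAlchemy types): call the value if it is a class, pass it through otherwise, so both forms of the dtype value are accepted.

```python
import inspect

class TypeEngine:                 # stand-in for sqlalchemy.types.TypeEngine
    pass

class TIMESTAMP(TypeEngine):      # stand-in for sqlalchemy.TIMESTAMP
    def __init__(self, timezone=False):
        self.timezone = timezone

def to_instance(type_or_cls):
    """Mimic sqlalchemy.types.to_instance: instantiate classes, pass instances through."""
    return type_or_cls() if inspect.isclass(type_or_cls) else type_or_cls

# Both a type class and an instantiated type now pass the check:
assert isinstance(to_instance(TIMESTAMP), TypeEngine)
assert isinstance(to_instance(TIMESTAMP(timezone=True)), TypeEngine)
```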

@pilotstew
Author

Worked Perfectly!

@jorisvandenbossche
Member

@pilotstew Good to hear!
Did you also get the correct time zone in the database? (or are you in the same timezone as your data?) see #9086

@pilotstew
Author

Actually, no, I didn't. In pandas all timestamps were UTC, but they were converted to the local timezone in postgres. Since they are all uniform and converted consistently, it's not a huge issue; I can convert back to UTC on the read_sql.

@jorisvandenbossche
Member

OK, yes, that is what I expected and described in #9086. So this is another enhancement to make as part of 'real' timezone support!

@jorisvandenbossche
Member

I opened a couple of other issues that I encountered while looking into this:

Let's keep this issue for the actual dtype problem of only accepting classes, not instances.

@jorisvandenbossche jorisvandenbossche changed the title Error in handling a sqlalchemy type with timezone=True BUG: error in handling a sqlalchemy type with arguments (instantiated type, not class) Dec 15, 2014
@al626

al626 commented Jun 14, 2016

Hi, the same code as above is causing errors again, albeit different ones:

import sqlalchemy
import pandas as pd

eng = sqlalchemy.create_engine("postgresql+psycopg2://******:********@localhost:5432/postgres")
times = ['201412120154', '201412110254']
df = pd.DataFrame()
df['time'] = pd.to_datetime(times, utc=True)
df.time.to_sql('test', eng, dtype={'time': sqlalchemy.TIMESTAMP}, if_exists='append')

With the traceback:

Traceback (most recent call last):
  File "./test.py", line 10, in <module>
    df.time.to_sql('test', eng, dtype={'time': sqlalchemy.TIMESTAMP}, if_exists='append')
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1165, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 571, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 1250, in to_sql
    table.insert(chunksize)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 748, in insert
    keys, data_list = self.insert_data()
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 729, in insert_data
    d = b.values.astype('M8[us]').astype(object)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/index.py", line 842, in astype
    raise ValueError('Cannot cast DatetimeIndex to dtype %s' % dtype)
ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]

I'm using pandas version 0.18.1 and sqlalchemy version 1.0.13.

It looks like pandas is trying to convert the datetimes into microsecond timestamps, but in pandas/tseries/index.py a DatetimeIndex can only be cast to objects, strings, integers, and nanosecond timestamps.
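
A common workaround for this cast failure was to normalize the timestamps to UTC and drop the timezone before writing. A stdlib-only sketch of that conversion (using the modern Python 3 datetime API rather than the Python 2.7 of the report; in pandas the equivalent would be along the lines of df['time'].dt.tz_convert('UTC').dt.tz_localize(None)):

```python
from datetime import datetime, timedelta, timezone

# A tz-aware timestamp at UTC+2, similar to the values in the example above.
aware = datetime(2014, 12, 12, 1, 54, tzinfo=timezone(timedelta(hours=2)))

# Normalize to UTC, then drop tzinfo so the value is a naive datetime
# that can be cast to datetime64[us] without complaint.
naive_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)
print(naive_utc)  # 2014-12-11 23:54:00
```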

Cheers

@al626

al626 commented Jun 14, 2016

The same code also fails on the latest development build of pandas (0.18.1+121.g62b4327).

Traceback is similar:

Traceback (most recent call last):
  File "./test.py", line 10, in <module>
    df.time.to_sql('test', eng, dtype={'time': sqlalchemy.TIMESTAMP}, if_exists='append')
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/core/generic.py", line 1166, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/io/sql.py", line 571, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/io/sql.py", line 1250, in to_sql
    table.insert(chunksize)
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/io/sql.py", line 748, in insert
    keys, data_list = self.insert_data()
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/io/sql.py", line 729, in insert_data
    d = b.values.astype('M8[us]').astype(object)
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/tseries/index.py", line 847, in astype
    raise ValueError('Cannot cast DatetimeIndex to dtype %s' % dtype)
ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]

@jorisvandenbossche
Member

@al626 This is indeed not yet working (but this is due to the timezone-aware datetime column, not to the dtype argument as in the original issue here).
I thought there was already an issue about it, but can't directly find it.

@jorisvandenbossche
Member

There is the general issue #9086 about time zones not being supported; however, that issue is not really up to date (the situation has changed, because it is now possible to have timezone-aware datetime64 columns).

@al626

al626 commented Jun 14, 2016

Thanks, I'll keep track of that.

@suntracks

Is there any update on support for timestamp data with timezone info?
