to_sql function takes forever to insert in oracle database #14315
Comments
What database driver are you using?
I used the cx_Oracle driver to connect. Both databases are on the same machine (I used a Lubuntu virtual machine for this comparison), so connection speed shouldn't be an issue, right?
@addresseerajat Can you have a look at the discussion in #8953?
@jorisvandenbossche: I looked at the solution and tried using a similar approach. The relevant code is as follows:
The above line gives me an error:
My database version is Oracle 11g. However, when I execute the following command, I am able to insert into the database. The only problem is that it takes a lot of time to insert.
Were there any other findings here? I've discovered that pushing data into Oracle using cx_Oracle is painfully slow: 10 rows can take 15 seconds to insert. The server we're using is decent (32 GB of RAM, 8 cores).
I ran into the same problem recently. In the end, I found a way to solve it.
As mentioned by @wuhaochen, I have also run into this problem. For me the issue was that Oracle was creating columns of the CLOB data type for all the string columns of the pandas dataframe. I sped up the code by explicitly setting the `dtype` argument so that string columns are created as VARCHAR instead. I think this should be the default behavior of `to_sql`.
Could you provide an example of the VARCHAR conversion? Numbers always insert quickly. Thanks.
Sorry, the correct parameter of `to_sql` for this is `dtype`; a sketch of the VARCHAR conversion follows.
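A minimal sketch of that `dtype` workaround, assuming a SQLAlchemy engine backed by cx_Oracle; the connection string, table name, and VARCHAR lengths are placeholders, not values taken from this thread:

```python
import pandas as pd
import sqlalchemy

# Placeholder Oracle connection via SQLAlchemy + cx_Oracle.
engine = sqlalchemy.create_engine(
    "oracle+cx_oracle://user:password@localhost:1521/?service_name=orcl"
)

df = pd.DataFrame({"name": ["alice", "bob"], "value": [1.0, 2.0]})

# Map every string (object) column to VARCHAR instead of the default CLOB,
# which is what makes the Oracle inserts so slow.
varchar_dtypes = {
    col: sqlalchemy.types.VARCHAR(int(df[col].astype(str).str.len().max()))
    for col in df.columns[df.dtypes == "object"]
}

df.to_sql("example_table", engine, if_exists="replace", index=False, dtype=varchar_dtypes)
```

Sizing each VARCHAR from the longest string in that column is only one option; a fixed, generous length works just as well.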
to_sql() is still practically broken when working with Oracle without using the workaround recommended above.
It's not clear that there's a pandas-specific fix for this issue, so going to close.
I am using pandas to do some analysis on an Excel file, and once that analysis is complete, I want to insert the resulting dataframe into a database. The size of this dataframe is around 300,000 rows and 27 columns.

I am using the `pd.to_sql` method to insert the dataframe into the database. When I use a MySQL database, insertion takes around 60-90 seconds. However, when I try to insert the same dataframe using the same function into an Oracle database, the process takes around 2-3 hours to complete. Relevant code can be found below:
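The original snippet is not preserved in this thread; the following is a minimal sketch of what such a call typically looks like, with the Excel file name, connection string, table name, and chunk size all assumed for illustration:

```python
import pandas as pd
import sqlalchemy

# Analysis step (details omitted here): produces the ~300,000 x 27 dataframe.
df = pd.read_excel("analysis_input.xlsx")

# Placeholder Oracle connection through SQLAlchemy and cx_Oracle.
engine = sqlalchemy.create_engine(
    "oracle+cx_oracle://user:password@localhost:1521/?service_name=orcl"
)

# The slow call: without an explicit dtype mapping, string columns default
# to CLOB on Oracle, which is the bottleneck discussed in the comments above.
df.to_sql("results_table", engine, if_exists="append", index=False, chunksize=1000)
```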
I tried using different `chunksize` values (from 50 to 3000), but the difference in time was only on the order of 10 minutes. Any solution to the above problem?