Skip to content
This repository has been archived by the owner on Jan 5, 2023. It is now read-only.

SQLAlchemy connection pool overflow and TimeoutError on catalog-next #513

Open
jbrown-xentity opened this issue Nov 18, 2020 · 3 comments
Open
Assignees
Labels
bug Something isn't working

Comments

@jbrown-xentity
Copy link
Contributor

In doing load testing related to #449, the application was not responding correctly. A number of errors were encountered around serving standard traffic (/dataset, /organization, etc). New Relic implied that the issue was in SOLR, but the SOLR instances were not showing extreme load and the configuration seemed appropriate.

We found the following bug in the gunicorn log, and seems to imply an issue with sqlalchemy and/or the DB:

[2020-11-18 19:47:14 +0000] [25361] [ERROR] Error handling request /organization
Traceback (most recent call last):
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 56, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 160, in handle_request
    addr)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 107, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/newrelic/api/wsgi_application.py", line 680, in _nr_wsgi_application_wrapper_
    environ, _start_response, transaction)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/newrelic/api/wsgi_application.py", line 208, in __init__
    self.start_response)
  File "/usr/lib/ckan-new/src/ckan/ckan/config/middleware/__init__.py", line 210, in __call__
    return self.apps[app_name](environ, start_response)
  File "/usr/lib/ckan-new/src/ckan/ckan/config/middleware/common_middleware.py", line 96, in __call__
    return self.app(environ, start_response)
  File "/usr/lib/ckan-new/src/ckan/ckan/config/middleware/common_middleware.py", line 115, in __call__
    return self.app(environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/paste/cascade.py", line 130, in __call__
    return self.apps[-1](environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/paste/registry.py", line 379, in __call__
    app_iter = self.application(environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/repoze/who/middleware.py", line 86, in __call__
    app_iter = app(environ, wrapper.wrap_start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/pylons/middleware.py", line 214, in __call__
    self.app, new_environ, catch_exc_info=True)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/pylons/util.py", line 94, in call_wsgi_application
    app_iter = application(environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/weberror/errormiddleware.py", line 165, in __call__
    return self.application(environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/webob/dec.py", line 147, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/webob/dec.py", line 208, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/fanstatic/publisher.py", line 234, in __call__
    return request.get_response(self.app)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/webob/request.py", line 1053, in get_response
    application, catch_exc_info=False)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/webob/request.py", line 1022, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/webob/dec.py", line 147, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/webob/dec.py", line 208, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/fanstatic/injector.py", line 54, in __call__
    response = request.get_response(self.app)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/webob/request.py", line 1053, in get_response
    application, catch_exc_info=False)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/webob/request.py", line 1022, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/ckan-new/src/ckan/ckan/config/middleware/pylons_app.py", line 264, in inner
    result = application(environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/beaker/middleware.py", line 73, in __call__
    return self.app(environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/beaker/middleware.py", line 156, in __call__
    return self.wrap_app(environ, session_start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/routes/middleware.py", line 131, in __call__
    response = self.app(environ, start_response)
  File "/usr/lib/ckan-new/src/ckan/ckan/config/middleware/common_middleware.py", line 33, in __call__
    return self.app(environ, start_response)
  File "/usr/lib/ckan-new/src/ckan/ckan/config/middleware/common_middleware.py", line 59, in __call__
    return self.app(environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/newrelic/api/error_trace.py", line 59, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/pylons/wsgiapp.py", line 125, in __call__
    response = self.dispatch(controller, environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/pylons/wsgiapp.py", line 324, in dispatch
    return controller(environ, start_response)
  File "/usr/lib/ckan-new/src/ckan/ckan/lib/base.py", line 242, in __call__
    res = WSGIController.__call__(self, environ, start_response)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/newrelic/api/function_trace.py", line 163, in literal_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/newrelic/api/transaction_name.py", line 52, in dynamic_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/pylons/controllers/core.py", line 217, in __call__
    response = self._inspect_call(self.__before__)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/pylons/controllers/core.py", line 107, in _inspect_call
    result = self._perform_call(func, args)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/newrelic/hooks/framework_pylons.py", line 51, in __call__
    return self.__wrapped(*args, **kwargs)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/newrelic/api/function_trace.py", line 143, in dynamic_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/pylons/controllers/core.py", line 60, in _perform_call
    return func(**args)
  File "/usr/lib/ckan-new/src/ckan/ckan/lib/base.py", line 229, in __before__
    app_globals.app_globals._check_uptodate()
  File "/usr/lib/ckan-new/src/ckan/ckan/lib/app_globals.py", line 202, in _check_uptodate
    value = model.get_system_info('ckan.config_update')
  File "/usr/lib/ckan-new/src/ckan/ckan/model/system_info.py", line 59, in get_system_info
    obj = meta.Session.query(SystemInfo).filter_by(key=key).first()
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2755, in first
    ret = list(self[0:1])
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2547, in __getitem__
    return list(res)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2855, in __iter__
    return self._execute_and_instances(context)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2876, in _execute_and_instances
    close_with_result=True)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2885, in _get_bind_args
    **kw
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2867, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 998, in connection
    execution_options=execution_options)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1003, in _connection_for_bind
    engine, execution_options)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 403, in _connection_for_bind
    conn = bind.contextual_connect()
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2112, in contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
    return fn()
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/pool.py", line 387, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/pool.py", line 766, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/pool.py", line 516, in checkout
    rec = pool._do_get()
  File "/usr/lib/ckan-new/lib/python2.7/site-packages/sqlalchemy/pool.py", line 1131, in _do_get
    (self.size(), self.overflow(), self._timeout))
TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30

How to reproduce

  1. Start the locust system to test catalog-next.data.gov system
  2. Observe logs from any catalog-next instance during test

Expected behavior

No overflow logs, traffic served is stable

Actual behavior

Overflow logs, traffic served is unstable/broken

@jbrown-xentity jbrown-xentity added the bug Something isn't working label Nov 18, 2020
@jbrown-xentity
Copy link
Contributor Author

Sent request for load level on DB from 1pm-3pm ET to FCS, will post data here if/when it is received...

@avdata99
Copy link
Contributor

CKAN starts sqlalchemy allowing send parameters from configuration (ref here).

@adborden adborden changed the title Load Testing Overloads CKAN SQLAlchemy connection pool overflow and TimeoutError on catalog-next Nov 20, 2020
@ghost
Copy link

ghost commented Nov 24, 2020

@avdata99 What's the estimate on this?

# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants