Huge memory leak #323

Closed
droppy opened this issue Feb 15, 2014 · 10 comments

Comments

@droppy

droppy commented Feb 15, 2014

I tried to update the old websocket library used in my project to the latest version from the 'master' branch (SHA-1: ce2c1d6).
My app uses the asio_tls config 'out of the box'. When the app starts it consumes ~170 MB, but after 9000 total websocket client connects/disconnects it consumes ~860 MB.
I dumped the app's memory to a file and viewed different parts using a binary viewer. It mostly consists of TLS certificates, JSON message bodies my app sends to the clients, and some binary garbage. I suppose connection_ptrs aren't being destroyed and the memory used is not freed.
My code doesn't store any connection_ptrs on its own. I build in Visual Studio x64 mode using Intel Compiler 14.1, but I don't think this is related to the issue.

@zaphoyd
Owner

zaphoyd commented Feb 15, 2014

What do your message handler and tls_init_handler look like?

@droppy
Author

droppy commented Feb 15, 2014

tls_init is almost a copy-paste from the echo_server_tls example, followed by my message handler:

context_ptr WebSocketServer::on_tls_init(websocketpp::connection_hdl /*hdl*/)
{
    context_ptr ctx(new boost::asio::ssl::context(boost::asio::ssl::context::tlsv1));
    try
    {
        ctx->set_options(boost::asio::ssl::context::default_workarounds |
            boost::asio::ssl::context::no_sslv2 |
            boost::asio::ssl::context::single_dh_use);
        ctx->use_certificate_chain_file(Config().GetCertChainFile());
        ctx->set_password_callback(std::bind(&WebSocketServer::get_password, this));
        ctx->use_private_key_file(Config().GetKeyFile(), boost::asio::ssl::context::pem);
    }
    catch (const std::exception& e)
    {
        g_log.log(L_ERROR, __FUNCTION__" %s", e.what());
    }
    return ctx;
}

void WebSocketServer::on_message(connection_hdl hdl, server::message_ptr msg)
{
    std::error_code ec;
    connection_ptr con = m_server.get_con_from_hdl(hdl, ec);
    if (ec)
        return;
    if (msg->get_opcode() == websocketpp::frame::opcode::text)
    {
        const std::string& response = HandleClientMsg(msg->get_payload());
        if (!response.empty())
            con->send(response, websocketpp::frame::opcode::text);
    }
}

@droppy droppy closed this as completed Feb 15, 2014
@droppy droppy reopened this Feb 15, 2014
@droppy
Author

droppy commented Feb 15, 2014

Oops, misclicked while trying to find a 'code beautifier'.

@zaphoyd
Owner

zaphoyd commented Feb 15, 2014

For code beautification, look at https://help.github.com/articles/github-flavored-markdown. I edited your comment to add a code block + syntax highlighting as an example.

WebSocket++ connections do not store the TLS certificate or the contents of individual messages. Looking at your handlers, can you confirm that the Config and HandleClientMsg objects/functions are being cleaned up properly?

@droppy
Author

droppy commented Feb 15, 2014

Yes, Config is just a getter for a static entity, and HandleClientMsg is just a switch/case without any state tracking or storing.
As I said, the problem occurred only after I updated the WebSocket++ library from the old version (I believe a May or June 2013 snapshot; it is hard to say for sure because I was not able to find something like a version.h, which would be very nice to have). When I rebuild the server with the old version the problem does not appear, but the new version has a lot of cool functionality I like very much.
The number of open system handles also goes up (from ~340 at the start to 5400 after 24 hours of uptime).
Here is part of the log file; maybe it can give a clue:

[2014-02-15 15:46:07] [fatal] error in handle_read_handshake: End of File
[2014-02-15 15:46:11] [fatal] error in handle_read_handshake: End of File
[2014-02-15 15:46:13] [error] got TLS short read, killing connection for now
[2014-02-15 15:46:27] [fatal] handle_transport_init received error: Pass through from underlying library
[2014-02-15 15:46:27] [error] got TLS short read, killing connection for now
[2014-02-15 15:46:44] [error] got TLS short read, killing connection for now
[2014-02-15 15:46:49] [fatal] handle_transport_init received error: Pass through from underlying library
[2014-02-15 15:46:54] [error] got TLS short read, killing connection for now
[2014-02-15 15:46:54] [info] asio async_write error: asio.ssl:336396495 (protocol is shutdown)
[2014-02-15 15:46:54] [fatal] error in handle_write_frame: Underlying Transport Error
[2014-02-15 15:46:56] [error] got TLS short read, killing connection for now
[2014-02-15 15:47:07] [fatal] handle_transport_init received error: TLS handshake timed out
[2014-02-15 15:47:16] [fatal] handle_transport_init received error: Pass through from underlying library
[2014-02-15 15:47:17] [fatal] handle_transport_init received error: Pass through from underlying library
...
[2014-02-15 15:59:40] [error] got TLS short read, killing connection for now
[2014-02-15 15:59:42] [error] got TLS short read, killing connection for now
[2014-02-15 15:59:45] [error] got TLS short read, killing connection for now
[2014-02-15 15:59:51] [fatal] handle_transport_init received error: Pass through from underlying library
[2014-02-15 15:59:53] [fatal] handle_transport_init received error: Pass through from underlying library
[2014-02-15 16:00:19] [fatal] error in handle_read_handshake: TLS Short Read
[2014-02-15 16:00:28] [fatal] error in handle_read_handshake: End of File
[2014-02-15 16:00:34] [error] got TLS short read, killing connection for now
[2014-02-15 16:00:35] [error] got TLS short read, killing connection for now
[2014-02-15 16:00:36] [error] got TLS short read, killing connection for now
[2014-02-15 16:00:41] [error] got TLS short read, killing connection for now

@droppy
Author

droppy commented Feb 15, 2014

Additional logging experiments: I added a global atomic_int32 which I increment in the connection ctor and decrement in the dtor (postfix increment is used, so the numbers are a little odd).

[2014-02-15 16:59:44] [application] 0 connection ctor
[2014-02-15 16:59:52] [application] 1 connection ctor
[2014-02-15 16:59:52] [application] 2 connection ctor
[2014-02-15 16:59:54] [fatal] error in handle_read_handshake: End of File
[2014-02-15 16:59:54] [disconnect] Failed: End of File
[2014-02-15 16:59:54] [connect] WebSocket Connection [...]
[2014-02-15 17:00:03] [error] got TLS short read, killing connection for now
[2014-02-15 17:00:03] [disconnect] Disconnect close local:...
[2014-02-15 17:00:03] [application] 3 connection dtor
[2014-02-15 17:00:05] [application] 2 connection ctor
[2014-02-15 17:00:08] [application] 3 connection ctor
[2014-02-15 17:00:08] [fatal] handle_transport_init received error: ...
[2014-02-15 17:00:08] [disconnect] Failed: Pass through from underlying library
[2014-02-15 17:00:08] [application] 4 connection dtor

It seems strange that the connection dtor wasn't called for the 16:59:54 disconnect.
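
A minimal sketch of the kind of counter described above, assuming a hypothetical `live_counter` helper (not part of WebSocket++) whose methods are called from the connection constructor and destructor. Postfix `++` and `--` on `std::atomic` return the value before the change, so a constructor logs how many connections already existed and a destructor logs the count that still includes the object being destroyed. That would explain the sequence 0, 1, 2, 3 (dtor), 2, 3, 4 (dtor) in the log.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical instrumentation helper; the name and methods are illustrative
// and do not exist in WebSocket++.
class live_counter {
public:
    // Call from the constructor. Returns the count *before* the increment,
    // which is the number that would be written to the log.
    std::int32_t on_construct() { return count_++; }

    // Call from the destructor. Returns the count *before* the decrement.
    std::int32_t on_destroy() { return count_--; }

    // Number of objects currently alive.
    std::int32_t live() const { return count_.load(); }

private:
    std::atomic<std::int32_t> count_{0};
};
```

Replaying the five constructor and two destructor entries from the excerpt through this counter leaves a live count of 3. Whether all of those are legitimately open connections cannot be told from the log alone.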

@zaphoyd
Owner

zaphoyd commented Feb 15, 2014

Those logs definitely look strange. I'll see what I can dig up.

@AndriusA

I thought I'd chip in on the issue with my experience: asio_tls configuration does seem to have this problem, but after a lot of trying to get it to work better I also tried the example boost ssl server/client (no websockets), and while not as bad as ws which naturally adds some overheads, it still ends up with ~600MB for 10k connections, so I don't think the issue is with websocketpp...

An option I found to work a lot better in terms of memory consumption was to use stud (https://github.com/bumptech/stud - a really rather thin layer on top of openssl) in front of non-tls websocket - I end up with ~257.5MB used by stud and another 93MB used by websocketpp (example echo server) for 7.5k connections. Re-running echo_server_tls with 7.5k connections gets me up to 780MB total.
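
For comparison, the totals above can be turned into rough per-connection figures. This is only a sketch: the helper name `kib_per_connection` is made up for illustration, 1 MB is assumed to mean 1024 KiB, and the totals include whatever baseline memory each process uses, so the results are upper bounds on the per-connection cost.

```cpp
// Average memory per connection in KiB, assuming 1 MB = 1024 KiB.
// total_mb:    total process memory reported, in MB
// connections: number of simultaneous connections
double kib_per_connection(double total_mb, double connections) {
    return total_mb * 1024.0 / connections;
}
```

With the numbers quoted above, `kib_per_connection(600.0, 10000.0)` gives about 61 KiB for the plain boost ssl example, `kib_per_connection(780.0, 7500.0)` about 106 KiB for echo_server_tls, and `kib_per_connection(257.5 + 93.0, 7500.0)` about 48 KiB for stud in front of a non-TLS websocketpp server.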

@droppy
Author

droppy commented Mar 2, 2014

Thanks for the hint about a good library. The issue with websocketpp memory consumption is that a connection is not freed under some circumstances. The one I'm already familiar with: no connection destructor is called when handle_read_handshake fails. This is a log file with added 'connection dtor' logging to visualise the issue (the app is built using the latest websocketpp from trunk):

[2014-03-02 09:00:25] [connect] asio con transport constructor
[2014-03-02 09:00:27] [connect] WebSocket Connection ....
[2014-03-02 09:00:28] [connect] asio con transport constructor
[2014-03-02 09:00:28] [connect] asio con transport constructor
[2014-03-02 09:00:29] [connect] asio con transport constructor
[2014-03-02 09:00:30] [connect] asio con transport constructor
[2014-03-02 09:00:30] [connect] asio con transport constructor
[2014-03-02 09:00:30] [connect] asio con transport constructor
[2014-03-02 09:00:31] [connect] WebSocket Connection ....
[2014-03-02 09:00:31] [fatal] error in handle_read_handshake: End of File
[2014-03-02 09:00:31] [disconnect] Failed: End of File
[2014-03-02 09:00:33] [error] handle_read_frame error: websocketpp.transport:8 (TLS Short Read)
[2014-03-02 09:00:33] [disconnect] Disconnect close local:[1006,TLS Short Read]remote:[1006]
[2014-03-02 09:00:33] [connect] asio con transport dtor
[2014-03-02 09:00:33] [info] asio async_shutdown error: asio.ssl:336130329 (decryption failed or bad record mac)
[2014-03-02 09:00:33] [disconnect] Disconnect close local:[1001] remote:[1001]
[2014-03-02 09:00:33] [connect] asio con transport dtor
[2014-03-02 09:00:33] [error] handle_read_frame error: websocketpp.transport:8 (TLS Short Read)
[2014-03-02 09:00:33] [disconnect] Disconnect close local:[1006,TLS Short Read] remote:[1006]
[2014-03-02 09:00:33] [connect] asio con transport dtor
[2014-03-02 09:00:34] [connect] asio con transport constructor
[2014-03-02 09:00:35] [fatal] error in handle_read_handshake: End of File
[2014-03-02 09:00:35] [disconnect] Failed: End of File
[2014-03-02 09:00:35] [connect] WebSocket Connection ....
[2014-03-02 09:00:35] [connect] WebSocket Connection ....
[2014-03-02 09:00:36] [error] handle_read_frame error: websocketpp.transport:8 (TLS Short Read)
[2014-03-02 09:00:36] [disconnect] Disconnect close local:[1006,TLS Short Read] remote:[1006]
[2014-03-02 09:00:36] [connect] asio con transport dtor
[2014-03-02 09:00:37] [connect] asio con transport constructor

As you can see, there were two handle_read_handshake errors, but no connection destructors were invoked for them. This causes memory and handles to leak slowly. My app consumes up to 2 GB of memory after 24 hours of uptime because of the connection rate. I had to schedule a daily service restart as a temporary workaround for the issue.

@zaphoyd
Owner

zaphoyd commented Mar 3, 2014

I've reproduced and fixed the memory leak described by droppy here, specifically the one that caused failed connections to never get destroyed. This would affect both regular and TLS versions, although TLS versions would leak more memory because TLS uses more memory per connection.

I'll continue looking into TLS-related memory usage. I've found a few articles about memory optimizations for asio's TLS client.
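
A generic illustration of how an object managed by `shared_ptr` can stay alive after its main owner lets go, and how a non-owning handle can detect that. This is a sketch under assumptions, not the actual bug or fix: `fake_connection` and `destroyed_after_release` are hypothetical names, and the stray owner here is simply a queued callback that is never run or cleared. The `std::weak_ptr<void>` handle plays the same role as a WebSocket++ `connection_hdl`.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Stand-in for a connection object; hypothetical, not a WebSocket++ type.
struct fake_connection {
    int id = 0;
};

// Returns true if the connection was destroyed once its main owner let go.
// If keep_stray_owner is true, a queued callback that never runs holds an
// extra owning copy, so the object stays alive.
bool destroyed_after_release(bool keep_stray_owner) {
    std::vector<std::function<void()>> pending;  // callbacks that never fire
    auto con = std::make_shared<fake_connection>();
    std::weak_ptr<void> hdl = con;               // non-owning handle

    if (keep_stray_owner) {
        // The lambda captures an owning shared_ptr copy by value.
        pending.push_back([con] { (void)con->id; });
    }

    con.reset();           // the "main" owner releases the connection
    return hdl.expired();  // true only if no other owner remains
}
```

In a test harness, keeping such handles for connections that failed during the handshake and checking `expired()` afterwards is one way to confirm that failed connections are released.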

@zaphoyd zaphoyd added 0.3.x and removed Bug labels Mar 3, 2014
@zaphoyd zaphoyd closed this as completed Jan 31, 2016