Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Need an explicit way to signal when it is safe to destroy an endpoint #501

Open
Thalhammer opened this issue Jan 2, 2016 · 4 comments
Open

Comments

@Thalhammer
Copy link

I have a application that allows creating and removing new servers at runtime.
To do this I have a wrapper class that contains a instance of

websocketpp::server<websocketpp::config::asio>

In the constructor I initialize the server instance using this code:

_server.init_asio(&service);
auto os = Logger::GetOutputStream();
_log_out = os;
_server.get_alog().set_ostream(_log_out.get());
_server.get_elog().set_ostream(_log_out.get());

_server.set_message_handler(boost::bind(&ws_on_message, this, ::_1, ::_2));
_server.set_close_handler(boost::bind(&ws_on_close, this, ::_1));
_server.set_open_handler(boost::bind(&ws_on_open, this, ::_1));
_server.set_http_handler(boost::bind(&ws_on_http, this, ::_1));

...
_server.set_reuse_addr(reuse_addr);
_server.listen(config_port);
_server.start_accept();

Inside my destructor I stop listening and iterate over all connections calling close on them.
Afterwards I call stop on the server

_server.stop_listening();
{
    std::unique_lock<std::mutex> lck(_connections_mutex);
    for (auto& c : _connections)
    {
        _server.close(c.first, websocketpp::close::status::going_away, "Server shutdown");
    }
}
_server.stop();

However I have a problem.
Some time later when the server tries to write anything to the log it crashes while locking the mutex in logger/basic.hpp

void write(level channel, std::string const & msg) {
        scoped_lock_type lock(m_lock);            // <= here
        if (!this->dynamic_test(channel)) { return; }
        *m_out << "[" << timestamp << "] "
                  << "[" << names::channel_name(channel) << "] "
                  << msg << "\n";
        m_out->flush();
    }

I suppose that some operations are still running in the asio pool which try to access the already closed log file.
Is this the correct way to shutdown the server ?
How can I force the server to wait for all its asynchronous tasks to exit.
I can not shutdown the asio pool because it is shared by several tasks.
I'm not sure whether my code is wrong or not.

@zaphoyd
Copy link
Owner

zaphoyd commented Jan 31, 2016

Firstly, you'll want to get rid of the call to _server.stop(). That forcibly kills all Asio requests immediately (and may also kill everything else on your Asio pool) and does not allow the stop_listening and close calls to finish naturally.

To rephrase the issue then: In the case where you are using a shared Asio io_service pool where you can't simply wait until the asio pool is empty. When is it safe to let the server endpoint go out of scope?

I don't have a great answer immediately. A stopgap would be to simply wait at least as long as the close handshake timeout (default is 5 seconds, but you can configure that). Longer term, I will look at two changes:

  1. Pull the loggers out into a separate heap object managed by a smart pointer so that they stick around as long as needed rather than being deallocated when the server goes away.
  2. Have the endpoint method that blocks until endpoint shutdown tasks are complete.

@zaphoyd zaphoyd added the Bug label Jan 31, 2016
@zaphoyd zaphoyd changed the title Crash on server shutdown Need an explicit way to signal when it is safe to destroy an endpoint Jan 31, 2016
@jupp0r
Copy link
Contributor

jupp0r commented Mar 30, 2016

change number 1 is in pull #537

zaphoyd pushed a commit that referenced this issue Apr 6, 2016
adds related changelog entry and fix one overzealous find and replace
operation.
@zaphoyd
Copy link
Owner

zaphoyd commented Apr 6, 2016

@jupp0r's patch has been committed to develop. This should fix the crash when log entries are printed after an endpoint is deallocated. Any further testing from folks who were experiencing this problem would be appreciated.

@harrishancock
Copy link

Seems to work for me. I had an accept loop that would occasionally die for no discernible reason -- analyzing the problem led me to discover that my code was affected by this issue. I upgraded from websocketpp 0.7.0 to develop a month ago, and my accept loop doesn't die anymore.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

4 participants