Add shutdown_socket() function for follow up of #505 #506

Open
wants to merge 13 commits into
base: refactor-only-comm-type
from

Jakio815
Collaborator

@Jakio815 Jakio815 commented Dec 20, 2024

This PR follows up on #505.

New Integrated Function: shutdown_socket()

int shutdown_socket(int* socket, bool read_before_closing);

Arguments

  • socket: A pointer to the socket descriptor that needs to be shut down and closed.
  • read_before_closing: A boolean indicating whether the socket should read any remaining incoming data before closing:
    • true: Initiates a graceful shutdown by sending a FIN packet (SHUT_WR) and then waiting for EOF (a 0-length read) from the peer.
    • false: Immediately shuts down both reading and writing directions (SHUT_RDWR).

Return Values

  • 0: Indicates successful shutdown and closure of the socket.
  • -1: Indicates a failure during shutdown or closure, with errno set to describe the specific error.

Function Description
This function gracefully shuts down and closes a socket.

  • When read_before_closing is false:
    • The function calls shutdown(SHUT_RDWR) to immediately stop both sending and receiving data. It then calls close().
  • When read_before_closing is true:
    • The function calls shutdown(SHUT_WR) to signal the end of writing but allows reading to continue.
    • It waits for the peer to send an EOF (indicated by read() returning 0), and discards all received bytes.
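The behavior described above can be sketched roughly as follows. This is a minimal illustration written from the description in this PR, not the code that was actually committed. The signature comes from the PR; the body is an assumption, including the 64-byte scratch buffer, the retry on EINTR, the choice to call close() even when shutdown() fails, and resetting the descriptor to -1 afterwards.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch only: shuts down and closes *socket, returning 0 on success or -1 with errno set. */
int shutdown_socket(int* socket, bool read_before_closing) {
  if (socket == NULL || *socket < 0) {
    errno = EBADF;
    return -1;
  }
  int result = 0;
  int saved_errno = 0;

  if (!read_before_closing) {
    /* Stop both directions at once. */
    if (shutdown(*socket, SHUT_RDWR) != 0) {
      result = -1;
      saved_errno = errno;
    }
  } else if (shutdown(*socket, SHUT_WR) != 0) {
    /* Could not signal end of writing. */
    result = -1;
    saved_errno = errno;
  } else {
    /* Writing is finished; discard incoming bytes until the peer sends EOF. */
    unsigned char buffer[64]; /* Assumed scratch size. */
    ssize_t bytes_read;
    while ((bytes_read = read(*socket, buffer, sizeof(buffer))) != 0) {
      if (bytes_read < 0 && errno != EINTR) {
        result = -1;
        saved_errno = errno;
        break;
      }
    }
  }

  /* Always release the descriptor, but report the first error that occurred. */
  if (close(*socket) != 0 && result == 0) {
    result = -1;
    saved_errno = errno;
  }
  *socket = -1; /* Assumption: mark the descriptor as no longer usable. */
  if (result != 0) {
    errno = saved_errno;
  }
  return result;
}
```

With this shape, a call site that must not wait for the peer would pass false, and a call site that expects the peer to finish sending would pass true.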

Refactoring on close_inbound_socket() and close_outbound_socket() from federate.c

close_inbound_socket()

The original code appeared to be outdated: no call site ever passed a flag value other than 1. I removed the flag argument and all of the unused code.

close_outbound_socket()

The original flag was only ever set by this one line of code:
int flag = _lf_normal_termination ? 1 : -1;
The flag therefore depends only on _lf_normal_termination. Since this function can read _lf_normal_termination directly, I removed the flag and used _lf_normal_termination instead.

@@ -155,7 +155,7 @@ static int create_server(uint16_t port, int* final_socket, uint16_t* final_port,
return -1;
}
set_socket_timeout_option(socket_descriptor, &timeout_time);
-int used_port = set_socket_bind_option(socket_descriptor, port, increment_port_on_retry);
+uint16_t used_port = set_socket_bind_option(socket_descriptor, port, increment_port_on_retry);
Collaborator Author

Minor fix to the port's data type.

@@ -1943,9 +1918,11 @@ void lf_connect_to_rti(const char* hostname, int port) {

void lf_create_server(int specified_port) {
assert(specified_port <= UINT16_MAX && specified_port >= 0);
-if (create_TCP_server(specified_port, &_fed.server_socket, (uint16_t*)&_fed.server_port, false)) {
+uint16_t port;
+if (create_TCP_server(specified_port, &_fed.server_socket, &port, false)) {
Collaborator Author

Minor fix to the port's data type.

@Jakio815 Jakio815 marked this pull request as ready for review December 21, 2024 01:18
@Jakio815 Jakio815 changed the title from "Draft: Add shutdown_socket() function." to "Add shutdown_socket() function." on Dec 21, 2024
@Jakio815 Jakio815 changed the title from "Add shutdown_socket() function." to "Add shutdown_socket() function for follow up of #505" on Dec 21, 2024
Member

@hokeun hokeun left a comment

Very nice refactoring. Just left minor suggestions.

@@ -1030,9 +1009,7 @@ void send_reject(int* socket_id, unsigned char error_code) {
lf_print_warning("RTI failed to write MSG_TYPE_REJECT message on the socket.");
}
// Close the socket.
Member

Minor: how about adding more information for this call? For example, "Close the socket without reading until EOF."

Collaborator Author

Thank you, I fixed it as suggested.

@@ -1418,9 +1395,7 @@ void lf_connect_to_federates(int socket_descriptor) {
if (!authenticate_federate(&socket_id)) {
lf_print_warning("RTI failed to authenticate the incoming federate.");
// Close the socket.
Member

Ditto.

Collaborator Author

Thank you, I fixed it as suggested.

@@ -1488,8 +1463,7 @@ void* respond_to_erroneous_connections(void* nothing) {
lf_print_warning("RTI failed to write FEDERATION_ID_DOES_NOT_MATCH to erroneous incoming connection.");
}
// Close the socket.
Member

Here as well.

Collaborator Author

Thank you, I fixed it as suggested.

@Jakio815
Collaborator Author

Jakio815 commented Dec 22, 2024

The macOS test is not passing for some unknown reason.

In the CI log, the tests appear to stop working starting from Running: src/federated/DistributedCountPhysical.lf (28%). However, when the federated tests are run on a local machine, they actually stop working starting from Dataflow.lf.

The tests fail because the program blocks in the final shutdown phase.

First, this is how the LF code is intended to work.

  1. The federate sends a RESIGN signal.
  2. The RTI calls shutdown(socket, SHUT_WR), and then immediately calls read()
    • The shutdown() sends a FIN packet to the federate.
  3. The federate is blocked on read(), listening for messages from the RTI.
    • The federate receives the FIN packet, finishes reading any data left in the socket buffer, and then its read() returns EOF.
  4. The federate will also send a FIN packet and the RTI's read() will return 0.
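The four steps above can be sketched in a single process as follows. This is only an illustration of the intended half-close sequence, under several assumptions: it uses an AF_UNIX socketpair() instead of the real TCP connection, the function name simulate_resign_handshake() is hypothetical, and a single placeholder byte stands in for the actual RESIGN message. Because it does not use TCP, it is not expected to reproduce the macOS behavior discussed in this comment.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if both ends observe EOF in the expected order, -1 otherwise. */
int simulate_resign_handshake(void) {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
    return -1;
  }
  int rti = sv[0];
  int federate = sv[1];
  unsigned char byte;
  unsigned char resign = 0x01; /* Placeholder, not the real message encoding. */
  int ok = 0;

  /* Step 1: the federate sends its resign signal and the RTI reads it. */
  if (write(federate, &resign, 1) != 1) ok = -1;
  if (ok == 0 && read(rti, &byte, 1) != 1) ok = -1;

  /* Step 2: the RTI stops writing; the peer will see end-of-stream. */
  if (ok == 0 && shutdown(rti, SHUT_WR) != 0) ok = -1;

  /* Step 3: the federate's read() returns 0 (EOF) once buffered data is drained. */
  if (ok == 0 && read(federate, &byte, 1) != 0) ok = -1;

  /* Step 4: the federate stops writing as well, so the RTI's read() returns 0. */
  if (ok == 0 && shutdown(federate, SHUT_WR) != 0) ok = -1;
  if (ok == 0 && read(rti, &byte, 1) != 0) ok = -1;

  close(rti);
  close(federate);
  return ok;
}
```

In the real system the two ends run in different processes and the federate's read() in step 3 is already blocking when the FIN arrives, which is the situation where the hang is observed.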

This sequence normally works on Linux, so all tests pass there. On macOS, however, it fails.
The failure occurs exactly at step 3: the federate has received the FIN packet, so its read() should return EOF, but it does not and instead blocks forever.

I also checked that the FIN packet was properly sent, and the ACK from the federate to the RTI was also sent.

// netstat -ant | grep 15045
tcp4       0      0  127.0.0.1.15045        127.0.0.1.64842        FIN_WAIT_2
tcp4       0      0  127.0.0.1.64842        127.0.0.1.15045        CLOSE_WAIT

The RTI (15045) is in the FIN_WAIT_2 state, which means that it has sent its FIN and received the ACK.
The federate (64842) is in CLOSE_WAIT, which means that the remote side (the RTI) has closed its end of the connection and the federate has yet to close its own.

This error can be reproduced on a macOS machine using the lingua-franca repo on branch refactor-only-comm-type together with reactor-c on branch shutdown.

@hokeun @edwardalee Could anyone help investigate this problem?
