One of our customers found a lot of errors contained this text:
Error -27776: [GENERAL_MSG_CAT_SSL_ERROR] connect to host "<hostname>" failed: [10054] Connection reset by peer
Customer asked, what does it mean?
We answered this is performance problem on his server because connection was dropped on his side.
Customer sad he checked logs and found nothing and asked us to check our side.
Well, this is an interesting challenge - to check carefully and, after investigation, fix the problem on our side (if we find anything) or prove to the customer that the problem is on his side.
Stage 1: preparing
To start an investigation we need to collect data.
Let's start with the script RunTimeSettings:
- There is problem with SSL, so we need to have not only an extended logs, but also print SSL information:
RunTimeSettings => Preferences => General => Check on checkbox "Print SSL information"
I have also enabled logging.
Pay attention: I chose not "Always", but "Log when error occurs..." and set the cache settings to 10 Kb - this means that in case of an error, the last 10 Kb of log events will be added to the log, even if these events do not contain errors.
I also disabled the "Data returned by server" checkbox, because this option will add a bunch of extra information that I don't need - because I already know that problem is at the connection stage, and this means that the server didn't send anything.
Next, I added the following parameters for this test:
-
Delayed start - because we need to change the Load Generator settings before the VUsers start working.
-
Collecting VUser logs from cloud-based load generator (if the customer uses OPLG, you need to collect Vuser logs from on-premise load generator)
Now we can start the test.
Wait for such a window (don't click anything!)
Before resuming the test, we need to connect to the Load Generator and install Wireshark (default settings are OK).
[https://www.wireshark.org/download.html]{.underline}
After installation and running Wireshark, we can capture the traffic, but it would be nice to collect only the traffic that we need. But to filtering traffic we need the IP-address of problematic server.
Check the logs (from previous test) to find host name:
534,2023-08-24T08:18:58:083 (-05:00),azure-us-east,no_emulation,Magnet_S01_Navigations_Debug (1),-27776,"vuser_init.c(115): Error -27776: [GENERAL_MSG_CAT_SSL_ERROR]connect to host ""[s1magnet.server.com]{.mark}"" failed: [10054] Connection reset by peer
Ok, we got this.
To find the IP-address, we will use following command in CMD:
A little note: Why did I use "nslookup" instead of "ping" ?
Because a server can have more than one IP-address and we need all of them.
Ping shows only one thing, but nslookup shows everything.
To filter traffic we need to do the following (the order of actions is important):
- Select the network adapter (you can select all or only one that is used to transmit traffic - as usual, it is easy to find it by wavy line of the graph).
- In the filter field, we have to write a filter for the host
If we have multiple IP-addresses, we can write them all using &&
- The capture starts by pressing the blue button with image of a fin.
Now we can Resume the test:
** **
Stage 2: Collecting information
After we have seen the errors we need in the LRC dashboard, we need to check if we have any information in Wireshark window.
If that's the case, well, we have the data that we need.
But if we stop the test right now, LG will be deleted along with the Wireshark dump.
So, first we need to prevent this.
In AWS it's a "termination lock", in Azure it is a "lock for delete" and I don't know about GCP.
In any case, please don't forget to remove this lock at the end of investigation.
Now we can stop the test and after that stop the Wireshark capturing.
Besides we have to export and download Logs from LRC:
Stage 3: Examination
We need 2 things:
-
Vuser logs
-
Wireshark dump
First, let's look for an error in the logs (note that this is the Vuser logs, not "Errors in CSV").
If you have a lot of log files, like me, it's convenient to use the Notepad++ "Find in files" option to search through all files at the same time:
-
In the "Find" window select the "Find in Files" tab.
-
In the field "Find what" enter the text that you want to find.
-
In the "Directory" field, paste the path to the folder containing the a files (subfolders are supported)
And after the completing the search, double-click on the result line to go to a specific log line in a specific log file.
Now we need to find the SessionID in the lines before the error (highlighted in yellow):
+-----------------------------------------------------------------------+ | ... | | | | vuser_init.c(551): t=887446ms: Connecting [4] to host | | 52.188.181.149:443 [MsgId: MMSG-26000] | | | | vuser_init.c(551): t=887519ms: Connected socket [4] from | | 10.0.0.12:16286 to 52.188.181.149:443 in 73 ms [MsgId: | | MMSG-26000] | | | | vuser_init.c(551): [SSL:] New SSL, socket=06BDD9D8 [4], | | connection=s1magnet.server.com:443, SSL=114D1FA0, ctx=0A399738, not | | reused, no session [MsgId: MMSG-26000] | | | | vuser_init.c(551): t=887519ms: Trying to set SNI with servername | | s1magnet.server.com [MsgId: MMSG-26000] | | | | vuser_init.c(551): t=887519ms: Setting SNI was succesful | | [MsgId: MMSG-26000] | | | | vuser_init.c(551): [SSL:] Received callback about handshake | | completion, connection=s1magnet.server.com:443, SSL=114D1FA0, | | ctx=0A399738, not reused, session address=11E37260, with no ID | | [MsgId: MMSG-26000] | | | | vuser_init.c(551): [SSL:] Received callback about handshake | | completion, connection=s1magnet.server.com:443, SSL=114D1FA0, | | ctx=0A399738, not reused, session address=11E37260, with no ID | | [MsgId: MMSG-26000] | | | | vuser_init.c(551): [SSL:] Received callback about handshake | | completion, connection=s1magnet.server.com:443, SSL=114D1FA0, | | ctx=0A399738, not reused, session address=11E37260, ID (length 32): | | 264C4906BBD0FD2B8E949C46A199243359AE99F811BC66D5B943A7FB72297443 | | [MsgId: MMSG-26000] | | | | vuser_init.c(551): [SSL:] X509 certificate error 19: self-signed | | certificate in certificate chain | | | | > for certificate "/C=US/O=DigiCert | | > Inc/OU=www.digicert.com/CN=DigiCert Global Root CA" | | > | | > depth=2, x509_ctx=0A1B0280, x509_ctx->ctx=, preverify_ok=0, | | > current_cert=11A3C410, subject_name=0A4B35C8 [MsgId: | | > MMSG-26000] | | | | vuser_init.c(551): [SSL:] Received callback about handshake | | completion, connection=s1magnet.server.com:443, SSL=114D1FA0, | | ctx=0A399738, not reused, session address=11E37260, ID (length 32): | | 264C4906BBD0FD2B8E949C46A199243359AE99F811BC66D5B943A7FB72297443 | | [MsgId: MMSG-26000] | | | | vuser_init.c(551): [SSL:] Received callback about handshake | | completion, connection=s1magnet.server.com:443, SSL=114D1FA0, | | ctx=0A399738, not reused, session address=11E37260, ID (length 32): | | 264C4906BBD0FD2B8E949C46A199243359AE99F811BC66D5B943A7FB72297443 | | [MsgId: MMSG-26000] | | | | vuser_init.c(551): [SSL:] Received callback about handshake | | completion, connection=s1magnet.server.com:443, SSL=114D1FA0, | | ctx=0A399738, not reused, session address=11E37260, ID (length 32): | | 264C4906BBD0FD2B8E949C46A199243359AE99F811BC66D5B943A7FB72297443 | | [MsgId: MMSG-26000] | | | | vuser_init.c(551): [SSL:] Received callback about handshake | | completion, connection=s1magnet.server.com:443, SSL=114D1FA0, | | ctx=0A399738, not reused, session address=11E37260, ID (length 32): | | [264 | | C4906BBD0FD2B8E949C46A199243359AE99F811BC66D5B943A7FB72297443]{.mark} | | [MsgId: MMSG-26000] | | | | vuser_init.c(551): ssl_handle_status encounter error : | | SSL_ERROR_SYSCALL, error id : 0 [MsgId: MMSG-26000] | | | | vuser_init.c(551): Error -27776: | | [[GENERAL_MSG_CAT_SSL_ERROR]{.mark}]connect to host | | "s1magnet.server.com" failed: [10054] Connection reset by peer | | | | This is probably because the SSL protocol version is not supported. | | Try switching the TLS/SSL version using "web_set_sockets_option" | | API. [MsgId: MERR-27776] [Time:2023-08-24 13:19:29] | | | | vuser_init.c(551): t=897888ms: Closed connection [4] to | | s1magnet.server.com:443 after completing 0 request(s) | | [MsgId: MMSG-26000] | | | | ... | +=======================================================================+ +-----------------------------------------------------------------------+
Now let's find it in Wireshark dump:
We can do this by simply doing a search (Ctrl+F, Paste SessionID, Find)
But it would be a VERY long time.
It is better to use right filter:
tls.handshake.session_id == 26:4C:49:06:BB:D0:FD:2B:8E:94:9C:46:A1:99:24:33:59:AE:99:F8:11:BC:66:D5:B9:43:A7:FB:72:29:74:43
+---+---------------------------------------------------------------------+
| | A short description: |
| | |
| | In this case, it doesn't matter, but it is useful for working with |
| | Wireshark: |
| | |
| | You can create new filters just from the interface (so I realized |
| | that SessionID should be written with a colon after each two |
| | characters): |
| | |
| | |
| | |
| | - Right-click on the field you want to use as a filter |
| | |
| | - Select "Prepare as Filter" |
| | |
| | - Select sub-option that you prefer |
| | |
| | A new filter will be created automatically |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
+===+=====================================================================+
+---+---------------------------------------------------------------------+
So, we found one packet related to our problem. Now we need to show all the packets related to this stream.
-
Right-click to the found line
-
Select "Follow"
-
Select "TLS Stream"
As a result, we have such a window:
First I want to tell you how it should be (below you can see an example of a successful TLS handshake between the load generator and server, which we recorded during the test run):
10.0.0.12 is the IP address of the load generator
52.188.181.149 is the IP of your server
the 3 lines in the end are the most interesting (highlighted in green):
In line #16 we sent information about the key exchanges and the selected cipher specifications
In line #17 your server sent a acknowledge for message in line 16
In line #18 your server sent the last unencrypted message "Change Cipher Spec, Encrypted Handshake Message", which means that the SSL Handshake is completed, and from the next message we can start sending the data on Application layer (http requests)
So, what happened when we encountered the error "Error -27776: [GENERAL_MSG_CAT_SSL_ERROR]connect to host "s1magnet.server.com" failed: [10054] Connection reset by peer":
What does this mean?
We should especially check the last 3 lines:
-
In line #12 we sent information about key exchanges and selected cipher specifications
-
In line #13 your server sent a acknowledge for message in line 12
So, instead of sending "Change Cipher Spec, Encrypted Handshake Message", your server waited 6 seconds.
- And after that, he sent a packet with the [RST, ACK] flag (line #14) which means that this connection should be terminated.
I suppose with that customer can go to network support team to search the problem.
PS: You can get more details about Handshake process from this article:
[https://www.moserware.com/2009/06/first-few-milliseconds-of-https.html]{.underline}