Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

ADS -1 Error #20

Closed
Satheesh-satz opened this issue Apr 25, 2016 · 8 comments
Closed

ADS -1 Error #20

Satheesh-satz opened this issue Apr 25, 2016 · 8 comments

Comments

@Satheesh-satz
Copy link

Satheesh-satz commented Apr 25, 2016

Hi Sven,

Sometimes i am getting an error -1 from the ADS library and i have no idea how to interpret the issue with the error code -1.

I am seeing this particular warning message before getting the error -1 sometimes in a loop when I tried to call the ads function “AdsSyncReadDeviceInfoReqEx” with 2 seconds of delay

Message:
“Warning: Port: 30000 already in use as xxxxx" and that comes from the function “AmsConnection::Reserve” which is called from “AmsConnection::Write”

AmsResponse* AmsConnection::Reserve(uint32_t id, uint16_t port)
{
uint32_t isFree = 0;
if (!queue[port - Router::PORT_BASE].invokeId.compare_exchange_weak(isFree, id)) {
LOG_WARN("Port: " << port << " already in use as " << isFree);
return nullptr;
}
return &queue[port - Router::PORT_BASE];
}

Thanks,
Satheesh

@pbruenn
Copy link
Member

pbruenn commented Jan 26, 2017

The -1 is emitted at line 104@AmsConnection.h, because AmsConnection::Reserve() detected there is still a request pending on your port.
So I think what happens is:
You send a first request asynchronously to your port. A response is still pending and you try to send a second request, which then fails, because Reserve() is detecting the race.
You could:

  1. ignore -1 and try again
  2. sync your calls
  3. use a pool of ports and manage access to them

@Satheesh-satz
Copy link
Author

Thanks for the reply,

I am did try the option 3. "use a pool of ports and manage access to them"

And tested the following test case with my code

  1. Connected to the PLC successfully
  2. In a loop continuously reading data using the two port
    i. One to read the PLC version info
    ii. Read the PLC data from multiple PLC index_offset
  3. Set the PLC to config mode
  4. This will lead to -1 error thrown from the ADS library
  5. Disconnect from the PLC when we get the -1 errror
  6. Set the PLC to run mode again
  7. Try to connect and read the data again (step 2)

But this is not recovering back and it is (code) failing all the time with the -1 error.

Could you please let me know the right procedure to the recover.

Thanks in advance.

@pbruenn
Copy link
Member

pbruenn commented Jan 31, 2017

I can't follow you. I don't think the -1 is related to config mode. I'm strongly convinced it is related to multiple threads accessing the same AmsPort.
To force -1 I would do something like:

auto first = std::async(std::launch::async, AdsSyncReadReqEx2, port, &server, 0x4020, 0, sizeof(buffer1), &buffer1, &bytesRead1);
auto second = std::async(std::launch::async, AdsSyncReadReqEx2, port, &server, 0x4020, 0, sizeof(buffer2), &buffer2, &bytesRead2);

There is a very high probability that second will return a -1. Unless first does win the race and returns before second has reached AmsConnection::Reserve. (or vice versa)
Recovering from -1 is easy. Either the timeout kills the pending request or the corresponding response arrives.

I tried to simulate your config mode issue with:

for (;;) {
           readExample(out, port, remote);
           std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}

But that looks just fine here:

// run mode
readExample():
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
// config mode
readExample():
ADS read failed with: 6
readExample():
ADS read failed with: 6
readExample():
ADS read failed with: 6
readExample():
ADS read failed with: 6
readExample():
ADS read failed with: 1795
// run mode, again
readExample():
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0
ADS read 4 bytes, value: 0x0

@dabrowne
Copy link

dabrowne commented Oct 3, 2017

Hi Patrick,

Thanks for your work on this excellent library. I'm a contributor to the pyads package which provides a Python wrapper around TcAdsDll.dll on Windows, and this library for Linux/Mac support.

We recently had an issue raised (stlehmann/pyads#35) with the same failure mode seen here, which I believe is caused by the use of compare_exchange_weak which you can read my comment on in the linked issue. I have solved the issue by replacing this call with compare_exchange_strong. I thought you might be interested in implementing this change upstream as it's a very simple, low risk solution.

Kind Regards,
David Browne

@pbruenn
Copy link
Member

pbruenn commented Oct 4, 2017

Hi David,

Thanks for your report, that's definitely an interesting reading and everything makes perfect sense.

I am just curious to reproduce this issue. Today I tried AdsSyncReadReqEx2() in an endless loop. But the highest rate I could achieve on one thread was 2500 requests per second and I never saw the issue during today.
David do you or @lexe and @Satheesh-satz have any other hint on how I might be able to reproduce this? On what kind of system(OS, CPU, network) do you run the AdsLib? What rate do you achieve?

I just want to have some test added to the test suites to have a nice before/after effect when merging the compare_exchange_weak/strong() patch.

Best regards,
Patrick

@dabrowne
Copy link

dabrowne commented Oct 5, 2017

Personally I haven't reproduced. I initially misinterpreted the screenshot by @lexe, the 16446 samples are actually happening over a 5 minute period so the sample rate will be lower than what you're achieving. He's running on a Raspberry Pi which will be ARM, so perhaps there's something architecture-specific going on?

@pbruenn
Copy link
Member

pbruenn commented Oct 5, 2017

That's it! ARM!
Today, I run the test on CX9020 (ARM based Embedded PC) and guess what? Immediately I got the issue confirmed. Replaced weak with strong and everything was fine.
I will prepare a patch for tomorrow.
Thank's a lot for hunting this down!

@dabrowne
Copy link

dabrowne commented Oct 6, 2017

Wow, that's a tricky one. Glad to be able to help.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants