Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

StartService RPC not ACKed on IVI boot #1783

Closed
ghost opened this issue Sep 18, 2020 · 7 comments · Fixed by #1951
Closed

StartService RPC not ACKed on IVI boot #1783

ghost opened this issue Sep 18, 2020 · 7 comments · Fixed by #1951
Assignees
Labels
bug A defect in the library protocol Relating to the protocol layer

Comments

@ghost
Copy link

ghost commented Sep 18, 2020

Bug Report

With sdl_ios 6.5.0 apps could properly start an RPC service and register if the phone is connected via USB before the IVI system turns on. With 6.7.0 it seems that this is not the case anymore.

I investigated code changes and saw a new class SDLLifecycleProtocolHandler which seems to be the successor of the SDLProxy. My guess is a race condition in SYNC's old Core version couldn't catch the bytes of the StartService frame and probably the SDLProxy class was not as fast in sending the frame.

The issue only appears on MY19 and earlier but not MY20 or later and only for apps which process name results in low delay calculation in the iAP transport class. It was reproducible with the name "Sygic" (~ 2 seconds delay) but not with "Waze" (~9 seconds delay).

Reproduction Steps
  1. Turn off test bench
  2. Connect iPhone via USB
  3. Start SDL Example
  4. Select iAP tab and tap Connect
  5. Turn on test bench
  6. After boot tap on Apps
Expected Behavior
  • The app tile should be visible on the bench
Observed Behavior
  • No app tile is visible
  • SDL Logs show "Start session timed out after 10.000000 seconds, closing the connection."
OS & Version Information
  • iOS Version: iOS 14
  • SDL iOS Version: 6.7.0
  • Testing Against: SDL Example
  • Bench: Ford TDK with SYNC3 MY19 (3.3.19052)
Test Case, Sample Code, and / or Example App

You will need to mod the sdl_retryDelay method to fake other process names.

@joeljfischer joeljfischer added bug A defect in the library protocol Relating to the protocol layer labels Sep 18, 2020
@theresalech
Copy link
Contributor

Hi @kshala-ford, thank you for reporting this issue. We have already completed development and started testing for the iOS 7.0 release, but will look to include a fix for this issue in the following release. If you are able to submit a PR to resolve this issue, please let us know once that is ready for review. Thanks!

@ghost
Copy link
Author

ghost commented Sep 24, 2020

@theresalech I have not been able to identify the root cause of the issue, whether the system isn't responding or the response got stuck in the library. I could provide a PR to retry starting the session, similar to how the lib retries connecting to the the transport.

@ghost
Copy link
Author

ghost commented Nov 3, 2020

First of all this issue is also visible with 7.0.0 RC3. This is not a surprise as the behavior hasn't changed. The release of 7.0.0 should not be blocked because this issue is no regression in 7.0.0 but goes back to 6.7.0. A possible solution however is to simply increase the timeout StartSessionTime in the lifecycle protocol handler to a higher value. The suggestion is to increase it to 20 seconds as tests showed that the head unit responds after 14 seconds. Older head units may have a similar issue running on weaker hardware. Therefore 20 seconds allows these apps to connect properly. The risk of increasing the timeout is very little. It's still within the iOS background time of 30 seconds and at the moment the app would just cancel the connect procedure on the timeout.

Suggested change in SDLLifecycleProtocolHandler.m

24  #import "SDLRPCMessage.h"
25  #import "SDLRPCMessageType.h"
26  #import "SDLTimer.h"
27
28 -static const float StartSessionTime = 10.0;
28 +static const float StartSessionTime = 20.0;
29
30  NS_ASSUME_NONNULL_BEGIN
31
32  @interface SDLLifecycleProtocolHandler ()

Testing with retry mechanism

I tested several options and before I start with the important findings I want to note that a retry mechanism to resend StartService(RPC) in the lifecycle protocol handler causes a lot of problems. The head unit queues all the StartService requests and eventually processes all basically at once. This causes the head unit to enter a state returning APPLICATION_NOT_REGISTERED or GENERIC_ERROR to any RPC even to ListFiles and SetAppIcon. The app registers but you won't see an app icon. Therefore retrying StartService(RPC) is not an option.

Testing with increased timeout

After several tests I found that on boot, the iAP accessory driver boots faster than SDL Core hence the library sends the bytes for StartService(RPC) but there's nothing waiting to receive these bytes. They remain in the head unit buffer until SDL Core starts.

I tested with a single app with transport based retryDelay disabled to eliminate race conditions by this delay.

- (void)sdl_accessoryConnected:(NSNotification *)notification {
    EAAccessory *newAccessory = notification.userInfo[EAAccessoryKey];
...
    [self sdl_connect:newAccessory];
    //[self performSelector:@selector(sdl_connect:) withObject:nil afterDelay:retryDelay];
}

Also I increased the timeout in the lifecycle protocol handler to 30 seconds

//static const float StartSessionTime = 10.0;
static const float StartSessionTime = 30.0;

I also added SDLLog calls here and there in the transport classes to help finding the root cause. I performed the repro steps and turned on the test bench (step 5) at ~ 13:02:45 timestamp. Below filtered logs show the important pieces of what's happening:

2020-11-03 13:02:50.259255+0100 SDL Example[845:347518] CoreAccessories accessoryInfo for arrival = {
2020-11-03 13:02:50.270443+0100 SDL Example[845:347518] eaAccessory CoreAccessoryPrimaryUUID = 70AA4FE3-D905-494E-9FD6-33165FECFC2C
2020-11-03 13:02:50.270557+0100 SDL Example[845:347518] On: 1426 Accessory Connected with UUID 70AA4FE3-D905-494E-9FD6-33165FECFC2C . RegisteredForLocalNotifications Count: 2
2020-11-03 13:02:50.270707+0100 SDL Example[845:347896] On: 1460 Accessory Connected. InternalNotPresentIniAPAccArray: 0
2020-11-03 13:02:50.270862+0100 SDL Example[845:347518] On: 1485 Posting EA DidConnect Notification for Accessory with UUID: 70AA4FE3-D905-494E-9FD6-33165FECFC2C
2020-11-03 13:02:50.271558+0100 SDL Example[845:347518] [Transport/IAP] Accessory Connected (<EAAccessory: 0x281c7c400> { 
2020-11-03 13:02:50.273157+0100 SDL Example[845:347518] [Transport/IAP] No data session in progress. Starting setup.
2020-11-03 13:02:50.273302+0100 SDL Example[845:347518] [Transport/IAP] Attempting to connect accessory named: SYNC 3, with connectionID: 37437685
2020-11-03 13:02:50.273968+0100 SDL Example[845:347518] [Transport/IAP] iAP data session init for accessory: <EAAccessory: 0x281c7c400> { 
2020-11-03 13:02:50.274525+0100 SDL Example[845:347518] [Transport/IAP] SDLIAPSession init with accessory:SYNC 3 for protocol:com.smartdevicelink.multisession
2020-11-03 13:02:50.274679+0100 SDL Example[845:347518] [Transport/IAP] Starting data session with accessory: SYNC 3, using protocol: com.smartdevicelink.multisession
2020-11-03 13:02:50.274801+0100 SDL Example[845:347518] [Transport/IAP] Opening EASession with accessory: SYNC 3
2020-11-03 13:02:50.277027+0100 SDL Example[845:347518] accessoryd sessionResult = 1, useSocketInterace = 1, sessionID = 0x47
2020-11-03 13:02:50.277153+0100 SDL Example[845:347518] socketPath from app = /var/mobile/Library/ExternalAccessory/ea.6778761824267866847
2020-11-03 13:02:50.278818+0100 SDL Example[845:348096] [Transport/IAP] Starting the accessory event loop on thread: com.smartdevicelink.iostream
2020-11-03 13:02:50.279026+0100 SDL Example[845:348096] [Transport/IAP] <EAInputStream: 0x282265200> -[SDLIAPDataSession sdl_streamDidOpen:]
2020-11-03 13:02:50.279147+0100 SDL Example[845:348096] [Transport/IAP] Data session input stream opened
2020-11-03 13:02:50.279330+0100 SDL Example[845:348096] [Transport/IAP] <EAOutputStream: 0x2821782d0> -[SDLIAPDataSession sdl_streamDidOpen:]
2020-11-03 13:02:50.279464+0100 SDL Example[845:348096] [Transport/IAP] Data session output stream opened
2020-11-03 13:02:50.279596+0100 SDL Example[845:348096] [Transport/IAP] Data session I/O streams opened for protocol: com.smartdevicelink.multisession
2020-11-03 13:02:50.279789+0100 SDL Example[845:347518] [Protocol] Transport connected, opening protocol
2020-11-03 13:02:50.279802+0100 SDL Example[845:348096] [Transport/IAP] <EAOutputStream: 0x2821782d0> -[SDLIAPDataSession sdl_streamHasSpaceToWrite:]
2020-11-03 13:02:50.279925+0100 SDL Example[845:347518] [Lifecycle] Transport opened, sending an RPC Start Service, and starting timer for RPC Start Service ACK to be received.
2020-11-03 13:02:50.280163+0100 SDL Example[845:347518] [Protocol] Sending start service: Version:3, encrypted:0, frameType:Control(0), serviceType:7, frameData:StartSession(1), sessionID:0, dataSize:32, messageID:0  Payload: 32 bytes.
2020-11-03 13:02:50.280313+0100 SDL Example[845:347518] [Transport/IAP] Adding 44 bytes to queue
2020-11-03 13:02:50.280446+0100 SDL Example[845:347518] [Transport/IAP] Calling IO thread to dequeue the data
2020-11-03 13:02:50.280690+0100 SDL Example[845:348096] [Transport/IAP] Dequeuing and writing to output stream
2020-11-03 13:02:50.280844+0100 SDL Example[845:348096] [Transport/IAP] TX(44 bytes): 300701000000002000000000200000000270726F746F636F6C56657273696F6E0006000000352E322E300000
2020-11-03 13:02:50.280985+0100 SDL Example[845:348096] [Transport/IAP] Output stream returned 44
2020-11-03 13:02:50.281249+0100 SDL Example[845:348096] [Transport/IAP] Dequeuing and writing to output stream
2020-11-03 13:02:50.281376+0100 SDL Example[845:348096] [Transport/IAP] Attempted to dequeue data but data queue is empty
2020-11-03 13:02:50.281600+0100 SDL Example[845:348096] [Transport/IAP] <EAOutputStream: 0x2821782d0> -[SDLIAPDataSession sdl_streamHasSpaceToWrite:]
2020-11-03 13:02:50.281717+0100 SDL Example[845:348096] [Transport/IAP] Dequeuing and writing to output stream
2020-11-03 13:02:50.281839+0100 SDL Example[845:348096] [Transport/IAP] Attempted to dequeue data but data queue is empty
2020-11-03 13:03:03.991718+0100 SDL Example[845:348096] [Transport/IAP] <EAInputStream: 0x282265200> -[SDLIAPDataSession sdl_streamHasBytesAvailable:]
2020-11-03 13:03:03.992432+0100 SDL Example[845:348096] [Transport/IAP] RX(16 bytes): 30070201000000040000000000010001
2020-11-03 13:03:03.993304+0100 SDL Example[845:348096] [Protocol] Received start service ACK: Version:3, encrypted:0, frameType:Control(0), serviceType:7, frameData:StartSessionACK(2), sessionID:1, dataSize:4, messageID:0  Payload: 4 bytes.
2020-11-03 13:03:03.993954+0100 SDL Example[845:348096] [Protocol] Storing SessionID 1 of serviceType 7
2020-11-03 13:03:03.994671+0100 SDL Example[845:348096] [Transport/Secondary] Received Start Service ACK header of RPC service on primary (<SDLIAPTransport: 0x2810eca50>) transport
Unexpected object length 16777472, data is only 4 bytes

The StartService ACK is received almost 14 seconds later.

@ghost
Copy link
Author

ghost commented Nov 23, 2020

@joeljfischer After some testing with 20 "SDL Example" apps I found that on various SYNC3 systems, all apps connect reliably and no app is missing in the app list. On MY18 versions all apps connected as well but it can take up to 10 minutes for app icons to appear. However, at the moment only a small amount of apps would connect with lots of them hit the timeout of 10 seconds. At last app resumption / apps connecting while SYNC boots are affected as well and would properly connect without the timeout.

My recommendation is to remove the timeout altogether and wait indefinitely for the head unit response. This is not a problem for the library considering the external accessory has the SDL specific protocol strings. Also disconnecting is not a problem either as deallocation of the lower layer is not affected in any way.

Edit: The issue #1847 is not needed and can be closed If we remove the timeout.

@ghost
Copy link
Author

ghost commented Jan 11, 2021

Following up on if removing the timeout altogether is an acceptable approach. Confidence on if the head unit is SDL capable is given due to the protocol string. Code wise it's the easiest solution with regards to reconnect (see #1847). Removing the timeout also solves #1795 based on recent connection time tests.

@joeljfischer
Copy link
Contributor

I think it could be an acceptable solution, though we'd have to ensure that other OEMs have time to test a branch / PR.

@jordynmackool
Copy link
Contributor

Steering Committee Update from 2021-03-23: Ford has proposed changes to resolve this issue that they have tested and have seen market improvements on. Ford to work to submit a pull request for this issue that Livio can review following the upcoming April releases.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug A defect in the library protocol Relating to the protocol layer
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants