Skip to content
This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

segmentation fault in Java binding's Module_Start #275

Closed
yfakariya opened this issue May 21, 2017 · 5 comments
Closed

segmentation fault in Java binding's Module_Start #275

yfakariya opened this issue May 21, 2017 · 5 comments
Assignees
Labels

Comments

@yfakariya
Copy link
Contributor

yfakariya commented May 21, 2017

Hi team,
thank you for great work and quick response!

I saw segmentation fault (or access violation in Windows) from Gateway_CreateFromJson. The debugger said that the exception had been thrown from jvm.dll which had been called https://github.com/Azure/iot-edge/blob/master/bindings/java/src/java_module_host.c#

Environment

Additional information:

  • My code calls Broker#publish() and wait for "response message" for request-response style messaging for initialization in GatewayModule#start().
  • When I commented out Broker#publish() call from GatewayModule#start(), the segfault looked disappear.
  • As of our logging code in Java, GatewayModule#start() finished successfully.
  • Another module sent messages to the Java module continuously.
  • The debugger said that the invalid address is almost '0x270' and '0xFFFFFFFFFFFFFFFF` once.
    • I failed to check call stack when I saw '0xFFFFFFFFFFFFFFFF`.

Hypothesis

As far as I look, I found similar error in Xamarin Android, it is race condition of JNIEnv*.
So, my hypothesis of this issue is race condition of env field of MODULE_JAVA_DATA as following:

  1. The java host module initializes JAVA_MODULE_HANDLE_DATA as module data in this line.
  2. The module_worker created from the broker, passing module_info in this line.
  3. The JavaModuleHost_Start calls JNI's AttachCurrentThread and stores JNIEnv* in the field of module argument in this line. I will call this thread as "main-thread".
  4. The module_worker calles JavaModuleHost_Receive simultaneously, and it also calls JNI's AttachCurrentThread and then stores JNIEnv* in the field of module argument in this line. This step overwrites JNIEnv* for "main-thread". I will call this thread as "worker-thread"
  5. The GatewayModule#start() is finished, then ``JavaModuleHost_Startcalls JNI'sExceptionOccurred` with overwritten. It causes segmentation fault.
    • If the "worker-thread" already called DetachCurrentThread, it is reasonable the env field was invalidated with jvm as -1L.

So I think that JNIEnv* should be stored in the local variable instead of "effectively global" variable in the struct, ant it may solve this problem. In fact, it just works in our testing environment.

Any idea?

Thank you for always.

@ancaantochi
Copy link
Contributor

Thank you for the detailed issue description. I was able to reproduce it when Start takes longer to complete and the module receives messages from a different module. It is recommended that Start should not hang and begin long running tasks in a different thread.

Can you please confirm that it works if in Start you create a new thread which calls publish as in the Sensor sample?

@damonbarry damonbarry added the bug label May 23, 2017
@yfakariya
Copy link
Contributor Author

Thank you for reply. Unfortunately, we have to wait for response from another module to receive the module's initial state stored in cloud's device twin previously.

@ancaantochi
Copy link
Contributor

Just to make sure I understand you scenario: in Start you have to wait for another module response and based on the response you create a new thread that publishes messages?

You mentioned that it works in your testing environment with local JNIEnv*, can you please create a pull request with the changes?

@yfakariya
Copy link
Contributor Author

I don't explicitly create a new thread in Java code.

(I'm convinced that creating new thread is better because it takes away module starting order dependency)

I will put the PR tonight(tomorrow in PDT).

yfakariya added a commit to yfakariya/azure-iot-gateway-sdk that referenced this issue May 26, 2017
This commit moves JNIEnv* variable from JAVA_MODULE_HANDLE_DATA structure to local variable.
The variable is assigned by JVM in AttachCurrentThread in both of start and receive functions,
it causes race condition of the variable, then segmentation fault is occurred in ExceptionOccurred JNI function.
So, the JNIEnv* variable should be local instead of a part of JAVA_MODULE_HANDLE_DATA.
@ancaantochi
Copy link
Contributor

Thank you for your contribution!

# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants