segmentation fault in Java binding's Module_Start #275
Comments
Thank you for the detailed issue description. I was able to reproduce it when Start takes longer to complete and the module receives messages from a different module. The recommendation is that Start should not block, and that it should begin long-running tasks in a different thread. Can you please confirm that it works if, in Start, you create a new thread which calls publish, as in the Sensor sample?
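The recommendation to return from Start quickly and do the waiting elsewhere can be sketched as follows. This is a minimal illustration in C with POSIX threads, not code from the gateway SDK or the Sensor sample: `module_state`, `module_start`, `module_stop`, and `long_running_task` are hypothetical names, and the "publishing" is only a counter.

```c
#include <pthread.h>

/* Hypothetical module state; not part of the gateway SDK. */
typedef struct {
    pthread_t worker;
    int       started;   /* 1 once the worker thread has been created */
    int       published; /* simulated publish calls; read only after join */
} module_state;

/* Stand-in for waiting on another module and publishing messages. */
static void *long_running_task(void *arg)
{
    module_state *state = (module_state *)arg;
    for (int i = 0; i < 3; i++) {
        state->published++;
    }
    return NULL;
}

/* Start only spawns the worker and returns, so the caller is not blocked. */
static int module_start(module_state *state)
{
    state->published = 0;
    if (pthread_create(&state->worker, NULL, long_running_task, state) != 0) {
        return -1;
    }
    state->started = 1;
    return 0;
}

/* Stop waits for the worker to finish before the state is reused or freed. */
static void module_stop(module_state *state)
{
    if (state->started) {
        pthread_join(state->worker, NULL);
        state->started = 0;
    }
}
```

A caller would invoke `module_start` once, continue with its own work, and call `module_stop` before tearing the module down. The counter is only read after `pthread_join`, so no extra locking is needed in this sketch.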
Thank you for the reply. Unfortunately, we have to wait for a response from another module in order to receive the module's initial state, which was previously stored in the cloud's device twin.
Just to make sure I understand your scenario: in Start you have to wait for another module's response, and based on that response you create a new thread that publishes messages? You mentioned that it works in your testing environment with a local JNIEnv*. Can you please create a pull request with the changes?
I don't explicitly create a new thread in the Java code. (I agree that creating a new thread is better, because it removes the dependency on module start order.) I will submit the PR tonight (tomorrow in PDT).
This commit moves the JNIEnv* variable out of the JAVA_MODULE_HANDLE_DATA structure and into a local variable. The JVM assigns the variable in AttachCurrentThread in both the start and receive functions, which creates a race condition on it and leads to a segmentation fault in the ExceptionOccurred JNI function. The JNIEnv* variable should therefore be local instead of being part of JAVA_MODULE_HANDLE_DATA.
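The pattern this commit describes, keeping the per-thread handle in a local variable rather than in shared module data, can be sketched without a JVM. Everything here is a simulation under stated assumptions: `fake_env` stands in for `JNIEnv`, `fake_attach` stands in for `AttachCurrentThread` (in real JNI code the call has the form `(*jvm)->AttachCurrentThread(jvm, (void **)&env, NULL)`), and `module_data`, `call_args`, `module_call`, and `run_start_and_receive` are hypothetical names, not the actual code in java_module_host.c.

```c
#include <pthread.h>
#include <stddef.h>

/* Simulated per-thread environment; stands in for JNIEnv. */
typedef struct { int owner_id; } fake_env;

/* Shared module data: deliberately has no env field. */
typedef struct { const char *module_name; } module_data;

/* Arguments for one simulated Start or Receive call. */
typedef struct {
    module_data *module;
    int          thread_id;
    int          mismatches;  /* times the thread saw an env that was not its own */
    fake_env     env_storage; /* the env the simulated JVM hands to this thread */
} call_args;

/* Stand-in for AttachCurrentThread: returns an env owned by the caller. */
static void fake_attach(call_args *args, fake_env **env_out)
{
    args->env_storage.owner_id = args->thread_id;
    *env_out = &args->env_storage;
}

/* Simulated Start/Receive body: env is local, so the other thread cannot touch it. */
static void *module_call(void *p)
{
    call_args *args = (call_args *)p;
    fake_env *env = NULL;
    fake_attach(args, &env);
    for (int i = 0; i < 100000; i++) {
        if (env->owner_id != args->thread_id) {
            args->mismatches++;
        }
    }
    return NULL;
}

/* Runs a simulated Start and Receive concurrently; returns total mismatches. */
static int run_start_and_receive(void)
{
    module_data module = { "sample" };
    call_args start_args   = { &module, 1, 0, {0} };
    call_args receive_args = { &module, 2, 0, {0} };
    pthread_t start_thread, receive_thread;

    if (pthread_create(&start_thread, NULL, module_call, &start_args) != 0) {
        return -1;
    }
    if (pthread_create(&receive_thread, NULL, module_call, &receive_args) != 0) {
        pthread_join(start_thread, NULL);
        return -1;
    }
    pthread_join(start_thread, NULL);
    pthread_join(receive_thread, NULL);
    return start_args.mismatches + receive_args.mismatches;
}
```

Because each thread only ever reads its own local `env`, the two calls share nothing but the read-only `module_data`, and the mismatch count stays at zero regardless of how the threads interleave.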
Thank you for your contribution!
Hi team,
thank you for the great work and quick response!
I saw a segmentation fault (an access violation on Windows) from `Gateway_CreateFromJson`. The debugger said that the exception had been thrown from jvm.dll, which had been called from https://github.com/Azure/iot-edge/blob/master/bindings/java/src/java_module_host.c

Environment

[...] `runtime` folder to investigate.

Additional information:
- [...] `Broker#publish()` and wait for a "response message", as request-response style messaging for initialization in `GatewayModule#start()`.
- [...] `Broker#publish()` call from `GatewayModule#start()`, the segfault seemed to disappear.
- `GatewayModule#start()` finished successfully.

Hypothesis
Looking around, I found a similar error in Xamarin Android; it is a race condition on `JNIEnv*`.
So my hypothesis is that this issue is a race condition on the `env` field of `MODULE_JAVA_DATA`, as follows:

- [...] `JAVA_MODULE_HANDLE_DATA` as module data in this line.
- [...] `module_worker` created from the broker, passing `module_info` in this line.
- `JavaModuleHost_Start` calls JNI's `AttachCurrentThread` and stores the `JNIEnv*` in the field of the `module` argument in this line. I will call this thread "main-thread".
- `module_worker` calls `JavaModuleHost_Receive` simultaneously; it also calls JNI's `AttachCurrentThread` and then stores the `JNIEnv*` in the field of the `module` argument in this line. This step overwrites the `JNIEnv*` for "main-thread". I will call this thread "worker-thread".
- `GatewayModule#start()` finishes, then `JavaModuleHost_Start` calls JNI's `ExceptionOccurred` with the overwritten `JNIEnv*`. This causes the segmentation fault.
- [...] `DetachCurrentThread`, it is reasonable that the `env` field was invalidated with jvm as `-1L`.

So I think that `JNIEnv*` should be stored in a local variable instead of an "effectively global" variable in the struct, and that may solve this problem. In fact, it just works in our testing environment.

Any idea?
Thank you as always.
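The interleaving in the hypothesis can be replayed deterministically in a small C sketch. This is a single-threaded simulation, not the real binding: `fake_env`, `shared_module_data`, `fake_attach`, `fake_detach`, and `replay_interleaving` are hypothetical stand-ins, and the `valid` flag only models the idea that an env is unusable after its thread detaches.

```c
#include <stddef.h>

/* Simulated per-thread environment; stands in for JNIEnv. */
typedef struct {
    int owner_id; /* 1 = "main-thread", 2 = "worker-thread" */
    int valid;    /* cleared when the owning thread detaches */
} fake_env;

/* Simulated module data with the env stored in a shared field. */
typedef struct { fake_env *env; } shared_module_data;

/* Stand-in for AttachCurrentThread followed by storing the env in the struct. */
static void fake_attach(shared_module_data *module, fake_env *thread_env)
{
    thread_env->valid = 1;
    module->env = thread_env;
}

/* Stand-in for DetachCurrentThread: the thread's env is no longer usable. */
static void fake_detach(fake_env *thread_env)
{
    thread_env->valid = 0;
}

/* Replays the steps in order; returns 1 if "main-thread" ends up with a bad env. */
static int replay_interleaving(void)
{
    fake_env main_env   = { 1, 0 };
    fake_env worker_env = { 2, 0 };
    shared_module_data module = { NULL };

    fake_attach(&module, &main_env);   /* Start attaches and stores its env */
    fake_attach(&module, &worker_env); /* Receive attaches and overwrites the field */
    fake_detach(&worker_env);          /* Receive finishes and detaches */

    /* Start resumes and reads the shared field before calling into JNI. */
    return module.env->owner_id != 1 || !module.env->valid;
}
```

In this replay the field that "main-thread" reads back belongs to "worker-thread" and has already been detached, which matches the situation the hypothesis describes just before `ExceptionOccurred` is called.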