Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Loading Native Libraries as Stream in Multithreaded Environment #103

Open
amaidment opened this issue Apr 9, 2018 · 7 comments
Open

Loading Native Libraries as Stream in Multithreaded Environment #103

amaidment opened this issue Apr 9, 2018 · 7 comments
Assignees
Labels
accepted This sounds like something to look into.

Comments

@amaidment
Copy link

amaidment commented Apr 9, 2018

When initializing JBLAS in a heavily multi-threaded environment in Java 8, I'm finding that I get the following exception:

Caused by: java.lang.NullPointerException: Inflater has been closed
at java.util.zip.Inflater.ensureOpen(Inflater.java:389)
at java.util.zip.Inflater.inflate(Inflater.java:257)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.jblas.util.LibraryLoader.loadLibraryFromStream(LibraryLoader.java:261)
at org.jblas.util.LibraryLoader.loadLibrary(LibraryLoader.java:186)
at org.jblas.NativeBlasLibraryLoader.loadLibraryAndCheckErrors(NativeBlasLibraryLoader.java:32)
at org.jblas.NativeBlas.(NativeBlas.java:77)

My best guess at the moment it that that there's a concurrency bug in core Java 8, such that resources loaded as a stream can be closed by the same JarFile/ZipFile being closed elsewhere. For similar examples, see this issue in org.reflections.reflections (ronmamo/reflections#81), or this issue on StackOverflow (https://stackoverflow.com/questions/49697901/java-xml-resource-inputstream-being-closed/49717516#49717516)

I would suggest that, in LibraryLoader.tryPath(String path), you change from:

private InputStream tryPath(String path) {
Logger.getLogger().debug("Trying path "" + path + "".");
return this.getClass().getResourceAsStream(path);
}

to:

private InputStream tryPath(String path) {
Logger.getLogger().debug("Trying path "" + path + "".");
try {
URLConnection connection = this.getClass.getResource(path).openConnection();
connection.setUseCaches(false);
return connection.getInputStream();
} catch (IOException ex) {
throw new RuntimeException(ex);
}
}

@mikiobraun
Copy link
Member

Hi,

thanks for pointing this out. Did this fix the problem? If yes, can you open a pull request with these changes?

@amaidment
Copy link
Author

amaidment commented Apr 13, 2018 via email

@mikiobraun
Copy link
Member

Hey Anthony,

corporate firewalls so you cannot even clone from github!? Sounds awful! ;)

Alright, thanks for taking the time, I'll add the patch. Pretty little time to work on this right now, but I promise it'll eventually make it in!

Best,

Mikio

@lukehutch
Copy link

lukehutch commented Jul 8, 2018

The exception happens when you try sharing one ZipFile instance between different threads, and then each thread tries to separately close the same ZipFile (you can only close a ZipFile once), or one thread tries to close it while another is still trying to read from it. There is actually no sense sharing one ZipFile instance between threads anyway, since it imposes a synchronized lock around all its methods.

My library FastClasspathScanner uses one ZipFile instance per thread, and I never run into this issue, so it's not a JDK bug. It's a bug in how libraries are using the ZipFile API.

@mikiobraun
Copy link
Member

Hi lukehutch, thanks for clarifying!

@mikiobraun
Copy link
Member

Uh, by the way, just thinking. Wouldn't it be best to lock the part that loads the code because that is something that should be done only once anyway?

@lukehutch
Copy link

lukehutch commented Jul 9, 2018

@mikiobraun you're welcome :-) It took me a while to figure out how to parallelize access to a zipfile too, and this bug seems relatively common.

You're right that a class can be loaded only once, and the classloading system will make sure that a class is loaded into a given classloader at most once, through the class caching system, which has its own lock, and works fine in a multithreaded environment. My comment above applies only to ZipFile, not to classloading.

In summary, if you want multiple threads to read from the same zipfile in a scalable way, you must open one ZipFile instance per thread. That way, the per-thread lock in the ZipFile methods does not block all but one thread from reading from the zipfile at one time. It also means that when each thread closes the ZipFile after they're done reading, they close their own instance, not the shared instance, so you don't get the exception on the second and subsequent close.

Protip: if you really care about speed, you can get more performance by reading all the ZipEntry objects from the first ZipFile instance, and sharing them with all threads, to avoid duplicating work in reading ZipEntry objects for each thread separately. A ZipEntry object is not tied to a specific ZipFile instance per se, ZipEntry just records metadata that will work with any ZipFile object representing the same zipfile.

@mikiobraun mikiobraun self-assigned this Aug 25, 2020
@mikiobraun mikiobraun added the accepted This sounds like something to look into. label Aug 25, 2020
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
accepted This sounds like something to look into.
Projects
None yet
Development

No branches or pull requests

3 participants