Resources not found when referencing external .jar files #51
So, what happened here... everything worked, except it determined the assembly name to be "stanford.corenlp.dll", the same as the non-models classifier. That results in two identically named DLLs, and thus in one replacing the other. And since we aren't incorporating the classifier name into the DLL name anywhere, that's expected. Wonder what we should do.
I'm not sure we can do anything here. The -models.jar file does not contain an Automatic-Module-Name entry, so the name has to be inferred from the jar name. And the jar name is such that the inferred name, according to the OpenJDK specification, is stanford.opennlp@4.5.5-models, which overlaps with the inferred automatic module name of the main JAR, stanford.opennlp@4.5.5. Same name, different version. And of course we cannot allow a non-deterministic result here, as it would mess up any future dependency hierarchy. I think a bug needs to be opened with upstream to follow recommendations and define a module name explicitly.
The classifier is just a string. In this case, there are other resource files that can also be added on. So, I imagine there needs to be special handling for the ones with special meanings, and the rest should be compiled into assemblies. IMO it makes more sense for them to be separate assemblies than part of the main jar, but I'm not sure how well our classpath loader works across assemblies like that. As for the naming, why not simply tack the classifier on the end (cleaned of special characters, of course)? It is guaranteed to be unique because it is part of the unique name of the jar.
As for following the spec: this doesn't seem any different from how we compile satellite assemblies in .NET. Sometimes it makes sense to separate resources physically from code, especially for localization.
These are not satellite assemblies. From IKVM's point of view, they are simply JARs, and JARs become assemblies. Java has no separate concept of satellite anything: it's all on the class/module path. It is not really our choice what special handling can be added, as long as we stick to the JDK9+ specification. The algorithm is described on this page: https://docs.oracle.com/javase/9/docs/api/java/lang/module/ModuleFinder.html

Our decision to use the JDK9+ module specification to determine assembly names in the first place, however, was our choice. But some choice had to be made, and I'm not sure there was any other option available that fulfilled the project goals.
With guidance here: https://dev.java/learn/modules/automatic-module/
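The inference described in that ModuleFinder page can be sketched in a few lines. This is a simplified rendering of the filename-to-module-name derivation from the javadoc (version parsing and some edge cases omitted), just enough to show why a main JAR and its -models classifier JAR collapse to the same inferred name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AutoModuleName {
    // Per the ModuleFinder javadoc: the name is truncated at the first
    // hyphen that is followed by a digit (the start of a version string).
    private static final Pattern VERSION = Pattern.compile("-(\\d+(\\.|$))");

    static String derive(String jarFileName) {
        // Strip the ".jar" suffix.
        String name = jarFileName.substring(0, jarFileName.length() - ".jar".length());
        // Truncate at the version string, if one is found.
        Matcher m = VERSION.matcher(name);
        if (m.find()) {
            name = name.substring(0, m.start());
        }
        // Replace non-alphanumeric characters with dots, collapse runs of
        // dots, and trim leading/trailing dots.
        return name.replaceAll("[^A-Za-z0-9]", ".")
                   .replaceAll("\\.{2,}", ".")
                   .replaceAll("^\\.|\\.$", "");
    }

    public static void main(String[] args) {
        // Both jars collapse to the same inferred module name, because the
        // "-models" part sits after the version and is discarded with it.
        System.out.println(derive("stanford-corenlp-4.5.6.jar"));        // stanford.corenlp
        System.out.println(derive("stanford-corenlp-4.5.6-models.jar")); // stanford.corenlp
    }
}
```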
I should note, we had a similar situation with Apache Tika. They had published a JAR file with an incorrect Automatic-Module-Name entry. It was a typo. They fixed it in 24 hours. But at least they bothered to include the entries!
So, you are saying this is the bug, and if they fix it, it works on our end?
I would call it a bug. But it's not as clear cut as just being a defect. With JDK9, every publicly published JAR file SHOULD have an automatic module name. Now, IKVM doesn't implement much, if anything, from JDK9. But we do pick up this one bit, because we needed SOMETHING to deterministically predict a unique identifier for a JAR file within the Java ecosystem that we could piggyback on for assembly names. And this was available from JDK9 onward. Projects have since added it to their JAR files, even for JDK8, so their JAR files can be treated as modules when running on JDK9. They should have an interest in fixing it, and I would expect them to. It has ramifications outside IKVM. The Tika people, for example, were very motivated to fix it.
See, for example, this query regarding bugs related to AMNs in Apache projects: https://issues.apache.org/jira/browse/CURATOR-550?jql=text%20~%20%22automatic%20module%22 Everybody has motivation to add them and get them right.
In a sense, though, there is a natural conflict here. Maven conventions prescribe a specific file name pattern for classifiers. This pattern conflicts with the Automatic-Module-Name discovery mechanism in JDK9, making it impossible to infer a unique module name. Thing is, it's not OUR conflict: no JVM could infer a unique module name either. The real-world issue is just a matter of the extent to which some tool requires a unique module name. We do. But so do others.
So, do we specifically need to spell out that we need each Automatic-Module-Name entry to be unique, or, if they follow the spec, will it work out that way anyway? I just want to be sure I understand what the recommended fix is before reporting it to them.
I'd usually write something like:

The Stanford OpenNLP JAR files from Maven that I have examined thus far lack a JDK9 Automatic-Module-Name entry in their MANIFEST.MF files and are thus undiscoverable as unique modules by tooling that expects module names complying with this specification. As mentioned at https://dev.java/learn/modules/automatic-module/ and in many other places, it is ideal for an Automatic-Module-Name entry to be included in JAR files that are published publicly, so that tooling that requires it can operate properly. In this particular case, for example, tooling is unable to locate the Automatic-Module-Name entry and thus falls back to the 'inference' specification described at https://docs.oracle.com/javase/9/docs/api/java/lang/module/ModuleFinder.html. For the JARs published to Maven under the 'models' classifier, these file names (stanford-opennlp-4.5.5-models.jar, etc.) would be inferred to possess the module name "stanford.opennlp", which overlaps with the module name of the core library (stanford-opennlp-4.5.5.jar), thus causing a duplicate. This could be resolved by including an explicit entry.

Something like that.
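For reference, an explicit entry is typically added through the JAR manifest. Below is a sketch of what such an upstream fix might look like with maven-jar-plugin; the module name `edu.stanford.nlp.corenlp.models` is a hypothetical choice for illustration, not something upstream has committed to:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <manifestEntries>
        <!-- Hypothetical module name; upstream would pick its own. -->
        <Automatic-Module-Name>edu.stanford.nlp.corenlp.models</Automatic-Module-Name>
      </manifestEntries>
    </archive>
  </configuration>
</plugin>
```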
For a more in-depth analysis of the overall issue, see: https://blog.joda.org/2017/05/java-se-9-jpms-automatic-modules.html
Stanford CoreNLP 4.5.6 now uses:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="IKVM" Version="8.7.5" />
    <PackageReference Include="IKVM.Maven.Sdk" Version="1.6.7" />
  </ItemGroup>
  <ItemGroup>
    <MavenReference Include="edu.stanford.nlp:stanford-corenlp" Version="4.5.6" />
    <MavenReference Include="edu.stanford.nlp:stanford-corenlp" Version="4.5.6" Classifier="models" />
  </ItemGroup>
</Project>
```

And now both DLLs get generated. But IKVM still doesn't see the resources from the models when calling into the main DLL.

```
edu.stanford.nlp.io.RuntimeIOException
  HResult=0x80131500
  Message=Error while loading a tagger model (probably missing model file)
  Source=edu.stanford.nlp.corenlp
  StackTrace:
   at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(Properties config, String modelFileOrUrl, Boolean printLoading)
   at edu.stanford.nlp.tagger.maxent.MaxentTagger..ctor(String modelFile, Properties config, Boolean printLoading)
   at edu.stanford.nlp.tagger.maxent.MaxentTagger..ctor(String modelFile)
   at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(String loc, Boolean verbose)
   at edu.stanford.nlp.pipeline.POSTaggerAnnotator..ctor(String annotatorName, Properties props)
   at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(Properties properties)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$6(Properties props, AnnotatorImplementations impl)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP.__<>Anon7.apply(Object , Object )
   at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$33(Entry entry, Properties inputProps, AnnotatorImplementations annotatorImplementation)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP.__<>Anon41.get()
   at edu.stanford.nlp.util.Lazy.3.compute()
   at edu.stanford.nlp.util.Lazy.get()
   at edu.stanford.nlp.pipeline.AnnotatorPool.get(String name)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props, Boolean enforceRequirements, AnnotatorPool annotatorPool)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props, Boolean enforceRequirements)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props)
   at IkvmMavenMissingResourcesError.Program.Main(String[] args) in F:\Users\shad\source\repos\IkvmMavenMissingResourcesError\IkvmMavenMissingResourcesError\Program.cs:line 36

This exception was originally thrown at this call stack:
    [External Code]

Inner Exception 1:
IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger" as class path, filename or URL
```

I am still using the original 3 lines of code that caused the error.

```csharp
using edu.stanford.nlp.pipeline;
using java.util;

namespace IkvmMavenMissingResourcesError
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Initialize Stanford CoreNLP for sentiment analysis
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment");
            var pipeline = new StanfordCoreNLP(props);
        }
    }
}
```

So, it looks like the default class loader still doesn't emulate what Java does.
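For context on what "emulate what Java does" means here: Java code such as CoreNLP resolves model files as class-path resources, so if the JAR (or, under IKVM, the converted assembly) containing them is not visible to the effective class loader, the lookup returns null and model loading fails. A minimal probe, which in an environment without the models JAR on the class path reports the resource as missing:

```java
import java.io.InputStream;

public class ResourceProbe {
    public static void main(String[] args) {
        // The tagger path from the IOException above.
        String path = "edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger";
        // getResourceAsStream searches the class path; it returns null when
        // no JAR or directory on the class path contains the entry.
        InputStream in = ResourceProbe.class.getClassLoader().getResourceAsStream(path);
        System.out.println(in == null ? "resource not found" : "resource found");
    }
}
```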
Did you preload the assembly?
Preload? No. All of the code is posted above. Are you saying this is a requirement?
It always has been, if there is no direct type reference.
Could you get Stanford CoreNLP 4.5.6 to work the Maven way by preloading the assembly?
No, it still doesn't work. However, I recall getting a different error message previously than I am getting now. I didn't report it previously because I wanted to test it in Java first to make sure the issue is caused by IKVM, but I am fairly certain it is, or someone would have reported it to the Stanford CoreNLP issue tracker by now. My project file and Program.cs now look like this:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="IKVM" Version="8.7.5" />
    <PackageReference Include="IKVM.Maven.Sdk" Version="1.6.7" />
  </ItemGroup>
  <ItemGroup>
    <MavenReference Include="edu.stanford.nlp:stanford-corenlp" Version="4.5.6" />
    <MavenReference Include="edu.stanford.nlp:stanford-corenlp" Version="4.5.6" Classifier="models" />
  </ItemGroup>
</Project>
```

```csharp
using edu.stanford.nlp.pipeline;
using java.util;
using System;
using System.IO;
using System.Reflection;

namespace IkvmMavenMissingResourcesError
{
    internal class Program
    {
        static void Main(string[] args)
        {
            Assembly.LoadFile(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "jollyday.dll"));

            // Load the resource assemblies
            string assemblyName = "edu.stanford.nlp.corenlp_english_models, Version=4.5.0.0, Culture=neutral, PublicKeyToken=13235d27fcbfff58";
            Assembly.Load(new AssemblyName(assemblyName));

            // Initialize Stanford CoreNLP for sentiment analysis
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment");
            var pipeline = new StanfordCoreNLP(props);
        }
    }
}
```

```
Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
   at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(String className, Object[] arguments)
   at edu.stanford.nlp.time.TimeExpressionExtractorFactory.create(String className, String name, Properties props)
   at edu.stanford.nlp.time.TimeExpressionExtractorFactory.createExtractor(String name, Properties props)
   at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier..ctor(Properties props, Boolean useSUTime, Properties sutimeProps)
   at edu.stanford.nlp.ie.NERClassifierCombiner..ctor(Boolean applyNumericClassifiers, Language nerLanguage, Boolean useSUTime, Properties nscProps, String[] loadPaths)
   at edu.stanford.nlp.pipeline.NERCombinerAnnotator..ctor(Properties properties)
   at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(Properties properties)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$8(Properties props, AnnotatorImplementations impl)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP.__<>Anon9.apply(Object , Object )
   at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$33(Entry entry, Properties inputProps, AnnotatorImplementations annotatorImplementation)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP.__<>Anon41.get()
   at edu.stanford.nlp.util.Lazy.3.compute()
   at edu.stanford.nlp.util.Lazy.get()
   at edu.stanford.nlp.pipeline.AnnotatorPool.get(String name)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props, Boolean enforceRequirements, AnnotatorPool annotatorPool)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props, Boolean enforceRequirements)
   at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props)
   at IkvmMavenMissingResourcesError.Program.Main(String[] args) in F:\Users\shad\source\repos\IkvmMavenMissingResourcesError\IkvmMavenMissingResourcesError\Program.cs:line 48
```

Previously, I was getting an error that originated in

I have a demo showing that loading all of the resource files explicitly works. We shouldn't have to do that, though. The whole point of putting the resources in the .jar file is to have a default set of resources that load automatically just by including the package.
Thanks for your initial code. Version 4.5.6 is solved now: NightOwl888/lucenenet-opennlp-mavenreference-demo#1 (comment)
The documentation for Stanford CoreNLP states that the POM configuration should be:

So, my project file looks like this:

When specifying `MavenReference`s this way, it successfully downloads the .jar files with the classifier `models` into the local Maven cache. However, there is no build output for `stanford-corenlp-4.5.5-models.jar`. Then when I try to run a simple example, it cannot find the models.

Error

Looks like this PR https://github.com/sergey-tihon/Stanford.NLP.NET/pull/130/files#diff-9b0fd7e079a9dfbbaa7589009e76812239c92547bf40b6668d365b647207ed59R42-R57 has some workarounds for loading the resource files, which I will pursue. But it would be nice if this could be fixed so that when adding a `MavenReference` to the resources, it would be able to discover them on its own.