Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[WiP]: Loading verilator as shared libraries #233

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

grebe
Copy link
Contributor

@grebe grebe commented Feb 25, 2019

This should help with the IPC overhead. Not sure how much of a pain it will be to make this portable.

The way it works is this:

  1. The verilator harness is augmented with a JNI API. Depending on an #ifdef, either the main function we currently use will be included or the JNI API.
  2. I currently unconditionally add the #ifdef for JNI to the verilator CFLAGS. A more polished version would treat this as an option
  3. After running verilator but before running the makefile verilator spits out, I add a target to the makefile that builds a shared library
  4. The tester backed currently drops the IPC-based backend. Ideally, this would be a parameter.

Things are hacky, but less hacky than I'd have thought. Getting this to work on your system shouldn't be too crazy, it will mostly depend on getting your verilator CFLAGS right. I haven't noticed any weird crashes.

As far as performance goes, on my GCD test I'm getting ~400kHz sustained. I think that's over a 5x speed-up if our old numbers are right.

This should help with the IPC overhead.
@grebe
Copy link
Contributor Author

grebe commented Feb 25, 2019

@chick

@ucbjrl
Copy link
Contributor

ucbjrl commented Feb 26, 2019

When I attempted this (the remains are in https://github.com/freechipsproject/chisel3/tree/testers2%2Binterprocesstester), I ran into a number of problems:

  • there is no System.unload() and attempts to load subsequent simulators after the first appear to be ignored and the tests silently fail with Cannot find the object ...
  • loading a static shim which loads and unloads the simulator runs afoul of sbt's class loader isolation and the fact that a shared library may only be loaded once.

There may be a way to interpose an expendable class loader to do the System.load(), in which case, the shared library will be unloaded when the class loader is destroyed but this will encounter problems if multiple tests using the same simulation are run in parallel.

I think this is ultimately possible, but we'll need some mechanism to better coordinate tests using the same simulation, and loading and unloading different simulations.

@chick
Copy link
Contributor

chick commented Feb 26, 2019

Seems worth throwing in here that @stevenmburns says that cocotb VCS is 20x faster than the testers VCS implementation. Cocotb is in python but there may tricks that are relevant to us

…neric_java_home

Use System.properties for java_home; report System.load() failures
@grebe
Copy link
Contributor Author

grebe commented Feb 26, 2019

Thanks Jim, I didn't experience issues reloading a shared library on OS X, to my surprise. I guess dylibs work differently.

My understanding is that shared libraries come and go with the classloader, but I'm not really sure how that gets translated to the OS/process level.

I suppose another option is to uniqify the class name per compilation unit. I bet you could load a lot of libraries before running into problems

@grebe
Copy link
Contributor Author

grebe commented Feb 26, 2019

Also re:VCS, I think you could do this same thing, but I'd be careful about the libraries it pulls in. I think it has more of a runtime than verilator.

@ucbjrl
Copy link
Contributor

ucbjrl commented Feb 26, 2019

The problems I experienced were/are on OSX, with Java 8 and 11.

@grebe
Copy link
Contributor Author

grebe commented Feb 26, 2019

Really? That's strange. I wonder why I didn't see it. I was using sbt in interactive mode running the verilator test. Here's my java version string:

openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)

@stevenmburns
Copy link

Here is a git repo they will demonstrate the 20x runtime difference.

https://github.com/stevenmburns/cocotb-clamper

The bitonic sorter example is in there as well as others.

@ucbjrl
Copy link
Contributor

ucbjrl commented Feb 26, 2019

Try sbt clean test (i.e., run all the tests, some of them in parallel).

@stevenmburns
Copy link

Here are the runtime numbers.
Runtime wise, I’m getting these results on a 64 word 8-bit Chisel generated sorter, running through 10000 times.

Chisel test with treadle: 4 sec
Chisel test with VCS: 328 sec
cocotb with VCS: 14 sec

@grebe
Copy link
Contributor Author

grebe commented Feb 27, 2019

Thanks @stevenmburns. I think I'll look at that as a comparison point for this PR.

@ucbjrl, that does fail for me. I got an error in reset, though- it doesn't look like it is having any problems loading a shared library. My harness uses global variables in shady ways, so my guess is this the result of one tester re-initializing the global state while another tester is running. I should be passing state via the calling object rather than abusing static globals. Do you know JNI enough to give me advice on how to do that?

@ucbjrl
Copy link
Contributor

ucbjrl commented Feb 27, 2019

But each shared library defines some of the same symbols. How could you load more than one at a time? Unless you ensure there are no global symbols in common between simulations (i.e., each simulation has its own namespace and defines no globals), I suspect you'll have problems trying to load more than one.

And even if you could load more than one, eventually you'll start running up against memory limitations.

But in any case, yes, we should reduce the amount of global state.

@grebe
Copy link
Contributor Author

grebe commented Feb 27, 2019

I can imagine that it's OK for multiple loaded libraries to have the same symbols- I think when you load one, it will assign the native methods to the current addresses and won't get overwritten by later libraries getting loaded. This seems to indicate hundreds of thousands of shared libraries are OK to dynamically load.

Use JNI to save/get a pointer to state.
@grebe
Copy link
Contributor Author

grebe commented Feb 27, 2019

I got rid of global variables. Instead, I save a pointer to state in the wrapper class. Every function first uses JNI to get the pointer. This has a substantial negative impact on performance, so it is probably worth thinking about. I think we can cache the pointer without ill effect b/c other testers will get mapped to different addresses.

@grebe
Copy link
Contributor Author

grebe commented Feb 27, 2019

This makes an enormous difference, it's back to the original performance. Not sure why this particularly different than using globals the way I was before, but it isn't crashing for me. I think there's a better way to cache all this, though.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants