suggestions #3

Open
SummerSec opened this issue Aug 20, 2021 · 8 comments
Labels
enhancement New feature or request

Comments

@SummerSec

It is recommended that the compiled JDK CodeQL database for each version be placed under release, which would be convenient for many people.

@Marcono1234
Owner

> placed under release

Do you mean, for example, databases/16/codeql-jdk-java-db-java/..., with the version number (here 16) as a directory?

@SummerSec
Author

Yes, and it would be best to also add the name of the operating system, such as Windows, Linux, ...

@Marcono1234
Owner

The question is how the script should obtain this version number. It only has the Git repository URI, which could in theory point anywhere, not necessarily to the OpenJDK GitHub repository, so the version number cannot be extracted from it.
Running java -version (or similar) for the built JDK might not be possible either, because the user might specify a make target which does not create the java binary. Extracting the version number from java -version would also be error-prone, since its output format is not clearly specified.

Do you have a suggestion how the JDK version could be obtained within the script?
It appears there is a make/conf/version-numbers.conf file, but that is still relatively new (based on the Git history), so not all JDK versions will have it.
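As a rough illustration of the version-numbers.conf idea, here is a minimal Python sketch that reads the file when it exists and returns nothing otherwise, so that a caller can fall back to another naming scheme. The key names (DEFAULT_VERSION_FEATURE and so on) and the simple KEY=number line format are assumptions about that file, and both function names are hypothetical; none of this is part of the existing script.

```python
import re
from pathlib import Path

# Assumed key names; verify them against the version-numbers.conf
# of the JDK sources that are actually being built.
_VERSION_KEYS = (
    "DEFAULT_VERSION_FEATURE",
    "DEFAULT_VERSION_INTERIM",
    "DEFAULT_VERSION_UPDATE",
)


def parse_jdk_version(conf_text):
    """Return a version string such as '17.0.2', or None if no feature number is found."""
    values = {}
    for line in conf_text.splitlines():
        # Only accept lines of the form KEY=<digits>; everything else is ignored.
        match = re.match(r"\s*([A-Z_]+)\s*=\s*(\d+)\s*$", line)
        if match:
            values[match.group(1)] = match.group(2)
    if "DEFAULT_VERSION_FEATURE" not in values:
        return None
    # Missing interim/update numbers are treated as 0.
    return ".".join(values.get(key, "0") for key in _VERSION_KEYS)


def read_jdk_version(jdk_source_dir):
    """Read the version from make/conf/version-numbers.conf; None if the file is absent."""
    path = Path(jdk_source_dir) / "make" / "conf" / "version-numbers.conf"
    if not path.is_file():
        return None  # older JDK sources: the caller needs a fallback
    return parse_jdk_version(path.read_text(encoding="utf-8"))
```

Returning None instead of raising keeps the "file does not exist" case explicit, which matters here because older JDK sources are expected to lack the file.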

> Yes, and it would be best to also add the name of the operating system, such as Windows, Linux, ...

I am not completely sure about this one. Ideally the Docker build would be independent of the host OS and always produce the same output (for Linux 64 bit, as specified by the configure command), but I am not actually sure if that is true. For example, I am not certain whether using WSL2 on Windows, or the CPU architecture of the host (e.g. AMD64), makes a difference.

@Marcono1234
Owner

Do you think an alternative would be to include the abbreviated Git SHA in the database file name, e.g. codeql-jdk-java-db-7dcedb672?
I also thought about including the git describe output. That would have the advantage that the database name would then include the JDK version because the OpenJDK Git tags include it (for example jdk-16.0.2-ga). However, that won't work when cloning with --depth 1 because then Git does not have the necessary Git history to determine the relevant tag.
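The naming idea above could be sketched in Python roughly as follows: always use the abbreviated Git SHA, and add the nearest tag only when git describe succeeds, which it typically will not in a --depth 1 clone. The function names, the 9-character SHA length, and the tag-then-SHA ordering are assumptions made for illustration, not how the script currently behaves.

```python
import subprocess


def git_output(repo_dir, *args):
    """Run a Git command inside repo_dir; return trimmed stdout, or None on failure."""
    try:
        result = subprocess.run(
            ["git", "-C", str(repo_dir), *args],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None  # git is not installed or could not be started
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def database_name(short_sha, tag=None, prefix="codeql-jdk-java-db"):
    """Build the database name; the tag part is optional (hypothetical layout)."""
    parts = [prefix]
    if tag:
        parts.append(tag)
    parts.append(short_sha)
    return "-".join(parts)


def database_name_for_repo(repo_dir):
    """Derive a database name from the Git checkout in repo_dir, or None if not a repo."""
    short_sha = git_output(repo_dir, "rev-parse", "--short=9", "HEAD")
    if short_sha is None:
        return None
    # Fails (and yields None) when the history needed to find a tag is missing.
    tag = git_output(repo_dir, "describe", "--tags", "--abbrev=0")
    return database_name(short_sha, tag)
```

With only the SHA available, database_name("7dcedb672") gives the name from the example above; when a tag such as jdk-16.0.2-ga can be resolved, it is inserted before the SHA.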

@SummerSec
Author

> Do you think an alternative would be to include the abbreviated Git SHA in the database file name, e.g. codeql-jdk-java-db-7dcedb672?
> I also thought about including the git describe output. That would have the advantage that the database name would then include the JDK version because the OpenJDK Git tags include it (for example jdk-16.0.2-ga). However, that won't work when cloning with --depth 1 because then Git does not have the necessary Git history to determine the relevant tag.

My idea is to create some standard JDK version compilation files, such as codeql Java build jdk8 Linux, codeql Java build jdk8 Windows, etc. And you can upload these databases created by compiling standard files to GitHub for developers to download. Sometimes waiting is a painful thing. After all, it takes a long time to compile.

@SummerSec
Author

Also, I would like to ask a question not related to this project: is there a way to create a script to scan *.jar, *.war or *.class files?

@Marcono1234
Owner

> My idea is to create some standard JDK version compilation files, such as codeql Java build jdk8 Linux, codeql Java build jdk8 Windows, etc.

Building the JDK for Windows would require a different setup. The OpenJDK building instructions have a section for this with WSL but I have not tested it. I have created #5 now to track this feature request.
For easily building different JDK versions I have created #6 now.

> And you can upload these databases created by compiling standard files to GitHub for developers to download.

I probably won't be doing this for several reasons:

  • This might hit GitHub quotas, though that is rather unlikely based on the documentation
  • I am not sure if the CodeQL Terms and Conditions allow me to publish databases. It says:

    These Terms do not authorize [...] To otherwise or in any other context generate any CodeQL database for or during automated analysis

  • It is not clear how often or when CodeQL databases should be created: every week, every month, ...? Every database is at least 1 GB in size, so this might put an unjustified amount of strain on GitHub, especially given that this repository is not very well known.
  • CodeQL databases can be downloaded from https://lgtm.com/. Though to be fair, the whole reason why I created this repository is that lgtm.com is currently not building JDK databases, and that might not change in the future; see "LGTM.com: Building OpenJDK", github/codeql#6219 (reply in thread).

> Sometimes waiting is a painful thing. After all, it takes a long time to compile.

On my machine it takes (if I recall correctly) about 30 to 40 minutes. The Docker image will also only be created once, unless the base Ubuntu image is updated(?). The question is also how often you want to create a JDK database; I assume you would only create one for every new JDK release (or for specific changes), so in that case such long build times might not be that problematic.
As pointed out in the README, there are ways the JDK build performance can be improved. For me, specifying --memory-limit made a noticeable difference. If you have any other suggestions for improving the performance (e.g. WSL config or Docker settings), please let me know 🙂

@Marcono1234
Owner

> is there a way to create a script to scan *.jar, *.war or *.class files?

I don't think this is possible. CodeQL databases are created based on the source code discovered during compilation. The compiled code (.class files) is probably missing most of the required information.
Maybe your question is similar to github/codeql#4304 (and the comments of that issue).

For general CodeQL questions it would probably be best to ask them on https://github.com/github/codeql; the maintainers of CodeQL can probably give you more information.

@Marcono1234 Marcono1234 added the enhancement New feature or request label Aug 23, 2021