Loading UniProt into a database is optional: the code also works directly on UniProt XML files. This project nevertheless supports it, which has the advantage that queries can be formulated across all of UniProt.
There are two ways to load UniProt into a database:
- from the IDE,
- from the command line, using a jar file.

To load from the IDE:
- Make sure you have access to a MySQL installation, create a new empty database there, and make sure you have write permissions.
- Update the configuration in `src/main/resources/database.properties` to match your setup.
- Run `LoadMissing.java` (ideally overnight; by the next morning you will have a populated database).
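Because the load runs for hours, it can help to sanity-check the configuration before starting. The following is a minimal sketch, not part of the project: the key names `db.url`, `db.user` and `db.password` are hypothetical placeholders, so substitute whatever keys `database.properties` actually defines. It parses properties text and reports required keys that are missing or empty.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class DbConfigCheck {

    // Hypothetical key names (an assumption): replace them with the keys
    // that database.properties really uses.
    static final String[] REQUIRED_KEYS = {"db.url", "db.user", "db.password"};

    /** Returns the required keys that are absent or blank in the given properties. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    /** Parses text in java.util.Properties format. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Example content with an empty password, to show what gets reported.
        String example = "db.url=jdbc:mysql://localhost:3306/uniprot\n"
                       + "db.user=loader\n"
                       + "db.password=\n";
        System.out.println("Missing or empty keys: " + missingKeys(parse(example)));
    }
}
```

In practice you would pass the contents of `src/main/resources/database.properties` (for example via a `FileReader`) instead of the in-memory example string.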
To load from the command line, build an executable jar file with `mvn package`. You can then run this jar file and pass in the DB configuration as a command-line parameter (`LoadMissing.java` is the main class).
By default, all Swiss-Prot entries are loaded into the database (see the `LoadMissing.java` class).
Currently this includes about 550k UniProt entries. Loading all of them takes about 8 hours on a typical database server.
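The figures above imply an average rate of roughly 19 entries per second. The sketch below works through that arithmetic and uses it to estimate the load time for a different number of entries. It assumes the load time scales linearly with the entry count, which is a simplification; the 100k subset is a hypothetical example, and the class is not part of the project.

```java
public class LoadTimeEstimate {

    /** Average entries per second for a total entry count and a duration in hours. */
    static double entriesPerSecond(long entries, double hours) {
        return entries / (hours * 3600.0);
    }

    /** Estimated hours to load the given number of entries at the given rate. */
    static double estimatedHours(long entries, double entriesPerSecond) {
        return entries / entriesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        // About 550k entries in about 8 hours, as quoted above.
        double rate = entriesPerSecond(550_000L, 8.0);
        System.out.printf("Average rate: %.1f entries/s%n", rate);

        // Hypothetical subset of 100k entries, assuming linear scaling.
        System.out.printf("100k entries: about %.1f hours%n",
                estimatedHours(100_000L, rate));
    }
}
```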