ARCHIVED ARCHIVED ARCHIVED ARCHIVED ARCHIVED ARCHIVED ARCHIVED ARCHIVED
This Git repository is obsolete and has been archived.
This application takes WARC files in a given directory and indexes them in Solr. A ready-to-use Solr Docker configuration can be found in solr/.
The application is built using Gradle. To run the app, use the command gradle runApp
which should pull the dependencies, compile the code and run it.
A jar can be created by running gradle fatJar
and the jar file can be found in build/libs/.
The application can also be built and run using the Dockerfile. The watch directory containing the WARC files should be mounted so that the application can access these files. The application is configured through three environment variables: LOCKSS_SOLR_WATCHDIR, LOCKSS_SOLR_URL and LOCKSS_SOLR_BATCH_SIZE. The application provides default values, which can be overridden when necessary. An example Docker command to start the application is given below.
docker run -it --rm -e LOCKSS_SOLR_WATCHDIR=/samples -e LOCKSS_SOLR_URL=http://192.168.56.103:8983/solr/test-core -v /home/rwincewicz/workspace/lockss/lockss-solr/samples:/samples:ro lockss/indexer
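When several settings are overridden, the list of -e flags gets long. One option is to keep the settings in a file and pass it with the --env-file flag of docker run. The sketch below is only an illustration: the helper name write_indexer_env, the file name indexer.env, the /path/to/samples placeholder and the batch size of 500 are assumptions, not values taken from the application.

```shell
#!/bin/sh
# Hypothetical helper: write the three indexer settings to an env file.
# Usage: write_indexer_env FILE WATCHDIR SOLR_URL BATCH_SIZE
write_indexer_env() {
    # One NAME=value pair per line, the format docker run --env-file reads.
    {
        printf 'LOCKSS_SOLR_WATCHDIR=%s\n' "$2"
        printf 'LOCKSS_SOLR_URL=%s\n' "$3"
        printf 'LOCKSS_SOLR_BATCH_SIZE=%s\n' "$4"
    } > "$1"
}

# Example with assumed values (500 is an arbitrary batch size):
#   write_indexer_env indexer.env /samples http://192.168.56.103:8983/solr/test-core 500
#   docker run -it --rm --env-file indexer.env \
#       -v /path/to/samples:/samples:ro lockss/indexer
```

Keeping the values in a file makes it easier to reuse the same settings across runs; any variable left out of the file falls back to the application's own default.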
Alternatively, you can use the Solr image from Docker Hub. The following command starts a container with Solr and creates a core named test-core:
docker run --name solr -d -p 8983:8983 solr solr-create -c test-core
Then run the application in Docker as follows:
docker run -it --rm --link solr:solr -e LOCKSS_SOLR_WATCHDIR=/samples -e LOCKSS_SOLR_URL=http://solr:8983/solr/test-core -v $WORKSPACE/lockss-solr/samples:/samples:ro lockss/indexer
It's also possible to use Docker Compose to build and start both containers. You'll need to install Docker Compose and run the following command:
docker-compose up --build
If you want to use a WARC folder other than the default (i.e. ./samples), it can be defined in .env as LOCKSS_SOLR_WATCHDIR:
LOCKSS_SOLR_WATCHDIR=/var/data/warc
The application will not automatically pick up existing WARC files, but a simple touch
should trigger the indexing:
touch /var/data/warc/*
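The glob above also touches files that are not WARCs. A narrower variant could look like the sketch below. It assumes the WARC files are named *.warc or *.warc.gz, which is a naming assumption and not something the application is known to require; the function name touch_warcs is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: update the modification time of the WARC files in a
# directory (top level only, like the glob above).
# Assumes WARC files are named *.warc or *.warc.gz.
touch_warcs() {
    dir="${1:-/var/data/warc}"
    for f in "$dir"/*.warc "$dir"/*.warc.gz; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        if [ -f "$f" ]; then
            touch "$f"
        fi
    done
    return 0
}

# Example:
#   touch_warcs /var/data/warc
```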
You should now be able to query the server at http://localhost:8983/solr/#/test
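To check from the command line that documents were indexed, one option is a match-all query against Solr's select handler. The sketch below assumes the core is named test-core as in the commands above, that Solr listens on localhost:8983, and that curl is installed; the function names are illustrative only.

```shell
#!/bin/sh
# Build the URL of a match-all query against a Solr core's select handler.
# rows=0 asks for no documents, only the response header and the hit count.
# Usage: solr_count_url [BASE_URL] [CORE]
solr_count_url() {
    base="${1:-http://localhost:8983/solr}"
    core="${2:-test-core}"
    printf '%s/%s/select?q=*:*&rows=0&wt=json\n' "$base" "$core"
}

# Fetch the response; its numFound field holds the number of matching documents.
solr_count() {
    curl -fsS "$(solr_count_url "$@")"
}

# Example:
#   solr_count
#   solr_count http://localhost:8983/solr test-core
```

A numFound value of 0 after touching the files would suggest that nothing has been indexed yet.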
A Vagrantfile has been added to run the app on a VM.
If you are using a WARC folder other than ./samples, you'll have to make sure it's shared by updating the Vagrantfile:
config.vm.synced_folder "/var/data/warc", "/var/data/warc"
You need to install Vagrant and run the following command:
vagrant up
This should start a VM running CentOS 7 with Docker, Docker Compose and other software. Please read the original Vagrant box page for details: Docker-enabled Vagrant boxes.
The Solr server running on the VM can be accessed from the host at http://localhost:58983/solr/
The VM is also running cAdvisor, which can be accessed at http://localhost:58080/containers/