- Dockerized application for simple deployment
- PostgreSQL DB <=> Django + Gunicorn + Nginx web server <= REST API => Vue-based SPA + Vuex
- Django Whitenoise to serve static files, CDN Ready
- Annotations stored in relational database
- Access control / user management
- Vuex handles state management and persistence, so annotations are never lost on the front-end
Before getting started you should have the following installed and running:
- Docker >= v19
- Docker Compose >= v1.25
Data upload via the web interface is not possible yet, so the data needs to be mounted inside the container.
If the images are on the same machine, just put them in the expected location data/dataset/
by creating a symbolic link (below) or by moving your data.
ln -s $MY_DATASET_LOCATION $(pwd)/data/dataset
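For setups where the link is created from a script instead of by hand, the same step could be sketched in Python as follows. The function name `link_dataset` is hypothetical and not part of the project. Unlike `ln -s`, this sketch refuses to continue if the target already exists.

```python
import os
from pathlib import Path


def link_dataset(source: Path, target: Path) -> Path:
    """Create `target` as a symbolic link to the directory `source`."""
    source = source.resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source}")
    if target.is_symlink() or target.exists():
        raise FileExistsError(f"{target} already exists; remove it first")
    # Make sure the parent (e.g. data/) exists before creating the link.
    target.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(source, target, target_is_directory=True)
    return target


# Example call (assumes MY_DATASET_LOCATION is set in the environment):
# link_dataset(Path(os.environ["MY_DATASET_LOCATION"]), Path("data/dataset"))
```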
If your dataset is remote (cloud or another computer), you might want to start using dvc. Check the Integrating DVC section below.
# copy all example dotenv files
sudo apt install mmv
mmv -c 'env/*.env.example' 'env/#1.env'
# edit all env/*.env files setting the following:
# DJANGO_STATIC_HOST
# SECRET_KEY
# DB_PASS
# POSTGRES_PASSWORD (same as DB_PASS)
find env -name "*.env" -exec nano {} \;
# create the public network
docker network create net-nginx-proxy
# build docker images and run containers
docker-compose up
# from another terminal, run the database migrations
docker-compose exec web pipenv run /app/manage.py migrate
# create django superuser
docker-compose exec web pipenv run /app/manage.py createsuperuser
# access localhost:80 in your browser
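Before bringing the containers up, it can help to confirm that the edited env files actually define the four variables listed above. The following is a minimal sketch, assuming the env files use plain `KEY=VALUE` lines and that it does not matter which file defines which key (all `env/*.env` files are merged). The helper names are hypothetical and not part of the project.

```python
from pathlib import Path

REQUIRED_KEYS = ("DJANGO_STATIC_HOST", "SECRET_KEY", "DB_PASS", "POSTGRES_PASSWORD")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_dir(directory: str = "env") -> dict:
    """Merge every *.env file found in `directory` into one dict."""
    merged = {}
    for path in sorted(Path(directory).glob("*.env")):
        merged.update(parse_env(path.read_text()))
    return merged


def check_env(merged: dict) -> list:
    """Return a list of problems; an empty list means the check passed."""
    problems = [f"{key} is missing or empty" for key in REQUIRED_KEYS if not merged.get(key)]
    if merged.get("DB_PASS") != merged.get("POSTGRES_PASSWORD"):
        problems.append("DB_PASS and POSTGRES_PASSWORD differ")
    return problems


# Example call from the project root:
# print(check_env(load_env_dir("env")) or "env files look complete")
```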
docker-compose exec web /bin/bash
docker-compose exec nginx /bin/sh
docker-compose exec db psql --username eyetagger_admin --dbname eyetagger
More PostgreSQL commands:
\h # help
\q # quit
\l # list databases
\d # list tables / relations
\d api_annotation # describe a table / relation
# run a query - don't forget the semicolon:
SELECT id, annotator_id, image_id FROM api_annotation;
Feature | Default location | Comment |
---|---|---|
Django REST Framework | http://localhost/api | Only available in development mode (i.e. DEBUG=True in env/django_app.env ) |
Django Administration Panel | http://localhost/api/admin | Credentials created with pipenv run ./manage.py createsuperuser |
Location from project root | Contents |
---|---|
backend/ | Django Project & Backend Config |
backend/api/ | Django App for REST api |
data/ | Git-ignored: DB + backups |
deploy/ | Scripts and configuration files |
dist/ | Git-ignored: back+front generated files |
env/ | Environment Files |
public/ | Static Assets |
src/ | Vue App |
To run it once:
# docker-compose up db # if db container is not running
docker-compose exec db pg_dump -U eyetagger_admin eyetagger | \
gzip > eyetagger_bkp_$(date +"%Y_%m_%d_%I_%M_%p").sql.gz
Check backups.sh for a simple automated version.
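The timestamp in the file name above uses `%I` (12-hour clock) together with `%p` (AM/PM). As a small illustration of what that produces, here is the same format string in Python, next to a hypothetical 24-hour variant that is not used by the project but sorts chronologically by name.

```python
from datetime import datetime


def backup_filename(now: datetime) -> str:
    # Same format string as the `date` call above: %I is the 12-hour clock, %p is AM/PM.
    return f"eyetagger_bkp_{now.strftime('%Y_%m_%d_%I_%M_%p')}.sql.gz"


def backup_filename_24h(now: datetime) -> str:
    # Hypothetical variant: %H is the 24-hour clock, so names sort in time order.
    return f"eyetagger_bkp_{now.strftime('%Y_%m_%d_%H_%M')}.sql.gz"


morning = datetime(2024, 1, 5, 9, 0)
evening = datetime(2024, 1, 5, 18, 30)
print(backup_filename(morning))
print(backup_filename(evening))
print(backup_filename_24h(evening))
```

With the 12-hour format, a 6:30 PM backup sorts before a 9:00 AM backup from the same day, which is worth keeping in mind when picking "the latest" dump by file name.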
Tip: you can add the existing backups.sh to your crontab -e for periodic backups:
To run it every 6 hours:
0 */6 * * * /eyetagger/backups.sh >> /eyetagger/data/logs/backups.log 2>&1
Or every business day (Mon-Fri) at 6pm:
0 18 * * 1-5 /eyetagger/backups.sh >> /eyetagger/data/logs/backups.log 2>&1
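To double-check when the two schedules above fire, the five cron fields (minute, hour, day of month, month, day of week) can be evaluated against a given date. The sketch below is a simplified matcher written for illustration only. It supports `*`, `*/n`, ranges, comma lists and plain numbers, and it requires all five fields to match, which differs from real cron when both day fields are restricted.

```python
from datetime import datetime


def _field_matches(field: str, value: int, first: int = 0) -> bool:
    """Match one cron field; `first` is the lowest legal value of that field."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - first) % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0, while datetime.weekday() counts Monday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        _field_matches(minute, moment.minute)
        and _field_matches(hour, moment.hour)
        and _field_matches(day, moment.day, first=1)
        and _field_matches(month, moment.month, first=1)
        and _field_matches(weekday, cron_weekday)
    )


# 2024-01-05 is a Friday, 2024-01-06 a Saturday.
print(cron_matches("0 */6 * * *", datetime(2024, 1, 5, 12, 0)))
print(cron_matches("0 18 * * 1-5", datetime(2024, 1, 6, 18, 0)))
```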
# replace $YOUR_DUMP_GZ by your .gz location:
# let's copy the backup before moving/modifying it
cp $YOUR_DUMP_GZ /tmp/dump.sql.gz
# extract the dump
gunzip -k /tmp/dump.sql.gz
# copy to the running DB container
# docker-compose up db # if db container is not running
docker cp /tmp/dump.sql eyetagger_db_1:/dump.sql
# create a new empty database
docker-compose exec db createdb -U eyetagger_admin -T template0 eyetagger_new
# populate the empty database with the dump
docker-compose exec db psql -U eyetagger_admin -d eyetagger_new -f /dump.sql
# swap database names
docker-compose exec db psql --username eyetagger_admin --dbname postgres
\l
ALTER DATABASE eyetagger RENAME TO eyetagger_old;
ALTER DATABASE eyetagger_new RENAME TO eyetagger;
\l
\q
# get the other services up and try it out!
docker-compose down && docker-compose up
# if successful, clean the temporary backup copies
rm /tmp/dump.sql.gz /tmp/dump.sql
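Before going through the restore steps above, it may be worth checking that the `.gz` file is a readable plain-text dump. The sketch below relies on the banner comment that `pg_dump` writes at the top of plain-format dumps. The function name is hypothetical, and the check is a sanity test only, not a guarantee that the restore will succeed.

```python
import gzip


def looks_like_pg_dump(path: str, max_lines: int = 5) -> bool:
    """Return True if `path` is a readable gzip whose first lines carry the pg_dump banner."""
    try:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            head = [handle.readline() for _ in range(max_lines)]
    except (OSError, EOFError):
        # Missing file, not a gzip archive, or a truncated archive.
        return False
    return any("PostgreSQL database dump" in line for line in head)


# Example call (path as used in the restore steps):
# print(looks_like_pg_dump("/tmp/dump.sql.gz"))
```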
- There are 2 command entries under docker-compose.yaml > Service web. Select the "development" one by commenting out the alternative.
- Run docker-compose up (run down first if already up) and open localhost:9000. Hot reload should be enabled, i.e. live changes to the front-end code will update the browser.
- Adapt the environment files for the backend in env/.
- Adapt the environment file for the frontend in vue.config.js.
- Follow the Django deployment checklist for further configuration.
- Deploy the dockerized application in a remote server by running it in daemon form: docker-compose up -d && docker-compose logs -f.
- Install dvc on the host:
    pip install dvc
- Set up access (using GCP below):
    # get the provider-specific API
    pip install 'dvc[gs]'
    # create google bucket credentials
    mkdir -p $HOME/.gcp/
    GOOGLE_APPLICATION_CREDENTIALS=$HOME/.gcp/iris-admin.json
    # paste the contents of the GCP JSON in this file
    # see https://cloud.google.com/docs/authentication/getting-started
    nano $GOOGLE_APPLICATION_CREDENTIALS
    chmod 400 $GOOGLE_APPLICATION_CREDENTIALS
    export GOOGLE_APPLICATION_CREDENTIALS
    echo -e ' >> Add this to your ~/.bashrc:\n\nexport GOOGLE_APPLICATION_CREDENTIALS='$GOOGLE_APPLICATION_CREDENTIALS'\n\n'
- Then get your data from the remote:
    dvc pull
  Or add new data to the bucket:
    dvc add data/dataset && dvc push
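If `dvc pull` fails with an authentication error, the credentials file is a common place to look first. The following sketch checks the two things the setup steps above establish: the file contains valid JSON, and it is not readable by group or others (as after `chmod 400`). The function name is hypothetical, and the check says nothing about whether the key itself is accepted by GCP.

```python
import json
import os
import stat


def check_credentials(path: str) -> list:
    """Return a list of problems found with the credentials file; empty means OK."""
    problems = []
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except OSError as error:
        return [f"cannot read {path}: {error}"]
    except json.JSONDecodeError as error:
        problems.append(f"not valid JSON: {error}")
    # Keep only the permission bits and look for any group/other access.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append(f"permissions {oct(mode)} allow group/other access; expected 0o400")
    return problems


# Example call (assumes the variable was exported as described above):
# print(check_credentials(os.environ["GOOGLE_APPLICATION_CREDENTIALS"]))
```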
Eyetagger handles two types of data: the images, referred to as the dataset, and the metadata, stored in a relational database (DB) using PostgreSQL.
Metadata is necessary to keep track of the annotations, who did them, when, and any other data attribute that might be useful for the annotation workload. The dataset is usually a set of images to be displayed during the annotation process.
To serve a custom dataset, you first need to (A) run the app so that it creates a database (steps 1.1-1.4 above), and then (B) create the metadata entries for your dataset in PostgreSQL.
Below we describe how to do part B using a database migration:
- Create a migration.
The metadata entries are created by running one or more database migrations. Let's create an empty one with:
    # this assumes your containers are up, make sure to run docker-compose up first
    # below and onwards, "api" is the internal name of the Django app that we are working with
    docker-compose exec web pipenv run /app/manage.py makemigrations api --name dataset_import --empty
After this command you will have a new Python file in the migrations directory (e.g. backend/api/migrations/####_dataset_import.py).
- Call a new and customized migration script to ingest your dataset's metadata into the relational DB.
Change that created file to import your custom script as follows:
    from backend.api.manual_migration import import_dataset

    # down in the Migration class, paste the following:
    class Migration(migrations.Migration):
        # ...
        initial = True
        # import_dataset is the function that will be called when you run the migration
        # reverse_code is the function that will be called when you rollback the migration, using a "no-op" function below
        operations = [
            migrations.RunPython(import_dataset, reverse_code=migrations.RunPython.noop)
        ]
        # ...
- Customize this migration script and ORM models to match your dataset.
  - An example of a migration script can be found in backend/api/manual_migration.py - you can use this as a template for your own script.
  - All SQL code and database transactions are handled by Django's ORM, so you don't need to know SQL to populate the database.
  - The existing migration script loads a CSV file that contains metadata for each image. Because each dataset is unique, yours might have different attributes.
  - The import_dataset function in that script loads this CSV, creates all ORM objects (e.g. the img variable), and saves them to the database with img.save(). The other functions help with this process.
  - Change the Image model:
    - Modify backend/api/models.py to fit your needs.
    - Run /app/manage.py makemigrations - this compares models.py to the state recorded in the existing migration files; if they differ, it generates code that describes a new migration.
    - Run /app/manage.py migrate to "run" the necessary migrations, effectively updating the database. Django keeps track of the migrations that were run.
⚠️ The attributes of your Image model should be close to the columns in your CSV file. If you try to store an ORM object that deviates from the table schema, the database transaction will fail.
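As a rough illustration of the CSV-to-ORM flow described in this step, here is a minimal sketch of an `import_dataset`-style function. It is not the contents of backend/api/manual_migration.py. The column names (`filename`, `subject_id`), the helper names, and the default CSV path are all placeholders, and the sketch assumes the Image model has fields with the same names as the CSV columns.

```python
import csv

# Hypothetical column names; replace them with the columns of your own CSV
# and keep them aligned with the fields of the Image model.
EXPECTED_COLUMNS = ("filename", "subject_id")


def read_rows(csv_file):
    """Yield one dict per CSV row, failing early if an expected column is missing."""
    reader = csv.DictReader(csv_file)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    for row in reader:
        # Keep only the expected columns; extra CSV columns are ignored.
        yield {column: row[column] for column in EXPECTED_COLUMNS}


def import_rows(apps, rows):
    """Create and save one Image per row; return the number of saved objects."""
    # Inside a migration, models are looked up through the app registry
    # that Django passes to the RunPython function.
    Image = apps.get_model("api", "Image")
    count = 0
    for fields in rows:
        img = Image(**fields)
        img.save()
        count += 1
    return count


def import_dataset(apps, schema_editor, csv_path="data/dataset/metadata.csv"):
    # csv_path is a placeholder default, not a path taken from the project.
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        return import_rows(apps, read_rows(csv_file))
```

Checking the CSV header before creating any object surfaces a column mismatch as a clear error message instead of a failed database transaction halfway through the import.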
- Run the migrations.
Only the necessary (new) migrations will be run with the following command:
    docker-compose exec web pipenv run ./manage.py migrate
💡 After you create (and save) some entries like Image objects, you will be able to see them in the Django admin panel (see dashboards above).
- Troubleshooting: when a migration goes wrong.
Errors might happen if the migration script is not correct. If so, you can reverse it with:
    # change 0001 below
    docker-compose exec web pipenv run ./manage.py migrate api 0001
Where 0001 is the number of the previous migration (i.e. the number #### in backend/api/migrations/####_migration_name.py).
Another way is to reset them all: see scenario 2 in this guide, our "app name" is api.
A note about migrations that change schemas: if a migration modifies the database schema, make sure your rollback function also undoes those changes. For example, if migration N adds a new column to the Image model, and you roll back to N-1, this roll back function should also remove that column from the Image model. Otherwise, when you run N again, Django will try to create a column that already exists, which will fail. Because of this rollback complication, I chose to separate migrations that change the database schema (e.g. creating tables, modifying attributes) from migrations that populate the database with data (e.g. the one in manual_migration.py).
Above are the best ways to fix migration issues and avoid corruption or data loss. But if losing data is not an issue, you can also delete the database and start over, for example:
    # ⚠️ this will cause data loss
    docker-compose exec db dropdb -U eyetagger_admin eyetagger
    docker-compose exec db createdb -U eyetagger_admin eyetagger
    docker-compose exec web pipenv run /app/manage.py migrate
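To find the number to pass to `migrate api ####` when rolling back, you can look at the file names in backend/api/migrations/. The sketch below picks the migration right before a given one. The function name is hypothetical, and it assumes the default four-digit numbering that Django uses for migration files. When there is no earlier migration, it returns `zero`, the name Django accepts for unapplying every migration of an app.

```python
import re

# Matches Django's default naming, e.g. 0002_dataset_import.py
MIGRATION_NAME = re.compile(r"^(\d{4})_\w+\.py$")


def previous_migration(filenames, current):
    """Return the migration number right before `current`, or "zero" if there is none."""
    numbers = sorted(
        match.group(1)
        for match in (MIGRATION_NAME.match(name) for name in filenames)
        if match
    )
    # Zero-padded numbers of equal width compare correctly as strings.
    earlier = [number for number in numbers if number < current]
    return earlier[-1] if earlier else "zero"


files = ["__init__.py", "0001_initial.py", "0002_dataset_import.py"]
print(previous_migration(files, "0002"))
```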