`es-indexer` - Elasticsearch Indexer

`es-indexer` ingests GeoNetwork4 metadata records into an Elasticsearch index. The index schema follows the STAC schema, with some customisations.

Although GeoNetwork4 comes with its own default Elasticsearch index (gn_records), the OGC APIs use the index created by es-indexer to retrieve data for the new AODN portal.

Development

This application is built with Spring Boot 3 and Java 17.

The following environment variables are required to run the es-indexer:

# Clients calling the Indexer API must provide this token in the Authorization header. For the edge
# environment, this value is set under environment_variables in
# [appdeploy](https://github.com/aodn/appdeploy/blob/main/tg/edge/es-indexer/ecs/variables.yaml).

APP_HTTP_AUTH_TOKEN=sampletoken

SERVER_PORT=8080

ELASTICSEARCH_INDEX_NAME=sampleindex
ELASTICSEARCH_SERVERURL=http://localhost:9200
ELASTICSEARCH_APIKEY=sampleapikey

GEONETWORK_HOST=http://localhost:8080
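It can help to fail fast when one of these variables is missing before starting the application. The following is a minimal sketch and is not part of the es-indexer codebase. The class `RequiredEnv` is hypothetical, and it only checks that each variable listed above is set and non-blank. Spring Boot itself reads these values through relaxed binding (for example, `SERVER_PORT` populates `server.port`).

```java
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check: lists required es-indexer variables that are unset or blank. */
final class RequiredEnv {
    // Variable names taken from the README's list of required environment variables.
    static final List<String> REQUIRED = List.of(
            "APP_HTTP_AUTH_TOKEN",
            "SERVER_PORT",
            "ELASTICSEARCH_INDEX_NAME",
            "ELASTICSEARCH_SERVERURL",
            "ELASTICSEARCH_APIKEY",
            "GEONETWORK_HOST");

    /** Returns the required names that are absent or blank in the given environment map. */
    static List<String> missing(Map<String, String> env) {
        return REQUIRED.stream()
                .filter(name -> env.get(name) == null || env.get(name).isBlank())
                .toList();
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (!missing.isEmpty()) {
            System.err.println("Missing environment variables: " + missing);
            System.exit(1);
        }
        System.out.println("All required environment variables are set.");
    }
}
```

Passing the environment as a `Map` keeps the check testable without touching the real process environment.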

Maven build

$ mvn clean install # [-DskipTests]

If you omit -DskipTests, the automated tests run: they start a GeoNetwork instance in Docker, load the sample data, and then run the indexer against it. This effectively serves as an integration test.

This project contains 3 submodules:

geonetwork - Compiles the JAXB library used to handle the ISO 19115 XML returned by GeoNetwork.
stacmodel - Java classes that produce the STAC JSON stored in Elasticsearch. If an application needs to read STAC documents from Elasticsearch, use this library.
indexer - The main application, which performs the transformation.

Docker

Start a local instance of the indexer:

$ docker-compose -f docker-compose-dev.yaml up # [-d: run in detached (background) mode | --build: rebuild images before starting]

Endpoints:

| Description | Endpoint | Environment | Parameters |
|---|---|---|---|
| Log file | `/manage/logfile` | Edge | |
| Beans info | `/manage/beans` | Edge | |
| Env info | `/manage/env` | Edge | |
| Info (shows version) | `/manage/info` | Edge | |
| Health check | `/manage/health` | Edge | |
| POST/GET/DELETE index metadata for a specific record | `/api/v1/indexer/index/{uuid}` | All | `withCO`: if `true`, indexes cloud-optimized data before indexing the metadata |
| POST index cloud-optimized data for a specific record | `/api/v1/indexer/index/{uuid}/cloud` | All | |
| Bulk index | `/api/v1/indexer/index/all` | All | |
| Async bulk index of metadata on all records | `/api/v1/indexer/index/async/all` | All | |
| POST async index of cloud-optimized data on all records | `/api/v1/indexer/index/async/all-cloud` | All | |
| Swagger UI | `/swagger-ui/index.html` | All | |
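The endpoints can be called from any HTTP client. The class below is a sketch using the JDK's built-in `java.net.http.HttpClient`. It sends a POST to `/api/v1/indexer/index/{uuid}` with the `withCO` parameter.

The sketch relies on several assumptions that this README does not confirm:
- The indexer listens on `http://localhost:8080`, matching `SERVER_PORT` above.
- The `APP_HTTP_AUTH_TOKEN` value is sent as the raw value of the `Authorization` header, with no scheme prefix.
- The record UUID is passed as the first command-line argument.

The SSE notes below mention an `X-API-Key` header instead, so confirm which header your deployment expects.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: ask the indexer to (re)index one metadata record. */
final class IndexOneRecord {
    // ASSUMPTION: header name and raw-token format; confirm with the team.
    static final String AUTH_HEADER = "Authorization";

    /** Builds POST {baseUrl}/api/v1/indexer/index/{uuid}?withCO={withCO}. */
    static HttpRequest buildRequest(String baseUrl, String uuid, boolean withCO, String token) {
        URI uri = URI.create(baseUrl + "/api/v1/indexer/index/" + uuid + "?withCO=" + withCO);
        return HttpRequest.newBuilder(uri)
                .header(AUTH_HEADER, token)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: IndexOneRecord <record-uuid>");
            return;
        }
        // ASSUMPTION: local indexer on port 8080; token read from APP_HTTP_AUTH_TOKEN.
        String token = System.getenv().getOrDefault("APP_HTTP_AUTH_TOKEN", "sampletoken");
        HttpRequest request = buildRequest("http://localhost:8080", args[0], false, token);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Passing `true` as the third argument of `buildRequest` corresponds to `withCO=true`. Per the endpoint table, this indexes cloud-optimized data before the metadata.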

The `async/all` endpoints use SSE (Server-Sent Events) to avoid gateway timeouts. Use Postman 10.2 or later (earlier versions have a bug with SSE), or preferably the web-based Postman. Once you issue the call, events should arrive in the response body at regular intervals.

The request should use the following headers and method:

X-API-Key (check with the dev team for the value)

Accept = text/event-stream

Content-Type = text/event-stream;charset=utf-8

Method = POST
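Outside Postman, the same streaming call can be made from Java. The class name `AsyncIndexAll` and the `INDEXER_API_KEY` variable in this sketch are hypothetical. It sends the headers and method listed above to `/api/v1/indexer/index/async/all` and prints the payload of each `data:` line as it arrives. It assumes the server emits standard SSE `data:` lines; the exact event format produced by the indexer is not documented here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.stream.Stream;

/** Sketch: trigger the async bulk index and print each SSE "data:" payload. */
final class AsyncIndexAll {
    /** Returns the value of an SSE "data:" line, dropping one leading space as the SSE format specifies. */
    static Optional<String> dataOf(String line) {
        if (!line.startsWith("data:")) {
            return Optional.empty(); // "event:", "id:", ":" comments, blank separators, etc.
        }
        String value = line.substring("data:".length());
        return Optional.of(value.startsWith(" ") ? value.substring(1) : value);
    }

    /** Builds the POST request with the headers listed in the README. */
    static HttpRequest buildRequest(String baseUrl, String apiKey) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/indexer/index/async/all"))
                .header("X-API-Key", apiKey)
                .header("Accept", "text/event-stream")
                .header("Content-Type", "text/event-stream;charset=utf-8")
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        // ASSUMPTION: local indexer on port 8080; key read from a hypothetical INDEXER_API_KEY variable.
        String apiKey = System.getenv().getOrDefault("INDEXER_API_KEY", "");
        HttpResponse<Stream<String>> response = HttpClient.newHttpClient()
                .send(buildRequest("http://localhost:8080", apiKey), HttpResponse.BodyHandlers.ofLines());
        // ofLines() exposes the body as a lazy stream, so events print while the server is still sending.
        try (Stream<String> lines = response.body()) {
            lines.map(AsyncIndexAll::dataOf)
                 .flatMap(Optional::stream)
                 .forEach(System.out::println);
        }
    }
}
```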

Notes

Centroid Calculation

The centroid is not calculated here. Instead, the indexer creates a spatial extent with land areas removed and stores it in geometry_noland. The centroid point is calculated in the OGC API; refer to the README in ogcapi for details.

ARDC Vocabulary

When the indexer starts, it fetches vocabularies from ARDC (see the code under ardcvocabs). The API URL always points to the "current" version, which Nat maintains manually. For each vocabulary, the system calls two separate APIs: one targets the root level and the other targets all nodes. To avoid unnecessary calls, the indexer checks whether the "current" version differs from the version saved in Elasticsearch. If the versions are the same, it skips the download.
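The version check described above reduces to a simple rule, sketched below with hypothetical names (`VocabRefresh`, `needsDownload`, `refreshIfChanged`). The two `Runnable` arguments stand in for the root-level and all-nodes ARDC calls. The saved version is modelled as an `Optional` to cover the first run, when nothing has been stored in Elasticsearch yet. See the code under ardcvocabs for the actual implementation.

```java
import java.util.Optional;

/** Sketch of the ARDC "skip if unchanged" rule; all names are hypothetical. */
final class VocabRefresh {
    /** True when nothing is saved yet or the saved version differs from ARDC's "current" version. */
    static boolean needsDownload(String currentVersion, Optional<String> savedVersion) {
        return savedVersion.map(saved -> !saved.equals(currentVersion)).orElse(true);
    }

    /**
     * Runs both ARDC downloads (root level, then all nodes) only when needed.
     * Returns true if the downloads were performed.
     */
    static boolean refreshIfChanged(String currentVersion, Optional<String> savedVersion,
                                    Runnable downloadRootLevel, Runnable downloadAllNodes) {
        if (!needsDownload(currentVersion, savedVersion)) {
            return false; // same version already in Elasticsearch: skip both API calls
        }
        downloadRootLevel.run();
        downloadAllNodes.run();
        return true;
    }

    public static void main(String[] args) {
        // Made-up version labels for illustration only.
        boolean downloaded = refreshIfChanged("version-2", Optional.of("version-1"),
                () -> System.out.println("fetching root level"),
                () -> System.out.println("fetching all nodes"));
        System.out.println("downloaded: " + downloaded);
    }
}
```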

The gcmd-mapping.csv file maps GCMD keywords to AODN vocabularies, so that datasets tagged with GCMD keywords are searchable using AODN keywords. Nat creates the mapping manually, and it is currently stored in an Excel spreadsheet here.
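The sketch below illustrates how such a mapping lets GCMD-tagged datasets match AODN keywords: it parses a mapping file and expands a record's GCMD keywords. The column layout is an assumption: two columns (GCMD keyword, AODN keyword), a header row, and no quoted fields containing commas. The real gcmd-mapping.csv may differ, and `GcmdMapping` is a hypothetical class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch: GCMD-to-AODN keyword mapping. ASSUMED layout: a header row, then
 * "gcmd keyword,aodn keyword" per line, with no quoted commas.
 */
final class GcmdMapping {
    /** Parses CSV lines into GCMD keyword -> list of AODN keywords. */
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> mapping = new HashMap<>();
        for (String line : lines.subList(Math.min(1, lines.size()), lines.size())) { // skip header
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // blank or malformed line
            }
            String gcmd = line.substring(0, comma).trim();
            String aodn = line.substring(comma + 1).trim();
            mapping.computeIfAbsent(gcmd, key -> new ArrayList<>()).add(aodn);
        }
        return mapping;
    }

    /** Expands GCMD keywords into distinct AODN keywords, keeping first-seen order. */
    static List<String> toAodn(List<String> gcmdKeywords, Map<String, List<String>> mapping) {
        Set<String> result = new LinkedHashSet<>();
        for (String keyword : gcmdKeywords) {
            result.addAll(mapping.getOrDefault(keyword, List.of()));
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) throws IOException {
        // ASSUMPTION: the file is in the working directory unless a path is given as the first argument.
        Path csv = Path.of(args.length > 0 ? args[0] : "gcmd-mapping.csv");
        Map<String, List<String>> mapping = parse(Files.readAllLines(csv));
        System.out.println(mapping.size() + " GCMD keywords mapped");
    }
}
```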

Vocabularies are assigned to metadata records manually and form part of the search suggestions: users can type vocabulary terms in the search box and select from known keywords. Although the vocabularies have multiple levels, only levels 1 and 2 are used so far.
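The following toy, in-memory sketch illustrates restricting suggestions to the first two vocabulary levels. It keeps level 1 and level 2 terms from a vocabulary tree and matches them against what a user types, using a case-insensitive prefix. The `Term` record, the class name and the prefix-matching rule are all assumptions; this README does not describe how the suggestions are actually served.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy illustration: keep vocabulary terms from levels 1 and 2 and match them by prefix. */
final class VocabSuggest {
    /** Hypothetical tree node: a vocabulary term and its narrower terms. */
    record Term(String label, List<Term> narrower) {}

    /** Collects labels at level 1 (the given roots) and level 2 (their direct children). */
    static List<String> levelOneAndTwo(List<Term> roots) {
        List<String> labels = new ArrayList<>();
        for (Term root : roots) {
            labels.add(root.label());
            for (Term child : root.narrower()) {
                labels.add(child.label()); // grandchildren (level 3+) are deliberately skipped
            }
        }
        return labels;
    }

    /** Case-insensitive prefix match over the collected labels. */
    static List<String> suggest(String typed, List<String> labels) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        return labels.stream()
                .filter(label -> label.toLowerCase(Locale.ROOT).startsWith(prefix))
                .toList();
    }

    public static void main(String[] args) {
        // Made-up terms; the level-3 term is never offered as a suggestion.
        Term tree = new Term("Physical", List.of(
                new Term("Temperature", List.of(new Term("Temperature at depth", List.of()))),
                new Term("Salinity", List.of())));
        System.out.println(suggest("te", levelOneAndTwo(List.of(tree))));
    }
}
```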
