This code sample illustrates how to:
- Stream documents from OCI object storage, see OCIDocumentLoader
- Split those documents into chunks, see Splitter
- Embed document chunks using OCI GenAI Embeddings, see OCIEmbeddingModel
- Store the resulting embeddings as vectors in Oracle Database, see OracleVectorStore
An example workflow (EmbeddingWorkflowIT) ties these steps together. A snippet is shown below:
// Stream documents from OCI object storage.
documentLoader.streamDocuments(BUCKET_NAME, OBJECT_PREFIX)
// Split each object storage document into chunks.
.map(splitter::split)
// Embed each chunk list using OCI GenAI service.
.map(embeddingModel::embedAll)
// Store embeddings in Oracle Database 23ai.
.forEach(vectorStore::addAll);
The sample test loads documents from an object storage bucket named "mybucket" using the object prefix "documents". The documents are split into chunks, embedded using the OCI GenAI service, and stored in a local Oracle Database container.
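To make the shape of the pipeline easier to follow without OCI credentials or a database, here is a minimal, self-contained sketch of the same stream-split-embed-store flow. Everything in it is a hypothetical stand-in rather than the sample's actual classes: `EmbeddingPipelineSketch`, `StoredVector`, the fixed-size `split`, the placeholder `embedAll` (which does not call any model), and the in-memory list that plays the role of the vector store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, in-memory illustration of the workflow shape.
// None of these types are the sample's real OCI or Oracle Database classes.
class EmbeddingPipelineSketch {

    // Hypothetical stand-in for a stored embedding: chunk text plus its vector.
    static final class StoredVector {
        final String text;
        final float[] vector;

        StoredVector(String text, float[] vector) {
            this.text = text;
            this.vector = vector;
        }
    }

    // Assumed splitting strategy: fixed-size character chunks, no overlap.
    static List<String> split(String document, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < document.length(); start += chunkSize) {
            int end = Math.min(document.length(), start + chunkSize);
            chunks.add(document.substring(start, end));
        }
        return chunks;
    }

    // Placeholder "embedding": a one-dimensional vector holding the chunk length.
    // A real embedding model would return a dense vector from a remote service.
    static List<StoredVector> embedAll(List<String> chunks) {
        return chunks.stream()
                .map(chunk -> new StoredVector(chunk, new float[] {chunk.length()}))
                .collect(Collectors.toList());
    }

    // Mirrors the map/map/forEach structure of the workflow snippet,
    // collecting into a list instead of writing to a database.
    static List<StoredVector> run(Stream<String> documents, int chunkSize) {
        List<StoredVector> store = new ArrayList<>();
        documents
                .map(document -> split(document, chunkSize))
                .map(EmbeddingPipelineSketch::embedAll)
                .forEach(store::addAll);
        return store;
    }
}
```

Calling `EmbeddingPipelineSketch.run(Stream.of("abcdefghij", "xyz"), 4)` yields four stored entries, one per chunk. Because each stage handles one document at a time, only one document's chunks need to be held in memory at any point, which is the main reason to prefer a stream over loading everything up front.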
# Set your OCI compartment and namespace before running the test
export OCI_COMPARTMENT="my compartment OCID"
export OCI_NAMESPACE="my oci namespace"
mvn integration-test
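How the test consumes these two variables is not shown here. One plausible approach, sketched below under that assumption, is to read them once and fail fast with a clear message when either is missing. The class `OciTestConfig` and its methods are hypothetical and are not part of the sample.

```java
import java.util.Map;

// Hypothetical helper: validates the environment variables named in the
// commands above. The sample's real test may read them differently.
class OciTestConfig {
    final String compartmentId;
    final String namespace;

    private OciTestConfig(String compartmentId, String namespace) {
        this.compartmentId = compartmentId;
        this.namespace = namespace;
    }

    // Takes the environment as a map so it can be exercised without
    // changing the real process environment.
    static OciTestConfig fromEnvironment(Map<String, String> env) {
        return new OciTestConfig(
                require(env, "OCI_COMPARTMENT"),
                require(env, "OCI_NAMESPACE"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(
                    "Environment variable " + name + " must be set before running the test");
        }
        return value;
    }
}
```

In a test setup method this could be called as `OciTestConfig.fromEnvironment(System.getenv())`, so a missing variable surfaces as a readable error instead of a failed OCI request later in the run.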
The test should take about 30-40 seconds to run. It asserts that vectors have been successfully added to the database:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.75 s -- in com.example.EmbedddingWorkflowIT
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36.016 s