Pipeline from object storage to OCI Embedding Service to Oracle Database Vector Storage

anders-swanson/oracle-database-embedding-flow

oracle-database-embedding-flow

This code sample illustrates how to:

  1. Stream documents from OCI object storage, see OCIDocumentLoader
  2. Split those documents into chunks, see Splitter
  3. Embed document chunks using OCI GenAI Embeddings, see OCIEmbeddingModel
  4. Store the resulting embeddings as vectors in Oracle Database, see OracleVectorStore
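To make the chunking step concrete, here is a minimal sketch of a fixed-size splitter with overlap. It is not the repository's Splitter, whose strategy is not described here; the class name `ChunkSplitterSketch` and the `chunkSize` and `overlap` parameters are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a splitter; the sample's Splitter may work differently.
class ChunkSplitterSketch {
    // Splits text into chunks of at most chunkSize characters.
    // Consecutive chunks share `overlap` characters so context is not lost at boundaries.
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // prints [abcd, defg, ghij]
        System.out.println(split("abcdefghij", 4, 1));
    }
}
```

Overlapping chunks trade a little extra embedding work for better recall when a relevant passage straddles a chunk boundary.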

An example workflow (EmbeddingWorkflowIT) ties these steps together. A snippet is shown below:

// Stream documents from OCI object storage.
documentLoader.streamDocuments(BUCKET_NAME, OBJECT_PREFIX)
    // Split each object storage document into chunks.
    .map(splitter::split)
    // Embed each chunk list using OCI GenAI service.
    .map(embeddingModel::embedAll)
    // Store embeddings in Oracle Database 23ai.
    .forEach(vectorStore::addAll);
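The shape of this pipeline can be tried without any OCI or database dependencies. The sketch below swaps in in-memory stand-ins: a sentence-based `split`, a toy `embedAll` that returns a one-element vector holding the chunk length, and a plain list as the "vector store". All of these names and behaviors are assumptions for illustration, not the sample's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical in-memory imitation of the workflow; no OCI or database calls are made.
class EmbeddingFlowSketch {
    // Stand-in splitter: one chunk per sentence.
    static List<String> split(String document) {
        return Arrays.stream(document.split("\\.\\s*"))
            .filter(s -> !s.isBlank())
            .collect(Collectors.toList());
    }

    // Stand-in embedding model: a toy one-dimensional "vector" per chunk.
    static List<float[]> embedAll(List<String> chunks) {
        return chunks.stream()
            .map(c -> new float[] {c.length()})
            .collect(Collectors.toList());
    }

    // Same map/map/forEach shape as the snippet above, collecting into a list.
    static List<float[]> run(Stream<String> documents) {
        List<float[]> store = new ArrayList<>();
        documents
            .map(EmbeddingFlowSketch::split)
            .map(EmbeddingFlowSketch::embedAll)
            .forEach(store::addAll);
        return store;
    }

    public static void main(String[] args) {
        List<float[]> stored = run(Stream.of("First sentence. Second sentence.", "Third."));
        System.out.println(stored.size()); // prints 3
    }
}
```

Because each stage maps one document to one list, documents flow through the stream one at a time rather than being loaded all at once.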

Run the test

The sample test loads documents from an object storage bucket named "mybucket" using the object prefix "documents". These documents are then embedded using the OCI GenAI service, and finally stored in a local Oracle Database container.

# Set your OCI compartment and namespace before running the test
export OCI_COMPARTMENT="my compartment OCID"
export OCI_NAMESPACE="my oci namespace"
mvn integration-test
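If either variable is unset, a fail-fast check gives a clearer error than a failed service call. The following is a minimal sketch of such a check; the `OciEnvCheck` class and `require` helper are hypothetical, and the sample may read or validate these variables differently.

```java
import java.util.Map;

// Hypothetical helper; not part of the sample.
class OciEnvCheck {
    // Returns the variable's value, or throws if it is missing or blank.
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        String compartment = require(env, "OCI_COMPARTMENT");
        String namespace = require(env, "OCI_NAMESPACE");
        System.out.println("Using namespace " + namespace + " in compartment " + compartment);
    }
}
```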

The test should take about 30-40 seconds to run. It asserts that vectors have been successfully added to the database:

[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.75 s -- in com.example.EmbedddingWorkflowIT
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  36.016 s
