
Package reference embeddings #1151

Open: wants to merge 2 commits into base branch initial-discord-bot


breadchris (Contributor)

Add package reference embeddings to the database and generate them from the CLI. This will let us run semantic search over package READMEs.

github-actions bot (Contributor) commented Mar 8, 2023

Hasura Semantic Diff

Hasura config files have changed. This comment shows which fields have changed, ignoring formatting.

(root level)
+ two map entries added:
  table:
    name: content_embedding
    schema: package
  object_relationships:
  - name: reference_content
    using:
      foreign_key_constraint_on: reference_content_id


array_relationships
  + one list entry added:
    - name: reference_contents
      using:
        foreign_key_constraint_on:
          column: package_id
          table:
            name: reference_content
            schema: package


(root level)
+ three map entries added:
  table:
    name: reference_content
    schema: package
  object_relationships:
  - name: package
    using:
      foreign_key_constraint_on: package_id
  array_relationships:
  - name: content_embeddings
    using:
      foreign_key_constraint_on:
        column: reference_content_id
        table:
          name: content_embedding
          schema: package
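The relationships above are plain foreign-key links: `package.reference_contents` and `reference_content.content_embeddings` are one-to-many, and the two object relationships point back the other way. As a rough picture of the nested shape a GraphQL query can return through them (not of how Hasura actually executes such a query), the sketch below groups hypothetical flat rows by the same foreign-key columns. The row values and the `nest` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical flat rows; short string ids stand in for real UUIDs.
packages = [{"id": "p1"}]
reference_contents = [
    {"id": "r1", "package_id": "p1", "url": "https://example.com/README.md"},
]
content_embeddings = [
    {"id": "e1", "reference_content_id": "r1", "content": "chunk one"},
    {"id": "e2", "reference_content_id": "r1", "content": "chunk two"},
]


def group_by(rows, key):
    """Index rows by a foreign-key column (the 'many' side of one-to-many)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


def nest(packages, reference_contents, content_embeddings):
    """Build package -> reference_contents -> content_embeddings, following the
    same foreign keys the Hasura array relationships are declared on."""
    embeddings_by_ref = group_by(content_embeddings, "reference_content_id")
    refs_by_package = group_by(reference_contents, "package_id")
    return [
        {
            **pkg,
            "reference_contents": [
                # dict.get does not trigger defaultdict's factory, so a missing
                # key yields the explicit [] default without mutating the index.
                {**ref, "content_embeddings": embeddings_by_ref.get(ref["id"], [])}
                for ref in refs_by_package.get(pkg["id"], [])
            ],
        }
        for pkg in packages
    ]
```

A package with no fetched references simply gets an empty `reference_contents` list, which matches how an array relationship with no matching rows is typically returned.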

diff --git a/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/down.sql b/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/down.sql
new file mode 100644
index 00000000..504ab7e0
--- /dev/null
+++ b/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/down.sql
@@ -0,0 +1,2 @@
+DROP TABLE "package"."content_embedding";
+DROP TABLE "package"."reference_content";
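`down.sql` drops `content_embedding` before `reference_content` because `content_embedding.reference_content_id` references `reference_content`; dropping the referenced table first would fail in Postgres unless `CASCADE` were added. A small sketch, assuming only the two foreign keys declared in `up.sql`, derives that order by reversing a topological sort with the standard-library `graphlib` (Python 3.9+). The table map is hand-written for illustration, not read from the database.

```python
from graphlib import TopologicalSorter

# Each table maps to the tables its foreign keys reference (taken from up.sql).
FOREIGN_KEYS = {
    "package.reference_content": {"package.package"},
    "package.content_embedding": {"package.reference_content"},
}

# Tables this migration creates; package.package already existed before it.
CREATED = {"package.reference_content", "package.content_embedding"}


def create_order(fks):
    """Referenced tables come before the tables that reference them."""
    return list(TopologicalSorter(fks).static_order())


def drop_order(fks, tables):
    """Reverse of the create order, limited to the tables the migration owns."""
    return [t for t in reversed(create_order(fks)) if t in tables]
```

Here `drop_order(FOREIGN_KEYS, CREATED)` gives `content_embedding` then `reference_content`, the same order `down.sql` uses.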
diff --git a/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/up.sql b/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/up.sql
new file mode 100644
index 00000000..48c39ca7
--- /dev/null
+++ b/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/up.sql
@@ -0,0 +1,25 @@
+CREATE TABLE "package"."reference_content" (
+    "id" uuid NOT NULL DEFAULT gen_random_uuid(),
+    "package_id" uuid NOT NULL REFERENCES "package"."package"("id") ON UPDATE cascade ON DELETE cascade,
+    "url" text NOT NULL,
+    "content" text NOT NULL,
+    "normalized_content" text NOT NULL,
+    "content_type" text NOT NULL,
+    "last_successful_fetch" timestamptz DEFAULT NULL,
+    PRIMARY KEY ("id"),
+    UNIQUE ("package_id", "url")
+);
+
+CREATE TABLE "package"."content_embedding" (
+    "id" uuid NOT NULL DEFAULT gen_random_uuid(),
+    "content_hash" text NOT NULL,
+    "reference_content_id" uuid NOT NULL REFERENCES "package"."reference_content"("id") ON UPDATE cascade ON DELETE cascade,
+    "content" text NOT NULL,
+    "embedding" vector (1536) NOT NULL,
+    PRIMARY KEY ("id"),
+    UNIQUE ("content_hash")
+);
+
+CREATE INDEX ON "package"."content_embedding"
+    USING ivfflat (embedding vector_cosine_ops)
+    WITH (lists = 100);
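The two tables and the ivfflat index imply a write path (chunk the normalized README, skip chunks whose hash is already stored, embed the rest) and a read path (rank chunks by cosine distance). The PR does not show the CLI code, so the sketch below is an in-memory approximation under stated assumptions: the word-window chunker, SHA-256 as the hash, and the caller-supplied `embed` function are all hypothetical. The 1536 default is simply the width declared in `vector (1536)`; it matches, for example, OpenAI's text-embedding-ada-002, though the PR does not name a model.

```python
import hashlib
import math
from typing import Callable, Dict, List, Sequence, Tuple

EMBEDDING_DIM = 1536  # matches "embedding" vector (1536) in up.sql


def chunk(text: str, max_words: int = 200) -> List[str]:
    """Hypothetical chunker: fixed-size windows of whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def content_hash(text: str) -> str:
    """Hypothetical hash choice: hex SHA-256 of the UTF-8 chunk text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator computes.
    Assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class EmbeddingStore:
    """In-memory stand-in for package.content_embedding, keyed by content_hash."""

    def __init__(self, embed: Callable[[str], List[float]], dim: int = EMBEDDING_DIM):
        self.embed = embed
        self.dim = dim
        self.rows: Dict[str, dict] = {}

    def add(self, reference_content_id: str, normalized_content: str, max_words: int = 200) -> int:
        """Embed each chunk not seen before; returns how many rows were inserted."""
        inserted = 0
        for piece in chunk(normalized_content, max_words):
            h = content_hash(piece)
            if h in self.rows:
                continue  # UNIQUE ("content_hash") is table-wide, so repeats are skipped
            vector = self.embed(piece)
            if len(vector) != self.dim:
                raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
            self.rows[h] = {
                "reference_content_id": reference_content_id,
                "content": piece,
                "embedding": vector,
            }
            inserted += 1
        return inserted

    def search(self, query_vector: Sequence[float], k: int = 5) -> List[Tuple[float, str]]:
        """Exact nearest neighbours by cosine distance, smallest distance first."""
        scored = [
            (cosine_distance(query_vector, row["embedding"]), row["content"])
            for row in self.rows.values()
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]
```

Because `UNIQUE ("content_hash")` spans the whole table rather than each `reference_content_id`, an identical chunk appearing in two READMEs is stored once and linked only to the first one, which is what `add` mirrors here. Against the real table, the read path would presumably be an `ORDER BY embedding <=> $1 LIMIT k` query, which lets Postgres use the ivfflat index. That index is approximate: it partitions vectors into `lists` clusters and, by default, probes only one of them (tunable via `ivfflat.probes`), so its results can differ from the exact scan above. pgvector's README also recommends building ivfflat indexes after the table holds data, since cluster centers are computed from existing rows, and suggests roughly rows / 1000 lists for tables up to 1M rows.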

breadchris force-pushed the package-reference-embeddings branch from 2a1ee91 to de48522 on March 9, 2023 14:25