Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

New Components - vectorshift #16023

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

New Components - vectorshift #16023

wants to merge 11 commits into from

Conversation

luancazarine
Copy link
Collaborator

@luancazarine luancazarine commented Mar 25, 2025

Resolves #16018.

Summary by CodeRabbit

  • New Features
    • Added new actions to manage pipelines and enhance knowledge base operations, including the ability to add data, create pipelines, and execute pipelines.
    • Introduced additional event sources for chatbots, knowledge bases, and pipelines, expanding workflow automation.
    • Enhanced API integrations for listing and managing pipelines and knowledge bases.
    • Improved configuration options with dynamic controls based on selected data types.
    • Introduced new constants for data type and rescrape frequency options.
    • Updated the application version with corresponding dependency improvements.
    • Added structured error handling and metadata for new actions and modules.

@luancazarine luancazarine added the ai-assisted Content generated by AI, with human refinement and modification label Mar 25, 2025
Copy link

vercel bot commented Mar 25, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
docs-v2 ⬜️ Ignored (Inspect) Visit Preview Apr 2, 2025 3:22pm
pipedream-docs ⬜️ Ignored (Inspect) Apr 2, 2025 3:22pm
pipedream-docs-redirect-do-not-edit ⬜️ Ignored (Inspect) Apr 2, 2025 3:22pm

Copy link
Contributor

coderabbitai bot commented Mar 25, 2025

Walkthrough

This pull request introduces several new modules and enhancements for the Vectorshift component. New action modules have been added to add data to a knowledge base, create pipelines, and execute pipelines. Additionally, new source modules have been created to emit events for chatbots, knowledge bases, and pipelines, along with a shared event base. Common constants, utility functions, and additional API methods have been integrated into the main Vectorshift application. The package version has been updated, and new dependencies have been included.

Changes

File(s) Changed Change Summary
components/vectorshift/actions/add-data-to-knowledge-base/add-data-to-knowledge-base.mjs
components/vectorshift/actions/create-pipeline/create-pipeline.mjs
components/vectorshift/actions/run-pipeline/run-pipeline.mjs
Added new action modules for adding data to a knowledge base, creating pipelines, and executing pipelines with defined properties, dynamic configuration, and error handling.
components/vectorshift/common/constants.mjs
components/vectorshift/common/utils.mjs
Introduced constants for data type and frequency options and utility functions (checkTmp, parseObject).
components/vectorshift/package.json Updated version from 0.0.1 to 0.1.0 and added dependencies for @pipedream/platform and fs.
components/vectorshift/sources/common/base.mjs Added a common base module for event sources with properties, methods, and hooks for event emission.
components/vectorshift/sources/new-chatbot/*
components/vectorshift/sources/new-knowledge-base/*
components/vectorshift/sources/new-pipeline/*
Introduced new event source modules (chatbot, knowledge base, pipeline) with unique deduplication, custom methods (getFunction, getSummary), and sample event emitters.
components/vectorshift/vectorshift.app.mjs Extended the Vectorshift app with new prop definitions (knowledgeBaseId, pipelineId) and comprehensive API methods (listPipelines, listKnowledgeBases, createPipeline, executePipeline, addDataToKnowledgeBase).

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant CP as Create-Pipeline Action
    participant VS as Vectorshift App
    participant API as API Server

    U->>CP: Trigger pipeline creation
    CP->>VS: Call createPipeline(config)
    VS->>API: _makeRequest({ path: "/pipelines/create", ... })
    API-->>VS: Pipeline ID response
    VS->>CP: Return pipeline summary
    CP-->>U: Output success summary
Loading
sequenceDiagram
    participant D as Deploy Hook
    participant CB as Common Base Module
    participant NS as New Knowledge Base Source
    participant API as API Server

    D->>CB: Invoke deploy hook
    CB->>NS: Call getFunction()
    NS->>API: Request list of knowledge bases
    API-->>NS: Return knowledge base list
    NS->>CB: Return list
    CB->>D: Emit events for each knowledge base
Loading

Assessment against linked issues

Objective Addressed Explanation
Polling Sources: Emit events for new pipeline, knowledge base, and chatbot ([#16018])
Actions: Implement create-pipeline, run-pipeline, and add-data-to-knowledge-base functionalities ([#16018])

Suggested reviewers

  • michelle0927

Poem

I'm a hopping rabbit full of cheer,
Codes and pipelines now appear,
I add data and run them right,
With every event, my heart takes flight,
In grids and flows our changes shine,
Happy hops and code divine!
🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

components/vectorshift/actions/add-data-to-knowledge-base/add-data-to-knowledge-base.mjs

Oops! Something went wrong! :(

ESLint: 8.57.1

Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs
at packageResolve (node:internal/modules/esm/resolve:839:9)
at moduleResolve (node:internal/modules/esm/resolve:908:18)
at defaultResolve (node:internal/modules/esm/resolve:1038:11)
at ModuleLoader.defaultResolve (node:internal/modules/esm/loader:557:12)
at ModuleLoader.resolve (node:internal/modules/esm/loader:525:25)
at ModuleLoader.getModuleJob (node:internal/modules/esm/loader:246:38)
at ModuleJob._link (node:internal/modules/esm/module_job:126:49)


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a564ff and 4c0b3ab.

📒 Files selected for processing (1)
  • components/vectorshift/actions/add-data-to-knowledge-base/add-data-to-knowledge-base.mjs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/vectorshift/actions/add-data-to-knowledge-base/add-data-to-knowledge-base.mjs
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Verify TypeScript components
  • GitHub Check: pnpm publish
  • GitHub Check: Publish TypeScript components

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Sources
 - New Pipeline
 - New Knowledge Base
 - New Chatbot

Actions
 - Create Pipeline
 - Run Pipeline
 - Add Data To Knowledge Base
@luancazarine luancazarine marked this pull request as ready for review March 26, 2025 23:13
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (13)
components/vectorshift/actions/create-pipeline/create-pipeline.mjs (1)

30-47: Improve error handling for more robustness

The current error handling assumes the error message is always a valid JSON string, which might not be the case for all errors.

  } catch ({ message }) {
-    const parsedError = JSON.parse(message).error;
-    throw new ConfigurationError(parsedError);
+    try {
+      const parsedError = JSON.parse(message).error;
+      throw new ConfigurationError(parsedError);
+    } catch (e) {
+      // If error message isn't valid JSON, use the original message
+      throw new ConfigurationError(message);
+    }
  }
components/vectorshift/common/utils.mjs (2)

1-6: Add input validation to the checkTmp function

The function should handle edge cases like null, undefined, or empty string inputs.

export const checkTmp = (filename) => {
+  if (!filename) {
+    return "/tmp/undefined";
+  }
  if (!filename.startsWith("/tmp")) {
    return `/tmp/${filename}`;
  }
  return filename;
};

8-31: Enhance parseObject to handle nested objects

The current implementation handles arrays and top-level strings well, but doesn't recursively process nested objects within arrays or objects.

export const parseObject = (obj) => {
  if (!obj) return undefined;

  if (Array.isArray(obj)) {
    return obj.map((item) => {
-      if (typeof item === "string") {
-        try {
-          return JSON.parse(item);
-        } catch (e) {
-          return item;
-        }
-      }
-      return item;
+      return parseObject(item);
    });
  }
  if (typeof obj === "string") {
    try {
      return JSON.parse(obj);
    } catch (e) {
      return obj;
    }
  }
+  if (typeof obj === "object" && obj !== null) {
+    const result = {};
+    for (const key in obj) {
+      result[key] = parseObject(obj[key]);
+    }
+    return result;
+  }
  return obj;
};
components/vectorshift/sources/common/base.mjs (2)

15-32: Well-structured event emission logic

The emitEvent method follows good practices by:

  • Dynamically retrieving the function to call
  • Supporting optional limits for initial deployment
  • Correctly handling the emission of events with appropriate metadata

Consider adding JSDoc comments to document the expected return type of getFunction() and what the emitEvent method expects.


4-42: Verify error handling for API calls

The base module doesn't include error handling for API calls. If the call to fn() fails, it will throw an uncaught exception.

Consider adding try/catch blocks:

  async emitEvent(maxResults = false) {
    const fn = this.getFunction();
-   const { objects: response } = await fn();
+   try {
+     const { objects: response } = await fn();
+     
+     if (response.length) {
+       if (maxResults && (response.length > maxResults)) {
+         response.length = maxResults;
+       }
+     }
+
+     for (const item of response) {
+       this.$emit(item, {
+         id: item.id,
+         summary: this.getSummary(item),
+         ts: Date.parse(item.created || new Date()),
+       });
+     }
+   } catch (error) {
+     console.error("Error retrieving data:", error);
+     throw error;
+   }
-
-   if (response.length) {
-     if (maxResults && (response.length > maxResults)) {
-       response.length = maxResults;
-     }
-   }
-
-   for (const item of response) {
-     this.$emit(item, {
-       id: item.id,
-       summary: this.getSummary(item),
-       ts: Date.parse(item.created || new Date()),
-     });
-   }
  },
components/vectorshift/actions/run-pipeline/run-pipeline.mjs (1)

26-41: Improve error handling robustness

The current error handling makes an assumption that the error message is valid JSON, which could lead to unhandled exceptions if the API returns a non-JSON error.

Consider adding a try/catch block around the JSON parsing:

  } catch ({ message }) {
-    const parsedError = JSON.parse(message).error;
-    throw new ConfigurationError(parsedError);
+    try {
+      const parsedError = JSON.parse(message).error;
+      throw new ConfigurationError(parsedError);
+    } catch (e) {
+      // If parsing fails, use the original message
+      throw new ConfigurationError(`Failed to execute pipeline: ${message}`);
+    }
  }
components/vectorshift/actions/add-data-to-knowledge-base/add-data-to-knowledge-base.mjs (2)

1-3: Fix typo in imported constant name

There's a typo in the imported constant name: RESCRAPE_FRANQUENCY_OPTIONS should be RESCRAPE_FREQUENCY_OPTIONS.

import {
-  DATA_TYPE_OPTIONS, RESCRAPE_FRANQUENCY_OPTIONS,
+  DATA_TYPE_OPTIONS, RESCRAPE_FREQUENCY_OPTIONS,
} from "../../common/constants.mjs";

78-92: Improve additionalProps implementation

The current implementation modifies the props object directly but returns an empty object. According to typical Pipedream patterns, this method should return the additional props to be added.

Consider refactoring to:

async additionalProps() {
  const additionalProps = {};
  
  if (this.dataType) {
-    props.url.hidden = true;
-    props.wikipedia.hidden = true;
-    props.youtube.hidden = true;
-    props.arxiv.hidden = true;
-    props.git.hidden = true;
-    props[this.dataType].hidden = false;
+    // Set all data type props to hidden except the selected one
+    ["url", "wikipedia", "youtube", "arxiv", "git"].forEach(type => {
+      additionalProps[type] = {
+        hidden: type !== this.dataType,
+      };
+    });

    const isUrl = this.dataType === "url";
-    props.rescrapeFrequency.hidden = !isUrl;
-    props.recursive.hidden = !isUrl;
+    additionalProps.rescrapeFrequency = {
+      hidden: !isUrl,
+    };
+    additionalProps.recursive = {
+      hidden: !isUrl,
+    };
  }
-  return {};
+  return additionalProps;
}
components/vectorshift/sources/new-knowledge-base/test-event.mjs (3)

6-7: Future dates used in test data

The start_date and end_date values are set to dates in October 2024, which are in the future. This could cause confusion or unexpected behavior when using this test data.

Use dates that are relative to the current date or use a fixed past date to ensure the test data remains valid:

-  "start_date": "2024-10-21",
-  "end_date": "2024-10-28",
+  "start_date": "2023-10-21",
+  "end_date": "2023-10-28",

13-14: Inconsistent data patterns

The course_description is empty while course_full_title contains meaningful data. This inconsistency could be confusing during testing or when using this as a template.

Ensure all fields have meaningful test data or explicitly set them to null:

-  "course_description": "",
+  "course_description": "This is a sample course description for testing purposes.",

22-23: Non-descriptive ID values in string format

The fields teachers and classrooms contain string IDs that don't provide meaningful context. Without additional context, it's difficult to understand what these values represent.

Consider using arrays of objects with more descriptive properties or add comments to explain these values:

-  "teachers": "66792",
-  "classrooms": "39627",
+  "teachers": [
+    {
+      "id": "66792",
+      "name": "John Doe"
+    }
+  ],
+  "classrooms": [
+    {
+      "id": "39627",
+      "name": "Online Room 1"
+    }
+  ],
components/vectorshift/vectorshift.app.mjs (2)

37-43: Inconsistent null checking between options methods

The knowledgeBaseId options method uses optional chaining and a fallback (objects?.map(...) || []), while the pipelineId options method assumes objects will always exist without any fallback.

Standardize the pattern for both methods to improve consistency:

-  return objects.map(({
+  return objects?.map(({
   _id: value, name: label,
 }) => ({
   label,
   value,
-}));
+})) || [];

83-91: Missing validation in executePipeline method

The executePipeline method doesn't validate if pipelineId is provided before making the API request. Without validation, this could lead to invalid API calls.

Add validation for required parameters:

executePipeline({
  pipelineId, ...opts
}) {
+  if (!pipelineId) {
+    throw new Error("Pipeline ID is required");
+  }
  return this._makeRequest({
    method: "POST",
    path: `/pipeline/${pipelineId}/run`,
    ...opts,
  });
},

Apply similar validation to the addDataToKnowledgeBase method as well.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3aea15a and bdce38d.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (14)
  • components/vectorshift/actions/add-data-to-knowledge-base/add-data-to-knowledge-base.mjs (1 hunks)
  • components/vectorshift/actions/create-pipeline/create-pipeline.mjs (1 hunks)
  • components/vectorshift/actions/run-pipeline/run-pipeline.mjs (1 hunks)
  • components/vectorshift/common/constants.mjs (1 hunks)
  • components/vectorshift/common/utils.mjs (1 hunks)
  • components/vectorshift/package.json (2 hunks)
  • components/vectorshift/sources/common/base.mjs (1 hunks)
  • components/vectorshift/sources/new-chatbot/new-chatbot.mjs (1 hunks)
  • components/vectorshift/sources/new-chatbot/test-event.mjs (1 hunks)
  • components/vectorshift/sources/new-knowledge-base/new-knowledge-base.mjs (1 hunks)
  • components/vectorshift/sources/new-knowledge-base/test-event.mjs (1 hunks)
  • components/vectorshift/sources/new-pipeline/new-pipeline.mjs (1 hunks)
  • components/vectorshift/sources/new-pipeline/test-event.mjs (1 hunks)
  • components/vectorshift/vectorshift.app.mjs (1 hunks)
🔇 Additional comments (11)
components/vectorshift/common/constants.mjs (1)

1-7: LGTM! Data type options are well-defined.

The DATA_TYPE_OPTIONS array provides a clean list of supported data types for VectorShift operations. These values likely correspond to different data sources that can be processed by the system.

components/vectorshift/package.json (2)

3-3: Version bump looks appropriate.

The version update from 0.0.1 to 0.1.0 follows semantic versioning conventions, indicating new features without breaking changes.


15-17:

✅ Verification successful

Remove unnecessary "fs" dependency

The "fs" module is a core Node.js module and doesn't need to be installed as a dependency. The "-security" suffix suggests this is just a placeholder package.

  "dependencies": {
    "@pipedream/platform": "^3.0.3",
-    "fs": "^0.0.1-security"
  }

🏁 Script executed:

#!/bin/bash
# Verify if fs is used as an import from node_modules or as a core module

echo "Checking for fs imports in the codebase..."
# Search for imports from the fs package (would indicate reliance on the npm package)
rg "from\s+['\"]fs['\"]|require\s*\(\s*['\"]fs['\"]\s*\)" -g "*.{js,mjs}" --stats

echo "The fs module should be imported as a core module using:"
echo "import * as fs from 'node:fs';"
echo "or"
echo "const fs = require('node:fs');"

Length of output: 19144


Remove unnecessary "fs" dependency from package.json

The "fs" module is a core Node.js module and doesn’t need to be added as an npm dependency. Our verification confirms that all imports (e.g., import fs from "fs"; or const fs = require("fs");) resolve to the built-in module. Please remove the following line from your package.json:

  "dependencies": {
    "@pipedream/platform": "^3.0.3",
-    "fs": "^0.0.1-security"
  }
components/vectorshift/actions/create-pipeline/create-pipeline.mjs (2)

1-4: Imports look appropriate.

The imports are correctly organized, including the needed platform, utilities, and app modules.


5-29: Component definition is well-structured.

The component metadata and props are clearly defined with appropriate descriptions and types.

components/vectorshift/sources/new-pipeline/new-pipeline.mjs (1)

1-22:

✅ Verification successful

Well-structured source component

The pipeline source component correctly extends the common base module and implements the required methods:

  • getFunction() returns the appropriate API method
  • getSummary() generates a meaningful summary for events

Since this component uses the same test data as the knowledge base component, verify that the VectorShift API returns data in the expected format:


🏁 Script executed:

#!/bin/bash
# Check all the test event files to see if they're using the same sample data
grep -r "course_title" --include="*.mjs" components/vectorshift/sources/

Length of output: 353


Approval: New Pipeline Source Component is Consistent and Correct

The new pipeline source component is well-structured and properly extends the common base module. Both required methods—getFunction() and getSummary()—are implemented as expected. Additionally, the test event file for this component uses the same sample data ("course_title": "Class title") as the other VectorShift source components, confirming that the VectorShift API returns data in the expected format.

  • The component correctly imports and spreads the common base module.
  • The test event sample data is consistent across related modules.
  • No changes are required.
components/vectorshift/sources/new-chatbot/new-chatbot.mjs (1)

4-22: Component structure looks good!

This new source component for detecting Vectorshift chatbot creations follows a good pattern by extending the common base and implementing the necessary methods. The dedupe setting will prevent duplicate events, and the getSummary method properly handles unnamed chatbots.

components/vectorshift/actions/add-data-to-knowledge-base/add-data-to-knowledge-base.mjs (1)

28-32:

❓ Verification inconclusive

Clarify file upload handling

The file property description mentions providing a file from /tmp, but there's no validation or guidance on how the file data should be formatted when passed to the API.

Please clarify:

  1. What format is expected for the file data? (file path, file content, buffer, etc.)
  2. Is there any size limit or file type restriction?
  3. How is the actual file upload handled by the underlying API?

🌐 Web query:

How to upload files to Vectorshift knowledge base API?

💡 Result:

To upload files to a VectorShift Knowledge Base via their API, you can use the /v1/knowledge-base/{id}/index POST endpoint with the required file metadata and authentication. Here’s a step-by-step guide:

API Request Structure

Endpoint:

POST https://api.vectorshift.ai/v1/knowledge-base/{knowledge_base_id}/index

Headers:

  • Authorization: Bearer <API_TOKEN>
  • Content-Type: application/json

Body (JSON):

{
  "file_data": {
    "file": {
      "type": "file",
      "metadata": {
        "name": "<filename>",
        "path": "<path>",
        "mime_type": "<mime_type>"
      },
      "custom_metadata": {}
    }
  }
}

Replace {knowledge_base_id} with your Knowledge Base ID and fill in the name, path (optional file path), and mime_type (e.g., application/pdf)[2][7].

Example cURL Command

curl --request POST \
  --url https://api.vectorshift.ai/v1/knowledge-base/KB_123/index \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "file_data": {
      "file": {
        "type": "file",
        "metadata": {
          "name": "report.pdf",
          "mime_type": "application/pdf"
        },
        "custom_metadata": {}
      }
    }
  }'

Key Parameters

  • File Metadata:
    • name: Name of the file.
    • mime_type: Valid MIME type (e.g., text/csv, application/pdf).
    • path: Optional directory path for organizing files[2][9].
  • Custom Metadata: Add optional key-value pairs for filtering/querying[2].

Response Handling

On success, the API returns:

{
  "status": "success",
  "document_ids": ["DOC_456"]
}

Use document_ids to reference or delete the uploaded file later[2][9].

For advanced configurations like chunking or AI enrichment (e.g., OCR), use the config field in the request body to specify parameters such as chunk_size or enrichment_tasks[2][7].

Citations:


Clarify File Upload Handling in add-data-to-knowledge-base

The current schema for "File Data" only specifies a string (i.e., a file from /tmp), but it isn’t clear how that string should be interpreted or transformed before uploading. Based on the Vectorshift Knowledge Base API, please update the documentation to clarify the following:

  • Expected Format:
    Confirm whether the input should be a simple file path (i.e., a string referencing a file in /tmp) or if it needs conversion into a structured file object. Keep in mind that the Vectorshift API expects a JSON payload under the file_data key containing a nested file object with metadata (e.g., name, mime_type, and optionally path).

  • File Restrictions:
    Indicate if there are any imposed limitations such as maximum file size or permitted MIME types, whether these are validated locally or by the API.

  • Upload Implementation:
    Explain how the file upload is processed. For instance, describe any steps that read the file from /tmp and convert its contents into the required JSON representation before being sent to the /v1/knowledge-base/{knowledge_base_id}/index endpoint.

Please refer to the Vectorshift Knowledge Base API documentation for detailed guidance on constructing the upload payload.

components/vectorshift/vectorshift.app.mjs (3)

6-45: Comprehensive property definitions for API integration

The property definitions for knowledgeBaseId and pipelineId are well structured with proper types, labels, descriptions, and dynamic options. This approach makes the component user-friendly in the Pipedream UI.


47-63: Well-structured API request helper methods

The helper methods _baseUrl, _headers, and _makeRequest are well-organized and follow good practices for centralizing API request logic. This approach makes it easier to maintain and update the API integration.


64-100: API methods have consistent structure and naming

The API methods (listPipelines, listKnowledgeBases, createPipeline, executePipeline, and addDataToKnowledgeBase) follow a consistent pattern, making the code more maintainable and easier to understand.

luancazarine and others added 5 commits March 26, 2025 20:21
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Sources
 - New Pipeline
 - New Knowledge Base
 - New Chatbot

Actions
 - Create Pipeline
 - Run Pipeline
 - Add Data To Knowledge Base
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
components/vectorshift/vectorshift.app.mjs (1)

30-39: 🛠️ Refactor suggestion

Add error handling to options methods

The options methods for both knowledgeBaseId and pipelineId don't include error handling. If the API calls fail, this could cause the UI to break when displaying dropdown options.

async options() {
+  try {
    const { objects } = await this.listPipelines();

    return objects?.map(({
      _id: value, name: label,
    }) => ({
      label,
      value,
    })) || [];
+  } catch (error) {
+    console.error("Failed to fetch pipelines", error);
+    return [];
+  }
}
🧹 Nitpick comments (8)
components/vectorshift/sources/new-pipeline/test-event.mjs (1)

1-10: Improve the test event sample data to be more realistic and comprehensive.

The sample data for a VectorShift pipeline is significantly better than the previous version with course-related data. However, a few improvements would make it more useful:

  1. Use past dates instead of future dates (currently set to 2025-03-31)
  2. Include sample nodes in the nodes object to demonstrate expected structure
  3. Make the name more descriptive (e.g., "Document Processing Pipeline")
  4. Consider varying the IDs between fields for clarity
 export default {
   "_id": "6709e0f530eacae6703df110",
-  "createdDate": "2025-03-31T17:03:31.277Z",
+  "createdDate": "2024-03-15T14:23:18.277Z",
   "mainBranch": "6709e0f530eacae6703df110",
-  "modifiedDate": "2025-03-31T17:03:31.277Z",
+  "modifiedDate": "2024-03-16T09:45:21.511Z",
-  "name": "Pipeline Name",
+  "name": "Document Processing Pipeline",
-  "nodes": {},
+  "nodes": {
+    "node_1": {
+      "id": "node_1",
+      "type": "input",
+      "name": "Document Input"
+    },
+    "node_2": {
+      "id": "node_2",
+      "type": "processor",
+      "name": "Text Extraction"
+    }
+  },
   "orgID": "Personal",
   "userID": "auth0|6709e0f530eacae6703df110"
 }
components/vectorshift/sources/new-chatbot/test-event.mjs (1)

12-13: Consider using realistic dates for the test event.

The createdDate and modifiedDate fields are set to March 31, 2025, which is in the future. While this works for testing purposes, consider using realistic dates (current or past dates) to better simulate actual API responses and avoid potential validation issues.

-  "createdDate": "2025-03-31T17:24:39.768Z",
+  "createdDate": "2023-03-31T17:24:39.768Z",
   "deployed": true,
   ...
-  "modifiedDate": "2025-03-31T17:24:39.768Z",
+  "modifiedDate": "2023-03-31T17:24:39.768Z",

Also applies to: 123-124

components/vectorshift/actions/add-data-to-knowledge-base/add-data-to-knowledge-base.mjs (3)

18-22: Add required attribute to URL prop

The URL property should be marked as required since it's essential for adding data to the knowledge base. Without a URL, the action would fail with an API error rather than providing a clear validation message to the user.

url: {
  type: "string",
  label: "URL",
  description: "URL to add to the knowledge base",
+  required: true,
},

18-22: Add URL format validation

The URL property accepts any string, but should be validated to ensure it's a properly formatted URL before making the API request.

url: {
  type: "string",
  label: "URL",
  description: "URL to add to the knowledge base",
+  pattern: "^https?://.*",
+  errorMessage: "Please enter a valid URL starting with http:// or https://",
},

53-53: Add defensive programming to handle response

The code assumes response.document_ids always exists and is an array. If the API response structure changes or returns an error, this could lead to runtime errors.

-$.export("$summary", `Added ${response.document_ids.length} document(s) to knowledge base ${this.knowledgeBaseId}. Document IDs: ${response.document_ids.join(", ")}`);
+const documentIds = response.document_ids || [];
+$.export("$summary", `Added ${documentIds.length} document(s) to knowledge base ${this.knowledgeBaseId}${documentIds.length ? `. Document IDs: ${documentIds.join(", ")}` : ""}`);
components/vectorshift/vectorshift.app.mjs (3)

104-112: Add input validation for executePipeline method

The executePipeline method doesn't validate that pipelineId is provided before making the API request, which could lead to a confusing API error.

executePipeline({
  pipelineId, ...opts
}) {
+  if (!pipelineId) {
+    throw new Error("Pipeline ID is required");
+  }
  return this._makeRequest({
    method: "POST",
    path: `/pipeline/${pipelineId}/run`,
    ...opts,
  });
},

67-69: Set consistent default values for verbose parameter

The verbose parameter is hardcoded to true in list methods, making it impossible to override. This reduces flexibility if a non-verbose response is needed for performance reasons.

params: {
-  verbose: true,
+  verbose: params.verbose ?? true,
  ...params,
},

55-59: Add timeout and response validation to API requests

The current implementation doesn't set a timeout for API requests, which could lead to hanging requests if the VectorShift API is slow to respond.

return axios($, {
  url: this._baseUrl() + path,
  headers: this._headers(),
+  timeout: 30000, // 30 seconds timeout
  ...opts,
});
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f56e964 and 6a564ff.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (9)
  • components/vectorshift/actions/add-data-to-knowledge-base/add-data-to-knowledge-base.mjs (1 hunks)
  • components/vectorshift/sources/common/base.mjs (1 hunks)
  • components/vectorshift/sources/new-chatbot/new-chatbot.mjs (1 hunks)
  • components/vectorshift/sources/new-chatbot/test-event.mjs (1 hunks)
  • components/vectorshift/sources/new-knowledge-base/new-knowledge-base.mjs (1 hunks)
  • components/vectorshift/sources/new-knowledge-base/test-event.mjs (1 hunks)
  • components/vectorshift/sources/new-pipeline/new-pipeline.mjs (1 hunks)
  • components/vectorshift/sources/new-pipeline/test-event.mjs (1 hunks)
  • components/vectorshift/vectorshift.app.mjs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • components/vectorshift/sources/new-knowledge-base/new-knowledge-base.mjs
  • components/vectorshift/sources/new-knowledge-base/test-event.mjs
  • components/vectorshift/sources/common/base.mjs
  • components/vectorshift/sources/new-chatbot/new-chatbot.mjs
  • components/vectorshift/sources/new-pipeline/new-pipeline.mjs
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Publish TypeScript components
  • GitHub Check: pnpm publish
  • GitHub Check: Verify TypeScript components
🔇 Additional comments (5)
components/vectorshift/sources/new-chatbot/test-event.mjs (3)

1-145: Test event structure now aligns correctly with chatbot component purpose.

This test event now contains appropriate chatbot configuration data with all the expected fields for a Vectorshift chatbot, which addresses the previous review comment about course-related data inconsistency.

The test event includes comprehensive configuration including access settings, deployment options, UI customization parameters, and integration configurations for Slack and Twilio.


2-2: Ensure ID consistency throughout the test event.

The same ID value "67ea096700fcfd7de5313ffa" is used for _id, mainBranch, and pipeline.object_id. While this might be intentional if these are meant to refer to the same entity, verify that this represents a realistic relationship between these fields in the actual API responses.

Also applies to: 122-122, 129-129


96-96: Inconsistent document upload settings.

There's an inconsistency between deployment_options.showDocumentUpload (line 97) and the top-level show_document_upload field (line 134). Both control document upload functionality but have different naming conventions. Ensure this accurately reflects the API structure.

Also applies to: 134-134

components/vectorshift/actions/add-data-to-knowledge-base/add-data-to-knowledge-base.mjs (1)

1-56: Add validation for URL data type

The code doesn't validate the data type before attempting to add data to the knowledge base. While the current implementation only supports URL data, it should validate this to prevent potential issues.

components/vectorshift/vectorshift.app.mjs (1)

52-60: Add error handling to _makeRequest method

The _makeRequest method doesn't handle API errors, which could lead to confusing error messages for users.

lcaresia
lcaresia previously approved these changes Apr 1, 2025
@luancazarine
Copy link
Collaborator Author

/approve

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
ai-assisted Content generated by AI, with human refinement and modification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Components] vectorshift
2 participants