add an OpenAI-compatible provider as a generic Enterprise LLM adapter
Increasingly, LLM software is standardizing around OpenAI-compatible
API endpoints. Some examples:

* [OpenLLM](https://github.com/bentoml/OpenLLM) (commonly used to self-host/deploy various LLMs in enterprises)
* [Huggingface TGI](huggingface/text-generation-inference#735) (and, by extension, [AWS SageMaker](https://aws.amazon.com/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/))
* [Ollama](https://github.com/ollama/ollama) (commonly used for running LLMs locally, useful for local testing)

All of these projects either have OpenAI-compatible API endpoints
already or are actively building out support for them. On strat, we
regularly work with enterprise customers that self-host a specific
LLM via one of these methods and want Cody to consume that
OpenAI-compatible endpoint, with the understanding that a specific
model is on the other side and that Cody should optimize for / target
that model.

Since Cody needs to tailor its behavior to a specific model (prompt
generation, stop sequences, context limits, timeouts, etc.) and handle
other provider-specific nuances, it is not enough to assume that a
customer-provided OpenAI-compatible endpoint is in fact 1:1 compatible
with e.g. GPT-3.5 or GPT-4. We need to be able to configure/tune many
of these aspects for the specific provider/model, even though it
presents as an OpenAI endpoint.
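
To make the shape of that tuning concrete, here is a minimal,
hypothetical sketch of the kind of per-model knobs such a provider has
to carry; the `ModelTuning` interface, its fields, and the example
values are illustrative assumptions, not Cody's actual types:

```ts
// Hypothetical sketch only: the interface, field names, and values are
// illustrative assumptions about per-model tuning, not Cody's real types.
interface ModelTuning {
    /** Name sent as the OpenAI `model` request parameter. */
    model: string
    /** Sequences that should terminate generation for this model's prompt format. */
    stopSequences: string[]
    /** Approximate context window budget, in tokens. */
    contextWindowTokens: number
    /** Request timeouts, in milliseconds. */
    timeouts: { singleline: number; multiline: number }
}

// Example values one might plausibly use for a StarChat-style model.
const starchatTuning: ModelTuning = {
    model: 'starchat-16b-beta',
    stopSequences: ['<|end|>'],
    contextWindowTokens: 8192,
    timeouts: { singleline: 10_000, multiline: 10_000 },
}
```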

In response to these needs, I am working on adding a proper
'OpenAI-compatible' provider: the ability for a Sourcegraph enterprise
instance to advertise that, although it is connected to an
OpenAI-compatible endpoint, there is in fact a specific model on the
other side (starting with Starchat and Starcoder) and that Cody should
target that configuration. This change is the _first step_ of that
work.

After this change, an existing (current-version) Sourcegraph
enterprise instance can configure an OpenAI-compatible endpoint for
completions via site config such as:

```jsonc
  "cody.enabled": true,
  "completions": {
    "provider": "openai",
    "accessToken": "asdf",
    "endpoint": "http://openllm.foobar.com:3000",
    "completionModel": "gpt-4",
    "chatModel": "gpt-4",
    "fastChatModel": "gpt-4",
  },
```
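
As a quick sanity check independent of Cody, the configured endpoint
can be exercised directly with a standard OpenAI-style chat
completions request. This is a rough sketch: the exact route and auth
header depend on the serving stack (OpenLLM, TGI, Ollama, etc.), and
the URL, token, and model simply mirror the example config above:

```ts
// Rough sketch: exercise the OpenAI-compatible endpoint outside of Cody.
// The URL, token, and model mirror the example site config above; the
// /v1/chat/completions route is an assumption about the serving stack.
async function smokeTest(): Promise<void> {
    const response = await fetch('http://openllm.foobar.com:3000/v1/chat/completions', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            Authorization: 'Bearer asdf',
        },
        body: JSON.stringify({
            model: 'gpt-4', // forwarded as-is; the server decides what actually runs
            messages: [{ role: 'user', content: 'Write a hello world function in Go.' }],
            max_tokens: 64,
        }),
    })
    console.log(JSON.stringify(await response.json(), null, 2))
}

smokeTest().catch(console.error)
```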

The `gpt-4` model names will be sent to the specified OpenAI-compatible
endpoint, but are otherwise unused today. Users may then specify in
their VS Code configuration that Cody should treat the LLM on the
other side as if it were e.g. Starchat:

```jsonc
    "cody.autocomplete.advanced.provider": "experimental-openaicompatible",
    "cody.autocomplete.advanced.model": "starchat-16b-beta",
    "cody.autocomplete.advanced.timeout.multiline": 10000,
    "cody.autocomplete.advanced.timeout.singleline": 10000,
```

In the future, we will make it possible to configure the above
options via the Sourcegraph site configuration, instead of each user
needing to set them explicitly in their VS Code settings.

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
Stephen Gutekanst committed Feb 20, 2024
1 parent f16ebbe commit b7dd682
Showing 8 changed files with 477 additions and 6 deletions.

lib/shared/src/chat/chat.ts (2 changes: 1 addition & 1 deletion)

```diff
@@ -29,7 +29,7 @@ export class ChatClient {
             // HACK: The fireworks chat inference endpoints requires the last message to be from a
             // human. This will be the case in most of the prompts but if for some reason we have an
             // assistant at the end, we slice the last message for now.
-            params?.model?.startsWith('fireworks/')
+            params?.model?.startsWith('fireworks/') || params?.model?.startsWith('openaicompatible/')
                 ? isLastMessageFromHuman
                     ? messages
                     : messages.slice(0, -1)
```

lib/shared/src/configuration.ts (1 change: 1 addition & 0 deletions)

```diff
@@ -36,6 +36,7 @@ export interface Configuration {
         | 'anthropic'
         | 'fireworks'
         | 'unstable-openai'
+        | 'experimental-openaicompatible'
         | 'experimental-ollama'
         | null
     autocompleteAdvancedModel: string | null
```

lib/shared/src/sourcegraph-api/completions/client.ts (6 changes: 5 additions & 1 deletion)

```diff
@@ -92,8 +92,12 @@ export abstract class SourcegraphCompletionsClient {
         params: CompletionParameters,
         signal?: AbortSignal
     ): AsyncGenerator<CompletionGeneratorValue> {
-        // This is a technique to convert a function that takes callbacks to an async generator.
+        // Provide default stop sequence for starchat models.
+        if (!params.stopSequences && params?.model?.startsWith('openaicompatible/starchat')) {
+            params.stopSequences = ['<|end|>']
+        }
 
+        // This is a technique to convert a function that takes callbacks to an async generator.
         const values: Promise<CompletionGeneratorValue>[] = []
         let resolve: ((value: CompletionGeneratorValue) => void) | undefined
         values.push(
```
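
For context on the `<|end|>` default above: StarChat-style models are
typically prompted with a dialogue template whose turns are delimited
by special tokens, and `<|end|>` closes each turn, so generation
should stop there instead of running on into a fabricated next turn.
A rough illustration of that prompt shape (the exact template is a
property of the model and is an assumption here, not code from this
commit):

```ts
// Rough illustration of a StarChat-style prompt template; the exact token
// layout is an assumption about the model's training format, not Cody code.
function renderStarchatPrompt(system: string, user: string): string {
    // Generation is expected to end when the model emits `<|end|>`, which is
    // why it is the natural default stop sequence for these models.
    return [`<|system|>\n${system}<|end|>`, `<|user|>\n${user}<|end|>`, '<|assistant|>'].join('\n')
}
```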

vscode/package.json (7 changes: 4 additions & 3 deletions)

```diff
@@ -902,7 +902,7 @@
         "cody.autocomplete.advanced.provider": {
           "type": "string",
           "default": null,
-          "enum": [null, "anthropic", "fireworks", "unstable-openai", "experimental-ollama"],
+          "enum": [null, "anthropic", "fireworks", "unstable-openai", "experimental-openaicompatible", "experimental-ollama"],
           "markdownDescription": "The provider used for code autocomplete. Most providers other than `anthropic` require the `cody.autocomplete.advanced.serverEndpoint` and `cody.autocomplete.advanced.accessToken` settings to also be set. Check the Cody output channel for error messages if autocomplete is not working as expected."
         },
         "cody.autocomplete.advanced.serverEndpoint": {
@@ -925,9 +925,10 @@
             "llama-code-7b",
             "llama-code-13b",
             "llama-code-13b-instruct",
-            "mistral-7b-instruct-4k"
+            "mistral-7b-instruct-4k",
+            "starchat-16b-beta"
           ],
-          "markdownDescription": "Overwrite the model used for code autocompletion inference. This is only supported with the `fireworks` provider"
+          "markdownDescription": "Overwrite the model used for code autocompletion inference. This is only supported with the `fireworks` and 'experimental-openaicompatible' providers"
         },
         "cody.autocomplete.completeSuggestWidgetSelection": {
           "type": "boolean",
```

vscode/src/completions/providers/create-provider.test.ts (24 changes: 24 additions & 0 deletions)

```diff
@@ -86,6 +86,30 @@ describe('createProviderConfig', () => {
         expect(provider?.model).toBe('starcoder-hybrid')
     })
 
+    // TODO: test 'openaicompatible'
+    // it('returns "fireworks" provider config and corresponding model if specified', async () => {
+    //     const provider = await createProviderConfig(
+    //         getVSCodeConfigurationWithAccessToken({
+    //             autocompleteAdvancedProvider: 'fireworks',
+    //             autocompleteAdvancedModel: 'starcoder-7b',
+    //         }),
+    //         dummyCodeCompletionsClient,
+    //         dummyAuthStatus
+    //     )
+    //     expect(provider?.identifier).toBe('fireworks')
+    //     expect(provider?.model).toBe('starcoder-7b')
+    // })
+
+    // it('returns "fireworks" provider config if specified in settings and default model', async () => {
+    //     const provider = await createProviderConfig(
+    //         getVSCodeConfigurationWithAccessToken({ autocompleteAdvancedProvider: 'fireworks' }),
+    //         dummyCodeCompletionsClient,
+    //         dummyAuthStatus
+    //     )
+    //     expect(provider?.identifier).toBe('fireworks')
+    //     expect(provider?.model).toBe('starcoder-hybrid')
+    // })
+
     it('returns "openai" provider config if specified in VSCode settings; model is ignored', async () => {
         const provider = await createProviderConfig(
             getVSCodeConfigurationWithAccessToken({
```
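
The commented-out blocks above are the existing Fireworks tests kept
next to the `TODO` as a template. A future test for the new provider
might look roughly like this sketch, slotted into the same `describe`
block and reusing the helpers that already appear in this file; the
expected `identifier` and `model` values are assumptions about how the
provider will report itself:

```ts
// Sketch only: mirrors the surrounding Fireworks tests; expected values are
// assumptions about how the openaicompatible provider will identify itself.
it('returns "openaicompatible" provider config and corresponding model if specified', async () => {
    const provider = await createProviderConfig(
        getVSCodeConfigurationWithAccessToken({
            autocompleteAdvancedProvider: 'experimental-openaicompatible',
            autocompleteAdvancedModel: 'starchat-16b-beta',
        }),
        dummyCodeCompletionsClient,
        dummyAuthStatus
    )
    expect(provider?.identifier).toBe('openaicompatible')
    expect(provider?.model).toBe('starchat-16b-beta')
})
```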

vscode/src/completions/providers/create-provider.ts (18 changes: 18 additions & 0 deletions)

```diff
@@ -12,6 +12,7 @@ import {
     createProviderConfig as createFireworksProviderConfig,
     type FireworksOptions,
 } from './fireworks'
+import { createProviderConfig as createOpenAICompatibleProviderConfig } from './openaicompatible'
 import type { ProviderConfig } from './provider'
 import { createProviderConfig as createExperimentalOllamaProviderConfig } from './experimental-ollama'
 import { createProviderConfig as createUnstableOpenAIProviderConfig } from './unstable-openai'
@@ -49,6 +50,15 @@ export async function createProviderConfig(
         case 'anthropic': {
             return createAnthropicProviderConfig({ client })
         }
+        case 'experimental-openaicompatible': {
+            return createOpenAICompatibleProviderConfig({
+                client,
+                model: config.autocompleteAdvancedModel ?? model ?? null,
+                timeouts: config.autocompleteTimeouts,
+                authStatus,
+                config,
+            })
+        }
         case 'experimental-ollama':
         case 'unstable-ollama': {
             return createExperimentalOllamaProviderConfig(
@@ -99,6 +109,14 @@
                 authStatus,
                 config,
             })
+        case 'experimental-openaicompatible':
+            return createOpenAICompatibleProviderConfig({
+                client,
+                timeouts: config.autocompleteTimeouts,
+                model: model ?? null,
+                authStatus,
+                config,
+            })
         case 'aws-bedrock':
         case 'anthropic':
             return createAnthropicProviderConfig({
```

(The remaining changed files in this commit are not shown in this excerpt.)
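
Among the files not shown above is the new './openaicompatible'
provider module that the import in create-provider.ts refers to. Based
purely on how it is called there and on the fields the tests above
check, a deliberately rough, hypothetical sketch of its exported
factory might look like the following; the option and return types are
assumptions, not the module's real code:

```ts
// Deliberately rough sketch of './openaicompatible' based only on its call
// sites above; every type and default here is an assumption, not real code.
interface OpenAICompatibleOptions {
    client: unknown
    model: string | null
    timeouts: unknown
    authStatus: unknown
    config: unknown
}

interface ProviderConfigLike {
    identifier: string
    model: string
}

export function createProviderConfig(options: OpenAICompatibleOptions): ProviderConfigLike {
    return {
        identifier: 'openaicompatible',
        // Fall back to a StarChat default when no model is configured (assumed behavior).
        model: options.model ?? 'starchat-16b-beta',
    }
}
```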
