add an OpenAI-compatible provider as a generic Enterprise LLM adapter
Increasingly, LLM software is standardizing around OpenAI-compatible
API endpoints. Some examples:

* [OpenLLM](https://github.com/bentoml/OpenLLM) (commonly used to self-host/deploy various LLMs in enterprises)
* [Huggingface TGI](huggingface/text-generation-inference#735) (and, by extension, [AWS SageMaker](https://aws.amazon.com/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/))
* [Ollama](https://github.com/ollama/ollama) (commonly used for running LLMs locally, useful for local testing)

All of these projects either have OpenAI-compatible API endpoints
already or are actively building out support for them. On strat, we
regularly work with enterprise customers that self-host a specific
LLM via one of these methods and want Cody to consume that
OpenAI-compatible endpoint, with the understanding that a specific
model is on the other side and that Cody should optimize for / target
that model.

Since Cody needs to tailor its behavior to a specific model (prompt
generation, stop sequences, context limits, timeouts, etc.) and handle
other provider-specific nuances, it is not enough to assume that a
customer-provided OpenAI-compatible endpoint is in fact 1:1 compatible
with e.g. GPT-3.5 or GPT-4. We need to be able to configure/tune many
of these aspects for the specific provider/model, even though it
presents as an OpenAI endpoint.
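
To make the shape of that tuning concrete, here is a minimal,
hypothetical sketch of the kind of per-model knobs such a provider has
to carry; the `ModelTuning` interface, its fields, and the example
values are illustrative assumptions, not Cody's actual types:

```ts
// Hypothetical sketch only: the interface, field names, and values are
// illustrative assumptions about per-model tuning, not Cody's real types.
interface ModelTuning {
    /** Name sent as the OpenAI `model` request parameter. */
    model: string
    /** Sequences that should terminate generation for this model's prompt format. */
    stopSequences: string[]
    /** Approximate context window budget, in tokens. */
    contextWindowTokens: number
    /** Request timeouts, in milliseconds. */
    timeouts: { singleline: number; multiline: number }
}

// Example values one might plausibly use for a StarChat-style model.
const starchatTuning: ModelTuning = {
    model: 'starchat-16b-beta',
    stopSequences: ['<|end|>'],
    contextWindowTokens: 8192,
    timeouts: { singleline: 10_000, multiline: 10_000 },
}
```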

In response to these needs, I am working on adding a proper
'OpenAI-compatible' provider: the ability for a Sourcegraph enterprise
instance to advertise that, although it is connected to an
OpenAI-compatible endpoint, there is in fact a specific model on the
other side (starting with Starchat and Starcoder) and that Cody should
target that configuration. This change is the _first step_ of that
work.

After this change, an existing (current-version) Sourcegraph
enterprise instance can configure an OpenAI-compatible endpoint for
completions via site config such as:

```jsonc
  "cody.enabled": true,
  "completions": {
    "provider": "openai",
    "accessToken": "asdf",
    "endpoint": "http://openllm.foobar.com:3000",
    "completionModel": "gpt-4",
    "chatModel": "gpt-4",
    "fastChatModel": "gpt-4",
  },
```
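
As a quick sanity check independent of Cody, the configured endpoint
can be exercised directly with a standard OpenAI-style chat
completions request. This is a rough sketch: the exact route and auth
header depend on the serving stack (OpenLLM, TGI, Ollama, etc.), and
the URL, token, and model simply mirror the example config above:

```ts
// Rough sketch: exercise the OpenAI-compatible endpoint outside of Cody.
// The URL, token, and model mirror the example site config above; the
// /v1/chat/completions route is an assumption about the serving stack.
async function smokeTest(): Promise<void> {
    const response = await fetch('http://openllm.foobar.com:3000/v1/chat/completions', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            Authorization: 'Bearer asdf',
        },
        body: JSON.stringify({
            model: 'gpt-4', // forwarded as-is; the server decides what actually runs
            messages: [{ role: 'user', content: 'Write a hello world function in Go.' }],
            max_tokens: 64,
        }),
    })
    console.log(JSON.stringify(await response.json(), null, 2))
}

smokeTest().catch(console.error)
```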

The `gpt-4` model names will be sent to the specified OpenAI-compatible
endpoint, but are otherwise unused today. Users may then specify in
their VS Code configuration that Cody should treat the LLM on the
other side as if it were e.g. Starchat:

```jsonc
    "cody.autocomplete.advanced.provider": "experimental-openaicompatible",
    "cody.autocomplete.advanced.model": "starchat-16b-beta",
    "cody.autocomplete.advanced.timeout.multiline": 10000,
    "cody.autocomplete.advanced.timeout.singleline": 10000,
```

In the future, we will make it possible to configure the above
options via the Sourcegraph site configuration, instead of each user
needing to set them explicitly in their VS Code settings.

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
Stephen Gutekanst committed Feb 20, 2024
1 parent f16ebbe commit b7dd682
Showing 8 changed files with 477 additions and 6 deletions.

lib/shared/src/chat/chat.ts (2 changes: 1 addition & 1 deletion)

```diff
@@ -29,7 +29,7 @@ export class ChatClient {
             // HACK: The fireworks chat inference endpoints requires the last message to be from a
             // human. This will be the case in most of the prompts but if for some reason we have an
             // assistant at the end, we slice the last message for now.
-            params?.model?.startsWith('fireworks/')
+            params?.model?.startsWith('fireworks/') || params?.model?.startsWith('openaicompatible/')
                 ? isLastMessageFromHuman
                     ? messages
                     : messages.slice(0, -1)
```

lib/shared/src/configuration.ts (1 change: 1 addition & 0 deletions)

```diff
@@ -36,6 +36,7 @@ export interface Configuration {
         | 'anthropic'
         | 'fireworks'
         | 'unstable-openai'
+        | 'experimental-openaicompatible'
         | 'experimental-ollama'
         | null
     autocompleteAdvancedModel: string | null
```

lib/shared/src/sourcegraph-api/completions/client.ts (6 changes: 5 additions & 1 deletion)

```diff
@@ -92,8 +92,12 @@ export abstract class SourcegraphCompletionsClient {
         params: CompletionParameters,
         signal?: AbortSignal
     ): AsyncGenerator<CompletionGeneratorValue> {
-        // This is a technique to convert a function that takes callbacks to an async generator.
+        // Provide default stop sequence for starchat models.
+        if (!params.stopSequences && params?.model?.startsWith('openaicompatible/starchat')) {
+            params.stopSequences = ['<|end|>']
+        }
 
+        // This is a technique to convert a function that takes callbacks to an async generator.
         const values: Promise<CompletionGeneratorValue>[] = []
         let resolve: ((value: CompletionGeneratorValue) => void) | undefined
         values.push(
```
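
For context on the `<|end|>` default above: StarChat-style models are
typically prompted with a dialogue template whose turns are delimited
by special tokens, and `<|end|>` closes each turn, so generation
should stop there instead of running on into a fabricated next turn.
A rough illustration of that prompt shape (the exact template is a
property of the model and is an assumption here, not code from this
commit):

```ts
// Rough illustration of a StarChat-style prompt template; the exact token
// layout is an assumption about the model's training format, not Cody code.
function renderStarchatPrompt(system: string, user: string): string {
    // Generation is expected to end when the model emits `<|end|>`, which is
    // why it is the natural default stop sequence for these models.
    return [`<|system|>\n${system}<|end|>`, `<|user|>\n${user}<|end|>`, '<|assistant|>'].join('\n')
}
```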

vscode/package.json (7 changes: 4 additions & 3 deletions)

```diff
@@ -902,7 +902,7 @@
         "cody.autocomplete.advanced.provider": {
           "type": "string",
           "default": null,
-          "enum": [null, "anthropic", "fireworks", "unstable-openai", "experimental-ollama"],
+          "enum": [null, "anthropic", "fireworks", "unstable-openai", "experimental-openaicompatible", "experimental-ollama"],
           "markdownDescription": "The provider used for code autocomplete. Most providers other than `anthropic` require the `cody.autocomplete.advanced.serverEndpoint` and `cody.autocomplete.advanced.accessToken` settings to also be set. Check the Cody output channel for error messages if autocomplete is not working as expected."
         },
         "cody.autocomplete.advanced.serverEndpoint": {
@@ -925,9 +925,10 @@
             "llama-code-7b",
             "llama-code-13b",
             "llama-code-13b-instruct",
-            "mistral-7b-instruct-4k"
+            "mistral-7b-instruct-4k",
+            "starchat-16b-beta"
           ],
-          "markdownDescription": "Overwrite the model used for code autocompletion inference. This is only supported with the `fireworks` provider"
+          "markdownDescription": "Overwrite the model used for code autocompletion inference. This is only supported with the `fireworks` and 'experimental-openaicompatible' providers"
         },
         "cody.autocomplete.completeSuggestWidgetSelection": {
           "type": "boolean",
```

vscode/src/completions/providers/create-provider.test.ts (24 changes: 24 additions & 0 deletions)

```diff
@@ -86,6 +86,30 @@ describe('createProviderConfig', () => {
         expect(provider?.model).toBe('starcoder-hybrid')
     })
 
+    // TODO: test 'openaicompatible'
+    // it('returns "fireworks" provider config and corresponding model if specified', async () => {
+    //     const provider = await createProviderConfig(
+    //         getVSCodeConfigurationWithAccessToken({
+    //             autocompleteAdvancedProvider: 'fireworks',
+    //             autocompleteAdvancedModel: 'starcoder-7b',
+    //         }),
+    //         dummyCodeCompletionsClient,
+    //         dummyAuthStatus
+    //     )
+    //     expect(provider?.identifier).toBe('fireworks')
+    //     expect(provider?.model).toBe('starcoder-7b')
+    // })
+
+    // it('returns "fireworks" provider config if specified in settings and default model', async () => {
+    //     const provider = await createProviderConfig(
+    //         getVSCodeConfigurationWithAccessToken({ autocompleteAdvancedProvider: 'fireworks' }),
+    //         dummyCodeCompletionsClient,
+    //         dummyAuthStatus
+    //     )
+    //     expect(provider?.identifier).toBe('fireworks')
+    //     expect(provider?.model).toBe('starcoder-hybrid')
+    // })
+
     it('returns "openai" provider config if specified in VSCode settings; model is ignored', async () => {
         const provider = await createProviderConfig(
             getVSCodeConfigurationWithAccessToken({
```
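
The commented-out blocks above are the existing Fireworks tests kept
next to the `TODO` as a template. A future test for the new provider
might look roughly like this sketch, slotted into the same `describe`
block and reusing the helpers that already appear in this file; the
expected `identifier` and `model` values are assumptions about how the
provider will report itself:

```ts
// Sketch only: mirrors the surrounding Fireworks tests; expected values are
// assumptions about how the openaicompatible provider will identify itself.
it('returns "openaicompatible" provider config and corresponding model if specified', async () => {
    const provider = await createProviderConfig(
        getVSCodeConfigurationWithAccessToken({
            autocompleteAdvancedProvider: 'experimental-openaicompatible',
            autocompleteAdvancedModel: 'starchat-16b-beta',
        }),
        dummyCodeCompletionsClient,
        dummyAuthStatus
    )
    expect(provider?.identifier).toBe('openaicompatible')
    expect(provider?.model).toBe('starchat-16b-beta')
})
```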

vscode/src/completions/providers/create-provider.ts (18 changes: 18 additions & 0 deletions)

```diff
@@ -12,6 +12,7 @@ import {
     createProviderConfig as createFireworksProviderConfig,
     type FireworksOptions,
 } from './fireworks'
+import { createProviderConfig as createOpenAICompatibleProviderConfig } from './openaicompatible'
 import type { ProviderConfig } from './provider'
 import { createProviderConfig as createExperimentalOllamaProviderConfig } from './experimental-ollama'
 import { createProviderConfig as createUnstableOpenAIProviderConfig } from './unstable-openai'
@@ -49,6 +50,15 @@ export async function createProviderConfig(
         case 'anthropic': {
             return createAnthropicProviderConfig({ client })
         }
+        case 'experimental-openaicompatible': {
+            return createOpenAICompatibleProviderConfig({
+                client,
+                model: config.autocompleteAdvancedModel ?? model ?? null,
+                timeouts: config.autocompleteTimeouts,
+                authStatus,
+                config,
+            })
+        }
         case 'experimental-ollama':
         case 'unstable-ollama': {
             return createExperimentalOllamaProviderConfig(
@@ -99,6 +109,14 @@
                 authStatus,
                 config,
             })
+        case 'experimental-openaicompatible':
+            return createOpenAICompatibleProviderConfig({
+                client,
+                timeouts: config.autocompleteTimeouts,
+                model: model ?? null,
+                authStatus,
+                config,
+            })
         case 'aws-bedrock':
         case 'anthropic':
             return createAnthropicProviderConfig({
```

(The remaining changed files in this commit are not shown in this excerpt.)
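
Among the files not shown above is the new './openaicompatible'
provider module that the import in create-provider.ts refers to. Based
purely on how it is called there and on the fields the tests above
check, a deliberately rough, hypothetical sketch of its exported
factory might look like the following; the option and return types are
assumptions, not the module's real code:

```ts
// Deliberately rough sketch of './openaicompatible' based only on its call
// sites above; every type and default here is an assumption, not real code.
interface OpenAICompatibleOptions {
    client: unknown
    model: string | null
    timeouts: unknown
    authStatus: unknown
    config: unknown
}

interface ProviderConfigLike {
    identifier: string
    model: string
}

export function createProviderConfig(options: OpenAICompatibleOptions): ProviderConfigLike {
    return {
        identifier: 'openaicompatible',
        // Fall back to a StarChat default when no model is configured (assumed behavior).
        model: options.model ?? 'starchat-16b-beta',
    }
}
```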
