
server : add VSCode's Github Copilot Chat support #12896


Merged

merged 2 commits into master from gg/vscode-integration on Apr 11, 2025

Conversation

@ggerganov (Member) commented Apr 11, 2025

Overview

VSCode recently added support for using local models with GitHub Copilot Chat:

https://code.visualstudio.com/updates/v1_99#_bring-your-own-key-byok-preview

This PR makes llama-server compatible with this feature.

Usage

  • Start a llama-server on port 11434 with an instruct model of your choice (a quick curl check to verify the server is reachable is sketched after this list). For example, using Qwen 2.5 Coder Instruct 3B:

    # downloads ~3GB of data
    
    llama-server \
        -hf ggml-org/Qwen2.5-Coder-3B-Instruct-Q8_0-GGUF \
        --port 11434 -fa -ngl 99 -c 0
  • In VSCode -> Chat -> Manage models -> select "Ollama" (not sure why it is called like this):

    [screenshot]

  • Select the available model from the list and click "OK":

    [screenshot]

  • Enjoy local AI assistance using vanilla llama.cpp:

    [screenshot]

  • Advanced context reuse for faster prompt reprocessing can be enabled by adding --cache-reuse 256 to the llama-server command

  • Speculative decoding is also supported. For example, start llama-server like this:

    llama-server \
        -m  ./models/qwen2.5-32b-coder-instruct/ggml-model-q8_0.gguf \
        -md ./models/qwen2.5-1.5b-coder-instruct/ggml-model-q4_0.gguf \
        --port 11434 -fa -ngl 99 -ngld 99 -c 0 --cache-reuse 256
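
To double-check the setup before configuring VSCode, you can poke the server directly with curl, as mentioned in the first step above. This is a minimal sketch assuming the stock llama-server HTTP API on port 11434; the /health and /v1/models routes are regular llama-server endpoints, not something added by this PR:

    # quick reachability check of the llama-server instance
    curl http://localhost:11434/health

    # list the model(s) exposed through the OpenAI-compatible API
    curl http://localhost:11434/v1/models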

@ggerganov ggerganov merged commit c94085d into master Apr 11, 2025
50 checks passed
@ggerganov ggerganov deleted the gg/vscode-integration branch April 11, 2025 20:37
@ExtReMLapin (Contributor)

select "Ollama" (not sure why it is called like this):

Sounds like someone just got Edison'd 🤡

@ericcurtin (Collaborator) commented Apr 16, 2025

There are a lot of tools like this that work but don't explicitly mention llama.cpp; open-webui is another one (ramalama serve is just vanilla llama-server, but we try to make it easier to use and easier to pull accelerator runtimes and models):

https://github.com/open-webui/docs/pull/455/files

In RamaLama we are going to create a proxy that forks llama-server processes and mimics Ollama, to make everyday use of plain llama-server even easier.

With most tools, if you select a generic OpenAI endpoint, llama-server works.
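
For illustration, here is what such a generic OpenAI-style request looks like against llama-server; a sketch assuming the server from the description above is still running on port 11434, with the model name in the body being just a placeholder (llama-server serves whatever model it was started with):

    curl http://localhost:11434/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{
              "model": "placeholder",
              "messages": [
                {"role": "user", "content": "Write a hello world program in C"}
              ]
            }'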

colout pushed a commit to colout/llama.cpp that referenced this pull request Apr 21, 2025
* server : add VSCode's Github Copilot Chat support

* cont : update handler name
@kabakaev

@ggerganov, it seems the GET /api/tags API is missing.

At least, my vscode-insiders with github.copilot version 1.308.1532 (updated 2025-04-25, 18:46:22) requests /api/tags and gets an HTTP 404 response.
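
For reference, this can be reproduced from the command line; a sketch assuming the server is still on port 11434 (the expected payload would be Ollama's documented /api/tags format, i.e. a JSON object with a "models" array):

    # llama-server listening on port 11434
    curl -i http://localhost:11434/api/tags
    # currently answers 404; a real Ollama instance would reply with JSON
    # of the form {"models":[{"name":"...","model":"...", ...}]}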

@ggerganov (Member, Author)

It's probably some new logic - should be easy to add support. Feel free to open a PR if you are interested.

@theoparis

This seems to be broken now. When I open the model selection dialog, it shows no models, with the following error in the logs:

srv  log_server_r: request: GET /api/version 127.0.0.1 404

I used the same command mentioned initially: llama-server -hf ggml-org/Qwen2.5-Coder-3B-Instruct-Q8_0-GGUF --port 11434 -fa -ngl 99 -c 0
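
A quick way to see which of the Ollama-style discovery routes mentioned in this thread the server currently answers is to probe them directly; a small sketch assuming the same instance on port 11434:

    for route in /api/version /api/tags; do
        code=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:11434$route")
        echo "$route -> $code"
    done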
