Implementing RAG, some questions on llama.cpp #12125
gnusupport started this conversation in General
While working with Emacs Lisp, I have so far implemented splitting text into chunks and generating embeddings for them with the embeddings model.
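For reference, the embedding step above can be driven over HTTP. This is only a sketch: it assumes a local llama.cpp server started with `llama-server --embeddings` listening on port 8080, and the chunk text is a placeholder. The request body is built with `jq`; the actual `curl` call is left commented out.

```shell
# One chunk of text produced by the splitter (placeholder content).
CHUNK="Example chunk of text produced by the splitter."

# Build the JSON request body for llama.cpp's /embedding endpoint.
BODY=$(jq -n --arg c "$CHUNK" '{content: $c}')
echo "$BODY"

# Send it to the assumed local server; the response carries the vector:
# curl -s http://localhost:8080/embedding \
#      -H "Content-Type: application/json" -d "$BODY"
```

The returned vector can then be written into the pgvector column as-is.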
The embeddings are stored in a PostgreSQL database with the pgvector extension, and searching by embeddings works well. From the search results I can quickly list documents, people, or whatever else I am looking for; all of that is handled by PostgreSQL.
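The nearest-neighbour lookup described above might look roughly like this in pgvector; the table and column names here are made up for illustration:

```sql
-- Hypothetical schema: chunks(id, doc, body, embedding vector(768)).
-- <=> is pgvector's cosine-distance operator; $1 is the query embedding.
SELECT doc, body
FROM chunks
ORDER BY embedding <=> $1
LIMIT 5;
```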
And then, from what I have learned, I am supposed to insert that retrieved information into the context of the LLM prompt in order to get the RAG functionality.
Sure, I have some idea how to do that with curl, or from Emacs Lisp, over the API endpoint.
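That injection step via curl could be sketched as follows. Everything here is an assumption for illustration: the chunks and question are placeholders, and the URL assumes a local llama.cpp server exposing the OpenAI-compatible chat endpoint. The retrieved chunks go into a system message; the `curl` call itself is left commented out.

```shell
QUESTION="Who maintains the project?"
# Placeholder for chunks retrieved from the pgvector search.
CONTEXT="Chunk 1: ...text retrieved from pgvector...
Chunk 2: ...text retrieved from pgvector..."

# Build the chat request: context in a system message, question as the user turn.
BODY=$(jq -n --arg ctx "$CONTEXT" --arg q "$QUESTION" '{
  messages: [
    {role: "system",
     content: ("Answer using only the context below.\n\n" + $ctx)},
    {role: "user", content: $q}
  ]
}')
echo "$BODY"

# Send to the assumed local server:
# curl -s http://localhost:8080/v1/chat/completions \
#      -H "Content-Type: application/json" -d "$BODY"
```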
Though I would like to know: is there, or could there be, any way of implementing this in the background, so that I can inject the retrieved context and get the responses through the llama.cpp web UI?