Code showing when running. #717

betolley opened this issue Apr 2, 2023 · 4 comments

Comments

betolley commented Apr 2, 2023

When I start chat.exe with an alpaca bin, I get:
main: seed = 1680456908
llama_model_load: loading model from 'models/llama-7B/ggml-model.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 3
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 4820.95 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 6613.03 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'models/llama-7B/ggml-model.bin'
llama_model_load: model size = 4820.52 MB / num tensors = 291
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0

== Running in interactive mode. ==

  • Press Ctrl+C to interject at any time.
  • Press Return to return control to LLaMa.
  • If you want to submit another line, end your input in '\'.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace _1.Write_a_program_to_find_the_minimum_element_in_an_array
{
    class Program
    {
        static void Main()
        {
            //Create an array of five elements
            int[] arr = new int[5];

            //Fill the array with random values between 0 and 10

Belluxx commented Apr 2, 2023

What are you trying to achieve? If you want to chat, you must provide an example/context for that. If you want a story, you must introduce it before starting the inference.

For example, try adding this argument: -p "Once upon a time"
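
As a concrete sketch (the -p and -i flags appear in this thread; the ./main binary name and the -m model option are assumptions based on typical llama.cpp builds of that time, and may differ from chat.exe on Windows):

    ./main -m models/llama-7B/ggml-model.bin -i -p "Once upon a time"

With a prompt the model continues your text instead of free-running, and -i keeps interactive mode so you can keep interjecting.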

prusnak (Collaborator) commented Apr 2, 2023

Yes, you forgot to set a prompt, so Llama just came up with text completely on its own! :-)

betolley (Author) commented Apr 2, 2023

I was used to alpaca, where I didn't have to. When I run llama with -i, it did this also.

Belluxx commented Apr 2, 2023

That's because alpaca.cpp adds context without showing you; it's a "wrapper" for the alpaca fine-tuning.
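
For illustration, a sketch of the kind of hidden context such a wrapper prepends before your input (this is the standard Alpaca instruction template; it is an assumption here, not quoted from alpaca.cpp's source, and "{your input}" is a placeholder):

    Below is an instruction that describes a task. Write a response that appropriately completes the request.

    ### Instruction:
    {your input}

    ### Response:

With plain llama.cpp you would have to supply something like this yourself, e.g. via the -p argument mentioned above.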

prusnak closed this as not planned on Apr 2, 2023
Deadsg pushed a commit to Deadsg/llama.cpp that referenced this issue Dec 19, 2023