Update Mistral LLMs notebook
- Add instructions to request access to the model (now required for
Mistral)
- Reduce `sequence_length` so the notebook is more accessible; I got an OOM
  error even on a laptop with 32 GB of RAM
caioaao committed Oct 17, 2024
1 parent 39f44c7 commit 94751b7
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions notebooks/llms.livemd
@@ -90,8 +90,11 @@ Nx.Serving.batched_run(Llama, prompt) |> Enum.each(&IO.write/1)

We can easily test other LLMs; we just need to change the repository and possibly adjust the prompt template. In this example we run the [Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) model.

+Just like Llama, Mistral now requires users to request access to its models. Make sure you have been granted access, then generate a [Hugging Face auth token](https://huggingface.co/settings/tokens) and store it in an `HF_TOKEN` Livebook secret.
+
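Livebook exposes secrets to the notebook as environment variables prefixed with `LB_`, which is why the `HF_TOKEN` secret is read as `LB_HF_TOKEN` in the cell that follows. As a minimal sketch, assuming you would rather get a readable message than a raised exception when the secret is missing, the lookup could be wrapped as shown here. The `HFTokenSketch` module and its function names are hypothetical and are not part of Bumblebee or Livebook.

```elixir
defmodule HFTokenSketch do
  # Hypothetical helper for illustration; not part of Bumblebee or Livebook.

  # Returns {:ok, token} when the variable is set and non-empty,
  # otherwise {:error, message} describing what is missing.
  def fetch(env_name \\ "LB_HF_TOKEN") do
    case System.fetch_env(env_name) do
      {:ok, token} when token != "" ->
        {:ok, token}

      _missing_or_empty ->
        {:error, "#{env_name} is not set; add the secret in Livebook and enable it for this notebook"}
    end
  end

  # Builds the repository tuple in the same shape this notebook uses.
  def repo(model, token), do: {:hf, model, auth_token: token}
end
```

Calling `HFTokenSketch.fetch()` and matching on `{:ok, token}` lets the notebook report a missing secret before any download is attempted; the resulting token can then be passed to `HFTokenSketch.repo("mistralai/Mistral-7B-Instruct-v0.2", token)`.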
```elixir
-repo = {:hf, "mistralai/Mistral-7B-Instruct-v0.2"}
+hf_token = System.fetch_env!("LB_HF_TOKEN")
+repo = {:hf, "mistralai/Mistral-7B-Instruct-v0.2", auth_token: hf_token}

{:ok, model_info} = Bumblebee.load_model(repo, type: :bf16, backend: EXLA.Backend)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
@@ -109,7 +112,7 @@ generation_config =

serving =
  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
-    compile: [batch_size: 1, sequence_length: 1028],
+    compile: [batch_size: 1, sequence_length: 512],
    stream: true,
    defn_options: [compiler: EXLA]
  )