[Doc] Replace ibm-fms with ibm-ai-platform (vllm-project#12709)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Linkun Chen <github@lkchen.net>
tdoublep authored and lk-chen committed Mar 5, 2025
1 parent 996fde9 commit 3a553d5
Showing 5 changed files with 10 additions and 10 deletions.
12 changes: 6 additions & 6 deletions docs/source/features/spec_decode.md
@@ -131,7 +131,7 @@ sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(
model="meta-llama/Meta-Llama-3.1-70B-Instruct",
tensor_parallel_size=4,
-speculative_model="ibm-fms/llama3-70b-accelerator",
+speculative_model="ibm-ai-platform/llama3-70b-accelerator",
speculative_draft_tensor_parallel_size=1,
)
outputs = llm.generate(prompts, sampling_params)
@@ -149,11 +149,11 @@ limitation will be fixed in a future release.

A variety of speculative models of this type are available on HF hub:

-- [llama-13b-accelerator](https://huggingface.co/ibm-fms/llama-13b-accelerator)
-- [llama3-8b-accelerator](https://huggingface.co/ibm-fms/llama3-8b-accelerator)
-- [codellama-34b-accelerator](https://huggingface.co/ibm-fms/codellama-34b-accelerator)
-- [llama2-70b-accelerator](https://huggingface.co/ibm-fms/llama2-70b-accelerator)
-- [llama3-70b-accelerator](https://huggingface.co/ibm-fms/llama3-70b-accelerator)
+- [llama-13b-accelerator](https://huggingface.co/ibm-ai-platform/llama-13b-accelerator)
+- [llama3-8b-accelerator](https://huggingface.co/ibm-ai-platform/llama3-8b-accelerator)
+- [codellama-34b-accelerator](https://huggingface.co/ibm-ai-platform/codellama-34b-accelerator)
+- [llama2-70b-accelerator](https://huggingface.co/ibm-ai-platform/llama2-70b-accelerator)
+- [llama3-70b-accelerator](https://huggingface.co/ibm-ai-platform/llama3-70b-accelerator)
- [granite-3b-code-instruct-accelerator](https://huggingface.co/ibm-granite/granite-3b-code-instruct-accelerator)
- [granite-8b-code-instruct-accelerator](https://huggingface.co/ibm-granite/granite-8b-code-instruct-accelerator)
- [granite-7b-instruct-accelerator](https://huggingface.co/ibm-granite/granite-7b-instruct-accelerator)
2 changes: 1 addition & 1 deletion examples/offline_inference/mlpspeculator.py
@@ -51,7 +51,7 @@ def time_generation(llm: LLM, prompts: List[str],
# Create an LLM with spec decoding
llm = LLM(
model="meta-llama/Llama-2-13b-chat-hf",
-speculative_model="ibm-fms/llama-13b-accelerator",
+speculative_model="ibm-ai-platform/llama-13b-accelerator",
)

print("With speculation")
2 changes: 1 addition & 1 deletion tests/models/registry.py
@@ -278,7 +278,7 @@ def check_available_online(
"MedusaModel": _HfExamplesInfo("JackFram/llama-68m",
speculative_model="abhigoyal/vllm-medusa-llama-68m-random"), # noqa: E501
"MLPSpeculatorPreTrainedModel": _HfExamplesInfo("JackFram/llama-160m",
-speculative_model="ibm-fms/llama-160m-accelerator"), # noqa: E501
+speculative_model="ibm-ai-platform/llama-160m-accelerator"), # noqa: E501
}

_FALLBACK_MODEL = {
2 changes: 1 addition & 1 deletion tests/spec_decode/e2e/test_mlp_correctness.py
@@ -33,7 +33,7 @@
MAIN_MODEL = "JackFram/llama-160m"

# speculative model
-SPEC_MODEL = "ibm-fms/llama-160m-accelerator"
+SPEC_MODEL = "ibm-ai-platform/llama-160m-accelerator"

# max. number of speculative tokens: this corresponds to
# n_predict in the config.json of the speculator model.
2 changes: 1 addition & 1 deletion vllm/model_executor/models/mlp_speculator.py
@@ -64,7 +64,7 @@ class MLPSpeculator(nn.Module):
https://arxiv.org/pdf/2404.19124
Trained speculators of this type are available on HF hub at:
-https://huggingface.co/ibm-fms and https://huggingface.co/ibm-granite
+https://huggingface.co/ibm-ai-platform and https://huggingface.co/ibm-granite
"""

def __init__(self, *, vllm_config: VllmConfig, prefix: str = "") -> None:
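
For downstream configs or scripts that still reference the old organization name, the same rename could be applied programmatically. The following is only a minimal sketch: `migrate_repo_id`, `OLD_ORG`, and `NEW_ORG` are hypothetical names introduced here for illustration and are not part of vLLM or of this commit.

```python
# Hypothetical helper (not part of vLLM): rewrites a Hugging Face repo id
# from the old organization to the new one, mirroring the rename in this commit.
OLD_ORG = "ibm-fms"
NEW_ORG = "ibm-ai-platform"


def migrate_repo_id(repo_id: str) -> str:
    """Return repo_id with the old org prefix replaced by the new one.

    Ids under any other organization (e.g. ibm-granite) and strings
    without an "org/name" shape are returned unchanged.
    """
    org, sep, name = repo_id.partition("/")
    if sep and org == OLD_ORG:
        return f"{NEW_ORG}/{name}"
    return repo_id


if __name__ == "__main__":
    for rid in (
        "ibm-fms/llama-13b-accelerator",
        "ibm-granite/granite-7b-instruct-accelerator",
    ):
        print(rid, "->", migrate_repo_id(rid))
```

Matching on the full organization segment, rather than doing a plain substring replace, avoids touching ids that merely contain `ibm-fms` somewhere else in the string.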
