This repository packages Vicunlocked-Alpaca-30B as a Truss.
Utilizing this model for inference can be challenging given the hardware requirements. With Baseten and Truss, inference is dead simple.
We found this model runs reasonably fast on A100s; you can configure your desired hardware in `config.yaml`:

```yaml
...
resources:
  cpu: "3"
  memory: 14Gi
  use_gpu: true
  accelerator: A100
...
```
Before deployment:
- Make sure you have a Baseten account and API key. You can sign up for an account on the Baseten website.
- Install Truss and the Baseten Python client: `pip install --upgrade baseten truss`
- Authenticate your development environment: `baseten login`
Deploying the Truss is easy; simply load it and push from a Python script:

```python
import baseten
import truss

# Load the Truss from the current directory and deploy it to Baseten
vicunlocked_truss = truss.load('.')
baseten.deploy(vicunlocked_truss)
```
The usual GPT-style parameters will pass right through to the inference endpoint:

```python
import baseten

model = baseten.deployed_model_id('YOUR MODEL ID')
model.predict({
    "prompt": "Write a movie plot about vicunas planning to take over the world",
    "do_sample": True,
    "max_new_tokens": 300,
})
```
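The payload passed to `predict` is a plain JSON object, so additional sampling parameters can be attached alongside the prompt. A minimal sketch (the `make_payload` helper is ours for illustration, not part of the Baseten client; parameter names follow the examples in this README):

```python
def make_payload(prompt, **generation_kwargs):
    """Build a predict payload: the prompt plus any pass-through
    generation parameters (do_sample, max_new_tokens, temperature, ...)."""
    payload = {"prompt": prompt}
    payload.update(generation_kwargs)
    return payload

payload = make_payload(
    "Write a movie plot about vicunas planning to take over the world",
    do_sample=True,
    max_new_tokens=300,
    temperature=0.3,
)
```

The resulting dict can be handed directly to `model.predict(payload)`.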
You can also invoke your model via a REST API:

```shell
curl -X POST "https://app.baseten.co/models/YOUR_MODEL_ID/predict" \
     -H "Content-Type: application/json" \
     -H "Authorization: Api-Key {YOUR_API_KEY}" \
     -d '{
           "prompt": "Write a movie plot about vicunas planning to take over the world",
           "do_sample": true,
           "max_new_tokens": 300,
           "temperature": 0.3
         }'
```
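The same REST call can be made from Python with only the standard library. This is a sketch: the `build_request` and `predict` helpers are hypothetical names of ours, with the endpoint, headers, and payload copied from the curl command above.

```python
import json
import urllib.request

BASE_URL = "https://app.baseten.co/models/{model_id}/predict"

def build_request(model_id, api_key, payload):
    """Assemble a POST request mirroring the curl invocation above."""
    return urllib.request.Request(
        BASE_URL.format(model_id=model_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {api_key}",
        },
        method="POST",
    )

def predict(model_id, api_key, payload):
    """Send the request and decode the JSON response."""
    req = build_request(model_id, api_key, payload)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `predict("YOUR_MODEL_ID", "YOUR_API_KEY", {"prompt": "...", "max_new_tokens": 300})`.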