Run Petals server on Windows
You can use WSL or Docker to run Petals on Windows. In this guide, we will show how to set up Petals on WSL (Windows Subsystem for Linux).
- In a Windows admin console, install WSL:

  ```
  wsl --install
  ```
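  Petals needs WSL 2, since GPU access in WSL requires it. To check which WSL version and distribution you got (standard `wsl.exe` flags on recent Windows builds):

  ```
  wsl --status
  wsl -l -v
  ```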
- Open WSL, check that GPUs are available:

  ```
  nvidia-smi
  ```
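  If the full table is noisy, you can query just the GPU name and memory, which roughly determines how many model blocks your server can host:

  ```
  nvidia-smi --query-gpu=name,memory.total --format=csv
  ```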
- In WSL, install basic Python stuff:

  ```
  sudo apt update
  sudo apt install python3-pip python-is-python3
  ```
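  A quick sanity check that `python` now points at Python 3 and pip works:

  ```
  python --version
  python -m pip --version
  ```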
- Then, install Petals:

  ```
  python -m pip install git+https://github.com/bigscience-workshop/petals
  ```
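  Before starting a server, you can check that the package imports and that PyTorch (installed as a Petals dependency) sees your GPU; this is just a local sanity check, not an official Petals command:

  ```
  python -c "import petals, torch; print(torch.cuda.is_available())"
  ```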
- Run the Petals server:

  ```
  python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b
  ```
  This will host a part of LLaMA-65B with optional Guanaco adapters on your machine. You can also host `meta-llama/Llama-2-70b-hf`, `meta-llama/Llama-2-70b-chat-hf`, `bigscience/bloom`, `bigscience/bloomz`, and other compatible models from the 🤗 Model Hub, or add support for new model architectures.

  🦙 Want to host LLaMA 2? Request access to its weights at the ♾️ Meta AI website and 🤗 Model Hub, generate an 🔑 access token, then add the `--token YOUR_TOKEN` argument to the commands above.

  If you want to share multiple GPUs, you should run a separate Petals server for each. Open a separate WSL console for each GPU, then run this in the first console:

  ```
  CUDA_VISIBLE_DEVICES=0 python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b
  ```

  Do the same in the other consoles, replacing `CUDA_VISIBLE_DEVICES=0` with `CUDA_VISIBLE_DEVICES=1`, `CUDA_VISIBLE_DEVICES=2`, etc.
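  If you would rather drive all GPUs from a single console, here is a minimal sketch using a shell loop and background processes (the log file names are just an example):

  ```
  # Launch one Petals server per GPU in the background, logging each separately.
  for GPU in 0 1; do
    CUDA_VISIBLE_DEVICES=$GPU nohup python -m petals.cli.run_server enoch/llama-65b-hf \
      --adapters timdettmers/guanaco-65b > "server_gpu$GPU.log" 2>&1 &
  done
  ```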
- Once all blocks are loaded, check that your server is listed at https://health.petals.dev/

Petals uses NAT traversal via relays by default, but you can make your server directly reachable if your computer has a public IP address. We recommend doing this when possible, since it allows other peers to connect to your server significantly faster. To set this up:
- In WSL, find out the IP address of your WSL container (`172.X.X.X`):

  ```
  sudo apt install net-tools
  ifconfig
  ```
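  Alternatively, you can usually get the address without installing `net-tools`, since default Ubuntu WSL images already ship these tools:

  ```
  # Either command prints the container's IPv4 address (eth0 is the default WSL 2 interface).
  hostname -I
  ip -4 addr show eth0
  ```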
-
Allow traffic to be routed into the WSL container (replace
172.X.X.X
with your actual IP):netsh interface portproxy add v4tov4 listenport=31330 listenaddress=0.0.0.0 connectport=31330 connectaddress=172.X.X.X
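  To verify the rule took effect, or to remove it later, `netsh` also provides:

  ```
  netsh interface portproxy show v4tov4
  netsh interface portproxy delete v4tov4 listenport=31330 listenaddress=0.0.0.0
  ```

  Note that the WSL container's IP address can change after a reboot, so you may need to recreate this rule with the new address.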
- Set up your firewall (e.g., Windows Defender) to allow incoming traffic from the outside world on port 31330/tcp.
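  If you prefer the command line to the Windows Defender UI, an equivalent inbound rule can be added from an admin console (the rule name is arbitrary):

  ```
  netsh advfirewall firewall add rule name="Petals 31330" dir=in action=allow protocol=TCP localport=31330
  ```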
- If you have a router, set it up to forward connections from the outside world (port 31330/tcp) to your computer (port 31330/tcp).
- Run the Petals server with the `--port 31330` parameter:

  ```
  python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b --port 31330
  ```
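  Once the server is up, you can confirm inside WSL that it is listening on the expected port:

  ```
  ss -tlpn | grep 31330
  ```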
- Ensure that the server prints `This server is available directly` (not `via relays`) after startup.
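  As an extra end-to-end check, from a machine outside your network you can test that the port is reachable (replace YOUR_PUBLIC_IP with your address; assumes netcat is installed there):

  ```
  nc -vz YOUR_PUBLIC_IP 31330
  ```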
- I get this error on WSL: `hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others`. What should I do?

  Petals needs the clocks on all nodes to be synchronized. Please set the date using an NTP server:

  ```
  ntpdate pool.ntp.org
  ```
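  On a stock Ubuntu WSL image, `ntpdate` may not be preinstalled, and setting the clock requires root, so the full sequence is typically:

  ```
  sudo apt install ntpdate
  sudo ntpdate pool.ntp.org
  ```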
- I get this error: `torch.cuda.OutOfMemoryError: CUDA out of memory`. What should I do?

  If you use an Anaconda env, run this before starting the server:

  ```
  export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
  ```

  If you use Docker, add this argument after `--rm` in the Docker command:

  ```
  -e "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128"
  ```
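  Equivalently, outside Docker you can scope the setting to the server process alone instead of exporting it:

  ```
  PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python -m petals.cli.run_server enoch/llama-65b-hf
  ```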