
[FEATURE] Support BLOOMZ #815

Closed
zhanyuanucb opened this issue Dec 16, 2022 · 5 comments · Fixed by #844
Labels
enhancement, good first issue

Comments

@zhanyuanucb
Contributor

zhanyuanucb commented Dec 16, 2022

System information

  • Alpa version: v0.2.2
  • Are you willing to contribute it (Yes/No): Yes

Describe the new feature and the current behavior/state
The BLOOMZ model family is available here. It uses the same model architecture as BLOOM, so it is easy to extend the llm_serving example to serve BLOOMZ.

Will this change the current API? How?
At a minimum, these lines need to change; a sketch of one possible extension follows the fork link below.

def get_config(name, **kwargs):
    if name == "bloom-560m":
        config = BloomConfig(
            hidden_size=1024, n_head=16, num_hidden_layers=24,
            pretraining_tp=1, use_cache=True
        )
    elif name == "bloom-1b1":
        config = BloomConfig(
            hidden_size=1536, n_head=16, num_hidden_layers=24,
            pretraining_tp=1, use_cache=True
        )
    elif name == "bloom-1b7":
        config = BloomConfig(
            hidden_size=2048, n_head=16, num_hidden_layers=24,
            pretraining_tp=2, use_cache=True
        )
    elif name == "bloom-3b":
        config = BloomConfig(
            hidden_size=2560, n_head=32, num_hidden_layers=30,
            pretraining_tp=4, use_cache=True
        )
    elif name == "bloom-7b1":
        config = BloomConfig(
            hidden_size=4096, n_head=32, num_hidden_layers=30,
            pretraining_tp=4, use_cache=True
        )
    elif name == "bloom":
        config = BloomConfig(
            hidden_size=14336, n_head=112, num_hidden_layers=70,
            pretraining_tp=4, use_cache=True
        )
    elif name == "bloom-debug":
        config = BloomConfig(
            hidden_size=1024, n_head=16, num_hidden_layers=8,
            pretraining_tp=4, use_cache=True
        )
    else:
        raise ValueError()

I've tested bloomz-560m to bloomz-7b1 in my fork:

https://github.com/zhanyuanucb/alpa/blob/e196638768392d22d55af41ada6f85b07abe69c4/examples/llm_serving/model/bloom_model.py#L536-L557
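
For illustration, the change amounts to something like the sketch below (illustrative names, not the exact diff in the fork above): alias each bloomz-* name to its bloom-* counterpart and reuse the existing configs, since every BLOOMZ checkpoint shares its BLOOM counterpart's dimensions.

# Sketch: reuse the BLOOM configs for the BLOOMZ names, since each BLOOMZ
# checkpoint shares the architecture and dimensions of its BLOOM counterpart.
# BLOOMZ_ALIASES and get_bloomz_config are illustrative names.
BLOOMZ_ALIASES = {
    "bloomz-560m": "bloom-560m",
    "bloomz-1b1": "bloom-1b1",
    "bloomz-1b7": "bloom-1b7",
    "bloomz-3b": "bloom-3b",
    "bloomz-7b1": "bloom-7b1",
    "bloomz": "bloom",
}

def get_bloomz_config(name, **kwargs):
    """Resolve a bloomz-* name to its bloom-* twin, then reuse get_config."""
    return get_config(BLOOMZ_ALIASES.get(name, name), **kwargs)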

Describe alternatives you've considered

Additional context

@zhuohan123
Member

Hi! Please feel free to submit a PR for your proposed change.

@zhanyuanucb
Contributor Author

I haven't tested the full-size BLOOMZ model; I can do that later. In the meantime, if anyone tests it, feel free to post your findings here.

@zhisbug
Copy link
Member

zhisbug commented Dec 19, 2022

Can you submit a PR from your fork?

@merrymercy merrymercy changed the title Support BLOOMZ [FEATURE] Support BLOOMZ Dec 20, 2022
@merrymercy merrymercy added enhancement New feature good first issue Good for newcomers labels Dec 20, 2022
@zhanyuanucb
Contributor Author

@zhuohan123 @zhisbug
I found that the deployment is not stable: it sometimes fails and gets stuck after downloading the model weights. I need to look into this.

@zhanyuanucb
Contributor Author

@zhuohan123 @zhisbug
It seems the deployment failure only happens for bloomz-7b1, and it is caused by an unexpected exit in this function:

def download_weights(model_name, path):
    """Download weights from huggingface."""
    if "opt" in model_name:
        hf_model_name = "facebook/" + model_name
        model_class = OPTForCausalLM
    elif "bloom" in model_name:
        hf_model_name = "bigscience/" + model_name
        model_class = BloomForCausalLM

    print(f"Load the pre-trained pytorch weights of {model_name} from huggingface. "
          f"The downloading and cpu loading can take dozens of minutes. "
          f"If it seems to get stuck, you can monitor the progress by "
          f"checking the memory usage of this process.")

    disable_torch_init()
    model = model_class.from_pretrained(hf_model_name, torch_dtype=torch.float16,
                                        _fast_init=True)
    restore_torch_init()

    os.makedirs(path, exist_ok=True)

    print(f"Convert the weights to alpa format under {path} ...")
    if "opt" in model_name:
        for name, param in tqdm(list(model.model.named_parameters())):
            name = name.replace("decoder.final_layer_norm", "decoder.layer_norm")
            param_path = os.path.join(path, name)
            with open(param_path, "wb") as f:
                np.save(f, param.cpu().detach().numpy())
    elif "bloom" in model_name:
        for name, param in tqdm(list(model.transformer.named_parameters())):
            param_path = os.path.join(path, name)
            with open(param_path, "wb") as f:
                np.save(f, param.cpu().detach().numpy())

More specifically, the exit happens between line 600 and line 605: I saw Load the pre-trained pytorch weights of printed in the log, but Convert the weights to alpa format under was not printed. I couldn't find any other error messages in the logs, and resources should not be the bottleneck, since I could serve bloomz-3b and bloom-7b1.

If I run download_weights("bloomz-7b1", "/models/bloomz-7b1-np") separately, the weight conversion completes successfully, and once I have the converted weights, model serving works without problems.
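
For anyone hitting the same failure, the separate run can be wrapped in a small standalone script (a sketch; the import path and default paths are illustrative and depend on your checkout):

# Pre-convert the weights outside the serving process, then point the
# server at the converted directory. The import path below is illustrative.
import argparse

from llm_serving.model.wrapper import download_weights

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="bloomz-7b1")
    parser.add_argument("--path", default="/models/bloomz-7b1-np")
    args = parser.parse_args()
    download_weights(args.model, args.path)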

Could this be related to some timeout in Ray? It would be helpful if someone could point me to the relevant part of the code.
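
In the meantime, one way I plan to narrow this down (a debugging sketch, untested): enable faulthandler before the from_pretrained call, so that a hard crash or an external kill dumps Python tracebacks instead of exiting silently.

import faulthandler
import signal
import sys

# Dump tracebacks for all threads on a fatal signal (e.g. a segfault inside
# torch), and also on SIGTERM in case a supervisor such as Ray kills the
# worker after a timeout.
faulthandler.enable(file=sys.stderr, all_threads=True)
faulthandler.register(signal.SIGTERM, file=sys.stderr, all_threads=True)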
