
[FEATURE] Support BLOOMZ #815

Closed
zhanyuanucb opened this issue Dec 16, 2022 · 5 comments · Fixed by #844
Labels
enhancement, good first issue

Comments

@zhanyuanucb
Contributor

zhanyuanucb commented Dec 16, 2022

System information

  • Alpa version: v0.2.2
  • Are you willing to contribute it (Yes/No): Yes

Describe the new feature and the current behavior/state
The BLOOMZ model family is available here. It uses the same model architecture as BLOOM, so it is easy to extend the llm_serving example to serve BLOOMZ.

Will this change the current API? How?
At a minimum, these lines need to change; a sketch of one possible extension follows the fork link below.

def get_config(name, **kwargs):
    if name == "bloom-560m":
        config = BloomConfig(
            hidden_size=1024, n_head=16, num_hidden_layers=24,
            pretraining_tp=1, use_cache=True
        )
    elif name == "bloom-1b1":
        config = BloomConfig(
            hidden_size=1536, n_head=16, num_hidden_layers=24,
            pretraining_tp=1, use_cache=True
        )
    elif name == "bloom-1b7":
        config = BloomConfig(
            hidden_size=2048, n_head=16, num_hidden_layers=24,
            pretraining_tp=2, use_cache=True
        )
    elif name == "bloom-3b":
        config = BloomConfig(
            hidden_size=2560, n_head=32, num_hidden_layers=30,
            pretraining_tp=4, use_cache=True
        )
    elif name == "bloom-7b1":
        config = BloomConfig(
            hidden_size=4096, n_head=32, num_hidden_layers=30,
            pretraining_tp=4, use_cache=True
        )
    elif name == "bloom":
        config = BloomConfig(
            hidden_size=14336, n_head=112, num_hidden_layers=70,
            pretraining_tp=4, use_cache=True
        )
    elif name == "bloom-debug":
        config = BloomConfig(
            hidden_size=1024, n_head=16, num_hidden_layers=8,
            pretraining_tp=4, use_cache=True
        )
    else:
        raise ValueError()

I've tested bloomz-560m to bloomz-7b1 in my fork:

https://github.com/zhanyuanucb/alpa/blob/e196638768392d22d55af41ada6f85b07abe69c4/examples/llm_serving/model/bloom_model.py#L536-L557
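
For illustration, the change amounts to something like the sketch below (illustrative names, not the exact diff in the fork above): alias each bloomz-* name to its bloom-* counterpart and reuse the existing configs, since every BLOOMZ checkpoint shares its BLOOM counterpart's dimensions.

# Sketch: reuse the BLOOM configs for the BLOOMZ names, since each BLOOMZ
# checkpoint shares the architecture and dimensions of its BLOOM counterpart.
# BLOOMZ_ALIASES and get_bloomz_config are illustrative names.
BLOOMZ_ALIASES = {
    "bloomz-560m": "bloom-560m",
    "bloomz-1b1": "bloom-1b1",
    "bloomz-1b7": "bloom-1b7",
    "bloomz-3b": "bloom-3b",
    "bloomz-7b1": "bloom-7b1",
    "bloomz": "bloom",
}

def get_bloomz_config(name, **kwargs):
    """Resolve a bloomz-* name to its bloom-* twin, then reuse get_config."""
    return get_config(BLOOMZ_ALIASES.get(name, name), **kwargs)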

Describe alternatives you've considered

Additional context

@zhuohan123
Member

Hi! Please feel free to submit a PR for your proposed change.

@zhanyuanucb
Contributor Author

I haven't tested the full-size BLOOMZ model; I can do that later. In the meantime, if anyone tests it, feel free to post your findings here.

@zhisbug
Copy link
Member

zhisbug commented Dec 19, 2022

Can you submit a PR from your fork?

@merrymercy merrymercy changed the title Support BLOOMZ [FEATURE] Support BLOOMZ Dec 20, 2022
@merrymercy merrymercy added enhancement New feature good first issue Good for newcomers labels Dec 20, 2022
@zhanyuanucb
Contributor Author

@zhuohan123 @zhisbug
I found that the deployment is not stable: it sometimes fails and gets stuck after downloading the model weights. I need to look into this.

@zhanyuanucb
Contributor Author

@zhuohan123 @zhisbug
It seems the deployment failure only happens for bloomz-7b1, and it is caused by an unexpected exit in this function:

def download_weights(model_name, path):
    """Download weights from huggingface."""
    if "opt" in model_name:
        hf_model_name = "facebook/" + model_name
        model_class = OPTForCausalLM
    elif "bloom" in model_name:
        hf_model_name = "bigscience/" + model_name
        model_class = BloomForCausalLM

    print(f"Load the pre-trained pytorch weights of {model_name} from huggingface. "
          f"The downloading and cpu loading can take dozens of minutes. "
          f"If it seems to get stuck, you can monitor the progress by "
          f"checking the memory usage of this process.")

    disable_torch_init()
    model = model_class.from_pretrained(hf_model_name, torch_dtype=torch.float16,
                                        _fast_init=True)
    restore_torch_init()

    os.makedirs(path, exist_ok=True)

    print(f"Convert the weights to alpa format under {path} ...")
    if "opt" in model_name:
        for name, param in tqdm(list(model.model.named_parameters())):
            name = name.replace("decoder.final_layer_norm", "decoder.layer_norm")
            param_path = os.path.join(path, name)
            with open(param_path, "wb") as f:
                np.save(f, param.cpu().detach().numpy())
    elif "bloom" in model_name:
        for name, param in tqdm(list(model.transformer.named_parameters())):
            param_path = os.path.join(path, name)
            with open(param_path, "wb") as f:
                np.save(f, param.cpu().detach().numpy())

More specifically, the exit happens between line 600 and line 605: I saw Load the pre-trained pytorch weights of printed in the log, but Convert the weights to alpa format under was not printed. I couldn't find any other error messages in the logs, and resources should not be the bottleneck, since I could serve bloomz-3b and bloom-7b1.

If I run download_weights("bloomz-7b1", "/models/bloomz-7b1-np") separately, the weight conversion completes successfully, and once I have the converted weights, model serving works without problems.
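
For anyone hitting the same failure, the separate run can be wrapped in a small standalone script (a sketch; the import path and default paths are illustrative and depend on your checkout):

# Pre-convert the weights outside the serving process, then point the
# server at the converted directory. The import path below is illustrative.
import argparse

from llm_serving.model.wrapper import download_weights

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="bloomz-7b1")
    parser.add_argument("--path", default="/models/bloomz-7b1-np")
    args = parser.parse_args()
    download_weights(args.model, args.path)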

Could this be related to some timeout in Ray? It would be helpful if someone could point me to the relevant part of the code.
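
In the meantime, one way I plan to narrow this down (a debugging sketch, untested): enable faulthandler before the from_pretrained call, so that a hard crash or an external kill dumps Python tracebacks instead of exiting silently.

import faulthandler
import signal
import sys

# Dump tracebacks for all threads on a fatal signal (e.g. a segfault inside
# torch), and also on SIGTERM in case a supervisor such as Ray kills the
# worker after a timeout.
faulthandler.enable(file=sys.stderr, all_threads=True)
faulthandler.register(signal.SIGTERM, file=sys.stderr, all_threads=True)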
