Classification Pipeline #11
Comments
Hi, classification should be just another linear layer at the end, once the embeddings are done. In the model conversion script, replace AutoModel with BertForSequenceClassification (or whichever classification model you are using). During this step, check the name of the classification layer and modify the C++ to load and rename that layer. Then, in the model eval code, take out the mean pooling and embedding normalization and add a multiplication with the classification matrix. I think for classification you take the embedding of the first token and discard everything else. |
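On the eval side, a minimal ggml sketch of that change might look like the following. This is an assumption of how it could be wired into bert.cpp, not code from the repo; `inpL` stands for the final hidden-states tensor of shape `[n_embd, n_tokens]`, and `model.cls_w` / `model.cls_b` are hypothetical names for the classifier weights and bias added to the loader.

```cpp
// sketch: replace mean pooling + normalization with a classification head
// take the embedding of the first ([CLS]) token and discard the rest
struct ggml_tensor * cls = ggml_view_1d(ctx0, inpL, n_embd, 0);

// logits = cls_w * cls + cls_b   (cls_w shape: [n_embd, n_labels])
struct ggml_tensor * logits = ggml_add(ctx0,
    ggml_mul_mat(ctx0, model.cls_w, cls),
    model.cls_b);

ggml_build_forward_expand(&gf, logits);
```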
Thanks. Apparently BERT uses tanh for the pooler. In your code you have defined the pooler as follows
This may make sense if you only need to calculate the embeddings. But in my case, there is a tanh activation after the pooler. I have to load the pooler, apply tanh, apply dropout, and then apply the classification layer. This is what my code looks like.
Is this correct? Apparently there are no ggml_tanh and ggml_dropout functions in the ggml library. Do I have to implement them from scratch? Also, I didn't find dropout anywhere in the code, although it is mentioned in the model specification. Is it computed implicitly? |
Dropout is only used during training, so you can just ignore it when implementing inference. Tanh is indeed missing from ggml. You could either try to implement it as an operator in ggml or use |
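For reference, one possible route (my assumption, not necessarily what was being suggested above) is ggml's custom-map hook, if the ggml revision you build against exposes `ggml_map_unary_f32`:

```cpp
#include <cmath>

// element-wise tanh as a user-supplied op (sketch, assuming ggml_map_unary_f32
// is available in your ggml version)
static void tanh_op_f32(const int n, float * dst, const float * src) {
    for (int i = 0; i < n; ++i) {
        dst[i] = tanhf(src[i]);
    }
}

// in the eval graph: y = tanh(x)
struct ggml_tensor * y = ggml_map_unary_f32(ctx0, x, tanh_op_f32);
```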
Thanks. This is very useful. Just for context, I am trying to convert prajjwal1/bert-tiny. It has a layer "bert.embeddings.position_ids". After reading this layer and before reading "bert.embeddings.word_embeddings.weight", the following line gives a segmentation fault.
Am I supposed to load position_ids in some different way? Could you please give this model a try, by any chance? |
position_ids is just a vector with the indices 0,1,2,3... For a CPU implementation it doesn't really make sense to store it as a parameter of the model. It's done that way in PyTorch for GPU memory-related reasons. But it's just a for loop. |
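In other words, something like this sketch is all the C++ side needs (`n_tokens` being the length of the tokenized input):

```cpp
// position ids are simply 0, 1, 2, ..., n_tokens-1
std::vector<int32_t> position_ids(n_tokens);
for (int i = 0; i < n_tokens; ++i) {
    position_ids[i] = i;
}
```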
Sure. But if I skip them, I get the following error.
If I don't, then I get a segmentation fault. Am I missing something? |
While creating the model file, don't store the tensors you don't load. Or modify the loading code to ignore the error. |
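If you go the second route, a hedged sketch of how the tensor-loading loop in bert.cpp could skip names it doesn't recognize (instead of bailing out) might look like this; `nbytes` is assumed to be computed from the dims and type already read from the file header:

```cpp
// sketch: skip serialized tensors that the model struct doesn't register,
// e.g. "bert.embeddings.position_ids", instead of treating them as an error
if (model.tensors.find(name) == model.tensors.end()) {
    fprintf(stderr, "%s: skipping unknown tensor '%s'\n", __func__, name.c_str());
    fin.seekg(nbytes, std::ios::cur); // jump over this tensor's data
    continue;
}
```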
Got it. You had skipped them using "embeddings.position_ids". I had to change it to "bert.embeddings.position_ids" in the ggml conversion script. Can I skip the pooler weights as well, as you mentioned in your first reply? Or do I have to implement them, including tanh? |
The pooler weights are skipped in bert.cpp because sbert.net doesn't use the vanilla BERT pooler. For classification, I think you should use the vanilla pooler, so that includes the learned pooler weights and the tanh step. |
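So, continuing the earlier sketch, the vanilla pooler would sit between the [CLS] embedding and the classifier: a dense layer plus tanh. Here `model.pooler_w` / `model.pooler_b` are assumed names for the loaded pooler tensors, `cls` is the first-token embedding from before, and `tanh_op_f32` is the custom map function sketched above; the classifier matmul then takes `pooled` instead of `cls`.

```cpp
// vanilla BERT pooler (sketch): pooled = tanh(pooler_w * cls + pooler_b)
struct ggml_tensor * pooled = ggml_map_unary_f32(ctx0,
    ggml_add(ctx0, ggml_mul_mat(ctx0, model.pooler_w, cls), model.pooler_b),
    tanh_op_f32);
```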
Thanks, man! I was able to get it working by temporarily using relu instead of tanh.
Does this look like correct code? At least it is working with your server and client implementation. I will implement tanh later, after your comments. |
The code looks at least reasonable. It's probably close to being right. Compare the outputs with the original, I guess? |
I am not sure. I replaced these lines with the code above.
Is this correct? |
It is giving correct results. Thank you for your help! |
@okpatil4u I want to use classification models as well. Are your changes public? I would like to work with them. |
@okpatil4u pinging you again: I want to use classification models as well. Are your changes public? I would like to work with them. |
The reason I didn't make it public is that it doesn't work. There is at least a 25% accuracy drop compared with the same model in huggingface/transformers. We ended up rebuilding it from scratch in huggingface/candle. You are welcome to try it.
|
Thank you for the update. |
Would you mind sharing the source modifications you made? I would like to troubleshoot. |
Is it possible to retrofit a BERT classification model into this code?
Can you please provide some guidelines so that I can take care of it myself?
Thanks in advance.