Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Classification Pipeline #11

Closed
okpatil4u opened this issue May 25, 2023 · 18 comments
Closed

Classification Pipeline #11

okpatil4u opened this issue May 25, 2023 · 18 comments

Comments

@okpatil4u
Copy link

Is it possible to retrofit BERT classification model into this code ?

Can you please provide me some guidelines so that I can take care of it myself ?

Thanks in advance.

@skeskinen
Copy link
Owner

Hi, classification should be just another linear layer at the end when the embeddings are done.

In the model conversion script, replace the automodel with BertForSequenceClassification or whichever you are using. During this step, check the name of the classification layer and modify the c++ to load and rename that layer.

Then in the model eval code, take out the mean pooling and embeddings normalization and add multiplication with the classification matrix. I think for classification you take the embeddings of the first token and discard everything else.

@okpatil4u
Copy link
Author

Thanks. Apparently Bert uses tanh for for pooler. In your code you have defined pooler as following

        // pooler
        struct ggml_tensor *sum = ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, N, 1);
        ggml_set_f32(sum, 1.0f / N);
        inpL = ggml_mul_mat(ctx0, inpL, sum);

        // normalizer
        ggml_tensor *length = ggml_sqrt(ctx0,
                                        ggml_sum(ctx0, ggml_sqr(ctx0, inpL)));
        inpL = ggml_scale(ctx0, inpL, ggml_div(ctx0, ggml_new_f32(ctx0, 1.0f), length));

        ggml_tensor *output = inpL;

This may make sense in case if you need to calculate just the embeddings. But in my case, there is a tanh activation after the pooler. I have to load the pooler, apply tanh, apply dropout and then apply classification layer. This is how my code looks like.


       // pooler
        inpL = ggml_mul_mat(ctx0, model.pooler_w, inpL);
        inpL = ggml_add(ctx0, ggml_repeat(ctx0, model.pooler_b, inpL), inpL);
        // Apply Tanh activation function
        inpL = ggml_tanh(ctx0, inpL);
        
        // Apply dropout
        inpL = ggml_dropout(ctx0, inpL, model.dropout);
        
        // Adding the classifier layer here
        inpL = ggml_mul_mat(ctx0, model.classifier_w, inpL);
        inpL = ggml_add(ctx0, ggml_repeat(ctx0, model.classifier_b, inpL), inpL);

Is this correct ? Apparently there are no ggml_tanh and ggml_dropout functions in the ggml library. Do I have to implement them from scratch ? Also, I didn't find dropout anywhere in the code, although it has been mentioned in the model specification. Is it calculated implicitly ?

@skeskinen
Copy link
Owner

Dropout is only used during training and you can just ignore it when implementing inference.

Tanh is indeed missing from ggml. You could either try to implement it as an operator in ggml or use ggml_map_unary_f32 to wrap tanh into a tensor function.

@okpatil4u
Copy link
Author

Thanks. This is very useful.

Just as a context, I am trying to convert prajjwal1/bert-tiny. It has a layer "bert.embeddings.position_ids". After reading this layer and before reading "bert.embeddings.word_embeddings.weight", the following line is giving segmentation fault.

            int64_t nelements = 1;
            int64_t ne[2] = {1, 1};
            for (int i = 0; i < n_dims; ++i)
            {
                int32_t ne_cur;
                fin.read(reinterpret_cast<char *>(&ne_cur), sizeof(ne_cur));
                ne[i] = ne_cur;
                nelements *= ne[i];
            }

Am I supposed to load position_ids in a some different way ? Can you please give this model a try by any chance ?

@skeskinen
Copy link
Owner

position_ids is just a vector with indices 0,1,2,3...
In bert.cpp, it's here: https://github.com/skeskinen/bert.cpp/blob/master/bert.cpp#L791

For a CPU impelementation it doesn't really make sense to store it as parameter to the model. It's done in pytorch like that for GPU memory related reasons. But it's just a for loop.

@okpatil4u
Copy link
Author

Sure. But if I skip them, I get following error.

bert_load_from_file: unknown tensor 'bert.embeddings.position_ids' in model file

If I don't, then I get segmentation fault. Am I missing something ?

@skeskinen
Copy link
Owner

While creating the model file, don't store the tensors you don't load. Or modify the code to ignore the error.

@okpatil4u
Copy link
Author

Got it. You had skipped them using "embeddings.position_ids". I had to change it to "bert.embeddings.position_ids" in convert ggml file.

Can I skip the pooler weights as well ? As you had mentioned in your first reply ? Or do I have to implement them including tanh ?

@skeskinen
Copy link
Owner

The pooler weights are skipped in bert.cpp, because sbert.net doesn't use the vanilla BERT pooler. For classification, I think, you should use the vanilla pooler so that includes the learned pooler weights and the tanh step.

@okpatil4u
Copy link
Author

okpatil4u commented May 29, 2023

Thanks man !

I was able to get it working by temporarily using relu instead of tanh.

        inpL = ggml_mul_mat(ctx0, model.pooler_w, inpL);
        inpL = ggml_add(ctx0, ggml_repeat(ctx0, model.pooler_b, inpL), inpL);
        // Apply Tanh activation function
        inpL = ggml_relu(ctx0, inpL);
        
        inpL = ggml_mul_mat(ctx0, model.classifier_w, inpL);
        inpL = ggml_add(ctx0, ggml_repeat(ctx0, model.classifier_b, inpL), inpL);
    
        ggml_tensor *output = inpL;

        // run the computation
        ggml_build_forward_expand(&gf, output);
        ggml_graph_compute(ctx0, &gf);

Does it seem like a correct code ? At least it is working with your server and client implementation. I will implement tanh later, after your comments.

@skeskinen
Copy link
Owner

The code looks at least reasonable. It's probably close to being right. Compare the outputs with the original, I guess?
Only thing I'd check is that afaik inpL in the beginning should be just the embeddings for the first token (CLS or BOS token). Maybe you do that already but it's not shown here.
Or maybe I'm wrong, like I said, check the original code

@okpatil4u
Copy link
Author

I am not sure. I replaced these lines with the code above.

        inpL = ggml_cont(ctx0, ggml_transpose(ctx0, inpL));
        // pooler
        struct ggml_tensor *sum = ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, N, 1);
        ggml_set_f32(sum, 1.0f / N);
        inpL = ggml_mul_mat(ctx0, inpL, sum);

        // normalizer
        ggml_tensor *length = ggml_sqrt(ctx0,
                                        ggml_sum(ctx0, ggml_sqr(ctx0, inpL)));
        inpL = ggml_scale(ctx0, inpL, ggml_div(ctx0, ggml_new_f32(ctx0, 1.0f), length));

        ggml_tensor *output = inpL;
        // run the computation
        ggml_build_forward_expand(&gf, output);
        ggml_graph_compute(ctx0, &gf);

Is this correct ?

@okpatil4u
Copy link
Author

It is giving correct results. Thank you for your help !

@redthing1
Copy link

@okpatil4u I want to use classification models as well, are your changes public? I would like to work with them.

@redthing1
Copy link

@okpatil4u pinging you again: I want to use classification models as well, are your changes public? I would like to work with them.

@okpatil4u
Copy link
Author

okpatil4u commented Sep 21, 2023 via email

@redthing1
Copy link

Thank you for the update.

@redthing1
Copy link

@okpatil4u

The reason I didn’t make it public because it doesn’t work. There is at least 25% accuracy drop when compared with the same model in huggingface/transformers. We ended up rebuilding it in huggingface/candle from scratch. You are welcome to try it.

On Thu, 21 Sep 2023 at 7:49 AM, red thing @.> wrote: @okpatil4u https://github.com/okpatil4u pinging you again: I want to use classification models as well, are your changes public? I would like to work with them. — Reply to this email directly, view it on GitHub <#11 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXGU4FQ6OVKOXQVY6JCLADX3OP3HANCNFSM6AAAAAAYOSS3VI . You are receiving this because you were mentioned.Message ID: @.>

Would you mind sharing the source modifications you made? I would like to troubleshoot.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants