Classification Pipeline #11
Comments
Hi, classification should be just another linear layer at the end, once the embeddings are done. In the model conversion script, replace AutoModel with BertForSequenceClassification (or whichever classification model you are using). During this step, check the name of the classification layer and modify the C++ to load and rename that layer. Then, in the model eval code, take out the mean pooling and embedding normalization and add a multiplication with the classification matrix. I think for classification you take the embedding of the first token and discard everything else. |
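On the eval side, a minimal ggml sketch of that change might look like the following. This is an assumption of how it could be wired into bert.cpp, not code from the repo; `inpL` stands for the final hidden-states tensor of shape `[n_embd, n_tokens]`, and `model.cls_w` / `model.cls_b` are hypothetical names for the classifier weights and bias added to the loader.

```cpp
// sketch: replace mean pooling + normalization with a classification head
// take the embedding of the first ([CLS]) token and discard the rest
struct ggml_tensor * cls = ggml_view_1d(ctx0, inpL, n_embd, 0);

// logits = cls_w * cls + cls_b   (cls_w shape: [n_embd, n_labels])
struct ggml_tensor * logits = ggml_add(ctx0,
    ggml_mul_mat(ctx0, model.cls_w, cls),
    model.cls_b);

ggml_build_forward_expand(&gf, logits);
```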
Thanks. Apparently BERT uses tanh for the pooler. In your code you have defined the pooler as follows
This may make sense if you only need to calculate the embeddings. But in my case, there is a tanh activation after the pooler. I have to load the pooler, apply tanh, apply dropout, and then apply the classification layer. This is what my code looks like.
Is this correct? Apparently there are no ggml_tanh and ggml_dropout functions in the ggml library. Do I have to implement them from scratch? Also, I didn't find dropout anywhere in the code, although it is mentioned in the model specification. Is it computed implicitly? |
Dropout is only used during training, so you can just ignore it when implementing inference. Tanh is indeed missing from ggml. You could either try to implement it as an operator in ggml or use |
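For reference, one possible route (my assumption, not necessarily what was being suggested above) is ggml's custom-map hook, if the ggml revision you build against exposes `ggml_map_unary_f32`:

```cpp
#include <cmath>

// element-wise tanh as a user-supplied op (sketch, assuming ggml_map_unary_f32
// is available in your ggml version)
static void tanh_op_f32(const int n, float * dst, const float * src) {
    for (int i = 0; i < n; ++i) {
        dst[i] = tanhf(src[i]);
    }
}

// in the eval graph: y = tanh(x)
struct ggml_tensor * y = ggml_map_unary_f32(ctx0, x, tanh_op_f32);
```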
Thanks. This is very useful. Just for context, I am trying to convert prajjwal1/bert-tiny. It has a layer "bert.embeddings.position_ids". After reading this layer and before reading "bert.embeddings.word_embeddings.weight", the following line gives a segmentation fault.
Am I supposed to load position_ids in some different way? Could you please give this model a try, by any chance? |
position_ids is just a vector with the indices 0,1,2,3... For a CPU implementation it doesn't really make sense to store it as a parameter of the model. It's done that way in PyTorch for GPU memory-related reasons. But it's just a for loop. |
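In other words, something like this sketch is all the C++ side needs (`n_tokens` being the length of the tokenized input):

```cpp
// position ids are simply 0, 1, 2, ..., n_tokens-1
std::vector<int32_t> position_ids(n_tokens);
for (int i = 0; i < n_tokens; ++i) {
    position_ids[i] = i;
}
```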
Sure. But if I skip them, I get the following error.
If I don't, then I get a segmentation fault. Am I missing something? |
While creating the model file, don't store the tensors you don't load. Or modify the loading code to ignore the error. |
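If you go the second route, a hedged sketch of how the tensor-loading loop in bert.cpp could skip names it doesn't recognize (instead of bailing out) might look like this; `nbytes` is assumed to be computed from the dims and type already read from the file header:

```cpp
// sketch: skip serialized tensors that the model struct doesn't register,
// e.g. "bert.embeddings.position_ids", instead of treating them as an error
if (model.tensors.find(name) == model.tensors.end()) {
    fprintf(stderr, "%s: skipping unknown tensor '%s'\n", __func__, name.c_str());
    fin.seekg(nbytes, std::ios::cur); // jump over this tensor's data
    continue;
}
```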
Got it. You had skipped them using "embeddings.position_ids". I had to change it to "bert.embeddings.position_ids" in the ggml conversion script. Can I skip the pooler weights as well, as you mentioned in your first reply? Or do I have to implement them, including tanh? |
The pooler weights are skipped in bert.cpp because sbert.net doesn't use the vanilla BERT pooler. For classification, I think you should use the vanilla pooler, so that includes the learned pooler weights and the tanh step. |
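So, continuing the earlier sketch, the vanilla pooler would sit between the [CLS] embedding and the classifier: a dense layer plus tanh. Here `model.pooler_w` / `model.pooler_b` are assumed names for the loaded pooler tensors, `cls` is the first-token embedding from before, and `tanh_op_f32` is the custom map function sketched above; the classifier matmul then takes `pooled` instead of `cls`.

```cpp
// vanilla BERT pooler (sketch): pooled = tanh(pooler_w * cls + pooler_b)
struct ggml_tensor * pooled = ggml_map_unary_f32(ctx0,
    ggml_add(ctx0, ggml_mul_mat(ctx0, model.pooler_w, cls), model.pooler_b),
    tanh_op_f32);
```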
Thanks, man! I was able to get it working by temporarily using relu instead of tanh.
Does this look like correct code? At least it is working with your server and client implementation. I will implement tanh later, after your comments. |
The code looks at least reasonable. It's probably close to being right. Compare the outputs with the original, I guess? |
I am not sure. I replaced these lines with the code above.
Is this correct? |
It is giving correct results. Thank you for your help! |
@okpatil4u I want to use classification models as well. Are your changes public? I would like to work with them. |
@okpatil4u pinging you again: I want to use classification models as well. Are your changes public? I would like to work with them. |
The reason I didn't make it public is that it doesn't work. There is at least a 25% accuracy drop compared with the same model in huggingface/transformers. We ended up rebuilding it from scratch in huggingface/candle. You are welcome to try it.
|
Thank you for the update. |
Would you mind sharing the source modifications you made? I would like to troubleshoot. |
Is it possible to retrofit a BERT classification model into this code?
Can you please provide some guidelines so that I can take care of it myself?
Thanks in advance.