feat: implement backward computation for more operators #921
base: master
Conversation
We use the following formulas to compute the gradients: Let g be `tensor->grad`, let x be `src0`, and let y be `tensor`. For tanh, `g * (1 - tanh^2(x)) = g * (1 - y^2) = g - gy^2`. For sigmoid, `g * (sigmoid(x) * (1 - sigmoid(x))) = g * (y * (1 - y)) = gy - gy^2`.
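For illustration, here is a minimal standalone sketch of what these element-wise kernels look like. The function names and the flat `g`/`y`/`dx` buffers are hypothetical, not the actual ggml code.

```c
#include <stddef.h>

// Illustrative sketch only: element-wise backward kernels for the formulas
// above. g = incoming gradient, y = forward output, dx = gradient w.r.t. x.
static void tanh_back(const float * g, const float * y, float * dx, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dx[i] = g[i]*(1.0f - y[i]*y[i]);   // g*(1 - y^2) = g - g*y^2
    }
}

static void sigmoid_back(const float * g, const float * y, float * dx, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dx[i] = g[i]*y[i]*(1.0f - y[i]);   // g*y*(1 - y) = g*y - g*y^2
    }
}
```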
This comes with a breaking change: `ggml_clamp` is no longer an in-place operation. If you still want/need that behavior, use `ggml_clamp_inplace`. I hope no one depended on that. Also introduces `GGML_OP_CLAMP_BACK`, whose implementations for other backends will be added in a subsequent commit. The definition of `clamp_back` is as follows:

d/dx(clamp(x, min, max)) = { 0 if x < min
                           { 1 if min <= x <= max
                           { 0 if x > max
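A minimal sketch of that definition as an element-wise kernel (a hypothetical standalone function, not the actual `GGML_OP_CLAMP_BACK` implementation): the incoming gradient passes through only where the input fell inside `[min, max]`.

```c
#include <stddef.h>

// Sketch: gradient of clamp. g = incoming gradient, x = original input.
static void clamp_back(const float * g, const float * x, float * dx,
                       float min, float max, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dx[i] = (x[i] >= min && x[i] <= max) ? g[i] : 0.0f;
    }
}
```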
Slice the gradient using a view operation, reshape, and then add to the inputs' gradients.
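Roughly, and assuming contiguous tensors, the pattern boils down to accumulating a slice of the result's gradient into the input's gradient. A plain-C sketch of that idea (not the actual graph code, which builds this out of ggml view/reshape/add operations):

```c
#include <stddef.h>

// Sketch: add the slice g[offset .. offset+n) of the result's gradient into
// one input's gradient. Reshaping a contiguous slice does not move any data,
// so the accumulation is just a flat element-wise add.
static void acc_slice_grad(const float * g, size_t offset, size_t n,
                           float * src_grad) {
    for (size_t i = 0; i < n; ++i) {
        src_grad[i] += g[offset + i];
    }
}
```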
Introduces `GGML_UNARY_OP_ELU_BACK`, defined as the following:

ELU'(x) = { e^x if x <= 0
          { x   if x > 0
d/dx(LeakyRELU(x, negative_slope)) = { 1 if x > 0
                                     { negative_slope if x <= 0

The equivalent formula `negative_slope * step(-x) + step(x)` is used for backward computation.
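A sketch of that step-based formulation as an element-wise kernel (hypothetical standalone function; it assumes step(t) = 1 for t > 0 and 0 otherwise, so the derivative at x = 0 comes out as 0 under this convention):

```c
#include <stddef.h>

// Sketch: dx = g * (negative_slope*step(-x) + step(x)).
static void leaky_relu_back(const float * g, const float * x, float * dx,
                            float negative_slope, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        const float d = negative_slope*(x[i] < 0.0f ? 1.0f : 0.0f) // negative_slope*step(-x)
                      + (x[i] > 0.0f ? 1.0f : 0.0f);               // + step(x)
        dx[i] = g[i]*d;
    }
}
```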
…GELU_BACK`

Introduces corresponding `*_BACK` operators for both. Backend-specific accelerated implementations are forthcoming.
Should add tests to tests/test-grad0.cpp
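For reference, a minimal standalone sketch of a central-difference gradient check (this is not the actual tests/test-grad0.cpp harness), using tanh as the example:

```c
#include <math.h>
#include <stdio.h>

// Sketch: compare the analytic derivative of tanh against a central
// difference (f(x+eps) - f(x-eps)) / (2*eps).
int main(void) {
    const float eps = 1e-3f;
    for (float x = -2.0f; x <= 2.0f; x += 0.5f) {
        const float y        = tanhf(x);
        const float analytic = 1.0f - y*y;
        const float numeric  = (tanhf(x + eps) - tanhf(x - eps))/(2.0f*eps);
        printf("x=% .2f analytic=% .6f numeric=% .6f diff=%.2e\n",
               x, analytic, numeric, fabsf(analytic - numeric));
    }
    return 0;
}
```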
I'm currently working on adding training support for the MNIST example in #908. I have a working backward pass for
d/dx(ELU(x)) is 1 if x >= 0, not x
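For reference, a sketch of the corrected derivative as an element-wise kernel (assuming alpha = 1, so ELU'(x) = 1 for x >= 0 and e^x for x < 0; hypothetical standalone function, not the actual kernel):

```c
#include <math.h>
#include <stddef.h>

// Sketch: ELU backward with alpha = 1.
static void elu_back(const float * g, const float * x, float * dx, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dx[i] = g[i]*(x[i] >= 0.0f ? 1.0f : expf(x[i]));
    }
}
```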
It might be better to wait for @JohannesGaessler to merge #908 and then continue this PR?
That's probably best, considering the changes needed for the tests.
I extended the code in
Perfect. I plan to finish this PR this weekend. |
This PR will add backward computations for most operators once completed.
Leaving `pad`, `im2col`, and `norm` for a future PR now. Currently unsure if I should fuse the multiply + gradient computation for `gelu_back`/`gelu_quick_back` like with `silu_back` (a rough sketch of a fused variant is below).
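For illustration only, a fused variant could look roughly like the following, using the exact erf-based GELU (GELU(x) = x * Phi(x), so GELU'(x) = Phi(x) + x * phi(x), where Phi is the standard normal CDF and phi its density). The tanh approximation would change the derivative term; this is a sketch under those assumptions, not the actual ggml kernel.

```c
#include <math.h>
#include <stddef.h>

// Sketch: gelu_back with the multiply by the incoming gradient g fused into
// the same loop instead of a separate ggml_mul.
static void gelu_back_fused(const float * g, const float * x, float * dx, size_t n) {
    const float inv_sqrt2   = 0.70710678f; // 1/sqrt(2)
    const float inv_sqrt2pi = 0.39894228f; // 1/sqrt(2*pi)
    for (size_t i = 0; i < n; ++i) {
        const float Phi = 0.5f*(1.0f + erff(x[i]*inv_sqrt2));  // standard normal CDF
        const float phi = inv_sqrt2pi*expf(-0.5f*x[i]*x[i]);   // standard normal PDF
        dx[i] = g[i]*(Phi + x[i]*phi);                         // g * GELU'(x)
    }
}
```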