Skip to content

struct ggml_tensor -> add a struct meta pointer into it (can replace padding) #1093

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
cmp-nct opened this issue Apr 20, 2023 · 5 comments
Closed
Labels
enhancement New feature or request high priority Very important issue

Comments

@cmp-nct
Copy link
Contributor

cmp-nct commented Apr 20, 2023

Given that the tensor struct uses padding it's not nice to add any more information into it.
It currently has a static 8 byte padding at the end, that's perfect to be replaced by a pointer to a struct to store additional information.
For example a optional human readable tensor name (to be used in graph print) or a couple u_int8 to switch on or off features by tensor.
For example: use_cublas=0
This would allow to fine-control the usage of such a library instead of hard-coding it on through a define flag.
For example: performance_type=HIGH/LOW/MID
For example: threads_override=2

use_cublas could be initialized depending on the define as 1/0.

I'd also move the task scheduling n_task out and the performance stuff
The overhead to access a compact external struct should be zero.

I suppose all that stuff could also be added directly into the tensor struct but if we have to keep it aligned that's not nice.
Especially given 64/32 bit environments with different pointer sizes etc.

Just as an example, giving tensors a name can look like that in the debug output:

 - 842: [    4096,       1,       1]          MUL_MAT   (  1) cpu =   0.000 /   0.000 ms, wall =   0.595 /   0.595 ms [wo x cur 22]
 - 843: [    4096,       1,       1]              ADD   (  1) cpu =   0.000 /   0.000 ms, wall =   0.008 /   0.008 ms [noname]
 - 844: [    4096,       1,       1]         RMS_NORM   (  1) cpu =   0.000 /   0.000 ms, wall =   0.015 /   0.015 ms [noname]
 - 845: [    4096,       1,       1]              MUL   (  1) cpu =   0.000 /   0.000 ms, wall =   0.004 /   0.004 ms [noname]
 - 846: [   11008,       1,       1]          MUL_MAT   (  1) cpu =   2.000 /   2.000 ms, wall =   1.584 /   1.584 ms [w1 x cur 22] (Hot)
 - 847: [   11008,       1,       1]             SILU   (  1) cpu =   0.000 /   0.000 ms, wall =   0.219 /   0.219 ms [noname]
 - 848: [   11008,       1,       1]          MUL_MAT   (  1) cpu =   2.000 /   2.000 ms, wall =   1.643 /   1.643 ms [w3 x cur 22] (Hot)
 - 849: [   11008,       1,       1]              MUL   (  1) cpu =   0.000 /   0.000 ms, wall =   0.009 /   0.009 ms [noname]
 - 850: [    4096,       1,       1]          MUL_MAT   (  1) cpu =   3.000 /   3.000 ms, wall =   3.137 /   3.137 ms [w2 x cur 22] (Hot)

Without names it's unreadable, the image version looks ok but in text form it's just too many operations to keep track.

@ggerganov
Copy link
Member

Yes, good idea. I think it's better to just extend ggml_tensor instead of another level of indirection.
But anyway, this extension will happen soon as I also see various use cases

@ggerganov ggerganov added enhancement New feature or request high priority Very important issue labels Apr 21, 2023
@cmp-nct
Copy link
Contributor Author

cmp-nct commented Apr 21, 2023

Is the padding still relevant ? Can I just extend it without causing problems ?

Maybe a couple structs (non pointers) in that case, separating the tensorstruct into a few logical groups like performance, debug, scheduler would make it a bit more accessible while still keeping all together in one parent.

I am a bit confused on padding in general because in 32 bit pointer size is 4 bytes, so I am not sure if the static 8 byte padding is right for all hardware.

Currently I got that one in use. The name is set by an optional function call:

typedef struct
{
    struct ggml_tensor *tensor; // points back to it's tensor
    char *name;                 // a optional name

    uint8_t flag_high_performance; // if this tensor is high performance (relevant for hybrid core systems like Intel 12/13 gen)
    uint8_t flag_use_blas_cuda;    // defines if CUDA is enabled for this tensor (destination tensor)
} tensor_meta;

@ggerganov
Copy link
Member

Is the padding still relevant ? Can I just extend it without causing problems ?

Padding is just to make sure these asserts hold:

https://github.com/ggerganov/llama.cpp/blob/5f939498d517b4dddbe904f202e895a3ecfb9dc4/ggml.c#L3403-L3404

Maybe a couple structs (non pointers) in that case

I think we want extra members without wrapping them in extra structures. Just streamline it as much as possible.
Also, no char * pointers. Names should have a predetermined max length (e.g. 16 or 32) so that memory size per tensor is fixed

@sw
Copy link
Contributor

sw commented Apr 28, 2023

The static_assert(sizeof(... seems a bit dubious, it should probably be static_assert(_Alignof(... (which currently fails). If there was a clean and portable way to specify struct alignment, we could apply that to these two structs and remove the explicit padding bytes.

@ggerganov
Copy link
Member

Decided to add void * extra; after all

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request high priority Very important issue
Projects
None yet
Development

No branches or pull requests

3 participants