ggml : fix arch check in bf16_to_fp32 #10164

Merged: 1 commit, Nov 4, 2024

slaren (Member) commented on Nov 4, 2024

Fixes #10154

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Nov 4, 2024
@slaren slaren merged commit a9e8a9a into master Nov 4, 2024
54 checks passed
@slaren slaren deleted the sl/fix-bf16-arch-check branch November 4, 2024 22:17
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Nov 7, 2024
* metal : fix minor string leaks (ggml/1004)

* cmake : make it possible linking ggml as external lib (ggml/1003)

* sync : ggml

* CANN: adjust backend registry refactor. (ggml-org#10158)

Remove buffer->iface.get_name, which was used in CANN, as it was removed in the backend registry refactor PR.

* metal : move dequantize templates to beginning of MSL source (#0)

* metal : simplify f16 and f32 dequant kernels (#0)

* cuda : clear error after changing peer access (ggml-org#10153)

* fix build break on arm64 linux (ggml-org#10166)

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144

* server : clarify /slots endpoint, add is_processing (ggml-org#10162)

* server : clarify /slots endpoint, add is_processing

* fix tests

* ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggml-org#10167)

* ggml : fix gelu tables initialization (ggml-org#10172)

* Q6_K AVX improvements (ggml-org#10118)

* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesn't work now. 13 is still x86

* ggml : fix arch check in bf16_to_fp32 (ggml-org#10164)

* llama : add <|tool_call|> formatting to Granite template (ggml-org#10177)

Branch: GraniteToolCallTemplate

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* metal : add quantized FA support (ggml-org#10149)

* metal : add quantized FA (vec) support

ggml-ci

* metal : add quantized FA (non-vec) support

* metal : fix support check

ggml-ci

* metal : clean-up

* metal : clean-up (cont)

* metal : fix shared memory calc + reduce smem + comments

* metal : float-correctness

* metal : minor [no ci]

* ggml : adjust is_first_call init value (ggml-org#10193)

ggml-ci

* metal : fix from ptr buffer name (ggml-org#10189)

* server : remove hack for extra parallel slot (ggml-org#10187)

ggml-ci

* metal : add BF16 support (ggml-org#8439)

* ggml : add initial BF16 support

ggml-ci

* metal : add mul_mat_id BF16 support

ggml-ci

* metal : check for bfloat support on the Metal device

ggml-ci

* metal : better var names [no ci]

* metal : do not build bfloat kernels when not supported

ggml-ci

* metal : try to fix BF16 support check

ggml-ci

* metal : this should correctly check bfloat support

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Plamen Minev <pacominev@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Nov 8, 2024
* Merge PR (#10) (#11) (#13)

Merge

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory

Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests).


Updates `requests` from 2.31.0 to 2.32.2
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>

* Temp (#15)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <pacominev@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Nov 15, 2024
* Master1 (#17)

* Rename build.yml to build-ci.yml

* build.yml

* Update build-ci.yml

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Delete ggml/src/vulkan-shaders/CMakeLists.txt

* Update build.yml

* Update build-ci.yml

* Update build-ci.yml

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Nov 16, 2024
* merge (#20)

* Update build-ci.yml

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Nov 22, 2024
* Merge (#21)

* Update build-ci.yml

* Update build-ci.yml

apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Dec 1, 2024
* Temp (#23)

Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>

* Rename build.yml to build-ci.yml

* build.yml

* Update build-ci.yml

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Delete ggml/src/vulkan-shaders/CMakeLists.txt

* Update build.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Bump the pip group across 2 directories with 2 updates (#24)

Updates the requirements on [pillow](https://github.com/python-pillow/Pillow) and [aiohttp](https://github.com/aio-libs/aiohttp) to permit the latest version.

Updates `pillow` to 11.0.0
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](python-pillow/Pillow@10.2.0...11.0.0)

Updates `aiohttp` to 3.11.7
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](aio-libs/aiohttp@v3.9.3...v3.11.7)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: aiohttp
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: apicalshark <58538165+apicalshark@users.noreply.github.com>

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Create docker.yml

* Create python-lint.yml

* Create server.yml

* Update requirements.txt

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <pacominev@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
Labels
ggml changes relating to the ggml tensor library for machine learning
Development

Successfully merging this pull request may close these issues.

Bug: __AVX2__ missing
2 participants