Is your feature request related to a problem? Please describe.
Currently, to test the model one needs an Nvidia GPU. In principle it should be possible to run it with an Apple GPU via Metal as well; this is already implemented in llama.cpp.
Being able to run locally on a Mac would be useful for development. But how sustainable would this solution be, and how much time would it take to make it work?
In the pyproject.toml, flash_attn is listed as optional, so one could just replace it with flex_attn. If we don't plan to do that, I think it should be changed, as it can be confusing.
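To make this concrete, here is a minimal sketch of what an attention-backend fallback could look like when flash_attn is not installed (e.g. on a Mac). It is only an illustration under assumptions: the `attention` wrapper and its tensor layout are hypothetical, and PyTorch's built-in `scaled_dot_product_attention` stands in for the non-CUDA path as a simpler baseline than flex_attention, which would be the more involved replacement mentioned above.

```python
# Minimal sketch of an attention-backend fallback; the `attention` wrapper
# and its tensor layout are hypothetical, not part of the current codebase.
import torch
import torch.nn.functional as F

try:
    from flash_attn import flash_attn_func  # CUDA-only wheel
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False


def attention(q, k, v, causal: bool = True):
    """q, k, v: (batch, heads, seq, head_dim)."""
    if HAS_FLASH_ATTN and q.is_cuda:
        # flash_attn_func expects (batch, seq, heads, head_dim)
        out = flash_attn_func(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), causal=causal
        )
        return out.transpose(1, 2)
    # On MPS (Apple GPU) or CPU, fall back to PyTorch's built-in SDPA.
    return F.scaled_dot_product_attention(q, k, v, is_causal=causal)
```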
Describe the solution you'd like
When the model is installed on a Mac, it should use this dependency: https://github.com/philipturner/metal-flash-attention
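Note that metal-flash-attention is a Metal/Swift library rather than a pip-installable Python package, so wiring it in would need dedicated bridging work. Independent of which Metal kernel ends up being used, a first step could be runtime device selection via PyTorch's MPS backend; a minimal sketch (the helper name is illustrative, not part of the existing codebase):

```python
# Minimal device-selection sketch for local testing on a Mac; `pick_device`
# is an illustrative helper, not part of the existing codebase.
import torch


def pick_device() -> torch.device:
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        # Apple-GPU path through PyTorch's Metal Performance Shaders backend.
        return torch.device("mps")
    return torch.device("cpu")


# Usage: model = MyModel().to(pick_device())
```

On the packaging side, a PEP 508 environment marker (e.g. `flash-attn; sys_platform == 'linux'`) could keep the CUDA-only extra from being pulled in on macOS, though how that fits depends on how the optional dependency is currently declared in pyproject.toml.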
Describe alternatives you've considered
No response
Additional context
No response
Organisation
AWI