Add Qwen3 Moe #2260


Open · wants to merge 8 commits into master

Conversation

kanpuriyanawab (Collaborator)
No description provided.

kanpuriyanawab self-assigned this May 19, 2025
mattdangerw (Member) left a comment:

Thanks! Took an initial pass. Let's try to clean up the config and state passing.

No passing an index down the layer stack, and no passing data structures that apply to the whole layer stack into individual layers.

self,
num_query_heads,
num_key_value_heads,
layer_index,
mattdangerw (Member):

This layer index is gross, let's remove it. Handle the args properly in the backbone and pass the correct sliding_window_size to this layer and the decoder layer above it.
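A minimal sketch of that direction, with assumed names (max_window_layers is a placeholder config option, not necessarily this PR's API): the backbone resolves each layer's window up front and passes the resolved value down, so the layer never needs its own index.

    # Sketch only: resolve the per-layer sliding window in the backbone.
    # `max_window_layers` is an assumed config name; None = full attention.
    num_layers = 4
    sliding_window_size = 32768
    max_window_layers = 2

    per_layer_windows = [
        sliding_window_size if i >= max_window_layers else None
        for i in range(num_layers)
    ]
    print(per_layer_windows)  # [None, None, 32768, 32768]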

kanpuriyanawab (Collaborator, Author):

Since it's an MoE, the layer index isn't used just for the sliding window; it also determines which layers get experts.

kanpuriyanawab (Collaborator, Author):

I replaced the passing of layer_index, decoder_sparse_step, and mlp_only_layers with a single boolean switch:

https://github.com/kanpuriyanawab/keras-nlp/blob/730a9c41e95a74906a041d8933b8d7738391b438/keras_hub/src/models/qwen3_moe/qwen3_moe_backbone.py#L129-L156
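For context, a minimal sketch of what such a switch can look like, mirroring the usual Qwen MoE layer-selection rule; the names below are illustrative, not necessarily what the linked commit uses:

    # Sketch: the backbone computes one boolean per layer and passes only
    # that flag down, replacing layer_index / decoder_sparse_step /
    # mlp_only_layers in the layer's signature.
    num_layers = 4
    num_experts = 8
    decoder_sparse_step = 2   # every 2nd layer gets a sparse (MoE) MLP
    mlp_only_layers = [0]     # layers forced to a dense MLP regardless

    for i in range(num_layers):
        use_sparse_mlp = (
            num_experts > 0
            and i not in mlp_only_layers
            and (i + 1) % decoder_sparse_step == 0
        )
        # The decoder layer only ever sees this flag, never its own index.
        print(f"layer {i}: sparse={use_sparse_mlp}")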

model(input_data)
"""

def __init__(
mattdangerw (Member):

In general, let's make sure we prune this list down just to the config options we need.

sliding_window_size=32768,
output_router_logits=False,
router_aux_loss_coefficient=0.001,
mlp_only_layers=[],
mattdangerw (Member):

Fine to have something like this for the toplevel, but let's pass something more direct to each decoder layer (so we don't need to pass the index down). Make sure to document it if we keep it.

kanpuriyanawab (Collaborator, Author):

> but let's pass something more direct to each decoder layer

what do you suggest?
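One more note on the quoted config above: mlp_only_layers=[] is a mutable default argument, a classic Python pitfall (the same list object is shared across calls), worth fixing however the option list is pruned. A minimal sketch of the usual pattern:

    # Sketch: avoid `mlp_only_layers=[]` (a shared mutable default);
    # use a None sentinel and resolve it inside __init__.
    class ConfigExample:
        def __init__(self, mlp_only_layers=None):
            self.mlp_only_layers = (
                [] if mlp_only_layers is None else list(mlp_only_layers)
            )

    print(ConfigExample().mlp_only_layers)  # []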
