Multidimensional device mesh #3821
Conversation
!build
```diff
 static std::unique_ptr<hir::HostIrContainer> lower(
     std::unique_ptr<Fusion> fusion,
-    int64_t my_device_index);
+    DeviceIdxType my_device_index);
```
Question: will lowering have to take the current GPU ID? I understand 2D meshes lead to different `team`s, so the host IR as-is has to differ across ranks. An alternative would be for each Communication to take `teams` (a vector of vectors that includes all GPUs) rather than `team` (the GPUs in the team that the current GPU is in). Only at runtime would each device look up which team in `teams` it's in. This way, all ranks have the same host IR (except for pipeline parallelism), so we can distribute the lowering across all ranks for compilation speed.

cc @samnordmann
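For concreteness, here is a hypothetical sketch (not the actual nvFuser classes; all names are illustrative) of the two alternatives: a Communication that stores only the current rank's `team` versus one that stores all `teams` and resolves the current rank's team at run time.

```cpp
#include <cstdint>
#include <vector>

using DeviceIdxType = int64_t;
using Team = std::vector<DeviceIdxType>;

// Alternative A: lowering bakes in the current rank's team, so the host IR
// differs across ranks for 2D meshes.
struct CommunicationPerRank {
  Team team;  // only the devices the current rank communicates with
};

// Alternative B: lowering emits all teams; each rank looks up its own team
// at run time, so the host IR is identical on every rank.
struct CommunicationAllRanks {
  std::vector<Team> teams;  // one entry per team in the mesh

  // Runtime lookup of the team containing `my_device_index`.
  const Team* findMyTeam(DeviceIdxType my_device_index) const {
    for (const Team& t : teams) {
      for (DeviceIdxType d : t) {
        if (d == my_device_index) {
          return &t;
        }
      }
    }
    return nullptr;  // this rank does not participate in the communication
  }
};
```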
Lowering does not have to take the GPU IDs; it's more a matter of where to shift the lookup overhead and what parallelism strategies you expect to support.

If you expect `teams` to be dynamic throughout training, basically dynamic meshes for each run, then obviously you only know the `team` at runtime. However, if we expect one training job to keep the mesh static, then we can move the team creation to compile time and push any overhead into a one-time cost.

The vector of teams is interesting: it does save lowering time since it pushes that cost into run time. The runtime lookup is probably not an issue since the meshes will only span a single node, so O(10s) of devices.
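As a rough sketch of the static-mesh direction (names are hypothetical, not nvFuser API): build the device-to-team mapping once and reuse it on every run, so the lookup becomes a one-time cost.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using DeviceIdxType = int64_t;
using Team = std::vector<DeviceIdxType>;

class TeamCache {
 public:
  explicit TeamCache(std::vector<Team> teams) : teams_(std::move(teams)) {
    // One-time pass: map every device to the index of its team. With O(10s)
    // of devices per node, this cost is negligible.
    for (size_t i = 0; i < teams_.size(); ++i) {
      for (DeviceIdxType d : teams_[i]) {
        device_to_team_[d] = i;
      }
    }
  }

  // O(1) lookup at run time on every subsequent execution.
  const Team& teamOf(DeviceIdxType device) const {
    return teams_.at(device_to_team_.at(device));
  }

 private:
  std::vector<Team> teams_;
  std::unordered_map<DeviceIdxType, size_t> device_to_team_;
};
```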
> Question: will lowering have to take the current GPU ID? I understand 2D meshes lead to different `team`s so host IR as is has to be different. An alternative would be for each Communication to take `teams`

Yes, sounds good. An alternative would be that the Communication only takes one team, but `Team` inherits from `Val` and is bound to a concrete value through `expr_evaluator_`. That `Team` could be defined through an `Expr*` that indicates taking the mesh's slice over some axis at `my_device_id`. This way, the Communication IR is symmetric across all ranks, and it covers both the static and dynamic cases.
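A minimal sketch of that idea, with illustrative names only (the real implementation would go through nvFuser's `Val`/`Expr` machinery and the expression evaluator): the team is described symbolically as a mesh slice and only materialized once `my_device_id` is bound.

```cpp
#include <cstdint>
#include <vector>

using DeviceIdxType = int64_t;

// A 2D mesh of device IDs, shape [rows][cols].
using Mesh2D = std::vector<std::vector<DeviceIdxType>>;

// Symbolic description of a team: "the mesh's slice over `axis` that
// contains my_device_id". This plays the role of the Expr* defining Team.
struct MeshSliceExpr {
  int axis;  // 0: the row containing my_device_id, 1: the column
};

// "Evaluating" the expression once my_device_id is known, analogous to
// binding a symbolic Val to a concrete value at run time.
std::vector<DeviceIdxType> evaluateTeam(
    const Mesh2D& mesh,
    const MeshSliceExpr& expr,
    DeviceIdxType my_device_id) {
  for (size_t r = 0; r < mesh.size(); ++r) {
    for (size_t c = 0; c < mesh[r].size(); ++c) {
      if (mesh[r][c] != my_device_id) {
        continue;
      }
      if (expr.axis == 0) {
        return mesh[r];  // my row
      }
      std::vector<DeviceIdxType> col;
      for (const auto& row : mesh) {
        col.push_back(row[c]);  // my column
      }
      return col;
    }
  }
  return {};  // device not in the mesh
}
```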
!test

!build

cc @xwang233 there's apparently a bug in the PR agent tool

!test
For safety reasons, GitHub actions triggered from forked repos (rather than branches on this repo) are restricted. I would recommend @cowanmeg directly create branches on this repo for future PRs. 😉
Hmm, the option to squash and merge is coming up. I can also just push this directly to a branch on Fuser and open a new PR so that the PR agent tools can run? We can use this PR for the unit tests at least.
I didn't mean to ask you to wait for the PR agent -- it's certainly optional. I was probably unclear, sorry. I meant to ask you to check whether you are able to create new branches in NVIDIA/Fuser. If yes, I'd recommend doing that for your future PRs so the PR agent can kick in and you can stack your PRs. If no, @xwang233 and I will try to figure out why, because you are apparently a "collaborator" of NVIDIA/Fuser already.
I can make branches on Fuser! I am just used to working on a fork so defaulted to that! Thanks!
`getSlice(DeviceIdxType, ParallelType)`, which gives the set of devices that a tensor is sharded over and communicates with, given the (device) parallel type, which translates into a dimension of the mesh.
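To illustrate the intended behavior, here is a toy sketch of a 2D mesh with such a `getSlice`; the mapping of `ParallelType` values to mesh axes and all class/member names are assumptions here, not necessarily what the PR implements.

```cpp
#include <cstdint>
#include <vector>

using DeviceIdxType = int64_t;

enum class ParallelType { DIDx, DIDy };  // assumed device-parallel types

// Toy 2D device mesh stored row-major, shape [num_rows][num_cols].
class ToyDeviceMesh {
 public:
  ToyDeviceMesh(std::vector<DeviceIdxType> devices, int64_t num_cols)
      : devices_(std::move(devices)), num_cols_(num_cols) {}

  // Assumed axis mapping: DIDx indexes the last mesh dimension (columns),
  // so getSlice(d, DIDx) returns d's row (column index varies); DIDy indexes
  // the first dimension (rows), so getSlice(d, DIDy) returns d's column.
  std::vector<DeviceIdxType> getSlice(
      DeviceIdxType device, ParallelType pt) const {
    const int64_t num_rows =
        static_cast<int64_t>(devices_.size()) / num_cols_;
    for (int64_t r = 0; r < num_rows; ++r) {
      for (int64_t c = 0; c < num_cols_; ++c) {
        if (devices_[r * num_cols_ + c] != device) {
          continue;
        }
        std::vector<DeviceIdxType> slice;
        if (pt == ParallelType::DIDx) {
          for (int64_t cc = 0; cc < num_cols_; ++cc) {
            slice.push_back(devices_[r * num_cols_ + cc]);  // same row
          }
        } else {
          for (int64_t rr = 0; rr < num_rows; ++rr) {
            slice.push_back(devices_[rr * num_cols_ + c]);  // same column
          }
        }
        return slice;
      }
    }
    return {};  // device not in the mesh
  }

 private:
  std::vector<DeviceIdxType> devices_;
  int64_t num_cols_;
};
```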