[Mlir-commits] [mlir] mlir::mesh::shardingOp adding shard-size control (PR #98145)

Frank Schlimbach llvmlistbot at llvm.org
Wed Aug 7 00:43:56 PDT 2024


================
@@ -879,4 +1051,40 @@ def Mesh_ShiftOp : Mesh_CollectiveCommunicationOpBase<"shift", [
   let hasCanonicalizer = 1;
 }
 
+def Mesh_UpdateHaloOp : Mesh_CollectiveCommunicationOpBase<"update_halo", [
+    AllShapesMatch<["input", "result"]>,
+    AllElementTypesMatch<["input", "result"]>
+  ]> {
+  let summary = "Update halo data.";
+  let description = [{
+    This operation updates the halo regions of shards, e.g., when their
+    sharding specifies halos and the tensor data may have changed on the
+    remote devices. Changes can be caused by mutating operations and/or
+    by new halo regions being larger than the existing ones.
+
+    Assumes all devices hold tensors with same-sized halo data as specified
+    by `dynamic/static_halo_sizes`.
+
+    `mesh_axes` specifies the tensor axes along which the halo data is updated.
+    Currently each tensor dim can be sharded along a single mesh axis only.
+
+    Optionally resizes to new halo sizes `target_halo_sizes`.
----------------
fschlimb wrote:

This might need adjustments now that `force` is gone. As long as we know the tensor and its sharding, we have all the data we need. The current definition does not allow splitting a tensor along more than one mesh dimension (the inherited axis argument is 1-d). The doc should read:
"`mesh_axes` specifies, for each tensor axis, the mesh axis along which its halo data is updated.
    Currently each tensor dim can be sharded along a single mesh axis only."

I started working on a separate PR which will lower this op to MPI. I will fix this there. Hope that's ok.
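To illustrate the per-tensor-axis semantics suggested above, here is a hypothetical usage sketch. The textual syntax, the `@grid` mesh, and the attribute spellings are illustrative only and do not reflect the final op definition in the PR:

```mlir
// Hypothetical 2x2 device mesh (illustrative syntax).
mesh.mesh @grid(shape = 2x2)

// Under the proposed reading, tensor axis i has its halo updated along
// mesh axis mesh_axes[i]; here tensor dim 0 maps to mesh axis 0 and
// tensor dim 1 to mesh axis 1. The static halo sizes give the
// (lower, upper) halo extent for each sharded tensor dim.
%updated = mesh.update_halo %shard on @grid
    mesh_axes = [0, 1] static_halo_sizes = [1, 1, 1, 1]
    : tensor<10x10xf32> -> tensor<10x10xf32>
```

Each tensor dim being sharded along at most one mesh axis is what keeps `mesh_axes` a flat 1-d list in this sketch.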

https://github.com/llvm/llvm-project/pull/98145

