[Mlir-commits] [mlir] mlir::mesh::shardingOp adding shard-size control (PR #98145)

Tue Aug 6 22:11:03 PDT 2024

================
@@ -879,4 +1051,40 @@ def Mesh_ShiftOp : Mesh_CollectiveCommunicationOpBase<"shift", [
   let hasCanonicalizer = 1;
 }
 
+def Mesh_UpdateHaloOp : Mesh_CollectiveCommunicationOpBase<"update_halo", [
+    AllShapesMatch<["input", "result"]>,
+    AllElementTypesMatch<["input", "result"]>
+  ]> {
+  let summary = "Update halo data.";
+  let description = [{
+    This operation updates halo regions of shards, e.g. if their sharding
+    specified halos and the actual tensor data might have changed
+    on the remote devices. Changes might be caused by mutating operations
+    and/or if the new halo regions are larger than the existing ones.
+
+    Assumes all devices hold tensors with same-sized halo data as specified
+    by `dynamic/static_halo_sizes`.
+
+    `mesh_axes` specifies the tensor axes along which the halo data is updated.
+    Currently each tensor dim can be sharded along a single mesh axis only.
+
+    Optionally resizes to new halo sizes `target_halo_sizes`.
----------------
sogartar wrote:

I don't understand what this operation should do exactly. We assume there is an overlap of the halos on different devices. How do we know which device holds the "true" value for some tensor region, so that overlapping neighbors can be updated?
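
To make the question concrete, here is one possible reading. It is an assumption on my part, not something the description states: each device owns its non-halo (core) cells, and every halo cell is a read-only copy of a neighbor's core cell. Under that reading the owner of the core always holds the "true" value. A minimal 1-D Python sketch (`update_halo_1d` and the list-of-lists layout are hypothetical, chosen only for illustration, and are not part of the mesh dialect):

```python
def update_halo_1d(shards, left, right):
    """Refresh halo cells in place from the owning neighbor's core cells.

    Assumed layout of each shard: [left halo | core | right halo].
    Assumes every core is at least max(left, right) cells long.
    Shards at the ends of the (non-periodic) axis keep their outer halo as-is.
    """
    # Snapshot the cores first so every update reads owner data,
    # never another device's (possibly stale) halo.
    cores = [s[left:len(s) - right] for s in shards]
    for i, shard in enumerate(shards):
        if left and i > 0:
            # Left halo mirrors the tail of the left neighbor's core.
            shard[:left] = cores[i - 1][-left:]
        if right and i + 1 < len(shards):
            # Right halo mirrors the head of the right neighbor's core.
            shard[len(shard) - right:] = cores[i + 1][:right]
    return shards


# Global data 0..5 split over two devices, halo width 1 on each side.
# -1 marks halo cells that have not been filled yet.
shards = [[-1, 0, 1, 2, -1], [-1, 3, 4, 5, -1]]
update_halo_1d(shards, left=1, right=1)
```

If the intended semantics differ, e.g. halos may be written locally and then have to be reconciled, the description should say which copy wins.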

https://github.com/llvm/llvm-project/pull/98145