[Mlir-commits] [mlir] mlir::mesh::shardingOp adding shard-size control (PR #98145)
Frank Schlimbach
llvmlistbot at llvm.org
Wed Aug 7 00:43:56 PDT 2024
================
@@ -879,4 +1051,40 @@ def Mesh_ShiftOp : Mesh_CollectiveCommunicationOpBase<"shift", [
let hasCanonicalizer = 1;
}
+def Mesh_UpdateHaloOp : Mesh_CollectiveCommunicationOpBase<"update_halo", [
+ AllShapesMatch<["input", "result"]>,
+ AllElementTypesMatch<["input", "result"]>
+ ]> {
+ let summary = "Update halo data.";
+ let description = [{
+ This operation updates the halo regions of shards, e.g. when their
+ sharding specifies halos and the actual tensor data may have changed
+ on the remote devices. Changes can be caused by mutating operations
+ and/or by new halo regions being larger than the existing ones.
+
+ Assumes all devices hold tensors with same-sized halo data as specified
+ by `dynamic/static_halo_sizes`.
+
+ `mesh_axes` specifies the tensor axes along which the halo data is updated.
+ Currently each tensor dim can be sharded along a single mesh axis only.
+
+ Optionally resizes to new halo sizes `target_halo_sizes`.
----------------
fschlimb wrote:
This might need adjustments now that `force` is gone. As long as we know the tensor and its sharding we have all data we need. The current definition does not allow splitting a tensor along more than one mesh dimension (the inherited axis argument is 1d). The doc should read
"`mesh_axes` specifies for each tensor axis along which mesh axis its halo data is updated.
Currently each tensor dim can be sharded along a single mesh axis only."
I started working on a separate PR which will lower this op to MPI. I will fix this there. Hope that's ok.
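To illustrate the semantics being discussed (not the MLIR op or its eventual MPI lowering, just a minimal sketch under assumed names): each shard stores `[left_halo | core | right_halo]`, and an update refreshes each halo from the adjacent core cells of the neighboring shard along one axis.

```python
def update_halo_1d(shards, halo):
    """Refresh the halo cells of each shard from its neighbors' core data.

    shards: list of 1-D lists laid out as [left_halo | core | right_halo];
    halo: halo width (assumed <= core size). Only halo cells are written,
    only core cells are read, so update order does not matter.
    Hypothetical helper for illustration only.
    """
    for i in range(len(shards)):
        if i > 0:
            # left halo <- left neighbor's rightmost core cells
            shards[i][:halo] = shards[i - 1][-2 * halo:-halo]
        if i < len(shards) - 1:
            # right halo <- right neighbor's leftmost core cells
            shards[i][-halo:] = shards[i + 1][halo:2 * halo]
    return shards

# Shard a 1-D tensor of 8 core elements over 2 "devices", halo width 1.
core = [[0, 1, 2, 3], [4, 5, 6, 7]]
shards = [[0] + c + [0] for c in core]  # zero-initialized halo cells
update_halo_1d(shards, 1)
# shards[0] -> [0, 0, 1, 2, 3, 4]
# shards[1] -> [3, 4, 5, 6, 7, 0]
```

A real lowering would replace the neighbor slice copies with point-to-point sends/receives per mesh axis, which is presumably what the MPI PR mentioned above targets.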
https://github.com/llvm/llvm-project/pull/98145