[llvm] [AArch64][SVE2] Lower read-after-write mask to whilerw (PR #114028)

David Green via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 30 09:17:55 PDT 2024


================
@@ -30,6 +30,36 @@ entry:
   ret <vscale x 16 x i1> %active.lane.mask.alias
 }
 
+define <vscale x 16 x i1> @whilerw_8(ptr noalias %a, ptr %b, ptr %c, i32 %n) {
+; CHECK-LABEL: whilerw_8:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    whilerw p0.b, x2, x1
+; CHECK-NEXT:    ret
+;
+; CHECK-NOSVE2-LABEL: whilerw_8:
+; CHECK-NOSVE2:       // %bb.0: // %entry
+; CHECK-NOSVE2-NEXT:    subs x8, x2, x1
+; CHECK-NOSVE2-NEXT:    cneg x8, x8, mi
+; CHECK-NOSVE2-NEXT:    cmp x8, #0
+; CHECK-NOSVE2-NEXT:    cset w9, lt
+; CHECK-NOSVE2-NEXT:    whilelo p0.b, xzr, x8
+; CHECK-NOSVE2-NEXT:    sbfx x8, x9, #0, #1
+; CHECK-NOSVE2-NEXT:    whilelo p1.b, xzr, x8
+; CHECK-NOSVE2-NEXT:    sel p0.b, p0, p0.b, p1.b
+; CHECK-NOSVE2-NEXT:    ret
+entry:
+  %b24 = ptrtoint ptr %b to i64
+  %c25 = ptrtoint ptr %c to i64
+  %sub.diff = sub i64 %c25, %b24
+  %0 = tail call i64 @llvm.abs.i64(i64 %sub.diff, i1 false)
+  %neg.compare = icmp slt i64 %0, 0
----------------
davemgreen wrote:

I think you are right - the original whilewr testing was based on some bad testing from me (the tests returning a predicate confused things). It goes a bit further than what you said, though: I had looked at these intrinsics a while ago and concluded that they were difficult to match. Mathematically they do:
 diff = zext(a) - zext(b)
 elem[i] = splat(diff <= 0) | ALM(0, diff)
The zext is important if the top bits can be 1. So with inputs like 0 and 0xfffffffffffffff4 we might produce the wrong results. That likely won't be the case for pointers, but since the inputs are just integers we may need to take that into account.
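To make the wrap-around concern concrete, here is a small Python sketch (purely illustrative, not LLVM code) that models the formula above for a hypothetical 16-lane predicate. It contrasts the exact zero-extended difference with a naive wrapping i64 subtraction, using the 0 / 0xfffffffffffffff4 inputs mentioned:

```python
# Models elem[i] = splat(diff <= 0) | ALM(0, diff), where
# diff = zext(a) - zext(b), for a hypothetical 16-element predicate.

VL = 16  # assumed vector length in elements

def whilewr_math(a: int, b: int) -> list[bool]:
    """Predicate using the exact (zero-extended) difference.

    Python ints are arbitrary precision, so a - b here is the
    mathematical zext(a) - zext(b) with no wrap-around."""
    diff = a - b
    return [diff <= 0 or i < diff for i in range(VL)]

def whilewr_wrapping_i64(a: int, b: int) -> list[bool]:
    """Predicate if the difference is instead computed with
    wrapping two's-complement i64 arithmetic."""
    diff = (a - b) & 0xFFFFFFFFFFFFFFFF
    if diff >= 1 << 63:          # reinterpret the bits as signed i64
        diff -= 1 << 64
    return [diff <= 0 or i < diff for i in range(VL)]

a, b = 0x0, 0xFFFFFFFFFFFFFFF4  # inputs from the comment above
print(whilewr_math(a, b))          # diff is hugely negative: all lanes active
print(whilewr_wrapping_i64(a, b))  # diff wraps to +12: only lanes 0..11 active
```

The two results disagree: the exact math makes `diff` a large negative number (all lanes active), while the wrapping subtraction produces +12 (only the first 12 lanes active), which is the kind of mismatch that matters when the inputs are arbitrary integers rather than pointers.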

I think the operands of the whilewr are also the wrong way around.

https://github.com/llvm/llvm-project/pull/114028
