[llvm] Subject: [PATCH] [AArch64ISelLowering] Optimize rounding shift and saturation truncation (PR #74325)

Thu Dec 14 05:56:27 PST 2023

================
@@ -95,9 +95,9 @@ entry:
 define <16 x i8> @rshrn_v16i16_8(<16 x i16> %a) {
 ; CHECK-LABEL: rshrn_v16i16_8:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    movi v2.2d, #0000000000000000
-; CHECK-NEXT:    raddhn v0.8b, v0.8h, v2.8h
-; CHECK-NEXT:    raddhn2 v0.16b, v1.8h, v2.8h
+; CHECK-NEXT:    urshr v1.8h, v1.8h, #8
----------------
david-arm wrote:

On the surface this looks worse than before. On neoverse-v1, raddhn has a latency of 2 and a throughput of 4, whereas urshr has a latency of 4 and a throughput of 2, so I think the original code would likely be faster. Is there an easy way of keeping the old version here?
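
For reference, a minimal sketch of why the two sequences should produce the same result, so the question is only about cost. It models one 16-bit lane in Python. The helper names are hypothetical, and it assumes the new sequence narrows to 8 bits after the urshr (the rest of the hunk is not shown above).

```python
# Hypothetical per-lane models; names are illustrative, not LLVM or Arm APIs.

def raddhn_lane(a: int, b: int) -> int:
    """Rounding add of two 16-bit lanes, keeping bits [15:8] of the sum."""
    return ((a + b + 0x80) >> 8) & 0xFF


def urshr_lane(a: int, shift: int) -> int:
    """Unsigned rounding shift right of a 16-bit lane by `shift` bits."""
    return ((a + (1 << (shift - 1))) >> shift) & 0xFFFF


def narrow_lane(a: int) -> int:
    """Assumed follow-up step: truncate a 16-bit lane to its low 8 bits."""
    return a & 0xFF


def sequences_agree() -> bool:
    """Exhaustively compare raddhn-with-zero against urshr #8 plus narrowing."""
    return all(
        raddhn_lane(a, 0) == narrow_lane(urshr_lane(a, 8))
        for a in range(1 << 16)
    )
```

If `sequences_agree()` holds, keeping the raddhn form for this shift amount would be a pure cost decision and not a correctness one.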

https://github.com/llvm/llvm-project/pull/74325