[llvm] [Codegen][LegalizeIntegerTypes] Improve shift through stack (PR #96151)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 30 07:15:40 PDT 2024
================
@@ -238,18 +270,34 @@ define void @ashr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
; ALL-NEXT: ldp x9, x8, [x0, #16]
; ALL-NEXT: ldr x10, [x1]
; ALL-NEXT: ldr q0, [x0]
-; ALL-NEXT: and x10, x10, #0x1f
+; ALL-NEXT: and x11, x10, #0x18
; ALL-NEXT: stp x9, x8, [sp, #16]
; ALL-NEXT: asr x8, x8, #63
; ALL-NEXT: mov x9, sp
; ALL-NEXT: str q0, [sp]
+; ALL-NEXT: add x9, x9, x11
; ALL-NEXT: stp x8, x8, [sp, #48]
; ALL-NEXT: stp x8, x8, [sp, #32]
-; ALL-NEXT: add x8, x9, x10
-; ALL-NEXT: ldp x10, x9, [x8, #16]
-; ALL-NEXT: ldr q0, [x8]
-; ALL-NEXT: str q0, [x2]
-; ALL-NEXT: stp x10, x9, [x2, #16]
+; ALL-NEXT: lsl x8, x10, #3
----------------
davemgreen wrote:
Yeah OK. I don't see why we would make this optimization worse by requiring aligned loads on machines where there is no penalty for doing so, but it is relatively unlikely to come up in practice so feel free to ignore me and continue on getting this committed.
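For readers following along, below is a minimal C sketch of the two lowerings being compared, assuming a little-endian target and a shift amount given as a byte offset, as in this test. The function names ashr256_unaligned and ashr256_aligned are made up for illustration, and the 256-bit width is hardcoded; this is not the actual LegalizeIntegerTypes code, which handles arbitrary widths.

#include <stdint.h>
#include <string.h>

/* Old lowering: spill the value plus sign-extension words to a stack
   slot, then copy the result from sp plus the masked byte offset,
   which may be an unaligned address. */
static void ashr256_unaligned(const uint64_t src[4], unsigned byteoff,
                              uint64_t dst[4]) {
  uint8_t buf[64];
  uint64_t sign = -(src[3] >> 63);           /* asr x8, x8, #63 */
  memcpy(buf, src, 32);                      /* str q0 / stp to [sp..] */
  memset(buf + 32, (int)(sign & 0xff), 32);  /* stp x8, x8, [sp, #32/#48] */
  memcpy(dst, buf + (byteoff & 0x1f), 32);   /* and x10, x10, #0x1f */
}

/* New lowering: round the offset down to 8 bytes so every load is
   aligned, then fold the residual bits into a shift of adjacent words. */
static void ashr256_aligned(const uint64_t src[4], unsigned byteoff,
                            uint64_t dst[4]) {
  uint64_t buf[8];
  uint64_t sign = -(src[3] >> 63);           /* asr x8, x8, #63 */
  memcpy(buf, src, 32);
  buf[4] = buf[5] = buf[6] = buf[7] = sign;
  unsigned word = (byteoff & 0x18) >> 3;     /* and x11, x10, #0x18 */
  unsigned bits = (byteoff & 0x7) * 8;       /* low bits of lsl x8, x10, #3 */
  for (unsigned i = 0; i < 4; ++i) {
    uint64_t lo = buf[word + i];
    uint64_t hi = buf[word + i + 1];
    /* Guard bits == 0: shifting a 64-bit value by 64 is undefined in C. */
    dst[i] = bits ? (lo >> bits) | (hi << (64 - bits)) : lo;
  }
}

This is the trade-off raised above: the first form issues one block of (possibly unaligned) loads from sp plus the byte offset, while the second keeps every load 8-byte aligned at the cost of a shift/or pair per result word, which is presumably only a win on cores that penalize unaligned loads.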
https://github.com/llvm/llvm-project/pull/96151