[llvm] [Codegen][LegalizeIntegerTypes] Improve shift through stack (PR #96151)

Thu Aug 29 08:50:54 PDT 2024

================
@@ -238,18 +270,34 @@ define void @ashr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; ALL-NEXT:    ldp x9, x8, [x0, #16]
 ; ALL-NEXT:    ldr x10, [x1]
 ; ALL-NEXT:    ldr q0, [x0]
-; ALL-NEXT:    and x10, x10, #0x1f
+; ALL-NEXT:    and x11, x10, #0x18
 ; ALL-NEXT:    stp x9, x8, [sp, #16]
 ; ALL-NEXT:    asr x8, x8, #63
 ; ALL-NEXT:    mov x9, sp
 ; ALL-NEXT:    str q0, [sp]
+; ALL-NEXT:    add x9, x9, x11
 ; ALL-NEXT:    stp x8, x8, [sp, #48]
 ; ALL-NEXT:    stp x8, x8, [sp, #32]
-; ALL-NEXT:    add x8, x9, x10
-; ALL-NEXT:    ldp x10, x9, [x8, #16]
-; ALL-NEXT:    ldr q0, [x8]
-; ALL-NEXT:    str q0, [x2]
-; ALL-NEXT:    stp x10, x9, [x2, #16]
+; ALL-NEXT:    lsl x8, x10, #3
----------------
davemgreen wrote:

Am I right in saying that we start off with a large i256 shift, where we can work out that the shift amount is a multiple of 8? So we store the original value to a stack slot of twice the size, and reload from `stack + shift/8`? And if byte alignment is good enough for the reload you don't need to do anything else, but if you require a higher alignment then you might need a follow-up shift too?
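
To make the question concrete, here is a rough Python model of the scheme as I understand it. It is only a sketch, not what the legalizer actually emits: the function name `ashr_through_stack` and the parameters `width_bytes` and `align_bytes` are made up for illustration, and the "stack slot" is just a little-endian byte string.

```python
def ashr_through_stack(value: int, shift_bits: int,
                       width_bytes: int = 32, align_bytes: int = 8) -> int:
    """Model an arithmetic shift right of a wide integer via a stack slot.

    Hypothetical helper for illustration only; align_bytes must be a
    power of two no larger than width_bytes.
    """
    width_bits = width_bytes * 8
    assert 0 <= shift_bits < width_bits
    mask = (1 << width_bits) - 1
    value &= mask

    # Slot of twice the size: low half holds the value (little-endian),
    # high half is filled with copies of the sign bit.
    sign_byte = 0xFF if value >> (width_bits - 1) else 0x00
    slot = value.to_bytes(width_bytes, "little") + bytes([sign_byte]) * width_bytes

    # Reload offset: shift/8, rounded down to the required alignment.
    byte_off = (shift_bits // 8) & ~(align_bytes - 1)

    # Whatever the aligned reload did not cover becomes a follow-up shift.
    residual_bits = shift_bits - byte_off * 8

    window = int.from_bytes(slot[byte_off:byte_off + width_bytes],
                            "little", signed=True)
    return (window >> residual_bits) & mask
```

With `align_bytes=1` and a shift that is a multiple of 8, `residual_bits` is 0 and the reload alone gives the result. With `align_bytes=8` the offset is masked to a multiple of 8 bytes (which looks like the `and x11, x10, #0x18` above), and the remaining bits are handled by the follow-up shift.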

https://github.com/llvm/llvm-project/pull/96151