[llvm-dev] SCEV and LoopStrengthReduction Formulae
Gopalasubramanian, Ganesh via llvm-dev
llvm-dev at lists.llvm.org
Wed Apr 4 01:11:00 PDT 2018
> cmpq %rbx, %r14
> jne .LBB0_1
>
> LLVM can perform compare-jump fusion, it already does in certain cases, but
> not in the case above. We can remove the cmp above if we were to perform
> the following transformation:
Do you mean branch-fusion (https://en.wikichip.org/wiki/macro-operation_fusion)?
Is there any more limitation why these two or not fused?
> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of via llvm-
> dev
> Sent: Wednesday, April 4, 2018 3:30 AM
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] SCEV and LoopStrengthReduction Formulae
>
> I am attempting to implement a minor loop strength reduction optimization for
> targets that support compare and jump fusion, specifically
> TTI::canMacroFuseCmp(). My approach might be wrong; however, I am
> soliciting the idea for feedback, so that I can implement this correctly. My plan
> is to add a Supplemental LSR formula to LoopStrengthReduce.cpp that optimizes
> the following case, but perhaps this should stand alone as its own pass:
>
> // Example which can be optimized via cmp/jmp fusion.
> // clang -O3 -S test.c
> extern void g(int);
> void f(int *p, long long n) {
> do {
> g(*p++);
> } while (--n);
> }
>
> LLVM currently generates the following sequence for x86_64 targets:
> LBB0_1:
> movl (%r15,%rbx,4), %edi
> callq g
> addq $1, %rbx
> cmpq %rbx, %r14
> jne .LBB0_1
>
> LLVM can perform compare-jump fusion, it already does in certain cases, but
> not in the case above. We can remove the cmp above if we were to perform
> the following transformation:
> 1.0) Initialize the induction variable, %rbx, to be 'n' instead of zero.
> 1.1) Negate the induction variable, so that the loop increments from -n to zero.
> 2.0) Set the base pointer, %r15, to be the last address that will be accessed
> from 'p'
> 2.1) In the preheader of the loop set %r15 to be: %r15 = <base> + n * <ele size>
> 3.0) Remove the cmpq
>
> We can remove the cmp, since we are now counting from -n to zero. The
> termination case for the loop will now be zero. The add instruction will set the
> zero-flag when %rbx reaches zero (target specific of course).
>
> The pointer passed to g() will also be correct, since we access the same items we
> normally would have, but the access is with respect to the base being the last
> item and subtracting from the last item, via a negative induction variable. I
> imagine the result would look something like the following:
> movq %rsi, %rbx ; rbx becomes the value n
> leaq (%rdi, %rbx, 4), %r15 ; <base> = p + (n * 4)
> negq %rbx ; set rbx to be -n, so we count from -n to 0
> LBB0_1:
> movl (%r15,%rbx,4), %edi
> callq g
> addq $1, %rbx
> jne .LBB0_1
>
> I realize this is a micro-op saving a single cycle. But this reduces the instruction
> count, one less instr to decode in a potentially hot path. If this all makes sense,
> and seems like a reasonable addition to llvm, would it make sense to implement
> this as a supplemental LSR formula, or as a separate pass?
>
> -Matt
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
More information about the llvm-dev
mailing list