[llvm-dev] SCEV and LoopStrengthReduction Formulae

Wed Apr 4 01:11:00 PDT 2018

> 	cmpq	%rbx, %r14
> 	jne	.LBB0_1
>
> LLVM can perform compare-jump fusion, it already does in certain cases, but
> not in the case above.  We can remove the cmp above if we were to perform
> the following transformation:

Do you mean branch-fusion (https://en.wikichip.org/wiki/macro-operation_fusion)? 
Is there any more limitation why these two or not fused?

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of via llvm-
> dev
> Sent: Wednesday, April 4, 2018 3:30 AM
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] SCEV and LoopStrengthReduction Formulae
> 
> I am attempting to implement a minor loop strength reduction optimization for
> targets that support compare and jump fusion, specifically
> TTI::canMacroFuseCmp().  My approach might be wrong; however, I am
> soliciting the idea for feedback, so that I can implement this correctly.  My plan
> is to add a Supplemental LSR formula to LoopStrengthReduce.cpp that optimizes
> the following case, but perhaps this should stand alone as  its own pass:
> 
> // Example which can be optimized via cmp/jmp fusion.
> // clang -O3 -S test.c
> extern void g(int);
> void f(int *p, long long n) {
>     do {
>         g(*p++);
>     } while (--n);
> }
> 
> LLVM currently generates the following sequence for x86_64 targets:
> LBB0_1:
> 	movl	(%r15,%rbx,4), %edi
> 	callq	g
> 	addq	$1, %rbx
> 	cmpq	%rbx, %r14
> 	jne	.LBB0_1
> 
> LLVM can perform compare-jump fusion, it already does in certain cases, but
> not in the case above.  We can remove the cmp above if we were to perform
> the following transformation:
> 1.0) Initialize the induction variable, %rbx, to be 'n' instead of zero.
> 1.1) Negate the induction variable, so that the loop increments from -n to zero.
> 2.0) Set the base pointer, %r15, to be the last address that will be accessed
>        from 'p'
> 2.1) In the preheader of the loop set %r15 to be: %r15 = <base> + n * <ele size>
> 3.0) Remove the cmpq
> 
> We can remove the cmp, since we are now counting from -n to zero.  The
> termination case for the loop will now be zero.  The add instruction will set the
> zero-flag when %rbx reaches zero (target specific of course).
> 
> The pointer passed to g() will also be correct, since we access the same items we
> normally would have, but the access is with respect to the base being the last
> item and subtracting from the last item, via a negative induction variable.  I
> imagine the result would look something like the following:
>     movq   %rsi,   %rbx                   ; rbx becomes the value n
>     leaq     (%rdi, %rbx, 4), %r15  ; <base> = p + (n * 4)
>     negq    %rbx                              ; set rbx to be -n, so we count from -n to 0
> LBB0_1:
>     movl    (%r15,%rbx,4), %edi
>     callq    g
>     addq   $1, %rbx
>     jne      .LBB0_1
> 
> I realize this is a micro-op saving a single cycle.  But this reduces the instruction
> count, one less instr to decode in a potentially hot path. If this all makes sense,
> and seems like a reasonable addition to llvm, would it make sense to implement
> this as a supplemental LSR formula, or as a separate pass?
> 
> -Matt
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev