[PATCH] D34515: [ARM] Materialise some boolean values to avoid a branch

Fri Sep 29 02:08:17 PDT 2017

samparker added inline comments.

================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:12106
+  // Materialize a boolean comparison for integers so we can avoid branching.
+  if (VT.isInteger()) {
+    if (isNullConstant(FalseVal)) {
----------------
rogfer01 wrote:
> samparker wrote:
> > early exit when not integer instead?
> Sadly there is some code after this block that does not require integer (if I'm reading it correctly).
Couldn't you move the original combiner to live above this? Doesn't look like the known bits will be operating on floats.

================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:12136
+                          DAG.getConstant(1, dl, MVT::i32), Neg.getValue(1));
+          Res = DAG.getNode(ISD::ADDCARRY, dl, VTs, Sub, Neg, Carry);
+        }
----------------
rogfer01 wrote:
> samparker wrote:
> > I don't follow how this sequence better? What makes this more efficient than a cmp and conditional move? On a Thumb1Only target, even a cmp and branch will only be 3 cycles, I think.
> Consider a case like
> 
> ```
> int test(int a, int b) {
>   return a != b;
> }
> ```
> 
> ToT is currently generating (`--target=arm -mcpu=cortex-m0 -O2`)
> 
> ```
> test:
> 	.fnstart
> @ BB#0:                                 @ %entry
> 	mov	r2, r0
> 	movs	r0, #1
> 	movs	r3, #0
> 	cmp	r2, r1
> 	bne	.LBB0_2
> @ BB#1:                                 @ %entry
> 	mov	r0, r3
> .LBB0_2:                                @ %entry
> 	bx	lr
> ```
> 
> with this combiner we can generate
> 
> ```
> test:
> 	.fnstart
> @ BB#0:                                 @ %entry
> 	subs	r0, r0, r1
> 	subs	r1, r0, #1
> 	sbcs	r0, r1
> 	bx	lr
> ```
> 
> So, not counting `bx`, we go from a sequence of 5/6 to 3.
> 
> 
aaah, awesome, thanks!

================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:12185
+      if (ShiftAmount)
+        Res = DAG.getNode(ISD::SHL, dl, VT, Res,
+                          DAG.getConstant(ShiftAmount, dl, MVT::i32));
----------------
rogfer01 wrote:
> samparker wrote:
> > Same as above, I guess I'm missing something... could you explain why this is better please?
> Consider a case like
> 
> ```
> int test(int a, int b) {
>   return a == b;
> }
> ```
> 
> ToT is currently generating (`--target=arm -mcpu=cortex-m0 -O2`)
> 
> ```
> test:
> 	.fnstart
> @ BB#0:                                 @ %entry
> 	mov	r2, r0
> 	movs	r0, #1
> 	movs	r3, #0
> 	cmp	r2, r1
> 	beq	.LBB0_2
> @ BB#1:                                 @ %entry
> 	mov	r0, r3
> .LBB0_2:                                @ %entry
> 	bx	lr
> 
> ```
> 
> But the code above is in practice like
> 
> ```
> int test(int a, int b) {
>   return a == b ? 1 : 0;
> }
> ```
> 
> where `k = 0` (`2^k = 1`). 
> 
> So with this change we can generate
> 
> ```
> test:
> 	.fnstart
> @ BB#0:                                 @ %entry
> 	subs	r1, r0, r1
> 	movs	r0, #0
> 	subs	r0, r0, r1
> 	adcs	r0, r1
> 	bx	lr
> ```
> 
> We go from 5/6 instructions to always 4 (not counting the `bx`).
> 
> When `k != 0` (e.g. `return a == b ? 0 : k`) the improvement is more modest from 5/6 to always 5, due to the LSL
> 
> 
> 
thanks for the great explanation!

================
Comment at: test/CodeGen/ARM/atomic-cmpxchg.ll:29
+; CHECK-THUMB: subs r0, r0, [[R1]]
+; CHECK-THUMB: adcs r0, [[R1]]

----------------
Sorry, I really should of checked the tested before I asked you to explain... so thanks again for taking the time.

================
Comment at: test/CodeGen/ARM/cmn.ll:10
   %cmp = icmp sgt i32 %a, -78
-  %. = zext i1 %cmp to i32
-  ret i32 %.
+  %ret = select i1 %cmp, i32 42, i32 24
+  ret i32 %ret
----------------
Why change the input?

================
Comment at: test/CodeGen/ARM/cmpxchg-O0.ll:23
+; CHECK:     clz [[CMP2:r[0-9]+]], [[CMP1]]
+; CHECK:     lsr{{(s)?}} {{r[0-9]+}}, [[CMP2]], #5
 ; CHECK:     dmb ish
----------------
Is this still beneficial when conditional moves are available? This test makes it look like an extra instruction is used.

================
Comment at: test/CodeGen/ARM/select-imm.ll:155
+
+define i32 @t7(i32 %a, i32 %b) nounwind readnone {
+entry:
----------------
I'd say that it's also worth checking that branches aren't generated for these tests.

================
Comment at: test/CodeGen/Thumb/branchless-cmp.ll:3
+
+define i32 @test1a(i32 %a, i32 %b) {
+entry:
----------------
Same here, it would be nice to ensure that there isn't a cmp and a br generated.

================
Comment at: test/CodeGen/Thumb2/thumb2-cmn.ll:9
     %tmp = icmp ne i32 %a, %nb
-    ret i1 %tmp
+    %ret = select i1 %tmp, i32 42, i32 24
+    ret i32 %ret
----------------
Why the input change for these tests?

https://reviews.llvm.org/D34515