[PATCH] D34515: [ARM] Materialise some boolean values to avoid a branch

Roger Ferrer Ibanez via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 28 09:41:13 PDT 2017


rogfer01 added inline comments.


================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:12106
+  // Materialize a boolean comparison for integers so we can avoid branching.
+  if (VT.isInteger()) {
+    if (isNullConstant(FalseVal)) {
----------------
samparker wrote:
> early exit when not integer instead?
Sadly there is some code after this block that does not require integer (if I'm reading it correctly).


================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:12136
+                          DAG.getConstant(1, dl, MVT::i32), Neg.getValue(1));
+          Res = DAG.getNode(ISD::ADDCARRY, dl, VTs, Sub, Neg, Carry);
+        }
----------------
samparker wrote:
> I don't follow how this sequence better? What makes this more efficient than a cmp and conditional move? On a Thumb1Only target, even a cmp and branch will only be 3 cycles, I think.
Consider a case like

```
int test(int a, int b) {
  return a != b;
}
```

ToT is currently generating (`--target=arm -mcpu=cortex-m0 -O2`)

```
test:
	.fnstart
@ BB#0:                                 @ %entry
	mov	r2, r0
	movs	r0, #1
	movs	r3, #0
	cmp	r2, r1
	bne	.LBB0_2
@ BB#1:                                 @ %entry
	mov	r0, r3
.LBB0_2:                                @ %entry
	bx	lr
```

with this combiner we can generate

```
test:
	.fnstart
@ BB#0:                                 @ %entry
	subs	r0, r0, r1
	subs	r1, r0, #1
	sbcs	r0, r1
	bx	lr
```

So, not counting `bx`, we go from a sequence of 5/6 to 3.




================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:12185
+      if (ShiftAmount)
+        Res = DAG.getNode(ISD::SHL, dl, VT, Res,
+                          DAG.getConstant(ShiftAmount, dl, MVT::i32));
----------------
samparker wrote:
> Same as above, I guess I'm missing something... could you explain why this is better please?
Consider a case like

```
int test(int a, int b) {
  return a == b;
}
```

ToT is currently generating (`--target=arm -mcpu=cortex-m0 -O2`)

```
test:
	.fnstart
@ BB#0:                                 @ %entry
	mov	r2, r0
	movs	r0, #1
	movs	r3, #0
	cmp	r2, r1
	beq	.LBB0_2
@ BB#1:                                 @ %entry
	mov	r0, r3
.LBB0_2:                                @ %entry
	bx	lr

```

But the code above is in practice like

```
int test(int a, int b) {
  return a == b ? 1 : 0;
}
```

where `k = 0` (`2^k = 1`). 

So with this change we can generate

```
test:
	.fnstart
@ BB#0:                                 @ %entry
	subs	r1, r0, r1
	movs	r0, #0
	subs	r0, r0, r1
	adcs	r0, r1
	bx	lr
```

We go from 5/6 instructions to always 4 (not counting the `bx`).

When `k != 0` (e.g. `return a == b ? 0 : k`) the improvement is more modest from 5/6 to always 5, due to the LSL





================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:12227-12230
+  // case ARMISD::ADDC:    return PerformADDCCombine(N, DCI, Subtarget);
   case ARMISD::ADDE:    return PerformADDECombine(N, DCI, Subtarget);
+  // case ARMISD::SUBC:    return PerformSUBCCombine(N, DCI, Subtarget);
+  // case ARMISD::SUBE:    return PerformSUBECombine(N, DCI.DAG, Subtarget);
----------------
Ignore these comments, this ended here accidentally. I'll remove them in the next update.


https://reviews.llvm.org/D34515





More information about the llvm-commits mailing list