[PATCH] D27861: [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2.

Mon Jan 9 05:03:31 PST 2017

apilipenko added a comment.

Currently update_llc_test_checks.py supports only the arm-eabi target. I left the ARM test cases with manually written checks for now.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:4481
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  if (!TLI.isOperationLegal(ISD::LOAD, VT))
+    return SDValue();
----------------
RKSimon wrote:
> What is the effect of changing this to:
> ```
> if (LegalOperations && !TLI.isOperationLegal(ISD::LOAD, VT))
> ```
> Would the legalize do such a bad job of splitting poorly combined loads/bswaps?
This looks like a good idea. It enables combining an i64 pattern into two i32 loads on 32-bit targets: first the byte loads are combined into a single i64 load, and then that load is split into two i32 loads.
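
For illustration, here is a minimal C++ sketch of the kind of source pattern involved, and of why splitting the wide load into two halves preserves the value. The helper names (`sketch_load_i32_by_i8`, `sketch_load_i64_by_i8`, `sketch_load_i64_as_two_i32`) are hypothetical and are not taken from the patch or its tests; the sketch assumes little-endian byte order in the pattern.
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: assemble a little-endian 32-bit value from four
// single-byte loads.
inline uint32_t sketch_load_i32_by_i8(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: assemble a little-endian 64-bit value from eight
// single-byte loads. This is the shape of idiom that could fold into one
// i64 load.
inline uint64_t sketch_load_i64_by_i8(const uint8_t *p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  return v;
}

// The same value expressed as two 32-bit halves, which is roughly what
// splitting an i64 load on a 32-bit target amounts to.
inline uint64_t sketch_load_i64_as_two_i32(const uint8_t *p) {
  uint64_t lo = sketch_load_i32_by_i8(p);
  uint64_t hi = sketch_load_i32_by_i8(p + 4);
  return lo | (hi << 32);
}

// Small self-check: both formulations agree on a sample buffer.
inline void sketch_check_i64() {
  const uint8_t bytes[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  assert(sketch_load_i64_by_i8(bytes) == sketch_load_i64_as_two_i32(bytes));
}
```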

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:4559
+  bool NeedsBswap = DAG.getDataLayout().isBigEndian() != BigEndian;
+  if (NeedsBswap && !TLI.isOperationLegal(ISD::BSWAP, VT))
+    return SDValue();
----------------
filcab wrote:
> RKSimon wrote:
> > Would this work?
> > ```
> > if (NeedsBswap && LegalOperations && !TLI.isOperationLegal(ISD::BSWAP, VT))
> > ```
> I wonder if it's useful to generate a bswap only to change it back later. Do you have an example of something llvm already does? Or would this be a future optimization possibility?
As a result, we get a single load followed by an instruction sequence that does the swap. E.g., for load_i32_by_i8_bswap from test/CodeGen/ARM/load-combine.ll we'll have:
```
	ldr	r0, [r0]
	mov	r1, #65280
	mov	r2, #16711680
	and	r1, r1, r0, lsr #8
	and	r2, r2, r0, lsl #8
	orr	r1, r1, r0, lsr #24
	orr	r0, r2, r0, lsl #24
	orr	r0, r0, r1
```
instead of 
```
	ldrb	r2, [r0, #1]
	ldrb	r1, [r0]
	ldrb	r3, [r0, #2]
	ldrb	r0, [r0, #3]
	lsl	r2, r2, #16
	orr	r1, r2, r1, lsl #24
	orr	r1, r1, r3, lsl #8
	orr	r0, r1, r0
```
Assuming that shuffling bytes in a register is cheaper than loading from memory, it looks like a generally good transformation.
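
To make the equivalence concrete, here is a small C++ sketch: a value assembled from bytes in big-endian order equals the byte swap of the same bytes read in little-endian order. The function names are hypothetical, and the shift-and-mask swap only mirrors the shape of the first ARM sequence above (masks 65280 = 0xff00 and 16711680 = 0xff0000); it is not code from the patch.
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the big-endian "load by bytes" pattern, i.e. four
// byte loads combined with shifts and ORs.
inline uint32_t sketch_load_i32_be_by_i8(const uint8_t *p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) |
         static_cast<uint32_t>(p[3]);
}

// Hypothetical helper: what a single 32-bit load would produce on a
// little-endian target, written portably as byte assembly.
inline uint32_t sketch_load_i32_le(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Byte swap done entirely in a register with shifts, masks, and ORs.
inline uint32_t sketch_bswap32(uint32_t x) {
  uint32_t a = ((x >> 8) & 0x0000ff00u) | (x >> 24);
  uint32_t b = ((x << 8) & 0x00ff0000u) | (x << 24);
  return a | b;
}

// Small self-check: one wide load plus a swap matches the byte-wise pattern.
inline void sketch_check_bswap() {
  const uint8_t bytes[4] = {0x12, 0x34, 0x56, 0x78};
  assert(sketch_bswap32(sketch_load_i32_le(bytes)) ==
         sketch_load_i32_be_by_i8(bytes));
}
```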

https://reviews.llvm.org/D27861