[PATCH] D30416: [InstCombine] Redo reduceLoadOpStoreWidth in instcombine for bitfield store optimization.

Mon Feb 27 15:01:30 PST 2017

wmi added a comment.

> Your testcase has dead instructions (%conv).

Will fix it.

> This is missing a significant safety check: we have to make sure the memory isn't modified between the load and the store.

Good catch! Will add the safety check.

> It's not clear that we want to perform this transformation so early in the pipeline... dealing with overlapping stores makes other optimizations more difficult. Do you have performance numbers for this?

What kind of optimizations are you worried about?  We get 20% improvement for an internal benchmark because of the change and I havn't run other benchmarks. I will get the numbers. If putting the optimization in instcombine is really an issue, we can move it to CodeGenPrepare.

================
Comment at: lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp:1331
+  // will handle patterns like either 0..01..1 or 1..10..01..1
+  uint64_t Mask = Cst->getValue().getSExtValue();
+  unsigned MaskLeadOnes = countLeadingOnes(Mask);
----------------
efriedma wrote:
> This will crash if the value is too large (e.g. i128).  You can use APInt operations instead.
bitfields wider than 32 bits are uncommon, so how about adding a limit of TBits <= 64?

================
Comment at: lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp:1388
+                               "uglygep");
+    Align = DL.getABITypeAlignment(NewTy);
+  }
----------------
efriedma wrote:
> What does the ABI type alignment have to do with this?
If StOffset is 0, the NewPtr is the same as Ptr and the alignment of original store can be reused. If StOffset is not 0, use the ABI type alignment. 

================
Comment at: test/Transforms/InstCombine/bitfield-store.ll:89
+; a4.f1 = n;
+; The bitfield store cannot be shrinked because the field is not 8/16/32 bits.
+; CHECK-LABEL: @test4(
----------------
efriedma wrote:
> I'm not following; I'm pretty sure we could shrink the store if we wanted to.  Or do you mean it isn't profitable to shrink it?
There is a correctness issue. If we shrink the store and generate store like below:

  %t0 = trunc i32 %n to i13
  store i13 %t0, i13* bitcast (%class.A4* @a4 to i13*), align 8
  ret void

llvm will generate code:
        andl    $8191, %edi             # imm = 0x1FFF
        movw    %di, a4(%rip)
        retq

and a4.f2 will be overwritten.

Repository:
  rL LLVM

https://reviews.llvm.org/D30416