[PATCH] D40304: [InstCombine] PR35354: Convert load bitcast (select (Cond, &V1, &V2)) --> select(Cond, load bitcast &V1, load bitcast &V2)

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 28 08:03:49 PST 2017


spatel added a comment.

In https://reviews.llvm.org/D40304#937624, @ABataev wrote:

> The original transformation `load (select (Cond, &V1, &V2)) --> select(Cond, load &V1, load &V2)` also increases the number of instructions from 2 to 3


Yes - that's why I said the existing transform is suspect. There are two InstCombine problems here, as I think you've noted:

1. The transform that hoists loads ahead of a select (see the sketch after this list).
2. The transform that adds bitcasts around FP loads ("Try to canonicalize loads which are only ever stored to operate over integers instead of any other type.")
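
As a minimal sketch of transform 1 (the value names here are made up), the select-of-pointers form costs 2 instructions and the hoisted form costs 3, which is the increase noted above. Note that the hoisting is only legal when both pointers are known safe to load from, which is why the example below carries dereferenceable(4):

  ; before: 2 instructions
  %ptr = select i1 %cond, i32* %a, i32* %b
  %val = load i32, i32* %ptr, align 4

  ; after: 3 instructions
  %la = load i32, i32* %a, align 4
  %lb = load i32, i32* %b, align 4
  %val = select i1 %cond, i32 %la, i32 %lb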

I've looked at this a bit closer now, and SimplifyCFG really wants to create a select for this:

  define float* @mymax(float* dereferenceable(4) %__a, float* dereferenceable(4) %__b) {
  entry:
    %__comp = alloca %"struct.std::__1::__less", align 1
  %call = call zeroext i1 @less(%"struct.std::__1::__less"* nonnull %__comp, float* nonnull dereferenceable(4) %__a, float* nonnull dereferenceable(4) %__b)
    br i1 %call, label %cond.true, label %cond.false
  
  cond.true:                                        ; preds = %entry
    br label %cond.end
  
  cond.false:                                       ; preds = %entry
    br label %cond.end
  
  cond.end:                                         ; preds = %cond.false, %cond.true
    %cond-lvalue = phi float* [ %__b, %cond.true ], [ %__a, %cond.false ]
    ret float* %cond-lvalue
  }

For this example, a select will be created by FoldTwoEntryPHINode(). If I stub that out, then a select will still be created in HoistThenElseCodeToIf(). If I stub that out, then a select will still be created in SpeculativelyExecuteBB(). So we would have to disable much more of SimplifyCFG to avoid creating the select. But even that is not enough - the bitcasts are interfering with further optimization.
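
For reference, this is roughly what FoldTwoEntryPHINode() turns the example above into (a sketch; the exact value names may differ):

  define float* @mymax(float* dereferenceable(4) %__a, float* dereferenceable(4) %__b) {
  entry:
    %__comp = alloca %"struct.std::__1::__less", align 1
    %call = call zeroext i1 @less(%"struct.std::__1::__less"* nonnull %__comp, float* nonnull dereferenceable(4) %__a, float* nonnull dereferenceable(4) %__b)
    %cond-lvalue = select i1 %call, float* %__b, float* %__a
    ret float* %cond-lvalue
  }

This is exactly the select-of-pointers that transform 1 then wants to turn into a select of loaded values.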

If the consensus is that this instcombine bitcast transform is valid:

  target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
  define void @store_bitcasted_load2(float* %loadaddr, float* %storeaddr) {
    %ld = load float, float* %loadaddr
    store float %ld, float* %storeaddr
    ret void
  }

$ ./opt -instcombine -S bitcastload.ll

  define void @store_bitcasted_load2(float* %loadaddr, float* %storeaddr) {
    %1 = bitcast float* %loadaddr to i32*
    %ld1 = load i32, i32* %1, align 4
    %2 = bitcast float* %storeaddr to i32*
    store i32 %ld1, i32* %2, align 4
    ret void
  }

...then I think we have to account for that here and look through the bitcasts (as this patch proposes). Could we keep this patch from creating more instructions than it removes by starting the pattern match at the store instruction rather than the load?
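
For concreteness, here is a sketch of the transform named in the patch title, with made-up value names; the input is the bitcasted form that transform 2 leaves behind:

  ; load bitcast (select (Cond, &V1, &V2)): 3 instructions
  %sel = select i1 %cond, float* %__b, float* %__a
  %cast = bitcast float* %sel to i32*
  %ld = load i32, i32* %cast, align 4

  ; select(Cond, load bitcast &V1, load bitcast &V2): 5 instructions
  %cast_b = bitcast float* %__b to i32*
  %ld_b = load i32, i32* %cast_b, align 4
  %cast_a = bitcast float* %__a to i32*
  %ld_a = load i32, i32* %cast_a, align 4
  %ld = select i1 %cond, i32 %ld_b, i32 %ld_a

Rooted at the load, the count goes from 3 to 5; rooting the match at a store of %ld instead would let the rewrite fold the store (and its bitcast) in the same step, so the combined rewrite need not grow the instruction count.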


https://reviews.llvm.org/D40304




