[PATCH] D40304: [InstCombine] PR35354: Convert load bitcast (select (Cond, &V1, &V2)) --> select(Cond, load bitcast &V1, load bitcast &V2)

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 28 08:03:49 PST 2017


spatel added a comment.

In https://reviews.llvm.org/D40304#937624, @ABataev wrote:

> The original transformation `load (select (Cond, &V1, &V2)) --> select(Cond, load &V1, load &V2)` also increases the number of instructions from 2 to 3


Yes - that's why I said the existing transform is suspect. There are two InstCombine problems here, as I think you've noted:

1. The transform that hoists loads ahead of a select (see the sketch after this list).
2. The transform that adds bitcasts around FP loads ("Try to canonicalize loads which are only ever stored to operate over integers instead of any other type.")
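
As a minimal sketch of transform 1 (the value names here are made up), the select-of-pointers form costs 2 instructions and the hoisted form costs 3, which is the increase noted above. Note that the hoisting is only legal when both pointers are known safe to load from, which is why the example below carries dereferenceable(4):

  ; before: 2 instructions
  %ptr = select i1 %cond, i32* %a, i32* %b
  %val = load i32, i32* %ptr, align 4

  ; after: 3 instructions
  %la = load i32, i32* %a, align 4
  %lb = load i32, i32* %b, align 4
  %val = select i1 %cond, i32 %la, i32 %lb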

I've looked at this a bit closer now, and SimplifyCFG really wants to create a select for this:

  define float* @mymax(float* dereferenceable(4) %__a, float* dereferenceable(4) %__b) {
  entry:
    %__comp = alloca %"struct.std::__1::__less", align 1
  %call = call zeroext i1 @less(%"struct.std::__1::__less"* nonnull %__comp, float* nonnull dereferenceable(4) %__a, float* nonnull dereferenceable(4) %__b)
    br i1 %call, label %cond.true, label %cond.false
  
  cond.true:                                        ; preds = %entry
    br label %cond.end
  
  cond.false:                                       ; preds = %entry
    br label %cond.end
  
  cond.end:                                         ; preds = %cond.false, %cond.true
    %cond-lvalue = phi float* [ %__b, %cond.true ], [ %__a, %cond.false ]
    ret float* %cond-lvalue
  }

For this example, a select will be created by FoldTwoEntryPHINode(). If I stub that out, then a select will still be created in HoistThenElseCodeToIf(). If I stub that out, then a select will still be created in SpeculativelyExecuteBB(). So we would have to disable much more of SimplifyCFG to avoid creating the select. But even that is not enough - the bitcasts are interfering with further optimization.
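
For reference, this is roughly what FoldTwoEntryPHINode() turns the example above into (a sketch; the exact value names may differ):

  define float* @mymax(float* dereferenceable(4) %__a, float* dereferenceable(4) %__b) {
  entry:
    %__comp = alloca %"struct.std::__1::__less", align 1
    %call = call zeroext i1 @less(%"struct.std::__1::__less"* nonnull %__comp, float* nonnull dereferenceable(4) %__a, float* nonnull dereferenceable(4) %__b)
    %cond-lvalue = select i1 %call, float* %__b, float* %__a
    ret float* %cond-lvalue
  }

This is exactly the select-of-pointers that transform 1 then wants to turn into a select of loaded values.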

If the consensus is that this instcombine bitcast transform is valid:

  target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
  define void @store_bitcasted_load2(float* %loadaddr, float* %storeaddr) {
    %ld = load float, float* %loadaddr
    store float %ld, float* %storeaddr
    ret void
  }

$ ./opt -instcombine -S bitcastload.ll

  define void @store_bitcasted_load2(float* %loadaddr, float* %storeaddr) {
    %1 = bitcast float* %loadaddr to i32*
    %ld1 = load i32, i32* %1, align 4
    %2 = bitcast float* %storeaddr to i32*
    store i32 %ld1, i32* %2, align 4
    ret void
  }

...then I think we have to account for that here and look through the bitcasts (as this patch proposes). Could we keep this patch from creating more instructions than it removes by starting the pattern match at the store instruction rather than the load?
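
For concreteness, here is a sketch of the transform named in the patch title, with made-up value names; the input is the bitcasted form that transform 2 leaves behind:

  ; load bitcast (select (Cond, &V1, &V2)): 3 instructions
  %sel = select i1 %cond, float* %__b, float* %__a
  %cast = bitcast float* %sel to i32*
  %ld = load i32, i32* %cast, align 4

  ; select(Cond, load bitcast &V1, load bitcast &V2): 5 instructions
  %cast_b = bitcast float* %__b to i32*
  %ld_b = load i32, i32* %cast_b, align 4
  %cast_a = bitcast float* %__a to i32*
  %ld_a = load i32, i32* %cast_a, align 4
  %ld = select i1 %cond, i32 %ld_b, i32 %ld_a

Rooted at the load, the count goes from 3 to 5; rooting the match at a store of %ld instead would let the rewrite fold the store (and its bitcast) in the same step, so the combined rewrite need not grow the instruction count.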


https://reviews.llvm.org/D40304




