[PATCH] D64142: [SLP] try to create vector loads from bitcasted scalar pointers

Thu Jul 25 06:45:44 PDT 2019

lebedev.ri added inline comments.

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/load-bitcast-vec.ll:7
 ; CHECK-LABEL: @matching_scalar(
-; CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x float>* [[P:%.*]] to float*
-; CHECK-NEXT:    [[R:%.*]] = load float, float* [[BC]], align 16
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[P:%.*]], align 16
+; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <4 x float> [[TMP1]], i32 0
----------------
ABataev wrote:
> lebedev.ri wrote:
> > spatel wrote:
> > > lebedev.ri wrote:
> > > > spatel wrote:
> > > > > ABataev wrote:
> > > > > > Seems to me it must be a masked load rather than a plain load. Plus, what about the cost? This does not look cost-optimal.
> > > > > If the load is guaranteed dereferenceable, does that not allow speculated load of the entire vector?
> > > > > 
> > > > > I'm open to suggestions about the cost calc. It's not clear to me if there's an existing TTI API for this or if we need to create a new one?
> > > > I agree that there is no reason this should be a maskedload.
> > > > Do we have opposite folds for this in dagcombine?
> > > > 
> > > Yes - see narrowExtractedVectorLoad() in DAGCombiner.
> > Then as far as I'm concerned this is a zero-cost change.
> getCastInstrCost + getMemoryOpCost for scalar instructions.
> getMemoryOpCost + getExtractWithExtendCost for vector instructions. No?
> Then as far as I'm concerned this is a zero-cost change.

... in the sense that if later passes make no further use of this vector load,
it is guaranteed to be narrowed back into a simple scalar load.
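
For reference, the cost comparison quoted above can be sketched roughly as
follows. This is a toy model, not LLVM code: the ToyCosts struct, its field
names, and any numbers fed into it are hypothetical stand-ins for whatever the
TTI queries named in the quote would return on a given target.

```cpp
// Hypothetical per-instruction costs. Real values would come from
// TargetTransformInfo and differ per target; nothing here calls LLVM.
struct ToyCosts {
  int BitCast;        // stand-in for the getCastInstrCost result
  int ScalarLoad;     // stand-in for getMemoryOpCost on float
  int VectorLoad;     // stand-in for getMemoryOpCost on <4 x float>
  int ExtractElement; // stand-in for the extract cost query
};

// Scalar form: bitcast <4 x float>* to float*, then load float.
int scalarFormCost(const ToyCosts &C) { return C.BitCast + C.ScalarLoad; }

// Vector form: load <4 x float>, then extractelement of lane 0.
int vectorFormCost(const ToyCosts &C) {
  return C.VectorLoad + C.ExtractElement;
}

// Positive: vector form modeled as more expensive.
// Zero: both forms modeled as equal cost.
// Negative: vector form modeled as cheaper.
int costDelta(const ToyCosts &C) {
  return vectorFormCost(C) - scalarFormCost(C);
}
```

With made-up costs of 0 for the bitcast and the extract and 1 for either load,
costDelta() comes out as 0, which is the "zero-cost" case; whether a real
target reports such numbers is a separate question.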

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64142/new/

https://reviews.llvm.org/D64142