[PATCH] D64142: [SLP] try to create vector loads from bitcasted scalar pointers

Thu Jul 25 06:36:30 PDT 2019

ABataev added inline comments.

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/load-bitcast-vec.ll:7
 ; CHECK-LABEL: @matching_scalar(
-; CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x float>* [[P:%.*]] to float*
-; CHECK-NEXT:    [[R:%.*]] = load float, float* [[BC]], align 16
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[P:%.*]], align 16
+; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <4 x float> [[TMP1]], i32 0
----------------
lebedev.ri wrote:
> spatel wrote:
> > lebedev.ri wrote:
> > > spatel wrote:
> > > > ABataev wrote:
> > > > > Seems to me it must be a masked load rather than just a load. Plus, what about the cost? This does not look cost-optimal.
> > > > If the load is guaranteed dereferenceable, does that not allow speculated load of the entire vector?
> > > > 
> > > > I'm open to suggestions about the cost calc. It's not clear to me if there's an existing TTI API for this or if we need to create a new one?
> > > I agree that there is no reason this should be a maskedload.
> > > Do we have opposite folds for this in dagcombine?
> > > 
> > Yes - see narrowExtractedVectorLoad() in DAGCombiner.
> Then as far as I'm concerned this is a zero-cost change.
I would expect the cost comparison to be:
getCastInstrCost + getMemoryOpCost for the scalar instructions.
getMemoryOpCost + getExtractWithExtendCost for the vector instructions. No?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64142/new/

https://reviews.llvm.org/D64142