[llvm-dev] Opportunity to split store of shuffled vector.
Sanjay Patel via llvm-dev
llvm-dev at lists.llvm.org
Tue Oct 8 12:35:19 PDT 2019
InstCombine is not intended to use a cost model, so I don't think that is
an option for store splitting.
This should probably be implemented as a pattern match starting from
DAGCombiner::visitSTORE() and using a TLI hook. The TLI function would
take parameters for the vector value type and the maximum number of
scalar stores the target is willing to create to replace the vector
load+store. So, for example, we could decide it is profitable to replace
a load+store of v4f32 with two f32 stores, but not profitable to replace
a load+store of v16i8 with fifteen i8 stores.
On Tue, Oct 8, 2019 at 1:58 PM Qiu Chaofan <fwage73 at gmail.com> wrote:
> > would it perhaps make sense to canonicalize this to a masked store?
>
> Thanks for the reminder. It should help. There should also be
> something missing/limited in `InstCombineVectorOps`.
>
> > Should this have been translated to GEP+scalar stores by clang rather
> than vector load+store?
>
> Thanks for your comment! Implementing this in Clang (at `CGExpr`) is a
> reasonable approach, but it may break some existing assumptions or
> assertions, and it won't benefit other languages.
>
> At the same time, I think if we want to do this in `InstCombine` or
> other backend transformations, it's important to build a cost model
> (like the one in the SLP vectorizer). Otherwise, splitting a series of
> stores only for them to be recombined later is pointless. Can I use the
> existing infrastructure in `TargetTransformInfo` to determine whether
> splitting them is better?
>
> Regards,
> Chaofan
>
> Sanjay Patel <spatel at rotateright.com> 于2019年10月4日周五 下午8:36写道:
> >
> > Canonicalizing to a masked store intrinsic is possible, but we might
> have to expand back to the load/shuffle/store sequence for targets that
> don't support masked store. And then you'd likely have to do the store
> splitting that I think is being requested for the original pattern anyway.
> >
> > But I'd like to step back to the premise - "LLVM seems to always
> generate vector instructions for vector operations in C, even if it's just
> simple stores."
> > That's not correct. It's not LLVM that created the vector memory
> accesses. That's how the IR begins from clang:
> >
> > define void @foo(<4 x float>* %a) #0 {
> > entry:
> >   %a.addr = alloca <4 x float>*, align 8
> >   store <4 x float>* %a, <4 x float>** %a.addr, align 8, !tbaa !3
> >   %0 = load <4 x float>*, <4 x float>** %a.addr, align 8, !tbaa !3
> >   %1 = load <4 x float>, <4 x float>* %0, align 16
> >   %vecins = insertelement <4 x float> %1, float 1.000000e+00, i32 0
> >   store <4 x float> %vecins, <4 x float>* %0, align 16
> >   %2 = load <4 x float>*, <4 x float>** %a.addr, align 8, !tbaa !3
> >   %3 = load <4 x float>, <4 x float>* %2, align 16
> >   %vecins1 = insertelement <4 x float> %3, float 2.000000e+00, i32 3
> >   store <4 x float> %vecins1, <4 x float>* %2, align 16
> > ret void
> > }
> >
> > Should this have been translated to GEP+scalar stores by clang rather
> than vector load+store?
> >
> >
> > On Thu, Oct 3, 2019 at 12:33 PM Nemanja Ivanovic via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >>
> >> Just out of curiosity,
> >> would it perhaps make sense to canonicalize this to a masked store?
> >>
> >> Nemanja
> >>
> >> On Thu, Sep 26, 2019 at 10:59 PM Qiu Chaofan via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >>>
> >>> > I may be missing something obvious, but what is `vector` defined as
> here? Can you provide a buildable example?
> >>>
> >>> Sorry, I should have provided a cross-platform version using the
> >>> frontend's vector extension :) `vector int` is a PowerPC vector
> >>> extension, enabled when targeting PowerPC platforms. The example
> >>> below should compile on any platform:
> >>>
> >>> typedef float v4sf __attribute__ ((vector_size(16)));
> >>>
> >>> void foo(v4sf *a) {
> >>>   (*a)[0] = 1;
> >>>   (*a)[3] = 2;
> >>> }
> >>>
> >>> And we can get the IR mentioned before:
> >>>
> >>> %0 = load <4 x float>, <4 x float>* %a, align 16
> >>> %vecins1 = shufflevector <4 x float> <float 1.000000e+00, float
> >>> undef, float undef, float 2.000000e+00>, <4 x float> %0, <4 x i32>
> >>> <i32 0, i32 5, i32 6, i32 3>
> >>> store <4 x float> %vecins1, <4 x float>* %a, align 16
> >>>
> >>> Regards,
> >>> Qiu Chaofan
> >>>
> >>>
> >>> Florian Hahn <florian_hahn at apple.com> 于2019年9月26日周四 下午7:15写道:
> >>> >
> >>> > Hi
> >>> >
> >>> > > On Sep 26, 2019, at 10:53, Qiu Chaofan via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >>> > >
> >>> > > Hi there,
> >>> > >
> >>> > > I notice that LLVM seems to always generate vector instructions
> >>> > > for vector operations in C, even if it's just simple stores:
> >>> > >
> >>> > > void foo(vector int* c) {
> >>> > >   (*c)[0] = 1;
> >>> > >   (*c)[1] = 2;
> >>> > > }
> >>> > >
> >>> >
> >>> > I may be missing something obvious, but what is `vector` defined as
> here? Can you provide a buildable example?
> >>> >
> >>> > > %0 = load <4 x i32>, <4 x i32>* %c, align 16
> >>> > > %vecins1 = shufflevector <4 x i32> <i32 1, i32 2, i32 undef, i32
> >>> > > undef>, <4 x i32> %0, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
> >>> > > store <4 x i32> %vecins1, <4 x i32>* %c, align 16
> >>> > >
> >>> >
> >>> > For some reason, we load 4 elements from %c and write the last 2
> elements back unchanged. This causes sub-optimal codegen here. We could do
> a better job of dropping the writes of unchanged elements. But from the
> original code, it is not immediately obvious to me why we generate them in
> the first place. Maybe we could avoid generating them?
> >>> >
> >>> > Cheers,
> >>> > Florian
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org
> >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
>