[llvm] 9fcd212 - [X86] Remove isel patterns from broadcast of loadi32.

Craig Topper via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 2 12:36:36 PST 2020


This was specifically looking for an extload, not a zextload/sextload. That
is, the SelectionDAG said: do an 8-bit or 16-bit load, extend it however you
like to i32, and broadcast those 32 bits. The patterns I removed recognized
that the load was aligned and that the upper bits of the i32 elements were
allowed to be garbage, so they just did a 32-bit load and broadcast it.

In your example, the upper bits of the i64 elements are expected to be 0
right?
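For context, here's a minimal C sketch (not from the thread; the function and
variable names are made up for illustration) of a narrow load that gets
extended and splatted. Note this is the *zero-extending* case, where the upper
lane bits must be 0, as opposed to the any-extending (extload) case the
removed patterns matched:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: an 8-bit load zero-extended to 32 bits and
 * splatted across every output element. The removed isel patterns instead
 * matched an *any-extending* load (extload), where the bits above bit 7 of
 * each i32 lane may be garbage; only in that case is it legal for isel to
 * substitute a plain aligned 32-bit broadcast load. */
static void splat_u8_to_u32(uint32_t *dst, const uint8_t *src, int n) {
    uint32_t v = *src;  /* 8-bit load, zero-extended to 32 bits */
    for (int i = 0; i < n; i++)
        dst[i] = v;     /* the vectorizer turns this into a broadcast */
}
```

With zero-extension like this the compiler must produce exact upper bits, so
the 32-bit-load shortcut would be wrong here.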

~Craig


On Mon, Mar 2, 2020 at 12:27 PM Philip Reames <listmail at philipreames.com>
wrote:

> Craig,
>
> I might not be understanding you correctly, but on the surface, this
> seems like a fairly common case.  Wouldn't something like the following
> trigger this?
>
> struct T {
>    uint64_t j;
>    uint8_t k;
> };
>
> void foo(uint64_t *a, struct T &t) {
>    for (int i = 0; i < N; i++) {
>       a[i] += (uint64_t)t.k;
>    }
> }
>
> Given an 8 byte alignment of objects and a packed layout, the 8-bit
> field would have an 8-byte starting alignment.  After vectorization, I'd
> expect to see a load of the field outside the loop, followed by an extend
> and broadcast to VF x i64.  Wouldn't that create exactly the pattern you
> removed?
>
> Philip
>
> On 2/28/20 4:39 PM, Craig Topper via llvm-commits wrote:
> > Author: Craig Topper
> > Date: 2020-02-28T16:39:27-08:00
> > New Revision: 9fcd212e2f678fdbdf304399a1e58ca490dc54d1
> >
> > URL: https://github.com/llvm/llvm-project/commit/9fcd212e2f678fdbdf304399a1e58ca490dc54d1
> > DIFF: https://github.com/llvm/llvm-project/commit/9fcd212e2f678fdbdf304399a1e58ca490dc54d1.diff
> >
> > LOG: [X86] Remove isel patterns from broadcast of loadi32.
> >
> > We already combine non-extending loads with broadcasts in DAG
> > combine. All these patterns pick up is the aligned extload
> > special case. But the only lit test we have that exercises it is
> > using a v8i1 load that the datalayout reports as align 8. That
> > seems generous. So without a realistic test case I don't think
> > there is much value in these patterns.
> >
> > Added:
> >
> >
> > Modified:
> >      llvm/lib/Target/X86/X86InstrAVX512.td
> >      llvm/lib/Target/X86/X86InstrSSE.td
> >      llvm/test/CodeGen/X86/vector-sext.ll
> >
> > Removed:
> >
> >
> >
> > ################################################################################
> > diff --git a/llvm/lib/Target/X86/X86InstrAVX512.td b/llvm/lib/Target/X86/X86InstrAVX512.td
> > index a2bd6a2853a0..1d3ef67c9d3d 100644
> > --- a/llvm/lib/Target/X86/X86InstrAVX512.td
> > +++ b/llvm/lib/Target/X86/X86InstrAVX512.td
> > @@ -1427,10 +1427,6 @@ let Predicates = [HasAVX512] in {
> >     // 32-bit targets will fail to load a i64 directly but can use ZEXT_LOAD.
> >     def : Pat<(v8i64 (X86VBroadcast (v2i64 (X86vzload64 addr:$src)))),
> >               (VPBROADCASTQZrm addr:$src)>;
> > -
> > -  // FIXME this is to handle aligned extloads from i8.
> > -  def : Pat<(v16i32 (X86VBroadcast (loadi32 addr:$src))),
> > -            (VPBROADCASTDZrm addr:$src)>;
> >   }
> >
> >   let Predicates = [HasVLX] in {
> > @@ -1439,12 +1435,6 @@ let Predicates = [HasVLX] in {
> >               (VPBROADCASTQZ128rm addr:$src)>;
> >     def : Pat<(v4i64 (X86VBroadcast (v2i64 (X86vzload64 addr:$src)))),
> >               (VPBROADCASTQZ256rm addr:$src)>;
> > -
> > -  // FIXME this is to handle aligned extloads from i8.
> > -  def : Pat<(v4i32 (X86VBroadcast (loadi32 addr:$src))),
> > -            (VPBROADCASTDZ128rm addr:$src)>;
> > -  def : Pat<(v8i32 (X86VBroadcast (loadi32 addr:$src))),
> > -            (VPBROADCASTDZ256rm addr:$src)>;
> >   }
> >   let Predicates = [HasVLX, HasBWI] in {
> >     // loadi16 is tricky to fold, because !isTypeDesirableForOp, justifiably.
> >
> > diff --git a/llvm/lib/Target/X86/X86InstrSSE.td b/llvm/lib/Target/X86/X86InstrSSE.td
> > index e66f15747787..73bba723ab96 100644
> > --- a/llvm/lib/Target/X86/X86InstrSSE.td
> > +++ b/llvm/lib/Target/X86/X86InstrSSE.td
> > @@ -7529,12 +7529,6 @@ let Predicates = [HasAVX2, NoVLX] in {
> >               (VPBROADCASTQrm addr:$src)>;
> >     def : Pat<(v4i64 (X86VBroadcast (v2i64 (X86vzload64 addr:$src)))),
> >               (VPBROADCASTQYrm addr:$src)>;
> > -
> > -  // FIXME this is to handle aligned extloads from i8/i16.
> > -  def : Pat<(v4i32 (X86VBroadcast (loadi32 addr:$src))),
> > -            (VPBROADCASTDrm addr:$src)>;
> > -  def : Pat<(v8i32 (X86VBroadcast (loadi32 addr:$src))),
> > -            (VPBROADCASTDYrm addr:$src)>;
> >   }
> >   let Predicates = [HasAVX2, NoVLX_Or_NoBWI] in {
> >     // loadi16 is tricky to fold, because !isTypeDesirableForOp, justifiably.
> >
> > diff --git a/llvm/test/CodeGen/X86/vector-sext.ll b/llvm/test/CodeGen/X86/vector-sext.ll
> > index 44ba29d978e2..0b35db5cadb2 100644
> > --- a/llvm/test/CodeGen/X86/vector-sext.ll
> > +++ b/llvm/test/CodeGen/X86/vector-sext.ll
> > @@ -2259,7 +2259,8 @@ define <8 x i32> @load_sext_8i1_to_8i32(<8 x i1> *%ptr) {
> >   ;
> >   ; AVX2-LABEL: load_sext_8i1_to_8i32:
> >   ; AVX2:       # %bb.0: # %entry
> > -; AVX2-NEXT:    vpbroadcastd (%rdi), %ymm0
> > +; AVX2-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
> > +; AVX2-NEXT:    vpbroadcastd %xmm0, %ymm0
> >   ; AVX2-NEXT:    vmovdqa {{.*#+}} ymm1 = [1,2,4,8,16,32,64,128]
> >   ; AVX2-NEXT:    vpand %ymm1, %ymm0, %ymm0
> >   ; AVX2-NEXT:    vpcmpeqd %ymm1, %ymm0, %ymm0
> >
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>