[llvm] 9fcd212 - [X86] Remove isel patterns from broadcast of loadi32.

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 2 13:27:31 PST 2020


Ah, gotcha.  I'd missed the fact that "extload" here specifically means 
"aextload" (i.e. any extend).  I agree that an any-extend variant of 
this pattern seems a lot less likely.  The only case I can see that 
happening is if the vector op had been widened beyond the interesting 
data type, and we were going to end up ignoring the high bits in the 
end anyway.

Philip

On 3/2/20 12:36 PM, Craig Topper wrote:
> This was specifically looking for extload, not zextload/sextload. So 
> the SelectionDAG said: do a 16-bit or 8-bit load, extend it however 
> you like to i32, and broadcast those 32 bits. The patterns I removed 
> recognized that the load was aligned and that the upper bits of the 
> i32 elements were allowed to be garbage, so they just loaded 32 bits 
> and broadcast that.
>
> In your example, the upper bits of the i64 elements are expected to 
> be 0, right?
>
> ~Craig
>
>
> On Mon, Mar 2, 2020 at 12:27 PM Philip Reames 
> <listmail at philipreames.com> wrote:
>
>     Craig,
>
>     I might not be understanding you correctly, but on the surface,
>     this seems like a fairly common case.  Wouldn't something like
>     the following trigger this?
>
>     struct T {
>        uint64_t j;
>        uint8_t k;
>     };
>
>     void foo(uint64_t *a, struct T &t, int N) {
>        for (int i = 0; i < N; i++) {
>           a[i] += (uint64_t)t.k;
>        }
>     }
>
>     Given an 8 byte alignment of objects and a packed layout, the
>     8-bit field would have an 8-byte starting alignment.  After
>     vectorization, I'd expect to see a load of the field outside the
>     loop, followed by an extend and broadcast to VF x i64.  Wouldn't
>     that create exactly the pattern you removed?
>
>     Philip
>
>     On 2/28/20 4:39 PM, Craig Topper via llvm-commits wrote:
>     > Author: Craig Topper
>     > Date: 2020-02-28T16:39:27-08:00
>     > New Revision: 9fcd212e2f678fdbdf304399a1e58ca490dc54d1
>     >
>     > URL: https://github.com/llvm/llvm-project/commit/9fcd212e2f678fdbdf304399a1e58ca490dc54d1
>     > DIFF: https://github.com/llvm/llvm-project/commit/9fcd212e2f678fdbdf304399a1e58ca490dc54d1.diff
>     >
>     > LOG: [X86] Remove isel patterns from broadcast of loadi32.
>     >
>     > We already combine non-extending loads with broadcasts in DAG
>     > combine. The only thing these patterns pick up is the aligned
>     > extload special case, and the only lit test we have that
>     > exercises it uses a v8i1 load that the datalayout reports as
>     > align 8. That seems generous. So without a realistic test case
>     > I don't think there is much value in these patterns.
>     >
>     > Added:
>     >
>     >
>     > Modified:
>     >      llvm/lib/Target/X86/X86InstrAVX512.td
>     >      llvm/lib/Target/X86/X86InstrSSE.td
>     >      llvm/test/CodeGen/X86/vector-sext.ll
>     >
>     > Removed:
>     >
>     >
>     >
>     >
>     ################################################################################
>     > diff --git a/llvm/lib/Target/X86/X86InstrAVX512.td b/llvm/lib/Target/X86/X86InstrAVX512.td
>     > index a2bd6a2853a0..1d3ef67c9d3d 100644
>     > --- a/llvm/lib/Target/X86/X86InstrAVX512.td
>     > +++ b/llvm/lib/Target/X86/X86InstrAVX512.td
>     > @@ -1427,10 +1427,6 @@ let Predicates = [HasAVX512] in {
>     >     // 32-bit targets will fail to load a i64 directly but can use ZEXT_LOAD.
>     >     def : Pat<(v8i64 (X86VBroadcast (v2i64 (X86vzload64 addr:$src)))),
>     >               (VPBROADCASTQZrm addr:$src)>;
>     > -
>     > -  // FIXME this is to handle aligned extloads from i8.
>     > -  def : Pat<(v16i32 (X86VBroadcast (loadi32 addr:$src))),
>     > -            (VPBROADCASTDZrm addr:$src)>;
>     >   }
>     >
>     >   let Predicates = [HasVLX] in {
>     > @@ -1439,12 +1435,6 @@ let Predicates = [HasVLX] in {
>     >               (VPBROADCASTQZ128rm addr:$src)>;
>     >     def : Pat<(v4i64 (X86VBroadcast (v2i64 (X86vzload64 addr:$src)))),
>     >               (VPBROADCASTQZ256rm addr:$src)>;
>     > -
>     > -  // FIXME this is to handle aligned extloads from i8.
>     > -  def : Pat<(v4i32 (X86VBroadcast (loadi32 addr:$src))),
>     > -            (VPBROADCASTDZ128rm addr:$src)>;
>     > -  def : Pat<(v8i32 (X86VBroadcast (loadi32 addr:$src))),
>     > -            (VPBROADCASTDZ256rm addr:$src)>;
>     >   }
>     >   let Predicates = [HasVLX, HasBWI] in {
>     >     // loadi16 is tricky to fold, because !isTypeDesirableForOp, justifiably.
>     >
>     > diff --git a/llvm/lib/Target/X86/X86InstrSSE.td b/llvm/lib/Target/X86/X86InstrSSE.td
>     > index e66f15747787..73bba723ab96 100644
>     > --- a/llvm/lib/Target/X86/X86InstrSSE.td
>     > +++ b/llvm/lib/Target/X86/X86InstrSSE.td
>     > @@ -7529,12 +7529,6 @@ let Predicates = [HasAVX2, NoVLX] in {
>     >               (VPBROADCASTQrm addr:$src)>;
>     >     def : Pat<(v4i64 (X86VBroadcast (v2i64 (X86vzload64 addr:$src)))),
>     >               (VPBROADCASTQYrm addr:$src)>;
>     > -
>     > -  // FIXME this is to handle aligned extloads from i8/i16.
>     > -  def : Pat<(v4i32 (X86VBroadcast (loadi32 addr:$src))),
>     > -            (VPBROADCASTDrm addr:$src)>;
>     > -  def : Pat<(v8i32 (X86VBroadcast (loadi32 addr:$src))),
>     > -            (VPBROADCASTDYrm addr:$src)>;
>     >   }
>     >   let Predicates = [HasAVX2, NoVLX_Or_NoBWI] in {
>     >     // loadi16 is tricky to fold, because !isTypeDesirableForOp, justifiably.
>     >
>     > diff --git a/llvm/test/CodeGen/X86/vector-sext.ll b/llvm/test/CodeGen/X86/vector-sext.ll
>     > index 44ba29d978e2..0b35db5cadb2 100644
>     > --- a/llvm/test/CodeGen/X86/vector-sext.ll
>     > +++ b/llvm/test/CodeGen/X86/vector-sext.ll
>     > @@ -2259,7 +2259,8 @@ define <8 x i32> @load_sext_8i1_to_8i32(<8 x i1> *%ptr) {
>     >   ;
>     >   ; AVX2-LABEL: load_sext_8i1_to_8i32:
>     >   ; AVX2:       # %bb.0: # %entry
>     > -; AVX2-NEXT:    vpbroadcastd (%rdi), %ymm0
>     > +; AVX2-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
>     > +; AVX2-NEXT:    vpbroadcastd %xmm0, %ymm0
>     >   ; AVX2-NEXT:    vmovdqa {{.*#+}} ymm1 = [1,2,4,8,16,32,64,128]
>     >   ; AVX2-NEXT:    vpand %ymm1, %ymm0, %ymm0
>     >   ; AVX2-NEXT:    vpcmpeqd %ymm1, %ymm0, %ymm0
>     >
>     >
>     >
>     > _______________________________________________
>     > llvm-commits mailing list
>     > llvm-commits at lists.llvm.org
>     > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>