<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Ah, gotcha.  I'd missed that "extload" here specifically means
      "aextload" (i.e. any-extend).  I agree that an any-extend variant
      of this pattern seems a lot less likely.  The only case where I
      can see it happening is if the vector op had been widened beyond
      the interesting data type and we were going to end up ignoring
      the high bits anyway.  <br>
    </p>
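    <p>A minimal sketch of that "widened, then high bits ignored" case,
      in plain C++ rather than SelectionDAG terms. The helper names are
      hypothetical and the constant 0xDEADBE only stands in for
      whatever unspecified bits an any-extend might produce.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): an i8 add that was widened to
// 32-bit lanes, with only the low 8 bits of the result kept at the end.
inline uint8_t widened_add_low8(uint32_t a, uint32_t b) {
    return static_cast<uint8_t>(a + b);  // truncation drops the high 24 bits
}

// The same i8 value extended two ways: zero-extended, and "any-extended"
// with arbitrary junk (0xDEADBE) standing in for the unspecified high bits.
inline void demo_anyext_is_harmless() {
    const uint8_t x = 200, y = 100;
    const uint32_t zext = x;
    const uint32_t anyext = 0xDEADBE00u | x;
    // Both extensions give the same low byte once the result is truncated.
    assert(widened_add_low8(zext, y) == widened_add_low8(anyext, y));
}
```

    <p>Calling <code>demo_anyext_is_harmless()</code> from a
      <code>main</code> runs the check. The high bits never reach the
      truncated result, which is what would make an any-extend legal
      in that situation.</p>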
    <p>Philip<br>
    </p>
    <div class="moz-cite-prefix">On 3/2/20 12:36 PM, Craig Topper wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAF7ks-Ocoh4xfOjezbcP3GDKiNOa_RwVostvSLUhCLo7p_Fjsg@mail.gmail.com">
      <div dir="ltr">
        <div>This was specifically looking for extload, not
          zextload/sextload. So the SelectionDAG said: do a 16-bit or
          8-bit load, extend it however you like to i32, and broadcast
          those 32 bits. The patterns I removed recognized that the load
          was aligned and that the upper bits of the i32 elements were
          allowed to be garbage, so they just loaded 32 bits and
          broadcast them.</div>
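        <div>The folding described above can be modelled roughly as
          follows. This is a sketch in plain C++, not the actual isel
          code: the function names are hypothetical, the load is
          modelled as little-endian (as on x86), and the caller is
          assumed to guarantee that all four bytes at the address are
          readable, which is what the alignment check was for.</div>

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Models a 32-bit little-endian load (as on x86) from a byte buffer.
// Assumption: all four bytes at p are readable.
inline uint32_t load32_le(const uint8_t *p) {
    return static_cast<uint32_t>(p[0]) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) |
           (static_cast<uint32_t>(p[3]) << 24);
}

// Models broadcasting one i32 into every lane of a v8i32.
inline std::array<uint32_t, 8> broadcast_v8i32(uint32_t v) {
    std::array<uint32_t, 8> lanes;
    lanes.fill(v);
    return lanes;
}

// Hypothetical model of "any-extend load i8 -> i32, then broadcast":
// only the low 8 bits of each lane are defined, so a full 32-bit load
// from the same address is one legal way to produce them.
inline std::array<uint32_t, 8> broadcast_anyext_i8(const uint8_t *p) {
    return broadcast_v8i32(load32_le(p));
}
```

        <div>Every lane ends up with the original byte in its low 8
          bits, and the neighbouring bytes from memory in the upper 24
          bits, which an any-extend permits.</div>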
        <div><br>
        </div>
        <div>In your example, the upper bits of the i64 elements are
          expected to be 0, right?</div>
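        <div>To illustrate why that distinction matters, here is a
          small sketch (hypothetical helpers, little-endian model)
          contrasting a zero-extending load of a byte with a wide load
          from the same address. The 0xFF filler bytes are an
          assumption standing in for whatever follows the field in
          memory.</div>

```cpp
#include <cassert>
#include <cstdint>

// Models a zero-extending load i8 -> i64: the upper 56 bits are defined as 0.
inline uint64_t zext_load_i8(const uint8_t *p) {
    return *p;
}

// Models a plain 64-bit little-endian load from the same address.
// Assumption: all eight bytes at p are readable.
inline uint64_t wide_load64_le(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i) {
        v = (v << 8) | p[i];  // p[0] ends up in the lowest byte
    }
    return v;
}
```

        <div>With a buffer holding the byte 5 followed by seven 0xFF
          bytes, the two loads agree only in the low 8 bits, so
          substituting the wide load would change the value added to
          each element. A zero-extending load therefore cannot be
          widened the way an any-extending one can.</div>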
        <div><br clear="all">
          <div>
            <div dir="ltr" class="gmail_signature"
              data-smartmail="gmail_signature">~Craig</div>
          </div>
          <br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Mon, Mar 2, 2020 at 12:27
          PM Philip Reames <<a
            href="mailto:listmail@philipreames.com"
            moz-do-not-send="true">listmail@philipreames.com</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Craig,<br>
          <br>
          I might not be understanding you correctly, but on the
          surface, this <br>
          seems like a fairly common case.  Wouldn't something like the
          following <br>
          trigger this?<br>
          <br>
          struct T {<br>
             uint64_t j;<br>
             uint8_t k;<br>
          };<br>
          <br>
          void foo(uint64_t *a, struct T& t) {<br>
             for (int i = 0; i < N; i++) {<br>
                a[i] += (uint64_t)t.k;<br>
             }<br>
          }<br>
          <br>
          Given 8-byte alignment of objects and a packed layout, the
          8-bit <br>
          field would have an 8-byte starting alignment.  After
          vectorization, I'd <br>
          expect to see a load of the field outside the loop followed by
          an extend <br>
          and broadcast to VF x i64.  Wouldn't that create exactly the
          pattern you <br>
          removed?<br>
          <br>
          Philip<br>
          <br>
          On 2/28/20 4:39 PM, Craig Topper via llvm-commits wrote:<br>
          > Author: Craig Topper<br>
          > Date: 2020-02-28T16:39:27-08:00<br>
          > New Revision: 9fcd212e2f678fdbdf304399a1e58ca490dc54d1<br>
          ><br>
          > URL: <a
href="https://github.com/llvm/llvm-project/commit/9fcd212e2f678fdbdf304399a1e58ca490dc54d1"
            rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/llvm/llvm-project/commit/9fcd212e2f678fdbdf304399a1e58ca490dc54d1</a><br>
          > DIFF: <a
href="https://github.com/llvm/llvm-project/commit/9fcd212e2f678fdbdf304399a1e58ca490dc54d1.diff"
            rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/llvm/llvm-project/commit/9fcd212e2f678fdbdf304399a1e58ca490dc54d1.diff</a><br>
          ><br>
          > LOG: [X86] Remove isel patterns from broadcast of
          loadi32.<br>
          ><br>
          > We already combine non extending loads with broadcasts in
          DAG<br>
          > combine. All these patterns are picking up is the aligned
          extload<br>
          > special case. But the only lit test we have that
          exercises it is<br>
          > using v8i1 load that datalayout is reporting align 8 for.
          That<br>
          > seems generous. So without a realistic test case I don't
          think<br>
          > there is much value in these patterns.<br>
          ><br>
          > Added:<br>
          >      <br>
          ><br>
          > Modified:<br>
          >      llvm/lib/Target/X86/X86InstrAVX512.td<br>
          >      llvm/lib/Target/X86/X86InstrSSE.td<br>
          >      llvm/test/CodeGen/X86/vector-sext.ll<br>
          ><br>
          > Removed:<br>
          >      <br>
          ><br>
          ><br>
          >
################################################################################<br>
          > diff  --git a/llvm/lib/Target/X86/X86InstrAVX512.td
          b/llvm/lib/Target/X86/X86InstrAVX512.td<br>
          > index a2bd6a2853a0..1d3ef67c9d3d 100644<br>
          > --- a/llvm/lib/Target/X86/X86InstrAVX512.td<br>
          > +++ b/llvm/lib/Target/X86/X86InstrAVX512.td<br>
          > @@ -1427,10 +1427,6 @@ let Predicates = [HasAVX512] in {<br>
          >     // 32-bit targets will fail to load a i64 directly
          but can use ZEXT_LOAD.<br>
          >     def : Pat<(v8i64 (X86VBroadcast (v2i64
          (X86vzload64 addr:$src)))),<br>
          >               (VPBROADCASTQZrm addr:$src)>;<br>
          > -<br>
          > -  // FIXME this is to handle aligned extloads from i8.<br>
          > -  def : Pat<(v16i32 (X86VBroadcast (loadi32
          addr:$src))),<br>
          > -            (VPBROADCASTDZrm addr:$src)>;<br>
          >   }<br>
          >   <br>
          >   let Predicates = [HasVLX] in {<br>
          > @@ -1439,12 +1435,6 @@ let Predicates = [HasVLX] in {<br>
          >               (VPBROADCASTQZ128rm addr:$src)>;<br>
          >     def : Pat<(v4i64 (X86VBroadcast (v2i64
          (X86vzload64 addr:$src)))),<br>
          >               (VPBROADCASTQZ256rm addr:$src)>;<br>
          > -<br>
          > -  // FIXME this is to handle aligned extloads from i8.<br>
          > -  def : Pat<(v4i32 (X86VBroadcast (loadi32
          addr:$src))),<br>
          > -            (VPBROADCASTDZ128rm addr:$src)>;<br>
          > -  def : Pat<(v8i32 (X86VBroadcast (loadi32
          addr:$src))),<br>
          > -            (VPBROADCASTDZ256rm addr:$src)>;<br>
          >   }<br>
          >   let Predicates = [HasVLX, HasBWI] in {<br>
          >     // loadi16 is tricky to fold, because
          !isTypeDesirableForOp, justifiably.<br>
          ><br>
          > diff  --git a/llvm/lib/Target/X86/X86InstrSSE.td
          b/llvm/lib/Target/X86/X86InstrSSE.td<br>
          > index e66f15747787..73bba723ab96 100644<br>
          > --- a/llvm/lib/Target/X86/X86InstrSSE.td<br>
          > +++ b/llvm/lib/Target/X86/X86InstrSSE.td<br>
          > @@ -7529,12 +7529,6 @@ let Predicates = [HasAVX2, NoVLX]
          in {<br>
          >               (VPBROADCASTQrm addr:$src)>;<br>
          >     def : Pat<(v4i64 (X86VBroadcast (v2i64
          (X86vzload64 addr:$src)))),<br>
          >               (VPBROADCASTQYrm addr:$src)>;<br>
          > -<br>
          > -  // FIXME this is to handle aligned extloads from
          i8/i16.<br>
          > -  def : Pat<(v4i32 (X86VBroadcast (loadi32
          addr:$src))),<br>
          > -            (VPBROADCASTDrm addr:$src)>;<br>
          > -  def : Pat<(v8i32 (X86VBroadcast (loadi32
          addr:$src))),<br>
          > -            (VPBROADCASTDYrm addr:$src)>;<br>
          >   }<br>
          >   let Predicates = [HasAVX2, NoVLX_Or_NoBWI] in {<br>
          >     // loadi16 is tricky to fold, because
          !isTypeDesirableForOp, justifiably.<br>
          ><br>
          > diff  --git a/llvm/test/CodeGen/X86/vector-sext.ll
          b/llvm/test/CodeGen/X86/vector-sext.ll<br>
          > index 44ba29d978e2..0b35db5cadb2 100644<br>
          > --- a/llvm/test/CodeGen/X86/vector-sext.ll<br>
          > +++ b/llvm/test/CodeGen/X86/vector-sext.ll<br>
          > @@ -2259,7 +2259,8 @@ define <8 x i32>
          @load_sext_8i1_to_8i32(<8 x i1> *%ptr) {<br>
          >   ;<br>
          >   ; AVX2-LABEL: load_sext_8i1_to_8i32:<br>
          >   ; AVX2:       # %bb.0: # %entry<br>
          > -; AVX2-NEXT:    vpbroadcastd (%rdi), %ymm0<br>
          > +; AVX2-NEXT:    vmovd {{.*#+}} xmm0 =
          mem[0],zero,zero,zero<br>
          > +; AVX2-NEXT:    vpbroadcastd %xmm0, %ymm0<br>
          >   ; AVX2-NEXT:    vmovdqa {{.*#+}} ymm1 =
          [1,2,4,8,16,32,64,128]<br>
          >   ; AVX2-NEXT:    vpand %ymm1, %ymm0, %ymm0<br>
          >   ; AVX2-NEXT:    vpcmpeqd %ymm1, %ymm0, %ymm0<br>
          ><br>
          ><br>
          >          <br>
        </blockquote>
      </div>
    </blockquote>
  </body>
</html>