<div dir="ltr"><div>I haven't looked into actually implementing revectorization, so we may just want to ignore that possibility for now. <br><br>But I imagined that revectorization could hit the same problem that we're trying to avoid here: if the cost models say that wider vectors are legal and cheaper, but the reality is that perf will suffer when using those wider vectors, then we want to avoid using the wider ops. The user pref/override will be taken into account when deciding if we should go wider.<br><br></div>In either scenario, we're not actually removing or limiting vector widths, right? They're still legal as far as the ISA is concerned. We're just avoiding those ops because the programmer and/or the CPU model says we'll do better with narrower ops.<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 14, 2017 at 10:26 AM, Craig Topper via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The re-vectorization case mentioned by Sanjay seems like a different type of limit than what's being proposed here. For revectorization you want to remove smaller vector widths. This is removing larger vector widths. I don't think we want the -mprefer-vector-width=256 being proposed here to say we can't do 128-bit vectors along with 256-bit ones. Maybe this should be called -mlimit-vector-width?<div><br></div><div>It's not clear to me why revectorization would need a preference at all. Shouldn't we be able to decide from the cost models? We go from scalar to vector today based on cost models. Why couldn't we go from vector to wider vector?</div></div><div class="gmail_extra"><br clear="all"><div><div class="m_-5672491955778672750gmail_signature" data-smartmail="gmail_signature">~Craig</div></div><div><div class="h5">
<br><div class="gmail_quote">On Mon, Nov 13, 2017 at 3:54 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><span>
    <p><br>
    </p>
    <br>
    <div class="m_-5672491955778672750m_-938083871188661067moz-cite-prefix">On 11/13/2017 05:49 PM, Eric
      Christopher wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr"><br>
        <br>
        <div class="gmail_quote">
          <div dir="ltr">On Mon, Nov 13, 2017 at 2:15 PM Craig Topper
            via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">
              <div class="gmail_extra">
                <div class="gmail_quote">On Sat, Nov 11, 2017 at 8:52
                  PM, Hal Finkel via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>
                  wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div bgcolor="#FFFFFF"><span class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-">
                        <p><br>
                        </p>
                        <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527moz-cite-prefix">On
                          11/11/2017 09:52 PM, UE US via llvm-dev wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div>If skylake is that bad at AVX2</div>
                          </div>
                        </blockquote>
                        <br>
                      </span> I don't think this says anything negative
                      about AVX2, but AVX-512.</div>
                  </blockquote>
                </div>
              </div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>Right. I think where we're at is that AVX/AVX2 is "bad" on
            Haswell/Broadwell and AVX512 is "bad" on Skylake, at least
            in the "random autovectorization spread out" aspect.</div>
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div bgcolor="#FFFFFF"><span class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-"><br>
                        <br>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div> it belongs in -mcpu / -march IMO. </div>
                          </div>
                        </blockquote>
                        <br>
                      </span> No. We'd still want to enable the
                      architectural features for vector intrinsics and
                      the like.</div>
                  </blockquote>
                  <div><br>
                  </div>
                </div>
              </div>
            </div>
            <div dir="ltr">
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div>I took this to mean that the feature should be
                    enabled by default for -march=skylake-avx512.</div>
                </div>
              </div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div><br>
          </div>
          <div>Agreed.</div>
        </div>
      </div>
    </blockquote>
    <br></span>
    Yes. Also, GNOMETOYS clarified to me (off list) that this is what he
    meant.<span class="m_-5672491955778672750HOEnZb"><font color="#888888"><br>
    <br>
     -Hal</font></span><div><div class="m_-5672491955778672750h5"><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>-eric</div>
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div><br>
                  </div>
                  <div> </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div bgcolor="#FFFFFF"><span class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-"><br>
                        <br>
                        <blockquote type="cite">Based on the current
                          performance data we're seeing, we think we
                          need to ultimately default skylake-avx512 to
                          -mprefer-vector-width=256.</blockquote>
                        <br>
                      </span> Craig, is this for both integer and
                      floating-point code?</div>
                  </blockquote>
                  <div><br>
                  </div>
                </div>
              </div>
            </div>
            <div dir="ltr">
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div>I believe so, but I'll try to get confirmation
                    from the people with more data.</div>
                </div>
              </div>
            </div>
            <div dir="ltr">
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div> </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div bgcolor="#FFFFFF"><span class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-HOEnZb"><font color="#888888"><br>
                          <br>
                           -Hal <br>
                        </font></span>
                      <div>
                        <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-h5"> <br>
                          <blockquote type="cite">
                            <div dir="ltr">
                              <div>   Most people will build for the
                                standard x86_64-pc-linux or whatever
                                anyway,  and completely ignore the
                                change. This will mainly affect those
                                who build their own software and
                                optimize for their system, and lots
                                there have probably caught on to this
                                already.  I always thought that's what
                                -march was made for, really. <br>
                              </div>
                            </div>
                            <div class="gmail_extra"><br clear="all">
                              <div>
                                <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527gmail_signature">GNOMETOYS<br>
                                </div>
                              </div>
                              <br>
                              <div class="gmail_quote">On Sat, Nov 11,
                                2017 at 10:25 AM, Sanjay Patel via
                                llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>
                                wrote:<br>
                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                  <div dir="ltr">
                                    <div>
                                      <div>Yes - I was thinking of
                                        FeatureFastScalarFSQRT /
                                        FeatureFastVectorFSQRT which are
                                        used by isFsqrtCheap(). These
                                        were added to override the
                                        default x86 sqrt estimate
                                        codegen with:<br>
                                        <a href="https://reviews.llvm.org/D21379" target="_blank">https://reviews.llvm.org/D2137<wbr>9</a><br>
                                        <br>
                                      </div>
                                      But I'm not sure we really need
                                      that kind of hack. Can we adjust
                                      the attribute in clang based on
                                      the target CPU? I.e., if you have
                                      something like:<br>
                                    </div>
                                    $ clang -O2 -march=skylake-avx512
                                    foo.c<br>
                                    <br>
                                    Then you can detect that in the
                                    clang driver and pass
                                    -mprefer-vector-width=256 to clang
                                    codegen as an option? Clang codegen
                                    then adds that function attribute to
                                    everything it outputs. Then, the
                                    vectorizers and/or backend detect
                                    that attribute and adjust their
                                    behavior based on it. <br>
                                  </div>
                                </blockquote>
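                                <div>A minimal sketch of the driver-side defaulting described above, written in Python for brevity and not as real clang code. The table CPU_DEFAULT_PREFER_WIDTH and the function resolve_prefer_vector_width are hypothetical names; -mprefer-vector-width is the option proposed in this thread. The sketch assumes that an explicit user flag always wins over the CPU-derived default, wherever it appears relative to -march, and that the last explicit flag wins.</div>
<pre>
```python
# Hypothetical per-CPU defaults. Only the skylake-avx512 entry reflects what
# the thread proposes; the table itself is an assumption for illustration.
CPU_DEFAULT_PREFER_WIDTH = {"skylake-avx512": 256}


def resolve_prefer_vector_width(args):
    """Return the preferred vector width in bits, or None for no preference."""
    march = None
    explicit = None
    seen_explicit = False
    for arg in args:
        if arg.startswith("-march="):
            march = arg[len("-march="):]
        elif arg.startswith("-mprefer-vector-width="):
            value = arg[len("-mprefer-vector-width="):]
            # "none" explicitly disables any preference.
            explicit = None if value == "none" else int(value)
            seen_explicit = True
    if seen_explicit:
        # A user-supplied flag overrides the CPU default, regardless of order.
        return explicit
    # Otherwise fall back to the default tied to the target CPU, if any.
    return CPU_DEFAULT_PREFER_WIDTH.get(march)


if __name__ == "__main__":
    print(resolve_prefer_vector_width(["-O2", "-march=skylake-avx512", "foo.c"]))
```
</pre>
                                <div>Resolving the default only after all arguments have been scanned is one way to keep the result independent of option order, which is the concern about letting the user override it.</div>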
                              </div>
                            </div>
                          </blockquote>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <div><br>
                  </div>
                </div>
              </div>
            </div>
            <div dir="ltr">
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div>Do we have a precedent for setting a target-independent
                    flag from a target-specific CPU string
                    in the clang driver? I want to make sure I understand
                    what the processing for such a thing would look like,
                    particularly how to get the order right so the user can
                    override it.<br>
                  </div>
                </div>
              </div>
            </div>
            <div dir="ltr">
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div> </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div bgcolor="#FFFFFF">
                      <div>
                        <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-h5">
                          <blockquote type="cite">
                            <div class="gmail_extra">
                              <div class="gmail_quote">
                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                  <div dir="ltr"> <br>
                                    So I don't think we should be
                                    messing with any kind of type
                                    legality checking because that stuff
                                    should all be correct already. We're
                                    just choosing a vector size based on
                                    a pref. I think we should even allow
                                    the pref to go bigger than a legal
                                    type. This came up somewhere on
                                    llvm-dev or in a bug recently in the
                                    context of vector reductions.<br>
                                    <br>
                                    <br>
                                  </div>
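                                  <div>As a rough illustration of "choosing a vector size based on a pref" without touching legality, here is a small Python sketch. It is not LLVM code: choose_vector_width and the cost numbers are hypothetical, and the preference is treated as a soft cap applied on top of whatever widths the cost model already considers legal. A preference larger than any legal width is simply accepted and has no effect here.</div>
<pre>
```python
def choose_vector_width(cost_by_width, prefer_width=None):
    """Pick the cheapest legal width that does not exceed the preference.

    cost_by_width maps a legal vector width in bits to an estimated cost
    (lower is better). prefer_width is a soft cap, or None for no cap.
    """
    candidates = dict(cost_by_width)
    if prefer_width is not None:
        # Keep only widths at or below the preference. Narrower widths stay
        # available; the preference never forbids them.
        capped = {w: c for w, c in candidates.items() if prefer_width >= w}
        if not capped:
            # Nothing fits under the cap: use the narrowest legal width.
            return min(candidates)
        candidates = capped
    # Lowest cost wins; ties go to the narrower width.
    return min(candidates, key=lambda w: (candidates[w], w))


if __name__ == "__main__":
    costs = {128: 4.0, 256: 2.5, 512: 2.0}  # made-up numbers
    print(choose_vector_width(costs))
    print(choose_vector_width(costs, prefer_width=256))
```
</pre>
                                  <div>With the made-up costs above, the unconstrained choice is 512 and a preference of 256 yields 256, while all three widths remain legal inputs to the decision.</div>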
                                  <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527HOEnZb">
                                    <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527h5">
                                      <div class="gmail_extra"><br>
                                        <div class="gmail_quote">On Fri,
                                          Nov 10, 2017 at 6:04 PM, Craig
                                          Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span>
                                          wrote:<br>
                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                            <div dir="ltr">Are you
                                              referring to
                                              the X86TargetLowering::isFsqrt<wbr>Cheap
                                              hook?</div>
                                            <div class="gmail_extra"><br clear="all">
                                              <div>
                                                <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527m_6454106954572217318m_771050129279988374gmail_signature">~Craig</div>
                                              </div>
                                              <br>
                                              <div class="gmail_quote">On
                                                Fri, Nov 10, 2017 at
                                                7:39 AM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span>
                                                wrote:<br>
                                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                  <div dir="ltr">We can
                                                    tie a user
                                                    preference /
                                                    override to a CPU
                                                    model. We do
                                                    something like that
                                                    for square root
                                                    estimates already
                                                    (although it does
                                                    use a
                                                    SubtargetFeature
                                                    currently for x86;
                                                    ideally, we'd key
                                                    that off of
                                                    something in the CPU
                                                    scheduler model).
                                                    <div>
                                                      <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527m_6454106954572217318m_771050129279988374h5"><br>
                                                        <div>
                                                          <div class="gmail_extra"><br>
                                                          <div class="gmail_quote">On
                                                          Thu, Nov 9,
                                                          2017 at 4:21
                                                          PM, Craig
                                                          Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span>
                                                          wrote:<br>
                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                          <div dir="ltr">I
                                                          agree that a
                                                          less x86
                                                          specific
                                                          command line
                                                          makes sense.
                                                          I've been
                                                          having
                                                          internal
                                                          discussions
                                                          with gcc folks
                                                          and they're
                                                          evaluating
                                                          switching to
                                                          something like
-mprefer-vector-width=128/256/<wbr>512/none.
                                                          <div><br>
                                                          </div>
                                                          <div>Based on
                                                          the current
                                                          performance
                                                          data we're
                                                          seeing, we
                                                          think we need
                                                          to ultimately
                                                          default
                                                          skylake-avx512
                                                          to
                                                          -mprefer-vector-width=256.
                                                          If we go with
                                                          a target
                                                          independent
                                                          option/implementation,
                                                          is there
                                                          some way we
                                                          could still
                                                          affect the
                                                          default
                                                          behavior in a
                                                          target
                                                          specific way?</div>
                                                          </div>
                                                          <div class="gmail_extra"><br clear="all">
                                                          <div>
                                                          <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527m_6454106954572217318m_771050129279988374m_4887027107317541871m_-9050519988835790991gmail_signature">~Craig</div>
                                                          </div>
                                                          <br>
                                                          <div class="gmail_quote">On
                                                          Tue, Nov 7,
                                                          2017 at 9:06
                                                          AM, Sanjay
                                                          Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span>
                                                          wrote:<br>
                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                          <div dir="ltr">
                                                          <div>It's
                                                          clear from the
                                                          Intel docs how
                                                          this has
                                                          evolved, but
                                                          from a
                                                          compiler
                                                          perspective,
                                                          this isn't a
                                                          Skylake
                                                          "feature" :)
                                                          ... nor an
                                                          Intel feature,
                                                          nor an x86
                                                          feature. <br>
                                                          <br>
                                                          It's a generic
                                                          programmer
                                                          hint for any
                                                          target with
                                                          multiple
                                                          potential
                                                          vector
                                                          lengths. <br>
                                                          </div>
                                                          <div><br>
                                                          </div>
                                                          <div>On x86,
                                                          there's
                                                          already a
                                                          potential use
                                                          case for this
                                                          hint with a
                                                          different
                                                          starting
                                                          motivation:
                                                          re-vectorization.
                                                          That's where
                                                          we take C code
                                                          that uses
                                                          128-bit vector
                                                          intrinsics and
                                                          selectively
                                                          widen it to
                                                          256- or
                                                          512-bit vector
                                                          ops based on a
                                                          newer CPU
                                                          target than
                                                          the code was
                                                          originally
                                                          written for.<br>
                                                          <div><br>
                                                          </div>
                                                          <div>I think
                                                          it's just a
                                                          matter of time
                                                          before a
                                                          customer
                                                          requests the
                                                          same ability
                                                          for another
                                                          target (maybe
                                                          they already
                                                          have and I
                                                          don't know
                                                          about it). So
                                                          we should have
                                                          a solution
                                                          that
                                                          recognizes
                                                          that
                                                          possibility. <br>
                                                          </div>
                                                          <div><br>
                                                          </div>
                                                          </div>
                                                          Note that
                                                          having a
                                                          target-independent
                                                          implementation
                                                          in the
                                                          optimizer
                                                          doesn't
                                                          preclude a
                                                          flag alias in
                                                          clang to
                                                          maintain
                                                          compatibility
                                                          with gcc.
                                                          <div>
                                                          <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527m_6454106954572217318m_771050129279988374m_4887027107317541871m_-9050519988835790991h5"><br>
                                                          <div><br>
                                                          </div>
                                                          <div class="gmail_extra"><br>
                                                          <div class="gmail_quote">On
                                                          Tue, Nov 7,
                                                          2017 at 2:02
                                                          AM, Tobias
                                                          Grosser via
                                                          llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>
                                                          wrote:<br>
                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On
                                                          Fri, Nov 3,
                                                          2017, at
                                                          05:47, Craig
                                                          Topper via
                                                          llvm-dev
                                                          wrote:<br>
                                                          > That's a
                                                          very good
                                                          point about
                                                          the ordering
                                                          of the command
                                                          line options.<br>
                                                          > gcc's
                                                          current
                                                          implementation
                                                          treats
                                                          -mprefer-avx256
                                                          as "prefer
                                                          256 over<br>
                                                          > 512" and
-mprefer-avx128 as "prefer 128 over 256". Which feels weird for<br>
                                                          > other
                                                          reasons, but
                                                          has less of an
                                                          ordering
                                                          ambiguity.<br>
                                                          ><br>
                                                          >
                                                          -mprefer-avx128
                                                          has been in
                                                          gcc for many
                                                          years and
                                                          predates the
                                                          creation<br>
                                                          > of<br>
                                                          > avx512.
                                                          -mprefer-avx256
                                                          was added a
                                                          couple months
                                                          ago.<br>
                                                          ><br>
                                                          > We've had
                                                          an internal
                                                          conversation
                                                          with the
                                                          implementor of<br>
                                                          >
                                                          -mprefer-avx256<br>
                                                          > in gcc
                                                          about making
                                                          -mprefer-avx128
                                                          affect 512-bit
                                                          vectors as
                                                          well. I'll<br>
                                                          > bring up
                                                          the ambiguity
                                                          issue with
                                                          them.<br>
                                                          ><br>
                                                          > Do we
                                                          want to be
                                                          compatible
                                                          with gcc here?<br>
                                                          <br>
                                                          I certainly
                                                          believe we
                                                          would want to
                                                          be compatible
                                                          with gcc (if
                                                          we use<br>
                                                          the same
                                                          names).<br>
                                                          <br>
                                                          Best,<br>
                                                          Tobias<br>
                                                          <br>
                                                          ><br>
                                                          > ~Craig<br>
                                                          ><br>
                                                          > On Thu,
                                                          Nov 2, 2017 at
                                                          7:18 PM, Eric
                                                          Christopher
                                                          <<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>><br>
                                                          > wrote:<br>
                                                          ><br>
                                                          > ><br>
                                                          > ><br>
                                                          > > On
                                                          Thu, Nov 2,
                                                          2017 at 7:05
                                                          PM James Y
                                                          Knight via
                                                          llvm-dev <<br>
                                                          > > <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
                                                          wrote:<br>
                                                          > ><br>
                                                          > >>
                                                          On Wed, Nov 1,
                                                          2017 at 7:35
                                                          PM, Craig
                                                          Topper via
                                                          llvm-dev <<br>
                                                          > >>
                                                          <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
                                                          wrote:<br>
                                                          > >><br>
                                                          >
                                                          >>>
                                                          Hello all,<br>
                                                          >
                                                          >>><br>
                                                          >
                                                          >>><br>
                                                          >
                                                          >>><br>
                                                          >
                                                          >>> I
                                                          would like to
                                                          propose adding
                                                          the
                                                          -mprefer-avx256
                                                          and
                                                          -mprefer-avx128<br>
                                                          >
                                                          >>>
                                                          command line
                                                          flags
                                                          supported by
                                                          the latest GCC to
                                                          clang. These
                                                          flags will be<br>
                                                          >
                                                          >>>
                                                          used to limit
                                                          the vector
                                                          register size
                                                          presented by
                                                          TTI to the
                                                          vectorizers.<br>
                                                          >
                                                          >>>
                                                          The backend
                                                          will still be
                                                          able to use
                                                          wider
                                                          registers for
                                                          code written<br>
                                                          >
                                                          >>>
                                                          using the
                                                          intrinsics in
                                                          x86intrin.h.
                                                          And the
                                                          backend will
                                                          still be able
                                                          to<br>
                                                          >
                                                          >>>
                                                          use AVX512VL
                                                          instructions
                                                          and the
                                                          additional
                                                          XMM16-31 and
                                                          YMM16-31<br>
                                                          >
                                                          >>>
                                                          registers.<br>
                                                          >
                                                          >>><br>
                                                          >
                                                          >>><br>
                                                          >
                                                          >>><br>
                                                          >
                                                          >>>
                                                          Motivation:<br>
                                                          >
                                                          >>><br>
                                                          >
                                                          >>>
                                                          -Using 512-bit
                                                          operations on
                                                          some Intel
                                                          CPUs may cause
                                                          a decrease in
                                                          CPU<br>
                                                          >
                                                          >>>
                                                          frequency that
                                                          may offset the
                                                          gains from
                                                          using the
                                                          wider register
                                                          size. See<br>
                                                          >
                                                          >>>
                                                          section 15.26
                                                          of Intel® 64
                                                          and IA-32
                                                          Architectures
                                                          Optimization
                                                          Reference<br>
                                                          >
                                                          >>>
                                                          Manual
                                                          published
                                                          October 2017.<br>
                                                          >
                                                          >>><br>
                                                          > >><br>
                                                          > >>
                                                          I note the doc
                                                          mentions that
                                                          256-bit AVX
                                                          operations
                                                          also have the
                                                          same<br>
                                                          > >>
                                                          issue with
                                                          reducing the
                                                          CPU frequency,
                                                          which is nice
                                                          to see
                                                          documented!<br>
                                                          > >><br>
                                                          > >>
                                                          There are also
                                                          the issues
                                                          discussed here
                                                          <<a href="http://www.agner.org/" rel="noreferrer" target="_blank">http://www.agner.org/</a><br>
                                                          > >>
optimize/blog/read.php?i=165> (and elsewhere) related to warm-up time<br>
                                                          > >>
                                                          for the
                                                          256-bit
                                                          execution
                                                          pipeline,
                                                          which is
                                                          another issue
                                                          with using<br>
                                                          > >>
                                                          wide-vector
                                                          ops.<br>
                                                          > >><br>
                                                          > >><br>
                                                          > >>
                                                          -The vector
                                                          ALUs on ports
                                                          0 and 1 of the
                                                          Skylake Server
microarchitecture<br>
                                                          >
                                                          >>>
                                                          are only
                                                          256 bits wide.
                                                          512-bit
                                                          instructions
                                                          using these
                                                          ALUs must use
                                                          both<br>
                                                          >
                                                          >>>
                                                          ports. See
                                                          section 2.1 of
                                                          Intel® 64 and
                                                          IA-32
                                                          Architectures
                                                          Optimization<br>
                                                          >
                                                          >>>
                                                          Reference
                                                          Manual
                                                          published
                                                          October 2017.<br>
                                                          >
                                                          >>><br>
                                                          > >><br>
                                                          > >><br>
                                                          >
                                                          >>> 
                                                          Implementation
                                                          Plan:<br>
                                                          >
                                                          >>><br>
                                                          >
                                                          >>>
                                                          -Add
                                                          prefer-avx256
                                                          and
                                                          prefer-avx128
                                                          as
                                                          SubtargetFeatures
                                                          in X86.td not<br>
                                                          >
                                                          >>>
                                                          mapped to any
                                                          CPU.<br>
                                                          >
                                                          >>><br>
                                                          >
                                                          >>>
                                                          -Add
                                                          mprefer-avx256
                                                          and
                                                          mprefer-avx128
                                                          and the
                                                          corresponding<br>
                                                          >
                                                          >>>
                                                          -mno-prefer-avx128/256
                                                          options to
                                                          clang's driver
                                                          Options.td
                                                          file. I
                                                          believe<br>
                                                          >
                                                          >>>
                                                          this will
                                                          allow clang to
                                                          pass these
                                                          straight
                                                          through to the
-target-feature<br>
                                                          >
                                                          >>>
                                                          attribute in
                                                          IR.<br>
                                                          >
                                                          >>><br>
                                                          >
                                                          >>>
                                                          -Modify
                                                          X86TTIImpl::getRegisterBitWidt<wbr>h
                                                          to only return
                                                          512 if AVX512
                                                          is<br>
                                                          >
                                                          >>>
                                                          enabled and
                                                          prefer-avx256
                                                          and
                                                          prefer-avx128
                                                          are not set.
                                                          Similarly
                                                          return<br>
                                                          >
                                                          >>>
                                                          256 if AVX is
                                                          enabled and
                                                          prefer-avx128
                                                          is not set.<br>
                                                          >
                                                          >>><br>
                                                          > >><br>
                                                          > >>
                                                          Instead of
                                                          multiple flags
                                                          that have
                                                          difficult to
                                                          understand
                                                          intersecting<br>
                                                          > >>
                                                          behavior, one
                                                          flag with a
                                                          value would be
                                                          better. E.g.,
                                                          what should<br>
                                                          > >>
"-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the<br>
                                                          > >>
                                                          answer, it's
                                                          confusing.
                                                          (Similarly
                                                          with other
                                                          such
                                                          combinations).
                                                          Just a<br>
                                                          > >>
                                                          single arg
                                                          "-mprefer-avx={128/256/512}"
                                                          (with no "no"
                                                          version) seems
                                                          easier<br>
                                                          > >>
                                                          to understand
                                                          to me (keeping
                                                          the same
                                                          behavior as
                                                          you mention:
                                                          asking to<br>
                                                          > >>
                                                          prefer a
                                                          larger width
                                                          than is
                                                          supported by
                                                          your
                                                          architecture
                                                          should be fine<br>
                                                          > >>
                                                          but ignored).<br>
                                                          > >><br>
                                                          > >><br>
                                                          > > I
                                                          agree with
                                                          this. It's a
                                                          little more
                                                          plumbing as
                                                          far as
                                                          subtarget<br>
                                                          > >
                                                          features etc
                                                          (represent via
                                                          an optional
                                                          value or just
                                                          various "set
                                                          the avx<br>
                                                          > >
                                                          width"
                                                          features - the
                                                          latter being
                                                          easier, but
                                                          uglier),
                                                          however, it's<br>
                                                          > >
                                                          probably the
                                                          right thing to
                                                          do.<br>
                                                          > ><br>
                                                          > > I
                                                          was looking at
                                                          this myself
                                                          just a couple
                                                          weeks ago and
                                                          think this is
                                                          the<br>
                                                          > >
                                                          right
                                                          direction
                                                          (when and how
                                                          to turn things
                                                          off) - and
                                                          probably makes<br>
                                                          > >
                                                          sense to be a
                                                          default for
                                                          these
                                                          architectures?
                                                          We might end
                                                          up needing to<br>
                                                          > >
                                                          check a couple
                                                          of additional
                                                          TTI places,
                                                          but it sounds
                                                          like you're on
                                                          top<br>
                                                          > > of
                                                          it. :)<br>
                                                          > ><br>
                                                          > >
                                                          Thanks very
                                                          much for doing
                                                          this work.<br>
                                                          > ><br>
                                                          > >
                                                          -eric<br>
                                                          > ><br>
                                                          > ><br>
                                                          > >><br>
                                                          > >><br>
                                                          > >> There may be some other backend changes needed, but I plan to address<br>
                                                          > >>> those as we find them.<br>
                                                          > >>><br>
                                                          > >>><br>
                                                          > >>> At a later point, consider making -mprefer-avx256 the default for<br>
                                                          > >>> Skylake Server due to the above mentioned performance considerations.<br>
                                                          > >>><br>
                                                          > >><br>
                                                          > >><br>
                                                          > >><br>
                                                          > >><br>
                                                          > >><br>
                                                          > >>><br>
                                                          > >> Does this sound reasonable?<br>
                                                          > >>><br>
                                                          > >>><br>
                                                          > >>><br>
                                                          > >>> *Latest Intel Optimization manual available here:<br>
                                                          > >>> <a href="https://software.intel.com/en-us/articles/intel-sdm#optimization" rel="noreferrer" target="_blank">https://software.intel.com/en-<wbr>us/articles/intel-sdm#optimiza<wbr>tion</a><br>
                                                          > >>><br>
                                                          > >>><br>
                                                          > >>> -Craig Topper<br>
                                                          > >>><br>
                                                          > >>> ______________________________<wbr>_________________<br>
                                                          > >>> LLVM Developers mailing list<br>
                                                          > >>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
                                                          > >>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
                                                          >
                                                          >>><br>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          </div>
                                                        </div>
                                                      </div>
                                                    </div>
                                                  </div>
                                                </blockquote>
                                              </div>
                                              <br>
                                            </div>
                                          </blockquote>
                                        </div>
                                        <br>
                                      </div>
                                    </div>
                                  </div>
                                  <br>
                                  <br>
                                </blockquote>
                              </div>
                              <br>
                            </div>
                            <br>
                          </blockquote>
                          <br>
                        </div>
                      </div>
                      <span class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-">
                        <pre class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
                      </span></div>
                    <br>
                    <br>
                  </blockquote>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
    <br>
    <pre class="m_-5672491955778672750m_-938083871188661067moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </div></div></div>

</blockquote></div><br></div></div></div>
<br></blockquote></div><br></div>